Issues with Traffic from Databricks to Private Endpoints in Azure

Asked By CloudCrafter42 On

I'm setting up some workloads using a hub-and-spoke architecture on Azure. The hub virtual network (VNet) contains an Azure Firewall and the Private Endpoints (PEs) for my storage accounts. My Databricks workspaces are in separate spoke VNets. I've peered the hub and spoke VNets, and I can access the storage accounts from Databricks.

However, I want to limit access to a few specific storage accounts. While researching, I found that traffic from Databricks to the storage account Private Endpoints does not go through the firewall by default. To change this, I enabled network policies on the private endpoint subnet, created routes to send the traffic through the firewall, and added a firewall rule that allows the selected private endpoint IPs and denies the others. Unfortunately, I now can't reach even the allowed storage accounts from Databricks. Can anyone provide guidance on how to resolve this?
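For readers following along, here is a minimal Python sketch of why the firewall is bypassed by default. It assumes the commonly documented Azure behavior that a private endpoint injects a /32 system route into its own VNet and peered VNets, and that route selection uses longest-prefix match, with a user-defined route winning over a system route when the prefixes are identical. All addresses and the `pick_route` helper are hypothetical, and the model deliberately ignores the effect of enabling network policies on the private endpoint subnet.

```python
import ipaddress

# Hypothetical addresses, for illustration only
PE_IP = "10.0.2.4"        # private endpoint of a storage account in the hub
FIREWALL_IP = "10.0.1.4"  # private IP of the Azure Firewall in the hub


def pick_route(routes, destination):
    """Return the route that wins for `destination`.

    Longest prefix wins; on an identical prefix, a user-defined route
    is preferred over a system route.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (
            ipaddress.ip_network(r["prefix"]).prefixlen,
            r["source"] == "user",
        ),
    )


# Effective routes on a Databricks subnet (simplified)
routes_default = [
    {"prefix": "10.0.0.0/16", "next_hop": "VNetPeering", "source": "system"},
    {"prefix": "10.0.2.4/32", "next_hop": "InterfaceEndpoint", "source": "system"},
    # A subnet-wide UDR pointing at the firewall loses to the /32 system route
    {"prefix": "10.0.2.0/24", "next_hop": FIREWALL_IP, "source": "user"},
]

# Adding a /32 UDR for the endpoint ties on prefix length, so the UDR wins
routes_with_host_udr = routes_default + [
    {"prefix": "10.0.2.4/32", "next_hop": FIREWALL_IP, "source": "user"},
]

print(pick_route(routes_default, PE_IP)["next_hop"])
print(pick_route(routes_with_host_udr, PE_IP)["next_hop"])
```

In this simplified model, the first lookup still goes straight to the endpoint and only the second one reaches the firewall, which is one way to sanity-check the effective routes shown for a Databricks subnet.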

5 Answers

Answered By NetworkWhiz On

Are the private endpoints for your storage accounts located in the same VNet? If they are, the default routing rules should direct traffic from Databricks to the subnet of the private endpoint with no issues.

CloudCrafter42 -

Yes, the PEs for the storage accounts and the firewall are in the hub VNet but in different subnets. Databricks is in a different spoke VNet.

Answered By TechSavvyNerd On

I recommend using service endpoints on your Databricks subnets; they are often enabled by default, and they respect your private endpoint setup. If you're processing large models in Databricks, consider placing critical private endpoints in the same VNet so the traffic doesn't have to route through the firewall. We typically don't put private endpoints in the hub; instead, we keep them in the workload VNet (for example, the Databricks storage private endpoint in the Databricks VNet) and secure them with Network Security Groups (NSGs). Also keep in mind that Databricks can consume a lot of bandwidth, so monitor it carefully to prevent network congestion.

DataDrivenDude -

The storage endpoints are placed in the hub because multiple Databricks VNets in different spokes connect to the same storage account, so it makes sense to centralize the storage PEs there.

Answered By SubnetStrategist On

Check your route rules in the Databricks subnet. Be sure to add the private IP range to the firewall rules and make sure the firewall allows connections from the Databricks subnet to your private endpoints.
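As a rough illustration of the rule being described, the following Python sketch models an allow list that permits HTTPS from the Databricks subnets to selected private endpoint IPs and denies everything else. The subnets, IPs, and the `is_allowed` helper are hypothetical, and this is not how Azure Firewall evaluates rules internally; it only helps check that the intended source ranges and destination IPs line up.

```python
import ipaddress

# Hypothetical values, for illustration only
DATABRICKS_SUBNETS = ["10.1.0.0/24", "10.2.0.0/24"]  # spoke subnets hosting Databricks
ALLOWED_PE_IPS = {"10.0.2.4", "10.0.2.5"}            # private endpoints to permit
STORAGE_PORT = 443                                   # storage is reached over HTTPS


def is_allowed(src, dst, port):
    """Return True only for HTTPS from a Databricks subnet to an allowed endpoint."""
    src_ip = ipaddress.ip_address(src)
    from_databricks = any(
        src_ip in ipaddress.ip_network(subnet) for subnet in DATABRICKS_SUBNETS
    )
    return from_databricks and dst in ALLOWED_PE_IPS and port == STORAGE_PORT


print(is_allowed("10.1.0.10", "10.0.2.4", 443))  # allowed endpoint
print(is_allowed("10.1.0.10", "10.0.2.9", 443))  # endpoint not on the allow list
print(is_allowed("10.9.0.10", "10.0.2.4", 443))  # source outside the Databricks subnets
```

If the real source address seen by the firewall falls outside the listed ranges, the allow rule will never match, so comparing the firewall logs against a list like this is a reasonable first check.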

CloudCrafter42 -

I've done that. Should I also add routes to the PE subnet for return traffic?

Answered By CloudGuru88 On

To manage network traffic effectively, keep private endpoints direct rather than routing them through the firewall. You can control access with DNS linking or by setting network rules on the storage account.
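One way to check the DNS side of this suggestion is to see whether a storage account name resolves to a private address from the network where the code runs. The Python sketch below assumes the usual `<account>.blob.core.windows.net` naming; the account name, the fake resolvers, and the `resolves_privately` helper are hypothetical.

```python
import ipaddress
import socket


def resolves_privately(fqdn, resolve=socket.gethostbyname):
    """Return True if `fqdn` resolves to a private (RFC 1918 style) address.

    `resolve` is injectable so the check can be exercised without network
    access; by default it uses the system resolver.
    """
    return ipaddress.ip_address(resolve(fqdn)).is_private


# Offline demonstration with fake resolvers (hypothetical answers)
fqdn = "mystorageacct.blob.core.windows.net"
print(resolves_privately(fqdn, resolve=lambda _: "10.0.2.4"))   # private endpoint IP
print(resolves_privately(fqdn, resolve=lambda _: "20.60.1.1"))  # public IP
```

Run from a Databricks notebook with the default resolver, a name that resolves to a public address suggests the private DNS zone is not linked for that spoke, so access then depends on the storage account's own network rules.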

Answered By ConsultingPro On

My rate is $150 per hour if you're interested in some professional help. I'm well-versed in Hub and Spoke configurations.

CloudCrafter42 -

Thanks for the offer, but I'm not looking for paid consultations at the moment.
