I'm setting up some workloads using a hub-and-spoke architecture on Azure. The hub virtual network (VNet) contains an Azure Firewall and the private endpoints (PEs) for my storage accounts. My Databricks workspaces are in separate spoke VNets. I've peered the hub with the spoke VNets, and Databricks can reach the storage accounts. However, I want to restrict access to only a few specific storage accounts.

While researching, I found that by default, traffic from Databricks to the storage account private endpoints does not go through the firewall. To change this, I enabled network policies on the private endpoint subnet, created routes to send that traffic through the firewall, and added a firewall rule that allows the selected private endpoint IPs while denying the rest. Now, though, I can't reach even the allowed storage accounts from Databricks. Can anyone help me resolve this?
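The bypass comes from how Azure picks a route. The longest matching prefix wins, and only among equally long prefixes does a user-defined route (UDR) beat a system route. A private endpoint typically shows up as a /32 system route in the effective routes of peered VNets, so a broader UDR pointing at the firewall loses to it. The Python sketch below models that selection. The addresses, the firewall IP `10.0.1.4`, and the next-hop labels are hypothetical, and it deliberately ignores what enabling network policies on the PE subnet changes.

```python
import ipaddress
from dataclasses import dataclass

# Tie-break when prefixes are equally long: user-defined > BGP > system.
SOURCE_RANK = {"user": 2, "bgp": 1, "system": 0}


@dataclass
class Route:
    prefix: str    # destination CIDR
    next_hop: str  # illustrative next-hop label
    source: str    # "user", "bgp" or "system"


def select_route(routes, destination):
    """Pick the effective route for a destination IP (longest prefix first)."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ipaddress.ip_network(r.prefix).prefixlen, SOURCE_RANK[r.source]),
    )


if __name__ == "__main__":
    # Hypothetical effective routes on a Databricks subnet in a spoke.
    routes = [
        Route("10.0.0.0/16", "VNetPeering", "system"),              # hub address space
        Route("10.0.0.0/16", "VirtualAppliance 10.0.1.4", "user"),  # UDR to the firewall
        Route("10.0.2.5/32", "InterfaceEndpoint", "system"),        # private endpoint
    ]
    print(select_route(routes, "10.0.2.5").next_hop)   # InterfaceEndpoint
    print(select_route(routes, "10.0.3.10").next_hop)  # VirtualAppliance 10.0.1.4
```

The /16 UDR does win for other hub addresses, but the PE's /32 is more specific, so PE traffic goes straight over the peering.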
Are the private endpoints for your storage accounts located in the same VNet as Databricks? If so, the default routing should deliver traffic from Databricks to the private endpoint subnet without any issues.
I recommend enabling service endpoints on your Databricks subnets; they are often enabled by default and work alongside your private endpoint setup. If you're processing large models in Databricks, consider placing critical private endpoints in the same VNet to avoid routing through the firewall. Typically, we don't put private endpoints in the hub; instead, they belong in the workload VNet (for example, the Databricks storage private endpoint in the Databricks VNet), secured with network security groups (NSGs). Also, be mindful that Databricks can consume a lot of bandwidth, so monitor it carefully to avoid network congestion.
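For the NSG approach, rules are evaluated in priority order (lowest number first), and processing stops at the first match. The sketch below models that for inbound rules on a private endpoint subnet. The CIDRs and the port-443 rule are hypothetical. NSGs only take effect on private endpoints when network policies are enabled on that subnet.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class NsgRule:
    priority: int        # lower number is evaluated first
    source: str          # source CIDR
    destination: str     # destination CIDR
    port: Optional[int]  # None means any port
    action: str          # "Allow" or "Deny"


def evaluate(rules, src, dst, port):
    """Return the action of the first matching rule in priority order."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in sorted(rules, key=lambda r: r.priority):
        if (
            src_ip in ipaddress.ip_network(rule.source)
            and dst_ip in ipaddress.ip_network(rule.destination)
            and (rule.port is None or rule.port == port)
        ):
            return rule.action  # first match wins; later rules are skipped
    # Simplification: a real NSG falls through to its default rules here,
    # including AllowVNetInBound (65000), which would allow peered-VNet traffic.
    return None


if __name__ == "__main__":
    # Hypothetical inbound rules on the PE subnet.
    rules = [
        NsgRule(100, "10.1.0.0/24", "10.0.2.5/32", 443, "Allow"),
        NsgRule(4096, "0.0.0.0/0", "10.0.2.0/24", None, "Deny"),
    ]
    print(evaluate(rules, "10.1.0.7", "10.0.2.5", 443))  # Allow
    print(evaluate(rules, "10.1.0.7", "10.0.2.6", 443))  # Deny
```

The explicit deny at 4096 matters. Without it, the default AllowVNetInBound rule would let any peered spoke reach every PE in the subnet.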
I placed the storage private endpoints in the hub because multiple Databricks VNets in different spokes connect to the same storage accounts, so it makes sense to centralize the PEs there.
Check the route table associated with your Databricks subnets. Make sure the private endpoint IP range is included in your firewall rules, and that the firewall allows connections from the Databricks subnets to your private endpoints.
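This answer names two conditions to check. The first is a route that sends the PE address to the firewall. The second is a firewall rule that allows the Databricks subnet to reach that address. The sketch below checks both. It looks at UDRs only (not system routes) and treats the firewall rules as a flat allow-list with default deny, as Azure Firewall denies unmatched traffic. Every address, including the firewall IP `10.0.1.4`, is hypothetical.

```python
import ipaddress

FIREWALL_IP = "10.0.1.4"  # hypothetical Azure Firewall private IP


def next_hop(udrs, destination):
    """Longest-prefix match over UDRs only; returns None if no UDR matches."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in udrs
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def firewall_allows(rules, source, destination, port):
    """Allow-list check: anything not matched by a rule is denied."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    return any(
        src in ipaddress.ip_network(rule_src)
        and dst in ipaddress.ip_network(rule_dst)
        and port == rule_port
        for rule_src, rule_dst, rule_port in rules
    )


def check_path(udrs, rules, source, destination, port=443):
    if next_hop(udrs, destination) != FIREWALL_IP:
        return "not routed via firewall"
    if not firewall_allows(rules, source, destination, port):
        return "denied by firewall"
    return "allowed via firewall"


if __name__ == "__main__":
    udrs = [("10.0.2.0/24", FIREWALL_IP)]          # PE subnet -> firewall
    rules = [("10.1.0.0/24", "10.0.2.5/32", 443)]  # Databricks subnet -> one PE
    for pe in ("10.0.2.5", "10.0.2.6", "10.0.5.9"):
        print(pe, check_path(udrs, rules, "10.1.0.7", pe))
```

Running it prints `allowed via firewall`, `denied by firewall`, and `not routed via firewall` for the three addresses, which mirrors the three outcomes you want to tell apart when troubleshooting.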
I've done that. Should I also add routes to the PE subnet for return traffic?
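The return-traffic question matters because a stateful firewall usually has to see both directions of a connection. The toy model below is a generic connection tracker, not Azure Firewall's actual implementation. It shows a TCP handshake failing when the SYN is routed through the firewall but the SYN-ACK from the private endpoint takes a direct path back. The IPs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulFirewall:
    """Toy connection tracker for a TCP handshake."""
    flows: dict = field(default_factory=dict)  # (client, server) -> state

    def inspect(self, src, dst, flag):
        """Return True if the firewall would pass this packet."""
        if flag == "SYN":
            self.flows[(src, dst)] = "SYN_SEEN"
            return True
        if flag == "SYN-ACK":
            key = (dst, src)  # the reply travels server -> client
            if self.flows.get(key) == "SYN_SEEN":
                self.flows[key] = "ESTABLISHED"
                return True
            return False
        if flag == "ACK":
            return self.flows.get((src, dst)) == "ESTABLISHED"
        return False


def handshake_completes(return_via_firewall):
    fw = StatefulFirewall()
    client, server = "10.1.0.7", "10.0.2.5"  # hypothetical Databricks node and PE
    fw.inspect(client, server, "SYN")        # forward path: the UDR sends it to the firewall
    if return_via_firewall:
        fw.inspect(server, client, "SYN-ACK")
    # Otherwise the SYN-ACK goes straight back to the client and the firewall never sees it,
    # so the client's final ACK (again routed via the firewall) is rejected.
    return fw.inspect(client, server, "ACK")


if __name__ == "__main__":
    print(handshake_completes(return_via_firewall=True))   # True
    print(handshake_completes(return_via_firewall=False))  # False
```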
To manage network traffic effectively, keep private endpoint traffic direct rather than routing it through the firewall. You can control access by choosing which VNets are linked to the private DNS zone, or by setting network rules on the storage account.
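DNS linking works by controlling which VNets resolve a storage account name to its private endpoint IP. The toy model below assumes each spoke uses Azure-provided DNS with its own zone links; the VNet names, account names, and IPs are hypothetical. It only affects name resolution: a client that connects to the private endpoint IP directly is not stopped by it.

```python
# Toy model of private DNS zone links; all names and IPs are hypothetical.
PRIVATELINK_SUFFIX = "privatelink.blob.core.windows.net"


def resolve(account, vnet, linked_vnets, private_records, public_ip):
    """Return what the account's blob endpoint resolves to from a given VNet."""
    fqdn = f"{account}.{PRIVATELINK_SUFFIX}"
    if vnet in linked_vnets:
        # Linked VNets answer from the private zone; in this model a missing
        # record resolves to nothing instead of falling back to the public IP.
        return private_records.get(fqdn)
    return public_ip  # unlinked VNets get the public endpoint address


if __name__ == "__main__":
    linked = {"spoke-a"}
    records = {f"allowedacct.{PRIVATELINK_SUFFIX}": "10.0.2.5"}
    print(resolve("allowedacct", "spoke-a", linked, records, "20.0.0.1"))  # 10.0.2.5
    print(resolve("allowedacct", "spoke-b", linked, records, "20.0.0.1"))  # 20.0.0.1
    print(resolve("otheracct", "spoke-a", linked, records, "20.0.0.1"))    # None
```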
My rate is $150 per hour if you're interested in some professional help. I'm well-versed in Hub and Spoke configurations.
Thanks for the offer, but I'm not looking for paid consultations at the moment.

Yes, the PEs for the storage accounts and the firewall are in the hub VNet but in different subnets. Databricks is in a different spoke VNet.