Hey everyone! I'm struggling with a network control issue in my Google Kubernetes Engine (GKE) cluster, and I would love to hear your suggestions.
I'm trying to prevent the majority of my pods from accessing the GCP metadata server IP (`169.254.169.254`), allowing access only to a few specific pods. My main requirement is to block this access at the **network level** and ensure it can't be bypassed by using different hostnames.
I've tried two major approaches:
1. **Using Istio:** I set up `VirtualServices` and `AuthorizationPolicies` to block requests to known metadata hostnames like `metadata.google.internal`, but this can be bypassed by crafting a request with a different fully qualified domain name (FQDN) that resolves to `169.254.169.254`.
2. **Implementing Calico:** I enabled Calico for the cluster and created a `GlobalNetworkPolicy` to deny egress traffic to `169.254.169.254/32` (a sketch of the policy is below). However, when I applied it cluster-wide, pods started having connectivity issues. When I narrowed the policy down to a single namespace, it successfully blocked access to other arbitrary IPs, but requests to `169.254.169.254` still got through.
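For context, the Calico policy I've been experimenting with looks roughly like this (the label used to exempt the allowed pods is a simplified placeholder here):

```yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-metadata-egress
spec:
  # Apply to every pod that is NOT explicitly labeled as allowed
  # ("metadata-access" is a placeholder label name).
  selector: "!has(metadata-access)"
  types:
    - Egress
  egress:
    # Drop anything destined for the metadata server, any port or protocol.
    - action: Deny
      destination:
        nets:
          - 169.254.169.254/32
    # Let all other egress traffic through.
    - action: Allow
```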
My main challenge is finding a way to drop packets destined for this IP address from all pods except those explicitly permitted, regardless of the port or hostname used, in a way that can't be bypassed.
Has anyone successfully enforced such a strict IP block for the GCP metadata server in GKE? Any insight into why Calico might fail to block HTTP traffic to this specific IP would be appreciated! Thanks!
7 Answers
Just a thought: how are you verifying which pods are allowed access? If it's IP-based, someone could spoof that. Maybe mutual TLS (mTLS) on both ends would give you an identity-based check instead of relying on network position? Keep in mind, though, that someone with elevated cluster-level permissions can still undermine this kind of control. Overall, securing metadata access is a genuinely hard problem in cloud environments.
You could try a CNI that supports NetworkPolicies; that should let you implement the block at the network layer. You might also look at egress traffic control with Istio, which could give you some insight into how to enforce this kind of policy. Just make sure your networking setup matches what the docs recommend!
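On the Istio side, a hedged sketch of what mesh-wide egress lockdown looks like is a default `Sidecar` resource with `outboundTrafficPolicy` set to `REGISTRY_ONLY`, so sidecars only allow traffic to hosts in the service registry. Note this is a general Istio mechanism, not specific to the metadata IP, and it only governs traffic that actually passes through the sidecar:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system    # the mesh-wide default Sidecar lives in the root namespace
spec:
  egress:
    - hosts:
        - "./*"              # allow services in the workload's own namespace
        - "istio-system/*"   # and the mesh control plane
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY      # block egress to anything not in the service registry
```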
You might find that an eBPF program attached to the tc (traffic control) hook, dropping packets destined for that IP, does the trick. It enforces the block at the node's network interface, below the layers where your other attempts might be getting bypassed.
Definitely let us know what solution you end up with! I opted for a NetworkPolicy in every namespace that blocks egress to that IP, skipping the one namespace that is allowed to reach it. It's a bit tedious, since I need to create one for each new namespace.
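A minimal sketch of the per-namespace policy I mean, assuming the standard NetworkPolicy API and a CNI that enforces egress rules (the namespace name is just an example):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-metadata-egress
  namespace: my-app           # repeated in every namespace that should be blocked
spec:
  podSelector: {}             # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    # Allow egress anywhere except the metadata server.
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32
```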
I haven’t tackled this in GCP, but when I worked in AWS we would redirect requests to that IP at the node level to avoid exposing it through a cluster proxy service. I don’t have direct experience with the K8s networking layer here, but I think Workload Identity might help: on GKE it supersedes metadata concealment, and the GKE metadata server limits which metadata and credentials each pod can retrieve.
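If you go the Workload Identity route, the binding is done per Kubernetes ServiceAccount via an annotation; a hedged sketch with placeholder names (`my-ksa`, `my-gsa`, `my-project`):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-ksa                # placeholder Kubernetes service account
  namespace: my-app           # placeholder namespace
  annotations:
    # Bind this KSA to a Google service account (placeholder names);
    # an IAM binding with roles/iam.workloadIdentityUser is also required.
    iam.gke.io/gcp-service-account: my-gsa@my-project.iam.gserviceaccount.com
```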
Some users have had luck with eBPF-based CNIs like Cilium, which offer more granular traffic management. Another route is a proxy or init-container setup that blocks traffic unless explicitly permitted. Either approach could give you more control over pod egress.
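If you do try Cilium, a hedged sketch of a CiliumNetworkPolicy that allows all egress except the metadata CIDR (the empty `endpointSelector` matches every pod in the namespace; you'd adjust it to exempt your allowed pods):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: block-metadata-egress
  namespace: my-app           # example namespace
spec:
  endpointSelector: {}        # all pods in this namespace
  egress:
    # Allow egress to everything except the metadata server.
    - toCIDRSet:
        - cidr: 0.0.0.0/0
          except:
            - 169.254.169.254/32
```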
If you use Cilium, the nodes will SNAT (source NAT) outgoing connections unless you configure otherwise. Make sure `masqLinkLocal` is set to `false` in the IP masquerade agent settings; that might just solve your issue!
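A hedged sketch of what that looks like, using the standard ip-masq-agent ConfigMap format (your `nonMasqueradeCIDRs` will differ):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ip-masq-agent
  namespace: kube-system
data:
  config: |
    nonMasqueradeCIDRs:
      - 10.0.0.0/8        # example pod/node ranges, adjust to your cluster
    masqLinkLocal: false   # do not SNAT traffic to 169.254.0.0/16
    resyncInterval: 60s
```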
Have you considered using Kyverno for policy management? It could streamline the process when you scale up!
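For example, a hedged sketch of a Kyverno generate rule that stamps the metadata-blocking NetworkPolicy into every new namespace, so you don't have to create it by hand (names are placeholders, and you'd add an `exclude` block for the namespaces that are allowed to reach the metadata server):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-metadata-block-netpol
spec:
  rules:
    - name: generate-netpol
      match:
        any:
          - resources:
              kinds:
                - Namespace
      generate:
        apiVersion: networking.k8s.io/v1
        kind: NetworkPolicy
        name: block-metadata-egress
        namespace: "{{request.object.metadata.name}}"
        synchronize: true
        data:
          spec:
            podSelector: {}
            policyTypes:
              - Egress
            egress:
              # Allow egress anywhere except the metadata server.
              - to:
                  - ipBlock:
                      cidr: 0.0.0.0/0
                      except:
                        - 169.254.169.254/32
```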