I'm currently running machine learning inference workloads in a Kubernetes environment. Right now I'm using namespaces and network policies for tenant isolation, but our customer contracts now require proof of hardware-level data isolation. Namespaces only provide logical separation, and I'm concerned that if someone compromises a node, they could access other tenants' data.
We've explored options like Kata Containers for VM-level isolation, but the performance overhead is significant and we lose some Kubernetes features. gVisor has similar tradeoffs. I'm curious what solutions others are using for genuine hardware isolation within Kubernetes. Is this a solved problem, or are we looking at moving away from Kubernetes altogether?
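(For context, sandboxed runtimes like the ones mentioned above are typically wired in through a `RuntimeClass`. A minimal sketch, assuming Kata Containers is already installed on the nodes with a containerd handler named `kata`; the image name is a placeholder:)

```yaml
# RuntimeClass pointing at an existing Kata Containers installation.
# Assumes the nodes' containerd config already defines a "kata" handler.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-sandboxed
spec:
  runtimeClassName: kata   # this pod runs inside its own lightweight VM
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest  # placeholder image
```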
6 Answers
You might not need to abandon Kubernetes. Instead, consider dedicating bare-metal hardware to each tenant, with each tenant getting its own cluster. It's more secure, but yes, it can also get pricey.
Have you considered creating dedicated per-tenant node pools with tainted nodes? It's not a perfect solution, but it provides better isolation than a shared node pool.
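As a sketch, a per-tenant pool is carved out with a taint (to keep other workloads off) plus a matching toleration and node selector on the tenant's pods. The names `tenant-a` and the image are placeholders:

```yaml
# First taint and label the tenant's nodes:
#   kubectl taint nodes <node> tenant=tenant-a:NoSchedule
#   kubectl label nodes <node> tenant=tenant-a
apiVersion: v1
kind: Pod
metadata:
  name: inference
  namespace: tenant-a
spec:
  nodeSelector:
    tenant: tenant-a        # only schedule onto tenant-a's nodes
  tolerations:
  - key: tenant
    operator: Equal
    value: tenant-a
    effect: NoSchedule      # tolerate the taint that keeps everyone else out
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest  # placeholder image
```

Note the selector and the taint are both needed: the taint keeps other tenants off these nodes, and the selector keeps this tenant's pods from landing elsewhere.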
If hardware isolation is a revenue-generating requirement for you, I'd also explore paid solutions. vCluster might not fit your exact needs, but the team behind it offers multi-tenancy-focused products that could be interesting. There's also newer tech like VNode, which focuses on true hardware-level separation.
Kubernetes isolation is becoming a bigger issue as security requirements tighten. You're spot on: namespaces offer only limited protection. Have you looked into confidential containers, which use hardware features (trusted execution environments) to isolate workloads even from the host OS? It takes some setup and special node configurations, but it's feasible. We eventually moved our most sensitive workloads outside of Kubernetes to achieve true hardware isolation, running them in confidential VMs that Kubernetes interacts with through APIs. It's more complex, but much safer.
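To illustrate the confidential-containers route: once the Confidential Containers operator has installed a TEE-backed runtime class on capable nodes, selecting it is just a pod-spec field. This is a sketch; the runtime class name depends on your hardware (e.g. `kata-qemu-snp` for AMD SEV-SNP or `kata-qemu-tdx` for Intel TDX), and the image is a placeholder:

```yaml
# Assumes the Confidential Containers operator is installed and the
# nodes expose a TEE-backed runtime class (name varies by hardware).
apiVersion: v1
kind: Pod
metadata:
  name: inference-confidential
spec:
  runtimeClassName: kata-qemu-snp  # TEE-backed runtime; adjust to your hardware
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest  # placeholder image
```

The workload then runs inside a hardware-encrypted VM whose memory is opaque even to a compromised host, which is the kind of property you can point to in a contract.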
Simply labeling nodes and using selectors probably won't meet your needs on its own. Binding customers to specific hardware also complicates cloud-style elasticity, but it's worth exploring if you're prepared to grow your hardware footprint per tenant.
One way to tackle this is a mutating admission controller that enforces the presence of a `nodeSelector` on any pod in an isolated namespace. Since you already have namespace-level logical isolation, you could label nodes per tenant and have the controller inject the nodeSelector matching each namespace. With tools like the cluster autoscaler, you can even scale each tenant's node group dynamically.
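As a sketch: instead of writing a custom mutating webhook, the built-in `PodNodeSelector` admission plugin can do this per namespace via an annotation. It must be enabled on the API server, and `tenant-a` is a placeholder:

```yaml
# Requires --enable-admission-plugins=...,PodNodeSelector on the API server.
# Every pod created in this namespace then gets tenant=tenant-a merged
# into its nodeSelector automatically.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: tenant=tenant-a
```

Combined with per-tenant node labels (and taints, so other tenants' pods can't land on those nodes), this gives you enforced node-level pinning without maintaining webhook code.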
