Best Practices for Setting Up a Multi-User K8s Cluster for a Research Team?

Asked By TechyExplorer92 On

I'm working with a small research group that needs help setting up a private cluster. We have a storage server with 48 TB of space and several V100 GPU servers linked through gigabit Ethernet. We're not using InfiniBand or a parallel file system, and the main focus will be on model training with easy access through Jupyter Notebooks.

I'm considering deploying a lightweight Kubernetes cluster using k3s. Here's what I have planned so far:
- Keycloak for authentication
- Harbor for managing images
- MinIO for object storage with policy-based user data isolation.
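To make the last item concrete, here is a minimal sketch of what a per-user policy could look like. MinIO accepts S3-style policy JSON; the bucket name `research-data`, the `users/<name>/` prefix layout, and the helper `user_policy` are assumptions made for illustration, not something already set up.

```python
import json


def user_policy(bucket: str, username: str) -> dict:
    """Build an S3-style policy limiting a user to their own prefix.

    The "users/<username>/" layout is a hypothetical convention.
    """
    prefix = f"users/{username}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing the bucket, but only under the user's prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Allow reading, writing and deleting objects under that prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }


if __name__ == "__main__":
    # Print the policy document for one (hypothetical) user.
    print(json.dumps(user_policy("research-data", "alice"), indent=2))
```

The resulting JSON would still have to be loaded into MinIO and attached to the matching user or Keycloak-mapped identity; that step is not shown here.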

However, I have some unresolved questions:
1. What's the best choice for job orchestration? Should I use Argo Workflows, Flyte, or something else?
2. How can I implement resource scheduling that enforces per-user limits and job priorities, similar to what Slurm provides?
3. Any tips on creating an HPC-like user experience, such as qsub-style job submission?
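For question 2, the closest built-in Kubernetes pieces are a `ResourceQuota` per user namespace and cluster-wide `PriorityClass` objects. The sketch below only generates the manifests; the one-namespace-per-user layout, the names (`user-alice`, `compute-quota`, `batch-low`, `batch-high`) and all numbers are assumptions for illustration, and `nvidia.com/gpu` assumes the NVIDIA device plugin is what exposes the V100s.

```python
import json


def user_quota(namespace: str, gpus: int, cpu: str, memory: str) -> dict:
    """ResourceQuota capping what one user namespace may request in total."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "compute-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Extended resources are quota'd via the "requests." prefix.
                "requests.nvidia.com/gpu": str(gpus),
                "requests.cpu": cpu,
                "requests.memory": memory,
            }
        },
    }


def priority_class(name: str, value: int, description: str) -> dict:
    """Cluster-wide PriorityClass; pods opt in via spec.priorityClassName."""
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }


def as_list(items: list) -> dict:
    """Wrap several objects in a v1 List so they fit in one JSON file."""
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    manifests = as_list([
        user_quota("user-alice", gpus=2, cpu="16", memory="128Gi"),
        priority_class("batch-low", 1000, "Default training jobs"),
        priority_class("batch-high", 10000, "Deadline-critical jobs"),
    ])
    # Write to a file and apply with `kubectl apply -f <file>`.
    print(json.dumps(manifests, indent=2))
```

One caveat with this approach: a quota rejects pods that would exceed it at admission time instead of holding them in a queue, so on its own it does not reproduce Slurm-style queueing or fair share.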

I have some experience with deploying apps on Kubernetes but no experience in managing it as a shared compute cluster. Any advice would be greatly appreciated!

1 Answer

Answered By DataNerd123 On

I faced a similar challenge at the University of Turin with a system called Dossier, which is a multi-tenant Jupyter Notebook as a Service. You might want to look into that for inspiration!

FriendlyCurious -

That sounds interesting! Do you know if there's any documentation for Dossier that outlines its key components? I’d love to dive deeper.
