I resized my server to the Standard NV72ads A10 v5 size, which includes two GPUs. However, one GPU is maxed out at 100% usage while the other sits at only 5-30%. Is there a way to balance the workload between the two GPUs, and should I even be concerned that one is running at full capacity? I'm running Style3D, a 3D design application, and there are currently 4-5 concurrent users on the server.
2 Answers
It's possible that your application only uses one GPU for processing. Many applications have a setting that dictates which GPU to use, and unless told otherwise they pick the first one and ignore the second. If that's the case, check how your app handles GPU selection. With multiple users, it sometimes helps to explicitly assign each user's workload to a specific GPU.
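Before changing anything, it can help to confirm how uneven the load really is. Here is a minimal sketch that reads per-GPU utilization from nvidia-smi, assuming the NVIDIA driver is installed and nvidia-smi is on the PATH. The function names are made up for illustration. Fields that the driver reports as not available are returned as None.

```python
import subprocess

# Standard nvidia-smi query flags: one CSV row per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Return the field as an int, or None if it is not numeric (e.g. '[N/A]')."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = line.split(",")
        stats.append(
            {
                "index": _to_int(index),
                "util_pct": _to_int(util),
                "mem_used_mib": _to_int(mem),
            }
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU statistics."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() a few times while users are working shows whether the second GPU is really idle or just lightly used.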
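Balancing the load might not be necessary here. However, you have a few options: if the server runs Windows, you can enable hardware-accelerated GPU scheduling in the graphics settings, or you can assign specific users to particular GPUs using the CUDA_VISIBLE_DEVICES environment variable. Note that CUDA_VISIBLE_DEVICES only affects CUDA workloads, so it helps only if the application does its heavy lifting through CUDA. If you really want a smoother experience, consider switching to single-GPU sizes such as Standard_NV36ads_A10_v5 (one full A10) or Standard_NV18ads_A10_v5 (half an A10), so that each VM exposes a single GPU; it might simplify things for you.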
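For the CUDA_VISIBLE_DEVICES route with several users, one approach is to start each user's session through a small launcher that sets the variable before the application starts. The sketch below assumes two GPUs (indices 0 and 1) and that the application picks its GPU through CUDA, which I have not verified for Style3D. The user names and the command are placeholders.

```python
import os
import subprocess

GPU_COUNT = 2  # assumption: the VM exposes two GPUs, indexed 0 and 1


def assign_round_robin(usernames, gpu_count=GPU_COUNT):
    """Spread users across GPUs in sorted order so the mapping is stable."""
    return {
        name: position % gpu_count
        for position, name in enumerate(sorted(usernames))
    }


def env_for_user(username, assignment, base_env=None):
    """Copy the environment and restrict CUDA to the user's assigned GPU."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(assignment[username])
    return env


def launch_for_user(username, assignment, command):
    """Start the application for one user with that user's GPU restriction.

    `command` is a placeholder list such as ["path/to/app.exe"].
    """
    return subprocess.Popen(command, env=env_for_user(username, assignment))
```

With five users this puts three on GPU 0 and two on GPU 1. If the second GPU still sits idle after this, the application is probably not selecting its GPU through CUDA, and an in-app or OS-level GPU preference setting is the place to look.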
Appreciate the advice! I'll look into the assignments.

But how do I implement that for multiple users? It seems to only work well with one.