How to Deploy a Docker Swarm Service on Multiple GPU Instances?

Asked By TechWhiz99 On

I'm currently running a service on a single GPU instance with Docker Swarm, and now I need to test deploying it across multiple GPUs. However, I'm running into issues: Docker Swarm either starts only one container and leaves the other GPUs idle, or it starts every replica on the same GPU. I'm not sure whether Swarm supports this configuration at all. I configured the NVIDIA runtime in Docker's daemon.json with:

`nvidia-ctk runtime configure --runtime=docker`

Then I restarted Docker with:

`systemctl restart docker`

Here's part of my service defined in the stack:

```
worker:
  image: image:tag
  deploy:
    replicas: 2
    resources:
      reservations:
        generic_resources:
          - discrete_resource_spec:
              kind: 'NVIDIA-GPU'
              value: 1
  environment:
    - NATS_URL=nats://nats:4222
  command: >
    bash -c "
    cd apps/inferno &&
    python3 -m process"
  networks:
    - net1
```

However, with this setup, the nvidia-smi output shows both containers using the same GPU. Has anyone run into a similar issue, or does anyone have an idea of what I might be missing?

1 Answer

Answered By CodeMaster23 On

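I haven't tried this myself, but going by the Docker Compose specification, you might want to look at `deploy.resources.reservations.devices`, where each device entry takes `capabilities` and `device_ids`, to pin a container to a specific GPU. Since all replicas of a service share one definition, another option is to create a separate service per GPU instead of replicas of the same service.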
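
One more thing that may be worth checking for the `generic_resources` route in the question: the Swarm scheduler can only hand out distinct GPUs if each node advertises them, which is done with a `node-generic-resources` list in `/etc/docker/daemon.json`. The sketch below builds that entry from a list of GPU UUIDs. `gpu_resources_json` is a hypothetical helper name, the `NVIDIA-GPU` kind is taken from the stack file above, and the `nvidia-smi` query in the comment assumes the NVIDIA driver tools are installed on the node. Depending on the NVIDIA Container Toolkit version, the runtime may also need to be configured to honor the GPU that Swarm assigns; that part is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads GPU UUIDs (one per line) on stdin and prints
# a "node-generic-resources" entry to merge into /etc/docker/daemon.json.
gpu_resources_json() {
  local kind="${1:-NVIDIA-GPU}"   # must match `kind` in the stack file
  local sep="" uuid
  printf '"node-generic-resources": ['
  while IFS= read -r uuid; do
    if [ -n "$uuid" ]; then       # skip blank lines
      printf '%s"%s=%s"' "$sep" "$kind" "$uuid"
      sep=", "
    fi
  done
  printf ']\n'
}

# On a GPU node (assumes nvidia-smi is available):
#   nvidia-smi --query-gpu=uuid --format=csv,noheader | gpu_resources_json
```

After merging the printed entry into daemon.json and restarting Docker on each node, `docker node inspect` should list the advertised resources, which is a quick way to confirm the scheduler can see more than one GPU.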

GadgetGuru77 -

You can also run it with command line flags if you're avoiding Docker Compose. For example, use:

`docker run --gpus "device=0" your-image-name`

`docker run --gpus "device=1" your-image-name`

That should allow you to specify which GPU each container uses.
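
If there are more than a couple of GPUs, the same idea can be scripted. This is only a rough sketch: `gpu_run_commands` is a made-up helper name, `your-image-name` is the placeholder from above, and it prints the commands rather than running them so they can be reviewed first. Note that containers started this way are plain `docker run` containers and are not scheduled by Swarm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one `docker run` command per GPU index.
# Usage: gpu_run_commands IMAGE GPU_COUNT
gpu_run_commands() {
  local image="$1" count="$2" i
  for ((i = 0; i < count; i++)); do
    # Each container is pinned to a single GPU by its index.
    printf 'docker run -d --gpus device=%d %s\n' "$i" "$image"
  done
}

# Dry run for two GPUs; pipe the output to `sh` to actually start the containers.
gpu_run_commands your-image-name 2
```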
