Hey everyone, I built a FastAPI application that functions as a RAG summarizer using the vLLM inference engine. When I run the app directly in the terminal with the `uvicorn` command, the outputs are exactly as I expect. However, once I create a Docker image and hit the same endpoint, the outputs change dramatically, even though I haven't modified any code. Both environments are set up on Ubuntu and the paths should be identical. Can anyone shed some light on why this could be happening? Here's a snippet of my Dockerfile for reference.
```dockerfile
FROM python:3.12-bullseye
RUN apt-get update && apt-get install -y \
        wkhtmltopdf \
        fontconfig \
        libfreetype6 \
        libx11-6 \
        libxext6 \
        libxrender1 \
        curl \
        ca-certificates \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
RUN update-ca-certificates
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --upgrade -r requirements.txt
COPY ./models/models--sentence-transformers--all-mpnet-base-v2/snapshots/12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 /app/sentence-transformers/all-mpnet-base-v2
COPY . /app/
EXPOSE 8010
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8010"]
```
1 Answer
Have you given the Docker container access to your GPU? If your app relies on CUDA and the container silently falls back to CPU (or picks up different CUDA libraries), that can definitely change your outputs. Either way, making the container mirror your local environment as closely as possible is the first thing to verify.
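One quick way to narrow this down is to dump an environment fingerprint in both places and diff them; differences in Python version, package versions, or CUDA-related environment variables often explain output drift. Here's a minimal stdlib sketch (the package names `vllm`, `torch`, and `sentence-transformers` and the env vars checked are just examples, adjust them to your stack):

```python
import json
import os
import platform
import sys
from importlib import metadata


def env_fingerprint(packages=("vllm", "torch", "sentence-transformers")):
    """Collect a comparable snapshot of the runtime environment."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # package missing in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        # Env vars that commonly differ between host and container
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "hf_home": os.environ.get("HF_HOME"),
    }


def diff_fingerprints(a, b):
    """Return the keys whose values differ between two fingerprints."""
    return {k: (a.get(k), b.get(k)) for k in set(a) | set(b) if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Run this once on the host and once inside the container, then
    # compare the two JSON dumps (or feed them to diff_fingerprints).
    print(json.dumps(env_fingerprint(), indent=2))
```

Run it in both environments and diff the output; anything that shows up in the diff (a different torch build, a missing `CUDA_VISIBLE_DEVICES`, a different cache path) is a candidate explanation.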
Yes, I've set up the NVIDIA runtime for Docker, so the container has access to CUDA and the necessary libraries. I'm puzzled as it seems everything should be compatible!