Hey everyone, I built a FastAPI application that functions as a RAG summarizer using the vLLM inference engine. When I run the app directly in the terminal with the `uvicorn` command, the outputs are exactly as I expect. However, once I create a Docker image and hit the same endpoint, the outputs change dramatically, even though I haven't modified any code. Both environments are set up on Ubuntu and the paths should be identical. Can anyone shed some light on why this could be happening? Here's a snippet of my Dockerfile for reference.
```dockerfile
FROM python:3.12-bullseye
RUN apt-get update && apt-get install -y \
    wkhtmltopdf \
    fontconfig \
    libfreetype6 \
    libx11-6 \
    libxext6 \
    libxrender1 \
    curl \
    ca-certificates \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
RUN update-ca-certificates
WORKDIR /app
COPY requirements.txt /app/
RUN pip install --upgrade -r requirements.txt
COPY ./models/models--sentence-transformers--all-mpnet-base-v2/snapshots/12e86a3c702fc3c50205a8db88f0ec7c0b6b94a0 /app/sentence-transformers/all-mpnet-base-v2
COPY . /app/
EXPOSE 8010
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8010"]
```
3 Answers
It sounds like there may be subtle differences between your local Ubuntu environment and the Docker image, since `python:3.12-bullseye` is based on Debian 11 rather than Ubuntu. Even if you've pinned package versions in your requirements file, the base image can differ in system libraries or configuration in ways that affect the output. Also check whether any environment variables set in your local shell are missing inside the container, as those could influence your app's behavior too.
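One way to make that comparison concrete is to dump the same "fingerprint" on the host and inside the container and diff the two files. The sketch below is only an illustration: the function names and the list of environment-variable prefixes are my own assumptions, not anything from your app, so adjust them to whatever your setup actually reads.

```python
import hashlib
import json
import os
import platform
import sys
from importlib import metadata

# Assumed prefixes for illustration only; extend with whatever your app reads.
DEFAULT_PREFIXES = ("CUDA", "NVIDIA", "VLLM", "HF_", "TRANSFORMERS", "TORCH")


def environment_fingerprint(env_prefixes=DEFAULT_PREFIXES):
    """Collect details that commonly differ between a host and a container."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            packages[name.lower()] = dist.version
    # str.startswith accepts a tuple, so env_prefixes must be a tuple of strings.
    env = {k: v for k, v in os.environ.items() if k.startswith(env_prefixes)}
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items())),
        "env": dict(sorted(env.items())),
    }


def fingerprint_digest(fingerprint):
    """Hash the fingerprint so two environments can be compared at a glance."""
    blob = json.dumps(fingerprint, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


if __name__ == "__main__":
    fp = environment_fingerprint()
    print(json.dumps(fp, indent=2))
    print("digest:", fingerprint_digest(fp), file=sys.stderr)
```

Run it once locally and once in the container (for example via `docker exec`), redirect each output to a file, and `diff` them. If the digests differ, the JSON shows which package or variable is responsible.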
When you mention the output changing, are you getting errors or just unexpected results? If it’s the latter, check that both runs use the same sampling parameters, such as temperature and top_k, since different values can produce entirely different outputs. Also make sure both runs use the same random seed; with sampling enabled, a different seed alone is enough to change the result.
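To rule this out, it can help to pin the sampling settings in one place and log them in both environments. Here is a minimal sketch assuming you construct `vllm.SamplingParams` yourself; the specific values, `build_sampling_params`, and `diff_settings` are hypothetical and only for illustration.

```python
# Hypothetical fixed settings for comparing the host run with the container run.
SAMPLING_KWARGS = {
    "temperature": 0.0,  # greedy decoding: always pick the most likely token
    "top_p": 1.0,        # no nucleus truncation
    "top_k": -1,         # -1 disables top-k filtering in vLLM
    "seed": 1234,        # only matters once temperature is raised above 0
    "max_tokens": 256,
}


def build_sampling_params(overrides=None):
    """Return a vllm.SamplingParams built from the pinned settings."""
    # Imported lazily so the settings dict can be inspected without vLLM installed.
    from vllm import SamplingParams

    kwargs = {**SAMPLING_KWARGS, **(overrides or {})}
    return SamplingParams(**kwargs)


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Logging the effective settings from each environment and passing them to `diff_settings` shows at a glance whether a default crept in somewhere. Keep in mind that identical settings narrow things down but may not give bit-identical output if the library versions underneath differ.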
Great point! I did specify parameters carefully, but I still see discrepancies in the output. I'll double-check those settings and ensure I'm consistently using the same seed too.
Have you given the Docker container access to your GPU? If your app relies on CUDA, that could affect your outputs. Even if it turns out not to be the cause, it's worth making the container mirror your local environment as closely as possible.
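A quick check from inside the container can confirm what the process actually sees. This is a rough stdlib-only sketch; `gpu_report` is a name I made up, and it assumes `nvidia-smi` is on the `PATH` when the NVIDIA runtime is active.

```python
import os
import shutil
import subprocess


def gpu_report():
    """Summarize what this process can see of the NVIDIA stack."""
    report = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "gpus": [],
    }
    if report["nvidia_smi_path"]:
        try:
            out = subprocess.run(
                [
                    report["nvidia_smi_path"],
                    "--query-gpu=name,driver_version",
                    "--format=csv,noheader",
                ],
                capture_output=True,
                text=True,
                timeout=10,
                check=True,
            ).stdout
            report["gpus"] = [line.strip() for line in out.splitlines() if line.strip()]
        except (subprocess.SubprocessError, OSError):
            # nvidia-smi exists but failed; leave the GPU list empty.
            pass
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If the `gpus` list is empty in the container but populated on the host, the container is not seeing the GPU, whatever the Docker runtime configuration says.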
Yes, I've set up the NVIDIA runtime for Docker, so the container has access to CUDA and the necessary libraries. I'm puzzled as it seems everything should be compatible!
That makes sense! I've noticed that differences between base images can change behavior even when everything else seems the same. It's also worth looking into the environment variables, since they can subtly change how your application behaves.