I'm building a grading automation tool for programming assignments, and I need to run each student's submission in a separate isolated virtual environment (venv). Each submission has a requirements.txt file that lists the necessary dependencies. I've tried using subprocess to create a venv for each submission, but it's super slow when dealing with over 200 of them. I've also attempted using multiprocessing, but it's still taking too long. Recently, I tried creating a base virtual environment and cloning it for each student, but it ends up including dependencies that are only supposed to belong to individual projects. I suspect there might be issues with shared paths or references being copied. Is there a faster and safer method to clone or set up a virtual environment for each student without impacting the original base environment?
5 Answers
Do you need to create all the virtual environments ahead of time? If not, you could automate creating the venvs as you go while grading, then delete them later if needed. You may also consider a bash script that navigates each submission folder, creates venvs, and installs the requirements. Also, could you clarify the structure of the submissions? Are they separated by directories for each student or are they all in one place?
Have you thought about running more subprocesses in parallel? You mentioned you tried it and it still took a while, but it could be worth revisiting or optimizing. Honestly, the venvs could almost be done by the time you finish asking the question!
I would recommend writing a shell script that leverages `uv`. It should only take a few minutes to create 200 virtual environments and install all the dependencies. It’s a solid way to automate the process without the overhead you're experiencing.
Thanks for the tip! Can you share more details or documentation on how to do that? I'm not very familiar with shell scripts.
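Not the original answerer's script, but here is a minimal sketch of the same idea in Python instead of shell. It assumes `uv` is installed and on your PATH, and that each student has their own folder containing a `requirements.txt` (the thread doesn't confirm that layout). The function names and the `.venv` location are made up for illustration. It only uses `uv venv` and `uv pip install --python ... -r ...`.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the interpreter path inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def build_commands(submission: Path) -> list[list[str]]:
    """Build the two uv commands for one submission folder.

    Assumption: the venv lives in <submission>/.venv and the
    dependencies are listed in <submission>/requirements.txt.
    """
    venv_dir = submission / ".venv"
    return [
        # Create a fresh, empty venv for this student only.
        ["uv", "venv", str(venv_dir)],
        # Install into that venv by pointing uv at its interpreter.
        ["uv", "pip", "install",
         "--python", str(venv_python(venv_dir)),
         "-r", str(submission / "requirements.txt")],
    ]


def setup_submission(submission: Path) -> tuple[str, bool]:
    """Run the commands for one submission; return (folder name, succeeded)."""
    for cmd in build_commands(submission):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return submission.name, False
    return submission.name, True


def setup_all(root: Path, workers: int = 8) -> list[tuple[str, bool]]:
    """Set up every subfolder of root that contains a requirements.txt."""
    submissions = sorted(
        p for p in root.iterdir() if (p / "requirements.txt").is_file()
    )
    # Threads are enough here: the heavy work happens in the uv subprocesses.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(setup_submission, submissions))
```

You would call something like `setup_all(Path("submissions"))` and check the returned list for any `False` entries. Each venv is created from scratch, so nothing leaks between students the way it did with the cloned base environment.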
You might want to consider using a template virtual environment that's common for all students. If most of the submissions are using similar libraries, having them agree on a defined venv could save time. You could scan all the requirements files to see similarities and build a master venv that everyone can use as a base. This way, if anyone needs something different, they can suggest changes to the approved environment.
So you suggest creating one venv with common dependencies and comparing requirements.txt files? Sounds interesting!
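If it helps, the "scan all the requirements files" step could be sketched roughly like this. It's a simplified illustration, not a full requirements parser: it only handles plain lines such as `numpy==1.26.0` or `requests[socks]>=2.31`, skips option lines like `-r other.txt`, and won't handle URL or VCS requirements correctly. The function names and the 80% threshold are assumptions for the example.

```python
import re
from collections import Counter

# A package name at the start of a requirement line.
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def package_names(requirements_text: str) -> set[str]:
    """Extract normalized package names from the text of one requirements.txt."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _NAME.match(line)
        if match:
            # Treat "Scikit_Learn" and "scikit-learn" as the same package.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def usage_counts(texts) -> Counter:
    """Count how many submissions list each package."""
    counts = Counter()
    for text in texts:
        counts.update(package_names(text))
    return counts


def common_packages(texts, min_share: float = 0.8) -> list[str]:
    """Return packages listed by at least min_share of the submissions."""
    texts = list(texts)
    counts = usage_counts(texts)
    threshold = min_share * len(texts)
    return sorted(name for name, n in counts.items() if n >= threshold)
```

You would feed it the contents of every student's file, for example `p.read_text()` for each `requirements.txt`. Note that this compares names only and ignores version pins, so two students pinning different versions of the same package would still count as "common".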
How flexible are you with project formats? If you can use something like [poetry](https://python-poetry.org), it could simplify your venv management considerably. Poetry creates and manages the venv for you, and it has a centralized cache that speeds up repeated installs.
I did use multiprocessing, but it still takes about 25 minutes. It’s part of a grading program that needs to be fast, so I'm looking for better solutions.