I'm consistently running into major issues with GPU Python packaging. Despite using tools like Docker, Poetry, uv, virtual environments, and lockfiles, I'm still often left compiling from source and fixing dependency conflicts when working with AI and native Python libraries. The core issue isn't just about basic Python packaging anymore; it revolves around compatibility across native and CUDA packages. Surprisingly, there aren't precompiled wheels available for many combinations you'd expect to work, which is frustrating. I often spend more time juggling versions of Python, PyTorch, CUDA, NumPy, and various dependencies trying to find the right setup that installs cleanly.
When it doesn't work, I'm back to compiling from source and hoping for the best. This has cost me a lot of time (and money) on expensive hardware like the H100. I realize not every environment can be supported forever, but there are common setups many of us hit regularly, like standard Colab runtimes or popular Ubuntu/CUDA/PyTorch stacks. Python tooling and environment management have improved, but there's still significant friction at the native/CUDA level: stray from the happy path and you're left with fragile builds and tangled version constraints. I just feel there's still a lot of room for improvement, especially in wheel coverage, to make these common setups more reliable. Just venting about the struggle!
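When I hit one of these mismatches, the first thing I do is dump what's actually installed before touching anything. A minimal sketch (the package names in the loop are just examples of what I typically check; swap in whatever your stack uses):

```python
# Quick sanity check of the stack before debugging build failures.
# Prints the interpreter version plus each listed package's installed version.
import sys
import importlib.metadata as md

def dist_version(name):
    """Return an installed distribution's version string, or None if absent."""
    try:
        return md.version(name)
    except md.PackageNotFoundError:
        return None

print("Python:", sys.version.split()[0])
for pkg in ("numpy", "torch", "nvidia-cuda-runtime-cu12"):
    print(f"{pkg:28s}:", dist_version(pkg) or "not installed")
```

Comparing this output against the compatibility matrix of whatever you're about to install has saved me a few wasted rebuilds.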
5 Answers
You're right that the tools have limitations, and the wheel format itself poses challenges, especially around compatibility metadata for things like BLAS libraries or the CUDA version a wheel needs. uv is helpful, but it can be tricky on Windows, where it can end up installing the CPU build of PyTorch instead of the one matching your CUDA setup. There's also a solid initiative at wheelnext.dev that might help, especially if PEP 817 (and its proposed split into new wheel variants) gains traction. That could really improve support across different systems!
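For what it's worth, you can steer uv away from the CPU fallback by pinning torch to a CUDA-specific index in `pyproject.toml`. Something like this, following uv's PyTorch integration docs (the index name and the cu121 URL here are just an example for a CUDA 12.1 setup, adjust to yours):

```toml
[tool.uv.sources]
torch = [{ index = "pytorch-cu121" }]
torchvision = [{ index = "pytorch-cu121" }]

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true
```

The `explicit = true` bit keeps uv from resolving unrelated packages against the PyTorch index.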
I'll look up the PEPs you mentioned, thanks for the info!
It seems like Astral is working on some improvements with their pyx project, which aims to tackle exactly the issues you're describing. Could be a step in the right direction! You can check out their work at astral.sh/pyx.
That's encouraging to hear! I really hope they continue development, especially after the OpenAI acquisition. It sounds promising!
Interesting concept for sure! Would love to see how it evolves.
I've noticed that CuPy installs fine via pip and has worked for me over the years. The issues often arise from packaging negligence or mixing incompatible versions. It's frustrating!
Honestly, the gap with CUDA and native wheels is where the real trouble lies. In my experience, it's best to set your CUDA version in stone first and then build everything else around it—torch with CUDA should be your anchor. I switch to NVIDIA's Docker base images instead of just a standard Python image; it makes a huge difference. Trust me, locking down your environment before tweaking anything saves a lot of headaches and money in the long run!
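To make that concrete, here's a rough sketch of the kind of Dockerfile I mean. The image tag, torch pin, and NumPy cap are illustrative, not a known-good combo, so check the current compatibility matrix before copying:

```dockerfile
# Sketch: start from NVIDIA's CUDA runtime image rather than a plain
# python image, so the system CUDA libraries match what you pin below.
FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# The anchor: a torch build targeting the same CUDA line as the base image.
RUN pip3 install --no-cache-dir \
        --index-url https://download.pytorch.org/whl/cu121 \
        torch==2.3.1

# Everything else gets chosen around that torch pin, not the other way round.
RUN pip3 install --no-cache-dir "numpy<2.0"
```

The point is the ordering: base image CUDA first, then torch, then the rest.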
From my experience, using a Docker container that's based on a CUDA image can really simplify the process. You just need to ensure the Python packages you're installing are compatible with the CUDA version you're working with. It can streamline things a lot!

PEP 817 might need some tweaks before it gets approved. Hopefully splitting it into parts will help move at least some of these issues in the right direction.