I'm working on a pet project and I'm curious about the performance of local models, particularly the larger open-weight OpenAI models. I'm currently running some small models from Ollama that fit nicely into my PC's VRAM. However, I'm contemplating upgrading my PC to handle larger models, which often require around 80GB of VRAM and can get pricey. Does anyone have experience with these models? Also, could you recommend gaming hardware that can run something like Cyberpunk 2077 at 30fps and also be effective for LLMs?
3 Answers
For the larger models, you'll need at least 128GB of RAM, and if you're aiming for the very biggest ones, you're looking at 400GB+. It's a hefty setup, but if it's in your budget, it could work wonders for your projects.
With a GPU that has 16GB of VRAM you can run some of the smaller models, and if you get a 5090 there are some decent 30B models, like Qwen or Nemotron, that you can work with (see the rough math below). Be ready to spend quite a bit more if you want anything larger. Just keep in mind that MCP servers can slow things down and eat into your context. Overflowing into system RAM is possible, but it runs painfully slowly, even on a high-end setup in my testing.
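If it helps to see where those numbers come from, here's a back-of-envelope sketch; the parameter counts, bit widths, and overhead factor are rough assumptions for illustration, not exact figures:

```python
# Back-of-envelope estimate of how much memory a model needs.
# All numbers below (parameter counts, bit widths, overhead factor)
# are illustrative assumptions, not vendor specs.

def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Memory to hold the weights, plus ~20% headroom for KV cache,
    activations, and runtime buffers (a rough guess)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 30B model at 4-bit quantization: ~18 GB, so it's tight on a 16 GB
# card but comfortable on a 24-32 GB card like a 4090/5090.
print(f"30B @ 4-bit:  ~{model_memory_gb(30, 4):.0f} GB")

# A ~120B model at 4-bit: ~72 GB, which is the ballpark behind the
# "around 80GB of VRAM" figure in the question.
print(f"120B @ 4-bit: ~{model_memory_gb(120, 4):.0f} GB")
```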
Exactly! That’s what I’ve been experiencing, too. If it doesn’t fit into the GPU memory, it ends up running on the CPU, which is just painfully slow.
RAM is crucial for running larger models: roughly speaking, more RAM means you can handle models with higher parameter counts. A powerful GPU helps with speed, but the sheer size of the larger models demands a lot of memory, which is what makes those setups costly. You can check out the free open-weight OpenAI models at gpt-oss.com and find a guide for running them on Ollama as well; there's a small Python sketch below.
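For the Ollama side, here's a minimal sketch of calling a locally pulled model from Python. It assumes you've installed the `ollama` package and already pulled a gpt-oss tag; the `gpt-oss:20b` name is just an example, so swap in whatever actually fits your hardware:

```python
# Minimal sketch using the ollama Python client (pip install ollama).
# Assumes the Ollama server is running locally and that the model tag
# below has already been pulled, e.g. with `ollama pull gpt-oss:20b`.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",  # example tag; use a model your VRAM can hold
    messages=[{"role": "user", "content": "Why does VRAM matter for local LLMs?"}],
)

# The reply text is under message -> content in the response.
print(response["message"]["content"])
```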
I totally get what you're saying! I tried running a model on my laptop, and it was painfully slow because it didn't fit in VRAM and fell back to the CPU. Definitely a tough spot!

Yikes, that’s a lot! I was hoping for something I could manage without remortgaging my house!