I'm working with a ton of high-resolution GPS data files that take up about 125GB compressed, but uncompressed they might exceed 1TB. I've developed a Python script that processes each file one at a time and generates a numpy array for each file; these arrays vary in length. After processing a file, I need to save the resulting array to disk, appending each new array to what's already there, so I end up with one ever-growing 1D array stored on my hard drive.
For instance, if I generate an array like [1,2,3,4] from the first file, and [5,6,7] from the second, I want my final file to look like [1,2,3,4,5,6,7]. The challenge here is that I anticipate the final array could contain over 10 billion floats, which means it would take up about 40GB of storage. However, I only have 16GB of RAM, so loading everything into memory to write it out at once isn't an option. I'm curious about how others handle this situation.
7 Answers
You might want to use the zipfile module to stream your compressed input data rather than extracting each file fully. Only load a small portion into memory at a time and append to your output file as needed. Check out the documentation [here](https://docs.python.org/3/library/zipfile.html) for more details!
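For what it's worth, here's a minimal sketch of that kind of streaming read, assuming the inputs live in a zip archive (the archive name, output path, and chunk size are placeholders):

```python
import zipfile

ARCHIVE = "gps_data.zip"   # hypothetical archive name
CHUNK_SIZE = 1024 * 1024   # read roughly 1 MiB at a time

with zipfile.ZipFile(ARCHIVE) as zf, open("decoded_output.bin", "ab") as out:
    for name in zf.namelist():
        # ZipFile.open() returns a file-like object, so each member is
        # decompressed lazily as you read from it.
        with zf.open(name) as member:
            while True:
                chunk = member.read(CHUNK_SIZE)
                if not chunk:
                    break
                out.write(chunk)  # or parse the chunk and write results instead
```

Nothing bigger than one chunk is ever held in memory, no matter how large the archive is.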
It might be easiest just to write each row out as you finish processing it. That way, you avoid hitting RAM limits altogether!
Appending to files is pretty straightforward! If you want to save disk space, consider streaming your output through a compression tool like gzip: just write the data to stdout and pipe it straight into gzip as you go. Also, think about breaking your large dataset into smaller files; it'll make processing easier later on. And don't forget to plan for crashes; you'll want a way to resume if something goes wrong!
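A rough sketch of the stdout-to-gzip idea, assuming the script just prints one value per line (the function name and sample data are made up):

```python
import sys
import numpy as np

def emit(arr: np.ndarray) -> None:
    # Print one value per line; compression happens downstream in gzip,
    # so nothing beyond the current array is held in memory here.
    for value in arr:
        sys.stdout.write(f"{value}\n")

if __name__ == "__main__":
    emit(np.array([1.0, 2.0, 3.0, 4.0]))
    emit(np.array([5.0, 6.0, 7.0]))
```

You'd then run it as something like `python process.py | gzip > output.txt.gz`, so the compressed output file grows as the script runs.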
You could append each new set of numbers directly to your file as you process them. This way, the RAM usage stays low since you're only holding onto the numbers from one file at a time. I'm not a Python guru, but I believe this approach should work just fine!
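Here's a minimal sketch of that in Python with numpy; the output path and the float32 dtype are assumptions, so adjust them to match your data:

```python
import numpy as np

OUTPUT = "combined.f32"  # hypothetical path for the growing 1D array

def append_array(arr: np.ndarray) -> None:
    # Open in append-binary mode so each call adds to the end of the file;
    # only the array currently being processed is held in memory.
    with open(OUTPUT, "ab") as f:
        arr.astype(np.float32).tofile(f)

append_array(np.array([1, 2, 3, 4]))
append_array(np.array([5, 6, 7]))
# The whole result can later be read back (or memory-mapped) as one array:
print(np.fromfile(OUTPUT, dtype=np.float32))  # [1. 2. 3. 4. 5. 6. 7.]
```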
Just be careful with batch sizes! It's often faster to write in chunks, sized appropriately for your disk's cache. If your hard drive has enough space, go ahead and write directly to the file without worrying too much about RAM.
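One way to batch the writes, sketched with an arbitrary buffer size you'd tune for your own disk (the class and parameter names are invented for illustration):

```python
import numpy as np

class BatchedWriter:
    """Buffer small arrays in memory and flush them to disk in larger chunks."""

    def __init__(self, path: str, flush_elements: int = 4_000_000):
        self.path = path                      # output file, opened in append mode
        self.flush_elements = flush_elements  # ~16 MB of float32 per flush
        self.pending = []
        self.count = 0

    def append(self, arr: np.ndarray) -> None:
        self.pending.append(np.asarray(arr, dtype=np.float32))
        self.count += arr.size
        if self.count >= self.flush_elements:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            with open(self.path, "ab") as f:
                np.concatenate(self.pending).tofile(f)
            self.pending.clear()
            self.count = 0
```

Call `flush()` once more at the end of the run so the tail of the buffer isn't lost.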
Do you strictly need to use Python? You might have more control over memory management with a different programming language if you're open to it.
Using SQLite could simplify things a lot. It provides an easy way to handle data without getting overwhelmed. You can write each new array as you go, which would keep your memory usage down!
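A small sketch of that with the standard-library sqlite3 module; the database file, table, and column names here are made up:

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("gps_values.db")  # hypothetical database file
conn.execute("CREATE TABLE IF NOT EXISTS samples (value REAL)")

def append_array(arr: np.ndarray) -> None:
    # executemany streams the rows in; only the current array is in memory.
    conn.executemany(
        "INSERT INTO samples (value) VALUES (?)",
        ((float(v),) for v in arr),
    )
    conn.commit()

append_array(np.array([1.0, 2.0, 3.0, 4.0]))
append_array(np.array([5.0, 6.0, 7.0]))
```

You also get indexing and partial queries later, though the storage overhead will likely be higher than a raw binary file.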
Think about how you'll retrieve the data. If you eventually need the entire array in memory, you might want to consider memory-mapped files. But really, it sounds like a database could serve you better here, depending on your needs.
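If you do eventually need array-style access to the whole result, here's a hedged sketch with np.memmap, assuming the data was written to disk as raw float32 (as in the append example above):

```python
import numpy as np

# Map the file without loading it; pages are pulled in from disk on demand.
data = np.memmap("combined.f32", dtype=np.float32, mode="r")

print(data.shape)         # one long 1D array covering the whole file
print(data[-1])           # random access without reading all 40 GB
print(data[:100].mean())  # slices only touch the pages they need
```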
Absolutely! You can use the Pandas library to append a DataFrame to a CSV file like this: `x.to_csv("filepath/filename.csv", mode='a')`. Just pass `header=False` on the appended writes, or you'll get header rows repeated in the middle of your data.
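A minimal sketch of that pattern (the path is a placeholder, and keep in mind CSV will be far bulkier than raw binary at this scale):

```python
import numpy as np
import pandas as pd

OUT = "combined.csv"  # placeholder path

def append_array(arr: np.ndarray, first: bool) -> None:
    # Write the header only on the first call, then append without it.
    pd.DataFrame({"value": arr}).to_csv(
        OUT, mode="w" if first else "a", header=first, index=False
    )

append_array(np.array([1, 2, 3, 4]), first=True)
append_array(np.array([5, 6, 7]), first=False)
```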