I've got around 2000 objects, each with a 110x6 grid of float data ranging from -1 to 2. New objects are added weekly. At the moment I'm storing the data in separate text files named objectName_A, objectName_B, and so on (one file per column, since the columns are generated separately), which makes for a messy setup. Additionally, I have a separate file for each object detailing its creation date, contributors, and basic stats.
I'm considering merging the _A, _B, etc. files into a single 110x6 CSV per object to simplify things, but since I usually query only specific columns from different objects, I'm worried that might hurt performance. At runtime I load data based on an input file, typically accessing about 40 objects and several of their data columns.
What should I do to store and structure my data more effectively? Are there database options I should consider?
6 Answers
Any relational database should serve your needs well. The schema is straightforward: one table for the float values and another for the metadata. If the dataset isn't huge, you could also load it all into memory once, which makes subsequent access very fast. That said, how new objects arrive matters; if they come in via file exchange, you'll need a reliable import step to keep the database in sync.
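To illustrate the load-everything-into-memory option: a small sketch that reads the per-column text files the question describes into a dict keyed by object name, after which lookups are just dictionary access. The directory, object name, and values here are dummies created for the example:

```python
import os
import tempfile

# Simulate the existing layout: one text file per column, per object
# (objectName_A .. objectName_F, one float per line).
tmp = tempfile.mkdtemp()
for suffix in "ABCDEF":
    with open(os.path.join(tmp, f"obj1_{suffix}"), "w") as f:
        f.write("\n".join("0.5" for _ in range(110)))

def load_object(dirname, name):
    """Read all six column files for one object into lists of floats."""
    cols = {}
    for suffix in "ABCDEF":
        with open(os.path.join(dirname, f"{name}_{suffix}")) as f:
            cols[suffix] = [float(line) for line in f]
    return cols

# Load once at startup; afterwards everything is in memory.
data = {"obj1": load_object(tmp, "obj1")}
print(len(data["obj1"]["C"]))  # 110
```

At ~2000 objects x 660 floats this is only a few megabytes, so holding it all in memory is cheap.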
Also, how important is calculation speed versus loading speed? If the dataset is small, it can be faster to recompute values on demand than to read them from files; modern processors are quick enough that recomputation often beats disk I/O unless the calculation itself is heavy.
If your grid size is fixed at 110x6, a SQLite database would be a simple and effective solution: a single file instead of thousands of text files, and you can add indexes for quicker access when querying specific columns.
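A minimal sketch of what that could look like with Python's built-in `sqlite3` module; the table layout and names are my own assumptions, not anything given in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent DB
conn.executescript("""
    CREATE TABLE grid_values (
        object_name TEXT    NOT NULL,
        col         INTEGER NOT NULL,  -- 0..5, one per former _A.._F file
        row         INTEGER NOT NULL,  -- 0..109
        value       REAL    NOT NULL,
        PRIMARY KEY (object_name, col, row)
    );
""")

# Load one object's worth of dummy data.
conn.executemany(
    "INSERT INTO grid_values VALUES (?, ?, ?, ?)",
    [("obj1", c, r, 0.5) for c in range(6) for r in range(110)],
)

# The composite primary key doubles as an index, so fetching one
# column of one object never scans the whole table.
vals = [v for (v,) in conn.execute(
    "SELECT value FROM grid_values "
    "WHERE object_name = ? AND col = ? ORDER BY row",
    ("obj1", 2),
)]
print(len(vals))  # 110
```

The metadata (creation date, contributors, stats) would go in a second table keyed on `object_name`, mirroring the separate files you have now.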
Consider using Postgres with a table that holds your descriptive fields plus six JSON columns, one per data column. Each JSON value can hold the 110 floats; there may be better-suited types (a native array column, for instance), but JSON would work well enough for your needs.
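To make the JSON-column idea concrete, here is a sketch of how one 110-float column might be packed for storage and unpacked after a query. The table name and layout are assumptions, and this only exercises the JSON round-trip in Python, not a live Postgres connection:

```python
import json

# Hypothetical Postgres DDL along the lines this answer suggests:
DDL = """
CREATE TABLE objects (
    name         TEXT PRIMARY KEY,
    created      DATE,
    contributors TEXT,
    col_a JSON, col_b JSON, col_c JSON,
    col_d JSON, col_e JSON, col_f JSON
);
"""

column = [round(i * 0.01, 2) for i in range(110)]  # dummy floats in [-1, 2]
payload = json.dumps(column)    # what you'd bind to an INSERT parameter
restored = json.loads(payload)  # what you'd get back from a SELECT
print(restored == column)  # True
```

One trade-off to note: with a whole column stored as one JSON value, you always read all 110 floats of that column at once, which actually suits the access pattern in the question (whole columns of specific objects).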
You might want to clarify how often you need to access this data. If access is infrequent, CSVs or an RDBMS with a basic schema will be reliable and fast enough for your use case. If you need rapid access, you'll want to design your indexes around the queries you run most often.
One efficient approach is to store each grid in a single flat array of floats, accessing cell (x, y) at index `y * width + x`; this is space-efficient and fast. If you're not editing the files by hand, you might also save the array in binary to cut down on file size. It could make sense to keep the metadata in the same file, but keeping it separate is fine if you prefer that.
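A small illustration of the flat-array layout, the `y * width + x` indexing, and a binary round-trip using Python's stdlib `array` module (all names here are invented for the example):

```python
from array import array

WIDTH, HEIGHT = 6, 110
grid = array("f", (0.0 for _ in range(WIDTH * HEIGHT)))  # one flat buffer

def set_cell(g, x, y, value):
    g[y * WIDTH + x] = value  # row-major: y picks the row, x the column

def get_cell(g, x, y):
    return g[y * WIDTH + x]

set_cell(grid, 3, 42, 1.5)
print(get_cell(grid, 3, 42))  # 1.5

# Binary round-trip: 4 bytes per float instead of text digits.
blob = grid.tobytes()
restored = array("f")
restored.frombytes(blob)
print(get_cell(restored, 3, 42))  # 1.5
```

At 4 bytes per value, one object's grid is 110 * 6 * 4 = 2640 bytes, typically well under the equivalent text representation.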
That sounds interesting! I like the idea of keeping things efficient with an array. I'll look into it!

That's a good point! Let me evaluate how often I need access versus calculation.