I'm working on a project where I need to use a CSV file as if it were a database, and I'm facing some challenges. I have a product catalog with fields like price, description, and category stored in this CSV. The flat structure doesn't lend itself to querying the way a relational database would. My mentor insisted on using a CSV, so I'm wondering if it's feasible to work this way or if I should switch to a proper database. It's a machine learning project with React on the front end and Python on the back end. Any advice would be greatly appreciated!
5 Answers
Most modern databases offer tools to easily import data from CSV files into tables. It might be worth exploring that route if you decide to make the switch.
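To give a concrete sense of what that import looks like, here's a minimal sketch in Python using the standard library's `sqlite3` and `csv` modules. The file name and columns (`name`, `price`, `category`) are hypothetical stand-ins for your catalog; a tiny sample file is written first just so the snippet runs on its own.

```python
import csv
import sqlite3

# Hypothetical sample catalog; in practice this would be your existing CSV file.
with open("products.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["name", "price", "category"])
    w.writerows([["pen", 1.5, "office"], ["desk", 120.0, "furniture"]])

conn = sqlite3.connect(":memory:")  # or a file path for a persistent database
conn.execute("CREATE TABLE products (name TEXT, price REAL, category TEXT)")

# Read the CSV and bulk-insert its rows into the table.
with open("products.csv", newline="") as f:
    rows = [(r["name"], float(r["price"]), r["category"]) for r in csv.DictReader(f)]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

# Ordinary SQL now works against the imported data:
cheap = conn.execute("SELECT name FROM products WHERE price < 10").fetchall()
# cheap == [("pen",)]
```

SQLite is a nice middle ground here: no server to run, but you still get real SQL.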
When you say the CSV file is 'large,' what are we talking about? Is it around 10GB, 100GB, or even larger? The approach you take can change drastically based on the size of the data you're handling.
You might want to look into libraries like Polars or Pandas. They load your CSV data into a structured table (a DataFrame), which you can filter and aggregate much like a database.
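For example, a Pandas DataFrame supports the equivalent of `WHERE` and `GROUP BY` directly. The column names and values below are made up for illustration; `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical catalog contents; in practice, pass the CSV file path to read_csv.
csv_text = """name,price,category
pen,1.5,office
desk,120.0,furniture
lamp,35.0,furniture
"""

df = pd.read_csv(io.StringIO(csv_text))

# Filter like a WHERE clause:
furniture = df[df["category"] == "furniture"]

# Aggregate like GROUP BY:
avg_price = df.groupby("category")["price"].mean()
```

Polars has a very similar API and tends to be faster on large files, so either is a reasonable starting point.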
Thanks for the quick answer! I'll definitely give that a try!
If you anticipate high transaction throughput, consider switching to a proper database. It could really simplify the entire process as your application evolves.
Just to clarify, since you're working on a machine learning project, is the CSV really being used as a database here? Are you looking to query it with SQL or something? It might help to know how you're intending to use the data. Normally, you would use CSV files for training models and ask those models questions afterward, which is different from standard database queries.
Yes, it's an educational project and I'm not directly querying the ML model. I need to query the input data that the ML model is using, which the user interface will display.

Currently, my test data is around 100MB, but for real use, it might be closer to 250GB.
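At 250GB the file won't fit in memory, so whatever library you pick, you'll want to process it in pieces rather than load it all at once. A minimal sketch with Pandas' `chunksize` option, using a tiny hypothetical file standing in for the real catalog:

```python
import csv
import pandas as pd

# Tiny stand-in file; the real catalog could be hundreds of GB.
with open("products.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["name", "price", "category"])
    w.writerows([["pen", 1.5, "office"],
                 ["desk", 120.0, "furniture"],
                 ["lamp", 35.0, "furniture"]])

# chunksize=... streams the CSV in fixed-size batches, keeping memory bounded.
total = 0.0
for chunk in pd.read_csv("products.csv", chunksize=2):
    total += chunk["price"].sum()
# total == 156.5
```

Polars' lazy API (`scan_csv`) takes this further by planning the whole query before reading, which is worth a look at that scale.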