I'm dealing with a scenario where generating a model or dataset takes about 10 minutes, and I want to optimize this process by saving and loading it instead of regenerating it each time. The basic idea I'm considering looks like this: check if the saved model exists, and if not, generate it, save it, and then use it. However, this approach seems cumbersome since it requires this logic everywhere—for models, datasets, intermediate values, and hyperparameters. I've thought about implementing a `SaveLoad` trait for objects that need saving, but I'm hoping there's a simpler, less repetitive solution. Ideally, I could even manage some random seed generation consistently across runs without wrapping basic data types each time. Any suggestions?
6 Answers
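You could derive a unique path for each dataset from its name and the parameters that produced it, and keep everything under one cache directory. Settle on a consistent naming scheme, and decide up front how to handle edge cases such as changed parameters or a half-written file. You'll thank yourself later!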
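A rough sketch of one way to build such paths, assuming the parameters fit in a JSON-serializable dict. The names `CACHE_DIR` and `cache_path` are made up for illustration:

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("cache")  # hypothetical location; pick whatever suits your project


def cache_path(name: str, params: dict, suffix: str = ".pkl") -> Path:
    """Build a deterministic file path from an artifact name and its parameters."""
    # sort_keys makes the JSON text independent of dict insertion order,
    # so the same parameters always hash to the same file name.
    canonical = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return CACHE_DIR / f"{name}-{digest}{suffix}"
```

Putting the seed and hyperparameters into `params` means a changed setting lands in a new file instead of silently reusing a stale one.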
Have you considered using the Singleton pattern? It can help ensure that you have a single instance of your model or dataset throughout your application, which might streamline your workflow.
Consider adding a management layer that can handle your in-memory objects and manage loading or generation asynchronously. This layer can help you fetch the necessary data from various sources, including online ones if needed.
Lazy loading could be a great fit here. It allows you to delay the loading of your models until they're actually needed, which can improve performance, especially if generating them takes a while.
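A minimal sketch of the idea using `functools.cached_property` from the standard library. The `Experiment` class and its `build_count` field are invented for the example, and the list of random numbers stands in for the real 10-minute job:

```python
import random
from functools import cached_property


class Experiment:
    """Holds an expensive artifact that is only built on first access."""

    def __init__(self, seed: int):
        self.seed = seed
        self.build_count = 0  # only here to show how often the work runs

    @cached_property
    def dataset(self) -> list:
        # Stand-in for the slow generation step; runs once per instance.
        self.build_count += 1
        rng = random.Random(self.seed)  # fixed seed gives repeatable data
        return [rng.random() for _ in range(5)]
```

Note that this only avoids repeated work inside one process. To skip the work across runs you would still need to write the result to disk.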
Another option is to split your workflow into two separate programs—one for generating your models or datasets and another for using them. This way, you manage the generation and loading distinctly.
What you really want is a read-through cache. It's a classic approach where your generator functions check if the data already exists and just return that instead of regenerating it. It prevents you from running unnecessary processes every time you need something.
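Roughly, the pattern could look like the sketch below. It assumes the value can be pickled, and `load_or_generate` is just a name picked for the example:

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_generate(path: Path, generate: Callable[[], T]) -> T:
    """Return the value stored at `path`, generating and saving it on a miss."""
    if path.exists():
        # Cache hit: read the stored value and skip generation entirely.
        with path.open("rb") as f:
            return pickle.load(f)

    # Cache miss: do the slow work once.
    value = generate()
    path.parent.mkdir(parents=True, exist_ok=True)

    # Write to a temporary file first, then rename, so an interrupted
    # run does not leave a truncated cache file behind.
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(value, f)
    tmp.replace(path)
    return value
```

Callers then write something like `model = load_or_generate(Path("cache/model.pkl"), train_model)` and never see the exists/save/load logic. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.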

Exactly! I don't get why some are making it more complex. Just implement caching in your slow getter and generator functions, and it will work like a charm.
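To keep the check/generate/save logic out of every call site, one option is a decorator along these lines. This is only a sketch: `disk_cached` and `make_dataset` are hypothetical names, and keying on `repr` assumes the arguments are simple values (numbers, strings, tuples) whose `repr` is stable between runs:

```python
import functools
import hashlib
import pickle
import random
from pathlib import Path


def disk_cached(cache_dir):
    """Decorator that caches a function's return value on disk, keyed by its arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a key from the function name and arguments.
            key_src = repr((func.__name__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(key_src.encode("utf-8")).hexdigest()[:16]
            path = Path(cache_dir) / f"{func.__name__}-{digest}.pkl"
            if path.exists():
                with path.open("rb") as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


@disk_cached("cache")
def make_dataset(n: int, seed: int) -> list:
    # Passing the seed as an argument makes it part of the cache key.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

Passing the seed explicitly also covers the reproducibility question: the same seed returns the cached result, and a new seed generates a fresh one. If you are in Python and would rather not maintain this yourself, `joblib.Memory` provides a similar decorator.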