I'm building a semantic cache in Go with around 10,000 entries. Each entry consists of a string key, a 1.25 KB binary vector, and a cached value, resulting in an approximate total of 15 MB. It works perfectly in memory, but every time I restart the application, I lose the cache.
I need a solution that ensures fast startup (under 100ms for 10,000 entries), crash recovery, and minimal complexity. I'm considering several options:
1. Using encoding/gob to dump the cache to a file on shutdown and load it back on startup. It seems straightforward and has no dependencies, but is it fast enough for 15 MB?
2. Memory-mapping the file (mmap) so the OS persists dirty pages for me. It's fast, but does that approach seem excessive for this size?
3. Using FlatBuffers or protobuf for faster decoding than gob while providing a stable wire format. However, is it worth the added dependency?
4. SQLite, which feels like overkill for a simple caching solution.
Has anyone dealt with gob at this scale? Is mmap worth the added complexity, or am I overcomplicating things for a 15 MB file? Are there any other patterns I should consider?
2 Answers
I'd skip options 2 and 4 for your situation. If you're debating whether to add a dependency, I'd suggest going with FlatBuffers. Adding one well-maintained library isn't a big deal, and you'll benefit from faster deserialization. That said, I haven't benchmarked gob myself, so I'm curious how it performs at this scale too.
At around 15 MB, you could just read and write the cache directly to disk and let the OS handle the paging. It should work pretty well without any advanced caching techniques.
That’s what I’m leaning towards as well! For something that small, simply using gob.Encode to save on shutdown and gob.Decode to read on startup seems to be all you need. The OS will take care of the paging.

That's a good point! I prefer to stick with zero-dependency solutions, and the data structure is pretty simple: a []uint64 bit vector, a string key, and a string value. I'm leaning towards gob for now, but I'm unsure how it will scale if the cache ends up with millions of entries.