I built an offline semantic search engine in JavaScript and wanted to share it with you guys for feedback. My goal was a solution for small projects that need semantic search without the complications of a database or external services. The library runs fully offline, using local embeddings and fuzzy matching to deliver results. It's best suited to small and medium datasets that fit in memory, which makes it a good match for product search, autocomplete, and offline-first apps. I'm not aiming to replace Elasticsearch, just to provide a lightweight alternative. I would love your thoughts: Does this approach resonate with you? Are there any clear pitfalls I should watch out for? What features would you expect to see? You can check out the repository on GitHub or the npm package for more details.
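For anyone wondering what "local embeddings plus fuzzy matching" can look like in practice, here is a minimal sketch. It is not the library's actual API or scoring formula: the function names, the `semanticWeight` option, and the choice of Levenshtein distance for the fuzzy part are all assumptions made for illustration, and the document vectors are assumed to be precomputed by whatever embedding model you use.

```javascript
// Hypothetical sketch: blend embedding similarity with a fuzzy text score.
// None of these names come from the library discussed in this thread.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Levenshtein edit distance using a single rolling row.
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0]; // distance for (i - 1, j - 1)
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // distance for (i - 1, j)
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

// Fuzzy score in [0, 1]: 1 means identical strings (case-insensitive).
function fuzzyScore(query, text) {
  const q = query.toLowerCase();
  const t = text.toLowerCase();
  const longest = Math.max(q.length, t.length);
  return longest === 0 ? 1 : 1 - levenshtein(q, t) / longest;
}

// Rank in-memory documents of the assumed shape { id, text, vector }.
function search(queryText, queryVector, docs, { semanticWeight = 0.7, topK = 5 } = {}) {
  return docs
    .map((doc) => ({
      id: doc.id,
      score:
        semanticWeight * cosineSimilarity(queryVector, doc.vector) +
        (1 - semanticWeight) * fuzzyScore(queryText, doc.text),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Comparing the query against the whole document text is a crude fuzzy measure; a real implementation would more likely match per token or per field. The linear scan over every document is also why this style of search only makes sense for datasets that fit in memory.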
4 Answers
This project is definitely interesting! I'm saving it for later to see if I can integrate semantic search into my own apps that struggle with object metadata normalization.
This looks super useful. Thanks for sharing!
Thanks! Glad you find it helpful!
Your approach definitely fits a niche where using a full database or hosted service feels excessive. I’ve had experiences where maintaining a MySQL server just to handle a simple search was overkill. For potential issues, keep an eye on the model size and memory usage, especially when it comes to mobile and browser implementations. Also, I'm curious about language support since the default model seems to cater primarily to English. Do you have any recommendations for multilingual models that work well with Transformers.js? It’d be helpful to include a note about language support in your README.
Thanks for your feedback! You're spot on regarding the model size and cold-start times in browsers. We currently use `Xenova/all-MiniLM-L6-v2` (around 90 MB) to balance quality and size. For multilingual support, I recommend using `Xenova/paraphrase-multilingual-MiniLM-L12-v2` (around 120 MB); it's much better for languages like French and Spanish. I’ll add a note about language support in the README.
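In case it helps others reading along, switching between those two models with Transformers.js could look roughly like this. It's a sketch, assuming the `@xenova/transformers` package is installed; `pickModel` and `createEmbedder` are hypothetical helper names for illustration, not part of the library from this thread.

```javascript
// Model IDs are the two mentioned above; the helpers are hypothetical.
const MODELS = {
  english: 'Xenova/all-MiniLM-L6-v2',
  multilingual: 'Xenova/paraphrase-multilingual-MiniLM-L12-v2',
};

// Use the smaller English model only when the content is known to be English.
function pickModel(languageCode) {
  return languageCode === 'en' ? MODELS.english : MODELS.multilingual;
}

// Loads the model on first call (downloaded once, then cached by Transformers.js)
// and returns a function that maps text to a normalized embedding.
async function createEmbedder(languageCode) {
  // Dynamic import so the package is only loaded when an embedder is needed.
  const { pipeline } = await import('@xenova/transformers');
  const extractor = await pipeline('feature-extraction', pickModel(languageCode));

  return async function embed(text) {
    // Mean pooling plus normalization yields one unit-length vector per input.
    const output = await extractor(text, { pooling: 'mean', normalize: true });
    return Array.from(output.data);
  };
}
```

Usage would be along the lines of `const embed = await createEmbedder('fr'); const vector = await embed('chaussures rouges');`. Note that documents and queries need to be embedded with the same model, so switching models means re-embedding the index.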
You might want to consider using web workers for the heavy lifting since this approach could be demanding on the main thread.
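To make the web worker idea concrete, here is a rough sketch of a worker that does the ranking off the main thread. The file name `search-worker.js`, the message shape (`requestId`, `queryVector`, `docs`, `topK`), and the `render` callback are all assumptions for illustration. It also assumes vectors are already normalized, so a dot product stands in for cosine similarity.

```javascript
// search-worker.js (hypothetical file name)

// Dot product; equals cosine similarity when both vectors are unit length.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Pure ranking function, kept separate so it can be tested outside a worker.
function handleSearch({ queryVector, docs, topK = 5 }) {
  return docs
    .map((doc) => ({ id: doc.id, score: dot(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Only attach the message handler when running inside a worker context.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  self.onmessage = (event) => {
    const { requestId, ...payload } = event.data;
    // Echo requestId so the page can match responses to requests.
    self.postMessage({ requestId, results: handleSearch(payload) });
  };
}

// Main-thread side, in a separate file (render is a hypothetical UI callback):
//
//   const worker = new Worker(new URL('./search-worker.js', import.meta.url));
//   worker.onmessage = (event) => render(event.data.results);
//   worker.postMessage({ requestId: 1, queryVector, docs, topK: 5 });
```

In practice the embedding step is likely the most expensive part, so it would make sense to run the model inside the worker as well and send only the query text across. Sending the whole `docs` array on every message also copies it each time, so loading the index into the worker once is probably the better design.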

Just click the three dots and hit save to keep it handy.