I'm developing a feature on my website that matches users with relevant ideas based on their background profiles, and I'm weighing two approaches. The first is to embed the long texts directly, truncating at the embedding model's token limit; the risk is that important information past the cutoff is lost. The second is to summarize the texts with a language model before embedding them (and to summarize the user profiles the same way, so the two sides stay comparable). Summarizing would likely yield better relevance, but it costs more because of the extra API calls for summarization. Which method is more effective and more common in applications like this?
2 Answers
You might want to consider embedding the text in chunks, as the other answer suggests. Chunking allows for more fine-grained retrieval: summarizing is great for reducing token count, but it can drop specific details that turn out to be crucial for matching. Just weigh the extra API cost against the potential accuracy gain; a sketch of the summarize-then-embed flow follows if you want to compare.
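A minimal sketch of summarize-then-embed, assuming the OpenAI Python client (v1+) and the model names `gpt-4o-mini` and `text-embedding-3-small`; these are placeholder choices, so swap in whatever provider and models you actually use.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(text: str) -> str:
    """One extra API call per document: compress it before embedding."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute your own
        messages=[
            {"role": "system",
             "content": "Summarize in under 150 words, keeping skills, "
                        "interests, and domain-specific details."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

def embed(text: str) -> list[float]:
    """Embed the short summary instead of the raw long text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",  # assumed model
        input=text,
    )
    return response.data[0].embedding

profile_text = "Long user profile text goes here..."
profile_vector = embed(summarize(profile_text))
```

The cost trade-off is visible right in the code: every document pays for one chat-completion call on its full length before the (cheap) embedding call.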
It sounds like you're building something close to a retrieval-augmented generation (RAG) setup. A good strategy is to break your texts into chunks of roughly 200-400 tokens, keeping some overlap between consecutive chunks so that context spanning a chunk boundary isn't lost. You'll store more embeddings, but matching accuracy should improve noticeably; see the sketch below.
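Here's a minimal sketch of overlapping chunking. It assumes the `tiktoken` tokenizer with the `cl100k_base` encoding (the one used by OpenAI's embedding models); any tokenizer that matches your embedding model works the same way.

```python
import tiktoken

def chunk_text(text: str,
               chunk_tokens: int = 300,
               overlap_tokens: int = 50) -> list[str]:
    """Split text into ~chunk_tokens pieces, each overlapping the previous
    one by overlap_tokens, so context at boundaries appears in both chunks."""
    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding
    tokens = enc.encode(text)
    step = chunk_tokens - overlap_tokens  # advance less than a full chunk
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_tokens]
        chunks.append(enc.decode(window))
        if start + chunk_tokens >= len(tokens):
            break  # last window already reaches the end of the text
    return chunks
```

Embed each chunk separately; at query time, compare the user-profile embedding against every chunk and let a document's best-scoring chunk stand in for the document as a whole.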
