I'm looking to set up a system for searching through less than 500 PDF files, primarily journal articles. The goal is to have a search capability that can handle queries like, "What articles discuss frog habitats in North America?" Adding new PDFs will be rare—maybe just a few each month—and I expect only a couple of queries per day. I'm considering the S3 vector store for this purpose, but I've heard that Kendra can be quite expensive even for low usage. Is using a vector store a good option for my needs? I'm open to suggestions for an effective method.
1 Answer
I'm not sure if the S3 vector store supports natural language retrieval. I would suggest using Textract to extract text from your PDFs and then leverage Bedrock's capabilities to query that data. The only costs would come from the initial text conversion and then very minimal charges based on the tokens used during queries.
Would using vector stores be an option for simple keyword searches? For instance, if a user searches for 'eardrum', could it return all PDFs containing that word? They're willing to adjust the functionality to keep costs reasonable.