I'm a developer with a strong background, and I'm planning to create an AI chatbot for customer service. Previously, I used a LAMP stack on an EC2 instance, but I have a new approach in mind and would love your input.
Here's how I envision it working:
- When a user sends a WhatsApp message, it hits a webhook backend that decides whether to route it to the "New Customer Sales Agent" or the "Existing Customer Support Agent."
- The agent will use retrieval-augmented generation (RAG) to pull answers from an FAQ, matching the user's question to FAQ entries by cosine similarity between vector embeddings.
- After retrieving the relevant entries, the large language model (LLM) will use them to compose a response to the user.
- The agent should also be able to create custom orders and send a "Pay Now" button.
- On the admin side, someone will manage and update the Q&A for RAG. Each Q&A piece will be treated separately when generating embeddings.
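The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not a production design: the toy 3-dimensional vectors stand in for real embeddings from the embedding model, and `FAQ_INDEX`, `top_k`, and the placeholder answers are hypothetical names and data invented for the example. Each Q&A entry carries its own embedding, as in the plan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty/zero vectors
    return dot / (norm_a * norm_b)

def top_k(query_vec, faq_index, k=2):
    """Return (score, entry) pairs for the k most similar FAQ entries, best first."""
    scored = [(cosine_similarity(query_vec, e["embedding"]), e) for e in faq_index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical index: one embedding per Q&A entry (toy vectors, placeholder answers).
FAQ_INDEX = [
    {"id": "shipping", "answer": "placeholder shipping answer", "embedding": [0.9, 0.1, 0.0]},
    {"id": "refunds", "answer": "placeholder refunds answer", "embedding": [0.1, 0.9, 0.0]},
    {"id": "hours", "answer": "placeholder hours answer", "embedding": [0.0, 0.1, 0.9]},
]

# In the real flow this vector would come from embedding the user's message.
query = [0.8, 0.2, 0.1]
results = top_k(query, FAQ_INDEX, k=2)
context = "\n".join(entry["answer"] for _, entry in results)  # passed to the LLM prompt
```

For an FAQ of a few hundred or a few thousand entries, a brute-force scan like this may be fast enough that a dedicated vector database is optional; that is worth measuring before committing to one.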
For the setup, I'm considering:
- WhatsApp official business API
- PHP webhook to trigger the bot
- Access to Claude and ChatGPT through API keys
- OpenAI small embedding model for RAG
- OpenAI Whisper API for transcribing audio messages
- OpenAI's multi-modal image recognition for interpreting images
- PHP backend hosted on an EC2 instance.
Now, I'm stuck on which vector database to use for RAG. Also, I know using a PHP backend instead of Python may seem odd, but I'm more proficient in PHP. I'm also worried that running Python scripts on Lambda behind API Gateway might lead to timeouts during the LLM API calls and RAG processing. Any suggestions for my infrastructure and tech stack?
3 Answers
Have you looked into using S3 for storing vectors? It's worth considering since it’s scalable and you might find it easier for vector storage when implementing RAG.
It sounds like you’re setting up a pretty neat project! Since your architecture involves slow downstream APIs, I'd recommend going with API Gateway -> Lambda -> SQS -> ECS (Fargate). This setup allows you to scale efficiently, and Fargate will manage containers without needing to provision EC2 instances manually.
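The idea behind that pipeline is to acknowledge the webhook immediately and do the slow work (embedding lookup, LLM call, sending the reply) in a separate worker. Below is a rough single-process sketch of the pattern, with Python's `queue.Queue` standing in for SQS and a thread standing in for the Fargate worker. `KNOWN_CUSTOMERS`, `handle_webhook`, `route`, and the agent names are hypothetical, and the routing rule is only a placeholder for whatever customer lookup the real backend uses.

```python
import queue
import threading

message_queue = queue.Queue()  # stand-in for an SQS queue
replies = []                   # stand-in for messages sent back to users

KNOWN_CUSTOMERS = {"+15550001111"}  # hypothetical existing-customer lookup

def handle_webhook(payload):
    """Webhook entry point: enqueue the message and return right away."""
    message_queue.put(payload)
    return {"status": 200}

def route(payload):
    """Pick an agent; placeholder rule based on whether the sender is known."""
    return "support_agent" if payload["from"] in KNOWN_CUSTOMERS else "sales_agent"

def worker():
    """Drain the queue; the slow RAG and LLM calls would happen here."""
    while True:
        payload = message_queue.get()
        if payload is None:  # sentinel: stop the worker
            message_queue.task_done()
            break
        replies.append((payload["from"], route(payload)))
        message_queue.task_done()

def run_demo():
    """Push two sample messages through the queue and return what was processed."""
    replies.clear()
    thread = threading.Thread(target=worker)
    thread.start()
    handle_webhook({"from": "+15550001111", "text": "Where is my order?"})
    handle_webhook({"from": "+15550002222", "text": "Do you ship abroad?"})
    message_queue.put(None)
    thread.join()
    return list(replies)
```

Calling `run_demo()` returns one `(sender, agent)` pair per message. Because the webhook handler only enqueues, its response time does not depend on how long the downstream APIs take, which is the timeout concern raised in the question. The same split works with a PHP webhook and a PHP worker.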
Thanks for the tip! It’s my first time hearing about Fargate and Auto Scaling groups (ASG), so I’ll definitely check them out!
You might want to check out Amazon Lex. It's a managed chatbot service from AWS, but keep in mind that incorporating RAG could require some manual setup. Alternatively, Amazon Bedrock could be a great fit: its Knowledge Bases feature provides managed RAG, with Titan embedding models for generating vectors and OpenSearch Serverless as one of the vector store options.
Bedrock sounds promising! I’ll definitely investigate that. Thanks for the insight!

S3 sounds interesting! I haven’t used it for vector storage before, so I’d love to explore that.