I'm currently the sole security person at my company and we launched an AI assistant a few months back. It's built on top of a foundation model and operates within our main product, dealing with real user queries. My background is mostly in traditional app security and cloud security, so I know how to test web apps and secure our AWS environment. However, I'm realizing that securing a large language model (LLM) product presents entirely different challenges, and I'm not confident that our current security measures are adequate.
We have implemented some basic controls like input validation, output filtering, rate limiting, and a content policy in the system prompt. At launch, I felt these measures were sufficient, but I'm starting to doubt that. I'm especially concerned about potential prompt injection that doesn't resemble traditional attacks, and the risks of model behavior changing over time without our awareness. With traditional web apps I know how to implement continuous security monitoring, but I'm unsure what that looks like for an AI product.
Is there any established practice for ongoing AI security monitoring in production environments? What are effective strategies for ensuring continuous security after the model goes live?
2 Answers
You're right to be worried! The unique attack surface of LLMs really changes the security game. From what we've implemented, consider treating your system prompt like a secret. Logging every input and output, with a hash of each, can help you investigate any issues later. Additionally, separating retrieval from generation allows you to audit what the model actually sees, which is crucial. The specifics will vary significantly depending on whether you rely on an external API or host your own inference.
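To make the hashed logging idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function name make_audit_record, the field names, and the system_prompt_version label are all hypothetical, and it assumes you keep the raw text in a separate access-controlled store while the hashes go into the general log.

```python
import hashlib
import json
import time


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_audit_record(user_id, prompt, retrieved_docs, response,
                      system_prompt_version):
    """Build one audit record for a single model call.

    All field names here are hypothetical. Retrieved context is hashed
    separately from the user prompt so you can later audit exactly what
    the model saw.
    """
    return {
        "ts": time.time(),
        "user_id": user_id,
        "system_prompt_version": system_prompt_version,
        "prompt_sha256": sha256_hex(prompt),
        "retrieved_sha256": [sha256_hex(doc) for doc in retrieved_docs],
        "response_sha256": sha256_hex(response),
        "prompt_chars": len(prompt),
        "response_chars": len(response),
    }


record = make_audit_record(
    user_id="u-123",
    prompt="How do I reset my password?",
    retrieved_docs=["Password reset article text"],
    response="Go to Settings and choose Reset password.",
    system_prompt_version="v7",
)
# One JSON object per line is easy to ship to most log pipelines.
log_line = json.dumps(record, sort_keys=True)
print(log_line)
```

The hashes let you confirm later that a stored transcript was not altered and let you correlate identical prompts across users without putting raw text in every log sink.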
Unfortunately, there’s not a one-size-fits-all solution yet. The field of AI security is still evolving. One effective approach is to treat your AI model as a primary security boundary. You can implement structured telemetry for all inputs and outputs, use automated behavioral anomaly detection, and establish prompt testing and rapid mitigation pipelines. It’s a mix of app security, fraud detection, and observability. Just keep in mind that you'll need constant feedback to handle drift; your security rules will decay over time otherwise.
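As a rough illustration of what automated behavioral anomaly detection can mean in practice, here is a minimal sketch in Python. It is one simple technique (a rolling z-score over a single per-response metric), not a description of any particular product. The class name RollingZScoreDetector, the window size, the threshold, and the choice of response length as the metric are all assumptions; the same pattern would apply to refusal rate, tool-call counts, or output-filter hits.

```python
from collections import deque
from statistics import mean, pstdev


class RollingZScoreDetector:
    """Flag values that sit far from a rolling baseline of recent values."""

    def __init__(self, window=200, threshold=3.0, min_samples=30):
        self.values = deque(maxlen=window)  # most recent observations
        self.threshold = threshold          # z-score cutoff (assumed)
        self.min_samples = min_samples      # warm-up before flagging

    def observe(self, value: float) -> bool:
        """Score a new value against the baseline, then record it."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # Skip scoring when the baseline has no variance at all.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        # Always record the value, so a sustained shift becomes the
        # new baseline once it fills the window.
        self.values.append(value)
        return is_anomaly


detector = RollingZScoreDetector()
# Simulated response lengths (characters) for ordinary traffic.
for i in range(40):
    detector.observe(100 if i % 2 == 0 else 110)

typical_flag = detector.observe(105)   # close to the baseline
spike_flag = detector.observe(5000)    # e.g. a response dumping a long document
print(typical_flag, spike_flag)
```

A flag here is only a signal to review the matching logged transcript, not a verdict. Because every value is recorded, a slow drift will gradually move the baseline, which is why a periodic fixed test set of prompts is a useful complement.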
That's a helpful way to look at it! But I'm still a bit confused about what behavioral anomaly detection really looks like with actual LLM traffic. Can you break it down?

What kind of feedback loops do you use to process this information?