RAG to Production: Build a Production-Ready AI App with Agenta
Building an MVP of an AI application is easy. Bringing it to production, collaborating with your team on quality, and iterating quickly are hard.
This tutorial series walks you through the full lifecycle of building a production AI application. You will set up observability to understand what your application does. You will add prompt management so that everyone on the team (engineers and domain experts) can iterate on prompts. You will create test cases from real production data and use them to evaluate changes before they ship. And you will set up online evaluation to monitor quality over time.
We use a RAG Q&A chatbot as our running example, but the patterns apply to any LLM application. Think of this series as a walkthrough of all the features in Agenta, applied to a real project.
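To make that concrete, here is a minimal sketch of the shape the application code takes by the end of Part 1: an LLM call instrumented for tracing with the Agenta SDK. Treat this as an illustration rather than the finished tutorial code; the calls shown (`ag.init()`, the `@ag.instrument()` decorator) reflect the Agenta Python SDK at the time of writing, and the model name and prompt are placeholders.

```python
# Minimal sketch: tracing an LLM call with the Agenta SDK.
# Assumes `pip install agenta openai` and AGENTA_API_KEY / OPENAI_API_KEY
# set in the environment. Exact SDK calls may differ in your version.
import agenta as ag
from openai import OpenAI

ag.init()  # reads AGENTA_API_KEY (and AGENTA_HOST for self-hosted setups)

client = OpenAI()

@ag.instrument()  # records inputs, outputs, and latency as a trace span
def answer(question: str, context: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(answer("What is Agenta?", "Agenta is an LLM developer platform."))
```

Part 1 builds this up step by step, then adds prompt management on top so the prompt no longer lives in the code.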
The series
- Tracing and prompt management. Set up tracing, add prompt management, and link traces to prompt versions.
- SME workflow (coming soon). Enable domain experts (SMEs) to annotate traces, create test cases, and iterate on prompts in the playground.
- Evaluate prompts (coming soon). Set up LLM-as-judge evaluators with rubrics and annotation-driven guidelines. Evaluate prompt variants from the UI.
- End-to-end evaluation with the SDK (coming soon). Evaluate your entire system (retrieval + prompt + generation) programmatically.
- Online evaluation and guardrails (coming soon). Monitor production quality continuously and use evaluators as runtime guardrails.
What you will need
- A working RAG application. We use the RAG Q&A Chatbot example, but you can adapt the steps to your own app.
- An Agenta Cloud account (free tier works).
- Python 3.11+.
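Before diving in, you can sanity-check your setup with a short script. This is a hedged sketch: the package name (`agenta`) and the `AGENTA_API_KEY` environment variable match the Agenta docs at the time of writing.

```python
# Smoke test for the prerequisites above.
# Install the SDK first: pip install -U agenta
# Get an API key from your Agenta Cloud account settings.
import os

import agenta as ag

# The SDK reads its configuration from the environment; export the key
# in your shell (or a .env file) instead of hardcoding it in real code.
assert os.environ.get("AGENTA_API_KEY"), "Set AGENTA_API_KEY first"

ag.init()  # connect the SDK to Agenta Cloud using the environment config
print("Agenta SDK initialized. Ready for Part 1.")
```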
Start with Part 1: Tracing and prompt management.