What is Agenta?
Agenta is an open-source LLMOps platform that helps developers and product teams build reliable LLM applications.
Agenta covers the entire LLM development lifecycle: prompt management, evaluation, and observability.
Features
Prompt Engineering and Management
Teams often struggle with prompt collaboration. They keep prompts in code, where subject matter experts cannot edit them, or they pass them around in spreadsheets, which is an unreliable process.
Agenta organizes prompts for your team. Subject matter experts can collaborate with developers without touching the codebase. Developers can version prompts and deploy them to production.
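To make the idea of versioning and deploying prompts concrete, here is a minimal in-memory sketch. It is not Agenta's API: `PromptRegistry`, `commit`, `deploy`, and `fetch` are hypothetical names chosen for illustration. The point it shows is that every edit creates a new immutable version, and an environment such as "production" is only a pointer to one of those versions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptVersion:
    version: int
    template: str


@dataclass
class PromptRegistry:
    """Hypothetical in-memory prompt store (illustration only, not Agenta's API)."""

    versions: list[PromptVersion] = field(default_factory=list)
    deployments: dict[str, int] = field(default_factory=dict)

    def commit(self, template: str) -> PromptVersion:
        # Each commit appends a new immutable version; nothing is overwritten.
        prompt_version = PromptVersion(version=len(self.versions) + 1, template=template)
        self.versions.append(prompt_version)
        return prompt_version

    def deploy(self, environment: str, version: int) -> None:
        # An environment is just a pointer to a version number.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"unknown version {version}")
        self.deployments[environment] = version

    def fetch(self, environment: str) -> PromptVersion:
        # Application code asks for whatever is deployed, not a fixed version.
        return self.versions[self.deployments[environment] - 1]


registry = PromptRegistry()
registry.commit("Summarize the text: {text}")
registry.commit("Summarize the text in one sentence: {text}")
registry.deploy("production", 1)
registry.deploy("staging", 2)

prompt = registry.fetch("production").template.format(text="Agenta is an LLMOps platform.")
```

Because the application only calls `fetch("production")`, promoting version 2 is a single `deploy` call and requires no code change, which is what lets non-developers edit prompts safely.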
The playground lets teams experiment with prompts. You can load traces and test sets. You can test prompts side by side.
Evaluation
Most teams lack a systematic evaluation process. They change prompts based on vibes rather than measurements. A change can improve quality on some inputs while breaking others, and because LLMs are stochastic, such regressions are easy to miss without systematic testing.
Agenta provides one place to evaluate systematically. Teams can run three types of evaluation:
- Automatic evaluation with LLMs at scale before production
- Human annotation where subject matter experts review results and provide feedback to AI engineers
- Online evaluation for applications already in production
Both subject matter experts and engineers can run evaluations from the UI.
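The automatic evaluation described above can be sketched in a few lines. This is a simplified illustration, not Agenta's evaluation API: the test set, the `exact_match` evaluator, and the two stand-in "variants" (plain functions instead of real LLM calls) are all made up for the example. It shows why per-case comparison matters: two variants can have the same overall score while one breaks a case the other handled.

```python
from typing import Callable

# Hypothetical test set: inputs paired with expected answers.
TEST_SET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "10 / 2", "expected": "5"},
    {"input": "3 * 3", "expected": "9"},
]


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible evaluator; real setups often use LLM judges or custom code.
    return output.strip() == expected.strip()


def evaluate(app: Callable[[str], str]) -> list[bool]:
    # Run the app on every test case and record pass/fail per case.
    return [exact_match(app(case["input"]), case["expected"]) for case in TEST_SET]


def regressions(baseline: list[bool], candidate: list[bool]) -> list[int]:
    # Indices of cases that passed with the baseline but fail with the candidate.
    return [i for i, (b, c) in enumerate(zip(baseline, candidate)) if b and not c]


# Stand-ins for two prompt variants; a real app would call an LLM here.
def variant_a(question: str) -> str:
    return {"2 + 2": "4", "10 / 2": "5.0", "3 * 3": "9"}[question]


def variant_b(question: str) -> str:
    return {"2 + 2": "4", "10 / 2": "5", "3 * 3": "nine"}[question]


results_a = evaluate(variant_a)  # [True, False, True]
results_b = evaluate(variant_b)  # [True, True, False]
broken = regressions(results_a, results_b)  # [2]
```

Both variants pass two of three cases, so an aggregate score alone would not reveal that `variant_b` fixed one case and broke another.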
Observability
Agenta helps you understand what happens in production. You can capture user feedback through an API, either explicit (such as a thumbs up) or implicit. You can debug agents and applications with tracing to see what happens inside them.
Track costs over time. Find edge cases where things fail. Add those cases to your test sets. Have subject matter experts annotate the results.
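The loop from failing production cases back into test sets can be sketched as follows. This is an assumption-laden illustration, not Agenta's data model: the `Trace` and `Example` shapes, the feedback encoding (1 positive, 0 negative, `None` for no feedback), and the sample data are all invented for the example.

```python
from typing import Optional, TypedDict


class Trace(TypedDict):
    trace_id: str
    input: str
    output: str
    feedback: Optional[int]  # assumed encoding: 1 = positive, 0 = negative, None = none


class Example(TypedDict):
    input: str
    expected: Optional[str]  # None until a subject matter expert annotates it
    source_trace: str


def failing_traces_to_examples(traces: list[Trace], test_set: list[Example]) -> list[Example]:
    """Return new examples for negatively rated traces whose input is not yet covered."""
    seen = {example["input"] for example in test_set}
    new_examples: list[Example] = []
    for trace in traces:
        if trace["feedback"] == 0 and trace["input"] not in seen:
            seen.add(trace["input"])
            new_examples.append(
                {"input": trace["input"], "expected": None, "source_trace": trace["trace_id"]}
            )
    return new_examples


# Made-up traces for illustration.
traces: list[Trace] = [
    {"trace_id": "t1", "input": "refund policy?", "output": "...", "feedback": 1},
    {"trace_id": "t2", "input": "cancel my plan", "output": "...", "feedback": 0},
    {"trace_id": "t3", "input": "cancel my plan", "output": "...", "feedback": 0},
    {"trace_id": "t4", "input": "export data", "output": "...", "feedback": None},
]
test_set: list[Example] = [
    {"input": "refund policy?", "expected": "30 days", "source_trace": "seed"}
]

test_set.extend(failing_traces_to_examples(traces, test_set))
```

New examples are added with `expected` left empty, so a subject matter expert can fill in the correct answer before the case is used in evaluation.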
Why Agenta?
Enable collaboration between developers and product teams
Agenta empowers non-developers to iterate on the configuration of any custom LLM application, evaluate it, annotate it, A/B test it, and deploy it, all within the user interface.
Open-source and MIT licensed
Agenta is open-source and MIT licensed, so you can self-host it, modify it, and use it in commercial projects, provided you keep the copyright and license notice as the MIT license requires.
Works with any LLM app workflow
Agenta enables prompt engineering and evaluation on any LLM app architecture, such as chains of prompts, RAG, or LLM agents. It is compatible with frameworks such as LangChain and LlamaIndex, and works with any model provider, such as OpenAI, Cohere, or local models.