Launch Week #2 Day 2: Online Evaluation

Nov 11, 2025 · 5 minutes

Has your customer support agent ever mentioned a competitor?

That would be awful.

And if you build AI apps for healthcare or finance, it could be even worse. One wrong answer can change someone’s life.

Pre-production evals can’t catch everything. You will never know exactly how users will interact with your app. You will never see all the edge cases of real-world data until you’re in production.

Today we're launching Online Evaluation. It closes the LLMOps feedback loop by evaluating your app on the traffic it actually serves.

With Online Evaluation, every AI request gets evaluated in real time. You can spot hallucinations, off-brand answers, and subtle regressions as they happen.

With Online Evaluation, you get:

  • A live view of the reliability of your system in production

  • Confidence that your outputs meet your quality standards

  • A way to capture real-world edge cases and add them to your test sets to improve your AI system

  • Clear insight into how prompt changes behave in production

How it works:

  • Pick an evaluator: use an LLM-as-a-judge or write your own evaluator logic in Python (see the sketch after this list)

  • Provide filters to target the right spans and set the sampling rate to control your cost and coverage

  • Measure changes against live traffic, spot regressions, and add them to your test set
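To make the custom-evaluator option concrete, here is a minimal sketch of what evaluator logic can look like. The function signature, the returned field names, and the COMPETITORS blocklist are illustrative assumptions, not Agenta's exact evaluator interface; check the docs for the real hook your evaluator needs to implement.

```python
# Minimal sketch of custom evaluator logic (hypothetical signature; the exact
# interface Agenta expects may differ -- see the docs).
# Idea: receive a captured span's inputs and output, return a score.

COMPETITORS = {"acme ai", "examplecorp"}  # hypothetical brand blocklist

def competitor_mention_evaluator(inputs: dict, output: str) -> dict:
    """Return 1.0 if the answer stays on-brand, 0.0 if it names a competitor."""
    text = output.lower()
    mentioned = [name for name in COMPETITORS if name in text]
    return {
        "score": 0.0 if mentioned else 1.0,
        "reason": f"mentioned: {mentioned}" if mentioned else "no competitor mentioned",
    }

if __name__ == "__main__":
    print(competitor_mention_evaluator(
        {"question": "How do you compare to Acme AI?"},
        "Acme AI is great, you should try them!",
    ))
```

Filters and the sampling rate are configured alongside the evaluator, so a check like this can run on a targeted subset of production spans (say, 10% of them) instead of every single request.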

You can set up Online Evaluation in a couple of minutes: add one line to instrument your application, then configure your evaluators with a few clicks.
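As a rough illustration of that setup step, the snippet below assumes the agenta Python SDK exposes an ag.init() call and an @ag.instrument() decorator for capturing spans. Treat these names as assumptions about the SDK surface and follow the docs for the exact one-liner for your stack.

```python
# Rough sketch of instrumenting an app so its LLM calls show up as spans that
# Online Evaluation can score. ag.init() and @ag.instrument() are assumed SDK
# calls -- check the docs for the exact instrumentation line.
import agenta as ag

ag.init()  # assumed to read the API key / host from the environment

@ag.instrument()  # assumed to capture this call as a span, with inputs and output
def answer_support_ticket(question: str) -> str:
    # ... call your LLM provider here ...
    return "Thanks for reaching out! Here's how to reset your password: ..."

print(answer_support_ticket("How do I reset my password?"))
```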

Check out our docs to get started.

Ship reliable agents faster with Agenta

Build reliable LLM apps together with integrated prompt management, evaluation, and observability.
