Online Evaluation

Online Evaluation allows you to continuously evaluate your LLM application on production traces. Instead of running evaluations on static test sets, you can automatically sample and evaluate real traffic.

Setting Up Online Evaluation

Navigate to the Evaluations page and select the Online Evaluation tab.

Click Create Online Evaluation to set up a new evaluation:

Name: Provide a descriptive name for your online evaluation.
Evaluator: Select one or more evaluators to run on sampled traces.
Trace Filter: Define which traces to evaluate (e.g., filter by application, variant, or metadata).
Sampling Rate: Set the percentage of matching traces to evaluate (e.g., 10% evaluates roughly 1 in 10 matching traces).
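
To make the interaction between Trace Filter and Sampling Rate concrete, here is a minimal Python sketch of the idea. It is not the platform's implementation: the trace dictionaries, the `matches_filter`, `is_sampled`, and `select_traces` helpers, and the hash-based sampling strategy are all assumptions made for illustration.

```python
import hashlib


def matches_filter(trace: dict, trace_filter: dict) -> bool:
    """Return True when every key in the filter equals the trace's value."""
    return all(trace.get(key) == value for key, value in trace_filter.items())


def is_sampled(trace_id: str, sampling_rate_percent: float) -> bool:
    """Deterministically decide whether a trace falls inside the sample.

    Hashing the trace ID (an assumed strategy) means the same trace always
    gets the same decision, unlike a call to random.random().
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < sampling_rate_percent * 100


def select_traces(traces: list, trace_filter: dict, sampling_rate_percent: float) -> list:
    """Apply the filter first, then sample from the matching traces."""
    return [
        trace
        for trace in traces
        if matches_filter(trace, trace_filter)
        and is_sampled(trace["id"], sampling_rate_percent)
    ]


# Hypothetical traces: only "chatbot" traces match the filter below.
example_traces = [
    {"id": f"trace-{i}", "application": "chatbot"} for i in range(1000)
] + [{"id": "trace-search", "application": "search"}]

selected = select_traces(example_traces, {"application": "chatbot"}, 10)
print(f"{len(selected)} of {len(example_traces)} traces selected for evaluation")
```

The sampling rate applies only to traces that pass the filter, so a narrow filter combined with a low rate can yield very few evaluated traces.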

Click Create to start the online evaluation. Results will appear as traces are processed.

Viewing Results

Online evaluation results are displayed in real time as traces are sampled and evaluated. You can:

View aggregated metrics over time
Drill down into individual evaluated traces
Compare performance across different time periods
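
As an illustration of what "aggregated metrics over time" can mean, the sketch below groups evaluation results into hourly buckets and averages the scores. The result format (an ISO 8601 `timestamp` and a numeric `score`) and the `aggregate_by_hour` helper are hypothetical; the platform's actual result schema may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_by_hour(results: list) -> dict:
    """Group evaluation results by hour and return the mean score per hour."""
    buckets = defaultdict(list)
    for result in results:
        timestamp = datetime.fromisoformat(result["timestamp"])
        # Truncate to the start of the hour so results share a bucket key.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(result["score"])
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}


# Hypothetical evaluated traces (assumed schema).
example_results = [
    {"timestamp": "2024-01-01T10:05:00+00:00", "score": 1.0},
    {"timestamp": "2024-01-01T10:40:00+00:00", "score": 0.5},
    {"timestamp": "2024-01-01T11:15:00+00:00", "score": 0.25},
]

for hour, average in aggregate_by_hour(example_results).items():
    print(hour.isoformat(), average)
```

Comparing two time periods then amounts to comparing the bucket averages that fall inside each period.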

Use Cases

Production Monitoring: Continuously monitor model quality on real user inputs
Regression Detection: Catch degradation in evaluator scores early, before it affects many users
A/B Testing: Compare different model versions on live traffic
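
The regression detection and A/B testing use cases can be sketched with a few lines of Python. This is only an illustration under assumed names: the `variant` and `score` fields, the `summarize_by_variant` and `has_regressed` helpers, and the 0.05 tolerance are hypothetical and not part of the product.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_variant(results: list) -> dict:
    """Return the number of evaluated traces and the mean score per variant."""
    scores = defaultdict(list)
    for result in results:
        scores[result["variant"]].append(result["score"])
    return {
        variant: {"count": len(values), "mean_score": mean(values)}
        for variant, values in scores.items()
    }


def has_regressed(baseline_mean: float, current_mean: float, tolerance: float = 0.05) -> bool:
    """Flag a regression when the current mean drops below baseline minus tolerance."""
    return current_mean < baseline_mean - tolerance


# Hypothetical online evaluation results for two variants on live traffic.
example_results = [
    {"variant": "v1", "score": 1.0},
    {"variant": "v1", "score": 0.5},
    {"variant": "v2", "score": 0.5},
    {"variant": "v2", "score": 0.5},
]

summary = summarize_by_variant(example_results)
print(summary)
print(has_regressed(summary["v1"]["mean_score"], summary["v2"]["mean_score"]))
```

A fixed tolerance is the simplest possible check. With small sample counts per variant, a difference in means may be noise, so the `count` field is worth inspecting before drawing conclusions.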
