Online Evaluation
Online Evaluation lets you continuously evaluate your LLM application on production traces. Instead of running evaluations only on static test sets, you can automatically sample real traffic and evaluate it.
Setting Up Online Evaluation
Navigate to the Evaluations page and select the Online Evaluation tab.
Click Create Online Evaluation to set up a new evaluation:
- Name: Provide a descriptive name for your online evaluation.
- Evaluator: Select one or more evaluators to run on sampled traces.
- Trace Filter: Define which traces to evaluate (e.g., filter by application, variant, or metadata).
- Sampling Rate: Set the percentage of matching traces to evaluate (e.g., 10% to sample roughly 1 in 10 matching traces).
Click Create to start the online evaluation. Results will appear as traces are processed.
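To make the interaction between the Trace Filter and the Sampling Rate concrete, here is a minimal Python sketch. It is an illustration only, not the platform's implementation: the `Trace` class, `matches_filter`, and `is_sampled` are hypothetical names, and hashing the trace ID is just one possible way to pick a stable subset of traffic.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Trace:
    # Hypothetical trace shape, used only for this illustration.
    trace_id: str
    application: str
    variant: str
    metadata: dict = field(default_factory=dict)


def matches_filter(trace, application=None, variant=None, metadata=None):
    """Return True if the trace satisfies every filter condition that is set."""
    if application is not None and trace.application != application:
        return False
    if variant is not None and trace.variant != variant:
        return False
    for key, value in (metadata or {}).items():
        if trace.metadata.get(key) != value:
            return False
    return True


def is_sampled(trace_id, sampling_rate_percent):
    """Deterministically select about sampling_rate_percent % of trace IDs."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    # Map the hash to one of 10,000 buckets (0.01% granularity).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < sampling_rate_percent * 100


# Only traces that pass the filter are candidates for sampling.
trace = Trace("trace-001", application="support-bot", variant="v2",
              metadata={"region": "eu"})
if matches_filter(trace, application="support-bot") and is_sampled(trace.trace_id, 10):
    print("evaluate", trace.trace_id)
```

The sampling rate applies to matching traces only, so a narrow filter combined with a low rate can yield very few evaluated traces.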
Viewing Results
Online evaluation results are displayed in real time as traces are sampled and evaluated. You can:
- View aggregated metrics over time
- Drill down into individual evaluated traces
- Compare performance across different time periods
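The aggregation and period comparison described above can be sketched in Python as follows. This assumes you have evaluation results available as a list of `(timestamp, score)` pairs; that format, and the function names `hourly_means` and `period_delta`, are assumptions made for illustration, not part of the product.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def hourly_means(results):
    """Average evaluator scores per hour, returned in chronological order."""
    buckets = defaultdict(list)
    for timestamp, score in results:
        # Truncate each timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}


def period_delta(results, split_at):
    """Mean score at or after split_at minus mean score before it.

    Returns None if either period has no results.
    """
    before = [score for timestamp, score in results if timestamp < split_at]
    after = [score for timestamp, score in results if timestamp >= split_at]
    if not before or not after:
        return None
    return mean(after) - mean(before)


# Made-up scores, purely to show the call shape.
results = [
    (datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc), 1.0),
    (datetime(2024, 1, 1, 9, 40, tzinfo=timezone.utc), 0.5),
    (datetime(2024, 1, 1, 10, 15, tzinfo=timezone.utc), 0.25),
]
print(hourly_means(results))
print(period_delta(results, datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)))
```

A negative delta means scores dropped after the split point, which is the kind of signal that regression detection relies on.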
Use Cases
- Production Monitoring: Continuously monitor model quality on real user inputs
- Regression Detection: Catch degradation in evaluator scores early
- A/B Testing: Compare different model versions on live traffic