New JSON Evaluator
We have added a new evaluator that matches JSON fields, along with the ability to use any column in the test set as the ground truth, rather than only the correct_answer column.
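For illustration, a field-match check along these lines can be sketched in a few lines of Python. The function name, signature, and column name below are hypothetical, not Agenta's actual evaluator API:

```python
import json

def json_field_match(app_output: str, ground_truth: str, field: str) -> bool:
    # Parse the app's raw output; invalid or non-object JSON counts as a mismatch.
    try:
        data = json.loads(app_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    # Compare the extracted field against the chosen ground-truth column.
    return str(data.get(field)) == ground_truth

# Grade a row whose ground truth lives in a custom "expected_city" column.
row = {"expected_city": "Paris"}
output = '{"city": "Paris", "country": "France"}'
print(json_field_match(output, row["expected_city"], field="city"))  # True
```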
We have improved error handling in evaluation: the evaluation view now pinpoints the exact source of each error.
Improvements:
Up until now, we required users to use our OpenAI API key when using cloud. Starting now, you can use your own API key for any new application you create.
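One common way to supply your own key is through the standard OPENAI_API_KEY environment variable; treating that as the pick-up point here is an assumption for illustration, so check the docs for where your deployment expects it:

```python
import os

# Assumption: the deployment reads the standard OPENAI_API_KEY variable.
# Set it to your own key before creating a new application.
os.environ["OPENAI_API_KEY"] = "sk-..."  # your key, no longer ours
```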
Faster human evaluation workflow
We have updated the human evaluation table view to add annotation and correct answer columns.
We've spent the past month re-engineering our evaluation workflow. Here's what's new:
Running Evaluations
Evaluation Reports
This change requires you to pull the latest version of the agenta platform if you're using the self-serve version.
We've added a feature that lets you compare an LLM app's latency, cost, and token usage, all in one place.
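To make the cost figure concrete, here is a minimal sketch of how a per-run cost can be derived from token counts; the per-token prices are placeholder assumptions for the example, not live provider pricing:

```python
# Placeholder prices in USD per 1K tokens (assumptions, not real rates).
PROMPT_PRICE_PER_1K = 0.0015
COMPLETION_PRICE_PER_1K = 0.002

def run_cost(prompt_tokens: int, completion_tokens: int) -> float:
    # Cost = tokens consumed, scaled by the per-1K-token price.
    return (prompt_tokens / 1000) * PROMPT_PRICE_PER_1K \
        + (completion_tokens / 1000) * COMPLETION_PRICE_PER_1K

print(f"${run_cost(420, 180):.6f}")  # $0.000990
```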