Minor fixes
- Addressed an issue when invoking an LLM app with a missing LLM provider key
- Updated the LLM providers in the backend enum
- Fixed bug in variant environment deployment
- Fixed the sorting in evaluation tables
- Used the server timezone instead of UTC
We've introduced prompt versioning, allowing you to track changes made by the team and revert to previous versions. To view a configuration's change history, click the history icon in the playground to access all previous versions.
We have added a new evaluator that matches JSON fields, and added the possibility to use columns in the test set other than correct_answer as the ground truth.
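To make the idea concrete, here is a minimal sketch of what a JSON-field match boils down to. The function name, arguments, and test set columns below are hypothetical illustrations, not agenta's actual evaluator code.

```python
import json


def json_field_match(app_output: str, ground_truth: str, field: str) -> bool:
    """Return True when `field` in the app's JSON output equals the ground truth.

    A simplified illustration only; names and structure are assumptions.
    """
    try:
        parsed = json.loads(app_output)
    except json.JSONDecodeError:
        return False
    return str(parsed.get(field)) == ground_truth


# Hypothetical test set row using a column other than `correct_answer` as ground truth.
row = {"country": "France", "expected_capital": "Paris"}
output = '{"capital": "Paris", "population": 67000000}'  # hypothetical app output
print(json_field_match(output, row["expected_capital"], field="capital"))  # True
```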
We have improved error handling in evaluations: the evaluation view now returns more information about the exact source of an error.
Improvements:
Up until now, we required cloud users to use our OpenAI API key. Starting now, you can use your own API key for any new application you create.
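As an illustration, assuming your key is supplied through the standard OPENAI_API_KEY environment variable (where you store it for agenta may differ), a call made with your own key looks roughly like this:

```python
import os

from openai import OpenAI

# Minimal sketch: the key comes from your own environment rather than a shared one.
# The environment variable name and model are assumptions for illustration.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```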
Faster human evaluation workflow
We have updated the human evaluation table view to add annotation and correct answer columns.
We've spent the past month re-engineering our evaluation workflow. Here's what's new:
- Running Evaluations
- Evaluation Reports
This change requires you to pull the latest version of the agenta platform if you're using the self-serve version.
We've added a feature that lets you track an LLM app's latency, cost, and token usage, all in one place.
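As a rough sketch of what gets tracked, here is one way latency and cost can be derived from a single call's token usage. The per-1K-token prices are placeholders rather than real pricing, and the call itself is stubbed.

```python
import time

# Placeholder per-1K-token prices; real prices depend on the model and provider.
PROMPT_PRICE_PER_1K = 0.0005
COMPLETION_PRICE_PER_1K = 0.0015


def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one call from its token usage."""
    return (
        prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
        + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    )


# Time a (stubbed) LLM call and report latency, tokens, and estimated cost.
start = time.perf_counter()
answer, prompt_tokens, completion_tokens = "Paris", 120, 12  # stand-in for a real call
latency = time.perf_counter() - start

print(
    {
        "latency_s": round(latency, 3),
        "total_tokens": prompt_tokens + completion_tokens,
        "cost_usd": round(estimate_cost(prompt_tokens, completion_tokens), 6),
    }
)
```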