
Minor improvements

Toggle variants in comparison view

You can now toggle the visibility of variants in the comparison view, making it possible to compare many variants side by side at the same time.

Improvements

  • You can now add a datapoint from the playground to the test set even if there is a column mismatch

Bug fixes

  • Resolved an issue with the "Start Evaluation" button in the Testset view
  • Fixed a bug in the CLI that prevented variants from being served

New evaluators

We have added two new evaluators: a string-matching evaluator and a Levenshtein distance evaluator.
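
As a rough illustration of the scoring ideas behind these evaluators, here is a minimal sketch; the function names and the exact-match semantics are assumptions for illustration, not Agenta's actual evaluator code.

```python
# Illustrative only: the two scoring ideas behind the new evaluators.

def exact_match(output: str, expected: str) -> bool:
    """String matching: does the output equal the expected answer?"""
    return output.strip() == expected.strip()

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),   # substitution
            ))
        prev = curr
    return prev[len(b)]

print(exact_match("Paris", "paris"))     # False (case-sensitive in this sketch)
print(levenshtein("kitten", "sitting"))  # 3
```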

Improvements

  • Updated documentation for human evaluation
  • Improved the Human evaluation card view
  • Added a dialog to indicate when a test set is being saved in the UI

Bug fixes

  • Fixed issue with viewing the full output value during evaluation
  • Enhanced error boundary logic to unblock user interface
  • Improved logic to save and retrieve multiple LLM provider keys
  • Fixed Modal instances to support dark mode

Minor improvements

  • Improved the logic of the Webhook evaluator
  • Made the inputs in the Human evaluation view non-editable
  • Added an option to save a test set in the Single model evaluation view
  • Included the evaluator name in the "Configure your evaluator" modal

Bug fixes

  • Fixed column resize in comparison view
  • Resolved a bug affecting the evaluation output in the CSV file
  • Corrected the path to the Evaluators view when navigating from Evaluations

Highlight output differences when comparing evaluations

We have improved the evaluation comparison view to highlight the differences between each output and the expected output.
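
To illustrate the idea, the snippet below uses Python's difflib to locate where an output diverges from the expected answer; this is only a sketch of the concept, not the code used in the comparison view.

```python
import difflib

# Illustrative only: report the spans where the output differs from the expected answer.
expected = "The capital of France is Paris."
output = "The capital of France is Lyon."

matcher = difflib.SequenceMatcher(None, expected, output)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(f"{tag}: expected {expected[i1:i2]!r} -> got {output[j1:j2]!r}")
# e.g. replace: expected 'Paris' -> got 'Lyon'
```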

Improvements

  • Improved the error messages when invoking LLM applications
  • Improved "Add new evaluation" modal
  • Updated the side menu to display Configure evaluator and Run evaluator under the Evaluations section
  • Changed cursor to pointer when hovering over evaluation results

Deployment Versioning and RBAC

Deployment versioning

You now have access to a history of prompts deployed to our three environments. This feature allows you to roll back to previous versions if needed.

Role-Based Access Control

You can now invite team members and assign them fine-grained roles in agenta.

Improvements

  • We now prevent the deletion of test sets that are used in evaluations

Bug fixes

  • Fixed a bug in custom code evaluation aggregation. Until now, the aggregated results for custom code evaluations were not computed correctly.

  • Fixed bug with Evaluation results not being exported correctly

  • Updated the documentation for using GPT vision to explain images

  • Improved Frontend test for Evaluations


Minor fixes

  • Addressed issue when invoking LLM app with missing LLM provider key
  • Updated LLM providers in Backend enum
  • Fixed bug in variant environment deployment
  • Fixed the sorting in evaluation tables
  • Used the server timezone instead of UTC

Prompt Versioning

We've introduced prompt versioning, allowing you to track changes made by the team and revert to previous versions. To view the change history of a configuration, click on the icon in the playground to access all previous versions.


New JSON Evaluator

We have added a new evaluator that matches JSON fields, and added the possibility to use columns in the test set other than the correct_answer column as the ground truth.
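
As a rough sketch of what field matching means here, the function below compares one field of a JSON output against a ground-truth value taken from an arbitrary test-set column; the names are illustrative assumptions, not the evaluator's actual implementation.

```python
import json

def json_field_match(output: str, field: str, ground_truth: str) -> bool:
    """Check one field of a JSON output against a ground-truth value."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False  # output is not valid JSON
    return str(data.get(field)) == ground_truth

row = {"country": "France", "expected_capital": "Paris"}   # test-set row
llm_output = '{"country": "France", "capital": "Paris"}'   # app output
print(json_field_match(llm_output, "capital", row["expected_capital"]))  # True
```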


Improved error handling in evaluation

We have improved error handling in evaluation to return more information about the exact source of the error in the evaluation view.

Improvements

  • Added the option in A/B testing human evaluation to mark both variants as correct
  • Improved loading state in Human Evaluation