title: Code Evaluations

Code Evaluations in Freeplay

In addition to Human and Model Graded evaluations, Freeplay supports code-driven evaluations that run client-side: functions you write and execute in your own code path, with the results then logged back to Freeplay.

These evaluations are particularly useful for criteria that can be expressed as logical checks, such as JSON schema validation or category assertions on a single answer, and for pairwise comparisons to an expected output using methods like embedding or string distance. Code evals can be added to both:

  • Individual Sessions
  • Test Runs executed with our SDK or API, which can include comparisons to ground truth data
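As an illustration (this is not Freeplay SDK code; the function names and result shape are hypothetical), the two kinds of checks described above might look like this in Python, using only the standard library:

```python
# Sketch of two client-side "code evals": a JSON schema-style check on a
# single answer, and a pairwise string-distance comparison to ground truth.
# Function names and the `results` dict shape are illustrative only.
import difflib
import json


def is_valid_json_object(output: str, required_keys: set[str]) -> bool:
    """Check that the model output parses as a JSON object with the given keys."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and required_keys <= parsed.keys()


def string_similarity(output: str, expected: str) -> float:
    """Pairwise comparison to an expected output via string distance (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, output, expected).ratio()


# Run the evals in your own code path...
results = {
    "valid_json": is_valid_json_object('{"answer": "42", "source": "doc"}',
                                       {"answer", "source"}),
    "similarity": string_similarity("The capital is Paris.",
                                    "The capital of France is Paris."),
}
# ...then log `results` back to Freeplay via the SDK or API.
```

In a Test Run, the ground-truth string passed to `string_similarity` would come from your test dataset; for embedding distance you would swap in a vector similarity instead of `SequenceMatcher`.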

In either case, any results you log to Freeplay flow through to the UI just like human or model-graded evals. See our SDK documentation for more details.


What’s Next

Now review each evaluation type, then move on to test runs once all your evaluations are configured!
