flutter/evals

Caution

This repo is highly unstable and the APIs will change.

Evaluation framework for testing AI agents' ability to write Dart and Flutter code. Built on Inspect AI.

Overview

This repo includes:

  • eval runner — Python package for running LLM evaluations with configurable tasks, variants, and models
  • config packages — Dart and Python packages that resolve dataset YAML into EvalSet JSON for the runner
  • devals CLI — Dart CLI for creating and managing dataset samples, tasks, and jobs
  • Evaluation Explorer — Dart/Flutter app for browsing and analyzing results
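The resolution step performed by the config packages can be sketched as follows. The schema and field names here are hypothetical stand-ins (the real YAML schema is defined by the dataset_config packages), and the parsed YAML is represented as a plain dict to keep the sketch self-contained:

```python
import json
from itertools import product

# Hypothetical dataset config; this dict stands in for parsed YAML.
# Field names ("task", "variants", "samples") are illustrative only.
dataset = {
    "task": "counter_app",
    "variants": ["stateless", "stateful"],
    "samples": [
        {"id": "increment", "prompt": "Add an increment button."},
        {"id": "reset", "prompt": "Add a reset button."},
    ],
}

def resolve(config):
    """Expand variants x samples into a flat EvalSet-style structure."""
    cases = [
        {"task": config["task"], "variant": variant, **sample}
        for variant, sample in product(config["variants"], config["samples"])
    ]
    return {"evalset": cases}

eval_set = resolve(dataset)
print(json.dumps(eval_set, indent=2))
```

The idea is simply that one compact dataset definition fans out into a flat list of concrete evaluation cases (here 2 variants × 2 samples = 4 cases) that the runner can consume as JSON.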

Tip

Full documentation at dash-evals-docs.web.app/

Packages

| Package | Description | Docs |
| --- | --- | --- |
| dash_evals | Python evaluation runner using Inspect AI | dash_evals docs |
| dataset_config_dart | Dart library for resolving dataset YAML into EvalSet JSON (includes shared data models) | dataset_config_dart docs |
| dataset_config_python | Python configuration models | |
| devals_cli | Dart CLI for managing evaluation tasks and jobs | CLI docs |
| eval_explorer | Dart/Flutter reporting app | eval_explorer docs |

Documentation

| Doc | Description |
| --- | --- |
| Quick Start | Get started authoring your own evals |
| Contributing Guide | Development setup and guidelines |
| CLI Reference | Full devals CLI command reference |
| Configuration Reference | YAML configuration file reference |
| Repository Structure | Project layout |
| Glossary | Terminology guide |

Contributing

See CONTRIBUTING.md for details, or go directly to the Contributing Guide.

License

See LICENSE for details.
