> [!CAUTION]
> This repo is highly unstable and the APIs will change.
Evaluation framework for testing AI agents' ability to write Dart and Flutter code. Built on Inspect AI.
This repo includes:
- eval runner — Python package for running LLM evaluations with configurable tasks, variants, and models
- config packages — Dart and Python packages that resolve dataset YAML into EvalSet JSON for the runner
- devals CLI — Dart CLI for creating and managing dataset samples, tasks, and jobs
- Evaluation Explorer — Dart/Flutter app for browsing and analyzing results
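As a rough illustration of the flow above, a dataset YAML declares the work to run, and the config packages expand it into EvalSet JSON for the eval runner. The field names below are hypothetical and only sketch the idea; see the Configuration Reference for the actual schema.

```yaml
# Hypothetical dataset YAML -- field names are illustrative only;
# the real schema is documented in the Configuration Reference.
task: write_flutter_widget
variants:
  - default
  - strict_lints
samples:
  - id: counter_app
    prompt: Write a counter widget in Flutter.
  - id: todo_list
    prompt: Write a todo-list widget in Flutter.
```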
> [!TIP]
> Full documentation at dash-evals-docs.web.app/
| Package | Description | Docs |
|---|---|---|
| dash_evals | Python evaluation runner using Inspect AI | dash_evals docs |
| dataset_config_dart | Dart library for resolving dataset YAML into EvalSet JSON (includes shared data models) | dataset_config_dart docs |
| dataset_config_python | Python configuration models | — |
| devals_cli | Dart CLI for managing evaluation tasks and jobs | CLI docs |
| eval_explorer | Dart/Flutter reporting app | eval_explorer docs |
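To make the "resolve dataset YAML into EvalSet JSON" step concrete, here is a minimal, self-contained Python sketch under an assumed schema. The config dict, field names, and `resolve_eval_set` helper are all hypothetical illustrations, not this repo's API; the real models live in dataset_config_dart and dataset_config_python.

```python
import json

# Hypothetical dataset config, as it might look after parsing a dataset
# YAML file. These field names are illustrative only.
dataset_config = {
    "task": "write_flutter_widget",
    "variants": ["default", "strict_lints"],
    "samples": [
        {"id": "counter_app", "prompt": "Write a counter widget in Flutter."},
        {"id": "todo_list", "prompt": "Write a todo-list widget in Flutter."},
    ],
}

def resolve_eval_set(config: dict) -> str:
    """Expand a dataset config into a flat EvalSet JSON document,
    with one entry per (variant, sample) pair."""
    entries = [
        {"task": config["task"], "variant": variant, **sample}
        for variant in config["variants"]
        for sample in config["samples"]
    ]
    return json.dumps({"evals": entries}, indent=2)

print(resolve_eval_set(dataset_config))
```

The cross product of variants and samples (here 2 × 2 = 4 entries) is one plausible way a resolver can hand the runner a flat list of independent evaluations.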
| Doc | Description |
|---|---|
| Quick Start | Get started authoring your own evals |
| Contributing Guide | Development setup and guidelines |
| CLI Reference | Full devals CLI command reference |
| Configuration Reference | YAML configuration file reference |
| Repository Structure | Project layout |
| Glossary | Terminology guide |
See CONTRIBUTING.md for details, or go directly to the Contributing Guide.
See LICENSE for details.