
Add GEPA Optimize_Anything skill eval and optimize Capability for Skill evaluation #166

Merged

calreynolds merged 29 commits into databricks-solutions:main from auschoi96:main
Mar 4, 2026

Conversation

@auschoi96
Collaborator

This adds support for GEPA optimization and skill evaluation under .test, along with a README explaining how to use it.

@auschoi96
Collaborator Author

A significant portion of the lines in this PR comes from the generated YAML files: manifest, ground_truth, and candidates.

auschoi96 and others added 16 commits February 26, 2026 14:14
Add 7-lakehouse-monitoring.md reference file covering quality monitors,
profile types (Snapshot, TimeSeries, InferenceLog), MCP tool usage, and
Python SDK examples. Update SKILL.md with trigger condition and reference
table entry.

Tested against a live Databricks workspace - created and verified a
snapshot monitor on a Unity Catalog table.
- SKILL.md: updated trigger bullet and reference table to data profiling
- Renamed 7-lakehouse-monitoring.md to 7-data-profiling.md with new
  w.data_quality SDK examples
- Added new Data Quality docs and SDK references, kept legacy Lakehouse
  Monitoring SDK link for backward compatibility
…ently. This is because tools are used universally so we may not be able to optimize the two together
@auschoi96
Collaborator Author

This PR touches many files, but most are data, not code:

  • ~30 ground_truth.yaml + manifest.yaml files = test datasets for 16 skills (evaluation data, not code)
  • ~12 source files under src/skill_test/optimize/ = the actual framework
  • ~5 scripts = CLI entry points and test case generation utilities

For a full explanation of what's happening, see the README at .test/README.md
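The pairing described above (per-skill ground_truth.yaml test cases scored against candidate outputs) can be sketched roughly as follows. This is a hypothetical illustration, not the framework's actual code: the case keys (`id`, `expected`) and the exact-match scoring rule are assumptions; the real layout is documented in .test/README.md.

```python
# Hypothetical sketch: score candidate outputs against ground-truth test cases.
# Keys ("id", "expected") and exact-match scoring are assumptions for illustration.
from typing import Dict, List


def score_candidates(ground_truth: List[Dict[str, str]],
                     candidates: Dict[str, str]) -> float:
    """Return the fraction of test cases whose candidate output matches
    the expected answer exactly."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for case in ground_truth
        if candidates.get(case["id"]) == case["expected"]
    )
    return hits / len(ground_truth)
```

In a setup like this, an optimizer such as GEPA would iterate on the skill prompt, regenerate the candidate outputs, and re-score until the fraction improves.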

@auschoi96 auschoi96 marked this pull request as ready for review March 4, 2026 02:25
@calreynolds
Collaborator

Woot woot! OK cool, tomorrow I'm getting my bifocals on and going through this in-depth. Will do a proper test of the testing! Awesome stuff; at a glance the PR looks excellent.

@auschoi96
Collaborator Author

auschoi96 commented Mar 4, 2026

@calreynolds I did a quick test on metric views and SDP! See Slack for results. So try a different skill (or one you made!) to get a good sense!

Also I recommend using AI gateway's fallbacks to avoid rate limits: https://e2-demo-field-eng.cloud.databricks.com/ml/ai-gateway/gepa-fallbacks?o=1444828305810485

@auschoi96
Collaborator Author

I also accidentally snuck an update into the app to use serverless compute, so that's probably in one of these commits.

@calreynolds
Collaborator

Also snuck in an update to the app to use serverless compute accidentally so that's probably in one of these commits

I think that's actually a good thing as a default! Just took a look and it doesn't seem too crazy to me (since users can swap to other compute as they see fit).

Otherwise, here are some fixes:

  1. app.yaml:117 — Replaced your hardcoded email (austin.choi@databricks.com) in MLFLOW_EXPERIMENT_NAME with an empty placeholder + comment
  2. alembic/env.py — Removed the duplicated _resolve_hostname function (24 lines) and imported it from database.py instead
  3. alembic/env.py — Added schema name validation (regex) before it gets interpolated into the CREATE SCHEMA IF NOT EXISTS SQL statement
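Fix #3 above guards against SQL injection when the schema name is interpolated into DDL. A minimal sketch of that pattern, assuming a conservative identifier regex (the exact pattern used in alembic/env.py may differ):

```python
# Hypothetical sketch of fix #3: validate a schema name before interpolating
# it into a CREATE SCHEMA statement. The regex below is an assumption; the
# actual pattern in alembic/env.py may be stricter or looser.
import re

_SCHEMA_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_schema_sql(schema: str) -> str:
    """Return a CREATE SCHEMA statement, rejecting names that could
    smuggle extra SQL into the interpolated string."""
    if not _SCHEMA_NAME_RE.match(schema):
        raise ValueError(f"invalid schema name: {schema!r}")
    return f"CREATE SCHEMA IF NOT EXISTS {schema}"
```

Validating before interpolation matters here because DDL identifiers generally can't be bound as parameters, so the string is the last line of defense.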

To pull in:

git fetch origin cal/pr166-fixes
git cherry-pick d6c4d13

Otherwise all good with me! I did testing on the parsing and it looked good. Great overhaul!

@calreynolds
Collaborator

OK, added a couple more things to that temp branch; when those are merged I think we're ready to rock! 🥇

auschoi96 and others added 3 commits March 4, 2026 08:51
…, validate schema name

- Replace austin.choi@databricks.com with empty placeholder in MLFLOW_EXPERIMENT_NAME
- Import _resolve_hostname from database.py instead of duplicating in alembic/env.py
- Add regex validation on schema name before interpolation into SQL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@auschoi96
Collaborator Author

@calreynolds done!

Collaborator

@calreynolds calreynolds left a comment

👍👍👍

@calreynolds calreynolds merged commit 54d371b into databricks-solutions:main Mar 4, 2026
2 checks passed