
Add GEPA Optimize_Anything skill eval and optimize Capability for Skill evaluation #166

Merged

calreynolds merged 29 commits into databricks-solutions:main from auschoi96:main
Mar 4, 2026

Conversation

@auschoi96
Collaborator

This adds support for GEPA optimization and skill evaluation under .test, along with a README explaining how to use it.

@auschoi96
Collaborator Author

A significant portion of the lines in this PR comes from the generated YAML files: manifest, ground_truth, and candidates.

auschoi96 and others added 16 commits February 26, 2026 14:14
Add 7-lakehouse-monitoring.md reference file covering quality monitors,
profile types (Snapshot, TimeSeries, InferenceLog), MCP tool usage, and
Python SDK examples. Update SKILL.md with trigger condition and reference
table entry.

Tested against a live Databricks workspace - created and verified a
snapshot monitor on a Unity Catalog table.
- SKILL.md: updated trigger bullet and reference table to data profiling
- Renamed 7-lakehouse-monitoring.md to 7-data-profiling.md with new
  w.data_quality SDK examples
- Added new Data Quality docs and SDK references, kept legacy Lakehouse
  Monitoring SDK link for backward compatibility
…ently. This is because tools are used universally so we may not be able to optimize the two together
@auschoi96
Collaborator Author

This PR touches many files, but most are data, not code:

  • ~30 ground_truth.yaml + manifest.yaml files = test datasets for 16 skills (evaluation data, not code)
  • ~12 source files under src/skill_test/optimize/ = the actual framework
  • ~5 scripts = CLI entry points and test case generation utilities

For a full explanation of what's happening, see the README at .test/README.md
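The pairing described above (per-skill ground_truth.yaml test cases scored against candidate outputs) can be sketched roughly as follows. This is a hypothetical illustration, not the framework's actual code: the case keys (`id`, `expected`) and the exact-match scoring rule are assumptions; the real layout is documented in .test/README.md.

```python
# Hypothetical sketch: score candidate outputs against ground-truth test cases.
# Keys ("id", "expected") and exact-match scoring are assumptions for illustration.
from typing import Dict, List


def score_candidates(ground_truth: List[Dict[str, str]],
                     candidates: Dict[str, str]) -> float:
    """Return the fraction of test cases whose candidate output matches
    the expected answer exactly."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for case in ground_truth
        if candidates.get(case["id"]) == case["expected"]
    )
    return hits / len(ground_truth)
```

In a setup like this, an optimizer such as GEPA would iterate on the skill prompt, regenerate the candidate outputs, and re-score until the fraction improves.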

@auschoi96 auschoi96 marked this pull request as ready for review March 4, 2026 02:25
@calreynolds
Collaborator

Woot woot! OK cool, tomorrow I'm getting my bifocals on and going through this in-depth. Will do a proper test of the testing! Awesome stuff; at a glance the PR looks excellent.

@auschoi96
Collaborator Author

auschoi96 commented Mar 4, 2026

@calreynolds I did a quick test on metric views and SDP! See Slack for results. So try a different skill (or one you made!) to get a good sense!

Also I recommend using AI gateway's fallbacks to avoid rate limits: https://e2-demo-field-eng.cloud.databricks.com/ml/ai-gateway/gepa-fallbacks?o=1444828305810485

@auschoi96
Collaborator Author

I also accidentally snuck an update into the app to use serverless compute, so that's probably in one of these commits.

@calreynolds
Collaborator

Also snuck in an update to the app to use serverless compute accidentally so that's probably in one of these commits

I think that's actually a good thing as a default! Just took a look and it doesn't seem too crazy to me (since users can swap to other compute as they see fit).

Otherwise, here are some fixes:

  1. app.yaml:117 — Replaced your hardcoded email (austin.choi@databricks.com) in MLFLOW_EXPERIMENT_NAME with an empty placeholder + comment
  2. alembic/env.py — Removed the duplicated _resolve_hostname function (24 lines) and imported it from database.py instead
  3. alembic/env.py — Added schema name validation (regex) before it gets interpolated into the CREATE SCHEMA IF NOT EXISTS SQL statement
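Fix #3 above guards against SQL injection when the schema name is interpolated into DDL. A minimal sketch of that pattern, assuming a conservative identifier regex (the exact pattern used in alembic/env.py may differ):

```python
# Hypothetical sketch of fix #3: validate a schema name before interpolating
# it into a CREATE SCHEMA statement. The regex below is an assumption; the
# actual pattern in alembic/env.py may be stricter or looser.
import re

_SCHEMA_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def create_schema_sql(schema: str) -> str:
    """Return a CREATE SCHEMA statement, rejecting names that could
    smuggle extra SQL into the interpolated string."""
    if not _SCHEMA_NAME_RE.match(schema):
        raise ValueError(f"invalid schema name: {schema!r}")
    return f"CREATE SCHEMA IF NOT EXISTS {schema}"
```

Validating before interpolation matters here because DDL identifiers generally can't be bound as parameters, so the string is the last line of defense.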

To pull in:

git fetch origin cal/pr166-fixes
git cherry-pick d6c4d13

Otherwise all good with me! I did testing on the parsing and it looked good. Great overhaul!

@calreynolds
Collaborator

OK, added a couple more things to that temp branch; when those are merged I think we're ready to rock! 🥇

auschoi96 and others added 3 commits March 4, 2026 08:51
…, validate schema name

- Replace austin.choi@databricks.com with empty placeholder in MLFLOW_EXPERIMENT_NAME
- Import _resolve_hostname from database.py instead of duplicating in alembic/env.py
- Add regex validation on schema name before interpolation into SQL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@auschoi96
Collaborator Author

@calreynolds done!

Collaborator

@calreynolds calreynolds left a comment

👍👍👍

@calreynolds calreynolds merged commit 54d371b into databricks-solutions:main Mar 4, 2026
2 checks passed