Add GEPA Optimize_Anything skill eval and optimize Capability for Skill evaluation #166
… add examples from MLflow traces or manual
A significant portion of the lines in this PR come from the generated YAML files: manifest, ground_truth, and candidates.
Add 7-lakehouse-monitoring.md reference file covering quality monitors, profile types (Snapshot, TimeSeries, InferenceLog), MCP tool usage, and Python SDK examples. Update SKILL.md with trigger condition and reference table entry. Tested against a live Databricks workspace - created and verified a snapshot monitor on a Unity Catalog table.
- SKILL.md: updated trigger bullet and reference table to data profiling
- Renamed 7-lakehouse-monitoring.md to 7-data-profiling.md with new w.data_quality SDK examples
- Added new Data Quality docs and SDK references; kept legacy Lakehouse Monitoring SDK link for backward compatibility
…ently. This is because tools are used universally so we may not be able to optimize the two together
This PR touches many files, but most are data, not code. For a full explanation of what's happening, check out the README at .test/README.md.
Woot woot! OK cool, tomorrow I'm getting my bifocals on and going through this in depth. Will do a proper, thorough test! Awesome stuff; at a glance the PR looks excellent.
@calreynolds I did a quick test on metric views and SDP! See Slack for results. So try a different skill (or one you made!) to get a good sense. Also, I recommend using AI Gateway's fallbacks to avoid rate limits: https://e2-demo-field-eng.cloud.databricks.com/ml/ai-gateway/gepa-fallbacks?o=1444828305810485
Also, I accidentally snuck in an update to the app to use serverless compute, so that's probably in one of these commits.
I think that's actually a good thing as a default! I just took a look and it doesn't seem too crazy to me (since users can swap in other compute as they see fit). Otherwise, here are some fixes:
To pull in:

```shell
git fetch origin cal/pr166-fixes
git cherry-pick d6c4d13
```

Otherwise all good with me! I did testing on the parsing and it looked good. Great overhaul!
OK, added a couple more things to that temp branch; when those are merged I think we're ready to rock! 🥇
…, validate schema name
- Replace austin.choi@databricks.com with an empty placeholder in MLFLOW_EXPERIMENT_NAME
- Import _resolve_hostname from database.py instead of duplicating it in alembic/env.py
- Add regex validation on schema name before interpolation into SQL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
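The schema-name regex validation mentioned in that commit can be sketched roughly like this. This is a minimal illustration, not the PR's actual code: the pattern, function name, and error handling here are assumptions.

```python
import re

# Hypothetical helper: allow only identifier-safe schema names
# (letters, digits, underscores) before interpolating into SQL.
_SCHEMA_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def validate_schema_name(schema: str) -> str:
    """Return the schema name if it is safe to interpolate, else raise."""
    if not _SCHEMA_NAME_RE.fullmatch(schema):
        raise ValueError(f"Invalid schema name: {schema!r}")
    return schema

# Safe names pass through unchanged; anything containing quotes,
# semicolons, or spaces is rejected before it reaches the SQL string.
print(validate_schema_name("gepa_eval"))  # → gepa_eval
```

Validating against an allowlist pattern (rather than trying to escape bad characters) is the usual defense when a name must be spliced into SQL that can't be parameterized, such as a schema in a DDL statement.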
@calreynolds done! |
Additions to .test to support GEPA optimization and skill evaluation. Added a README explaining how to use this.