Merged — 27 commits
3738572
Rewrite synthetic-data-generation for improved performance and features
dustinvannoy-db Feb 16, 2026
e73c86d
Fix databricks-connect version requirements for Python compatibility
dustinvannoy-db Feb 16, 2026
58f92d8
Merge branch 'main' into feature/improve_data_gen
dustinvannoy-db Feb 16, 2026
805dbb6
Improve synthetic-data-generation skill with Spark preference and cat…
dustinvannoy-db Feb 19, 2026
fccf575
Cleanup data gen skill
dustinvannoy-db Feb 20, 2026
eb82b21
Add stronger guidance to use Databricks Connect
dustinvannoy-db Feb 20, 2026
c9ec683
Update data gen for different run modes
dustinvannoy-db Feb 24, 2026
728e454
Small updates to databricks-connect and environments
dustinvannoy-db Feb 24, 2026
3f2c9e0
Updates to improve serverless dbconnect and polars local for data gen
dustinvannoy-db Feb 24, 2026
c15572f
Add guidance on cache with serverless
dustinvannoy-db Feb 24, 2026
bdb3ab6
Update data gen for better cluster/job guidance
dustinvannoy-db Feb 25, 2026
0b9c9b3
Update classic library install
dustinvannoy-db Feb 25, 2026
d177f62
Suggest uv and improve python task job payload
dustinvannoy-db Feb 25, 2026
0e61f04
Merge branch 'main' into feature/improve_data_gen
dustinvannoy-db Feb 25, 2026
84ae64f
Add new data gen tests (first 3)
dustinvannoy-db Feb 27, 2026
ded1cf2
Update data gen ground_truth and baseline
dustinvannoy-db Feb 27, 2026
c680269
Remove default catalog setting
dustinvannoy-db Feb 27, 2026
09a9cd8
Add window syntax common issue
dustinvannoy-db Feb 27, 2026
c7e335a
Rename and overhaul data gen skill and tests timeouts
dustinvannoy-db Mar 3, 2026
d7b3c07
Merge branch 'main' into feature/improve_data_gen
dustinvannoy-db Mar 3, 2026
d4a7e3a
Fix skill name mismatch and add missing skills to install scripts
dustinvannoy-db Mar 3, 2026
e310b67
Fix PR review issues for databricks-synthetic-data-gen skill
dustinvannoy-db Mar 3, 2026
d1a8660
Simplify serverless job config in test response
dustinvannoy-db Mar 3, 2026
9c74e61
Add Python 3.12+ requirement to run instructions
dustinvannoy-db Mar 3, 2026
aa4d8c9
Remove commented out lines from manifest.yaml
dustinvannoy-db Mar 3, 2026
179856f
Update databricks-connect version range and fix version detection
dustinvannoy-db Mar 3, 2026
8265a9b
Reduce guidelines for faster tests with mlflow
dustinvannoy-db Mar 3, 2026
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,7 +1,7 @@
# Databricks AI Dev Kit
.ai-dev-kit/
.claude/

.local

# Python
__pycache__/
14 changes: 14 additions & 0 deletions .test/README.md
@@ -233,3 +233,17 @@ uv pip install -e ".test/"
uv run pytest .test/tests/
uv run python .test/scripts/regression.py <skill-name>
```

---

## Troubleshooting

### MLflow evaluation not returning results

If `/skill-test <skill-name> mlflow` hangs or doesn't return results, run manually with debug logging:

```bash
MLFLOW_LOG_LEVEL=DEBUG uv run python .test/scripts/mlflow_eval.py <skill-name>
```

This will show detailed MLflow API calls and help identify connection or authentication issues.
21 changes: 21 additions & 0 deletions .test/baselines/databricks-synthetic-data-gen/baseline.yaml
@@ -0,0 +1,21 @@
run_id: '20260303_071721'
created_at: '2026-03-03T07:17:21.838623'
skill_name: databricks-synthetic-data-gen
metrics:
  pass_rate: 1.0
  total_tests: 4
  passed_tests: 4
  failed_tests: 0
test_results:
- id: grp_20260302_113344
  passed: true
  execution_mode: local
- id: gen_serverless_job_catalog_json_002
  passed: true
  execution_mode: local
- id: grp_20260302_retail_csv_3tables_003
  passed: true
  execution_mode: local
- id: grp_20260303_manufacturing_delta_streaming_004
  passed: true
  execution_mode: local
54 changes: 53 additions & 1 deletion .test/scripts/mlflow_eval.py
@@ -2,29 +2,65 @@
"""Run MLflow evaluation for a skill.

Usage:
-    python mlflow_eval.py <skill_name> [--filter-category <category>] [--run-name <name>]
+    python mlflow_eval.py <skill_name> [--filter-category <category>] [--run-name <name>] [--timeout <seconds>]

Environment Variables:
    DATABRICKS_CONFIG_PROFILE - Databricks CLI profile (default: "DEFAULT")
    MLFLOW_TRACKING_URI - Set to "databricks" for Databricks MLflow
    MLFLOW_EXPERIMENT_NAME - Experiment path (e.g., "/Users/{user}/skill-test")
    MLFLOW_LLM_JUDGE_TIMEOUT - Timeout in seconds for LLM judge evaluation (default: 120)
"""
import os
import sys
import signal
import argparse

# Close stdin and disable tqdm progress bars when run non-interactively
# This fixes hanging issues with tqdm/MLflow progress bars in background tasks
if not sys.stdin.isatty():
    try:
        sys.stdin.close()
        sys.stdin = open(os.devnull, 'r')
    except Exception:
        pass
    # Disable tqdm progress bars
    os.environ.setdefault("TQDM_DISABLE", "1")

# Import common utilities
from _common import setup_path, print_result, handle_error


class TimeoutException(Exception):
    pass


def timeout_handler(signum, frame):
    raise TimeoutException("MLflow evaluation timed out")


def main():
    parser = argparse.ArgumentParser(description="Run MLflow evaluation for a skill")
    parser.add_argument("skill_name", help="Name of skill to evaluate")
    parser.add_argument("--filter-category", help="Filter by test category")
    parser.add_argument("--run-name", help="Custom MLflow run name")
    parser.add_argument(
        "--timeout",
        type=int,
        default=120,
        help="Timeout in seconds for evaluation (default: 120)",
    )
    args = parser.parse_args()

    setup_path()

    # Set up signal-based timeout (Unix only)
    if hasattr(signal, 'SIGALRM'):
        signal.signal(signal.SIGALRM, timeout_handler)
        signal.alarm(args.timeout)
    else:
        # Windows: SIGALRM not available - no timeout enforcement
        print("WARNING: Timeout not supported on Windows - test may run indefinitely", file=sys.stderr)

    try:
        from skill_test.runners import evaluate_skill

@@ -34,6 +70,10 @@ def main():
            run_name=args.run_name,
        )

        # Cancel the alarm if we succeeded
        if hasattr(signal, 'SIGALRM'):
            signal.alarm(0)

        # Convert to standard result format
        if result.get("run_id"):
            result["success"] = True
@@ -42,7 +82,19 @@

        sys.exit(print_result(result))

    except TimeoutException as e:
        result = {
            "success": False,
            "skill_name": args.skill_name,
            "error": f"Evaluation timed out after {args.timeout} seconds. This may indicate LLM judge endpoint issues.",
            "error_type": "timeout",
        }
        sys.exit(print_result(result))

    except Exception as e:
        # Cancel alarm on any exception
        if hasattr(signal, 'SIGALRM'):
            signal.alarm(0)
        sys.exit(handle_error(e, args.skill_name))


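The SIGALRM timeout that this diff adds to `mlflow_eval.py` can be sketched in isolation. The sketch below is a minimal illustration of the same pattern, not code from the PR; the `run_with_timeout` helper, its arguments, and the sample calls are hypothetical:

```python
import signal


class TimeoutException(Exception):
    """Raised when the wrapped call exceeds its time budget."""


def run_with_timeout(func, seconds):
    """Run func() under a SIGALRM-based timeout (Unix only)."""
    if not hasattr(signal, "SIGALRM"):
        # Windows has no SIGALRM, so run without enforcement,
        # matching the warning-only behavior in mlflow_eval.py
        return func()

    def handler(signum, frame):
        raise TimeoutException(f"timed out after {seconds}s")

    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)  # schedule SIGALRM after `seconds`
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore old handler


print(run_with_timeout(lambda: "ok", 5))  # prints "ok"
```

Putting `signal.alarm(0)` in `finally` mirrors the PR's cleanup on both the success and exception paths; the `hasattr` guard is needed because SIGALRM exists only on Unix, and signal handlers can only be installed from the main thread.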
2 changes: 1 addition & 1 deletion .test/skills/_routing/ground_truth.yaml
@@ -99,7 +99,7 @@ test_cases:
    prompt: "Generate synthetic customer data and evaluate the agent quality with MLflow scorers"
    expectations:
      expected_skills:
-        - "databricks-synthetic-data-generation"
+        - "databricks-synthetic-data-gen"
        - "databricks-mlflow-evaluation"
      is_multi_skill: true
    metadata:
7 changes: 7 additions & 0 deletions .test/skills/databricks-synthetic-data-gen/candidates.yaml
@@ -0,0 +1,7 @@
# Candidates for databricks-synthetic-data-gen skill
# Test cases pending review before promotion to ground_truth.yaml
#
# Use `/skill-test databricks-synthetic-data-gen add` to create new candidates
# Use `/skill-test databricks-synthetic-data-gen review` to promote candidates to ground truth

candidates: []