diff --git a/.claude/skills/openstack-ci-analysis/SKILL.md b/.claude/skills/openstack-ci-analysis/SKILL.md
new file mode 100644
index 0000000..91e2758
--- /dev/null
+++ b/.claude/skills/openstack-ci-analysis/SKILL.md
@@ -0,0 +1,158 @@
+---
+name: openstack-ci-analysis
+description: Analyzes OpenStack CI job health, pass rates, coverage gaps, and failure categories. Use when asked to analyze CI jobs, generate CI health reports, compare platform performance, or investigate job failures for OpenStack/ShiftStack.
+---
+
+# OpenStack CI Analysis
+
+Comprehensive analysis of OpenStack CI job health using Sippy API metrics and CI configuration data.
+
+## Prerequisites
+
+- Python 3.6+ with pyyaml: `pip install pyyaml`
+- Access to the openshift/release repository (for `ci-operator/config`)
+
+## Quick Start
+
+Run the full analysis with the wrapper script:
+
+```bash
+bash scripts/run_analysis.sh \
+  --config-dir /path/to/release/ci-operator/config \
+  --output-dir /tmp/analysis
+```
+
+Add `--force` to refresh cached Sippy data.
+
+## Workflow
+
+### Phase 1: Data Collection
+
+Run in order (step 3 consumes the output of step 2, and the later phases consume these results):
+
+```bash
+# 1. Extract job inventory from CI config YAML files
+python3 scripts/extract_openstack_jobs.py \
+  --config-dir $CONFIG_DIR \
+  --output-dir $OUTPUT_DIR \
+  --summary
+
+# 2. Fetch pass rates from Sippy API
+python3 scripts/fetch_job_metrics.py --output-dir $OUTPUT_DIR
+
+# 3. Calculate 14-day combined metrics
+python3 scripts/fetch_extended_metrics.py --output-dir $OUTPUT_DIR
+
+# 4. Fetch platform comparison data
+python3 scripts/fetch_comparison_data.py --output-dir $OUTPUT_DIR
+```
+
+### Phase 2: Configuration Analysis
+
+These analyze the job inventory (can run in parallel):
+
+```bash
+python3 scripts/analyze_redundancy.py --output-dir $OUTPUT_DIR
+python3 scripts/analyze_coverage.py --output-dir $OUTPUT_DIR
+python3 scripts/analyze_triggers.py --output-dir $OUTPUT_DIR
+```
+
+### Phase 3: Runtime Analysis
+
+These analyze Sippy metrics (can run in parallel):
+
+```bash
+python3 scripts/analyze_platform_comparison.py --output-dir $OUTPUT_DIR
+python3 scripts/analyze_workflow_passrate.py --output-dir $OUTPUT_DIR
+python3 scripts/categorize_failures.py --output-dir $OUTPUT_DIR
+```
+
+## Scripts Reference
+
+| Script | Purpose | Requires |
+|--------|---------|----------|
+| `extract_openstack_jobs.py` | Extract jobs from ci-operator/config | config-dir |
+| `fetch_job_metrics.py` | Fetch Sippy API metrics | - |
+| `fetch_extended_metrics.py` | 14-day combined metrics | sippy_jobs_raw.json |
+| `fetch_comparison_data.py` | Platform comparison data | - |
+| `analyze_redundancy.py` | Find duplicate/overlapping jobs | inventory.csv |
+| `analyze_coverage.py` | Find coverage gaps across releases | inventory.csv |
+| `analyze_triggers.py` | Trigger optimization opportunities | inventory.csv |
+| `analyze_platform_comparison.py` | OpenStack vs AWS/GCP/Azure | platform_comparison_raw.json |
+| `analyze_workflow_passrate.py` | Pass rates by workflow type | inventory.json, sippy_jobs_raw.json |
+| `categorize_failures.py` | Classify failures by root cause | extended_metrics_jobs.json |
+
+## Output Files
+
+### Reports (Markdown)
+
+| File | Contents |
+|------|----------|
+| `extended_metrics_report.md` | Overall health, trends, problem jobs |
+| `platform_comparison_report.md` | OpenStack vs other platforms |
+| `workflow_passrate_report.md` | Pass rates by workflow |
+| `failure_categories_report.md` | Failures by root cause |
+| `coverage_gaps_report.md` | Missing test coverage |
+| `trigger_optimization_report.md` | Trigger improvements |
+| `redundant_jobs_report.md` | Consolidation opportunities |
+
+### Data (JSON/CSV)
+
+| File | Contents |
+|------|----------|
+| `openstack_jobs_inventory.json` | Complete job inventory |
+| `openstack_jobs_inventory.csv` | Job inventory read by the Phase 2 `analyze_*` scripts |
+| `sippy_jobs_raw.json` | Cached Sippy data |
+| `extended_metrics.json` | Combined metrics |
+| `platform_comparison_analysis.json` | Platform analysis |
+| `failure_categories.json` | Categorized failures |
+
+## Generating Executive Summary
+
+After running all scripts, extract the key metrics:
+
+```python
+import json
+import os
+
+d = os.environ.get('OUTPUT_DIR', '.')
+
+def load(name):
+    with open(os.path.join(d, name)) as f:
+        return json.load(f)
+
+ext = load('extended_metrics.json')
+plat = load('platform_comparison_analysis.json')
+fail = load('failure_categories.json')
+
+print(f"Pass rate: {ext['overall']['combined_pass_rate']:.1f}%")
+print(f"Problem jobs: {ext['overall']['problem_job_count']}")
+print(f"OpenStack rank: #{plat['openstack_position']['rank']}/{plat['openstack_position']['total']}")
+
+print("\nFailure Categories:")
+for cat, count in fail['summary']['by_category'].items():
+    pct = fail['summary']['percentages'][cat]
+    print(f"  {cat}: {count} ({pct}%)")
+```
+
+## Cluster Profiles Analyzed
+
+- openstack-vexxhost
+- openstack-vh-mecha-central
+- openstack-vh-mecha-az0
+- openstack-vh-bm-rhos
+- openstack-hwoffload
+- openstack-nfv
+
+## Failure Categories
+
+| Category | Criteria |
+|----------|----------|
+| Infrastructure | Low pass rate on install/provision jobs |
+| Flaky | 30-70% pass rate (inconsistent) |
+| Product Bug | Low pass rate with bugs filed |
+| Needs Triage | Unknown cause, requires investigation |
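+
+For a quick ad-hoc check outside the scripts, the sketch below shows how
+these criteria could be applied to a single job record. It is illustrative
+only: the field names (`pass_rate`, `is_install_job`, `open_bugs`) and the
+80% "low" threshold are assumptions, not the actual schema used by
+`categorize_failures.py`.
+
+```python
+def categorize(job):
+    """Rough first-match mapping of the criteria table above (assumed fields)."""
+    rate = job["pass_rate"]
+    if rate >= 80:
+        return "Healthy"
+    if job.get("is_install_job"):  # failing install/provision jobs
+        return "Infrastructure"
+    if 30 <= rate <= 70:           # inconsistent pass/fail pattern
+        return "Flaky"
+    if job.get("open_bugs"):       # low pass rate with bugs already filed
+        return "Product Bug"
+    return "Needs Triage"
+```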
+
+## Troubleshooting
+
+| Error | Solution |
+|-------|----------|
+| "No Sippy data found" | Run `fetch_job_metrics.py` first |
+| "No job inventory found" | Run `extract_openstack_jobs.py` first |
+| Import error for yaml | `pip install pyyaml` |
+| Config directory not found | Point to ci-operator/config in the openshift/release repo |
diff --git a/.claude/skills/openstack-ci-analysis/scripts/analyze_coverage.py b/.claude/skills/openstack-ci-analysis/scripts/analyze_coverage.py
new file mode 100755
index 0000000..1dbedcc
--- /dev/null
+++ b/.claude/skills/openstack-ci-analysis/scripts/analyze_coverage.py
@@ -0,0 +1,403 @@
+#!/usr/bin/env python3
+"""
+Analyze OpenStack CI test coverage across releases.
+
+This script identifies:
+1. Coverage matrix (which tests run on which releases)
+2. Coverage gaps (tests missing from certain releases)
+3. Release-to-release differences
+4. 
Workflow usage across releases +""" + +import argparse +import csv +import json +import sys +from collections import defaultdict +from pathlib import Path + + +# Current and recent releases to focus on +ACTIVE_RELEASES = [ + "release-4.17", + "release-4.18", + "release-4.19", + "release-4.20", + "release-4.21", + "release-4.22", + "release-4.23", +] + +MAIN_BRANCHES = ["main", "master"] + + +def load_inventory(csv_path): + """Load job inventory from CSV.""" + jobs = [] + with open(csv_path, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + row['optional'] = row['optional'].lower() == 'true' + row['always_run'] = row['always_run'].lower() == 'true' + jobs.append(row) + return jobs + + +def normalize_branch(branch): + """Normalize branch name for comparison.""" + if branch in MAIN_BRANCHES: + return "main/master" + return branch + + +def get_release_version(branch): + """Extract version number from release branch.""" + if branch.startswith("release-"): + return branch.replace("release-", "") + return None + + +def build_test_matrix(jobs): + """ + Build a matrix of tests by (workflow, cluster_profile) across releases. + """ + # Group by (org, repo) to see coverage per repo + repo_coverage = defaultdict(lambda: defaultdict(set)) + + # Group by (workflow, cluster_profile) to see overall test coverage + test_coverage = defaultdict(set) + + for job in jobs: + org = job['org'] + repo = job['repo'] + branch = job['branch'] + workflow = job['workflow'] + cluster = job['cluster_profile'] + job_name = job['job_name'] + + # Normalize branch + normalized = normalize_branch(branch) + + # Track per-repo coverage + if workflow: + key = (org, repo, job_name, workflow, cluster) + repo_coverage[key][normalized].add(job['config_file']) + + # Track workflow coverage + if workflow: + test_key = (workflow, cluster, job_name) + test_coverage[test_key].add(normalized) + + return repo_coverage, test_coverage + + +def analyze_release_coverage(jobs): + """ + Analyze which releases have what coverage. + """ + # Count jobs per release + release_counts = defaultdict(int) + for job in jobs: + branch = normalize_branch(job['branch']) + release_counts[branch] += 1 + + # Count unique test types per release + release_tests = defaultdict(set) + for job in jobs: + branch = normalize_branch(job['branch']) + key = (job['workflow'], job['cluster_profile'], job['job_name']) + release_tests[branch].add(key) + + return release_counts, release_tests + + +def find_coverage_gaps(jobs): + """ + Find tests that exist in some releases but not others. 
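+
+    Example: a job present on release-4.20 and release-4.21 but absent
+    from release-4.22 (all active releases for that repo) is reported
+    as a gap.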
+ """ + # Group jobs by (org, repo, job_name) + job_releases = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo'], job['job_name']) + job_releases[key].add(job['branch']) + + # Get all releases present per repo + repo_releases = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo']) + repo_releases[key].add(job['branch']) + + # Find gaps + gaps = [] + for (org, repo, job_name), present_releases in job_releases.items(): + all_releases = repo_releases[(org, repo)] + + # Focus on active releases + active_present = set(r for r in present_releases if r in ACTIVE_RELEASES) + active_all = set(r for r in all_releases if r in ACTIVE_RELEASES) + + # If job exists in some active releases but not all, it's a gap + missing = active_all - active_present + if missing and active_present: # Has some active releases but missing others + gaps.append({ + 'org': org, + 'repo': repo, + 'job_name': job_name, + 'present': sorted(active_present), + 'missing': sorted(missing), + }) + + return gaps + + +def analyze_workflow_coverage(jobs): + """ + Analyze which workflows are used in which releases. + """ + workflow_releases = defaultdict(lambda: defaultdict(int)) + + for job in jobs: + workflow = job['workflow'] + if not workflow: + continue + branch = job['branch'] + if branch in ACTIVE_RELEASES or branch in MAIN_BRANCHES: + normalized = normalize_branch(branch) + workflow_releases[workflow][normalized] += 1 + + return workflow_releases + + +def analyze_cluster_profile_usage(jobs): + """ + Analyze cluster profile usage by release. + """ + profile_releases = defaultdict(lambda: defaultdict(int)) + + for job in jobs: + profile = job['cluster_profile'] + branch = job['branch'] + if branch in ACTIVE_RELEASES or branch in MAIN_BRANCHES: + normalized = normalize_branch(branch) + profile_releases[profile][normalized] += 1 + + return profile_releases + + +def generate_report(jobs, output_file): + """Generate comprehensive coverage report.""" + report = [] + report.append("# OpenStack CI Test Coverage Analysis Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + # Release Coverage Summary + report.append("\n## 1. Jobs by Release\n\n") + release_counts, release_tests = analyze_release_coverage(jobs) + + # Sort releases naturally + def sort_key(r): + if r == "main/master": + return (1, "zzz") + if r.startswith("release-"): + ver = r.replace("release-", "") + parts = ver.split(".") + return (0, tuple(int(p) for p in parts if p.isdigit())) + return (2, r) + + sorted_releases = sorted(release_counts.keys(), key=sort_key) + + report.append("| Release | Total Jobs | Unique Tests |\n") + report.append("|---------|------------|---------------|\n") + for release in sorted_releases[-15:]: # Last 15 releases + count = release_counts[release] + unique = len(release_tests[release]) + report.append(f"| {release} | {count} | {unique} |\n") + + # Cluster Profile Usage + report.append("\n## 2. 
Cluster Profile Usage by Release\n\n")
+    profile_usage = analyze_cluster_profile_usage(jobs)
+
+    # Header (set comprehension dedupes "main" and "master", which both
+    # normalize to "main/master")
+    active_releases_sorted = sorted(
+        {normalize_branch(r) for r in ACTIVE_RELEASES + MAIN_BRANCHES},
+        key=sort_key
+    )[-6:]  # Last 6
+
+    report.append("| Cluster Profile | " +
+                  " | ".join(active_releases_sorted) + " |\n")
+    report.append("|" + "-" * 17 + "|" +
+                  "|".join(["-" * 8 for _ in active_releases_sorted]) + "|\n")
+
+    for profile in sorted(profile_usage.keys()):
+        counts = [str(profile_usage[profile].get(r, 0))
+                  for r in active_releases_sorted]
+        report.append(f"| {profile} | " + " | ".join(counts) + " |\n")
+
+    # Workflow Usage
+    report.append("\n## 3. Workflow Usage by Release\n\n")
+    workflow_usage = analyze_workflow_coverage(jobs)
+
+    report.append("| Workflow | " +
+                  " | ".join(active_releases_sorted) + " |\n")
+    report.append("|" + "-" * 40 + "|" +
+                  "|".join(["-" * 8 for _ in active_releases_sorted]) + "|\n")
+
+    for workflow in sorted(workflow_usage.keys()):
+        counts = [str(workflow_usage[workflow].get(r, 0))
+                  for r in active_releases_sorted]
+        report.append(f"| {workflow} | " + " | ".join(counts) + " |\n")
+
+    # Coverage Gaps
+    report.append("\n## 4. Coverage Gaps\n")
+    report.append("Tests present in some active releases but missing from others.\n\n")
+
+    gaps = find_coverage_gaps(jobs)
+
+    if gaps:
+        # Group by repo
+        repo_gaps = defaultdict(list)
+        for gap in gaps:
+            repo_gaps[(gap['org'], gap['repo'])].append(gap)
+
+        report.append(f"Found {len(gaps)} coverage gaps across "
+                      f"{len(repo_gaps)} repositories.\n\n")
+
+        report.append("### By Repository\n\n")
+        for (org, repo), repo_gap_list in sorted(repo_gaps.items()):
+            report.append(f"#### {org}/{repo}\n\n")
+            report.append("| Job | Present | Missing |\n")
+            report.append("|-----|---------|----------|\n")
+            for gap in repo_gap_list[:10]:
+                present = ', '.join(gap['present'][:3])
+                if len(gap['present']) > 3:
+                    present += f" (+{len(gap['present'])-3})"
+                missing = ', '.join(gap['missing'][:3])
+                if len(gap['missing']) > 3:
+                    missing += f" (+{len(gap['missing'])-3})"
+                report.append(f"| {gap['job_name']} | {present} | {missing} |\n")
+            if len(repo_gap_list) > 10:
+                report.append(f"\n... and {len(repo_gap_list)-10} more gaps\n")
+            report.append("\n")
+    else:
+        report.append("No coverage gaps found in active releases.\n")
+
+    # Test Type Analysis
+    report.append("\n## 5. Test Type Coverage\n")
+    report.append("Summary of test types and their coverage.\n\n")
+
+    # Categorize by test name patterns
+    test_categories = {
+        'e2e-basic': [],
+        'e2e-conformance': [],
+        'e2e-csi': [],
+        'e2e-nfv': [],
+        'e2e-upgrade': [],
+        'e2e-other': [],
+    }
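+
+    # Note: first match wins below, so e.g. a job with both "csi" and
+    # "parallel" in its name is counted under e2e-csi, not e2e-conformance.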
+    for job in jobs:
+        name = job['job_name'].lower()
+        if 'csi' in name or 'manila' in name or 'cinder' in name:
+            test_categories['e2e-csi'].append(job)
+        elif 'nfv' in name or 'sriov' in name or 'hwoffload' in name:
+            test_categories['e2e-nfv'].append(job)
+        elif 'upgrade' in name:
+            test_categories['e2e-upgrade'].append(job)
+        elif 'parallel' in name or 'serial' in name or 'conformance' in name:
+            test_categories['e2e-conformance'].append(job)
+        elif name.endswith('e2e-openstack') or name == 'e2e-openstack-ovn':
+            test_categories['e2e-basic'].append(job)
+        else:
+            test_categories['e2e-other'].append(job)
+
+    report.append("| Category | Total Jobs | Unique Tests |\n")
+    report.append("|----------|------------|---------------|\n")
+    for category, cat_jobs in sorted(test_categories.items()):
+        unique = len(set((j['job_name'], j['workflow']) for j in cat_jobs))
+        report.append(f"| {category} | {len(cat_jobs)} | {unique} |\n")
+
+    # Recommendations
+    report.append("\n## 6. Coverage Recommendations\n\n")
+
+    report.append("### Missing Coverage Areas\n\n")
+
+    # Check for releases with low coverage (bullets: there can be several items)
+    active_counts = {r: release_counts.get(r, 0)
+                     for r in ACTIVE_RELEASES}
+    avg_count = sum(active_counts.values()) / len(active_counts) if active_counts else 0
+
+    low_coverage = [r for r, c in active_counts.items() if c < avg_count * 0.7]
+    if low_coverage:
+        report.append(f"- **Low coverage releases**: {', '.join(sorted(low_coverage))} "
+                      f"have fewer jobs than average.\n\n")
+
+    # Check for profile gaps
+    for profile in sorted(profile_usage.keys()):
+        releases_with_profile = [r for r, c in profile_usage[profile].items() if c > 0]
+        if len(releases_with_profile) < len(active_releases_sorted) - 1:
+            missing = set(active_releases_sorted) - set(releases_with_profile)
+            if missing:
+                report.append(f"- 
**{profile}**: Missing from {', '.join(sorted(missing))}\n\n") + + report.append("### Consolidation Opportunities\n\n") + report.append("- Jobs that appear in all releases with same config could use shared workflows\n") + report.append("- Consider periodic-only coverage for older releases (4.17, 4.18)\n") + report.append("- Evaluate if all cluster profiles need coverage in all releases\n") + + # Write report + with open(output_file, 'w', encoding='utf-8') as f: + f.write(''.join(report)) + + print(f"Report written to {output_file}", file=sys.stderr) + + # Also write machine-readable data + json_output = output_file.replace('.md', '_data.json') + with open(json_output, 'w', encoding='utf-8') as f: + json.dump({ + 'release_counts': dict(release_counts), + 'workflow_usage': {k: dict(v) for k, v in workflow_usage.items()}, + 'profile_usage': {k: dict(v) for k, v in profile_usage.items()}, + 'coverage_gaps': gaps[:100], + 'test_categories': {k: len(v) for k, v in test_categories.items()}, + }, f, indent=2) + + print(f"Data written to {json_output}", file=sys.stderr) + + +def main(): + import os + script_dir = os.path.dirname(os.path.abspath(__file__)) + + parser = argparse.ArgumentParser( + description="Analyze OpenStack CI test coverage" + ) + parser.add_argument( + "--output-dir", + default=script_dir, + help="Directory for input/output files (default: script directory)" + ) + parser.add_argument( + "--inventory", + default="openstack_jobs_inventory.csv", + help="Inventory CSV filename (default: openstack_jobs_inventory.csv)" + ) + + args = parser.parse_args() + + output_dir = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Coverage Analysis") + print("=" * 60) + print(f"Output directory: {output_dir}") + print() + + inventory_path = os.path.join(output_dir, args.inventory) + output_path = os.path.join(output_dir, "coverage_gaps_report.md") + + jobs = load_inventory(inventory_path) + generate_report(jobs, output_path) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/analyze_platform_comparison.py b/.claude/skills/openstack-ci-analysis/scripts/analyze_platform_comparison.py new file mode 100755 index 0000000..3cc5658 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/analyze_platform_comparison.py @@ -0,0 +1,293 @@ +#!/usr/bin/env python3 +""" +Analyze platform comparison data and generate report. +Compares OpenStack CI pass rates against AWS, GCP, Azure, vSphere. 
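+
+Typical invocation (after fetch_comparison_data.py has produced
+platform_comparison_raw.json in the output directory):
+
+    python3 analyze_platform_comparison.py --output-dir /tmp/analysis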
+""" + +import argparse +import json +import os +import sys +from datetime import datetime + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] +TARGET_PLATFORMS = ["OpenStack", "AWS", "GCP", "Azure", "vSphere", "Metal"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Analyze platform comparison data and generate report" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_comparison_data(): + """Load platform comparison raw data.""" + filepath = os.path.join(OUTPUT_DIR, "platform_comparison_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_metrics(): + """Load extended metrics for OpenStack-specific data.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def analyze_platforms(data, openstack_metrics): + """Analyze platform comparison data.""" + results = { + "generated": datetime.now().isoformat(), + "overall": {}, + "by_release": {}, + "openstack_position": {}, + } + + # Overall platform comparison + overall = data.get("overall_by_platform", {}) + + # Calculate OpenStack baseline + openstack_rate = 0 + if "OpenStack" in overall: + openstack_rate = overall["OpenStack"].get("pass_rate", 0) + elif openstack_metrics: + openstack_rate = openstack_metrics.get("overall", {}).get("combined_pass_rate", 0) + + # Build comparison table + platforms = [] + for platform in TARGET_PLATFORMS: + if platform in overall: + pdata = overall[platform] + rate = pdata.get("pass_rate", 0) + delta = rate - openstack_rate if platform != "OpenStack" else 0 + platforms.append({ + "platform": platform, + "job_count": pdata.get("job_count", 0), + "total_runs": pdata.get("total_runs", 0), + "total_passes": pdata.get("total_passes", 0), + "pass_rate": rate, + "vs_openstack": delta, + }) + + # Sort by pass rate descending + platforms.sort(key=lambda x: -x["pass_rate"]) + results["overall"]["platforms"] = platforms + + # Find OpenStack position + for i, p in enumerate(platforms): + if p["platform"] == "OpenStack": + results["openstack_position"]["rank"] = i + 1 + results["openstack_position"]["total"] = len(platforms) + break + + # Per-release comparison + for release in RELEASES: + release_data = data.get("releases", {}).get(release, {}) + job_metrics = release_data.get("job_metrics", {}) + + release_platforms = [] + for platform in TARGET_PLATFORMS: + if platform in job_metrics: + pdata = job_metrics[platform] + release_platforms.append({ + "platform": platform, + "job_count": pdata.get("job_count", 0), + "total_runs": pdata.get("total_runs", 0), + "pass_rate": pdata.get("pass_rate", 0), + }) + + release_platforms.sort(key=lambda x: -x["pass_rate"]) + results["by_release"][release] = release_platforms + + return results + + +def generate_report(analysis): + """Generate markdown report for platform comparison.""" + report = [] + report.append("# Platform Comparison Report") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("This report compares OpenStack CI job pass rates against other cloud platforms.") + report.append("") + + # Executive summary + report.append("## Executive 
Summary") + report.append("") + + platforms = analysis.get("overall", {}).get("platforms", []) + pos = analysis.get("openstack_position", {}) + + if pos: + report.append(f"OpenStack ranks **#{pos.get('rank', '?')} of {pos.get('total', '?')}** platforms by pass rate.") + report.append("") + + # Find best performer for comparison + if platforms: + best = platforms[0] + openstack = next((p for p in platforms if p["platform"] == "OpenStack"), None) + if openstack and best["platform"] != "OpenStack": + gap = best["pass_rate"] - openstack["pass_rate"] + report.append(f"- **Gap to best ({best['platform']}):** {gap:+.1f}%") + if openstack: + report.append(f"- **OpenStack pass rate:** {openstack['pass_rate']:.1f}%") + report.append(f"- **OpenStack job volume:** {openstack['total_runs']:,} runs across {openstack['job_count']} jobs") + report.append("") + + # Overall comparison table + report.append("## Overall Platform Comparison") + report.append("") + report.append("| Rank | Platform | Jobs | Runs | Pass Rate | vs OpenStack |") + report.append("|------|----------|------|------|-----------|--------------|") + + for i, p in enumerate(platforms, 1): + delta = p.get("vs_openstack", 0) + delta_str = f"+{delta:.1f}%" if delta > 0 else (f"{delta:.1f}%" if delta < 0 else "baseline") + runs_str = f"{p['total_runs']:,}" if p['total_runs'] >= 1000 else str(p['total_runs']) + report.append( + f"| {i} | {p['platform']} | {p['job_count']} | {runs_str} | " + f"{p['pass_rate']:.1f}% | {delta_str} |" + ) + report.append("") + + # Key observations + report.append("## Key Observations") + report.append("") + + if platforms: + openstack = next((p for p in platforms if p["platform"] == "OpenStack"), None) + if openstack: + # Calculate how many platforms are better + better = [p for p in platforms if p["pass_rate"] > openstack["pass_rate"]] + worse = [p for p in platforms if p["pass_rate"] < openstack["pass_rate"]] + + if better: + report.append(f"### Platforms with Better Pass Rates ({len(better)})") + report.append("") + for p in better: + gap = p["pass_rate"] - openstack["pass_rate"] + report.append(f"- **{p['platform']}:** {p['pass_rate']:.1f}% (+{gap:.1f}% vs OpenStack)") + report.append("") + + if worse: + report.append(f"### Platforms with Lower Pass Rates ({len(worse)})") + report.append("") + for p in worse: + gap = openstack["pass_rate"] - p["pass_rate"] + report.append(f"- **{p['platform']}:** {p['pass_rate']:.1f}% (-{gap:.1f}% vs OpenStack)") + report.append("") + + # Per-release breakdown + report.append("## Pass Rate by Release") + report.append("") + report.append("| Release | " + " | ".join(TARGET_PLATFORMS) + " |") + report.append("|---------|" + "|".join(["-------"] * len(TARGET_PLATFORMS)) + "|") + + for release in RELEASES: + release_data = analysis.get("by_release", {}).get(release, []) + rates = {} + for p in release_data: + rates[p["platform"]] = p["pass_rate"] + + row = f"| {release} |" + for platform in TARGET_PLATFORMS: + if platform in rates: + row += f" {rates[platform]:.1f}% |" + else: + row += " - |" + report.append(row) + report.append("") + + # Analysis + report.append("## Analysis") + report.append("") + report.append("### Potential Causes for Pass Rate Differences") + report.append("") + report.append("1. **Infrastructure maturity**: Platforms with longer CI history may have more stable infrastructure") + report.append("2. **Test suite differences**: Each platform runs different test subsets") + report.append("3. 
**Job volume**: Higher volume platforms may have more resources/attention") + report.append("4. **Platform complexity**: Some platforms have inherent complexity differences") + report.append("") + + report.append("### Recommendations") + report.append("") + report.append("1. Investigate top-performing platform configurations for applicable improvements") + report.append("2. Compare test failure patterns across platforms") + report.append("3. Review infrastructure provisioning reliability") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Platform Comparison Analysis") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + data = load_comparison_data() + if not data: + print("Error: No platform comparison data found.") + print("Run fetch_comparison_data.py first.") + sys.exit(1) + + openstack_metrics = load_extended_metrics() + + print(f"Loaded data from: {data.get('fetched_at')}") + print() + + # Analyze + analysis = analyze_platforms(data, openstack_metrics) + + # Save analysis + analysis_path = os.path.join(OUTPUT_DIR, "platform_comparison_analysis.json") + with open(analysis_path, 'w') as f: + json.dump(analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "platform_comparison_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + platforms = analysis.get("overall", {}).get("platforms", []) + for i, p in enumerate(platforms, 1): + marker = " <-- OpenStack" if p["platform"] == "OpenStack" else "" + print(f" {i}. {p['platform']}: {p['pass_rate']:.1f}%{marker}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/analyze_redundancy.py b/.claude/skills/openstack-ci-analysis/scripts/analyze_redundancy.py new file mode 100755 index 0000000..fb1c92f --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/analyze_redundancy.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 +""" +Analyze OpenStack CI jobs for redundancy and consolidation opportunities. + +This script identifies: +1. Duplicate jobs between openshift and openshift-priv organizations +2. Similar tests running on the same code paths +3. Jobs with overlapping functionality +4. Presubmit jobs that could be consolidated +""" + +import argparse +import csv +import json +import sys +from collections import defaultdict +from pathlib import Path + + +def load_inventory(csv_path): + """Load job inventory from CSV.""" + jobs = [] + with open(csv_path, 'r', encoding='utf-8') as f: + reader = csv.DictReader(f) + for row in reader: + # Convert boolean strings to actual booleans + row['optional'] = row['optional'].lower() == 'true' + row['always_run'] = row['always_run'].lower() == 'true' + jobs.append(row) + return jobs + + +def analyze_same_workflow_same_branch(jobs): + """ + Find cases where multiple jobs in the SAME repo/branch use identical + workflow + cluster_profile combinations. + + These might be testing overlapping functionality and could potentially + be consolidated, though they may have different env vars or test suites. 
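+
+    Example: two presubmits on the same repo/branch that both run the
+    openshift-e2e-openstack-ipi workflow on openstack-vexxhost would be
+    grouped here for review.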
+ + NOTE: Jobs existing across different branches is EXPECTED, not redundant. + NOTE: Jobs in openshift/ vs openshift-priv/ are separate GitHub gates, not redundant. + """ + duplicates = [] + + # Group jobs by (org, repo, branch, workflow, cluster_profile) + job_groups = defaultdict(list) + for job in jobs: + if not job['workflow']: + continue + key = ( + job['org'], + job['repo'], + job['branch'], + job['workflow'], + job['cluster_profile'] + ) + job_groups[key].append(job) + + for key, group in job_groups.items(): + if len(group) > 1: + duplicates.append({ + 'org': key[0], + 'repo': key[1], + 'branch': key[2], + 'workflow': key[3], + 'cluster_profile': key[4], + 'job_count': len(group), + 'jobs': [j['job_name'] for j in group], + 'files': list(set(j['config_file'] for j in group)) + }) + + return duplicates + + +def analyze_presubmit_triggers(jobs): + """ + Analyze presubmit job trigger patterns. + Identify jobs that are always_run=true without throttling. + """ + presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit'] + + # Group by trigger pattern + always_run_no_throttle = [] + always_run_with_throttle = [] + optional_jobs = [] + conditional_jobs = [] + + for job in presubmit_jobs: + if job['always_run']: + if job['minimum_interval']: + always_run_with_throttle.append(job) + else: + always_run_no_throttle.append(job) + elif job['optional']: + optional_jobs.append(job) + elif job['run_if_changed'] or job['skip_if_only_changed']: + conditional_jobs.append(job) + else: + # Default presubmit (runs on PR but not always) + optional_jobs.append(job) + + return { + 'always_run_no_throttle': always_run_no_throttle, + 'always_run_with_throttle': always_run_with_throttle, + 'optional': optional_jobs, + 'conditional': conditional_jobs, + } + + +def analyze_branch_consistency(jobs): + """ + Find jobs that exist on some branches but not others. + Helps identify inconsistent coverage across releases. + """ + # Group by (org, repo, job_name) + job_groups = defaultdict(set) + for job in jobs: + key = (job['org'], job['repo'], job['job_name']) + job_groups[key].add(job['branch']) + + # Find jobs that have inconsistent branch coverage + inconsistencies = [] + repo_branches = defaultdict(set) + + for job in jobs: + repo_branches[(job['org'], job['repo'])].add(job['branch']) + + for (org, repo, job_name), branches in job_groups.items(): + all_branches = repo_branches[(org, repo)] + missing = all_branches - branches + if missing and len(branches) > 1: + inconsistencies.append({ + 'org': org, + 'repo': repo, + 'job_name': job_name, + 'present_branches': sorted(branches), + 'missing_branches': sorted(missing), + }) + + return inconsistencies + + +def generate_report(jobs, output_file): + """Generate comprehensive redundancy report.""" + report = [] + report.append("# OpenStack CI Job Redundancy Analysis Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + report.append("\n## Understanding This Report\n\n") + report.append("**What is NOT redundant:**\n") + report.append("- Jobs existing across different branches (release-4.20, release-4.21, etc.)\n") + report.append("- Jobs in both openshift/ and openshift-priv/ (separate GitHub gates)\n\n") + report.append("**What MAY be redundant:**\n") + report.append("- Multiple jobs in the SAME repo/branch using identical workflow+cluster\n") + report.append("- Jobs with overlapping test coverage\n\n") + + # Same Workflow/Cluster in Same Repo/Branch + report.append("\n## 1. 
Multiple Jobs with Same Workflow+Cluster\n") + report.append("Cases where multiple jobs in the SAME repo/branch use identical\n") + report.append("workflow + cluster_profile combinations.\n\n") + report.append("These MAY be intentional (different test suites, env vars) or\n") + report.append("could potentially be consolidated.\n\n") + + workflow_dups = analyze_same_workflow_same_branch(jobs) + if workflow_dups: + report.append(f"Found {len(workflow_dups)} cases of workflow duplication.\n\n") + report.append("| Org/Repo | Branch | Workflow | Jobs |\n") + report.append("|----------|--------|----------|------|\n") + for dup in sorted(workflow_dups, + key=lambda x: x['job_count'], reverse=True)[:20]: + jobs_str = ', '.join(dup['jobs'][:3]) + if len(dup['jobs']) > 3: + jobs_str += f" (+{len(dup['jobs'])-3} more)" + report.append( + f"| {dup['org']}/{dup['repo']} | {dup['branch']} | " + f"{dup['workflow']} | {jobs_str} |\n" + ) + else: + report.append("No workflow duplications found.\n") + + # Presubmit Trigger Analysis + report.append("\n## 2. Presubmit Trigger Analysis\n") + triggers = analyze_presubmit_triggers(jobs) + + report.append(f"\n### Trigger Pattern Summary\n\n") + report.append("| Pattern | Count | % of Presubmits |\n") + report.append("|---------|-------|------------------|\n") + + total_presubmit = sum(len(v) for v in triggers.values()) + for pattern, jobs_list in triggers.items(): + pct = len(jobs_list) / total_presubmit * 100 if total_presubmit else 0 + report.append(f"| {pattern} | {len(jobs_list)} | {pct:.1f}% |\n") + + # Always run without throttle is concerning + if triggers['always_run_no_throttle']: + report.append("\n### Always Run Jobs Without Throttling\n") + report.append("These run on every PR without minimum_interval.\n\n") + by_repo = defaultdict(list) + for job in triggers['always_run_no_throttle']: + by_repo[(job['org'], job['repo'])].append(job) + + report.append("| Org/Repo | Jobs |\n") + report.append("|----------|------|\n") + for (org, repo), jobs_list in sorted(by_repo.items()): + job_names = ', '.join(set(j['job_name'] for j in jobs_list))[:60] + report.append(f"| {org}/{repo} | {job_names} |\n") + + # Branch Consistency + report.append("\n## 3. Branch Coverage Inconsistencies\n") + report.append("Jobs present on some branches but missing from others.\n\n") + + inconsistencies = analyze_branch_consistency(jobs) + if inconsistencies: + # Filter to significant inconsistencies (missing recent releases) + significant = [i for i in inconsistencies + if any('release-4.2' in b or 'main' in b or 'master' in b + for b in i['missing_branches'])] + + report.append(f"Found {len(significant)} significant inconsistencies.\n\n") + + if significant: + report.append("| Org/Repo | Job | Missing Branches |\n") + report.append("|----------|-----|------------------|\n") + for inc in significant[:30]: + missing = ', '.join(inc['missing_branches'][:3]) + if len(inc['missing_branches']) > 3: + missing += f" (+{len(inc['missing_branches'])-3})" + report.append( + f"| {inc['org']}/{inc['repo']} | {inc['job_name']} | {missing} |\n" + ) + else: + report.append("No significant inconsistencies found.\n") + + # Consolidation Opportunities + report.append("\n## 4. Recommendations\n\n") + + report.append("### Review Items\n\n") + + if workflow_dups: + report.append("1. **Same workflow+cluster jobs**: Review jobs using identical\n") + report.append(" workflow+cluster in the same repo/branch. 
These may have\n") + report.append(" different env vars or test suites, but could potentially\n") + report.append(" be consolidated if testing overlapping functionality.\n") + report.append(f" - Cases to review: {len(workflow_dups)}\n\n") + + report.append("2. **Always-run jobs**: Review jobs marked `always_run: true` " + "without `minimum_interval` throttling.\n") + report.append(f" - Jobs to review: {len(triggers['always_run_no_throttle'])}\n\n") + + report.append("3. **Branch inconsistencies**: Consider adding missing jobs " + "to recent release branches for consistent coverage.\n") + report.append(f" - Inconsistencies found: {len(inconsistencies)}\n") + + # Write report + with open(output_file, 'w', encoding='utf-8') as f: + f.write(''.join(report)) + + print(f"Report written to {output_file}", file=sys.stderr) + + # Also write machine-readable data + json_output = output_file.replace('.md', '_data.json') + with open(json_output, 'w', encoding='utf-8') as f: + json.dump({ + 'same_workflow_same_branch': workflow_dups, + 'trigger_analysis': {k: len(v) for k, v in triggers.items()}, + 'branch_inconsistencies': inconsistencies[:100], + }, f, indent=2) + + print(f"Data written to {json_output}", file=sys.stderr) + + +def main(): + import os + script_dir = os.path.dirname(os.path.abspath(__file__)) + + parser = argparse.ArgumentParser( + description="Analyze OpenStack CI jobs for redundancy" + ) + parser.add_argument( + "--output-dir", + default=script_dir, + help="Directory for input/output files (default: script directory)" + ) + parser.add_argument( + "--inventory", + default="openstack_jobs_inventory.csv", + help="Inventory CSV filename (default: openstack_jobs_inventory.csv)" + ) + + args = parser.parse_args() + + output_dir = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Redundancy Analysis") + print("=" * 60) + print(f"Output directory: {output_dir}") + print() + + inventory_path = os.path.join(output_dir, args.inventory) + output_path = os.path.join(output_dir, "redundant_jobs_report.md") + + jobs = load_inventory(inventory_path) + generate_report(jobs, output_path) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/analyze_triggers.py b/.claude/skills/openstack-ci-analysis/scripts/analyze_triggers.py new file mode 100755 index 0000000..02e2799 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/analyze_triggers.py @@ -0,0 +1,403 @@ +#!/usr/bin/env python3 +""" +Analyze OpenStack CI job triggers for optimization opportunities. + +This script identifies: +1. Jobs missing file-change filters (skip_if_only_changed, run_if_changed) +2. Always-run jobs without throttling +3. Repos that could benefit from smarter triggering +4. 
Recommended patterns for skip_if_only_changed
+"""

+import argparse
+import csv
+import json
+import sys
+from collections import defaultdict
+from pathlib import Path
+
+
+# Common patterns for files that typically don't need E2E tests
+SKIP_PATTERNS = {
+    'documentation': [
+        r'^docs/',
+        r'\.md$',
+        r'^README',
+    ],
+    'ownership': [
+        r'(^|/)OWNERS(_ALIASES)?$',
+    ],
+    'github_config': [
+        r'^\.github/',
+    ],
+    'general': [
+        r'^CHANGELOG',
+        r'^LICENSE',
+        r'^DCO',
+        r'^SECURITY\.md$',
+    ],
+}
+
+# Suggested skip pattern for E2E tests (single backslash: matches a
+# literal "." before "md", consistent with SKIP_PATTERNS above)
+SUGGESTED_SKIP_PATTERN = r'(^docs/)|(\.md$)|((^|/)OWNERS(_ALIASES)?$)'
+
+
+def load_inventory(csv_path):
+    """Load job inventory from CSV."""
+    jobs = []
+    with open(csv_path, 'r', encoding='utf-8') as f:
+        reader = csv.DictReader(f)
+        for row in reader:
+            row['optional'] = row['optional'].lower() == 'true'
+            row['always_run'] = row['always_run'].lower() == 'true'
+            jobs.append(row)
+    return jobs
+
+
+def analyze_trigger_patterns(jobs):
+    """
+    Analyze the current trigger patterns used across jobs.
+    """
+    patterns = {
+        'has_skip_if_only_changed': [],
+        'has_run_if_changed': [],
+        'has_minimum_interval': [],
+        'always_run_true': [],
+        'optional_true': [],
+        'no_filters': [],  # Jobs with no trigger optimization
+    }
+
+    for job in jobs:
+        if job['skip_if_only_changed']:
+            patterns['has_skip_if_only_changed'].append(job)
+        if job['run_if_changed']:
+            patterns['has_run_if_changed'].append(job)
+        if job['minimum_interval']:
+            patterns['has_minimum_interval'].append(job)
+        if job['always_run']:
+            patterns['always_run_true'].append(job)
+        if job['optional']:
+            patterns['optional_true'].append(job)
+
+        # Jobs that could benefit from trigger optimization
+        if (job['job_type'] == 'presubmit' and
+                not job['skip_if_only_changed'] and
+                not job['run_if_changed'] and
+                not job['optional']):
+            patterns['no_filters'].append(job)
+
+    return patterns
+
+
+def group_jobs_by_repo(jobs):
+    """Group jobs by org/repo for analysis."""
+    repos = defaultdict(list)
+    for job in jobs:
+        key = (job['org'], job['repo'])
+        repos[key].append(job)
+    return repos
+
+
+def analyze_repo_trigger_status(repos):
+    """
+    For each repo, determine if it would benefit from skip_if_only_changed.
+    """
+    repo_analysis = []
+
+    for (org, repo), jobs in repos.items():
+        presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit']
+
+        if not presubmit_jobs:
+            continue
+
+        # Count jobs with/without filters
+        with_skip = len([j for j in presubmit_jobs if j['skip_if_only_changed']])
+        with_run_if = len([j for j in presubmit_jobs if j['run_if_changed']])
+        optional = len([j for j in presubmit_jobs if j['optional']])
+        always_run = len([j for j in presubmit_jobs if j['always_run']])
+        no_filter = len([j for j in presubmit_jobs
+                         if not j['skip_if_only_changed']
+                         and not j['run_if_changed']
+                         and not j['optional']])
+
+        # Determine if repo could benefit
+        could_benefit = no_filter > 0 and with_skip == 0
+
+        repo_analysis.append({
+            'org': org,
+            'repo': repo,
+            'total_presubmit': len(presubmit_jobs),
+            'with_skip_pattern': with_skip,
+            'with_run_if_changed': with_run_if,
+            'optional': optional,
+            'always_run': always_run,
+            'no_filter': no_filter,
+            'could_benefit': could_benefit,
+            'job_names': sorted(set(j['job_name'] for j in presubmit_jobs)),
+        })
+
+    return repo_analysis
+
+
+def analyze_always_run_jobs(jobs):
+    """
+    Find jobs that are always_run=true without throttling.
+    These run on every PR and should be reviewed.
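+
+    Example: a presubmit with always_run=true and no minimum_interval set
+    lands in the "without_throttle" bucket returned below.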
+ """ + always_run_jobs = [j for j in jobs + if j['always_run'] and j['job_type'] == 'presubmit'] + + # Group by whether they have minimum_interval + with_throttle = [j for j in always_run_jobs if j['minimum_interval']] + without_throttle = [j for j in always_run_jobs if not j['minimum_interval']] + + return { + 'with_throttle': with_throttle, + 'without_throttle': without_throttle, + } + + +def analyze_periodic_schedules(jobs): + """ + Analyze periodic job schedules for optimization. + """ + periodic_jobs = [j for j in jobs if j['job_type'] == 'periodic'] + + # Group by schedule pattern + schedules = defaultdict(list) + for job in periodic_jobs: + schedules[job['schedule']].append(job) + + return schedules + + +def generate_report(jobs, output_file): + """Generate comprehensive trigger optimization report.""" + report = [] + report.append("# OpenStack CI Trigger Optimization Report\n") + report.append(f"Total jobs analyzed: {len(jobs)}\n") + + presubmit_jobs = [j for j in jobs if j['job_type'] == 'presubmit'] + periodic_jobs = [j for j in jobs if j['job_type'] == 'periodic'] + + report.append(f"- Presubmit jobs: {len(presubmit_jobs)}\n") + report.append(f"- Periodic jobs: {len(periodic_jobs)}\n") + + # Trigger Pattern Analysis + report.append("\n## 1. Current Trigger Pattern Usage\n\n") + patterns = analyze_trigger_patterns(jobs) + + report.append("| Pattern | Count | % of Presubmits |\n") + report.append("|---------|-------|------------------|\n") + + total_pre = len(presubmit_jobs) + for pattern, pattern_jobs in patterns.items(): + count = len([j for j in pattern_jobs if j['job_type'] == 'presubmit']) + pct = count / total_pre * 100 if total_pre else 0 + report.append(f"| {pattern} | {count} | {pct:.1f}% |\n") + + # Jobs Missing Filters + report.append("\n## 2. Jobs Without Trigger Optimization\n") + report.append("Presubmit jobs without skip_if_only_changed, " + "run_if_changed, or optional flags.\n\n") + + no_filter_jobs = patterns['no_filters'] + if no_filter_jobs: + # Group by repo + by_repo = defaultdict(list) + for job in no_filter_jobs: + by_repo[(job['org'], job['repo'])].append(job) + + report.append(f"Found {len(no_filter_jobs)} jobs across " + f"{len(by_repo)} repositories that could benefit from " + f"trigger optimization.\n\n") + + report.append("| Org/Repo | Jobs Without Filters | Job Names |\n") + report.append("|----------|----------------------|-----------|\n") + + for (org, repo), repo_jobs in sorted(by_repo.items(), + key=lambda x: len(x[1]), + reverse=True)[:20]: + names = ', '.join(set(j['job_name'] for j in repo_jobs))[:50] + if len(names) >= 50: + names += "..." + report.append(f"| {org}/{repo} | {len(repo_jobs)} | {names} |\n") + else: + report.append("All presubmit jobs have some form of trigger optimization.\n") + + # Repository Analysis + report.append("\n## 3. 
Repository Trigger Analysis\n") + report.append("Repositories that could benefit from adding " + "`skip_if_only_changed` patterns.\n\n") + + repos = group_jobs_by_repo(jobs) + repo_analysis = analyze_repo_trigger_status(repos) + + # Filter to repos that could benefit + could_benefit = [r for r in repo_analysis if r['could_benefit']] + + if could_benefit: + report.append(f"Found {len(could_benefit)} repositories that could " + f"add skip patterns.\n\n") + + report.append("| Org/Repo | Presubmits | No Filter | Suggested Action |\n") + report.append("|----------|------------|-----------|------------------|\n") + + for repo in sorted(could_benefit, + key=lambda x: x['no_filter'], reverse=True)[:25]: + action = f"Add skip_if_only_changed to {repo['no_filter']} jobs" + report.append( + f"| {repo['org']}/{repo['repo']} | {repo['total_presubmit']} | " + f"{repo['no_filter']} | {action} |\n" + ) + else: + report.append("All repositories have adequate trigger patterns.\n") + + # Suggested Skip Pattern + report.append("\n## 4. Recommended skip_if_only_changed Patterns\n\n") + report.append("For OpenStack E2E tests, we recommend:\n\n") + report.append("```yaml\n") + report.append("skip_if_only_changed: ") + report.append(f"{SUGGESTED_SKIP_PATTERN}\n") + report.append("```\n\n") + + report.append("This pattern skips the job when changes only affect:\n") + report.append("- Documentation files (`docs/` directory)\n") + report.append("- Markdown files (`*.md`)\n") + report.append("- OWNERS files\n\n") + + report.append("### Individual Component Patterns\n\n") + for category, patterns_list in SKIP_PATTERNS.items(): + report.append(f"**{category}:**\n") + for p in patterns_list: + report.append(f"- `{p}`\n") + report.append("\n") + + # Periodic Schedule Analysis + report.append("\n## 5. Periodic Job Schedule Analysis\n\n") + schedules = analyze_periodic_schedules(jobs) + + if schedules: + report.append("| Schedule | Jobs | Examples |\n") + report.append("|----------|------|----------|\n") + + for schedule, sched_jobs in sorted(schedules.items()): + examples = ', '.join(set(j['job_name'] for j in sched_jobs))[:40] + if len(examples) >= 40: + examples += "..." + report.append(f"| {schedule} | {len(sched_jobs)} | {examples} |\n") + else: + report.append("No periodic jobs found.\n") + + # Optimization Recommendations + report.append("\n## 6. Optimization Recommendations\n\n") + + report.append("### High Impact\n\n") + + if could_benefit: + report.append(f"1. **Add skip_if_only_changed to {len(could_benefit)} repos**: " + f"Approximately {sum(r['no_filter'] for r in could_benefit)} jobs " + f"could skip runs on docs-only PRs.\n\n") + + # Calculate potential savings + total_no_filter = len(patterns['no_filters']) + report.append(f"2. **Total presubmit jobs without filters**: {total_no_filter}\n") + report.append(" These jobs run on every non-optional PR regardless of " + "which files changed.\n\n") + + report.append("### Medium Impact\n\n") + report.append("3. **Review always_run jobs**: Ensure jobs marked `always_run: true` " + "are truly required for every PR.\n\n") + + report.append("4. **Add minimum_interval to high-frequency jobs**: " + "Throttle jobs that don't need to run on every commit.\n\n") + + report.append("### Implementation Steps\n\n") + report.append("1. 
For each repo without skip patterns:\n")
+    report.append("   - Identify which test jobs are full E2E (vs unit tests)\n")
+    report.append("   - Add `skip_if_only_changed` to E2E tests\n")
+    report.append("   - Keep unit tests running on all changes\n\n")
+
+    report.append("2. Example config change:\n")
+    report.append("```yaml\n")
+    report.append("tests:\n")
+    report.append("- as: e2e-openstack\n")
+    report.append("  skip_if_only_changed: (^docs/)|(\\.md$)|((^|/)OWNERS(_ALIASES)?$)\n")
+    report.append("  steps:\n")
+    report.append("    cluster_profile: openstack-vexxhost\n")
+    report.append("    workflow: openshift-e2e-openstack-ipi\n")
+    report.append("```\n")
+
+    # Write report
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write(''.join(report))
+
+    print(f"Report written to {output_file}", file=sys.stderr)
+
+    # Also write machine-readable data
+    json_output = output_file.replace('.md', '_data.json')
+    with open(json_output, 'w', encoding='utf-8') as f:
+        json.dump({
+            'trigger_patterns': {k: len(v) for k, v in patterns.items()},
+            'repos_without_skip': [
+                {
+                    'org': r['org'],
+                    'repo': r['repo'],
+                    'jobs_without_filter': r['no_filter'],
+                    'job_names': r['job_names'],
+                }
+                for r in could_benefit
+            ],
+            'jobs_without_filter': [
+                {
+                    'org': j['org'],
+                    'repo': j['repo'],
+                    'branch': j['branch'],
+                    'job_name': j['job_name'],
+                }
+                for j in patterns['no_filters']
+            ],
+            'periodic_schedules': {k: len(v) for k, v in schedules.items()},
+            'suggested_pattern': SUGGESTED_SKIP_PATTERN,
+        }, f, indent=2)
+
+    print(f"Data written to {json_output}", file=sys.stderr)
+
+
+def main():
+    import os
+    script_dir = os.path.dirname(os.path.abspath(__file__))
+
+    parser = argparse.ArgumentParser(
+        description="Analyze OpenStack CI job triggers for optimization"
+    )
+    parser.add_argument(
+        "--output-dir",
+        default=script_dir,
+        help="Directory for input/output files (default: script directory)"
+    )
+    parser.add_argument(
+        "--inventory",
+        default="openstack_jobs_inventory.csv",
+        help="Inventory CSV filename (default: openstack_jobs_inventory.csv)"
+    )
+
+    args = parser.parse_args()
+
+    output_dir = os.path.abspath(args.output_dir)
+
+    print("=" * 60)
+    print("OpenStack CI Trigger Optimization Analysis")
+    print("=" * 60)
+    print(f"Output directory: {output_dir}")
+    print()
+
+    inventory_path = os.path.join(output_dir, args.inventory)
+    output_path = os.path.join(output_dir, "trigger_optimization_report.md")
+
+    jobs = load_inventory(inventory_path)
+    generate_report(jobs, output_path)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/.claude/skills/openstack-ci-analysis/scripts/analyze_workflow_passrate.py b/.claude/skills/openstack-ci-analysis/scripts/analyze_workflow_passrate.py
new file mode 100755
index 0000000..5d26a60
--- /dev/null
+++ b/.claude/skills/openstack-ci-analysis/scripts/analyze_workflow_passrate.py
@@ -0,0 +1,435 @@
+#!/usr/bin/env python3
+"""
+Analyze workflow pass rates by correlating job inventory with Sippy metrics.
+Maps inventory job names to Sippy data using substring matching.
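+
+Typical invocation (requires openstack_jobs_inventory.json and
+sippy_jobs_raw.json in the output directory):
+
+    python3 analyze_workflow_passrate.py --output-dir /tmp/analysis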
+""" + +import argparse +import json +import os +import sys +from datetime import datetime +from collections import defaultdict + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Analyze workflow pass rates" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_job_inventory(): + """Load job inventory.""" + filepath = os.path.join(OUTPUT_DIR, "openstack_jobs_inventory.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_sippy_data(): + """Load Sippy job data.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_metrics_jobs(): + """Load extended metrics per job.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def extract_workflow_from_name(job_name): + """Extract workflow pattern from job name.""" + # Common workflow patterns in OpenStack job names + patterns = [ + "openshift-e2e-openstack-ipi", + "openshift-e2e-openstack-upi", + "openshift-upgrade-openstack", + "openshift-e2e-openstack", + "openshift-installer-openstack", + ] + + # Check for specific test scenarios in name + name_lower = job_name.lower() + + # Extract key test characteristics + characteristics = [] + if "serial" in name_lower: + characteristics.append("serial") + if "parallel" in name_lower: + characteristics.append("parallel") + if "fips" in name_lower: + characteristics.append("fips") + if "proxy" in name_lower: + characteristics.append("proxy") + if "dualstack" in name_lower: + characteristics.append("dualstack") + if "singlestackv6" in name_lower or "single-stack-v6" in name_lower: + characteristics.append("singlestackv6") + if "upgrade" in name_lower: + characteristics.append("upgrade") + if "nfv" in name_lower: + characteristics.append("nfv") + if "hwoffload" in name_lower: + characteristics.append("hwoffload") + if "ccpmso" in name_lower: + characteristics.append("ccpmso") + if "csi" in name_lower: + characteristics.append("csi") + if "manila" in name_lower: + characteristics.append("manila") + if "cinder" in name_lower: + characteristics.append("cinder") + if "externallb" in name_lower: + characteristics.append("externallb") + if "kuryr" in name_lower: + characteristics.append("kuryr") + if "hypershift" in name_lower: + characteristics.append("hypershift") + if "techpreview" in name_lower: + characteristics.append("techpreview") + if "etcd" in name_lower: + characteristics.append("etcd") + + if characteristics: + return "-".join(sorted(characteristics)) + return "e2e-default" + + +def correlate_jobs(inventory, sippy_data, extended_jobs): + """Correlate inventory jobs with Sippy data.""" + # Build Sippy job lookup by name + sippy_lookup = {} + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + for job in jobs: + name = job.get("name", "") + sippy_lookup[name] = { + "release": release, + "current_runs": job.get("current_runs", 0), + "current_passes": job.get("current_passes", 0), + "previous_runs": job.get("previous_runs", 0), + "previous_passes": 
job.get("previous_passes", 0), + "pass_rate": job.get("current_pass_percentage", 0), + } + + # Build extended metrics lookup + extended_lookup = {} + if extended_jobs: + for job in extended_jobs: + name = job.get("name", "") + extended_lookup[name] = job + + # Group inventory jobs by workflow + workflow_jobs = defaultdict(list) + + for inv_job in inventory: + job_name = inv_job.get("job_name", "") + workflow = inv_job.get("workflow", "") or extract_workflow_from_name(job_name) + job_type = inv_job.get("job_type", "") + + # Only analyze periodic jobs (which have Sippy data) + if job_type != "periodic": + continue + + # Try to find matching Sippy job + sippy_match = None + extended_match = None + + # Look for exact or partial match + for sippy_name, sippy_job in sippy_lookup.items(): + # Check if inventory job name is in Sippy job name or vice versa + if job_name in sippy_name or sippy_name.endswith(job_name): + sippy_match = sippy_job + extended_match = extended_lookup.get(sippy_name) + break + + job_info = { + "job_name": job_name, + "workflow": workflow, + "cluster_profile": inv_job.get("cluster_profile", ""), + "org": inv_job.get("org", ""), + "repo": inv_job.get("repo", ""), + "branch": inv_job.get("branch", ""), + "has_sippy_data": sippy_match is not None, + } + + if sippy_match: + job_info.update({ + "release": sippy_match.get("release", ""), + "current_runs": sippy_match.get("current_runs", 0), + "current_passes": sippy_match.get("current_passes", 0), + "previous_runs": sippy_match.get("previous_runs", 0), + "previous_passes": sippy_match.get("previous_passes", 0), + "pass_rate": sippy_match.get("pass_rate", 0), + }) + if extended_match: + job_info["combined_runs"] = extended_match.get("combined_runs", 0) + job_info["combined_pass_rate"] = extended_match.get("combined_pass_rate", 0) + job_info["trend"] = extended_match.get("trend", "") + + # Extract scenario from job name + scenario = extract_workflow_from_name(job_name) + workflow_jobs[scenario].append(job_info) + + return workflow_jobs + + +def analyze_workflows(workflow_jobs): + """Analyze pass rates by workflow.""" + results = { + "generated": datetime.now().isoformat(), + "workflows": [], + "summary": {}, + } + + workflow_stats = [] + + for workflow, jobs in workflow_jobs.items(): + jobs_with_data = [j for j in jobs if j.get("has_sippy_data")] + + if not jobs_with_data: + continue + + total_runs = sum(j.get("current_runs", 0) + j.get("previous_runs", 0) for j in jobs_with_data) + total_passes = sum(j.get("current_passes", 0) + j.get("previous_passes", 0) for j in jobs_with_data) + pass_rate = (total_passes / total_runs * 100) if total_runs > 0 else 0 + + # Count problem jobs + problem_jobs = [j for j in jobs_with_data if j.get("pass_rate", 100) < 80] + + # Calculate trend + improving = sum(1 for j in jobs_with_data if j.get("trend") == "improving") + degrading = sum(1 for j in jobs_with_data if j.get("trend") == "degrading") + + trend = "stable" + if improving > degrading and improving > 0: + trend = "improving" + elif degrading > improving and degrading > 0: + trend = "degrading" + + # Determine severity + severity = "ok" + if pass_rate < 50: + severity = "critical" + elif pass_rate < 70: + severity = "warning" + elif pass_rate < 80: + severity = "needs_attention" + + workflow_stats.append({ + "workflow": workflow, + "job_count": len(jobs_with_data), + "total_runs": total_runs, + "total_passes": total_passes, + "pass_rate": pass_rate, + "problem_job_count": len(problem_jobs), + "trend": trend, + "severity": severity, + "jobs": 
jobs_with_data, + }) + + # Sort by pass rate (lowest first = most problematic) + workflow_stats.sort(key=lambda x: x["pass_rate"]) + + results["workflows"] = workflow_stats + + # Summary + total_workflows = len(workflow_stats) + critical = sum(1 for w in workflow_stats if w["severity"] == "critical") + warning = sum(1 for w in workflow_stats if w["severity"] == "warning") + + results["summary"] = { + "total_workflows_analyzed": total_workflows, + "critical_workflows": critical, + "warning_workflows": warning, + "ok_workflows": total_workflows - critical - warning, + } + + return results + + +def generate_report(analysis): + """Generate markdown report for workflow analysis.""" + report = [] + report.append("# Workflow Pass Rate Analysis") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("This report analyzes pass rates grouped by test workflow/scenario type.") + report.append("") + + # Summary + summary = analysis.get("summary", {}) + report.append("## Summary") + report.append("") + report.append(f"| Metric | Count |") + report.append(f"|--------|-------|") + report.append(f"| Total Workflows Analyzed | {summary.get('total_workflows_analyzed', 0)} |") + report.append(f"| Critical (<50% pass rate) | {summary.get('critical_workflows', 0)} |") + report.append(f"| Warning (50-70% pass rate) | {summary.get('warning_workflows', 0)} |") + report.append(f"| OK (>70% pass rate) | {summary.get('ok_workflows', 0)} |") + report.append("") + + workflows = analysis.get("workflows", []) + + # Critical workflows + critical = [w for w in workflows if w["severity"] == "critical"] + if critical: + report.append("## Critical Workflows (Pass Rate < 50%)") + report.append("") + report.append("These workflows require immediate attention:") + report.append("") + report.append("| Workflow | Jobs | Runs | Pass Rate | Trend |") + report.append("|----------|------|------|-----------|-------|") + for w in critical: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + report.append( + f"| {w['workflow']} | {w['job_count']} | {w['total_runs']} | " + f"**{w['pass_rate']:.1f}%** | {trend_icon} |" + ) + report.append("") + + # Warning workflows + warning = [w for w in workflows if w["severity"] == "warning"] + if warning: + report.append("## Warning Workflows (Pass Rate 50-70%)") + report.append("") + report.append("| Workflow | Jobs | Runs | Pass Rate | Trend |") + report.append("|----------|------|------|-----------|-------|") + for w in warning: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + report.append( + f"| {w['workflow']} | {w['job_count']} | {w['total_runs']} | " + f"{w['pass_rate']:.1f}% | {trend_icon} |" + ) + report.append("") + + # All workflows table + report.append("## All Workflows by Pass Rate") + report.append("") + report.append("| Rank | Workflow | Jobs | Runs | Pass Rate | Problems | Trend |") + report.append("|------|----------|------|------|-----------|----------|-------|") + for i, w in enumerate(workflows, 1): + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(w["trend"], "") + severity_marker = "" + if w["severity"] == "critical": + severity_marker = " ⚠️" + elif w["severity"] == "warning": + severity_marker = " ⚡" + report.append( + f"| {i} | {w['workflow']}{severity_marker} | {w['job_count']} | " + f"{w['total_runs']} | {w['pass_rate']:.1f}% | {w['problem_job_count']} | {trend_icon} |" + ) + report.append("") + + # Recommendations + 
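+    # The action items below are derived mechanically from the severity buckets
+    # computed in analyze_workflows(): the five lowest-pass-rate critical and
+    # warning workflows each get one bullet. Illustrative rendered line
+    # (hypothetical workflow name):
+    #   - **openstack-nfv**: 42.0% pass rate with 57 runs - investigate root cause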
report.append("## Recommendations") + report.append("") + if critical: + report.append("### Immediate Actions") + report.append("") + for w in critical[:5]: + report.append(f"- **{w['workflow']}**: {w['pass_rate']:.1f}% pass rate with {w['total_runs']} runs - investigate root cause") + report.append("") + + if warning: + report.append("### Short-term Improvements") + report.append("") + for w in warning[:5]: + report.append(f"- **{w['workflow']}**: {w['pass_rate']:.1f}% pass rate - monitor and triage failures") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Sources: Job inventory + [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Workflow Pass Rate Analysis") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + inventory = load_job_inventory() + if not inventory: + print("Error: No job inventory found. Run extract_openstack_jobs.py first.") + sys.exit(1) + print(f"Loaded inventory: {len(inventory)} jobs") + + sippy_data = load_sippy_data() + if not sippy_data: + print("Error: No Sippy data found. Run fetch_job_metrics.py first.") + sys.exit(1) + print(f"Loaded Sippy data from: {sippy_data.get('fetched_at')}") + + extended_jobs = load_extended_metrics_jobs() + print(f"Extended metrics loaded: {extended_jobs is not None}") + print() + + # Correlate and analyze + workflow_jobs = correlate_jobs(inventory, sippy_data, extended_jobs) + print(f"Found {len(workflow_jobs)} workflow types") + + analysis = analyze_workflows(workflow_jobs) + + # Save results + analysis_path = os.path.join(OUTPUT_DIR, "workflow_passrate_analysis.json") + with open(analysis_path, 'w') as f: + # Remove job details for smaller output + save_analysis = dict(analysis) + save_analysis["workflows"] = [ + {k: v for k, v in w.items() if k != "jobs"} + for w in analysis["workflows"] + ] + json.dump(save_analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "workflow_passrate_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + summary = analysis.get("summary", {}) + print(f" Workflows analyzed: {summary.get('total_workflows_analyzed', 0)}") + print(f" Critical (<50%): {summary.get('critical_workflows', 0)}") + print(f" Warning (50-70%): {summary.get('warning_workflows', 0)}") + print(f" OK (>70%): {summary.get('ok_workflows', 0)}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/categorize_failures.py b/.claude/skills/openstack-ci-analysis/scripts/categorize_failures.py new file mode 100755 index 0000000..569c3f8 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/categorize_failures.py @@ -0,0 +1,417 @@ +#!/usr/bin/env python3 +""" +Categorize job failures using heuristic classification. 
+Categories: Infrastructure, Flaky, Product Bug, Unknown/Needs Triage +""" + +import argparse +import json +import os +import sys +from datetime import datetime +from collections import defaultdict + +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Categorize job failures using heuristic classification" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + + +def load_extended_metrics(): + """Load extended metrics data.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_extended_jobs(): + """Load extended metrics per job.""" + filepath = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_sippy_data(): + """Load raw Sippy data for additional context.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def categorize_job(job): + """ + Categorize a job failure based on heuristics. + + Categories: + - infrastructure: Likely infrastructure/provisioning issues + - flaky: Inconsistent pass rates (30-70%) + - product_bug: Consistent failures with bugs filed + - needs_triage: Unknown, requires investigation + """ + name = job.get("name", "").lower() + brief_name = job.get("brief_name", "").lower() + combined_rate = job.get("combined_pass_rate") + current_rate = job.get("current_pass_rate") + open_bugs = job.get("open_bugs", 0) + combined_runs = job.get("combined_runs", 0) + trend = job.get("trend", "") + + # Skip jobs with no data + if combined_rate is None or combined_runs < 2: + return None, "insufficient_data" + + # Jobs at or above 80% are not problem jobs + if combined_rate >= 80: + return None, "passing" + + # Category determination heuristics + + # 1. Product Bug: 0% pass rate or very low with bugs filed + if combined_rate == 0: + if open_bugs > 0: + return "product_bug", "0% pass rate with filed bugs" + else: + return "needs_triage", "0% pass rate, no bugs filed" + + # 2. Infrastructure indicators + infra_keywords = [ + "install", "provision", "bootstrap", "create", + "vpc", "network", "dns", "loadbalancer", "lb", + ] + is_infra_job = any(kw in name or kw in brief_name for kw in infra_keywords) + + if combined_rate < 30 and is_infra_job: + return "infrastructure", "Low pass rate on infrastructure-related job" + + # 3. Flaky: 30-70% pass rate (inconsistent) + if 30 <= combined_rate < 70: + if trend == "degrading": + return "flaky", "Inconsistent pass rate, trending worse" + elif trend == "improving": + return "flaky", "Inconsistent pass rate, trending better" + else: + return "flaky", "Inconsistent pass rate (30-70%)" + + # 4. Product Bug: Low pass rate with bugs + if combined_rate < 50 and open_bugs > 0: + return "product_bug", f"Low pass rate with {open_bugs} open bug(s)" + + # 5. 
Check for specific failure patterns in job name + if "etcd" in name or "scaling" in name: + return "product_bug", "Known problematic component" + + if "techpreview" in name: + return "needs_triage", "Tech preview feature - expected instability" + + # 6. Very low rate without bugs = needs investigation + if combined_rate < 30: + return "needs_triage", "Very low pass rate, needs investigation" + + # 7. Moderate failures (70-80%) + if 70 <= combined_rate < 80: + if trend == "degrading": + return "needs_triage", "Recently degraded, needs investigation" + else: + return "flaky", "Borderline pass rate" + + # Default + return "needs_triage", "Uncategorized failure" + + +def categorize_all_jobs(extended_jobs, sippy_data): + """Categorize all problem jobs.""" + results = { + "generated": datetime.now().isoformat(), + "categories": { + "infrastructure": [], + "flaky": [], + "product_bug": [], + "needs_triage": [], + }, + "summary": {}, + "by_release": defaultdict(lambda: defaultdict(list)), + } + + # Build Sippy lookup for additional context + sippy_bugs = {} + if sippy_data: + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + for job in jobs: + sippy_bugs[job.get("name", "")] = job.get("open_bugs", 0) + + # Categorize each job + for job in extended_jobs: + # Ensure we have bug info + if job.get("open_bugs") is None and job.get("name") in sippy_bugs: + job["open_bugs"] = sippy_bugs[job.get("name")] + + category, reason = categorize_job(job) + + if category is None: + continue + + job_info = { + "release": job.get("release", ""), + "name": job.get("name", ""), + "brief_name": job.get("brief_name", ""), + "combined_runs": job.get("combined_runs", 0), + "combined_pass_rate": job.get("combined_pass_rate"), + "current_pass_rate": job.get("current_pass_rate"), + "open_bugs": job.get("open_bugs", 0), + "trend": job.get("trend", ""), + "reason": reason, + } + + results["categories"][category].append(job_info) + results["by_release"][job.get("release", "")][category].append(job_info) + + # Sort each category by pass rate + for category in results["categories"]: + results["categories"][category].sort( + key=lambda x: x.get("combined_pass_rate") or 0 + ) + + # Summary statistics + total_problems = sum(len(jobs) for jobs in results["categories"].values()) + results["summary"] = { + "total_problem_jobs": total_problems, + "by_category": { + cat: len(jobs) for cat, jobs in results["categories"].items() + }, + "percentages": {}, + } + + if total_problems > 0: + for cat, count in results["summary"]["by_category"].items(): + results["summary"]["percentages"][cat] = round(count / total_problems * 100, 1) + + return results + + +def generate_report(analysis): + """Generate markdown report for failure categorization.""" + report = [] + report.append("# Failure Categorization Report") + report.append("") + report.append(f"**Generated:** {analysis['generated']}") + report.append("") + report.append("Jobs with pass rate below 80% are categorized by likely root cause.") + report.append("") + + # Summary + summary = analysis.get("summary", {}) + report.append("## Summary") + report.append("") + report.append(f"**Total Problem Jobs:** {summary.get('total_problem_jobs', 0)}") + report.append("") + report.append("| Category | Count | Percentage | Description |") + report.append("|----------|-------|------------|-------------|") + + category_descriptions = { + "infrastructure": "Provisioning/infra failures", + "flaky": "Inconsistent (30-70% pass rate)", + "product_bug": "Known bugs filed", + "needs_triage": 
"Requires investigation", + } + + by_cat = summary.get("by_category", {}) + percentages = summary.get("percentages", {}) + for cat in ["infrastructure", "flaky", "product_bug", "needs_triage"]: + count = by_cat.get(cat, 0) + pct = percentages.get(cat, 0) + desc = category_descriptions.get(cat, "") + report.append(f"| {cat.replace('_', ' ').title()} | {count} | {pct}% | {desc} |") + report.append("") + + # Category breakdowns + categories = analysis.get("categories", {}) + + # Infrastructure issues + infra = categories.get("infrastructure", []) + if infra: + report.append("## Infrastructure Issues") + report.append("") + report.append("Jobs likely failing due to OpenStack provisioning or infrastructure problems:") + report.append("") + report.append("| Release | Job | Pass Rate | Runs | Reason |") + report.append("|---------|-----|-----------|------|--------|") + for job in infra[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['combined_runs']} | {job['reason'][:30]} |" + ) + if len(infra) > 15: + report.append(f"| ... | *{len(infra) - 15} more* | | | |") + report.append("") + + # Flaky jobs + flaky = categories.get("flaky", []) + if flaky: + report.append("## Flaky Jobs") + report.append("") + report.append("Jobs with inconsistent pass rates (30-70%) indicating test or timing issues:") + report.append("") + report.append("| Release | Job | Pass Rate | Trend | Runs |") + report.append("|---------|-----|-----------|-------|------|") + for job in flaky[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(job["trend"], "") + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{trend_icon} | {job['combined_runs']} |" + ) + if len(flaky) > 15: + report.append(f"| ... | *{len(flaky) - 15} more* | | | |") + report.append("") + + # Product bugs + bugs = categories.get("product_bug", []) + if bugs: + report.append("## Product Bugs") + report.append("") + report.append("Jobs with known bugs filed - track via bug system:") + report.append("") + report.append("| Release | Job | Pass Rate | Open Bugs | Runs |") + report.append("|---------|-----|-----------|-----------|------|") + for job in bugs[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['open_bugs']} | {job['combined_runs']} |" + ) + if len(bugs) > 15: + report.append(f"| ... | *{len(bugs) - 15} more* | | | |") + report.append("") + + # Needs triage + triage = categories.get("needs_triage", []) + if triage: + report.append("## Needs Triage") + report.append("") + report.append("Jobs requiring investigation to determine root cause:") + report.append("") + report.append("| Release | Job | Pass Rate | Runs | Reason |") + report.append("|---------|-----|-----------|------|--------|") + for job in triage[:15]: + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:40]} | {rate_str} | " + f"{job['combined_runs']} | {job['reason'][:30]} |" + ) + if len(triage) > 15: + report.append(f"| ... 
| *{len(triage) - 15} more* | | | |") + report.append("") + + # Recommendations + report.append("## Recommended Actions by Category") + report.append("") + report.append("### Infrastructure") + report.append("- Review OpenStack cloud health and quotas") + report.append("- Check for recurring provisioning failures") + report.append("- Validate network and DNS configuration") + report.append("") + report.append("### Flaky") + report.append("- Analyze test logs for timing-related failures") + report.append("- Consider adding retries for known flaky operations") + report.append("- Investigate environmental dependencies") + report.append("") + report.append("### Product Bug") + report.append("- Track existing bugs to resolution") + report.append("- Prioritize bugs blocking multiple jobs") + report.append("- Consider disabling jobs until bug is fixed") + report.append("") + report.append("### Needs Triage") + report.append("- Review recent job logs to identify patterns") + report.append("- File bugs with failure details") + report.append("- Categorize after investigation") + report.append("") + + report.append("---") + report.append("") + report.append("*Classification based on heuristics - manual review recommended*") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Failure Categorization") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load data + extended_jobs = load_extended_jobs() + if not extended_jobs: + print("Error: No extended metrics jobs data found.") + print("Run fetch_extended_metrics.py first.") + sys.exit(1) + print(f"Loaded {len(extended_jobs)} jobs") + + sippy_data = load_sippy_data() + print(f"Sippy data loaded: {sippy_data is not None}") + print() + + # Categorize + analysis = categorize_all_jobs(extended_jobs, sippy_data) + + # Convert defaultdict to regular dict for JSON serialization + analysis["by_release"] = {k: dict(v) for k, v in analysis["by_release"].items()} + + # Save results + analysis_path = os.path.join(OUTPUT_DIR, "failure_categories.json") + with open(analysis_path, 'w') as f: + json.dump(analysis, f, indent=2) + print(f"Saved: {analysis_path}") + + # Generate report + report = generate_report(analysis) + report_path = os.path.join(OUTPUT_DIR, "failure_categories_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Print summary + print() + print("=" * 60) + print("Summary:") + summary = analysis.get("summary", {}) + print(f" Total problem jobs: {summary.get('total_problem_jobs', 0)}") + for cat, count in summary.get("by_category", {}).items(): + pct = summary.get("percentages", {}).get(cat, 0) + print(f" {cat}: {count} ({pct}%)") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/extract_openstack_jobs.py b/.claude/skills/openstack-ci-analysis/scripts/extract_openstack_jobs.py new file mode 100755 index 0000000..8dbc201 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/extract_openstack_jobs.py @@ -0,0 +1,345 @@ +#!/usr/bin/env python3 +""" +Extract all OpenStack CI jobs from ci-operator/config files. + +This script parses CI configuration files and extracts job information +for tests using OpenStack cluster profiles. 
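+
+A matching test entry looks roughly like this (the field names are the ones
+this parser reads; the values are illustrative):
+
+    tests:
+    - as: e2e-openstack-ovn
+      cron: 0 6 * * *
+      steps:
+        cluster_profile: openstack-vexxhost
+        workflow: openshift-e2e-openstack-ipi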
+ +Target cluster profiles: +- openstack-vexxhost +- openstack-vh-mecha-central +- openstack-vh-mecha-az0 +- openstack-vh-bm-rhos +- openstack-hwoffload +- openstack-nfv +""" + +import argparse +import csv +import json +import os +import re +import sys +from pathlib import Path + +try: + import yaml +except ImportError: + print("Error: PyYAML is required. Install with: pip install pyyaml", file=sys.stderr) + sys.exit(1) + + +# Target OpenStack cluster profiles +OPENSTACK_PROFILES = [ + "openstack-vexxhost", + "openstack-vh-mecha-central", + "openstack-vh-mecha-az0", + "openstack-vh-bm-rhos", + "openstack-hwoffload", + "openstack-nfv", +] + + +def get_cluster_profile(test): + """Extract cluster_profile from a test definition.""" + if "steps" in test: + steps = test["steps"] + if isinstance(steps, dict): + return steps.get("cluster_profile") + return None + + +def get_workflow(test): + """Extract workflow from a test definition.""" + if "steps" in test: + steps = test["steps"] + if isinstance(steps, dict): + return steps.get("workflow") + return None + + +def get_job_type(test): + """Determine job type based on scheduling fields. + + Jobs are classified as: + - periodic: if they have cron/interval, OR if they have minimum_interval + but no presubmit triggers (always_run, run_if_changed, optional) + - postsubmit: if explicitly marked as postsubmit + - presubmit: otherwise + + Note: Jobs with minimum_interval but no presubmit triggers are periodic jobs + that run on a schedule. They're generated into *-periodics.yaml files. + """ + # Explicit periodic scheduling + if test.get("interval") or test.get("cron"): + return "periodic" + + if test.get("postsubmit"): + return "postsubmit" + + # Implicit periodic: minimum_interval without presubmit triggers + # These jobs run periodically, not on PRs + if test.get("minimum_interval"): + has_presubmit_trigger = ( + test.get("always_run") or + test.get("run_if_changed") or + test.get("optional") is True or + test.get("skip_if_only_changed") + ) + if not has_presubmit_trigger: + return "periodic" + + return "presubmit" + + +def get_schedule(test): + """Extract schedule (interval or cron) from a test. + + For implicit periodic jobs (those with minimum_interval but no presubmit + triggers), the minimum_interval acts as the schedule. 
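+
+    Examples (illustrative):
+        >>> get_schedule({"cron": "0 6 * * 1"})
+        'cron: 0 6 * * 1'
+        >>> get_schedule({"minimum_interval": "96h", "run_if_changed": "^docs/"})
+        ''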
+ """ + if test.get("interval"): + return f"interval: {test['interval']}" + if test.get("cron"): + return f"cron: {test['cron']}" + # For implicit periodic jobs, minimum_interval is the effective schedule + if test.get("minimum_interval"): + has_presubmit_trigger = ( + test.get("always_run") or + test.get("run_if_changed") or + test.get("optional") is True or + test.get("skip_if_only_changed") + ) + if not has_presubmit_trigger: + return f"minimum_interval: {test['minimum_interval']}" + return "" + + +def parse_config_file(file_path): + """Parse a single CI config file and extract OpenStack jobs.""" + jobs = [] + + try: + with open(file_path, 'r', encoding='utf-8') as f: + config = yaml.safe_load(f) + except Exception as e: + print(f"Warning: Failed to parse {file_path}: {e}", file=sys.stderr) + return jobs + + if not config or "tests" not in config: + return jobs + + # Extract metadata + metadata = config.get("zz_generated_metadata", {}) + org = metadata.get("org", "") + repo = metadata.get("repo", "") + branch = metadata.get("branch", "") + variant = metadata.get("variant", "") + + # Parse each test + for test in config.get("tests", []): + if not isinstance(test, dict): + continue + + cluster_profile = get_cluster_profile(test) + + # Check if this is an OpenStack job + if cluster_profile and any(profile in cluster_profile for profile in OPENSTACK_PROFILES): + job_name = test.get("as", "") + + job_info = { + "job_name": job_name, + "cluster_profile": cluster_profile, + "job_type": get_job_type(test), + "schedule": get_schedule(test), + "workflow": get_workflow(test) or "", + "optional": test.get("optional", False), + "always_run": test.get("always_run", False), + "minimum_interval": test.get("minimum_interval", ""), + "skip_if_only_changed": test.get("skip_if_only_changed", ""), + "run_if_changed": test.get("run_if_changed", ""), + "org": org, + "repo": repo, + "branch": branch, + "variant": variant, + "config_file": str(file_path), + } + + jobs.append(job_info) + + return jobs + + +def find_config_files(config_dir): + """Find all CI config YAML files.""" + config_path = Path(config_dir) + + yaml_files = [] + for pattern in ["**/*.yaml", "**/*.yml"]: + yaml_files.extend(config_path.glob(pattern)) + + return sorted(set(yaml_files)) + + +def extract_jobs(config_dir): + """Extract all OpenStack jobs from config directory.""" + all_jobs = [] + + config_files = find_config_files(config_dir) + print(f"Found {len(config_files)} config files to scan", file=sys.stderr) + + for file_path in config_files: + jobs = parse_config_file(file_path) + all_jobs.extend(jobs) + + print(f"Extracted {len(all_jobs)} OpenStack jobs", file=sys.stderr) + return all_jobs + + +def output_csv(jobs, output_file): + """Output jobs to CSV format.""" + if not jobs: + print("No jobs to output", file=sys.stderr) + return + + fieldnames = [ + "job_name", "cluster_profile", "job_type", "schedule", "workflow", + "optional", "always_run", "minimum_interval", "skip_if_only_changed", + "run_if_changed", "org", "repo", "branch", "variant", "config_file" + ] + + with open(output_file, 'w', newline='', encoding='utf-8') as f: + writer = csv.DictWriter(f, fieldnames=fieldnames) + writer.writeheader() + writer.writerows(jobs) + + print(f"Wrote {len(jobs)} jobs to {output_file}", file=sys.stderr) + + +def output_json(jobs, output_file): + """Output jobs to JSON format.""" + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(jobs, f, indent=2) + + print(f"Wrote {len(jobs)} jobs to {output_file}", file=sys.stderr) + + +def 
print_summary(jobs): + """Print summary statistics.""" + print("\n=== OpenStack CI Job Summary ===\n") + + # By cluster profile + profile_counts = {} + for job in jobs: + profile = job["cluster_profile"] + profile_counts[profile] = profile_counts.get(profile, 0) + 1 + + print("Jobs by Cluster Profile:") + for profile in sorted(profile_counts.keys()): + print(f" {profile}: {profile_counts[profile]}") + + # By job type + type_counts = {} + for job in jobs: + job_type = job["job_type"] + type_counts[job_type] = type_counts.get(job_type, 0) + 1 + + print("\nJobs by Type:") + for job_type in sorted(type_counts.keys()): + print(f" {job_type}: {type_counts[job_type]}") + + # By org + org_counts = {} + for job in jobs: + org = job["org"] or "unknown" + org_counts[org] = org_counts.get(org, 0) + 1 + + print("\nJobs by Organization:") + for org in sorted(org_counts.keys(), key=lambda x: org_counts[x], reverse=True)[:10]: + print(f" {org}: {org_counts[org]}") + + # Unique workflows + workflows = set(job["workflow"] for job in jobs if job["workflow"]) + print(f"\nUnique Workflows: {len(workflows)}") + + # Unique repos + repos = set(f"{job['org']}/{job['repo']}" for job in jobs if job['org'] and job['repo']) + print(f"Unique Repositories: {len(repos)}") + + # Release branches + branches = set(job["branch"] for job in jobs if job["branch"]) + release_branches = sorted([b for b in branches if "release-" in b or b in ["main", "master"]]) + print(f"\nRelease Branches:") + for branch in release_branches[-10:]: + count = len([j for j in jobs if j["branch"] == branch]) + print(f" {branch}: {count}") + + print() + + +def main(): + parser = argparse.ArgumentParser( + description="Extract OpenStack CI jobs from ci-operator config files" + ) + parser.add_argument( + "--config-dir", + default="ci-operator/config", + help="Path to ci-operator/config directory (default: ci-operator/config)" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + parser.add_argument( + "--output-csv", + default="openstack_jobs_inventory.csv", + help="Output CSV filename (default: openstack_jobs_inventory.csv)" + ) + parser.add_argument( + "--output-json", + default="openstack_jobs_inventory.json", + help="Output JSON filename (default: openstack_jobs_inventory.json)" + ) + parser.add_argument( + "--summary", + action="store_true", + help="Print summary statistics" + ) + + args = parser.parse_args() + + # Resolve output directory + output_dir = os.path.abspath(args.output_dir) + os.makedirs(output_dir, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Job Extractor") + print("=" * 60) + print(f"Config directory: {args.config_dir}") + print(f"Output directory: {output_dir}") + print() + + # Ensure config directory exists + if not os.path.isdir(args.config_dir): + print(f"Error: Config directory not found: {args.config_dir}", file=sys.stderr) + sys.exit(1) + + # Extract jobs + jobs = extract_jobs(args.config_dir) + + # Output CSV + csv_path = os.path.join(output_dir, args.output_csv) + output_csv(jobs, csv_path) + + # Output JSON + json_path = os.path.join(output_dir, args.output_json) + output_json(jobs, json_path) + + # Print summary if requested + if args.summary: + print_summary(jobs) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/fetch_comparison_data.py b/.claude/skills/openstack-ci-analysis/scripts/fetch_comparison_data.py new file mode 100755 index 
0000000..dbb723a --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/fetch_comparison_data.py @@ -0,0 +1,224 @@ +#!/usr/bin/env python3 +""" +Fetch platform comparison data from Sippy API. +Fetches variant data for all platforms to compare OpenStack against AWS, GCP, Azure, vSphere. +""" + +import argparse +import json +import os +import sys +import time +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +from datetime import datetime + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + +# Platform variants to compare +PLATFORMS = ["OpenStack", "AWS", "GCP", "Azure", "vSphere", "Metal"] + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Fetch platform comparison data from Sippy API" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + return parser.parse_args() + + +def fetch_json(url, retries=3, delay=2): + """Fetch JSON from URL with retries.""" + for attempt in range(retries): + try: + req = Request(url, headers={"User-Agent": "OpenStack-CI-Analysis/1.0"}) + with urlopen(req, timeout=60) as response: + return json.loads(response.read().decode()) + except (URLError, HTTPError) as e: + print(f" Attempt {attempt + 1} failed: {e}") + if attempt < retries - 1: + time.sleep(delay) + return None + + +def fetch_variants_for_release(release): + """Fetch variant data for a specific release.""" + url = f"{SIPPY_BASE}/variants?release={release}" + print(f"Fetching variants for release {release}...") + + data = fetch_json(url) + if data is None: + print(f" Failed to fetch variants for {release}") + return [] + + print(f" Retrieved {len(data)} variants") + return data + + +def extract_platform_variants(variants): + """Extract Platform:* variants from variant data.""" + platform_data = {} + + for variant in variants: + name = variant.get("name", "") + if name.startswith("Platform:"): + platform = name.replace("Platform:", "") + platform_data[platform] = { + "name": platform, + "variant_full_name": name, + "current_pass_percentage": variant.get("current_pass_percentage", 0), + "current_runs": variant.get("current_runs", 0), + "current_passes": variant.get("current_passes", 0), + "previous_pass_percentage": variant.get("previous_pass_percentage", 0), + "previous_runs": variant.get("previous_runs", 0), + "previous_passes": variant.get("previous_passes", 0), + "job_count": variant.get("job_count", 0), + } + + return platform_data + + +def fetch_jobs_for_release(release): + """Fetch all jobs for a release to get platform job counts.""" + url = f"{SIPPY_BASE}/jobs?release={release}" + print(f" Fetching jobs for platform counts...") + + data = fetch_json(url) + if data is None: + return {} + + # Count jobs by platform + platform_counts = {} + platform_runs = {} + platform_passes = {} + + for job in data: + name = job.get("name", "").lower() + runs = job.get("current_runs", 0) + job.get("previous_runs", 0) + passes = job.get("current_passes", 0) + job.get("previous_passes", 0) + + # Determine platform from job name + platform = None + if "openstack" in name: + platform = "OpenStack" + elif "aws" in name: + platform = "AWS" + elif "gcp" in name: + platform = "GCP" + elif "azure" in name: + platform = "Azure" + elif "vsphere" in name: + platform = "vSphere" + elif 
"metal" in name or "baremetal" in name: + platform = "Metal" + + if platform: + platform_counts[platform] = platform_counts.get(platform, 0) + 1 + platform_runs[platform] = platform_runs.get(platform, 0) + runs + platform_passes[platform] = platform_passes.get(platform, 0) + passes + + result = {} + for platform in platform_counts: + runs = platform_runs.get(platform, 0) + passes = platform_passes.get(platform, 0) + result[platform] = { + "job_count": platform_counts[platform], + "total_runs": runs, + "total_passes": passes, + "pass_rate": (passes / runs * 100) if runs > 0 else 0, + } + + return result + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + # Create output directory if needed + os.makedirs(OUTPUT_DIR, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Platform Comparison Data Fetcher") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + results = { + "fetched_at": datetime.now().isoformat(), + "releases": {}, + "overall_by_platform": {}, + } + + # Fetch data for each release + for release in RELEASES: + print(f"\n--- Release {release} ---") + + # Fetch variants + variants = fetch_variants_for_release(release) + platform_variants = extract_platform_variants(variants) if variants else {} + + # Fetch job counts + platform_jobs = fetch_jobs_for_release(release) + + # Combine data + release_data = { + "variants": platform_variants, + "job_metrics": platform_jobs, + } + results["releases"][release] = release_data + + time.sleep(1) # Be nice to the API + + # Calculate overall metrics by platform + overall = {} + for release, data in results["releases"].items(): + for platform, metrics in data.get("job_metrics", {}).items(): + if platform not in overall: + overall[platform] = { + "job_count": 0, + "total_runs": 0, + "total_passes": 0, + } + overall[platform]["job_count"] += metrics.get("job_count", 0) + overall[platform]["total_runs"] += metrics.get("total_runs", 0) + overall[platform]["total_passes"] += metrics.get("total_passes", 0) + + # Calculate pass rates + for platform, data in overall.items(): + runs = data["total_runs"] + passes = data["total_passes"] + data["pass_rate"] = (passes / runs * 100) if runs > 0 else 0 + + results["overall_by_platform"] = overall + + # Save results + output_path = os.path.join(OUTPUT_DIR, "platform_comparison_raw.json") + with open(output_path, 'w') as f: + json.dump(results, f, indent=2) + print(f"\nSaved: {output_path}") + + # Print summary + print("\n" + "=" * 60) + print("Summary by Platform (all releases):") + print("-" * 60) + print(f"{'Platform':<15} {'Jobs':>8} {'Runs':>10} {'Pass Rate':>10}") + print("-" * 60) + for platform in sorted(overall.keys(), key=lambda x: -overall[x]["pass_rate"]): + data = overall[platform] + print(f"{platform:<15} {data['job_count']:>8} {data['total_runs']:>10} {data['pass_rate']:>9.1f}%") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/fetch_extended_metrics.py b/.claude/skills/openstack-ci-analysis/scripts/fetch_extended_metrics.py new file mode 100755 index 0000000..6b28834 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/fetch_extended_metrics.py @@ -0,0 +1,383 @@ +#!/usr/bin/env python3 +""" +Fetch extended job metrics from Sippy API for OpenStack CI jobs. +Combines current + previous periods for ~14 day coverage. +Estimates job duration based on workflow/cluster profile. 
+""" + +import argparse +import json +import os +import sys +from datetime import datetime, timedelta +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +import time + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Calculate extended job metrics from Sippy data" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for input/output files (default: script directory)" + ) + return parser.parse_args() + +# Estimated durations by cluster profile (based on typical run times) +DURATION_ESTIMATES = { + "openstack-vexxhost": {"min": 60, "typical": 90, "max": 150}, + "openstack-vh-mecha-central": {"min": 60, "typical": 90, "max": 150}, + "openstack-vh-mecha-az0": {"min": 60, "typical": 100, "max": 180}, + "openstack-nfv": {"min": 90, "typical": 120, "max": 200}, + "openstack-hwoffload": {"min": 90, "typical": 120, "max": 200}, + "openstack-vh-bm-rhos": {"min": 120, "typical": 180, "max": 300}, +} + + +def load_collected_data(): + """Load previously collected Sippy data.""" + filepath = os.path.join(OUTPUT_DIR, "sippy_jobs_raw.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def load_job_inventory(): + """Load job inventory for cluster profile info.""" + filepath = os.path.join(OUTPUT_DIR, "openstack_jobs_inventory.json") + if os.path.exists(filepath): + with open(filepath) as f: + return json.load(f) + return None + + +def calculate_extended_metrics(sippy_data, inventory): + """Calculate extended metrics combining current + previous periods.""" + + results = { + "generated": datetime.now().isoformat(), + "period": "~14 days (current + previous Sippy windows)", + "releases": {}, + "overall": {}, + "problem_jobs": [], + "duration_estimates": {}, + } + + # Build a lookup for cluster profiles from inventory + cluster_profiles = {} + if inventory: + for job in inventory: + cluster_profiles[job.get("job_name", "")] = job.get("cluster_profile", "") + + all_jobs = [] + + for release, jobs in sippy_data.get("jobs_by_release", {}).items(): + release_stats = { + "total_jobs": len(jobs), + "current_runs": 0, + "current_passes": 0, + "previous_runs": 0, + "previous_passes": 0, + "combined_runs": 0, + "combined_passes": 0, + "pass_rate_current": 0, + "pass_rate_combined": 0, + "trend": "", + } + + for job in jobs: + name = job.get("name", "") + current_runs = job.get("current_runs", 0) + current_passes = job.get("current_passes", 0) + previous_runs = job.get("previous_runs", 0) + previous_passes = job.get("previous_passes", 0) + + combined_runs = current_runs + previous_runs + combined_passes = current_passes + previous_passes + + release_stats["current_runs"] += current_runs + release_stats["current_passes"] += current_passes + release_stats["previous_runs"] += previous_runs + release_stats["previous_passes"] += previous_passes + release_stats["combined_runs"] += combined_runs + release_stats["combined_passes"] += combined_passes + + # Calculate pass rates + current_rate = (current_passes / current_runs * 100) if current_runs > 0 else None + previous_rate = (previous_passes / previous_runs * 100) if previous_runs > 0 else None + combined_rate = (combined_passes / combined_runs * 100) if combined_runs > 0 else None + + # 
Determine trend + trend = "stable" + if current_rate is not None and previous_rate is not None: + diff = current_rate - previous_rate + if diff > 10: + trend = "improving" + elif diff < -10: + trend = "degrading" + + # Get cluster profile for duration estimate + cluster = cluster_profiles.get(name, "unknown") + duration_est = DURATION_ESTIMATES.get(cluster, {"min": 60, "typical": 90, "max": 180}) + + job_info = { + "release": release, + "name": name, + "brief_name": job.get("brief_name", name), + "cluster_profile": cluster, + "current_runs": current_runs, + "current_passes": current_passes, + "current_pass_rate": current_rate, + "previous_runs": previous_runs, + "previous_passes": previous_passes, + "previous_pass_rate": previous_rate, + "combined_runs": combined_runs, + "combined_passes": combined_passes, + "combined_pass_rate": combined_rate, + "trend": trend, + "last_pass": job.get("last_pass", ""), + "open_bugs": job.get("open_bugs", 0), + "estimated_duration_min": duration_est["typical"], + } + all_jobs.append(job_info) + + # Track problem jobs (< 80% and has runs) + if combined_rate is not None and combined_rate < 80 and combined_runs >= 2: + results["problem_jobs"].append(job_info) + + # Calculate release-level rates + if release_stats["current_runs"] > 0: + release_stats["pass_rate_current"] = ( + release_stats["current_passes"] / release_stats["current_runs"] * 100 + ) + if release_stats["combined_runs"] > 0: + release_stats["pass_rate_combined"] = ( + release_stats["combined_passes"] / release_stats["combined_runs"] * 100 + ) + + # Determine release trend + if release_stats["current_runs"] > 0 and release_stats["previous_runs"] > 0: + curr_rate = release_stats["current_passes"] / release_stats["current_runs"] + prev_rate = release_stats["previous_passes"] / release_stats["previous_runs"] + diff = (curr_rate - prev_rate) * 100 + if diff > 5: + release_stats["trend"] = "improving" + elif diff < -5: + release_stats["trend"] = "degrading" + else: + release_stats["trend"] = "stable" + + results["releases"][release] = release_stats + + # Overall statistics + total_current_runs = sum(r["current_runs"] for r in results["releases"].values()) + total_current_passes = sum(r["current_passes"] for r in results["releases"].values()) + total_combined_runs = sum(r["combined_runs"] for r in results["releases"].values()) + total_combined_passes = sum(r["combined_passes"] for r in results["releases"].values()) + + results["overall"] = { + "total_jobs": len(all_jobs), + "current_runs": total_current_runs, + "current_passes": total_current_passes, + "current_pass_rate": (total_current_passes / total_current_runs * 100) if total_current_runs > 0 else 0, + "combined_runs": total_combined_runs, + "combined_passes": total_combined_passes, + "combined_pass_rate": (total_combined_passes / total_combined_runs * 100) if total_combined_runs > 0 else 0, + "problem_job_count": len(results["problem_jobs"]), + } + + # Sort problem jobs by pass rate + results["problem_jobs"].sort(key=lambda x: x.get("combined_pass_rate", 0) or 0) + + # Duration estimates summary + jobs_by_profile = {} + for job in all_jobs: + profile = job.get("cluster_profile", "unknown") + if profile not in jobs_by_profile: + jobs_by_profile[profile] = [] + jobs_by_profile[profile].append(job) + + for profile, jobs in jobs_by_profile.items(): + est = DURATION_ESTIMATES.get(profile, {"min": 60, "typical": 90, "max": 180}) + total_runs = sum(j["combined_runs"] for j in jobs) + results["duration_estimates"][profile] = { + "job_count": len(jobs), + 
"total_runs": total_runs, + "typical_duration_min": est["typical"], + "estimated_total_hours": round(total_runs * est["typical"] / 60, 1), + } + + return results, all_jobs + + +def generate_extended_report(results, all_jobs): + """Generate markdown report with extended metrics.""" + report = [] + report.append("# OpenStack CI Extended Metrics Report") + report.append("") + report.append(f"**Generated:** {results['generated']}") + report.append(f"**Period:** {results['period']}") + report.append("") + + # Overall summary + report.append("## Executive Summary") + report.append("") + overall = results["overall"] + report.append(f"| Metric | Current (~7d) | Combined (~14d) |") + report.append(f"|--------|---------------|-----------------|") + report.append(f"| Total Jobs | {overall['total_jobs']} | {overall['total_jobs']} |") + report.append(f"| Total Runs | {overall['current_runs']} | {overall['combined_runs']} |") + report.append(f"| Pass Rate | {overall['current_pass_rate']:.1f}% | {overall['combined_pass_rate']:.1f}% |") + report.append(f"| Problem Jobs (<80%) | - | {overall['problem_job_count']} |") + report.append("") + + # Per-release breakdown + report.append("## Metrics by Release") + report.append("") + report.append("| Release | Jobs | Runs (14d) | Pass Rate | Trend |") + report.append("|---------|------|------------|-----------|-------|") + for release in RELEASES: + rel = results["releases"].get(release, {}) + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(rel.get("trend", ""), "") + report.append( + f"| {release} | {rel.get('total_jobs', 0)} | " + f"{rel.get('combined_runs', 0)} | " + f"{rel.get('pass_rate_combined', 0):.1f}% | {trend_icon} {rel.get('trend', '')} |" + ) + report.append("") + + # Problem jobs + report.append("## Problem Jobs (Pass Rate < 80%)") + report.append("") + problem_jobs = results.get("problem_jobs", []) + if problem_jobs: + report.append(f"**{len(problem_jobs)} jobs** need attention:") + report.append("") + report.append("| Release | Job | Runs | Pass Rate | Trend | Bugs |") + report.append("|---------|-----|------|-----------|-------|------|") + for job in problem_jobs[:25]: + trend_icon = {"improving": "↑", "degrading": "↓", "stable": "→"}.get(job.get("trend", ""), "") + rate = job.get("combined_pass_rate") + rate_str = f"{rate:.1f}%" if rate is not None else "N/A" + report.append( + f"| {job['release']} | {job['brief_name'][:50]} | " + f"{job['combined_runs']} | {rate_str} | {trend_icon} | {job.get('open_bugs', 0)} |" + ) + if len(problem_jobs) > 25: + report.append(f"| ... | *{len(problem_jobs) - 25} more jobs* | | | | |") + else: + report.append("All jobs with sufficient runs have pass rate >= 80%.") + report.append("") + + # Duration estimates + report.append("## Estimated Job Durations by Cluster Profile") + report.append("") + report.append("*Note: Durations are estimates based on typical run times.*") + report.append("") + report.append("| Cluster Profile | Jobs | Runs (14d) | Typical Duration | Est. 
Total Hours |") + report.append("|-----------------|------|------------|------------------|------------------|") + for profile, est in sorted(results.get("duration_estimates", {}).items(), + key=lambda x: -x[1]["total_runs"]): + report.append( + f"| {profile} | {est['job_count']} | {est['total_runs']} | " + f"~{est['typical_duration_min']}min | {est['estimated_total_hours']}h |" + ) + report.append("") + + # Trend analysis + report.append("## Trend Analysis") + report.append("") + improving = [j for j in all_jobs if j.get("trend") == "improving" and j["combined_runs"] >= 2] + degrading = [j for j in all_jobs if j.get("trend") == "degrading" and j["combined_runs"] >= 2] + report.append(f"- **Improving jobs:** {len(improving)}") + report.append(f"- **Degrading jobs:** {len(degrading)}") + report.append(f"- **Stable jobs:** {len(all_jobs) - len(improving) - len(degrading)}") + report.append("") + + if degrading: + report.append("### Degrading Jobs (investigate)") + report.append("") + for job in sorted(degrading, key=lambda x: (x.get("current_pass_rate") or 100))[:10]: + curr = job.get("current_pass_rate") + prev = job.get("previous_pass_rate") + curr_str = f"{curr:.0f}%" if curr is not None else "N/A" + prev_str = f"{prev:.0f}%" if prev is not None else "N/A" + report.append(f"- **{job['brief_name'][:50]}** ({job['release']}): {prev_str} → {curr_str}") + report.append("") + + report.append("---") + report.append("") + report.append("*Data Source: [Sippy](https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + print("=" * 60) + print("OpenStack CI Extended Metrics") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Load existing data + sippy_data = load_collected_data() + if not sippy_data: + print("Error: No Sippy data found. 
Run fetch_job_metrics.py first.") + sys.exit(1) + + inventory = load_job_inventory() + print(f"Loaded Sippy data from: {sippy_data.get('fetched_at')}") + print(f"Job inventory loaded: {inventory is not None}") + print() + + # Calculate extended metrics + results, all_jobs = calculate_extended_metrics(sippy_data, inventory) + + # Save results + results_path = os.path.join(OUTPUT_DIR, "extended_metrics.json") + with open(results_path, 'w') as f: + json.dump(results, f, indent=2) + print(f"Saved: {results_path}") + + all_jobs_path = os.path.join(OUTPUT_DIR, "extended_metrics_jobs.json") + with open(all_jobs_path, 'w') as f: + json.dump(all_jobs, f, indent=2) + print(f"Saved: {all_jobs_path}") + + # Generate report + report = generate_extended_report(results, all_jobs) + report_path = os.path.join(OUTPUT_DIR, "extended_metrics_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Saved: {report_path}") + + # Summary + print() + print("=" * 60) + print("Summary:") + overall = results["overall"] + print(f" Total jobs: {overall['total_jobs']}") + print(f" Combined runs (14d): {overall['combined_runs']}") + print(f" Combined pass rate: {overall['combined_pass_rate']:.1f}%") + print(f" Problem jobs: {overall['problem_job_count']}") + print("=" * 60) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/openstack-ci-analysis/scripts/fetch_job_metrics.py b/.claude/skills/openstack-ci-analysis/scripts/fetch_job_metrics.py new file mode 100755 index 0000000..2df28c8 --- /dev/null +++ b/.claude/skills/openstack-ci-analysis/scripts/fetch_job_metrics.py @@ -0,0 +1,319 @@ +#!/usr/bin/env python3 +""" +Fetch job metrics (pass rates, run counts) from Sippy API for OpenStack CI jobs. +Saves progress to files to allow resumption if interrupted. 
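+
+Raw per-release data is cached in sippy_jobs_raw.json and reused on later
+runs unless --force is given. Typical invocations (paths illustrative):
+
+    python3 fetch_job_metrics.py --output-dir /tmp/analysis
+    python3 fetch_job_metrics.py --output-dir /tmp/analysis --force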
+""" + +import argparse +import json +import os +import sys +import time +from urllib.request import urlopen, Request +from urllib.error import URLError, HTTPError +from datetime import datetime + +SIPPY_BASE = "https://sippy.dptools.openshift.org/api" +RELEASES = ["4.17", "4.18", "4.19", "4.20", "4.21", "4.22"] + +# Will be set by parse_args() +OUTPUT_DIR = None + + +def parse_args(): + """Parse command line arguments.""" + parser = argparse.ArgumentParser( + description="Fetch job metrics from Sippy API for OpenStack CI jobs" + ) + parser.add_argument( + "--output-dir", + default=os.path.dirname(os.path.abspath(__file__)), + help="Directory for output files (default: script directory)" + ) + parser.add_argument( + "--force", + action="store_true", + help="Refetch data even if cache exists" + ) + return parser.parse_args() + +def fetch_json(url, retries=3, delay=2): + """Fetch JSON from URL with retries.""" + for attempt in range(retries): + try: + req = Request(url, headers={"User-Agent": "OpenStack-CI-Analysis/1.0"}) + with urlopen(req, timeout=60) as response: + return json.loads(response.read().decode()) + except (URLError, HTTPError) as e: + print(f" Attempt {attempt + 1} failed: {e}") + if attempt < retries - 1: + time.sleep(delay) + return None + +def fetch_openstack_jobs_for_release(release): + """Fetch all OpenStack jobs for a specific release.""" + url = f"{SIPPY_BASE}/jobs?release={release}" + print(f"Fetching jobs for release {release}...") + + data = fetch_json(url) + if data is None: + print(f" Failed to fetch data for {release}") + return [] + + # Filter for OpenStack jobs + openstack_jobs = [j for j in data if "openstack" in j.get("name", "").lower()] + print(f" Found {len(openstack_jobs)} OpenStack jobs out of {len(data)} total") + + return openstack_jobs + +def save_progress(data, filename): + """Save data to file.""" + filepath = os.path.join(OUTPUT_DIR, filename) + with open(filepath, 'w') as f: + json.dump(data, f, indent=2) + print(f"Saved: {filepath}") + +def load_progress(filename): + """Load data from file if exists.""" + filepath = os.path.join(OUTPUT_DIR, filename) + if os.path.exists(filepath): + with open(filepath, 'r') as f: + return json.load(f) + return None + +def analyze_job_metrics(all_jobs_by_release): + """Analyze and summarize job metrics.""" + summary = { + "generated": datetime.now().isoformat(), + "releases": {}, + "overall_stats": {}, + "worst_jobs": [], + "best_jobs": [], + "jobs_by_pass_rate": {} + } + + all_jobs_flat = [] + + for release, jobs in all_jobs_by_release.items(): + if not jobs: + continue + + release_stats = { + "total_jobs": len(jobs), + "total_runs": sum(j.get("current_runs", 0) for j in jobs), + "total_passes": sum(j.get("current_passes", 0) for j in jobs), + "avg_pass_rate": 0, + "jobs_below_90": 0, + "jobs_below_80": 0, + "jobs_below_50": 0, + } + + pass_rates = [] + for job in jobs: + rate = job.get("current_pass_percentage", 0) + pass_rates.append(rate) + if rate < 90: + release_stats["jobs_below_90"] += 1 + if rate < 80: + release_stats["jobs_below_80"] += 1 + if rate < 50: + release_stats["jobs_below_50"] += 1 + + # Add to flat list for overall analysis + all_jobs_flat.append({ + "release": release, + "name": job.get("name", ""), + "brief_name": job.get("brief_name", ""), + "pass_rate": rate, + "runs": job.get("current_runs", 0), + "passes": job.get("current_passes", 0), + "previous_pass_rate": job.get("previous_pass_percentage", 0), + "improvement": job.get("net_improvement", 0), + "last_pass": job.get("last_pass", ""), + 
"open_bugs": job.get("open_bugs", 0), + }) + + if pass_rates: + release_stats["avg_pass_rate"] = sum(pass_rates) / len(pass_rates) + + summary["releases"][release] = release_stats + + # Find worst and best performing jobs + jobs_with_runs = [j for j in all_jobs_flat if j["runs"] > 0] + if jobs_with_runs: + # Worst jobs (lowest pass rate with at least 2 runs) + jobs_with_sufficient_runs = [j for j in jobs_with_runs if j["runs"] >= 2] + summary["worst_jobs"] = sorted(jobs_with_sufficient_runs, key=lambda x: x["pass_rate"])[:20] + + # Best jobs (100% pass rate with most runs) + perfect_jobs = [j for j in jobs_with_runs if j["pass_rate"] == 100] + summary["best_jobs"] = sorted(perfect_jobs, key=lambda x: -x["runs"])[:20] + + # Group by pass rate ranges + ranges = { + "100%": [j for j in jobs_with_runs if j["pass_rate"] == 100], + "90-99%": [j for j in jobs_with_runs if 90 <= j["pass_rate"] < 100], + "80-89%": [j for j in jobs_with_runs if 80 <= j["pass_rate"] < 90], + "50-79%": [j for j in jobs_with_runs if 50 <= j["pass_rate"] < 80], + "below_50%": [j for j in jobs_with_runs if j["pass_rate"] < 50], + } + summary["jobs_by_pass_rate"] = {k: len(v) for k, v in ranges.items()} + + # Overall stats + if all_jobs_flat: + all_runs = sum(j["runs"] for j in all_jobs_flat) + all_passes = sum(j["passes"] for j in all_jobs_flat) + summary["overall_stats"] = { + "total_jobs": len(all_jobs_flat), + "total_runs": all_runs, + "total_passes": all_passes, + "overall_pass_rate": (all_passes / all_runs * 100) if all_runs > 0 else 0, + } + + return summary, all_jobs_flat + +def generate_metrics_report(summary, all_jobs): + """Generate a markdown report of job metrics.""" + report = [] + report.append("# OpenStack CI Job Metrics Report") + report.append("") + report.append(f"**Generated:** {summary['generated']}") + report.append("") + + # Overall stats + report.append("## Overall Statistics") + report.append("") + stats = summary.get("overall_stats", {}) + report.append(f"| Metric | Value |") + report.append(f"|--------|-------|") + report.append(f"| Total OpenStack Jobs Tracked | {stats.get('total_jobs', 0)} |") + report.append(f"| Total Job Runs (current period) | {stats.get('total_runs', 0)} |") + report.append(f"| Total Passes | {stats.get('total_passes', 0)} |") + report.append(f"| Overall Pass Rate | {stats.get('overall_pass_rate', 0):.1f}% |") + report.append("") + + # Pass rate distribution + report.append("## Pass Rate Distribution") + report.append("") + report.append("| Pass Rate Range | Job Count |") + report.append("|-----------------|-----------|") + for range_name, count in summary.get("jobs_by_pass_rate", {}).items(): + report.append(f"| {range_name} | {count} |") + report.append("") + + # By release + report.append("## Metrics by Release") + report.append("") + report.append("| Release | Jobs | Total Runs | Avg Pass Rate | <90% | <80% | <50% |") + report.append("|---------|------|------------|---------------|------|------|------|") + for release in RELEASES: + rel_stats = summary.get("releases", {}).get(release, {}) + if rel_stats: + report.append(f"| {release} | {rel_stats.get('total_jobs', 0)} | {rel_stats.get('total_runs', 0)} | {rel_stats.get('avg_pass_rate', 0):.1f}% | {rel_stats.get('jobs_below_90', 0)} | {rel_stats.get('jobs_below_80', 0)} | {rel_stats.get('jobs_below_50', 0)} |") + report.append("") + + # Worst performing jobs + report.append("## Worst Performing Jobs (by pass rate)") + report.append("") + report.append("Jobs with at least 2 runs, sorted by lowest pass rate:") + 
report.append("") + report.append("| Release | Job Name | Pass Rate | Runs | Passes |") + report.append("|---------|----------|-----------|------|--------|") + for job in summary.get("worst_jobs", [])[:15]: + report.append(f"| {job['release']} | {job['brief_name'][:60]} | {job['pass_rate']:.1f}% | {job['runs']} | {job['passes']} |") + report.append("") + + # Best performing jobs with high volume + report.append("## Best Performing Jobs (100% pass rate, most runs)") + report.append("") + report.append("| Release | Job Name | Runs | Last Pass |") + report.append("|---------|----------|------|-----------|") + for job in summary.get("best_jobs", [])[:10]: + last_pass = job['last_pass'][:10] if job['last_pass'] else "N/A" + report.append(f"| {job['release']} | {job['brief_name'][:60]} | {job['runs']} | {last_pass} |") + report.append("") + + # Jobs needing attention + report.append("## Jobs Needing Attention") + report.append("") + attention_jobs = [j for j in all_jobs if j["pass_rate"] < 80 and j["runs"] >= 2] + if attention_jobs: + report.append(f"**{len(attention_jobs)} jobs** have pass rate below 80%:") + report.append("") + for job in sorted(attention_jobs, key=lambda x: x["pass_rate"]): + report.append(f"- **{job['brief_name']}** ({job['release']}): {job['pass_rate']:.1f}% ({job['passes']}/{job['runs']} runs)") + else: + report.append("All jobs with sufficient runs have pass rate >= 80%.") + report.append("") + + # Data source + report.append("---") + report.append("") + report.append("*Data Source: Sippy (https://sippy.dptools.openshift.org/)*") + report.append("") + + return "\n".join(report) + +def main(): + global OUTPUT_DIR + args = parse_args() + OUTPUT_DIR = os.path.abspath(args.output_dir) + + # Create output directory if needed + os.makedirs(OUTPUT_DIR, exist_ok=True) + + print("=" * 60) + print("OpenStack CI Job Metrics Collector") + print("=" * 60) + print(f"Output directory: {OUTPUT_DIR}") + print() + + # Check for existing progress + progress_file = "sippy_jobs_raw.json" + existing_data = load_progress(progress_file) + + if existing_data and not args.force: + print(f"Found existing data from {existing_data.get('fetched_at', 'unknown')}") + print("Use --force to refetch") + all_jobs_by_release = existing_data.get("jobs_by_release", {}) + else: + all_jobs_by_release = {} + + for release in RELEASES: + jobs = fetch_openstack_jobs_for_release(release) + all_jobs_by_release[release] = jobs + + # Save progress after each release + save_progress({ + "fetched_at": datetime.now().isoformat(), + "releases_fetched": list(all_jobs_by_release.keys()), + "jobs_by_release": all_jobs_by_release + }, progress_file) + + time.sleep(1) # Be nice to the API + + print() + print("Analyzing metrics...") + summary, all_jobs = analyze_job_metrics(all_jobs_by_release) + + # Save summary + save_progress(summary, "job_metrics_summary.json") + save_progress(all_jobs, "job_metrics_all_jobs.json") + + # Generate report + report = generate_metrics_report(summary, all_jobs) + report_path = os.path.join(OUTPUT_DIR, "job_metrics_report.md") + with open(report_path, 'w') as f: + f.write(report) + print(f"Report saved: {report_path}") + + print() + print("=" * 60) + print("Summary:") + print(f" Total jobs: {summary['overall_stats'].get('total_jobs', 0)}") + print(f" Total runs: {summary['overall_stats'].get('total_runs', 0)}") + print(f" Overall pass rate: {summary['overall_stats'].get('overall_pass_rate', 0):.1f}%") + print("=" * 60) + +if __name__ == "__main__": + main() diff --git 
diff --git a/.claude/skills/openstack-ci-analysis/scripts/run_analysis.sh b/.claude/skills/openstack-ci-analysis/scripts/run_analysis.sh
new file mode 100755
index 0000000..b3b934d
--- /dev/null
+++ b/.claude/skills/openstack-ci-analysis/scripts/run_analysis.sh
@@ -0,0 +1,163 @@
+#!/bin/bash
+#
+# Run all OpenStack CI analysis scripts in the correct order.
+#
+# Usage:
+#   ./run_analysis.sh [--config-dir /path/to/ci-operator/config] [--output-dir /path/to/output]
+#
+# If --config-dir is not specified, it defaults to ../../../../ci-operator/config
+# (the repository root relative to this script's location under
+# .claude/skills/openstack-ci-analysis/scripts/).
+#
+# If --output-dir is not specified, output goes to the current working
+# directory, so the script can be run from any location in the filesystem.
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+# Default config directory (relative to script location)
+CONFIG_DIR="${SCRIPT_DIR}/../../../../ci-operator/config"
+
+# Default output directory is current working directory
+OUTPUT_DIR="$(pwd)"
+
+# Parse arguments
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        --config-dir)
+            CONFIG_DIR="$2"
+            shift 2
+            ;;
+        --output-dir)
+            OUTPUT_DIR="$2"
+            shift 2
+            ;;
+        --force)
+            FORCE="--force"
+            shift
+            ;;
+        --help)
+            echo "Usage: $0 [OPTIONS]"
+            echo ""
+            echo "Options:"
+            echo "  --config-dir DIR   Path to ci-operator/config directory"
+            echo "  --output-dir DIR   Directory for output files (default: current directory)"
+            echo "  --force            Refetch data from Sippy API"
+            echo ""
+            echo "Examples:"
+            echo "  # Run from repo root, output to current directory"
+            echo "  ./.claude/skills/openstack-ci-analysis/scripts/run_analysis.sh"
+            echo ""
+            echo "  # Run from anywhere, specify both directories"
+            echo "  ./run_analysis.sh --config-dir /path/to/release/ci-operator/config --output-dir /tmp/analysis"
+            exit 0
+            ;;
+        *)
+            echo "Unknown option: $1"
+            exit 1
+            ;;
+    esac
+done
+
+# Resolve to absolute paths
+CONFIG_DIR="$(cd "$CONFIG_DIR" 2>/dev/null && pwd)" || {
+    echo "Error: Config directory not found: $CONFIG_DIR"
+    echo "Use --config-dir to specify the path to ci-operator/config"
+    exit 1
+}
+
+OUTPUT_DIR="$(mkdir -p "$OUTPUT_DIR" && cd "$OUTPUT_DIR" && pwd)"
+
+echo "============================================================"
+echo "OpenStack CI Analysis Toolkit"
+echo "============================================================"
+echo "Script directory: $SCRIPT_DIR"
+echo "Config directory: $CONFIG_DIR"
+echo "Output directory: $OUTPUT_DIR"
+echo "============================================================"
+echo ""
+
+# Phase 1: Data Collection
+echo "=== Phase 1: Data Collection ==="
+echo ""
+
+echo "[1/4] Extracting job inventory..."
+python3 "$SCRIPT_DIR/extract_openstack_jobs.py" \
+    --config-dir "$CONFIG_DIR" \
+    --output-dir "$OUTPUT_DIR" \
+    --summary
+
+echo ""
+echo "[2/4] Fetching job metrics from Sippy..."
+python3 "$SCRIPT_DIR/fetch_job_metrics.py" \
+    --output-dir "$OUTPUT_DIR" \
+    ${FORCE:+"$FORCE"}
+
+echo ""
+echo "[3/4] Calculating extended metrics..."
+python3 "$SCRIPT_DIR/fetch_extended_metrics.py" \
+    --output-dir "$OUTPUT_DIR"
+
+echo ""
+echo "[4/4] Fetching platform comparison data..."
+python3 "$SCRIPT_DIR/fetch_comparison_data.py" \
+    --output-dir "$OUTPUT_DIR"
+
+# Phase 2: Configuration Analysis
+echo ""
+echo "=== Phase 2: Configuration Analysis ==="
+echo ""
+
+echo "[1/3] Analyzing redundancy..."
+python3 "$SCRIPT_DIR/analyze_redundancy.py" \
+    --output-dir "$OUTPUT_DIR"
+
+echo ""
+echo "[2/3] Analyzing coverage gaps..."
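+# (analyze_coverage.py reads the Phase 1 job inventory from $OUTPUT_DIR;
+# set -e above aborts the run if that phase failed.)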
+python3 "$SCRIPT_DIR/analyze_coverage.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[3/3] Analyzing trigger patterns..." +python3 "$SCRIPT_DIR/analyze_triggers.py" \ + --output-dir "$OUTPUT_DIR" + +# Phase 3: Runtime Analysis +echo "" +echo "=== Phase 3: Runtime Analysis ===" +echo "" + +echo "[1/3] Analyzing platform comparison..." +python3 "$SCRIPT_DIR/analyze_platform_comparison.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[2/3] Analyzing workflow pass rates..." +python3 "$SCRIPT_DIR/analyze_workflow_passrate.py" \ + --output-dir "$OUTPUT_DIR" + +echo "" +echo "[3/3] Categorizing failures..." +python3 "$SCRIPT_DIR/categorize_failures.py" \ + --output-dir "$OUTPUT_DIR" + +# Summary +echo "" +echo "============================================================" +echo "Analysis Complete!" +echo "============================================================" +echo "" +echo "Output directory: $OUTPUT_DIR" +echo "" +echo "Generated Reports:" +find "$OUTPUT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | while read -r f; do + echo " - $(basename "$f")" +done +echo "" +echo "Data Files:" +find "$OUTPUT_DIR" -maxdepth 1 -name "*.json" -type f 2>/dev/null | wc -l | xargs -I {} echo " {} JSON files generated" +echo "" +echo "To view key findings, run:" +echo " cd $OUTPUT_DIR" +echo " python3 -c \"import json; d=json.load(open('extended_metrics.json')); print(f'Pass rate: {d[\\\"overall\\\"][\\\"combined_pass_rate\\\"]:.1f}%')\"" +echo "" diff --git a/README.md b/README.md index a1d1385..40fccc8 100644 --- a/README.md +++ b/README.md @@ -99,3 +99,23 @@ Usage: python hack/openstack-job-audit.py [output_file] output_file: Output file path (default: ./openstack-ci-report.yaml) Example: ./openstack-job-audit.py /path/to/openshift/release ./report.yaml + +## OpenStack CI Analysis (Claude Code Skill) + +A comprehensive CI analysis toolkit is available as a Claude Code skill at +`.claude/skills/openstack-ci-analysis/`. This skill analyzes OpenStack CI job +health, pass rates, coverage gaps, and failure categories. + +**Features:** +- Extract job inventory from CI configuration files +- Fetch runtime metrics from Sippy API +- Compare OpenStack pass rates against AWS, GCP, Azure, vSphere +- Categorize failures by root cause (infrastructure, flaky, product bug) +- Identify coverage gaps and trigger optimization opportunities + +**Usage with Claude Code:** + +The skill is automatically available when using Claude Code in this repository. +Ask Claude to analyze CI jobs, generate health reports, or investigate failures. + +Requires Python 3.6+ and PyYAML (`pip install pyyaml`).