From 6fc661aa56e3ef8297a4c4982d032b2d43d6ff3c Mon Sep 17 00:00:00 2001 From: Tomas Grosup Date: Tue, 17 Mar 2026 15:40:58 +0100 Subject: [PATCH 1/6] Add expert-reviewer agent, skill, and instructions from review history extraction Generated from analysis of 695 review comments (2,102 effective weighted) across 4 expert reviewers spanning 10 years of dotnet/fsharp and fsharp/fslang-design review history. Artifacts: - .github/agents/expert-reviewer.md: 15-dimension review agent with 5-wave workflow and folder hotspot mapping - .github/skills/expert-review/SKILL.md: Slim invocation pointer with dimension selection guide and self-review checklist - .github/instructions/ExpertReview.instructions.md: Top 10 compiler review rules scoped to src/Compiler/** - .github/agents/extraction-pipeline.md: Reusable pipeline agent for future reviewer expertise extraction Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/agents/expert-reviewer.md | 377 +++++++++++++++++ .github/agents/extraction-pipeline.md | 385 ++++++++++++++++++ .../instructions/ExpertReview.instructions.md | 31 ++ .github/skills/expert-review/SKILL.md | 52 +++ 4 files changed, 845 insertions(+) create mode 100644 .github/agents/expert-reviewer.md create mode 100644 .github/agents/extraction-pipeline.md create mode 100644 .github/instructions/ExpertReview.instructions.md create mode 100644 .github/skills/expert-review/SKILL.md diff --git a/.github/agents/expert-reviewer.md b/.github/agents/expert-reviewer.md new file mode 100644 index 00000000000..df61e30ba10 --- /dev/null +++ b/.github/agents/expert-reviewer.md @@ -0,0 +1,377 @@ +--- +name: expert-reviewer +description: "Multi-dimensional code review agent for F# compiler PRs. Invoke for thorough review of compiler changes, type checking, IL emission, AST correctness, binary compatibility, and IDE integration. Use when reviewing PRs or requesting expert feedback." 
+---
+
+# Expert Reviewer
+
+A multi-dimensional review agent for the F# compiler. Evaluates changes across 15 dimensions derived from long-standing review practices in the codebase.
+
+## Overarching Principles
+
+- **Testing is the gating criterion.** No behavioral change merges without a test that exercises it. Missing tests are the single most common review blocker.
+- **Binary compatibility is non-negotiable.** Any change to serialized metadata must preserve forward and backward compatibility across compiler versions. Treat pickled data like a wire protocol.
+- **Determinism is a correctness property.** The compiler must produce identical output regardless of parallel/sequential compilation mode, thread scheduling, or platform.
+- **Feature gating protects users.** New language features ship behind `LanguageFeature` flags and stay off by default until stable. Breaking changes require an RFC.
+- **Diagnostics are user-facing.** Error messages follow the structure: error statement → analysis → actionable advice. Wording changes need the same care as API changes.
+- **IDE responsiveness is a feature.** Every keystroke-triggered operation must be traced and verified not to cause unnecessary reactor work or project rechecks.
+- **Prefer general solutions over special cases.** Do not hardwire specific library optimizations into the compiler. Prefer inlining-based optimizations that apply broadly.
+
+## Review Dimensions
+
+### 1. Test Coverage & Verification
+
+Every behavioral change, bug fix, and new feature requires corresponding tests before merge.
+
+**CHECK:**
+- Verify that every code path added or modified has a test exercising it.
+- Confirm new tests cover behavior not already exercised by existing test suites.
+- Explain all new errors in test baselines and confirm they are expected.
+- Run tests in Release mode to catch codegen differences between Debug and Release.
+- Run tests on both desktop (.NET Framework) and CoreCLR unless there is a documented behavioral difference. +- Verify attribute availability in the target runtime version before writing tests against it. +- Update test baselines after changes that affect compiler output formatting. +- Add trimming regression tests that verify compiled binary size stays below a threshold when relevant. + +**Severity:** Missing tests for behavioral changes → **high**. Missing cross-TFM coverage → **medium**. + +**Hotspots:** `tests/FSharp.Compiler.ComponentTests/`, `tests/FSharp.Compiler.Service.Tests/`, `tests/fsharp/` + +--- + +### 2. Type System Correctness + +Type checking and inference must be sound. Subtle bugs in constraint solving, overload resolution, or scope handling can produce incorrect programs. + +**CHECK:** +- Use `tryTcrefOfAppTy` or the `AppTy` active pattern instead of direct `TType` pattern matches. +- Avoid matching on empty contents as a proxy for signature file presence — use explicit markers. +- Verify nullness errors on return types are intentional and documented. +- Exclude current file signatures from `tcState` when processing implementation files. +- Distinguish initialization-related bugs from recursive type checking bugs when triaging. +- Document that type inference requires analyzing implementation before resolving member types. +- Add error recovery for namespace fragment combination differences between parallel and sequential paths. + +**Severity:** Type system unsoundness → **critical**. Incorrect inference in edge cases → **high**. + +**Hotspots:** `src/Compiler/Checking/`, `src/Compiler/TypedTree/` + +--- + +### 3. Binary Compatibility & Metadata Safety + +The pickled metadata format is a cross-version contract. DLLs compiled with any F# version must be consumable by any other version. + +**CHECK:** +- Never remove, reorder, or reinterpret existing serialized data fields. 
+- Ensure new data is invisible to old readers (stream B with tag-byte detection). +- Exercise old-compiler-reads-new-output and new-compiler-reads-old-output for any metadata change. +- Verify the byte count does not depend on compiler configuration (feature flags, LangVersion, TFM) — only on stream-encoded data. +- Add cross-version compatibility tests when changing anonymous record or constraint emission. +- Ensure new parameters are conditionally passed to preserve existing behavior for unchanged callers. + +**Severity:** Any metadata format breakage → **critical**. Missing compat test → **high**. + +**Hotspots:** `src/Compiler/TypedTree/TypedTreePickle.fs`, `src/Compiler/Driver/CompilerImports.fs` + +*See also: `.github/instructions/TypedTreePickle.instructions.md` for detailed stream alignment rules.* + +--- + +### 4. IL Emission & Debug Correctness + +Code generation must produce correct, verifiable IL. Debug sequence points must enable accurate stepping. + +**CHECK:** +- Test code changes with optimizations both enabled and disabled. +- Strip debug points when matching on `Expr.Lambda` during code generation. +- Check for existing values on the IL stack when debugging codegen issues with debug points. +- Investigate IL diffs between parallel and sequential type checking to identify determinism root causes. +- Verify that newly added code paths are actually reachable before merging. +- Confirm that `Decimal` cases are not already eliminated by `TryEliminateDesugaredConstants` before adding them. +- Cross-reference `GenFieldInit` in `IlxGen.fs` when modifying field initialization in `infos.fs`. +- Manually verify debug stepping for loops, while loops, task code, list/array expressions, and sequence expressions. +- Perform end-to-end debug stepping verification for all affected control flow cases before merge. + +**Severity:** Incorrect IL → **critical**. Debug stepping regression → **high**. Missing IL test → **medium**. 
+ +**Hotspots:** `src/Compiler/CodeGen/`, `src/Compiler/AbstractIL/`, `src/Compiler/Optimize/` + +--- + +### 5. Syntax Tree & Parser Integrity + +AST nodes must accurately represent source code. Parser changes are high-risk because they affect every downstream phase. + +**CHECK:** +- Update all pattern matches on `SynBinding` in tree-walking code when modifying its shape. +- Remove default wildcard patterns in type walkers to catch missing cases at compile time. +- Handle all cases of discriminated unions including spreads in tree walkers. +- Gate parser changes behind the appropriate language version. +- Add new parser cases to existing rules that share the same prefix. +- Visit spread types in `FileContentMapping` to build correct graph edges. +- Assess compound expression compatibility when introducing new syntax to avoid breaking existing code. +- Remove unused AST nodes created for experimental features that were not pursued. + +**Severity:** Incorrect AST node → **critical**. Missing walker case → **high**. Ungated parser change → **high**. + +**Hotspots:** `src/Compiler/SyntaxTree/`, `src/Compiler/pars.fsy` + +--- + +### 6. Parallel Compilation & Determinism + +Parallel type checking must produce bit-identical output to sequential compilation. + +**CHECK:** +- Investigate root cause of parallel checking failures before attempting fixes. +- Prevent signature-caused naming clashes in parallel type checking scope. +- Ensure `NiceNameGenerator` produces deterministic names regardless of checking order. +- Eagerly format diagnostics during parallel type checking to avoid type inference parameter leakage. +- Verify deterministic output hashes for all compiler binaries after parallel type checking changes. +- Use ILDASM to diff binaries when investigating determinism bugs. +- Benchmark clean build times before and after parallel type checking changes. +- Resolve naming clashes between static members and top-level bindings in parallel type checking. 
+ +**Severity:** Non-deterministic output → **critical**. Scoping error → **high**. Missing benchmark → **medium**. + +**Hotspots:** `src/Compiler/Checking/`, `src/Compiler/Driver/` + +--- + +### 7. Concurrency & Cancellation Safety + +Async and concurrent code must handle cancellation, thread safety, and exception propagation correctly. + +**CHECK:** +- Add observable reproduction tests for async cancellation edge cases. +- Preserve stack traces when re-raising exceptions in async control flow. +- Add tests verifying stack trace preservation through async exception handling paths. +- Keep Task-to-Async conversions explicit and searchable in the codebase. +- Check cancellation token status before executing async operations. +- Verify `Cache` replacement preserves exactly-once initialization semantics of `Lazy` wrappers. +- Do not rely on specific thread pool thread assignments in tests. +- Do not swallow `OperationCanceledException` in catch-all handlers. + +**Severity:** Race condition or data corruption → **critical**. Swallowed cancellation → **high**. Missing async test → **medium**. + +**Hotspots:** `src/Compiler/Service/`, `src/Compiler/Facilities/`, `src/FSharp.Core/` + +--- + +### 8. IDE Performance & Reactor Efficiency + +The compiler service must respond to editor keystrokes without unnecessary recomputation. + +**CHECK:** +- Verify that `IncrementalBuilder` caching prevents duplicate builder creation per project. +- Verify that changes do not trigger endless project rechecks by examining reactor traces. +- Analyze reactor trace output to identify unnecessary work triggered by keystrokes. +- Avoid triggering brace matching and navigation bar updates on non-structural keystrokes. +- Verify each diagnostic service makes exactly one request per keystroke. +- Investigate all causes of needless project rebuilding in IDE, not just the first one found. +- Test IDE changes on large solutions before merging. + +**Severity:** Endless recheck loop → **critical**. 
Unnecessary rebuild → **high**. Missing trace verification → **medium**. + +**Hotspots:** `src/Compiler/Service/`, `src/FSharp.Compiler.LanguageServer/` + +--- + +### 9. Editor Integration & UX + +VS extension and editor features must be correct, consistent, and not regress existing workflows. + +**CHECK:** +- Separate legacy project and CPS project modalities clearly before reasoning about changes. +- Revert problematic editor features to preview status when they cause regressions. +- Ship new warnings off-by-default with tests and an IDE CodeFix. +- Use existing compiler helpers for parameter name matching in IDE CodeFixes. +- Fix inconsistent paste indentation behavior that varies by cursor column position. +- Make project cache size a user-configurable setting. +- Use Roslyn mechanisms for analyzer scheduling while tuning delays. + +**Severity:** Editor regression in common workflow → **critical**. Inconsistent behavior → **high**. Missing CodeFix → **medium**. + +**Hotspots:** `vsintegration/`, `src/FSharp.Compiler.LanguageServer/` + +--- + +### 10. Diagnostic Quality + +Error and warning messages are the compiler's user interface. They must be precise, consistent, and actionable. + +**CHECK:** +- Structure error messages as: error statement, then analysis, then advice. +- Only rename error identifiers to reflect actual error message content. +- Format suggestion messages as single-line when `--vserrors` is enabled. +- Use clearer wording in quotation-related error messages. +- Emit a warning rather than silently ignoring unsupported default parameter values. +- Enable unused variable warnings by default in projects to catch common bugs. +- Eagerly format diagnostics at production time to prevent parameter leakage in parallel checking. + +**Severity:** Misleading diagnostic → **high**. Inconsistent format → **medium**. Wording improvement → **low**. + +**Hotspots:** `src/Compiler/FSComp.txt`, `src/Compiler/Checking/` + +--- + +### 11. 
Feature Gating & Compatibility + +New features must be gated behind language version checks. Breaking changes require RFC process. + +**CHECK:** +- Gate new language features behind a `LanguageFeature` flag even if shipped as bug fixes. +- Ship experimental features off-by-default and discuss enablement strategy with stakeholders. +- Factor out cleanup changes separately from feature enablement. +- Require an fslang suggestion and RFC for language additions. +- Assess compound expression compatibility when introducing new syntax. +- Reject changes that alter the C#/.NET visible assembly surface as breaking changes. +- Create an RFC retroactively for breaking changes that were merged without one. +- Treat all parser changes as high-risk and review thoroughly. + +**Severity:** Ungated breaking change → **critical**. Missing RFC → **high**. Bundled cleanup+feature → **medium**. + +**Hotspots:** `src/Compiler/Checking/`, `src/Compiler/SyntaxTree/`, `src/Compiler/Facilities/LanguageFeatures.fs` + +--- + +### 12. Idiomatic F# & Naming + +Compiler code should follow F# idioms and use clear, consistent terminology for internal concepts. + +**CHECK:** +- Use 4-space indent before pipe characters in compiler code. +- Use `let mutable` instead of `ref` cells for mutable state. +- Use `ResizeArray` type alias instead of `System.Collections.Generic.List<_>`. +- Use pipeline operators for clearer data flow in service code. +- Shadow variables when rebinding to improve code clarity. +- Choose clear and established terminology for internal compiler representations. +- Review terminology holistically before renaming within optimization phases. +- Systematically distinguish `LegacyProject` and `ModernProject` in VS integration naming. + +**Severity:** Deprecated construct → **medium**. Naming confusion → **medium**. Style deviation → **low**. + +**Hotspots:** All `src/Compiler/` directories, `vsintegration/` + +--- + +### 13. 
Code Structure & Technical Debt + +Keep the codebase clean. Extract helpers, remove dead code, and avoid ad-hoc patches. + +**CHECK:** +- Remove default wildcard patterns in discriminated union matches to catch missing cases at compile time. +- Extract duplicated logic into a shared function with input arguments. +- Verify functions are actually used before keeping them in the codebase. +- Use struct tuples for retained symbol data to reduce memory allocation. +- Use `ConditionalWeakTable` for caches keyed by GC-collected objects. +- Remove duplicate helpers and consolidate to a single shared implementation. +- Fix compiler warnings like unused value bindings before merging. +- Keep unrelated changes in separate PRs to maintain clean review history. +- Follow existing abstraction patterns (e.g., `HasFSharpAttribute`) instead of ad-hoc checks. + +**Severity:** Unreachable code path → **high**. Duplication → **medium**. Missing helper extraction → **low**. + +**Hotspots:** `src/Compiler/Checking/`, `src/Compiler/Optimize/`, `src/Compiler/Service/` + +--- + +### 14. API Surface & Public Contracts + +FSharp.Core and FCS public APIs are permanent commitments. Changes must be deliberate. + +**CHECK:** +- Reject changes that alter the C#/.NET visible assembly surface as breaking changes. +- Verify accuracy of IL-to-expression inversion and add systematic tests for quoted expression decompilation. +- Ensure decompiled expression trees would pass type checking when reconverted to code. +- Fix the root cause in `ConvExprPrim` rather than patching individual expression decompilation cases. +- Document the purpose of new public API arguments in XML docs. +- Update exception XML docs in `.fsi` files when behavior changes. +- Name potentially long-running FCS API methods with descriptive prefixes like `GetOptimized`. +- Systematically apply `InlineIfLambda` to every inlined function taking a lambda applied only once. + +**Severity:** Unintended public API break → **critical**. 
Missing XML docs → **medium**. Naming convention → **low**. + +**Hotspots:** `src/FSharp.Core/`, `src/Compiler/Service/` + +--- + +### 15. Build & Packaging Integrity + +Build configuration and NuGet packaging must be correct, reproducible, and not introduce unnecessary dependencies. + +**CHECK:** +- Avoid packaging internal files as content files in NuGet packages. +- Place non-library binaries in `tools/` directory rather than `lib/` in NuGet packages. +- Remove incorrect package dependencies from FSharp.Core nuspec. +- Be mindful of package size impact when adding dependencies to core assemblies. +- Run `build proto` or `git clean -xfd` when encountering stale build state issues. +- Preserve parallel build settings unless there is a documented reason to remove them. +- Verify nuspec comments match the correct package version. +- Fix dynamic loader to handle transitive native references for NuGet packages in FSI. + +**Severity:** Wrong package layout → **high**. Stale build state → **medium**. Missing comment → **low**. + +**Hotspots:** `eng/`, `setup/`, `src/Compiler/Driver/` + +--- + +## Review Workflow + +Execute review in five waves, each building on the previous. + +### Wave 0: Orientation + +1. Read the PR title, description, and linked issues. +2. Identify which dimensions are relevant based on files changed. +3. Use the hotspot table below to prioritize dimensions. +4. Check if existing instructions files apply (`.github/instructions/`). + +### Wave 1: Structural Scan + +1. Verify every behavioral change has a corresponding test (Dimension 1). +2. Check feature gating — new features must have `LanguageFeature` guards (Dimension 11). +3. Verify no unintended public API changes (Dimension 14). +4. Check for binary compatibility concerns in pickle/import code (Dimension 3). + +### Wave 2: Correctness Deep-Dive + +1. Trace type checking changes through constraint solving and inference (Dimension 2). +2. 
Verify IL emission correctness with both Debug and Release optimizations (Dimension 4). +3. Validate AST node accuracy against source syntax (Dimension 5). +4. Check parallel determinism if checking/name-generation code is touched (Dimension 6). + +### Wave 3: Runtime & Integration + +1. Verify concurrency safety — no races, proper cancellation, stack traces preserved (Dimension 7). +2. Check IDE reactor impact — no unnecessary rechecks or keystroke-triggered rebuilds (Dimension 8). +3. Validate editor feature behavior across project types (Dimension 9). +4. Review diagnostic message quality (Dimension 10). + +### Wave 4: Quality & Polish + +1. Check F# idiom adherence and naming consistency (Dimension 12). +2. Look for code structure improvements — dead code, duplication, missing abstractions (Dimension 13). +3. Verify build and packaging correctness (Dimension 15). +4. Confirm all test baselines are updated and explained. + +## Folder Hotspot Mapping + +| Directory | Primary Dimensions | Secondary Dimensions | +|---|---|---| +| `src/Compiler/Checking/` | Type System (2), Parallel Compilation (6), Feature Gating (11) | Diagnostics (10), Code Structure (13) | +| `src/Compiler/CodeGen/` | IL Emission (4) | Test Coverage (1), Debug Correctness (4) | +| `src/Compiler/SyntaxTree/` | Parser Integrity (5) | Feature Gating (11), Code Structure (13) | +| `src/Compiler/TypedTree/` | Binary Compatibility (3), Type System (2) | Parallel Compilation (6) | +| `src/Compiler/Optimize/` | IL Emission (4), Code Structure (13) | Test Coverage (1) | +| `src/Compiler/AbstractIL/` | IL Emission (4) | Debug Correctness (4) | +| `src/Compiler/Driver/` | Binary Compatibility (3), Build (15) | Parallel Compilation (6) | +| `src/Compiler/Service/` | IDE Performance (8), Concurrency (7) | API Surface (14) | +| `src/Compiler/Facilities/` | Feature Gating (11), Concurrency (7) | Naming (12) | +| `src/Compiler/FSComp.txt` | Diagnostic Quality (10) | — | +| `src/FSharp.Core/` | API Surface (14), 
Concurrency (7) | Idiomatic F# (12) | +| `src/FSharp.Compiler.LanguageServer/` | IDE Performance (8), Editor UX (9) | Concurrency (7) | +| `vsintegration/` | Editor Integration (9) | IDE Performance (8), Naming (12) | +| `tests/` | Test Coverage (1) | All dimensions | +| `eng/`, `setup/` | Build & Packaging (15) | — | diff --git a/.github/agents/extraction-pipeline.md b/.github/agents/extraction-pipeline.md new file mode 100644 index 00000000000..e8e06b23c20 --- /dev/null +++ b/.github/agents/extraction-pipeline.md @@ -0,0 +1,385 @@ +--- +name: extraction-pipeline +description: "Extracts review expertise from a GitHub user's history and generates Copilot instructions, skills, and a review agent. Invoke when setting up expert review capabilities for a repository based on a specific reviewer's historical feedback patterns." +--- + +# Expert Reviewer Extraction Pipeline + +Generate folder-scoped instructions, topic-scoped skills, and a multi-dimensional review agent from a GitHub user's public review history. Produces anonymized, deduplicated, Copilot-compatible `.github/` artifacts. + +## Scale Warning + +This pipeline processes **thousands** of GitHub items (typically 3,000–10,000+ issues, PRs, discussions, and review comments spanning a decade). It will not fit in a single context window. + +**Use sub-agents for everything.** The orchestrator manages SQLite state and dispatches work. Sub-agents do the heavy lifting: +- **Collection**: one sub-agent per repo × date range chunk. Parallelize aggressively (6+ concurrent agents). +- **Comment fetching**: one sub-agent per ~200 items. This is the most API-call-intensive phase. +- **Semantic analysis**: one sub-agent per ~200 comments. Each classifies and extracts rules. +- **Synthesis**: one sub-agent to read all analysis summaries (not raw data) and produce dimensions/principles. +- **Artifact generation**: one sub-agent per artifact type (instructions, skills, agent). 
+- **Validation/improvement**: one sub-agent per file or per concern. + +Store all intermediate results in **SQLite** (queryable) and **JSON backup files** (recoverable). Sub-agents write results to files; the orchestrator imports into SQLite and dispatches next phase. Never rely on passing large datasets through context — use the filesystem. + +Use `model: "claude-opus-4.6"` for all sub-agents. Use background mode (`mode: "background"`) so agents run in parallel and the orchestrator is notified on completion. + +## Inputs + +Collect these before starting. If any are missing, ask the user via the `ask_user` tool. + +| Parameter | Required | Example | +|-----------|----------|---------| +| `landing_repo` | yes | `dotnet/fsharp` — the repo receiving the artifacts | +| `username` | yes | GitHub username whose review history to extract | +| `earliest_year` | no (default: 10 years back) | `2015` | +| `reference_repos` | no | Additional repos to search (e.g., `dotnet/sdk`, `dotnet/runtime`) | +| `agent_name` | no (default: `expert-reviewer`) | Name for the review agent and skill | +| `skill_trigger` | no (default: auto-derived) | Keywords that trigger the review skill | + +### Prerequisites + +The landing repo must be checked out locally — the pipeline searches its directory structure, verifies file paths, reads existing docs, and validates generated artifacts against the actual codebase. If not checked out, clone it first: + +```bash +gh repo clone {landing_repo} +``` + +If `reference_repos` are specified and the pipeline needs to search their code (e.g., for cross-repo integration patterns), check those out as sibling directories. + +--- + +## Phase 1: Data Collection + +### 1.1 Index all activity + +Search each repo for issues, PRs, and discussions where `username` participated. GitHub search returns max 1000 results per query — split by date ranges to capture everything. 
+ +For each repo: +``` +search_issues: commenter:{username} created:{year_start}..{year_end} +search_pull_requests: commenter:{username} created:{year_start}..{year_end} +``` + +For discussions (if the repo uses them), use the GitHub GraphQL API: +```graphql +query { + search(query: "repo:{owner}/{repo} commenter:{username} type:discussion", type: DISCUSSION, first: 100) { + nodes { ... on Discussion { number title body createdAt url category { name } } } + } +} +``` + +Store in SQLite (`gh_activity` table): repo, type (issue/pr/discussion), number, title, state, created_at, updated_at, labels, url, author. + +Parallelize across repos and date ranges. Use sub-agents for large volumes. + +### 1.2 Collect actual comments + +For each indexed item, fetch the user's actual comments: +- **Issues**: `issue_read` → `get_comments` → filter to username +- **PRs — general comments**: `pull_request_read` → `get_comments` → filter to username +- **PRs — review comments** (code-level, with file path + diff hunk): `pull_request_read` → `get_review_comments` → filter to username +- **PRs — reviews** (approval/request-changes with summary body): `pull_request_read` → `get_reviews` → filter to username. These carry the reviewer's top-level verdict and summary — often the most opinionated content. +- **Discussions**: Use GraphQL to fetch comment nodes filtered to username. Discussion comments often contain design rationale and architectural decisions. + +Store in SQLite (`user_comments` table): comment_id, activity_id, repo, comment_type (issue_comment, review_comment, pr_comment, review_summary, discussion_comment), body, created_at, file_path, diff_hunk, url. + +This is the most API-intensive phase. Batch into sub-agents by date range. Handle rate limits with retry. + +### 1.3 Collect PR context + +For PRs with review comments, also collect: +- Files changed (`get_files`): path, additions, deletions, status +- PR labels and description + +This maps comments to code areas. 
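The comment-storage step in 1.2 can be sketched as a small Python helper. This is a minimal illustration, not part of the pipeline spec: the `user_comments` schema follows the section above, but the fetched-comment dictionaries are a made-up stub shape standing in for the MCP fetch results.

```python
import sqlite3

def store_user_comments(db_path, username, fetched):
    """Persist only the target user's comments, per the user_comments schema (1.2)."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS user_comments (
               comment_id INTEGER PRIMARY KEY,
               activity_id INTEGER,
               repo TEXT,
               comment_type TEXT,
               body TEXT,
               created_at TEXT,
               file_path TEXT,
               diff_hunk TEXT,
               url TEXT)"""
    )
    rows = [
        (c["id"], c["activity_id"], c["repo"], c["comment_type"], c["body"],
         c["created_at"], c.get("file_path"), c.get("diff_hunk"), c["url"])
        for c in fetched
        if c["author"] == username  # the "filter to username" step
    ]
    # INSERT OR REPLACE keeps re-runs idempotent when a sub-agent retries a batch.
    con.executemany(
        "INSERT OR REPLACE INTO user_comments VALUES (?,?,?,?,?,?,?,?,?)", rows
    )
    con.commit()
    return con.execute("SELECT COUNT(*) FROM user_comments").fetchone()[0]
```

Call it once per fetch batch; the orchestrator only needs the returned row count to track progress.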
+
+### 1.4 Cross-validate against current codebase
+
+Collected data references files and folders as they existed at the time of the comment — migrations and refactorings happen. Before enrichment, reconcile all file paths:
+
+1. Extract all unique file paths from collected comments (review comments have `file_path`, PR files have `path`).
+2. For each path, check if it exists in the current repo (`Test-Path` or `glob`).
+3. If missing, search the repo for the filename to locate where it lives now (files get moved between folders). Update the path if found.
+4. If the file was deleted entirely, keep the comment's essence (the rule it teaches) but drop the file pointer. The rule may still apply to successor code.
+
+This prevents generating instructions that point at nonexistent files.
+
+### 1.5 Backup
+
+Write all collected data as JSON to a backup directory (e.g., `{landing_repo}-analysis/`). The SQLite database is the working copy; JSON is the safety net.
+
+---
+
+## Phase 2: Data Enrichment and Catalogization
+
+### 2.1 Study the landing repo
+
+Before analyzing comments, understand the codebase:
+- Directory structure → feature area mapping
+- Existing documentation (specs, wiki, guides)
+- Existing `.github/` artifacts (instructions, skills, agents, copilot-instructions.md, AGENTS.md)
+- Technology stack, conventions, key files
+
+Store as a feature area reference table in SQLite.
+
+### 2.2 Semantic analysis
+
+For each collected comment, classify using a sub-agent (Opus). **Do not use a hardcoded category list** — derive categories from the data:
+
+1. **Bootstrap pass**: Take a random sample of ~200 comments. Ask a sub-agent to read them and propose a category taxonomy that fits this specific reviewer and codebase. The agent should identify recurring themes, name them, and define each in one sentence. Expect 15–40 categories to emerge.
+
+2. **Classification pass**: Using the derived taxonomy, classify all comments in batches (~200 per sub-agent).
For each comment extract: + - **Categories** (one or more, from the derived taxonomy) + - **Feature area**: map to the landing repo's code structure (from 2.1) + - **File/folder**: which code path does this apply to + - **Sentiment**: approval, concern, suggestion, question, blocking + - **Severity**: trivial, minor, moderate, major, critical + - **Focus point**: what specifically is being addressed + - **Derived rule**: actionable rule extracted from the comment + +3. **Taxonomy refinement**: After the first full pass, review category distribution. Merge categories with <5 occurrences into broader ones. Split categories with >500 occurrences if they contain distinct sub-themes. Re-classify affected comments. + +Store in SQLite (`comment_analysis` table). + +Process in batches. Use sub-agents — each handles ~200 comments. Run in parallel. + +### 2.3 Clustering + +Aggregate analysis results to identify: + +1. **Review dimensions**: Recurring themes across hundreds of comments. Each dimension should be specific enough to act on, broad enough to apply across many PRs. Target 8–24 dimensions. + +2. **Folder hotspots**: Which directories receive the most review feedback, and which dimensions apply there. + +3. **Overarching principles**: Cross-cutting rules that apply everywhere. + +4. **Repo-specific knowledge**: Rules that are unique to this codebase, not generic programming advice. 
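Steps 1 through 3 above can be sketched in Python. This is a minimal illustration only: the row shape is an assumed mirror of the `comment_analysis` table, and the merge floor of five occurrences follows the taxonomy-refinement rule in 2.2.

```python
from collections import Counter

def cluster(analysis_rows, floor=5):
    """Aggregate per-comment analysis into dimension candidates and folder hotspots.

    Each row is assumed to carry a list of categories and an optional file_path.
    Categories below the merge floor are flagged for merging rather than kept
    as standalone dimensions.
    """
    counts = Counter(cat for row in analysis_rows for cat in row["categories"])
    dimensions = {cat: n for cat, n in counts.items() if n >= floor}
    to_merge = sorted(cat for cat, n in counts.items() if n < floor)
    # A hotspot key is the leading two path segments of the file a comment targeted.
    hotspots = Counter(
        "/".join(row["file_path"].split("/")[:2]) + "/"
        for row in analysis_rows
        if row.get("file_path")
    )
    return dimensions, to_merge, hotspots
```

Hotspot keys such as `src/Compiler/` then feed directly into the folder-to-dimension mapping.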
+ +Use a synthesis sub-agent (Opus) that reads all analysis summaries and produces: +- Dimension list with rules, severity, evidence +- Folder → dimension mapping +- Principle list +- Knowledge area reference table + +--- + +## Phase 3: Artifact Generation + +### 3.1 Raw creation + +Generate three artifact types: + +#### Instructions (`.github/instructions/*.instructions.md`) +- One per major code folder/area +- YAML frontmatter with `applyTo` glob matching existing folders +- Content: folder-specific rules derived from review feedback for that area +- Concise (under 60 lines) — these load on every edit in scope +- Reference docs, don't reproduce them +- Do NOT duplicate AGENTS.md or copilot-instructions.md + +#### Skills (`.github/skills/*/SKILL.md`) +- One per overarching topic that doesn't map to a single folder +- YAML frontmatter: `name` (gerund form, lowercase+hyphens, ≤64 chars), `description` (third person, ≤1024 chars, describes WHAT and WHEN — this is the discovery trigger) +- Content: decision frameworks, checklists, rules, examples +- Under 500 lines — use progressive disclosure for longer content +- Reference docs, don't reproduce them + +#### Review Agent (`.github/agents/{agent_name}.md`) +- Single source of truth for the review methodology +- Contains: overarching principles, all dimensions inline (with rules + CHECK flags), folder hotspot mapping, review workflow +- The review workflow is 5 waves (see below) + +**Commit** after raw creation. + +### 3.2 Anonymize + +Remove all personal names, comment counts, PR number references, evidence statistics, "distilled from" language. The artifacts should read as authoritative engineering guidance, not data analysis output. + +**Commit** after anonymization. 
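A scrub pass for 3.2 might look like the following sketch. The regexes are illustrative rather than exhaustive, and the handle and numbers in the example are invented.

```python
import re

def anonymize(text):
    """Remove provenance markers: @mentions, PR/issue references, and
    'distilled from N comments' evidence statistics (patterns illustrative)."""
    text = re.sub(r"@[A-Za-z0-9-]+", "a reviewer", text)          # personal handles
    text = re.sub(r"\s*\(#\d+\)|#\d+\b", "", text)                # PR/issue numbers
    text = re.sub(r"[Dd]istilled from [\d,]+ (?:review )?comments?",
                  "derived from review history", text)            # evidence counts
    return re.sub(r" {2,}", " ", text).strip()
```

Run it over every generated artifact before the anonymization commit, then spot-check the diff by hand.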
+ +### 3.3 Improve per Anthropic guide + +Apply https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices: +- `name`: gerund form, lowercase+hyphens +- `description`: third person, specific triggers, ≤1024 chars +- Concise — only add what the model doesn't already know +- No time-sensitive information +- Consistent terminology +- Progressive disclosure (SKILL.md as overview, reference files for detail) +- One level deep references only + +**Commit** after improvements. + +### 3.4 Deduplicate and cross-reference + +Compare new artifacts against existing `.github/` content: +- Check trigger overlap between new and existing skills +- Check body overlap (same howto in two places) +- Resolve: if complementary (how vs when), add cross-references. If duplicate, merge or delete. +- Instructions must not repeat AGENTS.md or copilot-instructions.md +- All doc links verified to exist on disk + +**Commit** after deduplication. + +--- + +## Phase 4: Review Workflow (embedded in the agent) + +The review agent runs a 5-wave process when invoked: + +### Wave 0: Build Review Briefing Pack + +Before any dimension analysis, assemble the full context. Sub-agents reviewing code without context hallucinate or miss important design intent. + +**Collect:** +- **PR metadata**: title, description, author, labels, linked issues/specs +- **Existing PR comments and reviews**: what has already been discussed — don't duplicate existing feedback +- **Referenced issues and design documents**: if the PR links to a spec or issue, read them for design intent +- **Changed files list**: `pull_request_read` → `get_files` for paths, additions, deletions + +**Compute the correct diff:** + +The PR diff must reflect only the PR's own changes — not unrelated commits on `main` since the branch was created. Agents often get this wrong (e.g., they see "deletions" that are actually new `main` commits not in the branch). 
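The distinction can be made concrete on a toy commit graph. This is a hypothetical sketch of the set arithmetic involved; real tooling walks the actual git DAG rather than a hand-built dictionary:

```python
def ancestors(commit, parents):
    """All commits reachable from `commit` via parent links, inclusive."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

# Toy history: the PR branch forked from main at B; main then gained C and D.
#   A -- B -- C -- D        (main)
#         \
#          F1 -- F2         (PR branch)
parents = {"B": ["A"], "C": ["B"], "D": ["C"], "F1": ["B"], "F2": ["F1"]}
main_tip, branch_tip = "D", "F2"

# The PR diff must cover only commits unique to the branch; C and D's
# changes (new on main) must not appear in it.
pr_commits = ancestors(branch_tip, parents) - ancestors(main_tip, parents)
```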
+
+Use `gh` CLI — it computes the diff correctly against the merge base:
+```bash
+# Correct diff via gh CLI (uses GitHub's merge-base computation)
+gh pr diff {pr_number} --repo {owner}/{repo}
+
+# Or via API (same correct merge-base diff)
+gh api repos/{owner}/{repo}/pulls/{pr_number} --jq '.diff_url' | xargs curl -sL
+```
+
+Alternatively, use the MCP tool `pull_request_read` → `get_diff`, which returns the same merge-base diff.
+
+**Do NOT use raw `git diff main..branch`** — the two-dot form diffs against the tip of `main`, so it includes unrelated `main` commits and produces a wrong diff. If you need a local git equivalent, use the three-dot form `git diff main...branch`, which diffs against the merge base.
+
+**Save the briefing pack** to a file (e.g., `pr-{number}-briefing.md`). Every Wave 1 sub-agent receives this file as context.
+
+### Wave 1: Find
+
+Launch **one sub-agent per dimension** (parallel batches of 6). Each evaluates exactly one dimension against the PR diff.
+
+Sub-agent instructions:
+
+> Report `$DimensionName — LGTM` when the dimension is genuinely clean.
+>
+> Report an ISSUE only when you can construct a **concrete failing scenario**: a specific input, a specific call sequence, a specific state that triggers the bug. No hypotheticals.
+>
+> Read the **PR diff**, not main — new files only exist in the PR branch.
+>
+> Include the exact file path and line range. Verify by tracing actual code flow.
+
+### Wave 2: Validate
+
+For each non-LGTM finding, actively prove or disprove it:
+
+- **Code flow tracing**: Read full source from the PR branch (`refs/pull/{pr}/head`). Trace callers, callees, state mutations, error paths.
+- **Write and run tests for claims**: Write a minimal test that demonstrates the claimed issue. Run it against the PR branch. If the test fails as predicted → confirmed. If it passes → disputed.
+- **Proof-of-concept snippet**: When a full test is too complex to run inline, write pseudocode or partial code demonstrating the issue. Include it in the PR feedback as evidence — enough for another engineer to implement the fix.
+- **Scenario simulation**: For complex issues (concurrency, state machines, protocol interactions), write a step-by-step execution trace showing how the bug manifests.
+- **Multi-model consensus**: For borderline findings, validate with 3 models (Opus, Codex, Gemini). Keep findings confirmed by ≥2/3.
+
+A finding is confirmed only with concrete evidence. Never validate against `main` — PR code only exists in the PR branch.
+
+### Wave 3: Post
+
+Post confirmed findings as inline review comments at the exact file:line via GitHub CLI or MCP tools:
+
+````markdown
+**[$SEVERITY] $DimensionName**
+
+$Concrete scenario that triggers the bug.
+
+**Execution trace:** (when helpful)
+Step 1: caller invokes X with input Y (line N)
+Step 2: control reaches Z without validation (line M) ← bug
+
+**Proof-of-concept test:**
+```fsharp
+[<Fact>]
+let ``Scenario demonstrates issue`` () = ...
+```
+
+**Recommendation:** $Fix.
+````
+
+Post design-level concerns (not tied to a line) as a single PR comment — one bullet each.
+
+### Wave 4: Summary
+
+Post a dimension checkbox table as the review body:
+
+```markdown
+| # | Dimension | Verdict |
+|---|-----------|---------|
+| 1 | Dimension Name | ✅ LGTM |
+| 2 | Another | ⚠️ 1 MAJOR |
+
+- [x] Dimension Name
+- [ ] Another — description of issue
+```
+
+`[x]` = LGTM or NITs only. `[ ]` = MAJOR or BLOCKING.
+All `[x]` → APPROVE. Any BLOCKING → REQUEST_CHANGES. Otherwise → COMMENT.
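The verdict rules above reduce to a few lines. This is a sketch only; the severity labels follow the text, but the function name and input shape are assumptions:

```python
def review_verdict(findings_by_dimension):
    """Map per-dimension finding severities to a review event:
    any BLOCKING -> REQUEST_CHANGES; clean or NITs only -> APPROVE;
    anything in between (e.g. MAJOR) -> COMMENT."""
    severities = {s for sevs in findings_by_dimension.values() for s in sevs}
    if "BLOCKING" in severities:
        return "REQUEST_CHANGES"
    if severities <= {"NIT"}:  # empty set (all clean) also passes
        return "APPROVE"
    return "COMMENT"
```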
+ +--- + +## Phase 5: Final Quality Gate + +### 5.1 Anthropic guide compliance + +Verify all artifacts against best practices: +- YAML frontmatter: name (gerund, ≤64), description (third person, ≤1024, triggers) +- No verbose explanations (the model is smart) +- No time-sensitive info +- Consistent terminology +- Progressive disclosure respected + +### 5.2 Flow coherence + +Verify the three layers work together: +- Instructions trigger on file edits → folder-specific rules +- Skills trigger on topic keywords → overarching guidance +- Agent triggers on `@{agent_name}` → full review workflow +- No concept explained in two places +- Skills point to agent for review, not duplicate it +- Instructions don't repeat skills or AGENTS.md + +### 5.3 Link and path verification + +- All `applyTo` globs match existing folders (`Test-Path`) +- All relative doc links resolve to existing files +- No stale references to deleted files + +### 5.4 Anonymization verification + +- Zero occurrences of: the username, full name, comment counts, PR numbers, "distilled from", "extracted from", evidence statistics +- Content reads as authoritative guidance, not analysis output + +### 5.5 Deduplication verification + +- No trigger overlap between skills (unless cross-referenced as complementary) +- No body overlap between instructions and AGENTS.md/copilot-instructions.md +- Agent doesn't repeat AGENTS.md content + +--- + +## Lessons Learned (encoded in this process) + +Failure modes observed during development. The process above accounts for them, but they're listed for awareness: + +1. **Nodder bias**: Telling sub-agents "LGTM is the best outcome" caused them to explain away real issues. The correct framing: "LGTM when genuinely clean. Do not explain away real issues." + +2. **Nitpicker bias**: Without LGTM guidance, sub-agents generated 25+ findings including hypotheticals. The fix: require concrete failing scenarios — no "maybe in theory." + +3. 
**Wrong branch verification**: Verification agents checked `main` instead of the PR branch, disputing real findings because new files didn't exist on `main`. The fix: always verify against PR diff or `refs/pull/{pr}/head`. + +4. **Static-only analysis**: Reading code without tracing execution missed whether issues were real. The fix: Wave 2 requires active validation — build, test, write PoCs, simulate execution traces. + +5. **Duplicate skills**: Two skills covering the same topic emerged. The fix: single source of truth in one file; others are slim pointers. + +6. **Separate reference files get skipped**: Content in a separate file was not reliably read by the model. The fix: inline critical content in the agent file — it's loaded on invocation, guaranteed to be read. + +7. **Description is everything for discovery**: The YAML `description` field is how the model picks from 100+ skills. Invest in keyword-rich, third-person, specific trigger descriptions. diff --git a/.github/instructions/ExpertReview.instructions.md b/.github/instructions/ExpertReview.instructions.md new file mode 100644 index 00000000000..507a13e0c26 --- /dev/null +++ b/.github/instructions/ExpertReview.instructions.md @@ -0,0 +1,31 @@ +--- +applyTo: + - "src/Compiler/**/*.{fs,fsi}" +--- + +# Compiler Review Rules + +## Test discipline + +- Add tests for every behavioral change before merging — missing coverage is the most common review blocker. +- Run tests in both Debug and Release mode to catch optimization-dependent codegen differences. +- Explain all new errors in test baselines and confirm each is expected behavior, not a regression. + +## Type safety + +- Use `tryTcrefOfAppTy` or the `AppTy` active pattern instead of direct `TType` pattern matches — direct matches miss erased types. +- Remove default wildcard patterns in discriminated union matches so the compiler catches missing cases. 
+ +## Feature gating + +- Gate every new language feature behind a `LanguageFeature` flag and ship off-by-default until stable. +- Factor cleanup changes into separate commits from feature enablement. + +## Parallel compilation + +- Eagerly format diagnostics at production time to prevent type inference parameter leakage across files. +- Verify deterministic output by diffing binaries between parallel and sequential type checking. + +## Code generation + +- Strip debug points when matching on `Expr.Lambda` during code generation to prevent IL stack corruption. diff --git a/.github/skills/expert-review/SKILL.md b/.github/skills/expert-review/SKILL.md new file mode 100644 index 00000000000..ea706287c82 --- /dev/null +++ b/.github/skills/expert-review/SKILL.md @@ -0,0 +1,52 @@ +--- +name: expert-review +description: "Performs multi-dimensional expert review of F# compiler PRs. Evaluates type checking correctness, IL emission, binary compatibility, AST accuracy, IDE performance, test coverage, and code quality. Invoke when reviewing code changes, checking PR quality, or requesting thorough feedback on compiler modifications." +--- + +# Expert Review + +Performs a thorough, multi-dimensional review of F# compiler changes. The full review logic lives in `.github/agents/expert-reviewer.md` — invoke that agent for the actual review. + +## When to Invoke + +- PR touches `src/Compiler/` — invoke the full agent +- PR touches `src/FSharp.Core/` — invoke with focus on API Surface & Concurrency +- PR touches `vsintegration/` or LanguageServer — invoke with focus on IDE Performance & Editor UX +- PR touches `tests/` only — quick check: are baselines explained? Cross-TFM coverage present? +- Pre-merge self-review — use the quick checklist below + +## Dimension Selection + +Not every PR needs all 15 dimensions. 
Select based on changed files: + +| Files Changed | Focus Dimensions | +|---|---| +| `Checking/`, `TypedTree/` | Type System, Parallel Compilation, Feature Gating | +| `CodeGen/`, `AbstractIL/`, `Optimize/` | IL Emission, Debug Correctness, Test Coverage | +| `SyntaxTree/`, `pars.fsy` | Parser Integrity, Feature Gating | +| `TypedTreePickle.*`, `CompilerImports.*` | Binary Compatibility (highest priority) | +| `Service/`, `LanguageServer/` | IDE Performance, Concurrency | +| `FSComp.txt` | Diagnostic Quality | +| `FSharp.Core/` | API Surface, Concurrency | +| `vsintegration/` | Editor Integration | + +## Quick Self-Review Checklist + +Before requesting a full agent review, verify: + +1. [ ] Every behavioral change has a test +2. [ ] Test baselines are updated with explanations for new errors +3. [ ] New language features have a `LanguageFeature` guard +4. [ ] No unintended public API surface changes +5. [ ] Cleanup changes are separate from feature enablement +6. [ ] Compiler warnings are resolved +7. [ ] Tests run in both Debug and Release where relevant +8. [ ] Error messages follow: statement → analysis → advice + +## Invocation + +``` +Invoke the expert-reviewer agent to review this PR. +``` + +The agent executes a 5-wave review workflow (Orientation → Structural → Correctness → Integration → Quality) and reports findings by severity. 
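The file-to-dimension table above amounts to glob matching over changed paths. The sketch below shows only a few rows, and the exact patterns are illustrative assumptions:

```python
from fnmatch import fnmatch

DIMENSION_MAP = {
    "src/Compiler/Checking/*": ["Type System", "Parallel Compilation", "Feature Gating"],
    "src/Compiler/CodeGen/*": ["IL Emission", "Debug Correctness", "Test Coverage"],
    "src/Compiler/Service/*": ["IDE Performance", "Concurrency"],
    "src/FSharp.Core/*": ["API Surface", "Concurrency"],
    "vsintegration/*": ["Editor Integration"],
}

def select_dimensions(changed_files):
    """Union of focus dimensions for the paths a PR touches."""
    selected = set()
    for path in changed_files:
        for pattern, dims in DIMENSION_MAP.items():
            if fnmatch(path, pattern):
                selected.update(dims)
    return sorted(selected)
```

A PR touching several areas simply accumulates the union of their focus dimensions.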
From 21dfcde703102f983bf7a4101c94c0fe1561a3dc Mon Sep 17 00:00:00 2001 From: Tomas Grosup Date: Tue, 17 Mar 2026 17:25:37 +0100 Subject: [PATCH 2/6] Apply Anthropic skill best practices and add cross-references - Skill name: gerund form (reviewing-compiler-prs) - Description: third person, keyword-rich, specific triggers - Removed redundant prose (Claude already knows) - Added cross-references to complementary skills (review-council, hypothesis-driven-debugging, ilverify-failure, vsintegration-ide-debugging) - Slim pointer pattern: skill is overview, agent is source of truth Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/agents/expert-reviewer.md | 6 ++-- .github/skills/expert-review/SKILL.md | 40 ++++++++++----------------- 2 files changed, 19 insertions(+), 27 deletions(-) diff --git a/.github/agents/expert-reviewer.md b/.github/agents/expert-reviewer.md index df61e30ba10..6db222cdec6 100644 --- a/.github/agents/expert-reviewer.md +++ b/.github/agents/expert-reviewer.md @@ -1,11 +1,13 @@ --- name: expert-reviewer -description: "Multi-dimensional code review agent for F# compiler PRs. Invoke for thorough review of compiler changes, type checking, IL emission, AST correctness, binary compatibility, and IDE integration. Use when reviewing PRs or requesting expert feedback." +description: "Multi-dimensional code review agent for F# compiler PRs. Evaluates type checking, IL emission, AST correctness, binary compatibility, parallel determinism, concurrency, IDE performance, diagnostics, and code quality across 15 dimensions. Invoke when reviewing compiler changes, requesting expert feedback, or performing pre-merge quality checks." --- # Expert Reviewer -A multi-dimensional review agent for the F# compiler. Evaluates changes across 15 dimensions derived from long-standing review practices in the codebase. +Evaluates F# compiler changes across 15 dimensions. 
Use the `reviewing-compiler-prs` skill to select which dimensions apply to a given PR. + +**Related tools:** `review-council` (generic multi-model review), `hypothesis-driven-debugging` (investigating failures found during review), `ilverify-failure` (fixing IL verification issues), `vsintegration-ide-debugging` (fixing IDE debugging issues). ## Overarching Principles diff --git a/.github/skills/expert-review/SKILL.md b/.github/skills/expert-review/SKILL.md index ea706287c82..aff40ccfce4 100644 --- a/.github/skills/expert-review/SKILL.md +++ b/.github/skills/expert-review/SKILL.md @@ -1,24 +1,24 @@ --- -name: expert-review -description: "Performs multi-dimensional expert review of F# compiler PRs. Evaluates type checking correctness, IL emission, binary compatibility, AST accuracy, IDE performance, test coverage, and code quality. Invoke when reviewing code changes, checking PR quality, or requesting thorough feedback on compiler modifications." +name: reviewing-compiler-prs +description: "Performs multi-dimensional expert review of F# compiler PRs across 15 dimensions: type checking, IL emission, binary compatibility, AST correctness, parallel determinism, concurrency safety, IDE performance, diagnostics, feature gating, and code quality. Invoke when reviewing compiler changes, requesting pre-merge feedback, or checking PR quality against established review standards." --- # Expert Review -Performs a thorough, multi-dimensional review of F# compiler changes. The full review logic lives in `.github/agents/expert-reviewer.md` — invoke that agent for the actual review. +Full review logic lives in the `expert-reviewer` agent. This skill provides dimension selection and a self-review checklist. + +For generic multi-model review (not F#-specific), see the `review-council` skill instead. 
## When to Invoke -- PR touches `src/Compiler/` — invoke the full agent -- PR touches `src/FSharp.Core/` — invoke with focus on API Surface & Concurrency -- PR touches `vsintegration/` or LanguageServer — invoke with focus on IDE Performance & Editor UX -- PR touches `tests/` only — quick check: are baselines explained? Cross-TFM coverage present? -- Pre-merge self-review — use the quick checklist below +- PR touches `src/Compiler/` — invoke the `expert-reviewer` agent +- PR touches `src/FSharp.Core/` — focus on API Surface & Concurrency dimensions +- PR touches `vsintegration/` or `LanguageServer/` — focus on IDE Performance & Editor UX +- PR touches `tests/` only — quick check: baselines explained? Cross-TFM coverage? +- Pre-merge self-review — use the checklist below ## Dimension Selection -Not every PR needs all 15 dimensions. Select based on changed files: - | Files Changed | Focus Dimensions | |---|---| | `Checking/`, `TypedTree/` | Type System, Parallel Compilation, Feature Gating | @@ -30,23 +30,13 @@ Not every PR needs all 15 dimensions. Select based on changed files: | `FSharp.Core/` | API Surface, Concurrency | | `vsintegration/` | Editor Integration | -## Quick Self-Review Checklist - -Before requesting a full agent review, verify: +## Self-Review Checklist 1. [ ] Every behavioral change has a test -2. [ ] Test baselines are updated with explanations for new errors +2. [ ] Test baselines updated with explanations for new errors 3. [ ] New language features have a `LanguageFeature` guard 4. [ ] No unintended public API surface changes -5. [ ] Cleanup changes are separate from feature enablement -6. [ ] Compiler warnings are resolved -7. [ ] Tests run in both Debug and Release where relevant +5. [ ] Cleanup changes separate from feature enablement +6. [ ] Compiler warnings resolved +7. [ ] Tests pass in both Debug and Release 8. 
[ ] Error messages follow: statement → analysis → advice - -## Invocation - -``` -Invoke the expert-reviewer agent to review this PR. -``` - -The agent executes a 5-wave review workflow (Orientation → Structural → Correctness → Integration → Quality) and reports findings by severity. From 48d005a5ccfc2abde87695e58a4d0e758e669fcf Mon Sep 17 00:00:00 2001 From: Tomas Grosup Date: Wed, 18 Mar 2026 11:11:12 +0100 Subject: [PATCH 3/6] Merge review-council into expert-review skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The review-council skill (generic multi-model review) and expert-review skill (F#-specific dimensions) had 0.55 trigger overlap. Merged into a single reviewing-compiler-prs skill that combines: - F#-specific 15 dimensions with file-based selection - Multi-model 3-phase methodology (briefing, parallel assessment, consolidation with dedup/false-positive filtering) - Self-review checklist Deleted .github/skills/review-council/ — its methodology is now embedded in the merged skill's Multi-Model Review Methodology section. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .github/agents/expert-reviewer.md | 2 +- .github/skills/expert-review/SKILL.md | 44 +++++++-- .github/skills/review-council/SKILL.md | 124 ------------------------- 3 files changed, 39 insertions(+), 131 deletions(-) delete mode 100644 .github/skills/review-council/SKILL.md diff --git a/.github/agents/expert-reviewer.md b/.github/agents/expert-reviewer.md index 6db222cdec6..f064e28e750 100644 --- a/.github/agents/expert-reviewer.md +++ b/.github/agents/expert-reviewer.md @@ -7,7 +7,7 @@ description: "Multi-dimensional code review agent for F# compiler PRs. Evaluates Evaluates F# compiler changes across 15 dimensions. Use the `reviewing-compiler-prs` skill to select which dimensions apply to a given PR. 
-**Related tools:** `review-council` (generic multi-model review), `hypothesis-driven-debugging` (investigating failures found during review), `ilverify-failure` (fixing IL verification issues), `vsintegration-ide-debugging` (fixing IDE debugging issues). +**Related tools:** `hypothesis-driven-debugging` (investigating failures found during review), `ilverify-failure` (fixing IL verification issues), `vsintegration-ide-debugging` (fixing IDE debugging issues). ## Overarching Principles diff --git a/.github/skills/expert-review/SKILL.md b/.github/skills/expert-review/SKILL.md index aff40ccfce4..353dd3ddee4 100644 --- a/.github/skills/expert-review/SKILL.md +++ b/.github/skills/expert-review/SKILL.md @@ -1,15 +1,13 @@ --- name: reviewing-compiler-prs -description: "Performs multi-dimensional expert review of F# compiler PRs across 15 dimensions: type checking, IL emission, binary compatibility, AST correctness, parallel determinism, concurrency safety, IDE performance, diagnostics, feature gating, and code quality. Invoke when reviewing compiler changes, requesting pre-merge feedback, or checking PR quality against established review standards." +description: "Multi-agent, multi-model code review for F# compiler PRs. Dispatches parallel assessment agents across 15 F#-specific dimensions (type checking, IL emission, binary compatibility, AST correctness, parallel determinism, concurrency, IDE performance, diagnostics, feature gating, code quality, code reuse, complexity). Invoke for: code review, PR review, expert review, multi-agent review, panel review, diverse model review, review council, fellowship review." --- -# Expert Review +# Reviewing Compiler PRs -Full review logic lives in the `expert-reviewer` agent. This skill provides dimension selection and a self-review checklist. +Orchestrates a multi-model review panel across F#-specific dimensions. 
The full dimension definitions and CHECK rules live in the `expert-reviewer` agent — this skill handles dimension selection, the multi-model dispatch methodology, and self-review. -For generic multi-model review (not F#-specific), see the `review-council` skill instead. - -## When to Invoke +## When to Invoke the Agent - PR touches `src/Compiler/` — invoke the `expert-reviewer` agent - PR touches `src/FSharp.Core/` — focus on API Surface & Concurrency dimensions @@ -30,8 +28,42 @@ For generic multi-model review (not F#-specific), see the `review-council` skill | `FSharp.Core/` | API Surface, Concurrency | | `vsintegration/` | Editor Integration | +## Multi-Model Review Methodology + +When thorough review is requested, the `expert-reviewer` agent uses this 3-phase approach: + +### Phase 1: Briefing Pack + +Build a self-contained briefing from: +1. PR metadata (title, body, labels) +2. Linked issues — fetch full body **and all comments** (real requirements live there) +3. Merge-base diff only — use `pull_request_read` → `get_diff` or `gh pr diff` +4. Changed files list with change types; test files called out separately +5. Commit messages + +### Phase 2: Parallel Assessment + +Dispatch **one agent per selected dimension**, each using the dimension's CHECK rules from the agent file. For high-confidence reviews, assess each dimension with multiple models (`claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`). + +Each agent receives the briefing pack and returns findings with: location (file:line), description, severity (critical/high/medium/low). + +**Assessment gates** (apply before flagging): +- **Execution context**: understand how and where the code runs. A test harness and the main compiler have different constraints. +- **Direction**: classify each change as regression, improvement, or unclear. Only regressions are findings. +- **Concrete scenario required**: report an ISSUE only when you can construct a specific failing scenario. No hypotheticals. 
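The assessment gates above can be sketched as a filter over candidate findings. The field names are illustrative, not an existing schema:

```python
def apply_gates(candidates):
    """Keep only candidates that pass the gates: the change must be a
    regression (not an improvement or unclear), and a concrete failing
    scenario must be supplied rather than a hypothetical."""
    kept = []
    for c in candidates:
        if c.get("direction") != "regression":
            continue  # improvements and unclear changes are observations at most
        if not c.get("failing_scenario"):
            continue  # no concrete scenario means no reportable issue
        kept.append(c)
    return kept
```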
+ +### Phase 3: Consolidation + +1. **Deduplicate**: merge findings pointing at the same location with the same problem +2. **Filter false positives**: wrong context assumed → discard; improvement not regression → downgrade; speculation without evidence → LOW confidence +3. **Classify**: Behavioral (correctness) → Quality (structure) → Nitpick (style) +4. **Rank**: prefer findings flagged by more agents (cross-model agreement = higher confidence) +5. **Present**: Behavioral first, then Quality, then Nitpicks (only if no higher-level findings) + ## Self-Review Checklist +Before requesting a full agent review: + 1. [ ] Every behavioral change has a test 2. [ ] Test baselines updated with explanations for new errors 3. [ ] New language features have a `LanguageFeature` guard diff --git a/.github/skills/review-council/SKILL.md b/.github/skills/review-council/SKILL.md deleted file mode 100644 index 6733bfbc5a4..00000000000 --- a/.github/skills/review-council/SKILL.md +++ /dev/null @@ -1,124 +0,0 @@ ---- -name: review-council -description: "Multi-agent review council. Invoke for: review council, fellowship review, multi-agent review, panel review, diverse model review." ---- - -# Review Council - -A multi-model code review panel. You orchestrate 3 phases: build a briefing pack, dispatch 15 parallel assessment agents (5 dimensions × 3 models), and consolidate findings. - -## Phase 1: Briefing Pack - -Auto-detect the PR from the current branch. Build a self-contained briefing document containing all of the following: - -1. **PR metadata**: title, body, labels -2. **Linked issues**: for every issue referenced in the PR body or commits, fetch the full issue body and **all comments** — this is where the real requirements and context live -3. **Full diff**: merge-base diff only (`git diff $(git merge-base HEAD )..HEAD`) — shows only what the PR adds, not what main gained since the branch point -4. 
**Changed files list**: with change type (added/modified/deleted), test files called out separately -5. **Commit messages**: useful for correlating claims to specific code changes, but not a source of requirements - -Use `claude-opus-4.6` for this phase. The output is a single structured briefing document. Every assessment agent in Phase 2 receives this briefing verbatim — they should not need to re-fetch any of this context. - -## Phase 2: Parallel Assessment - -Dispatch **15 agents in parallel**: each of the 5 dimensions below assessed independently by each of 3 models. - -**Models**: `claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex` - -Each agent receives: -- The full briefing pack from Phase 1 -- Its specific dimension prompt (below) -- Instruction to return findings as a list, each with: location (file:line), description, severity (critical/high/medium/low) - ---- - -### Dimension 1: Claims Coverage - -Are the goals met? Cross-reference every claim in the PR description, linked issues, and commit messages against actual code changes. Flag: - -- **Orphan claims**: stated in PR/issue but no corresponding code change implements it -- **Orphan changes**: code changed but not mentioned in any claim — why is this here? -- **Partial implementations**: claim says X, code does X-minus-something — what's missing? - -**Before flagging**, apply these gates: -- **Execution context**: understand how and where the code runs before judging it. A test harness, a standalone project, and the main compiler have different constraints. Do not assume the worst-case context. -- **Direction**: classify each behavioral change as regression (breaks something), improvement (fixes bug, removes race, simplifies), or unclear. Only regressions are findings. Improvements are observations at most. For unclear, state uncertainty — do not present as a defect. - ---- - -### Dimension 2: Test Coverage - -Is every feature, fix, or behavioral change covered with tests? 
Check for: - -- **Happy path**: does the basic intended usage work and is it tested end-to-end? -- **Negative path**: what happens with invalid input, malformed syntax, error conditions? Are diagnostics tested? -- **Feature interactions**: this is a compiler — edge cases are about how the change interacts with other F# features. Example: a codegen change should test both reference types and value types. A syntax change should test interaction with generics, constraints, computation expressions, etc. -- **Assertion quality**: tests must actually assert the claimed behavior, not just "compiles and runs without throwing". A test that calls the function but doesn't check the result is not a test. - -Flag any behavioral change in the diff that lacks a corresponding test. Tests should be in the appropriate layer — pick based on what the issue is and what changed: -- **Typecheck tests**: the bulk of coverage — type inference, constraint solving, overload resolution, expected compiler warnings and errors -- **SyntaxTreeTests**: parser/syntax changes -- **EmittedIL tests**: codegen/IL shape changes -- **compileAndRun tests**: end-to-end behavioral correctness that absolutely needs proper execution on the .NET runtime -- **Service.Tests**: FCS API, editor features -- **FSharp.Core.Tests**: core library changes - -A PR can and often should have tests in multiple categories. - ---- - -### Dimension 3: Code Quality - -Assess structural quality of the changes: - -- **Logical layer placement**: is the code in the right module/file, or shoved somewhere convenient? Would a reader expect to find this logic here? -- **Ad-hoc "if/then" patches**: flag any `if condition then specialCase else normalPath` that looks like a band-aid rather than a systematic fix. These are symptoms of not understanding the root cause — the fix should be at the source, not patched at a consumer. A conditional that exists only to work around a bug elsewhere is a code smell. 
-- **Duplicated logic within the PR diff**: same or near-same code appearing in multiple places in the changeset -- **Error handling**: not swallowing exceptions, not ignoring Result values, not using `failwith` where a typed error would be appropriate -- **Respect intentional deviations**: some projects (standalone tests, isolated builds, special harnesses) deliberately diverge from repo-wide conventions. Before flagging a pattern as wrong, check whether the project has a structural reason to be different. Intentional isolation is not inconsistency. - ---- - -### Dimension 4: Code Reuse & Higher-Order Patterns - -Search the codebase for existing patterns that match the new code's structure. F# allows extracting logic into composable pieces — mappable structures, foldable, walkers, visitors. Look for: - -- **Highly similar nested pattern matches**: a familiar structure of nested `match ... with` but with a minor tweak. This is the #1 symptom — slight differences can almost always be extracted into a higher-order function or otherwise parameterized or made generic. -- **Copy-paste-modify**: new code that duplicates an existing function with small changes. The difference should be a parameter, a generic type argument, or a function argument (higher-order function). -- **Missed abstractions**: where two pieces of code share structure but differ in a specific operation — that operation should be a parameter, a generic type argument, or a function argument. -- **Existing utilities ignored**: the codebase may already have helpers, combinators, or active patterns that do what the new code reimplements from scratch. Search for them. - ---- - -### Dimension 5: Cyclomatic Complexity - -Assess complexity of added/changed code: - -- **Pyramid of nested doom**: deeply nested `if/then/else`, heavily nested `match`or interleaving with `for` and other branching constructs. Any nesting beyond 2 levels should be questioned — is there a flatter way? 
-- **F# offers better tools**: pattern matching and active patterns for non-trivial branching logic — flatter and easier to read than chains of `if/elif/else`. Suggest them as alternatives.
-- **Active patterns for complex conditions**: when a match guard or if-condition encodes domain logic, an active pattern names it and makes it reusable.
-- **Pipelines over nesting**: sequential operations should be pipelined, not nested. Collections, Result, Option, ValueOption all support this — use `|>`, `bind`, `map` chains instead of nested `match` or `if/then`.
-- **High branch count**: functions with many `match` arms or `if` branches — consider whether the cases can be grouped, or whether the function is doing too much and should be split.
-- **Flatter is better**: a flat pattern match with 10 arms is easier to read than 4 levels of nesting with 10 combinations. Prefer wide over deep.
-
----
-
-## Phase 3: Consolidation
-
-Collect all findings from the 15 agents. Then:
-
-1. **Deduplicate**: multiple agents will find the same issue. Merge findings that point at the same location and describe the same problem. Keep the best-written description.
-2. **Filter false positives** before classifying:
-   - **Wrong context assumed**: finding assumes a different execution context than the code actually runs in → discard.
-   - **Correct convention**: finding flags a pattern that is idiomatic for the tool/context in use → discard.
-   - **Improvement, not regression**: change makes things better (bugfix, simplification, race fix) → downgrade to informational, do not include in actionable findings.
-   - **"Unexplained" ≠ "wrong"**: if the only concern is missing rationale in the commit message, that is a documentation gap — not a defect. Convert to comment, not finding.
-   - **Speculation consensus**: ≥2 models agree but reasoning relies on "could cause" without evidence it does → LOW confidence.
-3. **Classify** into three buckets:
-   - **Behavioral**: missing feature coverage, missing tests, incorrect logic, claims not met — things that affect correctness
-   - **Quality**: code structure, readability, complexity, reuse opportunities — it works, but could be better
-   - **Nitpick**: typos, naming, formatting, minor style — agents love producing these to have *something* to say. Low priority. Only surface if there are no higher-level findings.
-4. **Rank within each bucket**: prefer findings flagged by more agents (cross-model agreement = higher confidence).
-5. **Present**: Behavioral first, then Quality, then Nitpicks (if any). For each finding: location, dimension, description, how many agents flagged it.
-
-If a model is unavailable at runtime, proceed with the remaining models. Minimum viable council = 2 models.

From bdf96e26a63b906b8429344d121a94656693809 Mon Sep 17 00:00:00 2001
From: Tomas Grosup
Date: Wed, 18 Mar 2026 11:12:25 +0100
Subject: [PATCH 4/6] Apply Anthropic best practices to merged review skill

- Description: natural third-person prose, no keyword spam list
- Removed redundant opening paragraph (restated description)
- Trimmed Multi-Model section: removed briefing pack steps Claude already
  knows; kept only unique content (assessment gates, consolidation rules)
- Removed redundant 'Before requesting...' preamble
- 75 -> 54 lines (28% reduction, no content loss)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/skills/expert-review/SKILL.md | 58 +++++++++------------
 1 file changed, 19 insertions(+), 39 deletions(-)

diff --git a/.github/skills/expert-review/SKILL.md b/.github/skills/expert-review/SKILL.md
index 353dd3ddee4..a8544c6d4e2 100644
--- a/.github/skills/expert-review/SKILL.md
+++ b/.github/skills/expert-review/SKILL.md
@@ -1,19 +1,18 @@
 ---
 name: reviewing-compiler-prs
-description: "Multi-agent, multi-model code review for F# compiler PRs. Dispatches parallel assessment agents across 15 F#-specific dimensions (type checking, IL emission, binary compatibility, AST correctness, parallel determinism, concurrency, IDE performance, diagnostics, feature gating, code quality, code reuse, complexity). Invoke for: code review, PR review, expert review, multi-agent review, panel review, diverse model review, review council, fellowship review."
+description: "Performs multi-agent, multi-model code review of F# compiler PRs across 15 dimensions including type checking, IL emission, binary compatibility, parallel determinism, and IDE performance. Dispatches parallel assessment agents per dimension, consolidates with cross-model agreement scoring, and filters false positives. Invoke when reviewing compiler changes, requesting expert feedback, or performing pre-merge quality checks."
 ---
 
 # Reviewing Compiler PRs
 
-Orchestrates a multi-model review panel across F#-specific dimensions. The full dimension definitions and CHECK rules live in the `expert-reviewer` agent — this skill handles dimension selection, the multi-model dispatch methodology, and self-review.
+Full dimension definitions and CHECK rules live in the `expert-reviewer` agent.
 
-## When to Invoke the Agent
+## When to Invoke
 
 - PR touches `src/Compiler/` — invoke the `expert-reviewer` agent
-- PR touches `src/FSharp.Core/` — focus on API Surface & Concurrency dimensions
+- PR touches `src/FSharp.Core/` — focus on API Surface & Concurrency
 - PR touches `vsintegration/` or `LanguageServer/` — focus on IDE Performance & Editor UX
 - PR touches `tests/` only — quick check: baselines explained? Cross-TFM coverage?
-- Pre-merge self-review — use the checklist below
 
 ## Dimension Selection
 
@@ -28,47 +27,28 @@ Orchestrates a multi-model review panel across F#-specific dimensions. The full
 | `FSharp.Core/` | API Surface, Concurrency |
 | `vsintegration/` | Editor Integration |
 
-## Multi-Model Review Methodology
+## Multi-Model Dispatch
 
-When thorough review is requested, the `expert-reviewer` agent uses this 3-phase approach:
+Dispatch one agent per selected dimension. For high-confidence reviews, assess each dimension with multiple models (`claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`).
 
-### Phase 1: Briefing Pack
+**Assessment gates** — apply before flagging:
+- Understand execution context before judging (test harness ≠ compiler runtime)
+- Classify as regression, improvement, or unclear — only regressions are findings
+- Require a concrete failing scenario — no hypotheticals
 
-Build a self-contained briefing from:
-1. PR metadata (title, body, labels)
-2. Linked issues — fetch full body **and all comments** (real requirements live there)
-3. Merge-base diff only — use `pull_request_read` → `get_diff` or `gh pr diff`
-4. Changed files list with change types; test files called out separately
-5. Commit messages
-
-### Phase 2: Parallel Assessment
-
-Dispatch **one agent per selected dimension**, each using the dimension's CHECK rules from the agent file. For high-confidence reviews, assess each dimension with multiple models (`claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`).
-
-Each agent receives the briefing pack and returns findings with: location (file:line), description, severity (critical/high/medium/low).
-
-**Assessment gates** (apply before flagging):
-- **Execution context**: understand how and where the code runs. A test harness and the main compiler have different constraints.
-- **Direction**: classify each change as regression, improvement, or unclear. Only regressions are findings.
-- **Concrete scenario required**: report an ISSUE only when you can construct a specific failing scenario. No hypotheticals.
-
-### Phase 3: Consolidation
-
-1. **Deduplicate**: merge findings pointing at the same location with the same problem
-2. **Filter false positives**: wrong context assumed → discard; improvement not regression → downgrade; speculation without evidence → LOW confidence
-3. **Classify**: Behavioral (correctness) → Quality (structure) → Nitpick (style)
-4. **Rank**: prefer findings flagged by more agents (cross-model agreement = higher confidence)
-5. **Present**: Behavioral first, then Quality, then Nitpicks (only if no higher-level findings)
+**Consolidation:**
+1. Deduplicate findings at same location
+2. Filter: wrong context → discard; improvement → downgrade; speculation → LOW
+3. Classify: Behavioral → Quality → Nitpick
+4. Rank by cross-model agreement
+5. Present Behavioral first; Nitpicks only if nothing higher
 
 ## Self-Review Checklist
 
-Before requesting a full agent review:
-
 1. [ ] Every behavioral change has a test
-2. [ ] Test baselines updated with explanations for new errors
+2. [ ] Test baselines updated with explanations
 3. [ ] New language features have a `LanguageFeature` guard
 4. [ ] No unintended public API surface changes
 5. [ ] Cleanup changes separate from feature enablement
-6. [ ] Compiler warnings resolved
-7. [ ] Tests pass in both Debug and Release
-8. [ ] Error messages follow: statement → analysis → advice
+6. [ ] Tests pass in both Debug and Release
+7. [ ] Error messages follow: statement → analysis → advice

From b08c82ec4497c79dce465accc63b8b3fa57b121b Mon Sep 17 00:00:00 2001
From: Tomas Grosup
Date: Wed, 18 Mar 2026 11:15:02 +0100
Subject: [PATCH 5/6] Restore all learned content from review-council into
 merged artifacts
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Agent (expert-reviewer.md):
- Dim 1: test layer selection guide (Typecheck/SyntaxTree/EmittedIL/
  compileAndRun/Service/Core), happy/negative/interaction paths,
  assertion quality
- Dim 12: pipelines over nesting, active patterns for complex conditions,
  nesting depth limit (2 levels), wide over deep
- Dim 13: ad-hoc if/then band-aid detection, higher-order pattern
  extraction, search for existing utilities before reimplementing,
  respect intentional deviations in isolated projects

Skill (SKILL.md):
- Claims coverage check (orphan claims/changes/partial impls)
- 'Correct convention' false positive filter
- 'Unexplained != wrong' — doc gap not defect
- 'Agents love nitpicks' — deprioritize
- Minimum viable council = 2 models

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/agents/expert-reviewer.md     | 25 ++++++++++++++++++++-----
 .github/skills/expert-review/SKILL.md | 12 ++++++++----
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/.github/agents/expert-reviewer.md b/.github/agents/expert-reviewer.md
index f064e28e750..ef0e38688c3 100644
--- a/.github/agents/expert-reviewer.md
+++ b/.github/agents/expert-reviewer.md
@@ -27,11 +27,20 @@ Every behavioral change, bug fix, and new feature requires corresponding tests b
 
 **CHECK:**
 - Verify that every code path added or modified has a test exercising it.
+- Test the happy path, negative path (invalid input, error conditions), and feature interactions (how the change interacts with generics, constraints, computation expressions, etc.).
+- Tests must actually assert the claimed behavior — a test that calls a function without checking results is not a test.
 - Confirm new tests cover behavior not already exercised by existing test suites.
 - Explain all new errors in test baselines and confirm they are expected.
 - Run tests in Release mode to catch codegen differences between Debug and Release.
 - Run tests on both desktop (.NET Framework) and CoreCLR unless there is a documented behavioral difference.
-- Verify attribute availability in the target runtime version before writing tests against it.
+- Place tests in the appropriate layer based on what changed:
+  - Typecheck tests: type inference, constraint solving, overload resolution, expected warnings/errors
+  - SyntaxTreeTests: parser/syntax changes
+  - EmittedIL tests: codegen/IL shape changes
+  - compileAndRun tests: end-to-end behavioral correctness requiring .NET runtime execution
+  - Service.Tests: FCS API, editor features
+  - FSharp.Core.Tests: core library changes
+- A PR can and often should have tests in multiple layers.
 - Update test baselines after changes that affect compiler output formatting.
 - Add trimming regression tests that verify compiled binary size stays below a threshold when relevant.
@@ -247,12 +256,14 @@ Compiler code should follow F# idioms and use clear, consistent terminology for
 - Use `let mutable` instead of `ref` cells for mutable state.
 - Use `ResizeArray` type alias instead of `System.Collections.Generic.List<_>`.
 - Use pipeline operators for clearer data flow in service code.
+- Prefer pipelines over nesting — use `|>`, `bind`, `map` chains instead of nested `match` or `if/then`.
+- Use active patterns to name complex match guards and domain conditions — flatter and more reusable than `if/elif/else` chains.
+- Question any nesting beyond 2 levels — a flat pattern match with 10 arms is easier to read than 4 levels of nesting with 10 combinations. Prefer wide over deep.
 - Shadow variables when rebinding to improve code clarity.
 - Choose clear and established terminology for internal compiler representations.
-- Review terminology holistically before renaming within optimization phases.
 - Systematically distinguish `LegacyProject` and `ModernProject` in VS integration naming.
 
-**Severity:** Deprecated construct → **medium**. Naming confusion → **medium**. Style deviation → **low**.
+**Severity:** Deprecated construct → **medium**. Naming confusion → **medium**. Deep nesting → **medium**. Style deviation → **low**.
 
 **Hotspots:** All `src/Compiler/` directories, `vsintegration/`
 
@@ -263,17 +274,21 @@
 Keep the codebase clean. Extract helpers, remove dead code, and avoid ad-hoc patches.
 
 **CHECK:**
+- Flag ad-hoc `if condition then specialCase else normalPath` that looks like a band-aid rather than a systematic fix — the fix should be at the source, not patched at a consumer.
+- Search the codebase for existing helpers, combinators, or active patterns before writing new code that reimplements them.
+- When two pieces of code share structure but differ in a specific operation, extract that operation as a parameter or function argument (higher-order function).
+- Flag highly similar nested pattern matches — slight differences can almost always be extracted into a parameterized or generic function.
 - Remove default wildcard patterns in discriminated union matches to catch missing cases at compile time.
 - Extract duplicated logic into a shared function with input arguments.
 - Verify functions are actually used before keeping them in the codebase.
 - Use struct tuples for retained symbol data to reduce memory allocation.
 - Use `ConditionalWeakTable` for caches keyed by GC-collected objects.
-- Remove duplicate helpers and consolidate to a single shared implementation.
 - Fix compiler warnings like unused value bindings before merging.
 - Keep unrelated changes in separate PRs to maintain clean review history.
 - Follow existing abstraction patterns (e.g., `HasFSharpAttribute`) instead of ad-hoc checks.
+- Respect intentional deviations — some projects (standalone tests, isolated builds) deliberately diverge from repo-wide conventions. Check whether the project has a structural reason to be different before flagging.
 
-**Severity:** Unreachable code path → **high**. Duplication → **medium**. Missing helper extraction → **low**.
+**Severity:** Unreachable code path → **high**. Band-aid patch → **high**. Duplication → **medium**. Missing helper extraction → **low**.
 
 **Hotspots:** `src/Compiler/Checking/`, `src/Compiler/Optimize/`, `src/Compiler/Service/`
 
diff --git a/.github/skills/expert-review/SKILL.md b/.github/skills/expert-review/SKILL.md
index a8544c6d4e2..dc14c8185e3 100644
--- a/.github/skills/expert-review/SKILL.md
+++ b/.github/skills/expert-review/SKILL.md
@@ -29,19 +29,23 @@ Full dimension definitions and CHECK rules live in the `expert-reviewer` agent.
 ## Multi-Model Dispatch
 
-Dispatch one agent per selected dimension. For high-confidence reviews, assess each dimension with multiple models (`claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`).
+Dispatch one agent per selected dimension. For high-confidence reviews, assess each dimension with multiple models (`claude-opus-4.6`, `gemini-3-pro-preview`, `gpt-5.2-codex`). Minimum viable council = 2 models.
+
+**Claims coverage** — before dimension assessment, cross-reference every claim in the PR description and linked issues against actual code changes. Flag orphan claims (stated but not implemented), orphan changes (code changed but not mentioned), and partial implementations.
 
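The claims-coverage cross-reference above is essentially a set computation. A minimal F# sketch, where `crossReference` and its inputs are hypothetical names for illustration, not helpers that exist in the repository:

```fsharp
// Hypothetical sketch of the claims-coverage cross-reference.
// claims  = statements extracted from the PR description and linked issues
// changes = symbols or files actually touched by the diff
let crossReference (claims: Set<string>) (changes: Set<string>) =
    let implemented = Set.intersect claims changes
    let orphanClaims = Set.difference claims changes // stated but not implemented
    let orphanChanges = Set.difference changes claims // changed but not mentioned
    implemented, orphanClaims, orphanChanges
```

Anything landing in the two orphan sets becomes a candidate finding before the per-dimension assessment starts.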
 **Assessment gates** — apply before flagging:
 - Understand execution context before judging (test harness ≠ compiler runtime)
 - Classify as regression, improvement, or unclear — only regressions are findings
 - Require a concrete failing scenario — no hypotheticals
+- "Correct convention" for the context in use → discard, not a finding
+- "Unexplained" ≠ "wrong" — missing rationale in a commit message is a doc gap, not a defect
 
 **Consolidation:**
 1. Deduplicate findings at same location
 2. Filter: wrong context → discard; improvement → downgrade; speculation → LOW
-3. Classify: Behavioral → Quality → Nitpick
-4. Rank by cross-model agreement
-5. Present Behavioral first; Nitpicks only if nothing higher
+3. Classify: Behavioral (correctness) → Quality (structure) → Nitpick (style)
+4. Rank by cross-model agreement (≥2 models agree = higher confidence)
+5. Present Behavioral first; Nitpicks only if nothing higher — agents love producing nitpicks to have *something* to say; deprioritize them
 
 ## Self-Review Checklist

From f5d4bce41f3ded60b340b9b9a89e4f4aab89e7ee Mon Sep 17 00:00:00 2001
From: Tomas Grosup
Date: Wed, 18 Mar 2026 11:16:40 +0100
Subject: [PATCH 6/6] Remove overfitted parallel compilation section and
 Debug/Release rule from instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/instructions/ExpertReview.instructions.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/.github/instructions/ExpertReview.instructions.md b/.github/instructions/ExpertReview.instructions.md
index 507a13e0c26..67dbb5d73ed 100644
--- a/.github/instructions/ExpertReview.instructions.md
+++ b/.github/instructions/ExpertReview.instructions.md
@@ -8,7 +8,6 @@ applyTo:
 ## Test discipline
 
 - Add tests for every behavioral change before merging — missing coverage is the most common review blocker.
-- Run tests in both Debug and Release mode to catch optimization-dependent codegen differences.
 - Explain all new errors in test baselines and confirm each is expected behavior, not a regression.
 
 ## Type safety
@@ -21,11 +20,6 @@ applyTo:
 - Gate every new language feature behind a `LanguageFeature` flag and ship off-by-default until stable.
 - Factor cleanup changes into separate commits from feature enablement.
 
-## Parallel compilation
-
-- Eagerly format diagnostics at production time to prevent type inference parameter leakage across files.
-- Verify deterministic output by diffing binaries between parallel and sequential type checking.
-
 ## Code generation
 
 - Strip debug points when matching on `Expr.Lambda` during code generation to prevent IL stack corruption.
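The review rules in these patches repeatedly favor flat pattern matches, active patterns, and pipelines over nested conditionals. A minimal F# sketch of that refactoring, using illustrative toy functions rather than code from the repository:

```fsharp
// Nested version — the "pyramid of doom" shape the review dimensions flag.
let classifyNested (n: int) =
    if n > 0 then
        if n % 2 = 0 then "even positive"
        else "odd positive"
    else "non-positive"

// An active pattern names the domain condition once and keeps matches flat.
let (|EvenPositive|OddPositive|NonPositive|) (n: int) =
    if n <= 0 then NonPositive
    elif n % 2 = 0 then EvenPositive
    else OddPositive

// Flat, wide version — one match, no nesting.
let classify n =
    match n with
    | EvenPositive -> "even positive"
    | OddPositive -> "odd positive"
    | NonPositive -> "non-positive"

// Pipelines over nesting: sequential operations chained with |>.
let evenPositiveLabels ns =
    ns
    |> List.filter (fun n -> n > 0 && n % 2 = 0)
    |> List.map classify
```

The active pattern makes the condition reusable, the match stays one level deep, and the pipeline replaces what would otherwise be a nested loop-plus-if.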