Skip to content

Add dotnet-inspect skill to dotnet plugin#340

Closed
richlander wants to merge 9 commits intodotnet:mainfrom
richlander:dotnet-inspect-skill
Closed

Add dotnet-inspect skill to dotnet plugin#340
richlander wants to merge 9 commits intodotnet:mainfrom
richlander:dotnet-inspect-skill

Conversation

@richlander
Copy link
Copy Markdown
Member

Summary

Add the dotnet-inspect skill to the dotnet plugin. This skill enables querying .NET APIs across NuGet packages, platform libraries, and local files — type discovery, member inspection, version diffing, extension methods, implementors, dependency graphs, SourceLink source URLs, and metadata auditing.

Files

  • plugins/dotnet/skills/dotnet-inspect/SKILL.md — skill definition (v0.6.8)
  • tests/dotnet/dotnet-inspect/eval.yaml — 11 eval scenarios

Eval Results

+42–51% improvement across runs (statistically significant, passed validation).

Scenario Baseline → Skill Δ
Port app (System.CommandLine beta → stable) 1–2.7 → 3.7–5.0 +20–50%
Find types (platform) 2.3 → 4.3–5.0 +38–44%
Find types (NuGet/IChatClient) 3.3–3.7 → 4.3–5.0 +13–33%
Extensions (HttpClient) 2.0–2.3 → 4.7–5.0 +39–59%
Interfaces (System.Text.Json) 3.0 → 5.0 +42–44%
Logging/tracing find 1.3–3.0 → 5.0 +46–77%
Implementors (IHostedService) 1.3–1.7 → 3.0–3.7 +52–55%
Package deps (M.E.AI) 3.3–3.7 → 4.7–5.0 +25–49%
Method overloads (M.E.AI) 3.3 → 4.7 +42%
SourceLink audit (Newtonsoft.Json) 1.3–1.7 → 5.0 +64–83%
Source code review (Dictionary HashDoS) NEW

Design Notes

  • Crisp decision tree: replaced vague 'Need API surface?' with concrete questions to reduce agent misrouting (eval-driven improvement)
  • 11 scenarios covering all major commands: type, member, find, diff, extensions, implements, depends, source, package, library
  • Supply chain scenarios: dependency chain inspection, signing/SourceLink audit
  • Hard scenarios that LLMs can't answer from training data (e.g., SourceLink audit 1→5, logging/tracing find 1→5)

Tool

dotnet-inspect — installed via dnx dotnet-inspect -y -- <command>

Copilot AI review requested due to automatic review settings March 12, 2026 08:49
Add SKILL.md (v0.6.8) with crisp decision tree, source command,
version resolution, and 11 eval scenarios covering: porting, type
discovery, member inspection, extensions, interfaces, find, implements,
depends, overloads, SourceLink audit, and source code review.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@richlander richlander force-pushed the dotnet-inspect-skill branch from 0ebcdd0 to 1cf684d Compare March 12, 2026 08:50
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new dotnet-inspect skill under the dotnet plugin, providing guidance for using dnx dotnet-inspect to query/compare .NET APIs across packages, platform libraries, and local artifacts, along with an eval suite to validate the skill’s impact.

Changes:

  • Introduces plugins/dotnet/skills/dotnet-inspect/SKILL.md documenting the skill workflow and command patterns.
  • Adds tests/dotnet/dotnet-inspect/eval.yaml with 11 evaluation scenarios covering type/member discovery, diffing, extensions, implementors, dependency inspection, and SourceLink auditing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
plugins/dotnet/skills/dotnet-inspect/SKILL.md New skill definition and usage guidance for dotnet-inspect.
tests/dotnet/dotnet-inspect/eval.yaml New eval scenarios and assertions to exercise the skill across multiple query types.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +1 to +5
---
name: dotnet-inspect
version: 0.6.8
description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents.
---
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frontmatter version: 0.6.0 conflicts with the PR description calling this skill definition v0.6.8. Please align the SKILL.md frontmatter version with the intended release/version (or drop the field if versions aren’t tracked for skills here) to avoid confusion in future updates.

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +7
---
name: dotnet-inspect
version: 0.6.8
description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents.
---

# dotnet-inspect
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a new skill under /plugins/dotnet/skills/dotnet-inspect/ (and matching tests), but .github/CODEOWNERS currently has no entries for these paths. CONTRIBUTING.md requires adding CODEOWNERS entries for every new skill/tests area so PRs route to the right owners; please add entries for both the skill directory and tests/dotnet/dotnet-inspect/.

Copilot uses AI. Check for mistakes.
Comment thread tests/dotnet/dotnet-inspect/eval.yaml Outdated
- "Uses dnx dotnet-inspect diff or member commands to understand API differences between System.CommandLine versions"
- "Uses dnx dotnet-inspect member to understand the new API surface (e.g., Command, Option, RootCommand)"
- "Correctly migrates Handler.SetHandler calls to the new System.CommandLine API (e.g., SetAction)"
- "Updates the PackageReference to the latest stable System.CommandLine version (2.0.4)"
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario’s rubric hard-codes System.CommandLine “latest stable” as 2.0.4. That will become stale as soon as a newer stable version ships and can incorrectly penalize correct “upgrade to latest” behavior. Consider rewording the rubric to require upgrading to the latest stable version discovered during the run (or to the latest 2.x stable) rather than a specific patch version.

Suggested change
- "Updates the PackageReference to the latest stable System.CommandLine version (2.0.4)"
- "Updates the PackageReference to the latest stable System.CommandLine version discovered during the run"

Copilot uses AI. Check for mistakes.
Comment on lines +4 to +5
Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project.
This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scenario depends on cloning an external GitHub repository at runtime. That can make evals flaky (repo moves, force-pushes, downtime, or content changes) and harder to reproduce. If possible, pin to a specific commit/tag (so results are stable) or include a minimal fixture project under tests/dotnet/dotnet-inspect/ and use setup.copy_test_files instead of cloning.

Suggested change
Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project.
This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
In the workspace, open the bad-code repository that has been copied for you under the bad-code/ZeroDaySearch directory.
This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
setup:
- type: copy_test_files
source: tests/dotnet/dotnet-inspect/bad-code
target: bad-code

Copilot uses AI. Check for mistakes.
Comment thread tests/dotnet/dotnet-inspect/eval.yaml Outdated
prompt: "Which interfaces are defined in the System.Text.Json NuGet package? Show me the complete set."
assertions:
- type: "output_contains"
value: "JsonConverter"
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scenario is about discovering interfaces in System.Text.Json, but the assertion requires the output to contain JsonConverter, which is a (generic) class/abstract base type rather than an interface. This can cause false failures if the agent correctly filters to interfaces only. Consider asserting on a known interface name from that package (e.g., IJsonTypeInfoResolver) instead.

Suggested change
value: "JsonConverter"
value: "IJsonTypeInfoResolver"

Copilot uses AI. Check for mistakes.
Comment on lines +16 to +20
rubric:
- "Uses dnx dotnet-inspect package to discover the latest stable System.CommandLine version (e.g., --latest-version)"
- "Uses dnx dotnet-inspect diff or member commands to understand API differences between System.CommandLine versions"
- "Uses dnx dotnet-inspect member to understand the new API surface (e.g., Command, Option, RootCommand)"
- "Correctly migrates Handler.SetHandler calls to the new System.CommandLine API (e.g., SetAction)"
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many rubric items are written as “Uses dnx dotnet-inspect …” (tool/technique-focused, often echoing exact commands). Given the repo’s overfitting detection guidance, this style is more likely to be flagged as overfitted than outcome-focused criteria. Consider rewriting rubric items to primarily score on outcomes (correct findings + sourced evidence from tool output) and rely on expect_tools: ["bash"] / lightweight assertions to ensure the tool is actually used.

Copilot uses AI. Check for mistakes.
richlander and others added 5 commits March 12, 2026 14:42
SKILL.md:
- Bump version 0.6.8 → 0.6.9
- Document --cat, --head N, --tail N flags
- Add --platform examples for source command

eval.yaml:
- Rewrite source scenario rubric to test outcomes not technique
- Remove narrow dnx assertion from source scenario
- Quality improved 2→5, overfitting 0.29→0.08 on source scenario

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove 'Uses dnx dotnet-inspect X command' technique-testing rubric items
and 'output_matches: dnx dotnet-inspect' narrow assertions from all 11
scenarios. Replace with outcome-focused criteria that test what the user
learns rather than which tool was used.

Overfitting score: 0.37 → 0.03

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove hard-coded System.CommandLine 2.0.4 version from rubric
- Fix JsonConverter assertion to IJsonTypeInfoResolver (class vs interface)
- Add CODEOWNERS entries for dotnet-inspect skill and tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Embed factual reference data from dotnet-inspect into rubric criteria
so the LLM judge can accurately score correctness without relying on
its own (potentially incorrect) knowledge:

- IHostedService: only BackgroundService + IHostedLifecycleService in platform
- Dependencies: exact 5 deps for net10.0 with versions
- Overloads: 3 of 6 GetResponseAsync accept JsonSerializerOptions
- IChatClient: 3 members, in AI.Abstractions package
- System.Text.Json: exactly 5 interfaces across 2 namespaces
- HttpClient: 6 extension method families from HttpClientJsonExtensions
- Newtonsoft.Json: signed (token 11425bc5744002c7), no SourceLink, v13.0.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…narios

Two new scenarios derived from usage study observations:

1. Build M.E.AI chatbot with tool calling (scenario 11)
   - Scaffolds a .NET 10 console project with M.E.AI + Ollama packages
   - Tests API correctness: OllamaChatClient constructor, AIFunctionFactory
     registration, ChatClientBuilder pipeline, streaming, chat history
   - Rubric embeds ground truth from dotnet-inspect (e.g., Ollama only
     available in preview versions, FunctionInvokingChatClient properties)
   - Study showed quality improvement: 4→5 (baseline missed adding
     assistant responses to chat history)

2. Discover M.E.AI 10.4 abstraction surface (scenario 12)
   - Tests discovery of all 4 client interfaces: IChatClient,
     ISpeechToTextClient, IHostedFileClient, IImageGenerator
   - Tests correct method signatures, pipeline infrastructure, and DI
     registration for each abstraction
   - Study showed 7x token efficiency: 79K vs 534K tokens

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 13, 2026 00:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +3 to +4
version: 0.6.9
description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents.
scenarios:
- name: "Port app to latest System.CommandLine version"
prompt: |
Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project.
rubric:
- "Reports that the Newtonsoft.Json assembly is signed with public key token 11425bc5744002c7"
- "Reports that Newtonsoft.Json does NOT have SourceLink (no readable PDB embedded)"
- "Includes additional metadata such as version (13.0.4), author (James Newton-King), license (MIT), or target frameworks (net6.0, netstandard2.0, etc.)"
Comment thread tests/dotnet/dotnet-inspect/eval.yaml Outdated
value: "IChatClient"
- type: "exit_success"
rubric:
- "OllamaChatClient constructor is called correctly — it takes (Uri, string?, HttpClient?) or (string, string?, HttpClient?) where the second param is the model name. The M.E.AI.Ollama package is only available in preview versions (latest 9.7.0-preview), not stable 10.4.0."
Removed scenarios that showed no quality improvement with high overhead:
- Find types across platform (Δ:0 quality, +634% token overhead)
- Discover interfaces in NuGet package (Δ:0, +94% — superseded by new
  M.E.AI abstraction discovery scenario)
- Inspect method overloads (Δ:0, +102% overhead)

Remaining 10 scenarios include the 2 new M.E.AI study-derived scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@timheuer timheuer added the DO NOT MERGE Flagging for ensuring that the issue is not merged. label Mar 13, 2026
@timheuer
Copy link
Copy Markdown
Member

New tool introduced, we need to review

Replace scenarios dotnet#8 and dotnet#9 (M.E.AI chatbot and abstraction discovery)
with Hex1b TUI library scenarios that test novel API discovery:

- Scenario dotnet#8: Build Hex1b TUI dashboard with charts — tests widget
  composition, layout, charting, and theming on a brand-new library
- Scenario dotnet#9: Discover Hex1b widget catalog and layout API — tests
  factual API enumeration where training data is absent

Results: 28.5% improvement, 0.47 normalized gain, 0.04 overfitting,
6 improved, 4 equal, 0 regressions (total Δ: +9).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 13, 2026 02:21
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +70 to +78
prompt: "What are the package dependencies of Microsoft.Extensions.AI? I want to understand what it pulls in."
assertions:
- type: "output_matches"
pattern: "(depend|Microsoft\\.Extensions)"
expect_tools: ["bash"]
rubric:
- "Lists the 5 direct dependencies for net10.0: Microsoft.Extensions.AI.Abstractions, Microsoft.Extensions.Caching.Abstractions, Microsoft.Extensions.DependencyInjection.Abstractions, Microsoft.Extensions.Logging.Abstractions, and System.Numerics.Tensors"
- "Provides version information (e.g., AI.Abstractions 10.4.0, other deps at 10.0.4)"
- "Shows transitive dependencies or the deeper tree — e.g., Caching.Abstractions pulls in Microsoft.Extensions.Primitives, or for net9.0 there are additional direct deps like System.Text.Json and System.Threading.Channels"
Comment on lines +81 to +90
- name: "Audit package signing and SourceLink"
prompt: "Is the Newtonsoft.Json NuGet package signed? Does it support SourceLink? Give me the full metadata."
assertions:
- type: "output_matches"
pattern: "(Newtonsoft|sign|source)"
expect_tools: ["bash"]
rubric:
- "Reports that the Newtonsoft.Json assembly is signed with public key token 11425bc5744002c7"
- "Reports that Newtonsoft.Json does NOT have SourceLink (no readable PDB embedded)"
- "Includes additional metadata such as version (13.0.4), author (James Newton-King), license (MIT), or target frameworks (net6.0, netstandard2.0, etc.)"
Comment thread .github/CODEOWNERS
Comment on lines +25 to +26
/plugins/dotnet/skills/dotnet-inspect/ @richlander
/tests/dotnet/dotnet-inspect/ @richlander
Comment on lines +3 to +5
prompt: |
Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project.
This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
assertions:
- type: "output_matches"
pattern: "(diff|member|type)"
…ery scenarios

Replace open-ended Hex1b scenarios (dotnet#8 dashboard build, dotnet#9 widget catalog) with
precision-query scenarios that play to dotnet-inspect's proven strengths:

- dotnet#8: Enumerate Hex1b terminal builder configuration API (1→5, Δ+4)
- dotnet#9: Map Hex1b charting API and data model (5→5, 2.5x token efficiency)

Eval: PASSED, 19.6% improvement, 0.45 normalized gain, 0.03 overfitting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@richlander
Copy link
Copy Markdown
Member Author

richlander commented Mar 23, 2026

Closing in favor of a scenario-shaped approach. Instead of one tool-shaped skill, we're splitting into five skills that start from the developer's problem:

  1. dotnet-api-migration — "Upgrade this package, fix breaking changes"
  2. dotnet-platform-discovery — "What does .NET give me for X?"
  3. dotnet-dependency-analysis — "What does this pull in?"
  4. dotnet-supply-chain-visibility — "What should I know about this package?"
  5. dotnet-source-exploration — "Show me the source code"

Plan: docs/dotnet-inspect-skills-plan.md

This PR's eval patterns and dotnet-inspect usage informed the new design. Each new skill will be its own PR.

@richlander richlander closed this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DO NOT MERGE Flagging for ensuring that the issue is not merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants