Add dotnet-inspect skill to dotnet plugin #340
richlander wants to merge 9 commits into dotnet:main
Add SKILL.md (v0.6.8) with crisp decision tree, source command, version resolution, and 11 eval scenarios covering: porting, type discovery, member inspection, extensions, interfaces, find, implements, depends, overloads, SourceLink audit, and source code review. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
0ebcdd0 to 1cf684d
Pull request overview
Adds a new dotnet-inspect skill under the dotnet plugin, providing guidance for using dnx dotnet-inspect to query/compare .NET APIs across packages, platform libraries, and local artifacts, along with an eval suite to validate the skill’s impact.
Changes:
- Introduces plugins/dotnet/skills/dotnet-inspect/SKILL.md documenting the skill workflow and command patterns.
- Adds tests/dotnet/dotnet-inspect/eval.yaml with 11 evaluation scenarios covering type/member discovery, diffing, extensions, implementors, dependency inspection, and SourceLink auditing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| plugins/dotnet/skills/dotnet-inspect/SKILL.md | New skill definition and usage guidance for dotnet-inspect. |
| tests/dotnet/dotnet-inspect/eval.yaml | New eval scenarios and assertions to exercise the skill across multiple query types. |
| --- | ||
| name: dotnet-inspect | ||
| version: 0.6.8 | ||
| description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents. | ||
| --- |
Frontmatter version: 0.6.0 conflicts with the PR description calling this skill definition v0.6.8. Please align the SKILL.md frontmatter version with the intended release/version (or drop the field if versions aren’t tracked for skills here) to avoid confusion in future updates.
| --- | ||
| name: dotnet-inspect | ||
| version: 0.6.8 | ||
| description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents. | ||
| --- | ||
|
|
||
| # dotnet-inspect |
This PR adds a new skill under /plugins/dotnet/skills/dotnet-inspect/ (and matching tests), but .github/CODEOWNERS currently has no entries for these paths. CONTRIBUTING.md requires adding CODEOWNERS entries for every new skill/tests area so PRs route to the right owners; please add entries for both the skill directory and tests/dotnet/dotnet-inspect/.
| - "Uses dnx dotnet-inspect diff or member commands to understand API differences between System.CommandLine versions" | ||
| - "Uses dnx dotnet-inspect member to understand the new API surface (e.g., Command, Option, RootCommand)" | ||
| - "Correctly migrates Handler.SetHandler calls to the new System.CommandLine API (e.g., SetAction)" | ||
| - "Updates the PackageReference to the latest stable System.CommandLine version (2.0.4)" |
This scenario’s rubric hard-codes System.CommandLine “latest stable” as 2.0.4. That will become stale as soon as a newer stable version ships and can incorrectly penalize correct “upgrade to latest” behavior. Consider rewording the rubric to require upgrading to the latest stable version discovered during the run (or to the latest 2.x stable) rather than a specific patch version.
| - "Updates the PackageReference to the latest stable System.CommandLine version (2.0.4)" | |
| - "Updates the PackageReference to the latest stable System.CommandLine version discovered during the run" |
| Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project. | ||
| This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version. |
This scenario depends on cloning an external GitHub repository at runtime. That can make evals flaky (repo moves, force-pushes, downtime, or content changes) and harder to reproduce. If possible, pin to a specific commit/tag (so results are stable) or include a minimal fixture project under tests/dotnet/dotnet-inspect/ and use setup.copy_test_files instead of cloning.
| Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project. | |
| This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version. | |
| In the workspace, open the bad-code repository that has been copied for you under the bad-code/ZeroDaySearch directory. | |
| This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version. | |
| setup: | |
| - type: copy_test_files | |
| source: tests/dotnet/dotnet-inspect/bad-code | |
| target: bad-code |
| prompt: "Which interfaces are defined in the System.Text.Json NuGet package? Show me the complete set." | ||
| assertions: | ||
| - type: "output_contains" | ||
| value: "JsonConverter" |
The scenario is about discovering interfaces in System.Text.Json, but the assertion requires the output to contain JsonConverter, which is a (generic) class/abstract base type rather than an interface. This can cause false failures if the agent correctly filters to interfaces only. Consider asserting on a known interface name from that package (e.g., IJsonTypeInfoResolver) instead.
| value: "JsonConverter" | |
| value: "IJsonTypeInfoResolver" |
| rubric: | ||
| - "Uses dnx dotnet-inspect package to discover the latest stable System.CommandLine version (e.g., --latest-version)" | ||
| - "Uses dnx dotnet-inspect diff or member commands to understand API differences between System.CommandLine versions" | ||
| - "Uses dnx dotnet-inspect member to understand the new API surface (e.g., Command, Option, RootCommand)" | ||
| - "Correctly migrates Handler.SetHandler calls to the new System.CommandLine API (e.g., SetAction)" |
Many rubric items are written as “Uses dnx dotnet-inspect …” (tool/technique-focused, often echoing exact commands). Given the repo’s overfitting detection guidance, this style is more likely to be flagged as overfitted than outcome-focused criteria. Consider rewriting rubric items to primarily score on outcomes (correct findings + sourced evidence from tool output) and rely on expect_tools: ["bash"] / lightweight assertions to ensure the tool is actually used.
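A minimal sketch of what an outcome-focused version of the porting scenario could look like. It uses only fields that already appear in this eval.yaml (`assertions`, `expect_tools`, `rubric`). The placement of `expect_tools` at the scenario level and the wording of the first rubric item are assumptions made for illustration, not the text that was ultimately committed:

```yaml
scenarios:
  - name: "Port app to latest System.CommandLine version"
    prompt: |
      This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version.
    assertions:
      - type: "exit_success"
    # Tool use is checked here rather than in the rubric text
    # (scenario-level placement is an assumption).
    expect_tools: ["bash"]
    rubric:
      # Illustrative wording (assumption): scores the finding, not the command that produced it.
      - "Identifies the API differences between the beta and the latest stable System.CommandLine version"
      - "Correctly migrates Handler.SetHandler calls to the new System.CommandLine API (e.g., SetAction)"
      - "Updates the PackageReference to the latest stable System.CommandLine version discovered during the run"
```

With this shape, the rubric no longer echoes exact commands, while `expect_tools` still fails a run that never touches the shell.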
SKILL.md:
- Bump version 0.6.8 → 0.6.9
- Document --cat, --head N, --tail N flags
- Add --platform examples for source command

eval.yaml:
- Rewrite source scenario rubric to test outcomes not technique
- Remove narrow dnx assertion from source scenario
- Quality improved 2→5, overfitting 0.29→0.08 on source scenario

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove 'Uses dnx dotnet-inspect X command' technique-testing rubric items and 'output_matches: dnx dotnet-inspect' narrow assertions from all 11 scenarios. Replace with outcome-focused criteria that test what the user learns rather than which tool was used.

Overfitting score: 0.37 → 0.03

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove hard-coded System.CommandLine 2.0.4 version from rubric
- Fix JsonConverter assertion to IJsonTypeInfoResolver (class vs interface)
- Add CODEOWNERS entries for dotnet-inspect skill and tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Embed factual reference data from dotnet-inspect into rubric criteria so the LLM judge can accurately score correctness without relying on its own (potentially incorrect) knowledge:
- IHostedService: only BackgroundService + IHostedLifecycleService in platform
- Dependencies: exact 5 deps for net10.0 with versions
- Overloads: 3 of 6 GetResponseAsync accept JsonSerializerOptions
- IChatClient: 3 members, in AI.Abstractions package
- System.Text.Json: exactly 5 interfaces across 2 namespaces
- HttpClient: 6 extension method families from HttpClientJsonExtensions
- Newtonsoft.Json: signed (token 11425bc5744002c7), no SourceLink, v13.0.4

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
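As a rough illustration of the ground-truth approach this commit describes, a rubric item can state the expected fact directly so the judge compares against it instead of recalling it. In the sketch below, the scenario name and prompt are hypothetical. Only the IHostedService fact is taken from the commit message:

```yaml
scenarios:
  # Hypothetical name and prompt, written for illustration only.
  - name: "Find IHostedService implementors"
    prompt: "Which types in the platform libraries implement IHostedService?"
    expect_tools: ["bash"]
    rubric:
      # Ground truth taken from the commit message: the judge checks the answer against this fact.
      - "Reports BackgroundService and IHostedLifecycleService as the only IHostedService implementors found in the platform"
```

The trade-off is that embedded facts can go stale as packages ship new versions, which is the same concern raised earlier about the hard-coded 2.0.4 version.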
…narios
Two new scenarios derived from usage study observations:
1. Build M.E.AI chatbot with tool calling (scenario 11)
- Scaffolds a .NET 10 console project with M.E.AI + Ollama packages
- Tests API correctness: OllamaChatClient constructor, AIFunctionFactory
registration, ChatClientBuilder pipeline, streaming, chat history
- Rubric embeds ground truth from dotnet-inspect (e.g., Ollama only
available in preview versions, FunctionInvokingChatClient properties)
- Study showed quality improvement: 4→5 (baseline missed adding
assistant responses to chat history)
2. Discover M.E.AI 10.4 abstraction surface (scenario 12)
- Tests discovery of all 4 client interfaces: IChatClient,
ISpeechToTextClient, IHostedFileClient, IImageGenerator
- Tests correct method signatures, pipeline infrastructure, and DI
registration for each abstraction
- Study showed 7x token efficiency: 79K vs 534K tokens
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| version: 0.6.9 | ||
| description: Query .NET APIs across NuGet packages, platform libraries, and local files. Search for types, list API surfaces, compare and diff versions, find extension methods and implementors. Use whenever you need to answer questions about .NET library contents. |
| scenarios: | ||
| - name: "Port app to latest System.CommandLine version" | ||
| prompt: | | ||
| Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project. |
| rubric: | ||
| - "Reports that the Newtonsoft.Json assembly is signed with public key token 11425bc5744002c7" | ||
| - "Reports that Newtonsoft.Json does NOT have SourceLink (no readable PDB embedded)" | ||
| - "Includes additional metadata such as version (13.0.4), author (James Newton-King), license (MIT), or target frameworks (net6.0, netstandard2.0, etc.)" |
| value: "IChatClient" | ||
| - type: "exit_success" | ||
| rubric: | ||
| - "OllamaChatClient constructor is called correctly — it takes (Uri, string?, HttpClient?) or (string, string?, HttpClient?) where the second param is the model name. The M.E.AI.Ollama package is only available in preview versions (latest 9.7.0-preview), not stable 10.4.0." |
Removed scenarios that showed no quality improvement with high overhead:
- Find types across platform (Δ:0 quality, +634% token overhead)
- Discover interfaces in NuGet package (Δ:0, +94%; superseded by new M.E.AI abstraction discovery scenario)
- Inspect method overloads (Δ:0, +102% overhead)

Remaining 10 scenarios include the 2 new M.E.AI study-derived scenarios.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New tool introduced, we need to review.
Replace scenarios #8 and #9 (M.E.AI chatbot and abstraction discovery) with Hex1b TUI library scenarios that test novel API discovery:
- Scenario #8: Build Hex1b TUI dashboard with charts (tests widget composition, layout, charting, and theming on a brand-new library)
- Scenario #9: Discover Hex1b widget catalog and layout API (tests factual API enumeration where training data is absent)

Results: 28.5% improvement, 0.47 normalized gain, 0.04 overfitting, 6 improved, 4 equal, 0 regressions (total Δ: +9).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| prompt: "What are the package dependencies of Microsoft.Extensions.AI? I want to understand what it pulls in." | ||
| assertions: | ||
| - type: "output_matches" | ||
| pattern: "(depend|Microsoft\\.Extensions)" | ||
| expect_tools: ["bash"] | ||
| rubric: | ||
| - "Lists the 5 direct dependencies for net10.0: Microsoft.Extensions.AI.Abstractions, Microsoft.Extensions.Caching.Abstractions, Microsoft.Extensions.DependencyInjection.Abstractions, Microsoft.Extensions.Logging.Abstractions, and System.Numerics.Tensors" | ||
| - "Provides version information (e.g., AI.Abstractions 10.4.0, other deps at 10.0.4)" | ||
| - "Shows transitive dependencies or the deeper tree — e.g., Caching.Abstractions pulls in Microsoft.Extensions.Primitives, or for net9.0 there are additional direct deps like System.Text.Json and System.Threading.Channels" |
| - name: "Audit package signing and SourceLink" | ||
| prompt: "Is the Newtonsoft.Json NuGet package signed? Does it support SourceLink? Give me the full metadata." | ||
| assertions: | ||
| - type: "output_matches" | ||
| pattern: "(Newtonsoft|sign|source)" | ||
| expect_tools: ["bash"] | ||
| rubric: | ||
| - "Reports that the Newtonsoft.Json assembly is signed with public key token 11425bc5744002c7" | ||
| - "Reports that Newtonsoft.Json does NOT have SourceLink (no readable PDB embedded)" | ||
| - "Includes additional metadata such as version (13.0.4), author (James Newton-King), license (MIT), or target frameworks (net6.0, netstandard2.0, etc.)" |
| /plugins/dotnet/skills/dotnet-inspect/ @richlander | ||
| /tests/dotnet/dotnet-inspect/ @richlander |
| prompt: | | ||
| Clone https://github.com/richlander/bad-code and look at the ZeroDaySearch project. | ||
| This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version. |
| This app uses a beta version of System.CommandLine. Upgrade it to the latest stable version. | ||
| assertions: | ||
| - type: "output_matches" | ||
| pattern: "(diff|member|type)" |
…ery scenarios

Replace open-ended Hex1b scenarios (#8 dashboard build, #9 widget catalog) with precision-query scenarios that play to dotnet-inspect's proven strengths:
- #8: Enumerate Hex1b terminal builder configuration API (1→5, Δ+4)
- #9: Map Hex1b charting API and data model (5→5, 2.5x token efficiency)

Eval: PASSED, 19.6% improvement, 0.45 normalized gain, 0.03 overfitting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closing in favor of a scenario-shaped approach. Instead of one tool-shaped skill, we're splitting into five skills that start from the developer's problem:
Plan: This PR's eval patterns and dotnet-inspect usage informed the new design. Each new skill will be its own PR.
Summary
Add the dotnet-inspect skill to the dotnet plugin. This skill enables querying .NET APIs across NuGet packages, platform libraries, and local files: type discovery, member inspection, version diffing, extension methods, implementors, dependency graphs, SourceLink source URLs, and metadata auditing.
Files
- plugins/dotnet/skills/dotnet-inspect/SKILL.md: skill definition (v0.6.8)
- tests/dotnet/dotnet-inspect/eval.yaml: 11 eval scenarios
Eval Results
+42–51% improvement across runs (statistically significant, passed validation).
Design Notes
- Tool: dotnet-inspect, installed via dnx dotnet-inspect -y -- <command>