Add skills for C# MCP Server Development by leslierichardson95 · Pull Request #317 · dotnet/skills

leslierichardson95 · 2026-03-10T18:56:51Z

This pull request introduces comprehensive documentation for creating MCP servers (creating, debugging, testing, publishing) using the C# SDK and .NET project templates. These documents provide step-by-step instructions, attribute references, implementation patterns, and advanced configuration guidance for developers building, debugging, testing, and publishing MCP server projects.

Reference documentation for implementation and configuration:

Added references/api-patterns.md, detailing attribute usage, tool return types, dependency injection, builder API, dynamic tool creation, server options, experimental APIs, and NuGet package selection.
Added references/transport-config.md, covering stdio and HTTP transport setup, custom path prefixes, stateless mode, authentication/authorization, accessing HTTP context, OAuth flows, idle timeout, port configuration, and OpenTelemetry observability.

All skills successfully passed skills-validator testing.

Four new skills for the C# MCP server development lifecycle: - mcp-csharp-create: Scaffolding with dotnet new mcpserver, tools/prompts/resources, transport config - mcp-csharp-debug: MCP Inspector, VS Code integration, breakpoint debugging, logging - mcp-csharp-test: Unit tests with ClientServerTestBase, integration with WebApplicationFactory, evals - mcp-csharp-publish: NuGet packaging, Docker/Azure deployment, MCP Registry publishing Each skill includes SKILL.md with progressive disclosure references/ and eval.yaml tests.

…te syntax Replace scaffolding-heavy scenarios with implementation-focused ones that test MCP-specific features (resources, prompts, logging). Fix assertion patterns to match combined C# attribute syntax [McpServerTool, Description()] instead of requiring standalone [McpServerTool]. Increase timeouts to 180s to account for skill-reading overhead. Validator result: passed=True, improvement=44.6% (threshold=10%)

Copilot

Pull request overview

Adds a set of new .NET skill documents (create/debug/test/publish) for building MCP servers with the C# SDK, along with corresponding eval scenarios under tests/dotnet/ to validate the skills via skill-validator.

Changes:

Added four new MCP C# skills: creation, debugging, testing, and publishing/deployment.
Added reference guides covering SDK API patterns, transport configuration, testing patterns, and publishing/registry workflows.
Added eval scenarios for each new skill under tests/dotnet/.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
tests/dotnet/mcp-csharp-create/eval.yaml	Adds eval scenarios for MCP server scaffolding, attributes/DI, and HTTP setup.
tests/dotnet/mcp-csharp-debug/eval.yaml	Adds eval scenarios for Inspector usage and IDE/Copilot configuration.
tests/dotnet/mcp-csharp-test/eval.yaml	Adds eval scenarios for unit/integration testing and evaluation authoring.
tests/dotnet/mcp-csharp-publish/eval.yaml	Adds eval scenarios for NuGet tool publishing, Azure deployment, and registry publishing.
plugins/dotnet/skills/mcp-csharp-create/SKILL.md	New skill doc for creating MCP servers with C# SDK and templates.
plugins/dotnet/skills/mcp-csharp-create/references/api-patterns.md	Reference for C# MCP SDK attributes, return types, DI, and builder patterns.
plugins/dotnet/skills/mcp-csharp-create/references/transport-config.md	Reference for stdio/HTTP transport configuration, auth, and observability.
plugins/dotnet/skills/mcp-csharp-debug/SKILL.md	New skill doc for running/debugging MCP servers and configuring IDEs.
plugins/dotnet/skills/mcp-csharp-debug/references/ide-config.md	Detailed VS Code/Visual Studio MCP + debugger configuration examples.
plugins/dotnet/skills/mcp-csharp-debug/references/mcp-inspector.md	Reference for using MCP Inspector across stdio/HTTP scenarios.
plugins/dotnet/skills/mcp-csharp-test/SKILL.md	New skill doc for unit/integration testing and evaluations for MCP servers.
plugins/dotnet/skills/mcp-csharp-test/references/test-patterns.md	Reference test patterns (in-memory, WebApplicationFactory, mocking).
plugins/dotnet/skills/mcp-csharp-test/references/evaluation-guide.md	Reference guidance for creating deterministic, verifiable eval sets.
plugins/dotnet/skills/mcp-csharp-publish/SKILL.md	New skill doc for packaging, Docker/Azure deployment, and registry publishing.
plugins/dotnet/skills/mcp-csharp-publish/references/nuget-packaging.md	Reference for `.csproj` tool packaging and NuGet publishing flow.
plugins/dotnet/skills/mcp-csharp-publish/references/docker-azure.md	Reference for Docker + Azure deployment commands and secret handling.
plugins/dotnet/skills/mcp-csharp-publish/references/mcp-registry.md	Reference for `server.json` and `mcp-publisher` workflow/CI guidance.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…aging.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…on95/skills into lerich/mcp-skills

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-03-25T02:42:01Z

Skill Validation Results

Skill	Scenario	Quality (Isolated)	Quality (Plugin)	Skills Loaded	Agents Invoked	Overfit	Verdict
mcp-csharp-create	Implement MCP tools with proper attributes and DI	3.0/5 ⏰ → 4.0/5 ⏰ 🟢	3.0/5 ⏰ → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill, view / ✅ mcp-csharp-create; tools: skill, view	— / —	✅ 0.10	✅
mcp-csharp-create	Create an HTTP MCP server with tools and resources	1.7/5 ⏰ → 4.3/5 ⏰ 🟢	1.7/5 ⏰ → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill, create, edit, view / ✅ mcp-csharp-create; tools: skill, edit, create, view	— / —	✅ 0.10	✅
mcp-csharp-create	Create an MCP server with tools, prompts, and proper logging	3.0/5 ⏰ → 4.7/5 ⏰ 🟢	3.0/5 ⏰ → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill, edit, create / ✅ mcp-csharp-create; tools: skill, read_bash, create, edit	— / —	✅ 0.10	❌ [1]
mcp-csharp-publish	Publish an MCP server as a NuGet tool package	3.0/5 → 4.0/5 🟢	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill, glob / ✅ mcp-csharp-publish; tools: skill, glob	— / —	✅ 0.18	✅
mcp-csharp-publish	Deploy an HTTP MCP server to Azure Container Apps	3.0/5 → 5.0/5 🟢	3.0/5 → 5.0/5 🟢	✅ mcp-csharp-publish; tools: skill, report_intent, view / ✅ mcp-csharp-publish; tools: skill, report_intent, view	— / —	✅ 0.18	✅
mcp-csharp-publish	Publish to the MCP Registry	4.0/5 → 4.3/5 🟢	4.0/5 → 4.0/5	✅ mcp-csharp-publish; tools: skill, view / ✅ mcp-csharp-publish; tools: skill, view	— / —	✅ 0.18	❌ [2]
mcp-csharp-debug	Debug an MCP server with MCP Inspector	4.7/5 → 4.0/5 🔴	4.7/5 → 4.0/5 🔴	✅ mcp-csharp-debug; tools: report_intent, skill, view / ✅ mcp-csharp-debug; tools: skill	— / —	✅ 0.10	❌
mcp-csharp-debug	Configure VS Code to use an MCP server	4.0/5 → 5.0/5 🟢	4.0/5 → 4.7/5 🟢	✅ mcp-csharp-debug; tools: skill, view, glob / ✅ mcp-csharp-debug; tools: skill, view, glob	— / —	✅ 0.10	✅
mcp-csharp-debug	Debug a failing MCP server tool	3.3/5 → 3.7/5 🟢	3.3/5 → 3.3/5	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	— / —	✅ 0.10	❌ [3]
mcp-csharp-test	Write unit and integration tests for an MCP server	2.0/5 → 4.7/5 🟢	2.0/5 → 5.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	— / —	✅ 0.18	✅
mcp-csharp-test	Test an HTTP MCP server with WebApplicationFactory	3.7/5 → 3.3/5 🔴	3.7/5 → 4.0/5 🟢	✅ mcp-csharp-test; tools: report_intent, skill, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	— / —	✅ 0.18	❌
mcp-csharp-test	Create evaluations for an MCP server	2.0/5 → 2.0/5	2.0/5 → 2.0/5	✅ mcp-csharp-test; tools: task, bash, grep, glob, skill / ✅ mcp-csharp-test; tools: skill	explore / —	✅ 0.18	❌ [4]

[1] (Isolated) Quality improved but weighted score is -47.0% due to: judgment, quality
[2] (Isolated) Quality improved but weighted score is -4.6% due to: judgment
[3] (Plugin) Quality unchanged but weighted score is -8.3% due to: tokens (11750 → 28911), tool calls (0 → 1), time (14.2s → 18.6s)
[4] (Isolated) Quality unchanged but weighted score is -23.7% due to: judgment, quality, tool calls (5 → 10), tokens (40305 → 46265)

⏰ timeout — run hit the scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output

Model: claude-opus-4.6 | Judge: claude-opus-4.6

Full results

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

danmoseley · 2026-03-25T02:58:53Z

Created #440 so that on future PR's, local agent can help figure out next steps given an evaluation.

…oaches The rubric criterion 'Shows how to attach a debugger' was too narrow. The skilled answer correctly focused on the dotnet#1 cause (stdout pollution) but scored low because it didn't show a specific 'attach to process' flow. Broadened to accept any valid debugging approach: attaching, Debugger.Launch(), or launch.json configuration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-03-25T03:12:16Z

Note

This comment was AI/Copilot-generated.

Eval Results Analysis (run 23522050726)

The timeout increase to 360s fixed the create scenarios which were previously all 1.0/5. I also just pushed a debug rubric fix (511cb8d). Here's the full picture across all three eval runs:

Trend across runs

Skill	Scenario	Mar 20	Mar 25 (pre-fix)	Mar 25 (post-fix)	Status
create	MCP tools + DI	3.0→4.3 ✅	1.0→1.0 ❌	3.0→4.0 ✅	Fixed (timeout)
create	HTTP server	1.7→3.3 ✅	1.0→1.0 ❌	1.7→4.3 ✅	Fixed (timeout)
create	Tools+prompts+logging	1.7→3.7 ✅	2.3→1.0 ❌	3.0→4.7 ❌	Pairwise variance¹
publish	NuGet tool	✅	✅	3.0→4.0 ✅	Stable
publish	Azure Container Apps	✅	✅	3.0→5.0 ✅	Stable
publish	MCP Registry	❌	❌	4.0→4.3 ❌	Pairwise variance¹
debug	MCP Inspector	❌	❌	4.7→4.0 ❌	Skill content²
debug	VS Code config	✅	✅	4.0→5.0 ✅	Stable
debug	Failing tool	✅	✅	3.3→3.7 ❌	Fixed (rubric)
test	Unit + integration	✅	✅	2.0→4.7 ✅	Stable
test	WebApplicationFactory	❌	❌	3.7→3.3 ❌	Skill content²
test	Evaluations	❌	❌	2.0→2.0 ❌	Skill content²

Summary: 7/12 passing, likely 8–9 on re-run with fixes just pushed

¹ Pairwise variance = isolated quality improved but the pairwise LLM judge preferred baseline on this roll. No action needed; will fluctuate run-to-run.
² Skill content = the skill references don't cover the topic well enough for the agent to produce a strong answer.

What I fixed (just pushed to `lerich/mcp-skills`)

Create timeouts (earlier commit 1289126): 180s/180s/default(120s) → 360s/360s/360s. This fixed all three create scenarios.
Debug "failing tool" rubric (commit 511cb8d): Broadened "Shows how to attach a debugger to the running server process" to accept any valid debugging approach (attach, Debugger.Launch(), launch.json). The skilled answer correctly focused on stdout/stderr pollution (the Initial documentation and validation workflow #1 real-world cause) but was penalized for not showing a specific PID-attach workflow.

Remaining failures — what needs attention

Pairwise judge variance (no action needed)

These two scenarios actually improved in quality but the pairwise judge preferred baseline on this particular roll:

Create s3 "tools+prompts+logging": Isolated rubric scores went to 5/5 on all 4 criteria (up from 3.7 avg baseline). But pairwise judge preferred baseline's IHttpClientFactory pattern over skilled's bare HttpClient, and baseline's richer prompt construction. Will likely pass on re-run.
Publish "MCP Registry": Skilled scored 4.3 vs baseline 4.0, used 55% fewer tokens (88K→40K), 40% less time. Pairwise judge just disagreed. Will fluctuate.

Skill content gaps (for @leslierichardson95)

1. Debug "Inspector" (4.7→4.0)

The skilled answer only mentions tools when describing what Inspector shows. The rubric expects "tools, prompts, and resources."

File: plugins/dotnet-ai/skills/mcp-csharp-debug/references/mcp-inspector.md
What to change: The file has separate "Tool Testing", "Prompt Testing", "Resource Browsing" sections, but the intro only says "listing tools, calling them with custom parameters, and inspecting protocol messages." Add an intro line like: "Provides a web UI for testing tools, prompts, and resources" so the agent picks up all three when summarizing.

2. Test "WebApplicationFactory" (3.7→3.3)

The skilled answer shows WebApplicationFactory<Program> setup but never demonstrates an actual tool call through the HTTP endpoint. The rubric criterion "Tests tool invocation through the HTTP endpoint" scored 1.7/5 vs baseline's 3.7/5.

File: plugins/dotnet-ai/skills/mcp-csharp-test/references/test-patterns.md
What to change: The "HTTP Testing with WebApplicationFactory" section shows CreateClient() and PostAsJsonAsync for raw HTTP, but doesn't show a tool invocation example. Add an example showing how to send a tools/call JSON-RPC request through the HTTP endpoint and verify the response — similar to how the ClientServerTestBase section shows client.CallToolAsync("my_tool", ...) but adapted for HTTP.

3. Test "evaluations" (2.0→2.0, both baseline AND skilled fail)

The skilled answer literally said: "This is about LLM evaluations... outside the scope of the MCP server testing skill." Neither baseline nor skilled can answer this well — the skill simply doesn't have content about evaluation authoring.

File to create: plugins/dotnet-ai/skills/mcp-csharp-test/references/evaluation-authoring.md (new)
What to add: Content covering the XML qa_pair evaluation format, what makes good evaluation questions (read-only, deterministic, require multi-tool reasoning), and example questions for a product catalog scenario. Then link it from the skill's SKILL.md.
This is the biggest gap — without reference content, the agent has nothing to draw on.

Recommended next steps

Re-run eval (/evaluate) to pick up the two fixes just pushed (debug rubric + timeouts). The 2 pairwise-variance failures may also resolve on a fresh roll.
Address the 3 skill content gaps above — these are the systemic issues that won't resolve with re-runs.

danmoseley · 2026-03-25T15:22:45Z

Next action here on @leslierichardson95 -- hopefully above is helpful -- I'll try to get the "improved analysis guidance tailored for agents" merged in parallel

danmoseley · 2026-03-25T16:57:17Z

Merged my part, reevaluating

danmoseley · 2026-03-25T16:57:22Z

/evaluate

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

github-actions · 2026-03-25T17:09:46Z

Skill Validation Results

Skill	Scenario	Quality (Isolated)	Quality (Plugin)	Skills Loaded	Agents Invoked	Overfit	Verdict
mcp-csharp-create	Implement MCP tools with proper attributes and DI	2.7/5 ⏰ → 4.0/5 🟢	2.7/5 ⏰ → 4.3/5 🟢	✅ mcp-csharp-create; tools: skill / ✅ mcp-csharp-create; tools: skill	— / —	✅ 0.07	✅
mcp-csharp-create	Create an HTTP MCP server with tools and resources	1.7/5 ⏰ → 4.0/5 🟢	1.7/5 ⏰ → 4.0/5 🟢	✅ mcp-csharp-create; tools: skill, stop_bash / ✅ mcp-csharp-create; tools: skill, read_bash, stop_bash	— / —	✅ 0.07	✅
mcp-csharp-create	Create an MCP server with tools, prompts, and proper logging	2.3/5 ⏰ → 4.0/5 🟢	2.3/5 ⏰ → 5.0/5 🟢	✅ mcp-csharp-create; tools: skill, create, write_bash, stop_bash / ✅ mcp-csharp-create; tools: skill, create	— / —	✅ 0.07	✅
mcp-csharp-test	Write unit and integration tests for an MCP server	2.0/5 → 4.7/5 🟢	2.0/5 → 5.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	— / —	🟡 0.21	✅
mcp-csharp-test	Test an HTTP MCP server with WebApplicationFactory	3.7/5 → 3.7/5	3.7/5 → 4.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	— / —	🟡 0.21	❌ [1]
mcp-csharp-test	Create evaluations for an MCP server	2.0/5 → 1.7/5 ⏰ 🔴	2.0/5 → 2.0/5	⚠️ NOT ACTIVATED / ✅ mcp-csharp-test; tools: report_intent, skill, view	explore / —	🟡 0.21	❌
mcp-csharp-debug	Debug an MCP server with MCP Inspector	4.7/5 → 4.3/5 🔴	4.7/5 → 4.7/5	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	— / —	✅ 0.16	❌ [2]
mcp-csharp-debug	Configure VS Code to use an MCP server	4.0/5 → 4.0/5	4.0/5 → 5.0/5 🟢	✅ mcp-csharp-debug; tools: skill, view, glob / ✅ mcp-csharp-debug; tools: skill, view, glob	— / —	✅ 0.16	✅
mcp-csharp-debug	Debug a failing MCP server tool	4.7/5 → 4.0/5 🔴	4.7/5 → 3.3/5 🔴	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	— / —	✅ 0.16	❌
mcp-csharp-publish	Publish an MCP server as a NuGet tool package	3.0/5 → 4.0/5 🟢	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill, glob / ✅ mcp-csharp-publish; tools: skill, glob	— / —	🟡 0.22	✅
mcp-csharp-publish	Deploy an HTTP MCP server to Azure Container Apps	3.0/5 → 5.0/5 🟢	3.0/5 → 5.0/5 🟢	✅ mcp-csharp-publish; tools: skill, report_intent, view / ✅ mcp-csharp-publish; tools: skill, report_intent, view	— / —	🟡 0.22	✅
mcp-csharp-publish	Publish to the MCP Registry	3.7/5 → 4.0/5 🟢	3.7/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill, view / ✅ mcp-csharp-publish; tools: skill, view	— / —	🟡 0.22	✅

[1] (Plugin) Quality improved but weighted score is -5.8% due to: tokens (12684 → 45818), tool calls (0 → 3)
[2] (Plugin) Quality unchanged but weighted score is -7.9% due to: tokens (12015 → 29458), tool calls (0 → 1)

⏰ timeout — run(s) hit the (120s, 360s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

Full results

To investigate failures, paste this to your AI coding agent:

Download eval artifacts with gh run download 23553427829 --repo dotnet/skills --dir /tmp/eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/8a95d954e9d05b5b6120c39259d27d96bf9e1987/eng/skill-validator/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

danmoseley · 2026-03-25T18:07:30Z

Note

This analysis was generated by GitHub Copilot, following the "Download eval artifacts" investigation guidance from the evaluation table comment.

Eval Failure Analysis

I downloaded the eval artifacts (gh run download 23553427829 --repo dotnet/skills --pattern "skill-validator-results-*") and followed InvestigatingResults.md to diagnose the 4 failing scenarios. Here's what I found, using the guide's failure pattern taxonomy.

Failure 1: `mcp-csharp-debug` / "Debug an MCP server with MCP Inspector"

Score: iso=+4.3%, plug=‑7.9% · Pattern #4 — Quality unchanged, token overhead kills it

Rubrics were a wash (2 ties, baseline won 1, skill won 1). But tokens inflated 12K→29K (2.4×) and tool calls went 0→1. The ‑1.00 token and tool reduction penalties drag the weighted score below zero despite neutral quality.

Fix: Trim skill content size. Quality is genuinely neutral — the skill needs to be more concise or produce clearly differentiated output to offset its token cost.

Failure 2: `mcp-csharp-debug` / "Debug a failing MCP server tool"

Score: iso=‑21.1%, plug=‑29.7% · Pattern #4 + actual quality regression

Most concerning failure. All 3 per-run scores are negative [‑0.45, ‑0.40, ‑0.33] — consistently worse, not variance. Baseline won 2 rubrics (attaching a debugger, broader debugging approach), skill won 1 (stderr recommendation). The skilled output was shorter (1,331 chars vs 1,998 baseline) and narrower — it focused almost entirely on stdout corruption and stale builds, while the baseline covered a broader range of approaches (file logging, VS Code output panel, common culprits).

Fix: The skill is over-indexing on stderr/stdout as the debugging narrative. It should also cover attaching debuggers, checking VS Code MCP output channels, and other diagnostic approaches the rubric expects. This is a skill content quality issue.

Failure 3: `mcp-csharp-test` / "Test HTTP MCP server with WebApplicationFactory"

Score: iso=‑5.1%, plug=‑5.8% · Pattern #4 — Slight quality improvement overwhelmed by overhead

Quality actually improved slightly (qual=+0.05–0.07), but tokens went 12K→43–45K (3.5×!) and tool calls 0→3. Rubrics split: skill won on InternalsVisibleTo, baseline won on MCP initialize requests and HTTP tool invocation testing. Per-run scores [‑0.06, ‑0.13, ‑0.30] show variance.

Fix: Borderline case. Reducing skill content size would help the token penalty. The rubric also expects "MCP initialize request" and "tool invocation through HTTP" coverage — the skill should ensure it covers those patterns, not just the InternalsVisibleTo setup detail.

Failure 4: `mcp-csharp-test` / "Create evaluations for an MCP server"

Score: iso=‑30.0%, plug=‑24.0% · Patterns #5 (Not activated) + #1 (Timeout) + #2 (Baseline already weak)

Multi-pattern pile-up:

Isolated: Skill was NOT ACTIVATED. The agent went exploring on its own — 26 tool calls (7 bash, 3 glob, 2 grep), 163K tokens, timed out. It never loaded the skill.
Plugin: Skill activated, but a key assertion failed — output didn't mention "read-only/non-destructive/deterministic" evaluation guidance.
Baseline quality was already low (~2.0/5).

Fix priority:

Fix activation — the skill's description frontmatter likely doesn't mention "evaluations" or "eval", so the runtime doesn't select it. Add those keywords.
Fix content — even when activated (plugin run), the skill didn't cover the "read-only/deterministic" requirement. Add a section on writing good evaluation questions.
Consider increasing timeout from 120s, though fixing activation is the real fix.

Summary — What to fix first

Priority	Scenario	Action
1	Create evaluations	Fix skill activation (add "evaluation" to description), add eval-writing guidance to skill content
2	Debug a failing tool	Broaden skill content beyond stderr — cover debugger attachment, VS Code output panel
3	WebApplicationFactory	Reduce skill size; add MCP initialize request / HTTP invocation patterns
4	MCP Inspector	Trim skill content to reduce token overhead

The dominant theme across 3 of 4 failures is Pattern #4 (token overhead) — the skills are too large relative to the quality improvement they deliver.

Side note on the investigation flow itself: the gh run download command in the eval table instructions fails with exit code 1 because the workflow run includes a skill-validator-dist.tar.gz artifact that gh can't extract as zip. Adding --pattern "skill-validator-results-*" to the download command avoids this. The InvestigatingResults.md guide was excellent — the failure pattern taxonomy mapped cleanly to every issue found.

mcp-csharp-debug: - Trim SKILL.md verbosity (183->160 lines) and mcp-inspector.md (67->54 lines) - Add Diagnosing Tool Errors section covering debugger, output panel, Inspector, common culprits - Rebalance debugging narrative away from stderr-only focus - Move HTTP logging config to ide-config.md reference mcp-csharp-test: - Add eval/evaluations keywords to frontmatter for activation - Add HTTP tool invocation test pattern (tools/call via WebApplicationFactory) - Trim test-patterns.md bloat (remove Test Categories, Coverage, Input Validation) - Create references/evaluations.md with qa_pair format and read-only/deterministic guidance Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

leslierichardson95 · 2026-03-25T19:50:55Z

/evaluate

github-actions · 2026-03-25T20:02:07Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
mcp-csharp-create	Implement MCP tools with proper attributes and DI	3.7/5 → 4.0/5 🟢	✅ mcp-csharp-create; tools: skill, edit, view / ✅ mcp-csharp-create; tools: skill, edit, view	✅ 0.06	✅
mcp-csharp-create	Create an HTTP MCP server with tools and resources	2.3/5 ⏰ → 4.3/5 🟢	✅ mcp-csharp-create; tools: skill, stop_bash / ✅ mcp-csharp-create; tools: skill, stop_bash	✅ 0.06	✅
mcp-csharp-create	Create an MCP server with tools, prompts, and proper logging	3.0/5 ⏰ → 4.3/5 🟢	✅ mcp-csharp-create; tools: skill, edit, create, read_bash / ✅ mcp-csharp-create; tools: skill, edit, create	✅ 0.06	✅
mcp-csharp-test	Write unit and integration tests for an MCP server	2.0/5 → 4.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	✅ 0.16	✅
mcp-csharp-test	Test an HTTP MCP server with WebApplicationFactory	3.3/5 → 5.0/5 🟢	✅ mcp-csharp-test; tools: skill, report_intent, view / ✅ mcp-csharp-test; tools: skill, report_intent, view	✅ 0.16	✅
mcp-csharp-test	Create evaluations for an MCP server	2.7/5 → 5.0/5 🟢	✅ mcp-csharp-test; tools: skill / ✅ mcp-csharp-test; tools: skill	✅ 0.16	✅
mcp-csharp-debug	Debug an MCP server with MCP Inspector	4.0/5 → 4.0/5	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: report_intent, skill	✅ 0.07	❌ [1]
mcp-csharp-debug	Configure VS Code to use an MCP server	4.3/5 → 4.7/5 🟢	✅ mcp-csharp-debug; tools: skill, view, glob / ✅ mcp-csharp-debug; tools: skill	✅ 0.07	✅
mcp-csharp-debug	Debug a failing MCP server tool	3.7/5 → 4.0/5 🟢	✅ mcp-csharp-debug; tools: report_intent, skill / ✅ mcp-csharp-debug; tools: skill	✅ 0.07	❌ [2]
mcp-csharp-publish	Publish an MCP server as a NuGet tool package	3.0/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill, glob / ✅ mcp-csharp-publish; tools: skill, glob	✅ 0.11	✅
mcp-csharp-publish	Deploy an HTTP MCP server to Azure Container Apps	3.0/5 → 5.0/5 🟢	✅ mcp-csharp-publish; tools: skill, report_intent, view / ✅ mcp-csharp-publish; tools: skill, report_intent, view	✅ 0.11	✅
mcp-csharp-publish	Publish to the MCP Registry	2.7/5 → 4.0/5 🟢	✅ mcp-csharp-publish; tools: skill, view, report_intent / ✅ mcp-csharp-publish; tools: skill, view, report_intent	✅ 0.11	✅

[1] (Plugin) Quality unchanged but weighted score is -10.7% due to: tokens (12072 → 29517), quality, tool calls (0 → 2)
[2] (Isolated) Quality improved but weighted score is -31.4% due to: quality, judgment, tokens (12175 → 27900), tool calls (0 → 2)

⏰ timeout — run(s) hit the (360s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

📖 See InvestigatingResults.md for how to diagnose failures. Additional debugging guidance may be provided by your workflow.

🔍 Full results — includes quality and agent details

To investigate failures, paste this to your AI coding agent:

Download eval artifacts with gh run download 23560926225 --repo dotnet/skills --dir /tmp/eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/c7f8110f791582174804a80f6a2ce1e0d656cfb7/eng/skill-validator/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

leslierichardson95 added 2 commits March 10, 2026 10:55

leslierichardson95 requested review from dbreshears and timheuer as code owners March 10, 2026 18:56

Copilot AI review requested due to automatic review settings March 10, 2026 18:56

Copilot started reviewing on behalf of leslierichardson95 March 10, 2026 18:57 View session

Add CODEOWNERS entries for MCP C# skills (create, debug, publish, test)

c0bd6b6

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Update plugins/dotnet/skills/mcp-csharp-test/references/test-patterns.md

e1afc18

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 10, 2026 19:02

Copilot started reviewing on behalf of leslierichardson95 March 10, 2026 19:03 View session

leslierichardson95 and others added 2 commits March 10, 2026 12:03

Update plugins/dotnet/skills/mcp-csharp-debug/SKILL.md

45423dc

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update plugins/dotnet/skills/mcp-csharp-publish/SKILL.md

a3636d2

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Update plugins/dotnet/skills/mcp-csharp-publish/references/nuget-pack…

9966e31

…aging.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 10, 2026 19:10

Copilot started reviewing on behalf of leslierichardson95 March 10, 2026 19:10 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Comment thread plugins/dotnet-ai/skills/mcp-csharp-create/references/api-patterns.md

Update plugins/dotnet/skills/mcp-csharp-debug/SKILL.md

afcc456

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 10, 2026 19:16

Copilot started reviewing on behalf of leslierichardson95 March 10, 2026 19:16 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Comment thread plugins/dotnet/skills/mcp-csharp-test/references/evaluation-guide.md Outdated

Comment thread plugins/dotnet/skills/mcp-csharp-test/references/evaluation-guide.md Outdated

leslierichardson95 and others added 4 commits March 10, 2026 12:26

Remove evaluation step and guide from mcp-csharp-test skill

e96bf6c

Merge branch 'main' into lerich/mcp-skills

316fcc8

Add CODEOWNERS entries for dotnet-maui skills

78ccf19

Merge branch 'lerich/mcp-skills' of https://github.com/leslierichards…

bcd29a5

…on95/skills into lerich/mcp-skills

Copilot AI review requested due to automatic review settings March 10, 2026 19:47

Copilot started reviewing on behalf of leslierichardson95 March 10, 2026 19:48 View session

Copilot AI reviewed Mar 10, 2026

View reviewed changes

Comment thread plugins/dotnet-ai/skills/mcp-csharp-test/references/test-patterns.md

Comment thread plugins/dotnet-ai/skills/mcp-csharp-test/references/test-patterns.md

Comment thread .github/CODEOWNERS

Merge branch 'main' into lerich/mcp-skills

9d69851

Copilot AI reviewed Mar 25, 2026

View reviewed changes

Comment thread eng/known-domains.txt Outdated

Comment thread .github/CODEOWNERS

Update eng/known-domains.txt

94c2710

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 25, 2026 02:52

Copilot started reviewing on behalf of danmoseley March 25, 2026 02:52 View session

danmoseley mentioned this pull request Mar 25, 2026

Add evaluation troubleshooting guide for AI agents #440

Closed

Copilot AI reviewed Mar 25, 2026

View reviewed changes

danmoseley mentioned this pull request Mar 25, 2026

Add evaluation troubleshooting guide for AI agents #441

Merged

This was referenced Mar 25, 2026

[TEST] Combined eval troubleshooting guide + MCP C# skills #442

Closed

[TEST] Combined eval troubleshooting guide + MCP C# skills #443

Closed

github-actions Bot mentioned this pull request Mar 25, 2026

🏥 Repository Health Dashboard #288

Open

Merge branch 'main' into lerich/mcp-skills

8a95d95

Copilot AI review requested due to automatic review settings March 25, 2026 16:57

Copilot started reviewing on behalf of danmoseley March 25, 2026 16:57 View session

Copilot AI reviewed Mar 25, 2026

View reviewed changes

Comment thread tests/dotnet-ai/mcp-csharp-test/eval.yaml

danmoseley enabled auto-merge (squash) March 25, 2026 20:11

danmoseley approved these changes Mar 25, 2026

View reviewed changes

danmoseley merged commit 7286c31 into dotnet:main Mar 25, 2026
30 checks passed

lewing mentioned this pull request Mar 25, 2026

mcp-csharp-create: add tool description quality and naming guidance #450

Closed

Conversation

leslierichardson95 commented Mar 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 25, 2026

Skill Validation Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

danmoseley commented Mar 25, 2026

Uh oh!

danmoseley commented Mar 25, 2026

Eval Results Analysis (run 23522050726)

Trend across runs

What I fixed (just pushed to lerich/mcp-skills)

Remaining failures — what needs attention

Pairwise judge variance (no action needed)

Skill content gaps (for @leslierichardson95)

Recommended next steps

Uh oh!

danmoseley commented Mar 25, 2026

Uh oh!

danmoseley commented Mar 25, 2026

Uh oh!

danmoseley commented Mar 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

github-actions Bot commented Mar 25, 2026

Skill Validation Results

Uh oh!

danmoseley commented Mar 25, 2026

What I fixed (just pushed to `lerich/mcp-skills`)

Failure 1: `mcp-csharp-debug` / "Debug an MCP server with MCP Inspector"

Failure 2: `mcp-csharp-debug` / "Debug a failing MCP server tool"

Failure 3: `mcp-csharp-test` / "Test HTTP MCP server with WebApplicationFactory"

Failure 4: `mcp-csharp-test` / "Create evaluations for an MCP server"