Skip to content

Generalize conflict detection to all classification fields#84

Merged
NoopDog merged 4 commits intomainfrom
noopdog/81-generalize-conflict-detection
Mar 27, 2026
Merged

Generalize conflict detection to all classification fields#84
NoopDog merged 4 commits intomainfrom
noopdog/81-generalize-conflict-detection

Conversation

@NoopDog
Copy link
Copy Markdown
Collaborator

@NoopDog NoopDog commented Mar 27, 2026

Summary

  • Extends conflict detection from reference_assembly only to all 5 classification fields (data_modality, data_type, platform, reference_assembly, assay_type)
  • Tier-aware: higher-tier rules can still override lower-tier values (tier 3 header refining tier 2 filename is expected). Only same-tier disagreements trigger conflicts.
  • Tracks which tier set each field via _field_set_by_tier dict
  • Generalizes conflict evidence rule ID from hardcoded conflicting_reference_rules to conflicting_{field}_rules

Fixes the minimap2-vs-basecall-model bug where program_minimap2 (genomic, tier 3) silently overwrote ont_basecall_rna (transcriptomic, tier 3). Now correctly produces not_classified.

HPRC validation: identical results — zero accuracy regressions.

Also filed #83 for a pre-existing missing-evidence bug found during investigation (assay_type inference records no evidence, resulting in 12,913 files at confidence 0.0).

Closes #81

Test plan

  • 325 tests pass (3 new + 1 updated)
  • HPRC validation unchanged: 100% accuracy on all dimensions
  • FASTA assembly override still works (tier 1 → tier 2 refinement)
  • Reference assembly conflict still works (same-tier, existing behavior)
  • minimap2 + RNA basecall → not_classified (new, was broken)
  • Illumina + CCS platform conflict → not_classified (new)
  • Reports regenerated

🤖 Generated with Claude Code

NoopDog and others added 2 commits March 27, 2026 01:04
Previously only reference_assembly had conflict detection. Now all 5
classification fields (data_modality, data_type, platform,
reference_assembly, assay_type) detect same-tier conflicts.

Higher-tier rules can still override lower-tier values — a tier 3
header rule refining a tier 2 filename rule is expected behavior.
Two same-tier rules disagreeing on a field produces not_classified.

Fixes the minimap2-vs-basecall-model bug where program_minimap2
(genomic, tier 3) silently overwrote ont_basecall_rna
(transcriptomic, tier 3).

Closes #81

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No accuracy changes — same HPRC validation results as before.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 27, 2026 08:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the rule engine’s conflict detection from reference_assembly to all classification fields (data_modality, data_type, platform, reference_assembly, assay_type) with tier awareness, and updates tests/reports to reflect the new conflict behavior (e.g., minimap2 vs ONT RNA basecall).

Changes:

  • Generalize same-tier conflict detection to all classification fields and emit conflicting_{field}_rules evidence IDs.
  • Track which tier last set each field via _field_set_by_tier to allow higher-tier refinement without triggering conflicts.
  • Add/adjust tests for new conflict cases and regenerate validation/coverage report artifacts.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/meta_disco/rule_engine.py Implements tier-aware, per-field conflict detection and conflict evidence IDs.
tests/test_rule_engine.py Updates reference conflict evidence expectation and adds new conflict-related tests.
tests/test_header_classifier.py Adds header-based regression test for ONT RNA basecall vs minimap2; updates platform conflict expectation.
docs/validation-report.md Updates run timestamp.
docs/validation-dashboard.html Updates embedded validation data timestamp.
docs/anvil-coverage-report.md Updates run timestamp and conflict reason wording.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/meta_disco/rule_engine.py Outdated
Comment thread tests/test_rule_engine.py Outdated
NoopDog and others added 2 commits March 27, 2026 07:50
…engine tests

Conflict detection now appends the conflict marker to existing
evidence instead of replacing it, so both the original rule and
the conflict are visible. Unit tests now exercise the actual rule
engine with real filenames that trigger same-tier conflicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Only change: conflict reason text for .xg and .chain files now shows
the first rule's reason instead of the conflict marker, since evidence
is preserved rather than replaced. Classification values unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 27, 2026 17:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/meta_disco/rule_engine.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Generalize conflict detection to all classification fields

2 participants