Generalize conflict detection to all classification fields#84
Merged
Conversation
Previously only reference_assembly had conflict detection. Now all 5 classification fields (data_modality, data_type, platform, reference_assembly, assay_type) detect same-tier conflicts. Higher-tier rules can still override lower-tier values — a tier 3 header rule refining a tier 2 filename rule is expected behavior. Two same-tier rules disagreeing on a field produces not_classified. Fixes the minimap2-vs-basecall-model bug where program_minimap2 (genomic, tier 3) silently overwrote ont_basecall_rna (transcriptomic, tier 3). Closes #81 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No accuracy changes — same HPRC validation results as before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR extends the rule engine’s conflict detection from reference_assembly to all classification fields (data_modality, data_type, platform, reference_assembly, assay_type) with tier awareness, and updates tests/reports to reflect the new conflict behavior (e.g., minimap2 vs ONT RNA basecall).
Changes:
- Generalize same-tier conflict detection to all classification fields and emit
conflicting_{field}_rulesevidence IDs. - Track which tier last set each field via
_field_set_by_tierto allow higher-tier refinement without triggering conflicts. - Add/adjust tests for new conflict cases and regenerate validation/coverage report artifacts.
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
src/meta_disco/rule_engine.py |
Implements tier-aware, per-field conflict detection and conflict evidence IDs. |
tests/test_rule_engine.py |
Updates reference conflict evidence expectation and adds new conflict-related tests. |
tests/test_header_classifier.py |
Adds header-based regression test for ONT RNA basecall vs minimap2; updates platform conflict expectation. |
docs/validation-report.md |
Updates run timestamp. |
docs/validation-dashboard.html |
Updates embedded validation data timestamp. |
docs/anvil-coverage-report.md |
Updates run timestamp and conflict reason wording. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…engine tests Conflict detection now appends the conflict marker to existing evidence instead of replacing it, so both the original rule and the conflict are visible. Unit tests now exercise the actual rule engine with real filenames that trigger same-tier conflicts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Only change: conflict reason text for .xg and .chain files now shows the first rule's reason instead of the conflict marker, since evidence is preserved rather than replaced. Classification values unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This was referenced Mar 29, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
reference_assemblyonly to all 5 classification fields (data_modality,data_type,platform,reference_assembly,assay_type)_field_set_by_tierdictconflicting_reference_rulestoconflicting_{field}_rulesFixes the minimap2-vs-basecall-model bug where
program_minimap2(genomic, tier 3) silently overwroteont_basecall_rna(transcriptomic, tier 3). Now correctly producesnot_classified.HPRC validation: identical results — zero accuracy regressions.
Also filed #83 for a pre-existing missing-evidence bug found during investigation (assay_type inference records no evidence, resulting in 12,913 files at confidence 0.0).
Closes #81
Test plan
not_classified(new, was broken)not_classified(new)🤖 Generated with Claude Code