chore: reorganize dedupe code #14641
valentijnscholten wants to merge 11 commits into DefectDojo:dev from
Expose `match_batch_*` and `match_batch_of_findings` for read-only matching. Support unsaved findings in the location/endpoint comparison and in `_is_candidate_older`. Refactor `close_old_findings` in `default_importer` to use `get_close_old_findings_queryset`. Restore batch deduplication debug logging.
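To illustrate what read-only batch matching that tolerates unsaved findings could look like, here is a minimal sketch. `Finding`, `match_batch_by_hash_code` and `is_candidate_older` are hypothetical stand-ins, not DefectDojo's actual `match_batch_*` functions or `_is_candidate_older`; the sketch assumes matching on `hash_code` and treats an unsaved finding (no `pk` yet) as newer than every saved candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    # Hypothetical stand-in for dojo.models.Finding; pk is None until saved.
    title: str
    hash_code: str
    pk: Optional[int] = None


def is_candidate_older(new_finding, candidate):
    # An unsaved new finding has no pk, so any saved candidate counts as older.
    if new_finding.pk is None:
        return candidate.pk is not None
    return candidate.pk is not None and candidate.pk < new_finding.pk


def match_batch_by_hash_code(new_findings, existing_findings):
    """Map the index of each new finding to its older matches.

    Nothing is saved or modified, and results are keyed by position
    because unsaved findings have no pk to key on.
    """
    by_hash = {}
    for existing in existing_findings:
        by_hash.setdefault(existing.hash_code, []).append(existing)

    matches = {}
    for index, finding in enumerate(new_findings):
        candidates = [
            c for c in by_hash.get(finding.hash_code, [])
            if is_candidate_older(finding, c)
        ]
        if candidates:
            matches[index] = candidates
    return matches
```

Keeping the matcher free of writes means the same function can be called from import, reimport, or a preview step without side effects.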
…m_db queries. Replace the per-finding `refresh_from_db(false_p, risk_accepted, out_of_scope)` calls with a single `values()` query for all PKs and assign the results onto the instances, falling back to `refresh_from_db` when a row is missing.
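The batching pattern described above can be sketched as follows. The helper name `refresh_flags_in_batch` and its two callables are hypothetical: `fetch_rows(pks)` stands in for something like `Finding.objects.filter(pk__in=pks).values("pk", "false_p", "risk_accepted", "out_of_scope")`, and `refresh_one(finding)` stands in for `finding.refresh_from_db(fields=[...])`. Injecting them keeps the sketch runnable without a Django project.

```python
FLAG_FIELDS = ("false_p", "risk_accepted", "out_of_scope")


def refresh_flags_in_batch(findings, fetch_rows, refresh_one):
    """Copy status flags from one batched query onto the given instances."""
    pks = [finding.pk for finding in findings]
    # One query for the whole batch instead of one refresh_from_db() per finding.
    rows = {row["pk"]: row for row in fetch_rows(pks)}
    for finding in findings:
        row = rows.get(finding.pk)
        if row is None:
            # Row missing from the batched result: fall back to a per-instance refresh.
            refresh_one(finding)
            continue
        for field in FLAG_FIELDS:
            setattr(finding, field, row[field])
```

This trades N small queries for one larger one; the fallback path keeps behaviour correct when a finding is not returned by the batched query.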
…r for performance. Passing `tags=` directly to the `Finding()` constructor triggers expensive tagulous processing for every finding. Setting `finding.unsaved_tags` instead bypasses this overhead and lets the import pipeline handle tags efficiently. Affected parsers: jfrog_xray_unified, dependency_check, cargo_audit, anchore_grype, threat_composer. Benchmark on 14,219 findings: 99s -> 7.97s (12x faster).
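A minimal sketch of the parser-side change, assuming a simplified `Finding` stub: the real `dojo.models.Finding` constructor is where `tags=` would be processed by tagulous, which the stub does not reproduce. The input dict keys (`id`, `severity`, `package`, `ecosystem`) are hypothetical and only illustrate where tag values might come from.

```python
class Finding:
    """Hypothetical stand-in for dojo.models.Finding (no tagulous involved)."""

    def __init__(self, title, severity):
        self.title = title
        self.severity = severity
        self.unsaved_tags = []  # plain list; the import pipeline applies these later


def build_finding(vuln):
    # Before: Finding(title=..., severity=..., tags=[...]) ran tag processing per finding.
    finding = Finding(title=vuln["id"], severity=vuln["severity"])
    # After: attach the tags as a plain list and let the importer handle them in bulk.
    finding.unsaved_tags = [vuln["package"], vuln["ecosystem"]]
    return finding
```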
…ring. Update the tests for the dependency_check and jfrog_xray_unified parsers to match the list format returned by `unsaved_tags`, and fix the expected tag order for the suppressed-without-notes case in dependency_check.
…dings. Tags from the report were appended to matched findings via `tags.add()`, causing tags to accumulate across reimports instead of being left unchanged. This aligns tag handling with how other finding fields are treated on reimport. Closes DefectDojo#14606
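The difference between the old and new reimport behaviour can be modelled with plain lists. Both functions are hypothetical illustrations of the described semantics, not DefectDojo's reimporter code: the old path merged report tags into a matched finding (as repeated `tags.add()` calls would), while the new path leaves the matched finding's tags as they are.

```python
def tags_after_reimport_old(existing_tags, report_tags):
    # Previous behaviour: report tags were added to the matched finding,
    # so tags from every past report accumulated over time.
    merged = list(existing_tags)
    for tag in report_tags:
        if tag not in merged:
            merged.append(tag)
    return merged


def tags_after_reimport_new(existing_tags, report_tags):
    # New behaviour: a matched finding keeps its tags; report tags are not applied,
    # consistent with how other finding fields are treated on reimport.
    return list(existing_tags)
```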
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
This pull request introduces two potential high-severity cross-site scripting issues: advisory fields in dojo/tools/cargo_audit/parser.py and several variables in dojo/tools/anchore_grype/parser.py are interpolated directly into Markdown/HTML-like output without escaping or sanitization. This could allow injection if those values are attacker-controlled and are later rendered as HTML (or converted from Markdown) with escaping disabled. Both findings are flagged as risky but non-blocking and should be remediated by escaping or sanitizing user-controllable inputs before formatting them into HTML/Markdown.
🟠 Potential Cross-Site Scripting in
| Vulnerability | Potential Cross-Site Scripting |
|---|---|
| Description | The patch builds Markdown/HTML-like strings by interpolating advisory fields (description, categories, affected function names/versions, references) directly into formatted text without any escaping or sanitization. If those advisory fields can contain attacker-controlled input and are later rendered into HTML with auto-escaping disabled (or converted from Markdown to HTML without sanitization), this can lead to XSS. |
django-DefectDojo/dojo/tools/cargo_audit/parser.py
Lines 83 to 86 in 96aed74
🟠 Potential Cross-Site Scripting in dojo/tools/anchore_grype/parser.py (drs_d79233ca)
| Vulnerability | Potential Cross-Site Scripting |
|---|---|
| Description | The code builds markdown-like strings by directly interpolating variables (vuln_datasource, vuln_urls, rel_datasource, rel_urls) into finding_references without any escaping or sanitization. If any of these values can contain attacker-controlled input and are later rendered into an HTML context without escaping (or rendered as raw HTML), this allows injection of malicious markup/JS (XSS). |
django-DefectDojo/dojo/tools/anchore_grype/parser.py
Lines 144 to 147 in 96aed74
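One way to act on the suggested remediation is to escape each value before it is interpolated. The sketch below is hypothetical: `build_references` and its Markdown layout are assumptions that only mirror the variable names in the finding (`vuln_datasource`, `vuln_urls`, `rel_datasource`, `rel_urls`), not the parser's actual output format. It uses the standard library's `html.escape`.

```python
import html


def build_references(vuln_datasource, vuln_urls, rel_datasource=None, rel_urls=()):
    """Return a Markdown references block; every interpolated value is HTML-escaped."""
    lines = []
    if vuln_datasource:
        lines.append(f"**Vulnerability Datasource:** {html.escape(vuln_datasource)}")
    for url in vuln_urls:
        lines.append(f"- {html.escape(url)}")
    if rel_datasource:
        lines.append(f"**Related Vulnerability Datasource:** {html.escape(rel_datasource)}")
    for url in rel_urls:
        lines.append(f"- {html.escape(url)}")
    return "\n".join(lines)
```

Escaping at render time is often the more robust place for this, since escaping in the parser as well can lead to double-escaped text if the template already escapes its output.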
Comment to provide feedback on these findings.
Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]
Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing
All finding details can be found in the DryRun Security Dashboard.
Summary
Refactors the deduplication code to allow for extensions in Pro.
It also includes several other open PRs so that representative tests and performance measurements can be run.