chore: reorganize dedupe code by valentijnscholten · Pull Request #14641 · DefectDojo/django-DefectDojo

valentijnscholten · 2026-04-05T13:10:44Z

Summary

Refactors deduplication code to allow for extensions in Pro.

It includes several other open PRs merged into it, so that representative tests and performance measurements can be run.

Expose match_batch_* and match_batch_of_findings for read-only matching. Support unsaved findings in location/endpoint comparison and _is_candidate_older. Refactor default_importer close_old_findings to use get_close_old_findings_queryset. Restore batch deduplication debug logging.
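The read-only matching idea can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `SimpleFinding`, `match_batch_by_hash`, and `is_candidate_older` are illustrative stand-ins, not DefectDojo's actual `match_batch_*`, `match_batch_of_findings`, or `_is_candidate_older` signatures. It assumes that findings match on a `hash_code` and that age is judged by primary key, with an unsaved finding (no `id` yet) treated as newer than any saved candidate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimpleFinding:
    """Hypothetical stand-in for a Finding; id stays None until it is saved."""

    hash_code: str
    id: int | None = None


def is_candidate_older(new_finding: SimpleFinding, candidate: SimpleFinding) -> bool:
    """Assumption: a lower primary key means older, and unsaved findings are the newest."""
    if candidate.id is None:
        # An unsaved candidate cannot be the original of anything.
        return False
    if new_finding.id is None:
        # The new finding is not saved yet, so any saved candidate is older.
        return True
    return candidate.id < new_finding.id


def match_batch_by_hash(
    new_findings: list[SimpleFinding],
    existing: list[SimpleFinding],
) -> dict[int, SimpleFinding]:
    """Return {index in new_findings: oldest matching existing finding}.

    Read-only: nothing is saved or modified, so it also works for unsaved findings.
    """
    by_hash: dict[str, list[SimpleFinding]] = {}
    for candidate in existing:
        by_hash.setdefault(candidate.hash_code, []).append(candidate)

    matches: dict[int, SimpleFinding] = {}
    for index, finding in enumerate(new_findings):
        older = [c for c in by_hash.get(finding.hash_code, []) if is_candidate_older(finding, c)]
        if older:
            # Prefer the oldest candidate as the original.
            matches[index] = min(older, key=lambda c: c.id)
    return matches


existing = [SimpleFinding("abc", id=7), SimpleFinding("abc", id=3), SimpleFinding("xyz", id=5)]
incoming = [SimpleFinding("abc"), SimpleFinding("new")]
print(match_batch_by_hash(incoming, existing))  # {0: SimpleFinding(hash_code='abc', id=3)}
```

Keeping the matcher free of writes is what makes it usable on findings that were never saved, which is the case the unsaved-finding support in `_is_candidate_older` targets.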

…m_db queries. Replace the per-finding refresh_from_db(false_p, risk_accepted, out_of_scope) calls with a single values() query for all PKs and assign the results onto the instances, falling back to refresh_from_db when a row is missing.
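The batch-refresh pattern can be sketched without Django. In this sketch `fetch_status_rows` is a hypothetical stand-in for a query along the lines of `Finding.objects.filter(pk__in=chunk).values("pk", "false_p", "risk_accepted", "out_of_scope")`, `refresh_one` stands in for the per-instance `refresh_from_db` fallback, and `FindingStub` replaces the model. The chunk size of 1000 mirrors the later commit in this PR that limits each SELECT to 1000 PKs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator

STATUS_FIELDS = ("false_p", "risk_accepted", "out_of_scope")
CHUNK_SIZE = 1000  # mirrors the "1000 PKs per SELECT" chunking in this PR


@dataclass
class FindingStub:
    """Hypothetical stand-in for an in-memory Finding instance."""

    pk: int
    false_p: bool = False
    risk_accepted: bool = False
    out_of_scope: bool = False


def chunked(items: list[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_refresh_status(
    findings: list[FindingStub],
    fetch_status_rows: Callable[[list[int]], list[dict]],
    refresh_one: Callable[[FindingStub], None],
) -> int:
    """Sync status fields with one query per chunk instead of one per finding.

    Returns the number of bulk queries issued through fetch_status_rows.
    """
    by_pk = {f.pk: f for f in findings}
    rows: dict[int, dict] = {}
    queries = 0
    for chunk in chunked(list(by_pk), CHUNK_SIZE):
        queries += 1
        for row in fetch_status_rows(chunk):
            rows[row["pk"]] = row

    for pk, finding in by_pk.items():
        row = rows.get(pk)
        if row is None:
            # Row missing from the bulk result: fall back to a per-instance refresh.
            refresh_one(finding)
            continue
        for name in STATUS_FIELDS:
            setattr(finding, name, row[name])
    return queries


fake_db = {
    1: {"pk": 1, "false_p": True, "risk_accepted": False, "out_of_scope": False},
    2: {"pk": 2, "false_p": False, "risk_accepted": True, "out_of_scope": False},
}
findings = [FindingStub(1), FindingStub(2), FindingStub(3)]
fallbacks: list[FindingStub] = []
n = batch_refresh_status(
    findings,
    fetch_status_rows=lambda pks: [fake_db[pk] for pk in pks if pk in fake_db],
    refresh_one=fallbacks.append,
)
print(n, findings[0].false_p, findings[1].risk_accepted, [f.pk for f in fallbacks])  # 1 True True [3]
```

For N findings this issues roughly N / 1000 queries plus one fallback per missing row, instead of N `refresh_from_db` calls.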

…r for performance

Passing tags= directly to the Finding() constructor triggers expensive tagulous processing for every finding. Setting finding.unsaved_tags instead bypasses this overhead and lets the import pipeline handle tags efficiently. Affected parsers: jfrog_xray_unified, dependency_check, cargo_audit, anchore_grype, threat_composer. Benchmark on 14,219 findings: from 99 s to 7.97 s (about 12x faster).
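On the parser side, the change amounts to attaching tags to the unsaved finding instead of passing them to the constructor. The sketch below uses a hypothetical `ParsedFinding` dataclass in place of DefectDojo's `Finding` model so that it runs without Django; `parse_item` and the keys of `item` are likewise made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParsedFinding:
    """Hypothetical stand-in for dojo.models.Finding: no ORM and no tagulous."""

    title: str
    severity: str
    # Plain list carried through the import pipeline; nothing is written until save time.
    unsaved_tags: list[str] = field(default_factory=list)


def parse_item(item: dict) -> ParsedFinding:
    """Build one finding from a hypothetical report item."""
    finding = ParsedFinding(title=item["title"], severity=item["severity"])
    # Rather than passing tags= to the constructor (which, on the real model,
    # triggers tag processing for every finding), attach a plain list and let
    # the import pipeline handle the tags later.
    finding.unsaved_tags = list(item.get("tags", []))
    return finding


finding = parse_item({"title": "Example issue in libfoo", "severity": "High", "tags": ["npm", "transitive"]})
print(finding.unsaved_tags)  # ['npm', 'transitive']
```

Because `unsaved_tags` is just a list, the per-finding cost during parsing is a list copy, and any tag persistence work is deferred to the pipeline.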

…ring

Update the tests for the dependency_check and jfrog_xray_unified parsers to match the list format actually returned by unsaved_tags, and fix the expected tag order for the suppressed-without-notes case in dependency_check.

…dings

Tags from the report were being appended to matched findings via tags.add(), so tags accumulated across reimports instead of being left unchanged. This change aligns tag handling with how other finding fields are treated on reimport. Closes DefectDojo#14606.
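The reimport behavior can be summarized in a small sketch: matched findings keep the tags they already have, so repeated reimports cannot pile up tags on them. `ReimportFinding` and `reimport` are hypothetical names, matching on `hash_code` is a simplification, and applying report tags to newly created findings is an assumption made for illustration; the real reimporter works on `Finding` model instances.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReimportFinding:
    """Hypothetical stand-in for a Finding during reimport."""

    hash_code: str
    tags: set[str] = field(default_factory=set)
    unsaved_tags: list[str] = field(default_factory=list)


def reimport(existing: dict[str, ReimportFinding], report: list[ReimportFinding]) -> list[ReimportFinding]:
    """Return the findings created by this reimport.

    Matched findings are left untouched: their tags are not extended with the
    report's tags, so reimporting the same report twice does not add anything.
    """
    created = []
    for incoming in report:
        if incoming.hash_code in existing:
            # Before this change, report tags were added here via tags.add(),
            # which accumulated tags across reimports.
            continue
        # Assumption for illustration: a new finding starts with the report's tags.
        incoming.tags = set(incoming.unsaved_tags)
        existing[incoming.hash_code] = incoming
        created.append(incoming)
    return created


db = {"h1": ReimportFinding("h1", tags={"keep"})}
reimport(db, [ReimportFinding("h1", unsaved_tags=["from-report"]), ReimportFinding("h2", unsaved_tags=["new"])])
print(sorted(db["h1"].tags), sorted(db["h2"].tags))  # ['keep'] ['new']
```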

github-actions · 2026-04-13T17:45:39Z

This pull request has conflicts; please resolve them before we can evaluate the pull request.

github-actions · 2026-04-13T18:28:35Z

Conflicts have been resolved. A maintainer will review the pull request shortly.

dryrunsecurity · 2026-04-13T19:27:56Z

This pull request introduces two high-severity potential cross-site scripting issues where various fields (advisory fields in dojo/tools/cargo_audit/parser.py and several variables in dojo/tools/anchore_grype/parser.py) are interpolated directly into Markdown/HTML-like output without escaping or sanitization, which could allow injection if those values are attacker-controlled and later rendered as HTML (or converted from Markdown) with escaping disabled. Both findings are flagged as risky but non-blocking and should be remediated by properly escaping or sanitizing user-controllable inputs before formatting into HTML/Markdown.

🟠 Potential Cross-Site Scripting in dojo/tools/cargo_audit/parser.py (drs_6b340cd6)

Vulnerability: Potential Cross-Site Scripting
Description: The patch builds Markdown/HTML-like strings by interpolating advisory fields (description, categories, affected function names/versions, references) directly into formatted text without any escaping or sanitization. If those advisory fields can contain attacker-controlled input and are later rendered into HTML with auto-escaping disabled (or converted from Markdown to HTML without sanitization), this can lead to XSS.

django-DefectDojo/dojo/tools/cargo_audit/parser.py

Lines 83 to 86 in 96aed74

```python
description = categories + f"\n**Description:** `{advisory.get('description')}`"

if item["affected"] is not None and "functions" in item["affected"]:
    affected_func = [
```
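One way to address the cargo_audit finding is to make the interpolated description unable to escape its inline code span. This is an assumption-based sketch, not the fix adopted in this PR: `md_code_span` and `safe_description` are hypothetical helpers, and the approach relies on CommonMark code-span rules (a span closes only on a backtick run of the same length as the opener, and a blank line ends the paragraph). Text outside the code span, such as `categories`, is passed through the standard library's `html.escape`.

```python
from __future__ import annotations

import html
import re


def md_code_span(text: str) -> str:
    """Wrap text in a CommonMark inline code span that the text cannot break out of."""
    # A blank line would end the paragraph (and the span), so collapse all whitespace.
    text = " ".join(text.split())
    if not text:
        return ""
    # The span only closes on a backtick run of the same length as the opener,
    # so use one backtick more than the longest run inside the text.
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * (longest + 1)
    # CommonMark strips one leading and trailing space, so padding lets content
    # start or end with a backtick without merging into the fence.
    if text.startswith("`") or text.endswith("`"):
        text = f" {text} "
    return f"{fence}{text}{fence}"


def safe_description(categories: str, description: str | None) -> str:
    """Hypothetical hardened variant of the description line quoted above."""
    # Outside the code span, neutralize raw HTML by escaping &, <, > and quotes.
    return html.escape(categories) + f"\n**Description:** {md_code_span(description or '')}"


print(md_code_span("uses `unsafe` block"))  # ``uses `unsafe` block``
```

Inside a code span, CommonMark renders HTML literally, so the main job of the helper is to keep the span closed; whether the renderer used by DefectDojo follows these rules exactly is an assumption worth checking.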

🟠 Potential Cross-Site Scripting in dojo/tools/anchore_grype/parser.py (drs_d79233ca)

Vulnerability: Potential Cross-Site Scripting
Description: The code builds markdown-like strings by directly interpolating variables (vuln_datasource, vuln_urls, rel_datasource, rel_urls) into finding_references without any escaping or sanitization. If any of these values can contain attacker-controlled input and are later rendered into an HTML context without escaping (or rendered as raw HTML), this allows injection of malicious markup/JS (XSS).

django-DefectDojo/dojo/tools/anchore_grype/parser.py

Lines 144 to 147 in 96aed74

```python
        finding_references += f"**Vulnerability URL:** {vuln_urls[0]}\n"
else:
    finding_references += "**Vulnerability URLs:**\n"
    for url in vuln_urls:
```
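For the anchore_grype references, escaping handles raw HTML, but a URL interpolated into Markdown can still carry an unexpected scheme (for example `javascript:`) if a renderer or later template turns it into a link, so a defensive sketch might also allow-list schemes. `safe_url` and `format_vulnerability_urls` are hypothetical helpers, the http/https allow-list is an assumption, and the `- {url}` bullet format for multiple URLs is invented because the excerpt above stops at the loop header.

```python
from __future__ import annotations

import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web links are expected


def safe_url(url: str) -> str | None:
    """Return an HTML-escaped URL if it uses an allowed scheme, else None."""
    url = url.strip()
    if any(ch.isspace() for ch in url):
        return None
    try:
        parts = urlsplit(url)
    except ValueError:
        # e.g. malformed IPv6 hosts such as "http://[::1"
        return None
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc:
        return None
    return html.escape(url)


def format_vulnerability_urls(vuln_urls: list[str]) -> str:
    """Hypothetical hardened variant of the reference-building code quoted above."""
    urls = [u for u in (safe_url(raw) for raw in vuln_urls) if u]
    if not urls:
        return ""
    if len(urls) == 1:
        return f"**Vulnerability URL:** {urls[0]}\n"
    # Invented list format; the original loop body is not shown in the excerpt.
    return "**Vulnerability URLs:**\n" + "".join(f"- {url}\n" for url in urls)


print(format_vulnerability_urls(["https://a.example/1", "javascript:alert(1)"]), end="")
# **Vulnerability URL:** https://a.example/1
```

Dropping disallowed URLs silently is a design choice; a parser could instead keep them as escaped plain text if losing the reference is worse than showing it.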

Comment to provide feedback on these findings.

Report false positive: @dryrunsecurity fp [FINDING ID] [FEEDBACK]
Report low-impact: @dryrunsecurity nit [FINDING ID] [FEEDBACK]

Example: @dryrunsecurity fp drs_90eda195 This code is not user-facing

All finding details can be found in the DryRun Security Dashboard.

valentijnscholten added 9 commits April 5, 2026 00:09

Batch-refresh close_old_findings status fields to avoid N refresh_from_db queries

1678591

Replace per-finding refresh_from_db(false_p, risk_accepted, out_of_scope) with one values() query for all PKs and assign onto instances, falling back to refresh_from_db when a row is missing.

docs: cite DefectDojo#12291 for close_old_findings status refresh origin

3d59de2

perf: chunk close_old_findings status sync queries (1000 PKs per SELECT)

28053c6

fix: resolve ruff D203 and COM812 lint errors from formatter conflict

498178e

fix: update tests to check unsaved_tags instead of tags

092d3e3

github-actions bot added the unittests and parser labels Apr 5, 2026

valentijnscholten changed the title ~~perf: fix watson async index update never actually being async~~ perf(reimport): batch close_old_findings queries and use unsaved_tags in parsers Apr 5, 2026

valentijnscholten changed the title ~~perf(reimport): batch close_old_findings queries and use unsaved_tags in parsers~~ feat: add import and reimport scan preview support Apr 5, 2026

valentijnscholten changed the title ~~feat: add import and reimport scan preview support~~ chore: reorganize dedupe code Apr 5, 2026

Merge remote-tracking branch 'upstream/dev' into feature/import-preview

acdc636

github-actions bot added the conflicts-detected label Apr 13, 2026

Merge branch 'dev' into feature/import-preview

96aed74

github-actions bot removed the unittests label Apr 13, 2026

github-actions bot removed the conflicts-detected label Apr 13, 2026

valentijnscholten marked this pull request as ready for review April 13, 2026 19:27

valentijnscholten requested review from Maffooch and mtesauro as code owners April 13, 2026 19:27


valentijnscholten wants to merge 11 commits into DefectDojo:dev from valentijnscholten:feature/import-preview
