Teycir/DiffCatcher

Support Development

If this project helps your work, support ongoing maintenance and new features.

ETH Donation Wallet
0x11282eE5726B3370c8B480e321b3B2aA13686582

Scan the QR code or copy the wallet address above.

DiffCatcher

A Rust CLI tool that recursively discovers Git repositories, captures state changes, generates diffs, extracts code elements with full snippets, and produces security-focused reports for code review and audit workflows.


🎯 Key Features

  • Repository Discovery: Recursively scan directories for Git repos with configurable filters
  • State Tracking: Capture pre/post-pull state with commit hashes, messages, and dirty detection
  • Diff Generation: Automatic N vs N-1 and historical diff creation with file manifests
  • Element Extraction: Parse diffs to identify functions, structs, classes, imports, and more across 10+ languages
  • Code Snippets: Extract full before/after code with boundary detection and context windows
  • Security Tagging: 18 built-in security patterns (crypto, auth, secrets, SQL injection, XSS, etc.)
  • Multi-Format Reports: JSON, Markdown, text, and SARIF outputs with cross-repo security overview
  • Branch-Diff Mode: Diff any two refs (branches, tags, commits) in a single repo, ideal for PR reviews
  • Performance: Parallel processing with progress bars, LRU caching, and incremental mode

Why not just use bash?

A one-liner like ls | while read line; do git -C "$line" diff HEAD~1 HEAD || true; done only shows raw diffs. DiffCatcher adds recursive discovery, code element extraction, security pattern detection, SARIF output for CI/CD, parallel processing, and cross-repo security aggregation. See full comparison below.

🚀 Installation

From Source

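git clone https://github.com/Teycir/DiffCatcher.git
cd DiffCatcher
cargo build --release
./target/release/diffcatcher --help

# Optional: install the binary onto your PATH so it can be invoked as `diffcatcher`
cargo install --path .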

Requirements

  • Rust 1.70+
  • Git 2.0+

⚑ Quick Start

# Scan all repos in a directory (fetch-only, no modifications)
diffcatcher ~/projects

# Pull updates and generate security report
diffcatcher ~/projects --pull -o ./report

# Diff two branches in a single repo (PR review mode)
diffcatcher ./my-repo --diff main..feature/auth -o ./pr-report

# Generate SARIF output for GitHub Code Scanning
diffcatcher ~/projects --summary-format sarif,json -o ./report

# Dry run to see what would be scanned
diffcatcher ~/projects --dry-run

# Fast scan with 8 parallel workers
diffcatcher ~/projects -j 8 --quiet

📖 Usage

Basic Scanning

# Scan with default settings (fetch-only)
diffcatcher <ROOT_DIR>

# Custom output directory
diffcatcher ~/projects -o ./my-report

# Include nested repos and follow symlinks
diffcatcher ~/projects --nested --follow-symlinks

# Skip hidden directories
diffcatcher ~/projects --skip-hidden
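
The discovery step behind these flags can be pictured as a directory walk that records every directory containing a `.git` entry. The following is a minimal sketch using only the standard library, not DiffCatcher's actual scanner: `DiscoverOptions` and `discover_repos` are hypothetical names, and the exact semantics of `--nested` and `--skip-hidden` are assumed from the comments above. Symlinks are never followed in this sketch.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical options mirroring the `--nested` and `--skip-hidden` flags.
pub struct DiscoverOptions {
    pub nested: bool,
    pub skip_hidden: bool,
}

/// Walks `root` and returns every directory that contains a `.git` entry.
/// Assumption: without `nested`, the walk does not descend into a found repo.
pub fn discover_repos(root: &Path, opts: &DiscoverOptions) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        // `.git` may be a directory (normal clone) or a file (worktree, submodule).
        if dir.join(".git").exists() {
            found.push(dir.clone());
            if !opts.nested {
                continue;
            }
        }
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name();
            let name = name.to_string_lossy();
            if name == ".git" {
                continue; // never walk Git's own metadata
            }
            if opts.skip_hidden && name.starts_with('.') {
                continue;
            }
            // `DirEntry::file_type` does not follow symlinks, so linked dirs are skipped.
            if entry.file_type()?.is_dir() {
                stack.push(entry.path());
            }
        }
    }
    found.sort();
    Ok(found)
}
```

Calling `discover_repos(Path::new("."), &DiscoverOptions { nested: false, skip_hidden: true })` returns the sorted repository paths under the current directory.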

Pull Modes

# Fetch only (default - no working tree changes)
diffcatcher ~/projects

# Actually pull changes
diffcatcher ~/projects --pull

# Force pull with stash/pop for dirty repos
diffcatcher ~/projects --pull --force-pull

# Use rebase strategy
diffcatcher ~/projects --pull --pull-strategy rebase

# Skip fetch/pull entirely (historical diffs only)
diffcatcher ~/projects --no-pull

Extraction Options

# Skip element extraction (raw diffs only)
diffcatcher ~/projects --no-summary-extraction

# Extract elements but skip code snippets
diffcatcher ~/projects --no-snippets

# Adjust snippet context and limits
diffcatcher ~/projects --snippet-context 10 --max-snippet-lines 300

# Limit elements per diff
diffcatcher ~/projects --max-elements 1000
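
To make the interaction between `--snippet-context` and `--max-snippet-lines` concrete, here is a small sketch of how a snippet window could be computed. The function name `snippet_window` and the choice to truncate from the end when the cap is exceeded are assumptions for illustration; the tool's real boundary detection is more involved.

```rust
/// Returns the inclusive, 1-based line range to extract around a changed span.
/// Precondition: 1 <= change_start <= change_end <= total_lines.
/// Assumption: when the window exceeds `max_lines`, it is cut off at the end.
pub fn snippet_window(
    change_start: usize,
    change_end: usize,
    total_lines: usize,
    context: usize,
    max_lines: usize,
) -> (usize, usize) {
    // Extend upward by `context` lines, but never above line 1.
    let start = change_start.saturating_sub(context).max(1);
    // Extend downward by `context` lines, but never past the end of the file.
    let mut end = (change_end + context).min(total_lines);
    // Enforce the per-snippet line cap.
    if end - start + 1 > max_lines {
        end = start + max_lines - 1;
    }
    (start, end)
}
```

With the defaults listed under CLI Flags (context 5, cap 200), a change on lines 50 to 52 of a 100-line file yields the window 45 to 57.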

Security Tagging

# Skip security tagging
diffcatcher ~/projects --no-security-tags

# Include test files in security analysis
diffcatcher ~/projects --include-test-security

# Use custom security patterns
diffcatcher ~/projects --security-tags-file ./custom-patterns.json

Configuration File

DiffCatcher auto-loads project-local configuration from one of:

  • <ROOT_DIR>/.diffcatcher.toml (default)
  • a custom file passed via --config <FILE>

Pass --no-config to disable config loading entirely.

Example:

output = "reports-local"
no_pull = true
history_depth = 2
summary_formats = ["json", "txt"]
no_security_tags = false

[plugins]
security_pattern_files = ["plugins/security-extra.json"]
extractor_files = ["plugins/extractors.json"]

CLI flags still override config values when explicitly set.
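
The precedence rule (explicit CLI flag, then config file, then built-in default) can be sketched with `Option` layering. The `Settings` and `Resolved` types and the `resolve` function are hypothetical and cover only three of the keys from the example; the defaults are taken from the CLI Flags table where listed, and the `no_pull` default of `false` is an assumption based on fetch-only being the default mode.

```rust
/// Hypothetical subset of settings; `None` means "not explicitly set".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Settings {
    pub output: Option<String>,
    pub no_pull: Option<bool>,
    pub history_depth: Option<u32>,
}

/// Fully resolved values after layering CLI over config over defaults.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub output: String,
    pub no_pull: bool,
    pub history_depth: u32,
}

pub fn resolve(cli: &Settings, config: &Settings) -> Resolved {
    Resolved {
        // Placeholder default; the timestamp is not expanded in this sketch.
        output: cli
            .output
            .clone()
            .or_else(|| config.output.clone())
            .unwrap_or_else(|| "./reports/<timestamp>".to_string()),
        no_pull: cli.no_pull.or(config.no_pull).unwrap_or(false),
        history_depth: cli.history_depth.or(config.history_depth).unwrap_or(2),
    }
}
```

With the example config above and `--history-depth 5` on the command line, `output` and `no_pull` come from the file while `history_depth` resolves to 5.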

Plugin System

DiffCatcher supports two plugin types:

  • Security pattern plugins via --security-plugin-file <FILE> (repeatable)
  • Extractor plugins via --extractor-plugin-file <FILE> (repeatable)

The security plugin format matches the --security-tags-file JSON schema (version, mode, tags).

Extractor plugin format:

{
  "version": 1,
  "extractors": [
    {
      "name": "policy-rule",
      "kind": "Config",
      "regex": "^policy\\s+([A-Za-z_][A-Za-z0-9_]*)"
    }
  ]
}

Branch-Diff Mode (PR Review)

# Diff two branches in a single repo
diffcatcher ./my-repo --diff main..feature/auth

# Diff specific commits
diffcatcher ./my-repo --diff abc123..def456

# Diff with SARIF output for CI integration
diffcatcher ./my-repo --diff origin/main..HEAD --summary-format sarif -o ./pr-report

The --diff BASE..HEAD flag skips repository discovery and fetch/pull: it directly diffs two refs (branches, tags, or commit SHAs) and runs the full extraction and security tagging pipeline on the result.
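
Splitting a `BASE..HEAD` argument into its two refs might look like the sketch below. `parse_diff_spec` is a hypothetical helper, not the tool's parser; rejecting Git's three-dot form is an assumption made here because only the two-dot form is documented above.

```rust
/// Splits a `BASE..HEAD` spec into `(base, head)`.
/// Assumption: the three-dot form (`A...B`) is rejected rather than interpreted.
pub fn parse_diff_spec(spec: &str) -> Result<(String, String), String> {
    if spec.contains("...") {
        return Err(format!("three-dot ranges are not handled here: `{spec}`"));
    }
    // `split_once` splits at the first `..`, so refs with single dots (v1.2) still work.
    match spec.split_once("..") {
        Some((base, head)) if !base.is_empty() && !head.is_empty() => {
            Ok((base.to_string(), head.to_string()))
        }
        _ => Err(format!("expected BASE..HEAD, got `{spec}`")),
    }
}
```

For example, `parse_diff_spec("main..feature/auth")` yields the pair `("main", "feature/auth")`, while a bare `main` is reported as an error.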

SARIF Output

# Generate SARIF alongside other formats
diffcatcher ~/projects --summary-format sarif,json,md

# SARIF-only for CI/CD upload
diffcatcher ~/projects --summary-format sarif -o ./report

When sarif is included in --summary-format, a results.sarif file is written to the report root. This file follows the SARIF 2.1.0 standard and integrates with GitHub Code Scanning, VS Code SARIF Viewer, Azure DevOps, and other SARIF-compatible tools.
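
For readers unfamiliar with the format, the sketch below builds a minimal SARIF 2.1.0 log by hand: a `version`, one run with a `tool.driver.name`, and a list of results that each carry a rule ID, a message, and a file location. The `Finding` type, the fixed `warning` level, and the `to_sarif` function are illustrative assumptions; the actual contents of results.sarif may include more fields.

```rust
/// Hypothetical finding type, used only for this illustration.
pub struct Finding {
    pub rule_id: String,
    pub message: String,
    pub file: String,
    pub line: u32,
}

/// Escapes characters that may not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes findings as a compact SARIF 2.1.0 document with a single run.
pub fn to_sarif(tool_name: &str, findings: &[Finding]) -> String {
    let results: Vec<String> = findings
        .iter()
        .map(|f| {
            format!(
                r#"{{"ruleId":"{}","level":"warning","message":{{"text":"{}"}},"locations":[{{"physicalLocation":{{"artifactLocation":{{"uri":"{}"}},"region":{{"startLine":{}}}}}}}]}}"#,
                json_escape(&f.rule_id),
                json_escape(&f.message),
                json_escape(&f.file),
                f.line
            )
        })
        .collect();
    format!(
        r#"{{"version":"2.1.0","runs":[{{"tool":{{"driver":{{"name":"{}"}}}},"results":[{}]}}]}}"#,
        json_escape(tool_name),
        results.join(",")
    )
}
```

In a real project a JSON library would replace the manual string formatting; it is written out here only to keep the example dependency-free.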

Advanced Features

# Incremental mode (skip unchanged repos)
diffcatcher ~/projects --incremental -o ./report

# Filter by branch pattern
diffcatcher ~/projects --branch-filter "main"

# Adjust history depth
diffcatcher ~/projects --history-depth 5

# JSON output for CI/CD
diffcatcher ~/projects --quiet --json > result.json

# Verbose output with discovered paths
diffcatcher ~/projects --verbose

📁 Report Structure

<report_dir>/
├── summary.json                    # Global summary
├── summary.md                      # Markdown summary
├── results.sarif                   # SARIF 2.1.0 output (when --summary-format sarif)
├── security_overview.json          # Cross-repo security aggregation
├── security_overview.md
├── <repo-name>/
│   ├── status.json                 # Repo state
│   ├── pull_log.txt
│   └── diffs/
│       ├── diff_N_vs_N-1.patch     # Raw unified diff
│       ├── changes_N_vs_N-1.txt    # File manifest
│       ├── summary_N_vs_N-1.json   # Element extraction
│       ├── summary_N_vs_N-1.md
│       └── snippets/
│           ├── 001_validate_token_ADDED.rs
│           ├── 002_check_permissions_BEFORE.rs
│           ├── 002_check_permissions_AFTER.rs
│           └── 002_check_permissions.diff
└── ...

⚙️ Configuration

CLI Flags

| Flag | Default | Description |
|------|---------|-------------|
| -o, --output | ./reports/<timestamp> | Report output directory |
| -j, --parallel | 4 | Concurrent repo processing |
| -t, --timeout | 120 | Git operation timeout (seconds) |
| -d, --history-depth | 2 | Historical commits to diff |
| --snippet-context | 5 | Context lines around changes |
| --max-snippet-lines | 200 | Max lines per snippet |
| --max-elements | 500 | Max elements per diff |
| --diff | (none) | Diff two refs in a single repo (BASE..HEAD) |
| --summary-format | json,md | Output formats: json, md, txt, sarif |

See diffcatcher --help for all options.

Custom Security Patterns

Create a JSON file with custom patterns:

{
  "version": 1,
  "mode": "extend",
  "tags": [
    {
      "tag": "pii-handling",
      "description": "PII data processing",
      "severity": "High",
      "patterns": ["ssn", "social_security", "passport"]
    }
  ]
}

Use with --security-tags-file ./patterns.json
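
As a rough mental model of how such a pattern file is applied, the sketch below tags a code snippet when any of a rule's patterns occurs in it. It assumes patterns are plain case-insensitive substrings, which the source does not state (the real tagger may treat them as regular expressions). `TagRule` and `match_tags` are hypothetical names.

```rust
/// Hypothetical in-memory form of one entry in the `tags` array above.
pub struct TagRule {
    pub tag: String,
    pub severity: String,
    pub patterns: Vec<String>,
}

/// Returns the tags of all rules with at least one pattern found in `snippet`.
/// Assumption: matching is a case-insensitive substring search.
pub fn match_tags<'a>(rules: &'a [TagRule], snippet: &str) -> Vec<&'a str> {
    let haystack = snippet.to_lowercase();
    rules
        .iter()
        .filter(|rule| {
            rule.patterns
                .iter()
                .any(|pattern| haystack.contains(&pattern.to_lowercase()))
        })
        .map(|rule| rule.tag.as_str())
        .collect()
}
```

Under this model, a changed function that mentions `user_SSN` would receive the `pii-handling` tag from the example file, while unrelated code receives no tags.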

🏗️ Architecture

src/
├── cli.rs              # Argument parsing
├── scanner.rs          # Repository discovery
├── git/                # Git operations
│   ├── commands.rs     # Git wrappers
│   ├── state.rs        # State capture
│   ├── diff.rs         # Diff generation
│   └── file_retrieval.rs
├── extraction/         # Element extraction
│   ├── parser.rs       # Unified diff parser
│   ├── elements.rs     # Element detection
│   ├── snippets.rs     # Code snippet extraction
│   ├── boundary.rs     # Bracket/indentation tracking
│   └── languages/      # Language-specific patterns
├── security/           # Security tagging
│   ├── tagger.rs       # Pattern matching
│   ├── patterns.rs     # Built-in patterns
│   └── overview.rs     # Cross-repo aggregation
└── report/             # Report generation
    ├── writer.rs       # Directory structure
    ├── json.rs         # JSON serialization
    ├── sarif.rs        # SARIF 2.1.0 output
    ├── markdown.rs     # Markdown formatting
    └── snippet_writer.rs

Bash Comparison

A simple bash one-liner can list diffs:

ls | while read line; do git -C "$line" diff HEAD~1 HEAD || true; done

This works for quick checks, but DiffCatcher adds significant capabilities:

| Capability | Bash One-Liner | DiffCatcher |
|------------|----------------|-------------|
| Recursive discovery | Top-level items only | Nested repos, symlinks, filters |
| State tracking | None | Commit hashes, dirty detection, pull logs |
| Code understanding | Raw diff only | Extracts functions/structs/classes across 10+ languages |
| Code snippets | None | Full before/after with context windows |
| Security analysis | None | 18 built-in patterns (auth, crypto, secrets, SQLi, XSS) |
| Output formats | Terminal only | JSON, Markdown, SARIF (GitHub Code Scanning) |
| Cross-repo view | Per-repo only | Aggregated security report across all repos |
| Performance | Sequential | Parallel workers, LRU caching, incremental mode |
| CI/CD integration | None | SARIF upload to GitHub/Azure DevOps |
| Error handling | … | … |
| Path handling | Breaks on newlines, backslashes, and leading/trailing whitespace | Handles all path names correctly |
| Historical context | Fixed HEAD~1 | Configurable depth, state tracking |

The bash one-liner is ~100 bytes. DiffCatcher is a security-focused audit tool with full code element extraction.

🧪 Testing

# Run all tests
cargo test

# Run specific test suite
cargo test security_tagger

# Run with output
cargo test -- --nocapture

Test coverage includes:

  • Unit tests for extraction, security tagging, boundary detection
  • Integration tests for state capture, diff generation, reports
  • Golden-file tests for extraction accuracy
  • Edge case tests (detached HEAD, bare repos, single-commit)

Performance Benchmarks

# Compile benchmark binaries
cargo bench --no-run

# Run benchmark harness
cargo bench --bench core_bench

Benchmark source lives in benches/core_bench.rs and tracks parser/extraction throughput.

CI/CD

GitHub Actions workflows are included:

  • .github/workflows/ci.yml: format check, clippy, tests, bench build
  • .github/workflows/release.yml: tag-based release packaging and GitHub release publishing

📚 Documentation

Project Documentation

  • Plan.md - Full specification (v1.2)
  • Roadmap.md - Implementation roadmap and progress
  • Security patterns reference (see src/security/patterns.rs)

Code Documentation

All modules include comprehensive inline documentation. Key modules:

  • src/extraction/parser.rs - Unified diff parser with hunk extraction
  • src/extraction/elements.rs - Language-aware code element detection
  • src/extraction/snippets.rs - Full code snippet extraction with boundary detection
  • src/security/tagger.rs - Security pattern matching engine
  • src/git/commands.rs - Git operation wrappers

Generate full API docs:

cargo doc --open

🏷️ Tags

#rust #git #security #code-review #diff-analysis #static-analysis #devops #cli-tool #audit #vulnerability-detection #code-quality #snippet-extraction #parallel-processing #security-scanning

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure cargo test passes
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details
