feat: add security audit agent by rohitthink · Pull Request #12 · rohitthink/freefile

rohitthink · 2026-04-15T13:37:00Z

Description

Adds a three-part security audit agent that scans FreeFile for PII leaks, exposed secrets, and GitHub configuration loopholes. Motivated by open-sourcing the repo and wanting continuous assurance that nothing sensitive has leaked.

What's included

1. `scripts/security_audit.py` — deterministic scanner

- Scans committed files for 17 pattern families (AWS/GitHub/OpenAI/Anthropic/Slack/Stripe tokens, PEM private keys, Indian PAN/Aadhaar/GSTIN/IFSC/mobile, JWT, generic secret assignments)
- Scans git history via `git log -S` for known personal values
- Scans untracked surface for risky extensions not covered by `.gitignore`
- Audits `.gitignore` coverage against a baseline of critical patterns
- Audits GitHub repo settings via `gh api`: secret scanning, push protection, Dependabot alerts + security updates, branch protection, webhooks
- Optional external check (`--check-external`) to search other public GitHub repos for known personal values
- CLI: `--write-report`, `--check-external`, `--repo`, `--fail-on={warn,critical}`, `--json-only`
- Exit codes: 0 clean, 1 warnings (med/low), 2 critical/high
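
The exit-code contract above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/security_audit.py`: the severity names and the `exit_code` helper are assumptions, and the effect of `--fail-on` is not modeled because the PR does not spell it out.

```python
from typing import Iterable

# Assumed severity labels; the PR only names CRITICAL/HIGH and "med/low".
BLOCKING = {"critical", "high"}
WARNING = {"medium", "low"}


def exit_code(severities: Iterable[str]) -> int:
    """Map the severities of all findings to a process exit code.

    2 if any critical/high finding exists, 1 if only medium/low
    findings exist, 0 if there are no findings. Unrecognized labels
    are ignored in this sketch.
    """
    seen = {s.lower() for s in severities}
    if seen & BLOCKING:
        return 2
    if seen & WARNING:
        return 1
    return 0
```

A caller would typically finish with `sys.exit(exit_code(...))` so that CI can gate on the result.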

2. `scripts/security_patterns.json` — regex library

Each pattern supports:

- `placeholder_values`: exact matches to suppress (e.g., `ABCDE1234F`, `9876543210`)
- `allowlist_context`: line keywords that suppress matches (e.g., `test`, `example`, `your_`)
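
To make the two suppression fields concrete, here is a hedged sketch of how one pattern entry might be applied to a single line. The entry layout (`name`, `regex`) and the PAN regex are assumptions for illustration; the actual contents of `scripts/security_patterns.json` are not shown in this PR.

```python
import re

# Hypothetical entry using the two fields described above.
PAN_PATTERN = {
    "name": "indian_pan",
    "regex": r"\b[A-Z]{5}[0-9]{4}[A-Z]\b",
    "placeholder_values": ["ABCDE1234F"],
    "allowlist_context": ["test", "example", "your_"],
}


def scan_line(line: str, pattern: dict) -> list[str]:
    """Return the matches on this line that are not suppressed."""
    lowered = line.lower()
    # allowlist_context: any keyword on the line suppresses the whole line.
    if any(word in lowered for word in pattern["allowlist_context"]):
        return []
    # placeholder_values: drop exact matches that are known fixtures.
    placeholders = set(pattern["placeholder_values"])
    return [m for m in re.findall(pattern["regex"], line) if m not in placeholders]
```

Note the trade-off in this sketch: a keyword such as `test` suppresses every match on the line, so a real secret on a line that happens to mention "test" would be missed.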

3. `scripts/security_allowlist.json` — intentional exposures

File-glob + rule + optional value combinations that are known-intentional. Each entry requires a `reason` string for auditability. Supports recursive `**` globs.

Current allowlist covers:

- Maintainer name + email in `docs/launch/claude-for-oss-application.md` (required for OSS application)
- Maintainer email in `docs/launch/oss-strengthening-playbook.md` (already public)
- Placeholder PAN `ABCDE1234F` in frontend components and tests
- Placeholder mobile in tests
- History entries for the above
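
Allowlist matching with recursive `**` globs could look something like the sketch below. The entry keys (`file`, `rule`, `value`, `reason`) and the `is_allowlisted` helper are assumptions, and this `glob_to_regex` is one possible translation, not necessarily the one the PR ships.

```python
import re


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a path glob into a compiled regex.

    `**/` matches zero or more directories, a bare `**` matches anything,
    `*` stays within one path segment, `?` matches one non-slash character.
    """
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append(r"(?:.*/)?")
            i += 3
        elif glob.startswith("**", i):
            out.append(r".*")
            i += 2
        elif glob[i] == "*":
            out.append(r"[^/]*")
            i += 1
        elif glob[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def is_allowlisted(path: str, rule: str, value: str, allowlist: list[dict]) -> bool:
    """True if some entry matches the file glob, the rule, and (if given) the value."""
    for entry in allowlist:
        if entry["rule"] != rule:
            continue
        if not glob_to_regex(entry["file"]).match(path):
            continue
        # "value" is optional; when absent the entry covers any value.
        if "value" in entry and entry["value"] != value:
            continue
        return True
    return False
```

With an entry such as `{"file": "tests/**/*.py", "rule": "indian_pan", "value": "ABCDE1234F", "reason": "placeholder fixture"}`, a placeholder PAN in `tests/unit/test_x.py` is suppressed while any other value in the same file is still reported.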

4. Private config (gitignored)

- `.security/known_values.json`: the user's real PAN, name, and email for literal matching. Never committed.
- `.security/known_values.json.example`: committed template so others can set up their own file.
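
Literal matching against the private file might be sketched as below. The flat label-to-value JSON layout and all three helper names are assumptions; the PR does not show the schema of `known_values.json`. The history helper only builds a `git log -S` command line and does not run it.

```python
import json
from pathlib import Path


def load_known_values(path: str) -> dict[str, str]:
    """Load label -> literal value pairs; return {} if the file is absent."""
    p = Path(path)
    if not p.exists():
        return {}
    data = json.loads(p.read_text(encoding="utf-8"))
    # Skip empty entries so an unfilled template never matches everything.
    return {k: v for k, v in data.items() if isinstance(v, str) and v.strip()}


def find_known_values(text: str, known: dict[str, str]) -> list[str]:
    """Return the labels whose literal value occurs in the text (case-insensitive)."""
    haystack = text.lower()
    return sorted(label for label, value in known.items() if value.lower() in haystack)


def history_command(value: str) -> list[str]:
    """Build a `git log -S` invocation listing commits that add or remove the value."""
    return ["git", "log", "--all", "--oneline", f"-S{value}"]
```

Reporting the label rather than the value keeps the real PII out of the audit report itself, which seems prudent if reports are archived.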

5. Expanded .gitignore

Adds: `*.sqlite`, `*.sqlite3`, `*.key`, `*.pem`, `*.pfx`, `*.p12`, `secrets.json`, `credentials.json`, `.ssh/`, plus `.security/` state files and `.oss-metrics-history.jsonl`.
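
A baseline coverage check for `.gitignore` could be as simple as the following sketch. The baseline list mirrors the glob patterns named in the commit message; the function name and the exact-text comparison are assumptions, and the real scanner may check coverage differently.

```python
# Baseline taken from the patterns this PR adds to .gitignore.
BASELINE = [
    "*.sqlite", "*.sqlite3", "*.key", "*.pem", "*.pfx", "*.p12",
    "secrets.json", "credentials.json", ".ssh/",
]


def missing_gitignore_patterns(gitignore_text: str, baseline=BASELINE) -> list[str]:
    """Return baseline patterns that do not appear verbatim in .gitignore."""
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        # Blank lines and comments carry no pattern.
        if not line or line.startswith("#"):
            continue
        present.add(line)
    return [p for p in baseline if p not in present]
```

Because the comparison is textual, a broader rule that already covers a baseline pattern would still be reported as missing; that errs on the side of noise rather than silence.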

Verification performed

| Test | Expected | Actual |
| --- | --- | --- |
| Clean run on current HEAD | 0 findings, exit 0 | ✅ 0 findings, exit 0 |
| Planted AWS/Anthropic/GitHub/PAN secrets | 5 findings, exit 2 | ✅ 5 findings, exit 2 |
| Allowlist filters placeholder PANs | 5 suppressed | ✅ 5 suppressed |
| `**` glob matching (`tests/**/*.py`) | Works | ✅ Fixed via custom `glob_to_regex` |
| `py_compile` | OK | ✅ OK |

One-time hardening (already applied outside this PR)

gh api -X PUT repos/rohitthink/freefile/vulnerability-alerts
gh api -X PUT repos/rohitthink/freefile/automated-security-fixes

Dependabot alerts + automated security updates are now enabled (were disabled before this work).
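
As a rough illustration of the settings audit, the sketch below inspects the `security_and_analysis` object that the GitHub REST API includes in a repository response (for example the output of `gh api repos/OWNER/REPO`, when the caller has admin access). The function name and the choice of checked features are assumptions; how the scanner actually queries these settings is not shown in this PR.

```python
import json

# Feature keys as they appear under "security_and_analysis" in the repo payload.
CHECKED_FEATURES = (
    "secret_scanning",
    "secret_scanning_push_protection",
    "dependabot_security_updates",
)


def disabled_features(repo_json: str) -> list[str]:
    """Return the checked features whose status is not "enabled".

    A missing "security_and_analysis" object (e.g. insufficient
    permissions) is treated as "everything disabled" in this sketch.
    """
    repo = json.loads(repo_json)
    analysis = repo.get("security_and_analysis") or {}
    return [
        name
        for name in CHECKED_FEATURES
        if (analysis.get(name) or {}).get("status") != "enabled"
    ]
```

Dependabot alerts are not part of this object; they would need a separate request to the `vulnerability-alerts` endpoint used in the hardening commands above.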

Scheduled monitoring

A weekly `freefile-security-audit` scheduled task runs Mondays at 08:43 local time:

- Runs the scanner with `--write-report`
- Diffs against last week's snapshot in `.security/audit-history/`
- Archives today's report as `YYYY-MM-DD.json`
- Only notifies if new unexpected findings appear, a CRITICAL persists, or GitHub config drifts from baseline
- Never auto-modifies source or git history; recommendations only
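
The notification rule can be sketched as a comparison of two report snapshots. The report shape (a list of findings with `rule`, `file`, and `severity` keys, allowlisted findings already excluded) and both helper names are assumptions for illustration only.

```python
def finding_key(finding: dict) -> tuple:
    """Identity of a finding across weekly snapshots."""
    return (finding["rule"], finding["file"], finding["severity"].lower())


def notify_reasons(previous: list[dict], current: list[dict],
                   baseline_config: dict, current_config: dict) -> list[str]:
    """Return the reasons to notify; an empty list means stay silent."""
    prev_keys = {finding_key(f) for f in previous}
    cur_keys = {finding_key(f) for f in current}
    reasons = []
    # 1. Findings present this week but not last week.
    if cur_keys - prev_keys:
        reasons.append("new findings")
    # 2. A critical finding that was already there last week.
    if any(key[2] == "critical" for key in cur_keys & prev_keys):
        reasons.append("critical persists")
    # 3. GitHub settings no longer match the recorded baseline.
    if current_config != baseline_config:
        reasons.append("config drift")
    return reasons
```

Keying on rule, file, and severity (rather than line number) keeps a finding stable when unrelated edits shift it within a file.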

How to use locally

# One-time setup: create the known-values file with your real PII
cp .security/known_values.json.example .security/known_values.json
# edit .security/known_values.json with your real PAN/name/email (it's gitignored)

# Run the audit
./venv/bin/python scripts/security_audit.py --write-report

# Optional: also check other public GitHub repos for your personal values
./venv/bin/python scripts/security_audit.py --write-report --check-external

Type of Change

New feature
Tests (planted-secret verification)
Documentation (this PR + docstrings)

Checklist

`python -m py_compile scripts/security_audit.py` — OK
Scanner runs cleanly on current HEAD (0 findings)
Scanner correctly catches planted secrets (exit 2)
No new code style issues
`.security/known_values.json` confirmed gitignored (contains real PII)
`scripts/security_audit.py` and `scripts/security_patterns.json` self-exclude from scanning (they contain their own pattern strings)

Introduces a three-part security audit system to detect PII leaks and configuration loopholes:

1. scripts/security_audit.py: deterministic Python scanner that checks committed files, git history, untracked surface, .gitignore coverage, and GitHub repo security settings. Supports --write-report, --check-external, --repo, --fail-on, and --json-only flags. Exits 2 on CRITICAL/HIGH, 1 on warnings, 0 on clean.
2. scripts/security_patterns.json: regex library with severity ratings for AWS/GitHub/OpenAI/Anthropic/Slack/Stripe tokens, PEM private keys, Indian PAN/Aadhaar/GSTIN/IFSC/mobile, JWT, and generic secret assignments. Each pattern supports placeholder_values to filter known-good test fixtures and allowlist_context to filter by line keywords.
3. scripts/security_allowlist.json: explicit list of intentional exposures (e.g., maintainer name/email in OSS grant application docs, placeholder PAN values in UI components and tests) with required reason strings for auditability. Uses file-globs with ** support.

The scanner also reads .security/known_values.json (gitignored) to search for the user's real PAN/name/email via literal match in committed files and git history, which is far more reliable than regex alone. A .security/known_values.json.example template ships in the repo.

.gitignore is expanded to close gaps found during initial scan: *.sqlite, *.sqlite3, *.key, *.pem, *.pfx, *.p12, secrets.json, credentials.json, .ssh/, plus the .security/ state files.

A weekly scheduled task (freefile-security-audit, Mondays 08:43 local) runs the scanner, diffs against last week's report, and only notifies when new unexpected findings appear, a CRITICAL persists, or GitHub config drifts from baseline.

Verification:

- Dry clean run on current HEAD: 0 findings, 5 allowlisted
- Planted-secret test (AWS/Anthropic/GitHub tokens + non-placeholder PAN in scripts/_test_secret_plant.py): 5 findings detected, exit 2
- Dependabot alerts + automated security updates enabled via gh api (one-time hardening action)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The backend CI job has been failing since the workflow was introduced in e52b4a2. Two pre-existing bugs surfaced on this PR's CI run:

1. pytest, pytest-asyncio, and httpx are not in requirements.txt, so `pip install -r requirements.txt` leaves them uninstalled and the subsequent `pytest tests/` step fails with "command not found".
2. Even with pytest installed, plain `pytest` invocation doesn't add the project root to sys.path, so `from backend.main import app` in tests/conftest.py fails with ModuleNotFoundError.

Fixes:

- Add pytest>=8.0.0, pytest-asyncio>=0.23.0, httpx>=0.27.0 to requirements.txt (consistent with playwright already being there).
- Add a minimal pytest.ini with `pythonpath = .` so tests can import the backend package from the project root regardless of how pytest is invoked.

Verified locally: all 39 tests pass with plain `pytest tests/`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

rohitthink · 2026-04-15T13:55:29Z

CI fix added (`ab5e464`)

Pushing this PR surfaced a pre-existing CI bug introduced by e52b4a2 when the workflow was first added. Two issues:

- `pytest`, `pytest-asyncio`, and `httpx` were missing from `requirements.txt`, so `pip install -r requirements.txt` in CI left them uninstalled and the test step failed with 'pytest: command not found' (which is why CI has been silently red on all prior commits).
- Even with pytest installed, a plain `pytest` invocation doesn't add the project root to `sys.path`, so `from backend.main import app` in `tests/conftest.py` would fail with `ModuleNotFoundError`.

Added:

- Test deps to `requirements.txt`
- Minimal `pytest.ini` with `pythonpath = .`

Verified locally: all 39 tests pass with plain pytest tests/. Scope-wise this is orthogonal to the security audit agent, but the CI was blocking this PR and shipping both together unblocks everything else too.

Add show-hn-post.md with the title, body, technical first comment, and pre-launch checklist for the Day 6 Show HN submission (playbook §5.3). Mark awesome-privacy #500 and awesome-fastapi #281 as CLOSED in the PR tracker; both were closed 2026-04-15. #500 was rejected for repo being too new (<16 weeks) and 0 stars — plan to resubmit at Day 14+ once metrics improve. Active monitoring narrowed to 3 PRs (#2, #663, #766).

ro4git25 and others added 2 commits April 15, 2026 19:06

rohitthink wants to merge 3 commits into `main` from `feat/security-audit-agent`
