# tools/python

Helper Python scripts for the go-client-classifier project (antibot bypass tests, request-log statistics, dashboard payload, and manual assessment sampling).
## Requirements

- Python 3.12+
- Poetry
## Installation

From the repo root:

```shell
cd tools/python
poetry install
```

Or from any directory:

```shell
poetry install --directory tools/python
```

After that, the environment with all dependencies is ready to use.
## Usage

Run all commands from `tools/python` after `poetry install`, or via `poetry run`:

```shell
cd tools/python
poetry run python antibot_test.py
```

Or activate the shell and run scripts as usual:

```shell
cd tools/python
poetry shell
python antibot_test.py
```

## Linting

From `tools/python` you can run the same checks as pre-commit (task check, trailing-whitespace, end-of-file-fixer, check-yaml, check-added-large-files, black, isort, mypy, autoflake, ruff) over the whole repo:

```shell
cd tools/python
poetry run lint
```

This runs pre-commit from the repo root, so the result is identical to `pre-commit run --all-files` in the root. black and isort are configured in the repo-root `pyproject.toml` and in `tools/python/pyproject.toml` (shared `line_length` and the isort `profile = "black"`, so they don't conflict).
## Scripts

- `antibot_test.py` — antibot detection bypass check via curl_cffi (TLS/HTTP2 fingerprint impersonating Chrome/Safari). Dependency: curl-cffi.
- `build_dashboard_payload.py` — builds the dashboard JSON consumed by the TS dashboard (`tools/ts/dashboard`). Reads JSONL request logs (same format as `request_log_stats.py`: `classification`, `timestamp`, `signals.score_breakdown`, optional `request_metrics` for behavioural signals). Output: `windows` (hour, day, week, month, all), `timeline` (fixed 60 bars; granularity 10 s / 1 min / 10 min chosen by the median total per bar: if the median is below a threshold, the script falls back to 1 min, then 10 min, so bars stay visible when traffic is sparse), `signals` (transport signals from `score_breakdown` plus behavioural: `req_per_min`, `gap_median`, `gap_std_mean`, `gap_mean_median`), plus `timeline_bucket_sec` and `timeline_window_sec` for the UI. Records without `timestamp` are skipped.

  ```shell
  poetry run python build_dashboard_payload.py "logs/**/requests_*.jsonl"
  poetry run python build_dashboard_payload.py -o dashboard.json "logs/**/*.jsonl"
  poetry run python build_dashboard_payload.py --timeline-minutes 10 "logs/**/requests_*.jsonl"
  poetry run python build_dashboard_payload.py --progress "logs/**/requests_*.jsonl"
  ```

  Options: `-o/--output` — output file (default: stdout); `--timeline-minutes` — timeline window in minutes for the initial 10 s granularity (default: 10); `--progress` — show a tqdm progress bar (default: a simple stderr log, "Reading N file(s)…" / "Read M records.").
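The granularity fallback described above (widen the bucket until the median bar is visible) can be sketched roughly as follows. This is an illustrative sketch, not the script's actual implementation: the function name, thresholds, and bucket arithmetic are all assumptions.

```python
import statistics

# Illustrative sketch: widen the bucket (10 s -> 1 min -> 10 min) until the
# median number of requests per bar reaches `min_median`, so bars stay
# visible when traffic is sparse. Names and thresholds are assumptions.
def pick_bucket_sec(timestamps: list[float], bars: int = 60,
                    min_median: int = 3) -> int:
    for bucket in (10, 60, 600):
        end = max(timestamps)
        start = end - bars * bucket
        counts = [0] * bars
        for t in timestamps:
            if start <= t <= end:
                i = min(int((t - start) // bucket), bars - 1)
                counts[i] += 1
        if statistics.median(counts) >= min_median:
            return bucket
    return 600  # fall back to the coarsest granularity

# Dense traffic (1 req/s) keeps 10 s buckets; sparse traffic widens to 10 min.
print(pick_bucket_sec([float(i) for i in range(600)]))   # 10
print(pick_bucket_sec([i * 120.0 for i in range(30)]))   # 600
```

The same 60-bar window is kept at every granularity; only the bucket width changes, which matches the fixed-bar behaviour described above.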
- `request_log_stats.py` — statistics over request logs (JSONL) for the bot-detection methodology: top-N by fields (path, method, IP, user_agent, accept, JA3/JA4/JA4H, headers), bot/browser split, scoring-signal prevalence, global summary (unique IPs/URLs). Metrics in the spirit of Cloudflare Signals Intelligence; optional significance filter (√N). Accounts for delivery channels (docs/nginx.md); unified interpretation behind a proxy (`signals.is_http2`, `fingerprint.tls`). Details: docs/METHODOLOGY.md, Appendix J (Request log statistics and collection methodology).

  Run (from `tools/python` or the repo root):

  ```shell
  poetry run python request_log_stats.py -n 20 "logs/**/requests_*.jsonl"
  poetry run python request_log_stats.py -n 10 -o report.txt --format text "logs/**/*.jsonl"
  poetry run python request_log_stats.py --format json -o stats.json "logs/**/requests_*.jsonl"
  ```

  Options: `-n/--top` — number of top values per field (default 15); `-o` — output file; `--format text|json`; `--sort count|discriminative`; `--exclude-stress-tests` — exclude go-http-client; `--no-significance-filter` — disable the significance filter (√N). Record format: one JSON object per line (see tests/testdata/reference_browser.json).
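One plausible reading of the √N significance filter is that a field value counts as significant only when its frequency reaches the square root of the total record count. The helper below is a hedged sketch under that assumption; the script's actual rule may differ.

```python
import math

def significant_values(counts: dict[str, int]) -> dict[str, int]:
    # Hypothetical reading of the sqrt(N) filter: a field value is
    # "significant" when its count reaches sqrt(total records).
    total = sum(counts.values())
    threshold = math.sqrt(total)
    return {value: c for value, c in counts.items() if c >= threshold}

# total = 100 -> threshold = 10: only the dominant value survives
print(significant_values({"curl/8.0": 90, "Mozilla/5.0": 8, "Go-http-client": 2}))
# -> {'curl/8.0': 90}
```

A filter of this shape suppresses long-tail noise in the top-N tables without needing a fixed cutoff, which is why it scales with N.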
- `request_log_stats_by_class.py` — same statistics as above, but per group: all (optionally excluding stress tests), bot, browser. Input: one or more globs for JSON (single array of objects) or JSONL (one object per line). Output: file or stdout, text or JSON. Reuses aggregation and formatting from `request_log_stats.py`.

  ```shell
  poetry run python request_log_stats_by_class.py -n 15 "logs/**/requests_*.jsonl"
  poetry run python request_log_stats_by_class.py -o report.txt "logs/**/*.json" "logs/**/*.jsonl"
  poetry run python request_log_stats_by_class.py --format json -o stats.json "logs/requests.jsonl"
  poetry run python request_log_stats_by_class.py --exclude-stress-tests "logs/**/requests_*.jsonl"
  ```

  Options: same as `request_log_stats.py` plus `--no-progress`. Text output: three sections (ALL, BOT, BROWSER). JSON output: `{"all": {...}, "bot": {...}, "browser": {...}}`.
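The dual input handling (a single JSON array of objects, or JSONL with one object per line) can be sketched as follows. `load_records` is a hypothetical helper for illustration, not the script's API.

```python
import json
from pathlib import Path

def load_records(path: str) -> list[dict]:
    """Load either a JSON array of objects or JSONL (one object per line).

    Illustrative sketch of the input handling described above; the
    helper name is an assumption, not the script's actual code.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)  # single JSON array of objects
    # JSONL: one JSON object per non-empty line
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Sniffing on the leading `[` is enough to distinguish the two formats here, since a JSONL file starts with a `{` of its first object.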
- `behavioral_bars.py` — splits the values of four behavioural metrics (`request_rate_per_min`, `inter_arrival_median_sec`, `inter_arrival_std_per_mean`, `inter_arrival_mean_median_ratio`) from `request_metrics.ip_derived` into 99 percentile bars (p01–p99). For each bar it outputs: total row count, rows classified as browser, as bot, bot−browser, and (bot−browser)/(bot+browser). Optionally generates and saves per-metric charts with the current edge threshold marked (METHODOLOGY Appendix M).

  ```shell
  poetry run python behavioral_bars.py "logs/requests.jsonl"
  poetry run python behavioral_bars.py -o report.json --charts-dir ./charts "logs/**/*.jsonl"
  poetry run python behavioral_bars.py --p-from 5 --p-to 95 "logs/requests.jsonl"
  poetry run python behavioral_bars.py --req-per-min 2.0 --gap-median-sec 4.0 "logs/**/requests_*.jsonl"
  ```

  Options: `-o` — output JSON (default: stdout); `--charts-dir` — directory for PNG charts; `--p-from`, `--p-to` — percentile bar range, 1-based (default: 1 and 99, i.e. p01–p99); `--no-progress`; `--req-per-min`, `--gap-median-sec`, `--gap-std-mean`, `--gap-mean-median` — edge thresholds to display on the charts.
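The percentile-bar aggregation can be sketched with numpy (already a project dependency) roughly like this. It is an illustrative sketch: `percentile_bars` and its exact binning are assumptions, not the script's code.

```python
import numpy as np

def percentile_bars(values, labels, p_from=1, p_to=99):
    """Aggregate values into percentile bars with per-class counts.

    Illustrative sketch of the aggregation described above; the script's
    actual binning may differ. `labels` holds "bot"/"browser" per value.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    percentiles = np.arange(p_from, p_to + 1)
    edges = np.percentile(values, percentiles)
    # assign each value to the first bar whose percentile edge covers it
    idx = np.clip(np.searchsorted(edges, values, side="left"),
                  0, len(edges) - 1)
    bars = []
    for b, p in enumerate(percentiles):
        mask = idx == b
        bot = int(np.sum(mask & (labels == "bot")))
        browser = int(np.sum(mask & (labels == "browser")))
        denom = bot + browser
        bars.append({
            "percentile": int(p),
            "total": int(mask.sum()),
            "browser": browser,
            "bot": bot,
            "bot_minus_browser": bot - browser,
            "balance": (bot - browser) / denom if denom else 0.0,
        })
    return bars
```

The `balance` column, (bot−browser)/(bot+browser), makes it easy to see at which percentile a threshold would start separating the classes.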
- `sample_assessment.py` — builds a random representative sample from request JSONL for manual FP/FN assessment. Excludes IPs in the top-10 and bottom-10 by request count and IPs with fewer than 2 requests; randomly selects 100 bot-labeled and 100 browser-labeled IPs (configurable), then for each IP outputs its first 10 requests by time with: time, delta (ms) from the previous request, classification, url, client, cookies, referrer. Console output is human-readable; `-o` writes the full JSON. See METHODOLOGY Appendix O.

  ```shell
  poetry run python sample_assessment.py "logs/requests.jsonl"
  poetry run python sample_assessment.py -o sample.json --json "logs/requests.jsonl"
  poetry run python sample_assessment.py --bot-n 50 --browser-n 50 --seed 42 "logs/**/requests_*.jsonl"
  ```

  Options: `-o/--output` — write the full result JSON; `--json` — print JSON to stdout; `--bot-n`, `--browser-n` — number of IPs to sample per class (default 100 each); `--seed` — random seed for reproducibility.
## Producing dashboard.json

`build_dashboard_payload.py` is the recommended way to produce `dashboard.json` for the TS dashboard. Place the output in `tools/ts/dashboard/public/dashboard.json` for local dev, or serve it from the same origin (or set `VITE_DASHBOARD_JSON_URL` at build time). Cron example:

```shell
cd tools/python
poetry run python build_dashboard_payload.py -o /var/www/dashboard.json "logs/**/requests_*.jsonl"
```

## Dependencies

Managed via Poetry, see `pyproject.toml`. Main ones: curl-cffi, pandas, numpy, matplotlib (for behavioral_bars.py charts).
Adding a new dependency:

```shell
cd tools/python
poetry add <package>
```

Updating the lock file after editing `pyproject.toml`:

```shell
poetry lock
poetry install
```