Academic research project for classifying automated HTTP clients (bots, LLMs, crawlers) vs real browsers using transport-level fingerprinting.
Version: 1.1.1 | Changelog | Methodology
Localhost (10s, 50 concurrent connections; Go server only, no nginx in front):
| Mode | RPS | RPM | Latency avg |
|---|---|---|---|
| HTTP (no TLS) | ~11,550 | ~693K | ~4.3 ms |
| HTTPS (TLS fingerprinting, JA3/JA4/JA4H) | ~8,210 | ~493K | ~6.1 ms |
Over the network (patched nginx + TLS termination + X-FP-* at edge, 10s, 50 concurrent, HTTPS):
| Endpoint | RPS | RPM | Latency avg |
|---|---|---|---|
| GET / (classify) | ~2,640 | ~158K | ~18.9 ms |
| GET /health | ~4,242 | ~255K | ~11.8 ms |
Summary (10s, c=50, same host): Localhost: 8–11K RPS, 4–6 ms avg. Over the network (nginx/TLS): classify ~2.6K RPS, ~18.9 ms; health ~4.2K RPS, ~11.8 ms — so classification adds ~7 ms vs health. With Redis + on-the-fly request metrics on the same host: ~1,620 RPS, 30.9 ms (vs classify without Redis: −39% RPS, +12 ms per request). Takeaways: classification cost ~7 ms over transport; Redis + stats add ~12 ms and ~40% throughput drop on endpoints that build request_metrics.
Create a single HTTP endpoint that classifies clients as browser or bot based exclusively on:
- TLS handshake patterns (JA3/JA4 fingerprinting)
- HTTP/2 negotiation behavior
- Header structure and semantics
- Request patterns
No JavaScript challenges, no rate limiting — pure network fingerprinting.
Phase 1 [COMPLETED] — TLS + HTTP fingerprinting: ClientHello capture, JA3/JA4/JA4H, TLS and HTTP signals in scoring, JA4H↔HTTP consistency (evasion detection), HTTPS server mode.
Phase 2 — HTTP/2: H2 fingerprint consumed from proxy (X-FP-H2); SETTINGS/PRIORITY/window come from nginx modules at the edge (e.g. nginx-http2-fingerprint) and are used in classification when present. No low-level H2 parsing in Go. Planned: H2/H3 ratio tracking. See docs/nginx.md and Methodology → Phase 2.
Phase 3 — Inconsistency detection: spatial (JA4H vs HTTP, TLS/HTTP version mismatch) in place. Planned: temporal inconsistency (same client, changing FPs), header–UA validation. See Methodology → Phase 3.
Redis & behavioural (Appendix L, M) — Optional Redis (REDIS_URL): (1) Challenge store — nonce→User-Agent stored in Redis so multiple instances share state; (2) Behavioural metrics — request counts and timestamps per IP and per __ch_nonce (sliding window). When behavioral_edges are set in scoring config, the classifier adds bot score for rate and inter-arrival conditions (Appendix M). /debug returns request_metrics for the current request. See docs/deploy/README.md, Methodology Appendix L, config/README.md.
See CHANGELOG.md for detailed release notes.
Direct TLS (Go terminates HTTPS):
client → TLS listener (Go) → fingerprint collector → classifier → response
Via nginx (TLS termination at edge, fingerprint via headers):
client → nginx (TLS + JA3 + H2 fingerprint) → proxy_pass → Go (HTTP :8080, X-FP-* headers) → collector → classifier → response
See docs/nginx.md and Methodology Appendix F.
- Core: Go (HTTP/2 server, TLS fingerprinting, classification)
- Analytics: Python (log analysis, pattern extraction). Request log statistics — tools/python/request_log_stats.py aggregates JSONL logs: top-N by path, method, IP, user agent, JA3/JA4/JA4H, headers; bot/browser breakdown; scoring-signal prevalence; optional significance filter (√N). See tools/python/README.md and Methodology Appendix J.
- Logging: Structured JSON logs per day (logs/requests_YYYYMMDD.jsonl) for research analysis
.
├── cmd/
│ └── server/ # HTTP server entry point
├── internal/
│ ├── config/ # Scoring config loader (JSON → classifier + fingerprint)
│ ├── fingerprint/ # TLS/HTTP signal collection
│ ├── classifier/ # Rule-based classification
│ ├── logger/ # Structured JSON logging
│ ├── metrics/ # Behavioral metrics (Redis: per IP, per __ch_nonce; Appendix L)
│ └── server/ # HTTP handlers (challenge store, Redis wiring)
├── config/ # Scoring config (scoring.json, scoring.default.json, README)
├── tests/
│ ├── integration/ # Automated client tests
│ ├── unit/ # Unit tests
│ └── testdata/ # Test stubs (e.g. ja4db_fixture.json, reference_*.json)
├── tools/
│ ├── benchmark/ # HTTP benchmark tool
│ ├── python/ # Analytics tools
│ └── shell/ # Integration test scripts
├── internal/fingerprint/data/ # JA4 DB path (ja4db.json downloaded on first start if missing)
├── logs/ # JSON traffic logs (requests_YYYYMMDD.jsonl per day)
└── docs/ # Research documentation
- Full ClientHello capture via custom TLS listener
- JA3/JA4 fingerprint hashing
- ALPN negotiation (h2, http/1.1)
- Cipher suite count and complexity (15+ suggests browser)
- TLS extensions count (10+ suggests browser)
- Supported versions, signature schemes, elliptic curve groups
- Session ticket and early data support
- HTTP/2 vs HTTP/1.1; HTTP/2 fingerprint (SETTINGS, PRIORITY, window) when provided by proxy
- JA4H fingerprinting (HTTP fingerprint from JA4+ family)
- Header order and structure; browser-specific headers (sec-fetch-*, accept-language); header count and entropy
- Cross-signal consistency: JA4H vs HTTP; TLS vs User-Agent (known library/browser JA3/JA4); H2 vs JA4 (ALPN); TLS ALPN vs HTTP version (direct TLS)
- Absence signals (direct TLS only): missing SNI or ALPN when TLS is available scores toward bot; optional browser bonus when no smoking-gun bot signals fire (see config/README.md)
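As a rough illustration of how these signals combine in rule-based scoring, here is a minimal Go sketch; the thresholds and point values are assumptions for illustration, not the shipped scoring.json values:

```go
package main

import "fmt"

// signals is a simplified slice of the fingerprint features listed
// above; field names are illustrative, not the project's actual types.
type signals struct {
	CipherSuites  int  // 15+ suggests browser
	TLSExtensions int  // 10+ suggests browser
	HTTP2         bool // negotiated h2 via ALPN
	HasSecFetch   bool // sec-fetch-* headers present
}

// classify accumulates browser points per signal and applies a
// threshold. Point values here are illustrative only.
func classify(s signals) (label string, score int) {
	if s.CipherSuites >= 15 {
		score += 2
	}
	if s.TLSExtensions >= 10 {
		score += 2
	}
	if s.HTTP2 {
		score += 2
	}
	if s.HasSecFetch {
		score += 3
	}
	if score >= 5 {
		return "browser", score
	}
	return "bot", score
}

func main() {
	// A Chrome-like fingerprint: many suites/extensions, h2, sec-fetch-*.
	label, score := classify(signals{CipherSuites: 16, TLSExtensions: 18, HTTP2: true, HasSecFetch: true})
	fmt.Println(label, score) // browser 9
}
```

The real classifier also weighs bot-leaning signals and cross-signal inconsistencies; see config/README.md for the actual point schema.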
- Collect: Run server, generate traffic (curl, browsers, LLM tools)
- Log: All requests logged as structured JSON to daily files (logs/requests_YYYYMMDD.jsonl)
- Analyze: Run request_log_stats.py on JSONL logs for top-N by path/method/IP/fingerprint and scoring-signal prevalence; see Methodology Appendix J
- Iterate: Update classification heuristics based on findings
- Test: Automated integration tests validate behavior
- Go 1.22+ — download installers for Windows, macOS, Linux, or install via a package manager (e.g. winget install GoLang.Go, brew install go, apt install golang-go). Ensure go is on your PATH.
- Go tools directory in PATH — add $HOME/go/bin (the default when Go is installed in the usual way). Required so task and golangci-lint are found after go install. Using this explicit path avoids errors when go cannot read the current directory (e.g. after sudo su). Do not install the task or taskwarrior apt/snap packages (they are different programs).
- TLS certificate and key (for HTTPS mode)
# Clone repository
git clone https://github.com/muliwe/go-client-classifier.git
cd go-client-classifier
# Install dependencies and dev tools
go mod tidy
go install github.com/go-task/task/v3/cmd/task@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
# Ensure Go bin is on PATH (required for `task` and `golangci-lint`)
# Use explicit path so it works even when current directory has permission issues (e.g. after sudo su)
export PATH=$PATH:$HOME/go/bin
# To make it permanent, add the same line to your shell profile and reload:
echo 'export PATH=$PATH:$HOME/go/bin' >> ~/.bashrc && source ~/.bashrc # bash
For TLS fingerprinting to work, the server must run in HTTPS mode. Place your certificate and key in the certs/ directory:
certs/
├── server.crt
└── server.key
Note: The certs/ directory is in .gitignore — certificates are not committed to the repository.
To generate a self-signed certificate for local development:
# Create certs directory
mkdir certs
# Generate self-signed certificate (valid for 1 year)
openssl req -x509 -newkey rsa:4096 -keyout certs/server.key -out certs/server.crt \
-days 365 -nodes -subj "/CN=localhost"
Add the certificate to your system's trusted certificates for browser testing without warnings.
Using Let's Encrypt (certbot) — for a public hostname with HTTPS:
# Install certbot (Ubuntu/Debian)
sudo apt install certbot
# Obtain a certificate (standalone mode: port 80 must be free for the challenge)
sudo certbot certonly --standalone -d your-domain.example.com
# Certbot stores certs under /etc/letsencrypt/live/<domain>/
# Point the server at them via env or symlink into certs/:
# TLS_CERT=/etc/letsencrypt/live/your-domain.example.com/fullchain.pem
# TLS_KEY=/etc/letsencrypt/live/your-domain.example.com/privkey.pem
# Or copy/symlink into project certs/ (ensure deploy user can read; certbot files are root-readable):
sudo cp /etc/letsencrypt/live/your-domain.example.com/fullchain.pem certs/server.crt
sudo cp /etc/letsencrypt/live/your-domain.example.com/privkey.pem certs/server.key
sudo chown $(whoami) certs/server.crt certs/server.key
Renewal: certbot can renew via sudo certbot renew (e.g. from cron or a systemd timer). After renewal, restart the Go server so it reloads the certs.
# Build binary
task build
# Run server (HTTP mode, no TLS fingerprinting)
task run
# Run server with HTTPS (required for TLS fingerprinting)
task run:tls
# Run tests
task test
# Run linter
task lint
# Format code
task fmt
# Run all checks (fmt, lint, test)
task check
# List all available tasks
task --list
# Build binary to bin/server
task build
# Or manually
go build -o bin/server ./cmd/server
# Run the binary
./bin/server
The server uses a JA4 fingerprint database (ja4db.com) for TLS vs User-Agent consistency. If the file is absent, the server downloads it itself on first use (saved to internal/fingerprint/data/ja4db.json when running from the repo root). No manual step is required for basic runs.
For deployment, you can optionally download the dictionary manually (e.g. to avoid first-request latency or when the host has no outbound HTTPS):
# From repo root; creates internal/fingerprint/data/ja4db.json
curl -o internal/fingerprint/data/ja4db.json "https://ja4db.com/api/read/"
Or with PowerShell:
Invoke-WebRequest -Uri "https://ja4db.com/api/read/" -OutFile "internal/fingerprint/data/ja4db.json" -UseBasicParsing
Ensure the directory exists (mkdir -p internal/fingerprint/data or New-Item -ItemType Directory -Force -Path internal/fingerprint/data). Override the path with the JA4DB_PATH environment variable if you place the file elsewhere.
# Run all tests
task test
# Run tests (short mode)
task test:short
# Test with curl (HTTP mode)
curl http://localhost:8080/
# Test with curl (HTTPS mode)
curl https://localhost:8443/
# Test health endpoint
curl http://localhost:8080/health
curl https://localhost:8443/health
Run integration tests against a running server using curl:
# HTTP mode
task run # Start server (terminal 1)
task integration # Run tests (terminal 2)
# HTTPS mode (TLS fingerprinting)
task run:tls # Start HTTPS server (terminal 1)
task integration:tls # Run tests with --insecure (terminal 2)
# Custom base URL
task integration BASE_URL=http://localhost:3000
task integration:tls BASE_URL=https://localhost:8443
Run the HTTP performance benchmark against a running server. You can pass a URL to test different routes (e.g. /, /health, /debug).
# Start server
task run:tls # HTTPS mode (terminal 1)
# Run benchmark (terminal 2)
task bench:tls # Default URL: https://localhost:8443/, 10s, 10 concurrent
# Pass URL to test a specific path (variable or positional after --)
task bench:tls URL=https://localhost:8443/debug
task bench:tls -- https://localhost:8443/health
# Custom duration and concurrency
task bench:tls DURATION=30s CONCURRENCY=50
# HTTP mode (default URL: http://localhost:8080/)
task bench
task bench URL=http://localhost:8080/health DURATION=10s CONCURRENCY=10
task bench -- http://localhost:8080/
Benchmark output includes RPS, RPM, and latency statistics (avg/min/max).
The integration tests automatically detect the OS and use:
- tools/shell/integration_test.ps1 for Windows (PowerShell)
- tools/shell/integration_test.sh for Unix (Linux/macOS)
Tests verify:
- GET /health — health check endpoint returns {"status":"ok"}
- GET / — classify endpoint returns classification
- GET /debug — debug endpoint returns fingerprint data
- curl is correctly detected as bot
| Endpoint | Description |
|---|---|
| GET / | Classify client as browser or bot |
| GET /health | Health check |
| GET /debug | Debug info with full fingerprint (dev only) |
Example API response (GET /):
{
"classification": "browser",
"confidence": "0.95",
"message": "You appear to be using a browser",
"request_id": "uuid",
"timestamp": "2026-02-18T12:00:00Z",
"version": "0.10.0"
}
(confidence is a string with 2 decimal places to avoid float instability.)
Each request is logged as one JSON line (JSONL) with full fingerprint data. Log files are written by day in UTC: logs/requests_YYYYMMDD.jsonl (e.g. logs/requests_20260217.jsonl). The server rotates to a new file automatically when the date changes.
{
"timestamp": "2026-02-12T12:40:35Z",
"request_id": "uuid",
"classification": "browser",
"confidence": 0.99,
"fingerprint": {
"tls": {
"version": "TLS 1.3",
"cipher_suites_count": 16,
"extensions_count": 18,
"ja3_hash": "9b0d79d10808bc0e509b4789f870a650",
"ja4_hash": "t13d1516h2_8daaf6152771_d8a2da3f94cd",
"supported_groups": ["GREASE", "x25519", "secp256r1", "secp384r1"]
},
"http": {
"version": "HTTP/2.0",
"header_count": 14
}
},
"signals": {
"browser_score": 18,
"bot_score": 0,
"score_breakdown": "BROWSER[http2(+2) sec-fetch(+3) ...] BOT[]"
},
"score": 18
}
You can run the service on Ubuntu as a systemd unit: one process listens on both HTTP and HTTPS, and restarts on failure or after a reboot.
1. Build the Linux binary
On your dev machine or in CI:
task build:prod
The binary will be at bin/server. Copy it to the server (e.g. /opt/go-client-classifier/).
2. Certificates
Place the certificate and key in the app directory, for example:
/opt/go-client-classifier/
├── server # binary
├── certs/
│ ├── server.crt
│ └── server.key
└── logs/ # created automatically
3. systemd unit file
Create /etc/systemd/system/go-client-classifier.service:
[Unit]
Description=Go Client Classifier (bot detector)
After=network.target
[Service]
Type=simple
User=deploy
Group=deploy
WorkingDirectory=/opt/go-client-classifier
ExecStart=/opt/go-client-classifier/bin/server
Restart=always
RestartSec=5
# HTTP :8080, HTTPS :8443
Environment=PORT=8080
Environment=TLS_PORT=8443
Environment=TLS_CERT=/opt/go-client-classifier/certs/server.crt
Environment=TLS_KEY=/opt/go-client-classifier/certs/server.key
# Optional: Redis — challenge store + behavioural metrics (Appendix L). If unset, challenge store is in-memory and metrics are not collected.
# Environment=REDIS_URL=redis://127.0.0.1:6379/0
# Environment=CHALLENGE_TTL_SEC=120
# Optional: enable PROXY protocol on TLS port (when nginx stream uses proxy_protocol on → real client IP in logs)
# Environment=PROXY_PROTOCOL=1
# Raise open-file limit (default 1024 can cause SSL/connection failures under load)
LimitNOFILE=65535
# Optional: disable request logging, only health/debug
# Environment=DEBUG=false
[Install]
WantedBy=multi-user.target
Redis (optional) — If you use REDIS_URL, ensure Redis is running. Check and install:
# Check: expect PONG
redis-cli ping
# Install if missing (Debian/Ubuntu)
command -v redis-server >/dev/null 2>&1 || { sudo apt-get update && sudo apt-get install -y redis-server; }
# Or RHEL/Rocky/Fedora
command -v redis-server >/dev/null 2>&1 || { sudo dnf install -y redis && sudo systemctl enable --now redis; }
# After install: ensure it runs and responds (service name: redis on RHEL/Ubuntu 24+, redis-server on older Debian/Ubuntu)
sudo systemctl enable --now redis 2>/dev/null || sudo systemctl enable --now redis-server
redis-cli ping
Replace User=deploy and Group=deploy with the user and group that should run the service. Ensure that user can read the binary and certs/, and can write to logs/ (e.g. chown -R deploy:deploy /opt/go-client-classifier).
Alternatively, put variables in a file: create /opt/go-client-classifier/.env (or environment.conf) and add EnvironmentFile=/opt/go-client-classifier/.env to the unit.
4. Enable and start
sudo systemctl daemon-reload
sudo systemctl enable go-client-classifier
sudo systemctl start go-client-classifier
sudo systemctl status go-client-classifier
Verify: curl http://localhost:8080/health and curl -k https://localhost:8443/health.
Viewing logs in real time
- Service output (stdout/stderr: startup message, per-request console line, errors):
journalctl -u go-client-classifier -f
- Request log file (JSONL, one line per classify request):
tail -f /opt/go-client-classifier/logs/requests_$(date +%Y%m%d).jsonl
Or from the app directory: tail -f logs/requests_*.jsonl (today's file).
Note: Any request that hits the classify handler (including non-root paths like /not-known) is classified and written to the JSONL and console logs; only GET / returns 200 JSON, other paths return 404. GET /health and GET /debug are handled by other handlers and are not logged. If the log stays empty, check journalctl -u go-client-classifier -f for the "Logs:" path at startup and any "Error logging result" messages.
Environment variables
| Variable | Description | Example |
|---|---|---|
| PORT | HTTP port | 8080 |
| TLS_PORT | HTTPS port (when using TLS) | 8443 |
| PROXY_PROTOCOL | PROXY protocol on TLS (if nginx has proxy_protocol on) | true |
| TLS_CERT | Path to certificate file | certs/server.crt |
| TLS_KEY | Path to key file | certs/server.key |
| DEBUG | Enable /debug endpoint | true / false |
| SCORING_CONFIG | Path to scoring JSON (points, thresholds, classifier) | config/scoring.json |
If only TLS_CERT and TLS_KEY are set (no TLS_PORT), the service runs in HTTPS-only mode on PORT.
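The mode-selection rule can be illustrated with a small pure function (a sketch of the documented behavior, not the server's actual startup code):

```go
package main

import "fmt"

// listenMode sketches the documented rule: TLS_CERT + TLS_KEY without
// TLS_PORT means HTTPS-only on PORT; with TLS_PORT set, both listeners
// run; without certs, plain HTTP only.
func listenMode(cert, key, tlsPort string) string {
	switch {
	case cert != "" && key != "" && tlsPort == "":
		return "https-only"
	case cert != "" && key != "":
		return "http+https"
	default:
		return "http-only"
	}
}

func main() {
	fmt.Println(listenMode("certs/server.crt", "certs/server.key", ""))     // https-only
	fmt.Println(listenMode("certs/server.crt", "certs/server.key", "8443")) // http+https
	fmt.Println(listenMode("", "", ""))                                     // http-only
}
```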
Scoring config — All scoring points, thresholds, classifier weight and confidence parameters are read from a single JSON file at startup. Path: SCORING_CONFIG or default config/scoring.json. If the file is missing or invalid, built-in defaults are used. Tuning (e.g. reducing false bots for incognito) is done via the config without code changes. See config/README.md for the schema, smoking guns (+3), strong/weak bot signals, and zero-point (easily spoofable) signals; config/scoring.default.json is the reference default.
- Can transport-level signals reliably distinguish browsers from automation?
- Which signals are most predictive?
- How do sophisticated bots (headless Chrome) behave?
- What are the false positive/negative rates?
Project uses git pre-commit hooks for code quality:
- Format check (go fmt)
- Linter (golangci-lint)
- Tests (go test)
Hooks are automatically run before each commit.
- CHANGELOG.md — version history and release notes
- config/README.md — scoring config schema, smoking guns, weak/zero signals, thresholds
- docs/METHODOLOGY.md — research methodology, signals, scoring algorithm, references; Appendix J — request log statistics and collection methodology
- docs/nginx.md — nginx setup for TLS termination, HTTP/2 fingerprint (X-FP-H2), JA3 (X-FP-JA3); Go consumes headers and uses H2/JA3 in cross-validation (Appendix G)
- tools/python/README.md — Python tools: request_log_stats (aggregate JSONL), antibot_test
MIT (Academic Research)
Research project for academic purposes.