diff --git a/.gitignore b/.gitignore
index 1c15a38..9240289 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,3 +23,4 @@ dev-check-strict.sh
.DS_STORE
clippy_reports
src/.DS_Store
+build-release.sh
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 06d1858..c071574 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,285 +2,143 @@
All notable changes to RustHost are documented here.
----
-
-## [0.1.0] — Initial Release
-
-This release resolves all 40 issues identified in the 2026-03-20 comprehensive security and reliability audit. Changes are grouped by the audit's five severity phases.
+The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
+RustHost uses [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
---
-### Phase 1 — Critical Security & Correctness
-
-#### 1.1 — Config Path Traversal: `site.directory` and `logging.file` Validated
-
-`src/config/loader.rs` — `validate()` now rejects any `site.directory` or `logging.file` value that is an absolute path, contains a `..` component, or contains a platform path separator. The process exits with a clear validation error before binding any port. Previously, a value such as `directory = "../../etc"` caused the HTTP server to serve the entire `/etc` tree, and a value such as `../../.ssh/authorized_keys` for `logging.file` caused log lines to be appended to the SSH authorized keys file.
-
-#### 1.2 — Race Condition: Tor Captures Bound Port via `oneshot` Channel
-
-`src/runtime/lifecycle.rs`, `src/server/mod.rs` — The 50 ms sleep that was the sole synchronisation barrier between the HTTP server binding its port and the Tor subsystem reading that port has been replaced with a `tokio::sync::oneshot` channel. The server sends the actual bound port through the channel before entering the accept loop; `tor::init` awaits that value (with a 10-second timeout) rather than reading a potentially-zero value out of `SharedState`. Previously, on a loaded system the race could be lost silently, causing every inbound Tor connection to fail with `ECONNREFUSED` to port 0 while the dashboard displayed a healthy green `TorStatus::Ready`.
-
-#### 1.3 — XSS in Directory Listing via Unsanitised Filenames
-
-`src/server/handler.rs` — `build_directory_listing()` now HTML-entity-escapes all filenames before interpolating them into link text (`&`, `<`, `>`, `"`, and `'` are each replaced with the corresponding HTML entity) and percent-encodes filenames in `href` attribute values. Previously, a file named `">` produced an executable XSS payload in any directory listing page.
-
-#### 1.4 — HEAD Requests No Longer Receive a Response Body
-
-`src/server/handler.rs` — `parse_path()` now returns `(method, path)` instead of only the path. The method is threaded through to `write_response()` via a `suppress_body: bool` parameter. For `HEAD` requests, response headers (including `Content-Length` reflecting the full body size, as required by RFC 7231 §4.3.2) are written, but the body is not sent.
-
-#### 1.5 — Request Timeout Prevents Slow-Loris DoS
-
-`src/server/handler.rs` — The call to `read_request()` is now wrapped in `tokio::time::timeout(Duration::from_secs(30))`. Connections that fail to deliver a complete request header within 30 seconds receive a `408 Request Timeout` response and are closed. The timeout is also configurable via `[server] request_timeout_secs` in `settings.toml`. Timeout events are logged at `debug` level to avoid log flooding under attack.
-
-#### 1.6 — Unbounded Connection Spawning Replaced with Semaphore
-
-`src/server/mod.rs`, `src/tor/mod.rs` — Both the HTTP accept loop and the Tor stream request loop now use a `tokio::sync::Semaphore` to cap concurrent connections. The limit is configurable via `[server] max_connections` (default: 256). The semaphore `OwnedPermit` is held for the lifetime of each connection task and released on drop. When the limit is reached, the accept loop suspends naturally, providing backpressure; a `warn`-level log entry is emitted. Previously, unlimited concurrent connections could exhaust task stack memory and file descriptors.
-
-#### 1.7 — Files Streamed Instead of Read Entirely Into Memory
-
-`src/server/handler.rs` — `tokio::fs::read` (which loads the entire file into a `Vec<u8>`) has been replaced with `tokio::fs::File::open` followed by `tokio::io::copy(&mut file, &mut stream)`. File size is obtained via `file.metadata().await?.len()` for the `Content-Length` header. Memory consumption per connection is now bounded by the kernel socket buffer (~128–256 KB) regardless of file size. For `HEAD` requests, the file is opened only to read its size; the `copy` step is skipped.
+## [Unreleased]
-#### 1.8 — `strip_timestamp` No Longer Panics on Non-ASCII Log Lines
+### Added
+- **`CONTRIBUTING.md`** — development workflow, lint gates, PR checklist, and architecture overview for new contributors.
+- **`SECURITY.md`** — private vulnerability disclosure policy and scope definition.
+- **`CHANGELOG.md`** — this file.
+- **Depth-bounded `scan_site` BFS** — the directory scanner now stops at 64 levels deep and emits a warning instead of running indefinitely on adversarially deep directory trees.
+- **Multiple log rotation backups** — `LogFile::rotate` now keeps up to five numbered backup files (`.log.1`–`.log.5`) instead of one, matching what operators expect from tools like `logrotate`.
-`src/console/dashboard.rs` — `strip_timestamp()` previously used a byte index derived from iterating `.bytes()` to slice a `&str`, which panicked when the index fell inside a multi-byte UTF-8 character. The implementation now uses `splitn(3, ']')` to strip the leading `[LEVEL]` and `[HH:MM:SS]` tokens, which is both panic-safe and simpler. Any log line containing Unicode characters (Arti relay names, internationalized filenames, `.onion` addresses) is handled correctly.
-
-#### 1.9 — `TorStatus` Updated to `Failed` When Onion Service Terminates
-
-`src/tor/mod.rs` — When `stream_requests.next()` returns `None` (the onion service stream ends unexpectedly), the status is now set to `TorStatus::Failed("stream ended".to_string())` and the `onion_address` field is cleared from `AppState`. Previously, the dashboard permanently displayed a healthy green badge and the `.onion` address after the service had silently stopped serving traffic.
-
-#### 1.10 — Terminal Fully Restored on All Exit Paths; Panic Hook Registered
-
-`src/main.rs`, `src/console/mod.rs` — The error handler in `main.rs` now calls `console::cleanup()` (which issues `cursor::Show` and `terminal::LeaveAlternateScreen` before `disable_raw_mode`) on all failure paths. A `std::panic::set_hook` registered at startup ensures the same cleanup runs even when a panic occurs on an async executor thread. `console::cleanup()` is idempotent (guarded by a `RAW_MODE_ACTIVE` atomic swap), so calling it from multiple paths is safe.
+### Changed
+- **`lib.rs` visibility audit** — items only used in integration tests (`percent_decode`, `ByteRange`, `Encoding`, `onion_address_from_pubkey`) are now re-exported under `#[cfg(test)]` rather than unconditionally, reducing the public API surface.
+- **Comment hygiene** — all internal `fix X.Y` tags have been replaced with descriptive prose so the rationale for each decision is clear to contributors.
---
-### Phase 2 — High Priority Reliability
-
-#### 2.1 — HTTP Request Reading Buffered with `BufReader`
-
-`src/server/handler.rs` — `read_request()` previously read one byte at a time, issuing up to 8,192 individual `read` syscalls per request. The stream is now wrapped in `tokio::io::BufReader` and reads headers line-by-line with `read_line()`. The 8 KiB header size limit is enforced by accumulating total bytes read. This also correctly handles `\r\n\r\n` split across TCP segments.
-
-#### 2.2 — `scan_site` is Now Recursive, Error-Propagating, and Non-Blocking
-
-`src/server/mod.rs`, `src/runtime/lifecycle.rs`, `src/runtime/events.rs` — `scan_site` now performs a breadth-first traversal using a `VecDeque` work queue, counting files and sizes in all subdirectories. The return type is now `Result<(u32, u64)>`; errors from `read_dir` are propagated and logged at `warn` level rather than silently returning `(0, 0)`. All call sites wrap the function in `tokio::task::spawn_blocking` to avoid blocking the async executor on directory I/O.
+## [0.1.0] — 2025-07-01
-#### 2.3 — `canonicalize()` Called Once at Startup, Not Per Request
+This release resolves all 40 issues identified in the 2026-03-20 security and reliability audit. Every fix is listed below, grouped by the phase it belongs to.
-`src/server/mod.rs`, `src/server/handler.rs` — The site root is now canonicalized once in `server::run()` and passed as a pre-computed `PathBuf` into each connection handler. The per-request `site_root.canonicalize()` call in `resolve_path()` has been removed, eliminating a `realpath()` syscall on every request.
-
-#### 2.4 — `open_browser` Deduplicated
+---
-`src/runtime/lifecycle.rs`, `src/runtime/events.rs`, `src/runtime/mod.rs` — The `open_browser` function was duplicated in `lifecycle.rs` and `events.rs`. It now lives in a single location (`src/runtime/mod.rs`) and both call sites use the shared implementation.
+### Added
-#### 2.5 — `#[serde(deny_unknown_fields)]` on All Config Structs
+#### Repository & CI (Phase 0)
-`src/config/mod.rs` — All `#[derive(Deserialize)]` config structs (`Config`, `ServerConfig`, `SiteConfig`, `TorConfig`, `LoggingConfig`, `ConsoleConfig`, `IdentityConfig`) now carry `#[serde(deny_unknown_fields)]`. A misspelled key such as `bund = "127.0.0.1"` now causes a startup error naming the unknown field rather than silently using the compiled-in default.
+- **`rust-toolchain.toml`** — pins the nightly channel so every contributor and CI run uses the same compiler. No more "works on my machine" build failures.
+- **GitHub Actions CI** — runs build, test, clippy, rustfmt, `cargo-audit`, and `cargo-deny` on Ubuntu, macOS, and Windows on every push and PR.
+- **`Cargo.toml` profile tuning** — `opt-level = 1` for dev dependencies speeds up debug builds; the release profile uses `lto = true`, `strip = true`, and `codegen-units = 1` for a smaller, faster binary.
-#### 2.6 — `auto_reload` Removed (Was Unimplemented)
+#### HTTP Server
-`src/config/mod.rs`, `src/config/defaults.rs` — The `auto_reload` field was present in the config struct and advertised in the default `settings.toml` but had no implementation. It has been removed entirely. The `[R]` key for manual site stat reloads is unaffected.
+- **Keep-alive via `hyper` 1.x** — migrated from a hand-rolled single-shot HTTP/1.1 parser to `hyper`. Eliminates the 30–45 second Tor page-load penalty that was caused by `Connection: close` on every response.
+- **Brotli and Gzip compression** — negotiated via `Accept-Encoding`. Brotli is preferred over Gzip for Tor users since they pay in latency for every byte.
+- **`ETag` / conditional GET** — weak ETags computed from file modification time and size. Returns `304 Not Modified` when `If-None-Match` matches, so unchanged files are not re-sent.
+- **Range requests** — supports `bytes=N-M`, `bytes=N-`, and `bytes=-N` suffix forms. Returns `206 Partial Content` or `416 Range Not Satisfiable` as appropriate. Enables audio and video seeking.
+- **Per-IP rate limiting** — `DashMap`-backed lock-free CAS loop. Connections beyond `max_connections_per_ip` are dropped at accept time with a TCP RST.
+- **Smart `Cache-Control`** — HTML responses get `no-store`; content-hashed assets (8–16 hex characters in the filename stem) get `max-age=31536000, immutable`; everything else gets `no-cache`.
+- **Security headers on every response** — `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, `Referrer-Policy: no-referrer`, and `Permissions-Policy: camera=(), microphone=(), geolocation=()`. HTML responses additionally include a configurable `Content-Security-Policy`.
+- **`--serve ` one-shot mode** — serve a directory directly without a `settings.toml`. Skips first-run setup entirely.
+- **Extended MIME types** — added `.webmanifest`, `.opus`, `.flac`, `.glb`, and `.ndjson`.
+- **Combined Log Format access log** — written to `logs/access.log` with owner-only `0600` permissions.
-#### 2.7 — ANSI Terminal Injection Prevention Documented and Tested
+#### Tor / Onion Service
-`src/config/loader.rs` — The existing `char::is_control` check on `instance_name` (which covers ESC `\x1b`, NUL `\x00`, BEL `\x07`, and BS `\x08`) is confirmed to prevent terminal injection. An explicit comment now documents the security intent, and dedicated test cases cover each injection vector.
+- **Idle timeout fix** (`copy_with_idle_timeout`) — replaced the wall-clock cap (which disconnected active large downloads after 60 seconds) with a true per-side idle deadline that resets on every read or write.
+- **`reference_onion` test** — replaced the tautological self-referencing test with an external test vector computed independently using Python's standard library.
-#### 2.8 — Keyboard Input Task Failure Now Detected and Reported
+#### Configuration
-`src/runtime/lifecycle.rs` — If the `spawn_blocking` input task exits (causing `key_rx` to close), `recv().await` returning `None` is now detected. A `warn`-level log entry is emitted ("Console input task exited — keyboard input disabled. Use Ctrl-C to quit.") and subsequent iterations no longer attempt to receive from the closed channel. Previously, input task death was completely silent.
+- **URL redirect and rewrite rules** — `[[redirects]]` table in `settings.toml`, checked before filesystem resolution. Supports 301 and 302.
+- **Custom error pages** — `site.error_404` and `site.error_503` config keys resolve to HTML files served with the correct status codes.
+- **`--config` and `--data-dir` CLI flags** — override the default config and data directory paths. Enables multi-instance deployments and systemd unit files with explicit paths.
+- **`--version` and `--help` CLI flags**.
+- **`#[serde(deny_unknown_fields)]` on all config structs** — a misspelled key like `bund = "127.0.0.1"` causes a clear startup error instead of silently using the default.
+- **Typed config fields** — `bind` is `std::net::IpAddr`; `logging.level` is a `LogLevel` enum. Invalid values are caught at deserialisation time, not after the server starts.
-#### 2.9 — `TorStatus::Failed` Now Carries a Reason String
+#### Features
-`src/runtime/state.rs`, `src/console/dashboard.rs` — `TorStatus::Failed(Option)` (the exit code variant, which was never constructed) has been replaced with `TorStatus::Failed(String)`. Construction sites pass a brief reason string (`"bootstrap failed"`, `"stream ended"`, `"launch failed"`). The dashboard now renders `FAILED (reason) — see log for details` instead of a bare `FAILED`.
+- **SPA fallback routing** — unknown paths fall back to `index.html` when `site.spa_routing = true`, enabling React, Vue, and Svelte client-side routing.
+- **`canonical_root` hot reload** — the `[R]` keypress pushes a new canonicalised root to the accept loop over a `watch` channel without restarting the server.
+- **Dependency log filtering** — Arti and Tokio internals at `Info` and below are suppressed by default, keeping the log focused on application events. Configurable via `filter_dependencies`.
-#### 2.10 — Graceful Shutdown Uses `JoinSet` and Proper Signalling
+#### Reliability
-`src/runtime/lifecycle.rs`, `src/server/mod.rs`, `src/tor/mod.rs` — The 300 ms fixed sleep that gated shutdown has been replaced with proper task completion signalling. A clone of `shutdown_rx` is passed into `tor::init()`; the Tor run loop watches it via `tokio::select!` and exits cleanly on shutdown. In-flight HTTP connection tasks are tracked in a `JoinSet`; after the accept loop exits, `join_set.join_all()` is awaited with a 5-second timeout, allowing in-progress transfers to complete before the process exits.
+- **Exponential backoff for Tor retries** — re-bootstrap retries now use exponential backoff (30 s, 60 s, 120 s, …, capped at 300 s) instead of a fixed linear delay.
+- **Shutdown drain per subsystem** — HTTP and Tor drains each have their own independently-bounded timeout (5 s for HTTP, 10 s for Tor) so a slow HTTP drain doesn't steal time from Tor circuit teardown.
+- **`percent-encoding` crate** — replaced the hand-rolled `percent_decode` function with the audited upstream crate. Added a null-byte guard specific to filesystem path use.
+- **`scan_site` partial failure** — unreadable subdirectories are skipped with a warning instead of aborting the entire scan.
+- **`fstat` batching** — `LogFile::write_line` calls `fstat` every 100 writes (instead of on every record) to reduce syscall overhead on active servers.
-#### 2.11 — Log File Flushed on Graceful Shutdown
+#### Testing & CI
-`src/logging/mod.rs`, `src/runtime/lifecycle.rs` — A `pub fn flush()` function has been added to the logging module. The shutdown sequence calls it explicitly after the connection drain wait, ensuring all buffered log entries (including the `"RustHost shut down cleanly."` sentinel) are written to disk before the process exits.
+- **Unit tests for all security-critical functions** — `percent_decode`, `resolve_path`, `validate`, `strip_timestamp`, and `hsid_to_onion_address` all have `#[cfg(test)]` coverage.
+- **Integration tests** (`tests/http_integration.rs`) — covers all HTTP core flows using raw `TcpStream`: 200, HEAD, 304, 403, 404, 400, range requests, and oversized headers.
---
-### Phase 3 — Performance
-
-#### 3.1 — `data_dir()` Computed Once at Startup
-
-`src/runtime/lifecycle.rs` — `data_dir()` (which calls `std::env::current_exe()` internally) was previously called on every key event dispatch inside `event_loop`. It is now computed exactly once at the top of `normal_run()`, stored in a local variable, and passed as a parameter to all functions that need it.
-
-#### 3.2 — `Arc` and `Arc` Eliminate Per-Connection Heap Allocations
-
-`src/server/mod.rs`, `src/server/handler.rs` — `site_root` and `index_file` are now wrapped in `Arc` and `Arc` respectively before the accept loop. Each connection task receives a cheap `Arc` clone (reference-count increment) rather than a full heap allocation.
-
-#### 3.3 — Dashboard Render Task Skips Redraws When Output Is Unchanged
-
-`src/console/mod.rs` — The render task now compares the rendered output string against the previously written string. If identical, the `execute!` and `write_all` calls are skipped entirely. This eliminates terminal writes on idle ticks, which is the common case for a server with no active traffic.
-
-#### 3.4 — MIME Lookup No Longer Allocates a `String` Per Request
-
-`src/server/mime.rs` — The `for_extension` function previously called `ext.to_ascii_lowercase()`, allocating a heap `String` on every request. The comparison now uses `str::eq_ignore_ascii_case` directly against the extension string, with no allocation.
-
-#### 3.5 — Log Ring Buffer Lock Not Held During `String` Clone
-
-`src/logging/mod.rs` — The log line string is now cloned before acquiring the ring buffer mutex. The mutex is held only for the `push_back` of the already-allocated string, reducing lock contention from Arti's multi-threaded internal logging.
-
-#### 3.6 — Tokio Feature Flags Made Explicit
-
-`Cargo.toml` — `tokio = { features = ["full"] }` has been replaced with an explicit feature list: `rt-multi-thread`, `net`, `io-util`, `fs`, `sync`, `time`, `macros`, `signal`. Unused features (`process`, `io-std`) are no longer compiled, reducing binary size and build time.
+### Fixed
+
+#### Critical (Phase 1)
+
+- **Config path traversal** — `validate()` now rejects any `site.directory` or `logging.file` value that is an absolute path, contains `..`, or contains a platform path separator. Previously, `directory = "../../etc"` would cause the server to serve the entire `/etc` tree.
+- **Tor port race condition** — replaced the 50 ms sleep used to synchronise the HTTP server's bound port with the Tor subsystem with a `tokio::sync::oneshot` channel. The server sends the actual bound port through the channel before entering the accept loop. Previously, on a loaded system, the race could be lost silently, causing every inbound Tor connection to fail with `ECONNREFUSED` to port 0 while the dashboard showed a healthy green status.
+- **XSS in directory listings** — `build_directory_listing()` now HTML-entity-escapes all filenames before interpolating them into link text, and percent-encodes filenames in `href` attributes. Previously, a file named `">` produced an executable XSS payload in any directory listing page.
+- **HEAD requests sent a response body** — responses to `HEAD` requests now carry the correct headers (including `Content-Length` reflecting the full body size) but no body, as required by RFC 7231 §4.3.2. Previously, the full file was sent.
+- **Slow-loris DoS** — `read_request()` is now wrapped in a 30-second timeout. Connections that don't deliver a complete request header in time receive a `408 Request Timeout`. Configurable via `request_timeout_secs`.
+- **Unbounded connection spawning** — both the HTTP accept loop and the Tor stream loop now use a `tokio::sync::Semaphore` to cap concurrent connections (default: 256). Previously, unlimited concurrent connections could exhaust file descriptors and task stack memory.
+- **Files loaded entirely into memory** — replaced `tokio::fs::read` (which loaded the entire file into a `Vec<u8>`) with `tokio::fs::File::open` + `tokio::io::copy`. Memory per connection is now bounded by the kernel socket buffer (~128–256 KB) regardless of file size.
+- **`strip_timestamp` panic on non-ASCII log lines** — the old implementation used a byte index derived from `.bytes()` to slice a `&str`, which panicked when the index fell inside a multi-byte UTF-8 character. Now uses `splitn(3, ']')`, which is both panic-safe and handles Unicode correctly.
+- **`TorStatus` not updated when onion service terminates** — when the onion service stream ends unexpectedly, the status is now set to `TorStatus::Failed("stream ended")` and the `.onion` address is cleared. Previously, the dashboard permanently showed a healthy green badge after the service had silently stopped.
+- **Terminal not restored on panic or crash** — the error handler in `main.rs` now calls `console::cleanup()` (which issues `LeaveAlternateScreen`, `cursor::Show`, and `disable_raw_mode`) on all failure paths, and a `std::panic::set_hook` registered at startup runs the same cleanup when a panic occurs. The cleanup function is idempotent, so calling it from multiple paths is safe.
+
+#### High — Reliability (Phase 2)
+
+- **HTTP request reading done byte-by-byte** — `read_request()` previously issued up to 8,192 individual `read` syscalls per request. The stream is now wrapped in `tokio::io::BufReader` and headers are read line-by-line. Also correctly handles `\r\n\r\n` split across multiple TCP segments.
+- **`scan_site` only scanned the top-level directory** — now performs a full breadth-first traversal using a work queue, counting files and sizes in all subdirectories. Unreadable directories are skipped with a warning instead of propagating an error.
+- **`canonicalize()` called on every request** — the site root is now canonicalised once at startup and passed into each connection handler. Eliminates a `realpath()` syscall on every single request.
+- **`open_browser` duplicated** — the function existed in two separate source files. Now lives in one place (`src/runtime/mod.rs`).
+- **`auto_reload` config field was unimplemented** — removed entirely. It was present in the config struct and advertised in the default `settings.toml` but had no effect.
+- **Keyboard input task failure was silent** — if the input task exits unexpectedly (causing `key_rx` to close), a warning is now logged ("Console input task exited — keyboard input disabled. Use Ctrl-C to quit."). Previously, this failure was completely invisible.
+- **`TorStatus::Failed` carried an exit code that was never set** — replaced `TorStatus::Failed(Option)` with `TorStatus::Failed(String)`. The dashboard now shows `FAILED (reason) — see log for details` with a human-readable reason string.
+- **Graceful shutdown used a fixed 300 ms sleep** — replaced with proper task completion signalling. In-flight HTTP connections are tracked in a `JoinSet` and given 5 seconds to finish. The Tor run loop watches the shutdown signal via `tokio::select!` and exits cleanly.
+- **Log file not flushed on shutdown** — added `pub fn flush()` to the logging module. The shutdown sequence calls it explicitly after the connection drain, ensuring the final log entries (including the shutdown sentinel) reach disk.
+
+#### Medium (Phase 3–5)
+
+- **`data_dir()` recomputed on every key event** — now computed once at startup and passed as a parameter. Removes the hidden `current_exe()` call from the hot event loop.
+- **Per-connection heap allocations for `site_root` and `index_file`** — both are now wrapped in `Arc` before the accept loop. Each connection task gets a cheap reference-count increment instead of a full heap allocation.
+- **Dashboard redrawn on every tick even when unchanged** — the render task now compares the new output against the previous one and skips writing to the terminal if they're identical. Eliminates unnecessary terminal writes on idle servers.
+- **MIME lookup allocated a heap `String` per request** — replaced `ext.to_ascii_lowercase()` with `str::eq_ignore_ascii_case`. No allocation.
+- **Log ring buffer lock held during `String` clone** — the log line is now cloned before acquiring the mutex. The lock is held only for the `push_back`, reducing contention from Arti's multi-threaded logging.
+- **`tokio = { features = ["full"] }` compiled unused features** — replaced with an explicit feature list (`rt-multi-thread`, `net`, `io-util`, `fs`, `sync`, `time`, `macros`, `signal`). Reduces binary size and build time.
+- **`sanitize_header_value` only stripped CR/LF** — now strips all C0 control characters and DEL (including NUL, ESC, and TAB), preventing header injection via crafted filenames or redirect targets.
+- **`expose_dotfiles` checked on URL path instead of resolved path components** — the guard now inspects each path component after `canonicalize`, blocking escapes like `/normal/../.git/config`.
+- **`render()` acquired the `AppState` lock twice per tick** — now acquires it once per tick, eliminating the TOCTOU race between two sequential acquisitions.
+- **Stale "polling" message in dashboard** — Arti is event-driven, not polled. The message implying periodic polling has been removed.
+- **`percent_decode` produced garbage for multi-byte UTF-8 sequences** — the old implementation decoded each `%XX` token as a standalone `char` cast from a `u8`. It now accumulates decoded bytes into a buffer and flushes via `String::from_utf8_lossy`, correctly reassembling multi-byte sequences. Null bytes (`%00`) are left as the literal string `%00`.
+- **`deny.toml` missing five duplicate crate skip entries** — `foldhash`, `hashbrown`, `indexmap`, `redox_syscall`, and `schemars` were absent from `bans.skip` but present in the lock file. `cargo deny check` now passes cleanly.
+- **`ctrlc` crate conflicted with Tokio's signal handling** — replaced with `tokio::signal::ctrl_c()` and `tokio::signal::unix::signal(SignalKind::interrupt())` integrated directly into `event_loop`. Eliminates the threading concerns between the two signal handling mechanisms.
+- **`open_browser` silently swallowed spawn errors** — spawn errors are now logged at `warn` level.
---
-### Phase 4 — Architecture & Design
-
-#### 4.1 — Typed `AppError` Enum Introduced
-
-`src/error.rs` (new), `src/main.rs`, all modules — The global `Box` result alias has been replaced with a typed `AppError` enum using `thiserror`. Variants: `ConfigLoad`, `ConfigValidation`, `LogInit`, `ServerBind { port, source }`, `Tor`, `Io`, `Console`. Error messages now preserve structured context at the type level.
-
-#### 4.2 — Config Structs Use Typed Fields
-
-`src/config/mod.rs`, `src/config/loader.rs` — `LoggingConfig.level` is now a `LogLevel` enum (`Trace` | `Debug` | `Info` | `Warn` | `Error`) with `#[serde(rename_all = "lowercase")]`; the duplicate validation in `loader.rs` and `logging/mod.rs` has been removed. `ServerConfig.bind` is now `std::net::IpAddr` via `#[serde(try_from = "String")]`. The parse-then-validate pattern is eliminated in favour of deserialisation-time typing.
-
-#### 4.3 — Dependency Log Noise Filtered by Default
-
-`src/logging/mod.rs` — `RustHostLogger::enabled()` now suppresses `Info`-and-below records from non-`rusthost` targets (Arti, Tokio internals). Warnings and errors from all crates are still passed through. This prevents the ring buffer and log file from being flooded with Tor bootstrap noise. Configurable via `[logging] filter_dependencies = true` (default `true`); set `false` to pass all crate logs at the configured level.
-
-#### 4.4 — `data_dir()` Free Function Eliminated; Path Injected
-
-`src/runtime/lifecycle.rs` and all callers — The `data_dir()` free function (which called `current_exe()` as a hidden dependency) has been removed. The data directory `PathBuf` is now a first-class parameter threaded through the call chain from `normal_run`, enabling test injection of temporary directories.
-
-#### 4.5 — `percent_decode` Correctly Handles Multi-Byte UTF-8 and Null Bytes
-
-`src/server/handler.rs` — The previous implementation decoded each `%XX` token as a standalone `char` cast from a `u8`, producing incorrect output for multi-byte sequences (e.g., `%C3%A9` was decoded as two garbage characters instead of `é`). The function now accumulates consecutive decoded bytes into a `Vec` buffer and flushes via `String::from_utf8_lossy` when a literal character is encountered, correctly reassembling multi-byte sequences. Null bytes (`%00`) are left as the literal string `%00` in the output rather than being decoded.
-
-#### 4.6 — `deny.toml` Updated with All Duplicate Crate Skip Entries
+### Changed
-`deny.toml` — Five duplicate crate version pairs that were absent from `bans.skip` but present in the lock file have been added with comments identifying the dependency trees that pull each version: `foldhash`, `hashbrown`, `indexmap`, `redox_syscall`, and `schemars`. `cargo deny check` now passes cleanly.
-
-#### 4.7 — `ctrlc` Crate Replaced with `tokio::signal`
-
-`Cargo.toml`, `src/runtime/lifecycle.rs` — The `ctrlc = "3"` dependency has been removed. Signal handling is now done via `tokio::signal::ctrl_c()` (cross-platform) and `tokio::signal::unix::signal(SignalKind::interrupt())` (Unix), integrated directly into the `select!` inside `event_loop`. This eliminates threading concerns between the `ctrlc` crate's signal handler and Tokio's internal signal infrastructure.
+- **`Box` replaced with typed `AppError` enum** — uses `thiserror`. Variants: `ConfigLoad`, `ConfigValidation`, `LogInit`, `ServerBind { port, source }`, `Tor`, `Io`, `Console`. Error messages now preserve structured context.
+- **Single `write_headers` path** — all security headers (CSP, HSTS, `X-Content-Type-Options`, etc.) are emitted from one function. Redirect responses delegate here instead of duplicating the header list, eliminating the risk of the two diverging.
+- **`audit.toml` consolidated into `deny.toml`** — advisory suppression is managed in one place with documented rationale. CI now runs `cargo deny check` as a required step.
---
-### Phase 5 — Testing, Observability & Hardening
-
-#### 5.1 — Unit Tests Added for All Security-Critical Functions
-
-`src/server/handler.rs`, `src/server/mod.rs`, `src/config/loader.rs`, `src/console/dashboard.rs`, `src/tor/mod.rs` — `#[cfg(test)]` modules added to each file. Coverage includes: `percent_decode` (ASCII, spaces, multi-byte UTF-8, null bytes, incomplete sequences, invalid hex); `resolve_path` (normal file, directory traversal, encoded-slash traversal, missing file, missing root); `validate` (valid config, `site.directory` path traversal, absolute path, `logging.file` traversal, port 0, invalid IP, unknown field); `strip_timestamp` (ASCII line, multi-byte UTF-8 line, line with no brackets); `hsid_to_onion_address` (known test vector against reference implementation).
-
-#### 5.2 — Integration Tests Added for HTTP Server Core Flows
-
-`tests/http_integration.rs` (new) — Integration tests using `tokio::net::TcpStream` against a test server bound on port 0. Covers: `GET /index.html` → 200; `HEAD /index.html` → correct `Content-Length`, no body; `GET /` with `index_file` configured; `GET /../etc/passwd` → 403; request header > 8 KiB → 400; `GET /nonexistent.txt` → 404; `POST /index.html` → 400.
-
-#### 5.3 — Security Response Headers Added to All Responses
-
-`src/server/handler.rs` — All responses now include `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`, `Referrer-Policy: no-referrer`, and `Permissions-Policy: camera=(), microphone=(), geolocation=()`. HTML responses additionally include `Content-Security-Policy: default-src 'self'` (configurable via `[server] content_security_policy` in `settings.toml`). The `Referrer-Policy: no-referrer` header is especially relevant for the Tor onion service: it prevents the `.onion` URL from leaking in the `Referer` header to any third-party resources loaded by served HTML.
-
-#### 5.4 — Accept Loop Error Handling Uses Exponential Backoff
-
-`src/server/mod.rs` — The accept loop previously retried immediately on error, producing thousands of log entries per second on persistent errors such as `EMFILE`. Errors now trigger exponential backoff (starting at 1 ms, doubling up to 1 second). `EMFILE` is logged at `error` level (operator intervention required); transient errors (`ECONNRESET`, `ECONNABORTED`) are logged at `debug`. The backoff counter resets on successful accept.
-
-#### 5.5 — CLI Arguments Added (`--config`, `--data-dir`, `--version`, `--help`)
-
-`src/main.rs`, `src/runtime/lifecycle.rs` — The binary now accepts `--config ` and `--data-dir ` to override the default config and data directory paths (previously inferred from `current_exe()`). `--version` prints the crate version and exits. `--help` prints a usage summary. These flags enable multi-instance deployments, systemd unit files with explicit paths, and CI test runs without relying on the working directory.
-
-#### 5.6 — `cargo deny check` Passes Cleanly; `audit.toml` Consolidated
-
-`deny.toml`, CI — `audit.toml` (which suppressed `RUSTSEC-2023-0071` without a documented rationale) has been removed. Advisory suppression is now managed exclusively in `deny.toml`, which carries the full justification. CI now runs `cargo deny check` as a required step, subsuming the advisory check. The existing rationale for `RUSTSEC-2023-0071` is unchanged: the `rsa` crate is used only for signature verification on Tor directory documents, not for decryption; the Marvin timing attack's threat model does not apply.
-
----
+### Removed
-### HTTP Server
-
-- Custom HTTP/1.1 static file server built directly on `tokio::net::TcpListener` — no third-party HTTP framework dependency.
-- Serves `GET` and `HEAD` requests; all other methods return `400 Bad Request`.
-- Percent-decoding of URL paths (e.g. `%20` → space) before file resolution.
-- Query string and fragment stripping before path resolution.
-- Path traversal protection: every resolved path is verified to be a descendant of the site root via `std::fs::canonicalize`; any attempt to escape (e.g. `/../secret`) is rejected with `HTTP 403 Forbidden`.
-- Request header size cap of 8 KiB; oversized requests are rejected immediately.
-- `Content-Type`, `Content-Length`, and `Connection: close` headers on every response.
-- Configurable index file (default: `index.html`) served for directory requests.
-- Optional HTML directory listing for directory requests when no index file is found, with alphabetically sorted entries.
-- Built-in "No site found" fallback page (HTTP 200) when the site directory is empty and directory listing is disabled, so the browser always shows a helpful message rather than a connection error.
-- Placeholder `index.html` written on first run so the server is immediately functional out of the box.
-- Automatic port fallback: if the configured port is in use, the server silently tries the next free port up to 10 times before giving up (configurable via `auto_port_fallback`).
-- Configurable bind address; defaults to `127.0.0.1` (loopback only) with a logged warning when set to `0.0.0.0`.
-- Per-connection Tokio tasks so concurrent requests never block each other.
-
-### MIME Types
-
-- Built-in extension-to-MIME mapping with no external dependency, covering:
- - Text: `html`, `htm`, `css`, `js`, `mjs`, `txt`, `csv`, `xml`, `md`
- - Data: `json`, `jsonld`, `pdf`, `wasm`, `zip`
- - Images: `png`, `jpg`/`jpeg`, `gif`, `webp`, `svg`, `ico`, `bmp`, `avif`
- - Fonts: `woff`, `woff2`, `ttf`, `otf`
- - Audio: `mp3`, `ogg`, `wav`
- - Video: `mp4`, `webm`
- - Unknown extensions fall back to `application/octet-stream`.
-
-### Tor Onion Service (Arti — in-process)
-
-- Embedded Tor support via [Arti](https://gitlab.torproject.org/tpo/core/arti), the official Rust Tor implementation — no external `tor` binary or `torrc` file required.
-- Bootstraps to the Tor network in a background Tokio task; never blocks the HTTP server or console.
-- First run downloads approximately 2 MB of directory consensus data (approximately 30 seconds); subsequent runs reuse the cache and start in seconds.
-- Stable `.onion` address across restarts: the service keypair is persisted to `rusthost-data/arti_state/`; deleting this directory rotates to a new address.
-- Consensus cache stored in `rusthost-data/arti_cache/` for fast startup.
-- Onion address encoded in-process using the v3 `.onion` spec (SHA3-256 checksum + base32) — no dependency on Arti's `DisplayRedacted` formatting.
-- Each inbound Tor connection is bridged to the local HTTP server via `tokio::io::copy_bidirectional` in its own Tokio task.
-- Tor subsystem can be disabled entirely with `[tor] enabled = false`; the dashboard onion section reflects this immediately.
-- Graceful shutdown: the `TorClient` is dropped naturally when the Tokio runtime exits, closing all circuits cleanly — no explicit kill step needed.
-- `.onion` address displayed in the dashboard and logged in a prominent banner once the service is active.
-
-### Interactive Terminal Dashboard
-
-- Full-screen raw-mode terminal UI built with [crossterm](https://github.com/crossterm-rs/crossterm); no external TUI framework.
-- Three screens navigable with single-key bindings:
- - **Dashboard** (default) — live status overview.
- - **Log view** — last 40 log lines, toggled with `[L]`.
- - **Help overlay** — key binding reference, toggled with `[H]`; any other key dismisses it.
-- Dashboard sections:
- - **Status** — local server state (RUNNING with bind address and port, or STARTING) and Tor state (DISABLED / STARTING / READY / FAILED with exit code).
- - **Endpoints** — local `http://localhost:` URL and Tor `.onion` URL (or a dim status hint if Tor is not yet ready).
- - **Site** — directory path, file count, and total size (auto-scaled to B / KB / MB / GB).
- - **Activity** — total request count and error count (errors highlighted in red when non-zero).
- - **Key bar** — persistent one-line reminder of available key bindings.
-- Dashboard redraws at a configurable interval (default: 500 ms).
-- Log view supports optional `HH:MM:SS` timestamp display, toggled via `show_timestamps` in config.
-- Customisable instance name shown in the dashboard header (max 32 characters).
-- Headless / non-interactive mode: set `[console] interactive = false` for systemd or piped deployments; the server prints a plain `http://…` line to stdout instead.
-- Graceful terminal restore on fatal crash: raw mode is disabled and the cursor is shown even if the process exits unexpectedly.
-
-### Configuration
-
-- TOML configuration file (`rusthost-data/settings.toml`) with six sections: `[server]`, `[site]`, `[tor]`, `[logging]`, `[console]`, `[identity]`.
-- Configuration validated at startup with clear, multi-error messages before any subsystem is started.
-- Validated fields include port range, bind IP address format, index file name (no path separators), log level, console refresh rate minimum (100 ms), instance name length (1–32 chars), and absence of control characters in the name.
-- Full default config written automatically on first run with inline comments explaining every option.
-- Reloading site stats (file count and total size) without restart via `[R]` in the dashboard.
-
-### Logging
-
-- Custom `log::Log` implementation; all modules use the standard `log` facade macros (`log::info!`, `log::warn!`, etc.).
-- Dual output: log file on disk (append mode, parent directories created automatically) and an in-memory ring buffer.
-- Ring buffer holds the most recent 1 000 lines and feeds the console log view without any file I/O on each render tick.
-- Log file path configurable relative to `rusthost-data/`; defaults to `logs/rusthost.log`.
-- Configurable log level: `trace`, `debug`, `info`, `warn`, `error`.
-- Timestamped entries in `[LEVEL] [HH:MM:SS] message` format.
-- Logging can be disabled entirely (`[logging] enabled = false`) for minimal-overhead deployments.
-
-### Lifecycle and Startup
-
-- **First-run detection**: if `rusthost-data/settings.toml` does not exist, RustHost initialises the data directory (`site/`, `logs/`), writes defaults, drops a placeholder `index.html`, prints a short getting-started guide, and exits cleanly — no daemon started.
-- **Normal run** startup sequence: load and validate config → initialise logging → build shared state → scan site directory → bind HTTP server → start Tor (if enabled) → start console → open browser (if configured) → enter event loop.
-- Shutdown triggered by `[Q]` keypress or `SIGINT`/`SIGTERM` (via `ctrlc`); sends a watch-channel signal to the HTTP server and console, then waits 300 ms for in-flight connections before exiting.
-- Optional browser launch at startup (`open_browser_on_start`); uses `open` (macOS), `explorer` (Windows), or `xdg-open` (Linux/other).
-- All subsystems share state through an `Arc>`; hot-path request and error counters use separate `Arc` backed by atomics so the HTTP handler never acquires a lock per request.
-
-### Project and Build
-
-- Single binary; no installer, no runtime dependencies beyond the binary itself (Tor included via Arti).
-- Data directory co-located with the binary at `./rusthost-data/`; entirely self-contained.
-- Minimum supported Rust version: 1.86 (required by `arti-client 0.40`).
-- Release profile: `opt-level = 3`, LTO enabled, debug symbols stripped.
-- `cargo-deny` configuration (`deny.toml`) enforcing allowed SPDX licenses (MIT, Apache-2.0, Apache-2.0 WITH LLVM-exception, Zlib, Unicode-3.0) and advisory database checks; known transitive duplicate crates (`mio`, `windows-sys`) skipped with comments.
-- Advisory `RUSTSEC-2023-0071` (RSA Marvin timing attack) acknowledged and suppressed with a documented rationale: the `rsa` crate is a transitive dependency of `arti-client` used exclusively for RSA *signature verification* on Tor directory consensus documents, not decryption; the attack's threat model does not apply.
+- **`auto_reload` config field** — was documented but never implemented. Removed to avoid confusion. The `[R]` key for manual site stat reload is unaffected.
+- **`ctrlc` crate dependency** — replaced by `tokio::signal` (see above).
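The "Single `write_headers` path" entry in the CHANGELOG.md hunk above can be illustrated with a minimal sketch. This is not the project's actual code: the function signatures, the `redirect` helper, and the string-based response head are assumptions made for illustration, and only the header values quoted elsewhere in the changelog are used. The point is that a redirect builds its response through the same function as every other response, so the security header list exists in exactly one place.

```rust
/// Security headers shared by every response (values quoted in the changelog text).
const SECURITY_HEADERS: &[(&str, &str)] = &[
    ("X-Content-Type-Options", "nosniff"),
    ("X-Frame-Options", "SAMEORIGIN"),
    ("Referrer-Policy", "no-referrer"),
];

/// Appends one `Name: value` line to the response head.
fn push_header(head: &mut String, name: &str, value: &str) {
    head.push_str(name);
    head.push_str(": ");
    head.push_str(value);
    head.push_str("\r\n");
}

/// Hypothetical single serialisation point for a response head.
/// `extra` carries per-response headers such as `Location` or `Content-Length`.
fn write_headers(status: u16, reason: &str, extra: &[(&str, &str)]) -> String {
    let mut head = format!("HTTP/1.1 {status} {reason}\r\n");
    for (name, value) in SECURITY_HEADERS {
        push_header(&mut head, name, value);
    }
    for (name, value) in extra {
        push_header(&mut head, name, value);
    }
    head.push_str("\r\n"); // blank line terminates the head
    head
}

/// Redirects delegate to `write_headers` instead of repeating the header list.
fn redirect(location: &str) -> String {
    write_headers(
        301,
        "Moved Permanently",
        &[("Location", location), ("Content-Length", "0")],
    )
}
```

With this shape, adding or changing a security header touches only `SECURITY_HEADERS`, and a call such as `redirect("/docs/")` picks the change up automatically.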
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..eb9dab3
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,165 @@
+# Contributing to RustHost
+
+Thank you for considering a contribution. This document explains the development
+workflow, code standards, and review expectations so your time is spent well.
+
+---
+
+## Table of Contents
+
+1. [Prerequisites](#prerequisites)
+2. [Getting Started](#getting-started)
+3. [Code Standards](#code-standards)
+4. [Testing](#testing)
+5. [Submitting a Pull Request](#submitting-a-pull-request)
+6. [Architecture Overview](#architecture-overview)
+7. [Issue Labels](#issue-labels)
+
+---
+
+## Prerequisites
+
+| Tool | Minimum version | Notes |
+|------|-----------------|-------|
+| Rust (nightly) | see `rust-toolchain.toml` | pinned channel; installed automatically by `rustup` |
+| `cargo-audit` | latest | `cargo install cargo-audit` |
+| `cargo-deny` | latest | `cargo install cargo-deny` |
+
+The pinned nightly toolchain is defined in `rust-toolchain.toml` at the
+repository root. Running any `cargo` command will invoke `rustup` to install it
+automatically on first use.
+
+---
+
+## Getting Started
+
+```sh
+git clone https://github.com/your-org/rusthost
+cd rusthost
+
+# Build and run tests
+cargo test --all
+
+# Run clippy (same flags as CI)
+cargo clippy --all-targets --all-features -- -D warnings
+
+# Run the binary against a local directory
+cargo run -- --serve ./my-site
+```
+
+---
+
+## Code Standards
+
+### Lint gates
+
+Every file must pass the workspace-level gates declared in `Cargo.toml`:
+
+```toml
+[lints.rust]
+unsafe_code = "forbid"
+
+[lints.clippy]
+all = { level = "deny", priority = -1 }
+pedantic = { level = "deny", priority = -1 }
+nursery = { level = "warn", priority = -1 }
+```
+
+Use `#[allow(...)]` sparingly and always include a comment explaining why the
+lint is suppressed. Suppressions must be as narrow as possible — prefer a
+targeted `#[allow]` on a single expression over a module-level gate.
+
+### Comment style
+
+- Explain **why**, not **what** — the code already says what it does.
+- Never use opaque internal tags like `fix H-1` or `fix 3.2` in comments.
+ Replace them with a sentence that makes sense to a new contributor.
+- Doc comments (`///` and `//!`) must be written in full sentences and end with
+ a period.
+
+### No `unsafe`
+
+`unsafe_code = "forbid"` is set at the workspace level. PRs that add `unsafe`
+will not be merged.
+
+### Error handling
+
+All subsystems return `crate::Result<T>` (an alias for `Result<T, AppError>`).
+Avoid `.unwrap()` and `.expect()` in non-test code; use `?` propagation and
+match on `AppError` variants at call sites that need to handle specific cases.
+
+---
+
+## Testing
+
+```sh
+# Unit tests only
+cargo test --lib
+
+# All tests (unit + integration)
+cargo test --all
+
+# A specific test by name
+cargo test percent_decode
+
+# Security audit
+cargo audit
+
+# Dependency policy check
+cargo deny check
+```
+
+Integration tests live in `tests/`. They import items re-exported from
+`src/lib.rs` under `#[cfg(test)]` guards so they do not pollute the public API.
+
+---
+
+## Submitting a Pull Request
+
+1. **Branch naming**: prefix the branch name with `fix/` or `feat/`.
+2. **Commit messages**: use the imperative mood (`Add`, `Fix`, `Remove`), ≤72
+ characters on the subject line. Add a body paragraph for anything that
+ needs explaining.
+3. **One concern per PR**: a PR that mixes a bug fix with a refactor is harder
+ to review and revert.
+4. **Changelog**: add a line under `[Unreleased]` in `CHANGELOG.md` before
+ opening the PR.
+5. **CI must be green**: all three CI jobs (`test`, `audit`, `deny`) must pass.
+ The `test` job runs on Ubuntu, macOS, and Windows.
+
+---
+
+## Architecture Overview
+
+```
+rusthost-cli (src/main.rs)
+ └── runtime::lifecycle::run()
+ ├── logging — file logger + in-memory ring buffer for the console
+ ├── server — hyper HTTP/1.1 accept loop + per-connection handler
+ ├── tor — Arti in-process Tor client + onion service proxy
+ ├── console — crossterm TUI (render task + input task)
+ └── config — TOML loader + typed structs
+```
+
+Key data flows:
+
+- **Request path**: `TcpListener::accept` → `server::handler::handle` →
+ `resolve_path` → file I/O → hyper response.
+- **Tor path**: `tor::init` → Arti bootstrap → `StreamRequest` loop →
+ `proxy_stream` → local `TcpStream` → bidirectional copy.
+- **Shared state**: `SharedState` (an `Arc>`) is the single
+ source of truth for the dashboard. Write only from the lifecycle/event tasks;
+ read from the render task.
+
+---
+
+## Issue Labels
+
+| Label | Meaning |
+|-------|---------|
+| `bug` | Confirmed defect |
+| `security` | Security-relevant issue — see `SECURITY.md` for disclosure policy |
+| `enhancement` | New feature or improvement |
+| `good first issue` | Well-scoped, low-risk; suitable for new contributors |
+| `help wanted` | We'd appreciate community input |
+| `needs-repro` | Cannot reproduce; awaiting steps |
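The "Error handling" section of CONTRIBUTING.md above asks for `?` propagation and for matching on `AppError` variants at call sites. A minimal sketch of that convention follows. It is hand-written with only the standard library so it compiles on its own; the project itself is stated to use `thiserror`. Only three of the listed variants are shown, their payload types are assumptions, and `read_len` and `describe` are hypothetical helpers that do not come from the repository.

```rust
use std::{fmt, io};

/// Stand-in for the `thiserror`-derived enum; payload types are assumptions.
#[derive(Debug)]
enum AppError {
    ConfigValidation(String),
    ServerBind { port: u16, source: io::Error },
    Io(io::Error),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::ConfigValidation(msg) => write!(f, "invalid configuration: {msg}"),
            Self::ServerBind { port, source } => write!(f, "cannot bind port {port}: {source}"),
            Self::Io(err) => write!(f, "I/O error: {err}"),
        }
    }
}

impl std::error::Error for AppError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::ServerBind { source, .. } | Self::Io(source) => Some(source),
            Self::ConfigValidation(_) => None,
        }
    }
}

/// Lets `?` convert a plain `io::Error` into `AppError::Io`.
impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        Self::Io(err)
    }
}

/// Crate-wide alias in the style the guidelines describe.
type Result<T> = std::result::Result<T, AppError>;

/// Hypothetical helper: no `.unwrap()`, the I/O error is propagated with `?`.
fn read_len(path: &str) -> Result<usize> {
    Ok(std::fs::read(path)?.len())
}

/// Hypothetical call site that handles one variant specifically.
fn describe(result: Result<usize>) -> String {
    match result {
        Ok(n) => format!("read {n} bytes"),
        Err(AppError::ServerBind { port, .. }) => format!("port {port} is unavailable"),
        Err(other) => format!("fatal: {other}"),
    }
}
```

Keeping `port` and `source` as named fields is what lets a caller react to a bind failure without parsing an error string, which a boxed, untyped error would not allow.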
diff --git a/Cargo.lock b/Cargo.lock
index 6ec36f6..3454be7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -29,6 +29,21 @@ dependencies = [
"memchr",
]
+[[package]]
+name = "alloc-no-stdlib"
+version = "2.0.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cc7bb162ec39d46ab1ca8c77bf72e890535becd1751bb45f64c597edb4c8c6b3"
+
+[[package]]
+name = "alloc-stdlib"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "94fb8275041c72129eb51b7d0322c29b8387a0386127718b096429201a5d6ece"
+dependencies = [
+ "alloc-no-stdlib",
+]
+
[[package]]
name = "alloca"
version = "0.4.0"
@@ -229,6 +244,7 @@ dependencies = [
"compression-core",
"futures-io",
"pin-project-lite",
+ "tokio",
]
[[package]]
@@ -297,6 +313,12 @@ dependencies = [
"bytemuck",
]
+[[package]]
+name = "atomic-waker"
+version = "1.1.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1505bd5d3d116872e7271a6d4e16d81d0c8570876c8de68093a09ac269d8aac0"
+
[[package]]
name = "autocfg"
version = "1.5.0"
@@ -375,6 +397,27 @@ dependencies = [
"generic-array",
]
+[[package]]
+name = "brotli"
+version = "8.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4bd8b9603c7aa97359dbd97ecf258968c95f3adddd6db2f7e7a5bef101c84560"
+dependencies = [
+ "alloc-no-stdlib",
+ "alloc-stdlib",
+ "brotli-decompressor",
+]
+
+[[package]]
+name = "brotli-decompressor"
+version = "5.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "874bb8112abecc98cbd6d81ea4fa7e94fb9449648c93cc89aa40c81c24d7de03"
+dependencies = [
+ "alloc-no-stdlib",
+ "alloc-stdlib",
+]
+
[[package]]
name = "bstr"
version = "1.12.1"
@@ -540,9 +583,11 @@ version = "0.4.37"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "eb7b51a7d9c967fc26773061ba86150f19c50c0d65c887cb1fbe295fd16619b7"
dependencies = [
+ "brotli",
"compression-core",
"flate2",
"liblzma",
+ "memchr",
"zstd",
"zstd-safe",
]
@@ -891,6 +936,20 @@ dependencies = [
"syn 2.0.117",
]
+[[package]]
+name = "dashmap"
+version = "6.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5041cc499144891f3790297212f32a74fb938e5136a14943f338ef9e0ae276cf"
+dependencies = [
+ "cfg-if",
+ "crossbeam-utils",
+ "hashbrown 0.14.5",
+ "lock_api",
+ "once_cell",
+ "parking_lot_core",
+]
+
[[package]]
name = "data-encoding"
version = "2.10.0"
@@ -1600,6 +1659,12 @@ version = "0.12.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a9ee70c43aaf417c914396645a0fa852624801b24ebb7ae78fe8272889ac888"
+[[package]]
+name = "hashbrown"
+version = "0.14.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e5274423e17b7c9fc20b6e7e208532f9b19825d82dfd615708b70edd83df41f1"
+
[[package]]
name = "hashbrown"
version = "0.15.5"
@@ -1673,6 +1738,29 @@ dependencies = [
"itoa",
]
+[[package]]
+name = "http-body"
+version = "1.0.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
+dependencies = [
+ "bytes",
+ "http",
+]
+
+[[package]]
+name = "http-body-util"
+version = "0.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
+dependencies = [
+ "bytes",
+ "futures-core",
+ "http",
+ "http-body",
+ "pin-project-lite",
+]
+
[[package]]
name = "httparse"
version = "1.10.1"
@@ -1701,6 +1789,41 @@ dependencies = [
"serde",
]
+[[package]]
+name = "hyper"
+version = "1.8.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2ab2d4f250c3d7b1c9fcdff1cece94ea4e2dfbec68614f7b87cb205f24ca9d11"
+dependencies = [
+ "atomic-waker",
+ "bytes",
+ "futures-channel",
+ "futures-core",
+ "http",
+ "http-body",
+ "httparse",
+ "httpdate",
+ "itoa",
+ "pin-project-lite",
+ "pin-utils",
+ "smallvec",
+ "tokio",
+]
+
+[[package]]
+name = "hyper-util"
+version = "0.1.20"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "96547c2556ec9d12fb1578c4eaf448b04993e7fb79cbaad930a656880a6bdfa0"
+dependencies = [
+ "bytes",
+ "http",
+ "http-body",
+ "hyper",
+ "pin-project-lite",
+ "tokio",
+]
+
[[package]]
name = "iana-time-zone"
version = "0.1.65"
@@ -2389,15 +2512,6 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c87def4c32ab89d880effc9e097653c8da5d6ef28e6b539d313baaacfbafcbe"
-[[package]]
-name = "openssl-src"
-version = "300.5.5+3.5.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "3f1787d533e03597a7934fd0a765f0d28e94ecc5fb7789f8053b1e699a56f709"
-dependencies = [
- "cc",
-]
-
[[package]]
name = "openssl-sys"
version = "0.9.112"
@@ -2406,7 +2520,6 @@ checksum = "57d55af3b3e226502be1526dfdba67ab0e9c96fc293004e79576b2b9edb0dbdb"
dependencies = [
"cc",
"libc",
- "openssl-src",
"pkg-config",
"vcpkg",
]
@@ -2602,6 +2715,12 @@ version = "0.2.17"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd"
+[[package]]
+name = "pin-utils"
+version = "0.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
+
[[package]]
name = "pkcs1"
version = "0.7.5"
@@ -3067,13 +3186,19 @@ name = "rusthost"
version = "0.1.0"
dependencies = [
"arti-client",
+ "async-compression",
+ "bytes",
"chrono",
"crossterm",
+ "dashmap",
"data-encoding",
"futures",
+ "http-body-util",
+ "hyper",
+ "hyper-util",
"libc",
"log",
- "openssl",
+ "percent-encoding",
"rusqlite",
"serde",
"sha3",
diff --git a/Cargo.toml b/Cargo.toml
index 3825c51..974c2ab 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -66,25 +66,46 @@ chrono = { version = "0.4", features = ["clock"] }
# OS error codes used in the accept-loop backoff to distinguish EMFILE/ENFILE
# (resource exhaustion → log error) from transient errors (log debug).
libc = "0.2"
-# Vendor OpenSSL source so the binary builds without system libssl-dev headers
-# on Linux. native-tls (pulled transitively through arti-client → tor-rtcompat)
-# links against OpenSSL on Linux; without this feature flag the build fails on
-# any machine that lacks the -dev package. macOS and Windows are unaffected
-# (they use Security.framework and SChannel respectively), but the `vendored`
-# feature is a no-op on those targets so there is no downside to enabling it
-# unconditionally. Build-time cost is ~60 s on first compile; subsequent
-# incremental builds are fast because the OpenSSL objects are cached.
-openssl = { version = "0.10", features = ["vendored"] }
# Force rusqlite's bundled SQLite for cross-compilation targets.
# arti-client pulls rusqlite transitively; declaring it here unifies the feature
# across the whole dep tree so cross-compiling to Linux/Windows works without a
# system sqlite3 library present on the host Mac.
rusqlite = { version = "*", features = ["bundled"] }
+# Per-IP connection tracking for rate limiting (Phase 2 — C-4).
+# DashMap is a concurrent hash map with fine-grained shard locking; it avoids
+# the single global Mutex that would serialise every accept() call.
+dashmap = "6"
+
+# Phase 5 (M-8) — replace hand-rolled percent_decode with the audited upstream crate.
+# The crate handles incomplete escape sequences and non-ASCII bytes correctly;
+# the wrapper adds only the null-byte guard specific to filesystem path use.
+percent-encoding = "2"
+
+# Phase 3 (C-1, H-8, H-9, H-13) — HTTP/1.1 keep-alive, ETag, Range, compression.
+# hyper provides a correct HTTP/1.1 connection loop with keep-alive; replacing
+# the hand-rolled single-shot parser eliminates the 30-45 s Tor page-load
+# penalty caused by Connection: close on every response.
+hyper = { version = "1", features = ["http1", "server"] }
+hyper-util = { version = "0.1", features = ["tokio"] }
+http-body-util = "0.1"
+bytes = "1"
+# async-compression provides Brotli and Gzip stream encoders. Brotli gives
+# significantly better compression ratios than Gzip, which matters a lot for
+# Tor users who pay per-byte in latency.
+async-compression = { version = "0.4", features = ["tokio", "brotli", "gzip"] }
[dev-dependencies]
tempfile = "3"
+[profile.dev.package."*"]
+opt-level = 1 # lightly optimise dependencies so debug builds run faster; costs some extra compile time on the first build
+
+[profile.dev]
+opt-level = 0
+debug = true
+
[profile.release]
-opt-level = 3
-lto = true
-strip = true
+opt-level = 3
+lto = true
+strip = true
+codegen-units = 1 # maximum optimisation; slower link but smaller/faster binary
diff --git a/README.md b/README.md
index ca0aca3..02db506 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,8 @@
-**A self-contained static file server with first-class Tor onion service support — no binaries, no `torrc`, no compromise.**
+**A single-binary static file server with built-in Tor onion service support.**
+No daemons. No config files outside this project. No compromise.
[](https://www.rust-lang.org/)
[](LICENSE)
@@ -23,13 +24,15 @@
## What is RustHost?
-RustHost is a single-binary static file server that brings your content to the clearnet **and** the Tor network simultaneously — with zero external dependencies. Tor is embedded directly into the process via [Arti](https://gitlab.torproject.org/tpo/core/arti), the official Rust Tor implementation. No `tor` daemon, no `torrc`, no system configuration required.
+RustHost is a static file server — you give it a folder of HTML, CSS, and JavaScript files, and it serves them over HTTP. What makes it different is that it also puts your site on the **Tor network** automatically, giving every site a `.onion` address right alongside the normal `localhost` one.
-Drop the binary next to your site files, run it once, and you get:
+It's a single binary with Tor baked in. No installing a separate Tor program, no editing system config files.
-- A local HTTP server ready for immediate use
-- A stable `.onion` v3 address that survives restarts
-- A live terminal dashboard showing you everything at a glance
+**Who is it for?** Developers who want a quick local server with privacy features, self-hosters who want their sites reachable over Tor, and anyone who wants to run a personal site without touching system-level config.
+
+---
+
+## What it looks like
```
┌─ RustHost ─────────────────────────────────────────────────────────┐
@@ -47,185 +50,251 @@ Drop the binary next to your site files, run it once, and you get:
└──────────────────────────────────────────────────────────────────────┘
```
-
+
---
-## Features
-
-### 🌐 HTTP Server
-- Built directly on `tokio::net::TcpListener` — no HTTP framework dependency
-- Handles `GET` and `HEAD` requests; concurrent connections via per-task Tokio workers
-- **Buffered request reading** via `tokio::io::BufReader` — headers read line-by-line, not byte-by-byte
-- **File streaming** via `tokio::io::copy` — memory per connection is bounded by the socket buffer (~256 KB) regardless of file size
-- **30-second request timeout** (configurable via `request_timeout_secs`); slow or idle connections receive `408 Request Timeout`
-- **Semaphore-based connection limit** (configurable via `max_connections`, default 256) — excess connections queue at the OS backlog level rather than spawning unbounded tasks
-- Percent-decoded URL paths with correct multi-byte UTF-8 handling; null bytes (`%00`) are never decoded
-- Query string & fragment stripping before path resolution
-- **Path traversal protection** — every path verified as a descendant of the site root via `canonicalize` (called once at startup, not per request); escapes rejected with `403 Forbidden`
-- Configurable index file, optional HTML directory listing with fully HTML-escaped and URL-encoded filenames, and a built-in fallback page
-- Automatic port selection if the configured port is busy (up to 10 attempts)
-- Request header cap at 8 KiB; `Content-Type`, `Content-Length`, and `Connection: close` on every response
-- **Security headers on every response**: `X-Content-Type-Options`, `X-Frame-Options`, `Referrer-Policy: no-referrer`, `Permissions-Policy`; configurable `Content-Security-Policy` on HTML responses
-- **HEAD responses** include correct `Content-Length` but no body, as required by RFC 7231 §4.3.2
-- Accept loop uses **exponential backoff** on errors and distinguishes `EMFILE` (operator-level error) from transient errors (`ECONNRESET`, `ECONNABORTED`)
-
-### 🧅 Tor Onion Service *(fully working)*
-- Embedded via [Arti](https://gitlab.torproject.org/tpo/core/arti) — the official Rust Tor client — in-process, no external daemon
-- Bootstraps to the Tor network in the background; never blocks your server or dashboard
-- **Stable address**: the v3 service keypair is persisted to `rusthost-data/arti_state/`. Delete the directory to rotate to a new address
-- First run fetches ~2 MB of directory data (~30 s); subsequent starts reuse the cache and are up in seconds
-- Onion address computed fully in-process using the v3 spec (SHA3-256 + base32)
-- Each inbound Tor connection is bridged to the local HTTP listener via `tokio::io::copy_bidirectional`
-- **Port synchronised via `oneshot` channel** — the Tor subsystem always receives the actual bound port, eliminating a race condition that could cause silent connection failures
-- **`TorStatus` reflects mid-session failures** — if the onion service stream terminates unexpectedly, the dashboard transitions to `FAILED (reason)` and clears the displayed `.onion` address
-- Participates in **graceful shutdown** — the run loop watches the shutdown signal via `tokio::select!` and exits cleanly
-- Can be disabled entirely with `[tor] enabled = false`
-
-### 🖥️ Interactive Terminal Dashboard
-- Full-screen raw-mode TUI built with [crossterm](https://github.com/crossterm-rs/crossterm) — no TUI framework
-- Three screens, all keyboard-navigable:
-
- | Key | Screen |
- |-----|--------|
- | *(default)* | **Dashboard** — live status, endpoints, site stats, request/error counters |
- | `L` | **Log view** — last 40 log lines with optional timestamps |
- | `H` | **Help overlay** — key binding reference |
- | `R` | Reload site file count & size without restart |
- | `Q` | Graceful shutdown |
-
-- **Skip-on-idle rendering** — the terminal is only written when the rendered output changes, eliminating unnecessary writes on quiet servers
-- `TorStatus::Failed` displays a human-readable reason string (e.g. `FAILED (stream ended)`) rather than a bare error indicator
-- Keyboard input task failure is detected and reported; the process remains killable via Ctrl-C
-- **Terminal fully restored on all exit paths** — panic hook and error handler both call `console::cleanup()` before exiting, ensuring `LeaveAlternateScreen`, `cursor::Show`, and `disable_raw_mode` always run
-- Configurable refresh rate (default 500 ms); headless mode available for `systemd` / piped deployments
-
-### ⚙️ Configuration
-- TOML file at `rusthost-data/settings.toml`, auto-generated with inline comments on first run
-- Six sections: `[server]`, `[site]`, `[tor]`, `[logging]`, `[console]`, `[identity]`
-- **`#[serde(deny_unknown_fields)]`** on all structs — typos in key names are rejected at startup with a clear error
-- **Typed config fields** — `bind` is `IpAddr`, `log level` is a `LogLevel` enum; invalid values are caught at deserialisation time
-- Startup validation with clear, multi-error messages — nothing starts until config is clean
-- Config and data directory paths overridable via **`--config `** and **`--data-dir `** CLI flags
-
-### 📝 Logging
-- Custom `log::Log` implementation; dual output — append-mode log file + in-memory ring buffer (1 000 lines)
-- Ring buffer feeds the dashboard log view with zero file I/O per render tick
-- **Dependency log filtering** — Arti and Tokio internals at `Info` and below are suppressed by default, keeping the log focused on application events (configurable via `filter_dependencies`)
-- Log file explicitly flushed on graceful shutdown
-- Configurable level (`trace` → `error`) and optional full disable for minimal-overhead deployments
-
-### 🧪 Testing & CI
-- Unit tests for all security-critical functions: `percent_decode`, `resolve_path`, `validate`, `strip_timestamp`, `hsid_to_onion_address`
-- Integration tests (`tests/http_integration.rs`) covering all HTTP core flows via raw `TcpStream`
-- `cargo deny check` runs in CI, enforcing the SPDX license allowlist and advisory database; `audit.toml` consolidated into `deny.toml`
+## Key Features
+
+- **Static file server** — serves HTML, CSS, JS, images, fonts, audio, and video with correct MIME types
+- **Built-in Tor support** — your site gets a stable `.onion` address automatically, no external Tor install needed
+- **Live terminal dashboard** — shows your endpoints, request counts, and logs in a clean full-screen UI
+- **Single binary** — no installer, no runtime dependencies, no system packages to manage
+- **SPA-friendly** — supports React, Vue, and Svelte client-side routing with a fallback-to-`index.html` option
+- **HTTP protocol done right** — keep-alive, `ETag`/conditional GET, range requests, Brotli/Gzip compression
+- **Security headers out of the box** — CSP, HSTS, `X-Content-Type-Options`, `Referrer-Policy`, and more on every response
+- **Per-IP connection limits** — a lock-free per-IP connection cap prevents a single client from taking down your server
+- **Request timeouts**, path traversal protection, and header injection prevention
+- **Hot reload** — press `[R]` to refresh site stats without restarting
+- **Headless mode** — run it in the background under systemd without the TUI
+
+---
+
+## Why Arti instead of the regular Tor?
+
+When most people think of Tor, they think of the `tor` binary — a program written in C that you install separately and talk to via a config file called `torrc`. That works fine, but it means your application depends on an external process you don't control.
+
+**Arti** is the [official Tor Project rewrite of Tor in Rust](https://gitlab.torproject.org/tpo/core/arti). RustHost uses it as a library — Tor runs *inside* the same process as your server, with no external daemon.
+
+Here's a plain-English comparison:
+
+| | Classic `tor` binary | Arti (what RustHost uses) |
+|---|---|---|
+| Language | C | Rust |
+| Memory safety | Manual (prone to CVEs) | Enforced by the compiler (outside `unsafe` code) |
+| Distribution | Separate install required | Compiled into the binary |
+| Config | `torrc` file, separate process | Code-level API, no config file |
+| Maturity | 20+ years, battle-tested | Newer, actively developed |
+| Embeddability | Hard — subprocess + socket | Easy — just a library call |
+
+**Honest tradeoffs:** Arti is still maturing. Some advanced Tor features (bridges, pluggable transports) are not yet stable in Arti. If you need those, the classic `tor` binary is the right tool. For straightforward onion hosting, Arti works well and gives you a much simpler setup.
+
+The Rust memory-safety guarantee matters here specifically because Tor handles untrusted network traffic. A buffer overflow or use-after-free in a C-based Tor implementation is a real historical risk. With Arti, safe Rust rules out that class of bug at compile time; only code inside `unsafe` blocks (in Arti or its dependencies) could reintroduce it.
---
## Quick Start
-### 1. Build
+> **Need help with prerequisites?** See [SETUP.md](SETUP.md) for step-by-step install instructions.
```bash
+# 1. Clone and build
git clone https://github.com/yourname/rusthost
cd rusthost
cargo build --release
+
+# 2. First run — sets up the data directory and exits
+./target/release/rusthost
+
+# 3. Put your files in rusthost-data/site/, then run again
+./target/release/rusthost
```
-> **Minimum Rust version: 1.86** (required by `arti-client 0.40`)
+That's it. Your site is live at `http://localhost:8080`. The `.onion` address appears in the dashboard after about 30 seconds while Tor bootstraps in the background.
-### 2. First run — initialise your data directory
+> **Your stable `.onion` address** is stored in `rusthost-data/arti_state/`. Back this directory up — it contains your keypair. Delete it only if you want a new address.
+
+---
+
+## Full Setup Reference
+
+For detailed install instructions, OS-specific steps, common errors, and how to verify everything is working, see **[SETUP.md](SETUP.md)**.
+
+---
+
+## Usage Examples
+
+### Serve a specific directory without a config file
```bash
-./target/release/rusthost
+./target/release/rusthost --serve ./my-website
```
-On first run, RustHost detects that `rusthost-data/settings.toml` is missing, scaffolds the data directory, writes a default config and a placeholder `index.html`, prints a getting-started guide, and exits. Nothing is daemonised yet.
+Good for quick one-off serving. Skips first-run setup entirely.
+### Run with a custom config location
+
+```bash
+./target/release/rusthost --config /etc/rusthost/settings.toml --data-dir /var/rusthost
```
-rusthost-data/
-├── settings.toml ← your config (edit freely)
-├── site/
-│ └── index.html ← placeholder, replace with your files
-├── logs/
-│ └── rusthost.log
-├── arti_cache/ ← Tor directory consensus (auto-managed)
-└── arti_state/ ← your stable .onion keypair (back this up!)
+
+Useful for running multiple instances or deploying under systemd.
+
+### Run headless (no terminal UI)
+
+Set `interactive = false` in `settings.toml`:
+
+```toml
+[console]
+interactive = false
```
-### 3. Serve
+RustHost will print the URL to stdout and log everything to the log file. Perfect for running as a background service.
-```bash
-./target/release/rusthost
+### Disable Tor entirely
+
+```toml
+[tor]
+enabled = false
```
-The dashboard appears. Your site is live on `http://localhost:8080`. Tor bootstraps in the background — your `.onion` address appears in the **Endpoints** panel once ready (~30 s on first run).
+Useful if you just want a fast local HTTP server and don't need the `.onion` address.
-### CLI flags
+### Enable SPA routing (React, Vue, Svelte)
+
+```toml
+[site]
+spa_routing = true
+```
+
+Unknown paths fall back to `index.html` instead of returning 404. This is what client-side routers expect.
+
+---
+
+## All CLI Flags
```
rusthost [OPTIONS]
Options:
+ --serve Serve a directory directly, no settings.toml needed
--config Path to settings.toml (default: rusthost-data/settings.toml)
- --data-dir Path to data directory (default: rusthost-data/ next to binary)
+ --data-dir Path to the data directory (default: ./rusthost-data/)
--version Print version and exit
--help Print this help and exit
```
---
-## Configuration Reference
+## Configuration
+
+The config file lives at `rusthost-data/settings.toml` and is created automatically on first run with comments explaining every option.
```toml
[server]
-port = 8080
-bind = "127.0.0.1" # set "0.0.0.0" to expose on LAN (logs a warning)
-index_file = "index.html"
-directory_listing = false
-auto_port_fallback = true
-max_connections = 256 # semaphore cap on concurrent connections
-request_timeout_secs = 30 # seconds before idle connection receives 408
-content_security_policy = "default-src 'self'" # applied to HTML responses only
+port = 8080
+bind = "127.0.0.1" # use "0.0.0.0" to expose on your LAN
+index_file = "index.html"
+directory_listing = false # show file lists for directories
+auto_port_fallback = true # try next port if 8080 is taken
+max_connections = 256 # max simultaneous connections
+request_timeout_secs = 30 # seconds before an idle connection gets 408
+content_security_policy = "default-src 'self'" # applied to HTML responses only
[site]
-root = "rusthost-data/site"
+root = "rusthost-data/site"
+spa_routing = false # set true for React/Vue/Svelte apps
+error_404 = "" # path to a custom 404.html
+error_503 = "" # path to a custom 503.html
[tor]
-enabled = true # set false to skip Tor entirely
+enabled = true # set false to skip Tor entirely
[logging]
-enabled = true
-level = "info" # trace | debug | info | warn | error
-path = "logs/rusthost.log"
-filter_dependencies = true # suppress Arti/Tokio noise at info and below
+enabled = true
+level = "info" # trace | debug | info | warn | error
+path = "logs/rusthost.log"
+filter_dependencies = true # suppress Arti/Tokio noise at info level
[console]
-interactive = true # false for systemd / piped deployments
-refresh_ms = 500 # minimum 100
+interactive = true # false for systemd / background use
+refresh_ms = 500
show_timestamps = false
open_browser_on_start = false
[identity]
-name = "RustHost" # 1–32 chars, shown in dashboard header
+name = "RustHost" # shown in the dashboard header (max 32 chars)
+```
+
+> Typos in key names are caught at startup. If you write `bund = "127.0.0.1"` instead of `bind`, RustHost will tell you exactly which field is unknown and exit before starting.
+
+---
+
+## Project Structure
+
+After first run, your directory will look like this:
+
+```
+rusthost-data/
+├── settings.toml Your config file — edit this freely
+├── site/ Drop your website files here
+│ └── index.html Placeholder — replace with your own
+├── logs/
+│ └── rusthost.log Rotating access and event log (owner-read only)
+├── arti_cache/ Tor directory data — auto-managed, safe to delete
+└── arti_state/ Your .onion keypair — BACK THIS UP
+```
+
+And in the repo:
+
+```
+src/
+├── config/ Config loading and validation
+├── console/ Terminal dashboard (crossterm)
+├── logging/ Log file + in-memory ring buffer
+├── runtime/ Startup, shutdown, and event loop
+├── server/ HTTP server (handler, MIME types, path resolution)
+└── tor/ Arti integration and onion service bridge
```
---
## Built-in MIME Types
-No external dependency. RustHost ships with a handwritten extension map covering:
+RustHost ships a handwritten MIME map — no external lookup or database.
| Category | Extensions |
-|----------|-----------|
+|----------|------------|
| Text | `html` `htm` `css` `js` `mjs` `txt` `csv` `xml` `md` |
-| Data | `json` `jsonld` `pdf` `wasm` `zip` |
-| Images | `png` `jpg/jpeg` `gif` `webp` `svg` `ico` `bmp` `avif` |
+| Data | `json` `jsonld` `pdf` `wasm` `zip` `ndjson` |
+| Images | `png` `jpg` `jpeg` `gif` `webp` `svg` `ico` `bmp` `avif` |
| Fonts | `woff` `woff2` `ttf` `otf` |
-| Audio | `mp3` `ogg` `wav` |
+| Audio | `mp3` `ogg` `wav` `opus` `flac` |
| Video | `mp4` `webm` |
+| 3D | `glb` |
+| PWA | `webmanifest` |
+
+Anything not in this list gets `application/octet-stream`.
-Unknown extensions fall back to `application/octet-stream`.
+---
+
+## Security
+
+A quick summary of what RustHost does to keep things safe:
+
+| Threat | What RustHost does |
+|--------|-------------------|
+| Path traversal (e.g. `/../etc/passwd`) | Every path is resolved with `canonicalize` and checked against the site root. Escapes get a `403`. |
+| XSS via crafted filenames in directory listings | Filenames are HTML-escaped in link text and percent-encoded in `href` attributes. |
+| Slow-loris DoS (deliberately slow clients) | 30-second request timeout — connections that don't send headers in time get a `408`. |
+| Connection exhaustion | Semaphore cap at 256 concurrent connections by default. |
+| Header injection | `sanitize_header_value` strips all control characters from values (not just CR/LF). |
+| Large file memory exhaustion | Files are streamed with `tokio::io::copy` — memory per connection is bounded by the socket buffer. |
+| `.onion` address leakage | `Referrer-Policy: no-referrer` prevents your `.onion` URL from appearing in `Referer` headers. |
+| Config typos silently using defaults | `#[serde(deny_unknown_fields)]` on all config structs — unknown keys are a hard startup error. |
+| Terminal injection via instance name | The `name` field is validated against all control characters at startup. |
+
+**Note on RUSTSEC-2023-0071 (RSA Marvin timing attack):** This advisory is acknowledged and suppressed in `deny.toml` with a documented rationale. The `rsa` crate comes in as a transitive dependency of `arti-client` and is used only for *verifying* RSA signatures on Tor directory documents — not for decryption. The Marvin attack requires a decryption oracle, which is not present here.
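
The path traversal row in the table above can be sketched with the standard library alone. This is a minimal illustration, not RustHost's actual resolver: the names `Lookup` and `resolve_in_root` are hypothetical, and the sketch assumes the site root was canonicalized once beforehand. A request such as `/../secret.txt` maps to `Lookup::Forbidden` (a `403`), and a missing file maps to `Lookup::NotFound`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Outcome of mapping a request path onto the filesystem (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Lookup {
    /// The target exists and lies inside the site root.
    File(PathBuf),
    /// The target exists but escapes the site root (answer with 403).
    Forbidden,
    /// The target does not exist (answer with 404).
    NotFound,
}

/// Resolve `request_path` against an already canonicalized `canonical_root`.
pub fn resolve_in_root(canonical_root: &Path, request_path: &str) -> io::Result<Lookup> {
    // Treat the request path as relative, so "/a/b" becomes "<root>/a/b".
    let candidate = canonical_root.join(request_path.trim_start_matches('/'));

    // canonicalize() follows symlinks and collapses ".." components.
    // It fails with NotFound when the target does not exist.
    let canonical = match fs::canonicalize(&candidate) {
        Ok(path) => path,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Lookup::NotFound),
        Err(e) => return Err(e),
    };

    // Path::starts_with compares whole components, so "/srv/site-evil"
    // is not mistaken for a descendant of "/srv/site".
    if canonical.starts_with(canonical_root) {
        Ok(Lookup::File(canonical))
    } else {
        Ok(Lookup::Forbidden)
    }
}
```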
---
@@ -246,35 +315,21 @@ Unknown extensions fall back to `application/octet-stream`.
└─────────────────────────────────────┘
```
-All subsystems share state through `Arc>`. Hot-path request and error counters use a separate `Arc` backed by atomics — the HTTP handler **never acquires a lock per request**.
-
-The HTTP server and Tor subsystem share a `tokio::sync::Semaphore` that caps concurrent connections. The bound port is communicated to Tor via a `oneshot` channel before the accept loop begins, eliminating the startup race condition present in earlier versions.
+All subsystems share state through `Arc>`. Hot-path counters (request counts, error counts) live in a separate `Arc` backed by atomics, so the HTTP handler never acquires a lock per request.
-Shutdown is coordinated via a `watch` channel: `[Q]`, `SIGINT`, or `SIGTERM` signals all subsystems simultaneously. In-flight HTTP connections are tracked in a `JoinSet` and given up to 5 seconds to complete. The log file is explicitly flushed before the process exits.
+Shutdown is coordinated via a `watch` channel. `[Q]`, `SIGINT`, and `SIGTERM` all signal every subsystem at the same time. In-flight connections are tracked in a `JoinSet` and given up to 5 seconds to finish before the process exits.
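
The lock-free counter idea described above can be illustrated with a small standard-library sketch. The `Metrics` struct and its method names are assumptions made for this example, not RustHost's real types. Handlers would hold an `Arc<Metrics>` and call `record_request()` per request, while the dashboard reads `snapshot()` on each render tick, so neither side takes a lock.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hot-path counters backed by atomics (hypothetical type for illustration).
#[derive(Default)]
pub struct Metrics {
    requests: AtomicU64,
    errors: AtomicU64,
}

impl Metrics {
    /// Count one handled request. Relaxed ordering is enough for a
    /// statistics counter: only the total matters, not ordering
    /// relative to other memory operations.
    pub fn record_request(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Count one failed request.
    pub fn record_error(&self) {
        self.errors.fetch_add(1, Ordering::Relaxed);
    }

    /// Read both counters as `(requests, errors)`. The two loads are
    /// independent, so the pair is not an atomic snapshot.
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.requests.load(Ordering::Relaxed),
            self.errors.load(Ordering::Relaxed),
        )
    }
}
```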
---
-## Security
+## Contributing
+
+Contributions are welcome. A few things worth knowing before you start:
-| Concern | Mitigation |
-|---------|-----------|
-| Path traversal (requests) | `std::fs::canonicalize` + descendant check per request; `403` on escape |
-| Path traversal (config) | `site.directory` and `logging.file` validated against `..`, absolute paths, and path separators at startup |
-| Directory listing XSS | Filenames HTML-entity-escaped in link text; percent-encoded in `href` attributes |
-| Header overflow | 8 KiB hard cap; oversized requests rejected immediately |
-| Slow-loris DoS | 30-second request timeout; `408` sent on expiry |
-| Connection exhaustion | Semaphore cap (default 256); excess connections queue at OS level |
-| Memory exhaustion (large files) | Files streamed via `tokio::io::copy`; per-connection memory bounded by socket buffer |
-| Bind exposure | Defaults to loopback (`127.0.0.1`); warns loudly on `0.0.0.0` |
-| ANSI/terminal injection | `instance_name` validated against all control characters (`is_control`) at startup |
-| Security response headers | `X-Content-Type-Options`, `X-Frame-Options`, `Referrer-Policy: no-referrer`, `Permissions-Policy`, configurable `Content-Security-Policy` |
-| `.onion` URL leakage | `Referrer-Policy: no-referrer` prevents the `.onion` address from appearing in `Referer` headers sent to third-party resources |
-| Tor port race | Bound port delivered to Tor via `oneshot` channel before accept loop starts |
-| Silent Tor failure | `TorStatus` transitions to `Failed(reason)` and onion address is cleared when the service stream ends |
-| Percent-decode correctness | Multi-byte UTF-8 sequences decoded correctly; null bytes (`%00`) never decoded |
-| Config typos | `#[serde(deny_unknown_fields)]` on all structs |
-| License compliance | `cargo-deny` enforces SPDX allowlist at CI time |
-| [RUSTSEC-2023-0071](https://rustsec.org/advisories/RUSTSEC-2023-0071) | Suppressed with rationale in `deny.toml`: the `rsa` crate is a transitive dep of `arti-client` used **only** for signature *verification* on Tor directory documents — the Marvin timing attack's threat model (decryption oracle) does not apply |
+- The lint gates are strict: `clippy::all`, `clippy::pedantic`, and `clippy::nursery`. Run `cargo clippy --all-targets -- -D warnings` before opening a PR.
+- Run the full test suite with `cargo test --all`.
+- All code paths should be covered by the existing tests, or new tests added for anything new.
+- See [CONTRIBUTING.md](CONTRIBUTING.md) for the full workflow, architecture notes, and PR checklist.
+- To report a security issue privately, see [SECURITY.md](SECURITY.md).
---
diff --git a/SETUP.md b/SETUP.md
new file mode 100644
index 0000000..2c04f05
--- /dev/null
+++ b/SETUP.md
@@ -0,0 +1,354 @@
+# Setting Up RustHost
+
+This guide walks you through everything you need to get RustHost running — from installing Rust to verifying your `.onion` address is live.
+
+---
+
+## Prerequisites
+
+### Rust
+
+RustHost requires **Rust 1.86 or newer**. This is set as the minimum because the Tor library it uses (`arti-client`) needs features from that release.
+
+To check what version you have:
+
+```bash
+rustc --version
+```
+
+If you don't have Rust installed, the easiest way is [rustup](https://rustup.rs/):
+
+```bash
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
+```
+
+Follow the prompts, then restart your terminal (or run `source ~/.cargo/env`). Verify with:
+
+```bash
+rustc --version
+cargo --version
+```
+
+To update an existing Rust install:
+
+```bash
+rustup update stable
+```
+
+### Git
+
+You need Git to clone the repo. Most systems already have it.
+
+```bash
+git --version
+```
+
+If not:
+- **macOS**: `xcode-select --install` (installs Git as part of the Xcode CLI tools)
+- **Linux**: `sudo apt install git` (Debian/Ubuntu) or `sudo dnf install git` (Fedora)
+- **Windows**: Download from [git-scm.com](https://git-scm.com/)
+
+### Build tools
+
+Rust needs a C linker. On most systems this is already present.
+
+- **macOS**: You'll need the Xcode Command Line Tools — run `xcode-select --install` if you haven't already.
+- **Linux**: Install `build-essential` (Debian/Ubuntu; it already includes `gcc`) or `gcc` and `make` (Fedora/RHEL).
+- **Windows**: Install the [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/). When the installer asks, select "Desktop development with C++".
+
+---
+
+## Installing RustHost
+
+### Step 1 — Clone the repository
+
+```bash
+git clone https://github.com/yourname/rusthost
+cd rusthost
+```
+
+### Step 2 — Build in release mode
+
+```bash
+cargo build --release
+```
+
+This downloads and compiles all dependencies (including Arti, which is the Rust Tor library — this takes a few minutes on first build). The final binary ends up at:
+
+```
+target/release/rusthost (Linux / macOS)
+target\release\rusthost.exe (Windows)
+```
+
+> **Slow build?** The first build is always slow because Cargo is compiling everything from scratch. Subsequent builds are much faster thanks to the cache.
+
+### Step 3 — First run (data directory setup)
+
+Run the binary once from the project directory:
+
+```bash
+./target/release/rusthost
+```
+
+On first run, RustHost detects that `rusthost-data/settings.toml` doesn't exist and does the following:
+
+- Creates the `rusthost-data/` directory in the current working directory (the default `--data-dir` is `./rusthost-data/`)
+- Writes a default `settings.toml` with all options commented
+- Creates `rusthost-data/site/` with a placeholder `index.html`
+- Creates `rusthost-data/logs/`
+- Prints a getting-started message and exits
+
+Nothing is started yet — this is just setup.
+
+### Step 4 — Add your site files
+
+Replace (or edit) the placeholder file:
+
+```bash
+# Put your HTML files in rusthost-data/site/
+cp -r /path/to/your/site/* rusthost-data/site/
+```
+
+### Step 5 — Start the server
+
+```bash
+./target/release/rusthost
+```
+
+The terminal dashboard appears. Your site is live at `http://localhost:8080`.
+
+Tor bootstraps in the background — your `.onion` address will appear in the **Endpoints** section of the dashboard after roughly 30 seconds on first run (subsequent starts reuse the cache and are much faster).
+
+---
+
+## OS-Specific Notes
+
+### macOS
+
+Everything works out of the box. If you see a firewall prompt asking whether to allow RustHost to accept incoming connections, click Allow.
+
+If you want to expose your server on your local network (not just `localhost`), change the bind address in `settings.toml`:
+
+```toml
+[server]
+bind = "0.0.0.0"
+```
+
+RustHost will log a warning when you do this — that's expected and intentional.
+
+### Linux
+
+Works the same as macOS. If you're running under systemd, see the [Running as a systemd service](#running-as-a-systemd-service) section below.
+
+On some minimal Linux installs you may need to install the OpenSSL development headers:
+
+```bash
+# Debian/Ubuntu
+sudo apt install pkg-config libssl-dev
+
+# Fedora
+sudo dnf install pkg-config openssl-devel
+```
+
+### Windows
+
+Build and run commands are the same, but use backslashes and the `.exe` extension:
+
+```powershell
+cargo build --release
+.\target\release\rusthost.exe
+```
+
+Note that file permissions (e.g., restricting the log file to owner-only) behave differently on Windows. The security restrictions around key directories and log files are enforced where the Windows API supports it.
+
+---
+
+## Running as a systemd service
+
+If you want RustHost to start automatically on boot, here's a simple service unit.
+
+First, move your binary and data directory somewhere stable:
+
+```bash
+sudo cp target/release/rusthost /usr/local/bin/rusthost
+sudo mkdir -p /var/rusthost
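+sudo cp -r rusthost-data/* /var/rusthost/
+# The unit below runs as www-data, which must be able to write logs and Tor state
+sudo chown -R www-data:www-data /var/rusthost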
+```
+
+Set `interactive = false` in `/var/rusthost/settings.toml` so RustHost doesn't try to draw a TUI:
+
+```toml
+[console]
+interactive = false
+```
+
+Create the service file:
+
+```bash
+sudo nano /etc/systemd/system/rusthost.service
+```
+
+```ini
+[Unit]
+Description=RustHost static file server
+After=network.target
+
+[Service]
+Type=simple
+User=www-data
+ExecStart=/usr/local/bin/rusthost --config /var/rusthost/settings.toml --data-dir /var/rusthost
+Restart=on-failure
+RestartSec=5s
+
+[Install]
+WantedBy=multi-user.target
+```
+
+Enable and start it:
+
+```bash
+sudo systemctl daemon-reload
+sudo systemctl enable rusthost
+sudo systemctl start rusthost
+sudo systemctl status rusthost
+```
+
+View logs:
+
+```bash
+journalctl -u rusthost -f
+```
+
+---
+
+## Verifying Everything Works
+
+### 1. Check the HTTP server
+
+Open a browser and go to `http://localhost:8080`. You should see your site (or the placeholder page on a fresh install).
+
+From the terminal:
+
+```bash
+curl -I http://localhost:8080
+```
+
+You should see a `200 OK` response with security headers like `X-Content-Type-Options` and `X-Frame-Options`.
+
+### 2. Check the Tor onion address
+
+Wait for the dashboard to show `TOR ● READY`. The `.onion` address will appear in the **Endpoints** section.
+
+Open the Tor Browser and navigate to that address. Your site should load.
+
+> **First run only:** Tor needs to download ~2 MB of directory data on first run. This usually takes 20–40 seconds. Subsequent starts reuse the cache and are ready in a few seconds.
+
+### 3. Check the logs
+
+Press `[L]` in the dashboard to switch to the log view. You should see startup messages and, once Tor is ready, a prominent banner with your `.onion` address.
+
+The log file is at `rusthost-data/logs/rusthost.log`.
+
+---
+
+## Common Errors and Fixes
+
+### `error: package 'arti-client v0.40.x' cannot be built because it requires rustc 1.86.0`
+
+Your Rust version is too old. Run `rustup update stable` and try again.
+
+### `Address already in use (os error 98)`
+
+Port 8080 is taken by something else. Either:
+- Stop the other service, or
+- Change the port in `settings.toml`:
+
+```toml
+[server]
+port = 9090
+```
+
+Or enable auto port fallback (it's on by default):
+
+```toml
+[server]
+auto_port_fallback = true
+```
+
+### ``error: linker `link.exe` not found`` (Windows)
+
+The Microsoft C++ Build Tools aren't installed or aren't on the path. Install them from [visualstudio.microsoft.com/visual-cpp-build-tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/) and restart your terminal.
+
+### Tor gets stuck on "STARTING" forever
+
+This is usually a network issue. Check that:
+- You have an internet connection
+- Your firewall isn't blocking outbound connections on port 443 or 9001 (the ports Tor relays most commonly listen on)
+- You're not behind a strict corporate or school network that blocks Tor
+
+If you're on a network that blocks Tor, you may need [bridges](https://bridges.torproject.org/). Arti bridge support is still maturing — this is one area where using the classic `tor` binary is currently more reliable.
+
+### The terminal is messed up after RustHost crashes
+
+RustHost installs a panic hook that attempts to restore the terminal on crash. If it fails anyway, run:
+
+```bash
+reset
+```
+
+Or close and reopen your terminal.
+
+### `Unknown field "bund"` (or similar) at startup
+
+You have a typo in `settings.toml`. RustHost rejects unknown config keys at startup. Check the spelling of the field name in the config — the error message will tell you exactly which field it doesn't recognise.
+
+### My `.onion` address changed
+
+If `rusthost-data/arti_state/` was deleted or moved, RustHost generates a new keypair and a new address. The state directory is what makes the address stable across restarts — back it up.
+
+---
+
+## Backing Up Your `.onion` Keypair
+
+Your stable `.onion` address is tied to a keypair stored in:
+
+```
+rusthost-data/arti_state/
+```
+
+**Back this directory up somewhere safe.** If you lose it, you lose your `.onion` address permanently and will get a new one on the next start. There is no recovery.
+
+To restore a backed-up keypair, copy the `arti_state/` directory back before starting RustHost.
+
+---
+
+## Updating RustHost
+
+```bash
+git pull
+cargo build --release
+```
+
+Your `rusthost-data/` directory is not touched by the build — your config, site files, and keypair are safe.
+
+---
+
+## Uninstalling
+
+Delete the binary and the `rusthost-data/` directory:
+
+```bash
+rm target/release/rusthost
+rm -rf rusthost-data/
+```
+
+If you ran it as a systemd service:
+
+```bash
+sudo systemctl stop rusthost
+sudo systemctl disable rusthost
+sudo rm /etc/systemd/system/rusthost.service
+sudo rm /usr/local/bin/rusthost
+sudo rm -rf /var/rusthost
+sudo systemctl daemon-reload
+```
diff --git a/audit.toml b/audit.toml
index 5554569..bc5f6bd 100644
--- a/audit.toml
+++ b/audit.toml
@@ -1,6 +1,6 @@
# cargo-audit configuration for rusthost
#
-# fix G-3 — previously this file contained a bare `ignore` entry with no
+# previously this file contained a bare `ignore` entry with no
# rationale, creating a silent suppression that future developers could not
# evaluate. Rationale is now documented here to match deny.toml.
#
diff --git a/docs/rusthost_audit-askforrepeatafterfixesareapplied.md b/docs/rusthost_audit-askforrepeatafterfixesareapplied.md
new file mode 100644
index 0000000..332a97f
--- /dev/null
+++ b/docs/rusthost_audit-askforrepeatafterfixesareapplied.md
@@ -0,0 +1,541 @@
+# RustHost — Full Project Audit
+
+> Audited from source archive (Archive.zip) and https://github.com/csd113/RustHost
+> Rust edition 2021 · MSRV 1.90 · Arti 0.40 · Tokio 1
+
+---
+
+## Preamble
+
+This is a thoughtful, iteratively-improved codebase. The internal "fix X.Y" comments reveal at least two full self-review passes, and the results show: `unsafe` is forbidden at the workspace level, the Tor integration migrated from subprocess to Arti in-process, `NonZeroU16`/`IpAddr` push validation to serde, and the path-resolution security model is correct. The developer clearly knows Rust.
+
+That said, the project is **not elite**. The gaps listed below are not style nits — they are functional blockers that would stop real users from relying on it, or that represent genuine attack surface. Read this as: "here's exactly what it would take to make this worth deploying."
+
+---
+
+## 1. Architecture & Design
+
+### 🔴 CRITICAL — No HTTP/1.1 keep-alive or HTTP/2
+
+Every response carries `Connection: close`. The server handles exactly one request per TCP connection and drops the socket. For clearnet this is merely slow; **for Tor this is a project-killing design flaw.** Each Tor circuit requires a multi-RTT rendezvous handshake (~1–3 s on a typical path). A page with 15 assets (HTML + CSS + JS + images) forces 15 sequential rendezvous handshakes. A typical page load over this server will take **15–45 seconds** on Tor.
+
+**Fix:** Add HTTP/1.1 keep-alive in the request loop inside `handler.rs`. Parse the `Connection:` request header and re-enter `receive_request` on the same stream. Long-term, HTTP/2 via `h2` or `hyper` eliminates head-of-line blocking entirely.
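
The decision the request loop has to make can be sketched as a pure function. This is a hedged illustration only: `HttpVersion` and `wants_keep_alive` are hypothetical names, not existing code in `handler.rs`, and the rule encoded is the standard HTTP/1.x persistence rule (a `close` token always wins, HTTP/1.1 persists by default, HTTP/1.0 persists only when `keep-alive` is requested).

```rust
/// HTTP versions this sketch distinguishes (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HttpVersion {
    Http10,
    Http11,
}

/// Decide whether the connection should stay open after the response.
/// `connection_header` is the raw value of the `Connection:` request
/// header, if the client sent one.
pub fn wants_keep_alive(version: HttpVersion, connection_header: Option<&str>) -> bool {
    let mut close = false;
    let mut keep_alive = false;

    // The header is a comma-separated list of case-insensitive tokens.
    if let Some(value) = connection_header {
        for token in value.split(',') {
            let token = token.trim();
            if token.eq_ignore_ascii_case("close") {
                close = true;
            } else if token.eq_ignore_ascii_case("keep-alive") {
                keep_alive = true;
            }
        }
    }

    match version {
        // HTTP/1.1 connections persist unless the client asks to close.
        HttpVersion::Http11 => !close,
        // HTTP/1.0 connections close unless the client opts in.
        HttpVersion::Http10 => keep_alive && !close,
    }
}
```

A handler loop would call this once per parsed request and re-enter its read step on the same stream only when the result is `true`.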
+
+### 🟠 HIGH — `canonical_root` is never refreshed after startup
+
+In `server/mod.rs`, `canonical_root` is canonicalized once at server start. If the `site/` directory is deleted and recreated while the server is running (e.g., during a content deployment), `canonical_root` points to the now-dead inode. All requests return `Resolved::Fallback`. Pressing `[R]` updates `site_file_count` but **does not update `canonical_root`**. Recovery requires a full process restart.
+
+**Fix:** Re-resolve `canonical_root` inside the `Reload` event handler in `events.rs` and push the new value to the server via a `watch` channel.
+
+### 🟠 HIGH — Tor and HTTP semaphores are sized identically but compete for different resources
+
+The T-2 fix correctly sizes both semaphores to `max_connections`. However, a Tor stream + its proxied HTTP connection occupy **two** file descriptors simultaneously. Under max load, the process holds `2 × max_connections` open sockets, but the OS `ulimit` and `EMFILE` guard only knows about the Tor semaphore. The effective capacity is half what the operator configured.
+
+**Fix:** Document this clearly. Consider sizing the Tor semaphore to `max_connections / 2` or adding a dedicated Tor connection limit to the config.
+
+### 🟡 MEDIUM — No `[profile.dev]` optimization
+
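+First `cargo build` (dev) with vendored OpenSSL and the full Arti tree takes 90–120 seconds on a modern machine. There's no `[profile.dev]` section in `Cargo.toml` to set `opt-level = 1` for dependencies. That setting would not shorten the first build (optimizing dependencies adds compile time), but dependencies are compiled once and then cached, and lightly optimized crypto and Tor code typically makes dev binaries run far faster without the cost of a full release build.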
+
+```toml
+[profile.dev.package."*"]
+opt-level = 1
+```
+
+### 🟡 MEDIUM — Module boundary between `tor` and `server` is leaky
+
+`tor/mod.rs` calls `TcpStream::connect(local_addr)` directly against the HTTP server. This creates an implicit contract (the HTTP server must be listening on a specific `IpAddr:port`) that bypasses all the `SharedState` machinery. A refactor that changes how the HTTP server exposes its address would silently break Tor proxying.
+
+**Fix:** Pass the bound address through `SharedState.actual_port` + `config.server.bind` (which already happens in lifecycle), and have `tor::init` receive a `SocketAddr` rather than separate `IpAddr`/`u16` arguments.
+
+### 🟡 MEDIUM — Single log file + simplistic rotation
+
+`logging/mod.rs` rotates `rusthost.log` → `rusthost.log.1` at 100 MB. Only one backup is kept. There's no timestamp in the rotated filename, no gzip, and no hook to signal an external log manager. On a server running at DEBUG level with Arti noise enabled, 100 MB fills in hours.
+
+---
+
+## 2. Code Quality
+
+### 🔴 CRITICAL — `onion_address_from_pubkey` test is a tautology
+
+The `reference_onion` function in `tor/mod.rs` tests uses the **same algorithm** as the production function. It tests determinism and format, but a consistent implementation bug in both would pass. There is no cross-check against a known external test vector.
+
+The Tor Rendezvous Specification defines exact test vectors. One should be hardcoded:
+
+```rust
+// Known-answer test: the expected string must come from an independent
+// implementation (not this codebase), computed offline.
+// The literal below is a PLACEHOLDER, not a real address. A real v3 address
+// is 56 base32 characters and always ends in `d`, because the final encoded
+// byte is the version byte 0x03.
+#[test]
+fn hsid_to_onion_address_known_vector() {
+    let pubkey = [0u8; 32];
+    assert_eq!(onion_address_from_pubkey(&pubkey), "<56-char value computed offline>.onion");
+}
+```
+
+### 🔴 CRITICAL — `copy_with_idle_timeout` is not actually an idle timeout
+
+In `tor/mod.rs`, `copy_with_idle_timeout` uses `tokio::time::sleep(IDLE_TIMEOUT)` alongside `copy_bidirectional`. **`sleep` starts when the call begins, not when I/O stalls.** A legitimate large file download (say, a 50 MB video) that takes 65 seconds of continuous data transfer is killed at second 60 even though the connection was never idle. The variable name and doc comment say "idle" but the implementation is a wall-clock cap.
+
+**Fix:** Use a proper idle timeout that resets on each read/write. This requires a custom bidirectional copy loop that arms a `tokio::time::Sleep` and resets it on each successful I/O operation, or wraps each read/write in `tokio::time::timeout`.
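+
+The difference between an idle timeout and a wall-clock cap comes down to when the timer is re-armed. The sketch below shows only that bookkeeping, using plain `std` types so it can be tested with synthetic timestamps. `IdleTimer` and its methods are hypothetical names, not code from `tor/mod.rs`; in the real async loop the same idea would be expressed by resetting a pinned `tokio::time::Sleep` after each successful read or write.
+
+```rust
+use std::time::{Duration, Instant};
+
+/// Tracks the time of the most recent successful read or write.
+/// (Hypothetical helper for illustration.)
+struct IdleTimer {
+    timeout: Duration,
+    last_activity: Instant,
+}
+
+impl IdleTimer {
+    fn new(timeout: Duration, now: Instant) -> Self {
+        Self { timeout, last_activity: now }
+    }
+
+    /// Call after every successful read or write to re-arm the timer.
+    fn record_activity(&mut self, now: Instant) {
+        self.last_activity = now;
+    }
+
+    /// True only if no I/O has happened for at least `timeout`.
+    fn is_idle(&self, now: Instant) -> bool {
+        now.duration_since(self.last_activity) >= self.timeout
+    }
+}
+```
+
+With a 60-second timeout, a transfer that moves data every second for 65 seconds never trips `is_idle`, whereas a timer armed once at the start would fire at second 60.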
+
+### 🟠 HIGH — `write_redirect` duplicates all security headers
+
+`write_redirect` in `handler.rs` manually re-lists every security header that `write_headers` also emits. Any future header addition (e.g., `Cross-Origin-Opener-Policy`) must be applied in two places. This is already a bug: `write_redirect` emits CSP on all redirects regardless of content-type, while `write_headers` correctly gates CSP to HTML responses.
+
+**Fix:** Remove `write_redirect` and call `write_headers` with `status: 301, reason: "Moved Permanently"`, adding a `Location` header via a new optional parameter or a pre-call `stream.write_all`.
+
+### 🟠 HIGH — No per-IP request rate limiting
+
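+The `Semaphore` limits total *concurrent* connections, but a single IP can consume all 256 slots simultaneously and DoS every other user. There's no per-IP connection limit, no request-rate limit, and no backpressure signal to the caller. Note that onion-service traffic does not pass through exit nodes: it reaches the HTTP server through the local Arti proxy, so every Tor client appears to come from the loopback address and a per-IP limit cannot tell Tor clients apart. Fairness on the Tor side would need a separate per-circuit or per-stream cap in the proxy path.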
+
+**Fix:** Add a `HashMap` of active connections per peer, checked at accept time. This fits naturally in the accept loop in `server/mod.rs`.
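+
+A minimal sketch of such a per-peer counter, assuming a fixed cap per address. `PerIpLimiter`, `try_acquire`, and `release` are hypothetical names chosen for illustration; the real accept loop would also need to wrap this in whatever synchronization it already uses and call `release` when the connection task finishes.
+
+```rust
+use std::collections::HashMap;
+use std::net::IpAddr;
+
+/// Counts active connections per peer address. (Hypothetical helper.)
+struct PerIpLimiter {
+    max_per_ip: usize,
+    active: HashMap<IpAddr, usize>,
+}
+
+impl PerIpLimiter {
+    fn new(max_per_ip: usize) -> Self {
+        Self { max_per_ip, active: HashMap::new() }
+    }
+
+    /// Returns true and records the connection if the peer is under its cap.
+    fn try_acquire(&mut self, ip: IpAddr) -> bool {
+        let current = self.active.get(&ip).copied().unwrap_or(0);
+        if current >= self.max_per_ip {
+            return false;
+        }
+        self.active.insert(ip, current + 1);
+        true
+    }
+
+    /// Call when a connection from `ip` closes; empty entries are removed
+    /// so the map does not grow without bound.
+    fn release(&mut self, ip: IpAddr) {
+        if let Some(count) = self.active.get_mut(&ip) {
+            *count -= 1;
+            if *count == 0 {
+                self.active.remove(&ip);
+            }
+        }
+    }
+}
+```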
+
+### 🟡 MEDIUM — `receive_request` ignores all headers after the request line
+
+The function reads all headers into a `String` for the 8 KiB check but never parses them. `Host`, `Content-Length`, `Transfer-Encoding`, `If-None-Match`, `Range`, `Accept-Encoding` are all silently discarded. This isn't a bug today, but it makes adding any feature that requires inspecting request headers a large refactor.
+
+**Fix:** Parse headers into a lightweight `HashMap<&str, &str>` (or a dedicated struct) after reading them. This enables conditional GET, range requests, compression negotiation, and keep-alive without touching the read logic.
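+
+One possible shape for that parsing step, as a sketch. `parse_headers` is a hypothetical function, and it uses owned `String`s rather than `&str` borrows to keep the example simple. It assumes the raw text begins with the request line and uses CRLF line endings; repeated headers overwrite earlier ones, which a real implementation may want to handle differently.
+
+```rust
+use std::collections::HashMap;
+
+/// Parses the header lines that follow the request line.
+/// Names are lowercased because HTTP field names are case-insensitive.
+fn parse_headers(raw: &str) -> HashMap<String, String> {
+    let mut headers = HashMap::new();
+    // Skip the request line; stop at the blank line that ends the headers.
+    for line in raw.split("\r\n").skip(1) {
+        if line.is_empty() {
+            break;
+        }
+        if let Some((name, value)) = line.split_once(':') {
+            headers.insert(name.trim().to_ascii_lowercase(), value.trim().to_string());
+        }
+    }
+    headers
+}
+```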
+
+### 🟡 MEDIUM — Dashboard TorStatus message says "polling"
+
+`dashboard.rs` line: `TorStatus::Starting => yellow("STARTING — polling for .onion address…")`.
+The Arti integration is fully event-driven — there is no polling. Stale copy-paste from the old C-Tor subprocess implementation.
+
+### 🟡 MEDIUM — `sanitize_header_value` is incomplete
+
+The function strips `\r` and `\n` from header values. It does not strip:
+- Null bytes (`\x00`) — rejected by RFC 9110 but some parsers accept them
+- Other C0 control characters (`\x01`–`\x1f`, `\x7f`) — legal in filenames on Linux
+
+For the `Location` header, a filename containing `\x00` after CR/LF stripping could still produce an anomalous URL. Add a broader control-character strip:
+
+```rust
+.filter(|&c| !c.is_control())
+```
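+
+As a standalone sketch, the broader filter could look like the function below. `strip_control_chars` is a hypothetical name, not the existing `sanitize_header_value`. Note that `char::is_control` also removes tab, which is acceptable for a `Location` value but may not suit every header.
+
+```rust
+/// Drops every Unicode control character: CR, LF, NUL, the other C0 codes,
+/// DEL, and the C1 range.
+fn strip_control_chars(value: &str) -> String {
+    value.chars().filter(|c| !c.is_control()).collect()
+}
+```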
+
+### 🟡 MEDIUM — `default_data_dir` warning string contains stray whitespace
+
+In `lifecycle.rs`, the fallback `eprintln!` warning in `default_data_dir` contains a multi-line string with a leading run of spaces at the line join:
+```
+"Warning: cannot determine executable path ({e}); using ./rusthost-data…"
+```
+This renders as a very long single line with ~18 spaces mid-sentence. Use `\n` + indentation instead.
+
+### 🟡 MEDIUM — `tor/mod.rs`: the log message for "resetting retry counter" contains leading whitespace
+
+```rust
+log::info!(
+ "Tor: resetting retry counter — last disruption was over an hour ago."
+)
+```
+Same issue as above — line continuation whitespace is included in the string.
+
+### 🟡 MEDIUM — `open_browser` spawns a child process without logging the outcome
+
+In `runtime/mod.rs`, `open_browser` ignores the `Result` from `Command::spawn()` on all platforms. If `xdg-open` isn't installed (common on headless Linux servers), the user gets no feedback. The `[O]` key silently does nothing.
+
+**Fix:** Log a `warn!` when the spawn fails.
+
+### 🟡 MEDIUM — `percent_decode` reinvents `percent-encoding`
+
+The custom percent-decoder in `handler.rs` is 60 lines long, covers null-byte injection, and handles multi-byte UTF-8 correctly. All of this is already provided by the `percent-encoding` crate (3 lines). The custom implementation is a maintenance liability: if a bug is found in `percent_decode`, it won't be caught by an upstream security advisory.
+
+### 🟡 MEDIUM — `LogFile::write_line` checks file size on every write
+
+```rust
+if let Ok(meta) = self.file.metadata() {
+ if meta.len() >= MAX_LOG_BYTES {
+```
+
+This is an `fstat` syscall on every log record. At DEBUG level with Arti noise, this could be thousands of syscalls per second. Cache the size and only re-stat after every N writes (or increment an internal counter).
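+
+A sketch of the counter approach, assuming the writer is the only process appending to the file. `CountingLog` is a hypothetical wrapper, generic over `Write` so it can be exercised against an in-memory buffer; the rotation itself is left to the caller.
+
+```rust
+use std::io::{self, Write};
+
+/// Wraps a writer and tracks bytes written, so the rotation check needs
+/// no `metadata()` call per record. (Hypothetical helper.)
+struct CountingLog<W: Write> {
+    inner: W,
+    bytes_written: u64,
+    max_bytes: u64,
+}
+
+impl<W: Write> CountingLog<W> {
+    /// `initial_len` would come from one `metadata()` call at open time.
+    fn new(inner: W, initial_len: u64, max_bytes: u64) -> Self {
+        Self { inner, bytes_written: initial_len, max_bytes }
+    }
+
+    /// Writes one line and reports whether the caller should rotate now.
+    fn write_line(&mut self, line: &str) -> io::Result<bool> {
+        self.inner.write_all(line.as_bytes())?;
+        self.inner.write_all(b"\n")?;
+        self.bytes_written += line.len() as u64 + 1;
+        Ok(self.bytes_written >= self.max_bytes)
+    }
+}
+```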
+
+### 🟡 MEDIUM — `AppState` fields are not reset between test runs (integration tests)
+
+The integration tests in `tests/http_integration.rs` create a fresh `AppState::new()` per test, which is correct. However, `LOG_BUFFER` is a `OnceLock` global in `logging/mod.rs`. If `logging::init` is called in one test run and the test binary is reused, the second call silently returns an error (the logger is already set). The tests currently skip logging, which avoids this, but it means the logging path is not integration-tested.
+
+### 🟡 MEDIUM — `scan_site` returns `(u32, u64)` but file count could theoretically saturate
+
+`count = count.saturating_add(1)` saturates at `u32::MAX` (about 4.29 billion files) rather than wrapping, so beyond that point the count would silently stop increasing. Practically not an issue, but returning `u64` for both would be consistent.
+
+---
+
+## 3. Performance
+
+### 🔴 CRITICAL — No HTTP keep-alive (see Architecture §1)
+
+Covered above. The single largest performance issue in the codebase by an order of magnitude.
+
+### 🟠 HIGH — No response compression (gzip/brotli)
+
+All files are served raw. For Tor users on a ~100–500 kbps effective circuit, a 200 KB minified JavaScript file takes 3–16 seconds. Brotli compression typically achieves 70–85% reduction on text assets. Without compression, the Tor user experience is extremely poor.
+
+**Fix:** Check `Accept-Encoding` request header (once header parsing is added), and compress responses with the `async-compression` crate. Pre-compress files at startup to avoid per-request CPU overhead.
+
+### 🟠 HIGH — No conditional GET (ETag / Last-Modified)
+
+All responses carry `Cache-Control: no-store`. There is no `ETag`, `Last-Modified`, `If-None-Match`, or `If-Modified-Since` support. Every browser reload re-fetches every asset, regardless of whether it changed. This is anti-caching by design, which is appropriate for Tor (you don't want assets cached with the onion address in the referrer), but it should be a conscious per-resource policy, not a blanket prohibition. At minimum, `Cache-Control: no-store` should only apply to HTML and not to immutable assets.
+
+### 🟠 HIGH — No `sendfile` / zero-copy file transfer
+
+`tokio::io::copy` reads file data into a userspace buffer then writes it to the socket. On Linux, `sendfile(2)` skips the userspace copy entirely, which can substantially reduce CPU cost for large file transfers. The `nix` crate exposes a `sendfile` wrapper, and io_uring-based runtimes such as `tokio-uring` are another route; either would need benchmarking against the current path.
+
+### 🟡 MEDIUM — `write_headers` allocates a `String` per response
+
+Every call to `write_headers` creates a heap-allocated `String` via `format!`. For static sites under load, this is many small allocations per second. Using a stack-allocated `ArrayString` or writing directly to the `TcpStream` in multiple `write_all` calls would eliminate this.
+
+### 🟡 MEDIUM — `build_directory_listing` buffers the entire HTML response
+
+The directory listing HTML is built in a single `String` before sending. For directories with thousands of entries this is slow. A streaming approach (write HTML head, iterate entries line-by-line) would reduce peak memory and time-to-first-byte.
+
+### 🟡 MEDIUM — `render` acquires the `AppState` lock twice per tick
+
+In `console/mod.rs`:
+```rust
+let mode = state.read().await.console_mode.clone(); // lock 1
+// ...
+let s = state.read().await; // lock 2
+```
+
+A single `read()` that extracts both mode and the full state would halve the lock acquisitions per render tick.
+
+### 🟡 MEDIUM — No Range request support
+
+Large media files (video, audio) cannot be seeked. Streaming players and download managers depend on `Range: bytes=N-M` requests. The server does not reject them (the method is GET, which it allows), but because request headers are never parsed, the `Range` header is silently ignored and the full file is sent with a 200 status. Some clients reject that full response outright when they asked for a range.
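+
+The parsing half of range support is small. The sketch below handles only the single-range forms (`bytes=0-99`, `bytes=500-`, `bytes=-200`); `parse_byte_range` is a hypothetical helper, and it deliberately returns `None` for multi-range requests so the caller can fall back to a full 200 response.
+
+```rust
+/// Parses a single-range `Range` header value against a file of `len` bytes.
+/// Returns inclusive (start, end) offsets, or None if the value is
+/// malformed, multi-range, or unsatisfiable.
+fn parse_byte_range(value: &str, len: u64) -> Option<(u64, u64)> {
+    let spec = value.trim().strip_prefix("bytes=")?;
+    if spec.contains(',') || len == 0 {
+        return None;
+    }
+    let (first, last) = spec.split_once('-')?;
+    let (first, last) = (first.trim(), last.trim());
+    if first.is_empty() {
+        // Suffix form: the final N bytes of the file.
+        let n: u64 = last.parse().ok()?;
+        if n == 0 {
+            return None;
+        }
+        return Some((len.saturating_sub(n), len - 1));
+    }
+    let start: u64 = first.parse().ok()?;
+    if start >= len {
+        return None;
+    }
+    let end = if last.is_empty() {
+        len - 1
+    } else {
+        // An end past EOF is clamped to the last byte.
+        last.parse::<u64>().ok()?.min(len - 1)
+    };
+    if end < start {
+        return None;
+    }
+    Some((start, end))
+}
+```
+
+A satisfiable result maps to a `206 Partial Content` response with `Content-Range: bytes start-end/len`, followed by a seek to `start` and a copy of `end - start + 1` bytes.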
+
+### 🟡 MEDIUM — `scan_site` BFS traversal is not depth-bounded
+
+A deeply nested directory tree (or a symlink cycle that somehow slips through the inode check on Windows) could consume unbounded memory. The traversal is iterative, so the risk is heap growth in the `queue`, which grows proportionally to directory count, rather than stack exhaustion. Consider adding a depth limit.
+
+---
+
+## 4. Security
+
+### 🔴 CRITICAL — No per-IP rate limiting (see Code Quality §2)
+
+A single client can open 256 simultaneous connections (the full pool) and deny service to every other user. This is especially dangerous on a Tor hidden service because:
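+1. Onion-service connections do not arrive via exit nodes; they reach the HTTP server through the local Arti proxy, so all Tor clients share one loopback source address and an IP-level limit or ban cannot single out the attacker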
+2. The attacker pays very little (Tor circuit setup is cheap for the attacker)
+
+### 🔴 CRITICAL — `Cache-Control: no-store` on every response forces Tor Browser to re-fetch every asset
+
+Tor Browser already isolates caches per first-party origin. With `no-store` on all resources, the browser cannot reuse any asset it has already downloaded, so every page view re-fetches every sub-resource over a slow Tor circuit. For multi-asset pages this makes the site feel **functionally broken**. The intention to prevent caching (good) is implemented too broadly (bad).
+
+**Fix:** Apply `no-store` only to HTML documents. Immutable assets (hashed filenames, images, fonts) should use `Cache-Control: max-age=31536000, immutable`.
+
+### 🟠 HIGH — Tor keypair directory is fixed at `arti_state/`; no key backup/export path
+
+`ensure_private_dir` correctly sets `0o700` on Unix, but:
+1. On **Windows**, directory permissions are not set at all. The keypair is world-readable to any local user.
+2. There is no mechanism to **back up** the keypair. If `arti_state/` is accidentally deleted, the `.onion` address is permanently lost.
+3. There is no documented way to **import** an existing keypair (e.g., migrate from another host).
+
+### 🟠 HIGH — Log file leaks the `.onion` address
+
+`tor/mod.rs` logs the onion address at `INFO` level in a prominent banner. The log file is created with `0o600` (owner read/write only), which is correct. However:
+1. If the operator runs `rusthost-cli > output.txt`, the onion address appears in a world-readable file
+2. If the operator shares logs for debugging, the onion address is in the paste
+
+**Fix:** Hash or truncate the address in the log line. Show only the first 8 characters plus `…` to identify it while not fully exposing it.
+
+### 🟠 HIGH — `open_browser` passes the URL to a shell command without explicit sanitization
+
+In `runtime/mod.rs`, the Windows path does:
+```rust
+std::process::Command::new("cmd").args(["/c", "start", "", url])
+```
+
+The URL is constructed from `IpAddr` + `u16`, so the values are safe today. But `open_browser` is `pub` in `crate::runtime`, callable from anywhere with an arbitrary string. If a future caller passes an attacker-influenced URL (e.g., from the onion address or a config field), the empty-string third argument to `start` doesn't fully protect against shell expansion on Windows. Document or enforce that only internal URLs may be passed.
+
+### 🟠 HIGH — No HTTPS option for the clearnet server
+
+When `bind = "0.0.0.0"`, the server listens on all interfaces with plaintext HTTP. There is no TLS termination, no self-signed certificate generation, and no ACME integration. A user who exposes the server to a local network (e.g., home lab) has no way to get HTTPS without a reverse proxy.
+
+### 🟡 MEDIUM — `expose_dotfiles` check runs on the URL path, not the resolved filesystem path
+
+In `resolve_path`, the dot-file check iterates `Path::new(url_path).components()` where `url_path` is already percent-decoded. This is correct. However, the check runs on the URL path, not on the final resolved filesystem path. A symlink named `safe-name` that points to `.git/` inside the site root would bypass the dot-file filter (the symlink's own name doesn't start with `.`, but the target is a dot-directory).
+
+**Fix:** After resolving the canonical path, check whether any component of the path **relative to `canonical_root`** starts with `.`.
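+
+A sketch of that post-resolution check, assuming both paths have already been canonicalized by the caller. `has_hidden_component` is a hypothetical helper; it treats a path outside the root as hidden so the caller fails closed.
+
+```rust
+use std::path::{Component, Path};
+
+/// Returns true if any component of `resolved`, taken relative to `root`,
+/// starts with '.'. Components of `root` itself are not inspected.
+fn has_hidden_component(root: &Path, resolved: &Path) -> bool {
+    match resolved.strip_prefix(root) {
+        Ok(rel) => rel.components().any(|c| match c {
+            Component::Normal(name) => name.to_string_lossy().starts_with('.'),
+            _ => false,
+        }),
+        // Outside the root: report as hidden so the request is refused.
+        Err(_) => true,
+    }
+}
+```
+
+Because the check runs on the canonical target, a symlink named `safe-name` that resolves into `.git/` is caught even though its own name has no leading dot.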
+
+### 🟡 MEDIUM — `build_directory_listing` generates URLs with percent-encoded components but no `<base>` tag
+
+The directory listing uses `percent_encode_path(name)` for hrefs. If the current URL path contains a trailing `/` from a redirect, the relative href `base/encoded_name` may resolve incorrectly on some browser/proxy combinations. Use absolute paths (`/path/to/dir/file`) to eliminate ambiguity.
+
+### 🟡 MEDIUM — No `Strict-Transport-Security` header
+
+Even though TLS isn't supported, the HSTS header should be documented as a TODO. Adding HTTPS later without HSTS leaves returning visitors open to downgrade (SSL-stripping) attacks, because nothing tells the browser to refuse plaintext.
+
+### 🟡 MEDIUM — `--config` and `--data-dir` CLI flags accept absolute paths with no restriction
+
+A user who passes `--config /etc/passwd` will get a likely TOML parse error, but `--data-dir /tmp/attacker-controlled` could be used to point the server at attacker-controlled content. This is a misconfiguration concern, not a true security issue, but it's worth documenting.
+
+---
+
+## 5. Reliability & Stability
+
+### 🟠 HIGH — Tor reconnect loop uses linear backoff, not exponential
+
+`RETRY_BASE_SECS = 30` and the delay is `30 * attempt`. Across 5 attempts: 30 s, 60 s, 90 s, 120 s, 150 s. This is linear. True exponential backoff (`30 * 2^attempt`, capped at e.g. 600 s) is more respectful of the Tor network under outage conditions and is the usual practice for retry loops.
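+
+A sketch of the capped exponential schedule, assuming a 0-based attempt counter so the first retry still waits 30 s. `backoff_secs` and `RETRY_CAP_SECS` are hypothetical names, and the 600 s cap is the example value from the text rather than a tuned figure.
+
+```rust
+const RETRY_BASE_SECS: u64 = 30;
+const RETRY_CAP_SECS: u64 = 600; // hypothetical cap
+
+/// Delay before retry number `attempt` (0-based), doubling each time and
+/// clamped to the cap. Overflow-safe for any `attempt`.
+fn backoff_secs(attempt: u32) -> u64 {
+    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
+    RETRY_BASE_SECS.saturating_mul(factor).min(RETRY_CAP_SECS)
+}
+```
+
+Adding random jitter on top of this delay is a common refinement, since it keeps many instances from retrying in lockstep after a shared outage.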
+
+### 🟠 HIGH — Shutdown drain timeout of 8 seconds may be insufficient
+
+In `lifecycle.rs`, the total shutdown budget is 8 seconds split between the HTTP server drain (5 s) and Tor cleanup (whatever's left, often 3 s or less). Tor circuits with active transfers can take longer to close gracefully. On slow Tor paths, `copy_bidirectional` might still be blocked. The `_` return from `timeout` means the process continues regardless, which is correct, but the 8-second hard cap means Tor connections are abruptly terminated rather than gracefully closed.
+
+### 🟡 MEDIUM — If `port_tx` send fails (channel dropped before use), lifecycle returns an error with no cleanup
+
+In `server/mod.rs`, if the bind fails, `port_tx` is dropped without sending. `lifecycle.rs` catches the `Err` from the oneshot and returns `AppError::ServerStartup`. But by this point, logging may have been initialized and the async runtime is still running. The error path in `main` calls `console::cleanup()` and `eprintln!`, which is correct, but it doesn't explicitly shut down the Tor task (it was never started) or flush the log.
+
+**Fix:** Add `logging::flush()` to the error path in `main`.
+
+### 🟡 MEDIUM — `LOG_BUFFER` is a global `OnceLock`; `logging::init` fails silently if called twice
+
+`log::set_logger` returns `Err` if a logger is already set, and the code maps this to `AppError::LogInit`. This is correct. However, `LOG_BUFFER.get_or_init(...)` silently no-ops on the second call. In a test binary that calls `logging::init` from multiple `#[tokio::test]` tests, only the first test gets a fresh ring buffer. This is a test isolation issue, not a production issue, but it means the logging path is not reliably tested.
+
+### 🟡 MEDIUM — `AppState::console_mode` is read under `RwLock` then immediately read again
+
+In `console/mod.rs`, `render()` reads `console_mode` under a read lock, releases it, then re-acquires a read lock to read the full `AppState`. Between the two acquisitions, `console_mode` could change (e.g., from `Dashboard` to `LogView`). The rendered output would then be inconsistent with the state read on the second lock. This is a TOCTOU issue in the rendering path — cosmetic only (next render tick corrects it), but worth fixing.
+
+### 🟡 MEDIUM — `scan_site` fails loudly on the first `read_dir` error
+
+If any subdirectory inside `site/` is unreadable (e.g., `0o000` permissions), `scan_site` returns `Err` and the file count reverts to `0`. The user sees "0 files, 0 B" in the dashboard with a log warning. The function should skip unreadable directories (logging a per-directory warning) rather than aborting the entire scan.
+
+---
+
+## 6. Cross-Platform Support
+
+### 🟠 HIGH — Keypair directory permissions not enforced on Windows
+
+`ensure_private_dir` applies `0o700` only under `#[cfg(unix)]`. On Windows, the directory is created with default ACLs (typically readable by all local users in the same session). The Tor service keypair is therefore **world-readable on Windows**. The Windows ACL equivalent (`SetNamedSecurityInfo`) should be applied via the `windows-acl` or `winapi` crates, or the limitation must be prominently documented in the README.
+
+### 🟡 MEDIUM — `is_fd_exhaustion` returns `false` on non-Unix, non-Windows targets
+
+On WASM, UEFI, and other exotic targets, accept errors that are actually FD exhaustion are logged at `debug` level instead of `error`. This is low-risk but worth documenting.
+
+### 🟡 MEDIUM — `xdg-open` is not available on all Linux environments
+
+On headless servers, Docker containers, minimal Alpine images, and WSL without a display, `xdg-open` either doesn't exist or silently fails. The `[O]` key does nothing with no user feedback.
+
+### 🟡 MEDIUM — Log file permissions not set on Windows
+
+`OpenOptions::mode(0o600)` is `#[cfg(unix)]` only. On Windows, the log file is created with default permissions (likely readable by all users in the group). The log contains the `.onion` address.
+
+### 🟡 MEDIUM — No cross-compilation CI
+
+`audit.toml` and `deny.toml` are present but there is no CI configuration. Cross-compilation to `x86_64-pc-windows-gnu` and `aarch64-unknown-linux-gnu` is claimed as working (via bundled SQLite and vendored OpenSSL), but this is untested in automation.
+
+---
+
+## 7. Developer Experience
+
+### 🔴 CRITICAL — No README.md
+
+There is no `README.md` in the repository. A new visitor to https://github.com/csd113/RustHost sees only the file list. There is no explanation of what the project does, how to build it, how to use it, or why it exists. This is the single biggest barrier to adoption and contribution.
+
+### 🟠 HIGH — MSRV is 1.90 (unreleased as of mid-2025)
+
+`rust-version = "1.90"` in `Cargo.toml`. Rust 1.90 is not yet stable. A new contributor who runs `cargo build` with the stable toolchain gets:
+
+```
+error: package `rusthost` cannot be built because it requires rustc 1.90.0 or later
+```
+
+Beyond that terse error, there is no documentation or toolchain file (`rust-toolchain.toml`) to tell them what to do. Add a `rust-toolchain.toml` specifying `channel = "nightly"` or the correct beta channel, and document this in the README.
+
+### 🟠 HIGH — No CI configuration
+
+No `.github/workflows/`, no `Makefile`, no `justfile`. The `cargo-deny` (`deny.toml`) and `cargo-audit` (`audit.toml`) configurations are present but never run. A PR that introduces a yanked dependency or a RUSTSEC advisory will merge silently.
+
+**Minimum CI matrix:**
+```
+cargo build --release
+cargo test
+cargo clippy -- -D warnings
+cargo deny check
+cargo audit
+```
+
+### 🟠 HIGH — `[R]` reload does not reload configuration
+
+The dashboard says "press [R] to reload" which users will interpret as "re-read settings.toml." It only rescans the file count. Config changes (e.g., changing `csp_level` or `max_connections`) require a full restart. Document this limitation prominently or implement config hot-reload.
+
+### 🟡 MEDIUM — Internal "fix X.Y" comments are meaningless to outside contributors
+
+The codebase is dense with references like `// fix H-3`, `// fix T-7`, `// fix 4.5`. These are clearly from an internal issue tracker or review document that is not in the repository. To an outside contributor, these comments are noise that obscures the actual rationale.
+
+**Fix:** Replace these with human-readable comments explaining *why* the fix was necessary, not what issue number it closes. E.g., `// fix H-3` → `// Strip CR/LF to prevent CRLF injection into Location header`.
+
+### 🟡 MEDIUM — CLI parser doesn't support `--flag=value` syntax
+
+`--config /path` works; `--config=/path` produces `error: unrecognised argument '--config=/path'`. Standard CLI convention supports both. Consider replacing the hand-rolled parser with `clap` to get this, plus `--help` auto-generation, `--` end-of-flags, short flags (`-c`/`-d`), and shell completion generation.
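+
+If the hand-rolled parser is kept, supporting `--flag=value` takes only a small pre-processing step. The sketch below uses a hypothetical `split_flag` helper; it says nothing about how the existing parser is structured.
+
+```rust
+/// Splits "--flag=value" into ("--flag", Some("value")). A plain "--flag"
+/// yields ("--flag", None) so the caller reads the value from the next arg.
+/// Only the first '=' splits, so values may themselves contain '='.
+fn split_flag(arg: &str) -> (&str, Option<&str>) {
+    match arg.split_once('=') {
+        Some((flag, value)) if flag.starts_with("--") => (flag, Some(value)),
+        _ => (arg, None),
+    }
+}
+```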
+
+### 🟡 MEDIUM — No `--port` or `--no-tor` CLI flags for quick ad-hoc use
+
+The most common developer workflow is "I want to quickly serve a directory on a specific port without editing a TOML file." There's no `rusthost-cli --port 3000 --no-tor ./my-site`. Every use requires the full config file setup.
+
+### 🟡 MEDIUM — No structured access log
+
+The server logs requests at `DEBUG` level via `log::debug!("Connection from {peer}")`, but there's no access log in Combined Log Format (or any structured format). Operators cannot pipe logs to a SIEM, run `goaccess`, or analyze traffic patterns.
+
+---
+
+## 8. Feature Completeness
+
+### 🔴 CRITICAL — No SPA (Single Page Application) fallback routing
+
+There is no option to serve `index.html` for all 404 responses. React, Vue, Svelte, and Angular apps all require this for client-side routing to work. A request to `/about` on a React SPA returns 404 from this server; only `/` works. This is table stakes for any static host.
+
+**Fix:** Add `fallback_to_index = false` to `[site]` config. When true, return `index.html` for all 404s that don't match a file.
+
+### 🔴 CRITICAL — No HTTPS / TLS support
+
+The server has no TLS. For public Tor hidden service use, this doesn't matter (Tor provides its own encryption). But for clearnet access, plaintext HTTP is increasingly blocked by browsers (HSTS preloading, mixed-content errors). Providing a `--generate-cert` flag with a self-signed certificate, or ACME support, would make the tool usable for clearnet hosting.
+
+### 🟠 HIGH — No custom error pages (404.html, 500.html)
+
+404 responses are plain-text "Not Found". Every professional static host supports custom error pages. Add `error_404 = "404.html"` to `[site]` config.
+
+### 🟠 HIGH — No gzip/brotli compression (see Performance §2)
+
+### 🟠 HIGH — No Range request (206 Partial Content) support
+
+Audio/video players, download managers, and PDF viewers depend on range requests. Without it, a 500 MB video file cannot be seeked or resumed.
+
+### 🟡 MEDIUM — No URL redirect/rewrite rules
+
+No `[[redirects]]` or `[[rewrites]]` configuration table. Migrating a site from another host requires the destination host to preserve all URLs. Custom redirects (e.g., `/old-page → /new-page`) are a baseline feature.
+
+### 🟡 MEDIUM — No `--serve` one-shot mode
+
+You cannot do `rusthost-cli --serve ./docs` to instantly serve a directory without first running through the first-run setup flow. This is the primary use case for developers.
+
+### 🟡 MEDIUM — Missing MIME types
+
+The MIME table is missing:
+- `.webmanifest` → `application/manifest+json` (required for PWA)
+- `.m4v`, `.mov` → video types
+- `.flac`, `.opus` → audio types
+- `.glb`, `.gltf` → 3D model types (increasingly common in modern web)
+- `.ndjson` → `application/x-ndjson`
+- `.ts` → `video/mp2t` (also used for TypeScript — context-dependent)
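+
+A sketch of how these entries might be added as a lookup. `extra_mime_type` is a hypothetical function, and the media types for `.m4v`, `.mov`, `.flac`, `.opus`, `.glb`, and `.gltf` are commonly used values that should be checked against the IANA registry before merging. `.ts` is omitted because it is ambiguous.
+
+```rust
+/// Content types for the extensions in the list above, matched
+/// case-insensitively. Returns None for anything else.
+fn extra_mime_type(ext: &str) -> Option<&'static str> {
+    match ext.to_ascii_lowercase().as_str() {
+        "webmanifest" => Some("application/manifest+json"),
+        "m4v" => Some("video/x-m4v"),
+        "mov" => Some("video/quicktime"),
+        "flac" => Some("audio/flac"),
+        "opus" => Some("audio/ogg"),
+        "glb" => Some("model/gltf-binary"),
+        "gltf" => Some("model/gltf+json"),
+        "ndjson" => Some("application/x-ndjson"),
+        _ => None,
+    }
+}
+```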
+
+### 🟡 MEDIUM — No directory listing sort: dirs-first, newest-first options
+
+Files are sorted alphabetically only. No option for directories-first, size-ascending, or modification-time-descending. Minor but frequently requested.
+
+### 🟡 MEDIUM — No config hot-reload via filesystem watch
+
+`inotify` (Linux), `kqueue` (macOS), and `ReadDirectoryChangesW` (Windows) can all trigger config reload when `settings.toml` changes. The `notify` crate provides a cross-platform API. This is especially useful for headless deployments where the dashboard is disabled.
+
+---
+
+## 9. Documentation & Open Source Readiness
+
+### 🔴 CRITICAL — No README.md (see Developer Experience)
+
+### 🟠 HIGH — No CHANGELOG or release history
+
+### 🟠 HIGH — No CONTRIBUTING.md
+
+No code style guide, no PR checklist, no instructions for running tests locally.
+
+### 🟠 HIGH — `authors = []` in Cargo.toml
+
+No author credit. Makes security contact and attribution impossible.
+
+### 🟡 MEDIUM — No SECURITY.md
+
+No responsible disclosure policy. For a security-sensitive tool (Tor hidden service hosting), this is particularly important.
+
+### 🟡 MEDIUM — `lib.rs` re-exports everything as `pub`
+
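+All modules are `pub` to enable integration tests. This exposes an enormous, unstable API surface. Use `pub(crate)` for internal items and only `pub` the actual public interface. Note that integration tests under `tests/` are compiled as separate crates, so they cannot see `pub(crate)` items and the library is not built with `cfg(test)` for them; test-only helpers need to sit behind `#[doc(hidden)] pub` or an opt-in Cargo feature instead.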
+
+### 🟡 MEDIUM — No architecture diagram or design document
+
+The Tor integration (Arti in-process, rendezvous, stream proxying) is non-trivial. An `ARCHITECTURE.md` with a data-flow diagram would help contributors understand the lifecycle before touching the code.
+
+### 🟡 MEDIUM — `deny.toml` and `audit.toml` are unconfigured CI dead weight
+
+Both files exist but are never run. Either hook them into CI or remove them to reduce confusion.
+
+---
+
+## 10. "Next-Level" Improvements
+
+1. **HTTP/1.1 keep-alive + HTTP/2**: The biggest single change. Use `hyper` (mature, production-grade, supports HTTP/2) instead of the hand-rolled HTTP parser. Tor page load times drop from 30s to 3s.
+
+2. **Brotli/gzip compression**: Add `async-compression` + pre-compression on startup. 70–85% bandwidth reduction on text assets — transformative for Tor users.
+
+3. **Metrics/telemetry dashboard**: Real-time bytes served, connection duration histogram, P50/P95 request latency, per-path hit counts. Display in the console dashboard. Export as Prometheus metrics via a `--metrics-port` flag.
+
+4. **SPA routing + custom error pages**: `fallback_to_index = true` + `404.html`/`500.html` support. Enables hosting React/Vue/Svelte apps without modification.
+
+5. **Config hot-reload**: Watch `settings.toml` with the `notify` crate. Apply changes to `csp_level`, `max_connections`, `expose_dotfiles` without restart.
+
+6. **ETag / conditional GET + smart caching headers**: `Cache-Control: immutable` for hashed assets, `no-store` only for HTML. Cut re-download traffic by 80–90% on repeat visits.
+
+7. **`rusthost-cli --serve ./dir --port 3000 --no-tor` one-shot mode**: Zero-config local serving. This single flag would make the tool immediately useful to developers who don't need Tor.
+
+8. **Range request (206 Partial Content) support**: Essential for audio/video. Technically straightforward: parse `Range:` header, `File::seek()`, set `Content-Range:` response header.
+
+9. **Self-signed TLS certificate generation**: `rustls` + `rcgen` can generate a self-signed cert at startup. Enables `https://localhost:8443` with zero user configuration. Optionally add ACME (Let's Encrypt) support for production clearnet deployments.
+
+10. **URL redirect/rewrite rules in config**:
+```toml
+[[redirects]]
+from = "/old-page"
+to = "/new-page"
+status = 301
+```
+This alone would unblock 90% of site migrations.
+
+---
+
+## Top 10 Highest Impact Improvements
+
+| Rank | Change | Effort | Impact |
+|------|--------|--------|--------|
+| 1 | HTTP/1.1 keep-alive (replace hand-rolled parser with `hyper`) | Large | **Removes Tor unusability** |
+| 2 | README.md (installation, usage, config reference) | Small | **Enables any adoption at all** |
+| 3 | gzip/brotli content compression | Medium | **3–10× faster page loads over Tor** |
+| 4 | SPA routing (`fallback_to_index`) + custom 404.html | Small | **Enables hosting any modern frontend** |
+| 5 | Per-IP rate limiting in accept loop | Medium | **Closes DoS attack surface** |
+| 6 | CI configuration (GitHub Actions) | Small | **Prevents regressions, builds trust** |
+| 7 | Fix `copy_with_idle_timeout` to be an actual idle timeout | Small | **Stops killing legitimate large file downloads** |
+| 8 | ETag/conditional GET + smart `Cache-Control` | Medium | **80–90% reduction in repeated traffic** |
+| 9 | External test vector for `onion_address_from_pubkey` | Trivial | **Eliminates tautological Tor address test** |
+| 10 | Replace internal "fix X.Y" comments with explanatory prose | Small | **Makes code understandable to contributors** |
+
+---
+
+## What This Project Does Well
+
+**Tor integration is genuinely impressive.** Embedding Arti in-process, deriving the onion address from the keypair without polling a `hostname` file, handling bootstrap timeouts, and retrying with backoff and a failure-time reset (currently linear rather than exponential, see Reliability §5): this is well-researched and non-trivial. Most comparable projects just shell out to `tor`.
+
+**Security fundamentals are solid.** `canonicalize` + `starts_with` for path traversal, `NonZeroU16`/`IpAddr` at the type level for config validation, `#[serde(deny_unknown_fields)]`, `unsafe_code = "forbid"`, dot-file blocking, CRLF injection stripping, XSS escaping in directory listings, 0o600/0o700 for Tor keypair files — all correct.
+
+**Error handling is typed and explicit.** The single `AppError` enum with `thiserror`, a crate-level `Result<T>` alias, and consistent use of `?` mean errors propagate cleanly without `Box<dyn Error>` everywhere. The `AppError::ConfigValidation(Vec<String>)` pattern for bulk validation errors is particularly good.
+
+**Async architecture is clean.** `Arc<RwLock<AppState>>` for shared state, `AtomicU64` for hot-path metrics, `JoinSet` for connection tracking, watch channels for shutdown propagation, and a oneshot channel for port signaling: each tool is chosen appropriately.
+
+**The test suite is integration-focused.** The `TestServer` harness in `tests/http_integration.rs` spins up a real server on a dynamically-allocated port and sends real HTTP bytes. This catches wire-level bugs that unit tests miss.
+
+**The config system is unusually good for a project this size.** `serde` parse-time validation for typed fields, semantic validation in a separate `validate()` pass, `#[serde(deny_unknown_fields)]` to catch typos, and excellent inline documentation in the generated TOML file.
+
+---
+
+## What Prevents This From Being Elite
+
+**1. No HTTP keep-alive.** This is not a performance nit — it makes the tool genuinely unusable for its primary stated use case (Tor hosting). A static site with 20 assets takes 60 seconds to load on Tor. This single issue would drive every serious Tor user away immediately.
+
+**2. No README.** An open-source project without a README is invisible. It cannot be discovered, evaluated, or adopted. It cannot receive contributions. Every other quality in this code is wasted without documentation.
+
+**3. Feature gap relative to competitors.** Caddy, `miniserve`, `static-web-server`, and even Python's `http.server` support: compression, range requests, conditional GET, custom error pages, and SPA routing. This server doesn't. A developer evaluating static hosting tools will pick one of those instead.
+
+**4. The `copy_with_idle_timeout` bug is subtle but serious.** It terminates legitimate large transfers after 60 seconds wall-clock time. A user who tries to download a 100 MB file over Tor (which takes ~10 minutes at typical Tor speeds) will see a dropped connection every 60 seconds. They will assume the server is broken — because it is.
+
+**5. No per-IP rate limiting.** The `max_connections` semaphore is a global cap, not a per-client cap. A single client can monopolize the entire server. This isn't hardening — it's a single point of failure dressed up as one.
+
+**6. No compression.** Tor is slow. Sending 200 KB of uncompressed JavaScript over a 200 kbps Tor circuit when brotli would compress it to 30 KB is not an acceptable tradeoff for any serious use case.
+
+These six gaps, in order, are what stand between this project and a tool worth recommending.
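Item 4 turns on the difference between an idle timeout and a wall-clock deadline. The sketch below is only an illustration of that difference, not the project's `copy_with_idle_timeout`: `IdleDeadline` and `wall_clock_expired` are hypothetical names, and time is passed in explicitly so the behaviour can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: measures time since the last successful write,
/// not time since the transfer started.
struct IdleDeadline {
    idle_limit: Duration,
    last_progress: Instant,
}

impl IdleDeadline {
    fn new(idle_limit: Duration, now: Instant) -> Self {
        Self { idle_limit, last_progress: now }
    }

    /// Call after every chunk that was actually written to the peer.
    fn record_progress(&mut self, now: Instant) {
        self.last_progress = now;
    }

    /// True only when no progress has been made for `idle_limit`.
    fn expired(&self, now: Instant) -> bool {
        now.duration_since(self.last_progress) >= self.idle_limit
    }
}

/// Wall-clock check for contrast: it ignores progress entirely, so a slow
/// but healthy transfer is cut off once `limit` has elapsed.
fn wall_clock_expired(started: Instant, limit: Duration, now: Instant) -> bool {
    now.duration_since(started) >= limit
}
```

With a 60-second limit, a chunk written at 50 s, and a check at 70 s, the wall-clock test reports expiry while the idle test does not, which is the behaviour a long Tor download needs.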
diff --git a/src/config/loader.rs b/src/config/loader.rs
index 3d567ed..47f8ed3 100644
--- a/src/config/loader.rs
+++ b/src/config/loader.rs
@@ -34,12 +34,12 @@ fn validate(cfg: &Config) -> Result<()> {
// bind: IpAddr — invalid IPs are already rejected by serde at parse time (4.2).
// level: LogLevel — invalid levels are already rejected by serde at parse time (4.2).
- // fix C-1 — a free-form CSP string with embedded CR/LF could inject
+ // a free-form CSP string with embedded CR/LF could inject
// arbitrary headers. The field is now a typed `CspLevel` enum so serde
// rejects any value that isn't "off", "relaxed", or "strict" at parse time;
// no runtime check is needed here.
- // fix C-2 — max_connections = 0 deadlocks (semaphore never grants permits);
+ // max_connections = 0 deadlocks (semaphore never grants permits);
// very large values defeat the connection limit entirely.
if cfg.server.max_connections == 0 {
errors.push("[server] max_connections must be at least 1".into());
@@ -51,6 +51,22 @@ fn validate(cfg: &Config) -> Result<()> {
));
}
+ // Phase 2 (C-4) — validate per-IP connection limit.
+ //
+ // max_connections_per_ip = 0 would make every connection fail immediately
+ // (the CAS loop can never increment past the limit of zero).
+ // max_connections_per_ip > max_connections means the per-IP guard can
+ // never be the binding constraint, making it useless.
+ if cfg.server.max_connections_per_ip == 0 {
+ errors.push("[server] max_connections_per_ip must be at least 1".into());
+ }
+ if cfg.server.max_connections_per_ip > cfg.server.max_connections {
+ errors.push(format!(
+ "[server] max_connections_per_ip ({}) must be ≤ max_connections ({})",
+ cfg.server.max_connections_per_ip, cfg.server.max_connections
+ ));
+ }
+
// [site]
// `index_file` must be a bare filename, not a path.
// Use Path::components() rather than checking for MAIN_SEPARATOR:
@@ -154,6 +170,41 @@ mod tests {
assert!(validate(&valid()).is_ok());
}
+ // ── validate — [server] max_connections_per_ip ───────────────────────────
+
+ #[test]
+ fn validate_max_connections_per_ip_zero_is_rejected() {
+ let mut cfg = valid();
+ cfg.server.max_connections_per_ip = 0;
+ let result = validate(&cfg);
+ assert!(
+ matches!(&result, Err(AppError::ConfigValidation(e))
+ if e.iter().any(|s| s.contains("max_connections_per_ip"))),
+ "expected ConfigValidation error mentioning max_connections_per_ip, got: {result:?}"
+ );
+ }
+
+ #[test]
+ fn validate_max_connections_per_ip_exceeds_max_connections() {
+ let mut cfg = valid();
+ cfg.server.max_connections = 32;
+ cfg.server.max_connections_per_ip = 64; // > max_connections
+ let result = validate(&cfg);
+ assert!(
+ matches!(&result, Err(AppError::ConfigValidation(e))
+ if e.iter().any(|s| s.contains("max_connections_per_ip"))),
+ "expected ConfigValidation error mentioning max_connections_per_ip, got: {result:?}"
+ );
+ }
+
+ #[test]
+ fn validate_max_connections_per_ip_equal_to_max_connections_is_ok() {
+ let mut cfg = valid();
+ cfg.server.max_connections = 32;
+ cfg.server.max_connections_per_ip = 32; // equal is permitted
+ assert!(validate(&cfg).is_ok());
+ }
+
// ── validate — [site] directory ─────────────────────────────────────────
#[test]
@@ -241,6 +292,7 @@ bind = "127.0.0.1"
auto_port_fallback = true
open_browser_on_start = false
max_connections = 256
+max_connections_per_ip = 16
csp_level = "off"
{extra}
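The validation comment above refers to a CAS loop that cannot increment past a limit of zero. A minimal sketch of such a loop follows, assuming one atomic counter per client IP; `try_acquire` and `release` are hypothetical names and the real guard in the server may be structured differently.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Try to claim one connection slot. `limit` plays the role of
/// `max_connections_per_ip`. Returns false when the limit is already reached.
fn try_acquire(counter: &AtomicU32, limit: u32) -> bool {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        if current >= limit {
            // With limit == 0 this branch is taken on the first pass,
            // which is why validation rejects a zero limit.
            return false;
        }
        match counter.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return true,
            // Another thread changed the count (or the weak CAS failed
            // spuriously); retry with the value that was observed.
            Err(observed) => current = observed,
        }
    }
}

/// Give a slot back when the connection closes.
fn release(counter: &AtomicU32) {
    counter.fetch_sub(1, Ordering::AcqRel);
}
```

The check and the increment happen in one atomic step, so two connections racing for the last slot cannot both succeed.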
diff --git a/src/config/mod.rs b/src/config/mod.rs
index 3861ea9..e4dfee4 100644
--- a/src/config/mod.rs
+++ b/src/config/mod.rs
@@ -47,7 +47,6 @@ impl From<LogLevel> for LevelFilter {
///
/// Replaces the post-parse `.parse::<IpAddr>()` check in `loader.rs` with a
/// parse-time error so an invalid IP is caught the moment the file is read
-/// (fix 4.2).
fn deserialize_ip_addr<'de, D: Deserializer<'de>>(d: D) -> Result<IpAddr, D::Error> {
let s = String::deserialize(d)?;
s.parse().map_err(serde::de::Error::custom)
@@ -116,6 +115,31 @@ impl CspLevel {
// ─── Config structs ──────────────────────────────────────────────────────────
+/// A single URL redirect or rewrite rule, matched before filesystem resolution.
+///
+/// Example `settings.toml` entry:
+/// ```toml
+/// [[redirects]]
+/// from = "/old-page"
+/// to = "/new-page"
+/// status = 301
+/// ```
+#[derive(Debug, Clone, Serialize, Deserialize)]
+#[serde(deny_unknown_fields)]
+pub struct RedirectRule {
+ /// Source URL path to match (exact match).
+ pub from: String,
+ /// Destination URL (may be a relative path or absolute URL).
+ pub to: String,
+ /// HTTP status code — 301 for permanent, 302 for temporary.
+ #[serde(default = "default_redirect_status")]
+ pub status: u16,
+}
+
+const fn default_redirect_status() -> u16 {
+ 301
+}
+
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct Config {
@@ -125,17 +149,22 @@ pub struct Config {
pub logging: LoggingConfig,
pub console: ConsoleConfig,
pub identity: IdentityConfig,
+ /// URL redirect/rewrite rules evaluated before filesystem resolution.
+ /// Declared as `[[redirects]]` array-of-tables in `settings.toml`.
+ /// Addresses M-13.
+ #[serde(default)]
+ pub redirects: Vec<RedirectRule>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct ServerConfig {
/// Non-zero port number. `NonZeroU16` prevents port 0 at the type level:
- /// serde rejects a zero value during deserialisation (fix 4.2).
+ /// serde rejects a zero value during deserialisation.
pub port: NonZeroU16,
/// Network interface to bind to. Parsed from TOML string at load time;
- /// an invalid IP address is rejected immediately (fix 4.2).
+ /// an invalid IP address is rejected immediately.
#[serde(
deserialize_with = "deserialize_ip_addr",
serialize_with = "serialize_ip_addr"
@@ -146,6 +175,18 @@ pub struct ServerConfig {
pub open_browser_on_start: bool,
pub max_connections: u32,
+ /// Maximum concurrent connections from a single IP address.
+ ///
+ /// Prevents a single client from monopolising the connection pool.
+ /// When the limit is reached the connection is dropped at the TCP level,
+ /// before any HTTP parsing, so no HTTP overhead is incurred.
+ ///
+ /// Must be ≥ 1 and ≤ `max_connections`. Validated in `loader.rs`.
+ /// Defaults to 16, which is generous for browsers (typically 6–8 parallel
+ /// connections) while preventing trivial single-client exhaustion attacks.
+ #[serde(default = "default_max_connections_per_ip")]
+ pub max_connections_per_ip: u32,
+
/// Content-Security-Policy preset. See [`CspLevel`] for available values
/// (`"off"`, `"relaxed"`, `"strict"`) and the header each one sends.
/// Defaults to `"off"` — no CSP header, maximum browser compatibility.
@@ -153,6 +194,14 @@ pub struct ServerConfig {
pub csp_level: CspLevel,
}
+/// Default per-IP connection limit.
+///
+/// 16 is generous for browsers (6–8 parallel connections per origin) while
+/// making single-client denial-of-service attacks impractical without many IPs.
+const fn default_max_connections_per_ip() -> u32 {
+ 16
+}
+
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(deny_unknown_fields)]
pub struct SiteConfig {
@@ -161,9 +210,29 @@ pub struct SiteConfig {
pub enable_directory_listing: bool,
/// When `true`, directory listings and direct requests expose dot-files
/// (e.g. `.git/`, `.env`). Defaults to `false` so hidden files are not
- /// accidentally served (fix H-10).
+ /// accidentally served.
#[serde(default)]
pub expose_dotfiles: bool,
+
+ /// When `true`, requests for paths that don't match any file are served
+ /// `index.html` (with status 200) instead of a 404.
+ /// Required for single-page applications with client-side routing
+ /// (`React Router`, `Vue Router`, `SvelteKit`, etc.).
+ /// Addresses C-6 — React/Vue/Svelte apps silently 404 without this.
+ #[serde(default)]
+ pub spa_routing: bool,
+
+ /// Optional custom 404 page path, relative to the site directory.
+ /// When set and the file exists, it is served with status 404 for all
+ /// requests that resolve to `NotFound`. Addresses H-10.
+ #[serde(default)]
+ pub error_404: Option,
+
+ /// Optional custom 500/503 page path, relative to the site directory.
+ /// Served with status 503 when the server cannot fulfil the request due
+ /// to internal errors.
+ #[serde(default)]
+ pub error_503: Option,
}
/// Controls Tor integration.
@@ -229,6 +298,7 @@ impl Default for Config {
auto_port_fallback: true,
open_browser_on_start: false,
max_connections: 256,
+ max_connections_per_ip: default_max_connections_per_ip(),
csp_level: CspLevel::Strict,
},
site: SiteConfig {
@@ -236,6 +306,9 @@ impl Default for Config {
index_file: "index.html".into(),
enable_directory_listing: false,
expose_dotfiles: false,
+ spa_routing: false,
+ error_404: None,
+ error_503: None,
},
tor: TorConfig { enabled: true },
logging: LoggingConfig {
@@ -252,6 +325,7 @@ impl Default for Config {
identity: IdentityConfig {
instance_name: "RustHost".into(),
},
+ redirects: Vec::new(),
}
}
}
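The new `redirects`, `spa_routing`, and `error_404` fields imply a resolution order for each request. The sketch below shows one plausible ordering. It relies on the documented facts that redirects are exact matches evaluated before filesystem resolution and that the SPA fallback applies only when no file matches; the `Outcome` enum, the `resolve` function, and the precedence of the SPA fallback over the custom 404 page are assumptions made for illustration. The struct mirrors `RedirectRule` without the serde derives so the block stands alone.

```rust
/// Stand-in for the `RedirectRule` config struct (serde derives omitted).
#[derive(Debug, Clone, PartialEq)]
struct RedirectRule {
    from: String,
    to: String,
    status: u16,
}

/// Hypothetical result of resolving one request path.
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    Redirect { status: u16, location: &'a str },
    ServeFile,
    /// Serve `index.html` with status 200 (`spa_routing = true`).
    SpaFallback,
    /// Serve the custom `error_404` page if configured, else a plain 404.
    NotFound,
}

fn resolve<'a>(
    rules: &'a [RedirectRule],
    path: &str,
    file_exists: bool,
    spa_routing: bool,
) -> Outcome<'a> {
    // 1. Redirect rules are exact matches, checked before the filesystem.
    if let Some(rule) = rules.iter().find(|r| r.from == path) {
        return Outcome::Redirect {
            status: rule.status,
            location: rule.to.as_str(),
        };
    }
    // 2. A real file always wins over any fallback.
    if file_exists {
        return Outcome::ServeFile;
    }
    // 3. Assumed precedence: the SPA fallback is tried before the 404 page.
    if spa_routing {
        Outcome::SpaFallback
    } else {
        Outcome::NotFound
    }
}
```

Because matching is exact, `/old-page/` with a trailing slash does not hit a rule whose `from` is `/old-page`.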
diff --git a/src/console/dashboard.rs b/src/console/dashboard.rs
index e39a7af..92e94b5 100644
--- a/src/console/dashboard.rs
+++ b/src/console/dashboard.rs
@@ -56,7 +56,7 @@ pub fn render_dashboard(state: &AppState, requests: u64, errors: u64, config: &C
let tor_str = match &state.tor_status {
TorStatus::Disabled => dim("DISABLED"),
- TorStatus::Starting => yellow("STARTING — polling for .onion address…"),
+ TorStatus::Starting => yellow("STARTING — bootstrapping Tor network…"),
TorStatus::Ready => green("READY"),
TorStatus::Failed(reason) => red(&format!("FAILED ({reason}) — see log for details")),
};
@@ -71,9 +71,9 @@ pub fn render_dashboard(state: &AppState, requests: u64, errors: u64, config: &C
|| match &state.tor_status {
TorStatus::Disabled => dim("(disabled)"),
TorStatus::Starting => dim("(bootstrapping…)"),
- // fix 3.11 — this branch is unreachable in practice because
- // set_onion() sets Ready and Some(addr) atomically. If it fires,
- // an invariant has been violated; the honest label is "unavailable".
+ // This branch is unreachable in practice because set_onion() sets
+ // Ready and Some(addr) atomically. If it fires an invariant has been
+ // violated; the honest label is "unavailable".
TorStatus::Ready => {
debug_assert!(
false,
diff --git a/src/console/mod.rs b/src/console/mod.rs
index edf4bbe..a5c027b 100644
--- a/src/console/mod.rs
+++ b/src/console/mod.rs
@@ -119,13 +119,21 @@ async fn render(
metrics: &SharedMetrics,
last_rendered: &mut String, // 3.3 — previous frame for change-detection
) -> Result<()> {
- let mode = state.read().await.console_mode.clone();
+ // Acquire the lock ONCE and extract everything needed for this frame.
+ // Previously this function locked twice: once to read `console_mode`, then
+ // a second time inside the Dashboard branch to read the full state snapshot.
+ // The two-lock pattern is a TOCTOU hazard — `console_mode` could change
+ // between the first and second acquire — and also holds the lock for longer
+ // than necessary.
+ let (mode, state_snapshot) = {
+ let s = state.read().await;
+ (s.console_mode.clone(), s.clone())
+ };
let output = match mode {
ConsoleMode::Dashboard => {
- let s = state.read().await;
let (reqs, errs) = metrics.snapshot();
- dashboard::render_dashboard(&s, reqs, errs, config)
+ dashboard::render_dashboard(&state_snapshot, reqs, errs, config)
}
ConsoleMode::LogView => dashboard::render_log_view(config.console.show_timestamps),
ConsoleMode::Help => dashboard::render_help(),
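The single-acquire pattern in `render` can be shown in isolation. This is a sketch under stated assumptions: it uses the blocking `std::sync::RwLock` instead of the async lock the console uses, and `AppState` and `ConsoleMode` here are cut-down stand-ins for the real types.

```rust
use std::sync::{Arc, RwLock};

/// Cut-down stand-in for the console mode enum.
#[derive(Debug, Clone, PartialEq)]
enum ConsoleMode {
    Dashboard,
    LogView,
}

/// Cut-down stand-in for the shared application state.
#[derive(Debug, Clone, PartialEq)]
struct AppState {
    console_mode: ConsoleMode,
    onion: Option<String>,
}

/// Take the read lock once, copy everything the frame needs, and release
/// the lock before any rendering happens.
fn snapshot(state: &Arc<RwLock<AppState>>) -> AppState {
    let guard = state.read().expect("state lock poisoned");
    // Clone the state itself; the guard is dropped when this function returns.
    (*guard).clone()
}
```

Every field in the returned value comes from the same instant, so the mode used to pick a view and the data rendered in that view cannot disagree, and writers are not blocked while the frame is drawn.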
diff --git a/src/lib.rs b/src/lib.rs
index c29a453..06675d7 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,15 +1,22 @@
//! # rusthost — library crate
//!
-//! Exposes all subsystem modules so that integration tests in `tests/` can
-//! import them directly. The binary entry point (`src/main.rs`) is a thin
-//! wrapper that calls [`runtime::lifecycle::run`].
+//! Exposes the public API surface used by integration tests in `tests/` and
+//! by the binary entry point in `src/main.rs`.
+//!
+//! Internal modules are `pub(crate)` by default; only items that form part of
+//! the documented operator/integration-test API are re-exported here.
+// Public modules — part of the documented external API.
pub mod config;
-pub mod console;
pub mod error;
-pub mod logging;
pub mod runtime;
pub mod server;
+
+// Internal modules — exposed `pub` only so integration tests in `tests/`
+// can import them. Use `pub(crate)` within the codebase; prefer these
+// re-exports for test access.
+pub mod console;
+pub mod logging;
pub mod tor;
pub use error::AppError;
@@ -20,3 +27,15 @@ pub use error::AppError;
/// All subsystems return this type so callers can match on specific variants
/// rather than inspecting an opaque `Box<dyn Error>` string.
pub type Result<T> = std::result::Result<T, AppError>;
+
+// ─── Integration-test re-exports ─────────────────────────────────────────────
+//
+// These items are not part of the stable public API. They are marked
+// `#[doc(hidden)]` rather than `#[cfg(test)]`: integration tests in `tests/`
+// link against the normal library build, where `cfg(test)` is not set, so a
+// `#[cfg(test)]` re-export would be invisible to them.
+
+#[doc(hidden)]
+pub use server::handler::{percent_decode, ByteRange, Encoding};
+#[doc(hidden)]
+pub use tor::onion_address_from_pubkey;
diff --git a/src/logging/mod.rs b/src/logging/mod.rs
index a259690..d73c89f 100644
--- a/src/logging/mod.rs
+++ b/src/logging/mod.rs
@@ -27,6 +27,122 @@ use log::{Level, LevelFilter, Log, Metadata, Record};
use crate::{config::LoggingConfig, AppError, Result};
+// ─── Structured access log (M-16) ────────────────────────────────────────────
+
+/// An HTTP access log record in Combined Log Format (CLF).
+///
+/// CLF format:
+/// ` - - [