
httpcache-proxy

A non-blocking HTTP proxy server with disk-based response caching, written in Python using only the standard library.

Requests are made by visiting http://localhost:<port>/<target-host>/<path> in your browser. The proxy forwards the request to the target host, stores the response on disk, and serves it from cache on subsequent requests until the cache TTL expires.


Features

  • Non-blocking I/O via select() — handles many concurrent browser requests
  • Disk-based response caching with configurable TTL
  • Configurable listen port, upstream port, and bind address
  • Structured log output (INFO by default, DEBUG with --verbose)
  • Graceful shutdown on Ctrl-C
  • Zero external dependencies — pure Python 3 standard library

Requirements

  • Python 3.7 or later
  • No third-party packages needed

Quick Start

1. Start a test website

The Folder/ directory contains a test page with 272 images — great for observing cache behavior.

cd Folder
python -m http.server 8000

Leave this running in its own terminal.

2. Start the proxy

# Cache responses for 60 seconds, connect to upstream on port 8000
python proxy.py 60 --upstream-port 8000

3. Open in your browser

http://localhost:8888/localhost/

The proxy fetches localhost:8000/ and logs each request. Reload within 60 seconds to see cache hits. After the TTL expires, the proxy re-fetches from the upstream.


Usage

python proxy.py <stale_time> [options]

Positional argument

| Argument | Description |
|----------|-------------|
| stale_time | Cache TTL in seconds. Use 0 to disable caching (always fetch fresh). |
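The TTL semantics above can be illustrated with a small sketch. This is not the proxy's actual code — the function name and file-age check are assumptions about one plausible way to apply stale_time to a cached file:

```python
import os
import time

def is_fresh(cache_path: str, stale_time: float) -> bool:
    """Return True if the cached file exists and is younger than
    stale_time seconds. A stale_time of 0 disables caching entirely
    (every request is fetched fresh from the upstream)."""
    if stale_time <= 0 or not os.path.exists(cache_path):
        return False
    age = time.time() - os.path.getmtime(cache_path)
    return age < stale_time
```

With this check, a reload inside the TTL window is a cache hit; anything older (or a stale_time of 0) falls through to an upstream fetch.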

Options

| Flag | Default | Description |
|------|---------|-------------|
| --port | 8888 | Port the proxy listens on |
| --upstream-port | 80 | Port used when connecting to upstream servers |
| --host | 127.0.0.1 | Address to bind to. Use 0.0.0.0 to accept connections from other machines on your network. |
| --cache-dir | ./cache | Directory where cached responses are stored |
| --verbose, -v | off | Enable debug-level logging |

Examples

# 5-minute cache on default port 8888
python proxy.py 300

# 1-hour cache on a custom port
python proxy.py 3600 --port 9090

# No caching — always fetch from upstream
python proxy.py 0

# Test site running on port 8000 instead of 80
python proxy.py 120 --upstream-port 8000

# Store cache in a specific directory
python proxy.py 300 --cache-dir /tmp/proxycache

# Debug logging
python proxy.py 60 --verbose

URL Format

This proxy uses an embedded-host URL scheme rather than the standard proxy protocol. You visit URLs in the form:

http://localhost:<port>/<target-host>/<path>

| Example URL | What it fetches |
|-------------|-----------------|
| http://localhost:8888/example.com/index.html | example.com/index.html on the upstream port (default 80) |
| http://localhost:8888/localhost/ | localhost/ on the configured upstream port |
| http://localhost:8888/myserver.local/api/data | myserver.local/api/data on the upstream port (default 80) |
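Splitting an embedded-host path takes only a few lines of string handling. This is an illustrative sketch — the function name and exact edge-case behavior are assumptions, not code from proxy.py:

```python
def split_embedded_url(request_path):
    """Split a proxy request path like '/example.com/index.html' into
    (target_host, target_path). The first path segment names the host;
    the remainder (always starting with '/') is the path on that host."""
    stripped = request_path.lstrip("/")
    host, _, rest = stripped.partition("/")
    return host, "/" + rest
```

For example, /localhost/ maps to host "localhost" with path "/", which the proxy would then request from the configured upstream port.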

Note: This proxy supports plain HTTP only. HTTPS targets are not supported.


Simple (no-cache) version

simple_proxy.py is a minimal blocking proxy that handles one request at a time with no caching. It is useful as a reference implementation or for simple debugging.

python simple_proxy.py
python simple_proxy.py --port 9999
python simple_proxy.py --upstream-port 8000
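In the same spirit, the heart of a blocking proxy — fetch one resource, then return — can be sketched with the standard socket module. This is a simplified illustration under assumed names, not the contents of simple_proxy.py:

```python
import socket

def blocking_fetch(host, port, path):
    """Fetch one resource over plain HTTP with a blocking socket and
    return the raw response bytes (status line, headers, and body)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        request = (
            "GET {} HTTP/1.0\r\n"
            "Host: {}\r\n"
            "Connection: close\r\n\r\n"
        ).format(path, host).encode("ascii")
        sock.sendall(request)          # blocks until the request is written
        chunks = []
        while True:
            data = sock.recv(4096)     # blocks until data arrives
            if not data:               # peer closed: response is complete
                break
            chunks.append(data)
        return b"".join(chunks)
```

Because every call blocks, a proxy built this way can serve only one request at a time — exactly the limitation the select()-based version removes.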

Test Site

The Folder/ directory contains a self-hosted test website:

  • index.html — a grid of 272 images (JPG + GIF)
  • images/ — image assets served as binary content

Its purpose is to generate many parallel HTTP GET requests so you can observe cache misses on first load vs. cache hits on reload.

Setup:

# Terminal 1 — test website
cd Folder
python -m http.server 8000

# Terminal 2 — proxy with 30-second TTL
python proxy.py 30 --upstream-port 8000

Open http://localhost:8888/localhost/ and watch the proxy log. Reload within 30 seconds — all 272 images should be served from cache instantly. Wait 30 seconds and reload again to see them fetched fresh.


How It Works

Browser ──GET /localhost/index.html──> Proxy (port 8888)
                                            │
                                   cache miss? fetch from upstream
                                            │
                                      Upstream Server (port 8000)
                                            │
                                   store response to ./cache/
                                            │
Browser <──────────── HTTP response ────────┘

The proxy uses a single-threaded event loop built on select():

  1. Accept — new browser connections are registered for reading
  2. Parse — once a full HTTP request arrives (\r\n\r\n), check the cache
  3. Cache hit — serve the stored response directly from disk
  4. Cache miss — open a non-blocking connection to the upstream, send the GET request, accumulate the full response, store it to disk, then forward to the browser
  5. Partial sends — large responses that can't be sent in one call are queued and continued when the socket becomes writable again
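The buffering pattern behind steps 1–2 can be sketched as follows — a minimal illustration of select()-driven request accumulation, with a socketpair standing in for a browser connection (names and structure are assumptions, not the proxy's actual source):

```python
import select
import socket
from typing import Dict, Optional

def read_until_complete(conn, pending):
    # type: (socket.socket, Dict[socket.socket, bytes]) -> Optional[bytes]
    """Accumulate bytes from a readable socket into pending[conn] and
    return the full request once the header terminator b"\\r\\n\\r\\n"
    has arrived, or None if more data is still needed."""
    pending[conn] = pending.get(conn, b"") + conn.recv(4096)
    if b"\r\n\r\n" in pending[conn]:
        return pending.pop(conn)
    return None

# Demonstration: feed a request through a socketpair one select() at a time.
client, server = socket.socketpair()
client.sendall(b"GET /localhost/ HTTP/1.1\r\nHost: proxy\r\n\r\n")
pending = {}
request = None
while request is None:
    readable, _, _ = select.select([server], [], [], 1.0)  # wait for readiness
    for conn in readable:
        request = read_until_complete(conn, pending)
print(request.decode().splitlines()[0])  # the parsed request line
client.close()
server.close()
```

The same loop generalizes to many sockets: each readable connection gets its own buffer in pending, and nothing blocks while any one peer is slow.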

Renaming the GitHub Repository

To rename the repo on GitHub: Settings → General → Repository name → set it to httpcache-proxy (or your preferred name), then update your local remote:

git remote set-url origin https://github.com/<your-username>/httpcache-proxy.git

License

MIT — do whatever you like with it.
