Turn a $200/month Claude subscription into a scalable AI processing API.
Infrastructure component of NSKit — bound by structure, free to combine. CSW powers the AI processing layer behind NSKit's production services, turning subscription-tier Claude into cost-efficient back-end capability.
Claude API pricing is pay-per-token:
| Model | Input | Output |
|---|---|---|
| Opus | $15/M tokens | $75/M tokens |
| Sonnet | $3/M tokens | $15/M tokens |
| Haiku | $0.25/M tokens | $1.25/M tokens |
A Claude Max subscription costs a flat $200/month, and CLI usage (`claude -p`) is covered by the subscription instead of being billed per token.*
CSW bridges this gap. It wraps the Claude CLI into a REST API server, turning your subscription into a production AI backend.
*Subject to the plan's usage limits and Anthropic's fair use policy. CSW is designed for legitimate processing workloads, not abuse.
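Real-world math for a service handling 1,000 requests/day (~30,000/month), averaging ~2,000 tokens per request (~60M tokens/month). API costs below price every token at the output-token rate, which is the upper bound: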
| Approach | Monthly Cost | Cost/Request |
|---|---|---|
| Claude API (Opus) | ~$4,500 | ~$0.15 |
| Claude API (Sonnet) | ~$900 | ~$0.03 |
| CSW (subscription) | $200 | ~$0.007 |
That's roughly 4.5x (vs. Sonnet) to 22x (vs. Opus) cheaper for equivalent work. With response caching enabled, the effective cost drops further, since a cached response never invokes Claude.
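The table's arithmetic as a small script. It assumes, as the figures imply, that every token is billed at the output-token rate; the constant and function names are illustrative.

```typescript
// Reproduces the cost table. Assumption: all tokens billed at the output rate.
interface Scenario {
  requestsPerMonth: number;
  tokensPerRequest: number;
}

// Output-token prices in USD per million tokens, from the pricing table.
const OUTPUT_PRICE_PER_M_TOKENS: Record<string, number> = {
  opus: 75,
  sonnet: 15,
};

const SUBSCRIPTION_PER_MONTH = 200;

function apiMonthlyCost(s: Scenario, pricePerM: number): number {
  const tokensPerMonth = s.requestsPerMonth * s.tokensPerRequest;
  return (tokensPerMonth / 1_000_000) * pricePerM;
}

const scenario: Scenario = { requestsPerMonth: 30_000, tokensPerRequest: 2_000 };

for (const [model, price] of Object.entries(OUTPUT_PRICE_PER_M_TOKENS)) {
  const monthly = apiMonthlyCost(scenario, price);
  const perRequest = monthly / scenario.requestsPerMonth;
  const ratio = monthly / SUBSCRIPTION_PER_MONTH; // how many subscriptions the API bill equals
  console.log(`${model}: $${monthly.toFixed(0)}/month, $${perRequest.toFixed(3)}/request, ${ratio.toFixed(1)}x`);
}
console.log(`subscription: $${(SUBSCRIPTION_PER_MONTH / scenario.requestsPerMonth).toFixed(4)}/request`);
```

At 60M tokens/month the Opus bill is 22.5 subscriptions and the Sonnet bill 4.5, which is where the 4.5-22x range comes from.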
CSW operates in two modes:
Stateless processing: each request spawns a new `claude -p` process, executes, and returns.
- Pros: Simple, reliable, no state management
- Cons: Cold-start overhead (~14-40s per request), no conversation context
- Best for: One-off processing tasks, batch jobs
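A minimal sketch of the stateless path, assuming Node's `child_process.execFile`: one fresh process per request, with a timeout generous enough to absorb the cold start. The `runStateless` helper and its `command`/`timeoutMs` options are hypothetical; `command` exists only so the sketch can be pointed at a different binary.

```typescript
import { execFile } from "node:child_process";

// Hypothetical helper: spawn one CLI process per request and return its stdout.
function runStateless(
  prompt: string,
  options: { command?: string; timeoutMs?: number } = {},
): Promise<string> {
  const command = options.command ?? "claude";
  const args = ["-p", prompt]; // print mode: answer once, then exit
  return new Promise((resolve, reject) => {
    execFile(
      command,
      args,
      // Cold starts take ~14-40s, so the timeout must be well above that.
      { timeout: options.timeoutMs ?? 120_000, maxBuffer: 10 * 1024 * 1024 },
      (error, stdout, stderr) => {
        if (error) {
          reject(new Error(`${command} failed: ${stderr || error.message}`));
          return;
        }
        resolve(stdout.trim());
      },
    );
  });
}
```

For example, `runStateless("Summarize this text: ...")` runs `claude -p "Summarize this text: ..."` and resolves with its output. Passing the prompt as an argument array through `execFile` (rather than a shell string through `exec`) avoids quoting and injection problems.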
Worker Mode: persistent tmux-based sessions. Claude stays running in a tmux session and receives commands through a file-based inbox/outbox.
- Pros: 2.5x faster (no cold-start), maintains conversation context, supports `alwaysOn`
- Cons: More complex lifecycle management, ~16s initial session creation
- Best for: User-facing APIs, conversational workflows, high-frequency requests
See Worker Mode Architecture for the full lifecycle.
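The API side of the inbox/outbox exchange can be sketched as follows, assuming one JSON file per request. The directory layout (`inbox/`, `outbox/`), the `{ id, prompt }` and `{ text }` file shapes, and every function name are hypothetical; how the Claude session inside tmux picks up inbox files is not shown.

```typescript
import { existsSync, mkdirSync, readFileSync, renameSync, rmSync, unlinkSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { randomUUID } from "node:crypto";

// Hypothetical layout: <workerDir>/inbox/<id>.json and <workerDir>/outbox/<id>.json
interface WorkerDirs {
  inbox: string;
  outbox: string;
}

function workerDirs(workerDir: string): WorkerDirs {
  const dirs = { inbox: join(workerDir, "inbox"), outbox: join(workerDir, "outbox") };
  mkdirSync(dirs.inbox, { recursive: true });
  mkdirSync(dirs.outbox, { recursive: true });
  return dirs;
}

// Write under a temp name, then rename, so the worker never reads a half-written
// file (the worker is assumed to pick up only *.json files).
function postRequest(dirs: WorkerDirs, prompt: string): string {
  const id = randomUUID();
  const tmp = join(dirs.inbox, `${id}.tmp`);
  writeFileSync(tmp, JSON.stringify({ id, prompt }));
  renameSync(tmp, join(dirs.inbox, `${id}.json`));
  return id;
}

// Non-blocking: returns the response once the worker has written it, else null.
function takeResponse(dirs: WorkerDirs, id: string): string | null {
  const file = join(dirs.outbox, `${id}.json`);
  if (!existsSync(file)) return null;
  const { text } = JSON.parse(readFileSync(file, "utf8")) as { text: string };
  unlinkSync(file); // consume the response so it is delivered once
  return text;
}

// Poll the outbox until the response appears or the deadline passes.
async function waitForResponse(dirs: WorkerDirs, id: string, timeoutMs = 60_000, pollMs = 250): Promise<string> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const text = takeResponse(dirs, id);
    if (text !== null) return text;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`worker did not answer request ${id} within ${timeoutMs} ms`);
}

// Remove a worker's mailbox directories, e.g. when the worker is destroyed.
function removeWorkerDirs(workerDir: string): void {
  rmSync(workerDir, { recursive: true, force: true });
}
```

Plain files keep the protocol independent of how the long-running process is hosted, which is why the same pattern carries over to other CLI tools.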
Create and resume conversations with full message history. MySQL-backed sessions with automatic message tracking.
Two caching strategies for different needs:
- Simple mode: 1:1 cache. Same key always returns the same response.
- Pool mode: Uses Growing Pool Cache -- each cache key grows a pool of diverse AI responses that expands based on demand.
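A sketch of the two strategies. `SimpleCache` is a plain 1:1 map. `PoolCache` is a hypothetical stand-in for the pool idea, not the `growing-pool-cache` package's API or algorithm: here a key's pool may grow by one response every `hitsPerSlot` requests (up to `maxSize`), and requests that don't trigger growth get a random existing response without a model call.

```typescript
type Generate = (key: string) => Promise<string>;

// Simple mode: one response per key, reused until purged.
class SimpleCache {
  private store = new Map<string, string>();
  private generate: Generate;

  constructor(generate: Generate) {
    this.generate = generate;
  }

  async get(key: string): Promise<string> {
    const hit = this.store.get(key);
    if (hit !== undefined) return hit;
    const value = await this.generate(key);
    this.store.set(key, value);
    return value;
  }
}

interface PoolEntry {
  responses: string[];
  requests: number;
}

// Pool mode (hypothetical growth rule): demand buys additional model calls.
class PoolCache {
  private pools = new Map<string, PoolEntry>();
  private generate: Generate;
  private hitsPerSlot: number;
  private maxSize: number;
  private random: () => number;

  constructor(generate: Generate, hitsPerSlot = 10, maxSize = 20, random: () => number = Math.random) {
    this.generate = generate;
    this.hitsPerSlot = hitsPerSlot;
    this.maxSize = maxSize;
    this.random = random;
  }

  async get(key: string): Promise<string> {
    const entry = this.pools.get(key) ?? { responses: [], requests: 0 };
    this.pools.set(key, entry);
    entry.requests += 1;
    // Allowed pool size for this key grows with the number of requests seen.
    const target = Math.min(this.maxSize, 1 + Math.floor((entry.requests - 1) / this.hitsPerSlot));
    if (entry.responses.length < target) {
      const fresh = await this.generate(key); // miss: call the model, grow the pool
      entry.responses.push(fresh);
      return fresh;
    }
    // Hit: serve a random existing response, no model call.
    return entry.responses[Math.floor(this.random() * entry.responses.length)];
  }

  poolSize(key: string): number {
    return this.pools.get(key)?.responses.length ?? 0;
  }
}
```

Serving a random pool entry keeps popular keys from always returning identical text, while the growth rule caps how many model calls that variety costs. Concurrent misses on the same key may both generate here; a production version would deduplicate in-flight requests.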
Reusable prompt templates with variable substitution. Define a system prompt, output format, and options once, then invoke by preset ID.
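A minimal sketch of preset resolution, assuming a `{{variable}}` placeholder syntax. The `Preset` fields, the placeholder syntax, and the example preset are assumptions, not CSW's actual preset schema.

```typescript
// Hypothetical preset shape.
interface Preset {
  id: string;
  systemPrompt: string;
  promptTemplate: string;
  outputFormat: "text" | "json";
}

// Replace {{name}} placeholders; fail loudly on a missing variable instead of
// sending a half-filled prompt to the model.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_match: string, name: string) => {
    if (!Object.prototype.hasOwnProperty.call(vars, name)) {
      throw new Error(`missing template variable: ${name}`);
    }
    return vars[name];
  });
}

// Look up a preset by ID and fill in both prompts.
function resolvePreset(presets: Map<string, Preset>, id: string, vars: Record<string, string>) {
  const preset = presets.get(id);
  if (!preset) throw new Error(`unknown preset: ${id}`);
  return {
    systemPrompt: renderTemplate(preset.systemPrompt, vars),
    prompt: renderTemplate(preset.promptTemplate, vars),
    outputFormat: preset.outputFormat,
  };
}
```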
System prompt + memory entries (rules, properties, actions) + conversation history, all assembled automatically before each request.
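One way to assemble that context, as a sketch: the section order, the Markdown-style headings, and the `MemoryEntry`/`Message` shapes are assumptions; only the three memory kinds (rules, properties, actions) come from the description above.

```typescript
type MemoryKind = "rule" | "property" | "action";

interface MemoryEntry {
  kind: MemoryKind;
  content: string;
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

const MEMORY_HEADINGS: Record<MemoryKind, string> = {
  rule: "Rules",
  property: "Properties",
  action: "Actions",
};

// System prompt, then memory grouped by kind, then history, then the new message.
function assembleContext(systemPrompt: string, memory: MemoryEntry[], history: Message[], userMessage: string): string {
  const sections: string[] = [systemPrompt.trim()];
  for (const kind of ["rule", "property", "action"] as const) {
    const items = memory.filter((m) => m.kind === kind).map((m) => `- ${m.content}`);
    if (items.length > 0) sections.push(`## ${MEMORY_HEADINGS[kind]}\n${items.join("\n")}`);
  }
  if (history.length > 0) {
    sections.push("## Conversation so far\n" + history.map((m) => `${m.role}: ${m.content}`).join("\n"));
  }
  sections.push(`user: ${userMessage}`);
  return sections.join("\n\n");
}
```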
Supports text, JSON, and structured output with JSON schema validation. When a schema is provided, Claude is invoked with `--output-format json` and `--json-schema`.
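A sketch of mapping the requested output type to CLI flags. Only `-p`, `--output-format json`, and `--json-schema` come from the text; the `OutputSpec` type and passing the schema inline as a JSON string are assumptions.

```typescript
// Hypothetical description of what the caller asked for.
type OutputSpec =
  | { format: "text" }
  | { format: "json" }
  | { format: "structured"; schema: object };

function buildClaudeArgs(prompt: string, output: OutputSpec): string[] {
  const args = ["-p", prompt]; // plain text is the CLI's default output
  if (output.format === "json" || output.format === "structured") {
    args.push("--output-format", "json");
  }
  if (output.format === "structured") {
    // Assumption: the schema is passed inline as a JSON string.
    args.push("--json-schema", JSON.stringify(output.schema));
  }
  return args;
}
```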
When conversation messages exceed a threshold, older messages are automatically summarized using a lightweight model (Haiku), keeping context windows manageable.
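The compaction step might look like this. The `threshold` and `keepRecent` defaults are hypothetical, and `summarize` is a stand-in for the call to the lightweight model.

```typescript
interface Message {
  role: "user" | "assistant" | "summary";
  content: string;
}

// Stand-in for the Haiku summarization call.
type Summarize = (messages: Message[]) => Promise<string>;

// Once the transcript exceeds `threshold`, fold everything except the newest
// `keepRecent` messages into a single summary message.
async function compactHistory(
  messages: Message[],
  summarize: Summarize,
  threshold = 40,
  keepRecent = 10,
): Promise<Message[]> {
  if (messages.length <= threshold) return messages;
  const cut = messages.length - keepRecent;
  const older = messages.slice(0, cut); // may include an earlier summary
  const recent = messages.slice(cut);
  const summary = await summarize(older);
  return [{ role: "summary", content: summary }, ...recent];
}
```

Keeping the newest messages verbatim preserves the immediate conversational thread, while the summary carries older facts forward at a fraction of the tokens.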
30-second health checks, automatic restart on hang detection, staggered recovery on server reboot, and configurable idle/awake lifecycle.
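A sketch of the health-check loop. The 30-second interval comes from the text; the hang rule (busy with no progress for 5 minutes), the 20-second restart gap (chosen to exceed the ~16s session creation), and all names are assumptions.

```typescript
interface WorkerState {
  name: string;
  busy: boolean; // currently processing a request
  lastActivityMs: number; // epoch ms of the last observed progress
}

const HEALTH_CHECK_INTERVAL_MS = 30_000; // from the text
const HANG_AFTER_MS = 5 * 60_000; // assumption
const RESTART_GAP_MS = 20_000; // assumption: longer than ~16s session creation

// A worker counts as hung if it is busy but has shown no progress for too long.
function findHungWorkers(workers: WorkerState[], nowMs: number, hangAfterMs = HANG_AFTER_MS): string[] {
  return workers
    .filter((w) => w.busy && nowMs - w.lastActivityMs > hangAfterMs)
    .map((w) => w.name);
}

// After a server reboot, start workers one at a time instead of all at once.
function staggeredSchedule(names: string[], gapMs = RESTART_GAP_MS): { name: string; delayMs: number }[] {
  return names.map((name, i) => ({ name, delayMs: i * gapMs }));
}

function recoverAll(names: string[], start: (name: string) => void): void {
  for (const { name, delayMs } of staggeredSchedule(names)) {
    setTimeout(() => start(name), delayMs);
  }
}

// Periodic check: restart any worker that looks hung.
function startHealthChecks(
  getWorkers: () => WorkerState[],
  restart: (name: string) => void,
): ReturnType<typeof setInterval> {
  return setInterval(() => {
    for (const name of findHungWorkers(getWorkers(), Date.now())) restart(name);
  }, HEALTH_CHECK_INTERVAL_MS);
}
```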
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/process | Process an AI request (stateless, new session, or existing session) |
| GET | /api/v1/sessions | List sessions |
| GET | /api/v1/sessions/:id | Get session with messages |
| POST | /api/v1/sessions/:id/archive | Archive a session |
| GET/POST | /api/v1/sessions/:id/memory | Get/set session memory |
| GET/POST | /api/v1/presets | List/create presets |
| GET/PUT/DELETE | /api/v1/presets/:id | CRUD preset |
| GET | /api/v1/cache/stats | Cache statistics |
| POST | /api/v1/cache/purge | Purge expired cache |
| POST | /api/v1/workers | Create a worker (Worker Mode) |
| GET | /api/v1/workers | List workers |
| DELETE | /api/v1/workers/:name | Destroy a worker |
See the full API Specification for request/response schemas and examples.
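A minimal client sketch for `POST /api/v1/process`. The body fields (`prompt`, `presetId`, `sessionId`) and the `http://localhost:3000` base URL are hypothetical; the API Specification defines the real request schema.

```typescript
// Hypothetical request body; field names are illustrative only.
interface ProcessRequest {
  prompt: string;
  presetId?: string;
  sessionId?: string; // omit for a stateless request
}

interface HttpCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildProcessCall(baseUrl: string, body: ProcessRequest): HttpCall {
  return {
    url: new URL("/api/v1/process", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

With Node 18+ the result can be passed straight to the global `fetch`, e.g. `const { url, init } = buildProcessCall("http://localhost:3000", { prompt: "Hello" }); const res = await fetch(url, init);`.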
CSW powers the AI backend for NewMyoung -- an AI-powered fortune and naming service operating across Korea, Japan, and Chinese-speaking regions (Taiwan, Singapore, Macau, Malaysia, Hong Kong):
- 9 presets handling fortune readings, fortune cookies, and divination queries
- tmux-based workers with `alwaysOn` for user-facing endpoints
- Growing Pool Cache ensuring diverse AI-generated fortunes (no two users see the same reading)
- 17,000+ requests/month served at ~$0.01/request effective cost
This repository provides the concept and reference architecture. The production CSW runs at csw.nskit.io, but the patterns here are general-purpose.
You can:
- Use this as a blueprint to build your own CLI-to-API wrapper for any AI provider with a CLI tool
- Adopt the Worker Mode pattern -- tmux-based persistent sessions with inbox/outbox file communication work for any long-running CLI process
- Use the caching strategy -- install growing-pool-cache from npm for the pool caching pattern
- Copy the database schema -- the reference schema covers sessions, messages, memory, presets, and response caching
| Component | Technology |
|---|---|
| API Server | Node.js + Express |
| AI Backend | Claude CLI (claude -p) |
| Worker Sessions | tmux |
| Database | MySQL 8 |
| Process Manager | systemd / PM2 |
- growing-pool-cache -- Self-growing cache pool for AI-generated content (npm package)
- ai-native-design -- AI-Native Design philosophy for building AI-collaborative frameworks
- NSKit -- The AI-native web framework that CSW was built to serve
CC BY-NC-SA 4.0 — see LICENSE