| 1 | +# Production Operations Guide |
| 2 | + |
| 3 | +This guide is the canonical runbook for operating `@fetchkit/ffetch` in production. |
| 4 | + |
| 5 | +## 1. Pre-Deployment Checklist |
| 6 | + |
| 7 | +### Core configuration |
| 8 | + |
| 9 | +- [ ] Set `timeout` per dependency SLA (avoid using one global value for everything). |
| 10 | +- [ ] Set `retries` conservatively (`1-3` in most systems). |
| 11 | +- [ ] Decide `throwOnHttpError` policy (`true` for strict exception flow, `false` for response-driven handling). |
| 12 | +- [ ] Use a runtime-appropriate `fetchHandler` for SSR/edge/custom environments. |
| 13 | + |
| 14 | +### Resilience plugins |
| 15 | + |
| 16 | +- [ ] Enable `circuitPlugin` for external dependencies. |
| 17 | +- [ ] Enable `bulkheadPlugin` for dependencies that can saturate under load. |
| 18 | +- [ ] Enable `dedupePlugin` on bursty read-heavy endpoints. |
| 19 | +- [ ] Enable `hedgePlugin` only for safe methods and latency-sensitive paths. |
| 20 | +- [ ] Validate plugin order assumptions when composing multiple plugins. |
| 21 | + |
| 22 | +### Observability and hooks |
| 23 | + |
| 24 | +- [ ] Instrument `before`/`after`/`onError` hooks for logs and metrics. |
| 25 | +- [ ] Track request latency (`p50/p95/p99`) and error rate by endpoint. |
| 26 | +- [ ] Track resilience signals: circuit opens, bulkhead queue depth, retry counts. |
| 27 | +- [ ] Add request correlation IDs in `transformRequest`. |
| 28 | + |
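The correlation-ID item above could be implemented along these lines. This is a minimal sketch, not part of `@fetchkit/ffetch`: the helper names, the `x-correlation-id` header name, and the ID format are all assumptions for illustration. How it is wired into `transformRequest` depends on the hook signature documented in api.md.

```typescript
// Hypothetical helpers: names, header name, and ID format are assumptions.
type HeaderMap = Record<string, string>;

// Builds a short, reasonably unique ID from the clock and a random suffix.
function newCorrelationId(): string {
  const time = Date.now().toString(36);
  const rand = Math.random().toString(36).slice(2, 10);
  return `${time}-${rand}`;
}

// Returns a copy of `headers` that is guaranteed to carry a correlation ID.
function withCorrelationId(
  headers: HeaderMap,
  headerName: string = "x-correlation-id",
  makeId: () => string = newCorrelationId,
): HeaderMap {
  // Header names are case-insensitive, so compare in lower case.
  const existing = Object.keys(headers).find(
    (key) => key.toLowerCase() === headerName.toLowerCase(),
  );
  // Keep an inbound ID untouched so one trace can span several service hops.
  if (existing !== undefined) return { ...headers };
  return { ...headers, [headerName]: makeId() };
}
```

Reusing an existing ID rather than overwriting it is a design choice: it lets logs from an upstream caller and this client be joined on the same value.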
| 29 | +## 2. Operational Metrics |
| 30 | + |
| 31 | +Minimum metrics to collect: |
| 32 | + |
| 33 | +- Request count by endpoint/method/status |
| 34 | +- Latency histogram (`p50`, `p95`, `p99`) |
| 35 | +- Error count by error class (`TimeoutError`, `CircuitOpenError`, `NetworkError`, `RetryLimitError`, etc.) |
| 36 | +- Circuit breaker open events and duration |
| 37 | +- Bulkhead `activeCount`, `queueDepth`, rejection count |
| 38 | +- Retry attempts and eventual success-after-retry rate |
| 39 | + |
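If the metrics backend does not compute percentiles for you, the latency figures listed above can be derived from raw samples. The sketch below uses the nearest-rank method, which is only one of several percentile definitions; the function names are illustrative and not part of the library.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in p / 100.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Summarizes one reporting window of request latencies.
function summarizeLatency(samplesMs: number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```

In production a histogram with fixed buckets is usually cheaper than retaining raw samples; the exact computation here is mainly useful for tests and offline analysis.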
| 40 | +## 3. Alerting Baseline |
| 41 | + |
| 42 | +- Alert on sustained high error rate for a dependency. |
| 43 | +- Alert when circuit remains open beyond expected recovery windows. |
| 44 | +- Alert when bulkhead queue remains near capacity. |
| 45 | +- Alert when `p99` latency regresses significantly from baseline. |
| 46 | + |
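One way to make "sustained" concrete in the first alert above is to require the error rate to exceed a threshold for several consecutive evaluation windows. The following is a sketch under that assumption; the type, the function name, and the example numbers are illustrative, and real thresholds should come from each dependency's baseline.

```typescript
// One evaluation window of traffic for a single dependency.
interface WindowStats {
  requests: number;
  errors: number;
}

// Returns true when each of the last `sustainedWindows` windows has an error
// rate above `threshold`. Windows with fewer than `minRequests` requests never
// count as breaching, so a single failure in a quiet period does not page anyone.
function shouldAlertOnErrorRate(
  windows: WindowStats[],
  threshold: number,
  sustainedWindows: number,
  minRequests: number = 1,
): boolean {
  if (sustainedWindows < 1) throw new RangeError("sustainedWindows must be >= 1");
  if (windows.length < sustainedWindows) return false;
  const recent = windows.slice(-sustainedWindows);
  return recent.every(
    (w) => w.requests >= minRequests && w.errors / w.requests > threshold,
  );
}
```

For example, with one-minute windows, a threshold of `0.05`, and `sustainedWindows` of `3`, the alert fires only after three consecutive minutes above a 5% error rate.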
| 47 | +## 4. Incident Playbook |
| 48 | + |
| 49 | +### Circuit open incidents |
| 50 | + |
| 51 | +1. Check downstream dependency health first. |
| 52 | +2. Confirm `threshold`/`reset` values are aligned with failure patterns. |
| 53 | +3. Use fallback/degraded responses at app level while circuit is open. |
| 54 | +4. Avoid releasing queued traffic all at once during recovery. |
| 55 | + |
| 56 | +### High latency incidents |
| 57 | + |
| 58 | +1. Check bulkhead saturation (`activeCount`, `queueDepth`). |
| 59 | +2. Check retry inflation (too many retries amplifying load). |
| 60 | +3. If using hedging, verify `delay` and `maxHedges` are tuned for current latency distribution. |
| 61 | + |
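To reason about retry inflation and hedging in the steps above, it can help to put an upper bound on how many downstream calls one logical request may trigger. The sketch below assumes that every attempt (the first try and each retry) can fan out into up to `maxHedges` extra hedged requests; whether the plugins actually compose this way should be confirmed against plugins.md. The function names are illustrative.

```typescript
// Upper bound on downstream calls for one logical request.
// Assumption: each attempt may be hedged independently.
function worstCaseAttempts(retries: number, maxHedges: number = 0): number {
  if (!Number.isInteger(retries) || retries < 0) {
    throw new RangeError("retries must be a non-negative integer");
  }
  if (!Number.isInteger(maxHedges) || maxHedges < 0) {
    throw new RangeError("maxHedges must be a non-negative integer");
  }
  return (1 + retries) * (1 + maxHedges);
}

// Worst-case request rate seen by the dependency when every call fails slowly.
function worstCaseDownstreamRps(
  inboundRps: number,
  retries: number,
  maxHedges: number = 0,
): number {
  return inboundRps * worstCaseAttempts(retries, maxHedges);
}
```

Under this assumption, `retries: 1` with `maxHedges: 1` allows up to 4 downstream calls per request, so 500 inbound requests per second could become 2,000 during an incident.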
| 62 | +### Rate-limit incidents (429) |
| 63 | + |
| 64 | +1. Ensure the retry strategy honors the `Retry-After` response header. |
| 65 | +2. Reduce concurrency (`bulkheadPlugin` `maxConcurrent`) and hedge aggressiveness (`hedgePlugin` `delay`, `maxHedges`). |
| 66 | +3. Add app-level backpressure or queueing upstream. |
| 67 | + |
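When checking step 1 above, recall that `Retry-After` can carry either a number of seconds or an HTTP date. The sketch below converts both forms to a delay in milliseconds. It is independent of the library: whether and how the built-in retry logic reads this header should be confirmed in advanced.md, and `parseRetryAfterMs` is a hypothetical name.

```typescript
// Converts a Retry-After header value to a delay in milliseconds.
// Returns undefined when the header is missing or unparseable so that the
// caller can fall back to its normal backoff.
function parseRetryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (trimmed === "") return undefined;

  // Form 1: delay in whole seconds, e.g. "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;

  // A date in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

Passing `nowMs` explicitly keeps the function deterministic in tests. Callers may also want to cap the returned delay so that a misbehaving server cannot stall a request beyond its `timeout`.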
| 68 | +## 5. Recommended Baseline Configs |
| 69 | + |
| 70 | +### Internal service-to-service client |
| 71 | + |
| 72 | +```typescript |
| 73 | +createClient({ |
| 74 | + timeout: 10_000, |
| 75 | + retries: 2, |
| 76 | + plugins: [ |
| 77 | + circuitPlugin({ threshold: 5, reset: 30_000 }), |
| 78 | + bulkheadPlugin({ maxConcurrent: 20, maxQueue: 100 }), |
| 79 | + dedupePlugin({ ttl: 30_000 }), |
| 80 | + ], |
| 81 | +}) |
| 82 | +``` |
| 83 | + |
| 84 | +### Latency-sensitive read path |
| 85 | + |
| 86 | +```typescript |
| 87 | +createClient({ |
| 88 | + timeout: 5_000, |
| 89 | + retries: 1, |
| 90 | + plugins: [ |
| 91 | + dedupePlugin({ ttl: 10_000 }), |
| 92 | + hedgePlugin({ delay: 75, maxHedges: 1 }), |
| 93 | + ], |
| 94 | +}) |
| 95 | +``` |
| 96 | + |
| 97 | +## 6. Related References |
| 98 | + |
| 99 | +- [api.md](./api.md) for full option and plugin reference |
| 100 | +- [plugins.md](./plugins.md) for plugin lifecycle and ordering semantics |
| 101 | +- [advanced.md](./advanced.md) for retry, circuit, and operational patterns |
| 102 | +- [errorhandling.md](./errorhandling.md) for exact error behavior |