Add O11Y_SERVICE_URL environment variable for agent observability ingest service with empty string fallback
Implement POST /ingest/api-metrics endpoint using Hono framework to collect observability metrics from API clients. The endpoint validates incoming metrics data including client info, model usage, timing data, token counts, and error states using Zod schema validation. Add dependencies hono and zod for routing and validation. Enable smart placement in Cloudflare Workers configuration for optimal performance. Include comprehensive tests for valid payloads and validation error handling.
Extract inline Zod validation code from the API metrics endpoint into a reusable zodJsonValidator utility function. This improves code maintainability and enables consistent validation patterns across multiple endpoints.
- Add @hono/zod-validator dependency
- Create validation.ts utility with zodJsonValidator function
- Simplify /ingest/api-metrics endpoint by using the new validator
- Standardize error response format with ErrorResponse type
Integrate observability metrics collection into the OpenRouter API proxy to track model usage patterns. Creates a new server-side metrics emitter that sends provider, requested model, and resolved model information to the observability service. The metrics are emitted asynchronously using Next.js after() to avoid impacting request latency. Includes best-effort error handling to ensure metrics failures never affect the primary request flow.
Remove clientName from API metrics parameters and derive it server-side from clientSecret using a new mapping function. This simplifies the client API by reducing required parameters and centralizes client identification logic.
- Add client-secrets.ts with getClientNameFromSecret() mapping function
- Update validation schema to verify clientSecret and transform to include clientName
- Remove clientName parameter from ApiMetricsParams type
- Update tests to use mapped clientSecret instead of explicit clientName
Add toolsAvailable field to API metrics to track which tools are available in requests. Implement getToolsAvailable helper to extract and format tool names from OpenAI tool definitions, supporting both function and custom tool types. Refactor URL initialization to use IIFE pattern for better error handling and null safety.
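A minimal sketch of what a helper like getToolsAvailable might look like. The tool shapes are simplified stand-ins for OpenAI tool definitions, and the "type:name" output format and the hasName guard are assumptions for illustration, not the PR's actual code.

```typescript
// Type guard: true only when `value` is a non-null object with a string `name`.
function hasName(value: unknown): value is { name: string } {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { name?: unknown }).name === "string"
  );
}

// Returns one entry per recognizable tool, formatted as "<type>:<name>" (assumed format).
function getToolsAvailable(tools: unknown[] | undefined): string[] {
  if (!tools) return [];
  const names: string[] = [];
  for (const tool of tools) {
    if (typeof tool !== "object" || tool === null) continue;
    const record = tool as Record<string, unknown>;
    const fn = record["function"];
    const custom = record["custom"];
    if (record["type"] === "function" && hasName(fn)) {
      names.push(`function:${fn.name}`);
    } else if (record["type"] === "custom" && hasName(custom)) {
      names.push(`custom:${custom.name}`);
    }
    // Malformed or unrecognized tool definitions are skipped rather than throwing.
  }
  return names;
}
```

Skipping malformed entries keeps metrics extraction from ever failing the proxied request, which matches the best-effort stance described earlier in this PR.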
Add getToolsUsed function to extract tool usage from assistant messages in the conversation history. This complements the existing toolsAvailable tracking by capturing which tools were actually invoked during the request. The function parses tool_calls from assistant messages and categorizes them as function, custom, or unknown types with their respective names.
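The extraction described here could be sketched roughly as follows. The message and tool-call types are simplified assumptions, and the string format of each entry is hypothetical.

```typescript
// Simplified, assumed shapes for chat messages and tool calls.
type ToolCall = {
  type?: string;
  function?: { name?: string };
  custom?: { name?: string };
};
type ChatMessage = { role: string; tool_calls?: ToolCall[] };

// Collects tool invocations from assistant messages only.
function getToolsUsed(messages: ChatMessage[]): string[] {
  const used: string[] = [];
  for (const message of messages) {
    if (message.role !== "assistant" || !message.tool_calls) continue;
    for (const call of message.tool_calls) {
      const fnName = call.function?.name;
      const customName = call.custom?.name;
      if (call.type === "function" && fnName !== undefined) {
        used.push(`function:${fnName}`);
      } else if (call.type === "custom" && customName !== undefined) {
        used.push(`custom:${customName}`);
      } else {
        // Anything else is recorded without a name instead of being dropped.
        used.push("unknown");
      }
    }
  }
  return used;
}
```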
Move emitApiMetrics call to after provider response to capture time-to-first-byte (TTFB) measurement. This enables monitoring of model response latency for observability purposes.
Add new emitApiMetricsForResponse function that drains the response body to measure full upstream response time. This provides more accurate timing data by capturing the complete request lifecycle including TTFB and total duration. The original emitApiMetrics function is preserved for backward compatibility.
Add a 60-second timeout to drainResponseBody to prevent background work from running indefinitely on long-lived SSE connections. The function now tracks elapsed time and uses Promise.race to enforce the timeout, properly canceling the reader when the limit is reached.
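One way the timeout could be enforced is sketched below, using only the standard Response and ReadableStream APIs. The return shape ({ elapsedMs, timedOut }) and the timeoutMs parameter are assumptions made for illustration.

```typescript
// Reads and discards the body, resolving with the elapsed time and whether the timeout fired.
async function drainResponseBody(
  response: Response,
  timeoutMs = 60_000,
): Promise<{ elapsedMs: number; timedOut: boolean }> {
  const startedAt = Date.now();
  if (!response.body) return { elapsedMs: 0, timedOut: false };
  const reader = response.body.getReader();

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });

  // Pull chunks until the stream ends; the chunk contents are not needed.
  const drain = (async (): Promise<"done"> => {
    while (true) {
      const { done } = await reader.read();
      if (done) return "done";
    }
  })();

  try {
    const outcome = await Promise.race([drain, timeout]);
    if (outcome === "timeout") {
      // Cancel so a long-lived SSE connection is released instead of lingering.
      await reader.cancel();
    }
    return { elapsedMs: Date.now() - startedAt, timedOut: outcome === "timeout" };
  } finally {
    clearTimeout(timer);
  }
}
```

Clearing the timer in the finally block keeps a fast response from leaving a pending 60-second timer behind.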
Add statusCode field to API metrics tracking to enable monitoring of response status distributions and error rates. The status code is captured from the response object and included in the metrics parameters.
…icit field Remove the explicit 'success' boolean field from API metrics schema and instead infer success/failure from the HTTP status code. Error messages are now required when statusCode >= 400, making the API more intuitive and reducing redundancy.
The errorMessage field and its validation logic have been removed from the API metrics schema. Error context can be derived from the statusCode field, eliminating the need for explicit error messages in metrics collection.
Add ApiMetricsTokens type to track input, output, cache write, cache hit, and total tokens. Implement getTokensFromCompletionUsage helper function to extract token metrics from OpenAI CompletionUsage objects. Extend ApiMetricsParams to include optional tokens field for comprehensive API usage monitoring.
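A rough sketch of such a helper is shown below. The UsageLike type is a local stand-in for an OpenAI-style usage object; the ApiMetricsTokens field names and the cache_write_tokens property are assumptions, since the PR text does not show where cache-write counts come from.

```typescript
// Assumed subset of an OpenAI-style usage object.
type UsageLike = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
    cache_write_tokens?: number; // hypothetical field, not confirmed by the source
  };
};

// Field names are illustrative; the PR only lists the five token categories.
type ApiMetricsTokens = {
  inputTokens?: number;
  outputTokens?: number;
  cacheWriteTokens?: number;
  cacheHitTokens?: number;
  totalTokens?: number;
};

function getTokensFromCompletionUsage(
  usage: UsageLike | null | undefined,
): ApiMetricsTokens | undefined {
  // Providers may omit usage entirely, so the tokens field stays optional.
  if (!usage) return undefined;
  return {
    inputTokens: usage.prompt_tokens,
    outputTokens: usage.completion_tokens,
    cacheWriteTokens: usage.prompt_tokens_details?.cache_write_tokens,
    cacheHitTokens: usage.prompt_tokens_details?.cached_tokens,
    totalTokens: usage.total_tokens,
  };
}
```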
Add kiloUserId, organizationId, isStreaming, userByok, and mode fields to API metrics schema to enable better tracking and analysis of API usage patterns. Update all metric emission points to include the new contextual information.
Remove hardcoded TODO placeholder and implement proper client secret configuration for API metrics. The secret is now loaded from O11Y_KILO_GATEWAY_CLIENT_SECRET environment variable and automatically included in metrics requests.
Implement dynamic secret retrieval from Cloudflare's Secrets Store binding to authenticate API clients. The authentication logic has been moved from schema-level validation into the route handler to support asynchronous secret fetching operations. Configuration updates include wrangler.jsonc binding setup for the O11Y_KILO_GATEWAY_CLIENT_SECRET resource. Test suite enhancements provide mock secret bindings and verify proper rejection of unauthorized requests with 403 status codes.
Implement PostHog event capture to track API usage metrics. The /ingest/api-metrics endpoint now forwards validated metrics to PostHog for analytics and monitoring.
- Add captureApiMetrics function to send events to PostHog
- Configure PostHog API key and host via environment variables
- Exclude clientSecret from captured properties for security
- Set $process_person_profile based on isAnonymous flag
- Update tests to include PostHog configuration
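The payload construction might look something like this sketch. The event name, the choice of kiloUserId as distinct_id, and the reading that anonymous requests skip person profiles are all assumptions; only the exclusion of clientSecret and the use of $process_person_profile come from the commit message.

```typescript
// Illustrative subset of the validated metrics; the real schema has more fields.
type ApiMetrics = {
  clientSecret: string;
  clientName: string;
  kiloUserId: string;
  isAnonymous: boolean;
  statusCode: number;
};

// Builds the JSON body for a PostHog capture request without sending it.
function buildPostHogEvent(apiKey: string, metrics: ApiMetrics) {
  // Strip the secret so it can never end up in analytics properties.
  const { clientSecret: _secret, ...properties } = metrics;
  return {
    api_key: apiKey,
    event: "o11y_api_metrics", // hypothetical event name
    distinct_id: metrics.kiloUserId, // assumed identifier choice
    properties: {
      ...properties,
      // Assumed mapping: anonymous traffic does not create person profiles.
      $process_person_profile: !metrics.isAnonymous,
    },
  };
}
```

Keeping payload construction separate from the network call makes the secret-stripping behavior easy to assert in tests.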
Remove intermediate ctx variable assignments and waitOnExecutionContext calls in favor of directly passing createExecutionContext() to worker.fetch(). This reduces test boilerplate while maintaining the same test behavior.
Add new reusable workflow for deploying o11y service to Cloudflare Workers with environment selection (dev/prod). Integrate o11y deployment into production workflow with automatic triggering when cloudflare-o11y directory changes are detected.
Add optional ipAddress field to API metrics schema and pass it through to PostHog analytics. This enables PostHog to resolve geographic location from the user's actual IP address rather than the Cloudflare worker's IP address, providing more accurate location analytics. Changes include:
- Add ipAddress field to ApiMetricsParamsSchema with IPv4/IPv6 validation
- Extract and forward IP address in PostHog capture request
- Thread ipAddress parameter through OpenRouter API route
- Update ApiMetricsParams type definition
…detection
Add comprehensive alerting system for LLM API observability:
- Implement multi-window burn rate alerting following the Google SRE Workbook approach with 3 severity windows (5m/1m, 30m/3m page; 360m/30m ticket)
- Add Analytics Engine integration for time-series metrics storage with weighted sampling support for error rates and latency percentiles
- Implement KV-based alert deduplication with severity-aware suppression (pages suppress tickets for same dimension)
- Add Slack notification delivery with separate webhooks for pages/tickets
- Integrate recommended models API endpoint to determine page-eligible models
- Configure cron trigger for per-minute alert evaluation
- Add comprehensive test coverage for dedup logic and SLO configuration

The system tracks error rates (99.9% SLO) and latency p50/p90 thresholds, firing alerts only when both long and short windows exceed burn rate thresholds to reduce false positives.
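The multi-window rule can be illustrated with a small sketch. The window lengths and the 99.9% SLO come from the commit message; the burn rate thresholds (14.4, 6, 1) are borrowed from the SRE Workbook's example table and are assumptions here, as are all type and function names.

```typescript
type BurnRateWindow = {
  longMinutes: number;
  shortMinutes: number;
  burnRateThreshold: number; // assumed value, not taken from the PR
  severity: "page" | "ticket";
};

const SLO_TARGET = 0.999;

const WINDOWS: BurnRateWindow[] = [
  { longMinutes: 5, shortMinutes: 1, burnRateThreshold: 14.4, severity: "page" },
  { longMinutes: 30, shortMinutes: 3, burnRateThreshold: 6, severity: "page" },
  { longMinutes: 360, shortMinutes: 30, burnRateThreshold: 1, severity: "ticket" },
];

// Burn rate is the observed error rate divided by the error budget (1 - SLO).
function burnRate(errorRate: number, slo: number = SLO_TARGET): number {
  return errorRate / (1 - slo);
}

// A window fires only when BOTH its long and short lookbacks exceed the threshold.
function firingWindows(errorRateForMinutes: (minutes: number) => number): BurnRateWindow[] {
  return WINDOWS.filter(
    (w) =>
      burnRate(errorRateForMinutes(w.longMinutes)) >= w.burnRateThreshold &&
      burnRate(errorRateForMinutes(w.shortMinutes)) >= w.burnRateThreshold,
  );
}
```

Requiring the short window as well means an alert stops firing soon after the error rate recovers, even while the long window still reflects the earlier spike.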
…variable
- Replace toUInt64 type casting with IF expressions for better readability in error rate and slow request queries
- Change query format from JSONEachRow to JSON for consistency
- Rename O11Y_APP_BASE_URL to O11Y_API_BASE_URL for clarity
- Update all references across configuration, tests, and type definitions
Configure custom domain routing for o11y.kiloapps.io in the Cloudflare Worker configuration to enable direct access to the observability service through the custom domain.
Add clientName parameter to alert deduplication functions to ensure alerts are tracked separately per client. This prevents alerts for the same provider:model combination from being incorrectly suppressed across different clients.
- Update alertKey() to include clientName in key generation
- Add clientName parameter to shouldSuppress() and recordAlertFired()
- Update all call sites in evaluate.ts to pass client_name
- Add test coverage for client-specific alert suppression
- Remove null return from effectiveSeverity() as it always returns a severity
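A simplified sketch of client-scoped deduplication follows. It uses an in-memory Map as a stand-in for the KV namespace and omits expiry; the key format and the function signatures are assumptions that only mirror the names mentioned in the commit message.

```typescript
type Severity = "page" | "ticket";
type AlertDimension = { clientName: string; provider: string; model: string };

// Including clientName keeps two clients on the same provider:model from sharing a key.
function alertKey(severity: Severity, d: AlertDimension): string {
  return `alert:${severity}:${d.clientName}:${d.provider}:${d.model}`;
}

// In-memory stand-in for the KV store; a real store would also apply a TTL.
const firedAlerts = new Map<string, string>();

function recordAlertFired(severity: Severity, d: AlertDimension, now: Date = new Date()): void {
  firedAlerts.set(alertKey(severity, d), now.toISOString());
}

function shouldSuppress(severity: Severity, d: AlertDimension): boolean {
  // Same severity already fired for this dimension.
  if (firedAlerts.has(alertKey(severity, d))) return true;
  // Severity-aware rule: an active page suppresses a ticket for the same dimension.
  return severity === "ticket" && firedAlerts.has(alertKey("page", d));
}
```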
Pull request overview
This pull request adds a new Cloudflare Worker for observability (o11y) that implements API metrics ingestion and SLO-based alerting for the Kilo AI platform.
Changes:
- Adds a new Cloudflare Worker that ingests API metrics from the Kilo gateway
- Implements multi-window burn-rate alerting based on Google SRE Workbook practices for error rates and latency
- Integrates with PostHog for analytics and Slack for alert notifications
- Adds a new API endpoint /api/recommended-models to expose recommended models for alert filtering
Reviewed changes
Copilot reviewed 29 out of 32 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/lib/o11y/api-metrics.server.ts | Helper functions for extracting API metrics from OpenAI-compatible responses |
| src/lib/config.server.ts | Adds configuration variables for o11y service |
| src/app/api/recommended-models/route.ts | New API endpoint exposing recommended model list |
| src/app/api/openrouter/[...path]/route.ts | Integrates API metrics emission into the gateway proxy |
| cloudflare-o11y/* | Complete o11y worker implementation with alerting, querying, and notification logic |
| .github/workflows/* | CI/CD workflows for deploying the o11y worker |
| pnpm-workspace.yaml | Adds cloudflare-o11y to workspace |
Code Review Summary
Status: 2 Issues Found | Recommendation: Address before merge
Files Reviewed (5 files)
- Store alert timestamps in ISO 8601 format instead of Unix epoch for better readability
- Update authentication error message to be more generic and security-conscious
Wrap JSON.parse in try-catch to prevent crashes when KV cache contains invalid JSON data. Falls through to network fetch on parse errors.
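The fall-through behavior could be sketched like this. Both helper names and the fetchFresh callback are hypothetical; the cached value is passed in as a string so the sketch does not depend on any particular KV API.

```typescript
// Returns the parsed value, or undefined when the cached text is not valid JSON.
function tryParseJson<T>(raw: string): T | undefined {
  try {
    return JSON.parse(raw) as T;
  } catch {
    return undefined;
  }
}

// Prefers the cached copy, but treats a corrupt entry the same as a cache miss.
async function readThroughCache<T>(
  cached: string | null,
  fetchFresh: () => Promise<T>,
): Promise<T> {
  if (cached !== null) {
    const parsed = tryParseJson<T>(cached);
    if (parsed !== undefined) return parsed;
  }
  return fetchFresh();
}
```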
Update worker configuration types generated by wrangler with workerd@1.20260128.0. Adds readonly exports property to ExecutionContext and DurableObjectState interfaces, and removes unnecessary eslint-disable comments from AIGateway type definitions.
- Add type guards to safely access tool.function.name and tool.custom.name properties
- Move toolsAvailable and toolsUsed extraction after tool repair logic to ensure accurate metrics
- Add fetch-depth: 2 to checkout steps in deploy workflow for proper path filtering