Note: Mark vague entries that lack a measurable target, interface specification, or test strategy with `<!-- TODO: add measurable target, interface spec, test strategy -->`.
This document covers implementation-specific future enhancements for the API module (src/api/ and include/api/), which exposes ThemisDB over HTTP (stub in src/api/http_server.cpp; live implementation in src/server/http_server.cpp), GraphQL (graphql.cpp + graphql_ws_handler.cpp), WebSocket CDC streaming (ws_handler.cpp), gRPC (grpc_server.cpp + themisdb_grpc_service.cpp), geospatial index hooks (geo_index_hooks.cpp), request tracing (tracing_middleware.cpp), and OTLP span export (otlp_exporter.cpp). Supporting header-only components in include/api/ — rate_limiter.h, audit_logger.h, graphql_cache.h, persisted_queries.h, websocket_handler.h, grpc_bridge.h — are equally in scope. Enhancements to AQL execution, storage engines, or authentication internals are out of scope; only the API surface, transport layer, and middleware pipeline are covered here.
- [x] REST endpoint signatures introduced in v1.x must remain backward-compatible; new capabilities are added via versioned prefixes (`/v2/`) or opt-in headers, not breaking changes to existing `/v1/` routes. (Enforced — `RouteVersionRouter` redirects unversioned paths 301 to `/v1/`; all new functionality is under `/v2/`.)
- [x] The GraphQL parser in `graphql.cpp` uses `QueryLimits::defaults()` for depth/complexity guards; any new field resolver must enforce those limits to prevent query amplification. (Enforced — `QueryLimits` is passed to `Parser::parse()` in all call sites; `QueryLimits::production()` disables introspection.)
- [x] TLS is mandatory for all production transports; new WebSocket and gRPC transports must share the same TLS context as the existing HTTP listener. (Enforced — `GrpcApiServer` uses `grpc::SslServerCredentials` from the same PEM paths; WebSocket upgrades go through the Beast TLS acceptor.)
- [x] Auth middleware (`src/auth/`) is a hard dependency; no new transport may bypass the JWT/JWKS validation enforced by `jwt_validator.cpp`. (Enforced — `WsChangeHandler::validate()` and the gRPC interceptor both call `AuthMiddleware::authorize` before any data is exchanged.)
- [x] `GrpcApiServer::start()` must not hold `mutex_` across blocking operations. (Fixed v1.9.0 — mutex released before `BuildAndStart()`; lock re-acquired afterwards.)
- [x] `GrpcApiServer::stop()` must specify a shutdown deadline. (Fixed v1.9.0 — 30-second hard deadline passed to `server_->Shutdown()`.)
- [x] GraphQL `$variable` references in field arguments must be resolved at execution time. (Fixed v2.0.0 — `Value::VariableRef` type + `Executor::resolveValue()` + default-value merge in `executeOperation()`; see `include/api/graphql.h`, `src/api/graphql.cpp`.)
| Interface | Consumer | Notes |
|---|---|---|
| `graphql::Parser::parse()` | GraphQL HTTP handler | `QueryLimits` must be configurable per-tenant |
| `geo_index_hooks` | REST geo query endpoints | Hook registration must be idempotent for hot-reload |
| `auth::JWTValidator` | All HTTP/WS/gRPC handlers | Must propagate tenant ID into request context |
| `cdc::Changefeed` | Planned WebSocket change-stream endpoint | Requires `Changefeed::subscribe()` returning an async event iterator |
| `aql::LLMAQLHandler` | AQL execution endpoint | Streaming result set needed for `/v2/query/stream` |
| `IGRPCBridge` (`include/api/grpc_bridge.h`) | gRPC bridge consumers | Interface remains an extension point; runtime wiring is factory-driven via `ThemisDBGrpcServiceFactory` |
Priority: High Target Version: v1.7.0
graphql.cpp implements a full parser and query executor but lacks mutation resolvers, schema introspection (__schema, __type), and subscription over WebSocket. Complete the schema to cover documents, graph edges, vector search, and geospatial queries; add subscription operation support backed by cdc::Changefeed.
Implementation Notes:
- [x] Add a `SchemaRegistry` class to `graphql.cpp`; auto-build it from registered `TypeDefinition` objects at server start.
- [x] Implement `__schema` and `__type` introspection resolvers; required by all major GraphQL clients (Apollo, Relay).
- [x] Subscription transport: use Boost.Beast WebSocket upgrades; create `graphql_ws_handler.cpp` implementing the `graphql-transport-ws` protocol (not the legacy `subscriptions-transport-ws`).
- [x] Wire `cdc::Changefeed::subscribe(filter)` as the event source for `subscription { onChange(collection: "...") { ... } }`. Implemented: `Changefeed::subscribe(SubscriptionFilter, SubscriptionCallback)` + `SubscriptionHandle` RAII type in `changefeed.h/cpp`; wired in `GraphQLWsHandler::handleSubscribe()` via `extractOnChangeCollection()`.
- [x] Enforce `QueryLimits::maxSubscriptions` per connection to prevent fan-out DoS.
- [ ] In `graphql.h`, the `Parser` class explicitly documents "Not yet supported: Fragments, Directives, Inline fragments." Implement `parseFragmentDefinition()` and `parseInlineFragment()` in `graphql.cpp` — without fragment support, clients using Apollo's automatic persisted query fragments or any Relay-style fragment composition will fail at parse time.
- [ ] `graphql.h::Parser::error()` is documented as deprecated ("Deprecated: Use `Result<T>` return types instead of `error()` method"), but the method still exists in the class definition. Remove it after migrating all call sites in `graphql.cpp` to return `themis::Result<T>` with structured `ParseError` objects, eliminating the dual error-reporting path.
- [ ] `Schema::introspect()` in `graphql.cpp` only handles the `__schema` and `__type` fields. The GraphQL June 2018 spec also requires `__typename` on every composite type, plus the `__Field`, `__InputValue`, `__EnumValue`, and `__Directive` meta-types. Add these to `Schema::introspect()` so introspection-based tooling (code generators, schema-diffing tools) works fully.
- [ ] `Executor::executeSelections()` in `graphql.cpp` resolves fields serially in a range-for loop. For independent sibling fields that each invoke storage I/O, this means sequential round-trips. Add parallel field resolution via `std::async` or a small task graph; guard it behind a `QueryLimits::parallel_fields_enabled` flag to allow gradual rollout.
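The parallel-resolution idea in the last item above can be sketched as follows. This is a simplified model, not the real `Executor`: `FieldResolver` and the flattened field list are hypothetical stand-ins, and only the `QueryLimits::parallel_fields_enabled` flag name comes from the notes. When the flag is off, the current serial range-for behaviour is preserved; when it is on, each sibling resolver runs in its own `std::async` task and results are collected in declaration order.

```cpp
#include <functional>
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graphql.cpp field resolver.
using FieldResolver = std::function<std::string()>;

struct QueryLimits {
    bool parallel_fields_enabled = false;  // rollout flag from the notes above
};

// Resolve independent sibling fields either serially (current behaviour)
// or concurrently via std::async when the flag is set.
std::map<std::string, std::string>
executeSelections(const std::vector<std::pair<std::string, FieldResolver>>& fields,
                  const QueryLimits& limits) {
    std::map<std::string, std::string> result;
    if (!limits.parallel_fields_enabled) {
        for (const auto& [name, resolve] : fields)
            result[name] = resolve();  // one storage round-trip per field
        return result;
    }
    // Launch every resolver eagerly, then collect in declaration order so
    // the response field ordering stays deterministic.
    std::vector<std::pair<std::string, std::future<std::string>>> pending;
    pending.reserve(fields.size());
    for (const auto& [name, resolve] : fields)
        pending.emplace_back(name, std::async(std::launch::async, resolve));
    for (auto& [name, fut] : pending)
        result[name] = fut.get();
    return result;
}
```

A production version would cap the number of in-flight tasks (e.g., against `QueryLimits` complexity budgets) rather than launching one thread per field.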
Performance Targets:
- GraphQL parse + validate + execute for a 10-field document query in < 2 ms (p99) under 500 concurrent HTTP/2 connections.
- Subscription event delivery latency < 50 ms from `Changefeed` event emission to WebSocket frame sent.
API Sketch:
```graphql
# New subscription type (graphql.cpp SchemaRegistry)
type Subscription {
  onChange(collection: String!, filter: ChangeFilter): ChangeEvent!
}

type ChangeEvent {
  sequence: Int!
  type: ChangeType!
  key: String!
  document: JSON
  timestampMs: Int!
}
```
Priority: High Target Version: v1.7.0
Add a dedicated WebSocket endpoint /v2/changes that multiplexes multiple cdc::Changefeed subscriptions over a single connection. This is distinct from the GraphQL subscription path and targets clients that need raw change events without the GraphQL envelope.
Implementation Notes:
- [x] Create `ws_handler.cpp` (`src/api/ws_handler.cpp`); register the route `WS /v2/changes` in `src/server/http_server.cpp`.
- [x] Frame format: newline-delimited JSON, each frame matching `Changefeed::ChangeEvent::toJson()` output. (`WebSocketSession::pollCDCEvents` emits JSON via `ev.toJson()`/`buildEventFrame`; the legacy path uses `cdc_message["type"]="cdc_event"`.)
- [x] Clients subscribe/unsubscribe by sending `{"action":"subscribe","collection":"orders","filter":{"type":"PUT"}}` control frames. (`WebSocketSession::processMessage` handles `type="subscribe"`/`"unsubscribe"` for `/v2/changes`; `CdcWebSocketHandler::handleFrame` handles `action="subscribe"`/`"unsubscribe"`/`"ack"` for `/v2/cdc/stream`.)
- [x] Implement per-connection back-pressure: if the outbound frame queue exceeds 1,000 entries, close with `1011 Internal Error` and log the tenant/connection ID. (`WebSocketSession::kMaxQueueDepth = 1000`.)
- [x] Reuse the `auth::JWTValidator` middleware already wired for HTTP; extract the Bearer token from the WebSocket upgrade `Authorization` header. (`WsChangeHandler::validate()` requires the `cdc:subscribe` scope.)
- [ ] `WsChangeHandler::validate()` in `ws_handler.cpp` parses query-string parameters (`from_sequence`, `key_prefix`) with ad-hoc string search using `std::string::find`. URL-encoded characters (e.g., `key_prefix=orders%3A`) are never decoded, so clients that percent-encode the query string will receive incorrect filter values. Replace this with a proper URL-decoding step (e.g., using `boost::urls` or a small `url_decode()` utility) before extracting parameter values.
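The `url_decode()` utility suggested in the last open item could look like the sketch below. It is a minimal stdlib-only percent-decoder, not ThemisDB code: it decodes `%XX` escapes and maps `+` to a space, which matches common query-string conventions; whether `+` handling is wanted for `key_prefix` values is a design choice for the real implementation.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Minimal percent-decoding helper. Malformed escapes (e.g., a trailing "%3")
// are passed through literally rather than rejected.
std::string url_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            const std::string hex = in.substr(i + 1, 2);
            out.push_back(static_cast<char>(std::strtol(hex.c_str(), nullptr, 16)));
            i += 2;                      // skip the two consumed hex digits
        } else if (in[i] == '+') {
            out.push_back(' ');          // form-encoding convention
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

With this in place, `key_prefix=orders%3A` decodes to the intended `orders:` filter value before parameter extraction.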
Performance Targets:
- ≥ 10,000 concurrent WebSocket connections on a single node with < 50 MB additional RSS.
- Frame delivery latency p99 < 30 ms under 5,000 events/sec aggregate throughput.
Priority: High Target Version: v1.8.0
Current REST routes use unversioned paths (e.g., /documents/{id}). Introduce a /v1/ prefix retroactively (with redirect from unversioned) and implement /v2/ routes that support bulk operations, streaming query results, and async job tracking.
Implementation Notes:
- [x] Add `RouteVersionRouter` middleware in `include/server/route_version_router.h`; unversioned paths redirect 301 to `/v1/`; wired in `src/server/http_server.cpp`.
- [x] `/v1/` routes: exact current behaviour; unversioned paths redirect 301 to `/v1/` via `RouteVersionRouter::getRedirectTarget()`.
- [x] `/v2/documents` — bulk insert endpoint accepting an `application/x-ndjson` body (newline-delimited JSON documents, up to 10,000 per request); implemented in `EntityApiHandler::handleBulkNdjson()`.
- [x] `/v2/query/stream` — SSE endpoint implemented via `QueryApiHandler::handleQueryStreamSse()`; registered as `Route::QueryStreamSseGet`.
- [x] `/v2/jobs/{id}` — async job status for long-running queries; job state is stored in `cache::AdaptiveQueryCache` with TTL = 1 hour.
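The versioning rule above (unversioned paths 301-redirect to `/v1/`, versioned paths pass through) can be modelled with a few lines. This is a simplified stand-in for `RouteVersionRouter::getRedirectTarget()`, not the actual implementation in `include/server/route_version_router.h`; the real router presumably also handles query strings and method-specific rules.

```cpp
#include <optional>
#include <string>

// Returns the 301 redirect target for an unversioned path, or nullopt if the
// path is already versioned and should be dispatched as-is.
std::optional<std::string> getRedirectTarget(const std::string& path) {
    if (path.rfind("/v1/", 0) == 0 || path.rfind("/v2/", 0) == 0)
        return std::nullopt;          // already versioned: no redirect
    return "/v1" + path;              // e.g. /documents/42 -> /v1/documents/42
}
```

The middleware would emit `301 Moved Permanently` with this value in the `Location` header whenever the function returns a target.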
Performance Targets:
- Bulk insert of 10,000 256-byte documents in < 500 ms end-to-end (network excluded).
- SSE streaming first-byte latency < 5 ms after query planning completes.
Priority: High Target Version: v2.0.0
themisdb_grpc_service.cpp is now factory-wired for core operations and query execution. Remaining enhancement scope is focused on advanced search parity and hardening of batch semantics.
Implementation Notes:
- [x] Create `src/api/grpc_server.cpp`; gRPC C++ server using `grpc::ServerBuilder` (synchronous dispatch model, consistent with the rest of the codebase).
- [x] Reuse existing service-layer infrastructure via `GrpcApiServer::registerService()`; no business-logic duplication — service implementations are registered externally.
- [x] TLS: `grpc::SslServerCredentials` using the same PEM cert/key pair as the Beast HTTP listener; fail closed on cert load failure.
- [x] Expose the gRPC reflection service in debug builds only, to prevent schema leakage in production.
- [x] `ExecuteAQL` factory wiring: `ThemisDBGrpcServiceFactory` now injects `AQLEngine*` and delegates execution through service implementations.
- [x] `StreamAQL` server-streaming path: streaming AQL execution is wired when the query engine is present; the service keeps `UNIMPLEMENTED` fail-fast semantics when the dependency is absent.
- [ ] Advanced search RPC parity: `VectorSearch`, `FilteredVectorSearch`, `HybridSearch`, and `FullTextSearch` still require complete backend feature wiring across all deployment profiles; current behaviour is dependency/feature-gated and may return `UNIMPLEMENTED` where optional engines are missing.
- [ ] Hard-coded document version in `CreateDocument` and `UpdateDocument` (`themisdb_grpc_service.cpp`): both handlers unconditionally set `resp->set_version(1)`, regardless of whether the document already existed. Add a real version counter sourced from the storage layer (e.g., a RocksDB sequence number or a dedicated version key) so optimistic-concurrency clients can detect conflicting updates.
- [ ] `BatchWrite` silent partial failures (`themisdb_grpc_service.cpp`): the loop over `req->upserts()` increments `upserted` only when `db_->put(key, body)` returns true, but the final response always sets `resp->set_success(true)`. If some puts fail (e.g., storage full), the caller receives a success response with an `upserted_count` lower than the number of requested writes and no error code. Change this: if `upserted_count != req->upserts_size()`, set `success = false` and include error details.
- [ ] `BatchWrite`/`BatchRead` lack input bounds checks: there is no validation of the number of documents in `req->upserts()` or keys in `req->keys()`. A single request can contain arbitrarily many items, leading to unbounded memory allocation. Add a hard upper limit (e.g., 10,000 items) and return a `RESOURCE_EXHAUSTED` gRPC status on violation.
- [x] `GrpcApiServer::start()` mutex-scope hardening: blocking startup work is performed outside the critical section; the lock is used only for the state commit.
- [x] `GrpcApiServer::stop()` bounded shutdown: a shutdown deadline is set and lock hold duration is minimized to avoid state-query contention.
- [ ] `GrpcServerConfig::max_message_size_bytes` configurability: the default remains compile-time; expose a runtime config key (e.g., `grpc.max_message_size_mb`) for operator tuning.
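The two `BatchWrite` hardening items above (no silent partial failure, hard input bound) can be combined in one handler shape. The sketch below is deliberately decoupled from gRPC and the real proto: `Code`, `BatchWriteResult`, and the `put` callback are illustrative stand-ins for `grpc::Status`, the response message, and `db_->put()`, and `kMaxBatchItems` mirrors the suggested 10,000-item limit.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class Code { OK, RESOURCE_EXHAUSTED };  // stand-in for gRPC status codes

struct BatchWriteResult {
    Code code = Code::OK;
    bool success = false;
    std::size_t upserted = 0;
    std::string error;
};

constexpr std::size_t kMaxBatchItems = 10'000;  // hard input bound from the notes

BatchWriteResult batchWrite(
        const std::vector<std::pair<std::string, std::string>>& upserts,
        const std::function<bool(const std::string&, const std::string&)>& put) {
    BatchWriteResult r;
    // Bounds check first: reject oversized batches before allocating anything.
    if (upserts.size() > kMaxBatchItems) {
        r.code = Code::RESOURCE_EXHAUSTED;
        r.error = "batch exceeds " + std::to_string(kMaxBatchItems) + " items";
        return r;
    }
    for (const auto& [key, body] : upserts)
        if (put(key, body)) ++r.upserted;
    // No silent partial failure: success only if every write landed.
    r.success = (r.upserted == upserts.size());
    if (!r.success)
        r.error = "partial failure: " + std::to_string(r.upserted) + "/" +
                  std::to_string(upserts.size()) + " writes applied";
    return r;
}
```

In the real service this loop would additionally go through a `RocksDB::WriteBatch` (see the atomicity item in the security checklist) so that a mid-loop crash cannot leave a half-applied batch.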
Performance Targets:
- gRPC unary `GetDocument`: < 1 ms added latency vs. the equivalent REST call (same process).
- gRPC streaming `ExecuteQuery` sustains ≥ 100,000 rows/sec on localhost.
Priority: Medium Target Version: v1.7.0
All inbound requests must carry or receive an X-Correlation-ID header that propagates through the entire call stack (API → AQL → storage → cache) and appears in all log lines and error responses.
Implementation Notes:
- [x] Add `TracingMiddleware` in `src/api/tracing_middleware.cpp`; generate a UUID v4 if `X-Correlation-ID` is absent; inject it into the thread-local `RequestContext`. (Implemented — `TracingMiddleware::processRequest()` uses a per-thread `boost::uuids::random_generator`.)
- [x] Forward `RequestContext::correlationId` to the `include/utils/logger.h` log macros via a structured field (`correlation_id`). (Implemented — `utils::Logger::setTraceContext(corr_id)` called in `processRequest()`.)
- [x] Echo back `X-Correlation-ID` in all responses, including errors and SSE streams (implemented in `HttpServer::applyGovernanceHeaders()`).
- [x] Export span data to an OpenTelemetry collector via the OTLP HTTP exporter (configurable endpoint in `config/networking/`). Implemented in `include/api/otlp_exporter.h` + `src/api/otlp_exporter.cpp` (async queue + libcurl POST, OTLP JSON format); `TracingMiddleware` extended with `finishSpan()` and an optional `OtlpExporter*`; configuration in `config/networking/otlp.yaml`.
- [x] Decision: retain the proprietary `X-Correlation-ID` as the primary correlation header; the OTLP exporter uses the correlation-ID value as the OTLP `traceId`. A future W3C `traceparent` bridge can be added when SDK interoperability is required.
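For readers without the Boost context, the generate-if-absent behaviour can be shown with a stdlib-only stand-in. The real middleware uses `boost::uuids::random_generator`; this sketch produces an RFC 4122 version-4 UUID from `<random>` and keeps it in a thread-local variable standing in for `RequestContext::correlationId`.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <string>

// Stdlib stand-in for the per-thread boost::uuids::random_generator:
// 122 random bits, version nibble forced to 4, variant bits to 10.
std::string generateCorrelationId() {
    thread_local std::mt19937_64 rng{std::random_device{}()};
    std::uniform_int_distribution<std::uint64_t> dist;
    std::uint64_t hi = dist(rng), lo = dist(rng);
    hi = (hi & 0xFFFFFFFFFFFF0FFFULL) | 0x0000000000004000ULL;  // version 4
    lo = (lo & 0x3FFFFFFFFFFFFFFFULL) | 0x8000000000000000ULL;  // variant 10
    char buf[37];
    std::snprintf(buf, sizeof(buf), "%08llx-%04llx-%04llx-%04llx-%012llx",
                  (unsigned long long)(hi >> 32),
                  (unsigned long long)((hi >> 16) & 0xFFFF),
                  (unsigned long long)(hi & 0xFFFF),
                  (unsigned long long)(lo >> 48),
                  (unsigned long long)(lo & 0xFFFFFFFFFFFFULL));
    return std::string(buf);
}

// Stand-in for the thread-local RequestContext field.
thread_local std::string g_correlation_id;

// Generate-if-absent, as processRequest() does for X-Correlation-ID.
const std::string& ensureCorrelationId(const std::string& incoming_header) {
    g_correlation_id = incoming_header.empty() ? generateCorrelationId()
                                               : incoming_header;
    return g_correlation_id;
}
```

The per-thread generator avoids locking on the hot path, which is what keeps the middleware within the < 10 µs/request budget below.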
Performance Targets:
- Middleware overhead < 10 µs per request (UUID generation + thread-local write).
- Negligible correlation-ID collision probability for ≥ 1 billion requests (UUID v4: birthday-bound collision odds below 10⁻¹⁸ at that volume).
Priority: Medium Target Version: v2.1.0
otlp_exporter.cpp implements an async queue + background-thread OTLP/HTTP exporter using libcurl. Two structural inefficiencies limit throughput and reliability at production scale.
Implementation Notes:
- [x] New `CURL*` handle per flush batch (`otlp_exporter.cpp::flushBatch()`): every call to `flushBatch()` opened a new TCP connection via `curl_easy_init()` and cleaned up with `curl_easy_cleanup()` after the POST. Under the default flush interval (5 s) with 64-span batches this is infrequent, but if the batch interval is reduced or the collector is remote, connection setup becomes the dominant latency. Replace with a persistent `CURL*` handle created once in `start()` and reused across batches (set `CURLOPT_FORBID_REUSE=0L` and `CURLOPT_TCP_KEEPALIVE=1L`). Implemented: `curl_handle_` and `curl_headers_` members added to `OtlpExporter`; the handle is initialised once in `start()` with both options set, reused in `flushBatch()`, and cleaned up in `stop()` after the flush thread exits.
- [x] `queue_` used `std::vector` with `erase(begin, begin+n)` dequeue (`otlp_exporter.h` + `otlp_exporter.cpp::flushLoop()`): the internal span queue was a `std::vector<SpanData>` whose dequeue path called `queue_.erase(queue_.begin(), queue_.begin() + take_offset)`, which is O(n) because it shifts all remaining elements. Replace with `std::deque<SpanData>` or a fixed-size ring buffer to get O(1) pop-front at the cost of a trivial container change. Implemented: `queue_` changed to `std::deque<SpanData>`; `enqueue()` now uses `pop_front()` (O(1)) instead of `erase(begin())`; the drain path in `flushLoop()` was updated to use move iterators + `clear()`.
- [x] No retry on transient HTTP errors: `flushBatch()` now retries up to `max_export_retries` times (default 3) with exponential back-off (`retry_initial_delay_ms` doubles each attempt: 100 ms → 200 ms → 400 ms) for retriable HTTP status codes (429, 503) and transient curl transport errors, before dropping the batch and incrementing `dropped_count_`. Non-retriable HTTP errors (e.g., 400, 404, 500) still drop immediately. Both new fields are exposed in `OtlpExporterConfig` with defaults `max_export_retries = 3` and `retry_initial_delay_ms = 100`.
- [x] `droppedSpanCount` metric not exposed via Prometheus: `OtlpExporter::droppedSpanCount()` and `exportedSpanCount()` existed but were not wired to the Prometheus `/metrics` endpoint. Register `otlp_spans_exported_total` and `otlp_spans_dropped_total` counters in the Prometheus registry at `OtlpExporter::start()` time. Implemented: a `setPrometheusRegistry(shared_ptr<prometheus::Registry>)` method was added (guarded by `THEMIS_HAS_PROMETHEUS`); calling it before `start()` causes `start()` to register the `otlp_spans_exported_total` and `otlp_spans_dropped_total` counter families labelled by `service`; both counters are incremented alongside the atomic `exported_count_`/`dropped_count_` in `enqueue()` and `flushBatch()`.
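The deque-based drain described in the second item can be reduced to a few lines. This is a sketch under assumptions, not the actual `flushLoop()`: `SpanData` is trimmed to one field, and the locking that surrounds the real drain is omitted to isolate the container mechanics. Move iterators transfer the span payloads into the batch without copying, and erasing a prefix of a `std::deque` does not shift the tail the way `std::vector::erase` does.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <string>
#include <vector>

struct SpanData { std::string name; };  // trimmed stand-in for the real struct

// Take up to max_batch spans from the front of the queue, moving payloads
// out rather than copying them. Called from the flush loop under the queue
// mutex in the real exporter.
std::vector<SpanData> drainBatch(std::deque<SpanData>& queue, std::size_t max_batch) {
    const std::size_t take = std::min(max_batch, queue.size());
    std::vector<SpanData> batch(std::make_move_iterator(queue.begin()),
                                std::make_move_iterator(queue.begin() + take));
    // Prefix erase on a deque: no element shifting of the remaining tail.
    queue.erase(queue.begin(), queue.begin() + take);
    return batch;
}
```

The same shape works for a fixed-size ring buffer if bounded memory is preferred over a growable deque.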
Performance Targets:
- Span enqueue (hot path) < 500 ns per call (single lock acquire + deque push_back).
- Flush of 64 spans to a local OTLP collector < 5 ms end-to-end (reusing a persistent connection).
Priority: Medium Target Version: v2.0.0
include/api/rate_limiter.h implements a token-bucket rate limiter. Two structural issues limit correctness and scalability in long-running deployments.
Implementation Notes:
- [ ] `buckets_` map grows unbounded (`rate_limiter.h::RateLimiter::allow()`): every unique key passed to `allow()` creates a `Bucket` entry that is never removed. In production, keys are typically tenant IDs or IP addresses; a deployment running for weeks will accumulate thousands of stale buckets. Add a TTL-based eviction pass: in `allow()` (or a dedicated background sweep), remove buckets whose `last_refill` is older than 2 × window and whose `tokens >= capacity` (fully recharged means no active traffic).
- [ ] `OperationRateLimiter::allow()` holds the outer mutex while calling the inner `RateLimiter::allow()` (`rate_limiter.h`): `OperationRateLimiter::allow()` takes `mutex_` with a `std::lock_guard`, then calls `it->second->allow(key, cost)`, which in turn takes `RateLimiter::mutex_`. This is a two-mutex lock chain on every allowed request. Under high concurrency (e.g., 5,000 GraphQL requests/sec), this creates a mutex bottleneck on the outer lock. Replace the outer `std::mutex` with `std::shared_mutex` (shared lock for `allow()`/`remaining()`; exclusive lock only for `setLimit()`).
- [ ] `RateLimiter::allow()` calls `steady_clock::now()` inside the lock (`rate_limiter.h::Bucket::consume()`): `Bucket::refill()` calls `std::chrono::steady_clock::now()` while the outer `mutex_` is held. A clock syscall under a mutex adds unnecessary critical-section time. Compute `now` before acquiring the lock and pass it to `consume()`.
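Two of the fixes above (sampling the clock before taking the lock, and the TTL sweep over fully recharged buckets) can be combined in a condensed token-bucket sketch. Field and method names loosely follow the notes, but this is not the class in `include/api/rate_limiter.h`; in particular the real limiter would take a shared lock on read-mostly paths, which is omitted here for brevity.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

class RateLimiterSketch {
public:
    RateLimiterSketch(double capacity, double refill_per_sec)
        : capacity_(capacity), refill_per_sec_(refill_per_sec) {}

    bool allow(const std::string& key, double cost = 1.0) {
        const auto now = std::chrono::steady_clock::now();  // syscall outside the lock
        std::unique_lock lock(mutex_);
        auto& b = buckets_.try_emplace(key, Bucket{capacity_, now}).first->second;
        const double elapsed = std::chrono::duration<double>(now - b.last_refill).count();
        b.tokens = std::min(capacity_, b.tokens + elapsed * refill_per_sec_);
        b.last_refill = now;
        if (b.tokens < cost) return false;
        b.tokens -= cost;
        return true;
    }

    // TTL sweep: drop buckets idle longer than 2*window whose token count,
    // after the pending lazy refill, would be fully recharged (no traffic).
    std::size_t evictStale(std::chrono::seconds window) {
        const auto now = std::chrono::steady_clock::now();
        std::unique_lock lock(mutex_);
        std::size_t evicted = 0;
        for (auto it = buckets_.begin(); it != buckets_.end();) {
            const double elapsed =
                std::chrono::duration<double>(now - it->second.last_refill).count();
            const double recharged =
                std::min(capacity_, it->second.tokens + elapsed * refill_per_sec_);
            if (now - it->second.last_refill > 2 * window && recharged >= capacity_) {
                it = buckets_.erase(it);
                ++evicted;
            } else {
                ++it;
            }
        }
        return evicted;
    }

private:
    struct Bucket {
        double tokens;
        std::chrono::steady_clock::time_point last_refill;
    };
    double capacity_, refill_per_sec_;
    std::shared_mutex mutex_;  // shared locks would serve read paths like remaining()
    std::unordered_map<std::string, Bucket> buckets_;
};
```

Because the refill is lazy, the sweep recomputes the would-be token count instead of trusting the stored value, so an idle-but-depleted bucket is still evicted once enough time has passed.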
Performance Targets:
- `RateLimiter::allow()` throughput ≥ 1,000,000 calls/sec single-thread (vs. ~200,000 with the current nested locks).
- Stale bucket count bounded to ≤ 2× the number of active clients at any time.
Priority: Medium Target Version: v2.0.0 Status: ✅ Implemented
include/api/graphql_cache.h::ResponseCache::invalidatePattern() previously contained a `TODO: Implement pattern-based invalidation` comment. The implementation now performs selective eviction.
Implementation Notes:
- [x] `ResponseCache::invalidatePattern()` previously cleared the entire cache (`graphql_cache.h:290`): the method now iterates the cache and evicts only entries whose `collections` tag set contains the given pattern. `CachedResponse` has been extended with a `std::unordered_set<std::string> collections` field, and the generic `Cache<T>` template gained an `eraseIf(pred)` method for O(n) selective eviction.
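The tag-based eviction described above can be modelled compactly. This is a reduced sketch, not the `Cache<T>` in `graphql_cache.h`: entries carry the set of collections they touched, and `invalidatePattern()` delegates to a generic `eraseIf(pred)` that sweeps the map once.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>

struct CachedResponse {
    std::string body;
    std::unordered_set<std::string> collections;  // collections this result touched
};

template <typename T>
class Cache {
public:
    void put(const std::string& key, T value) { entries_[key] = std::move(value); }
    const T* get(const std::string& key) const {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }
    // O(n) sweep: erase every entry matching the predicate, return the count.
    std::size_t eraseIf(const std::function<bool(const T&)>& pred) {
        std::size_t erased = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (pred(it->second)) { it = entries_.erase(it); ++erased; }
            else                  { ++it; }
        }
        return erased;
    }
private:
    std::unordered_map<std::string, T> entries_;
};

// Evict only entries tagged with the invalidated collection.
std::size_t invalidatePattern(Cache<CachedResponse>& cache, const std::string& collection) {
    return cache.eraseIf([&](const CachedResponse& r) {
        return r.collections.count(collection) > 0;
    });
}
```

With tags populated at insert time, invalidating one collection leaves results for unrelated collections cached, which is what the ≤ 10% eviction target below measures.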
Performance Targets:
- Targeted invalidation of a single collection evicts ≤ 10% of cached entries when 10 distinct collections are active. ✅ Verified by the `GraphQLCache.InvalidatePatternPerformanceTarget` test.
Priority: Medium Target Version: v2.0.0
include/api/audit_logger.h::AuditLogger::log() holds mutex_ for the entire duration of calling all registered handlers. Handlers may write to disk, push to a network audit sink, or run regex matching — all while the mutex is held.
Implementation Notes:
- [x] `AuditLogger::log()` held `mutex_` during handler callbacks (`audit_logger.h::log()`): a `std::lock_guard<std::mutex> lock(mutex_)` was held for the entire body of `log()`, including the inner `for (const auto& handler : handlers_) { handler(entry); }` loop. File-writing or network-sending handlers would stall every concurrent API thread trying to emit an audit entry. Decouple: copy the handlers vector under the lock (O(n) pointer copies), release the lock, then invoke the handlers outside the critical section. The buffer append (also inside the lock) is already fast and should remain protected. (Fixed — `log()` now copies `handlers_` under a scoped lock, releases the lock, then invokes each handler; buffer append and stats update remain protected by a second scoped lock.)
- [x] The in-memory audit buffer is not persistent (`audit_logger.h`): `buffer_` (a circular in-memory vector) is lost on process restart. Add an optional file-backed `AuditLogHandler` that appends newline-delimited JSON audit entries to a configurable path, and register it by default when `config/audit.yaml` specifies `persistence: file`. (Implemented — `FileAuditLogHandler` class added to `audit_logger.h`; the `AuditLogger::addFileHandler(path)` convenience method registers a JSONL-appending handler; `config/audit.yaml` now contains a `persistence:` section with `backend: none|file` and `file_path` settings.)
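The copy-then-invoke pattern from the first item reads as follows in isolation. This is a condensed stand-in for the real `AuditLogger`: entry and handler types are trimmed, but the lock scopes mirror the fix, the buffer append stays under the mutex while handler invocation happens outside it.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

struct AuditEntry { std::string action; };          // trimmed stand-in
using AuditHandler = std::function<void(const AuditEntry&)>;

class AuditLoggerSketch {
public:
    void addHandler(AuditHandler h) {
        std::lock_guard<std::mutex> lock(mutex_);
        handlers_.push_back(std::move(h));
    }
    void log(const AuditEntry& entry) {
        std::vector<AuditHandler> handlers;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            buffer_.push_back(entry);   // fast append stays under the lock
            handlers = handlers_;       // O(n) copy of handler objects
        }
        for (const auto& h : handlers)  // slow sinks run without the mutex held
            h(entry);
    }
    std::size_t bufferSize() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffer_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<AuditHandler> handlers_;
    std::vector<AuditEntry> buffer_;
};
```

One consequence worth noting: a handler removed concurrently with `log()` may still receive the in-flight entry, since the copy was taken before the removal, which is the usual trade-off of this pattern.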
Priority: High Target Version: v1.8.0
graphql_ws_handler.cpp::handleSubscribe() captures a raw GraphQLWsHandler* (self) pointer inside the CDC callback lambda that is passed to Changefeed::subscribe(). The SubscriptionHandle RAII type should cancel the subscription on destruction, but the safety of this interaction depends on CDC correctly serialising the callback teardown before the handle destructor returns.
Implementation Notes:
- [x] Raw `self` pointer captured in the CDC callback (`graphql_ws_handler.cpp::handleSubscribe()`): the lambda `[self, sub_id](const themis::Changefeed::ChangeEvent& ev) { ... std::lock_guard<std::mutex> lk(self->mutex_); self->pending_frames_.push_back(frame); }` is invoked by the CDC system on its own thread. If the CDC implementation allows callbacks to fire after `SubscriptionHandle` destruction (even briefly), this is a use-after-free. Add a `std::shared_ptr<std::atomic<bool>>` "alive" flag shared between the handler and the lambda; the lambda checks it before dereferencing `self`, and the flag is set to false in `GraphQLWsHandler::reset()` before subscriptions are cleared. (Implemented — `alive_` member added to the header; the constructor initialises it to `true`; `reset()` stores `false` with `memory_order_release` before `subscriptions_.clear()`; the lambda captures `alive` by value and loads it with `memory_order_acquire` before dereferencing `self`.)
- [x] Missing step 2 in the `handleSubscribe()` comment sequence (`graphql_ws_handler.cpp`): the comment block labels "step 1" (reject duplicate IDs + enforce max_subscriptions) and "step 3" (parse payload), with no "step 2". This indicates a planned intermediate validation step (likely query-variable type-checking against the schema) was omitted. Add schema-level argument type validation: verify that `variables` provided in the payload match the declared `VariableDefinition` types in the parsed operation before registering the subscription. (Implemented — a `validateVariables(const graphql::Operation&, const nlohmann::json&)` private static helper validates required/non-null presence, null-value legality, list vs. scalar shape, and built-in scalar type matching (String/ID/Int/Float/Boolean). Called in step 2 of `handleSubscribe()`, after parse/operation-type validation and before subscription registration.)
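The alive-flag shape from the first item can be isolated into a small factory. This is a model, not the real handler: `HandlerState` stands in for `GraphQLWsHandler`, the string frame for `ChangeEvent`, and the per-handler mutex is omitted. The flag is shared by `shared_ptr` so it outlives the handler, and the acquire/release pairing matches the ordering described above. Note that this guards against late callbacks only if the CDC layer serialises callback teardown, which is the same caveat the section itself raises.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <string>
#include <vector>

struct HandlerState {                       // stand-in for GraphQLWsHandler
    std::vector<std::string> pending_frames;
};

std::function<void(const std::string&)>
makeCdcCallback(HandlerState* self, std::shared_ptr<std::atomic<bool>> alive) {
    // 'alive' is captured by value: the flag's lifetime is independent of
    // the handler, so the load itself is always safe.
    return [self, alive](const std::string& frame) {
        if (!alive->load(std::memory_order_acquire))
            return;                         // handler torn down: drop the event
        self->pending_frames.push_back(frame);  // safe only while alive
    };
}
```

Teardown (`reset()` in the real code) stores `false` with `memory_order_release` before clearing subscriptions, so a callback that observes `true` is guaranteed to see the handler state published before the flag flip.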
Performance Targets:
- Zero use-after-free races under 10,000 concurrent subscription setup/teardown cycles.
Priority: Low Target Version: v2.1.0
include/api/grpc_bridge.h defines a pure-virtual IGRPCBridge interface and supporting plain-data structs (ServiceDescriptor, GRPCRequest, GRPCMetadata) for registering and routing gRPC services. No concrete implementation is registered anywhere in the codebase.
Implementation Notes:
- [ ] `IGRPCBridge` has no concrete implementation (`grpc_bridge.h`): the interface exposes the pure-virtual methods `registerService()`, `route()`, `getMetadata()`, and `listServices()`. Implement `GrpcBridgeImpl` in a dedicated API bridge implementation file (planned) that holds a `std::unordered_map<std::string, ServiceDescriptor>` guarded by a `std::shared_mutex` and delegates routing to `GrpcApiServer::registerService()`.
- [ ] `IGRPCBridge` has no integration tests: add a dedicated gRPC bridge test target exercising service registration, duplicate-name rejection, and metadata lookup.
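A minimal shape for the planned `GrpcBridgeImpl` could look like this. It is a sketch under assumptions: `ServiceDescriptor` is reduced to two fields (the real structs live in `grpc_bridge.h`), the `IGRPCBridge` inheritance and the delegation to `GrpcApiServer::registerService()` are omitted, and only the `shared_mutex`-guarded registry with duplicate-name rejection is shown.

```cpp
#include <cstddef>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

struct ServiceDescriptor {      // reduced stand-in for the grpc_bridge.h struct
    std::string name;
    std::string endpoint;
};

class GrpcBridgeImpl {
public:
    // Returns false if a service with the same name is already registered.
    bool registerService(const ServiceDescriptor& desc) {
        std::unique_lock lock(mutex_);
        return services_.try_emplace(desc.name, desc).second;
    }
    // Lookup takes a shared lock so concurrent routing does not serialize.
    std::optional<ServiceDescriptor> route(const std::string& name) const {
        std::shared_lock lock(mutex_);
        auto it = services_.find(name);
        if (it == services_.end()) return std::nullopt;
        return it->second;
    }
    std::size_t serviceCount() const {
        std::shared_lock lock(mutex_);
        return services_.size();
    }
private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, ServiceDescriptor> services_;
};
```

The duplicate-rejection behaviour shown here is also exactly what the proposed bridge integration tests would assert.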
| Test Type | Coverage Target | Notes |
|---|---|---|
| Unit | > 80% of new code | Test new `graphql::Parser` resolvers with `QueryLimits` boundary cases; mock `Changefeed` for subscription tests |
| Integration | All `/v1/` routes ≥ 95% | `tests/test_api_integration.cpp`; add WebSocket client tests for `/v2/changes` |
| Performance | Regression ≤ 5% on existing endpoints | Benchmark with `wrk` at 500 concurrent connections; alert on p99 regression |
| gRPC stub coverage | Advanced search and stream RPCs have integration tests | `tests/test_themisdb_grpc_service.cpp`; use `grpc::testing::MockServerWriter` for `StreamAQL` |
| Metric | Current | Target | Method |
|---|---|---|---|
| GraphQL parse+execute (10-field query) | ~5 ms (estimate) | < 2 ms p99 | `tests/test_graphql_variables.cpp` + dedicated benchmark task |
| WebSocket concurrent connections | 0 (not implemented) | ≥ 10,000 | Load test with k6 |
| Bulk insert 10K docs via `/v2/documents` | N/A | < 500 ms | `benchmarks/bench_api_endpoints.cpp` |
| Correlation ID middleware overhead | N/A | < 10 µs/req | Microbenchmark in `benchmarks/bench_api_endpoints.cpp` |
| OTLP span flush (64 spans, persistent conn) | N/A | < 5 ms | Planned OTLP microbenchmark target |
| `RateLimiter::allow()` throughput | ~200K calls/sec (est.) | ≥ 1M calls/sec | Microbenchmark after `shared_mutex` migration |
- [x] All WebSocket upgrade requests must be validated by `auth::JWTValidator` before the upgrade handshake completes; reject with HTTP 401 before the protocol switch. (`WsChangeHandler::validate()` checks the Bearer token / JWT via `AuthMiddleware::authorize` with the `cdc:subscribe` scope before any handshake.)
- [x] GraphQL `__schema` introspection disabled via `QueryLimits::allow_introspection = false`; the `QueryLimits::production()` factory sets this to `false`; enforced in `Parser::parseField()`. Expose a config flag in `config/networking/` when a configuration layer is added.
- [x] Rate-limiting middleware (`RateLimitingMiddleware`) is applied to all `/v2/` routes via `HttpServer::checkRateLimit()`; `/v2/documents` has a tighter per-endpoint override (50% of default capacity) to prevent bulk-insert abuse.
- [ ] `QueryAllowList` disabled by default (`include/api/persisted_queries.h::QueryAllowList`): `enabled_ = false` in the default constructor. In production deployments the allow-list should be enforced to prevent ad-hoc query injection. Document the activation path (`QueryAllowList::instance().setEnabled(true)`) in the operations runbook and add a startup check that logs a `THEMIS_WARN` if the allow-list is disabled in a production build (detected via `NDEBUG`).
- [ ] `BatchWrite` in the gRPC service has no atomicity guarantee (`themisdb_grpc_service.cpp`): individual document writes in `BatchWrite` are not wrapped in a `RocksDB::WriteBatch`. A server crash mid-loop leaves a partially applied batch with no way for the client to distinguish committed from uncommitted entries. Use `RocksDBWrapper::WriteBatchWrapper` (already in the codebase, used by `GeoIndexHooks::onEntityPutAtomic`) to make `BatchWrite` atomic.
GAP-016 – identified via static analysis (2026-04-21). Reference: `docs/governance/SOURCECODE_COMPLIANCE_GOVERNANCE.md`.
Scope: `src/api/grpc_server.cpp:295`
- TLS-disabled mode must still be allowed in development (`THEMIS_ENV=development`).
- In production, server startup must fail with a clear error message.
```cpp
// In GrpcApiServer::buildCredentials():
if (!config_.tls_enabled) {
    const char* env = std::getenv("THEMIS_ENV");
    // Fail closed: a missing THEMIS_ENV is treated as production, so the
    // guard cannot be bypassed by simply unsetting the variable.
    if (!env || std::string(env) == "production") {
        THEMIS_CRITICAL("gRPC: InsecureServerCredentials forbidden in production – set tls_enabled=true");
        throw std::runtime_error("gRPC TLS required in production");
    }
    THEMIS_CRITICAL("gRPC: InsecureServerCredentials active – all gRPC traffic is unencrypted");
    return grpc::InsecureServerCredentials();
}
```
- Unit test: `tls_enabled=false` + `THEMIS_ENV=production` → `std::runtime_error` thrown
- Unit test: `tls_enabled=false` + `THEMIS_ENV=development` → insecure credentials + CRITICAL log
- Unit test: `tls_enabled=true` → `SslServerCredentials` returned
- Unit test: `tls_enabled=false` + `THEMIS_ENV` unset → `std::runtime_error` thrown (fail-closed default)
- No runtime overhead (check only on server startup)
- Production guard must not be bypassable by missing env var (default = deny in production)