
fix: structog instead of logging, also updated middlewares #175

Merged
HardMax71 merged 12 commits into main from fix/logging on Feb 13, 2026

Conversation

Owner

@HardMax71 HardMax71 commented Feb 12, 2026


Summary by cubic

Switched backend to structlog and OpenTelemetry tracing with a single OTLP traces endpoint and tail sampling. Removed correlation IDs across APIs, events, and storage; request size limits remain enforced by RequestSizeLimitMiddleware using MAX_REQUEST_SIZE_MB.

  • Refactors

    • Replaced logging.Logger with structlog BoundLogger; logs bind OTEL span context, sanitize sensitive data, and use key=value fields.
    • Removed CorrelationMiddleware and all correlation_id usage; events link via aggregate_id; replays use replay_id with updated indexes, models, and responses.
    • Added a DI-managed Tracer (app/core/tracing/tracer.py) initialized from Settings; instruments httpx/pymongo/logging with ParentBased + TraceIdRatioBased sampling; removed custom tracing/adaptive sampling code.
    • Providers inject BoundLogger and Tracer; services/workers no longer initialize tracing.
    • Settings/CORS: dropped ENABLE_TRACING/Jaeger fields; added OTLP_TRACES_ENDPOINT; stopped exposing X-Correlation-ID and X-Request-ID.
    • Updated otel-collector to include tail_sampling (error, slow, probabilistic) in the traces pipeline.
    • Pod monitor config is provided via DI with defaults (namespace=integr8scode, kubeconfig_path=None).
    • Reverted request size enforcement to RequestSizeLimitMiddleware (dependency approach removed).
  • Migration

    • Set OTLP_TRACES_ENDPOINT and deploy the updated otel-collector with tail sampling.
    • Stop sending/expecting X-Correlation-ID and X-Request-ID; rely on OTEL trace context.
    • Update any dashboards/parsers that used correlation_id (events, indexes, exports, user settings, pod labels) to use aggregate_id or replay_id.
    • Configure MAX_REQUEST_SIZE_MB as needed; enforcement is via RequestSizeLimitMiddleware.
    • If pod monitoring relied on K8S_NAMESPACE/KUBECONFIG env vars, adjust deployment or provider configuration to the new defaults.
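The log sanitization described under Refactors (sanitize sensitive data before emitting key=value logs) can be sketched as a structlog-style processor. This is a minimal illustration only: the real SENSITIVE_PATTERNS and processor live in backend/app/core/logging.py and may differ.

```python
import re

# Hypothetical patterns for illustration; the project's actual
# SENSITIVE_PATTERNS in backend/app/core/logging.py may differ.
SENSITIVE_PATTERNS = [
    (re.compile(r"(password|token|secret)=\S+"), r"\1=[REDACTED]"),
    (re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"), "[EMAIL]"),
]

def sanitize_sensitive_data(logger, method_name, event_dict):
    """structlog processor: redact sensitive substrings in every string field."""
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern, replacement in SENSITIVE_PATTERNS:
                value = pattern.sub(replacement, value)
            event_dict[key] = value
    return event_dict
```

A processor like this would sit in structlog.configure(processors=[...]) ahead of the renderer, so redaction applies to all string fields, not just the event message.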

Written for commit ab40197. Summary will update on new commits.
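The request-size enforcement retained in RequestSizeLimitMiddleware reduces to comparing the declared body size against MAX_REQUEST_SIZE_MB. A minimal sketch of that check follows; the real middleware is ASGI-based, also guards streamed bodies, and its details may differ.

```python
from typing import Optional

MAX_REQUEST_SIZE_MB = 10  # illustrative default; the real value comes from settings

def exceeds_limit(content_length: Optional[str], max_mb: int = MAX_REQUEST_SIZE_MB) -> bool:
    """Return True when the declared Content-Length exceeds the configured limit."""
    if content_length is None:
        return False  # no declared length; chunked bodies need separate handling
    return int(content_length) > max_mb * 1024 * 1024
```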

Summary by CodeRabbit

  • New Features

    • Structured logging rolled out for clearer, consistent logs.
    • Trace collector tail-sampling added to reduce noise (focus on error/slow traces).
  • Configuration Changes

    • Tracing config simplified to a single OTLP endpoint.
  • Refactor

    • Correlation ID removed across UI, filters, exports and APIs; replay_id introduced for replay flows and shown in replay responses.


coderabbitai bot commented Feb 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Removed legacy correlation-ID handling and adaptive sampler; consolidated tracing into a new Tracer class with OTLP support; migrated logging to structlog and updated many DI/provider/service/repository signatures to accept structlog BoundLogger; renamed/removed correlation_id in replay/event models and APIs.

Changes

Cohort / File(s) Summary
Logging core
backend/app/core/logging.py, backend/pyproject.toml
Replaced legacy JSON/formatter/filter pipeline with structlog processors (sanitizer, add_otel_context); setup_logger now returns a structlog BoundLogger; added structlog dependency.
Tracing consolidation
backend/app/core/tracing/__init__.py, (removed) backend/app/core/tracing/config.py, backend/app/core/tracing/models.py, backend/app/core/tracing/utils.py
Introduced a single Tracer class that builds Resource/sampler/provider and instruments libraries; removed previous tracing config, models, utils and TracerManager artifacts.
Adaptive sampler removed
backend/app/core/adaptive_sampling.py, backend/tests/unit/core/test_adaptive_sampling.py
Deleted AdaptiveSampler implementation and its unit tests.
Correlation removal & replay rename
backend/app/core/correlation.py, backend/app/domain/events/typed.py, backend/app/db/docs/replay.py, backend/app/domain/admin/replay_models.py, backend/app/domain/replay/models.py, backend/app/schemas_pydantic/admin_events.py
Removed CorrelationContext/CorrelationMiddleware; stripped correlation_id from many domain/schema models; renamed replay-related identifiers to replay_id.
Admin routes / export changes
backend/app/api/routes/admin/events.py, frontend/src/routes/admin/AdminEvents.svelte, frontend/src/lib/admin/events/eventTypes.ts, frontend/src/lib/api/types.gen.ts
Removed correlation_id from export route signature and query params; replay/export flows now use replay_id/aggregate_id; frontend UI and generated types updated (filters, displays, tests).
Provider & DI signature updates
backend/app/core/providers.py, backend/app/core/dishka_lifespan.py, backend/app/core/middlewares/metrics.py
Replaced many provider logger types to structlog.stdlib.BoundLogger; replaced TracerManager with new Tracer; updated provider/factory signatures to accept tracer/logger and propagate bound logging.
Logger migration across codebase
backend/app/services/..., backend/app/db/repositories/..., backend/app/events/..., backend/app/dlq/manager.py, many tests under backend/tests/**
Updated ~40+ constructors and call sites to use structlog BoundLogger, converting extra={} logging to structured keyword args; tests switched logger initialization to structlog.
Span attribute updates
backend/app/api/routes/execution.py, backend/app/db/repositories/event_repository.py, backend/app/services/notification_service.py, backend/app/services/saga/saga_orchestrator.py
Replaced custom add_span_attributes utility with direct OpenTelemetry calls via trace.get_current_span().set_attributes(...) using plain string attribute keys.
Worker startup simplification
backend/workers/run_*.py
Removed conditional tracing initialization and related imports from worker entrypoints.
Database typing cleanup
backend/app/core/database_context.py
Removed asynchronous MongoDB typing aliases and their exports.
Schema / docs / frontend adjustments
docs/*, docs/reference/openapi.json, frontend/src/components/admin/events/*
Removed correlation_id from OpenAPI and docs, adjusted DB indexes and frontend components/tests, and renamed replay fields to replay_id.
OTel collector / config changes
backend/otel-collector-config.yaml, backend/config.toml, backend/config.test.toml
Added tail_sampling processor to collector config; replaced Jaeger/old tracing flags with OTLP_TRACES_ENDPOINT and aligned test config.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Tracer as Tracer (new)
    participant OTLP as OTLP Exporter
    participant Libs as Instrumented Libraries

    App->>Tracer: instantiate Tracer(settings, logger)
    Tracer->>Tracer: build Resource, choose Sampler, configure Provider
    Tracer->>OTLP: attach exporter if OTLP_TRACES_ENDPOINT set
    Tracer->>Libs: instrument FastAPI / HTTPX / PyMongo / logging
    Tracer->>App: set global tracer provider & propagator
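The sampling step in the diagram (ParentBased wrapping TraceIdRatioBased) can be approximated in pure Python. This is a rough illustration; the real logic lives in opentelemetry.sdk.trace.sampling and differs in detail, e.g. in the exact bound computation.

```python
from typing import Optional

def should_sample(trace_id: int, ratio: float, parent_sampled: Optional[bool]) -> bool:
    """Approximate ParentBased(TraceIdRatioBased(ratio)) for illustration."""
    if parent_sampled is not None:
        # ParentBased: honour the parent span's sampling decision
        return parent_sampled
    # TraceIdRatioBased: sample root spans whose low 64 bits of the
    # trace id fall below ratio * 2**64
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```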

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 I hopped through code with nimble paws,
Swapped IDs and loggers without a pause,
Tracer wakes and instruments the streams,
Replay IDs leap through UI dreams,
A rabbit’s small tweak — tidy, bold applause.

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 33.33%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): The title mentions 'structog' (likely a typo for 'structlog') and 'middlewares', but is vague about the scope of changes. Resolution: revise the title to be more specific and accurate, such as 'chore: migrate logging to structlog and remove correlation ID infrastructure' or 'refactor: replace logging with structlog, remove correlation middleware'.

✅ Passed checks (2 passed)

  • Merge Conflict Detection (✅ Passed): No merge conflicts detected when merging into main.
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.


@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 85 files

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/core/logging.py">

<violation number="1" location="backend/app/core/logging.py:26">
P1: Sensitive data redaction now only applies to the `event` field, so exception/stack strings added by `format_exc_info`/`StackInfoRenderer` remain unredacted. This can leak secrets in stack traces or exception messages. Consider sanitizing all string fields in the event dict.</violation>
</file>

<file name="backend/app/core/tracing/__init__.py">

<violation number="1" location="backend/app/core/tracing/__init__.py:49">
P2: Hard-coding `insecure=True` disables TLS for all OTLP gRPC exports, which can leak trace data or break when the collector requires TLS. Make this configurable or derive it from the endpoint scheme so production can use secure transport.</violation>
</file>
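One way to address this violation is to derive the exporter's insecure flag from the endpoint scheme. The helper below (otlp_insecure) is hypothetical, not part of the PR:

```python
from urllib.parse import urlparse

def otlp_insecure(endpoint: str) -> bool:
    """Use plaintext gRPC only when the OTLP endpoint is not https
    (hypothetical helper; the PR currently hard-codes insecure=True)."""
    return urlparse(endpoint).scheme != "https"
```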

<file name="backend/app/db/repositories/event_repository.py">

<violation number="1" location="backend/app/db/repositories/event_repository.py:48">
P2: OpenTelemetry span attributes only allow primitive types; passing EventType (likely an Enum) can lead to attributes being dropped or rejected. Convert the event type to a string or value before setting the attribute.</violation>
</file>
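A defensive coercion for span attribute values, as this violation suggests, could look like the sketch below. EventType here is a stand-in for the project's enum, not its real definition:

```python
from enum import Enum

class EventType(str, Enum):  # stand-in for the project's EventType
    EXECUTION_REQUESTED = "execution_requested"

def span_attribute_value(value: object) -> object:
    """Coerce enums to a primitive before set_attributes(), since
    OpenTelemetry attributes accept only str/bool/int/float
    (and sequences of those)."""
    if isinstance(value, Enum):
        raw = value.value
        return raw if isinstance(raw, (str, bool, int, float)) else str(value)
    return value
```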

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/services/saved_script_service.py (1)

21-28: ⚠️ Potential issue | 🟠 Major

Use keyword arguments directly instead of extra={} for idiomatic structlog logging.

The current extra={...} pattern doesn't work as intended with this structlog configuration. Without structlog.stdlib.ExtraAdder() in the processor chain, the extra dict is treated as a single nested key in the JSON output, not as flattened context fields. This affects ~10 logging call sites in this file.

Pass context as direct keyword arguments instead:

self.logger.info(
    "Creating new saved script",
    user_id=user_id,
    script_name=saved_script_create.name,
    script_length=len(saved_script_create.script),
)

This is the idiomatic structlog approach and will ensure proper log structure for analysis and aggregation.

backend/app/services/notification_service.py (1)

144-153: ⚠️ Potential issue | 🟠 Major

extra={...} creates nested dicts instead of top-level fields in structlog output.

In stdlib logging, extra is a special parameter that merges keys into the LogRecord. In structlog, extra has no special meaning—it becomes a nested dict under an "extra" key in the structured output.

The test suite (test_logging_and_correlation.py:180) demonstrates the idiomatic pattern: logger.info("test message", key="value") produces {"event": "test message", "key": "value"} at the top level. Using extra={"key": "value"} instead produces {"event": "test message", "extra": {"key": "value"}}, which breaks downstream log parsing and alerting that depend on top-level field access.

This pattern appears across 30+ files in the codebase (kafka_event_service.py, sse_service.py, user_settings_service.py, admin services, saga services, execution_service.py, dlq/manager.py, repositories, and others).

♻️ Suggested fix (example for lines 144-153)
         self.logger.info(
-            f"Creating notification for user {user_id}",
-            extra={
-                "user_id": user_id,
-                "channel": channel,
-                "severity": str(severity),
-                "tags": tags,
-                "scheduled": scheduled_for is not None,
-            },
+            "Creating notification",
+            user_id=user_id,
+            channel=channel,
+            severity=str(severity),
+            tags=tags,
+            scheduled=scheduled_for is not None,
         )
🤖 Fix all issues with AI agents
In `@backend/app/core/logging.py`:
- Around line 20-31: The sanitizer currently only redacts sensitive data in
event_dict["event"] inside sanitize_sensitive_data, so keyword args (e.g.,
token=secret) remain unhashed; update the function to iterate over event_dict
items and, for any value that is a str, apply the same SENSITIVE_PATTERNS
substitutions and replace the value in event_dict, while leaving non-string
values untouched; also update the docstring of sanitize_sensitive_data to
document that all string fields in event_dict are sanitized and reference
SENSITIVE_PATTERNS and the function name for clarity.
- Around line 58-88: The setup_logger function is reconfiguring structlog every
time it's called, clearing global state; add an idempotency guard (e.g., a
module-level boolean like _logger_configured and a cached module-level logger
variable) so setup_logger(...) performs structlog.configure(...) only once and
subsequent calls return the cached structlog.get_logger("integr8scode") without
reconfiguring; alternatively remove direct calls to setup_logger and ensure
callers use the DI-provided LoggingProvider/APP-scoped logger (update references
in callers such as main.py and dishka_lifespan.py to accept/inject the logger
instead of calling setup_logger directly).

In `@backend/app/domain/events/typed.py`:
- Line 37: EventMetadata's correlation_id default of "" causes app-scoped
services to emit events without tracing; to fix, ensure correlation IDs are
propagated from request-scoped code into services that create events: either
make services that emit events (e.g., UserSettingsService) request-scoped
(change DI registration to request scope) so they can read
request.state.correlation_id in publish_event, or explicitly thread a
correlation_id parameter through public APIs and constructors so methods that
build EventMetadata (and event classes like UserSettingsUpdatedEvent) receive
and set correlation_id; update call sites in ExecutionService and request
handlers to pass the request's correlation_id into service calls and into
publish_event instead of relying on the empty default.

In `@backend/pyproject.toml`:
- Line 125: The pyproject.toml dependency for structlog is using a range
operator ("structlog>=25.5.0") which breaks reproducible builds; change that
entry to an exact pin ("structlog==25.5.0") so it matches the rest of the file
and ensures deterministic installs—update the line containing
"structlog>=25.5.0" accordingly and run your dependency install/check to
confirm.
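The idempotency guard suggested for setup_logger could be shaped like this. Names and the stand-in body are illustrative; the real function would call structlog.configure(...) and return structlog.get_logger("integr8scode"):

```python
_configured = False
_cached_logger = None

def setup_logger(name: str = "integr8scode"):
    """Configure logging once; subsequent calls return the cached logger."""
    global _configured, _cached_logger
    if _configured:
        return _cached_logger
    # structlog.configure(...) would run here, exactly once per process
    _configured = True
    _cached_logger = object()  # stand-in for structlog.get_logger(name)
    return _cached_logger
```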
🧹 Nitpick comments (26)
backend/app/services/saga/execution_saga.py (1)

23-23: Prefer structlog keyword arguments over f-strings for structured log messages.

Using f-strings with structlog defeats the purpose of structured logging — the interpolated values won't be individually queryable/filterable in log aggregation systems. This pattern recurs throughout this file (lines 23, 52, 90, 97, 126, 149, 167, 183).

Example fix for line 23:

♻️ Suggested refactor
-        logger.info(f"Validating execution {event.execution_id}")
+        logger.info("Validating execution", execution_id=event.execution_id)

Apply the same pattern to all other f-string log calls in this file.

backend/app/services/user_settings_service.py (1)

41-50: Mixed logging styles: f-strings and extra={} dict are both non-idiomatic for structlog.

The file mixes two patterns:

  • extra={...} dict (line 41) — this is a stdlib logging convention; structlog passes context as keyword arguments directly.
  • f-strings (line 50, 201, 206, 212) — interpolated values are not structured.

With structlog.stdlib.BoundLogger, the idiomatic approach is:

# Instead of extra={}
self.logger.info("UserSettingsService initialized",
    cache_ttl_seconds=self._cache_ttl.total_seconds(),
    max_cache_size=self._max_cache_size)

# Instead of f-strings
self.logger.debug("Settings cache hit", user_id=user_id, cache_size=len(self._cache))

Whether extra= works depends on your structlog processor chain configuration, but switching to direct kwargs is more portable and idiomatic.

backend/app/services/execution_service.py (1)

116-124: Same extra={} pattern note as in user_settings_service.py.

The extra= dict style is used extensively throughout this file. With structlog, prefer passing context as direct keyword arguments for idiomatic structured logging. This applies to all extra={...} calls in this file.

backend/app/services/admin/admin_user_service.py (1)

36-36: Consider using structlog's native keyword arguments instead of extra={}.

With structlog.stdlib.BoundLogger, context variables can be passed directly as keyword arguments (e.g., self.logger.info("Admin getting user overview", target_user_id=user_id, hours=hours)). The extra={} dict is a stdlib logging convention. While it works when structlog is configured with stdlib compatibility, using native kwargs is more idiomatic and ensures the context is properly processed by all structlog processors.

This applies throughout this file and likely across the codebase.

backend/app/services/result_processor/processor.py (1)

73-74: Prefer structured key-value pairs over f-string interpolation in structlog.

Using f-strings in log messages (e.g., f"Failed to handle ExecutionCompletedEvent: {e}") defeats the purpose of structured logging — the error details get baked into the message string rather than being searchable fields. This pattern appears in lines 73, 95, and 121.

♻️ Example refactor
-            self.logger.error(f"Failed to handle ExecutionCompletedEvent: {e}", exc_info=True)
+            self.logger.error("Failed to handle ExecutionCompletedEvent", error=str(e), exc_info=True)
backend/app/services/pod_monitor/event_mapper.py (1)

82-85: Structured logging calls still use f-strings instead of structlog keyword arguments.

Throughout this file, log calls embed data in f-strings (e.g., self.logger.info(f"POD-EVENT: type={event_type} ...")). This negates the key benefit of structlog — machine-parseable structured fields. Consider using keyword arguments instead:

self.logger.info("pod_event_received", event_type=event_type, name=..., namespace=..., phase=...)

This applies to all ~20 log calls in the file. Not blocking for this PR, but worth tracking as a follow-up.

backend/tests/unit/services/pod_monitor/test_event_mapper.py (1)

206-206: caplog fixture is unused in this test.

The caplog: pytest.LogCaptureFixture parameter is declared but never asserted on. With structlog (unless explicitly configured to route through stdlib), caplog won't capture structured log output anyway. Consider removing it.

backend/app/services/sse/sse_service.py (1)

6-7: Nit: extra blank line after import.

Line 7 is an empty line that creates a double blank line between the structlog import and the next import block (line 8). Minor formatting nit.

backend/app/core/tracing/__init__.py (2)

45-51: insecure=True is hardcoded — make it configurable.

This forces plaintext gRPC for all environments. If the OTLP collector is reachable over a network (not localhost), spans are transmitted without TLS. Consider adding a setting (e.g., OTLP_INSECURE: bool = True) so production deployments with TLS-enabled collectors can disable this.

🔒 Suggested change
             provider.add_span_processor(
                 BatchSpanProcessor(OTLPSpanExporter(
                     endpoint=settings.OTLP_TRACES_ENDPOINT,
-                    insecure=True,
+                    insecure=settings.OTLP_INSECURE,
                 ))
             )

20-64: No shutdown() — pending spans may be lost on graceful exit.

BatchSpanProcessor buffers spans and flushes them periodically. Without calling provider.shutdown() during application teardown, any buffered spans at exit will be dropped. Consider exposing a shutdown method (or registering an atexit handler) so the DI container or lifespan can flush spans cleanly.

♻️ Suggested addition
 class Tracer:
     """DI-managed OpenTelemetry tracer. Initialization happens on construction."""

     def __init__(self, settings: Settings, logger: structlog.stdlib.BoundLogger) -> None:
+        self._provider: TracerProvider | None = None
         name = settings.TRACING_SERVICE_NAME
         rate = settings.TRACING_SAMPLING_RATE
         ...
         provider = TracerProvider(resource=resource, sampler=sampler)
+        self._provider = provider
         ...

+    def shutdown(self) -> None:
+        """Flush pending spans and shut down the tracer provider."""
+        if self._provider:
+            self._provider.shutdown()
backend/app/services/event_replay/replay_service.py (3)

60-61: Prefer structured kwargs over f-strings with structlog.

With structlog, using f-strings loses the structured context. Pass the error as a keyword argument so it appears as a separate field in structured output.

♻️ Suggested change
-            self.logger.error(f"Failed to create replay session: {e}")
+            self.logger.error("Failed to create replay session", error=str(e))

270-278: Use direct keyword arguments instead of extra={} for structlog.

The extra={} dict is a stdlib logging pattern. With structlog.stdlib.BoundLogger, keyword arguments should be passed directly so they appear as top-level keys in the structured log output. Using extra={...} nests them under an extra key, which defeats the purpose of the structlog migration.

This applies to all extra={} usages in this file (lines 172, 270–278, 296–298, 325–327, and 369).

♻️ Example fix for this call site
         self.logger.info(
             "Replay session finished",
-            extra={
-                "session_id": session.session_id,
-                "status": session.status,
-                "replayed_events": session.replayed_events,
-                "failed_events": session.failed_events,
-            },
+            session_id=session.session_id,
+            status=session.status,
+            replayed_events=session.replayed_events,
+            failed_events=session.failed_events,
         )

368-369: Same two issues here: f-string + extra pattern missing.

-            self.logger.error(f"Failed to update session in database: {e}")
+            self.logger.error("Failed to update session in database", error=str(e))
backend/app/db/repositories/notification_repository.py (1)

205-205: Use structured key-value pairs instead of f-string for structlog.

With structlog, embedding data via f-strings loses the benefit of structured logging (fields won't be queryable in log aggregators). Pass values as keyword arguments instead.

♻️ Suggested fix
-        self.logger.info(f"Found {len(user_ids)} users with roles {roles}")
+        self.logger.info("Found users by roles", count=len(user_ids), roles=roles)
backend/app/services/admin/admin_settings_service.py (1)

20-23: Prefer structlog keyword arguments over extra={} dict.

With structlog's BoundLogger, pass context as direct keyword arguments rather than wrapping them in extra={}. The extra pattern is a stdlib logging convention; structlog natively supports self.logger.info("msg", user_id=user_id) which makes fields first-class in the structured output.

This pattern appears in multiple log calls in this file (lines 22, 29, 39) and likely across other migrated files.

♻️ Suggested fix (example for one call site)
         self.logger.info(
             "Admin retrieving system settings",
-            extra={"user_id": user_id},
+            user_id=user_id,
         )
backend/app/db/repositories/execution_repository.py (1)

22-24: Same extra={} pattern — use structlog keyword args directly.

As noted in admin_settings_service.py, prefer passing context as keyword arguments rather than wrapping in extra={}. This applies to all log calls in this file (lines 22, 24, 28, 31, 34, 51).

♻️ Example fix
-        self.logger.info("Inserting execution into MongoDB", extra={"execution_id": doc.execution_id})
+        self.logger.info("Inserting execution into MongoDB", execution_id=doc.execution_id)
backend/app/db/repositories/event_repository.py (1)

55-57: Consider using structlog's key-value style instead of f-strings.

With structlog, the idiomatic approach is to pass context as keyword arguments (e.g., self.logger.debug("Event already stored", event_id=event.event_id)). Using f-strings defeats the purpose of structured logging since the data gets baked into the message string and can't be queried/filtered by log aggregation tools.

This pattern appears throughout the codebase (not just this file), so it could be addressed as a follow-up.

Example for this file
-            self.logger.debug(f"Event {event.event_id} already stored, skipping")
+            self.logger.debug("Event already stored, skipping", event_id=event.event_id)
             return event.event_id
-        self.logger.debug(f"Stored event {event.event_id} of type {event.event_type}")
+        self.logger.debug("Stored event", event_id=event.event_id, event_type=event.event_type)
backend/app/api/routes/execution.py (1)

60-68: Hardcoded http.method and http.route are fragile and likely redundant.

OpenTelemetry's FastAPI/Starlette instrumentation (if enabled) already sets http.method and http.route on the server span automatically. Hardcoding them here means they'll go stale if the route path or method changes. The custom attributes (execution.language, execution.script_length, user.id, client.address) are the valuable additions.

Also, note that user.id in trace attributes could be a PII/compliance consideration depending on your organization's observability data policies — ensure your trace backend's retention and access controls account for this.

Remove redundant attributes
     trace.get_current_span().set_attributes({
-        "http.method": "POST",
-        "http.route": "/api/v1/execute",
         "execution.language": execution.lang,
         "execution.language_version": execution.lang_version,
         "execution.script_length": len(execution.script),
         "user.id": current_user.user_id,
         "client.address": get_client_ip(request),
     })
backend/app/services/admin/admin_events_service.py (1)

107-113: extra={} is a stdlib logging pattern — structlog prefers direct keyword arguments.

With structlog.stdlib.BoundLogger, passing extra={...} wraps the context under an "extra" key in the log event dict rather than promoting each field to a top-level key. The idiomatic structlog approach is to pass context as keyword arguments directly. This affects all logging calls in this file that use extra=.

Whether this matters depends on your structlog processor pipeline (e.g., if you have a processor that extracts extra), but it's worth aligning with structlog conventions for consistent log structure.

Example fix (apply same pattern to all similar calls in this file)
         self.logger.info(
             "Preparing replay session",
-            extra={
-                "dry_run": dry_run,
-                "replay_correlation_id": replay_correlation_id,
-            },
+            dry_run=dry_run,
+            replay_correlation_id=replay_correlation_id,
         )
backend/app/db/repositories/user_settings_repository.py (1)

29-29: Prefer structured keyword arguments over f-strings with structlog.

Using f-strings in structlog calls (self.logger.info(f"Created settings snapshot for user {settings.user_id}")) embeds variables into the event string, which defeats structlog's key advantage: machine-parseable structured fields. The same applies to line 81.

Idiomatic structlog passes variable data as keyword args so log aggregators can filter/index on them:

♻️ Suggested change
-        self.logger.info(f"Created settings snapshot for user {settings.user_id}")
+        self.logger.info("Created settings snapshot", user_id=settings.user_id)
-        self.logger.info(f"Deleted all settings data for user {user_id}")
+        self.logger.info("Deleted all settings data", user_id=user_id)

This pattern appears across most files in this PR (coordinator, notification_service, notification_scheduler, etc.). Consider a project-wide pass to replace f-string log messages with keyword args.

backend/app/services/notification_scheduler.py (1)

37-50: Same f-string pattern — use keyword args for structured fields.

Already flagged in user_settings_repository.py. Same applies here for lines 37, 46, and 50.

♻️ Example for line 37
-        self.logger.info(f"Found {len(due)} due scheduled notifications")
+        self.logger.info("Found due scheduled notifications", count=len(due))
backend/app/services/coordinator/coordinator.py (1)

74-74: "HANDLER CALLED:" looks like a debug/development artifact.

This log message reads like a temporary trace rather than a production log entry. Consider downgrading to debug or removing the HANDLER CALLED: prefix.

♻️ Suggested change
-        self.logger.info(f"HANDLER CALLED: handle_execution_requested for event {event.event_id}")
+        self.logger.info("Handling execution request", event_id=event.event_id)
backend/app/settings.py (1)

114-115: Two OTLP endpoint settings with inconsistent naming and default conventions.

There are now two endpoint fields:

  • OTLP_TRACES_ENDPOINT: str = "" (traces, line 115)
  • OTEL_EXPORTER_OTLP_ENDPOINT: str | None = None (metrics, line 157)

The naming prefixes differ (OTLP_ vs OTEL_EXPORTER_OTLP_) and the "disabled" sentinels differ ("" vs None). Consider aligning them — e.g., OTLP_METRICS_ENDPOINT: str | None = None alongside OTLP_TRACES_ENDPOINT: str | None = None — to reduce cognitive load and make the disabled-check consistent.

backend/app/core/logging.py (1)

7-17: Sensitive patterns look solid; note the broad email regex.

The email pattern on line 16 ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}) will match any email-like string in log messages. This is good for GDPR/PII compliance, but be aware it may over-redact strings like user@service-name in non-PII contexts (e.g., Kubernetes resource names like pod@cluster). If this causes issues, you could tighten the TLD portion (e.g., require .com, .org, etc.) or move it to a separate opt-in processor.

backend/app/core/providers.py (2)

162-162: Prefer structured key-value args over f-strings in structlog calls.

Throughout this file, f-strings are used in structlog log calls (lines 162, 184, 186, 368, 840, 844, etc.). This defeats much of the benefit of structured logging — the dynamic values get baked into the message string rather than being separate, queryable fields.

Example:

-logger.info(f"Redis connected: {settings.REDIS_HOST}:{settings.REDIS_PORT}/{settings.REDIS_DB}")
+logger.info("Redis connected", host=settings.REDIS_HOST, port=settings.REDIS_PORT, db=settings.REDIS_DB)

This applies broadly across the file. Not blocking, but worth addressing to get the full benefit of the structlog migration.


93-94: Consider adding a comment explaining the _tracer dependency.

_tracer: Tracer is injected solely to ensure the Tracer is initialized before the broker starts (side-effect dependency). A brief inline comment would clarify this for future readers.

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/core/providers.py (1)

92-108: ⚠️ Potential issue | 🟠 Major

Add explanatory comment to _tracer parameter and fix logger type passed to KafkaBroker.

The _tracer: Tracer parameter is intentionally unused—it's a dependency injection ordering trick to ensure the Tracer is initialized (instrumenting FastAPI, HTTPX, etc.) before the broker starts. Add a brief inline comment to document this so future maintainers don't accidentally remove it.

Critical: KafkaBroker expects logger: Optional[logging.Logger] (standard library), but the code passes structlog.stdlib.BoundLogger. According to FastStream's logging documentation, the correct pattern is to pass a stdlib logging.Logger and let structlog wrap it at the application level, not at the provider level. Extract the underlying logger or create a stdlib logger instance to pass here.
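A sketch of that fix, assuming FastStream's documented stdlib-logger pattern (the helper name and the commented broker call are illustrative, not the PR's code):

```python
import logging


def make_broker_logger(name: str = "faststream.kafka") -> logging.Logger:
    """Stdlib logger for KafkaBroker.

    structlog configured with structlog.stdlib.LoggerFactory() shares the
    same handlers and levels, so broker logs still flow through the
    structured pipeline at the application level.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    return logger


broker_logger = make_broker_logger()
# broker = KafkaBroker(settings.KAFKA_BOOTSTRAP_SERVERS, logger=broker_logger)
```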

🧹 Nitpick comments (1)
backend/app/core/providers.py (1)

147-147: Consider using structlog keyword arguments instead of f-strings for structured log messages.

With structlog, passing data as keyword arguments (e.g., logger.info("Redis connected", host=..., port=..., db=...)) enables structured querying and avoids formatting costs when the level is filtered. This applies to several places in this file (lines 147, 169, 171, 334, 353, etc.).

Not blocking, but it undermines the benefit of switching to structlog if messages remain unstructured f-strings.

Example for line 147
-        logger.info(f"Redis connected: {settings.REDIS_HOST}:{settings.REDIS_PORT}/{settings.REDIS_DB}")
+        logger.info("Redis connected", host=settings.REDIS_HOST, port=settings.REDIS_PORT, db=settings.REDIS_DB)

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/api/routes/auth.py">

<violation number="1" location="backend/app/api/routes/auth.py:43">
P2: Switching to `structlog.stdlib.BoundLogger` but still using `extra={...}` means your structured fields won’t be captured the way structlog expects (it uses keyword args). This will likely drop or nest log context unless you add processors like `ExtraAdder`. Update log calls to pass fields as keyword arguments or bind the context instead.</violation>
</file>
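A stdlib-only demonstration of the mechanism behind this violation: `extra={...}` lands as attributes on the LogRecord, not in the message, so structlog's processor chain never sees those fields unless a processor such as `structlog.stdlib.ExtraAdder` copies them over:

```python
import logging

captured = []


class Capture(logging.Handler):
    def emit(self, record: logging.LogRecord) -> None:
        captured.append(record)


logger = logging.getLogger("auth-demo")
logger.addHandler(Capture())
logger.setLevel(logging.INFO)

logger.info("Login attempt", extra={"username": "alice"})

record = captured[0]
print(record.getMessage())  # "Login attempt" — the field is not in the message
print(record.username)      # "alice" — it sits on the record as an attribute
```

Passing `username=...` as a structlog keyword argument instead puts the field in the event dict directly, with no processor configuration needed.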

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai coderabbitai bot left a comment


⚠️ Outside diff range comments (3)
backend/app/events/handlers.py (1)

316-336: ⚠️ Potential issue | 🟡 Minor

Injected logger is unused in on_dlq_message.

The logger parameter is injected via DI on line 320 but is never referenced in the function body. Either remove it or use it for observability (e.g., logging receipt of DLQ messages, errors, or processing duration).

backend/app/services/execution_service.py (1)

295-305: ⚠️ Potential issue | 🟡 Minor

Inconsistent correlation_id usage in pseudo-event metadata.

The correlation_id parameter is correctly forwarded to execute_script (lines 292, 337), but the pseudo-event on line 301 hardcodes correlation_id=str(uuid4()) instead of using the passed-in correlation_id. This means idempotency operations (check, reserve, mark) will be tracked under a random ID rather than the request's correlation context.

Proposed fix
         pseudo_event = BaseEvent(
             event_id=str(uuid4()),
             event_type=EventType.EXECUTION_REQUESTED,
             timestamp=datetime.now(timezone.utc),
             metadata=EventMetadata(
                 user_id=user_id,
-                correlation_id=str(uuid4()),
+                correlation_id=correlation_id or str(uuid4()),
                 service_name="api",
                 service_version="1.0.0",
             ),
         )
backend/app/api/routes/auth.py (1)

43-55: 🛠️ Refactor suggestion | 🟠 Major

Refactor logging calls to use idiomatic structlog kwargs instead of extra={}.

With structlog.stdlib.BoundLogger configured via structlog.stdlib.LoggerFactory() and the current processor chain, passing extra={...} nests the dict under an "extra" key in the event dict rather than merging values at the top level. This changes the log output shape.

Idiomatic structlog would be:

logger.info(
    "Login attempt",
    username=form_data.username,
    client_ip=get_client_ip(request),
    endpoint="/login",
    user_agent=request.headers.get("user-agent"),
)

This pattern appears in approximately 125 locations across the codebase. Aligns with your PR note about updating dashboards/parsers for the new log shape — this refactoring will flatten the output and make it easier for parsers to consume the fields directly.

🧹 Nitpick comments (5)
backend/app/events/handlers.py (1)

49-49: Use structured key-value args instead of f-string for structlog.

Now that you've migrated to structlog, this log line should leverage structured logging rather than embedding data in the message string. f-string messages defeat the purpose of structlog (machine-parseable fields, filtering by key, etc.).

♻️ Proposed fix
-        logger.info(f"Duplicate event: {event.event_type} ({event.event_id})")
+        logger.info("duplicate_event", event_type=event.event_type, event_id=event.event_id)
backend/tests/e2e/app/test_main_app.py (1)

260-266: Minor: local import and stale docstring.

structlog is imported locally inside the test method, whereas test_container.py imports it at module level. Consider moving the import to the top of the file for consistency. Also, the docstring on Line 261 still says "Logger" rather than "BoundLogger".

Suggested diff

At the top of the file (after line 4):

 import pytest
+import structlog
 import redis.asyncio as aioredis

In the test method:

     async def test_container_resolves_logger(self, scope: AsyncContainer) -> None:
-        """Container can resolve Logger."""
-        import structlog
-
+        """Container can resolve BoundLogger."""
         logger = await scope.get(structlog.stdlib.BoundLogger)
backend/app/core/providers.py (2)

146-146: Prefer structured key-value args over f-strings with structlog.

Throughout this file, log calls use f-strings (e.g., Lines 146, 168, 170, 352, 822, 826). This embeds data into the message string, losing the main benefit of structured logging (machine-parseable fields). Use keyword arguments instead.

Example fix for Line 146
-        logger.info(f"Redis connected: {settings.REDIS_HOST}:{settings.REDIS_PORT}/{settings.REDIS_DB}")
+        logger.info("redis_connected", host=settings.REDIS_HOST, port=settings.REDIS_PORT, db=settings.REDIS_DB)

The same pattern should be applied to all f-string log calls in this file (Lines 168, 170, 352, 822, 826).


91-107: _tracer parameter used solely as a DI side-effect trigger.

The underscore-prefixed _tracer: Tracer on Line 93 is never used in the method body — it exists only to force the DI container to instantiate Tracer before the broker is created. This works but is non-obvious. A brief inline comment would help future readers understand the intent.

Suggested comment
     `@provide`
     async def get_broker(
-        self, settings: Settings, logger: structlog.stdlib.BoundLogger, _tracer: Tracer,
+        self, settings: Settings, logger: structlog.stdlib.BoundLogger,
+        _tracer: Tracer,  # injected to ensure OTEL tracing is initialized before broker starts
     ) -> AsyncIterator[KafkaBroker]:
backend/app/services/execution_service.py (1)

438-439: Prefer structured message over f-string with structlog.

Using an f-string in the log message embeds variable data into the message string, defeating structlog's structured logging benefits (searchability, parsing). Pass the count as a key instead.

Proposed fix
         self.logger.debug(
-            f"Retrieved {len(executions)} executions for user",
+            "Retrieved executions for user",
             extra={
                 "user_id": user_id,
+                "count": len(executions),
                 "filters": {k: v for k, v in query.items() if k != "user_id"},
                 "limit": limit,
                 "skip": skip,
             },
         )

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 53 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/db/repositories/admin/admin_events_repository.py">

<violation number="1" location="backend/app/db/repositories/admin/admin_events_repository.py:81">
P2: Guard against a missing aggregate_id; otherwise the related-events query can return unrelated events with aggregate_id=None.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2


⚠️ Outside diff range comments (5)
backend/app/dlq/manager.py (2)

83-83: ⚠️ Potential issue | 🟠 Major

extra={…} is a stdlib logging pattern, not idiomatic structlog.

With structlog.stdlib.BoundLogger, passing extra={"event_id": ...} logs a single key "extra" containing a dict, rather than flattening event_id into the log event's top-level context. The idiomatic structlog approach is to pass context as keyword arguments directly. This applies to all logging calls in this file (lines 83, 161, 191, 218, 222, 269, 273).

Example fix (apply same pattern everywhere)
-        self.logger.info("Message filtered out", extra={"event_id": message.event.event_id})
+        self.logger.info("Message filtered out", event_id=message.event.event_id)

Line 191:

-        self.logger.warning("Discarded message", extra={"event_id": message.event.event_id, "reason": reason})
+        self.logger.warning("Discarded message", event_id=message.event.event_id, reason=reason)

Line 222:

-        self.logger.info("Skipping manual retry", extra={"event_id": event_id, "status": message.status})
+        self.logger.info("Skipping manual retry", event_id=event_id, status=message.status)

251-251: ⚠️ Potential issue | 🟡 Minor

Avoid f-string interpolation in structured log messages.

Using f"Error retrying message {event_id}: {e}" defeats structured logging — the event_id and error won't be queryable as separate fields. Pass them as keyword arguments instead.

Suggested fix
-                self.logger.error(f"Error retrying message {event_id}: {e}")
+                self.logger.error("Error retrying message", event_id=event_id, error=str(e))
backend/app/services/saga/saga_orchestrator.py (1)

258-258: ⚠️ Potential issue | 🟠 Major

extra={} is a stdlib logging pattern, not idiomatic structlog.

With structlog, context should be passed as keyword arguments directly, not via extra={}. The extra dict is a logging.Logger concept and may not be processed correctly by structlog processors (depending on configuration).

For example:

-self.logger.error("Saga not found", extra={"saga_id": saga_id})
+self.logger.error("Saga not found", saga_id=saga_id)
-self.logger.warning(
-    "Cannot cancel saga in current state. Only RUNNING or CREATED sagas can be cancelled.",
-    extra={"saga_id": saga_id, "state": saga_instance.state},
-)
+self.logger.warning(
+    "Cannot cancel saga in current state. Only RUNNING or CREATED sagas can be cancelled.",
+    saga_id=saga_id, state=saga_instance.state,
+)

This applies to all extra={} calls in cancel_saga. The f-string log calls elsewhere in this file have the same issue noted in execution_saga.py — prefer keyword args for structured data.

Also applies to: 262-265, 273-280, 303-303, 307-311

docs/operations/tracing.md (2)

84-84: ⚠️ Potential issue | 🟡 Minor

Stale reference to adaptive sampling.

The PR removes adaptive sampling support, but this line still says "The sampler can be ratio-based or adaptive; both are supported." Update to reflect the current sampling strategy (tail_sampling via otel-collector per PR description).


7-7: ⚠️ Potential issue | 🟡 Minor

Update documentation to reflect new Tracer initialization flow.

The documentation at line 7 references the removed init_tracing function and incorrect module path. The tracing initialization now happens via a Tracer class (in app/core/tracing/__init__.py) that is instantiated as a DI-managed component in app/core/providers.py. Update the paragraph to describe this new flow instead of the old init_tracing-based initialization.

🤖 Fix all issues with AI agents
In `@docs/operations/logging.md`:
- Around line 90-91: Update the stale reference to Jaeger in the sentence that
tells readers to "jump to Jaeger" — change it to reference the generic OTLP
tracing backend or the configured tracing system (e.g., OTLP collector/Tempo)
and mention the environment variable OTLP_TRACES_ENDPOINT and the trace_id for
locating traces; ensure the text now reads generically (e.g., "use the trace_id
to view the full distributed trace in your tracing backend or OTLP collector")
and remove any Jaeger-specific wording so it aligns with removal of
ENABLE_TRACING and Jaeger config.

In `@frontend/src/routes/admin/AdminEvents.svelte`:
- Around line 169-170: The new event object is being created with replay_id: ''
which leaves the replay banner inconsistent; instead assign replay_id from the
API response when mapping/constructing the event (e.g., use response.replay_id
or the API payload's replay_id) while still keeping created_at: new
Date().toISOString() as-is; update the code in the function that builds events
(where replay_id and created_at are set) to read and use the replay_id field
from the server response, falling back to an empty string only if the response
value is missing.
🧹 Nitpick comments (9)
backend/tests/unit/services/pod_monitor/test_event_mapper.py (1)

204-204: Unused caplog fixture parameter.

caplog is declared but never referenced in the test body. With the migration to structlog, it no longer captures log output unless structlog is explicitly configured to propagate to standard logging. Consider removing it.

Suggested fix
-async def test_parse_and_log_paths_and_analyze_failure_variants(caplog: pytest.LogCaptureFixture) -> None:
+async def test_parse_and_log_paths_and_analyze_failure_variants() -> None:
backend/app/services/result_processor/processor.py (1)

73-73: Use structured key-value args instead of f-strings with structlog.

The whole point of migrating to structlog is machine-parseable, structured logs. Using f-strings collapses context into an opaque string. Same applies to lines 95 and 121.

♻️ Proposed fix (apply same pattern to lines 95 and 121)
-            self.logger.error(f"Failed to handle ExecutionCompletedEvent: {e}", exc_info=True)
+            self.logger.error("failed to handle ExecutionCompletedEvent", error=str(e), execution_id=event.execution_id, exc_info=True)
backend/app/services/saga/execution_saga.py (1)

23-23: Prefer structured key-value args over f-strings with structlog.

Using f-strings with structlog (e.g., logger.info(f"Validating execution {event.execution_id}")) embeds dynamic values into the message string, which defeats structlog's structured logging benefits — values won't be independently searchable or filterable. Consider using keyword arguments instead:

logger.info("Validating execution", execution_id=event.execution_id)

This applies to all log calls in this file. Not blocking, but it would be good to address given you're already touching every log line in this migration.

Also applies to: 52-52, 97-97, 124-124, 147-147, 165-165, 180-180

backend/app/services/saga/saga_orchestrator.py (1)

141-141: Tracer is re-created on every _execute_saga call.

trace.get_tracer(__name__) is cheap but idiomatic usage is to create it once at module scope (like you did with the structlog logger in execution_saga.py). Consider hoisting to module level:

_tracer = trace.get_tracer(__name__)
backend/app/services/coordinator/coordinator.py (1)

74-74: Leverage structlog's structured logging instead of f-strings.

Now that the logger is a structlog.stdlib.BoundLogger, all the log calls in this file still use f-string interpolation (e.g., lines 74, 84, 94, 109, 158, 161, 190–194, 212–214, etc.). This defeats the primary benefit of structlog — structured, machine-parsable, filterable key-value log entries.

For example:

♻️ Suggested pattern (representative, not exhaustive)
-        self.logger.info(f"HANDLER CALLED: handle_execution_requested for event {event.event_id}")
+        self.logger.info("handle_execution_requested called", event_id=event.event_id)
-        self.logger.error(f"Failed to handle execution request {event.execution_id}: {e}", exc_info=True)
+        self.logger.error("failed to handle execution request", execution_id=event.execution_id, exc_info=True)
-        self.logger.info(
-            f"Added execution {event.execution_id} to queue. "
-            f"Priority: {event.priority}, Position: {position}, "
-            f"Queue size: {len(self._queue)}"
-        )
+        self.logger.info(
+            "execution added to queue",
+            execution_id=event.execution_id,
+            priority=event.priority,
+            position=position,
+            queue_size=len(self._queue),
+        )

This applies to every self.logger.* call in the file. Consider applying the same pattern across all files touched by this PR for consistency.

Also applies to: 84-84, 94-94, 190-194

frontend/src/components/admin/events/EventsTable.svelte (1)

194-198: Mobile view correctly updated to show aggregate_id instead of correlation_id.

Note that the desktop table view doesn't include an Aggregate column — only Time, Type, User, Service, and Actions are shown. If aggregate info is useful, consider adding it to the desktop view as well for parity, though this is a minor UX consideration.

backend/app/db/repositories/user_settings_repository.py (1)

29-29: Use structlog's native key-value binding instead of f-strings.

The logging documentation (lines 37–49 of docs/operations/logging.md) states that messages should be static strings with details passed as structured data. With structlog, pass context as keyword args rather than interpolating into the message:

Suggested fix
-        self.logger.info(f"Created settings snapshot for user {settings.user_id}")
+        self.logger.info("Created settings snapshot", user_id=settings.user_id)

Same applies to line 76:

-        self.logger.info(f"Deleted all settings data for user {user_id}")
+        self.logger.info("Deleted all settings data", user_id=user_id)

This avoids log injection risk and makes logs queryable by user_id as a separate field.

backend/app/services/user_settings_service.py (1)

41-50: Inconsistent logging style: f-strings mixed with extra dicts.

Lines 41-44 correctly use a static message with extra, but line 50 interpolates user_id into the message while also passing extra. The same pattern appears on lines 200, 205, and 211. Consider adopting one consistent approach — ideally structlog's native keyword args:

Example for line 50
-            self.logger.debug(f"Settings cache hit for user {user_id}", extra={"cache_size": len(self._cache)})
+            self.logger.debug("Settings cache hit", user_id=user_id, cache_size=len(self._cache))

This also applies to lines 200, 205, and 211 — and to the repository file reviewed above. Using structlog's keyword args is idiomatic, avoids log injection, and makes all fields independently queryable.

backend/app/services/admin/admin_events_service.py (1)

8-9: Use keyword arguments instead of extra dict for structured fields.
Passing fields via extra={...} creates a literal "extra" key in the event dict. Use keyword arguments to keep structured fields at the top level of the log context.

♻️ Example change (apply similarly to other log calls)
-        self.logger.info(
-            "Preparing replay session",
-            extra={
-                "dry_run": dry_run,
-                "replay_id": replay_id,
-            },
-        )
+        self.logger.info(
+            "Preparing replay session",
+            dry_run=dry_run,
+            replay_id=replay_id,
+        )

Also applies to: 142–147, 184–190

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 28 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/core/logging.py">

<violation number="1" location="backend/app/core/logging.py:81">
P2: `logging.basicConfig()` is a no-op when the root logger already has handlers. Since `setup_logger` is called multiple times (confirmed in `main.py` lines 54 and 131), subsequent calls won't update the stdlib log level. The removed `logging.getLogger().setLevel(level)` was specifically working around this. Consider adding `force=True` to `basicConfig` (available since Python 3.8) so that each call fully reconfigures the root logger.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7


⚠️ Outside diff range comments (2)
backend/app/services/saved_script_service.py (1)

72-75: ⚠️ Potential issue | 🟡 Minor

Repo return value is discarded, causing a redundant DB round-trip.

saved_script_repo.update_saved_script already returns DomainSavedScript | None (returns None when not found, or the updated document on success). Ignoring this and issuing a separate get_saved_script call means:

  1. An unnecessary extra database query on every update.
  2. A narrow race window: a concurrent delete between lines 72 and 73 would raise SavedScriptNotFoundError even though the update succeeded.

Use the repo's return value directly:

Proposed fix
-        await self.saved_script_repo.update_saved_script(script_id, user_id, update_data)
-        updated_script = await self.saved_script_repo.get_saved_script(script_id, user_id)
-        if not updated_script:
+        updated_script = await self.saved_script_repo.update_saved_script(script_id, user_id, update_data)
+        if not updated_script:
             raise SavedScriptNotFoundError(script_id)
backend/app/services/sse/redis_bus.py (1)

99-109: ⚠️ Potential issue | 🟠 Major

Printf-style %s log calls were not migrated to structlog.

These calls at lines 99 and 104-109 still use logging.Logger-style positional %s formatting. The structlog configuration does not include PositionalArgumentsFormatter in its processor chain, so positional arguments will not be interpolated into the message. Instead, they will be treated as extra context fields, and the %s placeholders will appear literally in the JSON output. This is inconsistent with the structured logging migration.

🐛 Proposed fix
-        if not execution_id:
-            self.logger.debug("Event %s has no execution_id, skipping", event.event_type)
-            return
+        if not execution_id:
+            self.logger.debug("Event has no execution_id, skipping", event_type=event.event_type)
+            return
         try:
             await self.publish_event(execution_id, event)
         except Exception as e:
             self.logger.error(
-                "Failed to publish %s to Redis for %s: %s",
-                event.event_type, execution_id, e,
+                "Failed to publish event to Redis",
+                event_type=event.event_type,
+                execution_id=execution_id,
+                error=str(e),
                 exc_info=True,
-
             )
🤖 Fix all issues with AI agents
In `@backend/app/core/logging.py`:
- Around line 7-17: The email regex in SENSITIVE_PATTERNS causes all email-like
values to be redacted when sanitize_sensitive_data iterates event_dict values;
update either SENSITIVE_PATTERNS or sanitize_sensitive_data to avoid removing
intentionally-logged emails—preferably add an allowlist of keys (e.g.,
ALLOWLIST_PRESERVE_EMAILS) and change sanitize_sensitive_data to skip redaction
for those keys (referencing SENSITIVE_PATTERNS, sanitize_sensitive_data, and
event_dict), or alternatively remove the email tuple from SENSITIVE_PATTERNS and
handle email redaction only in a targeted helper that checks field names before
masking.

In `@backend/app/core/tracing/__init__.py`:
- Line 64: Replace the f-string logging call with a static message and argument
to follow the structured-logging convention: update the logger.info call
(currently logger.info(f"Tracing initialized for {name}")) to use a static
format and pass name as an argument, e.g. logger.info("Tracing initialized for
%s", name), so the message string remains constant and the runtime data is
passed separately.
- Around line 43-51: The TracerProvider instance stored in the local variable
provider can leave buffered spans unflushed because provider.shutdown() is never
called; modify the Tracer class to store the TracerProvider as an instance
attribute (e.g., self._provider) when you create provider and add a public
shutdown method (e.g., Tracer.shutdown) that calls self._provider.shutdown();
then ensure dishka_lifespan (or the DI container/lifespan teardown) invokes
Tracer.shutdown() in its finally block so BatchSpanProcessor flushes remaining
spans on application exit.
- Around line 56-62: The code currently double-injects trace context because
LoggingInstrumentor().instrument(set_logging_format=True) adds
otelTraceID/otelSpanID to stdlib LogRecord while the structlog processor chain
still includes add_otel_context; pick one approach and apply it consistently:
either disable stdlib injection by changing
LoggingInstrumentor().instrument(...) to use set_logging_format=False (keep
add_otel_context in the structlog processors) or remove add_otel_context from
the structlog processor chain (keep set_logging_format=True) so only one
mechanism (LoggingInstrumentor or add_otel_context) injects trace_id/span_id;
update the LoggingInstrumentor invocation or the structlog processor list
accordingly (look for LoggingInstrumentor().instrument(...) and add_otel_context
in the structlog config).
- Around line 56-59: The current Tracer.__init__ calls
FastAPIInstrumentor().instrument(...), which monkey-patches Future FastAPI
instances and therefore misses the existing FastAPI instance created earlier;
update Tracer to call FastAPIInstrumentor().instrument_app(app,
tracer_provider=tp, excluded_urls="health,metrics,docs,openapi.json") passing
the actual app instance (the same app variable created in main) so the existing
app gets OpenTelemetryMiddleware, or alternatively move the instrumentation call
to run before the app is instantiated; ensure you reference the tracer_provider
(tp) and excluded_urls arguments when switching to instrument_app.

In `@backend/app/services/notification_service.py`:
- Around line 346-357: The code logs and traces the full webhook_url which may
contain embedded secrets; update the webhook handling in notification_service.py
so logger.debug and trace.get_current_span().set_attributes do not include the
full URL: extract and log only a safe identifier such as the host (use
urlparse(webhook_url).netloc) or a masked version (e.g., keep scheme and host
and redact the path/query) and replace webhook_url with that safe value in the
logger.debug call and the "notification.webhook_url" span attribute; keep other
attributes (notification_id, payload_size, channel) unchanged and ensure no full
URL is stored or emitted.

In `@docs/operations/logging.md`:
- Around line 24-25: Update the include line ranges so the docs reference the
actual functions: change the broken include that points to
backend/app/core/logging.py lines 110–147 to instead include lines 57–84 to show
the complete setup_logger function, and replace the include that currently
references lines 35–59 with lines 26–54 so the snippet cleanly contains both
sanitize_sensitive_data and add_otel_context processors; ensure the identifiers
setup_logger, sanitize_sensitive_data, and add_otel_context are visible in the
updated snippets.
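For the webhook-URL item above, a stdlib sketch of the masking helper (the function name and redaction format are illustrative). Note that `urlparse(...).netloc` can still carry `user:pass@host` userinfo, so `hostname` is the safer field:

```python
from urllib.parse import urlparse


def mask_webhook_url(url: str) -> str:
    """Keep scheme and host only; paths and queries often embed tokens."""
    parsed = urlparse(url)
    host = parsed.hostname or "<unparseable>"
    return f"{parsed.scheme}://{host}/<redacted>"


print(mask_webhook_url("https://hooks.example.com/services/T000/B000/s3cr3t"))
# → https://hooks.example.com/<redacted>

# usage in the service (sketch):
# logger.debug("Sending webhook", webhook_host=mask_webhook_url(webhook_url))
```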
🧹 Nitpick comments (12)
backend/app/dlq/manager.py (1)

251-251: Inconsistent logging style: use structured keyword args instead of f-string.

Every other log call in this file passes context as keyword arguments (event_id=, reason=, status=), but this line uses an f-string, which buries the event_id and error inside the message string. This defeats structlog's structured field extraction — these values won't be independently queryable/filterable in log aggregation.

♻️ Proposed fix
-                self.logger.error(f"Error retrying message {event_id}: {e}")
+                self.logger.error("Error retrying message", event_id=event_id, error=str(e))
backend/app/services/user_settings_service.py (1)

51-51: Avoid f-strings in structlog event messages — use keyword arguments for variable data.

With structlog, the first positional argument (the "event" string) should be a static, greppable key. Embedding user_id via f-strings creates unique event strings per user, defeating structured log aggregation and search.

♻️ Proposed fix
-            self.logger.debug(f"Settings cache hit for user {user_id}", cache_size=len(self._cache))
+            self.logger.debug("Settings cache hit", user_id=user_id, cache_size=len(self._cache))
-            self.logger.debug(f"Invalidated cache for user {user_id}", cache_size=len(self._cache))
+            self.logger.debug("Cache invalidated", user_id=user_id, cache_size=len(self._cache))
-        self.logger.debug(f"Cached settings for user {user_id}", cache_size=len(self._cache))
+        self.logger.debug("Settings cached", user_id=user_id, cache_size=len(self._cache))
-        self.logger.info(f"Reset settings for user {user_id}")
+        self.logger.info("Settings reset", user_id=user_id)

Also applies to: 201-201, 206-206, 212-212

backend/app/services/execution_service.py (2)

424-430: Avoid f-string in structlog event name; pass the dynamic value as a keyword argument.

Structlog best practice is to keep event strings static so they're easily searchable and groupable in log aggregation tools. The execution count should be a separate keyword argument.

Proposed fix
         self.logger.debug(
-            f"Retrieved {len(executions)} executions for user",
+            "Retrieved executions for user",
             user_id=user_id,
             filters={k: v for k, v in query.items() if k != "user_id"},
             limit=limit,
             skip=skip,
+            count=len(executions),
         )

380-389: Consider whether logging resource_usage verbatim is desirable.

resource_usage may be a large or nested object. Logging it directly on every retrieval could add noise. If it's useful for debugging, debug level might be more appropriate for that specific field, or log only a summary (e.g., a boolean has_resource_usage).

backend/app/services/notification_service.py (2)

144-151: Prefer static event strings with structlog.

Throughout this file, several log calls embed variables into f-string event messages (e.g., f"Creating notification for user {user_id}") while simultaneously passing the same data as keyword arguments. This is a structlog anti-pattern — dynamic event strings prevent grouping/filtering by event type in log aggregation tools, which is one of the main benefits of the structured logging migration.

Affected locations: Lines 145, 347, 605, 663, 688.

♻️ Example fix for this call site (apply same pattern to others)
     self.logger.info(
-        f"Creating notification for user {user_id}",
+        "creating_notification",
         user_id=user_id,
         channel=channel,
         severity=str(severity),
         tags=tags,
         scheduled=scheduled_for is not None,
     )

640-650: Backoff retry with jitter=None risks thundering herd.

Disabling jitter means concurrent retries for the same external endpoint (webhook/Slack) will fire in lockstep. Consider using backoff.full_jitter or the default jitter to spread retry attempts.

♻️ Suggested change
    @backoff.on_exception(
         backoff.expo,
         Exception,
         max_tries=notification.max_retries,
         max_value=30,
-        jitter=None,
+        jitter=backoff.full_jitter,
         on_backoff=lambda details: self.logger.warning(
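The thundering-herd effect is easy to see with a standalone sketch of the two strategies (plain Python stand-ins for the library's `backoff.expo` and AWS-style full jitter, not the library internals themselves):

```python
import random

def expo(attempt: int, base: int = 2, max_value: int = 30) -> float:
    """Exponential delay capped at max_value, like backoff.expo with max_value."""
    return min(base ** attempt, max_value)

def full_jitter(delay: float) -> float:
    """Spread the delay uniformly over [0, delay]."""
    return random.uniform(0.0, delay)

# With jitter=None, five clients retrying attempt 3 all sleep exactly 8s
# and hit the webhook in lockstep; with full jitter their sleeps scatter.
raw = [expo(3) for _ in range(5)]
jittered = [full_jitter(d) for d in raw]
```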
backend/app/services/sse/redis_bus.py (1)

32-36: Log message mixes f-string interpolation with structured fields — not idiomatic structlog.

The channel is duplicated (in the f-string and as a kwarg), and the exception is only embedded in the string rather than passed as a structured field. With structlog, prefer a static event string and push all dynamic data into keyword arguments.

♻️ Suggested refactor
         except Exception as e:
             self.logger.warning(
-                f"Failed to parse Redis message on channel {self._channel}: {e}",
+                "Failed to parse Redis message",
                 channel=self._channel,
                 model=model.__name__,
+                error=str(e),
             )
backend/app/settings.py (1)

114-123: Confusing naming between trace and metric endpoint settings.

There are now two OTLP endpoint fields with inconsistent naming conventions:

  • OTLP_TRACES_ENDPOINT (line 115) — for traces
  • OTEL_EXPORTER_OTLP_ENDPOINT (line 157) — for metrics

Similarly, TRACING_SERVICE_NAME/TRACING_SERVICE_VERSION (lines 122-123) duplicate SERVICE_NAME/SERVICE_VERSION (lines 152-153) with identical defaults. The tracing Tracer class uses the former; setup_metrics uses the latter.

Consider unifying these — e.g., a single OTLP_ENDPOINT with sub-paths for traces/metrics, or at least aligning the naming prefix. The duplication will confuse operators configuring different service names for traces vs metrics when there's no obvious reason to diverge.

Also applies to: 152-157
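One way the unification could look is a single base endpoint with derived per-signal paths; `/v1/traces` and `/v1/metrics` are the standard OTLP/HTTP paths, but the class and field names here are hypothetical, not the project's actual settings:

```python
from dataclasses import dataclass

@dataclass
class ObservabilitySettings:
    """Sketch of a unified layout: one endpoint, one service identity."""
    OTLP_ENDPOINT: str = "http://otel-collector:4318"
    SERVICE_NAME: str = "integr8scode"
    SERVICE_VERSION: str = "1.0.0"

    @property
    def traces_endpoint(self) -> str:
        return f"{self.OTLP_ENDPOINT}/v1/traces"

    @property
    def metrics_endpoint(self) -> str:
        return f"{self.OTLP_ENDPOINT}/v1/metrics"
```

Both the Tracer and setup_metrics would then read from the same fields, so traces and metrics cannot drift apart by misconfiguration.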

backend/tests/unit/core/test_logging_and_correlation.py (2)

154-161: capture_logs() overrides the processor chain configured by setup_logger.

structlog.testing.capture_logs() replaces the processor pipeline with its own capturing processor for the duration of the context manager. This means the setup_logger("INFO") configuration (including sanitize_sensitive_data and add_otel_context) is not exercised here — the test only verifies that structlog's basic event/key binding works, not the custom pipeline.

If the intent is to verify the configured processor chain end-to-end, consider calling the logger outside capture_logs and asserting on the output (e.g., captured stderr), or test the processors individually (as you already do above). If the intent is simply "setup_logger returns a usable logger," the test name and assertions could be clarified accordingly.
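Testing processors individually is straightforward because structlog processors are plain callables with a `(logger, method_name, event_dict)` signature. A sketch with a stand-in sanitizer (the real `sanitize_sensitive_data` lives in backend/app/core/logging.py; this redaction logic is illustrative):

```python
import re

SENSITIVE = re.compile(r"(password|token|secret)", re.IGNORECASE)

def sanitize_sensitive_data(logger, method_name, event_dict):
    """Stand-in processor: redact values whose keys look sensitive."""
    return {
        k: ("[REDACTED]" if SENSITIVE.search(k) else v)
        for k, v in event_dict.items()
    }

# Processors can be asserted on directly, without configuring structlog
# or entering capture_logs():
out = sanitize_sensitive_data(None, "info", {"event": "login", "password": "hunter2"})
```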


106-107: Hardcoded pattern count is brittle.

Asserting len(SENSITIVE_PATTERNS) == 6 will fail with a non-descriptive assertion error whenever a new pattern is added. Consider removing this test or replacing it with a minimum-count assertion (>= 6) to avoid false failures when the list is intentionally extended.

backend/app/services/admin/admin_events_service.py (2)

116-118: max_events = 1000 is duplicated.

The 1000 limit appears both in the validation check (Line 116) and in ReplayConfig.max_events (Line 155). Consider extracting to a class-level constant to keep them in sync.

Proposed fix
 class AdminEventsService:
+    MAX_REPLAY_EVENTS = 1000
+
     def __init__(
             self,
             repository: AdminEventsRepository,

Then reference self.MAX_REPLAY_EVENTS in both locations.

Also applies to: 148-157


62-66: Stale comment about LogRecord reserved attribute.

Line 64's comment references LogRecord, which is a stdlib logging concept. With structlog, filename isn't reserved in the event dict. The comment is now misleading — consider updating or removing it.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/services/pod_monitor/config.py">

<violation number="1" location="backend/app/services/pod_monitor/config.py:19">
P2: Hardcoding the namespace and kubeconfig path removes runtime configurability, so PodMonitor will always use "integr8scode" and no kubeconfig even when deployments rely on K8S_NAMESPACE/KUBECONFIG. This can break monitoring in non-default namespaces or out-of-cluster runs.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 8 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/api/dependencies.py">

<violation number="1" location="backend/app/api/dependencies.py:13">
P2: The size check loads the full request body into memory before enforcing the limit, which can allow large payloads to consume memory and defeats the guard. Prefer checking Content-Length (or streaming with a cap) so oversized requests are rejected without buffering the entire body.</violation>
</file>
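The Content-Length-first approach the violation recommends can be sketched as follows (the helper name and the limit constant are illustrative, not the dependency's actual code):

```python
MAX_REQUEST_SIZE_MB = 10  # mirrors the MAX_REQUEST_SIZE_MB setting named in the PR summary

def exceeds_limit(headers: dict, max_mb: int = MAX_REQUEST_SIZE_MB) -> bool:
    """Reject on the declared size alone, before reading any body bytes."""
    declared = headers.get("content-length")
    if declared is None:
        return False  # fall through to a streaming cap for chunked bodies
    return int(declared) > max_mb * 1024 * 1024
```

An oversized upload is then refused from the header alone, with no buffering of the body.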

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.


@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/api/dependencies.py">

<violation number="1" location="backend/app/api/dependencies.py:27">
P2: Guard the Content-Length parsing so malformed headers don’t raise `ValueError` and cause a 500. Fall back to the streaming check or return a 400.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@sonarqubecloud

@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 9 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="backend/app/core/middlewares/request_size_limit.py">

<violation number="1">
P2: Invalid Content-Length values will raise ValueError and return 500. Catch parsing errors and return a 400/413 instead.</violation>

<violation number="2">
P1: Requests without a Content-Length header bypass the size limit, so oversized streaming bodies can be accepted without any enforcement. Add a streaming/receive-size check or reject requests missing Content-Length.</violation>
</file>
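Both violations can be addressed together: tolerate a malformed Content-Length instead of letting `int()` raise, and cap the streamed body when the header is absent. A minimal sketch, assuming an ASGI-style `receive` callable (function names and the limit are illustrative):

```python
import asyncio

MAX_BYTES = 10 * 1024 * 1024

def parse_content_length(value):
    """Return the declared length, or None if absent or malformed,
    so the caller falls back to the streaming cap instead of a 500."""
    if value is None:
        return None
    try:
        n = int(value)
    except ValueError:
        return None
    return n if n >= 0 else None

async def read_capped(receive, cap: int = MAX_BYTES) -> bytes:
    """Accumulate ASGI body messages, aborting once the cap is crossed,
    so chunked requests without Content-Length stay bounded."""
    total, chunks = 0, []
    while True:
        message = await receive()
        body = message.get("body", b"")
        total += len(body)
        if total > cap:
            raise ValueError("request body exceeds limit")  # map to 413
        chunks.append(body)
        if not message.get("more_body", False):
            return b"".join(chunks)
```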

<file name="docs/architecture/middleware.md">

<violation number="1">
P2: This documentation overstates the request-size enforcement. The middleware only rejects requests when `Content-Length` is present and exceeds the limit, so requests without (or with incorrect) Content-Length can bypass it. Please clarify the limitation so readers don’t assume full DoS protection.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@HardMax71 HardMax71 merged commit 17763d0 into main Feb 13, 2026
18 checks passed
@HardMax71 HardMax71 deleted the fix/logging branch February 13, 2026 16:35