feat!: add telemetry error events and stop producer on auth failures (v3.0.0) #14

Merged
Rolf Håvard Blindheim (rhblind) merged 21 commits into main from issue-12-telemetry-errors
Apr 14, 2026



@rhblind Rolf Håvard Blindheim (rhblind) commented Apr 14, 2026

Summary

Closes #12.

  • Breaking: Client.receive_messages/3 now returns {:ok, [Message.t()]} | {:error, reason} instead of a bare list. Custom client implementations must wrap their result in {:ok, list}.
  • Emit [:off_broadway_splunk, :receive_status, :error] telemetry on network errors from receive_status/2.
  • Emit [:off_broadway_splunk, :receive_jobs, :error] telemetry when the job list fetch fails (non-200 or network error).
  • Emit [:off_broadway_splunk, :receive_messages, :error] telemetry when an event fetch fails (non-200 or network error).
  • Stop the producer ({:stop, :unauthorized, state}) on HTTP 401 or 403 from either the jobs or messages path, allowing Broadway's supervisor to restart it or leave it stopped.
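
For custom client implementors, the migration amounts to wrapping the returned list. A minimal sketch, assuming a hypothetical in-memory client: the module name `MyApp.InMemoryClient`, the argument names `sid`, `demand` and `opts`, and the `:messages` option are illustrative and not taken from the library. Plain maps stand in for `Message.t()`, and a real client would also declare the library's Client behaviour.

```elixir
defmodule MyApp.InMemoryClient do
  # Hypothetical client, used only to illustrate the new return shape.
  # Before v3.0.0 a client would have returned the bare list of messages.
  # From v3.0.0 the list is wrapped in {:ok, list} and failures are
  # reported as {:error, reason}.
  def receive_messages(_sid, demand, opts) do
    case Keyword.fetch(opts, :messages) do
      {:ok, messages} -> {:ok, Enum.take(messages, demand)}
      :error -> {:error, :no_messages_configured}
    end
  end
end

# An empty result and an error are now distinguishable by the caller.
{:ok, []} = MyApp.InMemoryClient.receive_messages("sid-1", 2, messages: [])
{:error, :no_messages_configured} = MyApp.InMemoryClient.receive_messages("sid-1", 2, [])
```
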

Test plan

  • All 34 tests pass (mix test)
  • Credo clean (mix credo --strict)
  • Telemetry events verified in producer tests for 403 (jobs) and 404 (messages) paths
  • Stop-on-auth tested for 401 and 403 from both jobs and messages paths
  • Custom client implementors: update receive_messages/3 to return {:ok, list} or {:error, reason}

Update the receive_messages/3 callback typespec from returning a plain
list to returning {:ok, messages} | {:error, reason :: any}, allowing
the producer to distinguish errors from empty results.

wrap_received_messages now returns {:ok, messages} on success,
{:error, {:http_error, status}} on non-200 HTTP responses, and
{:error, reason} on network errors, matching the Client behaviour typespec.
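
The mapping described above could look roughly like the following. This is a sketch only: `WrapSketch.wrap/1` is a hypothetical stand-in, and the response shape (`{:ok, %{status: ..., body: ...}}` or `{:error, reason}`) is an assumption rather than the HTTP client's actual struct.

```elixir
defmodule WrapSketch do
  # Hypothetical stand-in for the client's response handling.

  # 200: hand the (already decoded) messages to the producer.
  def wrap({:ok, %{status: 200, body: messages}}), do: {:ok, messages}

  # Any other HTTP status becomes a tagged error carrying the status code.
  def wrap({:ok, %{status: status}}), do: {:error, {:http_error, status}}

  # Network-level failures pass their reason through unchanged.
  def wrap({:error, reason}), do: {:error, reason}
end

{:ok, []} = WrapSketch.wrap({:ok, %{status: 200, body: []}})
{:error, {:http_error, 404}} = WrapSketch.wrap({:ok, %{status: 404, body: ""}})
{:error, :timeout} = WrapSketch.wrap({:error, :timeout})
```
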
…p on 401/403

- receive_messages_from_splunk/2 now handles {:ok, msgs} | {:error, ...} return
  from the client and emits [:off_broadway_splunk, :receive_messages, :error]
  telemetry on errors
- Returns {:stop, :unauthorized, state} on HTTP 401/403, {:error, ...} otherwise
- handle_receive_messages/1 demand>0 clause pattern-matches the new return shape,
  retrying on transient errors and stopping the producer on auth failures
- Add two tests: telemetry emission on 404 and producer stop on 401
- Emit [:off_broadway_splunk, :receive_jobs, :error] telemetry for non-200
  HTTP responses and network errors from receive_jobs_from_splunk/1
- Convert non-200 responses to {:error, {:http_error, status}} returns
- Stop the producer on 401/403 from the jobs path in both
  handle_receive_jobs/1 and the inline fetch in handle_receive_messages/1
- Retry on other errors by rescheduling
- Remove now-unreachable catch-all clauses from update_queue_from_response/2
- Add tests for telemetry emission and producer stop on 401/403
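
The stop-versus-retry decision in the commits above can be summarised as a small pure function. This is a sketch, not the producer's actual code: `ProducerSketch.next_action/2` and the `{:retry, state}` tag are hypothetical, and the real producer reschedules a fetch instead of returning a tag.

```elixir
defmodule ProducerSketch do
  # Hypothetical helper mapping a client result onto the producer's next step.

  # Success: emit the messages and keep running.
  def next_action({:ok, messages}, state), do: {:noreply, messages, state}

  # Auth failures are not transient, so the producer stops.
  def next_action({:error, {:http_error, status}}, state) when status in [401, 403],
    do: {:stop, :unauthorized, state}

  # Everything else (other HTTP statuses, network errors) is retried.
  def next_action({:error, _reason}, state), do: {:retry, state}
end

{:stop, :unauthorized, %{}} = ProducerSketch.next_action({:error, {:http_error, 401}}, %{})
{:retry, %{}} = ProducerSketch.next_action({:error, {:http_error, 404}}, %{})
```
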
Replace manual changelog format with release-please generated version including bug fixes, documentation updates, CI configuration, style changes, and test improvements.

Consolidate multi-line assert_receive statements for telemetry events into single lines for improved readability and consistency.

These options were deprecated in 2.1.3 in favour of the :jobs option.

Add blank line between variable assignment and case statement for improved readability following Elixir style guidelines.
…ucer

- Add terminate/2 callback emitting [:off_broadway_splunk, :producer, :stop]
  whenever the producer stops via a callback return (e.g. :unauthorized on
  a 401/403). This gives consumers an observable signal for non-transient
  failures. Note: fires only for callback-initiated stops, not supervisor
  shutdown (ProducerStage does not trap exits).

- Remove [:off_broadway_splunk, :receive_status, :error] telemetry from
  SplunkClient.receive_status/2. The producer already emits
  [:off_broadway_splunk, :receive_jobs, :error] for the same failure, so
  the client-side emission was a double-event. Moving responsibility to the
  producer ensures the event is always emitted regardless of which client
  implementation is used.
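
Consumers who want to observe these events could attach a handler along the following lines. This is a sketch under assumptions: the module name `MyApp.SplunkTelemetry` and the handler id are hypothetical, and since the measurements and metadata of each event are not described here, the handler only inspects whatever it receives. `Mix.install/1` is used so the snippet runs as a standalone script; in a real application `:telemetry` would come from `mix.exs`.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule MyApp.SplunkTelemetry do
  require Logger

  # Event names as listed in this PR.
  @events [
    [:off_broadway_splunk, :receive_jobs, :error],
    [:off_broadway_splunk, :receive_messages, :error],
    [:off_broadway_splunk, :producer, :stop]
  ]

  # Attach one handler to all three events.
  def attach do
    :telemetry.attach_many("my-app-splunk-errors", @events, &__MODULE__.handle_event/4, nil)
  end

  # Log the event name together with whatever payload was emitted.
  def handle_event(event, measurements, metadata, _config) do
    Logger.warning(
      "splunk event #{inspect(event)}: #{inspect(measurements)} #{inspect(metadata)}"
    )
  end
end

:ok = MyApp.SplunkTelemetry.attach()
```
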
@rhblind Rolf Håvard Blindheim (rhblind) merged commit 46fba50 into main Apr 14, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Auth and network errors are silently swallowed with no telemetry or observable signal
