dapr · marcduiker · Feb 25, 2026 · Feb 1, 2026 · Feb 1, 2026 · Feb 1, 2026
@@ -0,0 +1,23 @@
+---
+type: docs
+title: "Protocol Reference"
+linkTitle: "Protocol Reference"
+weight: 100
+description: >
+  Low-level technical documentation of the Dapr runtime protocols and internal mechanics for each building block.
+---
+
+This section provides a deep dive into the internal workings of the Dapr runtime. It is intended for maintainers, 
+contributors, and anyone interested in the low-level implementation details of Dapr's building blocks.
+
+Unlike the user-facing API reference, these documents focus on:
+- How the runtime processes requests.
+- Internal state transitions.
+- Interaction with component interfaces.
+- Protocol-level details (gRPC and HTTP).
+
+## Building Blocks
+
+Select a building block to explore its internal protocol and mechanics:
+
+- [Workflow]({{% ref workflow-protocol %}})
@@ -0,0 +1,139 @@
+---
+type: docs
+title: "Workflow Protocol"
+linkTitle: "Workflow"
+weight: 10
+description: "Low-level description of the Workflow building block internals."
+---
+
+This document specifies the Dapr Workflow protocol and runtime contract at a low level. It targets SDK authors building 
+Workflow Workers and runtime maintainers evolving the Dapr sidecar’s Workflow Engine.
+
+## Overview
+Dapr Workflow implements a sidecar-as-scheduler pattern: the Dapr runtime (sidecar) acts as the Workflow Engine, and 
+the application SDK acts as the Workflow Worker. All control and execution traffic flows over gRPC.
+
+There are two protocol surfaces:
+1. Management API (standard Dapr gRPC accessible via SDK):
+    - Start, terminate, pause, resume, re-run, purge and query workflow instances.
+2. Execution API (Task Hub Protocol):
+    - Worker facing, used to receive orchestration/activity work items and to report completion (e.g., via 
+    `TaskHubSidecarService`).
+
+[//]: # (graph LR)
+
+[//]: # (Client[Workflow Client] -- Management API --> Sidecar[Dapr Sidecar Engine])
+
+[//]: # (Sidecar -- Task Hub Protocol --> Worker[Workflow Worker SDK])
+
+[//]: # (Sidecar -- State Store --> Actors[Dapr Actors Backend])
+
+
+Key Components
+- **Workflow Engine (Dapr Sidecar)**
+
+   Manages workflow state transitions, history persistence, scheduling of orchestration and activity tasks, and 
+   reliable delivery semantics. By default, it leverages Dapr Actors as the backend for durable, partitioned execution.
+
+- **Workflow Worker (Application SDK)**
+
+  Connects to the sidecar, polls for orchestration and activity work items, executes user-defined logic, and returns 
+  results, failures, and heartbeats to the engine. Orchestration logic must be deterministic; activity logic need not 
+  be.
+
+- **Orchestration**
+
+  The deterministic coordinator that defines the workflow. The engine drives orchestrations via history replay to 
+  rebuild state and schedule outbound tasks (activities, sub-orchestrations, timers, external events).
+
+- **Activity**
+
+  The atomic unit of work. Activities are executed **at-least-once** and report results or failures back to the engine. 
+  Idempotency is recommended and task execution identifiers are available on context to assist with this.
+
+- **State Store & Backend**
+
+  Workflow history and state are durably persisted. The engine typically implements a task hub pattern over the chosen 
+  persistence and uses Dapr Actors as the default reliability substrate.
+
+## Execution Model
+Dapr Workflow is based on the Durable Task Framework (DTFx) execution semantics:
+
+### Replay-based Execution
+Orchestrators are replayed from their event history to rebuild deterministic state. All nondeterministic operations 
+(time, random values, I/O) must be mediated by the engine (e.g., timers, activity calls, external events).
+
+### Deterministic Orchestrators
+Orchestrator code must be side-effect free except via engine-mediated effects. Control flow must be reproducible 
+during replay.
+
+### At-least-once Activities, Exactly-once State Commit
+Activities may be delivered more than once. The engine ensures workflow state commits are idempotent and applied 
+exactly once.
+
+### Sidecar-as-Scheduler
+The sidecar owns scheduling and persists all history/events before dispatching work to workers. Workers are stateless 
+executors from the engine’s perspective.
+
+## Protocol Surfaces
+1) Management API (Standard Dapr gRPC)
+  - **Start Workflow**: Create and persist an initial history event; return instance metadata
+  - **Terminate / Pause/ Resume**: Drive lifecycle transitions through persisted control events.
+  - **Query**: Retrieve instance status, history, output, failure details, and custom metadata.
+  - **Re-run**: Start a new workflow instance from a history event.
+  - **Purge**: Proactively clear workflow history and state.
+
+> **Note**: See: [Management API specification]({{% ref workflow-protocol-management-api %}}) for exact RPC shapes, 
+> error codes and semantics.
+
+2) Execution API (Task Hub Protocol)
+  - **Poll for Work**: Workers fetch orchestration and activity work items.
+  - **Complete / Fail Work**: Workers report completion results or failures; the engine appends these to history and advances
+    orchestration progress.
+  - **Heartbeats / Leases**: Optional mechanisms for long-running activities and cooperative rebalancing.
+  - **Timers & External Events**: Delivered to orchestrations as history events to keep replay deterministic.
+
+> **Note**: See [Execution API specification]({{% ref workflow-protocol-execution-api %}}) defining `TaskHubSidecarService` 
+> contracts, payload schemas and sequencing rules.
+
+# Request & Runtime Lifecycle
+1. Start Workflow
+  - Client calls `StartWorkflow` via the Management API.
+  - Engine persists the initial event (e.g., `ExecutionStarted`) and materializes an instance.
+2. Orchestrator Execution (Replay-driven)
+  - Engine replays orchestration history to rehydrate state.
+  - Orchestrator schedules effects (activities, sub-orchestrations, timers) by issuing commands, which the engine 
+  persists as new history events.
+3. Activity Dispatch & Execution
+  - Engine dispatches activity work items to workers
+  - Worker runs the activity (may be retried and delivered at least once).
+  - Worker responds with completion (result) or failure; engine appends to history.
+4. Timers & External Signals
+  - Engine delivers timer fired or external event records as history entries.
+  - Orchestrator consumes these deterministically on next replay.
+5. Progress & Checkpointing
+  - Each step appends to the history log and advances orchestration state.
+  - The engine safeguards idempotence and exactly-once commit of orchestration state.
+6. Completion
+  - Orchestration returns an output (success) or a failure (exception details).
+  - Final state and output are persisted; status queries reflect the terminal state.
+
+# Protocol Principles
+- **GRIEF (GRpc IntErFace)**: All worker/engine and client/engine communication is gRPC.
+- **Replay-based Orchestration**: Determinism enforced through history replay.
+- **At-least-once Activity Delivery**: Activities may re-execute; design for idempotency.
+- **Engine-mediated Effects**: All nondeterminism/time/IO flows through the engine to remain replay-safe.
+
+# Documentation Map
+1. [Management API]({{% ref workflow-protocol-management-api %}})
+   Detailed Dapr gRPC control-plane operations and payloads.
+2. [Execution API (Task Hub Protocol)]({{% ref workflow-protocol-execution-api %}})
+   `TaskHubSidecarService` worker protocol, work item contracts, result/failure reporting, and sequencing.
+3. [Orchestration Lifecycle]({{% ref workflow-protocol-orchestration-lifecycle %}})
+   Replay semantics, scheduling, external events, timers, and completion.
+4. [Activity Lifecycle]({{% ref workflow-protocol-activity-lifecycle %}})
+   Dispatch, retries, idempotency, heartbeat semantics, and failure handling.
+5. [State & History]({{% ref workflow-protocol-state-and-history %}})
+   History schema, state snapshots, and persistence guarantees.
+6. [Versioning]({{% ref workflow-protocol-versioning %}})
+   How Dapr handles multiple versions of the same workflow definition.
@@ -0,0 +1,118 @@
+---
+type: docs
+title: "Workflow Protocol - Activity Lifecycle"
+linkTitle: "Activity Lifecycle"
+weight: 400
+description: "Low-level description of the Workflow building block internals."
+---
+
+# Activity Lifecycle
+
+Activities are the basic units of work in a Dapr Workflow. Unlike orchestrations, activities are not replayed and do 
+not need to be deterministic. They are executed exactly once per "schedule" (though retries may occur).
+
+## Execution Flow
+
+1.  **Scheduling**: An orchestration requests an activity by sending a `ScheduleTask` action to the Dapr engine.
+2.  **Work Item Dispatch**: The Dapr engine enqueues an activity task. When an activity worker (SDK) is available, the 
+engine sends an `ActivityWorkItem` via the `GetWorkItems` stream.
+3.  **Execution**: The SDK receives the `ActivityWorkItem`, which contains:
+    *   `name`: The name of the activity to execute.
+    *   `input`: The input data for the activity.
+    *   `instance_id`: The ID of the workflow instance that scheduled the activity.
+    *   `task_id`: A unique identifier for this specific activity execution.
+    *   `task_execution_id`: A unique identifier for the specific *attempt* of this activity. This is useful for 
+        implementing idempotency in activity logic.
+    *   `completion_token`: An opaque token used to correlate the response with this specific work item.
+4.  **Reporting**: After the activity logic finishes, the SDK sends a `CompleteActivityTask` request back to Dapr.
+    *   **Success**: The SDK provides the serialized output in the `result` field.
+    *   **Failure**: The SDK provides `failure_details` (error message, type, stack trace).
+
+## Task Execution IDs
+
+The `task_execution_id` (also known as the Task Execution Key) is a unique, runtime-generated string (typically a UUID) 
+that identifies a specific **attempt** to execute an activity task.
+
+### Why it matters to the SDK
+
+While the Workflow SDK is generally stateless between work items, the `task_execution_id` provides critical context 
+for the **Activity Worker**:
+
+1.  **Distributed Idempotency**: If an activity performs a side effect (e.g., charging a credit card), it 
+should use the `task_execution_id` as an idempotency key.
+2.  **Distinguishing Retries**: Unlike the `task_id` (which remains constant for a specific step in the workflow), 
+the `task_execution_id` **changes every time the engine retries the activity** (e.g., due to a timeout or worker crash).
+3.  **Zombie Detection**: If an activity worker takes too long and the engine times it out and retries on another 
+worker, the original worker might eventually finish. By checking the `task_execution_id` against a persistent store 
+or external API, the worker can determine if it is a "zombie" whose results are no longer wanted.
+
+### Implementation Guidelines for SDKs:
+
+*   **Expose to User**: The SDK MUST expose the `task_execution_id` to the activity implementation logic 
+(e.g., via an `ActivityContext`).
+*   **Do Not Cache**: The SDK should not attempt to cache or reuse this ID across different work items.
+*   **Opaque Usage**: The SDK should treat the value as an opaque string. It is generated by the Dapr sidecar when 
+the activity is dispatched and is not something the SDK needs to create or parse.
+
+## Completion Tokens
+
+The `completion_token` is an opaque string generated by the Dapr runtime and delivered to the SDK as part of the 
+`ActivityWorkItem`.
+
+### Purpose and Intent
+
+1.  **Response Correlation**: The sidecar uses the `completion_token` to reliably match an `ActivityResponse` 
+(from `CompleteActivityTask`) to the original task it dispatched.
+2.  **Stateless Tracking**: It allows the sidecar to remain stateless or minimize state lookups when receiving a 
+completion, as the token contains (or points to) the necessary context (instance ID, task ID, etc.).
+3.  **Zombie Prevention**: If an activity times out and is retried, the new attempt will have a different 
+`completion_token`. If the original "zombie" worker eventually responds with the old token, the sidecar can easily 
+identify and ignore the late response.
+
+### SDK Implementation Guidelines
+
+*   **Capture**: The SDK MUST capture the `completion_token` from the incoming `ActivityWorkItem`.
+*   **Propagation**: The SDK MUST include the exact same `completion_token` in the `ActivityResponse` sent via 
+`CompleteActivityTask`.
+*   **Opaqueness**: The SDK MUST treat the token as a black box. It should not attempt to parse, modify, or construct 
+its own tokens.
+*   **Storage**: While the activity is executing, the SDK must keep this token in memory (e.g., in the `ActivityContext`).
+
+
+## Task Activity IDs
+
+In the Dapr runtime (specifically when using the Actors backend), activities are represented as actors. Each activity 
+execution has a unique **Task Activity ID** (also known as the Activity Actor ID).
+
+The ID follows a specific pattern:
+`{workflowInstanceID}::{taskID}::{generation}`
+
+*   **workflowInstanceID**: The unique ID of the workflow instance that scheduled the activity.
+*   **taskID**: The sequence number of the task within the workflow execution (e.g., 0, 1, 2...).
+*   **generation**: A counter that increments if the workflow is restarted or "continued as new".
+
+This unique ID ensures that activity executions are isolated and can be tracked reliably across retries and restarts.
+
+## Retries
+
+Dapr handles activity retries based on the policy defined in the orchestration (if the SDK supports defining retry 
+policies in the `ScheduleTask` action). If an activity fails and a retry policy is in place, the engine will re-enqueue 
+the activity task after the specified delay.
+
+From the activity worker's perspective, a retry is simply a new `ActivityWorkItem` with the same name and input, but 
+potentially a different `task_id` (or the same, depending on the backend implementation).
+
+## Idempotency
+
+Because activities might be executed more than once (e.g., if the worker crashes after execution but before reporting 
+completion), it is recommended that activity logic be idempotent where possible.
+
+## Comparison with Workflows
+
+| Feature | Orchestration | Activity |
+| :--- | :--- | :--- |
+| **Execution Style** | Replay-based (Deterministic) | Direct execution |
+| **State** | Managed via History Events | No internal workflow state |
+| **Side Effects** | Forbidden (must use activities) | Allowed (IO, Database, etc.) |
+| **Lifetime** | Can be long-running (days/months) | Usually short-lived |
+| **Connectivity** | Connected via `GetWorkItems` | Connected via `GetWorkItems` |
@@ -0,0 +1,110 @@
+---
---
+---
---
+---
+type: docs
+title: "Workflow Protocol - Execution API"
+linkTitle: "Execution API"
+weight: 200
+description: "Low-level description of the Workflow building block internals."
+---
+
+# Workflow Execution API (Task Hub Protocol)
+
+The Workflow Execution API is a low-level gRPC protocol used by Dapr Workflow SDKs to act as "Workers". The SDK 
+connects to the Dapr sidecar via this protocol to poll for work and report completion.
+
+The service is named `TaskHubSidecarService`.
+
+### Service Definition (gRPC)
+
+```protobuf
+service TaskHubSidecarService {
+  rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
+  rpc CompleteOrchestratorTask(OrchestratorResponse) returns (CompleteBatchResponse);
+  rpc CompleteActivityTask(ActivityResponse) returns (CompleteBatchResponse);
+  // ... other management methods
+}
+```
+
+## Worker Lifecycle
+
+1.  **Connection**: The SDK opens a long-running bidirectional stream to `GetWorkItems`.
+2.  **Polling**: The SDK receives `WorkItem` messages from the stream.
+3.  **Execution**:
+    *   If the work item is an **Orchestration**, the SDK retrieves and replays the history events to determine the next actions.
+    *   If the work item is an **Activity**, the SDK executes the activity logic.
+4.  **Completion**:
+    *   For Orchestrations, the SDK calls `CompleteOrchestratorTask` with a list of actions to take.
+    *   For Activities, the SDK calls `CompleteActivityTask` with the result or failure information.
+
+## gRPC Service: `TaskHubSidecarService`
+
+### GetWorkItems
+
+Opens a stream to receive work items for orchestrations and activities.
+
+**Request (`GetWorkItemsRequest`):**
+Typically empty or contains worker metadata.
+
+**Response (stream `WorkItem`):**
+A `WorkItem` can be one of:
+*   `orchestrator_item`: Contains history and new events for an orchestration.
+*   `activity_item`: Contains details for a single activity task.
+
+### CompleteOrchestratorTask
+
+Reports the results of an orchestration execution.
+
+**Request (`OrchestratorResponse`):**
+*   `instance_id`: The ID of the workflow instance.
+*   `actions`: A list of `OrchestratorAction` messages.
+*   `custom_status`: Optional string for user-defined status.
+
+**OrchestratorAction Types:**
+*   `ScheduleTask`: Schedule a new activity.
+*   `CreateTimer`: Schedule a durable timer.
+*   `CreateSubOrchestration`: Start a child workflow.
+*   `CompleteOrchestration`: Mark the workflow as completed (success or failure).
+*   `TerminateOrchestration`: Forcefully terminate the instance.
+*   `SendEvent`: Send an event to another workflow.
+
+### CompleteActivityTask
+
+Reports the result of an activity execution.
+
+**Request (`ActivityResponse`):**
+*   `instance_id`: The ID of the workflow instance.
+*   `task_id`: The unique ID of the activity task.
+*   `completion_token`: The opaque token received in the `ActivityWorkItem`.
+*   `result`: The serialized output of the activity (if successful).
+*   `failure_details`: Details about the error (if failed).
+
+## Data Models
+
+### HistoryEvent
+
+Workflows in Dapr are event-sourced. The state of an orchestration is rebuilt by replaying a sequence of 
+`HistoryEvent` messages.
+
+Common event types:
+*   `ExecutionStarted`: Initial event containing workflow name and input.
+*   `TaskScheduled`: An activity was scheduled.
+*   `TaskCompleted`: An activity finished successfully.
+*   `TaskFailed`: An activity failed.
+*   `TimerCreated`: A timer was scheduled.
+*   `TimerFired`: A timer expired.
+*   `OrchestrationCompleted`: The workflow finished.
+
+### FailureDetails
+
+Used to report errors from activities or orchestrations.
+*   `error_type`: A string identifying the type of error.
+*   `error_message`: A human-readable error message.
+*   `stack_trace`: Optional stack trace.
+*   `is_non_retriable`: Boolean flag.
+
+## Protocol Nuances
+
+*   **Streaming**: `GetWorkItems` is a server-to-client stream. Dapr pushes work to the SDK as it becomes available.
+*   **Sticky Sessions**: Dapr attempts to send work items for the same instance to the same worker if possible, but 
+the SDK must not rely on this for correctness.
+*   **Determinism**: The SDK **must** ensure that the orchestration logic is deterministic. During replay, the SDK 
+uses the history provided in the `OrchestratorWorkItem` to avoid re-executing actions that have already been recorded.