Merged
30 commits
b6d1842
Initial commit adding low-level workflow protocol information as deri…
WhitWaldo Feb 1, 2026
cec2e1a
Added page for workflow versioning
WhitWaldo Feb 1, 2026
5fd61bc
Added section for Task Activity IDs
WhitWaldo Feb 1, 2026
8fcbca8
Updated state value to include version information
WhitWaldo Feb 1, 2026
efc3df6
Added patch versioning and stalled sections to orchestration lifecycle
WhitWaldo Feb 1, 2026
c0d20e6
Added missing article
WhitWaldo Feb 1, 2026
1ba6a96
Added documentation about the continuation tokens and task execution IDs
WhitWaldo Feb 2, 2026
687f667
Added more context around task execution IDs
WhitWaldo Feb 2, 2026
3f1ea60
Fleshed out the continuation tokens in management API
WhitWaldo Feb 2, 2026
02a8c6d
Added completion tokens to activity lifecycle
WhitWaldo Feb 2, 2026
96ea383
Added completion token to protocol execution
WhitWaldo Feb 2, 2026
8d3e8ce
Removed bad link
WhitWaldo Feb 2, 2026
11fb8a6
Merge branch 'v1.17' into runtime-workflow-docs
marcduiker Feb 23, 2026
cec592f
Merge branch 'v1.17' into runtime-workflow-docs
WhitWaldo Feb 23, 2026
7d06616
Merge branch 'v1.17' into runtime-workflow-docs
marcduiker Feb 23, 2026
5821078
Added rerun/purge per Josh's review
WhitWaldo Feb 24, 2026
b40cfb6
Merge remote-tracking branch 'origin/runtime-workflow-docs' into runt…
WhitWaldo Feb 24, 2026
98a26f6
Added re-run and purge descriptions
WhitWaldo Feb 24, 2026
fa6937f
Removed text that didn't make sense
WhitWaldo Feb 24, 2026
f85153f
Removed unnecessary spaces
WhitWaldo Feb 24, 2026
5fbbf90
Added clarification as to where the SDK's role lies
WhitWaldo Feb 24, 2026
c79282f
Fixed line indentation
WhitWaldo Feb 24, 2026
20e8f33
Merge branch 'v1.17' into runtime-workflow-docs
WhitWaldo Feb 24, 2026
d5f19d0
Updated to use nomenclature of Dapr Workflows
WhitWaldo Feb 24, 2026
65335ff
Added section clarifying named workflow versioning in addition to pat…
WhitWaldo Feb 24, 2026
980f5df
Clarified that the management APIs are often exposed by the SDKs
WhitWaldo Feb 24, 2026
88cfb97
Clarified that the SDK also retrieves work history (as it's not immed…
WhitWaldo Feb 24, 2026
672332d
Added clarifying verbiage about versioning properties and how a stall…
WhitWaldo Feb 24, 2026
4b2cceb
Update daprdocs/content/en/contributing/protocol-reference/workflow-p…
marcduiker Feb 25, 2026
d693142
Merge branch 'v1.17' into runtime-workflow-docs
marcduiker Feb 25, 2026
23 changes: 23 additions & 0 deletions daprdocs/content/en/contributing/protocol-reference/_index.md
---
type: docs
title: "Protocol Reference"
linkTitle: "Protocol Reference"
weight: 100
description: >
Low-level technical documentation of the Dapr runtime protocols and internal mechanics for each building block.
---

This section provides a deep dive into the internal workings of the Dapr runtime. It is intended for maintainers,
contributors, and anyone interested in the low-level implementation details of Dapr's building blocks.

Unlike the user-facing API reference, these documents focus on:
- How the runtime processes requests.
- Internal state transitions.
- Interaction with component interfaces.
- Protocol-level details (gRPC and HTTP).

## Building Blocks

Select a building block to explore its internal protocol and mechanics:

- [Workflow]({{% ref workflow-protocol %}})
---
type: docs
title: "Workflow Protocol"
linkTitle: "Workflow"
weight: 10
description: "Low-level description of the Workflow building block internals."
---

This document specifies the Dapr Workflow protocol and runtime contract at a low level. It targets SDK authors building
Workflow Workers and runtime maintainers evolving the Dapr sidecar’s Workflow Engine.

## Overview
Dapr Workflow implements a sidecar-as-scheduler pattern: the Dapr runtime (sidecar) acts as the Workflow Engine, and
the application SDK acts as the Workflow Worker. All control and execution traffic flows over gRPC.

There are two protocol surfaces:
1. Management API (standard Dapr gRPC accessible via SDK):
- Start, terminate, pause, resume, re-run, purge and query workflow instances.
2. Execution API (Task Hub Protocol):
- Worker facing, used to receive orchestration/activity work items and to report completion (e.g., via
`TaskHubSidecarService`).

[//]: # (graph LR)

[//]: # (Client[Workflow Client] -- Management API --> Sidecar[Dapr Sidecar Engine])

[//]: # (Sidecar -- Task Hub Protocol --> Worker[Workflow Worker SDK])

[//]: # (Sidecar -- State Store --> Actors[Dapr Actors Backend])


## Key Components
- **Workflow Engine (Dapr Sidecar)**

Manages workflow state transitions, history persistence, scheduling of orchestration and activity tasks, and
reliable delivery semantics. By default, it leverages Dapr Actors as the backend for durable, partitioned execution.

- **Workflow Worker (Application SDK)**

Connects to the sidecar, polls for orchestration and activity work items, executes user-defined logic, and returns
results, failures, and heartbeats to the engine. Orchestration logic must be deterministic; activity logic need not
be.

- **Orchestration**

The deterministic coordinator that defines the workflow. The engine drives orchestrations via history replay to
rebuild state and schedule outbound tasks (activities, sub-orchestrations, timers, external events).

- **Activity**

The atomic unit of work. Activities are executed **at-least-once** and report results or failures back to the engine.
  Idempotency is recommended, and task execution identifiers are available on the activity context to assist with this.

- **State Store & Backend**

Workflow history and state are durably persisted. The engine typically implements a task hub pattern over the chosen
persistence and uses Dapr Actors as the default reliability substrate.

## Execution Model
Dapr Workflow is based on the Durable Task Framework (DTFx) execution semantics:

### Replay-based Execution
Orchestrators are replayed from their event history to rebuild deterministic state. All nondeterministic operations
(time, random values, I/O) must be mediated by the engine (e.g., timers, activity calls, external events).

### Deterministic Orchestrators
Orchestrator code must be side-effect free except via engine-mediated effects. Control flow must be reproducible
during replay.
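
The constraint above can be illustrated with a toy sketch (hypothetical names, not a Dapr SDK API): nondeterministic values are recorded by the engine on first execution and read back from history on replay, so every replay reproduces the same control flow.

```python
# Illustrative only: why orchestrator nondeterminism must be engine-mediated.
import random

history = []  # stands in for engine-persisted history events

def engine_mediated_random(replaying: bool, step: int) -> int:
    """First execution records the value; replay returns the recorded one."""
    if replaying:
        return history[step]          # deterministic: reuse the persisted result
    value = random.randint(0, 100)    # the side effect happens exactly once
    history.append(value)
    return value

first = engine_mediated_random(replaying=False, step=0)
replayed = engine_mediated_random(replaying=True, step=0)
assert first == replayed  # identical across replays, as determinism requires
```

Calling `random.randint` directly in orchestrator code would return a different value on each replay and break this guarantee.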

### At-least-once Activities, Exactly-once State Commit
Activities may be delivered more than once. The engine ensures workflow state commits are idempotent and applied
exactly once.

### Sidecar-as-Scheduler
The sidecar owns scheduling and persists all history/events before dispatching work to workers. Workers are stateless
executors from the engine’s perspective.

## Protocol Surfaces
1) Management API (Standard Dapr gRPC)
- **Start Workflow**: Create and persist an initial history event; return instance metadata.
- **Terminate / Pause / Resume**: Drive lifecycle transitions through persisted control events.
- **Query**: Retrieve instance status, history, output, failure details, and custom metadata.
- **Re-run**: Start a new workflow instance from a history event.
- **Purge**: Proactively clear workflow history and state.

> **Note**: See the [Management API specification]({{% ref workflow-protocol-management-api %}}) for exact RPC shapes,
> error codes, and semantics.
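
The management operations above can be sketched as a client interface (the class, method names, and in-memory store below are illustrative stand-ins, not a specific Dapr SDK's API):

```python
# Hypothetical sketch of the management surface; real SDKs call the
# sidecar's gRPC endpoints rather than mutating local state.
class WorkflowManagementClient:
    def __init__(self):
        self.instances = {}  # instance_id -> metadata (in-memory stand-in)

    def start(self, instance_id: str, name: str, input: dict) -> None:
        self.instances[instance_id] = {"name": name, "input": input, "status": "RUNNING"}

    def terminate(self, instance_id: str) -> None:
        self.instances[instance_id]["status"] = "TERMINATED"

    def query(self, instance_id: str) -> dict:
        return self.instances[instance_id]

client = WorkflowManagementClient()
client.start("order-1", "order_workflow", {"sku": "abc"})
client.terminate("order-1")
assert client.query("order-1")["status"] == "TERMINATED"
```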

2) Execution API (Task Hub Protocol)
- **Poll for Work**: Workers fetch orchestration and activity work items.
- **Complete / Fail Work**: Workers report completion results or failures; the engine appends these to history and advances
orchestration progress.
- **Heartbeats / Leases**: Optional mechanisms for long-running activities and cooperative rebalancing.
- **Timers & External Events**: Delivered to orchestrations as history events to keep replay deterministic.

> **Note**: See the [Execution API specification]({{% ref workflow-protocol-execution-api %}}) for the
> `TaskHubSidecarService` contracts, payload schemas, and sequencing rules.

## Request & Runtime Lifecycle
1. Start Workflow
- Client calls `StartWorkflow` via the Management API.
- Engine persists the initial event (e.g., `ExecutionStarted`) and materializes an instance.
2. Orchestrator Execution (Replay-driven)
- Engine replays orchestration history to rehydrate state.
- Orchestrator schedules effects (activities, sub-orchestrations, timers) by issuing commands, which the engine
persists as new history events.
3. Activity Dispatch & Execution
   - Engine dispatches activity work items to workers.
- Worker runs the activity (may be retried and delivered at least once).
- Worker responds with completion (result) or failure; engine appends to history.
4. Timers & External Signals
- Engine delivers timer fired or external event records as history entries.
- Orchestrator consumes these deterministically on next replay.
5. Progress & Checkpointing
- Each step appends to the history log and advances orchestration state.
- The engine safeguards idempotence and exactly-once commit of orchestration state.
6. Completion
- Orchestration returns an output (success) or a failure (exception details).
- Final state and output are persisted; status queries reflect the terminal state.
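
The steps above can be sketched as an event-sourced log (plain dicts standing in for the protocol's history events): each step appends an event, and terminal status is derived solely from the history.

```python
# Minimal lifecycle sketch; event names mirror the history events described
# in this document, but the structure here is illustrative only.
history = []

def append(event_type: str, **data) -> None:
    history.append({"type": event_type, **data})

append("ExecutionStarted", name="order_workflow", input=42)   # step 1
append("TaskScheduled", task_id=0, name="charge_card")        # step 2
append("TaskCompleted", task_id=0, result="ok")               # step 3
append("OrchestrationCompleted", output="done")               # step 6

terminal = history[-1]["type"] == "OrchestrationCompleted"
print(terminal)  # prints True — status queries reflect the terminal state
```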

## Protocol Principles
- **gRPC Interface**: All worker/engine and client/engine communication flows over gRPC.
- **Replay-based Orchestration**: Determinism enforced through history replay.
- **At-least-once Activity Delivery**: Activities may re-execute; design for idempotency.
- **Engine-mediated Effects**: All nondeterminism/time/IO flows through the engine to remain replay-safe.

## Documentation Map
1. [Management API]({{% ref workflow-protocol-management-api %}})
Detailed Dapr gRPC control-plane operations and payloads.
2. [Execution API (Task Hub Protocol)]({{% ref workflow-protocol-execution-api %}})
`TaskHubSidecarService` worker protocol, work item contracts, result/failure reporting, and sequencing.
3. [Orchestration Lifecycle]({{% ref workflow-protocol-orchestration-lifecycle %}})
Replay semantics, scheduling, external events, timers, and completion.
4. [Activity Lifecycle]({{% ref workflow-protocol-activity-lifecycle %}})
Dispatch, retries, idempotency, heartbeat semantics, and failure handling.
5. [State & History]({{% ref workflow-protocol-state-and-history %}})
History schema, state snapshots, and persistence guarantees.
6. [Versioning]({{% ref workflow-protocol-versioning %}})
How Dapr handles multiple versions of the same workflow definition.
---
type: docs
title: "Workflow Protocol - Activity Lifecycle"
linkTitle: "Activity Lifecycle"
weight: 400
description: "Low-level description of the Workflow building block internals."
---

# Activity Lifecycle

Activities are the basic units of work in a Dapr Workflow. Unlike orchestrations, activities are not replayed and do
not need to be deterministic. Each scheduled activity is delivered **at least once**: retries or redeliveries mean the
activity logic may run more than once.

## Execution Flow

1. **Scheduling**: An orchestration requests an activity by sending a `ScheduleTask` action to the Dapr engine.
2. **Work Item Dispatch**: The Dapr engine enqueues an activity task. When an activity worker (SDK) is available, the
engine sends an `ActivityWorkItem` via the `GetWorkItems` stream.
3. **Execution**: The SDK receives the `ActivityWorkItem`, which contains:
* `name`: The name of the activity to execute.
* `input`: The input data for the activity.
* `instance_id`: The ID of the workflow instance that scheduled the activity.
* `task_id`: A unique identifier for this specific activity execution.
* `task_execution_id`: A unique identifier for the specific *attempt* of this activity. This is useful for
implementing idempotency in activity logic.
* `completion_token`: An opaque token used to correlate the response with this specific work item.
4. **Reporting**: After the activity logic finishes, the SDK sends a `CompleteActivityTask` request back to Dapr.
* **Success**: The SDK provides the serialized output in the `result` field.
* **Failure**: The SDK provides `failure_details` (error message, type, stack trace).
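
Steps 3 and 4 above can be sketched worker-side (the registry, dict payloads, and transport are hypothetical stand-ins for the protobuf messages and the real SDK plumbing): execute the registered activity, then build a response that echoes the `completion_token` verbatim.

```python
import json

# User-registered activity implementations (illustrative).
activities = {"say_hello": lambda data: f"Hello, {data}!"}

def handle_work_item(item: dict) -> dict:
    """Execute an ActivityWorkItem and build the CompleteActivityTask payload."""
    base = {"instance_id": item["instance_id"], "task_id": item["task_id"],
            "completion_token": item["completion_token"]}  # echoed verbatim
    try:
        result = activities[item["name"]](json.loads(item["input"]))
        return {**base, "result": json.dumps(result)}       # success path
    except Exception as e:                                  # failure path
        return {**base, "failure_details": {"error_type": type(e).__name__,
                                            "error_message": str(e)}}

resp = handle_work_item({"name": "say_hello", "input": '"world"',
                         "instance_id": "abc", "task_id": 0,
                         "completion_token": "tok-1"})
assert resp["completion_token"] == "tok-1"
```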

## Task Execution IDs

The `task_execution_id` (also known as the Task Execution Key) is a unique, runtime-generated string (typically a UUID)
that identifies a specific **attempt** to execute an activity task.

### Why it matters to the SDK

While the Workflow SDK is generally stateless between work items, the `task_execution_id` provides critical context
for the **Activity Worker**:

1. **Distributed Idempotency**: If an activity performs a side effect (e.g., charging a credit card), it
should use the `task_execution_id` as an idempotency key.
2. **Distinguishing Retries**: Unlike the `task_id` (which remains constant for a specific step in the workflow),
the `task_execution_id` **changes every time the engine retries the activity** (e.g., due to a timeout or worker crash).
3. **Zombie Detection**: If an activity worker takes too long and the engine times it out and retries on another
worker, the original worker might eventually finish. By checking the `task_execution_id` against a persistent store
or external API, the worker can determine if it is a "zombie" whose results are no longer wanted.
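
Point 1 above can be sketched as follows (the in-memory store stands in for any external system that accepts idempotency keys; names are illustrative): a redelivery of the same attempt returns the prior result instead of repeating the side effect.

```python
# Idempotency-key sketch: dedupe redeliveries of the same execution attempt.
# Note that an engine *retry* arrives with a new task_execution_id and is a
# distinct attempt by design (see point 2 above).
processed: dict[str, str] = {}  # idempotency key -> prior result (stand-in store)

def charge_card(amount: int, task_execution_id: str) -> str:
    if task_execution_id in processed:        # redelivery of the same attempt:
        return processed[task_execution_id]   # return prior result, no new charge
    receipt = f"charged-{amount}"             # side effect happens once per attempt
    processed[task_execution_id] = receipt
    return receipt

first = charge_card(100, "exec-uuid-1")
redelivered = charge_card(100, "exec-uuid-1")
assert first == redelivered and len(processed) == 1
```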

### Implementation Guidelines for SDKs

* **Expose to User**: The SDK MUST expose the `task_execution_id` to the activity implementation logic
(e.g., via an `ActivityContext`).
* **Do Not Cache**: The SDK should not attempt to cache or reuse this ID across different work items.
* **Opaque Usage**: The SDK should treat the value as an opaque string. It is generated by the Dapr sidecar when
the activity is dispatched and is not something the SDK needs to create or parse.

## Completion Tokens

The `completion_token` is an opaque string generated by the Dapr runtime and delivered to the SDK as part of the
`ActivityWorkItem`.

### Purpose and Intent

1. **Response Correlation**: The sidecar uses the `completion_token` to reliably match an `ActivityResponse`
(from `CompleteActivityTask`) to the original task it dispatched.
2. **Stateless Tracking**: It allows the sidecar to remain stateless or minimize state lookups when receiving a
completion, as the token contains (or points to) the necessary context (instance ID, task ID, etc.).
3. **Zombie Prevention**: If an activity times out and is retried, the new attempt will have a different
`completion_token`. If the original "zombie" worker eventually responds with the old token, the sidecar can easily
identify and ignore the late response.

### SDK Implementation Guidelines

* **Capture**: The SDK MUST capture the `completion_token` from the incoming `ActivityWorkItem`.
* **Propagation**: The SDK MUST include the exact same `completion_token` in the `ActivityResponse` sent via
`CompleteActivityTask`.
* **Opaqueness**: The SDK MUST treat the token as a black box. It should not attempt to parse, modify, or construct
its own tokens.
* **Storage**: While the activity is executing, the SDK must keep this token in memory (e.g., in the `ActivityContext`).


## Task Activity IDs

In the Dapr runtime (specifically when using the Actors backend), activities are represented as actors. Each activity
execution has a unique **Task Activity ID** (also known as the Activity Actor ID).

The ID follows a specific pattern:
`{workflowInstanceID}::{taskID}::{generation}`

* **workflowInstanceID**: The unique ID of the workflow instance that scheduled the activity.
* **taskID**: The sequence number of the task within the workflow execution (e.g., 0, 1, 2...).
* **generation**: A counter that increments if the workflow is restarted or "continued as new".

This unique ID ensures that activity executions are isolated and can be tracked reliably across retries and restarts.
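
The ID pattern above can be expressed as simple helpers (format illustration only — the runtime generates these IDs internally, and SDKs never need to construct or parse them):

```python
def activity_actor_id(instance_id: str, task_id: int, generation: int) -> str:
    """Compose a Task Activity ID following the documented pattern."""
    return f"{instance_id}::{task_id}::{generation}"

def parse_activity_actor_id(actor_id: str) -> tuple[str, int, int]:
    """Split a Task Activity ID back into its three components."""
    instance_id, task_id, generation = actor_id.split("::")
    return instance_id, int(task_id), int(generation)

aid = activity_actor_id("order-123", 2, 1)
assert aid == "order-123::2::1"
assert parse_activity_actor_id(aid) == ("order-123", 2, 1)
```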

## Retries

Dapr handles activity retries based on the policy defined in the orchestration (if the SDK supports defining retry
policies in the `ScheduleTask` action). If an activity fails and a retry policy is in place, the engine will re-enqueue
the activity task after the specified delay.

From the activity worker's perspective, a retry is simply a new `ActivityWorkItem` with the same name and input. The
`task_id` (the step's position in the workflow) stays the same, while the `task_execution_id` and `completion_token`
are new for each attempt.

## Idempotency

Because activities might be executed more than once (e.g., if the worker crashes after execution but before reporting
completion), it is recommended that activity logic be idempotent where possible.

## Comparison with Workflows

| Feature | Orchestration | Activity |
| :--- | :--- | :--- |
| **Execution Style** | Replay-based (Deterministic) | Direct execution |
| **State** | Managed via History Events | No internal workflow state |
| **Side Effects** | Forbidden (must use activities) | Allowed (IO, Database, etc.) |
| **Lifetime** | Can be long-running (days/months) | Usually short-lived |
| **Connectivity** | Connected via `GetWorkItems` | Connected via `GetWorkItems` |
---
type: docs
title: "Workflow Protocol - Execution API"
linkTitle: "Execution API"
weight: 200
description: "Low-level description of the Workflow building block internals."
---

# Workflow Execution API (Task Hub Protocol)

The Workflow Execution API is a low-level gRPC protocol used by Dapr Workflow SDKs to act as "Workers". The SDK
connects to the Dapr sidecar via this protocol to poll for work and report completion.

The service is named `TaskHubSidecarService`.

### Service Definition (gRPC)

```protobuf
service TaskHubSidecarService {
  rpc GetWorkItems(GetWorkItemsRequest) returns (stream WorkItem);
  rpc CompleteOrchestratorTask(OrchestratorResponse) returns (CompleteBatchResponse);
  rpc CompleteActivityTask(ActivityResponse) returns (CompleteBatchResponse);
  // ... other management methods
}
```

## Worker Lifecycle

1. **Connection**: The SDK opens a long-running bidirectional stream to `GetWorkItems`.
2. **Polling**: The SDK receives `WorkItem` messages from the stream.
3. **Execution**:
* If the work item is an **Orchestration**, the SDK retrieves and replays the history events to determine the next actions.
* If the work item is an **Activity**, the SDK executes the activity logic.
4. **Completion**:
* For Orchestrations, the SDK calls `CompleteOrchestratorTask` with a list of actions to take.
* For Activities, the SDK calls `CompleteActivityTask` with the result or failure information.
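
The lifecycle above reduces to a dispatch loop over the work-item stream. In this sketch a plain iterable stands in for the `GetWorkItems` gRPC stream, and the completion callbacks stand in for the two `Complete*` RPCs (all names here are illustrative):

```python
def run_worker(stream, complete_orchestrator, complete_activity) -> None:
    """Dispatch each WorkItem to the matching handler, per its oneof field."""
    for item in stream:
        if "orchestrator_item" in item:
            # Replay history, collect the next actions (elided), then report them.
            complete_orchestrator(item["orchestrator_item"])
        elif "activity_item" in item:
            # Execute the activity logic, then report the result or failure.
            complete_activity(item["activity_item"])

completed = []
run_worker([{"activity_item": {"task_id": 0}},
            {"orchestrator_item": {"instance_id": "i1"}}],
           complete_orchestrator=lambda w: completed.append(("orch", w)),
           complete_activity=lambda w: completed.append(("act", w)))
assert [kind for kind, _ in completed] == ["act", "orch"]
```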

## gRPC Service: `TaskHubSidecarService`

### GetWorkItems

Opens a stream to receive work items for orchestrations and activities.

**Request (`GetWorkItemsRequest`):**
Typically empty or contains worker metadata.

**Response (stream `WorkItem`):**
A `WorkItem` can be one of:
* `orchestrator_item`: Contains history and new events for an orchestration.
* `activity_item`: Contains details for a single activity task.

### CompleteOrchestratorTask

Reports the results of an orchestration execution.

**Request (`OrchestratorResponse`):**
* `instance_id`: The ID of the workflow instance.
* `actions`: A list of `OrchestratorAction` messages.
* `custom_status`: Optional string for user-defined status.

**OrchestratorAction Types:**
* `ScheduleTask`: Schedule a new activity.
* `CreateTimer`: Schedule a durable timer.
* `CreateSubOrchestration`: Start a child workflow.
* `CompleteOrchestration`: Mark the workflow as completed (success or failure).
* `TerminateOrchestration`: Forcefully terminate the instance.
* `SendEvent`: Send an event to another workflow.
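
An `OrchestratorResponse` carrying the fields and action types listed above might look like this (plain dicts standing in for the protobuf messages; field values are illustrative):

```python
# Illustrative response shape only — not the exact protobuf encoding.
response = {
    "instance_id": "order-123",
    "custom_status": "awaiting-payment",
    "actions": [
        {"ScheduleTask": {"name": "charge_card", "input": '{"amount": 100}'}},
        {"CreateTimer": {"fire_at": "2026-03-01T00:00:00Z"}},
    ],
}
assert len(response["actions"]) == 2
```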

### CompleteActivityTask

Reports the result of an activity execution.

**Request (`ActivityResponse`):**
* `instance_id`: The ID of the workflow instance.
* `task_id`: The unique ID of the activity task.
* `completion_token`: The opaque token received in the `ActivityWorkItem`.
* `result`: The serialized output of the activity (if successful).
* `failure_details`: Details about the error (if failed).

## Data Models

### HistoryEvent

Workflows in Dapr are event-sourced. The state of an orchestration is rebuilt by replaying a sequence of
`HistoryEvent` messages.

Common event types:
* `ExecutionStarted`: Initial event containing workflow name and input.
* `TaskScheduled`: An activity was scheduled.
* `TaskCompleted`: An activity finished successfully.
* `TaskFailed`: An activity failed.
* `TimerCreated`: A timer was scheduled.
* `TimerFired`: A timer expired.
* `OrchestrationCompleted`: The workflow finished.

### FailureDetails

Used to report errors from activities or orchestrations.
* `error_type`: A string identifying the type of error.
* `error_message`: A human-readable error message.
* `stack_trace`: Optional stack trace.
* `is_non_retriable`: Boolean flag.
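
An SDK typically populates these fields from a caught exception, along these lines (dict stand-in for the protobuf message; the helper name is illustrative):

```python
import traceback

def to_failure_details(exc: Exception, non_retriable: bool = False) -> dict:
    """Map a caught exception onto the FailureDetails fields."""
    return {
        "error_type": type(exc).__name__,
        "error_message": str(exc),
        "stack_trace": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)),
        "is_non_retriable": non_retriable,
    }

try:
    raise ValueError("bad input")
except ValueError as e:
    details = to_failure_details(e, non_retriable=True)

assert details["error_type"] == "ValueError"
assert details["is_non_retriable"] is True
```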

## Protocol Nuances

* **Streaming**: `GetWorkItems` is a server-to-client stream. Dapr pushes work to the SDK as it becomes available.
* **Sticky Sessions**: Dapr attempts to send work items for the same instance to the same worker if possible, but
the SDK must not rely on this for correctness.
* **Determinism**: The SDK **must** ensure that the orchestration logic is deterministic. During replay, the SDK
uses the history provided in the `OrchestratorWorkItem` to avoid re-executing actions that have already been recorded.