feat: Enable concurrent microbatch execution by wmjones · Pull Request #1326 · databricks/dbt-databricks

wmjones · 2026-02-18T16:33:48Z

Resolves #914

Description

Declares MicrobatchConcurrency adapter capability so dbt-core 1.9+ can execute microbatch incremental batches in parallel threads instead of sequentially.

Per reviewer feedback from @sd-db, the capability is gated behind the use_concurrent_microbatch behavior flag (default: false). Users opt in via:

# dbt_project.yml
flags:
  use_concurrent_microbatch: true

When the flag is disabled (default), adapter.supports(MicrobatchConcurrency) returns False and dbt-core falls back to sequential batch execution — identical to current behavior.

Implementation

USE_CONCURRENT_MICROBATCH behavior flag — module-level BehaviorFlag with default=False, registered in _behavior_flags
supports() instance method override — intercepts Capability.MicrobatchConcurrency and gates on the flag; delegates all other capabilities to super().supports()
MicrobatchConcurrency removed from _capabilities dict — the supports() override is the sole gatekeeper (if the capability stayed in _capabilities, super().supports() would return True regardless of the flag)
Unit tests — disabled-by-default, enabled-with-flag, guard test (capability not declared as Full in _capabilities), delegation regression test
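The gating described above can be sketched in isolation. This is a minimal illustration, not the adapter's actual code: `Capability`, `BaseAdapter`, and `DatabricksLikeAdapter` are hypothetical stand-ins for the dbt classes, and the `behavior` object is faked with a `SimpleNamespace` rather than built from a registered `BehaviorFlag`.

```python
from enum import Enum
from types import SimpleNamespace


class Capability(Enum):
    # Stand-in for dbt's Capability enum; only two members are modelled.
    MicrobatchConcurrency = "MicrobatchConcurrency"
    SchemaMetadataByRelations = "SchemaMetadataByRelations"


class BaseAdapter:
    """Hypothetical stand-in for the base adapter's capability lookup."""

    # Capabilities declared unconditionally. MicrobatchConcurrency is
    # deliberately absent so that the flag is the only gatekeeper.
    _capabilities = {Capability.SchemaMetadataByRelations: True}

    @classmethod
    def supports(cls, capability):
        return cls._capabilities.get(capability, False)


class DatabricksLikeAdapter(BaseAdapter):
    def __init__(self, use_concurrent_microbatch=False):
        # Assumption: the real adapter resolves this through its behavior
        # flag system; here it is a plain attribute.
        self.behavior = SimpleNamespace(
            use_concurrent_microbatch=use_concurrent_microbatch
        )

    def supports(self, capability):
        # Gate MicrobatchConcurrency on the flag; delegate everything else.
        if capability == Capability.MicrobatchConcurrency:
            return bool(self.behavior.use_concurrent_microbatch)
        return super().supports(capability)


default_adapter = DatabricksLikeAdapter()
opted_in_adapter = DatabricksLikeAdapter(use_concurrent_microbatch=True)
print(default_adapter.supports(Capability.MicrobatchConcurrency))
print(opted_in_adapter.supports(Capability.MicrobatchConcurrency))
```

Keeping the capability out of `_capabilities` matters in this sketch for the same reason given above: if it were declared there, the delegated lookup would report support regardless of the flag.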

Integration test findings (Databricks cluster, `batch_size='day'`, 31 batches, `--threads 4`)

| Run | Flag | Batches OK | Notes |
| --- | --- | --- | --- |
| Sequential | `false` | 31/31 | Correct fallback, warning emitted |
| Concurrent | `true` | 13-30/31 | `DELTA_CONCURRENT_APPEND` on non-partitioned tables |

Key finding: REPLACE WHERE predicates are non-overlapping (each batch is exactly one batch_size wide, regardless of lookback), but non-partitioned Delta tables still conflict because Delta's WriteSerializable isolation cannot verify non-overlap at the file level — it conservatively rejects concurrent conditional overwrites to the same table root.

A secondary error class (DELTA_METADATA_CHANGED) occurs when dbt-databricks applies SET TBLPROPERTIES (e.g., autoCompact) per batch, conflicting with concurrent writes.

Safe configurations for concurrent microbatch:

Partition the target table by event_time at the same granularity as batch_size (allows Delta to verify non-overlap at the partition level)
Avoid per-batch tblproperties changes
Set DATABRICKS_SKIP_OPTIMIZE=true to prevent post-write OPTIMIZE conflicts (this does not prevent the core REPLACE WHERE conflict)

lookback is NOT a factor — it only controls which batches are generated (shifts the start of the batch list backwards), not the width of any individual batch's REPLACE WHERE predicate.
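The point about lookback can be illustrated with a small sketch. It is not dbt-core's batch builder: `daily_batches` is a hypothetical helper, and it assumes day-sized batches expressed as half-open `[start, end)` windows ending with the day that contains the checkpoint.

```python
from datetime import datetime, timedelta


def daily_batches(checkpoint, lookback):
    """Return (start, end) windows for day-sized batches.

    Illustrative only. `lookback` adds earlier batches to the list;
    it never widens an individual window.
    """
    day = timedelta(days=1)
    # End of the day that contains the checkpoint.
    last_end = checkpoint.replace(hour=0, minute=0, second=0, microsecond=0) + day
    first_start = last_end - day * (lookback + 1)
    return [
        (first_start + day * i, first_start + day * (i + 1))
        for i in range(lookback + 1)
    ]


checkpoint = datetime(2026, 1, 31, 12, 0)
for lookback in (0, 3):
    batches = daily_batches(checkpoint, lookback)
    widths = {end - start for start, end in batches}
    print(lookback, len(batches), batches[0][0].date(), widths)
```

With `lookback=3` the list starts three days earlier and contains four windows, but every window is still exactly one day wide and each one ends where the next begins.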

The default=false is the correct safe choice. The default can be flipped to true around 1.12 once partitioning guidance and/or retry logic is established.

Prior art: dbt-snowflake added the same capability in dbt-snowflake#1259.

Checklist

I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

sd-db · 2026-02-19T17:10:22Z

Hi @wmjones, we are doing the release for 1.11.5 today, and it would not be possible to add a new feature in the same release (it includes a dbt-core pinning upgrade). Can you update the CHANGELOG.md to target 1.11.5 instead? Thanks again for the PR!

P.S. I am starting integration test runs and will report if I see any issues.

…y capability Signed-off-by: Wyatt Jones <wyatt.jones6@cfacorp.com>

sd-db · 2026-02-24T07:34:59Z

@wmjones It seems concurrency is enabled by default here. This means that anyone using microbatch incremental models will get concurrent execution without opting in. dbt-databricks has a well-established behavior flag system that you can use to gate this feature behind a flag (False by default). I would suggest doing that so it can ship safely in a patch, and then flipping the default to true around 1.12, as we should have good data points on prod behavior by then. See b819a878 for more details on how you might do this.

…avior flag Per reviewer feedback, gate the capability behind a behavior flag (default: false) so it ships safely in a patch. Users opt in via `flags: {use_concurrent_microbatch: true}` in dbt_project.yml. The default will be flipped to true around 1.12 once production data points confirm safe concurrent execution.

wmjones · 2026-02-24T15:54:00Z

@sd-db Thanks for the feedback — great call on the behavior flag approach. I've implemented it and pushed the changes.

What changed

The MicrobatchConcurrency capability is now gated behind use_concurrent_microbatch (default: false), following the same pattern as use_managed_iceberg (reference commit b819a878):

USE_CONCURRENT_MICROBATCH BehaviorFlag definition (default=False)
Removed MicrobatchConcurrency from class-level _capabilities
Registered in _behavior_flags property
supports() instance method override that checks bool(self.behavior.use_concurrent_microbatch)
4 unit tests (disabled-by-default, enabled-with-flag, guard test, delegation test)
CHANGELOG updated to mention the flag

Users opt in via:

flags:
  use_concurrent_microbatch: true

Integration test results

I ran integration tests on a Databricks cluster (DBR 17.3, batch_size='day', 31 batches, --threads 4):

Flag OFF (default): 31/31 batches passed for both models. Sequential fallback works correctly with the expected "adapter does not support running batches concurrently" warning.
Flag ON: Concurrency works (overlapping batch START timestamps confirmed), but DELTA_CONCURRENT_APPEND errors occurred on non-partitioned Delta tables — causing partial failures and data loss on one model (13/31 batches OK).

Root cause: Delta's WriteSerializable isolation cannot verify non-overlap at the file level for non-partitioned tables. Even though each batch's REPLACE WHERE predicate is exactly one day wide with no overlap, concurrent conditional overwrites to the same table root are conservatively rejected.

A secondary error (DELTA_METADATA_CHANGED) occurs when SET TBLPROPERTIES (autoCompact, etc.) is applied per batch, conflicting with concurrent writes.

lookback is NOT a factor — it only controls which batches are generated, not predicate width. Each batch is always exactly one batch_size wide.

Recommendation

default=false is absolutely the right choice. Users who want to enable this would need:

Tables partitioned by event_time at batch granularity
No per-batch tblproperties changes
DATABRICKS_SKIP_OPTIMIZE=true for tables with clustering

Happy to adjust anything based on your review. Also updated the PR description with these findings.

sd-db

Added some minor comments, otherwise looks good!

dbt/adapters/databricks/impl.py

tests/unit/test_adapter_capabilities.py

… tests - Reduce supports() comment to one-liner per sd-db's nit - Remove test_microbatch_concurrency_not_declared_in_capabilities - Remove test_supports_delegates_other_capabilities

wmjones · 2026-02-24T18:09:31Z

@sd-db Thanks for the review! Pushed d63ae152 addressing all three comments:

Trimmed the supports() comment to a one-liner
Removed test_microbatch_concurrency_not_declared_in_capabilities
Removed test_supports_delegates_other_capabilities

The two core tests (disabled-by-default + enabled-with-flag) remain.
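For readers who have not opened the test file, the shape of the two remaining tests might look roughly like the sketch below. It is an assumption about their structure, not a copy of tests/unit/test_adapter_capabilities.py: `StubAdapter` and the `MICROBATCH_CONCURRENCY` constant are hypothetical stand-ins so the example runs without dbt installed.

```python
import unittest
from types import SimpleNamespace

# Stand-in for Capability.MicrobatchConcurrency.
MICROBATCH_CONCURRENCY = "MicrobatchConcurrency"


class StubAdapter:
    """Hypothetical adapter exposing only the flag-gated supports() logic."""

    def __init__(self, flag_enabled=False):
        self.behavior = SimpleNamespace(use_concurrent_microbatch=flag_enabled)

    def supports(self, capability):
        if capability == MICROBATCH_CONCURRENCY:
            return bool(self.behavior.use_concurrent_microbatch)
        return False


class TestMicrobatchConcurrencyFlag(unittest.TestCase):
    def test_disabled_by_default(self):
        # Without opting in, the capability must not be reported.
        self.assertFalse(StubAdapter().supports(MICROBATCH_CONCURRENCY))

    def test_enabled_with_flag(self):
        # Opting in through the flag turns the capability on.
        adapter = StubAdapter(flag_enabled=True)
        self.assertTrue(adapter.supports(MICROBATCH_CONCURRENCY))


if __name__ == "__main__":
    unittest.main()
```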

sd-db

Changes look good, thx!

wmjones requested review from benc-db, sd-db and tejassp-db as code owners February 18, 2026 16:33

wmjones force-pushed the 914-microbatch-concurrency branch from 3d91722 to 060f1b9 Compare February 19, 2026 13:23

feat: Enable concurrent microbatch execution via MicrobatchConcurrenc…

04e0846

…y capability Signed-off-by: Wyatt Jones <wyatt.jones6@cfacorp.com>

wmjones force-pushed the 914-microbatch-concurrency branch from 060f1b9 to 04e0846 Compare February 20, 2026 00:02

style: fix import sorting in test_adapter_capabilities.py

8b6318a

sd-db reviewed Feb 24, 2026


style: address review feedback — trim comment, remove overly specific…

d63ae15

… tests - Reduce supports() comment to one-liner per sd-db's nit - Remove test_microbatch_concurrency_not_declared_in_capabilities - Remove test_supports_delegates_other_capabilities

sd-db approved these changes Feb 25, 2026


sd-db merged commit 33cca1d into databricks:main Feb 26, 2026
10 checks passed

sd-db merged 4 commits into databricks:main from wmjones:914-microbatch-concurrency
