feat: Enable concurrent microbatch execution #1326
Hi @wmjones, we are doing the release today. P.S. I am starting integration test runs and will report if I see any issues.
…y capability Signed-off-by: Wyatt Jones <wyatt.jones6@cfacorp.com>
@wmjones It seems concurrency is opt-in by default. What this means is that anyone who is using
…avior flag
Per reviewer feedback, gate the capability behind a behavior flag
(default: false) so it ships safely in a patch. Users opt in via
`flags: {use_concurrent_microbatch: true}` in dbt_project.yml.
The default will be flipped to true around 1.12 once production
data points confirm safe concurrent execution.
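The gating described in this commit can be sketched as follows. This is a minimal, self-contained model, not the adapter's actual code: the `Capability` enum and base `supports()` are simplified stand-ins for dbt-adapters' real classes, and the flag is modelled as a plain constructor argument rather than a registered `BehaviorFlag`.

```python
from enum import Enum


class Capability(Enum):
    """Stand-in for dbt-adapters' Capability enum (illustrative stub)."""
    MicrobatchConcurrency = "MicrobatchConcurrency"
    TableLastModifiedMetadata = "TableLastModifiedMetadata"


class BaseAdapter:
    """Stub base class: supports() normally consults the _capabilities dict."""
    _capabilities = {Capability.TableLastModifiedMetadata}

    def supports(self, capability: Capability) -> bool:
        return capability in self._capabilities


class DatabricksAdapter(BaseAdapter):
    """Sketch of the behavior-flag gating pattern from this PR."""

    def __init__(self, use_concurrent_microbatch: bool = False):
        # In the real adapter this is a BehaviorFlag (default=False)
        # registered in _behavior_flags; here a plain attribute suffices.
        self.use_concurrent_microbatch = use_concurrent_microbatch

    def supports(self, capability: Capability) -> bool:
        # Gate MicrobatchConcurrency on the flag. The capability is
        # deliberately absent from _capabilities, so this override is
        # the sole gatekeeper; everything else delegates to the base.
        if capability is Capability.MicrobatchConcurrency:
            return self.use_concurrent_microbatch
        return super().supports(capability)


# Disabled by default: dbt-core falls back to sequential batches.
assert not DatabricksAdapter().supports(Capability.MicrobatchConcurrency)
# Opted in via the flag: concurrent batches are allowed.
assert DatabricksAdapter(True).supports(Capability.MicrobatchConcurrency)
```

The key design point this models is that removing the capability from `_capabilities` is what makes the override authoritative; if it stayed there, the base lookup would report `True` regardless of the flag.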
@sd-db Thanks for the feedback, great call on the behavior flag approach. I've implemented it and pushed the changes.

What changed

The …

Users opt in via:

```yaml
flags:
  use_concurrent_microbatch: true
```

Integration test results

I ran integration tests on a Databricks cluster (DBR 17.3,
Root cause: Delta's `WriteSerializable` isolation cannot verify that the per-batch `REPLACE WHERE` predicates do not overlap at the file level on non-partitioned tables. A secondary error (`DELTA_METADATA_CHANGED`) occurs when per-batch `SET TBLPROPERTIES` conflicts with concurrent writes.
Recommendation
Happy to adjust anything based on your review. Also updated the PR description with these findings.
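The non-overlap claim in the root-cause analysis above can be sketched concretely. `build_batches` below is a hypothetical helper, not dbt's actual batch planner; it models daily microbatch windows under the stated semantics: `lookback` shifts the start of the batch list backwards, but every window stays exactly one `batch_size` wide.

```python
from datetime import date, timedelta


def build_batches(start: date, end: date, lookback: int = 0):
    """Model daily microbatch windows as half-open [lo, hi) day ranges.

    Hypothetical illustration only: lookback prepends earlier batches
    without widening any individual batch's REPLACE WHERE window.
    """
    cur = start - timedelta(days=lookback)
    batches = []
    while cur < end:
        batches.append((cur, cur + timedelta(days=1)))
        cur += timedelta(days=1)
    return batches


no_lookback = build_batches(date(2024, 1, 10), date(2024, 1, 13))
with_lookback = build_batches(date(2024, 1, 10), date(2024, 1, 13), lookback=2)

# lookback adds earlier batches but does not widen any window
assert len(with_lookback) == len(no_lookback) + 2
assert all(hi - lo == timedelta(days=1) for lo, hi in with_lookback)
# adjacent windows touch but never overlap
assert all(a[1] == b[0] for a, b in zip(with_lookback, with_lookback[1:]))
```

Even though these windows provably never overlap, Delta's `WriteSerializable` isolation cannot see that at the file level on a non-partitioned table, which is why the conflict still occurs.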
… tests

- Reduce supports() comment to one-liner per sd-db's nit
- Remove test_microbatch_concurrency_not_declared_in_capabilities
- Remove test_supports_delegates_other_capabilities
@sd-db Thanks for the review! Pushed
The two core tests (disabled-by-default + enabled-with-flag) remain.
Resolves #914
Description
Declares the `MicrobatchConcurrency` adapter capability so dbt-core 1.9+ can execute microbatch incremental batches in parallel threads instead of sequentially.

Per reviewer feedback from @sd-db, the capability is gated behind the
`use_concurrent_microbatch` behavior flag (default: `false`). Users opt in via `flags: {use_concurrent_microbatch: true}` in `dbt_project.yml`.

When the flag is disabled (default),
`adapter.supports(MicrobatchConcurrency)` returns `False` and dbt-core falls back to sequential batch execution, identical to current behavior.

Implementation
- `USE_CONCURRENT_MICROBATCH` behavior flag: a module-level `BehaviorFlag` with `default=False`, registered in `_behavior_flags`
- `supports()` instance method override: intercepts `Capability.MicrobatchConcurrency` and gates on the flag; delegates all other capabilities to `super().supports()`
- `MicrobatchConcurrency` removed from the `_capabilities` dict: the `supports()` override is the sole gatekeeper (if the capability stayed in `_capabilities`, `super().supports()` would return `True` regardless of the flag)
- Tests: … (`_capabilities`), delegation regression test

Integration test findings (Databricks cluster,
`batch_size='day'`, 31 batches, `--threads 4`):

- `use_concurrent_microbatch: false`: …
- `use_concurrent_microbatch: true`: `DELTA_CONCURRENT_APPEND` on non-partitioned tables

Key finding:
`REPLACE WHERE` predicates are non-overlapping (each batch is exactly one `batch_size` wide, regardless of `lookback`), but non-partitioned Delta tables still conflict because Delta's `WriteSerializable` isolation cannot verify non-overlap at the file level; it conservatively rejects concurrent conditional overwrites to the same table root.

A secondary error class (
`DELTA_METADATA_CHANGED`) occurs when dbt-databricks applies `SET TBLPROPERTIES` (e.g., `autoCompact`) per batch, conflicting with concurrent writes.

Safe configurations for concurrent microbatch:
- Partition the table on `event_time` at the same granularity as `batch_size` (allows Delta to verify non-overlap at the partition level)
- Avoid per-batch `tblproperties` changes
- `DATABRICKS_SKIP_OPTIMIZE=true` prevents post-write OPTIMIZE conflicts (but does not prevent the core `REPLACE WHERE` conflict)
- `lookback` is NOT a factor: it only controls which batches are generated (shifts the start of the batch list backwards), not the width of any individual batch's `REPLACE WHERE` predicate.

The
`default=false` is the correct safe choice. The default can be flipped to `true` around 1.12 once partitioning guidance and/or retry logic is established.

Prior art: dbt-snowflake added the same capability in dbt-snowflake#1259.
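As a rough illustration of the retry logic mentioned above (future work, not part of this PR), a batch whose `REPLACE WHERE` write loses a concurrent-append conflict could be re-run with backoff. All names here are illustrative stubs; `ConcurrentWriteError` stands in for Delta's `DELTA_CONCURRENT_APPEND` failure.

```python
import random
import time


class ConcurrentWriteError(Exception):
    """Stand-in for a DELTA_CONCURRENT_APPEND failure (stub)."""


def run_batch_with_retry(run_batch, max_retries: int = 3, base_delay: float = 0.1):
    """Re-run a conflicted batch with jittered exponential backoff.

    Sketch only: structure and names are illustrative, not dbt's code.
    """
    for attempt in range(max_retries + 1):
        try:
            return run_batch()
        except ConcurrentWriteError:
            if attempt == max_retries:
                raise
            # Back off so the conflicting writer can commit first.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))


# Usage: a batch that fails twice, then succeeds on the third try.
attempts = {"n": 0}

def flaky_batch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConcurrentWriteError("DELTA_CONCURRENT_APPEND")
    return "ok"

assert run_batch_with_retry(flaky_batch, base_delay=0.001) == "ok"
assert attempts["n"] == 3
```

Retries only mask transient conflicts; they would not remove the need for partitioning guidance on hot tables.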
Checklist
I have updated `CHANGELOG.md` and added information about my change to the "dbt-databricks next" section.