adapter: sort catalog entries into topo order in bootstrap#34859
Merged
teskje merged 1 commit intoMaterializeInc:mainfrom Feb 26, 2026
Merged
adapter: sort catalog entries into topo order in bootstrap#34859teskje merged 1 commit intoMaterializeInc:mainfrom
teskje merged 1 commit intoMaterializeInc:mainfrom
Conversation
5 tasks
1f4649d to
26b92ac
Compare
7181974 to
3937b85
Compare
4170bb2 to
c604ecb
Compare
teskje
added a commit
that referenced
this pull request
Feb 20, 2026
### Motivation This fixes a dormant bug in how the compute controller schedules (i.e. allows hydration to start for) compute collections. The bug is revealed by #34859 where the cloudtests get stuck without this fix because some system indexes are scheduled before their dependencies and thus can never hydrate. ### Description The existing "wait for compute inputs to be available" check is not sufficient to ensure compute collections cannot be scheduled in the wrong order, because it doesn't take replicas into account. It just looks at the global frontier, ensuring that _some_ replica must have scheduled and hydrated the dataflow before. But there is no guarantee that the dataflow is currently scheduled and hydrated. To ensure that no dataflow is scheduled before all of its dependencies have been scheduled, we add another check to do just that. We leave in the "wait for compute inputs to be available" check because it still seems like a useful heuristic to achieve a better scheduling order. Though it isn't strictly necessary to ensure liveness.
c604ecb to
2e23475
Compare
2e23475 to
3b0f857
Compare
5 tasks
teskje
commented
Feb 23, 2026
Comment on lines
-50
to
-52
| # 2. Re-create explain_mv2 and check its plan again - it should be | ||
| # almost identical to the plan for explain_mv1 after picking up | ||
| # explain_item_t2_y as a used index. |
Contributor
Author
There was a problem hiding this comment.
With the new logic, explain_mv2 can have already picked up the index if there was a version upgrade prior to the validate call, which would make the validate flaky. We resolve this by simply not creating an index appropriate for explain_mv2.
3b0f857 to
ecac1f1
Compare
SangJunBak
reviewed
Feb 26, 2026
|
|
||
| let mut result = Vec::new(); | ||
| for entry in non_indexes { | ||
| let gid = entry.latest_global_id(); |
Contributor
There was a problem hiding this comment.
Out of curiosity, when can an entry have multiple global IDs?
Contributor
Author
There was a problem hiding this comment.
ALTERed tables and MVs have multiple GlobalIds, one for each version they went through.
SangJunBak
approved these changes
Feb 26, 2026
Contributor
SangJunBak
left a comment
There was a problem hiding this comment.
Last commit looks good to me!
This commit adds topological sorting of catalog entries to the coordinator bootstrap process. Previously, we were only sorting by catalog IDs, but ID order doesn't always match dependency order nowadays, and it is important that the entries are bootstrapped in dependency order. In the sorting we treat indexes specially, to ensure they get created as early as possible. This makes sure any object depending on an indexed relation is also able to make use of the index. To avoid code duplication, this commit pulls out the existing topo sort implementation from `apply.rs` into a util module and makes it generic.
ecac1f1 to
11d9580
Compare
Contributor
Author
|
TFTRs! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds topological sorting of catalog entries to the coordinator bootstrap process. Previously, we were only sorting by catalog IDs, but ID order doesn't always match dependency order nowadays, and it is important that the entries are bootstrapped in dependency order.
In the sorting we treat indexes specially, to ensure they get created as early as possible. This makes sure any object depending on an indexed relation is also able to make use of the index. On main, this change reduces the number of dataflow operators in mz_catalog_server from 11863 to 9948 and the number of arrangements from 842 to 763.
To avoid code duplication, this PR pulls out the existing topo sort implementation from
apply.rsinto a util module and makes it generic.Motivation
mz_catalog_server has a bunch of dataflows that don't make use of available indexes. Slack thread.
Closes SQL-95
Checklist
$T ⇔ Proto$Tmapping (possibly in a backwards-incompatible way), then it is tagged with aT-protolabel.