Skip to content

adapter: sort catalog entries into topo order in bootstrap#34859

Merged
teskje merged 1 commit intoMaterializeInc:mainfrom
teskje:bootstrap-topo-sort
Feb 26, 2026
Merged

adapter: sort catalog entries into topo order in bootstrap#34859
teskje merged 1 commit intoMaterializeInc:mainfrom
teskje:bootstrap-topo-sort

Conversation

@teskje
Copy link
Copy Markdown
Contributor

@teskje teskje commented Jan 29, 2026

This PR adds topological sorting of catalog entries to the coordinator bootstrap process. Previously, we were only sorting by catalog IDs, but ID order doesn't always match dependency order nowadays, and it is important that the entries are bootstrapped in dependency order.

In the sorting we treat indexes specially, to ensure they get created as early as possible. This makes sure any object depending on an indexed relation is also able to make use of the index. On main, this change reduces the number of dataflow operators in mz_catalog_server from 11863 to 9948 and the number of arrangements from 842 to 763.

To avoid code duplication, this PR pulls out the existing topo sort implementation from apply.rs into a util module and makes it generic.

Motivation

  • This PR fixes a previously unreported bug.

mz_catalog_server has a bunch of dataflows that don't make use of available indexes. Slack thread.

Closes SQL-95

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@teskje teskje force-pushed the bootstrap-topo-sort branch 4 times, most recently from 1f4649d to 26b92ac Compare February 9, 2026 18:21
@teskje teskje force-pushed the bootstrap-topo-sort branch 6 times, most recently from 7181974 to 3937b85 Compare February 18, 2026 17:49
@teskje teskje force-pushed the bootstrap-topo-sort branch 2 times, most recently from 4170bb2 to c604ecb Compare February 20, 2026 10:06
teskje added a commit that referenced this pull request Feb 20, 2026
### Motivation

This fixes a dormant bug in how the compute controller schedules (i.e.
allows hydration to start for) compute collections. The bug is revealed
by #34859 where the
cloudtests get stuck without this fix because some system indexes are
scheduled before their dependencies and thus can never hydrate.

### Description

The existing "wait for compute inputs to be available" check is not
sufficient to ensure compute collections cannot be scheduled in the
wrong order, because it doesn't take replicas into account. It just
looks at the global frontier, ensuring that _some_ replica must have
scheduled and hydrated the dataflow before. But there is no guarantee
that the dataflow is currently scheduled and hydrated.

To ensure that no dataflow is scheduled before all of its dependencies
have been scheduled, we add another check to do just that.

We leave in the "wait for compute inputs to be available" check because
it still seems like a useful heuristic to achieve a better scheduling
order. Though it isn't strictly necessary to ensure liveness.
@teskje teskje force-pushed the bootstrap-topo-sort branch from c604ecb to 2e23475 Compare February 20, 2026 16:37
@antiguru antiguru self-requested a review February 23, 2026 10:28
@teskje teskje force-pushed the bootstrap-topo-sort branch from 2e23475 to 3b0f857 Compare February 23, 2026 11:24
Comment on lines -50 to -52
# 2. Re-create explain_mv2 and check its plan again - it should be
# almost identical to the plan for explain_mv1 after picking up
# explain_item_t2_y as a used index.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new logic, explain_mv2 can have already picked up the index if there was a version upgrade prior to the validate call, which would make the validate flaky. We resolve this by simply not creating an index appropriate for explain_mv2.

@teskje teskje force-pushed the bootstrap-topo-sort branch from 3b0f857 to ecac1f1 Compare February 24, 2026 10:45
@teskje teskje marked this pull request as ready for review February 24, 2026 13:29
@teskje teskje requested review from a team as code owners February 24, 2026 13:29
@teskje teskje requested a review from aljoscha February 24, 2026 13:29
Copy link
Copy Markdown
Member

@antiguru antiguru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good! Ty

@SangJunBak SangJunBak self-requested a review February 25, 2026 23:29
Comment thread src/adapter/src/coord.rs

let mut result = Vec::new();
for entry in non_indexes {
let gid = entry.latest_global_id();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, when can an entry have multiple global IDs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ALTERed tables and MVs have multiple GlobalIds, one for each version they went through.

Copy link
Copy Markdown
Contributor

@SangJunBak SangJunBak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last commit looks good to me!

This commit adds topological sorting of catalog entries to the
coordinator bootstrap process. Previously, we were only sorting by
catalog IDs, but ID order doesn't always match dependency order
nowadays, and it is important that the entries are bootstrapped in
dependency order.

In the sorting we treat indexes specially, to ensure they get created as
early as possible. This makes sure any object depending on an indexed
relation is also able to make use of the index.

To avoid code duplication, this commit pulls out the existing topo sort
implementation from `apply.rs` into a util module and makes it generic.
@teskje teskje force-pushed the bootstrap-topo-sort branch from ecac1f1 to 11d9580 Compare February 26, 2026 15:30
@teskje
Copy link
Copy Markdown
Contributor Author

teskje commented Feb 26, 2026

TFTRs!

@teskje teskje merged commit e66b095 into MaterializeInc:main Feb 26, 2026
135 checks passed
@teskje teskje deleted the bootstrap-topo-sort branch February 26, 2026 17:08
@antiguru antiguru mentioned this pull request Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants