Skip to content

ScheduledMerges: migrate union level early (as soon as merging tree is completed)#824

Merged
mheinzel merged 3 commits intomainfrom
mheinzel/prototype-migrate-union-level-early
Apr 16, 2026
Merged

ScheduledMerges: migrate union level early (as soon as merging tree is completed)#824
mheinzel merged 3 commits intomainfrom
mheinzel/prototype-migrate-union-level-early

Conversation

@mheinzel
Copy link
Copy Markdown
Collaborator

@mheinzel mheinzel commented Mar 4, 2026

When creating the union of multiple tables, we create a special union level containing a merging tree. Once the tree has been completed, i.e. merged into a single run, we want to get rid of the union level by migrating the run to the regular levels. The main motivation is that we want it to become part of a last level merge, so compaction can occur. Last level merges are especially useful for compaction, since they can drop all delete entries.

This PR demonstrates one of two approaches for migration (#822 being the other one). As soon as the merging tree gets completed through supplyUnionCredits, we migrate it to a new last regular level. Sometimes it would not fit directly after the existing levels, in which case we can insert a few empty levels.

An interesting case to consider it that a union can get completed by an operation on another table that contains a shared reference to the merging tree. Therefore, at the beginning of any operation, a completed but not yet migrated union might be present. To ensure unions can become part of merges as soon as possible, we also check for possible migrations when flushing the write buffer. It might be beneficial to do the same when doing lookups.

@jorisdral
Copy link
Copy Markdown
Collaborator

Over other channels we agreed we would try the migrate-early approach in this PR, rather than the migrate-late approach from #822

@mheinzel mheinzel force-pushed the mheinzel/prototype-empty-levels branch 2 times, most recently from 2591aff to 1bd2cdd Compare March 24, 2026 17:16
@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch from 09072bb to cb45455 Compare March 24, 2026 19:37
Base automatically changed from mheinzel/prototype-empty-levels to main March 24, 2026 19:40
@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch from cb45455 to 1b77fe6 Compare March 25, 2026 13:13
@mheinzel mheinzel marked this pull request as ready for review March 25, 2026 13:14
@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch from 1b77fe6 to b53734f Compare March 25, 2026 13:23
Copy link
Copy Markdown
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see this take shape 😄

Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch 2 times, most recently from 198da6b to 16e0b00 Compare April 2, 2026 13:14
Copy link
Copy Markdown
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good -- the only thing we should maybe reconsider a little bit more is how we handle empty union results

Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs
Comment on lines +1267 to +1270
-- In some cases we could even add it as one of the tiering runs of
-- the existing last regular level, but the difference is small and
-- we would need to relax the invariant that the last level uses
-- levelling.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I'm not sure I fully understand. When would it be added as a tiering run?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before migration, there usually is a last regular level, which uses tiering. If the union level is small enough, we could theoretically add it as a run into that tiering level (instead of creating a new levelling level for it), but that clashes with our current assertions about merge policy of course.

Maybe I should just remove the comment. I was trying to answer the question a reader might have: "Why do we always have to create a new level just for this union run, even if it's small?" But maybe that causes more confusion than it prevents.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first paragraph you wrote already cleared up my confusion. Just add some of those detail to the haddocks and then we should be fine. Maybe something along the lines of:

Before migrating the run that resulted from the union into the regular levels, there is usually a last "regular" level, which uses tiering. We could add the union-result run to the resident tiering runs of the last regular level, but that clashes with some invariants. In any case, the difference is small and we would need to relax the invariant that the last level uses levelling.

Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/src-prototypes/ScheduledMerges.hs Outdated
Comment thread lsm-tree/test-prototypes/Test/ScheduledMergesDL.hs
@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch from 16e0b00 to 051001b Compare April 7, 2026 10:32
Copy link
Copy Markdown
Collaborator

@jorisdral jorisdral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Were there any changes since last time I reviewed that you think I should look at? If not, let's resolve the final few review comments, and then we can put this PR on the merge queue

@mheinzel
Copy link
Copy Markdown
Collaborator Author

Were there any changes since last time I reviewed that you think I should look at?

No, I only slightly tweaked the comments: https://github.com/IntersectMBO/lsm-tree/compare/16e0b0070478b3b97f6d5a37e1aebd59a44d7f81..051001bb5634e1166f386d1c16fe6c406027ee4e

So I will make some last documentation improvements as discussed, rebase and merge.

@mheinzel mheinzel force-pushed the mheinzel/prototype-migrate-union-level-early branch from 051001b to 2671565 Compare April 16, 2026 15:58
@mheinzel mheinzel enabled auto-merge April 16, 2026 16:12
@mheinzel mheinzel added this pull request to the merge queue Apr 16, 2026
Merged via the queue into main with commit f620800 Apr 16, 2026
32 checks passed
@mheinzel mheinzel deleted the mheinzel/prototype-migrate-union-level-early branch April 16, 2026 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants