Add optional traversal limit for commit offset lookup#4007
Add optional traversal limit for commit offset lookup#4007NssGourav wants to merge 2 commits intoapache:mainfrom
Conversation
|
A few things worth addressing like the limit is still inside CommitsImpl, but @snazy mentioned the TODO was meant for the call sites (REST/service layer), not the implementation itself worth clarifying with him before going further. There's also a counter bug where commitLog and commitLogReversed count traversals differently since traversed starts at 0 in one and 1 in the other, so the same maxTraversal value behaves inconsistently. On the exception side, IllegalStateException feels off for an expected operational limit a dedicated exception would be easier to catch specifically. For the tests, the .hasNext() in the commitLog test is misleading since the exception fires during the commitLog() call itself, not lazily, and there's also no test for the success case where the offset is found within the limit. Overall the approach is reasonable, just needs a quick sync with the maintainer on the architectural question first. |
|
Thanks agreed on the feedback. I understand the TODOs may be intended for call-site guardrails; I implemented an opt-in safeguard in Fixed traversal counting so commitLog and commitLogReversed behave consistently. Updated tests to assert at the right point and added success path coverage. |
|
@dimas-b can u please review this pr. |
|
The idea looks reasonable to me at first glance... Should the PR be non-draft for a wide review, though 😉 Please also rebase (just to have a recent base commit) and fix CI :) |
| if (tailId == offset) { | ||
| break outer; | ||
| } | ||
| if (maxTraversal != UNLIMITED_TRAVERSAL && ++traversed > maxTraversal) { |
There was a problem hiding this comment.
nit: please make the ++ operation more prominent (for ease of following the logic), e.g. do it on a separate line.
| throw new CommitOffsetTraversalLimitExceededException( | ||
| refName, offset, traversed, maxTraversal); | ||
| } | ||
| visited.add(tailId); |
There was a problem hiding this comment.
isn't traversed equivalent to visited.size()?
| * <p>A value of {@code 0} disables the safeguard (unlimited traversal). | ||
| */ | ||
| @WithDefault("0") | ||
| long commitOffsetLookupMaxTraversal(); |
| var type = head.type().id(); | ||
|
|
||
| // TODO add safeguard to limit the work done when finding the commit with ID 'offset' | ||
| var maxTraversal = persistence.params().commitOffsetLookupMaxTraversal(); |
There was a problem hiding this comment.
The guardrail in this case is different in nature from commitLogReversed. Here we do not accumulate memory, AFAIK. It might be preferable to use different config settings for these cases.
Also, commitLogReversed is not called in "prod" code ATM.
|
|
||
| // Only walk, if the most recent commit ID is != offset | ||
| if (head.id() == off) { | ||
| return singletonList(head).iterator(); |
|
@NssGourav Could you please take a look at the recent review comments from @dimas-b and apply them to the current code? Also it looks like there are a couple of CI checks failing right now that will need to be fixed. Let us know if you need any help! |
6dc6db0 to
3166814
Compare
3166814 to
fdb5ce2
Compare
|
@NssGourav : Please rebase this PR to get recent CI job changes. |
Introduce a PersistenceParams knob to bound offset resolution in CommitsImpl and fail fast when exceeded, with coverage verifying the safeguard for both commitLog variants. Made-with: Cursor
Use a dedicated exception for limit exceedance, make traversal counting consistent across log variants, and expand tests to cover both fail-fast and success paths. Made-with: Cursor
6dc6db0 to
b626841
Compare
|
I'm still a bit concerned about adding a global setting that trickles down to all call sites (currently only maintenance). |
|
Thanks, that makes sense, a global traversal limit in persistence can definitely become confusing if maintenance already has a separate limit, and I agree that would be the wrong layering. I’ll revise the PR to avoid introducing a global setting here and keep the safeguard scoped to the specific caller/use case instead, then I’ll rerun CI and update the PR. |
Summary
polaris.persistence.commit-offset-lookup-max-traversalparameter (default0= unlimited) to optionally bound commit history traversal while resolving offsets.CommitsImpl.commitLog()andCommitsImpl.commitLogReversed()during offset lookup, failing fast with a clear exception when exceeded.Rationale
CommitsImplcurrently walks commit history unbounded when resolving offsets. In pathological cases (very large histories or missing/far-back offsets) this can lead to unbounded work.The safeguard is opt-in (default unlimited) to avoid breaking background/maintenance flows that may legitimately traverse the full log.
Test plan
./gradlew :persistence:nosql:persistence-impl:testFixes #4000