-
Notifications
You must be signed in to change notification settings - Fork 2.9k
fix(code-index): preserve identifiers when splitting large nodes #11396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(code-index): preserve identifiers when splitting large nodes #11396
Conversation
Both previously flagged issues are resolved in this revision. No new issues found.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues. |
| } else { | ||
| const wrapped: QueueItem = { | ||
| kind: "queue_item", | ||
| text: text, | ||
| startRow: startRow, | ||
| endRow: endRow, | ||
| } | ||
| queue.push(wrapped) | ||
| lastQueued = wrapped | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: when a large child has no pending small siblings, it gets wrapped as a QueueItem here. Back in the main loop (line 212), !isQueueItem(currentItem) is false for QueueItems, so the child's own tree-sitter children are never examined for recursive splitting -- it falls straight through to line-based chunking. This means a class_body node containing multiple methods will no longer have those methods extracted as individual semantic blocks; instead they get chunked by arbitrary line boundaries. Consider pushing the raw Node when there are no pending parts, so the main loop can still recurse into its children.
Fix it with Roo Code or mention @roomote and request a fix.
| if (pendingParts.length > 0) { | ||
| const combined: QueueItem = { | ||
| kind: "queue_item", | ||
| text: pendingParts.join("") + text, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: pendingParts.join("") concatenates sibling texts with no separator. Tree-sitter child nodes don't include inter-node whitespace, so accumulating ["export", "class", "TestParser"] produces "exportclassTestParser" rather than "export class TestParser". The same issue applies to join("") on line 344 (tail handling). The existing test doesn't catch this because it only checks includes("TestParser"), which matches as a substring of the garbled concatenation. Consider using join(" ") or join("\n") to preserve readable separation.
| text: pendingParts.join("") + text, | |
| text: pendingParts.join(" ") + " " + text, |
Fix it with Roo Code or mention @roomote and request a fix.
19f6478 to
4db7f10
Compare
| // can still recurse into its children for deeper splitting. | ||
| queue.push(child) | ||
| lastQueued = null | ||
| lastNodeIndex = queue.length - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Large children with no pending small siblings are now pushed as raw Node objects so the main loop can recurse into their children. When trailing small siblings follow a raw Node, they are dropped rather than converting the Node to a QueueItem (which would block recursion). Added test: "should preserve recursive splitting even with trailing small siblings".
| } | ||
|
|
||
| // Large child: prepend any accumulated pending text | ||
| if (pendingParts.length > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Changed pendingParts.join("") to join(" ") with " " + separators at all concatenation points. Test 1 now explicitly asserts "export class TestParser" (proper spacing) and rejects "exportclassTestParser" (garbled).
…CodeInc#10715) When the code parser splits a node that exceeds MAX_BLOCK_CHARS, small children (export keyword, class/function name, etc.) fall below MIN_BLOCK_CHARS and are silently discarded. This causes function and class names to disappear from the search index. Add a split-local sibling accumulator that prepends small children to the next large sibling as a synthetic QueueItem, and appends trailing small children to the last large sibling. Only the split path is affected; main-loop discard behaviour for non-split nodes is unchanged. Closes RooCodeInc#10715
4db7f10 to
dc9cd4c
Compare
Closes #10715
Summary
MAX_BLOCK_CHARS, small children (export keyword, class/function name) fall belowMIN_BLOCK_CHARSand are silently discarded, causing identifiers to disappear from the search index_expandChildrenWithAccumulator) that prepends small children to the next large sibling as a syntheticQueueItem, and appends trailing small children to the last large siblingTest plan
vitest run services/code-index/processors/__tests__/parser.spec.ts)