
@0xMink (Contributor) commented Feb 11, 2026

Closes #10715

Summary

  • When the code parser splits a node exceeding MAX_BLOCK_CHARS, small children (export keyword, class/function name) fall below MIN_BLOCK_CHARS and are silently discarded, causing identifiers to disappear from the search index
  • Add a split-local sibling accumulator (_expandChildrenWithAccumulator) that prepends small children to the next large sibling as a synthetic QueueItem, and appends trailing small children to the last large sibling
  • Large children with no pending small siblings are pushed as raw Nodes to preserve recursive splitting of deeply nested structures
  • Uses space-separated joins to preserve readable whitespace between accumulated tokens
  • Trailing small siblings after a raw Node are dropped (rather than converting the Node to a QueueItem that would block recursion) since they are typically closing delimiters with no semantic value
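A minimal TypeScript sketch of the accumulator described in these bullets. The SyntaxChild and QueueItem shapes are simplified, hypothetical stand-ins for the parser's real types, the MIN_BLOCK_CHARS value is illustrative, and the function name only mirrors _expandChildrenWithAccumulator without claiming to match its actual signature or body.

```typescript
// Hypothetical, simplified stand-ins for the parser's types; the real
// definitions (tree-sitter nodes, QueueItem) carry more fields.
interface SyntaxChild {
  kind: "node"
  text: string
  startRow: number
  endRow: number
}

interface QueueItem {
  kind: "queue_item"
  text: string
  startRow: number
  endRow: number
}

type Queued = SyntaxChild | QueueItem

// Illustrative threshold only; not the project's actual value.
const MIN_BLOCK_CHARS = 50

function expandChildrenWithAccumulator(children: SyntaxChild[]): Queued[] {
  const queue: Queued[] = []
  let pendingParts: string[] = []
  let pendingStartRow = 0
  let pendingEndRow = 0
  // Last synthetic item pushed; null when the last large sibling was a raw node.
  let lastQueued: QueueItem | null = null

  for (const child of children) {
    if (child.text.length < MIN_BLOCK_CHARS) {
      // Small sibling: remember it instead of discarding it.
      if (pendingParts.length === 0) pendingStartRow = child.startRow
      pendingParts.push(child.text)
      pendingEndRow = child.endRow
      continue
    }

    if (pendingParts.length > 0) {
      // Large sibling with pending text: merge into one synthetic item.
      const combined: QueueItem = {
        kind: "queue_item",
        text: pendingParts.join(" ") + " " + child.text,
        startRow: pendingStartRow,
        endRow: child.endRow,
      }
      queue.push(combined)
      lastQueued = combined
      pendingParts = []
    } else {
      // No pending text: push the raw node so the main loop can still recurse into it.
      queue.push(child)
      lastQueued = null
    }
  }

  // Trailing small siblings: append them to the last large sibling only if it is
  // already synthetic; after a raw node (or with no large sibling) they are dropped.
  if (pendingParts.length > 0 && lastQueued !== null) {
    lastQueued.text += " " + pendingParts.join(" ")
    lastQueued.endRow = pendingEndRow
  }
  return queue
}
```

With the children export, class, TestParser followed by one large body, this sketch yields a single QueueItem whose text begins with "export class TestParser", while a large child with no pending siblings stays a raw node.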

Test plan

  • Existing: 36 tests pass (vitest run services/code-index/processors/__tests__/parser.spec.ts)
  • New: should preserve class/function identifier when splitting a large node
  • New: should not contaminate across unrelated parent nodes
  • New: should allow recursive splitting of large children with no pending parts
  • New: should preserve recursive splitting even with trailing small siblings
  • New: should append trailing small siblings to the last large sibling
  • Total: 41

@dosubot (bot) added the size:L (this PR changes 100-499 lines, ignoring generated files) and bug (something isn't working) labels Feb 11, 2026
roomote bot commented Feb 11, 2026


Both previously flagged issues are resolved in this revision. No new issues found.

  • Wrapping large children as QueueItem when no pending parts exist prevents recursive splitting of deeply nested nodes (behavioral regression)
  • pendingParts.join("") concatenates sibling texts without whitespace separators, producing garbled output like exportclassTestParser

Comment on lines 330 to 340
} else {
    const wrapped: QueueItem = {
        kind: "queue_item",
        text: text,
        startRow: startRow,
        endRow: endRow,
    }
    queue.push(wrapped)
    lastQueued = wrapped
}
Review comment (Contributor):

Bug: when a large child has no pending small siblings, it gets wrapped as a QueueItem here. Back in the main loop (line 212), !isQueueItem(currentItem) is false for QueueItems, so the child's own tree-sitter children are never examined for recursive splitting -- it falls straight through to line-based chunking. This means a class_body node containing multiple methods will no longer have those methods extracted as individual semantic blocks; instead they get chunked by arbitrary line boundaries. Consider pushing the raw Node when there are no pending parts, so the main loop can still recurse into its children.
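To make the reviewer's point concrete, here is a toy TypeScript model of the main-loop dispatch. The item shapes, the MAX_BLOCK_CHARS value, and the trace-returning dispatch function are all hypothetical; the model only mirrors the !isQueueItem(currentItem) check described in the comment, not the parser's actual loop.

```typescript
// Hypothetical, simplified item shapes; the real parser works on tree-sitter nodes.
interface TreeNode {
  kind: "node"
  text: string
  children: TreeNode[]
}

interface QueueItem {
  kind: "queue_item"
  text: string
}

type Item = TreeNode | QueueItem

function isQueueItem(item: Item): item is QueueItem {
  return item.kind === "queue_item"
}

// Illustrative limit only.
const MAX_BLOCK_CHARS = 100

// Returns a trace of how each dequeued item is handled.
function dispatch(initial: Item[]): string[] {
  const queue: Item[] = [...initial]
  const trace: string[] = []
  while (queue.length > 0) {
    const current = queue.shift()!
    if (current.text.length <= MAX_BLOCK_CHARS) {
      trace.push(`block:${current.text.length}`)
    } else if (!isQueueItem(current) && current.children.length > 0) {
      // Only raw nodes expose children, so only they can be split recursively.
      trace.push("recurse")
      queue.unshift(...current.children)
    } else {
      // Oversized QueueItems (and childless nodes) fall through to line-based chunking.
      trace.push("line-chunk")
    }
  }
  return trace
}
```

In this model, a 130-character class_body node with two 60-character method children traces as recurse, block:60, block:60, while the same text wrapped as a QueueItem traces as line-chunk.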


if (pendingParts.length > 0) {
    const combined: QueueItem = {
        kind: "queue_item",
        text: pendingParts.join("") + text,
Review comment (Contributor):

Bug: pendingParts.join("") concatenates sibling texts with no separator. Tree-sitter child nodes don't include inter-node whitespace, so accumulating ["export", "class", "TestParser"] produces "exportclassTestParser" rather than "export class TestParser". The same issue applies to join("") on line 344 (tail handling). The existing test doesn't catch this because it only checks includes("TestParser"), which matches as a substring of the garbled concatenation. Consider using join(" ") or join("\n") to preserve readable separation.

Suggested change:
- text: pendingParts.join("") + text,
+ text: pendingParts.join(" ") + " " + text,
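A tiny illustration of this comment, assuming (as the reviewer states) that tree-sitter sibling texts carry no surrounding whitespace. The variable names mirror the snippet under review; the body text is made up for the example.

```typescript
// Sibling texts as tree-sitter would return them: no surrounding whitespace.
const pendingParts = ["export", "class", "TestParser"]
const text = "{ parse() {} }"

const garbled = pendingParts.join("") + text
const spaced = pendingParts.join(" ") + " " + text

console.log(garbled) // exportclassTestParser{ parse() {} }
console.log(spaced) // export class TestParser { parse() {} }

// A substring check on the identifier alone cannot tell the two apart...
console.log(garbled.includes("TestParser"), spaced.includes("TestParser")) // true true
// ...whereas checking the full signature can.
console.log(garbled.includes("export class TestParser"), spaced.includes("export class TestParser")) // false true
```

This is why an includes("TestParser") assertion passes on the garbled output, while asserting on "export class TestParser" distinguishes the two.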


@0xMink force-pushed the fix/parser-10715-split-signature-accumulator branch 2 times, most recently from 19f6478 to 4db7f10, February 11, 2026 06:54
@dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) and removed size:L (100-499 lines)
// can still recurse into its children for deeper splitting.
queue.push(child)
lastQueued = null
lastNodeIndex = queue.length - 1
Reply (Contributor, Author):

Fixed. Large children with no pending small siblings are now pushed as raw Node objects so the main loop can recurse into their children. When trailing small siblings follow a raw Node, they are dropped rather than converting the Node to a QueueItem (which would block recursion). Added test: "should preserve recursive splitting even with trailing small siblings".
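The tail-handling rule in this reply can be sketched as a small TypeScript helper. The Entry type and the attachTrailing name are hypothetical; the sketch only encodes the rule stated here: extend a synthetic QueueItem, but leave a raw Node untouched and drop the trailing parts.

```typescript
// Hypothetical, simplified queue entries: raw nodes vs synthetic text items.
type Entry =
  | { kind: "node"; text: string }
  | { kind: "queue_item"; text: string }

// Attach trailing small siblings to the last large sibling, if that is safe.
function attachTrailing(queue: Entry[], trailing: string[]): Entry[] {
  const last = queue[queue.length - 1]
  if (trailing.length === 0 || last === undefined) return queue
  if (last.kind === "queue_item") {
    // Already synthetic: extending its text does not affect recursion.
    last.text += " " + trailing.join(" ")
  }
  // A raw node is left untouched so the main loop can still recurse into it;
  // the trailing parts (typically closing delimiters) are dropped.
  return queue
}
```

The design choice is that losing a closing brace costs little for search, whereas converting a raw Node into a QueueItem would cost every nested method its own semantic block.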

}

// Large child: prepend any accumulated pending text
if (pendingParts.length > 0) {
Reply (Contributor, Author):

Fixed. Changed pendingParts.join("") to join(" ") and added " " separators at every concatenation point. Test 1 now explicitly asserts "export class TestParser" (proper spacing) and rejects "exportclassTestParser" (garbled).

…CodeInc#10715)

When the code parser splits a node that exceeds MAX_BLOCK_CHARS, small
children (export keyword, class/function name, etc.) fall below
MIN_BLOCK_CHARS and are silently discarded.  This causes function and
class names to disappear from the search index.

Add a split-local sibling accumulator that prepends small children to
the next large sibling as a synthetic QueueItem, and appends trailing
small children to the last large sibling.  Only the split path is
affected; main-loop discard behaviour for non-split nodes is unchanged.

Closes RooCodeInc#10715
@0xMink force-pushed the fix/parser-10715-split-signature-accumulator branch from 4db7f10 to dc9cd4c, February 11, 2026 07:04

Labels: bug (something isn't working), size:XL (this PR changes 500-999 lines, ignoring generated files)

Successfully merging this pull request may close these issues:

[BUG] Code indexing chunker drops function names when body is large