-
Notifications
You must be signed in to change notification settings - Fork 96
Description
Summary
parse_generic_imports in internal/cbm/extract_imports.c iterates top-level AST nodes using indexed access, which is O(N²) due to how tree-sitter implements child lookup internally.
Root Cause
uint32_t count = ts_node_child_count(ctx->root);
for (uint32_t i = 0; i < count; i++) {
TSNode node = ts_node_child(ctx->root, i); // O(i) per callts_node_child(node, i) is not O(1) random access — it walks the child linked list from index 0 to i on every call via ts_node_child_with_descendant() → ts_node_child_iterator_next. For a root with N children this is O(N²) total.
ts_node_next_sibling() has the same problem: it also calls ts_node_child_with_descendant() internally.
Impact
On repos with generated TypeScript .d.ts files or large Perl files, the AST root node can have 5,000–50,000 top-level children. At 10,000 children: ~50M iterator steps, causing the indexer to spin at 100% CPU indefinitely until killed.
Observed on a monorepo with 6 submodules containing TypeScript and Perl. Both full mode and fast mode hang. The process never completes.
CPU profile stack (from sample(1)):
cbm_pipeline_pass_semantic
→ cbm_extract_file
→ parse_generic_imports
→ ts_node_child / ts_node_next_sibling
→ ts_node_child_with_descendant
→ ts_node_child_iterator_next ← spinning here
Fix
Use TSTreeCursor for sibling traversal. ts_tree_cursor_goto_next_sibling() maintains cursor state across calls and is O(1) per step:
TSTreeCursor cursor = ts_tree_cursor_new(ctx->root);
if (ts_tree_cursor_goto_first_child(&cursor)) {
do {
TSNode node = ts_tree_cursor_current_node(&cursor);
// ... existing match logic unchanged ...
} while (ts_tree_cursor_goto_next_sibling(&cursor));
}
ts_tree_cursor_delete(&cursor);Total complexity: O(N). At 10,000 children: 10,000 steps vs ~50M.
Verification
After fix: a previously-hanging monorepo with 6 submodules (TypeScript, Perl, Java, HTML, CSS) indexed successfully in 71 seconds — 142,908 nodes, 218,685 edges.