Skip to content

fix(embed): handle absolute file paths from native engine#780

Merged
carlos-alm merged 8 commits intomainfrom
fix/752-embed-absolute-paths
Apr 3, 2026
Merged

fix(embed): handle absolute file paths from native engine#780
carlos-alm merged 8 commits intomainfrom
fix/752-embed-absolute-paths

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

Closes #752.

  • buildEmbeddings in generator.ts unconditionally did path.join(rootDir, file) on file paths from the DB. When the native engine stores absolute paths, this produces doubled paths (H:/repo/H:/repo/src/cli.ts), causing every readFileSync to fail with ENOENT — 0 of 5,959 symbols get embedded
  • Same pattern existed in findReverseDependencies in detect-changes.ts
  • Fix: check path.isAbsolute() before joining with rootDir

Test plan

  • New regression test: inserts a node with an absolute file path, verifies buildEmbeddings succeeds
  • All existing search tests pass (69/69)
  • All detect-changes tests pass (4/4)

The embed command unconditionally joined rootDir with node.file from the
DB, producing doubled paths when the native engine stores absolute paths.
Check path.isAbsolute before joining. Same fix applied to detect-changes
reverse-dependency lookup. Regression test included.
@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR #780: fix(embed): handle absolute file paths from native engine

Todo List

  • Read and understand the problem being solved
  • Examine the code changes in detail
  • Review the regression test
  • Perform 5-phase review analysis
  • Provide final assessment and rating

Root Cause Analysis

Problem Being Solved: The native engine stores absolute file paths in the database, but buildEmbeddings() and findReverseDependencies() unconditionally join these paths with rootDir, creating invalid doubled paths like H:/repo/H:/repo/src/cli.ts. This causes all file reads to fail with ENOENT, resulting in 0 embeddings being generated out of 5,959 symbols.

Why This Approach: The fix correctly addresses the root cause by checking path.isAbsolute(file) before joining with rootDir. This ensures compatibility with both relative paths (WASM engine) and absolute paths (native engine) without breaking existing behavior.

Risk Assessment: Very low risk. The changes are minimal (2 one-line fixes), well-tested, and maintain backward compatibility.

Backlog Compliance

  • Zero-dep: ✅ — No new dependencies added, uses existing Node.js path module
  • Foundation-aligned: ✅ — Improves reliability of core embedding functionality, no principle violations
  • Problem-fit: 4/5 — Directly fixes a critical bug that prevents AI agents from getting embeddings when using the native engine, significantly impacting search and context quality
  • Breaking: No — Purely additive fix, maintains full backward compatibility
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Review

Changes Analyzed:

1. src/domain/search/generator.ts:103

const fullPath = path.isAbsolute(file) ? file : path.join(rootDir, file);

Excellent fix - properly handles both absolute and relative paths

2. src/domain/graph/builder/stages/detect-changes.ts:369

const absPath = path.isAbsolute(row.file) ? row.file : path.join(rootDir, row.file);

Good consistency - same pattern applied to reverse dependency detection

3. Test Coverage (tests/search/embedding-strategy.test.ts:292-323)

test('embeds symbols even when DB stores absolute file paths', async () => {
  // Creates DB with absolute file path, verifies buildEmbeddings succeeds
  const absFile = path.join(absDir, 'math.js').split(path.sep).join('/');
  insertNode(db, 'add', 'function', absFile, 1, 1);
  
  await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' });
  expect(EMBEDDED_TEXTS.length).toBe(1);
})

Perfect regression test - directly reproduces the bug scenario and verifies the fix

Strengths:

  • Minimal, targeted changes that address the exact problem
  • Consistent application of the fix pattern in both affected functions
  • Comprehensive regression test that simulates native engine behavior
  • No breaking changes to existing functionality
  • Maintains performance (no additional overhead for relative paths)

Minor Notes:

  • All existing search tests continue to pass (69/69)
  • All detect-changes tests pass (4/4)
  • Fix follows established patterns in the codebase
  • Cross-platform path handling is correctly implemented

Final Assessment

This is a high-quality bug fix that solves a critical issue preventing embeddings from working with the native engine. The implementation is minimal, correct, and well-tested. The regression test ensures this specific bug won't reoccur.

The PR demonstrates excellent engineering practices:

  • Root cause analysis led to the right fix
  • Consistent application across affected functions
  • Comprehensive test coverage
  • Zero risk to existing functionality

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent bug fix with perfect test coverage, minimal risk, and high impact on embedding reliability. The only reason this isn't 5 stars is that it's a bug fix rather than exceptional new functionality.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR fixes a doubled-path bug where the native engine stores absolute file paths in SQLite, but consuming code unconditionally did path.join(rootDir, file) — producing invalid paths like H:/repo/H:/repo/src/cli.ts and silently dropping all embeddings. The core generator.ts fix (already committed) and the detect-changes.ts fix (this diff) both apply path.isAbsolute() guards before joining with rootDir.

Key changes:

Remaining concern: purgeAndAddReverseDeps (line 408–411) still performs an unconditional path.join(rootDir, relPath) on the values returned from findReverseDependencies. When the native engine is active, those values are absolute paths, so the doubled-path is recreated there — causing every reverse-dep file to be pushed into ctx.parseChanges with a broken file path and silently skipped by the parse stage.

Confidence Score: 4/5

Safe to merge for the embedding fix, but the incremental-build reverse-dep path for the native engine still has the doubled-path bug in purgeAndAddReverseDeps.

The generator.ts and detect-changes.ts existence-check fixes are correct and the test coverage is solid. However, purgeAndAddReverseDeps at line 409 does path.join(rootDir, relPath) unconditionally after findReverseDependencies has added absolute native-engine paths to reverseDeps — reproducing the same class of bug in the downstream parse-changes push. This is a P1 defect on the native-engine incremental rebuild path.

src/domain/graph/builder/stages/detect-changes.ts — purgeAndAddReverseDeps needs the same path.isAbsolute guard applied to the parseChanges push loop.

Important Files Changed

Filename Overview
src/domain/graph/builder/stages/detect-changes.ts Fixes the fs.existsSync doubled-path bug in findReverseDependencies, but purgeAndAddReverseDeps still does unconditional path.join(rootDir, relPath) at line 409 — when the native engine returns absolute paths those are stored in reverseDeps and re-doubled there, breaking downstream re-parse of reverse-dep files.
tests/search/embedding-strategy.test.ts Corrects the issue number (#760#752), renames tests to distinguish strategies, and adds a second regression test for the source strategy — both structured and source paths are now explicitly covered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Native DB returns file paths] --> B{path.isAbsolute?}
    B -- Yes --> C[Use path as-is for fs.existsSync ✅]
    B -- No --> D[path.join rootDir + path for fs.existsSync ✅]
    C --> E[reverseDeps.add absolute path]
    D --> F[reverseDeps.add relative path]
    E --> G[purgeAndAddReverseDeps]
    F --> G
    G --> H["path.join(rootDir, relPath) ← no isAbsolute guard ❌"]
    H -- absolute relPath --> I["Doubled path: /repo//repo/src/file.ts"]
    H -- relative relPath --> J["Correct: /repo/src/file.ts"]
    I --> K[ctx.parseChanges.push broken file path]
    J --> L[ctx.parseChanges.push correct file path]
Loading

Reviews (2): Last reviewed commit: "fix: apply isAbsolute guard to native fa..." | Re-trigger Greptile

Comment on lines +316 to +322
test('embeds symbols even when DB stores absolute file paths', async () => {
EMBEDDED_TEXTS.length = 0;
await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' });

expect(EMBEDDED_TEXTS.length).toBe(1);
expect(EMBEDDED_TEXTS[0]).toContain('add');
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Test only covers source strategy, not structured

The new regression test exercises buildEmbeddings with { strategy: 'source' } only. The structured strategy also calls buildStructuredText(node, file, lines, db), and the file argument passed there is the raw path from the DB (not fullPath). While the readFileSync bug is fixed (since fullPath is now resolved correctly either way), it would be worth adding a second assertion with { strategy: 'structured' } to confirm end-to-end coverage for both strategies given this was a 0-embedding regression.

test('embeds symbols even when DB stores absolute file paths', async () => {
    EMBEDDED_TEXTS.length = 0;
    await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' });
    expect(EMBEDDED_TEXTS.length).toBe(1);
    expect(EMBEDDED_TEXTS[0]).toContain('add');

+   // Also verify structured strategy works with absolute paths
+   EMBEDDED_TEXTS.length = 0;
+   await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'structured' });
+   expect(EMBEDDED_TEXTS.length).toBe(1);
  });

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e44ad60. The default strategy (when none is passed) is structured, so the existing test already covered that path. Added an explicit second test case for source strategy so both are now clearly tested: produces embeddings when DB stores absolute paths (structured) and produces embeddings when DB stores absolute paths (source). All 22 tests pass.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed in e44ad60:

P1 — Native engine fast-path not patched: Applied the same path.isAbsolute(dep) ? dep : path.join(rootDir, dep) guard to the native fast-path in findReverseDependencies (line 352). Both branches now handle absolute paths correctly.

P2 — Test only covers source strategy: The existing test (no strategy option) already tests structured (the default). Added an explicit source strategy test case so both strategies are covered. 22/22 tests pass.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit d9ed474 into main Apr 3, 2026
13 checks passed
@carlos-alm carlos-alm deleted the fix/752-embed-absolute-paths branch April 3, 2026 06:14
@github-actions github-actions bot locked and limited conversation to collaborators Apr 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug(embed): double-path bug prevents embedding storage on native builds

1 participant