fix(embed): handle absolute file paths from native engine#780
fix(embed): handle absolute file paths from native engine#780carlos-alm merged 8 commits intomainfrom
Conversation
The embed command unconditionally joined rootDir with node.file from the DB, producing doubled paths when the native engine stores absolute paths. Check path.isAbsolute before joining. Same fix applied to detect-changes reverse-dependency lookup. Regression test included.
|
Claude finished @carlos-alm's task —— View job Reviewing PR #780: fix(embed): handle absolute file paths from native engineTodo List
Root Cause AnalysisProblem Being Solved: The native engine stores absolute file paths in the database, but Why This Approach: The fix correctly addresses the root cause by checking Risk Assessment: Very low risk. The changes are minimal (2 one-line fixes), well-tested, and maintain backward compatibility. Backlog Compliance
Code Quality ReviewChanges Analyzed:1. const fullPath = path.isAbsolute(file) ? file : path.join(rootDir, file);✅ Excellent fix - properly handles both absolute and relative paths 2. const absPath = path.isAbsolute(row.file) ? row.file : path.join(rootDir, row.file);✅ Good consistency - same pattern applied to reverse dependency detection 3. Test Coverage ( test('embeds symbols even when DB stores absolute file paths', async () => {
// Creates DB with absolute file path, verifies buildEmbeddings succeeds
const absFile = path.join(absDir, 'math.js').split(path.sep).join('/');
insertNode(db, 'add', 'function', absFile, 1, 1);
await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' });
expect(EMBEDDED_TEXTS.length).toBe(1);
})✅ Perfect regression test - directly reproduces the bug scenario and verifies the fix Strengths:
Minor Notes:
Final AssessmentThis is a high-quality bug fix that solves a critical issue preventing embeddings from working with the native engine. The implementation is minimal, correct, and well-tested. The regression test ensures this specific bug won't reoccur. The PR demonstrates excellent engineering practices:
Final Recommendation
|
Greptile SummaryThis PR fixes a doubled-path bug where the native engine stores absolute file paths in SQLite, but consuming code unconditionally did Key changes:
Remaining concern: Confidence Score: 4/5Safe to merge for the embedding fix, but the incremental-build reverse-dep path for the native engine still has the doubled-path bug in The src/domain/graph/builder/stages/detect-changes.ts — Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Native DB returns file paths] --> B{path.isAbsolute?}
B -- Yes --> C[Use path as-is for fs.existsSync ✅]
B -- No --> D[path.join rootDir + path for fs.existsSync ✅]
C --> E[reverseDeps.add absolute path]
D --> F[reverseDeps.add relative path]
E --> G[purgeAndAddReverseDeps]
F --> G
G --> H["path.join(rootDir, relPath) ← no isAbsolute guard ❌"]
H -- absolute relPath --> I["Doubled path: /repo//repo/src/file.ts"]
H -- relative relPath --> J["Correct: /repo/src/file.ts"]
I --> K[ctx.parseChanges.push broken file path]
J --> L[ctx.parseChanges.push correct file path]
Reviews (2): Last reviewed commit: "fix: apply isAbsolute guard to native fa..." | Re-trigger Greptile |
| test('embeds symbols even when DB stores absolute file paths', async () => { | ||
| EMBEDDED_TEXTS.length = 0; | ||
| await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' }); | ||
|
|
||
| expect(EMBEDDED_TEXTS.length).toBe(1); | ||
| expect(EMBEDDED_TEXTS[0]).toContain('add'); | ||
| }); |
There was a problem hiding this comment.
Test only covers
source strategy, not structured
The new regression test exercises buildEmbeddings with { strategy: 'source' } only. The structured strategy also calls buildStructuredText(node, file, lines, db), and the file argument passed there is the raw path from the DB (not fullPath). While the readFileSync bug is fixed (since fullPath is now resolved correctly either way), it would be worth adding a second assertion with { strategy: 'structured' } to confirm end-to-end coverage for both strategies given this was a 0-embedding regression.
test('embeds symbols even when DB stores absolute file paths', async () => {
EMBEDDED_TEXTS.length = 0;
await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'source' });
expect(EMBEDDED_TEXTS.length).toBe(1);
expect(EMBEDDED_TEXTS[0]).toContain('add');
+ // Also verify structured strategy works with absolute paths
+ EMBEDDED_TEXTS.length = 0;
+ await buildEmbeddings(absDir, 'minilm', absDbPath, { strategy: 'structured' });
+ expect(EMBEDDED_TEXTS.length).toBe(1);
});There was a problem hiding this comment.
Fixed in e44ad60. The default strategy (when none is passed) is structured, so the existing test already covered that path. Added an explicit second test case for source strategy so both are now clearly tested: produces embeddings when DB stores absolute paths (structured) and produces embeddings when DB stores absolute paths (source). All 22 tests pass.
|
Addressed in e44ad60: P1 — Native engine fast-path not patched: Applied the same P2 — Test only covers source strategy: The existing test (no strategy option) already tests |
…ave/ops-codegraph-tool into fix/752-embed-absolute-paths
Summary
Closes #752.
buildEmbeddingsingenerator.tsunconditionally didpath.join(rootDir, file)on file paths from the DB. When the native engine stores absolute paths, this produces doubled paths (H:/repo/H:/repo/src/cli.ts), causing everyreadFileSyncto fail with ENOENT — 0 of 5,959 symbols get embeddedfindReverseDependenciesindetect-changes.tspath.isAbsolute()before joining with rootDirTest plan
buildEmbeddingssucceeds