Skip to content

perf: forward langId hint to native standalone analysis functions#743

Merged
carlos-alm merged 5 commits intomainfrom
perf/forward-langid-to-native-analysis
Apr 2, 2026
Merged

perf: forward langId hint to native standalone analysis functions#743
carlos-alm merged 5 commits intomainfrom
perf/forward-langid-to-native-analysis

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Pass the JS-resolved langId as an explicit Option<String> to the three native standalone analysis functions (analyzeComplexity, buildCfgAnalysis, extractDataflowAnalysis)
  • Rust uses the hint as the primary language, falling back to extension detection only when None
  • Files whose language is inferred by content (e.g. .vue tagged as "javascript", extensionless shebang files) now use the native Rust path instead of silently falling back to WASM

Changes

File Change
crates/codegraph-core/src/analysis.rs parse_source accepts lang_id: Option<&str>, tries from_lang_id first then from_extension
crates/codegraph-core/src/lib.rs Three #[napi] wrappers accept lang_id: Option<String>
src/types.ts NativeAddon signatures updated with optional langId parameter
src/ast-analysis/engine.ts runNativeAnalysis forwards already-resolved langId to all three native calls

Test plan

  • TypeScript compiles cleanly (tsc --noEmit)
  • All 560 integration tests pass
  • Verify native path is taken for .vue files with langId: "javascript" (manual or future test)

Closes #739

Pass the JS-resolved langId as an explicit Option<String> to the three
native standalone analysis functions (analyzeComplexity, buildCfgAnalysis,
extractDataflowAnalysis). Rust now uses the hint as the primary language,
falling back to extension detection only when None.

This ensures files whose language is inferred by content rather than
extension (e.g. .vue files tagged as "javascript", extensionless shebang
files) use the native Rust path instead of silently falling back to WASM.

Closes #739
@claude
Copy link
Copy Markdown

claude bot commented Apr 2, 2026

Claude finished @carlos-alm's task —— View job


PR Review Analysis

Reviewing PR #743: perf: forward langId hint to native standalone analysis functions

Root Cause Analysis

Problem Being Solved: The native Rust standalone analysis functions (analyzeComplexity, buildCfgAnalysis, extractDataflowAnalysis) only used file extension for language detection, causing files with content-inferred languages (e.g., .vue files tagged as "javascript", extensionless shebang files) to fall back to WASM instead of using the faster native path.

Why This Approach: The JS engine already resolves langId through content analysis. By forwarding this resolved hint to the native functions, files that would otherwise be unrecognized by extension can use the native Rust path, improving performance.

Risk Assessment: Low risk. This is a pure optimization that maintains backward compatibility via fallback behavior.

Backlog Compliance

  • Zero-dep: ✓ — No new runtime dependencies
  • Foundation-aligned: ✓ — Improves performance while maintaining offline-first architecture
  • Problem-fit: 4 — Directly improves performance for AI agents by reducing WASM fallbacks
  • Breaking: No — Purely additive change with backward compatibility
  • Tier: 1

Code Quality Assessment

Rust Implementation (crates/codegraph-core/)

Strengths:

  • Clean separation of concerns in parse_source() function (analysis.rs:51-63)
  • Proper fallback chain: lang_id hint → extension detection → None
  • Consistent error handling with proper Option types
  • Good documentation explaining the .vue file use case

Observations:

  • Uses unwrap() in test code only (acceptable pattern)
  • No unsafe code or panic paths in the core implementation
  • Proper napi-rs FFI boundary handling

TypeScript Implementation (src/)

Strengths:

  • Clean type signatures with optional langId parameter
  • Minimal surface area change (3 function signatures)
  • Proper forwarding of already-resolved language hints
  • No breaking changes to existing callers

Performance Impact

  • Positive: Files like .vue with langId: "javascript" now use native path instead of WASM
  • No regression: Extension-based detection unchanged for existing files
  • Fallback preserved: WASM still available when native can't handle the language

Critical Concerns

None identified. The implementation is solid and follows established patterns.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: Well-implemented performance optimization with clear benefits and no breaking changes. Missing manual test verification for the stated .vue file improvement.

Conditions:

  1. Verify that .vue files with langId: "javascript" actually use the native path (mentioned in test plan but not implemented)
  2. Consider adding a targeted test case for the specific use case this PR addresses

This is a clean, focused performance improvement that addresses a real gap in language detection without introducing risk or complexity.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR threads a JS-resolved langId hint through the three native standalone analysis functions (analyzeComplexity, buildCfgAnalysis, extractDataflowAnalysis) so that files whose language is identified by content rather than extension (e.g. .vue tagged as "javascript", extensionless shebang scripts) now correctly take the native Rust path instead of silently falling back to WASM.

  • crates/codegraph-core/src/analysis.rsparse_source now accepts lang_id: Option<&str>, attempting LanguageKind::from_lang_id first and falling back to from_extension only when None. All three standalone analysis functions (analyze_complexity_standalone, build_cfg_standalone, extract_dataflow_standalone) are updated accordingly.
  • crates/codegraph-core/src/lib.rs — The three #[napi] wrappers each gain a lang_id: Option<String> parameter and forward it via .as_deref().
  • src/types.tsNativeAddon interface updated with the optional langId?: string | null parameter on all three methods, consistent with napi-rs Option<String> marshalling.
  • src/ast-analysis/engine.tsrunNativeAnalysis adds langSupports* variables combining the extension-based and rule-based checks, and passes langId to all three native calls. The WASM path (ensureWasmTreesIfNeeded and visitor setup) is also updated to check langId-based rules alongside extension sets. The now-redundant CFG_EXTENSIONS.has(ext) and DATAFLOW_EXTENSIONS.has(ext) guards inside WASM visitor setup functions are removed since the upstream needsCfg/needsDataflow gates already enforce them via the combined rule check.
  • Documentation, CHANGELOG.md, ROADMAP.md, package.json, and Cargo.toml are updated to reflect the v3.8.0 release and Phase 7 completion.

Confidence Score: 5/5

  • This PR is safe to merge — changes are a clean, consistent plumbing of an already-resolved value with correct fallback semantics and no logic regressions.
  • No P0 or P1 issues found. The fallback chain (from_lang_idfrom_extension) is correctly ordered. The napi-rs Option<String> / TypeScript string | null | undefined boundary is handled idiomatically via .as_deref(). Removal of the redundant extension guards from the WASM visitor functions is safe because the upstream needsCfg/needsDataflow gates already apply the combined rule+extension check. All 560 integration tests pass per the PR description.
  • No files require special attention.

Important Files Changed

Filename Overview
crates/codegraph-core/src/analysis.rs Adds lang_id: Option<&str> to parse_source and all three standalone analysis functions; fallback chain (from_lang_idfrom_extension) is logically correct
crates/codegraph-core/src/lib.rs Three #[napi] wrappers updated to accept lang_id: Option<String> and forward it via .as_deref() — idiomatic Rust, no issues
src/ast-analysis/engine.ts Forwards langId to all three native calls, replaces pure extension checks with `ext
src/types.ts NativeAddon interface updated with optional `langId?: string
crates/codegraph-core/Cargo.toml Version bumped 3.7.0 → 3.8.0 in sync with package.json
package.json Version bump 3.7.0 → 3.8.0 with new tree-sitter grammar devDependencies added for batch 3/4 languages

Sequence Diagram

sequenceDiagram
    participant E as engine.ts (runNativeAnalysis)
    participant N as NativeAddon (napi-rs)
    participant L as lib.rs (#[napi] wrapper)
    participant A as analysis.rs (parse_source)
    participant TS as tree-sitter

    E->>E: resolve langId from JS engine<br/>(e.g. "javascript" for .vue)
    E->>E: langSupports* = EXTENSIONS.has(ext) || RULES.has(langId)
    alt needsComplexity
        E->>N: analyzeComplexity(source, absPath, langId)
        N->>L: analyze_complexity(source, file_path, Some("javascript"))
        L->>A: parse_source(source, path, Some("javascript"))
        A->>A: LanguageKind::from_lang_id("javascript") → Some(JS)
        A->>TS: parse with JS grammar
        TS-->>A: Tree
        A-->>L: Some((Tree, JS))
        L-->>N: Vec<FunctionComplexityResult>
        N-->>E: results
    end
    alt lang_id is None (extension-only file)
        E->>N: analyzeComplexity(source, absPath, langId)
        N->>L: analyze_complexity(source, file_path, None)
        L->>A: parse_source(source, path, None)
        A->>A: from_lang_id(None) → None<br/>fallback: LanguageKind::from_extension(path)
        A->>TS: parse with detected grammar
        TS-->>A: Tree
        A-->>L: Some((Tree, Lang))
        L-->>N: Vec<FunctionComplexityResult>
        N-->>E: results
    end
Loading

Reviews (2): Last reviewed commit: "fix: use langId fallback in extension-se..." | Re-trigger Greptile

Comment on lines +7507 to +7527
"peerDependencies": {
"tree-sitter": "^0.22.1"
},
"peerDependenciesMeta": {
"tree-sitter": {
"optional": true
}
}
},
"node_modules/tree-sitter-objc/node_modules/tree-sitter-c": {
"version": "0.23.6",
"resolved": "https://registry.npmjs.org/tree-sitter-c/-/tree-sitter-c-0.23.6.tgz",
"integrity": "sha512-0dxXKznVyUA0s6PjNolJNs2yF87O5aL538A/eR6njA5oqX3C3vH4vnx3QdOKwuUdpKEcFdHuiDpRKLLCA/tjvQ==",
"dev": true,
"hasInstallScript": true,
"license": "MIT",
"dependencies": {
"node-addon-api": "^8.3.0",
"node-gyp-build": "^4.8.4"
},
"peerDependencies": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 yarn pulled in as a runtime dependency via tree-sitter-solidity

tree-sitter-solidity@1.2.13 declares yarn@^1.22.21 as a runtime (non-dev) dependency, which causes yarn@1.22.22 to be installed into node_modules. This is unusual for a grammar package and inflates install size noticeably (yarn is a full package manager). While it doesn't affect correctness today, it is worth confirming this is acceptable before shipping — a future version of tree-sitter-solidity that fixes this packaging quirk would allow the dependency to be dropped.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — the tree-sitter-solidity pulling in yarn as a runtime dep is a known upstream packaging quirk. This is unrelated to the changes in this PR (it comes from the lockfile reflecting existing devDeps). Filed as a separate concern to track.

… files (#743)

The needsComplexity/needsCfg/needsDataflow guards gated solely on
file-extension sets, so content-inferred files (.vue tagged as
"javascript", extensionless shebang files) could never reach the
native or WASM analysis paths despite having a valid langId.

Add langId-based fallback checks to all guard sites: the native
analysis dispatcher, the WASM pre-parse check, the per-file CFG
visitor setup, and the per-file dataflow visitor setup.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile's P1 finding about extension-set guards blocking content-inferred files:

Problem: The needsComplexity/needsCfg/needsDataflow guards gated solely on COMPLEXITY_EXTENSIONS.has(ext) etc., so files like .vue (tagged as "javascript") and extensionless shebang files (ext = "") could never reach the native or WASM analysis paths despite having a valid langId.

Fix (9a6d5e6): Added langId-based fallback checks at all four guard sites:

  1. Native analysis dispatcher (lines 130-141) — added || COMPLEXITY_RULES.has(langId) / CFG_RULES.has(langId) / DATAFLOW_RULES.has(langId)
  2. WASM pre-parse check (lines 312-323) — same pattern using symbols._langId
  3. Per-file CFG visitor setup — removed redundant CFG_EXTENSIONS.has(ext) guard since CFG_RULES.get(langId) already validates language support
  4. Per-file dataflow visitor setup — same, removed redundant extension check

This completes the PR's stated goal: content-inferred files now correctly reach both the native and WASM analysis paths.

1 similar comment
@carlos-alm
Copy link
Copy Markdown
Contributor Author

Addressed Greptile's P1 finding about extension-set guards blocking content-inferred files:

Problem: The needsComplexity/needsCfg/needsDataflow guards gated solely on COMPLEXITY_EXTENSIONS.has(ext) etc., so files like .vue (tagged as "javascript") and extensionless shebang files (ext = "") could never reach the native or WASM analysis paths despite having a valid langId.

Fix (9a6d5e6): Added langId-based fallback checks at all four guard sites:

  1. Native analysis dispatcher (lines 130-141) — added || COMPLEXITY_RULES.has(langId) / CFG_RULES.has(langId) / DATAFLOW_RULES.has(langId)
  2. WASM pre-parse check (lines 312-323) — same pattern using symbols._langId
  3. Per-file CFG visitor setup — removed redundant CFG_EXTENSIONS.has(ext) guard since CFG_RULES.get(langId) already validates language support
  4. Per-file dataflow visitor setup — same, removed redundant extension check

This completes the PR's stated goal: content-inferred files now correctly reach both the native and WASM analysis paths.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 1e97927 into main Apr 2, 2026
11 checks passed
@carlos-alm carlos-alm deleted the perf/forward-langid-to-native-analysis branch April 2, 2026 05:28
@github-actions github-actions bot locked and limited conversation to collaborators Apr 2, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: forward langId hint to native standalone analysis functions

1 participant