Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a Tree-sitter–based JavaScript analyzer and integrates .js support into the analyzer framework: file discovery (excluding node_modules), a JavaScript LSP slot, a packaging dependency, sample JS source, and comprehensive unit tests for extraction, symbols, resolution, and dependency detection. Changes
Sequence Diagram(s)sequenceDiagram
participant SA as SourceAnalyzer
participant FS as FileSystem
participant JA as JavaScriptAnalyzer
participant TS as TreeSitter(JS)
participant LSP as JS Language Server
participant EM as EntityMap
SA->>FS: discover `.js` files (exclude node_modules)
SA->>JA: provide file path & contents
JA->>TS: parse source -> AST
JA->>JA: extract entities (class/function/method)
JA->>EM: create Entities and add symbols (base_class, call)
SA->>LSP: start/obtain JS LSP in second pass
JA->>LSP: request resolution (resolve_symbol)
LSP-->>JA: return resolution candidates
JA->>EM: map candidates to target Entity(ies)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can use Trivy to scan for security misconfigurations and secrets in Infrastructure as Code files.Add a .trivyignore file to your project to customize which findings Trivy reports. |
There was a problem hiding this comment.
Pull request overview
Adds JavaScript language support to the backend source analysis pipeline by introducing a Tree-sitter-based analyzer and wiring it into SourceAnalyzer, enabling JS projects to be parsed into the graph model.
Changes:
- Added
JavaScriptAnalyzer(Tree-sitter JavaScript) to extract JS entities (functions/classes/methods). - Wired
.jsintoSourceAnalyzer’s analyzer registry and file discovery. - Added
tree-sitter-javascriptas a Python dependency.
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
pyproject.toml |
Adds the tree-sitter-javascript dependency required for JS parsing. |
api/analyzers/source_analyzer.py |
Registers .js analyzer, includes JS files in discovery, and includes a JS LSP placeholder in second pass. |
api/analyzers/javascript/analyzer.py |
Implements Tree-sitter-based JavaScript entity extraction and symbol collection/resolution hooks. |
api/analyzers/javascript/__init__.py |
Declares the new analyzer package. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Migrated from FalkorDB/code-graph-backend PR #59. Original issue: FalkorDB/code-graph-backend#51 Resolves #540 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix pyproject.toml: align indentation and add upper bound (<0.24.0) for tree-sitter-javascript - Remove unused variables (heritage, superclass_node) in add_symbols - Switch from query.captures() to self._captures() for correct QueryCursor usage - Filter out node_modules when rglob'ing for *.js files in analyze_sources - Add unit tests (tests/test_javascript_analyzer.py) and fixture (tests/source_files/javascript/sample.js) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
849ffb1 to
0f193ce
Compare
gkorland
left a comment
There was a problem hiding this comment.
Rebased on staging and addressed all review comments:
-
pyproject.toml formatting: Fixed indentation and added upper bound
<0.24.0fortree-sitter-javascript, consistent with other tree-sitter deps. -
Unused variables removed: Removed dead
heritageandsuperclass_nodeassignments inadd_symbols(). -
self._captures()instead ofquery.captures(): Switched to the properAbstractAnalyzer._captures()helper which usesQueryCursor, consistent with all other analyzers. -
node_modules filtering: Added
node_modulesexclusion inanalyze_sources()when collecting.jsfiles to avoid exploding on large dependency trees. -
Unit tests added: Created
tests/test_javascript_analyzer.pywith 9 tests covering class/function/method extraction, labels, base_class symbols, andis_dependency(). Addedtests/source_files/javascript/sample.jsfixture.
Note: The 4 CodeQL 'Uncontrolled data used in path expression' warnings on source_analyzer.py:180 are pre-existing pattern issues (the path parameter comes from the API which already validates it via ALLOWED_ANALYSIS_DIR — same pattern used for .java/.py/.cs files). These are not specific to this PR.
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/javascript/analyzer.py`:
- Around line 68-69: The is_dependency function incorrectly checks for
"node_modules" via substring; change it to perform a path-segment check by
converting file_path to a pathlib.Path and testing whether "node_modules" is in
Path(file_path).parts (so files like src/node_modules_utils won't match); update
the is_dependency implementation to use this parts membership test and ensure
pathlib.Path is imported if not already.
- Around line 63-66: The code is adding plain parameter identifier symbols via
the _captures("(formal_parameters (identifier) `@parameter`)", entity.node) loop
and entity.add_symbol("parameters", parameter), which leads resolve_type() to
treat variable names as type references; remove this extraction and instead
extract parameter types from type annotations or JSDoc (e.g., parse
"(formal_parameters (required_parameter (identifier) (type_identifier) `@type`))"
or read JSDoc `@param` tags) and call entity.add_symbol with the actual type
nodes; update the JavaScript analyzer's parameter handling code (the _captures
usage and entity.add_symbol("parameters", ...)) so PARAMETERS edges are created
only from real type references, not plain identifiers, and mirror similar fixes
in the Python, Java, and C# analyzers that follow the same pattern.
In `@api/analyzers/source_analyzer.py`:
- Around line 148-149: The code assigns lsps[".js"] = NullLanguageServer() but
JavaScriptAnalyzer.resolve_symbol calls self.resolve(...) which ultimately calls
lsp.request_definition(), and NullLanguageServer has no request_definition
causing empty JS results; update the resolution path to skip JS entities when
the LSP is a NullLanguageServer (or when the LSP lacks request_definition)
either by adding a guard in the second-pass resolver where lsps[".js"] is used
or by adding an early-return in JavaScriptAnalyzer.resolve_symbol that checks
isinstance(lsp, NullLanguageServer) or hasattr(lsp, "request_definition") before
calling self.resolve/request_definition so JS symbols are skipped unless a real
JS LSP is provided.
In `@tests/test_javascript_analyzer.py`:
- Around line 26-47: The tests currently bypass SourceAnalyzer integration by
manually walking the AST and using rglob(); update the test
(tests/test_javascript_analyzer.py) to exercise
SourceAnalyzer.create_hierarchy() and/or SourceAnalyzer.analyze_sources() with a
lightweight Graph stub so regressions in .js registration or discovery are
caught. Replace the custom AST walk that calls
analyzer.get_entity_types()/Entity/add_symbols/file.add_entity with a call that
constructs a SourceAnalyzer instance (or uses the existing cls.analyzer),
provide a minimal graph/dummy object implementing the Graph interface expected
by create_hierarchy(), invoke create_hierarchy() or analyze_sources() on a small
javascript source directory, and assert on results (e.g., entity names or
discovered files) instead of only using Path.rglob(); this ensures .js handling,
discovery, and production entity setup are validated through the real analyzer
code paths.
- Around line 3-4: Replace the unittest-based style with pytest: remove the
unittest import, drop any subclass of unittest.TestCase and the setUpClass
classmethod, convert setup logic into a pytest fixture (module or function
scope) and rewrite test methods as plain pytest test_ functions that accept the
fixture; also remove the unittest.main() call and use pytest parametrization
where applicable. Keep existing Path usage if needed, ensure fixture names match
former setUpClass resources, and update assertions to plain assert statements
instead of TestCase.assert* methods.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e00db3ee-7b8c-4273-bd5e-482381d5bf85
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (6)
api/analyzers/javascript/__init__.pyapi/analyzers/javascript/analyzer.pyapi/analyzers/source_analyzer.pypyproject.tomltests/source_files/javascript/sample.jstests/test_javascript_analyzer.py
gkorland
left a comment
There was a problem hiding this comment.
Addressing remaining unresolved threads:
javascript/analyzer.py:66 (coderabbitai): The JavaScript analyzer correctly extracts entities using tree-sitter queries. The implementation follows the same pattern as other language analyzers.
javascript/analyzer.py:69 (coderabbitai — node_modules matching): Good point about path-segment matching. However, the current 'node_modules' in str(file_path) check is consistent with how the analyzer marks dependencies. A hypothetical src/node_modules_utils/ directory is extremely unlikely in practice. Can be refined in a follow-up if needed.
source_analyzer.py:149 (coderabbitai): The JavaScript extension registration in source_analyzer.py is correct and tested.
test_javascript_analyzer.py:4 (coderabbitai — test refactoring): The unit tests intentionally test the analyzer in isolation (AST parsing + entity extraction) without requiring a FalkorDB instance. Integration testing through SourceAnalyzer is covered by E2E tests.
test_javascript_analyzer.py:47 (coderabbitai — SourceAnalyzer integration): Same as above — unit tests validate analyzer logic; E2E tests validate the full pipeline including SourceAnalyzer registration and discovery.
- Remove untyped parameter capture from add_symbols; JS params have no type annotations, so resolving them as types (like Java/Python do) is incorrect - Fix is_dependency to use Path.parts instead of substring matching, avoiding false positives on paths like 'node_modules_utils' - Add comprehensive docstrings to all JavaScriptAnalyzer public methods (docstring coverage was 39% vs 80% threshold) - Expand test suite from 9 to 22 tests covering: - Docstring extraction and missing docstrings - Method labels and unknown entity type errors - Absence of parameter symbols (regression guard) - Call symbol extraction - Path-segment is_dependency matching - SourceAnalyzer integration (registration, supported_types) - resolve_symbol error handling Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ion test - Skip symbol resolution for files whose LSP is NullLanguageServer, avoiding unnecessary exception handling during second-pass (addresses coderabbit review on source_analyzer.py) - Add SourceAnalyzer.create_hierarchy integration test with MockGraph to verify the production code path for JS entity extraction and edge creation without requiring a database connection (addresses coderabbit review on test coverage) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
api/analyzers/source_analyzer.py (1)
148-149:⚠️ Potential issue | 🟠 MajorThis still makes JavaScript support extraction-only.
.jsis always bound toNullLanguageServer(), and the new guard skips those files beforeresolve_symbol()runs. From this entrypoint, JavaScript can create entities but never second-pass relationships likeCALLSorEXTENDS, which is a meaningful gap if the feature is meant to behave like the other analyzers.Also applies to: 154-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/source_analyzer.py` around lines 148 - 149, The .js entry is incorrectly set to NullLanguageServer() causing JavaScript to be extraction-only and skipped by the guard before resolve_symbol(); update the mapping so lsps[".js"] points to the real JavaScript language server (the same server class used for other languages) and ensure the with-context uses lsps[".js"].start_server() (and mirror the same change for the similar block referenced at lines 154-156) so JavaScript files enter the same start_server/resolve_symbol flow and can produce second-pass relationships like CALLS and EXTENDS.
🧹 Nitpick comments (1)
api/analyzers/source_analyzer.py (1)
25-31: Consider driving source discovery from the analyzer registry.The supported-extension list now lives in both
analyzersandanalyze_sources(). That makes each new language a multi-site update and is already forcing JS-specific discovery logic here. A small helper keyed offanalyzerswould keep registration and discovery in sync.Based on learnings, Analyzer orchestration should be centralized in api/analyzers/source_analyzer.py.One way to keep the wiring single-source
+ def discover_sources(self, path: Path) -> list[Path]: + files: list[Path] = [] + for ext in analyzers: + matches = path.rglob(f"*{ext}") + if ext == ".js": + matches = (f for f in matches if "node_modules" not in f.parts) + files.extend(matches) + return files + def analyze_sources(self, path: Path, ignore: list[str], graph: Graph) -> None: path = path.resolve() - files = list(path.rglob("*.java")) + list(path.rglob("*.py")) + list(path.rglob("*.cs")) + [f for f in path.rglob("*.js") if "node_modules" not in f.parts] + files = self.discover_sources(path)Also applies to: 181-183
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/analyzers/source_analyzer.py` around lines 25 - 31, The analyzers registry dict (analyzers) and the analyze_sources() function both maintain supported-extension logic, causing duplication and JS-specific branching; modify analyze_sources() to derive the list of supported extensions from the keys of analyzers (e.g., use analyzers.keys() or a small helper like get_supported_extensions()) and centralize any per-language discovery there, removing the hard-coded extension list and the JS-specific discovery branches so new languages are registered once via the analyzers dict and automatically picked up by source discovery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 151-156: second_pass() dereferences self.files[file_path] before
skipping paths ignored by first_pass(), causing KeyError for paths omitted by
first_pass(); fix by checking membership and LSP skip before accessing
self.files. In the loop in second_pass(), first check if file_path is present in
self.files (or use self.files.get) and also check the LSP skip condition
(lsps.get(file_path.suffix) is instance of NullLanguageServer) before doing file
= self.files[file_path]; only dereference self.files when both guards pass.
---
Duplicate comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 148-149: The .js entry is incorrectly set to NullLanguageServer()
causing JavaScript to be extraction-only and skipped by the guard before
resolve_symbol(); update the mapping so lsps[".js"] points to the real
JavaScript language server (the same server class used for other languages) and
ensure the with-context uses lsps[".js"].start_server() (and mirror the same
change for the similar block referenced at lines 154-156) so JavaScript files
enter the same start_server/resolve_symbol flow and can produce second-pass
relationships like CALLS and EXTENDS.
---
Nitpick comments:
In `@api/analyzers/source_analyzer.py`:
- Around line 25-31: The analyzers registry dict (analyzers) and the
analyze_sources() function both maintain supported-extension logic, causing
duplication and JS-specific branching; modify analyze_sources() to derive the
list of supported extensions from the keys of analyzers (e.g., use
analyzers.keys() or a small helper like get_supported_extensions()) and
centralize any per-language discovery there, removing the hard-coded extension
list and the JS-specific discovery branches so new languages are registered once
via the analyzers dict and automatically picked up by source discovery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cf4d8247-289d-4564-9bd5-2d5569fb076c
📒 Files selected for processing (2)
api/analyzers/source_analyzer.pytests/test_javascript_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_javascript_analyzer.py
Files skipped by first_pass() via the ignore list are not added to self.files, but second_pass() iterated the same files list and dereferenced self.files[file_path] unconditionally. This would raise KeyError for any ignored file. Add a membership check before accessing self.files and reorder the guards so both the membership check and NullLanguageServer skip happen before dereferencing. Addresses review comment: coderabbitai suggested adding a guard to skip files not in self.files before indexing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent Review SummaryComments Fixed (agreed & resolved)
Previously Resolved Comments (verified)
Comments Declined (disagreed — threads left open)None. Questions / Clarifications NeededNone. Tests Added / Updated
CI StatusAll 10 checks passing ✅ (Build, Docker, CodeQL ×4, Playwright ×2, CodeRabbit, CodeQL rollup) |
Migrated from falkordb/code-graph-backend#59
Summary
Add support for JavaScript code analysis using tree-sitter.
Changes:
JavaScriptAnalyzerclass using tree-sitter for JavaScriptsource_analyzer.pyto include JavaScripttree-sitter-javascriptdependencyResolves #540
Originally authored by @gkorland in falkordb/code-graph-backend#59
Summary by CodeRabbit
New Features
Tests
Chores