feat: add Solidity, Objective-C, CUDA, Groovy, Verilog language support #729
carlos-alm merged 4 commits into main
Claude finished @carlos-alm's task

Reviewing PR #729: Add Specialized Language Support

Root Cause Analysis
Problem Being Solved: Expanding language support to include 5 specialized languages (Solidity, Objective-C, CUDA, Groovy, Verilog) that are important for specific domains: blockchain development, iOS/macOS development, GPU/ML computing, JVM ecosystem, and hardware design.
Why This Approach: Following established patterns - each language gets an extractor, registry entry, type definition, build config, grammar devDependency, and parser tests. This is purely additive with no breaking changes. All new parsers are marked
Risk Assessment: Low risk - follows documented architectural conventions from CLAUDE.md exactly.

Backlog Compliance

Configuration Changes Assessment ✅
Low Risk: Only adds 5 devDependencies for tree-sitter grammars and corresponding build-wasm.ts entries. No linting rules relaxed, no test configuration weakened, no runtime dependencies added.

Test Integrity Assessment ✅
Good Coverage: Each language gets comprehensive parser tests covering:
Tests follow established patterns from existing parser tests. No weakened assertions detected.

Code Quality Assessment ✅
Extractor Implementation Quality:
Specific Language Implementations:
Registry Integration: All 5 languages properly registered with correct extensions, grammar files, and extractors. Type definitions updated correctly.

Critical Concerns
None. This is a well-executed expansion following established patterns.

Minor Observations

Final Recommendation
Conditions for approval:
This PR excellently demonstrates how to extend codegraph language support following documented patterns. The implementation is thorough, consistent, and adds meaningful value for developers working in these specialized domains.
Greptile Summary
This PR adds five new language extractors (Solidity, Objective-C, CUDA, Groovy, and Verilog/SystemVerilog), each with a corresponding tree-sitter grammar dependency, WASM build entry, registry registration, type definition, and parser test suite. The overall structure is consistent with the existing extractor pattern in this codebase, and the previously-reported issues (wrong
Key changes:
Issues found:

Confidence Score: 4/5
Mostly safe: four of five extractors look solid, but the CUDA qualifier deduplication bug should be fixed before merging to avoid corrupting decorator metadata for kernel functions. A confirmed P1 logic bug in src/extractors/cuda.ts

Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
File["Source File\n(.sol / .m / .cu / .groovy / .v / .sv)"]
Registry["LANGUAGE_REGISTRY\nlookup by extension"]
WasmParser["tree-sitter WASM\nParser"]
Tree["Parse Tree\n(TreeSitterTree)"]
File --> Registry
Registry --> WasmParser
WasmParser --> Tree
Tree --> SolExt["extractSoliditySymbols\ncontracts · functions · events\nstate vars · imports"]
Tree --> ObjCExt["extractObjCSymbols\n@interface · @implementation\n@protocol · message exprs"]
Tree --> CudaExt["extractCudaSymbols\nfunctions + CUDA qualifiers\nclasses · structs · #include"]
Tree --> GroovyExt["extractGroovySymbols\nclasses · methods\nimports · call exprs"]
Tree --> VerilogExt["extractVerilogSymbols\nmodules · tasks · functions\ninstantiations · pkg imports"]
SolExt --> Output["ExtractorOutput\ndefinitions · calls · imports\nclasses · exports · typeMap"]
ObjCExt --> Output
CudaExt --> Output
GroovyExt --> Output
VerilogExt --> Output
CudaExt -. "⚠ extractCudaQualifiers\nmay push qualifier twice" .-> BugNote["P1: duplicate\ndecorator entries"]
Reviews (2): Last reviewed commit: "fix: resolve Verilog function/task name ..."
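The duplicate-qualifier concern flagged in the flowchart can be illustrated with a minimal sketch. The body of extractCudaQualifiers is not shown in this conversation, so everything below is an assumption for illustration: the `MiniNode` shape, the `collectCudaQualifiersSketch` name, and the tree-walking strategy are hypothetical. The only point is that collecting into a Set reports each qualifier at most once, even when the walker reaches the same qualifier through more than one node.

```typescript
// Hypothetical stand-in for a tree-sitter node; only the members used here.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

// The three qualifiers named in the PR summary.
const CUDA_QUALIFIERS = ['__global__', '__device__', '__host__'];

// Walk a declaration subtree and return each CUDA qualifier at most once.
// Assumption: qualifiers appear as nodes whose text is exactly the qualifier.
function collectCudaQualifiersSketch(root: MiniNode): string[] {
  const seen = new Set<string>();
  const visit = (node: MiniNode): void => {
    if (CUDA_QUALIFIERS.indexOf(node.text) !== -1) {
      seen.add(node.text); // a Set silently ignores repeats
    }
    for (const child of node.children) visit(child);
  };
  visit(root);
  return Array.from(seen);
}

// A declaration where the same qualifier is reachable twice.
const kernelDecl: MiniNode = {
  type: 'function_definition',
  text: '__global__ void add(int *a) {}',
  children: [
    { type: 'qualifier', text: '__global__', children: [] },
    { type: 'qualifier', text: '__global__', children: [] },
    { type: 'identifier', text: 'add', children: [] },
  ],
};

const qualifiers = collectCudaQualifiersSketch(kernelDecl);
console.log(qualifiers);
```

An array-based collector that pushes on every match would report the qualifier twice for `kernelDecl`; the Set-based version reports it once.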
    ctx.definitions.push({
      name: fullName,
      kind: 'function',
      line: node.startPosition.row + 1,
      endLine: nodeEndLine(node),
      visibility: extractSolVisibility(node),
    });
State variable emitted with wrong kind
handleStateVarDecl pushes definitions with kind: 'function', but a Solidity state variable (e.g. uint256 public totalSupply) is not a function. This will cause state variables to appear as callable symbols in the code graph, corrupting downstream queries and call-graph analysis.
The correct kind is 'variable' (already part of ExtendedSymbolKind) or at a minimum 'property' to match how other extractors treat class-level fields.
    - ctx.definitions.push({
    -   name: fullName,
    -   kind: 'function',
    -   line: node.startPosition.row + 1,
    -   endLine: nodeEndLine(node),
    -   visibility: extractSolVisibility(node),
    - });
    + ctx.definitions.push({
    +   name: fullName,
    +   kind: 'variable',
    +   line: node.startPosition.row + 1,
    +   endLine: nodeEndLine(node),
    +   visibility: extractSolVisibility(node),
    + });
Fixed in 690e3ac. Changed handleStateVarDecl to emit kind: 'variable' instead of kind: 'function' for Solidity state variables.
src/extractors/solidity.ts
Outdated
    const body = findChild(node, 'struct_member') ? node : null;
    if (!body) {
      // Iterate direct children looking for member declarations
      for (let i = 0; i < node.childCount; i++) {
        const child = node.child(i);
        if (child && child.type === 'struct_member') {
          const memberName = child.childForFieldName('name');
          if (memberName) {
            members.push({
              name: memberName.text,
              kind: 'property',
              line: child.startPosition.row + 1,
            });
          }
        }
      }
    }
Inverted condition causes struct members to never be collected
The condition if (!body) is backwards. body is set to the struct node itself whenever findChild successfully finds a struct_member child:
- When struct members exist: body = node (truthy) → !body is false → the collection loop is skipped → members stays empty.
- When struct members don't exist: body = null → !body is true → the loop runs but finds nothing.
The net result is that struct_declaration nodes always emit children: undefined, regardless of whether the struct has fields. This means Solidity struct members are silently dropped from the code graph.
The fix is to remove the body variable and always iterate over node.childCount:
    - const body = findChild(node, 'struct_member') ? node : null;
    - if (!body) {
    -   // Iterate direct children looking for member declarations
    -   for (let i = 0; i < node.childCount; i++) {
    -     const child = node.child(i);
    -     if (child && child.type === 'struct_member') {
    -       const memberName = child.childForFieldName('name');
    -       if (memberName) {
    -         members.push({
    -           name: memberName.text,
    -           kind: 'property',
    -           line: child.startPosition.row + 1,
    -         });
    -       }
    -     }
    -   }
    - }
    + const members: SubDeclaration[] = [];
    + for (let i = 0; i < node.childCount; i++) {
    +   const child = node.child(i);
    +   if (child && child.type === 'struct_member') {
    +     const memberName = child.childForFieldName('name');
    +     if (memberName) {
    +       members.push({
    +         name: memberName.text,
    +         kind: 'property',
    +         line: child.startPosition.row + 1,
    +       });
    +     }
    +   }
    + }
Fixed in 690e3ac. Removed the inverted if (!body) condition entirely. Now iterates directly over node.childCount to collect all struct_member children, matching the suggested approach.
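The truthiness argument in the review comment can be checked with a small standalone sketch. The `MiniNode` shape and both function names are hypothetical stand-ins, not the extractor's actual types; the buggy variant only mirrors the guard structure described above.

```typescript
// Hypothetical stand-in for a tree-sitter node; only the members used here.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

// Mirrors the reported bug: `body` is truthy exactly when a struct_member
// exists, so the `!body` guard skips the loop in the only case that matters.
function collectMembersBuggy(node: MiniNode): string[] {
  const members: string[] = [];
  const hasMember = node.children.some((c) => c.type === 'struct_member');
  const body = hasMember ? node : null;
  if (!body) {
    for (const child of node.children) {
      if (child.type === 'struct_member') members.push(child.text);
    }
  }
  return members;
}

// Mirrors the fix: always iterate over the direct children.
function collectMembersFixed(node: MiniNode): string[] {
  const members: string[] = [];
  for (const child of node.children) {
    if (child.type === 'struct_member') members.push(child.text);
  }
  return members;
}

const structNode: MiniNode = {
  type: 'struct_declaration',
  text: 'struct Point { uint x; uint y; }',
  children: [
    { type: 'struct_member', text: 'x', children: [] },
    { type: 'struct_member', text: 'y', children: [] },
  ],
};

const buggyMembers = collectMembersBuggy(structNode); // empty: loop skipped
const fixedMembers = collectMembersFixed(structNode); // both members collected
console.log(buggyMembers, fixedMembers);
```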
    it('extracts module instantiations as calls', () => {
      const symbols = parseVerilog(`module top;
        counter u1(.clk(clk), .reset(reset));
      endmodule`);
      expect(symbols.calls.length).toBeGreaterThanOrEqual(0);
    });

    it('extracts package imports', () => {
      const symbols = parseVerilog(`module m;
        import pkg::item;
      endmodule`);
      expect(symbols.imports.length).toBeGreaterThanOrEqual(0);
    });
Trivially-passing test assertions give no coverage
Both expect(symbols.calls.length).toBeGreaterThanOrEqual(0) (line 53) and expect(symbols.imports.length).toBeGreaterThanOrEqual(0) (line 60) are always true — an empty array has length 0, so these assertions pass even when the extractor silently produces zero results.
These should assert the actual expected count or content, e.g.:
    - it('extracts module instantiations as calls', () => {
    -   const symbols = parseVerilog(`module top;
    -     counter u1(.clk(clk), .reset(reset));
    -   endmodule`);
    -   expect(symbols.calls.length).toBeGreaterThanOrEqual(0);
    - });
    - it('extracts package imports', () => {
    -   const symbols = parseVerilog(`module m;
    -     import pkg::item;
    -   endmodule`);
    -   expect(symbols.imports.length).toBeGreaterThanOrEqual(0);
    - });
    + it('extracts module instantiations as calls', () => {
    +   const symbols = parseVerilog(`module top;
    +     counter u1(.clk(clk), .reset(reset));
    +   endmodule`);
    +   expect(symbols.calls).toContainEqual(expect.objectContaining({ name: 'counter' }));
    + });
    + it('extracts package imports', () => {
    +   const symbols = parseVerilog(`module m;
    +     import pkg::item;
    +   endmodule`);
    +   expect(symbols.imports).toContainEqual(
    +     expect.objectContaining({ source: 'pkg', names: ['item'] }),
    +   );
    + });
Fixed in 5effa9c. Replaced both trivially-passing toBeGreaterThanOrEqual(0) assertions with specific content checks: module instantiation now asserts { name: 'counter' } and package import asserts { source: 'pkg', names: ['item'] }.
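Why the original assertions could never fail can be shown without a test runner. The sketch below is an assumption-laden illustration: `CallRecord` and `hasCallNamed` are hypothetical names, and the empty array stands in for an extractor that silently produced nothing.

```typescript
// Hypothetical record shape; the real extractor output has more fields.
interface CallRecord {
  name: string;
  line: number;
}

// Content check: true only if a call with the given name was extracted.
function hasCallNamed(calls: CallRecord[], name: string): boolean {
  return calls.some((c) => c.name === name);
}

// Simulate an extractor that found nothing at all.
const extractedCalls: CallRecord[] = [];

// Length-based check in the style of toBeGreaterThanOrEqual(0):
// an array length is never negative, so this holds for any result.
const weakCheckPasses = extractedCalls.length >= 0;

// Content-based check: fails when the expected call is missing.
const strongCheckPasses = hasCallNamed(extractedCalls, 'counter');

console.log({ weakCheckPasses, strongCheckPasses });
```

With an empty result the weak check still passes while the content check fails, which is the gap the replacement assertions close.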
#729)
- Change handleStateVarDecl to emit kind 'variable' instead of 'function' for state variables like `uint256 public totalSupply`
- Fix inverted condition in handleStructDecl that prevented struct members from ever being collected
…#729)
The tree-sitter-objc grammar does not expose class/protocol names as named fields. Fall back to finding the first identifier child node when childForFieldName('name') returns null.
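The fallback described in this commit message might look roughly like the sketch below. The extractor code itself is not shown here, so `MiniNode`, `makeNode`, `resolveDeclNameSketch`, and the node type strings in the example are all hypothetical; only the `childForFieldName('name')` lookup followed by a first-identifier search comes from the commit message.

```typescript
// Hypothetical stand-in for a tree-sitter node; only the members used here.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
  childForFieldName(name: string): MiniNode | null;
}

// Test helper that builds a mock node with optional named fields.
function makeNode(
  type: string,
  text: string,
  children: MiniNode[] = [],
  fields: { [name: string]: MiniNode | undefined } = {},
): MiniNode {
  return {
    type,
    text,
    children,
    childForFieldName: (name: string) => fields[name] ?? null,
  };
}

// Prefer the named field; otherwise use the first identifier child.
function resolveDeclNameSketch(node: MiniNode): string | null {
  const named = node.childForFieldName('name');
  if (named) return named.text;
  for (const child of node.children) {
    if (child.type === 'identifier') return child.text;
  }
  return null;
}

// A declaration with no `name` field: the class name is the first identifier.
const iface = makeNode('class_interface', '@interface Foo : NSObject', [
  makeNode('@interface', '@interface'),
  makeNode('identifier', 'Foo'),
  makeNode('identifier', 'NSObject'),
]);

const resolvedName = resolveDeclNameSketch(iface);
console.log(resolvedName);
```

Taking the first identifier relies on the name preceding the superclass in child order, which is an assumption about the grammar rather than something stated in the commit.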
…t assertions (#729)
- Add findFunctionOrTaskName helper that searches for function_identifier and task_identifier node types used by the tree-sitter-verilog grammar
- Remove duplicate handling of function_body_declaration/task_body_declaration from the walker to prevent double definitions
- Replace trivially-passing toBeGreaterThanOrEqual(0) assertions with specific content checks for module instantiations and package imports
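A helper that searches for function_identifier and task_identifier nodes could be sketched as a depth-first search. This is not the merged findFunctionOrTaskName implementation, which is not shown in this conversation: the `MiniNode` shape, the `findFunctionOrTaskNameSketch` name, the recursive strategy, and the wrapper node type in the example are assumptions.

```typescript
// Hypothetical stand-in for a tree-sitter node; only the members used here.
interface MiniNode {
  type: string;
  text: string;
  children: MiniNode[];
}

// Node types named in the commit message.
const NAME_NODE_TYPES = ['function_identifier', 'task_identifier'];

// Depth-first search for the first function or task identifier node.
function findFunctionOrTaskNameSketch(node: MiniNode): string | null {
  if (NAME_NODE_TYPES.indexOf(node.type) !== -1) return node.text;
  for (const child of node.children) {
    const found = findFunctionOrTaskNameSketch(child);
    if (found !== null) return found;
  }
  return null;
}

// The identifier sits below the body declaration rather than at the top.
const functionDecl: MiniNode = {
  type: 'function_declaration',
  text: 'function int add(int a, int b); ... endfunction',
  children: [
    {
      type: 'function_body_declaration',
      text: 'int add(int a, int b); ...',
      children: [{ type: 'function_identifier', text: 'add', children: [] }],
    },
  ],
};

const functionName = findFunctionOrTaskNameSketch(functionDecl);
console.log(functionName);
```

Resolving the name from the outer declaration in one place fits the commit's second bullet, since the walker then has no reason to handle the body declaration nodes separately.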
The two issues from Greptile's second review round were not addressed before merge:
Both are fixed in follow-up PR #731.
Summary
- Solidity (.sol) - smart contract language: contracts, interfaces, libraries, structs, enums, events, errors, modifiers, inheritance, import directives
- Objective-C (.m) - @interface, @implementation, @protocol, categories, method declarations, message expressions, #import, C function support
- CUDA (.cu, .cuh) - C++ superset for GPU/ML: full C++ extraction plus __global__/__device__/__host__ qualifier detection as decorators
- Groovy (.groovy, .gvy) - JVM language: classes, interfaces, enums, methods, constructors, Java-style imports, call expressions
- Verilog (.v, .sv) - HDL: modules with port extraction, interfaces, packages, functions, tasks, module instantiations as calls, package imports, include directives

Each language includes: extractor, registry entry, type definition, build-wasm config, grammar devDependency, and parser tests.
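The extension-to-language mapping in this summary can be sketched as a small lookup table. The actual LANGUAGE_REGISTRY structure is not shown in this PR conversation, so the `RegistryEntrySketch` shape, the language ids, and `languageForFile` are hypothetical; only the extensions are taken from the list above.

```typescript
// Hypothetical language ids; the real registry may name them differently.
type LanguageId = 'solidity' | 'objc' | 'cuda' | 'groovy' | 'verilog';

// Hypothetical entry shape; the real entries also carry grammar and extractor.
interface RegistryEntrySketch {
  id: LanguageId;
  extensions: string[];
}

// Extensions as listed in the PR summary.
const REGISTRY_SKETCH: RegistryEntrySketch[] = [
  { id: 'solidity', extensions: ['.sol'] },
  { id: 'objc', extensions: ['.m'] },
  { id: 'cuda', extensions: ['.cu', '.cuh'] },
  { id: 'groovy', extensions: ['.groovy', '.gvy'] },
  { id: 'verilog', extensions: ['.v', '.sv'] },
];

// Resolve a file path to a language id by its final extension.
function languageForFile(path: string): LanguageId | null {
  const dot = path.lastIndexOf('.');
  if (dot < 0) return null;
  const ext = path.slice(dot).toLowerCase();
  for (const entry of REGISTRY_SKETCH) {
    if (entry.extensions.indexOf(ext) !== -1) return entry.id;
  }
  return null;
}

console.log(languageForFile('kernels/add.cuh'));
console.log(languageForFile('rtl/top.sv'));
console.log(languageForFile('README.md'));
```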
Test plan
- tsc --noEmit passes (no type errors)
- biome check passes on all new files