Skip to content

feat: add Solidity, Objective-C, CUDA, Groovy, Verilog language support#729

Merged
carlos-alm merged 4 commits intomainfrom
feat/batch4-specialized-languages
Apr 1, 2026
Merged

feat: add Solidity, Objective-C, CUDA, Groovy, Verilog language support#729
carlos-alm merged 4 commits intomainfrom
feat/batch4-specialized-languages

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Add Solidity (.sol) — smart contract language: contracts, interfaces, libraries, structs, enums, events, errors, modifiers, inheritance, import directives
  • Add Objective-C (.m) — @interface, @implementation, @protocol, categories, method declarations, message expressions, #import, C function support
  • Add CUDA (.cu, .cuh) — C++ superset for GPU/ML: full C++ extraction plus __global__/__device__/__host__ qualifier detection as decorators
  • Add Groovy (.groovy, .gvy) — JVM language: classes, interfaces, enums, methods, constructors, Java-style imports, call expressions
  • Add Verilog/SystemVerilog (.v, .sv) — HDL: modules with port extraction, interfaces, packages, functions, tasks, module instantiations as calls, package imports, include directives

Each language includes: extractor, registry entry, type definition, build-wasm config, grammar devDependency, and parser tests.

Test plan

  • tsc --noEmit passes (no type errors)
  • biome check passes on all new files
  • Existing JS parser tests still pass (31/31)
  • Parser unit tests pass (6/6)
  • New parser tests pass once WASM grammars are built in CI (local build requires emscripten)
  • Verify WASM grammar build succeeds in CI for all 5 new grammars

@claude
Copy link
Copy Markdown

claude bot commented Apr 1, 2026

Claude finished @carlos-alm's task —— View job


Reviewing PR #729: Add Specialized Language Support

Root Cause Analysis

Problem Being Solved: Expanding language support to include 5 specialized languages (Solidity, Objective-C, CUDA, Groovy, Verilog) that are important for specific domains: blockchain development, iOS/macOS development, GPU/ML computing, JVM ecosystem, and hardware design.

Why This Approach: Following established patterns - each language gets an extractor, registry entry, type definition, build config, grammar devDependency, and parser tests. This is purely additive with no breaking changes. All new parsers are marked required: false for graceful degradation.

Risk Assessment: Low risk - follows documented architectural conventions from CLAUDE.md exactly.

Backlog Compliance

  • Zero-dep: ✓ — Only adds tree-sitter grammar devDependencies, builds on existing infrastructure
  • Foundation-aligned: ✓ — Purely extends parser registry, follows P1 (always-current) and P4 (zero-cost core)
  • Problem-fit: 4/5 — Directly addresses "AI coding assistants waste tokens re-orienting in large codebases" by expanding language coverage to more domains
  • Breaking: No — Purely additive, maintains backward compatibility
  • Tier: 1 (zero-dep + foundation-aligned)

Configuration Changes Assessment ✅

Low Risk: Only adds 5 devDependencies for tree-sitter grammars and corresponding build-wasm.ts entries. No linting rules relaxed, no test configuration weakened, no runtime dependencies added.

Test Integrity Assessment ✅

Good Coverage: Each language gets comprehensive parser tests covering:

  • Core language constructs (classes, functions, modules)
  • Import/include mechanisms
  • Inheritance relationships
  • Call expression extraction

Tests follow established patterns from existing parser tests. No weakened assertions detected.

Code Quality Assessment ✅

Extractor Implementation Quality:

  • Consistent patterns: All extractors follow the same walker-based AST traversal pattern established in existing code
  • Proper error handling: Use helpers like findChild, nodeEndLine, extractModifierVisibility consistently
  • Language-appropriate mappings: Smart kind mappings (Solidity contracts→class, libraries→module, Verilog packages→module)
  • Complete extraction: Each extractor handles definitions, calls, imports, inheritance, and sub-declarations appropriately

Specific Language Implementations:

  • Solidity: Comprehensive smart contract support - contracts, interfaces, libraries, events, errors, modifiers
  • Objective-C: Proper iOS/macOS support - @interface, @implementation, @protocol, categories, message expressions
  • CUDA: C++ superset with GPU qualifiers - detects __global__, __device__, __host__ decorators
  • Groovy: JVM language with Java-like patterns - classes, interfaces, closures, dynamic typing
  • Verilog: Hardware description - modules with port extraction, SystemVerilog features

Registry Integration: All 5 languages properly registered with correct extensions, grammar files, and extractors. Type definitions updated correctly.

Critical Concerns

None. This is a well-executed expansion following established patterns.

Minor Observations

  • Some test assertions could be more specific (checking exact method names vs just kind)
  • Grammar availability depends on CI build process completing successfully
  • WASM grammar build may require emscripten in CI environment

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: High-quality implementation that follows project conventions exactly. Fills important gaps in language coverage for specialized domains. Only condition is ensuring CI successfully builds the 5 new WASM grammars.

Conditions for approval:

  1. ✅ WASM grammar build succeeds in CI for all 5 new languages
  2. ✅ Existing tests continue to pass (as claimed by author)
  3. ✅ TypeScript compilation succeeds (no type errors)

This PR excellently demonstrates how to extend codegraph language support following documented patterns. The implementation is thorough, consistent, and adds meaningful value for developers working in these specialized domains.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 1, 2026

Greptile Summary

This PR adds five new language extractors — Solidity, Objective-C, CUDA, Groovy, and Verilog/SystemVerilog — each with a corresponding tree-sitter grammar dependency, WASM build entry, registry registration, type definition, and parser test suite. The overall structure is consistent with the existing extractor pattern in this codebase, and the previously-reported issues (wrong kind for Solidity state variables, inverted struct-member condition, trivially-passing Verilog test assertions) have all been resolved.

Key changes:

  • src/extractors/{cuda,groovy,objc,solidity,verilog}.ts — five new extractors implementing language-specific symbol extraction
  • src/domain/parser.ts — all five languages registered in LANGUAGE_REGISTRY with correct extensions and extractor references; minor optional-chaining cleanup in createParseTreeCache
  • src/types.ts — five new LanguageId union members
  • package.json / scripts/build-wasm.ts — new grammar devDependencies and WASM build targets
  • tests/parsers/ — five new test files covering definitions, calls, imports, and inheritance per language

Issues found:

  • src/extractors/cuda.tsextractCudaQualifiers uses two independent if blocks rather than if / else if, so a node typed storage_class_specifier (or attribute_specifier) whose .text matches a CUDA qualifier is pushed into the qualifiers array twice, corrupting the decorators field for kernel functions.
  • src/extractors/solidity.tshandleContractDecl omits event_definition, error_declaration, and modifier_definition from the contract's inline children list, making it incomplete relative to what the walker emits as top-level definitions.

Confidence Score: 4/5

Mostly safe — four of five extractors look solid, but the CUDA qualifier deduplication bug should be fixed before merging to avoid corrupting decorator metadata for kernel functions.

A confirmed P1 logic bug in extractCudaQualifiers (double-push of the same qualifier) remains unaddressed. All other concerns are P2 or lower. Score is 4 rather than 5 because of this one definite correctness defect on the main CUDA extraction path.

src/extractors/cuda.ts — extractCudaQualifiers needs the ifelse if fix.

Important Files Changed

Filename Overview
src/extractors/cuda.ts New CUDA extractor reusing C++ patterns with CUDA-qualifier detection; contains a logic bug in extractCudaQualifiers that can push the same qualifier twice, corrupting the decorators array for kernel functions.
src/extractors/groovy.ts New Groovy extractor covering classes, interfaces, enums, methods, constructors, imports, and call expressions; handles grammar variant node names defensively.
src/extractors/objc.ts New Objective-C extractor handling @interface, @implementation, @protocol, categories, ObjC message expressions, C-function definitions, and #import/@import directives; selector building logic is sound.
src/extractors/solidity.ts New Solidity extractor covers contracts, interfaces, libraries, functions, events, errors, modifiers, state variables, and imports; previous bugs fixed; events/errors/modifiers missing from contract children member list is a minor incompleteness.
src/extractors/verilog.ts New Verilog/SystemVerilog extractor covering modules with port extraction, interfaces, packages, functions, tasks, module instantiations as calls, and package imports; test assertions now verify specific content.
src/domain/parser.ts Cleanly adds all 5 new language registry entries and re-exports; also micro-optimizes createParseTreeCache with optional chaining.
src/types.ts Adds five new LanguageId union members — straightforward, no issues.
tests/parsers/verilog.test.ts Previous trivially-passing assertions replaced with specific content checks for module instantiations and package imports.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    File["Source File\n(.sol / .m / .cu / .groovy / .v / .sv)"]
    Registry["LANGUAGE_REGISTRY\nlookup by extension"]
    WasmParser["tree-sitter WASM\nParser"]
    Tree["Parse Tree\n(TreeSitterTree)"]

    File --> Registry
    Registry --> WasmParser
    WasmParser --> Tree

    Tree --> SolExt["extractSoliditySymbols\ncontracts · functions · events\nstate vars · imports"]
    Tree --> ObjCExt["extractObjCSymbols\n@interface · @implementation\n@protocol · message exprs"]
    Tree --> CudaExt["extractCudaSymbols\nfunctions + CUDA qualifiers\nclasses · structs · #include"]
    Tree --> GroovyExt["extractGroovySymbols\nclasses · methods\nimports · call exprs"]
    Tree --> VerilogExt["extractVerilogSymbols\nmodules · tasks · functions\ninstantiations · pkg imports"]

    SolExt --> Output["ExtractorOutput\ndefinitions · calls · imports\nclasses · exports · typeMap"]
    ObjCExt --> Output
    CudaExt --> Output
    GroovyExt --> Output
    VerilogExt --> Output

    CudaExt -. "⚠ extractCudaQualifiers\nmay push qualifier twice" .-> BugNote["P1: duplicate\ndecorator entries"]
Loading

Reviews (2): Last reviewed commit: "fix: resolve Verilog function/task name ..." | Re-trigger Greptile

Comment on lines +274 to +280
ctx.definitions.push({
name: fullName,
kind: 'function',
line: node.startPosition.row + 1,
endLine: nodeEndLine(node),
visibility: extractSolVisibility(node),
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 State variable emitted with wrong kind

handleStateVarDecl pushes definitions with kind: 'function', but a Solidity state variable (e.g. uint256 public totalSupply) is not a function. This will cause state variables to appear as callable symbols in the code graph, corrupting downstream queries and call-graph analysis.

The correct kind is 'variable' (already part of ExtendedSymbolKind) or at a minimum 'property' to match how other extractors treat class-level fields.

Suggested change
ctx.definitions.push({
name: fullName,
kind: 'function',
line: node.startPosition.row + 1,
endLine: nodeEndLine(node),
visibility: extractSolVisibility(node),
});
ctx.definitions.push({
name: fullName,
kind: 'variable',
line: node.startPosition.row + 1,
endLine: nodeEndLine(node),
visibility: extractSolVisibility(node),
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 690e3ac. Changed handleStateVarDecl to emit kind: 'variable' instead of kind: 'function' for Solidity state variables.

Comment on lines +151 to +167
const body = findChild(node, 'struct_member') ? node : null;
if (!body) {
// Iterate direct children looking for member declarations
for (let i = 0; i < node.childCount; i++) {
const child = node.child(i);
if (child && child.type === 'struct_member') {
const memberName = child.childForFieldName('name');
if (memberName) {
members.push({
name: memberName.text,
kind: 'property',
line: child.startPosition.row + 1,
});
}
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Inverted condition causes struct members to never be collected

The condition if (!body) is backwards. body is set to the struct node itself whenever findChild successfully finds a struct_member child:

  • When struct members exist: body = node (truthy) → !body is false → the collection loop is skippedmembers stays empty.
  • When struct members don't exist: body = null!body is true → the loop runs but finds nothing.

The net result is that struct_declaration nodes always emit children: undefined, regardless of whether the struct has fields. This means Solidity struct members are silently dropped from the code graph.

The fix is to remove the body variable and always iterate over node.childCount:

Suggested change
const body = findChild(node, 'struct_member') ? node : null;
if (!body) {
// Iterate direct children looking for member declarations
for (let i = 0; i < node.childCount; i++) {
const child = node.child(i);
if (child && child.type === 'struct_member') {
const memberName = child.childForFieldName('name');
if (memberName) {
members.push({
name: memberName.text,
kind: 'property',
line: child.startPosition.row + 1,
});
}
}
}
}
const members: SubDeclaration[] = [];
for (let i = 0; i < node.childCount; i++) {
const child = node.child(i);
if (child && child.type === 'struct_member') {
const memberName = child.childForFieldName('name');
if (memberName) {
members.push({
name: memberName.text,
kind: 'property',
line: child.startPosition.row + 1,
});
}
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 690e3ac. Removed the inverted if (!body) condition entirely. Now iterates directly over node.childCount to collect all struct_member children, matching the suggested approach.

Comment on lines +49 to +61
it('extracts module instantiations as calls', () => {
const symbols = parseVerilog(`module top;
counter u1(.clk(clk), .reset(reset));
endmodule`);
expect(symbols.calls.length).toBeGreaterThanOrEqual(0);
});

it('extracts package imports', () => {
const symbols = parseVerilog(`module m;
import pkg::item;
endmodule`);
expect(symbols.imports.length).toBeGreaterThanOrEqual(0);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Trivially-passing test assertions give no coverage

Both expect(symbols.calls.length).toBeGreaterThanOrEqual(0) (line 53) and expect(symbols.imports.length).toBeGreaterThanOrEqual(0) (line 60) are always true — an empty array has length 0, so these assertions pass even when the extractor silently produces zero results.

These should assert the actual expected count or content, e.g.:

Suggested change
it('extracts module instantiations as calls', () => {
const symbols = parseVerilog(`module top;
counter u1(.clk(clk), .reset(reset));
endmodule`);
expect(symbols.calls.length).toBeGreaterThanOrEqual(0);
});
it('extracts package imports', () => {
const symbols = parseVerilog(`module m;
import pkg::item;
endmodule`);
expect(symbols.imports.length).toBeGreaterThanOrEqual(0);
});
it('extracts module instantiations as calls', () => {
const symbols = parseVerilog(`module top;
counter u1(.clk(clk), .reset(reset));
endmodule`);
expect(symbols.calls).toContainEqual(expect.objectContaining({ name: 'counter' }));
});
it('extracts package imports', () => {
const symbols = parseVerilog(`module m;
import pkg::item;
endmodule`);
expect(symbols.imports).toContainEqual(
expect.objectContaining({ source: 'pkg', names: ['item'] }),
);
});

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5effa9c. Replaced both trivially-passing toBeGreaterThanOrEqual(0) assertions with specific content checks: module instantiation now asserts { name: 'counter' } and package import asserts { source: 'pkg', names: ['item'] }.

#729)

- Change handleStateVarDecl to emit kind 'variable' instead of 'function'
  for state variables like `uint256 public totalSupply`
- Fix inverted condition in handleStructDecl that prevented struct members
  from ever being collected
…#729)

The tree-sitter-objc grammar does not expose class/protocol names as
named fields. Fall back to finding the first identifier child node when
childForFieldName('name') returns null.
…t assertions (#729)

- Add findFunctionOrTaskName helper that searches for function_identifier
  and task_identifier node types used by the tree-sitter-verilog grammar
- Remove duplicate handling of function_body_declaration/task_body_declaration
  from the walker to prevent double definitions
- Replace trivially-passing toBeGreaterThanOrEqual(0) assertions with
  specific content checks for module instantiations and package imports
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 57ebcfd into main Apr 1, 2026
26 checks passed
@carlos-alm carlos-alm deleted the feat/batch4-specialized-languages branch April 1, 2026 03:47
@github-actions github-actions bot locked and limited conversation to collaborators Apr 1, 2026
@carlos-alm
Copy link
Copy Markdown
Contributor Author

The two issues from Greptile's second review round were not addressed before merge:

  1. P1: CUDA extractCudaQualifiers double-pushif blocks should be if/else if to avoid pushing the same qualifier twice when a storage_class_specifier/attribute_specifier node's text matches CUDA_QUALIFIERS.
  2. P2: Solidity handleContractDecl incomplete children — Missing event_definition, error_declaration, and modifier_definition from contract member collection.

Both are fixed in follow-up PR #731.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant