ASTERIXDB-PR2: Add schema extraction pipeline for NL2SQL++ (SchemaContextBuilder) by pineappleBest123 · Pull Request #47 · apache/asterixdb

pineappleBest123 · 2026-03-27T23:18:04Z

Summary

Add the schema extraction pipeline for the GSoC 2026 NL2SQL++ project.
This patch builds on top of the servlet infrastructure introduced in
#46.

Changes

- ColumnInfo: field name, type string, and primary-key flag, with
  prompt-ready toDescriptionString() output
- DatasetSchema: holds all columns; supports a pruned column subset
  (for ColumnPruner in a later PR) and value hints (for ValueHintsSampler)
- DatasetSchemaFormatter: recursively converts ADM IAType objects to
  human-readable strings (supports nested records, arrays, multisets,
  and nullable unions, with a depth limit of 4)
- SchemaContextBuilder: reads Dataset and type metadata from
  MetadataManager inside a metadata transaction and builds a
  SchemaContext with one description string per Dataset
- 13 unit tests covering all formatter rules and schema pipeline behavior
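
To illustrate the recursive formatting rule described for DatasetSchemaFormatter, here is a minimal, self-contained sketch. It does not use the real IAType hierarchy: Type, Primitive, ArrayType, MultisetType, NullableType, and RecordType are hypothetical stand-ins, and the class name TypeFormatterSketch is invented for this example. The rendering choices for multisets ({{...}}), nullable types (a trailing ?), records ({name: type}), the "..." placeholder at the depth limit, and the "any" fallback are assumptions; only the [string] array form and the depth limit of 4 come from this PR description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TypeFormatterSketch {
    // Depth limit stated in the PR description.
    static final int MAX_DEPTH = 4;

    // Hypothetical stand-ins for ADM types; the real code operates on IAType.
    interface Type {}

    static final class Primitive implements Type {
        final String name;
        Primitive(String name) { this.name = name; }
    }

    static final class ArrayType implements Type {
        final Type item;
        ArrayType(Type item) { this.item = item; }
    }

    static final class MultisetType implements Type {
        final Type item;
        MultisetType(Type item) { this.item = item; }
    }

    static final class NullableType implements Type {
        final Type inner;
        NullableType(Type inner) { this.inner = inner; }
    }

    static final class RecordType implements Type {
        // LinkedHashMap keeps fields in declaration order.
        final Map<String, Type> fields = new LinkedHashMap<>();
        RecordType field(String name, Type type) {
            fields.put(name, type);
            return this;
        }
    }

    static String format(Type type) {
        return format(type, 0);
    }

    static String format(Type type, int depth) {
        // Primitives are always printed, however deep they sit.
        if (type instanceof Primitive) {
            return ((Primitive) type).name;
        }
        // Assumption: structured types past the limit collapse to "...".
        if (depth >= MAX_DEPTH) {
            return "...";
        }
        if (type instanceof ArrayType) {
            return "[" + format(((ArrayType) type).item, depth + 1) + "]";
        }
        if (type instanceof MultisetType) {
            return "{{" + format(((MultisetType) type).item, depth + 1) + "}}";
        }
        if (type instanceof NullableType) {
            // A nullable wrapper adds no nesting level of its own.
            return format(((NullableType) type).inner, depth) + "?";
        }
        if (type instanceof RecordType) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<String, Type> e : ((RecordType) type).fields.entrySet()) {
                parts.add(e.getKey() + ": " + format(e.getValue(), depth + 1));
            }
            return "{" + String.join(", ", parts) + "}";
        }
        // Assumption: anything unrecognized is reported as "any".
        return "any";
    }
}
```

Under these assumptions, format(new ArrayType(new Primitive("string"))) yields [string], the same shape as the referred-topics column in the example output.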

Example output

Dataset TweetMessages (tweetid: int64 [PK], sender-location: any,
send-time: datetime, referred-topics: [string], message-text: string,
author-id: int64)
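
As a rough sketch of how a line like the one above could be assembled from per-column descriptions: the Column class below is a hypothetical stand-in for ColumnInfo, and DescriptionSketch and describe() are names invented for this example. Only toDescriptionString(), the [PK] marker, and the overall "Dataset name (col: type, ...)" shape are taken from this PR description; emitting the description as a single unwrapped line is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class DescriptionSketch {
    // Hypothetical stand-in for ColumnInfo: name, type string, primary-key flag.
    static final class Column {
        final String name;
        final String type;
        final boolean primaryKey;

        Column(String name, String type, boolean primaryKey) {
            this.name = name;
            this.type = type;
            this.primaryKey = primaryKey;
        }

        // Renders "name: type", with a " [PK]" suffix for primary-key columns.
        String toDescriptionString() {
            return name + ": " + type + (primaryKey ? " [PK]" : "");
        }
    }

    // Joins the column descriptions into one line per dataset.
    static String describe(String datasetName, List<Column> columns) {
        List<String> parts = new ArrayList<>();
        for (Column column : columns) {
            parts.add(column.toDescriptionString());
        }
        return "Dataset " + datasetName + " (" + String.join(", ", parts) + ")";
    }
}
```

Keeping the per-column rendering in toDescriptionString() means the dataset-level builder only has to join strings, which would also make it straightforward to describe a pruned subset of columns later.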

Testing

All unit tests pass: mvn test -pl asterixdb/asterix-spidersilk

pineappleBest123 added 2 commits March 27, 2026 15:30

ASTERIXDB-PR1: Bootstrap asterix-spidersilk module with NL2SQL++ servlet infrastructure (544cfa0)

ASTERIXDB-PR2: Add schema extraction pipeline for NL2SQL++ (SchemaContextBuilder) (cdc71be)

pineappleBest123 wants to merge 2 commits into apache:master from
pineappleBest123:gsoc-pr2-schema-extraction
