Ashad001/bolt-sql

BoltSQL

A fast, lightweight natural language to SQL translator for PostgreSQL, MySQL, and SQLite. Query your database using plain English with schema-aware retrieval and minimal setup.

Overview

BoltSQL translates natural language questions into SQL queries using:

  • Schema-aware retrieval: Automatically finds relevant tables and columns based on your question
  • Fast embeddings: CPU-based semantic search with on-disk caching
  • LLM-powered generation: Uses OpenAI's GPT models to generate SQL
  • Multi-database support: Works with PostgreSQL, MySQL, and SQLite
  • Read-only safety: Only executes SELECT queries to protect data

Features

  • Rust-based: Fast, memory-safe library and CLI
  • Zero vector database: Simple brute-force cosine similarity for retrieval
  • Embedding cache: Subsequent queries reuse cached schema embeddings
  • Comprehensive logging: Debug queries with detailed timing and context logs
  • JSON catalog: Define your schema once in a simple JSON format
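The "zero vector database" retrieval named above can be sketched in a few lines of Rust. The function names (`cosine`, `top_k`) are illustrative, not BoltSQL's actual API; the point is that brute-force scoring over a small catalog needs no index at all:

```rust
// Brute-force retrieval sketch: score every catalog embedding against the
// query embedding with cosine similarity, then keep the top k.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Return the indices of the `k` embeddings most similar to `query`,
/// most similar first.
fn top_k(query: &[f32], embeddings: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = embeddings
        .iter()
        .enumerate()
        .map(|(i, e)| (i, cosine(query, e)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let embeddings = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let hits = top_k(&[1.0, 0.1], &embeddings, 2);
    println!("top items: {:?}", hits); // indices of the two closest embeddings
}
```

For catalogs with a few hundred tables and columns, this linear scan is faster than the round trip to a hosted vector store would be.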

Installation

Prerequisites

  • Rust toolchain (rustup)
  • OpenAI API key
  • A database connection (PostgreSQL, MySQL, or SQLite)

Build

git clone <repository-url>
cd bolt-sql
cargo build --release

The binary will be available at target/release/bolt-sql (or target/release/bolt-sql.exe on Windows).

Quick Start

1. Configure Environment

Create a .env file in the project root:

OPENAI_API_KEY=sk-your-key-here
DATABASE_URL=postgres://user:password@localhost:5432/mydb

For PostgreSQL with SSL (e.g., Supabase, AWS RDS):

DATABASE_URL=postgres://user:password@host:5432/db?sslmode=require

For SQLite:

DATABASE_URL=sqlite:///path/to/database.db

2. Create a Catalog

Define your database schema in a JSON file (catalog.json):

{
  "databases": [
    {
      "name": "public",
      "tables": [
        {
          "name": "users",
          "description": "User accounts and profile information",
          "columns": [
            { "name": "id", "type": "integer", "description": "Primary key, unique user identifier" },
            { "name": "email", "type": "text", "description": "User email address" },
            { "name": "created_at", "type": "timestamp", "description": "Account creation timestamp" }
          ]
        },
        {
          "name": "orders",
          "description": "Customer orders and transactions",
          "columns": [
            { "name": "id", "type": "integer", "description": "Primary key" },
            { "name": "user_id", "type": "integer", "description": "Foreign key to users.id" },
            { "name": "total", "type": "decimal", "description": "Order total amount" },
            { "name": "status", "type": "text", "description": "Order status: pending, completed, cancelled" }
          ]
        }
      ]
    }
  ]
}

Tips for better results:

  • Provide clear, descriptive table and column descriptions
  • Include relationship hints (e.g., "Foreign key to users.id")
  • Mention data types and constraints when relevant
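These descriptions matter because each table and column entry is ultimately turned into a text snippet that gets embedded for retrieval. A hypothetical flattening (the types here are illustrative, not BoltSQL's internal ones) might look like:

```rust
// Illustrative catalog types mirroring the JSON above; BoltSQL's internal
// representation may differ.
struct Column {
    name: String,
    ty: String,
    description: String,
}

struct Table {
    name: String,
    description: String,
    columns: Vec<Column>,
}

/// Flatten one table into embeddable text snippets: one for the table
/// itself, one per column. The richer the descriptions, the more signal
/// each snippet carries for semantic search.
fn flatten(table: &Table) -> Vec<String> {
    let mut out = vec![format!("table {}: {}", table.name, table.description)];
    for c in &table.columns {
        out.push(format!("{}.{} ({}): {}", table.name, c.name, c.ty, c.description));
    }
    out
}

fn main() {
    let users = Table {
        name: "users".into(),
        description: "User accounts and profile information".into(),
        columns: vec![Column {
            name: "email".into(),
            ty: "text".into(),
            description: "User email address".into(),
        }],
    };
    for snippet in flatten(&users) {
        println!("{snippet}");
    }
}
```

A column with an empty description flattens to little more than its name, which is why vague catalogs retrieve poorly.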

3. Query Your Database

Ask questions in natural language:

# Basic query
target/release/bolt-sql ask "How many users signed up this month?" \
  --catalog catalog.json

# With custom database URL
target/release/bolt-sql ask "Show me the top 10 orders by total" \
  --catalog catalog.json \
  --db postgres://user:pass@localhost:5432/mydb

# Get JSON output
target/release/bolt-sql ask "List all pending orders" \
  --catalog catalog.json \
  --json

# Verbose mode (with timing and logs)
target/release/bolt-sql ask "What's the average order value?" \
  --catalog catalog.json \
  --verbose

Command Reference

ask Command

Query your database with natural language.

Required:

  • question: Your question in natural language
  • --catalog: Path to your catalog JSON file

Optional:

  • --db: Database connection URL (overrides DATABASE_URL env var)
  • --model: OpenAI model to use (default: gpt-4.1-nano, fallback: gpt-4o-mini)
  • --top-tables: Number of top tables to retrieve (default: 3)
  • --top-columns: Number of top columns to retrieve (default: 10)
  • --max-rows: Maximum rows to return (default: 100)
  • --json: Output results as JSON instead of a table
  • --verbose / -v: Enable detailed logging and timing information
  • --no-cache: Disable embedding cache (useful for testing)
  • --pooler: Use connection pooler (for Supabase, replace port 5432 with 6543)

warm Command

Preload embeddings and test database connection:

target/release/bolt-sql warm --verbose

Logging

When using --verbose, BoltSQL creates detailed logs in the logger/ directory:

  • main.log: General execution logs and timing information
  • retrieval.log: Retrieved tables and columns for each query
  • prompt.log: Full prompts sent to the LLM
  • context.txt: Latest retrieved context (overwritten each run)
  • prompt.txt: Latest prompt (overwritten each run)
  • llm-cache.jsonl: Cached LLM responses (20-minute TTL)

Performance

  • First run: Generates embeddings for your catalog (typically 1-5 seconds)
  • Subsequent runs: Uses cached embeddings (typically <1 second for simple queries)
  • Embedding cache: Stored as boltsql-cache-<hash>-d384.bin in the project root
  • Connection pooling: Automatic connection pooling for PostgreSQL and MySQL
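A cache filename of the form `boltsql-cache-<hash>-d384.bin` could plausibly be derived as below. This is only a sketch: the actual hash BoltSQL computes is not specified here, and `DefaultHasher` is a stand-in (its output is not even guaranteed stable across Rust releases, so a real implementation would use a fixed algorithm):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical: derive a cache filename from the catalog text and the
/// embedding dimension, so a changed catalog invalidates the cache.
fn cache_filename(catalog_json: &str, dim: usize) -> String {
    let mut h = DefaultHasher::new();
    catalog_json.hash(&mut h);
    format!("boltsql-cache-{:016x}-d{}.bin", h.finish(), dim)
}

fn main() {
    // Any edit to the catalog changes the hash, forcing a re-embed on next run.
    println!("{}", cache_filename("{\"databases\":[]}", 384));
}
```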

Security

  • Read-only: Only SELECT statements are executed; writes are rejected
  • Transaction safety: Uses read-only transactions where supported
  • No data modification: Your database is never modified
  • API key security: Store your OpenAI API key in .env (never commit it)
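A read-only guard of this kind can be sketched as a simple prefix check; BoltSQL's actual validation may differ, and a production guard should parse the statement rather than inspect strings:

```rust
/// Illustrative read-only check: accept a single SELECT (or WITH ... SELECT)
/// statement and reject everything else. Note the limits of this sketch:
/// a PostgreSQL data-modifying CTE (WITH t AS (DELETE ...) SELECT ...) would
/// slip through, which is one reason to also run inside a read-only
/// transaction as the section above describes.
fn is_read_only(sql: &str) -> bool {
    let trimmed = sql.trim().trim_end_matches(';').to_ascii_lowercase();
    // Reject multi-statement input outright.
    if trimmed.contains(';') {
        return false;
    }
    trimmed.starts_with("select") || trimmed.starts_with("with")
}

fn main() {
    for sql in ["SELECT * FROM users;", "DROP TABLE users"] {
        println!("{sql} -> read-only: {}", is_read_only(sql));
    }
}
```

Layering the string check on top of a `SET TRANSACTION READ ONLY` gives defense in depth: even if the guard misclassifies, the database refuses the write.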

Troubleshooting

Connection issues:

  • Verify your DATABASE_URL is correct
  • For managed databases (Supabase, AWS RDS), ensure SSL is enabled: ?sslmode=require
  • Test connectivity with your database client first

Slow queries:

  • Check network latency to your database
  • Use --verbose to identify bottlenecks
  • Try --model gpt-4o-mini if gpt-4.1-nano is unavailable or slow

Poor SQL generation:

  • Improve your catalog descriptions (more detail = better results)
  • Increase --top-tables and --top-columns for complex schemas
  • Check logger/prompt.txt to see what context was sent to the LLM

Empty results:

  • Verify your catalog JSON is valid
  • Ensure table and column names match your actual schema
  • Check logger/retrieval.log to see what was retrieved

Evaluation

BoltSQL has been evaluated on the Spider dataset. Initial results (50 questions):

  • Exact match: 4% (syntactic SQL match)
  • Execution match: 16% (semantic correctness, produces same results)
  • Token similarity: 0.57 average Jaccard score
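The token-similarity metric above is a Jaccard score over SQL tokens. A minimal version, assuming whitespace tokenization (the evaluation's exact tokenizer is not specified here), looks like:

```rust
use std::collections::HashSet;

/// Jaccard similarity between the token sets of two SQL strings:
/// |intersection| / |union|, case-insensitive.
fn jaccard(a: &str, b: &str) -> f64 {
    let tokens = |s: &str| -> HashSet<String> {
        s.to_ascii_lowercase()
            .split_whitespace()
            .map(str::to_string)
            .collect()
    };
    let (ta, tb) = (tokens(a), tokens(b));
    if ta.is_empty() && tb.is_empty() {
        return 1.0;
    }
    let inter = ta.intersection(&tb).count() as f64;
    let union = ta.union(&tb).count() as f64;
    inter / union
}

fn main() {
    // Same shape, one differing table name: 3 shared tokens out of 5 total.
    let score = jaccard("select id from users", "select id from orders");
    println!("token similarity: {score}"); // 0.6
}
```

A 0.57 average therefore means generated queries typically share a bit more than half their tokens with the gold SQL even when they are not exact matches.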

The system performs well on simple queries (SELECT, WHERE, ORDER BY, aggregates) and is actively being improved for complex operations (INTERSECT, EXCEPT, multi-join queries).

See README-spider.md for evaluation setup and scripts.

Architecture

BoltSQL is organized into focused modules:

  • catalog: Schema loading, flattening, and hashing
  • embedding: CPU-based embeddings with on-disk caching
  • retrieval: Cosine similarity and top-k selection
  • prompt: Few-shot prompt construction with schema pruning
  • generator: OpenAI API integration and SQL validation
  • executor: Database connection pooling and query execution
  • logging: Performance timing and file-based logging

Contributing

Contributions are welcome! Areas for improvement:

  • Cloud-based vector retrieval support (e.g., Qdrant, Pinecone)
  • Complex query generation (INTERSECT, EXCEPT, correlated subqueries)
  • Better schema retrieval for large databases
  • Support for additional database dialects
  • Query decomposition for multi-step questions
  • Performance optimizations

License

GNU General Public License v3.0
