This document originally outlined the implementation plan for adding PostgreSQL Full-Text Search (FTS) support to pgsqlite. The implementation has been successfully completed with full tsvector/tsquery support using SQLite's FTS5 extension.
The FTS implementation provides complete PostgreSQL Full-Text Search compatibility using SQLite FTS5 as the backend engine, with comprehensive query translation and type support.
Core Features:
- ✅ Migration v9: FTS schema tables (__pgsqlite_fts_tables, __pgsqlite_fts_columns)
- ✅ Type System: Full tsvector and tsquery types with PostgreSQL wire protocol support
- ✅ CREATE TABLE: Automatic FTS5 virtual table creation for tsvector columns
- ✅ Search Functions: to_tsvector(), to_tsquery(), plainto_tsquery()
- ✅ Query Operations: @@ operator translation to SQLite FTS5 MATCH
- ✅ Data Operations: INSERT, UPDATE, DELETE with automatic tsvector population
Advanced Features:
- ✅ Complex Query Translation: tsquery to FTS5 syntax (AND, OR, NOT, phrases)
- ✅ Table Alias Resolution: Proper handling of aliases (d.search_vector @@ query)
- ✅ SQL Parser Compatibility: Custom pgsqlite_fts_match() function to avoid conflicts
- ✅ DELETE/UPDATE with FTS: Full WHERE clause FTS search support
- ✅ Comprehensive Testing: Integration tests covering all operations and edge cases
Files Implemented:
src/migration/registry.rs- Migration v9 registrationsrc/translator/fts_translator.rs- Core FTS translation logicsrc/functions/fts_functions.rs- PostgreSQL FTS functionssrc/types/type_mapper.rs- tsvector/tsquery type mappingstests/test_fts_translator.rs- Comprehensive unit teststests/sql/features/test_fts_functions.sql- Integration tests
Add new columns to __pgsqlite_schema table:
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_table_name TEXT;
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_config TEXT DEFAULT 'english';
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_weights TEXT; -- JSON array of weight mappingsCreate FTS metadata table:
CREATE TABLE __pgsqlite_fts_metadata (
table_name TEXT,
column_name TEXT,
fts_table_name TEXT,
config_name TEXT,
tokenizer TEXT,
stop_words TEXT, -- JSON array
PRIMARY KEY (table_name, column_name)
);tsvector→ TEXT column in main table (stores metadata) + FTS5 shadow tabletsquery→ Translated to FTS5 MATCH syntax at query timeregconfig→ Mapped to FTS5 tokenizer configurations
tsvectorOID: 3614tsqueryOID: 3615regconfigOID: 3734
For each tsvector column, create a shadow FTS5 table:
CREATE VIRTUAL TABLE __pgsqlite_fts_{table}_{column} USING fts5(
content, -- The indexed text
weights, -- A,B,C,D weights as space-separated positions
lexemes UNINDEXED,-- Original lexemes for exact reconstruction
tokenize = 'porter unicode61' -- Configurable based on regconfig
);-- PostgreSQL:
INSERT INTO articles (id, title, content_tsv)
VALUES (1, 'Title', to_tsvector('english', 'The quick brown fox'));
-- Translated to:
BEGIN;
INSERT INTO articles (id, title, content_tsv)
VALUES (1, 'Title', '{"fts_ref": "__pgsqlite_fts_articles_content_tsv", "config": "english"}');
INSERT INTO __pgsqlite_fts_articles_content_tsv (rowid, content, weights, lexemes)
VALUES (1, 'The quick brown fox', '', '{"brown":{"pos":[3],"weight":"D"},"fox":{"pos":[4],"weight":"D"},"quick":{"pos":[2],"weight":"D"}}');
COMMIT;-- PostgreSQL:
SELECT * FROM articles WHERE content_tsv @@ to_tsquery('english', 'quick & fox');
-- Translated to:
SELECT DISTINCT a.* FROM articles a
JOIN __pgsqlite_fts_articles_content_tsv f ON a.rowid = f.rowid
WHERE f.content MATCH 'quick AND fox';-- PostgreSQL:
UPDATE articles SET content_tsv = to_tsvector('english', 'New content')
WHERE id = 1;
-- Translated to:
BEGIN;
UPDATE articles SET content_tsv = '{"fts_ref": "__pgsqlite_fts_articles_content_tsv", "config": "english"}'
WHERE id = 1;
UPDATE __pgsqlite_fts_articles_content_tsv
SET content = 'New content',
weights = '',
lexemes = '{"content":{"pos":[2],"weight":"D"},"new":{"pos":[1],"weight":"D"}}'
WHERE rowid = (SELECT rowid FROM articles WHERE id = 1);
COMMIT;- to_tsvector(regconfig, text) → Creates tsvector
- to_tsquery(regconfig, text) → Creates tsquery
- plainto_tsquery(regconfig, text) → Creates tsquery from plain text
- phraseto_tsquery(regconfig, text) → Creates phrase query
- websearch_to_tsquery(regconfig, text) → Web search syntax
- ts_rank(tsvector, tsquery) → Calculate relevance
- ts_rank_cd(tsvector, tsquery) → Cover density ranking
- ts_headline(regconfig, text, tsquery) → Generate snippets
@@(match) → FTS5 MATCH@>(contains) → Custom implementation<@(contained by) → Custom implementation||(concatenate) → Merge tsvectors
- Migration v9: Add FTS schema tables
- Create FtsTranslator module
- Implement type recognition for tsvector/tsquery
- Basic CREATE TABLE translation with tsvector columns
- Implement to_tsvector() function
- Implement to_tsquery() function
- INSERT statement translation
- SELECT with @@ operator translation
- UPDATE/DELETE translation
- ts_rank() and ts_rank_cd() functions
- ts_headline() function
- Additional query functions (plainto_tsquery, etc.)
- Multiple language configurations
- Custom stop words
- Weight support (setweight function)
- Position information preservation
- Query plan optimization for FTS queries
- Caching of parsed tsvectors
- Batch operation optimizations
- Performance benchmarking
Create benchmark suite comparing:
- Pure SQLite FTS5 performance
- pgsqlite FTS implementation
- Native PostgreSQL FTS (for reference)
// File: tests/benchmark_fts.rs
#[bench]
fn bench_fts_insert_baseline_sqlite() {
// Direct SQLite FTS5 inserts
// Measure: inserts/second
}
#[bench]
fn bench_fts_insert_pgsqlite() {
// PostgreSQL protocol FTS inserts through pgsqlite
// Measure: inserts/second and overhead percentage
}
#[bench]
fn bench_fts_search_simple() {
// Single term searches
// Compare: SQLite MATCH vs pgsqlite @@ operator
}
#[bench]
fn bench_fts_search_complex() {
// Complex boolean queries
// Compare: query translation overhead
}
#[bench]
fn bench_fts_ranking() {
// ts_rank() performance
// Measure: overhead of rank calculation
}
#[bench]
fn bench_fts_mixed_workload() {
// 70% searches, 20% inserts, 10% updates
// Measure: real-world performance impact
}- Overhead Percentage:
(pgsqlite_time - sqlite_time) / sqlite_time * 100 - Operations per Second: For each operation type
- Memory Usage: Shadow table overhead
- Translation Time: Time spent in query translation
- Cache Hit Rate: For translated queries and tsvectors
- INSERT overhead: < 200% (similar to current INSERT performance)
- Simple SELECT overhead: < 150%
- Complex query overhead: < 300%
- Ranking function overhead: < 100%
- Memory overhead: < 2x the text size
#!/bin/bash
# File: tests/runner/run_fts_benchmarks.sh
# Setup test data
echo "Creating test dataset..."
sqlite3 bench_fts.db < tests/sql/fts/create_benchmark_data.sql
# Run SQLite baseline
echo "Running SQLite FTS5 baseline..."
time sqlite3 bench_fts.db < tests/sql/fts/benchmark_queries.sql > sqlite_baseline.txt
# Run pgsqlite
echo "Running pgsqlite FTS..."
./target/release/pgsqlite --database bench_fts.db &
PGSQLITE_PID=$!
sleep 2
time psql -h localhost -p 5432 -U postgres -d postgres \
-f tests/sql/fts/benchmark_queries_pg.sql > pgsqlite_results.txt
kill $PGSQLITE_PID
# Compare results
echo "Analyzing results..."
python3 tests/scripts/analyze_fts_benchmarks.py- Type conversion tests
- Query translation tests
- Function implementation tests
- Full query execution tests
- PostgreSQL client compatibility tests
- Edge cases and error handling
-- Create diverse test dataset
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT,
content TEXT,
tags TEXT[],
search_vector tsvector
);
-- Insert test data with various characteristics:
-- - Different languages
-- - Various document lengths
-- - Special characters and edge cases
-- - Performance testing corpus (100k+ documents)-- Migration v9: Full-Text Search Support
CREATE TABLE IF NOT EXISTS __pgsqlite_fts_metadata (
table_name TEXT,
column_name TEXT,
fts_table_name TEXT,
config_name TEXT,
tokenizer TEXT,
stop_words TEXT,
PRIMARY KEY (table_name, column_name)
);
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_table_name TEXT;
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_config TEXT DEFAULT 'english';
ALTER TABLE __pgsqlite_schema ADD COLUMN fts_weights TEXT;
-- Register new types
INSERT INTO __pgsqlite_type_map (pg_type, sqlite_type, oid)
VALUES
('tsvector', 'TEXT', 3614),
('tsquery', 'TEXT', 3615),
('regconfig', 'TEXT', 3734);- All PostgreSQL FTS operators and core functions work correctly
- Performance overhead is within acceptable limits (see goals above)
- psql and common PostgreSQL clients work without modification
- All tests pass in CI/CD pipeline
- Comprehensive documentation is available
- GIN Index Emulation: Create custom index structures for better performance
- Phrase Search: Enhanced phrase matching beyond FTS5 capabilities
- Custom Dictionaries: User-defined dictionaries and thesaurus support
- Multilingual Support: Better handling of non-English languages
- Streaming Updates: Efficient bulk loading of FTS data