Stretto makes the cost–accuracy search space significantly more navigable by introducing a new physical operator layer, while providing explicit, end-to-end guarantees at the query level.
Use Python v3.13 or later.
# Export your OpenAI API key for query planning
export OPENAI_API_KEY=....
# Update the submodules
git submodule update --init --recursive
# Install dependencies and the package
pip install -r requirements.txt
pip install -e .
# On another terminal: Start the backend servers
# (open the script first to select which models to load on which GPUs)
bash scripts/start_servers.sh
# Optional: Run the demos
python demos/artwork.py
python demos/emails.py
python demos/real_estate.py
python demos/rotowire.py

# Run benchmarks
# Basic usage (requires a GPU device specification to run Stretto):
python scripts/run_benchmark.py --device cuda:0
# Select which approaches to run with --select-executors; options include optim_global (= Stretto), abacus, lotus, optim_local, ...
# Select the datasets with --benchmarks (the paper uses artwork_random_medium, movie_random, email_random, rotowire_random, and ecommerce_random_large)
# Set precision and recall guarantees: --precision-guarantees / --recall-guarantees
# For example:
python scripts/run_benchmark.py --device cuda:0 --benchmarks artwork rotowire --select-executors optim_global lotus --precision-guarantees 0.7 0.9 --recall-guarantees 0.7 0.9
# To plot the results, you can use our plotting script.
# The plots will appear in benchmark_results. For instance:
python scripts/plot_benchmark.py --benchmarks artwork_random_medium rotowire_random movie_random email_random ecommerce_random_large --approaches lotus optim_global
Here is how to obtain the datasets:
- Artwork: Already included; running an artwork query will automatically download the required images from Wikidata.
- Rotowire: Already included, see reasondb/benchmarks/evaluation/files
- Email: Running the benchmark will automatically run the Palimpzest script to download the dataset. See also the Palimpzest instructions.
- Movies: Already included, see reasondb/benchmarks/evaluation/files/movies_1000.csv
- ecommerce: Download the dataset here. We expect the following file structure: /SemBench/ecomm/1/fashion-dataset/...
Random queries are generated from predefined query shapes and operator options. For each benchmark (i.e. artwork, rotowire, email, movie, ecommerce), you can find:
- Query definitions and operator options: reasondb/evaluation/benchmarks/<benchmark_name>.py
  - Example: reasondb/evaluation/benchmarks/artwork.py, which contains:
    - ARTWORK_OPERATOR_OPTIONS: List of possible operators (filters and extracts) that can be used in random query generation
    - ARTWORK_QUERY_SHAPES: Templates defining the structure of randomly generated queries (e.g., 2 operators, 3 operators)
- Query generation: The RandomBenchmark base class generates queries by (see the sketch after this list):
  - Selecting a query shape (number and order of operators)
  - Randomly sampling operators from the operator options
  - Creating valid operator sequences following the shape template
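To make the mechanism concrete, here is a minimal sketch of this generation process. The operator options, query shapes, and the generate_random_query helper below are illustrative assumptions; the actual RandomBenchmark class and the *_OPERATOR_OPTIONS / *_QUERY_SHAPES definitions in the repository may be structured differently.

```python
import random

# Hypothetical operator options: (operator type, natural-language argument).
# The real ARTWORK_OPERATOR_OPTIONS may use a different representation.
OPERATOR_OPTIONS = [
    ("filter", "the artwork depicts a person"),
    ("filter", "the artwork is a landscape"),
    ("extract", "the name of the artist"),
    ("extract", "the century in which the artwork was created"),
]

# Hypothetical query shapes: each template fixes the number and order of operators.
QUERY_SHAPES = [
    ["filter", "extract"],            # 2-operator query
    ["filter", "filter", "extract"],  # 3-operator query
]

def generate_random_query(shapes, options, rng=random):
    """Pick a shape, then fill each slot by sampling a matching operator
    without replacement, yielding a valid operator sequence."""
    shape = rng.choice(shapes)
    remaining = list(options)
    query = []
    for op_type in shape:
        candidates = [op for op in remaining if op[0] == op_type]
        chosen = rng.choice(candidates)
        remaining.remove(chosen)  # avoid duplicate operators in one query
        query.append(chosen)
    return query

print(generate_random_query(QUERY_SHAPES, OPERATOR_OPTIONS))
```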
To analyze the effects of KV cache compression on individual operators without query optimization, use the run_benchmark_single_op_no_opt.py script. It runs single-operator queries (filters and/or extracts) at various compression ratios, testing each physical operator in isolation to expose the quality-cost tradeoff across KV cache compression ratios.
Basic usage:
# Select the benchmark: --benchmark
# Run filter operators only (default)
# Run only extract operators: --only-extracts
# Run all operators (filters + extracts): --all-operators
# Specify which KV methods to compare: --kv-methods (e.g. `kv70B05` refers to the 70B version of the model with compression ratio 0.5)
python scripts/run_benchmark_single_op_no_opt.py --benchmark artwork_random --all-operators --kv-methods kv70B05 kv70B00
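The --kv-methods identifiers encode a model size followed by a compression ratio. As an illustration of this naming scheme, here is a small parser; it is an assumption for exposition, not code from the repository, and only covers the identifier pattern shown above.

```python
import re

def parse_kv_method(name: str):
    """Decode an identifier like 'kv70B05' into (model size, compression ratio),
    assuming 'kv' + model size + ratio digits read as a decimal fraction:
    'kv70B05' -> ('70B', 0.5), 'kv70B00' -> ('70B', 0.0)."""
    m = re.fullmatch(r"kv(\d+B)(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized kv method: {name}")
    size, digits = m.groups()
    return size, float(f"{digits[0]}.{digits[1:]}")

print(parse_kv_method("kv70B05"))  # ('70B', 0.5)
```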
- Artifacts from our runs are provided in benchmark_results.