
perf: switch to fastembed ONNX backend + user-selectable model/backend#29

Merged
iamvirul merged 1 commit into main from perf/lazy-import-startup
Feb 26, 2026

Conversation

@iamvirul
Member

Pull Request

Type of Change

  • ✨ New feature
  • ♻️ Refactor
  • 📖 Documentation update
  • 🔧 Chore (build process, CI/CD, dependency updates)
  • ✅ Test improvement

Description

The MCP server was slow to start (~6.6s) because sentence_transformers (and therefore PyTorch) was imported at module level, delaying the first tool call in every Claude Code session.

This PR replaces the default embedding backend with fastembed (ONNX Runtime), bringing startup from ~6.6s down to ~1.25s. It also adds user-selectable backend and model via environment variables.

Related Issues / PRs

Closes #27
Closes #28

Changes Made

  • src/vecgrep/embedder.py — full rewrite with dual-backend lazy loading:
    • VECGREP_BACKEND=onnx (default): fastembed + ONNX Runtime, ~100ms model load
    • VECGREP_BACKEND=torch: sentence-transformers + PyTorch, supports any HF model
    • VECGREP_MODEL: override the default HuggingFace model
    • All heavy imports deferred to first embed() call
    • Registers isuruwijesiri/all-MiniLM-L6-v2-code-search-512 as a custom fastembed ONNX model
  • pyproject.toml — add fastembed>=0.4.0 runtime dependency
  • tests/test_embedder.py — add TestTorchBackend class + fix TestDetectDevice for lazy import pattern
  • README.md — add Configuration section documenting VECGREP_BACKEND and VECGREP_MODEL
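The dual-backend lazy-loading pattern described above can be sketched roughly as follows. The actual embedder.py is not shown on this page, so the class name (LazyEmbedder) and method details are illustrative; only the env var names, the default model id, and the backend split come from the PR.

```python
# Illustrative sketch of the dual-backend lazy-loading embedder.
# Heavy imports (fastembed / sentence-transformers) are deferred to the
# first embed() call so that importing this module stays cheap.
import os

# Default model registered as a custom fastembed ONNX model (per the PR).
DEFAULT_MODEL = "isuruwijesiri/all-MiniLM-L6-v2-code-search-512"


class LazyEmbedder:
    """Resolves backend/model from env vars; loads nothing until embed()."""

    def __init__(self) -> None:
        self.backend = os.environ.get("VECGREP_BACKEND", "onnx")
        self.model_name = os.environ.get("VECGREP_MODEL", DEFAULT_MODEL)
        self._model = None  # populated lazily on first embed()

    def _load(self) -> None:
        if self.backend == "torch":
            # Opt-in PyTorch path: supports any HuggingFace model,
            # but pays the ~2-3s sentence-transformers load cost.
            from sentence_transformers import SentenceTransformer
            self._model = SentenceTransformer(self.model_name)
        else:
            # Default ONNX path via fastembed (~100ms model load).
            from fastembed import TextEmbedding
            self._model = TextEmbedding(model_name=self.model_name)

    def embed(self, texts):
        if self._model is None:
            self._load()  # heavy imports happen here, not at import time
        if self.backend == "torch":
            return self._model.encode(texts, normalize_embeddings=True)
        return list(self._model.embed(texts))
```

Because the env vars are read in `__init__`, switching backends is as simple as `VECGREP_BACKEND=torch vecgrep ...`, with no code changes.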

Testing

  • Unit tests
  • Manual testing (describe steps below)

All 110 existing tests pass. New tests added:

  • TestTorchBackend: validates shape (1, 384) and unit-norm vectors via VECGREP_BACKEND=torch
  • TestDetectDevice: covers cuda/mps/cpu paths by patching torch.cuda.is_available and torch.backends.mps.is_available directly
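The cuda/mps/cpu fallback logic the tests exercise can be sketched like this. The real code patches torch.cuda.is_available and torch.backends.mps.is_available; here the torch module is injected as a parameter so the same branching can be shown (and tested) without PyTorch installed, and both helper names are hypothetical.

```python
# Hypothetical sketch of device detection with the torch module injected,
# mirroring the cuda -> mps -> cpu fallback order the tests cover.
from types import SimpleNamespace


def detect_device(torch_mod) -> str:
    """Return 'cuda', 'mps', or 'cpu' based on availability checks."""
    if torch_mod.cuda.is_available():
        return "cuda"
    if torch_mod.backends.mps.is_available():
        return "mps"
    return "cpu"


def fake_torch(cuda: bool, mps: bool):
    """Build a stand-in for the torch module with fixed availability."""
    return SimpleNamespace(
        cuda=SimpleNamespace(is_available=lambda: cuda),
        backends=SimpleNamespace(mps=SimpleNamespace(is_available=lambda: mps)),
    )
```

Injecting (or patching) the availability checks keeps the tests fast and deterministic on machines without a GPU.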

Startup benchmark:

Before: python -c "import vecgrep.server"  → ~6.6s
After:  python -c "import vecgrep.server"  → ~1.25s  (5× faster)
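A benchmark like the one above can be reproduced with a small timing helper; this is a generic sketch (using a stdlib module as a stand-in, since vecgrep may not be installed where this runs), not the PR's actual measurement script.

```python
# Generic sketch for timing module import cost, the metric benchmarked above.
import importlib
import sys
import time


def time_import(name: str) -> float:
    """Return wall-clock seconds to import `name` from a cold cache."""
    sys.modules.pop(name, None)  # drop any cached copy to force a fresh import
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Substitute "vecgrep.server" to reproduce the PR's before/after numbers.
    print(f"import json took {time_import('json') * 1000:.1f} ms")
```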

Checklist

  • My code follows the project's style guidelines (ruff passes)
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation (if applicable)
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

perf: switch to fastembed ONNX backend + user-selectable model/backend

- Replace sentence-transformers module-level import with dual-backend
  lazy loading system (fastembed ONNX default, torch opt-in)
- Startup time: ~6.6s → ~1.25s (5× improvement)
- ONNX model load: ~100ms vs ~2-3s for PyTorch on first embed() call
- Register isuruwijesiri/all-MiniLM-L6-v2-code-search-512 as custom
  fastembed model via TextEmbedding.add_custom_model() with ONNX files
  from HuggingFace
- Add VECGREP_BACKEND env var (onnx|torch) for backend selection
- Add VECGREP_MODEL env var for custom HuggingFace model selection
- Add fastembed>=0.4.0 to runtime dependencies
- Update tests to cover torch backend and all device detection paths
- Document new env vars in README

Closes #28
Closes #27
@iamvirul iamvirul self-assigned this Feb 26, 2026
@iamvirul iamvirul merged commit dcfa869 into main Feb 26, 2026
1 check passed
@codecov

codecov bot commented Feb 26, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.

Files with missing lines    Patch %    Lines
src/vecgrep/embedder.py     93.93%     2 Missing ⚠️


@iamvirul iamvirul mentioned this pull request Feb 28, 2026
