feat: BYOK cloud embedding providers (OpenAI, Voyage, Gemini) by iamvirul · Pull Request #40 · VecGrep/vecgrep

iamvirul · 2026-03-04T18:17:54Z

Pull Request

Type of Change

✨ New feature
♻️ Refactor
📖 Documentation update
✅ Test improvement

Description

Adds Bring Your Own Key (BYOK) support for cloud embedding providers, allowing users to choose between the existing local model and three cloud providers (OpenAI, Voyage AI, Google Gemini) when indexing a codebase.

Previously, VecGrep only supported a single local embedding backend. This change introduces a strategy-pattern EmbeddingProvider abstract base class so new providers can be added by implementing a single interface. Provider selection is wired through the MCP index_codebase tool.
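As a rough illustration of the strategy-pattern arrangement described above, the sketch below pairs an abstract base class with a name-keyed registry. The `name` and `dims` attributes, the `embed` method signature, the `ToyProvider` class, and the `_REGISTRY` dict are assumptions made for this example; only `EmbeddingProvider` and `get_provider(name)` are names taken from this PR, and their real signatures may differ.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class EmbeddingProvider(ABC):
    """Strategy interface: each backend turns texts into fixed-size vectors."""

    name: str  # assumed attribute: registry key for this backend
    dims: int  # assumed attribute: length of each embedding vector

    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Return one vector of length `dims` per input text."""


class ToyProvider(EmbeddingProvider):
    """Hypothetical stand-in backend, not part of VecGrep."""

    name = "toy"
    dims = 4

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Deterministic fake embedding: the text length repeated `dims` times.
        return [[float(len(t))] * self.dims for t in texts]


# Assumed registry shape: provider name -> provider class.
_REGISTRY: dict[str, type[EmbeddingProvider]] = {ToyProvider.name: ToyProvider}


def get_provider(name: str) -> EmbeddingProvider:
    """Look up a provider class by name and instantiate it."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(_REGISTRY))
        raise ValueError(f"Unknown provider {name!r}; known providers: {known}") from None
```

With this shape, adding a backend means writing one subclass and registering it; callers only ever depend on the base-class interface.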

The vector store now persists embedding dimensionality in the per-project meta table so the LanceDB schema is created with the correct dimensions for whichever provider is in use (384, 1024, 1536, or 3072). A provider lock prevents silent dimension mismatches: once a project is indexed with one provider, re-indexing with a different provider requires force=True.
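The provider lock can be pictured as a small check against the stored meta values. The sketch below is a simplified model, not the code in `_resolve_provider`: the function name `resolve_provider`, the plain dict standing in for the meta table, the return value, and the pairing of provider names with dimensions are all assumptions for illustration.

```python
from __future__ import annotations


def resolve_provider(meta: dict, requested: str, requested_dims: int, force: bool = False) -> bool:
    """Apply the provider lock.

    `meta` is a plain dict standing in for the per-project meta table.
    Returns True when an existing chunks table would have to be dropped and
    recreated with the new dimensionality, False otherwise.
    """
    stored = meta.get("provider")
    if stored is None:
        # First index for this project: record the provider and its dims.
        meta.update(provider=requested, dims=requested_dims)
        return False
    if stored == requested:
        # Same provider as before: the existing schema is still valid.
        return False
    if not force:
        # Different provider without force: refuse rather than mix dimensions.
        raise RuntimeError(
            f"Project is indexed with provider {stored!r} ({meta.get('dims')} dims); "
            f"pass force=True to re-index with {requested!r}."
        )
    # Forced switch: update the lock; the caller must rebuild the table.
    meta.update(provider=requested, dims=requested_dims)
    return True
```

A usage example, with an assumed pairing of names and dimensions:

```python
meta = {}
resolve_provider(meta, "local", 384)                # first index, lock is set
resolve_provider(meta, "openai", 1536, force=True)  # explicit switch, table rebuilt
```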

Related Issues / PRs

Based on work from: https://github.com/Kavirubc/VecGrep/tree/feat/byok-embedding-providers

Changes Made

src/vecgrep/embedder.py
- Introduced EmbeddingProvider ABC
- Refactored local ONNX/torch logic into LocalProvider
- Added OpenAIProvider, VoyageProvider, GeminiProvider with lazy-loaded clients
- Added get_provider(name) registry
- Kept backward-compatible embed() free function
src/vecgrep/store.py
- Added _chunks_schema(dims) factory for dynamic vector size
- Updated VectorStore.__init__ to accept dims and read stored dims from meta
- Added _get_meta / _set_meta helpers
- Added set_provider_meta / get_provider_meta
- Added drop_and_recreate_chunks(dims) for force-switching providers
- Updated get_stats() to return provider, model, and dims
src/vecgrep/server.py
- index_codebase now accepts a provider parameter
- _resolve_provider enforces per-project provider lock
- _do_index passes provider dims to the store and saves provider meta
- Guarded watch=True against cloud providers
- LiveSyncHandler skips non-local providers to prevent unbounded API usage
pyproject.toml
- Added optional extras:
  - vecgrep[openai]
  - vecgrep[voyage]
  - vecgrep[gemini]
  - vecgrep[cloud]
tests/test_providers.py
- Full BYOK test suite
- Provider registry tests
- LocalProvider tests
- Cloud providers tested with mocked API calls
- Missing-key and missing-package error paths validated
README.md / CHANGELOG.md
- Updated documentation
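The live-sync guard listed under src/vecgrep/server.py above can be sketched as two small checks. This is a hedged illustration only: the helper names `validate_watch` and `should_live_sync`, the `"local"` provider key, and the choice of `ValueError` are assumptions, not the PR's actual code.

```python
LOCAL_PROVIDER = "local"  # assumed registry key for the local backend


def validate_watch(provider: str, watch: bool) -> None:
    """Reject watch mode for cloud providers before indexing starts."""
    if watch and provider != LOCAL_PROVIDER:
        raise ValueError(
            f"watch=True is only supported with the local provider, not {provider!r}; "
            "every file change would trigger a paid API call."
        )


def should_live_sync(provider: str) -> bool:
    """Return True only for providers that are safe to re-embed on each file event."""
    return provider == LOCAL_PROVIDER
```

The first check fails fast at the tool boundary; the second is a defensive filter inside the file-event handler, so a cloud-indexed project is never re-embedded in the background.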

Testing

Unit tests
Integration tests
Manual testing

Manual Testing

LocalProvider tested end-to-end via existing integration tests
Cloud providers tested using mocked openai, voyageai, and google-genai clients
Missing-key and missing-package errors verified to raise clear RuntimeError messages
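The error paths above follow from lazy client loading: neither the API key nor the SDK import is touched until the first call. A minimal sketch of that pattern follows, assuming a hypothetical `LazyClientProvider` class, a `_get_client` helper, and caller-supplied package and environment-variable names. The real providers' names, messages, and client construction are not shown in this PR text.

```python
import importlib
import os


class LazyClientProvider:
    """Hypothetical provider that defers key lookup and SDK import until first use."""

    def __init__(self, package: str, env_var: str):
        self._package = package  # SDK module name to import on demand
        self._env_var = env_var  # environment variable holding the API key
        self._client = None

    def _get_client(self):
        if self._client is not None:
            return self._client  # already initialised, reuse it
        api_key = os.environ.get(self._env_var)
        if not api_key:
            raise RuntimeError(f"{self._env_var} is not set; export your API key first.")
        try:
            module = importlib.import_module(self._package)
        except ImportError as exc:
            raise RuntimeError(
                f"Package {self._package!r} is not installed; install the matching extra."
            ) from exc
        # A real provider would build an SDK client here; this sketch just
        # keeps the imported module and key together.
        self._client = (module, api_key)
        return self._client
```

In a test, patching the environment and the import (or the module in `sys.modules`) is enough to exercise both failure paths without network access.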

Checklist

Code follows project style guidelines (ruff passes)
Self-review completed
Code commented where necessary
Documentation updated
No new warnings introduced
Tests added to verify feature
All unit tests pass locally
Any dependent downstream changes merged

Screenshots

N/A — CLI/MCP tool changes only.

- README: document cloud providers (OpenAI, Voyage, Gemini) with API key env vars, optional install extras, provider lock behaviour, updated tool signatures (index_codebase gains provider param), and updated get_index_status example showing provider/model/dims fields
- CHANGELOG: add [Unreleased] entry covering BYOK cloud providers, strategy-pattern EmbeddingProvider ABC, dynamic vector dims, provider lock, live-sync guard, VectorStore meta helpers, and new test suite

codecov · 2026-03-04T18:41:52Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


iamvirul merged commit b558a80 into main Mar 4, 2026
1 check passed

iamvirul mentioned this pull request Mar 4, 2026

chore: release v1.7.0 — BYOK embedding providers #41

Merged

16 tasks

iamvirul merged 1 commit into main from feat/byok-embedding-providers
