fix: register embedding functions in extension SQL and install v2.0.0 schema in Docker by jjohare · Pull Request #136 · ruvnet/RuVector

jjohare · 2026-01-27T15:12:26Z

Summary

- Fix version mismatch: the Dockerfile now copies both `ruvector--0.1.0.sql` and `ruvector--2.0.0.sql` into the image. Previously only 0.1.0 was installed, but `ruvector.control` declares `default_version = '2.0.0'`, causing `CREATE EXTENSION ruvector` to fail with "no installation script for version 2.0.0".
- Register embedding functions: both extension SQL files had a comment stub for embedding functions instead of actual `CREATE FUNCTION` declarations. The Docker build compiles fastembed into `ruvector.so` via `--features embeddings`, and all `_wrapper` symbols are present in the binary, but PostgreSQL never knew about them. Replaced the stubs with 11 function declarations (10 C functions + 1 SQL convenience function).
- Fix volatility markers: `embeddings.sql` incorrectly marked stateful functions (`ruvector_load_model`, `ruvector_embedding_models`, `ruvector_embed`, etc.) as `IMMUTABLE`. Changed to `VOLATILE` where the function loads models, mutates cache state, or returns mutable results. Only `ruvector_embedding_dims` remains `IMMUTABLE` (pure dimension lookup).
- Add `ruvector_embed_vec()` convenience function: `ruvector_embed()` returns `real[]` (PostgreSQL array format `{...}`), but the `ruvector` type expects bracket notation `[...]`. This SQL wrapper handles the conversion so users can do `ruvector_embed_vec('some text')::ruvector(384)` in a single call.
- Add embedding smoke test to `init.sql`: tests `ruvector_default_model()` and `ruvector_embedding_dims()` during container initialization.
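The `{...}` to `[...]` conversion that `ruvector_embed_vec()` performs can be illustrated outside the database. The following is a minimal Python sketch of the text-format rewrite only, not the SQL used in this PR; the helper name `pg_array_to_bracket` is hypothetical and it assumes a one-dimensional array literal.

```python
def pg_array_to_bracket(array_text):
    """Rewrite a one-dimensional PostgreSQL array literal such as
    '{0.1,0.2}' into bracket notation '[0.1,0.2]'.

    Hypothetical helper for illustration; it is not part of ruvector.
    """
    stripped = array_text.strip()
    # A PostgreSQL array literal is delimited by curly braces.
    if not (stripped.startswith("{") and stripped.endswith("}")):
        raise ValueError("not a PostgreSQL array literal: %r" % array_text)
    inner = stripped[1:-1]
    # Nested (multi-dimensional) arrays are out of scope for this sketch.
    if "{" in inner or "}" in inner:
        raise ValueError("nested arrays are not handled in this sketch")
    return "[" + inner + "]"


if __name__ == "__main__":
    print(pg_array_to_bracket("{0.1,0.2,0.3}"))
```

Only the outer delimiters change; the comma-separated elements are passed through untouched.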

Root Cause

The extension SQL files (`ruvector--0.1.0.sql` / `ruvector--2.0.0.sql`) were hand-crafted as a fallback because `cargo pgrx schema` is unreliable in Docker. The embedding function declarations from `sql/embeddings.sql` were never merged into these files; they contained only a note saying "build with --features embeddings", even though the Dockerfile does exactly that.
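A comment stub and a real declaration are easy to tell apart mechanically. The sketch below is one possible check, written in Python under stated assumptions: it strips `--` line comments with a simple regex (which would also mangle string literals containing `--`), then looks for `CREATE FUNCTION` statements. The helper names are hypothetical, and the expected list in the demo covers only the function names mentioned in this PR description, not all 11 declarations.

```python
import re

# Matches CREATE FUNCTION / CREATE OR REPLACE FUNCTION and captures the
# (optionally schema-qualified) unquoted function name.
_CREATE_FUNCTION = re.compile(
    r"CREATE\s+(?:OR\s+REPLACE\s+)?FUNCTION\s+(?:\w+\.)?(\w+)",
    re.IGNORECASE,
)


def declared_functions(sql_text):
    """Return the lower-cased names of functions declared in sql_text."""
    # Drop '--' line comments so a commented-out stub is not counted.
    without_comments = re.sub(r"--[^\n]*", "", sql_text)
    return {m.group(1).lower() for m in _CREATE_FUNCTION.finditer(without_comments)}


def missing_functions(sql_text, expected):
    """Return the expected function names that sql_text does not declare."""
    declared = declared_functions(sql_text)
    return sorted(name for name in expected if name.lower() not in declared)


if __name__ == "__main__":
    stub = "-- Embedding functions: build with --features embeddings\n"
    names = ["ruvector_embed", "ruvector_embedding_dims", "ruvector_default_model"]
    print(missing_functions(stub, names))
```

Run against an install script that contains only the comment stub, the check reports every expected name as missing.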

Files Changed

| File | Change |
|------|--------|
| `crates/ruvector-postgres/Dockerfile` | Copy both 0.1.0 and 2.0.0 SQL; verify embedding functions in build |
| `crates/ruvector-postgres/sql/ruvector--0.1.0.sql` | Replace embedding comment stub with 11 function declarations |
| `crates/ruvector-postgres/sql/ruvector--2.0.0.sql` | Replace embedding comment stub with 11 function declarations |
| `crates/ruvector-postgres/sql/embeddings.sql` | Fix IMMUTABLE→VOLATILE, add `ruvector_embed_vec()`, add _wrapper docs |
| `crates/ruvector-postgres/docker/init.sql` | Add embedding function smoke test |
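The version mismatch fixed in the Dockerfile follows from PostgreSQL's naming rule for install scripts: `CREATE EXTENSION` looks for `<extension>--<version>.sql` matching the control file's `default_version`. As a rough illustration, the Python sketch below checks that rule against a list of file names. The helper names are hypothetical, and the sketch deliberately ignores update scripts (such as a `0.1.0--2.0.0` upgrade path), which PostgreSQL can also use to reach a version.

```python
import re


def default_version(control_text):
    """Extract default_version from the text of an extension control file."""
    match = re.search(
        r"^\s*default_version\s*=\s*'([^']+)'", control_text, re.MULTILINE
    )
    return match.group(1) if match else None


def install_script_present(control_text, extension, sql_filenames):
    """Return True if a direct install script exists for default_version.

    Update scripts are not considered in this sketch.
    """
    version = default_version(control_text)
    if version is None:
        return False
    return "%s--%s.sql" % (extension, version) in set(sql_filenames)


if __name__ == "__main__":
    control = "comment = 'example'\ndefault_version = '2.0.0'\n"
    # Only the 0.1.0 script is shipped, so the check fails.
    print(install_script_present(control, "ruvector", ["ruvector--0.1.0.sql"]))
```

With only `ruvector--0.1.0.sql` in the list the check fails, which mirrors the state of the image before this change.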

Test plan

- Build Docker image: `docker build -f crates/ruvector-postgres/Dockerfile -t ruvector-postgres .`
- Verify `CREATE EXTENSION ruvector` succeeds (v2.0.0)
- Verify `SELECT ruvector_embed('hello world')` returns 384-dim `real[]`
- Verify `SELECT ruvector_embed_vec('hello world')` returns `ruvector` type
- Verify `SELECT * FROM ruvector_embedding_models()` lists 6 models
- Verify `SELECT ruvector_default_model()` returns `all-MiniLM-L6-v2`
- Verify `init.sql` smoke tests pass in container logs

🤖 Generated with claude-flow

Remove tests for PostgreSQL 14, 15, and 16 from CI workflows. Only PostgreSQL 17 is now tested to simplify the CI matrix. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The benchmark workflow was failing because pgrx-pg-sys requires PostgreSQL development headers. Added PostgreSQL 17 installation and pgrx initialization to both the main benchmarks job and the baseline comparison job. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add README.md to patches/ explaining the critical hnsw_rs patch - Run cargo fmt on ruvector-postgres to fix formatting issues The patches/hnsw_rs directory is REQUIRED for builds as it provides a WASM-compatible version of hnsw_rs (using rand 0.8 instead of 0.9). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add user-friendly introduction explaining: - What the library does in plain language - Who should use it (use cases table) - Key benefits with concrete examples - Simple "how it works" diagram Keeps all technical details intact while making the project more accessible to newcomers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Brief section highlighting the self-learning query DAG with: - Key benefits (automatic optimization, 50-80% latency reduction) - Core features (7 attention mechanisms, SONA learning, MinCut control) - Quick code example - Link to full documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Create npm/packages/rudag with TypeScript SDK
- WASM-accelerated DAG operations via ruvector-dag-wasm
- IndexedDB persistence for browser environments
- MemoryStorage fallback for Node.js
- CLI tool for DAG operations (rudag command)
- Restore patches/hnsw_rs for WASM builds

Features:
- DagOperator enum (SCAN, FILTER, PROJECT, JOIN, etc.)
- AttentionMechanism enum (TOPOLOGICAL, CRITICAL_PATH, UNIFORM)
- RuDag class with auto-save to IndexedDB
- BrowserDagManager for browser-specific management
- NodeDagManager with file-based persistence

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

SECURITY FIXES (HIGH):
- Path traversal prevention in CLI (validateFilePath)
- Path traversal prevention in FileDagStorage (ensureWithinBase)
- Input validation for all public APIs (isValidDagId, isValidStorageId)
- Type guards for WASM output (isCriticalPath) to prevent prototype pollution
- Extension restrictions (.dag, .json only) in CLI

MEMORY LEAK FIXES (HIGH):
- Static methods now properly close owned storage connections
- WASM cleanup on initialization failures
- Cache invalidation on dispose()

PERFORMANCE OPTIMIZATIONS:
- Single-transaction IndexedDB saves (atomic read-modify-write)
- Batch save API for bulk operations (saveBatch)
- Result caching for topoSort and criticalPath
- Lazy module loading in CLI for faster startup

OTHER IMPROVEMENTS:
- onblocked/onversionchange handlers for IndexedDB
- Background save error handler (onSaveError option)
- Comprehensive input validation with clear error messages
- Convert sync to async file operations in Node.js

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Easy introduction to DAGs and rudag - Features and benefits overview - Detailed use cases (SQL optimizer, task scheduler, build system, ETL) - Integration examples (Express, React, D3.js, Bull, GraphQL, RxJS) - CLI documentation - Full API reference 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Lead with the problem, not technical jargon - Visual task dependency diagram - Show what each method answers - Real-world use cases with emojis - Before/after comparison table 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Self-Learning Optimization with ML-inspired attention - WASM-Accelerated Performance (Rust/WebAssembly) - Automatic Cycle Detection - Critical Path Analysis - Zero-Config Persistence - Serialization & Interop 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Quote-style hook question - 3-line code snippet showing value immediately - Box-style ASCII diagram (more professional) - Question/Method/Answer table format - Expanded use cases with examples - Added Game AI and Workflow Engines 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Description: Include key search terms (DAG, topological sort, critical path) Keywords: Expanded from 12 → 32 high-traffic terms including: - Graph terms: dag, directed-acyclic-graph, topological-sort, critical-path - Use cases: task-scheduler, workflow-engine, pipeline, etl, build-system - Tech: wasm, rust, typescript, indexeddb - Features: self-learning, bottleneck, performance 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Feature comparison table showing: - Performance (WASM advantage) - Unique features (critical path, attention, persistence) - TypeScript support - Bundle sizes Plus 'When to Use What' guide for choosing the right library 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Remove wasm-pack generated .gitignore files that were blocking npm from including the pkg/ and pkg-node/ WASM binaries. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Comprehensive documentation for hooks including:
- init, session-start, session-end
- pre-edit, post-edit with success/error recording
- pre-command, post-command with risk analysis
- route for agent recommendations
- remember/recall for vector memory
- suggest-context for relevant context
- stats for intelligence statistics
- swarm-recommend for task routing
- Configuration example for .claude/settings.json
- Explanation of how self-learning works

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds Intelligence class with self-learning capabilities:
- Q-learning for agent routing
- Vector embeddings for semantic memory
- Command classification and risk analysis
- File sequence prediction

Hooks commands added:
- init, stats, session-start, session-end
- pre-edit, post-edit, pre-command, post-command
- route, suggest-context, remember, recall
- pre-compact, swarm-recommend, async-agent
- lsp-diagnostic, track-notification

Bumps version to 0.1.39

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…EADME Updates all hook commands to use npx ruvector instead of ruvector for portability (no global install required). Updated files: - .claude/settings.json - all hooks now use npx ruvector - CLAUDE.md - documentation updated to npx ruvector 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The hooks init command now generates settings.json with npx ruvector hooks instead of ruvector hooks, ensuring commands work without global installation. Bumps to v0.1.40 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Updates hooks to use new object-based format: - matcher: { tools: [...] } instead of string - hooks: [{ type: "command", command: "..." }] instead of string array Required by Claude Code's updated hooks schema. Bumps to v0.1.41 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Format fixes based on Claude Code validation: - matcher: string regex (e.g., "Edit|Write|MultiEdit") not object - SessionStart/Stop: require { hooks: [...] } wrapper Bumps to v0.1.42 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add chrono dependency to Cargo.toml - Replace pgrx::TimestampWithTimeZone with chrono::Utc strings - Fix temporary reference error in analysis.rs 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Proposes two-tier dynamic cut architecture based on arXiv:2601.09139 (Goranci, Henzinger, Kiss, Momeni, Zöcklein, SODA 2026):
- Tier 1: j-Tree hierarchy for O(n^ε) approximate cut queries
- Tier 2: Existing exact min-cut (arXiv:2512.13105) for verification

Key benefits:
- Broader query support (sparsest cut, multi-way cut, multi-cut)
- Vertex-split-tolerant cut sparsifier with poly-log recourse
- Two-tier strategy: fast approximate + exact verification
- Integration path with coherence gate (ADR-001)

Extends ADR-002 with cutting-edge techniques:
- Predictive dynamics: SNN predicts updates before they happen
- Neural sparsification: SpecNet + DSpar for 90% edge reduction
- Lazy hierarchical evaluation: demand-paged j-tree levels
- Warm-start cut-matching: reuse computation across updates
- 256-core parallel distribution: leverage agentic chip
- Streaming sketch fallback: O(n log n) space for n > 100K

Target: sub-microsecond approximate queries, <100μs exact verification

Integrates @ruvnet/bmssp for j-tree acceleration:
- O(m·log^(2/3) n) via path-cut duality (beats O(n log n))
- WasmNeuralBMSSP for learned edge importance/sparsification
- Multi-source queries for terminal-based operations
- 27KB WASM enables browser/edge deployment
- 10-84x speedup over JavaScript implementations

Key integration points:
- BmsspJTreeLevel: WASM-backed j-tree levels
- BmsspNeuralSparsifier: embedding-based edge selection
- Hybrid deployment: BMSSP queries + native exact verification

Implements SOTA optimizations from ADR-002-addendum:
- dspar.rs: Degree-based presparse (DSpar algorithm)
  - Effective resistance approximation via degree product
  - 5.9x speedup for initial sparsification
  - Configurable sparsity ratio and thresholds
- cache.rs: LRU cache for path/cut distances
  - Prefetch based on access patterns
  - SIMD-ready distance array operations
  - Configurable capacity and eviction policy
- mod.rs: Module exports and unified interface

- jtree_bench.rs: Comprehensive benchmarks for j-tree implementation
  - Query benchmarks (point-to-point, multi-terminal, all-pairs)
  - Update benchmarks (insert, delete, batch)
  - Scaling benchmarks (verify O(n^ε) complexity)
  - Memory benchmarks (full vs lazy hierarchy)
- Cargo.toml: Add benchmark configuration and dependencies
- Cargo.lock: Update lockfile with new dependencies

Security:
- BMSSP-SECURITY-REVIEW.md: Comprehensive WASM security audit
  - Risk assessment matrix for FFI boundary
  - Input validation recommendations
  - Resource exhaustion mitigations

Optimization:
- simd_distance.rs: SIMD-accelerated distance array operations
  - Vectorized min/max/sum operations
  - Cache-line aligned memory access

Tests:
- jtree_tests.rs: Comprehensive test suite
  - Unit tests for LazyLevel transitions
  - Integration tests for TwoTierCoordinator
  - Property-based tests for approximation guarantees

- pool.rs: Pool allocator for frequent allocations
  - Reduces allocation overhead in hot paths
  - Configurable pool size and growth factor
- jtree_tests.rs: Enhanced test coverage
  - Additional edge cases for hierarchy operations
  - Improved property-based test assertions

Core implementation of ADR-002 dynamic hierarchical j-tree:
- mod.rs: Module exports, JTreeConfig, feature gates
- level.rs: BmsspJTreeLevel with path-cut duality
  - min_cut(s, t) via shortest path in dual
  - multi_terminal_cut for k terminals
  - LRU cache for distance queries
- hierarchy.rs: LazyJTreeHierarchy
  - Demand-paged level materialization
  - Warm-start recomputation from dirty state
  - O(n^ε) amortized updates
- sparsifier.rs: DynamicCutSparsifier
  - Vertex-split-tolerant with poly-log recourse
  - Forest packing for edge sampling
  - Degree-based presparse integration
- coordinator.rs: TwoTierCoordinator
  - Routes between approximate (Tier 1) and exact (Tier 2)
  - Configurable escalation triggers
  - Cross-tier result caching
- lib.rs: Add jtree module with feature gate

- wasm_batch.rs: Batch WASM operations for reduced FFI overhead
  - Pre-allocate WASM memory for bulk transfers
  - TypedArray batching for distance arrays
  - Minimizes JS-WASM boundary crossings
- lib.rs: Update module exports for optimization features

- benchmark.rs: Benchmark utilities for performance profiling
  - Throughput measurement helpers
  - Latency histogram tracking
  - Memory usage estimation
- coordinator.rs: Additional safety checks and error handling
- hierarchy.rs: Refined level management
- lib.rs: Export new optimization modules

- coordinator.rs: Fixed to work with JTreeHierarchy
  - Lazy initialization pattern with ensure_built()
  - EscalationPolicy enum (Never, Always, LowConfidence, etc.)
  - TierMetrics for usage tracking
  - 14 coordinator-specific tests passing
- mod.rs: Export coordinator types
- benchmark.rs: Minor refinements
- parallel.rs: Minor refinements

All 50 jtree tests now passing.

…F2Gvs Reviewed and verified: - All CI builds pass (linux, darwin, windows) - Crates published: ruvector-mincut@0.1.30, ruvector-mincut-wasm@0.1.29, ruvector-mincut-node@0.1.29 - NPM packages published: @ruvnet/bmssp@1.0.0 - Code compiles cleanly with workspace version 2.0.1 - BMSSP WASM integration and j-Tree optimizations verified

… schema in Docker

The Docker image builds ruvector.so with --features embeddings, compiling fastembed into the binary. However, the extension SQL files (ruvector--0.1.0.sql and ruvector--2.0.0.sql) contained only a comment stub for embedding functions instead of actual CREATE FUNCTION declarations. This meant CREATE EXTENSION ruvector never registered the embedding functions despite the compiled symbols being present in the .so.

Additionally, the Dockerfile only copied ruvector--0.1.0.sql into the image while ruvector.control declares default_version = '2.0.0', causing CREATE EXTENSION ruvector to fail with "no installation script for version 2.0.0".

Changes:
- Replace embedding comment stubs in both ruvector--0.1.0.sql and ruvector--2.0.0.sql with actual CREATE FUNCTION declarations using the pgrx _wrapper symbol convention
- Add ruvector_embed_vec() convenience function (text -> ruvector type)
- Fix Dockerfile to copy both 0.1.0 and 2.0.0 SQL files into the image
- Fix volatility markers in embeddings.sql (IMMUTABLE -> VOLATILE for functions that load models or mutate state)
- Add embedding function smoke test to docker/init.sql

Co-Authored-By: claude-flow <ruv@ruv.net>

ruvnet and others added 30 commits December 30, 2025 15:34

chore: Update NAPI-RS binaries for all platforms

37a74f9

Built from commit 7f7c069 Platforms updated: - linux-x64-gnu - linux-arm64-gnu - darwin-x64 - darwin-arm64 - win32-x64-msvc 🤖 Generated by GitHub Actions

Merge pull request ruvnet#89 from ruvnet/claude/explore-dag-optimizat…

2e4df3c

…ion-1O4jg Explore optimized self-learning DAG architecture

fix(rudag): correct package.json exports to match actual build output

0d51613

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs(rudag): add badges to README

e2ceaf7

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs(rudag): fix CLI usage - rudag vs npx @ruvector/rudag

5901509

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore(rudag): add LICENSE file

6691129

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore(rudag): add .npmkeep to preserve WASM directories

415f08a

Prevents wasm-pack from regenerating .gitignore files that block npm 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs: update CLAUDE.md with new hooks format

68dadb6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

docs: fix hooks format in CLAUDE.md

10bba36

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions Bot and others added 29 commits January 24, 2026 18:26

chore: Update NAPI-RS binaries for all platforms

aec62e5

Built from commit 035c94a Platforms updated: - linux-x64-gnu - linux-arm64-gnu - darwin-x64 - darwin-arm64 - win32-x64-msvc 🤖 Generated by GitHub Actions

chore(mincut): update Cargo.toml and benchmark configuration

ecf2d4b

feat(mincut): add parallel optimization for j-tree updates

56a5f26

- parallel.rs: Rayon-based parallel level updates
  - Lock-free cache updates with atomic operations
  - Work-stealing for imbalanced levels
  - Configurable thread pool size

fix(mincut): update sparsifier with additional optimizations

3eef446

fix(mincut): refine hierarchy warm-start logic

48463a2

fix(mincut): update jtree module exports

1e40275

fix(mincut): enhance coordinator with security validations

b1ea430

fix(mincut): update lib.rs module declarations

c6dcb93

fix(mincut): coordinator refinements

2e4394b

feat(mincut): add optimization benchmark suite

1dc4aa2

- optimization_bench.rs: Benchmarks for optimization components
  - DSpar presparse performance
  - Cache hit/miss ratios
  - SIMD distance operations
  - Pool allocator throughput
  - Parallel level update scaling
- lib.rs: Update exports

fix(mincut): update Cargo.toml, coordinator, and lib exports

50fa654

fix(mincut): update benchmark utilities and module exports

b814288

fix(mincut): refine WASM batch operations

9bc7c91

fix(mincut): additional refinements to jtree and wasm_batch

f33ee4d

fix(mincut): update SIMD distance operations

9abd1f6

chore: Update NAPI-RS binaries for all platforms

27a8c13

Built from commit 33c0d23 Platforms updated: - linux-x64-gnu - linux-arm64-gnu - darwin-x64 - darwin-arm64 - win32-x64-msvc 🤖 Generated by GitHub Actions

ruvnet force-pushed the main branch from 6964dfd to c82183f Compare April 21, 2026 20:27

jjohare wants to merge 688 commits into ruvnet:main from jjohare:fix/extension-v2-packaging
