Commit 57ce38c

Add Python package for EmbeddingBridge
1 parent 82d0a8a commit 57ce38c

8 files changed: +1575 -0 lines changed

python/README.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# EmbeddingBridge Python Package

A Python interface to the EmbeddingBridge vector database with dataset management capabilities.

## Features

- Python ctypes bindings to the C core
- Dataset management with S3 integration
- Vector similarity search
- Compatibility with Pinecone datasets
- Simple and intuitive API

## Installation

### From Source

```bash
pip install -e .
```

### Requirements

- Python 3.7+
- NumPy
- pandas
- pyarrow
- boto3 (for S3 support)
- zstandard

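To see which of the requirements above are importable in the current environment before installing, a small check like the following may help. It is a sketch, not part of the package: the `missing_modules` helper is hypothetical, and treating `boto3` as optional is an assumption based only on the "(for S3 support)" note.

```python
import importlib.util

# Import names for the packages listed under Requirements.
# Treating boto3 as optional is an assumption drawn from "(for S3 support)".
REQUIRED = ["numpy", "pandas", "pyarrow", "zstandard"]
OPTIONAL = ["boto3"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing optional:", missing_modules(OPTIONAL))
```

`importlib.util.find_spec` only locates a module without importing it, so the check stays cheap even for heavy packages.
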
## Usage

### Embedding Store

```python
from embeddingbridge import EmbeddingStore

# Create a new store
store = EmbeddingStore("path/to/store", dimension=384)

# Add vectors
store.add_vector(
    id="doc1",
    vector=[0.1, 0.2, ...],  # Your vector values
    metadata={"text": "Example document", "source": "wiki"}
)

# Search for similar vectors
results = store.search([0.1, 0.2, ...], top_k=5)
for result in results:
    print(f"ID: {result['id']}, Score: {result['score']}")
    print(f"Metadata: {result['metadata']}")
```

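For intuition about what a top-k similarity search returns, here is a minimal brute-force sketch in plain Python. It does not use EmbeddingBridge: the `cosine_similarity` and `brute_force_search` helpers and the in-memory `records` dict are hypothetical, and cosine similarity is an assumed metric, since this README does not state which one the C core uses. Only the result shape (`id`, `score`, `metadata`) follows the example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(records, query, top_k=5):
    """Score every record against `query` and return the top_k best matches.

    `records` maps id -> (vector, metadata). This is a hypothetical in-memory
    stand-in, not the EmbeddingBridge storage format.
    """
    scored = [
        {"id": rec_id, "score": cosine_similarity(vec, query), "metadata": meta}
        for rec_id, (vec, meta) in records.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]


records = {
    "doc1": ([1.0, 0.0], {"source": "wiki"}),
    "doc2": ([0.0, 1.0], {"source": "blog"}),
    "doc3": ([1.0, 1.0], {"source": "wiki"}),
}
for result in brute_force_search(records, [1.0, 0.1], top_k=2):
    print(f"ID: {result['id']}, Score: {result['score']:.3f}")
```

Scanning every vector costs time linear in the number of stored vectors, which is fine for small collections; whether the store uses an index instead is not covered here.
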
### Dataset Management

```python
from embeddingbridge import datasets

# List available datasets
dataset_list = datasets.list_datasets()
print(dataset_list)

# Load a dataset
dataset = datasets.load_dataset("my-dataset")

# Get dataset info
print(f"Dimension: {dataset.dimension}")
print(f"Documents: {len(dataset)}")

# Search for similar vectors
query_vector = [0.1, 0.2, ...]  # Your query vector
results = dataset.search(query_vector, top_k=10)
for doc_id, score in results:  # doc_id avoids shadowing the built-in id()
    print(f"ID: {doc_id}, Score: {score}")

# Save a dataset to S3
dataset.save("s3://my-bucket/datasets/my-dataset")

# Load a dataset from S3
s3_dataset = datasets.Dataset.from_path("s3://my-bucket/datasets/my-dataset")
```

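The same `save` and `from_path` calls accept both local paths and `s3://` URIs. One way such a path could be told apart is sketched below using only the standard library. The `split_dataset_path` helper is hypothetical and is not how the package is known to do it.

```python
from urllib.parse import urlparse


def split_dataset_path(path):
    """Classify a dataset path as S3 or local.

    Returns ("s3", bucket, key) for s3:// URIs and ("local", None, path)
    for everything else. Hypothetical helper, not part of EmbeddingBridge.
    """
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        # netloc holds the bucket name; the remaining path is the object key prefix.
        return ("s3", parsed.netloc, parsed.path.lstrip("/"))
    return ("local", None, path)


print(split_dataset_path("s3://my-bucket/datasets/my-dataset"))
print(split_dataset_path("path/to/store"))
```

The bucket and key returned for the S3 case are the two values an S3 client such as boto3 typically needs.
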
## API Reference

### `EmbeddingStore`

- `__init__(path, dimension=None)`: Initialize a new embedding store or open an existing one
- `add_vector(id, vector, metadata=None)`: Add a vector to the store
- `search(query_vector, top_k=10)`: Search for similar vectors
- `get_vector(id)`: Get a vector by ID
- `delete_vector(id)`: Delete a vector by ID
- `get_metadata(id)`: Get metadata for a vector
- `dimension`: Property returning the dimension of vectors
- `count`: Property returning the number of vectors

### `Dataset`

- `from_path(path)`: Load a dataset from a local path or S3 bucket (called on the class, as in `Dataset.from_path(...)`)
- `save(path, overwrite=False)`: Save the dataset to a local path or S3 bucket
- `iter_documents(batch_size=100)`: Iterate through documents in batches
- `search(query_vector, top_k=10)`: Search for similar vectors
- `dimension`: Property returning the dimension of vectors
- `documents`: DataFrame containing the documents

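Batched iteration, as `iter_documents(batch_size=100)` describes, can be pictured with the generic sketch below. The `iter_batches` helper is hypothetical and works on a plain list; the actual method operates on the dataset's own documents and may differ in what each batch contains.

```python
def iter_batches(documents, batch_size=100):
    """Yield consecutive slices of `documents`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


docs = [{"id": f"doc{i}"} for i in range(250)]
for batch in iter_batches(docs, batch_size=100):
    print(len(batch))  # prints 100, 100, 50
```

For a pandas DataFrame, the explicit form of the same positional slice is `documents.iloc[start:start + batch_size]`.
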
### Helper Functions

- `list_datasets(as_df=False)`: List available datasets
- `load_dataset(name)`: Load a dataset by name

## License

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

python/embeddingbridge/__init__.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
"""
EmbeddingBridge - Python Interface to Vector Embedding Storage
Copyright (C) 2024 ProgramComputer

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
"""

from .core import EmbeddingStore, EmbeddingBridge, CommandResult

__version__ = "0.1.0"
__all__ = ['EmbeddingStore', 'EmbeddingBridge', 'CommandResult']
