Commit a3c3dc3

docs: Update README for v3.3.0 embedding features
1 parent 24515e8 commit a3c3dc3

2 files changed

Lines changed: 40 additions & 15 deletions

.gitignore

Lines changed: 2 additions & 0 deletions

````diff
@@ -10,3 +10,5 @@ test.png
 .coverage
 test/.coverage
 PRIVATE_NOTES.md
+FEATURE_ROADMAP.md
+RELEASE_v3.3.0.md
````

README.md

Lines changed: 38 additions & 15 deletions

````diff
@@ -5,10 +5,10 @@
 # CogDB - Micro Graph Database for Python Applications
 > Documents and examples at [cogdb.io](https://cogdb.io)
 
-> New release: 3.2.0
-> - New Torque query methods: `both()`, `is_()`, `unique()`, `limit()`, `skip()`, `back()`
-> - Bidirectional traversal and pagination support
-> - Navigate back to tagged vertices
+> New release: 3.3.0
+> - SIMD-optimized vector similarity with [SimSIMD](https://github.com/ashvardanian/SimSIMD) (10-50x faster)
+> - New methods: `k_nearest()`, `load_glove()`, `load_gensim()`, `put_embeddings_batch()`
+> - Bulk embedding loading and k-nearest neighbor search
 
 ![ScreenShot](notes/ex2.png)
 
@@ -252,29 +252,45 @@ In a json, CogDB treats `_id` property as a unique identifier for each object. I
 
 ## Using word embeddings
 
-CogDB supports word embeddings. Word embeddings are a way to represent words as vectors. Word embeddings are useful for many NLP tasks.
-There are various types of word embeddings, including popular ones like [GloVe](https://nlp.stanford.edu/projects/glove/) and [FastText](https://fasttext.cc/).
+CogDB supports word embeddings with SIMD-optimized similarity search powered by [SimSIMD](https://github.com/ashvardanian/SimSIMD). Word embeddings are useful for semantic search, recommendations, and NLP tasks.
 
-#### Add a word embedding:
+#### Load pre-trained embeddings (GloVe):
 
 ```python
-g.put_embedding("orange", [0.1, 0.2, 0.3, 0.4, 0.5])
+# Load GloVe embeddings (one-liner!)
+count = g.load_glove("glove.6B.100d.txt", limit=50000)
+print(f"Loaded {count} embeddings")
+```
+
+#### Load from Gensim model:
+
+```python
+from gensim.models import Word2Vec
+model = Word2Vec(sentences)
+count = g.load_gensim(model)
 ```
````
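
For context on the GloVe loader in the diff above: each line of a GloVe text file such as `glove.6B.100d.txt` holds a token followed by its space-separated vector components. The sketch below is not CogDB's `load_glove()` implementation. It is a minimal standalone parser that assumes `limit` caps the number of embeddings read, because the README does not define `limit`. The helper name `parse_glove` is hypothetical. It yields `(word, vector)` tuples, which is the same shape the README passes to `put_embeddings_batch()`.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def parse_glove(
    stream: TextIO, limit: Optional[int] = None
) -> Iterator[Tuple[str, List[float]]]:
    """Yield (word, vector) pairs from GloVe's plain-text layout.

    Hypothetical helper, not part of CogDB. `limit` is assumed to cap the
    number of embeddings returned. Blank or malformed lines are skipped.
    """
    loaded = 0
    for line in stream:
        if limit is not None and loaded >= limit:
            break
        parts = line.split()
        if len(parts) < 2:
            continue  # no vector components on this line
        yield parts[0], [float(value) for value in parts[1:]]
        loaded += 1


# Tiny in-memory sample with made-up 3-dimensional vectors.
sample = io.StringIO("orange 0.1 0.2 0.3\napple 0.2 0.1 0.4\nbanana 0.3 0.3 0.1\n")
pairs = list(parse_glove(sample, limit=2))
print(pairs)  # [('orange', [0.1, 0.2, 0.3]), ('apple', [0.2, 0.1, 0.4])]
```

Reading line by line keeps memory proportional to `limit` rather than to the full file. This matters for the larger GloVe releases.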

````diff
 
-#### Get a word embedding:
+#### Add embeddings manually:
 
 ```python
-g.get_embedding("orange")
+g.put_embedding("orange", [0.1, 0.2, 0.3, 0.4, 0.5])
+
+# Bulk insert for better performance
+g.put_embeddings_batch([
+("apple", [0.1, 0.2, ...]),
+("banana", [0.3, 0.4, ...]),
+])
 ```
````
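
The README recommends `put_embeddings_batch()` for better performance, but it does not say how large a batch should be. When the source is large, such as a full pretrained vocabulary, a common pattern is to split the `(word, vector)` stream into fixed-size lists and submit each list separately. The sketch below shows only that chunking step. The names `batched_pairs` and `batch_size` are hypothetical, and the source does not state whether CogDB benefits from any particular batch size.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple

Pair = Tuple[str, Sequence[float]]


def batched_pairs(pairs: Iterable[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Group (word, vector) pairs into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


pairs = [("apple", [0.1, 0.2]), ("banana", [0.3, 0.4]), ("orange", [0.5, 0.6])]
for batch in batched_pairs(pairs, batch_size=2):
    # With CogDB, a call such as g.put_embeddings_batch(batch) would go here (not executed).
    print([word for word, _ in batch])
# ['apple', 'banana']
# ['orange']
```

On Python 3.12 and later, `itertools.batched()` produces the same grouping, but it yields tuples instead of lists.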

````diff
 
-> [0.1, 0.2, 0.3, 0.4, 0.5]
-#### Delete a word embedding:
+#### Find k-nearest neighbors:
 
 ```python
-g.delete_embedding("orange")
+# Find 5 most similar vertices to "machine_learning"
+g.v().k_nearest("machine_learning", k=5).all()
 ```
+> {'result': [{'id': 'deep_learning'}, {'id': 'neural_network'}, ...]}
````
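
Conceptually, a k-nearest-neighbor query scores every candidate against the query vector and keeps the k best. The README describes CogDB's `k_nearest()` as returning the top-k most similar vertices. The pure-Python sketch below shows only that ranking logic, assuming the same cosine similarity that `sim` uses. It performs a brute-force scan with `heapq.nlargest`. The function name `top_k_similar`, the dictionary layout, and the decision to exclude the query word from its own results are illustrative assumptions, not CogDB's API.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: str, embeddings: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Brute-force top-k by cosine similarity, best first; the query itself is excluded."""
    target = embeddings[query]
    scored = (
        (word, cosine_similarity(target, vector))
        for word, vector in embeddings.items()
        if word != query
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Made-up 3-dimensional vectors, chosen only so the ranking is easy to check.
embeddings = {
    "machine_learning": [1.0, 1.0, 0.0],
    "deep_learning": [1.0, 0.9, 0.1],
    "neural_network": [0.8, 1.0, 0.2],
    "banana": [0.0, 0.1, 1.0],
}
top = top_k_similar("machine_learning", embeddings, k=2)
print([word for word, _ in top])  # ['deep_learning', 'neural_network']
```

This scan costs O(n·d) per query for n stored vectors of dimension d. It models the result only and says nothing about how CogDB organizes or accelerates its search internally.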

````diff
 
-#### Use word embeddings in a query:
+#### Filter by similarity threshold:
 
 ```python
 g.v().sim('orange', '>', 0.35).all()
@@ -286,7 +302,14 @@ g.v().sim('orange', 'in', [0.25, 0.35]).all()
 ```
 > {'result': [{'id': 'banana'}, {'id': 'apple'}]}
 
-In the above code, the sim method is used to filter vertices based on their cosine similarity with the word embedding for "orange". The operator and threshold arguments determine how the similarity is compared to the threshold value, which can be a single value or a range.
+#### Get embedding stats:
+
+```python
+g.embedding_stats()
+```
+> {'count': 50000, 'dimensions': 100}
+
+The `sim` method filters vertices based on cosine similarity. The `k_nearest` method returns the top-k most similar vertices.
 
 ## Loading data from a file
 
````
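
The removed README paragraph explains that `sim` compares the cosine similarity between each vertex and the query word's embedding against a threshold. That threshold is either a single value, used with an operator such as `'>'`, or a range, used with `'in'`. Below is a minimal standalone sketch of that filtering rule. Only `'>'` and `'in'` appear in the README. The other comparison operators, the inclusive range bounds, and the choice to skip vertices without an embedding are assumptions. `passes` and `sim_filter` are hypothetical names.

```python
import math
import operator
from typing import Dict, Iterable, List, Sequence, Union

Threshold = Union[float, Sequence[float]]

# Single-value comparisons. Only '>' is confirmed by the README; the others are assumed.
COMPARATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes(score: float, op: str, threshold: Threshold) -> bool:
    """Apply a single-value comparison, or an inclusive [low, high] range for 'in'."""
    if op == "in":
        low, high = threshold
        return low <= score <= high
    if op not in COMPARATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    return COMPARATORS[op](score, threshold)


def sim_filter(
    query: str,
    candidates: Iterable[str],
    embeddings: Dict[str, Sequence[float]],
    op: str,
    threshold: Threshold,
) -> List[str]:
    """Keep candidates whose cosine similarity to `query` satisfies op/threshold."""
    target = embeddings[query]
    kept = []
    for vertex in candidates:
        vector = embeddings.get(vertex)
        if vector is None:
            continue  # assumed: vertices without an embedding are dropped
        if passes(cosine_similarity(target, vector), op, threshold):
            kept.append(vertex)
    return kept


# Made-up 2-dimensional vectors; similarities to "orange" are about 0.8, 0.6 and 0.0.
embeddings = {"orange": [1.0, 0.0], "banana": [0.8, 0.6], "apple": [0.6, 0.8], "car": [0.0, 1.0]}
vertices = ["banana", "apple", "car"]
print(sim_filter("orange", vertices, embeddings, ">", 0.35))        # ['banana', 'apple']
print(sim_filter("orange", vertices, embeddings, "in", [0.5, 0.7]))  # ['apple']
```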