[Astra DB, vector store] Upgrade to astrapy 1.0 usage #1218

Open: wants to merge 27 commits into base: main

Commits (27):
5d0c188
notebook (except images) for astrapy
hemidactylus Nov 12, 2023
12d5bfa
add astrapy example; openai 1.0; datasets; fix language/typos
hemidactylus Nov 13, 2023
4dfe96d
merge main
hemidactylus Nov 13, 2023
b944661
improve title and links
hemidactylus Nov 13, 2023
935d9eb
demo db values in cassio notebook
hemidactylus Nov 13, 2023
1d8b659
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Nov 13, 2023
afb4afc
registry.yaml
hemidactylus May 17, 2024
5dbd170
Merge branch 'main' into SL-astradb-astrapy
hemidactylus May 17, 2024
1f681ed
modernize to astrapy 1
hemidactylus May 17, 2024
290c8d7
support for nondefault namespace; slight rewording in readme
hemidactylus May 17, 2024
786d6fc
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jun 4, 2024
4a5a203
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jun 13, 2024
62c0a74
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jun 14, 2024
c08702f
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jun 18, 2024
12f01f3
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 3, 2024
98d6f17
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 23, 2024
19cbcce
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 25, 2024
fe8a0b4
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 26, 2024
caa3469
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 31, 2024
700f691
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Jul 31, 2024
63c5038
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Aug 10, 2024
c799b0c
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Aug 21, 2024
c21a0f5
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Aug 29, 2024
007fab7
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Sep 10, 2024
c2b40f8
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Sep 16, 2024
ab431ed
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Sep 26, 2024
c81359c
Merge branch 'main' into SL-astradb-astrapy
hemidactylus Nov 22, 2024
@@ -50,7 +50,7 @@
"\n",
"To find a quote similar to the provided search quote, the latter is made into an embedding vector on the fly, and this vector is used to query the store for similar vectors ... i.e. similar quotes that were previously indexed. The search can optionally be constrained by additional metadata (\"find me quotes by Spinoza similar to this one ...\").\n",
"\n",
"![2_vector_search](https://user-images.githubusercontent.com/14221764/282422033-0a1297c4-63bb-4e04-b120-dfd98dc1a689.png)\n",
"![2_vector_search](https://gist.github.com/assets/14221764/6c883c5b-defd-44d6-a64e-082255e66b57)\n",
"\n",
"The key point here is that \"quotes similar in content\" translates, in vector space, to vectors that are metrically close to each other: thus, vector similarity search effectively implements semantic similarity. _This is the key reason vector embeddings are so powerful._\n",
"\n",
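Note on the notebook text in the hunk above: the claim that "similar in content" means "metrically close" can be checked with a few lines of plain Python. This is a minimal sketch, assuming cosine similarity as the metric (the diff does not state which metric the collection uses). The three-dimensional vectors are hypothetical stand-ins for the 1536-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; real ones in this notebook have 1536 dimensions.
query = [0.9, 0.1, 0.0]
close_quote = [0.8, 0.2, 0.1]   # points in nearly the same direction as the query
far_quote = [0.0, 0.1, 0.9]     # points in a very different direction

# The semantically "closer" vector scores higher.
print(cosine_similarity(query, close_quote) > cosine_similarity(query, far_quote))
```

A vector search returns the stored vectors that rank highest under such a measure, which is why ranking by vector distance behaves like ranking by meaning.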
@@ -92,7 +92,7 @@
},
"outputs": [],
"source": [
"!pip install --quiet \"astrapy>=0.6.0\" \"openai>=1.0.0\" datasets"
"!pip install --quiet \"astrapy>=1.1.0\" \"openai>=1.0.0\" \"datasets>=2.19.1\""
]
},
{
@@ -105,7 +105,7 @@
"from getpass import getpass\n",
"from collections import Counter\n",
"\n",
"from astrapy.db import AstraDB\n",
"from astrapy import DataAPIClient\n",
"import openai\n",
"from datasets import load_dataset"
]
@@ -123,7 +123,7 @@
"id": "65a8edc1-4633-491b-9ed3-11163ec24e46",
"metadata": {},
"source": [
"Please retrieve your database credentials on your Astra dashboard ([info](https://docs.datastax.com/en/astra/astra-db-vector/)): you will supply them momentarily.\n",
"Please retrieve your database credentials on your Astra dashboard ([info](https://docs.datastax.com/en/astra-db-serverless/index.html)): you will supply them momentarily.\n",
"\n",
"Example values:\n",
"\n",
@@ -138,25 +138,30 @@
"metadata": {},
"outputs": [
{
"name": "stdout",
"name": "stdin",
"output_type": "stream",
"text": [
"Please enter your API Endpoint: https://4f835778-ec78-42b0-9ae3-29e3cf45b596-us-east1.apps.astra.datastax.com\n",
"Please enter your Token ········\n"
"Please enter your API Endpoint: https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com\n",
"Please enter your Token: ········\n",
"Please enter your namespace. Leave empty for default: \n"
]
}
],
"source": [
"ASTRA_DB_API_ENDPOINT = input(\"Please enter your API Endpoint:\")\n",
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"Please enter your Token\")"
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"Please enter your Token:\")\n",
"\n",
"ASTRA_DB_KEYSPACE = input(\"Please enter your namespace. Leave empty for default:\")\n",
"if not ASTRA_DB_KEYSPACE:\n",
" ASTRA_DB_KEYSPACE = None"
]
},
{
"cell_type": "markdown",
"id": "f8c4e5ec-2ab2-4d41-b3ec-c946469fed8b",
"metadata": {},
"source": [
"### Instantiate an Astra DB client"
"### Connect to your Astra DB"
]
},
{
@@ -166,10 +171,8 @@
"metadata": {},
"outputs": [],
"source": [
"astra_db = AstraDB(\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
")"
"astra_db_client = DataAPIClient(token=ASTRA_DB_APPLICATION_TOKEN)\n",
"database = astra_db_client.get_database_by_api_endpoint(ASTRA_DB_API_ENDPOINT, namespace=ASTRA_DB_KEYSPACE)"
]
},
{
@@ -196,7 +199,7 @@
"outputs": [],
"source": [
"coll_name = \"philosophers_astra_db\"\n",
"collection = astra_db.create_collection(coll_name, dimension=1536)"
"collection = database.create_collection(coll_name, dimension=1536)"
]
},
{
@@ -222,7 +225,7 @@
"metadata": {},
"outputs": [
{
"name": "stdout",
"name": "stdin",
"output_type": "stream",
"text": [
"Please enter your OpenAI API Key: ········\n"
@@ -281,7 +284,7 @@
"output_type": "stream",
"text": [
"len(result.data) = 2\n",
"result.data[1].embedding = [-0.0108176339417696, 0.0013546717818826437, 0.00362232...\n",
"result.data[1].embedding = [0.009454815648496151, 0.0015082946047186852, -0.036191...\n",
"len(result.data[1].embedding) = 1536\n"
]
}
@@ -408,13 +411,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Starting to store entries: [20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][20][10]\n",
"Starting to store entries: [100][100][100][100][50]\n",
"Finished storing entries.\n"
]
}
],
"source": [
"BATCH_SIZE = 20\n",
"BATCH_SIZE = 100\n",
"\n",
"num_batches = ((len(philo_dataset) + BATCH_SIZE - 1) // BATCH_SIZE)\n",
"\n",
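Note on the `BATCH_SIZE` change above: the `num_batches` expression is a ceiling division, so the last batch holds whatever remains. A minimal sketch of the arithmetic follows. The helper `batch_sizes` is hypothetical (not part of the notebook), and the total of 450 entries is inferred from the new logged output `[100][100][100][100][50]`.

```python
def batch_sizes(total, batch_size):
    """Sizes of consecutive batches when `total` items are split into chunks of `batch_size`."""
    # Ceiling division, same expression as in the notebook cell.
    num_batches = (total + batch_size - 1) // batch_size
    return [
        min(batch_size, total - i * batch_size)
        for i in range(num_batches)
    ]


# 450 is inferred from the logged output, not stated in the diff.
print(batch_sizes(450, 100))       # five batches, the last one partial
print(len(batch_sizes(450, 20)))   # the old batch size needs many more round trips
```

Larger batches mean fewer insertion calls for the same data, which is presumably the motivation for raising the value from 20 to 100.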
@@ -492,11 +495,11 @@
" for tag in tags:\n",
" filter_clause[\"tags\"][tag] = True\n",
" #\n",
" results = collection.vector_find(\n",
" query_vector,\n",
" results = collection.find(\n",
" vector=query_vector,\n",
" limit=n,\n",
" filter=filter_clause,\n",
" fields=[\"quote\", \"author\"]\n",
" projection={\"quote\": True, \"author\": True},\n",
" )\n",
" return [\n",
" (result[\"quote\"], result[\"author\"])\n",
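Note on the `vector_find` to `find` migration above: the positional vector becomes the `vector=` keyword, and the `fields` list becomes a `projection` mapping. The sketch below builds the keyword arguments as a plain dictionary so the translation can be inspected without a database. The helper `build_find_arguments` is hypothetical, and the `author` filter key is an assumption, since only the `tags` loop is visible in this hunk.

```python
def build_find_arguments(query_vector, n, author=None, tags=None,
                         fields=("quote", "author")):
    """Assemble keyword arguments in the shape the new `collection.find(...)` call uses."""
    filter_clause = {}
    if author:
        # Assumption: documents carry a top-level "author" field to filter on.
        filter_clause["author"] = author
    if tags:
        # Same shape as the loop in the hunk: {"tags": {tag: True, ...}}.
        filter_clause["tags"] = {tag: True for tag in tags}
    return {
        "filter": filter_clause,
        "vector": query_vector,
        "limit": n,
        # Old style: fields=["quote", "author"]; new style: a field -> True mapping.
        "projection": {field: True for field in fields},
    }


args = build_find_arguments([0.1, 0.2], 3, author="spinoza", tags=["politics"])
print(args["filter"])
print(args["projection"])
```

With a live collection, the result could be passed as `collection.find(**args)`, matching the keywords used in the changed lines.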
@@ -531,10 +534,10 @@
"text/plain": [
"[('Life to the great majority is only a constant struggle for mere existence, with the certainty of losing it at last.',\n",
" 'schopenhauer'),\n",
" ('We give up leisure in order that we may have leisure, just as we go to war in order that we may have peace.',\n",
" 'aristotle'),\n",
" ('Perhaps the gods are kind to us, by making life more disagreeable as we grow older. In the end death seems less intolerable than the manifold burdens we carry',\n",
" 'freud')]"
" ('To endure life remains, when all is said, the first duty of all living being Illusion can have no value if it makes this more difficult for us.',\n",
" 'freud'),\n",
" ('To live is to suffer, to survive is to find some meaning in the suffering.',\n",
" 'nietzsche')]"
]
},
"execution_count": 14,
@@ -596,8 +599,8 @@
"data": {
"text/plain": [
"[('He who seeks equality between unequals seeks an absurdity.', 'spinoza'),\n",
" ('The people are that part of the state that does not know what it wants.',\n",
" 'hegel')]"
" ('One... gets an impression that civilization is something which was imposed on a resisting majority by a minority which understood how to obtain possession of the means to power and coercion. It is, of course, natural to assume that these difficulties are not inherent in the nature of civilization itself but are determined by the imperfections of the cultural forms which have so far been developed.',\n",
" 'freud')]"
]
},
"execution_count": 16,
@@ -636,10 +639,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
"3 quotes within the threshold:\n",
" 0. [similarity=0.927] \"The assumption that animals are without rights, and the illusion that ...\"\n",
" 1. [similarity=0.922] \"Animals are in possession of themselves; their soul is in possession o...\"\n",
" 2. [similarity=0.920] \"At his best, man is the noblest of all animals; separated from law and...\"\n"
"4 quotes within the threshold:\n",
" 0. [similarity=0.746] \"The assumption that animals are without rights, and the illusion that ...\"\n",
" 1. [similarity=0.728] \"Man is the only animal that must be encouraged to live....\"\n",
" 2. [similarity=0.727] \"Animals are in possession of themselves; their soul is in possession o...\"\n",
" 3. [similarity=0.725] \"At his best, man is the noblest of all animals; separated from law and...\"\n"
]
}
],
@@ -648,17 +652,18 @@
"# quote = \"Be good.\"\n",
"# quote = \"This teapot is strange.\"\n",
"\n",
"metric_threshold = 0.92\n",
"metric_threshold = 0.72\n",
"\n",
"quote_vector = client.embeddings.create(\n",
" input=[quote],\n",
" model=embedding_model_name,\n",
").data[0].embedding\n",
"\n",
"results_full = collection.vector_find(\n",
" quote_vector,\n",
"results_full = collection.find(\n",
" vector=quote_vector,\n",
" limit=8,\n",
" fields=[\"quote\"]\n",
" projection={\"quote\": True},\n",
" include_similarity=True,\n",
")\n",
"results = [res for res in results_full if res[\"$similarity\"] >= metric_threshold]\n",
"\n",
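Note on the threshold change above: the cut-off is applied client-side to the `$similarity` value of each returned document, which the new call requests with `include_similarity=True`. The sketch below replays that filter on stand-in documents. The first four scores are the ones in the new logged output; the last two scores and all quote labels are hypothetical.

```python
def within_threshold(documents, threshold):
    """Keep only documents whose "$similarity" score reaches the threshold."""
    return [doc for doc in documents if doc["$similarity"] >= threshold]


sample = [
    {"quote": "logged result 0", "$similarity": 0.746},
    {"quote": "logged result 1", "$similarity": 0.728},
    {"quote": "logged result 2", "$similarity": 0.727},
    {"quote": "logged result 3", "$similarity": 0.725},
    {"quote": "hypothetical result 4", "$similarity": 0.719},
    {"quote": "hypothetical result 5", "$similarity": 0.701},
]

print(len(within_threshold(sample, 0.72)))   # new threshold
print(len(within_threshold(sample, 0.92)))   # old threshold
```

Against scores in this range the old value of 0.92 would discard every result, so the threshold has to move together with the scores.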
@@ -780,12 +785,12 @@
"output_type": "stream",
"text": [
"** quotes found:\n",
"** - Happiness is the reward of virtue. (aristotle)\n",
"** - Our moral virtues benefit mainly other people; intellectual virtues, on the other hand, benefit primarily ourselves; therefore the former make us universally popular, the latter unpopular. (schopenhauer)\n",
"** - Happiness is the reward of virtue. (aristotle)\n",
"** end of logging\n",
"\n",
"A new generated quote:\n",
"True politics lies in the virtuous pursuit of justice, for it is through virtue that we build a better world for all.\n"
"- In politics, true virtue is the compass that guides us towards a better society.\n"
]
}
],
@@ -814,12 +819,12 @@
"output_type": "stream",
"text": [
"** quotes found:\n",
"** - Because Christian morality leaves animals out of account, they are at once outlawed in philosophical morals; they are mere 'things,' mere means to any ends whatsoever. They can therefore be used for vivisection, hunting, coursing, bullfights, and horse racing, and can be whipped to death as they struggle along with heavy carts of stone. Shame on such a morality that is worthy of pariahs, and that fails to recognize the eternal essence that exists in every living thing, and shines forth with inscrutable significance from all eyes that see the sun! (schopenhauer)\n",
"** - The assumption that animals are without rights, and the illusion that our treatment of them has no moral significance, is a positively outrageous example of Western crudity and barbarity. Universal compassion is the only guarantee of morality. (schopenhauer)\n",
"** - Because Christian morality leaves animals out of account, they are at once outlawed in philosophical morals; they are mere 'things,' mere means to any ends whatsoever. They can therefore be used for vivisection, hunting, coursing, bullfights, and horse racing, and can be whipped to death as they struggle along with heavy carts of stone. Shame on such a morality that is worthy of pariahs, and that fails to recognize the eternal essence that exists in every living thing, and shines forth with inscrutable significance from all eyes that see the sun! (schopenhauer)\n",
"** end of logging\n",
"\n",
"A new generated quote:\n",
"Excluding animals from ethical consideration reveals a moral blindness that allows for their exploitation and suffering. True morality embraces universal compassion.\n"
"The true measure of humanity lies in our treatment of animals, for compassion towards all living beings reflects our moral essence.\n"
]
}
],
@@ -848,7 +853,7 @@
{
"data": {
"text/plain": [
"{'status': {'ok': 1}}"
"{'ok': 1}"
]
},
"execution_count": 22,
@@ -857,7 +862,7 @@
}
],
"source": [
"astra_db.delete_collection(coll_name)"
"collection.drop()"
]
}
],
@@ -877,7 +882,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
"version": "3.10.12"
}
},
"nbformat": 4,
4 changes: 2 additions & 2 deletions examples/vector_databases/cassandra_astradb/README.md
@@ -8,8 +8,8 @@ These example notebooks demonstrate implementation of
the same GenAI standard RAG workload with different libraries and APIs.

To use [Astra DB](https://docs.datastax.com/en/astra/home/astra.html)
with its HTTP API interface, head to the "AstraPy" notebook (`astrapy`
is the Python client to interact with the database).
through its Data API interface, head to the "AstraPy" notebook (`astrapy`
is the Python client for the Data API).

If you prefer CQL access to the database (either with
[Astra DB](https://docs.datastax.com/en/astra-serverless/docs/vector-search/overview.html)
10 changes: 10 additions & 0 deletions registry.yaml
@@ -612,6 +612,16 @@
- embeddings
- completions

- title: Philosophy with Vector Embeddings, OpenAI and Cassandra / Astra DB
path: >-
examples/vector_databases/cassandra_astradb/Philosophical_Quotes_AstraPy.ipynb
date: 2023-11-13
authors:
- hemidactylus
tags:
- embeddings
- completions

- title: Cassandra / Astra DB
path: examples/vector_databases/cassandra_astradb/README.md
date: 2023-08-29