Update 'Using_Pinecone_for_embeddings_search.ipynb' to current APIs #1355
Summary
This PR updates the Pinecone example notebook, "Using_Pinecone_for_embeddings_search.ipynb," to use the current versions of the Pinecone and OpenAI APIs. It also fixes a mismatch between the embedding model specified in the notebook and the embedding model that was used to create the embeddings file the notebook downloads.
Motivation
Both the Pinecone and OpenAI APIs have been revised since the notebook was created. I noticed this when I tried running the code and encountered error messages.
In addition to the deprecated API calling syntax, I noticed a mismatch between the embedding model specified in the notebook (text-embedding-3-small) and the embedding model that was used to create the embeddings file referenced at embeddings_url: https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip
The embeddings file was created using text-embedding-ada-002. Because the notebook embeds queries with a different model than the one that produced the stored vectors, running the query_article() function produces nonsense results. Here are the results I got when I searched for "modern art in Europe" in the "title" namespace:
General Dynamics F-16 Fighting Falcon (score = 0.0341419838)
Mikoyan-Gurevich MiG-17 (score = 0.0325526334)
The Good, the Bad and the Ugly (score = 0.0281740129)
Mikoyan-Gurevich MiG-15 (score = 0.0260391217)
Musical genre (score = 0.0248822626)
And here are the results I got when I searched for "Famous battles in Scottish history" in the "content" namespace:
585 BC (score = 0.0467720367)
Order of the British Empire (score = 0.0448796861)
40s BC (score = 0.0444191061)
Order of the Bath (score = 0.0433623493)
Julius Caesar (score = 0.0405869484)
Once I switched back to the older text-embedding-ada-002 embedding model, the notebook produced correct results. The notebook should therefore use text-embedding-ada-002, or else the file vector_database_wikipedia_articles_embedded.zip should be regenerated using the newer embedding model.
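For illustration, here is a minimal sketch of a query helper that pins the query model to the one used for the stored vectors. It is not the notebook's actual code: the `client` and `index` parameters, the `INDEX_EMBEDDING_MODEL` constant, and the `(id, score)` return shape are assumptions made for this example. The sketch assumes `client` is an OpenAI v1-style client and `index` is a Pinecone index handle.

```python
# Model that produced the vectors in the embeddings file (per the description above).
# The constant name is hypothetical and exists only for this sketch.
INDEX_EMBEDDING_MODEL = "text-embedding-ada-002"


def query_article(client, index, query, namespace, top_k=5,
                  model=INDEX_EMBEDDING_MODEL):
    """Embed `query` and return (id, score) pairs for the closest stored vectors.

    `client` is assumed to expose `embeddings.create(...)` (OpenAI v1 style) and
    `index` is assumed to expose `query(...)` (Pinecone index style).
    """
    if model != INDEX_EMBEDDING_MODEL:
        # Both models return 1536-dimensional vectors by default, so the index
        # will accept the query without error; the only symptom is low,
        # meaningless similarity scores. Fail loudly instead.
        raise ValueError(
            f"Query model {model!r} does not match index model "
            f"{INDEX_EMBEDDING_MODEL!r}"
        )

    # Embed the query text with the same model that produced the stored vectors.
    response = client.embeddings.create(input=[query], model=model)
    query_vector = response.data[0].embedding

    # Search the requested namespace for the nearest stored vectors.
    result = index.query(vector=query_vector, namespace=namespace, top_k=top_k)
    return [(match.id, match.score) for match in result.matches]
```

Because the two models share the same default vector size, a dimension check alone would not catch this mismatch, which is why the sketch compares model names explicitly.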