Update 'Using_Pinecone_for_embeddings_search.ipynb' to current APIs #1355

sheldonrampton · 2024-08-07T02:44:47Z

Summary

This PR updates the Pinecone example notebook, "Using_Pinecone_for_embeddings_search.ipynb," to use current versions of the Pinecone and OpenAI APIs. It also fixes a mismatch between the embedding model specified in the notebook and the embedding model that was used to create the embeddings file the notebook retrieves.

Motivation

The Pinecone and OpenAI APIs that were used to create the notebook have both been revised since the notebook was created. I noticed this when I tried using the code and encountered error messages.

In addition to the deprecated API call syntax, I noticed a mismatch between the embedding model specified in the notebook (text-embedding-3-small) and the embedding model used to create the embeddings file referenced by embeddings_url: https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip

The embeddings file was created using text-embedding-ada-002 as its embedding model. As a result, running the query_article() function with text-embedding-3-small produces nonsense results. Here are the results I got when I searched for "modern art in Europe" in the "title" namespace:

General Dynamics F-16 Fighting Falcon (score = 0.0341419838)
Mikoyan-Gurevich MiG-17 (score = 0.0325526334)
The Good, the Bad and the Ugly (score = 0.0281740129)
Mikoyan-Gurevich MiG-15 (score = 0.0260391217)
Musical genre (score = 0.0248822626)

And here are the results I got when I searched for "Famous battles in Scottish history" in the "content" namespace:

585 BC (score = 0.0467720367)
Order of the British Empire (score = 0.0448796861)
40s BC (score = 0.0444191061)
Order of the Bath (score = 0.0433623493)
Julius Caesar (score = 0.0405869484)

Once I switched back to the older text-embedding-ada-002 embedding model, the notebook produced correct results. The notebook should therefore use "text-embedding-ada-002," or else the file vector_database_wikipedia_articles_embedded.zip should be regenerated using the newer embedding model.
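To illustrate the fix, here is a minimal sketch of a query path that refuses to embed a query with a different model than the one that produced the stored vectors. Both models return 1536-dimensional vectors by default, so a mismatch would not be expected to raise a dimension error and is easy to miss. The names `DATASET_EMBEDDING_MODEL`, `embed_query`, `search_namespace`, `StubOpenAI`, and `StubIndex` are hypothetical and are not taken from the notebook. The stubs only stand in for the real clients so the sketch runs offline.

```python
from types import SimpleNamespace

# Assumption (taken from the description above): the vectors in
# vector_database_wikipedia_articles_embedded.zip came from this model.
DATASET_EMBEDDING_MODEL = "text-embedding-ada-002"


def embed_query(client, text, model=DATASET_EMBEDDING_MODEL):
    """Embed `text` with the model that produced the stored vectors.

    `client` is assumed to behave like `openai.OpenAI()` (v1+ SDK).
    """
    if model != DATASET_EMBEDDING_MODEL:
        raise ValueError(
            f"Query model {model!r} does not match dataset model "
            f"{DATASET_EMBEDDING_MODEL!r}; similarity scores would be meaningless."
        )
    response = client.embeddings.create(input=text, model=model)
    return response.data[0].embedding


def search_namespace(client, index, query, namespace, top_k=5):
    """Hypothetical helper: return (id, score) pairs for the closest matches.

    `index` is assumed to behave like a Pinecone index handle whose
    `query` method accepts `vector`, `namespace`, and `top_k`.
    """
    vector = embed_query(client, query)
    result = index.query(vector=vector, namespace=namespace, top_k=top_k)
    return [(match.id, match.score) for match in result.matches]


class StubOpenAI:
    """Offline stand-in for the OpenAI client, for demonstration only."""

    def __init__(self):
        self.calls = []  # records which model each request asked for
        self.embeddings = SimpleNamespace(create=self._create)

    def _create(self, input, model):
        self.calls.append(model)
        fake = SimpleNamespace(embedding=[0.1, 0.2, 0.3])
        return SimpleNamespace(data=[fake])


class StubIndex:
    """Offline stand-in for a Pinecone index, returning made-up matches."""

    def query(self, vector, namespace, top_k):
        matches = [
            SimpleNamespace(id="Art", score=0.9),
            SimpleNamespace(id="Museum", score=0.8),
        ]
        return SimpleNamespace(matches=matches[:top_k])


if __name__ == "__main__":
    hits = search_namespace(StubOpenAI(), StubIndex(), "modern art in Europe", "title")
    for article_id, score in hits:
        print(f"{article_id} (score = {score})")
```

With real services, the stubs would presumably be replaced by `OpenAI()` and an index handle obtained from `Pinecone(api_key=...).Index(...)`. The guard in `embed_query` is a design choice for this sketch, not something the notebook does.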

github-actions · 2024-10-07T02:03:43Z

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 10 days.

sheldonrampton · 2024-10-07T04:02:29Z

This issue has not been fixed, and it causes errors when trying to run the example. What can I do to help move this forward?

Update 'Using_Pinecone_for_embeddings_search.ipynb' to current APIs

c9fd047

github-actions bot added the Stale label Oct 7, 2024

github-actions bot removed the Stale label Oct 8, 2024
