Preservation of vector database #3

fewarren · 2024-04-16T21:20:20Z

The vector database seems to be cleared when Docker and the NVIDIA AI Workbench are shut down. Does anyone know how to preserve and reload the vector database between instantiations?

freemansoft · 2024-07-30T11:23:08Z

Comments in the NVIDIA forum are pointing folks at https://github.com/NVIDIA/nim-anywhere, which has a standalone vector database. I don't know if it preserves the data.

jtcasablanca · 2024-09-26T13:57:40Z

@fewarren - I'm doing a little research on how the preservation should work.

What are some of your requirements?

fewarren · 2024-09-26T15:06:03Z

I am not clear on what you mean by "preservation".

Fred

freemansoft · 2024-09-26T15:33:46Z

AFAIK
The vector database holds the vectors representing the RAG documents. I don't want to have to reload the vector database (upload docs) every time I start the same project.

The current project is running an in-memory database. The other project is running a standalone container with a mounted file system that holds the docs. That file system is available across restarts. (I'm typing this from memory, so I could be thinking of another project.)
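To illustrate the difference, here is a minimal sketch in plain Python. It is not the code used by either project: `FileBackedVectorStore`, `demo`, and the `vectors.json` file name are hypothetical names made up for this example. The only point is that a store which writes to a directory that survives restarts (for example a mounted volume) can reload its vectors on start-up, while a purely in-memory store cannot.

```python
import json
import os
import tempfile


class FileBackedVectorStore:
    """Toy vector store (hypothetical) that reloads its contents from disk."""

    def __init__(self, path):
        self.path = path
        self.vectors = {}
        # On start-up, pick up whatever a previous run left behind.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                self.vectors = json.load(fh)

    def add(self, doc_id, vector):
        self.vectors[doc_id] = list(vector)
        self._flush()

    def _flush(self):
        # Write to a temporary file first, then swap it into place, so a
        # crash mid-write does not leave a half-written store behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(self.vectors, fh)
        os.replace(tmp_path, self.path)


def demo(data_dir):
    """Simulate a restart: a second instance sees the first one's data."""
    path = os.path.join(data_dir, "vectors.json")
    first = FileBackedVectorStore(path)
    first.add("doc-1", [0.1, 0.2, 0.3])
    second = FileBackedVectorStore(path)  # stands in for "after a restart"
    return second.vectors


if __name__ == "__main__":
    # A temporary directory stands in for the mounted file system here.
    with tempfile.TemporaryDirectory() as data_dir:
        print(demo(data_dir))
```

In a container setup, `data_dir` would be a path inside the container that is backed by a host directory or named volume, so it outlives the container itself.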

fewarren · 2024-09-26T16:21:32Z

Hi Joe,

I ran into the same problem and switched to the mounted file system approach. That solved the persistence issue.

The second problem for me is that the existing NVIDIA example code is extremely slow and unreliable when loading more than a few PDF files into the Milvus database. I am exploring an alternate approach that would allow for a full-scale load into the database. I made some progress but then got deflected and am just now getting back to the issue.

I found an example that I am trying to adapt:

GitHub - ruslanmv/How-to-load-PDF-files-into-Milvus-by-using-Spark: How to ingest and embed PDF files at scale using Spark for Retrieval Augmented Generation.

It may be possible to adapt this to run locally. I am still undecided as to whether to stick with the NVIDIA platform or switch totally to this alternative one. It is critical to find a fast way to load a local file-based Milvus database to create a system that is usable.

I hope you make good progress towards your goals.

Fred
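For the slow and unreliable loading, one general technique is to send records to the database in fixed-size batches and retry a batch when an insert fails. The sketch below is an assumption about what might help. It is not taken from the NVIDIA example code or from the linked Spark project, and the cause of the slowness is not established in this thread. `batched`, `load_in_batches`, and the `insert_batch` callback are hypothetical names; `insert_batch` stands in for whatever call actually writes a list of records to the database.

```python
import time


def batched(items, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def load_in_batches(records, insert_batch, batch_size=100,
                    max_retries=3, backoff_seconds=0.0):
    """Insert records batch by batch, retrying each failed batch.

    insert_batch is a caller-supplied function (hypothetical here) that
    writes one list of records and raises an exception on failure.
    Returns the number of records inserted.
    """
    inserted = 0
    for batch in batched(records, batch_size):
        for attempt in range(1, max_retries + 1):
            try:
                insert_batch(batch)
                inserted += len(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
                # Wait a little longer after each failed attempt.
                time.sleep(backoff_seconds * attempt)
    return inserted


if __name__ == "__main__":
    received = []
    total = load_in_batches(range(250), received.append, batch_size=100)
    print(total, [len(b) for b in received])
```

Batching keeps the number of round trips down, and retrying per batch means a single transient failure does not force the whole load to start over.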

sschaber381 mentioned this issue Apr 19, 2024

Error during build process: failed to fetch http://security.ubuntu.com/ubuntu/dists/jammy-security/InRelease #4

Closed

freemansoft mentioned this issue Jul 18, 2024

Searches all fail if vector database enbabled #13

Closed

winstonsf mentioned this issue Nov 12, 2024

build failed on mac book pro #11

Open
