fix: Agent uses the first configured vector_db_id when documents are provided #1276

dmartinol · 2025-02-26T11:20:48Z

What does this PR do?

The agent API allows querying multiple DBs using the vector_db_ids argument of the RAG tool:

        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {"vector_db_ids": [vector_db_id]},
            }
        ],

This means that multiple DBs can be used to compose an aggregated context by executing the query on each of them.

When documents are passed to the next agent turn, there is no explicit way to configure the vector DB where the embeddings will be ingested. In such cases, we can assume that:

- if any vector_db_ids are given, we use the first one (it probably makes sense to assume that it's the only one in the list; otherwise we should loop over all the given DBs to have a consistent ingestion)
- if no vector_db_ids are given, we can use the current logic to generate a default DB using the default provider. If multiple providers are defined, the API will fail as expected: the user has to provide details on where to ingest the documents.
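The two assumptions above can be sketched as a small selection function. This is only an illustration of the proposed rule, not the code in agent_instance.py; the name `pick_ingestion_db` and the `default_provider_ids` parameter are hypothetical.

```python
from typing import Optional, Sequence


def pick_ingestion_db(
    vector_db_ids: Sequence[str],
    default_provider_ids: Sequence[str],
) -> Optional[str]:
    """Return the DB to ingest documents into, or None to create a default DB.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if vector_db_ids:
        # Proposed rule: ingest into the first configured vector DB.
        return vector_db_ids[0]
    if len(default_provider_ids) > 1:
        # Ambiguous setup: the user has to say where the documents should go.
        raise ValueError("multiple vector-io providers configured; specify a vector DB")
    # The caller falls back to the existing logic that generates a default DB.
    return None
```

With `vector_db_ids=["db-a", "db-b"]` the sketch returns `"db-a"`; with no IDs and a single provider it returns `None`, leaving the default-DB creation to the existing code path.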

(Closes #1270)

Test Plan

The issue description details how to replicate the problem.

leseb · 2025-02-26T13:41:28Z

llama_stack/providers/inline/agents/meta_reference/agent_instance.py

This seems somewhat arbitrary; I think we should at least log that information. What do you think?

Agree with logging of course.
About the "arbitrary" part: what else could we do in this case? Some ideas that come to mind:

- add an explicit config arg to identify the ingestion vector_db?
- extend the concept of session vector_db to store a list of ids?
- stop the execution in case more dbs are given? (when documents are also provided)
- other?

Thanks for the suggestions!

I don’t think the code should be making decisions on behalf of the user, so having a config arg to specify which DB to use in this scenario makes sense to me.

I think an important use case for multiple vector DBs is federating across those vector DBs. That's challenging to do well, and the approach to doing it is different when each vector DB has the same content and when they have different content. When they have the same content, you generally want to specify some sort of unique ID on each chunk and/or document that you can use to recognize when the same result came from two different sources so you can boost that result. All of that is out of scope for this PR of course, but it would be good to design the configuration for which DB to use in a way that reflects that in the future we might want the users to be able to select all and/or a subset and then provide additional configuration details about how to federate across them.
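To make the same-content federation idea concrete, here is a rough sketch of merging hits by a shared chunk ID and boosting chunks that several DBs return. It is out of scope for this PR and is not Llama Stack code: the `(chunk_id, score)` result shape, the `federate` name, the additive boost, and the assumption that scores are comparable across DBs are all illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chunk_id, score) pairs as returned by each vector DB; this shape is an assumption.
Hit = Tuple[str, float]


def federate(results_per_db: Dict[str, List[Hit]], boost: float = 0.1) -> List[Hit]:
    """Merge hits from several DBs, boosting chunks returned by more than one DB."""
    best: Dict[str, float] = {}
    seen_in: Dict[str, int] = defaultdict(int)
    for hits in results_per_db.values():
        # Count each chunk once per DB, even if that DB returned it twice.
        for chunk_id in {cid for cid, _ in hits}:
            seen_in[chunk_id] += 1
        # Keep the best score observed for each chunk across all DBs.
        for chunk_id, score in hits:
            best[chunk_id] = max(best.get(chunk_id, score), score)
    merged = [(cid, score + boost * (seen_in[cid] - 1)) for cid, score in best.items()]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)
```

A chunk found in two DBs with a best score of 0.8 would, with the default boost, rank above a chunk found in only one DB with a score of 0.85.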

@jwm4 👍 on that being out of scope for this PR

@leseb I've added a new field insert_vector_db_id to configure the ingestion DB but I'm wondering how we can document this change. I see that the other argument vector_db_ids has not been well documented either, but just mentioned in code snippets.

yanxi0830 · 2025-02-26T18:36:02Z

I think we need to define the behaviour for when documents are provided. WDYT?

1/ If vector_db_id is not provided, we do not perform any indexing and send the raw document content.
2/ If inserted_vector_db_id is provided for the document, we index the document into inserted_vector_db_id.
3/ When multiple vector_db_ids are provided in AgentConfig, but inserted_vector_db_id is not provided, we follow behaviour of (1)?

Reference discussion in #1118 (comment)

cc @hardikjshah

dmartinol · 2025-02-26T21:34:36Z

1/ If vector_db_id is not provided, we do not perform any indexing and send the raw document content.

If we want to provide an option to send the whole document content in the context, what about adding a new builtin vector-io provider that implements this logic? (in a separate issue/PR) This would also reuse the existing PDF parsing logic.

2/ If inserted_vector_db_id is provided for the document, we index the document into inserted_vector_db_id.

+1

3/ When multiple vector_db_ids are provided in AgentConfig, but inserted_vector_db_id is not provided, we follow behaviour of (1)?

I'd use an explicit provider for that instead (see point 1).

=====
What about reviewing the args of the builtin::rag tool as follows:

- documents_db_id: the DB to store the given documents (also used to retrieve context)
- vector_db_ids: additional DBs used only to query the context

Sample specification:

empty documents:
- empty or given documents_db_id:
  - context is retrieved from vector_db_ids

given documents:
- empty documents_db_id: raise ValueError
- given documents_db_id:
  - use it to ingest the document chunks
  - context is retrieved from documents_db_id + vector_db_ids

This solution is similar to the latest commit, but removes some implicit behaviors.
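The sample specification can be read as the following decision function. This is a sketch of the proposal only; `resolve_rag_dbs` and its return shape (ingestion DB, list of DBs to query) are hypothetical, and it follows the listing literally, so documents_db_id is queried only when documents are given.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_rag_dbs(
    documents: Sequence[object],
    documents_db_id: Optional[str],
    vector_db_ids: Sequence[str],
) -> Tuple[Optional[str], List[str]]:
    """Return (DB to ingest into, DBs to query) per the sample specification.

    Hypothetical helper: name and return shape are assumptions for illustration.
    """
    if not documents:
        # No documents: nothing to ingest, context comes from vector_db_ids.
        return None, list(vector_db_ids)
    if not documents_db_id:
        # Documents without an explicit target DB are rejected.
        raise ValueError("documents were provided but documents_db_id is not set")
    # Ingest into documents_db_id and query it together with the other DBs,
    # skipping duplicates so the same DB is not queried twice.
    query_ids = [documents_db_id] + [v for v in vector_db_ids if v != documents_db_id]
    return documents_db_id, query_ids
```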

Notes:

- this option would allow us to entirely remove the vector_db_id field from the session_info.
- also, we can remove the logic to generate a default DB when none is given

The API would be clearer if we could change the definition of the documents field to hold both a vector_db_id and a list of Document. Unfortunately, this option may raise some concerns because ATM the DBs are passed to the agent config but the documents are defined in the create turn API.

Reference discussion in #1118 (comment)
Thanks for linking to the other discussion!

dmartinol · 2025-02-27T14:23:46Z

@leseb @yanxi0830 I prepared a different version with the changes described in the previous comment:

- documents_db_id: the DB to store the given documents (also used to retrieve context)
- vector_db_ids: additional DBs used only to query the context

- this option would allow us to entirely remove the vector_db_id field from the session_info.
- also, we can remove the logic to generate a default DB when none is given

Let me know if you prefer me to submit it instead.
Note: as a side effect, both changes should also impact the example code (rag.md) in this repo and examples from the llama-stack-apps repo. I will take care of both

ehhuang · 2025-02-27T18:36:13Z

It feels odd to me to insert documents into the provided vector db from the RAG tool, since it is an ad-hoc document only for the current thread, and unlikely something that the user expects to persist in the persistent vector db.

How about this:

- Documents are added to an ephemeral thread-level vector DB (with some TTL), which we create automatically when documents are present.
- If the user wants to add documents to their persistent vector DB, they can also do so explicitly by calling the existing API.
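A toy sketch of the ephemeral, TTL-bound DB idea, to show the lifecycle being proposed. Everything here is hypothetical (`SessionDbRegistry`, `EphemeralDb`, the `session-<id>` naming, and the plain list standing in for a real vector store); it does not reflect how Llama Stack registers vector DBs.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EphemeralDb:
    db_id: str
    expires_at: float
    documents: List[str] = field(default_factory=list)


class SessionDbRegistry:
    """Hypothetical registry: one short-lived DB per session, created on demand."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._dbs: Dict[str, EphemeralDb] = {}

    def add_documents(
        self, session_id: str, documents: List[str], now: Optional[float] = None
    ) -> str:
        now = time.time() if now is None else now
        db = self._dbs.get(session_id)
        if db is None or db.expires_at <= now:
            # No DB yet, or the TTL elapsed: start from a fresh, empty DB.
            db = EphemeralDb(db_id=f"session-{session_id}", expires_at=now + self._ttl)
            self._dbs[session_id] = db
        db.documents.extend(documents)
        return db.db_id
```

Documents attached within the TTL accumulate in the same session DB; after expiry the next turn with documents starts over, and the persistent DBs listed in vector_db_ids are never written to.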

jwm4 · 2025-02-27T19:54:01Z

@ehhuang , I don't think I understand your comment. When you say this:

If user wants to add documents to their persistent vector DB, they can also do so explicitly by calling the existing API.

What do you mean by "the existing API"? Are you referring to the API for that vector DB, or some Llama Stack API? If the latter, then which API is that?

FWIW, I do think the following snippet from the Quick Start guide is a little odd:

# Register a vector database
vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    provider_id=provider_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)

# Insert the documents into the vector database
client.tool_runtime.rag_tool.insert(
    documents=documents,
    vector_db_id=vector_db_id,
    chunk_size_in_tokens=512,
)

Specifically, I would have expected that instead of a client.tool_runtime.rag_tool.insert command, there would be something like a client.vector_dbs.insert command for inserting into the vector DB that I just registered and then a client.tool_runtime.rag_tool command of some sort for pointing the RAG tool at that vector DB. With that said, given this code, I do kind of expect that an index with ID vector_db_id is created and persisted (and the documents are inserted into it) in whatever vector DB I have configured as my vector DB provider when I run this code. That could be an ephemeral thread-level database if I specified an ephemeral in-line vector DB provider, but if I configured a remote provider then that's what I would expect to be used here.

ehhuang · 2025-02-27T20:17:26Z

Yea sorry I was referring to client.tool_runtime.rag_tool.insert to insert documents. So to rephrase,

When documents is used with create_turn, an ephemeral db, associated with the session, is created and the documents are inserted there, independent of the vector_db_ids passed in with the rag_tool.
If users want to insert documents into some persisted vector_db, they use client.tool_runtime.rag_tool.insert.

dmartinol · 2025-02-27T22:06:42Z

When documents is used with create_turn, an ephemeral db, associated with the session, is created and the documents are inserted there, independent of the vector_db_ids passed in with the rag_tool.

Hey, thanks for your input!
I’m new to the project, but AFAIK the docs suggest also using the agent to ingest into a (persistent) vector db.
Also, I’m a bit concerned about the “ephemeral” db, since we expect this db to support document embeddings and similarity search queries, through the regular sequence of insert and query functions (which includes PDF handling). What provider do you have in mind?

Note that the initial problem tracked by the associated issue is that the current implementation cannot create a default db for the session when multiple providers are configured: let’s find a solution that does not cause the same issue again while attempting to change the behavior 😉

dmartinol · 2025-02-28T07:34:45Z

Updated the PR to:

Manage two separate arguments in the RAG tool configuration:

        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {
                    "vector_db_ids": [_DB_IDS_FOR_QUERY_PURPOSES_],  # Optional
                    "documents_db_id": _DB_ID_FOR_INGESTION_PURPOSES_,  # If provided, it's also used at query time
                },
            }
        ],

Remove vector database ID from the session info
Update sample code in RAG docs

ehhuang · 2025-02-28T08:09:35Z

Note that the initial problem tracked by the associated issue is that the current implementation cannot create a default db for the session when multiple providers are configured: let’s find a solution that does not cause the same issue again while attempting to change the behavior 😉

Sorry I read into this issue and code in more detail (am pretty new to this project too). I realized that what I suggested above was already the current behavior, except that it broke (i.e. need to specify the provider_id). Can we just choose a provider_id from available ones arbitrarily? Alternatively, we choose one provider_id and throw an error if it's not available and when documents are provided.

Re. documents_db_id, thanks for putting up the solution. My concern here is the added complexity. My understanding of the point of the documents feature is convenience: instead of having to set up a vector db for a session and ingesting documents manually, all you need to do is attach documents to a message and the work would be done for you.

With documents_db_id, users need to set up the vector db and manage it with the session. All that documents does then is save the one line that inserts them into documents_db_id, compared to not using documents in the message. IMO this no longer justifies the complexity of having to learn about this new concept.

dmartinol · 2025-02-28T08:31:18Z

Sorry I read into this issue and code in more detail (am pretty new to this project too). I realized that what I suggested above was already the current behavior, except that it broke (i.e. need to specify the provider_id). Can we just choose a provider_id from available ones arbitrarily?

Being "arbitrary" was the first comment I received, which then started the journey about trying to review the API and its behavior, my fault.
I will get back to the first option, I also think it's the best thing we can do w/o altering the original behavior. Also, it's a bit of a corner case as I'm not expecting many real setups with multiple providers for vector DBs.

If we want to change the API and behavior, we'll track and discuss it with a separate issue.

Re. documents_db_id, thanks for putting up the solution. My concern here is the added complexity. My understanding of the point of the documents feature is convenience: instead of having to set up a vector db for a session and ingesting documents manually, all you need to do is attach documents to a message and the work would be done for you.

Agree: let's move on with the simpler solution then. It just has to be clear that this is a one-time consumption of these documents.

dmartinol · 2025-02-28T10:44:20Z

- Reverted all changes that modified the API behavior; the first configured provider is now used when no provider_id is specified.
- Updated test_chat_agent to align with the latest changes.
- Note that some changes are needed in the llama-stack-apps examples, as the RAG memory tool only accepts tool_prompt_format="python_list" instead of tool_prompt_format="json". Created another PR for that: "fix: Fixing tool prompt format" (llama-stack-apps#196).
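The fallback described above (use the first configured provider when no provider_id is specified) amounts to something like the sketch below. It is not the merged code: `choose_provider_id` is a hypothetical name, and raising an error when no provider is configured is an assumption, since the thread only says the first one is used "if any".

```python
from typing import Optional, Sequence


def choose_provider_id(requested: Optional[str], configured: Sequence[str]) -> str:
    """Pick the provider used to register the session's vector DB.

    Hypothetical helper: the name and the error case are assumptions.
    """
    if requested is not None:
        # An explicit provider_id always wins.
        return requested
    if not configured:
        # Assumption: with no provider at all there is nothing to fall back to.
        raise ValueError("no vector-io provider is configured")
    # Fallback from this PR: take the first configured provider.
    return configured[0]
```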

ehhuang · 2025-02-28T18:41:43Z

LG. Thanks! @dmartinol

dmartinol · 2025-03-04T10:52:42Z

can we at least merge the changes to the broken UT?

facebook-github-bot added the CLA Signed label Feb 26, 2025

dmartinol marked this pull request as ready for review February 26, 2025 11:21

dmartinol requested review from ashwinb, yanxi0830, hardikjshah, dltn, raghotham, dineshyv, vladimirivic, sixianyi0721, ehhuang and terrytangyuan as code owners February 26, 2025 11:21

leseb reviewed Feb 26, 2025

leseb approved these changes Feb 27, 2025

dmartinol added 4 commits February 28, 2025 09:43

adding the 1st configured vector_db_id, if any

3076977

Signed-off-by: Daniele Martinoli <[email protected]>

renamed insert_vector_db_id to documents_db_id, removed vector_db_id …

aa546de

…from session info Signed-off-by: Daniele Martinoli <[email protected]>

restored from upstream

5ca575e

Signed-off-by: Daniele Martinoli <[email protected]>

fixed test_chat_agent

1181754

Signed-off-by: Daniele Martinoli <[email protected]>

dmartinol force-pushed the fix_multi_vector_io branch from 3e7b351 to 1181754 Compare February 28, 2025 09:11

fixed RAG doc (broken URL)

2d0ad6b

Signed-off-by: Daniele Martinoli <[email protected]>

In case of missing provider_id, use the first one (if any) to registe…

ff3384d

…r an ephemeral vector db Signed-off-by: Daniele Martinoli <[email protected]>

ehhuang approved these changes Feb 28, 2025

shethaadit approved these changes Mar 5, 2025

ehhuang merged commit fb99868 into llamastack:main Mar 5, 2025
4 checks passed
