We should revisit how we split our documents into chunks and how each chunk is represented as an embedding vector.
This matters because the chunks need to be split in a way that makes sense, so as a starting point the default could be to split on the paragraphs found in a PDF document.
Afterwards, we should work out which splitting mechanism is the most efficient and the most relevant to our use cases, so that we can apply a different one to each document type.
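
As a rough illustration of the idea above, here is a minimal sketch in Python. It assumes the PDF text has already been extracted to a plain string in which paragraphs are separated by blank lines, which is not true of every extractor. The names split_into_paragraphs, split_into_lines, SPLITTERS, and chunk_document are hypothetical and chosen only for this example, and the "log" document type is an invented placeholder for a second kind of document. The embedding step is left out on purpose: each returned chunk would simply be passed to whatever embedding model we end up using.

```python
import re
from typing import Callable, Dict, List


def split_into_paragraphs(text: str, min_chars: int = 1) -> List[str]:
    """Split text on blank lines and normalise whitespace inside each paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    """
    blocks = re.split(r"\n\s*\n", text)
    # Join wrapped lines and collapse runs of whitespace within a paragraph.
    paragraphs = [" ".join(block.split()) for block in blocks]
    # Drop empty (or too short) fragments left over from the split.
    return [p for p in paragraphs if len(p) >= min_chars]


def split_into_lines(text: str) -> List[str]:
    """Alternative splitter: one chunk per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical registry mapping a document type to its splitting mechanism.
SPLITTERS: Dict[str, Callable[[str], List[str]]] = {
    "pdf": split_into_paragraphs,
    "log": split_into_lines,
}


def chunk_document(text: str, doc_type: str) -> List[str]:
    """Pick the splitter for doc_type, falling back to paragraph splitting."""
    splitter = SPLITTERS.get(doc_type, split_into_paragraphs)
    return splitter(text)
```

Keeping the splitters behind a small registry means the paragraph default can stay in place while other mechanisms are tried per document type, without touching the code that calls chunk_document.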