3 llm workflow #4
Conversation
@daavoo Setup and demo works on Codespaces with no issues. I was doing some testing against the EU AI Act pdf doc, and it was having some difficulty getting answers. For example, for this question: Then I get: I think this happens because the 'C)' in the question looks like a section, and the model seems to somehow find that as a 'section result', then retrieve it and hit an error. This happens for all questions with a multiple-choice style. It suggests the LLM is not actually restricting itself well to retrieving from within Even if I remove the A, B, C, D options from the question, it still tries to retrieve sections that do not exist. |
Is it the full doc? Can you send me the link? |
@daavoo - here's the full EU AI Act pdf. It is very long, so perhaps you're right in terms of input context. I will re-do some testing with smaller sections tomorrow. |
thanks!
When I did the initial testing I was using individual chapters I created by splitting the pdf |
@daavoo - New pre-processing seems to work well and quickly! I'm still having difficulties getting correct answers though. I tested against this paper Some examples: example 1:
example 2:
I wonder if it's something we can improve with the quality of the prompts? Or could we write some logic so that if a section doesn't exist, the model picks a new section to look at so that it doesn't break. |
I think it might be more related to using a better instruct model.
This we can try to work around in the code, but I also think a better model should be able to follow the instruction of picking a name from the list. |
@daavoo I did some manual experimentation and testing against this paper for 7 different questions:
FYI This is the find prompt that gave me that result: |
@stefanfrench I think that could be a reasonable default to have. I was running out of memory on codespaces so I tested your prompt with Qwen/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf. Do you think we can merge this (if you have confirmed that the logic works) so then I can move to work on the benchmark code to test different ones (so we can pick the "best" default)? |
Preprocessing
Uses pymupdf4llm to convert input_file to markdown.
Then uses langchain_text_splitters to split the markdown into sections based on the headers.
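The header-based splitting step can be illustrated without the actual libraries. The sketch below is a plain-Python stand-in for what `langchain_text_splitters` does here (split markdown into header-keyed sections); the function name and the markdown-conversion step via pymupdf4llm are assumed, not taken from the repo.

```python
# Minimal stand-in for the header-based splitting step. In the real workflow,
# pymupdf4llm converts the PDF to markdown first; here we start from markdown.
def split_markdown_by_headers(md: str) -> dict[str, str]:
    """Split a markdown string into {header: body} sections."""
    sections: dict[str, str] = {}
    current = None
    for line in md.splitlines():
        if line.startswith("#"):
            # A header line starts a new section keyed by its title.
            current = line.lstrip("#").strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections

md = "# Intro\nSome text.\n## Methods\nMore text.\n"
sections = split_markdown_by_headers(md)
print(sorted(sections))  # ['Intro', 'Methods']
```

The section titles produced here are what the model later picks from, which is why a stray `'C)'` in a question can collide with the section list.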
Workflow
Uses a single model and 2 different prompts:
Runs in a loop until the correct answer is found, an invalid section is queried, or there are no sections left.
The process can be followed in the `debug` logs.
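The loop described above can be sketched as follows. `find_section` and `answer_from_section` stand in for the two prompts sent to the single model; their names and signatures are illustrative assumptions, not the actual code.

```python
# Hedged sketch of the workflow loop: pick a section, try to answer from it,
# stop on a correct answer, an invalid section, or when no sections remain.
def run_workflow(question, sections, find_section, answer_from_section):
    remaining = dict(sections)
    while remaining:
        name = find_section(question, list(remaining))
        if name not in remaining:
            return None  # invalid section queried -> stop (the failure mode discussed above)
        answer = answer_from_section(question, remaining.pop(name))
        if answer is not None:
            return answer  # correct answer found -> stop
    return None  # no sections left

# Toy stand-ins for the two prompts:
find = lambda q, names: names[0]
answer = lambda q, text: "42" if "answer" in text else None
result = run_workflow("q", {"A": "noise", "B": "the answer"}, find, answer)
print(result)  # 42
```

Returning `None` when the model names a nonexistent section matches the hallucinated-section failure discussed in the conversation; a retry with a corrected section list would be one way to work around it in code.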