
3 llm workflow #4

Merged: daavoo merged 27 commits into main from 3-llm-workflow on Jan 13, 2025
Conversation

daavoo (Contributor) commented Dec 27, 2024

Preprocessing

Uses pymupdf4llm to convert input_file to markdown.
Then uses langchain_text_splitters to split the markdown into sections based on the headers.
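
For reference, a minimal sketch of this preprocessing step; the file path, header names, and variable names are illustrative and not the PR's actual code:

```python
# Illustrative sketch of the preprocessing described above (not the PR's code).
import pymupdf4llm
from langchain_text_splitters import MarkdownHeaderTextSplitter

input_file = "example_data/paper.pdf"  # hypothetical input file

# Convert the input PDF to markdown.
md_text = pymupdf4llm.to_markdown(input_file)

# Split the markdown into sections based on the headers.
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)
sections = splitter.split_text(md_text)  # list of Documents, one per section
```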

Workflow

Uses a single model and 2 different prompts:

  • To find the appropriate section based on the question.
  • To answer the question using the information available in the section previously found.

Runs in a loop until the correct answer is found, an invalid section is queried, or there are no sections left.

The process can be followed in the debug logs.
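
For reference, a minimal sketch of what such a find/answer loop can look like; the prompt templates, the `model` callable, and the `MORE_INFO` sentinel are placeholders rather than the PR's actual code:

```python
# Illustrative sketch of the loop described above (not the PR's code).
MORE_INFO = "I need more info"  # placeholder sentinel meaning "answer not in this section"


def find_retrieve_answer(question, sections, model, find_prompt, answer_prompt):
    """sections: dict mapping section name -> section text."""
    remaining = dict(sections)
    while remaining:
        # 1) FIND: ask the model which section should contain the answer.
        section_name = model(
            find_prompt.format(question=question, sections=list(remaining))
        ).strip()
        if section_name not in remaining:
            return None  # invalid section queried -> stop
        section_text = remaining.pop(section_name)

        # 2) ANSWER: try to answer the question using only that section.
        answer = model(
            answer_prompt.format(question=question, section=section_text)
        ).strip()
        if answer != MORE_INFO:
            return answer  # answer found
        # Otherwise loop again and pick another section.
    return None  # no sections left
```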


  • Codespaces Setup
    bash .github/setup.sh
  • Demo
    python -m streamlit run demo/app.py
    (screenshot of the Streamlit demo)
  • CLI
    structured-qa --from_config example_data/config.yaml \
      --question "How many and what GPUs were used to train the model?"
    (screenshot of the CLI output)

daavoo linked an issue Dec 27, 2024 that may be closed by this pull request
daavoo self-assigned this Dec 27, 2024
daavoo marked this pull request as ready for review January 2, 2025 10:22
daavoo requested a review from a team January 2, 2025 10:22
stefanfrench (Contributor) commented Jan 2, 2025

@daavoo Setup and demo works on Codespaces with no issues.

I was doing some testing against the EU AI Act pdf doc, and it was having some difficulties getting answers. For example, for this question:
"What is the threshold, measured in floating point operations, that leads to a presumption that a general-purpose AI model has systemic risk?
A) 10^15, B) 10^20, C) 10^25"

Then I get:
Finding section
2025-01-02 15:32:39.361 | DEBUG | structured_qa.workflow:find_retrieve_answer:77 - Result: C) 10^25
2025-01-02 15:32:39.361 | INFO | structured_qa.workflow:find_retrieve_answer:81 - Retrieving section: C) 10^25
2025-01-02 15:32:39.361 | ERROR | structured_qa.workflow:find_retrieve_answer:88 - Unknown section: C) 10^25
2025-01-02 15:33:08.485 | DEBUG | structured_qa.workflow:find_retrieve_answer:52 - Current information available: None

I think this happens because the 'C)' in the question looks like a section, and it seems to be somehow finding that as a 'section result', then retrieving it and getting an error.

This happens for all questions that have a multiple-choice style. It seems like the LLM is not actually restricting itself to retrieving from within {SECTIONS}, and ends up searching for sections that don't exist.

Even if I remove the A, B, C, D options from the question, it still tries to retrieve sections that do not exist.

daavoo (Contributor, Author) commented Jan 2, 2025

> EU AI Act pdf doc

Is it the full doc? Can you send me the link?
I think what you describe might happen if there are a lot of sections, as the current model struggles with long input contexts.

stefanfrench (Contributor)

@daavoo - here's the full EU AI Act pdf. It is very long, so perhaps you're right in terms of input context. I will re-do some testing with smaller sections tomorrow.

daavoo (Contributor, Author) commented Jan 3, 2025

> @daavoo - here's the full EU AI Act pdf

thanks!

> It is very long, so perhaps you're right in terms of input context. I will re-do some testing with smaller sections tomorrow.

When I did the initial testing, I was using individual chapters I created by splitting the pdf.

stefanfrench (Contributor) commented Jan 3, 2025

@daavoo - New pre-processing seems to work well and quickly!

I'm still having difficulties getting correct answers, though. I tested against this paper.

Some examples:

example 1:

  • Question: How many large language models were evaluated?
  • Checks sections: ["5 symbolic reasoning","8 conclusions","c extended related work"]
  • Then tries to check section 'c', which doesn't exist, so it causes an error

example 2:

  • Question: How many benchmarks were used to evaluate arithmetic reasoning?
  • Checks: ["5 symbolic reasoning"]; logically, it should be checking section "3 arithmetic reasoning"
  • Then tries checking "conclusions", which doesn't exist, so it causes an error

I wonder if it's something we can improve with the quality of the prompts? Or can we write some logic so that if a section doesn't exist, the model comes up with a new section to look at, so that it doesn't break.

daavoo (Contributor, Author) commented Jan 3, 2025

> I wonder if it's something we can improve with the quality of the prompts

I think it might be more related to using a better instruct model.
Right now it is a 1.7B-parameter model; we probably need something in the 8B range.
I will do some tests with a bigger one.

> Or can we write some logic so that if a section doesn't exist, the model comes up with a new section to look at, so that it doesn't break.

This we can try to work around in the code, but I also think a better model should be able to follow the instruction of picking a name from the list.
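
For illustration, one possible shape of that code workaround (not what this PR implements): snap an unknown section name returned by the model onto the closest known section via fuzzy matching. The function name and the cutoff value are arbitrary.

```python
# Illustrative sketch of the workaround discussed above (not implemented in this PR).
import difflib


def resolve_section(model_output, section_names):
    """Map the model's raw output to a known section name, or None if nothing is close."""
    if model_output in section_names:
        return model_output
    # Fall back to the closest known section name, if any is similar enough.
    matches = difflib.get_close_matches(model_output, section_names, n=1, cutoff=0.6)
    return matches[0] if matches else None
```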

stefanfrench (Contributor) commented Jan 8, 2025

> I wonder if it's something we can improve with the quality of the prompts

> I think it might be more related to using a better instruct model. Right now it is a 1.7B-parameter model; we probably need something in the 8B range. I will do some tests with a bigger one.

> Or can we write some logic so that if a section doesn't exist, the model comes up with a new section to look at, so that it doesn't break.

> This we can try to work around in the code, but I also think a better model should be able to follow the instruction of picking a name from the list.

@daavoo I did some manual experimentation and testing against this paper for 7 different questions:

  • First I changed the model to bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf. This made the results go from 0/7 correct to 2/7 correct
  • I iterated on the FIND_PROMPT a few times and managed to increase from 2/7 to 7/7 correct

FYI This is the find prompt that gave me that result:
find_prompt.txt

daavoo (Contributor, Author) commented Jan 10, 2025

> @daavoo I did some manual experimentation and testing against this paper for 7 different questions

@stefanfrench I think that could be a reasonable default to have. I was running out of memory on codespaces so I tested your prompt with Qwen/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf.

Do you think we can merge this (if you have confirmed that the logic works) so that I can move on to the benchmark code to test different models (so we can pick the "best" default)?

daavoo merged commit 930bf64 into main Jan 13, 2025
3 checks passed
daavoo deleted the 3-llm-workflow branch January 13, 2025 14:05
daavoo linked an issue Jan 13, 2025 that may be closed by this pull request
Successfully merging this pull request may close these issues: LLM workflow, Implement preprocessing module.