Commit 28ae1d9

9 add notebook demo (#10)

* Update API
* Update index
* Fix repo_url
* Update getting-started
* Updates with model loader
* Update step-by-step guide
* Update defaults
* Update contributions
* reorder
* Lint
* Fix import
* fix test_workflow
* Update README.md
* Add demo/notebook
* Add heading
* Remove outdated text
* Update Python version
* Update name
1 parent 1868b90 commit 28ae1d9

8 files changed, +262 -27 lines changed

README.md

Lines changed: 7 additions & 7 deletions
@@ -1,29 +1,29 @@
 <p align="center"><img src="./images/Blueprints-logo.png" width="35%" alt="Project logo"/></p>
 
-# Structured-Q&A: a Blueprint by Mozilla.ai for answering questions about structured documents.
+# Structured-QA: a Blueprint by Mozilla.ai for answering questions about structured documents.
 
 
 [![](https://dcbadge.limes.pink/api/server/YuMNeuKStr?style=flat)](https://discord.gg/YuMNeuKStr)
-[![Docs](https://github.com/mozilla-ai/structured-q-a/actions/workflows/docs.yaml/badge.svg)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/docs.yaml/)
-[![Tests](https://github.com/mozilla-ai/structured-q-a/actions/workflows/tests.yaml/badge.svg)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/tests.yaml/)
-[![Ruff](https://github.com/mozilla-ai/structured-q-a/actions/workflows/lint.yaml/badge.svg?label=Ruff)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/lint.yaml/)
+[![Docs](https://github.com/mozilla-ai/structured-qa/actions/workflows/docs.yaml/badge.svg)](https://github.com/mozilla-ai/structured-qa/actions/workflows/docs.yaml/)
+[![Tests](https://github.com/mozilla-ai/structured-qa/actions/workflows/tests.yaml/badge.svg)](https://github.com/mozilla-ai/structured-qa/actions/workflows/tests.yaml/)
+[![Ruff](https://github.com/mozilla-ai/structured-qa/actions/workflows/lint.yaml/badge.svg?label=Ruff)](https://github.com/mozilla-ai/structured-qa/actions/workflows/lint.yaml/)
 
 
 This Blueprint demonstrates how to use open-source models and a simple LLM workflow to answer questions based on structured documents.
 
 It is designed to showcase a simpler alternative to more complex and/or resource demanding alternatives, such as RAG systems that rely on vectorDBs and/or long-context models with large token windows.
 
 
-### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/structured-q-a/).
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/structured-qa/).
 
 
 ## Quick-start
 
-Get started with structured-q-a using one of the options below:
+Get started with structured-qa using one of the options below:
 
 | Google Colab | HuggingFace Spaces | GitHub Codespaces |
 | -------------| ------------------- | ----------------- |
-| [![Try on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/structured-q-a/blob/main/demo/notebook.ipynb) | [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/mozilla-ai/structured-q-a) | [![Try on Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb) |
+| [![Try on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/structured-qa/blob/main/demo/notebook.ipynb) | [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/mozilla-ai/structured-qa) | [![Try on Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb) |
 
 Alternatively, you can install it from pypi:
 
demo/app.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ def convert_to_sections(uploaded_file, output_dir):
 )
 
 
-st.title("Structured Q&A")
+st.title("Structured QA")
 
 st.header("Uploading Data")

demo/notebook.ipynb

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Structured Q&A"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Source code: https://github.com/mozilla-ai/structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Docs: https://mozilla-ai.github.io/structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## GPU Check"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, you'll need to enable GPUs for the notebook:\n",
+    "\n",
+    "- Navigate to `Edit`→`Notebook Settings`\n",
+    "- Select T4 GPU from the Hardware Accelerator section\n",
+    "- Click `Save` and accept.\n",
+    "\n",
+    "Next, we'll confirm that we can connect to the GPU:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "if not torch.cuda.is_available():\n",
+    "    raise RuntimeError(\"GPU not available\")\n",
+    "else:\n",
+    "    print(\"GPU is available!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installing dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --quiet https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu122/llama_cpp_python-0.3.4-cp311-cp311-linux_x86_64.whl\n",
+    "%pip install --quiet structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Uploading data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google.colab import files\n",
+    "\n",
+    "uploaded = files.upload()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Converting document to a directory of sections"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "from structured_qa.preprocessing import document_to_sections_dir\n",
+    "\n",
+    "input_file = list(uploaded.keys())[0]\n",
+    "sections_dir = f\"output/{Path(input_file).stem}\"\n",
+    "section_names = document_to_sections_dir(input_file, sections_dir)\n",
+    "section_names"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from structured_qa.model_loaders import load_llama_cpp_model\n",
+    "\n",
+    "model = load_llama_cpp_model(\n",
+    "    \"bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Find, Retrieve, and Answer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "FIND_PROMPT = \"\"\"\n",
+    "You are given two pieces of information:\n",
+    "1. A user question.\n",
+    "2. A list of valid section names.\n",
+    "\n",
+    "Your task is to:\n",
+    "- Identify exactly one `section_name` from the provided list that seems related to the user question.\n",
+    "- Return the `section_name` exactly as it appears in the list.\n",
+    "- Do NOT return any additional text, explanation, or formatting.\n",
+    "- Do NOT combine multiple section names into a single response.\n",
+    "\n",
+    "Here is the list of valid `section_names`:\n",
+    "\n",
+    "```\n",
+    "{SECTIONS}\n",
+    "```\n",
+    "\n",
+    "Now, based on the input question, return the single most relevant `section_name` from the list.\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ANSWER_PROMPT = \"\"\"\n",
+    "You are a rigorous assistant answering questions.\n",
+    "You only answer based on the current information available.\n",
+    "\n",
+    "The current information available is:\n",
+    "\n",
+    "```\n",
+    "{CURRENT_INFO}\n",
+    "```\n",
+    "\n",
+    "If the current information available is not enough to answer the question,\n",
+    "you must return the following message and nothing else:\n",
+    "\n",
+    "```\n",
+    "I need more info.\n",
+    "```\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "QUESTION = \"What optimizer was used to train the model?\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from structured_qa.workflow import find_retrieve_answer\n",
+    "\n",
+    "find_retrieve_answer(\n",
+    "    question=QUESTION,\n",
+    "    model=model,\n",
+    "    sections_dir=sections_dir,\n",
+    "    find_prompt=FIND_PROMPT,\n",
+    "    answer_prompt=ANSWER_PROMPT,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
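
The notebook's last cells pass `QUESTION`, `FIND_PROMPT`, and `ANSWER_PROMPT` to `structured_qa.workflow.find_retrieve_answer`. To make the control flow of such a loop concrete, here is a hypothetical, self-contained sketch of a find → retrieve → answer loop; it is not the library's implementation. Assumptions: the model is any callable `(system_prompt, question) -> str`, each section is stored as `<section_name>.txt` inside `sections_dir`, and the loop gives up after `max_steps` attempts. `FIND_PROMPT` and `ANSWER_PROMPT` below are abbreviated stand-ins that keep only the `{SECTIONS}` and `{CURRENT_INFO}` placeholders, and `find_retrieve_answer_sketch` and `fake_model` are illustrative names.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface (assumption): (system_prompt, question) -> reply.
Model = Callable[[str, str], str]

# Abbreviated stand-ins for the notebook's FIND_PROMPT / ANSWER_PROMPT;
# only the {SECTIONS} and {CURRENT_INFO} placeholders are kept.
FIND_PROMPT = "Return exactly one section_name from this list:\n{SECTIONS}\n"
ANSWER_PROMPT = (
    "Answer only from this information:\n{CURRENT_INFO}\n"
    "If it is not enough, reply: I need more info.\n"
)
NEED_MORE_INFO = "I need more info."


def find_retrieve_answer_sketch(
    question: str, model: Model, sections_dir: str, max_steps: int = 3
) -> Tuple[Optional[str], List[str]]:
    """Illustrative find -> retrieve -> answer loop, not structured_qa's code."""
    # Assumption: one <section_name>.txt file per section in sections_dir.
    sections = sorted(p.stem for p in Path(sections_dir).glob("*.txt"))
    visited: List[str] = []
    for _ in range(max_steps):
        remaining = [s for s in sections if s not in visited]
        if not remaining:
            break
        # FIND: ask the model to pick one section that has not been tried yet.
        find_system = FIND_PROMPT.format(SECTIONS="\n".join(remaining))
        chosen = model(find_system, question).strip()
        if chosen not in remaining:
            continue  # not an exact name from the list; spend a step and retry
        visited.append(chosen)
        # RETRIEVE: load only the chosen section's text.
        text = (Path(sections_dir) / f"{chosen}.txt").read_text()
        # ANSWER: answer from that section alone, or signal that it is not enough.
        answer_system = ANSWER_PROMPT.format(CURRENT_INFO=text)
        answer = model(answer_system, question).strip()
        if answer != NEED_MORE_INFO:
            return answer, visited
    return None, visited


def fake_model(system: str, question: str) -> str:
    """Toy stand-in for an LLM, for demonstration only."""
    if system.startswith("Return exactly one"):
        return system.split("\n")[1]  # FIND: pick the first listed section
    return "Adam" if "Adam" in system else NEED_MORE_INFO  # ANSWER


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "introduction.txt").write_text("We study small language models.")
    (Path(tmp) / "training.txt").write_text("We trained with the Adam optimizer.")
    result = find_retrieve_answer_sketch(
        "What optimizer was used to train the model?", fake_model, tmp
    )

print(result)  # ('Adam', ['introduction', 'training'])
```

In this toy run, the first section the model picks does not mention an optimizer, so the answer step returns the literal `I need more info.` message that the notebook's `ANSWER_PROMPT` asks for, and the loop moves on to the next unvisited section. Reading one section per step keeps each prompt small, which is the trade-off the README contrasts with vector-DB RAG systems and long-context models.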

docs/future-features-contributions.md

Lines changed: 4 additions & 4 deletions
@@ -7,18 +7,18 @@ This Blueprint is an evolving project designed to grow with the help of the open
 ## 🌟 **How You Can Contribute**
 
 ### 🛠️ **Enhance the Blueprint**
-- Check the [Issues](https://github.com/mozilla-ai/structured-q-a/issues) page to see if there are feature requests you'd like to implement
-- Refer to our [Contribution Guide](https://github.com/mozilla-ai/structured-q-a/blob/main/CONTRIBUTING.md) for more details on contributions
+- Check the [Issues](https://github.com/mozilla-ai/structured-qa/issues) page to see if there are feature requests you'd like to implement
+- Refer to our [Contribution Guide](https://github.com/mozilla-ai/structured-qa/blob/main/CONTRIBUTING.md) for more details on contributions
 
 ### 🎨 **Extensibility Ideas**
 
 This Blueprint is designed to be a foundation you can build upon. By extending its capabilities, you can open the door to new applications, improve user experience, and adapt the Blueprint to address other use cases. Here are a few ideas for how you can expand its potential:
 
 
-We’d love to see how you can enhance this Blueprint! If you create improvements or extend its capabilities, consider contributing them back to the project so others in the community can benefit from your work. Check out our [Contributions Guide](https://github.com/mozilla-ai/structured-q-a/blob/main/CONTRIBUTING.md) to get started!
+We’d love to see how you can enhance this Blueprint! If you create improvements or extend its capabilities, consider contributing them back to the project so others in the community can benefit from your work. Check out our [Contributions Guide](https://github.com/mozilla-ai/structured-qa/blob/main/CONTRIBUTING.md) to get started!
 
 ### 💡 **Share Your Ideas**
-Got an idea for how this Blueprint could be improved? You can share your suggestions through [GitHub Issues](https://github.com/mozilla-ai/structured-q-a/issues).
+Got an idea for how this Blueprint could be improved? You can share your suggestions through [GitHub Issues](https://github.com/mozilla-ai/structured-qa/issues).
 
 ### 🌍 **Build New Blueprints**
 This project is part of a larger initiative to create a collection of reusable starter code solutions that use open-source AI tools. If you’re inspired to create your own Blueprint, you can use the [Blueprint-template](https://github.com/new?template_name=Blueprint-template&template_owner=mozilla-ai) to get started.

docs/getting-started.md

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,4 @@
-Get started with Structured-Q-A using one of the options below:
+Get started with Structured-QA using one of the options below:
 
 ---
 
@@ -29,7 +29,7 @@ Get started with Structured-Q-A using one of the options below:
 You can install the project from Pypi:
 
 ```bash
-pip install structured-q-a
+pip install structured-qa
 ```
 
 Check the [Command Line Interface](./cli.md) guide.
@@ -41,8 +41,8 @@ Get started with Structured-Q-A using one of the options below:
 1. **Clone the Repository**
 
 ```bash
-git clone https://github.com/mozilla-ai/structured-q-a.git
-cd structured-q-a
+git clone https://github.com/mozilla-ai/structured-qa.git
+cd structured-qa
 ```
 
 2. **Install the project and its Dependencies**

docs/index.md

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,12 @@
-# **Structured-Q-A Blueprint**
+# **Structured-QA Blueprint**
 
 <div style="text-align: center;">
 <img src="images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
 </div>
 
 Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.
 
-These docs are your companion to mastering the **Structured-Q-A Blueprint** a local-first approach for answering questions about your structured documents.
+These docs are your companion to mastering the **Structured-QA Blueprint** a local-first approach for answering questions about your structured documents.
 
 ### Built with
 - Python 3.10+
@@ -15,7 +15,7 @@ These docs are your companion to mastering the **Structured-Q-A Blueprint** a lo
 ---
 
 ### 🚀 **Get Started Quickly**
-#### _Start building your own Structured-Q-A pipeline in minutes:_
+#### _Start building your own Structured-QA pipeline in minutes:_
 - **[Getting Started](getting-started.md):** Quick setup and installation instructions.
 
 ### 🔍 **Understand the System**

docs/step-by-step-guide.md

Lines changed: 1 addition & 5 deletions
@@ -1,8 +1,4 @@
-# **Step-by-Step Guide: How the Structured-Q-A Blueprint Works**
-
-Transforming static documents into engaging podcast episodes involves an integration of pre-processing, LLM-powered transcript generation, and text-to-speech generation. Here's how it all works under the hood:
-
----
+# **Step-by-Step Guide: How the Structured-QA Blueprint Works**
 
 ## **Overview**
 
mkdocs.yml

Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
-site_name: Structured Q&A
-repo_url: https://github.com/mozilla-ai/structured-q-a
-repo_name: structured-q-a
+site_name: Structured QA
+repo_url: https://github.com/mozilla-ai/structured-qa
+repo_name: structured-qa
 
 nav:
 - Home: index.md
