9 add notebook demo (#10)

daavoo · web-flow · commit 28ae1d9af534 · 2025-01-17T10:16:18.000+01:00
* Update API

* Update index

* Fix repo_url

* Update getting-started

* Updates with model loader

* Update step-by-step guide

* Update defaults

* Update contributions

* reorder

* Lint

* Fix import

* fix test_workflow

* Update README.md

* Add demo/notebook

* Add heading

* Remove outdated text

* Update Python version

* Update name
diff --git a/README.md b/README.md
@@ -1,29 +1,29 @@
 <p align="center"><img src="./images/Blueprints-logo.png" width="35%" alt="Project logo"/></p>
 
-# Structured-Q&A: a Blueprint by Mozilla.ai for answering questions about structured documents.
+# Structured-QA: a Blueprint by Mozilla.ai for answering questions about structured documents.
 
 
 [![](https://dcbadge.limes.pink/api/server/YuMNeuKStr?style=flat)](https://discord.gg/YuMNeuKStr)
-[![Docs](https://github.com/mozilla-ai/structured-q-a/actions/workflows/docs.yaml/badge.svg)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/docs.yaml/)
-[![Tests](https://github.com/mozilla-ai/structured-q-a/actions/workflows/tests.yaml/badge.svg)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/tests.yaml/)
-[![Ruff](https://github.com/mozilla-ai/structured-q-a/actions/workflows/lint.yaml/badge.svg?label=Ruff)](https://github.com/mozilla-ai/structured-q-a/actions/workflows/lint.yaml/)
+[![Docs](https://github.com/mozilla-ai/structured-qa/actions/workflows/docs.yaml/badge.svg)](https://github.com/mozilla-ai/structured-qa/actions/workflows/docs.yaml/)
+[![Tests](https://github.com/mozilla-ai/structured-qa/actions/workflows/tests.yaml/badge.svg)](https://github.com/mozilla-ai/structured-qa/actions/workflows/tests.yaml/)
+[![Ruff](https://github.com/mozilla-ai/structured-qa/actions/workflows/lint.yaml/badge.svg?label=Ruff)](https://github.com/mozilla-ai/structured-qa/actions/workflows/lint.yaml/)
 
 
 This Blueprint demonstrates how to use open-source models and a simple LLM workflow to answer questions based on structured documents.
 
 It is designed to showcase a simpler alternative to more complex and/or resource demanding alternatives, such as RAG systems that rely on vectorDBs and/or long-context models with large token windows.
 
 
-### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/structured-q-a/).
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/structured-qa/).
 
 
 ## Quick-start
 
-Get started with structured-q-a using one of the options below:
+Get started with structured-qa using one of the options below:
 
 | Google Colab | HuggingFace Spaces  | GitHub Codespaces |
 | -------------| ------------------- | ----------------- |
-| [![Try on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/structured-q-a/blob/main/demo/notebook.ipynb) | [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/mozilla-ai/structured-q-a) | [![Try on Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb) |
+| [![Try on Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mozilla-ai/structured-qa/blob/main/demo/notebook.ipynb) | [![Try on Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Try%20on-Spaces-blue)](https://huggingface.co/spaces/mozilla-ai/structured-qa) | [![Try on Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb) |
 
 Alternatively, you can install it from pypi:
 
diff --git a/demo/app.py b/demo/app.py
@@ -24,7 +24,7 @@ def convert_to_sections(uploaded_file, output_dir):
     )
 
 
-st.title("Structured Q&A")
+st.title("Structured QA")
 
 st.header("Uploading Data")
 
diff --git a/demo/notebook.ipynb b/demo/notebook.ipynb
@@ -0,0 +1,239 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Structured Q&A"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Source code: https://github.com/mozilla-ai/structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Docs: https://mozilla-ai.github.io/structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## GPU Check"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, you'll need to enable GPUs for the notebook:\n",
+    "\n",
+    "- Navigate to `Edit`→`Notebook Settings`\n",
+    "- Select T4 GPU from the Hardware Accelerator section\n",
+    "- Click `Save` and accept.\n",
+    "\n",
+    "Next, we'll confirm that we can connect to the GPU:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "if not torch.cuda.is_available():\n",
+    "    raise RuntimeError(\"GPU not available\")\n",
+    "else:\n",
+    "    print(\"GPU is available!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Installing dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --quiet https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.4-cu122/llama_cpp_python-0.3.4-cp311-cp311-linux_x86_64.whl\n",
+    "%pip install --quiet structured-qa"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Uploading data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from google.colab import files\n",
+    "\n",
+    "uploaded = files.upload()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Converting document to a directory of sections"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from pathlib import Path\n",
+    "from structured_qa.preprocessing import document_to_sections_dir\n",
+    "\n",
+    "input_file = list(uploaded.keys())[0]\n",
+    "sections_dir = f\"output/{Path(input_file).stem}\"\n",
+    "section_names = document_to_sections_dir(input_file, sections_dir)\n",
+    "section_names"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Loading model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from structured_qa.model_loaders import load_llama_cpp_model\n",
+    "\n",
+    "model = load_llama_cpp_model(\n",
+    "    \"bartowski/Qwen2.5-3B-Instruct-GGUF/Qwen2.5-3B-Instruct-f16.gguf\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Find, Retrieve, and Answer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "FIND_PROMPT = \"\"\"\n",
+    "You are given two pieces of information:\n",
+    "1. A user question.\n",
+    "2. A list of valid section names.\n",
+    "\n",
+    "Your task is to:\n",
+    "- Identify exactly one `section_name` from the provided list that seems related to the user question.\n",
+    "- Return the `section_name` exactly as it appears in the list.\n",
+    "- Do NOT return any additional text, explanation, or formatting.\n",
+    "- Do NOT combine multiple section names into a single response.\n",
+    "\n",
+    "Here is the list of valid `section_names`:\n",
+    "\n",
+    "```\n",
+    "{SECTIONS}\n",
+    "```\n",
+    "\n",
+    "Now, based on the input question, return the single most relevant `section_name` from the list.\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ANSWER_PROMPT = \"\"\"\n",
+    "You are a rigorous assistant answering questions.\n",
+    "You only answer based on the current information available.\n",
+    "\n",
+    "The current information available is:\n",
+    "\n",
+    "```\n",
+    "{CURRENT_INFO}\n",
+    "```\n",
+    "\n",
+    "If the current information available is not enough to answer the question,\n",
+    "you must return the following message and nothing else:\n",
+    "\n",
+    "```\n",
+    "I need more info.\n",
+    "```\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "QUESTION = \"What optimizer was used to train the model?\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from structured_qa.workflow import find_retrieve_answer\n",
+    "\n",
+    "find_retrieve_answer(\n",
+    "    question=QUESTION,\n",
+    "    model=model,\n",
+    "    sections_dir=sections_dir,\n",
+    "    find_prompt=FIND_PROMPT,\n",
+    "    answer_prompt=ANSWER_PROMPT,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/docs/future-features-contributions.md b/docs/future-features-contributions.md
@@ -7,18 +7,18 @@ This Blueprint is an evolving project designed to grow with the help of the open
 ## 🌟 **How You Can Contribute**
 
 ### 🛠️ **Enhance the Blueprint**
-- Check the [Issues](https://github.com/mozilla-ai/structured-q-a/issues) page to see if there are feature requests you'd like to implement
-- Refer to our [Contribution Guide](https://github.com/mozilla-ai/structured-q-a/blob/main/CONTRIBUTING.md) for more details on contributions
+- Check the [Issues](https://github.com/mozilla-ai/structured-qa/issues) page to see if there are feature requests you'd like to implement
+- Refer to our [Contribution Guide](https://github.com/mozilla-ai/structured-qa/blob/main/CONTRIBUTING.md) for more details on contributions
 
 ### 🎨 **Extensibility Ideas**
 
 This Blueprint is designed to be a foundation you can build upon. By extending its capabilities, you can open the door to new applications, improve user experience, and adapt the Blueprint to address other use cases. Here are a few ideas for how you can expand its potential:
 
 
-We’d love to see how you can enhance this Blueprint! If you create improvements or extend its capabilities, consider contributing them back to the project so others in the community can benefit from your work. Check out our [Contributions Guide](https://github.com/mozilla-ai/structured-q-a/blob/main/CONTRIBUTING.md) to get started!
+We’d love to see how you can enhance this Blueprint! If you create improvements or extend its capabilities, consider contributing them back to the project so others in the community can benefit from your work. Check out our [Contributions Guide](https://github.com/mozilla-ai/structured-qa/blob/main/CONTRIBUTING.md) to get started!
 
 ### 💡 **Share Your Ideas**
-Got an idea for how this Blueprint could be improved? You can share your suggestions through [GitHub Issues](https://github.com/mozilla-ai/structured-q-a/issues).
+Got an idea for how this Blueprint could be improved? You can share your suggestions through [GitHub Issues](https://github.com/mozilla-ai/structured-qa/issues).
 
 ### 🌍 **Build New Blueprints**
 This project is part of a larger initiative to create a collection of reusable starter code solutions that use open-source AI tools. If you’re inspired to create your own Blueprint, you can use the [Blueprint-template](https://github.com/new?template_name=Blueprint-template&template_owner=mozilla-ai) to get started.
diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -1,4 +1,4 @@
-Get started with Structured-Q-A using one of the options below:
+Get started with Structured-QA using one of the options below:
 
 ---
 
@@ -29,7 +29,7 @@ Get started with Structured-Q-A using one of the options below:
       You can install the project from Pypi:
 
       ```bash
-      pip install structured-q-a
+      pip install structured-qa
       ```
 
       Check the [Command Line Interface](./cli.md) guide.
@@ -41,8 +41,8 @@ Get started with Structured-Q-A using one of the options below:
       1. **Clone the Repository**
 
          ```bash
-         git clone https://github.com/mozilla-ai/structured-q-a.git
-         cd structured-q-a
+         git clone https://github.com/mozilla-ai/structured-qa.git
+         cd structured-qa
          ```
 
       2. **Install the project and its Dependencies**
diff --git a/docs/index.md b/docs/index.md
@@ -1,12 +1,12 @@
-# **Structured-Q-A Blueprint**
+# **Structured-QA Blueprint**
 
 <div style="text-align: center;">
   <img src="images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
 </div>
 
 Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.
 
-These docs are your companion to mastering the **Structured-Q-A Blueprint** a local-first approach for answering questions about your structured documents.
+These docs are your companion to mastering the **Structured-QA Blueprint** a local-first approach for answering questions about your structured documents.
 
 ### Built with
 - Python 3.10+
@@ -15,7 +15,7 @@ These docs are your companion to mastering the **Structured-Q-A Blueprint** a lo
 ---
 
 ### 🚀 **Get Started Quickly**
-#### _Start building your own Structured-Q-A pipeline in minutes:_
+#### _Start building your own Structured-QA pipeline in minutes:_
 - **[Getting Started](getting-started.md):** Quick setup and installation instructions.
 
 ### 🔍 **Understand the System**
diff --git a/docs/step-by-step-guide.md b/docs/step-by-step-guide.md
@@ -1,8 +1,4 @@
-# **Step-by-Step Guide: How the Structured-Q-A Blueprint Works**
-
-Transforming static documents into engaging podcast episodes involves an integration of pre-processing, LLM-powered transcript generation, and text-to-speech generation. Here's how it all works under the hood:
-
----
+# **Step-by-Step Guide: How the Structured-QA Blueprint Works**
 
 ## **Overview**
 
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -1,6 +1,6 @@
-site_name: Structured Q&A
-repo_url: https://github.com/mozilla-ai/structured-q-a
-repo_name: structured-q-a
+site_name: Structured QA
+repo_url: https://github.com/mozilla-ai/structured-qa
+repo_name: structured-qa
 
 nav:
   - Home: index.md

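Reviewer note: the last code cell of `demo/notebook.ipynb` passes `FIND_PROMPT` and `ANSWER_PROMPT` to `find_retrieve_answer`. The sketch below shows one plausible way such a loop could use the `{SECTIONS}` and `{CURRENT_INFO}` placeholders and the `I need more info.` sentinel. It is not the implementation in `structured_qa.workflow`. The function name `find_retrieve_answer_sketch`, the `ask` callable, the `max_tries` parameter, and the one-`.txt`-file-per-section layout are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, Optional

# Sentinel taken from ANSWER_PROMPT in the notebook.
NEED_MORE_INFO = "I need more info."


def find_retrieve_answer_sketch(
    question: str,
    ask: Callable[[str, str], str],  # hypothetical: (system_prompt, user_message) -> reply
    sections_dir: str,
    find_prompt: str,
    answer_prompt: str,
    max_tries: int = 3,  # hypothetical retry budget
) -> Optional[str]:
    """Illustrative find -> retrieve -> answer loop (not the library's code)."""
    # Assumption: each section is stored as "<section_name>.txt" in sections_dir.
    remaining = sorted(p.stem for p in Path(sections_dir).glob("*.txt"))

    for _ in range(max_tries):
        if not remaining:
            break

        # Find: ask the model to pick exactly one section name from the list.
        system = find_prompt.format(SECTIONS="\n".join(remaining))
        chosen = ask(system, question).strip()
        if chosen not in remaining:
            break  # the model returned something that is not a valid section

        # Retrieve: load the text of the chosen section.
        text = (Path(sections_dir) / f"{chosen}.txt").read_text()

        # Answer: ask the question against that section only.
        answer = ask(answer_prompt.format(CURRENT_INFO=text), question).strip()
        if answer != NEED_MORE_INFO:
            return answer

        # The section was not enough; drop it and try another one.
        remaining.remove(chosen)

    return None
```

Keeping the model behind a plain callable means the loop can be exercised with a stub before a GGUF model is downloaded; with the notebook's real model, `ask` would wrap whatever chat-completion call the loaded model exposes.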