QA Bot: Business Law and Legal Environment

This application uses retrieval-augmented generation (RAG) with a large language model (LLM) to answer questions about business law and the legal environment. A Facebook AI Similarity Search (FAISS) index retrieves relevant document chunks, and GPT-2, one of the early LLMs, generates an answer from them. The aim is for responses to be grounded in the source documents while remaining fluent and context-aware.

Introduction

This QA Bot leverages advanced machine learning and natural language processing (NLP) techniques to provide answers to questions about business laws and the legal environment. It employs a RAG approach, combining a FAISS vector store for efficient document retrieval with a free GPT-2 model via the Hugging Face API for response generation. The bot generates natural language answers grounded in the retrieved legal documents.

The goal of this project is to demonstrate how machine learning techniques can be applied to legal datasets to create an interactive question-answering tool.

Data

The dataset consists of five textbooks on business law and the legal environment, downloaded from https://open.umn.edu/opentextbooks/subjects/law. The books were preprocessed, embedded, and indexed with FAISS for efficient document retrieval.

List of textbooks:

Mayer, D., Warner, D., Siedel, G., Lieberman, J., & Martina, A. (2012). Advanced Business Law and the Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., & Siedel, G. (2012). Business Law and the Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., Siedel, G., & Lieberman, J. (2012). Foundations of Business Law and Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., Siedel, G., & Lieberman, J. (2012). Government Regulation and the Legal Environment of Business. Saylor Foundation.

Lau, T., & Johnson, L. (2011). The Legal and Ethical Environment of Business. Saylor Foundation.

These textbooks were adapted by The Saylor Foundation under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. For more information, visit the law textbook list on the University of Minnesota's Open Textbook Library.

Methods

The application uses the following methods:

Document Embedding: Legal texts are split into chunks and embedded with sentence-transformers/all-MiniLM-L6-v2; the embeddings are stored in a FAISS index for efficient similarity search.
RAG: For each question, the FAISS index retrieves the most relevant chunks, which are passed to the model as context to improve accuracy and relevance.
Response Generation by LLM: GPT-2, a transformer-based language model, generates a fluent answer conditioned on the retrieved context.

The workflow is designed so that the bot's responses are derived from credible sources.
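
The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not the project's code: to stay dependency-free it swaps in a toy bag-of-words vector for the all-MiniLM-L6-v2 embeddings and a brute-force cosine ranking for the FAISS search. The function names (`embed`, `retrieve`, `build_prompt`), the prompt layout, and the three sample chunks are all hypothetical and are not taken from the textbooks or the repository.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector: lowercase token counts (stand-in for all-MiniLM-L6-v2)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (brute force, stand-in for a FAISS search)."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Join the retrieved chunks and the question into one prompt for the generator."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up sample chunks for illustration only.
chunks = [
    "A contract requires an offer, acceptance, and consideration.",
    "A tort is a civil wrong that causes harm to another person.",
    "Antitrust law restricts agreements that restrain trade.",
]
question = "What does a contract require?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a real pipeline the prompt would then be sent to the generator (GPT-2 here), and the ranking step would be replaced by a query against the prebuilt FAISS index.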

Figure from "What is a RAG and why you should use it in combination with your LLM" by Gianluca Centulani.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., & others. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI.

Results

Hugging Face Space

The application is deployed on Hugging Face Spaces, where the bot can be tested interactively.

While the QA bot effectively retrieves and generates answers based on the provided textbooks, its responses are constrained by the accuracy and comprehensiveness of the source material. Additionally, the use of free GPT-2 for generation may result in less nuanced or detailed answers compared to more advanced language models.

Directory Structure

```
qa-bot-business-law-environment/
├── app.png                         # Screenshot of the Hugging Face space
├── configs/                        # Configurations
│   └── huggingface_api_key.txt     # API key for Hugging Face (not uploaded for security)
├── data/                           # Dataset
│   ├── preprocessed/               # FAISS index files
│   └── raw/                        # Raw documents
├── Dockerfile                      # Docker setup
├── environment.yml                 # Conda environment setup
├── LICENSE                         # Project license
├── llm_rag.png                     # Illustration of an LLM integrated with RAG
├── main.py                         # Main pipeline script
├── README.md                       # Project README
├── requirements.txt                # Python dependencies
└── src/                            # Source code
    ├── build.py                    # Script for retrieving and answering queries
    └── data.py                     # Script for data preparation and FAISS indexing
```
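
Because configs/huggingface_api_key.txt is not uploaded, each user has to supply their own key. The following is one possible way a script could read it, offered as a sketch rather than a description of how main.py or src/build.py actually load the key. The function name `load_api_key` and the fallback to the `HF_TOKEN` environment variable are assumptions made for this example.

```python
import os
from pathlib import Path


def load_api_key(path="configs/huggingface_api_key.txt", env_var="HF_TOKEN"):
    """Return the key from the file if present and non-empty, else from the environment."""
    key_file = Path(path)
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    # Hypothetical fallback: read the key from an environment variable instead.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key found in {path} or ${env_var}")
    return key
```

Stripping whitespace guards against a trailing newline in the key file, which is a common cause of authentication failures.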

Installation

Conda Environment Setup

Clone the repository:

```
git clone https://github.com/your-ai-solution/qa-bot-business-law-environment.git
cd qa-bot-business-law-environment
```

Create a Conda environment:

```
conda env create -f environment.yml
conda activate qa-bot-business-law-environment
```

Install dependencies:
```
pip install -r requirements.txt
```

Docker Setup (Optional)

Build the Docker image:

```
docker build -t qa-bot-business-law-environment .
```

Run the Docker container:

```
docker run --gpus all -v $(pwd)/data:/app/data -v $(pwd)/results:/app/results qa-bot-business-law-environment
```

Usage

Run Main Script

Place the downloaded textbook PDFs in data/raw/.
Run the main script that automates the pipeline:
```
python main.py
```

Run Each Source Script (Optional)

Data preparation: Preprocess documents and create FAISS embeddings.
```
python src/data.py
```
Building: Test document retrieval and response generation.
```
python src/build.py
```
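
The data preparation step works on document chunks, but this README does not say how the text is split. As a rough illustration only, a fixed-size character splitter with overlap might look like the sketch below. The function name `chunk_text` and the default `chunk_size` and `overlap` values are hypothetical and are not taken from src/data.py.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each resulting chunk would then be embedded and added to the FAISS index. Smaller chunks give more precise retrieval, while larger ones carry more context per hit.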
