Skip to content

This application focuses on answering questions about business laws and the legal environment using a RAG pipeline. It integrates a FAISS-based retrieval system with a GPT-2 language model to generate contextually relevant responses from preprocessed legal textbooks.

License

Notifications You must be signed in to change notification settings

your-ai-solution/qa-bot-business-law-environment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

QA Bot: Business Law and Legal Environment

This application leverages retrieval-augmented generation (RAG) and large language models (LLMs) to answer questions about business laws and the legal environment. RAG combines a Facebook AI Similarity Search (FAISS)-based retrieval mechanism to fetch relevant document chunks with GPT-2, one of the early LLMs, to generate accurate and context-aware answers. This approach ensures responses are both grounded in reliable data and enhanced by the generative capabilities of advanced LLMs.

Table of Contents

Introduction

This QA Bot leverages advanced machine learning and natural language processing (NLP) techniques to provide answers to questions about business laws and the legal environment. It employs a RAG approach, combining a FAISS vector store for efficient document retrieval with a free GPT-2 model via the Hugging Face API for response generation. The bot generates natural language answers grounded in the retrieved legal documents.

The goal of this project is to demonstrate how machine learning techniques can be applied to legal datasets to create an interactive question-answering tool.

Application on Hugging Face

Data

The dataset consists of five textbooks related to business law and the legal environment. These textbooks were downloaded from https://open.umn.edu/opentextbooks/subjects/law. The content of these books has been preprocessed and embedded using FAISS for efficient document retrieval. List of Textbooks:

Mayer, D., Warner, D., Siedel, G., Lieberman, J., & Martina, A. (2012). Advanced Business Law and the Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., & Siedel, G. (2012). Business Law and the Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., Siedel, G., & Lieberman, J. (2012). Foundations of Business Law and Legal Environment. Saylor Foundation.

Mayer, D., Warner, D., Siedel, G., & Lieberman, J. (2012). Government Regulation and the Legal Environment of Business. Saylor Foundation.

Lau, T., & Johnson, L. (2011). The Legal and Ethical Environment of Business. Saylor Foundation.

These textbooks were adapted by The Saylor Foundation under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. For more information, visit the law textbook list on the University of Minnesota's Open Textbook Library.

Methods

The application uses the following methods:

  • Document Embedding: Legal texts are embedded using sentence-transformers/all-MiniLM-L6-v2 to create a FAISS index for efficient similarity search.
  • RAG: Combines document retrieval with generation by using the FAISS index to retrieve relevant chunks and providing them as context to the model, enhancing accuracy and relevance.
  • Response Generation by LLM: GPT-2 generates answers grounded in retrieved document context using its transformer architecture for coherence and fluency.

The workflow ensures that the bot delivers reliable responses derived from credible sources.

Illustration of an LLM integrated with RAG

Figure from What is a RAG and why you should use it in combination with your LLM by Gianluca Centulani.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., & others. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
Paper link

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI.
Paper link

Results

Hugging Face Space

The application is deployed on Hugging Face Spaces. You can test the bot here.

While the QA bot effectively retrieves and generates answers based on the provided textbooks, its responses are constrained by the accuracy and comprehensiveness of the source material. Additionally, the use of free GPT-2 for generation may result in less nuanced or detailed answers compared to more advanced language models.

Directory Structure

qa-bot-business-law-environment/
├── app.png                         # Screenshot of the Hugging Face space
├── configs/                        # Configurations
│   └── huggingface_api_key.txt     # API key for Hugging Face (not uploaded for security)
├── data/                           # Dataset
│   ├── preprocessed/               # FAISS index files
│   └── raw/                        # Raw documents
├── Dockerfile                      # Docker setup
├── environment.yml                 # Conda environment setup
├── LICENSE                         # Project license
├── llm_rag.png                     # Illustration of an LLM integrated with RAG
├── main.py                         # Main pipeline script
├── README.md                       # Project README
├── requirements.txt                # Python dependencies
└── src/                            # Source code
    ├── build.py                    # Script for retrieving and answering queries
    └── data.py                     # Script for data preparation and FAISS indexing

Installation

Conda Environment Setup

  1. Clone the repository:

    git clone https://github.com/your-ai-solution/qa-bot-business-law-environment.git
    cd qa-bot-business-law-environment
  2. Create a Conda environment:

    conda env create -f environment.yml
    conda activate qa-bot-business-law-environment
  3. Install dependencies:

    pip install -r requirements.txt

Docker Setup (Optional)

  1. Build the Docker image:

    docker build -t qa-bot-business-law-environment .
  2. Run the Docker container:

    docker run --gpus all -v $(pwd)/data:/app/data -v $(pwd)/results:/app/results qa-bot-business-law-environment

Usage

Run Main Script

  1. Place the downloaded textbook PDFs in data/raw/.

  2. Run the main script that automates the pipeline:

    python main.py

Run Each Source Script (Optional)

  1. Data preparation: Preprocess documents and create FAISS embeddings.

    python src/data.py
  2. Building: Test document retrieval and response generation.

    python src/build.py

About

This application focuses on answering questions about business laws and the legal environment using a RAG pipeline. It integrates a FAISS-based retrieval system with a GPT-2 language model to generate contextually relevant responses from preprocessed legal textbooks.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published