Skip to content

banditofsmoke/RAGs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Certainly! Here's a detailed README for the src/RAGs/ directory, including explanations for the ReadWriteSymbolCollector and the newly created task_1.py:

# RAGs (Retrieval Augmented Generation) Tools

This directory contains various tools and utilities for Retrieval Augmented Generation (RAG) tasks, including document processing, chunking, and storage.

## Contents

1. [ReadWriteSymbolCollector](#readwritesymbolcollector)
2. [PythonFileWriter](#pythonfilewriter)
3. [task_1.py: Document Processor](#task_1py-document-processor)

## ReadWriteSymbolCollector

The `ReadWriteSymbolCollector` is a utility class for reading and writing symbol table JSON files. It provides functionality to manage and manipulate JSON data structures.

### Key Features:

- Write storage dictionaries to JSON files
- Read JSON dictionaries from files
- Combine JSON dictionary files

### Usage:

```python
from ReadWriteSymbolCollector_lite import ReadWriteSymbolCollector

collector = ReadWriteSymbolCollector()

# Writing to a JSON file
collector.WriteStorageDictJson(writeJsonPath, portionOneDict, portionTwoDict, fileName)

# Reading from a JSON file
data = collector.ReadJsonDict(readJsonPath)

# Combining JSON files
collector.CombineJsonDictFiles(frontJsonData, backJsonData, writeJsonPath, fileName)

PythonFileWriter

The PythonFileWriter class is an extension of the ReadWriteSymbolCollector with some modifications and additional functionality.

Key Features:

  • Write dictionaries to JSON files
  • Read JSON dictionaries from files
  • Combine JSON dictionary files

Usage:

from PythonFileWriter import PythonFileWriter

writer = PythonFileWriter()

# Writing to a JSON file
writer.WriteStorageDictJson(writeJsonPath, portionOneDict, portionTwoDict, fileName)

# Reading from a JSON file
data = writer.ReadJsonDict(readJsonPath)

# Combining JSON files
writer.CombineJsonDictFiles(frontJsonData, backJsonData, writeJsonPath, fileName)

task_1.py: Document Processor

task_1.py contains a DocumentProcessor class that provides functionality for chunking and indexing documents, assigning unique identifiers, and storing them in a PostgreSQL database.

Key Features:

  • Parse XML documents into a JSON-like structure
  • Chunk documents and assign SHA256 hash identifiers
  • Store documents and chunks in a PostgreSQL database
  • Retrieve chunks by ID or document ID

Usage:

from task_1 import DocumentProcessor

db_config = {
    "dbname": "your_database",
    "user": "your_username",
    "password": "your_password",
    "host": "localhost"
}

processor = DocumentProcessor(db_config)

# Process XML documents
xml_content = """
<documents>
<document index="1">
        <source>example.txt</source>
        <document_content>This is an example document.</document_content>
        </document>
</documents>
"""
processor.process_xml_documents(xml_content, chunk_size=50)

# Retrieve chunks for a document
document_id = processor.generate_hash("This is an example document.")
chunks = processor.get_chunks_by_document_id(document_id)

processor.close()

Setup and Dependencies

  1. Ensure you have Python 3.7+ installed.
  2. Install required packages:
    pip install psycopg2
    
  3. Set up a PostgreSQL database and update the db_config in your scripts accordingly.

Contributing

Feel free to contribute to this project by submitting pull requests or opening issues for any bugs or feature requests.

License

[Add your chosen license here]


This README provides a comprehensive overview of the tools available in the `src/RAGs/` directory, including brief explanations of their use. It covers the `ReadWriteSymbolCollector`, `PythonFileWriter`, and the newly created `task_1.py` with its `DocumentProcessor` class. 

The README includes sections on key features, usage examples, setup instructions, and placeholders for contribution guidelines and licensing information. You can further customize this README to include any specific details about your project structure, development workflow, or additional tools you may add in the future.

About

RAG systems

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages