A sophisticated tool that leverages RAG (Retrieval-Augmented Generation) to analyze penetration testing data and generate comprehensive security reports.
Traditional penetration testing report generation is:
- Time-consuming and labor-intensive
- Prone to inconsistencies in reporting format
- Challenging to maintain standardization across reports
- Difficult to process large volumes of test data efficiently
This tool provides:
- Automated analysis of penetration testing data
- Standardized report generation using RAG
- Parallel processing of large datasets
- Intelligent caching and optimization
- Structured output with comprehensive security insights
- Parallel Document Processing: Efficiently handles multiple files simultaneously
- Smart Caching: Implements document hashing for faster subsequent runs (see the sketch after this list)
- Vector Storage: Uses ChromaDB for efficient similarity search
- Memory Optimization: Batch processing with automatic memory management
- Device Adaptation: Supports CUDA, MPS (Apple M-series chips), and CPU
- Comprehensive Reports: Generates detailed reports with:
  - Executive Summary
  - Methodology
  - Findings and Vulnerabilities
  - Risk Analysis
  - Technical Details
  - Remediation Steps
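The smart-caching idea is to hash each document so unchanged files can be skipped on later runs. A minimal sketch of that idea; the cache file name and helper functions are illustrative, not the tool's actual internals:

```python
import hashlib
import json
from pathlib import Path

CACHE_FILE = Path(".doc_cache.json")  # illustrative cache location

def file_hash(path: Path) -> str:
    """Hash file contents so unchanged documents can be detected across runs."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def load_cache() -> dict:
    return json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}

def needs_processing(path: Path, cache: dict) -> bool:
    """True if the file is new or its contents changed since the last run."""
    return cache.get(str(path)) != file_hash(path)

def save_cache(cache: dict) -> None:
    CACHE_FILE.write_text(json.dumps(cache, indent=2))
```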
- Document Processor:
  - Text cleaning and normalization
  - Intelligent chunking
  - Metadata extraction
- Vector Store:
  - ChromaDB integration
  - Efficient similarity search
  - Persistent storage
- LLM Integration:
  - Ollama model integration
  - Customizable prompts
  - Streaming support
- Embedding System (a combined embedding/vector-store sketch follows this list):
  - HuggingFace embeddings
  - Model: "BAAI/bge-small-en-v1.5"
  - Device-specific optimization
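A sketch of how the embedding system, device adaptation, and vector store could fit together, assuming sentence-transformers for the BAAI/bge-small-en-v1.5 model and ChromaDB's PersistentClient; the storage path, collection name, and sample texts are illustrative:

```python
import torch
import chromadb
from sentence_transformers import SentenceTransformer

# Device adaptation: prefer CUDA, then Apple Silicon (MPS), then CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# The embedding model named above, loaded on the selected device.
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5", device=device)

# Persistent ChromaDB collection, mirroring get_or_create_collection() in the diagram below.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection(name="pentest_docs")

chunks = ["Nmap found port 22 open on 10.0.0.5", "SQL injection confirmed at /login"]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Similarity search: embed the query and retrieve the closest chunks.
hits = collection.query(
    query_embeddings=embedder.encode(["remote access vulnerabilities"]).tolist(),
    n_results=2,
)
print(hits["documents"])
```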
```bash
# Clone the repository
git clone https://github.com/Abhinandan-Khurana/rag-based-ai-pentest-report-generator.git

# Install dependencies
pip install -r requirements.txt

# Install Ollama (Mac/Linux)
curl https://ollama.ai/install.sh | sh

# Pull required model
ollama pull deepseek-r1:latest
```
```bash
python pentest_analyzer.py
```
- If running locally: when prompted, provide the path to your penetration testing data directory.
- If running in Docker: when prompted, provide the path to your data directory as it appears inside the container, after mounting it into the container as described in DockerSetup.md (an example mount is shown below).
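A hypothetical invocation of the Docker variant, assuming a local demo_data directory; the image name and in-container path are placeholders, so refer to DockerSetup.md for the actual build and run steps:

```bash
# Hypothetical example: <image-name> is a placeholder for the image built per DockerSetup.md
docker run -it -v "$(pwd)/demo_data:/app/demo_data" <image-name>
# When prompted for the data directory, enter the in-container path, e.g. /app/demo_data
```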
- Variability in Results:
  - Due to the nature of RAG and LLM responses, results may vary between runs
  - The tool might generate slightly different analyses for the same input
  - Running multiple analyses is recommended for critical assessments
- Supported File Types (a parallel-loading sketch follows this list):
  - .txt
  - .json
  - .md
  - .xml
  - .csv
  - .log
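A minimal sketch of parallel document loading restricted to these file types, assuming a thread pool and naive fixed-size chunking; the tool's actual loaders and chunker are more sophisticated, and all names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SUPPORTED_EXTENSIONS = {".txt", ".json", ".md", ".xml", ".csv", ".log"}

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap; stands in for intelligent chunking."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def load_document(path: Path) -> tuple[str, str]:
    """Read one file as plain text; real loaders would branch on file type."""
    return str(path), path.read_text(errors="ignore")

def load_documents(directory: str) -> list[tuple[str, str]]:
    """Walk the directory and load every supported file in parallel."""
    paths = [
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    ]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(load_document, paths))
```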
```
📑 Generated Report
├── 1. Executive Summary
├── 2. Methodology
├── 3. Findings and Vulnerabilities
├── 4. Risk Analysis
├── 5. Detailed Technical Analysis
└── 6. Remediation Roadmap
```
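A hedged sketch of how these sections might be generated one at a time against a local Ollama server via its REST API (POST /api/generate); the prompt wording and the generate_section helper are illustrative assumptions:

```python
import requests

SECTIONS = [
    "Executive Summary",
    "Methodology",
    "Findings and Vulnerabilities",
    "Risk Analysis",
    "Detailed Technical Analysis",
    "Remediation Roadmap",
]

def generate_section(section: str, context: str) -> str:
    """Ask the local Ollama server for one report section (non-streaming)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:latest",
            "prompt": (
                f"Using this penetration testing data:\n{context}\n\n"
                f"Write the '{section}' section of a security report."
            ),
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# Assemble the report in the order shown above; context retrieval is elided here.
report = "\n\n".join(
    f"## {section}\n{generate_section(section, 'retrieved chunks go here')}"
    for section in SECTIONS
)
```

Generating each section separately keeps prompts focused and makes per-section streaming or retries straightforward.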
```mermaid
sequenceDiagram
    participant User
    participant Main
    participant PentestAnalyzer
    participant DocumentProcessor
    participant ChromaDB
    participant LLM
    participant FileSystem

    User->>Main: Start Program
    Main->>PentestAnalyzer: Initialize(config)
    PentestAnalyzer->>PentestAnalyzer: setup_logging()
    PentestAnalyzer->>PentestAnalyzer: setup_device()
    PentestAnalyzer->>PentestAnalyzer: initialize_models()

    Main->>User: Request Input Directory
    User->>Main: Provide Directory Path
    Main->>PentestAnalyzer: process_and_analyze(directory)

    PentestAnalyzer->>FileSystem: parallel_document_loading()
    activate FileSystem
    FileSystem-->>DocumentProcessor: Process Files
    DocumentProcessor->>DocumentProcessor: clean_text()
    DocumentProcessor->>DocumentProcessor: chunk_text()
    FileSystem-->>PentestAnalyzer: Return Documents
    deactivate FileSystem

    PentestAnalyzer->>ChromaDB: create_or_load_index()
    activate ChromaDB
    ChromaDB->>ChromaDB: get_or_create_collection()
    ChromaDB-->>PentestAnalyzer: Return Index
    deactivate ChromaDB

    PentestAnalyzer->>LLM: generate_report()
    activate LLM
    LLM->>LLM: Process Sections
    Note over LLM: Executive Summary
    Note over LLM: Methodology
    Note over LLM: Findings
    Note over LLM: Risk Analysis
    Note over LLM: Technical Analysis
    Note over LLM: Remediation
    LLM-->>PentestAnalyzer: Return Report
    deactivate LLM

    PentestAnalyzer->>FileSystem: Save Report
    PentestAnalyzer-->>Main: Return Status
    Main->>User: Display Completion Message
```
- Results may be redundant or vary between runs due to the nature of RAG-based LLM responses
- Local LLMs may provide different results compared to OpenAI's models
Contributions are welcome! Please feel free to submit a Pull Request.