nf-core Validator

An AI-powered agent for validating nf-core pipeline compliance. This tool analyzes Nextflow pipelines against the official nf-core guidelines and provides detailed recommendations for improving compliance.

Architecture

Features

Documentation-Based Validation: Uses the latest nf-core guidelines directly from the official documentation
Comprehensive Analysis: Checks modules, workflows, and subworkflows against all relevant requirements
Detailed Reports: Generates comprehensive reports with specific recommendations for each component
AI-Powered: Leverages large language models to understand complex requirements and provide context-aware suggestions
Interactive Chat: Query the nf-core documentation directly with natural language questions
Categorized Results: View validation results and chat responses organized by documentation category

Installation

Prerequisites

Python 3.8 or higher
OpenAI API key

Install from source

# Clone the repository
git clone https://github.com/arunbodd/nf-core_guidelines_validator
cd nf-core_guidelines_validator

# Install the package
pip install -e .

Usage

1. Harvest nf-core Documentation

Before validating pipelines, you need to harvest the nf-core documentation:

# Set your OpenAI API key
export OPENAI_API_KEY="your-api-key"

# Harvest documentation
nfcore-validator harvest

This will create a vector store of nf-core guidelines in the current directory.

2. Validate a Pipeline

# Validate a pipeline
nfcore-validator validate /path/to/your/pipeline --format markdown

This will:

Analyze all components in the pipeline
Check them against relevant nf-core guidelines
Generate a detailed compliance report

3. Chat with nf-core Documentation

You can also directly query the nf-core documentation using natural language:

# Start the chat interface
nfcore-validator chat

# Show sources for answers
nfcore-validator chat --show-sources

# Increase context size for more comprehensive answers
nfcore-validator chat --context-size 15

Report Format

The tool generates two types of reports:

JSON Report: Contains all validation details in a structured format
Markdown Report: Human-readable summary with component details and recommendations

The enhanced markdown report includes:

Component Type Breakdown: Shows compliance by component type
Detailed Violations: Descriptions and example fixes for each violation type
Organized Components: Grouped by type with failed requirements highlighted
Actionable Recommendations: Prioritized list of improvements with affected components

Example markdown report:

# nf-core Pipeline Compliance Report

**Pipeline:** `rnaseq`
**Path:** `/path/to/pipeline`
**Date:** 2025-04-28 00:11:25

## Summary
- **Components Analyzed:** 25
- **Requirements Checked:** 143
- **Passed Requirements:** 128
- **Failed Requirements:** 15
- **Compliance Score:** 89.5%

## Component Type Breakdown
| Component Type | Count | Avg. Compliance |
|---------------|-------|----------------|
| module | 15 | 92.33% |
| workflow | 5 | 85.60% |
...

Advanced Usage

Custom Vector Store Location

nfcore-validator harvest --output /path/to/vectorstore
nfcore-validator validate /path/to/pipeline --vectorstore /path/to/vectorstore

Rate Limit Handling

The validator automatically handles OpenAI API rate limits by:

Processing components gradually
Waiting and retrying when rate limits are hit
Adding small delays between API calls

You can also reduce parallelism to further avoid rate limits:

python -m nfcore_validator.cli.main validate /path/to/pipeline --max-workers 2

Categorized Chat

The chat interface categorizes information by documentation section:

nfcore-validator chat --show-sources

This will display sources grouped by categories like:

Module Guidelines
Subworkflow Guidelines
Pipeline Structure
Test Data Guidelines

How It Works

Documentation Harvesting: The tool extracts all guidelines from the official nf-core documentation website
Vector Embedding: Guidelines are converted to vector embeddings for semantic search
Component Analysis: For each pipeline component, the tool retrieves relevant guidelines
AI Validation: An LLM analyzes the component against the guidelines and generates recommendations
Report Generation: Results are compiled into comprehensive reports

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
docs		docs
my_vectorstore		my_vectorstore
nfcore_validator.egg-info		nfcore_validator.egg-info
nfcore_validator		nfcore_validator
nfcore_vectorstore		nfcore_vectorstore
.DS_Store		.DS_Store
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

nf-core Validator

Architecture

Features

Installation

Prerequisites

Install from source

Usage

1. Harvest nf-core Documentation

2. Validate a Pipeline

3. Chat with nf-core Documentation

Report Format

Advanced Usage

Custom Vector Store Location

Rate Limit Handling

Categorized Chat

How It Works

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Languages

arunbodd/nf-core_guidelines_validator

Folders and files

Latest commit

History

Repository files navigation

nf-core Validator

Architecture

Features

Installation

Prerequisites

Install from source

Usage

1. Harvest nf-core Documentation

2. Validate a Pipeline

3. Chat with nf-core Documentation

Report Format

Advanced Usage

Custom Vector Store Location

Rate Limit Handling

Categorized Chat

How It Works

License

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages