diff --git a/pages/getting-started.md b/pages/getting-started.md new file mode 100644 index 00000000..e9accfd7 --- /dev/null +++ b/pages/getting-started.md @@ -0,0 +1,427 @@ +# Getting Started with BRC Analytics: A Beginner's Guide + +## What is BRC Analytics? + +BRC Analytics is a browser-based bioinformatics platform that makes advanced pathogen genomics accessible to researchers without requiring programming experience or local software installation. Built on the Galaxy workflow system and funded by NIAID as part of the Bioinformatics Resource Centers for Infectious Diseases Program, it unifies reference data from NCBI Datasets with curated community workflows, allowing you to analyze genomic data entirely in your web browser. + +### Key Features +- **No installation required** - Everything runs in your browser +- **Free public computational resources** - No additional investment needed +- **5,000+ genome assemblies** covering nearly 2,000 pathogen, host, and vector taxa +- **Integrated reference data** from authoritative sources like NCBI and UCSC +- **Community-curated workflows** for quality control, read mapping, variant identification, and annotation +- **Built-in analysis tools** including JupyterLite notebooks for custom analyses +- **Reproducible research** with automatic provenance tracking + +## Before You Begin + +### System Requirements +- A modern web browser (Firefox, Chrome, Safari, or Edge) +- Stable internet connection +- No special hardware or software needed + +### Creating an Account +1. Visit [https://usegalaxy.org](https://usegalaxy.org) +2. Click "Login or Register" in the top menu +3. Choose "Register" to create a new account +4. Fill in your information and submit +5. Your account will be shared between BRC Analytics and Galaxy + +**Note**: Creating an account allows you to save your work, access your analysis history, and share results. + +## Understanding the Interface + +BRC Analytics consists of three main components: + +### 1. 
BRC Analytics Website +- Browse organisms and assemblies +- Select reference genomes +- Configure workflow parameters +- Launch analyses + +### 2. Galaxy Workflow System +- Execute computational analyses +- Manage datasets and histories +- Run tools and workflows +- View results + +### 3. UCSC Genome Browser +- Visualize genomic data +- Explore annotations +- Compare sequences +- View custom tracks + +## Your First Analysis: Variant Calling from Sequencing Data + +This walkthrough demonstrates how to perform pathogen surveillance analysis using the measles virus example from the featured analysis. + +### Step 1: Select Your Organism + +1. From the BRC Analytics homepage, navigate to **"Organisms"** +2. Use the search or browse function to find your organism of interest + - Example: Search for "Measles morbillivirus" +3. Click on the organism to view available assemblies + +### Step 2: Choose a Reference Assembly + +1. Review the available genome assemblies from NCBI +2. Select the appropriate reference assembly for your analysis + - Assemblies include metadata like assembly level, genome size, and annotation +3. Click **"View on UCSC Genome Browser"** to explore the genome visually (optional) + +### Step 3: Select a Workflow + +BRC Analytics provides curated workflows for common analyses: + +- **Variant Discovery** - Identify genetic variants from sequencing data +- **Quality Control** - Assess sequencing data quality +- **Read Mapping** - Align reads to reference genomes +- **Annotation** - Functionally annotate variants +- **Transcriptomics** - Analyze gene expression +- **Epigenetics** - Study DNA modifications +- **Assembly** - Construct genome sequences + +For this example, select the **"Variant calling and consensus construction from paired-end short-read data"** workflow. + +### Step 4: Get Your Data + +BRC Analytics can access data from: + +#### Option A: Public SRA Data +1. Click **"Search SRA"** to access the Sequence Read Archive +2. 
Enter search terms (e.g., "measles virus surveillance") +3. Filter results by: + - Date range + - Geographic location + - Study type + - Read length +4. Select datasets by checking boxes +5. The platform will automatically retrieve the data + +#### Option B: Upload Your Own Data +1. Navigate to the Galaxy interface +2. Click **"Upload Data"** in the left panel +3. Choose your file source: + - Local files from your computer + - Paste URLs + - Import from FTP +4. Select your files (FASTQ format for sequencing reads) +5. Click **"Start"** to upload +6. Return to BRC Analytics to configure the workflow parameters + +### Step 5: Configure Workflow Parameters + +The workflow interface will display configurable parameters: + +**Common parameters include:** +- **Base quality threshold** - Minimum quality score for variant calling (typically 20-30) +- **Allele frequency threshold** - Minimum frequency to call a variant (e.g., 0.05 for 5%) +- **Read depth** - Minimum coverage required +- **Trimming options** - Adapter removal and quality trimming settings + +**For the measles virus variant workflow:** +1. Review default parameters (optimized for viral genomes) +2. Adjust thresholds if needed for your specific analysis +3. The workflow will automatically: + - Trim adapter sequences + - Map reads to the reference genome + - Call variants with iVar + - Generate consensus sequences + - Annotate variants with SnpEff + - Produce quality control reports with MultiQC + +### Step 6: Launch the Analysis + +1. Review your selections: + - Reference genome ✓ + - Input data ✓ + - Workflow ✓ + - Parameters ✓ +2. Click **"Run Workflow"** +3. You'll be redirected to Galaxy, where the analysis will execute +4. 
Monitor progress in your Galaxy history panel (right side) + +### Step 7: View Results + +Analysis outputs include: + +**Primary Results:** +- **Trimmed reads** - Quality-filtered sequencing data +- **BAM files** - Mapped and aligned reads +- **VCF files** - Annotated variant calls with summary tables +- **Consensus FASTA** - Per-sample consensus sequences +- **QC reports** - Integrated quality metrics + +**Accessing results:** +1. Click on any dataset in your history to expand details +2. Click the eye icon 👁️ to view data +3. Click the download icon 💾 to save locally +4. Click the info icon ℹ️ for provenance and parameters + +## Advanced Features + +### Using JupyterLite for Custom Analyses + +BRC Analytics includes integrated JupyterLite notebooks for advanced analysis: + +1. From your Galaxy history, locate the dataset you want to analyze +2. Click on the dataset and select **"Visualize"** +3. Choose **"JupyterLite Notebook"** +4. Write Python code to perform custom analyses +5. Example analyses include: + - Statistical testing + - Data visualization + - Custom filtering + - Evolutionary analysis + +### Creating and Sharing Workflows + +You can extract your analysis as a reusable workflow: + +1. In Galaxy, click the **history options** menu (gear icon) +2. Select **"Extract Workflow"** +3. Choose which steps to include +4. Name your workflow +5. Save and share with collaborators + +### Visualizing Results in UCSC Genome Browser + +1. From Galaxy, select your BAM or VCF file +2. Click **"Visualize"** and choose **"UCSC Genome Browser"** +3. Your data will load as a custom track +4. Explore alongside reference annotations +5. Zoom, scroll, and configure display settings + +## Common Workflows + +### Quality Control Workflow +**Purpose**: Assess sequencing data quality before analysis + +**Steps:** +1. Select organism and assembly +2. Choose "QC Workflow" +3. Upload or select FASTQ files +4. Run workflow +5. 
Review MultiQC report for: + - Read quality scores + - Adapter content + - Sequence duplication + - GC content + +### Read Mapping Workflow +**Purpose**: Align sequencing reads to reference genome + +**Key parameters:** +- Choose aligner (BWA-MEM, Bowtie2, HISAT2, STAR) +- Set mapping quality thresholds +- Configure paired-end vs single-end + +### Variant Discovery Pipeline +**Purpose**: Complete variant identification from raw reads + +**Includes:** +1. Quality control +2. Read mapping +3. Variant calling +4. Annotation +5. Summary reports + +## Data Management Tips + +### Organizing Your Histories +- **Create separate histories** for different projects or experiments +- **Name histories descriptively** (e.g., "Measles_Surveillance_2024") +- **Add tags** to datasets for easy filtering +- **Delete unnecessary files** to manage storage + +### Saving and Exporting Data +- **Download results** by clicking the download icon on any dataset +- **Export workflows** for reproducibility +- **Share histories** with collaborators via Galaxy's sharing features +- **Publish analyses** to make them publicly accessible + +## Understanding Galaxy Basics + +Since BRC Analytics runs on Galaxy, understanding these fundamentals is helpful: + +### The Three-Panel Interface +1. **Left panel (Tools)**: Available analysis tools and workflows +2. **Middle panel (Main)**: Tool forms, visualizations, and documentation +3. **Right panel (History)**: Your datasets and analysis outputs + +### Datasets +- Each file in your history is a dataset +- Color coding indicates status: + - **Gray**: Waiting to run + - **Yellow**: Running + - **Green**: Successfully completed + - **Red**: Failed (click for error details) + +### Collections +- Group related datasets together +- Useful for batch processing multiple samples +- Created automatically for paired-end reads or can be manually created for grouping + +## Troubleshooting Common Issues + +### Analysis Failed (Red Dataset) +1. 
Click on the red dataset +2. Review the error message +3. Common causes: + - Incorrect file format + - Invalid parameter values + - Insufficient computational resources +4. Adjust settings and re-run + +### Dataset Stuck in Gray/Yellow +- Jobs are queued - be patient +- Check the Galaxy status page for system issues +- If stuck >24 hours, report to support + +### Can't Find My Data +- Check you're viewing the correct history +- Use the search function in the history panel +- Datasets may be hidden - click "Include hidden datasets" + +### Results Don't Look Right +- Verify input data quality +- Check parameter settings +- Compare with expected reference +- Review QC reports for issues + +## Best Practices + +### For Reproducible Research +1. **Document parameters**: Note all settings used +2. **Extract workflows**: Save your analysis pipeline +3. **Record provenance**: Galaxy automatically tracks this +4. **Share complete histories**: Include all inputs and parameters +5. **Publish workflows**: Make methods available to others + +### For Efficient Analysis +1. **Start small**: Test with a subset of data +2. **Use collections**: Process multiple samples together +3. **Delete intermediate files**: Keep only essential outputs +4. **Monitor job status**: Check for errors early +5. **Save important results**: Download completed analyses + +### For Quality Results +1. **Always run QC first**: Assess data quality +2. **Use appropriate references**: Match your organism +3. **Validate results**: Check consistency across samples +4. **Review annotations carefully**: Understand variant impacts +5. **Consult documentation**: Each tool has specific requirements + +## Example Use Cases + +### Outbreak Surveillance +Monitor pathogen evolution by: +1. Downloading recent SRA surveillance data +2. Running variant calling workflow +3. Comparing variants across samples +4. Identifying emerging mutations +5. Visualizing on genome browser + +### Comparative Genomics (available soon!) 
+Compare strains or species: +1. Select multiple genome assemblies +2. Run alignment workflows +3. Identify conserved regions +4. Analyze variant patterns +5. Perform evolutionary analysis in JupyterLite + +### Transcriptome Analysis +Analyze gene expression: +1. Upload RNA-seq data +2. Run transcriptomics workflow +3. Quantify gene expression +4. Identify differentially expressed genes +5. Visualize results + +## Getting Help + +### Documentation Resources +- **BRC Analytics Help**: [https://help.brc-analytics.org](https://help.brc-analytics.org) +- **Galaxy Training Network**: [https://training.galaxyproject.org](https://training.galaxyproject.org) +- **Galaxy 101 Tutorial**: [Basic introduction to Galaxy interface](https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-101/tutorial.html?utm_source=redirect&utm_medium=learn&utm_campaign=galaxyhub) +- **Featured Analyses**: [Real-world analysis examples](/learn/featured-analyses) + +### Support Channels +- **Galaxy Help Forum**: [https://help.galaxyproject.org](https://help.galaxyproject.org) +- **BRC Analytics Discourse**: [Community support forum](https://help.brc-analytics.org) +- **Matrix Chat**: [Real-time community help](https://matrix.to/#/#brc-analytics:matrix.org) + +### Learning Resources +- **GTN Tutorials**: [Step-by-step guides for specific analyses](https://training.galaxyproject.org) +- **Webinars**: [Recorded sessions on BRC Analytics features](https://www.youtube.com/@brc_consortium) +- **Example Workflows**: [Pre-built analysis pipelines](https://iwc.galaxyproject.org) + +## Next Steps + +### Beginner Path +1. ✅ Complete this getting started guide +2. ☐ Try the Galaxy 101 tutorial +3. ☐ Run the measles virus example analysis +4. ☐ Upload your own data +5. ☐ Create your first custom workflow + +### Intermediate Path +1. ☐ Explore advanced workflow parameters +2. ☐ Use JupyterLite for custom analysis +3. ☐ Batch process multiple samples using collections +4. 
☐ Visualize results in UCSC Genome Browser +5. ☐ Share workflows with collaborators + +### Advanced Path +1. ☐ Develop custom workflows +2. ☐ Integrate multiple data types +3. ☐ Perform evolutionary analyses +4. ☐ Contribute to community workflows +5. ☐ Publish reproducible research + +## Additional Resources + +### Related Platforms +- **[BV-BRC](https://bv-brc.org)**: Bacterial and Viral Bioinformatics Resource Center +- **[NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/)**: Source of reference genomes +- **[Galaxy Project](https://galaxyproject.org)**: Underlying workflow system +- **[Intergalactic Workflow Commission](https://iwc.galaxyproject.org/)**: Community-curated workflows + +### Key Publications +- [BRC Analytics paper (bioRxiv 2025.10.13.682095)](https://www.biorxiv.org/content/10.1101/2025.10.13.682095v2) +- [Galaxy Training Network (PLOS Comp Bio 2023)](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010752) +- [Galaxy Project (NAR 2024)](https://pmc.ncbi.nlm.nih.gov/articles/PMC11223835/) + +### Stay Updated +- Follow [@brc_analytics on Mastodon](https://mstdn.science/@brc_analytics) +- Check the [Galaxy Project news](https://galaxyproject.org/news/) + +--- + +## Quick Reference Card + +### Essential URLs +- BRC Analytics: https://brc-analytics.org +- Galaxy Interface: https://brc.usegalaxy.org +- Help Forum: https://help.brc-analytics.org +- Training: https://training.galaxyproject.org + +### Key Concepts +- **Assembly**: Reference genome build +- **Workflow**: Automated analysis pipeline +- **History**: Your analysis workspace +- **Dataset**: Individual file in your history +- **Collection**: Group of related datasets + +### Common File Formats +- **FASTQ**: Raw sequencing reads +- **BAM/SAM**: Aligned sequence reads +- **VCF**: Variant call format +- **FASTA**: Sequence data +- **GFF/GTF**: Genome annotations + +--- + +Welcome to BRC Analytics! This guide has covered the essentials to get you started. 
Remember, the platform is designed to make complex genomic analysis accessible - don't hesitate to explore, experiment, and ask for help when needed. The community is here to support your research journey. + +Happy analyzing! 🧬