mayee123/HiCPlas

HiCPlas: Documentation

Nextflow pipeline for reconstructing plasmids and associating them with their bacterial hosts using Hi-C metagenomic data.

Overview

This pipeline requires both shotgun metagenomic reads and Hi-C reads from the same sample. It performs quality control and assembly of the shotgun metagenomic reads using nf-core/MAG, as well as quality control of the Hi-C reads. The number of informative Hi-C reads is assessed with qc3C, and binning is carried out with either MetaCC or ImputeCC. From the resulting bins, plasmid sequences are predicted with MOB-recon, taxonomic classification is performed with Kraken2, and bin quality is evaluated with CheckM [6]. The pipeline concludes by reporting the genus and species of bins that pass quality control.

Usage

Input

This pipeline takes as input a samplesheet containing paths to the metagenomic reads. The samplesheet uses the same format as nf-core/MAG and includes these columns: "sample,group,short_reads_1,short_reads_2,long_reads". Paths to short_reads_2 and long_reads are optional. Additionally, a path to the paired Hi-C reads must be specified with the --hic_read flag, and an output directory must be supplied with the --outdir flag. The path to a Kraken2 database must be provided using --kraken_db unless Kraken2 classification is skipped with --skip_kraken.
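As a rough illustration of the samplesheet layout described above, the following sketch writes a one-sample file. The sample name, group value, and read paths are hypothetical placeholders; only the header row comes from this documentation. The trailing comma leaves the optional long_reads column empty.

```shell
# Write a minimal samplesheet with one paired-end, short-read sample.
# All values below the header are placeholders; replace them with your own.
cat > samplesheet.csv <<'EOF'
sample,group,short_reads_1,short_reads_2,long_reads
sample1,0,/path/to/sample1_R1.fastq.gz,/path/to/sample1_R2.fastq.gz,
EOF

# Show the header to confirm that the column order matches what the pipeline expects.
head -n 1 samplesheet.csv
```

For a hybrid assembly (--hybrid true), the long_reads column would also need a path, as noted under Parameters.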

Parameters

--input: path to samplesheet
--outdir: output folder
--hic_read: path to paired Hi-C reads. Should follow a pattern similar to "path_to_hic_read/sample_{1,2}.fastq.gz"
--imputecc (true or false): bin using ImputeCC instead of MetaCC. Default false
--enzyme (optional): case-sensitive enzyme name. Use multiple times for multiple enzymes. Required if using ImputeCC
--hybrid (true or false): assemble using hybrid reads. If using hybrid reads, paths to both long and short reads must be specified in the samplesheet. Default false
--host_removal (true or false): use Bowtie2 for host removal. Default false
--host_fasta: path to FASTA file of the host genome, if doing host removal
--host_fasta_bowtie2index: Bowtie2 index of the host genome, if doing host removal
--kraken_db: path to Kraken2 database. Required if not skipping Kraken2
--skip_kraken: skip taxonomic classification of bins using Kraken2. Default false
--skip_qc3c: skip Hi-C alignment QC using qc3C. Default false
--skip_hic_trim: skip Hi-C read trimming. Default false
--skip_assembly: skip assembly of shotgun reads. Requires supplying the path to assembled contigs and a samplesheet, which can be blank. Default false
--assembled_contigs: path to assembled contigs, if skipping assembly
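
Several of the parameters above depend on one another (for example, --enzyme is required with ImputeCC). The sketch below is a hypothetical pre-flight check written for this documentation; it is not part of HiCPlas, and the function name check_params and its argument order are assumptions made for illustration. It only encodes the "required if" rules stated in the list above.

```shell
# Hypothetical pre-flight check (not part of HiCPlas).
# Arguments, in order:
#   imputecc enzyme skip_kraken kraken_db skip_assembly assembled_contigs
check_params() {
    status=0
    # --enzyme is required when binning with ImputeCC.
    if [ "$1" = "true" ] && [ -z "$2" ]; then
        echo "error: --enzyme is required when --imputecc is true" >&2
        status=1
    fi
    # --kraken_db is required unless Kraken2 is skipped.
    if [ "$3" != "true" ] && [ -z "$4" ]; then
        echo "error: --kraken_db is required unless --skip_kraken is set" >&2
        status=1
    fi
    # --assembled_contigs is required when assembly is skipped.
    if [ "$5" = "true" ] && [ -z "$6" ]; then
        echo "error: --assembled_contigs is required when --skip_assembly is set" >&2
        status=1
    fi
    return "$status"
}

# Example: ImputeCC binning with one enzyme and a Kraken2 database.
# The enzyme name and database path are placeholders.
check_params true Sau3AI false path_to_kraken_db false "" \
    && echo "parameters look consistent"
```

The function reports every violated rule before returning a non-zero status, so a wrapper script could stop before launching a long-running job.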

Example Usage

nextflow run main.nf -profile singularity --input path_to_samplesheet.csv --outdir HiCPlas_results --hic_read "path_to_hic_read/sample_{1,2}.fastq.gz" --kraken_db path_to_kraken_db

In this case, depending on whether short reads or hybrid reads are provided in the samplesheet, HiCPlas will run quality control on the reads and then assemble them using MEGAHIT or metaSPAdes. By default, host removal is not performed on either the Hi-C reads or the shotgun reads. The trimmed Hi-C reads are then aligned to the assembled contigs, which are binned with MetaCC by default. It is recommended to provide the enzyme using the --enzyme flag.
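
Following the recommendation above, a run that also passes restriction enzymes might look like the sketch below. The enzyme names Sau3AI and MluCI are illustrative assumptions; use the enzymes from your own Hi-C library preparation. The script only prints the command (a dry run), so it can be inspected before anything is launched.

```shell
# Quote the Hi-C pattern so the shell passes {1,2} to Nextflow unexpanded.
HIC_READS='path_to_hic_read/sample_{1,2}.fastq.gz'

# Build the argument list. --enzyme is repeated once per enzyme;
# the enzyme names here are placeholders.
set -- run main.nf -profile singularity \
    --input path_to_samplesheet.csv \
    --outdir HiCPlas_results \
    --hic_read "$HIC_READS" \
    --kraken_db path_to_kraken_db \
    --enzyme Sau3AI --enzyme MluCI

# Dry run: print the command. Remove 'echo' to actually launch the pipeline.
echo nextflow "$@"
```

Note that the printed command omits the quotes around the Hi-C pattern; when typing it by hand, keep the pattern quoted as in the example usage.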

Credits

This pipeline was built by me as part of my Master's thesis at Simon Fraser University. It was created using computational resources from my lab, the CIDGOH genomics group.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
