Commit
update-pre
telatin committed Dec 19, 2022
1 parent dcbcf5f commit dd76a52
Showing 5 changed files with 295 additions and 7 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -38,3 +38,5 @@ data/MiSeq_SOP/
test-err
dada2-input
__pycache__
MiSeq_SOP/
miseqsopdata.zip
2 changes: 1 addition & 1 deletion bin/dadaist2
@@ -2,7 +2,7 @@
#ABSTRACT: A program to run DADA2 from the CLI
use 5.012;
use warnings;
-my $VERSION = '1.2.5';
+my $VERSION = '1.3.0';

BEGIN {

19 changes: 14 additions & 5 deletions docs/3_tutorial.md
@@ -9,6 +9,10 @@ permalink: /tutorial
Updated for version 1.3.0
```

This tutorial aims to familiarise you with the program, but relies on a very small and noisy dataset.
To fully test the pipeline we recommend a well-established dataset such as the "Mothur SOP"; see
[the full tutorial]({{ site.baseurl }}{% link 4_usage.md %}).

## Get ready

[Install Dadaist2]({{ 'installation' | relative_url }}) and activate the Miniconda environment (if needed).
@@ -30,7 +34,7 @@ seqfu count --basename data/16S/*.gz
This reports the number of reads in each file and checks that the forward (R1) and reverse (R2)
files of each pair contain the same number of reads. This should be the output produced:

-```
+```text
F99_S0_L001_R1_001.fastq.gz 4553 Paired
A01_S0_L001_R1_001.fastq.gz 6137 Paired
A02_S0_L001_R1_001.fastq.gz 5414 Paired
@@ -43,6 +47,7 @@ prohibited in a future release, and will trigger a warning in the current releas

Dadaist2 provides a convenient tool to download some pre-formatted reference databases.
To list the databases available for download:

```bash
dadaist2-getdb --list
```
@@ -105,17 +110,19 @@ F99 F99_S0_L001_R1_001.fastq.gz,F99_S0_L001_R2_001.fastq.gz
## Run the analysis

Dadaist2 provides options to:

* select the QC strategy (fastp, cutadapt or seqfu)
* select the taxonomy classifier (DECIPHER or DADA2 naive classifier)
* adjust various steps via command line parameters


As a first run, we recommend using the default parameters:

```bash
dadaist2 -i data/16S/ -o example-output -d refs/SILVA_SSU_r138_2019.RData -t 8 -m metadata.tsv --verbose
```

Briefly:

* `-i` points to the input directory containing paired-end reads (by default recognised by the `_R1` and `_R2` tags, but this can be customised)
* `-o` is the output directory
* `-d` is the reference database in DADA2 or DECIPHER format (we downloaded a DECIPHER database)
@@ -145,28 +152,30 @@ and we didn't have overlap between the reads.
In this case a substantial loss occurs at the first step (filtered), because these sample
reads are not of very high quality and are only used to test the pipeline.

-:bulb: In this datasets the primers were not removed. There are two ways to fix this:
+:bulb: Now there is a `dadaist2-checkstats` tool to identify the steps causing the biggest loss.
+
+In this dataset the primers were not removed. There are two ways to fix this:

* Use the primer sequences with `--primers CCTACGGGNGGCWGCAGTNG:GACTACNNGGGTATCTAATC` (forward:reverse)
* Trim fixed lengths from forward and reverse reads with `--trim-primer-for 20` and `--trim-primer-rev 20` (or `--s1 20` and `--s2 20` in shorter form)
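
As a rough sketch of the two options above, the snippet below only prints the corresponding command lines and does not run the pipeline. The base options are copied from the default command earlier in this tutorial; the function names (`base_cmd`, `with_primers`, `with_fixed_trim`) are hypothetical helpers, not part of Dadaist2. In practice you would pick one of the two strategies.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the dadaist2 command for each
# primer-removal strategy. Helper names are illustrative assumptions.

# Base command, copied from the default run shown in the tutorial.
base_cmd() {
  printf '%s' "dadaist2 -i data/16S/ -o example-output -d refs/SILVA_SSU_r138_2019.RData -t 8 -m metadata.tsv"
}

# Option 1: supply the primer sequences as forward:reverse.
with_primers() {
  printf '%s --primers %s:%s\n' "$(base_cmd)" "$1" "$2"
}

# Option 2: trim a fixed number of bases from forward and reverse reads.
with_fixed_trim() {
  printf '%s --trim-primer-for %s --trim-primer-rev %s\n' "$(base_cmd)" "$1" "$2"
}

with_primers CCTACGGGNGGCWGCAGTNG GACTACNNGGGTATCTAATC
with_fixed_trim 20 20
```

Copy whichever printed line fits your data into the terminal to actually run the analysis.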


## A real dataset

If the pipeline finished as expected, you are ready to run it on real samples,
as [described in another tutorial]({{ site.baseurl }}{% link 4_usage.md %}).


## The output directory

Notable files:

* **rep-seqs.fasta** representative sequences (ASVs) in FASTA format
* **rep-seqs-tax.fasta** representative sequences (ASVs) in FASTA format, with taxonomy labels as comments
* **feature-table.tsv** table of raw counts (after cross-talk removal if specified)
* **taxonomy.tsv** a text file with the taxonomy of each ASV (used to add the labels to the _rep-seqs-tax.fasta_)
* copy of the **metadata.tsv** file
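
A quick sanity check on the files listed above could look like the following sketch. It assumes the output directory is `example-output` (the `-o` value used earlier) and that each ASV is one FASTA record; `count_fasta_records` is a hypothetical helper, not a Dadaist2 tool.

```shell
#!/usr/bin/env bash
# Sketch: count the ASVs in the representative-sequence files by counting
# FASTA header lines (lines starting with '>').

count_fasta_records() {
  # grep -c prints the number of matching lines (0 if there are none)
  grep -c '^>' "$1"
}

# Output directory: first argument, or the tutorial's default name.
outdir="${1:-example-output}"

for f in rep-seqs.fasta rep-seqs-tax.fasta; do
  if [ -f "$outdir/$f" ]; then
    echo "$f: $(count_fasta_records "$outdir/$f") sequences"
  fi
done
```

Since both FASTA files are described as holding the representative sequences, the two counts would presumably be equal; a mismatch is worth investigating.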

Subdirectories:

* **MicrobiomeAnalyst** a set of files formatted for use with the online software [MicrobiomeAnalyst](https://www.microbiomeanalyst.ca/MicrobiomeAnalyst/upload/OtuUploadView.xhtml) (also available offline as an R package).
* **Rhea** a directory with files to be used with the [Rhea pipeline](https://lagkouvardos.github.io/Rhea/), as well as some pre-calculated outputs (Normalization and Alpha diversity are done by default, as they don't require knowledge about metadata categories)
* **R** a directory with the PhyloSeq object
277 changes: 277 additions & 0 deletions env/dadaist2-1.3.0_Linux.yaml
@@ -0,0 +1,277 @@
name: dadaist-1.5
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- _r-mutex=1.0.1=anacondar_1
- argcomplete=2.0.0=pyhd8ed1ab_0
- argtable2=2.13=h14c3975_1001
- binutils_impl_linux-64=2.39=he00db2b_1
- bioconductor-biobase=2.58.0=r42hc0cfd56_0
- bioconductor-biocgenerics=0.44.0=r42hdfd78af_0
- bioconductor-biocparallel=1.32.0=r42hc247a5b_0
- bioconductor-biomformat=1.26.0=r42hdfd78af_0
- bioconductor-biostrings=2.66.0=r42hc0cfd56_0
- bioconductor-dada2=1.26.0=r42hc247a5b_0
- bioconductor-data-packages=20221112=hdfd78af_0
- bioconductor-decipher=2.26.0=r42hc0cfd56_0
- bioconductor-delayedarray=0.24.0=r42hc0cfd56_0
- bioconductor-genomeinfodb=1.34.1=r42hdfd78af_0
- bioconductor-genomeinfodbdata=1.2.9=r42hdfd78af_0
- bioconductor-genomicalignments=1.34.0=r42hc0cfd56_0
- bioconductor-genomicranges=1.50.0=r42hc0cfd56_0
- bioconductor-iranges=2.32.0=r42hc0cfd56_0
- bioconductor-matrixgenerics=1.10.0=r42hdfd78af_0
- bioconductor-microbiome=1.20.0=r42hdfd78af_0
- bioconductor-multtest=2.54.0=r42hc0cfd56_0
- bioconductor-phyloseq=1.42.0=r42hdfd78af_0
- bioconductor-rhdf5=2.42.0=r42hbe1951d_1
- bioconductor-rhdf5filters=1.10.0=r42hc247a5b_0
- bioconductor-rhdf5lib=1.20.0=r42hc0cfd56_0
- bioconductor-rhtslib=2.0.0=r42hc0cfd56_0
- bioconductor-rsamtools=2.14.0=r42hc247a5b_0
- bioconductor-s4vectors=0.36.0=r42hc0cfd56_0
- bioconductor-shortread=1.56.0=r42hc247a5b_0
- bioconductor-summarizedexperiment=1.28.0=r42hdfd78af_0
- bioconductor-xvector=0.38.0=r42hc0cfd56_0
- bioconductor-zlibbioc=1.44.0=r42hc0cfd56_0
- biom-format=2.1.13=py310h1fa729e_0
- bwidget=1.9.14=ha770c72_1
- bzip2=1.0.8=h7f98852_4
- c-ares=1.18.1=h7f98852_0
- ca-certificates=2022.12.7=ha878542_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- cairo=1.16.0=ha61ee94_1014
- cffi=1.15.1=py310h255011f_3
- click=8.1.3=py310hff52083_1
- clustalo=1.2.4=h87f3376_5
- curl=7.86.0=h6312ad2_2
- cutadapt=4.2=py310h1425a21_0
- dadaist2=1.0.1=hdfd78af_0
- dnaio=0.10.0=py310h1425a21_0
- expat=2.5.0=h27087fc_0
- fastp=0.23.2=h5f740d0_3
- fasttree=2.1.11=hec16e2b_1
- font-ttf-dejavu-sans-mono=2.37=hab24e00_0
- font-ttf-inconsolata=3.000=h77eed37_0
- font-ttf-source-code-pro=2.038=h77eed37_0
- font-ttf-ubuntu=0.83=hab24e00_0
- fontconfig=2.14.1=hc2a2eb6_0
- fonts-conda-ecosystem=1=0
- fonts-conda-forge=1=0
- freetype=2.12.1=hca18f0e_1
- fribidi=1.0.10=h36c2ea0_0
- gcc_impl_linux-64=12.2.0=hcc96c02_19
- gettext=0.21.1=h27087fc_0
- gfortran_impl_linux-64=12.2.0=h55be85b_19
- glpk=5.0=h445213a_0
- gmp=6.2.1=h58526e2_0
- graphite2=1.3.13=h58526e2_1001
- gsl=2.7=he838d99_0
- gxx_impl_linux-64=12.2.0=hcc96c02_19
- h5py=3.7.0=nompi_py310h416281c_102
- harfbuzz=6.0.0=h8e241bc_0
- hdf5=1.12.2=nompi_h2386368_100
- icu=70.1=h27087fc_0
- importlib-metadata=4.11.4=py310hff52083_0
- importlib_metadata=4.11.4=hd8ed1ab_0
- isa-l=2.30.0=ha770c72_4
- jpeg=9e=h166bdaf_2
- jq=1.6=h36c2ea0_1000
- kernel-headers_linux-64=2.6.32=he073ed8_15
- keyutils=1.6.1=h166bdaf_0
- krb5=1.20.1=hf9c8cef_0
- ld_impl_linux-64=2.39=hcc3a1bd_1
- lerc=4.0.0=h27087fc_0
- libblas=3.9.0=16_linux64_openblas
- libcblas=3.9.0=16_linux64_openblas
- libcurl=7.86.0=h6312ad2_2
- libdeflate=1.13=h166bdaf_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libffi=3.4.2=h7f98852_5
- libgcc-devel_linux-64=12.2.0=h3b97bd3_19
- libgcc-ng=12.2.0=h65d4601_19
- libgfortran-ng=12.2.0=h69a702a_19
- libgfortran5=12.2.0=h337968e_19
- libglib=2.74.1=h606061b_1
- libgomp=12.2.0=h65d4601_19
- libiconv=1.17=h166bdaf_0
- liblapack=3.9.0=16_linux64_openblas
- libnghttp2=1.47.0=hdcd2b5c_1
- libnsl=2.0.0=h7f98852_0
- libopenblas=0.3.21=pthreads_h78a6416_3
- libpng=1.6.39=h753d276_0
- libsanitizer=12.2.0=h46fd767_19
- libsqlite=3.40.0=h753d276_0
- libssh2=1.10.0=haa6b8db_3
- libstdcxx-devel_linux-64=12.2.0=h3b97bd3_19
- libstdcxx-ng=12.2.0=h46fd767_19
- libtiff=4.4.0=h0e0dad5_3
- libuuid=2.32.1=h7f98852_1000
- libwebp-base=1.2.4=h166bdaf_0
- libxcb=1.13=h7f98852_1004
- libxml2=2.10.3=h7463322_0
- libzip=1.9.2=hc869a4a_1
- libzlib=1.2.13=h166bdaf_4
- make=4.3=hd18ef5c_1
- ncurses=6.3=h27087fc_1
- numpy=1.23.5=py310h53a5b5f_0
- oniguruma=6.9.8=h166bdaf_0
- openssl=1.1.1s=h0b41bf4_1
- pandas=1.5.2=py310h769672d_0
- pango=1.50.12=hd33c08f_1
- pbzip2=1.1.13=0
- pcre=8.45=h9c3ff4c_0
- pcre2=10.40=hc3806b6_0
- perl=5.32.1=2_h7f98852_perl5
- perl-capture-tiny=0.48=pl5321ha770c72_1
- perl-carp=1.50=pl5321hd8ed1ab_0
- perl-exporter=5.74=pl5321hd8ed1ab_0
- perl-extutils-makemaker=7.64=pl5321hd8ed1ab_0
- perl-fastx-reader=1.7.0=pl5321hdfd78af_0
- pigz=2.6=h27826a3_0
- pip=22.3.1=pyhd8ed1ab_0
- pixman=0.40.0=h36c2ea0_0
- pthread-stubs=0.4=h36c2ea0_1001
- pycparser=2.21=pyhd8ed1ab_0
- python=3.10.8=h257c98d_0_cpython
- python-dateutil=2.8.2=pyhd8ed1ab_0
- python-isal=1.1.0=py310h5764c6d_1
- python_abi=3.10=3_cp310
- pytz=2022.7=pyhd8ed1ab_0
- pyyaml=6.0=py310h5764c6d_5
- qax=0.9.6=hac521b0_1
- r-ade4=1.7_20=r42h5f7b363_0
- r-ape=5.6_2=r42h9f5de39_1
- r-assertthat=0.2.1=r42hc72bb7e_3
- r-base=4.2.2=h6b4767f_2
- r-bayesm=3.1_5=r42h9f5de39_0
- r-bh=1.78.0_0=r42hc72bb7e_1
- r-bit=4.0.5=r42h06615bd_0
- r-bit64=4.0.5=r42h06615bd_1
- r-bitops=1.0_7=r42h06615bd_1
- r-blob=1.2.3=r42hc72bb7e_1
- r-cachem=1.0.6=r42h06615bd_1
- r-cli=3.4.1=r42h7525677_1
- r-cluster=2.1.4=r42h8da6f51_0
- r-codetools=0.2_18=r42hc72bb7e_1
- r-colorspace=2.0_3=r42h06615bd_1
- r-compositions=2.0_4=r42h06615bd_1
- r-cpp11=0.4.3=r42hc72bb7e_0
- r-crayon=1.5.2=r42hc72bb7e_1
- r-data.table=1.14.6=r42h06615bd_0
- r-dbi=1.1.3=r42hc72bb7e_1
- r-deldir=1.0_6=r42h8da6f51_1
- r-deoptimr=1.0_11=r42hc72bb7e_1
- r-dplyr=1.0.10=r42h7525677_1
- r-ellipsis=0.3.2=r42h06615bd_1
- r-fansi=1.0.3=r42h06615bd_1
- r-farver=2.1.1=r42h7525677_1
- r-fastmap=1.1.0=r42h7525677_1
- r-foreach=1.5.2=r42hc72bb7e_1
- r-formatr=1.12=r42hc72bb7e_1
- r-futile.logger=1.4.3=r42hc72bb7e_1004
- r-futile.options=1.0.1=r42hc72bb7e_1003
- r-generics=0.1.3=r42hc72bb7e_1
- r-ggplot2=3.4.0=r42hc72bb7e_1
- r-glue=1.6.2=r42h06615bd_1
- r-gtable=0.3.1=r42hc72bb7e_1
- r-hms=1.1.2=r42hc72bb7e_1
- r-hwriter=1.3.2.1=r42hc72bb7e_1
- r-igraph=1.3.5=r42hb34fc8a_0
- r-interp=1.1_3=r42h7525677_1
- r-isoband=0.2.6=r42h7525677_2
- r-iterators=1.0.14=r42hc72bb7e_1
- r-jpeg=0.1_10=r42h06615bd_0
- r-jsonlite=1.8.4=r42h133d619_0
- r-labeling=0.4.2=r42hc72bb7e_2
- r-lambda.r=1.2.4=r42hc72bb7e_2
- r-lattice=0.20_45=r42h06615bd_1
- r-latticeextra=0.6_30=r42hc72bb7e_1
- r-lifecycle=1.0.3=r42hc72bb7e_1
- r-magrittr=2.0.3=r42h06615bd_1
- r-mass=7.3_58.1=r42h06615bd_1
- r-matrix=1.5_3=r42h5f7b363_0
- r-matrixstats=0.63.0=r42h06615bd_0
- r-memoise=2.0.1=r42hc72bb7e_1
- r-mgcv=1.8_41=r42h5f7b363_0
- r-munsell=0.5.0=r42hc72bb7e_1005
- r-nlme=3.1_161=r42hac0b197_0
- r-permute=0.9_7=r42hc72bb7e_1
- r-pillar=1.8.1=r42hc72bb7e_1
- r-pixmap=0.4_12=r42hc72bb7e_1
- r-pkgconfig=2.0.3=r42hc72bb7e_2
- r-plogr=0.2.0=r42hc72bb7e_1004
- r-plyr=1.8.8=r42h7525677_0
- r-png=0.1_8=r42h10cf519_0
- r-prettyunits=1.1.1=r42hc72bb7e_2
- r-progress=1.2.2=r42hc72bb7e_3
- r-purrr=0.3.5=r42h06615bd_1
- r-r6=2.5.1=r42hc72bb7e_1
- r-rcolorbrewer=1.1_3=r42h785f33e_1
- r-rcpp=1.0.9=r42h7525677_2
- r-rcpparmadillo=0.11.4.2.1=r42h9f5de39_0
- r-rcppeigen=0.3.3.9.3=r42h9f5de39_0
- r-rcppparallel=5.1.5=r42h7525677_1
- r-rcurl=1.98_1.9=r42h06615bd_1
- r-reshape2=1.4.4=r42h7525677_2
- r-rlang=1.0.6=r42h7525677_1
- r-robustbase=0.95_0=r42hb20cf53_1
- r-rsqlite=2.2.19=r42h7525677_0
- r-rtsne=0.16=r42h37cf8d7_1
- r-scales=1.2.1=r42hc72bb7e_1
- r-snow=0.4_4=r42hc72bb7e_1
- r-sp=1.5_1=r42h06615bd_0
- r-stringi=1.7.8=r42h30a9eb7_1
- r-stringr=1.5.0=r42h785f33e_0
- r-survival=3.4_0=r42h06615bd_1
- r-tensora=0.36.2=r42h06615bd_1
- r-tibble=3.1.8=r42h06615bd_1
- r-tidyr=1.2.1=r42h7525677_1
- r-tidyselect=1.2.0=r42hc72bb7e_0
- r-utf8=1.2.2=r42h06615bd_1
- r-vctrs=0.5.1=r42h7525677_0
- r-vegan=2.6_4=r42hb20cf53_0
- r-viridislite=0.4.1=r42hc72bb7e_1
- r-withr=2.5.0=r42hc72bb7e_1
- readline=8.1.2=h0f457ee_0
- scipy=1.9.3=py310hdfbd76f_2
- sed=4.8=he412f7d_0
- seqfu=1.17.0=hbd632db_0
- setuptools=65.6.3=pyhd8ed1ab_0
- six=1.16.0=pyh6c4a22f_0
- sysroot_linux-64=2.12=he073ed8_15
- tk=8.6.12=h27826a3_0
- tktable=2.10=hb7b940f_3
- toml=0.10.2=pyhd8ed1ab_0
- tzdata=2022g=h191b570_0
- vsearch=2.22.1=hf1761c0_0
- wheel=0.38.4=pyhd8ed1ab_0
- xmltodict=0.13.0=pyhd8ed1ab_0
- xopen=1.7.0=py310hff52083_0
- xorg-kbproto=1.0.7=h7f98852_1002
- xorg-libice=1.0.10=h7f98852_0
- xorg-libsm=1.2.3=hd9c2040_1000
- xorg-libx11=1.7.2=h7f98852_0
- xorg-libxau=1.0.9=h7f98852_0
- xorg-libxdmcp=1.1.3=h7f98852_0
- xorg-libxext=1.3.4=h7f98852_1
- xorg-libxrender=0.9.10=h7f98852_1003
- xorg-libxt=1.2.1=h7f98852_2
- xorg-renderproto=0.11.1=h7f98852_1002
- xorg-xextproto=7.3.0=h7f98852_1002
- xorg-xproto=7.0.31=h7f98852_1007
- xz=5.2.6=h166bdaf_0
- yaml=0.2.5=h7f98852_2
- yq=3.1.0=pyhd8ed1ab_0
- zip=3.0=h7f98852_1
- zipp=3.11.0=pyhd8ed1ab_0
- zlib=1.2.13=h166bdaf_4
- zstandard=0.19.0=py310hdeb6495_1
- zstd=1.5.2=h6239696_4
prefix: /mnt/disk/miniconda3/envs/dadaist-1.5
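
As a hedged usage sketch for the environment file above: the snippet reads the `name:` field from the YAML and prints the conda commands one would typically use to recreate and activate the environment. `env_name` is a hypothetical helper; `conda env create -f` and `conda activate` are standard conda commands. Note that the name stored in this file is `dadaist-1.5`, which differs from the version in the file name.

```shell
#!/usr/bin/env bash
# Sketch: extract the environment name from a conda YAML file and print
# the commands needed to create and activate it (nothing is installed here).

env_name() {
  # Print the value of the first top-level "name:" key
  sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

yaml="env/dadaist2-1.3.0_Linux.yaml"
if [ -f "$yaml" ]; then
  echo "conda env create -f $yaml"
  echo "conda activate $(env_name "$yaml")"
fi
```

To use a different environment name, `conda env create` also accepts `-n NAME` alongside `-f`.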
2 changes: 1 addition & 1 deletion env/dadaist2_1.2.0_Linux.yaml
@@ -1,4 +1,4 @@
-name: dadaist_1.2
+name: dadaist
channels:
- defaults
- bioconda