cleaned up branch, added scripts, etc
ndamle2 committed Feb 23, 2024
1 parent 97fcb57 commit 99fc80f
Showing 11 changed files with 224 additions and 213 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -20,17 +20,17 @@ Next, ensure your environment has the following packages installed:

We recommend using the provided prebuilt [conda](https://docs.conda.io/en/latest/#) environment to install the required packages. Movi does not have a conda package, so it must be installed separately. See the [installation instructions for Movi](https://github.com/mohsenzakeri/Movi#install-movi-and-its-dependencies-from-source).
```bash
-conda env create -f human-depletion.yml
+conda env create -f human-filtration.yml
```

Next, download the human reference genomes to be used for filtration. We recommend [GRCh38](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.26/), [T2T-CHM13v2.0](https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_009914755.1/), and all currently available pangenomes from the [Human Pangenome Reference Consortium (HPRC)](https://humanpangenome.org). See the table below for additional information and citations for the reference genomes used in this pipeline. A download script is provided for convenience.
```bash
-bash download_references.sh
+bash scripts/download_references.sh
```

Next, create Minimap2 and Movi indexes for the previously downloaded reference genomes. A script is provided for convenience.
```bash
-bash create_indexes.sh
+bash scripts/create_indexes.sh
```

Next, configure the file `config.sh` with the necessary files and executables for your environment. The file `config.sh` is sourced by all other scripts in the pipeline, so it is important to ensure that it is configured correctly. Some of the variables in `config.sh` have specific constraints that must be followed. These constraints are described in the comments of `config.sh`. An example is provided below:
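Because every script sources `config.sh`, a missing or empty variable may only surface partway through a run. As a minimal sketch, a script could check the variables it needs immediately after sourcing. The helper `require_vars` is hypothetical and not part of this commit; `IN` and `CONDA_ENV_NAME` are used only because they appear in this commit's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): report every variable name
# passed as an argument that is unset or empty in the current shell.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" ]]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example use in a pipeline script. Source at top level, not inside a
# function, so that anything the config declares stays global:
#   source config.sh
#   require_vars IN CONDA_ENV_NAME || exit 1
```

The function reports all missing names before returning, so a misconfigured file can be fixed in one pass rather than one variable at a time.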
49 changes: 0 additions & 49 deletions config.example.sh

This file was deleted.

2 changes: 1 addition & 1 deletion config.sh
@@ -44,4 +44,4 @@ file_map["ALIGN-HPRC"]="filter_align_hprc.sh"
file_map["INDEX-HPRC"]="filter_index_hprc.sh"

#conda
-CONDA_ENV_NAME=human-depletion
+CONDA_ENV_NAME=human-filtration
40 changes: 0 additions & 40 deletions create_minimap_indexes/hg38_download.sh

This file was deleted.

70 changes: 0 additions & 70 deletions filter.array.sbatch

This file was deleted.

4 changes: 2 additions & 2 deletions filter.sh
@@ -13,9 +13,9 @@
#SBATCH --output=logs/%x-%A_%a.out
#SBATCH --error=logs/%x-%A_%a.err

-#config_fn="config.sh"
+config_fn="config.sh"
 #config_fn="config.cg.hmf.sh"
-config_fn="config.cg.100k.sh"
+#config_fn="config.cg.100k.sh"

source ${config_fn}
echo "Beginning host filtration on directory: ${IN}"
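This hunk switches the active config by commenting and uncommenting `config_fn` assignments. One alternative, sketched here, is to choose the file when the job is submitted. It assumes a hypothetical `CONFIG_FN` environment variable and a hypothetical `pick_config` helper, neither of which exists in this repository.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to editing filter.sh for each dataset.
# CONFIG_FN is an assumed environment variable; when it is unset or empty,
# the default config.sh is used.
pick_config() {
  local config_fn="${CONFIG_FN:-config.sh}"
  if [[ ! -r "${config_fn}" ]]; then
    echo "cannot read config file: ${config_fn}" >&2
    return 1
  fi
  printf '%s\n' "${config_fn}"
}

# Example use inside a script such as filter.sh:
#   config_fn="$(pick_config)" || exit 1
#   source "${config_fn}"
```

With Slurm, the variable could then be passed at submission time, for example `sbatch --export=ALL,CONFIG_FN=config.cg.100k.sh filter.sh`, leaving the script itself unchanged between runs.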
103 changes: 103 additions & 0 deletions human-filtration.yml
@@ -0,0 +1,103 @@
name: human-filtration
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- _libgcc_mutex=0.1=conda_forge
- _openmp_mutex=4.5=2_gnu
- bedtools=2.31.0=hf5e1c6e_2
- binutils_impl_linux-64=2.40=hf600244_0
- bzip2=1.0.8=h7f98852_4
- c-ares=1.19.1=hd590300_0
- ca-certificates=2023.7.22=hbcca054_0
- cached-property=1.5.2=hd8ed1ab_1
- cached_property=1.5.2=pyha770c72_1
- entrez-direct=16.2=he881be0_1
- fastp=0.23.4=hadf994f_2
- gcc=13.1.0=h8e92de4_1
- gcc_impl_linux-64=13.1.0=hc4be1a9_0
- gettext=0.21.1=h27087fc_0
- h5py=3.9.0=nompi_py311he78b9b8_101
- hdf5=1.14.1=nompi_h4f84152_100
- hmmer=3.3.2=hdbdd923_4
- htslib=1.17=h81da01d_2
- isa-l=2.30.0=ha770c72_4
- k8=0.2.5=hdcf5f25_4
- kernel-headers_linux-64=2.6.32=he073ed8_16
- keyutils=1.6.1=h166bdaf_0
- krb5=1.21.1=h659d440_0
- ld_impl_linux-64=2.40=h41732ed_0
- libaec=1.0.6=hcb278e6_1
- libblas=3.9.0=17_linux64_openblas
- libcblas=3.9.0=17_linux64_openblas
- libcurl=8.2.1=hca28451_0
- libdeflate=1.18=h0b41bf4_0
- libedit=3.1.20191231=he28a2e2_2
- libev=4.33=h516909a_1
- libexpat=2.5.0=hcb278e6_1
- libffi=3.4.2=h7f98852_5
- libgcc-devel_linux-64=13.1.0=he3cc6c4_0
- libgcc-ng=13.1.0=he5830b7_0
- libgfortran-ng=13.1.0=h69a702a_0
- libgfortran5=13.1.0=h15d22d2_0
- libgomp=13.1.0=he5830b7_0
- libidn2=2.3.4=h166bdaf_0
- liblapack=3.9.0=17_linux64_openblas
- libnghttp2=1.52.0=h61bc06f_0
- libnsl=2.0.0=h7f98852_0
- libopenblas=0.3.23=pthreads_h80387f5_0
- libsanitizer=13.1.0=hfd8a6a1_0
- libsqlite=3.42.0=h2797004_0
- libssh2=1.11.0=h0841786_0
- libstdcxx-ng=13.1.0=hfd8a6a1_0
- libunistring=0.9.10=h7f98852_0
- libuuid=2.38.1=h0b41bf4_0
- libzlib=1.2.13=hd590300_5
- minigraph=0.20=he4a0461_2
- minimap2=2.26=he4a0461_1
- more-itertools=10.1.0=pyhd8ed1ab_0
- ncurses=6.4=hcb278e6_0
- numpy=1.25.2=py311h64a7726_0
- openssl=3.1.2=hd590300_0
- pcre=8.45=h9c3ff4c_0
- perl=5.32.1=4_hd590300_perl5
- perl-archive-tar=2.40=pl5321hdfd78af_0
- perl-carp=1.50=pl5321hd8ed1ab_0
- perl-common-sense=3.75=pl5321hd8ed1ab_0
- perl-compress-raw-bzip2=2.201=pl5321h166bdaf_0
- perl-compress-raw-zlib=2.202=pl5321h166bdaf_0
- perl-encode=3.19=pl5321h166bdaf_0
- perl-exporter=5.74=pl5321hd8ed1ab_0
- perl-exporter-tiny=1.002002=pl5321hd8ed1ab_0
- perl-extutils-makemaker=7.70=pl5321hd8ed1ab_0
- perl-io-compress=2.201=pl5321hdbdd923_2
- perl-io-zlib=1.14=pl5321hdfd78af_0
- perl-json=4.10=pl5321hdfd78af_0
- perl-json-xs=2.34=pl5321h4ac6f70_6
- perl-list-moreutils=0.430=pl5321hdfd78af_0
- perl-list-moreutils-xs=0.430=pl5321h031d066_2
- perl-parent=0.241=pl5321hd8ed1ab_0
- perl-pathtools=3.75=pl5321h166bdaf_0
- perl-scalar-list-utils=1.63=pl5321h166bdaf_0
- perl-storable=3.15=pl5321h166bdaf_0
- perl-types-serialiser=1.01=pl5321hdfd78af_0
- pip=23.2.1=pyhd8ed1ab_0
- python=3.11.4=hab00c5b_0_cpython
- python_abi=3.11=3_cp311
- readline=8.2=h8228510_1
- repeatmasker=4.1.5=pl5321hdfd78af_0
- rmblast=2.14.0=h4565617_2
- samtools=1.17=hd87286a_1
- seqtk=1.4=he4a0461_1
- setuptools=68.0.0=pyhd8ed1ab_0
- sysroot_linux-64=2.12=he073ed8_16
- tk=8.6.12=h27826a3_0
- trf=4.09.1=h031d066_4
- tzdata=2023c=h71feb2d_0
- wget=1.20.3=ha35d2d1_1
- wheel=0.41.1=pyhd8ed1ab_0
- xz=5.2.6=h166bdaf_0
- zlib=1.2.13=hd590300_5
- zstd=1.5.2=hfc55251_7
prefix: /home/mcdonadt/miniconda3/envs/human-filtration
78 changes: 78 additions & 0 deletions ref/NC_001422.fna
@@ -0,0 +1,78 @@
>NC_001422.1 Escherichia phage phiX174, complete genome
GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT
GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA
ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG
TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA
GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC
TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT
TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT
CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT
TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG
TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC
GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA
CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG
TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT
AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC
CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA
TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC
TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA
CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA
GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT
GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA
ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC
TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT
TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC
ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC
CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT
GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC
CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC
TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG
TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT
TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA
AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT
TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT
ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC
GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC
TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT
TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG
TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC
CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG
AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC
CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT
TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG
CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA
AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT
GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG
GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA
TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT
CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG
TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC
CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA
TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA
AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC
TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT
CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA
TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG
TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT
CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT
TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC
ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG
TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA
ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG
GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC
CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT
GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG
GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT
ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG
CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC
CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC
GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG
CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA
TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT
TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG
TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC
AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC
TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
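A reference FASTA checked in as plain text can be truncated or altered without anyone noticing. As a rough sanity check, the sketch below prints the name and length of each record so the result can be compared with the length published for the accession. The helper name `fasta_lengths` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<record name><TAB><sequence length>" for each
# record in a FASTA file. The record name is the first word of the header
# line, without the leading ">".
fasta_lengths() {
  awk '
    /^>/ { if (name != "") print name "\t" len   # flush the previous record
           name = substr($1, 2); len = 0; next }
         { gsub(/[ \t\r]/, "")                   # ignore whitespace and CRs
           len += length($0) }
    END  { if (name != "") print name "\t" len } # flush the last record
  ' "$1"
}

# Example use:
#   fasta_lengths ref/NC_001422.fna
```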