HIFI-BARCODES-PACBIO-PIPELINE User's Guide V1.0 20170707
HIFIBarcode produces full-length COI barcodes from pooled PCR amplicons generated from individual specimens.
- Clone from github
$ git clone https://github.com/comery/HIFI-barcode-pacbio.git
- Download a ZIP file: go to https://github.com/comery/HIFI-barcode-hiseq and click 'Download ZIP'
- PacBio smrtanalysis | http://www.pacb.com/products-and-services/analytical-software/smrt-analysis/
- standard Perl
- standard Python (Python 2 is fine)
- Bio::Perl (specifically Bio::Seq)
- 1.primer_like_extract.pl
- 2.cluster_count_passes_length.pl
- ccs_passes.py (can be viewed online or downloaded directly)
- fish_ccs.pl
- 01.data/*.h5 (download link will be available soon)
- experiment_data/primer.lst
for GGTCAACAAATCATAAAGATATTGG
rev TAAACTTCAGGGTGACCAAAAAATCA
- experiment_data/index.xls
001 AAAGC
002 AACAG
003 AACCT
004 AACTC
005 AAGCA
... .....
- sample names and their corresponding locations in the 96-well plate
1 A01
2 B01
3 C01
4 D01
5 E01
. ...
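The rows shown above suggest that samples fill the plate column by column (1 is A01, 2 is B01, and so on). A minimal Python sketch of that mapping follows. It assumes a standard plate of 8 rows (A to H) by 12 columns and that the pattern continues past the rows listed here; the function name `well_for_sample` is hypothetical and is not part of the pipeline.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A-H of a 96-well plate (assumed layout)
N_COLS = 12                        # columns 01-12 (assumed layout)

def well_for_sample(n):
    """Return the well label for 1-based sample number n, filling column by column."""
    if not 1 <= n <= len(ROWS) * N_COLS:
        raise ValueError("sample number out of range for a 96-well plate: %d" % n)
    col, row = divmod(n - 1, len(ROWS))
    return "%s%02d" % (ROWS[row], col + 1)
```

If your plate was loaded row by row instead, swap the roles of `row` and `col` accordingly.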
If you have installed PacBio smrtanalysis, you should know the path to its setup.sh. More about PacBio data: http://www.pacb.com/wp-content/uploads/SMRT-Link-User-Guide-v4.0.0.pdf
e.g.: setup_path='/path/PicBio/smrtanalysis/current/etc/setup.sh'
Input:
- my_inputs.fofn
Output:
- log
- data (*.ccs.fasta, *.ccs.fastq, *.ccs.h5, reads_of_insert.fasta, reads_of_insert.fastq, slots.pickle)
- workflow
- results
my_inputs.fofn lists the PacBio H5 files in 01.data/, one path per line, like this:
./01.data/m170506_092957_42199_c101149142550000001823255607191735_s1_p0.1.bax.h5
./01.data/m170506_092957_42199_c101149142550000001823255607191735_s1_p0.bas.h5
./01.data/m170506_092957_42199_c101149142550000001823255607191735_s1_p0.3.bax.h5
./01.data/m170506_092957_42199_c101149142550000001823255607191735_s1_p0.2.bax.h5
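If you prefer not to write the list by hand, it could be generated along these lines. This is only a sketch: the helper name `write_fofn` is hypothetical, and it assumes that every *.h5 file in the data directory belongs in the list.

```python
import glob
import os

def write_fofn(data_dir, fofn_path):
    """Write every *.h5 file found in data_dir to fofn_path, one path per line."""
    paths = sorted(glob.glob(os.path.join(data_dir, "*.h5")))
    with open(fofn_path, "w") as out:
        for path in paths:
            out.write(path + "\n")
    return paths
```

For the layout used in this guide, the call would be `write_fofn("./01.data", "my_inputs.fofn")`.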
run:
$ source /path/PicBio/smrtanalysis/current/etc/setup.sh
$ fofnToSmrtpipeInput.py my_inputs.fofn > my_inputs.xml
$ smrtpipe.py --params=settings.xml xml:my_inputs.xml
Input:
- data/*.ccs.h5
Output:
- ccs_passes.lst
run: (make sure that you have already run source /path/PicBio/smrtanalysis/current/etc/setup.sh)
$ python bin/ccs_passes.py data/*.ccs.h5 >ccs_passes.lst
Input:
- ccs_passes.lst
- data/reads_of_insert.fasta
Output:
- ccs_passes_15.fa
run:
$ awk '$2>=15{print $1}' ccs_passes.lst >ccs_passes_15.lst
$ perl ./bin/fish_ccs.pl ccs_passes_15.lst data/reads_of_insert.fasta >ccs_passes_15.fa
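For readers who want to see what these two commands amount to, here is a rough Python equivalent. It is a sketch, not a replacement for fish_ccs.pl: it assumes that ccs_passes.lst has the read ID in column 1 and the number of passes in column 2 (as the awk command implies), and that the read ID is the first word of each FASTA header. Both function names are hypothetical.

```python
def ids_with_min_passes(passes_lines, min_passes=15):
    """Collect read IDs whose pass count (column 2) is at least min_passes."""
    keep = set()
    for line in passes_lines:
        fields = line.split()
        if len(fields) >= 2 and float(fields[1]) >= min_passes:
            keep.add(fields[0])
    return keep

def fish_fasta(fasta_lines, keep_ids):
    """Yield the lines of FASTA records whose ID (first header word) is in keep_ids."""
    wanted = False
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].split()
            wanted = bool(header) and header[0] in keep_ids
        if wanted:
            yield line
```

Both functions accept any iterable of lines, so open file handles for ccs_passes.lst and data/reads_of_insert.fasta can be passed in directly.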
Input:
- experiment_data/primer.fa
- experiment_data/index.xls
- ccs_passes_15.fa
Output: the output directory name is up to you; the default is "02.assignment"
02.assignment/
- assign.log.txt
- ccs.successfully_assigned.fa
- check.ccs_passes_15.fa.log
run:
$ perl ./bin/1.primer_like_extract.pl -p experiment_data/primer.fa -index experiment_data/index.xls -fa ccs_passes_15.fa -cm 2 -cg 1
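To give a feel for how a primer-like region can be located in a read that contains sequencing errors, the following Python sketch uses the classic dynamic-programming method for approximate substring matching. It is an illustration only and is not taken from 1.primer_like_extract.pl: the meaning of -cm and -cg is not documented in this guide, so the sketch does not reproduce them and reports a single edit distance instead. The function name `best_primer_hit` is hypothetical.

```python
def best_primer_hit(primer, read):
    """Return (edit_distance, end_position) of the best approximate match of primer in read.

    The match may start anywhere in the read at no cost; end_position is the
    0-based index just past the last read base of the match.
    """
    prev = [0] * (len(read) + 1)          # an empty primer matches anywhere for free
    for i, p in enumerate(primer, start=1):
        cur = [i] + [0] * len(read)       # i primer bases against an empty read prefix
        for j, r in enumerate(read, start=1):
            cur[j] = min(
                prev[j - 1] + (p != r),   # match or mismatch
                prev[j] + 1,              # primer base missing from the read
                cur[j - 1] + 1,           # extra base in the read
            )
        prev = cur
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end
```

A read could then be accepted when the returned distance is at or below a threshold of your choosing, and the bases just before the primer start compared against the index list.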
Input:
- ccs.successfully_assigned.fa
- check.ccs_passes_15.fa.log
- ccs_passes.lst
Output:
- cluster.top1.fas
- cluster.id.txt
- cluster.all.fa
run:
$ cd 02.assignment/
$ perl ../bin/2.cluster_count_passes_length.pl -ccs ccs.successfully_assigned.fa -pattern check.ccs_passes_15.fa.log -passes ../ccs_passes.lst
$ perl ../bin/change_name-location.pl cluster.top1.fas >hifi-barcode-pacbio.cluster.top1.fa
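The idea of keeping one top sequence per sample can be illustrated with the short Python sketch below. It is a simplification under stated assumptions: it groups reads by exact sequence identity and ranks by abundance, then length, whereas 2.cluster_count_passes_length.pl may cluster and rank differently (its name suggests it also uses the number of passes). The function name `top_sequence_per_sample` is hypothetical.

```python
from collections import Counter, defaultdict

def top_sequence_per_sample(assigned_reads):
    """Given (sample, sequence) pairs, return {sample: (sequence, count)} for the most abundant sequence."""
    counts = defaultdict(Counter)
    for sample, seq in assigned_reads:
        counts[sample][seq.upper()] += 1
    top = {}
    for sample, counter in counts.items():
        # Rank by abundance, then by length; the sequence itself breaks any remaining tie.
        seq, n = max(counter.items(), key=lambda kv: (kv[1], len(kv[0]), kv[0]))
        top[sample] = (seq, n)
    return top
```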
ALL DONE!
So "hifi-barcode-pacbio.cluster.top1.fa" is the final result!
Thanks to Hailin Pan for inspiring the dynamic programming used in the script "1.primer_like_extract.pl"; I learned a lot from that!
Email: yangchentao at genomics dot cn
Liu, Shanlin, Chentao Yang, Chengran Zhou, and Xin Zhou. "Filling reference gaps via assembling DNA barcodes using high-throughput sequencing–moving toward barcoding the world." GigaScience (2017).
Version 1.0 201707