Merge pull request #2 from campanam/development
Development
Showing 4 changed files with 150 additions and 40 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# vcf2aln Change Log | ||
Michael G. Campana & Jacob A. West-Roberts, 2017-2019 | ||
Smithsonian Conservation Biology Institute | ||
Contact: [email protected] | ||
|
||
### Version 0.6.0 | ||
get_GT_tags gets tag information from VCF rather than VCF 4.2 standard tags | ||
Can read GATK HaplotypeCaller PGT phasing information | ||
GT/PGT information does not need to be in the first slot of the output | ||
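
The 0.6.0 notes above describe looking up genotype tags by name rather than by position. The following is a minimal sketch of that idea, not the code from vcf2aln.rb: the helper name `sample_genotype` is hypothetical, and preferring PGT over GT whenever PGT is present is an assumption made for illustration.

```ruby
# Hypothetical helper (not part of vcf2aln): return the genotype string for
# one sample, locating GT/PGT by tag name instead of assuming slot 0.
def sample_genotype(format_field, sample_field)
  tags = format_field.split(":")    # e.g. ["DP", "GT", "PGT"]
  values = sample_field.split(":")  # e.g. ["12", "0/1", "0|1"]
  entry = tags.zip(values).to_h     # pair each tag with its value

  # Assumption: use the GATK phased genotype (PGT) when it is present
  # and not missing; otherwise fall back to the unphased GT.
  pgt = entry["PGT"]
  return pgt if pgt && pgt != "."

  entry["GT"]
end

puts sample_genotype("DP:GT:PGT", "12:0/1:0|1") # 0|1
puts sample_genotype("DP:GT", "12:0/1")         # 0/1
```

Because the lookup is keyed on the tag name, the order of tags in the FORMAT column does not matter.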
|
||
### Version 0.5.0 | ||
Speed increase using write-cycle controls | ||
Indexing bug fix | ||
|
||
### Version 0.4.2 | ||
Haploid VCF bug fix | ||
|
||
### Version 0.4.1 | ||
Onehap bug fix | ||
|
||
### Version 0.4.0 | ||
Onehap concatenation bug fix | ||
|
||
### Version 0.3.0 | ||
Now gets all type fields in VCF | ||
Onehap flag and bug fixes | ||
GLE & ambiguity code handling | ||
Cleaned up help screen | ||
|
||
### Version 0.2.0 | ||
New filters | ||
Ability to identify VCF tags | ||
Separation of methods | ||
|
||
### Version 0.1.0 | ||
Preliminary script to generate FASTA alignment from multi-sample VCF | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
The software is made available under the Smithsonian Institution terms of use (https://www.si.edu/termsofuse). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,61 @@ | ||
# vcf2aln | ||
Script to convert multi-sample VCFs to FASTA alignments | ||
Script to convert multi-sample VCFs to FASTA alignments without assuming the reference sequence when data are missing. Users can apply a variety of data filters, produce phased/unphased, concatenated/split alignments, etc. | ||
|
||
## Authors | ||
Michael G. Campana & Jacob A. West-Roberts, 2017-2018 | ||
Michael G. Campana & Jacob A. West-Roberts, 2017-2019 | ||
|
||
## License | ||
The software is made available under the Smithsonian Institution [terms of use](https://www.si.edu/termsofuse). | ||
|
||
## Citation | ||
Campana, M.G. & West-Roberts, J.A. 2018. vcf2aln. (https://github.com/campanam/vcf2aln) | ||
Campana, M.G. & West-Roberts, J.A. 2019. vcf2aln. (https://github.com/campanam/vcf2aln) | ||
|
||
## Installation | ||
In the terminal: | ||
`git clone https://github.com/campanam/vcf2aln` | ||
`cd vcf2aln` | ||
`chmod +x vcf2aln.rb` | ||
|
||
Optionally, vcf2aln.rb can be placed within the user's $PATH so that it can be executed from any location. Depending on your operating system, you may need to change the shebang line in the script (the first line, starting with #!) to specify the path of your Ruby executable. | ||
|
||
## Input | ||
vcf2aln requires an all-sites VCF (e.g. one produced using EMIT_ALL_SITES in the [Genome Analysis Toolkit](https://software.broadinstitute.org/gatk/)). | ||
|
||
## Execution | ||
Execute the script using `ruby vcf2aln.rb` (or `vcf2aln.rb` if the script is in your $PATH). This will display the help screen. Basic usage is as follows: | ||
`ruby vcf2aln.rb -i <input_vcf> -o <out_prefix>` | ||
|
||
## Available options | ||
### I/O options: | ||
`-i, --input [FILE]`: Input VCF file. | ||
`-o, --outprefix [VALUE]`: Output FASTA alignment prefix. | ||
`-c, --concatenate`: Concatenate markers into single alignment (e.g. concatenate multiple separate chromosomes/contigs or disparate regions within a chromosome with unresolved gaps between them). | ||
`-s, --skip`: Skip missing sites in VCF. | ||
`-O, --onehap`: Print only one haplotype for diploid data. If phasing information is missing, it will generate a pseudohaplotype by randomly assigning one of the alleles. | ||
`-a, --alts`: Print alternate (pseudo)haplotypes in same file. | ||
`-b, --ambig`: Print SNP sites as ambiguity codes. | ||
`-N, --hap_flag`: Data are haploid. | ||
`-g, --split_regions [VALUE]`: Split alignment into subregional alignments for phylogenetic analysis. *DO NOT USE: UNDER DEVELOPMENT* | ||
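
To illustrate what the `--ambig` and `--onehap` options above describe, here is a minimal sketch of the two ideas for a single biallelic SNP. It is not taken from vcf2aln.rb: the helper names are hypothetical, the mapping is the standard IUPAC two-base table, and falling back to `N` for any other allele pair is an assumption.

```ruby
# Standard IUPAC ambiguity codes for unordered pairs of bases.
IUPAC_CODES = {
  %w[A C] => "M", %w[A G] => "R", %w[A T] => "W",
  %w[C G] => "S", %w[C T] => "Y", %w[G T] => "K"
}.freeze

# Hypothetical helper: collapse two single-base alleles into one character.
def ambiguity_code(allele1, allele2)
  return allele1 if allele1 == allele2 # homozygous site: no ambiguity

  # Assumption: pairs not in the table (e.g. indels) are written as N.
  IUPAC_CODES.fetch([allele1, allele2].sort, "N")
end

# Hypothetical helper: build a pseudohaplotype base by picking one of the
# two alleles at random when no phasing information is available.
def pseudohaplotype_allele(allele1, allele2, rng = Random.new)
  [allele1, allele2].sample(random: rng)
end

puts ambiguity_code("G", "A") # R
puts ambiguity_code("T", "T") # T
```

Passing a seeded `Random` instance to `pseudohaplotype_allele` makes the random choice reproducible between runs.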
|
||
### Filtration options: | ||
`-m, --mincalls [VALUE]`: Minimum number of individuals called to include site (Default = 0). | ||
`-x, --maxmissing [VALUE]`: Maximum percent missing data to include sequence (Default = 100.0). | ||
`-q, --qual_filter [VALUE]`: Minimum accepted value for QUAL (per site) (Default = 0.0). | ||
`-y, --site_depth [VALUE]`: Minimum desired total depth for each site (Default = No filter). | ||
`-d, --sampledepth [VALUE]`: Minimum allowed sample depth for each site (Default = No filter). | ||
`-l, --likelihood [VALUE]`: Minimum allowed genotype log-likelihood (At least one option must satisfy this value). | ||
`-p, --phred [VALUE]`: Minimum accepted phred-scaled genotype likelihood (Default = No filter). | ||
`-P, --posterior [VALUE]`: Minimum accepted phred-scaled genotype posterior probability (Default = No filter). | ||
`-C, --conditional [VALUE]`: Minimum conditional genotype quality (phred-encoded) (Default = No filter). | ||
`-H, --haplotype_quality [VALUE]`: Minimum allowed haplotype quality (phred-encoded) (Default = No filter). | ||
`-r, --sample_mq [VALUE]`: Minimum allowed per-sample RMS mapping quality (Default = No filter). | ||
`-R, --site_mq [VALUE]`: Minimum allowed per-site mapping quality (MQ in INFO) (Default = No filter). | ||
`-F, --mq0f [VALUE]`: Maximum allowed value for MQ0F. Must be between 0 and 1. (Default = No filter). | ||
`-S, --mqsb [VALUE]`: Minimum allowed value for MQSB. (Default = No filter). | ||
`-A, --adepth [VALUE]`: Minimum allowed allele depth. (Default = No filter). | ||
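
As a rough illustration of how site-level filters such as `--qual_filter` and `--mincalls` could be applied to one VCF data line, consider the sketch below. It is not the implementation in vcf2aln.rb: `site_passes?` is a hypothetical helper, and treating a missing QUAL as 0 and any genotype containing `.` as uncalled are assumptions.

```ruby
# Hypothetical helper: true if a tab-delimited VCF data line meets the
# minimum QUAL and the minimum number of called individuals.
def site_passes?(vcf_line, min_qual: 0.0, min_calls: 0)
  fields = vcf_line.chomp.split("\t")

  # Column 6 (index 5) is QUAL. Assumption: a missing QUAL (".") counts as 0.
  qual = fields[5] == "." ? 0.0 : fields[5].to_f

  # Column 9 (index 8) is FORMAT; find where GT sits among its tags.
  gt_index = fields[8].to_s.split(":").index("GT")
  return false if gt_index.nil?

  # Sample columns start at index 9. Assumption: a genotype containing "."
  # (e.g. "./." or ".") is counted as not called.
  called = (fields[9..-1] || []).count do |sample|
    gt = sample.split(":")[gt_index]
    !gt.nil? && !gt.include?(".")
  end

  qual >= min_qual && called >= min_calls
end

line = "chr1\t100\t.\tA\tG\t50.0\tPASS\tDP=30\tGT:DP\t0/1:10\t./.:0\t1/1:20"
puts site_passes?(line, min_qual: 30.0, min_calls: 2) # true
puts site_passes?(line, min_qual: 30.0, min_calls: 3) # false
```

With the defaults (`min_qual: 0.0`, `min_calls: 0`) every site with a GT tag passes, which mirrors the documented defaults of no filtering.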
|
||
### General information: | ||
`-t, --typefields`: Display VCF genotype field information, then quit the program. | ||
`-W, --writecycles`: Number of variants to store in memory before writing to disk. (Default = 1000000). | ||
`-v, --version`: Print program version. | ||
`-h, --help`: Show help. |
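
The `--typefields` option above reports the genotype fields declared in a VCF. A minimal sketch of how such information can be read from the `##FORMAT` header lines follows; it is an illustration under stated assumptions rather than the script's own code. The names `FORMAT_HEADER` and `format_tags` are hypothetical, and the pattern assumes the usual `ID, Number, Type, Description` key order.

```ruby
# Matches header lines such as:
# ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
FORMAT_HEADER = /\A##FORMAT=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)">/

# Hypothetical helper: collect the declared FORMAT tags from VCF header lines.
def format_tags(header_lines)
  header_lines.each_with_object({}) do |line, tags|
    match = FORMAT_HEADER.match(line)
    next unless match # skip non-FORMAT header lines

    tags[match[1]] = { number: match[2], type: match[3], description: match[4] }
  end
end

headers = [
  "##fileformat=VCFv4.2",
  '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
  '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">'
]

format_tags(headers).each do |id, info|
  puts "#{id}: #{info[:type]} (#{info[:description]})"
end
```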