@@ -25,6 +25,8 @@ multiple SNPs unless the -m flag is provided.
2525
2626: shows help message and exits.
2727
28+ See more below.
29+
2830# EXIT VALUES
2931
3032** 0**
@@ -38,19 +40,54 @@ multiple SNPs unless the -m flag is provided.
3840
3941<!--
4042
41- >>> from pytest.rtest import run_stdout, head, cat
43+ >>> from pytest.rtest import run_stdout, head, cat, sh
4244
4345-->
4446
4547```
4648
47- >>> head("vcfallelicprimitives -h",1 )
49+ >>> head("vcfallelicprimitives -h",20 )
4850usage: vcfallelicprimitives [options] [file]
51+ >
52+ If multiple allelic primitives (gaps or mismatches) are specified in
53+ a single VCF record, split the record into multiple lines, but drop all
54+ INFO fields. Does not handle genotypes (yet). MNPs are split into
55+ multiple SNPs unless the -m flag is provided. Records generated by splits have the
56+ options:
57+ -m, --use-mnps Retain MNPs as separate events (default: false).
58+ -t, --tag-parsed FLAG Tag records which are split apart of a complex allele with this flag.
59+ -L, --max-length LEN Do not manipulate records in which either the ALT or
60+ REF is longer than LEN (default: 200).
61+ -k, --keep-info Maintain site and allele-level annotations when decomposing.
62+ Note that in many cases, such as multisample VCFs, these won't
63+ be valid post-decomposition. For biallelic loci in single-sample
64+ VCFs, they should be usable with caution.
65+ -g, --keep-geno Maintain genotype-level annotations when decomposing. Similar
66+ caution should be used for this as for --keep-info.
67+ >
68+ Type: transformation
69+ >
4970
5071```
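The help text says MNPs are split into multiple SNPs unless `-m` is given. A minimal sketch of that idea follows. The function `split_mnp` and the coordinates are hypothetical and for illustration only; this is not the vcflib implementation, which works from alignments rather than a base-by-base comparison.

```python
def split_mnp(pos, ref, alt):
    """Return one (pos, ref_base, alt_base) tuple per mismatching base.

    Hypothetical helper: assumes REF and ALT have equal length (an MNP).
    """
    if len(ref) != len(alt):
        raise ValueError("an MNP has REF and ALT of equal length")
    # Keep only the positions where the two alleles differ.
    return [(pos + i, r, a)
            for i, (r, a) in enumerate(zip(ref, alt))
            if r != a]


# Made-up record: REF=ACGT, ALT=AGGA at position 100
print(split_mnp(100, "ACGT", "AGGA"))
# [(101, 'C', 'G'), (103, 'T', 'A')]
```

With `-m`, the equivalent of this step would be skipped and the MNP kept as a single event.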
5172
52- vcfallelicprimitives picks complex regions and simplifies nested alignments
73+ vcfallelicprimitives picks complex regions and simplifies nested alignments. For example:
74+
75+ ``` python
76+
77+ >>> sh("grep 10158243 ../samples/10158243.vcf")
78+ grch38#chr4 10158243 >3655>3662 ACCCCCACCCCCACC ACC,AC,ACCCCCACCCCCAC,ACCCCCACC,ACA 60 . AC=64,3,2,3,1;AF=0.719101,0.0337079,0.0224719,0.0337079,0.011236;AN=89;AT=>3655>3656>3657>3658>3659>3660>3662,>3655>3656>3660>3662,>3655>3660>3662,>3655>3656>3657>3658>3660>3662,>3655>3656>3657>3660>3662,>3655>3656>3661>3662;NS=45;LV=0 GT 0|0 1|1 1|1 1|0 5|1 0|4 0|1 0|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 4|3 1|1 1|1 1|1 1|0 1|0 1|0 1|0 1|1 1|1 1|4 1|1 1|1 3|0 1|0 1|1 0|1 1|1 1|1 2|1 1|2 1|1 1|1 0|1 1|1 1|1 1|0 1|2 1|1 0
79+
80+ ```
5381
82+ After realignment, the record is reduced to two records and the genotypes are adjusted accordingly:
83+
84+ ``` python
85+
86+ >>> sh("../build/vcfallelicprimitives -m -L 1000 ../samples/10158243.vcf|grep -v ^\#")
87+ grch38#chr4 10158243 . ACCCCCACCCCCAC ACCCCCAC,ACAC,AC,A 60 . AC=3,1,64,3;AF=0.0337079,0.011236,0.719101,0.0337079;LEN=6,10,12,13;TYPE=del,del,del,del GT 0|0 3|3 3|3 3|0 2|3 0|1 0|3 0|3 3|3 3|3 3|3 3|3 3|3 3|3 3|3 1|0 3|3 3|3 3|3 3|0 3|0 3|0 3|0 3|3 3|3 3|1 3|3 3|3 0|0 3|0 3|3 0|3 3|3 3|3 4|3 3|4 3|3 3|3 0|3 3|3 3|3 3|0 3|4 3|3 0
88+ grch38#chr4 10158255 . ACC AC,A 60 . AC=2,1;AF=0.0224719,0.011236;LEN=1,2;TYPE=del,del GT 0|0 0|0 0|0 0|0 2|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0
89+
90+ ```
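To make the genotype adjustment concrete, here is a small sketch that replays it for a few samples. The two index maps are read off the records above by matching AC counts and genotype columns; they are an inference from this one example, and `remap_genotype` is a hypothetical helper, not part of vcflib.

```python
def remap_genotype(gt, mapping):
    """Translate a phased genotype such as '5|1' using an
    old-ALT-index -> new-ALT-index map. Alleles absent from the map
    become 0 (reference) in that output record."""
    return "|".join(
        a if a == "." else str(mapping.get(int(a), 0))
        for a in gt.split("|")
    )


# Inferred from the example: old ALT index -> ALT index in each new record
RECORD_10158243 = {1: 3, 2: 4, 4: 1, 5: 2}
RECORD_10158255 = {3: 1, 5: 2}

for gt in ["1|1", "5|1", "4|3", "3|0"]:
    print(gt,
          remap_genotype(gt, RECORD_10158243),
          remap_genotype(gt, RECORD_10158255))
# 1|1 3|3 0|0
# 5|1 2|3 2|0
# 4|3 1|0 0|1
# 3|0 0|0 1|0
```

Old allele 5 (`ACA`) appears in both maps because it contributes a deletion to each of the two output records.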
5491
5592## Source code
5693
@@ -60,10 +97,10 @@ vcfallelicprimitives picks complex regions and simplifies nested alignments
6097
6198``` python
6299>>> run_stdout("vcfallelicprimitives -m -L 1000 ../samples/grch38#chr8_36353854-36453166.vcf", ext="vcf")
63- output in <a href="../data/regression/vcfallelicprimitives_2.vcf">vcfallelicprimitives_2.vcf</a>
100+ output in <a href="../data/regression/vcfallelicprimitives_4.vcf">vcfallelicprimitives_4.vcf</a>
64101
65102>>> run_stdout("vcfallelicprimitives -m -L 1000 ../samples/grch38#chr4_10083863-10181258.vcf", ext="vcf")
66- output in <a href="../data/regression/vcfallelicprimitives_3.vcf">vcfallelicprimitives_3.vcf</a>
103+ output in <a href="../data/regression/vcfallelicprimitives_5.vcf">vcfallelicprimitives_5.vcf</a>
67104
68105```
69106