<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title><![CDATA[Big Data Genomics]]></title>
<link href="http://bigdatagenomics.github.io/atom.xml" rel="self"/>
<link href="http://bigdatagenomics.github.io/"/>
<updated>2018-12-07T10:22:47-08:00</updated>
<id>http://bigdatagenomics.github.io/</id>
<author>
<name><![CDATA[Big Data Genomics]]></name>
</author>
<generator uri="http://octopress.org/">Octopress</generator>
<entry>
<title type="html"><![CDATA[ADAM 0.25.0 and Cannoli 0.3.0 Released]]></title>
<link href="http://bigdatagenomics.github.io/blog/2018/12/01/adam-0-dot-25-dot-0-cannoli-0-dot-3-dot-0-releases/"/>
<updated>2018-12-01T00:00:00-08:00</updated>
<id>http://bigdatagenomics.github.io/blog/2018/12/01/adam-0-dot-25-dot-0-cannoli-0-dot-3-dot-0-releases</id>
<content type="html"><![CDATA[<p>ADAM <a href="https://github.com/bigdatagenomics/adam/releases">version 0.25.0</a> and
Cannoli <a href="https://github.com/bigdatagenomics/cannoli/releases">version 0.3.0</a> have been released!</p>
<p>Since the 0.24.0 release of ADAM, more than 40 issues have been closed, including bug fixes around
indexed reads and attributes in VCF. New features include additional filter by methods and multi-sample
coverage. The ADAM Python APIs now support Python 3.</p>
<p>Based on feedback from the <a href="https://www.open-bio.org/wiki/BOSC_2018">2018 GCCBOSC bioinformatics community conference</a>,
the Cannoli API was refactored at the <a href="https://galaxyproject.org/events/gccbosc2018/collaboration/">2018 GCCBOSC CollaborationFest</a>
to greatly improve interactive use in <code>cannoli-shell</code> (a Scala REPL based on Spark Shell, similar
to <code>adam-shell</code>) and notebooks such as <a href="https://jupyter.org/">Jupyter</a>, <a href="https://zeppelin.apache.org/">Zeppelin</a>,
and <a href="http://spark-notebook.io/">Spark Notebook</a>.</p>
<p>For example, here is an entire variant calling pipeline, based on bwa, ADAM, and Freebayes:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.cannoli.cli._
</span><span class='line'>import org.bdgenomics.cannoli.cli.Cannoli._
</span><span class='line'>
</span><span class='line'>val sample = "sample"
</span><span class='line'>val reference = "ref.fa"
</span><span class='line'>
</span><span class='line'>val reads = sc.loadPairedFastqAsFragments(sample + "_1.fq", sample + "_2.fq")
</span><span class='line'>
</span><span class='line'>val bwaArgs = new BwaArgs()
</span><span class='line'>bwaArgs.sample = sample
</span><span class='line'>bwaArgs.indexPath = reference
</span><span class='line'>
</span><span class='line'>val alignments = reads.alignWithBwa(bwaArgs)
</span><span class='line'>val sorted = alignments.sortReadsByReferencePositionAndIndex()
</span><span class='line'>val markdup = sorted.markDuplicates()
</span><span class='line'>
</span><span class='line'>val freebayesArgs = new FreebayesArgs()
</span><span class='line'>freebayesArgs.referencePath = reference
</span><span class='line'>
</span><span class='line'>val variantContexts = markdup.callVariantsWithFreebayes(freebayesArgs)
</span><span class='line'>
</span><span class='line'>variantContexts.saveAsVcf(sample + ".freebayes.vcf.bgzf")</span></code></pre></td></tr></table></div></figure>
<h1>Changes since Previous Releases</h1>
<p>The full list of changes to ADAM since version 0.24.0 and Cannoli since version 0.2.0 is below.</p>
<!-- more -->
<h3>ADAM Version 0.25.0</h3>
<p><strong>Closed issues:</strong></p>
<ul>
<li>Expand illumina metadata regex to include “N” character <a href="https://github.com/bigdatagenomics/adam/issues/2079">#2079</a></li>
<li>Remove support for Hadoop 2.6 <a href="https://github.com/bigdatagenomics/adam/issues/2073">#2073</a></li>
<li>NumberFormatException: For input string: “nan” in VCF <a href="https://github.com/bigdatagenomics/adam/issues/2068">#2068</a></li>
<li>Support Spark 2.3.2 <a href="https://github.com/bigdatagenomics/adam/issues/2062">#2062</a></li>
<li>Arrays should be passed to HTSJDK in the JVM primitive type <a href="https://github.com/bigdatagenomics/adam/issues/2059">#2059</a></li>
<li>toCoverage() function for alignments does not distinguish samples <a href="https://github.com/bigdatagenomics/adam/issues/2049">#2049</a></li>
<li>Building from adam-core module directory fails to generate Scala code for sql package <a href="https://github.com/bigdatagenomics/adam/issues/2047">#2047</a></li>
<li>Data Sets <a href="https://github.com/bigdatagenomics/adam/issues/2043">#2043</a></li>
<li>saveAsBed writes missing score values as ‘.’ instead of ‘0’ <a href="https://github.com/bigdatagenomics/adam/issues/2039">#2039</a></li>
<li>Fix GFF3 parser to handle trailing FASTA <a href="https://github.com/bigdatagenomics/adam/issues/2037">#2037</a></li>
<li>Add StorageLevel as an optional parameter to loadPairedFastq <a href="https://github.com/bigdatagenomics/adam/issues/2032">#2032</a></li>
<li>Error: File name too long when building on encrypted file system <a href="https://github.com/bigdatagenomics/adam/issues/2031">#2031</a></li>
<li>Fail to transform a VCF file containing multiple genome data (Multiple sample) <a href="https://github.com/bigdatagenomics/adam/issues/2029">#2029</a></li>
<li>Dataset and RDD constructors are missing from CoverageRDD <a href="https://github.com/bigdatagenomics/adam/issues/2027">#2027</a></li>
<li>How to create a single RDD[Genotype] object out of multiple VCF files? <a href="https://github.com/bigdatagenomics/adam/issues/2025">#2025</a></li>
<li>ReadTheDocs github banner is broken <a href="https://github.com/bigdatagenomics/adam/issues/2020">#2020</a></li>
<li>-realign_indels throws serialization error with instrumentation enabled <a href="https://github.com/bigdatagenomics/adam/issues/2007">#2007</a></li>
<li>Support 0 length FASTQ reads <a href="https://github.com/bigdatagenomics/adam/issues/2006">#2006</a></li>
<li>Speed of Reading into ADAM RDDs from S3 <a href="https://github.com/bigdatagenomics/adam/issues/2003">#2003</a></li>
<li>Support Python 3 <a href="https://github.com/bigdatagenomics/adam/issues/1999">#1999</a></li>
<li>Unordered list of region join types in doc is missing nested levels <a href="https://github.com/bigdatagenomics/adam/issues/1997">#1997</a></li>
<li>Add VariantContextRDD.saveAsPartitionedParquet, ADAMContext.loadPartitionedParquetVariantContexts <a href="https://github.com/bigdatagenomics/adam/issues/1996">#1996</a></li>
<li>VCF annotation question <a href="https://github.com/bigdatagenomics/adam/issues/1994">#1994</a></li>
<li>Fastq reader clips long reads at 10,000 bp <a href="https://github.com/bigdatagenomics/adam/issues/1992">#1992</a></li>
<li>adam-submit Error: Number of executors must be a positive number on EMR 5.13.0/Spark 2.3.0 <a href="https://github.com/bigdatagenomics/adam/issues/1991">#1991</a></li>
<li>Test against Spark 2.3.1, Parquet 1.8.3 <a href="https://github.com/bigdatagenomics/adam/issues/1989">#1989</a></li>
<li>END does not get set when writing a gVCF <a href="https://github.com/bigdatagenomics/adam/issues/1988">#1988</a></li>
<li>Support saving single files to filesystems that don’t implement getScheme <a href="https://github.com/bigdatagenomics/adam/issues/1984">#1984</a></li>
<li>Add additional filter by convenience methods <a href="https://github.com/bigdatagenomics/adam/issues/1978">#1978</a></li>
<li>Limiting FragmentRDD pipe parallelism <a href="https://github.com/bigdatagenomics/adam/issues/1977">#1977</a></li>
<li>Consider javadoc.io for API documentation linking <a href="https://github.com/bigdatagenomics/adam/issues/1976">#1976</a></li>
<li>FASTQ Reader leaks connections <a href="https://github.com/bigdatagenomics/adam/issues/1974">#1974</a></li>
<li>Update bioconda recipe for version 0.24.0 <a href="https://github.com/bigdatagenomics/adam/issues/1971">#1971</a></li>
<li>Update homebrew formula at brewsci/homebrew-bio for version 0.24.0 <a href="https://github.com/bigdatagenomics/adam/issues/1970">#1970</a></li>
<li>loadPartitionedParquetAlignments fails with Reference.all <a href="https://github.com/bigdatagenomics/adam/issues/1967">#1967</a></li>
<li>Caused by: java.lang.VerifyError: class com.fasterxml.jackson.module.scala.ser.ScalaIteratorSerializer overrides final method withResolved <a href="https://github.com/bigdatagenomics/adam/issues/1953">#1953</a></li>
<li>FASTQ input format needs to support index sequences <a href="https://github.com/bigdatagenomics/adam/issues/1697">#1697</a></li>
<li>Changelog must be edited and committed manually during release process <a href="https://github.com/bigdatagenomics/adam/issues/936">#936</a></li>
</ul>
<p><strong>Merged and closed pull requests:</strong></p>
<ul>
<li>added pyspark mock modules for API documentation <a href="https://github.com/bigdatagenomics/adam/pull/2084">#2084</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>Added mock python modules for API python documentation <a href="https://github.com/bigdatagenomics/adam/pull/2082">#2082</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-2079] Expand illumina metadata regex to include “N” character <a href="https://github.com/bigdatagenomics/adam/pull/2081">#2081</a> (<a href="https://github.com/pauldwolfe">pauldwolfe</a>)</li>
<li>ADAM-2079 Added “N” to regexs for illumina metadata <a href="https://github.com/bigdatagenomics/adam/pull/2080">#2080</a> (<a href="https://github.com/pauldwolfe">pauldwolfe</a>)</li>
<li>Update docs with new template and documentation <a href="https://github.com/bigdatagenomics/adam/pull/2078">#2078</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1992] Make maximum FASTQ read length configurable. <a href="https://github.com/bigdatagenomics/adam/pull/2077">#2077</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-2059] Properly pass back primitive typed arrays to HTSJDK. <a href="https://github.com/bigdatagenomics/adam/pull/2075">#2075</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Update dependency versions, including htsjdk to 2.16.1 and guava to 27.0-jre <a href="https://github.com/bigdatagenomics/adam/pull/2072">#2072</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1999] Support Python 3 <a href="https://github.com/bigdatagenomics/adam/pull/2070">#2070</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-2068] Prevent NumberFormatException for nan vs NaN in VCF files. <a href="https://github.com/bigdatagenomics/adam/pull/2069">#2069</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Update python MAKE file <a href="https://github.com/bigdatagenomics/adam/pull/2067">#2067</a> (<a href="https://github.com/Georgehe4">Georgehe4</a>)</li>
<li>Update python MAKE file <a href="https://github.com/bigdatagenomics/adam/pull/2066">#2066</a> (<a href="https://github.com/Georgehe4">Georgehe4</a>)</li>
<li>Update jenkins script to test python 3.6 <a href="https://github.com/bigdatagenomics/adam/pull/2060">#2060</a> (<a href="https://github.com/Georgehe4">Georgehe4</a>)</li>
<li>[ADAM-2062] Update Spark version to 2.3.2 <a href="https://github.com/bigdatagenomics/adam/pull/2055">#2055</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Clean up fields and doc in fragment. <a href="https://github.com/bigdatagenomics/adam/pull/2054">#2054</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-2037] Support GFF3 files containing FASTA formatted sequences. <a href="https://github.com/bigdatagenomics/adam/pull/2053">#2053</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>modified CoverageRDD and FeatureRDD to extend MultisampleGenomicDataset <a href="https://github.com/bigdatagenomics/adam/pull/2051">#2051</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>Multi-sample coverage <a href="https://github.com/bigdatagenomics/adam/pull/2050">#2050</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-2047] Use source directory relative to project.basedir for adam codegen. <a href="https://github.com/bigdatagenomics/adam/pull/2048">#2048</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-2039] Adding support for writing BED format per UCSC definition <a href="https://github.com/bigdatagenomics/adam/pull/2042">#2042</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Update Jenkins Spark version to 2.2.2 <a href="https://github.com/bigdatagenomics/adam/pull/2035">#2035</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-2032] Add StorageLevel as an optional parameter to loadPairedFastq <a href="https://github.com/bigdatagenomics/adam/pull/2033">#2033</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-2027] Add RDD and Dataset constructors to CoverageRDD. <a href="https://github.com/bigdatagenomics/adam/pull/2028">#2028</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Allow for export of query name sorted SAM files <a href="https://github.com/bigdatagenomics/adam/pull/2026">#2026</a> (<a href="https://github.com/karenfeng">karenfeng</a>)</li>
<li>[ADAM-2020] Fix ReadTheDocs Github banner. <a href="https://github.com/bigdatagenomics/adam/pull/2021">#2021</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1988] Add copyVariantEndToAttribute method to support gVCF END attribute … <a href="https://github.com/bigdatagenomics/adam/pull/2017">#2017</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-936] Use github-changes-maven-plugin to update CHANGES.md. <a href="https://github.com/bigdatagenomics/adam/pull/2014">#2014</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1992] Make maximum FASTQ read length configurable. <a href="https://github.com/bigdatagenomics/adam/pull/2011">#2011</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1697] Expand Illumina metadata regex to cover interleaved index sequences. <a href="https://github.com/bigdatagenomics/adam/pull/2010">#2010</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-2007] Make IndelRealignmentTarget implement Serializable. <a href="https://github.com/bigdatagenomics/adam/pull/2009">#2009</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-2006] Support loading 0-length reads as FASTQ. <a href="https://github.com/bigdatagenomics/adam/pull/2008">#2008</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1697] Expand Illumina metadata regex to cover index sequences <a href="https://github.com/bigdatagenomics/adam/pull/2004">#2004</a> (<a href="https://github.com/pauldwolfe">pauldwolfe</a>)</li>
<li>[ADAM-1996] Load and save VariantContexts as partitioned Parquet. <a href="https://github.com/bigdatagenomics/adam/pull/2001">#2001</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1997] Nest list of region join types in joins doc. <a href="https://github.com/bigdatagenomics/adam/pull/1998">#1998</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1877] Add filterToReferenceName(s) to SequenceDictionary. <a href="https://github.com/bigdatagenomics/adam/pull/1995">#1995</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1984] Support file systems that don’t set the scheme. <a href="https://github.com/bigdatagenomics/adam/pull/1985">#1985</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1978] Add additional filter by convenience methods. <a href="https://github.com/bigdatagenomics/adam/pull/1983">#1983</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Adding printAttribute methods for alignment records, features, and samples. <a href="https://github.com/bigdatagenomics/adam/pull/1982">#1982</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Fix partitioning code to use Long instead of Int <a href="https://github.com/bigdatagenomics/adam/pull/1980">#1980</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1976] Adding core API documentation link and badge. <a href="https://github.com/bigdatagenomics/adam/pull/1979">#1979</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1974] Close unclosed stream in FastqInputFormat. <a href="https://github.com/bigdatagenomics/adam/pull/1975">#1975</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Set defaults to schemas <a href="https://github.com/bigdatagenomics/adam/pull/1972">#1972</a> (<a href="https://github.com/ffinfo">ffinfo</a>)</li>
<li>Add loadPairedFastqAsFragments method. <a href="https://github.com/bigdatagenomics/adam/pull/1866">#1866</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Adding loadPairedFastqAsFragments method <a href="https://github.com/bigdatagenomics/adam/pull/1828">#1828</a> (<a href="https://github.com/ffinfo">ffinfo</a>)</li>
</ul>
<h3>Cannoli Version 0.3.0</h3>
<p><strong>Closed issues:</strong></p>
<ul>
<li>Add implicit methods that attach to source RDD <a href="https://github.com/bigdatagenomics/cannoli/issues/131">#131</a></li>
<li>Flip function and command line class names around <a href="https://github.com/bigdatagenomics/cannoli/issues/130">#130</a></li>
<li>Add API documentation link and badge <a href="https://github.com/bigdatagenomics/cannoli/issues/128">#128</a></li>
<li>Add homebrew formula at brewsci/homebrew-bio <a href="https://github.com/bigdatagenomics/cannoli/issues/124">#124</a></li>
<li>Add bioconda recipe <a href="https://github.com/bigdatagenomics/cannoli/issues/123">#123</a></li>
<li>Support validation stringency in out formatters <a href="https://github.com/bigdatagenomics/cannoli/issues/122">#122</a></li>
<li>Add Ensembl Variant Effect Predictor (VEP) for variant annotation <a href="https://github.com/bigdatagenomics/cannoli/issues/112">#112</a></li>
<li>Add Minimap2 for alignment <a href="https://github.com/bigdatagenomics/cannoli/issues/111">#111</a></li>
</ul>
<p><strong>Merged and closed pull requests:</strong></p>
<ul>
<li>Update release script for changelog. <a href="https://github.com/bigdatagenomics/cannoli/pull/143">#143</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-141] Update ADAM dependency to 0.25.0. <a href="https://github.com/bigdatagenomics/cannoli/pull/142">#142</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Update default docker image for bowtie2. <a href="https://github.com/bigdatagenomics/cannoli/pull/140">#140</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-138] Update Cannoli per latest ADAM snapshot changes. <a href="https://github.com/bigdatagenomics/cannoli/pull/139">#139</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-131] Add implicits on Cannoli function source data sets. <a href="https://github.com/bigdatagenomics/cannoli/pull/133">#133</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-130] Extract function classes to core package. <a href="https://github.com/bigdatagenomics/cannoli/pull/132">#132</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-128] Adding API documentation link and badge. <a href="https://github.com/bigdatagenomics/cannoli/pull/129">#129</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-112] Adding Ensembl Variant Effect Predictor (VEP) for variant annotation <a href="https://github.com/bigdatagenomics/cannoli/pull/127">#127</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-122] Support validation stringency in out formatters. <a href="https://github.com/bigdatagenomics/cannoli/pull/126">#126</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-111] Adding Minimap2 for alignment. <a href="https://github.com/bigdatagenomics/cannoli/pull/119">#119</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ADAM 0.24.0 and Cannoli 0.2.0 Released]]></title>
<link href="http://bigdatagenomics.github.io/blog/2018/03/28/adam-0-dot-24-dot-0-cannoli-0-dot-2-dot-0-releases/"/>
<updated>2018-03-28T00:00:00-07:00</updated>
<id>http://bigdatagenomics.github.io/blog/2018/03/28/adam-0-dot-24-dot-0-cannoli-0-dot-2-dot-0-releases</id>
<content type="html"><![CDATA[<p>ADAM <a href="https://github.com/bigdatagenomics/adam/releases">version 0.24.0</a> and
Cannoli <a href="https://github.com/bigdatagenomics/cannoli/releases">version 0.2.0</a> have been released!</p>
<p>As of version 0.24.0, support for Spark version 1.x and Scala 2.10.x has been dropped. ADAM and
Cannoli currently build against Spark version 2.3.0 and Scala version 2.11.12.</p>
<p>Major new features in ADAM version 0.24.0 include Spark SQL support across all genomic data
types and access to the ADAM region join API through Python and R. The ADAM Python and R APIs are
now feature complete relative to ADAM’s Java API. ADAM version 0.24.0 also introduces
Hive-style partitioning by genomic range for Parquet-backed Datasets. This greatly improves
performance for genomic range based queries.</p>
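<p>As a rough illustration of why Hive-style partitioning helps range queries, the sketch below maps a
genomic region to the partition directories it overlaps, so a query only needs to read those directories.
It is not ADAM's implementation: the <code>Region</code> class, the helper names, the 1,000,000 bp bin size, and the
<code>referenceName=</code>/<code>positionBin=</code> directory layout are all assumptions made for this example.</p>

```scala
// Hypothetical sketch: names and layout are illustrative, not ADAM's API.
object GenomicRangePartitioning {

  // Assumed bin width in base pairs, chosen for illustration only.
  val binSize: Long = 1000000L

  // Half-open interval [start, end) on a reference sequence, 0-based.
  final case class Region(referenceName: String, start: Long, end: Long)

  // Indices of all bins touched by the region.
  def bins(region: Region): Seq[Long] = {
    require(region.end > region.start, "region must be non-empty")
    (region.start / binSize) to ((region.end - 1) / binSize)
  }

  // Hive-style directory name for one (reference, bin) pair.
  def partitionPath(root: String, referenceName: String, bin: Long): String =
    s"$root/referenceName=$referenceName/positionBin=$bin"

  // Directories a range query has to read; every other directory can be skipped.
  def pathsFor(root: String, query: Region): Seq[String] =
    bins(query).map(bin => partitionPath(root, query.referenceName, bin))
}
```

<p>Under these assumptions, <code>pathsFor("alignments.adam", Region("chr1", 1500000L, 3200000L))</code> yields the three
directories for bins 1, 2, and 3 of <code>chr1</code>; data for all other bins and reference sequences is never opened.</p>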
<p>With version 0.2.0, Cannoli now provides a functional API for interactive use in
<code>cannoli-shell</code> (a Scala REPL based on Spark Shell, similar to <code>adam-shell</code>) and
notebooks such as <a href="https://jupyter.org/">Jupyter</a>, <a href="https://zeppelin.apache.org/">Zeppelin</a>,
and <a href="http://spark-notebook.io/">Spark Notebook</a>. This API allows multiple
Cannoli-wrapped bioinformatics tools to run as processes in a larger Spark-based workflow
without writing intermediate results to disk.</p>
<h1>Changes since Previous Releases</h1>
<p>The full list of changes to ADAM since version 0.23.0 and Cannoli since version 0.1.0 is below.</p>
<!-- more -->
<h3>ADAM Version 0.24.0</h3>
<p><strong>Closed issues:</strong></p>
<ul>
<li>Phred values from 156–254 do not round trip properly between log space <a href="https://github.com/bigdatagenomics/adam/issues/1964">#1964</a></li>
<li>Support VCF lines with positions at 0 <a href="https://github.com/bigdatagenomics/adam/issues/1959">#1959</a></li>
<li>Don’t initialize non-ref values to Int.MinValue <a href="https://github.com/bigdatagenomics/adam/issues/1957">#1957</a></li>
<li>Support downsampling in recalibration <a href="https://github.com/bigdatagenomics/adam/issues/1955">#1955</a></li>
<li>Cannot waive validation stringency for INFO Number=.,Type=Flag fields <a href="https://github.com/bigdatagenomics/adam/issues/1939">#1939</a></li>
<li>Clip phred scores below Int.MaxValue <a href="https://github.com/bigdatagenomics/adam/issues/1934">#1934</a></li>
<li>ADAMContext.getFsAndFilesWithFilter should throw exception if paths null or empty <a href="https://github.com/bigdatagenomics/adam/issues/1932">#1932</a></li>
<li>Bump to Spark 2.3.0 <a href="https://github.com/bigdatagenomics/adam/issues/1931">#1931</a></li>
<li>util.FileExtensions should be public for use downstream in Cannoli <a href="https://github.com/bigdatagenomics/adam/issues/1927">#1927</a></li>
<li>Reduce logging level for ADAMKryoRegistrator <a href="https://github.com/bigdatagenomics/adam/issues/1925">#1925</a></li>
<li>Revisit performance implications of commit 1eed8e8 <a href="https://github.com/bigdatagenomics/adam/issues/1923">#1923</a></li>
<li>add akmorrow13 to PyPI for bdgenomics.adam <a href="https://github.com/bigdatagenomics/adam/issues/1919">#1919</a></li>
<li>Read the Docs build failing with TypeError: super() argument 1 must be type, not None <a href="https://github.com/bigdatagenomics/adam/issues/1917">#1917</a></li>
<li>Bump Hadoop-BAM dependency to 7.9.2. <a href="https://github.com/bigdatagenomics/adam/issues/1915">#1915</a></li>
<li>cannot run pyadam from adam distribution 0.23.0 <a href="https://github.com/bigdatagenomics/adam/issues/1914">#1914</a></li>
<li>adam2fasta/q are missing asSingleFile, disableFastConcat <a href="https://github.com/bigdatagenomics/adam/issues/1912">#1912</a></li>
<li>Pipe API doesn’t properly handle multiple arguments and spaces <a href="https://github.com/bigdatagenomics/adam/issues/1909">#1909</a></li>
<li>Bump to HTSJDK 2.13.2 <a href="https://github.com/bigdatagenomics/adam/issues/1907">#1907</a></li>
<li>S3A error: HTTP request: Timeout waiting for connection from pool <a href="https://github.com/bigdatagenomics/adam/issues/1906">#1906</a></li>
<li>InputStream passed to VCFHeaderReader does not get closed <a href="https://github.com/bigdatagenomics/adam/issues/1900">#1900</a></li>
<li>Support INFO fields set to missing <a href="https://github.com/bigdatagenomics/adam/issues/1898">#1898</a></li>
<li>CLI to transfer between cloud storage and HDFS <a href="https://github.com/bigdatagenomics/adam/issues/1896">#1896</a></li>
<li>Jenkins does not run python or R tests <a href="https://github.com/bigdatagenomics/adam/issues/1889">#1889</a></li>
<li>pyadam throws application option error <a href="https://github.com/bigdatagenomics/adam/issues/1886">#1886</a></li>
<li>ReferenceRegion in python does not exist <a href="https://github.com/bigdatagenomics/adam/issues/1884">#1884</a></li>
<li>Caching GenomicRDD in pyspark <a href="https://github.com/bigdatagenomics/adam/issues/1883">#1883</a></li>
<li>adam-submit aborts if ADAM_HOME is set <a href="https://github.com/bigdatagenomics/adam/issues/1882">#1882</a></li>
<li>Allow piped commands to timeout <a href="https://github.com/bigdatagenomics/adam/issues/1875">#1875</a></li>
<li>loadVcf does not dedupe sample ID <a href="https://github.com/bigdatagenomics/adam/issues/1874">#1874</a></li>
<li>Add coverage command for reporting read coverage <a href="https://github.com/bigdatagenomics/adam/issues/1873">#1873</a></li>
<li>Only python 2? <a href="https://github.com/bigdatagenomics/adam/issues/1871">#1871</a></li>
<li>Support VariantContextRDD from SQL <a href="https://github.com/bigdatagenomics/adam/issues/1867">#1867</a></li>
<li>Cannot find <code>find-adam-assembly.sh</code> in bioconda build <a href="https://github.com/bigdatagenomics/adam/issues/1862">#1862</a></li>
<li><code>_jvm.java.lang.Class.forName</code> does not work for certain configurations <a href="https://github.com/bigdatagenomics/adam/issues/1858">#1858</a></li>
<li>Formatting error in CHANGES.md <a href="https://github.com/bigdatagenomics/adam/issues/1857">#1857</a></li>
<li>Various improvements to readthedocs documentation <a href="https://github.com/bigdatagenomics/adam/issues/1853">#1853</a></li>
<li>add filterByOverlappingRegion(query: ReferenceRegion) to R and python APIs <a href="https://github.com/bigdatagenomics/adam/issues/1852">#1852</a></li>
<li>Support adding VCF header lines from Python <a href="https://github.com/bigdatagenomics/adam/issues/1840">#1840</a></li>
<li>Support loadIndexedBam from Python <a href="https://github.com/bigdatagenomics/adam/issues/1836">#1836</a></li>
<li>Add link to awesome list of applications that extend ADAM <a href="https://github.com/bigdatagenomics/adam/issues/1832">#1832</a></li>
<li>loadIndexed bam lazily throws Exception if index does not exist <a href="https://github.com/bigdatagenomics/adam/issues/1830">#1830</a></li>
<li>OAuth credentials for Github in Coveralls configuration are no longer valid <a href="https://github.com/bigdatagenomics/adam/issues/1829">#1829</a></li>
<li>base counts per position <a href="https://github.com/bigdatagenomics/adam/issues/1825">#1825</a></li>
<li>Issues loading BAM files in Google FS <a href="https://github.com/bigdatagenomics/adam/issues/1816">#1816</a></li>
<li>Error when writing a vcf file to Parquet <a href="https://github.com/bigdatagenomics/adam/issues/1810">#1810</a></li>
<li>transformAlignments cannot repartition files <a href="https://github.com/bigdatagenomics/adam/issues/1808">#1808</a></li>
<li>GenotypeRDD should support <code>toVariants</code> method <a href="https://github.com/bigdatagenomics/adam/issues/1806">#1806</a></li>
<li>Add support for python and R in Homebrew formula <a href="https://github.com/bigdatagenomics/adam/issues/1796">#1796</a></li>
<li>Add <code>transformVariantContexts</code> or similar to cli <a href="https://github.com/bigdatagenomics/adam/issues/1793">#1793</a></li>
<li>Issue while using Sorting option <a href="https://github.com/bigdatagenomics/adam/issues/1791">#1791</a></li>
<li>Issue with adam2vcf <a href="https://github.com/bigdatagenomics/adam/issues/1787">#1787</a></li>
<li>Remove explicit <code>&lt;compile&gt;</code> scopes from submodule POMs <a href="https://github.com/bigdatagenomics/adam/issues/1786">#1786</a></li>
<li>java.nio.file.ProviderNotFoundException (Provider “s3” not found) <a href="https://github.com/bigdatagenomics/adam/issues/1732">#1732</a></li>
<li>Accessing GenomicRDD join functions in python <a href="https://github.com/bigdatagenomics/adam/issues/1728">#1728</a></li>
<li>ArrayIndexOutOfBoundsException in PhredUtils$.phredToSuccessProbability <a href="https://github.com/bigdatagenomics/adam/issues/1714">#1714</a></li>
<li>Add ability to specify region bounds to pipe command <a href="https://github.com/bigdatagenomics/adam/issues/1707">#1707</a></li>
<li>Unable to run pyadam, SQLException: Failed to start database ‘metastore_db’ <a href="https://github.com/bigdatagenomics/adam/issues/1666">#1666</a></li>
<li>SAMFormatException: Unrecognized tag type: ^@ <a href="https://github.com/bigdatagenomics/adam/issues/1657">#1657</a></li>
<li>IndexOutOfBoundsException in BAMInputFormat.getSplits <a href="https://github.com/bigdatagenomics/adam/issues/1656">#1656</a></li>
<li>overlaps considers that Strand.FORWARD cannot overlap with Strand.INDEPENDENT <a href="https://github.com/bigdatagenomics/adam/issues/1650">#1650</a></li>
<li>migration converters <a href="https://github.com/bigdatagenomics/adam/issues/1629">#1629</a></li>
<li>RFC: Removing Spark 1.x, Scala 2.10 support in 0.24.0 release <a href="https://github.com/bigdatagenomics/adam/issues/1597">#1597</a></li>
<li>Eliminate unused ConcreteADAMRDDFunctions class <a href="https://github.com/bigdatagenomics/adam/issues/1580">#1580</a></li>
<li>Add set theory/statistics packages to ADAM <a href="https://github.com/bigdatagenomics/adam/issues/1533">#1533</a></li>
<li>Evaluate Apache Carbondata INDEXED column store file format for genomics <a href="https://github.com/bigdatagenomics/adam/issues/1527">#1527</a></li>
<li>Stranded vs unstranded in getReferenceRegions() for features <a href="https://github.com/bigdatagenomics/adam/issues/1513">#1513</a></li>
<li>Question: How to transform a line of SAM to AlignmentRecord? <a href="https://github.com/bigdatagenomics/adam/issues/1425">#1425</a></li>
<li>Excessive compilation warnings about multiple scala libraries <a href="https://github.com/bigdatagenomics/adam/issues/695">#695</a></li>
<li>Support Hive-style partitioning <a href="https://github.com/bigdatagenomics/adam/issues/651">#651</a></li>
</ul>
<p><strong>Merged and closed pull requests:</strong></p>
<ul>
<li>[ADAM-1964] Lower point where phred conversions are done using log code. <a href="https://github.com/bigdatagenomics/adam/pull/1965">#1965</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Add utility methods for adam-shell. <a href="https://github.com/bigdatagenomics/adam/pull/1958">#1958</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1955] Add support for downsampling during recalibration table generation <a href="https://github.com/bigdatagenomics/adam/pull/1963">#1963</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1957] Don’t initialize missing likelihoods to MinValue. <a href="https://github.com/bigdatagenomics/adam/pull/1961">#1961</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1959] Support VCF rows at position 0. <a href="https://github.com/bigdatagenomics/adam/pull/1960">#1960</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets <a href="https://github.com/bigdatagenomics/adam/pull/1948">#1948</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1914] Python profile needs to be specified for egg to be in distribution. <a href="https://github.com/bigdatagenomics/adam/pull/1946">#1946</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1917] Delete dependency on fulltoc. <a href="https://github.com/bigdatagenomics/adam/pull/1944">#1944</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1917] Try 3: fix Sphinx fulltoc. <a href="https://github.com/bigdatagenomics/adam/pull/1943">#1943</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1917] Set Sphinx version in requirements.txt. <a href="https://github.com/bigdatagenomics/adam/pull/1942">#1942</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1917] Set minimal Sphinx version for Readthedocs build. <a href="https://github.com/bigdatagenomics/adam/pull/1941">#1941</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1939] Allow validation stringency to waive off FLAG arrays. <a href="https://github.com/bigdatagenomics/adam/pull/1940">#1940</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1915] Bump to Hadoop-BAM 7.9.2. <a href="https://github.com/bigdatagenomics/adam/pull/1938">#1938</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1934] Clip phred values to 3233, instead of Int.MaxValue. <a href="https://github.com/bigdatagenomics/adam/pull/1936">#1936</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Ignore VCF INFO fields with number=G when stringency=LENIENT <a href="https://github.com/bigdatagenomics/adam/pull/1935">#1935</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>[ADAM-1931] Bump to Spark 2.3.0. <a href="https://github.com/bigdatagenomics/adam/pull/1933">#1933</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1840] Support adding VCF header lines from Python. <a href="https://github.com/bigdatagenomics/adam/pull/1930">#1930</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1927] Increase visibility for util.FileExtensions for use downstream. <a href="https://github.com/bigdatagenomics/adam/pull/1929">#1929</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1925] Reduce logging level for ADAMKryoRegistrator. <a href="https://github.com/bigdatagenomics/adam/pull/1928">#1928</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1923] Revert 1eed8e8 <a href="https://github.com/bigdatagenomics/adam/pull/1926">#1926</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Use SparkFiles.getRootDirectory in local mode. <a href="https://github.com/bigdatagenomics/adam/pull/1924">#1924</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-651] Implement Hive-style partitioning by genomic range of Parquet backed datasets <a href="https://github.com/bigdatagenomics/adam/pull/1922">#1922</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>Make Spark SQL APIs supported across all types <a href="https://github.com/bigdatagenomics/adam/pull/1921">#1921</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1909] Refactor pipe cmd parameter from String to Seq[String]. <a href="https://github.com/bigdatagenomics/adam/pull/1920">#1920</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Add Google Cloud documentation <a href="https://github.com/bigdatagenomics/adam/pull/1918">#1918</a> (<a href="https://github.com/Georgehe4">Georgehe4</a>)</li>
<li>[ADAM-1917] Load sphinxcontrib.fulltoc with imp.load_sources. <a href="https://github.com/bigdatagenomics/adam/pull/1916">#1916</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1912] Add asSingleFile, disableFastConcat to adam2fasta/q. <a href="https://github.com/bigdatagenomics/adam/pull/1913">#1913</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-651] Hive-style partitioning of parquet files by genomic position <a href="https://github.com/bigdatagenomics/adam/pull/1911">#1911</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>Minor unit test/style fixes. <a href="https://github.com/bigdatagenomics/adam/pull/1910">#1910</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1907] Bump to HTSJDK 2.13.2. <a href="https://github.com/bigdatagenomics/adam/pull/1908">#1908</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1882] Don’t abort adam-submit if ADAM_HOME is set. <a href="https://github.com/bigdatagenomics/adam/pull/1905">#1905</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1806] Add toVariants conversion from GenotypeRDD. <a href="https://github.com/bigdatagenomics/adam/pull/1904">#1904</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1882] Return true if ADAM_HOME is set, not exit 0. <a href="https://github.com/bigdatagenomics/adam/pull/1903">#1903</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1900] Close stream after reading VCF header. <a href="https://github.com/bigdatagenomics/adam/pull/1901">#1901</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1898] Support converting INFO fields set to empty (‘.’). <a href="https://github.com/bigdatagenomics/adam/pull/1899">#1899</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Add Kryo registration for two classes required for Spark 2.3.0. <a href="https://github.com/bigdatagenomics/adam/pull/1897">#1897</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>[ADAM-1853] Various improvements to readthedocs documentation. <a href="https://github.com/bigdatagenomics/adam/pull/1893">#1893</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1889][ADAM-1884] updated ReferenceRegion in python <a href="https://github.com/bigdatagenomics/adam/pull/1892">#1892</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1889] Run R/Python tests. <a href="https://github.com/bigdatagenomics/adam/pull/1890">#1890</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1886] fix for pyadam to recognize >1 egg file <a href="https://github.com/bigdatagenomics/adam/pull/1887">#1887</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1883] Python and R caching <a href="https://github.com/bigdatagenomics/adam/pull/1885">#1885</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1875] Add ability to timeout a piped command. <a href="https://github.com/bigdatagenomics/adam/pull/1881">#1881</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1871] Fix print call that broke python 3 support. <a href="https://github.com/bigdatagenomics/adam/pull/1880">#1880</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1832] Use awesome list style and link to bigdatagenomics/awesome-adam. <a href="https://github.com/bigdatagenomics/adam/pull/1879">#1879</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-651] Hive-style partitioning of parquet files by genomic position <a href="https://github.com/bigdatagenomics/adam/pull/1878">#1878</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>[ADAM-1874] Dedupe samples when loading VCFs. <a href="https://github.com/bigdatagenomics/adam/pull/1876">#1876</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Fixes Coverage python API and adds tests <a href="https://github.com/bigdatagenomics/adam/pull/1870">#1870</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>added filterByOverlappingRegion for python <a href="https://github.com/bigdatagenomics/adam/pull/1869">#1869</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>Add command line option for populating nested variant.annotation field in Genotype records. <a href="https://github.com/bigdatagenomics/adam/pull/1865">#1865</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Hive partitioned(v4) rebased <a href="https://github.com/bigdatagenomics/adam/pull/1864">#1864</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>[ADAM-1597] Move to Scala 2.11 and Spark 2.x. <a href="https://github.com/bigdatagenomics/adam/pull/1861">#1861</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1857] Fix formatting error due to forward slashes. <a href="https://github.com/bigdatagenomics/adam/pull/1860">#1860</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[ADAM-1858] Use getattr instead of Class.forName from python API. <a href="https://github.com/bigdatagenomics/adam/pull/1859">#1859</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>[ADAM-1836] Adds loadIndexedBam API to Python and Java. <a href="https://github.com/bigdatagenomics/adam/pull/1837">#1837</a> (<a href="https://github.com/fnothaft">fnothaft</a>)</li>
<li>Added check for bam index files in loadIndexedBam <a href="https://github.com/bigdatagenomics/adam/pull/1831">#1831</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1793] Adding vcf2adam and adam2vcf that handle separate variant and genotype data. <a href="https://github.com/bigdatagenomics/adam/pull/1794">#1794</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>added adam notebook <a href="https://github.com/bigdatagenomics/adam/pull/1778">#1778</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>[ADAM-1666] SQLContext creation fix for Spark 2.x <a href="https://github.com/bigdatagenomics/adam/pull/1777">#1777</a> (<a href="https://github.com/akmorrow13">akmorrow13</a>)</li>
<li>Add optional accumulator for VCF header lines to VCFOutFormatter. <a href="https://github.com/bigdatagenomics/adam/pull/1727">#1727</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>add hive style partitioning for contigName <a href="https://github.com/bigdatagenomics/adam/pull/1620">#1620</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>Add loadReadsFromSamString function into ADAMContext <a href="https://github.com/bigdatagenomics/adam/pull/1434">#1434</a> (<a href="https://github.com/xubo245">xubo245</a>)</li>
</ul>
<h3>Cannoli Version 0.2.0</h3>
<p><strong>Closed issues:</strong></p>
<ul>
<li>Update ADAM dependency version to 0.24.0. <a href="https://github.com/bigdatagenomics/cannoli/issues/118">#118</a></li>
<li>Javadoc error and warnings <a href="https://github.com/bigdatagenomics/cannoli/issues/115">#115</a></li>
<li>Update pipe method calls due to latest ADAM 0.24.0 snapshot <a href="https://github.com/bigdatagenomics/cannoli/issues/114">#114</a></li>
<li>Split commands with subcommands into separate Cannoli CLI classes <a href="https://github.com/bigdatagenomics/cannoli/issues/110">#110</a></li>
<li>Jenkins build failing due to upstream changes. <a href="https://github.com/bigdatagenomics/cannoli/issues/108">#108</a></li>
<li>Provide functions for use in cannoli-shell or notebooks. <a href="https://github.com/bigdatagenomics/cannoli/issues/104">#104</a></li>
<li>Error running BWA with Docker <a href="https://github.com/bigdatagenomics/cannoli/issues/103">#103</a></li>
<li>Allow use of Singularity instead of Docker <a href="https://github.com/bigdatagenomics/cannoli/issues/98">#98</a></li>
<li>Bump ADAM dependency version to 0.24.0-SNAPSHOT. <a href="https://github.com/bigdatagenomics/cannoli/issues/95">#95</a></li>
<li>Drop support for Scala 2.10 and Spark 1.x. <a href="https://github.com/bigdatagenomics/cannoli/issues/94">#94</a></li>
<li>Tidy up FreeBayes <a href="https://github.com/bigdatagenomics/cannoli/issues/67">#67</a></li>
<li>Support loading reference files from HDFS/other file system <a href="https://github.com/bigdatagenomics/cannoli/issues/50">#50</a></li>
<li>Attributes from freebayes header missing from variants and genotypes <a href="https://github.com/bigdatagenomics/cannoli/issues/43">#43</a></li>
<li>Factor out docker/mapping code <a href="https://github.com/bigdatagenomics/cannoli/issues/34">#34</a></li>
<li>Add wrappers for GMAP and GSNAP aligners <a href="https://github.com/bigdatagenomics/cannoli/issues/29">#29</a></li>
<li>Jenkins failures due to missing publish_scaladoc.sh <a href="https://github.com/bigdatagenomics/cannoli/issues/21">#21</a></li>
</ul>
<p><strong>Merged and closed pull requests:</strong></p>
<ul>
<li>[CANNOLI-118] Update ADAM dependency version to 0.24.0. <a href="https://github.com/bigdatagenomics/cannoli/pull/121">#121</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-110] Split commands with subcommands into separate Cannoli CLI classes. <a href="https://github.com/bigdatagenomics/cannoli/pull/117">#117</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-115] Fix javadoc error and warnings. <a href="https://github.com/bigdatagenomics/cannoli/pull/116">#116</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-108] Command argument to pipe is now Seq[String]. <a href="https://github.com/bigdatagenomics/cannoli/pull/109">#109</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-98] Adding container builder. <a href="https://github.com/bigdatagenomics/cannoli/pull/107">#107</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Allow Singularity to run containers <a href="https://github.com/bigdatagenomics/cannoli/pull/106">#106</a> (<a href="https://github.com/jpdna">jpdna</a>)</li>
<li>[CANNOLI-95] Bump ADAM dependency version to 0.24.0-SNAPSHOT <a href="https://github.com/bigdatagenomics/cannoli/pull/102">#102</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-94] Dropping support for Scala 2.10 and Spark 1.x. <a href="https://github.com/bigdatagenomics/cannoli/pull/101">#101</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-94][CANNOLI-95] Drop support for Scala 2.10 and Spark 1.x. <a href="https://github.com/bigdatagenomics/cannoli/pull/100">#100</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-43] Use accumulator for VCF header lines. <a href="https://github.com/bigdatagenomics/cannoli/pull/72">#72</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-104] Provide functions for use in cannoli-shell or notebooks. <a href="https://github.com/bigdatagenomics/cannoli/pull/69">#69</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>Add CannoliCommand and CannoliAlignerCommand. <a href="https://github.com/bigdatagenomics/cannoli/pull/54">#54</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
<li>[CANNOLI-29] Add minimal GMAP and GSNAP wrappers. <a href="https://github.com/bigdatagenomics/cannoli/pull/32">#32</a> (<a href="https://github.com/heuermh">heuermh</a>)</li>
</ul>
]]></content>
</entry>
<entry>
<title type="html"><![CDATA[ADAM 0.23.0 Released (+ Avocado and DECA releases)]]></title>
<link href="http://bigdatagenomics.github.io/blog/2018/01/04/adam-0-dot-23-dot-0-released-plus-avocado-cannoli-and-deca-releases/"/>
<updated>2018-01-04T09:47:53-08:00</updated>
<id>http://bigdatagenomics.github.io/blog/2018/01/04/adam-0-dot-23-dot-0-released-plus-avocado-cannoli-and-deca-releases</id>
<content type="html"><![CDATA[<p>We are excited to announce the availability of the ADAM 0.23.0 release, along
with releases of the Avocado germline variant caller (release 0.1.0) and the DECA
copy number variant caller (release 0.2.0). These releases contain a large
number of feature additions, performance improvements, and bug fixes, with
over 375 issues closed and pull requests merged or closed since the last ADAM
release.</p>
<p>Some of the highlights include:</p>
<ul>
<li>A validated, high-performance end-to-end alignment/variant calling pipeline
using ADAM, Cannoli, and Avocado.</li>
<li>Support for manipulating data using Spark SQL.</li>
<li>R and Python APIs for ADAM, including the ability to get a working deployment
of ADAM simply by running <code>pip install bdgenomics.adam</code>.</li>
</ul>
<p>With this release, we have also moved our documentation to Read the Docs:</p>
<ul>
<li><a href="http://adam.readthedocs.io/en/latest/">Read the Docs for ADAM</a></li>
<li><a href="http://bdg-avocado.readthedocs.io/en/latest/">Read the Docs for Avocado</a></li>
<li><a href="http://bdg-deca.readthedocs.io/en/latest/">Read the Docs for DECA</a></li>
</ul>
<p>This documentation describes how to deploy our tools on a variety of platforms,
including a local cluster, cloud computing, and through the
<a href="https://github.com/bd2kgenomics/toil">Toil</a> workflow manager. We already have
a <code>pip</code> installable Toil workflow for calling copy number variants with DECA,
which is packaged as part of the
<a href="http://bdg-workflows.readthedocs.io/en/latest/">bdgenomics.workflows</a> library.</p>
<p>This release is the last release of ADAM that supports Spark 1.x and Scala 2.10.
The upcoming release of ADAM will only support Spark 2.x and Scala 2.11. Avocado
and DECA have already dropped support for Spark 1.x.</p>
<p>Over the next few weeks, we will be working on a release of
<a href="https://github.com/bigdatagenomics/cannoli">Cannoli</a>, as well as Toil workflows
for running the ADAM/Avocado/Cannoli variant calling pipeline, and a preprint
describing the pipeline in more depth. We are also working on a release of the
<a href="https://github.com/bigdatagenomics/mango">Mango</a> visualization tool, which uses
ADAM as a backend for interactively visualizing large genomics datasets. Stay
tuned for more info!</p>
<h1>Variant Calling with Cannoli, ADAM, Avocado, and DECA</h1>
<p>With the collection of tools we have released, you can run fast, accurate
variant calling entirely in Apache Spark. While we introduced
Avocado and DECA earlier in this post, we haven’t talked about Cannoli yet.
Cannoli (Italian for “a little pipe”) uses ADAM’s <a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/pipes/">pipe API</a>
to parallelize commonly used genomics tools. Currently, Cannoli supports
aligning reads with Bowtie, Bowtie2, and BWA; calling variants with FreeBayes;
and annotating variant effects with SnpEff. We are working on support for many
more tools, as you can see in our <a href="https://github.com/bigdatagenomics/cannoli/issues">issue tracker</a>.
Please let us know if you are interested in any specific tool or, even
better, in helping us add support for one. ADAM’s pipe API makes
it extremely easy to parallelize an existing single-node genomic analysis tool,
and most tools can be implemented on top of the pipe API in fewer than 10 lines
of code. For example, here’s how you could launch BWA using ADAM’s pipe API in
Python:</p>
<p><img class="center" src="http://bigdatagenomics.github.io/images/pipe.png" width="750"></p>
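<p>To make the underlying pattern concrete, here is a minimal, standalone sketch
of what “piping” a partitioned dataset through a single-node tool means. It does
not use ADAM or Spark: the <code>pipe_partition</code> and <code>pipe</code>
helpers and the <code>TOY_TOOL</code> command are hypothetical names invented for
illustration, and a tiny Python one-liner stands in for a real tool such as an
aligner.</p>

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def pipe_partition(records, command):
    """Stream one partition's records through `command` via stdin/stdout."""
    result = subprocess.run(
        command,
        input="\n".join(records) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def pipe(partitions, command, max_workers=4):
    """Run `command` once per partition, in parallel, keeping partition order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda part: pipe_partition(part, command), partitions))


# Hypothetical stand-in for a single-node tool: upper-cases every input line.
TOY_TOOL = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())",
]

partitions = [["acgt", "ttga"], ["ggca"]]
piped = pipe(partitions, TOY_TOOL)
print(piped)
```

<p>Each partition is formatted as text, written to the tool’s standard input,
and the tool’s standard output is parsed back into records. In a real deployment
the partitions would live on different cluster nodes rather than in one
process.</p>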
<p>By using Cannoli, we can accelerate alignment with BWA to take approximately
10 to 15 minutes when running on a 1,024-core cluster.</p>
<p>We can couple this rapid alignment pipeline with the fast preprocessing stages in
ADAM and the variant calling stages in Avocado to call variants on a 60x coverage WGS
dataset in approximately 45 minutes on a 1,024-core cluster. Avocado can be used to
call variants on a single sample, or to jointly call variants using a <a href="http://bdg-avocado.readthedocs.io/en/latest/workflows/joint.html">gVCF-based
workflow</a>. When
running on 1,024 cores, we were able to jointly genotype more than 10TB of gVCFs
in approximately 6 hours. Avocado has >99% accuracy when genotyping SNPs,
and >96% accuracy when genotyping INDELs. Detailed benchmarking results can be
found in <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-204.pdf">Chapter 8 of this thesis</a>.
Avocado is twice as fast as GATK4’s Spark-based implementation of the
HaplotypeCaller, although this is an unfair
comparison, as the HaplotypeCaller performs local reassembly, while Avocado does
not.</p>
<p>One interesting comparison is between the duplicate marking and BQSR tools in
ADAM and in the GATK4. In both cases, ADAM’s implementation is faster than the
GATK4’s equivalent implementation.</p>
<p><img class="center" src="http://bigdatagenomics.github.io/images/speedup-md.png"></p>
<p><img class="center" src="http://bigdatagenomics.github.io/images/speedup-bqsr.png"></p>
<p>We have a Spark SQL-based implementation of duplicate marking in progress,
which will provide an additional >20% performance improvement. We hope to
introduce this new duplicate marker in the 0.24.0 release of ADAM.</p>
<h1>Manipulating Data using Spark SQL</h1>
<p>Since Apache Spark 1.6, there has been a major push in the Spark project to
rearchitect Spark around the Catalyst query optimizer and the Tungsten code
execution engine. These two engines are hidden behind Spark SQL’s DataFrame
and Dataset APIs, which provide a SQL-like interface for manipulating data
using Spark. Unlike Spark’s Resilient Distributed Dataset (RDD) API, the
DataFrame API allows the Catalyst query optimizer to examine the function that
the user is running. Catalyst can then rewrite the query so that it runs in a
more efficient manner, and can implement the query using the Tungsten engine
with performance that approaches native performance. This can provide
order-of-magnitude performance improvements for some queries, and it also
provides users with uniform query performance across Scala, Java, SQL, Python,
and R.</p>
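<p>The difference between an opaque function and an inspectable query can be
sketched with a toy example. This is not Catalyst and does not use Spark: the
<code>Filter</code> and <code>Select</code> plan steps and the
<code>optimize</code> and <code>execute</code> functions are hypothetical names
invented for illustration. Because the plan is plain data rather than an
arbitrary function, an optimizer can look inside it and reorder steps without
changing the result.</p>

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

Row = Dict[str, Any]


@dataclass(frozen=True)
class Filter:
    """Keep rows where `column` equals `value`."""
    column: str
    value: Any


@dataclass(frozen=True)
class Select:
    """Keep only the named columns."""
    columns: Tuple[str, ...]


Step = Union[Filter, Select]


def optimize(plan: List[Step]) -> List[Step]:
    """Hypothetical rewrite rule: run a Filter before the Select that precedes
    it whenever the Select keeps the filtered column, so fewer rows are
    projected."""
    steps = list(plan)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(steps) - 1):
            first, second = steps[i], steps[i + 1]
            if (isinstance(first, Select) and isinstance(second, Filter)
                    and second.column in first.columns):
                steps[i], steps[i + 1] = second, first
                swapped = True
    return steps


def execute(plan: List[Step], rows: List[Row]) -> List[Row]:
    """Interpret the plan step by step over in-memory rows."""
    for step in plan:
        if isinstance(step, Filter):
            rows = [r for r in rows if r[step.column] == step.value]
        else:
            rows = [{c: r[c] for c in step.columns} for r in rows]
    return rows


rows = [
    {"readName": "r1", "readMapped": True, "mapq": 60},
    {"readName": "r2", "readMapped": False, "mapq": 0},
]
plan = [Select(("readName", "readMapped")), Filter("readMapped", True)]
optimized = optimize(plan)

# Both plans return the same rows; the optimized one filters first.
print(optimized)
print(execute(optimized, rows) == execute(plan, rows))
```

<p>Had the same logic been written as an arbitrary function applied to each
record, as in the RDD API, the optimizer would have nothing to inspect and could
only run the function as given.</p>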
<p>Although Spark SQL’s DataFrame API was introduced in 2015, we were not able to take advantage
of it in ADAM until recently. While ADAM has always described genomics
data using a set of schemas, the library we used to represent these schemas
(<a href="https://avro.apache.org">Apache Avro</a>) was not compatible with Spark SQL. To
resolve this, we updated our core <a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/genomicRdd/"><code>GenomicRDD</code> interfaces</a>
to transparently convert between Spark’s RDD and DataFrame/Dataset APIs. We
describe the architecture we use for converting between these two representations
<a href="http://adam.readthedocs.io/en/adam-parent_2.11-0.23.0/api/genomicRdd/#transforming-genomicrdds-via-spark-sql">here</a>.
With the Spark SQL query interfaces built into <code>GenomicRDD</code>s, you can begin
running SQL queries on genomic data in fewer than 5 lines of code:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>$ adam-shell
</span><span class='line'>
</span><span class='line'>Welcome to
</span><span class='line'>      ____              __
</span><span class='line'>     / __/__  ___ _____/ /__
</span><span class='line'>    _\ \/ _ \/ _ `/ __/  '_/
</span><span class='line'>   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
</span><span class='line'>      /_/
</span><span class='line'>
</span><span class='line'>Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
</span><span class='line'>
</span><span class='line'>scala> import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>
</span><span class='line'>scala> val reads = sc.loadAlignments("adam-core/src/test/resources/small.sam")
</span><span class='line'>reads: org.bdgenomics.adam.rdd.read.AlignmentRecordRDD = RDDBoundAlignmentRecordRDD with 2 reference sequences, 0 read groups, and 2 processing steps
</span><span class='line'>
</span><span class='line'>scala> reads.transformDataset(_.filter("readMapped=true")).dataset.show
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+
</span><span class='line'>|readInFragment|contigName| start|oldPosition| end|mapq| readName| sequence|qual|cigar|oldCigar|basesTrimmedFromStart|basesTrimmedFromEnd|readPaired|properPair|readMapped|mateMapped|failedVendorQualityChecks|duplicateRead|readNegativeStrand|mateNegativeStrand|primaryAlignment|secondaryAlignment|supplementaryAlignment|mismatchingPositions|origQual| attributes|recordGroupName|recordGroupSample|mateAlignmentStart|mateContigName|inferredInsertSize|
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+
</span><span class='line'>| 0| 1| 26472783| null| 26472858| 60|simread:1:2647278...|GTATAAGAGCAGCCTTA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|240997787| null|240997862| 60|simread:1:2409977...|CTTTATTTTTATTTTTA...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:39 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|189606653| null|189606728| 60|simread:1:1896066...|TGTATCTTCCTCCCCTG...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|207027738| null|207027813| 60|simread:1:2070277...|TTTAATAAATGTTGATT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 14397233| null| 14397308| 60|simread:1:1439723...|TAAAATGCCCCCATCTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|240344442| null|240344517| 24|simread:1:2403444...|TACAGGCACCCACCATC...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:61 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|153978724| null|153978799| 60|simread:1:1539787...|GCTCACTGCAGCCTCAA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|237728409| null|237728484| 28|simread:1:2377284...|TTTCTTTTTCTTTCTTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:59 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|231911906| null|231911981| 60|simread:1:2319119...|TCATGTAGCATGCATAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 50683371| null| 50683446| 60|simread:1:5068337...|GCTCAGGCCTTGCAAGA...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 37577445| null| 37577520| 60|simread:1:3757744...|CCTAGAGAAGCTCCCAC...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|195211965| null|195212040| 60|simread:1:1952119...|AAATAAAGTTTGGCTTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|163841413| null|163841488| 60|simread:1:1638414...|TGTGTAACTAACATAAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|101556378| null|101556453| 60|simread:1:1015563...|TTTATTTTTTGAGCATG...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 20101800| null| 20101875| 35|simread:1:2010180...|CTCAGGTGATCCACCCG...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:55 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>| 0| 1|186794283| null|186794358| 60|simread:1:1867942...|GACAAGATAGTACTTGA...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|165341382| null|165341457| 60|simread:1:1653413...|CTACTCTCATTGACTGT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 5469106| null| 5469181| 60|simread:1:5469106...|CTCATTCTCTCTCCTGC...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1| 89554252| null| 89554327| 60|simread:1:8955425...|AAATTAAACAGCTCGTT...|null| 75M| null| 0| 0| false| false| true| false| false| false| true| false| true| false| false| null| null|XS:i:0 AS:i:75 NM...| null| null| null| null| null|
</span><span class='line'>| 0| 1|169801933| null|169802008| 40|simread:1:1698019...|AGACTGGGTCTCACTAT...|null| 75M| null| 0| 0| false| false| true| false| false| false| false| false| true| false| false| null| null|XS:i:52 AS:i:75 N...| null| null| null| null| null|
</span><span class='line'>+--------------+----------+---------+-----------+---------+----+--------------------+--------------------+----+-----+--------+---------------------+-------------------+----------+----------+----------+----------+-------------------------+-------------+------------------+------------------+----------------+------------------+----------------------+--------------------+--------+--------------------+---------------+-----------------+------------------+--------------+------------------+</span></code></pre></td></tr></table></div></figure>
<p>While Spark SQL has specific optimizations for loading data from Apache Parquet
files, ADAM can be used to run Spark SQL queries against data stored in most
common genomics file formats, including SAM/BAM/CRAM, FASTQ, VCF/BCF, BED,
GTF/GFF3, IntervalList, NarrowPeak, FASTA and more.</p>
<h1>Using ADAM through Python and R</h1>
<p>As mentioned above, one of the major advantages of Spark SQL is that it provides
uniform query performance across Scala, Java, Python, and R. While ADAM is
mostly written in Scala, we have long maintained Java APIs. Until now, however,
we were unable to support Python or R APIs. Adding support
for Spark SQL removed the major obstacles that prevented us from adding Python
and R APIs. This release of ADAM introduces the <code>bdgenomics.adam</code> packages for
Python and R. Our Python API can be installed using <code>pip install
bdgenomics.adam</code>, and our R API is available from
<a href="https://github.com/bigdatagenomics/adam/releases/download/adam-parent-spark2_2.11-0.23.0/bdgenomics.adam_0.23.0.tar.gz">GitHub</a>.
We hope to make our R API available through CRAN in the 0.24.0 release of ADAM;
we are blocked on an issue upstream in Apache Spark and are tracking progress on
this issue at <a href="https://github.com/bigdatagenomics/adam/issues/1851">ADAM-1851</a>.</p>
<p>Besides the <code>bdgenomics.adam</code> library, running <code>pip install
bdgenomics.adam</code> also installs all of the ADAM command-line tools:</p>
<figure class='code'><div class="highlight"><table><tr><td class='code'><pre><code class=''><span class='line'>$ pip install bdgenomics.adam
</span><span class='line'>...
</span><span class='line'>Successfully installed bdgenomics.adam-0.23.0 py4j-0.10.4 pyspark-2.2.1
</span><span class='line'>
</span><span class='line'>$ adam-submit
</span><span class='line'>
</span><span class='line'>       e         888~-_         e            e    e
</span><span class='line'>      d8b        888   \       d8b          d8b  d8b
</span><span class='line'>     /Y88b       888    |     /Y88b        d888bdY88b
</span><span class='line'>    /  Y88b      888    |    /  Y88b      / Y88Y Y888b
</span><span class='line'>   /____Y88b     888   /    /____Y88b    /   YY   Y888b
</span><span class='line'>  /      Y88b    888_-~    /      Y88b  /          Y888b
</span><span class='line'>
</span><span class='line'>Usage: adam-submit [&lt;spark-args&gt; --] &lt;adam-args&gt;
</span><span class='line'>
</span><span class='line'>Choose one of the following commands:
</span><span class='line'>
</span><span class='line'>ADAM ACTIONS
</span><span class='line'> countKmers : Counts the k-mers/q-mers from a read dataset.
</span><span class='line'> countContigKmers : Counts the k-mers/q-mers from a read dataset.
</span><span class='line'> transformAlignments : Convert SAM/BAM to ADAM format and optionally perform read pre-processing transformations
</span><span class='line'> transformFeatures : Convert a file with sequence features into corresponding ADAM format and vice versa
</span><span class='line'> transformGenotypes : Convert a file with genotypes into corresponding ADAM format and vice versa
</span><span class='line'> transformVariants : Convert a file with variants into corresponding ADAM format and vice versa
</span><span class='line'> mergeShards : Merges the shards of a file
</span><span class='line'> reads2coverage : Calculate the coverage from a given ADAM file
</span><span class='line'>
</span><span class='line'>CONVERSION OPERATIONS
</span><span class='line'> fasta2adam : Converts a text FASTA sequence file into an ADAMNucleotideContig Parquet file which represents assembled sequences.
</span><span class='line'> adam2fasta : Convert ADAM nucleotide contig fragments to FASTA files
</span><span class='line'> adam2fastq : Convert BAM to FASTQ files
</span><span class='line'> transformFragments : Convert alignment records into fragment records.
</span><span class='line'>
</span><span class='line'>PRINT
</span><span class='line'> print : Print an ADAM formatted file
</span><span class='line'> flagstat : Print statistics on reads in an ADAM file (similar to samtools flagstat)
</span><span class='line'> view : View certain reads from an alignment-record file.
</span><span class='line'>
</span><span class='line'>
</span><span class='line'>$ adam-shell
</span><span class='line'>
</span><span class='line'>Welcome to
</span><span class='line'>      ____              __
</span><span class='line'>     / __/__  ___ _____/ /__
</span><span class='line'>    _\ \/ _ \/ _ `/ __/  '_/
</span><span class='line'>   /___/ .__/\_,_/_/ /_/\_\   version 2.2.1
</span><span class='line'>      /_/
</span><span class='line'>
</span><span class='line'>Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_152)
</span><span class='line'>
</span><span class='line'>scala> import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>import org.bdgenomics.adam.rdd.ADAMContext._
</span><span class='line'>
</span><span class='line'>scala> :quit</span></code></pre></td></tr></table></div></figure>
<p>Most of the major APIs in ADAM can be used through our Python and R bindings,
with the exception of the region join API. We plan to enable the use of the
region join API in Python and R in the 0.24.0 release of ADAM, along with other
API compatibility improvements.</p>
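<p>As a rough illustration of the bindings, the Scala filter shown earlier might
be written in Python along the following lines. This is a sketch under stated
assumptions, not a reference for the API: it assumes that the Python
<code>ADAMContext</code> lives in <code>bdgenomics.adam.adamContext</code> and is constructed
from a <code>SparkSession</code>, that <code>loadAlignments</code> returns an object exposing
<code>transform</code> and <code>toDF</code>, and that <code>reads.sam</code> is a hypothetical input
path. The helper names <code>keep_mapped_reads</code> and <code>show_mapped_reads</code> are
made up for this example.</p>

```python
def keep_mapped_reads(reads):
    """Keep only mapped reads, mirroring the Scala call
    reads.transformDataset(_.filter("readMapped=true")).

    Assumption: `reads` exposes transform(fn), where fn takes a Spark SQL
    DataFrame and returns a DataFrame.
    """
    return reads.transform(lambda df: df.filter("readMapped=true"))


def show_mapped_reads(path="reads.sam"):
    """Load alignments from `path` (hypothetical file) and print the mapped ones.

    Assumption: the module path, the constructor argument, and the method
    names below match the installed bdgenomics.adam release; check its
    documentation before relying on them.
    """
    # Imported lazily so keep_mapped_reads can be used without Spark installed.
    from pyspark.sql import SparkSession
    from bdgenomics.adam.adamContext import ADAMContext

    spark = SparkSession.builder.appName("adam-python-sketch").getOrCreate()
    ac = ADAMContext(spark)  # assumed to accept a SparkSession
    reads = ac.loadAlignments(path)
    keep_mapped_reads(reads).toDF().show()
```

<p>If these assumptions hold, calling <code>show_mapped_reads()</code> in an environment
where both <code>pyspark</code> and <code>bdgenomics.adam</code> are installed should print a table
with the same columns as the Scala output above.</p>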
<h1>Changes since Previous Release</h1>
<p>The full list of changes since version 0.22.0 is below.</p>
<!-- more -->
<p><strong>Closed issues:</strong></p>
<ul>
<li>Readthedocs build error <a href="https://github.com/bigdatagenomics/adam/issues/1854">#1854</a></li>
<li>Add pip release to release scripts <a href="https://github.com/bigdatagenomics/adam/issues/1847">#1847</a></li>
<li>Publish scaladoc script still attempts to build markdown docs <a href="https://github.com/bigdatagenomics/adam/issues/1845">#1845</a></li>
<li>Allow variant annotations to be loaded into genotypes <a href="https://github.com/bigdatagenomics/adam/issues/1838">#1838</a></li>
<li>Specify correct extensions for SAM/BAM output <a href="https://github.com/bigdatagenomics/adam/issues/1834">#1834</a></li>
<li>Fix link anchors and other issues in readthedocs <a href="https://github.com/bigdatagenomics/adam/issues/1822">#1822</a></li>
<li>Sphinx fulltoc is not included <a href="https://github.com/bigdatagenomics/adam/issues/1821">#1821</a></li>
<li>Readme link to bigdatagenomics/lime 404s <a href="https://github.com/bigdatagenomics/adam/issues/1819">#1819</a></li>
<li>Bump to Hadoop-BAM 7.9.1 <a href="https://github.com/bigdatagenomics/adam/issues/1817">#1817</a></li>
<li>LoadVariants Header Format <a href="https://github.com/bigdatagenomics/adam/issues/1815">#1815</a></li>
<li>Right and Left Outer Shuffle Region Join don’t match <a href="https://github.com/bigdatagenomics/adam/issues/1813">#1813</a></li>
<li>Pipe command can fail with empty partitions <a href="https://github.com/bigdatagenomics/adam/issues/1807">#1807</a></li>
<li>adam files with outdated formats throw FileNotFoundException <a href="https://github.com/bigdatagenomics/adam/issues/1804">#1804</a></li>
<li>Move GenomicRDD.writeTextRDD outside of GenomicRDD <a href="https://github.com/bigdatagenomics/adam/issues/1803">#1803</a></li>
<li>find-adam-assembly fails to recognize more than 1 jar <a href="https://github.com/bigdatagenomics/adam/issues/1801">#1801</a></li>
<li>tests/testthat.R failed on git head <a href="https://github.com/bigdatagenomics/adam/issues/1799">#1799</a></li>
<li>Run python and R tests conditionally in build <a href="https://github.com/bigdatagenomics/adam/issues/1795">#1795</a></li>
<li>scala-lang should be a provided dependency <a href="https://github.com/bigdatagenomics/adam/issues/1789">#1789</a></li>
<li>loadIndexedBam does an unnecessary union <a href="https://github.com/bigdatagenomics/adam/issues/1784">#1784</a></li>
<li>Release bdgenomics.adam R package on CRAN <a href="https://github.com/bigdatagenomics/adam/issues/1783">#1783</a></li>
<li>Issue with transformVariant // Adam to vcf <a href="https://github.com/bigdatagenomics/adam/issues/1782">#1782</a></li>
<li>Add code of conduct <a href="https://github.com/bigdatagenomics/adam/issues/1779">#1779</a></li>
<li>Reinstantiation of SQLContext in pyadam ADAMContext <a href="https://github.com/bigdatagenomics/adam/issues/1774">#1774</a></li>
<li>Genotypes should only contain the core variant fields <a href="https://github.com/bigdatagenomics/adam/issues/1770">#1770</a></li>
<li>Add SingleFASTQInFormatter <a href="https://github.com/bigdatagenomics/adam/issues/1768">#1768</a></li>
<li>INDEL realigner can emit negative partition IDs <a href="https://github.com/bigdatagenomics/adam/issues/1763">#1763</a></li>
<li>Request for a new release <a href="https://github.com/bigdatagenomics/adam/issues/1762">#1762</a></li>
<li>INDEL realigner generates targets for reads with more than 1 INDEL <a href="https://github.com/bigdatagenomics/adam/issues/1753">#1753</a></li>
<li>Fragment Issue <a href="https://github.com/bigdatagenomics/adam/issues/1752">#1752</a></li>
<li>Variant Caller!!! <a href="https://github.com/bigdatagenomics/adam/issues/1751">#1751</a></li>
<li>Spark Version!! <a href="https://github.com/bigdatagenomics/adam/issues/1750">#1750</a></li>
<li>ReferenceRegion.subtract eliminating valid regions <a href="https://github.com/bigdatagenomics/adam/issues/1747">#1747</a></li>
<li>New Shuffle Join Implementation – Left Outer + Group By Left <a href="https://github.com/bigdatagenomics/adam/issues/1745">#1745</a></li>
<li>command failure after build success <a href="https://github.com/bigdatagenomics/adam/issues/1744">#1744</a></li>
<li>Recalibrate_base_Qualities <a href="https://github.com/bigdatagenomics/adam/issues/1743">#1743</a></li>
<li>Standardize regionFn for ShuffleJoin returned objects <a href="https://github.com/bigdatagenomics/adam/issues/1740">#1740</a></li>
<li>Shuffle, Broadcast Joins with threshold <a href="https://github.com/bigdatagenomics/adam/issues/1739">#1739</a></li>
<li>Adam on Spark 2.1 <a href="https://github.com/bigdatagenomics/adam/issues/1738">#1738</a></li>
<li>Opening up permission on GenericGenomicRDD constructor <a href="https://github.com/bigdatagenomics/adam/issues/1735">#1735</a></li>
<li>Consistency on ShuffleRegionJoin returns <a href="https://github.com/bigdatagenomics/adam/issues/1734">#1734</a></li>
<li>vcf2adam support <a href="https://github.com/bigdatagenomics/adam/issues/1731">#1731</a></li>
<li>Cloud-scale BWA MEM <a href="https://github.com/bigdatagenomics/adam/issues/1730">#1730</a></li>
<li>Aligned Human Genome couldn’t convert to Adam <a href="https://github.com/bigdatagenomics/adam/issues/1729">#1729</a></li>
<li>Mark Duplicates <a href="https://github.com/bigdatagenomics/adam/issues/1726">#1726</a></li>
<li>Genomics Pipeline <a href="https://github.com/bigdatagenomics/adam/issues/1724">#1724</a></li>
<li>.fastq Alignment <a href="https://github.com/bigdatagenomics/adam/issues/1723">#1723</a></li>
<li>Is it correct Adam file <a href="https://github.com/bigdatagenomics/adam/issues/1720">#1720</a></li>
<li>.fastQ to .adam <a href="https://github.com/bigdatagenomics/adam/issues/1718">#1718</a></li>
<li>Unable to create .adam from .sam <a href="https://github.com/bigdatagenomics/adam/issues/1717">#1717</a></li>
<li>Add adam- prefix to distribution module name <a href="https://github.com/bigdatagenomics/adam/issues/1716">#1716</a></li>
<li>Python load methods don’t have ability to specify validation stringency <a href="https://github.com/bigdatagenomics/adam/issues/1715">#1715</a></li>
<li>NPE when trying to map <em>loadVariants</em> over RDD <a href="https://github.com/bigdatagenomics/adam/issues/1713">#1713</a></li>
<li>Add left normalization of INDELs as an RDD level primitive <a href="https://github.com/bigdatagenomics/adam/issues/1709">#1709</a></li>
<li>Allow validation stringency to be set in AnySAMOutFormatter <a href="https://github.com/bigdatagenomics/adam/issues/1703">#1703</a></li>
<li>InterleavedFastqInFormatter should sort by readInFragment <a href="https://github.com/bigdatagenomics/adam/issues/1702">#1702</a></li>
<li>Allow silencing the # of reads in fragment warning in InterleavedFastqInFormatter <a href="https://github.com/bigdatagenomics/adam/issues/1701">#1701</a></li>
<li>GenomicRDD.toXxx method names should be consistent <a href="https://github.com/bigdatagenomics/adam/issues/1699">#1699</a></li>
<li>Exception thrown in VariantContextConverter.formatAllelicDepth despite SILENT validation stringency <a href="https://github.com/bigdatagenomics/adam/issues/1695">#1695</a></li>
<li>Make GenomicRDD.toString more adam-shell friendly <a href="https://github.com/bigdatagenomics/adam/issues/1694">#1694</a></li>
<li>Add adam-shell friendly VariantContextRDD.saveAsVcf method <a href="https://github.com/bigdatagenomics/adam/issues/1693">#1693</a></li>
<li>change bdgenomics.adam package name for adam-python to bdg-adam <a href="https://github.com/bigdatagenomics/adam/issues/1691">#1691</a></li>
<li>Conflict in bdg-formats dependency version due to org.hammerlab:genomic-loci <a href="https://github.com/bigdatagenomics/adam/issues/1688">#1688</a></li>
<li>Convert and store variant quality field. <a href="https://github.com/bigdatagenomics/adam/issues/1682">#1682</a></li>
<li>Region join shows non-determinism <a href="https://github.com/bigdatagenomics/adam/issues/1680">#1680</a></li>
<li>Shuffle region join throws multimapped exception for unmapped reads <a href="https://github.com/bigdatagenomics/adam/issues/1679">#1679</a></li>
<li>Push validation checks down to INFO/FORMAT fields <a href="https://github.com/bigdatagenomics/adam/issues/1676">#1676</a></li>
<li>IndexOutOfBounds thrown when saving gVCF with no likelihoods <a href="https://github.com/bigdatagenomics/adam/issues/1673">#1673</a></li>
<li>Generate docs from R API for distribution <a href="https://github.com/bigdatagenomics/adam/issues/1672">#1672</a></li>
<li>Support loading a subset of VCF fields <a href="https://github.com/bigdatagenomics/adam/issues/1670">#1670</a></li>
<li>Error with metadata: Multivalued flags are not supported for INFO lines <a href="https://github.com/bigdatagenomics/adam/issues/1669">#1669</a></li>
<li>Include bdg.adam-0.23.0.tar.gz in distribution tarballs <a href="https://github.com/bigdatagenomics/adam/issues/1668">#1668</a></li>
<li>Include bdgenomics.adam-0.23.0_SNAPSHOT-py2.7.egg in distribution tarball <a href="https://github.com/bigdatagenomics/adam/issues/1667">#1667</a></li>
<li>Add SUPPORT.md file to complement CONTRIBUTING.md <a href="https://github.com/bigdatagenomics/adam/issues/1664">#1664</a></li>
<li>Can’t merge BAM files containing the same sample <a href="https://github.com/bigdatagenomics/adam/issues/1663">#1663</a></li>
<li>Incorrect README.md kmer.scala loadAliments method parameter name <a href="https://github.com/bigdatagenomics/adam/issues/1662">#1662</a></li>
<li>Add performance benchmarks similar to Samtools CRAM benchmarking page <a href="https://github.com/bigdatagenomics/adam/issues/1661">#1661</a></li>
<li>Transient bad GZIP header bug when loading BGZF FASTQ <a href="https://github.com/bigdatagenomics/adam/issues/1658">#1658</a></li>
<li>bdgenomics.adam vs bdg.adam for R/Python APIs <a href="https://github.com/bigdatagenomics/adam/issues/1655">#1655</a></li>
<li>Need adamR script <a href="https://github.com/bigdatagenomics/adam/issues/1649">#1649</a></li>
<li>incorrect grep for assembly jars in bin/pyadam <a href="https://github.com/bigdatagenomics/adam/issues/1647">#1647</a></li>
<li>VariantRDD union creates multiple records for the same SNP ID <a href="https://github.com/bigdatagenomics/adam/issues/1644">#1644</a></li>
<li>S3 access documentation <a href="https://github.com/bigdatagenomics/adam/issues/1643">#1643</a></li>
<li>Algorithms docs formatting <a href="https://github.com/bigdatagenomics/adam/issues/1639">#1639</a></li>
<li>Building downstream apps docs reformatting <a href="https://github.com/bigdatagenomics/adam/issues/1638">#1638</a></li>
<li>FastqInputFormat.FILE_SPLITTABLE in conf not getting passed properly <a href="https://github.com/bigdatagenomics/adam/issues/1635">#1635</a></li>
<li>Add benchmarks to documentation <a href="https://github.com/bigdatagenomics/adam/issues/1634">#1634</a></li>
<li>Intro docs contain outdated/incompatible code <a href="https://github.com/bigdatagenomics/adam/issues/1633">#1633</a></li>
<li>Intro docs missing a number of active projects <a href="https://github.com/bigdatagenomics/adam/issues/1632">#1632</a></li>
<li>Installation instructions for Homebrew missing from documentation <a href="https://github.com/bigdatagenomics/adam/issues/1631">#1631</a></li>
<li>Architecture section is missing from docs <a href="https://github.com/bigdatagenomics/adam/issues/1630">#1630</a></li>
<li>Seq&lt;VCFCompoundHeaderLine&gt; vs. Seq&lt;VCFHeaderLine&gt; with javac <a href="https://github.com/bigdatagenomics/adam/issues/1625">#1625</a></li>
<li>ProcessingStep missing from adam-codegen <a href="https://github.com/bigdatagenomics/adam/issues/1623">#1623</a></li>
<li>Add ADAM recipe to bioconda <a href="https://github.com/bigdatagenomics/adam/issues/1618">#1618</a></li>
<li>adam-submit cannot find assembly jar if installed as symlink <a href="https://github.com/bigdatagenomics/adam/issues/1616">#1616</a></li>
<li>Expose transform/transmute in Java/Python/R <a href="https://github.com/bigdatagenomics/adam/issues/1615">#1615</a></li>
<li>Expose VariantContextRDD in R/Python <a href="https://github.com/bigdatagenomics/adam/issues/1614">#1614</a></li>
<li>Expose pipe API from Python/R <a href="https://github.com/bigdatagenomics/adam/issues/1611">#1611</a></li>
<li>Serialization issue with TwoBitFile <a href="https://github.com/bigdatagenomics/adam/issues/1610">#1610</a></li>
<li>Snapshot Distribution Does not include jar files <a href="https://github.com/bigdatagenomics/adam/issues/1607">#1607</a></li>
<li>ManualRegionPartitioner is broken for ParallelFileMerger codepath <a href="https://github.com/bigdatagenomics/adam/issues/1602">#1602</a></li>
<li>VariantRDD doesn’t save partition map <a href="https://github.com/bigdatagenomics/adam/issues/1601">#1601</a></li>
<li>Scala copy method not supported in abstract classes such as AlignmentRecordRDD <a href="https://github.com/bigdatagenomics/adam/issues/1599">#1599</a></li>
<li>Interleaved FASTQ recognizes only /1 suffix pattern <a href="https://github.com/bigdatagenomics/adam/issues/1589">#1589</a></li>
<li>Use empty sequence dictionary when loading features <a href="https://github.com/bigdatagenomics/adam/issues/1588">#1588</a></li>
<li>New Illumina FASTQ spec adds metadata to read name line <a href="https://github.com/bigdatagenomics/adam/issues/1585">#1585</a></li>
<li>first run of ADAM <a href="https://github.com/bigdatagenomics/adam/issues/1582">#1582</a></li>
<li>Add unit test coverage for BED12 parser and writer <a href="https://github.com/bigdatagenomics/adam/issues/1579">#1579</a></li>
<li>Spark 1.x Scala 2.10 snapshot artifacts missing since 31 March 2017 <a href="https://github.com/bigdatagenomics/adam/issues/1578">#1578</a></li>
<li>Unable to save GenomicRDDs after a join. <a href="https://github.com/bigdatagenomics/adam/issues/1576">#1576</a></li>
<li>Add filterBySequenceDictionary to GenomicRDD <a href="https://github.com/bigdatagenomics/adam/issues/1575">#1575</a></li>
<li>Unaligned Trait does nothing <a href="https://github.com/bigdatagenomics/adam/issues/1573">#1573</a></li>
<li>Bump to bdg-formats 0.11.1 <a href="https://github.com/bigdatagenomics/adam/issues/1570">#1570</a></li>
<li>PhredUtils conversion to log probabilities has insufficient resolution for PLs <a href="https://github.com/bigdatagenomics/adam/issues/1569">#1569</a></li>
<li>Reference model import code is borked <a href="https://github.com/bigdatagenomics/adam/issues/1568">#1568</a></li>
<li>SequenceDictionary vs Feature[RDD] of reference length features <a href="https://github.com/bigdatagenomics/adam/issues/1567">#1567</a></li>
<li>giab-NA12878 truth_small_variants.vcf.gz header issues <a href="https://github.com/bigdatagenomics/adam/issues/1566">#1566</a></li>
<li>VCF header read from stream ignored in VCFOutFormatter <a href="https://github.com/bigdatagenomics/adam/issues/1564">#1564</a></li>
<li>VCF genotype Number=A attribute throws ArrayIndexOutOfBoundsException <a href="https://github.com/bigdatagenomics/adam/issues/1562">#1562</a></li>
<li>Save compressed single file VCF via HadoopBAM <a href="https://github.com/bigdatagenomics/adam/issues/1554">#1554</a></li>
<li>bucketing strategy <a href="https://github.com/bigdatagenomics/adam/issues/1553">#1553</a></li>
<li>Is parquet using delta encoding for positions? <a href="https://github.com/bigdatagenomics/adam/issues/1552">#1552</a></li>
<li>Export to VCF does not include symbolic non-ref if site has a called alt <a href="https://github.com/bigdatagenomics/adam/issues/1551">#1551</a></li>
<li>Refactor filterByOverlappingRegions not to require a List <a href="https://github.com/bigdatagenomics/adam/issues/1549">#1549</a></li>
<li>Move docs to Sphinx/pure Markdown <a href="https://github.com/bigdatagenomics/adam/issues/1548">#1548</a></li>
<li>java.lang.IncompatibleClassChangeError: Implementing class <a href="https://github.com/bigdatagenomics/adam/issues/1544">#1544</a></li>
<li>Support locus predicate in <code>TransformAlignments</code> <a href="https://github.com/bigdatagenomics/adam/issues/1539">#1539</a></li>
<li>Visibility from Java, jrdd has private access in AvroGenomicRDD <a href="https://github.com/bigdatagenomics/adam/issues/1538">#1538</a></li>
<li>Rename o.b.adam.apis.java package to o.b.adam.api.java <a href="https://github.com/bigdatagenomics/adam/issues/1537">#1537</a></li>
<li>VCF header genotype reserved key FT cardinality clobbered by htsjdk <a href="https://github.com/bigdatagenomics/adam/issues/1535">#1535</a></li>
<li>Compute a SequenceDictionary from a *.genome file <a href="https://github.com/bigdatagenomics/adam/issues/1534">#1534</a></li>
<li>Queryname sorted check should check for queryname grouped as well <a href="https://github.com/bigdatagenomics/adam/issues/1530">#1530</a></li>
<li>Bump to bdg-formats 0.11.0 <a href="https://github.com/bigdatagenomics/adam/issues/1520">#1520</a></li>
<li>Move to Spark 2.2, Parquet 1.8.2 <a href="https://github.com/bigdatagenomics/adam/issues/1517">#1517</a></li>
<li>Minor refactor for TreeRegionJoin for consistency <a href="https://github.com/bigdatagenomics/adam/issues/1514">#1514</a></li>
<li>Allow +Inf and -Inf Float values when reading VCF <a href="https://github.com/bigdatagenomics/adam/issues/1512">#1512</a></li>
<li>SparkFiles temp directory path should be accessible as a variable <a href="https://github.com/bigdatagenomics/adam/issues/1510">#1510</a></li>
<li>SparkFiles.get expects just the filename <a href="https://github.com/bigdatagenomics/adam/issues/1509">#1509</a></li>
<li>Split apart #1324 <a href="https://github.com/bigdatagenomics/adam/issues/1507">#1507</a></li>
<li>Where can I find “Phred-scaled quality score” (QUAL)? <a href="https://github.com/bigdatagenomics/adam/issues/1506">#1506</a></li>
<li>Alignment Record sort is not consistent with samtools <a href="https://github.com/bigdatagenomics/adam/issues/1504">#1504</a></li>
<li>Sequence dictionary records in TwoBitFile are not stable <a href="https://github.com/bigdatagenomics/adam/issues/1502">#1502</a></li>
<li>Move coverage counter over to Dataset API <a href="https://github.com/bigdatagenomics/adam/issues/1501">#1501</a></li>
<li>Allow users to set the minimum partition count across all load methods <a href="https://github.com/bigdatagenomics/adam/issues/1500">#1500</a></li>
<li>Enable reuse of broadcast object across broadcast region joins <a href="https://github.com/bigdatagenomics/adam/issues/1499">#1499</a></li>
<li>Take union across genomic RDDs <a href="https://github.com/bigdatagenomics/adam/issues/1497">#1497</a></li>
<li>Adam files created by vcf2adam is not recognizable <a href="https://github.com/bigdatagenomics/adam/issues/1496">#1496</a></li>
<li>Scalatest log output disappears with Maven 3.5.0 <a href="https://github.com/bigdatagenomics/adam/issues/1495">#1495</a></li>
<li>ArrayOutOfBoundsException in vcf2adam (spark2_2.11-0.22.0) on UK10K VCFs (VCFv4.1) <a href="https://github.com/bigdatagenomics/adam/issues/1494">#1494</a></li>
<li>ReferenceRegion overlaps and covers returns false if overlap is 1 <a href="https://github.com/bigdatagenomics/adam/issues/1492">#1492</a></li>
<li>Provide asSingleFile parameter for saveAsFastq and related <a href="https://github.com/bigdatagenomics/adam/issues/1490">#1490</a></li>
<li>Min Phred score gets bumped by 33 twice in BQSR <a href="https://github.com/bigdatagenomics/adam/issues/1488">#1488</a></li>
<li>Should throw error when BAM header load fails <a href="https://github.com/bigdatagenomics/adam/issues/1486">#1486</a></li>
<li>Default value for reads.toCoverage(collapse) should be false <a href="https://github.com/bigdatagenomics/adam/issues/1483">#1483</a></li>
<li>Refactor ADAMContext loadXxx methods for consistency <a href="https://github.com/bigdatagenomics/adam/issues/1481">#1481</a></li>
<li>loadGenotypes three times <a href="https://github.com/bigdatagenomics/adam/issues/1480">#1480</a></li>
<li>Fall back to sequential concat when HDFS concat fails <a href="https://github.com/bigdatagenomics/adam/issues/1478">#1478</a></li>
<li>VCF line with <code>.</code> ALT gets dropped <a href="https://github.com/bigdatagenomics/adam/issues/1476">#1476</a></li>
<li>ADAM works on Cloudera but does NOT work on MAPR <a href="https://github.com/bigdatagenomics/adam/issues/1475">#1475</a></li>
<li>Clean up ReferenceRegion.scala <a href="https://github.com/bigdatagenomics/adam/issues/1474">#1474</a></li>
<li>Allow joins on regions that are within a threshold (instead of requiring overlap) <a href="https://github.com/bigdatagenomics/adam/issues/1473">#1473</a></li>
<li>FeatureRDD.toCoverage throws NullPointerException when there is no coverage information <a href="https://github.com/bigdatagenomics/adam/issues/1471">#1471</a></li>
<li>Add quality score binner <a href="https://github.com/bigdatagenomics/adam/issues/1462">#1462</a></li>
<li>Splittable compression and FASTQ <a href="https://github.com/bigdatagenomics/adam/issues/1457">#1457</a></li>
<li>Don’t convert .{different-type}.adam in loadAlignments and loadFragments <a href="https://github.com/bigdatagenomics/adam/issues/1456">#1456</a></li>
<li>New primitives for adam-core <a href="https://github.com/bigdatagenomics/adam/issues/1454">#1454</a></li>
<li>Port over code for populating SequenceDictionaries from .dict files <a href="https://github.com/bigdatagenomics/adam/issues/1449">#1449</a></li>
<li>Ignore failed push to Coveralls during CI builds <a href="https://github.com/bigdatagenomics/adam/issues/1444">#1444</a></li>
<li>No asSingleFile parameter for saveAsFasta in NucleotideContigFragmentRDD <a href="https://github.com/bigdatagenomics/adam/issues/1438">#1438</a></li>
<li>shufflejoin and ArrayIndexOutOfBoundsException <a href="https://github.com/bigdatagenomics/adam/issues/1436">#1436</a></li>
<li>Document using ADAM snapshot <a href="https://github.com/bigdatagenomics/adam/issues/1432">#1432</a></li>
<li>Improve metrics coverage across ADAMContext load methods <a href="https://github.com/bigdatagenomics/adam/issues/1428">#1428</a></li>
<li>loadReferenceFile missing from Java API <a href="https://github.com/bigdatagenomics/adam/issues/1421">#1421</a></li>
<li>loadCoverage missing from Java API <a href="https://github.com/bigdatagenomics/adam/issues/1420">#1420</a></li>
<li>Question: How to get paired-end AlignmentRecord like RDD[AlignmentRecord, AlignmentRecordRDD]? <a href="https://github.com/bigdatagenomics/adam/issues/1419">#1419</a></li>
<li>Clean up possibly unused methods in Projection <a href="https://github.com/bigdatagenomics/adam/issues/1417">#1417</a></li>
<li>Problem loading SNPeff annotated VCF <a href="https://github.com/bigdatagenomics/adam/issues/1390">#1390</a></li>
<li>RecordGroupDictionary should support <code>isEmpty</code> <a href="https://github.com/bigdatagenomics/adam/issues/1380">#1380</a></li>
<li>Get rid of mutable collection transformations in ShuffleRegionJoin <a href="https://github.com/bigdatagenomics/adam/issues/1379">#1379</a></li>
<li>Add tab5/6 as native output format for AlignmentRecordRDD <a href="https://github.com/bigdatagenomics/adam/issues/1377">#1377</a></li>
<li>ValidationStringency in MDTagging should apply to reads on unknown references <a href="https://github.com/bigdatagenomics/adam/issues/1365">#1365</a></li>
<li>Assembly final name doesn’t include spark2 for Spark 2.x builds <a href="https://github.com/bigdatagenomics/adam/issues/1361">#1361</a></li>
<li>Merge reads2fragments and fragments2reads into a single CLI <a href="https://github.com/bigdatagenomics/adam/issues/1359">#1359</a></li>
<li>Investigate failures to load ExAC.0.3.GRCh38.vcf variants <a href="https://github.com/bigdatagenomics/adam/issues/1351">#1351</a></li>
<li>adam-shell does not allow additional jars via Spark jars argument <a href="https://github.com/bigdatagenomics/adam/issues/1349">#1349</a></li>
<li>Loading GZipped VCF returns an empty RDD <a href="https://github.com/bigdatagenomics/adam/issues/1333">#1333</a></li>
<li>Bump Spark 2 build to Spark 2.1.0 <a href="https://github.com/bigdatagenomics/adam/issues/1330">#1330</a></li>
<li>Rename Transform command TransformAlignments or similar <a href="https://github.com/bigdatagenomics/adam/issues/1328">#1328</a></li>
<li>Replace ADAM2Vcf and Vcf2ADAM commands with TransformGenotypes and TransformVariants <a href="https://github.com/bigdatagenomics/adam/issues/1327">#1327</a></li>
<li>FeatureRDD instantiation tries to cache the RDD <a href="https://github.com/bigdatagenomics/adam/issues/1321">#1321</a></li>
<li>Repository for Pipe API wrappers for bioinformatics tools <a href="https://github.com/bigdatagenomics/adam/issues/1314">#1314</a></li>
<li>Trying to get Spark pipeline working with slightly out of date code. <a href="https://github.com/bigdatagenomics/adam/issues/1313">#1313</a></li>
<li>Support for gVCF merging and genotyping (e.g. CombineGVCFs and GenotypeGVCFs) <a href="https://github.com/bigdatagenomics/adam/issues/1312">#1312</a></li>
<li>Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) <a href="https://github.com/bigdatagenomics/adam/issues/1311">#1311</a></li>
<li>Don’t include log4j.properties in published JAR <a href="https://github.com/bigdatagenomics/adam/issues/1300">#1300</a></li>
<li>Removing ProgramRecords info when saving data to sam/bam? <a href="https://github.com/bigdatagenomics/adam/issues/1257">#1257</a></li>
<li>ADAM on Slurm/LSF <a href="https://github.com/bigdatagenomics/adam/issues/1229">#1229</a></li>
<li>Maintaining sorted/partitioned knowledge <a href="https://github.com/bigdatagenomics/adam/issues/1216">#1216</a></li>
<li>Evaluate bdg-convert external conversion library proposal <a href="https://github.com/bigdatagenomics/adam/issues/1197">#1197</a></li>
<li>Port AMPCamp Tutorial over <a href="https://github.com/bigdatagenomics/adam/issues/1174">#1174</a></li>
<li>Top level WrappedRDD or similar abstraction <a href="https://github.com/bigdatagenomics/adam/issues/1173">#1173</a></li>
<li>GFF3 formatted features written as single file must include gff-version pragma <a href="https://github.com/bigdatagenomics/adam/issues/1169">#1169</a></li>