@article{Ackermann2009a,
abstract = {BACKGROUND: Analysis of microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming more important in genomic studies. Correspondingly, a large number of statistical approaches for detecting gene set enrichment have been proposed, but both the interrelations and the relative performance of the various methods are still very much unclear.
RESULTS: We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that not only helps to gain a better understanding of the relative merits of each embedded approach but also facilitates a principled comparison and offers insights into the relative interplay of the methods.
CONCLUSION: We use this framework to conduct a computer simulation comparing 261 different variants of gene set enrichment procedures and to analyze two experimental data sets. Based on the results we offer recommendations for best practices regarding the choice of effective procedures for gene set enrichment analysis.},
author = {Ackermann, Marit and Strimmer, Korbinian},
date = {2009-01},
doi = {10/dvzd5z},
eprint = {19192285},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Ackermann and Strimmer - 2009 - A general modular framework for gene set enrichmen.pdf},
isbn = {1471210510},
issn = {1471-2105},
journaltitle = {BMC bioinformatics},
keywords = {Algorithms,Animals,Computer Simulation,Databases; Genetic,Humans,Models; Statistical,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: methods},
pages = {47},
title = {A General Modular Framework for Gene Set Enrichment Analysis.},
volume = {10}
}
@article{Adcock1997,
author = {Adcock, C. J.},
date = {1997-07},
doi = {10/csrcsd},
file = {/Users/ryan/Documents/Zotero Library/Adcock - 1997 - Sample size determination a review.pdf},
issn = {0039-0526},
journaltitle = {Journal of the Royal Statistical Society: Series D (The Statistician)},
keywords = {average coverage criterion,average length criterion,bayes factors,bayesian methods,binomial,coherence,distribution,hypothesis testing,maximum expected utility,mcnemar,multinomial distribution,multivariate analysis,normal distribution,pivots,regression,s test,sample size determination,tolerance intervals,worst},
number = {2},
pages = {261-283},
title = {Sample Size Determination: A Review},
volume = {46}
}
@article{Aggarwal2005,
abstract = {Mesenchymal stem cells (MSCs) are multipotent cells found in several adult tissues. Transplanted allogeneic MSCs can be detected in recipients at extended time points, indicating a lack of immune recognition and clearance. As well, a role for bone marrow-derived MSCs in reducing the incidence and severity of graft-versus-host disease (GVHD) during allogeneic transplantation has recently been reported; however, the mechanisms remain to be investigated. We examined the immunomodulatory functions of human MSCs (hMSCs) by coculturing them with purified subpopulations of immune cells and report here that hMSCs altered the cytokine secretion profile of dendritic cells (DCs), naive and effector T cells (T helper 1 [TH1] and TH2), and natural killer (NK) cells to induce a more anti-inflammatory or tolerant phenotype. Specifically, the hMSCs caused mature DCs type 1 (DC1) to decrease tumor necrosis factor {$\alpha$} (TNF-{$\alpha$}) secretion and mature DC2 to increase interleukin-10 (IL-10) secretion; hMSCs caused TH1 cells to decrease interferon {$\gamma$} (IFN-{$\gamma$}) and caused the TH2 cells to increase secretion of IL-4; hMSCs caused an increase in the proportion of regulatory T cells (T Regs) present; and hMSCs decreased secretion of IFN-{$\gamma$} from the NK cells. Mechanistically, the hMSCs produced elevated prostaglandin E2 (PGE2) in co-cultures, and inhibitors of PGE2 production mitigated hMSC-mediated immune modulation. These data offer insight into the interactions between allogeneic MSCs and immune cells and provide mechanisms likely involved with the in vivo MSC-mediated induction of tolerance that could be therapeutic for reduction of GVHD, rejection, and modulation of inflammation. \textcopyright{} 2005 by The American Society of Hematology.},
author = {Aggarwal, Sudeepta and Pittenger, Mark F.},
date = {2005-02-15},
doi = {10/fnb37s},
eprint = {15494428},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Aggarwal and Pittenger - 2005 - Human mesenchymal stem cells modulate allogeneic i.pdf},
issn = {00064971},
journaltitle = {Blood},
number = {4},
pages = {1815-1822},
title = {Human Mesenchymal Stem Cells Modulate Allogeneic Immune Cell Responses},
volume = {105}
}
@article{Aitken2018,
abstract = {Background: CTCF binding to DNA helps partition the mammalian genome into discrete structural and regulatory domains. Complete removal of CTCF from mammalian cells causes catastrophic genome dysregulation, likely due to widespread collapse of 3D chromatin looping and alterations to inter- and intra-TAD interactions within the nucleus. In contrast, Ctcf hemizygous mice with lifelong reduction of CTCF expression are viable, albeit with increased cancer incidence. Here, we exploit chronic Ctcf hemizygosity to reveal its homeostatic roles in maintaining genome function and integrity. Results: We find that Ctcf hemizygous cells show modest but robust changes in almost a thousand sites of genomic CTCF occupancy; these are enriched for lower affinity binding events with weaker evolutionary conservation across the mouse lineage. Furthermore, we observe dysregulation of the expression of several hundred genes, which are concentrated in cancer-related pathways, and are caused by changes in transcriptional regulation. Chromatin structure is preserved but some loop interactions are destabilized; these are often found around differentially expressed genes and their enhancers. Importantly, the transcriptional alterations identified in vitro are recapitulated in mouse tumors and also in human cancers. Conclusions: This multi-dimensional genomic and epigenomic profiling of a Ctcf hemizygous mouse model system shows that chronic depletion of CTCF dysregulates steady-state gene expression by subtly altering transcriptional regulation, changes which can also be observed in primary tumors.},
author = {Aitken, Sarah J. and Ibarra-Soria, Ximena and Kentepozidou, Elissavet and Flicek, Paul and Feig, Christine and Marioni, John C. and Odom, Duncan T.},
date = {2018},
doi = {10/gd3fhd},
file = {/Users/ryan/Documents/Zotero Library/Aitken et al. - 2018 - CTCF maintains regulatory homeostasis of cancer pa.pdf},
issn = {1474760X},
journaltitle = {Genome Biology},
keywords = {Cancer,Chromatin architecture,Chromatin state,CTCF,Hemizygosity,Transcription},
number = {1},
pages = {1-17},
title = {{{CTCF}} Maintains Regulatory Homeostasis of Cancer Pathways},
volume = {19}
}
@article{Alexa2006a,
abstract = {MOTIVATION: The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. This list is only the starting point for a meaningful biological interpretation. Modern methods identify relevant biological processes or functions from gene expression data by scoring the statistical significance of predefined functional gene groups, e.g. based on Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about relationships between the GO terms into the calculation of the statistical significance.
RESULTS: We present two novel algorithms that improve GO group scoring using the underlying GO graph topology. The algorithms are evaluated on real and simulated gene expression data. We show that both methods eliminate local dependencies between GO terms and point to relevant areas in the GO graph that remain undetected with state-of-the-art algorithms for scoring functional terms. A simulation study demonstrates that the new methods exhibit a higher level of detecting relevant biological terms than competing methods.},
author = {Alexa, Adrian and Rahnenf\"uhrer, J\"org and Lengauer, Thomas},
date = {2006-07-01},
doi = {10/bzj9v5},
eprint = {16606683},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Alexa et al. - 2006 - Improved scoring of functional groups from gene ex.pdf},
issn = {1367-4803},
journaltitle = {Bioinformatics (Oxford, England)},
keywords = {Algorithms,Cluster Analysis,Computational Biology,Computational Biology: methods,Databases; Genetic,Gene Expression Profiling,Gene Expression Regulation,Gene Expression Regulation; Neoplastic,Humans,Leukemia,Leukemia: metabolism,Models; Statistical,Oligonucleotide Array Sequence Analysis,Protein Folding},
number = {13},
pages = {1600-7},
title = {Improved Scoring of Functional Groups from Gene Expression Data by Decorrelating {{GO}} Graph Structure.},
volume = {22}
}
@article{Alexeyenko2012a,
abstract = {BACKGROUND: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis.
RESULTS: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study.
CONCLUSIONS: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.},
author = {Alexeyenko, Andrey and Lee, Woojoo and Pernemalm, Maria and Guegan, Justin and Dessen, Philippe and Lazar, Vladimir and Lehti\"o, Janne and Pawitan, Yudi},
date = {2012-01},
doi = {10/f22bh8},
eprint = {22966941},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Alexeyenko et al. - 2012 - Network enrichment analysis extension of gene-set.pdf},
issn = {1471-2105},
journaltitle = {BMC bioinformatics},
keywords = {Algorithms,Gene Expression,Gene Expression Profiling,Gene Expression Profiling: methods,Gene Regulatory Networks,Humans,Lung Neoplasms,Lung Neoplasms: genetics,Lung Neoplasms: metabolism,Protein Biosynthesis,Proteomics,Proteomics: methods},
pages = {226},
title = {Network Enrichment Analysis: Extension of Gene-Set Enrichment Analysis to Gene Networks.},
volume = {13}
}
@article{Allison2006,
abstract = {In just a few years, microarrays have gone from obscurity to being almost ubiquitous in biological research. At the same time, the statistical methodology for microarray analysis has progressed from simple visual assessments of results to a weekly deluge of papers that describe purportedly novel algorithms for analysing changes in gene expression. Although the many procedures that are available might be bewildering to biologists who wish to apply them, statistical geneticists are recognizing commonalities among the different methods. Many are special cases of more general models, and points of consensus are emerging about the general approaches that warrant use and elaboration.},
author = {Allison, David B and Cui, Xiangqin and Page, Grier P and Sabripour, Mahyar},
date = {2006-01},
doi = {10/frn2j4},
eprint = {16369572},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Allison et al. - 2006 - Microarray data analysis from disarray to consoli.pdf},
issn = {1471-0056},
journaltitle = {Nature reviews. Genetics},
keywords = {Algorithms,Cluster Analysis,Computational Biology,Computational Biology: methods,Computer Simulation,Data Interpretation; Statistical,DNA; Complementary,DNA; Complementary: metabolism,Gene Expression Profiling,Gene Expression Profiling: methods,Gene Expression Regulation,Genetic Techniques,Genetics,Humans,Microarray Analysis,Models; Biological,Models; Statistical,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: methods,RNA; Messenger,RNA; Messenger: metabolism},
number = {1},
pages = {55-65},
title = {Microarray Data Analysis: From Disarray to Consolidation and Consensus.},
volume = {7}
}
@article{Amemiya2019,
abstract = {Functional genomics assays based on high-throughput sequencing greatly expand our ability to understand the genome. Here, we define the ENCODE blacklist, a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment. The removal of the ENCODE blacklist is an essential quality measure when analyzing functional genomics data.},
author = {Amemiya, Haley M. and Kundaje, Anshul and Boyle, Alan P.},
date = {2019},
doi = {10/gf4jsb},
file = {/Users/ryan/Documents/Zotero Library/Amemiya et al. - 2019 - The ENCODE Blacklist Identification of Problemati.pdf},
isbn = {4159801945839},
issn = {20452322},
journaltitle = {Scientific Reports},
number = {1},
pages = {1-5},
title = {The {{ENCODE Blacklist}}: {{Identification}} of {{Problematic Regions}} of the {{Genome}}},
volume = {9}
}
@article{Anders2010,
abstract = {High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression and present an implementation, DESeq, as an R/Bioconductor package.},
author = {Anders, Simon and Huber, Wolfgang},
date = {2010-10-27},
doi = {10/btmbk5},
eprint = {20979621},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Anders and Huber - 2010 - Differential expression analysis for sequence coun.pdf;/Users/ryan/Documents/Zotero Library/Anders and Huber - 2010 - Differential expression analysis for sequence coun2.pdf;/Users/ryan/Documents/Zotero Library/Anders and Huber - 2010 - Differential expression analysis for sequence coun3.pdf;/Users/ryan/Zotero/storage/I65673QD/gb-2010-11-10-r106.html},
issn = {1474-760X},
journaltitle = {Genome Biology},
keywords = {Animals,Binomial Distribution,Chromatin Immunoprecipitation,Chromatin Immunoprecipitation: methods,Computational Biology,Computational Biology: methods,Drosophila,Drosophila: genetics,Gene Expression Profiling,Gene Expression Profiling: methods,Genetic,High-Throughput Nucleotide Sequencing,High-Throughput Nucleotide Sequencing: methods,Linear Models,Models,RNA,RNA: methods,Saccharomyces cerevisiae,Saccharomyces cerevisiae: genetics,Sequence Analysis,Stem Cells,Tissue Culture Techniques},
number = {10},
pages = {R106},
shortjournal = {Genome Biology},
title = {Differential Expression Analysis for Sequence Count Data},
volume = {11}
}
@article{Anders2012,
abstract = {RNA-Seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-Seq data. DEXSeq employs generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects genes, and in many cases specific exons, that are subject to differential exon usage with high sensitivity. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.},
author = {Anders, Simon and Reyes, Alejandro and Huber, Wolfgang},
date = {2012},
doi = {10/ggcxjt},
file = {/Users/ryan/Documents/Zotero Library/Anders et al. - 2012 - Detecting differential usage of exons from RNA-seq.pdf},
journaltitle = {Genome Research},
pages = {1-30},
title = {Detecting Differential Usage of Exons from {{RNA}}-Seq Data}
}
@online{Anders2013,
author = {Anders, Simon},
date = {2013},
keywords = {\#nosource},
title = {{{HTSeq}}: {{Analysing}} High-Throughput Sequencing Data with {{Python}}},
url = {http://www-huber.embl.de/users/anders/HTSeq/doc/index.html}
}
@article{Anders2013a,
abstract = {RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be {$<$}1 h, with computation time {$<$}1 d using a standard desktop PC.},
author = {Anders, Simon and McCarthy, Davis J and Chen, Yunshun and Okoniewski, Michal and Smyth, Gordon K and Huber, Wolfgang and Robinson, Mark D},
date = {2013-09},
doi = {10/f4794j},
eprint = {23975260},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Anders et al. - 2013 - Count-based differential expression analysis of RN.pdf},
issn = {1750-2799},
journaltitle = {Nature protocols},
keywords = {Base Sequence,Computational Biology,Computational Biology: methods,Gene Expression Profiling,Gene Expression Profiling: methods,Sequence Analysis; RNA,Sequence Analysis; RNA: methods,Software,Workflow},
number = {9},
pages = {1765-86},
title = {Count-Based Differential Expression Analysis of {{RNA}} Sequencing Data Using {{R}} and {{Bioconductor}}.},
volume = {8}
}
@article{Anders2014,
abstract = {Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard work flows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq},
author = {Anders, S. and Pyl, P. T. and Huber, W.},
date = {2015-01-15},
doi = {10/f6v7kx},
eprint = {25260700},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Anders et al. - 2015 - HTSeq--a Python framework to work with high-throug.pdf},
isbn = {1367-4811 (Electronic) 1367-4803 (Linking)},
issn = {1367-4803},
journaltitle = {Bioinformatics},
number = {2},
pages = {166-169},
title = {{{HTSeq}}--a {{Python}} Framework to Work with High-Throughput Sequencing Data},
volume = {31}
}
@article{Ankrum2014,
abstract = {The diverse immunomodulatory properties of mesenchymal stem/stromal cells (MSCs) may be exploited for treatment of a multitude of inflammatory conditions. MSCs have long been reported to be hypoimmunogenic or 'immune privileged'; this property is thought to enable MSC transplantation across major histocompatibility barriers and the creation of off-the-shelf therapies consisting of MSCs grown in culture. However, recent studies describing generation of antibodies against and immune rejection of allogeneic donor MSCs suggest that MSCs may not actually be immune privileged. Nevertheless, whether rejection of donor MSCs influences the efficacy of allogeneic MSC therapies is not known, and no definitive clinical advantage of autologous MSCs over allogeneic MSCs has been demonstrated to date. Although MSCs may exert therapeutic function through a brief 'hit and run' mechanism, protecting MSCs from immune detection and prolonging their persistence in vivo may improve clinical outcomes and prevent patient sensitization toward donor antigens. \textcopyright{} 2014 Nature America, Inc.},
author = {Ankrum, James A. and Ong, Joon Faii and Karp, Jeffrey M.},
date = {2014},
doi = {10/f5vjkk},
file = {/Users/ryan/Documents/Zotero Library/Ankrum et al. - 2014 - Mesenchymal stem cells Immune evasive, not immune.pdf},
issn = {15461696},
journaltitle = {Nature Biotechnology},
number = {3},
pages = {252-260},
title = {Mesenchymal Stem Cells: {{Immune}} Evasive, Not Immune Privileged},
volume = {32}
}
@article{Aponte2011,
abstract = {Two intermingled hypothalamic neuron populations specified by expression of agouti-related peptide (AGRP) or pro-opiomelanocortin (POMC) positively and negatively influence feeding behavior, respectively, possibly by reciprocally regulating downstream melanocortin receptors. However, the sufficiency of these neurons to control behavior and the relationship of their activity to the magnitude and dynamics of feeding are unknown. To measure this, we used channelrhodopsin-2 for cell type-specific photostimulation. Activation of only 800 AGRP neurons in mice evoked voracious feeding within minutes. The behavioral response increased with photoexcitable neuron number, photostimulation frequency and stimulus duration. Conversely, POMC neuron stimulation reduced food intake and body weight, which required melanocortin receptor signaling. However, AGRP neuron-mediated feeding was not dependent on suppressing this melanocortin pathway, indicating that AGRP neurons directly engage feeding circuits. Furthermore, feeding was evoked selectively over drinking without training or prior photostimulus exposure, which suggests that AGRP neurons serve a dedicated role coordinating this complex behavior.},
author = {Aponte, Yexica and Atasoy, Deniz and Sternson, Scott M},
date = {2011-03},
doi = {10/fwh8kz},
eprint = {21209617},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Aponte et al. - 2011 - AGRP neurons are sufficient to orchestrate feeding.pdf},
issn = {1546-1726},
journaltitle = {Nature neuroscience},
keywords = {Agouti-Related Protein,Agouti-Related Protein: genetics,Agouti-Related Protein: metabolism,Animal,Animal: physiology,Animals,Behavior,Classical,Classical: physiology,Conditioning,Eating,Feeding Behavior,Feeding Behavior: physiology,Hypothalamus,Hypothalamus: cytology,Hypothalamus: metabolism,Melanocortins,Melanocortins: metabolism,Mice,Neurons,Neurons: metabolism,Photic Stimulation,Pro-Opiomelanocortin,Pro-Opiomelanocortin: metabolism,Recombinant Fusion Proteins,Recombinant Fusion Proteins: genetics,Recombinant Fusion Proteins: metabolism,Rhodopsin,Rhodopsin: genetics,Rhodopsin: metabolism},
number = {3},
pages = {351-5},
title = {{{AGRP}} Neurons Are Sufficient to Orchestrate Feeding Behavior Rapidly and without Training.},
volume = {14}
}
@article{Argelaguet,
author = {Argelaguet, Ricard and Velten, Britta and Arnol, Damien and Dietrich, Sascha and Zenz, Thorsten and Marioni, John C and Buettner, Florian and Huber, Wolfgang and Stegle, Oliver},
file = {/Users/ryan/Documents/Zotero Library/Argelaguet et al. - Methods for Multi-Omics factor analysis disentan.pdf},
pages = {1-16},
title = {Methods for: {{Multi}}-{{Omics}} Factor Analysis Disentangles Heterogeneity in Blood Cancer {{Multi}}-{{Omics Factor Analysis}} Model}
}
@article{Argelaguet2017b,
author = {Argelaguet, Ricard and Velten, Britta and Arnol, Damien and Dietrich, Sascha and Zenz, Thorsten and Marioni, John C and Buettner, Florian and Huber, Wolfgang and Stegle, Oliver},
date = {2017},
doi = {10/gfvttf},
file = {/Users/ryan/Documents/Zotero Library/Argelaguet et al. - 2017 - Multi-Omics factor analysis disentangles heterogen.pdf},
issue = {Cll},
pages = {1-16},
title = {Multi-{{Omics}} Factor Analysis Disentangles Heterogeneity in Blood Cancer}
}
@article{Argelaguet2018,
abstract = {Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.},
author = {Argelaguet, Ricard and Velten, Britta and Arnol, Damien and Dietrich, Sascha and Zenz, Thorsten and Marioni, John C. and Buettner, Florian and Huber, Wolfgang and Stegle, Oliver},
date = {2018-06-20},
doi = {10/gdqq3f},
eprint = {29925568},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Argelaguet et al. - 2018 - Multi‐Omics Factor Analysis—a framework for unsupe.pdf},
issn = {1744-4292},
journaltitle = {Molecular Systems Biology},
keywords = {biology,data integration,dimensionality reduction,genome-scale,integrative,methods,multi-omics,personalized medicine,resources,single-cell omics,subject categories computational biology},
number = {6},
pages = {1-13},
title = {Multi-{{Omics Factor Analysis}}\textemdash{}a Framework for Unsupervised Integration of Multi-omics Data Sets},
volume = {14}
}
@article{Arnaud2016,
abstract = {Transcriptome studies based on quantitative sequencing can estimate levels of gene expression by measuring target RNA abundance in sequencing libraries. Sequencing costs are proportional to the total number of sequenced reads, and in order to cover rare RNAs, considerable quantities of abundant and identical reads are needed. This major limitation can be addressed by depleting a proportion of the most abundant sequences from the library. However, such depletion strategies involve either extra handling of the input RNA sample or use of a large number of reverse transcription primers, termed not-so-random (NSR) primers, which are costly to synthesize. Taking advantage of the high tolerance of reverse transcriptase to mis-prime, we found that it is possible to use as few as 40 pseudo-random (PS) reverse transcription primers to decrease the rate of undesirable abundant sequences within a library without affecting the overall transcriptome diversity. PS primers are simple to design and can be used to deplete several undesirable RNAs simultaneously, thus creating a flexible tool for enriching transcriptome libraries for rare transcript sequences.},
author = {Arnaud, Oph\'elie and Kato, Sachi and Poulain, St\'ephane and Plessy, Charles},
date = {2016-04-01},
doi = {10/ggcxjv},
eprint = {27071605},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Arnaud et al. - 2016 - Targeted reduction of highly abundant transcripts .pdf},
issn = {1940-9818},
journaltitle = {BioTechniques},
keywords = {high-throughput sequencing,nanoCAGE,rRNA,undesirable sequences},
number = {4},
pages = {169-74},
title = {Targeted Reduction of Highly Abundant Transcripts Using Pseudo-Random Primers},
volume = {60}
}
@article{Aryee2014,
abstract = {Motivation: The recently released Infinium HumanMethylation450 array (the '450k' array) provides a high-throughput assay to quantify DNA methylation (DNAm) at {$\sim$}450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. \textcopyright{} The Author 2014.},
author = {Aryee, Martin J. and Jaffe, Andrew E. and Corrada-Bravo, Hector and Ladd-Acosta, Christine and Feinberg, Andrew P. and Hansen, Kasper D. and Irizarry, Rafael A.},
date = {2014-05-15},
doi = {10/f3m42q},
file = {/Users/ryan/Documents/Zotero Library/Aryee et al. - 2014 - Minfi A flexible and comprehensive Bioconductor p.pdf},
issn = {14602059},
journaltitle = {Bioinformatics},
number = {10},
pages = {1363-1369},
title = {Minfi: {{A}} Flexible and Comprehensive {{Bioconductor}} Package for the Analysis of {{Infinium DNA}} Methylation Microarrays},
volume = {30}
}
@article{Aschoff2013,
abstract = {MOTIVATION: Alternative splicing is central for cellular processes and substantially increases transcriptome and proteome diversity. Aberrant splicing events often have pathological consequences and are associated with various diseases and cancer types. The emergence of next-generation RNA sequencing (RNA-seq) provides an exciting new technology to analyse alternative splicing on a large scale. However, algorithms that enable the analysis of alternative splicing from short-read sequencing are not fully established yet and there are still no standard solutions available for a variety of data analysis tasks.
RESULTS: We present a new method and software to predict genes that are differentially spliced between two different conditions using RNA-seq data. Our method uses geometric angles between the high dimensional vectors of exon read counts. With this, differential splicing can be detected even if the splicing events are composed of higher complexity and involve previously unknown splicing patterns. We applied our approach to two case studies including neuroblastoma tumour data with favourable and unfavourable clinical courses. We show the validity of our predictions as well as the applicability of our method in the context of patient clustering. We verified our predictions by several methods including simulated experiments and complementary in silico analyses. We found a significant number of exons with specific regulatory splicing factor motifs for predicted genes and a substantial number of publications linking those genes to alternative splicing. Furthermore, we could successfully exploit splicing information to cluster tissues and patients. Finally, we found additional evidence of splicing diversity for many predicted genes in normalized read coverage plots and in reads that span exon-exon junctions.
AVAILABILITY: SplicingCompass is licensed under the GNU GPL and freely available as a package in the statistical language R at http://www.ichip.de/software/SplicingCompass.html},
author = {Aschoff, Moritz and Hotz-Wagenblatt, Agnes and Glatting, Karl-Heinz and Fischer, Matthias and Eils, Roland and K\"onig, Rainer},
date = {2013-05-01},
doi = {10/f4w547},
eprint = {23449093},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Aschoff et al. - 2013 - SplicingCompass differential splicing detection u.pdf},
issn = {1367-4811},
journaltitle = {Bioinformatics (Oxford, England)},
number = {9},
pages = {1141-8},
title = {{{SplicingCompass}}: Differential Splicing Detection Using {{RNA}}-Seq Data},
volume = {29}
}
@article{Au2013,
abstract = {Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.},
author = {Au, Kin Fai and Sebastiano, Vittorio and Afshar, Pegah Tootoonchi and Durruthy, Jens Durruthy and Lee, Lawrence and Williams, Brian A and van Bakel, Harm and Schadt, Eric E and Reijo-Pera, Renee A and Underwood, Jason G and Wong, Wing Hung},
date = {2013-11-26},
doi = {10/f5jn67},
eprint = {24282307},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Au et al. - 2013 - Characterization of the human ESC transcriptome by.pdf},
issn = {1091-6490},
journaltitle = {Proceedings of the National Academy of Sciences of the United States of America},
options = {useprefix=true},
title = {Characterization of the Human {{ESC}} Transcriptome by Hybrid Sequencing}
}
@article{Auer2010,
abstract = {Next-generation sequencing technologies are quickly becoming the preferred approach for characterizing and quantifying entire genomes. Even though data produced from these technologies are proving to be the most informative of any thus far, very little attention has been paid to fundamental design aspects of data collection and analysis, namely sampling, randomization, replication, and blocking. We discuss these concepts in an RNA sequencing framework. Using simulations we demonstrate the benefits of collecting replicated RNA sequencing data according to well known statistical designs that partition the sources of biological and technical variation. Examples of these designs and their corresponding models are presented with the goal of testing differential expression.},
author = {Auer, Paul L and Doerge, R W},
date = {2010-06},
doi = {10/dqrqxw},
eprint = {20439781},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Auer and Doerge - 2010 - Statistical design and analysis of RNA sequencing .pdf},
issn = {1943-2631},
journaltitle = {Genetics},
keywords = {Base Sequence,Clinical Laboratory Techniques,Research,Research: methods},
number = {2},
pages = {405-16},
title = {Statistical Design and Analysis of {{RNA}} Sequencing Data},
volume = {185}
}
@article{aulettaPerspectivesEmergingRoles,
abstract = {Multipotent, bone marrow\textendash{}derived stromal cells (BMSCs, also known as mesenchymal stem cells [MSCs]), are culture-expanded, nonhematopoietic cells with immunomodulatory effects currently being investigated as novel cellular therapy to prevent and to treat clinical disease associated with aberrant immune response. Emerging preclinical studies suggest that BMSCs may protect against infectious challenge either by direct effects on the pathogen or through indirect effects on the host. BMSCs may reduce pathogen burden by inhibiting growth through soluble factors or by enhancing immune cell antimicrobial function. In the host, BMSCs may attenuate pro-inflammatory cytokine and chemokine induction, reduce pro-inflammatory cell migration into sites of injury and infection, and induce immunoregulatory soluble and cellular factors to preserve organ function. These preclinical studies provide provocative hints into the direction MSC therapeutics may take in the future. Notably, BMSCs appear to function as a critical fulcrum, providing balance by promoting pathogen clearance during the initial inflammatory response while suppressing inflammation to preserve host integrity and facilitate tissue repair. Such exquisite balance in BMSC function appears intrinsically linked to Toll-like receptor signaling and immune crosstalk.},
author = {Auletta, Jeffery J. and Deans, Robert J. and Bartholomew, Amelia M.},
doi = {10/fzmb9t},
file = {/Users/ryan/Documents/Zotero Library/Auletta et al. - Perspectives Emerging roles for multipotent, bone.pdf},
ids = {aulettaEmergingRolesMultipotent2012},
journaltitle = {Cell},
keywords = {cyno-project},
pages = {1-33},
title = {Perspectives: {{Emerging}} Roles for Multipotent, Bone Marrow-Derived Stromal Cells in Host Defense},
volume = {598}
}
@article{Bailey2013,
abstract = {Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.},
author = {Bailey, Timothy and Krajewski, Pawel and Ladunga, Istvan and Lefebvre, Celine and Li, Qunhua and Liu, Tao and Madrigal, Pedro and Taslim, Cenny and Zhang, Jie},
date = {2013-11},
doi = {10/gfr9pq},
eprint = {24244136},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bailey et al. - 2013 - Practical guidelines for the comprehensive analysi.pdf},
issn = {1553-7358},
journaltitle = {PLoS computational biology},
number = {11},
pages = {e1003326},
title = {Practical Guidelines for the Comprehensive Analysis of {{ChIP}}-Seq Data},
volume = {9}
}
@article{Bair2004,
abstract = {An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature for performing this type of diagnosis. Unfortunately, most of these techniques assume that different subtypes of cancer are already known to exist. Their utility is limited when such subtypes have not been previously identified. Although methods for identifying such subtypes exist, these methods do not work well for all datasets. It would be desirable to develop a procedure to find such subtypes that is applicable in a wide variety of circumstances. Even if no information is known about possible subtypes of a certain form of cancer, clinical information about the patients, such as their survival time, is often available. In this study, we develop some procedures that utilize both the gene expression data and the clinical data to identify subtypes of cancer and use this knowledge to diagnose future patients. These procedures were successfully applied to several publicly available datasets. We present diagnostic procedures that accurately predict the survival of future patients based on the gene expression profile and survival times of previous patients. This has the potential to be a powerful tool for diagnosing and treating cancer.},
author = {Bair, Eric and Tibshirani, Robert},
date = {2004-04},
doi = {10/c4qv69},
eprint = {15094809},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bair and Tibshirani - 2004 - Semi-supervised methods to predict patient surviva.pdf},
issn = {1545-7885},
journaltitle = {PLoS biology},
keywords = {Breast Neoplasms,Breast Neoplasms: metabolism,Breast Neoplasms: mortality,Cluster Analysis,Computer Simulation,Data Interpretation; Statistical,Databases; Factual,Gene Expression Profiling,Humans,Models; Statistical,Neoplasms,Neoplasms: metabolism,Neoplasms: mortality,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: methods,Principal Component Analysis,Prognosis,Software,Time Factors,Treatment Outcome},
number = {4},
pages = {E108},
title = {Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data},
volume = {2}
}
@article{Barbie2009,
abstract = {The proto-oncogene KRAS is mutated in a wide array of human cancers, most of which are aggressive and respond poorly to standard therapies. Although the identification of specific oncogenes has led to the development of clinically effective, molecularly targeted therapies in some cases, KRAS has remained refractory to this approach. A complementary strategy for targeting KRAS is to identify gene products that, when inhibited, result in cell death only in the presence of an oncogenic allele. Here we have used systematic RNA interference to detect synthetic lethal partners of oncogenic KRAS and found that the non-canonical IkappaB kinase TBK1 was selectively essential in cells that contain mutant KRAS. Suppression of TBK1 induced apoptosis specifically in human cancer cell lines that depend on oncogenic KRAS expression. In these cells, TBK1 activated NF-kappaB anti-apoptotic signals involving c-Rel and BCL-XL (also known as BCL2L1) that were essential for survival, providing mechanistic insights into this synthetic lethal interaction. These observations indicate that TBK1 and NF-kappaB signalling are essential in KRAS mutant tumours, and establish a general approach for the rational identification of co-dependent pathways in cancer.},
author = {Barbie, David A and Tamayo, Pablo and Boehm, Jesse S and Kim, So Young and Moody, Susan E and Dunn, Ian F and Schinzel, Anna C and Sandy, Peter and Meylan, Etienne and Scholl, Claudia and Fr\"ohling, Stefan and Chan, Edmond M and Sos, Martin L and Michel, Kathrin and Mermel, Craig and Silver, Serena J and Weir, Barbara A and Reiling, Jan H and Sheng, Qing and Gupta, Piyush B and Wadlow, Raymond C and Le, Hanh and Hoersch, Sebastian and Wittner, Ben S and Ramaswamy, Sridhar and Livingston, David M and Sabatini, David M and Meyerson, Matthew and Thomas, Roman K and Lander, Eric S and Mesirov, Jill P and Root, David E and Gilliland, D Gary and Jacks, Tyler and Hahn, William C},
date = {2009-11-05},
doi = {10/frdz3h},
eprint = {19847166},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Barbie et al. - 2009 - Systematic RNA interference reveals that oncogenic.pdf},
issn = {1476-4687},
journaltitle = {Nature},
keywords = {Alleles,Apoptosis,bcl-X Protein,bcl-X Protein: metabolism,Cell Line; Tumor,Cell Survival,Gene Expression Profiling,Genes; Lethal,Genes; ras,Genes; ras: genetics,Humans,Lung Neoplasms,Lung Neoplasms: genetics,Lung Neoplasms: metabolism,Lung Neoplasms: pathology,Neoplasms,Neoplasms: genetics,Neoplasms: metabolism,Neoplasms: pathology,Oncogene Protein p21(ras),Oncogene Protein p21(ras): genetics,Oncogene Protein p21(ras): metabolism,Protein-Serine-Threonine Kinases,Protein-Serine-Threonine Kinases: antagonists & in,Protein-Serine-Threonine Kinases: metabolism,Proto-Oncogene Proteins c-rel,Proto-Oncogene Proteins c-rel: metabolism,RNA Interference,Signal Transduction},
number = {7269},
pages = {108-12},
title = {Systematic {{RNA}} Interference Reveals That Oncogenic {{KRAS}}-Driven Cancers Require {{TBK1}}},
volume = {462}
}
@article{Barski2007,
abstract = {Histone modifications are implicated in influencing gene expression. We have generated high-resolution maps for the genome-wide distribution of 20 histone lysine and arginine methylations as well as histone variant H2A.Z, RNA polymerase II, and the insulator binding protein CTCF across the human genome using the Solexa 1G sequencing technology. Typical patterns of histone methylations exhibited at promoters, insulators, enhancers, and transcribed regions are identified. The monomethylations of H3K27, H3K9, H4K20, H3K79, and H2BK5 are all linked to gene activation, whereas trimethylations of H3K27, H3K9, and H3K79 are linked to repression. H2A.Z associates with functional regulatory elements, and CTCF marks boundaries of histone methylation domains. Chromosome banding patterns are correlated with unique patterns of histone modifications. Chromosome breakpoints detected in T cell cancers frequently reside in chromatin regions associated with H3K4 methylations. Our data provide new insights into the function of histone methylation and chromatin organization in genome function.},
author = {Barski, Artem and Cuddapah, Suresh and Cui, Kairong and Roh, Tae-Young and Schones, Dustin E and Wang, Zhibin and Wei, Gang and Chepelev, Iouri and Zhao, Keji},
date = {2007-05-18},
doi = {10/dvv94h},
eprint = {17512414},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Barski et al. - 2007 - High-resolution profiling of histone methylations .pdf},
issn = {0092-8674},
journaltitle = {Cell},
keywords = {Chromatin,Chromatin: genetics,Chromatin: ultrastructure,Chromosome Breakage,Enhancer Elements; Genetic,Enhancer Elements; Genetic: genetics,Epigenesis; Genetic,Epigenesis; Genetic: genetics,Gene Expression Profiling,Gene Expression Profiling: methods,Gene Expression Regulation,Gene Expression Regulation: genetics,Genome; Human,Genome; Human: genetics,Histone-Lysine N-Methyltransferase,Histone-Lysine N-Methyltransferase: metabolism,Histones,Histones: genetics,Histones: metabolism,Humans,Lymphoma,Lymphoma: genetics,Methylation,Promoter Regions; Genetic,Promoter Regions; Genetic: genetics,Protein Methyltransferases,Regulatory Elements; Transcriptional,Regulatory Elements; Transcriptional: genetics,RNA Polymerase II,RNA Polymerase II: metabolism,Transcriptional Activation,Transcriptional Activation: genetics},
number = {4},
pages = {823-37},
title = {High-Resolution Profiling of Histone Methylations in the Human Genome},
volume = {129}
}
@article{Bartholome2009a,
abstract = {In order to handle and interpret the vast amounts of data produced by microarray experiments, the analysis of sets of genes with a common biological functionality has been shown to be advantageous compared to single gene analyses. Some statistical methods have been proposed to analyse the differential gene expression of gene sets in microarray experiments. However, most of these methods either require threshold values to be chosen for the analysis, or they need some reference set for the determination of significance. We present a method that estimates the number of differentially expressed genes in a gene set without requiring a threshold value for significance of genes. The method is self-contained (i.e., it does not require a reference set for comparison). In contrast to other methods which are focused on significance, our approach emphasizes the relevance of the regulation of gene sets. The presented method measures the degree of regulation of a gene set and is a useful tool to compare the induction of different gene sets and place the results of microarray experiments into the biological context. An R-package is available.},
author = {Bartholom\'e, Kilian and Kreutz, Clemens and Timmer, Jens},
date = {2009-07},
doi = {10/ftj4zb},
eprint = {19580524},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bartholomé et al. - 2009 - Estimation of gene induction enables a relevance-b.pdf},
issn = {1557-8666},
journaltitle = {Journal of computational biology : a journal of computational molecular cell biology},
keywords = {Gene Expression Profiling,Gene Expression Profiling: methods,Gene Expression Regulation,Models; Biological,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: methods},
number = {7},
pages = {959-67},
title = {Estimation of Gene Induction Enables a Relevance-Based Ranking of Gene Sets},
volume = {16}
}
@article{Bartholomew2009,
abstract = {Mesenchymal stem cells directly suppress ongoing immune responses. Through production of toleragenic cytokines, inhibition of lymphocyte proliferation, delivery of reparative and protective signals after reperfusion injury, and facilitation of hematopoietic chimerism, these cells demonstrate a wide-ranging potential for the development of multifaceted toleragenic strategies after transplantation.},
author = {Bartholomew, Amelia and Polchert, David and Szilagyi, Erzsebet and Douglas, G. W. and Kenyon, Norma},
date = {2009-05},
doi = {10/cf45q7},
file = {/Users/ryan/Documents/Zotero Library/Bartholomew et al. - 2009 - Mesenchymal stem cells in the induction of transpl.pdf},
issn = {15346080},
issue = {9 Suppl},
journaltitle = {Transplantation},
keywords = {mesenchymal stem cells,tolerance,transplantation},
pages = {S55-S57},
title = {Mesenchymal Stem Cells in the Induction of Transplantation Tolerance},
volume = {87}
}
@misc{Bekiranov2009,
author = {Bekiranov, Stefan},
date = {2009-06-15},
file = {/Users/ryan/Documents/Zotero Library/Bekiranov - 2009 - Introduction to ChIP-Seq Analysis using the SPP Pa.pdf},
keywords = {presentation},
langid = {english},
note = {Pages: 1-8},
title = {Introduction to {{ChIP}}-{{Seq Analysis}} Using the {{SPP Package}}}
}
@article{Beleites2013,
abstract = {In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show the model performance as function of the training sample size and can help to determine the sample size needed to train good classifiers. However, building a good model is actually not enough: the performance must also be proven. We discuss learning curves for typical small sample size situations with 5-25 independent samples per class. Although the classification models achieve acceptable performance, the learning curve can be completely masked by the random testing uncertainty due to the equally limited test sample size. In consequence, we determine test sample sizes necessary to achieve reasonable precision in the validation and find that 75-100 samples will usually be needed to test a good but not perfect classifier. Such a data set will then allow refined sample size planning on the basis of the achieved performance. We also demonstrate how to calculate necessary sample sizes in order to show the superiority of one classifier over another: this often requires hundreds of statistically independent test samples or is even theoretically impossible. We demonstrate our findings with a data set of ca. 2550 Raman spectra of single cells (five classes: erythrocytes, leukocytes and three tumour cell lines BT-20, MCF-7 and OCI-AML3) as well as by an extensive simulation that allows precise determination of the actual performance of the models in question.},
author = {Beleites, Claudia and Neugebauer, Ute and Bocklitz, Thomas and Krafft, Christoph and Popp, J\"urgen},
date = {2013-01-14},
doi = {10/gf8kzx},
eprint = {23265730},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Beleites et al. - 2013 - Sample size planning for classification models..pdf},
issn = {1873-4324},
issue = {June 2012},
journaltitle = {Analytica chimica acta},
keywords = {Cells; Cultured,Erythrocytes,Erythrocytes: chemistry,Erythrocytes: classification,Erythrocytes: cytology,Humans,Leukocytes,Leukocytes: chemistry,Leukocytes: classification,Leukocytes: cytology,MCF-7 Cells,Models; Theoretical,Sample Size,Spectrum Analysis; Raman},
pages = {25-33},
title = {Sample Size Planning for Classification Models},
volume = {760}
}
@article{Benjamini1995,
author = {Benjamini, Y and Hochberg, Y},
date = {1995},
eprint = {10.2307/2346101},
eprinttype = {jstor},
journaltitle = {Journal of the Royal Statistical Society. Series B \ldots{}},
keywords = {\#nosource,⛔ No DOI found},
title = {Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing}
}
@article{Benjamini1997,
abstract = {In this paper we offer a multiplicity of approaches and procedures for multiple testing problems with weights. Some rationale for incorporating weights in multiple hypotheses testing are discussed. Various type-I error-rates and different possible ...},
author = {Benjamini, Yoav and Hochberg, Yosef},
date = {1997},
doi = {10/btjgsv},
file = {/Users/ryan/Documents/Zotero Library/Benjamini and Hochberg - 1997 - Multiple Hypotheses Testing with Weights.pdf},
issn = {03036898, 14679469},
journaltitle = {Scandinavian Journal of Statistics},
keywords = {control weights,false discovery rate,family-wise error-rate,p -values,per-family error-rate,procedural weights},
number = {3},
pages = {407-418},
title = {Multiple {{Hypotheses Testing}} with {{Weights}}},
volume = {24}
}
@article{Benjamini2016,
abstract = {Scientists use high-dimensional measurement assays to detect and prioritize regions of strong signal in a spatially organized domain. Examples include finding methylation enriched genomic regions using microarrays and identifying active cortical areas using brain-imaging. The most common procedure for detecting potential regions is to group together neighboring sites where the signal passed a threshold. However, one needs to account for the selection bias induced by this opportunistic procedure to avoid diminishing effects when generalizing to a population. In this paper, we present a model and a method that permit population inference for these detected regions. In particular, we provide non-asymptotic point and confidence interval estimates for mean effect in the region, which account for the local selection bias and the non-stationary covariance that is typical of these data. Such summaries allow researchers to better compare regions of different sizes and different correlation structures. Inference is provided within a conditional one-parameter exponential family for each region, with truncations that match the constraints of selection. A secondary screening-and-adjustment step allows pruning the set of detected regions, while controlling the false-coverage rate for the set of regions that are reported. We illustrate the benefits of the method by applying it to detected genomic regions with differing DNA-methylation rates across tissue types. Our method is shown to provide superior power compared to non-parametric approaches.},
author = {Benjamini, Yuval and Taylor, Jonathan and Irizarry, Rafael A},
date = {2019-07-03},
doi = {10/ggcxjw},
issn = {0162-1459},
journaltitle = {Journal of the American Statistical Association},
keywords = {\#nosource},
number = {527},
pages = {1351-1365},
title = {Selection-{{Corrected Statistical Inference}} for {{Region Detection With High}}-{{Throughput Assays}}},
volume = {114}
}
@article{Berard2002,
abstract = {Mature T cells are produced in the thymus and released into the bloodstream in low numbers. These cells are considered to be immunologically na\"ive until such time as they encounter MHC-peptide complexes for which their T-cell receptors (TCR) have high affinity. Recognition of antigen in appropriate form, i.e. in association with costimulatory signals on the surface of professional antigen-presenting cells (APCs), leads to extensive T-cell proliferation and differentiation into effector cells. Once the infection has been cleared, it is no longer of benefit to the host to maintain high numbers of effector cells and most of the activated T cells die by apoptosis. However, a proportion of these cells survive, leaving the frequency of cells specific for the priming antigen much higher among memory T cells than that which existed among na\"ive T cells. This difference in frequency makes a major contribution to the nature of the secondary response, which is typically faster and of greater magnitude than the primary response. In addition, T cells may also carry a true `memory' of a prior response to antigen, exhibiting differences from na\"ive T cells at the single cell level. Here we provide a brief overview of the qualitative differences that have been reported to exist between na\"ive and memory T cells and evidence that memory T cells themselves are functionally heterogeneous. PHENOTYPIC DIFFERENCES BETWEEN NA\"IVE AND MEMORY T CELLS The supposition that na\"ive and memory T cells can be distinguished phenotypically is based on the notion that memory T cells retain a permanent imprint of having responded to antigen. Precise identification of memory T cells, however, remains problematic. Unlike B cells, T cells do not appear to mutate their antigen receptor genes during the course of an immune response.
Furthermore, discrimination between effector and memory T cells is accomplished Received 2 April 2002; accepted 17 April 2002. Correspondence: David F. Tough, The Edward Jenner Institute for Vaccine Research, Compton, Newbury, Berkshire RG20 7NN, UK.},
author = {Berard, Marion and Tough, David F.},
date = {2002},
doi = {10/fc8shc},
file = {/Users/ryan/Documents/Zotero Library/Berard and Tough - 2002 - Qualitative differences between naïve and memory T.pdf},
issn = {00192805},
journaltitle = {Immunology},
number = {2},
pages = {127-138},
title = {Qualitative Differences between Na\"ive and Memory {{T}} Cells},
volume = {106}
}
@article{Berest2018,
abstract = {Transcription factor (TF) activity constitutes an important readout of cellular signalling pathways and thus for assessing regulatory differences across conditions. However, current technologies lack the ability to simultaneously assess activity changes for multiple TFs and in particular to determine whether a specific TF acts as repressor or activator. To this end, we introduce a widely applicable genome-wide method diffTF to assess differential TF binding activity and classifying TFs as activator or repressor by integrating any type of genome-wide chromatin with RNA-Seq data and in-silico predicted TF binding sites (available at https://git.embl.de/grp-zaugg/diffTF). We apply diffTF to a large ATAC-Seq dataset of mutated and unmutated chronic lymphocytic leukemia and identify dozens of TFs that are differentially active. Around 40\% of them have a previously described association with CLL while \textasciitilde{}60\% constitute potentially novel TFs driving the different CLL subtypes. Finally, we validated the method experimentally using the well studied system of hematopoietic differentiation in mouse.},
author = {Berest, Ivan and Arnold, Christian and Reyes-Palomares, Armando and Palla, Giovanni and Rasmussen, Kasper Dindler and Helin, Kristian and Zaugg, Judith B.},
date = {2018},
doi = {10/ggcxjx},
file = {/Users/ryan/Documents/Zotero Library/Berest et al. - 2018 - Quantification of differential transcription facto.pdf},
journaltitle = {bioRxiv},
pages = {368498},
title = {Quantification of Differential Transcription Factor Activity and Multiomic-Based Classification into Activators and Repressors: {{diffTF}}}
}
@article{Berge2017,
author = {{Van den Berge}, Koen and Soneson, Charlotte and Robinson, Mark D and Clement, Lieven},
date = {2017},
file = {/Users/ryan/Documents/Zotero Library/Berge et al. - 2017 - A general and powerful stage-wise testing procedur.pdf},
keywords = {differential expression,differential transcript usage,rna-sequencing,stage-wise testing},
pages = {1-14},
title = {A General and Powerful Stage-Wise Testing Procedure for Differential Expression and Differential Transcript Usage}
}
@article{Berglund2017,
abstract = {Background: Autologous and allogeneic adult mesenchymal stem/stromal cells (MSCs) are increasingly being investigated for treating a wide range of clinical diseases. Allogeneic MSCs are especially attractive due to their potential to provide immediate care at the time of tissue injury or disease diagnosis. The prevailing dogma has been that allogeneic MSCs are immune privileged, but there have been very few studies that control for matched or mismatched major histocompatibility complex (MHC) molecule expression and that examine immunogenicity in vivo. Studies that control for MHC expression have reported both cell-mediated and humoral immune responses to MHC-mismatched MSCs. The clinical implications of immune responses to MHC-mismatched MSCs are still unknown. Pre-clinical and clinical studies that document the MHC haplotype of donors and recipients and measure immune responses following MSC treatment are necessary to answer this critical question. Conclusions: This review details what is currently known about the immunogenicity of allogeneic MSCs and suggests contemporary assays that could be utilized in future studies to appropriately identify and measure immune responses to MHC-mismatched MSCs.},
author = {Berglund, Alix K. and Fortier, Lisa A. and Antczak, Douglas F. and Schnabel, Lauren V.},
date = {2017-12-22},
doi = {10/ggcxjz},
file = {/Users/ryan/Documents/Zotero Library/Berglund et al. - 2017 - Immunoprivileged no more Measuring the immunogeni.pdf},
issn = {17576512},
journaltitle = {Stem Cell Research and Therapy},
keywords = {Allogeneic,Cytotoxicity,ELISPOT,Immunogenicity,Major histocompatibility complex,Mesenchymal stem cell,Microcytotoxicity,Mixed leukocyte reaction},
number = {1},
pages = {288},
title = {Immunoprivileged No More: {{Measuring}} the Immunogenicity of Allogeneic Adult Mesenchymal Stem Cells},
volume = {8}
}
@article{Berman2010,
abstract = {OBJECTIVE - To test the graft-promoting effects of mesenchymal stem cells (MSCs) in a cynomolgus monkey model of islet/bone marrow transplantation. RESEARCH DESIGN AND METHODS - Cynomolgus MSCs were obtained from iliac crest aspirate and characterized through passage 11 for phenotype, gene expression, differentiation potential, and karyotype. Allogeneic donor MSCs were cotransplanted intraportally with islets on postoperative day (POD) 0 and intravenously with donor marrow on PODs 5 and 11. Recipients were followed for stabilization of blood glucose levels, reduction of exogenous insulin requirement (EIR), C-peptide levels, changes in peripheral blood T regulatory cells, and chimerism. Destabilization of glycemia and increases in EIR were used as signs of rejection; additional intravenous MSCs were administered to test the effect on reversal of rejection. RESULTS - MSC phenotype and a normal karyotype were observed through passage 11. IL-6, IL-10, vascular endothelial growth factor, TGF-{$\beta$}, hepatocyte growth factor, and galectin-1 gene expression levels varied among donors. MSC treatment significantly enhanced islet engraftment and function at 1 month posttransplant (n = 8), as compared with animals that received islets without MSCs (n = 3). Additional infusions of donor or third-party MSCs resulted in reversal of rejection episodes and prolongation of islet function in two animals. Stable islet allograft function was associated with increased numbers of regulatory T-cells in peripheral blood. CONCLUSIONS - MSCs may provide an important approach for enhancement of islet engraftment, thereby decreasing the numbers of islets needed to achieve insulin independence. Furthermore, MSCs may serve as a new, safe, and effective antirejection therapy. \textcopyright{} 2010 by the American Diabetes Association.},
author = {Berman, Dora M. and Willman, Melissa A. and Han, Dongmei and Kleiner, Gary and Kenyon, Norman M. and Cabrera, Over and Karl, Julie A. and Wiseman, Roger W. and O'Connor, David H. and Bartholomew, Amelia M. and Kenyon, Norma S.},
date = {2010},
doi = {10/c9r6nn},
file = {/Users/ryan/Documents/Zotero Library/Berman et al. - 2010 - Mesenchymal stem cells enhance allogeneic islet en.pdf},
issn = {00121797},
journaltitle = {Diabetes},
number = {10},
pages = {2558-2568},
title = {Mesenchymal Stem Cells Enhance Allogeneic Islet Engraftment in Nonhuman Primates},
volume = {59}
}
@article{Bi2013,
abstract = {BACKGROUND: RNA-seq, a massive parallel-sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. The comparative analyses of the data require appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions.
RESULTS: We developed a novel nonparametric empirical Bayesian-based approach (NPEBseq) to model the RNA-seq data. The prior distribution of the Bayesian model is empirically estimated from the data without any parametric assumption, and hence the method is "nonparametric" in nature. Based on this model, we proposed a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential usage of exons from RNA-seq data. The evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets and comparison with three popular methods showed improved results for experiments with or without biological replicates.
CONCLUSIONS: NPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets. In addition, NPEBSeq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and R package are available at http://bioinformatics.wistar.upenn.edu/NPEBseq.},
author = {Bi, Yingtao and Davuluri, Ramana V},
date = {2013-08-27},
doi = {10/gb8vvz},
eprint = {23981227},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bi and Davuluri - 2013 - NPEBseq nonparametric empirical bayesian-based pr.pdf},
issn = {1471-2105},
journaltitle = {BMC bioinformatics},
number = {1},
pages = {262},
title = {{{NPEBseq}}: Nonparametric Empirical Bayesian-Based Procedure for Differential Expression Analysis of {{RNA}}-Seq Data.},
volume = {14}
}
@report{Bischl2012,
abstract = {Empirical analysis of statistical algorithms often demands time-consuming experiments which are best performed on high performance computing clusters. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control a batch cluster within R. It is structured around cluster versions of the well-known higher order functions Map, Reduce and Filter from functional programming. An important feature is that the state of computation is persistently available in a database. The user can query the status of jobs and then continue working with a desired subset. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends BatchJobs by letting the user define an array of jobs of the kind ``apply algorithm A to problem instance P and store results''. It is possible to associate statistical designs with parameters of algorithms and problems and therefore to systematically study their influence on the results. In general our main contributions are: (a) Portability: Both packages use a clear and well-defined interface to the batch system which makes them applicable in most high-performance computing environments. (b) Reproducibility: Every computational part has an associated seed that the user can control to ensure reproducibility even when the underlying batch system changes. (c) Efficiency: Efficiently use batch computing clusters completely within R. (d) Abstraction and good software design: The code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.},
author = {Bischl, Bernd and Lang, Michel},
date = {2012},
file = {/Users/ryan/Documents/Zotero Library/Bischl and Lang - 2012 - Computing on high performance clusters with R Pac.pdf},
institution = {{technische universit\"at dortmund}},
title = {Computing on High Performance Clusters with {{R}}: {{Packages BatchJobs}} and {{BatchExperiments}}}
}
@article{blanchetteAligningMultipleGenomic2004,
abstract = {We define a ``threaded blockset,'' which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for ``threaded blockset aligner'') builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.},
author = {Blanchette, Mathieu and Kent, W. James and Riemer, Cathy and Elnitski, Laura and Smit, Arian F. A. and Roskin, Krishna M. and Baertsch, Robert and Rosenbloom, Kate and Clawson, Hiram and Green, Eric D. and Haussler, David and Miller, Webb},
date = {2004-04-01},
doi = {10/d79h2w},
eprint = {15060014},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Blanchette et al. - 2004 - Aligning Multiple Genomic Sequences With the Threa.pdf;/Users/ryan/Zotero/storage/B94BUFKK/708.html},
issn = {1088-9051, 1549-5469},
journaltitle = {Genome Research},
langid = {english},
number = {4},
pages = {708-715},
shortjournal = {Genome Res.},
title = {Aligning {{Multiple Genomic Sequences With}} the {{Threaded Blockset Aligner}}},
volume = {14}
}
@article{Blanco2007,
abstract = {This unit describes the usage of geneid, an efficient gene-finding program that allows for the analysis of large genomic sequences, including whole mammalian chromosomes. These sequences can be partially annotated, and geneid can be used to refine this initial annotation. Training geneid is relatively easy, and parameter configurations exist for a number of eukaryotic species. Geneid produces output in a variety of standard formats. The results, thus, can be processed by a variety of software tools, including visualization programs. Geneid software is in the public domain, and it is undergoing constant development. It is easy to install and use. Exhaustive benchmark evaluations show that geneid compares favorably with other existing gene finding tools.},
author = {Blanco, Enrique and Parra, Gen\'is and Guig\'o, Roderic},
date = {2007-06},
doi = {10/b54cc7},
eprint = {18428791},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Blanco et al. - 2007 - Using geneid to identify genes..pdf},
issn = {1934-340X},
journaltitle = {Current Protocols in Bioinformatics},
keywords = {Algorithms,Base Sequence,Chromosome Mapping,Chromosome Mapping: methods,DNA,DNA: methods,Genes,Genes: genetics,Molecular Sequence Data,Sequence Alignment,Sequence Alignment: methods,Sequence Analysis},
number = {1},
pages = {Unit 4.3},
title = {Using Geneid to Identify Genes.},
volume = {Chapter 4}
}
@article{Blume2018,
abstract = {Verifying that a statistically significant result is scientifically meaningful is not only good scientific practice, it is a natural way to control the Type I error rate. Here we introduce a novel extension of the p-value\textemdash{}a second-generation p-value (p{$\delta$})\textendash{}that formally accounts for scientific relevance and leverages this natural Type I Error control. The approach relies on a pre-specified interval null hypothesis that represents the collection of effect sizes that are scientifically uninteresting or are practically null. The second-generation p-value is the proportion of data-supported hypotheses that are also null hypotheses. As such, second-generation p-values indicate when the data are compatible with null hypotheses (p{$\delta$} = 1), or with alternative hypotheses (p{$\delta$} = 0), or when the data are inconclusive (0 {$<$} p{$\delta$} {$<$} 1). Moreover, second-generation p-values provide a proper scientific adjustment for multiple comparisons and reduce false discovery rates. This is an advance for environments rich in data, where traditional p-value adjustments are needlessly punitive. Second-generation p-values promote transparency, rigor and reproducibility of scientific results by a priori specifying which candidate hypotheses are practically meaningful and by providing a more reliable statistical summary of when the data are compatible with alternative or null hypotheses.},
archivePrefix = {arXiv},
author = {Blume, Jeffrey D. and D'Agostino McGowan, Lucy and Dupont, William D. and Greevy, Robert A.},
date = {2018-03-22},
doi = {10/gc7575},
editor = {Smalheiser, Neil R.},
eprint = {1709.09333},
eprinttype = {arxiv},
file = {/Users/ryan/Documents/Zotero Library/Blume et al. - 2018 - Second-generation p-values Improved rigor, reprod.pdf},
issn = {19326203},
journaltitle = {PLoS ONE},
number = {3},
pages = {e0188299},
title = {Second-Generation p-Values: {{Improved}} Rigor, Reproducibility, \& Transparency in Statistical Analyses},
volume = {13}
}
@article{Boley2014a,
abstract = {The identification of full length transcripts entirely from short-read RNA sequencing data (RNA-seq) remains a challenge in the annotation of genomes. Here we describe an automated pipeline for genome annotation that integrates RNA-seq and gene-boundary data sets, which we call Generalized RNA Integration Tool, or GRIT. Applying GRIT to Drosophila melanogaster short-read RNA-seq, cap analysis of gene expression (CAGE) and poly(A)-site-seq data collected for the modENCODE project, we recovered the vast majority of previously annotated transcripts and doubled the total number of transcripts cataloged. We found that 20\% of protein coding genes encode multiple protein-localization signals and that, in 20-d-old adult fly heads, genes with multiple polyadenylation sites are more common than genes with alternative splicing or alternative promoters. GRIT demonstrates 30\% higher precision and recall than the most widely used transcript assembly tools. GRIT will facilitate the automated generation of high-quality genome annotations without the need for extensive manual annotation.},
author = {Boley, Nathan and Stoiber, Marcus H and Booth, Benjamin W and Wan, Kenneth H and Hoskins, Roger A and Bickel, Peter J and Celniker, Susan E and Brown, James B},
date = {2014-03-16},
doi = {10/f5zgdb},
eprint = {24633242},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Boley et al. - 2014 - Genome-guided transcript assembly by integrative a.pdf},
issn = {1546-1696},
journaltitle = {Nature biotechnology},
title = {Genome-Guided Transcript Assembly by Integrative Analysis of {{RNA}} Sequence Data.}
}
@software{boleyIrreproducibleDiscoveryRate2019,
author = {Boley, Nathan},
date = {2019-10-15T20:58:43Z},
ids = {gh-idr},
origdate = {2015-01-22T18:57:07Z},
title = {Irreproducible Discovery Rate ({{IDR}})},
url = {https://github.com/nboley/idr},
urldate = {2019-11-14}
}
@article{Bolstad2003,
abstract = {MOTIVATION: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations.
RESULTS: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably.
AVAILABILITY: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org.
SUPPLEMENTARY INFORMATION: Additional figures may be found at http://www.stat.berkeley.edu/\textasciitilde{}bolstad/normalize/index.html},
author = {Bolstad, B M and Irizarry, R A and {\AA}strand, M and Speed, T P},
date = {2003-01-22},
eprint = {12538238},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bolstad et al. - 2003 - A comparison of normalization methods for high den.pdf},
issn = {1367-4803},
journaltitle = {Bioinformatics (Oxford, England)},
keywords = {Algorithms,Calibration,Models; Genetic,Molecular Probes,Nonlinear Dynamics,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: instrumen,Oligonucleotide Array Sequence Analysis: methods,Oligonucleotide Array Sequence Analysis: standards,Quality Control,Sequence Analysis; DNA,Sequence Analysis; DNA: methods,Sequence Analysis; DNA: standards,Stochastic Processes},
number = {2},
pages = {185-93},
title = {A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias.},
volume = {19}
}
@article{Bonafede2014,
archivePrefix = {arXiv},
author = {Bonafede, Elisabetta and Picard, Franck and Viroli, Cinzia},
date = {2014},
doi = {10/f846sd},
eprint = {1410.8093v2},
eprinttype = {arxiv},
file = {/Users/ryan/Documents/Zotero Library/Bonafede et al. - 2014 - Modelling overdispersion heterogeneity in differen.pdf},
issn = {0006341X},
keywords = {hypothesis testing,mixture models,rna-seq data},
pages = {1-22},
title = {Modelling Overdispersion Heterogeneity in Differential Expression Analysis Using Mixtures}
}
@article{Boos2011,
abstract = {Cyclin-dependent kinases (CDKs) play crucial roles in promoting DNA replication and preventing rereplication in eukaryotic cells [1-4]. In budding yeast, CDKs promote DNA replication by phosphorylating two proteins, Sld2 and Sld3, which generates binding sites for pairs of BRCT repeats (breast cancer gene 1 [BRCA1] C terminal repeats) in the Dpb11 protein [5, 6]. The Sld3-Dpb11-Sld2 complex generated by CDK phosphorylation is required for the assembly and activation of the Cdc45-Mcm2-7-GINS (CMG) replicative helicase. In response to DNA replication stress, the interaction between Sld3 and Dpb11 is blocked by the checkpoint kinase Rad53 [7], which prevents late origin firing [7, 8]. Here we show that the two key CDK sites in Sld3 are conserved in the human Sld3-related protein Treslin/ticrr and are essential for DNA replication. Moreover, phosphorylation of these two sites mediates interaction with the orthologous pair of BRCT repeats in the human Dpb11 ortholog, TopBP1. Finally, we show that DNA replication stress prevents the interaction between Treslin/ticrr and TopBP1 via the Chk1 checkpoint kinase. Our results indicate that Treslin/ticrr is a genuine ortholog of Sld3 and that the Sld3-Dpb11 interaction has remained a critical nexus of S phase regulation through eukaryotic evolution.},
author = {Boos, Dominik and Sanchez-Pulido, Luis and Rappas, Mathieu and Pearl, Laurence H and Oliver, Antony W and Ponting, Chris P and Diffley, John F X},
date = {2011-07-12},
doi = {10/d754zj},
eprint = {21700459},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Boos et al. - 2011 - Regulation of DNA replication through Sld3-Dpb11 i.pdf},
issn = {1879-0445},
journaltitle = {Current biology : CB},
keywords = {Amino Acid Sequence,Cell Cycle Proteins,Cell Cycle Proteins: chemistry,Cell Cycle Proteins: metabolism,Cell Cycle Proteins: physiology,Conserved Sequence,Cyclin-Dependent Kinases,Cyclin-Dependent Kinases: chemistry,Cyclin-Dependent Kinases: physiology,DNA Replication,DNA Replication: physiology,Evolution; Molecular,Fungal Proteins,Fungal Proteins: chemistry,Fungal Proteins: metabolism,Fungal Proteins: physiology,HeLa Cells,Humans,Molecular Sequence Data,Protein Kinases,Protein Kinases: metabolism,Protein Kinases: physiology,Saccharomyces cerevisiae Proteins,Saccharomyces cerevisiae Proteins: chemistry,Saccharomyces cerevisiae Proteins: metabolism,Saccharomyces cerevisiae Proteins: physiology,Sequence Alignment,Yeasts,Yeasts: genetics},
number = {13},
pages = {1152-7},
title = {Regulation of {{DNA}} Replication through {{Sld3}}-{{Dpb11}} Interaction Is Conserved from Yeast to Humans.},
volume = {21}
}
@article{Bourgon2010,
abstract = {With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50\%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering, using filter/test pairs that are independent under the null hypothesis but correlated under the alternative, is a general approach that can substantially increase the efficiency of experiments.},
author = {Bourgon, Richard and Gentleman, Robert and Huber, Wolfgang},
date = {2010-05-25},
doi = {10/b94qj2},
eprint = {20460310},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bourgon et al. - 2010 - Independent filtering increases detection power fo.pdf},
issn = {1091-6490},
journaltitle = {Proceedings of the National Academy of Sciences of the United States of America},
keywords = {Algorithms,Biometry,Biometry: methods,Computational Biology,Genetic,Models},
number = {21},
pages = {9546-51},
title = {Independent Filtering Increases Detection Power for High-Throughput Experiments.},
volume = {107}
}
@article{Boyle2008,
abstract = {Mapping DNase I hypersensitive (HS) sites is an accurate method of identifying the location of genetic regulatory elements, including promoters, enhancers, silencers, insulators, and locus control regions. We employed high-throughput sequencing and whole-genome tiled array strategies to identify DNase I HS sites within human primary CD4+ T cells. Combining these two technologies, we have created a comprehensive and accurate genome-wide open chromatin map. Surprisingly, only 16\%-21\% of the identified 94,925 DNase I HS sites are found in promoters or first exons of known genes, but nearly half of the most open sites are in these regions. In conjunction with expression, motif, and chromatin immunoprecipitation data, we find evidence of cell-type-specific characteristics, including the ability to identify transcription start sites and locations of different chromatin marks utilized in these cells. In addition, and unexpectedly, our analyses have uncovered detailed features of nucleosome structure.},
author = {Boyle, Alan P and Davis, Sean and Shulha, Hennady P and Meltzer, Paul and Margulies, Elliott H and Weng, Zhiping and Furey, Terrence S and Crawford, Gregory E},
date = {2008-01-25},
doi = {10/fbcrk6},
eprint = {18243105},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Boyle et al. - 2008 - High-resolution mapping and characterization of op.pdf},
issn = {1097-4172},
journaltitle = {Cell},
keywords = {Algorithms,Area Under Curve,Binding Sites,CD4-Positive T-Lymphocytes,CD4-Positive T-Lymphocytes: cytology,Cell Nucleus,Cell Nucleus: metabolism,Chromatin,Chromatin Immunoprecipitation,Chromatin: genetics,Chromosome Mapping,Chromosome Mapping: methods,Chromosomes; Human,Deoxyribonuclease I,Deoxyribonuclease I: chemistry,Deoxyribonuclease I: pharmacology,Genome; Human,Genome; Human: genetics,Genome; Human: immunology,Histones,Histones: chemistry,Humans,Nucleosomes,Nucleosomes: chemistry,Oligonucleotide Array Sequence Analysis,Promoter Regions; Genetic,ROC Curve,Sensitivity and Specificity,Sequence Analysis; DNA,Transcription Factors,Transcription Factors: metabolism},
number = {2},
pages = {311-22},
title = {High-Resolution Mapping and Characterization of Open Chromatin across the Genome.},
volume = {132}
}
@article{Bray2016,
author = {Bray, Nicolas L and Pimentel, Harold and Melsted, P\'all and Pachter, Lior},
date = {2016},
doi = {10/f8nvsp},
eprint = {27043002},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Bray et al. - 2016 - Near-optimal probabilistic RNA-seq quantification.pdf},
issn = {1087-0156, 1546-1696},
journaltitle = {Nature Biotechnology},
number = {5},
pages = {525-527},
title = {Near-Optimal Probabilistic {{RNA}}-Seq Quantification},
volume = {34}
}
@article{Breese2013,
abstract = {SUMMARY: NGSUtils is a suite of software tools for manipulating data common to next-generation sequencing experiments, such as FASTQ, BED and BAM format files. These tools provide a stable and modular platform for data management and analysis. AVAILABILITY AND IMPLEMENTATION: NGSUtils is available under a BSD license and works on Mac OS X and Linux systems. Python 2.6+ and virtualenv are required. More information and source code may be obtained from the website: http://ngsutils.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.},
author = {Breese, Marcus R and Liu, Yunlong},
date = {2013-01-21},
doi = {10/ggcxj2},
eprint = {23314324},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Breese and Liu - 2013 - NGSUtils a software suite for analyzing and manip.pdf},
issn = {1367-4811},
journaltitle = {Bioinformatics (Oxford, England)},
number = {4},
pages = {494-496},
title = {{{NGSUtils}}: A Software Suite for Analyzing and Manipulating next-Generation Sequencing Datasets.},
volume = {29}
}
@article{Bresler2012,
author = {Bresler, M. and Sheehan, S. and Chan, A. H. and Song, Y. S.},
date = {2012-09-07},
doi = {10/f38qfp},
file = {/Users/ryan/Documents/Zotero Library/Bresler et al. - 2012 - Telescoper de novo assembly of highly repetitive .pdf},
issn = {1367-4803},
journaltitle = {Bioinformatics},
number = {18},
pages = {i311-i317},
title = {Telescoper: De Novo Assembly of Highly Repetitive Regions},
volume = {28}
}
@article{Caplan2017,
abstract = {Mesenchymal stem cells (MSCs) were officially named more than 25 years ago to represent a class of cells from human and mammalian bone marrow and periosteum that could be isolated and expanded in culture while maintaining their in vitro capacity to be induced to form a variety of mesodermal phenotypes and tissues. The in vitro capacity to form bone, cartilage, fat, etc., became an assay for identifying this class of multipotent cells and around which several companies were formed in the 1990s to medically exploit the regenerative capabilities of MSCs. Today, there are hundreds of clinics and hundreds of clinical trials using human MSCs with very few, if any, focusing on the in vitro multipotential capacities of these cells. Unfortunately, the fact that MSCs are called ``stem cells'' is being used to infer that patients will receive direct medical benefit, because they imagine that these cells will differentiate into regenerating tissue-producing cells. Such a stem cell treatment will presumably cure the patient of their medically relevant difficulties ranging from osteoarthritic (bone-on-bone) knees to various neurological maladies including dementia. I now urge that we change the name of MSCs to Medicinal Signaling Cells to more accurately reflect the fact that these cells home in on sites of injury or disease and secrete bioactive factors that are immunomodulatory and trophic (regenerative) meaning that these cells make therapeutic drugs in situ that are medicinal. It is, indeed, the patient's own site-specific and tissue-specific resident stem cells that construct the new tissue as stimulated by the bioactive factors secreted by the exogenously supplied MSCs.},
author = {Caplan, Arnold I.},
date = {2017-06},
doi = {10/ggcxj3},
file = {/Users/ryan/Documents/Zotero Library/Caplan - 2017 - Mesenchymal stem cells Time to change the name!.pdf},
issn = {21576580},
journaltitle = {Stem Cells Translational Medicine},
keywords = {Medicinal signaling cells,Mesenchymal stem cells,MSCs,Regenerative medicine},
number = {6},
pages = {1445-1451},
title = {Mesenchymal Stem Cells: {{Time}} to Change the Name!},
volume = {6}
}
@book{Carlson2013,
author = {Carlson, Marc and Obenchain, Valerie and Pag\`es, Herv\'e and Shannon, Paul and Tenenbaum, Dan and Morgan, Martin},
date = {2013-05-28},
file = {/Users/ryan/Documents/Zotero Library/Carlson et al. - 2013 - Intermediate R Bioconductor for Sequence Analysi.pdf},
title = {Intermediate {{R}} / {{Bioconductor}} for {{Sequence Analysis}}}
}
@article{Castellana2008,
abstract = {Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40\% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation ({$>$}99\% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13\% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.},
author = {Castellana, Natalie E and Payne, Samuel H and Shen, Zhouxin and Stanke, Mario and Bafna, Vineet and Briggs, Steven P},
date = {2008-12-30},
doi = {10/fpqs6c},
eprint = {19098097},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Castellana et al. - 2008 - Discovery and revision of Arabidopsis genes by pro.pdf},
issn = {1091-6490},
journaltitle = {Proceedings of the National Academy of Sciences of the United States of America},
keywords = {Arabidopsis,Arabidopsis Proteins,Arabidopsis Proteins: genetics,Arabidopsis: genetics,Genome; Plant,Genome; Plant: genetics,Models; Genetic,Proteome,Proteome: genetics,Proteomics,Proteomics: methods,Software},
number = {52},
pages = {21034-8},
title = {Discovery and Revision of {{Arabidopsis}} Genes by Proteogenomics.},
volume = {105}
}
@article{Caviston2011,
abstract = {Huntingtin (Htt) is a membrane-associated scaffolding protein that interacts with microtubule motors as well as actin-associated adaptor molecules. We examined a role for Htt in the dynein-mediated intracellular trafficking of endosomes and lysosomes. In HeLa cells depleted of either Htt or dynein, early, recycling, and late endosomes (LE)/lysosomes all become dispersed. Despite altered organelle localization, kinetic assays indicate only minor defects in intracellular trafficking. Expression of full-length Htt is required to restore organelle localization in Htt-depleted cells, supporting a role for Htt as a scaffold that promotes functional interactions along its length. In dynein-depleted cells, LE/lysosomes accumulate in tight patches near the cortex, apparently enmeshed by cortactin-positive actin filaments; Latrunculin B-treatment disperses these patches. Peripheral LE/lysosomes in dynein-depleted cells no longer colocalize with microtubules. Htt may be required for this off-loading, as the loss of microtubule association is not seen in Htt-depleted cells or in cells depleted of both dynein and Htt. Inhibition of kinesin-1 relocalizes peripheral LE/lysosomes induced by Htt depletion but not by dynein depletion, consistent with their detachment from microtubules upon dynein knockdown. Together, these data support a model of Htt as a facilitator of dynein-mediated trafficking that may regulate the cytoskeletal association of dynamic organelles.},
author = {Caviston, Juliane P and Zajac, Allison L and Tokito, Mariko and Holzbaur, Erika L F},
date = {2011-02-15},
doi = {10/bhxdc9},
eprint = {21169558},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Caviston et al. - 2011 - Huntingtin coordinates the dynein-mediated dynamic.pdf},
issn = {1939-4586},
journaltitle = {Molecular biology of the cell},
keywords = {Actins,Actins: metabolism,Cell Line; Tumor,Cytoskeleton,Cytoskeleton: metabolism,Dyneins,Dyneins: genetics,Dyneins: metabolism,Endosomes,Endosomes: metabolism,Gene Knockdown Techniques,Gene Knockdown Techniques: methods,HeLa Cells,Humans,Lysosome-Associated Membrane Glycoproteins,Lysosome-Associated Membrane Glycoproteins: metabo,Lysosomes,Lysosomes: metabolism,Microtubule-Associated Proteins,Microtubule-Associated Proteins: metabolism,Microtubule-Associated Proteins: physiology,Microtubules,Microtubules: metabolism,Microtubules: physiology,Molecular Motor Proteins,Molecular Motor Proteins: genetics,Molecular Motor Proteins: metabolism,Nerve Tissue Proteins,Nerve Tissue Proteins: genetics,Nerve Tissue Proteins: metabolism,Nuclear Proteins,Nuclear Proteins: genetics,Nuclear Proteins: metabolism,Organelles,Organelles: metabolism,Polymerization,Protein Transport,Protein Transport: physiology,RNA Interference},
number = {4},
pages = {478-92},
title = {Huntingtin Coordinates the Dynein-Mediated Dynamic Positioning of Endosomes and Lysosomes.},
volume = {22}
}
@article{Chabbert2015,
abstract = {\textcopyright{} 2015 The Authors. Published under the terms of the CC BY 4.0 license. We present a modified approach of chromatin immuno-precipitation followed by sequencing (ChIP-Seq), which relies on the direct ligation of molecular barcodes to chromatin fragments, thereby permitting experimental scale-up. With Bar-ChIP now enabling the concurrent profiling of multiple DNA-protein interactions, we report the simultaneous generation of 90 ChIP-Seq datasets without any robotic instrumentation. We demonstrate that application of Bar-ChIP to a panel of Saccharomyces cerevisiae chromatin-associated mutants provides a rapid and accurate genome-wide overview of their chromatin status. Additionally, we validate the utility of this technology to derive novel biological insights by identifying a role for the Rpd3S complex in maintaining H3K14 hypo-acetylation in gene bodies. We also report an association between the presence of intragenic H3K4 tri-methylation and the emergence of cryptic transcription in a Set2 mutant. Finally, we uncover a crosstalk between H3K14 acetylation and H3K4 methylation in this mutant. These results show that Bar-ChIP enables biological discovery through rapid chromatin profiling at single-nucleosome resolution for various conditions and protein modifications at once. Synopsis A new approach provides a rapid and accurate genome-wide overview of the chromatin status of multiple yeast chromatin-associated mutants at once. The simultaneous profiling of epigenetic marks in the mutants is achieved by multiplex immuno-precipitation of barcoded chromatin samples. Bar-ChIP is based on the immuno-precipitation of barcoded chromatin and permits sample multiplexing, thereby increasing the throughput of ChIP-Seq experiments. Application of the method to yeast chromatin-associated mutants enabled the concurrent generation of 90 ChIP-Seq datasets without the need for robotic instrumentation. 
The rapid chromatin profiling of the mutants at single-nucleosome resolution uncovered an association between intragenic H3K4 tri-methylation and cryptic transcription in set2{$\Delta$}.},
author = {Chabbert, Christophe D and Adjalley, Sophie H and Klaus, Bernd and Fritsch, Emilie S and Gupta, Ishaan and Pelechano, Vicent and Steinmetz, Lars M},
date = {2015},
doi = {10/f2zhr9},
eprint = {25583149},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Chabbert et al. - 2015 - A high‐throughput C h IP ‐ S eq for large‐scale ch.pdf},
issn = {1744-4292},
journaltitle = {Molecular Systems Biology},
keywords = {chip-seq,chromatin,high-throughput,histone,histone marks},
number = {1},
pages = {777},
title = {A High-throughput {{ChIP-Seq}} for Large-scale Chromatin Studies},
volume = {11}
}
@article{Chabbert2016,
abstract = {The genome-wide study of epigenetic states requires the integrative analysis of histone modification ChIP-seq data. Here, we introduce an easy-to-use analytic framework to compare profiles of enrichment in histone modifications around classes of genomic elements, e.g. transcription start sites (TSS). Our framework is available via the user-friendly R/Bioconductor package DChIPRep. DChIPRep uses biological replicate information as well as chromatin Input data to allow for a rigorous assessment of differential enrichment. DChIPRep is available for download through the Bioconductor project at http://bioconductor.org/packages/DChIPRep.},
author = {Chabbert, Christophe D. and Steinmetz, Lars M. and Klaus, Bernd},
date = {2016-04-26},
doi = {10/ggcxj4},
eprint = {27168989},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Chabbert et al. - 2016 - DChIPRep, an RBioconductor package for differenti.pdf},
issn = {2167-8359},
journaltitle = {PeerJ},
keywords = {Bioinformatics,ChiP-seq,Chromatin,Computational biology,Differential enrichment,Genomics,Histone-modifications,Statistics},
pages = {e1981},
title = {{{DChIPRep}}, an {{R}}/{{Bioconductor}} Package for Differential Enrichment Analysis in Chromatin Studies},
volume = {4}
}
@collection{chambersStatisticalModels1992,
date = {1992},
doi = {10/gf5g89},
edition = {1},
editor = {Chambers, John M. and Hastie, Trevor J.},
ids = {chambers:1992},
isbn = {978-0-203-73853-5},
keywords = {S models statistical statistics},
langid = {english},
publisher = {{Routledge}},
title = {Statistical {{Models}} in {{S}}},
url = {https://www.taylorfrancis.com/books/e/9780203738535}
}
@article{Champagne2014,
abstract = {Land cover and land use classifications from remote sensing are increasingly becoming institutionalized framework data sets for monitoring environmental change. As such, the need for robust statements of classification accuracy is critical. This paper describes a method to estimate confidence in classification model accuracy using a bootstrap approach. Using this method, it was found that classification accuracy and confidence, while closely related, can be used in complementary ways to provide additional information on map accuracy and define groups of classes and to inform the future reference sampling strategies. Overall classification accuracy increases with an increase in the number of fields surveyed, where the width of classification confidence bounds decreases. Individual class accuracies and confidence were non-linearly related to the number of fields surveyed. Results indicate that some classes can be estimated accurately and confidently with fewer numbers of samples, whereas others require larger reference data sets to achieve satisfactory results. This approach is an improvement over other approaches for estimating class accuracy and confidence as it uses repetitive sampling to produce a more realistic estimate of the range in classification accuracy and confidence that can be obtained with different reference data inputs. \textcopyright{} 2014 Published by Elsevier B.V.},
author = {Champagne, Catherine and McNairn, Heather and Daneshfar, Bahram and Shang, Jiali},
date = {2014},
doi = {10/f5v533},
file = {/Users/ryan/Documents/Zotero Library/Champagne et al. - 2014 - A bootstrap method for assessing classification ac.pdf},
issn = {1569-8432},
journaltitle = {International Journal of Applied Earth Observation and Geoinformation},
number = {1},
pages = {44-52},
title = {A Bootstrap Method for Assessing Classification Accuracy and Confidence for Agricultural Land Use Mapping in {{Canada}}},
volume = {29}
}
@article{Chang2008,
abstract = {BACKGROUND: Alternative RNA splicing greatly increases proteome diversity and thereby contributes to species- or tissue-specific functions. The possibility to study alternative splicing (AS) events on a genomic scale using splicing-sensitive microarrays, including the Affymetrix GeneChip Exon 1.0 ST microarray (exon array), has appeared very recently. However, the application of this new technology is hindered by the lack of free and user-friendly software devoted to these novel platforms.
RESULTS: In this study we present a Java-based freeware, easyExon http://microarray.ym.edu.tw/easyexon, to process, filtrate and visualize exon array data with an analysis pipeline. This tool implements the most commonly used probeset summarization methods as well as AS-orientated filtration algorithms, e.g. MIDAS and PAC, for the detection of alternative splicing events. We include a biological filtration function according to GO terms, and provide a module to visualize and interpret the selected exons and transcripts. Furthermore, easyExon can integrate with other related programs, such as Integrated Genome Browser (IGB) and Affymetrix Power Tools (APT), to make the whole analysis more comprehensive. We applied easyExon on a publicly accessible colon cancer dataset as an example to illustrate the analysis pipeline of this tool.
CONCLUSION: EasyExon can efficiently process and analyze the Affymetrix exon array data. The simplicity, flexibility and brevity of easyExon make it a valuable tool for AS event identification in genomic research.},
author = {Chang, Ting-Yu and Li, Yin-Yi and Jen, Chih-Hung and Yang, Tsun-Po and Lin, Chi-Hung and Hsu, Ming-Ta and Wang, Hsei-Wei},
date = {2008-01},
doi = {10/fqf9jm},
eprint = {18851762},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Chang et al. - 2008 - easyExon--a Java-based GUI tool for processing and.pdf},
issn = {1471-2105},
journaltitle = {BMC bioinformatics},
keywords = {Alternative Splicing,Alternative Splicing: genetics,Animals,Exons,Gene Expression Profiling,Gene Expression Profiling: methods,Humans,Information Storage and Retrieval,Information Storage and Retrieval: methods,Mice,Oligonucleotide Array Sequence Analysis,Oligonucleotide Array Sequence Analysis: methods,Rats,User-Computer Interface},
pages = {432},
title = {{{easyExon}}--a {{Java}}-Based {{GUI}} Tool for Processing and Visualization of {{Affymetrix}} Exon Array Data.},
volume = {9}
}
@article{Chen2007,
abstract = {Orthology detection is critically important for accurate functional annotation, and has been widely used to facilitate studies on comparative and evolutionary genomics. Although various methods are now available, there has been no comprehensive analysis of performance, due to the lack of a genomic-scale 'gold standard' orthology dataset. Even in the absence of such datasets, the comparison of results from alternative methodologies contains useful information, as agreement enhances confidence and disagreement indicates possible errors. Latent Class Analysis (LCA) is a statistical technique that can exploit this information to reasonably infer sensitivities and specificities, and is applied here to evaluate the performance of various orthology detection methods on a eukaryotic dataset. Overall, we observe a trade-off between sensitivity and specificity in orthology detection, with BLAST-based methods characterized by high sensitivity, and tree-based methods by high specificity. Two algorithms exhibit the best overall balance, with both sensitivity and specificity{$>$}80\%: INPARANOID identifies orthologs across two species while OrthoMCL clusters orthologs from multiple species. Among methods that permit clustering of ortholog groups spanning multiple genomes, the (automated) OrthoMCL algorithm exhibits better within-group consistency with respect to protein function and domain architecture than the (manually curated) KOG database, and the homolog clustering algorithm TribeMCL as well. By way of using LCA, we are also able to comprehensively assess similarities and statistical dependence between various strategies, and evaluate the effects of parameter settings on performance. In summary, we present a comprehensive evaluation of orthology detection on a divergent set of eukaryotic genomes, thus providing insights and guides for method selection, tuning and development for different applications. 
Many biological questions have been addressed by multiple tests yielding binary (yes/no) outcomes but no clear definition of truth, making LCA an attractive approach for computational biology.},
author = {Chen, Feng and Mackey, Aaron J and Vermunt, Jeroen K and Roos, David S},
date = {2007-01},
doi = {10/bbkkn3},
eprint = {17440619},
eprinttype = {pmid},
file = {/Users/ryan/Documents/Zotero Library/Chen et al. - 2007 - Assessing performance of orthology detection strat.pdf},
issn = {1932-6203},
journaltitle = {PloS one},
keywords = {Algorithms,Eukaryotic Cells,Genome},
number = {4},
pages = {e383},
title = {Assessing Performance of Orthology Detection Strategies Applied to Eukaryotic Genomes.},
volume = {2}
}
@article{Chen2007a,
abstract = {Background: Cowpea [Vigna unguiculata (L.) Walp.] is one of the most important food and forage legumes in the semi-arid tropics because of its ability to tolerate drought and grow on poor soils. It is cultivated mostly by poor farmers in developing countries, with 80\% of production taking place in the dry savannah of tropical West and Central Africa. Cowpea is largely an underexploited crop with relatively little genomic information available for use in applied plant breeding. The goal of the Cowpea Genomics Initiative (CGI), funded by the Kirkhouse Trust, a UK-based charitable organization, is to leverage modern molecular genetic tools for gene discovery and cowpea improvement. One aspect of the initiative is the sequencing of the gene-rich region of the cowpea genome (termed the genespace) recovered using methylation filtration technology and providing annotation and analysis of the sequence data. Description: CGKB, Cowpea Genespace/Genomics Knowledge Base, is an annotation knowledge base developed under the CGI. The database is based on information derived from 298,848 cowpea genespace sequences (GSS) isolated by methylation filtering of genomic DNA. The CGKB consists of three knowledge bases: GSS annotation and comparative genomics knowledge base, GSS enzyme and metabolic pathway knowledge base, and GSS simple sequence repeats (SSRs) knowledge base for molecular marker discovery. A homology-based approach was applied for annotations of the GSS, mainly using BLASTX against four public FASTA formatted protein databases (NCBI GenBank Proteins, UniProtKB-Swiss-Prot, UniprotKB-PIR (Protein Information Resource), and UniProtKB-TrEMBL). Comparative genome analysis was done by BLASTX searches of the cowpea GSS against four plant proteomes from Arabidopsis thaliana, Oryza sativa, Medicago truncatula, and Populus trichocarpa. 
The possible exons and introns on each cowpea GSS were predicted using the HMM-based Genscan gene prediction program and the potential domains on annotated GSS were analyzed using the HMMER package against the Pfam database. The annotated GSS were also assigned with Gene Ontology annotation terms and integrated with 228 curated plant metabolic pathways from the Arabidopsis Information Resource (TAIR) knowledge base. The UniProtKB-Swiss-Prot ENZYME database was used to assign putative enzymatic function to each GSS. Each GSS was also analyzed with the Tandem Repeat Finder (TRF) program in order to identify potential SSRs for molecular marker discovery. The raw sequence data, processed annotation, and SSR results were stored in relational tables designed in key-value pair fashion using a PostgreSQL relational database management system. The biological knowledge derived from the sequence data and processed results are represented as views or materialized views in the relational database management system. All materialized views are indexed for quick data access and retrieval. Data processing and analysis pipelines were implemented using the Perl programming language. The web interface was implemented in JavaScript and Perl CGI running on an Apache web server. The CPU intensive data processing and analysis pipelines were run on a computer cluster of more than 30 dual-processor Apple XServes. A job management system called Vela was created as a robust way to submit large numbers of jobs to the Portable Batch System (PBS). Conclusion: CGKB is an integrated and annotated resource for cowpea GSS with features of homology-based and HMM-based annotations, enzyme and pathway annotations, GO term annotation, toolkits, and a large number of other facilities to perform complex queries. The cowpea GSS, chloroplast sequences, mitochondrial sequences, retroelements, and SSR sequences are available as FASTA formatted files and downloadable at CGKB. 
\textcopyright{} 2007 Chen et al; licensee BioMed Central Ltd.},
author = {Chen, Xianfeng and Laudeman, Thomas W. and Rushton, Paul J. and Spraggins, Thomas A. and Timko, Michael P.},
date = {2007},
doi = {10/b8nt58},