
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, Andrew Kernytsky, Kiran Garimella, David Altshuler, Stacey Gabriel, Mark Daly, Mark A DePristo.

Abstract

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
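The separation the abstract describes, between analysis calculations and data management, can be illustrated with a minimal sketch of a coverage calculator written in the map/reduce style. This is not the GATK API: the `Read` tuple, `loci_with_pileups`, `map_depth`, `reduce_sum`, and `mean_coverage` are hypothetical names chosen for illustration, and Python is used only for brevity. The traversal function stands in for the framework's data access layer, while the map and reduce functions hold the analysis-specific logic.

```python
from functools import reduce
from typing import Iterator, List, Tuple

# Hypothetical read model (not a GATK type): (contig, 0-based start, length).
Read = Tuple[str, int, int]
Locus = Tuple[int, List[Read]]


def loci_with_pileups(reads: List[Read], contig: str,
                      start: int, end: int) -> Iterator[Locus]:
    """Framework side: yield each position in [start, end) with the reads
    overlapping it. A real framework would stream sorted, indexed reads
    instead of rescanning the list at every position."""
    for pos in range(start, end):
        pileup = [r for r in reads
                  if r[0] == contig and r[1] <= pos < r[1] + r[2]]
        yield pos, pileup


def map_depth(locus: Locus) -> int:
    """Analysis side, map step: depth of coverage at a single locus."""
    _pos, pileup = locus
    return len(pileup)


def reduce_sum(accumulator: int, depth: int) -> int:
    """Analysis side, reduce step: fold per-locus depths into a running total."""
    return accumulator + depth


def mean_coverage(reads: List[Read], contig: str,
                  start: int, end: int) -> float:
    """Run the map and reduce steps over an interval and return mean depth."""
    if end <= start:
        raise ValueError("interval must contain at least one position")
    depths = map(map_depth, loci_with_pileups(reads, contig, start, end))
    total = reduce(reduce_sum, depths, 0)
    return total / (end - start)


# Toy input: two overlapping reads on chr1 and one unrelated read on chr2.
example_reads: List[Read] = [("chr1", 0, 4), ("chr1", 2, 4), ("chr2", 0, 4)]
example_mean = mean_coverage(example_reads, "chr1", 0, 6)
```

Because `map_depth` looks at one locus at a time and `reduce_sum` is associative, the interval could in principle be split into shards whose partial totals are summed afterwards, which is the property that makes this style amenable to parallelization.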

Year: 2010    PMID: 20644199    PMCID: PMC2928508    DOI: 10.1101/gr.107524.110

Source DB: PubMed    Journal: Genome Res    ISSN: 1088-9051    Impact factor: 9.043


