Cuiping Pan (1,2), Gregory McInnes (1,3), Nicole Deflaux (4,5), Michael Snyder (2,3), Jonathan Bingham (4,5), Somalee Datta (1,3), Philip S Tsao (1,6).

1. VA Palo Alto Health Care System, Palo Alto Epidemiology Research and Information Center for Genomics, CA 94304, USA.
2. Department of Genetics.
3. Stanford Center for Genomics and Personalized Medicine, Stanford University, CA 94305, USA.
4. Google, Mountain View, CA 94043, USA.
5. Verily Life Sciences, South San Francisco, CA 94080, USA.
6. Division of Cardiovascular Medicine, Stanford University, Stanford, CA 94305, USA.
Abstract
MOTIVATION: Large-scale genomic sequencing is now widely used to answer questions in diverse realms such as biological function, human diseases, evolution, ecosystems, and agriculture. Given the quantity and diversity of these data, a robust and scalable solution for data handling and analysis is needed.

RESULTS: We present interactive analytics using a cloud-based columnar database built on Dremel to perform information compression, comprehensive quality control, and biological information retrieval on large volumes of genomic data. We demonstrate that such Big Data computing paradigms can provide orders-of-magnitude faster turnaround for common genomic analyses, transforming long-running batch jobs submitted via a Linux shell into questions that can be asked from a web browser in seconds. Using this method, we assessed a study population of 475 deeply sequenced human genomes for genomic call rate, genotype and allele frequency distributions, variant density across the genome, and pharmacogenomic information.

AVAILABILITY AND IMPLEMENTATION: Our analysis framework is implemented in Google Cloud Platform and BigQuery. Code is available at https://github.com/StanfordBioinformatics/mvp_aaa_codelabs.

CONTACT: cuiping@stanford.edu or ptsao@stanford.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Published by Oxford University Press 2017. This work was written by US Government employees and is in the public domain in the US.
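The abstract lists per-sample and per-site summaries (call rate, genotype and allele frequency distributions, variant density) that were computed interactively as queries in BigQuery. As a rough illustration of what those aggregations compute, here is a minimal in-memory Python sketch. It is not the authors' SQL or their table schema: it assumes a hypothetical flattened table with one row per sample per site and no-calls stored explicitly as `genotype=None`, and the names `Call`, `call_rate_per_sample`, `allele_frequency`, `genotype_counts`, `variant_density`, and `EXAMPLE_CALLS` are illustrative only. Each function corresponds loosely to a GROUP BY over sample or site in a columnar query.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Site = Tuple[str, int]  # (reference_name, start)


@dataclass(frozen=True)
class Call:
    """One row of a HYPOTHETICAL flattened call table: one sample at one site."""
    sample_id: str
    reference_name: str
    start: int
    genotype: Optional[Tuple[int, ...]]  # e.g. (0, 1); None means no call


def call_rate_per_sample(calls) -> Dict[str, float]:
    """Fraction of each sample's rows that carry a genotype (cf. GROUP BY sample)."""
    called, total = Counter(), Counter()
    for c in calls:
        total[c.sample_id] += 1
        if c.genotype is not None:
            called[c.sample_id] += 1
    return {s: called[s] / total[s] for s in total}


def allele_frequency(calls) -> Dict[Site, float]:
    """Alternate-allele frequency per site, counting only called alleles."""
    alt, total = Counter(), Counter()
    for c in calls:
        if c.genotype is None:
            continue
        site = (c.reference_name, c.start)
        total[site] += len(c.genotype)
        alt[site] += sum(1 for a in c.genotype if a > 0)
    return {site: alt[site] / total[site] for site in total}


def genotype_class(gt: Tuple[int, ...]) -> str:
    """Classify a called genotype as hom_ref, het, or hom_alt."""
    if all(a == 0 for a in gt):
        return "hom_ref"
    if len(set(gt)) == 1:
        return "hom_alt"
    return "het"


def genotype_counts(calls) -> Dict[Site, Counter]:
    """Genotype-class counts per site, the basis of a genotype frequency distribution."""
    counts = defaultdict(Counter)
    for c in calls:
        if c.genotype is not None:
            counts[(c.reference_name, c.start)][genotype_class(c.genotype)] += 1
    return dict(counts)


def variant_density(calls, window: int) -> Dict[Tuple[str, int], int]:
    """Distinct variant sites (any sample carries an alt allele) per fixed-size window."""
    variant_sites = {
        (c.reference_name, c.start)
        for c in calls
        if c.genotype is not None and any(a > 0 for a in c.genotype)
    }
    return dict(Counter((chrom, start // window) for chrom, start in variant_sites))


# Hypothetical toy data: two samples, four sites, one explicit no-call.
EXAMPLE_CALLS = [
    Call("S1", "chr1", 100, (0, 1)),
    Call("S2", "chr1", 100, (1, 1)),
    Call("S1", "chr1", 1500, (0, 0)),
    Call("S2", "chr1", 1500, None),
    Call("S1", "chr1", 1800, (0, 1)),
    Call("S2", "chr1", 1800, (0, 0)),
    Call("S1", "chr2", 200, (0, 1)),
    Call("S2", "chr2", 200, (0, 0)),
]

if __name__ == "__main__":
    print(call_rate_per_sample(EXAMPLE_CALLS))  # {'S1': 1.0, 'S2': 0.75}
    print(allele_frequency(EXAMPLE_CALLS))
    print(variant_density(EXAMPLE_CALLS, window=2000))
```

Each metric is a single pass that groups rows by a key, which is the kind of aggregation a columnar engine can scan and parallelize. Real tables derived from gVCFs, with reference-matching blocks or multi-allelic sites, need more careful handling than this sketch attempts.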