Zhong Ren (1), Gundula Povysil (2), Joseph A Hostyk (2), Hongzhu Cui (2), Nitin Bhardwaj (2), David B Goldstein (2).
1. Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA. zhong.ren@hotmail.com.
2. Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA.
Abstract
BACKGROUND: A common approach in sequencing studies is to joint-call all samples and store their variants in a single file. When new samples are continually added or controls are reused across several studies, the cost and time required to repeat joint-calling for each analysis can become prohibitive.

RESULTS: We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per-site coverage data for all samples in a centralized database, which it queries efficiently to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility, and the modular ATAV framework can be extended as development continues. Besides helping to identify disease-causing variants for a range of diseases, ATAV has enabled the discovery of disease genes through rare-variant collapsing on datasets containing more than 20,000 samples. Analyses to date have been performed on data from more than 110,000 individuals, demonstrating the scalability of the framework. To give users easy access to variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, the general public can query summary-level data for more than 40,000 samples, representing a mix of cases and controls of diverse ancestries. Users have access to the phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser shows data from newly added samples in real time and therefore evolves rapidly as more samples are sequenced.

CONCLUSIONS: Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.
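To make the idea of rare-variant collapsing concrete, the following is a minimal sketch of a gene-level collapsing test. It is not ATAV's code or API: the abstract does not say which statistical test ATAV applies, so the sketch assumes a two-sided Fisher's exact test on a carrier/non-carrier by case/control table, which is a common choice for this kind of analysis. The function names and the `(sample_id, gene)` input layout are hypothetical, and the variants passed in are assumed to have already been filtered to "qualifying" (for example, rare and predicted damaging) variants.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls; columns are carriers/non-carriers.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x case carriers with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapse_by_gene(qualifying_variants, cases, controls):
    """Collapse qualifying variants to one carrier indicator per sample and gene.

    qualifying_variants: iterable of (sample_id, gene) pairs (hypothetical layout).
    cases, controls: disjoint sets of sample IDs.
    Returns {gene: (case_carriers, case_noncarriers,
                    control_carriers, control_noncarriers, p_value)}.
    """
    carriers = {}
    for sample, gene in qualifying_variants:
        # A sample counts once per gene, however many qualifying variants it has.
        carriers.setdefault(gene, set()).add(sample)

    results = {}
    for gene, samples in carriers.items():
        a = len(samples & cases)
        c = len(samples & controls)
        b, d = len(cases) - a, len(controls) - c
        results[gene] = (a, b, c, d, fisher_exact_two_sided(a, b, c, d))
    return results
```

Collapsing to a per-sample indicator is what lets many individually rare variants in the same gene contribute to a single test. In practice such an analysis would also account for coverage differences between cases and controls and for multiple testing across genes, which this sketch leaves out.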
Keywords: Association testing; Diagnostic; Gene discovery; Genome analysis; Web platform