Jan Christian Kässens (1,2), Lars Wienbrandt (1), David Ellinghaus (1,3)
1. Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Rosalind-Franklin-Str. 12, 24105 Kiel, Germany.
2. Haematology Lab Kiel, Klinik für Innere Medizin II, University Hospital Schleswig-Holstein, Langer Segen 8-10, 24105 Kiel, Germany.
3. Novo Nordisk Foundation Center for Protein Research, Disease Systems Biology, Faculty of Health and Medical Sciences, University of Copenhagen, Blegdamsvej 3b, 2200 Copenhagen, Denmark.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) and phenome-wide association studies (PheWAS) involving 1 million GWAS samples from dozens of population-based biobanks present a considerable computational challenge and are carried out by large scientific groups at great cost in time and personnel. Automating these processes requires highly efficient and scalable methods and software, but so far there is no workflow solution for easily processing 1 million GWAS samples.

RESULTS: Here we present BIGwas, a portable, fully automated quality control (QC) and association testing pipeline for large-scale binary and quantitative trait GWAS data provided by biobank resources. By using the Nextflow workflow framework and Singularity software container technology, BIGwas performs resource-efficient and reproducible analyses on a local computer or any high-performance compute (HPC) system with a single command, with no need to manually install a software execution environment or individual software packages. For a single-command GWAS analysis with 974,818 individuals and 92 million genetic markers, BIGwas takes ∼16 days on a small HPC system with only 7 compute nodes to perform a complete GWAS QC and association analysis protocol. Our dynamic parallelization approach enables shorter runtimes on larger HPC systems.

CONCLUSIONS: Researchers without extensive bioinformatics knowledge and with few computer resources can use BIGwas to perform multi-cohort GWAS with 1 million GWAS samples and, if desired, use it to build their own (genome-wide) PheWAS resource. BIGwas is freely available for download from http://github.com/ikmb/gwas-qc and http://github.com/ikmb/gwas-assoc.