Alexandros Kanterakis (alexandros.kanterakis@gmail.com), Patrick Deelen (patrickdeelen@gmail.com), Freerk van Dijk (f.van.dijk02@umcg.nl), Heorhiy Byelas (h.v.byelas01@umcg.nl), Martijn Dijkstra (m.dijkstra@umcg.nl), Morris A Swertz (m.a.swertz@rug.nl).
Affiliation (all authors): Department of Genetics, Genomics Coordination Center, University Medical Center Groningen and University of Groningen, Genetics, UMCG, PO Box 30 001, 9700 RB, Groningen, The Netherlands.
Abstract
BACKGROUND: Genotype imputation is an important procedure in current genomic analyses such as genome-wide association studies, meta-analyses and fine mapping. Although high quality tools are available that perform the individual steps of this process, considerable effort and expertise are required to set up and run a best practice imputation pipeline, particularly for larger genotype datasets, where imputation has to scale out in parallel on computer clusters.
RESULTS: Here we present MOLGENIS-impute, an 'imputation in a box' solution that seamlessly and transparently automates the setup and running of all the steps of the imputation process. These steps include genome build liftover, genotype phasing with SHAPEIT2, quality control, sample and chromosomal chunking/merging, and imputation with IMPUTE2. MOLGENIS-impute builds on MOLGENIS-compute, a simple pipeline management platform for submission and monitoring of bioinformatics tasks in High Performance Computing (HPC) environments such as local/cloud servers, clusters and grids. All the required tools, data and scripts are downloaded and installed in a single step. Researchers with diverse backgrounds and expertise have tested MOLGENIS-impute at different locations and have imputed over 30,000 samples so far, using the 1000 Genomes Project and the new Genome of the Netherlands data as imputation references. The tests have been performed on PBS/SGE clusters, cloud VMs and in a grid HPC environment.
CONCLUSIONS: MOLGENIS-impute gives priority to the ease of setting up, configuring and running an imputation. It has minimal dependencies and wraps the pipeline in a simple command line interface, without sacrificing the flexibility to adapt the pipeline or limiting the options of the underlying imputation tools. It does not require knowledge of a workflow system or programming, and is targeted at researchers who just want to apply best practices in imputation via simple commands. Because it is built on the MOLGENIS-compute workflow framework, it can be customized with additional computational steps or included in other bioinformatics pipelines. It is available as open source from: https://github.com/molgenis/molgenis-imputation.
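The "sample and chromosomal chunking/merging" step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the MOLGENIS-impute implementation: the function names (`chromosome_chunks`, `sample_batches`, `plan_jobs`), the 5 Mb default window and the job tuple layout are all assumptions made for illustration. The idea is only that a chromosome is cut into fixed-size windows, the samples are cut into batches, and every (batch, window) pair becomes one independent job that a cluster can run in parallel before the results are merged back in order.

```python
from typing import Iterator, List, Sequence, Tuple


def chromosome_chunks(chrom_length: int,
                      chunk_size: int = 5_000_000) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows that cover a chromosome.

    The 5 Mb default is an illustrative assumption, not a value taken
    from MOLGENIS-impute.
    """
    if chrom_length < 1 or chunk_size < 1:
        raise ValueError("chrom_length and chunk_size must be positive")
    for start in range(1, chrom_length + 1, chunk_size):
        yield start, min(start + chunk_size - 1, chrom_length)


def sample_batches(samples: Sequence[str], batch_size: int) -> List[List[str]]:
    """Split the sample list into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [list(samples[i:i + batch_size])
            for i in range(0, len(samples), batch_size)]


def plan_jobs(chrom_length: int, samples: Sequence[str],
              chunk_size: int, batch_size: int) -> List[Tuple[int, int, int]]:
    """Return one hypothetical job (batch_index, start, end) per batch/window pair.

    Jobs are ordered by batch and then by window, so merging the results
    per batch is a simple concatenation in list order.
    """
    windows = list(chromosome_chunks(chrom_length, chunk_size))
    batches = sample_batches(samples, batch_size)
    return [(batch_index, start, end)
            for batch_index in range(len(batches))
            for start, end in windows]
```

For example, `plan_jobs(12, ["a", "b", "c"], chunk_size=5, batch_size=2)` plans six jobs: three windows for each of two sample batches. Because the jobs do not depend on one another, a scheduler is free to run them in any order.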