GWASinspector: comprehensive quality control of genome-wide association study results.

Alireza Ani, Peter J van der Most, Harold Snieder, Ahmad Vaez, Ilja M Nolte

Abstract

SUMMARY: Quality control (QC) of genome-wide association study (GWAS) result files has become increasingly difficult due to advances in genomic technology. The main challenges include the continuous increase in the number of polymorphic genetic variants contained in recent GWASs and reference panels, the rising number of cohorts participating in GWAS consortia, and the inclusion of new variant types. Here, we present GWASinspector, a flexible R package for comprehensive QC of GWAS results. This package is compatible with recent imputation reference panels, handles insertion/deletion and multi-allelic variants, provides extensive QC reports and efficiently processes big data files. Reference panels covering three human genome builds (NCBI36, GRCh37 and GRCh38) are available. GWASinspector has a user-friendly design and allows easy set-up of the QC pipeline through a configuration file. In addition to checking and reporting on individual files, it can be used in preparation for a meta-analysis by testing for systematic differences between studies and generating cleaned, harmonized GWAS files. Comparison with existing GWAS QC tools shows that the main advantages of GWASinspector are its ability to handle insertion/deletion and multi-allelic variants more effectively and its relatively low memory use.
AVAILABILITY AND IMPLEMENTATION: Our package is available from the Comprehensive R Archive Network (CRAN): https://CRAN.R-project.org/package=GWASinspector. Reference datasets and a detailed tutorial can be found on the package website at http://gwasinspector.com/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2021. Published by Oxford University Press.

Year: 2021; PMID: 33416854; PMCID: PMC8034536; DOI: 10.1093/bioinformatics/btaa1084

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Recent genome-wide association studies (GWASs) use imputation reference panels based on next-generation sequencing technology. This has created a number of difficulties for the quality control (QC) of GWAS result files, a vital step of the analysis pipeline. Software packages such as GWASTools (Gogarten et al.), GWAtoolbox (Fuchsberger et al.), QCGWAS (van der Most et al.) and EasyQC (Winkler et al.) have previously been developed for this purpose. However, they do not properly address key current challenges, including the diversity of allele frequency reference panels and the inclusion of new variant types such as insertion/deletion (indel) and multi-allelic variants. Furthermore, the sheer size of both the result files and the reference panel(s) poses a problem. This issue is more pronounced in meta-analysis projects involving numerous result files from multiple sources, which calls for more time-efficient QC software. This motivated us to develop a new package for the QC of GWAS result files that addresses the above-mentioned shortcomings. GWASinspector is a feature-rich, easy-to-use package written in the R programming language. It evaluates GWAS result files and reports key QC metrics. Its main strengths are its ability to efficiently handle big data, indel and multi-allelic variants, and to generate comprehensive graphical reports. Besides QC of single files, GWASinspector can be used in large-scale consortium projects to check for systematic differences between the results reported by different cohorts and to generate cleaned, harmonized GWAS files ready for meta-analysis.

2 Implementation

GWASinspector is developed using S4 object models in R and is publicly available from the Comprehensive R Archive Network (CRAN). In addition, the website at http://GWASinspector.com provides reference databases alongside a detailed tutorial. The package is designed to be easy to use, even for users with minimal programming experience. All standard delimited text file formats are supported, either uncompressed or gzip-compressed. User options and QC parameters are controlled through a configuration file. A sample configuration file is included in the package; it is fully documented with inline comments and examples so that novice users can easily customize it. A schematic view of the package is presented in Figure 1. More details on GWASinspector features, a comparison with other packages, and sample QC reports are provided in the Supplementary Material.
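As an illustration of the input handling described above, the following minimal Python sketch (not GWASinspector's own R code) opens a result file whether it is plain text or gzip-compressed, detecting compression from the gzip magic bytes rather than the file extension, and guesses the delimiter with the standard-library `csv.Sniffer`. The function names `open_result_file` and `read_rows` are hypothetical.

```python
import csv
import gzip


def open_result_file(path):
    """Open a GWAS result file as text, whether plain or gzip-compressed.

    Compression is detected from the gzip magic bytes (0x1f 0x8b) instead of
    the file extension, because extensions are not always reliable.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt", newline="")
    return open(path, "r", newline="")


def read_rows(path):
    """Yield one dict per data row, guessing the delimiter from a sample.

    Only tab, comma, space and semicolon are considered as delimiters.
    """
    with open_result_file(path) as fh:
        sample = fh.read(4096)
        fh.seek(0)  # rewind so the header is read by DictReader
        dialect = csv.Sniffer().sniff(sample, delimiters="\t, ;")
        yield from csv.DictReader(fh, dialect=dialect)
```

Because `read_rows` is a generator, rows are streamed rather than loaded at once; a production tool would additionally need to cope with space-padded columns, which delimiter sniffing handles poorly.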
Fig 1.

Components of GWASinspector. Contributing packages for each function are named on the dashed lines and are all available from the Comprehensive R Archive Network (https://cran.r-project.org). Abbreviations: std. = standard; alt. = alternate

2.1 Methods

The validity of a GWAS result file can be compromised by accidental mix-up of columns, improper data merging, incorrect statistical analysis, duplicated records, missing data, variant imputation problems, study-level problems such as population stratification or, in the case of a meta-analysis, inconsistencies between participating studies. Strict QC procedures are therefore required. The first step checks the consistency and integrity of the files. Next, unusable data, including duplicated variants and variants lacking crucial information, are removed. The remaining data are then compared with the variant reference databases for allele and frequency matching, and (optionally) effect sizes are compared with previously published results. Harmonized marker IDs are generated from the combination of chromosome, position and variant type; these enable efficient matching with the reference datasets and the handling of multi-allelic and indel variants. GWASinspector automatically generates (i) cleaned, harmonized GWAS files; and (ii) a variety of QC reports, statistics and plots, such as variant quality distribution plots, allele frequency correlation plots, Manhattan and QQ plots, genomic control reports and between-study comparison reports. All important events are recorded in a log file to monitor every step of the analysis and to help localize possible problems.
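The harmonized-ID and allele-matching steps can be sketched in Python as below. This is an illustration under stated assumptions, not the package's implementation: the ID format `chromosome:position:type` with type `S` (single-nucleotide) or `I` (indel), the helper names and the outcome labels are all hypothetical. Because multi-allelic sites share one ID, the ID serves as a lookup key and the alleles are then compared with each reference record at that key. Palindromic A/T and C/G variants cannot be resolved from alleles alone and would need an additional allele-frequency check, which this sketch omits. The genomic inflation factor follows the standard definition: the median 1-df chi-square statistic divided by its expected median under the null (about 0.455).

```python
from statistics import NormalDist, median

# Complement table used for strand flipping (unambiguous bases only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def variant_type(a1, a2):
    """Hypothetical type code: 'S' for a single-nucleotide variant, 'I' for an indel."""
    return "S" if len(a1) == 1 and len(a2) == 1 else "I"


def harmonized_id(chrom, pos, a1, a2):
    """Build a hypothetical 'chromosome:position:type' key, e.g. '1:752566:S'."""
    chrom = str(chrom).upper()
    if chrom.startswith("CHR"):
        chrom = chrom[3:]
    return f"{chrom}:{int(pos)}:{variant_type(a1, a2)}"


def reverse_complement(allele):
    """Allele as it would be read from the opposite strand."""
    return allele.translate(COMPLEMENT)[::-1]


def match_alleles(ea, nea, ref_a1, ref_a2):
    """Compare study alleles (effect, non-effect) with a reference allele pair.

    'swapped' means the effect sign must be inverted (and the effect-allele
    frequency replaced by 1 - frequency) to align with the reference order.
    """
    study = (ea.upper(), nea.upper())
    ref = (ref_a1.upper(), ref_a2.upper())
    if study == ref:
        return "match"
    if study == ref[::-1]:
        return "swapped"
    flipped = tuple(reverse_complement(a) for a in study)
    if flipped == ref:
        return "strand_flip"
    if flipped == ref[::-1]:
        return "strand_flip_swapped"
    return "mismatch"


def genomic_control_lambda(pvalues):
    """Genomic inflation factor from two-sided p-values in (0, 1].

    Each p-value is converted to a 1-df chi-square statistic z**2, using the
    lower tail p/2 for numerical stability; the median is then divided by
    the expected median of chi-square(1), roughly 0.4549.
    """
    nd = NormalDist()
    chisq = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chisq) / expected_median
```

For example, `match_alleles("T", "C", "A", "G")` returns `"strand_flip"`, and uniformly distributed p-values give a lambda close to 1.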

2.2 Reference datasets

GWASinspector comes with a variety of prepared reference datasets covering different human genome builds (NCBI36, GRCh37 and GRCh38), different resources (HapMap, 1000G, dbSNP, HRC, UK10K and TOPMed) and, importantly, different variant types (multi-allelic and indel variants). These reference datasets are used to check both alleles and allele frequencies, ensuring that all files share the same allele configuration. We used the SQLite engine (https://www.sqlite.org) to build the reference datasets because it is fast, reliable and portable across platforms. Similarly, previously published GWAS results can be used to generate effect-size reference datasets for checking the validity of the reported data. As a running example, effect-size reference datasets for heart rate variability (HRV) measures (Nolte et al.) and blood pressure (Evangelou et al.) were prepared from data available in the GWAS Catalog (https://www.ebi.ac.uk/gwas/).
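A reference dataset of this kind can be sketched with Python's built-in `sqlite3` module. The table layout (`variants` with columns `hid`, `chr`, `pos`, `ref`, `alt`, `alt_freq`), the function names and the frequency check are illustrative assumptions, not the schema of the published GWASinspector databases. An index on the harmonized ID keeps per-variant lookups fast even with tens of millions of rows, and multi-allelic sites simply appear as several rows under the same ID.

```python
import sqlite3

# Hypothetical schema: one row per reference allele pair; multi-allelic sites
# appear as several rows sharing the same harmonized ID (hid).
SCHEMA = """
CREATE TABLE IF NOT EXISTS variants (
    hid TEXT NOT NULL,
    chr TEXT NOT NULL,
    pos INTEGER NOT NULL,
    ref TEXT NOT NULL,
    alt TEXT NOT NULL,
    alt_freq REAL
);
CREATE INDEX IF NOT EXISTS idx_variants_hid ON variants (hid);
"""


def build_reference(path, records):
    """Create or extend a reference database from (hid, chr, pos, ref, alt, alt_freq) tuples."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    with conn:  # commits all inserts as a single transaction
        conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)", records)
    return conn


def lookup(conn, hid):
    """Return every (ref, alt, alt_freq) row stored under one harmonized ID."""
    cur = conn.execute(
        "SELECT ref, alt, alt_freq FROM variants WHERE hid = ? ORDER BY alt", (hid,)
    )
    return cur.fetchall()


def freq_deviation(study_eaf, ref_alt_freq, effect_is_alt):
    """Absolute difference between the study effect-allele frequency and the
    reference frequency of that same allele."""
    expected = ref_alt_freq if effect_is_alt else 1.0 - ref_alt_freq
    return abs(study_eaf - expected)
```

Passing `":memory:"` as the path builds a throwaway in-memory database, which is convenient for testing; a file path gives a portable single-file reference that can be shipped across platforms.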

2.3 Output report files

A detailed report of the QC results is automatically saved as easy-to-read text, Excel and HTML files. The HTML version is the most complete report, as it combines the QC summary and plots in a single organized, portable file (see Supplementary Material for sample reports). In addition to a separate report for each GWAS file, a between-study comparison report is created.

2.4 System requirements

GWASinspector is a cross-platform package with minimal dependencies and can be run on a standard personal computer. However, to analyze a full-sized GWAS result file efficiently, a computer with a 64-bit operating system, an Intel Core i7 CPU or equivalent, and at least 36 GB of RAM is recommended. Inspecting a file of approximately 20 million records against a reference panel of approximately 80 million variants takes around 30 min on a high-performance computer (less if plots are skipped).

3 Usage

A demo function and sample data are provided to introduce the package and explore its features. Before a full inspection, a quick run on the first 1000 lines of a dataset can be performed to check that the pipeline is configured correctly. The package has been successfully applied to the QC of approximately 500 GWAS result files from 23 cohorts in the second meta-analysis of the Genetic Variance in Heart Rate Variability (VgHRV) consortium (Nolte et al.).
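A quick pre-run check of this kind can be sketched as follows: read only the header and the first 1000 data lines, then report missing columns and rows with the wrong number of fields. The required column names, the tab delimiter default and the function `quick_check` are hypothetical placeholders chosen for illustration, not GWASinspector's own header definitions or API.

```python
import gzip
from itertools import islice

# Hypothetical column names; adapt them to the headers your pipeline expects.
REQUIRED_COLUMNS = ("MARKER", "CHR", "POSITION", "EFFECT_ALL", "OTHER_ALL",
                    "EFFECT", "STDERR", "PVALUE")


def quick_check(path, required=REQUIRED_COLUMNS, n_lines=1000, sep="\t"):
    """Inspect only the header plus the first n_lines data lines of a file.

    Returns a list of human-readable problems; an empty list means the
    preview looked consistent.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        # islice stops reading after header + n_lines, so large files stay cheap.
        lines = [line.rstrip("\r\n") for line in islice(fh, n_lines + 1)]
    if not lines:
        return ["empty file"]
    header = lines[0].split(sep)
    problems = [f"missing column: {c}" for c in required if c not in header]
    for lineno, line in enumerate(lines[1:], start=2):
        n_fields = len(line.split(sep))
        if n_fields != len(header):
            problems.append(f"line {lineno}: expected {len(header)} fields, found {n_fields}")
    return problems
```

Since only the first `n_lines + 1` lines are read, the check finishes almost instantly even on files with tens of millions of records, which is what makes it useful as a configuration test before a full run.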
References

1.  GWAtoolbox: an R package for fast quality control and handling of genome-wide association studies meta-analysis data.

Authors:  Christian Fuchsberger; Daniel Taliun; Peter P Pramstaller; Cristian Pattaro
Journal:  Bioinformatics       Date:  2011-12-08       Impact factor: 6.937

2.  QCGWAS: A flexible R package for automated quality control of genome-wide association results.

Authors:  Peter J van der Most; Ahmad Vaez; Bram P Prins; M Loretto Munoz; Harold Snieder; Behrooz Z Alizadeh; Ilja M Nolte
Journal:  Bioinformatics       Date:  2014-01-05       Impact factor: 6.937

3.  GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies.

Authors:  Stephanie M Gogarten; Tushar Bhangale; Matthew P Conomos; Cecelia A Laurie; Caitlin P McHugh; Ian Painter; Xiuwen Zheng; David R Crosslin; David Levine; Thomas Lumley; Sarah C Nelson; Kenneth Rice; Jess Shen; Rohit Swarnkar; Bruce S Weir; Cathy C Laurie
Journal:  Bioinformatics       Date:  2012-10-10       Impact factor: 6.937

4.  Quality control and conduct of genome-wide association meta-analyses.

Authors:  Thomas W Winkler; Felix R Day; Damien C Croteau-Chonka; Andrew R Wood; Adam E Locke; Reedik Mägi; Teresa Ferreira; Tove Fall; Mariaelisa Graff; Anne E Justice; Jian'an Luan; Stefan Gustafsson; Joshua C Randall; Sailaja Vedantam; Tsegaselassie Workalemahu; Tuomas O Kilpeläinen; André Scherag; Tonu Esko; Zoltán Kutalik; Iris M Heid; Ruth J F Loos
Journal:  Nat Protoc       Date:  2014-04-24       Impact factor: 13.491

5.  Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits.

Authors:  Evangelos Evangelou; Helen R Warren; David Mosen-Ansorena; Borbala Mifsud; Raha Pazoki; He Gao; Georgios Ntritsos; Niki Dimou; Claudia P Cabrera; Ibrahim Karaman; Fu Liang Ng; Marina Evangelou; Katarzyna Witkowska; Evan Tzanis; Jacklyn N Hellwege; Ayush Giri; Digna R Velez Edwards; Yan V Sun; Kelly Cho; J Michael Gaziano; Peter W F Wilson; Philip S Tsao; Csaba P Kovesdy; Tonu Esko; Reedik Mägi; Lili Milani; Peter Almgren; Thibaud Boutin; Stéphanie Debette; Jun Ding; Franco Giulianini; Elizabeth G Holliday; Anne U Jackson; Ruifang Li-Gao; Wei-Yu Lin; Jian'an Luan; Massimo Mangino; Christopher Oldmeadow; Bram Peter Prins; Yong Qian; Muralidharan Sargurupremraj; Nabi Shah; Praveen Surendran; Sébastien Thériault; Niek Verweij; Sara M Willems; Jing-Hua Zhao; Philippe Amouyel; John Connell; Renée de Mutsert; Alex S F Doney; Martin Farrall; Cristina Menni; Andrew D Morris; Raymond Noordam; Guillaume Paré; Neil R Poulter; Denis C Shields; Alice Stanton; Simon Thom; Gonçalo Abecasis; Najaf Amin; Dan E Arking; Kristin L Ayers; Caterina M Barbieri; Chiara Batini; Joshua C Bis; Tineka Blake; Murielle Bochud; Michael Boehnke; Eric Boerwinkle; Dorret I Boomsma; Erwin P Bottinger; Peter S Braund; Marco Brumat; Archie Campbell; Harry Campbell; Aravinda Chakravarti; John C Chambers; Ganesh Chauhan; Marina Ciullo; Massimiliano Cocca; Francis Collins; Heather J Cordell; Gail Davies; Martin H de Borst; Eco J de Geus; Ian J Deary; Joris Deelen; Fabiola Del Greco M; Cumhur Yusuf Demirkale; Marcus Dörr; Georg B Ehret; Roberto Elosua; Stefan Enroth; A Mesut Erzurumluoglu; Teresa Ferreira; Mattias Frånberg; Oscar H Franco; Ilaria Gandin; Paolo Gasparini; Vilmantas Giedraitis; Christian Gieger; Giorgia Girotto; Anuj Goel; Alan J Gow; Vilmundur Gudnason; Xiuqing Guo; Ulf Gyllensten; Anders Hamsten; Tamara B Harris; Sarah E Harris; Catharina A Hartman; Aki S Havulinna; Andrew A Hicks; Edith Hofer; Albert Hofman; Jouke-Jan Hottenga; Jennifer E Huffman; Shih-Jen Hwang; Erik Ingelsson; Alan James; Rick Jansen; Marjo-Riitta Jarvelin; Roby Joehanes; Åsa Johansson; Andrew D Johnson; Peter K Joshi; Pekka Jousilahti; J Wouter Jukema; Antti Jula; Mika Kähönen; Sekar Kathiresan; Bernard D Keavney; Kay-Tee Khaw; Paul Knekt; Joanne Knight; Ivana Kolcic; Jaspal S Kooner; Seppo Koskinen; Kati Kristiansson; Zoltan Kutalik; Maris Laan; Marty Larson; Lenore J Launer; Benjamin Lehne; Terho Lehtimäki; David C M Liewald; Li Lin; Lars Lind; Cecilia M Lindgren; YongMei Liu; Ruth J F Loos; Lorna M Lopez; Yingchang Lu; Leo-Pekka Lyytikäinen; Anubha Mahajan; Chrysovalanto Mamasoula; Jaume Marrugat; Jonathan Marten; Yuri Milaneschi; Anna Morgan; Andrew P Morris; Alanna C Morrison; Peter J Munson; Mike A Nalls; Priyanka Nandakumar; Christopher P Nelson; Teemu Niiranen; Ilja M Nolte; Teresa Nutile; Albertine J Oldehinkel; Ben A Oostra; Paul F O'Reilly; Elin Org; Sandosh Padmanabhan; Walter Palmas; Aarno Palotie; Alison Pattie; Brenda W J H Penninx; Markus Perola; Annette Peters; Ozren Polasek; Peter P Pramstaller; Quang Tri Nguyen; Olli T Raitakari; Meixia Ren; Rainer Rettig; Kenneth Rice; Paul M Ridker; Janina S Ried; Harriëtte Riese; Samuli Ripatti; Antonietta Robino; Lynda M Rose; Jerome I Rotter; Igor Rudan; Daniela Ruggiero; Yasaman Saba; Cinzia F Sala; Veikko Salomaa; Nilesh J Samani; Antti-Pekka Sarin; Reinhold Schmidt; Helena Schmidt; Nick Shrine; David Siscovick; Albert V Smith; Harold Snieder; Siim Sõber; Rossella Sorice; John M Starr; David J Stott; David P Strachan; Rona J Strawbridge; Johan Sundström; Morris A Swertz; Kent D Taylor; Alexander Teumer; Martin D Tobin; Maciej Tomaszewski; Daniela Toniolo; Michela Traglia; Stella Trompet; Jaakko Tuomilehto; Christophe Tzourio; André G Uitterlinden; Ahmad Vaez; Peter J van der Most; Cornelia M van Duijn; Anne-Claire Vergnaud; Germaine C Verwoert; Veronique Vitart; Uwe Völker; Peter Vollenweider; Dragana Vuckovic; Hugh Watkins; Sarah H Wild; Gonneke Willemsen; James F Wilson; Alan F Wright; Jie Yao; Tatijana Zemunik; Weihua Zhang; John R Attia; Adam S Butterworth; Daniel I Chasman; David Conen; Francesco Cucca; John Danesh; Caroline Hayward; Joanna M M Howson; Markku Laakso; Edward G Lakatta; Claudia Langenberg; Olle Melander; Dennis O Mook-Kanamori; Colin N A Palmer; Lorenz Risch; Robert A Scott; Rodney J Scott; Peter Sever; Tim D Spector; Pim van der Harst; Nicholas J Wareham; Eleftheria Zeggini; Daniel Levy; Patricia B Munroe; Christopher Newton-Cheh; Morris J Brown; Andres Metspalu; Adriana M Hung; Christopher J O'Donnell; Todd L Edwards; Bruce M Psaty; Ioanna Tzoulaki; Michael R Barnes; Louise V Wain; Paul Elliott; Mark J Caulfield
Journal:  Nat Genet       Date:  2018-09-17       Impact factor: 41.307

6.  Genetic loci associated with heart rate variability and their effects on cardiac disease risk.

Authors:  Ilja M Nolte; M Loretto Munoz; Vinicius Tragante; Azmeraw T Amare; Rick Jansen; Ahmad Vaez; Benedikt von der Heyde; Christy L Avery; Joshua C Bis; Bram Dierckx; Jenny van Dongen; Stephanie M Gogarten; Philippe Goyette; Jussi Hernesniemi; Ville Huikari; Shih-Jen Hwang; Deepali Jaju; Kathleen F Kerr; Alexander Kluttig; Bouwe P Krijthe; Jitender Kumar; Sander W van der Laan; Leo-Pekka Lyytikäinen; Adam X Maihofer; Arpi Minassian; Peter J van der Most; Martina Müller-Nurasyid; Michel Nivard; Erika Salvi; James D Stewart; Julian F Thayer; Niek Verweij; Andrew Wong; Delilah Zabaneh; Mohammad H Zafarmand; Abdel Abdellaoui; Sulayma Albarwani; Christine Albert; Alvaro Alonso; Foram Ashar; Juha Auvinen; Tomas Axelsson; Dewleen G Baker; Paul I W de Bakker; Matteo Barcella; Riad Bayoumi; Rob J Bieringa; Dorret Boomsma; Gabrielle Boucher; Annie R Britton; Ingrid Christophersen; Andrea Dietrich; George B Ehret; Patrick T Ellinor; Markku Eskola; Janine F Felix; John S Floras; Oscar H Franco; Peter Friberg; Maaike G J Gademan; Mark A Geyer; Vilmantas Giedraitis; Catharina A Hartman; Daiane Hemerich; Albert Hofman; Jouke-Jan Hottenga; Heikki Huikuri; Nina Hutri-Kähönen; Xavier Jouven; Juhani Junttila; Markus Juonala; Antti M Kiviniemi; Jan A Kors; Meena Kumari; Tatiana Kuznetsova; Cathy C Laurie; Joop D Lefrandt; Yong Li; Yun Li; Duanping Liao; Marian C Limacher; Henry J Lin; Cecilia M Lindgren; Steven A Lubitz; Anubha Mahajan; Barbara McKnight; Henriette Meyer Zu Schwabedissen; Yuri Milaneschi; Nina Mononen; Andrew P Morris; Mike A Nalls; Gerjan Navis; Melanie Neijts; Kjell Nikus; Kari E North; Daniel T O'Connor; Johan Ormel; Siegfried Perz; Annette Peters; Bruce M Psaty; Olli T Raitakari; Victoria B Risbrough; Moritz F Sinner; David Siscovick; Johannes H Smit; Nicholas L Smith; Elsayed Z Soliman; Nona Sotoodehnia; Jan A Staessen; Phyllis K Stein; Adrienne M Stilp; Katarzyna Stolarz-Skrzypek; Konstantin Strauch; Johan Sundström; Cees A Swenne; Ann-Christine Syvänen; Jean-Claude Tardif; Kent D Taylor; Alexander Teumer; Timothy A Thornton; Lesley E Tinker; André G Uitterlinden; Jessica van Setten; Andreas Voss; Melanie Waldenberger; Kirk C Wilhelmsen; Gonneke Willemsen; Quenna Wong; Zhu-Ming Zhang; Alan B Zonderman; Daniele Cusi; Michele K Evans; Halina K Greiser; Pim van der Harst; Mohammad Hassan; Erik Ingelsson; Marjo-Riitta Järvelin; Stefan Kääb; Mika Kähönen; Mika Kivimaki; Charles Kooperberg; Diana Kuh; Terho Lehtimäki; Lars Lind; Caroline M Nievergelt; Chris J O'Donnell; Albertine J Oldehinkel; Brenda Penninx; Alexander P Reiner; Harriëtte Riese; Arie M van Roon; John D Rioux; Jerome I Rotter; Tamar Sofer; Bruno H Stricker; Henning Tiemeier; Tanja G M Vrijkotte; Folkert W Asselbergs; Bianca J J M Brundel; Susan R Heckbert; Eric A Whitsel; Marcel den Hoed; Harold Snieder; Eco J C de Geus
Journal:  Nat Commun       Date:  2017-06-14       Impact factor: 17.694