
RAREMETAL: fast and powerful meta-analysis for rare variants.

Shuang Feng, Dajiang Liu, Xiaowei Zhan, Mary Kate Wing, Gonçalo R Abecasis.

Abstract

SUMMARY: RAREMETAL is a computationally efficient tool for meta-analysis of rare variants genotyped using sequencing or arrays. RAREMETAL facilitates analyses of individual studies, accommodates a variety of input file formats, handles related and unrelated individuals, executes both single variant and burden tests and performs conditional association analyses.
AVAILABILITY AND IMPLEMENTATION: http://genome.sph.umich.edu/wiki/RAREMETAL for executables, source code, documentation and tutorial.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 24894501      PMCID: PMC4173011          DOI: 10.1093/bioinformatics/btu367

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

In genomewide association studies, meta-analysis has been key in establishing association between common variants and complex traits (Willer ). Recent advances in exome sequencing and the development of exome genotyping arrays are enabling complex disease studies to explore association between rare variants of clear functional consequence and complex traits. For these rare variants, single variant tests can lack power, and association tests that group rare variants by gene or functional unit are favored (Li and Leal, 2008; Lin and Tang, 2011; Madsen and Browning, 2009; Price ; Wu ). Here, we describe a tool for meta-analysis of rare variant association studies for quantitative traits. Our tool enables individual studies to account for study-specific covariates as well as family and population structure. In addition, it generates summaries of linkage disequilibrium information that allow association tests for groups of rare variants during meta-analysis.

2 METHODS

The key idea in our implementation is that gene-level test statistics can be reconstructed from single variant score statistics and that, when the linkage disequilibrium relationships between variants are known, the distribution of gene-level statistics can be derived to evaluate significance. Several other tools to support rare variant meta-analysis are now available (Lee ; Lumley ; Tang and Lin, 2013; Voorman ). We have tried to complement these tools by adding support for modeling of related individuals and the X chromosome, additional QC statistics, directly using compressed files to facilitate sharing and implementing conditional analyses to disentangle the contributions of nearby variants, common or rare.
RAREMETAL works in two steps. The first step, implemented in RAREMETALWORKER (RMW), analyzes individual studies and generates summary statistics that can later be combined across studies. This step can account for relatedness among individuals or hidden population structure using a variance component approach, based on either a kinship matrix estimated from pedigree (Abecasis ) or a genomic relationship matrix estimated from marker data (Kang ; Lippert ). When chromosome X is analyzed, an additional variance component is used to describe kinship for X-linked markers.
RMW tabulates single variant score statistics, which summarize evidence for association, together with covariance matrices, which summarize linkage disequilibrium relationships among variants (see our online documentation for methods: http://genome.sph.umich.edu/wiki/RAREMETALWORKER_METHOD). RMW also tabulates quality control statistics for traits and covariates (mean, standard deviation and number of phenotyped samples) and marker genotypes (Hardy–Weinberg Equilibrium P-values and genotype missing rate). These can be used to identify problematic markers and studies during meta-analysis.
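As an illustration of the per-study summaries described above, the following minimal Python sketch computes single variant score statistics and their covariance matrix for the simplest possible case: unrelated individuals, a quantitative trait and no covariates. It is not the RMW implementation. The function name, the data layout and the scaling convention (scores left unscaled, covariance multiplied by the residual variance) are assumptions made for this example, and the variance component models for related samples are documented only on the RAREMETALWORKER_METHOD page.

```python
import math

def score_statistics(genotypes, phenotype):
    """Per-study score vector U and covariance matrix V (illustrative only).

    genotypes: one list per individual, holding allele counts per variant.
    phenotype: one trait value per individual.
    Assumes unrelated samples and an intercept-only null model.
    """
    n = len(phenotype)
    m = len(genotypes[0])
    y_mean = sum(phenotype) / n
    resid = [y - y_mean for y in phenotype]           # residuals under the null
    sigma2 = sum(e * e for e in resid) / (n - 1)      # residual variance estimate
    g_mean = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - g_mean[j] for j in range(m)] for row in genotypes]
    # U_j = sum_i (g_ij - mean_j) * e_i
    U = [sum(centered[i][j] * resid[i] for i in range(n)) for j in range(m)]
    # V_jk = sigma2 * sum_i (g_ij - mean_j) * (g_ik - mean_k)
    V = [[sigma2 * sum(centered[i][j] * centered[i][k] for i in range(n))
          for k in range(m)] for j in range(m)]
    return U, V

# Toy data: four individuals, two variants.
toy_genotypes = [[0, 0], [1, 0], [2, 1], [0, 1]]
toy_phenotype = [1.0, 2.0, 3.0, 2.0]
U, V = score_statistics(toy_genotypes, toy_phenotype)
# Single variant z-scores follow from the diagonal of V.
z_scores = [U[j] / math.sqrt(V[j][j]) for j in range(len(U))]
```

The off-diagonal entries of V carry the linkage disequilibrium information that gene-level tests need, which is why sharing V alongside U is sufficient for the second step.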
Meta-analysis is implemented in a separate tool, RAREMETAL, which calculates gene-level burden tests (either weighted or unweighted), variable frequency threshold tests and sequence kernel association tests (SKAT) (Liu ). Key formulae can be found in our online documentation (http://genome.sph.umich.edu/wiki/RAREMETAL_METHOD). RAREMETAL can also use variance–covariance matrices to perform conditional analyses that distinguish true signals from the shadows of significant variants nearby.
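The sketch below illustrates, under simplifying assumptions, how shared score statistics and covariance matrices can support both a burden test and a conditional test at the meta-analysis stage. It is not RAREMETAL code. It assumes that every study reports the same variants in the same order with the same reference allele, that per-study summaries can simply be summed, and that conditioning is on a single variant (the general case replaces the scalar division with a matrix inverse). All function and variable names are hypothetical; the exact formulae used by the tool are on the RAREMETAL_METHOD page.

```python
import math

def combine_studies(scores, covs):
    """Sum per-study score vectors and covariance matrices element-wise."""
    m = len(scores[0])
    U = [sum(s[j] for s in scores) for j in range(m)]
    V = [[sum(c[j][k] for c in covs) for k in range(m)] for j in range(m)]
    return U, V

def normal_p(z):
    """Two-sided P-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def burden_test(U, V, weights):
    """Burden statistic z = w'U / sqrt(w'Vw) for one group of variants."""
    m = len(U)
    num = sum(weights[j] * U[j] for j in range(m))
    var = sum(weights[j] * V[j][k] * weights[k]
              for j in range(m) for k in range(m))
    z = num / math.sqrt(var)
    return z, normal_p(z)

def conditional_test(U, V, target, cond):
    """Score test for variant `target` after conditioning on variant `cond`."""
    u = U[target] - V[target][cond] / V[cond][cond] * U[cond]
    v = V[target][target] - V[target][cond] ** 2 / V[cond][cond]
    if v <= 0:
        raise ValueError("target is not distinguishable from the conditioning variant")
    z = u / math.sqrt(v)
    return z, normal_p(z)

# Toy summaries from two studies for a gene with three variants.
study_scores = [[2.0, 1.0, 0.5], [1.0, 2.0, 1.5]]
study_covs = [
    [[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]],
    [[5.0, 0.5, 0.0], [0.5, 4.0, 1.0], [0.0, 1.0, 3.0]],
]
U_meta, V_meta = combine_studies(study_scores, study_covs)
z_burden, p_burden = burden_test(U_meta, V_meta, [1.0, 1.0, 1.0])
z_cond, p_cond = conditional_test(U_meta, V_meta, target=1, cond=0)
```

Because the weights enter only at this stage, the same shared summaries can be reused with different weights or variant groupings without returning to individual-level data.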

3 RESULTS

One of our primary considerations in RAREMETAL was support for standard, easy-to-implement input formats. RMW uses Merlin format input files (Abecasis ) to retrieve phenotypes, covariates and family structure and VCF files to retrieve genotypes (Danecek ). Checks are implemented for a variety of problems in input files, including formatting errors, X-linked genotypes that are inconsistent with reported sex and mismatched identifiers across files. RMW and RAREMETAL are implemented in C++. Source code and binary executables are available from our web site. For convenience, input and output files can be processed directly in GZIP format. We have also tested compilation on several Linux, Mac OS X and Windows platforms.

3.1 Usage

RMW and RAREMETAL runs can be customized through command line parameters. These allow users to specify whether phenotypes should be quantile normalized, whether covariates should be modeled, whether population and/or family structure should be controlled using variance components, the size of linkage disequilibrium matrices to be shared (customized through a window size parameter) and boundaries between pseudo-autosomal and sex-linked regions of the X chromosome. A unique feature of RAREMETAL is the ability to customize variant groupings for gene-level statistics at the meta-analysis stage, after individual studies are analyzed. RAREMETAL generates separate reports for each gene-level test with detailed information. QQ and Manhattan plots can be generated by RMW and RAREMETAL directly (see Fig. 1 for example).
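To make the quantile normalization option concrete, here is a minimal Python sketch of a rank-based inverse normal transformation, one common way to quantile normalize a phenotype. The text above does not specify which convention RMW uses, so the rank offset of 0.5 and the averaging of tied ranks are assumptions of this example, and the function name is hypothetical.

```python
from statistics import NormalDist

def inverse_normal(values):
    """Map values to standard normal quantiles by rank (ties share an average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    # The 0.5 offset keeps every quantile strictly inside (0, 1).
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]

# Toy phenotype with three individuals.
transformed = inverse_normal([3.2, 1.1, 7.5])
```

The transformed values keep the original ordering of individuals while following a standard normal distribution, which limits the influence of outlying trait values on score statistics.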
Fig. 1. Automatically generated QQ and Manhattan plots by RAREMETAL and RMW. (a) Manhattan plot from single variant analysis. (b) Manhattan plot from gene-level burden tests.

RAREMETAL is already being used in large meta-analyses of rare variants for a variety of traits, ranging from blood lipid levels and anthropometric traits to smoking and drinking.

3.2 Performance

Using RMW, generating per study statistics in a recent analysis of exome array genotypes at 238 000 markers in 2000 individuals required between ∼9.1 min (unrelated samples) and ∼26.8 min (using genomic relationship). Using RAREMETAL, meta-analysis of 23 studies (sample size of ∼51 000) required ∼40 min to produce single variant and all available gene-level association test results across ∼18 000 genes.

3.3 Comparison to other tools

When analyzing ∼6000 unrelated individuals at ∼100 000 markers, RMW/RAREMETAL provides a speed improvement of ∼600-fold compared with SCORESEQ/MASS (Tang and Lin, 2013). This difference in speed increases with sample size and number of studies. The R package metaSKAT (Lee ) provides comparably fast computations but does not provide a variable threshold test. An important difference between RAREMETAL and these published tools is the ability to use linear mixed models to account for sample relatedness and/or population structure. Even when using linear mixed models, RMW can handle large datasets: a mixed model analysis of 10 000 individuals at 238 000 markers required 6.1 h and 2 GB of memory, and with 12 GB of memory RMW was able to analyze 23 000 individuals in <5 days. Other features that distinguish RAREMETAL from other published tools are the flexibility to change gene definitions and grouping strategies after individual studies have been analyzed and the ability to perform conditional meta-analysis. In contrast to popular single variant meta-analysis methods, such as those implemented in METAL (Willer ), our new approach is expected to provide more power for analysis of rare variants (Liu ). We hope RAREMETAL will accelerate the discovery of trait-associated rare variants, leading to insights into human biology.
REFERENCES

1.  Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data.

Authors:  Bingshan Li; Suzanne M Leal
Journal:  Am J Hum Genet       Date:  2008-08-07       Impact factor: 11.025

2.  General framework for meta-analysis of rare variants in sequencing association studies.

Authors:  Seunggeun Lee; Tanya M Teslovich; Michael Boehnke; Xihong Lin
Journal:  Am J Hum Genet       Date:  2013-06-13       Impact factor: 11.025

3.  Variance component model to account for sample structure in genome-wide association studies.

Authors:  Hyun Min Kang; Jae Hoon Sul; Susan K Service; Noah A Zaitlen; Sit-Yee Kong; Nelson B Freimer; Chiara Sabatti; Eleazar Eskin
Journal:  Nat Genet       Date:  2010-03-07       Impact factor: 38.330

4.  Rare-variant association testing for sequencing data with the sequence kernel association test.

Authors:  Michael C Wu; Seunggeun Lee; Tianxi Cai; Yun Li; Michael Boehnke; Xihong Lin
Journal:  Am J Hum Genet       Date:  2011-07-07       Impact factor: 11.025

5.  A general framework for detecting disease associations with rare variants in sequencing studies.

Authors:  Dan-Yu Lin; Zheng-Zheng Tang
Journal:  Am J Hum Genet       Date:  2011-09-01       Impact factor: 11.025

6.  FaST linear mixed models for genome-wide association studies.

Authors:  Christoph Lippert; Jennifer Listgarten; Ying Liu; Carl M Kadie; Robert I Davidson; David Heckerman
Journal:  Nat Methods       Date:  2011-09-04       Impact factor: 28.547

7.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

8.  MASS: meta-analysis of score statistics for sequencing studies.

Authors:  Zheng-Zheng Tang; Dan-Yu Lin
Journal:  Bioinformatics       Date:  2013-05-21       Impact factor: 6.937

9.  Meta-analysis of gene-level tests for rare variant association.

Authors:  Dajiang J Liu; Gina M Peloso; Xiaowei Zhan; Oddgeir L Holmen; Matthew Zawistowski; Shuang Feng; Majid Nikpay; Paul L Auer; Anuj Goel; He Zhang; Ulrike Peters; Martin Farrall; Marju Orho-Melander; Charles Kooperberg; Ruth McPherson; Hugh Watkins; Cristen J Willer; Kristian Hveem; Olle Melander; Sekar Kathiresan; Gonçalo R Abecasis
Journal:  Nat Genet       Date:  2013-12-15       Impact factor: 38.330

10.  A groupwise association test for rare mutations using a weighted sum statistic.

Authors:  Bo Eskerod Madsen; Sharon R Browning
Journal:  PLoS Genet       Date:  2009-02-13       Impact factor: 5.917

