SEQSpark: A Complete Analysis Tool for Large-Scale Rare Variant Association Studies Using Whole-Genome and Exome Sequence Data.

Di Zhang, Linhai Zhao, Biao Li, Zongxiao He, Gao T Wang, Dajiang J Liu, Suzanne M Leal.

Abstract

Massively parallel sequencing technologies provide great opportunities for discovering rare susceptibility variants involved in complex disease etiology via large-scale imputation and exome and whole-genome sequence-based association studies. Due to modest effect sizes, large sample sizes of tens to hundreds of thousands of individuals are required for adequately powered studies. Current analytical tools are obsolete when it comes to handling these large datasets. To facilitate the analysis of large-scale sequence-based studies, we developed SEQSpark which implements parallel processing based on Spark to increase the speed and efficiency of performing data quality control, annotation, and association analysis. To demonstrate the versatility and speed of SEQSpark, we analyzed whole-genome sequence data from the UK10K, testing for associations with waist-to-hip ratios. The analysis, which was completed in 1.5 hr, included loading data, annotation, principal component analysis, and single variant and rare variant aggregate association analysis of >9 million variants. For rare variant aggregate analysis, an exome-wide significant association (p < 2.5 × 10⁻⁶) was observed with CCDC62 (SKAT-O [p = 6.89 × 10⁻⁷], combined multivariate collapsing [p = 1.48 × 10⁻⁶], and burden of rare variants [p = 1.48 × 10⁻⁶]). SEQSpark was also used to analyze 50,000 simulated exomes and it required 1.75 hr for the analysis of a quantitative trait using several rare variant aggregate association methods. Additionally, the performance of SEQSpark was compared to Variant Association Tools and PLINK/SEQ. SEQSpark was always faster and in some situations computation was reduced to a hundredth of the time. SEQSpark will empower large sequence-based epidemiological studies to quickly elucidate genetic variation involved in the etiology of complex traits.
Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
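
As an illustration of the rare variant aggregate analysis mentioned in the abstract, the following minimal Python sketch collapses rare variants in a gene into a per-individual burden count and checks the three reported CCDC62 p-values against the exome-wide threshold. It is not SEQSpark code. The function names, the toy genotype matrix, and the minor allele frequency cutoff are hypothetical. The gene count of 20,000 is an assumption as well: the abstract gives only the threshold value, which happens to equal a Bonferroni correction of 0.05 over 20,000 tests.

```python
import math

# Assumption: the threshold is treated as a Bonferroni correction of
# alpha = 0.05 over about 20,000 genes. The abstract states only the value.
ALPHA = 0.05
N_GENES = 20_000  # hypothetical gene count


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / n_tests


def burden_scores(genotypes, maf_cutoff):
    """Count rare minor alleles per individual.

    genotypes: one row per individual, one column per variant,
    each entry a minor-allele count (0, 1, or 2).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Sample minor allele frequency = minor alleles / (2 * individuals).
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]
    # Keep variants that are observed at least once but are still rare.
    rare = [j for j, maf in enumerate(mafs) if 0 < maf <= maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]


# Toy data (hypothetical): 4 individuals, 4 variants in one gene.
genotypes = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 1, 0, 0],
]
# The cutoff is deliberately large because the toy sample is tiny.
scores = burden_scores(genotypes, maf_cutoff=0.2)
# Collapsing-style indicator: 1 if the individual carries any rare variant.
carrier = [1 if s > 0 else 0 for s in scores]

threshold = bonferroni_threshold(ALPHA, N_GENES)
# p-values for CCDC62 as reported in the abstract.
reported = {"SKAT-O": 6.89e-7, "CMC": 1.48e-6, "BRV": 1.48e-6}
significant = {name: p < threshold for name, p in reported.items()}

print(scores, carrier, threshold, significant)
```

In an actual analysis the burden count or carrier indicator would serve as the predictor in a regression on the trait, adjusted for covariates such as principal components. That step is omitted here.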

Keywords:  CCDC62; Spark; UK10K; complex traits; computational tools; imputed data; quality control; rare variant aggregate association analysis; waist-to-hip ratio; whole-genome sequence data

Year:  2017        PMID: 28669402      PMCID: PMC5501866          DOI: 10.1016/j.ajhg.2017.05.017

Source DB:  PubMed          Journal:  Am J Hum Genet        ISSN: 0002-9297            Impact factor:   11.025


  7 in total

1.  Polygenic inheritance, GWAS, polygenic risk scores, and the search for functional variants.

Authors:  Daniel J M Crouch; Walter F Bodmer
Journal:  Proc Natl Acad Sci U S A       Date:  2020-08-04       Impact factor: 11.205

Review 2.  Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits.

Authors:  J Dylan Weissenkampen; Yu Jiang; Scott Eckert; Bibo Jiang; Bingshan Li; Dajiang J Liu
Journal:  Curr Protoc Hum Genet       Date:  2019-03-08

Review 3.  Rare-variant collapsing analyses for complex traits: guidelines and applications.

Authors:  Gundula Povysil; Slavé Petrovski; Joseph Hostyk; Vimla Aggarwal; Andrew S Allen; David B Goldstein
Journal:  Nat Rev Genet       Date:  2019-10-11       Impact factor: 53.242

4.  DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark.

Authors:  Michael D Linderman; Davin Chia; Forrest Wallace; Frank A Nothaft
Journal:  BMC Bioinformatics       Date:  2019-10-11       Impact factor: 3.169

Review 5.  Unique roles of rare variants in the genetics of complex diseases in humans.

Authors:  Yukihide Momozawa; Keijiro Mizukami
Journal:  J Hum Genet       Date:  2020-09-18       Impact factor: 3.172

Review 6.  The population genomics of adaptive loss of function.

Authors:  J Grey Monroe; John K McKay; Detlef Weigel; Pádraic J Flood
Journal:  Heredity (Edinb)       Date:  2021-02-11       Impact factor: 3.821

Review 7.  Bioinformatics applications on Apache Spark.

Authors:  Runxin Guo; Yi Zhao; Quan Zou; Xiaodong Fang; Shaoliang Peng
Journal:  Gigascience       Date:  2018-08-01       Impact factor: 6.524
