ONETOOL for the analysis of family-based big data.

Yeunjoo E Song1, Sungyoung Lee2, Kyungtaek Park2, Robert C Elston1, Hyeon-Jong Yang3,4, Sungho Won2,5,6.   

Abstract

Motivation: Despite the need for separate tools to analyze family-based data, there are only a handful of tools optimized for family-based big data compared to the number of tools available for analyzing population-based data.
Results: ONETOOL implements the properties of well-known existing family data analysis tools and recently developed methods in a computationally efficient manner, and so is suitable for analyzing the vast amount of variant data available from sequencing family members, providing a rich choice of analysis methods for big data on families.
Availability and implementation: ONETOOL is freely available from http://healthstat.snu.ac.kr/software/onetool/.
Supplementary information: Supplementary data are available at Bioinformatics online.

Year:  2018        PMID: 29596615      PMCID: PMC6084591          DOI: 10.1093/bioinformatics/bty180

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The importance of family-based designs for analyses of sequence data has been stressed repeatedly because of the genetic homogeneity among family members (Bailey-Wilson and Wilson, 2011; Wijsman, 2012). Family study designs not only enrich for genetic loci containing rare variants, but also provide ways to control for genetic heterogeneity and population stratification. Family-based data differ from population-based data owing to the genetic relatedness among family members and Mendelian transmission. These well-known features have allowed family-based designs to play a key role in the history of genetic analysis, but they also limit the use of many available tools designed for population-based data. Despite the need for separate tools, only a handful are available for family-based data, especially for the big data from sequencing, which is now the most readily available form of data for genetic and genomic analyses. As with population-based tools, the most common family-based big data analysis tools aim to filter, rank, quality-control (QC) or annotate variants (Supplementary Table 1). These tools are optimized for the vast amount of variant data from sequencing, but they lack the essential analyses needed to test and draw valid inferences about the relation between traits of interest and common and/or rare variants. Users therefore have to run separate tools for each task. Moreover, although a handful of family-based imputation tools exist, there is a clear lack of family-based association analysis tools that can take dosage files directly as input.
Many well-known family data analysis tools date from the era of linkage analysis and genome-wide association studies (GWAS). Among these, S.A.G.E. 6.4 (http://darwin.cwru.edu/sage/) and Merlin (Abecasis et al., 2002) are still widely used.
PLINK (Purcell et al., 2007) is one of the most popular tools for GWAS. It is part of many standard pipelines, so the PLINK input file formats have become the standard for many sequence data analysis tools. However, because it was designed mainly for population-based case-control data, its analysis options for family data are very limited. All three tools have their pros and cons. In this work, we introduce a novel comprehensive tool that combines the good features of these existing tools with many newly developed family-based association analysis methods and a novel feature for analyzing dosage data. It provides, in a computationally efficient manner, a rich choice of analysis options for the vast amount of variant data coming from the sequencing of families. This convenience saves time by enabling a researcher to perform many family-based genetic and genomic analyses with one tool, ONETOOL, instead of hopping among many different tools to complete a family data analysis project.

2 Features

ONETOOL provides four main types of analysis: informatics and quality control (InfoQC), trait analysis, linkage analysis, and association analysis with both genotype data and dosage data (Table 1; see Supplementary Table 2 for details).
Table 1. Summary of available family-based analyses in ONETOOL

| Main | Sub-category | Detail |
|---|---|---|
| InfoQC analysis | Variant information | FST, Ts/Tv ratio, MAF, HWE, PCA |
| | Sample information | Het, Het/Hom |
| | Pedigree information | Description and summary, plot, relative pairs |
| | Error detection | Mendelian error |
| | Relatedness matrix | Kinship, IBS, GRM |
| Trait analysis | Familial aggregation | Correlation |
| | Heritability | Based on Kinship, IBS, GRM |
| | Segregation analysis | Mode of inheritance |
| Linkage analysis | Model-based | Two-point, utilizing segregation analysis |
| | Model-free | Multipoint, modeling LD |
| Association analysis | Single variant | Score test, TDT/SDT, MQLS, FQLS, EQLS, GEMMA |
| | Gene-based | Collapsing, PEDCMC, FAMVT, FARVAT, FBSKAT, PEDGENE, RVTDT |
| | Genotype probability & dosage data | Score test, EQLS, GEMMA, CMC, PEDCMC, FAMVT, FARVAT, PEDGENE |
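
Several of the variant-level InfoQC statistics listed in Table 1 are simple functions of genotype counts. The following is a minimal sketch only, not ONETOOL's implementation: the function names are hypothetical, and the calculation treats samples as unrelated, an assumption that family data violate and that a family-aware tool would need to correct for. It shows how a minor allele frequency (MAF) and a one-degree-of-freedom Hardy-Weinberg equilibrium (HWE) chi-square statistic could be obtained from the counts of the three genotypes at a biallelic variant.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from the counts of AA, AB and BB genotypes (hypothetical helper)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square statistic (1 df) for departure from HWE.

    Assumes unrelated samples; relatives make this test anti-conservative.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 0.0  # monomorphic variant: nothing to test
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

A variant with 50 AA, 0 AB and 50 BB genotypes has an allele frequency of 0.5 but no heterozygotes, so the statistic is large; counts of 25, 50 and 25 match the HWE expectation exactly and give zero.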

2.1 InfoQC, trait and linkage analysis

Family data require additional error checking and filtering that take the family structure into account, so that the family structures are maintained. ONETOOL provides appropriate methods to deal with this complexity of family data, together with the downstream analyses, in one integrated tool. Moreover, ONETOOL's options for variant-wise InfoQC and filtering are similar to those in PLINK, but they are implemented in a computationally optimized way that provides more speed and efficiency. ONETOOL also visualizes family data, using the R package kinship2 to generate pedigree plots (Sinnwell et al., 2014). As shown in Supplementary Table 1, few tools are available for trait analysis, and few are optimized to work with current pipelines for family-based big data. ONETOOL fills this gap by integrating tools for familial aggregation (correlation), narrow-sense heritability estimation and segregation analysis. With ONETOOL, both types of linkage analysis, model-based linkage and model-free linkage accounting for linkage disequilibrium, can be run directly on current genomic data files.
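
To make the family-aware error checking concrete, the sketch below tests whether a child's genotype at an autosomal variant can be formed from one allele of each parent, which is the basic idea behind Mendelian error detection in a trio. It is an illustration under stated assumptions, not ONETOOL's algorithm: the function name and the genotype encoding (a pair of allele labels, with None for a missing genotype) are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype is compatible with the parents'.

    Genotypes are unordered pairs of alleles, e.g. ("A", "G"); None marks
    a missing genotype. Autosomal variants only (hypothetical encoding).
    """
    if father is None or mother is None or child is None:
        return True  # incomplete trios cannot be tested
    child_sorted = tuple(sorted(child))
    # Try every combination of one paternal and one maternal allele.
    return any(
        tuple(sorted((pat, mat))) == child_sorted
        for pat, mat in product(father, mother)
    )
```

For example, two A/A parents cannot have an A/G child, so that trio would be flagged, whereas an A/A father and an A/G mother can. Real pedigrees need checks that go beyond single trios, for instance across several generations or with untyped parents.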

2.2 Association analysis

Which family-based association analysis gives the best results, in terms of both power and type 1 error, depends on the type of trait data (binary or continuous), family data (random or ascertained, trio or general) and variant data (common or rare) in hand. A complex disease data analysis project often involves not a single phenotype but a set of phenotypes with different characteristics, and it also involves analyzing different types of genetic data. By combining many association methods developed for specific cases into an integrated tool with a common interface, ONETOOL enables more seamlessly harmonized family-based association analyses. Supplementary Tables 3 and 4 summarize when to use each of the family-based association tests available in ONETOOL.
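
As one worked example of a single-variant method named in Table 1, the classical transmission disequilibrium test (TDT) for affected-child trios compares how often heterozygous parents transmit a given allele with how often they do not. The sketch below computes the usual McNemar-type statistic, (b - c)^2 / (b + c), and its one-degree-of-freedom chi-square p-value. It is a textbook illustration with a hypothetical function name, not a description of how ONETOOL implements TDT or its extensions.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical TDT chi-square (1 df) and its p-value.

    transmitted / untransmitted: number of times heterozygous parents
    passed / did not pass the test allele to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With 60 transmissions and 40 non-transmissions the statistic is 4.0, which corresponds to a p-value of about 0.046. Only heterozygous parents contribute, which is why the test applies to trio data and why other designs in the list above call for different methods.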

2.3 Imputation and dosage data

ONETOOL provides an option to impute missing genotypes. Missing genotypes at typed variants are imputed as their expected values given the familial relationships. If phenotypes are available for subjects with missing genotypes, genotypes imputed from family members' genotypes can improve statistical power (see Supplementary Material for details). ONETOOL also enables family-based association analysis with dosage data and genotype probabilities. See Supplementary Table 5 for the dosage and genotype probability file formats supported from several popular imputation tools.
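
The relation between the two kinds of imputed input can be shown with a short sketch. Under the common convention that a dosage is the expected number of copies of the alternate allele, it follows from the three genotype probabilities as P(het) + 2 P(hom alt). The function name, argument order and tolerance below are assumptions made for illustration; the formats ONETOOL actually reads are those listed in Supplementary Table 5.

```python
def expected_dosage(p_ref_ref, p_ref_alt, p_alt_alt, tol=1e-3):
    """Expected alternate-allele count (0 to 2) from genotype probabilities.

    The three probabilities should sum to 1 within `tol`; small rounding
    differences are removed by renormalizing (hypothetical helper).
    """
    probs = (p_ref_ref, p_ref_alt, p_alt_alt)
    if any(p < 0 for p in probs):
        raise ValueError("genotype probabilities must be non-negative")
    total = sum(probs)
    if abs(total - 1.0) > tol:
        raise ValueError("genotype probabilities do not sum to 1")
    return (p_ref_alt + 2.0 * p_alt_alt) / total
```

A confidently called heterozygote, with probabilities (0, 1, 0), has a dosage of 1, while an uncertain call such as (0.1, 0.8, 0.1) also has a dosage of about 1 but carries less information. This is one reason genotype probabilities and dosages are treated as distinct inputs.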

3 Discussion

ONETOOL enables a researcher to perform many family-based genetic and genomic analyses in a computationally efficient manner. It offers convenience and saves time by providing a rich choice of analysis options, both existing and novel. ONETOOL supports various types of input files, including the dosage and genotype probability files from several imputation tools. Using two different family datasets, we show in Table 2 the performance of ONETOOL and the time saved by running several analyses at once compared with running each component separately (see Supplementary Material for details).
Table 2. Efficiency of the integrated analyses in ONETOOL

| Analyses | Run type | Data1 | Data2 |
|---|---|---|---|
| InfoQC+Trait | separate run | 2.21s | 55.83s |
| | ONETOOL | 0.74s | 44.09s |
| InfoQC+Trait+single-variant | separate run | 2.41s | 58.74s |
| | ONETOOL | 0.83s | 54.09s |
| InfoQC+Trait+gene-based | separate run | 2.47s | 59.20s |
| | ONETOOL | 0.85s | 54.76s |
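
The time savings in Table 2 can be restated as speedup ratios, that is, the separate-run time divided by the integrated ONETOOL time. The short script below only repeats that arithmetic on the values copied from the table; the variable and function names are illustrative.

```python
# Run times in seconds copied from Table 2: (separate run, ONETOOL).
TABLE2 = {
    "InfoQC+Trait": {"Data1": (2.21, 0.74), "Data2": (55.83, 44.09)},
    "InfoQC+Trait+single-variant": {"Data1": (2.41, 0.83), "Data2": (58.74, 54.09)},
    "InfoQC+Trait+gene-based": {"Data1": (2.47, 0.85), "Data2": (59.20, 54.76)},
}


def speedup(separate, integrated):
    """Ratio of separate-run time to integrated-run time."""
    return separate / integrated


def summarize(table):
    """Map (analysis, dataset) to the speedup rounded to two decimals."""
    return {
        (analysis, dataset): round(speedup(*times), 2)
        for analysis, by_dataset in table.items()
        for dataset, times in by_dataset.items()
    }


if __name__ == "__main__":
    for (analysis, dataset), ratio in summarize(TABLE2).items():
        print(f"{analysis:30s} {dataset}: {ratio:.2f}x")
```

On these numbers the integrated run is roughly 2.9 to 3.0 times faster on Data1 and about 1.1 to 1.3 times faster on Data2.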

Funding

This work was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (Grant no. HC15C1302); and the Bio-Synergy Research Project (NRF-2017M3A9C4065964) of the Ministry of Science, ICT and Future Planning through the National Research Foundation.

Conflict of Interest: none declared.
References

1.  Merlin--rapid analysis of dense genetic maps using sparse gene flow trees.

Authors:  Gonçalo R Abecasis; Stacey S Cherny; William O Cookson; Lon R Cardon
Journal:  Nat Genet       Date:  2001-12-03

2.  Linkage analysis in the next-generation sequencing era.

Authors:  Joan E Bailey-Wilson; Alexander F Wilson
Journal:  Hum Hered       Date:  2011-12-23

3.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25

4.  The kinship2 R package for pedigree data.

Authors:  Jason P Sinnwell; Terry M Therneau; Daniel J Schaid
Journal:  Hum Hered       Date:  2014-07-29

5.  The role of large pedigrees in an era of high-throughput sequencing.

Authors:  Ellen M Wijsman
Journal:  Hum Genet       Date:  2012-06-20
