Evaluation of a two-stage framework for prediction using big genomic data.

Xia Jiang, Richard E Neapolitan.   

Abstract

We are in the era of abundant 'big' or 'high-dimensional' data. These data afford us the opportunity to discover predictors of an event of interest and to estimate the likelihood of the event from the values of these predictors. For example, 'genome-wide association studies' examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. From these data sets we can learn which SNPs affect disease status, and use that knowledge to predict disease likelihood. Owing to the large number of features, many prediction methods have difficulty using all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can therefore be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors and a prediction method is applied to them in the second. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables, which perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little basis for choosing one over another in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
© The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
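
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the ReliefF variant below is a simplified version for binary targets using Hamming distance, the second stage is a naive Bayes classifier standing in for the eight methods the paper compares, and the data are a synthetic toy set in which features 3 and 7 are assumed to be the only causal ones. All function names (`relieff_scores`, `fit_naive_bayes`, `demo`) and parameter values are hypothetical choices made for illustration.

```python
import random
from collections import Counter
from math import log


def hamming(a, b):
    """Number of positions at which two discrete feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def relieff_scores(X, y, k=10):
    """Simplified ReliefF for discrete features and a binary target.

    A feature gains weight when it differs between a sample and its k
    nearest misses (other class) and loses weight when it differs between
    the sample and its k nearest hits (same class).
    """
    n, m = len(X), len(X[0])
    weights = [0.0] * m
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: hamming(X[i], X[j]))
        hits = [j for j in order if y[j] == y[i]][:k]
        misses = [j for j in order if y[j] != y[i]][:k]
        if not hits or not misses:
            continue
        for f in range(m):
            hit_diff = sum(X[i][f] != X[j][f] for j in hits) / len(hits)
            miss_diff = sum(X[i][f] != X[j][f] for j in misses) / len(misses)
            weights[f] += (miss_diff - hit_diff) / n
    return weights


def fit_naive_bayes(X, y, alpha=1.0):
    """Naive Bayes for discrete features with additive (Dirichlet) smoothing."""
    n_features = len(X[0])
    class_counts = Counter(y)
    n_values = [len({row[f] for row in X}) for f in range(n_features)]
    counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for row, c in zip(X, y):
        for f, v in enumerate(row):
            counts[c][f][v] += 1

    def predict(row):
        def log_posterior(c):
            lp = log(class_counts[c] / len(y))
            for f, v in enumerate(row):
                lp += log((counts[c][f][v] + alpha)
                          / (class_counts[c] + alpha * n_values[f]))
            return lp
        return max(class_counts, key=log_posterior)

    return predict


def demo(seed=0, n_train=200, n_test=100, n_features=30, n_keep=2):
    """Stage 1: rank features with ReliefF. Stage 2: predict from the top ones."""
    rng = random.Random(seed)
    # Synthetic genotypes coded 0/1/2; the label depends only on features 3 and 7.
    X = [[rng.randrange(3) for _ in range(n_features)]
         for _ in range(n_train + n_test)]
    y = [int(row[3] >= 1 and row[7] >= 1) for row in X]
    X_train, y_train = X[:n_train], y[:n_train]
    X_test, y_test = X[n_train:], y[n_train:]

    scores = relieff_scores(X_train, y_train)
    top = sorted(range(n_features), key=lambda f: scores[f], reverse=True)[:n_keep]

    def project(row):
        return [row[f] for f in top]

    predict = fit_naive_bayes([project(r) for r in X_train], y_train)
    correct = sum(predict(project(r)) == t for r, t in zip(X_test, y_test))
    return sorted(top), correct / n_test


top_features, accuracy = demo()
print("selected features:", top_features, "test accuracy:", accuracy)
```

On real genome-wide data the pairwise distance computation in this sketch would be far too slow, and the number of features to keep (`n_keep`) would have to be chosen by validation rather than fixed in advance.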

Keywords:  Bayesian network; GWAS; SNP; big data; high-dimensional data; prediction

Year: 2015; PMID: 25788325; PMCID: PMC4652616; DOI: 10.1093/bib/bbv010

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622


