
Framework for making better predictions by directly estimating variables' predictivity.

Adeline Lo, Herman Chernoff, Tian Zheng, Shaw-Hwa Lo.

Abstract

We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity and to assess variable sets for high predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, because of its inflated bias for moderate sample sizes and its sensitivity to noisy, useless variables. We demonstrate that the I-score of the partition retention (PR) method of variable selection (VS) yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the I-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the I-score to real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using partition retention and the I-score can aid in finding variable sets with promising prediction rates; however, further research on sample-based measures of predictivity is much needed.
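The abstract's point about the naive estimator can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the following is an assumption made here for illustration only: for a balanced case-control sample on discrete variables, the correct prediction rate is taken to be half the sum, over cells of the partition defined by the variable set, of the larger of the case and control cell probabilities, and the naive estimator plugs in sample proportions. The function names `naive_prediction_rate` and `noise_sample` are hypothetical.

```python
import random
from collections import Counter


def naive_prediction_rate(cases, controls):
    """Plug-in estimate (assumed form): 0.5 * sum over cells of
    max(share of cases in the cell, share of controls in the cell)."""
    case_counts = Counter(map(tuple, cases))
    control_counts = Counter(map(tuple, controls))
    cells = set(case_counts) | set(control_counts)
    # Counter returns 0 for cells that one group never visits.
    return 0.5 * sum(
        max(case_counts[c] / len(cases), control_counts[c] / len(controls))
        for c in cells
    )


def noise_sample(n, k, rng):
    """n observations of k independent fair binary variables (pure noise)."""
    return [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Cases and controls share one distribution, so no rule can beat 0.5.
    for k in (1, 3, 6, 10):
        est = naive_prediction_rate(
            noise_sample(100, k, rng), noise_sample(100, k, rng)
        )
        print(f"{k:2d} noise variables: naive estimate = {est:.3f}")
```

Under these assumptions the true rate is 0.5 for every k, yet the plug-in estimate tends to drift toward 1 as noise variables are added, because most cells end up holding observations from only one group. This is one way to read the abstract's remark about inflated bias and sensitivity to noisy variables; it is not the authors' simulation design.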

Keywords:  high-dimensional data; prediction; predictivity; variable selection

Year:  2016        PMID: 27911830      PMCID: PMC5167195          DOI: 10.1073/pnas.1616647113

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  6 in total

1.  Why significant variables aren't automatically good predictors.

Authors:  Adeline Lo; Herman Chernoff; Tian Zheng; Shaw-Hwa Lo
Journal:  Proc Natl Acad Sci U S A       Date:  2015-10-26       Impact factor: 11.205

2.  Cumulative association of five genetic variants with prostate cancer.

Authors:  S Lilly Zheng; Jielin Sun; Fredrik Wiklund; Shelly Smith; Pär Stattin; Ge Li; Hans-Olov Adami; Fang-Chi Hsu; Yi Zhu; Katarina Bälter; A Karim Kader; Aubrey R Turner; Wennuan Liu; Eugene R Bleecker; Deborah A Meyers; David Duggan; John D Carpten; Bao-Li Chang; William B Isaacs; Jianfeng Xu; Henrik Grönberg
Journal:  N Engl J Med       Date:  2008-01-16       Impact factor: 91.245

3.  A review of feature selection techniques in bioinformatics.

Authors:  Yvan Saeys; Iñaki Inza; Pedro Larrañaga
Journal:  Bioinformatics       Date:  2007-08-24       Impact factor: 6.937

4.  Interaction-based feature selection and classification for high-dimensional biological data.

Authors:  Haitian Wang; Shaw-Hwa Lo; Tian Zheng; Inchi Hu
Journal:  Bioinformatics       Date:  2012-09-03       Impact factor: 6.937

5.  Chromosome 9p21 genetic variation explains 13% of cardiovascular disease incidence but does not improve risk prediction.

Authors:  K Gränsbo; P Almgren; M Sjögren; J G Smith; G Engström; B Hedblad; O Melander
Journal:  J Intern Med       Date:  2013-03-25       Impact factor: 8.989

6.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

  1 in total

1.  Probabilistic Prediction of Nonadherence to Psychiatric Disorder Medication from Mental Health Forum Data: Developing and Validating Bayesian Machine Learning Classifiers.

Authors:  Meng Ji; Wenxiu Xie; Mengdan Zhao; Xiaobo Qian; Chi-Yin Chow; Kam-Yiu Lam; Jun Yan; Tianyong Hao
Journal:  Comput Intell Neurosci       Date:  2022-04-15
