Why significant variables aren't automatically good predictors.

Adeline Lo, Herman Chernoff, Tian Zheng, Shaw-Hwa Lo.

Abstract

Thus far, genome-wide association studies (GWAS) have been disappointing: investigators have been unable to use the statistically significant variants identified in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity, and we illustrate this through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear highly significant and can therefore evade researchers who rely on significance-based methods. What makes a variable good for prediction and what makes it significant depend on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires a new research agenda: searching for a novel criterion that retrieves highly predictive variables rather than highly significant ones. We offer an alternative approach that was not designed for significance, the partition retention method. It was very effective for prediction on a long-studied breast cancer data set, reducing the classification error rate from 30% to 8%.
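
The gap between significance and predictivity can be seen in a deliberately simplified setting. This is our own illustration. It is not the authors' simulation design, and it is not the partition retention method.

Assume the following:

- There are two equally likely classes.
- A single variable is Gaussian with the same variance in both classes.
- The class means differ by δ standard deviations.

A two-sample z-test with n observations per class has expected statistic z = δ·√(n/2). Its significance therefore grows without bound as n grows. By contrast, the best achievable misclassification rate, Φ(−δ/2), depends on δ alone.

The sketch below compares a tiny effect measured in a large sample with a large effect measured in a small one. The function names and the two example variables are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def expected_p_value(delta: float, n_per_group: int) -> float:
    """Two-sided p-value of a two-sample z-test, evaluated at its expected
    statistic z = delta * sqrt(n/2). Assumes known, equal class variances
    and a standardized mean difference `delta`."""
    z = delta * sqrt(n_per_group / 2)
    return 2 * STD_NORMAL.cdf(-abs(z))


def bayes_error(delta: float) -> float:
    """Lowest achievable misclassification rate for two equally likely
    Gaussian classes with equal variance whose means are `delta` SDs apart.
    The optimal rule thresholds at the midpoint, giving Phi(-delta/2)."""
    return STD_NORMAL.cdf(-abs(delta) / 2)


# Hypothetical variables: a tiny effect in a huge sample versus
# a large effect in a small sample.
variables = {
    "A (delta=0.1, n=5000/group)": (0.1, 5000),
    "B (delta=1.0, n=10/group)": (1.0, 10),
}

for name, (delta, n) in variables.items():
    print(f"{name}: p = {expected_p_value(delta, n):.2e}, "
          f"error = {bayes_error(delta):.3f}")
```

The two variables behave very differently:

- Variable A is far more significant (p ≈ 5.7 × 10⁻⁷), yet it classifies barely better than a coin flip (error ≈ 0.48).
- Variable B is only modestly significant (p ≈ 0.025), but its error rate is much lower (≈ 0.31).

Real GWAS variants are discrete and may interact, which this sketch ignores.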

Keywords: high-dimensional data; prediction; statistical significance; variable selection; classification

Year:  2015        PMID: 26504198      PMCID: PMC4653162          DOI: 10.1073/pnas.1518285112

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205
