The impact of different sources of heterogeneity on loss of accuracy from genomic prediction models.

Yuqing Zhang, Christoph Bernau, Giovanni Parmigiani, Levi Waldron.

Abstract

Cross-study validation (CSV) of prediction models is an alternative to traditional cross-validation (CV) in domains where multiple comparable datasets are available. Although many studies have noted potential sources of heterogeneity in genomic studies, to our knowledge none have systematically investigated their intertwined impacts on prediction accuracy across studies. We employ a hybrid parametric/non-parametric bootstrap method to realistically simulate publicly available compendia of microarray, RNA-seq, and whole metagenome shotgun microbiome studies of health outcomes. Three types of heterogeneity between studies are manipulated and studied: (i) imbalances in the prevalence of clinical and pathological covariates, (ii) differences in gene covariance that could be caused by batch, platform, or tumor purity effects, and (iii) differences in the "true" model that associates gene expression and clinical factors with the outcome. We assess model accuracy while altering these factors. Accuracy is lower in CSV than in CV. Surprisingly, heterogeneity in known clinical covariates and differences in gene covariance structure contribute very little to the loss of accuracy when validating in new studies. However, forcing all studies to share an identical generative model greatly reduces the difference between within-study and across-study accuracy. These results, observed consistently for multiple disease outcomes and omics platforms, suggest that the most easily identifiable sources of study heterogeneity are not necessarily the primary ones that undermine the ability to replicate the accuracy of omics prediction models in new studies. Unidentified heterogeneity, such as could arise from unmeasured confounding, may be more important.
© The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
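The contrast between within-study CV and CSV, and the effect of forcing a shared generative model, can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' simulation code: the feature matrices are random Gaussian stand-ins for real studies, the outcome is continuous and drawn from a linear Gaussian model, the learner is closed-form ridge regression, and accuracy is the correlation between predicted and observed outcomes. All function names (`simulate_studies`, `fit_ridge`, `within_study_cv`, `cv_vs_csv`) and parameters (`beta_sd`, `noise_sd`, `lam`) are hypothetical. Rows of each study are resampled with replacement (a non-parametric step), and outcomes are generated from a parametric model whose coefficients either differ by study or are shared, loosely mimicking heterogeneity type (iii).

```python
import numpy as np


def simulate_studies(base_studies, beta_shared, rng, identical_models=False,
                     beta_sd=1.0, noise_sd=1.0):
    """Toy hybrid bootstrap (an assumption, not the paper's procedure).

    Non-parametric step: resample the rows of each base study with replacement.
    Parametric step: draw outcomes from a linear Gaussian "true" model.
    """
    studies = []
    for X_base in base_studies:
        n = X_base.shape[0]
        X = X_base[rng.integers(0, n, size=n)]
        if identical_models:
            beta = beta_shared  # every study shares one generative model
        else:
            # study-specific true model: shared signal plus a random perturbation
            beta = beta_shared + rng.normal(0.0, beta_sd, size=beta_shared.shape)
        y = X @ beta + rng.normal(0.0, noise_sd, size=n)
        studies.append((X, y))
    return studies


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                           Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def accuracy(model, X, y):
    """Accuracy as the correlation between predicted and observed outcomes."""
    coef, intercept = model
    return np.corrcoef(X @ coef + intercept, y)[0, 1]


def within_study_cv(X, y, rng, n_folds=5):
    """Mean k-fold cross-validation accuracy inside a single study."""
    all_idx = np.arange(len(y))
    scores = []
    for test_idx in np.array_split(rng.permutation(len(y)), n_folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        model = fit_ridge(X[train_idx], y[train_idx])
        scores.append(accuracy(model, X[test_idx], y[test_idx]))
    return float(np.mean(scores))


def cv_vs_csv(studies, rng):
    """Return (mean within-study CV accuracy, mean cross-study accuracy).

    CSV trains on study i and tests on every other study j != i.
    """
    cv = [within_study_cv(X, y, rng) for X, y in studies]
    csv = []
    for i, (X_train, y_train) in enumerate(studies):
        model = fit_ridge(X_train, y_train)
        for j, (X_test, y_test) in enumerate(studies):
            if i != j:
                csv.append(accuracy(model, X_test, y_test))
    return float(np.mean(cv)), float(np.mean(csv))


# Usage: four hypothetical studies with 150 samples and 20 features each.
rng = np.random.default_rng(0)
base = [rng.normal(size=(150, 20)) for _ in range(4)]
beta = rng.normal(size=20)
results = {}
for identical in (False, True):
    studies = simulate_studies(base, beta, rng, identical_models=identical)
    results[identical] = cv_vs_csv(studies, rng)
    print(f"identical_models={identical}: CV={results[identical][0]:.2f}, "
          f"CSV={results[identical][1]:.2f}")
```

In this toy setting, giving each study its own true coefficients should push mean CSV accuracy well below mean CV accuracy, while sharing one coefficient vector should make the two nearly coincide. The sketch says nothing about the relative size of the other heterogeneity sources the study examines, since it does not model covariate imbalance or covariance differences.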

Keywords:  Cross-study validation; Data heterogeneity; Genomic prediction models

Year: 2020
PMID: 30202918
PMCID: PMC7868050
DOI: 10.1093/biostatistics/kxy044

Source DB: PubMed
Journal: Biostatistics
ISSN: 1465-4644
Impact factor: 5.899


