Object-oriented regression for building predictive models with high dimensional omics data from translational studies.

Lue Ping Zhao, Hamid Bolouri.

Abstract

Maturing omics technologies enable researchers to routinely generate high-dimensional omics data (HDOD) in translational clinical studies. In oncology, The Cancer Genome Atlas (TCGA) funded researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data, and has made the data available for the research community to mine. One important application, and the focus of this manuscript, is building predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose an object-oriented regression (OOR) methodology that identifies exemplars specified by HDOD patterns and assesses their associations with prognostic outcome. By computing a patient's similarities to these exemplars, the OOR-based predictive model produces a risk estimate from that patient's HDOD. OOR has two primary advantages: it reduces the penalty of high dimensionality, and it remains interpretable to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model of survival among stage I patients; that is, we stratify these patients by prognostic survival risk beyond histological classifications. Identifying high-risk patients helps oncologists develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, we divide the total sample into training and validation sets. After building the predictive model in the training set, we compute risk scores from it and validate the association of risk scores with prognostic outcome in the validation set (P-value = 0.015).
Copyright © 2016 Elsevier Inc. All rights reserved.
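
The exemplar-and-similarity idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian similarity, the bandwidth value, the choice of two training patients as exemplars, the binary outcome (in place of survival time), the unpenalized logistic fit, and all function names are assumptions made for illustration, and the data are simulated rather than drawn from TCGA.

```python
import math
import random


def similarity(x, exemplar, bandwidth=3.0):
    """Gaussian similarity in (0, 1]; equals 1.0 when x matches the exemplar.
    The kernel and bandwidth are illustrative assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, exemplar))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def similarity_features(x, exemplars):
    """Replace a long expression profile with one similarity per exemplar."""
    return [similarity(x, e) for e in exemplars]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, outcomes, lr=0.5, epochs=2000):
    """Unpenalized logistic regression fitted by batch gradient descent."""
    n, k = len(features), len(features[0])
    bias, weights = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, y in zip(features, outcomes):
            err = sigmoid(bias + sum(w * s for w, s in zip(weights, row))) - y
            grad_b += err
            for j in range(k):
                grad_w[j] += err * row[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return bias, weights


def risk_score(x, exemplars, bias, weights):
    """Predicted probability of the adverse outcome for one patient."""
    feats = similarity_features(x, exemplars)
    return sigmoid(bias + sum(w * s for w, s in zip(weights, feats)))


# Simulated data: 40 hypothetical patients, 5 "genes", two risk groups.
random.seed(0)


def make_patient(high_risk, n_genes=5):
    center = 2.0 if high_risk else -2.0
    return [random.gauss(center, 1.0) for _ in range(n_genes)]


labels = [i % 2 for i in range(40)]  # 0 = low risk, 1 = high risk
patients = [make_patient(y == 1) for y in labels]

# Split into training and validation sets, as the abstract describes.
train_x, train_y = patients[:30], labels[:30]
valid_x, valid_y = patients[30:], labels[30:]

# Assumption: use one low-risk and one high-risk training patient as exemplars.
exemplars = [train_x[0], train_x[1]]

train_feats = [similarity_features(x, exemplars) for x in train_x]
bias, weights = fit_logistic(train_feats, train_y)

valid_scores = [risk_score(x, exemplars, bias, weights) for x in valid_x]
high = [s for s, y in zip(valid_scores, valid_y) if y == 1]
low = [s for s, y in zip(valid_scores, valid_y) if y == 0]
print("mean validation risk, high-risk group:", sum(high) / len(high))
print("mean validation risk, low-risk group:", sum(low) / len(low))
```

In this sketch the regression has one coefficient per exemplar rather than one per gene, which is one way to read the abstract's point about reducing the penalty of high dimensionality: the number of fitted parameters depends on the number of exemplars, and each coefficient can be interpreted as the risk associated with resembling a particular reference patient.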

Keywords:  Big data; Clustering analysis; Gene expression; Generalized linear model; High dimensional data; LASSO; Lung cancer; Nearest neighbor approach; Penalized regression

Year:  2016        PMID: 26972839      PMCID: PMC5097461          DOI: 10.1016/j.jbi.2016.03.001

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


