Evangelina López de Maturana, Lola Alonso, Pablo Alarcón, Isabel Adoración Martín-Antoniano, Silvia Pineda, Lucas Piorno, M Luz Calle, Núria Malats.
Abstract
Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented in clinical or public health settings. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and their joint modeling with omics data is crucial to increase an algorithm's predictive ability. Only a small number of published studies have performed a "real" integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.
Keywords: RNA expression; challenges; clinical data; data integration; epidemiological data; genomics; integrative analytics; joint modeling; non-omics data; omics data
Year: 2019; PMID: 30897838; PMCID: PMC6471713; DOI: 10.3390/genes10030238
Source DB: PubMed; Journal: Genes (Basel); ISSN: 2073-4425; Impact factor: 4.096
Figure 1. Classification of the strategies for building OnO models.
Main features of the identified studies conducting omics and non-omics data integration.
| Reference | Title | Outcome | Big Data: Omics and Image Data | Non-Omics | Objective | Model Performance | Approach |
|---|---|---|---|---|---|---|---|
| [ | Classification based on extensions of LS-PLS using logistic regression: application to clinical and multiple genomic data | | | | | | Conditional modeling |
| [ | Whole-genome multi-omic study of survival in patients with glioblastoma multiforme | Survival time in glioblastoma | | | Predictive ability: AUC | AUC (non-omics) = 0.71 | Joint modeling |
| [ | Survival prediction from clinico-genomic models—a comparative study | | | | Prediction performance: deviance | | Conditional and joint modeling |
| [ | IPF-LASSO: Integrative L1-penalized regression with penalty factors for prediction based on multi-omics data | | | | Predictive ability: | | Joint modeling |
| [ | Deep learning based multi-omics integration robustly predicts survival in liver cancer | | | | Predictive ability: C-index, Brier score | | Independent modeling |
| [ | A strategy for multimodal data integration: application to biomarkers identification in spinocerebellar ataxia | | | | Graphical | Not provided | Joint modeling |
| [ | Prediction of years of life after diagnosis of breast cancer using omics and omic-by-treatment interactions | | | | Prediction accuracy: | AUC (non-omics) = 0.72–0.77 | Joint modeling |
| [ | Determination of prognosis in metastatic melanoma through integration of clinic-pathologic, mutation, mRNA, microRNA, and protein information | | | | Prediction error rate (ER) | ER (non-omics) = 30% | Independent modeling |
| [ | Whole Genome Prediction of Bladder Cancer Risk With the Bayesian LASSO | | | | Prediction: AUC | AUC (non-omics) = 0.65 | Joint modeling |
| [ | Prediction of non-muscle invasive bladder cancer outcomes assessed by innovative multimarker prognostic models | | | | Prediction: AUC, | | Joint modeling |
| [ | A pathway based data integration framework for prediction of disease progression | | | | Accuracy | Acc (non-omics) = not provided | Joint modeling |
| [ | A methylation-to-expression feature model for generating accurate prognostic risk scores and identifying disease targets in clear cell kidney cancer | | | | Classification performance: | C-index (non-omics) = 0.776 | Independent modeling |
| [ | Methylation-to-expression feature models of breast cancer accurately predict overall survival, distant-recurrence free survival, and pathologic complete response in multiple cohorts | | | | Classification performance: | | Independent modeling |
| [ | Integration of Clinical and Gene Expression Data Has a Synergetic Effect on Predicting Breast Cancer Outcome | | | | Classification performance: error rate, AUC | | Independent and joint modeling |
| [ | A Comprehensive Genetic Approach for Improving Prediction of Skin Cancer Risk in Humans | | | | Classification performance: AUC | AUC (non-omics) = 0.53–0.54 | Joint modeling |
| [ | Integrating Clinical and Multiple Omics Data for Prognostic Assessment across Human Cancers | | | | Classification performance: | Ovarian and HNSC f: | Independent modeling |
CNS: Central Nervous System; ER: Estrogen receptor; NPI: Nottingham Prognostic Index; NB2004: German neuroblastoma trial; OS: Overall Survival; AML: Acute myeloid leukaemia; CT: Chemotherapy; RT: Radiotherapy; TSG: Tumor stage and grade; PRS: Polygenic risk score; HNSC: Head and neck squamous cell carcinoma; DLBCL: Diffuse large B-cell lymphoma; IBS: Integrated Brier score.

a Model performance in the largest datasets. b Corresponds to the AUC of the COV+GE+GExHT model. c Likewise, no improvement in classification performance was obtained in TP. d We provide only the results for OS, when no external validation was considered. Similar performances were obtained when external validation was performed. e Performance of the M2EFM Meth+Exp model. f We report the C-index results for the cancers where the largest prognostic power was achieved.
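
The three approaches named in the abstract and in the table's last column (independent, conditional, and joint modeling) can be illustrated with a toy example. This record does not define them, so the sketch below rests on an assumed reading: independent modeling fits each data block separately and then combines the resulting scores; conditional modeling fits the non-omics block first and then the omics block on what is left unexplained; joint modeling fits both blocks in one model. The data are simulated, ridge regression is only a convenient stand-in, and all function names (`ridge_fit`, `simulate`, `compare_strategies`) are hypothetical. None of this reproduces the method of any reviewed paper.

```python
import numpy as np


def ridge_fit(X, y, penalties):
    """Ridge coefficients with one penalty per column (0 = unpenalized).

    No intercept is fitted; the simulated data below are zero-mean.
    """
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)


def simulate(n=400, p_omics=50, seed=0):
    """Toy data: three clinical covariates plus a wider omics block."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(n, 3))            # non-omics (clinical) block
    G = rng.normal(size=(n, p_omics))      # omics block
    beta_c = np.array([1.0, -0.5, 0.8])
    beta_g = np.zeros(p_omics)
    beta_g[:5] = 0.6                       # only five omics features matter
    y = C @ beta_c + G @ beta_g + rng.normal(size=n)
    return C, G, y


def compare_strategies(seed=0, lam=10.0, n_train=300):
    """Return the test-set mean squared error of each (assumed) strategy."""
    C, G, y = simulate(seed=seed)
    C_tr, C_te = C[:n_train], C[n_train:]
    G_tr, G_te = G[:n_train], G[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]
    no_pen = np.zeros(C.shape[1])          # clinical block left unpenalized
    pen = np.full(G.shape[1], lam)         # omics block shrunk by ridge

    # Baseline: clinical covariates only (ordinary least squares).
    b_c = ridge_fit(C_tr, y_tr, no_pen)
    pred_clin = C_te @ b_c

    # Independent: fit each block separately, then combine the two
    # scores in a second-stage least-squares model.
    b_g = ridge_fit(G_tr, y_tr, pen)
    S_tr = np.column_stack([C_tr @ b_c, G_tr @ b_g])
    S_te = np.column_stack([C_te @ b_c, G_te @ b_g])
    w = ridge_fit(S_tr, y_tr, np.zeros(2))
    pred_indep = S_te @ w

    # Conditional: fit clinical first, then omics on the residuals.
    resid = y_tr - C_tr @ b_c
    b_g_cond = ridge_fit(G_tr, resid, pen)
    pred_cond = pred_clin + G_te @ b_g_cond

    # Joint: one model over both blocks, penalizing only the omics block.
    X_tr, X_te = np.hstack([C_tr, G_tr]), np.hstack([C_te, G_te])
    b_joint = ridge_fit(X_tr, y_tr, np.concatenate([no_pen, pen]))
    pred_joint = X_te @ b_joint

    def mse(pred):
        return float(np.mean((y_te - pred) ** 2))

    return {
        "clinical_only": mse(pred_clin),
        "independent": mse(pred_indep),
        "conditional": mse(pred_cond),
        "joint": mse(pred_joint),
    }
```

Calling `compare_strategies()` returns a dictionary of test-set errors that can be compared against the clinical-only baseline, mirroring how the table reports non-omics performance next to each integrated model. Leaving the clinical block unpenalized in the joint model is a design choice of this sketch, loosely in the spirit of per-block penalty factors. It is not a claim about how any listed study set its penalties.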