Barbara Lobato-Delgado (1), Blanca Priego-Torres (2, 3, 4), Daniel Sanchez-Morillo (2, 3, 4).
Abstract
Cancer is one of the most detrimental diseases globally. Accordingly, predicting the prognosis of cancer patients has become an active field of research. In this review, we have gathered 43 state-of-the-art scientific papers, published in the last 6 years, that built cancer prognosis predictive models using multimodal data. We have grouped multimodal data into four main types (clinical, anatomopathological, molecular, and medical imaging) and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics are discussed together with current issues and future trends. Research in this area has evolved from survival analysis based on statistical modelling of mainly clinical and anatomopathological data towards a multi-faceted, data-driven prediction of cancer prognosis. This newer approach integrates complex, multimodal, high-dimensional data, including multi-omics and medical imaging information, and applies Machine Learning and, more recently, Deep Learning techniques. This review concludes that multimodal cancer prognosis predictive models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine, as well as provide new and valuable knowledge on cancer biology and its progression.
Keywords: Artificial Intelligence; cancer; data integration; machine learning; multimodal data; patient risk stratification; prognosis prediction; survival analysis
Year: 2022 PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Description of the characteristics of the selected studies.
| First Author & Reference | Year | Country | Study Design ¹ | Sample Size ² | Cancer Type | Clinical: AP | Clinical: Other | Omics: G | Omics: E | Omics: T | Omics: P | Non-Omics | Image Data | Predictive Analytics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Zhu [ | 2016 | USA | RCS | 111 patients | LUAD | ✔ | ✔ | Conventional Statistics | ||||||
| Cheng [ | 2017 | USA, China | RCS | 410 patients | ccRCC | ✔ | ✔ | |||||||
| Dos Reis [ | 2017 | UK | MC RCS | 5738 patients | Breast cancer | ✔ | ✔ | |||||||
| Sperduto [ | 2017 | USA | MC RCS | 2186 patients | NSCLC | ✔ | ✔ | ✔ | ||||||
| Elwood [ | 2018 | New Zealand | MC PCS | 9182 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Matsuo [ | 2019 | USA | RCS | 768 patients | Cervical cancer | ✔ | ✔ | |||||||
| Mohebian [ | 2017 | Iran, Spain | SI RCS | 579 patients | Breast cancer | ✔ | ✔ | ✔ | Machine Learning | |||||
| Obrzut [ | 2017 | Poland | SI RCS | 102 patients | Cervical cancer | ✔ | ✔ | ✔ | ||||||
| Zhu [ | 2017 | USA | RCS | 3382 samples | 14 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Chaudhary [ | 2018 | USA | RCS | 360 patients | Hepatocellular carcinoma | ✔ | ✔ | |||||||
| Sun [ | 2018 | China | RCS | 578 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Zhang [ | 2018 | USA, China | RCS | 380 samples | Neuroblastoma | ✔ | ✔ | |||||||
| Zhao [ | 2018 | USA | MC PCS | 1874 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Cheerla [ | 2019 | USA | MC RCS | 11,160 patients | 20 types of cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Ferroni [ | 2019 | Italy | SI PCS | 454 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Jing [ | 2019 | China | MC RCS | 4630 patients | Nasopharyngeal carcinoma | ✔ | ✔ | |||||||
| Matsuo [ | 2019 | USA | RCS | 768 patients | Cervical cancer | ✔ | ✔ | |||||||
| Sun [ | 2019 | China | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Tapak [ | 2019 | Iran | RCS | 550 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Baek [ | 2020 | South Korea | RCS | 177 patients | Pancreatic adenocarcinoma | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Boeri [ | 2020 | Italy | RCS | 610 patients | Breast cancer | ✔ | ✔ | ✔ | | |||||
| Choi [ | 2020 | South Korea | MC CS-RCS | 205 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ✔ | |||||
| Zhang [ | 2020 | China | RCS | 251 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ||||||
| Arya [ | 2020 | India | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ||||
| Tong [ | 2020 | USA | RCS | ~1000 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Owens [ | 2021 | UK | RCS | 352 patients | Hepatocellular carcinoma | ✔ | ✔ | |||||||
| Malik [ | 2021 | India | RCS | 532 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Zhao [ | 2021 | China | RCS | 474 patients | Low-Grade Glioma | ✔ | ✔ | ✔ | ||||||
| Hassanzadeh [ | 2021 | USA | RCS | 836 patients | 3 types of cancer | ✔ | ✔ | |||||||
| Zhang [ | 2021 | UK | RCS | 131 patients | 35 types of cancer | ✔ | ✔ | |||||||
| Chharia [ | 2021 | India | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Yousefi [ | 2017 | USA | RCS | 3323 patients | 5 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | Mixed | |||
| Katzman [ | 2018 | USA | RCS | 1980 patients | Breast cancer | ✔ | ✔ | ✔ | ||||||
| Mobadersany [ | 2018 | USA | RCS | 769 patients | Gliomas | ✔ | ✔ | ✔ | ||||||
| Huang [ | 2019 | USA, China | RCS | 583 patients | Breast cancer | ✔ | ✔ | ✔ | ✔ | |||||
| Wang [ | 2019 | China | MC RCS | 245 patients | HGSOC | ✔ | ✔ | |||||||
| Shao [ | 2020 | China | RCS | 1324 patients | LUSC, breast cancer, LIHC | ✔ | ✔ | |||||||
| Chen [ | 2020 | USA | RCS | 1186 patients | Glioma and ccRCC | ✔ | ✔ | ✔ | ||||||
| Hao [ | 2020 | USA | RCS | 447 patients | Glioblastoma multiforme | ✔ | ✔ | ✔ | ||||||
| Ning [ | 2020 | Germany | RCS | 209 patients | ccRCC | ✔ | ✔ | ✔ | ||||||
| Zhang [ | 2021 | China | RCS | 454 patients | Bladder cancer | ✔ | ✔ | ✔ | ||||||
| Chai [ | 2021 | China | RCS | 5032 patients | 15 types of cancer | ✔ | ✔ | ✔ | ||||||
| Vale-Silva [ | 2021 | Germany | RCS | 11,081 patients | 33 types of cancer | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||
| Wang [ | 2021 | China | RCS | Not specified | 7 types of cancer | ✔ | ✔ | ✔ | ||||||
| Poirion [ | 2021 | USA | RCS | 10,000 samples | 32 types of cancer | ✔ | ✔ | |||||||
Abbreviations: AP, Anatomopathological; G, Genomics; E, Epigenomics; T, Transcriptomics; P, Proteomics; RCS, retrospective cohort study; MC, multi-centric; PCS, prospective cohort study; SI, single-institution; CS, cross-sectional; LUAD, Lung Adenocarcinoma; ccRCC, Clear Cell Renal Cell Carcinoma; NSCLC, Non-Small-Cell Lung Cancer; HGSOC, High-grade serous ovarian cancer; LUSC, Lung Squamous Cell Carcinoma; LIHC, Liver Hepatocellular Carcinoma.
¹ MC and SI terms were assigned when available.
² Number of patients used to build, train, and internally validate the predictive model.
Clinical variables by subtypes and bibliographical reference to the article in which they appear.
| Subtype of Clinical Data | Variables | Used by |
|---|---|---|
| Demographic data | Age at diagnosis | [ |
| | Gender | [ |
| | Ethnicity | [ |
| General measures of health status | BMI | [ |
| | Temperature | [ |
| | Respiration rate | [ |
| | Systolic and diastolic blood pressure | [ |
| | Heart rate | [ |
| | Menopausal status | [ |
| | Lifestyle (e.g., smoking habit) | [ |
| | Prior malignancies | [ |
| | Presence/absence of comorbidities (e.g., hypercholesterolemia, hypertension, diabetes mellitus, synchronous malignancies, etc.) | [ |
| | Number of comorbidities | [ |
| | Risk factors (e.g., high-sensitivity C-reactive protein, etc.) | [ |
| Laboratory tests | Blood cell count (e.g., leukocytes, platelets) | [ |
| | Haemoglobin level | [ |
| | Serum metabolite/enzyme levels (e.g., sugar, urea, creatinine, bicarbonate, albumin, lactate dehydrogenase, etc.) | [ |
| Surgery-related data | Surgery time | [ |
| | Median blood loss | [ |
| | Presence of intraoperative complications | [ |
| | Type of complications | [ |
| | Length of hospital stay | [ |
| Pathological data | Mode of detection (clinical or screening) | [ |
| | Cancer type (primary site) | [ |
| | Cellularity of tumour content | [ |
| | Degree of abnormality of cancer cells | [ |
| | Primary tumour laterality | [ |
| | Primary tumour size | [ |
| | Presence/absence of multifocal tumours | [ |
| | Surgery status | [ |
| | Type of surgery | [ |
| | Resection extent | [ |
| | Parametrial involvement (in cervical cancer) | [ |
| | Skin or chest wall invasion (in breast cancer) | [ |
| | Lymph node status | [ |
| | Number of positive lymph nodes | [ |
| | Lymph node involvement ratio | [ |
| | Lymphovascular space invasion | [ |
| | Deep stromal invasion | [ |
| | Histologic type and subtype | [ |
| | Histological grade | [ |
| | T stage | [ |
| | N stage | [ |
| | M stage | [ |
| | Stage (e.g., pTNM, NPI, FIGO staging system) | [ |
| | Number of brain metastases | [ |
| | Presence/absence of distant metastasis at diagnosis | [ |
| Therapy-related data | Prior treatment | [ |
| | Radiotherapy (yes/no) | [ |
| | Chemotherapy (yes/no) | [ |
| | Targeted therapy (yes/no) (e.g., hormonal therapy, anti-HER2 therapy, etc.) | [ |
| | Response to chemotherapy (complete/partial/none) | [ |
| | Karnofsky Performance Status (KPS) | [ |
Abbreviations: BMI, Body Mass Index; pTNM, pathological Tumour-Node-Metastasis staging system for cancers of the American Joint Committee on Cancer (AJCC); NPI, Nottingham Prognostic Index; FIGO, Fédération Internationale de Gynécologie et d’Obstétrique.
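The NPI listed in the staging row above combines three pathological variables from this table into a single score. As a minimal sketch, assuming the standard published definition (0.2 × tumour size in cm + lymph node stage + histological grade, where node stage is 1 for node-negative disease, 2 for 1 to 3 positive nodes and 3 for 4 or more), it can be computed as follows; the function name is hypothetical and not taken from any of the reviewed studies.

```python
def nottingham_prognostic_index(size_cm: float, positive_nodes: int, grade: int) -> float:
    """Return the Nottingham Prognostic Index (NPI) for a breast tumour.

    NPI = 0.2 * tumour size (cm) + lymph node stage + histological grade.
    """
    if size_cm < 0:
        raise ValueError("tumour size must be non-negative")
    if positive_nodes < 0:
        raise ValueError("number of positive lymph nodes must be non-negative")
    if grade not in (1, 2, 3):
        raise ValueError("histological grade must be 1, 2 or 3")

    # Lymph node stage: 1 = no positive nodes, 2 = 1-3 positive nodes, 3 = 4 or more.
    if positive_nodes == 0:
        node_stage = 1
    elif positive_nodes <= 3:
        node_stage = 2
    else:
        node_stage = 3

    return 0.2 * size_cm + node_stage + grade


if __name__ == "__main__":
    # Hypothetical patient: 2.0 cm tumour, 2 positive nodes, grade 2.
    print(round(nottingham_prognostic_index(2.0, 2, 2), 2))  # prints 4.4
```

Keeping the node-stage mapping explicit makes it easy to see how a count variable from the table ("Number of positive lymph nodes") is discretised before entering the index.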
Summary of types of omics data, methods used to obtain them according to the reviewed papers, and variables used to build predictive models, along with the bibliographical reference.
| Type of Omics Data | Methods | Variables | Used by |
|---|---|---|---|
| Genomics | WGS | Germline variants | [ |
| | | Somatic point mutations | [ |
| | | Mutational status of genes | [ |
| | | CNAs | [ |
| | | CNB | [ |
| | | TMB | [ |
| Epigenomics | DNA methylation arrays | DNA methylation data | [ |
| Transcriptomics | RNA-Seq | mRNA levels | [ |
| | | miRNA levels | [ |
| | | Gene expression profiles | [ |
| Proteomics | RPPA | Protein expression levels | [ |
Abbreviations: WGS, Whole Genome Sequencing; WES, Whole Exome Sequencing; RPPA, Reverse-Phase Protein Arrays; CNAs, Copy Number Aberrations; SNVs, Single-Nucleotide Variants; CNB, Copy Number Burden; TMB, Tumour Mutation Burden. The Copy Number Burden is a measure of the copy number alteration level within a genome in proportion to the genome length. The Tumour Mutation Burden (TMB) represents the number of somatic mutations per megabase of interrogated genomic sequence. Both are used as predictive biomarkers in cancer.
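The two burden measures defined above reduce to simple ratios. The sketch below is illustrative only: it assumes a diploid baseline copy number of 2, non-overlapping half-open segments given as hypothetical `(start, end, copy_number)` tuples, and a known number of interrogated bases. Published pipelines differ in how they call alterations and define the denominator.

```python
from typing import Iterable, Tuple

Segment = Tuple[int, int, float]  # hypothetical (start, end, copy_number), half-open [start, end)


def tumour_mutation_burden(n_somatic_mutations: int, interrogated_bases: int) -> float:
    """Somatic mutations per megabase (10^6 bases) of interrogated sequence."""
    if interrogated_bases <= 0:
        raise ValueError("interrogated_bases must be positive")
    return n_somatic_mutations / (interrogated_bases / 1_000_000)


def copy_number_burden(segments: Iterable[Segment], genome_length: int, baseline: float = 2.0) -> float:
    """Fraction of the genome covered by segments whose copy number differs from the baseline."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    altered_bases = sum(end - start for start, end, copy_number in segments if copy_number != baseline)
    return altered_bases / genome_length


if __name__ == "__main__":
    # Hypothetical exome: 300 somatic mutations over 30 Mb of interrogated sequence.
    print(tumour_mutation_burden(300, 30_000_000))  # prints 10.0
    segments = [(0, 100, 2.0), (100, 150, 3.0), (150, 200, 1.0)]
    print(copy_number_burden(segments, genome_length=400))  # prints 0.25
```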
Summary of non-omics data that appear in the reviewed papers. Methods used to obtain them and the variables containing the pertinent information to build the models are listed along with the bibliographical reference.
| Type of Molecular Data | Methods | Variables | Used by |
|---|---|---|---|
| IHC data | Immunohistochemistry | Presence/absence of proteins in tumour tissue (e.g., ER, PR, Ki-67) | [ |
| | | Percentage of protein expression in tumour tissue (e.g., ER, Ki-67, etc.) | [ |
| | | Over-expression of proteins in tumour tissue (e.g., HER-2) | [ |
| Genetic data | PCR-based | The molecular subtype of cancer (luminal A, luminal B, HER-2 positive luminal B, non-luminal HER-2 positive, triple-negative) | [ |
| | | Somatic point mutations (e.g., | [ |
| | | Mutational status of genes | [ |
Abbreviations: IHC, Immunohistochemistry; PCR, Polymerase Chain Reaction; ER, estrogen receptors; PR, progesterone receptors; HER-2, Human Epidermal Growth Factor receptor 2; IDH, Isocitrate Dehydrogenase gene.
Summary of image techniques, methods, and features extracted in the reviewed studies to build predictive models on cancer prognosis.
| Methods | Type of Data | Features | Used by |
|---|---|---|---|
| Image segmentation and hand-crafted features | WSIs | Quantitative image features | [ |
| | | ROIs from WSIs | [ |
| | MRI images | Quantitative image features | [ |
| | CT images | ROIs | [ |
Abbreviations: WSIs, Whole-Slide Images; MRI, Magnetic Resonance Imaging; CT, Computed Tomography; ROIs, Regions of Interest.
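To make the notion of hand-crafted quantitative image features concrete, the sketch below computes three first-order (intensity histogram) features inside a segmented ROI. It is an illustration under stated assumptions, not the feature set of any reviewed study: the image is a hypothetical 2D grid of integer grey levels and the segmentation is a boolean mask of the same shape.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def first_order_features(image: Sequence[Sequence[int]], mask: Sequence[Sequence[bool]]) -> Dict[str, float]:
    """Mean, variance and Shannon entropy (bits) of grey levels inside a binary ROI mask."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    # Keep only the pixels that the segmentation mask marks as inside the ROI.
    values = [px for row, mask_row in zip(image, mask) for px, inside in zip(row, mask_row) if inside]
    if not values:
        raise ValueError("the mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    counts = Counter(values)  # grey-level histogram of the ROI
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"n_pixels": float(n), "mean": mean, "variance": variance, "entropy": entropy}


if __name__ == "__main__":
    image = [[10, 10, 20], [20, 30, 99]]
    mask = [[True, True, True], [True, True, False]]  # the 99 pixel lies outside the ROI
    print(first_order_features(image, mask))
```

Real radiomics or pathomics pipelines add shape and texture descriptors on top of such intensity statistics; the point here is only that each feature is an explicit, human-designed function of the segmented pixels.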
Information related to the techniques used in the articles that applied a conventional statistical approach when building cancer prognosis predictive models.
| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison |
|---|---|---|---|---|---|---|---|
| Zhu [ | SuperPC regression | 10-fold CV | HR and log-rank test | HR; dichotomization of patients into high-risk and low-risk groups | ✔ | ||
| Cheng [ | Lasso–Cox model | 10-fold CV | Log-rank test | Risk index of death | ✔ | ||
| Dos Reis [ | Multivariate CPH regression within a multivariable fractional polynomial model | No | AUC | Risk index of death at 10 years | ✔ | ✔ | ✔ |
| Sperduto [ | Multivariate multiple CPH regression | No | None | Lung-molGPA score | ✔ | ||
| Elwood [ | Multivariate CPH regression | Bootstrapping for internal and external validation | C-index | Predicted OS (months) at 10 years | ✔ | ✔ | |
| Matsuo [ | Multivariate CPH regression | 10-fold CV | MAE, C-index | Survival risk index, PFS, and OS | ✔ |
Abbreviations: SuperPC, supervised Principal Components; CPH, Cox Proportional Hazards; CV, cross-validation; HR, Hazard Ratio; AUC, Area Under the Curve; MAE, Mean Absolute Error; OS, Overall Survival; PFS, Progression-Free Survival.
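Several of the models above are evaluated with the C-index. A minimal sketch of Harrell's concordance index for right-censored data follows; it assumes that a higher risk score means shorter expected survival and, as a simplification, skips pairs with tied survival times. The function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool], risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and t_i < t_j.
    It is concordant when subject i also received the higher risk score;
    tied risk scores count as half-concordant. Pairs with tied times are skipped.
    """
    n = len(times)
    if len(events) != n or len(risk_scores) != n:
        raise ValueError("times, events and risk_scores must have the same length")
    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5.0, 8.0, 12.0, 20.0]
    events = [True, True, False, True]
    risk = [0.9, 0.7, 0.4, 0.2]
    print(harrell_c_index(times, events, risk))  # prints 1.0
```

A value of 0.5 corresponds to random ordering of patients and 1.0 to perfect ordering, which is why the C-index is the most common metric across the three tables of predictive models.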
Information related to the techniques used in the articles that applied machine learning techniques when building cancer prognosis predictive models.
| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison |
|---|---|---|---|---|---|---|---|
| Matsuo [ | DNN | 10-CV | MAE, C-index | Predicted OS and PFS | ✔ | ||
| Mohebian [ | BDT | Bagging, hold-out and 4-CV | | Patient dichotomization | ✔ | ✔ | |
| Obrzut [ | PNN, MLP, GEP, SVM, RBFNN, and K-means | 10-CV | | Predicted OS at 5 years | ✔ | | |
| Zhu [ | MOK | Monte Carlo CV | C-index | Predicted overall prognostic score | ✔ | ✔ | ✔ |
| Chaudhary [ | DL-based model | 5-CV and 10-CV | C-index, log-rank | Patient dichotomization | ✔ | ✔ | ✔ |
| Sun [ | SimpleMKL | 10-CV | AUC, | Patient dichotomization | ✔ | ✔ | |
| Zhang [ | ANN, K-means, SVM, and XGBoost | 10-CV | AUC | Predicted OS and patient dichotomization | ✔ | ✔ | ✔ |
| Zhao [ | Gradient Boosting, RF, SVM, and ANN | 10-CV | ROC curve, | Patient dichotomization | ✔ | ✔ | |
| Cheerla [ | DNN | Hold-out | C-index | Predicted OS | ✔ | ||
| Ferroni [ | MKL based on SVM | 3-CV | AUC, | Patient dichotomization | |||
| Jing [ | DNN | Bootstrapping | C-index | Predicted DFS and patient dichotomization | ✔ | ✔ | |
| Sun [ | DNN | 10-CV | ROC curve, AUC, | Patient dichotomization | ✔ | ✔ | ✔ |
| Tapak [ | NB, RF, AdaBoost, SVM, LS-SVM, AdaBag | Hold-out | | Patient dichotomization | ✔ | | |
| Baek [ | SVM, LR, L2RR, RF | Hold-out and 5-CV | | Predicted DFS and OS at 5 years | ✔ | ✔ | |
| Boeri [ | SVM, ANN | 3-CV | | Risk of recurrence and risk of death | ✔ | | |
| Choi [ | RSF | Bagging | iAUC | Predicted OS and patient dichotomization | ✔ | ✔ | |
| Zhang [ | MKL based on SVM | 10-CV | AUC | Patient dichotomization | ✔ | ✔ | |
| Arya [ | Ensemble of CNNs and RF | 10-CV | AUC, | Patient dichotomization | ✔ | ✔ | ✔ |
| Tong [ | ANN | 4-CV | C-index | HR | ✔ | ||
| Owens [ | DL-based model | Not detailed | Silhouette score, log-rank | Patient dichotomization | ✔ | ✔ | |
| Malik [ | DL-based model | 10-CV | AUC, | Patient dichotomization | ✔ | ✔ | ✔ |
| Zhao [ | ANN | 10-CV | C-index | Patient dichotomization | ✔ | ✔ | |
| Hassanzadeh [ | DL-based model | Hold-out and 5-CV | | Patient dichotomization | ✔ | ✔ | |
| Zhang [ | DL-based model | Not detailed | C-index, IBS | Predicted OS | ✔ | ✔ | |
| Chharia [ | DL-based model | 5-CV | Precision, | Probability of survival and patient dichotomization | ✔ | ✔ |
Abbreviations: BDT, Bagged Decision Tree; PNN, Probabilistic Neural Network; MLP, Multilayer Perceptron; GEP, Gene Expression Programming; RBFNN, Radial Basis Function Neural Network; MOK, Multi-Omic Kernel; DL, Deep Learning; DNN, Deep Neural Network; MKL, Multiple Kernel Learning; SVM, Support Vector Machine; NB, Naïve Bayes; LS-SVM, Least-Squares Support Vector Machine; LR, Logistic Regression; L2RR, L2 Regularised regression; RF, Random Forest; ANN, Artificial Neural Network; XGBoost, Extreme Gradient Boosting; RSF, Random Survival Forest; CV, cross-validation; Sn, Sensitivity; Sp, Specificity; Acc, accuracy; AUC, Area Under the Curve; MCC, Matthews Correlation Coefficient; +LR, Positive Likelihood Ratio; -LR, Negative Likelihood Ratio; DOR, Diagnostic Odds Ratio; DP, Discriminant Power; κ, Cohen’s kappa coefficient; ROC, Receiver Operating Characteristic; CS, Calibration Slope; LR, Likelihood Ratio; FPR, False Positive Rate; HR, Hazard Ratio; MAE, Mean Absolute Error; PPV, Positive Predictive Value; NPV, Negative Predictive Value; BS, Brier Score; IBS, Integrated Brier Score; iAUC, integrated Area Under the Curve; OS, Overall Survival; DFS, Disease-Free Survival; PFS, Progression-Free Survival.
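Many of the machine learning models above output a dichotomization of patients into high- and low-risk groups, which is then typically assessed with a log-rank test. The following is a minimal two-group log-rank sketch under the usual assumptions (right-censored data, hypergeometric variance at each distinct event time). The p-value uses the chi-square distribution with one degree of freedom, whose survival function equals `erfc(sqrt(x / 2))`. The function name and group coding are hypothetical.

```python
import math
from typing import Sequence, Tuple


def log_rank_test(times: Sequence[float], events: Sequence[bool], groups: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    groups[k] is 1 for the first group (e.g., predicted high risk) and 0 otherwise.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    n = len(times)
    if len(events) != n or len(groups) != n:
        raise ValueError("inputs must have the same length")
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_1 = expected_1 = variance = 0.0
    for t in event_times:
        at_risk = [k for k in range(n) if times[k] >= t]
        n_total = len(at_risk)
        n_1 = sum(1 for k in at_risk if groups[k] == 1)
        d_total = sum(1 for k in at_risk if times[k] == t and events[k])
        d_1 = sum(1 for k in at_risk if times[k] == t and events[k] and groups[k] == 1)
        observed_1 += d_1
        expected_1 += d_total * n_1 / n_total  # expected events in group 1 under H0
        if n_total > 1:
            # Hypergeometric variance of the number of events in group 1 at time t.
            variance += d_total * (n_1 / n_total) * (1 - n_1 / n_total) * (n_total - d_total) / (n_total - 1)
    if variance == 0.0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_1 - expected_1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square with 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical cohort: predicted high-risk patients (group 1) die earlier.
    times = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    events = [True] * 6
    groups = [1, 1, 1, 0, 0, 0]
    print(log_rank_test(times, events, groups))
```

In this toy cohort the statistic is about 5.05 (p of roughly 0.025), illustrating how a well-separated dichotomization yields a small log-rank p-value.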
Information related to the techniques used in the articles that applied a mixed approach (conventional statistics together with machine learning) when building cancer prognosis predictive models.
| First Author & Reference | Predictive Modelling | Validation Technique(s) | Performance Metrics | Model Output | Dimensionality Reduction | External Validation | Model Comparison |
|---|---|---|---|---|---|---|---|
| Yousefi [ | DL-CPH | Monte Carlo CV | C-index | Risk index of death, correlated to OS | ✔ | ||
| Katzman [ | DL-CPH | Bootstrapping, | C-index | HR | ✔ | ✔ | |
| Mobadersany [ | DL-CPH | Monte Carlo CV | C-index | HR | ✔ | ||
| Huang [ | DL-CPH | 5-CV | C-index, log-rank test | HR and patient dichotomization | ✔ | ✔ | |
| Wang [ | DL-CPH | Not detailed | C-index, AUC, | Risk index of recurrence (correlated with RFS), patient dichotomization, and recurrence probability at a specific time point | ✔ | ✔ | |
| Shao [ | AdaBoost for diagnosis and CPH for prognosis | 5-CV | C-index, BS | Risk index of death and patient dichotomization | ✔ | ✔ | |
| Chen [ | DL-CPH | 15-CV | C-index | Patient dichotomization | ✔ | ||
| Hao [ | DL-CPH | Not detailed | C-index | Patient dichotomization | ✔ | ✔ | |
| Ning [ | DL-CPH | 10-CV | C-index | Patient dichotomization | ✔ | ||
| Chai [ | DL-CPH | Not detailed | C-index | Patient dichotomization | ✔ | ✔ | ✔ |
| Vale-Silva [ | DNN-based model | Hold-out | Ctd, IBS | Conditional survival probability for 1 to 30 years | ✔ | ✔ | |
| Wang [ | NMF-CPH | 3-CV | C-index | Survival probability and patient dichotomization | ✔ | ✔ | ✔ |
| Poirion [ | Ensemble of DL and SVM models | Hold-out and 5-CV | Log-rank | Patient’s risk of death | ✔ | ✔ | ✔ |
| Zhang [ | DL-CPH | 10-CV | AUC | Patient dichotomization | ✔ | ✔ |
Abbreviations: CPH, Cox Proportional Hazards; ML, Machine Learning; DL-CPH, Deep Learning Cox Proportional Hazards; DNN, Deep Neural Networks; NMF, non-negative matrix factorisation; CV, cross-validation; LOO-CV, Leave-One-Out Cross-Validation; AUC, Area Under the Curve; Acc, accuracy; BS, Brier Score; Ctd, time-dependent concordance index; IBS, integrated Brier Score; OS, Overall Survival; HR, Hazard Ratio; RFS, Recurrence-Free Survival.
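Most of the mixed-approach models are DL-CPH networks, which output a log-risk score per patient and are typically trained by minimising the negative Cox partial log-likelihood. The sketch below computes that objective with the Breslow handling of tied event times and a log-sum-exp for numerical stability. It is a generic illustration, not the exact loss of any reviewed study; in practice it would be written in an automatic-differentiation framework so that gradients flow back into the network.

```python
import math
from typing import Sequence


def cox_negative_partial_log_likelihood(
    log_risks: Sequence[float], times: Sequence[float], events: Sequence[bool]
) -> float:
    """Mean negative Cox partial log-likelihood (Breslow handling of tied event times).

    log_risks[i] is the model output for patient i (higher means higher hazard).
    """
    n = len(times)
    if len(log_risks) != n or len(events) != n:
        raise ValueError("inputs must have the same length")
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored patients only contribute through the risk sets
        # Risk set: everyone still under observation at the event time of patient i.
        risk_set = [log_risks[j] for j in range(n) if times[j] >= times[i]]
        m = max(risk_set)
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))  # log-sum-exp
        total += log_risks[i] - log_denominator
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0]
    events = [True, True, True]
    # Scores that rank early deaths as high risk give a lower loss than the reverse ordering.
    print(cox_negative_partial_log_likelihood([2.0, 1.0, 0.0], times, events))
    print(cox_negative_partial_log_likelihood([0.0, 1.0, 2.0], times, events))
```

Dividing by the number of events keeps the loss on a comparable scale across mini-batches; that normalisation is a common convention rather than a requirement of the Cox model.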
Summary of data sources used to build predictive models of cancer prognosis.
| Type of Data Source | Repositories & Programs/Studies | Used by |
|---|---|---|
| Public | ICGC Data Portal (e.g., Pan-Cancer Atlas Initiative, TCGA Program) | [ |
| | EGA (e.g., METABRIC Study) | [ |
| | GDC Data Portal (e.g., TARGET Program) | [ |
| | GEO | [ |
| | COSMIC | [ |
| | ArrayExpress Archive of Functional Genomics Data | [ |
| Institutional databases | N/A | [ |
Abbreviations: ICGC, International Cancer Genome Consortium; TCGA, The Cancer Genome Atlas; EGA, European Genome-Phenome Archive; METABRIC, Molecular Taxonomy of Breast Cancer International Consortium; GDC, Genomics Data Commons; GEO, Gene Expression Omnibus; COSMIC, Catalogue of Somatic Mutations in Cancer.