Multiomics and machine learning in lung cancer prognosis.

Year: 2020 PMID: 32944369 PMCID: PMC7475596 DOI: 10.21037/jtd-2019-itm-013

Source DB: PubMed Journal: J Thorac Dis ISSN: 2072-1439 Impact factor: 3.005

Worldwide, lung cancer accounts for 11.6% of total cancer cases; it is the most common cancer type and the leading cause of cancer death (1). Despite the development of technology and treatment, the prognosis of lung cancer remains poor (2-5). With the development of artificial intelligence technology and the advent of omics, including radiomics, proteomics, genomics, and transcriptomics (6-8), multiomics analysis based on machine learning has great potential to improve lung cancer prognosis. In this paper, schemes based on multiomics and machine learning for improving the prognosis of lung cancer are reviewed.

Radiomics, pathology, demographics, clinical data and machine learning in lung cancer prognosis

Currently, radiomics research and medical or biological research are usually carried out separately by researchers in different disciplines. However, with the emergence of the field of radiomics, the association between biomarkers and radiomics features has attracted increasing research interest (9).

As shown in Table 1, in 2010, Jayasurya et al. (10) used radiomic features from positron emission tomography (PET) images, pathological features, and performance status (WHO-PS) to develop two personalized prediction models, based on Bayesian networks (BNs) and support vector machines (SVMs), to predict the 2-year survival rate of patients with inoperable non-small cell lung cancer (NSCLC). The authors validated the models in three external validation cohorts from three centres. Among them, the area under the curve (AUC) of the prediction model based on BNs reached 0.82 in a cohort of 28 patients. However, the validation cohort in this work was small, and more research on the clinical utility of the model is needed to confirm the results.

In 2013, Sun et al. (11) attempted to distinguish benign from malignant lung lesions on computed tomography (CT) images during early diagnosis to improve prognosis. This group used SVMs and other classifiers, including neural networks, LASSO regression, boosting, random forests, decision trees, and k-nearest neighbours, to establish prediction models. A set of radiomics features, comprising 476 textural and 9 morphological features, and demographic parameters were used as input data. The AUC values for the SVM, neural networks, LASSO regression, boosting, random forests, decision trees, and k-nearest neighbours were 0.94, 0.92, 0.91, 0.86, 0.85, 0.73, and 0.72, respectively. Although the experimental results showed that the SVM-based model was effective, only 57 patients were included in the validation cohort.

Recently, Hyun et al. (12) performed a similar study in which a total of 44 demographic and radiomic features were used as input data for a machine-learning model to predict tumour histological subtype. To reduce the feature dimension, they applied a ranking-based feature selection method using the Gini coefficient: each radiomic and demographic feature was scored by its association with the histological class, and the features were then ranked by this score. Nine feature subsets, with sizes ranging from 5 to 44 in increments of 5, were evaluated to identify the optimal number of features. Five machine-learning algorithms for binary classification, namely, a random forest, a neural network, naïve Bayes, logistic regression, and an SVM, were compared. With a subset of 15 features, the logistic regression model (AUC =0.859) performed better than the other classifiers.
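The rank-then-subset procedure described for Hyun et al. (12) can be illustrated with a minimal sketch. It is not the authors' code. The score used here (the largest Gini impurity decrease obtainable from a single threshold split) is an assumed stand-in for their Gini coefficient score, the function names are hypothetical, and reading "5 to 44 in increments of 5" as 5, 10, ..., 40 plus the full set of 44 is an interpretation that happens to yield nine subsets.

```python
from typing import List, Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of binary (0/1) class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_gini_gain(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest impurity decrease from one threshold split on a feature.

    Assumption: this stands in for the per-feature Gini score in the text.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini_impurity([y for _, y in pairs])
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from highest to lowest Gini gain."""
    n_features = len(X[0])
    gains = [best_gini_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: gains[j], reverse=True)


def subset_sizes(n_features: int, step: int = 5) -> List[int]:
    """Candidate subset sizes: multiples of `step` below n_features, then all."""
    sizes = list(range(step, n_features, step))
    sizes.append(n_features)
    return sizes


# Toy data (hypothetical): feature 0 separates the classes, feature 2 is constant.
X_toy = [[0.1, 5.0, 1.0], [0.2, 1.0, 1.0], [0.8, 4.0, 1.0], [0.9, 2.0, 1.0]]
y_toy = [0, 0, 1, 1]
ranking = rank_features(X_toy, y_toy)
sizes = subset_sizes(44)
```

In a full pipeline, each classifier would be trained on the top-k ranked features for every k in `sizes`, and the k with the best validation AUC would be kept.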

Table 1

Summary of journal publications

Author	Data type	Model	Test-set size	AUC
Jayasurya et al. (10)	Radiomic features; pathological features; WHO-PS	Bayesian networks	28	0.82
Jayasurya et al. (10)	Radiomic features; pathological features; WHO-PS	Support vector machines	28	0.76
Sun et al. (11)	Radiomic features; demographic features	Support vector machines	57	0.94
		Neural networks		0.92
		LASSO regression		0.91
		Boosting		0.86
		Random forest		0.85
		Decision tree		0.73
Hyun et al. (12)	Radiomic features; demographic features	K-nearest neighbours	119	0.85
		Random forest		0.79
		Neural network		0.854
		Naïve Bayes		0.755
		Logistic regression		0.859
		Support vector machines		0.766

AUC, area under the curve.
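Because the models in Table 1 are compared by AUC, a short sketch of how that number can be computed may be useful. The function below uses the rank-based (Mann-Whitney) definition: the AUC is the fraction of positive-negative pairs in which the positive case receives the higher score, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions and are not data from any of the cited studies.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 of the 4 positive-negative pairs are ordered correctly.
auc_toy = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With a validation cohort as small as 28 or 57 patients, the number of positive-negative pairs is limited, which is one reason the AUC estimates from such cohorts carry wide uncertainty.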

Genomics, transcriptomics, genetics, proteomics and machine learning in lung cancer prognosis

In addition to radiomics, pathology, and demographics, the genomics, transcriptomics, genetics and proteomics of lung cancer prognosis have attracted research interest. Wang et al. (13) presented a method to construct a prediction model of EGFR mutation-induced drug resistance in lung cancer by combining pathological and demographic data with EGFR-inhibitor interaction patterns. In this method, they first translated mutations into 3D structures, then evaluated the binding free energies of the mutants and inhibitors and simulated the dynamics of the kinase mutant-inhibitor systems. The EGFR-inhibitor interaction was characterized by binding free energy components, including polar and nonpolar interactions, van der Waals forces and electrostatic interactions. The classification model was built with extreme learning machines. A comparison between a model using only the mutation feature and a model using multiomics features showed that the latter (classification accuracy of 95.83%) performed much better than the former (classification accuracy of 79.17%).

In 2015, Emaminejad et al. (14) integrated two genomic biomarkers and radiomic features to predict recurrence risk in patients with stage I NSCLC: they trained a multilayer perceptron-based model using two genomic features (protein expression scores of RRM1 and ERCC1) and a naïve Bayes classifier using eight redundant radiomic features. The AUC values of the multilayer perceptron classifier and the naïve Bayes classifier were 0.68±0.06 and 0.78±0.07, respectively. Moreover, the AUC value increased significantly (0.84±0.05, P<0.05) when the prediction scores generated by the two models were fused with an equal weighting factor.

In 2017, Yu et al. (15) used a random forest with transcriptomic and proteomic signatures to predict histology grade (AUC >0.80) and built integrative models that take histopathologic and transcriptomic features as input to a regularized Cox proportional hazards model; the integrative model outperformed transcriptomics or histopathology alone for prognostic prediction (P=0.0182±0.0021).

Additionally, Liu et al. (16) identified a novel cluster of prognostic biomarkers for lung adenocarcinoma (LAC) by multiomics analysis. In this work, five microarray datasets downloaded from the Gene Expression Omnibus database were progressively processed by genome-wide relative significance and global significance, and 200 genes able to stably distinguish between nontumour and tumour cells were determined by SVM assessment. These genes were then subjected to gene coexpression and protein-protein interaction (PPI) network analyses. CENPA, CDC20 and CDK1 were identified and validated as having high coexpression and strong PPI patterns in clinical samples, and they might serve as a novel cluster of prognostic biomarkers in LAC.

In 2018, Matsubara et al. (17) proposed an approach to lung cancer classification that uses PPI network and gene expression profile data as input features of a convolutional network, and compared it with other machine-learning models (random forest and SVM). The convolutional network-based model (accuracy of 83.16%) outperformed the SVM-based and random forest-based models (accuracies of 81.58% and 82.63%, respectively).

In 2019, Malik et al. (18) used copy number variations (CNVs), mutations, proteins, RNAs and mi-RNAs to develop three prediction models for LAC prognosis based on SVMs, a neural network and random undersampling (RUS) ensemble boosting, with accuracies of 72.7%, 92.9% and 66.7%, respectively.

To acquire more omics information, Lee et al. (19) investigated four data types, namely DNA methylation, RNA-Seq, CNVs and miRNA-Seq, to build a survival risk stratification model for LAC patients. They proposed an autoencoding approach to predict survival subtype and compared it with principal component analysis (PCA), Cox-ph and iClusterPlus. The autoencoding approach achieved a better log-rank P value (4.08e-09) and C-index (0.65) than the other approaches, indicating better prediction performance.

Recently, Giang et al. (20) presented a method that combines gene expression, miRNA expression and DNA methylation data to construct a classification model for lung cancer patient stratification. An SVM was used to build the classification model, and the approach using the integrated dataset was compared with approaches using only a single dataset. Table 2 shows the accuracy and AUC value of the models based on different datasets and models.
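The score fusion reported for Emaminejad et al. (14) can be sketched as a weighted average of the two classifiers' per-patient scores. The source states only that an equal weighting factor was used. The weighted-average form, the function name `fuse_scores` and the example scores below are assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_scores(scores_a: Sequence[float],
                scores_b: Sequence[float],
                weight_a: float = 0.5) -> List[float]:
    """Combine two models' prediction scores patient by patient.

    weight_a = 0.5 gives both models equal weight (assumed fusion rule).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(scores_a, scores_b)]


# Hypothetical recurrence-risk scores for three patients.
genomic_scores = [0.2, 0.8, 0.5]    # e.g. from the multilayer perceptron
radiomic_scores = [0.6, 0.4, 0.9]   # e.g. from the naive Bayes classifier
fused = fuse_scores(genomic_scores, radiomic_scores)
```

Fusing at the score level keeps the two models independent, so each can be trained on its own feature type and the weight can be tuned separately on validation data.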

Table 2

Accuracy and AUC value of journal publications based on different datasets and models

Author	Data type	Models	Accuracy (%)	AUC
Wang et al. (13)	Mutation features	Extreme learning machines	79.17	NA
Wang et al. (13)	Mutation features; pathological features; demographic features	Extreme learning machines	95.83	NA
Emaminejad et al. (14)	Genomic features	Multilayer perceptron; Naïve Bayes	NA	0.68
	Radiomic features			0.78
	Integrated dataset			0.84
Yu et al. (15)	Genomic features; transcriptomics/proteomics features; histopathology features	Random forest	NA	0.81
Matsubara et al. (17)	PPI network; gene expression	Convolutional networks	83.16	NA
		Random forest	82.63
		Support vector machines	81.58
Malik et al. (18)	Copy number variations; mutation; protein; RNA; mi-RNA	Support vector machines	72.7	NA
		Neural network	92.9
		RUS ensemble boost	66.7
Giang et al. (20)	Gene expression	Support vector machines	62.50	0.6964
	DNA methylation		71.88	0.6235
	mi-RNA expression		65.63	0.722
	Integrated dataset		78.13	0.7227

AUC, area under the curve; RUS, random undersampling; NA, not available; PPI, protein-protein interaction.
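The survival model of Lee et al. (19) is reported by its C-index rather than by accuracy or AUC, so it does not appear in Table 2. As a rough illustration of what a C-index of 0.65 measures, the sketch below implements a simple form of Harrell's concordance index: among comparable patient pairs, it is the fraction in which the patient with the shorter observed survival was assigned the higher risk. The function name and the toy data are assumptions, and handling of tied survival times is deliberately omitted.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort of four patients; the third is censored.
c_toy = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.1, 0.5, 0.7])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so 0.65 indicates moderate discrimination.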

Overall, a growing number of studies have combined machine learning with multiomics analysis to improve the prognosis of lung cancer (21-25), and radiomics, genetics, genomics, proteomics, and transcriptomics are widely employed in lung cancer research. Although the validation cohorts in many of these studies are relatively small, the results are sufficient to indicate that multiomics analysis based on machine learning has great potential in lung cancer prognosis, and more schemes in this field will be developed to improve prognosis for these patients. I hope that this review will be of use to researchers who conduct related research.

24 in total

Review 1. Integrating multiple 'omics' analysis for microbial biology: application and methodologies.

Authors: Weiwen Zhang; Feng Li; Lei Nie
Journal: Microbiology (Reading) Date: 2009-11-12 Impact factor: 2.777

2. Added Value of Computer-aided CT Image Features for Early Lung Cancer Diagnosis with Small Pulmonary Nodules: A Matched Case-Control Study.

Authors: Peng Huang; Seyoun Park; Rongkai Yan; Junghoon Lee; Linda C Chu; Cheng T Lin; Amira Hussien; Joshua Rathmell; Brett Thomas; Chen Chen; Russell Hales; David S Ettinger; Malcolm Brock; Ping Hu; Elliot K Fishman; Edward Gabrielson; Stephen Lam
Journal: Radiology Date: 2017-09-05 Impact factor: 11.105

3. Cancer statistics in China, 2015.

Authors: Wanqing Chen; Rongshou Zheng; Peter D Baade; Siwei Zhang; Hongmei Zeng; Freddie Bray; Ahmedin Jemal; Xue Qin Yu; Jie He
Journal: CA Cancer J Clin Date: 2016-01-25 Impact factor: 508.702

4. Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma.

Authors: Kun-Hsing Yu; Gerald J Berry; Daniel L Rubin; Christopher Ré; Russ B Altman; Michael Snyder
Journal: Cell Syst Date: 2017-11-15 Impact factor: 10.304

5. Examination of Independent Prognostic Power of Gene Expressions and Histopathological Imaging Features in Cancer.

Authors: Tingyan Zhong; Mengyun Wu; Shuangge Ma
Journal: Cancers (Basel) Date: 2019-03-13 Impact factor: 6.639

6. Molecular characterization of clinical responses to PD-1/PD-L1 inhibitors in non-small cell lung cancer: Predictive value of multidimensional immunomarker detection for the efficacy of PD-1 inhibitors in Chinese patients.

Authors: Peng Song; Xiaoxia Cui; Li Bai; Xiangdong Zhou; Xiaoli Zhu; Jian Zhang; Faguang Jin; Jianping Zhao; Chengzhi Zhou; Yanbin Zhou; Xiaoju Zhang; Kai Wang; Qi Wang; Yao Yu; Xiaoyu Zhang; Chunxue Bai; Li Zhang
Journal: Thorac Cancer Date: 2019-04-23 Impact factor: 3.500

7. Integrative and comparative genomic analyses identify clinically relevant pulmonary carcinoid groups and unveil the supra-carcinoids.

Authors: N Alcala; N Leblay; A A G Gabriel; L Mangiante; D Hervas; T Giffon; A S Sertier; A Ferrari; J Derks; A Ghantous; T M Delhomme; A Chabrier; C Cuenin; B Abedi-Ardekani; A Boland; R Olaso; V Meyer; J Altmuller; F Le Calvez-Kelm; G Durand; C Voegele; S Boyault; L Moonen; N Lemaitre; P Lorimier; A C Toffart; A Soltermann; J H Clement; J Saenger; J K Field; M Brevet; C Blanc-Fournier; F Galateau-Salle; N Le Stang; P A Russell; G Wright; G Sozzi; U Pastorino; S Lacomme; J M Vignaud; V Hofman; P Hofman; O T Brustugun; M Lund-Iversen; V Thomas de Montpreville; L A Muscarella; P Graziano; H Popper; J Stojsic; J F Deleuze; Z Herceg; A Viari; P Nuernberg; G Pelosi; A M C Dingemans; M Milione; L Roz; L Brcic; M Volante; M G Papotti; C Caux; J Sandoval; H Hernandez-Vargas; E Brambilla; E J M Speel; N Girard; S Lantuejoul; J D McKay; M Foll; L Fernandez-Cuesta
Journal: Nat Commun Date: 2019-08-20 Impact factor: 14.919

8. Personalized prediction of EGFR mutation-induced drug resistance in lung cancer.

Authors: Debby D Wang; Weiqiang Zhou; Hong Yan; Maria Wong; Victor Lee
Journal: Sci Rep Date: 2013-10-04 Impact factor: 4.379

9. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach.

Authors: Hugo J W L Aerts; Emmanuel Rios Velazquez; Ralph T H Leijenaar; Chintan Parmar; Patrick Grossmann; Sara Carvalho; Sara Cavalho; Johan Bussink; René Monshouwer; Benjamin Haibe-Kains; Derek Rietveld; Frank Hoebers; Michelle M Rietbergen; C René Leemans; Andre Dekker; John Quackenbush; Robert J Gillies; Philippe Lambin
Journal: Nat Commun Date: 2014-06-03 Impact factor: 14.919

10. Stratifying patients using fast multiple kernel learning framework: case studies of Alzheimer's disease and cancers.

Authors: Thanh-Trung Giang; Thanh-Phuong Nguyen; Dang-Hung Tran
Journal: BMC Med Inform Decis Mak Date: 2020-06-16 Impact factor: 2.796
