Jie Shen1, Liang Qi1, Zhengyun Zou1, Juan Du1, Weiwei Kong1, Lianjun Zhao1, Jia Wei1, Ling Lin2, Min Ren2, Baorui Liu3.
Abstract
Hepatocellular carcinoma (HCC) is a common malignant tumor in China. In the present study, we aimed to construct and verify a prediction model of recurrence in HCC patients using databases (TCGA, AMC and Inserm) and machine learning methods, and to obtain a gene signature that could predict early relapse of HCC. Statistical methods in R, including feature selection, survival analysis and the chi-square test, were used to analyze and select mutant genes related to disease-free survival (DFS), race and vascular invasion. In addition, whole-exome sequencing was performed on 10 HCC patients recruited from our center, and the sequencing results were compared with the databases. Using the databases and machine learning methods, the recurrence prediction model was constructed and optimized, and the selected mutant genes were verified in the test group. The accuracy of prediction was 74.19%. Moreover, the 10 patients from our center were used to verify these mutant genes and the prediction model, and a success rate of 80% was achieved. Collectively, we discovered recurrence-related genes and established a recurrence prediction model for HCC patients, which could provide significant guidance for clinical prediction of recurrence.
Year: 2020 PMID: 32157118 PMCID: PMC7064516 DOI: 10.1038/s41598-020-61298-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (A) A total of 31 genes with significant differences in DFS were selected from the TCGA database. Brown indicates that the gene also differs significantly in the AMC database. Blue indicates that the gene also differs significantly in the AMC database but is not a high-frequency mutation. Purple indicates that the gene also differs significantly in the AMC database, but in the opposite direction. (B) A total of 15 genes with significant differences in DFS were selected from the AMC database. Brown indicates that the gene also differs significantly in the TCGA database. Purple indicates that the gene also differs significantly in the AMC database, but in the opposite direction.
Race-related gene analysis.
| Gene | Mutation Type | Asian | Non-Asian | P |
|---|---|---|---|---|
| DNAH5 | Mutation | 9 | 1 | 0.006153 |
| | Wild Type | 149 | 197 | |
| MKI67 | Mutation | 9 | 1 | 0.006153 |
| | Wild Type | 149 | 197 | |
| KRT10 | Mutation | 7 | 1 | 0.02423 |
| | Wild Type | 151 | 197 | |
| COL6A3 | Mutation | 1 | 13 | 0.009685 |
| | Wild Type | 157 | 185 | |
| DNAH3 | Mutation | 8 | 3 | 0.06748 |
| | Wild Type | 150 | 195 | |
| CACNA2D1 | Mutation | 7 | 2 | 0.08371 |
| | Wild Type | 151 | 196 | |
| PIK3CA | Mutation | 8 | 3 | 0.06748 |
| | Wild Type | 150 | 195 | |
| PCDHB16 | Mutation | 9 | 3 | 0.06063 |
| | Wild Type | 149 | 195 | |
| DMD | Mutation | 12 | 6 | 0.08735 |
| | Wild Type | 146 | 192 | |
| EPB41L3 | Mutation | 8 | 3 | 0.06748 |
| | Wild Type | 150 | 195 | |
| AHNAK | Mutation | 10 | 8 | 0.4619 |
| | Wild Type | 148 | 190 | |
| FLG | Mutation | 16 | 8 | 0.03914 |
| | Wild Type | 142 | 190 | |
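
Each P value in the race table summarizes a 2×2 comparison of mutation status against race. As a minimal sketch of how such a value can be computed, the Python snippet below implements a two-sided Fisher's exact test directly from the hypergeometric distribution. The function name `fisher_exact_two_sided` is illustrative and not from the source. The source does not state which test or software settings produced each P value, so the output should not be expected to reproduce the P column exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are mutation status (mutation, wild type); columns are the two
    groups being compared. This is an illustrative helper, not the
    authors' code.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def pmf(k):
        # Hypergeometric probability of k first-row cases in column 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = pmf(a)
    # Sum every table that is at most as likely as the observed one.
    total = sum(p for p in (pmf(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Counts taken from the DNAH5 and AHNAK rows of the table.
p_dnah5 = fisher_exact_two_sided(9, 1, 149, 197)
p_ahnak = fisher_exact_two_sided(10, 8, 148, 190)
print(f"DNAH5: p = {p_dnah5:.4g}")
print(f"AHNAK: p = {p_ahnak:.4g}")
```

The exact test is used here because several cells hold fewer than ten patients, where large-sample approximations are less reliable.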
Vascular invasion-related genes.
| Gene | Boruta algorithm* (P values) | Fisher’s test and Pearson’s test (P values) | ||||
|---|---|---|---|---|---|---|
| TCGA | Inserm | AMC | TCGA | Inserm | AMC | |
| AKAP6 | P < 0.05 | 0.1862 | 1.0000 | 0.3251 | ||
| OBSCN | P < 0.05 | 0.0210 | 0.5661 | 0.5330 | ||
| TSC2 | P < 0.05 | P < 0.05 | 0.1285 | 0.0317 | 0.6768 | |
| LAMA1 | P < 0.05 | 0.2509 | 0.7299 | 0.2885 | ||
| BIRC6 | P < 0.05 | 0.7260 | 0.0415 | 0.3633 | ||
| DNAH5 | P < 0.05 | 0.8609 | 0.0171 | 0.5176 | ||
| PKHD1 | P < 0.05 | 0.1894 | 0.0415 | 0.6734 | ||
| KIAA1109 | P < 0.05 | 1.0000 | 0.0415 | 0.5599 | ||
| DYNC1H1 | P < 0.05 | 0.2714 | 0.0232 | 0.5149 | ||
| FCGBP | P < 0.05 | 0.5030 | 0.0735 | 0.6734 | ||
| FREM2 | P < 0.05 | 0.1894 | 0.1289 | 0.4228 | ||
| PLXNA1 | P < 0.05 | 0.3550 | 0.1817 | |||
| MUC12 | P < 0.05 | 1.0000 | 1.0000 | |||
| BSN | P < 0.05 | 0.2371 | 1.0000 | |||
| PLA2G4A | P < 0.05 | 0.2359 | 1.0000 | 0.0506 | ||
| LAMA2 | P < 0.05 | 0.7700 | 0.6626 | 0.1640 | ||
| PTPRZ1 | P < 0.05 | 0.7405 | 0.4576 | 0.06794 | ||
| CIT | P < 0.05 | 1.0000 | 1.0000 | 0.0866 | ||
*The Boruta algorithm is used for preliminary screening, with P < 0.05 as the preset condition. It does not report specific P values for the genes it screens out. After preliminary screening, Fisher’s test and Pearson’s test are used to calculate specific P values.
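
The footnote names Pearson's test as one of the follow-up calculations after preliminary screening. A minimal sketch of Pearson's chi-square test for a 2×2 table is given below, assuming no continuity correction and one degree of freedom. The function name `pearson_chi2_2x2` and the counts are hypothetical, since the vascular invasion table reports only P values and not the underlying patient counts.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p_value). Illustrative helper, not the authors' code.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one case")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail is P(Z^2 > chi2).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: mutated vs wild type, with vs without invasion.
chi2, p_value = pearson_chi2_2x2(10, 20, 20, 10)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```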
Figure 2. (A) Heat maps of somatic mutation, stage and age information in 10 patients with HCC. (B) Left: high-frequency mutant genes in the 10 patients (25 in total). Right: high-frequency mutant genes in the TCGA database (28 in total). Heat maps were generated for the 53 gene mutations in the 10 patients. The high-frequency TCGA mutations were not frequent in our 10 patients. (C) Comparison of high-frequency gene mutations between the 10 HCC patients in our center and the TCGA database. (D) GO and KEGG pathways involved in the 10 HCC patients in our center. (E) Circos plot of mutation information in the 10 HCC patients. (F) Venn diagram comparing mutant genes in the 10 HCC patients with TCGA mutant genes. (G) Clustering heat map of high-frequency mutant genes in the 10 HCC patients. (H) Heat map of driver gene mutations in the 10 HCC patients.
Figure 3. (A) The flow of the decision tree model; (B) the prediction weight of node genes in the decision tree; (C) the weight of each gene analyzed by the SVM model; (D) comparison of the ROC curves of the decision tree model and the SVM model.
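
Comparing ROC curves usually comes down to comparing the area under each curve (AUC). The sketch below computes AUC through its rank interpretation: the probability that a randomly chosen relapsed patient receives a higher score than a randomly chosen non-relapsed patient. The function name `roc_auc`, the labels and the scores are hypothetical. They are not the study's data or model outputs.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for relapse, 0 for no relapse.
    scores: model output, higher meaning more likely to relapse.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores from two models on the same four patients.
labels = [1, 1, 0, 0]
auc_model_a = roc_auc(labels, [0.9, 0.4, 0.6, 0.1])
auc_model_b = roc_auc(labels, [0.9, 0.8, 0.6, 0.1])
print(auc_model_a, auc_model_b)
```

The pairwise form is quadratic in the number of patients, which is acceptable for cohorts of a few hundred and avoids any dependence on a plotting or machine learning library.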
Figure 4. The whole study flow. (A) Kaplan-Meier survival analysis and the log-rank test were used to screen DFS-related mutant genes from the TCGA and AMC databases. These genes were then cross-verified in TCGA and AMC, and four DFS-related mutant genes were identified in both databases; (B) the Boruta algorithm, Fisher’s test and Pearson’s test were used to screen race (Asian/non-Asian)-associated mutations from the TCGA database; (C) the Boruta algorithm, Fisher’s test and Pearson’s test were used to screen vascular invasion-associated mutations from the TCGA, AMC and Inserm databases; (D) the HCC data in TCGA were used to construct a model for predicting recurrence, and then AMC and the 10 HCC patients in our center were used for verification.
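
Step (A) of the study flow rests on Kaplan-Meier estimation and the log-rank test. The sketch below shows both in plain Python for two groups, for example carriers and non-carriers of one mutation. The function names `kaplan_meier` and `logrank_p` and the follow-up times are hypothetical. The source reports using R, so this only illustrates the technique and is not the authors' implementation.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events: 1 if relapse was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            leaving += 1
            i += 1
        if relapses:
            survival *= 1 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test p value (chi-square, 1 degree of freedom)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    all_pairs = list(zip(times_a + times_b, events_a + events_b))
    event_times = sorted({t for t, e in all_pairs if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return erfc(sqrt(chi2 / 2))


# Hypothetical DFS times in months; every relapse observed (event = 1).
mutant = [1, 2, 3, 4, 5]
wild_type = [6, 7, 8, 9, 10]
p = logrank_p(mutant, [1] * 5, wild_type, [1] * 5)
print(kaplan_meier(mutant, [1] * 5))
print(f"log-rank p = {p:.4f}")
```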