Harpreet Kaur, Sherry Bhalla, Gajendra P S Raghava.
Abstract
BACKGROUND: Liver Hepatocellular Carcinoma (LIHC) is one of the major cancers worldwide, responsible for millions of premature deaths every year. Predicting the clinical stage is vital for choosing the optimal therapeutic strategy and for prognosis in cancer patients. However, to date, no method has been developed for predicting the stage of LIHC from the genomic profile of samples.
Year: 2019 PMID: 31490960 PMCID: PMC6730898 DOI: 10.1371/journal.pone.0221476
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. The workflow representing the analysis of the methylation and expression profiles of LIHC early and late stage samples.
The performance of different machine learning techniques-based stage classification models developed using 21 methylation CpG sites selected by WEKA (LS-CPG-WEKA).
| Machine Learning Technique | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC (95% CI) |
|---|---|---|---|---|---|---|
| SVM | Training | 80.43 | 71.63 | 75.99 | 0.52 | 0.81 (0.69–0.89) |
| SVM | Validation | 71.43 | 75 | 73.24 | 0.46 | 0.78 (0.67–0.89) |
| Random Forest | Training | 77.54 | 73.05 | 75.27 | 0.51 | 0.8 (0.69–0.89) |
| Random Forest | Validation | 71.43 | 83.33 | 77.46 | 0.55 | 0.79 (0.68–0.87) |
| Naïve Bayes | Training | 87.68 | 62.41 | 74.91 | 0.52 | 0.82 (0.73–0.89) |
| Naïve Bayes | Validation | 80 | 63.89 | 71.83 | 0.44 | 0.82 (0.76–0.84) |
| SMO | Training | 82.61 | 69.5 | 75.99 | 0.53 | 0.76 (0.68–0.83) |
| SMO | Validation | 74.29 | 75 | 74.65 | 0.49 | 0.75 (0.69–0.80) |
| J48 | Training | 62.32 | 75.18 | 68.82 | 0.38 | 0.67 (0.61–0.72) |
| J48 | Validation | 45.71 | 61.11 | 53.52 | 0.07 | 0.57 (0.52–0.63) |
Fig 2. (A) Heatmap displaying the differential methylation pattern (with FDR < 0.05) and (B) Circos plot representing the chromosomal location of the top 21 CpG sites (LS-CPG-WEKA) in early versus late stage of LIHC.
Gene enrichment analysis of LS-RNA-AUC signature using Enrichr.
This signature contains 61 downregulated and 39 upregulated RNA transcripts in the early stage compared with the late stage of LIHC.
| RNA transcripts (genes) downregulated in early stage (61): Name | Adjusted p-value | RNA transcripts (genes) upregulated in early stage (39): Name | Adjusted p-value | |
|---|---|---|---|---|
| Glycolysis_Homo sapiens_P00024 | 0.0001441 | Blood coagulation_Homo sapiens_P00011 | 0.00000434 | |
| Fructose galactose metabolism_Homo sapiens_P02744 | 0.001562 | Pyrimidine Metabolism_Homo sapiens_P02771 | 0.0004127 | |
| Ubiquitin proteasome pathway_Homo sapiens_P00060 | 0.01952 | |||
| Pentose phosphate pathway_Homo sapiens_P02762 | 0.04431 | |||
| Glycolysis / Gluconeogenesis_Homo sapiens_hsa00010 | 0.05063 | Metabolic pathways_Homo sapiens_hsa01100 | 1.74E-09 | |
| Retinol metabolism_Homo sapiens_hsa00830 | 6.70E-08 | |||
| Drug metabolism—other enzymes_Homo sapiens_hsa00983 | 0.001571 | |||
| Arachidonic acid metabolism_Homo sapiens_hsa00590 | 0.002873 | |||
| Drug metabolism—cytochrome P450_Homo sapiens_hsa00982 | 0.003016 | |||
| Metabolism of xenobiotics by cytochrome P450_Homo sapiens_hsa00980 | 0.003016 | |||
| Complement and coagulation cascades_Homo sapiens_hsa04610 | 0.003016 | |||
| Peroxisome_Homo sapiens_hsa04146 | 0.003016 | |||
| Pantothenate and CoA biosynthesis_Homo sapiens_hsa00770 | 0.003016 | |||
| Histidine metabolism_Homo sapiens_hsa00340 | 0.004777 | |||
| anterior cell cortex (GO:0061802) | 0.0002658 | smooth endoplasmic reticulum lumen (GO:0048238) | 0.004391 | |
| equatorial cell cortex (GO:1990753) | 0.0002658 | endoplasmic reticulum lumen (GO:0005788) | 0.004391 | |
| posterior cell cortex (GO:0061803) | 0.0002658 | cortical endoplasmic reticulum lumen (GO:0099021) | 0.004391 | |
| cell cortex region (GO:0099738) | 0.0002658 | perinuclear endoplasmic reticulum lumen (GO:0099020) | 0.004391 | |
| mitotic spindle midzone (GO:1990023) | 0.00211 | sarcoplasmic reticulum lumen (GO:0033018) | 0.004391 | |
| meiotic spindle midzone (GO:1990385) | 0.00211 | rough endoplasmic reticulum lumen (GO:0048237) | 0.004391 | |
| condensed nuclear chromosome outer kinetochore (GO:0000942) | 0.008363 | Golgi lumen (GO:0005796) | 0.01546 | |
| mitotic spindle pole (GO:0097431) | 0.012 | trans-Golgi network transport vesicle lumen (GO:0098564) | 0.01792 | |
| ficolin-1-rich granule lumen (GO:1904813) | 0.01519 | Golgi stack lumen (GO:0034469) | 0.02105 | |
| spindle microtubule (GO:0005876) | 0.0246 | peroxisomal matrix (GO:0005782) | 0.02848 | |
| positive regulation of mitotic metaphase/anaphase transition (GO:0045842) | 0.0000599 | exogenous drug catabolic process (GO:0042738) | 0.0003105 | |
| plant-type primary cell wall biogenesis (GO:0009833) | 0.0005205 | exogenous antibiotic catabolic process (GO:0042740) | 0.0003105 | |
| cellular bud neck septin ring organization (GO:0032186) | 0.0005205 | peptidyl-glutamic acid carboxylation (GO:0017187) | 0.0003105 | |
| cellular bud site selection (GO:0000282) | 0.0005205 | epoxygenase P450 pathway (GO:0019373) | 0.0003105 | |
| protein localization to mitotic actomyosin contractile ring (GO:1904498) | 0.0005205 | signal peptide processing (GO:0006465) | 0.002393 | |
| mitotic cytokinesis (GO:0000281) | 0.0005205 | negative regulation of platelet activation (GO:0010544) | 0.001191 | |
| mitotic cytokinetic process (GO:1902410) | 0.0005205 | fibrinolysis (GO:0042730) | 0.001191 | |
| regulation of mitotic spindle checkpoint (GO:1903504) | 0.0005315 | phytosteroid metabolic process (GO:0016128) | 0.001191 | |
| negative regulation of mitotic metaphase/ anaphase transition (GO:0045841) | 0.0005315 | C21-steroid hormone metabolic process (GO:0008207) | 0.001214 | |
| regulation of mitotic metaphase/anaphase transition (GO:0030071) | 0.001438 | (25S)-Delta(4)-dafachronate metabolic process (GO:1902057) | 0.001214 | |
| Rho GDP-dissociation inhibitor activity (GO:0005094) | 0.04347 | arachidonic acid epoxygenase activity (GO:0008392) | 0.0000015 | |
| Rab GDP-dissociation inhibitor activity (GO:0005093) | 0.04347 | arachidonic acid 11,12-epoxygenase activity (GO:0008405) | 0.0000015 | |
| arachidonic acid 14,15-epoxygenase activity (GO:0008404) | 0.0000015 | |||
| sodium-independent organic anion transmembrane transporter activity (GO:0015347) | 0.001676 | |||
| heme binding (GO:0020037) | 0.005818 | |||
| bile acid transmembrane transporter activity (GO:0015125) | 0.009954 | |||
| metal ion binding (GO:0046872) | 0.01658 | |||
| alkali metal ion binding (GO:0031420) | 0.01658 | |||
| lead ion binding (GO:0032791) | 0.01658 | |||
| transition metal ion binding (GO:0046914) | 0.01658 | |||
The performance of stage classification models developed using 30 RNA transcripts selected using WEKA from 103 RNA transcripts (LS-RNA-WEKA).
| Machine Learning Technique | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC (95% CI) |
|---|---|---|---|---|---|---|
| SVM | Training | 80.43 | 73.05 | 76.7 | 0.54 | 0.8 (0.69–0.89) |
| SVM | Validation | 68.57 | 75 | 71.83 | 0.44 | 0.77 (0.65–0.89) |
| Random Forest | Training | 81.88 | 74.47 | 78.14 | 0.56 | 0.84 (0.79–0.88) |
| Random Forest | Validation | 65.71 | 58.33 | 61.97 | 0.24 | 0.67 (0.56–0.73) |
| Naïve Bayes | Training | 84.78 | 64.54 | 74.55 | 0.5 | 0.79 (0.68–0.89) |
| Naïve Bayes | Validation | 80 | 72.22 | 76.06 | 0.52 | 0.79 (0.69–0.89) |
| SMO | Training | 75.36 | 76.6 | 75.99 | 0.52 | 0.76 (0.65–0.88) |
| SMO | Validation | 62.86 | 77.78 | 70.42 | 0.41 | 0.7 (0.60–0.81) |
| J48 | Training | 68.12 | 72.34 | 70.25 | 0.4 | 0.7 (0.64–0.75) |
| J48 | Validation | 57.14 | 66.67 | 61.97 | 0.24 | 0.63 (0.53–0.73) |
Fig 3. The differential expression pattern of 30 RNA transcripts (LS-RNA-WEKA) in early stage versus late stage tissue samples (with FDR < 0.01).
The performance of stage classification hybrid models developed using 51 features (LS-CpG-RNA-hybrid) that comprise 21 CpG sites and 30 RNA transcripts.
| Machine Learning Technique | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC (95% CI) |
|---|---|---|---|---|---|---|
| SVM | Training | 81.16 | 73.76 | 77.42 | 0.55 | 0.82 (0.73–0.92) |
| SVM | Validation | 80 | 75 | 77.46 | 0.55 | 0.8 (0.68–0.91) |
| Random Forest | Training | 79.71 | 75.89 | 77.78 | 0.56 | 0.85 (0.77–0.92) |
| Random Forest | Validation | 71.43 | 72.22 | 71.83 | 0.44 | 0.79 (0.74–0.83) |
| Naïve Bayes | Training | 89.13 | 67.38 | 78.14 | 0.58 | 0.81 (0.73–0.88) |
| Naïve Bayes | Validation | 85.71 | 72.22 | 78.87 | 0.58 | 0.82 (0.74–0.89) |
| SMO | Training | 83.33 | 73.05 | 78.14 | 0.57 | 0.78 (0.73–0.83) |
| SMO | Validation | 80 | 72.22 | 76.06 | 0.52 | 0.76 (0.7–0.81) |
| J48 | Training | 70.29 | 68.09 | 69.18 | 0.38 | 0.69 (0.59–0.81) |
| J48 | Validation | 65.71 | 75 | 70.42 | 0.41 | 0.75 (0.65–0.86) |
The performance of models developed for discriminating LIHC and normal samples using 5 methylation CpG sites.
| Machine Learning Technique | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC (95% CI) |
|---|---|---|---|---|---|---|
| SVM | Training | 99.33 | 90 | 98.23 | 0.91 | 0.99 (0.97–1) |
| SVM | Validation | 98.67 | 90 | 97.65 | 0.89 | 0.99 (0.98–0.99) |
| Random Forest | Training | 99 | 87.5 | 97.64 | 0.88 | 0.97 (0.94–1) |
| Random Forest | Validation | 98.67 | 90 | 97.65 | 0.89 | 0.99 (0.97–1) |
| Naïve Bayes | Training | 96.66 | 90 | 95.87 | 0.82 | 0.94 (0.84–1) |
| Naïve Bayes | Validation | 97.33 | 90 | 96.47 | 0.84 | 0.95 (0.86–1) |
| SMO | Training | 99 | 90 | 97.94 | 0.9 | 0.94 (0.88–0.98) |
| SMO | Validation | 98.67 | 90 | 97.65 | 0.89 | 0.94 (0.84–1) |
| J48 | Training | 97.66 | 80 | 95.58 | 0.79 | 0.86 (0.79–0.93) |
| J48 | Validation | 98.67 | 80 | 96.47 | 0.82 | 0.89 (0.76–1) |
Fig 4. (A) The differential methylation pattern of LCN-5CpG and (B) the differential expression pattern of LCN-5RNA in cancer vs. normal (adjacent non-tumor) tissue samples (Bonferroni adjusted p-value < 0.001).
The performance of models developed for discriminating LIHC and normal samples using 5 RNA transcripts.
| Machine Learning Technique | Dataset | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC | AUC (95% CI) |
|---|---|---|---|---|---|---|
| SVM | Training | 97.32 | 97.5 | 97.35 | 0.89 | 0.99 (0.98–0.99) |
| SVM | Validation | 97.33 | 90 | 96.47 | 0.84 | 0.97 (0.94–1) |
| Random Forest | Training | 97.66 | 90 | 96.76 | 0.85 | 0.98 (0.94–1) |
| Random Forest | Validation | 98.67 | 80 | 96.47 | 0.82 | 0.89 (0.84–0.95) |
| Naïve Bayes | Training | 96.66 | 97.5 | 96.76 | 0.86 | 0.97 (0.94–0.99) |
| Naïve Bayes | Validation | 97.33 | 90 | 96.47 | 0.84 | 0.94 (0.84–1) |
| SMO | Training | 97.32 | 97.5 | 97.35 | 0.89 | 0.97 (0.94–1) |
| SMO | Validation | 97.33 | 90 | 96.47 | 0.84 | 0.94 (0.84–1) |
| J48 | Training | 96.99 | 97.5 | 97.05 | 0.87 | 0.96 (0.94–0.99) |
| J48 | Validation | 98.67 | 90 | 97.65 | 0.89 | 0.94 (0.84–1) |
The performance of the Naïve Bayes models, in the form of confusion matrices, developed for classifying normal, early stage and late stage samples.
The models were developed using 33 CpG sites (multiclass-CpG) and 5 RNA transcripts (multiclass-RNA).
| Actual class | Predicted as Late Stage | Predicted as Early Stage | Predicted as Normal | Accuracy (%) | Weighted average ROC |
|---|---|---|---|---|---|
| Late Stage | 102 | 36 | 3 | 77.43 | 0.88 |
| Early Stage | 29 | 107 | 2 | | |
| Normal | 0 | 2 | 38 | | |
| Late Stage | 25 | 10 | 1 | 76.54 | 0.86 |
| Early Stage | 7 | 27 | 1 | | |
| Normal | 0 | 0 | 10 | | |
| Late Stage | 87 | 54 | 0 | 72.73 | 0.81 |
| Early Stage | 33 | 105 | 0 | | |
| Normal | 0 | 0 | 40 | | |
| Late Stage | 18 | 18 | 0 | 72.84 | 0.80 |
| Early Stage | 4 | 31 | 0 | | |
| Normal | 0 | 0 | 10 | | |
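
The accuracy reported beside each confusion matrix can be checked directly from the counts: it is the sum of the diagonal (correctly classified samples) divided by the total. The sketch below illustrates this with the first block of counts above. The function names are hypothetical, and the reading that each row of counts is an actual class while each column is a predicted class is an assumption drawn from the table layout.

```python
def overall_accuracy(cm):
    """Percentage of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return 100 * correct / total


def per_class_recall(cm, labels):
    """Percentage of each actual class (row) that was predicted correctly."""
    return {label: 100 * cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


# Assumed orientation: rows = actual class, columns = predicted class,
# both ordered as Late Stage, Early Stage, Normal.
labels = ["Late Stage", "Early Stage", "Normal"]
cm = [
    [102, 36, 3],   # actual Late Stage
    [29, 107, 2],   # actual Early Stage
    [0, 2, 38],     # actual Normal
]

print(f"overall accuracy: {overall_accuracy(cm):.2f}%")
for label, recall in per_class_recall(cm, labels).items():
    print(f"{label}: {recall:.2f}% correctly classified")
```

For this block the overall accuracy rounds to 77.43%, matching the reported value. The per-class figures show that most errors are confusions between early and late stage, while normal samples are rarely misclassified.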