Zi-Mei Zhang, Jiu-Xin Tan, Fang Wang, Fu-Ying Dao, Zhao-Yue Zhang, Hao Lin.
Abstract
Hepatocellular carcinoma (HCC) is a serious cancer that ranks fourth among causes of cancer-related death worldwide. More accurate diagnostic models are therefore urgently needed to aid early HCC diagnosis in clinical settings and thereby improve HCC treatment and survival. Several conventional methods have been used to discriminate HCC from cirrhotic tissues of patients without HCC (CwoHCC), but their recognition success rates are still far from satisfactory. In this study, we applied a machine-learning-based computational approach to microarray data generated from 1091 HCC samples and 242 CwoHCC samples. The within-sample relative expression orderings (REOs) method was used to extract numerical descriptors from the gene expression profiles. After removing irrelevant features using minimum redundancy maximum relevance (mRMR) with incremental feature selection, we obtained an "11-gene-pair" signature that produced outstanding results. We further investigated the discriminative capability of the "11-gene-pair" signature for HCC recognition on several independent datasets. High recognition rates were obtained on most of these datasets, demonstrating that the selected gene pairs can serve as a signature for HCC. The proposed computational model can discriminate HCC and adjacent non-cancerous tissues from CwoHCC even for minimal biopsy specimens and inaccurately sampled specimens, which makes it practical and effective for aiding early HCC diagnosis at the individual level.
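The feature-ranking step can be sketched as follows. This is a minimal, standard-library-only illustration of the mRMR idea in its "difference" form (relevance to the class minus mean redundancy with already selected features, both measured by mutual information), applied to binary REO-style features. The function names and toy data are hypothetical, and the excerpt does not say which mRMR variant or implementation the authors used.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(samples, labels):
    """Greedy mRMR ranking (difference criterion) of the feature columns of `samples`.

    samples: list of feature vectors (one per sample); labels: class label per sample.
    Returns feature indices in the order they were selected.
    """
    n_features = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_features)]
    relevance = [mutual_information(col, labels) for col in columns]
    selected, remaining = [], list(range(n_features))

    def score(j):
        # Relevance to the class minus mean redundancy with the features chosen so far.
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(columns[j], columns[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=score)  # ties go to the lowest remaining index
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 20 samples (10 per class), three binary features.
labels = [0] * 10 + [1] * 10
informative = labels[:]                      # identical to the class label
noisy = labels[:]
noisy[0], noisy[19] = 1, 0                   # the label with two entries flipped
uninformative = [i % 2 for i in range(20)]   # alternates independently of the label
samples = [list(f) for f in zip(informative, noisy, uninformative)]

ranking = mrmr_rank(samples, labels)
print(ranking[0])  # 0: the perfectly informative feature is selected first
```

In the study, the resulting ranked list is what incremental feature selection walks through, one feature at a time.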
Keywords: REOs; cirrhosis; early diagnosis; hepatocellular carcinoma; mRMR; support vector machine
Year: 2020 PMID: 32292778 PMCID: PMC7122481 DOI: 10.3389/fbioe.2020.00254
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
FIGURE 1 Flowchart presenting the process of developing and validating the HCC diagnostic signature.
The 11-gene-pair signature for early diagnosis of HCC.
| Signature | Gene 1 | Gene 2 |
|---|---|---|
| pair1 | TRMT112 | SF3B1 |
| pair2 | MFSD5 | COLEC10 |
| pair3 | FDXR | APC2 |
| pair4 | LAMC1 | CHST4 |
| pair5 | UBE4B | HGF |
| pair6 | NCAPH2 | APC2 |
| pair7 | HSPH1 | MTHFD2 |
| pair8 | TMEM38B | AGO3 |
| pair9 | PLGRKT | COLEC10 |
| pair10 | HNF1A | APC2 |
| pair11 | ARPC2 | SF3B1 |
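Each signature pair is a within-sample REO feature: for a single sample, it records whether the first gene is expressed above the second. The sketch below encodes one sample with the 11 pairs from the table and checks a property often cited in favor of REOs for individual-level use: a strictly increasing transformation of the sample (log-scaling, or multiplying by a positive constant and adding an offset) leaves every ordering, and therefore the feature vector, unchanged. The expression values are hypothetical, ties are encoded as 0 by assumption, and the excerpt does not state which ordering of each pair is associated with HCC or how the SVM combines the features, so the sketch stops at the feature vector.

```python
import math

# The 11 gene pairs from the signature table, as (gene 1, gene 2).
SIGNATURE_PAIRS = (
    ("TRMT112", "SF3B1"), ("MFSD5", "COLEC10"), ("FDXR", "APC2"),
    ("LAMC1", "CHST4"), ("UBE4B", "HGF"), ("NCAPH2", "APC2"),
    ("HSPH1", "MTHFD2"), ("TMEM38B", "AGO3"), ("PLGRKT", "COLEC10"),
    ("HNF1A", "APC2"), ("ARPC2", "SF3B1"),
)


def reo_vector(profile, pairs=SIGNATURE_PAIRS):
    """1 if gene 1 is expressed above gene 2 in this sample, else 0 (ties -> 0, an assumption)."""
    return [1 if profile[a] > profile[b] else 0 for a, b in pairs]


# Hypothetical raw expression intensities for one sample (illustrative only).
sample = {
    "TRMT112": 820.0, "SF3B1": 640.0, "MFSD5": 210.0, "COLEC10": 95.0,
    "FDXR": 150.0, "APC2": 180.0, "LAMC1": 530.0, "CHST4": 60.0,
    "UBE4B": 300.0, "HGF": 45.0, "NCAPH2": 175.0, "HSPH1": 410.0,
    "MTHFD2": 260.0, "TMEM38B": 120.0, "AGO3": 135.0, "PLGRKT": 88.0,
    "HNF1A": 230.0, "ARPC2": 990.0,
}

raw = reo_vector(sample)
logged = reo_vector({g: math.log2(v) for g, v in sample.items()})
rescaled = reo_vector({g: 3.5 * v + 10.0 for g, v in sample.items()})
print(raw)                        # [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
print(raw == logged == rescaled)  # True: orderings survive strictly increasing transforms
```

Because only within-sample comparisons are used, no reference cohort or cross-sample normalization is needed to compute the vector for a new sample.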
FIGURE 2 A plot showing the incremental feature selection (IFS) procedure for identifying HCC. When the top 857 features ranked by mRMR were used for prediction, the overall success rate reached the IFS peak of 100% in fivefold cross-validation. The solid line represents the ROC curve. The dotted line represents random guessing.
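The IFS procedure can be sketched as: take the mRMR-ranked feature list, evaluate the top-1, top-2, ... features with fivefold cross-validation, and keep the feature count with the highest overall success rate. The paper uses an SVM; to stay dependency-free, this sketch swaps in a simple nearest-centroid classifier and a deterministic interleaved fold assignment (sample i goes to fold i mod 5). Both of those choices, the function names, and the toy data are assumptions for illustration.

```python
def nearest_centroid_correct(train_x, train_y, test_x, test_y):
    """Stand-in classifier (not an SVM): count test samples assigned to their true class."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, t in zip(train_x, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign to the class whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, centroids[lab])))

    return sum(predict(x) == t for x, t in zip(test_x, test_y))


def cross_val_success_rate(samples, labels, n_folds=5):
    """Overall success rate (correct / total) across n_folds interleaved folds."""
    correct = 0
    for fold in range(n_folds):
        train = [i for i in range(len(labels)) if i % n_folds != fold]
        test = [i for i in range(len(labels)) if i % n_folds == fold]
        correct += nearest_centroid_correct(
            [samples[i] for i in train], [labels[i] for i in train],
            [samples[i] for i in test], [labels[i] for i in test],
        )
    return correct / len(labels)


def incremental_feature_selection(samples, labels, ranking):
    """Return (best_k, best_rate, curve), where curve[k-1] is the rate using the top-k features."""
    curve = []
    for k in range(1, len(ranking) + 1):
        subset = [[row[j] for j in ranking[:k]] for row in samples]
        curve.append(cross_val_success_rate(subset, labels))
    best_rate = max(curve)
    best_k = curve.index(best_rate) + 1  # smallest k that reaches the peak
    return best_k, best_rate, curve


# Hypothetical toy data: feature 0 matches the label, feature 1 alternates independently.
labels = [0] * 10 + [1] * 10
samples = [[y, i % 2] for i, y in enumerate(labels)]
best_k, best_rate, curve = incremental_feature_selection(samples, labels, ranking=[0, 1])
print(best_k, best_rate)  # 1 1.0
```

Plotting `curve` against k gives an IFS curve of the kind shown in Figure 2; in practice a shuffled, stratified split would replace the interleaved folds.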
The performance of the signature in the validation datasets.
| Datasets | NSnHCC | NSpCwoHCC |
|---|---|---|
| Testing datasets (biopsy) | 100% (29/29) | 100% (44/44) |
| Testing datasets (surgery) | 100% (245/245) | 100% (18/18) |
| GSE109211 | 31.43% (44/140) | – |
| GSE112790 | 100% (183/183) | – |
| GSE102079 | 100% (152/152) | – |
| GSE121248 | 100% (70/70) | – |
| TCGA | 100% (371/371) | – |
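Each cell in the validation table is a count-based rate: correctly classified samples over all samples of that class. Reading "Sn" and "Sp" as sensitivity (HCC samples called HCC) and specificity (CwoHCC samples called CwoHCC) is an assumption, since the excerpt does not define the headers. A minimal sketch with hypothetical helper names and synthetic per-sample calls:

```python
def class_rate(y_true, y_pred, target):
    """(hits, total) for samples whose true class is `target` (sensitivity or specificity)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == target and p == target)
    total = sum(1 for t in y_true if t == target)
    return hits, total


def format_rate(hits, total):
    """Format like the table cells, e.g. '31.43% (44/140)'."""
    return f"{100 * hits / total:.2f}% ({hits}/{total})"


# GSE109211 row: 44 of 140 HCC samples called HCC (the per-sample calls here are synthetic).
y_true = ["HCC"] * 140
y_pred = ["HCC"] * 44 + ["CwoHCC"] * 96
print(format_rate(*class_rate(y_true, y_pred, "HCC")))  # 31.43% (44/140)
```

The same helper gives the specificity column by passing `target="CwoHCC"`.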
FIGURE 3 Area under the receiver operating characteristic curve (AUC) for the validation data from public databases of biopsy and surgically resected HCC and CwoHCC samples. The solid line represents the ROC curve. The dotted line represents random guessing.
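The AUC equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half; the dotted diagonal for random guessing corresponds to an AUC of 0.5. A minimal pairwise sketch of that identity follows. The scores are hypothetical, and the excerpt does not say how the authors' SVM scores were produced.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    credit = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                credit += 1.0
            elif p == n:
                credit += 0.5
    return credit / (len(pos_scores) * len(neg_scores))


# Hypothetical decision scores (higher = more HCC-like).
hcc_scores = [0.92, 0.81, 0.35]
cwohcc_scores = [0.40, 0.12]
print(roc_auc(hcc_scores, cwohcc_scores))  # 5 of 6 pairs ordered correctly -> 0.833...
```

The double loop is quadratic in sample count, which is fine for illustration; rank-based formulas compute the same value faster on large cohorts.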
Comparison of 11 gene pairs with existing methods on independent datasets.
| Dataset | 11-gene-pair NSnHCC | 11-gene-pair NACwHCC | 11-gene-pair NANwHCC | 19-gene-pair NSnHCC | 19-gene-pair NACwHCC | 19-gene-pair NANwHCC |
|---|---|---|---|---|---|---|
| GSE6764 | − | 10/10(100.0%) | − | − | 10/10(100.0%) | − |
| GSE17548 | − | 18/20(90.0%) | − | − | 18/20(90.0%) | − |
| GSE17967 | − | 16/16(100.0%) | − | − | 8/16(50.0%) | − |
| GSE63898 | − | 168/168(100.0%) | − | − | 168/168(100.0%) | − |
| GSE25097 | − | 40/40(100.0%) | 243/243(100.0%) | − | 40/40(100.0%) | 243/243(100.0%) |
| GSE62232 | − | − | 10/10(100.0%) | − | − | 10/10(100.0%) |
| GSE36376 | − | − | 193/193(100.0%) | − | − | 172/193(89.1%) |
| GSE39791 | − | − | 72/72(100.0%) | − | − | 71/72(98.6%) |
| GSE41804 | − | − | 20/20(100.0%) | − | − | 20/20(100.0%) |
| GSE112790 | 183/183(100.0%) | − | 15/15(100.0%) | 183/183(100.0%) | − | 15/15(100.0%) |
| GSE102079 | 152/152(100.0%) | − | 91/91(100.0%) | 152/152(100.0%) | − | 91/91(100.0%) |
| GSE109211 | 44/140(31.4%) | − | − | 37/140(26.4%) | − | − |
| Total | 379/475(79.8%) | 238/254(93.7%) | 644/644(100.0%) | 372/475(78.3%) | 244/254(96.1%) | 622/644(96.6%) |
| GSE121248 | 70/70(100.0%) | − | 37/37(100.0%) | 70/70(100.0%) | − | 37/37(100.0%) |
| GSE64041 | − | − | 60/60(100.0%) | − | − | 60/60(100.0%) |
| GSE54236 | − | 80/80(100.0%) | − | − | 62/80(77.5%) | − |
| Total | 70/70(100.0%) | 80/80(100.0%) | 97/97(100.0%) | 70/70(100.0%) | 62/80(77.5%) | 97/97(100.0%) |
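The Total rows pool counts across datasets (summing correct and total samples) rather than averaging per-dataset percentages, so larger datasets carry more weight; for example, the 11-gene-pair NSnHCC total of 379/475 is (183 + 152 + 44) / (183 + 152 + 140). A small sketch of that pooling, using a hypothetical helper name and the table's cell format:

```python
def pool(cells):
    """Sum (correct, total) pairs across datasets and format as 'correct/total(xx.x%)'."""
    correct = sum(c for c, _ in cells)
    total = sum(t for _, t in cells)
    return f"{correct}/{total}({100 * correct / total:.1f}%)"


# 11-gene-pair NSnHCC column, first group: GSE112790, GSE102079, GSE109211.
print(pool([(183, 183), (152, 152), (44, 140)]))  # 379/475(79.8%)

# 19-gene-pair NANwHCC column, first group: GSE25097 through GSE102079.
print(pool([(243, 243), (10, 10), (172, 193), (71, 72), (20, 20), (15, 15), (91, 91)]))  # 622/644(96.6%)
```

Averaging the three NSnHCC percentages instead would give about 77.1%, which is why the pooled form is the one that matches the table.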