Zishuang Zhang, Zhi-Ping Liu.
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification with different feature selection techniques.
Keywords: Akaike information criterion; Biomarker discovery; Feature selection; Hepatocellular carcinoma; Omics data
Year: 2021 PMID: 34433487 PMCID: PMC8386074 DOI: 10.1186/s12920-021-00957-4
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 The framework of robust biomarker discovery for HCC
Fig. 2 ROC curves corresponding to the best subsets selected by 6 classification algorithms
The classification performance of the 6 classifiers
| Method | # of genes | SN | SP | F1-score | ACC | AUC |
|---|---|---|---|---|---|---|
| Adaboost | 21 | 0.940 | 1.00 | 0.969 | 0.970 | 0.994 |
| KNN | 62 | 0.960 | 1.00 | 0.979 | 0.980 | 0.999 |
| NB | 12 | 0.980 | 1.00 | 0.989 | 0.990 | 1.00 |
| NN | 63 | 0.960 | 1.00 | 0.979 | 0.980 | 1.00 |
| RF | 13 | 0.980 | 1.00 | 0.989 | 0.990 | 0.993 |
| SVM | 57 | 0.960 | 1.00 | 0.979 | 0.980 | 0.996 |
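The SN, SP, F1-score and ACC columns follow from confusion-matrix counts. The sketch below shows the standard definitions, taking tumor as the positive class. The counts are hypothetical (50 tumor and 50 normal test samples, which this excerpt does not state) and were chosen only because they are consistent with the Adaboost row.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return SN, SP, F1-score and ACC from confusion-matrix counts."""
    sn = tp / (tp + fn)                 # sensitivity: recall on the positive class
    sp = tn / (tn + fp)                 # specificity: recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return {"SN": sn, "SP": sp, "F1": f1, "ACC": acc}


# Hypothetical counts (assumption): 50 tumor and 50 normal samples,
# with 3 tumors misclassified and no false positives.
metrics = classification_metrics(tp=47, fn=3, tn=50, fp=0)
print({name: round(value, 3) for name, value in metrics.items()})
```

AUC is not included because it depends on the ranking of predicted scores, not on a single confusion matrix.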
Fig. 3 The process of RFE-CV in 6 classifiers. The red point marks the feature subset with the maximum accuracy
Fig. 4 The overlap status of the 6 optimal feature subsets
The number of overlapping features (upper triangle; the diagonal gives each subset's size) and the corresponding significance P values (lower triangle). SR: stepwise regression
| Overlap | Adaboost | KNN | NB | NN | RF | SVM | SR |
|---|---|---|---|---|---|---|---|
| Adaboost | 21 | 21 | 12 | 4 | 8 | 21 | 2 |
| KNN | < 1e−6 | 62 | 12 | 16 | 12 | 57 | 4 |
| NB | < 1e−6 | < 1e−6 | 12 | 2 | 6 | 12 | 2 |
| NN | 6e−2 | < 1e−6 | 2.1e−1 | 63 | 3 | 14 | 3 |
| RF | < 1e−6 | < 1e−6 | < 1e−6 | 6e−2 | 13 | 11 | 0 |
| SVM | < 1e−6 | < 1e−6 | < 1e−6 | 1.43e−5 | < 1e−6 | 57 | 4 |
| SR | 5.36e−4 | 1.82e−5 | 9.95e−4 | 1.3e−3 | 1 | 1.55e−5 | 4 |
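One common way to attach a P value to the overlap of two gene subsets is a hypergeometric tail test. The excerpt does not say which test the authors used, so the sketch below is an assumption and not their method. The pool size `total=500` is hypothetical, since the number of candidate genes is not given here. The subset sizes and overlap are taken from the RF and SVM entries of the table.

```python
from math import comb


def overlap_p_value(total, size_a, size_b, overlap):
    """Hypergeometric tail P(X >= overlap).

    X is the number of genes shared with subset A (size_a genes) when
    size_b genes are drawn at random, without replacement, from a pool
    of `total` genes.
    """
    upper = min(size_a, size_b)
    denom = comb(total, size_b)
    tail = sum(
        comb(size_a, k) * comb(total - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# RF (13 genes) vs. SVM (57 genes) share 11 genes in the table above.
# total=500 is a hypothetical pool size, not a value from the source.
p = overlap_p_value(total=500, size_a=13, size_b=57, overlap=11)
print(f"P(overlap >= 11) = {p:.3e}")
```

The result depends strongly on the assumed pool size, so the printed value should not be compared directly with the table.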
The last 5 genes deleted by stepwise regression
| Step | Deviance | Resid. Dev | AIC | |
|---|---|---|---|---|
| ID2B | 7.72e−11 | 1.91e−09 | 1.73e−17 | 18 + 1.91e−09 |
| PMP2 | 4.55e−10 | 2.37e−10 | 1.05e−19 | 16 + 2.37e−10 |
| MUC6 | 5.45e−10 | 2.91e−10 | 1.59e−07 | 14 + 2.91e−10 |
| C1QL1 | 1.03e−09 | 3.94e−09 | 1.74e−24 | 12 + 3.94e−09 |
| SKAP1 | 1.25e−08 | 1.64e−08 | 4.19e−13 | 10 + 1.64e−08 |
Fig. 5 The trend of the AIC value in the feature selection process
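The entries of the form "18 + 1.91e−09" in the stepwise regression table suggest that the AIC is computed as twice the number of model parameters plus the residual deviance. The sketch below works under that assumption. The parameter counts (9 down to 5) are inferred from the constants 18 to 10 and are not stated in the source.

```python
def aic(n_params, deviance):
    """AIC = 2k + deviance, assuming the deviance stands in for -2 log-likelihood."""
    return 2 * n_params + deviance


# (gene removed at this step, inferred parameter count, residual deviance)
# Parameter counts are an assumption derived from the "18 + ..." style entries.
steps = [
    ("ID2B", 9, 1.91e-09),
    ("PMP2", 8, 2.37e-10),
    ("MUC6", 7, 2.91e-10),
    ("C1QL1", 6, 3.94e-09),
    ("SKAP1", 5, 1.64e-08),
]

aic_values = [aic(k, dev) for _, k, dev in steps]
for (gene, _, _), value in zip(steps, aic_values):
    print(f"{gene}: AIC = {value:.10f}")
```

Because the residual deviances are close to zero, each removal lowers the AIC by roughly 2, which would be consistent with a steadily decreasing AIC trend.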
Some genes and their dysfunctions from the intersections of the feature subsets selected by different methods
| Gene | Subset | Function |
|---|---|---|
| SKAP1 | 6 methods | SKAP1 encodes a T cell adaptor protein and it is involved in HCC signaling pathways. |
| EPHB1 | SVM, KNN, NN | Ephrin-B1 participates in the tumor progression through promoting the formation of new vessels of HCC. |
| STC2 | NN, SVM, KNN | STC2 is overexpressed in HCC and acts as a potential oncoprotein. |
| CDHR2 | NN, SVM, KNN | CDHR2 is highly expressed in HCC para-carcinoma tissue, but is weakly expressed in tumors. It is found to inhibit tumor growth. |
| FAM134B | NN, SVM, KNN | FAM134B works as a tumor inhibitor and inhibits cancer growth in vitro and in vivo. |
| MUC6 | RF, NN, SVM, KNN | MUC6 encodes a member of the mucin protein family. It is a biomarker gene of many cancers. |
| PHOSPHO1 | Adaboost, NB, NN, SVM, KNN, Stepwise Regression | PHOSPHO1 is associated with hepatitis B. |
| OXT | NN, SVM, KNN, Stepwise Regression | OXT is found to regulate cell proliferation. It is a key differential gene in nonalcoholic fatty liver disease. |
Fig. 6 The ROC curves of the trained NB classifier on the independent dataset GSE25097