Zhiwei Ji, Guanmin Meng, Deshuang Huang, Xiaoqiang Yue, Bing Wang.
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is a highly aggressive malignancy. Traditional Chinese Medicine (TCM), with the characteristics of syndrome differentiation, plays an important role in the comprehensive treatment of HCC. This study aims to develop a nonnegative matrix factorization- (NMF-) based feature selection approach (NMFBFS) to identify potential clinical symptoms for HCC patient stratification.
Year: 2015 PMID: 26579207 PMCID: PMC4633688 DOI: 10.1155/2015/846942
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Description of the original clinical data of HCC patients. Phase totals: Phase I, 82; Phase II, 195; Phase III, 130.
| Sex | Phase IA | Phase IB | Phase IIA | Phase IIB | Phase IIIA | Phase IIIB |
|---|---|---|---|---|---|---|
| Male | 33 | 27 | 50 | 115 | 95 | 10 |
| Female | 12 | 10 | 10 | 20 | 16 | 9 |
Figure 1. The flowchart of the proposed approach.
Eight irrelevant symptoms were screened out with a threshold of 10%. Each of them is positive in fewer than 10% of patients in every phase.
| Symptoms | Phase IA | Phase IB | Phase IIA | Phase IIB | Phase IIIA | Phase IIIB |
|---|---|---|---|---|---|---|
| Pale white lip | 0 | 5.41% | 6.67% | 5.19% | 4.5% | 0 |
| Edema in lower extremities | 2.22% | 8.1% | 1.67% | 5.19% | 3.6% | 0 |
| Lack of urine output | 0 | 2.7% | 0 | 0 | 5.41% | 0 |
| Emotional depression | 4.44% | 0 | 5% | 8.89% | 6.31% | 5.26% |
| Head body trapped heavy | 0 | 2.7% | 3.33% | 2.22% | 2.7% | 0 |
| Hydrothorax | 6.67% | 2.7% | 1.67% | 3.7% | 2.7% | 0 |
| Rapid pulse | 4.44% | 2.7% | 1.67% | 0.74% | 5.41% | 5.26% |
| Uneven pulse | 4.44% | 5.41% | 8.33% | 3.7% | 3.6% | 0 |
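The screening step summarized in the table above can be sketched as a simple frequency filter. This is a minimal illustration, not the authors' code: it assumes a symptom is dropped only when its positive rate is strictly below the threshold in every sub-phase, and the function name `screen_rare_symptoms` and the third example symptom are hypothetical.

```python
def screen_rare_symptoms(positive_rates, threshold=0.10):
    """Split symptoms into (kept, removed) by their per-phase positive rates.

    positive_rates maps a symptom name to its positive rate in each phase
    (fractions in [0, 1]). Assumption: a symptom is removed only when its
    rate is strictly below `threshold` in every phase.
    """
    kept, removed = [], []
    for name, rates in positive_rates.items():
        if all(rate < threshold for rate in rates):
            removed.append(name)
        else:
            kept.append(name)
    return kept, removed


# Two rows copied from the table above, plus one made-up frequent symptom.
rates = {
    "Rapid pulse": [0.0444, 0.027, 0.0167, 0.0074, 0.0541, 0.0526],
    "Hydrothorax": [0.0667, 0.027, 0.0167, 0.037, 0.027, 0.0],
    "hypothetical_frequent_symptom": [0.40, 0.35, 0.52, 0.48, 0.61, 0.58],
}
kept, removed = screen_rare_symptoms(rates)
```

Here `removed` holds the two rarely positive symptoms and `kept` holds the frequent one.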
Figure 2. The heatmap of the representative clinical dataset D. (a) The heatmap of D with 49 symptoms and 120 samples. (b) The distribution patterns of symptoms V6, V8, V28, V37, and V53 indicate that positive findings are infrequent. (c) The distribution patterns of symptoms V46, V42, and V25 indicate that positive findings are frequent.
Figure 3. Estimation of the optimal rank r.
Figure 4. The result of NMF on the dataset D. The left side visualizes matrix W (49 × 3), and the right side shows matrix H (3 × 120).
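To make the factorization concrete, the following sketch factorizes a nonnegative 49 × 120 matrix into W (49 × 3) and H (3 × 120). The record does not state which NMF solver was used, so this assumes the classic multiplicative-update rules for the Frobenius objective. The input is synthetic binary data standing in for D, and all function names are illustrative. It uses only the standard library to stay dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, n_iter=50, seed=0, eps=1e-9):
    """Factorize nonnegative V (m x n) into W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


# Synthetic binary stand-in for D (symptoms x samples); not the clinical data.
data_rng = random.Random(1)
D = [[1.0 if data_rng.random() < 0.3 else 0.0 for _ in range(120)]
     for _ in range(49)]
W, H = nmf(D, rank=3)
```

Each row of `W` describes how strongly one symptom loads on the three basis components, and each column of `H` describes how one sample is composed of them.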
Figure 5. The relationships between NMF-derived basis components and clinical stages of samples.
The NMF-derived assignment of symptoms to their corresponding basis components.
| Basis components | Number of symptoms | The names of symptoms |
|---|---|---|
| Basis 1 | 16 | Varicose veins [ |
| Basis 2 | 17 | Nausea [ |
| Basis 3 | 16 | Tinnitus [ |
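Because the three groups in the table sum to 49 symptoms (16 + 17 + 16), each symptom appears to belong to exactly one basis component. One plausible way to obtain such a partition, assumed here rather than taken from the source, is to assign each symptom to the basis with the largest coefficient in its row of W. The matrix `W_toy` and the symptom names are made up for illustration.

```python
def assign_to_basis(w, names):
    """Group symptom names by the column index of the largest entry in each row of W."""
    groups = {k: [] for k in range(len(w[0]))}
    for name, row in zip(names, w):
        # Index of the dominant basis component; ties go to the lowest index.
        best = max(range(len(row)), key=row.__getitem__)
        groups[best].append(name)
    return groups


# Toy coefficient matrix (4 symptoms x 3 basis components), purely illustrative.
W_toy = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.8],
    [0.6, 0.5, 0.1],
]
names = ["V1", "V2", "V3", "V4"]
groups = assign_to_basis(W_toy, names)
```

With this toy input, `V1` and `V4` fall under the first basis, `V2` under the second, and `V3` under the third.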
Mean similarity values for the pairs of redundant symptoms within the same group.
| Basis components | The screened redundant symptoms | Distance-based similarity | Correlation-based similarity |
|---|---|---|---|
| Basis 1 | | 0.9672 | 1.0 |
| | | 0.9507 | 1.0 |
| Basis 2 | | 0.9685 | 0.9960 |
| | | 0.9628 | 1.0 |
| Basis 3 | | 0.9686 | 1.0 |
| | | 0.9520 | 0.9926 |
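The two similarity columns above can be illustrated with a small sketch. The record does not give the exact formulas, so both definitions here are assumptions: the distance-based similarity is taken as 1 / (1 + Euclidean distance), the correlation-based similarity as the Pearson correlation, and a pair is treated as redundant when both reach the threshold θ. The vectors `x` and `y` are made-up coefficient profiles, not values from the study.

```python
import math


def distance_similarity(x, y):
    """Assumed definition: 1 / (1 + Euclidean distance), so identical vectors score 1."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + dist)


def correlation_similarity(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def is_redundant(x, y, theta=0.95):
    """Assumed rule: a pair is redundant when both similarities reach theta."""
    return (distance_similarity(x, y) >= theta
            and correlation_similarity(x, y) >= theta)


# Hypothetical coefficient profiles of two symptoms over three basis components.
x = [0.80, 0.10, 0.05]
y = [0.82, 0.12, 0.07]
d_sim = distance_similarity(x, y)
c_sim = correlation_similarity(x, y)
```

Under these assumed definitions, a pair can be perfectly correlated while its distance-based similarity stays slightly below 1, which matches the pattern in several rows of the table.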
The NMF-driven potential clinical features of HCC (threshold: 0.95).
| Basis components | Original features | Mixed features | Description about mixed features |
|---|---|---|---|
| Basis 1 | | | Converted from { |
| Basis 2 | | | Converted from { |
| Basis 3 | | | Converted from { |
| Number of features | 33 | 6 | Total: 39 features |
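The feature counts above imply that each group of redundant symptoms is replaced by a single mixed feature while the remaining symptoms are kept unchanged. The sketch below shows that bookkeeping. How the authors combine the member symptoms is not stated in this record, so averaging the member columns is an assumption, and the feature names and values are made up.

```python
def merge_redundant(features, groups):
    """Replace each group of redundant symptoms with one mixed feature.

    features: dict mapping symptom name -> list of values, one per sample.
    groups:   dict mapping mixed-feature name -> list of member symptom names.
    Assumption: the mixed feature is the per-sample mean of its members.
    """
    members = {name for group in groups.values() for name in group}
    reduced = {name: vals for name, vals in features.items() if name not in members}
    for mixed_name, group in groups.items():
        columns = [features[name] for name in group]
        reduced[mixed_name] = [sum(vals) / len(vals) for vals in zip(*columns)]
    return reduced


# Toy data: four symptoms observed on four samples (illustrative only).
features = {
    "V1": [1, 0, 1, 1],
    "V2": [1, 0, 1, 0],
    "V3": [0, 1, 0, 0],
    "V4": [0, 1, 1, 0],
}
groups = {"M1": ["V1", "V2"]}
reduced = merge_redundant(features, groups)
```

In this toy case four symptoms become three features (two original plus one mixed), mirroring how 49 symptoms reduce to 33 original and 6 mixed features in the table.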
Classification accuracy of three feature subsets on the training set (120 representative samples). FS1 was obtained by the proposed approach with a given threshold (θ = 0.95) and includes 33 original symptom features and 6 new mixed features. FS2 denotes the above-mentioned 33 original symptom features (FS2 ⊂ FS1). OFS denotes all 49 symptoms before the NMF calculation.
| Feature subsets | Dimension | Classification accuracy in LSSVM (%) |
|---|---|---|
| FS1 | 39 | 80.002 ± 9.95 |
| FS2 | 33 | 77.50 ± 12.36 |
| OFS | 49 | 72.50 ± 11.64 |
Classification accuracy of inferred optimal feature subset via NMFBFS, ReliefF, mRMR, and Elastic Net on the training set.
| Methods | Feature subset | Dimension | Classification accuracy in LSSVM (%) |
|---|---|---|---|
| NMFBFS | | | |
| ReliefF | FSRF20 | 20 | 65.00 ± 10.03 |
| | FSRF40 | 40 | 73.33 ± 15.76 |
| mRMR | FSMR20 | 20 | 70.83 ± 12.5 |
| | FSMR40 | 40 | 74.17 ± 9.03 |
| Elastic Net | FSEN20 | 20 | 70.00 ± 11.56 |
| | FSEN40 | 40 | 76.67 ± 10.46 |
Classification accuracy of inferred optimal feature subset via NMFBFS, ReliefF, mRMR, and Elastic Net on the testing set.
| Methods | Feature subset | Dimension | Classification accuracy in LSSVM (%) |
|---|---|---|---|
| NMFBFS | | | |
| ReliefF | FSRF20 | 20 | 50.71 ± 1.22 |
| | FSRF40 | 40 | 76.43 ± 8.27 |
| mRMR | FSMR20 | 20 | 63.79 ± 1.22 |
| | FSMR40 | 40 | 77.14 ± 9.18 |
| Elastic Net | FSEN20 | 20 | 67.57 ± 4.09 |
| | FSEN40 | 40 | 78.38 ± 9.62 |
Classification performance of the inferred optimal feature subsets under different thresholds θ.
| The values of θ | Original symptom features | New mixed features | Total number of features | Classification accuracy (%) |
|---|---|---|---|---|
| 0.95 | 33 | 6 | 39 | 80.002 ± 9.95 |
| | 21 | 9 | 30 | 70.83 ± 6.59 |
| | 10 | 8 | 18 | 70.00 ± 4.56 |