Peiyang Li, Weiwei Zhou, Xiaoye Huang, Xuyang Zhu, Huan Liu, Teng Ma, Daqing Guo, Dezhong Yao, Peng Xu.
Abstract
Artifacts in biomedical signal recordings, such as gene expression, sonar image and electroencephalogram data, strongly affect the related research because large-valued artifacts usually break the neighbor structure of a dataset. However, the conventional graph embedding (GE) methods used for dimension reduction, such as linear discriminant analysis, principal component analysis and locality preserving projections, are essentially defined in the L2 norm space and are sensitive to artifacts, which results in biased sub-structural features. In this work, we defined graph embedding in the L1 norm space and used a maximization strategy to solve the model, with the aim of restricting the influence of outliers on the dimension reduction of signals. Quantitative evaluation under different outlier conditions demonstrates that the L1 norm-based GE structure estimates hyperplanes that are more stable than those of conventional GE-based methods. Applications to a variety of datasets also show that the proposed L1 GE is more robust to outliers and achieves higher classification accuracy. The proposed L1 GE may be helpful for capturing reliable mapping information from datasets that have been contaminated with outliers.
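
To make the idea of an L1 norm maximization strategy concrete, the sketch below finds a unit vector `w` that locally maximizes the L1 dispersion `sum_i |w^T x_i|` with a sign-flipping fixed-point iteration of the kind used for PCA-L1. This is an illustration of the general technique only, not the authors' L1 GE solver: the function names, the default initialization, and the assumption that the rows of `X` are already centered are all choices made for this example.

```python
import numpy as np

def l1_objective(X, w):
    """L1 dispersion of the rows of X projected onto w."""
    return float(np.abs(X @ w).sum())

def l1_max_direction(X, w0=None, n_iter=200):
    """Locally maximize sum_i |w^T x_i| subject to ||w||_2 = 1.

    Assumes the rows of X are centered samples. Each step fixes the
    signs of the current projections and then moves w to the normalized
    signed sum of the samples, which never decreases the objective.
    """
    X = np.asarray(X, dtype=float)
    if w0 is None:
        # Hypothetical default: start from the sample with the largest norm.
        w0 = X[np.argmax(np.linalg.norm(X, axis=1))]
    w = np.asarray(w0, dtype=float)
    w = w / np.linalg.norm(w)
    for _ in range(n_iter):
        s = np.sign(X @ w)
        s[s == 0] = 1.0          # break ties so every sample contributes
        v = X.T @ s
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:        # degenerate case: keep the current w
            break
        w_new = v / norm_v
        if np.allclose(w_new, w):
            break
        w = w_new
    return w

def l2_max_direction(X):
    """First principal direction (L2 variance maximizer), for comparison."""
    _, _, vt = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    return vt[0]

# Small centered toy set: wide spread along x, narrow spread along y.
X_toy = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
w_l1 = l1_max_direction(X_toy, w0=np.array([0.6, 0.8]))
print("L1 direction:", w_l1, "objective:", l1_objective(X_toy, w_l1))
print("L2 direction:", l2_max_direction(X_toy))
```

Because the iteration only reaches a local maximum, the result depends on the starting vector; trying several initializations and keeping the one with the largest objective is a common precaution.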
Year: 2018 PMID: 29523793 PMCID: PMC5844917 DOI: 10.1038/s41598-018-22207-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Classification accuracy and the projection direction in different outlier conditions.
| Methods | Occurrence rate 0% | Occurrence rate 3% | Occurrence rate 8% | Occurrence strength 0.00 | Occurrence strength | Occurrence strength |
|---|---|---|---|---|---|---|
| LDA (Angle) | 0.91 ± 0.03 (128.85 ± 32.27) | 0.73 ± 0.06 (79.89 ± 51.75) | 0.88 ± 0.04 (113.72 ± 50.08) | 0.78 ± 0.04 (90.36 ± 57.35) | ||
| (Time/s) | 0.62 | 0.64 | 0.61 | 0.62 | 0.61 | 0.63 |
| R LDA (Angle) | 0.91 ± 0.03 (128.93 ± 32.10) | 0.73 ± 0.06 (79.90 ± 51.79) | 0.88 ± 0.03 (113.25 ± 38.82) | 0.78 ± 0.05 (92.08 ± 53.09) | ||
| (Time/s) | 0.64 | 0.65 | 0.64 | 0.64 | 0.64 | 0.64 |
| H LDA (Angle) | 0.53 ± 0.05 (45.02 ± 1.54) | 0.51 ± 0.04 (44.63 ± 1.07) | 0.62 ± 0.14 (52.53 ± 30.37) | 0.53 ± 0.08 (47.19 ± 16.78) | ||
| (Time/s) | 2.80 | 2.83 | 2.82 | 2.80 | 2.83 | 2.84 |
| NDA (Angle) | 0.94 ± 0.04 (133.15 ± 28.20) | 0.74 ± 0.05 (85.12 ± 53.66) | 0.90 ± 0.04 (125.07 ± 52.00) | 0.79 ± 0.05 (95.47 ± 52.32) | ||
| (Time/s) | 4.56 | 4.58 | 4.56 | 4.56 | 4.56 | 4.58 |
| SRDA (Angle) | 0.91 ± 0.06 (128.23 ± 51.50) | 0.73 ± 0.06 (79.41 ± 54.41) | 0.88 ± 0.03 (113.27 ± 38.85) | 0.78 ± 0.05 (92.87 ± 53.10) | ||
| (Time/s) | 0.82 | 0.83 | 0.83 | 0.82 | 0.82 | 0.82 |
| LPP (Angle) | 0.91 ± 0.08 (128.33 ± 51.75) | 0.73 ± 0.03 (75.74 ± 53.71) | 0.88 ± 0.03 (114.70 ± 42.09) | 0.78 ± 0.05 (90.16 ± 53.11) | ||
| (Time/s) | 1.75 | 1.70 | 1.72 | 1.74 | ||
| SRKDA | 0.74 ± 0.20 | 0.73 ± 0.18 | 0.74 ± 0.19 | 0.74 ± 0.20 | 0.74 ± 0.19 | 0.74 ± 0.19 |
| (Angle) | — | — | — | — | — | — |
| (Time/s) | 0.97 | 1.00 | 0.95 | 0.82 | 0.96 | 0.94 |
| L1 LDA (Angle) | ||||||
| (Time/s) | 11.89 | 28.97 | 13.65 | 11.89 | 18.98 | 23.41 |
| Lasso SRDA | ||||||
| (Angle) | | | | | | |
| (Time/s) | 6.19 | 5.99 | 6.14 | 6.19 | 6.08 | 6.05 |
| L1 SR (Angle) | 0.94 ± 0.04 (130.72 ± 34.58) | 0.80 ± 0.05 (134.39 ± 4.87) | ||||
| (Time/s) | 75.65 | 68.38 | 80.08 | 75.65 | 158.10 | 129.73 |
| L1 GE (Angle) | ||||||
| (Time/s) | 4.63 | 5.92 | 5.93 | 4.63 | 4.91 | 5.07 |
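
The (Angle) rows in the table above report a projection-direction angle for each method. How that angle is defined (the reference direction and the sign convention) is not stated in this excerpt, so the following is only a sketch of one common choice: the angle in degrees between an estimated projection vector and a reference vector. The function name and both argument names are hypothetical.

```python
import numpy as np

def direction_angle_deg(w_est, w_ref):
    """Angle in degrees (0 to 180) between two projection vectors.

    Assumption for this example: the angle is measured between the
    estimated direction and a reference direction, and the sign of
    each vector matters.
    """
    w_est = np.asarray(w_est, dtype=float)
    w_ref = np.asarray(w_ref, dtype=float)
    cos = (w_est @ w_ref) / (np.linalg.norm(w_est) * np.linalg.norm(w_ref))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

print(direction_angle_deg([1.0, 0.0], [1.0, 1.0]))
```

If the sign of a projection vector is considered arbitrary, taking the absolute value of the cosine before `arccos` would restrict the result to the range 0 to 90 degrees instead.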
Figure 1. The effect of outliers on decision boundaries. (a) LDA. (b) RLDA. (c) HLDA. (d) NDA. (e) L1 GE. (f) SRDA. (g) LPP. (h) L1 LDA. (i) Lasso SRDA. (j) L1 SR.
Figure 2. The gene dataset classification based on eight classifiers. (a) Colon cancer data. (b) Leukemia data.
Classification accuracy for the two BCI datasets.
| BCI Dataset | LDA | R LDA | HLDA | NDA | SRDA | LPP | SRKDA | L1 LDA | Lasso SRDA | L1 SR | L1 GE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S1 | 0.52 | 0.52 | 0.53 | 0.50 | 0.52 | ||||||
| S2 | 0.90 | 0.90 | 0.83 | 0.90 | 0.90 | 0.90 | 0.88 | 0.92 | 0.89 | 0.88 | |
| S3 | 0.93 | 0.81 | 0.93 | 0.93 | 0.92 | 0.93 | 0.92 | 0.91 | |||
| S4 | 0.57 | 0.53 | 0.57 | 0.57 | 0.54 | 0.57 | 0.60 | 0.53 | 0.53 | 0.58 | |
| S5 | 0.70 | 0.70 | 0.67 | 0.70 | 0.70 | 0.70 | 0.72 | 0.69 | 0.72 | 0.70 | |
| S6 | 0.78 | 0.77 | 0.57 | 0.78 | 0.78 | 0.78 | 0.77 | 0.74 | 0.78 | 0.68 | |
| S7 | 0.49 | 0.75 | 0.75 | 0.73 | 0.75 | 0.75 | |||||
| S8 | 0.60 | 0.61 | 0.59 | 0.61 | 0.60 | 0.57 | 0.63 | 0.60 | 0.59 | 0.60 | |
| S9 | 0.80 | 0.82 | 0.60 | 0.84 | 0.82 | 0.82 | 0.80 | 0.84 | 0.84 | 0.80 | |
| S10 | 0.60 | 0.62 | 0.52 | 0.58 | 0.64 | 0.58 | 0.74 | 0.62 | 0.79 | ||
| S11 | 0.61 | 0.62 | 0.61 | 0.61 | 0.61 | 0.59 | 0.57 | 0.60 | 0.60 | 0.66 | |
| S12 | 0.71 | 0.70 | 0.71 | 0.73 | 0.70 | 0.71 | 0.63 | 0.82 | 0.82 | ||
| S13 | 0.58 | 0.61 | 0.58 | 0.58 | 0.61 | 0.58 | 0.61 | 0.70 | 0.75 | 0.73 | |
| S14 | 0.57 | 0.59 | 0.56 | 0.59 | 0.57 | 0.54 | 0.58 | 0.60 | 0.58 | 0.60 | |
| S15 | 0.58 | 0.58 | 0.52 | 0.58 | 0.58 | 0.58 | 0.53 | 0.80 | 0.58 | 0.55 | |
| S16 | 0.66 | 0.67 | 0.77 | 0.66 | 0.67 | 0.66 | 0.68 | 0.68 | 0.69 | 0.73 | |
| S17 | 0.98 | 0.98 | 0.55 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.92 | |
| S18 | 0.50 | 0.51 | 0.71 | 0.50 | 0.51 | 0.50 | 0.53 | 0.94 | 0.50 | 0.92 | |
| S19 | 0.52 | 0.52 | 0.47 | 0.52 | 0.52 | 0.51 | 0.50 | 0.52 | 0.52 | 0.52 | |
| aa | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.68 | 0.69 | 0.68 | |
| al | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | 0.98 | ||
| av | 0.68 | 0.69 | 0.49 | 0.68 | 0.68 | 0.68 | 0.66 | 0.70 | 0.68 | ||
| aw | 0.75 | 0.81 | 0.74 | 0.75 | 0.79 | 0.75 | 0.74 | 0.78 | 0.88 | 0.84 | |
| ay | 0.81 | 0.82 | 0.81 | 0.81 | 0.81 | 0.70 | 0.78 | 0.80 | 0.82 | 0.81 | |
| Mean | 0.70 ± 0.14 | 0.71 ± 0.14 | 0.67 ± 0.14 | 0.70 ± 0.14 | 0.71 ± 0.14 | 0.70 ± 0.14 | 0.70 ± 0.15 | 0.75 ± 0.13 | 0.72 ± 0.15 | 0.75 ± 0.15 | |
| Highest Acc# | 2/24 | 3/24 | 5/24 | 1/24 | 1/24 | 2/24 | 5/24 | 3/24 | 2/24 | 6/24 | |
Classification accuracy for the UCI datasets.
| UCI Dataset | LDA | R LDA | HLDA | NDA | SRDA | LPP | SRKDA | L1 LDA | Lasso SRDA | L1 SR | L1 GE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Australian | 0.68 ± 0.05 | 0.85 ± 0.03 | 0.66 | 0.56 ± 0.03 | 0.72 ± 0.06 | 0.68 ± 0.03 | |||||
| Time | 0.30 | 0.08 | 6.80 | 1.12 | 0.12 | 1.80 | 2.96 | 50.75 | 0.27 | 436.02 | 4.68 |
| BreastC | 0.94 ± 0.02 | 0.59 ± 0.05 | 0.94 ± 0.07 | 0.89 | 0.67 ± 0.02 | 0.87 ± 0.03 | 0.92 ± 0.02 | ||||
| Time | 0.35 | 0.10 | 7.06 | 1.47 | 0.13 | 1.37 | 2.34 | 64.82 | 0.23 | 347.54 | 8.50 |
| HeartD | 0.83 ± 0.04 | 0.80 ± 0.05 | 0.57 ± 0.07 | 0.82 ± 0.05 | 0.80 ± 0.05 | 0.82 ± 0.05 | 0.83 ± 0.05 | 0.83 ± 0.05 | 0.83 ± 0.05 | ||
| Time | 0.13 | 0.07 | 2.13 | 0.61 | 0.09 | 0.36 | 0.37 | 18.20 | 0.22 | 126.06 | 1.76 |
| IonoS | 0.83 ± 0.05 | 0.83 ± 0.04 | 0.70 ± 0.06 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.72 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.04 | 0.83 ± 0.04 | |
| Time | 0.28 | 0.10 | 4.11 | 1.34 | 0.11 | 0.50 | 0.69 | 56.84 | 0.31 | 297.69 | 3.46 |
| Liver | 0.65 ± 0.05 | 0.65 ± 0.05 | 0.61 ± 0.07 | 0.66 ± 0.06 | 0.65 ± 0.06 | 0.57 | 0.59 ± 0.06 | 0.66 ± 0.05 | 0.66 ± 0.06 | 0.66 ± 0.05 | |
| Time | 0.13 | 0.06 | 2.33 | 0.60 | 0.09 | 0.45 | 0.57 | 17.11 | 0.23 | 218.45 | 2.05 |
| Sonar | 0.89 ± 0.05 | 0.93 ± 0.04 | 0.53 ± 0.09 | 0.87 ± 0.05 | 0.94 ± 0.04 | 0.89 | 0.88 ± 0.05 | 0.73 ± 0.07 | 0.87 ± 0.05 | 0.93 ± 0.05 | |
| Time | 0.53 | 0.15 | 4.20 | 2.38 | 0.10 | 0.52 | 0.49 | 20.50 | 0.30 | 371.93 | 6.57 |
| SPECT | 0.72 ± 0.06 | 0.52 ± 0.07 | 0.69 ± 0.06 | 0.73 | 0.79 | 0.71 ± 0.06 | 0.72 ± 0.05 | 0.72 ± 0.06 | |||
| Time | 0.36 | 0.13 | 4.00 | 1.93 | 0.13 | 0.53 | 0.37 | 40.11 | 0.30 | 156.61 | 5.36 |
| Winequality | 0.89 ± 0.01 | 0.82 ± 0.01 | 0.79 ± 0.01 | 0.82 ± 0.01 | 0.83 | 0.84 ± 0.01 | 0.77 ± 0.77 | 0.86 ± 0.03 | ||||
| Time | 1.85 | 0.16 | 194.81 | 6.15 | 0.22 | 585.56 | 264.16 | 321.20 | 1.31 | 1008.39 | 78.36 |
| Thrombin | 0.84 ± 0.08 | 0.80 ± 0.12 | 0.78 ± 0.12 | 0.51 ± 0.24 | 0.48 | 0.68 | 0.78 ± 0.15 | 0.50 ± 0.23 | 0.87 ± 0.05 | 0.86 ± 0.08 | |
| Time | 0.54 | 0.35 | 1.02 | 7.00 | 0.23 | 0.35 | 1.17 | 5.71 | 0.71 | 41.94 | 2.16 |
| Mean | 0.83 ± 0.03 | 0.83 ± 0.04 | 0.64 ± 0.07 | 0.82 ± 0.07 | 0.79 ± 0.09 | 0.78 | 0.75 | 0.79 ± 0.06 | 0.75 | 0.83 | |
| Highest Acc# | 2/9 | 3/9 | 0/9 | 2/9 | 2/9 | 1/9 | 2/9 | 0/9 | 2/9 | 2/9 | |