Renchu Guan, Xu Wang, Maurizio Marchese, Mary Qu Yang, Yanchun Liang, Chen Yang.
Abstract
With the massive volume and rapid increase of data, feature space study is of great importance. To avoid the complex training processes of deep learning models, which project the original feature space into low-dimensional ones, we propose a novel feature space learning (FSL) model. The main contributions of our approach are: (1) FSL can not only select useful features but also adaptively update feature values and span new feature spaces; (2) four FSL algorithms are proposed with a feature space updating procedure; (3) FSL can provide a better understanding of the data and learn descriptive, compact feature spaces without the arduous training required by deep architectures. Experimental results on benchmark data sets demonstrate that FSL-based algorithms perform better than classical unsupervised, semi-supervised, and even incremental semi-supervised algorithms. In addition, we present a visualization of the learned feature spaces. With the carefully designed learning strategy, FSL dynamically disentangles explanatory factors, suppresses noise accumulation and semantic shift, and constructs easy-to-understand feature spaces.
Keywords: Affinity Propagation; Feature space learning; Semi-supervised learning; k-means
Year: 2018 PMID: 31068980 PMCID: PMC6502470 DOI: 10.1007/s12652-018-0805-4
Source DB: PubMed Journal: J Ambient Intell Humaniz Comput
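This record does not reproduce the paper's concrete update rules, but the core idea the abstract describes, clustering that simultaneously selects and re-weights features instead of training a deep projection, can be sketched minimally. The sketch below is an assumption on my part: it uses a generic weighted-k-means-style update (weights favor features with small within-cluster spread), not the authors' actual FSL procedure, and `weighted_kmeans`, `beta`, and the farthest-point initialization are all illustrative choices.

```python
import numpy as np

def weighted_kmeans(X, k, n_iter=20, beta=2.0):
    """Toy k-means that re-learns per-feature weights each iteration.

    Illustrative only: weights grow for features with small
    within-cluster dispersion, so informative features come to
    dominate the distance -- a crude analogue of feature space
    learning, not the FSL algorithms from the paper.
    """
    # Deterministic farthest-point initialisation of the k centers
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    w = np.full(X.shape[1], 1.0 / X.shape[1])  # uniform feature weights
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step uses the *weighted* squared distance
        d = ((X[:, None, :] - centers[None]) ** 2 * w).sum(-1)
        labels = d.argmin(1)
        # Center update
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
        # Feature-weight update: inverse within-cluster dispersion
        D = np.zeros(X.shape[1])
        for j in range(k):
            D += ((X[labels == j] - centers[j]) ** 2).sum(0)
        w = (1.0 / np.maximum(D, 1e-12)) ** (1.0 / (beta - 1.0))
        w /= w.sum()
    return labels, w
```

On data where only one feature separates the clusters, the learned weight vector concentrates on that feature, which is the "descriptive and compact feature space" intuition in miniature.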
Pros and cons of the introduced models
| Models | Pros and cons | References |
|---|---|---|
| Deep learning algorithms | Pros: can explore the feature space and learn good representations. Cons: | Deep belief networks |
| Clustering | Pros: can detect the intrinsic data distribution using only unlabeled sources. Cons: | Recommendation systems |
| Semi-supervised clustering | Pros: uses information from both labeled and unlabeled data. Cons: | |
Fig. 1Flowchart of feature space learning model and feature space updating diagram
Fig. 2Comparison results of K-means based methods
Different learning strategies for related algorithms
| Algorithm | Tri-Set similarity | Semi-supervised | FSL |
|---|---|---|---|
| k-means | × | × | × |
| AP(CC) | × | × | × |
| SK-means | × | ✓ | × |
| CK-means | × | ✓ | × |
| SAP (CC) | × | ✓ | × |
| SAP | ✓ | ✓ | × |
| FSSK-means | × | ✓ | ✓ |
| FSCK-means | × | ✓ | ✓ |
| FSAP | × | ✓ | ✓ |
| FSSAP | ✓ | ✓ | ✓ |
Semi-supervised K-means vs. FSL K-means

| Data | Evaluation | SK-means | CK-means | FSSK-means | FSCK-means |
|---|---|---|---|---|---|
| Reuters | Max-F | 0.517 | 0.600 | 0.542 | |
| | Min-E | 0.563 | 0.494 | 0.547 | |
| | Mean-F | 0.462 | 0.509 | 0.477 | |
| | Mean-E | 0.619 | 0.576 | 0.611 | |
| 20 Newsgroup | Max-F | 0.289 | 0.336 | 0.329 | |
| | Min-E | 0.870 | 0.825 | 0.841 | |
| | Mean-F | 0.250 | 0.288 | 0.276 | |
| | Mean-E | 0.936 | 0.890 | 0.906 | |

The best results are in bold
Fig. 3Comparison results of AP based methods
FSL AP vs. IAP on F-measure

| | 10 | 50 | 100 | 200 | 300 | 400 |
|---|---|---|---|---|---|---|
| IAP | 0.456 | 0.465 | 0.449 | 0.459 | 0.465 | 0.465 |
| FSAP | | 0.423 | 0.513 | 0.500 | 0.514 | 0.503 |
| FSSAP | 0.452 | | | | | |

The best results are in bold
FSL AP vs. IAP on entropy

| | 10 | 50 | 100 | 200 | 300 | 400 |
|---|---|---|---|---|---|---|
| IAP | 0.698 | 0.594 | 0.594 | 0.598 | 0.598 | 0.596 |
| FSAP | | 0.621 | 0.565 | 0.517 | 0.540 | 0.512 |
| FSSAP | 0.616 | | | | | |

The best results are in bold
Learned feature spaces

| Algorithms | Clusters | Features’ count | Example features |
|---|---|---|---|
| FSCK-means | 1 | 2434 | aid = 3.37; share = 1.84; men = 1.72; ers = 1.54; dlr = 1.52 |
| | 2 | 2233 | pro = 3.93; aid = 3.71; mln = 3.07; est = 3.53; acre = 2.89 |
| | 3 | 3364 | aid = 6.02; ill = 3.52; pct = 3.11; pro = 3.08; pri = 2.70 |
| | 4 | 1109 | mln = 3.64; loss = 1.93; net = 1.70; pro = 1.55; dlr = 1.51 |
| | 5 | 2633 | pro = 3.66; aid = 3.13; ers = 2.49; men = 2.44; eat = 2.17 |
| | 6 | 2281 | ban = 3.79; rate = 3.66; bank = 3.47; pct = 3.11; aid = 2.56 |
| | 7 | 2652 | ban = 4.56; aid = 4.47; bank = 4.42; int = 2.65; dollar = 2.45 |
| | 8 | 2956 | aid = 3.66; ran = 2.67; ers = 2.47; men = 2.27; port = 2.19 |
| | 9 | | trade = 4.97; ill = 4.79; aid = 4.67; pro = 4.05; Japan = 3.85 |
| | 10 | 2317 | ton = 3.40; aid = 3.23; tonnes = 2.88; eat = 2.85; wheat = 2.63 |
Fig. 4Visualization of feature space