Leandro Juvêncio Moreira, Leandro A. Silva.
Abstract
The k-nearest neighbor is one of the simplest and most important procedures for the data classification task. The kNN, as it is called, requires only two parameters: the value of k and a similarity measure. However, the algorithm has weaknesses that make it difficult to use in real problems. Since the algorithm builds no model, classifying an object requires an exhaustive comparison against the entire training dataset. Another weakness is the optimal choice of the k parameter when the analyzed object lies in an overlap region. To mitigate these negative aspects, this work proposes a hybrid algorithm that combines the Self-Organizing Map (SOM) artificial neural network with a classifier whose similarity measure is based on informativeness. Since SOM has vector-quantization properties, it is used as a Prototype Generation approach to select a reduced training dataset for the classification approach based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was exhaustively tested, and the results show that the proposed approach achieves important accuracy on databases where the object classes are not well defined in the border region.
Year: 2017 PMID: 28811818 PMCID: PMC5547710 DOI: 10.1155/2017/4263064
Source DB: PubMed Journal: Comput Intell Neurosci
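As context for the baseline, the classic nearest-neighbor rule the abstract describes can be sketched in a few lines. This is a generic illustration, not the authors' code; the `metric` argument makes the algorithm's two parameters, k and the similarity measure, explicit:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, metric=None):
    """Classic kNN: its only parameters are k and the similarity measure.
    By default the metric is the Euclidean distance; any callable d(a, b)
    can be supplied instead."""
    if metric is None:
        metric = lambda a, b: np.linalg.norm(a - b)
    # Model-free: every query is compared against the whole training set.
    dists = np.array([metric(xi, x) for xi in X_train])
    nearest = y_train[np.argsort(dists)[:k]]  # labels of the k closest objects
    return np.bincount(nearest).argmax()      # majority vote
```

Because every prediction scans the full training set, the cost grows linearly with its size, which is exactly the weakness the SOM-based prototype reduction targets.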
Figure 1: The border between the weight vectors (squares) can be interpreted as a Voronoi region (shaded area). Thus, each input pattern object (filled circles) belongs to a Voronoi region.
Algorithm 1: Prototype Generation based on SOM, briefly described in pseudocode.
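The pseudocode itself is not reproduced in this record, so the following is a hypothetical reading of SOM-based Prototype Generation: train a small map, then label each weight vector by majority vote of the training objects mapped to it, yielding a reduced labeled set. The grid size, learning rate, and decay schedule below are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def train_som(X, grid_w, grid_h, epochs=50, seed=0):
    """Train a 2-D SOM online, with a Gaussian neighborhood whose radius
    and learning rate decay over the iterations (illustrative schedule)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = X[rng.integers(0, n, grid_w * grid_h)].astype(float)  # init from data
    coords = np.array([(i, j) for i in range(grid_w) for j in range(grid_h)], float)
    T = epochs * n  # total number of updates
    t = 0
    for _ in range(epochs):
        for x in X[rng.permutation(n)]:
            bmu = np.argmin(((W - x) ** 2).sum(1))             # best-matching unit
            sigma = max(grid_w, grid_h) / 2 * np.exp(-t / T)   # shrinking radius
            lr = 0.5 * np.exp(-t / T)                          # decaying learning rate
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2 + 1e-12))
            W += (lr * h)[:, None] * (x - W)
            t += 1
    return W

def som_prototypes(X, y, grid_w, grid_h):
    """Prototype Generation sketch: each SOM unit that attracts at least one
    training object becomes a prototype, labeled by majority vote."""
    W = train_som(X, grid_w, grid_h)
    bmus = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
    protos, labels = [], []
    for u in range(len(W)):
        hit = y[bmus == u]
        if hit.size:
            protos.append(W[u])
            labels.append(np.bincount(hit).argmax())
    return np.array(protos), np.array(labels)
```

A 1-NN (or iNN) classifier then runs against the returned prototypes instead of the full training set.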
Figure 2: Illustration of the classification process performed by the kNN and iNN algorithms.
Figure 3: Illustration of the classification process performed by SOM and iNN.
Properties of the datasets used in the experimental analysis of this work. These datasets are available on the UCI webpage (https://archive.ics.uci.edu/ml/datasets.html) and on the auxiliary webpage of the Triguero et al. publication [9].
| # | Name | #Obj | #Att | #Cla |
|---|---|---|---|---|
| 1 | appendicitis | 106 | 7 | 1 |
| 2 | iris | 150 | 4 | 2 |
| 3 | australian | 690 | 14 | 16 |
| 4 | balance | 625 | 4 | 15 |
| 5 | dermatology | 366 | 33 | 13 |
| 6 | glass | 214 | 9 | 7 |
| 7 | haberman | 306 | 3 | 11 |
| 8 | heart | 270 | 13 | 10 |
| 9 | hepatitis | 155 | 19 | 4 |
| 10 | mammographic | 961 | 5 | 20 |
| 11 | monk-2 | 432 | 6 | 14 |
| 12 | movement_libras | 360 | 90 | 12 |
| 13 | newthyroid | 215 | 5 | 8 |
| 14 | pima | 768 | 8 | 18 |
| 15 | sonar | 208 | 60 | 6 |
| 16 | spectfheart | 267 | 44 | 9 |
| 17 | tae | 151 | 5 | 3 |
| 18 | vehicle | 846 | 18 | 19 |
| 19 | vowel | 990 | 13 | 21 |
| 20 | wine | 178 | 13 | 5 |
| 21 | wisconsin | 699 | 9 | 17 |
Parametrization of the algorithms.
| Algorithm | Parameters |
|---|---|
| SOM | Euclidean distance, batch training, maximum training time equal to 1000, rectangular lattice, and Gaussian neighborhood function with maximum aperture of 1, decaying with the number of iterations. The SOM map dimension is the square root of the number of dataset objects by two. |
| | Execution of the |
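The map-size sentence above is truncated in the source. One plausible reading, used here only as an illustration, is a square map whose side is the square root of the number of objects divided by two:

```python
import math

def som_grid_side(n_objects):
    """Side of a square SOM map: round(sqrt(n) / 2).
    This reading of the truncated sizing rule is an assumption."""
    return max(2, round(math.sqrt(n_objects) / 2))
```

Under this (assumed) rule, the australian dataset (690 objects) would get a 13 × 13 map.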
Figure 4: For each experiment, a set of four results was produced: (A) kNN; (B) iNN; (C) SOMkNN; and (D) SOMiNN.
Figure 5: Accuracy results for the classifiers on the artificial dataset.
Classification results for all algorithms, represented by the mean and standard deviation of the accuracy and kappa measures.
| # | Classifier | Acc | Kappa |
|---|---|---|---|
| 1 | kNN | 0.87 ± 0.1 | 0.6 ± 0.32 |
| | iNN | 0.87 ± 0.06 | 0.52 ± 0.24 |
| | SOM-kNN | 0.86 ± 0.09 | 0.57 ± 0.3 |
| | SOM-iNN | 0.87 ± 0.06 | 0.52 ± 0.24 |
| 2 | kNN | 0.96 ± 0.04 | 0.94 ± 0.07 |
| | iNN | 0.95 ± 0.05 | 0.93 ± 0.08 |
| | SOM-kNN | 0.95 ± 0.02 | 0.93 ± 0.03 |
| | SOM-iNN | 0.93 ± 0.05 | 0.90 ± 0.08 |
| 3 | kNN | 0.54 ± 0.1 | 0.30 ± 0.15 |
| | iNN | 0.54 ± 0.07 | 0.30 ± 0.10 |
| | SOM-kNN | 0.52 ± 0.07 | 0.28 ± 0.10 |
| | SOM-iNN | 0.49 ± 0.08 | 0.23 ± 0.11 |
| 4 | kNN | 0.86 ± 0.07 | 0.41 ± 0.32 |
| | iNN | 0.86 ± 0.07 | 0.41 ± 0.32 |
| | SOM-kNN | 0.88 ± 0.09 | 0.48 ± 0.41 |
| | SOM-iNN | 0.88 ± 0.09 | 0.48 ± 0.41 |
| 5 | kNN | 0.95 ± 0.03 | 0.92 ± 0.05 |
| | iNN | 0.95 ± 0.03 | 0.92 ± 0.05 |
| | SOM-kNN | 0.95 ± 0.03 | 0.93 ± 0.04 |
| | SOM-iNN | 0.95 ± 0.03 | 0.93 ± 0.04 |
| 6 | kNN | 0.85 ± 0.05 | 0.69 ± 0.10 |
| | iNN | 0.84 ± 0.04 | 0.68 ± 0.09 |
| | SOM-kNN | 0.87 ± 0.05 | 0.75 ± 0.11 |
| | SOM-iNN | 0.87 ± 0.05 | 0.75 ± 0.11 |
| 7 | kNN | 0.67 ± 0.05 | 0.55 ± 0.08 |
| | iNN | 0.7 ± 0.05 | 0.58 ± 0.07 |
| | SOM-kNN | 0.66 ± 0.04 | 0.54 ± 0.07 |
| | SOM-iNN | 0.68 ± 0.04 | 0.56 ± 0.05 |
| 8 | kNN | 0.94 ± 0.03 | 0.88 ± 0.05 |
| | iNN | 0.95 ± 0.02 | 0.9 ± 0.04 |
| | SOM-kNN | 0.95 ± 0.03 | 0.89 ± 0.06 |
| | SOM-iNN | 0.95 ± 0 | 0.9 ± 0 |
| 9 | kNN | 0.69 ± 0.03 | 0.16 ± 0.09 |
| | iNN | 0.71 ± 0.01 | 0.17 ± 0.07 |
| | SOM-kNN | 0.72 ± 0.05 | 0.2 ± 0.12 |
| | SOM-iNN | 0.72 ± 0.06 | 0.18 ± 0.12 |
| 10 | kNN | 0.79 ± 0.04 | 0.56 ± 0.09 |
| | iNN | 0.80 ± 0.03 | 0.59 ± 0.06 |
| | SOM-kNN | 0.76 ± 0.03 | 0.52 ± 0.06 |
| | SOM-iNN | 0.79 ± 0.04 | 0.56 ± 0.08 |
| 11 | kNN | 0.65 ± 0.05 | 0.11 ± 0.15 |
| | iNN | 0.67 ± 0.03 | 0.04 ± 0.02 |
| | SOM-kNN | 0.64 ± 0.05 | 0.07 ± 0.15 |
| | SOM-iNN | 0.67 ± 0.07 | 0.03 ± 0.17 |
| 12 | kNN | 0.83 ± 0.03 | 0.82 ± 0.03 |
| | iNN | 0.84 ± 0.03 | 0.82 ± 0.04 |
| | SOM-kNN | 0.83 ± 0.03 | 0.82 ± 0.04 |
| | SOM-iNN | 0.83 ± 0.03 | 0.82 ± 0.04 |
| 13 | kNN | 0.94 ± 0.02 | 0.92 ± 0.02 |
| | iNN | 0.95 ± 0.02 | 0.94 ± 0.02 |
| | SOM-kNN | 0.93 ± 0.02 | 0.92 ± 0.02 |
| | SOM-iNN | 0.94 ± 0.01 | 0.93 ± 0.01 |
| 14 | kNN | 0.82 ± 0.12 | 0.63 ± 0.25 |
| | iNN | 0.89 ± 0.04 | 0.78 ± 0.09 |
| | SOM-kNN | 0.79 ± 0.08 | 0.58 ± 0.15 |
| | SOM-iNN | 0.79 ± 0.03 | 0.58 ± 0.06 |
| 15 | kNN | 0.77 ± 0.02 | 0.6 ± 0.03 |
| | iNN | 0.88 ± 0.02 | 0.78 ± 0.03 |
| | SOM-kNN | 0.78 ± 0.03 | 0.62 ± 0.04 |
| | SOM-iNN | 0.86 ± 0.02 | 0.75 ± 0.05 |
| 16 | kNN | 0.81 ± 0.03 | 0.61 ± 0.06 |
| | iNN | 0.84 ± 0.04 | 0.68 ± 0.08 |
| | SOM-kNN | 0.8 ± 0.04 | 0.6 ± 0.08 |
| | SOM-iNN | 0.84 ± 0.04 | 0.68 ± 0.07 |
| 17 | kNN | 0.96 ± 0.02 | 0.90 ± 0.04 |
| | iNN | 0.97 ± 0.01 | 0.94 ± 0.02 |
| | SOM-kNN | 0.96 ± 0.02 | 0.92 ± 0.03 |
| | SOM-iNN | 0.97 ± 0.01 | 0.94 ± 0.02 |
| 18 | kNN | 0.71 ± 0.03 | 0.34 ± 0.05 |
| | iNN | 0.75 ± 0.02 | 0.42 ± 0.05 |
| | SOM-kNN | 0.71 ± 0.03 | 0.35 ± 0.06 |
| | SOM-iNN | 0.74 ± 0.04 | 0.42 ± 0.07 |
| 19 | kNN | 0.69 ± 0.02 | 0.58 ± 0.03 |
| | iNN | 0.71 ± 0.03 | 0.62 ± 0.04 |
| | SOM-kNN | 0.68 ± 0.02 | 0.57 ± 0.03 |
| | SOM-iNN | 0.69 ± 0.04 | 0.59 ± 0.05 |
| 20 | kNN | 0.75 ± 0.02 | 0.51 ± 0.04 |
| | iNN | 0.83 ± 0.02 | 0.66 ± 0.03 |
| | SOM-kNN | 0.76 ± 0.01 | 0.52 ± 0.03 |
| | SOM-iNN | 0.83 ± 0.02 | 0.65 ± 0.04 |
| 21 | kNN | 0.99 ± 0.01 | 0.99 ± 0.01 |
| | iNN | 0.96 ± 0.01 | 0.96 ± 0.01 |
| | SOM-kNN | 0.98 ± 0.01 | 0.98 ± 0.01 |
| | SOM-iNN | 0.95 ± 0.01 | 0.95 ± 0.01 |
Classifiers compared in pairs, and the dataset indices ("#") where performance is significantly improved.
| Classifier 1 | Classifier 2 | Classifier 1 better | No significant difference | Classifier 2 better |
|---|---|---|---|---|
| iNN | kNN | 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 | 1, 3, 4, 5 | 2, 6, 21 |
| SOM-iNN | SOM-kNN | 1, 7, 10, 11, 13, 15, 16, 17, 18, 19, 20 | 4, 5, 6, 8, 9, 12, 14 | 2, 3, 21 |
| kNN | SOM-kNN | 1, 2, 3, 7, 10, 11, 13, 14, 16, 19, 21 | 5, 12, 17, 18 | 4, 6, 8, 9, 15, 20 |
| iNN | SOM-iNN | 3, 7, 10, 12, 13, 14, 15, 18, 19, 21 | 1, 2, 5, 8, 11, 16, 17, 20 | 4, 6, 9 |
Dataset percentages for performance analysis in terms of statistical significance.
| Classifier 1 | Classifier 2 | Classifier 1 better | No significant difference | Classifier 2 better | (No difference + Classifier 2 better) |
|---|---|---|---|---|---|
| iNN | kNN | 66.7% | 19.1% | 14.2% | 33.3% |
| SOM-iNN | SOM-kNN | 52.4% | 33.3% | 14.3% | 47.6% |
| kNN | SOM-kNN | 52.4% | 19.1% | 28.62% | 47.7% |
| iNN | SOM-iNN | 47.6% | 38.1% | 14.3% | 52.4% |
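The record does not preserve which statistical test produced these figures. As a generic illustration of comparing two classifiers over many datasets, a two-sided exact sign test (an assumption, not necessarily the authors' choice) can be computed directly:

```python
from math import comb

def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test p-value over per-dataset wins, ignoring
    ties: under H0, each non-tied dataset is a fair coin flip between the
    two classifiers."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # one-sided tail
    return min(1.0, 2 * tail)
```

For example, 14 wins to 3 across 17 non-tied datasets gives p ≈ 0.013 under this test, below the usual 0.05 threshold.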
Figure 6: Radar graphic contrasting pairs of classifiers.
Figure 7: Results of reduction versus accuracy.
Figure 8: Classification time analysis. The datasets are ordered by ascending number of objects.
Figure 9: Classification time summarized for all classifiers.
Comparison of the results of this work with the Chen algorithm [27].
| Classifier | Accuracy | Time | Reduction |
|---|---|---|---|
| kNN | 0.81 ± 0.04 | 88.04 ± 0.05 | 0 |
| iNN | 0.83 ± 0.03 | 99.04 ± 107.81 | 0 |
| SOM-kNN | 0.81 ± 0.04 | 19.32 ± 21.26 | 0.85 ± 0.05 |
| SOM-iNN | 0.82 ± 0.04 | 29.76 ± 30.68 | 0.85 ± 0.05 |
| Chen | 0.79 ± 0.01 | 30.32 ± 31.83 | 0.87 ± 0.10 |