José Hernández-Torruco, Juana Canul-Reich, Juan Frausto-Solís, Juan José Méndez-Castillo.
Abstract
Guillain-Barré syndrome (GBS) is a neurological disorder that has not been explored using clustering algorithms. Clustering algorithms perform more efficiently when they work only with relevant features. In this work, we applied correlation-based feature selection (CFS), chi-squared, information gain, symmetrical uncertainty, and consistency filter methods to select the most relevant features from a real dataset of 156 features. This dataset contains clinical, serological, and nerve conduction test data obtained from GBS patients. The most relevant feature subsets, determined with each filter method, were used to identify the four subtypes of GBS present in the dataset. We used the partitions around medoids (PAM) clustering algorithm to form four clusters, corresponding to the GBS subtypes, and used the purity of each cluster as the evaluation measure. After experimentation, symmetrical uncertainty and information gain each determined a feature subset of seven variables. Used as the input dataset for PAM, these seven variables reached a purity of 0.7984. This result leads to a first characterization of this syndrome using computational techniques.
Year: 2014 PMID: 25302074 PMCID: PMC4180197 DOI: 10.1155/2014/432109
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Algorithm 1: Partitions around medoids (PAM).
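As a rough illustration of the idea behind PAM (not the authors' listing), the following Python sketch runs a swap-based medoid search on a precomputed distance matrix. The function names `total_cost` and `pam`, the random initialisation, and the toy one-dimensional data are assumptions made for this example; the classical algorithm uses a greedy BUILD phase for initialisation, which is omitted here.

```python
import random

def total_cost(dist, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))

def pam(dist, k, seed=0):
    """Swap-based PAM sketch on a precomputed distance matrix.

    Returns (medoids, labels), where labels[i] is the position in
    `medoids` of the medoid closest to point i.
    """
    n = len(dist)
    rng = random.Random(seed)
    # Assumption: random initial medoids instead of the greedy BUILD phase.
    medoids = rng.sample(range(n), k)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best:  # keep the swap only if it lowers the cost
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels

# Toy data: two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, k=2)
print(sorted(medoids))  # [1, 4]
```

Because medoids are actual data points and only pairwise distances are needed, the same sketch would work with any dissimilarity measure, which is one reason PAM suits mixed clinical data.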
Purity of a clustering with three classes.
| | Class A | Class B | Class C | Total |
|---|---|---|---|---|
| Cluster 1 | 0 | 14 | 1 | 15 |
| Cluster 2 | 9 | 2 | 0 | 11 |
| Cluster 3 | 3 | 1 | 21 | 25 |
| Total | 12 | 17 | 22 | |
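A purity calculation for a table like this can be sketched in Python as below. Purity is taken here as the sum of the largest class count in each cluster divided by the total number of points. The dominant counts 14, 9, and 21 are an assumption: they are the values implied by the row and column totals of the table, and the function name `purity` is chosen for this example.

```python
def purity(contingency):
    """Purity = (sum of the largest class count per cluster) / (total points)."""
    total = sum(sum(row) for row in contingency)
    dominant = sum(max(row) for row in contingency)
    return dominant / total

# Rows are clusters 1 to 3, columns are classes A to C.
# Assumption: 14, 9 and 21 are inferred from the row and column totals.
counts = [
    [0, 14, 1],
    [9, 2, 0],
    [3, 1, 21],
]
print(round(purity(counts), 4))  # 0.8627
```

With these counts, 44 of the 51 points fall in the dominant class of their cluster, giving a purity of about 0.8627.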
Results of filter methods ranked on purity.
| Method | Number of features | Purity |
|---|---|---|
| Information gain | 7 | 0.7984 |
| Symmetrical uncertainty | 7 | 0.7984 |
| CFS | 16 | 0.7984 |
| Chi-squared | 41 | 0.7829 |
| Consistency | 6 | 0.6589 |
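To make the two top-ranked filters concrete, the sketch below computes information gain and symmetrical uncertainty for a single feature against class labels. It assumes the feature has already been discretized (the nerve conduction variables in the dataset are continuous, so some binning step would be needed first), and the function names and toy data are illustrative, not taken from the paper.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(labels)), a value between 0 and 1."""
    denom = entropy(feature) + entropy(labels)
    return 0.0 if denom == 0 else 2.0 * information_gain(feature, labels) / denom

# Toy example: one feature that separates the classes, one that does not.
labels = ["a", "a", "b", "b"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
print(information_gain(informative, labels))           # 1.0
print(symmetrical_uncertainty(uninformative, labels))  # 0.0
```

Symmetrical uncertainty is a normalized form of information gain, which is consistent with the two methods producing the same feature subset size and purity in the table above.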
List of variables with the highest purity (0.7984) selected by information gain and symmetrical uncertainty.
| Feature | Meaning |
|---|---|
| v105 | Amplitude of left ulnar motor nerve |
| v106∗ | Area under the curve of left ulnar motor nerve |
| v116 | Amplitude of right ulnar motor nerve |
| v172∗ | Amplitude of left median sensory nerve |
| v177∗ | Amplitude of right median sensory nerve |
| v182∗ | Amplitude of left ulnar sensory nerve |
| v187 | Amplitude of right ulnar sensory nerve |
List of variables with the highest purity (0.7984) selected by CFS.
| Feature | Meaning |
|---|---|
| v29 | Extraocular muscles involvement |
| v30 | Ptosis |
| v40 | Karnofsky at discharge |
| v105 | Distal amplitude of left ulnar motor nerve |
| v106∗ | Area under the curve of left ulnar motor nerve |
| v108 | Proximal amplitude of left ulnar motor nerve |
| v111 | Average F-wave latency of left ulnar motor nerve |
| v116 | Distal amplitude of right ulnar motor nerve |
| v134 | F-wave amplitude of left tibial motor nerve |
| v172∗ | Amplitude of left median sensory nerve |
| v173 | Area under the curve of left median sensory nerve |
| v177∗ | Amplitude of right median sensory nerve |
| v182∗ | Amplitude of left ulnar sensory nerve |
| v185 | Conduction velocity of right ulnar sensory nerve |
| v187 | Amplitude of right ulnar sensory nerve |
| v192 | Amplitude of left sural sensory nerve |
Figure 1: Purity reached by the best feature subsets as ranked by chi-squared, information gain, and symmetrical uncertainty methods.
Purity of pairwise clustering of GBS subtypes.
| GBS subtypes | IG | SU | CFS | Consistency | Chi-squared | All features |
|---|---|---|---|---|---|---|
| AIDP and AMAN | 0.9649 (2) | 0.9649 (2) | ∗ | | 0.9649 (2) | 0.8771 |
| AMAN and AMSAN | 0.927 (3) | 0.9375 (2) | 0.875 (7) | 0.927 (3) | | 0.7395 |
| AIDP and AMSAN | | 0.9367 (2) | 0.9113 (9) | 0.7468 (5) | | 0.8354 |
| AIDP and MF | 0.8787 (3) | 0.8787 (4) | 0.8787 (4) | 0.8787 (3) | | 0.6666 |
| AMAN and MF | | 0.96 (3) | | 0.74 (2) | | 0.96 |
| AMSAN and MF | 0.9305 (13) | 0.9444 (13) | | 0.8194 (5) | 0.9444 (14) | 0.8333 |
IG: information gain; SU: symmetrical uncertainty; ∗: only one feature was selected, so purity was not computed. The number of features selected in each case is shown in parentheses.
Purity for different numbers of clusters using the four GBS subtypes.
| k | IG | SU | CFS | Consistency | Chi-squared | All features |
|---|---|---|---|---|---|---|
| 2 | 0.6434 (9) | 0.6434 (10) | 0.6511 (16) | 0.4883 (6) | 0.6124 (48) | 0.5038 |
| 3 | 0.7906 (7) | 0.7906 (7) | 0.7286 (16) | 0.6666 (6) | 0.7829 (6) | 0.5813 |
| 4 | 0.7984 (7) | 0.7984 (7) | 0.7984 (16) | 0.6589 (6) | 0.7829 (41) | 0.6899 |
| 5 | 0.7984 (5) | 0.7984 (5) | 0.7906 (16) | 0.7286 (6) | 0.7829 (91) | 0.6821 |
| 6 | 0.7751 (4) | 0.7751 (7) | 0.8139 (16) | 0.7596 (6) | 0.7751 (38) | 0.6666 |
| 10 | 0.8139 (5) | 0.8139 (5) | 0.8217 (16) | 0.7596 (6) | 0.8062 (38) | 0.6976 |
| 20 | 0.8449 (38) | 0.8372 (31) | 0.8527 (16) | 0.8294 (6) | 0.8294 (53) | 0.7596 |
k: number of clusters; IG: information gain; SU: symmetrical uncertainty. The number of features selected in each case is shown in parentheses.