Min Zhu¹, Jing Xia², Molei Yan³, Guolong Cai³, Jing Yan³, Gangmin Ning².
Abstract
With the development of medical technology, more and more parameters are recorded to describe human physiological conditions, forming high-dimensional clinical datasets. In clinical analysis, these data are commonly used to build mathematical models for classification. High dimensionality, however, increases the complexity of such models and reduces their efficiency. The Niche Genetic Algorithm (NGA) is an effective algorithm for dimensionality reduction, but in the conventional NGA the niche distance parameter is fixed in advance, which prevents it from adapting to the environment. In this paper, an Improved Niche Genetic Algorithm (INGA) is introduced. It employs a self-adaptive niche-culling operation when constructing the niche environment, which improves population diversity and avoids premature convergence to local optima. INGA was verified on a stratification model for sepsis patients. The results show that INGA reduced the feature dimensionality of the dataset from 77 to 10 and that the resulting model achieved 92% accuracy in predicting 28-day mortality in sepsis patients, significantly higher than that of other methods.
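The abstract does not give INGA's exact culling rule, so the following is only a minimal Python sketch of the classical niche-elimination step that niche GAs build on: any two individuals closer than a niche radius compete, and the less fit one is penalized so that selection tends to remove it. The `adaptive_radius` helper is a hypothetical stand-in for INGA's self-adaptive niche distance (the conventional NGA fixes this radius in advance); the actual adaptation rule in the paper may differ.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def niche_eliminate(population, fitness, radius, penalty=1e-06):
    """Classical niche elimination: for any two individuals closer than
    `radius`, replace the fitness of the less fit one with a small penalty,
    so it is likely to be culled at the next selection step."""
    fit = list(fitness)
    n = len(population)
    for i in range(n):
        for j in range(i + 1, n):
            if euclidean(population[i], population[j]) < radius:
                worse = i if fit[i] < fit[j] else j
                fit[worse] = penalty
    return fit

def adaptive_radius(population, scale=0.5):
    """Hypothetical self-adaptive radius: tie the niche distance to the
    current mean pairwise distance, so it shrinks as the population
    contracts (illustrative rule only, not the paper's formula)."""
    n = len(population)
    dists = [euclidean(population[i], population[j])
             for i in range(n) for j in range(i + 1, n)]
    return scale * sum(dists) / len(dists)
```

With a fixed radius the penalized individuals free up room for fitter, more distant ones; the adaptive variant simply recomputes the radius each generation before culling.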
Year: 2015 PMID: 26649071 PMCID: PMC4663319 DOI: 10.1155/2015/794586
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Niche elimination operation.
Figure 2. Flowchart of INGA.
Figure 3. The relationship between encoding and features.
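Figure 3 ties the chromosome encoding to the clinical features. A standard scheme for GA-based feature selection (assumed here, since this excerpt does not spell it out) is a binary chromosome whose i-th gene marks whether the i-th of the 77 features enters the classifier; the feature names below are placeholders:

```python
import random

N_FEATURES = 77  # dimensionality of the clinical dataset in the paper

def random_chromosome(n=N_FEATURES, p=0.5):
    """One individual: bit i == 1 means clinical feature i is kept."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def decode(chromosome, feature_names):
    """Map a chromosome back to the selected feature subset."""
    return [name for bit, name in zip(chromosome, feature_names) if bit]

names = [f"feature_{i}" for i in range(N_FEATURES)]  # placeholder names
chrom = [0] * N_FEATURES
chrom[2] = chrom[10] = 1
print(decode(chrom, names))  # ['feature_2', 'feature_10']
```

Under this encoding, dimensionality reduction from 77 to 10 features corresponds to the fittest chromosome containing exactly ten 1-bits.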
d(P): a measure of the population's ability to maintain diversity, evaluated after 20, 50, and 100 generations for GA, NGA, and INGA.
| Generation | GA | NGA | INGA |
|---|---|---|---|
| 20 | 0.5635 | 0.5213 | 0.4812 |
| 50 | 0.6271 | 0.5748 | 0.5248 |
| 100 | 0.6963 | 0.6147 | 0.5629 |
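The table reports d(P) as a population-diversity measure, but its exact formula is not given in this excerpt. A common choice, sketched below as an assumption, is the mean pairwise Euclidean distance over a real-coded population; the paper's d(P) may be scaled or oriented differently.

```python
import math

def diversity(population):
    """Mean pairwise Euclidean distance of a real-coded population.
    (Illustrative only: the paper's exact d(P) formula is not shown in
    this excerpt, and its scale or orientation may differ.)"""
    n = len(population)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(population[i], population[j])
    return total / (n * (n - 1) / 2)
```

Tracking such a statistic per generation is how diversity curves like those in the table are typically produced.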
Convergence on the Schaffer function across 10 runs.
| Execution count | GA optimal value | GA converges? | NGA optimal value | NGA converges? | INGA optimal value | INGA converges? |
|---|---|---|---|---|---|---|
| 1 | 0.9903 | N | 0.9903 | N | 0.9995 | Y |
| 2 | 0.9632 | N | 0.9998 | Y | 1 | Y |
| 3 | 1 | Y | 1 | Y | 0.9998 | Y |
| 4 | 0.9619 | N | 0.9625 | N | 1 | Y |
| 5 | 0.9991 | Y | 0.9991 | Y | 0.9995 | Y |
| 6 | 0.9631 | N | 0.9995 | Y | 1 | Y |
| 7 | 0.9631 | N | 1 | Y | 0.9998 | Y |
| 8 | 0.9992 | Y | 0.9628 | N | 1 | Y |
| 9 | 0.9617 | N | 0.9982 | Y | 1 | Y |
| 10 | 0.9996 | Y | 1 | Y | 0.9998 | Y |
Figure 4. Convergence curves.
Figure 5. Individuals' distribution, where the x-axis represents the fitness of the individuals and the y- and z-axes represent the Euclidean distance between the individuals.
The number of feature parameters after dimensionality reduction.
| Classifier | PCA | MDS | NGA | INGA |
|---|---|---|---|---|
| BP | 17 ± 2 | 27 ± 3 | 21 ± 3 | 15 ± 2 |
| SVM | 2 ± 10 | 26 ± 4 | 20 ± 4 | 16 ± 3 |
| RF | 21 ± 4 | 22 ± 3 | 23 ± 3 | 10 ± 2 |
Figure 6. Classification accuracy (%). (a) shows the results before dimensionality reduction; (b)–(e) show the results after dimensionality reduction.
Figure 7. ROC curves for classification with BP, SVM, and RF after INGA-based dimensionality reduction.
Figure 8. Area under the curve (AUC) of the algorithm.
Figure 9. Robustness (%) of the algorithm.