Luana Ibiapina Cordeiro Calíope Pinheiro1, Maria Lúcia Duarte Pereira1, Marcial Porto Fernandez2, Francisco Mardônio Vieira Filho2, Wilson Jorge Correia Pinto de Abreu3, Pedro Gabriel Calíope Dantas Pinheiro4.
Abstract
Dementia interferes with an individual's motor, behavioural, and intellectual functions, leaving them unable to perform instrumental activities of daily living. This study aimed to identify, through data mining, the best-performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia. Principal component analysis (PCA) was applied, and the following machine learning algorithms were compared: logistic regression, decision tree, neural network, KNN, and random forest. The database was built from data collected from 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed using all 104 characteristics available in the database; dimensionality reduction then improved the quality of the machine learning algorithms during the tests, even with a loss of about 30% of the variation. When only 23 characteristics were considered, the accuracy was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved more effective than the others, obtaining 84% precision and 86% accuracy.
Year: 2021 PMID: 34335861 PMCID: PMC8286188 DOI: 10.1155/2021/4602465
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
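The abstract describes keeping only as many principal components as are needed to capture a target share of the variation (for example, 70%). A minimal sketch of that selection step is given below, assuming the explained-variance ratios of the components are already available and sorted in descending order, as PCA produces them. The function name `components_for_variance` and the toy ratios are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `target` (a fraction in (0, 1]).

    Assumes the ratios are sorted in descending order.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    running_totals = accumulate(explained_variance_ratio)
    for k, cumulative in enumerate(running_totals, start=1):
        # Small tolerance guards against floating-point rounding in the sum.
        if cumulative >= target - 1e-12:
            return k
    raise ValueError("target variance is not reachable with these ratios")


# Hypothetical ratios for a toy 6-feature dataset (NOT the study's data).
toy_ratios = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]

for target in (0.9, 0.7, 0.5):
    k = components_for_variance(toy_ratios, target)
    print(f"target {target:.0%}: keep {k} components")
```

Lowering the target discards more components, which is the trade-off the study explores when it moves from 90% down to 40% of captured variation.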
Classes and independent variables of people with HIV/AIDS.
| Classes | Variables |
|---|---|
| Sociodemographic profile | Age, gender, marital status, education |
| History of infectious disease | Transmission route, virus type, years with HIV, hepatitis B, hepatitis C, comorbidities, previous hospitalisation, opportunistic disease, initial viral load, current viral load, initial TCD4, current TCD4 |
| Drug adherence | Antiretroviral, forgets to use medication, careless use of medication, stops drug therapy |
| Neurological assessment | Heads up, psychomotor speed, memory, construction |
| Psychosomatic changes | Anxiety, obsession-compulsion, interpersonal sensitivity, depression, summing, psychoticism, phobic anxiety, hostility, paranoid ideation |
| Daily activities | Food, wear, bath, hygiene, going to the bathroom, intestinal control, bladder control, climbing ladder, get up, wander |
Analysis of machine learning algorithms against the variation of HIV-associated dementia data.
| Total features | Captured variation (PCA) | Metric | Logistic regression | Decision tree | Neural network | KNN | Random forest |
|---|---|---|---|---|---|---|---|
| 104 | No PCA | Accuracy | 0.7805 | 0.7317 | 0.5732 | 0.6098 | 0.8293 |
| | | Precision | 0.7377 | 0.7451 | 0.5732 | 0.6415 | 0.7797 |
| | | Recall | 0.9574 | 0.8085 | 1.0 | 0.7234 | 0.9787 |
| 47 | 90% | Accuracy | 0.7439 | 0.7073 | 0.5732 | 0.5976 | 0.8049 |
| | | Precision | 0.7097 | 0.7091 | 0.5732 | 0.6296 | 0.7627 |
| | | Recall | 0.9362 | 0.8298 | 1.0 | 0.7234 | 0.9574 |
| 33 | 80% | Accuracy | 0.7195 | 0.7683 | 0.5610 | 0.5976 | 0.8049 |
| | | Precision | 0.6935 | 0.7917 | 0.7619 | 0.6296 | 0.7541 |
| | | Recall | 0.9149 | 0.8085 | 0.3404 | 0.7234 | 0.9787 |
| 23 | 70% | Accuracy | 0.5610 | 0.6829 | 0.5854 | 0.5976 | 0.8659 |
| | | Precision | 0.5679 | 0.6981 | 0.9333 | 0.6296 | 0.8462 |
| | | Recall | 0.9787 | 0.7872 | 0.2979 | 0.7234 | 0.9362 |
| 16 | 60% | Accuracy | 0.7317 | 0.7073 | 0.6463 | 0.5366 | 0.8049 |
| | | Precision | 0.7049 | 0.7447 | 0.7500 | 0.5957 | 0.7818 |
| | | Recall | 0.9149 | 0.7447 | 0.5745 | 0.5957 | 0.9149 |
| 10 | 50% | Accuracy | 0.7195 | 0.6098 | 0.5610 | 0.5122 | 0.7317 |
| | | Precision | 0.7000 | 0.6667 | 0.5696 | 0.5714 | 0.7358 |
| | | Recall | 0.8936 | 0.6383 | 0.9574 | 0.5957 | 0.8298 |
| 6 | 40% | Accuracy | 0.6220 | 0.5610 | 0.5366 | 0.5000 | 0.6829 |
| | | Precision | 0.6111 | 0.6279 | 0.5570 | 0.5600 | 0.7059 |
| | | Recall | 0.9362 | 0.5745 | 0.9362 | 0.5957 | 0.7660 |
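The three metrics in the table are related through the confusion matrix, and a short sketch can make that relationship concrete. The source does not report confusion-matrix counts; the counts below are an inference, assuming a held-out test set of 82 individuals with 47 positive cases, which happens to reproduce the reported values to four decimals. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all predictions that are correct
        "precision": tp / (tp + fp),     # share of predicted positives that are real
        "recall": tp / (tp + fn),        # share of real positives that are found
    }


# ASSUMED counts (inferred, not reported in the source): 82 test cases, 47 positive.
random_forest_104 = classification_metrics(tp=46, fp=13, fn=1, tn=22)  # no PCA
random_forest_23 = classification_metrics(tp=44, fp=8, fn=3, tn=27)    # 70% variation
# A model that labels every case as positive: recall is 1.0, while precision
# and accuracy both collapse to the positive rate (47 / 82).
all_positive = classification_metrics(tp=47, fp=35, fn=0, tn=0)

for name, metrics in [
    ("random forest, 104 features", random_forest_104),
    ("random forest, 23 features", random_forest_23),
    ("all-positive classifier", all_positive),
]:
    rounded = {key: round(value, 4) for key, value in metrics.items()}
    print(name, rounded)
```

Under these assumed counts, the neural network rows with recall 1.0 and identical accuracy and precision (0.5732) are consistent with a model that predicts the positive class for every individual, which is why recall alone is a poor basis for comparing the algorithms.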
Figure 1. Accuracy of machine learning algorithms regarding HIV-associated dementia characteristics.
Figure 2. Precision ratio, accuracy, and recall of random forest using different amounts of features.
Demographic characteristics of people with HIV.
| Characteristic | Category | Number of individuals |
|---|---|---|
| Gender | Female | 91 |
| | Male | 179 |
| Age | 50 to 55 | 133 |
| | 56 to 60 | 75 |
| | 61 to 70 | 52 |
| | >70 | 10 |
| Marital status | Not married | 114 |
| | Married | 71 |
| | Widowed | 43 |
| | Divorced | 42 |
| Education | Illiterate | 45 |
| | Incomplete elementary school | 89 |
| | Completed elementary school | 46 |
| | High school | 61 |
| | Higher | 29 |