Abstract
An individual's subjective judgment about his or her Human Immunodeficiency Virus (HIV) status depends on behavioral, health, and sociodemographic factors. This paper aims to develop an accurate model for predicting subjective HIV infection status using the random forest (RF) approach. A total of 12,796 responses from Malawians collected over a 12-year period were assessed. Fourteen risk factors covering behavioral, health, and sociodemographic information were analysed as potential predictors of subjective HIV infection status in the general population, and thirteen such factors were analysed separately among males and females. The random forest approach was adopted to build a comprehensive model comprising the 14 risk factors in Malawi. Age, worries about infection, and health rate were the most significant predictors, whereas use of condoms, marital status, and education were the least important predictors of subjective HIV status in Malawi. However, the importance of infidelity on the part of a spouse and of marital status as predictors of subjective HIV status differed between males and females: both were more important among females than among males. The model achieved a prediction accuracy of about 97%-99%, measured by the c-statistic with jack-knife cross validation and verified by the Matthews correlation coefficient. As a result, the RF-based model has great potential to be an effective approach for analysing subjective health status.
Year: 2019 PMID: 31637055 PMCID: PMC6766123 DOI: 10.1155/2019/5849183
Source DB: PubMed Journal: AIDS Res Treat ISSN: 2090-1240
Training and testing data used in the study.
| Data | Category | Subjective HIV status |
|---|---|---|
| Training data | Both sexes | 8,973 |
| Testing data | Both sexes | 3,796 |
| Total | Both sexes | 12,769 |
| Training data | Males | 3,019 |
| Testing data | Males | 1,292 |
| Total | Males | 4,311 |
| Training data | Females | 5,958 |
| Testing data | Females | 2,500 |
| Total | Females | 8,458 |
Source. MLSFH, 1998–2010.
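The split ratio is not stated in this excerpt, but the counts in the table imply roughly 70% training and 30% testing in each group. The short sketch below recomputes those shares from the tabulated counts; the helper name `training_share` is illustrative and does not come from the paper.

```python
# Counts copied from the training/testing table: (training, testing).
counts = {
    "Both sexes": (8973, 3796),
    "Males": (3019, 1292),
    "Females": (5958, 2500),
}


def training_share(train, test):
    """Return the percentage of records assigned to the training set."""
    return 100.0 * train / (train + test)


# Percentage of each group that was used for training.
shares = {group: training_share(*pair) for group, pair in counts.items()}

for group, share in shares.items():
    print(f"{group}: {share:.1f}% training, {100 - share:.1f}% testing")
```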
Figure 1. Importance of classifiers.
Prediction results of the random forest method (confusion matrix) in the present study.
| Data | Categories | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC (%) |
|---|---|---|---|---|---|
| Training | Both sexes | 86.59 | 99.94 | 97.24 | 76.47 |
| Testing | Both sexes | 93.79 | 99.97 | 98.63 | 74.83 |
| Training | Males | 88.96 | 100.00 | 97.81 | 78.06 |
| Testing | Males | 95.72 | 99.71 | 98.92 | 69.39 |
| Training | Females | 89.78 | 99.87 | 97.73 | 75.11 |
| Testing | Females | 94.70 | 100.00 | 98.96 | 73.02 |
Source. MLSFH, 1998–2010. Note. All outputs were significant at p < 0.05.
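For readers unfamiliar with the four reported measures, the sketch below shows how sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC) are conventionally derived from the cells of a binary confusion matrix. The function name `confusion_metrics` and the example counts are hypothetical; the underlying confusion matrices are not given in this excerpt, so this does not reproduce the study's figures.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. All results are fractions in
    [0, 1], except MCC, which lies in [-1, 1].
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = confusion_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in example.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because MCC uses all four cells, it can stay well below accuracy when one class is much rarer than the other. The table shows that kind of gap, with accuracy near 98% and MCC near 75%, though the class balance in the study is not given here.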
Figure 2. A ROC curve for subjective HIV status: true positive rate versus false positive rate.