Humphrey Brydon1, Rénette Blignaut1, Joachim Jacobs2.
Abstract
The latest population estimates released by Statistics South Africa indicate that 25.03% of all deaths in South Africa in 2017 were AIDS-related. The same estimates report that 7.06% of the population were living with HIV, with HIV prevalence among youth (aged 15-24) at 4.64% for 2017 (STATSSA. (2018). Retrieved from Statistics South Africa: http://www.statssa.gov.za/publications/P0302/P03022017.pdf ). The data used in this study contained information on the risk-taking behaviours associated with the sexual activity of entering first-year students at the University of the Western Cape. A logistic regression modelling procedure was carried out on those students who were determined to be sexually active, so that the significant risk behaviours of sexually active first-year students could be identified. Of the 14 variables included in the modelling procedure, six were found to be significantly associated with sexually active students. The significant variables were the age and race of the student, whether the student had ever taken an HIV test, the importance of religion in influencing the sexual behaviour of the student, whether the student consumed alcohol, and whether the student smoked. This study further investigated the impact of introducing sample weighting, bootstrap sampling and variable selection methods into the logistic regression modelling procedure. It is shown that incorporating these techniques into the modelling procedure produces logistic regression models that are more accurate and have an increased predictive capability. The bootstrapping procedure is shown to produce logistic regression models that are more accurate than those produced without a bootstrap procedure.
A comparison between 200, 500 and 1000 bootstrap samples is also incorporated into the modelling procedure, with the models produced from 200 bootstrap samples shown to be just as accurate as those produced from 500 or 1000 bootstrap samples. Of the five variable selection methods used, it is shown that the Newton-Raphson and Fisher methods are unreliable in producing logistic regression models. The forward, backward and stepwise variable selection methods are shown to produce very similar results.
Keywords: HIV prevention; logistic regression; sample weighting; sexual risk behaviour; variable selection; weighted bootstrap
Year: 2019 PMID: 31264524 PMCID: PMC6610523 DOI: 10.1080/17290376.2019.1636708
Source DB: PubMed Journal: SAHARA J ISSN: 1729-0376
Gender versus racial grouping in the training data set.
| Gender | Statistic | African/Black | Coloured | White | Indian/Asian | Total |
|---|---|---|---|---|---|---|
| Female | Count | 237 | 427 | 28 | 22 | 714 |
| | Row % | 33.19 | 59.80 | 3.92 | 3.08 | 100.00 |
| | Col. % | 65.29 | 63.54 | 59.57 | 52.38 | 63.52 |
| Male | Count | 126 | 245 | 19 | 20 | 410 |
| | Row % | 30.73 | 59.76 | 4.63 | 4.88 | 100.00 |
| | Col. % | 34.71 | 36.46 | 40.43 | 47.62 | 36.48 |
| Total | Count | 363 | 672 | 47 | 42 | 1124 |
| | Row % | 32.30 | 59.79 | 4.18 | 3.74 | |
| | Col. % | 100.00 | 100.00 | 100.00 | 100.00 | |
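The row and column percentages in this table follow directly from the cell counts. A minimal sketch of that arithmetic is given here, using only the counts reported above; the function name `percentages` and the dictionary layout are illustrative choices, not part of the original study.

```python
def percentages(table):
    """Return (row_pct, col_pct) for a dict-of-dicts table of counts."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    columns = list(next(iter(table.values())))
    col_totals = {col: sum(table[row][col] for row in table) for col in columns}
    # Row %: each cell as a share of its row total.
    row_pct = {
        row: {col: 100 * n / row_totals[row] for col, n in cells.items()}
        for row, cells in table.items()
    }
    # Col. %: each cell as a share of its column total.
    col_pct = {
        row: {col: 100 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }
    return row_pct, col_pct


# Cell counts copied from the gender-by-race table above.
counts = {
    "Female": {"African/Black": 237, "Coloured": 427, "White": 28, "Indian/Asian": 22},
    "Male": {"African/Black": 126, "Coloured": 245, "White": 19, "Indian/Asian": 20},
}

row_pct, col_pct = percentages(counts)
print(round(row_pct["Female"]["African/Black"], 2))  # 33.19
print(round(col_pct["Female"]["African/Black"], 2))  # 65.29
```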
Sampling weights for the training data set.
| Gender | Statistic | African/Black | Coloured | White | Indian/Asian |
|---|---|---|---|---|---|
| Female | Sample % | 21.09 | 37.98 | 2.49 | 1.96 |
| | UWC % | 24.11 | 32.10 | 1.65 | 2.91 |
| | Weight | 1.14 | 0.85 | 0.66 | 1.49 |
| Male | Sample % | 11.21 | 21.80 | 1.69 | 1.78 |
| | UWC % | 17.84 | 17.90 | 1.12 | 2.38 |
| | Weight | 1.59 | 0.82 | 0.66 | 1.34 |
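The tabulated weights appear to be the ratio of each group's share of the UWC population to its share of the sample. The sketch below reproduces that ratio from the rounded percentages in the table, so the results agree with the Weight rows only to within rounding. The `weighted_bootstrap` helper is an assumption about how a weighted resample could be drawn and is not taken from the study; its name, signature and seed are hypothetical.

```python
import random

# Sample % and UWC % copied from the table above (rounded values).
sample_pct = {
    ("Female", "African/Black"): 21.09, ("Female", "Coloured"): 37.98,
    ("Female", "White"): 2.49, ("Female", "Indian/Asian"): 1.96,
    ("Male", "African/Black"): 11.21, ("Male", "Coloured"): 21.80,
    ("Male", "White"): 1.69, ("Male", "Indian/Asian"): 1.78,
}
uwc_pct = {
    ("Female", "African/Black"): 24.11, ("Female", "Coloured"): 32.10,
    ("Female", "White"): 1.65, ("Female", "Indian/Asian"): 2.91,
    ("Male", "African/Black"): 17.84, ("Male", "Coloured"): 17.90,
    ("Male", "White"): 1.12, ("Male", "Indian/Asian"): 2.38,
}

# Weight = population share / sample share; values above 1 mark
# groups that are under-represented in the sample.
weights = {cell: uwc_pct[cell] / sample_pct[cell] for cell in sample_pct}


def weighted_bootstrap(records, record_weights, seed=0):
    """Hypothetical helper: draw len(records) records with replacement,
    each with probability proportional to its sampling weight."""
    rng = random.Random(seed)
    return rng.choices(records, weights=record_weights, k=len(records))


for cell, weight in weights.items():
    print(cell, round(weight, 2))

# Toy usage: one record per gender-by-race cell, resampled by weight.
cells = list(weights)
resample = weighted_bootstrap(cells, [weights[c] for c in cells])
```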
Sexual activity versus race in the training data set (unweighted data).
| Sexual activity | Statistic | African/Black | Coloured | White | Indian/Asian | Total |
|---|---|---|---|---|---|---|
| Sexually Active | Count | 209 | 297 | 29 | 7 | 542 |
| | Row % | 38.56 | 54.80 | 5.35 | 1.29 | 100.00 |
| | Col. % | 61.83 | 47.44 | 65.91 | 19.44 | 51.92 |
| Not Sexually Active | Count | 129 | 329 | 15 | 29 | 502 |
| | Row % | 25.70 | 65.54 | 2.99 | 5.78 | 100.00 |
| | Col. % | 38.17 | 52.56 | 34.09 | 80.56 | 48.08 |
| Total | Count | 338 | 626 | 44 | 36 | 1044 |
| | Row % | 32.38 | 59.96 | 4.21 | 3.45 | |
| | Col. % | 100.00 | 100.00 | 100.00 | 100.00 | |
Sexual activity versus gender in the training data set (unweighted data).
| Sexual activity | Statistic | Female | Male | Total |
|---|---|---|---|---|
| Sexually Active | Count | 308 | 234 | 542 |
| | Row % | 56.83 | 43.17 | 100.00 |
| | Col. % | 45.77 | 63.07 | 51.92 |
| Not Sexually Active | Count | 365 | 137 | 502 |
| | Row % | 72.71 | 27.29 | 100.00 |
| | Col. % | 54.23 | 36.93 | 48.08 |
| Total | Count | 673 | 371 | 1044 |
| | Row % | 64.46 | 35.54 | |
| | Col. % | 100.00 | 100.00 | |
List of predictor variables and their responses.
| Predictor Variable | Variable Code Name | Response/Categories | |
|---|---|---|---|
| Gender of student | gender | Male | Female |
| Do you personally know anyone with HIV/AIDS? | know_anyone_HIV | Yes | No |
| Do you feel that you know enough about HIV/AIDS? | know_enough_HIV | Yes | No |
| Have you ever taken an HIV test? | Taken_HIV | Yes | No |
| Do you intend to go for an HIV test? | intention_HIV_test | Yes | No |
| Accommodation during studies | Res | Stays in hostel | Does not stay in hostel |
| Matriculation province | Prov | Western Cape | All other Provinces |
| Use any drug in the last 30 days | drug_use | Yes | No |
| Age Group | age_group | 16–19 | 20–24 |
| Racial Group | racial_gr2 | African | Not African |
| Do you use alcohol? | alcohol_use | Yes | No |
| Do you smoke? | smoke | Yes | No |
| Depressed more than 2 weeks in row | depressed | Yes | No |
| Importance of religion in influencing sexual activity | religion_vi | Very Important | Not so Important |
Average inclusion (given in %) of predictor variables in logistic regression models, by number of bootstrap samples (200, 500, 1000). W = weighted data; UW = unweighted data.
| Variable | 200 W | 200 UW | 500 W | 500 UW | 1000 W | 1000 UW | Total Average Inclusion W | Total Average Inclusion UW |
|---|---|---|---|---|---|---|---|---|
| age_group | 76.33 | 80.83 | 71.87 | 77.73 | 74.10 | 80.70 | 74.10 | 79.75 |
| alcohol_use | 99.33 | 98.17 | 97.20 | 95.60 | 98.67 | 97.20 | 98.40 | 96.99 |
| depressed | 9.00 | 11.00 | 8.07 | 11.60 | 9.67 | 12.53 | 8.91 | 11.71 |
| drug_use | 13.33 | 16.17 | 16.67 | 22.60 | 16.03 | 20.40 | 15.34 | 19.72 |
| know_enough_HIV | 5.50 | 3.00 | 2.40 | 1.47 | 3.60 | 1.53 | 3.83 | 2.00 |
| taken_HIV_test | 44.00 | 46.33 | 48.60 | 52.67 | 43.57 | 47.83 | 45.39 | 48.94 |
| intention_HIV_test | 1.50 | 0.83 | 1.53 | 1.13 | 2.23 | 1.57 | 1.75 | 1.18 |
| know_anyone_HIV | 15.17 | 18.00 | 15.80 | 15.53 | 12.43 | 12.23 | 14.47 | 15.25 |
| racial_gr2 | 63.50 | 54.17 | 62.40 | 51.00 | 62.87 | 52.53 | 62.92 | 52.57 |
| religion_vi | 100.00 | 100.00 | 98.80 | 100.00 | 99.70 | 99.90 | 99.50 | 99.97 |
| smoke | 41.33 | 42.67 | 41.13 | 44.73 | 41.37 | 44.73 | 41.28 | 44.04 |
| gender | 19.00 | 16.17 | 21.07 | 18.47 | 20.90 | 16.40 | 20.32 | 17.01 |
| prov | 11.00 | 5.83 | 10.40 | 8.67 | 11.33 | 8.83 | 10.91 | 7.78 |
| res | 4.00 | 2.50 | 0.93 | 1.13 | 1.67 | 1.97 | 2.20 | 1.87 |
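An inclusion percentage of this kind can be read as the share of fitted models in which a variable was retained by the selection procedure. The sketch below shows that calculation on a small invented set of models, and then checks that the Total Average Inclusion column is consistent with a simple mean of the 200, 500 and 1000 columns. The `models` list and the function name `inclusion_percentages` are hypothetical and only illustrate the bookkeeping; they are not the study's data or code.

```python
from collections import Counter
from statistics import mean


def inclusion_percentages(selected_sets, variables):
    """Percentage of models whose selected-variable set contains each variable."""
    counts = Counter(var for selected in selected_sets for var in set(selected))
    n_models = len(selected_sets)
    return {var: 100 * counts[var] / n_models for var in variables}


# Hypothetical toy input: the variables kept by four bootstrap models.
models = [
    {"religion_vi", "alcohol_use"},
    {"religion_vi", "age_group"},
    {"religion_vi", "alcohol_use", "smoke"},
    {"religion_vi"},
]
variables = ["religion_vi", "alcohol_use", "age_group", "smoke", "gender"]
pct = inclusion_percentages(models, variables)
print(pct)

# Weighted (W) inclusion of age_group for 200, 500 and 1000 bootstrap
# samples, copied from the table above; their mean matches the
# Total Average Inclusion value of 74.10.
age_group_w = [76.33, 71.87, 74.10]
total_avg = mean(age_group_w)
print(round(total_avg, 2))
```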
Figure 1. AIC values across data sets.
Figure 2. BIC/SC values across data sets.
Figure 3. values across data sets.
Stable Logistic Regression model estimates.
| Parameter | Coefficient | Standard Error |
|---|---|---|
| Intercept | −1.220083 | 9.507995 |
| age_group | 0.720440 | 0.201313 |
| alcohol_use | 0.698189 | 0.135473 |
| taken_HIV_test | −0.466004 | 0.135816 |
| racial_gr2 | −0.561591 | 0.140210 |
| religion_vi | 0.701504 | 0.132257 |
| smoke | 0.568500 | 0.166486 |
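To show how such estimates are typically used, the sketch below converts the tabulated coefficients into odds ratios and a predicted probability through the logistic function. It assumes every predictor is coded 0/1, which the table does not state, and it makes no claim about which category is coded 1 or which outcome level the model predicts. The function names are illustrative.

```python
import math

# Coefficients copied from the table above.
INTERCEPT = -1.220083
COEFFICIENTS = {
    "age_group": 0.720440,
    "alcohol_use": 0.698189,
    "taken_HIV_test": -0.466004,
    "racial_gr2": -0.561591,
    "religion_vi": 0.701504,
    "smoke": 0.568500,
}


def odds_ratios(coefficients):
    """Odds ratio for a one-unit change in each predictor: exp(coefficient)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


def predicted_probability(profile):
    """Logistic-model probability for a dict of 0/1 predictor values.

    ASSUMPTION: predictors are 0/1 indicators; missing keys count as 0.
    """
    linear = INTERCEPT + sum(
        beta * profile.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1 / (1 + math.exp(-linear))


ratios = odds_ratios(COEFFICIENTS)
baseline = predicted_probability({})  # all indicators at 0
example = predicted_probability({"alcohol_use": 1, "smoke": 1})
print(round(ratios["age_group"], 2), round(baseline, 3), round(example, 3))
```

A coefficient above zero gives an odds ratio above 1 and a coefficient below zero gives one below 1, so the sign pattern in the table carries over directly to the direction of each association.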
Correct classification percentages.
| Model | Sexually Active | Not Sexually Active |
|---|---|---|
| Stable Logistic Regression Model | 75.41% | 63.41% |
| Original Data Set Logistic Regression Model | 81.97% | 21.95% |
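Per-class correct classification percentages such as these are the share of each actual class that a model labels correctly. The sketch below computes them for a small invented set of labels, then averages the two tabulated rates for each model as one simple way to compare them. The toy labels, the function name and the choice of an unweighted average are assumptions for illustration; the record does not give the underlying counts.

```python
def correct_classification(actual, predicted, positive=1):
    """Return (% of positives classified correctly, % of negatives classified correctly)."""
    pos = [p for a, p in zip(actual, predicted) if a == positive]
    neg = [p for a, p in zip(actual, predicted) if a != positive]
    pct_pos = 100 * sum(p == positive for p in pos) / len(pos)
    pct_neg = 100 * sum(p != positive for p in neg) / len(neg)
    return pct_pos, pct_neg


# Hypothetical toy labels: 1 = sexually active, 0 = not sexually active.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
toy_rates = correct_classification(actual, predicted)
print(toy_rates)

# Unweighted mean of the two class rates reported in the table above.
stable_mean = (75.41 + 63.41) / 2
original_mean = (81.97 + 21.95) / 2
print(round(stable_mean, 2), round(original_mean, 2))
```

On this summary the original-data model's higher rate for the sexually active class is offset by its much lower rate for the not sexually active class.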