
Selection of Variables that Influence Drug Injection in Prison: Comparison of Methods with Multiple Imputed Data Sets.

Saiedeh Haji-Maghsoudi, Ali Akbar Haghdoost, Mohammad Reza Baneshi.

Abstract

BACKGROUND: Prisoners, compared to the general population, are at greater risk of infection. Drug injection is the main route of human immunodeficiency virus (HIV) transmission, particularly in Iran. It is therefore of interest to determine the variables that govern drug injection among prisoners. However, one of the issues that challenges model building is incomplete national data sets. In this paper, we addressed the process of model development when missing data exist.
METHODS: Complete data on 2720 prisoners were available. A logistic regression model was fitted and served as the gold standard. We then randomly omitted 20% and 50% of the data. Missing data were imputed 10 times by multiple imputation by chained equations (MICE). Rubin's rule (RR) was applied to select candidate variables and to combine the results across imputed data sets. In the S1, S2, and S3 methods, variables that remained significant in one, five, and ten imputed data sets, respectively, were candidates for the multifactorial model. Two weighting approaches were also applied.
FINDINGS: Age of onset of drug use, recent drug use before imprisonment, being single, and length of imprisonment were significantly associated with drug injection among prisoners. All variable selection schemes were able to detect the significance of these variables.
CONCLUSION: The performance of the simpler variable selection methods was comparable with that of RR. This indicates that the screening step can be used to select candidate variables for the multifactorial model.

Keywords:  Drug injection; Missing data; Multiple imputation; Prison; Variable selection

Year:  2014        PMID: 25140216      PMCID: PMC4137438     

Source DB:  PubMed          Journal:  Addict Health        ISSN: 2008-4633


Introduction

Human immunodeficiency virus (HIV) is an important public health concern.1 It has been shown that about 10% of new HIV infections occur in injecting drug users (IDUs).2 Furthermore, the prevalence of HIV infection was very high among IDUs, especially when harm reduction initiatives were not available.1,3 In addition, other countries' experiences suggest that the incidence and prevalence of HIV infection can increase rapidly among IDUs.1,3 In Iran, about 65% of reported HIV cases have been attributed to drug injection.4 In the Middle East and North Africa (MENA) region, the main route of HIV transmission is drug injection.5
Evidence from the last three decades indicates that prisoners are a group at high risk of HIV infection and therefore need careful attention. Use of non-sterile injecting equipment in prisons is one of the most important independent determinants of HIV infection.6,7 In many countries, a considerable proportion of prisoners are drug dependent.8 In addition, the prevalence of HIV among prisoners has been reported at 10% to 25%.9 Turnover of the prison population paves the way for widespread infection in the general population as well. It has been shown that the rate of sharing injection equipment is much higher in prisons than in the community.7 This might partially explain the high rate of HIV infection among prisoners. Other influential factors are sexual activity, tattooing, and sharing of contaminated razor blades.10
It is therefore of interest to identify the variables that lead prisoners to inject drugs in prison. This allows the development of a prediction tool to identify groups at high risk of drug injection. Regression models are frequently used to assess the association between potential risk factors and a clinical outcome.11,12 Clearly, the predictions derived are valid only if the assumptions of the model are satisfied by the data. One of the issues that challenges model building is incomplete records.
Missing data are common in national data sets for several reasons. For example, people might refuse to respond to some questions, or blood samples taken from patients might be lost.13-15 The disadvantages of omitting incomplete records have been addressed extensively in the literature.16-19 The standard approach to imputing missing data is multiple imputation by chained equations (MICE). This process replaces each missing datum with several different values and therefore creates multiple data sets, typically 10.19-21
To identify the risk factors of the outcome (in our application, drug injection in prison), the process continues by fitting a separate regression model to each of the 10 imputed data sets (10 models in total). Rubin’s rule (RR) provides a formula to combine regression coefficients and standard deviations (SD) across the 10 data sets, to obtain aggregated odds ratios (OR), and to calculate a P-value for each variable. To apply the backward elimination (BE) approach, the variable with the highest P-value is removed. Based on the remaining variables, the whole process is repeated iteratively until all remaining variables are significant. Clearly, in RR the processes of variable selection and combination of results are done simultaneously.11,22
The RR approach is time-consuming. Therefore, alternative strategies have been proposed. In these approaches, variable selection is performed in each data set independently. In other words, after fitting separate regression models to each data set, candidate variables for the multifactorial model are finalized in a screening round. This is followed by the aggregation of the estimates of the selected variables across the 10 data sets.22 Another alternative is to combine all 10 imputed data sets into one single data set. Here, classic BE can be used to fit the final model and to identify significant variables. However, the sample size would be 10 times that of the original data. This artificially reduces the SD and increases the chance of variables being significant. To tackle this problem, weighted regression methods can be implemented.22
There are few studies comparing the performance of alternative variable selection methods. This manuscript has two interrelated aims: to identify the variables that govern drug injection in prison, and to compare alternative variable selection methods in the presence of missing data.
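The pooling step attributed to Rubin's rule above can be sketched as follows. This is a minimal illustration, not the authors' Stata or R code: the function name `pool_rubin` is hypothetical, the inputs are assumed to be log-odds ratios and their standard errors from each imputed data set, and the P-value uses a normal approximation as a simplifying assumption (a t reference distribution is the usual refinement).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one log-odds ratio across m imputed data sets (hypothetical helper)."""
    m = len(estimates)
    q_bar = mean(estimates)                       # pooled coefficient
    within = mean([se ** 2 for se in std_errors]) # average within-imputation variance
    between = variance(estimates)                 # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between        # total variance of the pooled coefficient
    se = math.sqrt(total)
    z = q_bar / se
    # Assumption: two-sided P-value from a normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    ci = (math.exp(q_bar - 1.96 * se), math.exp(q_bar + 1.96 * se))
    return {"or": math.exp(q_bar), "ci": ci, "se": se, "p": p}


# Hypothetical coefficients from two imputed data sets, for illustration only.
result = pool_rubin([0.4, 0.6], [0.1, 0.1])
print(round(result["or"], 2), round(result["se"], 2))
```

In a backward elimination loop, the pooled P-values from this step would be compared across variables, the least significant variable dropped, and all models refitted, which is why the combined procedure is slow.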

Methods

In the present study, data from the national HIV Bio-Behavioral Surveillance Survey (BBSS) among prisoners in 2009 were used. The dependent variable was history of drug injection (yes/no question). Independent variables were: history of imprisonment (in months); the onset of drug use; the main cause of recent incarceration, including drug smuggling, murder, rape/sexual assault, violence/aggression, theft, smuggling of illegal goods, and financial crimes (all yes/no questions); dominant drug used in the last month before recent imprisonment [grass, ecstasy, heroin-crack, crystal, methadone, and alcohol (all yes/no questions)]; education level; job (4 categories); marital status (married, single, divorced/widowed); and knowledge about acquired immunodeficiency syndrome (AIDS).
Information on 2720 prisoners was available. Using all data, a logistic regression model with BE variable selection was fitted to identify variables that influence the outcome. The results were considered the gold standard. Then, missing data were generated at rates of 20% and 50%, and the missing data were imputed 10 times. Applying the standard RR approach, final estimates were derived.
We then tried some alternative variable selection algorithms (S1, S2, and S3) as follows. In S1, based on the results of logistic regression in each imputed data set, only variables that retained significance in at least one imputed data set were candidates for the multifactorial model. In S2 and S3, variables that retained significance in more than 5 and in all 10 data sets, respectively, were selected to be offered to the multifactorial model.
We also examined the performance of weighted regression approaches. We merged the 10 imputed data sets into one single set and analyzed it by fitting weighted logistic regression. Two weighting schemes were implemented (W1 and W2). In W1, a weight of 0.1 was used. In W2, a weight of (1 - f)/10 was applied, where f is the mean fraction of missing data across all variables.22
To address the impact of events per variable (EPV), a random sample corresponding to an EPV of 5 was taken from the data.23,24 All processes explained above were applied to this data set. The results of the W1, W2, S1, S2, S3, and RR methods were compared in terms of the estimated OR, confidence interval (CI), and significance of variables. All analyses were done using Stata Statistical Software (Version 10; Stata Corporation, College Station, TX, USA) and the R package mice.
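The screening rules and the two weights described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in the study: the function names, the variable names, the P-values, and the 0.05 significance level are all hypothetical. Because the text states the S2 cut-off both as five and as more than five of the 10 data sets, the threshold is left as a parameter.

```python
def screen_variables(pvalues_by_variable, min_count, alpha=0.05):
    """Keep variables significant in at least `min_count` imputed data sets."""
    selected = []
    for name, pvalues in pvalues_by_variable.items():
        n_significant = sum(p < alpha for p in pvalues)
        if n_significant >= min_count:
            selected.append(name)
    return selected


def stacked_weights(n_imputations, mean_missing_fraction):
    """Weights for a regression on the stacked (merged) imputed data sets."""
    w1 = 1 / n_imputations                                # W1: 1/10 for 10 imputations
    w2 = (1 - mean_missing_fraction) / n_imputations      # W2: (1 - f)/10
    return w1, w2


# Hypothetical per-imputation P-values for three variables (10 imputations each).
pvalues = {
    "onset_age": [0.001] * 10,
    "heroin_crack": [0.01] * 6 + [0.20] * 4,
    "education": [0.03] + [0.50] * 9,
}
print(screen_variables(pvalues, min_count=1))    # S1-style rule
print(screen_variables(pvalues, min_count=6))    # S2-style rule ("more than 5")
print(screen_variables(pvalues, min_count=10))   # S3-style rule
print(stacked_weights(10, 0.2))
```

Variables that pass the chosen screen would then be offered to the multifactorial model, and their estimates combined across the imputed data sets.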

Results

The mean ± SD of age was 32.82 ± 8.56 years. Mean ± SD of time spent in prison was 26.20 ± 28.70 months. Around 52.0% of subjects were married. The majority of subjects had primary or guidance school education. Only 3.7% were unemployed (Table 1). The most important reasons for being in prison were drug trafficking (52.6%) and robbery (25.8%). In total, 22.7% had a history of drug injection. The main drugs misused were heroin (51.5%) and crystal (13.3%). Mean ± SD of age of onset of drug use was 20.79 ± 6.36 years (Table 1).
Table 1

Descriptive statistics of demographic information

Variable                                  Frequency (percentage)
Education
  Literate/illiterate                     367 (13.5)
  Primary school/guidance school          1790 (65.8)
  High school diploma/university degree   563 (20.7)
Job
  Transit driver                          148 (5.4)
  Seasonal worker                         347 (12.8)
  Unemployed                              100 (3.7)
  Other jobs                              2125 (78.1)
Marital status
  Single                                  946 (34.8)
  Married                                 1418 (52.1)
  Other (widow, divorce, …)               356 (13.1)
Of 2720 prisoners, 618 subjects had a history of drug injection in prison. This gave an EPV of about 25. After fitting logistic regression to the complete data, 5 variables remained significant in the model (full model): the onset of drug use, drug use in the last month before imprisonment (heroin-crack and methadone), marital status, and length of stay in prison. Recent drug users were more likely to inject drugs in prison (Table 2). A one-year increase in the age of onset of drug use was associated with an 8.0% reduction in the risk of drug injection in prison (OR = 0.92, P < 0.001). Those who used drugs in the last month before imprisonment were at least 3 times more likely to inject drugs (OR = 4.52 in the case of heroin-crack, and 3.06 in the case of methadone). In comparison to married prisoners, single and divorced subjects were 31.0% and 132% more likely to inject drugs in prison, respectively (Table 2).
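The percentages quoted above follow directly from the odds ratios, and the EPV is a simple ratio. The sketch below shows the arithmetic; the function names are hypothetical, and the number of candidate parameters (25) is an assumption made only to reproduce an EPV of about 25, since the text does not report it. Strictly, the percentages are changes in odds rather than in risk.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percentage change in odds."""
    return (odds_ratio - 1) * 100


def events_per_variable(n_events, n_candidate_parameters):
    """EPV: number of outcome events divided by the number of candidate parameters."""
    return n_events / n_candidate_parameters


# Odds ratios taken from the full model in Table 2.
for label, odds_ratio in [("onset of drug use", 0.92), ("single", 1.31), ("divorced/widowed", 2.32)]:
    print(label, round(percent_change_in_odds(odds_ratio), 1))

# 618 events are reported; 25 candidate parameters is an assumed value.
print(round(events_per_variable(618, 25), 2))
```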
Table 2

Comparison of performance of variable selection methods at EPV (Event per variable) of 25

Statistic Full       20% missing rate                                                  50% missing rate
                     S1         S2         S3         W1         W2         RR         S1         S2         S3         W1         W2         RR
The onset of drug use
  OR      0.92       0.92       0.92       0.92       0.92       0.92       0.92       0.93       0.93       0.93       0.93       0.93       0.93
  95% CI  0.90-0.94  0.90-0.94  0.90-0.94  0.90-0.94  0.90-0.94  0.90-0.94  0.90-0.94  0.91-0.95  0.91-0.95  0.91-0.95  0.91-0.95  0.91-0.95  0.91-0.95
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
Dominant drug used in last month before recent imprisonment: heroin-crack
  OR      4.52       4.53       4.53       4.53       4.53       4.53       4.53       4.51       4.51       4.51       4.51       4.51       4.51
  95% CI  3.47-5.89  3.48-5.90  3.48-5.90  3.48-5.90  3.48-5.90  3.48-5.91  3.45-5.90  3.45-5.90  3.45-5.90  3.45-5.90  3.46-5.87  3.45-5.90  3.45-5.90
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
Dominant drug used in last month before recent imprisonment: methadone
  OR      3.06       2.98       2.98       2.98       2.98       2.98       2.98       3.12       3.12       3.12       3.12       3.12       3.12
  95% CI  2.06-4.55  2.01-4.42  2.01-4.42  2.01-4.42  2.01-4.42  2.00-4.43  2.01-4.42  2.10-4.64  2.10-4.64  2.10-4.64  2.11-4.61  2.10-4.64  2.10-4.64
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
Marital status: married (reference category, no estimates)
Marital status: single
  OR      1.31       1.32       1.32       1.32       1.32       1.32       1.32       1.38       1.38       1.38       1.38       1.38       1.38
  95% CI  1.05-1.65  1.05-1.65  1.05-1.65  1.05-1.65  1.05-1.65  1.05-1.65  1.05-1.65  1.09-1.73  1.09-1.73  1.09-1.73  1.10-1.73  1.09-1.73  1.09-1.73
  P       0.020      0.020      0.020      0.020      0.020      0.020      0.020      0.010      0.010      0.010      0.010      0.010      0.010
Marital status: divorced/widowed
  OR      2.32       2.35       2.35       2.35       2.35       2.35       2.35       2.49       2.49       2.49       2.49       2.49       2.49
  95% CI  1.75-3.08  1.77-3.12  1.77-3.12  1.77-3.12  1.77-3.12  1.77-3.12  1.77-3.12  1.86-3.32  1.86-3.32  1.86-3.32  1.87-3.30  1.86-3.32  1.86-3.32
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
History of imprisonment in months
  OR      1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02       1.02
  95% CI  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001

OR: Odds ratios; CI: Confidence interval; RR: Rubin’s rule

In addition, duration of stay in prison was positively associated with the risk of drug injection. A one-month increase in imprisonment time was associated with a 2% increase in the risk of drug injection (OR = 1.02, P < 0.001). At an EPV of 25, with 20% and 50% missing rates, all variable selection methods provided results almost the same as the full model (Table 2). No considerable bias was observed in the estimation of the OR and its associated SD. Even the performance of the S1 approach was satisfactory. At an EPV of 5, only the significance of marital status was lost in the full model (Table 3). However, the performances of all variable selection methods were comparable. Both RR and the less time-consuming approaches were able to detect the significance of all 5 variables.
Table 3

Comparison of performance of variable selection methods at EPV (Event per variable) of 5

Statistic Full       50% missing rate                                                  20% missing rate
                     S1         S2         S3         W1         W2         RR         S1         S2         S3         W1         W2         RR
The onset of drug use
  OR      0.91       0.92       0.92       0.92       0.92       0.92       0.92       0.91       0.91       0.91       0.91       0.91       0.91
  95% CI  0.88-0.95  0.88-0.95  0.88-0.95  0.88-0.95  0.88-0.95  0.89-0.95  0.88-0.95  0.87-0.95  0.87-0.95  0.87-0.95  0.88-0.95  0.88-0.95  0.87-0.95
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
Using heroin-crack in last month before recent imprisonment
  OR      6.82       6.82       6.82       6.82       6.82       6.82       6.82       6.90       6.90       6.90       6.90       6.90       6.90
  95% CI  4.10-11.34 4.10-11.34 4.10-11.34 4.10-11.34 4.10-11.33 4.10-11.41 4.10-11.34 4.15-11.48 4.15-11.48 4.15-11.48 4.15-11.49 4.14-11.50 4.15-11.48
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001
Marital status: married (reference category, no estimates)
Marital status: single
  OR      -          -          -          -          -          -          -          -          -          -          -          -          -
  95% CI  -          -          -          -          -          -          -          -          -          -          -          -          -
  P       -          -          -          -          -          -          -          -          -          -          -          -          -
Marital status: divorced/widowed
  OR      1.82       1.93       1.93       1.93       1.93       1.93       1.93       1.85       1.85       1.85       1.85       1.85       1.85
  95% CI  1.13-2.95  1.19-3.12  1.19-3.12  1.19-3.12  1.20-3.10  1.19-3.12  1.19-3.12  1.15-2.99  1.15-2.99  1.15-2.99  1.15-2.98  1.15-2.99  1.15-2.99
  P       0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010      0.010
History of imprisonment in months
  OR      1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01       1.01
  95% CI  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02  1.01-1.02
  P       < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001    < 0.001

OR: Odds ratios; CI: Confidence interval; RR: Rubin’s rule

Discussion

It has been argued that drug use among prisoners is prevalent, and that the quality of programs for inmates inside prisons is much poorer than that of programs for the general population.8,10,25 Evidence from studies in Iran showed that those who had a history of imprisonment in the past year had a 38% rise in the risk of needle and syringe sharing.4 In addition, a local study in Fars province revealed that the prevalence of hepatitis C virus (HCV) infection among incarcerated drug users was about 78%.26 This indicates that prisoners are at greater risk of some of the harms associated with drug use and need a special care system.
The first aspect of our work was to reveal variables linked to drug injection in prison. Based on these results, age of drug initiation has a negative effect on drug injection. This means that the longer the period of drug use, the higher the chance of injecting drugs in prison; in addition, the longer the period of imprisonment, the greater the risk of drug injection. Being single and use of heroin or methadone in the last month before imprisonment were also positively associated with the outcome.
Studies in different regions of the world, such as western and southern Europe, Russia, Canada, Brazil, Iran, and Thailand, have shown that a history of imprisonment is associated with HIV, HCV, or hepatitis B virus (HBV) infection among IDUs.7 The negative relationship between age of drug initiation and drug injection has also been confirmed in a Northern Thailand study.27 In a similar study in Germany, it was found that a history of imprisonment was associated with a 50% increase in the risk of HBV seropositivity.28 Moreover, a history of syringe sharing in prison is significantly associated with HBV, HCV, and HIV infections.7 Another study in Nigeria showed that the duration of imprisonment and history of previous incarceration were significantly associated with HBV seropositivity.29
The second aspect of our work was to compare alternative variable selection methods when multiple imputed data sets exist. RR is the standard tool for analyzing data with missing values. However, applying this method in conjunction with BE variable selection is time-consuming. This highlights the importance of other algorithms for the modeling process. Comparing the approaches, our results showed that the performance of S1, S2, S3, W1, and W2 was similar to that of RR. The S1 and S2 methods suggest that, in a screening round, variables that reach significance in at least 10% or 50% of the generated data sets, respectively, can be selected as candidates for the multifactorial model. S3 is more conservative, as it selects only variables significant in all 10 data sets for multifactorial modeling. In addition, the performance of weighted regression models was similar to that of RR. This hugely decreases the burden of the modeling process.
Wood et al. compared the performance of RR, S1, S2, S3, W1, W2, and some other variable selection methods through a simulation study. The main criteria used to compare the models were power (the probability that a method correctly selects a given variable from the true model) and ‘type 1 error’ (the probability that a method wrongly selects a given variable not from the true model). Under logistic regression model simulations, the results of RR were similar to the true model. S1 had the highest type 1 error, while the performance of S3 was similar to the true model. The authors concluded that S3 works better than S1 and S2. In addition, it has been suggested that merging the data and applying weighted regression models is a good approximation of RR.22
One of the limitations of the study by Wood et al.22 was that only monotonic forms of association were studied. The majority of studies showed that while standard regression models failed to detect the significance of a variable, non-linear regression models were able to identify complex forms of association. The fractional polynomial (FP) model is a powerful tool to capture the optimum form of association between variables and outcome. This method applies a range of power transformations to independent variables and selects the one with the highest goodness of fit.30-33 If we analyze 10 imputed data sets independently, the shape of the association between an independent variable and the outcome can be checked across data sets. If the shapes are not the same, an ad hoc decision about how to combine the shapes across data sets must be made. Therefore, application of S1, S2, and S3 might not be valid. Another limitation of the study by Wood et al.22 was that the majority of scenarios were implemented for a continuous outcome. In the case of a binary outcome, missing data were generated under missing completely at random (MCAR). In addition, 10% of the information on each independent variable was dropped. However, the final attrition rate and EPV were not given.
One of the limitations of our study was that only a limited number of independent variables that can predict the outcome were available. Another limitation of this study was the nature of the variables. The majority of independent variables were dichotomous. Therefore, the impact of continuous independent variables and the shape of association remain to be addressed. We should also highlight that we only generated one data set at each of two EPVs and compared performance under two missing rates (4 scenarios in total). We believe that under each scenario multiple data sets should be generated to address the impact of sampling variation. This can be an issue to be addressed in independent studies.
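The two simulation criteria defined above, power and type 1 error, can be estimated from repeated runs as simple selection frequencies. The sketch below is illustrative only: the function name, variable names, and the four simulated selections are hypothetical and are not results from Wood et al. or from this study.

```python
def selection_rates(selected_sets, true_variables, all_variables):
    """Per-variable selection frequency, split into power and type 1 error."""
    n_runs = len(selected_sets)
    rates = {v: sum(v in s for s in selected_sets) / n_runs for v in all_variables}
    # Power: how often a variable that belongs to the true model is selected.
    power = {v: r for v, r in rates.items() if v in true_variables}
    # Type 1 error: how often a variable outside the true model is selected.
    type1 = {v: r for v, r in rates.items() if v not in true_variables}
    return power, type1


# Hypothetical selections from four simulated data sets.
runs = [{"x1", "x2"}, {"x1"}, {"x1", "x3"}, {"x1", "x2"}]
power, type1 = selection_rates(runs, true_variables={"x1", "x2"}, all_variables=["x1", "x2", "x3"])
print(power)
print(type1)
```

Generating many data sets per scenario, as suggested above, is what makes these frequencies meaningful; a single data set per scenario gives only one selection per method.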

Conclusion

Despite these limitations, the results of our study suggest that alternative variable selection methods, for example S1, S2, and S3, provide results that are comparable with RR. Application of such methods and comparison of their results is recommended. This provides the opportunity to understand the scenarios in which simpler variable selection methods provide results comparable with those of more complicated methods.
  22 in total

1.  Building multivariable regression models with continuous covariates in clinical epidemiology--with an emphasis on fractional polynomials.

Authors:  P Royston; W Sauerbrei
Journal:  Methods Inf Med       Date:  2005       Impact factor: 2.176

2.  Multiple imputation by chained equations: what is it and how does it work?

Authors:  Melissa J Azur; Elizabeth A Stuart; Constantine Frangakis; Philip J Leaf
Journal:  Int J Methods Psychiatr Res       Date:  2011-03       Impact factor: 4.035

3.  How should variable selection be performed with multiply imputed data?

Authors:  Angela M Wood; Ian R White; Patrick Royston
Journal:  Stat Med       Date:  2008-07-30       Impact factor: 2.373

4.  History of syringe sharing in prison and risk of hepatitis B virus, hepatitis C virus, and human immunodeficiency virus infection among injecting drug users in Berlin.

Authors:  K Stark; U Bienzle; R Vonk; I Guggenmoos-Holzmann
Journal:  Int J Epidemiol       Date:  1997-12       Impact factor: 7.196

Review 5.  Missing data analysis using multiple imputation: getting to the heart of the matter.

Authors:  Yulei He
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2010-01

6.  Human immunonodeficiency virus, hepatitis B virus and hepatitis C virus: sero-prevalence, co-infection and risk factors among prison inmates in Nasarawa State, Nigeria.

Authors:  Moses P Adoga; Edmund B Banwat; Joseph C Forbi; Lohya Nimzing; Christopher R Pam; Silas D Gyar; Yusuf A Agabi; Simon M Agwale
Journal:  J Infect Dev Ctries       Date:  2009-08-30       Impact factor: 0.968

7.  Seroprevalence of HBV, HCV and HIV infection among intravenous drug users in Shahr-e-Kord, Islamic Republic of Iran.

Authors:  R Imani; A Karimi; R Rouzbahani; A Rouzbahani
Journal:  East Mediterr Health J       Date:  2008 Sep-Oct       Impact factor: 1.628

8.  Does the missing data imputation method affect the composition and performance of prognostic models?

Authors:  M R Baneshi; A R Talei
Journal:  Iran Red Crescent Med J       Date:  2012-01-01       Impact factor: 0.611

9.  Patterns of drug use among a sample of drug users and injecting drug users attending a General Practice in Iran.

Authors:  Carolyn Day; Bijan Nassirimanesh; Anthony Shakeshaft; Kate Dolan
Journal:  Harm Reduct J       Date:  2006-01-24

10.  Profiles of risk: a qualitative study of injecting drug users in Tehran, Iran.

Authors:  Emran M Razzaghi; Afarin Rahimia Movaghar; Traci Craig Green; Kaveh Khoshnood
Journal:  Harm Reduct J       Date:  2006-03-18
