Renee Dale1,2, BeiBei Guo1. 1. Department of Experimental Statistics, Louisiana State University, Baton Rouge, Louisiana, United States of America. 2. Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America.
Abstract
Current estimates of the HIV epidemic indicate a decrease in the incidence of the disease in the undiagnosed subpopulation over the past 10 years. However, a lack of access to care has not been considered when modeling the population. Populations at high risk for contracting HIV are twice as likely to lack access to reliable medical care. In this paper, we consider three contributors to the HIV population dynamics: at-risk population exhaustion, lack of access to care, and usage of anti-retroviral therapy (ART) by diagnosed individuals. An extant problem in the mathematical study of this system is deriving parameter estimates due to a portion of the population being unobserved. We approach this problem by looking at the proportional change in the infected subpopulations. We obtain conservative estimates for the proportional change of the infected subpopulations using hierarchical Bayesian statistics. The estimated proportional change is used to derive epidemic parameter estimates for a system of stochastic differential equations (SDEs). Model fit is quantified to determine the best parametric explanation for the observed dynamics in the infected subpopulations. Parameter estimates derived using these methods produce simulations that closely follow the dynamics observed in the data, as well as values that are generally in agreement with prior understanding of transmission and diagnosis rates. Simulations suggest that the undiagnosed population may be larger than currently estimated without significantly affecting the population dynamics.
Current estimates of the HIV epidemic indicate a decrease in the incidence of the disease in the undiagnosed subpopulation over the past 10 years. However, a lack of access to care has not been considered when modeling the population. Populations at high risk for contracting HIV are twice as likely to lack access to reliable medical care. In this paper, we consider three contributors to the HIV population dynamics: at-risk population exhaustion, lack of access to care, and usage of anti-retroviral therapy (ART) by diagnosed individuals. An extant problem in the mathematical study of this system is deriving parameter estimates due to a portion of the population being unobserved. We approach this problem by looking at the proportional change in the infected subpopulations. We obtain conservative estimates for the proportional change of the infected subpopulations using hierarchical Bayesian statistics. The estimated proportional change is used to derive epidemic parameter estimates for a system of stochastic differential equations (SDEs). Model fit is quantified to determine the best parametric explanation for the observed dynamics in the infected subpopulations. Parameter estimates derived using these methods produce simulations that closely follow the dynamics observed in the data, as well as values that are generally in agreement with prior understanding of transmission and diagnosis rates. Simulations suggest that the undiagnosed population may be larger than currently estimated without significantly affecting the population dynamics.
The human immunodeficiency virus (HIV) progresses in three stages. The first stage lasts approximately 3 months and individuals in this stage are approximately 10 to 25 times more effective at transmitting the disease [1-3]. The chronic stage can last from 5-10 years without medication [4]. This is followed by acquired immunodeficiency syndrome (AIDS) and death shortly thereafter [1-3]. Individuals with HIV may go many years without diagnosis, during which time they may expose uninfected individuals to HIV. Efforts to improve the diagnosis rate include educational programs, as an individual’s perceived risk was shown to be highly correlated with the individual obtaining multiple HIV tests [5-8]. Several studies have found a 50% reduction in risky behaviors after diagnosis, including safer sex practices, reduction in partner number, and medications that reduce viral load [9, 10]. However, diagnosis events resulting in behavior modification are not thought to be sufficient to eradicate the epidemic [9, 11].After diagnosis, infected individuals have the opportunity to take anti-retroviral therapy (ART) that reduces their viral load and retards the progression of the disease, providing an improvement in life expectancy [12, 13]. The earlier that ART is received the higher the reduction in transmission events, particularly if obtained during the initial stage of HIV [1-3]. ART therapies could eradicate the epidemic in a population with high prevalence of infection even without the additional effect of behavioral changes [11]. Mathematical models estimate that the HIV epidemic could be reduced to less than 1% of the population infected (elimination phase) with universal testing and by providing ART consistently to newly diagnosed individuals [14]. However, issues with adherence and resistance are well documented in the literature [15-20]. Patients tend to report their adherence as much higher than it actually is, but studies indicate that even low adherence may be sufficient for control of the epidemic [10, 15]. Transmission is rare for individuals on ART, even with relatively high plasma HIV concentrations [21, 22].The largest barrier to eradication of the epidemic is lack of access to care, including diagnostic services and ART costs or prescriptions. A lack in access to care could create pockets of undiagnosed individuals while the overall trend appears to be a reduction in the size of the epidemic [23]. Various studies report between 50–96% of diagnosed individuals in the U.S. rely on public medical care to obtain their ART medications [10, 20, 24–26]. Access to care remains critical, but this has not been considered when modeling the dynamics of the epidemic [5, 6].Estimates using CD4 levels of newly diagnosed individuals suggest that the undiagnosed population is decreasing between 2005–2013 in the US [4, 27]. CD4 levels can be used to estimate the progression of HIV [28]. We consider three possible causes for this decrease including exhaustion of the at-risk population. The size of the at-risk population is not easy to estimate since it depends on behavior. High risk populations include individuals in poverty and men who have sex with men (MSM) [4]. This is particularly critical in the southern U.S., where individuals tend to be poor and lack access to medical care [4, 23]. As the at-risk population decreases the number of diagnoses will also decrease, which will cause the estimated number of undiagnosed individuals to decrease.An additional possibility is that the reduction in number of diagnoses is due to individuals lacking access to care. HIV is over-represented in impoverished populations where access to diagnosis and treatment may be more difficult to obtain. In this case, the number of newly diagnosed individuals is not representative of the number of undiagnosed individuals, and the estimates will be inaccurate. Finally, the usage of anti-retroviral therapies reduces the viral load and transmission potential of infected individuals.The difficulty in studying this system mathematically lies in parameter estimates. A minimal model of this system requires at least three parameters: transmission of the disease, diagnosis of the disease, and death due to the disease. Since knowledge about the undiagnosed population is restricted to those who have been diagnosed, estimates of these parameters are generally forced to assume that this population is representative of the whole.In this work we use coupled statistical and mathematical methodology to study the relationships between the three hypothesized causes and their respective population dynamics. We use hierarchical Bayesian statistics to get estimates for the size of the infected populations and their proportional changes across the years. These estimates are used to calculate epidemiological parameters for a system of stochastic differential equations. Currently we are not aware of any similar methods in the literature. Such a problem is challenging as the proportional change across the populations is a hyperparameter controlling the yearly proportions, which each have their own statistical model. This results in a large model that must be studied numerically.The resulting simulations give insight into the implications of the estimated undiagnosed population on epidemiological parameters. Our model suggests that the undiagnosed population may be larger than current estimates while recovering population dynamics. The best recovery of the dynamics observed in the existing data occurs when the increase in the diagnosed population due to diagnosis is greater than the decrease in the undiagnosed population. We hope this study will help inform future efforts to improve the situation of infected individuals and prevent future outbreaks.
Materials and methods
Bayesian statistics
A Markov model where p centered at qp is used to estimate the proportional change in the infected populations over time, where p is the proportion in the current year and q is the proportional change. These random variables are estimated using Bayesian statistics.The sampling model is x ∼ Bin(n, p), where n is population size in the current year. The random variable q is taken as a hyperparameter for p. The random variables to be estimated for each infected subpopulation are q and p, where t = 2005,…,2013. We estimate the random variables of undiagnosed and diagnosed subpopulations independently.
Prior
The prior for the proportional change q is a gamma distribution.The parameters were chosen so that the prior distribution was centered at the arithmetic estimates of q obtained from the CDC [4]. The arithmetic estimates were obtained by calculating:The arithmetic estimate for the undiagnosed q (q) is 0.979, and for the diagnosed q (q) 1.025. The priors used were GAM(9.79,10) and GAM(10.25,10) so they were centered at 1.The prior for the random variable p, the undiagnosed proportion, is a beta distribution centered at the previous proportion times q. The parameters of the beta distribution are α = 0.1n × qp, and β = 0.1n − α.In the case where t = 1, the previous undiagnosed proportion is taken to be the expert opinion of 20%, and the diagnosed proportion to be 1−p0(undiagnosed) [29]. The prior for the diagnosed population is formulated in the same way. Population sizes were considered in units of thousands.
Likelihood
The likelihood is a binomial likelihood, representing the chance of selecting an undiagnosed or diagnosed individual at random from the total infected population. For a given year t, the proportion of undiagnosed individuals depends on the total number of individuals:
where x is the total number of undiagnosed or diagnosed individuals, and n is the total number of infected individuals. The likelihood across all the years is the product of each year’s likelihood.
The likelihood for the diagnosed population was formulated in the same way.
Posterior
The joint posterior distribution is proportional to the priors multiplied by the likelihoods for all 9 years:The posterior full conditional of p for t = 2005,…,2012 is:The posterior full conditional of 2013, the 9th year, is:The full conditional of q does not have a closed form. The forms of the diagnosed random variables are the same. Random variable estimates were obtained using Metropolis-Hastings nested within a Gibbs sampler over 100,000 iterations with R version 3.3.3 [30]. The proposal distribution was a truncated normal distribution since the hyperparameter q cannot be negative, using package rmutil [31], centered at the previous value of the parameter. Proportions 1 through 9 were sampled consecutively, followed by hyperparameter q. The trace plots quickly converged within 200 iterations, and the first 2000 samples were removed. Code is provided in S3 and S4 Files.
Stochastic differential equations
The hyperparameter q was estimated to be 0.982 for the undiagnosed population and 1.036 for the diagnosed population. These were used as a boundary to solve for the epidemiological parameters in a simple stochastic differential model.
where U is the undiagnosed and D is the diagnosed populations, and dω ∼ Nor(0, σ) is Brownian white noise with units . The variance σ is chosen to be 10% of the size of the population.The simplest model is constructed describing the dynamics of the infected subpopulations. The values of parameters transmission (τ), diagnosis (δ), and death (ϵ) are calculated using the constraining q:We consider the parameters pseudo-steady state, and use the 2005 population sizes to estimate them. In addition, we assume that the general population are at steady state, and consider only the increased death rate due to infection ϵ as 0.01 [4]. The diagnosis rate δ is estimated by:
and the transmission rate τ is estimated by:Due to the magnitude of the scale of this system we assume that all events will happen, and the source of the stochasticity is primarily reporting issues. Tau leap algorithm was used to preform the stochastic simulations. A time step of 1 year was selected, and the population at time t+1 is the numerical solution of the population at time t and random noise from a NOR(0,σ), where σ is 10% of the population at time t0 with units . The initial conditions for the infected populations were sampled from the posterior distributions obtained by the Bayesian estimates. Calculations were performed in Matlab [32] and code is available upon request.The diagnosis rate was calculated using data from [27]. The at-risk population is estimated as twice the national average rate of self-identified MSM among adults. The mortality rate increase due to HIV was estimated using data from [4]. All calculations, including the effective parameter rates, are provided in S1 File.
Results
Bayesian model
Bayesian estimates for the proportions of diagnosed or undiagnosed individuals was obtained concurrently with the estimated proportional change. The prior distribution was chosen to be a beta for the proportions and a gamma for the proportional changes. The likelihood function was a binomial, representing the chance of randomly selecting a diagnosed or undiagnosed individual from a pool of infected individuals. The posterior did not have a closed form. Due to the symmetry of the posterior samples we summarize them using their mean and variance. The posterior means of the proportions for both undiagnosed and diagnosed estimates are very close to the original data (Fig 1) [27]. The posterior mean of q is 0.982, and q is 1.036. This means that 98% of the undiagnosed population is preserved from year to year, or is dropping by about 2% per year. Similarly, the diagnosed population is increasing by 3.6% per year. Posterior means and variances are given in Table 1.
Fig 1
Posterior information obtained from hierarchical Bayesian statistics.
Bayesian estimates are shown as hollow squares with error bars showing standard deviations. Estimated proportion of diagnosed (pink) and undiagnosed (blue) populations recover the estimated proportions (circles) [27].
Table 1
Summary statistics of the posterior distribution.
The parameter p represents the estimated size of the proportion in that year. The hyperparameter q represents the estimated proportional change of the population across all years.
Diagnosed
Undiagnosed
Parameters
Mean
Variance
Parameters
Mean
Variance
p2005
0.78
0.0026
p2005
0.22
0.0026
p2006
0.79
0.0026
p2006
0.21
0.0026
p2007
0.80
0.0026
p2007
0.20
0.0027
p2008
0.81
0.0026
p2008
0.19
0.0027
p2009
0.81
0.0026
p2009
0.19
0.0026
p2010
0.82
0.0026
p2010
0.18
0.0026
p2011
0.82
0.0026
p2011
0.18
0.0026
p2012
0.83
0.0026
p2012
0.17
0.0026
p2013
0.84
0.0026
p2013
0.16
0.0026
qd
1.036
0.2355
qu
0.982
0.0939
Posterior information obtained from hierarchical Bayesian statistics.
Bayesian estimates are shown as hollow squares with error bars showing standard deviations. Estimated proportion of diagnosed (pink) and undiagnosed (blue) populations recover the estimated proportions (circles) [27].
Summary statistics of the posterior distribution.
The parameter p represents the estimated size of the proportion in that year. The hyperparameter q represents the estimated proportional change of the population across all years.
Stochastic differential model
The Bayesian estimates of the proportional change in the diagnosed and undiagnosed population from 2005 to 2013 were used to determine the epidemiological parameters for a system of stochastic differential equations. The parameters transmission (τ), diagnosis (δ), and death (ϵ) were calculated using the proportional changes in the respective population.
where U is the undiagnosed and D is the diagnosed populations, and dω ∼ Nor(0, σ) is the noise term. These base estimates fit the data very well (Fig 2).
Fig 2
Method validation.
The method was tested by simulating with the epidemiological parameters calculated using the Bayesian estimates of the proportional changes as constraints. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.
Method validation.
The method was tested by simulating with the epidemiological parameters calculated using the Bayesian estimates of the proportional changes as constraints. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.
Exhaustion of Susceptibles
In the case where the at-risk population is not much larger than the infected population, the transmission is dependent on the size of both populations. We estimate the at-risk population size as a fraction of the total infected population:Then this is substituted into the model. The transmission term becomesThis gives an effective increase of fτ in the transmission rate (Table 2). This increase causes the simulations to fail to recover the diagnosed and undiagnosed population dynamics, although the at-risk population does decrease significantly (Fig 3). This result is intuitive since the infection rate is increased, but the diagnosis rate is not representative of this rate.
Table 2
Transmission and diagnosis rates are different under the different hypotheses.
Average likelihood across both populations and all years (Fig 6, S2 File).
Model
Transmission Rate
Diagnosis Rate
Likelihood
Base model
τ(U + D)
δU
0.87
Exhaustion of Susceptibles (ES)
τf(U + D)2
δU
0.16
Anti-retroviral Therapies (ART)
τ(U + 0.04D)
δU
0.72
Lack of Access to Care (LAC)
τ(U + D)
δ0
0.87
ES and ART
τf(U + 0.04D)2
δU
0.52
ES and LAC
τf(U + D)2
δ0
0.54
ART and LAC
τ(U + 0.04D)
δ0
0.70
ES, LAC, and ART
τf(U + 0.04D)2
δ0
0.58
Fig 3
Exhaustion of susceptibles.
Transmission of the disease is altered to reflect the impact of the size of the at-risk population. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.
Exhaustion of susceptibles.
Transmission of the disease is altered to reflect the impact of the size of the at-risk population. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.
Transmission and diagnosis rates are different under the different hypotheses.
Average likelihood across both populations and all years (Fig 6, S2 File).
Fig 6
Model fit was quantified by calculating the relative likelihood of observing the data within the simulations.
A higher likelihood is represented by a hotter color. From left to right: Base model, Exhaustion of Susceptibles (ES), Lack of Access to Care (LAC), Anti-Retroviral Therapy usage (ART), ES and LAC, ES and ART, LAC and ART, and ES, LAC, and ART. Details provided in S2 File.
Lack of access to care
Lack of access to care may be conceptualized as pockets of undiagnosed individuals who are not being diagnosed. To capture this, we consider the diagnosis rate to be independent of the size of the undiagnosed population. The diagnosis rate is estimated as:The resulting equation for the undiagnosed subpopulation then becomes:
where δ0 is 0.036. This large reduction in the diagnosis rate recovers the population dynamics well (Fig 4).
Fig 4
Lack of access to care.
The effect of undiagnosed individuals lacking access to care affects the rate of diagnosis of the undiagnosed individuals. The diagnosis rate is held constant to reflect this scenario. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.
Lack of access to care.
The effect of undiagnosed individuals lacking access to care affects the rate of diagnosis of the undiagnosed individuals. The diagnosis rate is held constant to reflect this scenario. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.
ART usage
Since ART results in a viral load that has low chance of infecting a at-risk individual, we removed these individuals from the pool of infected individuals able to transmit the disease. Since 96% of diagnosed individuals reported taking anti-retroviral therapies in a previous study, the transmission term was modified as follows [10].Variable or poor adherence on the part of diagnosed individuals is ignored due to the body of literature indicating that large benefit is gained from even poor adherence [15, 16]. This gives good recovery of both subpopulation dynamics and agrees best with both undiagnosed and diagnosed estimates (Fig 5).
Fig 5
Anti-retroviral therapy usage.
To reflect the high levels of ART prescription and usage reported by diagnosed individuals this percentage is removed from the pool of diagnosed individuals able to transmit the disease. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.
Anti-retroviral therapy usage.
To reflect the high levels of ART prescription and usage reported by diagnosed individuals this percentage is removed from the pool of diagnosed individuals able to transmit the disease. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.
Multiple causes
Since it seems likely that most or all of these scenarios affect the infected population simultaneously, we analyze all their possible combinations (S1 Fig). The parameters were altered as described in Table 2. To determine the best cause, we quantify the goodness of fit by determining the relative likelihood of observing the data given the mean and variance of the stochastic simulations. These probabilities are given in Fig 6 with numerical details in S2 File, as well as the average probability over the 9 years.
Model fit was quantified by calculating the relative likelihood of observing the data within the simulations.
A higher likelihood is represented by a hotter color. From left to right: Base model, Exhaustion of Susceptibles (ES), Lack of Access to Care (LAC), Anti-Retroviral Therapy usage (ART), ES and LAC, ES and ART, LAC and ART, and ES, LAC, and ART. Details provided in S2 File.Lack of access to care, ART usage, or their combined resulted in the best recovery of the data for both the undiagnosed and diagnosed populations. Under the ART scenario, the diagnosed population has been reduced by 96%, resulting in a dynamic reduction in the transmission rate. We originally estimate the transmission rate to be 3.4% of the total infected population. This is close to the literature estimate of around 4-6% of infected individuals transmitting HIV to an at-risk individual [29, 33]. With the majority of the diagnosed population removed, the effective transmission rate is much lower. Under the LAC scenario, there is a constant diagnosis rate. This represents a yearly reduction in the undiagnosed population and increase in the diagnosed population of 3.6%. This means the diagnosed population grows faster than the undiagnosed population is reduced. Although lack of access to care in the undiagnosed population would mean the data are inaccurate, in our case the best fitting model is consistent for both subpopulations.
Discussion
We were able to obtain conservative estimates of the proportional changes in the diagnosed and undiagnosed HIV-infected populations using hierarchical Bayesian statistics. Our estimates suggest that the proportion of infected individuals who are undiagnosed is decreasing by approximately 2.8% each year from 2005 to 2013, while the proportion of diagnosed individuals is increasing by approximately 3.6%. We used the proportional change as constraints on a system of stochastic differential equations. This allowed us to estimate the transmission and diagnosis rates. We were able to recover reasonable parameter estimates and population dynamics using this methodology. To learn more about the cause of the decrease in the undiagnosed population, we considered some scenarios that would affect the epidemiological parameters: exhaustion of the at-risk population, lack of access to care, and reduction in viral load by anti-retroviral therapy.We were able to recover the diagnosed population dynamics when we altered the parameters to reflect these scenarios with the exception of including exhaustion of susceptibles. Modeling transmission as a function of the size of the at-risk population caused the size of the infected populations to increase rapidly. In the other scenarios some interesting dynamics could be observed in the undiagnosed population. Lack of access to care was simulated by considering diagnosis rate a constant unaffected by the size of the undiagnosed population. This resulted in an improvement in the likelihood of observing the data (Fig 6, S2 File). Anti-retroviral therapy usage also improved the overall recovery, but this effect was weaker for the undiagnosed population dynamics. Although the undiagnosed population size is dependent on the quality of the data available on the diagnosed population of that year, these results indicate that the scenarios that maximizes the probability of observing the diagnosed population also maximizes the probability of observing the diagnosed population estimates.The observed results suggest that lack of access to care and ART usage contribute to the infected population dynamics. This is not unexpected. Many individuals with HIV are reported to lack access to care [23, 34]. In areas with high poverty rates the death rate of infected individuals is much higher than that of the general population [4, 35]. In 2017 the New York Times reported groups of untreated individuals in the deep south dying due to their lack of access to care [23]. The southern US in particular suffers from high rates of STDs and lack of access to care that exacerbates the spread of HIV [36, 37]. In 2007 the number of uninsured individuals had risen by 60% since 2003, with those living in poverty over-represented [38]. Although the Affordable Care Act increased the number of individuals eligible for Medicaid, physician retention issues that continue to plague the majority of rural areas are exacerbated with increasing numbers of Medicaid patients [39, 40].Both models and studies have shown that providing ART to infected individuals in the early stages of HIV reduces transmission events and frequency of death due to AIDS [11, 14, 19–22]. Even poor adherence may be enough to control or eradicate the epidemic and increase quality of life for infected individuals [16-18]. Greater effort must be made to ensure these populations have access to life-saving treatments.
Calculations.
Calculations for stochastic modeling using data obtained from [4, 27].(XLSX)Click here for additional data file.
Model fit.
Details of the model fit probabilities for each year and simulation.(PDF)Click here for additional data file.
Matlab code.
Code used to run simulations and generate heatmap.(M)Click here for additional data file.
R Code—Diagnosed.
Code used to run Bayesian simulations.(R)Click here for additional data file.
R Code—Undiagnosed.
Code used to run Bayesian simulations.(R)Click here for additional data file.
Combined cases.
Stochastic simulations for the combined cases. (A) Exhaustion of Susceptibles and lack of access to care. (B) Exhaustion of Susceptibles and ART. (C) Lack of access to care and ART. (D) Exhaustion of Susceptibles, lack of access to care, and ART.(PDF)Click here for additional data file.
Authors: Julio S G Montaner; Robert Hogg; Evan Wood; Thomas Kerr; Mark Tyndall; Adrian R Levy; P Richard Harrigan Journal: Lancet Date: 2006-08-05 Impact factor: 79.321
Authors: Janni J Kinsler; Mitchell D Wong; Jennifer N Sayles; Cynthia Davis; William E Cunningham Journal: AIDS Patient Care STDS Date: 2007-08 Impact factor: 5.078
Authors: David R Bangsberg; Edward P Acosta; Reena Gupta; David Guzman; Elise D Riley; P Richard Harrigan; Neil Parkin; Steven G Deeks Journal: AIDS Date: 2006-01-09 Impact factor: 4.177
Authors: Erik M Volz; Edward Ionides; Ethan O Romero-Severson; Mary-Grace Brandt; Eve Mokotoff; James S Koopman Journal: PLoS Med Date: 2013-12-10 Impact factor: 11.069