Literature DB >> 30044818

Estimating epidemiological parameters of a stochastic differential model of HIV dynamics using hierarchical Bayesian statistics.

Abstract

Current estimates of the HIV epidemic indicate a decrease in the incidence of the disease in the undiagnosed subpopulation over the past 10 years. However, a lack of access to care has not been considered when modeling the population. Populations at high risk for contracting HIV are twice as likely to lack access to reliable medical care. In this paper, we consider three contributors to the HIV population dynamics: at-risk population exhaustion, lack of access to care, and usage of anti-retroviral therapy (ART) by diagnosed individuals. An extant problem in the mathematical study of this system is deriving parameter estimates due to a portion of the population being unobserved. We approach this problem by looking at the proportional change in the infected subpopulations. We obtain conservative estimates for the proportional change of the infected subpopulations using hierarchical Bayesian statistics. The estimated proportional change is used to derive epidemic parameter estimates for a system of stochastic differential equations (SDEs). Model fit is quantified to determine the best parametric explanation for the observed dynamics in the infected subpopulations. Parameter estimates derived using these methods produce simulations that closely follow the dynamics observed in the data, as well as values that are generally in agreement with prior understanding of transmission and diagnosis rates. Simulations suggest that the undiagnosed population may be larger than currently estimated without significantly affecting the population dynamics.

Entities: Chemical Disease Species

Mesh：

Substances：
Anti-HIV Agents

Year: 2018 PMID： 30044818 PMCID： PMC6059410 DOI： 10.1371/journal.pone.0200126

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

The human immunodeficiency virus (HIV) progresses in three stages. The first stage lasts approximately 3 months and individuals in this stage are approximately 10 to 25 times more effective at transmitting the disease [1-3]. The chronic stage can last from 5-10 years without medication [4]. This is followed by acquired immunodeficiency syndrome (AIDS) and death shortly thereafter [1-3]. Individuals with HIV may go many years without diagnosis, during which time they may expose uninfected individuals to HIV. Efforts to improve the diagnosis rate include educational programs, as an individual’s perceived risk was shown to be highly correlated with the individual obtaining multiple HIV tests [5-8]. Several studies have found a 50% reduction in risky behaviors after diagnosis, including safer sex practices, reduction in partner number, and medications that reduce viral load [9, 10]. However, diagnosis events resulting in behavior modification are not thought to be sufficient to eradicate the epidemic [9, 11]. After diagnosis, infected individuals have the opportunity to take anti-retroviral therapy (ART) that reduces their viral load and retards the progression of the disease, providing an improvement in life expectancy [12, 13]. The earlier that ART is received the higher the reduction in transmission events, particularly if obtained during the initial stage of HIV [1-3]. ART therapies could eradicate the epidemic in a population with high prevalence of infection even without the additional effect of behavioral changes [11]. Mathematical models estimate that the HIV epidemic could be reduced to less than 1% of the population infected (elimination phase) with universal testing and by providing ART consistently to newly diagnosed individuals [14]. However, issues with adherence and resistance are well documented in the literature [15-20]. Patients tend to report their adherence as much higher than it actually is, but studies indicate that even low adherence may be sufficient for control of the epidemic [10, 15]. Transmission is rare for individuals on ART, even with relatively high plasma HIV concentrations [21, 22]. The largest barrier to eradication of the epidemic is lack of access to care, including diagnostic services and ART costs or prescriptions. A lack in access to care could create pockets of undiagnosed individuals while the overall trend appears to be a reduction in the size of the epidemic [23]. Various studies report between 50–96% of diagnosed individuals in the U.S. rely on public medical care to obtain their ART medications [10, 20, 24–26]. Access to care remains critical, but this has not been considered when modeling the dynamics of the epidemic [5, 6]. Estimates using CD4 levels of newly diagnosed individuals suggest that the undiagnosed population is decreasing between 2005–2013 in the US [4, 27]. CD4 levels can be used to estimate the progression of HIV [28]. We consider three possible causes for this decrease including exhaustion of the at-risk population. The size of the at-risk population is not easy to estimate since it depends on behavior. High risk populations include individuals in poverty and men who have sex with men (MSM) [4]. This is particularly critical in the southern U.S., where individuals tend to be poor and lack access to medical care [4, 23]. As the at-risk population decreases the number of diagnoses will also decrease, which will cause the estimated number of undiagnosed individuals to decrease. An additional possibility is that the reduction in number of diagnoses is due to individuals lacking access to care. HIV is over-represented in impoverished populations where access to diagnosis and treatment may be more difficult to obtain. In this case, the number of newly diagnosed individuals is not representative of the number of undiagnosed individuals, and the estimates will be inaccurate. Finally, the usage of anti-retroviral therapies reduces the viral load and transmission potential of infected individuals. The difficulty in studying this system mathematically lies in parameter estimates. A minimal model of this system requires at least three parameters: transmission of the disease, diagnosis of the disease, and death due to the disease. Since knowledge about the undiagnosed population is restricted to those who have been diagnosed, estimates of these parameters are generally forced to assume that this population is representative of the whole. In this work we use coupled statistical and mathematical methodology to study the relationships between the three hypothesized causes and their respective population dynamics. We use hierarchical Bayesian statistics to get estimates for the size of the infected populations and their proportional changes across the years. These estimates are used to calculate epidemiological parameters for a system of stochastic differential equations. Currently we are not aware of any similar methods in the literature. Such a problem is challenging as the proportional change across the populations is a hyperparameter controlling the yearly proportions, which each have their own statistical model. This results in a large model that must be studied numerically. The resulting simulations give insight into the implications of the estimated undiagnosed population on epidemiological parameters. Our model suggests that the undiagnosed population may be larger than current estimates while recovering population dynamics. The best recovery of the dynamics observed in the existing data occurs when the increase in the diagnosed population due to diagnosis is greater than the decrease in the undiagnosed population. We hope this study will help inform future efforts to improve the situation of infected individuals and prevent future outbreaks.

Materials and methods

Bayesian statistics

A Markov model where p centered at qp is used to estimate the proportional change in the infected populations over time, where p is the proportion in the current year and q is the proportional change. These random variables are estimated using Bayesian statistics. The sampling model is x ∼ Bin(n, p), where n is population size in the current year. The random variable q is taken as a hyperparameter for p. The random variables to be estimated for each infected subpopulation are q and p, where t = 2005,…,2013. We estimate the random variables of undiagnosed and diagnosed subpopulations independently.

Prior

The prior for the proportional change q is a gamma distribution. The parameters were chosen so that the prior distribution was centered at the arithmetic estimates of q obtained from the CDC [4]. The arithmetic estimates were obtained by calculating: The arithmetic estimate for the undiagnosed q (q) is 0.979, and for the diagnosed q (q) 1.025. The priors used were GAM(9.79,10) and GAM(10.25,10) so they were centered at 1. The prior for the random variable p, the undiagnosed proportion, is a beta distribution centered at the previous proportion times q. The parameters of the beta distribution are α = 0.1n × qp, and β = 0.1n − α. In the case where t = 1, the previous undiagnosed proportion is taken to be the expert opinion of 20%, and the diagnosed proportion to be 1−p0(undiagnosed) [29]. The prior for the diagnosed population is formulated in the same way. Population sizes were considered in units of thousands.

Likelihood

The likelihood is a binomial likelihood, representing the chance of selecting an undiagnosed or diagnosed individual at random from the total infected population. For a given year t, the proportion of undiagnosed individuals depends on the total number of individuals: where x is the total number of undiagnosed or diagnosed individuals, and n is the total number of infected individuals. The likelihood across all the years is the product of each year’s likelihood. The likelihood for the diagnosed population was formulated in the same way.

Posterior

The joint posterior distribution is proportional to the priors multiplied by the likelihoods for all 9 years: The posterior full conditional of p for t = 2005,…,2012 is: The posterior full conditional of 2013, the 9th year, is: The full conditional of q does not have a closed form. The forms of the diagnosed random variables are the same. Random variable estimates were obtained using Metropolis-Hastings nested within a Gibbs sampler over 100,000 iterations with R version 3.3.3 [30]. The proposal distribution was a truncated normal distribution since the hyperparameter q cannot be negative, using package rmutil [31], centered at the previous value of the parameter. Proportions 1 through 9 were sampled consecutively, followed by hyperparameter q. The trace plots quickly converged within 200 iterations, and the first 2000 samples were removed. Code is provided in S3 and S4 Files.

Stochastic differential equations

The hyperparameter q was estimated to be 0.982 for the undiagnosed population and 1.036 for the diagnosed population. These were used as a boundary to solve for the epidemiological parameters in a simple stochastic differential model. where U is the undiagnosed and D is the diagnosed populations, and dω ∼ Nor(0, σ) is Brownian white noise with units . The variance σ is chosen to be 10% of the size of the population. The simplest model is constructed describing the dynamics of the infected subpopulations. The values of parameters transmission (τ), diagnosis (δ), and death (ϵ) are calculated using the constraining q: We consider the parameters pseudo-steady state, and use the 2005 population sizes to estimate them. In addition, we assume that the general population are at steady state, and consider only the increased death rate due to infection ϵ as 0.01 [4]. The diagnosis rate δ is estimated by: and the transmission rate τ is estimated by: Due to the magnitude of the scale of this system we assume that all events will happen, and the source of the stochasticity is primarily reporting issues. Tau leap algorithm was used to preform the stochastic simulations. A time step of 1 year was selected, and the population at time t+1 is the numerical solution of the population at time t and random noise from a NOR(0,σ), where σ is 10% of the population at time t0 with units . The initial conditions for the infected populations were sampled from the posterior distributions obtained by the Bayesian estimates. Calculations were performed in Matlab [32] and code is available upon request. The diagnosis rate was calculated using data from [27]. The at-risk population is estimated as twice the national average rate of self-identified MSM among adults. The mortality rate increase due to HIV was estimated using data from [4]. All calculations, including the effective parameter rates, are provided in S1 File.

Results

Bayesian model

Bayesian estimates for the proportions of diagnosed or undiagnosed individuals was obtained concurrently with the estimated proportional change. The prior distribution was chosen to be a beta for the proportions and a gamma for the proportional changes. The likelihood function was a binomial, representing the chance of randomly selecting a diagnosed or undiagnosed individual from a pool of infected individuals. The posterior did not have a closed form. Due to the symmetry of the posterior samples we summarize them using their mean and variance. The posterior means of the proportions for both undiagnosed and diagnosed estimates are very close to the original data (Fig 1) [27]. The posterior mean of q is 0.982, and q is 1.036. This means that 98% of the undiagnosed population is preserved from year to year, or is dropping by about 2% per year. Similarly, the diagnosed population is increasing by 3.6% per year. Posterior means and variances are given in Table 1.

Fig 1

Posterior information obtained from hierarchical Bayesian statistics.

Bayesian estimates are shown as hollow squares with error bars showing standard deviations. Estimated proportion of diagnosed (pink) and undiagnosed (blue) populations recover the estimated proportions (circles) [27].

Table 1

Summary statistics of the posterior distribution.

The parameter p represents the estimated size of the proportion in that year. The hyperparameter q represents the estimated proportional change of the population across all years.

Diagnosed			Undiagnosed
Parameters	Mean	Variance	Parameters	Mean	Variance
p₂₀₀₅	0.78	0.0026	p₂₀₀₅	0.22	0.0026
p₂₀₀₆	0.79	0.0026	p₂₀₀₆	0.21	0.0026
p₂₀₀₇	0.80	0.0026	p₂₀₀₇	0.20	0.0027
p₂₀₀₈	0.81	0.0026	p₂₀₀₈	0.19	0.0027
p₂₀₀₉	0.81	0.0026	p₂₀₀₉	0.19	0.0026
p₂₀₁₀	0.82	0.0026	p₂₀₁₀	0.18	0.0026
p₂₀₁₁	0.82	0.0026	p₂₀₁₁	0.18	0.0026
p₂₀₁₂	0.83	0.0026	p₂₀₁₂	0.17	0.0026
p₂₀₁₃	0.84	0.0026	p₂₀₁₃	0.16	0.0026
q_d	1.036	0.2355	q_u	0.982	0.0939

Posterior information obtained from hierarchical Bayesian statistics.

Summary statistics of the posterior distribution.

The parameter p represents the estimated size of the proportion in that year. The hyperparameter q represents the estimated proportional change of the population across all years.

Stochastic differential model

The Bayesian estimates of the proportional change in the diagnosed and undiagnosed population from 2005 to 2013 were used to determine the epidemiological parameters for a system of stochastic differential equations. The parameters transmission (τ), diagnosis (δ), and death (ϵ) were calculated using the proportional changes in the respective population. where U is the undiagnosed and D is the diagnosed populations, and dω ∼ Nor(0, σ) is the noise term. These base estimates fit the data very well (Fig 2).

Fig 2

Method validation.

The method was tested by simulating with the epidemiological parameters calculated using the Bayesian estimates of the proportional changes as constraints. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.

Method validation.

Exhaustion of Susceptibles

In the case where the at-risk population is not much larger than the infected population, the transmission is dependent on the size of both populations. We estimate the at-risk population size as a fraction of the total infected population: Then this is substituted into the model. The transmission term becomes This gives an effective increase of fτ in the transmission rate (Table 2). This increase causes the simulations to fail to recover the diagnosed and undiagnosed population dynamics, although the at-risk population does decrease significantly (Fig 3). This result is intuitive since the infection rate is increased, but the diagnosis rate is not representative of this rate.

Table 2

Transmission and diagnosis rates are different under the different hypotheses.

Average likelihood across both populations and all years (Fig 6, S2 File).

Model	Transmission Rate	Diagnosis Rate	Likelihood
Base model	τ(U + D)	δU	0.87
Exhaustion of Susceptibles (ES)	τf(U + D)²	δU	0.16
Anti-retroviral Therapies (ART)	τ(U + 0.04D)	δU	0.72
Lack of Access to Care (LAC)	τ(U + D)	δ₀	0.87
ES and ART	τf(U + 0.04D)²	δU	0.52
ES and LAC	τf(U + D)²	δ₀	0.54
ART and LAC	τ(U + 0.04D)	δ₀	0.70
ES, LAC, and ART	τf(U + 0.04D)²	δ₀	0.58

Fig 3

Exhaustion of susceptibles.

Transmission of the disease is altered to reflect the impact of the size of the at-risk population. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.

Exhaustion of susceptibles.

Transmission and diagnosis rates are different under the different hypotheses.

Average likelihood across both populations and all years (Fig 6, S2 File).

Fig 6

Model fit was quantified by calculating the relative likelihood of observing the data within the simulations.

Lack of access to care

Lack of access to care may be conceptualized as pockets of undiagnosed individuals who are not being diagnosed. To capture this, we consider the diagnosis rate to be independent of the size of the undiagnosed population. The diagnosis rate is estimated as: The resulting equation for the undiagnosed subpopulation then becomes: where δ0 is 0.036. This large reduction in the diagnosis rate recovers the population dynamics well (Fig 4).

Fig 4

Lack of access to care.

The effect of undiagnosed individuals lacking access to care affects the rate of diagnosis of the undiagnosed individuals. The diagnosis rate is held constant to reflect this scenario. The mean of 100 stochastic simulations (pink line) is compared with the data (circles). Proportions are relative to initial proportion.

Lack of access to care.

ART usage

Since ART results in a viral load that has low chance of infecting a at-risk individual, we removed these individuals from the pool of infected individuals able to transmit the disease. Since 96% of diagnosed individuals reported taking anti-retroviral therapies in a previous study, the transmission term was modified as follows [10]. Variable or poor adherence on the part of diagnosed individuals is ignored due to the body of literature indicating that large benefit is gained from even poor adherence [15, 16]. This gives good recovery of both subpopulation dynamics and agrees best with both undiagnosed and diagnosed estimates (Fig 5).

Fig 5

Anti-retroviral therapy usage.

To reflect the high levels of ART prescription and usage reported by diagnosed individuals this percentage is removed from the pool of diagnosed individuals able to transmit the disease. The mean of 100 stochastic simulations (pink line) is compared to the data (circles). Proportions are relative to initial proportion.

Anti-retroviral therapy usage.

Multiple causes

Since it seems likely that most or all of these scenarios affect the infected population simultaneously, we analyze all their possible combinations (S1 Fig). The parameters were altered as described in Table 2. To determine the best cause, we quantify the goodness of fit by determining the relative likelihood of observing the data given the mean and variance of the stochastic simulations. These probabilities are given in Fig 6 with numerical details in S2 File, as well as the average probability over the 9 years.

Model fit was quantified by calculating the relative likelihood of observing the data within the simulations.

A higher likelihood is represented by a hotter color. From left to right: Base model, Exhaustion of Susceptibles (ES), Lack of Access to Care (LAC), Anti-Retroviral Therapy usage (ART), ES and LAC, ES and ART, LAC and ART, and ES, LAC, and ART. Details provided in S2 File. Lack of access to care, ART usage, or their combined resulted in the best recovery of the data for both the undiagnosed and diagnosed populations. Under the ART scenario, the diagnosed population has been reduced by 96%, resulting in a dynamic reduction in the transmission rate. We originally estimate the transmission rate to be 3.4% of the total infected population. This is close to the literature estimate of around 4-6% of infected individuals transmitting HIV to an at-risk individual [29, 33]. With the majority of the diagnosed population removed, the effective transmission rate is much lower. Under the LAC scenario, there is a constant diagnosis rate. This represents a yearly reduction in the undiagnosed population and increase in the diagnosed population of 3.6%. This means the diagnosed population grows faster than the undiagnosed population is reduced. Although lack of access to care in the undiagnosed population would mean the data are inaccurate, in our case the best fitting model is consistent for both subpopulations.

Discussion

We were able to obtain conservative estimates of the proportional changes in the diagnosed and undiagnosed HIV-infected populations using hierarchical Bayesian statistics. Our estimates suggest that the proportion of infected individuals who are undiagnosed is decreasing by approximately 2.8% each year from 2005 to 2013, while the proportion of diagnosed individuals is increasing by approximately 3.6%. We used the proportional change as constraints on a system of stochastic differential equations. This allowed us to estimate the transmission and diagnosis rates. We were able to recover reasonable parameter estimates and population dynamics using this methodology. To learn more about the cause of the decrease in the undiagnosed population, we considered some scenarios that would affect the epidemiological parameters: exhaustion of the at-risk population, lack of access to care, and reduction in viral load by anti-retroviral therapy. We were able to recover the diagnosed population dynamics when we altered the parameters to reflect these scenarios with the exception of including exhaustion of susceptibles. Modeling transmission as a function of the size of the at-risk population caused the size of the infected populations to increase rapidly. In the other scenarios some interesting dynamics could be observed in the undiagnosed population. Lack of access to care was simulated by considering diagnosis rate a constant unaffected by the size of the undiagnosed population. This resulted in an improvement in the likelihood of observing the data (Fig 6, S2 File). Anti-retroviral therapy usage also improved the overall recovery, but this effect was weaker for the undiagnosed population dynamics. Although the undiagnosed population size is dependent on the quality of the data available on the diagnosed population of that year, these results indicate that the scenarios that maximizes the probability of observing the diagnosed population also maximizes the probability of observing the diagnosed population estimates. The observed results suggest that lack of access to care and ART usage contribute to the infected population dynamics. This is not unexpected. Many individuals with HIV are reported to lack access to care [23, 34]. In areas with high poverty rates the death rate of infected individuals is much higher than that of the general population [4, 35]. In 2017 the New York Times reported groups of untreated individuals in the deep south dying due to their lack of access to care [23]. The southern US in particular suffers from high rates of STDs and lack of access to care that exacerbates the spread of HIV [36, 37]. In 2007 the number of uninsured individuals had risen by 60% since 2003, with those living in poverty over-represented [38]. Although the Affordable Care Act increased the number of individuals eligible for Medicaid, physician retention issues that continue to plague the majority of rural areas are exacerbated with increasing numbers of Medicaid patients [39, 40]. Both models and studies have shown that providing ART to infected individuals in the early stages of HIV reduces transmission events and frequency of death due to AIDS [11, 14, 19–22]. Even poor adherence may be enough to control or eradicate the epidemic and increase quality of life for infected individuals [16-18]. Greater effort must be made to ensure these populations have access to life-saving treatments.

Calculations.

Calculations for stochastic modeling using data obtained from [4, 27]. (XLSX) Click here for additional data file.

Model fit.

Details of the model fit probabilities for each year and simulation. (PDF) Click here for additional data file.

Matlab code.

Code used to run simulations and generate heatmap. (M) Click here for additional data file.

R Code—Diagnosed.

Code used to run Bayesian simulations. (R) Click here for additional data file.

R Code—Undiagnosed.

Code used to run Bayesian simulations. (R) Click here for additional data file.

Combined cases.

Stochastic simulations for the combined cases. (A) Exhaustion of Susceptibles and lack of access to care. (B) Exhaustion of Susceptibles and ART. (C) Lack of access to care and ART. (D) Exhaustion of Susceptibles, lack of access to care, and ART. (PDF) Click here for additional data file.

32 in total

Review 1. Could widespread use of combination antiretroviral therapy eradicate HIV epidemics?

Authors: J X Velasco-Hernandez; H B Gershengorn; S M Blower
Journal: Lancet Infect Dis Date: 2002-08 Impact factor: 25.071

Review 2. The case for expanding access to highly active antiretroviral therapy to curb the growth of the HIV epidemic.

Authors: Julio S G Montaner; Robert Hogg; Evan Wood; Thomas Kerr; Mark Tyndall; Adrian R Levy; P Richard Harrigan
Journal: Lancet Date: 2006-08-05 Impact factor: 79.321

3. Projected life expectancy of people with HIV according to timing of diagnosis.

Authors: Fumiyo Nakagawa; Rebecca K Lodwick; Colette J Smith; Ruth Smith; Valentina Cambiano; Jens D Lundgren; Valerie Delpech; Andrew N Phillips
Journal: AIDS Date: 2012-01-28 Impact factor: 4.177

4. Gender differences in behavioural and psychosocial predictors of HIV testing and return for test results in a high-risk population.

Authors: J A Stein; A Nyamathi
Journal: AIDS Care Date: 2000-06

Review 5. Addressing the challenges of adherence.

Authors: J A Bartlett
Journal: J Acquir Immune Defic Syndr Date: 2002-02-01 Impact factor: 3.731

6. The effect of perceived stigma from a health care provider on access to care among a low-income HIV-positive population.

Authors: Janni J Kinsler; Mitchell D Wong; Jennifer N Sayles; Cynthia Davis; William E Cunningham
Journal: AIDS Patient Care STDS Date: 2007-08 Impact factor: 5.078

7. Using CD4 Data to Estimate HIV Incidence, Prevalence, and Percent of Undiagnosed Infections in the United States.

Authors: Ruiguang Song; H Irene Hall; Timothy A Green; Célia L Szwarcwald; Nikos Pantazis
Journal: J Acquir Immune Defic Syndr Date: 2017-01-01 Impact factor: 3.731

8. How many are underinsured? Trends among U.S. adults, 2003 and 2007.

Authors: Cathy Schoen; Sara R Collins; Jennifer L Kriss; Michelle M Doty
Journal: Health Aff (Millwood) Date: 2008-06-10 Impact factor: 6.301

9. Adherence-resistance relationships for protease and non-nucleoside reverse transcriptase inhibitors explained by virological fitness.

Authors: David R Bangsberg; Edward P Acosta; Reena Gupta; David Guzman; Elise D Riley; P Richard Harrigan; Neil Parkin; Steven G Deeks
Journal: AIDS Date: 2006-01-09 Impact factor: 4.177

10. HIV-1 transmission during early infection in men who have sex with men: a phylodynamic analysis.

Authors: Erik M Volz; Edward Ionides; Ethan O Romero-Severson; Mary-Grace Brandt; Eve Mokotoff; James S Koopman
Journal: PLoS Med Date: 2013-12-10 Impact factor: 11.069