Andrew S Azman, Justin Lessler, Francisco J Luquero, Taufiqur Rahman Bhuiyan, Ashraful Islam Khan, Fahima Chowdhury, Alamgir Kabir, Marc Gurwith, Ana A Weil, Jason B Harris, Stephen B Calderwood, Edward T Ryan, Firdausi Qadri, Daniel T Leung.
Abstract
The development of new approaches to cholera control relies on an accurate understanding of cholera epidemiology. However, most information on cholera incidence lacks laboratory confirmation and instead relies on surveillance systems reporting medically attended acute watery diarrhea. If recent infections could be identified using serological markers, cross-sectional serosurveys would offer an alternative approach to measuring incidence. Here, we used 1569 serologic samples from a cohort of cholera cases and their uninfected contacts in Bangladesh to train machine learning models to identify recent Vibrio cholerae O1 infections. We found that an individual's antibody profile contains information on the timing of V. cholerae O1 infections in the previous year. Our models using six serological markers accurately identified individuals in the Bangladesh cohort infected within the last year [cross-validated area under the curve (AUC), 93.4%; 95% confidence interval (CI), 92.1 to 94.7%], with a marginal performance decrease using models based on two markers (cross-validated AUC, 91.0%; 95% CI, 89.2 to 92.7%). We validated the performance of the two-marker model on data from a cohort of North American volunteers challenged with V. cholerae O1 (AUC range, 88.4 to 98.4%). In simulated serosurveys, our models accurately estimated annual incidence in both endemic and epidemic settings, even with sample sizes as small as 500 and annual incidence as low as two infections per 1000 individuals. Cross-sectional serosurveys may be a viable approach to estimating cholera incidence.
Year: 2019 PMID: 30787170 PMCID: PMC6430585 DOI: 10.1126/scitranslmed.aau6242
Source DB: PubMed Journal: Sci Transl Med ISSN: 1946-6234 Impact factor: 17.956
Overview of participants in the Dhaka, Bangladesh cohort. IQR, interquartile range.
| | Cholera cases | Household contacts |
|---|---|---|
| Number of participants | 320 | 58 |
| Median age of participants (IQR) | 25 (8–35) | 26 (18–34) |
| Male (%) | 63.1 | 39.7 |
| O blood group (%) | 44.1 | 25.9 |
| | 89.7 | - |
| Severely dehydrated at admission (%) | 51.9 | - |
Fig. 1. Overview of post-infection titer trajectories from confirmed cholera cases in the Bangladesh cohort. (A to H) Each panel shows the titer for a different antibody as a function of the number of days from (self-reported) symptom onset. The y axes are varied to aid visualization. Panels A and B show titers, whereas panels C to H are shown in ELISA units.
Fig. 2. Distribution of vibriocidal antibody titers by study visit day in the Bangladesh cohort. Data from confirmed cholera cases are shown in orange and household contacts are shown in green. The dashed line represents the “baseline” titer distribution, a combined density of contacts across all visits and cases at the first enrollment visit. Individual observations are shown as ticks along the x axes (top and bottom). Two-sided Kolmogorov-Smirnov tests were used to assess the similarity between the distributions of titers at enrollment (day 2) for cases and contacts and found no significant differences for Ogawa (P = 0.4) or Inaba (P = 0.1).
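The two-sample comparison described in this caption can be illustrated with a minimal sketch of a two-sided Kolmogorov-Smirnov test. The function name `ks_two_sample` and the titer values below are hypothetical, and the P value uses the asymptotic Kolmogorov distribution, which is only a rough approximation for small or heavily tied samples such as discrete titers. It is not necessarily the procedure the authors ran.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Return (D, approximate two-sided P value) for two samples."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    # D is the largest gap between the two empirical CDFs,
    # checked at every observed value.
    d = max(
        abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
        for v in xs + ys
    )
    # Asymptotic P value: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2).
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # Q(lam) is within about 1e-5 of 1 here and the series converges slowly.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Hypothetical enrollment titers, for illustration only.
case_titers = [10, 20, 20, 40, 80, 160, 320]
contact_titers = [10, 10, 20, 40, 40, 80, 160]
d_stat, p_value = ks_two_sample(case_titers, contact_titers)
print(f"D = {d_stat:.3f}, approximate P = {p_value:.3f}")
```

A large P value, as reported in the caption, means the test found no evidence that the two enrollment distributions differ; it does not show that they are identical.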
Fig. 3. cvAUC for each marker for different infection time windows. Error bars represent the 95% CIs. The marker labeled “Vibriocidals” represents using the maximum of each person’s Ogawa and Inaba vibriocidal titers. Note that data on anti-LPS IgM (brown) and anti-CTB IgM (gray) were only available on a subset (n = 202) of participants.
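As a rough illustration of how a single marker can be scored, the sketch below computes a plain AUC as the probability that a recently infected person has a higher marker value than a person not recently infected, with ties counted as one half. It also builds the combined “Vibriocidals” marker as the per-person maximum of the Ogawa and Inaba titers. The function names and data are hypothetical, and this is the ordinary AUC rather than the cross-validated procedure behind the figure.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Each positive/negative pair scores 1 for a win and 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def vibriocidal_max(ogawa, inaba):
    """Combined marker: the larger of each person's two vibriocidal titers."""
    return [max(o, i) for o, i in zip(ogawa, inaba)]


# Hypothetical titers and infection status within some time window.
ogawa = [10, 640, 20, 1280, 40, 320]
inaba = [20, 320, 10, 640, 80, 160]
recently_infected = [0, 1, 0, 1, 0, 1]

combined = vibriocidal_max(ogawa, inaba)
print("AUC of combined vibriocidal marker:", auc(combined, recently_infected))
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohort-sized data; a rank-based formula gives the same value more efficiently on large datasets.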
Fig. 4. cvAUCs and variable importance from random forest models by infection time window. Blue curves represent individual cross-validated receiver operating characteristic curves from 20-fold cross-validation of the random forest model over different infection time windows (A to D). Insets for each panel show the distribution of relative importance of each variable (median across cross-validation folds), with larger values representing parameters with more influence in the final model prediction as assessed through a permutation test procedure.
cvAUCs from random forest models fit to Bangladesh data by infection time window. The full model included all markers and demographics (those shown in the Fig. 4 panels). The two-marker model used only the top two markers from the full random forest model for each window, and the enzyme-linked immunosorbent assay (ELISA)–only model used anti-CTB and anti-LPS IgA and IgG titers. Vibriocidal Ogawa titers were used in all two-marker models; the 10-day model also used anti-CTB IgA, and the others used anti-CTB IgG. Estimates of performance for models fit to the subset of data that have IgM measurements are included in table S4. The 95% CIs are shown in parentheses.
| | Infection time window | | | | |
|---|---|---|---|---|---|
| Model | 10 days | 45 days | 100 days | 200 days | 365 days |
| Full model | 94.5 (93.1–95.9) | 97.1 (96.2–97.9) | 95.0 (94.1–96.0) | 93.6 (92.4–94.8) | 93.4 (92.1–94.7) |
| Two markers | 91.3 (89.0–93.7) | 94.3 (92.9–95.7) | 93.5 (92.2–94.8) | 91.6 (90.1–93.1) | 91.0 (89.2–92.7) |
| ELISA only | 90.1 (88.0–92.2) | 93.6 (92.3–95.0) | 91.9 (90.6–93.2) | 89.8 (88.3–91.3) | 87.0 (85.2–88.9) |
Fig. 5. Receiver operating characteristic curves for the external validation dataset of North American volunteers challenged with V. cholerae O1. Two-marker (vibriocidal and anti-CTB IgG) models were used for this validation because other antibody measures were not available in this cohort. Three curves are plotted, each using a different infection time window.
Fig. 6. Performance of random forest models and corrected vibriocidal test in estimating the infection attack rate in simulated post-epidemic serosurveys. (A) Simulated epidemics had the same shape as that observed in an internally displaced person camp in South Sudan. The timing of the simulated serosurveys is shown as a vertical orange bar. (B) Infection attack rate estimates from the random forest model for different assumed case-to-infection ratios and a serosurvey sample size of 500 individuals. The dashed line represents the true simulated incidence, and numbers such as 0.5:1 represent the simulated infection-to-case ratio. For example, 4:1 represents simulations that followed an epidemic curve with the same shape as that shown in (A) but with four times more infections than reported suspected cases. (C) Infection attack rate estimates from using a vibriocidal threshold of 320 but corrected for the estimated sensitivity and specificity over a 200-day infection window. The boxplots in (B) and (C) represent the median and IQR of the estimated attack rate, with the lines extending from each box representing 1.5 times the 25th or 75th percentile.
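Panel (C) refers to correcting a threshold-based estimate for test sensitivity and specificity. The caption does not say which estimator was used; one standard choice, assumed here, is the Rogan-Gladen correction, which rescales the apparent seropositive fraction. The function name and the sensitivity, specificity, and count values below are hypothetical and are not taken from the study.

```python
def corrected_attack_rate(n_positive, n_sampled, sensitivity, specificity):
    """Rogan-Gladen estimate of the true infection attack rate.

    n_positive  -- number of sampled people above the titer threshold
    n_sampled   -- serosurvey sample size
    sensitivity -- P(above threshold | recently infected)
    specificity -- P(at or below threshold | not recently infected)
    """
    if n_sampled <= 0:
        raise ValueError("n_sampled must be positive")
    if sensitivity + specificity <= 1.0:
        raise ValueError("test must be better than chance")
    apparent = n_positive / n_sampled
    adjusted = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(max(adjusted, 0.0), 1.0)


# Hypothetical serosurvey: 100 of 500 people at or above the titer threshold,
# with assumed (not study-derived) sensitivity 0.80 and specificity 0.90.
estimate = corrected_attack_rate(100, 500, sensitivity=0.80, specificity=0.90)
print(f"Corrected attack rate: {estimate:.3f}")
```

Without the correction, false positives inflate the estimate when true incidence is low and false negatives deflate it when incidence is high, which is why the raw fraction above a threshold is not used directly.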