Quirine Ten Bosch1,2, Voahangy Andrianaivoarimanana3, Beza Ramasindrazana3, Guillain Mikaty4, Rado J L Rakotonanahary3, Birgit Nikolay1, Soloandry Rahajandraibe3, Maxence Feher4, Quentin Grassin4, Juliette Paireau1, Soanandrasana Rahelinirina3, Rindra Randremanana5, Feno Rakotoarimanana5, Marie Melocco5, Voahangy Rasolofo6, Javier Pizarro-Cerdá7,8,9, Anne-Sophie Le Guern7,8,9, Eric Bertherat10, Maherisoa Ratsitorahina6,11, André Spiegel6, Laurence Baril5, Minoarisoa Rajerison3, Simon Cauchemez1. 1. Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Université Paris Cité, CNRS UMR2000, F-75015 Paris, France. 2. Quantitative Veterinary Epidemiology, Department of Animal Sciences, Wageningen University and Research, Wageningen, the Netherlands. 3. Plague Unit, Institut Pasteur de Madagascar, Antananarivo, Madagascar. 4. Environment and Infectious Risks Research Unit, Laboratory for Urgent Response to Biological Threats (ERI-CIBU), Institut Pasteur, Paris, France. 5. Epidemiology and Clinical Research Unit, Institut Pasteur de Madagascar, Antananarivo, Madagascar. 6. Direction, Institut Pasteur de Madagascar, Antananarivo, Madagascar. 7. Yersinia Research Unit, Institut Pasteur, Université Paris Cité, CNRS UMR 6047, F-75015 Paris, France. 8. National Reference Laboratory for Plague and other Yersiniosis, Institut Pasteur, F-75015 Paris, France. 9. World Health Organization Collaborating Center for Plague FRA-140, Institut Pasteur, F-75015 Paris, France. 10. World Health Organization, Health Emergency Programme, Department of Infectious Hazard Management, Geneva, Switzerland. 11. Directorate of Health and Epidemiological Surveillance, Ministry of Public Health, Antananarivo, Madagascar.
Abstract
During outbreaks, the lack of a diagnostic "gold standard" can mask the true burden of infection in the population and hamper the allocation of resources required for control. Here, we present an analytical framework to evaluate and optimize the use of diagnostics when multiple yet imperfect diagnostic tests are available. We apply it to laboratory results of 2,136 samples, analyzed with 3 diagnostic tests (based on up to 7 diagnostic outcomes), collected during the 2017 pneumonic (PP) and bubonic plague (BP) outbreak in Madagascar, which was unprecedented in its number of notified cases, clinical presentation, and spatial distribution. The extent of these outbreaks has, however, remained unclear due to nonoptimal assays. Using latent class methods, we estimate that 7% to 15% of notified cases were Yersinia pestis-infected. Overreporting was highest during the peak of the outbreak and lowest in the rural settings endemic to Y. pestis. Molecular biology methods offered the best compromise between sensitivity and specificity. The specificity of the rapid diagnostic test was relatively low (PP: 82%, BP: 85%), a limitation in contexts with large numbers of misclassified cases. Comparison with data from a subsequent seasonal Y. pestis outbreak in 2018 reveals better test performance (BP: specificity 99%, sensitivity 91%), indicating that factors related to the response to a large, explosive outbreak may well have affected test performance. We used our framework to optimize the case classification and derive consolidated epidemic trends. Our approach may help reduce uncertainties in other outbreaks where diagnostics are imperfect.
The availability of accurate diagnostics is essential for an effective response to infectious disease outbreaks. In the relatively common situation where no gold standard diagnostic is available (i.e., in the absence of a diagnostic test with perfect sensitivity and specificity), interpretation of diagnostic results becomes challenging [1,2]. This may hamper case identification and management; jeopardize the evaluation of the burden, scope, timing, and spatial expansion of the outbreak; and ultimately impede control. Here, taking a large plague outbreak in Madagascar as a case study, we present an integrative analytical framework to assess the performance of diagnostics and reconstruct spatiotemporal epidemic patterns in situations where multiple yet imperfect diagnostics are available.

Plague is a highly fatal disease caused by the gram-negative bacillus Yersinia pestis [3]. Rodents constitute its natural reservoir, and the bacillus can be transmitted to humans by fleas. When bitten by an infected flea, a person typically develops bubonic plague (BP), which is characterized by fever and painful lymphadenitis in the area of the fleabite [3]. Septicemic spread can occasionally lead to pneumonic plague (PP), which typically presents with sudden fever, cough, and symptoms of lower respiratory tract infection. Interhuman transmission of PP is possible through droplet spread [4]. The plague case fatality ratio (CFR) has been estimated at 10% to 40% [5-7].
Diagnosis, particularly of PP, is challenging due to (i) nonspecific early symptoms [8,9]; (ii) the difficulty of collecting high-quality sputum samples, especially from severely ill and young patients [10]; and (iii) the scarcity of PP cases, which hampers the evaluation of diagnostics; most assays have been evaluated on BP samples [11].

Between August and November 2017, Madagascar experienced a large number (2,414) of notifications of clinically suspected plague cases, predominantly in 2 major urban areas (79%) and with an unusually high proportion of PP (78%) (Fig 1A and 1B) [12]. Important discrepancies between tests (the proportion of positive PP results ranged from 1% to 18% depending on the test; Fig 1C and 1D) mean that the true extent of the PP outbreak remains unclear. Moreover, without a good understanding of the performance of the available diagnostics, it is difficult to optimize diagnostic and case classification algorithms for future outbreaks. Here, we analyze data describing this large plague epidemic to obtain a comprehensive view of the burden of infection among notified cases. We evaluate the performance of the diagnostic tests and propose updated case classification algorithms to better allocate sparse resources during future outbreaks. Using the combined test results and diagnostic performance estimates, we reconstruct epidemiological trends over space and time.
Fig 1
Diagnostics and case classification during the plague outbreak in Madagascar in 2017.
(A, B) Weekly number of notified cases for PP (A) and BP (B) by case classification. (C–E) Proportion of notified cases classified as confirmed (conf) or probable (prob) (C), with a positive test result for RDT, culture, or MB (NB, only cases on whom the respective test was performed are considered in the denominator. No restrictions were put on the use of MB and RDT. Culture was only performed if RDT was positive, apart from PP samples from nonendemic regions. On those samples, culture was performed irrespective of RDT result) (D) and with a certain combination of diagnostic outcomes (E), presenting outcomes that were performed on all samples (RDT, qPCR on pla and caf1 genes). Model fits to these proportions are provided with black dots and lines indicating model predictions and 95% credible intervals, respectively. The underlying data and code to reproduce this figure are available on Open Science Framework (https://osf.io/nbc4t/). BP, bubonic plague; MB, molecular biology; PP, pneumonic plague; qPCR, quantitative polymerase chain reaction; RDT, rapid diagnostic test.
Results
Of 2,414 notifications, we consider those with sputum or bubo aspirates and a known clinical form (PP: 1,779; BP: 357) [12]. Of PP sputum samples, 22% have at least 1 positive result by culture (N = 4), rapid diagnostic test (RDT) (N = 327), or molecular biology (MB) (N = 84) (Fig 1D) and are classified, based on their diagnostic outcomes (Fig 2), as either confirmed (2%) or probable (20%) (Fig 1C), versus 34% of BP samples (37 culture, 99 RDT, 79 MB) (Fig 1D), with 16% confirmed and 18% probable (Fig 1C) [12].
Fig 2
Case classification algorithm.
Confirmed cases include cases with positive results for both RDT and MB and/or positive culture, probable have either RDT or MB positive, and suspected have no confirmatory laboratory results. MB, molecular biology; RDT, rapid diagnostic test.
We develop a latent-class statistical model [13] to estimate the performance of diagnostic tests and the scale of the outbreak from contingency tables describing 3 tests with up to 7 separate diagnostic outcomes (i.e., 2 single-outcome tests: RDT, culture; plus up to 5 genes for MB) for 2,136 samples received at the central laboratory for plague (CLP) between August 1 and November 26, 2017. The model describes the joint expected distribution of diagnostic outcomes as a function of the prevalence (proportion of Y. pestis infections among notified, clinically suspected cases) and the sensitivity (probability of a positive result if the sample is from a Y. pestis-infected person) and specificity (probability of a negative result if the sample is from a person not infected with Y. pestis) of each test. Estimation of model parameters is performed in a Bayesian framework via Markov chain Monte Carlo (MCMC) sampling [14] under the assumption that culture specificity is 100%. Technical details are provided in Materials and methods.

We estimate that test specificity was similar between sample types. MB was highly specific (PP: 100%, 95% credible interval 99% to 100%; BP: 100%, 98% to 100%), whereas RDT specificity was around 80% for both PP (82%, 80% to 84%) and BP (85%, 81% to 89%) (Fig 3A and Table A in S1 Text). Additional analyses including an initially implemented classical polymerase chain reaction (cPCR) protocol confirm that it lacked specificity (PP: 55%, 52% to 58%; BP: 62%, 56% to 69%) (Table B in S1 Text) and justify its timely replacement by MB.
The latter was the most sensitive test (PP: 80%, 61% to 97%; BP: 95%, 86% to 100%), markedly more sensitive than culture (PP: 7%, 0% to 23%; BP: 64%, 46% to 85%) and RDT (PP: 28%, 18% to 41%; BP: 72%, 61% to 83%) (Fig 3B and Table A in S1 Text). The statistical analysis also provides estimates of the performance of diagnostic tests based on single-gene diagnostic outcomes obtained from the quantitative PCR (qPCR) (Table A in S1 Text). Estimates were robust to deviations from model assumptions, including the inclusion of the initial cPCR (Table B in S1 Text) and the use of a uniform prior on prevalence (Table D in S1 Text).
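The latent-class mixture at the core of this type of model can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the two-test setup, and the plugged-in values are hypothetical. The key idea is that the probability of any pattern of binary test results is a mixture over the unobserved infection status.

```python
from itertools import product

def pattern_prob(results, prev, sens, spec):
    """Probability of a vector of binary test results under a
    conditional-independence latent class model: a mixture over the
    latent infection status (infected with weight prev, uninfected
    with weight 1 - prev)."""
    p_inf, p_uninf = 1.0, 1.0
    for r, se, sp in zip(results, sens, spec):
        p_inf *= se if r else (1 - se)      # infected branch
        p_uninf *= (1 - sp) if r else sp    # uninfected branch
    return prev * p_inf + (1 - prev) * p_uninf

# Illustrative (not fitted) values for two tests, e.g., RDT-like and MB-like.
sens, spec = [0.28, 0.95], [0.82, 1.00]
total = sum(pattern_prob(p, 0.10, sens, spec) for p in product([0, 1], repeat=2))
```

Summing over all result patterns returns 1; fitting consists of matching these pattern probabilities to the observed contingency table while sampling prevalence, sensitivities, and specificities, e.g., by MCMC.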
Fig 3
Model estimates of test performance and prevalence.
(A) Specificity of each test, with RDT denoting rapid diagnostic test and MB denoting molecular biology. (B) Sensitivity of each test. (C) Prevalence of Y. pestis infection among notified cases, under the assumption of perfect sample quality. (D) Relationship between sample quality (i.e., the proportion of samples from infected individuals that contain detectable bacterial material) and estimated prevalence of infection among notified cases. Results are presented by clinical form: pneumonic (PP: blue) and bubonic (BP: orange). The circle/triangle shows the posterior median of the parameter while the lines show the 95% credible interval. The underlying data and code to reproduce this figure are available on Open Science Framework (https://osf.io/nbc4t/). BP, bubonic plague; MB, molecular biology; PP, pneumonic plague; RDT, rapid diagnostic test.
Under the assumption that samples were of good quality, we estimate that the prevalence of infection among notified cases was 4% (3% to 7%) for PP and 25% (18% to 28%) for BP (Fig 3C). This corresponds to 78 (50 to 119) and 81 (64 to 98) Y. pestis infections among notified PP (N = 1,779) and BP cases (N = 357), respectively. However, a challenge in diagnosing PP is the risk that samples are of poor quality, i.e., that samples from a Y. pestis-infected individual do not contain detectable bacterial material. If a proportion of samples were of poor quality, estimates of the prevalence of infection would increase (Fig 3D). For example, in the extreme scenario where only 50% of samples were of good quality, estimates of the prevalence of infection would rise to 9% (6% to 13%) for PP and 45% (36% to 55%) for BP. For this analysis, we assumed sample quality to affect all tests equally. We also assessed a scenario in which test sensitivities were not fully independent and only the 2 qPCR gene results were affected by sample quality.
This did not improve model fit (Fig B in S1 Text), and most parameter estimates were robust to departures from the assumption of test independence (Fig C in S1 Text).

We find that these estimates agree well with the observed data [12] and can accurately reproduce (i) the number of notified cases classified as confirmed (PP: 19, 8 to 47 expected versus 27 observed; BP: 58, 37 to 81 versus 57) and probable (PP: 356, 338 to 377 versus 364; BP: 66, 45 to 87 versus 66) (Fig 1C); (ii) the number of notified cases testing positive for RDT, culture, or MB (Fig 1D); and (iii) the more detailed contingency table of the different diagnostic outcomes used for inference (Fig 1E).

Our analytical framework can be used to assess the performance of the case classification. For example, it can explain why the prevalence of Y. pestis among notified PP cases is estimated to be lower than the proportion of confirmed or probable cases (Fig 4A). In a scenario of low prevalence, the suboptimal specificity of RDT means that classification of PP based on confirmed or probable cases is characterized by a proportion of false positives (approx. 1-specificity) that is large relative to the prevalence. In contrast, a classification that relies solely on confirmed cases consistently underestimates the prevalence due to the low sensitivity of RDT and culture. For BP, the case classification performs well at any prevalence level, with the true prevalence always falling between the proportion of confirmed and confirmed/probable cases (Fig 4B and panel B of Fig D in S1 Text).
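The over-counting at low prevalence described above follows from a simple identity: the expected fraction of notified cases flagged by a classification criterion is prev × sensitivity + (1 − prev) × (1 − specificity). A minimal sketch with illustrative values only (not the fitted estimates; the function name is ours):

```python
def expected_positive_fraction(prev, sens, spec):
    """Expected fraction of notified cases flagged by a criterion with
    the given sensitivity and specificity: true positives among the
    infected plus false positives among the uninfected."""
    return prev * sens + (1 - prev) * (1 - spec)

# At low prevalence, a criterion with RDT-like specificity (~0.82)
# flags far more cases than are actually infected.
flagged = expected_positive_fraction(0.04, 0.87, 0.82)  # ~0.21 vs true 0.04
```

At high specificity the flagged fraction tracks the true prevalence closely, which is why a highly specific criterion gives a more robust picture of outbreak size.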
Fig 4
Performance of the case classification system.
(A, B) Expected proportion of notified cases classified as confirmed (dark blue or orange), probable (light blue or orange), and suspected (white), as a function of prevalence of infection for PP (A) and BP (B). The dashed vertical line indicates the prevalence among notified cases estimated during the 2017 Madagascar outbreak. The dashed diagonal line corresponds to perfect classification (C, D). Expected proportion of Y. pestis infections among cases in the category confirmed, confirmed or probable, and suspected as a function of prevalence of infection for PP (C) and BP (D). (E, F) ROC plots presenting sensitivity versus (1-specificity) for a range of possible classification criteria for PP (E) and BP (F) and for simplifications of the MB algorithm for PP (inset of E) and BP (inset of F). MB is considered here due to its potential for being considered as a classifier by itself. Here, conf denotes confirmed and prob denotes probable. Classifications ≥1 qpcr and 2 qpcr represent results based on qPCR solely, i.e., in the absence of confirmatory cPCR, with ≥1 qpcr denoting “at least 1 gene positive” and 2 qpcr “both genes positive.” The underlying data and code to reproduce this figure are available on Open Science Framework (https://osf.io/nbc4t/). BP, bubonic plague; cPCR, classical polymerase chain reaction; MB, molecular biology; PP, pneumonic plague; qPCR, quantitative polymerase chain reaction; ROC, Receiver operating characteristic.
The positive predictive value (PPV) for a category of cases is the proportion of cases in that category that are Y. pestis infected. As expected, we find that the PPV of the confirmed or probable category is strongly impacted by the prevalence among notified cases (Fig 4C and 4D). For example, if the prevalence of PP were 20%, over half of confirmed or probable cases would be expected to be Y. pestis infected. This proportion drops to as little as 22% (21% to 24%) for a prevalence of 5%.
This shows that it is critical to avoid overreporting and ensure notified cases meet the clinical case definition. Cases classified as confirmed were, for both clinical forms, almost all Y. pestis infected (PP: 98%, 91% to 100%; BP: 100%, 99% to 100%), owing to the perfect specificity of culture and the strict criterion requiring both RDT and MB to be positive. We further assess the risk of missing Y. pestis-infected cases and predict that 29% (16% to 42%) of Y. pestis-infected PP cases were classified as confirmed and 87% (73% to 98%) as confirmed or probable. This classification sensitivity is better for BP, with 89% (81% to 96%) of infected cases being confirmed and 100% (99% to 100%) being confirmed or probable. The performance of case classification would be hampered if a substantial proportion of samples were of poor quality (Fig D in S1 Text).

We can also determine how to revise the classification system to minimize the proportions of false positive (1-specificity) and false negative cases (1-sensitivity) (Fig 4E and 4F). The best classification for both forms is based on MB, with the proportions of false positive and false negative cases reduced, respectively, from 2% to 0% (0% to 0%) and from 71% to 20% (3% to 40%) for PP (BP: 0% to 0%, 0% to 2% and 11% to 5%, 0% to 14%) (Fig 4E and 4F), providing a robust representation of the prevalence.

We then compare the MB algorithm (Fig A in S1 Text) to simpler alternatives that would not require confirmatory cPCR. We show that the MB algorithm is more sensitive than classification based on qPCR alone using "both genes positive" as a criterion and more specific than the one using "at least 1 gene positive" (Fig 4E and 4F).

Concordance between RDT and MB improved over time among negative MB samples (panels B and D of Fig E in S1 Text) but decreased among positive MB samples for PP (S5A Fig in S1 Text). We investigate possible changes in RDT performance during the epidemic.
We find that RDT specificity increased significantly from 72% (69% to 76%) before week 41 to 95% (93% to 97%) afterward for PP (BP: from 71%, 63% to 78%, to 98%, 95% to 100%). Sensitivity of RDT was unchanged for BP (from 73%, 59% to 87%, to 72%, 55% to 88%) but decreased for PP (from 34%, 16% to 53%, to 14%, 3% to 30%) (Table C in S1 Text). Earlier and later cutoff times result in a poorer fit (Fig F in S1 Text). Estimates of RDT specificity for the second part of the outbreak are consistent with those obtained for the subsequent endemic BP season, during which the same batch was used (specificity: 99%, 96% to 100%), and are quite consistent with estimates from earlier evaluations of this test (64% sensitivity and 93% specificity based on latent class analysis) [11]. The 19% higher sensitivity estimated in the subsequent BP season (91%, 84% to 96%) suggests that outbreak-specific factors may indeed have hampered RDT and case classification performance in 2017 (Fig G in S1 Text).

Lastly, we can use our framework to derive, for each notified case, the probability of Y. pestis infection given their test results (i.e., the PPV). This probability is highest among cases with positive MB (100%) (Fig H in S1 Text) or culture (100%). We then use these estimates, together with the location and timing of cases, to reconstruct the dynamics of spread corrected for spatiotemporal variations in prevalence. The prevalence of Y. pestis infection among notified PP cases was 3-fold (BP: 2-fold) lower during the outbreak phase (weeks 39 to 43, when 75% of notifications occurred) than during the initial phase (Fig 5A and 5B). Such a phenomenon is common when an outbreak receives a lot of attention from authorities, media, and communities, as was the case in 2017. The prevalence of Y. pestis infection among notified cases was highest in plague-endemic regions (BP: 3-fold higher than Antananarivo), where health personnel are accustomed to responding to BP (Fig 5C and 5D).
Among notified BP cases, prevalence was lower in children (<5 years old); no such difference was observed for PP (Fig 5E and 5F). Correcting for temporal variations in prevalence, we find that the transmission of Y. pestis during this outbreak was less efficient than suggested by the analysis of notified cases, particularly for PP: the doubling time in the first 6 weeks was estimated at 18 days rather than 6 (or 8 based on confirmed/probable cases) (BP: 24 versus 13 (17)) (Fig 5G and 5H).
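A doubling time such as the one quoted above can be recovered from a case-count series with a log-linear fit. The sketch below is a generic illustration under our own assumptions (the function name and the example series are hypothetical; the authors' estimate additionally weights counts by the probability of infection):

```python
import math

def doubling_time(counts, dt=7.0):
    """Doubling time (in days) from a log-linear least-squares fit of
    exponential growth to case counts spaced dt days apart."""
    logs = [math.log(c) for c in counts]
    n = len(logs)
    t = [i * dt for i in range(n)]
    t_bar = sum(t) / n
    y_bar = sum(logs) / n
    # Slope of log-counts vs time = exponential growth rate r.
    slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, logs)) / \
            sum((ti - t_bar) ** 2 for ti in t)
    return math.log(2) / slope

# A series that exactly doubles each week has a 7-day doubling time.
dt_days = doubling_time([10, 20, 40, 80])
```

Applying such a fit to raw notifications versus infection-probability-weighted counts is what produces the contrast between the 6-day and 18-day estimates reported above.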
Fig 5
Reconstruction of the outbreak by place and time.
(A, B) Estimated prevalence of infection among notified cases by time period for PP (A) and BP (B). Here, the initial phase spans weeks 34–38, outbreak phase 39–43, and the end phase 44–48. (C, D) Prevalence estimates by zone for PP (C) and BP (D). No BP cases were notified from Toamasina. (E, F) Prevalence estimates by age for PP (E) and BP (F). (G, H) Observed notifications (bars) vs. estimated infections (solid lines with shading denoting 95% credible intervals) among notified cases for PP (G) and BP (H). The stacked bar plots denote the percentage (A–F) and absolute numbers (G–H) by case classification. The underlying data and code to reproduce this figure are available on Open Science Framework (https://osf.io/nbc4t/). BP, bubonic plague; PP, pneumonic plague.
Discussion
Assessing the true burden of an outbreak can be difficult in the absence of a "gold standard" diagnostic. This can be especially problematic when scarce resources need to be allocated for outbreak control. Here, using plague as a case study, we presented a statistical framework based on latent-class models to parse the results from multiple imperfect diagnostics and assess the true burden of the outbreak. We showed that around one tenth of notified cases were likely to be infected with Y. pestis. We showed that, particularly in scenarios with substantial misclassification of cases, poor specificity of some diagnostics can greatly skew case classification, even if a combination of diagnostic tests is used, and contribute to an overestimation of the true burden of infection. We used estimates of diagnostic test performance together with individual test results to reconstruct epidemiological trends in the proportion of true infections among notified cases and showed that misclassification of cases was highest during the peak of the epidemic and in regions nonendemic to plague. We illustrated the importance of optimized case classification algorithms, highlighting an overestimation of the transmission potential of the bacteria when based on the typically used tally of confirmed and probable cases.

This study highlights challenges inherent to plague diagnostics, particularly for pneumonic cases. While the specificity of test results was similar between bubo and sputum samples, the sensitivity of all diagnostics was substantially lower for sputum samples. This is in line with other respiratory illnesses such as pneumonia [10]. Poor-quality samples may well result in an underestimation of the true prevalence of infection among notified cases. We assessed how limited sample quality would affect our findings and showed that our general conclusion, that the majority of notified cases were not infected with Y. pestis, is robust to substantial amounts of poor-quality samples.
We came to the same conclusion if sample quality issues affected some diagnostics more than others.

Classifying cases into confirmed, probable, and suspected is a routine public health effort that gives insight into the extent of the outbreak and an indication of the levels of uncertainty surrounding it. We highlight the importance of accurate classification algorithms and show that, particularly for diseases with nonspecific symptoms and high risks of misclassification (e.g., due to raised awareness or nonfamiliarity with the disease among public health responders), classification based on tests with poor specificity can result in vast overestimations of the outbreak extent. In the case of the plague outbreak in Madagascar, limited RDT specificity meant that the majority of probable PP cases were not infected with Y. pestis. We showed that the performance of the RDT improved toward the end of the outbreak. Such evolving RDT performance might be explained by the extreme circumstances surrounding this outbreak, which may have resulted in changes within laboratories, e.g., due to overworked personnel or, conversely, changes in workflow to increase the efficiency and proficiency of sample processing. It might also be due to a change of RDT batch that occurred in week 43. Assessing historical data [11] as well as data from a subsequent outbreak year indeed revealed better RDT test performance than was observed during the first half of the outbreak. The performance of case classification algorithms may therefore be better during nonoutbreak years, and care should be taken to uphold this in crisis situations. Apart from upholding test performance, this may also include robust clinical case definitions to prevent large-scale overreporting. Even under better conditions, however, the RDT is likely of limited value for case classification when other tools are available. Yet, RDTs are vital for point-of-care diagnostics in peripheral health settings, in particular for BP.
Improving RDT performance should therefore be prioritized. Similarly, the inclusion of culture did not improve case classification, owing to its limited sensitivity. While the proportion of confirmed cases gave the best indication of the true proportion of infections, a large underestimation of the true burden of infection would be expected in scenarios with less overreporting and a higher prevalence among notified cases. The inclusion of culture nevertheless remains fundamental for assessing circulating strains and their antibiotic resistance. In this outbreak, this was particularly relevant, as widespread use of prophylactic treatment was observed in response to the large volume of notified cases. The real risks of resistance emergence associated with widespread use are another reason why accurate case classification is important.

MB had the best specificity and sensitivity. Especially for BP, adding other diagnostic tests does not improve our ability to accurately classify cases. For PP, MB by itself would be somewhat less sensitive than MB used in combination with culture and RDT, but the accompanying loss of specificity outweighs this benefit. We also assessed whether the MB algorithm (Fig A in S1 Text) itself could be further improved. While the MB algorithm, particularly for PP, is somewhat less sensitive than one using "at least 1 gene positive," we posit that increased specificity should be prioritized in low-prevalence scenarios such as the one described here, confirming the relevance of the confirmatory cPCR performed in the MB algorithm.

The integrative framework presented here makes it possible to assess the performance of diagnostics, optimize their use for case classification, and reconstruct the whereabouts of infected cases during outbreaks in situations where no diagnostic gold standard is available.
Improved case classification is particularly important for the allocation of scarce resources, for example, by accurately targeting contact tracing efforts and optimizing the impact of mobile test facilities. Beyond plague, such an analytical framework could be a valuable tool to reduce uncertainties in other infectious disease outbreaks affected by nonoptimal diagnostics. This is particularly important when overreporting is likely due to nonspecific symptoms or when mass testing is applied. Above all, the development and availability of high-quality diagnostics remain a priority, particularly for pathogens prone to causing explosive outbreaks such as Y. pestis.
Materials and methods
Background information about plague in Madagascar
Madagascar accounts for 75% of plague cases worldwide [5]. Health professionals are required to notify all cases clinically suspected of Y. pestis infection to the CLP (WHO Collaborating Center) at the Institut Pasteur de Madagascar (IPM), where case notification forms are recorded and biological samples analyzed for laboratory confirmation. Treatment of cases is not contingent on biological confirmation by the CLP. Annually, between 200 and 700 cases, mostly bubonic (BP, 75%), are notified. The majority of these cases occur in the rural central highlands during the country's plague season (October to April). Occasional small outbreaks of PP were recorded in 1997, 2011, and 2015 in rural areas [15-18] and in 2004 in 1 commune of the country's capital city, Antananarivo.

Between August and November of 2017, the country experienced an outbreak that, with 2,414 notified cases, was much larger than regular plague seasons and presented with an unusual proportion of cases with clinically suspected PP.
Data
Samples from clinically suspected cases (2,414) were sent to the CLP at the IPM for diagnostic testing. Treatment of suspected cases was not contingent on biological results. Biological samples were taken from cases presenting at health care settings with symptoms consistent with plague (i.e., for BP: presence of an isolated, painful adenopathy; for PP: cough (<5 days), bloody sputum, chest pain with fever) [3]. There was no formal clinical case definition that needed to be satisfied for patients to be tested. Samples included bubo aspirates from BP and secondary PP, sputum samples for PP, and liver and/or lung aspirates from deceased cases. All samples were tested for fraction 1 (F1) capsular antigen using an RDT [11]. Initially, MB was performed using cPCR targeting the pla gene on all samples. Due to the low specificity of the cPCR, this test was abandoned on November 3 and replaced by real-time qPCR targeting the pla and caf1 genes. If both genes tested positive, a sample was considered positive for MB. Samples with discordant or inconclusive qPCR results were verified using confirmatory cPCR on the pla, caf1, and inv genes, with protocols improved to reach better specificity. They were considered positive upon positive results for the inv gene and/or for both pla and caf1 genes (see Fig A in S1 Text for the decision tree) [19]. All samples received before November 3 were retested using the MB protocols (November to December 2017). In addition, culture was performed on all samples with a positive RDT [20]. PP samples from nonendemic regions received between September 11 and October 3 were cultured irrespective of RDT result. No serological testing was performed during the outbreak.

As per WHO guidelines, cases were classified based on their diagnostic test results as confirmed if culture and/or both RDT and MB were positive, probable upon a positive result for either MB or RDT, and suspected otherwise (Fig 2). Initial cPCR results were not considered for case classification.
Culture is often regarded as a gold standard given its perfect specificity yet lacks sensitivity. Culture sensitivity may have been particularly challenged during this outbreak as a result of widespread prophylactic antibiotic use [9].
Estimating diagnostic test performances and burden of infection
We develop a statistical model based on latent class methods to estimate test performances and the burden of infection among the population of notified cases (N). Here, we distinguish diagnostic outcomes as the raw outcomes of performed qPCRs (i.e., gene-specific outcomes) and the composite result of the confirmatory cPCR (Fig A in S1 Text). The composite of these makes up the result of a diagnostic test. For each notified case (i), dichotomous results (y_ij) are available for up to J diagnostic outcomes (j) (i.e., 1 each for RDT and culture and up to 3 for MB: 2 for qPCR (pla, caf1) and 1 for confirmatory cPCR (pla, caf1, inv)), with y_ij = 1 denoting a positive and y_ij = 0 a negative result. The infection status of case i is denoted d_i (= 1 if infected and 0 otherwise). The sensitivity and specificity of diagnostic outcome j are denoted S_j = P(Y_j = 1 | D = 1) and C_j = P(Y_j = 0 | D = 0), respectively.

Here, we calculate the contribution to the likelihood of the different diagnostic outcomes. Test-specific sensitivities and specificities are then calculated from the characteristics of the diagnostic outcomes that make up a specific test (MB in particular). We first discuss the likelihood for those diagnostic outcomes that are performed irrespective of other diagnostic outcomes (RDT, qPCR), followed by those that are performed conditional on other diagnostic outcomes (culture and confirmatory cPCR).
Contribution to the likelihood of RDT, qPCR (pla, caf1)
We first calculate the contribution to the likelihood of the diagnostic outcomes that are performed irrespective of other diagnostic outcomes, namely RDT and qPCR (pla and caf1) (indexed in Eq 1 as 1…U). If the infection status of a case were known, 2 conditional probabilities would have to be considered. Conditional on being infected by plague and given model parameters θ, the joint probability of test results for case i is

P(y_i | d_i = 1, θ) = ∏_{j=1}^{U} S_j^{y_ij} (1 − S_j)^{1 − y_ij}.   (Eq 1)

Conditional on not being infected by plague and given model parameters θ, this probability becomes

P(y_i | d_i = 0, θ) = ∏_{j=1}^{U} (1 − C_j)^{y_ij} C_j^{1 − y_ij}.   (Eq 2)

In practice, the infection status of a case is unknown, and we therefore work with the unconditional joint probability of diagnostic outcomes for case i, which integrates over the different possibilities:

P(y_i | θ) = π ∏_{j=1}^{U} S_j^{y_ij} (1 − S_j)^{1 − y_ij} + (1 − π) ∏_{j=1}^{U} (1 − C_j)^{y_ij} C_j^{1 − y_ij},   (Eq 3)
where π is the prevalence of plague infection among notified cases.

While RDT and qPCR (pla and caf1) were performed independent of other results, culture was performed based on RDT outcome, period, and zone, and confirmatory cPCR (pla, caf1, inv) was performed only if qPCR results were inconclusive. We need to integrate such conditioning into our analysis to avoid biases.
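To make this latent class construction concrete, a minimal sketch of Eqs 1 to 3 follows. The published analysis was implemented in R; this illustrative version is in Python, and the test characteristics, result pattern, and prevalence used below are made-up values for illustration, not the paper's estimates.

```python
import numpy as np

def joint_prob(y, sens, spec, prev):
    """Unconditional joint probability of a case's diagnostic outcomes (Eq 3).

    y    : 0/1 results for the U unconditionally performed outcomes
    sens : sensitivities S_j; spec : specificities C_j; prev : prevalence pi
    """
    y = np.asarray(y, dtype=float)
    p_inf = np.prod(sens**y * (1 - sens)**(1 - y))    # P(y | D=1), Eq 1
    p_uninf = np.prod((1 - spec)**y * spec**(1 - y))  # P(y | D=0), Eq 2
    return prev * p_inf + (1 - prev) * p_uninf        # mixture over D, Eq 3

# Illustration with made-up values: RDT+, qPCR_pla+, qPCR_caf1- for one case
p = joint_prob([1, 1, 0],
               sens=np.array([0.85, 0.9, 0.9]),
               spec=np.array([0.85, 0.95, 0.95]),
               prev=0.1)
```

Summing the log of this quantity over cases gives the contribution of the unconditionally performed outcomes to the joint likelihood.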
Contribution to the likelihood of culture
For PP samples received from a nonendemic region between September 11 and October 3, culture was performed irrespective of RDT results, and we therefore use the formulation described above. For all other samples, culture was performed only if RDT was positive. Hence, the conditional probability for these individuals to obtain a culture result (y) is

P(y_i,cult | y_i,RDT = 1, θ) = [π S_RDT S_cult^y (1 − S_cult)^{1 − y} + (1 − π)(1 − C_RDT)(1 − C_cult)^y C_cult^{1 − y}] / [π S_RDT + (1 − π)(1 − C_RDT)],   (Eq 4)

which can also be expressed in terms of the PPV (the proportion infected among individuals with a positive test result) of RDT:

P(y_i,cult | y_i,RDT = 1, θ) = PPV_RDT S_cult^y (1 − S_cult)^{1 − y} + (1 − PPV_RDT)(1 − C_cult)^y C_cult^{1 − y},   (Eq 5)

where

PPV_RDT = π S_RDT / [π S_RDT + (1 − π)(1 − C_RDT)].   (Eq 6)

Results from culture that did not adhere to this conditioning (PP: 283, BP: 60) were not included in the analysis, as the reason for this additional testing cannot be traced back but was likely nonrandom and affected by other test results.
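The PPV of RDT and the conditional culture contribution can be sketched directly. The functions below are an illustrative Python version (the paper's analysis was in R); the numerical values in the usage note are made up, and the specificity of culture defaults to 1, as it is fixed at 100% in the inference.

```python
def ppv(prev, sens, spec):
    """Positive predictive value (Eq 6): proportion infected among test-positives."""
    return prev * sens / (prev * sens + (1 - prev) * (1 - spec))

def culture_prob_given_rdt_pos(y_cult, prev, s_rdt, c_rdt, s_cult, c_cult=1.0):
    """Conditional probability of culture result y_cult given a positive RDT (Eq 5)."""
    p = ppv(prev, s_rdt, c_rdt)  # probability the RDT-positive case is infected
    return (p * s_cult**y_cult * (1 - s_cult)**(1 - y_cult)
            + (1 - p) * (1 - c_cult)**y_cult * c_cult**(1 - y_cult))
```

With illustrative values prev = 0.1, s_rdt = 0.85, c_rdt = 0.85, the RDT's PPV is about 0.39, so even a positive RDT leaves substantial uncertainty about infection status.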
Contribution to the likelihood of cPCR (pla, caf1, and inv)
Test results for MB were a composite of up to 5 diagnostic outcomes (Fig 2). Confirmatory cPCR for genes 1 to k was performed conditional on discordant qPCR results. The composite result of this test was included in the model.
Joint likelihood
The likelihood per case is the product of the terms described in Eqs 3, 4, and 7:

L_i(θ) = P(y_i,RDT, y_i,qPCR | θ) × P(y_i,cult | y_i,RDT, θ) × P(y_i,cPCRc | y_i,qPCR, θ).   (Eq 8)
Inference
Parameter estimation was done in a Bayesian setting using a Metropolis–Hastings MCMC approach [14]. We used a weakly informative beta-distributed prior for prevalence (shape = 1, scale = 2) (i.e., placing more mass on prevalence below 50% than above), based on estimates of prevalence from previous BP outbreaks [21]. To confirm the robustness of the results to the choice of prior on prevalence, the MCMC was also run with a uniform prior between 0 and 1 (Table D in S1 Text). For the specificities of tests associated with MB, beta-distributed priors were used with means of 95% (shape = 12.7, scale = 0.67), based on verifications done in the IPM laboratories prior to implementation. The specificity of culture was fixed at 100%. For all other parameters (i.e., all sensitivities as well as the specificity of RDT), we used uniform priors between 0 and 1. Metropolis–Hastings updates were performed on a natural scale with step sizes adjusted so as to obtain an acceptance probability between 10% and 50% [14]. Traces of the MCMC were plotted per parameter and convergence was assessed visually (Figs I and J in S1 Text).
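The sampler can be illustrated with a stripped-down random-walk Metropolis–Hastings update on the prevalence alone, holding the test characteristics fixed; the actual analysis (in R) also updates all sensitivities and specificities. The Beta(1, 2) prior density 2(1 − x) is coded directly, and the data and starting values below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(prev, y, sens, spec):
    """Log posterior of prevalence (other parameters fixed), Beta(1,2) prior."""
    if not 0.0 < prev < 1.0:
        return -np.inf
    p_inf = np.prod(sens**y * (1 - sens)**(1 - y), axis=1)  # P(y_i | D=1)
    p_un = np.prod((1 - spec)**y * spec**(1 - y), axis=1)   # P(y_i | D=0)
    loglik = np.sum(np.log(prev * p_inf + (1 - prev) * p_un))
    return loglik + np.log(2.0) + np.log1p(-prev)           # Beta(1,2): 2(1-x)

def metropolis(y, sens, spec, n_iter=2000, step=0.05):
    """Random-walk Metropolis-Hastings on the prevalence parameter."""
    prev = 0.5
    lp = log_post(prev, y, sens, spec)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = prev + step * rng.normal()          # symmetric proposal
        lp_prop = log_post(prop, y, sens, spec)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis acceptance
            prev, lp = prop, lp_prop
        chain[t] = prev
    return chain
```

In practice, the step size would be tuned until the acceptance probability falls in the 10% to 50% range described above, and convergence checked on the traceplots.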
Assuming imperfect sample quality
Collection of good-quality samples is challenging and might be affected by prophylactic treatment, preservation techniques, and the delays between symptom onset and sample testing at the CLP. The prevalence of infection is related to the prevalence of detectable bacterial material in the collected samples (τ) such that τ = ρπ, where ρ is the probability of good sample quality given a truly infected individual. Accounting for ρ in Eq 3 gives

P(y_i | θ) = ρπ ∏_{j=1}^{U} S_j^{y_ij} (1 − S_j)^{1 − y_ij} + (1 − ρπ) ∏_{j=1}^{U} (1 − C_j)^{y_ij} C_j^{1 − y_ij}.

Here, S_j and C_j denote the absolute sensitivity and specificity, i.e., assuming the sample is of good quality. The definition of ρ implies that all tests are equally affected by factors reducing sample quality.
Assuming dependence between qPCR results
While in the above calculations test results are assumed independent of each other, in practice this may not always be true. Notably, as both genes assessed by qPCR are tested in the same assay, possible contaminations or technical problems might affect both outcomes concurrently. To assess the sensitivity of our results to departures from the assumption of independence, we adjusted the contribution of the outcomes qPCRpla and qPCRcaf to the likelihood (Eq 8) to reflect a larger likelihood of concordance between the 2 diagnostic outcomes:

P(y_i1, y_i2 | d_i = 1, θ) = S_1^{y_i1} (1 − S_1)^{1 − y_i1} S_2^{y_i2} (1 − S_2)^{1 − y_i2} + (−1)^{y_i1 + y_i2} cov,

with an analogous adjustment conditional on noninfection. Here, indices 1 and 2 refer to qPCR pla and caf, respectively, and cov denotes the pairwise covariance between the diagnostic outcomes [22]. We assessed whether the inclusion of a covariance factor affected the fit to the data, using DIC as an indicator of fit [23], and whether estimated test characteristics were robust to departures from the assumption of independence.

Model fit did not improve upon the inclusion of a covariance factor (Fig B in S1 Text) and parameter estimates were relatively robust: The prevalence of PP was insensitive to the existence of correlations between these tests (4% versus 6%) (Fig C in S1 Text). Prevalence estimates of BP increased for high levels of correlation (35% versus 23%), which came with increased RDT specificity (97% versus 85%) and reduced sensitivity of culture (41% versus 64%).
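A sketch of the covariance-adjusted pair probability for the infected class (the uninfected class is handled analogously) follows; the functional form is the standard latent-class covariance adjustment, written here in Python with illustrative numbers rather than the paper's estimates.

```python
def pair_prob(y1, y2, s1, s2, cov_s):
    """P(Y1=y1, Y2=y2 | D=1) for 2 tests whose sensitivities covary.

    Concordant result pairs (0,0) and (1,1) gain +cov_s; discordant
    pairs lose cov_s, so the four probabilities still sum to 1.
    cov_s must be small enough that all four values stay in [0, 1].
    """
    base = (s1**y1 * (1 - s1)**(1 - y1)) * (s2**y2 * (1 - s2)**(1 - y2))
    return base + (-1) ** (y1 + y2) * cov_s
```

Setting cov_s = 0 recovers the independence assumption used in the baseline model.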
Detecting heterogeneity in test performance
Over the course of the outbreak
Changes in concordance between RDT and MB were observed toward the end of the outbreak (Fig E in S1 Text). To test whether a change in RDT performance could explain this observation, we reran the inference routine allowing RDT to have different test characteristics before and after a predetermined cutoff week (i.e., essentially treating RDT before and after the cutoff point as distinct tests). We used different cutoff weeks (38 to 43), where the week denotes, as elsewhere in the manuscript, the week of symptom onset of the cases. The model with the cutoff week that yielded the highest likelihood was then compared to the baseline model (i.e., assuming no change in RDT performance) using DIC to test whether the change in RDT performance resulted in an improved model fit. We did not examine changes in other tests because (i) cPCR was terminated halfway through the outbreak; (ii) culture yielded only few positive samples; and (iii) MB was performed retrospectively at the end of the outbreak.
Between outbreaks
We compared results from the outbreak year (2017 to 2018) with those from the subsequent plague season (2018 to 2019). Between August 17, 2018 and April 7, 2019, 261 cases (46 PP, 211 BP, 4 unknown form) were reported to the CLP. All samples were analyzed using MB, RDT, and culture, with the same protocols as were used during the 2017 outbreak year. Among sputum samples for PP (n = 25), 22 were negative for both MB and RDT. The others were positive for either or both tests (1 MB+ and RDT+, 1 MB− and RDT+, 1 MB+ and RDT−), with those positive for MB confirmed by culture. Among bubo aspirates for BP (n = 194), 174 were concordant between MB and RDT (94 positive, 80 negative). Given the low number of positive sputum samples for PP (1 confirmed, 2 probable, 22 suspected), we only analyzed BP samples from this season. Inference was similar to that on samples from 2017, but since culture was performed on all samples, no conditioning was needed when assessing the contribution of culture to the joint likelihood. In addition, due to the high concordance between the 2 genes used for qPCR, few confirmatory cPCRs were performed. We thus did not estimate the performance of the confirmatory cPCR for this season.
Outbreak reconstruction
We derived the probability of Y. pestis infection for each notified case based on the PPV associated with their results, using the posterior medians of the estimated prevalence, sensitivities, and specificities (see Eq 6). The sum of all PPVs gives the expected number of true infections among notified cases. We used this relationship to reconstruct the number of expected infections by subgroup. We divided the notified cases according to the following categories: (i) by period, distinguishing the initial phase (weeks 34 to 38), the outbreak phase (weeks 39 to 43), and the end phase (weeks 44 to 48); (ii) by week; (iii) by zone, distinguishing endemic zones (plague-endemic districts [24] apart from greater Antananarivo), greater Antananarivo (urban community of Antananarivo and the 3 neighboring districts), and Toamasina district; and (iv) by age group (below and above 5 years of age). Using these, we estimated the prevalence and exact binomial 95% confidence interval of infection among notified cases by subgroup.
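In code, the reconstruction amounts to converting each case's result pattern into a posterior probability of infection (its PPV) and summing these within subgroups. Below is an illustrative Python sketch (the paper's analysis was in R); the subgroup labels, test characteristics, and prevalence are hypothetical.

```python
import numpy as np

def case_ppv(y, sens, spec, prev):
    """Posterior probability that a case is infected, given its results (per Eq 6)."""
    p1 = prev * np.prod(sens**y * (1 - sens)**(1 - y))        # infected branch
    p0 = (1 - prev) * np.prod((1 - spec)**y * spec**(1 - y))  # uninfected branch
    return p1 / (p1 + p0)

def reconstruct(results, groups, sens, spec, prev):
    """Expected number of true infections per subgroup: the sum of per-case PPVs."""
    totals = {}
    for y, g in zip(results, groups):
        totals[g] = totals.get(g, 0.0) + case_ppv(np.asarray(y, float), sens, spec, prev)
    return totals

# Hypothetical example: 2 cases in 'week38', one test-positive and one negative
sens, spec = np.array([0.9]), np.array([0.95])
totals = reconstruct([[1], [0]], ["week38", "week38"], sens, spec, 0.1)
```

Aggregating by week, zone, or age group then only requires changing the group labels passed in.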
Software
All analyses have been performed in R [25]. MCMC results have been processed using the coda [26] and BayesianTools packages [27].
Supplementary information appendix.
Fig A. Molecular biology (MB) algorithm. Fig B. Model fit as a function of covariance between qPCR and qPCRc sensitivities for pneumonic forms (A) and bubonic forms (B). Fig C. Sensitivity of parameter estimates to different levels of correlation between the sensitivity of qPCRpla and qPCRcaf1. Fig D. Performance of the case classification system assuming a sample quality of 75%. Fig E. RDT vs. MB concordance over time. Fig F. Model fit as a function of the timing of changed RDT performance for pneumonic (PP) (A) and bubonic plague (BP) (B). Fig G. ROC plots presenting a range of possible classification criteria for pneumonic (PP) (A, C) and bubonic plague (BP) (B, D, E) before (A, B) and after week 41 (C, D) and during the 2018 endemic season (E). Fig H. Distribution of positive predictive values (PPVs) by test result and clinical form. Fig I. Traceplots for the MCMC of the default model for pneumonic forms. Fig J. Traceplots for the MCMC of the default model for bubonic forms. Table A. Model estimates of the performance of RDT, culture, MB, and of tests that would be based on single diagnostic outcomes. Table B. Model estimates of test performance of RDT, culture, MB, and of tests that would be based on single diagnostic outcomes. In addition to the default analysis presented in Table A in S1 Text, here the initial cPCR was included in the analysis. Results of this test were removed from the final analysis because its performance was too low. The results of the initial cPCR were not considered in the case classification. Table C. Model estimates of the performance of RDT, culture, MB, and of tests that would be based on single diagnostic outcomes, in a scenario with a change in RDT performance at week 41 of the outbreak. Table D.
Model estimates of the performance of RDT, culture, MB, and of tests that would be based on single diagnostic outcomes, with a noninformative uniform prior on the prevalence of infection among notified cases. (DOCX)
Submission of the previous reviews is entirely optional and our ability to use them effectively will depend on the willingness of the previous journal to confirm the content of the reports and share the reviewer identities. Please note that we reserve the right to invite additional reviewers if we consider that additional/independent reviewers are needed, although we aim to avoid this as far as possible. In our experience, working with previous reviews does save time.If you would like to send previous reviewer reports to us, please email me at pjaureguionieva@plos.org to let me know, including the name of the previous journal and the manuscript ID the study was given, as well as attaching a point-by-point response to reviewers that details how you have or plan to address the reviewers' concerns.During the process of completing your manuscript submission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF.Given the disruptions resulting from the ongoing COVID-19 pandemic, please expect some delays in the editorial process. We apologise in advance for any inconvenience caused and will do our best to minimize impact as far as possible.Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.Kind regards,PaulaPaula Jauregui, PhDEditorPLOS Biologypjaureguionieva@plos.org27 Feb 2022Dear Dr Ten Bosch,Thank you for submitting your manuscript "Evaluating and optimizing the use of diagnostics during epidemics: Application to the 2017 plague outbreak in Madagascar" for consideration as a Research Article at PLOS Biology. 
Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by several independent reviewers.As you will see in the reviews pasted below, all reviewers appreciate the importance and accuracy of the work, and raise several points about methodology reporting and presentation/interpretation of the results. Please pay close attention to the technical concerns raised by Reviewer 2, and to ensuring that the findings are robust once reviewer reviewer 2's concerns are all clarified or corrected.In light of the reviews, we are pleased to offer you the opportunity to address the comments from the reviewers in a revised version that we anticipate should not take you very long. We will then assess your revised manuscript and your response to the reviewers' comments and we may consult the reviewers and the Academic Editor again. We also request that your please address the following data and other policy-related requests:1) Data: you may be aware of the PLOS Data Policy, which requires that all data be made available without restriction: http://journals.plos.org/plosbiology/s/data-availability. For more information, please also see this editorial: http://dx.doi.org/10.1371/journal.pbio.1001797Note that we do not require all raw data. Rather, we ask for all individual quantitative observations that underlie the data summarized in the figures and results of your paper. For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5These data can be made available in one of the following forms:I) Supplementary files (e.g., excel). Please ensure that all data files are uploaded as 'Supporting Information' and are invariably referred to (in the manuscript, figure legends, and the Description field when uploading your files) using the following format verbatim: S1 Data, S2 Data, etc. 
Multiple panels of a single or even several figures can be included as multiple sheets in one excel file that is saved using exactly the following convention: S1_Data.xlsx (using an underscore).II) Deposition in a publicly available repository. Please also provide the accession code or a reviewer link so that we may view your data before publication.Regardless of the method selected, please ensure that you provide the individual numerical values that underlie the summary data displayed in the following figure panels: Figures 1 A–E , 2 A–D, 3 A–F, 4 A–H, S 3 AB, S 4, S 5 A–D, S 6 A–D, S 7 AB, S 8 A–E, S 9 , Table S1, Table S 2, Table S3.NOTE: the numerical data provided should include all replicates AND the way in which the plotted mean and errors were derived (it should not present only the mean/average values).1.1) Please also ensure that each figure legend in your manuscript includes information on where the underlying data can be found and that your supplemental data file/s has/have a legend.1.2) Please ensure that your Data Statement in the submission system accurately describes where your data can be found.As you address these items, please take this last chance to review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the cover letter that accompanies your revised manuscript.We expect to receive your revised manuscript within 1 month.Please email us (plosbiology@plos.org) if you have any questions or concerns, or would like to request an extension. 
At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not intend to submit a revision so that we may end consideration of the manuscript at PLOS Biology.**IMPORTANT - SUBMITTING YOUR REVISION**Your revisions should address the specific points made by each reviewer, as well our editorial requests outlined above. Please submit the following files along with your revised manuscript:1. A 'Response to Reviewers' file - this should detail your responses to the editorial requests, present a point-by-point response to all of the reviewers' comments, and indicate the changes made to the manuscript.*NOTE: In your point by point response to the reviewers, please provide the full context of each review. Do not selectively quote paragraphs or sentences to reply to. The entire set of reviewer comments should be present in full and each specific point should be responded to individually.You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.2. In addition to a clean copy of the manuscript, please also upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type.*Resubmission Checklist*When you are ready to resubmit your revised manuscript, please refer to this resubmission checklist: https://plos.io/Biology_ChecklistTo submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.Please make sure to read the following important policies and guidelines while preparing your revision:*Published Peer Review*Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. 
The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/*Protocols deposition*To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocolsThank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.Sincerely,DarioDario Ummarino, PhDSenior EditorPLOS Biologydummarino@plos.org*****************************************************REVIEWS:Reviewer's reports:Reviewer #1: This manuscript employs Bayesian latent class analysis to better estimate the test characteristics of diagnostic tests used during an outbreak of plague in Madagascar, as well as to estimate the true prevalence of plague among reported cases. This type of analysis is increasingly being used to better understand medical diagnostics, particularly in cases for which no clear gold standard exists. The authors' work is novel; while the methods used are well-described they have not previously been applied to plague diagnostics. The manuscript is well-written and clear. 
The methods are also well-described; specifically the authors clearly delineate the prior distributions used for each parameter as well as the type of covariance structure used to evaluate the assumption of conditional independence. While one might have chosen different methods to model conditional dependence (e.g. the random effect model of Qu et al.), the authors' choice is reasonable and appropriate for this purpose.Minor comments:- The authors did not mention what diagnostics were used to assess MCMC convergence; this should be explicitly stated in the methods.- The title for Table S2 would benefit from explicitly stating that this table was based on a model in which the prior for specificity was a uniform beta distribution (this is suggested in the text)-Several of the supplemental figures were not referenced/mentioned in the text. Most of these were fairly well-explained, but it was not clear what Figure S9 was demonstrating. It would be helpful to have some mention of the supplemental figures either in the main text or perhaps in expanded/more descriptive figure legends (starting on line 491).-The authors state that they will make their code available on a specified github site. It would have been helpful for the review to have included the code as a supplemental file.--------------------------Reviewer #2: The authors set out to address a major challenge in epidemiology: uncovering the spread of a pathogen when only imperfect diagnostic tests are available. I found the study to be well designed and enjoyed reading the manuscript. The approach used in this study has potential to be of broad interest for epidemiologists studying wide range of infectious disease. However, I do have some concerns, in particular with: i) the scope of the results, ii) the clarity/comprehensiveness of the Methods, and iii) the clarity of some figures. 
Provided these concerns can be addressed, I expect the manuscript to be suitable for publication in PLoS Biology.

Specific concerns:

Scope of results:

1. The estimates of prevalence of both BP and PP among notified cases seem very similar to their respective fractions of notified cases that are confirmed (i.e. confirmed using the imperfect diagnostic tests). I feel the study's importance would be much more evident if the deviation between the estimated prevalence and the fraction confirmed were clearly demonstrated and its consequences explored. The authors state that "We showed that around one tenth of notified cases were likely to be infected with Yersinia pestis", but to my eye this looks comparable to the fraction of confirmed cases in the data (Figure 1C)?

2. The authors' results suggest that both "suspected" and "probable" cases are extremely unreliable indicators of whether an individual is infected with plague (with suspected being far worse). Do you know why both are so bad? Could you also state the diagnostic criteria used to determine whether someone is suspected vs. probable? I'm not an expert on plague diagnosis, but I would imagine some of the symptoms that medical practitioners use to inform their diagnosis (of bubonic plague in particular) are fairly distinctive, so I'm surprised they perform so poorly.

3. The poor sensitivity of both RDT and culture suggests there would be a massive underestimation of PP cases if they were used as primary diagnostic methods. Could you explain why this underestimation does not occur (I'm trying to reconcile Figure 1A with Figure 4G, taking into account the results in Figure 2)?

4. To what extent do the sensitivities and specificities shown in Figure 2 deviate from what was known before the study? If the tests come with stated sensitivities and specificities (which may, in light of this study, be inaccurate), it would be helpful to state them here.

Likelihood function and data:

5. Estimating diagnostic test performance. I found the derivation and explanation of the per-case likelihood function well reasoned and mostly clear, but a few changes would improve clarity. Firstly, on lines 302-305 the diagnostic outcomes are indexed "1,…,U" and then introduced explicitly "(i.e. RDT, qPCR (pla and caf1)". The relationship between the tests listed and the indexing (j = 1,…,U) took some time to figure out. I suggest flipping the order, i.e. introducing the four tests performed irrespective of other outcomes and then explaining how they are indexed. Also, if U = 4, then state this explicitly (or drop the parameter U and replace it with 4 throughout). Finally, and this may be a question of personal preference, but when I see "y_ij" my mind thinks of elements of a square matrix (e.g. correlation or distance matrices). Given that the two indices label different things, I suggest changing one to a non-neighbouring letter (e.g. a or μ).

6. Eq. 8 is the likelihood per case. I suggest adding a subscript to the left-hand side to indicate this (e.g. L_i).

7. The main text and figures suggest that various disaggregations of the data were performed (e.g. by spatial location or time of sampling). I found it challenging to work out which data were used in which fit. It would be helpful to summarise/list these in a section of the methods and to state explicitly what was done in each scenario (e.g. something along the lines of "the MCMC inference procedure was re-run for each dataset individually").

MCMC:

8. The authors should explicitly list all parameters estimated (perhaps as a table, with the priors in an additional column). At present the methods state "[w]e used uniform priors between 0 and 1 for most parameters", but I am unclear what those parameters are.

9. The authors should state the exact MCMC algorithm used (e.g. "the Metropolis-Hastings algorithm with No-U-Turn sampler"), as well as its implementation (e.g. "implemented in the Python package pymc3").

10. I'm a bit concerned by the use of a weakly informative beta-distributed prior for prevalence, given that it is "based on estimates of prevalence from previous BP outbreaks". One of the findings of this study is that estimates of confirmed BP prevalence can be wrong -- so what stops that previous study from suffering the same diagnostic issues as those present in the 2017 plague season? Is it necessary to assume this prior distribution, and how does it affect the study findings?

Figure 1:

11. I was unclear exactly what the proportions in Figure 1D and E correspond to. Are they the number of samples with a positive test result divided by the number of notified cases? If so, eyeballing panel B suggests that ~40% is too high for bubonic plague and culture, even if all confirmed cases are confirmed via culture. Or is it the proportion of cultured samples that were positive? I think the caption needs revising to make clear exactly what is being shown. Furthermore, given that the y-axis ranges aren't very different between panels C-E, I suggest making them all the same.

12. Are the "confidence intervals" really confidence intervals? The approach here is Bayesian — how are these intervals calculated?

13. I wonder if panels A and B might benefit from being plotted on a log or square-root scale.

Figure 2:

14. Figure 2 would be improved by labelling all y-axes.

Figure 3:

15. I think clarity would be improved by changing the caption to read "prevalence of infection amongst notified cases" (consistent with the rest of the paper).

16. I take it the dashed diagonal line in A and B corresponds to a perfect classifier (sensitivity = specificity = 1). It would be helpful to mention this in the caption.

17. I enjoyed the ROC curve plots and found them very informative. However, I was unsure what "conf" and "prob" mean in this context (are they the same as "confirmed" and "probable" above?) and how their sensitivity and specificity were calculated. Furthermore, why was only molecular biology plotted here? Wouldn't it be informative to also show the other diagnostic methods?

Figure 4:

18. Are these prevalence estimates among notified cases, or overall population prevalence estimates (including cases that were not notified)?

19. It would be helpful to compare each panel to the respective data from each grouping, e.g. the total fraction of notified cases that are confirmed/suspected.

20. The multi-y-axis in panels G and H fooled me for a long time. I spent a while confused about how estimated infections among notified cases could be larger than those notified. I suggest the authors plot both time series on the same y scale, perhaps using a log or square-root scale. If the authors want to keep the twinned y-axis, then the right-hand scale also needs labelling to make it clearer. Furthermore, I really think it would be helpful to also show the data on confirmed and suspected cases in these panels — i.e. what is shown in Figure 1A and B.

------------------------------------------

Reviewer #3: This is a post-hoc analysis of an outbreak of plague caused by the bacterium Yersinia pestis in Madagascar in 2017 to 2018. The goal is to more accurately determine the size, timing, and spatial extent of the outbreak. The issue is that testing criteria for properly diagnosing plague generally have low sensitivity, especially detection based on culturing of bacteria and a rapid diagnostic test (RDT) based on detecting antigens to proteins of the F1 capsule. Tests based on molecular biology (MB) use qPCR to detect the pla and caf1 genes and have higher sensitivity. The problem is that the true gold standard, culturing of bacteria, has especially low sensitivity for sputum samples for pneumonic plague, and only slightly higher sensitivity for samples aspirated from lymph nodes for bubonic plague.
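The per-case latent class likelihood both reviewers refer to (the "Eq. 8" of the manuscript) can be illustrated with a short sketch. This is an illustrative reconstruction, not the authors' code: the function name, the conditional-independence assumption between tests given true infection status, and the handling of missing test results are assumptions of this sketch.

```python
import numpy as np

def case_log_likelihood(y, pi, se, sp):
    """Log-likelihood of one case's diagnostic results under a two-class
    latent class model (infected / not infected).

    y  : binary results for the U diagnostic outcomes of this case
         (use np.nan where a test was not performed)
    pi : prevalence of Y. pestis infection among notified cases
    se, sp : per-outcome sensitivity and specificity (length-U sequences)

    Assumes test results are conditionally independent given true status.
    """
    y = np.asarray(y, dtype=float)
    se = np.asarray(se, dtype=float)
    sp = np.asarray(sp, dtype=float)
    mask = ~np.isnan(y)                      # drop tests not performed
    y, se, sp = y[mask], se[mask], sp[mask]
    # P(results | infected): positive with prob se, negative with prob 1-se
    p_inf = np.prod(se ** y * (1.0 - se) ** (1.0 - y))
    # P(results | uninfected): positive with prob 1-sp, negative with prob sp
    p_not = np.prod((1.0 - sp) ** y * sp ** (1.0 - y))
    # marginalise over the unobserved true status
    return np.log(pi * p_inf + (1.0 - pi) * p_not)

# e.g. RDT positive, qPCR negative, with hypothetical se/sp values:
# case_log_likelihood([1, 0], pi=0.1, se=[0.9, 0.8], sp=[0.85, 0.99])
```

Summing this quantity over cases gives the full log-likelihood that an MCMC sampler (e.g. Metropolis-Hastings) would target when estimating pi, se, and sp jointly, which is what makes the reviewer's question about priors on pi (item 10) consequential.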
A final MB qPCR test for a third gene, inv1100, is highly sensitive and makes final confirmation more secure.

The main conclusion is that the best estimates for the prevalence of confirmed and probable cases among suspected cases were only 5% for pneumonic plague and 25% for bubonic plague. Thus, the outbreaks were not as large as initially suspected and reported. Further, neither culturing nor RDT provided reliable information. Higher accuracy in both tests could have affected public health responses to the outbreaks. qPCR tests, especially for all three gene fragments, provide the most reliable testing. Nonetheless, the problem remains whether the testing would have supported actionable decisions, both because of how quickly qPCR tests can be turned around and because prophylactic antibiotic treatment during the outbreaks likely both suppressed the outbreak and affected qPCR test sensitivity.

The study nonetheless provides a good framework for the analysis of outbreaks, if the sensitivity and specificity of the tests are known.

One aspect that is not discussed at all is whether effective testing could lead to more judicious use of antibiotics and potentially help slow antibiotic resistance in the plague pathogen. Staying off the antibiotic treadmill, as one antibiotic after another loses its effectiveness, is a major reason for more accurate and efficient testing.

The flow of the paper is difficult to follow. Putting the decision trees for deciding whether a specific case is probable or confirmed into the main text, rather than in the supplementary materials, would be helpful.

22 May 2022
Submitted filename: OptimizingDiagnostics_ReviewComments_PLOSBIOLOGY.pdf

30 Jun 2022

Dear Dr. Ten Bosch,

Thank you for the submission of your revised Research Article "Evaluating and optimizing the use of diagnostics during epidemics: Application to the 2017 plague outbreak in Madagascar" for publication in PLOS Biology.
On behalf of my colleagues and the Academic Editor, James Lloyd-Smith, I am pleased to say that we can in principle accept your manuscript for publication, provided you address any remaining formatting and reporting issues. These will be detailed in an email you should receive within 2-3 business days from our colleagues in the journal operations team; no action is required from you until then. Please note that we will not be able to formally accept your manuscript and schedule it for publication until you have completed any requested changes.

We suggest a change in the title that you could modify when the operations team contact you. Our suggestion is: "Analytical framework to evaluate and optimize the use of imperfect diagnostics during epidemics to inform public health responses".

Please take a minute to log into Editorial Manager at http://www.editorialmanager.com/pbiology/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production process.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have previously opted in to the early version process, we ask that you notify us immediately of any press plans so that we may opt out on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for choosing PLOS Biology for publication and supporting Open Access publishing. We look forward to publishing your study.

Sincerely,

Paula

---
Paula Jauregui, PhD
Senior Editor
PLOS Biology
pjaureguionieva@plos.org