Fergus J Chadwick1,2, Jessica Clark3,4, Shayan Chowdhury5, Tasnuva Chowdhury3, David J Pascall6, Yacob Haddou3,4, Joanna Andrecka7, Mikolaj Kundegorski4,8, Craig Wilkie4,8, Eric Brum7, Tahmina Shirin9, A S M Alamgir9, Mahbubur Rahman9, Ahmed Nawsher Alam9, Farzana Khan9, Ben Swallow4,8, Frances S Mair10, Janine Illian4,8, Caroline L Trotter11, Davina L Hill3,4, Dirk Husmeier8, Jason Matthiopoulos3,4, Katie Hampson3,4, Ayesha Sania12.
Abstract
Diagnostics for COVID-19 detection are limited in many settings. Syndromic surveillance is often the only means to identify cases but lacks specificity. Rapid antigen testing is inexpensive and easy-to-deploy but can lack sensitivity. We examine how combining these approaches can improve surveillance for guiding interventions in low-income communities in Dhaka, Bangladesh. Rapid-antigen-testing with PCR validation was performed on 1172 symptomatically-identified individuals in their homes. Statistical models were fitted to predict PCR-status using rapid-antigen-test results, syndromic data, and their combination. Under contrasting epidemiological scenarios, the models' predictive and classification performance was evaluated. Models combining rapid-antigen-testing and syndromic data yielded equal-to-better performance to rapid-antigen-test-only models across all scenarios with their best performance in the epidemic growth scenario. These results show that drawing on complementary strengths across rapid diagnostics, improves COVID-19 detection, and reduces false-positive and -negative diagnoses to match local requirements; improvements achievable without additional expense, or changes for patients or practitioners.Entities:
Mesh:
Year: 2022 PMID: 35618714 PMCID: PMC9135686 DOI: 10.1038/s41467-022-30640-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Breakdown of patient numbers by age and gender, in relation to case positivity by PCR and reported symptoms (both as % rounded to nearest integer).
| Symptoms (%) | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Age (years) | Gender | Count | Positivity rate (%) | Breathing problems | Cough (any) | Cough (dry) | Cough (wet) | Diarrhoea | Ongoing fever | Headache | Loss of smell | Loss of taste | Muscle pain | Red eyes | Runny nose | Sore throat | Tiredness | Vomiting |
| 16–25 | Women | 124 | 19 | 23 | 73 | 69 | 19 | 4 | 94 | 77 | 38 | 51 | 52 | 10 | 49 | 43 | 73 | 19 |
| 16–25 | Men | 157 | 20 | 20 | 74 | 72 | 22 | 5 | 91 | 73 | 44 | 45 | 50 | 10 | 36 | 42 | 62 | 13 |
| 26–35 | Women | 144 | 17 | 25 | 72 | 70 | 19 | 10 | 90 | 75 | 35 | 42 | 51 | 4 | 40 | 43 | 69 | 7 |
| 26–35 | Men | 178 | 26 | 26 | 80 | 78 | 14 | 10 | 89 | 74 | 38 | 38 | 49 | 7 | 38 | 33 | 69 | 16 |
| 36–45 | Women | 101 | 26 | 28 | 79 | 77 | 25 | 4 | 93 | 78 | 38 | 48 | 53 | 5 | 47 | 42 | 72 | 18 |
| 36–45 | Men | 119 | 24 | 23 | 75 | 71 | 18 | 7 | 89 | 71 | 38 | 38 | 55 | 8 | 39 | 41 | 67 | 8 |
| 46–55 | Women | 66 | 20 | 17 | 74 | 74 | 15 | 3 | 86 | 70 | 32 | 32 | 55 | 0 | 35 | 33 | 58 | 15 |
| 46–55 | Men | 58 | 22 | 16 | 55 | 55 | 14 | 2 | 84 | 57 | 34 | 34 | 52 | 10 | 45 | 33 | 69 | 7 |
| 56+ | Women | 57 | 23 | 25 | 72 | 68 | 23 | 11 | 84 | 54 | 33 | 30 | 49 | 4 | 32 | 26 | 60 | 14 |
| 56+ | Men | 61 | 26 | 30 | 66 | 64 | 15 | 5 | 77 | 59 | 41 | 36 | 49 | 8 | 36 | 23 | 52 | 11 |
| All | | 1065 | 22 | 23 | 74 | 71 | 19 | 7 | 89 | 71 | 38 | 41 | 51 | 7 | 40 | 38 | 66 | 13 |
Although age is binned here, raw age in years was used for analyses. Furthermore, the survey permitted non-binary genders, but none were reported.
Fig. 1Model predictive performance.
Predictive performance of candidate models was measured using out-of-sample cross-entropy. Combined posterior median and interquartile ranges for n = 1172 biologically independent individuals predicted under temporally structured cross-validation. Cross-entropy shows the most general level of model predictive power, assessing performance on the probability scale without requiring classification threshold decisions. A cross-entropy of zero indicates a model that predicts the correct result with certainty every time. A random classifier for the problem scored 11.54. Interquartile ranges are shown for the posterior cross-entropy of the best candidate models at each level of model complexity tested under temporal cross-validation. The intermediate complexity models perform best at prediction, although performance is similar across all the models within each model class. There was a marked decline in predictive power at more than four symptoms, leading us to choose this as the maximum complexity in our candidate models. Model classes are colour-coded: the rapid antigen test only (RAT-only) model is purple, the Syndromic-only model is teal, and the Syndromic-RAT Combined model is yellow.
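For illustration, out-of-sample cross-entropy for binary outcomes can be computed as the mean negative log-likelihood of the observed PCR statuses under the model's predicted probabilities. This is a minimal sketch, not the paper's code; the paper's exact aggregation across posterior draws and cross-validation folds may differ (its random-classifier baseline of 11.54 suggests a different scaling than a per-observation mean).

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean negative log-likelihood of binary outcomes under predicted
    probabilities. Zero means every outcome was predicted with certainty."""
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Confident, correct predictions score near zero; an uninformative
# coin-flip predictor scores log(2) per observation.
print(cross_entropy([1, 0, 1], [0.99, 0.01, 0.99]))  # ~0.01
print(cross_entropy([1, 0], [0.5, 0.5]))             # ~0.693 (log 2)
```

Lower values correspond to better probabilistic prediction, which is why no classification threshold is needed at this stage.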
Fig. 2Generic model classification performance.
Median (grey dots) and interquartile ranges for receiver operating characteristics (ROC) for the rapid antigen testing only approach (purple), and posterior median and interquartile range ROC for the Syndromic-only (teal) and Syndromic-Rapid Antigen Test (RAT) Combined (yellow) models, for n = 1172 biologically independent individuals predicted under temporally structured cross-validation. In the RAT-only model, the ROC is a single value (i.e., a dot rather than a line) because the binary test has a single sensitivity and specificity. In the Syndromic-only and Syndromic-RAT Combined classes, the ROC values demonstrate the performance of the model for any hypothetical scenario as defined by the axes (as opposed to Fig. 5, which demonstrates model performance in specific epidemiological scenarios that are realisations of single points in this space). While ROC performance is often plotted as a curve, we do not have continuous probability values due to the binary nature of the predictor symptoms. This is important because discontinuity in the probabilities affects the sensitivity of the model to classification thresholds, such as those used in the scenarios below.
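The point a binary test occupies in ROC space is determined by its sensitivity (true-positive rate) and 1 − specificity (false-positive rate). A small illustrative sketch, not taken from the paper's code:

```python
def roc_point(y_true, y_pred):
    """ROC-space coordinates (sensitivity, 1 - specificity) for a binary
    classifier: a single dot rather than a curve."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, 1 - specificity

# A classifier that catches half the positives and half the negatives
# sits on the chance diagonal.
print(roc_point([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.5)
```

A probabilistic model traces out one such point per candidate classification threshold; with binary predictors only finitely many thresholds are distinguishable, hence the stepped rather than smooth ROC described above.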
Fig. 5Model selection procedure.
Rounds of model selection in the multivariate probit component of the Syndromic-only and Syndromic-Rapid Antigen Test (RAT) Combined models. With 14 symptoms (5 shown for demonstration purposes) and 2 covariates, there are over 131,000 possible model combinations. To make exploring these models computationally feasible and to reduce the risk of overfitting, we carried out two rounds of model selection. A subset of symptoms is identified using the strength of posterior correlation between each symptom and PCR status under the corresponding model, with the most weakly correlated symptoms removed during each round of selection. From this subset of symptoms, a more exhaustive search of potential models is then conducted to identify the best symptom-covariate relationships, using temporal cross-validation to measure model performance. The best model at each level of complexity (i.e., number of symptoms) is then used as a candidate model, and only these final models are used for classification. This reduces the set of models tested as classifiers from >131,000 to just four per model class.
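The screen-then-search idea above can be sketched generically: rank symptoms by the strength of their association with PCR status, keep a shortlist, and only then enumerate subsets of the shortlist for cross-validated comparison. This is an illustrative simplification with made-up correlation values, not the paper's multivariate probit procedure, and it omits the covariate pairings.

```python
from itertools import combinations

def screen_then_search(corr_with_pcr, keep=4):
    """Round 1: keep the symptoms most strongly associated with the outcome.
    Round 2: enumerate every non-empty subset of the shortlist, which is the
    (much smaller) candidate set handed to cross-validation."""
    ranked = sorted(corr_with_pcr, key=lambda s: abs(corr_with_pcr[s]),
                    reverse=True)
    shortlist = ranked[:keep]
    candidates = [subset
                  for k in range(1, keep + 1)
                  for subset in combinations(shortlist, k)]
    return candidates

# Hypothetical posterior correlations, for demonstration only.
corr = {"fever": 0.31, "cough": 0.12, "anosmia": 0.45,
        "headache": 0.05, "ageusia": 0.40}
print(len(screen_then_search(corr)))  # 15 candidate subsets, not thousands
```

With 4 retained symptoms there are only 2^4 − 1 = 15 subsets to evaluate, which is what makes the second, exhaustive round tractable.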
Fig. 3Performance of models under three epidemiological scenarios.
Combined posterior median and interquartile ranges of error rates for n = 1172 biologically independent individuals predicted under temporally structured cross-validation. In the Agnostic Scenario, the model maximises the correct classification rate, with error measured as the sum of the false-positive and false-negative rates. In the Epidemic Growth Scenario, a maximum false-negative rate of 20% is permitted, and the error is measured as the false-positive rate. In the Declining Incidence Scenario, a maximum false-positive rate of 20% is permitted, and the error is measured as the false-negative rate. These requirements were determined through discussion with colleagues at the Institute of Epidemiology and Disease Control (IEDCR), Bangladesh. The plot shows the posterior median and interquartile range for scenario-specific errors. Lower errors correspond to better model performance. No error rate is defined for the rapid antigen testing only (RAT-only) model in the Epidemic Growth Scenario, as the model failed to meet the requirement for that scenario (indicated by a grey bar). Model classes are colour-coded: the RAT-only model is purple, the Syndromic-only model is teal, and the Syndromic-RAT Combined model is yellow.
Requirements and performance criteria for each epidemiological scenario.
| Scenario name | Requirement | Performance criterion (error) |
|---|---|---|
| 1 Agnostic | Maximise correct classification rates | Sum of error rates |
| 2 Epidemic growth | <20% false-negative rate | False-positive rate |
| 3 Declining incidence | <20% false-positive rate | False-negative rate |
The requirement refers to a base level of performance the model must achieve, allowing the more flexible models to be adapted to meet that requirement as closely as possible (e.g., by determining a classification threshold). These requirements were determined through discussion with colleagues at the Institute of Epidemiology and Disease Control (IEDCR), Bangladesh, using internal resource projections. The performance criterion is used to determine which model performs 'best' given that the requirement has been met.
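Adapting a probabilistic model to a scenario like Epidemic Growth amounts to choosing the classification threshold that minimises the performance criterion subject to the requirement. A minimal sketch of that idea (not the paper's implementation, and ignoring posterior uncertainty):

```python
import numpy as np

def pick_threshold(y_true, p_pred, max_fnr=0.20):
    """Among candidate thresholds, minimise the false-positive rate subject
    to a false-negative-rate cap (the 'Epidemic Growth'-style requirement).
    Returns (threshold, fpr, fnr), or None if no threshold satisfies it -
    the situation shown as a grey bar for the RAT-only model in Fig. 3."""
    y = np.asarray(y_true)
    p = np.asarray(p_pred)
    best = None
    for t in np.unique(p):
        pred = p >= t
        fnr = float(np.mean(~pred[y == 1]))  # missed true positives
        fpr = float(np.mean(pred[y == 0]))   # false alarms
        if fnr <= max_fnr and (best is None or fpr < best[1]):
            best = (float(t), fpr, fnr)
    return best

# Toy predicted probabilities, for demonstration only.
y = [1, 1, 1, 1, 0, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.6, 0.3, 0.2, 0.1]
print(pick_threshold(y, p, max_fnr=0.25))  # (0.7, 0.0, 0.25)
```

For the Declining Incidence Scenario the roles swap: cap the false-positive rate and minimise the false-negative rate. A fixed binary test like the RAT has no threshold to tune, which is why it can fail a requirement outright.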
Fig. 4Schematic description of identification of likely COVID-19 cases by community support teams (CSTs) and model definitions.
CSTs collect syndromic data (age, gender, and presence/absence of 14 predetermined symptoms) and two sets of nasopharyngeal swabs (for rapid antigen testing and PCR). We used three model classes: rapid antigen test only (1), syndromic data only (2), and both rapid antigen test and syndromic data (3). The PCR result is used to train and test each model using temporal cross-validation.