Xuan Li, Jagadeeshkumar Kulandaivelu, Shuxin Zhang, Jiahua Shi, Muttucumaru Sivakumar, Jochen Mueller, Stephen Luby, Warish Ahmed, Lachlan Coin, Guangming Jiang.
Abstract
Wastewater-based epidemiology (WBE) has been regarded as a potential tool for estimating the prevalence of coronavirus disease 2019 (COVID-19) in the community. However, the conventional back-estimation approach is currently limited by methodological challenges and various uncertainties. This study systematically performed a meta-analysis of WBE datasets and investigated the use of data-driven models, in lieu of the conventional WBE back-estimation approach, to estimate COVID-19 community prevalence. Three data-driven models, i.e. multiple linear regression (MLR), artificial neural network (ANN), and adaptive neuro-fuzzy inference system (ANFIS), were applied to the multi-national WBE dataset. To evaluate the robustness of these models, predictions for sixteen scenarios with partial inputs were compared against the actual prevalence reported from clinical testing. Model performance was further validated using unseen data (datasets not used to establish the model) from different stages of the COVID-19 outbreak. Generally, the ANN and ANFIS models showed better accuracy and robustness than the MLR models. Air and wastewater temperature played a critical role in prevalence estimation by the data-driven models, especially the MLR models. With unseen datasets, the ANN model reasonably estimated the prevalence of COVID-19 (cumulative cases) at the initial phase and forecasted the upcoming new cases in 2-4 days at the post-peak phase of the outbreak. This study provides essential information about the feasibility and accuracy of data-driven estimation of COVID-19 prevalence through the WBE approach.
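As a rough illustration of the robustness idea (comparing a model fitted on all inputs with one fitted on only a subset of them), the sketch below fits an MLR by ordinary least squares and scores each fit with R². It is not the authors' code: the data are synthetic and hypothetical, the two input columns stand in for no particular WBE variable, and R² is computed in-sample rather than against held-out clinical reports.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mlr(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations; returns [intercept, b1, ..., bk]."""
    design = [[1.0, *row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def score(rows, y, keep: Sequence[int]) -> float:
    """Fit an MLR on the input columns listed in `keep` and return its in-sample R^2."""
    sub = [[row[i] for i in keep] for row in rows]
    coef = fit_mlr(sub, y)
    y_hat = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in sub]
    return r_squared(y, y_hat)


# Synthetic, hypothetical data: two inputs, response fully explained by both.
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 + 3.0 * x1 - 4.0 * x2 for x1, x2 in rows]

full = score(rows, y, keep=[0, 1])  # all inputs available
partial = score(rows, y, keep=[0])  # "partial input" scenario: second input dropped
print(f"full R^2 = {full:.3f}, partial R^2 = {partial:.3f}")
```

Dropping an informative input lowers R², which is the kind of degradation the sixteen partial-input scenarios probe.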
Keywords: Artificial neural network; COVID-19; Data-driven models; SARS-CoV-2; Wastewater-based epidemiology
Year: 2021 PMID: 34051491 PMCID: PMC8141262 DOI: 10.1016/j.scitotenv.2021.147947
Source DB: PubMed Journal: Sci Total Environ ISSN: 0048-9697 Impact factor: 7.963
Coefficient of determination (R²) from the robustness analysis of data-driven models predicting SARS-CoV-2 prevalence with partial input parameters.
| Scenario | MLR | ANN | ANFIS |
|---|---|---|---|
| 1 | 0.58 | 0.79 | |
| 2 | 0.57 | 0.70 | |
| 3 | 0.56 | 0.45 | |
| 4 | 0.55 | 0.44 | |
| 5 | 0.43 | 0.76 | 0.60 |
| 6 | 0.32 | | |
| 7 | 0.51 | 0.75 | |
| 8 | 0.48 | | |
| 9 | 0.44 | | |
| 10 | 0.41 | 0.74 | 0.54 |
| 11 | 0.30 | | |
| 12 | 0.35 | 0.70 | 0.58 |
| 13 | 0.30 | 0.73 | 0.72 |
| 14 | 0.32 | 0.48 | |
| 15 | 0.30 | 0.72 | 0.54 |
| 16 | 0 | 0.33 | 0.21 |
Note: R, average testing ratio per 1000 people every 30 days; T, wastewater temperature (°C); T, air temperature (°C); P, community population (×100,000 persons); C_RNA, virus RNA concentration in wastewater (log10 copies/L); F, a categorical factor distinguishing recovery-corrected from non-corrected C_RNA results; Q, average daily water consumption (L/person∙day); S, sampling technique (grab or composite); P, precipitation (mm).
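Reading each row of the table as (MLR, ANN, ANFIS) in column order, which is an assumption about how the flattened table maps onto the models, the six scenarios that report all three models can be averaged as below. These means are simple arithmetic on the tabulated values, not figures reported by the authors.

```python
from statistics import mean

MODELS = ("MLR", "ANN", "ANFIS")

# Scenarios with R^2 reported for all three models, read in column order (MLR, ANN, ANFIS).
R2 = {
    5: (0.43, 0.76, 0.60),
    10: (0.41, 0.74, 0.54),
    12: (0.35, 0.70, 0.58),
    13: (0.30, 0.73, 0.72),
    15: (0.30, 0.72, 0.54),
    16: (0.00, 0.33, 0.21),
}


def model_means(table):
    """Average R^2 per model across the scenarios in `table`."""
    # zip(*table.values()) regroups the per-scenario tuples into one column per model
    return {name: mean(col) for name, col in zip(MODELS, zip(*table.values()))}


for name, value in model_means(R2).items():
    print(f"{name}: mean R^2 = {value:.3f}")
```

On these six scenarios the ANN column averages highest and the MLR column lowest, in line with the abstract's statement that ANN and ANFIS were more accurate and robust than MLR.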
Fig. 1 Pairwise Pearson's correlation plot between the prevalence data (PWBE) and the nine explanatory factors. The size of each circle indicates the strength of Pearson's correlation coefficient and its color the sign (bigger circle = stronger link; blue = positive correlation, red = negative correlation) (A). Distance-based redundancy analysis (db-RDA) diagram showing the relationship between the prevalence data and the explanatory factors. Prevalence data from 7 publications are identified by color (P1-P7), and the countries of these publications are differentiated by shape. The % value on each RDA axis indicates the percentage of the total variation explained by that axis (B).
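Each circle in panel (A) encodes one pairwise Pearson coefficient. A minimal pure-Python sketch of that computation follows; the column names and values in the usage example are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise r for every pair of named columns (the numbers behind a correlation plot)."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns: sign of r maps to colour, |r| to circle size in a plot like Fig. 1A.
cols = {"prevalence": [1, 2, 3, 4], "conc": [2, 4, 6, 8], "temp": [4, 3, 2, 1]}
for pair, r in correlation_matrix(cols).items():
    print(pair, round(r, 2))
```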
MLR model coefficients using the complete WBE dataset for the prediction of COVID-19 community prevalence.
| Coefficient | Estimate | Std. error | t value | P(>\|t\|) | Significance |
|---|---|---|---|---|---|
| Intercept | 579.39 | 151.83 | 3.82 | 1.96 × 10⁻⁴ | *** |
| | −20.60 | 3.85 | −5.34 | 3.24 × 10⁻⁷ | *** |
| | 14.36 | 2.05 | 6.99 | 8.07 × 10⁻¹¹ | *** |
| | 8.07 | 1.03 | 7.81 | 8.68 × 10⁻¹³ | *** |
| | −1.33 | 0.58 | −2.29 | 0.02 | * |
| | −4.97 | 5.39 | −0.92 | 0.36 | |
| | 10.26 | 9.79 | 1.05 | 0.30 | |
| | −2.38 | 0.44 | −5.35 | 3.09 × 10⁻⁷ | *** |
| | −149.18 | 33.87 | −4.41 | 1.98 × 10⁻⁵ | *** |
| | 0.68 | 6.44 | 0.11 | 0.92 | |
P(>|t|) is the p-value from the t-test of each coefficient.
Significance codes represent P values of 0–0.001: ***; 0.001–0.01: **; 0.01–0.05: *.
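The t value in the table is the estimate divided by its standard error, and the significance code follows from the p-value. The sketch below re-derives both from the tabulated numbers; recomputed t values differ from the reported ones by a few hundredths because the estimates and errors are rounded to two decimals. Treating each legend interval as half-open (p < 0.001, p < 0.01, p < 0.05) is an assumption, since the legend does not say which bound is inclusive.

```python
# Rows of the MLR table: (estimate, std. error, reported t, reported p).
# Term names were not recoverable, so rows are identified by position only.
ROWS = [
    (579.39, 151.83, 3.82, 1.96e-4),
    (-20.60, 3.85, -5.34, 3.24e-7),
    (14.36, 2.05, 6.99, 8.07e-11),
    (8.07, 1.03, 7.81, 8.68e-13),
    (-1.33, 0.58, -2.29, 0.02),
    (-4.97, 5.39, -0.92, 0.36),
    (10.26, 9.79, 1.05, 0.30),
    (-2.38, 0.44, -5.35, 3.09e-7),
    (-149.18, 33.87, -4.41, 1.98e-5),
    (0.68, 6.44, 0.11, 0.92),
]


def significance_code(p: float) -> str:
    """Map a p-value to the table's significance codes (half-open intervals assumed)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


for est, se, t_reported, p in ROWS:
    t = est / se  # t statistic = estimate / standard error
    print(f"t = {t:7.2f} (reported {t_reported:6.2f})  {significance_code(p)}")
```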
Fig. 2 Outputs of the ANN model (A) and the ANFIS model (B), and their correlation with the actual prevalence reported from clinical tests, using all of the datasets. Target is the prevalence of active COVID-19 cases reported from clinical testing. Output is the SARS-CoV-2 prevalence predicted by the model from the input parameters. The Y = T line is where the y-axis value equals the target value.
Fig. 3 Comparison of the ANN model output with prevalence determined by cumulative cases (Pcum), daily new cases (Pday), weekly new cases (Pweek), and new cases in the following 2 or 4 days (P2d, P4d), for the initial (pre-peak) stage (A) and the post-peak stage (B) of an outbreak. The Y = X line is where the y-axis value equals the x-axis value.
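The five prevalence definitions in Fig. 3 differ only in which window of daily case counts they sum. The sketch below is one possible reading, not the authors' definition: the 7-day window ending on the sampling day, the 2- and 4-day windows starting the day after it, and scaling to cases per 100,000 people are all assumptions, and `prevalence_windows` is a hypothetical helper.

```python
from typing import Dict, List


def prevalence_windows(daily_new: List[int], t: int, population: int,
                       per: int = 100_000) -> Dict[str, float]:
    """Case counts around sampling day t, scaled to cases per `per` people.

    Assumed windows (hypothetical): P_week = 7 days ending on t;
    P_2d / P_4d = the 2 / 4 days immediately after t.
    """
    if t + 4 >= len(daily_new):
        raise ValueError("need at least 4 days of data after day t")
    scale = per / population
    windows = {
        "P_cum": sum(daily_new[: t + 1]),                 # all cases up to and including t
        "P_day": daily_new[t],                            # new cases on day t
        "P_week": sum(daily_new[max(0, t - 6): t + 1]),   # trailing 7 days
        "P_2d": sum(daily_new[t + 1: t + 3]),             # next 2 days (forecast target)
        "P_4d": sum(daily_new[t + 1: t + 5]),             # next 4 days (forecast target)
    }
    return {name: count * scale for name, count in windows.items()}


# Hypothetical daily new-case series for a town of 100,000 people.
print(prevalence_windows(list(range(1, 13)), t=7, population=100_000))
```

Under this reading, P2d and P4d are forward-looking targets, which is why matching them amounts to forecasting upcoming cases rather than estimating current prevalence.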