Jessilyn Dunn, Md Mobashir Hasan Shandhi, Peter Cho, Ali Roghanizad, Karnika Singh, Will Wang, Oana Enache, Amanda Stern, Rami Sbahi, Bilge Tatar, Sean Fiscus, Qi Xuan Khoo, Yvonne Kuo, Xiao Lu, Joseph Hsieh, Alena Kalodzitsa, Amir Bahmani, Arash Alavi, Utsab Ray, Michael Snyder, Geoffrey Ginsburg, Dana Pasquale, Christopher Woods, Ryan Shaw.
Abstract
Mass surveillance testing can help control outbreaks of infectious diseases such as COVID-19. However, diagnostic test shortages are prevalent globally and continue to occur in the US with the onset of new COVID-19 variants, demonstrating an unprecedented need for improving our current methods for mass surveillance testing. By targeting surveillance testing towards individuals who are most likely to be infected and, thus, increasing testing positivity rate (i.e., percent positive in the surveillance group), fewer tests are needed to capture the same number of positive cases. Here, we developed an Intelligent Testing Allocation (ITA) method by leveraging data from the CovIdentify study (6,765 participants) and the MyPHD study (8,580 participants), including smartwatch data from 1,265 individuals of whom 126 tested positive for COVID-19. Our rigorous model and parameter search uncovered the optimal time periods and aggregate metrics for monitoring continuous digital biomarkers to increase the positivity rate of COVID-19 diagnostic testing. We found that resting heart rate features distinguished between COVID-19 positive and negative cases earlier in the course of the infection than steps features, as early as ten and five days prior to the diagnostic test, respectively. We also found that including steps features increased the area under the receiver operating characteristic curve (AUC-ROC) by 7-11% when compared with RHR features alone, while including RHR features improved the AUC of the ITA model's precision-recall curve (AUC-PR) by 38-50% when compared with steps features alone. The best AUC-ROC (0.73 ± 0.14 and 0.77 on the cross-validated training set and independent test set, respectively) and AUC-PR (0.55 ± 0.21 and 0.24) were achieved by using data from a single device type (Fitbit) with high-resolution (minute-level) data. 
Finally, we show that ITA generates up to a 6.5-fold increase in the positivity rate in the cross-validated training set and up to a 3-fold increase in the positivity rate in the independent test set, including both symptomatic and asymptomatic (up to 27%) individuals. Our findings suggest that, if deployed on a large scale and without needing self-reported symptoms, the ITA method could improve allocation of diagnostic testing resources and reduce the burden of test shortages.
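The allocation idea in the abstract, ranking individuals by estimated infection risk and spending the available tests on the top of the ranking, can be sketched as follows. This is a minimal illustration, not the study's code: the function names `positivity_rate_top_k` and `fold_increase` and the toy scores are assumptions introduced here.

```python
def positivity_rate_top_k(scores, labels, k):
    """Positivity rate among the k individuals with the highest scores.

    scores: model-estimated infection risk (hypothetical values).
    labels: 1 for a positive diagnostic test, 0 for a negative one.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the cohort size")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def fold_increase(scores, labels, k):
    """Top-k positivity rate relative to random allocation (the prevalence)."""
    prevalence = sum(labels) / len(labels)
    return positivity_rate_top_k(scores, labels, k) / prevalence


# Toy cohort: 10 people, 2 of them positive (prevalence 20%).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(positivity_rate_top_k(toy_scores, toy_labels, 2))  # 1 of the top 2 is positive
print(fold_increase(toy_scores, toy_labels, 2))
```

With only two tests available, the toy ranking finds one positive, so the positivity rate is 50% against a 20% prevalence. Testing everyone (`k` equal to the cohort size) returns the prevalence, so the fold increase falls back to 1.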
Year: 2022; PMID: 35378754; PMCID: PMC8978951; DOI: 10.21203/rs.3.rs-1490524/v1
Source DB: PubMed; Journal: Res Sq
Figure 1. Overview of the Intelligent Testing Allocation (ITA) model, the CovIdentify cohort, and the data.
A. Overview of the ITA model in comparison to a Random Testing Allocation (RTA) model, demonstrating the benefit of ITA over existing RTA methods for improving the positivity rate of diagnostic testing in resource-limited settings. B. A total of 7,348 participants were recruited following informed consent in the CovIdentify study, of whom 1,289 reported COVID-19 diagnostic tests (1,157 diagnosed as negative and 132 diagnosed as positive for COVID-19). C. The top panel shows the time-averaged step count and the bottom panel shows the time-averaged resting heart rate (RHR) of all participants (n=50) in the training set (Extended Data Fig 2, blue) who tested positive for COVID-19, with the pre-defined baseline (between −60 and −22 days from the diagnostic test) and detection (between −21 and −1 days from the diagnostic test) periods marked with vertical black dashed lines. The dark green dashed lines and the light green dash-dotted lines display the baseline period mean and ± 2 standard deviations from the baseline mean, respectively. The light purple dashed vertical line shows the diagnostic test date.
Summary of the Cohorts.
Total refers to training + test data
| Cohort | Total N (Test N) | Total COVID+ (Test) | Total COVID− (Test) |
|---|---|---|---|
| AF | 520 (105) | 63 (13) | 457 (92) |
| | 469 (97) | 54 (11) | 415 (86) |
| FHF | 280 (63) | 40 (7) | 240 (56) |
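Under random allocation the expected positivity rate is the prevalence in the group being tested, so the counts in the table above fix the baseline that any targeted method has to beat. The short calculation below works this out for each row. The variable and function names are illustrative, and rows are identified by their total N rather than by cohort name.

```python
# (total N, total COVID+, test N, test COVID+) for each row of the cohort table.
cohort_rows = [
    (520, 63, 105, 13),
    (469, 54, 97, 11),
    (280, 40, 63, 7),
]


def prevalence(n_positive, n_total):
    """Expected positivity rate under random testing allocation."""
    return n_positive / n_total


for total_n, total_pos, test_n, test_pos in cohort_rows:
    # Training counts are the totals minus the held-out test counts.
    train_n, train_pos = total_n - test_n, total_pos - test_pos
    print(
        f"N={total_n}: training {prevalence(train_pos, train_n):.1%}, "
        f"test {prevalence(test_pos, test_n):.1%}"
    )
```

For the first row this leaves 415 training participants, 50 of them positive, which matches the training-set counts quoted in the Figure 3 caption.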
Features Extracted from the Digital Biomarkers (DBs) for the Development of the ITA Algorithm
| Metric | Definition | Equation |
|---|---|---|
| Delta (Δ) | Deviation in digital biomarker from baseline median value | $DB_{\text{Detection}} - DB_{\text{Baseline, Median}}$ |
| Delta_Normalized | Delta normalized by baseline median value | $(DB_{\text{Detection}} - DB_{\text{Baseline, Median}}) / DB_{\text{Baseline, Median}}$ |
| Delta_Standardized | Delta standardized by baseline median and interquartile range (IQR) | $(DB_{\text{Detection}} - DB_{\text{Baseline, Median}}) / DB_{\text{Baseline, IQR}}$ |
| Z-score | Deviation in digital biomarker from baseline mean, standardized by baseline standard deviation (SD) | $(DB_{\text{Detection}} - DB_{\text{Baseline, Mean}}) / DB_{\text{Baseline, SD}}$ |
| Average | Average of inter-day deviation metrics | |
| Median | Median of inter-day deviation metrics | |
| Maximum | Maximum of inter-day deviation metrics | |
| Minimum | Minimum of inter-day deviation metrics | |
| Range | Range of inter-day deviation metrics | |
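To make the table concrete, the sketch below computes the four deviation metrics for each detection-period day and then summarizes them across days. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the use of Python's `statistics` module (with its default quartile method for the IQR and the sample standard deviation), and the toy RHR values are all choices made here.

```python
import statistics


def deviation_metrics(baseline, detection_value):
    """Deviation of one detection-day value from the baseline period."""
    median = statistics.median(baseline)
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample SD; an assumption
    q1, _, q3 = statistics.quantiles(baseline, n=4)  # default quartile method; an assumption
    delta = detection_value - median
    return {
        "delta": delta,
        "delta_normalized": delta / median,
        "delta_standardized": delta / (q3 - q1),
        # Uses the baseline mean, following the Definition column of the table.
        "z_score": (detection_value - mean) / sd,
    }


def summarize(values):
    """Summary statistics over the daily deviation metrics."""
    return {
        "average": statistics.mean(values),
        "median": statistics.median(values),
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
    }


# Toy resting heart rate in beats per minute; the values are made up.
baseline_rhr = [60, 62, 61, 59, 60, 63, 61, 60]
detection_rhr = [61, 64, 66]

daily = [deviation_metrics(baseline_rhr, v) for v in detection_rhr]
features = {
    f"{stat}_{metric}": value
    for metric in daily[0]
    for stat, value in summarize([d[metric] for d in daily]).items()
}
print(features["maximum_delta"], features["range_delta"])
```

Four deviation metrics times five summary statistics give 20 features per digital biomarker in this sketch. A constant baseline would make the IQR or SD zero, so real data would need a guard against division by zero.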
Figure 2. Overview of digital biomarker exploration and feature engineering for the ITA model development on the AF cohort.
A. Time-series plot of the deviation in digital biomarkers (ΔSteps and ΔRHR) in the detection window compared to baseline periods, between the participants diagnosed as COVID-19 positive and negative. The horizontal dashed line displays the baseline median and the confidence bounds show the 95% confidence intervals. B. Heatmaps of steps and RHR features that are statistically significantly different (p-value < 0.05) in a grid search over combinations of detection end date (DED) and detection window length (DWL), with green boxes showing p-values < 0.05 and gray boxes showing p-values ≥ 0.05. The p-values are adjusted with the Benjamini-Hochberg method for multiple hypothesis correction. C. Summary of the significant features (p-value < 0.05) from B, with each box showing the number of statistically significant features for the different combinations of DED and DWL. The intersection of the significant features across DWLs of 3 and 5 days with a common DED of 1 day prior to the test date (shown by the black rectangle) was used for the ITA model development. D. Box plots comparing the distribution of the two most significant steps and RHR features between the participants diagnosed as COVID-19 positive and negative.
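The Benjamini-Hochberg adjustment mentioned in panel B can be sketched in a few lines. This is a generic textbook implementation rather than the study's code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five features.
raw = [0.001, 0.008, 0.039, 0.041, 0.27]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted)
print(significant)
```

In this toy input, four raw p-values fall below 0.05 but only two remain below it after adjustment, which is the kind of filtering the green and gray boxes in the heatmaps reflect.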
Figure 3. Prediction and ranking results of the ITA models on both the training (A-C) and test sets (D-F) for the AF cohort.
A. Receiver operating characteristic curves (ROCs) and B. precision-recall curves (PRCs) for the discrimination between COVID-19 positive participants (n=50) and negative participants (n=365) in the training set. The gray area shows one standard deviation from the mean of the ROCs/PRCs generated from 10-fold nested cross-validation on the training set and the red dashed line shows the results based on a Random Testing Allocation (RTA) model (the null model). C. The positivity rate of the diagnostic testing subpopulation as determined by ITA given a specific number of available diagnostic tests. The red dashed line displays the positivity rate/pre-test probability of an RTA (null) model. D. ROC and E. PRC for the discrimination between COVID-19 positive participants (n=13) and negative participants (n=92) in the test set. The red dashed line shows the results based on an RTA model. F. Positivity rate of the diagnostic testing subpopulation as determined by ITA given a specific number of available diagnostic tests. The red dashed line shows the positivity rate of an RTA (null) model.
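For readers less familiar with the two summary metrics, the sketch below computes AUC-ROC as the probability that a randomly chosen positive participant is scored above a randomly chosen negative one, and approximates AUC-PR by average precision. Both functions are generic illustrations with hypothetical names and toy data. They do not reproduce the 10-fold nested cross-validation described in the caption, and the study may have integrated the curves differently.

```python
def auc_roc(scores, labels):
    """Probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


# Toy data: 2 positives among 5 participants.
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.2]
toy_labels = [1, 0, 1, 0, 0]

print(auc_roc(toy_scores, toy_labels))
print(average_precision(toy_scores, toy_labels))
```

An uninformative ranking scores about 0.5 on AUC-ROC, while its precision stays near the prevalence at every recall level. This is why the RTA reference is the diagonal in the ROC panels but a horizontal line at the cohort prevalence in the PRC panels.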
Figure 4. Prediction and ranking results of the ITA models on both the training (A-C) and test sets (D-F) for the participants with FHF wearable data.
A. Receiver operating characteristic curves (ROCs) and B. precision-recall curves (PRCs) for the discrimination between COVID-19 positive participants (n=33) and negative participants (n=184) in the training set. The gray area shows one standard deviation from the mean of the ROCs/PRCs generated from 10-fold nested cross-validation on the training set and the red dashed line shows the results based on a Random Testing Allocation (RTA) model (the null model). C. The positivity rate of the diagnostic testing subpopulation as determined by ITA given a specific number of available diagnostic tests. The red dashed line displays the positivity rate/pre-test probability of an RTA (null) model. D. ROC and E. PRC for the discrimination between COVID-19 positive participants (n=7) and negative participants (n=56) in the test set. The red dashed line shows the results based on an RTA model. F. Positivity rate of the diagnostic testing subpopulation as determined by ITA given a specific number of available diagnostic tests. The red dashed line shows the positivity rate of an RTA (null) model.