Liane S Canas¹, Carole H Sudre², Joan Capdevila Pujol³, Lorenzo Polidori³, Benjamin Murray⁴, Erika Molteni⁴, Mark S Graham⁴, Kerstin Klaser⁴, Michela Antonelli⁴, Sarah Berry⁵, Richard Davies³, Long H Nguyen⁶, David A Drew⁶, Jonathan Wolf³, Andrew T Chan⁶, Tim Spector⁵, Claire J Steves⁵, Sebastien Ourselin⁴, Marc Modat⁴.
Abstract
BACKGROUND: Self-reported symptoms during the COVID-19 pandemic have been used to train artificial intelligence models to identify possible infection foci. To date, these models have only considered the culmination or peak of symptoms, which is not suitable for the early detection of infection. We aimed to estimate the probability of an individual being infected with SARS-CoV-2 on the basis of early self-reported symptoms to enable timely self-isolation and urgent testing.
Year: 2021 PMID: 34334333 PMCID: PMC8321433 DOI: 10.1016/S2589-7500(21)00131-X
Source DB: PubMed Journal: Lancet Digit Health ISSN: 2589-7500
Demographic information of the study population
| | SARS-CoV-2 positive, 1 day | SARS-CoV-2 positive, 2 days | SARS-CoV-2 positive, 3 days | SARS-CoV-2 negative, 1 day | SARS-CoV-2 negative, 2 days | SARS-CoV-2 negative, 3 days |
|---|---|---|---|---|---|---|
| Participants | | | | | | |
| Training set | 1965 (1·3%) | 1057 (1·7%) | 997 (1·9%) | 144 490 (98·7%) | 60 114 (98·3%) | 52 532 (98·1%) |
| Testing set | 1158 (7·7%) | 752 (11·1%) | 679 (13·3%) | 13 891 (92·3%) | 5993 (88·9%) | 4439 (86·7%) |
| Male | | | | | | |
| Training set | 537 (27·3%) | 276 (26·1%) | 262 (26·3%) | 36 601 (25·3%) | 13 889 (23·1%) | 11 901 (22·7%) |
| Testing set | 334 (28·8%) | 211 (28·1%) | 193 (28·4%) | 3422 (24·6%) | 1342 (22·4%) | 1001 (22·6%) |
| Female | | | | | | |
| Training set | 1428 (72·7%) | 781 (73·9%) | 735 (73·7%) | 107 889 (74·7%) | 46 225 (76·9%) | 40 631 (77·3%) |
| Testing set | 824 (71·2%) | 541 (71·9%) | 486 (71·6%) | 10 469 (75·4%) | 4651 (77·6%) | 3438 (77·4%) |
| Age, years | | | | | | |
| Training set | 46·7 (14·3) | 46·9 (14·3) | 46·5 (14·2) | 49·3 (13·2) | 49·4 (13·0) | 49·6 (12·8) |
| Testing set | 50·3 (12·7) | 50·0 (12·5) | 50·0 (12·6) | 50·8 (12·8) | 51·2 (12·6) | 51·2 (12·5) |
| BMI, kg/m² | | | | | | |
| Training set | 27·4 (6·9) | 27·4 (6·8) | 27·4 (6·8) | 27·2 (7·0) | 27·1 (6·9) | 27·2 (6·9) |
| Testing set | 27·8 (7·0) | 27·9 (7·2) | 27·7 (7·1) | 27·1 (6·9) | 27·0 (6·8) | 27·1 (6·9) |
| Health-care workers | | | | | | |
| Training set | 189 (9·6%) | 125 (11·8%) | 113 (11·3%) | 7045 (4·9%) | 2985 (5·0%) | 2463 (4·7%) |
| Testing set | 48 (4·1%) | 32 (4·3%) | 30 (4·4%) | 649 (4·7%) | 266 (4·4%) | 179 (4·0%) |
Data are n (%) or mean (SD). Data are stratified by the number of days after symptom onset. BMI=body-mass index.
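For the participant counts (the training-set and testing-set rows at the top of the table), denominators are the total number of participants in each set on each day.
For the remaining count rows, denominators are the numbers of training-set or testing-set participants who are SARS-CoV-2-positive or SARS-CoV-2-negative on each day, as given in the participant counts.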
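The two denominator rules can be checked directly against the table. The short Python sketch below recomputes a few of the reported percentages from the raw counts; the helper name `pct` and the particular cells chosen are illustrative only.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

# (numerator, denominator, percentage reported in the table)
checks = {
    # Participant counts: denominator = all participants in the set that day.
    "training, positive, 1 day": (1965, 1965 + 144_490, 1.3),
    "testing, positive, 2 days": (752, 752 + 5993, 11.1),
    # Other count rows: denominator = positive (or negative) participants that day.
    "training, male among positive, 1 day": (537, 1965, 27.3),
    "training, female among positive, 1 day": (1428, 1965, 72.7),
}

for label, (num, den, reported) in checks.items():
    computed = pct(num, den)
    print(f"{label}: {computed}% (reported {reported}%)")
    assert computed == reported
```

Each recomputed value matches the table, which confirms that the participant rows use the whole set as denominator while the sex rows use the positive or negative group.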
Overall performance metrics in the test set
| Model | Sensitivity, high-sensitivity threshold | Sensitivity, high-specificity threshold | Sensitivity, optimal threshold | Specificity, high-sensitivity threshold | Specificity, high-specificity threshold | Specificity, optimal threshold | AUC |
|---|---|---|---|---|---|---|---|
| Logistic regression | 0·87 (0·02; 0·85–0·89) | 0·29 (0·02; 0·27–0·30) | 0·43 (0·06; 0·40–0·49) | 0·22 (0·03; 0·19–0·24) | 0·89 (<0·01; 0·89–0·89) | 0·77 (0·05; 0·73–0·81) | 0·64 (0·01; 0·63–0·65) |
| Hierarchical Gaussian process | 0·95 (0·01; 0·94–0·95) | 0·49 (0·04; 0·46–0·53) | 0·76 (0·06; 0·71–0·80) | 0·16 (0·02; 0·15–0·20) | 0·83 (0·02; 0·81–0·85) | 0·57 (0·08; 0·51–0·64) | 0·73 (<0·01; 0·73–0·74) |
| Logistic regression | 0·91 (0·01; 0·90–0·92) | 0·34 (0·01; 0·33–0·35) | 0·58 (0·06; 0·52–0·63) | 0·24 (0·03; 0·21–0·27) | 0·90 (0·01; 0·90–0·91) | 0·73 (0·06; 0·68–0·79) | 0·71 (0·01; 0·70–0·71) |
| Hierarchical Gaussian process | 0·94 (0·01; 0·93–0·95) | 0·57 (0·04; 0·54–0·59) | 0·75 (0·04; 0·72–0·78) | 0·29 (0·03; 0·27–0·31) | 0·84 (0·01; 0·83–0·86) | 0·68 (0·04; 0·64–0·71) | 0·79 (<0·01; 0·78–0·79) |
| NHS algorithm | .. | .. | 0·60 (0·02; 0·58–0·62) | .. | .. | 0·75 (<0·01; 0·75–0·75) | 0·67 (<0·01; 0·67–0·67) |
| Logistic regression | 0·91 (0·03; 0·88–0·94) | 0·36 (0·02; 0·34–0·37) | 0·59 (0·06; 0·54–0·65) | 0·31 (0·04; 0·26–0·33) | 0·91 (0·01; 0·90–0·91) | 0·76 (0·06; 0·71–0·81) | 0·74 (0·01; 0·74–0·75) |
| Hierarchical Gaussian process | 0·95 (0·01; 0·93–0·95) | 0·59 (0·03; 0·57–0·61) | 0·73 (0·05; 0·69–0·77) | 0·31 (0·04; 0·28–0·35) | 0·85 (0·01; 0·84–0·86) | 0·72 (0·02; 0·70–0·73) | 0·80 (<0·01; 0·80–0·81) |
Data are mean (SD; 95% CI). For the NHS algorithm, symptoms could be recorded on any of the 3 days. A Mann–Whitney U test was used to assess statistical significance. AUC=area under the receiver operating characteristic curve. NHS=National Health Service.
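The table reports sensitivity and specificity at three operating points plus the AUC. As a minimal sketch, not the study's code or data, the plain-Python functions below compute sensitivity and specificity at a decision threshold, pick thresholds for a target sensitivity or specificity, choose an "optimal" threshold, and compute the AUC. Two assumptions are made: the optimal threshold is taken to maximise Youden's J (sensitivity + specificity − 1), and the targets 0·75 and 0·9 are arbitrary; the record does not state how the authors defined either. The AUC is computed as the normalised Mann–Whitney U statistic, a standard equivalence; in the table itself the Mann–Whitney U test is used for a different purpose, namely to compare models. The scores and labels are hypothetical.

```python
from itertools import product

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def threshold_for_sensitivity(scores, labels, target):
    """Highest candidate threshold whose sensitivity still reaches the target."""
    return max(t for t in set(scores) if sens_spec(scores, labels, t)[0] >= target)

def threshold_for_specificity(scores, labels, target):
    """Lowest candidate threshold whose specificity still reaches the target."""
    return min(t for t in set(scores) if sens_spec(scores, labels, t)[1] >= target)

def youden_threshold(scores, labels):
    """Assumed 'optimal' threshold: maximises Youden's J = sens + spec - 1."""
    return max(sorted(set(scores)), key=lambda t: sum(sens_spec(scores, labels, t)) - 1)

def auc_mann_whitney(scores, labels):
    """AUC as U / (n_pos * n_neg): the probability that a random positive
    outscores a random negative, with ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    u = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return u / (len(pos) * len(neg))

# Hypothetical predicted probabilities and test results (1 = SARS-CoV-2 positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

t_opt = youden_threshold(scores, labels)
print("optimal threshold:", t_opt, "sens/spec:", sens_spec(scores, labels, t_opt))
print("high-sensitivity threshold:", threshold_for_sensitivity(scores, labels, 0.75))
print("high-specificity threshold:", threshold_for_specificity(scores, labels, 0.9))
print("AUC:", auc_mann_whitney(scores, labels))
```

With these made-up scores, 21 of the 24 positive–negative pairs are ranked correctly, so the AUC is 0.875. The Youden threshold of 0.4 gives a sensitivity of 1.0 and a specificity of about 0.67.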
p<0·01.
p<0·05.
Statistically different from the hierarchical Gaussian process model.
Statistically different from the hierarchical Gaussian process model and the logistic regression model proposed by Menni and colleagues.
Figure 1: Feature relevance by occupation
Symptoms are grouped according to their clinical manifestations: gastrointestinal symptoms and other symptoms (yellow sector), flu-like symptoms (green sector), neurological symptoms (purple sector), and cardiac and respiratory symptoms (white sector). The grey line represents overall symptom relevance without stratification. Points further from the centre correspond to higher relevance. Relevance is normalised for direct interpretation.
Figure 2: Feature relevance by sex
Symptoms are grouped according to their clinical manifestations: gastrointestinal symptoms and other symptoms (yellow sector), flu-like symptoms (green sector), neurological symptoms (purple sector), and cardiac and respiratory symptoms (white sector). The grey line represents overall symptom relevance without stratification. Points further from the centre correspond to higher relevance. Relevance is normalised for direct interpretation.
Figure 3: Feature relevance by age group
Symptoms are grouped according to their clinical manifestations: gastrointestinal symptoms and other symptoms (yellow sector), flu-like symptoms (green sector), neurological symptoms (purple sector), and cardiac and respiratory symptoms (white sector). The grey line represents overall symptom relevance without stratification. Points further from the centre correspond to higher relevance. Relevance is normalised for direct interpretation.
Figure 4: Feature relevance by BMI category
Symptoms are grouped according to their clinical manifestations: gastrointestinal symptoms and other symptoms (yellow sector), flu-like symptoms (green sector), neurological symptoms (purple sector), and cardiac and respiratory symptoms (white sector). The grey line represents overall symptom relevance without stratification. Points further from the centre correspond to higher relevance. Relevance is normalised for direct interpretation. BMI=body-mass index.
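The figure captions say only that relevance is normalised. One common convention, assumed here rather than taken from the paper, is to divide each symptom's relevance by the largest value in its stratum. Under that convention the most relevant symptom plots at the outer edge (1·0) of the radar chart. The symptom names, their sector assignments, and the raw scores in this sketch are all hypothetical.

```python
# Hypothetical symptom-to-sector mapping, loosely following the four sector
# groups named in the figure captions (symptom names are illustrative only).
SECTORS = {
    "diarrhoea": "gastrointestinal and other",
    "fever": "flu-like",
    "loss of smell": "neurological",
    "persistent cough": "cardiac and respiratory",
}

def normalise(relevance: dict[str, float]) -> dict[str, float]:
    """Scale scores so the largest relevance in the stratum becomes 1.0
    (assumed max-normalisation; the paper's exact scheme is not given)."""
    top = max(relevance.values())
    return {symptom: score / top for symptom, score in relevance.items()}

# Hypothetical raw relevance scores for a single stratum (e.g. one age group).
raw = {"diarrhoea": 0.8, "fever": 1.6, "loss of smell": 4.0, "persistent cough": 2.0}

# List symptoms from most to least relevant, with the sector each belongs to.
for symptom, value in sorted(normalise(raw).items(), key=lambda kv: -kv[1]):
    print(f"{SECTORS[symptom]:<27} {symptom:<17} {value:.2f}")
```

Normalising within each stratum makes the shape of the profile comparable across strata. The trade-off is that absolute differences in relevance between groups are hidden.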