Md S Bhuiyan1, Ben J Brintz2, Alana L Whitcombe3,4, Alena J Markmann5, Luther A Bartelt5, Nicole J Moreland3,4, Andrew S Azman6,7, Daniel T Leung1,8.
Abstract
Serosurveillance is an important epidemiologic tool for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), used to estimate infection rates and the degree of population immunity. There is no general agreement on which antibody biomarker(s) should be used, especially with the rollout of vaccines globally. Here, we used random forest models to demonstrate that a single spike or receptor-binding domain (RBD) antibody was adequate for classifying prior infection, while a combination of two antibody biomarkers performed better than any single marker for estimating time-since-infection. Nucleocapsid antibodies performed worse than spike or RBD antibodies for classification, but can be useful for estimating time-since-infection and for distinguishing infection-induced from vaccine-induced responses. Our analysis has the potential to inform the design of serosurveys for SARS-CoV-2, including decisions regarding the number of antibody biomarkers measured.
Keywords: Antibody; COVID-19; SARS-CoV-2; machine learning; seroprevalence; serosurveillance
Year: 2022 PMID: 35068405 PMCID: PMC8795773 DOI: 10.1017/S0950268821002764
Source DB: PubMed Journal: Epidemiol Infect ISSN: 0950-2688 Impact factor: 2.451
Summary of the characteristics of the datasets used in this analysis, with cross-validated AUC (95% CI) for classifying previous infection on four of the published datasets
| | Dan | Peluso | Whitcombe | Markmann | Isho |
|---|---|---|---|---|---|
| No. of patients | 101 | 122 | 112 | 156 | 343 |
| Age, Median (IQR) | 38 (29.5–46.5) | 48 (19–85) | 48 (31–57) | 46.5 (32–60) | 36 (21–44) |
| Sex (male, %) | 45 | 54.9 | 43.7 | 44.0 | 52.1 |
| % patients with severe disease (hospitalised) | 7 | 24.2 | 13.7 | 17.7 | n/a |
| Median time since infection (IQR) | 62 (47.5–87) | 115 (95–124) | 109 (94.7–190.5) | 57 (49–79) | 72 (51–86) |
| Antibody isotypes measured | IgA, IgG | IgG | IgA, IgG, IgM | IgA, IgG, IgM | IgA, IgG, IgM |
| Immunoassay platform | ELISA | ELISA/Luminex | Luminex | ELISA/Luminex | ELISA |
| RBD IgG | n/a | 97.8 (97.2–98.5) | 99.7 (99.4–99.9) | 98.7 (97.9–99.5) | 96.9 (96.6–97.3) |
| Spike IgG | n/a | 98.8 (98.3–99.2) | 99.7 (99.5–99.9) | n/a | 99.1 (98.8–99.3) |
| Nucleocapsid IgG | n/a | 97.9 (97.4–98.4) | 93.9 (93.2–94.5) | 98.9 (98.2–99.5) | 94.4 (93.9–94.9) |
| Nucleocapsid IgG vs. best of spike/RBD IgG. Mean ( | n/a | 0.27 (0.27), 0.24 | 0.38 (0.28), 0.11 | 0.36 (0.30), 0.15 | 0.40 (0.28), 0.1 |
| Best two biomarkers | n/a | 98.8 (98.4–99.2) | 99.9 (99.9–99.9) | 99.2 (98.6–99.8) | 99.4 (99.2–99.5) |
| Spike/RBD IgG vs. best two biomarkers. Mean ( | n/a | 0.62 (0.31), 0 | 0.28 (0.28), 0.22 | 0.23 (0.29), 0.28 | 0.19 (0.26), 0.48 |
| Best three biomarkers | n/a | 99.0 (98.7–99.4) | 99.9 (99.9–99.9) | n/a | 99.2 (99.1–99.4) |
| Best two vs. best three biomarkers. Mean ( | n/a | 0.30 (0.31), 0.11 | 0.35 (0.29), 0.14 | n/a | 0.16 (0.25), 0.48 |
| Best two nucleocapsid biomarkers | n/a | n/a | 95.0 (94.5–95.5) | n/a | 95.9 (95.5–96.2) |
| Full/saturated model | n/a | 99.0 (98.8–99.3) | 99.9 (99.9–99.9) | n/a | 99.4 (99.3–99.5) |
Rows whose names begin with ‘Best’ include a screening step in which the biomarkers are ordered by importance for classification (ever-infected) using the random forest conditional permutation algorithm; only the top-ranked biomarkers from that iteration are used when training the model.
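The screening step described in this note can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps in plain (unconditional) permutation importance and a toy nearest-centroid scorer in place of the random forest conditional permutation algorithm, and it runs on synthetic data. The marker names, the `make_demo_data` helper, and all parameter values are hypothetical.

```python
import random

# Hypothetical biomarker names, used only for this illustration.
NAMES = ["marker_a", "marker_b", "noise"]

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_demo_data(n_per_class=100, seed=1):
    """Synthetic data: two informative markers and one pure-noise marker."""
    rng = random.Random(seed)
    shifts = [2.0, 1.5, 0.0]  # assumed class separation for each marker
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * s, 1.0) for s in shifts])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Toy stand-in for the classifier: per-class mean of each marker."""
    def mean_rows(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean_rows([x for x, t in zip(X, y) if t == 1])
    neg = mean_rows([x for x, t in zip(X, y) if t == 0])
    return pos, neg

def predict_scores(model, X):
    """Higher score means closer to the 'infected' centroid."""
    pos, neg = model
    def d2(x, c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return [d2(x, neg) - d2(x, pos) for x in X]

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC when one marker's column is shuffled."""
    rng = random.Random(seed)
    base = auc(predict_scores(model, X), y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - auc(predict_scores(model, Xp), y))
        importances.append(sum(drops) / n_repeats)
    return importances

def screen_top_k(names, importances, k):
    """Keep the k markers with the largest importance."""
    order = sorted(range(len(names)), key=lambda j: importances[j], reverse=True)
    return [names[j] for j in order[:k]]

X, y = make_demo_data()
model = fit_centroids(X, y)
top_two = screen_top_k(NAMES, permutation_importance(model, X, y), k=2)
print(top_two)
```

In a full analysis the retained markers would then be used to refit the model inside each cross-validation iteration, so that the screening does not see the held-out samples.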
Mean (standard deviation) of the mean absolute error (MAE) in predicting time since infection, from repeated cross-validation on five published datasets
| | Dan | Peluso | Whitcombe | Markmann | Isho |
|---|---|---|---|---|---|
| Antibody isotypes measured | IgA, IgG | IgG | IgA, IgG, IgM | IgA, IgG, IgM | IgA, IgG, IgM |
| Best RBD IgG | 18.7 (1.7) | 29.2 (3.7) | 59.8 (5.4) | 21.4 (5.7) | 17.3 (1.1) |
| Best Spike IgG | 19.6 (1.9) | 26.6 (3.6) | 61.1 (6.9) | n/a | 17.5 (1.3) |
| Nucleocapsid IgG | 18.8 (2.0) | 23.9 (4.0) | 55.9 (7.4) | 21.2 (6.3) | 18.3 (1.4) |
| Best two biomarkers | 17.1 (1.9) | 22.6 (4.4) | 53.1 (5.1) | 20.9 (5.3) | 15.7 (1.4) |
| Best three biomarkers | 17.1 (1.9) | 22.4 (4.0) | 52.5 (5.3) | n/a | 15.3 (1.4) |
| Best two nucleocapsid biomarkers | n/a | 24.7 (3.9) | 51.3 (7.1) | n/a | 17.5 (1.1) |
| Full/saturated model | 17.8 (1.9) | 22.2 (3.8) | 51.5 (6.3) | 20.6 (5.9) | 15.1 (1.1) |
Rows whose names begin with ‘Best’ include a screening step in which the biomarkers are ordered by importance for time-since-infection using the random forest conditional permutation algorithm; only the top-ranked biomarkers from that iteration are used when training the model. A lower MAE indicates better performance.
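As a rough illustration of how a "mean (standard deviation) of MAE from repeated cross-validation" can be obtained, the sketch below runs repeated k-fold cross-validation on synthetic data. It is an assumption-laden stand-in, not the authors' method: a one-predictor least-squares line replaces the random forest regression, the summary is taken over repeats, and the fold count, repeat count, and simulated antibody decay are all hypothetical.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return statistics.fmean([abs(a - b) for a, b in zip(y_true, y_pred)])

def repeated_cv_mae(xs, ys, k=5, repeats=10, seed=0):
    """Return (mean, SD) of the per-repeat cross-validated MAE."""
    rng = random.Random(seed)
    n = len(xs)
    per_repeat = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # new random partition each repeat
        folds = [idx[i::k] for i in range(k)]
        fold_maes = []
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in idx if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            preds = [slope * xs[i] + intercept for i in test_idx]
            fold_maes.append(mae([ys[i] for i in test_idx], preds))
        per_repeat.append(statistics.fmean(fold_maes))
    return statistics.fmean(per_repeat), statistics.stdev(per_repeat)

# Synthetic example: an antibody level that declines with days since infection.
rng = random.Random(42)
days = [rng.uniform(30, 200) for _ in range(120)]
antibody = [5.0 - 0.01 * d + rng.gauss(0, 0.3) for d in days]

# Predict time since infection (days) from the antibody level.
mean_mae, sd_mae = repeated_cv_mae(antibody, days)
print(f"MAE: {mean_mae:.1f} ({sd_mae:.1f}) days")
```

Comparing such summaries across candidate marker sets mirrors how the rows of the table can be read: a marker set is preferred when its mean MAE is lower.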
Fig. 1. Conditional permutation variable importance from random forest regression measured by mean decrease in accuracy. Negative importance indicates that the variable's inclusion has decreased mean accuracy, probably due to overfitting or random error. Each column represents the order of importance of biomarkers in five datasets. In the Peluso et al. dataset, S_Ortho_Ig and S_Ortho_IgG indicate total Ig and S IgG by Ortho Clinical Diagnostics VITROS kits; N_abbott indicates Abbott ARCHITECT (IgG); S_DiaSorin is Spike IgG by DiaSorin LIAISON (IgG); Neu_Monogram is Monogram PhenoSense (neutralising antibodies); RBD_LIPS, S_LIPS, N_LIPS are IgG by Luciferase Immunoprecipitation System (LIPS); RBD_Split_Luc, N_Split_Lum, S_Lum, N.full_Lum, N.frag_Lum indicate IgG to the respective antigens by Luminex assay.