Literature DB >> 32882315

The estimation of diagnostic accuracy of tests for COVID-19: A scoping review.

Dierdre B Axell-House1, Richa Lavingia2, Megan Rafferty3, Eva Clark4, E Susan Amirian5, Elizabeth Y Chiao6.   

Abstract

OBJECTIVES: To assess the methodologies used in the estimation of diagnostic accuracy of SARS-CoV-2 real-time reverse transcription polymerase chain reaction (rRT-PCR) and other nucleic acid amplification tests (NAATs) and to evaluate the quality and reliability of the studies employing those methods.
METHODS: We conducted a systematic search of English-language articles published December 31, 2019-June 19, 2020. Studies of any design that performed tests on ≥10 patients and reported or inferred correlative statistics were included. Studies were evaluated using elements of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) guidelines.
RESULTS: We conducted a narrative and tabular synthesis of studies organized by their reference standard strategy or comparative agreement method, resulting in six categorizations. Critical study details were frequently unreported, including the mechanism for patient/sample selection and researcher blinding to results, which leads to concern for bias.
CONCLUSIONS: Current studies estimating the test performance characteristics of SARS-CoV-2 tests have imperfect study designs and statistical methods. The included studies employ heterogeneous methods and overall have an increased risk of bias. Employing standardized guidelines for study designs and statistical methods will improve the process for developing and validating rRT-PCR and NAATs for the diagnosis of COVID-19.
Copyright © 2020. Published by Elsevier Ltd.

Entities:  

Keywords:  COVID-19; Diagnostic accuracy; QUADAS-2; SARS-CoV-2; Sensitivity; Specificity

Mesh:

Substances:

Year:  2020        PMID: 32882315      PMCID: PMC7457918          DOI: 10.1016/j.jinf.2020.08.043

Source DB:  PubMed          Journal:  J Infect        ISSN: 0163-4453            Impact factor:   6.072


Introduction

After its emergence in December 2019, the virus now known as SARS-CoV-2 was identified and sequenced in early January 2020, allowing for the rapid development of diagnostic testing based on the detection of viral nucleic acid (i.e., real-time reverse transcription polymerase chain reaction [rRT-PCR]). Because infected patients can present with non-specific symptoms or be asymptomatic, the development of accurate diagnostic tests for both clinical and epidemiological purposes was a crucial step in the response to the COVID-19 pandemic. In the United States, the spread of SARS-CoV-2 rapidly outpaced the capacity to test for it, leading the Food and Drug Administration (FDA) to relax regulatory requirements to increase testing availability. The FDA granted the first Emergency Use Authorization (EUA) for a SARS-CoV-2 rRT-PCR diagnostic test on February 4, 2020. Consequently, hundreds of tests for SARS-CoV-2, among them rRT-PCRs, other types of nucleic acid amplification tests (NAATs), and automated and/or multiplex methods based on proprietary platforms, obtained EUAs. As of August 4, 2020, the FDA had granted EUAs to 203 diagnostic tests, including 166 molecular tests, 35 antibody assays, and 2 antigen tests. Although the FDA began requiring the submission of validation methods and results as part of the EUA application for SARS-CoV-2 diagnostic tests, these tests were not initially required to undergo the rigorous assessment that would normally be part of the FDA approval process. Researchers also began developing alternative nucleic acid-based methodologies to detect SARS-CoV-2, including reverse-transcription loop-mediated isothermal amplification (RT-LAMP), and others.
Concurrently with rapid test production, publications emerged reporting clinical diagnostic test performance characteristics, such as “sensitivity” and “specificity”, though some lacked the rigorous methodologies usually required to formally estimate diagnostic accuracy. Here we present a scoping review of the literature with two main objectives: 1) to assess the methodologies used in the estimation of diagnostic accuracy of SARS-CoV-2 tests and 2) to evaluate the quality and reliability of the studies employing those methods.

Methods

Data sources and searches

Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we searched MEDLINE (Ovid), EMBASE (Elsevier), Scopus, Web of Science, CINAHL, and PubMed, covering December 1, 2019 through June 19, 2020. The following search string was used: (2019-nCOV or SARS-CoV-2 or SARS-CoV2 or COVID-19 or COVID19 or COVID) and (“positive agreement” or “negative agreement” or “overall agreement” or “diagnostic accuracy” or “positive rate” or “positivity rate” or “test performance” or “reference standard” or “gold standard” or sensitivity or specificity or “percent agreement” or “concordance” or “test agreement” or “predictive value” or "false negative" or “false positive”) and (“polymerase chain reaction” or PCR or “reverse transcriptase” or “nucleic acid amplification test” or NAAT or isothermal or “RT-LAMP” or “RT-PCR” or “molecular test”). The literature hub LitCovid's “Diagnosis” section was screened in its entirety once and then daily thereafter for relevant titles.

Study selection

We liberally screened articles by title and abstract for further evaluation. Articles were included if they met the following criteria on screening: 1) peer-reviewed publication; 2) the study evaluated the diagnostic test accuracy of a NAAT; 3) the diagnostic test was performed on ≥10 patients; 4) diagnostic/clinical sensitivity, specificity, other correlative statistics, or test positive rate were either identified by name or included in the publication as a numerical value whose calculation we could reproduce. Exclusion criteria were: 1) pre-print status; 2) guidelines, consensus statements, reviews, opinion pieces, and other summary articles; 3) entirely pregnant or pediatric populations; 4) overlap of the study population with another included publication.

Data extraction and quality assessment

Four authors independently extracted data and two authors reviewed data for accuracy. For study characteristics, we extracted: first author name, country, study design, patient population, total number of patients or samples included in test performance calculations, and number of cases according to rRT-PCR (Tables 1–5) or total number of cases based on a positive result on any platform tested (Table 6). For patient characteristics, we extracted age and sex. For index test and reference standard characteristics, we extracted: test type (NAAT) or definition (clinical diagnosis, composite reference standards), specimen (NAAT), specimen dry/collection liquid status (for studies evaluating Abbott ID NOW), proprietary automated and/or multiplex systems – henceforth called “platforms” (NAAT), and target genes of primers (NAAT). For outcomes, we extracted the values of test performance characteristics with their designation according to the original authors, without our interpretation. For this reason, we indicate these outcomes as “reported” (r): reported sensitivity (rSN), specificity (rSP), positive predictive value (rPPV), negative predictive value (rNPV), accuracy (rAcc), positive percent agreement (rPPA), negative percent agreement (rNPA), overall agreement (rOA), and Kappa coefficient. Additionally, we extracted “positive rate,” a non-standard term used by the included studies to refer to the number of positive NAATs in a population of patients suspected to have COVID-19 (Table 1), or to the number of positive samples in a total population of positive samples after repeat testing (Table 2). We constructed 2 × 2 contingency tables and reproduced test performance characteristic calculations to demonstrate how the original authors obtained the values (Supplementary Table 1).
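The reproduced calculations follow the standard 2 × 2 definitions. As a minimal sketch (our own helper, not the authors' code), the reported quantities fall out of the four cell counts as follows:

```python
# Standard 2x2 test-performance calculations (illustrative helper, not the
# authors' code). tp/fp/fn/tn are counts from a 2x2 contingency table of
# index test result vs. reference standard.
def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    total = tp + fp + fn + tn
    sn = tp / (tp + fn)      # sensitivity (termed PPA when the reference is imperfect)
    sp = tn / (tn + fp)      # specificity (termed NPA when the reference is imperfect)
    ppv = tp / (tp + fp)     # positive predictive value
    npv = tn / (tn + fn)     # negative predictive value
    acc = (tp + tn) / total  # accuracy / overall agreement
    # Cohen's kappa: observed agreement corrected for chance agreement
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {"SN": sn, "SP": sp, "PPV": ppv, "NPV": npv, "Acc": acc, "kappa": kappa}

# Hypothetical example: 45 true positives, 5 false positives,
# 5 false negatives, 45 true negatives.
print(performance(45, 5, 5, 45))
```

Note that the same arithmetic yields "sensitivity" or "positive percent agreement" depending only on whether the comparator is treated as a true reference standard, which is why the review reports every value with the original authors' own designation.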
We report additional pertinent study data in Supplementary Table 2: enrollment dates, number of enrollment sites, symptomatic status, and chest radiology status. To present the most comprehensive summary of the currently available evidence, no articles were excluded on the basis of quality.
Table 1

Studies reporting the “positive rate” of rRT-PCR testing within a population of patients suspected to have COVID-19.

Authors | Country | Study Type | Total patients | rRT-PCR positive* | Age (y) | % Male | Index Test: Type / Specimen (No.) / Primers / rRT-PCR Kit Company | Reference Standard: Case Definition/Clinical Diagnosis | PR§ (95% CI)
Ai et al.7 | China | Cross Sectional|| | 1014 | 601 | 51 ± 15 | 46.0% | rRT-PCR / Throat swab (1014) / ORF1ab, N / Shanghai Huirui Biotech, Shanghai BioGerm | “patients…who were suspected of novel coronavirus infection” (p. 5) | 59% (56–62)
Liu et al.8 | China | Cross Sectional|| | 4880 | 1875 | 50 (IQR 27) | 46.13% | rRT-PCR / Nasal swab, pharyngeal swab, BAL, sputum / ORF1ab, N / Shanghai Huirui Biotech | “All the cases were suspected of SARS-CoV-2 infection because of, (1) typical respiratory infection symptoms such as fever, cough and dyspnea, or (2) close contact with a COVID-19 patient.” (p. 172) | 38.42%
Xie et al.9 | China | Cross Sectional|| | 19 | 9 | 33 (8–62) | 42.1% | rRT-PCR / OP swab (19) / ORF1ab, N / GeneoDx, Maccura, Life-river | “…suspected cases…” (p. 264) | 47.4%

Reported instead of “cases according to reference standard” as present in other tables.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD.

PR: positive rate, which is the number of rRT-PCR-positive patients out of the number of patients suspected to have COVID-19 (i.e., the reference standard). Patient population.

Hospitalized patients. Abbreviations- BAL: bronchoalveolar lavage, CI: confidence interval, IQR: Interquartile range, N: nucleocapsid, OP: oropharyngeal, ORF1ab: open reading frame 1ab, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, y: years.

Table 2

Studies reporting test performance characteristics of initial rRT-PCR result compared to result after repeated tests of rRT-PCR as reference standard.

Entries list: author (country; study type; age (y); % male; specimen type (No.); primers; platform), counts (patients positive on 1st rRT-PCR; ever positive; included in calculations*), re-testing details, and study findings (rSN/rSP/rAcc/PR, 95% CI) as reported.
Bernheim et al.10 (China; Cases only||; age 45.3 ± 15.6; 50.4% male; NPS, OPS, Trach Asp, BAL (nr); ORF1ab, N; rRT-PCR). 1st test positive: 90; ever positive: 102; in calculations: 102; re-test interval: nr; tests per patient: 1 test: 90, >1 test: 12; initial-to-final interval: nr. Findings: 88%.
Fang et al.11 (China; Cases only||; age 45 (IQR 39–55); 56.9% male; throat swab (45), sputum (6); ORF1ab, N; rRT-PCR). 1st test positive: 36; ever positive: 51; in calculations: 51; re-test interval: ≥1 d; tests per patient: 1 test: 36, 2 tests: 12, 3 tests: 2, 4 tests: 1; initial-to-final interval: nr. Findings: 71% (56–83).
Green et al.12 (USA; Cohort||; age 53.1 ± 22.3; 45.5% male; NPS, OPS (nr); platforms: Roche cobas (RdRp, E), 19,195 samples (70.1%); Cepheid Xpert (N2, E), 6219 (22.7%); rRT-PCR (N1, N2), 1884 (6.9%); Abbott ID NOW (RdRp), 53 (0.2%); Hologic Panther (ORF1ab ×2), 26 (0.1%)). 1st test positive: 10,070; in calculations: 22,061; re-test interval: median 8 (range 1–49); initial-to-final interval: 1–49 d. Lower bound estimate:⁎⁎ ever positive 17,405, >1 test: 7335, rSN 57.9% (55.2–60.5). Upper bound estimate:⁎⁎ ever positive 10,643, >1 test: 573, rSN 94.6% (94.2–95.0).
He et al.13 (Hong Kong; Case-Control||; age 52 (8–74); 50% male; NPS, OPS, Trach Asp, BAL (nr); RdRp, S; rRT-PCR). 1st test positive: 27; ever positive: 34; in calculations: 82; re-test interval: nr; tests per patient: 1 test: 27, >1 test: 7; initial-to-final interval: 1–14 d. Findings: rSN 79% (66–93), rSP 100% (100), rAcc 92% (91–92).
Lee et al.14 (Singapore; Cases only||; age nr; % male nr; NPS (70); ORF1ab, N; rRT-PCR). 1st test positive: 62; ever positive: 70; in calculations: 70; re-test interval: 1st–2nd: 1 d, 2nd–Xth: 1–2 d; tests per patient: 1 test: 62, 2 tests: 5, 3 tests: 1, 5 tests: 1, 6 tests: 1; initial-to-final interval: 1st–2nd: 1 d, 2nd–final: 2, 4, 7 d. Findings: 88.6%.
Long et al.15 (China; Cases only||; age 44.8 ± 18.2; 55.6% male; OPS, NPS (nr); ORF1ab, N; rRT-PCR). 1st test positive: 30; ever positive: 36; in calculations: 36; re-test interval: nr; tests per patient: 1 test: 30, 2 tests: 3, 3 tests: 3; initial-to-final interval: 2–8 d. Findings: 83.3%.
Wong et al.16 (Hong Kong; Cases only||; age 56 ± 19; 40.6% male; NPS, throat swab (nr); RdRp/Hel; rRT-PCR). 1st test positive: 58; ever positive: 64; in calculations: 64; re-test interval: nr, “not uniform”; tests per patient: 1 test: 58, >1 test: 6; initial-to-final interval: nr. Findings: 91% (83–97).
Wu et al.17 (China; Cases only||; age 46.1 ± 15.4; 48.8% male; nose swab, throat swab (nr); ORF1ab, N; rRT-PCR). 1st test positive: 41; ever positive: 80; in calculations: 80; re-test interval: 1 d; tests per patient: 1 test: 41, 2 tests: 30, 3 tests: 9; initial-to-final interval: 1–2 d. Findings: 51.25%.

Pts included in test performance calculations.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD. Patient Population.

Hospitalized patients. Abbreviations- BAL: Bronchoalveolar lavage, CI: confidence interval, d: days, E: envelope, Hel: helicase, IQR: Interquartile range, N: nucleocapsid, No.: number, NPS: nasopharyngeal swab, nr: not reported, OPS: oropharyngeal swab, ORF1ab: open reading frame 1ab, PR: positive rate, pts: patients, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, rAcc: reported accuracy, RdRp: RNA-dependent RNA polymerase, rSN: reported sensitivity, rSP: reported specificity, S: spike, Trach Asp: Tracheal Aspirate, y: years.

Sensitivity estimates for the first test conducted on patients were calculated under different assumptions about true negatives. The upper bound estimate assumes that every negative result (whether negative on a single test or consistently negative across multiple, repeated tests) was a true negative (i.e., a false negative rate of 0%). The lower bound estimate uses the proportion of repeatedly tested cases who initially tested negative but subsequently tested positive to calculate a false negative rate (16.8%), and applies that rate to the patients who received only a single test to estimate an assumed number of false negative cases. Additional details are provided in Suppl. Fig. 1 and Green et al.
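The bounding logic can be sketched as follows. This is an illustrative simplification under our own variable names and accounting assumptions, not Green et al.'s exact method, whose details are in their paper and Suppl. Fig. 1:

```python
# Illustrative sketch of upper/lower-bound first-test sensitivity estimates,
# simplified from the approach described above (our variable names and
# accounting, not Green et al.'s exact method).
def sensitivity_bounds(first_pos, converted, retested_first_neg, single_test_neg):
    # Upper bound: every patient who never tested positive is assumed to be a
    # true negative, so the only false negatives are the observed conversions
    # (first test negative, a later test positive).
    upper = first_pos / (first_pos + converted)
    # Lower bound: estimate a false negative rate from the repeat-tested
    # subgroup and apply it to patients who received only a single test.
    fnr = converted / retested_first_neg
    assumed_fn = fnr * single_test_neg
    lower = first_pos / (first_pos + converted + assumed_fn)
    return lower, upper

# Toy numbers (not Green et al.'s data): 90 first-test positives; 10 of 50
# retested initial negatives converted to positive; 100 singly tested negatives.
lo, hi = sensitivity_bounds(90, 10, 50, 100)
print(round(lo, 3), round(hi, 3))  # prints 0.75 0.9
```

The two bounds differ only in how untested (single-test) negatives are counted: as all true negatives (upper bound) or as partly false negatives at the extrapolated rate (lower bound).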

Table 3

Studies that calculate test performance characteristics of rRT-PCR or automated rRT-PCR platforms compared to composite reference standards.

Entries list: author (country; study type; total patients/cases*; age (y); % M; specimen), composite reference standard definition, index test(s) (type; primers; platform), and study findings (rSN/rSP/rPPV/rNPV/rAcc/rPPA/rNPA/Cohen's κ; 95% CI) as reported.
Cradic et al.18 (USA; Cohort||; 184/33; age nr; % M nr; NPS in VTM (184)). Reference: the result obtained from at least 2 of the 3 assays is the consensus result. Index tests:
- Automated multiplex rRT-PCR (ORF1ab, S; Diasorin Simplexa): 100% (90–100), 100% (98–100)
- Automated isothermal NAAT (RdRp; Abbott ID NOW): 91% (79–97), 100% (98–100)
- Automated multiplex rRT-PCR (ORF1, E; Roche cobas 6800): 100% (90–100), 100% (98–100)
Suo et al.19,a (China; Cohort⁎⁎; 58/52; age nr; % M nr; throat swab (58)). Reference: a positive result of repeated rRT-PCR, or serology, is considered a positive result. Index: initial rRT-PCR (China CDC protocol; ORF1ab, N; platform N/A). Findings: 40% (27–55), 100% (54–100), 100% (N/A), 16% (13–19), 47% (33–60).
Zhen & Mangi et al.20 (USA; Case Control††; 104/51; age nr; % M nr; NPS (104)). Reference: the result obtained by 3 of the 4 assays tested is the consensus result. Index tests:
- rRT-PCR (US CDC protocol; N1, N2; platform N/A): 100% (93–100), 98% (89–99), κ 0.98 (0.94–1)
- Automated multiplex rRT-PCR (ORF1ab, S; Diasorin Simplexa): 100% (93–100), 100% (93–100), κ 1.0 (0.99–1)
- Automated rRT-PCR w/ sensor (N; GenMark ePlex): 96% (87–99), 100% (93–100), κ 0.96 (0.91–1)
- Automated multiplex rRT-PCR (ORF1ab, 2 targets; Hologic Panther): 100% (93–100), 96% (87–99), κ 0.96 (0.91–1)

Cases according to composite reference standard.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD. Patient Population.

Hospitalized patients.

Emergency Room patients.

Outpatients, some of whom were later hospitalized.

not reported.

Suo et al. data is also present in Table 4. Abbreviations- E: envelope, IQR: Interquartile range, κ: kappa coefficient, M: male, N/A: not applicable, N: nucleocapsid, NPS: nasopharyngeal swab, nr: not reported, ORF1ab: open reading frame 1ab, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, rAcc: reported accuracy, RdRp: RNA-dependent RNA polymerase, rNPA: reported negative percent agreement, rNPV: reported negative predictive value, rPPA: reported positive percent agreement, rPPV: reported positive predictive value, rSN: reported sensitivity, rSP: reported specificity, S: spike, VTM: viral transport media, y: years.

Table 4

Studies reporting test performance characteristics of other nucleic acid amplification test methods compared to rRT-PCR.

Entries list: author (country; study type; total patients/cases*; age (y); % male; specimen (No.)), index test (type; primers; platform), reference standard rRT-PCR primers, and study findings (rSN/rSP/rPPV/rNPV/rAcc/rOA/Cohen's κ; 95% CI or p-value) as reported.
Baek et al.21 (Korea; Case Control||; 154/14; age nr; % male nr; nasal swab (154)). Index: RT-LAMP (N; platform nr); reference primers: ORF1ab, S. Findings: 100%, 98.7%, κ 0.826.
Kitagawa et al.22 (Japan; Cohort||; 76/30; nr; nr; NPS (76)). Index: RT-LAMP (primers nr; LA-200 turbidimeter); reference primers: N. Findings: 100%, 95.6%, 97.4%.
Lau et al.23 (Malaysia; Case Control††; 89‡‡/47‡‡; nr; nr; NPS (89)). Index: RT-LAMP (N; LA-320 turbidimeter); reference primers: RdRp, E. Findings: 100%, 100%.
Lu et al.24 (China; Case Control||; 56/36; nr; nr; throat swab (56)). Index: RT-LAMP (N; platform nr); reference primers: ORF1ab, N. Findings: 92.9%.
Yan et al.25 (China; Cohort††; 130/58; nr; nr; throat swab, BAL (nr)). Index: RT-LAMP (ORF1ab, S; platform nr); reference primers: ORF1ab, N. Findings: 100% (92.3–100), 100% (93.7–100).
Wang, Cai, & He et al.26 (China; Cohort††; 947/338; age 44 ± 17.1; 60% male; OPS (834), sputum (82), NPS (16), nasal swab (8), BAL (4), stool (2), blood (1)). Index: RT-RAA (ORF1ab; RAA-F1620 fluorescent detector); reference primers: ORF1ab & N, or ORF1ab. Findings: 97.6% (95.2–98.9), 97.8% (96.2–98.8), 96.2% (93.4–97.8), 98.6% (97.3–99.3), κ 0.952, p<0.05.
Xue et al.27 (China; Cohort††; 120‡‡/22‡‡; nr; nr; NPS, sputum (nr)). Index: RT-RAA (ORF1ab; RAA-1620 fluorescent detector); reference primers: ORF1ab, S. Findings: 100%, 100%, κ 1.0, p < 0.001.
Perchetti et al.28 (USA; Case Control††; 356/186; nr; nr; NPS (356)). Index: Triplex rRT-PCR (N1, N2; platform n/a); reference primers: N1, N2. Findings: 98.4%, 100%, 99.2%.
Waggoner et al.29 (USA; Cohort||; 27/11; nr; nr; NPS, OPS (nr)). Index: Triplex rRT-PCR (N2, E; platform n/a); reference primers: N2, E. Findings: 100%.
Li et al.30 (China; Cohort††; 303‡‡/126‡‡; nr; nr; throat swab (267), sputum (22), nose swab (8), BAL (3), blood (3)). Index: AIGS (ORF1ab, N, S; LifeReady 1000); reference primers: ORF1ab, N. Findings: 97.62% (93.2–99.5), 100%.
Suo et al.19,a (China; Cohort⁎⁎; 58/52; nr; nr; throat swab (58)). Index: ddPCR (ORF1ab, N; QX200 System); reference primers: ORF1ab, N. Findings: 94% (83–99), 100% (48–100), 100% (NA), 63% (36–83), 95% (84–99).
Bulterys et al.31 (USA; Cohort††; 80/30; nr; nr; NPS (80)). Index: isothermal amplification (ORF1ab, N; Atila iAMP kit); reference primers: E. Findings: 82.8% (65.0–92.9), κ 0.86 (0.74–0.98).
Wang, Cai, & Zhang et al.32 (China; Cohort††; 181‡‡/25‡‡; nr; nr; throat swab (181)). Index: OSN-qRT-PCR (ORF1ab, N; Life Tech. 480); reference primers: ORF1ab, N. Findings: κ 0.737.

Cases according to reference standard.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD. Patient Population.

Hospitalized patients

¶Emergency Room/Immediate Care Center patients.

Outpatients, who were later hospitalized.

not reported.

Number of samples (when number of patients not reported).

Suo et al. data is also present in Table 2. Abbreviations- AIGS: Automatic integrated gene detection system, BAL: Bronchoalveolar lavage, CI: confidence interval, ddPCR: digital droplet polymerase chain reaction, E: envelope, iAMP: isothermal amplification, IQR: Interquartile range, κ: kappa statistic, n/a: not applicable, N: nucleocapsid, No.: number, NPS: nasopharyngeal swab, nr: not reported, OPS: oropharyngeal swab, ORF1ab: open reading frame 1ab, OSN-qRT-PCR: one-step single-tube nested quantitative real-time polymerase chain reaction, rAcc: reported accuracy, RdRp: RNA-dependent RNA polymerase, Ref Stnd: reference standard, rNPA: reported negative percent agreement, rNPV: reported negative predictive value, rOA: reported overall agreement, rPPA: reported positive percent agreement, rPPV: reported positive predictive value, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, rSN: reported sensitivity, rSP: reported specificity, RT-LAMP: reverse transcription loop-mediated isothermal amplification, RT-RAA: reverse-transcription recombinase-aided amplification, S: spike, y: years.

Table 5

Studies estimating NAAT platform test performance characteristics compared to rRT-PCR as the reference standard.

Entries list: author (country; study type; total patients/cases*; age (y); % M; specimen), index test (type; primers; platform), reference standard rRT-PCR primers, and study findings (rSN/rSP/rPPV/rNPV/rPPA/rNPA/rOA/Cohen's κ; 95% CI) as reported.
Mitchell et al.33 (USA; Case Control††; 61‡‡/46‡‡; age nr; % M nr; NPS in VTM (61)). Index: automated isothermal NAAT (RdRp; Abbott ID NOW); reference primers: N1, N2. Findings: 71.70%, 100%, 78.70%.
Rhoads et al.35,b (US; Cases only||; 96‡‡/96‡‡; nr; nr; NPS (85), nasal swab (11) in NS or UTM). Index: automated isothermal NAAT (RdRp; Abbott ID NOW); reference primers: N1, N2. Findings: 94% (87–98).
Moore et al.34,a (USA; Cohort||¶,⁎⁎; 200/119; age 50 ± 17; 46% M; NPS in VTM (200)); reference primers: N1, N2. Index tests:
- Automated isothermal NAAT (RdRp; Abbott ID NOW): 80.3% (71.9–87.1), 100% (95.4–100)
- Automated multiplex RT-PCR (RdRp, N; Abbott RealTime): 100% (96.9–100), 92.4% (84.2–97.2)
DegliAngeli et al.36 (USA; Case Control††; 60‡‡/30‡‡; nr; nr; nasal swab, NPS (nr)). Index: automated multiplex RT-PCR (RdRp, N; Abbott RealTime); reference primers: N1, N2. Findings: 93%, 100%.
Hou et al.37 (China; Cohort||,⁎⁎; 285/153; age <65 y: 77.2%; 55.8% M; OPS (285)). Index: automated multiplex RT-PCR (N2, E; Cepheid Xpert Xpress); reference primers: ORF1ab, N. Findings: 96.1% (91.3–98.4), 96.2% (90.9–98.6), 96.1%§, κ 0.92 (0.88–0.97).
Lieberman et al.38 (USA; Cohort††; 26‡‡/13‡‡; nr; nr; NPS (26)). Index: automated multiplex RT-PCR (N2, E; Cepheid Xpert Xpress); reference primers: N1, N2. Findings: 100%§.
Loeffelholz et al.39,a (USA, UK, FR, IT; Cohort†† (enriched for positive cases); age nr; % M nr; NPS (339), NPS+OPS (97), Trach Asp (30), OPS (15)). Index: automated multiplex RT-PCR (N2, E, RdRp; Cepheid Xpert Xpress), compared against reference rRT-PCRs with different primer sets:
- N1, N3 (88‡‡/13‡‡): 99.5% (97.5–99.9), 95.8% (92.6–97.6)
- S, E (129‡‡/60‡‡): 100% (94.0–100), 100% (94.7–100)
- N1, N2 (99‡‡/74‡‡): 100% (94.2–100), 92.0% (75.0–97.8)
- RdRP (65‡‡/30‡‡): 100% (88.7–100), 74.3% (57.9–85.8)
- RdRp, E, N (79‡‡/35‡‡): 100% (67.6–100), 100% (92.0–100)
Bordi et al.40 (Italy; Cohort + Controls††; 278 + 20 patients, 99 cases; nr; nr; nasal swab, NPS (nr)). Index: multiplex RT-PCR (ORF1ab, S; Diasorin Simplexa); reference primers: RdRp, E. Findings: 100%, 100%, κ 0.938 (0.89–0.98).
Rhoads et al.35,b (US; Cases only||; 96‡‡/96‡‡; nr; nr; NPS (85), nasal swab (11)). Index: multiplex RT-PCR (ORF1ab, S; Diasorin Simplexa); reference primers: N1, N2. Findings: 96% (90–99).
Poljak et al.41 (Slovenia; Cohort††; 501/63; nr; nr; NPS (489), NPS+OPS (12)). Index: automated multiplex RT-PCR (ORF1, E; Roche cobas 6800); reference primers: RdRP, E. Findings: 100% (92.8–100), 99.5% (98.2–99.9), 99.6% (98.4–99.9), κ 0.98 (0.96–1.0).
Pujadas et al.42 (USA; Cohort††; 963‡‡/640‡‡; nr; nr; NPS (963)). Index: automated multiplex RT-PCR (ORF1, E; Roche cobas 6800); reference primers: N1, N2, N3. Findings: 94.2% (92.2–95.9), 99.6% (98.1–99.9), 95.8% (94.4–97.0), κ 0.904 (0.87–0.93).
Rahman et al.43 (Australia; Cohort††; 52‡‡/5‡‡; age 31.5 (0–84); 58% M; NPS+OPS (30), NPS (16), N Asp (5), sputum (1)). Index: multiplex RT-PCR (ORF1; Aus Diagnostics); reference primers: RdRp, E. Findings: 100%, 92.16%, 55.56%, 100%.
Hogan et al. JCV, 4–24 44 (USA; Cohort††; 180‡‡/77‡‡; nr; nr; NPS (184)). Index: automated multiplex RT-PCR (ORF1ab (2); Hologic Panther); reference primers: E. Findings: 98.7% (93.0–100), 98.1% (93.1–99.8), 98.3% (95.2–99.7), κ 0.97 (0.93–1.0).
Chen et al.45 (Hong Kong; Cohort||; 214/91; age 51 (IQR 31–69); % M nr; NPS (214)). Index: multiplex RT-PCR (ORF1ab, E, N; Luminex NxTAG CoV); reference primers: RdRp/Hel, E. Findings: 97.8% (92.2–99.7), 100% (97.1–100), 100% (95.9–100), 98.4% (94.3–99.8), κ 0.98 (0.95–1.0).
Hogan et al. JCM 46 (USA; Case Control||; 100/50; nr; nr; NPS (100)). Index: automated PCR with LFA (ORF1ab (2); Mesa BioTech Accula); reference primers: E. Findings: 68.0% (53.3–80.5), 100% (92.9–100), 84.0% (75.3–90.6), κ 0.74 (0.61–0.87).
Visseaux et al.47 (France; Case Control||; 69/40; nr; nr; NPS (66), BAL (1), Trach Asp (2)). Index: automated multiplex RT-PCR (ORF1, E; QIAstat-Dx); reference primers: RdRp, E. Findings: 100%, 93%, 97%.

Cases according to reference standard.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD. Patient Population.

Hospitalized patients.

Emergency Room/Immediate Care Center patients.

Outpatients.

not reported.

Number of samples (when number of patients not reported).

Reported as concordance.

Loeffelholz et al. and Moore et al. also appear in Table 6.

Rhoads et al. appears twice in Table 5 for ease of comparison of studies of the same platform. Abbreviations- BAL: Bronchoalveolar lavage, CI: confidence interval, E: envelope, Hel: helicase, IQR: Interquartile range, κ: kappa statistic, LFA: lateral flow assay, M: male, n/a: not applicable, N: nucleocapsid, NAAT: nucleic acid amplification test, N Asp: nasopharyngeal aspirate, No.: number, NPS: nasopharyngeal swab, nr: not reported, NS: normal saline, OPS: oropharyngeal swab, ORF1ab: open reading frame 1ab, RdRp: RNA-dependent RNA polymerase, Ref Stnd: reference standard, rNPA: reported negative percent agreement, rNPV: reported negative predictive value, rOA: reported overall agreement, rPPA: reported positive percent agreement, rPPV: reported positive predictive value, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, rSN: reported sensitivity, rSP: reported specificity, S: spike, Trach Asp: Tracheal Aspirate, UTM: Universal transport medium, VTM: viral transport medium, y: years.

Table 6

Studies assessing agreement between NAAT platforms.

Entries list: author (country; study type; total patients/cases*; age (y); % M; specimen), platform #1 vs platform #2 (type; primers; platform), and study findings (rPPA/rNPA/rOA/rPPV/rNPV/Cohen's κ; 95% CI) as reported.
Harrington et al.48 (USA; Cohort; 524/188; age nr; % M nr; paired NPS in VTM (RealTime) & foam nasal swab (ID NOW) (524 pairs)). Automated multiplex RT-PCR, Abbott RealTime (RdRp, N) vs automated isothermal NAAT, Abbott ID NOW (RdRp). Findings: 75% (67.7–80.6), 99% (97.6–99.8).
Moore et al.34,a (USA; Cohort||,⁎⁎; 200/125; age 50 ± 17; 46% M; NPS in VTM (200)). Abbott RealTime (RdRp, N) vs Abbott ID NOW (RdRp). Findings: 75.2% (66.7–82.5), 100% (95.4–100).
Basu et al.49 (USA; Cohort; 101/32; age (28–90); % M nr; NPS dry (101)). Cepheid Xpert Xpress (N2, E) vs Abbott ID NOW (RdRp). Findings: 54.8% (37.8–70.8), 98.6% (92.3–99.7), 85.1% (76.9–90.8), 94.4% (74.3–99), 83.1% (73–89.7). Cases-only subset (15/15; nr; nr; NPS in VTM (15)): 66.7%.
Hogan et al. JCV, 5–1 50 (USA; Case Control††; 100‡‡/53‡‡; nr; nr; NPSc (100)). Hologic Panther (ORF1ab (2)) vs Abbott ID NOW (RdRp). Findings: 80.4% (66.9–90.2), 95.9% (86.0–99.5).
Zhen, Smith et al.51,b (USA; Cohort††; 108/58; nr; nr; NPS in VTM (108)). Hologic Panther (ORF1ab (2)) vs Abbott ID NOW (RdRp). Findings: 87.7% (76–95), 100% (93–100), κ 0.87 (0.78–0.96).
Smithgall et al.52 (USA; Case Control||¶; 113/90; age 65 (0–101); 60.2% M; NPS in VTM or UTM (113)). Two comparisons:
- Roche cobas 6800 (ORF1, E) vs Abbott ID NOW (RdRp): 73.9% (63.2–82.3), 100% (83.4–100)
- Roche cobas 6800 (ORF1, E) vs Cepheid Xpert Xpress (N2, E): 98.9% (92.9–100), 92% (72.4–98.6)
Moran et al.53 (USA; Cohort||,⁎⁎; 103‡‡/42‡‡; nr; nr; NPS (95), nasal swab (8)). Roche cobas 6800 (ORF1, E) vs Cepheid Xpert Xpress (N2, E). Findings: 99%.
Craney et al.54 (USA; Cohort††; 389/147; nr; nr; NPS (389)). Hologic Panther (ORF1ab (2)) vs Roche cobas 6800 (ORF1, E). Findings: 96.4%, κ 0.922.
Zhen, Smith et al.51,b (USA; Cohort††; 108/58; nr; nr; NPS in VTM (108)). Two further comparisons:
- Hologic Panther (ORF1ab (2)) vs Cepheid Xpert Xpress (N2, E): 98.3% (91–100), 100% (93–100), κ 0.98 (0.95–1.0)
- Hologic Panther (ORF1ab (2)) vs automated RT-PCR w/ sensor, GenMark ePlex (N): 91.4% (81–97), 100% (93–100), κ 0.91 (0.83–0.99)
Loeffelholz et al.39,a (USA, UK, FR, IT; Cohort†† (enriched for positive cases); 18‡‡/8‡‡; nr; nr; NPS, OPS, NPS+OPS, Trach Asp (nr)). Abbott RealTime (RdRp, N) vs Cepheid Xpert Xpress (N2, E). Findings: 100% (67.6–100), 100% (77.2–100).
Norz et al.55 (Germany; Case Control††; 165‡‡/36‡‡; nr; nr; NPS, OPS (nr)). Roche cobas 6800 (ORF1, E) vs automated RT-PCR, NeuMoDx 96 (E). Findings: 100%, 99.2%.

Case estimated as a positive result of any evaluated platform.

Of cohort or cases.

Format: median (range), median(IQR), or mean±SD. Patient Population.

Hospitalized patients.

Emergency Room/Immediate Care Center patients.

Outpatient.

not reported.

Number of samples (when number of patients not reported).

Loeffelholz et al. and Moore et al. also appear in Table 5.

Zhen, Smith et al. appears twice in Table 6 for ease of comparison of studies of the same platform.

Dry or VTM status not reported. Abbreviations- CI: confidence interval, E: envelope, IQR: Interquartile range, κ: kappa statistic, M: male, N: nucleocapsid, NAAT: nucleic acid amplification test, No.: number, NPS: nasopharyngeal swab, nr: not reported, OPS: oropharyngeal swab, ORF1ab: open reading frame 1ab, RdRp: RNA-dependent RNA polymerase, rNPA: reported negative percent agreement, rNPV: reported negative predictive value, rOA: reported overall agreement, rPPA: reported positive percent agreement, rPPV: reported positive predictive value, rRT-PCR: real-time Reverse Transcription Polymerase Chain Reaction, S: spike, Trach Asp: Tracheal Aspirate, UTM: universal transport medium, VTM: viral transport medium, y: years.

Fig. 1

PRISMA Flow diagram of studies included in the review.


Data synthesis and analysis

We presented the extracted data in tabular form, mirrored by a descriptive synthesis, in two broad categories: diagnostic accuracy studies for rRT-PCR (Table 1, Table 2, Table 3) and diagnostic accuracy or comparative agreement studies of two NAATs (Table 4, Table 5, Table 6). Tables are thematically divided by reference standard strategy or by the approach to obtaining comparative agreement measures. Diagnostic accuracy studies for rRT-PCR were arranged alphabetically by first author's last name (Table 1, Table 2, Table 3). Diagnostic accuracy and comparative agreement studies of two NAATs were arranged in decreasing order of studies per methodology, then alphabetically by methodology or platform (Table 4, Table 5, Table 6), for ease of comparison. Because of significant diversity in methods and reporting of results, we reported grouped summary data for study characteristics, patient characteristics, and outcomes. We used the framework of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) to evaluate the selected articles (Supplementary Table 3). We collected data, or noted their absence, for a narrative description of risk of bias and concerns of applicability based on the QUADAS domains. For assessment of bias in patient selection, we evaluated author conflicts of interest, study design type, inclusion/exclusion criteria, method of patient enrollment, and reporting of patient demographics and characteristics (e.g., symptomatic status). For assessment of bias in the reference standard and index test, we evaluated the accuracy of the reference standard, the description of duration of symptoms at the time of testing, whether the threshold to determine a positive test was prespecified, and researcher blinding to reference standard and index test results.
For assessment of bias in flow and timing, we evaluated whether the reference standard was the same for all patients, the sequence and timing of the performance of the reference standard and index test, whether test performance characteristics were calculated based on sample numbers or patient numbers, and whether indeterminate or invalid results were included in test performance calculations.

Results

Our search yielded 1537 articles, with 816 unique articles after deduplication. After screening titles and abstracts, 130 articles underwent full-text evaluation. Ultimately, 49 articles were included in our review (Fig. 1: PRISMA flow diagram of studies included in the review).

The performance of rRT-PCR compared to case definitions or clinical diagnoses

Three studies, with 19 to 1014 patients, report a "positive rate," i.e., the number of positive rRT-PCR results out of the number of suspected cases of COVID-19, ranging from 38.42% to 59% (Table 1).7, 8, 9 The studies do not report these values as "sensitivity" directly; however, the values were interpreted as reflective of the accuracy of rRT-PCR. Ai et al. compared the positive rate of rRT-PCR (59%) with the positive rate of chest CT in order to draw conclusions about the accuracy of chest CT for the diagnosis of COVID-19, and Liu et al. and Xie et al. expressed concern that their low calculated positive rates (38.42% and 47.4%, respectively) indicated a failure of rRT-PCR to diagnose COVID-19. In terms of quality assessment, the studies lack specific details as to how patients were classified as having suspected COVID-19 infection. The accuracy of clinical diagnosis based on case definitions is unclear but is likely suboptimal. Additionally, the duration of symptoms at the time of clinical diagnosis or rRT-PCR testing was not provided (Supplementary Table 3).
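The arithmetic behind a "positive rate," and why it is not a sensitivity, can be made explicit in a short sketch. The function names and counts below are illustrative, not taken from the studies:

```python
# Hedged sketch: "positive rate" as used in these studies is simply the
# fraction of suspected cases with a positive rRT-PCR. It equals sensitivity
# only under the strong assumption that every suspected case truly has
# COVID-19; if some suspected cases are disease-free, the two diverge.
def positive_rate(n_pcr_positive: int, n_suspected: int) -> float:
    return n_pcr_positive / n_suspected

def sensitivity(n_true_positive: int, n_truly_diseased: int) -> float:
    return n_true_positive / n_truly_diseased

# Illustrative: 590 positives among 1000 suspected cases gives a 59%
# "positive rate" regardless of how many suspected cases were true cases.
rate = positive_rate(590, 1000)
```

The sketch underlines the review's point: interpreting such a rate as test accuracy silently assumes the denominator contains only true cases.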

The performance of rRT-PCR compared to the end result after multiple repetitions of rRT-PCR

Eight studies, with 36 to 22,061 patients per study, attempted to determine the accuracy of rRT-PCR by comparing the initial rRT-PCR result to the result after multiple repeated samples from the same patient were submitted for rRT-PCR testing, with the latter serving as the reference standard (Table 2).10, 11, 12, 13, 14, 15, 16, 17 Three studies reported this value as a "positive rate," ranging from 51.25% to 88%, and five reported sensitivity, ranging from 57.9% to 94.6%.11, 12, 13 Of these studies, only He et al. reported an rSP of 100%, calculated from patients who remained negative for SARS-CoV-2 after repeated sample testing (Supplementary Table 1). Green et al. included patients regardless of whether they were tested once or multiple times, using data from these subsets to make assumptions for estimating clinical test characteristics. This study also conducted multiple different NAATs and rRT-PCRs on patients, whereas other studies employing this strategy used only one type of NAAT. The authors do not clarify whether patients who had repeat SARS-CoV-2 tests were consistently tested with the same NAAT/rRT-PCR assay or a different one. They also calculated test performance characteristics differently from the other studies: two estimates of sensitivity were derived, one assuming the false-negative rate among single-tested patients was 0%, and one assuming it equaled the rate observed among repeat-tested patients in their study, approximately 16.8%. However, the details of these calculations were not presented. To clarify the two assumptions, we reconstructed the calculation in Supplementary Figure 1, which demonstrated a range of rSN with a lower-bound estimate of 57.9% (55.2%−60.5%) and an upper-bound estimate of 94.6% (94.2%−95.0%).
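The two assumptions can be expressed as a small sketch. The bookkeeping and counts below are one plausible illustrative reconstruction (chosen only so that the bounds land near the reported 57.9% and 94.6%), not the study's actual data or its exact method:

```python
# Hedged sketch of a two-assumption sensitivity bound. Inputs (illustrative):
#   tp         - patients whose first (or only) rRT-PCR was positive
#   fn_repeat  - repeat-tested patients who converted negative -> positive
#   neg_single - single-tested patients whose only result was negative
#   fn_rate    - negative-to-positive conversion rate among repeat-tested
def sensitivity_bounds(tp, fn_repeat, neg_single, fn_rate):
    # Upper bound: assume every single-tested negative was a true negative,
    # so the only known false negatives are the repeat-tested converters.
    upper = tp / (tp + fn_repeat)
    # Lower bound: assume single-tested negatives were missed at the same
    # rate observed among repeat-tested patients, inflating the denominator.
    fn_single = fn_rate * neg_single
    lower = tp / (tp + fn_repeat + fn_single)
    return lower, upper

# Illustrative counts only, chosen to land near the reported bounds.
low, high = sensitivity_bounds(tp=1000, fn_repeat=57, neg_single=3990, fn_rate=0.168)
```

The gap between the two bounds shows how strongly the estimate depends on an unverifiable assumption about patients who were never retested.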
In terms of quality assessment, most of these studies were performed with a non-cohort design, and six included only patients determined to have COVID-19 by rRT-PCR, i.e., cases only (Table 2).14, 15, 16 Five of the studies had inclusion criteria requiring patients to have had a well-performed chest CT or chest X-ray, thereby excluding several patients who would otherwise have been pertinent to the study of diagnostic accuracy (Supplementary Table 3). The studies repeated rRT-PCR several times to form a reference standard, but each patient received a different number of repeat tests over a different time period, so that each patient effectively received a different reference standard. One study tracked negative-to-positive conversion over 1 to 49 days and another over 1 to 14 days, raising the concern that a patient could have become infected between the initial and final tests, confounding the results. One study counted invalid results as negative and indeterminate results as positive when calculating test performance characteristics; otherwise, the rationale for and handling of invalid and indeterminate results were not reported in these studies.

The performance of rRT-PCR compared to various composite reference standards

Three studies determined the accuracy or agreement of rRT-PCR or automated rRT-PCR platforms/instruments compared to a "composite reference standard" based on the results of several tests (Table 3).18, 19, 20 There were between 58 and 184 patients per study. Suo et al. considered a positive result on either repeated rRT-PCR or serology to indicate a positive test according to the reference standard; the reported sensitivity of the initial rRT-PCR result was 40%, with rSP 100%, rPPV 100%, and rNPV 16%. Zhen et al. compared rRT-PCR performed according to the US CDC protocol to a composite reference standard in which the consensus result of 3 or more of 4 molecular assays was considered correct. The rRT-PCR had an rPPA of 100%, an rNPA of 98%, and a Cohen's kappa coefficient of 0.98. Cradic et al. did not study rRT-PCR but evaluated three automated molecular assays, using a composite reference standard of the consensus result of two or more of the three assays. While Abbott ID NOW had an rPPA of 91%, the Roche cobas 6800 and Diasorin Simplexa assays each had an rPPA of 100%. These studies either did not report how samples were selected for evaluation (Supplementary Table 3) or reported that only samples with sufficient residual volume that had been properly stored were selected. Cradic et al. and Zhen & Mangi et al. initially tested samples with one platform; some or all samples were then frozen, thawed, and tested with other platforms, introducing confounding factors into the reference standard and the test performance calculations involving the various platform results. Suo et al. used repeat rRT-PCR testing as part of the reference standard, with repeat tests performed 2–10 days after the initial test, after the patient had been discharged from the hospital, allowing potential exposure for initial infection or reinfection.
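The agreement statistics reported in these studies (PPA, NPA, Cohen's kappa) all derive from a 2×2 table of index test versus reference standard. A minimal sketch, with illustrative counts that happen to reproduce values like those reported by Zhen et al. (rPPA 100%, rNPA 98%, kappa 0.98); the counts are not the study's raw data:

```python
def agreement_stats(a, b, c, d):
    """Agreement between an index test and a reference from a 2x2 table:
    a = both positive, b = index+/ref-, c = index-/ref+, d = both negative."""
    n = a + b + c + d
    ppa = a / (a + c)                 # positive percent agreement
    npa = d / (b + d)                 # negative percent agreement
    po = (a + d) / n                  # observed agreement
    # Chance agreement expected from the marginal totals of both tests.
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (po - pe) / (1 - pe)      # Cohen's kappa
    return ppa, npa, kappa

# Illustrative 2x2 counts (hypothetical): 50 concordant positives,
# 1 index-positive/reference-negative, 49 concordant negatives.
ppa, npa, kappa = agreement_stats(a=50, b=1, c=0, d=49)
```

Note that when the composite reference standard itself includes the index test, the cells of this table are not independent of the test under evaluation, which is the incorporation bias the review describes.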

The performance of other nucleic acid amplification test methods compared to standard rRT-PCR

Fourteen studies compared other nucleic acid amplification methods for detecting SARS-CoV-2 to rRT-PCR (Table 4), with between 27 and 356 patients per study. Five studies evaluated reverse transcription loop-mediated isothermal amplification (RT-LAMP):21, 22, 23, 24, 25 four reported sensitivity of 100% and specificity of 95.6% to 100%,21, 22, 23 and one reported accuracy of 92.9%. Two studies, Wang, Cai, & He et al. and Xue et al., evaluated reverse-transcription recombinase-aided amplification (RT-RAA), with Cohen's kappa of 0.952 and 1.0. Two studies, Perchetti et al. and Waggoner et al., evaluated triplex rRT-PCR, reporting overall agreement of 99.2% and 100%. Li et al. evaluated an automatic integrated gene detection system (AIGS), with rSN 97.2% and rSP 98.5%. Suo et al. evaluated digital droplet polymerase chain reaction (ddPCR), with rSN 94%, rSP 100%, rPPV 100%, rNPV 63%, and rAcc 95%. Bulterys et al. evaluated an isothermal amplification method, with rSN 82.8% and Cohen's kappa 0.86. Wang, Cai, and Zhang et al. evaluated one-step single-tube nested quantitative real-time polymerase chain reaction (OSN-qRT-PCR), with Cohen's kappa of 0.737. Regarding quality evaluation (Supplementary Table 3), the majority of studies did not report how patient samples were selected for evaluation.27, 28, 29, 30 In the study by Bulterys et al., sample selection was a convenience selection of samples with residual volume that had been stored correctly.
Most studies did not report the symptomatic status of the patients21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32 or patient demographics.21, 22, 23, 24, 25, 27, 28, 29, 30, 31, 32 Problematically, many studies did not report when the reference standard was conducted on the patient samples relative to the index test, or whether actions that could potentially alter test results (such as freeze/thaw cycles) occurred between the reference standard and the index test.21, 22, 23, 24 Four studies calculated test performance characteristics based on the number of samples rather than the number of patients. The management of indeterminate and invalid test results went largely unreported.21, 22, 23, 24, 25

The performance of NAAT platforms compared to rRT-PCR as the reference standard

Fifteen studies compared automated NAAT platforms to various rRT-PCR assays to determine test performance characteristics (Table 5), with between 26 and 963 patients or samples per study. Three studies evaluated Abbott ID NOW, an isothermal NAAT platform, with rPPA or rSN of 71.7% to 94% and rNPA or rSP of 100%.33, 34, 35 Two studies evaluated Abbott RealTime, with rSN or rPPA of 93% to 100% and rSP or rNPA of 92.4% to 100%. Three studies evaluated Cepheid Xpert Xpress, with rPPA 96.1% to 100%, rNPA 74.3% to 100%, rOA 96.1% to 100%, and Cohen's kappa of 0.92.37, 38, 39 Two studies evaluated Diasorin Simplexa, with rSN or rPPA of 96% to 100% and rSP of 100%. Two studies evaluated Roche cobas 6800, with rPPA 94.2% to 100%, rNPA 99.5% to 99.6%, and Cohen's kappa of 0.904 to 0.98. Other studies evaluated AusDiagnostics (rSN 100%, rSP 92.16%), Hologic Panther Fusion (rPPA 98.7%, rNPA 98.1%), Luminex NxTAG (rSN 97.8%, rSP 100%), Mesa Biotech Accula (rPPA 68.0%, rNPA 100%), and QIAstat-Dx (rSN 100%, rSP 93%) compared to rRT-PCR. With regard to quality evaluation (Supplementary Table 3), most studies did not report the method of sample collection or patient recruitment,41, 42, 43, 44, 45, 46, 47 and four studies used a convenience selection of samples, including enrichment for positive samples.34, 35, 36 Eight studies conducted test performance calculations on sample numbers instead of patient numbers.42, 43, 44 Four studies calculated test performance characteristics with indeterminate or inconclusive results counted as "positive," and the management of indeterminate/inconclusive as well as invalid results went unreported in an additional three studies. No study reported blinding of researchers to the reference standard or index test results.

The agreement of NAAT platforms compared to other NAAT platforms

Ten studies, with between 15 and 524 patients per study, evaluated the agreement between two different NAAT platforms (Table 6), typically in circumstances where one platform was the standard of care at the institution and another was being introduced. The Abbott ID NOW platform, which uses isothermal amplification, was the most frequently studied test, with an rPPA of 75–75.2% compared to Abbott RealTime, 54.8% compared to Cepheid Xpert Xpress, 80.4–87.7% compared to Hologic Panther Fusion, and 73.9% compared to Roche cobas 6800. Two studies evaluated Cepheid Xpert Xpress compared to Roche cobas 6800, with rPPA 98.9% and rNPA 92% in one and overall agreement of 99% in the other. Several platforms were compared to Hologic Panther Fusion, including Roche cobas 6800 (rOA 96.4%), Cepheid Xpert Xpress (rPPA 98.3%, rNPA 100%), and GenMark ePlex (rPPA 91.4%, rNPA 100%). In some studies, platforms were identified as the "comparator" or "reference" platform, including Cepheid Xpert Xpress, Abbott RealTime, Hologic Panther Fusion, and Roche cobas 6800; these are listed as "Platform #1" in Table 6. Three studies did not identify any studied platform as the "comparator" or "reference standard," instead reporting only general, non-directional measures of agreement such as overall agreement or Cohen's kappa, or the calculations of PPA and NPA were identical regardless of the direction of calculation (Supplementary Table 1). Regarding quality evaluation (Supplementary Table 3), the samples used for calculating test performance characteristics were reported to be selected for enrichment of positive samples, for diversity of viral load, or otherwise curated, or the method of selecting samples was unreported. The symptomatic status of the patients was largely unreported.49, 50, 51, 52, 53, 54, 55 Five studies included samples on which one test was conducted, followed by interim freezing, cooling, or other storage, before performance of the second test.50, 51, 52 Two studies did not report the sequence of testing on the two platforms or the interim handling or storage of the samples. The status of researcher blinding to either platform's results was not reported in any study.

Discussion

In our scoping review of 49 articles concerning the test performance characteristics of rRT-PCR and other NAATs used for the diagnosis of COVID-19, we observed several overarching themes. Clinical diagnosis by the case definitions for COVID-19 used in the early period of the pandemic does not correlate well with positive rates of COVID-19 rRT-PCR (Table 1). The result of the initial rRT-PCR performed on a patient, if negative, may not reflect the result after multiple repeated rRT-PCRs for that patient (Table 2). Several alternative NAAT methods, many of which are easier or faster to perform, may be comparable to standard rRT-PCR (Table 4). Proprietary multiplex, automated, and/or point-of-care methods are comparable in accuracy to rRT-PCR (Table 5) and to each other (Table 6), although the Abbott ID NOW SARS-CoV-2 test appears to have lower comparative agreement with other platforms.48, 49, 50, 51, 52 These findings should be viewed cautiously, as the SARS-CoV-2 tests in these studies have not undergone the rigorous evaluation necessary for FDA approval, owing to the emergency created by the COVID-19 pandemic. In addition, we found substantial heterogeneity among the available studies in test types, reference standards, metrics, and details of study design and methodology. We categorized the included studies by four different reference standard strategies: clinical diagnosis/case definitions (Table 1), repeated index testing (Table 2), composite reference standard (Table 3), and rRT-PCR (Tables 4 and 5). We also identified a fifth category in which, instead of using a reference standard, comparative agreement between two NAAT platforms was calculated (Tables 5 and 6). The main limitation of the first group of studies (Table 1) was the use of a "case definition" as the reference standard to report a "positive rate" of rRT-PCR.
During novel disease outbreaks, standard case definitions are often developed to assist clinicians in case identification before a diagnostic test is available. Unfortunately, the studies in this group were unable to use a clear case definition; instead, they refer to a population of "suspected cases" for which the definition is not reported.7, 8, 9 Because this group enrolled patients prior to February 15, 2020 in China, during the period in which the Chinese National Guideline for Diagnosis and Treatment of COVID-19 (NGDTC) published five different versions of the COVID-19 case definition, the case definitions in use at the time of these studies varied. A recent study estimated that if a single guideline (specifically, version 5 of the NGDTC) had been used to identify cases from the beginning of the outbreak to February 20, 2020, there would have been more than three times as many identified cases in Hubei province. This is relevant to our review because the two largest studies evaluating the rRT-PCR positive rate of patients with a clinical diagnosis of COVID-19 took place in Wuhan, Hubei province, and included patients evaluated before February 14, 2020 (Supplementary Table 2). This sensitivity of case counts to the case definition in use undermines the validity of the reported "positive rate" of rRT-PCR in these studies. The second group assessed rRT-PCR test performance characteristics via repeated index rRT-PCR testing (Table 2). Most studies in this group reported "sensitivity" by dividing the number of participants with positive baseline rRT-PCRs by the total number of participants who eventually had a positive rRT-PCR after repeated measurements. While such an approach may have some advantages over the use of a case definition alone as a reference standard, this strategy is nonetheless an imperfect solution with its own set of inherent limitations.
SARS-CoV-2 infection is transient and the associated viral loads are time-varying because of the natural pathophysiology of the infection. Therefore, the time interval between each repeated test becomes crucially important, and even relatively small time differences (and/or lack of uniformly used intervals) could complicate the interpretation of re-test results and their quality as reference standards. Furthermore, repeated use of the same test as a reference standard for itself does not eliminate the inaccuracies or limitations of the test. Such comparisons ultimately reflect the reliability of the test (assuming a short, uniform time interval between tests), rather than providing a true view of test accuracy. The third group of three studies calculated test performance characteristics of rRT-PCR according to a composite reference standard (Table 3). Using arbitrary rules to combine multiple different and imperfect tests inevitably creates a reference standard with some degree of bias. Furthermore, all three studies in this group included the test under evaluation as part of the composite reference standard, which leads to additional bias, described below. Use of a biased composite standard is likely to lead to reduced sensitivity, among other errors affecting true test performance characteristics. The fourth group of studies evaluated SARS-CoV-2 diagnostic tests that are under development as well as proprietary testing platforms (most of which are based on standard rRT-PCR methods). These studies used traditional rRT-PCR as a reference standard; results are summarized in Tables 4 and 5, respectively. Importantly, while these studies were not designed to estimate the accuracy of rRT-PCR, their results indicate that the index tests did not identify significantly more positive samples than rRT-PCR. Finally, the last group of studies compared SARS-CoV-2 NAAT platforms (Table 6). 
These comparative accuracy studies examined the agreement between two non-reference-standard tests. Although most of the testing platforms evaluated in these studies were based on standard rRT-PCR, the agreement between two non-reference-standard tests is not equivalent to test accuracy, as mentioned previously. This scoping review is limited by the lack of reporting of several key study features in the majority of the articles evaluated, which is itself an important indicator of quality and potential bias. Based on the QUADAS-2 criteria, most of the included studies raised concern for bias (Supplementary Table 3). The most prominent concerns were unclear inclusion/exclusion criteria, unclear methods of enrollment or selection of patients and samples, and unclear handling of indeterminate/inconclusive and invalid results. Additionally, many of the studies used a so-called "two-gate" (case-control) design, in which cases and controls were known and selected ahead of time, rather than performing the test on a group of patients or samples with suspected COVID-19. These factors likely introduce bias that significantly confounds the results of the studies; thus, the accuracy of the tests may not generalize to settings with different prevalences (such as asymptomatic screening or other age groups). Furthermore, few studies evaluated both the index and reference tests simultaneously or within a short period of time, which is key to avoiding biases caused by changes in the patient's true disease status; this bias can also affect the diagnostic accuracy of the index test. The best approach to determining diagnostic test performance characteristics in the absence of a "gold" standard is an open question in diagnostic accuracy methodology. While many methods have been described, there are only a few well-defined statistical approaches that use a reference standard in lieu of a gold standard, reviewed elsewhere.
Latent class analysis is one commonly used approach when neither the true error rates of the reference standard nor the true prevalence of the disease are known. This approach uses the results of a set of imperfect tests to estimate parameters related to sensitivity, specificity, and prevalence, often using maximum likelihood methods. However, it is not the only method available, and every method has its own strengths and limitations. Therefore, studies that attempt to estimate test characteristics warrant careful interpretation to account for and clarify the inherent limitations of assessing accuracy-related metrics when a gold standard is unavailable. Evaluation of the performance characteristics of SARS-CoV-2 diagnostic tests is vital to control of the ongoing COVID-19 pandemic. While more than 200 SARS-CoV-2 molecular diagnostic tests have received FDA EUAs, this scoping review shows that the performance of few of these tests has been assessed appropriately. The lack of robust test performance assessment that we noted in many studies published to date is undoubtedly due in part to the critical need for tests, which resulted in accelerated test development. However, our scoping review also uncovered imperfect methods for estimating diagnostic test performance in the absence of a gold standard and demonstrates that the accuracy of these tests should be interpreted with caution. Future studies would benefit from employing statistical methods such as latent class analysis and the other methods referenced above to accurately analyze their data. Indeed, instituting national requirements for test performance analysis and reporting, perhaps based on the existing FDA guidelines on diagnostic tests, would advance the goal of standardizing the evaluation of SARS-CoV-2 diagnostic test performance.
Such an initiative would lead to statistically robust conclusions regarding the accuracy of the index test, which would in turn support hospitals and clinicians as they determine the optimal test to use for COVID-19 diagnosis.
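The latent class approach discussed above can be illustrated with a minimal two-class EM sketch. The conditional-independence model, the simulated counts, and the starting values are our assumptions for illustration, not a method prescribed by any of the included studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate three conditionally independent imperfect tests, no gold standard.
n, prev = 2000, 0.3
true_se = np.array([0.90, 0.80, 0.85])   # assumed true sensitivities
true_sp = np.array([0.95, 0.97, 0.90])   # assumed true specificities
disease = rng.random(n) < prev
p_pos = np.where(disease[:, None], true_se, 1 - true_sp)
results = (rng.random((n, 3)) < p_pos).astype(float)

def lca_em(x, n_iter=500):
    """Two-class latent class EM: jointly estimate prevalence and each
    test's sensitivity/specificity from binary results alone."""
    pi = 0.5
    se = np.full(x.shape[1], 0.7)
    sp = np.full(x.shape[1], 0.7)
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is truly positive.
        l1 = pi * np.prod(np.where(x == 1, se, 1 - se), axis=1)
        l0 = (1 - pi) * np.prod(np.where(x == 1, 1 - sp, sp), axis=1)
        w = l1 / (l1 + l0)
        # M-step: posterior-weighted re-estimates of all parameters.
        pi = w.mean()
        se = (w[:, None] * x).sum(axis=0) / w.sum()
        sp = ((1 - w)[:, None] * (1 - x)).sum(axis=0) / (1 - w).sum()
    return pi, se, sp

pi_hat, se_hat, sp_hat = lca_em(results)
```

With three tests the two-class model is just identified; with only two tests (the situation in many of the reviewed studies) it is not, which is one reason the review calls for careful interpretation when no gold standard exists.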

Declaration of Competing Interest

None
References: 58 in total (first 10 shown)

1.  Reverse-Transcription Recombinase-Aided Amplification Assay for Rapid Detection of the 2019 Novel Coronavirus (SARS-CoV-2).

Authors:  Guanhua Xue; Shaoli Li; Weiwei Zhang; Bing Du; Jinghua Cui; Chao Yan; Lei Huang; Lu Chen; Linqing Zhao; Yu Sun; Nannan Li; Hanqing Zhao; Yanling Feng; Zhimin Wang; Shiyu Liu; Qun Zhang; Xianghui Xie; Di Liu; Hailan Yao; Jing Yuan
Journal:  Anal Chem       Date:  2020-07-10       Impact factor: 6.986

2.  QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies.

Authors:  Penny F Whiting; Anne W S Rutjes; Marie E Westwood; Susan Mallett; Jonathan J Deeks; Johannes B Reitsma; Mariska M G Leeflang; Jonathan A C Sterne; Patrick M M Bossuyt
Journal:  Ann Intern Med       Date:  2011-10-18       Impact factor: 25.391

3.  Clinical Evaluation of the cobas SARS-CoV-2 Test and a Diagnostic Platform Switch during 48 Hours in the Midst of the COVID-19 Pandemic.

Authors:  Mario Poljak; Miša Korva; Nataša Knap Gašper; Kristina Fujs Komloš; Martin Sagadin; Tina Uršič; Tatjana Avšič Županc; Miroslav Petrovec
Journal:  J Clin Microbiol       Date:  2020-05-26       Impact factor: 5.948

4.  Comparison of Two High-Throughput Reverse Transcription-PCR Systems for the Detection of Severe Acute Respiratory Syndrome Coronavirus 2.

Authors:  Arryn R Craney; Priya D Velu; Michael J Satlin; Kathy A Fauntleroy; Katrina Callan; Amy Robertson; Marisa La Spina; Beryl Lei; Anqi Chen; Tricia Alston; Anna Rozman; Massimo Loda; Hanna Rennert; Melissa Cushing; Lars F Westblade
Journal:  J Clin Microbiol       Date:  2020-07-23       Impact factor: 5.948

5.  Performance of Abbott ID Now COVID-19 Rapid Nucleic Acid Amplification Test Using Nasopharyngeal Swabs Transported in Viral Transport Media and Dry Nasal Swabs in a New York City Academic Institution.

Authors:  Atreyee Basu; Tatyana Zinger; Kenneth Inglima; Kar-Mun Woo; Onome Atie; Lauren Yurasits; Benjamin See; Maria E Aguero-Rosenfeld
Journal:  J Clin Microbiol       Date:  2020-07-23       Impact factor: 5.948

6.  Five-minute point-of-care testing for SARS-CoV-2: Not there yet.

Authors:  Catherine A Hogan; Malaya K Sahoo; ChunHong Huang; Natasha Garamani; Bryan Stevens; James Zehnder; Benjamin A Pinsky
Journal:  J Clin Virol       Date:  2020-05-01       Impact factor: 3.168

7.  Clinical Performance of the Luminex NxTAG CoV Extended Panel for SARS-CoV-2 Detection in Nasopharyngeal Specimens from COVID-19 Patients in Hong Kong.

Authors:  Jonathan Hon-Kwan Chen; Cyril Chik-Yan Yip; Jasper Fuk-Woo Chan; Rosana Wing-Shan Poon; Kelvin Kai-Wang To; Kwok-Hung Chan; Vincent Chi-Chung Cheng; Kwok-Yung Yuen
Journal:  J Clin Microbiol       Date:  2020-07-23       Impact factor: 5.948

8.  Comparison of different samples for 2019 novel coronavirus detection by nucleic acid amplification tests.

Authors:  Chunbao Xie; Lingxi Jiang; Guo Huang; Hong Pu; Bo Gong; He Lin; Shi Ma; Xuemei Chen; Bo Long; Guo Si; Hua Yu; Li Jiang; Xingxiang Yang; Yi Shi; Zhenglin Yang
Journal:  Int J Infect Dis       Date:  2020-02-27       Impact factor: 3.623

9.  Synthesis without meta-analysis (SWiM) in systematic reviews: reporting guideline.

Authors:  Mhairi Campbell; Joanne E McKenzie; Amanda Sowden; Srinivasa Vittal Katikireddi; Sue E Brennan; Simon Ellis; Jamie Hartmann-Boyce; Rebecca Ryan; Sasha Shepperd; James Thomas; Vivian Welch; Hilary Thomson
Journal:  BMJ       Date:  2020-01-16

10.  Comparison of SARS-CoV-2 detection from nasopharyngeal swab samples by the Roche cobas 6800 SARS-CoV-2 test and a laboratory-developed real-time RT-PCR test.

Authors:  Elisabet Pujadas; Nnaemeka Ibeh; Matthew M Hernandez; Aneta Waluszko; Tatyana Sidorenko; Vanessa Flores; Biana Shiffrin; Numthip Chiu; Alicia Young-Francois; Michael D Nowak; Alberto E Paniz-Mondolfi; Emilia M Sordillo; Carlos Cordon-Cardo; Jane Houldsworth; Melissa R Gitman
Journal:  J Med Virol       Date:  2020-05-22       Impact factor: 20.693

View more
  22 in total

1.  The performance of the SARS-CoV-2 RT-PCR test as a tool for detecting SARS-CoV-2 infection in the population.

Authors:  Andreas Stang; Johannes Robers; Birte Schonert; Karl-Heinz Jöckel; Angela Spelsberg; Ulrich Keil; Paul Cullen
Journal:  J Infect       Date:  2021-06-01       Impact factor: 6.072

2.  The Role of Anticoagulation in Post-COVID-19 Concomitant Stroke, Myocardial Infarction, and Left Ventricular Thrombus: A Case Report.

Authors:  Phool Iqbal; Bushra Laswi; Muhammad Bilal Jamshaid; Aamir Shahzad; Hammad Shabir Chaudhry; Dawlat Khan; Muhammad Sohaib Qamar; Zohaib Yousaf
Journal:  Am J Case Rep       Date:  2021-01-15

3.  A Software Tool for Calculating the Uncertainty of Diagnostic Accuracy Measures.

Authors:  Theodora Chatzimichail; Aristides T Hatjimihail
Journal:  Diagnostics (Basel)       Date:  2021-02-27

4.  Development and External Validation of a Machine Learning Tool to Rule Out COVID-19 Among Adults in the Emergency Department Using Routine Blood Tests: A Large, Multicenter, Real-World Study.

Authors:  Timothy B Plante; Aaron M Blau; Adrian N Berg; Aaron S Weinberg; Ik C Jun; Victor F Tapson; Tanya S Kanigan; Artur B Adib
Journal:  J Med Internet Res       Date:  2020-12-02       Impact factor: 5.428

5.  Detection of SARS-CoV-2 Infection in Human Nasopharyngeal Samples by Combining MALDI-TOF MS and Artificial Intelligence.

Authors:  Meritxell Deulofeu; Esteban García-Cuesta; Eladia María Peña-Méndez; José Elías Conde; Orlando Jiménez-Romero; Enrique Verdú; María Teresa Serrando; Victoria Salvadó; Pere Boadas-Vaello
Journal:  Front Med (Lausanne)       Date:  2021-04-01

6.  Sensitivity of the Molecular Test in Saliva for Detection of COVID-19 in Pediatric Patients With Concurrent Conditions.

Authors:  Ana Laura Guzmán-Ortiz; Abraham Josué Nevárez-Ramírez; Briceida López-Martínez; Israel Parra-Ortega; Tania Angeles-Floriano; Nancy Martínez-Rodríguez; Lourdes Jamaica-Balderas; Daniela De la Rosa-Zamboni; Fernando Ortega-Riosvelasco; Carlos Mauricio Jaramillo-Esparza; Sergio René Bonilla-Pellegrini; Irineo Reyna-Trinidad; Horacio Márquez-González; Oscar Medina-Contreras; Héctor Quezada
Journal:  Front Pediatr       Date:  2021-04-12       Impact factor: 3.418

7.  Utility of Routine Laboratory Biomarkers to Detect COVID-19: A Systematic Review and Meta-Analysis.

Authors:  Jana Suklan; James Cheaveau; Sarah Hill; Samuel G Urwin; Kile Green; Amanda Winter; Timothy Hicks; Anna E Boath; Ashleigh Kernohan; D Ashley Price; A Joy Allen; Eoin Moloney; Sara Graziadio
Journal:  Viruses       Date:  2021-04-30       Impact factor: 5.048

8.  Higher Accuracy of Lung Ultrasound over Chest X-ray for Early Diagnosis of COVID-19 Pneumonia.

Authors:  Javier Martínez Redondo; Carles Comas Rodríguez; Jesús Pujol Salud; Montserrat Crespo Pons; Cristina García Serrano; Marta Ortega Bravo; Jose María Palacín Peruga
Journal:  Int J Environ Res Public Health       Date:  2021-03-27       Impact factor: 3.390

9.  Assessment of the Diagnostic Ability of Four Detection Methods Using Three Sample Types of COVID-19 Patients.

Authors:  Fei Yu; Guoliang Xie; Shufa Zheng; Dongsheng Han; Jiaqi Bao; Dan Zhang; Baihuan Feng; Qi Wang; Qianda Zou; Ruonan Wang; Xianzhi Yang; Weizhen Chen; Bin Lou; Yu Chen
Journal:  Front Cell Infect Microbiol       Date:  2021-06-07       Impact factor: 5.293

10.  Deep learning based detection of COVID-19 from chest X-ray images.

Authors:  Sarra Guefrechi; Marwa Ben Jabra; Adel Ammar; Anis Koubaa; Habib Hamam
Journal:  Multimed Tools Appl       Date:  2021-07-19       Impact factor: 2.757
