Literature DB >> 32968883

Diagnostic accuracy and interobserver variability of CO-RADS in patients with suspected coronavirus disease-2019: a multireader validation study.

Davide Bellini^1,2, Nicola Panvini³, Marco Rengo³, Simone Vicini³, Miriam Lichtner⁴, Tiziana Tieghi⁴, Dea Ippoliti³, Federica Giulio³, Elena Orlando³, Mario Iozzino⁵, Maria Grazia Ciolfi⁵, Sarah Montechiarello⁵, Ugo d'Ambrosio⁵, Emanuele d'Adamo⁵, Chiara Gambaretto⁵, Stefano Panno⁵, Vanessa Caldon⁵, Cesare Ambrogi⁵, Iacopo Carbone³.

Abstract

OBJECTIVE: To conduct a multireader validation study to evaluate the interobserver variability and the diagnostic accuracy for the lung involvement by COVID-19 of COVID-19 Reporting and Data System (CO-RADS) score.
METHODS: This retrospective study included consecutive symptomatic patients who underwent chest CT and reverse transcriptase-polymerase chain reaction (RT-PCR) from March 2020 to May 2020 for suspected COVID-19. Twelve readers with different levels of expertise independently scored each CT using the CO-RADS scheme for detecting pulmonary involvement by COVID-19. Receiver operating characteristic (ROC) curves were computed to investigate diagnostic yield. Fleiss' kappa statistics was used to evaluate interreader agreement.
RESULTS: A total of 572 patients (mean age, 63 ± 20 [standard deviation]; 329 men; 142 patients with COVID-19 and 430 patients without COVID-19) were evaluated. There was a moderate agreement for CO-RADS rating among all readers (Fleiss' K = 0.43 [95% CI 0.42-0.44]) with a substantial agreement for CO-RADS 1 category (Fleiss' K = 0.61 [95% CI 0.60-0.62]) and moderate agreement for CO-RADS 5 category (Fleiss' K = 0.60 [95% CI 0.58-0.61]). ROC analysis showed the CO-RADS score ≥ 4 as the optimal threshold, with a cumulative area under the curve of 0.72 (95% CI 66-78%), sensitivity 61% (95% CI 52-69%), and specificity 81% (95% CI 77-84%).
CONCLUSION: CO-RADS showed high diagnostic accuracy and moderate interrater agreement across readers with different levels of expertise. Specificity is higher than previously thought and that could lead to reconsider the role of CT in this clinical setting. KEY POINTS: • COVID-19 Reporting and Data System (CO-RADS) demonstrated a good diagnostic accuracy for lung involvement by COVID-19 with an average AUC of 0.72 (95% CI 67-75%). • When a threshold of ≥ 4 was used, sensitivity and specificity were 61% (95% CI 52-69%) and 81% (95% CI 76-84%), respectively. • There was an overall moderate agreement for CO-RADS rating across readers with different levels of expertise (Fleiss' K = 0.43 [95% CI 0.42-0.44]).

Entities: Chemical

Keywords: COVID-19; ROC curve; Sensitivity and specificity; Severe acute respiratory syndrome coronavirus 2; Tomography, X-ray computed

Mesh：

Year: 2020 PMID： 32968883 PMCID： PMC7510765 DOI： 10.1007/s00330-020-07273-y

Source DB: PubMed Journal: Eur Radiol ISSN： 0938-7994 Impact factor: 5.315

Introduction

The coronavirus disease-2019 (COVID-19) outbreak is currently straining numerous health systems throughout the world. Timely identification of SARS-CoV-2 positive patients is proving to be the key to contain the spread of the virus effectively. Reverse transcriptase-polymerase chain reaction (RT-PCR) is considered the reference standard for the diagnosis of COVID-19. However, a not negligible number of initial false-negative results have been reported [1-3], and many areas throughout the world are facing the challenge of relative shortage and unavailability of RT-PCR tests [4]. On the other hand, the role of CT in the workup of patients with suspected COVID-19 is still under debate. Whereas the early evidence from China reported a high sensitivity [5], suggesting that CT could play a pivotal role in the rapid detection of infected patients, many scientific societies have recommended not to use CT for disease screening, especially in asymptomatic patients, due to its low specificity in differentiating COVID-19 from other viral infections that could present with overlapping imaging features [6, 7]. Moreover, as underlined in two recently published meta-analyses, the high variability of CT findings considered to define a CT scan consistent with SARS-CoV-2 pneumonia introduced a relevant bias hampering the applicability of results and leading to an underestimation of the real specificity of CT [1, 8]. In this scenario, the introduction of standardized reporting systems for patients with suspected COVID-19 has been extensively advocated with the aim to improve communication with referring physicians and make CT diagnostic accuracy data comparable across different institutions to gather scientific evidence [9, 10]. Trying to accomplish this task, the Dutch Radiological Society has recently developed the COVID-19 Reporting and Data System (CO-RADS) and tested it on a sample of symptomatic patients, demonstrating a satisfying diagnostic yield for predicting COVID-19 with substantial interreader agreement [11]. However, to promote its introduction into clinical practice, a more extensive validation of the CO-RADS should be provided, especially considering a larger patient sample and radiologists with different levels of experience as readers. Therefore, the purpose of our study was to investigate, in a large sample of consecutive patients and through the analysis of multiple readers with different levels of expertise, whether the CO-RADS score is a valid tool in order to identify lung involvement in patients with suspected COVID-19.

Materials and methods

This retrospective, single-center study was approved by our institutional review board and written informed consent was waived. Reporting was done in accordance with the Standards for Reporting Diagnostic Accuracy Initiative (STARD) recommendations [12]

Study participants

We retrospectively reviewed medical records from our university hospital archives to search for consecutive patients who underwent chest CT and RT-PCR testing for suspected COVID-19, from March 9, 2020, to May 3, 2020. Chest CT and RT-PCR testing were performed for clinical suspicion of COVID-19 based on the presence of at least one of the following respiratory tract infection symptoms: (I) fever higher > 37.5 °C, (II) cough, and (III) clinically relevant dyspnea, with or without a history suggestive of exposure to SARS-CoV-2 including (a) close relationship with a confirmed positive individual, (b) travel or residential history in areas with high prevalence of disease, or (c) contact with individuals with fever or respiratory symptoms from those areas within 14 days prior to CT scan. Exclusion criteria were as follows: (I) lack of RT-PCR testing results, (II) time interval between CT scan and RT-PCR longer than 7 days, and (III) uninterpretable CT scans due to motion artifacts or incomplete scanning.

CT Technique and image analysis

All CT scans were performed using a single 64-slice CT scanner (LightSpeed VCT, GE Healthcare). All patients were scanned on supine position during single deep-inspiration breath-hold. No contrast medium was administered. Scanning parameters were as follows: tube voltage of 100 or 120 kV according to the patient’s body size, variable tube current with automatic mAs modulation (Smart mA, GE Healthcare), 0.6-mm section thickness and a pitch of 1.388, and iterative reconstruction techniques (ASIR) at 40%. All CT scans were retrieved from the Picture Archiving and Communication Systems, anonymized and uploaded onto a dedicated workstation (SuiteEstensa 2.0, EBIT - Esaote Group Company) for image analysis. Each CT scan was independently analyzed by twelve readers, stratified into four different groups according to their experience as follows: high-experience group (R1, R2, and R3 [D.B., U.D., and E.D.], board-certified radiologists with more than 10 years of experience in thoracic imaging and more than 100 COVID-19 positive CTs reported); intermediate-experience group (R4, R5, and R6 [S.M., M.G.C., and M.I.], board-certified radiologists with more than 50 and less than 100 COVID-19-positive CTs reported); low-experience group (R7, R8, and R9 [M.D.I., F.G., and E.O.], radiologists in-training with less than 50 COVID-19-positive CTs reported); and group of radiographers (R10, R11, and R12 [S.P., V.C., and C.G.], all with a background of more than 50 CTs performed on COVID-19-positive patients). A training set of 30 CTs, in which findings corresponding to each CO-RADS category were equally distributed, was provided to each reader. Furthermore, all readers had a general familiarity with CO-RADS, having adopted it at our institution since its introduction, approximately a month before the start of our study. All readers scored each CT scan assigning a CO-RADS category reflecting their overall suspicion of COVID-19 lung involvement as follows: CO-RADS 1, very low probability; CO-RADS 2, low probability; CO-RADS 3, equivocal/unsure probability; CO-RADS 4, high probability; and CO-RADS 5, very high probability. For a detailed description of all the CT findings associated with each CO-RADS category, please refer to the original paper by Prokop et al [11]. All readers were blinded to the RT-PCR results, to the clinical information and radiological reports of individual patients, and to the disease prevalence in the study sample.

Reference standard

RT-PCR testing performed on respiratory specimens obtained by nasopharyngeal and throat swabs served as a reference standard for the diagnosis of COVID-19. Clinical information and index test results were not available to the assessors of the reference standard. As per our institution guidelines, patients with initial negative RT-PCR, but CT findings suggestive of COVID-19, underwent repeated RT-PCR testing up to a maximum of three times within 7 days after CT scan. Patients who showed at least one positive RT-PCR were considered to be positive for COVID-19; otherwise, they were considered negative. Nonetheless, patients with initial negative RT-PCR and negative CT findings underwent a 14-day follow-up and were considered to be negative if no symptoms’ worsening or laboratory findings consistent with COVID-19 occurred.

Statistical analysis

Categorical variables were expressed as frequencies or percentages. Continuous variables were expressed as means ± standard deviations (SD). The Χ2 test was used to calculate differences in sex, symptoms, and number of RT-PCR testing between COVID+ and COVID− participants. The Mann-Whitney U test was performed to assess differences in age between the two groups. Fleiss’ kappa statistics were used to evaluate interreader agreement for CO-RADS rating both among all readers and among each group of reader. The following coefficients were applied: κ ≤ 0.20, slight agreement; κ = 0.21–0.40, fair agreement; κ = 0.41–0.60, moderate agreement; κ = 0.61–0.80, substantial agreement; and κ = 0.81–1.0, almost perfect agreement [13]. For each reader, the receiver operating characteristics curve (ROC) and the corresponding area under the curve (AUC) were calculated by using the DeLong et al method [14], to assess the CO-RADS diagnostic performance. Mean AUC across observers from the four different readers’ groups and their corresponding 95% confidence intervals (95% CI) were computed and a pairwise comparison of AUCs from all readers was performed by means of the DeLong et al method [14]. For each reader, the highest Youden index (J = sensitivity + specificity− 1) was calculated to select the optimal threshold to discriminate between COVID+ and COVID− participants, and the corresponding sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were computed. Inconclusive results (i.e., CO-RADS 3) were included in the analysis of the diagnostic performance and whether to treat them as positive or negative results depended on the ROC curve and Youden’s index analysis results. In addition, the number and the percentage of readings assigned to each CO-RADS category were determined for both COVID+ and COVID− participants. False-positive CO-RADS 4 and 5 patients and false-negative CO-RADS 1 and 2 patients were subsequently investigated to clarify the reason for erroneous classification. In all cases, p < 0.05 was considered the threshold for assessing statistical significance. All statistical analyses were performed with commercially available software (MedCalc Statistical Software version 19.2.5, MedCalc Software Ltd).

Results

Figure 1 portrays the study participants’ STARD flowchart. From 789 consecutive patients initially screened from our database, we included in the final study sample 572 patients (mean age, 63 years ± 20standard deviation [SD]; range, 8–96 years; including 332 men [mean age, 62 years ± 19 SD; range, 8–96 years] and 240 women [mean age, 64 years ± 21 SD; range, 12–96 years]). Two hundred and seventeen patients were excluded due to the lack of reference standard (n = 25), interval between CT scan and reference standard longer than 1 week (n = 187), or uninterpretable CT scans (n = 5). Detailed baseline characteristics of the patient study groups are reported in Table 1.

Fig. 1

Table 1

Patient characteristics

Characteristic	All participants (n = 572)	COVID-19+ (n = 142)	COVID-19– (n = 430)	p value
Sex
Male	329 (58%)	77 (54%)	252 (59%)	0.36
Female	243 (42%)	65 (46%)	178 (41%)
Age, years *	63 ± 20 (8–96)	63 ± 20 (8–96)	61 ± 19 (12–96)	0.11
Symptoms
Fever	267 (47%)	86 (61%)	181 (53%)	0.14
Cough	317 (55%)	88 (62%)	229 (67%)	0.26
Dyspnea	233 (41%)	77 (54%)	156 (46%)	0.09
RT-PCR testing
1	355 (62%)	16 (12%)	339 (79%)	< 0.001
2	85 (15%)	19 (13%)	66 (15%)
3	132 (23%)	107 (75%)	25 (6%)

Unless otherwise indicated, data are numbers with percentages in parentheses. COVID-19, coronavirus disease-2019; RT-PCR, reverse transcription polymerase chain reaction

*Data are means ± standard deviations, with ranges in parentheses

Study flowchart for the inclusion and exclusion criteria of the patient sample and CO-RADS ratings. Note: CO-RADS, COVID-19 Reporting and Data System; COVID-19, coronavirus disease-2019; RT-PCR, reverse transcriptase-polymerase chain reaction; PCR+, single or multiple RT-PCR testing positive for SARS-CoV-2 infection; mPCR−, multiple negative RT-PCR testing; sPCR-/FU-, single negative RT-PCR testing and negative follow-up during the 14 days following the CT scan Patient characteristics Unless otherwise indicated, data are numbers with percentages in parentheses. COVID-19, coronavirus disease-2019; RT-PCR, reverse transcription polymerase chain reaction *Data are means ± standard deviations, with ranges in parentheses Three hundred and fifty-five participants underwent a single RT-PCR testing, whereas 217 participants underwent multiple RT-PCR testing, including 85 patients who were tested twice, and 132 participants who underwent three consecutive RT-PCR testing. The median time interval between CT scan and reference standard was 2 days (range 0–7). One hundred and forty-two patients had at least one RT-PCR testing positive for SARS-CoV-2 infection, resulting in a disease prevalence in our study sample of 24.8%. Four hundred and thirty patients were tested negative for SARS-CoV-2 infection, including 91 patients with multiple negative RT-PCR testing, and 339 (78.8%) patients with initial negative RT-PCR and negative follow-up during the 14 days following the CT scan.

Interrater agreement

There was a moderate agreement for overall CO-RADS rating among all observers (Fleiss’ K = 0.43 [95% CI 0.42–0.44]). Regarding each individual CO-RADS category, there was a substantial agreement for CO-RADS 1 category (Fleiss’ K = 0.61 [95% CI 0.60–0.62], moderate agreement for CO-RADS 5 category (Fleiss’ K = 0.60 [95% CI 0.58–0.61]), and fair agreement for CO-RADS categories 2 and 4 (Fleiss’ K = 0.28 [95% CI 0.27–0.29] and 0.23 [95% CI 0.22–0.24], respectively), and poor agreement for CO-RADS 3 category (Fleiss’ K = 0.16 [95% CI 0.15–0.17]). Fleiss’ K values among each group of readers were as follows: high-experience group: 0.54 (95% CI 0.52–0.57), intermediate-experience group: 0.38 (95% CI 0.36–0.41), low-experience group: 0.66 (95% CI 0.64–0.69), and group of radiographers: 0.30 (95% CI 0.27–0.32).

CO-RADS diagnostic performance

Receiver operating characteristic analysis showed the CO-RADS score ≥ 4 as the optimal threshold to discern between patients with PCR+ from those with PCR−. Considering the high-experience reader group and CO-RADS score ≥ 4 as a threshold, the average value of area under the curve was 0.74 (95% CI 0.67–0.79) (Fig. 2), with a sensitivity of 59.1% (95% CI 0.52–0.67) and a specificity of 84.1% (95% CI 0.81–0.87).

Fig. 2

Receiver operating characteristic curve of high-experience group (a), intermediate-experience group (b), low-experience group (c), and group of radiographers (d) for predicting lung involvement by coronavirus disease-2019 using the COVID-19 Reporting and Data System (CO-RADS). Note: AUC, area under the curve The average value of area under the curve for intermediate-experience readers was 0.70 (95% CI 0.66–0.76) (Fig. 2), with a sensitivity of 62% (95% CI 54.2–69.9) and a specificity of 78.7% (95% CI 74.5–82.5). The average value of area under the curve for low-experience readers was 0.73 (95% CI 0.67-0.79) (Fig. 2), with a sensitivity of 62.7% (95% CI 53.2–70.1) and a specificity of 79.3% (95% CI 75.1–83). Averaging data from radiographers, we obtained an area under the curve of 0.68 (95% CI 0.62–0.75) (Fig. 2), a sensitivity of 40.8% (95% CI 33–49), and a specificity of 87.3% (95% CI 83.8–90.3). The diagnostic accuracy values for all readers are supplied in Table 2. When all datasets from each radiologist were blended, the optimal diagnostic threshold of ≥ 4 yielded an area under the curve of 0.72 (95% CI 66–78%) with a sensitivity of 61% (95% CI 52–69%) and specificity of 81% (95% CI 77–84%). Detailed results from pairwise comparison of the different readers’ AUC are displayed in Table E1 of the supplemental material; briefly, there was no difference in diagnostic accuracy between readers with the highest AUC from the three different radiologists’ groups (p > 0.05 for all comparisons).

Table 2

Readers diagnostic performance for lung involvement by COVID-19 when CO-RADS ≥ 4 was used as a positive threshold

Reader	TP	TN	FP	FN	Sensitivity (%)	Specificity (%)	PPV (%)	NPV (%)	AUC
R1	81	363	67	61	57.0 (81/142) [48.5–65.3]	84.4 (363/430) [80.6–87.7]	54.7 (81/148) [48.2–61.1]	85.6 (363/424) [83.1–87.8]	0.73 [0.69–0.79]
R2	85	369	61	57	59.9 (85/142) [51.3–68.0]	85.8 (369/430) [82.2–89.0]	58.2 (85/146) [51.6–64.6]	86.6 (369/426) [84.1–88.8]	0.74 [0.69–0.79]
R3	86	351	79	56	60.6 (86/142) [52.0–68.7]	81.6 (351/430) [77.6–85.2]	52.1 (86/165) [46.1–58.0]	86.2 (351/407) [83.6–88.5]	0.74 [0.69–0.79]
R4	89	345	85	53	62.7 (89/142) [54.2–70.6]	80.2 (345/430) [76.1–83.9]	51.1 (89/174) [45.4–56.8]	86.7 (345/398) [84.0–89.0]	0.72 [0.68–0.77]
R5	86	331	99	56	60.6 (86/142) [52.0–68.7]	77.0 (331/430) [72.7–80.9]	46.5 (86/185) [41.1–51.9]	85.5 (331/387) [82.7–87.9]	0.70 [0.65–0.75]
R6	89	340	90	53	62.7 (89/142) [54.2–70.6]	79.1 (340/430) [74.9–82.8]	49.7 (89/179) [44.2–55.3]	86.5 (340/393) [83.8–88.9]	0.70 [0.66–0.76]
R7	90	333	97	52	63.4 (90/142) [54.9–71.3]	77.4 (333/430) [73.2–81.3]	48.1 (90/187) [42.8–53.5]	86.5 (333/385) [83.7–88.9]	0.71 [0.66–0.76]
R8	88	344	86	54	62.0 (88/142) [53.5–70.0]	80.0 (344/430) [75.9–83.7]	50.6 (88/174) [44.9–56.3]	86.4 (344/398) [83.7–88.8]	0.73 [0.68–0.78]
R9	88	346	84	54	62.0 (88/142) [53.5–70.0]	80.5 (346/430) [76.4–84.1]	51.2 (88/172) [45.4–56.9]	86.5 (346/400) [83.8–8.88]	0.72 [0.67–0.77]
R10	67	379	51	75	47.2 (67/142) [38.8–55.7]	88.1 (379/430) [84.7–91.0]	56.8 (67/118) [49.0–64.2]	83.5 (379/454) [81.2–85.6]	0.71 [0.66–0.75]
R11	77	364	66	65	54.2 (77/142) [45.7–62.6]	84.6 (364/430) [80.9–87.9]	53.8 (77/143) (47.1–60.4)	84.8 (364/429) [82.3–87.1]	0.71 [0.65–0.75]
R12	30	384	46	112	21.1 (30/142) [14.7–28.8]	89.3 (384/430) [86.0–92.1]	39.5 (30/76) [30.0–49.8]	77.4 (384/496) [75.8–79.0]	0.64 [0.56–0.66]

Data in parentheses are numerators and denominators, with 95% confidence intervals in brackets

COVID-19, coronavirus disease-2019; CO-RADS, COVID-19 Reporting and Data System; TP, true positive; TN, true negative; FP, false positive; FN, false negative; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the ROC curve

Readers diagnostic performance for lung involvement by COVID-19 when CO-RADS ≥ 4 was used as a positive threshold 57.0 (81/142) [48.5–65.3] 84.4 (363/430) [80.6–87.7] 54.7 (81/148) [48.2–61.1] 85.6 (363/424) [83.1–87.8] 0.73 [0.69–0.79] 59.9 (85/142) [51.3–68.0] 85.8 (369/430) [82.2–89.0] 58.2 (85/146) [51.6–64.6] 86.6 (369/426) [84.1–88.8] 0.74 [0.69–0.79] 60.6 (86/142) [52.0–68.7] 81.6 (351/430) [77.6–85.2] 52.1 (86/165) [46.1–58.0] 86.2 (351/407) [83.6–88.5] 0.74 [0.69–0.79] 62.7 (89/142) [54.2–70.6] 80.2 (345/430) [76.1–83.9] 51.1 (89/174) [45.4–56.8] 86.7 (345/398) [84.0–89.0] 0.72 [0.68–0.77] 60.6 (86/142) [52.0–68.7] 77.0 (331/430) [72.7–80.9] 46.5 (86/185) [41.1–51.9] 85.5 (331/387) [82.7–87.9] 0.70 [0.65–0.75] 62.7 (89/142) [54.2–70.6] 79.1 (340/430) [74.9–82.8] 49.7 (89/179) [44.2–55.3] 86.5 (340/393) [83.8–88.9] 0.70 [0.66–0.76] 63.4 (90/142) [54.9–71.3] 77.4 (333/430) [73.2–81.3] 48.1 (90/187) [42.8–53.5] 86.5 (333/385) [83.7–88.9] 0.71 [0.66–0.76] 62.0 (88/142) [53.5–70.0] 80.0 (344/430) [75.9–83.7] 50.6 (88/174) [44.9–56.3] 86.4 (344/398) [83.7–88.8] 0.73 [0.68–0.78] 62.0 (88/142) [53.5–70.0] 80.5 (346/430) [76.4–84.1] 51.2 (88/172) [45.4–56.9] 86.5 (346/400) [83.8–8.88] 0.72 [0.67–0.77] 47.2 (67/142) [38.8–55.7] 88.1 (379/430) [84.7–91.0] 56.8 (67/118) [49.0–64.2] 83.5 (379/454) [81.2–85.6] 0.71 [0.66–0.75] 54.2 (77/142) [45.7–62.6] 84.6 (364/430) [80.9–87.9] 53.8 (77/143) (47.1–60.4) 84.8 (364/429) [82.3–87.1] 0.71 [0.65–0.75] 21.1 (30/142) [14.7–28.8] 89.3 (384/430) [86.0–92.1] 39.5 (30/76) [30.0–49.8] 77.4 (384/496) [75.8–79.0] 0.64 [0.56–0.66] Data in parentheses are numerators and denominators, with 95% confidence intervals in brackets COVID-19, coronavirus disease-2019; CO-RADS, COVID-19 Reporting and Data System; TP, true positive; TN, true negative; FP, false positive; FN, false negative; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the ROC curve We reported a total number of 6864 CO-RADS scorings from twelve readers (Fig. 3). Among them, 911 (8.9%) in 185 negative patients for SARS-CoV-2 infection were scored as 4 and 5 and resulted in false-positive chest CT. The post hoc analysis of those cases showed that 37.8 % (70/185) were effectively a 4 and 5 CO-RADS pattern and 62.2% (115/185) were erroneous evaluations from readers. False-negative CO-RADS 1 and CO-RADS 2 were found in 586 (8.5%) readings (96 patients). Among them, 44% (42/96) were correctly identified as CO-RADS 1 and 2 categories, while 56% (54/96) were erroneous evaluations. Representative CT images for each CO-RADS category are illustrated in Fig. 4.

Fig. 3

Fig. 4

Pictorial overview portraying axial chest CT images from our study population illustrating imaging findings characteristics of the CO-RADS 1 (a, b), CO-RADS 2 (c, d), CO-RADS 3 (e, f), CO-RADS 4 (g, h), and CO-RADS 5 (i, j, k, l) scores and their corresponding descriptions. Note: CO-RADS, COVID-19 Reporting and Data System

Distribution of final diagnosis among each cumulative CO-RADS score category. Red columns show the percentage of patients with positive RT-PCR (PCR+), dark green columns show the percentage of patients with multiple negative RT-PCR (mPCR−), and light green columns show the percentage of patients with initial negative RT-PCR and negative clinical follow-up the 14 days after CT scan (sPCR-/FU-). Note: CO-RADS, COVID-19 Reporting and Data System Pictorial overview portraying axial chest CT images from our study population illustrating imaging findings characteristics of the CO-RADS 1 (a, b), CO-RADS 2 (c, d), CO-RADS 3 (e, f), CO-RADS 4 (g, h), and CO-RADS 5 (i, j, k, l) scores and their corresponding descriptions. Note: CO-RADS, COVID-19 Reporting and Data System

Discussion

Since the early stage of the coronavirus disease-2019 (COVID-19) outbreak, the value of CT in the workup of patients with suspected SARS-CoV-2 infection has been controversial [15, 16]. The adoption of standardized reporting systems for lung involvement in COVID-19 has been encouraged in an effort to enhance communication with referring physicians and multicentric gathering of CT diagnostic accuracy data [9-11]. In our retrospective study, we aimed to validate the proposed COVID-19 Reporting and Data System (CO-RADS) in a retrospective multireader diagnostic accuracy study. Our results show that, when a threshold of CO-RADS ≥ 4 was used, readers with different levels of expertise were able to discriminate in approximately 30 s between patients with positive and negative reverse transcriptase-polymerase chain reaction (RT-PCR) testing, with a sensitivity of 61% (95% CI 52–69%), a specificity of 81% (95% CI 76–84%), and an AUC of 0.72 (95% CI 67–75%). Our findings are in contrast with those reported by the earlier literature on CT diagnostic performance for COVID-19. In detail, a recent meta-analysis estimated a pooled sensitivity and specificity for chest CT of 94% and 37%, respectively [8]. However, it has been underlined that most of the earlier studies investigating the diagnostic performance of CT for COVID-19 were flawed by interpretation bias [15, 17, 18]. Indeed, most of the studies published to date did not provide pre-specified criteria to differentiate between positive and negative CT scans. Therefore, it could be hypothesized that applying unclear CT signs to define a positive case may have resulted in an optimistic overestimation of CT sensitivity at the expense of a reduced specificity. On the other hand, our results suggest that the use of standardized threshold for CT findings evaluation provided by reporting systems such as CO-RADS could more realistically reflect the diagnostic value of CT in the daily clinical practice, characterized by a lower sensitivity and an improved specificity for COVID-19. CO-RADS yielded an overall moderate interreader agreement for the assessment of lung involvement in COVID-19 (Fleiss’ K = 0.43 [95% CI 0.42–0.44]). This is comparable to what is reported by Prokop et al, who found an overall moderate interrater agreement among the 8 observers included (Fleiss’ K = 0.47 [95% CI 0.45–0.49]) [11]. In the current study, CO-RADS yielded a lower diagnostic accuracy compared with what was reported by Prokop et al, who found a mean AUC of 0.91 (95% CI 0.85–0.97) for positive RT-PCR testing. We believe that this discrepancy could be related to the higher degree of expertise and familiarity with the CO-RADS scoring of the readers involved in the study by Prokop et al. However, we observed that CO-RADS diagnostic accuracy was not significantly affected by readers’ experience, when CT scans were read by radiologists with different levels of expertise, including radiologists in-training. These results suggest that the adoption of the CO-RADS reporting scheme could represent an effective strategy to facilitate and speed up the learning curve of radiologists in-training in the evaluation of chest CT scans of patients with suspected lung involvement by COVID-19. This latter aspect is noteworthy as, at many institutions, radiologists in-training are the first to assess CT scans, and they should be able to rapidly recognize typical CT findings of COVID-19. Indeed, the finding of CT patterns consistent with COVID-19 even in asymptomatic patients is not uncommon, and their recognition by radiologists in-training at the moment of CT scanning might limit the risk of spreading the contamination, allowing suspicious cases to be directed promptly to dedicated care pathways. This study has several limitations that should be addressed. First, this is a single-center, retrospective study; further multicenter studies with a prospective design are warranted to confirm our findings. Second, multiple RT-PCR testing to rule out SARS-CoV-2 infection was available only for 92 (21 %) of the participants who were tested negative at the initial RT-PCR; this could raise concerns about the reference standard reliability in this subset of participants, since a considerable rate of initial false-negative RT-PCR has been reported [1-3]. However, all patients with initial negative RT-PCR underwent clinical and laboratory monitoring for at least 14 days to confirm their negative status in accordance with guidelines from our institution. Third, our study sample is composed only of symptomatic patients admitted to the Emergency Department with suspected COVID-19, without including asymptomatic patients. We recognize that this may have biased patient selection toward participants with a more severe disease spectrum, thereby affecting the CO-RADS accuracy estimate. However, our study sample reflects the actual population for whom chest CT scan is recommended as a problem-solving tool [6, 7]. Moreover, the CO-RADS scheme has been developed specifically to be used in patients with moderate to severe symptoms [11]. Fourth, a final diagnosis was not available for patients with false-positive CT findings. Therefore, we were not able to assess the prevalence of co-morbidities or other respiratory tract infections, whose imaging findings may overlap with those typically observed in COVID-19. Additional studies addressing this issue are advisable to validate further the CO-RADS algorithm, since its performance may vary according to the different prevalence of conditions with overlapping CT characteristics. In conclusion, CO-RADS score, tested on a large sample of symptomatic patients, should be considered a valid tool for the identification of lung involvement in patients with suspected COVID-19, showing high diagnostic accuracy across readers with different levels of expertise. Specificity, in particular, is higher than previously thought and that could lead to reconsider the role of CT in this clinical setting. (DOCX 19 kb)

16 in total

Review 1. Chest CT and Coronavirus Disease (COVID-19): A Critical Review of the Literature to Date.

Authors: Constantine A Raptis; Mark M Hammer; Ryan G Short; Amar Shah; Sanjeev Bhalla; Andrew J Bierhals; Peter D Filev; Michael D Hope; Jean Jeudy; Seth J Kligerman; Travis S Henry
Journal: AJR Am J Roentgenol Date: 2020-04-16 Impact factor: 3.959

2. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR.

Authors: Yicheng Fang; Huangqi Zhang; Jicheng Xie; Minjie Lin; Lingjun Ying; Peipei Pang; Wenbin Ji
Journal: Radiology Date: 2020-02-19 Impact factor: 11.105

3. Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases.

Authors: Tao Ai; Zhenlu Yang; Hongyan Hou; Chenao Zhan; Chong Chen; Wenzhi Lv; Qian Tao; Ziyong Sun; Liming Xia
Journal: Radiology Date: 2020-02-26 Impact factor: 11.105

4. The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society.

Authors: Geoffrey D Rubin; Christopher J Ryerson; Linda B Haramati; Nicola Sverzellati; Jeffrey P Kanne; Suhail Raoof; Neil W Schluger; Annalisa Volpi; Jae-Joon Yim; Ian B K Martin; Deverick J Anderson; Christina Kong; Talissa Altes; Andrew Bush; Sujal R Desai; Onathan Goldin; Jin Mo Goo; Marc Humbert; Yoshikazu Inoue; Hans-Ulrich Kauczor; Fengming Luo; Peter J Mazzone; Mathias Prokop; Martine Remy-Jardin; Luca Richeldi; Cornelia M Schaefer-Prokop; Noriyuki Tomiyama; Athol U Wells; Ann N Leung
Journal: Radiology Date: 2020-04-07 Impact factor: 11.105

5. Radiological Society of North America Expert Consensus Document on Reporting Chest CT Findings Related to COVID-19: Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA.

Authors: Scott Simpson; Fernando U Kay; Suhny Abbara; Sanjeev Bhalla; Jonathan H Chung; Michael Chung; Travis S Henry; Jeffrey P Kanne; Seth Kligerman; Jane P Ko; Harold Litt
Journal: Radiol Cardiothorac Imaging Date: 2020-03-25

6. Coronavirus disease 2019 (COVID-19) imaging reporting and data system (COVID-RADS) and common lexicon: a proposal based on the imaging data of 37 studies.

Authors: Sana Salehi; Aidin Abedi; Sudheer Balakrishnan; Ali Gholamrezanezhad
Journal: Eur Radiol Date: 2020-04-28 Impact factor: 5.315

Review 7. Imaging Publications in the COVID-19 Pandemic: Applying New Research Results to Clinical Practice.

Authors: John Eng; David A Bluemke
Journal: Radiology Date: 2020-04-23 Impact factor: 11.105

8. Diagnostic Performance of CT and Reverse Transcriptase Polymerase Chain Reaction for Coronavirus Disease 2019: A Meta-Analysis.

Authors: Hyungjin Kim; Hyunsook Hong; Soon Ho Yoon
Journal: Radiology Date: 2020-04-17 Impact factor: 11.105

9. Chest Computed Tomography for Detection of Coronavirus Disease 2019 (COVID-19): Don't Rush the Science.

Authors: Michael D Hope; Constantine A Raptis; Travis S Henry
Journal: Ann Intern Med Date: 2020-04-08 Impact factor: 25.391

10. Chest CT for Typical Coronavirus Disease 2019 (COVID-19) Pneumonia: Relationship to Negative RT-PCR Testing.

Authors: Xingzhi Xie; Zheng Zhong; Wei Zhao; Chao Zheng; Fei Wang; Jun Liu
Journal: Radiology Date: 2020-02-12 Impact factor: 11.105

18 in total

Review 1. Thoracic imaging tests for the diagnosis of COVID-19.

Authors: Sanam Ebrahimzadeh; Nayaar Islam; Haben Dawit; Jean-Paul Salameh; Sakib Kazi; Nicholas Fabiano; Lee Treanor; Marissa Absi; Faraz Ahmad; Paul Rooprai; Ahmed Al Khalil; Kelly Harper; Neil Kamra; Mariska Mg Leeflang; Lotty Hooft; Christian B van der Pol; Ross Prager; Samanjit S Hare; Carole Dennie; René Spijker; Jonathan J Deeks; Jacqueline Dinnes; Kevin Jenniskens; Daniël A Korevaar; Jérémie F Cohen; Ann Van den Bruel; Yemisi Takwoingi; Janneke van de Wijgert; Junfeng Wang; Elena Pena; Sandra Sabongui; Matthew Df McInnes
Journal: Cochrane Database Syst Rev Date: 2022-05-16

2. An ML prediction model based on clinical parameters and automated CT scan features for COVID-19 patients.

Authors: Abhishar Sinha; Swati Purohit Joshi; Purnendu Sekhar Das; Soumya Jana; Rahuldeb Sarkar
Journal: Sci Rep Date: 2022-07-04 Impact factor: 4.996

3. Analysis of Diagnostic Modalities in Hospital-admitted Patients Evaluated for COVID-19.

Authors: Benedict Gereke; Andree Friedl; Jonas Rutishauser; Benedikt Wiggli; Tilo Niemann; Romana Calligaris-Maibach; Hans-Rudolf Schmid; Chiara Vanetta
Journal: In Vivo Date: 2022 May-Jun Impact factor: 2.406

4. Machine learning automatically detects COVID-19 using chest CTs in a large multicenter cohort.

Authors: Eduardo J Mortani Barbosa; Bogdan Georgescu; Shikha Chaganti; Gorka Bastarrika Aleman; Jordi Broncano Cabrero; Guillaume Chabin; Thomas Flohr; Philippe Grenier; Sasa Grbic; Nakul Gupta; François Mellot; Savvas Nicolaou; Thomas Re; Pina Sanelli; Alexander W Sauter; Youngjin Yoo; Valentin Ziebandt; Dorin Comaniciu
Journal: Eur Radiol Date: 2021-05-01 Impact factor: 5.315

5. Radiographers and COVID-19 pneumonia: Diagnostic performance using CO-RADS.

Authors: S Vicini; N Panvini; D Bellini; M Rengo; M Ciotola; M De Vivo; C Gambaretto; V Caldon; S Panno; C Del Borgo; I Carbone
Journal: Radiography (Lond) Date: 2021-04-30

Review 6. Comprehensive literature review on the radiographic findings, imaging modalities, and the role of radiology in the COVID-19 pandemic.

Authors: Aman Pal; Abulhassan Ali; Timothy R Young; Juan Oostenbrink; Akul Prabhakar; Amogh Prabhakar; Nina Deacon; Amar Arnold; Ahmed Eltayeb; Charles Yap; David M Young; Alan Tang; Subramanian Lakshmanan; Ying Yi Lim; Martha Pokarowski; Pramath Kakodkar
Journal: World J Radiol Date: 2021-09-28

7. Thoracic imaging tests for the diagnosis of COVID-19.

Authors: Nayaar Islam; Sanam Ebrahimzadeh; Jean-Paul Salameh; Sakib Kazi; Nicholas Fabiano; Lee Treanor; Marissa Absi; Zachary Hallgrimson; Mariska Mg Leeflang; Lotty Hooft; Christian B van der Pol; Ross Prager; Samanjit S Hare; Carole Dennie; René Spijker; Jonathan J Deeks; Jacqueline Dinnes; Kevin Jenniskens; Daniël A Korevaar; Jérémie F Cohen; Ann Van den Bruel; Yemisi Takwoingi; Janneke van de Wijgert; Johanna Aag Damen; Junfeng Wang; Matthew Df McInnes
Journal: Cochrane Database Syst Rev Date: 2021-03-16

Review 8. COVID-19 pandemic in flu season. Chest computed tomography - what we know so far.

Authors: Marcin Stański; Łukasz Gąsiorowski; Mateusz Wykrętowicz; Natalia Krystyna Majewska; Katarzyna Katulska
Journal: Pol J Radiol Date: 2021-12-31

9. Incidental Findings Suggestive of COVID-19 Pneumonia in Oncologic Patients Undergoing ¹⁸F-FDG PET/CT Studies: Association Between Metabolic and Structural Lung Changes.

Authors: Cristina Gamila Wakfie-Corieh; Federico Ferrando-Castagnetto; Alba María Blanes García; Marta García García-Esquinas; Aída Ortega Candil; Cristina Rodríguez Rey; María Nieves Cabrera-Martín; Ana Delgado Cano; José Luis Carreras Delgado
Journal: J Nucl Med Date: 2021-06-04 Impact factor: 10.057

10. Diagnostic accuracy and inter-observer agreement with the CO-RADS lexicon for CT chest reporting in COVID-19.

Authors: Anirudh Venugopalan Nair; Matthew McInnes; Bamil Jacob; Devendra Kumar; Deepak Kuttikatt Soman; Hanee Subair Valiyakath Subair; Parag Suresh Mahajan; Mohammed Ameer Hamza Shah; Mohammed Ali Saleh Sabawi; Mahmoud Al-Heidous
Journal: Emerg Radiol Date: 2021-07-24