| Literature DB >> 34903808 |
Andrew J Codlin1, Thang Phuoc Dao2, Luan Nguyen Quang Vo3,2, Rachel J Forse3,4, Vinh Van Truong5, Ha Minh Dang5, Lan Huu Nguyen5, Hoa Binh Nguyen6, Nhung Viet Nguyen6, Kristi Sidney-Annerstedt4, Bertie Squire7, Knut Lönnroth4, Maxine Caws7,8.
Abstract
There have been few independent evaluations of computer-aided detection (CAD) software for tuberculosis (TB) screening, despite the rapidly expanding array of available CAD solutions. We developed a test library of chest X-ray (CXR) images which was blindly re-read by two TB clinicians with different levels of experience and then processed by 12 CAD software solutions. Using Xpert MTB/RIF results as the reference standard, we compared the performance characteristics of each CAD software against both an Expert and Intermediate Reader, using cut-off thresholds which were selected to match the sensitivity of each human reader. Six CAD systems performed on par with the Expert Reader (Qure.ai, DeepTek, Delft Imaging, JF Healthcare, OXIPIT, and Lunit) and one additional software (Infervision) performed on par with the Intermediate Reader only. Qure.ai, Delft Imaging and Lunit were the only software to perform significantly better than the Intermediate Reader. The majority of these CAD software showed significantly lower performance among participants with a past history of TB. The radiography equipment used to capture the CXR image was also shown to affect performance for some CAD software. TB program implementers now have a wide selection of quality CAD software solutions to utilize in their CXR screening initiatives.
Year: 2021 PMID: 34903808 PMCID: PMC8668935 DOI: 10.1038/s41598-021-03265-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Demographic and clinical description of participants included in the test library.
| | Total (N, %) | Xpert MTB/RIF-negative (N, %) | Xpert MTB/RIF-positive (N, %) | P-value╪ |
|---|---|---|---|---|
| All participants | 1032 | 899 (87.1%) | 133 (12.9%) | N/A |
| Gender | ||||
| Male | 712 (69.0%) | 605 (85.0%) | 107 (15.0%) | |
| Female | 320 (31.0%) | 294 (91.9%) | 26 (8.1%) | |
| Age, median (IQR) | 62 (53–70) | 62 (54–71) | 59 (48–68) | |
| 15–54 years | 291 (28.2%) | 240 (82.5%) | 51 (17.5%) | |
| ≥ 55 years | 741 (71.8%) | 659 (88.9%) | 82 (11.1%) | |
| Health insurance | 881 (85.4%) | 769 (87.3%) | 112 (12.7%) | 0.686 |
| Residency status | ||||
| Long-term resident of HCMC | 896 (86.8%) | 783 (87.4%) | 113 (12.6%) | 0.497 |
| Recent migrant to HCMC | 136 (13.2%) | 116 (85.3%) | 20 (14.7%) | |
| Cough (C) | ||||
| No Cough | 455 (44.1%) | 420 (92.3%) | 35 (7.7%) | |
| < 2 weeks | 175 (17.0%) | 148 (84.6%) | 27 (15.4%) | |
| ≥ 2 weeks | 402 (39.0%) | 331 (82.3%) | 71 (17.7%) | |
| Fever (F) | 56 (5.4%) | 43 (76.8%) | 13 (23.2%) | |
| Weight loss (WL) | 113 (10.9%) | 94 (83.2%) | 19 (16.8%) | 0.187 |
| Night sweats (NS) | 64 (6.2%) | 59 (92.2%) | 5 (7.8%) | 0.211 |
| 4 Symptoms: C + F + WL + NS | 638 (61.8%) | 534 (83.7%) | 104 (16.3%) | |
| Chest pain | 229 (22.2%) | 199 (86.9%) | 30 (13.1%) | 0.913 |
| Fatigue | 235 (22.8%) | 189 (80.4%) | 46 (19.6%) | |
| Any TB symptom | 727 (70.4%) | 614 (84.5%) | 113 (15.5%) | |
| Cough plus any one symptom | 291 (28.2%) | 238 (81.8%) | 53 (18.2%) | |
| Any TB symptom except cough | 441 (42.7%) | 373 (84.6%) | 68 (15.4%) | |
| Contact of TB patient | 64 (6.2%) | 48 (75.0%) | 16 (25.0%) | |
| Past history of TB | 346 (33.5%) | 294 (85.0%) | 52 (15.0%) | 0.145 |
| Diabetes | 103 (10.0%) | 87 (84.5%) | 16 (15.5%) | 0.398 |
| HIV | 2 (0.2%) | 2 (100.0%) | 0 (0.0%) | 0.586 |
| Radiography system | ||||
| JPI Healthcare | 493 (47.8%) | 464 (94.1%) | 29 (5.9%) | |
| DRTECH | 539 (52.2%) | 435 (80.7%) | 104 (19.3%) | |
| Abnormal CXR | ||||
| Expert Reader | 647 (62.7%) | 520 (80.4%) | 127 (19.6%) | 0.338 |
| Intermediate Reader | 495 (48.0%) | 386 (78.0%) | 109 (22.0%) | |
| Normal/Clear CXR | ||||
| Expert Reader | 385 (37.3%) | 379 (98.4%) | 6 (1.6%) | |
| Intermediate Reader | 537 (52.0%) | 513 (95.5%) | 24 (4.5%) | |
Significant values are in bold.
╪Chi-squared test.
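For reference, the row-wise P-values above can be reproduced with a Pearson chi-squared test on the corresponding 2×2 contingency table. A minimal sketch using the health-insurance row (the uninsured cells are derived from the column totals; disabling the Yates continuity correction is an assumption that happens to reproduce the reported p = 0.686):

```python
# Pearson chi-squared test on a 2x2 contingency table, as used for the
# P-values in the table above. correction=False disables the Yates
# continuity correction.
from scipy.stats import chi2_contingency

table = [
    [769, 112],              # insured: Xpert-negative, Xpert-positive
    [899 - 769, 133 - 112],  # uninsured, derived from the column totals
]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # p close to the reported 0.686
```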
Area under the ROC and precision-recall (PR) curves for each CAD software.
| Developer (software name╪, version) | ROC AUC (95% CI) | PR AUC (95% CI) |
|---|---|---|
| Qure.ai (qXR v3) | 0.82 (0.79–0.86) | 0.41 (0.33–0.50) |
| Delft Imaging (CAD4TB v7) | 0.82 (0.78–0.85) | 0.39 (0.31–0.47) |
| DeepTek (Genki v2) | 0.78 (0.75–0.82) | 0.28 (0.22–0.34) |
| Lunit (INSIGHT CXR v3.1.0.0) | 0.82 (0.79–0.86) | 0.44 (0.35–0.54) |
| JF Healthcare (JF CXR-1 v3.0) | 0.77 (0.73–0.81) | 0.28 (0.22–0.35) |
| InferVision (InferRead DR Chest v1.0.0.0) | 0.76 (0.72–0.80) | 0.29 (0.22–0.36) |
| OXIPIT (ChestEye v1) | 0.73 (0.69–0.77) | 0.23 (0.18–0.28) |
| Artelus (T-Xnet v1) | 0.70 (0.66–0.74) | 0.23 (0.17–0.29) |
| EPCON (XrayAME v1) | 0.66 (0.61–0.71) | 0.23 (0.17–0.28) |
| COTO (v1) | 0.66 (0.61–0.71) | 0.22 (0.17–0.28) |
| SemanticMD (v1) | 0.53 (0.48–0.58) | 0.14 (0.10–0.17) |
| Dr CADx (v0.1) | 0.50 (0.45–0.55) | 0.13 (0.10–0.16) |
ROC AUC area under the receiver operating characteristic curve, PR AUC area under the precision-recall curve.
╪Software name omitted if none available.
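Both summary statistics in this table can be computed directly from per-image CAD scores and binary Xpert results. A sketch on synthetic data (no real study scores), using the rank definition of ROC AUC and average precision as one common estimator of PR AUC:

```python
import numpy as np

def roc_auc(y_true, scores):
    """Probability a random positive outscores a random negative (assumes no tied scores)."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    return np.mean(pos[:, None] > neg[None, :])

def average_precision(y_true, scores):
    """PR AUC estimated as the mean precision at each positive, ranked by score."""
    order = np.argsort(-scores)
    y = y_true[order]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision[y == 1].mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)   # 1 = Xpert-positive (synthetic labels)
s = rng.normal(30 + 25 * y, 15)    # synthetic CAD abnormality scores
print(roc_auc(y, s), average_precision(y, s))
```

Note that PR AUC, unlike ROC AUC, depends on prevalence, which is why the PR AUC values above are much lower than the ROC AUC values at a 12.9% Xpert-positive rate.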
Figure 1. ROC graphs for each CAD software. ROC AUC area under the receiver operating characteristic curve.
CAD software performance when matching the sensitivity of the Expert Reader.
| | Cut-off Score | TP | FP | FN | TN | Sensitivity (95% CI) | Specificity (95% CI) | Accuracy (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Expert Reader | N/A | 127 | 520 | 6 | 379 | 95.5% (90.4–98.3) | 42.2% (38.9–45.5) | 49.0% (45.9–52.1) |
| Qure.ai | 44.1 | 127 | 461 | 6 | 438 | 95.5% (90.4–98.3) | 48.7% (45.4–52.0) | 54.7% (51.7–57.8) |
| DeepTek | 31.1 | 127 | 483 | 6 | 416 | 95.5% (90.4–98.3) | 46.3% (43.0–49.6) | 52.6% (49.5–55.7) |
| Delft imaging | 46.7 | 127 | 492 | 6 | 407 | 95.5% (90.4–98.3) | 45.3% (42.0–48.6) | 51.7% (48.7–54.8) |
| JF Healthcare | 83.4 | 127 | 530 | 6 | 369 | 95.5% (90.4–98.3) | 41.0% (37.8–44.3) | 48.1% (45.0–51.2) |
| OXIPIT | 15.4 | 127 | 532 | 6 | 367 | 95.5% (90.4–98.3) | 40.8% (37.6–44.1) | 47.9% (44.8–51.0) |
| Lunit | 3.0 | 127 | 551 | 6 | 348 | 95.5% (90.4–98.3) | 38.7% (35.5–42.0) | 46.0% (43.0–49.1) |
| InferVision | 53.8 | 127 | 661 | 6 | 238 | 95.5% (90.4–98.3) | 26.5% (23.6–29.5) | 35.4% (32.5–38.4) |
| Artelus | 1.2 | 127 | 691 | 6 | 208 | 95.5% (90.4–98.3) | 23.1% (20.4–26.0) | 32.5% (29.6–35.4) |
| Dr CADx | 27.8 | 127 | 790 | 6 | 109 | 95.5% (90.4–98.3) | 12.1% (10.1–14.4) | 22.9% (20.3–25.6) |
| SemanticMD | 0.4 | 127 | 808 | 6 | 91 | 95.5% (90.4–98.3) | 10.1% (7.2–10.8) | 21.1% (16.6–21.2) |
| EPCON | 0.6 | 127 | 815 | 6 | 84 | 95.5% (90.4–98.3) | 9.3% (6.6–10.0) | 20.4% (16.0–20.6) |
| COTO | 1.5 | 127 | 842 | 6 | 57 | 95.5% (90.4–98.3) | 6.3% (4.8–8.1) | 17.8% (15.5–20.3) |
TP true positive, FP false positive, FN false negative, TN true negative.
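The sensitivity-matching used in this table amounts to choosing, for each software, the highest cut-off at which the CAD still flags at least the same 127 Xpert-positive images implied by the Expert Reader's sensitivity, then reading off the resulting specificity. A sketch on synthetic scores (the data and resulting cut-off are illustrative, not any vendor's actual values):

```python
import numpy as np

def match_sensitivity(scores, y_true, target_tp):
    """Highest cut-off that still classifies at least `target_tp` positives as abnormal."""
    pos_scores = np.sort(scores[y_true == 1])[::-1]
    cutoff = pos_scores[target_tp - 1]         # score of the target_tp-th ranked positive
    pred = scores >= cutoff
    tn = np.sum(~pred & (y_true == 0))
    return cutoff, tn / np.sum(y_true == 0)    # (cut-off, specificity)

rng = np.random.default_rng(1)
y = np.r_[np.ones(133, int), np.zeros(899, int)]       # study prevalence
s = rng.normal(55 + 20 * y, 18)                        # synthetic CAD scores
cutoff, spec = match_sensitivity(s, y, target_tp=127)  # match 127/133 = 95.5%
print(f"cut-off = {cutoff:.1f}, specificity = {spec:.1%}")
```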
CAD software performance when matching the sensitivity of the Intermediate Reader.
| | Cut-off Score | TP | FP | FN | TN | Sensitivity (95% CI) | Specificity (95% CI) | Accuracy (95% CI) |
|---|---|---|---|---|---|---|---|---|
| Intermediate Reader | N/A | 109 | 386 | 24 | 513 | 82.0% (74.4–88.1) | 57.1% (53.8–60.3) | 60.3% (57.2–63.3) |
| Qure.ai | 76.5 | 109 | 307 | 24 | 592 | 82.0% (74.4–88.1) | ||
| Delft Imaging | 64.7 | 109 | 309 | 24 | 590 | 82.0% (74.4–88.1) | ||
| DeepTek | 55.7 | 109 | 331 | 24 | 568 | 82.0% (74.4–88.1) | 63.2% (59.9–66.3) | 65.6% (62.6–68.5) |
| Lunit | 20.7 | 109 | 314 | 24 | 585 | 82.0% (74.4–88.1) | ||
| JF Healthcare | 98.3 | 109 | 379 | 24 | 520 | 82.0% (74.4–88.1) | 57.8% (54.5–61.1) | 60.9% (57.9–63.9) |
| InferVision | 77.4 | 109 | 387 | 24 | 512 | 82.0% (74.4–88.1) | 57.0% (53.6–60.2) | 60.2% (57.1–63.2) |
| OXIPIT | 23.8 | 109 | 441 | 24 | 458 | 82.0% (74.4–88.1) | 50.9% (47.6–54.3) | 54.9% (51.9–58.0) |
| Artelus | 5.6 | 109 | 492 | 24 | 431 | 82.0% (74.4–88.1) | 45.3% (42.0–48.6) | 50.0% (46.9–53.1) |
| EPCON | 11.7 | 109 | 547 | 24 | 352 | 82.0% (74.4–88.1) | 39.2% (36.0–42.4) | 44.7% (41.6–47.8) |
| COTO | 12.2 | 109 | 568 | 24 | 331 | 82.0% (74.4–88.1) | 36.8% (33.7–40.1) | 42.6% (39.6–45.7) |
| Dr CADx | 64.1 | 108 | 713 | 25 | 186 | 81.2% (73.5–87.5)╪ | 20.7% (18.1–23.5) | 28.5% (25.8–31.4) |
| SemanticMD | 0.9 | 109 | 714 | 24 | 185 | 82.0% (74.4–88.1) | 20.6% (18.0–23.4) | 28.5% (25.8–31.4) |
TP true positive, FP false positive, FN false negative, TN true negative.
Bolded figures indicate performance significantly higher than the Intermediate Reader.
╪A cut-off score yielding exactly 109 true positives could not be selected, because two Xpert-positive participants had the same score.
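The 95% CIs reported for sensitivity, specificity, and accuracy in these tables are consistent with exact (Clopper-Pearson) binomial intervals; the interval method is not stated here, so treat this as an assumption. For the matched Expert Reader sensitivity of 127/133:

```python
# Exact (Clopper-Pearson) two-sided binomial CI via the beta distribution.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for k successes out of n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

lo, hi = clopper_pearson(127, 133)
print(f"{127/133:.1%} (95% CI {lo:.1%}-{hi:.1%})")  # close to 95.5% (90.4-98.3)
```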
Comparison of CAD software ROC AUC by key demographic and clinical factors.
| | Qure.ai | | Delft Imaging | | DeepTek | |
|---|---|---|---|---|---|---|
| | ROC AUC (95% CI) | P-value╪ | ROC AUC (95% CI) | P-value╪ | ROC AUC (95% CI) | P-value╪ |
| All participants | 0.82 (0.79–0.85) | – | 0.82 (0.78–0.85) | – | 0.78 (0.74–0.82) | – |
| Male | 0.80 (0.76–0.85) | 0.222 | 0.80 (0.76–0.85) | 0.363 | 0.75 (0.71–0.80) | 0.066 |
| Female | 0.85 (0.79–0.92) | | 0.84 (0.78–0.90) | | 0.83 (0.76–0.90) | |
| 15–54 years | 0.84 (0.79–0.90) | 0.248 | 0.86 (0.81–0.91) | | 0.79 (0.73–0.85) | 0.534 |
| ≥ 55 years | 0.80 (0.76–0.85) | | 0.79 (0.74–0.84) | | 0.77 (0.72–0.82) | |
| No | 0.78 (0.71–0.86) | 0.294 | 0.78 (0.72–0.85) | 0.307 | 0.75 (0.67–0.83) | 0.489 |
| Yes | 0.83 (0.79–0.87) | | 0.82 (0.78–0.86) | | 0.78 (0.74–0.82) | |
| No | 0.86 (0.83–0.90) | | 0.85 (0.82–0.89) | | 0.83 (0.79–0.87) | |
| Yes | 0.73 (0.65–0.80) | | 0.76 (0.69–0.83) | | 0.69 (0.62–0.77) | |
| JPI Healthcare | 0.85 (0.79–0.90) | 0.072 | 0.82 (0.76–0.87) | 0.514 | 0.81 (0.75–0.86) | |
| DRTECH | 0.78 (0.73–0.83) | | 0.79 (0.75–0.84) | | 0.72 (0.67–0.77) | |
ROC AUC area under the receiver operating characteristic curve.
Significant values are in bold.
╪Chi-squared test.
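One way to gauge whether a subgroup's ROC AUC is reliably different from another's is a stratified bootstrap over the AUC, comparing the resulting 95% CIs. This is a generic sketch on synthetic data, not the authors' method (the footnote above attributes their subgroup P-values to a chi-squared test):

```python
import numpy as np

def bootstrap_auc_ci(y_true, scores, n_boot=300, seed=0):
    """Stratified bootstrap 95% CI for a subgroup's ROC AUC."""
    rng = np.random.default_rng(seed)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    aucs = []
    for _ in range(n_boot):
        bp = rng.choice(pos, size=len(pos))   # resample positives
        bn = rng.choice(neg, size=len(neg))   # resample negatives
        aucs.append(np.mean(bp[:, None] > bn[None, :]))
    return np.percentile(aucs, [2.5, 97.5])

# Synthetic subgroup in which the CAD score separates classes well
rng = np.random.default_rng(2)
y = np.r_[np.ones(150, int), np.zeros(150, int)]
s = np.r_[rng.normal(2.0, 1.0, 150), rng.normal(0.0, 1.0, 150)]
lo, hi = bootstrap_auc_ci(y, s)
print(f"AUC 95% CI: ({lo:.2f}, {hi:.2f})")
```

Non-overlapping CIs between two subgroups would suggest a genuine performance difference, analogous to the past-history-of-TB effect reported in the abstract.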
Figure 2. Diagram of test library creation.