David F Steiner, Kunal Nagpal, Rory Sayres, Davis J Foote, Benjamin D Wedin, Adam Pearce, Carrie J Cai, Samantha R Winter, Matthew Symonds, Liron Yatziv, Andrei Kapishnikov, Trissia Brown, Isabelle Flament-Auvigne, Fraser Tan, Martin C Stumpe, Pan-Pan Jiang, Yun Liu, Po-Hsuan Cameron Chen, Greg S Corrado, Michael Terry, Craig H Mermel.
Abstract
Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored.

Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies.

Design, Setting, and Participants: This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists.

Exposure: An AI-based assistive tool for Gleason grading of prostate biopsies.

Main Outcomes and Measures: Agreement between pathologists and subspecialists, with and without the use of an AI-based assistive tool, for the grading of all prostate biopsies and of Gleason grade group 1 biopsies.
Year: 2020 PMID: 33180129 PMCID: PMC7662146 DOI: 10.1001/jamanetworkopen.2020.23267
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Patient Characteristics
| Characteristic | ML1, No. (%) | ML2, No. (%) | ML1 and ML2, No. (%) |
|---|---|---|---|
| Total participants, No. | 85 | 155 | 240 |
| Age at biopsy, y | | | |
| <65 | 43 (50.6) | 53 (34.2) | 96 (40.0) |
| ≥65 | 37 (43.5) | 102 (65.8) | 139 (57.9) |
| Not available | 5 (5.9) | NA | 5 (2.1) |
| PSA level at biopsy, ng/mL | | | |
| <10 | 22 (25.9) | 102 (65.8) | 124 (51.7) |
| ≥10 | 2 (2.4) | 34 (21.9) | 36 (15.0) |
| Not available | 61 (71.8) | 19 (12.3) | 80 (33.3) |
| Reference standard grade group | | | |
| No tumor | 20 (23.5) | 20 (12.9) | 40 (16.7) |
| Grade group 1 | 35 (41.2) | 75 (48.4) | 110 (45.8) |
| Grade group 2 | 10 (11.8) | 40 (25.8) | 50 (20.8) |
| Grade group 3 | 10 (11.8) | 10 (6.5) | 20 (8.3) |
| Grade group 4-5 | 10 (11.8) | 10 (6.5) | 20 (8.3) |
Abbreviations: ML, medical laboratory; NA, not applicable; PSA, prostate-specific antigen.
SI conversion factor: To convert PSA to micrograms per liter, multiply by 1.0.
Biopsies were obtained from 2 independent medical laboratories (ML1 and ML2). One core needle biopsy per independent case was included in the study.
Figure 1. User Interface for Artificial Intelligence (AI)–Based Assistive Tool and Summary of Study Design
A, Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite assistance modality to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, each pathologist reviewed the cases for a second time using the opposite modality. Details of the implementation of case distribution and washout period are available in the Study Design section of eMethods in the Supplement. The order of biopsies within each block was randomized independently for each pathologist and each round of the crossover. B, The interface of the AI-based assistive tool illustrates localized region-level Gleason pattern interpretations as colored outlines overlaid on the tissue image. Green indicates Gleason pattern 3; yellow, Gleason pattern 4; and red, Gleason pattern 5 (not present). In the left toolbar, the AI-provided Gleason score, grade group, and Gleason pattern percentage are summarized, with toggles provided so that users can turn the visibility of several features on or off. Slide thumbnails allow users to quickly switch between multiple sections of the biopsy.
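The counterbalanced crossover described above (modality alternating every 10 cases, the two cohorts always in opposite modalities, and biopsy order shuffled independently within each block) can be sketched in code. This is a minimal illustration of the scheme under the parameters stated in the caption, not the study's actual implementation; all function and variable names are hypothetical.

```python
import random

def assign_modalities(n_cases=240, block_size=10, cohort=1):
    """Assign each block of cases to AI-assisted or unassisted review.

    Cohort 1 starts with AI assistance and cohort 2 starts without,
    so the two cohorts review any given block in opposite modalities,
    switching after every `block_size` cases.
    """
    modalities = []
    for block_index in range(n_cases // block_size):
        assisted = (block_index % 2 == 0) if cohort == 1 else (block_index % 2 == 1)
        modalities.extend(["assisted" if assisted else "unassisted"] * block_size)
    return modalities

def shuffle_within_blocks(case_ids, block_size=10, seed=0):
    """Randomize biopsy order independently within each block."""
    rng = random.Random(seed)
    ordered = []
    for start in range(0, len(case_ids), block_size):
        block = case_ids[start:start + block_size]
        rng.shuffle(block)
        ordered.extend(block)
    return ordered

cohort1 = assign_modalities(cohort=1)
cohort2 = assign_modalities(cohort=2)
# The two cohorts always review a given case in opposite modalities.
assert all(a != b for a, b in zip(cohort1, cohort2))
```

After the minimum 4-week washout, the same function with the cohort label swapped yields each pathologist's second-round modalities, so every case is ultimately read by every pathologist in both modalities.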
Figure 2. Impact of Artificial Intelligence (AI)–Based Assistance for Prostate Biopsy Grading and Tumor Detection
A, Individual pathologist agreement with the majority opinion of subspecialists across all 240 biopsies. Dotted lines connect the points representing the same pathologist for each modality (assisted vs unassisted), and box-plot edges represent quartiles. B, Error bars represent 95% CIs. C, Circles and triangles represent sensitivities and specificities for each pathologist. The black line represents the receiver operating characteristic curve of the underlying deep learning system. D, Visualization of grades provided by all pathologists for all biopsies. Each colored box represents a grade for a single biopsy by a single pathologist. Each column represents a biopsy, and each row represents a pathologist. The greater number of solid-colored blocks in the AI-assisted plot illustrates the assistance-associated increases in interpathologist agreement and accuracy. AUROC indicates area under the receiver operating characteristic curve; GG, grade group.
Confusion Matrices (Contingency Tables) for Unassisted and Assisted Reviews Relative to the Subspecialist Reference Standard
| Subspecialist reference standard grade group | Pathologist grade group, No. (%) | | | | |
|---|---|---|---|---|---|
| | No tumor | GG1 | GG2 | GG3 | GG4-5 |
| Pathologists with AI assistance | | | | | |
| No tumor | 748 (15.6) | 36 (0.8) | 5 (0.1) | 4 (0.1) | 7 (0.1) |
| GG1 | 241 (5.0) | 1590 (33.1) | 347 (7.2) | 19 (0.4) | 3 (0.1) |
| GG2 | 14 (0.3) | 260 (5.4) | 542 (11.3) | 143 (3.0) | 41 (0.9) |
| GG3 | 14 (0.3) | 15 (0.3) | 128 (2.7) | 163 (3.4) | 80 (1.7) |
| GG4-5 | 28 (0.6) | 2 (0.04) | 17 (0.4) | 49 (1.0) | 304 (6.3) |
| Pathologists without AI assistance | | | | | |
| No tumor | 769 (16.0) | 19 (0.4) | 4 (0.1) | 1 (0.02) | 7 (0.1) |
| GG1 | 207 (4.3) | 1727 (36.0) | 261 (5.4) | 4 (0.1) | 1 (0.02) |
| GG2 | 10 (0.2) | 240 (5.0) | 601 (12.5) | 129 (2.7) | 20 (0.4) |
| GG3 | 12 (0.3) | 5 (0.1) | 123 (2.6) | 218 (4.5) | 42 (0.9) |
| GG4-5 | 16 (0.3) | 0 | 2 (0.04) | 83 (1.7) | 299 (6.2) |
Abbreviations: AI, artificial intelligence; GG, grade group.
Values are No. (%) of total readings across all biopsies for the indicated assistance modality (n = 4800 readings per assistance modality; 20 pathologists × 240 biopsies).
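Aggregate measures such as overall agreement with the reference standard and tumor-detection sensitivity and specificity can be derived directly from counts like those in the table. The sketch below uses the AI-assisted counts; the variable names are illustrative, and the article's exact statistical procedures (e.g., CI estimation) are not reproduced here.

```python
# AI-assisted counts from the table: each row is a reference-standard
# grade group; columns are the pathologist-assigned grade groups in
# the order (no tumor, GG1, GG2, GG3, GG4-5).
assisted = {
    "no_tumor": [748, 36, 5, 4, 7],
    "gg1":      [241, 1590, 347, 19, 3],
    "gg2":      [14, 260, 542, 143, 41],
    "gg3":      [14, 15, 128, 163, 80],
    "gg4_5":    [28, 2, 17, 49, 304],
}
labels = ["no_tumor", "gg1", "gg2", "gg3", "gg4_5"]

total = sum(sum(row) for row in assisted.values())  # all readings

# Overall agreement: diagonal (exact grade-group match) over all readings.
agreement = sum(assisted[label][i] for i, label in enumerate(labels)) / total

# Tumor detection: any grade-group call on a tumor-containing biopsy is a
# positive; a "no tumor" call on such a biopsy is a false negative.
tumor_rows = labels[1:]
tp = sum(sum(assisted[label][1:]) for label in tumor_rows)
fn = sum(assisted[label][0] for label in tumor_rows)
tn = assisted["no_tumor"][0]
fp = sum(assisted["no_tumor"][1:])
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

The row sums recover the per-group reading counts (e.g., 40 no-tumor biopsies × 20 pathologists = 800 readings), which is a useful sanity check when transcribing such tables.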
Mean Review Time per Biopsy With and Without Artificial Intelligence Assistance
| Category | Biopsies, No. | Unassisted, mean time per biopsy (95% CI), min | Assisted, mean time per biopsy (95% CI), min |
|---|---|---|---|
| All | 240 | 3.7 (3.6-3.8) | 3.2 (3.2-3.3) |
| No tumor | 40 | 2.6 (2.5-2.8) | 2.2 (2.1-2.3) |
| Grade group 1 | 110 | 3.6 (3.5-3.7) | 3.0 (2.9-3.1) |
| Grade group 2 | 50 | 4.3 (4.1-4.5) | 3.8 (3.7-4.0) |
| Grade group 3 | 20 | 4.4 (4.1-4.7) | 4.0 (3.8-4.3) |
| Grade group 4-5 | 20 | 4.3 (4.0-4.5) | 4.1 (3.8-4.4) |
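For reference, a mean with a 95% CI like those in the table can be computed from per-biopsy review times using a normal-approximation interval. The study's exact CI method is not stated in this excerpt, so that choice is an assumption, and the sample data below are invented for illustration only.

```python
import math

def mean_ci(times, z=1.96):
    """Mean and normal-approximation 95% CI for per-biopsy review times."""
    n = len(times)
    mean = sum(times) / n
    # Sample variance with an n - 1 denominator.
    var = sum((t - mean) ** 2 for t in times) / (n - 1)
    half_width = z * math.sqrt(var / n)
    return mean, (mean - half_width, mean + half_width)

# Illustrative data: 240 review times in minutes (not the study's data).
times = [3.0 + 0.01 * (i % 50) for i in range(240)]
mean, (lo, hi) = mean_ci(times)
```

With 240 biopsies per condition the normal approximation is reasonable; for the smaller per-grade-group subsets (n = 20), a t-based interval would be the more conservative choice.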