Lars Egevad, Daniela Swanberg, Brett Delahunt, Peter Ström, Kimmo Kartasalo, Henrik Olsson, Dan M Berney, David G Bostwick, Andrew J Evans, Peter A Humphrey, Kenneth A Iczkowski, James G Kench, Glen Kristiansen, Katia R M Leite, Jesse K McKenney, Jon Oxley, Chin-Chen Pan, Hemamali Samaratunga, John R Srigley, Hiroyuki Takahashi, Toyonori Tsuzuki, Theo van der Kwast, Murali Varma, Ming Zhou, Mark Clements, Martin Eklund.
Abstract
The International Society of Urological Pathology (ISUP) hosts a reference image database supervised by experts with the purpose of establishing an international standard in prostate cancer grading. Here, we aimed to identify areas of grading difficulty and to compare the results with those obtained from an artificial intelligence (AI) system trained in grading. In a series of 87 needle biopsies of cancers selected to include problematic cases, experts failed to reach a 2/3 consensus in 41.4% (36/87). Among consensus and non-consensus cases, the weighted kappa was 0.77 (range 0.68–0.84) and 0.50 (range 0.40–0.57), respectively. Among the non-consensus cases, four main causes of disagreement were identified: the distinction between Gleason score 3 + 3 with tangential cutting artifacts vs. Gleason score 3 + 4 with poorly formed or fused glands (13 cases), Gleason score 3 + 4 vs. 4 + 3 (7 cases), Gleason score 4 + 3 vs. 4 + 4 (8 cases), and the identification of a small component of Gleason pattern 5 (6 cases). The AI system obtained a weighted kappa value of 0.53 among the non-consensus cases, placing it as the observer with the sixth best reproducibility out of a total of 24. AI may serve as decision support and decrease inter-observer variability through its ability to make consistent decisions. The grading of these cancer patterns that best predicts outcome and guides treatment warrants further clinical and genetic studies. Results of such investigations should be used to improve calibration of AI systems.
Keywords: Artificial intelligence; Grading; Pathology; Prostate cancer; Reproducibility; Standardization
Year: 2020 PMID: 32542445 PMCID: PMC7683442 DOI: 10.1007/s00428-020-02858-w
Source DB: PubMed Journal: Virchows Arch ISSN: 0945-6317 Impact factor: 4.064
Fig. 1 Grading performance relative to ISUP expert panel on Imagebase. The distribution of ISUP scores given by the 23 pathologists from the ISUP expert panel and the AI for each of the 87 case IDs in Imagebase. Each row corresponds to one case, and the cases are organized into three plots according to average ISUP score increasing from left to right, and from top to bottom. The areas of the blue circles represent the proportion of pathologists who voted for a specific ISUP score (x-axis). The red dot indicates the ISUP score given by the AI. Example: In the last row (bottom-right; case ID 5) most pathologists voted ISUP 5 and a minority ISUP 4; the red dot indicates that AI voted ISUP 4
Agreement (%) between AI grades and the pathologists’ grading by ISUP grade (all cases)
| ISUP grade by AI | Panel ISUP 1 (%) | Panel ISUP 2 (%) | Panel ISUP 3 (%) | Panel ISUP 4 (%) | Panel ISUP 5 (%) |
|---|---|---|---|---|---|
| 1 | 72.0 | 27.2 | 0.8 | 0 | 0 |
| 2 | 29.7 | 58.8 | 10.5 | 1.0 | 0 |
| 3 | 4.3 | 39.9 | 38.6 | 12.5 | 4.6 |
| 4 | 0 | 7.3 | 19.8 | 43.5 | 29.3 |
| 5 | 0 | 0.6 | 23.6 | 33.5 | 42.2 |
Mean agreement (%) and range across all pathologists by ISUP grade (all cases). Columns give the ISUP grades assigned by the other pathologists in the Imagebase panel as mean % (range)
| ISUP grade by pathologist | ISUP 1 | ISUP 2 | ISUP 3 | ISUP 4 | ISUP 5 |
|---|---|---|---|---|---|
| 1 | 70.6 (56.7–86.4) | 27.9 (13.3–40.2) | 1.4 (0.3–3.1) | 0.1 (0–0.5) | 0 |
| 2 | 20.7 (8.9–38.4) | 63.5 (56.6–72.3) | 13.1 (4.8–23.2) | 2.3 (0.2–6.2) | 0.3 (0–3.1) |
| 3 | 2.4 (0–13.1) | 27.0 (4.2–60.5) | 44.4 (26.4–55.6) | 18.6 (2.8–42.0) | 7.6 (0.3–23.6) |
| 4 | 0.1 (0–1.4) | 4.8 (0–21.9) | 20.9 (1.8–37.7) | 58.5 (41.1–80.0) | 15.6 (0.8–34.5) |
| 5 | 0 | 1.0 (0–2.1) | 12.3 (2.3–21.8) | 20.3 (11.4–29.9) | 66.4 (52.9–86.4) |
Fig. 2 a–c Mean weighted kappas for International Society of Urological Pathology (ISUP) grades 1–5 of 24 observers with complete data submission for all cases (a), consensus cases (b), and non-consensus cases (c). The AI system’s mean kappa is shown as a blue triangle, and those of the 23 expert pathologists in the Imagebase reference panel as red dots. Whiskers indicate 95% confidence intervals
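To make the statistic behind these kappa values concrete, the following is a minimal sketch of a weighted Cohen's kappa between two observers, plus one observer's mean kappa against all others. It rests on assumptions not stated in this record: linear disagreement weights and a mean of pairwise kappas per observer. The function names `weighted_kappa` and `mean_pairwise_kappa` are hypothetical and are not taken from the study.

```python
def weighted_kappa(rater_a, rater_b, categories=(1, 2, 3, 4, 5)):
    """Cohen's kappa with linear disagreement weights (an assumption)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportion table: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear weight: 0 for agreement, 1 for the largest possible gap.
        return abs(i - j) / (k - 1)

    observed_dis = sum(weight(i, j) * observed[i][j]
                       for i in range(k) for j in range(k))
    expected_dis = sum(weight(i, j) * row[i] * col[j]
                       for i in range(k) for j in range(k))
    if expected_dis == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed_dis / expected_dis


def mean_pairwise_kappa(observer, all_ratings, categories=(1, 2, 3, 4, 5)):
    """Mean weighted kappa of one observer against every other observer."""
    others = [r for i, r in enumerate(all_ratings) if i != observer]
    kappas = [weighted_kappa(all_ratings[observer], r, categories)
              for r in others]
    return sum(kappas) / len(kappas)
```

With linear weights, a one-step disagreement (for example ISUP 2 vs. 3) is penalized a quarter as much as the maximal ISUP 1 vs. 5 disagreement. Quadratic weights would penalize large gaps more heavily and would give different values.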
Causes of disagreement between pathologists among non-consensus cases of ISUP Imagebase and results of AI. GS = Gleason score, GP = Gleason pattern
| Causes of disagreement | Number of cases | AI results |
|---|---|---|
| GS 3 + 3 with tangential cutting artifacts vs. GS 3 + 4 with poorly formed or fused glands | 13 | 3 + 4 in 8/13 |
| GS 3 + 4 vs. 4 + 3 | 7 | 4 + 3 in 6/7 |
| GS 4 + 3 vs. 4 + 4 | 8 | 4 + 4 in 4/8 |
| Identification of small component of Gleason pattern 5 | 6 | 4 + 5/5 + 4 in 2/6 |
| Other (a possible glomeruloid body, mucinous cancer) | 2 | 3 + 3 and 4 + 3 |
| Total non-consensus | 36 | |
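The tally in this table and the 2/3 consensus criterion can be checked with a short sketch. Two assumptions are made here that the record does not spell out: consensus is taken to mean that at least two thirds of the votes fall on a single ISUP grade, and the example uses 24 votes per case because the reported percentages are consistent with that (for example 15/24 = 62.5%). The names `CAUSES` and `consensus_grade` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Case counts per cause of disagreement, copied from the table.
CAUSES = {
    "3+3 with tangential cuts vs. 3+4": 13,
    "3+4 vs. 4+3": 7,
    "4+3 vs. 4+4": 8,
    "small component of Gleason pattern 5": 6,
    "other": 2,
}
TOTAL_CASES = 87


def consensus_grade(votes, threshold=Fraction(2, 3)):
    """Return the grade holding at least `threshold` of the votes, else None.

    Treating the threshold as inclusive (>=) is an assumption.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    grade, count = Counter(votes).most_common(1)[0]
    return grade if Fraction(count, len(votes)) >= threshold else None


non_consensus = sum(CAUSES.values())
share = round(100 * non_consensus / TOTAL_CASES, 1)
print(non_consensus, share)

# Hypothetical vote split: 15 of 24 votes for ISUP 4 (Gleason score 4 + 4)
# and 9 for ISUP 3 (Gleason score 4 + 3), i.e. 62.5% vs. 37.5%.
print(consensus_grade([4] * 15 + [3] * 9))
```

Under these assumptions the five causes sum to the 36 non-consensus cases, which is 41.4% of 87, and a 62.5% majority falls short of the 2/3 threshold, so the function returns `None`.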
Fig. 3 a, b Cancer bordering between Gleason score 3 + 3 = 6 with tangential cuts and Gleason score 3 + 4 = 7 with poorly formed glands. Panel members voted 3 + 3 = 6 in 54.2% and 3 + 4 = 7 in 45.8% and AI assigned a Gleason score of 3 + 3 = 6. c, d Cancer bordering between Gleason score 3 + 3 = 6 with tangential cuts and Gleason score 3 + 4 = 7 with fused glands, particularly in c. Panel members voted 3 + 3 = 6 in 37.5% and 3 + 4 = 7 in 58.3% and AI assigned a Gleason score of 3 + 4 = 7. All microphotographs show hematoxylin and eosin stains at × 20 lens magnification
Fig. 4 a, b Cancer bordering between Gleason score 3 + 4 = 7 and 4 + 3 = 7. Panel members voted 3 + 4 = 7 in 33.3% and 4 + 3 = 7 in 58.3% and AI assigned a Gleason score of 4 + 3 = 7. c, d Cancer bordering between Gleason score 4 + 3 = 7 and 4 + 4 = 8. Mostly cribriform and glomeruloid glands but also occasional separate glands, particularly in d. Panel members voted 4 + 3 = 7 in 37.5% and 4 + 4 = 8 in 62.5% and AI assigned a Gleason score of 4 + 4 = 8. All microphotographs show hematoxylin and eosin stains at × 20 lens magnification
Fig. 5 a, b Cancer bordering between Gleason score 4 + 4 = 8 and Gleason score 9. The tumor is dominated by cribriform cancer but there is also an area with some seemingly dispersed cells and strands, particularly in b. The possibility of crush artifacts may be considered. Panel members voted 4 + 4 = 8 in 41.7% and 9 in 58.3% and AI assigned a Gleason score of 9. c, d Cancer bordering on Gleason score 9. Pale cells forming some gland-like nests but strands and some single cells are also seen, particularly in c. Newly diagnosed cancer with no history of hormonal treatment. Poorly formed, tadpole-like structures with tapered and sometimes transitions to strands are seen in d. Panel members voted Gleason score 9 in 58.3% while the remainder were spread across Gleason scores 3 + 4, 4 + 3, and 4 + 4. AI assigned a Gleason score of 9. All microphotographs show hematoxylin and eosin stains at × 20 lens magnification