| Literature DB >> 33719168 |
Pegah Khosravi1,2,3, Maria Lysandrou4, Mahmoud Eljalby5, Qianzi Li2,6, Ehsan Kazemi7, Pantelis Zisimopoulos2,3, Alexandros Sigaras2,3, Matthew Brendel2, Josue Barnes2,3, Camir Ricketts2,3, Dmitry Meleshko2,3, Andy Yat8, Timothy D McClure5, Brian D Robinson9, Andrea Sboner2,3,9, Olivier Elemento2,3,10, Bilal Chughtai5, Iman Hajirasouliha2,3.
Abstract
BACKGROUND: A definitive diagnosis of prostate cancer requires a biopsy to obtain tissue for pathologic analysis, but this is an invasive procedure associated with complications.
Entities:
Keywords: MRI images; PI-RADS; artificial intelligence; biopsy; deep neural networks; prostate cancer
Mesh:
Year: 2021 PMID: 33719168 PMCID: PMC8360022 DOI: 10.1002/jmri.27599
Source DB: PubMed Journal: J Magn Reson Imaging ISSN: 1053-1807 Impact factor: 4.813
Grade Group and Gleason Score and Their Association With the Risk Level of Prostate Cancer
| Grade Group | Gleason Score | Combined Gleason Score | Aggressiveness degree |
|---|---|---|---|
| Grade Group 1 | 3 + 3 | 6 | Low risk |
| Grade Group 2 | 3 + 4 | 7 | Intermediate risk but closer to low risk |
| Grade Group 3 | 4 + 3 | 7 | Intermediate risk but closer to high risk |
| Grade Group 4 | 4 + 4, 3 + 5, 5 + 3 | 8 | High risk |
| Grade Group 5 | 4 + 5, 5 + 4, 5 + 5 | 9 and 10 | High risk |
The two grading systems are mapped to each other using the table above, simplified based on the NCCN prostate cancer guidelines, version 4.2018.
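The Grade Group to risk-level mapping in the table can be encoded as a simple lookup. This is an illustrative sketch only; the dictionary and function names are hypothetical and not part of the AI‐biopsy code:

```python
# Hypothetical mapping of Grade Group (GG) to the risk level in the table
# above (simplified from the NCCN guidelines, version 4.2018).
GRADE_GROUP_RISK = {
    1: "low",                        # Gleason 3 + 3 = 6
    2: "intermediate (near low)",    # Gleason 3 + 4 = 7
    3: "intermediate (near high)",   # Gleason 4 + 3 = 7
    4: "high",                       # Gleason 4 + 4, 3 + 5, 5 + 3 = 8
    5: "high",                       # Gleason 4 + 5, 5 + 4, 5 + 5 = 9-10
}

def risk_level(grade_group: int) -> str:
    """Return the risk level for a Grade Group (1-5)."""
    if grade_group not in GRADE_GROUP_RISK:
        raise ValueError(f"Grade Group must be 1-5, got {grade_group}")
    return GRADE_GROUP_RISK[grade_group]
```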
FIGURE 1 Method flow chart. (a) Unsegmented, consistent sequences of seven axial T2w magnetic resonance (MR) image slices representing the prostate gland were selected for each patient. (b) Each patient's MRI slices were labeled with the corresponding biopsy result, based on its Grade Group (GG) and Gleason Score (GS). (c) A convolutional neural network (CNN)‐based model (Model 1) classifies cancer vs. benign, and subsequently a second CNN‐based model (Model 2) predicts the risk level for each patient. (d) We highlighted the regions of the MR images that the algorithms focus on for prediction and compared the output of Model 2 with the Prostate Imaging Reporting and Data System (PI‐RADS), using pathology labels as ground truth for a subset of the test set. Receiver operating characteristic (ROC) curves were used to assess the performance of the different models on a per‐patient basis.
Characteristics of All Five Cohorts and the Comprised Biopsy Reports and T2w Images Obtained from TCIA and In‐House
| Databases and references | Selected cases and MRI types | Annotation method (biopsy types) | High‐risk (GS ≥ 8; GG = 4 & 5) | Low‐risk (GS = 6; GG = 1) | Intermediate‐risk (GS = 7; GG = 2) | Intermediate‐risk (GS = 7; GG = 3) | Benign |
|---|---|---|---|---|---|---|---|
| Weill Cornell Medicine | 228, age (52–85), 3.0 T | GS and GG (fusion guided biopsy), PI‐RADS | 11 | 48 | 37 | 15 | 117 |
| PROSTATEx | 99, 3.0 T | GG (core needle biopsy) | 13 | 29 | 38 | 19 | 0 |
| PROSTATE‐DIAGNOSIS | 38, 1.5 T | GS (core needle biopsy) | 9 | 5 | 15 | 9 | 0 |
| PROSTATE‐MRI | 26, 3.0 T | GS (prostatectomy) | 11 | 0 | 13 | 2 | 0 |
| TCGA‐PRAD | 9, 3.0 T | GS and GG (core needle biopsy) | 4 | 0 | 3 | 2 | 0 |
| Total | 400, 1.5 T to 3.0 T | GG and GS (reviewed pathology report) | 48 | 82 | 106 | 47 | 117 |
T = Tesla; GS = Gleason Score; GG = Grade Group; T2w = T2‐weighted; TCIA = The Cancer Imaging Archive; MRI = magnetic resonance imaging.
Characteristics of Both Trained Models and the Comprised Patients
| Model | Data resources | Number of patients with cancerous tumor in training and validation sets | Number of patients with benign tumor in training and validation sets | Number of patients in test set |
|---|---|---|---|---|
| Model 1: Benign vs. cancer | In‐house and public | 75 patients (37 GG = 3, 38 GG = 4 and GG = 5) | 107 patients (benign) | 10 benign, 10 GG = 1, 10 GG = 2, 10 GG = 3, 10 GG = 4&5 |

| Model | Data resources | Number of patients with high‐risk tumor in training and validation sets | Number of patients with low‐risk tumor in training and validation sets | Number of patients in test set |
|---|---|---|---|---|
| Model 2: High‐risk vs. low‐risk | In‐house and public | 75 patients (37 GG = 3, 38 GG = 4 and GG = 5) | 168 patients (72 GG = 1 and 96 GG = 2) | 10 GG = 1, 10 GG = 2, 10 GG = 3, 10 GG = 4&5 |
GG = Grade Group.
FIGURE 2 Performance of the two trained models for individual patients in the test set. (a) Model 1 performance for classifying cancer vs. benign. (b) The numbers of patients classified correctly and incorrectly by Model 1, along with the negative predictive value, positive predictive value, specificity, sensitivity, and accuracy for cancer vs. benign. (c) Model 2 performance for classifying high risk vs. low risk. (d) The numbers of patients classified correctly and incorrectly by Model 2, along with the negative predictive value, positive predictive value, specificity, sensitivity, and accuracy for high risk vs. low risk.
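All of the metrics reported in Figure 2 (b) and (d) derive from the 2×2 confusion matrix. A minimal sketch of the standard definitions follows; the function name and example counts are illustrative, not taken from the paper:

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true-positive, false-positive, true-negative, false-negative counts.
    """
    return {
        "sensitivity": tp / (tp + fn),   # true-positive rate (recall)
        "specificity": tn / (tn + fp),   # true-negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, with 9 true positives, 2 false positives, 8 true negatives, and 1 false negative (hypothetical counts), sensitivity is 0.9 and specificity is 0.8.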
FIGURE 3 Prostate glands highlighted by the class activation map (CAM) and by radiologists. Model 2 classifies each image as high risk or low risk, and the deep feature analysis highlights the discriminative regions of the images. A radiologist marked the prostate gland on the images using green square dots. Biopsy results (based on Grade Groups [GGs]) as ground truth and the Prostate Imaging Reporting and Data System (PI‐RADS) scores are also indicated in the figure. (a) Artificial intelligence (AI)‐biopsy predicts the risk level of cases (with a probability score for each class) and highlights the prostate gland correctly. (b) AI‐biopsy is unable to predict the correct risk level in cases where the prostate gland is not correctly detected. Red indicates features with higher weight.
FIGURE 4 AI‐biopsy is a fully automated framework for clinical evaluation of the prostate cancer risk level. We applied a threshold condition to the output of both models for diagnosis, using a minimum of seven T2w axial image slices. (a) For a benign diagnosis, all seven image slices must receive P ≥ 0.5 for the benign class; (b) one image slice (out of the seven imported slices) with P ≥ 0.5 is enough for Model 1 to predict cancer; (c) Model 2 requires at least two image slices (out of the seven imported slices) with high‐risk P ≥ 0.5 for a patient to receive a high‐risk diagnosis; and (d) the explanation of the result can be viewed by clicking the “N/A” option in the web interface (https://ai‐biopsy.eipm‐research.org).
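The slice-level decision rule described in Figure 4 (a)-(c) can be sketched as follows. This is an illustrative reimplementation of the stated thresholds, not the authors' code; the function and argument names are hypothetical:

```python
def diagnose(model1_cancer_p, model2_highrisk_p, threshold=0.5):
    """Apply the Figure 4 thresholds to per-slice class probabilities.

    model1_cancer_p: cancer-class probabilities for the seven T2w slices (Model 1).
    model2_highrisk_p: high-risk-class probabilities for the same slices (Model 2).
    """
    assert len(model1_cancer_p) >= 7 and len(model2_highrisk_p) >= 7

    # (a)/(b): if no slice reaches the cancer threshold, every slice had
    # benign P >= 0.5, so the diagnosis is benign; otherwise one slice with
    # cancer P >= 0.5 is enough for Model 1 to predict cancer.
    if not any(p >= threshold for p in model1_cancer_p):
        return "benign"

    # (c): at least two slices with high-risk P >= 0.5 yield a high-risk call.
    n_high = sum(p >= threshold for p in model2_highrisk_p)
    return "high risk" if n_high >= 2 else "low risk"
```

The two-out-of-seven requirement for the high-risk call makes Model 2's output more conservative than Model 1's single-slice cancer trigger.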