Literature DB >> 36249889

A review of artificial intelligence in prostate cancer detection on imaging.

Indrani Bhattacharya1,2, Yash S Khandwala2, Sulaiman Vesal2, Wei Shao3, Qianye Yang4,5, Simon J C Soerensen2,6, Richard E Fan2, Pejman Ghanouni3,2, Christian A Kunder7, James D Brooks2, Yipeng Hu4,5, Mirabela Rusu3, Geoffrey A Sonn3,2.   

Abstract

A multitude of studies have explored the role of artificial intelligence (AI) in providing diagnostic support to radiologists, pathologists, and urologists in prostate cancer detection, risk-stratification, and management. This review provides a comprehensive overview of relevant literature regarding the use of AI models in (1) detecting prostate cancer on radiology images (magnetic resonance and ultrasound imaging), (2) detecting prostate cancer on histopathology images of prostate biopsy tissue, and (3) assisting in supporting tasks for prostate cancer detection (prostate gland segmentation, MRI-histopathology registration, MRI-ultrasound registration). We discuss both the potential of these AI models to assist in the clinical workflow of prostate cancer diagnosis, as well as the current limitations including variability in training data sets, algorithms, and evaluation criteria. We also discuss ongoing challenges and what is needed to bridge the gap between academic research on AI for prostate cancer and commercial solutions that improve routine clinical care.
© The Author(s), 2022.

Keywords:  artificial intelligence; histopathology images; magnetic resonance imaging; prostate cancer diagnosis; registration; ultrasound images

Year:  2022        PMID: 36249889      PMCID: PMC9554123          DOI: 10.1177/17562872221128791

Source DB:  PubMed          Journal:  Ther Adv Urol        ISSN: 1756-2872


Introduction

Prostate cancer screening with prostate-specific antigen (PSA) has contributed to a >50% reduction in death from prostate cancer,[1] yet it has also resulted in a major problem of overdiagnosis and overtreatment of non-aggressive prostate cancer.[2] As a result, focus has shifted to preferential diagnosis and treatment of aggressive prostate cancers. New diagnostic tests, including blood- and urine-based biomarkers, genetic tests, and improved imaging modalities, have great potential to save lives while reducing overdiagnosis. However, the optimal use of the massive amount of data generated by these new tests remains a major clinical and research challenge. Artificial intelligence (AI)-based systems will play a major role in addressing this challenge.[3,4]

AI models are computational approaches that learn patterns from existing data to enable predictions on new, unseen data. Earlier AI models used what is often referred to as 'traditional' machine learning, which typically proceeds in two steps. First, domain experts (human experts in the subject matter) carefully design features that extract task-specific quantitative variables from the data, for example, tumor volume or shape. Second, these hand-crafted features are fed into computational models that learn which features are useful and how to combine them to maximize accuracy in classifying data into categories (e.g. benign nodules vs malignant tumors). Once trained, such AI models can generate predictions on new, previously unseen data. Recent advances in the computing power of graphics processing units (GPUs) have enabled the development of deep learning models. Deep learning models alleviate the need for hand-crafted features, working in a completely automated manner to both identify the features and use them for the desired downstream task. Deep learning models have revolutionized the field of AI through unprecedented performance that often exceeds human performance, particularly in tasks related to image analysis.

Development of medical AI models (in particular, deep learning models) that learn and predict from medical data to assist diagnosis, prognosis, and clinical decision-making for a variety of diseases is an active area of research.[5-7] Research on AI-assisted prostate cancer diagnosis is also evolving rapidly and has the potential to facilitate all aspects of the current standard diagnostic pathway (Figure 1). Although a large body of research literature surrounds the use of AI in prostate cancer diagnosis, most of these methods are not yet ready for clinical deployment. Several challenges impede the deployment of these widely researched AI tools for diagnostic support in the clinic.
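To make the two-step 'traditional' pipeline concrete, below is a minimal Python sketch: a few hand-crafted features are extracted from each region of interest and fed to a random forest classifier. The feature choices, the synthetic data, and the helper `extract_tumor_features` are hypothetical illustrations, not features from any study reviewed here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def extract_tumor_features(roi: np.ndarray) -> np.ndarray:
    """Step 1: domain-expert-designed (hand-crafted) features for one ROI."""
    return np.array([
        float(roi.size),         # crude proxy for lesion volume
        roi.mean(),              # mean signal intensity
        roi.std(),               # intensity heterogeneity (texture proxy)
        roi.max() - roi.min(),   # dynamic range
    ])

# Synthetic stand-in data: 100 regions of interest with binary labels
# (0 = benign, 1 = malignant).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)
rois = [rng.normal(loc=y, scale=1.0, size=(16, 16)) for y in labels]
X = np.stack([extract_tumor_features(r) for r in rois])

# Step 2: a classifier learns which features are useful and how to combine
# them to separate the two categories.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, labels, cv=5, scoring="roc_auc"))
```

A deep learning model would instead learn its own features directly from the image, removing the first step entirely.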
Figure 1.

Potential of AI to assist prostate cancer diagnosis on imaging. AI models can help in detecting and characterizing cancer aggressiveness on non-invasive radiology images (MRI and ultrasound), as well as on histopathology images acquired through prostate biopsy. Aggressive cancer is shown in yellow, and indolent cancer in green in the ‘AI for cancer diagnosis’ panel. AI models can also help in supporting tasks for cancer detection, namely prostate gland segmentation, MRI-ultrasound registration, and MRI-histopathology registration.

Clinicians and AI researchers working on prostate cancer should develop a thorough understanding of this emerging interdisciplinary domain to successfully herald AI-enabled precision medicine, with the goal of revolutionizing the diagnosis and treatment of prostate cancer. Here, we provide a systematic overview of the relevant literature involving the use of AI for prostate cancer diagnosis on medical images (Figure 1). In particular, we discuss the existing AI literature on (1) detecting prostate cancer on radiology images (magnetic resonance and ultrasound imaging), (2) detecting prostate cancer on histopathology images, and (3) supporting tasks for prostate cancer detection. We then discuss the challenges associated with implementing these AI-enabled diagnostic tools in the clinic, and possible solutions to overcome them.

Potential of AI in prostate cancer diagnosis

Medical images play an important role in prostate cancer diagnosis. For many years, this involved transrectal ultrasound alone to guide systematic biopsy. More recently, magnetic resonance imaging (MRI) has been shown to greatly improve prostate cancer detection.[8,9] MRI-ultrasound fusion biopsies are increasingly used to target lesions outlined on MRI by radiologists, and they improve detection of clinically significant prostate cancer over ultrasound-guided systematic biopsies alone.[8,10-13] Finally, prostate tissue obtained through biopsy is subjected to histopathological analysis to identify the presence of prostate cancer and grade it. Urologists plan treatment based on the aggressiveness of prostate cancer, with the primary objective of treating aggressive cancer while reducing over-treatment of indolent cancer.

Numerous opportunities exist for optimizing this workflow (Figure 1), such as improving detection of cancer on ultrasound and MRI, reducing inter-observer variability among radiologists, and assisting pathologists in identifying and grading cancer on histopathology images. Moreover, AI models can facilitate supporting tasks in cancer detection that are labor- and experience-intensive, such as prostate gland segmentation, MRI-ultrasound registration for MRI-ultrasound fusion biopsies, and MRI-histopathology registration for developing cancer detection models. This review categorizes the existing studies on AI models to facilitate prostate cancer diagnosis as follows:

1. AI models for prostate cancer detection and characterization of cancer aggressiveness:
(a) on prostate MRI;
(b) on prostate ultrasound images;
(c) on histopathology images collected through prostate needle biopsies.

2. AI models for supporting tasks in cancer detection:
(a) prostate gland segmentation on MRI and ultrasound images to facilitate MRI-ultrasound fusion biopsies;
(b) MRI-ultrasound registration to facilitate MRI-ultrasound fusion biopsies;
(c) MRI-histopathology registration for ground truth labeling of cancer detection models.

The following sections briefly summarize relevant AI studies in each of these areas, highlighting strengths, weaknesses, variabilities, potential, and scope for use in clinical care.

AI models for prostate cancer detection and characterizing cancer aggressiveness

AI models can help detect cancer and characterize cancer aggressiveness on three kinds of images widely used in the clinical workflow of prostate cancer diagnosis: (1) MRI, (2) ultrasound, and (3) histopathology images of prostate biopsy tissue (Figure 1).

Implications of accurate prostate cancer detection and aggressiveness characterization on imaging: Accurately detecting, localizing, and characterizing lesions as aggressive or indolent using AI methods on non-invasive images (e.g. MRI and ultrasound) may significantly impact patient management and treatment planning. Non-invasive imaging can be used in conjunction with clinical variables (PSA density, race, prior biopsy history, etc.) in routine clinical care to decide when biopsy is needed and how treatment is performed. For example, patients with aggressive prostate cancer accurately detected and localized on non-invasive images can be targeted with MRI-ultrasound fusion biopsies with more precision and fewer biopsy needle samples. Patients with no cancer or with indolent cancer according to non-invasive images could safely avoid biopsy, thereby minimizing the unnecessary side-effects of invasive biopsy procedures (pain, bleeding, and infection).[8] Accurate selective identification of aggressive and indolent prostate cancer on non-invasive imaging can help prioritize and enable timely treatment planning for patients with aggressive prostate cancer. The location and extent of aggressive cancer on non-invasive imaging can also help guide treatment decisions, that is, whether to perform radical prostatectomy, focal therapy, or active surveillance.

Accurate automated grading of cancer aggressiveness on prostate histopathology images acquired through invasive biopsy procedures can help alleviate inter- and intra-pathologist variability in Gleason grading, and can significantly reduce the time required from pathologists. Such standardization of pathologist interpretations and time savings will eventually facilitate disease management.

Cancer detection on prostate MRI

MRI is increasingly used to detect prostate cancer, guide MRI-ultrasound fusion biopsies, and plan treatment.[14] Currently, it is considered the most sensitive non-invasive imaging modality for visualization, detection, and localization of prostate cancer. However, the often subtle visual differences between benign and cancerous tissue on MRI make radiologist interpretation of MR images challenging. Despite the adoption of PIRADS (Prostate Imaging-Reporting and Data System),[15] problems remain with false negatives (12% of aggressive cancers missed during screening,[8] 34% of aggressive and 81% of indolent cancers missed in men undergoing prostatectomy[10]), false positives (>35% false-positive rate[8]), and high inter-reader variability (inter-reader agreement κ = 0.46–0.78).[16,17] As a result, many unnecessary biopsies continue to be performed. Moreover, MRI-ultrasound fusion-targeted biopsies are usually supplemented with systematic biopsies, leading to increased risks (infection, bleeding, and pain), as well as over-detection and over-treatment of indolent cancers.

Detecting cancer and simultaneously characterizing it as aggressive or indolent on MRI is an unmet clinical need. Such selective identification of aggressive and indolent cancer on MRI could help identify men with aggressive prostate cancer, and reduce unnecessary biopsies in men without cancer or with indolent prostate cancer. Several studies have investigated the use of AI for prostate cancer detection on MRI, with encouraging performance. A body of literature exists surrounding the use of AI to predict the likelihood of a patient having prostate cancer without explicitly detecting lesions,[18] or the likelihood of biochemical recurrence after radical prostatectomy.[19] These studies use MR images, with or without clinical variables such as age, PSA density, and PIRADS scores. In this review, we focus only on methods for detecting cancer in patients without known cancer, and further sub-divide AI models into two major tasks (Figure 2):
Figure 2.

AI models for prostate cancer detection on MRI can be subdivided into two major tasks: lesion classification and lesion detection. Lesion classification involves classifying a radiologist-outlined lesion (region of interest) into categories (cancer vs benign, clinically significant cancer vs benign or indolent, or Gleason grade groups). Lesion detection involves detecting and characterizing cancer aggressiveness on the entire prostate MRI.

Lesion classification: This group of AI models classifies radiologist-outlined lesions (regions of interest) into categories (i.e. cancer vs benign, clinically significant cancer vs benign, or different Gleason grade groups) (Figure 2(a); Table 1). AI models for lesion classification often use traditional machine learning, which involves extracting hand-crafted features from the region of interest and then using a classifier to determine which category the lesion falls into. Hand-crafted features assess texture, shape, volume, or image-based radiomic characteristics. Traditional machine learning classifiers include artificial neural networks, random forests, support vector machines, and logistic regression-based classifiers. With the increasing success of deep learning-based methods, several lesion classification methods were also developed that classify lesions using deep neural networks, without the need to select and extract hand-crafted features. If successfully deployed in the clinic, automated lesion classification methods would allow a physician to select a region of interest on an MRI slice and receive AI assistance in determining whether that region is likely to be cancerous.
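As an illustration of the deep learning variant, the sketch below shows a small convolutional network that scores a two-channel (T2w + ADC) patch around a radiologist-outlined lesion. This is a minimal, hypothetical architecture, not any of the published models in Table 1; the patch size, channel counts, and layer widths are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

class LesionClassifier(nn.Module):
    """Toy CNN lesion classifier; architecture is illustrative only."""
    def __init__(self, in_channels: int = 2, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),              # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),      # global average pooling
        )
        self.head = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

model = LesionClassifier()
patch = torch.randn(8, 2, 64, 64)   # batch of 8 two-channel 64x64 ROI patches
logits = model(patch)               # (8, 2): per-class scores
probs = logits.softmax(dim=1)       # per-lesion class probabilities
```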
Table 1.

AI models for prostate lesion classification on MRI.

Study | Input data | Cohort size | Data type | Algorithm | Training labels | Evaluation labels | Evaluation metric | Source code availability
Algohary et al.[20] | T2w, ADC | 231 | Retrospective, 4 inst. | TML | Biopsy | Biopsy | ROC-AUC, Acc. | No
Antonelli et al.[21] | T2w, ADC, DWI, DCE | 164 | Retrospective, 1 inst. | TML | Biopsy | Biopsy | ROC-AUC, Se. at 50% threshold of Sp. | No
Bleker et al.[22] | T2w, ADC, DWI, DCE | 206 | Retrospective, public data set | TML | Biopsy | Biopsy | ROC-AUC, Se., Sp. | No
Bonekamp et al.[23] | T2w, ADC, DWI | 316 | Retrospective, public data set | TML | Biopsy | Biopsy | ROC-AUC, Se., Sp. | No
Chen et al.[24] | T2w, ADC | 381 | Retrospective, 1 inst. | TML | Biopsy | Biopsy | ROC-AUC, Acc., Se., Sp. | No
Akamine et al.[25] | DWI, DCE | 52 | Retrospective, 1 inst. | Hierarchical clustering | RP | RP | Acc. | No
Kwon et al.[26] | T2w, ADC, DWI, DCE | 344 | Retrospective, public data set | TML | Biopsy | Biopsy | ROC-AUC, Se., PPV | No
Chaddad et al.[27] | T2w, ADC | 112 | Retrospective, 1 inst., public data set | TML | Biopsy | Biopsy | ROC-AUC | No
Hectors et al.[28] | T2w | 64 | Retrospective, 1 inst. | TML | RP | RP | ROC-AUC | No
Xu et al.[29] | T2w | 331 | Retrospective, 1 inst. | TML | RP | RP | ROC-AUC, decision curve analysis | No
Viswanath et al.[30] | T2w | 85 | Retrospective, 3 inst. | TML | RP | RP | ROC-AUC | No
Transin et al.[31] | ADC, DCE | 74 | Retrospective, 1 inst. | TML | Biopsy/RP | Biopsy/RP | ROC-AUC, Se., Sp. | No
Zhang et al.[32] | T2w, ADC | 159 | Retrospective, 2 inst. | TML | Biopsy | Biopsy | ROC-AUC | No
Deniffel et al.[33] | T2w, ADC, DWI | 499 | Retrospective, 1 inst. | DL | Biopsy | Biopsy | ROC-AUC, decision-curve analysis | No
Song et al.[34] | T2w, ADC, DWI | 185 | Retrospective, public data set | DL | Biopsy | Biopsy | ROC-AUC, Se., Sp., PPV | No
Takeuchi et al.[35] | T2w, ADC, DWI | 334 | Retrospective, 1 inst. | DL | Biopsy | Biopsy | ROC-AUC, net-benefit curve, NPV | No
Yuan et al.[36] | T2w, ADC | 244 | Retrospective, 2 inst. | DL | Biopsy | Biopsy | Acc., Prec., Recall, F1-score | No
Aldoj et al.[37] | T2w, ADC, DWI, DCE | 200 | Retrospective, public data set | DL | Biopsy | Biopsy | ROC-AUC, Se., Sp. | No
Zhong et al.[38] | T2w, ADC | 140 | Retrospective, 1 inst. | DL | RP | RP | ROC-AUC, Acc., Se., Sp. | No
Abraham and Nair[39] | T2w, ADC, DWI | 112 | Retrospective, public data set | DL | Biopsy | Biopsy | ROC-AUC, quadratic wtd. kappa, PPV | No

Acc, accuracy; ADC, apparent diffusion coefficient; AI, artificial intelligence; DCE, dynamic contrast enhanced; DL, deep learning; DWI, diffusion weighted imaging; inst., institution; MRI, magnetic resonance imaging; NPV, negative predictive value; PPV, positive predictive value; Prec, precision; ROC-AUC, receiver operating characteristics – area under the curve; RP, radical prostatectomy; Se, sensitivity; Sp, specificity; T2w, T2-weighted MRI; TML, traditional machine learning.

Lesion detection: This group of AI models uses all the images from a prostate MR exam as inputs, and detects, localizes, and/or stratifies cancer aggressiveness on the entire prostate MRI (Figure 2(b); Table 2). Often, these lesion detection methods provide a pixel-level cancer probability map over the prostate, highlighting areas that are highly suspicious for cancer. While earlier lesion detection methods used traditional machine learning,[58] recent studies almost always use deep learning-based models. If successfully deployed in the clinic, automated lesion detection methods would automatically evaluate an entire MRI exam and provide a physician with outlines of all areas that are suspicious for cancer.
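The sketch below illustrates, under stated assumptions, how such a pixel-level probability map is typically turned into discrete lesion candidates: the map is thresholded and split into connected components. Here `model` stands in for any trained detection network (e.g. a U-Net-style segmentation model); the 0.5 threshold and the per-lesion summary fields are illustrative, not taken from any study in Table 2.

```python
import torch
from scipy import ndimage

def detect_lesions(model: torch.nn.Module, mri: torch.Tensor, thr: float = 0.5):
    """mri: (1, C, H, W) exam; model output assumed to be (1, 1, H, W) logits."""
    with torch.no_grad():
        prob_map = torch.sigmoid(model(mri))[0, 0]       # (H, W) probabilities
    mask = (prob_map > thr).cpu().numpy()
    labeled, n_lesions = ndimage.label(mask)             # connected components
    probs = prob_map.cpu().numpy()
    lesions = [{
        "area_px": int((labeled == i).sum()),            # candidate size
        "mean_prob": float(probs[labeled == i].mean()),  # candidate suspicion
    } for i in range(1, n_lesions + 1)]
    return prob_map, lesions
```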
Table 2.

AI models for prostate lesion detection on MRI.

Study | Input data | Cohort size | Data type | Algorithm | Training labels | Evaluation labels | Evaluation granularity | Evaluation metric | Source code availability
Saha et al.[40] | T2w, ADC, DWI | 2732 | Retrospective, 2 inst., PIRADS or biopsy | DL | Radiologist, w/o path. confirm. | Radiologist, w/o & with path. confirm. from biopsy | Lesion-level, patient-level | ROC, FROC | Yes
Yu et al.[41] | T2w, ADC, DWI | 1745 | Retrospective, 4 inst., PIRADS or biopsy, external validation on public data set | DL | Radiologist, w/o path. confirm. | Radiologist, w/o & with path. confirm. from biopsy | Lesion-level, patient-level | FROC, DSC, ROC-AUC | No
Schelb et al.[42] | T2w, DWI | 312 | Retrospective, 1 inst., biopsy | DL | Radiologist, path. confirm. from biopsy | Radiologist, path. confirm. from biopsy | Sextant-level, patient-level | Se, Sp, Prec, NPV, ROC | Yes
Sumathipala et al.[43] | T2w, ADC, DWI | 186 | Retrospective, 6 inst., RP or biopsy | DL | Radiologist, path. confirm. from RP or biopsy | Radiologist, path. confirm. from RP or biopsy | Patient-level | ROC-AUC | No
Bhattacharya et al.[44] | T2w, ADC | 75 | Retrospective, 1 inst., RP | DL | Pathologist, automated registration | Pathologist, automated registration | Pixel-level, lesion-level | ROC-AUC, Se, Sp | No
Sanyal et al.[45] | T2w, ADC, DWI | 77 | Retrospective, 1 inst., biopsy | DL | Radiologist, path. confirm. from biopsy | Radiologist, path. confirm. from biopsy | Pixel-level | ROC-AUC | Yes
Jin et al.[46] | T2w, ADC, DWI, DCE | 34 | Retrospective, 1 inst. | TML | Pathologist, automated registration | Pathologist, automated registration | Pixel-level | ROC-AUC | Yes
McGarry et al.[47] | T2w, ADC, DWI, DCE | 48 | Prospectively recruited, 1 inst., RP | TML | Pathologist, automated registration | Pathologist, automated registration | Lesion-level | ROC-AUC | No
Cao et al.[48] | T2w, ADC | 417 | Retrospective, 1 inst., 4 scanners, RP | DL | Radiologist, path. confirm., cognitive registration with RP | Radiologist, path. confirm., cognitive registration with RP | Lesion-level | FROC | No
De Vente et al.[49] | T2w, ADC | 162 | Retrospective, 1 inst., public data set, biopsy | DL | Semi-automated region growing from targeted biopsy centroid | Semi-automated region growing from targeted biopsy centroid | Pixel-level, lesion-level | Quadratic weighted kappa-score | No
Seetharaman et al.[50] | T2w, ADC | 424 | Retrospective, 1 inst., biopsy & RP | DL | Automated Gleason patterns from RP, automated registration | Automated Gleason patterns from RP, radiologist labels with path. confirm. from targeted biopsy | Pixel-level, lesion-level, patient-level | ROC-AUC, Se, Sp | Yes
Bhattacharya et al.[51] | T2w, ADC | 443 | Retrospective, 1 inst., biopsy & RP | DL | Pathologist & automated Gleason patterns from RP, automated registration | Automated Gleason patterns from RP, radiologist labels with path. confirm. from targeted biopsy | Pixel-level, lesion-level, patient-level | ROC-AUC, PR-AUC, Se, Sp, Prec, NPV, F1-score, DSC, Acc | Soon to be released
Zhang et al.[52] | T2w, ADC | 358 | Retrospective, 1 inst., biopsy | DL | Retrospective radiologist outline from biopsy path. | Retrospective radiologist outline from biopsy path. | Pixel-level | DSC, Se, Prec, VOE, RVD | No
Alkadi et al.[53] | T2w | 19 | Retrospective, 1 inst. (public), biopsy | DL | Radiologist, path. confirm. from biopsy | Radiologist, path. confirm. from biopsy | Pixel-level | Acc, IoU, Recall, DSC | No
Arif et al.[54] | T2w, DWI, ADC | 292 | Retrospective, 1 inst., biopsy | DL | Radiologist, path. confirm. from biopsy | Radiologist, path. confirm. from biopsy | Patient-level | Acc, IoU, Recall, DSC | No
Mehralivand et al.[55] | T2w, DWI, ADC | 236 | Retrospective, multi inst., biopsy | TML | Radiologist, path. confirm. from biopsy or RP | Radiologist, path. confirm. from biopsy or RP | Lesion-level | AUC, Se, PPV | No
Netzer et al.[56] | T2w, DWI | 1488 | Retrospective, multi inst., multi-scanner, biopsy | DL | Radiologist, path. confirm. from biopsy or RP | Radiologist, path. confirm. from biopsy or RP | Patient-level, sextant-level | ROC-AUC, Se, Sp | No
Duran et al.[57] | T2w, ADC | 318 | Retrospective, 2 inst., different scanners, external validation on public data set | DL | Radiologist, cognitive alignment with RP | Radiologist, cognitive alignment with RP | Lesion-level | FROC, Cohen's quadratic kappa | Yes (claimed)

Acc, Accuracy; ADC, apparent diffusion coefficient; AI, artificial intelligence; confirm., confirmation; DCE, dynamic contrast enhanced; DL, deep learning; DSC, dice coefficient; DWI, diffusion-weighted imaging; FROC, free-response receiver operating characteristics; inst., institution; IoU, Intersection over Union; MRI, magnetic resonance imaging; NPV, negative predictive value; path., pathology; PIRADS, Prostate Imaging-Reporting and Data System; PPV, positive predictive value; PR-AUC, precision recall–area under the curve; Prec, Precision; ROC-AUC, receiver operating characteristics–area under the curve; RP, radical prostatectomy; RVD, relative volume difference; Se, Sensitivity; Sp, Specificity; T2w, T2-weighted MRI; TML, traditional machine learning; VOE, volumetric overlap error; w/o, without.

Existing AI studies for prostate cancer detection on MRI (both lesion classification and lesion detection) have used a variety of traditional machine learning and deep learning approaches. These AI models also vary greatly in the following ways:

(a) Ground truth labels used for training and evaluation (biopsy or radical prostatectomy, radiologist outlines with or without pathology confirmation, pathologist outlines, etc.).
(b) Evaluation criteria (patient-level, lesion-level, or pixel-level evaluation; evaluation metrics; etc.).
(c) Data set size and type (cohort size, input MRI sequences, data from single or multiple institutions, retrospective or prospective data, etc.).

Unfortunately, direct comparison across the many published AI models for prostate cancer diagnosis on MRI is not possible due to (1) the wide variability in labels, evaluation criteria, and training data, (2) the lack of access to published models and source code for pre- and post-processing and training, and (3) the lack of large, publicly available, multi-institution MR imaging data sets for independent model testing. The following sections summarize the methodologic variability that limits direct comparison across models.

(a) Ground truth labels for training and evaluation: AI models for prostate lesion classification mostly use radiologist outlines as inputs, and pathology confirmation from prostate biopsy or surgery as ground truth. However, AI models for prostate lesion detection differ widely in the ground truth labels used for training and evaluation.
These approaches for ground truth labeling include:

(1) Radiologist outlines of PI-RADS 3 or above lesions, with[43,45,48] or without[40,41,59] pathology confirmation.
(2) Pathologist outlines of cancer (without grade information) on whole-mount histopathology images, mapped onto pre-operative MRI using MRI-histopathology registration approaches.[44,47]
(3) Automated Gleason pattern labels on whole-mount histopathology images from deep learning algorithms,[60] mapped onto MRI through automated MRI-histopathology registration.[50,51]

While the first ground truth labeling approach trains models to perform PIRADS scoring like a radiologist, the latter two approaches allow for detection of cancers that may not have been seen by a radiologist. Training a model using radiologist outlines without any pathology confirmation may lead to high rates of false-positive findings. Conversely, obtaining pathology confirmation should reduce false positives because it enables training using only cancerous areas. Pathology confirmation may come either from targeted biopsy of radiologist-outlined lesions,[45] or from post-operative whole-mount histopathology images of radical prostatectomy patients through cognitive registration or manual matching.[43,48] For the latter two ground truth labeling approaches, MRI-histopathology registration is used to map labels from whole-mount histopathology images onto pre-operative MRI (see section 'MRI-histopathology registration for ground truth labeling of cancer detection models').

All label types used to train AI models for prostate cancer detection have advantages and disadvantages. Radiologist outlines without pathology confirmation are easier to obtain in large numbers from routine clinical care (and arguably more feasible to predict), but they include many false positives,[61,62] routinely underestimate tumor extent,[63] and may miss cancers completely (up to 34% of aggressive cancers in men undergoing radical prostatectomy are missed on MRI).[8,10] Unlike radiologist annotations, pathologist outlines on whole-mount histopathology images capture the complete extent of cancer, but mapping pathologist outlines onto pre-operative MRI requires accurate MRI-histopathology registration, which is labor- and experience-intensive (see section 'MRI-histopathology registration for ground truth labeling of cancer detection models'). Moreover, it is impossible for pathologists to annotate large data sets of whole-mount histopathology images with gland-level annotations of cancer and Gleason pattern to train machine learning models on prostate MRI. Automated Gleason pattern labels derived from deep learning on pathology images[60] have recently been shown[64] to perform with accuracy similar to experienced pathologist labels, while circumventing the constraints of labor, time, and variability associated with human-annotated labels. Moreover, when cancer detection models for MRI are trained using automated pixel-level Gleason pattern labels on whole-mount histopathology images, they can selectively identify aggressive and indolent cancer components, even in mixed lesions; this is intractable with any human-annotated labels on MRI.[50,51,64]

(b) Data type and size: The input data to train AI models typically consist of one or more MR sequences [T2w-MRI, apparent diffusion coefficient (ADC) maps, diffusion-weighted images (DWI), dynamic contrast-enhanced (DCE) sequences].
While most studies used MR images or features derived from MR images as inputs, two recent studies, CorrSigNet[44] and CorrSigNIA,[51] presented a radiology-pathology fusion approach that identifies MRI features correlated with pathology features of cancer, and used these correlated MRI features to detect and localize aggressive and indolent cancer on MRI. These studies[44,51] showed that algorithms leveraging radiology-pathology fusion to identify pathology features on non-invasive imaging performed better than algorithms that used MR-derived features alone. Recent studies[40,59,65,66] also show that adding prior knowledge about cancer distribution in the different prostate zones improves AI cancer detection and localization.

The size of the data sets used to train and validate AI models also varies significantly, ranging from as few as 19 patients[53] to 2732 patients[40] (Tables 1 and 2). Most studies used retrospective data, either from public data sets,[22,23,26,27,34,37,39,53] single institutions,[21,31,33,44,50,51] or multiple institutions.[20,40,43] In general, studies using large data sets tend to have lower quality labels, while those with smaller data sets tend to have high-quality labels. For example, two recent studies[40,59] used ≈2000 patients and radiologist labels without pathology confirmation to develop and validate their methods. One of these two studies[59] also showed that a large data set of radiologist labels without pathology confirmation could be used to successfully train an AI model to detect clinically significant cancer on prostate MRI. Studies have also used patient populations with different distributions of the disease to train and validate AI models. While some studies used patients with aggressive prostate cancer who underwent radical prostatectomy,[43,48] others used patients from a population undergoing MRI-based screening who had varying distributions of cancer or no cancer.[23,40,45] To test generalizability, some studies trained AI models using one group of patients and tested on a different group, including patients with different disease distributions[50,51] or different label types.[40,59]

Due to the difficulty of acquiring large data sets of pathology-confirmed cancer labels to train AI models, one study[67] proposed a weakly supervised learning approach that instead learns the normal appearance of prostate MRI from 1145 negative MRI scans, and then uses this baseline model to predict pixel-wise suspicion of prostate cancer. Another recent study[68] proposed a self-supervised learning approach in which the AI model first learns prostate MRI features from unlabeled data, and is then fine-tuned to detect cancer using limited labeled data (see the sketch below).
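The following is a schematic sketch of the self-supervised recipe just described, under the assumption of a rotation-prediction pretext task (the cited study's actual pretext objective may differ): an encoder is first trained on unlabeled scans, then reused with a small classification head on limited labeled data.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                      # shared feature extractor
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# Stage 1: pretext task on unlabeled scans. Here the model predicts which
# of 4 rotations was applied to each image; no cancer labels are needed.
pretext_head = nn.Linear(16, 4)
unlabeled = torch.randn(32, 1, 64, 64)        # synthetic stand-in MRI slices
rotations = torch.randint(0, 4, (32,))
inputs = torch.stack([torch.rot90(x, k=int(k), dims=(1, 2))
                      for x, k in zip(unlabeled, rotations)])
loss = nn.functional.cross_entropy(pretext_head(encoder(inputs)), rotations)
loss.backward()                               # one illustrative pretext step

# Stage 2: fine-tune the pretrained encoder on a small labeled set
# for the downstream cancer detection task.
clf_head = nn.Linear(16, 2)
labeled, labels = torch.randn(8, 1, 64, 64), torch.randint(0, 2, (8,))
logits = clf_head(encoder(labeled))           # (8, 2) cancer vs benign scores
```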
(c) Evaluation criteria: Evaluation methods and metrics vary based on the task (lesion classification, lesion detection) as well as the granularity of the available labels. Evaluation of cancer detection models can be at the patient level (whether the AI model correctly detects a person as having prostate cancer or not), the lesion level (whether the AI model correctly detects/classifies individual lesions without predicting false positives), or the pixel level (whether the AI method correctly classifies all the prostate MRI pixels into benign, cancer, or cancer aggressiveness subtypes). The definitions of the evaluation metrics also differ across studies; for example, Sumathipala et al.[43] used a lesion-level evaluation where the negative class was defined using 3 × 3 voxels, while other studies[42,50,51] used sextant-based lesion-level evaluations in line with how prostate biopsies are conducted in the clinic. Several evaluation metrics have been used in existing studies, including but not limited to the area under the receiver operating characteristic curve (ROC-AUC), area under the precision-recall curve (PR-AUC), free-response receiver operating characteristics (FROC), sensitivity (Se), specificity (Sp), F1-score, accuracy, positive predictive value (PPV), negative predictive value (NPV), and Dice coefficient. Such wide variability in evaluation methods and metrics underscores the need for a set of clinically relevant, standardized evaluation criteria that can be used to uniformly validate and compare all AI models for cancer detection.
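As one concrete example of the patient-level metrics listed above, the sketch below computes ROC-AUC, sensitivity, specificity, and PPV from hypothetical per-patient labels and model scores; the 0.5 operating point is illustrative (in practice it would be chosen on a validation set).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

# Hypothetical per-patient ground truth (1 = cancer) and model scores.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.7, 0.6])

auc = roc_auc_score(y_true, y_score)                  # ROC-AUC
tn, fp, fn, tp = confusion_matrix(y_true, y_score > 0.5).ravel()
sensitivity = tp / (tp + fn)                          # Se (recall)
specificity = tn / (tn + fp)                          # Sp
ppv = tp / (tp + fp)                                  # positive predictive value
print(f"AUC={auc:.2f} Se={sensitivity:.2f} Sp={specificity:.2f} PPV={ppv:.2f}")
```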

Summary

AI models for prostate cancer detection on MRI show great promise, but they are not ready for clinical deployment. Wide variability in methods, labels, and evaluation criteria among these AI models limits comparison between them. Most of these AI models have been developed and validated on single-institution, retrospective, small patient data sets, and lack tests of generalizability in larger, heterogeneous patient data. To reap the benefits of these AI models in clinical care, we need large, publicly available, anonymized patient data sets; publicly available source code and trained models; standardized evaluation criteria; external validation; multi-reader studies to assess the performance of AI models; and prospective trials (see section 'Challenges in AI for PCa' for more details).

Cancer detection on prostate ultrasound images

Prostate cancer is most commonly diagnosed using grayscale transrectal ultrasound-guided biopsy.[69,70] While grayscale ultrasound accurately identifies the prostate gland, low signal-to-noise ratio and artifacts (e.g. speckle and shadowing) prevent clinicians from reliably differentiating cancerous from non-cancerous regions. The detection rate of prostate cancer on grayscale ultrasound images is reported to be as low as 40%.[71-73] When visible on ultrasound, cancers most often appear hypoechoic[74] because they reflect significantly fewer sound echoes than normal tissue. To supplement grayscale ultrasound, other ultrasound-based imaging techniques, such as shear-wave elastography,[75-77] color Doppler ultrasound,[77] contrast-enhanced ultrasound,[78] micro-ultrasound,[79] and their combinations, have been proposed. These alternative ultrasound-based imaging modalities provide enhanced image resolution and better visualization of the prostate compared to grayscale ultrasound, and enable prostate cancer detection with better sensitivity.[80] In particular, high-frequency micro-ultrasound images are showing promise in detecting clinically significant prostate cancer with similar or higher sensitivity, similar specificity, and much lower cost in comparison to MRI.[81-85] As such, research on developing AI models for prostate cancer detection on micro-ultrasound images is also growing.[86,87]

Although grayscale transrectal ultrasound is widely used for prostate biopsy in clinical settings, only one AI study[88] focused on prostate cancer detection on grayscale ultrasound images. Most AI models for prostate cancer detection used newer ultrasound-based imaging modalities (Table 3), employing a variety of AI methods ranging from traditional machine learning to deep learning. Other studies investigated the role of radio-frequency time-series data[93,94] in detecting prostate cancer using traditional machine learning. Moreover, most of these studies focused on the task of lesion classification (classifying a physician-outlined region of interest into benign vs cancerous tissue),[72,90,91,93,94] while only a few focused on lesion detection (detecting and localizing cancer on the entire ultrasound image).[80,89,92]
Table 3.

AI models for prostate cancer detection on ultrasound.

Study | Cohort size | Input data | Data type | Algorithm | Training labels | Evaluation granularity | Evaluation metric | Source code availability | Task
Sedghi et al.[89] | 157 | TeUS | Retrospective, 1 inst. | DL | Radiologist, path. confirm. from biopsy | Lesion-level | Se, Sp, Acc, AUC | No | Lesion detection
Azizi et al.[90] | 163 | TeUS | Retrospective, 2 inst. | DL | Radiologist, path. confirm. from biopsy | Lesion-level | Se, Sp, Acc, AUC | No | Lesion classification
Azizi et al.[72] | 157 | TeUS | Retrospective, 1 inst. | DL | Radiologist, path. confirm. from biopsy | Lesion-level | Se, Sp, Acc, AUC | No | Lesion classification
Azizi et al.[91] | 155 | TeUS | Retrospective, 1 inst. | DL | Radiologist, path. confirm. from biopsy, biopsy length | Patient-level | AUC, MSE | No | Lesion classification
Han et al.[92] | 51 | TRUS | N/A | TML | Biopsy | Patient-level, lesion-level | Se, Sp, Acc, ROC-AUC | No | Lesion detection
Wildeboer et al.[80] | 50 | TRUS, SWE, DCE-US | Retrospective, 1 inst. | TML | RP, Biopsy | Pixel-level, lesion-level | ROC-AUC | No | Lesion detection
Moradi et al.[93] | 16 | RF time series | Retrospective, 1 inst. | TML | RP | Patient-level | Se, Sp, Acc, ROC-AUC | No | Lesion classification
Imani et al.[94] | 14 | RF time series | Retrospective, 1 inst. | TML | RP, Biopsy | Patient-level | Se, Sp, Acc, ROC-AUC | No | Lesion classification
Hassan et al.[88] | 1151 | TRUS | Retrospective, 1 inst., public data set | TML & DL | Biopsy | Patient-level | Acc | No | Lesion classification

Acc, accuracy; AI, artificial intelligence; AUC, area under the curve; confirm., confirmation; DCE-US, dynamic contrast-enhanced ultrasound; DL, deep learning; inst., institution; MSE, Mean Square Error; path, Pathology; RF, radio frequency; ROC-AUC, receiver operating characteristics–area under the curve; RP, radical prostatectomy; Se, sensitivity; Sp, specificity; SWE, shear-wave elastography; TeUS, temporal enhanced ultrasound; TML, traditional machine learning; TRUS, transrectal ultrasound.

Prostate cancer detection on ultrasound images using AI models remains poor, and minimal AI literature exists employing AI approaches on grayscale ultrasound. Development of AI-based approaches for prostate cancer detection on ultrasound therefore represents a significant research opportunity. Generalizability of these methods also needs further investigation, since most of them have been evaluated on small patient cohorts with retrospective data from single institutions. Privacy-protecting data sharing and public availability of source code and trained models are imperative to improve the performance of AI models on ultrasound (see section 'Challenges in AI for PCa' for more details).

Cancer detection on prostate histopathology images

Gleason grading[95] on histopathology images is the strongest predictor of prostate cancer aggressiveness and recurrence. However, Gleason grading suffers from significant inter- and intra-pathologist variability.[96-98] While sub-specialized genitourinary pathologists achieve high concordance in Gleason grading, such expertise is not universally available. The emergence of technology to digitize glass slides into whole slide images (WSIs) has revolutionized the field of computational pathology by enabling computer-assisted diagnostic support to pathologists. AI models on prostate histopathology images have been developed to distinguish cancer from non-cancer regions[99,100] and for automated Gleason grading.[60,101-108]

AI models for pathology images also suffer from challenges associated with limited labeled data sets. In addition, pathology images are extremely large, which leads to additional challenges in processing them, as well as in generating labeled data sets. To put this in perspective, a single whole-mount histopathology slice from a radical prostatectomy patient is 2–4 gigabytes (GB) in uncompressed form, and the data for a single patient with several whole-mount slices often exceed 20 GB. Even in compressed form, the histopathology slices (biopsy or whole-mount) of a single patient occupy an average of 2–3 GB of storage.[109] In comparison to natural images, around 470 whole slide images contain approximately the same number of pixels as the entire ImageNet[110] data set (the public data set of over 14 million natural images used to train AI models for classification of natural images).[102] As such, several studies considered only a subset of the complete pathology image, or tissue microarrays,[111-113] for development and validation of AI models (see the patch-based processing sketch below). In this review, we only include studies that considered digital histopathology images derived from prostate needle core biopsies or radical prostatectomies.

Most recent studies on cancer detection and Gleason grading on histopathology images use deep learning models. Similar to AI models for prostate cancer detection on MRI, AI models for prostate cancer detection on histopathology images differ in (a) ground truth labels for training and validation, (b) data type and size, and (c) evaluation criteria (see Table 4).
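The sketch below illustrates why patch-based processing is unavoidable at this scale: a gigapixel WSI cannot be fed to a network whole, so it is tiled into small patches that are scored individually, and the patch scores are aggregated into a slide-level score (max-pooling over patches is shown as one common weak-label aggregation rule). The helper `score_patch`, the patch size, and the toy slide array are hypothetical stand-ins, not any published pipeline.

```python
import numpy as np

def tile_wsi(wsi: np.ndarray, patch: int = 256, stride: int = 256):
    """Yield (row, col, patch) tiles covering an already-loaded slide array."""
    h, w = wsi.shape[:2]
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            yield r, c, wsi[r:r + patch, c:c + patch]

def slide_level_score(wsi: np.ndarray, score_patch) -> float:
    # Max over patches: the slide is as suspicious as its most suspicious
    # patch, one simple aggregation rule for weak slide-level labels.
    return max(score_patch(p) for _, _, p in tile_wsi(wsi))

# Example with a toy "slide" and a dummy patch scorer:
toy_wsi = np.random.rand(1024, 1024, 3)
print(slide_level_score(toy_wsi, lambda p: float(p.mean())))
```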
Table 4.

AI models for cancer detection and Gleason grading on prostate histopathology whole slide images (WSI).

Study | Input data | Cohort size | Data type | Algorithm | Training labels | Evaluation labels | Evaluation granularity | Evaluation metric | Code availability
Lucas et al.[101] | WSI, biopsy | 38 slides | Retrospective, 1 inst. | DL | Pathologist, pixel-level | Pathologist, pixel-level | Patch-based | Se, Sp, F1-score | No
Campanella et al.[102] | WSI, biopsy | 15,187 slides | Retrospective, multiple inst. | DL | Reported diagnosis | Reported diagnosis | Slide-level | ROC-AUC | Yes
Bulten et al.[96] | WSI, biopsy | 1410 slides | Retrospective, multiple inst. | DL | Pathologists' reports | Pathologists' reports, consensus reference standard by 3 expert urologic pathologists | Slide-level | ROC-AUC, F1-score, Acc, Prec, Rec, Sp, NPV | Yes
Nagpal et al.[107] | WSI, RP | 1557 slides | Retrospective, multiple inst. | DL | Slide-level & region-level annotations by pathologists | Slide-level & region-level annotations by pathologists | Slide-level | ROC-AUC | No
Pinckaers et al.[114] | WSI, biopsy | 5949 slides | Retrospective, multiple inst. | DL | Pathologists' reports | Pathologists' reports, consensus reference standard by 3 expert urologic pathologists | Slide-level | ROC-AUC | Yes
Ström et al.[103] | WSI, biopsy | 1474 patients, 9001 slides | Prospectively collected, multiple inst. | DL | Annotations by single experienced urological pathologist | Annotations by 23 experienced urological pathologists | Slide-level | ROC-AUC, Se, Sp, cancer length measurement, Cohen's kappa | No
Marginean et al.[105] | WSI, biopsy | 195 patients, 735 slides | Retrospective, 1 inst., same slides, different scanners | DL | Pixel-level annotations by 2 experienced pathologists | Pixel-level annotations by 2 experienced pathologists | Pixel-level, slide-level | Correlation, Se, Sp | No
Kott et al.[106] | WSI, biopsy | 80 patients, 85 slides | Retrospective, 1 inst. | DL | Pixel-level annotations by pathologists | Pixel-level annotations by pathologists | Patch-level | Acc, Se, Sp, Prec | No
Li et al.[108] | WSI, RP | 70 patients, 543 slides | Retrospective, 1 inst. | DL & TML | Pixel-level annotations by pathologists | Pixel-level annotations by pathologists | Pixel-level | Overall pixel Acc., IoU | No
Ryu et al.[60] | WSI, biopsy | 1833 slides | Retrospective, 2 inst. | DL | Pixel-level annotations by 1 experienced pathologist | Slide-level annotations by 3 experienced pathologists, difficulty-level | Slide-level | Cohen's kappa, tumor length | No

AI, artificial intelligence; DL, deep learning; inst., institution; IoU, intersection over union; NPV, negative predictive value; Prec, precision; ROC-AUC, receiver operating characteristics–area under the curve; RP, radical prostatectomy; Se, Sensitivity; Sp, specificity; TML, traditional machine learning; WSI, whole slide images.

Labels for training and evaluating AI models for Gleason grading on prostate histopathology images are derived either from pathology reports[102,104,114] or from pixel-level annotations by experienced pathologists.[60,101,105] Digital histopathology images to train AI models may be derived either from prostate needle core biopsies[60,101-106] or from radical prostatectomy[107,108] specimens. The data set size varies widely (Table 4), mostly depending on the label used for training and evaluation. AI models trained with pixel-level Gleason pattern labels from experienced pathologists typically have smaller data sets,[101,106,108] whereas those developed with patient-level labels from diagnostic reports have larger data sets.[102,104]

Evaluation of AI models is performed either at the pixel level (whether the AI method correctly predicts Gleason patterns for each pixel of the image), the region level (whether the AI method assigns the correct Gleason score to a given region of the digitized histopathology image), or the slide level (whether the AI method assigns the correct Gleason score to the entire slide). Patient-level Gleason scores are often derived from slide-level predictions. As with MRI, evaluation metrics for histopathology images vary based on the label type and evaluation granularity. For example, pixel-level evaluation is only possible when detailed pixel-level labels are available, as in Figure 3, and evaluation metrics may measure the degree of overlap or correlation between labels and predictions, or sensitivity and specificity, at a very fine granularity (Figure 3(b)-(d)). However, in most cases such detailed pixel-level labels are unavailable, as they are impractically time-consuming for pathologists, and evaluation is performed at a coarser granularity (region, slide, or patient level) using pathologist reports. Metrics for such evaluation may include ROC-AUC, sensitivity, specificity, Cohen's kappa, etc., but at the region, slide, or patient level. A study investigating the appropriate approach to evaluating AI classification methods on prostate histopathology images[113] found that AI models trained using data with multiple expert annotations yielded more accurate performance than models trained with single expert annotations. Moreover, patient-based cross-validation provided more realistic and unbiased evaluations of AI models than patch-based cross-validation.[113]
Figure 3.

The AI-predicted[60] automated aggressive (Gleason pattern 4, green) and indolent (Gleason Pattern 3, blue) cancers visually match the manual cancer annotations by an expert pathologist (black, yellow, orange, red). (a) Whole mount histopathology image with (b–d) Close-up into the two cancer lesions. (c) Cancer labels manually outlined by an expert pathologist (black outline) shows high agreement with overall cancer (combined blue and green) predicted by the AI model. (b, d) It is impractically time-consuming for a human pathologist to manually assign pixel-level Gleason patterns (yellow, orange, red) to each gland in detail as done by the AI model (blue, green).

Several studies have compared the performance of (a) AI models versus pathologists, and (b) pathologists with and without AI assistance.[103,104,107,114-118] Most of these multi-reader and AI-assisted studies confirm the value of AI models in diagnostic pathology; they show increased sensitivity without statistically significant reduction in specificity, and reduced inter- and intra-observer variability. Recent results from the Prostate cANcer graDe Assessment (PANDA) challenge[118] show that AI models for Gleason grading are generalizable to different patient populations across the world and achieve strong concordance with expert genitourinary pathologists (see section 'Challenges in AI for PCa' for more details). Paige Prostate (Paige AI, New York, USA)[119] recently received approval from the Food and Drug Administration (FDA) as the first ever AI-based clinical pathology solution.[119] Independent studies of Paige Prostate showed generalizable performance on external test sets,[116] and a significant improvement in sensitivity (from 74% to 97%)[115] for non-genitourinary specialist pathologists without prior experience in digital pathology when assisted by Paige Prostate. Although sensitivity improvements were noted for cancers of all sizes and Grade Groups, the most pronounced improvements were for smaller and lower-grade (Grade Groups 1, 2, and 3) cancers.[115]

Gleason grading, while standardized, is constantly being refined. This means that AI models either need to evolve, or the data need to be linked to hard clinical endpoints, such as recurrence and death. For successful clinical deployment of AI models, regulatory authorities (e.g. the FDA) need to design strategies whereby AI models can evolve with clinical knowledge, rather than being frozen with locked-in variables.

The accurate performance of AI models on prostate histopathology images has motivated their use as labeling strategies for training prostate cancer detection methods on MRI.[50,51,66] These AI models on prostate histopathology images generate precise, gland-level annotations that are not feasible for human pathologists (Figure 3). These automated pixel-level Gleason pattern labels, together with accurate MRI-histopathology registration (see section 'MRI-histopathology registration for ground truth labeling of cancer detection models'), enable AI-based radiology-pathology fusion for selective identification of aggressive and indolent cancer on MRI,[50,51] which is not possible using human annotations on MRI. AI models for prostate cancer detection and Gleason grading on histopathology images have demonstrated excellent performance comparable to expert genitourinary pathologists.
Compared to AI models for radiology images, AI models for histopathology images have undergone more rigorous experimentation and validation using larger, heterogeneous patient data sets. Multi-reader studies and evaluation on external validation sets demonstrate that these AI models generalize to heterogeneous patient populations from across the globe, and have the potential to help pathologists in the clinic by improving the sensitivity of non-genitourinary specialist pathologists and reducing inter- and intra-pathologist variability in Gleason grading.

AI models for supporting tasks in cancer detection

Supporting tasks for cancer detection include tasks that are labor-, time-, or experience-intensive, but form an integral part of the clinical workflow to detect cancer. These supporting tasks include prostate gland segmentation and MRI-ultrasound registration to guide fusion biopsy procedures. Another supporting task is MRI-histopathology registration for patients who underwent radical prostatectomy. MRI-histopathology registration is necessary to study correlations between pre-operative MRI and post-operative histopathology images of the prostate and for deriving accurate ground truth labels for training cancer detection AI models on MRI. While several AI models exist for these supporting tasks, only a few are being commercially used in the clinic for support, or as pre-processing steps for AI cancer detection models.

Prostate gland segmentation to facilitate MRI-ultrasound fusion biopsies

Targeted MRI-ultrasound fusion biopsy workflows rely on accurate prostate gland segmentations on T2-weighted MRI and ultrasound images.[120] However, manually outlining the prostate is a time-consuming and tedious task.[121] Automated methods have the potential to reduce the manual effort, time, and variability associated with prostate gland segmentation on MRI and ultrasound images during clinical biopsy procedures.

AI for prostate gland segmentation on MRI: Many studies have proposed deep learning models to segment the prostate on MRI.[121-134] As with AI models for cancer detection tasks, AI models for prostate segmentation on MRI are mostly trained and validated on small data sets (40-250 patients),[123-131] often with retrospective, single-center data[124,131-133] and without validation in external cohorts.[124,131,132] The trained models and the source code to pre-process the data and train the models are often not publicly available,[123-125,127,128,130,132,133,135] limiting comparison between these models as well as their usage. The better-performing models achieved Dice scores (a metric of similarity between manual and AI-predicted segmentations; see the sketch below) of at least 0.90 on internal and 0.80 on external data sets.[121-123,125-127,130] A recent study that prospectively implemented an AI model for prostate segmentation in a urology clinic found that the AI was more accurate and 17 times faster than trained radiology technicians.[121] Finally, FDA-cleared commercial AI-based solutions for prostate gland segmentation are also available to optimize the clinical workflow.[136-139]

AI for prostate gland segmentation on ultrasound images: AI models for prostate gland segmentation on grayscale transrectal ultrasound images have used both traditional machine learning[140-143] and deep learning-based approaches.[144-151] To further improve the segmentation of challenging regions (e.g. apex and base), studies have explored the use of prior shape information as statistical shape models,[152,153] and of temporal information in transrectal ultrasound image sequences.[154,155] Although these methods demonstrated good performance, most of these studies included small patient cohorts from a single institution and a single manufacturer, thus providing limited evidence of generalizability across data from other institutions and different imaging devices.

AI models for prostate gland segmentation on MRI and ultrasound have demonstrated promising results, but external validation on large patient cohorts is needed for wide clinical implementation. Moreover, source code and trained models must be shared publicly to derive the maximum benefit from the best-performing approaches.
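For reference, the Dice score quoted above can be computed as follows between a manual and an AI-predicted segmentation mask; the overlapping elliptical toy masks are purely illustrative.

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)

# Toy example: two slightly offset circular masks standing in for a manual
# and an AI-predicted prostate segmentation.
yy, xx = np.mgrid[:128, :128]
manual = ((yy - 64) ** 2 + (xx - 64) ** 2) < 40 ** 2
predicted = ((yy - 60) ** 2 + (xx - 66) ** 2) < 40 ** 2
print(f"Dice = {dice_score(predicted, manual):.3f}")
```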

MRI-ultrasound registration to facilitate MRI-ultrasound fusion biopsies

Registration of pre-operative MRI and intra-operative ultrasound is necessary for guiding MRI-ultrasound fusion biopsy,[156,157] focal therapy,[158,159] and radiotherapy planning on MRI.[160] However, registration of the two imaging modalities, MRI and ultrasound, is complicated by (a) the difference in the underlying MR and ultrasound imaging processes, and (b) the deformation of the prostate between the two imaging procedures. In an attempt to improve registration between the two modalities, several studies used pre-defined corresponding anatomical structures.[156-159,161,162] Some approaches used deformable transformations[161] to model patient movement, the influence of surrounding organs (e.g. bladder and rectum), or interaction with surgical instruments (e.g. biopsy needles and ultrasound probes). Others used AI models without constrained transformation models[156-158] or without prior knowledge in modeling soft tissue motion.[163,164] AI models have also been proposed to learn similarity measures[165] or transformation models, from either biomechanical simulations (which emphasize biologically meaningful registration)[159] or shape populations.[166] A popular class of methods segments the prostate gland on both MR and ultrasound images before registering the resulting point sets[162] (Figure 4). AI models processing point set data[167] for registration often represent corresponding structures without detailed voxel-level correspondence.[156,157,164] An advantage of using point sets is their robustness to scanning protocols, largely thanks to well-established independent segmentation algorithms, and arguably even faster inference.[157] While most of the above methods address registration between 3D MR and 3D transrectal ultrasound images, there have also been advances in aligning 3D MRI to 2D ultrasound, which is much easier to acquire.[168] A summary of the related references in this section is given in Table 5.
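For intuition, the sketch below aligns two corresponding prostate-surface point sets with a simple rigid Procrustes/Kabsch step and reports target registration error (TRE) on landmark pairs. This is a deliberately simplified illustration: the published methods in Table 5 are learned and mostly deformable, and the toy points and known motion here are fabricated for the example.

```python
import numpy as np

def rigid_register(src: np.ndarray, dst: np.ndarray):
    """Kabsch algorithm: find rotation R and translation t minimizing
    ||R @ src + t - dst|| over corresponding (N, 3) point sets."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # avoid reflections
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def tre(R, t, landmarks_src, landmarks_dst):
    """Mean distance between transformed and true landmark positions."""
    mapped = landmarks_src @ R.T + t
    return np.linalg.norm(mapped - landmarks_dst, axis=1).mean()

# Toy example: corresponding gland-surface points under a known translation
# plus a little noise, standing in for MRI and ultrasound segmentations.
rng = np.random.default_rng(1)
mri_pts = rng.normal(size=(200, 3))
us_pts = mri_pts + np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.05, size=(200, 3))
R, t = rigid_register(mri_pts, us_pts)
print(f"TRE = {tre(R, t, mri_pts, us_pts):.3f} (same units as the points)")
```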
Figure 4.

AI can help in supporting tasks for cancer detection, such as prostate gland segmentation on MRI and ultrasound (left) and MRI-ultrasound registration (right). AI-predicted prostate segmentations on MRI and ultrasound can enable automated MRI-ultrasound registration, which aligns the two modalities and maps lesions from MRI onto ultrasound. MRI-ultrasound registration helps guide systematic and targeted fusion biopsy procedures.

Table 5.

AI models for registration between MRI (T2w) and ultrasound (TRUS) images.

Study | Number of subjects | Data type | Approach | Prostate segmentation | Evaluation metric | Source code availability
------|--------------------|-----------|----------|-----------------------|-------------------|-------------------------
Hu et al.[166] | 143 | Retrospective | DL | No | TREs, TDR, RMSE | No
Hu et al.[161,169] | 76 | Retrospective | DL | Yes; manual | TREs, DSC | Yes
Hu et al.[163] | 76 | Retrospective | DL | Yes; manual | TREs, DSC | No
Ghavami et al.[162] | 59 | Retrospective | DL + TML | Yes; DL | DSC, GVE, TREs | No
Hu et al.[158] | 80 | Retrospective | DL | Yes; manual | TREs, DSC | Yes
Haskins et al.[165] | 679 | Retrospective | DL + TML | No | TREs | No
Guo et al.[170] | 679 | Retrospective | DL | Yes; manual | TREs, SRE | Yes
Saeed et al.[159] | 320 | Retrospective | DL | Yes; manual | MAE | No
Baum et al.[156,157] | 108 | Retrospective | DL | Yes; manual | TREs, CD, HD | No
Zeng et al.[171] | 36 | Retrospective | DL | Yes; manual | TREs, DSC | No
Zeng et al.[160] | 36 | Retrospective | DL | Yes; DL | TREs, DSC | No
Song et al.[172] | 528 | Retrospective | DL | Yes; manual | SRE | Yes
Fu et al.[164] | 50 | Retrospective | DL | Yes; DL | TREs, DSC, MSD, HD | No
Guo et al.[168] | 619 | Retrospective | DL | No | TREs, NCC | Yes

AI, artificial intelligence; CD, chamfer distance; DL, deep learning; DSC, Dice similarity coefficient; GVE, gland volume error; HD, Hausdorff distance; MAE, mean absolute error; MRI, magnetic resonance imaging; MSD, mean square distance; NCC, normalized cross-correlation; RMSE, root mean square error; SRE, surface registration error; T2w, T2-weighted MRI; TDR, tumor detection rate; TML, traditional machine learning; TREs, target registration errors; TRUS, transrectal ultrasound.

AI methods have the potential to completely automate the MRI-ultrasound registration task. The best-performing AI-based MRI-ultrasound registration methods achieved average target registration errors of ≈2-3 mm,[161,164] although with relatively large variance. Anatomical information, such as prostate gland segmentations or surface points, helps improve AI performance in MRI-ultrasound registration. However, the studies performed to date share common shortcomings: retrospective design, lack of prospective evaluation, and development and validation using single-institution data. Several MR and ultrasound manufacturers[173,174] have integrated tools to assist with MRI-ultrasound registration, although these are still semi-automated and require human input in real time.
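
Target registration error, the most common metric in Table 5, is simply the post-registration Euclidean distance between corresponding anatomical landmarks; a minimal sketch follows, assuming paired landmark coordinates in millimeters.

```python
import numpy as np

def target_registration_errors(moved_landmarks: np.ndarray,
                               fixed_landmarks: np.ndarray) -> np.ndarray:
    """Euclidean distances (in mm) between corresponding landmarks after
    registration. moved_landmarks are MRI landmarks mapped into ultrasound
    space; fixed_landmarks are the same landmarks identified on ultrasound.
    Both are (N, 3) arrays in millimeters."""
    return np.linalg.norm(moved_landmarks - fixed_landmarks, axis=1)

# Toy example: three landmark pairs.
moved = np.array([[10.0, 22.0, 31.0], [40.5, 18.0, 27.0], [25.0, 30.0, 15.5]])
fixed = np.array([[11.0, 23.0, 30.0], [42.0, 17.0, 28.0], [25.5, 31.0, 16.0]])
tres = target_registration_errors(moved, fixed)
print(f"mean TRE: {tres.mean():.2f} mm")
```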

MRI-histopathology registration for ground truth labeling of cancer detection models

Several AI models for prostate cancer detection on MRI[43,44,48,50,51] derive ground truth labels from whole-mount prostate histopathology images through accurate registration with pre-operative MRI (Table 6). Pathologist labels mapped from histopathology images onto MRI through image registration are considered the most accurate labeling strategy[64] for training AI cancer detection models.
Table 6.

MRI-histopathology registration approaches (not exhaustive) for generating ground truth cancer labels on MRI.

Study | Number of subjects | Pathology type | Registration type | Intermediate modality | Requires 2D slice correspondences | Prostate sectioning | Source code availability
------|--------------------|----------------|-------------------|-----------------------|-----------------------------------|---------------------|-------------------------
Chappelow et al.[175] | 25 | Whole-mount | Traditional automated | None | Yes | Manual | No
Ward et al.[176] | 13 | Whole-mount | Traditional automated | Fiducial markers | Yes | Image-guided | No
Kalavagunta et al.[177] | 35 | Pseudo-whole mount | Traditional automated | Manual landmarks | Yes | Sectioning box | No
Reynolds et al.[178] | 6 | Whole-mount | Traditional automated | Ex vivo MRI + manual landmarks | Yes | Sectioning box | No
Li et al.[179] | 19 | Pseudo-whole mount | Traditional automated | None | Yes | Manual | No
Losnegård et al.[180] | 12 | Whole-mount | Traditional automated | None | No | Manual | No
Wu et al.[181] | 17 | Whole-mount | Traditional automated | Ex vivo MRI + fiducial markers | Yes | 3D-printed mold | No
Rusu et al.[182] | 157 | Whole-mount | Traditional automated | None | Yes | 3D-printed mold | Yes
Shao et al.[183] | 152 | Whole-mount | Deep learning | None | Yes | 3D-printed mold | Yes
Sood et al.[184] | 106 | Whole-mount | Traditional automated | None | No | 3D-printed mold | No
Shao et al.[185] | 183 | Whole-mount | Deep learning | None | Yes | 3D-printed mold | No

MRI, magnetic resonance imaging.

MRI-histopathology registration is performed either cognitively, manually, or automatically. In cognitive approaches,[43,48] researchers mentally project cancer labels from histopathology images onto the corresponding MRI slices without quantitative spatial alignment of the two modalities. Manual registration involves spatially aligning the MRI and histopathology images on a case-by-case basis by human experts. Cognitive and manual registration approaches are labor-intensive, requiring highly skilled experts in both radiology and pathology; as such, these approaches have only been applied to small patient data sets. Moreover, these methods fail to map MRI-invisible or barely visible lesions from histopathology images onto MRI.

In traditional automated approaches, MRI and histopathology images are registered directly using customized image similarity loss functions,[175,179,182,186] fiducial markers,[176] or intermediate ex vivo imaging modalities that facilitate the registration.[181,187] Many of these methods rely on patient-specific 3D-printed molds derived from preoperative MRI[135] to maintain slice correspondences between MRI and histopathology images, while others directly register the MRI and histopathology volumes without the need for MRI-histopathology slice correspondences.[180,188-190] However, the absence of accurate slice correspondences can lead to partial volume artifacts. A recent study applied 3D super-resolution to MRI and histopathology images prior to performing a 3D registration to alleviate partial volume artifacts.[184] Although traditional automated approaches are advantageous over manual and cognitive registration, they are often time-consuming, requiring several minutes to register data from a single patient. Recent deep learning models[183] greatly speed up the registration process. Traditional automated approaches also generally require manual prostate gland segmentation to facilitate the registration; to avoid this step, a recent study proposed a weakly supervised deep learning registration approach that does not require prostate segmentation at inference.[185]

Registration of pre-operative MRI with post-operative histopathology images is complicated by the inherent differences between the two modalities and their acquisition processes. Nonetheless, it is an important task for deriving accurate ground truth labels for cancer detection models on MRI. Although several automated MRI-histopathology registration approaches have been developed, only a few studies on AI-based prostate cancer detection[44,47,50,51] use automated registration methods to derive ground truth labels. There remains a need to publicly share source code, trained models, and benchmarking data sets to compare different registration approaches and to enable their use in deriving accurate cancer labels.
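
As a concrete flavor of the intensity-based similarity measures such automated approaches optimize, below is a minimal sketch of normalized cross-correlation (the NCC metric in Table 5). The cited studies use customized, often more sophisticated loss functions, so this is illustrative only.

```python
import numpy as np

def normalized_cross_correlation(image_a: np.ndarray,
                                 image_b: np.ndarray) -> float:
    """Normalized cross-correlation between two images of identical shape.

    Values near 1 indicate strong linear intensity agreement; registration
    pipelines typically maximize such a similarity measure over the
    parameters of a spatial transform.
    """
    a = image_a.astype(np.float64).ravel()
    b = image_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)

# Toy example: a noisy copy of an image correlates strongly with itself.
rng = np.random.default_rng(0)
img = rng.normal(size=(128, 128))
noisy = img + 0.1 * rng.normal(size=img.shape)
print(f"NCC: {normalized_cross_correlation(img, noisy):.3f}")
```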

Challenges in AI for PCa

AI models have great potential to improve the diagnosis and management of prostate cancer and to bring precision medicine to patients. For example, AI models may enable accurate and timely detection of aggressive cancer, reduce cancer-related deaths, and help avoid unnecessary invasive biopsies and their associated side-effects. In addition, AI models may help streamline supporting tasks for cancer detection that are labor-, time-, and experience-intensive. Despite promising outcomes of AI research in prostate cancer diagnosis, most methods are not ready to be used in clinical care. In the United States, only one prostate cancer detection system, 'Paige Prostate', has received FDA approval for in vitro diagnostic use in detecting cancer on histopathology images of prostate biopsies.[119] Other commercially available FDA-cleared or European CE-marked applications (OnQ Prostate,[137] PROView,[136] Quantib Prostate,[138] qp-Prostate[139]) mostly focus on supporting tasks such as prostate segmentation, volumetry computation, or PSA density calculation. Reducing the gap between academic research and translation of these AI models for diagnostic support in the clinic will require addressing the following challenges:

(a) Limited labeled data: To be robust, generalizable, and unbiased, AI models must be trained and validated with large, accurately labeled data sets that capture variability in patient populations and image acquisition. For example, the AI models for natural image recognition tasks that have achieved performance exceeding humans were trained and validated with ≈14 million images in the publicly available ImageNet data set.[191] However, due to privacy concerns around medical data-sharing, AI models for prostate cancer diagnosis are mostly trained with small data sets, often from a single institution with patient populations of specific socio-economic or racial distributions, or with images acquired with particular scanners and acquisition protocols. AI models trained on a homogeneous patient population or imaging data may not generalize to different demographics or different kinds of scanners.[192] Studies testing the generalizability of AI models for prostate cancer in different racial, socio-economic, or ethnic patient populations are limited, raising questions about the unbiased and robust applicability of these AI models for diagnostic support in the clinic. Development of robust, generalizable, and unbiased AI systems may require consolidated efforts from medical institutions across the globe to enable privacy-protecting medical data sharing. Research on federated learning-[193,194] or incremental learning-based[195] AI models, which improve a shared model by exchanging and constantly updating model parameters rather than sharing data, is another possible solution (see the sketch below). Finally, self-supervised,[68,196] weakly supervised,[67,102] semi-supervised,[197,198] and few-shot[199] learning techniques may be used to further improve the robustness of AI models developed with large data sets that lack accurate labels (e.g. the thousands of prostate MRIs in Picture Archiving and Communication Systems (PACS) that lack annotations about where cancer is located in the images).
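
As a toy illustration of the federated learning idea referenced above, the sketch below performs one round of FedAvg-style weight aggregation; the flat NumPy weight vectors and the weighting by local cohort size are simplifying assumptions for illustration, not a production federated framework.

```python
import numpy as np

def federated_average(local_weights: list[np.ndarray],
                      local_sample_counts: list[int]) -> np.ndarray:
    """Aggregate locally trained model weights (FedAvg-style).

    Each institution trains on its own private images and shares only its
    weight vector; the server returns a weighted average, so no patient
    data ever leaves an institution.
    """
    counts = np.asarray(local_sample_counts, dtype=np.float64)
    stacked = np.stack(local_weights)    # (n_sites, n_params)
    weights = counts / counts.sum()      # weight sites by local data size
    return (weights[:, None] * stacked).sum(axis=0)

# Toy round: three hospitals with different cohort sizes.
site_models = [np.array([0.2, 1.1]), np.array([0.4, 0.9]), np.array([0.1, 1.3])]
site_counts = [500, 1500, 250]
global_model = federated_average(site_models, site_counts)
print(global_model)
```
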
(b) Limited multi-reader studies to assess AI-assisted performance of clinicians: There are relatively few multi-reader studies that assess how AI models perform in comparison to clinicians. Moreover, only a few studies analyze whether the use of AI models can help standardize or improve human-reader performance.[55,104,115,117,200] Such studies would be particularly useful in assessing the potential of AI in resource-limited institutions, or for less-experienced radiologists or pathologists. Multi-reader studies of prostate MRI interpretation[42,55,200,201] in limited patient populations demonstrated the utility of AI in providing diagnostic support to radiologists by improving the sensitivity and positive predictive value of patient-level and lesion-level cancer detection. Multi-reader studies of prostate histopathology image interpretation[103,104,117] demonstrated that AI models performed similarly to highly experienced genitourinary pathologists. Furthermore, AI-assisted pathologists outperformed both the standalone AI system and unassisted pathologists, reducing inter- and intra-pathologist variability in Gleason grading. These multi-reader studies support the benefits of clinician-AI synergy, but the need remains for more extensive and carefully designed studies with larger, varied, multi-center patient populations to fully assess the clinical readiness of AI models, particularly for radiology images. Moreover, research is needed on making the technology user-friendly and ensuring that it does not negatively affect clinical workflow.

(c) Limited prospective evaluations: Most AI models for prostate cancer have been trained and evaluated with retrospective data; only a few have focused on prospective evaluation.[103,121] To deploy AI models for diagnostic support in the clinic, models trained with retrospective data must be evaluated in a prospective setting. Moreover, clinical trials that evaluate AI models for prostate cancer detection on non-invasive imaging need to be designed and conducted.

(d) Lack of standard evaluation criteria: Variability in the evaluation criteria used in existing AI studies makes it difficult to compare the different automated approaches with one another. A unified, clinically relevant evaluation standard is required to identify the best-performing, robust, and unbiased AI systems for clinical deployment. While patient outcomes such as death or recurrence can be considered hard clinical endpoints for evaluation, such long-term outcome data are often unavailable since prostate cancer is a slow-progressing disease. Making the source code and trained models of published studies publicly available helps in testing different approaches on independent data sets and with different evaluation criteria, often without the need to share data. A possible way to encourage participation of the AI community in prostate cancer research, and to enable comparison, validation, and benchmarking of AI models for cancer detection, is the organization of grand challenges. Grand challenges provide large, publicly available data sets for training AI models, and also allow comparison and validation of AI models through well-curated, representative test data sets and defined, clinically relevant evaluation metrics. The Prostate cANcer graDe Assessment (PANDA) challenge[118] was organized with the aim of testing the generalizability and clinical readiness of AI models for Gleason grading on prostate biopsy histopathology images. More than 10,000 histopathology images were made publicly available through the challenge, and 1290 AI developers from 65 countries participated.
On an independent validation set of 2009 biopsies, the AI models generalized across different patient populations, imaging parameters, and reference standards, achieving strong agreement with expert genitourinary pathologists. Results from the PANDA challenge suggest that AI models for histopathology images are robust enough to be implemented in clinical trials.[118] Grand challenges for MR images have included smaller, less diverse patient cohorts. The ProstateX and ProstateX-2 challenges[202] were organized with the aim of developing AI models for cancer detection and aggressiveness characterization on prostate MR images; they included a smaller patient cohort (346 patients) scanned at a single institution (Radboud University Medical Center) using two different Siemens scanners. The PROMISE-12 challenge[134] was organized with the aim of developing AI models that segment the prostate gland; it included 100 patients from four institutions, with different scanners and scanning protocols. The NCI-ISBI 2013 challenge on automated segmentation of prostate structures[203] addressed segmentation of the peripheral zone and central gland in addition to the whole prostate gland; it included 80 patients from two institutions, scanned with two different scanners and scanning protocols. These studies show that AI models for prostate MR interpretation still need development and external validation. Multi-institution collaborations are needed to drive more extensive grand challenges on prostate MRI and ultrasound images. Such grand challenges would encourage AI model development on larger, more heterogeneous patient data sets, and would also allow testing of the generalizability and clinical readiness of AI models.
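
Agreement between AI models and pathologists in Gleason grading studies such as PANDA is commonly reported as a quadratically weighted Cohen's kappa; a minimal sketch using scikit-learn follows, with hypothetical ISUP grade group assignments standing in for real study data.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ISUP grade groups (0 = benign, 1-5 = grade groups) assigned
# to the same biopsies by an AI model and a reference pathologist.
ai_grades = [0, 1, 2, 2, 3, 5, 4, 1, 0, 3]
pathologist_grades = [0, 1, 2, 3, 3, 5, 4, 2, 0, 2]

# Quadratic weighting penalizes large grading disagreements more heavily
# than near-misses, which suits ordinal grade groups.
kappa = cohen_kappa_score(ai_grades, pathologist_grades, weights="quadratic")
print(f"quadratically weighted kappa: {kappa:.3f}")
```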

Limitations of this study

This study has several limitations. First, this is not an exhaustive review. While we curated relevant literature to cover the breadth and depth of the different applications of AI models in prostate cancer diagnosis, the multitude of research publications in the different facets of this field and the limited review space forced us to be selective. Second, we could not provide a comparative analysis of the different methods due to the variability in data sets and evaluation criteria. Third, we did not provide extensive details of the AI models. Our aim in this study was to provide a broad overview of the potential of AI models in prostate cancer care and to make it generally comprehensible to both clinical and technical readers. We described algorithms as belonging to two major categories: traditional machine learning and deep learning-based approaches. Fourth, we did not discuss automated systems that use clinical data, genomic data, or newer imaging modalities such as prostate-specific membrane antigen (PSMA) PET scans. We focused on the three imaging modalities (MRI, ultrasound, and histopathology) most commonly used in routine clinical care and did not cover newer imaging modalities that are showing benefit in prostate cancer detection. One such modality is gallium-68 PSMA-11 PET-CT.[204] Recent studies demonstrate that PSMA PET-CT scans can significantly improve prostate cancer detection and treatment planning, are more accurate than conventional imaging with CT and bone scanning,[205] add value to MRI for diagnosis,[206] and may enable better prediction of pre-operative pathological outcomes than MRI.[207-209] There is an opportunity for AI to address the shortcomings of these new imaging modalities, but owing to the recent FDA approval of PSMA PET, only a few AI models[32,210,211] have been published for prostate cancer detection using it.

Conclusion

AI models for prostate cancer detection on imaging are showing great promise and encouraging performance. Yet, AI models on radiology images need further development and validation on larger, more diverse patient populations, along with standardized evaluation criteria, multi-reader studies, and prospective evaluation, to make them robust, generalizable, and unbiased. AI models on histopathology images have undergone more rigorous experimentation, on larger data sets and with multi-reader evaluations, than those on radiology images. These studies suggest that AI models on histopathology images generalize to different patient populations across the globe and can assist pathologists in the clinic by reducing intra- and inter-pathologist variability in Gleason grading. AI models for supporting tasks have the potential to reduce manual labor and time investments, particularly as more images are acquired and processed daily in clinical care. However, research is needed on best practices for seamlessly integrating AI predictions into the clinical workflow, to enable clinician-AI synergy and precision medicine. AI-enabled precision medicine may eventually help reduce disparities and advance health equity in prostate cancer management.