Clinical evaluation of a computer-aided diagnosis system for determining cancer aggressiveness in prostate MRI.

Geert J S Litjens¹, Jelle O Barentsz², Nico Karssemeijer², Henkjan J Huisman².

Abstract

OBJECTIVES: To investigate the added value of computer-aided diagnosis (CAD) on the diagnostic accuracy of PIRADS reporting and the assessment of cancer aggressiveness.
METHODS: Multi-parametric MRI and histopathological outcome of MR-guided biopsies of a consecutive set of 130 patients were included. All cases were prospectively PIRADS reported and the reported lesions underwent CAD analysis. Logistic regression combined the CAD prediction and radiologist PIRADS score into a combination score. Receiver-operating characteristic (ROC) analysis and Spearman's correlation coefficient were used to assess the diagnostic accuracy and correlation to cancer grade. Evaluation was performed for discriminating benign lesions from cancer and for discriminating indolent from aggressive lesions.
RESULTS: In total 141 lesions (107 patients) were included for final analysis. The area-under-the-ROC-curve of the combination score was higher than for the PIRADS score of the radiologist (benign vs. cancer, 0.88 vs. 0.81, p = 0.013 and indolent vs. aggressive, 0.88 vs. 0.78, p < 0.01). The combination score correlated significantly stronger with cancer grade (0.69, p = 0.0014) than the individual CAD system or radiologist (0.54 and 0.58).
CONCLUSIONS: Combining CAD prediction and PIRADS into a combination score has the potential to improve diagnostic accuracy. Furthermore, such a combination score has a strong correlation with cancer grade.
KEY POINTS:
• Computer-aided diagnosis helps radiologists discriminate benign findings from cancer in prostate MRI.
• Combining PIRADS and computer-aided diagnosis improves differentiation between indolent and aggressive cancer.
• Adding computer-aided diagnosis to PIRADS increases the correlation coefficient with respect to cancer grade.

Keywords: Computer-aided diagnosis; Diagnostic performance; Magnetic resonance imaging; Observer study; Prostate cancer

Substances：
Contrast Media

Year: 2015 PMID： 26060063 PMCID： PMC4595541 DOI： 10.1007/s00330-015-3743-y

Source DB: PubMed Journal: Eur Radiol ISSN： 0938-7994 Impact factor: 5.315

Introduction

Multi-parametric magnetic resonance imaging (mpMRI) is emerging as an important modality in prostate cancer diagnosis [1-3]. Several studies have shown that in patients with initial negative trans-rectal ultrasound-guided biopsies (TRUSGB) and persistently elevated prostate-specific antigen (PSA), expert readers using mpMRI find cancer in 38–59 % of cases [4, 5]. Furthermore, mpMRI has been shown to correctly upgrade TRUSGB-detected cancers in up to 30 % of cases [6]. Several other studies found that the negative predictive value of mpMRI is high enough to avoid TRUSGB in 30–50 % of men with persistently elevated PSA [7, 8]. However, one of the main limitations for broader acceptance of mpMRI is the lack of required expertise, especially in the acquisition and interpretation of the MR images [1, 9, 10]. To improve both, the European Society for Urogenital Radiology (ESUR) established initial guidelines for acquisition and standardized interpretation of mpMRI (PIRADS) [1]. These guidelines have been evaluated by several groups, for detection of cancer both prior to biopsy [8, 11] and after initial negative TRUSGB [12-15]. There are, however, still two major issues in current prostate MRI: determining which cancers need treatment (assessment of aggressiveness) and the large number of false positives resulting in unnecessary biopsies. Computer-aided diagnosis (CAD) might be able to address these problems.

The purpose of this study is twofold. First, we provide a clinical investigation of the effect of CAD [16] on the diagnostic accuracy of prostate mpMRI reporting via independent combination of PIRADS scores and CAD prediction into a radiologist/CAD combination score. The performance of this combination score is evaluated in a comparatively large cohort of patients with MR-guided MR-biopsy histopathological outcome as reference standard. Second, we investigate the ability of CAD to estimate prostate cancer aggressiveness.

Materials and methods

Patient data

An institutional review board (IRB) waiver applies to this study as it uses anonymized imaging data and MR-guided biopsy results obtained through regular clinical care. In total 130 consecutive patients from 1 January to 1 September 2013 who received both an mpMRI and a subsequent MR-guided MR-biopsy at our institution were included. The inclusion criteria for the detection mpMRI were an initial negative TRUSGB and persistently elevated PSA (consistently above 4 ng/mL). Multi-parametric MRIs were acquired according to the ESUR guidelines and included T2-weighted imaging in three orthogonal directions, diffusion-weighted imaging and dynamic contrast-enhanced imaging. All MRIs were performed at a Siemens 3 T MRI scanner (TRIOTIM or Skyra) without an endo-rectal coil. Full acquisition details are presented in Table 1.

Table 1

MRI sequence details for the different types of acquisitions

Sequence	SN	SR	ST	AM	FOV	ET	RT	FA	SS
T2W	Turbo spin-echo	0.28–0.6 mm	3.0–3.2 mm	320 × 320 – 384 × 384	108 × 108 – 192 × 192 mm	101–104 ms	4480–6840 ms	120–160°	Acquired in three orthogonal directions: transversal, sagittal and coronal
DWI	Echo planar	2 mm	3 mm	128 × 128	256 × 256 mm	63–81 ms	2800–3600 ms	90°	3 b-values: 50, 400–500, 800 averaged over three directions. Apparent diffusion coefficient map calculated by the scanner software
DCE	Fast low-angle shot spoiled gradient recalled echo	1.5–1.8 mm	3.2–5 mm	128 × 128	192 × 192 – 230 × 230 mm	1.41 ms	36 ms	10–14°	Temporal resolution of 3.38–4.65 seconds, 36–50 timepoints. 15 mL contrast agent used (Dotarem, Guerbet, France)

SN = sequence name, SR = spatial resolution, ST = slice thickness, AM = acquisition matrix, FOV = Field of View, ET = echo time, RT = repetition time, FA = flip angle, SS = sequence specific details

Each mpMRI was prospectively read, as part of regular clinical care, by one of a group of seven radiologists who reported prostate MRI in our clinic. Experience levels of the reporting radiologists ranged from moderately experienced (2 years) to very experienced (J.B., 20 years). The number of cases read by each radiologist is presented in Table 2. The ESUR prostate imaging reporting and data system (PIRADS) classification was used to assign a five-point PIRADS score to one or more lesions.

Table 2

Overview of the radiologists reading cases in the study cohort, including the number of cases read (out of 107 included studies) and years of experience

Reader	Years of experience	Cases read
J.B.	20	25
J.F.	12	14
P.Z.	8	16
S.J.	3	20
M. vd. L	2	22
R.M.	2	5
J.H.	2	5

Each MR study was reported using a dedicated prostate MR workstation that allowed radiologists to indicate one or more areas of suspicion with a sphere enclosing the lesion. If no suspicious areas could be identified, a location deemed normal/benign was marked and assigned a PIRADS score of 1 or 2. This is done in routine clinical care for accountability, to ensure that each case has been read and reported. Typically, PIRADS 1 or 2 lesions are not biopsied. Occasionally a PIRADS 2 lesion was biopsied when a PIRADS 3 or higher lesion had also been identified and the patient was thus already scheduled for MR-guided MR biopsy. The locations and scores were automatically recorded in a database.

MR-guided biopsies were performed by medical experts with multiple years of experience in MR-guided prostate biopsies. At the start of the biopsy procedure a T2-weighted volume and an ADC map were acquired according to the prostate cancer detection protocol (Table 1). These sequences were used to relocate the lesions identified in the prior detection MRI. After the lesions had been relocated, a needle guide was inserted transrectally. Consecutive sagittal and transversal MRIs were made during repositioning of the needle guide to assess whether the correct position had been reached. Once it had, a biopsy needle was inserted and a biopsy taken. To verify the biopsy location, sagittal and transversal images were made with the needle in situ. Subsequently, biopsies were histopathologically processed, inspected and graded by an experienced uropathologist (17 years of experience in prostate pathology).

Computer-aided diagnosis system

The computer-aided diagnosis system evaluated in this paper was previously presented in [16]. First, the system computes quantitative voxel features, which were designed to capture characteristics described by the PIRADS guidelines. A full feature listing can be found in Table 3. These voxel features are then fed to a random forest classifier trained to determine a continuous likelihood score for each voxel to identify cancer, resulting in a likelihood image. Subsequently, in a second stage, the system used the centre of the sphere indicated by the radiologist as a starting point for lesion segmentation, which is performed on the pre-computed likelihood image. After lesion segmentation histogram statistics are calculated on the voxel features within the lesion (e.g., percentiles, mean, standard deviation). In addition, local contrast is calculated by comparing the voxel feature values within the lesion to values outside the lesion. Symmetry is calculated by comparing the feature values within the lesion to the feature values at the same relative position on the contra-lateral side of the prostate. The statistical, local contrast and symmetry features are then combined using a second random forest classifier trained to predict cancer likelihood per lesion. The system is able to take into account the zonal location of the lesion via the use of a probabilistic segmentation of the prostate zones as one of its features (Table 3). The construction of this probabilistic segmentation is detailed in [17].
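The lesion-level features described above (histogram statistics, local contrast and symmetry) can be illustrated with a minimal sketch for a single voxel feature such as ADC. This is not the authors' implementation: the function names, the use of mean differences for contrast and symmetry, and the example values are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def lesion_features(inside, outside, contralateral):
    """Summarize one voxel feature for a segmented lesion.

    inside:        feature values of voxels within the lesion
    outside:       feature values of voxels surrounding the lesion
    contralateral: feature values at the mirrored position in the prostate
    Contrast and symmetry are taken as mean differences (an assumption).
    """
    return {
        "mean": mean(inside),
        "std": pstdev(inside),
        "p25": percentile(inside, 25),
        "p75": percentile(inside, 75),
        "local_contrast": mean(inside) - mean(outside),
        "symmetry": mean(inside) - mean(contralateral),
    }

# Made-up ADC-like values: the lesion is darker than its surroundings
# and darker than the mirrored region on the other side.
features = lesion_features(
    inside=[700, 750, 800, 850],
    outside=[1200, 1300, 1250, 1350],
    contralateral=[1100, 1150, 1200, 1250],
)
```

In the system described in the text, feature vectors of this kind, computed for every voxel feature in Table 3, are the input to the second random forest classifier.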

Table 3

Descriptions of the voxel features used in the computer-aided diagnosis system

Name	Type	Description
T2W	Intensity	T2-weighted voxel grey value, related to voxel T2
ADC	Intensity	Apparent diffusion coefficient, measure for cellular density
b800	Intensity	High b-value image, areas with low diffusivity appear bright
T2-map	Intensity	Calculated T2-map based on proton density and transversal T2W image [17]
x-pos	Anatomical	Relative cumulative position within the prostate mask between 0 and 1 in the x-direction
y-pos	Anatomical	Relative cumulative position within the prostate mask between 0 and 1 in the y-direction
z-pos	Anatomical	Relative cumulative position within the prostate mask between 0 and 1 in the z-direction
Distance	Anatomical	Relative distance to the prostate boundary between 0 and 1
PZ	Likelihood	Anatomical likelihood of being a peripheral zone voxel between 0 and 1 [17]
Ktrans	Pharmacokinetic	Pharmacokinetic parameter, related to vessel permeability
kep	Pharmacokinetic	Pharmacokinetic parameter, related to permeability and extracellular volume
tau	Pharmacokinetic	Dynamic parameter, related to the time-to-peak of contrast agent concentration
LateWash	Pharmacokinetic	Dynamic parameter, related to the washout of contrast agent
Gaussian texture bank	Texture	Calculate multi-scale Gaussian derivatives on the T2W image
ADC	Spatial filter	Multi-scale focal lesion detection using the Li spatial filter [27] on ADC map
Ktrans	Spatial filter	Multi-scale focal lesion detection using the Li spatial filter [27] on Ktrans map
LateWash	Spatial filter	Multi-scale focal lesion detection using the Li spatial filter [27] on LateWash map
tau	Spatial filter	Multi-scale focal lesion detection using the Li spatial filter [27] on tau map

The CAD system was trained with independent, retrospective patient data (237 patients), which had no overlap with the data set used in this study. The retrospective data was acquired in a similar manner (same MRI protocol) as the evaluation data and also had MR-guided biopsy as the reference standard.

Combination of PIRADS score and computer-aided diagnosis (CAD) likelihood

The use of the system as proposed in this paper is presented schematically in Fig. 1. The initial identification of potential suspicious regions was performed by the radiologist, after which the radiologist and the CAD system gave independent scores on whether clinically significant cancer was present [1]. The radiologist did this by assigning a five-point PIRADS score, while the CAD system assigned a continuous likelihood score between 0 and 1.

Fig. 1

Suggested workflow for the proposed computer-aided diagnosis (CAD) system. The biopsy decision can be made by the radiologist, another attending clinician or by using the combination score to independently combine the PIRADS score and the CAD likelihood

The reported scores of the radiologists (PIRADS) and CAD (likelihoods) were combined into a combination score via logistic regression, a technique that maps multiple variables to a single continuous outcome variable (between 0.0 and 1.0) in an independent manner. The regression model was created with SPSS (version 20.0.01, Chicago, IL, USA). The logistic regression was fitted on the retrospective data that was also used to train the CAD system, so that no bias could arise from training and testing on the same data; it was thus independent of the evaluation data used in this paper. Alternative methods of incorporating CAD results may be used in clinical workflow, such as asking a second radiologist to make a final decision based on the two scores, but these were not investigated in this paper.

Statistical evaluation

Radiologist-identified lesions were categorized into either benign or cancer based on the MR-guided MR biopsy outcome. Cancerous lesions were further subdivided into low-grade, intermediate-grade or high-grade cancer based on the MR-guided biopsy Gleason scores, similar to Vos et al. [18] and Hambrock et al. [19]. Our high-sensitivity MR-guided biopsy strategy has been shown to have a concordance of 95 % with prostatectomy Gleason grade [20]. We used two different settings for evaluation in this study: either benign versus cancerous or indolent versus aggressive lesions. In the latter case the benign and low-grade lesions are considered indolent and intermediate- and high-grade lesions are considered aggressive. These settings are summarized in Table 4. The CAD system and logistic regression model were constructed separately for each setting using the retrospective data.

Table 4

Mapping of Gleason scores to cancer grade

Gleason scores	Grade	Category
None	Benign	Indolent
3 + 3 or lower, no 4 or 5 component	Low-grade	Indolent
2 + 4, 3 + 4, 2 + 5	Intermediate-grade	Aggressive
3 + 5, any cancer with a major 4 or 5 component	High-grade	Aggressive
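
The mapping in Table 4 can be written as a small lookup function. The sketch below is one reading of the table, not code from the study: the function names are hypothetical, and Gleason patterns not listed in the table (for example a primary pattern of 1) simply fall through the same rules, which is an assumption.

```python
def gleason_to_grade(primary=None, secondary=None):
    """Map a biopsy Gleason score (primary + secondary) to a cancer grade.

    No Gleason score (benign biopsy) maps to "benign".
    """
    if primary is None:
        return "benign"
    # 3 + 5, or any cancer with a major (primary) 4 or 5 component
    if primary >= 4 or (primary, secondary) == (3, 5):
        return "high"
    # 2 + 4, 3 + 4, 2 + 5: minor 4 or 5 component
    if secondary >= 4:
        return "intermediate"
    # 3 + 3 or lower, no 4 or 5 component
    return "low"

def is_aggressive(primary=None, secondary=None):
    """Indolent vs. aggressive setting: intermediate and high grade are aggressive."""
    return gleason_to_grade(primary, secondary) in ("intermediate", "high")
```

For example, `gleason_to_grade(3, 4)` gives `"intermediate"` and `is_aggressive(3, 3)` gives `False`, matching the rows of the table.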

The statistical evaluation consisted of three parts. First, we investigated the hypothesized increase in predictive power of the combination score over the radiologist PIRADS score alone using the likelihood ratio test on the logistic regression models.

Second, the diagnostic performance of the CAD system, the radiologist PIRADS score and the combination score was evaluated using receiver-operating characteristic (ROC) analysis for both evaluation strategies. The significance of improvement in area under the ROC curve, and in the sensitivity-specificity pairs at the different PIRADS thresholds, was tested using bootstrapping. A total of 10,000 bootstrap samples was used to obtain the 95 % confidence intervals (CIs). Bootstrapping was stratified according to patient to circumvent bias introduced by multiple lesions per patient. To assess the effect of the zonal location of the lesions on the performance of the radiologist, the CAD system and the combination score, the dataset was split into two sets, one containing only central gland lesions and one containing only peripheral zone lesions. The effect of observer experience on the performance of the combination score was also assessed. The dataset was split into two groups, one containing the cases reported by the experienced radiologists (more than 5 years) and one containing the cases reported by the less experienced radiologists (less than 5 years, but more than 2 years).

Third, we correlated radiologist PIRADS, CAD score and the combined score to cancer grade. As cancer grade is an ordinal variable, Spearman’s rank correlation coefficient was used. The significance of differences in correlation coefficients was tested using Steiger’s z-test for dependent correlation coefficients [21].

For all significance tests a p-value threshold of 0.05 was chosen. SPSS (SPSS, version 20.0.01) and in-house developed tools for bootstrapping were used for all statistical analysis.
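
The in-house bootstrapping tools are not described in detail, so the following is only a sketch of how a patient-level bootstrap of the AUC difference could look. It assumes that "stratified according to patient" means resampling whole patients with replacement so that all lesions of a patient stay together; the data layout, function names, percentile confidence interval and one-sided p-value convention are likewise assumptions.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_gain(lesions, n_boot=10000, seed=0):
    """Bootstrap the AUC gain of the combination score over PIRADS alone.

    lesions: list of (patient_id, label, pirads, combined) tuples, label 0 or 1.
    The data must contain both classes. Returns (ci_lower, ci_upper, p_value).
    """
    rng = random.Random(seed)
    by_patient = {}
    for pid, label, pirads, combined in lesions:
        by_patient.setdefault(pid, []).append((label, pirads, combined))
    ids = sorted(by_patient)
    gains = []
    while len(gains) < n_boot:
        # Resample patients, keeping each patient's lesions together.
        sample = [row for pid in rng.choices(ids, k=len(ids))
                  for row in by_patient[pid]]
        labels = [row[0] for row in sample]
        if len(set(labels)) < 2:
            continue  # AUC is undefined without both classes
        gain = auc([row[2] for row in sample], labels) \
            - auc([row[1] for row in sample], labels)
        gains.append(gain)
    gains.sort()
    lower = gains[int(0.025 * n_boot)]
    upper = gains[int(0.975 * n_boot) - 1]
    # One-sided p-value: share of resamples in which the score did not help.
    p_value = sum(g <= 0 for g in gains) / n_boot
    return lower, upper, p_value
```

The same resampling loop can be reused for sensitivity and specificity at a fixed threshold by swapping the statistic computed inside the loop.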

Results

Of the initially included 130 patients, 23 were excluded, 18 due to previous treatment for prostate cancer, two for failed diffusion-weighted imaging, two because they did not undergo dynamic contrast-enhanced imaging and one patient because no biopsy was taken during the biopsy session. The 107 included patients had a median age and PSA level of 66 years (range 48–83) and 13 ng/mL (range 1–56), respectively, which is similar to other studies using patient data with the same inclusion criteria (initial negative TRUSGB and persistently elevated PSA) [12-15]. Further details are summarized in Fig. 2 and Table 5.

Fig. 2

STARD diagram of inclusion and exclusion criteria of the prospective patient cohort

Table 5

Characteristics of patients and biopsy specimens for the prospective cohort used to evaluate the potential added value of a computer-aided diagnosis system for the assessment of prostate cancer. For each group of lesions the numbers between brackets indicate the number of lesions in the peripheral zone and the central gland, respectively

Number of patients	107
PSA level, ng/ml, median (range)	13 (1–56)
Age, y, median (range)	66 (48–83)
Percentage of cancer per core, median (range)	50 (7–100)

Gleason score	Grade	No. of lesions (PZ/CG)
Normal/Benign	Normal/Benign	45 (28/17)
2 + 5	Intermediate	1 (0/1)
3 + 2	Low	2 (0/2)
3 + 3	Low	26 (10/16)
3 + 4	Intermediate	36 (16/20)
4 + 3	High	12 (7/5)
4 + 4	High	5 (2/3)
4 + 5	High	10 (4/6)
5 + 4	High	3 (2/1)
5 + 5	High	1 (0/1)

Totals per grade	No. of lesions (PZ/CG)
All lesions	141 (69/72)
No cancer	45 (28/17)
Low	28 (10/18)
Intermediate	37 (16/21)
High	31 (15/16)

In total 141 suspicious regions were identified in these patients. All these regions were biopsied under MR-guidance. Of these regions, 68 % were positive and 32 % were negative for prostate cancer. The zonal distribution of the lesions was almost equal, with 69 regions located in the peripheral zone and 72 in the central gland. Gleason scores of the cancerous lesions were 2 + 5 (1 %), 3 + 2 (2 %), 3 + 3 (27 %), 3 + 4 (38 %), 4 + 3 (13 %), 4 + 4 (5 %), 4 + 5 (10 %), 5 + 4 (3 %) and 5 + 5 (1 %). Further details about the distribution of the lesion grades can be found in Table 5.

The effect of CAD on radiologist performance

First, the logistic regression procedure showed that including the CAD system likelihood in addition to the radiologist PIRADS score resulted in a model with significantly improved predictive power (p < 0.001, likelihood ratio test) for both evaluation settings (benign vs. cancer and indolent vs. aggressive). Using the obtained regression coefficients we created a weighted combination score: for the benign versus cancer setting and for the indolent versus aggressive setting. C is the CAD system likelihood (ranging from 0 to 1) and P is the radiologist PIRADS score (ranging from 1 to 5). The regression models are visually represented in Fig. 3.
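
The general form of such a combination score is a logistic function of a weighted sum of C and P. The sketch below shows that form only: the coefficient values are placeholders chosen for illustration and are not the regression coefficients obtained in the study, and the function name is hypothetical.

```python
import math

def combination_score(cad_likelihood, pirads, b0, b_cad, b_pirads):
    """Logistic combination of CAD likelihood C (0 to 1) and PIRADS score P (1 to 5).

    Returns a value between 0 and 1.
    """
    z = b0 + b_cad * cad_likelihood + b_pirads * pirads
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder coefficients for illustration only (NOT the fitted values).
HYPOTHETICAL = dict(b0=-4.0, b_cad=3.0, b_pirads=0.8)

low = combination_score(0.1, 2, **HYPOTHETICAL)   # low CAD likelihood, PIRADS 2
high = combination_score(0.9, 5, **HYPOTHETICAL)  # high CAD likelihood, PIRADS 5
```

With positive weights on both inputs the score rises with either the CAD likelihood or the PIRADS score, which is the behaviour the contour plots in Fig. 3 depict.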

Fig. 3

Visual depictions of the regression models to generate the combination score of the radiologist and the computer-aided diagnosis (CAD) system: (a) shows the model for the benign vs. cancer setting, (b) for the indolent vs. aggressive setting. The likelihood of cancer is indicated by the colour coding and the contour labels and ranges from 0 to 1. Green indicates low likelihood and red indicates high likelihood

The ROC analyses showed a significant increase (p < 0.05) in area under the ROC curve from 0.81 to 0.88 in the benign versus cancer setting and from 0.78 to 0.87 in the indolent versus aggressive setting when using the combination score versus only PIRADS (Fig. 4a, b and Table 6). Furthermore, this increase is not affected by the zonal location of the lesion under investigation (Fig. 4c, d, e and f and Table 7). The ROC analysis also shows that radiologists and CAD have a comparable diagnostic accuracy. Table 6 also includes the increases in sensitivity and specificity obtained at specific PIRADS scores when using the combination score. For example, at PIRADS 4 without using CAD a sensitivity of 0.93 is attainable at a specificity of 0.37 in the indolent versus aggressive setting. However, when combining the PIRADS score with the CAD score we obtain a significantly increased sensitivity of 0.98 at a significantly increased specificity of 0.59 (p < 0.05).

Lastly, we show that both less experienced and experienced readers can improve their performance by using CAD. In the indolent versus aggressive evaluation setting both groups improve significantly when using CAD, with an increase in AUC from 0.76 to 0.85 for less experienced and from 0.78 to 0.87 for experienced readers (Fig. 5, Table 8). In the benign versus cancer setting both groups also improve, but only the less experienced readers significantly (AUC from 0.79 to 0.89).
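
A sensitivity-specificity pair at a given PIRADS threshold, as reported in Table 6, follows from counting lesions at or above the threshold. The sketch below uses made-up scores and labels and a hypothetical function name; it treats a lesion as test-positive when its score is greater than or equal to the threshold, which is an assumption about the thresholding convention.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive.

    labels: 1 for the positive class (e.g. aggressive), 0 for the negative class.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up example: six lesions with PIRADS scores and biopsy outcome.
pirads = [2, 3, 4, 4, 5, 5]
labels = [0, 0, 0, 1, 1, 1]
at_4 = sensitivity_specificity(pirads, labels, 4)
at_5 = sensitivity_specificity(pirads, labels, 5)
```

Raising the threshold from 4 to 5 in this toy example trades sensitivity for specificity, the same trade-off visible down the rows of Table 6.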

Fig. 4

Table 6

Sensitivity-specificity pairs and area under the receiver operating characteristic (ROC) curve for the radiologist and the computer-aided diagnosis (CAD)-radiologist combination including 95 % confidence intervals (CIs) and p-values determined by bootstrapping. Significant p-values (p < 0.05) are indicated in bold. The first part of the table contains the results for the benign vs. cancer evaluation setting whereas the second part of the table contains the results for the indolent vs. aggressive evaluation setting

CAD+radiologist					Radiologist
Benign vs. cancer	Sensitivities, mean (95 % CI)	p-value	Specificities, mean (95 % CI)	p-value	Sensitivities, mean (95 % CI)	Specificities, mean (95 % CI)
PIRADS 2	1.0 (1.0–1.0)	1	0.25 (0.0–0.44)	0.008	1.0 (1.0–1.0)	0.12 (0.0–0.27)
PIRADS 3	0.99 (0.99–1.0)	0.49	0.30 (0.17–0.45)	0.02	0.99 (0.98–1.0)	0.15 (0.04–0.28)
PIRADS 4	0.92 (0.84–0.98)	0.44	0.50 (0.26–0.72)	0.48	0.91 (0.81–0.97)	0.49 (0.30–0.67)
PIRADS 5	0.76 (0.61–0.88)	0.08	0.90 (0.77–0.98)	0.098	0.62 (0.38–0.79)	0.81 (0.67–0.92)
AUC	0.878 (0.824–0.928)	0.013			0.808 (0.728–0.880)
CAD+radiologist					Radiologist
Indolent vs. aggressive	Sensitivities, mean (95 % CI)	p-value	Specificities, mean (95 % CI)	p-value	Sensitivities, mean (95 % CI)	Specificities, mean (95 % CI)
PIRADS 2	1 (1–1)	1	0.259 (0.00–0.604)	0.023	1 (1–1)	0.094 (0.0–0.185)
PIRADS 3	0.99 (0.98–1.0)	0.51	0.259 (0.00–0.604)	0.023	0.997 (0.983–1.0)	0.094 (0.0–0.185)
PIRADS 4	0.98 (0.94–1.0)	0.029	0.585 (0.379–0.763)	0.013	0.934 (0.861–0.98)	0.366 (0.200–0.536)
PIRADS 5	0.82 (0.68–0.96)	0.09	0.78 (0.64–0.90)	0.105	0.731 (0.523–0.873)	0.707 (0.534–0.833)
AUC	0.874 (0.813–0.927)	0.001			0.779 (0.701–0.848)

Table 7

Receiver operating characteristic (ROC) analysis differentiated with respect to the zonal location of the lesions. Area under the ROC curve is reported for both the benign vs. cancer and indolent vs. aggressive evaluation settings. P-values measuring whether the increase in area under the ROC curve is significant when using computer-aided diagnosis (CAD) were calculated using bootstrapping. Significant p-values are indicated in bold

Area under the ROC curve	Radiologist (PZ)	Radiologist (CG)	CAD (PZ)	CAD (CG)	Combined score (PZ)	p-value	Combined score (CG)	p-value
Benign vs. cancer	0.81 (0.70–0.90)	0.83 (0.72–0.92)	0.79 (0.67–0.89)	0.76 (0.62–0.87)	0.88 (0.80–0.94)	0.04	0.87 (0.78–0.95)	0.15
Indolent vs. aggressive	0.79 (0.69–0.88)	0.77 (0.66–0.87)	0.80 (0.70–0.89)	0.81 (0.69–0.91)	0.87 (0.79–0.93)	0.002	0.89 (0.80–0.95)	0.002

Fig. 5

Receiver-operating characteristic (ROC) curve showing the performance of the combined score vs. the radiologist alone with respect to the level of experience. The shaded areas indicate the 95 % confidence intervals (CIs) as calculated using bootstrapping. The radiologist performance is indicated with points for the different PIRADS thresholds. The vertical error bars indicate the 95 % CIs on the sensitivity and the horizontal error bars indicate the 95 % CIs on the specificity, as estimated by bootstrapping. a is the result of the benign vs. cancer evaluation setting, b is the result of the indolent vs. aggressive setting

Table 8

Receiver operating characteristic (ROC) analysis comparing inexperienced and experienced readers (less or more than 8 years of experience with prostate MRI) when using computer-aided diagnosis (CAD). Area under the ROC curve including 95 % confidence intervals (CIs) are reported for both the benign vs. cancer and indolent vs. aggressive evaluation settings. P-values measuring whether the increase in area under the ROC curve is significant when using CAD were calculated using bootstrapping. Significant p-values are indicated in bold

Area under the ROC curve	Less experienced readers	Experienced readers	Combined score of inexperienced readers+CAD	p-value	Combined score of experienced readers+CAD	p-value
Benign vs. cancer	0.79 (0.69–0.90)	0.82 (0.71–0.92)	0.89 (0.82–0.96)	0.004	0.86 (0.77–0.94)	0.25
Indolent vs. aggressive	0.76 (0.66–0.86)	0.78 (0.70–0.85)	0.85 (0.76–0.93)	0.006	0.87 (0.81–0.93)	0.001

Caption of Fig. 4: Receiver-operating characteristic (ROC) curve showing the performance of the computer-aided diagnosis (CAD) system (orange) and the radiologist/CAD-system combination (blue). The shaded areas indicate the 95 % confidence intervals (CIs) as calculated using bootstrapping. The radiologist performance is indicated with points for the different PIRADS thresholds. The vertical error bars indicate the 95 % CIs on the sensitivity and the horizontal error bars indicate the 95 % CIs on the specificity, as estimated by bootstrapping. a, c and e are the results of the benign versus cancer evaluation setting, b, d and f are the results of the indolent versus aggressive setting. a and b show the results over all lesions, c and d only the peripheral zone lesions and e and f only the central gland lesions

Correlation of likelihood and cancer grade

Both CAD likelihood and PIRADS score correlate significantly with cancer grade, but the combination score shows the strongest correlation. This is confirmed when assessing the correlation coefficients. In the benign versus cancer setting, correlation was 0.534, 0.582 and 0.684 for CAD, radiologist and combination, respectively. In the indolent versus aggressive setting the correlation coefficient was 0.536, 0.582 and 0.694 for CAD, radiologist and combination, respectively. The increase in correlation when using the combination score instead of just PIRADS or CAD was significant (p < 0.01). The ability of each of the three scores to predict aggressiveness is visualized in Fig. 6 for both evaluation settings.
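
Spearman's rank correlation coefficient, as used here, is the Pearson correlation of the rank-transformed variables, with tied values sharing their average rank. The sketch below is a plain-Python illustration rather than the SPSS procedure used in the study; encoding cancer grade as benign = 0, low = 1, intermediate = 2, high = 3 is an assumption, and the example scores are made up.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: scores per lesion against an ordinal grade encoding.
grades = [0, 0, 1, 2, 2, 3]
scores = [0.10, 0.30, 0.25, 0.60, 0.55, 0.90]
rho = spearman(scores, grades)
```

Because cancer grade takes only four values, ties are unavoidable, which is why the average-rank handling matters for this analysis.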

Fig. 6

Relationship between computer-aided diagnosis (CAD) system likelihood and cancer grade presented by box-plots. One can observe a positive correlation between cancer grade and CAD system likelihood. a is the result of the benign vs. cancer evaluation setting, b is the result of the indolent vs. aggressive settings

Discussion

A recently developed CAD system for the computerized analysis of prostate MR was shown to have a diagnostic accuracy similar to that of well-trained prostate MR radiologists. When combined with the PIRADS score into a combination score, diagnostic performance improved significantly. It is important to note that the CAD system was used in a regular clinical practice setting on a large cohort. This provides further evidence of the benefit of CAD in helping improve diagnostic accuracy.

The CAD system score showed a significant correlation (0.54) with cancer grade, similar to the PIRADS score (0.58). A significantly higher correlation (0.69) was obtained by using the combination score. This correlation coefficient is the highest currently reported in the literature [18, 22]. A noticeable difference with other multivariate aggressiveness correlation studies is that this study attains the correlation in a setting with radiologist-indicated regions instead of pathology pre-defined regions of interest, which is more similar to regular clinical practice.

Although the performance of the radiologist and the CAD system was similar, they provide complementary information, as the combination score results in an improved ROC curve (blue curve, Fig. 4a, b). In both evaluation settings (benign vs. cancer and indolent vs. aggressive) the area under the ROC curve increased significantly (0.81 to 0.88, p = 0.013 and 0.78 to 0.87, p = 0.001, respectively). If we compare these results to those found in the literature, we observe similar values for performance of readers with CAD; however, the PIRADS performance is somewhat lower in our study (0.84 to 0.87 in [23], 0.85 to 0.91 in [24]). We believe these differences to be caused by the difference in reading setting and the fact that we used a substantially larger and more difficult cohort. Our study used prospective clinical reading, whereas these previous studies used a retrospective batch reading setting. Furthermore, previous studies were limited to patients scheduled for radical prostatectomy, which differs from the regular clinical population used in this study.

We also showed that CAD can improve the identification of clinically significant disease for both experienced and less experienced readers (increase of 0.09 in AUC for both groups). However, when discriminating any cancer from benign lesions, only less experienced readers improved significantly, indicating that CAD might be especially helpful in identifying clinically significant disease. Note that a direct comparison between the performance of less experienced and experienced readers cannot be made in this study, as the two groups reported on different patients. However, we do not expect results to change as both groups evaluated a sufficiently large and similar subset of cases.

The added value of CAD did not seem to depend on the zonal location of the lesions. The increase in performance when using CAD was similar for peripheral zone and central gland lesions. The CAD system performed equally well on peripheral zone and central gland lesions, indicating that it has successfully learned how to take into account the zonal characteristics of the lesions.

CAD could possibly help shift the biopsy threshold from PIRADS 3 to PIRADS 4, which would lead to a significant reduction of MR-guided biopsies. At a PIRADS threshold of 3, CAD significantly increased the specificity in both evaluation settings (Table 1; 0.15 to 0.30, p = 0.020 and 0.09 to 0.26, p = 0.023). At the PIRADS 4 threshold a significant increase in sensitivity and specificity was found (0.93 to 0.98, p = 0.029 and 0.37 to 0.59, p = 0.013, respectively) in the indolent versus aggressive setting. The latter indicates that by using CAD and increasing the biopsy threshold from PIRADS 3 to 4 almost no loss in detection of aggressive lesions occurs (sensitivity 0.99 vs. 0.98), while a specificity improvement is obtained (0.59 vs. 0.26), reducing unnecessary biopsies. Of course this has to be investigated further in future clinical studies.

One important aspect relevant to patient prognosis was not assessed by the CAD system: the presence or absence of extracapsular extension. Being able to identify this aspect would further enhance the applicability of the CAD system.

This study has some limitations. First, MR-guided biopsy has a 95 % concordance with prostatectomy Gleason grade for Gleason 4 and 5 components [20, 25] and has been shown to be able to detect clinically significant cancer in men with previous negative TRUS biopsies [26]. Although in general concordance rates with prostatectomy Gleason grade are high, they are not perfect. Thus, some of the cancers in our study may be under- or overgraded. We expect the effect on our results to be minimal, as this only affects the indolent versus aggressive setting.

A second limitation is that each case in this study was read by one of seven radiologists. We know diagnostic accuracy is dependent on reader experience and thus our results depend on the average reader experience of the group. All our readers have had reasonable training and experience of at least 2 years [8, 11].

Third, due to the single reader per case and the prospective reading setting, inter- and intraobserver variability and false-negative rates could not be assessed. As such, this study does not obviate the need for retrospective observer studies, in which these aspects could be assessed, but provides a different, more clinically realistic view on the added value of CAD. Furthermore, due to our comparatively large number of cases we were still able to show significantly improved diagnostic performance when using CAD.

The proposed method of implementing CAD in clinical practice (independent combination of PIRADS score and CAD likelihood into a combination score) might not be feasible, as radiologists or urologists will always have the final say. Nevertheless, we chose to perform independent combination to assess the potential observer-independent effect of CAD. In future work, one could assess the optimal way for radiologists to incorporate CAD results in their reports.

PIRADS 1 and 2 lesions were generally not biopsied and therefore are only partially included in this study. This precludes assessment of the effect of the CAD system in those lesions. However, this has little impact on the results of this study. Of the seven biopsied PIRADS 2 lesions, none were categorized as cancer. The negative predictive value of PIRADS 1 and 2 scores is already so high that radiologists do not need computer aid for these PIRADS scores. The literature also confirms this assessment, with the studies by Thompson et al. [8] and Pokorny et al. [11] reporting MRI sensitivities and negative predictive values of 97 and 96.9 %, respectively. The CAD system has most potential in more accurately discriminating which PIRADS 3, 4 or 5 lesions require biopsy, and the results at these scores are not affected by the lack of PIRADS 1 or 2 biopsies.

Last, the patient population in this study contained only patients for whom initial PSA tests and TRUS biopsies were inconclusive. As such, the results of this study cannot be directly translated to other patient groups (e.g., staging). However, due to the similar protocols for detection and staging MRI we expect results to be comparable. Furthermore, with prostate cancer guidelines in many countries now recommending MRI if PSA/TRUS results are inconclusive, we expect that the majority of prostate MRIs will be done for detection purposes.
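The independent combination of PIRADS score and CAD likelihood discussed above can be illustrated with a short sketch. It assumes the combination score is the output of a two-feature logistic regression, as the abstract describes, but it is not the authors' implementation: the fitting procedure (plain batch gradient descent), the function names (`combination_score`, `fit_logistic`, `auc`, `sens_spec`) and the toy lesions are all hypothetical. The sketch also evaluates on the data it was fitted on, which a real study would avoid.

```python
import math


def combination_score(weights, pirads, cad):
    """Logistic model output for one lesion; weights = (w_pirads, w_cad, bias)."""
    z = weights[0] * pirads + weights[1] * cad + weights[2]
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(pirads, cad, labels, lr=0.1, epochs=2000):
    """Fit the three weights by batch gradient descent on the mean log-loss."""
    w = [0.0, 0.0, 0.0]
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for p, c, y in zip(pirads, cad, labels):
            err = combination_score(w, p, c) - y
            grad[0] += err * p
            grad[1] += err * c
            grad[2] += err
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


def auc(scores, labels):
    """Area under the ROC curve: chance a positive outscores a negative (ties 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)


# Hypothetical toy lesions (not study data):
# PIRADS score, CAD likelihood, and biopsy outcome (1 = cancer, 0 = benign).
pirads = [3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 3, 5]
cad = [0.2, 0.4, 0.7, 0.3, 0.6, 0.5, 0.8, 0.4, 0.9, 0.7, 0.1, 0.6]
labels = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0]

weights = fit_logistic(pirads, cad, labels)
combined = [combination_score(weights, p, c) for p, c in zip(pirads, cad)]

print("AUC PIRADS  :", round(auc(pirads, labels), 3))
print("AUC CAD     :", round(auc(cad, labels), 3))
print("AUC combined:", round(auc(combined, labels), 3))
print("PIRADS >= 4 (sens, spec):", sens_spec(pirads, labels, 4))
```

Calling `sens_spec` on `pirads` with thresholds 3 and 4, and on `combined` with a chosen likelihood cut-off, mirrors the biopsy-threshold comparison in the text. The numbers produced by this toy data carry no clinical meaning.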

Conclusions

In this paper the use of a CAD system in conjunction with the radiologist to accurately characterize prostate lesions was investigated. Results showed that a significant increase in diagnostic performance can be achieved when combining the radiologist PIRADS score and CAD system likelihood into a combination score. Furthermore, a significant correlation between CAD likelihood and cancer grade exists; this increases further when using the combination score.

25 in total

Review 1. Prostate cancer: multiparametric MR imaging for detection, localization, and staging.

Authors: Caroline M A Hoeks; Jelle O Barentsz; Thomas Hambrock; Derya Yakar; Diederik M Somford; Stijn W T P J Heijmink; Tom W J Scheenen; Pieter C Vos; Henkjan Huisman; Inge M van Oort; J Alfred Witjes; Arend Heerschap; Jurgen J Fütterer
Journal: Radiology Date: 2011-10 Impact factor: 11.105

2. Selective enhancement filters for nodules, vessels, and airway walls in two- and three-dimensional CT scans.

Authors: Qiang Li; Shusuke Sone; Kunio Doi
Journal: Med Phys Date: 2003-08 Impact factor: 4.071

3. Validation of the European Society of Urogenital Radiology scoring system for prostate cancer diagnosis on multiparametric magnetic resonance imaging in a cohort of repeat biopsy patients.

Authors: Daniel Portalez; Pierre Mozer; François Cornud; Raphaëlle Renard-Penna; Vincent Misrai; Matthieu Thoulouzan; Bernard Malavaud
Journal: Eur Urol Date: 2012-06-27 Impact factor: 20.096

4. EAU guidelines on prostate cancer. Part 1: screening, diagnosis, and treatment of clinically localised disease.

Authors: Axel Heidenreich; Joaquim Bellmunt; Michel Bolla; Steven Joniau; Malcolm Mason; Vsevolod Matveev; Nicolas Mottet; Hans-Peter Schmid; Theo van der Kwast; Thomas Wiegel; Filliberto Zattoni
Journal: Eur Urol Date: 2010-10-28 Impact factor: 20.096

5. Prospective assessment of prostate cancer aggressiveness using 3-T diffusion-weighted magnetic resonance imaging-guided biopsies versus a systematic 10-core transrectal ultrasound prostate biopsy cohort.

Authors: Thomas Hambrock; Caroline Hoeks; Christina Hulsbergen-van de Kaa; Tom Scheenen; Jurgen Fütterer; Stefan Bouwense; Inge van Oort; Fritz Schröder; Henkjan Huisman; Jelle Barentsz
Journal: Eur Urol Date: 2011-08-27 Impact factor: 20.096

6. Relationship between apparent diffusion coefficients at 3.0-T MR imaging and Gleason grade in peripheral zone prostate cancer.

Authors: Thomas Hambrock; Diederik M Somford; Henkjan J Huisman; Inge M van Oort; J Alfred Witjes; Christina A Hulsbergen-van de Kaa; Thomas Scheenen; Jelle O Barentsz
Journal: Radiology Date: 2011-05 Impact factor: 11.105

Review 7. How good is MRI at detecting and characterising cancer within the prostate?

Authors: Alexander P S Kirkham; Mark Emberton; Clare Allen
Journal: Eur Urol Date: 2006-06-30 Impact factor: 20.096

8. Prostate cancer aggressiveness: in vivo assessment of MR spectroscopy and diffusion-weighted imaging at 3 T.

Authors: Thiele Kobus; Pieter C Vos; Thomas Hambrock; Maarten De Rooij; Christina A Hulsbergen-Van de Kaa; Jelle O Barentsz; Arend Heerschap; Tom W J Scheenen
Journal: Radiology Date: 2012-07-27 Impact factor: 11.105

9. Magnetic resonance imaging for the detection, localisation, and characterisation of prostate cancer: recommendations from a European consensus meeting.

Authors: Louise Dickinson; Hashim U Ahmed; Clare Allen; Jelle O Barentsz; Brendan Carey; Jurgen J Futterer; Stijn W Heijmink; Peter J Hoskin; Alex Kirkham; Anwar R Padhani; Raj Persad; Philippe Puech; Shonit Punwani; Aslam S Sohaib; Bertrand Tombal; Arnauld Villers; Jan van der Meulen; Mark Emberton
Journal: Eur Urol Date: 2010-12-21 Impact factor: 20.096

10. ESUR prostate MR guidelines 2012.

Authors: Jelle O Barentsz; Jonathan Richenberg; Richard Clements; Peter Choyke; Sadhna Verma; Geert Villeirs; Olivier Rouviere; Vibeke Logager; Jurgen J Fütterer
Journal: Eur Radiol Date: 2012-02-10 Impact factor: 5.315
