Literature DB >> 31850209

Detection of Dominant Intra-prostatic Lesions in Patients With Prostate Cancer Using an Artificial Neural Network and MR Multi-modal Radiomics Analysis.

Hassan Bagher-Ebadian¹, Branislava Janic¹, Chang Liu¹, Milan Pantelic², David Hearshen², Mohamed Elshaikh¹, Benjamin Movsas¹, Indrin J Chetty¹, Ning Wen¹.

Abstract

Purpose: The aim of this study was to identify and rank discriminant radiomics features extracted from MR multi-modal images to construct an adaptive model for characterization of Dominant Intra-prostatic Lesions (DILs) from normal prostatic gland tissues (NT). Methods and Materials: Two cohorts were retrospectively studied: Group A consisted of 98 patients and Group B 19 patients. Two image modalities were acquired using a 3.0T MR scanner: Axial T2 Weighted (T2W) and axial diffusion weighted (DW) imaging. A linear regression method was used to construct apparent diffusion coefficient (ADC) maps from DW images. DILs and the NT in the mirrored location were drawn on each modality. One hundred and sixty-eight radiomics features were extracted from DILs and NT. A Partial-Least-Squares-Correlation (PLSC) with one-way ANOVA along with bootstrapping ratio techniques were recruited to identify and rank the most discriminant latent variables. An artificial neural network (ANN) was constructed based on the optimal latent variable feature to classify the DILs and NTs. Nineteen patients were randomly chosen to test the contour variability effect on the radiomics analysis and the performance of the ANN. Finally, the trained ANN and a two dimension (2D) convolutional sampling method were combined and used to estimate DIL-NT probability map for two test cases.
Results: Among 168 radiomics-based latent variables, only the first four variables of each modality in the PLSC space were found to be significantly different between the DILs and NTs. Area Under Receiver Operating Characteristic (AUROC), Positive Predictive and Negative Predictive values (PPV and NPV) for the conventional method were 94%, 0.95, and 0.92, respectively. When the feature vector was randomly permuted 10,000 times, a very strong permutation-invariant efficiency (p < 0.0001) was achieved. The radiomic-based latent variables of the NTs and DILs showed no statistically significant differences (Fstatistic < Fc = 4.11 with Confidence Level of 95% for all 8 variables) against contour variability. Dice coefficients between DIL-NT probability map and physician contours for the two test cases were 0.82 and 0.71, respectively.
Conclusion: This study demonstrates the high performance of combining radiomics information extracted from multimodal MR information such as T2WI and ADC maps, and adaptive models to detect DILs in patients with PCa.

Entities: CellLine Chemical Disease Gene Species

Keywords: artifical neural network (ANN); intraprostatic lesion; multiparametric MRI (mpMRI); prostate cancer; radiomics

Year: 2019 PMID： 31850209 PMCID： PMC6901911 DOI： 10.3389/fonc.2019.01313

Source DB: PubMed Journal: Front Oncol ISSN： 2234-943X Impact factor: 6.244

Introduction

Radiation Therapy (RT) has been proven to be an effective form of treatment for prostate cancer (PCa) and still is considered as one of the standard treatment options available. The current practice is to treat the entire prostate with a homogeneous dose distribution (1, 2). Escalated dose conformal radiotherapy has shown an advantage in biochemical progression-free survival but it is associated with the increase in acute and late toxicities (3). Simultaneous dose escalation to the dominant intra-prostatic lesions (DILs), while maintaining acceptable doses to the whole prostate gland has potential to improve therapeutic ratio for prostate cancer patients. A median dose to the entire gland could prevent the disease recurrence in the prostate from satellite tumors and significantly reduce the side effects associated with escalated radiation dose to the entire gland. A boosting dose to the DIL can maintain the effectiveness of focal therapy to treat the DIL that is the main determinant for tumor progression and prognosis. For this strategy to be successful, key requirements are the ability to accurately and reliably identify clinically significant tumors in the prostate gland. Among different imaging techniques, Magnetic Resonance Imaging (MRI) is used increasingly and provides clinicians and researchers with useful information for delineation of the prostate gland and clinically significant tumors in PCa patients (1, 2, 4). While multi-parametric (MP) MRI is well-established (5, 6) for detection of lesions and for staging of the disease, the sensitivity for small and lower grade lesions as well as spare tumors has been low (7) and MP-MRI has failed to improve the detection accuracy of lesions in the central gland (8). Furthermore, accurate and automatic delineation of DILs from prostate glandular tissue which is not a common practice, still remains a challenge. Radiomics analysis, which is defined as the post-processing for high throughput extraction of textural and intensity-based information from medical images, can play a central role toward detecting biomarkers for diagnosis and/or therapy of patients with cancer (9, 10). This study aims to identify discriminant radiomics features in the real radiomics-feature space and the latent-variable space (constructed from radiomics features in the space of Partial Least Square Correlation, PLSC) for construction of an adaptive model to classify DILs and NTs. The discriminant feature set in the PLSC latent-variable space can also be used for intra-tumoral segmentation and treatment response evaluation.

Methods and Materials

Patient Population, and Pre-processing

A total of hundred-seventeen patients consisted of the following two groups were studied: This group consisted of 98 PCa patients collected in Radboud University Nijmegen Medical Centre (11) and evaluated with Computer-Aided Diagnosis (CAD) (12, 13). Each MR study was read and reported by or under the supervision of an expert radiologist (Barentsz), with more than 20 years of experience in prostate MR. The radiologist indicated areas of suspicion with a score per modality using a point marker. If an area was considered likely for cancer a biopsy was performed. All biopsies were performed under MR-guidance and confirmation scans of the biopsy needle in situ were made to confirm accurate localization. Biopsy specimen were subsequently graded by a pathologist and the results were used as ground truth. Gleason grade groups for these patients are listed in Table 1, GroupA.

Table 1

Gleason Grade Group and PSA level of PCa patients for the two groups are shown in the table.

	Gleason grade group	No. of patients
GROUP A
	1	30
	2	39
	3	19
	4	5
	5	5
GROUP B
	1	5
	2	7
	3	3
	4-5	4
	PSA level	No. of patients
	<4	5
	4-10	8
	10–20	2
	>20	4

The PSA levels are not available for group A.

Gleason Grade Group and PSA level of PCa patients for the two groups are shown in the table. The PSA levels are not available for group A. All MR studies included T2-weighted (T2W) and diffusion-weighted (DW) imaging. The images were acquired on two different types of Siemens 3T MR scanners, the MAGNETOM Trio and Skyra. T2W images were acquired using a turbo spin echo sequence and had a resolution of around 0.5 mm in plane and a slice thickness of 3.6 mm. the DWI series were acquired with a single-shot echo planar imaging sequence with a resolution of 2 mm in-plane and 3.6 mm slice thickness and with diffusion-encoding gradients in three directions. Three b-values were acquired [50, 400, and 800 (sec-mm−2)], and subsequently, the ADC map was calculated by the scanner software. All images were acquired without an endorectal coil, as per the PI-RADS guidelines for acquisition of prostate MRI (14). Consisted of 19 patients (age range: 56–84, mean: 67) collected in our hospital, presented with increased PSA levels, suspicion in MR images, and biopsy-proven localized prostate carcinoma with no prior treatment. PSA and Gleason score of these patients are listed in Table 1, GroupB. All patients underwent an MP MRI study. An ultrasound guided needle biopsy was performed to confirm the diagnosis. Among 19 patients, 15 had histopathologically identified cancer in peripheral zone and 4 in the central gland. Two image modalities were acquired from the pelvis of all patients using a 3.0 T MR scanner (Ingenia, Philips Medical System, Best, the Netherlands) using small field of view as follows: Axial T2W Images (T2WI) acquired with Fast-Spin-Echo (TE/TR: 4389/110 ms, Flip Angle: 90° with image resolution of 0.42 × 0.42 × 2.4 mm3) and axial Diffusion Weighted Images (DWI) with two b-values [TE/TR:4000/85 ms, FA:90°, 1.79 × 1.79 × 0.56 mm3, b-values:0 and 1000 (sec-mm−2)]. The voxel-wise Apparent Diffusion Coefficient (ADC) map was constructed using two DWIs with two b-values. A large field of view transverse T2W sequences was also acquired to access the pelvic bones and lymph nodes. Image registration and lesion contouring was performed on in-house developed software.

Data Contouring and Harmonization

For each patient of group B, a radiologist with over 20 years of experience evaluated the axial T2WI and ADC maps and used the following criteria for delineation of DIL: Areas with a well-circumscribed, hypo-intense with the highest Gleason score in the prostate on T2WI and ADC map. DIL and the equivalent region in contralateral (normal prostatic glandular tissues, NT) were contoured on axial T2WI and ADC maps, respectively. To harmonize the data and make them independent from MR scanner gains (can affect weighted images), for each patient of both groups, the signal intensity of their DIL was normalized to the mean value of their corresponding normal volume prior to the radiomics analysis.

Radiomics Analysis

All data processing was performed off-line using a commercial software package (MATLAB 2016a, the MathWorks Inc., Natick, MA, 2000). For each patient, 168 radiomics features (15), from eight different categories, were extracted from DIL and NT volumes contoured on ADC maps and T2W images. The 8 feature categories (15), as detailed below and in Table 2, were classified as follows: Intensity Based Histogram Features (IBHF−9 features), Gray Level Run Length (GLRL−7 features), Law's Textural information (LAWS−18 features), Discrete Orthonormal Stockwell Transform (DOST−18 features), Local Binary Pattern (LBP−6 features), Two-Dimensional Wavelet Transform (2DWT−48 features), Two Dimensional Gabor Filter (2DGF−40 features), and Gray Level Co-Occurrence Matrix (GLCM−22 features) (15).

Table 2

Eight different radiomics feature categories along with a short explanation of each category is shown in this table.

Category	Number of radiomics features	Radiomics features
IBHF	9 features	Nine features are extracted from histogram of the pixel intensity values: 1-Mean, 2-Standard Deviation, 3-Skewness, 4-Kurtosis, 5-Entropy, 6-Central Moment of 3rd order, 7- Central Moment of 4th Order, 8- Central Moment of 5th Order, 9- Central Moment of 6th Order.
GLRL	7 features	Seven Gray Level Run Length texture descriptors were constructed based on the following emphasizes: Short Run Emphasis (SRE), Long Run Emphasis (LRE), Gray Level Non-Uniformity (GLN), Run Percentage (RP), Run Length Non-Uniformity (RLN), Low Gray Level Run Emphasis (LGRE), and High Gray Level Run Emphasis (HGRE).
LAWS	18 features	Nine textural maps were constructed by filtering the image data using the following convolution kernels: L5 = [1 4 6 4 1], E5 = [−1 −2 0 2 1], S5 = [−1 0 2 0 −1], R5 = [1 −4 6 −4 1] and then, 18 LAWS textural features were computed by applying and combining the energy and entropy operators on these maps as following: L5E5/E5L5, L5R5/R5L5, E5S5/S5E5, S5S5, R5R5, L5S5/S5L5, E5E5, E5R5/R5E5, and S5R5/R5S5.
DOST	18 features	The two-dimensional matrix of DOST coefficients was divided into nine equal segments and the energy and entropy of each segment was averaged over the tumor volume and eighteen features (nine energy along with nine entropy) were generated and used as the DOST radiomics features.
LBP	6 features	Local Binary Pattern algorithm with a radial filter (eight-neighborhood) was used to generate a two-dimensional LBP map and Entropy, Entropy, Mean, Standard Deviation, Skewness, and Kurtosis of the LBP maps were used as the six LBPF radiomics features.
2DWT	48 features	Two-dimensional Wavelet Transform with six decomposition levels for four different information attributes (Multi-resolution image, vertical, horizontal, and diagonal) was used to generate 24 maps of 2DWT information. Energy and entropy of the information maps were calculated and used as the 48 2DWT radiomics features.
2DGF	40 features	Two-dimensional Gabor (2DG) filter with five different scales for four different orientations generated 20 maps. Energy and entropy of the maps was averaged over the tumor volume and used as the 2DGT radiomics features.
GLCM	22 features	Gray-Level-Co-occurrence Matrix (GLCM) was generated and the following 22 features were measured from the GLCM using an 8-bit depth quantization: 1-Autocorrelation, 2-Contrast, 3-Correlation (2), 4-Correlation (1), 5-Cluster Prominence, 6-Cluster Shade, 7-Dissimilarity, 8-Energy, 9-Entropy, 10-Homogeneity (1), 11-Homogeneity (2), 12-Maximum probability, 13-Sum of squares(Variance), 14-Sum average, 15-Sum variance, 16-Sum entropy, 17-Difference variance, 18-Difference entropy, 19-Information measure of correlation (1), 20-Information measure of correlation (2), 21-Inverse difference normalized, and 22-Inverse difference moment normalized.

Eight different radiomics feature categories along with a short explanation of each category is shown in this table.

Feature Selection and Statistical Analysis

A Partial Least Square Correlation (PLSC) (16) technique combined with one-way analysis of variance (ANOVA) were recruited to identify the most discriminant PLSC latent variables constructed from radiomics features extracted from NTs and DILs of multimodal MR information (T2WI and ADC map). PLSC method which is also called as projection to latent structures, can relate the information present in two MR modalities in which collect measurements on the same set of observations (16, 17). The goal of the PLSC is to find pairs of latent vectors with maximal covariance and with the additional constraints that the pairs of latent vectors made from two different indices are uncorrelated and the coefficients used to compute the latent variables are normalized. As shown in Figure 1, two observation matrices were constructed using 168 radiomics features extracted from the two image modalities (T2WI and ADC) from total patients. A singular value decomposition (SVD) technique was used to analyze the common and discriminant information between the two observation matrices. For each MR modality, a latent vector was computed by the SVD technique and then it was tested by the ANOVA (with homoscedasticity assumption and confidence level of 0.95) to identify the most discriminant features in latent variable space between the features extracted from DIL and NT volumes in both groups. The Holm–Bonferroni method (18) was also used for circumventing the problem of multiple comparisons for the p-values. This method of p-value adjustment controls the familywise error rate and offers a uniform test, which is more powerful than the classic Bonferroni correction (18). Using the discriminant latent variable set identified by ANOVA, an optimal feature set for both modalities was identified and constructed.

Figure 1

The flowchart demonstrates different steps for the extraction of radiomics features from T2W images and ADC maps for DILs and normal tissues. As shown in this figure, for each MR modality, 168 radiomics features are extracted from normal and DIL volumes. The optimal feature set for the two MR modalities are identified using ANOVA applied on the latent variables generated by the PLSC technique for features with Silhouette coefficient of 0.5 and greater.

Feature Ranking Using Bootstrapping Ratio Technique

A bootstrapping ratio (16, 19, 20) and permutation test (10,000 times randomly repeated) were performed on the latent vectors of the features sets (extracted from T2WI and ADC) and the SVD was computed for each configuration and distribution of eigen values was used to estimate the ranking and efficiency of the radiomics features against random permutation. For radiomics feature ranking, bootstrap ratios were computed by dividing the mean of the bootstrapped distribution of a significant latent variable by its standard deviation. The bootstrap ratio is akin to a Student t criterion and so if a ratio is large enough (>2.00; because it roughly corresponds to 95% of confidence level for a t-test) then the variable is considered significant/important for the dimension. The bootstrap estimates a sampling distribution of a statistic by computing multiple instances of this statistic from bootstrapped samples obtained by sampling with replacement from the original sample (16, 19, 20).

Artificial Neural Networks: Architecture Optimization, Training, and Validation Strategies

Eight latent variables constructed from the radiomics information were identified as the optimal feature set and were used as the input to an artificial neural networks (ANN) with a feed-forward multilayer perceptron (MLP) architecture and back-propagation training algorithm (21) for classification of DILs and NTs. In this type of ANN, the nodes are organized in multiple layers; The ANN used in our study had three layers: the input layer, single intermediate layer, and the output layer (21, 22). Nodes were interconnected by weights in such a way that information propagates from one layer to the next, passing through a sigmoid (bipolar) activation function (22). Learning rate and momentum factors were set to control the internode weight adjustments during training (learning rate: 0.01, and Momentum: 0.01). A back propagation learning strategy (21) was employed for training the ANN in a supervised mode. In this strategy, a trial set of weights (the weight vectors, one vector for each layer of the ANN) was proposed. The initial weights were assigned randomly, and the same set of initial weights was saved and used for different trial during the leave-one-out method. The weight vectors were then adjusted to minimize some measure of error (in this case the Mean Square Error, MSE) between the output of the ANN and the training set. This procedure was performed iteratively across the entire data set using a batch processing mode to improve the convergence rate and the stability of training. The weight changes obtained from each training case were accumulated, and the weights updated after the entire set of training cases was evaluated. Batch processing improves stability, but with a tradeoff in reduction of the convergence (21–23). Two different training and validation strategies were recruited and tested as follows: Strategy 1: Leave-One-Out Cross-Validation (LOOCV) method, which is a particular case of the Leave-P-Out Cross Validation (called as Exhaustive Method) was employed for training, testing, and ANN architecture optimization (21, 22, 24–26). LOOCV was recruited to find the optimal structure, termination error, and validation of the ANN. As shown in Figure 2, this approach leaves one data point out of training data, i.e., if there are N data points in the original sample then, N-1 samples are used to train the model and 1 point is used as the validation. This is repeated for all combinations in which original sample can be separated this way, and then the error is averaged for all trials, to give overall effectiveness with less estimated bias (27). This method is generally preferred over the Leave-P-Out Cross Validation when the sample size is small since it does not suffer from the intensive computation, as number of possible combinations is equal to number of data points in original sample or N (28). Finally, to evaluate the stability of the optimal ANN against optimal number of training epochs, a series of ROC curves were generated by applying a threshold at the output of the randomly (100 times) trained ANN. The, the optimal cut-point which is the point closest-to- corner in the ROC plane was calculated. The optimal cut-point defines as the point minimizing the Euclidean distance between the ROC curve and the (0, 1) point (29). As the sensitivity (true positives) increases, the ANN can identify more cases with DIL, while the accuracy on identifying NTs (specificity) are sacrificed. Cut-points dichotomize the test values, so this provides the classification (DIL or not). Simultaneous assessment of sensitivity and specificity is used to estimate the cut-point value which is considered as optimal when the point classifies most of the individuals correctly (29, 30).

Figure 2

This figure demonstrate three major phases as follows: Training, optimization, and evaluation phases for the ANN using the leave-one-out technique and area under correct classification fraction.

This figure demonstrate three major phases as follows: Training, optimization, and evaluation phases for the ANN using the leave-one-out technique and area under correct classification fraction. To measure how accurately the ANN matched the whole input dataset with the entire identifier set, the ANN's Correct-Classification-Fraction (CCF: True Positive plus True Negative, TP+TN) curve was generated at different levels of epochs during the LOOCV procedure. The area under Receiver-operating characteristic (AUROC, Az-value) curves (21, 22, 24, 25) for the ANN that is an index of predictive performance, was used to compare the ANN's performance in determining the optimal architecture of the ANN, and also finding the termination error (avoid overfitting) for training the optimal ANN. Strategy 2: For each discriminant latent variable, the data of the patient group A (96 patients) was split 100 times into training and validation components. In each data split, two-thirds (67%) of the entire dataset was randomly sampled and used as a training set and the remaining one-third (33%) was used as the unseen cohort or validation dataset (31). Using the training and validation sets for each of the 100 iterations, the ANN was trained and validated separately for each discriminant latent variable. The same procedure was repeated for the set of eight latent variables. The AUROC, Positive Predictive value (PPV) and Negative Predictive value (NPV) were computed for each trial and were averaged to evaluate ANN classification performance for each discriminant latent variable and the set of eight latent variables. All data processing and classifier implementation were performed using a series of in-house codes developed in the MATLAB environment.

Testing of Data Harmonization, Feature Consistency, and Generalization Error

Data harmonization refers to all efforts to combine different datasets collected by different scanners in different institutions. Finally, in order to test the consistency of the identified discriminant latent variables against the data harmonization and also testing the performance of the classifiers against prospective/unseen datasets (ANN generalization error), the following sub-analysis was conducted: An ANN was trained using the eight discriminant latent variables (constructed from radiomics information) extracted from patients information of group A. The trained ANN was then applied on the eight discriminant latent variables (constructed from radiomics information) extracted from patient information of group B (as test set or unseen patient cohorts). Ultimately, a ROC analysis was performed on the predictions of the trained ANN and AUROC, NP, and PP values for the unseen testing cohort (group B) were calculated.

Contour Variability Test

Nineteen patients were randomly chosen from hundred-seventeen patients and their DIL and NT contours were modified by scaling the contours by a factor of 1.2 in all directions followed by a 1 voxel shift in all directions and their modified contours were used to repeat the radiomics and PLSC analyses and ANOVA method was used to test the sensitivity of the latent variables against contour variability.

Tumor Probability Map

The trained ANN and a two dimension (2D) convolutional sampling method window size = 25 × 25) were combined and used to estimate DIL-NT probability map for two test cases. Dice coefficients between the DIL contours and the DIL patch estimated from the probability maps (P > 0.001) for the two cases were calculated and compared.

Results

A flowchart demonstrating different steps for extracting radiomics features from T2W images and ADC maps for DILs and NTs are shown in Figure 1. As shown in the figure, for each MR modality, 168 radiomics features were extracted from each of the NTs and DILs and finally, the optimal discriminant latent feature set for the two MR modalities were identified using a PLSC technique and ANOVA. Table 3 shows feature ranking results based on the PLSC and bootstrapping ratio techniques for the first 10 significant radiomic features of two MR modalities. Figures 3A,B demonstrate the scatter plots of the first three PLSC latent variables for T2WI and ADC, respectively. Figures 3C,D demonstrate the permutation tests for the inertia explained by the PLSC of the T2WI and ADC map along with their observed inertia for the 10,000 permutations.

Table 3

Feature ranking based on the PLSC and Bootstrapping techniques for the first 10 significant radiomic features of two MR modalities.

Feature no.	Image modality	Radiomics feature	Radiomics feature category	Bootstrapping ratio (mean/std)
1	T2WI	LBP_Energy	LBP	21412.01
2	ADC Map	LBP_Energy	LBP	410.83
3	T2WI	RLN	GLRL	159.69
4	ADC Map	RLN	GLRL	70.32
5	T2WI	GLN	GLRL	35.29
6	T2WI	HGRE	GLRL	35.28
7	ADC Map	DOST_ENTROPY_22	DOST	25.13
8	ADC Map	ENG_GAB02	2DGF	22.99
9	ADC Map	LBP_KURTOSIS	LBP	22.95
10	T2WI	LBP -KURTOSIS	LBP	22.92

Figure 3

(A,B) Clusters of NTs and DILs for each latent variable are well-separated with less diffusivity. It confirms that the distribution of the identified latent variable (PLSC-ANOVA) in the feature space is well-matched to its own cluster (less scattered) and poorly diffused to its neighboring clusters for the MR modalities. (C,D) Show the results of the permutation tests for the inertia explained by the PLSC of T2WI and ADC map for 10,000 permutations. As shown in the subfigures, the observed value (shown by vertical arrows) were never obtained in the 10,000 permutations for both modalities. Therefore, it is concluded that PLSC extracted a significant amount of common variance between these two modalities with P < 0.0001.

Feature ranking based on the PLSC and Bootstrapping techniques for the first 10 significant radiomic features of two MR modalities. (A,B) Clusters of NTs and DILs for each latent variable are well-separated with less diffusivity. It confirms that the distribution of the identified latent variable (PLSC-ANOVA) in the feature space is well-matched to its own cluster (less scattered) and poorly diffused to its neighboring clusters for the MR modalities. (C,D) Show the results of the permutation tests for the inertia explained by the PLSC of T2WI and ADC map for 10,000 permutations. As shown in the subfigures, the observed value (shown by vertical arrows) were never obtained in the 10,000 permutations for both modalities. Therefore, it is concluded that PLSC extracted a significant amount of common variance between these two modalities with P < 0.0001. Figure 4A shows correct classification fraction (CCF = TP + TN) of the optimal ANN at different training epochs for LOOCV technique. The epoch corresponding to 10% change in plateau for the optimum architecture (8:5:1) was used as the stopping epoch (epoch = 17) of the ANN. Figure 4B shows TP, TN, false positive (FP), and false negative (FN), of the optimal ANN at different training epochs.

Figure 4

(A) Shows true positive plus true negative (TP+TN) of the optimal ANN (8:5:1) at different training epochs. (B) Shows true positive, true negative, false positive, and false negative of the optimal ANN at different training epochs. (C) Demonstrates the area under receiver operating characteristic (AUROC, Az test) value for different ANN structures. As shown in this figure, the ANN with five neurons in its only hidden layer shows the highest performance and is chosen as the optimal ANN. (D) Shows the average ROC of the optimal ANN along with optimal-cut-point of the ANN. The AUCCF values for different ANN structures for LOOCV technique are shown in Figure 4C. As shown in this figure, the ANN with five neurons in its only hidden layer shows the highest performance (A = 0.95) and is chosen as the ANN with optimal structure. Figure 4D shows the average AUROC of the ANN generated for randomly (100 times) trained ANNs along with the optimal cut-point (OCP = 0.96). Given the average AUROC (Az test ~ 0.96), the optimal cut-point of the ANN, and the eigen value distributions for the randomly permuted (10,000 permutations) radiomics features, the generalization error of the ANN was about 4% with a very strong permutation-invariant efficiency, p < 0.0001) against the order of the latent variables. AUROC, PPV, and NPV for the conventional method were 94%, 0.95, and 0.92, respectively. ROC analyses for eight individual latent variables (4 for T2WI and 4 for ADC) are shown in Figure 5. Figures 5A–D demonstrate the ROC analyses of the ANN for the first 4 latent variables constructed from T2WI for 100 random iteration corresponding to a different division of training and validation data of group A while Figures 5E–H depict the corresponding information for the ADC map. Table 4 shows AUROC, NPV, and PPV values along with their confidence intervals measured for each individual latent variable for 100 iterations (each corresponding to a different division of training and validation datasets).

Figure 5

Table 4

This table shows AUROC, NPV, and PPV values along with their confidence intervals measured for each individual latent variable for 100 iterations (each corresponding to a different division of training and validation datasets).

Latent variable	AUROC	AUROC-CI	PPV	PPV-CI	NPV	NPV-CI
First latent variable (T2WI)	0.87	0.86–0.89	0.86	0.84–0.88	0.78	0.76–0.79
Second latent variable (T2WI)	0.79	0.72–0.85	0.84	0.78–0.90	0.71	0.68–0.74
Third latent variable (T2WI)	0.76	0.75–0.80	0.72	0.67 0.74	0.72	0.70–0.74
Fourth latent variable (T2WI)	0.66	0.64–0.68	0.58	0.57–0.60	0.69	0.65–0.73
First latent variable (ADC)	0.91	0.90–0.92	0.88	0.85–0.90	0.82	0.80–0.84
Second latent variable (ADC)	0.88	0.86–0.89	0.87	0.85–0.89	0.81	0.80–0.83
Third latent variable (ADC)	0.79	0.72–0.87	0.80	0.75–0.85	0.83	0.81–0.86
Fourth latent variable (ADC)	0.74	0.72–0.76	0.66	0.64–0.67	0.81	0.78–0.85

(A–D) Depict ROC curves corresponding to 100 iterations each corresponding to a different division of training and validation datasets for ANN for the T2WI latent variables number 1–4. (E–H) depict ROC curves corresponding to 100 iterations each corresponding to a different division of training and validation datasets for ANN for the ADC latent variables number 1–4. As shown in this figure for each modality, from left to right as the order of latent variable increases the information content or discrimination power of the variable for classification deceases. (I) illustrates a family of ROC curves for 100 iterations, each corresponding to a different division of training and validation datasets for ANN for all 8 latent variables. (J) shows the response of the trained ANN against an unseen/prospective dataset (trained with group A and tested with group B). This table shows AUROC, NPV, and PPV values along with their confidence intervals measured for each individual latent variable for 100 iterations (each corresponding to a different division of training and validation datasets). As shown in Figure 5I, for the conventional training and validation method, the average AUROC, PPV and NPV were 95%, 0.96, and 0.93, respectively. Figure 5J shows the response of the trained ANN (group A) when it was applied on group B. The performance of the trained ANN (using group A dataset) when it was applied on the unseen data cohort (group B) was: Sensitivity/Specificity = 0.95/0.94. The radiomic-based latent variables of the NTs and DILs showed no statistically significant differences (Fstatistic for all 8 latent variables were smaller than Fcritical = 4.11, with Confidence Level of 95%) against contour variability. Figures 6A–F, illustrate T2WI, ADC map, and lesion probability map for a slice of prostate gland of two different patients estimated by the trained ANN using a 2D-convolutional sampling method (window size = 25 × 25). Dice coefficients between DIL-NT probability map and physician contours for the two test cases were 0.82 and 0.71, respectively.

Figure 6

(A–F) illustrate T2WI, ADC map, and lesion probability map for a slice of prostate gland for two different patients estimated by the trained PLSC-ANN using a 2D-convolutional sampling method (window size = 25 × 25).

Discussion

Recent studies have shown that cancerous tissues are spatially heterogeneous due to factors, such as cell structures, genes, protein contents, cell morphologies, tumor microenvironment, and physiology (32). Indeed, the main purpose of using radiomics is to reveal and extract additional information from medical imaging modalities, associated with macroscopic and microscopic image-based features that have the potential to serve as surrogates for pathophysiological and radiological parameters, such as tumor heterogeneity level, pathology, response to a given therapy, decoration and distribution of information in images, and structural and image-based patterns in digital images. In our study, given the variation and nature of the radiomics features, we extracted multi scale information in form of features from the prostate gland to characterize normal prostatic tissue and tumor phenotypes from multi model MRI. The PLSC technique used in this study allowed the finding of shared information between the two image modalities (T2WI and ADC). This approach is equivalent to a correlation problem (16, 17, 33) and provided descriptive features from multivariate information in form of latent variables which are optimal linear combinations of the variables extracted from the two image modalities. Partial least square (PLS) method that benefits from projecting feature information on latent structures, relates the information present in two data tables (modalities) that collect measurements on the same set of observations (16). PLSC latent variables constructed on the basis of radiomics information extracted from DIL and NT consists of all radiomics features and can help reveal variations of descriptive features or discriminant parameters for classification of DIL from NT. An adaptive classifier (such as ANN) provides capability of implicitly detecting complex non-linear relationships between dependent and independent radiomics variables (already found as optimal feature set in latent variable space) and their variations, modeling their non-linear changes as well as detecting all possible interactions between the predictor variables. As shown in Figures 3A,B, clusters of NTs and DILs for each latent variable are well-separated with less diffused marginal points in the feature space. It confirms that the distribution of the identified latent variable (PLSC-ANOVA) in the PLSC space is well-matched to its own cluster (less scattered) and poorly diffused to its neighboring clusters. Figures 3C,D show the results of the permutation tests for the inertia explained by the PLSC of T2WI and ADC map for 10,000 permutations. The observed value (shown by vertical arrows) were never obtained in the 10,000 permutations for both modalities. Therefore, it is concluded that PLSC technique was able to successfully extract significant amount of common variances between these two modalities with p-value smaller than 0.0001. Recruitment of PLSC technique and ANOVA in this study allowed robust comparison and revealing of the correlation and descriptive power of different radiomics features extracted from the two MR modalities, while providing more predictive accuracy and a much lower chance of risk for the two sets of features affecting each other. The major limitations could be the sensitivity to the relative scaling of the descriptor variables that was addressed by the standardization and harmonization steps prior to the feature extraction. Recent studies (34–39) have shown that ADC measurements are affected by the user selected repetition time (TR) values, especially if it is comparable to the relaxation time. The degree of TR dependence is also codependent on another parameter called number of diffusion preparation pulses. Similar to TR dependence of ADC values, it is expected that there could be an echo time (TE) dependence on ADC values. In fact, Wang et al. (39) found a modest correlation between TE and ADC values in the prostate. It has been shown that tissue specific relaxation time parameters such as T1 and T2 and imaging parameters such as TR and TE affects the optimum b-value for different anatomies, tissues, and even lesion types within the same organ. Therefore, since the ADC value could be highly and “non-linearly” affected by the MR imaging parameters (34–39), in this study, as part of data harmonization, normalization to normal volume was performed to suppress the effect of the MR imaging parameters on the ADC values. Such normalization made the ANN independent and less sensitive to the MR imaging parameters for prospective patients whom could be scanned with different scanners or different imaging parameters. As shown in Figure 5 and according to the statistical measures reported in Table 4, as it is expected, for each modality, from left to right (Figures 5A–D or Figures 5E–H), as the order of the latent variable increases the information content or discrimination power of the variable for DIL classification deceases. As shown in Table 4 and Figure 5, the analysis results strongly confirm that compared to T2WI modality, the ADC modality is more discriminative with higher information content for the classification of DILs and NTs. The application of novel machine learning techniques such as Bayesian approach, Support vector machine (SVM) kernels: polynomial, radial base function (RBF) and Gaussian and Decision Tree for detecting prostate cancer have been proposed by several research groups (40–42). Moreover, different features extracting strategies are proposed to improve the DIL detection performance (40). ANNs have been used in different fields on a variety of tasks such as computer vision, speech recognition, machine translation, social network filtering, medical diagnosis, and in many other domains. There have been numerous applications of ANNs within medical decision-making (26, 43, 44). It has been shown that ANNs have unique properties including robust performance in dealing with noisy or incomplete input patterns, high fault tolerance, and the ability to generalize from the training data (26, 43). The adaptive model constructed in this study can benefit from the ANN's properties stated above and can distinguish DILs from NTs with almost uniform sensitivity at different levels of specificities (see Figures 4A,B, 5I). The stability (lesions being non-patchy and uniform) of the predicted DILs and NTs in the probability maps (shown in Figure 6) clearly confirm the robustness of the PLSC-ANN technique in information extraction from the two MR modalities. The proposed ANN in this study was trained without any data augmentation. The results implied that the trained ANN can also evaluate any suspicious lesion in different zones of the prostate gland (PZ or TZ) regardless of its Gleason score. Our study also confirms that the most discriminant features are textural-based features and given the bootstrapping feature ranking results, it can be concluded that frequency or arrangement-based features (LBP, GLRL, DOST, and 2DGF, see Table 3, a measure of the decoration or disorder of information distribution within a region), that are associated with subtle and descriptive information content of the two image modalities, play a key role in discrimination of DIL from NT. Also, we did not include morphological features such as volume, shape, solidity, convexity, eccentricity, and etc. in order to eliminate any possible biasing result from the manual contouring of DILs and NTs. In this study, DIL and the NT contours were separately drawn on each image modality. While such a process could increase the chance of contour variability and negatively increase the variation of the data, it had an advantage that the two image modalities (T2W images and ADC map) did not necessarily need to be co-registered to each other prior to the radiomic analysis and adaptive modeling and therefore, the analysis results were not negatively affected by any possible co-registration errors. DILs and NTs contoured on unregistered image modalities were directly used for training and testing of the ANN. We only co-registered the two image modalities (T2WI and ADC map) using rigid co-registration [affine transform (45)] method for the two test cases (see Figure 6) to predict DIL-NT probability map using the trained ANN and 2D-convolutional sampling method. The current major computer aided diagnosis systems recorded AUROC performance ranging from 0.77 to 0.89 and the focus was to detect lesions in the peripheral zone. Most image features, either individually or in combination that were effective in the differentiation of prostate cancer, are volume averaged quantities such as the 10th percentile of the ADC, T2W signal intensity skewness (46). Niaf et al. studied texture features extracted from MP-MRI on 30 fully annotated patients using four different feature selection and classification methods (47). They could achieve a diagnostic performance of 0.89 but the study was limited to the peripheral zone only. The performance was poorer due to the overfitting problem when all features were used for classification. In this study, despite using 117 subjects (two cohorts: 96, and 19) with two different training and validation strategies, there are still several challenges as follows: Compared to the number of radiomics features, the study is limited by the number of patients, which will impact the optimal features selected, and also might render a predictive model susceptible to Type II errors. A larger sample size will also allow the construction of a more reliable ANN in order to draw a reliable and unequivocal conclusion. In this study, two different training and validation strategies were recruited and the strong agreement between the analysis results confirmed the robustness of the identified features. In the first strategy, employing the LOOCV method in this study, allowed us to use a high proportion of the available training data fraction (1–1/K = 0.99 for K = 117), for training, while making use of all the data in estimating the generalization error or agreement. The cost is that the process can be lengthy, since we need to train and evaluate the network K times. Typically, according to the literatures, K ≈ 10 is considered reasonable (48). In this study, K was set to 117 for 117 patients (one case with DIL and NT in each fold) and the ANN had a single output, to predict the outcome. The radiomics features selected might be impacted by the intensities, size of the contour, and contrast of the NT. Since the region of interests were delineated manually, the accuracy and variability of the ROIs could impact on the optimal feature selection and the training results. The Az-test for the average ROC analysis of the ANN is 1% higher than the Az-test of the optimal ANN (see Figures 3C,D). This is due to the difference between the way the two tests are conducted: for average AUROC, each NT or DIL from each subject is considered as a sample (thus the total samples are equal to 234) while in the ordinary Az-test for the optimal ANN, pair of NT and DIL for each subject is considered as a sample (thus the total samples are equal to 117). Strong agreement between the statistical measures of the LOOCV and conventional methods and also the high predictive power of the trained ANN (group A) when it was applied on group B (as prospective or unseen data cohort), confirm the consistency and high information content of the discriminant features identified in this study. The 2D-convolutional sampling analysis results presented in Figure 6, imply that the trained-ANN is capable of estimating the DIL and normal tissue probabilities when the target contour (the 2D window) consists of a mixed radiomic information extracted from DIL and normal tissue. ANN was implanted as a classifier since it has high tolerance against variation of input feature components and contours (according to the contour variability test results) while they are less sensitive to random noise (49), which allows the construction of a variation- and noise-insensitive adaptive classifier with higher accuracy and speed. Most importantly, ANN considers non-linear relationships among input data that cannot always be recognized by conventional analyses. Results of the permutation test also imply that the discriminant features used for training, are reliable and efficient for classification.

Conclusion

In conclusion, this study demonstrates the high performance of combining radiomics analysis, PLSC technique and adaptive model for extracting and ranking features from multimodal MR information such as T2WI and ADC maps to detect DILs and NTs in patients with PCa. The radiomics information of ADC modality was proved to have higher discrimination power compared to the corresponding features extracted from T2WI modality. Results are suggestive that the integration of quantitative image analysis methods such as radiomics analysis and PLSC technique when combined with an adaptive model can help identify imaging biomarkers and show great potential to help clinicians improve the classification of clinically significant prostate lesions for therapy of prostate cancer.

Data Availability Statement

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

Ethics Statement

The studies involving human participants were reviewed and approved by The Internal Review Board at Henry Ford Health System. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author Contributions

HB-E and NW designed the research and methodology. HB-E, CL, BJ, and NW performed the research. HB-E and CL contributed to the data pre-processing. HB-E developed statistical analysis, PLSC, ANN training, and validation as well as the development of analytical tools. HB-E and NW wrote the paper. MP and BJ investigated the data and also contoured and labeled the tumors and normal tissues on the MR images using the pathology images. DH helped with the implementation of the MR pulse sequences. HB-E, ME, BM, IC, and NW advised and mentored the study.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

41 in total

1. Effect of imaging parameters on the accuracy of apparent diffusion coefficient and optimization strategies.

Authors: Azim Celik
Journal: Diagn Interv Radiol Date: 2016 Jan-Feb Impact factor: 2.630

2. The inconsistency of "optimal" cutpoints obtained using two criteria based on the receiver operating characteristic curve.

Authors: Neil J Perkins; Enrique F Schisterman
Journal: Am J Epidemiol Date: 2006-01-12 Impact factor: 4.897

3. On the impact of smoothing and noise on robustness of CT and CBCT radiomics features for patients with head and neck cancers.

Authors: Hassan Bagher-Ebadian; Farzan Siddiqui; Chang Liu; Benjamin Movsas; Indrin J Chetty
Journal: Med Phys Date: 2017-04-17 Impact factor: 4.071

4. How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the Bootstrap and 3-way multidimensional scaling (DISTATIS).

Authors: Hervé Abdi; Joseph P Dunlop; Lynne J Williams
Journal: Neuroimage Date: 2008-11-24 Impact factor: 6.556

5. Imaging parameter effects in apparent diffusion coefficient determination of magnetic resonance imaging.

Authors: Akio Ogura; Katsumi Hayakawa; Tosiaki Miyati; Fumie Maeda
Journal: Eur J Radiol Date: 2009-07-31 Impact factor: 3.528

6. Multiparametric 3T prostate magnetic resonance imaging to detect cancer: histopathological correlation using prostatectomy specimens processed in customized magnetic resonance imaging based molds.

Authors: Baris Turkbey; Haresh Mani; Vijay Shah; Ardeshir R Rastinehad; Marcelino Bernardo; Thomas Pohida; Yuxi Pang; Dagane Daar; Compton Benjamin; Yolanda L McKinney; Hari Trivedi; Celene Chua; Gennady Bratslavsky; Joanna H Shih; W Marston Linehan; Maria J Merino; Peter L Choyke; Peter A Pinto
Journal: J Urol Date: 2011-09-25 Impact factor: 7.450

7. Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies.

Authors: Lal Hussain; Adeel Ahmed; Sharjil Saeed; Saima Rathore; Imtiaz Ahmed Awan; Saeed Arif Shah; Abdul Majid; Adnan Idris; Anees Ahmed Awan
Journal: Cancer Biomark Date: 2018-02-06 Impact factor: 4.388

8. Multi-kernel graph embedding for detection, Gleason grading of prostate cancer via MRI/MRS.

Authors: Pallavi Tiwari; John Kurhanewicz; Anant Madabhushi
Journal: Med Image Anal Date: 2012-12-13 Impact factor: 8.545

9. Predicting final extent of ischemic infarction using artificial neural network analysis of multi-parametric MRI in patients with stroke.

Authors: Hassan Bagher-Ebadian; Kourosh Jafari-Khouzani; Panayiotis D Mitsias; Mei Lu; Hamid Soltanian-Zadeh; Michael Chopp; James R Ewing
Journal: PLoS One Date: 2011-08-10 Impact factor: 3.240

10. ESUR prostate MR guidelines 2012.

Authors: Jelle O Barentsz; Jonathan Richenberg; Richard Clements; Peter Choyke; Sadhna Verma; Geert Villeirs; Olivier Rouviere; Vibeke Logager; Jurgen J Fütterer
Journal: Eur Radiol Date: 2012-02-10 Impact factor: 5.315

5 in total

1. Evaluation of the Efficiency of MRI-Based Radiomics Classifiers in the Diagnosis of Prostate Lesions.

Authors: Linghao Li; Lili Gu; Bin Kang; Jiaojiao Yang; Ying Wu; Hao Liu; Shasha Lai; Xueting Wu; Jian Jiang
Journal: Front Oncol Date: 2022-07-05 Impact factor: 5.738

2. Ultra high speed SPECT bone imaging enabled by a deep learning enhancement method: a proof of concept.

Authors: Boyang Pan; Na Qi; Qingyuan Meng; Jiachen Wang; Siyue Peng; Chengxiao Qi; Nan-Jie Gong; Jun Zhao
Journal: EJNMMI Phys Date: 2022-06-13

3. Differentiating Between Multiple Myeloma and Metastasis Subtypes of Lumbar Vertebra Lesions Using Machine Learning-Based Radiomics.

Authors: Xing Xiong; Jia Wang; Su Hu; Yao Dai; Yu Zhang; Chunhong Hu
Journal: Front Oncol Date: 2021-02-24 Impact factor: 6.244

Review 4. Radiomics in prostate cancer: an up-to-date review.

Authors: Matteo Ferro; Ottavio de Cobelli; Gennaro Musi; Francesco Del Giudice; Giuseppe Carrieri; Gian Maria Busetto; Ugo Giovanni Falagario; Alessandro Sciarra; Martina Maggi; Felice Crocetto; Biagio Barone; Vincenzo Francesco Caputo; Michele Marchioni; Giuseppe Lucarelli; Ciro Imbimbo; Francesco Alessandro Mistretta; Stefano Luzzago; Mihai Dorin Vartolomei; Luigi Cormio; Riccardo Autorino; Octavian Sabin Tătaru
Journal: Ther Adv Urol Date: 2022-07-04

Review 5. Artificial Intelligence Based Algorithms for Prostate Cancer Classification and Detection on Magnetic Resonance Imaging: A Narrative Review.

Authors: Jasper J Twilt; Kicky G van Leeuwen; Henkjan J Huisman; Jurgen J Fütterer; Maarten de Rooij
Journal: Diagnostics (Basel) Date: 2021-05-26

5 in total