
A Supervised Learning Tool for Prostate Cancer Foci Detection and Aggressiveness Identification using Multiparametric magnetic resonance imaging/magnetic resonance spectroscopy imaging.

John Papadimitrou¹, Gokhan Kirlik², Rao Gullapalli³, Warren D'Souza², Gazi Md Daud Iqbal², Michael Naslund³, Jade Wong³, Steve Roys³, Nilesh Mistry⁴, Hao Zhang².

Abstract

Prostate cancer is the most frequently diagnosed cancer in men in the United States. The current main methods for diagnosing prostate cancer are the prostate-specific antigen test and transrectal biopsy. Prostate-specific antigen screening has been criticized for overdiagnosis and unnecessary treatment, and transrectal biopsy is an invasive procedure with low sensitivity for diagnosis. We developed a quantitative tool that uses supervised learning with multiparametric imaging to accurately detect cancer foci and their aggressiveness. A total of 223 specimens from patients who received magnetic resonance imaging (MRI) and magnetic resonance spectroscopy imaging before surgery were studied. Multiparametric imaging included extracting the T2-map, the apparent diffusion coefficient (ADC) from diffusion-weighted MRI, Ktrans from dynamic contrast-enhanced MRI, and 3-dimensional MR spectroscopy. A pathologist reviewed all 223 specimens, marked the cancerous regions on each, and graded them with Gleason scores, which served as the ground truth to validate our prediction model. In cancer aggressiveness prediction, the average area under the receiver operating characteristic curve (AUC) was 0.73 (95% confidence interval, 0.72-0.74), and the average sensitivity and specificity were 0.72 (0.71-0.73) and 0.73 (0.71-0.75), respectively. For the cancer detection model, the average AUC was 0.68 (0.66-0.70), and the average sensitivity and specificity were 0.73 (0.70-0.77) and 0.62 (0.60-0.68), respectively. Our method handles class imbalance using adaptive boosting with random undersampling. In addition, it is noninvasive and allows for nonsubjective disease characterization, which can provide physicians with information to make personalized treatment decisions.


Keywords: diagnostic imaging; multiparametric MRI/MRSI; predictive modeling; prostate cancer

Year: 2018 PMID: 30013306 PMCID: PMC6043929 DOI: 10.1177/1176935118786260

Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351

Background

Prostate cancer is the most frequently diagnosed cancer in men. The American Cancer Society estimated approximately 161 360 new cases of prostate cancer and about 26 730 deaths from prostate cancer in the United States in 2017.[1] Currently, the 2 main methods for diagnosing prostate cancer are the prostate-specific antigen (PSA) test in conjunction with a digital rectal examination, and transrectal biopsy. PSA, which is measured by an immunoassay, has gained wide acceptance and has been approved by the Food and Drug Administration as a serum tumor marker in the management of prostate cancer.[2] However, studies have shown that some men with low PSA levels (<4.0 ng/mL) have prostate cancer, whereas many men with high PSA levels do not.[3] In addition, PSA screening has been shown to yield little to no reduction in prostate cancer–specific mortality and may be responsible for overdiagnosis and unnecessary treatment.[4] This conflicting evidence on the benefit of PSA makes it an unreliable method for prostate cancer diagnosis.[5,6]

The other commonly used diagnostic method, transrectal ultrasound (TRUS)-guided biopsy, samples 12 cores from the prostate gland and can miss cancers in regions that were not sampled.[7] Even when the biopsy does detect cancer, the localization of the tumor within the gland remains imprecise.[8] Because of the imprecision and low sensitivity of the procedure, patients may need to undergo repeated biopsies or switch to MRI/ultrasound fusion-guided or other types of biopsy.[9] This may lead either to delayed detection of aggressive cancer or to unnecessary repeated invasive biopsies when results are inconclusive.[10]

Recently, multiparametric magnetic resonance (MR) imaging, which combines various functional MRI techniques with conventional T2-weighted imaging, has been established as a method for detecting prostate cancer.[11,12] The functional imaging techniques include diffusion-weighted imaging (DWI), dynamic contrast-enhanced MRI (DCE-MRI), and magnetic resonance spectroscopy imaging (MRSI). Apparent diffusion coefficient (ADC) values from DWI have been used to differentiate prostate tumors from normal tissue because diffusion is lower in prostate tumors than in the normal gland.[13] Several studies have shown that ADC values are associated with patients' Gleason scores (GSs).[14-16] DCE-MRI has also been used to differentiate malignant from normal prostate tissue,[17] and MRSI aims to detect the alterations in cellular metabolism that occur in prostate cancer.[18] Conventional T2-weighted imaging alone is known to be unable to identify tumors within the prostate accurately.[19] To overcome this, DCE-MRI has been combined with DWI to differentiate central gland cancer from benign prostatic hyperplasia,[20] and DWI, DCE-MRI, and MRSI have been combined to predict prostate cancer aggressiveness.[21] For prostate cancer detection, one group combined T2-weighted imaging, DWI, and DCE-MRI,[22] and another combined T2-weighted MRI and ADC maps.[23]
Although combining several data sources can improve prediction quality, extracting complex relationships from multiple sources can be challenging, and advanced predictive models are required in addition to high-quality imaging sources. Machine learning methods such as logistic regression have been proposed to identify prostate cancer.[24] However, a key challenge is class imbalance: the number of instances in one class (eg, indolent disease samples) far exceeds that in the other (eg, highly aggressive cancer samples). If a classifier is built without accounting for class imbalance, its results can be biased toward the majority class. Several methods have been proposed to deal with the class imbalance problem.[25] They can be categorized into 2 groups: cost-sensitive methods and data-level approaches.[26] Cost-sensitive methods use an asymmetric cost function to artificially balance the training process.[27] In contrast, data-level approaches turn the imbalanced problem into a balanced one, either by oversampling the minority class (replicating minority class observations or creating synthetic data)[28,29] or by undersampling the majority class (removing some of its observations).[30,31] For cost-sensitive approaches, model performance relies heavily on the cost parameters, which are not known a priori. Moreover, if the correlation between the predictors and the output variable is weak, as we found to be the case for the multiparametric MRI/MRSI data and the GS, oversampling can degrade the predictive model. Hence, in this study, we used an undersampling approach to deal systematically with class imbalance and developed a noninvasive tool that uses multiparametric imaging data in supervised machine learning methods.

Methods

Patient cohort and specimen octants generation

Data were collected from 11 patients with TRUS-guided biopsy-proven prostate cancer who elected to undergo radical prostatectomy and received MRI/MRSI before surgery. The average PSA level of these patients was 9.4 ng/mL (0.5-29.0 ng/mL). After radical prostatectomy, each prostate specimen was fixed in formalin, and high-resolution MR images were obtained before whole-mount sectioning of the prostate. Axial sections (3 mm) were cut from the specimen using an in-house prostate slicer. Hematoxylin-eosin (H&E) staining was performed on 50-µm sections from each of the slides. Digital images of both the sliced specimens and the pathologic slides were obtained and used to match them to the MR images. After unusable slices were discarded, the remaining 28 slices were subdivided into octants, giving 224 octants; 1 octant was not usable, leaving 223. A pathologist assigned a GS to each octant. In our data set, GSs range from 0 to 8, with 0 indicating that no cancer cells were identified, a nonzero GS ⩽ 6 indicating indolent (slow-growing or nonaggressive) cancer, and GS > 6 indicating aggressive cancer. Figure 1 shows the distribution of GS in our data set.
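The two labelling rules used later in Methods can be made concrete with a minimal Python sketch. The helper names (`aggressive_label`, `cancer_label`, `class_summary`) are hypothetical, and because the paper does not list per-octant scores, the usage only rebuilds the class totals it reports (187 vs 36 for aggressiveness, 96 vs 127 for detection) to show how imbalanced the first task is.

```python
from collections import Counter

def aggressive_label(gs: int) -> int:
    """Task 1 (hypothetical helper): 1 = aggressive (GS > 6), 0 = indolent or no cancer (GS <= 6)."""
    return 1 if gs > 6 else 0

def cancer_label(gs: int) -> int:
    """Task 2 (hypothetical helper): 1 = cancer present (GS > 0), 0 = no cancer identified (GS = 0)."""
    return 1 if gs > 0 else 0

def class_summary(labels):
    """Return (majority count, minority count, majority/minority ratio,
    accuracy of a trivial classifier that always predicts the majority class)."""
    counts = Counter(labels)
    majority, minority = max(counts.values()), min(counts.values())
    return majority, minority, majority / minority, majority / len(labels)

# Octant bookkeeping from the text: 28 slices x 8 octants, minus 1 unusable octant.
assert 28 * 8 - 1 == 223

# Class totals reported in the paper (individual octant scores are not listed).
aggressiveness = [0] * 187 + [1] * 36
detection = [0] * 96 + [1] * 127
print(class_summary(aggressiveness))  # ratio ~5.19, majority-only accuracy ~0.839
print(class_summary(detection))       # ratio ~1.32, majority-only accuracy ~0.570
```

The majority-only accuracy of about 84% for the aggressiveness task illustrates why overall accuracy alone is a misleading target here: a model can score well while never flagging an aggressive octant.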

Figure 1.

Gleason score histogram.

Multiparametric MRI/MRSI

The following images were acquired: (a) conventional T2-weighted (T2W) images, (b) DWI-MRI, (c) DCE-MRI, and (d) MRSI covering the entire prostate, using PRESS localization to obtain an MR spectroscopy score. Sample images are shown in Figure 2. This subject has a tumor in the peripheral zone (arrows); although it is difficult to locate the tumor focus on the T2W image and the T2-map, it can be readily detected on the ADC and Ktrans maps as areas of reduced ADC and elevated Ktrans, respectively. Spectroscopy data from a selected voxel in the same region show an elevated choline level compared with normal tissue and a reduced citrate peak, which is the characteristic signature of higher-grade malignancy in the prostate. For this patient, histopathology with H&E staining confirms the location of the tumor, which has a GS of 7.

Figure 2.

An example of multiparametric imaging of the prostate. Top row: T2-weighted (T2W) image, T2-map, H&E stain (histology). Bottom row: ADC map (DWI), Ktrans map (DCE-MRI), MR spectroscopy. The histology and MR images show the cancer marked by the arrow, and the corresponding spectra from the tumor show low citrate and high choline. ADC indicates apparent diffusion coefficient; DCE-MRI, dynamic contrast-enhanced magnetic resonance imaging; DWI, diffusion-weighted imaging.

From these images, we extracted 4 types of quantitative features for predictive modeling. From T2W imaging, we used the average of the T2 values, which characterize the decay of proton spin magnetization. From DWI-MRI, we extracted the ADC, which measures the magnitude of diffusion. From DCE-MRI, we obtained the volume transfer constant Ktrans, which was estimated using the Tofts kinetic model.[32] From MRSI, we extracted an MR spectroscopy score, which is used to estimate the relative concentrations of biochemical compounds in the target area. The distribution of each of the 4 features was summarized using percentiles (eg, 5th, 10th, 50th, 90th, and 95th percentiles). Then, the average and standard deviation of the values of the voxels within each percentile were calculated as input features for predictive modeling. As an illustration, Figure 3 shows the correlation plot for the average of the 50th percentile features with the GS.
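A minimal sketch of the percentile-based summary, under one reading of the text: for each percentile p, take the octant's voxels whose values lie at or below the p-th percentile and record their mean and standard deviation. That reading, the nearest-rank percentile definition, and the names `percentile_features` and `PERCENTILES` are assumptions for illustration, not the authors' pipeline.

```python
import math
import statistics

PERCENTILES = (5, 10, 50, 90, 95)

def nearest_rank_percentile(values, p):
    """p-th percentile by the nearest-rank definition (one of several in common use)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_features(voxels, percentiles=PERCENTILES):
    """Mean and SD of the voxel values at or below each percentile cut-off.

    ASSUMPTION: "voxels within each percentile" is read as voxels whose value is
    <= the p-th percentile of that octant's voxel values.
    """
    features = {}
    for p in percentiles:
        cutoff = nearest_rank_percentile(voxels, p)
        subset = [v for v in voxels if v <= cutoff]
        features[f"p{p}_mean"] = statistics.fmean(subset)
        features[f"p{p}_sd"] = statistics.stdev(subset) if len(subset) > 1 else 0.0
    return features

# Hypothetical ADC values (arbitrary units) for the voxels of one octant.
adc_voxels = [float(v) for v in range(1, 101)]
adc_features = percentile_features(adc_voxels)
print(adc_features["p50_mean"])  # 25.5: mean of the 50 voxels at or below the median
```

The same function would be applied separately to the T2, ADC, Ktrans, and spectroscopy maps of each octant, giving 10 candidate inputs per feature type.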

Figure 3.

Correlation plot of the average values of the 50th percentile voxels for the features (ADC, Ktrans, spectroscopy score, and T2) and the Gleason scores. ADC indicates apparent diffusion coefficient.

Predictive modeling via supervised machine learning

We considered 2 binary classification problems. In the first, we aim to distinguish aggressive prostate cancer (GS > 6) from indolent disease and absence of cancer (GS ⩽ 6). In the second, we aim to distinguish cancerous samples (GS > 0) from noncancerous samples (GS = 0). Before building a predictive model, it is critical to handle the class imbalance problem. As seen in Figure 1, for the first problem the number of nonaggressive samples (GS ⩽ 6) was 187 and the number of aggressive samples was 36, a ratio of approximately 5:1. When a data set is imbalanced, a typical classifier is biased toward the majority class because it aims to maximize overall accuracy. Because the correlation between the features and the GS was weak, as shown in Figure 3, oversampling approaches may amplify the noise in the data and degrade the quality of the predictive model. Therefore, we addressed class imbalance with undersampling, which removes observations from the majority class to turn the training data set into a balanced one. For the aggressive cancer prediction problem, observations were removed from the majority class, which contained the indolent disease and absence-of-cancer observations. For the cancer foci detection problem, the number of noncancerous samples was 96 and the number of cancerous samples was 127, a ratio of about 1:1.3; this problem is therefore approximately balanced. To extract complex relationships between the multiparametric imaging features and the GS, we applied an ensemble method called boosting,[33] which creates a highly accurate prediction model by combining multiple weak learners. Among boosting methods, we used adaptive boosting, known as AdaBoost in the literature.[34] In our implementation of AdaBoost, we used decision trees as the weak learners; ie, the final classifier is a weighted combination of several decision trees.
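Random undersampling as described above can be sketched in a few lines of Python. The function name `random_undersample` and the placeholder feature rows are hypothetical. The paper does not state whether undersampling is done once per training set or at every boosting round (as in the RUSBoost variant); this sketch does it once, and one would typically apply it only to the training folds so that the test folds keep the original class ratio.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=None):
    """Randomly drop majority-class rows (without replacement) until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        kept = rng.sample(rows, n_min)  # sample without replacement
        X_bal.extend(kept)
        y_bal.extend([label] * n_min)
    return X_bal, y_bal

# Hypothetical feature rows standing in for the 187 GS <= 6 and 36 GS > 6 octants.
X = [[float(i)] for i in range(223)]
y = [0] * 187 + [1] * 36
X_bal, y_bal = random_undersample(X, y, seed=42)
print(Counter(y_bal))  # Counter({0: 36, 1: 36})
```

Unlike oversampling, nothing is duplicated or synthesized, so noise in the weakly correlated features is not amplified; the price is that 151 majority-class rows are unused in each balanced draw.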
For the decision tree classifiers, we used Gini's diversity index to choose the split variable at each node, and the minimum number of observations per leaf node was set to 2. The training set for AdaBoost consisted of feature and label pairs $(x_1, y_1), \ldots, (x_m, y_m)$, where $x_i$ represented the features in domain $X$ and the labels $y_i \in \{-1, +1\}$ were the known outcomes. In each iteration $t = 1, \ldots, T$, where $T$ represented the number of iterations, a distribution $D_t$ over the training samples was computed from the correctly classified and misclassified samples, and a weak learner was applied to find a hypothesis $h_t : X \to \{-1, +1\}$ that minimized the error relative to $D_t$. Initially, $D_1(i) = 1/m$ for all $i$. After all the iterations, $T$ weak learners were obtained. The combined hypothesis is the sign of a weighted combination of the weak hypotheses:

$$H(x) = \operatorname{sign}\left( \sum_{t=1}^{T} \alpha_t h_t(x) \right),$$

where $\alpha_t$ is the weight of the weak classifier $h_t$. An example of AdaBoost is illustrated in Figure 4. Red and blue circles represent 2 different classes. The algorithm starts with equal weights for each observation in the training set at iteration $t = 1$ (Figure 4A). For $t = 1$, the algorithm creates a weak classifier $h_1$, which is represented by the line separating the 2 classes. Based on the results of $h_1$, the algorithm updates the weights of the observations, giving misclassified observations higher weights. For $t = 2$, another weak classifier $h_2$ is created and the weights are updated again (Figure 4B). In this example, the total number of iterations is 3, ie, $T = 3$. Hence, the final classifier is

$$H(x) = \operatorname{sign}\big( \alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x) \big).$$
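To make the update rules concrete, here is a minimal discrete AdaBoost in pure Python. It assumes decision stumps (depth-1 trees chosen by weighted error) as weak learners rather than the deeper Gini-split trees with a minimum leaf size of 2 used in the study, and it uses the standard choices from the AdaBoost literature, $\alpha_t = \tfrac{1}{2}\ln\big((1-\epsilon_t)/\epsilon_t\big)$ and $D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t y_i h_t(x_i))$, since the paper does not spell out its formulas. All function names are hypothetical; in practice a library implementation such as scikit-learn's `AdaBoostClassifier` with a `DecisionTreeClassifier` base learner would be the usual choice.

```python
import math

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump.

    A stump predicts `polarity` when x[feature] > threshold and -polarity otherwise.
    """
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (pol if row[j] > thr else -pol) != yi)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] > thr else -pol

def adaboost_fit(X, y, T=3):
    """Discrete AdaBoost with stumps; labels y must be -1 or +1."""
    m = len(X)
    D = [1.0 / m] * m                          # D_1(i) = 1/m for all i
    ensemble = []
    for _ in range(T):
        stump = fit_stump(X, y, D)             # h_t minimizes error relative to D_t
        eps = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0) and division by 0
        if eps >= 0.5:                         # no better than chance: stop early
            break
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Misclassified points (y_i * h_t(x_i) = -1) are up-weighted, others down-weighted.
        D = [d * math.exp(-alpha * yi * stump_predict(stump, row))
             for d, row, yi in zip(D, X, y)]
        total = sum(D)
        D = [d / total for d in D]             # renormalize to a distribution D_{t+1}
    return ensemble

def adaboost_predict(ensemble, row):
    """H(x) = sign(sum_t alpha_t * h_t(x)), with ties mapped to +1."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 2-feature problem (not study data): positive only when both features are high.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1, -1, -1, 1]
model = adaboost_fit(X, y, T=3)                # T = 3, as in the Figure 4 example
print([adaboost_predict(model, row) for row in X])  # [-1, -1, -1, 1]
```

No single stump separates this toy problem, but after 3 rounds the reweighting has forced the stumps onto different features, and their weighted vote classifies all 4 points correctly, which is the behavior Figure 4 depicts.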

Figure 4.

Weak classifier for different iterations. (A) At iteration t = 1, a weak classifier h1 is created for D1, where each observation has the same weight. (B) At iteration t = 2, after the weights of the observations are updated, a new weak classifier h2 is obtained for D2. (C) At iteration t = 3, the final weak classifier h3 is generated.

To evaluate the performance of the model, we used cross-validation, a general model validation technique for assessing how well the predictions of a model generalize to an independent data set.[35] In this technique, the data are separated into $k$ folds, where $k$ is less than or equal to the number of observations in the data set. One of the folds is held out as the test set, and the remaining folds are used to train the model. This process is repeated so that each fold serves as the test set once, ie, $k$ times. The process is illustrated in Figure 5. In this study, we used 10-fold cross-validation and repeated it 10 times with different random partitions to reduce the dependence of the results on any single partition of the data and to guard against overfitting.
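The repeated 10-fold scheme can be sketched as follows. The helper names are hypothetical, the folds here are plain shuffled splits (the paper does not say whether they were stratified by class), and each repetition reshuffles with a different seed so that the 10 runs use different partitions.

```python
import random

def kfold_indices(n, k, seed=None):
    """Shuffle indices 0..n-1 and deal them into k folds (sizes differ by at most 1)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def repeated_kfold(n, k=10, repeats=10, seed=0):
    """Yield (repeat, train_indices, test_indices); each fold is the test set once per repeat."""
    for r in range(repeats):
        folds = kfold_indices(n, k, seed=seed + r)   # new partition for every repeat
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, train, test

# 223 octants, 10-fold cross-validation repeated 10 times -> 100 train/test splits.
splits = list(repeated_kfold(223))
print(len(splits))                                              # 100
print(sorted(len(f) for f in kfold_indices(223, 10, seed=0)))   # [22, 22, 22, 22, 22, 22, 22, 23, 23, 23]
```

Setting k equal to the number of observations turns this into leave-one-out cross-validation, the upper limit mentioned in the text.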

Figure 5.

Illustration of cross-validation.

Results

After testing the average and standard deviation of different percentiles (eg, 5th, 10th, 50th, 90th, and 95th percentiles) of the 4 imaging features, we found that the averages of the 50th percentile performed best; these 4 features are used to present the results. To separate aggressive prostate cancer (GS > 6) from indolent disease (GS ⩽ 6), we created models using 2 features at a time. Figure 6 shows the probabilities obtained from the classifiers for the 6 possible combinations of the 4 imaging features. Taking Figure 6A as an example, an AdaBoost model was created with ADC and T2 values. Aggressive prostate cancer and indolent disease observations are represented in red and blue, respectively. Each point in the 2-dimensional space is colored red, blue, or a mixture of the 2 according to the probability given by the classifier. The decision boundaries between aggressive prostate cancer (red) and indolent disease (blue) were then obtained from these probabilities (Figure 7). The final classifier divides the 2-dimensional space into blue and red regions. For example, in Figure 7B, given ADC and Ktrans values, the classifier predicts the aggressiveness of the cancer from the color of the region into which a point falls. The actual observations are also shown in the figure.
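The pairwise analysis behind Figures 6 and 7 amounts to enumerating the C(4, 2) = 6 feature pairs and, for each fitted model, thresholding its predicted probability over a 2-dimensional grid. In this sketch, `decision_map`, the 0.5 threshold, and the toy probability function are all assumptions for illustration; any fitted 2-feature classifier that returns a probability of aggressive cancer could be plugged in.

```python
import math
from itertools import combinations

FEATURES = ("ADC", "T2", "Ktrans", "Spectroscore")

def feature_pairs(features=FEATURES):
    """All unordered 2-feature combinations; for 4 features this gives 6 pairs (panels A-F)."""
    return list(combinations(features, 2))

def decision_map(prob_aggressive, x_grid, y_grid, threshold=0.5):
    """Color each grid point red (aggressive) or blue (indolent) by thresholding a probability."""
    return [["red" if prob_aggressive(x, y) >= threshold else "blue" for x in x_grid]
            for y in y_grid]

def toy_prob(adc, ktrans):
    # Hypothetical stand-in for a fitted model: lower ADC and higher Ktrans -> more "aggressive".
    return 1.0 / (1.0 + math.exp(adc - ktrans))

print(feature_pairs()[1])                               # ('ADC', 'Ktrans'), as in panel B
print(decision_map(toy_prob, [0.5, 1.5], [0.5, 1.5]))   # [['red', 'blue'], ['red', 'red']]
```

The probability surface corresponds to Figure 6 and the thresholded red/blue regions to Figure 7; a finer grid simply traces the decision boundary more smoothly.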

Figure 6.

Probability from AdaBoost representing aggressive prostate cancer (red) and indolent disease (blue) using combinations of 2 imaging features. (A) ADC and T2, (B) ADC and Ktrans, (C) ADC and Spectroscore, (D) T2 and Ktrans, (E) T2 and Spectroscore, (F) Ktrans and Spectroscore.

Figure 7.

Classifiers separating aggressive prostate cancer (red) and indolent disease (blue) from AdaBoost using combinations of 2 imaging features. (A) ADC and T2, (B) ADC and Ktrans, (C) ADC and Spectroscore, (D) T2 and Ktrans, (E) T2 and Spectroscore, (F) Ktrans and Spectroscore.

Figures 8 and 9 show the quantitative results of our methods from repeated 10-fold cross-validation. For distinguishing aggressive prostate cancer from indolent disease (Figure 8), the averages and corresponding 95% confidence intervals of AUC, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were 0.73 (0.72-0.74), 0.72 (0.71-0.73), 0.73 (0.71-0.75), 0.34 (0.33-0.37), and 0.93 (0.92-0.94), respectively. For cancer foci detection (Figure 9), ie, classification between the absence of cancer (GS = 0) and presence of cancer (GS > 0), the averages and corresponding 95% confidence intervals of AUC, sensitivity, specificity, NPV, and PPV were 0.68 (0.66-0.70), 0.73 (0.70-0.77), 0.62 (0.60-0.68), 0.73 (0.71-0.76), and 0.65 (0.62-0.68), respectively.
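For reference, the reported metrics can be computed from per-fold predictions as sketched below. The function names are hypothetical, AUC is computed with the rank (Mann-Whitney) formulation, and because the paper does not state how its 95% confidence intervals were obtained, `mean_ci95` assumes a normal approximation over the repeated runs.

```python
import math
import statistics

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when the denominator is 0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_ci95(values):
    """Mean and normal-approximation 95% CI across repeated runs (ASSUMED method)."""
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width

print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

PPV and NPV depend on class prevalence as well as on sensitivity and specificity, so with only about 16% aggressive octants a modest false-positive rate is enough to pull PPV well below sensitivity.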

Figure 8.

Summary of prostate cancer aggressiveness classification accuracy from 10 runs of 10-fold cross-validation showing the average and 95% confidence intervals of AUC, sensitivity, specificity, PPV, and NPV.

Figure 9.

Summary of prostate cancer foci detection accuracy from 10 runs of 10-fold cross-validation showing the average and 95% confidence intervals of AUC, sensitivity, specificity, PPV, and NPV.

Discussion

The current methods for prostate cancer diagnosis include PSA testing and transrectal biopsy. The accuracy of PSA testing is low, with a sensitivity of around 20% for detecting any prostate cancer and around 50% for detecting high-grade prostate cancer.[36] Biopsy is more reliable than PSA testing for prostate cancer diagnosis, but it is invasive. In a recent study, the reliability of a 12-core biopsy for prostate cancer detection was evaluated.[4] For patients with PSA levels <4 ng/mL, 4-10 ng/mL, and >10 ng/mL, the sensitivities were 40%, 63%, and 76%, respectively, and the average sensitivity for the whole test group was 59%. Our noninvasive supervised learning tool using multiparametric MRI/MRSI achieved an average sensitivity of 73% for cancer foci detection, higher than these reported sensitivities for PSA testing and biopsy. When predicting prostate cancer aggressiveness, previous studies excluded noncancerous observations (GS = 0). In this study, we included these observations. Although this made the classification problem more difficult (as seen in Figure 6, the positive class [GS > 6] and the negative class [GS ⩽ 6] are very close to each other), it is more realistic, and we were still able to achieve an average AUC of 0.73 for prostate cancer aggressiveness prediction. A potential limitation of this study is that all our data came from patients with prostate cancer, and we did not have healthy prostate data as a control. However, many specimens were not cancerous (Figure 1). We also tested the correlation between the GSs of adjacent specimens: the correlation coefficient was 0.3, and for specimens 1 slice further apart it was 0.004. We therefore considered it reasonable to treat the specimens as independent observations. We plan to include healthy prostate data in the future to test our tool. It was critical to be able to handle class imbalance when predicting prostate cancer aggressiveness.
In practice, aggressive cancer represents only a small portion of the whole prostate. However, it is very important for clinicians to be able to identify aggressive cancer so that personalized treatment can be given. Dealing with class imbalance is still an ongoing research topic in machine learning, and few studies have addressed this issue in prostate cancer prediction. In this study, the number of observations in one class (GS ⩽ 6) significantly outnumbered the other class (GS > 6), with a ratio of about 5:1 (187/36). We demonstrated that using undersampling in an AdaBoost model was an effective way of handling class imbalance for prostate cancer aggressiveness prediction. After a prostate cancer diagnosis, many types of treatment are available, including radiotherapy, endocrine therapy, and surgery. For men diagnosed with aggressive cancer, the goal is to keep the disease from spreading. Physicians can treat these patients with localized therapies such as surgery and radiotherapy, and systemic treatments such as hormonal therapy can also be used. A recent study showed that combining different treatments improves the survival of patients with Gleason scores of 9 and 10.[37] If aggressive prostate cancer can be identified early using the tool provided in this work, physicians can consider these types of treatment.

Conclusions

Our results on both the cancer foci detection and aggressiveness classification problems showed that combining multiparametric MRI/MRSI with machine learning could provide clinicians with a more accurate predictive tool for prostate cancer assessment. Adaptive boosting with random undersampling could accurately identify highly aggressive prostate cancer. This noninvasive method could allow for nonsubjective disease characterization, providing physicians with information to make personalized treatment decisions.

References (26 in total)

1. Exploratory undersampling for class-imbalance learning.

Authors: Xu-Ying Liu; Jianxin Wu; Zhi-Hua Zhou
Journal: IEEE Trans Syst Man Cybern B Cybern Date: 2008-12-16

2. Prostate MRI: evaluating tumor volume and apparent diffusion coefficient as surrogate biomarkers for predicting tumor Gleason score.

Authors: Olivio F Donati; Asim Afaq; Hebert Alberto Vargas; Yousef Mazaheri; Junting Zheng; Chaya S Moskowitz; Hedvig Hricak; Oguz Akin
Journal: Clin Cancer Res Date: 2014-05-21

3. Dynamic contrast-enhanced magnetic resonance imaging in prostate cancer clinical trials: potential roles and possible pitfalls.

Authors: Fiona M Fennessy; Rana R McKay; Clair J Beard; Mary-Ellen Taplin; Clare M Tempany
Journal: Transl Oncol Date: 2014-02-01

4. In vivo magnetic resonance spectroscopy in cancer. (Review)

Authors: Robert J Gillies; David L Morse
Journal: Annu Rev Biomed Eng Date: 2005

5. How reliable is 12-core prostate biopsy procedure in the detection of prostate cancer?

Authors: Ege Can Serefoglu; Serkan Altinova; Nevzat Serdar Ugras; Egemen Akincioglu; Erem Asil; M Derya Balbay
Journal: Can Urol Assoc J Date: 2013-05-13

6. Automatic classification of prostate cancer Gleason scores from multiparametric magnetic resonance images.

Authors: Duc Fehr; Harini Veeraraghavan; Andreas Wibmer; Tatsuo Gondo; Kazuhiro Matsumoto; Herbert Alberto Vargas; Evis Sala; Hedvig Hricak; Joseph O Deasy
Journal: Proc Natl Acad Sci U S A Date: 2015-11-02

7. Multiparametric magnetic resonance imaging in the detection of prostate cancer. (Review)

Authors: T Durmus; A Baur; B Hamm
Journal: Aktuelle Urol Date: 2014-04-03

8. Prevalence of prostate cancer among men with a prostate-specific antigen level ≤4.0 ng per milliliter.

Authors: Ian M Thompson; Donna K Pauler; Phyllis J Goodman; Catherine M Tangen; M Scott Lucia; Howard L Parnes; Lori M Minasian; Leslie G Ford; Scott M Lippman; E David Crawford; John J Crowley; Charles A Coltman
Journal: N Engl J Med Date: 2004-05-27

9. Performance of multiparametric magnetic resonance imaging in the evaluation and management of clinically low-risk prostate cancer. (Review)

Authors: Seyed Saeid Dianat; H Ballentine Carter; Katarzyna J Macura
Journal: Urol Oncol Date: 2013-06-17

10. Estimating kinetic parameters from dynamic contrast-enhanced T1-weighted MRI of a diffusable tracer: standardized quantities and symbols. (Review)

Authors: P S Tofts; G Brix; D L Buckley; J L Evelhoch; E Henderson; M V Knopp; H B Larsson; T Y Lee; N A Mayr; G J Parker; R E Port; J Taylor; R M Weisskoff
Journal: J Magn Reson Imaging Date: 1999-09