
Diagnostic Performance and Interobserver Consistency of the Prostate Imaging Reporting and Data System Version 2: A Study on Six Prostate Radiologists with Different Experiences from Half a Year to 17 Years.

Zan Ke1, Liang Wang1, Xiang-De Min1, Zhao-Yan Feng1, Zhen Kang1, Pei-Pei Zhang1, Ba-Sen Li1, Hui-Juan You1, Sheng-Chao Hou2.   

Abstract

BACKGROUND: One of the main aims of the updated Prostate Imaging Reporting and Data System Version 2 (PI-RADS v2) is to diminish variation in the interpretation and reporting of prostate imaging, especially among readers with varied experience levels. This study aimed to retrospectively analyze diagnostic consistency and accuracy for prostate disease among six radiologists with different experience levels from a single center and to evaluate the diagnostic performance of PI-RADS v2 scores in the detection of clinically significant prostate cancer (PCa).
METHODS: From December 2014 to March 2016, 84 PCa patients and 99 benign prostatic hyperplasia patients who underwent 3.0T multiparametric magnetic resonance imaging before biopsy were included in our study. All patients received evaluation according to the PI-RADS v2 scale (scores of 1-5) from six blinded readers (with 6 months and 2, 3, 4, 5, and 17 years of experience, respectively; the last reader was a reviewer/contributor for the PI-RADS v2). The correlation among the readers' scores and the Gleason score (GS) was determined with the Kendall test. Intra-/inter-observer agreement was evaluated using κ statistics, while receiver operating characteristic curve and area under the curve (AUC) analyses were performed to evaluate the diagnostic performance of the scores.
RESULTS: Based on the PI-RADS v2, the mean κ score and standard error among all possible pairs of readers were 0.506 and 0.043, respectively; the average correlation between the six readers' scores and the GS was positive, exhibiting weak-to-moderate strength (r = 0.319, P = 0.006). The AUC values of the six radiologists were 0.883, 0.924, 0.927, 0.932, 0.929, and 0.947, respectively.
CONCLUSION: The inter-reader agreement for the PI-RADS v2 among the six readers with different experience is weak to moderate. Different experience levels affect the interpretation of MRI images.

Keywords:  Benign Prostatic Hyperplasia; Diagnosis; Magnetic Resonance Imaging; Prostate Cancer; Prostate Imaging Reporting and Data System Version 2

Year:  2018        PMID: 29998885      PMCID: PMC6048930          DOI: 10.4103/0366-6999.235872

Source DB:  PubMed          Journal:  Chin Med J (Engl)        ISSN: 0366-6999            Impact factor:   2.628


INTRODUCTION

For the past decade, benign prostatic hyperplasia (BPH) and prostate cancer (PCa) have remained the most common diseases of the prostate. In 2017, 161,360 new PCa cases and 26,730 PCa deaths were projected to occur in the United States according to the American Cancer Society.[1] PCa is the third leading cause of cancer-related death among males,[1] after lung/bronchus and colon/rectum-related neoplastic diseases, and prostate disease remains a significant challenge not only for urologists and oncologists, but also for radiologists. Advances in computer software and hardware have led to multiparametric magnetic resonance imaging (mp-MRI), which combines anatomical T2-weighted imaging (T2WI) with functional MRI sequences, such as diffusion-weighted imaging (DWI), apparent diffusion coefficient (ADC) maps, or dynamic contrast-enhanced (DCE) imaging,[2,3] and has become the preferred imaging method for the prostate and periprostatic structures. This approach provides more accurate localization and high-quality images for the detection of prostate diseases, especially PCa.[4,5,6,7] Due to differences in magnetic resonance scanners, acquisition parameter settings, and subjective evaluation criteria, the interpretation of mp-MRI findings differs among radiologists. Therefore, how to unify diagnostic systems and bridge the gap between different radiologists is increasingly recognized as an important clinical problem.
To address this issue, the European Society of Urogenital Radiology (ESUR) launched the first version of a global prostate standardization guide called the Prostate Imaging Reporting and Data System (PI-RADS; herein referred to as the PI-RADS v1) in 2012.[8] The PI-RADS v1 was widely distributed, but some limitations in its clinical application caused significant controversy regarding inter-reader reproducibility and the feasibility of the guidelines.[9,10,11,12,13] First, the PI-RADS v1 does not include a rating scheme, and no weights for individual parameters were defined. Second, the PI-RADS v1 does not combine all imaging sequences into a comprehensive assessment.[14] Third, the value of DCE in evaluating the transition zone (TZ) is overestimated by the PI-RADS v1,[15] and no value was assigned or recommended for DCE. In addition, DWI in the peripheral zone (PZ) has previously been reported to exhibit superior performance.[12] Considering these issues, in 2014, the updated PI-RADS version 2 (herein referred to as the PI-RADS v2) was released by the International Collaboration of the American College of Radiology, ESUR, and AdMetech Foundation, based on the best available evidence and expert consensus opinion worldwide.[16] Compared with the PI-RADS v1, the PI-RADS v2 has a simplified scoring system and uses only a 5-point scale for comprehensive evaluation of all imaging sequences. In addition, the PI-RADS v2 uses a more differentiated weighting system based on the concept of dominant techniques: it does not recommend magnetic resonance spectroscopic (MRS) imaging for PI-RADS assessment but rather designates DWI as the dominant sequence in the PZ and T2WI as the dominant sequence in the TZ. Finally, the PI-RADS v2 recommends optimal technical parameters for T2WI, DWI, and DCE sequences and introduces a new size threshold of 15 mm for T2WI, DWI, and ADC to differentiate between PI-RADS scores of 4 and 5.
Moreover, the aims of the PI-RADS v2 are to improve the detection of clinically significant cancer and increase the accuracy of risk assessment for patients with suspected PCa, to enhance diagnostic confidence in benign diseases, to establish the most simplified MRI capture process globally and diminish variation in the acquisition and interpretation of prostate images, and to promote communication between clinicians and radiologists.[16,17] Based on these advantages and aims of the PI-RADS v2, investigation of the utility of the PI-RADS among readers is crucial, not only between two readers for a small number of cases, but also among more readers with varying experience levels for a high number of cases. Therefore, the purpose of this study was to retrospectively analyze consistency and accuracy in diagnosing prostate disease among six radiologists with different experience levels and to evaluate the diagnostic performance of the PI-RADS v2 in detecting clinically significant PCa.

METHODS

Ethical approval

This retrospective, single-center study was approved by the Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology Institutional Review Board, and written informed consent was provided by all patients before examination.

Study design

Between December 2014 and March 2016, patients with clinically suspected PCa due to elevated prostate-specific antigen (PSA) levels and/or abnormal signal nodules were recruited for this study. Initially, we reviewed 317 patients who underwent 3.0T prostate MRI. However, 134 patients were excluded for the following reasons: (1) the patients had no histopathologically confirmed results (n = 30); (2) the patients underwent prior treatment, including surgical therapies, irradiation, cryosurgery, or hormonotherapy (n = 15); (3) previous biopsies were performed within 6 weeks before the MR examination (n = 2); (4) DCE imaging was not performed in the patient due to renal dysfunction and/or unwillingness to undergo the procedure (n = 84); and (5) the quality of the MRI images was poor due to movement artifacts, catheter artifacts, or the presence of hip implants (n = 3). Finally, the remaining 183 patients were included and a flowchart of the patient selection process is provided in Figure 1.
Figure 1

Flowchart for the selection of patients in the present study. mp-MRI: Multiparametric magnetic resonance imaging; DCE: Dynamic contrast enhanced.


Magnetic resonance imaging protocol

All examinations were performed with a 3.0T system (MAGNETOM Skyra, Siemens Healthcare, Erlangen, Germany), using an anterior 18-element body coil combined with a posterior spine coil array. The scan sequences included T2WI, DWI, and DCE, which were performed using the parameters shown in Table 1. The DWI b values were 0, 50, 200, 400, 600, 800, 1000, and 1500 s/mm². ADC maps were automatically reconstructed for qualitative and quantitative assessments of DWI. Axial DCE images were obtained before, during, and after rapid injection of gadolinium chelate (35 phases, 8 s per phase) using a power injector (Medtron, Saarbruecken, Germany), followed by a 20 ml saline flush injected at a rate of 2.5 ml/s. All axial sequences were acquired at the same slice locations.
Table 1

mp-MR imaging sequence parameters at 3.0T

| Parameter | T2WI | T1WI | DWI | DCE |
|---|---|---|---|---|
| Repetition time (ms) | 6874.00 | 807.00 | 4500.00 | 5.08 |
| Echo time (ms) | 104.00 | 13.00 | 85.00 | 1.77 |
| Section thickness (mm) | 3.00 | 5.00 | 3.00 | 3.50 |
| Intersection gap (mm) | 0 | 0 | 0 | 0.70 |
| Field of view (mm²) | 180 × 180 | 300 × 356 | 214 × 171 | 260 × 260 |
| Matrix | 384 × 384 | 320 × 240 | 90 × 72 | 192 × 154 |
| Parallel imaging factor | 2 | NA | 2 | 2 |
| Flip angle (°) | 160 | 160 | 90 | 15 |
| Time of acquisition (s) | 196 | 186 | 248 | 284 |

T2WI: T2-weighted imaging; T1WI: T1-weighted imaging; DWI: Diffusion-weighted imaging; DCE: Dynamic contrast enhanced; NA: Not applicable; mp-MRI: Multiparametric magnetic resonance imaging.


Magnetic resonance imaging interpretation and PI-RADS scoring

For each patient, mp-MRI images of the prostate were shown to six independent readers with varying levels of experience in the diagnosis of prostate diseases (reader 1, Zhen Kang, with 6 months of experience [approximately 100 examinations]; reader 2, Pei-Pei Zhang, with 2 years of experience [approximately 400 examinations]; reader 3, Zan Ke, with 3 years of experience [approximately 600 examinations]; reader 4, Xiang-De Min, with 4 years of experience [approximately 800 examinations]; reader 5, Zhao-Yan Feng, with 5 years of experience [approximately 1000 examinations]; and reader 6, Liang Wang, with 17 years of experience [approximately 10,000 examinations], who was a reviewer/contributor for the PI-RADS v2). These six readers were blinded to all identifying information of the patients and their clinicopathologic outcomes. During scoring, the T2WI, DWI, and DCE images of each patient were shown to the readers at the same location on a single screen by an assistant fellow who assigned a scoring region but was not involved in the scoring process. After browsing all sequences, each reader independently provided a single score (on a scale from 1 to 5) based on the PI-RADS v2 and on his or her own experience and comprehensive judgment. After 2 weeks, reader 3 repeated the scoring process to test intrareader reproducibility.

Pathologic evaluation

After the MRI examination, all patients underwent a 12-core transrectal ultrasound (TRUS)-guided prostate biopsy (within 6 weeks; median: 1 week) to obtain tissue samples for histopathological examination. To match biopsy sextants and MR images, the prostate was divided into 12 regions, and each specimen was individually labeled according to its location and histologically analyzed. The targeted biopsy was performed using an ultrasound system (Hawk 2102, BK Medical, Denmark) equipped with a 5.1-MHz endocavitary probe and a spring-loaded biopsy gun with an 18G core biopsy needle; a single urologist with 20 years of experience performed these biopsies. The samples were assessed by a genitourinary pathologist with more than 10 years of experience, who was blinded to the MRI results. The cases were obtained from standard pathologic reports, and each sample was histologically classified as cancerous or noncancerous and assigned a Gleason score (GS) if it was classified as PCa. Finally, we selected the GS matching the scoring region in the MRI images as the final GS.

Statistical analysis

SPSS 19.0 (SPSS, Chicago, IL, USA) and MedCalc version 11.4.2.0 (MedCalc statistical software, Mariakerke, Belgium) were used for the data analysis, and all data were expressed as the mean ± standard deviation (SD). The normality and equality of variances of the parameter value distributions were tested by the Kolmogorov-Smirnov test and Levene's F-test. Differences in reader grouping variables were evaluated by the Kruskal-Wallis H-test and a comparison between all possible pairs of readers was performed using the Nemenyi test. Intra- and inter-reader agreement was evaluated using κ statistics,[18] and κ coefficients were assessed as follows:[19] 0.01–0.20: slight agreement; 0.21–0.40: fair agreement; 0.41–0.60: moderate agreement; 0.61–0.80: substantial agreement; and 0.81–0.99: almost perfect agreement. The correlation among the readers' scores of PCa and the GS was determined with the Kendall τ correlation coefficient (presented as “r”), which is a nonparametric statistical method used for variables that do not meet normality. The r ranged from −1 to 1, with 1 corresponding to a 100% positive correlation, −1 corresponding to a 100% negative correlation, and 0 corresponding to independence.[18] The Wald test was used to obtain the P value of the final Kendall τ estimate. A receiver operating characteristic (ROC) curve analysis was performed, and the area under the curve (AUC) was obtained to evaluate diagnostic performance. The AUC values from the six readers were compared using the Z-test. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated by dichotomizing the PI-RADS scores at cutoff values of 3 and 4, which were used as the thresholds to distinguish benign cases from cancer and, in PCa patients, low-risk cancer (defined as GS ≤ 3 + 4 = 7) from clinically significant cancer (defined as GS ≥ 4 + 3 = 7),[16] respectively. A P < 0.05 was used to identify a statistically significant difference.
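As an illustration of the agreement analysis described above, the following Python sketch computes Cohen's κ for one pair of readers and maps it onto the descriptive bands listed in the text. The study does not state whether a weighted or unweighted κ was used, so the unweighted form is an assumption here. The function names and the toy scores are hypothetical and are not study data.

```python
from collections import Counter

# Upper bounds of the descriptive bands quoted in the text (0.01-0.20, 0.21-0.40, ...).
# Using the upper bound alone closes the small gaps between bands (an assumption).
BANDS = [
    (0.20, "slight"),
    (0.40, "fair"),
    (0.60, "moderate"),
    (0.80, "substantial"),
    (0.99, "almost perfect"),
]


def cohen_kappa(scores_a, scores_b):
    """Unweighted Cohen's kappa for two readers scoring the same cases."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(scores_a)
    # Observed agreement: fraction of cases given identical scores.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Chance agreement from each reader's marginal score frequencies.
    freq_a, freq_b = Counter(scores_a), Counter(scores_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both readers use one category")
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Return the descriptive label for a kappa value."""
    if kappa < 0.01 or kappa > 0.99:
        return "outside the listed bands"
    for upper, label in BANDS:
        if kappa <= upper:
            return label


# Hypothetical PI-RADS scores (1-5) from two readers for ten cases.
reader_x = [2, 3, 4, 5, 2, 4, 5, 3, 1, 4]
reader_y = [2, 3, 3, 5, 2, 4, 4, 3, 2, 4]
toy_kappa = cohen_kappa(reader_x, reader_y)
print(round(toy_kappa, 3), agreement_band(toy_kappa))
```

Because PI-RADS scores are ordinal, a weighted κ that penalizes a 2-versus-5 disagreement more than a 2-versus-3 disagreement would be a reasonable alternative; the sketch treats all disagreements equally.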

RESULTS

Patient characteristics

One-hundred and eighty-three patients were included in this retrospective, single-center study, including 84 patients who were diagnosed with PCa and 99 patients whose 1188 biopsy specimens were all diagnosed as benign hyperplasia tissue, representing the BPH group in the study. The mean age of our study population was 65.4 ± 8.5 years (range: 46–88 years). The mean PSA level was 134.48 ± 230.97 ng/ml (range: 1.51–1000.00 ng/ml, excluding one patient without a PSA value) in the PCa group and 14.29 ± 19.17 ng/ml (range: 0.26–115.04 ng/ml, excluding six patients without a PSA value) in the BPH group. The biopsy results confirmed clinically significant PCa (GS ≥ 4 + 3 = 7) in 58 (69%) patients and low-risk PCa (GS ≤ 3 + 4 = 7) in 26 (31%) patients. Patient characteristics are shown in Table 2.
Table 2

Characteristics of patients enrolled in this study

| Characteristics | Total | PCa | BPH |
|---|---|---|---|
| Number of patients | 183 | 84 | 99 |
| Age (years), mean (range) | 65.4 (46.0–88.0) | 66.1 (50.0–88.0) | 64.9 (46.0–85.0) |
| PSA (ng/ml), mean (range) | 70.97 (0.26–1000.00) | 134.48 (1.51–1000.00) | 14.29 (0.26–115.04) |
| Prostate volume (ml), mean (range) | 54.61 (12.31–271.47) | 49.02 (12.31–271.47) | 59.34 (13.31–232.55) |
| Clinically significant PCa, n (%) | 58 (31.7) | 58 (69.0) | NA |
| Low-risk PCa, n (%) | 26 (14.2) | 26 (31.0) | NA |
| GS, n | | | |
| GS of 2 + 3 | NA | 1 | NA |
| GS of 3 + 3 | NA | 7 | NA |
| GS of 3 + 4 | NA | 18 | NA |
| GS of 4 + 3 | NA | 14 | NA |
| GS of 4 + 4 | NA | 25 | NA |
| GS of 4 + 5 | NA | 7 | NA |
| GS of 5 + 4 | NA | 10 | NA |
| GS of 5 + 5 | NA | 2 | NA |
| Clinical stage, n | | | |
| cT2a | NA | 10 | NA |
| cT2b | NA | 15 | NA |
| cT2c | NA | 1 | NA |
| cT3a | NA | 7 | NA |
| cT3b | NA | 32 | NA |
| cT4 | NA | 19 | NA |

PSA: Prostate-specific antigen; PCa: Prostate cancer; BPH: Benign prostatic hyperplasia; NA: Not applicable; GS: Gleason score.


Interobserver agreement

The κ statistics of all possible pairs of readers were calculated, and pairwise κ statistics and standard errors are shown in Table 3. In general, the inter-reader agreement was weak to moderate, while the intrareader agreement was good (κ = 0.788 for the two readings of reader 3). The mean κ statistic and standard error among all possible pairs of readers for the PI-RADS v2 were 0.506 and 0.043, respectively. Figures 2 and 3 show representative lesions for PCa and BPH with inter-reader variability, respectively.
Table 3

Pair-wise inter-reader κ statistic of the PI-RADS v2 (n = 183)

| Reader pairs† | κ score* | Standard error |
|---|---|---|
| 1 and 2 | 0.475 | 0.043 |
| 1 and 3a | 0.558 | 0.044 |
| 1 and 4 | 0.478 | 0.043 |
| 1 and 5 | 0.369 | 0.038 |
| 1 and 6 | 0.455 | 0.045 |
| 2 and 3a | 0.553 | 0.045 |
| 2 and 4 | 0.536 | 0.044 |
| 2 and 5 | 0.441 | 0.041 |
| 2 and 6 | 0.620 | 0.043 |
| 3a and 4 | 0.569 | 0.045 |
| 3a and 5 | 0.385 | 0.040 |
| 3a and 6 | 0.520 | 0.045 |
| 4 and 5 | 0.642 | 0.041 |
| 4 and 6 | 0.559 | 0.044 |
| 5 and 6 | 0.430 | 0.039 |
| Mean | 0.506 | 0.043 |
| 3a and 3b | 0.788 | 0.036 |

*κ score of the overall PI-RADS score; †3a: the first score of reader 3; 3b: the second score of reader 3 (the second scoring was performed two weeks after the first). PI-RADS v2: Prostate Imaging Reporting and Data System Version 2.
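The summary row of Table 3 can be checked directly from the 15 pairwise values. The short Python sketch below recomputes the arithmetic mean and, for comparison, the median, and picks out the reader pairs with the lowest and highest agreement. The variable names are illustrative; the values are transcribed from Table 3.

```python
from statistics import mean, median

# Pairwise kappa values from Table 3 ("3a" is the first reading of reader 3).
pairwise_kappa = {
    ("1", "2"): 0.475, ("1", "3a"): 0.558, ("1", "4"): 0.478,
    ("1", "5"): 0.369, ("1", "6"): 0.455, ("2", "3a"): 0.553,
    ("2", "4"): 0.536, ("2", "5"): 0.441, ("2", "6"): 0.620,
    ("3a", "4"): 0.569, ("3a", "5"): 0.385, ("3a", "6"): 0.520,
    ("4", "5"): 0.642, ("4", "6"): 0.559, ("5", "6"): 0.430,
}

values = list(pairwise_kappa.values())
summary_mean = round(mean(values), 3)
summary_median = round(median(values), 3)

# Reader pairs with the weakest and strongest agreement.
lowest_pair = min(pairwise_kappa, key=pairwise_kappa.get)
highest_pair = max(pairwise_kappa, key=pairwise_kappa.get)

print("mean:", summary_mean, "median:", summary_median)
print("lowest:", lowest_pair, "highest:", highest_pair)
```

With the values as transcribed, the arithmetic mean is 0.506, matching the summary row, whereas the median of the same 15 values is 0.520. The lowest agreement is between readers 1 and 5 and the highest between readers 4 and 5.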

Figure 2

Images from a 55-year-old man who was diagnosed with PCa (GS = 3 + 3 = 6), with a PSA level of 22.43 ng/ml. The readers evaluated the prostate based on (a) T2WI, (b) DWI (b = 1500 s/mm²), and (c) an axial early DCE image, and the results were confirmed by (d) a pathology image (hematoxylin and eosin staining, ×200). Five of the six readers did not note the right peripheral zone lesion; only reader 6 noticed it. The DCE image showed that the lesion presented slight early enhancement, but the other five readers considered the prostate as a whole to be negative for DCE. Finally, the overall PI-RADS v2 scores assigned by the six readers were 2, 2, 2, 2, 2, and 4, respectively. T2WI: T2-weighted imaging; DWI: Diffusion-weighted imaging; DCE: Dynamic contrast enhanced; PSA: Prostate-specific antigen; PCa: Prostate cancer; GS: Gleason score; PI-RADS v2: Prostate Imaging Reporting and Data System Version 2.

Figure 3

Images from a 73-year-old man who was diagnosed with BPH, with a PSA level of 15.948 ng/ml. The readers evaluated the prostate based on (a) T2WI (the asterisk represents a urethral catheter), (b) DWI (b = 1500 s/mm²), and (c) an axial early DCE image, and the results were confirmed by (d) a pathology image (hematoxylin and eosin staining, ×200). All the readers considered the prostate as a whole to be negative for DCE, with only slight diffusion restriction on DWI. Finally, the overall PI-RADS v2 scores assigned by the six readers were 2, 3, 2, 2, 3, and 3, respectively. T2WI: T2-weighted imaging; DWI: Diffusion-weighted imaging; DCE: Dynamic contrast enhanced; PSA: Prostate-specific antigen; BPH: Benign prostatic hyperplasia; PI-RADS v2: Prostate Imaging Reporting and Data System Version 2.


Differences in grouping variables

The data did not conform to the criteria for normality or homogeneity of variance. The Kruskal-Wallis H-test showed significant overall differences among the readers, and the Nemenyi test indicated significant differences for some of the reader pairs [Supplementary Table 1]. Significant differences among the six readers were identified for all 183 patients (F = 39.42, P < 0.001), for the 84 PCa patients (F = 32.09, P < 0.001), and for the 99 BPH patients (F = 97.45, P < 0.001).
Supplementary Table 1

Overall and pair-wise inter-reader differences according to PI-RADS v2 score

| Reader pairs | All patients: F | All patients: P | PCa: F | PCa: P | BPH: F | BPH: P |
|---|---|---|---|---|---|---|
| Overall (N = 183; PCa n = 84; BPH n = 99) | 39.42 | 0.00 | 32.09 | 0.00 | 97.45 | 0.00 |
| 1 and 2 | 0.91 | 0.63 | 5.29 | 0.07 | 0.19 | 0.91 |
| 1 and 3a | 4.15 | 0.13 | 0.81 | 0.67 | 9.47 | 0.01 |
| 1 and 4 | 5.35 | 0.07 | 8.26 | 0.02 | 7.19 | 0.03 |
| 1 and 5 | 14.16 | 0.00 | 12.22 | 0.00 | 35.72 | 0.00 |
| 1 and 6 | 0.35 | 0.84 | 8.41 | 0.01 | 0.67 | 0.72 |
| 2 and 3a | 8.94 | 0.01 | 10.25 | 0.01 | 12.34 | 0.00 |
| 2 and 4 | 1.85 | 0.40 | 0.33 | 0.85 | 5.04 | 0.08 |
| 2 and 5 | 7.89 | 0.02 | 1.43 | 0.49 | 30.70 | 0.00 |
| 2 and 6 | 0.13 | 0.94 | 0.36 | 0.83 | 1.57 | 0.46 |
| 3a and 4 | 18.93 | 0.00 | 14.26 | 0.00 | 33.15 | 0.00 |
| 3a and 5 | 33.64 | 0.00 | 19.34 | 0.00 | 81.97 | 0.00 |
| 3a and 6 | 6.89 | 0.03 | 14.45 | 0.00 | 5.11 | 0.08 |
| 4 and 5 | 2.10 | 0.35 | 0.39 | 0.82 | 10.86 | 0.00 |
| 4 and 6 | 2.98 | 0.23 | 0.00 | 1.00 | 12.23 | 0.00 |
| 5 and 6 | 10.08 | 0.01 | 0.35 | 0.84 | 46.14 | 0.00 |

PCa: Prostate cancer; BPH: Benign prostatic hyperplasia; PI-RADS v2: Prostate Imaging Reporting and Data System Version 2.


Correlations of the Prostate Imaging Reporting and Data System with pathologic results

The bivariate data in our study did not meet the requirements for normality, and the correlations between the PI-RADS v2 scores of the readers and the GSs of the pathologic results are shown in Table 4. On the basis of the PI-RADS v2, the average correlation between the six readers' scores and the GS was positive (r = 0.319; P = 0.006), exhibiting significance and weak-to-moderate strength. The scores of reader 3 were most strongly correlated with the GS (r = 0.464; P < 0.001), while the correlation between the scores of readers 2, 4, and 6 and the GS was weak.
Table 4

Correlation coefficient of Kendall test and P values between six readers’ PI-RADS v2 scores and GS on PCa patients (n = 84)

| Reader | r | P |
|---|---|---|
| 1 | 0.377 | 0.000 |
| 2 | 0.284 | 0.004 |
| 3a | 0.464 | 0.000 |
| 4 | 0.253 | 0.011 |
| 5 | 0.306 | 0.002 |
| 6 | 0.231 | 0.020 |
| Mean | 0.319 | 0.006 |

PCa: Prostate cancer; GS: Gleason score; PI-RADS v2: Prostate Imaging Reporting and Data System Version 2.
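For readers who want to see how a rank correlation of this kind is obtained, the Python sketch below implements Kendall's τ-b, the tie-corrected variant. The paper does not specify which variant of Kendall's τ was used, so τ-b is an assumption, chosen because both PI-RADS scores and Gleason scores contain many ties. The function name and the toy data are hypothetical. The last lines recompute the mean of the six coefficients transcribed from Table 4.

```python
from math import sqrt
from statistics import mean


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x) - 1):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: contributes to no term
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    # Pairs not tied in x times pairs not tied in y.
    denominator = sqrt((untied + tied_y_only) * (untied + tied_x_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical example: PI-RADS scores and Gleason sums for six lesions.
toy_pirads = [3, 4, 4, 5, 5, 5]
toy_gleason_sum = [6, 7, 7, 7, 8, 9]
toy_tau = kendall_tau_b(toy_pirads, toy_gleason_sum)
print(round(toy_tau, 3))

# Coefficients transcribed from Table 4, one per reader.
reported_r = {"1": 0.377, "2": 0.284, "3a": 0.464, "4": 0.253, "5": 0.306, "6": 0.231}
mean_r = round(mean(reported_r.values()), 3)
print(mean_r)
```

With the Table 4 values as transcribed, the mean of the six coefficients is 0.319. The pairwise loop is quadratic in the number of cases, which is unproblematic for 84 patients.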


Receiver operating characteristic curves and diagnostic performance

Supplementary Table 2 shows that, across all 183 cases, readers 2 and 6 showed the highest accuracy (90.2%), reader 4 showed the highest sensitivity (96.4%), and reader 1 showed the highest specificity (91.9%). For the detection of clinically significant PCa among the 84 PCa patients, reader 6 showed the lowest accuracy (70.2%), reader 3 showed the highest accuracy (79.8%), and reader 5 showed the highest sensitivity (96.6%). Figure 4 shows the ROC curves of the six readers with different experience levels. The comparison of the AUC values is shown in Supplementary Table 3. Except for readers 1 and 6 (Z = 2.341; P = 0.019), no significant differences were found in the overall AUC values among the readers. However, readers 2 and 3, readers 3 and 4, readers 3 and 6, readers 1 and 6, and readers 5 and 6 showed significant differences in AUC values for the PCa group.
Supplementary Table 2

Diagnostic performance of PI-RADS v2 scores from six readers

Threshold ≤3 (n = 183)*

| Reader | AUC (95% CI) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 0.883 (0.828–0.926) | 77.4 | 91.9 | 89.0 | 82.7 | 85.3 |
| 2 | 0.924 (0.876–0.958) | 88.1 | 89.9 | 88.1 | 89.9 | 90.2 |
| 3a | 0.927 (0.879–0.960) | 86.9 | 87.9 | 85.9 | 88.8 | 88.0 |
| 4 | 0.932 (0.885–0.963) | 96.4 | 76.8 | 77.9 | 96.2 | 86.3 |
| 5 | 0.929 (0.882–0.962) | 95.2 | 78.8 | 79.2 | 95.1 | 88.4 |
| 6 | 0.947 (0.903–0.974) | 92.9 | 86.9 | 85.7 | 93.5 | 90.2 |

Threshold ≥4 (n = 84)†

| Reader | AUC (95% CI) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| 1 | 0.678 (0.567–0.776) | 71.2 | 64.0 | 82.4 | 48.5 | 72.6 |
| 2 | 0.592 (0.479–0.698) | 86.4 | 32.0 | 75.0 | 50.0 | 75.0 |
| 3a | 0.721 (0.612–0.813) | 74.6 | 60.0 | 81.5 | 50.0 | 79.8 |
| 4 | 0.615 (0.503–0.719) | 89.8 | 32.0 | 75.7 | 57.1 | 73.8 |
| 5 | 0.646 (0.543–0.747) | 96.6 | 32.0 | 77.0 | 80.0 | 73.8 |
| 6 | 0.557 (0.445–0.666) | 88.1 | 24.0 | 73.2 | 46.2 | 70.2 |

*The cutoff value for differentiating between benign and malignant cases was set at 3. †The cutoff value for differentiating between low-risk and clinically significant PCa was set at 4, with values ≥4 considered positive. PPV: Positive predictive value; NPV: Negative predictive value; AUC: Area under the curve; PI-RADS v2: Prostate Imaging Reporting and Data System Version 2; CI: Confidence interval.
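The measures reported in Supplementary Table 2 all follow from a 2 × 2 table obtained by dichotomizing the PI-RADS score. A minimal Python sketch of that calculation is given below. The rule that scores at or above the cutoff count as positive is stated in the footnote for the cutoff of 4 and is assumed here for the cutoff of 3. The function names and the toy cases are hypothetical and are not study data.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(scores, has_disease, threshold):
    """Dichotomize ordinal scores (score >= threshold is a positive call)."""
    if len(scores) != len(has_disease):
        raise ValueError("scores and labels must have the same length")
    tp = fp = tn = fn = 0
    for score, truth in zip(scores, has_disease):
        positive_call = score >= threshold
        if positive_call and truth:
            tp += 1
        elif positive_call:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical scores for ten cases; True marks biopsy-proven disease.
toy_scores = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
toy_truth = [False, False, False, False, True, False, True, True, True, True]

metrics_cutoff_3 = diagnostic_metrics(toy_scores, toy_truth, threshold=3)
metrics_cutoff_4 = diagnostic_metrics(toy_scores, toy_truth, threshold=4)
for name, value in metrics_cutoff_3.items():
    print(f"cutoff 3 {name}: {value:.3f}")
```

In the toy data, raising the cutoff from 3 to 4 lowers sensitivity and raises specificity, which is the same trade-off that separates the two halves of Supplementary Table 2.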

Figure 4

ROC analysis results of the six readers with different experience levels for the PCa patients. (a) ROC curves of the six readers with different experience levels for all the 183 patients. (b) ROC curves of the six readers with different experience levels for the 84 PCa patients. ROC: Receiver operating characteristic; PCa: Prostate cancer.

Supplementary Table 3

AUC values of overall PI-RADS scores for cancer detection and PCa PI-RADS scores for clinically significant cancer detection

| Reader pairs | Overall (n = 183): Z | Overall (n = 183): P | PCa (n = 84): Z | PCa (n = 84): P |
|---|---|---|---|---|
| 1 and 2 | 1.680 | 0.929 | 1.671 | 0.095 |
| 1 and 3a | 1.646 | 0.100 | 0.914 | 0.361 |
| 1 and 4 | 1.782 | 0.075 | 1.121 | 0.262 |
| 1 and 5 | 1.583 | 0.113 | 0.579 | 0.563 |
| 1 and 6 | 2.341 | 0.019 | 2.150 | 0.032 |
| 2 and 3a | 0.147 | 0.883 | 2.630 | 0.009 |
| 2 and 4 | 0.376 | 0.707 | 0.633 | 0.527 |
| 2 and 5 | 0.214 | 0.831 | 1.450 | 0.142 |
| 2 and 6 | 1.323 | 0.186 | 1.189 | 0.234 |
| 3a and 4 | 0.271 | 0.787 | 2.048 | 0.041 |
| 3a and 5 | 0.121 | 0.904 | 1.477 | 0.140 |
| 3a and 6 | 0.940 | 0.347 | 3.024 | 0.003 |
| 4 and 5 | 0.175 | 0.861 | 0.942 | 0.346 |
| 4 and 6 | 0.789 | 0.430 | 1.682 | 0.093 |
| 5 and 6 | 0.844 | 0.399 | 2.442 | 0.015 |

AUC: Area under the curve; PCa: Prostate cancer; BPH: Benign prostatic hyperplasia; PI-RADS: Prostate Imaging Reporting and Data System.
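The AUC values compared in Supplementary Table 3 summarize how well an ordinal score separates two groups. The Python sketch below assumes the rank-based (Mann-Whitney) definition of the empirical AUC: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical, and the sketch does not reproduce the Z-test used to compare readers.

```python
def empirical_auc(positive_scores, negative_scores):
    """Empirical AUC: P(positive score > negative score) + 0.5 * P(tie)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    credit = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                credit += 1.0   # correctly ordered pair
            elif p == n:
                credit += 0.5   # tied pair counts as half
    return credit / (len(positive_scores) * len(negative_scores))


# Hypothetical PI-RADS scores for five cancer and five benign cases.
cancer_scores = [3, 4, 5, 5, 5]
benign_scores = [1, 2, 2, 3, 4]
toy_auc = empirical_auc(cancer_scores, benign_scores)
print(toy_auc)
```

Because a 5-point scale produces many tied pairs, the half-credit for ties has a visible effect on the result; an AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation.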


DISCUSSION

In the current study, six radiologists with varying experience levels read prostate MRI images and assigned scores using the PI-RADS v2. Our study revealed a moderate level of interobserver agreement among these readers, indicating that different experience levels may affect the interpretation of images, even under the guidance of the PI-RADS v2. A similar level of interobserver agreement was reported by Muller et al.,[18] who showed that the interobserver reproducibility for the overall suspicion score of the PI-RADS v2 was moderate (κ statistic score: 0.46; standard error: 0.03) as scored by five independent readers with varying experience levels (12 years, 7 years, 1 year for two readers, and 6 months). However, the readers in their study showed a narrow range of experience, while our study involved six readers with a broad range of experience (6 months and 2, 3, 4, 5, and 17 years). In another study, Rosenkrantz et al.[20] found that the interobserver agreement was 0.593 for the PZ and 0.509 for the TZ based on a PI-RADS v2 score of 4 or greater; their analysis included six experienced radiologists from six separate institutions and consisted of two reading sessions separated by a training period with discussion. However, no substantial difference in interobserver agreement was observed between the two sessions, suggesting that the training session was not required and provided no added benefit. Significant differences were identified among the scores of the six readers in our study, which suggests that radiologist experience is a crucial factor when evaluating MR images. However, not every pair of readers differed significantly, and most of the differences were associated with reader 1, who had only 6 months of experience. This suggests that a lack of experience affects MRI interpretation even when the PI-RADS v2, which is based on worldwide expert consensus, is applied; limited experience may correspond to a limited understanding of the criteria.
Our results also indicated that the average correlation between the scores of the six readers for the 84 PCa patients and the GS was positive and moderate according to the PI-RADS v2. NiMhurchu et al.[21] showed that the correlation between a positive targeted biopsy and both the T2WI and overall PI-RADS scores was also significant (P < 0.001), while the correlation between a targeted biopsy and the DWI score was significant only for PZ tumors. However, this study was based on the PI-RADS v1, and whether the PI-RADS v2 would have led to the same result is difficult to determine. In distinguishing between benign and malignant lesions, the most experienced reader (reader 6) in our study achieved the highest accuracy and AUC when the cutoff value was set at 3; however, this reader showed neither the highest nor the lowest percentage in terms of sensitivity, specificity, PPV, and NPV. Meanwhile, the least experienced reader (reader 1) achieved the lowest AUC, sensitivity, NPV, and accuracy and the highest specificity and PPV among the readers, which may be due to the different experience levels of the readers. In the study of Baldisserotto et al.,[22] a PI-RADS score of 3 was applied as an indicator of the absence of cancer, and the accuracy, sensitivity, specificity, PPV, and NPV of reader 1 (10 years of experience) were 77.8%, 73.5%, 85.0%, 89.3%, and 65.4%, respectively, and these values for reader 2 (4 years of experience) were 77.8%, 76.5%, 80.0%, 86.7%, and 66.7%, respectively. These values are lower than those of our study, which also demonstrates that the differences among readers may have been caused by varying experience levels. Previous studies without the PI-RADS criteria, such as that by Garcia-Reyes et al.,[23] have also shown that readers' experience influences the accuracy of mp-MRI regarding the diagnosis of PCa. 
Nevertheless, in distinguishing low-risk cancer from clinically significant cancer, reader 1 achieved the highest PPV and specificity, while reader 6 showed the lowest specificity, PPV, and accuracy, which can be interpreted as the more experienced reader showing more conservative tendencies. A study in 2016 by Zhao et al.[24] revealed a significant correlation between a higher PI-RADS v2 score and the presence of clinically significant PCa (P < 0.001), and a PI-RADS score of 3 was identified as the best cutoff point, with a sensitivity and specificity greater than 80%. Our results showed a similar average specificity and sensitivity using the same cutoff. In recent years, most studies have concluded that the PI-RADS v2 exhibits better diagnostic performance than the PI-RADS v1.[25,26] Another study by Wang et al.[27] that evaluated the PI-RADS v1 score with respect to the PCa detection rate in patients with PSA levels <20 ng/ml showed a good correlation between an increased PI-RADS score and an increased cancer detection rate, and the summed score of T2WI + DWI showed the highest accuracy for PCa detection. However, a few studies have produced different results, showing that although the PI-RADS v2 uses a simplified approach, this system can lead to a higher rate of false-negative results and lower diagnostic accuracy due to the risk of missing tumors with low PI-RADS scores. In a study by Auer et al.,[28] the authors included fifty PCa patients who underwent mp-MRI, and all the images were evaluated according to the PI-RADS v1 and PI-RADS v2 by two radiologists with a similar level of expertise. Their results showed that the PI-RADS v1 had a significantly greater discriminative ability for tumor detection regardless of whether the lesion was in the PZ or the TZ (PI-RADS v1 AUC: 0.96; PI-RADS v2 AUC: 0.90).
Several limitations existed in our study. The primary limitation is that the mean PSA level for the PCa population was slightly higher, and 70% of the PCa cases were locally advanced (stage T3/T4; 23% of the tumors were T4), which may have biased the study because larger, more aggressive tumors will be found by most radiologists; therefore, the agreement will be high and the diagnostic accuracy will be good. This phenomenon has also been observed in other studies.[29] However, PSA screening is not common in China, so our patients usually visit a doctor only when they have obvious clinical symptoms, which often reflect an advanced disease stage. We hope to address this aspect in future research. Second, our readers provided only one final score for each case, and the results were not separately analyzed according to the PZ, TZ, T2WI, T1WI, or DWI. The aim of the PI-RADS v2 is to conduct a comprehensive evaluation of prostate lesions according to all major sequences rather than just one sequence. Providing a fast, accurate, and comprehensive judgment according to the PI-RADS v2 is therefore important, which is why we conducted a comprehensive evaluation. Another potential limitation is that our readers and patient data all came from a single center, and although the readers were blinded to all identifying patient information, they may have been familiar with the cases in our database, which may have increased the inter-reader agreement or accuracy. Therefore, a larger dataset from a multicenter study is needed in the future. In addition, we selected a GS ≥4 + 3 as the definition of clinically significant PCa. However, no universally accepted consensus exists regarding the definition of clinically significant PCa.
Finally, the reference standard that we used was TRUS-guided prostate biopsy, which may be less accurate than prostatectomy.[30] However, the primary goal of our study was to explore diagnostic performance and interobserver consistency among readers with different experience levels according to the latest PI-RADS version. We therefore consider the impact of this limitation to be small, and we aim to enroll more patients with prostatectomy in the future to support the results of this study.
In conclusion, six prostate radiologists with different experience levels achieved weak-to-moderate inter-reader agreement using the PI-RADS v2 lexicon, and varying levels of experience have an impact on the interpretation of MR images. Nevertheless, the PI-RADS v2 showed excellent diagnostic performance across readers. As a living document, the PI-RADS is expected to evolve and change in response to clinical needs and technical improvements in the future.
Supplementary information is linked to the online version of the paper on the Chinese Medical Journal website.

Financial support and sponsorship

This work was supported by grants from the National Natural Science Foundation of China (NSFC; No. 81671656 and No. 81171307).

Conflicts of interest

There are no conflicts of interest.
  30 in total

1.  Inter-reader agreement of the ESUR score for prostate MRI using in-bore MRI-guided biopsies as the reference standard.

Authors:  L Schimmöller; M Quentin; C Arsov; R S Lanzman; A Hiester; R Rabenalt; G Antoch; P Albers; D Blondin
Journal:  Eur Radiol       Date:  2013-06-12       Impact factor: 5.315

2.  PIRADS 2.0: what is new?

Authors:  Baris Turkbey; Peter L Choyke
Journal:  Diagn Interv Radiol       Date:  2015 Sep-Oct       Impact factor: 2.630

Review 3.  Multiparametric MRI and prostate cancer diagnosis and risk stratification.

Authors:  Baris Turkbey; Peter L Choyke
Journal:  Curr Opin Urol       Date:  2012-07       Impact factor: 2.309

4.  Cancer Statistics, 2017.

Authors:  Rebecca L Siegel; Kimberly D Miller; Ahmedin Jemal
Journal:  CA Cancer J Clin       Date:  2017-01-05       Impact factor: 508.702

5.  Assessment of PI-RADS v2 for the Detection of Prostate Cancer.

Authors:  Moritz Kasel-Seibert; Thomas Lehmann; René Aschenbach; Felix V Guettler; Mohamed Abubrig; Marc-Oliver Grimm; Ulf Teichgraeber; Tobias Franiel
Journal:  Eur J Radiol       Date:  2016-01-19       Impact factor: 3.528

6.  Comparison of PI-RADS 2, ADC histogram-derived parameters, and their combination for the diagnosis of peripheral zone prostate cancer.

Authors:  W C Lin; A C Westphalen; G E Silva; S Chodraui Filho; R B Reis; V F Muglia
Journal:  Abdom Radiol (NY)       Date:  2016-11

7.  Evaluation of the PI-RADS scoring system for mpMRI of the prostate: a whole-mount step-section analysis.

Authors:  Daniel Junker; Michael Quentin; Udo Nagele; Michael Edlinger; Jonathan Richenberg; Georg Schaefer; Michael Ladurner; Werner Jaschke; Wolfgang Horninger; Friedrich Aigner
Journal:  World J Urol       Date:  2014-08-01       Impact factor: 4.226

Review 8.  PI-RADS version 2: what you need to know.

Authors:  T Barrett; B Turkbey; P L Choyke
Journal:  Clin Radiol       Date:  2015-07-29       Impact factor: 2.350

9.  ESUR prostate MR guidelines 2012.

Authors:  Jelle O Barentsz; Jonathan Richenberg; Richard Clements; Peter Choyke; Sadhna Verma; Geert Villeirs; Olivier Rouviere; Vibeke Logager; Jurgen J Fütterer
Journal:  Eur Radiol       Date:  2012-02-10       Impact factor: 5.315

10.  Evaluation of the Prostate Imaging Reporting and Data System for Magnetic Resonance Imaging Diagnosis of Prostate Cancer in Patients with Prostate-specific Antigen <20 ng/ml.

Authors:  Xuan Wang; Jian-Ye Wang; Chun-Mei Li; Ya-Qun Zhang; Jian-Long Wang; Ben Wan; Wei Zhang; Min Chen; Sa-Ying Li; Gang Wan; Ming Liu
Journal:  Chin Med J (Engl)       Date:  2016-06-20       Impact factor: 2.628

