Zilong He¹, Yue Li², Weixiong Zeng¹, Weimin Xu¹, Jialing Liu¹, Xiangyuan Ma²,³, Jun Wei⁴, Hui Zeng¹, Zeyuan Xu¹, Sina Wang¹, Chanjuan Wen¹, Jiefang Wu¹, Chenya Feng¹, Mengwei Ma¹, Genggeng Qin¹, Yao Lu²,⁵, Weiguo Chen¹.
Abstract
Radiologists' diagnostic capabilities for breast mass lesions depend on their experience. Junior radiologists may underestimate or overestimate the Breast Imaging Reporting and Data System (BI-RADS) categories of mass lesions owing to a lack of diagnostic experience. Computer-aided diagnosis (CAD) can improve diagnostic performance by providing radiologists with a breast mass classification reference. This study aims to evaluate the impact of a CAD method based on perceptive features learned from quantitative BI-RADS descriptions on breast mass diagnosis performance. We conducted a retrospective multi-reader multi-case (MRMC) study to assess the perceptive feature-based CAD method. A total of 416 digital mammograms of patients with breast masses were obtained from 2014 through 2017, including 231 benign and 185 malignant masses, from which we randomly selected 214 cases (109 benign, 105 malignant) to train the CAD model for perceptive feature extraction and classification. The remaining 202 cases were enrolled as the test set for evaluation, from which 51 cases (29 benign and 22 malignant) were randomly selected for the MRMC study. In the MRMC study, we divided six radiologists into three groups: junior, middle-senior, and senior. They diagnosed the 51 cases with and without support from the CAD model. The BI-RADS category, benign or malignant diagnosis, malignancy probability, and diagnosis time were recorded in both evaluation sessions. In the MRMC evaluation, the average area under the curve (AUC) of the six radiologists was higher with CAD support than without (0.896 vs. 0.850, p = 0.0209). Both average sensitivity and specificity increased (p = 0.0253). With CAD assistance, junior and middle-senior radiologists adjusted the assessment categories of more BI-RADS 4 cases. The diagnosis time with and without CAD support was comparable for five of the six radiologists.
The CAD model improved the radiologists' diagnostic performance for breast masses without prolonging the diagnosis time and supported better BI-RADS assessment, especially for junior radiologists.
Keywords: computer-aided diagnosis; convolutional neural network; diagnosis performance; digital mammographic; mass lesion
Year: 2021 PMID: 34976817 PMCID: PMC8719464 DOI: 10.3389/fonc.2021.773389
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Distribution of the benign and malignant cases obtained in this study. To ensure a study power greater than 0.8, the reference method indicated that, with a benign-to-malignant case ratio of 1.0 and six readers, an evaluation set of at least 51 randomly selected cases was needed.
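The case allocation described in the abstract and Figure 1 can be sketched as per-class random sampling. This is a minimal sketch under stated assumptions: the case IDs are hypothetical, drawing a fixed number of cases per class is an assumption (the text reports only the resulting counts, not the sampling procedure), and the seeds are arbitrary.

```python
import random


def split_by_class(case_ids_by_label, counts, seed=0):
    """Randomly draw counts[label] cases per label; return (selected, remaining)."""
    rng = random.Random(seed)
    selected, remaining = {}, {}
    for label, ids in case_ids_by_label.items():
        shuffled = list(ids)
        rng.shuffle(shuffled)
        k = counts[label]
        selected[label] = shuffled[:k]
        remaining[label] = shuffled[k:]
    return selected, remaining


# Hypothetical case IDs: 231 benign and 185 malignant mammograms.
cases = {
    "benign": [f"B{i:03d}" for i in range(231)],
    "malignant": [f"M{i:03d}" for i in range(185)],
}
# Training set of 214 cases (109 benign, 105 malignant); the rest form the test set.
train, test = split_by_class(cases, {"benign": 109, "malignant": 105})
# Observer-study subset of 51 test cases (29 benign, 22 malignant).
observer, _ = split_by_class(test, {"benign": 29, "malignant": 22}, seed=1)

for name, part in [("train", train), ("test", test), ("observer", observer)]:
    print(name, {k: len(v) for k, v in part.items()})
```

The remaining test set holds 122 benign and 80 malignant cases, matching the 202 test cases reported in the text.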
Characteristics of the study population: age, breast composition, and biopsy results.
| Variable | Training set (n = 214) | Test set (n = 202) | 51 cases in observer evaluation |
|---|---|---|---|
| Age (years) | | | |
| Mean | 45.64 | 45.51 | 46.53 |
| Median | 45 | 45 | 47 |
| Range | 23–73 | 23–78 | 27–65 |
| Interquartile range | 40–50 | 40–50 | 40–51 |
| p-value compared with training set | – | 0.8910 | 0.5441 |
| Breast composition* | | | |
| a | 6 | 5 | 3 |
| b | 23 | 25 | 9 |
| c | 169 | 155 | 35 |
| d | 16 | 17 | 4 |
| p-value compared with training set | – | 0.9268 | 0.3404 |
*BI-RADS breast composition is defined in the fifth edition of the ACR BI-RADS and includes four categories: “a”, almost entirely fatty; “b”, scattered areas of fibroglandular density; “c”, heterogeneously dense; “d”, extremely dense.
BI-RADS, Breast Imaging Reporting and Data System.
Quantification of the descriptions summarized from the radiology reports of all cases in our study.
| Description | Radiologists’ assessment | Quantification |
|---|---|---|
| Shape | Oval or round | 0 |
| | Irregular | 1 |
| Margin sharpness | Circumscribed | 0 |
| | Obscured | 0.5 |
| | Indistinct | 1 |
| Microlobulated margins | No | 0 |
| | Yes | 1 |
| Spiculated margins | No | 0 |
| | Yes | 1 |
| Density | Low or fat-containing | 0 |
| | Equal | 0.5 |
| | High | 1 |
Figure 2. An example of quantification for a malignant mass in mediolateral oblique (MLO)-view full-field digital mammography (FFDM). The five text descriptions assessed by a radiologist (red box) are converted to the corresponding numbers, producing a five-dimensional vector that is used as the ground truth for training the perceptive feature extractor.
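The conversion from report descriptions to the five-dimensional target vector can be sketched as a table lookup. The values are transcribed from the quantification table; the dictionary keys, the element order of the vector, and the example mass are hypothetical choices for illustration, not the authors' code or the mass shown in Figure 2.

```python
# Lookup transcribed from the quantification table: description -> assessment -> value.
QUANTIFICATION = {
    "shape": {"oval or round": 0.0, "irregular": 1.0},
    "margin_sharpness": {"circumscribed": 0.0, "obscured": 0.5, "indistinct": 1.0},
    "microlobulated": {"no": 0.0, "yes": 1.0},
    "spiculated": {"no": 0.0, "yes": 1.0},
    "density": {"low or fat-containing": 0.0, "equal": 0.5, "high": 1.0},
}
# Assumed order of the five vector elements (hypothetical).
DESCRIPTOR_ORDER = ["shape", "margin_sharpness", "microlobulated", "spiculated", "density"]


def quantify(report):
    """Map a dict of radiologist assessments to the five-dimensional target vector."""
    return [QUANTIFICATION[d][report[d].strip().lower()] for d in DESCRIPTOR_ORDER]


# Hypothetical report for a suspicious mass.
example = {
    "shape": "Irregular",
    "margin_sharpness": "Indistinct",
    "microlobulated": "No",
    "spiculated": "Yes",
    "density": "High",
}
print(quantify(example))  # [1.0, 1.0, 0.0, 1.0, 1.0]
```

An unrecognized assessment raises a `KeyError`, which is a deliberate choice here: a silent default would corrupt the regression targets.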
Figure 3. The architecture of the feature extractor. The input is a two-channel tensor consisting of an original mammography patch and its corresponding region-of-interest (ROI) mask. The extractor is a modified VGG16 neural network with 13 convolutional layers and three fully connected layers. The outputs of the 128 neurons in the last fully connected layer are used as the perceptive features in this study. ReLU, rectified linear unit; Conv, convolution.
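The extractor in Figure 3 can be sketched, without a deep-learning framework, as a shape trace through a VGG16-style stack. Only the two-channel input, the 13 convolutional layers, the three fully connected layers, and the 128-unit last layer come from the caption. The 224 × 224 patch size, the standard VGG16 choices (3 × 3 convolutions with padding 1, 2 × 2 max pooling with stride 2), and the 4096-unit hidden fully connected layers are assumptions. The caption does not say how the 128 features map to the five-dimensional target, so no output head is shown.

```python
# Standard VGG16 convolutional configuration: integers are output channels,
# "M" is a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def trace_shapes(in_channels=2, size=224, fc_dims=(4096, 4096, 128)):
    """Return (layer_name, output_shape) pairs for a VGG16-style extractor.

    Assumes padded 3x3 convolutions (spatial size preserved) and
    2x2/stride-2 pooling (spatial size halved). in_channels=2 stands for
    the mammography patch plus its ROI mask; size and fc_dims are assumptions.
    """
    shapes = [("input", (in_channels, size, size))]
    channels = in_channels
    conv_count = 0
    for item in VGG16_CFG:
        if item == "M":
            size //= 2
            shapes.append(("maxpool", (channels, size, size)))
        else:
            conv_count += 1
            channels = item
            shapes.append((f"conv{conv_count}+relu", (channels, size, size)))
    shapes.append(("flatten", (channels * size * size,)))
    for i, dim in enumerate(fc_dims, start=1):
        shapes.append((f"fc{i}", (dim,)))  # last entry: 128 perceptive features
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

With a 224 × 224 input, five pooling stages leave a 512 × 7 × 7 map (25,088 values) feeding the fully connected layers.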
AUCs of the six readers and their average AUC in the multi-reader multi-case observer study.
| Reader | AUC unaided | AUC with model reference | Difference | p-value |
|---|---|---|---|---|
| 1 | 0.842 | 0.920 | 0.078 | |
| 2 | 0.783 | 0.892 | 0.109 | |
| 3 | 0.889 | 0.922 | 0.033 | |
| 4 | 0.852 | 0.890 | 0.038 | |
| 5 | 0.866 | 0.904 | 0.038 | |
| 6 | 0.869 | 0.847 | -0.022 | |
| Diagonal average | 0.850 | 0.896 | 0.046 | 0.0209 |
AUC, area under the curve.
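A reader's AUC can be estimated nonparametrically from malignancy-probability ratings with the Wilcoxon–Mann–Whitney statistic, and the reader average is the mean of the per-reader values. This is a sketch under the assumption of that estimator; the source does not state which AUC estimator was used, and the MRMC significance test behind p = 0.0209 is not reproduced here. The per-reader AUCs are transcribed from the table.

```python
from statistics import mean


def empirical_auc(malignant_scores, benign_scores):
    """Wilcoxon-Mann-Whitney AUC: P(malignant score > benign score), ties count 0.5."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))


# Per-reader AUCs (readers 1-6) transcribed from the table above.
unaided = [0.842, 0.783, 0.889, 0.852, 0.866, 0.869]
aided = [0.920, 0.892, 0.922, 0.890, 0.904, 0.847]
diffs = [a - u for a, u in zip(aided, unaided)]

print(round(mean(unaided), 3), round(mean(aided), 3), round(mean(diffs), 3))
```

The averages round to the tabulated 0.850, 0.896, and 0.046.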
Figure 4. Receiver operating characteristic (ROC) curves for six readers in three groups during diagnosis with and without computer-aided diagnosis (CAD) model support. (A) Junior group (readers 1 and 2). (B) Middle-seniority group (readers 3 and 4). (C) Senior group (readers 5 and 6). With support, the ROC curves shift upward more markedly and consistently in the junior group; in the middle-seniority and senior groups they change little, and the curve for reader 6 is lower with support.
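Empirical ROC curves like those in Figure 4 can be traced by sweeping a decision threshold over a reader's ratings. A minimal sketch, assuming the ratings are malignancy probabilities with malignant as the positive class; the source does not say whether its curves are empirical or fitted (e.g., binormal), and the scores in the usage line are hypothetical.

```python
def roc_points(scores, labels):
    """Empirical ROC points (FPR, TPR), sweeping the threshold over distinct scores.

    labels: 1 = malignant (positive), 0 = benign (negative).
    A case is called positive when its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0.9, 0.8, 0.8, 0.1], [1, 1, 0, 0])
print(pts, trapezoid_auc(pts))
```

The trapezoidal area of the empirical ROC curve equals the Wilcoxon–Mann–Whitney statistic with ties counted as one half (0.875 for this toy example).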
Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the readers in each experience group with and without model reference.
| Metric | Group | Reader | Unaided | Aided | Difference |
|---|---|---|---|---|---|
| Sensitivity | Junior | 1 | 0.545 | 0.682 | 0.137 |
| | | 2 | 0.682 | 0.864 | 0.182 |
| | Middle-seniority | 3 | 0.773 | 0.773 | 0 |
| | | 4 | 0.773 | 0.864 | 0.091 |
| | Senior | 5 | 0.819 | 0.901 | 0.082 |
| | | 6 | 0.773 | 0.773 | 0 |
| Specificity | Junior | 1 | 0.931 | 0.931 | 0 |
| | | 2 | 0.621 | 0.621 | 0 |
| | Middle-seniority | 3 | 0.793 | 0.897 | 0.104 |
| | | 4 | 0.862 | 0.931 | 0.069 |
| | Senior | 5 | 0.793 | 0.689 | -0.104 |
| | | 6 | 0.863 | 0.897 | 0.034 |
| PPV | Junior | 1 | 0.857 | 0.882 | 0.025 |
| | | 2 | 0.577 | 0.633 | 0.056 |
| | Middle-seniority | 3 | 0.739 | 0.850 | 0.111 |
| | | 4 | 0.810 | 0.905 | 0.095 |
| | Senior | 5 | 0.750 | 0.689 | -0.061 |
| | | 6 | 0.809 | 0.850 | 0.041 |
| NPV | Junior | 1 | 0.729 | 0.794 | 0.065 |
| | | 2 | 0.720 | 0.857 | 0.137 |
| | Middle-seniority | 3 | 0.821 | 0.838 | 0.017 |
| | | 4 | 0.833 | 0.900 | 0.067 |
| | Senior | 5 | 0.851 | 0.909 | 0.058 |
| | | 6 | 0.833 | 0.838 | 0.005 |
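The four metrics in the table follow from each reader's binary (benign vs. malignant) calls. A minimal sketch; the counts in the usage line are hypothetical, chosen only to be consistent with the 22 malignant and 29 benign cases of the observer set, and are not reported in the source.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and NPV from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant cases called malignant
        "specificity": tn / (tn + fp),  # benign cases called benign
        "ppv": tp / (tp + fp),          # truly malignant among cases called malignant
        "npv": tn / (tn + fn),          # truly benign among cases called benign
    }


# Hypothetical counts for one reader on the 51-case set (22 malignant, 29 benign).
m = diagnostic_metrics(tp=12, fn=10, tn=27, fp=2)
print({k: round(v, 3) for k, v in m.items()})
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of malignant cases (22 of 51 here), so they do not transfer directly to a screening population with a different prevalence.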
Counts of BI-RADS category changes for each reader with CAD model support.
| | Reader 1 (junior) | Reader 2 (junior) | Reader 3 (middle-seniority) | Reader 4 (middle-seniority) | Reader 5 (senior) | Reader 6 (senior) | Total |
|---|---|---|---|---|---|---|---|
| | 14 | 11 | 11 | 17 | 22 | 5 | 80 |
| | 4 | 15 | 13 | 5 | 4 | 7 | 48 |
| | 18 | 26 | 24 | 22 | 26 | 12 | 128 |
| | 10 | 15 | 13 | 8 | 14 | 7 | 67 |
| | 8 | 11 | 11 | 14 | 12 | 5 | 56 |
| | 1 | 0 | 1 | 0 | 1 | 0 | 3 |
| | 2 | 7 | 5 | 2 | 6 | 1 | 23 |
| | 12 | 18 | 16 | 17 | 17 | 10 | 90 |
| | 3 | 1 | 2 | 3 | 2 | 1 | 12 |
BI-RADS, Breast Imaging Reporting and Data System.
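Per-reader BI-RADS changes can be tallied by comparing each case's unaided and aided categories. This is one plausible tabulation, sketched under assumptions: the use of subcategories 4A/4B/4C, their ordinal ranking, and the grouping into upgrades, downgrades, and changes by initial category are illustrative choices, not a reconstruction of the table's rows (whose labels did not survive extraction). The category lists in the usage line are hypothetical.

```python
from collections import Counter

# Assumed ordinal rank of BI-RADS assessment categories (4 split into 4A/4B/4C).
RANK = {"1": 1, "2": 2, "3": 3, "4A": 4, "4B": 5, "4C": 6, "5": 7}


def count_changes(unaided, aided):
    """Tally upgrades, downgrades, and changed cases by initial category for one reader."""
    upgrades = downgrades = 0
    changed_by_initial = Counter()
    for before, after in zip(unaided, aided):
        if before == after:
            continue  # category unchanged with CAD support
        changed_by_initial[before.rstrip("ABC")] += 1  # pool 4A/4B/4C under "4"
        if RANK[after] > RANK[before]:
            upgrades += 1
        else:
            downgrades += 1
    return upgrades, downgrades, changed_by_initial


before = ["3", "4A", "4B", "5", "2"]
after = ["4A", "4A", "4C", "4C", "2"]
print(count_changes(before, after))
```

Here the reader upgrades two cases (3 to 4A, 4B to 4C) and downgrades one (5 to 4C); the two unchanged cases are not counted.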
Mean diagnosis time of the radiologists in the multi-reader multi-case study.
| Reader | Mean time without support (s) | Mean time with model support (s) | Difference (s) | p-value | Cases with increased time (with model) | Cases with decreased time (with model) | Cases with unchanged time (with model) |
|---|---|---|---|---|---|---|---|
| 1 | 55.27 | 55.51 | 0.24 | 0.955 | 22 | 29 | 0 |
| 2 | 80.59 | 81.18 | 0.59 | 0.912 | 12 | 38 | 1 |
| 3 | 63.90 | 64.24 | 0.34 | 0.928 | 22 | 27 | 2 |
| 4 | 45.10 | 42.59 | -2.51 | 0.378 | 9 | 39 | 3 |
| 5 | 42.35 | 37.35 | -5.00 | 0.089 | 14 | 36 | 1 |
| 6 | 56.96 | 43.96 | -13.00 | 0.001 | 19 | 32 | 0 |
Figure 5. Diagnosis time comparison. (A) Time comparison for all readers. (B) Time comparison for reader 6, the only reader whose diagnosis time differed significantly between the two sessions. The plots show the per-case diagnosis time for each reader; each red point represents one case read with and without model support. A point on the diagonal indicates no change, a point above the diagonal indicates that the diagnosis time increased with model support, and a point below the diagonal indicates that it decreased.
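The per-reader summary in the diagnosis-time table can be sketched from paired per-case times. A minimal sketch, assuming time with model support is plotted on the vertical axis (so "above the diagonal" means aided time is longer) and that the difference is aided minus unaided, which matches the table's sign convention. The source does not name the test behind its p-values, so none is computed; the times in the usage line are hypothetical.

```python
from statistics import mean


def compare_times(unaided, aided):
    """Summarize paired per-case diagnosis times (seconds) for one reader."""
    diffs = [a - u for u, a in zip(unaided, aided)]
    return {
        "mean_unaided": mean(unaided),
        "mean_aided": mean(aided),
        "mean_difference": mean(diffs),           # aided minus unaided
        "increased": sum(d > 0 for d in diffs),   # point above the diagonal
        "decreased": sum(d < 0 for d in diffs),   # point below the diagonal
        "unchanged": sum(d == 0 for d in diffs),  # point on the diagonal
    }


print(compare_times([50, 60, 40, 30], [45, 60, 42, 20]))
```

Applied to a reader's 51 paired times, the three counts sum to 51, as they do in every row of the table.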
Diagnostic performance of previous models in classification of masses.
| Year | First author | Model | Training cases | AUC |
|---|---|---|---|---|
| 2015 | Ertosun and Rubin | VGG-Net 16 | 2,250 | 0.82 |
| 2015 | Surendiran et al. | A univariate ANOVA discriminant analysis | 300 | 0.93 |
| 2016 | Sun et al. | Convolutional ANN with 4 convolutional layers and 1 fully connected layer | 840 | 0.70 |
| 2017 | Becker et al. | ViDi Red | 286 | 0.81 |
| 2020 | Agarwal et al. | Faster-RCNN | 800 | 0.90 |
| 2020 | Boumaraf et al. | BPN | 500 | 0.94 |
| 2021 | Yan et al. | CMCNet VGG16 | 586 | 0.94 |
| 2021 | Ours | Classical CNN VGG16 based on perceptive features | 214 | 0.91 |