Breast density is an important variable that can change the sensitivity of
mammography in screening populations. False-negative rates are higher in dense
breasts due to the masking effect of density. Additional methods have been considered to
overcome this issue (1–3). Furthermore, women with
dense breasts have a four- to six-fold increased risk of breast cancer compared
with women with fatty breasts (4).
Studies concerning breast density have been ongoing since the 1990s; those
mammographic interpretation-based studies were conducted to analyze the usage of the
Breast Imaging and Reporting Data System (BI-RADS) of the American College of
Radiology (ACR) (5–8). These studies continued to increase
after recognizing the impact of breast density on screening for breast cancer.
Additionally, the density notification laws in the United States, the development of
new quantitative density assessment tools, and the introduction of the BI-RADS 5th
edition in 2013 led to a rise in published research on the topic (5–10). Broadly, the main goal of these
studies was to determine a method to improve the measurement of breast density for
the risk assessment of breast cancer, which would in turn help identify better
screening strategies.
Mammographic density is commonly measured using BI-RADS recommendations from the ACR.
The 4th edition of BI-RADS categorizes breast density based on the percentage of
fibroglandular tissue present (11). The 5th edition of BI-RADS (10), published in 2013, redefines the
density categories; it drops the numeric quartiles of percentage density
and instead describes the density pattern in terms of the possibility that
fibroglandular tissue obscures a lesion. Although the reliability and reproducibility of visual
assessments are limited by inter-observer and intra-observer variability, BI-RADS is
the most commonly used method for the assessment of breast density in clinical
practice (12).
Additionally, automated volumetric methods for density assessment have been
introduced. Although they are easily reproducible, they have not been incorporated
into the clinical setting (13–16).
Visual assessments of the BI-RADS breast density category are subjective, and the
level of agreement between readers varies in the literature from “slight” to “almost
perfect” (17–24). This variation has several
causes, including differences in the study populations, the readers’ level of
experience, and the BI-RADS version and methods used in each study. The aim of the
present study was to examine the variation in the assessment of breast density when
two versions of BI-RADS were applied by two readers with different levels of
experience.
Material and Methods
Study design
A total of 330 full-field digital mammography (FFDM) examinations of women with and
without symptoms of breast cancer that were acquired in our university hospital
between January 2018 and March 2018 were retrospectively analyzed in the study.
Women with a history of breast surgery, breast augmentation, or chemotherapy, and
those with lesions detected by mammography, were excluded.
Routine craniocaudal and mediolateral oblique views were obtained for each
breast. FFDM images were acquired on a dedicated FFDM system (Selenia Dimensions; Hologic,
Bedford, MA, USA) with
standard-screening automatic exposure control. The monitor settings and reading
conditions remained unchanged throughout the study.
One technologist with 10 years of mammography experience performed all examinations to
keep the compression technique similar across patients. Compression was considered
complete when the breast blanched or the patient could not tolerate
any more pressure.
Mammographic density was analyzed according to the 4th and 5th editions of the
BI-RADS by two readers with different levels of experience. One reader was a
breast radiologist with five years of experience in reading mammograms and the
other was a third-year radiology resident with six months of experience in
breast radiology. All mammograms were read four times by each radiologist: twice
using the 4th edition of the ACR BI-RADS guidelines and twice using the 5th
edition. Each reading was separated by a one-month interval. The presentation of
cases within each reading session was randomized to reduce bias. Each reader was
blinded to the results of any previous reading. The radiologists were also
blinded to patient information.
ACR BI-RADS density
The 4th edition of the ACR BI-RADS guidelines relies on the percentage of
fibroglandular tissue within the total breast using craniocaudal and
mediolateral oblique views. Breasts with glandular densities of <25%,
25%–50%, 50%–75%, and >75% were assigned a BI-RADS density value of 1, 2, 3,
and 4, respectively (11).
In the 5th edition, the percentage system was redefined with an emphasis on the
possibility of having an obscured lesion. The categories were defined as
follows: category A = almost entirely fat; category B = scattered fibroglandular
densities; category C = heterogeneously dense; and category D = extremely dense
(10).
The intra-reader and inter-reader analyses were performed for each of the two
BI-RADS versions using four- and two-category scales. BI-RADS assessment
categories 1 and 2 or A and B were considered non-dense, and categories 3 and 4
or C and D were considered dense. In the case of differences in the assessment
between two breasts, the BI-RADS breast density category was assigned on the
basis of the denser breast.
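The category handling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `RANK`, `per_woman_category`, and `is_dense` are hypothetical, and categories are represented as the strings "1" to "4" (4th edition) or "A" to "D" (5th edition).

```python
# Illustrative sketch; names and data layout are assumptions, not taken from the study.

# Ordinal rank of each density category: 4th edition uses 1-4, 5th edition uses A-D.
RANK = {"1": 1, "2": 2, "3": 3, "4": 4, "A": 1, "B": 2, "C": 3, "D": 4}


def per_woman_category(left: str, right: str) -> str:
    """Return the category of the denser breast when the two breasts differ."""
    return left if RANK[left] >= RANK[right] else right


def is_dense(category: str) -> bool:
    """Two-category scale: 1/2 or A/B are non-dense, 3/4 or C/D are dense."""
    return RANK[category] >= 3


if __name__ == "__main__":
    category = per_woman_category("B", "C")
    print(category, "dense" if is_dense(category) else "non-dense")
```

Keeping the four-category value and deriving the dense/non-dense flag from it means both scales can be analyzed from a single set of readings.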
Ethical considerations
Our institutional review board approved this retrospective study (decision
number: 2018-03-03), and the requirement for informed consent was waived.
Statistical analysis
Cohen’s kappa coefficient (k) and its standard error were calculated to measure
inter- and intra-reader variability. Because both density scales are ordinal,
we used k with linear weights. Inter-observer agreement and the comparison of
density assignment distributions were based on the radiologists’
first assessments.
The kappa values were interpreted as suggested by Landis and Koch (25): a kappa value of
0.00–0.20 indicates slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate
agreement; 0.61–0.80, substantial agreement; and 0.81–1.00, almost perfect
agreement. The analyses were also performed for the two broader categories:
non-dense and dense.
Statistical analysis was performed using SPSS statistical analysis software (PASW
Statistics, version 21.0.0; SPSS Inc., Chicago, IL, USA), and
P < 0.05 was considered to be statistically significant.
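For readers who want to see how a weighted kappa of this kind is computed, the following is a minimal pure-Python sketch. It is not the SPSS procedure used in the study; the function names `weighted_kappa` and `landis_koch` and the 4 × 4 example table are hypothetical, and the table contains invented counts used only to exercise the code. The "poor" label for negative values comes from the Landis and Koch scale.

```python
from typing import Sequence


def weighted_kappa(table: Sequence[Sequence[float]], weights: str = "linear") -> float:
    """Cohen's kappa from a square table of counts (needs at least two categories).

    table[i][j] is the number of cases placed in category i by the first
    rating and in category j by the second rating.
    weights is "linear", "quadratic", or "none" (unweighted kappa).
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        # Penalty grows with the distance between ordinal categories.
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(disagreement(i, j) * row_tot[i] * col_tot[j] for i, j in cells) / n**2
    return 1.0 - observed / expected


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Invented counts for two ratings on a four-category scale (illustration only).
    example = [[20, 5, 0, 0], [3, 25, 6, 0], [0, 4, 18, 5], [0, 0, 2, 12]]
    for scheme in ("none", "linear", "quadratic"):
        value = weighted_kappa(example, scheme)
        print(f"{scheme}: kappa = {value:.3f} ({landis_koch(value)})")
```

On a two-category table all three weighting schemes coincide, because the only possible disagreement is between adjacent categories; the choice of weights matters only on the four-category scale.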
Results
The mean age of the 330 participants was 51.1 years (age range = 35–76 years). With the
4th edition, the mean percentages of mammograms rated as BI-RADS density 1, 2, 3, and 4
were 36%, 33%, 26%, and 13%, respectively; with the 5th edition, categories A, B, C, and D
accounted for 23%, 37%, 24%, and 24%, respectively
(Fig. 1).
Fig. 1.
Assessments of two readers with BI-RADS 4th and 5th editions on a
four-category scale.
Agreement analyses: BI-RADS category
The intra-reader agreement of the breast radiologist for the 4th and 5th editions
of BI-RADS was almost perfect (k = 0.90 and k = 0.87, respectively). The
resident had similar results (k = 0.88 and k = 0.87, respectively). The
intra-reader agreement between the 4th and 5th editions of BI-RADS was substantial for
both the breast radiologist and resident (k = 0.62 and k = 0.69, respectively). The
distribution of breast density across the four reading cycles of both readers and
the changes that occurred when shifting from the 4th to the
5th edition of BI-RADS are shown in Fig. 1,
in which the breast radiologist is Reader 1 and the resident is
Reader 2.
The agreement between the breast radiologist and resident with regard to the 4th
and 5th editions of BI-RADS was substantial (k = 0.70 and k = 0.63,
respectively). It was almost perfect for category 1, substantial for category 2,
and moderate for category 3 for both editions. The inter-reader agreement for
category 4 changed from moderate to substantial when using the 5th edition
instead of the 4th edition (k = 0.49 and k = 0.61, respectively). With the
exception of category 4, the inter-reader agreement for the individual
categories decreased when using the 5th edition, although it stayed
within the same agreement level for each category (Table 1).
Table 1.
Inter-reader agreement for the 4th and 5th versions of the ACR BI-RADS
using the four-category scale.

Outcome       Kappa     Z        Prob > Z
Inter-reader agreement for the 4th version of BI-RADS
  Density 1   0.8603    38.28    0.0000
  Density 2   0.7275    32.37    0.0000
  Density 3   0.5826    25.93    0.0000
  Density 4   0.4940    21.98    0.0000
  Combined    0.7010    51.24    0.0000
Inter-reader agreement for the 5th version of BI-RADS
  Density A   0.8332    37.08    0.000
  Density B   0.6734    29.96    0.000
  Density C   0.4078    18.15    0.000
  Density D   0.6092    27.11    0.000
  Combined    0.6352    48.40    0.0000
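Table 1 reports a Z statistic and a "Prob > Z" value for each kappa without defining them. The sketch below checks the reported values under two assumptions that are not stated in the source: that Z is kappa divided by its standard error under the null hypothesis of chance agreement, and that "Prob > Z" is the upper-tail probability of a standard normal distribution. The names `TABLE_1`, `implied_se`, and `upper_tail_p` are hypothetical.

```python
import math

# (kappa, Z) pairs copied from the individual-category rows of Table 1.
TABLE_1 = {
    "4th edition, density 1": (0.8603, 38.28),
    "4th edition, density 2": (0.7275, 32.37),
    "4th edition, density 3": (0.5826, 25.93),
    "4th edition, density 4": (0.4940, 21.98),
    "5th edition, density A": (0.8332, 37.08),
    "5th edition, density B": (0.6734, 29.96),
    "5th edition, density C": (0.4078, 18.15),
    "5th edition, density D": (0.6092, 27.11),
}


def implied_se(kappa: float, z: float) -> float:
    """Standard error implied by Z = kappa / SE (an assumption, see lead-in)."""
    return kappa / z


def upper_tail_p(z: float) -> float:
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for label, (kappa, z) in TABLE_1.items():
        print(f"{label}: implied SE = {implied_se(kappa, z):.4f}, "
              f"P(Z > z) = {upper_tail_p(z):.1e}")
```

Under these assumptions every individual-category row implies a standard error of about 0.0225, and every tail probability is far below the four decimal places printed in the table, which is consistent with the reported 0.0000.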
Agreement analyses: non-dense versus dense
The intra-reader agreement of the breast radiologist for the 4th and 5th editions
of BI-RADS was almost perfect (k = 0.93 and k = 0.92, respectively). It was
nearly the same for the resident (k = 0.92 and k = 0.90, respectively) when using the two-category
scale assessment. The intra-reader agreement between the 4th and 5th editions of
BI-RADS was substantial for the breast radiologist and almost perfect for the
resident (k = 0.68 and k = 0.81, respectively).
The agreement between the breast radiologist and resident for the 4th and 5th
editions of BI-RADS when using the two-category scale was almost perfect
(k = 0.89 and k = 0.81, respectively). In the two-category analysis, the
dense versus non-dense categorization differed significantly between the 4th and 5th
editions for both readers (McNemar’s test, P < 0.001)
(Table 2).
Table 2.
Intra-reader agreement for the 4th and 5th versions of the ACR BI-RADS
with respect to the two-category scale.

                 Reader 1 V5
Reader 1 V4*     Non-dense   Dense   Total
  Non-dense      177         44      221
  Dense          6           103     109
  Total          183         147     330

                 Reader 2 V5
Reader 2 V4†     Non-dense   Dense   Total
  Non-dense      186         27      213
  Dense          2           115     117
  Total          188         142     330

*k = 0.68; McNemar’s test, P < 0.001.
†k = 0.81; McNemar’s test, P < 0.001.
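The statistics in the footnotes of Table 2 can be approximately reproduced from the cell counts. The sketch below is an illustration, not the SPSS analysis used in the study: the function names are hypothetical, and the exact (binomial) form of McNemar's test is an assumption, because the source does not say which variant was applied.

```python
import math

# Rows: 4th-edition rating (non-dense, dense); columns: 5th-edition rating.
# Counts copied from Table 2.
READER_1 = [[177, 44], [6, 103]]
READER_2 = [[186, 27], [2, 115]]


def cohen_kappa_2x2(t):
    """Unweighted Cohen's kappa for a 2 x 2 table of counts."""
    n = t[0][0] + t[0][1] + t[1][0] + t[1][1]
    p_obs = (t[0][0] + t[1][1]) / n
    row = [t[0][0] + t[0][1], t[1][0] + t[1][1]]
    col = [t[0][0] + t[1][0], t[0][1] + t[1][1]]
    p_exp = (row[0] * col[0] + row[1] * col[1]) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


def mcnemar_exact_p(t):
    """Two-sided exact McNemar test on the discordant pairs (binomial, p = 0.5)."""
    b, c = t[0][1], t[1][0]
    n = b + c
    tail = sum(math.comb(n, k) for k in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    for name, table in (("Reader 1", READER_1), ("Reader 2", READER_2)):
        print(f"{name}: kappa = {cohen_kappa_2x2(table):.3f}, "
              f"McNemar exact P = {mcnemar_exact_p(table):.2e}")
```

With these counts kappa comes out at roughly 0.685 for Reader 1 and 0.817 for Reader 2, close to the reported 0.68 and 0.81, and both exact P values are well below 0.001. The test is driven entirely by the discordant cells (44 versus 6, and 27 versus 2), which show that most changes between editions were from non-dense to dense.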
Discussion
In the present study, the density distribution under the BI-RADS 4th edition
guidelines was as follows for the breast radiologist: 67% for the non-dense breast
and 33% for the dense breast; the distribution for the resident was 64.5% for the
non-dense breast and 35.5% for the dense breast. Before the introduction of the
BI-RADS 5th edition, the non-dense and dense breast tissues were nearly equally
distributed within the general screening population, with 10% almost entirely fatty,
40% scattered fibroglandular, 40% heterogeneously dense, and 10% extremely dense
(26–28). When using the BI-RADS 4th edition in
the present study, the non-dense breast was seen more frequently than the rate
reported in the literature. Although the proportion of dense breasts increased and the
distribution became more balanced with the 5th edition, the most commonly seen
pattern in our study group was again the non-dense pattern. The breast radiologist
reported 55.4% as non-dense breasts and 44.6% as dense breasts, whereas the resident
reported 57% as non-dense breasts and 43% as dense breasts. This discrepancy may
be attributable to differences in the study group characteristics, such as
age, body mass index, menopausal status, and whether the examination was for
screening or diagnostic purposes. Of these parameters, we could only evaluate the
mean age of the study group, which was 51 years; it was not greater than the mean
age of the general screening population reported in the literature (29,30). Another reason for this density
distribution might be geographical variation, but there is no published
literature about the breast density of Turkish women.
To ensure consistency with the breast density assessments, the present study analyzed
the extent of breast density classification and agreement levels when the 4th and
5th editions of BI-RADS were used by two readers with different experience levels.
The results showed near-perfect intra-reader agreement when using both editions with
a four-category scale. The results were similar when using a two-category scale.
However, there was no evidence indicating that the consistency changed as a result
of different experience levels. Eom et al. (31) used a
similar methodology with six readers at three different levels of experience (two
breast imaging experts, two general radiologists, and two medical students) and
showed substantial to near-perfect intra-reader agreement when using the BI-RADS 5th
edition. They also found no statistically significant difference with regard to the
different levels of experience for the intra-reader agreement. Additionally, they
studied the agreement between the Volpara automated volumetric breast density
measurements and those obtained using the BI-RADS; they found that expert
radiologists showed higher consistency with regard to the volumetric and qualitative
assessments than did the students (31).
In the literature, intra-reader agreement levels were found to be substantial to
nearly perfect (18,22,23,31,32). Similar to the results reported in the
literature, we found a high intra-reader agreement level, suggesting that visual
methods might be reproducible without any effect from the experience level of the
reader. However, it should be mentioned that reliability might be more important
than reproducibility; the variation in radiologist assessment of breast density may
present a problem in screening strategies that use breast density to personalize
screening methods. To our knowledge, one of the earliest studies, by
Kerlikowske et al. (5),
showed that intra-radiologist agreement was higher than inter-radiologist
agreement with regard to breast density analysis, a finding similar to
ours. Both of our readers showed near-perfect consistency in their judgment
regardless of the BI-RADS version used. However, the inter-reader agreement was
slightly lower in the present study. To overcome this subjectivity, several
automated software programs have been developed for density quantification; these
provide a highly reproducible (13–16) and objective method to measure
density. In the present study, we did not aim to compare the reliability of BI-RADS
with other qualitative methods, as our focus was on the reproducibility of breast
density measurements under different conditions.
Inter-reader agreement varies widely in the literature, with kappa
values indicating slight agreement (17), moderate agreement (18,19), substantial agreement (20–23), and almost-perfect agreement (24). In the aforementioned
studies, the readers were chosen with different levels of experience and different
editions of the BI-RADS lexicon were used. Additionally, the weighting scheme used in the
kappa analysis (linear or quadratic weights) may affect the agreement level. For example, the
best inter-reader agreement was presented in the study by Østerås et al. (24), which used
quadratic-weighted kappa values, unlike the linear weights used in our study.
Additionally, the radiologists in the aforementioned study went through a training
program, which increased compliance. Those differences in methodology might
account for the different agreement levels. In the present study, inter-reader
agreement between the breast radiologist and resident was substantial. However, it
should be noted that during the study period, the resident worked in the same
department as the breast radiologist. Also, because the present study was
conducted when the resident was training in breast imaging, this might have
influenced the high agreement levels obtained. Whether this level of agreement can
be sustained by the resident over time should be evaluated.
Additionally, the use of different BI-RADS editions may affect consistency. The
study by Ekpo et al. (23)
was the first to use the 5th edition of BI-RADS for calculating the intra-reader and
inter-reader agreement. Their results showed similar agreement levels to those of
the present study. They found substantial to almost-perfect inter- and
intra-reader agreement levels with 1000 cases evaluated by five radiologists from
the same institution. Their study differed from ours in that the readers comprised
West African College of Radiology board-certified general radiologists with the
number of years of experience in the range of 9–16 years (median = 11 years). In the
present study, the resident showed similar inter-reader agreement with the breast
radiologist, even with a low experience level. To our knowledge, there is currently
no study in the literature that evaluates the effect of the reader’s experience
level on compliance.
Like the present study, several studies have aimed to clarify whether replacing the 4th edition of the BI-RADS guidelines with the 5th edition
makes the assessment of breast density more subjective, because the
percentage assessment is abandoned, or more consistent, because readers only
have to analyze the possibility of lesions being obscured by fibroglandular
tissue (31–35). In 2016, Irshad et al. (33) reported lower
inter-reader agreement using the 5th edition (k = 0.57) than when using the 4th
edition (k = 0.63). In our study, we found substantial inter-reader agreement, which
is higher than that reported by Irshad et al. Consistent with their findings, with the exception of
category 4, the inter-reader agreement of individual categories slightly decreased
when using the 5th edition, even if it stayed within the same agreement level (Table 1). The inter-reader
agreement of category 4 changed from moderate to substantial when using the 5th
edition (k = 0.49 and k = 0.61, respectively), a result that is new to the
literature. A possible reason is that defining lesions is the radiologist’s job,
so judging which densities are most likely to obscure a lesion should come naturally
to a radiologist.
The majority of studies in the literature that analyzed the changes when using the
5th edition instead of the previous version found statistically significant differences between
the dense and non-dense groups; this might affect the screening approach, and
different supplemental methods may come into question (33–35). A study by Alikhassi et al. (32), which used a
methodology similar to the present study, showed that the percentage of dense
breasts was higher when radiologists used the 5th edition of the ACR BI-RADS
guidelines than when they used the 4th edition. This finding was not in agreement
with those reported in the literature; however, this difference was not found to be
statistically significant in their study. Their discrepant result may be
attributable to the low number of mammograms (n = 72) they analyzed. In the present
study, we found statistically significant differences between the dense and
non-dense distributions when using the BI-RADS 5th edition—results that are
consistent with those in the literature. It should be understood that this
difference also occurred with the resident’s analysis. This may present an
opportunity to include an additional screening test, such as an ultrasound
examination, that could identify additional mammographically occult breast cancers.
On the other hand, this additional workload may increase healthcare costs. Further
studies are needed to evaluate the effectiveness of adding supplemental tools for
personalized screening when using the BI-RADS 5th edition for breast density
assignment. The effect of increasing the use of adjuvant screening tools should also be
evaluated.
The present study has some limitations. First, it was performed by two radiologists
with two different levels of experience at the same institution. This may have led
to similar assessments based on shared practice patterns. Also, the small sample
size used in the study might limit the generalizability of the results. A larger,
multi-institutional study may yield more accurate findings.
In conclusion, intra- and
inter-reader agreement levels were high when using both versions of the BI-RADS
guidelines, regardless of differences in experience level, which might be seen as a sign of consistency. Because the percentage of
dense breasts was higher when the radiologists used the 5th edition of the ACR
BI-RADS, there may be an opportunity to introduce a second step for a more
personalized screening approach. Future studies might also analyze the effect of
changing the BI-RADS version on the different density categories individually.