Errors in Mammography Cannot be Solved Through Technology Alone

Ernest Usang Ekpo¹, Maram Alakhras, Patrick Brennan.

Abstract

Mammography has been the frontline screening tool for breast cancer for decades. However, high error rates in the form of false negatives (FNs) and false positives (FPs) have persisted despite technological improvements. Radiologists still miss between 10% and 30% of cancers, while 80% of women recalled for additional views have normal outcomes and 40% of biopsied lesions are benign. Research shows that the majority of missed cancers are actually visible and looked at, but either go unnoticed or are deemed to be benign. Causal agents for these errors include human-related characteristics that contribute to faulty search, perception and decision-making behaviours. Technical, patient and lesion factors are also important; these relate to positioning, compression, patient size, breast density and the presence of breast implants, as well as the nature and subtype of the cancer itself, where features such as architectural distortion and subtypes such as triple-negative cancers remain challenging to detect on screening. A better understanding of these causal agents, as well as the adoption of technological and educational interventions that audit reader performance and provide immediate perceptual feedback, should help. This paper reviews the current status of our knowledge around error rates in mammography and explores the factors impacting them. It also presents potential solutions for maximizing diagnostic efficacy, thus benefiting the millions of women who undergo this procedure each year.

License: Creative Commons Attribution License

Keywords: Mammography; radiographic image interpretation; cancer detection; diagnostic imaging; radiological errors

Year: 2018 PMID： 29479948 PMCID： PMC5980911 DOI： 10.22034/APJCP.2018.19.2.291

Source DB: PubMed Journal: Asian Pac J Cancer Prev ISSN： 1513-7368

Introduction

Breast cancer is the most prevalent female cancer; it is the most frequent cause of cancer death in females in low and middle-resource countries and the second leading cause of cancer-related deaths in the developed world (Desantis et al., 2013; Ferlay et al., 2013; Jemal et al., 2011; Lauby-Secretan et al., 2015). Survival rates have improved due to improvement in early detection and treatment strategies (Njor et al., 2012). Early detection of the disease can be achieved through breast self-examination and clinical examination; however, some breast lesions are non-palpable and require visual assessment through imaging (Hou et al., 2002; Rosen et al., 1999; Tartter et al., 1999). Screening mammography is the frontline imaging tool for early detection and has been credited with 30% to 40% reduction in mortality from breast cancer (Lauby-Secretan et al., 2015; Njor et al., 2012; Paci, 2012). However, mammography may have some drawbacks such as false positive (FP) and false negative (FN) screening outcomes. Normal breast parenchymal perturbations on mammograms or benign lesions may mimic breast cancer, and may lead to wrong diagnosis of cancer when none is present (FP errors) (Castells et al., 2016). FP errors result in low positive predictive value (PPV) (10%), psychological effects (Molina et al., 2016), and recall for further assessment (Vernacchia and Pena, 2009). On the other hand, cancer may be present in a mammogram but missed by breast readers and constitute a FN error (Evans et al., 2013). Some of the cancers missed on screening may be clinically and mammographically occult (Buchberger et al., 2000; Tartter et al., 1999) or demonstrate subtle radiographic features that are difficult to perceive (Banik et al., 2011). However, some of the missed cancers are visible on mammograms but are either not identified or are disregarded by breast readers (Evans et al., 2013; Maxwell, 1999). 
Such FN errors account for 1.3% to 45% of missed cancers, with error rates determined by the subtype and characteristics of the cancer (Banik et al., 2011; Evans et al., 2013; Maxwell, 1999). The literature demonstrates considerable variation in error levels (Evans et al., 2013; Maxwell, 1999), which results from differences in the physical characteristics of the population being screened (Ekpo et al., 2015; Mandelson et al., 2000), technological and technical factors during image acquisition (Holland et al., 2016; Popli et al., 2014), the subtypes and radiographic features of breast lesions (Bird et al., 1992; Burrell et al., 2001; Suleiman et al., 2016a), and the characteristics of breast imaging readers and reading conditions (Brady et al., 2012; Kerlikowske et al., 1998). The basis of detection and interpretative errors appears to be a combination of human and technological limitations; however, this is not well understood. We urgently need to address this gap in knowledge so that reliable and accurate screening strategies can be developed using more effective technology and education. An understanding of how patient, technical, lesion, and reader-related factors impact upon mammography errors, and of ways of mitigating them, may be the key to improved early cancer detection and further reduction in mortality from breast cancer. Therefore, this paper explores factors responsible for mammography errors including characteristics of the screening population, types and radiographic features of breast lesions, technological and technical factors, characteristics of breast imaging readers, and reading environment. It also examines the types of errors made and potential solutions to mitigate these errors.
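The FP and FN outcomes defined above map onto the standard screening metrics used throughout this review (sensitivity, specificity, PPV and recall rate). The following is a minimal sketch of how these are computed from outcome counts. The class name, function names and the cohort numbers are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningCounts:
    """Outcome counts for one screening round (hypothetical container)."""
    true_pos: int   # cancer present and woman recalled
    false_neg: int  # cancer present but not recalled (missed cancer)
    false_pos: int  # no cancer, but woman recalled
    true_neg: int   # no cancer and not recalled

    @property
    def total(self) -> int:
        return self.true_pos + self.false_neg + self.false_pos + self.true_neg


def sensitivity(c: ScreeningCounts) -> float:
    """Fraction of women with cancer who were recalled."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ScreeningCounts) -> float:
    """Fraction of women without cancer who were not recalled."""
    return c.true_neg / (c.true_neg + c.false_pos)


def ppv(c: ScreeningCounts) -> float:
    """Fraction of recalled women who actually have cancer."""
    return c.true_pos / (c.true_pos + c.false_pos)


def recall_rate(c: ScreeningCounts) -> float:
    """Fraction of all screened women who were recalled."""
    return (c.true_pos + c.false_pos) / c.total


# Illustrative cohort only: 10,000 women screened, 50 with cancer.
cohort = ScreeningCounts(true_pos=40, false_neg=10, false_pos=360, true_neg=9590)

print(f"sensitivity = {sensitivity(cohort):.3f}")
print(f"specificity = {specificity(cohort):.3f}")
print(f"PPV         = {ppv(cohort):.3f}")
print(f"recall rate = {recall_rate(cohort):.3f}")
```

With these illustrative counts, sensitivity is 0.80 and specificity is about 0.96, yet the PPV is only 0.10 because cancers are rare in the screened cohort: most recalled women do not have cancer even when the reader performs well.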

1. Factors influencing mammographic performance

This section examines factors affecting the outcome of mammographic image interpretation. These include characteristics of the screening population, lesion characteristics as well as technological, technical and reader related factors and reading conditions.

1.1 Characteristics of the screening population

Patients’ physical characteristics such as body habitus, breast density (Ekpo et al., 2015; Freer, 2015), use of hormone replacement therapy (HRT) (Banks et al., 2006; Carney et al., 2003; Kavanagh et al., 2005), breast augmentation and implants, as well as disease prevalence (Fajardo et al., 1995; Handel, 2007; Miglioretti et al., 2004) may impact upon the detection of breast cancer using mammography. Aspects of body habitus such as body mass index (BMI) and breast size have been shown to affect the performance of mammography (Bassett et al., 2000; Elmore et al., 2004; Hunt and Sickles, 2000; Njor et al., 2016; Popli et al., 2014). These two characteristics are inter-related, with overweight (BMI: 25 – 29.9) and obese (BMI: ≥30) women demonstrating larger breasts (Brown et al., 2012). A 3.0% to 38% increase in mammographic sensitivity has been reported for overweight women (BMI: ≥24.9) compared with those who have normal BMI (Banks et al., 2004; Njor et al., 2016); however, no significant differences in recall rates and specificity between these categories of women have been reported (Banks et al., 2004; Elmore et al., 2004; Njor et al., 2016). There are various potential causal agents with regard to breast size which may affect cancer detection: the small-sized imaging plates used in mammography systems have been shown to pose difficulties in positioning large floppy breasts, and often negatively affect compression, compromise image sharpness, and increase blur (Bassett et al., 2000; Popli et al., 2014). Inadequate positioning and compression of large breasts increase tissue superimposition and non-uniformity, and imaging large breasts on small-sized imaging plates may result in incomplete coverage of breast regions such as the posterior portions and pectoral muscles (Bassett et al., 2000; Popli et al., 2014). Together, these negative confounders may increase the chances of errors during mammography interpretation.
Breast composition also influences the difficulty of cancer detection in mammography. Both dense breast tissue (breast density being the proportion of the breast composed of fibroglandular tissue) and solid breast cancers are mammographically radiodense (Ekpo et al., 2015). Dense tissue therefore produces a masking effect, which reduces the ability to visualize cancer in mammograms (Ekpo et al., 2015; Mandelson et al., 2000; Pisano et al., 2008). Studies have consistently demonstrated lower sensitivity in dense breasts, ranging from 27% to 70.1% compared to 90% in fatty breasts (Carney et al., 2003; Mandelson et al., 2000; Pisano et al., 2008; Rosenberg et al., 1998). A recent study has reported an association between breast density and recall for additional examination (Ekpo et al., 2016a). Dense breasts have also been shown to account for a 16% higher incidence of interval breast cancer relative to fatty breasts (Boyd et al., 2014). Breast density is inversely related to BMI (Ekpo, 2016a); it is therefore unsurprising that women with low BMI demonstrate 3.0% to 38% reduced sensitivity with mammography, as discussed previously (Banks et al., 2004; Njor et al., 2016).
Age is another factor that affects cancer detection in mammography. Mammography is performed with women standing erect, a position that is difficult for frail elderly women to maintain, especially with the pain from compression. This can cause inadequate positioning and motion blur, reducing the visibility of image details and the reader's ability to detect microcalcifications (Abdullah et al., 2017; Popli et al., 2014; Rosen et al., 2002). The composition of the breast changes with age, with women younger than 50 years demonstrating higher breast densities than older women (McCormack and dos Santos, 2006). This age-related difference in breast composition has been shown to be accompanied by differences in mammography performance (Carney et al., 2003; Rosenberg et al., 1998).
The reported sensitivity of mammography for women younger than 50 years varies from 54.0% to 78.0% compared with 78% to 85% in women aged 70 years and older (Carney et al., 2003; Houssami et al., 2003; Keen and Keen, 2008). This is reasonable given that the breast undergoes atrophic changes with ageing and becomes replaced by fat, which appears radiolucent relative to cancer lesions thereby enhancing the visibility of these lesions on mammograms. On the other hand, masking effect produced by dense tissue reduces the visibility of cancer in mammograms of younger women (Boyd et al., 2007). The limitation of mammography in younger women has given rise to recommendations for use of other imaging modalities such as ultrasound for imaging this category of women (Devolli-Disha et al., 2009; Houssami et al., 2003). Exogenous hormonal agents play a critical role in breast tissue changes as demonstrated by the variations in the mammographic appearance of breast parenchyma between users and non-users of hormonal substances (Buist et al., 2004). HRT use has been shown to lower mammography performance by increasing FP recall (Banks et al., 2006). The literature shows that HRT use is associated with 7%-22% and 12%-50% reductions in mammographic sensitivity and specificity respectively (Banks et al., 2006; Carney et al., 2003; Kavanagh et al., 2005). Reduced mammography performance has been shown to be particularly pronounced in users of HRT regimens containing estrogen and progesterone (Banks et al., 2006), which can be attributed to their effect on increasing breast density (Buist et al., 2004). This is further supported by the fact that hormonal agents such as tibolone, which do not increase breast density (Ekpo, 2016b), also do not affect mammography performance (Banks et al., 2006). HRT use is also associated with a higher risk of benign breast disease (Rohan and Miller, 1999), a determinant of FP mammography outcome and risk of subsequent cancer (Castells et al., 2013). 
Thus, the impact of hormonal agents on mammography performance should be considered when referring patients for screening and when interpreting their mammograms in order to minimise FP recall.
Breast implants and augmentation materials such as silicone have been shown to affect mammographic performance and can lead to omission errors, mainly because of their high X-ray attenuation coefficient and opaque appearance on mammograms (Handel, 2007). It has been reported that breast augmentation and implants produce capsular contracture that obscures the visualization of breast tissue by 15% to 88%, depending on the size and opacity of the material (Handel et al., 1992; Silverstein et al., 1991). They have been shown to limit the detection of breast cancer in mammograms by about 22% (Handel, 2007; Miglioretti et al., 2007). Augmentation mammoplasty is also associated with scarring and distortion of the breast parenchyma (Handel, 2007; Hayes et al., 1988), and the opacities created by these scars may mammographically mimic malignant calcifications and subtle cancer signs such as architectural distortion (AD) (Handel, 2007; Silverstein et al., 1991). Previous breast-conserving surgery alters breast architecture (Piroth et al., 2014), and has been shown to reduce mammography sensitivity by 9.1% to 10% (van Breest et al., 2012a; van Breest et al., 2012b). Such surgical procedures are also associated with FP errors, with FP rates doubling in women who have had post-surgical radiotherapy (Holli et al., 1998). Together, the physical characteristics discussed above have the potential to limit adequate visualization of the breast parenchyma for features of cancer. They can also cause parenchymal perturbations that may increase the likelihood of FP errors.
The prevalence of disease in the population may also impact upon interpretive performance by affecting reader expectation and influencing their search, perceptual and decision-making behavior and confidence (Evans et al., 2013; Gur et al., 2007; Reed et al., 2011). Studies have shown that low disease prevalence lowers readers’ level of concentration, and that normal images tend to attract more scrutiny and fixations at higher disease prevalence (Evans et al., 2013; Gur et al., 2007; Reed et al., 2011). Evidence shows that with increased prevalence, visual search increases and confidence that a normal image is in fact normal decreases, however confidence for abnormals remains unchanged (Fanshawe et al., 2016; Gur et al., 2007; Reed et al., 2011). Therefore, high disease prevalence may increase FP errors and recall rates in a screening scenario.

1.2 Types and radiographic features of breast lesions

The variability in lesion morphology and its effect on the heterogeneity of the breast parenchyma makes cancer detection and characterization challenging (Bird et al., 1992; Popli, 2001). Lesion characteristics and mammographic appearances that have been shown to impact upon radiographic detection and characterization include size, shape, density, margins and subtlety (Bird et al., 1992; Burrell et al., 2001; Mello-Thoms et al., 2014). Lesion location and its impression on adjacent breast tissues is also an important factor affecting cancer detection (Bird et al., 1992; Burrell et al., 2001; Mello-Thoms et al., 2014). Small-sized lesions have been shown to be more difficult to detect than larger ones (Malich et al., 2003; Mello-Thoms et al., 2014). Even when lesions are clearly visible, their shape, margins and density would determine their classification as benign or malignant (Popli, 2001). Radiologically, round and oval masses with fatty or low-fat content and well-defined borders are associated with benign conditions (Popli, 2001). Isodense masses with lobulated, obscured and ill-defined or indistinct margins are classified as suspicious (James et al., 2010; Lee et al., 2014; Popli, 2001). Highly suspicious lesions are of high density, irregular in shape, with spiculated, ill-defined or indistinct margins (James et al., 2010; Lee et al., 2014; Popli, 2001). Masses demonstrating these characteristics are easily detected (Bird et al., 1992), however, about 10% of malignant lesions demonstrate benign features (round, oval, well-defined), and sometimes spiculations and parenchymal changes induced by malignant masses may be subtle and difficult to perceive (Bird et al., 1992; Roberts-Klein et al., 2011). These scenarios may lead to potentially malignant lesions being overlooked or misinterpreted, with wrong interpretation accounting for 52% of errors in mammography (James et al., 2010; Lee et al., 2014). 
Previous work has shown that cancer subtype and characteristics influence the difficulty of detection and characterization with mammography, with triple-negative breast cancers and invasive lobular carcinoma being more difficult to detect on screening, yet constituting the most common subtypes of interval cancer (Caldarella et al., 2013; Domingo et al., 2010; Hoff et al., 2012; Johnson et al., 2015; Lowery et al., 2011; Raposo et al., 2012; Rayson et al., 2011; Sung et al., 2016). Lesion subtlety is also a determinant of cancer detection, and contributes to 43% of mammographically missed cancers (Bird et al., 1992). Subtle lesions such as architectural distortion (AD) are particularly difficult to detect (Gaur et al., 2013; Suleiman et al., 2016a) or characterize, which may be due to the plethora of conditions that share its radiographic features. In mammograms, AD features may be due to malignancies such as invasive lobular carcinoma and ductal carcinoma in-situ, or to benign conditions such as radial scars, previous surgery, trauma, sclerosing adenosis, infection, and fat necrosis (Gaur et al., 2013). Lesion features such as non-specific or asymmetric densities, isodensity to fibroglandular breast tissue, indistinct margins, and absence of calcifications and ductal dilatations have also been shown to account for missed cancers (Hoff et al., 2012; Majid et al., 2003). Although the visibility of these features has improved in the digital era, they have low PPV and may be overlooked (Hoff et al., 2012). Thus, cancer subtype and subtlety, as well as the low PPV associated with the features described above, may limit perception or reporting of perturbations produced by malignancy. This emphasises the need for human and technological interventions to ensure early detection of these missed signs.

1.3 Impact of technical factors, image quality, and mammography technology on image interpretation

The conspicuity of lesions in their background depends on the technical and radiographic quality of the images produced (Ekpo et al., 2014). Technical factors such as positioning, compression and exposure conditions affect the amount of breast tissue imaged, tissue spread and the visibility of lesion features (Holland et al., 2016). The difficulty of positioning large floppy breasts and older women, and its impact on the accuracy of image interpretation, have been discussed in section 1.1 (Bassett et al., 2000; Popli et al., 2014). Inadequate positioning is 26% more common in women with BMI ≥30 compared to those having normal BMI (Guertin et al., 2014), and mammograms with inadequate positioning, image quality and compression have been shown to demonstrate 18% lower sensitivity compared to those of good overall quality (Taplin et al., 2002). A study has also reported varying PPV at different degrees of compression, with the highest compression demonstrating reduced lesion detection (Holland et al., 2016). Together, these findings indicate that technical factors contributing to missed cancers must be addressed.
Advances in imaging technology have led to the transition from screen-film mammography (SFM) to digital mammography (DM). Although SFM has better spatial resolution compared to DM, it has lower contrast resolution (Faridah, 2008). Whilst spatial resolution is relevant to the detection of high spatial frequency lesions such as microcalcifications, the higher contrast resolution of digital systems allows better differentiation of densities as well as of normal and diseased tissues on an image (Faridah, 2008). In addition, the post-processing capabilities of DM offer opportunities to manipulate image contrast to suit a particular detection task.
However, studies comparing the diagnostic performance of SFM and DM have generated conflicting results (Bluekens et al., 2012; Hambly et al., 2009; Pisano et al., 2005; Pisano et al., 2008; Skaane, 2009), with many demonstrating comparable or slightly better cancer detection performance of DM in all breast compositions (Bluekens et al., 2012; Hambly et al., 2009; Pisano et al., 2005; Skaane, 2009), albeit with higher FP recalls (Hambly et al., 2009; Skaane and Skjennald, 2004). Despite important advances in mammography technology, the sensitivity of DM is still below optimal levels and varies between readers (Clauser et al., 2016; Pisano et al., 2005; Pisano et al., 2008). It is clear that improvement in imaging technology is not the only solution for removing detection errors, and instead we must identify and remedy human errors limiting breast cancer detection, if early breast cancer diagnosis is to be transformed.

1.4 Reader characteristics and interpretative performance

Evidence shows wide inter-reader differences in cancer detection with mammography, suggesting that humans are a major determinant of mammography performance (Evans et al., 2013; Jackson et al., 2015; Maxwell, 1999). It also suggests that reader characteristics such as experience, specialization, number of mammograms read per year, and other factors may influence error rates. Of these, the number of years reading mammograms and the number of cases read per year are the most widely studied observer characteristics (Suleiman, 2016b). Experience is quantified using parameters such as specialization in breast radiology, number of years of reading mammograms, and number of mammograms read per year (Rawashdeh et al., 2013). Considered in isolation, these factors have generated contradictory outcomes. For example, Sickles et al. (2002) reported a two-fold higher cancer detection rate for specialist radiologists compared to general radiologists, but this was not always consistent in other studies comparing radiologists and radiographers (Debono et al., 2015; Torres-Mejía et al., 2015). Whilst the number of years of reading mammograms has been shown to improve performance in some studies (Rawashdeh et al., 2013; Suleiman, 2016b), it has demonstrated no significant effect on observer performance in others (Jackson et al., 2015; Suleiman et al., 2014a). The number of mammograms read per year has been reported as a good indicator of observer performance, and a potential optimisation tool for breast screening programs (Rawashdeh et al., 2013; Suleiman, 2016b). However, other studies have reported either no relationship or an inverse relationship between number of cases read per year and performance (Beam et al., 2003; Molins et al., 2008).
It has also been shown that beyond a threshold annual volume read, performance stagnates (Rawashdeh et al., 2013) or begins to decline (Kan et al., 2000), suggesting that the impact of volume read per year on performance may be threshold dependent. The contradictory findings on the impact of volume read on performance may be due to differences in other reader characteristics such as experience, hours spent reading mammograms per week, and importantly, the ability to identify normal image features (specificity) (Rawashdeh et al., 2013). Other confounding factors are practice-related and include double reading practices and the number of diagnostic and interventional procedures performed (Beam et al., 2003), with radiologists who participate in their own diagnostic mammography and have higher volumes of work-ups demonstrating better outcomes compared to those who do not (Buist et al., 2014). Despite the conflicting evidence on the effect of the number of mammogram cases read per year on reader performance, it is used as a criterion for certification in many countries (Rawashdeh et al., 2013). For example, certification for mammography reporting in the USA requires 960 cases read every two years, whilst in Australia and European countries the requirements are 2000 and 5000 cases read per year respectively (Suleiman, 2016b). However, these differences do not reflect inter-country variations in cancer detection performance by radiologists (Suleiman et al., 2014b; Suleiman, 2016b). In fact, a recent study showed no difference in performance between Australian and USA radiologists for breast cancer detection (Suleiman et al., 2014a), suggesting that parameters other than volume read per year may also be key determinants of performance. The findings above demonstrate the complexity of using volume read and number of years of reading mammograms to quantify experience.
Although radiologists receive training in breast image interpretation, they are exposed to different levels of mentorship, disease prevalence, and working conditions during the course of practice (Beam et al., 2003). These differences may impact differently on expertise and consequently on mammography interpretation performance (Beam et al., 2003). The available evidence emphasises the need for programs and interventions that expose readers to continuous training and mentorship. It also underscores the need for platforms that identify errors, provide immediate feedback, and identify ways of mitigating these errors.

1.5 Impact of reading environment and workload on accuracy of image interpretation

Image interpretation requires both perceptual and cognitive processes, which can be affected by ambient lighting and distractions in the reading environment. Ambient light increases reflection (diffuse and specular) and glare; however, no study has reported a significant difference in observer performance between low and moderate ambient light conditions (Chawla and Samei, 2007; Pollard et al., 2009; Pollard et al., 2012). Studies have shown that distractions may lead to attentional deficit and affect radiologists’ perceptual and cognitive functions. For example, distraction due to phone calls has been reported to account for a 12% discrepancy error among radiology residents (Balint et al., 2014), and negatively affects task completion time (Williams and Drew, 2017). The number of patients undergoing screening mammography daily has increased exponentially, resulting in increased workload for radiologists. This condition is exacerbated by the increasing volume of radiological data generated with the advent of 3D imaging modalities. The number of hours spent reading these images may cause fatigue and oculomotor strain and reduce image interpretation accuracy (Krupinski et al., 2010a). Studies have reported a significant reduction in image interpretation accuracy after prolonged periods of reporting (Krupinski, 2010a; Krupinski et al., 2010b; Krupinski et al., 2012); for example, a recent study reported that fatigue contributed to satisfaction of search (Krupinski et al., 2012). Thus, reading conditions contribute to radiological errors and need to be optimised to mitigate missed cancers.

1.6 Impact of patient’s clinical history on accuracy of image interpretation

There are contentions that the availability of clinical history at the time of image interpretation may lead to biases and cognitive heuristics such as anchoring (locking on to salient evidence early in the interpretation process), availability (making biased judgments based on what frequently comes to mind), confirmation (looking for evidence to confirm a particular disease), representativeness (decision-making based on similarity to a mental prototype), and search satisficing (abrupt termination of search after identification of irrelevant features) (Crowley et al., 2013). However, studies to date have reported that providing good clinical information either improves diagnostic performance (Berbaum et al., 1988; Doubilet and Herman, 1981; Leslie et al., 2000; Loy and Irwig, 2004) or, at worst, has no effect on it (Cooperstein et al., 1990; Good et al., 1990).

2. Common human errors limiting mammography interpretation

Radiographic image interpretation involves evaluation and organisation of image information to make a diagnostic decision. These processes can be challenging, particularly with mammography, due to the heterogeneity of the breast parenchyma, anatomical noise arising from dense tissue masking (Ekpo et al., 2015), and the subtlety of some breast cancer types (Gaur et al., 2013). In other situations, the characterization of detected lesions can be difficult and depends on the presenting features of the lesion and the knowledge of the image reader, as discussed earlier (Bird et al., 1992; Roberts-Klein et al., 2011). These factors may cause lesions to be concealed, or may cause conspicuous and fixated cancer lesions to be ignored (Bird et al., 1992; Roberts-Klein et al., 2011). The interpretation process involves search, perception and decision-making (Berlin, 2014; Brady et al., 2012; Bruno et al., 2015; Pinto and Brunese, 2010). A fault in any of these processes results in an interpretative error.

2.1 Search errors

Search errors ensue from inadequate scanning of the image, resulting in non-fixation (no visual attention or dwell) on the perturbations produced by cancer (Brady et al., 2012). Search errors may also arise from premature termination of search due to identification of stimuli elicited by another disease condition, which may be irrelevant and unconnected to the disease of concern, a situation referred to as “satisfaction of search (SOS)” (Fleck et al., 2010). Search errors have been estimated to account for 42% of error in DM (Palazzetti et al., 2016; Pinto and Brunese, 2010), and vary with readers’ experience, workload and fatigue (Berbaum, 2010; Berlin, 2014; Nodine et al., 1996).

2.2 Perceptual errors

Sometimes malignant lesions that are only briefly fixated (<0.48 seconds) by radiologists go unreported, which constitutes a perceptual error (Bruno et al., 2015; Krupinski, 2010c). Perceptual errors account for 31% of errors in DM (Krupinski, 2010; Palazzetti et al., 2016; Samei, 2010) and may be caused by insufficient stimuli from the lesion(s) and the nature of the background (Mello-Thoms, 2006). Perceptual error rates vary between radiologists and are inversely related to experience and the availability of adequate clinical information. A report suggests that perceptual errors may be due to poor pattern recognition skills (Krupinski, 2010c).

2.3 Decision-making errors

Decision-making errors occur when the region containing the lesion is fixated for a prolonged period (>0.48 seconds), but the lesion is misidentified (Bruno et al., 2015). Decision-making errors account for 37% of errors in DM (Krupinski, 2010c; Nodine et al., 2002; Samei, 2010), and may be due to readers’ poor knowledge of the radiographic features of the disease and poor judgment (Mello-Thoms, 2006; Samei, 2010). Other causal factors include fatigue, absence of prior images and inadequate clinical history (Krupinski, 2010c; Nodine et al., 2002). Decision-making errors are also influenced by recall rate recommendations, with lower recall rates associated with higher specificity and lower sensitivity values and vice versa (Norsuddin et al., 2016). A recent study showed that non-specific densities and AD lesions classified as malignant at free recall were dismissed at policy-driven fixed (target) recall rates (Norsuddin et al., 2016). It is therefore important that the factors contributing to these errors are addressed.
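The three error types in sections 2.1 to 2.3 are distinguished by how long the reader's gaze dwelt on the lesion region. The sketch below expresses that rule for a missed lesion, assuming a single cumulative dwell time from eye tracking is available. The function and enum names are hypothetical, and assigning a dwell of exactly 0.48 seconds to the decision-making category is an assumption, since the text only specifies "<0.48" and ">0.48".

```python
from enum import Enum

# Dwell-time threshold (seconds) quoted in sections 2.2 and 2.3.
DWELL_THRESHOLD_S = 0.48


class MissType(Enum):
    SEARCH = "search"              # lesion region never fixated
    PERCEPTION = "perception"      # fixated briefly, not reported
    DECISION = "decision-making"   # fixated at length, misidentified


def classify_missed_lesion(dwell_seconds: float) -> MissType:
    """Classify a missed lesion by cumulative gaze dwell on its region.

    Assumption: a dwell of exactly DWELL_THRESHOLD_S counts as prolonged.
    """
    if dwell_seconds < 0:
        raise ValueError("dwell time cannot be negative")
    if dwell_seconds == 0:
        return MissType.SEARCH
    if dwell_seconds < DWELL_THRESHOLD_S:
        return MissType.PERCEPTION
    return MissType.DECISION


# Hypothetical dwell times for three missed lesions.
for dwell in (0.0, 0.2, 1.5):
    print(dwell, classify_missed_lesion(dwell).value)
```

This is a simplification: in practice the category also depends on what the reader reported, not on dwell time alone.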

3. Possible strategies to reduce errors in mammography interpretation

Despite advances in mammography technology and image quality improvement in the digital era, limitations remain around diagnostic efficacy. Therefore, human factors affecting the accuracy of image interpretation must be remedied to improve the detection of missed cancers. Potential solutions include optimisation of technical and display parameters, adoption of a double reading strategy, and technological and educational interventions.

3.1 Optimisation of technical and display parameters

Technical parameters that affect cancer detection, such as positioning and compression, need to be optimised to ensure adequate visibility of breast tissue. It is important that radiographers adopt correct positioning to ensure nipple alignment and inclusion of the pectoral muscles (Taplin et al., 2002). Appropriate compression force should be applied to spread breast tissue uniformly and enhance the visibility of breast parenchyma and subtle cancers (Holland et al., 2016). Display tools and the reading environment should be optimised to reduce reflection, glare, and reader fatigue (Krupinski et al., 2010b; Waite et al., 2016). Also, population-based mammography screening programs must monitor the technical quality of mammograms carefully and regularly to ensure that they are fit for purpose.

3.2 Double reading strategy

Many screening programs, including the Dutch Nationwide Breast Cancer Screening Program and BreastScreen Australia, employ an independent double reading strategy with arbitration to improve cancer detection rates. This strategy accounts for differences in human perceptual and cognitive abilities and has been shown to improve cancer detection by between 5.6% and 15% (Duijm et al., 2007; Duijm et al., 2008). Studies have explored changing the order of reading, where the second reader interprets a batch of mammograms in the opposite order to the first reader to overcome vigilance diminution; however, this has not been shown to improve the effectiveness of the double reading strategy (Taylor-Phillips et al., 2016; Taylor-Phillips et al., 2014). It should be acknowledged, however, that double reading increases financial cost (Posso et al., 2016), has wide inter-reader disagreement, and is time consuming (Ekpo et al., 2016c; Redondo et al., 2012). To reduce the limitations of cost and time, computer systems have been designed to act as a second reader (Taylor and Potts, 2008).
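
A simple probability sketch shows why a second reader can raise detection and why discordant reads need to be resolved. It assumes two readers who err independently and that a case is recalled if either reader flags it. Both are idealisations: real readers' misses are likely to be correlated, and the programs named above arbitrate discordant reads rather than recalling automatically. The sensitivity and specificity values are invented and are not taken from the cited studies.

```python
def either_reader_sensitivity(sens_a: float, sens_b: float) -> float:
    """Probability that at least one of two independent readers detects a cancer."""
    return 1.0 - (1.0 - sens_a) * (1.0 - sens_b)


def either_reader_specificity(spec_a: float, spec_b: float) -> float:
    """Probability that neither of two independent readers recalls a normal case."""
    return spec_a * spec_b


if __name__ == "__main__":
    # Hypothetical single-reader performance values.
    print(round(either_reader_sensitivity(0.80, 0.75), 2))  # 0.95
    print(round(either_reader_specificity(0.90, 0.90), 2))  # 0.81
```

Under these assumptions the pair detects more cancers than either reader alone but also recalls more normal cases, which is one reason discordant reads are sent to arbitration.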

3.3 Use of computer-aided devices (CAD)

Technological innovations such as computer-aided detection (CAD) have been explored to mitigate perceptual errors. These devices use computerized algorithms to highlight perturbations in the image (CADe) or perform diagnostic assessment (CADx). Whilst CAD has been shown to increase sensitivity (Georgian-Smith et al., 2007; Gromet, 2008; Karssemeijer et al., 2003; Skaane et al., 2007; Taylor and Potts, 2008), there is conflicting evidence on the relative impact of double reading versus single reading with CAD (Georgian-Smith et al., 2007; Gromet, 2008; Karssemeijer et al., 2003; Skaane et al., 2007). The literature generally suggests that double reading outperforms single reading with CAD (Taylor and Potts, 2008). A major limitation of CAD is its high FP rate (Philpotts, 2009), which increases at lower dose levels (Wittenberg et al., 2011). As a result, some of the malignant lesions marked by CAD may be dismissed by readers. This suggests that such technology is not necessarily the complete solution for removing detection errors, and emphasises the need for educational and practical interventions to improve human perceptual and decision-making skills.

3.4 Audit and immediate feedback mechanisms

Clinical audits are used to evaluate the performance of any program and aim to identify errors in order to tailor interventions to improve performance. However, clinical audits often take a long time to complete, delaying feedback. Also, feedback from audits is often provided to the screening program and not to the individual breast reader, making it difficult for individuals to identify their errors and take corrective action. Perceptual feedback has been shown to improve reader performance (Buist et al., 2011; Donovan et al., 2008; Krupinski et al., 1993). Therefore, platforms that audit readers and provide them with immediate perceptual feedback should be explored to enhance interpretative accuracy. Examples of platforms that have been established for this purpose include Breast Reader Assessment Strategy (BREAST) and PERsonal perFORmance in Mammographic Screening (PERFORMS) (Scott and Gale, 2006; Suleiman et al., 2016c). These platforms generate mammographic test-sets containing various cancer types and radiographic features that have been missed by at least one radiologist during clinical reporting. These test-sets are hosted online, enabling radiologists to evaluate them and receive immediate feedback on the location and type (stellate, mass, non-specific density, and AD) of lesion present in the mammogram.

Figure: BREAST software display showing the reader's mark and lesion classification (yellow) against the actual lesion location and type (red). The BREAST tool provides immediate feedback to readers on cancers that present perceptual difficulties and describes their features, such as stellate, discrete, architectural distortion and non-specific density.

Radiologists from any part of the world can log on to the system, undertake self-assessment and obtain immediate feedback. In so doing, these platforms examine reader performance, identify errors, and provide immediate feedback and continuous professional development opportunities.
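
The kind of immediate feedback described above can be sketched as a comparison between a reader's mark and the ground-truth lesion. This is not the BREAST or PERFORMS scoring algorithm, which the text does not specify; the data class, the pixel coordinates and the 50-pixel tolerance radius are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Mark:
    x: float          # pixel column of the mark
    y: float          # pixel row of the mark
    lesion_type: str  # e.g. "stellate", "mass", "non-specific density", "AD"


def feedback(reader: Mark, truth: Mark, tolerance_px: float = 50.0) -> str:
    """Return a feedback message comparing a reader's mark with the true lesion.

    tolerance_px is a hypothetical radius within which a mark counts as
    having localised the lesion.
    """
    distance = math.hypot(reader.x - truth.x, reader.y - truth.y)
    if distance > tolerance_px:
        return (
            f"Missed: true lesion ({truth.lesion_type}) "
            f"is at ({truth.x:.0f}, {truth.y:.0f})."
        )
    if reader.lesion_type != truth.lesion_type:
        return f"Located, but type is {truth.lesion_type}, not {reader.lesion_type}."
    return "Correct location and type."


if __name__ == "__main__":
    truth = Mark(110, 105, "AD")
    print(feedback(Mark(100, 100, "mass"), truth))
    print(feedback(Mark(400, 400, "AD"), truth))
```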
A recent study reported that the cancer detection performance of radiologists undertaking the BREAST intervention improved over time, regardless of their level of experience (Suleiman et al., 2016c). However, more studies comparing radiologists' test-set reading performance with their clinical performance are needed to confirm the clinical impact of these interventions. If successful, such platforms can provide e-Learning opportunities, allow personalized learning and self-assessment, identify common mammography errors, and provide feedback to mitigate these errors.

In conclusion, errors in mammography interpretation arise from patient, lesion, technical, and reader factors as well as other extraneous variables such as distraction and fatigue. Evidence shows that, despite improvements in imaging technology, the accuracy of image interpretation still suffers from intrinsic human limitations. Therefore, technology alone cannot mitigate radiological errors: if the benefits of breast screening are to be maximized, the human errors limiting early cancer detection need to be identified and remedied. Double reading and immediate feedback loops may facilitate discussions and learning opportunities to improve diagnostic efficacy.

134 in total

Review 1. Can computer-aided detection be detrimental to mammographic interpretation?

Authors: Liane E Philpotts
Journal: Radiology Date: 2009-10 Impact factor: 11.105

2. Effect of Using the Same vs Different Order for Second Readings of Screening Mammograms on Rates of Breast Cancer Detection: A Randomized Clinical Trial.

Authors: Sian Taylor-Phillips; Matthew G Wallis; David Jenkinson; Victor Adekanmbi; Helen Parsons; Janet Dunn; Nigel Stallard; Ala Szczepura; Simon Gates; Olive Kearins; Alison Duncan; Sue Hudson; Aileen Clarke
Journal: JAMA Date: 2016-05-10 Impact factor: 56.272

3. The effect of clinical history on chest radiograph interpretations in a PACS environment.

Authors: L A Cooperstein; B C Good; E A Eelkema; J H Sumkin; E K Tabor; K Sidorovich; H D Curtin; S A Yousem
Journal: Invest Radiol Date: 1990-06 Impact factor: 6.016

4. Diagnostic performance of digital versus film mammography for breast-cancer screening.

Authors: Etta D Pisano; Constantine Gatsonis; Edward Hendrick; Martin Yaffe; Janet K Baum; Suddhasatta Acharyya; Emily F Conant; Laurie L Fajardo; Lawrence Bassett; Carl D'Orsi; Roberta Jong; Murray Rebner
Journal: N Engl J Med Date: 2005-09-16 Impact factor: 91.245

5. Comparison of breast mammography, sonography and physical examination for screening women at high risk of breast cancer in taiwan.

Authors: Ming-Feng Hou; Hung-Yi Chuang; Fu Ou-Yang; Chen-Ya Wang; Chyi-Lie Huang; Hui-Mei Fan; Chieh-Han Chuang; Jaw-Yuan Wang; Jan-Singh Hsieh; Gin-Chung Liu; Tsung-Jen Huang
Journal: Ultrasound Med Biol Date: 2002-04 Impact factor: 2.998

6. Comparison of digital screening mammography and screen-film mammography in the early detection of clinically relevant cancers: a multicenter study.

Authors: Adriana M J Bluekens; Roland Holland; Nico Karssemeijer; Mireille J M Broeders; Gerard J den Heeten
Journal: Radiology Date: 2012-10-02 Impact factor: 11.105

7. Influence of personal characteristics of individual women on sensitivity and specificity of mammography in the Million Women Study: cohort study.

Authors: Emily Banks; Gillian Reeves; Valerie Beral; Diana Bull; Barbara Crossley; Moya Simmonds; Elizabeth Hilton; Stephen Bailey; Nigel Barrett; Peter Briers; Ruth English; Alan Jackson; Elizabeth Kutt; Janet Lavelle; Linda Rockall; Matthew G Wallis; Mary Wilson; Julietta Patnick
Journal: BMJ Date: 2004-08-28

8. Comparison of digital mammography and screen-film mammography in breast cancer screening: a review in the Irish breast screening program.

Authors: Niamh M Hambly; Michelle M McNicholas; Niall Phelan; Gormlaith C Hargaden; Ann O'Doherty; Fidelma L Flanagan
Journal: AJR Am J Roentgenol Date: 2009-10 Impact factor: 3.959

9. Radiographers supporting radiologists in the interpretation of screening mammography: a viable strategy to meet the shortage in the number of radiologists.

Authors: Gabriela Torres-Mejía; Robert A Smith; María de la Luz Carranza-Flores; Andy Bogart; Louis Martínez-Matsushita; Diana L Miglioretti; Karla Kerlikowske; Carolina Ortega-Olvera; Ernesto Montemayor-Varela; Angélica Angeles-Llerenas; Sergio Bautista-Arredondo; Gilberto Sánchez-González; Olga G Martínez-Montañez; Santos R Uscanga-Sánchez; Eduardo Lazcano-Ponce; Mauricio Hernández-Ávila
Journal: BMC Cancer Date: 2015-05-16 Impact factor: 4.430

10. Cost-Effectiveness of Double Reading versus Single Reading of Mammograms in a Breast Cancer Screening Programme.

Authors: Margarita Posso; Misericòrdia Carles; Montserrat Rué; Teresa Puig; Xavier Bonfill
Journal: PLoS One Date: 2016-07-26 Impact factor: 3.240
