
A manifesto for cardiovascular imaging: addressing the human factor.

Abstract

Our use of modern cardiovascular imaging tools has not kept pace with their technological development. Diagnostic errors are common but seldom investigated systematically. Rather than more impressive pictures, our main goal should be more precise tests of function which we select because their appropriate use has therapeutic implications which in turn have a beneficial impact on morbidity or mortality. We should practise analytical thinking, use checklists to avoid diagnostic pitfalls, and apply strategies that will reduce biases and avoid overdiagnosis. We should develop normative databases, so that we can apply diagnostic algorithms that take account of variations with age and risk factors and that allow us to calculate pre-test probability and report the post-test probability of disease. We should report the imprecision of a test, or its confidence limits, so that reference change values can be considered in daily clinical practice. We should develop decision support tools to improve the quality and interpretation of diagnostic imaging, so that we choose the single best test irrespective of modality. New imaging tools should be evaluated rigorously, so that their diagnostic performance is established before they are widely disseminated; this should be a shared responsibility of manufacturers with clinicians, leading to cost-effective implementation. Trials should evaluate diagnostic strategies against independent reference criteria. We should exploit advances in machine learning to analyse digital data sets and identify those features that best predict prognosis or responses to treatment. Addressing these human factors will reap benefit for patients, while technological advances continue unpredictably.


Keywords: cardiovascular imaging; clinical guidelines; diagnostic error; evidence-based medicine; metacognition


Year: 2017 PMID: 29029029 PMCID: PMC5837338 DOI: 10.1093/ehjci/jex216

Source DB: PubMed Journal: Eur Heart J Cardiovasc Imaging ISSN: 2047-2404 Impact factor: 6.875

Introduction

I was privileged to meet Inge Edler. When we invited him to the first EuroEcho conference, which was held in Prague in 1997, he described how he and his colleagues had paused to think about the clinical problem that faced them (how to select patients for closed mitral valvotomy, without exacerbating mitral regurgitation) and whether they could tackle it using radar or sonar. Through a series of contacts and coincidences, he met the physicist Hellmuth Hertz and they were able to start clinical studies using reflected ultrasound in October 1953. More than 40 years later, I asked Professor Edler if he had ever guessed at the future of the technique that they initiated. He declined to answer, stating that he never speculated, because it was impossible to predict technological developments. That is still the situation today.
We have a wonderful array of imaging tools that already outstrip our capacity to exploit their potential or apply them appropriately. We should challenge engineers with new objectives, but we can also leave innovative technological advances to take care of themselves. A greater concern is that new methods are sometimes implemented without a clear concept of their clinical role and some become established without ever having been evaluated properly. Arguably, we adopt more complex imaging modalities before we know if they have a positive impact on outcomes, instead of relying on the best or the most efficient test for answering each diagnostic question (Figure). The biggest challenge for specialists in imaging is how we think about, evaluate, and use the tools that are already available. Each has advantages and limitations, and each depends to some degree on expert interaction to acquire optimal images and to analyse and report them. It is this human factor that now becomes paramount and which I want to address.
Figure. An example of multimodality imaging. Diagnostic images obtained in a 34-year-old man who presented in an electrical storm, having recurrent monomorphic ventricular tachycardia, and with complete heart block. He had previously had chemotherapy for an extranodal natural killer T-cell lymphoma. Transthoracic echocardiography (A, apical four-chamber view) and MR imaging (B) showed focal thickening and impaired function of the mid and basal ventricular septum. Hybrid CT and positron emission tomography (PET) (fused short-axis view, image C) demonstrated avid uptake of fluorodeoxyglucose in the septum. The PET whole-body scan (D) confirmed recurrence of lymphoma in the nasopharynx with high uptake also in the enlarged spleen. The diagnosis was apparent from the first bedside echocardiographic study, and in this patient, the subsequent imaging did not alter the outcome.
A manifesto is a written statement of beliefs and policies. In that context, I offer here my personal perspective of what I think should be our priorities for improving the quality of diagnostic imaging and its contribution to clinical decision-making and outcomes for patients with cardiovascular disease.

Reducing diagnostic error

The recent classification lists almost 70 000 diagnostic codes, so it is clearly impossible for an individual physician even to be aware of all diseases. Accurate diagnosis is the foundation stone of evidence-based practice, but errors occur in 10–15% of cases. A recent report from the Institute of Medicine in the USA stated that ‘all of us will likely experience a meaningful diagnostic error in our lifetime’. Surprisingly, the commonest explanation in more than two-thirds is not poor knowledge but faulty cognition and synthesis. Most cardiologists can recall individual patients who were given a wrong diagnosis or whose diagnosis was missed or delayed, but there have been few systematic studies of diagnostic errors in cardiovascular imaging. In 2008, a retrospective analysis of more than 50 000 echocardiographic studies performed in children identified 87 errors (or 0.17%), of which 34% could be attributed to cognitive error and 31% to technical factors. Inaccuracy was more common in rare diseases (odds ratio 9.2) or when studies were performed in the recovery room after cardiac surgery (odds ratio 7.9). In a later analysis, faulty cognition still accounted for 37% of 254 diagnostic errors. In other paediatric studies, 44% of diagnoses made by non-specialists but only 3% made by experts were inaccurate when compared with surgical findings. In adult practice, a 12% error rate was reported in one study of echocardiography in 170 patients with prosthetic heart valves, a few of whom died after unnecessary reoperation. Grading of mitral regurgitation was inaccurate in 10% of reports made by 152 observers, especially if the quality of images was suboptimal. Among 89 patients diagnosed by general cardiologists after magnetic resonance (MR) imaging to have arrhythmogenic right ventricular cardiomyopathy, 73% did not meet the criteria for the disease when they were reinvestigated in a specialist centre. 
In a multicentre study of non-invasive coronary angiography using computed tomography (CT), significant luminal narrowing was diagnosed erroneously in 5% of segments when compared with quantitative coronary arteriography (QCA). Cardiovascular diseases such as myocardial infarction, pulmonary embolism, and acute aortic syndromes had been misdiagnosed in 18.7% of 970 autopsies. Whatever is the true incidence of cardiovascular diagnostic errors—and we need prospective studies to investigate that question—there is consensus that the major cause is faulty thinking. We can learn from psychologists how we might mitigate this. According to Daniel Kahneman, we default to intuitive or ‘System 1’ thinking because of its cognitive ease. It is fast and may be dominant in diagnostic imaging since we learn to recognize visual patterns, relying on mental shortcuts or heuristics, but it is also prone to error. Analytical or ‘System 2’ thinking, on the other hand, is deductive and deliberate; by reviewing options and choosing the most appropriate decision, a consistent diagnosis can be reached that is more scientific and less prone to error. By training ourselves to pause, we can over-ride type 1 in favour of type 2 thinking and avoid jumping to a conclusion that might be wrong. Diagnostic imaging is also subject to fashion and to cognitive biases. A current example may be left ventricular non-compaction (LVNC), which is a normal stage of myocardial development during foetal life. Three different echocardiographic criteria have been proposed, which show poor concordance. In 199 unselected patients with systolic heart failure, 24% fulfilled at least one criterion and >90% had apical segments that were hypertrabeculated. In another study that used MR, 5 LV segments were non-compacted in >70% of 45 healthy volunteers. 
Investigators who proposed original criteria for LVNC detected more cases up to 2006, but almost 50% of the subjects who were diagnosed recently were asymptomatic and none had any clinical events during follow-up. They were probably healthy but overdiagnosed as LVNC because it was suspected, which would be an example of confirmation bias. It is likely that some hypertrabeculation can persist in healthy hearts and that LVNC as a distinct disease is rare. The wide range of mutations that have been associated with it, without a clear answer emerging, would support this.
To err is human, as we were reminded by the Institute of Medicine in 2000, but at that time few professional societies had demonstrated a visible commitment to reducing errors in health care and to improving patient safety. We are all liable to make mistakes and some misdiagnoses may be unavoidable, but ‘normalization of deviance’ is avoidable. Errors can be tolerated as long as we work to minimize their occurrence and consequent risk through individual and collective action at a personal and an institutional level. There are many effective interventions (Table). We can practise and teach metacognition (thinking about thinking, or specifically in this context, about clinical reasoning), and we can apply strategies that are effective for debiasing. We can study and avoid common pitfalls, perhaps using diagnostic checklists. We can recognize common artefacts. We can learn from errors if we report them routinely (in a voluntary and blame-free system), investigate their causes, and share our experience with others. That practice is well established in diagnostic radiology, with discrepancy meetings, but it has not yet been specified in standards from the European Association of Cardiovascular Imaging (EACVI) relating to quality control and audit.
Table. Priorities for improving diagnostic analysis and reporting

Selecting the right diagnostic target

A common approach is to consider each diagnostic error as an isolated individual case (or an active error made by one person), but sometimes our collective failure to understand the pathophysiological background to a question or the physical properties of the imaging modality means that we apply faulty concepts and reach a mistaken diagnosis in most cases (which would be an example of a latent or systems error). Often this is related to a morphological feature that seems to have little or no functional significance. In the 1980s, there was a surge in the diagnosis of mitral valve prolapse soon after the introduction of cross-sectional echocardiography, before it was appreciated that the mitral annulus is not planar in shape. For example, 35% of healthy children aged 10–18 years were reported to have prolapse in apical four-chamber views. These were false-positive diagnoses because that imaging plane crossed the mitral annulus at its low points, and the prevalence fell to 1% if prolapse was required to be visible in other imaging planes too. Elderly subjects may have prominent LV hypertrophy that is confined to the basal or outlet ventricular septum. This has been named variously as a sigmoid septum or septal bulge or discrete upper septal hypertrophy, and it has been diagnosed as a variant of hypertrophic cardiomyopathy although a causative mutation is rarely found. Now we know that hypertensive LV hypertrophy starts at this site, because it has the greatest regional wall stress and that impaired function is related to high late systolic pressure rather than peak systolic pressure. Upper septal hypertrophy is probably a manifestation of adverse ventricular-arterial coupling. In a definitive report of 3562 subjects in the Framingham Heart Study, upper septal hypertrophy correlated with age and blood pressure, but after adjusting for risk factors it was unrelated to cardiovascular disease or mortality at 15 years. 
There is a huge literature on carotid intima–media thickness (cIMT) as a marker of vascular disease, perhaps because it is a very accessible target. In large epidemiological studies, it predicts outcomes, but its added value is small. In the Rotterdam study of 3580 non-diabetic subjects aged 55–75 years, cIMT improved prediction of cardiovascular risk only in women, of whom 8% were reclassified. There are almost no studies in which a reduction or slower rate of progression of cIMT as a surrogate end point has been matched by fewer clinical events. It seems that increased cIMT may be a diagnostic epiphenomenon rather than an early stage of disease, unlike atherosclerotic plaque which is more predictive. In the RISC study of 627 healthy subjects, increased cIMT appeared to be an adaptive response to ageing, obesity, blood pressure, and low density lipoprotein cholesterol. Recommendations for measuring cIMT in individuals have been taken out of clinical practice guidelines.
Myocardial strain by speckle tracking is popular for quantifying LV long-axis function, because it is quick to perform and more reproducible than alternatives. Global longitudinal strain has been applied to measure responses to exercise but that takes no account of the basic principle that strain is preload dependent. Its response is biphasic, increasing at low workloads, and then declining at peak when the end-diastolic volume declines. Myocardial ischaemia is better detected by changes in strain rate, because it is not significantly influenced by preload, but that needs deformation imaging with much faster frame rates. Speckle tracking is also used to study left atrial (LA) function, ignoring the evidence that changes in LA strain can largely be explained by LV function, except during atrial contraction. There could be other examples.
The message is that we need to choose the right imaging test to answer each question, by understanding the basic mechanisms of the disease and the technical capacity of the imaging tools, and by applying the most relevant research. David Sackett wrote in 1994 that ‘the ultimate criterion for the usefulness of a diagnostic test is whether it adds information that leads to a change in management that is ultimately beneficial to the patient’. According to the ‘Grading of Recommendations, Assessment, Development and Evaluation’ (GRADE) collaborators, the best way to assess any diagnostic strategy is a trial in which investigators randomize patients to experimental or control diagnostic approaches and measure mortality, morbidity, symptoms, and quality of life. Too often, diagnostic studies are performed to compare one test against another in subjects who are already known to have the disease. Instead, we should evaluate new tests against independent reference criteria and, if possible, assess their impact on clinical decision-making rather than on surrogate end points.

Defining normality and preventing overdiagnosis

Remarkably, there seems to be no consensus within diagnostic imaging about how we should define normal subjects to establish reference ranges. For instance, should we apply statistical cut points in normative random samples that are representative of the whole population, or should we consider as healthy controls only people who do not have risk factors? The question has been considered by geneticists who selected ‘hypercontrols’ with a low pre-test probability of disease in order to minimize the inclusion of false-positive subjects in mapping studies. Applying this approach more generally would be difficult, because, to some extent, we are all abnormal. We share >99% of our genomes, but each of us may have >2800 unique variants in our exomes. Many mutations may be benign or of uncertain significance, but 0.5% of 870 apparently healthy subjects had pathogenic variants for cardiomyopathy or arrhythmia. The more we investigate controls, the less normal they will become. Genetic factors account for only 1–3% of variance in common measurements such as LV ejection fraction and LV mass, in predominantly Caucasian populations. There are stronger associations of LV long-axis function with ageing, physiological variables such as height and body mass, and risk factors such as hypertension and metabolic syndrome or diabetes. These factors are normally distributed, and they progress from subclinical phenotypes to established disease, which leads to what can be called a paradox of precision: the more precise the measurement, the more difficult it becomes to discriminate health from disease. Selected diagnostic cut points should have a sound statistical basis. 
Normal LV ejection fraction (EF), for example, has been quoted at >50% but an individual patient meta-analysis of echocardiographic data derived from >22 000 subjects participating in 43 studies quoted the lower limit of normal (fifth centile) of EF for a 30-year-old European man at 49%, whereas for an East Asian woman aged >50 years it was 57%. These values can also be questioned since subjects with EF <30%, blood pressure >140/90 mmHg, serum creatinine >1.5 mg/dL, and other factors were excluded. Using a statistical approach, it would make sense to analyse a large random sample of the whole population including subjects with disease to define cut points corresponding to values more than, say, 2 SDs beyond the mean (Z score >2 or <−2). An example of this approach was the study of diastolic function using the mitral E/A ratio in one MONICA cohort. There should be different reference ranges for male and female subjects of all ages and for different ethnic groups.
Ultimately, the logical way to define health is by relating diagnostic measurements to clinical outcomes. A good example comes from the International Database on Ambulatory Blood Pressure in relationship to Cardiovascular Outcomes (IDACO) Investigators. In 1991, data from 23 studies suggested that normal daytime blood pressure (BP; 95% confidence intervals) was 101–146/61–91 mmHg. Later, IDACO used associations between ambulatory BP and 10-year outcomes to define optimal daytime BP as 122/79 mmHg and ambulatory hypertension as >140/85 mmHg; above these thresholds, risk was increased. Another much larger analysis of 958 074 participants aged 40–89 years in 61 studies found increased risk at pressures above 115/75 mmHg. There would be many practical difficulties but no theoretical reasons why we could not undertake similar studies for measurements of cardiac function. Measurements of function are more important than structure.
We know that visual assessment of coronary arteriography is inaccurate and subject to bias but QCA can also be misleading. In 4086 coronary stenoses observed in 2986 patients, the sensitivity, specificity, and diagnostic accuracy of a diameter stenosis by QCA of ≥50% for predicting fractional flow reserve ≤0.80 (the functional test and reference criterion) were only 61%, 67%, and 64%, respectively, and the tests were discordant in 35% of all cases. We need to guard against unthinking application of inappropriate diagnostic cut points that do not take account of subclinical changes such as adaptations to regular exercise or that are unrelated to outcomes, as this may cause ‘non-disease’ which is a serious disservice to patients.
Overdiagnosis is more common when sensitive diagnostic tests are used for screening. It can be suspected when the incidence of a disease increases, but its mortality does not change. After CT angiography was introduced in the USA, the incidence of pulmonary embolism increased by 81% while there was minimal change in total mortality and a 36% decrease in case fatality, presumably because more small pulmonary emboli were being detected. In an earlier study, missed pulmonary embolism was the most frequent diagnostic error. It is impossible to know without further research whether treating small pulmonary emboli amounts to overdiagnosis or else might prevent some patients from developing chronic thromboembolic pulmonary hypertension. With the extra treatment, however, there was a 71% increase in complications mostly related to anticoagulation. Any use of new technology with improved resolution may demonstrate findings that are unusual but harmless, known as incidentalomas. In 1426 research imaging examinations, 39.8% had at least one incidental finding; their discovery was associated with clear medical benefit in only 1% of cases and with a definite burden in 0.5%.
CT imaging of the thorax revealed at least one unexpected abnormality in 55% of patients. We need strategies to identify which chance findings merit further investigation and which are unimportant.
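The statistical approach to cut points described in this section can be illustrated with a short calculation. The following Python sketch is only an illustration: the function names and the reference values are hypothetical, and a real normative database would be far larger and stratified by age, sex, and ethnic group, as the text recommends. It expresses a measurement as a Z score against a reference sample and flags values that lie more than 2 SDs from the mean.

```python
from statistics import mean, stdev


def z_score(value, reference_values):
    """Number of SDs by which `value` differs from the reference mean."""
    return (value - mean(reference_values)) / stdev(reference_values)


def is_outside_reference_range(value, reference_values, limit=2.0):
    """True when the Z score is above +limit or below -limit."""
    return abs(z_score(value, reference_values)) > limit


# Hypothetical EF values (%) standing in for one stratum of a normative
# database; these numbers are invented for illustration only.
reference_ef = [55, 58, 60, 62, 63, 65, 66, 68, 70, 73]

for measured in (50, 60, 76):
    z = z_score(measured, reference_ef)
    flag = "outside" if is_outside_reference_range(measured, reference_ef) else "within"
    print(f"EF {measured}%: Z = {z:+.2f} ({flag} the reference range)")
```

A cut point derived in this way is purely statistical; as argued above, relating measurements to clinical outcomes remains the more logical way to define health.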

Establishing the ‘basic science’ of diagnostic imaging

Rigorous pre-market evaluation of any new diagnostic method is essential before it can be adopted widely. This should be comprehensive, as listed in the Table, including in vitro testing against standard imaging phantoms. Methods have been developed for testing software in silico against common datasets. Reproducibility should be investigated in clinical studies with good statistical power. These should include independent reacquisition of images and not only (as is commonly the case) independent reanalysis of the same images or digital data.
Table. Requirements for evidence-based diagnostic imaging
• for independent acquisition of images
• for independent analysis of images
• for biological/temporal variability
• in the target clinical population
Reproducibility studies should assess if measurements are stable over a specified time period. The performance of the Agatston score for measuring coronary arterial calcification on CT images was studied in 104 subjects who were reinvestigated after 2 weeks. Inter-scan/inter-observer variability (the appropriate test for extrapolating to usual clinical practice) was 14.5 ± 21.8% and 12.5 ± 21.8%, compared with intra-scan/intra-observer variability of 1.7 ± 5.9% and 1.3 ± 3.7%. Positron emission tomography is considered as a reference method for measuring myocardial perfusion, but the coefficient of variation (CV) for serial measurements, which was 10% when scanning was repeated within minutes, rose to 21% when the tests were repeated after a median interval of 16 days. The variability of a measurement should also be assessed in the target population where it is going to be used, rather than in healthy controls.
Isovolumic acceleration (IVA), which is a load-insensitive indicator of myocardial contractile function, could be determined in 97% of healthy subjects, with a CV for intra-observer reproducibility of 12%, but the measurement was feasible in only 82% of patients with heart failure in whom the CV for inter-observer reproducibility was 28%. The first result made IVA appear suitable for use in individual patients, whereas the appropriate evaluation showed that it would be better as a research tool.
Another aspect of diagnostic performance that may be overlooked is inter-machine variability. Here, another paradox becomes apparent: the more advanced the imaging modality, the less likely it becomes that diagnostic systems from different manufacturers will give the same results. Each manufacturer processes digital data in its own way, using its own software to measure the same features that are reported by other manufacturers using their software. Substantial inter-vendor differences (with correlations between 0.23 and 0.72) have been reported for global and segmental longitudinal strain measured by echocardiographic speckle tracking and for comparisons using software for CT and MR imaging. These results indicate that serial studies in individuals should be performed using the same machine and software, if the measurements are to be compared. The initiative of the EACVI and the American Society of Echocardiography (ASE) with all the major vendors, to standardize strain measurements, is an excellent example of how to address this generic problem. Implementing common protocols has halved inter-vendor variability. All results of validation and reproducibility studies and inter-vendor comparisons should be widely available, so that they can be considered in clinical practice.
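Coefficients of variation such as those quoted above are usually estimated from repeated studies in the same subjects. As a minimal sketch, assuming paired test-retest measurements and the root-mean-square estimate of the within-subject SD (one common estimator, not necessarily the one used in the studies cited), the calculation could look like this in Python. The function name and the example values are hypothetical.

```python
import math


def within_subject_cv(first, second):
    """Within-subject coefficient of variation (%) from test-retest pairs.

    Assumes each subject was measured twice. The within-subject SD is
    estimated as sqrt(sum(d^2) / (2n)), where d is the difference between
    the paired measurements, and is divided by the grand mean.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(first, second))
    within_sd = math.sqrt(sum_sq_diff / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return 100.0 * within_sd / grand_mean


# Hypothetical repeated measurements in five subjects, reacquired on a
# second occasion (invented values, arbitrary units).
scan_1 = [102.0, 95.0, 110.0, 88.0, 120.0]
scan_2 = [96.0, 101.0, 104.0, 93.0, 111.0]
print(f"Within-subject CV: {within_subject_cv(scan_1, scan_2):.1f}%")
```

Whether the second measurement comes from reanalysis of the same images or from independent reacquisition changes what this number means, which is why the text asks for the latter.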
An important concept is the reference change value, which is the difference between consecutive measurements that needs to be exceeded before it can be concluded that an observed difference is a real change rather than a chance variation (Figure). A CV of 20% can be acceptable for a research study when each group has >50 subjects, as an inter-group difference of >8% will be significant, but for a single patient a CV of 20% means that any difference <55% might occur by chance (see left end of each curve in the Figure). In one individual, reference change values for measurements with coefficients of variation of 10%, 5%, and 2% are ±28%, ±14%, and ±6%, respectively. Only some imaging tests have CVs <10%, but differences between tests in one patient that are within the range of ±28% are commonly and erroneously considered as significant. Reference change values have been developed and reported for laboratory assays, but they need to become commonplace in quantitative diagnostic imaging. Repeatability limits have been defined for some specific applications of cardiovascular CT.
Figure. The reference change value. Plots demonstrating the differences between consecutive measurements that need to be exceeded before confidently stating that an observed change is likely to be a real effect rather than perhaps the consequence of chance variation related to the imprecision of the measurement. CV, coefficient of variation. Courtesy of Professor Frank Dunstan, Cardiff University.
In the 18th century, Thomas Bayes attempted ‘to find out a method by which we might judge concerning the probability that an event has to happen, in given circumstances’. We are familiar with the principles of a Bayesian approach to diagnosis, and experienced clinicians balance probabilities all the time, but we could do more to make this process more robust. A simple nomogram for applying Bayes’ theorem is available (Figure).
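The reference change values quoted above can be reproduced with a short calculation. The article does not state the formula behind its figure, so this Python sketch assumes the usual two-sided 95% definition, in which the reference change value is 1.96 × √2 × CV, and it considers measurement imprecision only. The function names are illustrative.

```python
import math

Z_95_TWO_SIDED = 1.96  # assumed confidence level: two-sided 95%


def reference_change_value(cv_percent, z=Z_95_TWO_SIDED):
    """Smallest change (%) between two measurements in one patient that
    exceeds chance variation, given the CV (%) of a single measurement."""
    return z * math.sqrt(2.0) * cv_percent


def detectable_group_difference(cv_percent, n_per_group, z=Z_95_TWO_SIDED):
    """Smallest difference (%) between the means of two equal groups that
    exceeds chance variation, given the CV (%) of a single measurement."""
    return z * cv_percent * math.sqrt(2.0 / n_per_group)


for cv in (20, 10, 5, 2):
    print(f"CV {cv}%: reference change value about {reference_change_value(cv):.0f}%")

print(f"CV 20%, 50 per group: about {detectable_group_difference(20, 50):.0f}%")
```

Under these assumptions, the sketch gives approximately ±55%, ±28%, ±14%, and ±6% for CVs of 20%, 10%, 5%, and 2%, and a detectable inter-group difference of about 8% for a CV of 20% with 50 subjects per group, in line with the values in the text.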
If the pre-test probability and the likelihood ratio (the ratio of the true-positive rate to the false-positive rate) are known, then it is easy to predict the post-test probability. For example, if the pre-test probability is 5%, which would not be unusual, then for the post-test probability to exceed 50%, the false-positive rate must be below 5%, which is uncommon. Diagnostic tests have limited added value in patients with low or high pre-test probabilities, but that does not stop us from using them even when we may not believe the results. An analysis of dobutamine stress echocardiography in 3259 subjects illustrated how tests are most useful at intermediate pre-test probabilities.
Figure. Nomogram for applying Bayes’ theorem. From New England Journal of Medicine, Fagan TJ, Letter: nomogram for Bayes theorem, 293:257. Copyright © 1975, Massachusetts Medical Society. Reprinted with permission.
We should understand the technical features of the diagnostic imaging machines that we use, and they should be maintained regularly. Worryingly, 40% of ultrasonic transducers in routine daily use in Swedish hospitals were faulty, with deficiencies such as broken crystals or delaminated cables that meant that they should already have been replaced.
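The Bayesian arithmetic described in this section is what the nomogram performs graphically, and it is simple enough to sketch in Python. The function names below are illustrative, and the example assumes a hypothetical test with perfect sensitivity, which is the most favourable case for the 5% pre-test probability discussed in the text.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Ratio of the true-positive rate to the false-positive rate."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical tests with perfect sensitivity and different false-positive
# rates, applied at a pre-test probability of 5%.
for false_positive_rate in (0.10, 0.05, 0.02):
    lr = positive_likelihood_ratio(1.0, 1.0 - false_positive_rate)
    post = post_test_probability(0.05, lr)
    print(f"False-positive rate {false_positive_rate:.0%}: post-test probability {post:.0%}")
```

With a pre-test probability of 5% and a perfectly sensitive test, the post-test probability passes 50% only once the false-positive rate falls to about 5%, which matches the example in the text.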

Producing evidence-based diagnostic recommendations

Coping with cognitive overload is an almost insurmountable task. It is inconceivable that any individual can keep up with the literature even in a specific field, such as cardiovascular imaging, so reading and synthesis must be a collective endeavour, just as diagnosis should also be a collaborative effort. This places a heavy responsibility on the authors of expert consensus statements and clinical guidelines. Contributors should stand aside if they have academic conflicts of interests such as patents or primary publications for a new diagnostic method. The categories of evidence that should be available from studies of diagnostic imaging have been recognized for many years, and standard tools are available to judge their quality, but in cardiovascular diagnostic imaging the ‘evidence base’ from which we start is not impressive. In a review of guidelines by the American College of Cardiology and the American Heart Association, published in 2009, the authors identified 333 recommendations concerning echocardiography, not one of which was linked to evidence. All 84 recommendations relating to the use of radionuclide imaging were supported by evidence but it was at level A for only 4.8%. Guidelines need to be based on comprehensive systematic reviews and meta-analyses and their processes and judgements must be transparent so that their conclusions are re-testable. They could then be debated like any scientific hypothesis.
Two recent publications can illustrate the need for reform. The EACVI and ASE revised their recommendations for evaluating diastolic function. They are based on 8 variables, giving a total of 40 320 possible combinations (8 factorial), but only specific combinations are described so some subjects will be unclassifiable.
A major criterion is the ratio of the velocity of early diastolic mitral inflow to the early diastolic velocity of long-axis lengthening of the LV (E/e′), although a meta-analysis concluded that E/e′ cannot reliably estimate LV filling pressure in heart failure with preserved ejection fraction (HFpEF). In the Stanislas cohort in France, applying the new recommendations compared with those that they replaced would reduce the prevalence of diastolic dysfunction from 6.3% to 1.1% in subjects aged 40–60 years and from 12.9% to 3.1% in those aged >60 years. In the SABRE cohort in London, 89% of 1395 subjects with a mean age of 69 years would now be abnormal on the basis of their LV long-axis early diastolic velocities (Alun Hughes, personal communication), probably because the recommended cut points are unrelated to age. Any criterion that defines most of the target population as abnormal will not have discriminant value. The second example relates to the revised guideline on heart failure from the European Society of Cardiology. This starts with the preamble that ‘Guidelines summarize and evaluate all available evidence on a particular issue at the time of the writing process’, but then states that ‘the main terminology used to describe heart failure is historical’ and that normal LVEF is ‘typically considered as ≥50%’. It continues: ‘Patients with an LVEF in the range of 40-49% represent a ‘grey area’, which we now define as heart failure with mid-range ejection fraction (HFmrEF)’. Subdividing EF in this way and creating a new disease is irrational when EF is normally distributed and has considerable measurement variability. Unsurprisingly, repeated measurements often lead to reclassification. Using the previous cut point of 50%, then during a mean follow-up of 5.1 years, 39% of patients classified as heart failure with reduced EF (HFrEF) had an EF > 50% and 39% of patients with HFpEF had an EF < 50%. 
In a recent study that applied the new recommendation, within 1 year of their diagnosis as HFmrEF, 44% of patients had transitioned to HFpEF and 16% to HFrEF. In some patients in both studies the changes may have been real, due to progression of disease or response to treatment, but in others the redesignation can be explained by variation within the reference change value. These documents were surely the result of considerable study and deliberation, but the key point is that both recommendations have been revised without clear evidence being presented to support the changes and without it being possible to understand why they were made. Interactive electronic guidelines would allow others to study the same data and reassess the conclusions. Most importantly, new recommendations that represent hypotheses should be evaluated prospectively before they are included in guidelines. In diagnostic imaging, the use of appropriateness criteria may be helpful, but they are a substitute for developing evidence, and in some circumstances, their introduction has had little impact on clinical practice.
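
The point about reclassification within the reference change value can be made concrete with a short calculation. The sketch below assumes the commonly used formula RCV = z × √2 × SD, where SD is the total standard deviation of repeated measurements; the function names and the SD of 4 EF percentage points are hypothetical illustrations, not values taken from the studies discussed above.

```python
import math


def reference_change_value(sd_total: float, z: float = 1.96) -> float:
    """Smallest difference between two serial measurements that exceeds
    measurement variability (two-sided, z = 1.96 for 95% confidence)."""
    if sd_total <= 0:
        raise ValueError("sd_total must be positive")
    # Each of the two measurements contributes its own error, hence sqrt(2).
    return z * math.sqrt(2.0) * sd_total


def exceeds_rcv(first: float, second: float, sd_total: float, z: float = 1.96) -> bool:
    """True if the change between two measurements is larger than the RCV."""
    return abs(second - first) > reference_change_value(sd_total, z)


# Hypothetical SD of repeated EF measurements: 4 percentage points (assumption).
SD_EF = 4.0
print(round(reference_change_value(SD_EF), 1))  # 11.1

# An EF that moves from 47% to 52% crosses the 50% cut point,
# yet the change is well inside the reference change value.
print(exceeds_rcv(47.0, 52.0, SD_EF))  # False
```

Under these assumed figures, a patient could move between categories defined by a 50% cut point without any change that is distinguishable from measurement variability.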

Using machine learning and decision support

A provocative summary could be that we take diagnostic imaging tests that are objective, quantifiable, and continuously related to disease and that have established sensitivity, specificity, and positive and negative predictive values, but then we use them in patients without considering the pre-test probability and by interpreting the results subjectively rather than objectively. We prefer to apply single cut points rather than age- and gender-specific reference ranges, and we fail to report the imprecision implicit in any measurement. Computer-aided diagnostic support has existed for more than 40 years but only recently has it reached the stage where it might make a substantial difference. Early studies showed that decision support could perform as well as experts; now it can do better, particularly when analysing huge data sets. Information technology helps with diagnosis, research, and audit, but its potential for diagnostic imaging will be harnessed only if clinicians interact with informatics engineers to ensure that the tools that we need are developed (Table ). For diagnosis, automated systems can analyse faster and more accurately than visual assessment, and machine learning can reveal diagnostic patterns in complex data, including imaging data collected during exercise, that would not otherwise be discernible. Computer-reporting tools can help novices to gain experience rapidly, shortening the learning curve and reducing net risk to patients. Using ‘clinical knowledge support’ improves patient safety, reduces complications, and shortens length of stay; adverse outcomes were halved in hospitals with the highest rates of use. Diagnostic reporting systems could prompt the interpreter to consider alternative diagnoses or to reanalyse data that are inconsistent, and given relevant details, they could calculate pre- and post-test probabilities for the individual patient.
Giving this information to the clinician rather than a simple yes/no answer would be a more honest way of reporting a diagnostic study. More objective methods of analysing images have important applications in research. When open access to trial data was allowed, reanalysis by independent investigators led to different conclusions in 35% of studies. Examples from diagnostic imaging included a computer-assisted study of regional wall motion abnormalities, instead of visual interpretation, and an MR study of myocardial infarction that applied a new measurement; each review concluded that one treatment was effective, after the original analysis had found no difference. Open access to imaging databases such as the Biobank study in the UK and the Cardiac Atlas project in the USA may yield substantial dividends. The optimal size of a training set to develop automated methods for detecting heart failure has been estimated to be 4000 patients. A ‘big data’ audit study of 3.6 million hospitalizations demonstrated that patients who had echocardiography during their admission for heart failure had 20% lower mortality, implying that accurate diagnosis may have guided more effective treatment.
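
The calculation of post-test probability that a reporting system could perform can be sketched as follows. This is a minimal illustration of the standard likelihood-ratio form of Bayes' theorem; the function name and the example values (pre-test probability 30%, sensitivity 85%, specificity 80%) are assumptions chosen for illustration and do not describe any particular imaging test.

```python
def post_test_probability(pre_test_prob: float,
                          sensitivity: float,
                          specificity: float,
                          positive_result: bool) -> float:
    """Convert a pre-test probability into a post-test probability
    using the likelihood ratio of the observed test result."""
    for name, value in (("pre_test_prob", pre_test_prob),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    if positive_result:
        likelihood_ratio = sensitivity / (1.0 - specificity)   # LR+
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity   # LR-

    pre_test_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical patient and test characteristics (assumptions, not source data).
print(round(post_test_probability(0.30, 0.85, 0.80, positive_result=True), 3))   # 0.646
print(round(post_test_probability(0.30, 0.85, 0.80, positive_result=False), 3))  # 0.074
```

Reporting the two figures, rather than 'positive' or 'negative', shows the clinician how far the test has actually moved the probability of disease for that patient.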

Regulatory governance of diagnostic imaging

Diagnostic imaging machines are medical devices. Until recently, their market access in Europe depended on the manufacturer demonstrating that its device was safe (with a positive ratio of net benefit relative to risk) and that it performed its designated tasks satisfactorily. Most diagnostic imaging systems are intermediate-risk (Class II) devices, so their authorization has not depended on the manufacturer submitting supporting evidence from clinical trials. In 2016, a revised guide to the clinical evaluation of medical devices was published by the European Commission. For the first time, this gives examples of performance data for diagnostic imaging that manufacturers can submit for approval, including sensitivity, specificity and reproducibility, comparisons of iterations of diagnostic software with previous versions, and normal values by age and gender for all groups in which the diagnostic system may be used. Adherence to the advice is voluntary, but manufacturers might collect these data if clinicians considered their availability as a factor when choosing which system to purchase. The European Union process for approving medical devices is changing. New Regulations were published in May 2017 that will be fully implemented from 2020. Requirements for clinical evidence will be strengthened, especially for high-risk (Class III) devices. There is provision for designation of expert laboratories; one undertaking preclinical evaluation of diagnostic imaging systems would be an excellent choice. The Regulation on in vitro diagnostic devices (Annex 1, Chapter II, Paragraph 9.1.b) states that a manufacturer shall demonstrate the scientific validity of its method by submitting results from clinical performance studies including ‘diagnostic sensitivity, diagnostic specificity, positive predictive value, negative predictive value, likelihood ratio, (and) expected values in normal and affected populations’.
Similar standards could be developed for imaging tests but they would need to be included in advisory documents. Here there will be an important role for experts to advise regulators on what is needed, from a clinical perspective. It can be argued that manufacturers have not integrated clinical decision support, because their machines have been approved without it. Smarter diagnostic systems would reduce diagnostic error and provide better reports, so their development should be a responsibility for manufacturers, shared with researchers. It is hoped that the new regulations will promote more evidence-based imaging (see Table ) and encourage research without stifling innovation.
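
The performance measures listed in the Regulation can all be derived from a single 2 × 2 comparison of a test against an independent reference standard. The sketch below uses the standard definitions; the class and function names and the counts are hypothetical and serve only to show how the quantities relate to each other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test with an independent reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def diagnostic_performance(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, predictive values and likelihood ratios."""
    if min(t.tp + t.fn, t.tn + t.fp, t.tp + t.fp, t.tn + t.fn) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    if t.fp == 0 or t.tn == 0:
        raise ValueError("likelihood ratios are undefined when fp or tn is zero")

    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # depends on prevalence in the sample
        "npv": t.tn / (t.tn + t.fn),  # depends on prevalence in the sample
        "lr_positive": sensitivity / (1.0 - specificity),
        "lr_negative": (1.0 - sensitivity) / specificity,
    }


# Hypothetical validation study: 100 subjects with disease, 200 without.
results = diagnostic_performance(TwoByTwo(tp=90, fp=30, fn=10, tn=170))
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

The predictive values are valid only for a population with the same prevalence as the study sample, which is one reason why expected values in normal and affected populations are requested alongside them.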

Conclusions

The cardiologist Sir Thomas Lewis invented the term ‘clinical science’. He wrote in 1929 that ‘the lack of progress on the clinical side of Medicine (and Surgery) is due chiefly, not to inherent difficulties presented by the subject, but to what has become a traditionally low standard of work and thought from a scientific standpoint’. It would be unfair and misleading to imply that a huge amount of excellent research has not already been done in cardiovascular imaging, but we can also agree with Lewis that accomplished thinkers could achieve much more. New diagnostic tools are expensive, and diagnostic tests account for an increasing proportion of the burgeoning costs of health care. There are major geographical and inter-hospital variations in the use of diagnostic imaging which cannot be explained by differences in the prevalence of disease. Many of the principles that we should adopt when developing collaborative multimodality imaging have been described, but they remain insufficiently implemented. Our subspecialty is much more than ‘imaging’; our challenge is to make it better clinical science.

Table 1

Priorities for improving diagnostic analysis and reporting

Minimizing diagnostic error
• Teach and practise metacognition and analytical thinking
• Implement debiasing strategies
• Use objective rather than subjective tests
• Develop and consult diagnostic checklists
• Use normative reference populations to establish normal values
• Consider pre-test probability when ordering and reporting a test
• Consider reference change values in routine clinical practice
• Document and analyse diagnostic errors
• Hold regular discrepancy meetings

Implementing smarter information technology
• Develop automated analyses
• Develop and use decision-support software
• Adjust for physiological status and risk factors
• Apply cut-points for clinical decisions that have implications for clinical outcomes
• Use machine learning to analyse complex data sets
• Establish open-access imaging research databases
• Develop interactive consensus documents and guidelines

Table 2

Requirements for evidence-based diagnostic imaging

Establishing diagnostic performance
• Measure accuracy in vitro against standard phantoms
• Compare software in silico against common digital data sets
• Test validity against external, independent reference criteria
• Measure inter-observer reproducibility in vivo with good sample size:
    ◦ for independent acquisition of images
    ◦ for independent analysis of images
    ◦ for biological/temporal variability
    ◦ in the target clinical population
• Perform comparisons with similar and alternative diagnostic approaches
• Document feasibility in routine clinical practice
• Determine diagnostic utility in populations with varying pre-test probabilities

Opportunities for developing regulatory governance
• Transparent reporting of performance (accuracy) against imaging phantoms
• Open-access logs of software iterations
• Public availability of reproducibility data and reference change values
• Industry support for clinical end point studies of diagnostic technologies
• Post-market surveillance and registries of diagnostic imaging
• Integration of clinical decision support into diagnostic reporting systems
