Kenya Kusunose1, Robert Zheng2, Hirotsugu Yamada3, Masataka Sata2. 1. Department of Cardiovascular Medicine, Tokushima University Hospital, 2-50-1 Kuramoto, Tokushima, Japan. kusunosek@tokushima-u.ac.jp. 2. Department of Cardiovascular Medicine, Tokushima University Hospital, 2-50-1 Kuramoto, Tokushima, Japan. 3. Department of Community Medicine for Cardiology, Tokushima University Graduate School of Biomedical Sciences, Tokushima, Japan.
Abstract
Despite recent advances in imaging for myocardial deformation, left ventricular ejection fraction (LVEF) is still the most important index for systolic function in daily practice. Its role in multiple fields (e.g., valvular heart disease, myocardial infarction, cancer therapy-related cardiac dysfunction) has been a mainstay in guidelines. In addition, assessment of LVEF is vital to clinical decision-making in patients with heart failure. However, notable limitations to LVEF include poor inter-observer reproducibility dependent on observer skill, poor acoustic windows, and variations in measurement techniques. To solve these problems, methods for standardization of LVEF by sharing reference images among observers and artificial intelligence for accurate measurements have been developed. In this review, we focus on the standardization of LVEF using reference images and automated LVEF using artificial intelligence.
Despite recent advances in imaging for myocardial deformation, left ventricular ejection fraction (LVEF) is still the most important index for systolic function in daily practice. Its role in multiple fields (e.g., valvular heart disease, myocardial infarction, cancer therapy-related cardiac dysfunction) has been a mainstay in guidelines [1-3]. For example, in heart failure with reduced ejection fraction, renin-angiotensin system (RAS) inhibitors or beta-blockers have been shown to improve prognosis, whereas routine administration of these drugs is not recommended for heart failure with preserved ejection fraction [4]. This is a typical example of why LVEF is an important index in cardiovascular clinical practice. However, notable limitations to LVEF include poor inter-observer reproducibility dependent on observer skill, poor acoustic windows, and variations in measurement techniques [5]. To solve these problems, methods have been developed to standardize LVEF by sharing reference images among observers and to measure it accurately with artificial intelligence.
In addition, owing to the aging population and the prevalence of lifestyle-related diseases, the number of patients visiting hospitals for cardiovascular diseases is very high. In the intensive care unit, time constraints sometimes make it necessary to judge cardiac function visually from echocardiographic images. Recently, with the rapidly increasing number of confirmed or suspected COVID-19 patients, non-specialists such as emergency physicians are increasingly required to perform this examination in infection control rooms [6]. An automatic analysis tool for echocardiographic images is therefore desirable as a decision support system.
In this review, we focus on the standardization of LVEF using reference images and automated LVEF using artificial intelligence (AI).
Measurement of LVEF
LVEF is an index of LV contractility that indicates the degree of change in LV volume from diastole to systole. It is calculated by subtracting the end-systolic LV volume from the end-diastolic LV volume and dividing the result by the end-diastolic LV volume. There is no definitive consensus on the normal value, as it is affected by age, gender, race, measurement method, and so forth. Based on the guidelines of the American Society of Echocardiography and reports from Japan, the lower limit of normal for LVEF is set at around 50% in many institutions [7-9]. Several methods have been proposed to measure LVEF with echocardiography (Fig. 1), and their advantages and disadvantages are compared in Table 1. The most common quantitative method is the biplane disk-summation method, in which the LV is divided into 20 disks along its long axis. The volume is obtained by summing the disk volumes, each disk being treated as an elliptical slab whose two orthogonal diameters are measured in the 4-chamber and 2-chamber views. In clinical practice, the LV volume is calculated by tracing the LV endocardium in the 4-chamber and 2-chamber views at end-systole and end-diastole.
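As a minimal sketch of the two calculations described above, the Python below computes LVEF from end-diastolic and end-systolic volumes and estimates a volume by biplane disk summation. The function names are hypothetical; the disk diameters are assumed to have already been measured (in cm) from the traced 4- and 2-chamber contours, and each of the n disks is modeled as an elliptical slab of height L/n.

```python
import math

def ejection_fraction(edv_ml, esv_ml):
    """LVEF (%) = (EDV - ESV) / EDV x 100."""
    if edv_ml <= 0 or not 0 <= esv_ml <= edv_ml:
        raise ValueError("expected EDV > 0 and 0 <= ESV <= EDV")
    return (edv_ml - esv_ml) / edv_ml * 100.0

def biplane_disk_volume(diams_4ch_cm, diams_2ch_cm, long_axis_cm):
    """Biplane disk summation: n elliptical disks of height L/n whose two
    orthogonal diameters come from the 4- and 2-chamber views. Returns mL."""
    if len(diams_4ch_cm) != len(diams_2ch_cm) or not diams_4ch_cm:
        raise ValueError("need one pair of diameters per disk")
    height = long_axis_cm / len(diams_4ch_cm)
    # Area of an ellipse with diameters a and b is pi/4 * a * b.
    return sum(math.pi / 4.0 * a * b * height
               for a, b in zip(diams_4ch_cm, diams_2ch_cm))

# Hypothetical LV modeled as 20 equal disks (i.e., a cylinder) for a hand check.
edv = biplane_disk_volume([4.0] * 20, [4.0] * 20, 8.0)  # 32*pi, about 100.5 mL
esv = biplane_disk_volume([3.0] * 20, [3.0] * 20, 7.0)  # 15.75*pi, about 49.5 mL
print(round(ejection_fraction(edv, esv), 2))  # 50.78
```

With 20 disks of constant diameter, the sum reduces to the volume of a cylinder, which makes the result easy to verify by hand; real traced contours give a different diameter pair for every disk.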
Fig. 1
Calculation of left ventricular ejection fraction using M-mode, B-mode, and 3-dimensional echocardiography. In the guidelines, if applicable, 3D echocardiography is recommended to measure left ventricular ejection fraction. LV left ventricle, LA left atrium
Table 1
Comparison of advantages and disadvantages between different techniques for measurement of ejection fraction
Method | Availability | Assumptions | Reproducibility | Speed
Eyeball | Always | Dependent on observer skill | Low | Instant
M-mode | Widely used | Dependent on geometric assumptions | Low/modest | Quick
B-mode | Generally used | Minimizes mathematical assumptions | Low/modest | Needs tracing
3D | Readily available | Independent of geometric assumptions | High | Dependent on machine
AI | Not yet | Black box | High | Dependent on machine
3D 3-dimensional, AI artificial intelligence
Reproducibility of LVEF
Cardiac magnetic resonance (CMR) imaging is the gold standard for the quantification of LVEF, and the reproducibility of CMR measurements is superior to that of echocardiography in most studies [10-12]. The biplane disk-summation method using echocardiography has a measurement error of approximately 10% for LVEF [13]. In the field of cancer therapy-related cardiac dysfunction, this value equals a diagnostic criterion (a 10% decrease from baseline), so small changes in LVEF may not represent true changes because of reproducibility issues [14]. In addition, LVEF measurements may vary widely between centers, so treatment decisions made solely on the basis of LVEF may be confounded. Against this backdrop, a reproducible method for measuring LVEF is needed.
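The following sketch, with hypothetical function names, illustrates why a single-threshold criterion is hard to apply when the measurement error is of the same size. It assumes that both the roughly 10% measurement error and the 10% decline criterion are expressed in absolute LVEF percentage points, which the text above does not specify.

```python
def meets_decline_criterion(baseline_ef, follow_up_ef, drop_points=10.0):
    """True if LVEF fell by at least `drop_points` percentage points.
    The cut-off and the use of >= are assumptions of this sketch."""
    return baseline_ef - follow_up_ef >= drop_points

def exceeds_measurement_error(baseline_ef, follow_up_ef, error_points=10.0):
    """True only if the observed change is larger than the assumed
    measurement error of the method."""
    return abs(follow_up_ef - baseline_ef) > error_points

# Hypothetical patient: 60% at baseline, 50% at follow-up.
baseline, follow_up = 60.0, 50.0
print(meets_decline_criterion(baseline, follow_up))    # True
print(exceeds_measurement_error(baseline, follow_up))  # False
```

The same 10-point decline satisfies the criterion yet is not larger than the assumed measurement error, which is the ambiguity described in the paragraph above.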
Eyeball LVEF
Eyeball EF is the “appearance EF”, that is, the LVEF estimated from experience based on the appearance (size and movement) of the LV. The guideline of the American Society of Echocardiography clearly states that the biplane disk-summation method should be used to evaluate LVEF [9]. On the other hand, the guideline of the American Society of Intensive Care Medicine states that LV function should be evaluated qualitatively in intensive care settings [15]. The limitations of eyeball EF are that it depends on the experience of the examiner and has relatively poor reproducibility. Our previous multicenter study involving 13 centers showed that eyeball EF varied from center to center, with five of the 13 centers deviating from the reference values by more than 3% in absolute LVEF (Fig. 2) [16].
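A minimal sketch of this kind of center-level comparison is shown below. It assumes that every center reads the same set of cases with known reference EFs and that the 3% cut-off is in absolute percentage points; the center names, readings, and function names are hypothetical.

```python
from statistics import mean

def center_biases(readings, reference):
    """Mean signed difference (visual EF minus reference EF), in percentage
    points, per center. `readings` maps a center to its visual EFs for the
    same cases, in the same order, as `reference`."""
    biases = {}
    for center, values in readings.items():
        if len(values) != len(reference):
            raise ValueError(f"{center}: expected {len(reference)} readings")
        biases[center] = mean(v - r for v, r in zip(values, reference))
    return biases

def centers_beyond(biases, limit=3.0):
    """Centers whose mean bias exceeds `limit` points in either direction."""
    return sorted(c for c, b in biases.items() if abs(b) > limit)

# Hypothetical data: three reference cases read by three centers.
reference = [30.0, 45.0, 60.0]
readings = {
    "A": [33.0, 48.0, 63.0],  # overestimates by exactly 3 points
    "B": [25.0, 40.0, 55.0],  # underestimates by 5 points
    "C": [31.0, 46.0, 60.0],
}
print(centers_beyond(center_biases(readings, reference)))  # ['B']
```

Using the signed mean keeps overestimation and underestimation distinguishable, as in Fig. 2, while the absolute value is used only for the cut-off.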
Fig. 2
Overestimation and underestimation of visual LVEF between laboratories in Japan. Six laboratories overestimated visual EF, and seven laboratories underestimated visual EF compared with reference values. Two laboratories modestly overestimated and three laboratories modestly underestimated (bold)
Several papers have reported quality assessment programs in clinical settings [17, 18]. The investigators used reference cases, with reference LVEFs provided by expert echocardiographic review, as a standard to reduce inter-observer variability. In these studies, intervention with reference images improved the reproducibility of visually estimated LVEF [17, 19, 20]. In our previous multicenter study, we prepared reference images of three apical views with LVEFs ranging from 20 to 70% and showed that inter-institutional variability could be reduced to less than 3% by using them. In addition, a learning session using the reference images resulted in fewer misclassifications of LVEF, especially for mildly to moderately impaired LVEF, regardless of the observers’ experience [16]. These results suggest that a simple learning session with reference images can minimize inter-observer variability and misclassification among practitioners with varied experience.
Reliability and accuracy are separate aspects of echocardiographic measurement. A measurement that consistently returns the same incorrect value is reliable but inaccurate. Shooting at a target is a useful analogy for clarifying these definitions, and Fig. 3 shows all four combinations of reliability and accuracy. Both aspects can be improved: with practice, an observer traces the LV endocardium more consistently, which improves reliability, whereas a learning system using reference images with accurate EFs improves accuracy. Based on this reasoning, we believe that learning with reference images can help increase the accuracy of LVEF measurements.
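To make the reliability/accuracy distinction concrete, the sketch below summarizes one observer's repeated EF readings of a single case by their mean bias from the reference value (accuracy) and their standard deviation (reliability). The readings, function names, and the 5-point limits are hypothetical illustration values, not figures from the studies cited.

```python
from statistics import mean, stdev

def bias_and_spread(repeats, reference_ef):
    """Accuracy as mean bias from the reference; reliability as the sample
    SD of repeated measurements of the same case (percentage points)."""
    return mean(repeats) - reference_ef, stdev(repeats)

def classify(bias, spread, bias_limit=5.0, sd_limit=5.0):
    """Place an observer in one quadrant of the target analogy.
    The 5-point limits are arbitrary illustration values."""
    accuracy = "accurate" if abs(bias) <= bias_limit else "inaccurate"
    reliability = "reliable" if spread <= sd_limit else "unreliable"
    return f"{accuracy} and {reliability}"

reference_ef = 55.0
expert = [45.0, 46.0, 45.0, 46.0]  # tightly clustered but off target
novice = [40.0, 62.0, 30.0, 48.0]  # scattered and off target
print(classify(*bias_and_spread(expert, reference_ef)))  # inaccurate and reliable
print(classify(*bias_and_spread(novice, reference_ef)))  # inaccurate and unreliable
```

Practice narrows the spread (reliability), while calibration against reference images shifts the mean toward the reference (accuracy); the two numbers respond to different interventions.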
Fig. 3
Reliability vs accuracy. Measurements by novice observers were often both inaccurate and imprecise. Measurements by expert observers were sometimes precise but inaccurate
Automated LVEF
Automated LVEF is key to improving reproducibility. There are various steps required to measure LVEF automatically (Fig. 4).
Fig. 4
Steps of LVEF measurement. The process of LVEF measurement involves four steps
1. Identification of the end-diastolic and end-systolic phases from the ECG.
2. Detection of the boundary between the cardiac cavity and the myocardium.
3. Tracking of the endocardial boundary.
4. Calculation of LV volumes at end-diastole and end-systole.
Several attempts have been made to measure LVEF automatically using combinations of these steps. However, because the position and size of the heart vary, automatic LV tracing based on pattern matching has limited tracking performance. In addition, a clear image of the boundary of the cardiac cavity cannot be obtained in every patient. To overcome this issue, a tracing approach based on "knowledge-based systems" has been developed, which determines the most appropriate tracing line by drawing on a built-in database of many cases and their tracing examples. This has made it possible to draw the optimal trace line with high accuracy. In a global multicenter study of fully automated software for calculating LVEF, relatively high correlation coefficients of about 0.7–0.8 were obtained [21, 22]. On the other hand, one limitation of knowledge-based algorithms is that, even when image quality is good, similarity to database cases may lead to incorrect tracing. For example, Fig. 5 shows a case of fully automated LV tracing using Auto EF in which a large tracing line was drawn from the left to the right ventricle because the algorithm did not identify the ventricle correctly. Further improvement in accuracy is expected in this field with AI algorithms.
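The four steps listed under Fig. 4 can be sketched in Python as follows. This is a simplified illustration, not the Auto EF or knowledge-based algorithm: steps 2 and 3 (boundary detection and tracking) are assumed to have already produced a binary LV cavity mask for every frame, end-diastole is taken at the first ECG R-wave peak, end-systole is approximated as the frame with the smallest cavity volume before the next R wave, and, for brevity, volumes come from the single-plane area-length formula V = 8A²/(3πL) rather than from biplane disk summation. All names and the toy masks are hypothetical.

```python
import math

def r_peak_frames(ecg, threshold):
    """Step 1 (simplified): frames at local ECG maxima above `threshold`.
    Assumes the ECG has been resampled to one value per video frame."""
    peaks = []
    for i, v in enumerate(ecg):
        rising = i == 0 or v >= ecg[i - 1]
        falling = i == len(ecg) - 1 or v > ecg[i + 1]
        if v >= threshold and rising and falling:
            peaks.append(i)
    return peaks

def area_and_length(mask, pixel_cm):
    """Cavity area (cm^2) and long-axis length (cm) from a binary mask whose
    rows run along the LV long axis (an assumption of this sketch)."""
    area_px = sum(sum(row) for row in mask)
    rows = sum(1 for row in mask if any(row))
    return area_px * pixel_cm ** 2, rows * pixel_cm

def area_length_volume(area_cm2, length_cm):
    """Step 4: single-plane area-length volume, V = 8A^2 / (3*pi*L), in mL."""
    if length_cm <= 0:
        raise ValueError("empty cavity mask")
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)

def lvef_from_masks(masks, ecg, threshold, pixel_cm):
    peaks = r_peak_frames(ecg, threshold)
    if len(peaks) < 2:
        raise ValueError("need two R peaks to delimit one cardiac cycle")
    ed, next_ed = peaks[0], peaks[1]
    volumes = [area_length_volume(*area_and_length(m, pixel_cm)) for m in masks]
    # End-systole approximated as the smallest volume within this beat.
    es = min(range(ed, next_ed), key=lambda k: volumes[k])
    return ed, es, (volumes[ed] - volumes[es]) / volumes[ed] * 100.0

def rectangle_mask(h, w, rows=8, cols=6):
    """Hypothetical toy 'segmentation': an h x w block of cavity pixels."""
    return [[1 if r < h and c < w else 0 for c in range(cols)] for r in range(rows)]

masks = [rectangle_mask(h, w) for h, w in [(6, 4), (5, 3), (4, 2), (5, 3), (6, 4)]]
ecg = [1.0, 0.1, 0.0, 0.1, 1.0]
ed, es, ef = lvef_from_masks(masks, ecg, threshold=0.5, pixel_cm=1.0)
print(ed, es, round(ef, 1))  # 0 2 83.3
```

In a real system, the quality of steps 2 and 3 dominates the result; the failure shown in Fig. 5 would enter this pipeline as an incorrect mask, and every later step would inherit the error.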
Fig. 5
A case of fully automated tracing of the left ventricular cavity using auto EF algorithm. A large tracing line was drawn from the left to the right ventricle because the ventricle was not identified properly. RV right ventricle, LV left ventricle
AI for LVEF
In recent years, advances in computer technology have improved the accuracy of automated diagnosis of medical images by machine learning. In 2012, deep learning, in which the computer learns features through repeated trials, was shown to achieve high accuracy in image classification. Deep learning can be regarded as a type of machine learning, but its capacity for self-encoding and universal approximation has led to more accurate results than conventional machine learning [23, 24], particularly since it has been combined with sophisticated techniques for preventing overfitting and vanishing gradients [25-30]. Deep learning does not require humans to define features, and by learning from large amounts of supervised data, it is becoming possible for “computer eyes to judge medical images” (Fig. 6). This process resembles the way humans learn to use “human eyes to judge medical images” and may be particularly useful for diagnosis based on visual appearance (e.g., eyeball LVEF).
Fig. 6
Conventional artificial intelligence (AI) and new AI. In conventional artificial intelligence (AI), left ventricular volumes should be calculated to measure the left ventricular ejection fraction (LVEF). In deep learning, left ventricular volumes can be directly estimated without tracking the endocardial borders
When the deep learning algorithm is applied to estimate LVEF, several problems in the echocardiographic data must be addressed. For example, echocardiographic images differ between vendors, so the common image region must be extracted from the DICOM data, and we must decide which parts of the image to analyze as input data. After the image location, image size, number of images, frame rate, and heart rate are obtained from the DICOM tag information, the image is extracted based on the location information. Unnecessary information outside the echocardiographic image (such as the hospital name and the date and time) should be removed. The image size is then standardized and reduced by rescaling and resampling the pixels. Image standardization raises many questions, such as which portions should be cropped, whether the scale needs to be adjusted, and what input size is appropriate [31].
Since LVEF is calculated from LV volumes at end-diastole and end-systole, it may be possible to predict LVEF from these two frames alone. However, using more frames per cardiac cycle for training may improve accuracy, so methods that incorporate time-series data need to be tested. We created a three-dimensional convolutional neural network (3DCNN) model using 340 echocardiographic videos labeled with the LVEF calculated by experts. This 3DCNN model consists of convolutional and pooling layers that take 10 echocardiographic frames per heartbeat as input, followed by fully connected layers, and finally outputs a continuous value from 0 to 1 through a sigmoid function.
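The preprocessing and input-sampling steps described above can be sketched with plain Python lists standing in for image frames. This is an illustrative assumption, not the authors' pipeline: the crop box would in practice come from the region information in the DICOM header, nearest-neighbor resampling is only one of several possible rescaling choices, frames per beat are derived as frame_rate × 60 / heart_rate, and the sigmoid output is read as LVEF/100. All function names are hypothetical, and the 3DCNN itself is not reproduced here.

```python
import math

def crop(frame, top, left, height, width):
    """Keep only the ultrasound sector; text such as the hospital name or
    date outside this box is discarded. The box is assumed to be known."""
    return [row[left:left + width] for row in frame[top:top + height]]

def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbor resampling to a fixed network input size."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def sample_one_beat(n_frames, frame_rate_hz, heart_rate_bpm, start=0, k=10):
    """Indices of k evenly spaced frames covering one heartbeat from `start`."""
    per_beat = frame_rate_hz * 60.0 / heart_rate_bpm
    if start + per_beat > n_frames:
        raise ValueError("clip is shorter than one heartbeat")
    return [start + int(i * per_beat / k) for i in range(k)]

def ef_from_logit(logit):
    """Map the network's final pre-activation through a sigmoid to EF in %."""
    return 100.0 / (1.0 + math.exp(-logit))

# Hypothetical clip: 100 frames at 50 frames/s with a heart rate of 60 bpm.
print(sample_one_beat(100, 50.0, 60.0))
# [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]
frame = [[r * 10 + c for c in range(6)] for r in range(6)]
print(resize_nearest(crop(frame, 1, 1, 4, 4), 2, 2))
# [[11, 13], [31, 33]]
print(ef_from_logit(0.0))  # 50.0
```

Sampling a fixed number of frames per heartbeat, rather than per second, keeps the network input aligned to the cardiac cycle regardless of heart rate or acquisition frame rate.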
The 3DCNN model was validated in a cohort independent of the training cohort, and the correlation coefficient with expert LVEF was 0.92 (p < 0.001), indicating that the model was able to predict LVEF with high accuracy from echocardiographic images (Fig. 7) [32].
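For reference, the correlation coefficient reported above is the kind of statistic computed below, Pearson's r between model-estimated and expert LVEF. The implementation is a minimal sketch with a hypothetical function name and hypothetical data. Because r measures linear association rather than agreement, a model with a constant offset from the experts can still reach r = 1, as the example shows.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired LVEF estimates x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

expert = [30.0, 40.0, 50.0]
model = [35.0, 45.0, 55.0]  # every estimate is 5 points too high
print(pearson_r(expert, model))  # 1.0
```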
Fig. 7
Correlation between ejection fraction (EF) by deep learning (DL) and by expert observers. The correlation is excellent in the independent cohort
In principle, LVEF can be calculated from the 2-chamber and 4-chamber views alone, but in daily practice, practitioners also refer to other echocardiographic views. Therefore, adding 3-chamber, short-axis, and long-axis views is expected to improve the accuracy of LVEF estimation. The 95% prediction error calculated from the root-mean-square error averaged over five cross-validation folds was about 14% for the 3DCNN model, whereas that for LVEF estimated via segmentation with U-net was about 20% [33, 34]. Direct estimation of LVEF by the 3DCNN therefore appeared to perform better. When LVEF was dichotomized into LVEF > 50% and LVEF < 50%, the 3DCNN model achieved an area under the curve (AUC) > 0.99 [32]. Since LVEF is an important index for deciding the course of treatment in the emergency care of heart failure, the model may be useful in clinical practice when rapid treatment decisions are needed.
Further refinement of the method is expected as the number of echocardiographic images increases. In response to these preliminary results, the Japan Society of Ultrasonics in Medicine and the Japanese Society of Echocardiography, with support from the Japan Agency for Medical Research and Development (AMED) (Fig. 8), have jointly started to build a database in which videos of 2- and 4-chamber views and their associated LVEF values are recorded. With this large amount of data, we expect that a highly accurate LVEF prediction model can be created.
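The metrics in the paragraph above can be sketched as follows. The text does not give formulas, so these are assumptions: the 95% prediction error is taken as 1.96 × the fold-averaged RMSE, which holds only if errors are roughly normal with zero mean, and the AUC uses the Mann-Whitney formulation, treating LVEF < 50% as the positive class and ranking cases by 100 minus the predicted EF. All names and values are hypothetical.

```python
import math

def rmse(pred, truth):
    """Root-mean-square error between predicted and reference LVEF (points)."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty inputs")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def prediction_error_95(fold_rmses):
    """Half-width of an approximate 95% interval: 1.96 x mean RMSE over folds."""
    return 1.96 * sum(fold_rmses) / len(fold_rmses)

def auc(scores, labels):
    """Mann-Whitney AUC: chance that a positive case outscores a negative one,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

truth = [30.0, 45.0, 55.0, 65.0]  # reference LVEF (%)
pred = [35.0, 48.0, 52.0, 60.0]   # model LVEF (%)
print(round(rmse(pred, truth), 3))  # 4.123
reduced = [t < 50.0 for t in truth]
print(auc([100.0 - p for p in pred], reduced))  # 1.0
```

An AUC near 1 for the 50% cut-off and a prediction error of several EF points are compatible: classification only requires the right side of the threshold, whereas the RMSE penalizes every deviation.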
Fig. 8
Japan Agency for Medical Research and Development (AMED)-supported projects. The Japan Society of Ultrasonics in Medicine (JSUM) and the Japanese Society of Echocardiography started to gather images from multiple centers. NCVC National Cerebral and Cardiovascular Center
Conclusions
LVEF is an important index that is often evaluated visually. However, its measurement has limitations, including poor inter-observer reproducibility, limited acoustic windows, and variations in measurement techniques. Because LVEF estimation relies heavily on the visual appearance of images, it is likely to be well suited to deep learning, which captures image characteristics directly. Recently, several studies have demonstrated automated quantification of ejection fraction from echocardiographic acquisitions [32-34]. We hope that AI technology will be translated into clinical practice [35].