Antônio H Ribeiro1,2, Manoel Horta Ribeiro3, Gabriela M M Paixão3,4, Derick M Oliveira3, Paulo R Gomes3,4, Jéssica A Canazart3,4, Milton P S Ferreira3, Carl R Andersson5, Peter W Macfarlane6, Wagner Meira3, Thomas B Schön7, Antonio Luiz P Ribeiro8,9.
Abstract
The role of automatic electrocardiogram (ECG) analysis in clinical practice is limited by the accuracy of existing models. Deep Neural Networks (DNNs) are models composed of stacked transformations that learn tasks from examples. This technology has recently achieved striking success in a variety of tasks, and there are great expectations about how it might improve clinical practice. Here we present a DNN model trained on a dataset of more than 2 million labeled exams analyzed by the Telehealth Network of Minas Gerais and collected under the scope of the CODE (Clinical Outcomes in Digital Electrocardiology) study. The DNN outperforms cardiology resident medical doctors in recognizing 6 types of abnormalities in 12-lead ECG recordings, with F1 scores above 80% and specificity over 99%. These results indicate that ECG analysis based on DNNs, previously studied in a single-lead setup, generalizes well to 12-lead exams, taking the technology closer to standard clinical practice.
Year: 2020 PMID: 32273514 PMCID: PMC7145824 DOI: 10.1038/s41467-020-15432-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
(Dataset summary) Patient characteristics and abnormality prevalence, n (%).
| | Train + Val (n = 2,322,513) | Test (n = 827) |
|---|---|---|
| Abnormality | | |
| 1dAVb | 35,759 (1.5%) | 28 (3.4%) |
| RBBB | 63,528 (2.7%) | 34 (4.1%) |
| LBBB | 39,842 (1.7%) | 30 (3.6%) |
| SB | 37,949 (1.6%) | 16 (1.9%) |
| AF | 41,862 (1.8%) | 13 (1.6%) |
| ST | 49,872 (2.1%) | 36 (4.4%) |
| Age group | | |
| 16–25 | 155,531 (6.7%) | 43 (5.2%) |
| 26–40 | 406,239 (17.5%) | 122 (14.8%) |
| 41–60 | 901,456 (38.8%) | 340 (41.1%) |
| 61–80 | 729,300 (31.4%) | 278 (33.6%) |
| ≥81 | 129,987 (5.6%) | 44 (5.3%) |
| Sex | | |
| Male | 922,780 (39.7%) | 321 (38.8%) |
| Female | 1,399,733 (60.3%) | 506 (61.2%) |

1dAVb 1st degree AV block, RBBB right bundle branch block, LBBB left bundle branch block, SB sinus bradycardia, AF atrial fibrillation, ST sinus tachycardia.
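Each n (%) entry is a count divided by the size of its split. A minimal sketch of that arithmetic, using only counts from the table: the split sizes are recovered as the sum of the Male and Female rows, which must also equal the sum of the age-group rows. The helper names (`split_size`, `prevalence_pct`) are illustrative, not part of the study.

```python
# Counts transcribed from the dataset-summary table (abnormality rows).
TRAIN_VAL = {"1dAVb": 35_759, "RBBB": 63_528, "LBBB": 39_842,
             "SB": 37_949, "AF": 41_862, "ST": 49_872}
TEST = {"1dAVb": 28, "RBBB": 34, "LBBB": 30, "SB": 16, "AF": 13, "ST": 36}

# Age-group rows, used as a cross-check of the split sizes.
AGE_TRAIN_VAL = [155_531, 406_239, 901_456, 729_300, 129_987]
AGE_TEST = [43, 122, 340, 278, 44]


def split_size(male: int, female: int) -> int:
    """Split size recovered as the sum of the Male and Female rows."""
    return male + female


def prevalence_pct(count: int, total: int) -> float:
    """Share of exams in a split carrying a label, in percent, one decimal."""
    return round(100 * count / total, 1)


N_TRAIN_VAL = split_size(922_780, 1_399_733)
N_TEST = split_size(321, 506)

# Both ways of summing the table must give the same split size.
assert sum(AGE_TRAIN_VAL) == N_TRAIN_VAL
assert sum(AGE_TEST) == N_TEST

if __name__ == "__main__":
    print(f"n: train+val = {N_TRAIN_VAL:,}, test = {N_TEST}")
    for label in TRAIN_VAL:
        print(f"{label:6s} {prevalence_pct(TRAIN_VAL[label], N_TRAIN_VAL):4.1f}%"
              f"  {prevalence_pct(TEST[label], N_TEST):4.1f}%")
```

Running it reproduces the abnormality percentages in both columns, for example 1.5% and 3.4% for 1dAVb.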
Fig. 1 Abnormality examples.
Examples of all the abnormalities the model classifies. Only three representative leads (DII, V1 and V6) are shown.
(Performance indexes) Scores of our DNN on the test set compared with the average performance of: (i) 4th-year cardiology residents (cardio.); (ii) 3rd-year emergency residents (emerg.); and (iii) 5th-year medical students (stud.).
| | Precision (PPV) | | | | Recall (Sensitivity) | | | | Specificity | | | | F1 score | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | DNN | cardio. | emerg. | stud. | DNN | cardio. | emerg. | stud. | DNN | cardio. | emerg. | stud. | DNN | cardio. | emerg. | stud. |
| 1dAVb | 0.867 | **0.905** | 0.639 | 0.605 | **0.929** | 0.679 | 0.821 | **0.929** | 0.995 | **0.997** | 0.984 | 0.979 | **0.897** | 0.776 | 0.719 | 0.732 |
| RBBB | 0.895 | 0.868 | **0.963** | 0.914 | **1.000** | 0.971 | 0.765 | 0.941 | 0.995 | 0.994 | **0.999** | 0.996 | **0.944** | 0.917 | 0.852 | 0.928 |
| LBBB | **1.000** | **1.000** | 0.963 | 0.931 | **1.000** | 0.900 | 0.867 | 0.900 | **1.000** | **1.000** | 0.999 | 0.997 | **1.000** | 0.947 | 0.912 | 0.915 |
| SB | **0.833** | **0.833** | 0.824 | 0.750 | **0.938** | **0.938** | 0.875 | 0.750 | **0.996** | **0.996** | **0.996** | 0.995 | **0.882** | **0.882** | 0.848 | 0.750 |
| AF | **1.000** | 0.769 | 0.800 | 0.571 | 0.769 | 0.769 | 0.615 | **0.923** | **1.000** | 0.996 | 0.998 | 0.989 | **0.870** | 0.769 | 0.696 | 0.706 |
| ST | 0.947 | **0.968** | 0.946 | 0.912 | **0.973** | 0.811 | 0.946 | 0.838 | 0.997 | **0.999** | 0.997 | 0.996 | **0.960** | 0.882 | 0.946 | 0.873 |
PPV positive predictive value. Bold values mark the best score for each metric and abnormality.
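All four metrics in the table follow from binary confusion counts for one abnormality and one reader. The sketch below is illustrative only: the `Confusion` class is hypothetical, and the 1dAVb counts (TP = 26, FP = 4, FN = 2, TN = 795) are back-calculated from the rounded DNN scores and the 28 positives among 827 test exams, not counts reported in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts for one abnormality and one reader on the test set."""
    tp: int
    fp: int
    fn: int
    tn: int

    def precision(self) -> float:
        """PPV = TP / (TP + FP)."""
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        """Sensitivity = TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Hypothetical counts back-calculated from the rounded DNN scores for 1dAVb
# (28 positives among 827 test exams); not numbers reported in the paper.
dnn_1davb = Confusion(tp=26, fp=4, fn=2, tn=795)

if __name__ == "__main__":
    for name in ("precision", "recall", "specificity", "f1"):
        print(f"{name:12s} {getattr(dnn_1davb, name)():.3f}")
```

Printed to three decimals this gives 0.867, 0.929 and 0.995, the DNN's 1dAVb precision, recall and specificity, and F1 = 2PR/(P + R) ≈ 0.897. With so few positives, TN dominates the specificity denominator, which is why every reader scores above 0.97 on specificity while precision varies far more.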
Fig. 2 Precision-recall curve.
Precision-recall curve of our nominal prediction model on the test set (bold line) for each ECG abnormality. The shaded region shows the range between the maximum and minimum precision of neural networks trained with the same configuration but different initializations. Points corresponding to the performance of the resident medical doctors and students are also displayed, together with the point corresponding to the DNN performance at the same threshold used for generating Table 2. Gray dashed curves in the background are iso-F1 curves (i.e., curves in the precision-recall plane with constant F1 score).
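The elements of this figure can be sketched in a few lines. Solving F1 = 2PR/(P + R) for precision gives the iso-F1 curve P = F1·R/(2R - F1), which is a valid precision only where it lies in (0, 1]. A precision-recall curve comes from sweeping a decision threshold over the model's scores, and the shaded band is a pointwise minimum and maximum across differently initialized networks. All function names are hypothetical, and `precision_band` assumes the curves were already evaluated on a shared recall grid; this is not a description of the authors' plotting code.

```python
from typing import List, Optional, Sequence, Tuple


def iso_f1_precision(f1: float, recall: float) -> Optional[float]:
    """Precision P on the iso-F1 curve at recall R: P = F1*R / (2R - F1).

    Obtained by solving F1 = 2PR / (P + R) for P. Returns None where no
    precision in (0, 1] attains the requested F1 at this recall.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        return None
    p = f1 * recall / denom
    return p if p <= 1 else None


def pr_points(scores: Sequence[float],
              labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """(threshold, recall, precision) using each distinct score as a threshold.

    An exam counts as predicted positive when its score >= threshold.
    Requires at least one positive label.
    """
    n_pos = sum(labels)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / n_pos, tp / (tp + fp)))
    return points


def precision_band(curves: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Pointwise (min, max) precision across several models.

    Assumes every curve was already evaluated on the same recall grid.
    """
    return [(min(col), max(col)) for col in zip(*curves)]
```

For example, `pr_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` starts with `(0.9, 0.5, 1.0)`: at the strictest threshold, one of the two positives is found with no false positives.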
(Error analysis) Analysis of misclassified exams.
| | DNN | | | cardio. | | | | emerg. | | | | stud. | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | meas. | noise | unexplain. | meas. | noise | concep. | atte. | meas. | noise | concep. | atte. | meas. | noise | concep. | atte. |
| 1dAVb | 3 | 2 | 1 | 8 | 3 | 15 | 3 | 13 | 3 | 3 | |||||
| RBBB | 3 | 1 | 4 | 2 | 1 | 8 | 3 | 2 | |||||||
| LBBB | 1 | 1 | 1 | 1 | 4 | 2 | 3 | ||||||||
| SB | 4 | 4 | 4 | 1 | 5 | 2 | 1 | ||||||||
| AF | 2 | 1 | 4 | 2 | 2 | 5 | 3 | 7 | |||||||
| ST | 2 | 1 | 2 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 5 | ||
The errors were classified into the following categories: (i) measurement errors (meas.), where the ECG interval measurements preclude the given diagnosis by its textbook definition; (ii) errors due to noise, where we believe that the analyst or the DNN failed because of lower-than-usual signal quality; and (iii) other types of errors (unexplain.). For the medical residents and students, these were further divided into two categories: conceptual errors (concep.), where our reviewer suggested that the doctor failed to understand the definition of the abnormality, and attention errors (atte.), where we believe the error could have been avoided had the analyst been more careful.
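The taxonomy above can be encoded as a small data structure: the DNN's errors take one of three labels, while the human readers' remaining errors are split into conceptual and attention errors. The sketch below uses hypothetical names (`ErrorType`, `tally`) and made-up example labels; it is not the review data or the authors' tooling.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable


class ErrorType(Enum):
    """Error categories from the error-analysis table (column abbreviations as values)."""
    MEAS = "meas."              # interval measurements preclude the diagnosis
    NOISE = "noise"             # lower-than-usual signal quality
    UNEXPLAINED = "unexplain."  # other errors, kept as one category for the DNN
    CONCEP = "concep."          # other errors, humans: definition misunderstood
    ATTE = "atte."              # other errors, humans: avoidable with more care


# Column order per reader, mirroring the table header.
DNN_COLUMNS = (ErrorType.MEAS, ErrorType.NOISE, ErrorType.UNEXPLAINED)
HUMAN_COLUMNS = (ErrorType.MEAS, ErrorType.NOISE, ErrorType.CONCEP, ErrorType.ATTE)


def tally(reader: str, errors: Iterable[ErrorType]) -> Dict[str, int]:
    """Count one reader's misclassified exams per category for one abnormality.

    `reader` is "DNN" or a human group such as "cardio."; each misclassified
    exam carries exactly one label.
    """
    columns = DNN_COLUMNS if reader == "DNN" else HUMAN_COLUMNS
    counts = Counter(errors)
    invalid = set(counts) - set(columns)
    if invalid:
        names = sorted(e.value for e in invalid)
        raise ValueError(f"{reader} cannot have errors of type {names}")
    return {c.value: counts[c] for c in columns}
```

Because each misclassified exam gets exactly one label, the values in a reader's row add up to that reader's false positives plus false negatives for the abnormality.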
Fig. 3 DNN architecture.
The one-dimensional residual neural network architecture used for ECG classification.
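To illustrate what "residual" means for a one-dimensional signal, here is a minimal single-channel residual unit in pure Python: the block computes a correction F(x) with two convolutions and adds the input back through a skip connection. This is a generic sketch, not the authors' architecture: channel counts, kernel sizes, normalization, dropout and downsampling of the actual network are not modeled, and all function names are hypothetical. The independent sigmoid outputs are likewise an assumption, chosen because the six abnormalities are not mutually exclusive.

```python
import math
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Single-channel 1D cross-correlation with zero padding; output length = input length.

    The kernel length must be odd so the padding is symmetric.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def relu(v: Sequence[float]) -> List[float]:
    """Elementwise max(0, v)."""
    return [max(0.0, a) for a in v]


def residual_unit(x: Sequence[float], k1: Sequence[float], k2: Sequence[float]) -> List[float]:
    """y = ReLU(x + F(x)) with F(x) = conv(ReLU(conv(x, k1)), k2).

    The skip connection adds the input back, so F only has to learn a correction.
    """
    h = relu(conv1d_same(x, k1))
    f = conv1d_same(h, k2)
    return relu([a + b for a, b in zip(x, f)])


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_outputs(logits: Sequence[float]) -> List[float]:
    """Independent per-abnormality probabilities (labels are not mutually exclusive)."""
    return [_sigmoid(z) for z in logits]
```

With all-zero kernels F(x) vanishes and the unit reduces to ReLU(x), which is the property that makes deep stacks of such blocks easy to start training from.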