Shenda Hong, Wenrui Zhang, Chenxi Sun, Yuxi Zhou, Hongyan Li.
Abstract
Cardiovascular diseases (CVDs) are among the deadliest disease groups worldwide. The electrocardiogram (ECG) is widely used to detect cardiac abnormalities automatically, thereby helping to control and manage CVDs. To encourage more multidisciplinary research, the PhysioNet/Computing in Cardiology Challenge 2020 (Challenge 2020) provided a public platform involving multi-center databases and automatic evaluations for ECG classification tasks. As a result, 41 teams successfully submitted their solutions and qualified for ranking. Although Challenge 2020 was a success, there has been no in-depth methodological meta-analysis of these solutions, making it difficult for researchers to benefit from the solutions and results. In this study, we systematically review the 41 solutions in terms of data processing, feature engineering, model architecture, and training strategy. For each perspective, we visualize and statistically analyze the effectiveness of the common techniques and discuss their methodological advantages and disadvantages. Finally, we summarize five practical lessons based on this analysis: (1) data augmentation should be employed and adapted to specific scenarios; (2) combining different features can improve performance; (3) a hybrid design of different types of deep neural networks (DNNs) is better than using a single type; (4) the use of end-to-end architectures should depend on the task being solved; (5) multiple models are better than one. We expect that our meta-analysis will help accelerate research on ECG classification based on machine-learning models.
Keywords: classification; deep learning; electrocardiogram; machine learning; meta-analysis; physionet challenge; practical lessons
Year: 2022 PMID: 35095568 PMCID: PMC8795785 DOI: 10.3389/fphys.2021.811661
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.566
Overview of databases used in Challenge 2020.
| Database | Patients | Training recordings | Validation recordings | Test recordings | Total recordings |
|---|---|---|---|---|---|
| CPSC (Liu et al., | 9,458 | 10,330 | 1,463 | 1,463 | 13,256 |
| INCART (Tihonenko et al., | 32 | 74 | 0 | 0 | 74 |
| PTB (Bousseljot et al., | 19,175 | 22,353 | 0 | 0 | 22,353 |
| G12EC (G12, | 15,742 | 10,344 | 5,167 | 5,167 | 20,678 |
| Undisclosed | Unknown | 0 | 0 | 10,000 | 10,000 |
| Total | Unknown | 43,101 | 6,630 | 16,630 | 66,361 |
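The recording counts in the table above can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the four recording columns are the training, validation, and test splits followed by their total; that reading is inferred from the numbers themselves rather than stated in the extracted table.

```python
# Recording counts copied from the table above.
# ASSUMPTION: the columns are (training, validation, test, total);
# this is inferred from the arithmetic, not stated in the table.
recordings = {
    "CPSC":        (10_330, 1_463, 1_463, 13_256),
    "INCART":      (74, 0, 0, 74),
    "PTB":         (22_353, 0, 0, 22_353),
    "G12EC":       (10_344, 5_167, 5_167, 20_678),
    "Undisclosed": (0, 0, 10_000, 10_000),
}
reported_total = (43_101, 6_630, 16_630, 66_361)


def row_is_consistent(row):
    """True if the three splits add up to the row's total."""
    train, val, test, total = row
    return train + val + test == total


def column_sums(rows):
    """Sum each column across all databases."""
    return tuple(sum(col) for col in zip(*rows))


rows_ok = all(row_is_consistent(r) for r in recordings.values())
computed_total = column_sums(recordings.values())
print("rows consistent:", rows_ok)
print("totals match:", computed_total == reported_total)
```

Under this reading, every row and the Total row are internally consistent, which supports interpreting the columns as splits of the recordings.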
Figure 1. Number of recordings of each scored diagnosis.
Figure 2. The framework of our meta-analysis.
Details of employed techniques.
| Perspective | Technique | Teams using (%) | Uses among top 10 | P-value |
|---|---|---|---|---|
| Data preprocessing | Signal processing | 95.12 | 10 | N.A. |
| Data preprocessing | Data augmentation | 31.70 | 6 | 0.071 |
| Data preprocessing | Imbalance handling | 53.66 | 7 | 0.252 |
| Feature engineering | Hand features | 36.59 | 0 | 0.983 |
| Feature engineering | Demographic features | 29.27 | 5 | 0.109 |
| Machine-learning models | Deep neural network | 82.93 | 10 | 0.116 |
| Machine-learning models | Convolutional neural network | 82.93 | 10 | 0.116 |
| Machine-learning models | Recurrent neural network/transformer | 31.71 | 4 | 0.317 |
| Machine-learning models | Attention | 24.39 | 6 | 0.006 |
| Training strategy | Model ensemble | 36.59 | 4 | 0.878 |
| Training strategy | End-to-End | 80.49 | 10 | 0.139 |
| Training strategy | Multi-binary classification | 58.54 | 10 | 0.002 |
| Applications to the real world | Post-processing | 2.38 | 1 | N.A. |
| Applications to the real world | Interpretability | 4.76 | 0 | N.A. |
| Applications to the real world | Unknown classes and unseen patients | 0 | 0 | N.A. |
N.A. indicates that no hypothesis test was conducted.
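The percentages in the table above can be turned back into approximate team counts. This is a minimal sketch, assuming the first numeric column is the share of the 41 qualified teams that used each technique; the helper name `teams_using` is hypothetical. The three "Applications to the real world" rows are left out because their percentages (2.38, 4.76, 0) do not correspond exactly to whole teams out of 41.

```python
N_TEAMS = 41  # number of qualified teams, as stated in the abstract

# Percentages copied from the table above (first numeric column).
# ASSUMPTION: each value is the share of the 41 teams using the technique.
usage_percent = {
    "Signal processing": 95.12,
    "Data augmentation": 31.70,
    "Imbalance handling": 53.66,
    "Hand features": 36.59,
    "Demographic features": 29.27,
    "Deep neural network": 82.93,
    "Convolutional neural network": 82.93,
    "Recurrent neural network/transformer": 31.71,
    "Attention": 24.39,
    "Model ensemble": 36.59,
    "End-to-End": 80.49,
    "Multi-binary classification": 58.54,
}


def teams_using(percent, n_teams=N_TEAMS):
    """Convert a usage percentage into the nearest whole number of teams."""
    return round(percent / 100 * n_teams)


team_counts = {name: teams_using(p) for name, p in usage_percent.items()}
for name, count in team_counts.items():
    print(f"{name}: about {count} of {N_TEAMS} teams")
```

Read this way, signal processing corresponds to roughly 39 of the 41 teams and attention to roughly 10.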
Figure 3. Box-plots of score distributions of data preprocessing techniques.
Figure 4. Box-plots of score distributions of feature engineering techniques.
Figure 5. Box-plots of score distributions of machine-learning models.
Figure 6. Box-plots of score distributions of training strategy techniques.
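Two of the training-strategy techniques named in this record, multi-binary classification and model ensembling, can be illustrated together. The sketch below is not taken from any Challenge 2020 solution. It assumes that "multi-binary classification" means treating each diagnosis as an independent yes/no decision (one sigmoid output per class), and that an ensemble averages the per-class probabilities of several models. The class names, logits, and 0.5 threshold are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ensemble_probabilities(logits_per_model):
    """Average the per-class sigmoid probabilities across models."""
    n_models = len(logits_per_model)
    n_classes = len(logits_per_model[0])
    return [
        sum(sigmoid(model[c]) for model in logits_per_model) / n_models
        for c in range(n_classes)
    ]


def predict_labels(probabilities, class_names, threshold=0.5):
    """Threshold each class independently, so a recording may receive
    zero, one, or several diagnoses."""
    return [name for name, p in zip(class_names, probabilities) if p >= threshold]


# Hypothetical class names and model outputs, for illustration only.
class_names = ["AF", "PVC", "RBBB"]
logits_per_model = [
    [2.0, -1.0, 0.5],  # model A
    [1.0, -2.0, 1.5],  # model B
]

probs = ensemble_probabilities(logits_per_model)
labels = predict_labels(probs, class_names)
print(labels)
```

Because each class is thresholded on its own, the example recording receives two labels at once, which a single softmax output with one winning class could not express.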