Hui Liu¹, Dan Chen¹˒², Da Chen¹, Xiyu Zhang², Huijie Li², Lipan Bian¹, Minglei Shu³, Yinglong Wang⁴.
Abstract
Deep learning approaches have shown great ability in the automatic interpretation of the electrocardiogram (ECG). However, large-scale public 12-lead ECG data are still limited, and their diagnostic labels are not uniform, which widens the semantic gap between these data and clinical practice. In this study, we present a large-scale, multi-label 12-lead ECG database with standardized diagnostic statements. The dataset contains 25770 ECG records from 24666 patients, acquired at Shandong Provincial Hospital (SPH) between August 2019 and August 2020. Record lengths range from 10 to 60 seconds. The diagnostic statements of all ECG records fully comply with the AHA/ACC/HRS recommendations for the standardization and interpretation of the electrocardiogram, and consist of 44 primary statements and 15 modifiers drawn from that standard. In total, 46.04% of the records contain ECG abnormalities, and 14.45% have multiple diagnostic statements. The dataset also includes patient demographics.
Year: 2022 PMID: 35672420 PMCID: PMC9174207 DOI: 10.1038/s41597-022-01403-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Overview of large public 12-lead ECG datasets.
| Name | # ECG records | Length (s) | Standard | # Classes | # Patients | Single-source |
|---|---|---|---|---|---|---|
| CPSC database | 10330 | 6∼60 | None | 23 | 10330 | N |
| INCART database | 74 | 1800 | None | 10 | 32 | Y |
| PTB-XL dataset | 21837 | 10 | SCP-ECG | 71 | 18885 | Y |
| Georgia database | 10344 | 10 | SNOMED-CT | 24 | 10344 | N |
| Shaoxing People's Hospital dataset | 10646 | 10 | None | 11 | 10646 | Y |
| SPH dataset (this work) | 25770 | 10∼60 | AHA | 44 | 24666 | Y |
Fig. 1. Files in the SPH database.
Overview of primary statements and modifiers in the dataset.
| Category | Code | Primary Statement (+ Modifier) | Count |
|---|---|---|---|
| A | 1 | Normal ECG | 13905 |
| C | 21 | Sinus tachycardia | 725 |
| C | 22 | Sinus bradycardia | 2711 |
| C | 23 | Sinus arrhythmia | 1553 |
| D | 30 | Atrial premature complex(es) | 539 |
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| D | 31 | Atrial premature complexes, nonconducted | 4 |
| D | 36 | Junctional premature complex(es) | 64 |
| D | 37 | Junctional escape complex(es) | 20 |
| E | 50 | Atrial fibrillation | 675 |
| + | |||
| + | |||
| E | 51 | Atrial flutter | 99 |
| E | 54 | Junctional tachycardia | 13 |
| F | 60 | Ventricular premature complex(es) | 1067 |
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| + | |||
| H | 80 | Short PR interval | 11 |
| H | 81 | AV conduction ratio N:D | 3 |
| H | 82 | Prolonged PR interval | 238 |
| H | 83 | Second-degree AV block, Mobitz type I (Wenckebach) | 9 |
| H | 84 | Second-degree AV block, Mobitz type II | 3 |
| H | 85 | 2:1 AV block | 35 |
| H | 86 | AV block, varying conduction | 47 |
| H | 87 | AV block, advanced (high-grade) | 3 |
| H | 88 | AV block, complete (third-degree) | 22 |
| I | 101 | Left anterior fascicular block | 154 |
| I | 102 | Left posterior fascicular block | 6 |
| I | 104 | Left bundle-branch block | 84 |
| I | 105 | Incomplete right bundle-branch block | 1259 |
| I | 106 | Right bundle-branch block | 710 |
| I | 108 | Ventricular preexcitation | 27 |
| J | 120 | Right-axis deviation | 161 |
| J | 121 | Left-axis deviation | 138 |
| J | 125 | Low voltage | 322 |
| K | 140 | Left atrial enlargement | 19 |
| K | 142 | Left ventricular hypertrophy | 209 |
| K | 143 | Right ventricular hypertrophy | 6 |
| L | 145 | ST deviation | 1829 |
| + | |||
| + | |||
| L | 146 | ST deviation with T-wave change | 1063 |
| L | 147 | T-wave abnormality | 2218 |
| + | |||
| L | 148 | Prolonged QT interval | 24 |
| L | 152 | TU fusion | 9 |
| L | 153 | ST-T change due to ventricular hypertrophy | 88 |
| L | 155 | Early repolarization | 32 |
| M | 160 | Anterior MI | 52 |
| + | |||
| + | |||
| M | 161 | Inferior MI | 120 |
| + | |||
| + | |||
| + | |||
| M | 165 | Anteroseptal MI | 91 |
| + | |||
| + | |||
| + | |||
| M | 166 | Extensive anterior MI | 7 |
| + | | | |
Overview of ECG categories in the dataset.
| Code | Category | Count |
|---|---|---|
| A | Overall interpretation | 13905 |
| B | Technical conditions | 0 |
| C | Sinus node rhythms and arrhythmias | 4643 |
| D | Supraventricular arrhythmias | 622 |
| E | Supraventricular tachyarrhythmias | 787 |
| F | Ventricular arrhythmias | 1067 |
| G | Ventricular tachyarrhythmias | 0 |
| H | Atrioventricular conduction | 370 |
| I | Intraventricular and intra-atrial conduction | 2195 |
| J | Axis and voltage | 612 |
| K | Chamber hypertrophy or enlargement | 229 |
| L | ST segment, T wave, and U wave | 5125 |
| M | Myocardial infarction | 260 |
| N | Pacemaker | 0 |
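Within the primary-statement table, each category letter covers a contiguous block of codes (for example, C uses 21 to 23 and L uses 145 to 155). The sketch below maps a code to its category by range lookup. The lower bounds are the smallest code listed for each category in this dataset, and treating each range as extending up to the next category's bound is an assumption; codes from categories with no records here (B, G, N) are not modeled and would be misassigned. `category_of` is a hypothetical helper, not part of the dataset release.

```python
from bisect import bisect_right

# Smallest code observed for each category in the primary-statement table.
# ASSUMPTION: a category's codes run from its bound up to the next category's bound.
_LOWER_BOUNDS = [1, 21, 30, 50, 60, 80, 101, 120, 140, 145, 160]
_CATEGORIES = ["A", "C", "D", "E", "F", "H", "I", "J", "K", "L", "M"]


def category_of(code: int) -> str:
    """Return the category letter for a primary-statement code (hypothetical helper)."""
    idx = bisect_right(_LOWER_BOUNDS, code) - 1
    if idx < 0:
        raise ValueError(f"code {code} is below every known category range")
    return _CATEGORIES[idx]


print(category_of(147))  # T-wave abnormality -> "L"
```

Summing statement counts per category does not reproduce the category table: for C, 725 + 2711 + 1553 = 4989, while the table lists 4643. This suggests the category counts are per record, with some records carrying more than one sinus statement.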
Metadata describing each ECG record.
| Field | Type | Description |
|---|---|---|
| ECG_ID | String | Unique identifier for the ECG record |
| AHA_Code | String | Encoded representation of the AHA diagnostic statements (see Fig. 2) |
| Patient_ID | String | Unique identifier for the patient |
| Age | Integer | Age in years (18∼100) |
| Sex | String | Sex ('M': male, 'F': female) |
| N | Integer | Number of sampling points |
| Date | String | Acquisition date |
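The exact `AHA_Code` encoding is defined in Fig. 2, which is not reproduced in this text. The parser below is a sketch that assumes statements are separated by `;` and that modifiers are attached to their primary statement with `+` (for example, the hypothetical string `"22;50+346"`). Verify the separators against Fig. 2 before using it; `parse_aha_code` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_aha_code(aha_code: str) -> List[Tuple[int, List[int]]]:
    """Split an AHA_Code string into (primary statement, [modifiers]) pairs.

    ASSUMPTION: ';' separates statements and '+' attaches modifiers,
    e.g. the hypothetical "22;50+346" -> [(22, []), (50, [346])].
    """
    parsed = []
    for statement in aha_code.split(";"):
        statement = statement.strip()
        if not statement:
            continue  # tolerate empty fields such as a trailing ';'
        primary, *modifiers = statement.split("+")
        parsed.append((int(primary), [int(m) for m in modifiers]))
    return parsed
```

Keeping modifiers attached to their primary statement, rather than flattening all codes into one list, preserves which finding each modifier qualifies.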
Fig. 2. Encoded representation of AHA diagnostic statements.
Overview of patient age.
| Age (years) | [10, 20) | [20, 30) | [30, 40) | [40, 50) | [50, 60) | [60, 70) | [70, 80) | [80, 90) | [90, 100] |
|---|---|---|---|---|---|---|---|---|---|
| #Records | 86 | 2229 | 5145 | 5110 | 5723 | 4441 | 2161 | 822 | 53 |
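The age table uses decade bins that are half-open, except for the last bin, [90, 100], which is closed so that age 100 is counted. The sketch below reproduces this binning and checks that the tabulated counts add up to the 25770 records. `age_bin` and `AGE_COUNTS` are illustrative names, and the counts are copied from the table.

```python
# Record counts per age bin, copied from the table above.
AGE_COUNTS = {
    "[10, 20)": 86, "[20, 30)": 2229, "[30, 40)": 5145, "[40, 50)": 5110,
    "[50, 60)": 5723, "[60, 70)": 4441, "[70, 80)": 2161, "[80, 90)": 822,
    "[90, 100]": 53,
}


def age_bin(age: int) -> str:
    """Map an age in years to the table's bin label (hypothetical helper)."""
    if not 10 <= age <= 100:
        raise ValueError(f"age {age} is outside the tabulated range [10, 100]")
    lo = min(age // 10 * 10, 90)  # age 100 falls into the closed last bin
    return "[90, 100]" if lo == 90 else f"[{lo}, {lo + 10})"


assert sum(AGE_COUNTS.values()) == 25770  # equals the record count in the abstract
```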
Overview of ECG record length.
| Length (s) | [10, 15) | [15, 20) | [20, 25) | [25, 30) | [30, 35) | [35, 40) | [40, 45) | [45, 50) | [50, 55) | [55, 60] |
|---|---|---|---|---|---|---|---|---|---|---|
| #Records | 24242 | 1141 | 257 | 71 | 26 | 15 | 11 | 3 | 3 | 1 |
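The metadata stores record length as `N`, the number of sampling points, so converting `N` to seconds requires the sampling rate. That rate is not stated in this section. The sketch below assumes 500 Hz purely for illustration and bins the result into the 5-second intervals used in the table, with a closed last bin. `length_bin` is a hypothetical helper.

```python
# ASSUMPTION: the sampling rate is not given in this section; 500 Hz is used
# for illustration only. Replace it with the rate documented for the dataset.
SAMPLING_RATE_HZ = 500.0

# Record counts per length bin (seconds), copied from the table above.
LENGTH_COUNTS = {
    "[10, 15)": 24242, "[15, 20)": 1141, "[20, 25)": 257, "[25, 30)": 71,
    "[30, 35)": 26, "[35, 40)": 15, "[40, 45)": 11, "[45, 50)": 3,
    "[50, 55)": 3, "[55, 60]": 1,
}


def length_bin(n_samples: int, fs: float = SAMPLING_RATE_HZ) -> str:
    """Convert the metadata field N (sampling points) to seconds and bin it."""
    seconds = n_samples / fs
    if not 10 <= seconds <= 60:
        raise ValueError(f"{seconds:.2f} s is outside the tabulated range [10, 60]")
    lo = min(int(seconds // 5) * 5, 55)  # exactly 60 s falls into the closed last bin
    return "[55, 60]" if lo == 55 else f"[{lo}, {lo + 5})"


assert sum(LENGTH_COUNTS.values()) == 25770
```

Under the assumed 500 Hz rate, a 10-second record has N = 5000. About 94% of the records (24242 of 25770) fall into the shortest bin.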
Overview of number of statements per ECG record.
| #Statements | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| #Records | 22046 | 2936 | 665 | 109 | 12 | 2 |
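The two percentages quoted in the abstract can be recomputed from this table and from the Normal ECG count (13905) in the primary-statement table. The sketch assumes that "abnormal" means "not labeled Normal ECG"; this section does not state that definition explicitly.

```python
# Counts copied from the statements-per-record table and the primary-statement table.
STATEMENTS_PER_RECORD = {1: 22046, 2: 2936, 3: 665, 4: 109, 5: 12, 6: 2}
NORMAL_ECG_RECORDS = 13905  # code 1, "Normal ECG"

total = sum(STATEMENTS_PER_RECORD.values())
multi = total - STATEMENTS_PER_RECORD[1]  # records with two or more statements
multi_pct = 100 * multi / total
# ASSUMPTION: every record not labeled "Normal ECG" counts as abnormal.
abnormal_pct = 100 * (total - NORMAL_ECG_RECORDS) / total

print(f"{total} records, {multi_pct:.2f}% multi-label, {abnormal_pct:.2f}% abnormal")
# -> 25770 records, 14.45% multi-label, 46.04% abnormal
```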
Overview of number of ECG records per patient.
| #Records | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| #Patients | 23600 | 1033 | 29 | 3 | 1 |
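This table implies that 1066 patients (1033 + 29 + 3 + 1) contribute more than one record. The sketch below checks that the table reproduces both totals (24666 patients, 25770 records). It then shows one common technique, not taken from the paper, for keeping all of a patient's records on the same side of a train/test split: hash `Patient_ID` deterministically. `split_for_patient` and the ID `"P0001"` are hypothetical.

```python
import hashlib

# Counts copied from the records-per-patient table.
RECORDS_PER_PATIENT = {1: 23600, 2: 1033, 3: 29, 4: 3, 5: 1}

patients = sum(RECORDS_PER_PATIENT.values())                   # 24666
records = sum(k * n for k, n in RECORDS_PER_PATIENT.items())   # 25770


def split_for_patient(patient_id: str, test_fraction: float = 0.2) -> str:
    """Assign a patient (and hence all of their records) to 'train' or 'test'.

    The SHA-256 digest gives a stable pseudo-random number in [0, 1), so the
    assignment is reproducible across runs and machines.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"
```

Splitting on `ECG_ID` instead could place two recordings of the same patient in different splits, which may inflate test scores.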
Fig. 3. Distribution of basSQI across all ECG records.
Fig. 4. Distribution of pSQI across all ECG records.
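The captions name basSQI and pSQI without defining them here. The sketch below assumes the definitions commonly used in the ECG signal-quality literature, where P(f) is the power spectrum of one lead: basSQI = 1 − ∫₀¹ P / ∫₀⁴⁰ P (relative power outside the baseline band) and pSQI = ∫₅¹⁵ P / ∫₅⁴⁰ P (relative power in the QRS band). The paper's band edges, spectral estimator, and preprocessing may differ. This sketch uses a plain FFT periodogram rather than, for example, Welch's method, and removes the mean before computing the spectrum; both are choices made for this sketch.

```python
import numpy as np


def _spectrum(x, fs):
    """Periodogram of one lead after removing its mean (DC offset)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return freqs, psd


def _band_power(freqs, psd, lo, hi):
    """Summed power for lo <= f <= hi; with uniform bins this stands in for the integral."""
    mask = (freqs >= lo) & (freqs <= hi)
    return float(psd[mask].sum())


def _ratio(num, den):
    if den == 0.0:
        raise ValueError("no power in the reference band; SQI is undefined")
    return num / den


def bas_sqi(x, fs):
    """basSQI = 1 - P(0-1 Hz) / P(0-40 Hz); low values indicate strong baseline wander."""
    freqs, psd = _spectrum(x, fs)
    return 1.0 - _ratio(_band_power(freqs, psd, 0.0, 1.0), _band_power(freqs, psd, 0.0, 40.0))


def p_sqi(x, fs):
    """pSQI = P(5-15 Hz) / P(5-40 Hz): share of power in the assumed QRS band."""
    freqs, psd = _spectrum(x, fs)
    return _ratio(_band_power(freqs, psd, 5.0, 15.0), _band_power(freqs, psd, 5.0, 40.0))
```

For a 12-lead record, one would apply these per lead and then aggregate (for example, take the minimum across leads); the aggregation rule is also an assumption.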
Fig. 5. ECG signals with the lowest basSQI values. Each segment is 10 seconds long.
Fig. 6. Co-occurrence matrix of primary statements. Each diagonal element gives the number of records labeled with that statement alone.
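The Fig. 6 caption gives the diagonal an unusual meaning: records labeled *only* with that statement, not all records containing it. The sketch below builds such a matrix from per-record statement lists. The caption does not define the off-diagonal cells, so counting records that contain both statements is an assumption. Counts are stored for the upper triangle only, keyed by `(a, b)` with `a <= b`. `cooccurrence` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def cooccurrence(records: Iterable[List[int]]) -> Counter:
    """Upper-triangular co-occurrence counts of primary-statement codes.

    Diagonal (a, a): records whose ONLY primary statement is a (per the caption).
    Off-diagonal (a, b): records containing both a and b (an assumption).
    """
    counts: Counter = Counter()
    for statements in records:
        unique = sorted(set(statements))  # ignore codes repeated within one record
        if len(unique) == 1:
            counts[(unique[0], unique[0])] += 1
        counts.update(combinations(unique, 2))  # each unordered pair once per record
    return counts


m = cooccurrence([[1], [22, 147], [22], [147, 22, 60]])
print(m[(22, 147)], m[(22, 22)])  # -> 2 1
```

With this definition, a statement's diagonal entry plus its row and column entries does not equal its total count, because a record with three statements contributes to three off-diagonal cells.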
Fig. 7. ECG classification tasks from coarse to fine level.
| Field | Value |
|---|---|
| Measurement(s) | 12-lead electrocardiogram |
| Technology Type(s) | EKG or ECG Monitor Device |