Jianwei Zheng, Jianming Zhang, Sidy Danioko, Hai Yao, Hangyuan Guo, Cyril Rakovski.
Abstract
This newly inaugurated research database for 12-lead electrocardiogram signals was created under the auspices of Chapman University and Shaoxing People's Hospital (Shaoxing Hospital Zhejiang University School of Medicine) and aims to enable the scientific community to conduct new studies on arrhythmia and other cardiovascular conditions. Certain types of arrhythmias, such as atrial fibrillation, have a pronounced negative impact on public health, quality of life, and medical expenditures. As a non-invasive test, long-term ECG monitoring is a major and vital diagnostic tool for detecting these conditions. This practice, however, generates large amounts of data, the analysis of which requires considerable time and effort by human experts. Modern machine learning and statistical tools can be trained on large, high-quality data to achieve exceptional levels of automated diagnostic accuracy. Thus, we collected and disseminated this novel database, which contains 12-lead ECGs of 10,646 patients sampled at 500 Hz and features 11 common rhythms and 67 additional cardiovascular conditions, all labeled by professional experts. The dataset consists of 10-second, 12-dimension ECGs and labels for rhythms and other conditions for each subject. It can be used to design, compare, and fine-tune new and classical statistical and machine learning techniques in studies focused on arrhythmia and other cardiovascular conditions.
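The figures in the abstract fix the size of each record. The short sketch below works through that arithmetic. The constant names are illustrative, and the only assumption is that every record holds exactly 10 seconds of all 12 leads, as stated above.

```python
# Values taken from the abstract; the names are illustrative only.
SAMPLING_RATE_HZ = 500   # samples per second, per lead
DURATION_S = 10          # length of each recording in seconds
N_LEADS = 12             # standard 12-lead ECG
N_RECORDS = 10_646       # one record per patient

samples_per_lead = SAMPLING_RATE_HZ * DURATION_S
values_per_record = samples_per_lead * N_LEADS
values_total = values_per_record * N_RECORDS

print(samples_per_lead)    # 5000
print(values_per_record)   # 60000
print(values_total)        # 638760000
```

Each record can therefore be treated as a 5000 × 12 array, which is a convenient fixed input shape for most models.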
Year: 2020 PMID: 32051412 PMCID: PMC7016169 DOI: 10.1038/s41597-020-0386-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 The ECG waveform and segments in lead II for a normal cardiac cycle.
Comparison of 7 ECG databases.
| Name | Subjects | Records (length) | Sampling rate | Age | Male, n(%) | Lead, n |
|---|---|---|---|---|---|---|
| MIT-BIH | 47 | 48 (30 min) | 360 Hz | 23–89 | 25 (52.08) | 2 |
| EDB | 79 | 90 (120 min) | 250 Hz | 30–84 | 70 (88.61) | 2 |
| AHA | N/A | 154 (180 min) | 250 Hz | N/A | N/A | 2 |
| CU | 35 | 35 (8 min) | 250 Hz | N/A | N/A | 2 |
| NSD | 2 | 12 (30 min) | 360 Hz | 51–69 | 1 (50) | 2 |
| St Petersburg DB | 32 | 75 (30 min) | 257 Hz | 18–80 | 17 (53.13) | 12 |
| Proposed database | 10,646 | 10,646 (10 s) | 500 Hz | 4–98 | 5,956 (55.95) | 12 |
Rhythm information and baseline characteristics of participants.
| Acronym Name | Full Name | Frequency, n(%) | Age, Mean ± SD | Male, n(%) |
|---|---|---|---|---|
| SB | Sinus Bradycardia | 3,889 (36.53) | 58.34 ± 13.95 | 2,481 (58.48%) |
| SR | Sinus Rhythm | 1,826 (17.15) | 54.35 ± 16.33 | 1,024 (56.08%) |
| AFIB | Atrial Fibrillation | 1,780 (16.72) | 73.36 ± 11.14 | 1,041 (58.48%) |
| ST | Sinus Tachycardia | 1,568 (14.73) | 54.57 ± 21.06 | 799 (50.96%) |
| AF | Atrial Flutter | 445 (4.18) | 71.07 ± 13.5 | 257 (57.75%) |
| SI | Sinus Irregularity | 399 (3.75) | 34.75 ± 23.03 | 223 (55.89%) |
| SVT | Supraventricular Tachycardia | 587 (5.51) | 55.62 ± 18.53 | 308 (52.47%) |
| AT | Atrial Tachycardia | 121 (1.14) | 65.72 ± 19.3 | 64 (52.89%) |
| AVNRT | Atrioventricular Node Reentrant Tachycardia | 16 (0.15) | 57.88 ± 17.34 | 12 (75%) |
| AVRT | Atrioventricular Reentrant Tachycardia | 8 (0.07) | 57.5 ± 16.84 | 5 (62.5%) |
| SAAWR | Sinus Atrium to Atrial Wandering Rhythm | 7 (0.07) | 51.14 ± 31.83 | 6 (85.71%) |
| All | All | 10,646 (100) | 59.19 ± 18.03 | 5,956 (55.95%) |
Fig. 2 An ECG containing both low and high frequency noise.
Fig. 3 An ECG after noise reduction.
Fig. 4 An ECG containing baseline wandering.
Fig. 5 An ECG after removing baseline wandering.
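The captions above do not say how the baseline wandering was removed. As a rough illustration of the idea only, the sketch below subtracts a centered moving average from a synthetic signal. This is a simple stand-in technique chosen for the example, not the authors' method, and `remove_baseline` and `half_width` are hypothetical names.

```python
def remove_baseline(signal, half_width):
    """Subtract a centered moving average from each sample.

    Near the edges the window shrinks symmetrically, so it is always
    centered on the sample being corrected.
    """
    n = len(signal)
    corrected = []
    for i in range(n):
        h = min(half_width, i, n - 1 - i)
        window = signal[i - h : i + h + 1]
        corrected.append(signal[i] - sum(window) / len(window))
    return corrected

# Synthetic example (not dataset values): 2 s of a slow linear drift at 500 Hz.
fs = 500
t = [i / fs for i in range(2 * fs)]
drift = [0.5 * x for x in t]

flattened = remove_baseline(drift, half_width=100)
residual = max(abs(v) for v in flattened)
print(residual < 1e-9)
```

A symmetric window leaves a linear drift with essentially zero residual. On a real ECG the window has to be wide enough that it does not flatten the QRS complexes themselves.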
Attributes in diagnosis file.
| Attributes | Type | Value Range | Description |
|---|---|---|---|
| FileName | String | | ECG data file name (unique ID) |
| Rhythm | String | | Rhythm label |
| Beat | String | | Other conditions label |
| PatientAge | Numeric | 0–999 | Age |
| Gender | String | MALE/FEMALE | Gender |
| VentricularRate | Numeric | 0–999 | Ventricular rate in BPM |
| AtrialRate | Numeric | 0–999 | Atrial rate in BPM |
| QRSDuration | Numeric | 0–999 | QRS duration in msec |
| QTInterval | Numeric | 0–999 | QT interval in msec |
| QTCorrected | Numeric | 0–999 | Corrected QT interval in msec |
| RAxis | Numeric | −179~180 | R axis |
| TAxis | Numeric | −179~180 | T axis |
| QRSCount | Numeric | 0–254 | QRS count |
| QOnset | Numeric | 16 Bit Unsigned | Q onset (In samples) |
| QOffset | Numeric | 16 Bit Unsigned | Q offset (In samples) |
| TOffset | Numeric | 16 Bit Unsigned | T offset (In samples) |
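The attribute table lends itself to a simple sanity check when reading the diagnosis file. The sketch below assumes the file is available as CSV with the column names listed above. The two rows are invented for illustration and are not taken from the dataset, and `out_of_range` is a hypothetical helper.

```python
import csv
import io

# Documented ranges for a subset of the numeric attributes in the table.
NUMERIC_RANGES = {
    "PatientAge": (0, 999),
    "VentricularRate": (0, 999),
    "AtrialRate": (0, 999),
    "QRSDuration": (0, 999),
    "QTInterval": (0, 999),
    "QTCorrected": (0, 999),
    "QRSCount": (0, 254),
}

def out_of_range(row):
    """Return the names of numeric attributes outside their documented range."""
    bad = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = float(row[name])
        if not low <= value <= high:
            bad.append(name)
    return bad

# Hypothetical rows, invented for this example.
sample = io.StringIO(
    "FileName,Rhythm,Beat,PatientAge,VentricularRate,AtrialRate,"
    "QRSDuration,QTInterval,QTCorrected,QRSCount\n"
    "example_0001,SB,NONE,61,52,52,92,432,401,9\n"
    "example_0002,SR,NONE,47,71,71,88,390,412,300\n"
)
rows = list(csv.DictReader(sample))
problems = {row["FileName"]: out_of_range(row) for row in rows}
print(problems)  # {'example_0001': [], 'example_0002': ['QRSCount']}
```

To run the same check on a real file, replace the `io.StringIO` object with an opened file handle.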
Acronyms and full names of the additional cardiovascular conditions.
| Acronym Name | Full Name |
|---|---|
| 1AVB | 1st degree atrioventricular block |
| 2AVB | 2nd degree atrioventricular block |
| 2AVB1 | 2nd degree atrioventricular block (Type one) |
| 2AVB2 | 2nd degree atrioventricular block (Type two) |
| 3AVB | 3rd degree atrioventricular block |
| ABI | atrial bigeminy |
| ALS | Axis left shift |
| APB | atrial premature beats |
| AQW | abnormal Q wave |
| ARS | Axis right shift |
| AVB | atrioventricular block |
| CCR | counterclockwise rotation |
| CR | clockwise rotation |
| ERV | Early repolarization of the ventricles |
| FQRS | fQRS Wave |
| IDC | Interior differences conduction |
| IVB | Intraventricular block |
| JEB | junctional escape beat |
| JPS | J point shift |
| JPT | junctional premature beat |
| LBBB | left bundle branch block |
| LBBBB | left back bundle branch block |
| LFBBB | left front bundle branch block |
| LRRI | Long RR interval |
| LVH | left ventricle hypertrophy |
| LVHV | left ventricle high voltage |
| LVQRSAL | lower voltage QRS in all lead |
| LVQRSCL | lower voltage QRS in chest lead |
| LVQRSLL | lower voltage QRS in limb lead |
| MI | myocardial infarction |
| MIBW | myocardial infarction in the back wall |
| MIFW | myocardial infarction in the front wall |
| MILW | myocardial infarction in the lower wall |
| MISW | myocardial infarction in the side wall |
| PRIE | PR interval extension |
| PWC | P wave Change |
| QTIE | QT interval extension |
| RAH | right atrial hypertrophy |
| RAHV | right atrial high voltage |
| RBBB | right bundle branch block |
| RVH | right ventricle hypertrophy |
| STDD | ST drop down |
| STE | ST extension |
| STTC | ST-T Change |
| STTU | ST tilt up |
| TWC | T wave Change |
| TWO | T wave opposite |
| UW | U wave |
| VB | ventricular bigeminy |
| VEB | ventricular escape beat |
| VFW | ventricular fusion wave |
| VPB | ventricular premature beat |
| VPE | ventricular preexcitation |
| VET | ventricular escape trigeminy |
| WAVN | Wandering in the atrioventricular node |
| WPW | WPW |
Performance report of gradient boosting tree model.
| Rhythm group | F1-score | Precision | Recall |
|---|---|---|---|
| AFIB | 0.941 | 0.938 | 0.944 |
| GSVT | 0.949 | 0.953 | 0.944 |
| SB | 0.993 | 0.990 | 0.996 |
| SR | 0.977 | 0.982 | 0.972 |
| macro avg | 0.965 | 0.966 | 0.964 |
| micro avg | 0.970 | 0.970 | 0.970 |
| weighted avg | 0.970 | 0.971 | 0.970 |
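The F1-scores and macro averages in the table follow from the per-class precision and recall. The sketch below recomputes them from the rounded values shown above, so agreement is only expected to within rounding in the third decimal place. The variable names are illustrative.

```python
# (precision, recall) per rhythm group, copied from the table above.
per_class = {
    "AFIB": (0.938, 0.944),
    "GSVT": (0.953, 0.944),
    "SB": (0.990, 0.996),
    "SR": (0.982, 0.972),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_scores = {name: f1(p, r) for name, (p, r) in per_class.items()}

# Macro averaging: the unweighted mean over classes.
n = len(per_class)
macro_precision = sum(p for p, _ in per_class.values()) / n
macro_recall = sum(r for _, r in per_class.values()) / n
macro_f1 = sum(f1_scores.values()) / n

for name, score in f1_scores.items():
    print(f"{name}: F1 = {score:.4f}")
print(f"macro: P = {macro_precision:.4f}, R = {macro_recall:.4f}, F1 = {macro_f1:.4f}")
```

Macro averaging gives each rhythm group equal weight regardless of size, which is why it sits slightly below the weighted average here: the largest group (SB) is also the best classified.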
Fig. 6 The common process of ECG analysis.
Data quantities after merging classes.
| Merged from | Merged to | Total | Training data size (80%) | Testing data size (20%) |
|---|---|---|---|---|
| AFIB, AF | AFIB | 2,225 | 1,780 | 445 |
| SVT, AT, SAAWR, ST, AVNRT, AVRT | GSVT | 2,307 | 1,846 | 461 |
| SB | SB | 3,889 | 3,111 | 778 |
| SR, SI | SR | 2,225 | 1,780 | 445 |
| All | All | 10,646 | 8,517 | 2,129 |
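The merge in the table amounts to a lookup from the 11 original rhythm labels to 4 groups, followed by an 80/20 split. The sketch below uses toy labels rather than the dataset. The per-class (stratified) split is an assumption suggested by the per-group training and testing sizes, and `stratified_split` is a hypothetical helper, not the authors' code.

```python
import random
from collections import Counter, defaultdict

# Original rhythm label -> merged group, as listed in the table above.
MERGE_MAP = {
    "AFIB": "AFIB", "AF": "AFIB",
    "SVT": "GSVT", "AT": "GSVT", "SAAWR": "GSVT",
    "ST": "GSVT", "AVNRT": "GSVT", "AVRT": "GSVT",
    "SB": "SB",
    "SR": "SR", "SI": "SR",
}

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), holding out about test_fraction of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Toy labels, invented for this example.
original = ["AF"] * 5 + ["AFIB"] * 5 + ["SI"] * 4 + ["SR"] * 6 + ["ST"] * 10 + ["SB"] * 20
merged = [MERGE_MAP[label] for label in original]
train_idx, test_idx = stratified_split(merged)

print(Counter(merged))
print(Counter(merged[i] for i in test_idx))
```

Splitting within each class keeps the group proportions the same in the training and testing sets, which matters when the groups are as unbalanced as they are here.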
| Measurement(s) | cardiac arrhythmia |
| Technology Type(s) | 12 lead electrocardiography • digital curation |
| Factor Type(s) | sex • experimental condition • age group |
| Sample Characteristic - Organism | Homo sapiens |