Vajira Thambawita, Jonas L Isaksen, Michael A Riegler, Jørgen K Kanters, Steven A Hicks, Jonas Ghouse, Gustav Ahlberg, Allan Linneberg, Niels Grarup, Christina Ellervik, Morten Salling Olesen, Torben Hansen, Claus Graff, Niels-Henrik Holstein-Rathlou, Inga Strümke, Hugo L Hammer, Mary M Maleckar, Pål Halvorsen.
Abstract
Recent global developments underscore the prominent role big data have in modern medical science. Privacy issues, however, are a prevalent obstacle to collecting and sharing data between researchers. Synthetic data that are generated to represent real data, carrying similar information and a similar distribution, may alleviate the privacy issue. In this study, we present generative adversarial networks (GANs) capable of generating realistic synthetic DeepFake 10-s 12-lead electrocardiograms (ECGs). We have developed and compared two methods, named WaveGAN* and Pulse2Pulse. We trained the GANs with 7,233 real normal ECGs to produce 121,977 DeepFake normal ECGs. By verifying the ECGs using a commercial ECG interpretation program (MUSE 12SL, GE Healthcare), we demonstrate that the Pulse2Pulse GAN was superior to WaveGAN* in producing realistic ECGs. ECG intervals and amplitudes were similar between the DeepFake and real ECGs. Although these synthetic ECGs mimic the dataset used for creation, the ECGs are not linked to any individuals and may thus be used freely. The synthetic dataset will be available as open access for researchers at OSF.io, and the DeepFake generator will be available at the Python Package Index (PyPI) for generating synthetic ECGs. In conclusion, we were able to generate realistic synthetic ECGs using generative adversarial neural networks on normal ECGs from two population studies, thereby addressing the relevant privacy issues in medical datasets.
Year: 2021 PMID: 34753975 PMCID: PMC8578227 DOI: 10.1038/s41598-021-01295-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Quantitative difference between WaveGAN* and Pulse2Pulse GAN in the initial training for determining the optimal network and optimal number of epochs.
| Checkpoint (epochs) | WaveGAN*: DeepFake ECGs classified as Normal (%) | Pulse2Pulse: DeepFake ECGs classified as Normal (%) |
|---|---|---|
| 500 | 20.9 | 78.7 |
| 1000 | 69.5 | 81.2 |
| 1500 | 71.2 | 78.8 |
| 2000 | 79.7 | |
| 2500 | 71.3 | |
| 3000 | 65.3 | 81.5 |
The best values are bolded for each GAN.
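The metric in this table is a plain proportion: the share of generated ECGs that the interpretation program labels as normal. A minimal sketch of that arithmetic follows, using the counts reported elsewhere in this record for the final Pulse2Pulse dataset (121,977 normal out of 150,000 generated). The function name `percent_normal` is hypothetical, and the classification step itself (MUSE 12SL) is not reproduced here.

```python
def percent_normal(n_normal: int, n_total: int) -> float:
    """Return the percentage of generated ECGs classified as normal."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_normal <= n_total:
        raise ValueError("n_normal must lie between 0 and n_total")
    return 100.0 * n_normal / n_total


# Counts taken from this record: 121,977 normal DeepFakes out of 150,000 generated.
final_pct = percent_normal(121_977, 150_000)
print(f"{final_pct:.1f}% of the generated ECGs were classified as normal")
```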
Figure 1. Comparison of examples of a real ECG (left lane) and a DeepFake ECG (right lane). See supplementary Figure S1 for 20 more randomly chosen pairs of real and DeepFake ECGs.
Figure 2. Distribution of heart rates in all 150,000 DeepFake electrocardiograms. Blue fill denotes heart rates within the normal range (60–99 BPM); red fill denotes heart rates outside it.
Figure 3. Scatter plot of the QT/RR interval relationship, with real ECGs shown in blue and normal DeepFake ECGs in red. DeepFake dots are nudged 1 ms to the left for visibility. Note that there are 121,977 normal DeepFakes but only 7,233 real ECGs, which makes the DeepFake distribution more pronounced. As the correlation coefficients (r²) indicate, the real and DeepFake ECGs are similarly distributed.
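For readers who want to reproduce this kind of comparison, the squared Pearson correlation between RR and QT intervals can be computed as sketched below. This is an illustrative sketch only: the function name `r_squared` is hypothetical, the interval values are made-up placeholders rather than data from the study, and the source does not state how its r² values were calculated.

```python
from math import fsum


def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = fsum(x) / len(x)
    mean_y = fsum(y) / len(y)
    sxy = fsum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = fsum((a - mean_x) ** 2 for a in x)
    syy = fsum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


# Placeholder values in milliseconds, NOT taken from the study.
rr_ms = [800, 850, 900, 950, 1000]
qt_ms = [380, 388, 395, 404, 410]
print(f"r^2 = {r_squared(rr_ms, qt_ms):.3f}")
```

Computing the same statistic separately for the real and the DeepFake sets would give the two values that the caption compares.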
Figure 4. An ECG complex with the nomenclature of intervals (QT, QRS, P duration), amplitudes (STJ, R, T), and the RR interval, which can be converted to heart rate (HR) as HR = 60/RR interval, with the RR interval in seconds and HR in beats per minute.
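The conversion HR = 60/RR yields beats per minute only when the RR interval is expressed in seconds; for an RR interval in milliseconds the numerator becomes 60,000. A small sketch follows, with hypothetical function names, that also applies the 60–99 normal heart rate range quoted for Figure 2.

```python
def heart_rate_from_rr_seconds(rr_seconds: float) -> float:
    """Convert an RR interval in seconds to heart rate in beats per minute."""
    if rr_seconds <= 0:
        raise ValueError("RR interval must be positive")
    return 60.0 / rr_seconds


def heart_rate_from_rr_ms(rr_ms: float) -> float:
    """Convert an RR interval in milliseconds to heart rate in beats per minute."""
    return heart_rate_from_rr_seconds(rr_ms / 1000.0)


def is_normal_heart_rate(hr_bpm: float, low: float = 60.0, high: float = 99.0) -> bool:
    """Check a heart rate against the 60-99 range given in the Figure 2 caption."""
    return low <= hr_bpm <= high


hr = heart_rate_from_rr_ms(800)  # an 800 ms RR interval
print(hr, is_normal_heart_rate(hr))
```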
Mean, standard deviation (std), and 2.5th and 97.5th percentiles for standard ECG parameters in real and fake ECGs.
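| Parameter | Unit | Real normal (n = 7,233): mean | Real normal: std | Real normal: 2.5% | Real normal: 97.5% | Pulse2Pulse normal (n = 121,977): mean | Pulse2Pulse normal: std | Pulse2Pulse normal: 2.5% | Pulse2Pulse normal: 97.5% | Pulse2Pulse all (n = 150,000): mean | Pulse2Pulse all: std | Pulse2Pulse all: 2.5% | Pulse2Pulse all: 97.5% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|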
| Heart rate | BPM | 70 | 8 | 60 | 90 | 70 | 7 | 60 | 88 | 70 | 8 | 60 | 89 |
| P duration | ms | 105 | 12 | 82 | 130 | 117 | 17 | 86 | 152 | 118 | 17 | 84 | 152 |
| QT interval | ms | 395 | 21 | 352 | 436 | 395 | 20 | 354 | 436 | 395 | 22 | 352 | 436 |
| QRS duration | ms | 90 | 9 | 74 | 110 | 92 | 9 | 78 | 112 | 93 | 10 | 78 | 114 |
| PR interval | ms | 156 | 19 | 120 | 198 | 158 | 17 | 126 | 192 | 159 | 19 | 124 | 194 |
| STJ amplitude (V5) | µV | 2 | 27 | −44 | 58 | 18 | 33 | −44 | 87 | 16 | 36 | −54 | 87 |
| R amplitude (V5) | µV | 1287 | 402 | 600 | 2163 | 1275 | 367 | 620 | 2026 | 1273 | 402 | 566 | 2094 |
| T amplitude (V5) | µV | 343 | 137 | 126 | 664 | 366 | 135 | 156 | 668 | 361 | 141 | 141 | 673 |
BPM: beats per minute.
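The four summary statistics in the table can be computed for any ECG parameter as sketched below. This is a minimal sketch under stated assumptions: the function name `summarize` is hypothetical, the sample standard deviation (n − 1 denominator) and linear-interpolation percentiles are assumed because the source does not specify either, and the example values are placeholders rather than study data.

```python
import statistics


def summarize(values):
    """Return mean, sample std, and the 2.5th and 97.5th percentiles of `values`."""
    data = sorted(values)
    # n=40 yields cut points every 2.5%; the first is the 2.5th percentile
    # and the last is the 97.5th percentile.
    cuts = statistics.quantiles(data, n=40, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # assumption: sample std (n - 1)
        "p2.5": cuts[0],
        "p97.5": cuts[-1],
    }


# Placeholder heart rates in BPM, NOT taken from the study.
example_heart_rates = [62, 65, 68, 70, 71, 74, 79, 88]
print(summarize(example_heart_rates))
```

Applying the same function to the real ECGs and to each DeepFake subset would produce one column group per dataset, as in the table above.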
Figure 5. Model architectures of the generators and the discriminator used to generate synthetic ECGs. WaveGAN* takes a 1D noise vector of 100 points as input. Pulse2Pulse takes a 2D noise vector of size 8 × 5000 as input, the same size as the output ECG.
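To make the two input shapes concrete, the sketch below draws one noise input for each generator with NumPy. Only the shapes (100 points, and 8 × 5000) come from the caption. The uniform distribution on [−1, 1], the reading of the first dimension as 8 channels, and all variable names are assumptions for illustration; the generators themselves are not reproduced.

```python
import numpy as np

WAVEGAN_NOISE_POINTS = 100   # from the caption: 1D noise vector with 100 points
N_CHANNELS = 8               # from the caption: first dimension of the 8 x 5000 input
N_SAMPLES = 5000             # from the caption: second dimension, same as the output ECG
ECG_SECONDS = 10             # from the abstract: 10-s ECGs

rng = np.random.default_rng(seed=0)

# Assumption: uniform noise in [-1, 1]; the source does not state the distribution.
wavegan_noise = rng.uniform(-1.0, 1.0, size=WAVEGAN_NOISE_POINTS)
pulse2pulse_noise = rng.uniform(-1.0, 1.0, size=(N_CHANNELS, N_SAMPLES))

# 5000 samples spanning 10 s imply this sampling rate for the output ECG.
implied_rate_hz = N_SAMPLES / ECG_SECONDS

print(wavegan_noise.shape, pulse2pulse_noise.shape, implied_rate_hz)
```

The practical difference is that the Pulse2Pulse input already has the dimensions of the ECG it is mapped to, whereas the WaveGAN* input must be expanded from 100 points to the full output size inside the generator.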