Yang Yang, Xing-Ming Guo, Hui Wang, Yi-Neng Zheng.
Abstract
The aggravation of left ventricular diastolic dysfunction (LVDD) can lead to ventricular remodeling, wall stiffness, reduced compliance, and progression to heart failure with a preserved ejection fraction. This paper presents a non-invasive method based on convolutional neural networks (CNN) and heart sounds (HS) for the early diagnosis of LVDD. A data augmentation (DA) method based on a deep convolutional generative adversarial network (DCGAN) was proposed to expand an HS database of LVDD for model training. Firstly, the HS signals were preprocessed using an improved wavelet denoising method. Secondly, a logistic-regression-based hidden semi-Markov model was used to segment the HS signals, which were then converted into spectrograms for DA using the short-time Fourier transform (STFT). Finally, the proposed method was compared with VGG-16, VGG-19, ResNet-18, ResNet-50, DenseNet-121, and AlexNet in terms of performance for LVDD diagnosis. The proposed method achieved an accuracy of 0.987, a sensitivity of 0.986, and a specificity of 0.988, which demonstrates the effectiveness of HS analysis for the early diagnosis of LVDD and shows that the DCGAN-based DA method can effectively augment HS data.
Keywords: convolutional neural network; deep convolutional generative adversarial networks; diagnosis; heart sounds; left ventricular diastolic dysfunction
Year: 2021 PMID: 34943586 PMCID: PMC8699866 DOI: 10.3390/diagnostics11122349
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
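The preprocessing step in the abstract relies on wavelet thresholding. As a minimal illustration (not the paper's "improved" variant, whose details are not given here), a NumPy sketch of soft thresholding with Donoho's universal threshold:

```python
import numpy as np

def soft_threshold(coeffs, thresh):
    # Soft-thresholding rule: shrink each coefficient toward zero and
    # zero out those whose magnitude falls below the threshold.
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresh, 0.0)

def universal_threshold(detail_coeffs):
    # Donoho's universal threshold sigma * sqrt(2 ln N), with the noise
    # level sigma estimated from the median absolute deviation.
    sigma = np.median(np.abs(detail_coeffs)) / 0.6745
    return sigma * np.sqrt(2.0 * np.log(detail_coeffs.size))
```

In a full denoising pipeline these rules would be applied to the detail coefficients of a multi-level wavelet decomposition before reconstruction.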
Figure 1. A flow diagram of this paper. The CNN is the proposed model, and the others are the compared models.
Doppler echocardiography indices of the two groups (mean ± standard deviation).
| Index | LVDD Group | Control Group | p-Value |
|---|---|---|---|
| LVEF | 0.45 ± 0.16 | 0.64 ± 0.03 | 0.000 * |
| Peak E-wave velocity (cm/s) | 73.81 ± 26.60 | 63.25 ± 14.07 | 0.034 * |
| Peak A-wave velocity (cm/s) | 72.58 ± 27.48 | 75.53 ± 18.07 | 0.589 |
| Septal e’ velocity (cm/s) | 4.18 ± 1.40 | 6.20 ± 2.09 | 0.000 * |
| Average E/e’ | 18.57 ± 6.55 | 10.66 ± 2.00 | 0.000 * |
| LA volume index (mL/m2) | 40.68 ± 15.49 | 23.52 ± 5.48 | 0.000 * |
| TR velocity (m/s) | 28.74 ± 6.62 | 16.79 ± 7.26 | 0.000 * |
* p-value < 0.05 indicates a statistically significant difference.
Figure 2. Examples of converting HS samples into spectrograms: (A) LVDD group; (B) control group.
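The spectrogram conversion can be sketched with a plain framed-FFT STFT; the window length, hop size, and 2 kHz sampling rate below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def stft_spectrogram(x, win_len=256, hop=128):
    # Hann-windowed short-time Fourier transform log-magnitude (dB),
    # a minimal stand-in for the STFT conversion step.
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T   # (freq_bins, n_frames)
    return 20.0 * np.log10(mag + 1e-10)           # dB scale

# A 1.6 s HS sample at an assumed 2 kHz sampling rate:
sample = np.random.randn(3200)
spec = stft_spectrogram(sample)                    # shape (129, 24)
```

In practice the resulting matrix would be rendered as a color image and resized to the 128 × 128 × 3 input shape of the CNN.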
The number of HS samples in different groups.
| Group | Subjects | Age (Mean ± Standard Deviation) | Samples | Sample Length (s) |
|---|---|---|---|---|
| LVDD group | 30 | Aged 24–89 (66.87 ± 16.21) | 3677 | 1.6 |
| Control group | 41 | Aged 19–81 (58.71 ± 13.19) | 4803 | 1.6 |
The detailed information of the CNN model.
| # | Layer | Filter Size | Stride | Output Dimension | Activation Function |
|---|---|---|---|---|---|
| 1 | Input | — | — | (128, 128, 3) | — |
| 2 | Conv1 | 3 × 3 | 1 | (128, 128, 64) | ReLU |
| 3 | Maxpool1 | 2 × 2 | 2 | (64, 64, 64) | — |
| 4 | Conv2 | 3 × 3 | 1 | (64, 64, 32) | ReLU |
| 5 | Maxpool2 | 2 × 2 | 2 | (32, 32, 32) | — |
| 6 | Conv3 | 3 × 3 | 1 | (32, 32, 16) | ReLU |
| 7 | Maxpool3 | 2 × 2 | 2 | (16, 16, 16) | — |
| 8 | FC1 | — | — | (128, 1) | — |
| 9 | FC2 | — | — | (100, 1) | dropout = 0.5 |
| 10 | Output | — | — | (0, 1) | softmax |
Conv = convolutional layer; Maxpool = max-pooling layer; FC = fully connected layer.
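The output dimensions in the table are mutually consistent if each 3 × 3 convolution uses stride 1 with "same" padding (keeping the spatial size) and each 2 × 2 max-pool with stride 2 halves it; a quick check:

```python
def cnn_shapes(h=128):
    # Walk the spatial dimensions of the proposed CNN as listed in the
    # table: ConvN keeps the size ("same" padding is an inference from
    # the table), MaxpoolN halves it.
    dims = [(h, h, 3)]                 # input spectrogram image
    for filters in (64, 32, 16):
        dims.append((h, h, filters))   # 3x3 conv, stride 1
        h //= 2                        # 2x2 max-pool, stride 2
        dims.append((h, h, filters))
    return dims
```

The final (16, 16, 16) feature map is then flattened and passed through the two fully connected layers.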
Figure 3. The structure of the GAN model.
The detailed information of the generative model.
| Layer | Filter Size | Stride | Output Dimension | Activation Function | BN |
|---|---|---|---|---|---|
| Input | — | — | (1, 1, 100) | ReLU | Yes |
| Deconv1 | 5 × 5 | 2 | (8, 8, 512) | ReLU | Yes |
| Deconv2 | 5 × 5 | 2 | (16, 16, 256) | ReLU | Yes |
| Deconv3 | 5 × 5 | 2 | (32, 32, 128) | ReLU | Yes |
| Deconv4 | 5 × 5 | 2 | (64, 64, 64) | ReLU | Yes |
| Deconv5 | 5 × 5 | 2 | (128, 128, 3) | ReLU | Yes |
| Output | — | — | (128, 128, 3) | Tanh | No |
Deconv = deconvolutional layer.
The detailed information of the discriminator model.
| Layer | Filter Size | Stride | Output Dimension | Activation Function | BN |
|---|---|---|---|---|---|
| Input | — | — | (128, 128, 3) | Leaky ReLU | No |
| Conv1 | 5 × 5 | 2 | (64, 64, 64) | Leaky ReLU | Yes |
| Conv2 | 5 × 5 | 2 | (32, 32, 128) | Leaky ReLU | Yes |
| Conv3 | 5 × 5 | 2 | (16, 16, 256) | Leaky ReLU | Yes |
| Conv4 | 5 × 5 | 2 | (8, 8, 512) | Leaky ReLU | Yes |
| Conv5 | 5 × 5 | 2 | (4, 4, 1024) | Leaky ReLU | Yes |
| Output | — | — | (0, 1) | sigmoid | Yes |
Conv = convolutional layer.
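The generator and discriminator tables mirror each other. A sketch of the spatial-size progression, assuming the generator first projects the 100-dimensional latent vector to (8, 8, 512) (implied by the jump from (1, 1, 100) in the generator table) and that all stride-2 layers use "same" padding:

```python
def generator_shapes():
    # After the latent projection to (8, 8, 512), each 5x5, stride-2
    # transposed convolution doubles the spatial size.
    dims, h = [(8, 8, 512)], 8
    for ch in (256, 128, 64, 3):
        h *= 2
        dims.append((h, h, ch))
    return dims

def discriminator_shapes():
    # The discriminator runs in reverse: each 5x5, stride-2 convolution
    # halves the spatial size while the channel count grows.
    dims, h = [(128, 128, 3)], 128
    for ch in (64, 128, 256, 512, 1024):
        h //= 2
        dims.append((h, h, ch))
    return dims
```

This symmetric doubling/halving between (8, 8, 512) and (128, 128, 3) is the standard DCGAN design.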
Figure 4. The structure of the DCGAN model.
Figure 5. The loss value of the DCGAN model.
Figure 6. Examples of spectrograms generated by the DCGAN model for the LVDD group with an increasing number of epochs: (A) original image; (B) epoch = 0; (C) epoch = 50; (D) epoch = 100; (E) epoch = 150; (F) epoch = 200; (G) epoch = 250; (H) epoch = 300.
Figure 7. The structure of the CNN model.
Figure 8. The performance of the CNN model on the DCGAN dataset with different coefficients.
Characteristics of the CNNs’ architectures used in this paper.
| Models | Parameters | Epochs |
|---|---|---|
| VGG-16 | 138,357,544 | 500 |
| VGG-19 | 20,483,904 | 500 |
| ResNet-18 | 63,470,656 | 500 |
| ResNet-50 | 46,159,168 | 500 |
| DenseNet-121 | 62,378,344 | 500 |
| AlexNet | 82,378,344 | 500 |
| Proposed CNN | 559,396 | 500 |
The results of different models on the testing set of the RS dataset during 10-fold cross-validation (mean ± standard deviation).
| Models | Acc | Se | Sp | Training Time (mins:secs) |
|---|---|---|---|---|
| VGG-16 | 0.913 ± 0.018 | 0.929 ± 0.018 | 0.892 ± 0.019 | 443:68 |
| VGG-19 | 0.894 ± 0.022 | 0.908 ± 0.021 | 0.876 ± 0.022 | 112:37 |
| ResNet-18 | 0.861 ± 0.021 | 0.883 ± 0.020 | 0.864 ± 0.020 | 251:26 |
| ResNet-50 | 0.883 ± 0.022 | 0.899 ± 0.021 | 0.871 ± 0.020 | 178:13 |
| DenseNet-121 | 0.842 ± 0.019 | 0.856 ± 0.022 | 0.825 ± 0.021 | 224:79 |
| AlexNet | 0.879 ± 0.017 | 0.897 ± 0.019 | 0.857 ± 0.019 | 317:25 |
| Proposed CNN | 0.916 ± 0.015 | 0.932 ± 0.017 | 0.895 ± 0.018 | 65:44 |
RS dataset = 3677 images in the LVDD group and 4803 images in the control group.
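The three evaluation metrics follow the usual confusion-matrix definitions, with LVDD taken as the positive class; a minimal sketch:

```python
def metrics(tp, fn, tn, fp):
    # Accuracy, sensitivity, and specificity as reported in the tables.
    acc = (tp + tn) / (tp + fn + tn + fp)
    se = tp / (tp + fn)      # sensitivity: recall on LVDD samples
    sp = tn / (tn + fp)      # specificity: recall on control samples
    return acc, se, sp
```

For example, 90 true positives, 10 false negatives, 80 true negatives, and 20 false positives give Acc = 0.85, Se = 0.90, Sp = 0.80.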
The results of different models on the testing set of the RS dataset + NG dataset during 10-fold cross-validation (mean ± standard deviation).
| Models | Acc | Se | Sp | Training Time (mins:secs) |
|---|---|---|---|---|
| VGG-16 | 0.949 ± 0.009 | 0.958 ± 0.009 | 0.931 ± 0.010 | 535:25 |
| VGG-19 | 0.928 ± 0.013 | 0.933 ± 0.012 | 0.912 ± 0.013 | 243:14 |
| ResNet-18 | 0.902 ± 0.011 | 0.911 ± 0.010 | 0.901 ± 0.011 | 361:12 |
| ResNet-50 | 0.925 ± 0.012 | 0.934 ± 0.011 | 0.914 ± 0.011 | 302:57 |
| DenseNet-121 | 0.887 ± 0.011 | 0.896 ± 0.012 | 0.867 ± 0.012 | 332:13 |
| AlexNet | 0.919 ± 0.009 | 0.930 ± 0.009 | 0.898 ± 0.010 | 435:79 |
| Proposed CNN | 0.955 ± 0.005 | 0.966 ± 0.007 | 0.947 ± 0.008 | 179:26 |
RS dataset + NG dataset = 22,062 images in the LVDD group and 28,818 images in the control group.
The results of different models on the testing set of the RS dataset + DCGAN dataset during 10-fold cross-validation (mean ± standard deviation).
| Models | Acc | Se | Sp | Training Time (mins:secs) |
|---|---|---|---|---|
| VGG-16 | 0.981 ± 0.003 | 0.978 ± 0.004 | 0.979 ± 0.003 | 494:56 |
| VGG-19 | 0.964 ± 0.004 | 0.956 ± 0.005 | 0.961 ± 0.005 | 194:78 |
| ResNet-18 | 0.957 ± 0.004 | 0.941 ± 0.006 | 0.949 ± 0.005 | 314:34 |
| ResNet-50 | 0.968 ± 0.005 | 0.952 ± 0.006 | 0.951 ± 0.006 | 263:71 |
| DenseNet-121 | 0.936 ± 0.005 | 0.922 ± 0.007 | 0.913 ± 0.006 | 286:45 |
| AlexNet | 0.962 ± 0.004 | 0.959 ± 0.005 | 0.955 ± 0.005 | 381:24 |
| Proposed CNN | 0.987 ± 0.001 | 0.986 ± 0.002 | 0.988 ± 0.002 | 130:13 |
RS dataset + DCGAN dataset = 33,093 images in the LVDD group and 43,227 images in the control group.
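The footnote counts across the three tables are mutually consistent if the NG dataset adds five augmented images per real spectrogram and the DCGAN dataset adds eight; a quick arithmetic check (the multipliers are inferred from the counts, not stated in the text):

```python
# Real-sound (RS) spectrogram counts from the table footnotes.
rs_lvdd, rs_ctrl = 3677, 4803

# RS + NG appears to be 6x the real counts (5 augmented copies each);
# RS + DCGAN appears to be 9x (8 generated copies each).
ng_total = (6 * rs_lvdd, 6 * rs_ctrl)       # (22062, 28818)
dcgan_total = (9 * rs_lvdd, 9 * rs_ctrl)    # (33093, 43227)
```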
Figure 9. The training and validation performance of the CNN model on the RS dataset + DCGAN dataset at 500 epochs.