Jae Hyon Park1, Insun Park2, Kichang Han3, Jongjin Yoon1, Yongsik Sim1, Soo Jin Kim4, Jong Yun Won1, Shina Lee5, Joon Ho Kwon1, Sungmo Moon1, Gyoung Min Kim1, Man-Deuk Kim1. 1. Department of Radiology, Yonsei University College of Medicine, Seoul, Korea. 2. Department of Anesthesiology and Pain Medicine, Seoul National University Bundang Hospital, Seongnam, Korea. 3. Department of Radiology, Yonsei University College of Medicine, Seoul, Korea. wowsaycheese@yuhs.ac. 4. Department of Surgery, Yonsei University College of Medicine, Seoul, Korea. 5. Department of Internal Medicine, College of Medicine, Ewha Womans University, Seoul, Korea.
Abstract
OBJECTIVE: To investigate the feasibility of using a deep learning-based analysis of auscultation data to predict significant stenosis of arteriovenous fistulas (AVF) in patients undergoing hemodialysis requiring percutaneous transluminal angioplasty (PTA). MATERIALS AND METHODS: Forty patients (24 male and 16 female; median age, 62.5 years) with dysfunctional native AVF were prospectively recruited. Digital sounds from the AVF shunt were recorded using a wireless electronic stethoscope before (pre-PTA) and after PTA (post-PTA), and the audio files were subsequently converted to mel spectrograms, which were used to construct various deep convolutional neural network (DCNN) models (DenseNet201, EfficientNetB5, and ResNet50). The performance of these models for diagnosing ≥ 50% AVF stenosis was assessed and compared. The ground truth for the presence of ≥ 50% AVF stenosis was obtained using digital subtraction angiography. Gradient-weighted class activation mapping (Grad-CAM) was used to produce visual explanations for DCNN model decisions. RESULTS: Eighty audio files were obtained from the 40 recruited patients and pooled for the study. Mel spectrograms of "pre-PTA" shunt sounds showed patterns corresponding to abnormal high-pitched bruits with systolic accentuation observed in patients with stenotic AVF. The ResNet50 and EfficientNetB5 models yielded an area under the receiver operating characteristic curve of 0.99 and 0.98, respectively, at optimized epochs for predicting ≥ 50% AVF stenosis. However, Grad-CAM heatmaps revealed that only ResNet50 highlighted areas relevant to AVF stenosis in the mel spectrogram. CONCLUSION: Mel spectrogram-based DCNN models, particularly ResNet50, successfully predicted the presence of significant AVF stenosis requiring PTA in this feasibility study and may potentially be used in AVF surveillance.
Hemodialysis is a major renal replacement therapy for patients with end-stage renal disease (ESRD) that requires a functioning arteriovenous fistula (AVF) or arteriovenous graft [1]. However, over time, vascular thrombosis or stenosis occurs, and the AVF tends to become dysfunctional. Hence, the accurate diagnosis of significant AVF stenosis and timely intervention are crucial for maintaining dialysis access.
The latest 2019 Kidney Disease Outcomes Quality Initiative guidelines [2] recommend, with moderate quality of evidence, screening for AVF stenosis through regular physical examinations, including palpation and auscultation, by a health practitioner. AVF stenosis can be screened via auscultation based on abnormal blood flow, which is referred to as a high-pitched bruit. Auscultation is non-invasive compared to digital subtraction angiography (DSA), which is the current gold standard for assessing vascular stenosis [3], and is simple and convenient compared to Doppler ultrasonography, which requires expensive equipment and skilled operators. However, diagnoses based on sound can be subjective and rely on the practitioner's clinical experience. Moreover, even a trained practitioner cannot quantify stenosis severity based on auscultation alone. Considering that the main indications for percutaneous transluminal angioplasty (PTA) are significant stenosis (≥ 50% of the lumen) or obstruction, it is difficult to accurately assess whether a patient needs PTA [4] based on bruit.
However, the quantification of auscultation and extraction of features to detect the presence of significant stenosis requiring PTA using deep learning can aid health practitioners in screening patients with ESRD who require angioplasty.
Recent technical advances have enabled computer-based approaches, particularly artificial intelligence, to automatically interpret stethoscope-recorded sounds for telemedicine and self-screening [5, 6].
This pilot study aimed to evaluate the feasibility of using deep convolutional neural network (DCNN) models that analyze auscultation-based mel spectrograms to predict hemodynamically significant (≥ 50%) AVF stenosis.
MATERIALS AND METHODS
Study Population
This single-center prospective study was approved by the Institutional Review Board of our hospital (IRB No. 2020-2715-009). Patients with autologous AVFs who were referred for PTA owing to clinical signs of significant stenosis (pulsation, prolonged hemostasis time, and increased circuit pressure) were assessed for eligibility. Informed consent was obtained from all patients. The inclusion and exclusion criteria are summarized in Table 1. Forty patients with ESRD were enrolled in this study from November 2020 to August 2021. Patients’ baseline characteristics were obtained from their electronic medical records. The framework of this study is shown in Figure 1.
Table 1
Inclusion and Exclusion Criteria of the Study Population
Inclusion criteria
1. Native AVF at least 60 days before the procedure that had been used for dialysis for at least 8 of 12 sessions during a 4-week period, ensuring fistula maturity
2. ≥ 50% stenosis documented on fistulogram
Exclusion criteria
1. Thrombosed AVF
2. Age < 18 years
AVF = arteriovenous fistula
Fig. 1
Framework of the proposed DCNN models.
AVF = arteriovenous fistula, DCNN = deep convolutional neural network
Digital Subtraction Angiography (DSA) and Percutaneous Transluminal Angioplasty (PTA)
One board-certified interventional radiologist with seven years of experience performed all DSA and PTA procedures. The fistula was accessed under ultrasound guidance, and a 7F vascular sheath was inserted at an appropriate puncture site along the AVF. The balloon was sized based on the diameter of the adjacent normal segment of the vein. Angioplasty was performed using the manufacturer’s stated burst pressure and was maintained for at least 30 seconds until the waist deformity of the balloon catheter was completely effaced. Procedural endpoint or technical success was defined as less than 30% residual stenosis or restoration of the thrill on palpation. Clinical success was defined as the restoration of blood flow to a level permitting at least one dialysis treatment after PTA. The degree of AVF stenosis was quantified using the open-source software ImageJ (US National Institutes of Health) by measuring the vessel diameter at the most stenotic site and proximal non-stenotic site using DSA images before and after PTA. Each stenosis was independently measured three times, and the values were averaged. Complications were classified as major or minor according to the practice guidelines of the Society of Interventional Radiology [7].
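The diameter-based quantification described above can be sketched as follows. This is a minimal illustration only: it assumes the conventional formula, percent stenosis = (1 − stenotic diameter / reference diameter) × 100, which the text does not state explicitly, and the function names and diameters are hypothetical.

```python
from statistics import mean


def percent_stenosis(d_stenotic_mm, d_reference_mm):
    """Diameter-based percent stenosis (assumed conventional formula)."""
    if d_reference_mm <= 0:
        raise ValueError("reference diameter must be positive")
    return (1.0 - d_stenotic_mm / d_reference_mm) * 100.0


def averaged_stenosis(measurements):
    """Average percent stenosis over repeated (stenotic, reference) pairs."""
    return mean(percent_stenosis(s, r) for s, r in measurements)


# Hypothetical triplicate measurements in mm (not data from the study).
pre_pta = [(2.4, 6.0), (2.5, 6.1), (2.6, 6.2)]
stenosis = averaged_stenosis(pre_pta)
needs_pta = stenosis >= 50.0  # study threshold for significant stenosis
```

Averaging the three percentages, rather than the raw diameters, is one possible reading of "the values were averaged"; either choice gives nearly identical results when the repeated measurements are close.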
Data: Recording of AVF Shunt Sounds
A wireless electronic stethoscope (Stemoscope, Hulu Devices) was used to record shunt sounds by placing it on top of the venous access 1–2 cm distal to the anastomosis site for 10–15 seconds. Shunt sounds were recorded before and after PTA and were labeled “pre-PTA” and “post-PTA,” respectively. A total of 80 AVF shunt sounds (40 pre-PTA and 40 post-PTA) were recorded and saved as .wav audio files. These 80 sounds were pooled for this study. According to DSA findings, all patients had significant (≥ 50%) stenosis before PTA, but none after. Therefore, all pre- and post-PTA sounds were categorized as positive and negative, respectively.
Data: Preprocessing and Feature Extraction
Because the length of each audio file varied due to manual recording, the audio files were trimmed or padded to a length of 6 seconds using the Python library Librosa [8]. In this study, we used a mel spectrogram for feature extraction because it is one of the most widely used methods for audio data representation [9, 10]. To obtain a mel spectrogram, the audio file was first mapped from the time domain to the frequency domain using a short-time Fourier transform with a window length of 25 ms and stride length of 10 ms. The frequency was subsequently converted to a mel scale and amplitude-to-color dimensions using mel filters to generate a mel spectrogram, representing the short-term power spectrum of sound. Each mel spectrogram was again normalized and resized to a resolution of 128 × 128 with three channels, with the x-axis, y-axis, and color representing the time, frequency (Hz), and magnitude of amplitude, respectively.
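The framing arithmetic behind these preprocessing steps can be sketched in plain Python. This is not the Librosa pipeline used in the study: the sampling rate is a hypothetical value (it is not reported in the text), the helper names are illustrative, and the mel conversion shown is the common HTK-style formula, which may differ from the variant the authors' library applied.

```python
import math

SAMPLE_RATE = 4000      # Hz; hypothetical, not reported in the text
CLIP_SECONDS = 6        # fixed clip length used in the study
WINDOW_SECONDS = 0.025  # 25 ms STFT window
HOP_SECONDS = 0.010     # 10 ms stride


def fix_length(samples, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Trim or zero-pad a list of samples to exactly `seconds` long."""
    target = int(sample_rate * seconds)
    return samples[:target] + [0.0] * max(0, target - len(samples))


def stft_frame_count(n_samples, sample_rate=SAMPLE_RATE):
    """Number of full windows that fit without padding the signal edges."""
    win = int(round(WINDOW_SECONDS * sample_rate))
    hop = int(round(HOP_SECONDS * sample_rate))
    return 1 + (n_samples - win) // hop


def hz_to_mel(freq_hz):
    """HTK-style mel scale: equal steps approximate equal perceived pitch."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


clip = fix_length([0.1] * 50000)      # a 12.5 s recording is trimmed to 6 s
frames = stft_frame_count(len(clip))  # time steps before resizing to 128
```

Under these assumptions a 6-second clip yields several hundred time frames, which is why the spectrogram is subsequently resized to 128 × 128 before being passed to the networks.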
Data: Augmentation
The synthetic minority over-sampling technique (SMOTE) algorithm was used to generate synthetic mel spectrograms from existing neighboring mel spectrograms to best represent real-world data that may be obtained in clinical settings [11]. Data augmentation was performed 25 times (Fig. 2).
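The core idea of SMOTE, interpolating between a real sample and one of its nearest neighbors, can be sketched as below. This is a simplified illustration under stated assumptions: the study does not say which implementation was used, the function `smote_like` and the toy vectors are hypothetical, and in practice the interpolation would be applied within each class to flattened spectrograms.

```python
import random


def smote_like(samples, n_new, k=2, seed=0):
    """Create n_new synthetic vectors by interpolating toward near neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other samples by squared Euclidean distance to `base`.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical flattened "spectrograms" (4 pixels each), not study data.
real = [[0.0, 0.1, 0.9, 1.0], [0.1, 0.2, 0.8, 0.9], [0.2, 0.1, 0.7, 1.0]]
fake = smote_like(real, n_new=5)
```

Because every synthetic vector lies on a line segment between two real ones, the generated spectrograms stay within the range of the recorded data rather than introducing new acoustic patterns.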
In this study, widely used convolutional neural network architectures, including DenseNet201 [12], EfficientNetB5 [13], and ResNet50 [14], were used to construct DCNN models for predicting hemodynamically significant AVF stenosis. Two fully connected layers using a rectified linear unit as the activation function were added after the Conv-pool layers with 2048 and 2048 neurons, and two dropout layers (rate = 0.5) were added after the first and second dense layers for regularization and to avoid model overfitting. Finally, for binary classification, a final layer with one neuron was added using the softmax activation function. DCNN models were initialized with ImageNet weights and compiled with categorical cross-entropy as a loss function and a root mean square propagation [15] optimizer with a learning rate of 0.0001. The models were trained with batch sizes of 10 and 50 epochs. Each dataset was randomly divided into training, validation, and test sets using split ratios of 70%, 10%, and 20%, respectively (Fig. 2). Gradient-weighted class activation mapping (Grad-CAM) [16] was used to produce visual explanations for DCNN model decisions.
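The 70%/10%/20% random split mentioned above can be sketched as follows. The function name, the seed, and the dataset size of 200 are hypothetical; the text does not report the seed or whether the split was stratified by class.

```python
import random


def split_indices(n_items, ratios=(0.7, 0.1, 0.2), seed=42):
    """Shuffle item indices and cut them into train/validation/test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * ratios[0])
    n_val = round(n_items * ratios[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(200)  # 200 is a hypothetical size
```

Splitting by index keeps the three sets disjoint, so each spectrogram is used for exactly one of training, validation, or testing.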
Model: Implementation
All code was written and executed on Google Colab (https://colab.research.google.com, n.d.), which provided 12 GB of random access memory and an NVIDIA Tesla K80 graphics processing unit. Python version 3.10.4, along with the Python libraries NumPy, pandas, scikit-learn, TensorFlow, and Keras, was used.
Evaluation: Metrics and Statistical Analyses
The performance of the DCNN models for diagnosing ≥ 50% AVF stenosis was evaluated using the area under the receiver operating characteristics curve (AUROC) and using a confusion matrix, precision (i.e., positive predictive value), accuracy, recall (i.e., hit rate, sensitivity, or true positive rate), and F-1 score by applying a diagnostic cutoff of ≥ 0.5 for the final model output. The ground truth for the presence of ≥ 50% AVF stenosis was based on the results of DSA. Statistical analyses were performed using Google Colab or SAS software version 9.4 (SAS Institute).
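The metrics listed above can be computed from model outputs as in the following sketch. It uses plain Python rather than the scikit-learn routines the authors presumably relied on, and the labels and scores are made-up values for illustration only.

```python
def confusion_counts(y_true, y_score, cutoff=0.5):
    """Return (tp, fp, tn, fn) after thresholding scores at `cutoff`."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_score, cutoff=0.5):
    """Precision, recall, F-1 score, and accuracy at the given cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, cutoff)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def auroc(y_true, y_score):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = stenosis of 50% or more) and model outputs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = classification_metrics(y_true, y_score)
area = auroc(y_true, y_score)
```

Note that AUROC is independent of the 0.5 cutoff, whereas precision, recall, F-1 score, and accuracy all change if the cutoff is moved.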
RESULTS
Study Population Characteristics
The baseline characteristics of patients are summarized in Table 2. The median age of the 40 patients with ESRD was 62.5 years, and the cohort comprised 24 male and 16 female patients. The AVF types were brachial-cephalic (55%), radial-cephalic (40%), and brachial-basilic (5%). More than half (26/40, 65%) of all patients had undergone PTA at least once prior to referral. All patients demonstrated technical and clinical success after PTA. There were no major complications, and one patient developed a hematoma after the procedure.
Table 2
Baseline Patient Characteristics
Age, years* | 62.5 (53.0–69.8)
Sex
  Male | 24 (60)
  Female | 16 (40)
AVF type
  Brachial-cephalic | 22 (55)
  Radial-cephalic | 16 (40)
  Brachial-basilic | 2 (5)
Location of stenosis
  Juxta-anastomotic vein | 21 (53)
  Cephalic arch | 12 (30)
  Cannulation zone | 7 (18)
Previous PTA (+) | 26 (65)
Number (n) of previous PTA
  n ≤ 2 | 28 (70)
  2 < n ≤ 5 | 8 (20)
  5 < n ≤ 8 | 2 (5)
  8 < n | 2 (5)
ESRD etiology
  Diabetic nephropathy | 26 (65)
  Hypertensive nephropathy | 3 (8)
  C1q nephropathy | 2 (5)
  RPGN | 1 (3)
  ADPKD | 1 (3)
  Unknown | 7 (18)
Comorbidities
  Type 2 diabetes mellitus | 26 (65)
  Hypertension | 26 (65)
  Heart failure | 4 (10)
  Hyperlipidemia | 9 (23)
  Coronary artery occlusive disease | 10 (25)
  Peripheral artery occlusive disease | 2 (5)
Current smoker | 12 (30)
Body mass index* | 22.7 (21.0–25.1)
Median time between AVF formation date and PTA (days)* | 839.5 (529.0–1351.5)
Median AVF stenosis (%) before PTA* | 59.1 (49.1–65.2)
Median AVF stenosis (%) after PTA* | 23.4 (15.1–36.4)
Median diameter (mm) of vessel* | 6.8 (5.2–11.2)
Median flow (mL/min)* | 491.4 (246.2–772.9)
Clinical success | 40 (100)
Data are numbers of patients with percentages in parentheses, unless specified otherwise. *Data are presented as median values with the 25th percentile and 75th percentile in parentheses. ADPKD = autosomal dominant polycystic kidney disease, AVF = arteriovenous fistula, ESRD = end-stage renal disease, PTA = percutaneous transluminal angioplasty, RPGN = rapidly progressive glomerulonephritis
Mel Spectrograms of Pre-PTA and Post-PTA Shunt Sounds
Mel spectrograms generated from AVF shunt sounds before (pre-PTA) and after PTA (post-PTA) qualitatively correlated with the degree of AVF stenosis. The pre-PTA mel spectrogram showed a greater magnitude of amplitude at high frequency, primarily during the systolic phase, corresponding to the known high-pitched bruit with systolic accentuation in patients with stenotic AVF (Fig. 3). To quantitatively confirm this finding, the mel spectrograms were equally subdivided into three categories of high-, medium-, and low-pitch frequencies, and histograms showing the number of pixels for each magnitude of amplitude (or pixel/sound intensity) for both pre-PTA and post-PTA shunt sounds were constructed (Fig. 4). At high-pitch frequencies, pre-PTA shunt sounds had a higher number of pixels with high-pixel intensities than that of post-PTA shunt sounds. At medium- to low-pitch frequencies, post-PTA shunt sounds showed a higher number of pixels with high-pixel intensities than that of pre-PTA shunt sounds.
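The band-wise histogram analysis described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes the spectrogram rows are ordered from low to high frequency with intensities normalized to [0, 1], and the function name, bin count, and toy array are hypothetical.

```python
def band_histograms(spectrogram, n_bins=4):
    """Split rows into three equal frequency bands and histogram intensities.

    `spectrogram` is a list of rows ordered from low to high frequency,
    with pixel intensities normalized to [0, 1].
    """
    third = len(spectrogram) // 3
    bands = {
        "low": spectrogram[:third],
        "medium": spectrogram[third:2 * third],
        "high": spectrogram[2 * third:],  # takes any leftover rows
    }
    histograms = {}
    for name, rows in bands.items():
        counts = [0] * n_bins
        for row in rows:
            for value in row:
                # Clamp so that an intensity of exactly 1.0 lands in the last bin.
                counts[min(int(value * n_bins), n_bins - 1)] += 1
        histograms[name] = counts
    return histograms


# Hypothetical 6 x 4 spectrogram with bright high-frequency rows.
toy = [[0.1] * 4, [0.2] * 4, [0.3] * 4, [0.4] * 4, [0.9] * 4, [1.0] * 4]
hist = band_histograms(toy)
```

Comparing the "high" histogram of a pre-PTA recording with that of the matching post-PTA recording corresponds to the comparison shown in Fig. 4: more pixels in the upper intensity bins indicate stronger high-pitched content.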
Fig. 3
Audio signals (amplitude vs. time) (A, D), mel spectrograms (B, E), and digital subtraction angiography images (C, F) of a 61-year-old male with brachiocephalic fistula and cephalic arch stenosis.
A-F.
A, B, and C represent images before PTA (“pre-PTA”), and D, E, and F represent images after PTA (“post-PTA”). Compared to the post-PTA sound, the pre-PTA sound shows a stronger magnitude of the audio signal and higher pixel signal intensities at high-pitch frequencies with systolic accentuation in mel spectrograms. PTA = percutaneous transluminal angioplasty
Fig. 4
Histograms showing the number of pixels at various pixel intensities at (A) high-, (B) medium-, (C) low-pitch frequencies of AVF shunt sounds before and after PTA and mel spectrograms of (D) pre-PTA and (E) post-PTA shunt sounds.
Performance of DCNN Models for Predicting Hemodynamically Significant Stenosis
Three convolutional neural network architectures (DenseNet201, EfficientNetB5, and ResNet50) were used to construct the DCNN models, and their performance metrics are listed in Table 3. The training and validation sets showed similar accuracy and loss at ≥ 40 epochs for DenseNet201, ≥ 12 epochs for EfficientNetB5, and ≥ 19 epochs for ResNet50 (Supplementary Fig. 1). At these optimized epochs, the AUROCs of the DenseNet201, EfficientNetB5, and ResNet50 models were 0.70, 0.98, and 0.99, respectively (Fig. 5). However, Grad-CAM heatmaps of DenseNet201 and EfficientNetB5 indicated areas in the mel spectrogram that were irrelevant to AVF stenosis (Supplementary Fig. 2). In contrast, the Grad-CAM heatmaps of ResNet50 highlighted areas at the borders between high- and medium-pitch frequencies as well as between medium- and low-pitch frequencies, pertaining to the difference between pre-PTA and post-PTA shunt sounds.
Table 3
Precision, Recall, and F-1 Score of DenseNet201, EfficientNetB5, and ResNet50 DCNN Models for Predicting Hemodynamically Significant Arteriovenous Fistula Stenosis
Model | AUROC | Precision | Recall | F-1 Score
DenseNet201 | 0.70 | 0.81 | 0.28 | 0.42
EfficientNetB5 | 0.98 | 0.95 | 1.0 | 0.97
ResNet50 | 0.99 | 0.95 | 0.98 | 0.96
AUROC = area under the receiver operating characteristics curve, DCNN = deep convolutional neural network
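As a worked check of how the columns in Table 3 relate, the F-1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall values; the helper name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-1 score) as listed in Table 3.
reported = {
    "DenseNet201": (0.81, 0.28, 0.42),
    "EfficientNetB5": (0.95, 1.0, 0.97),
    "ResNet50": (0.95, 0.98, 0.96),
}
recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()
}
```

The recomputed values match the reported F-1 scores to two decimal places, and the DenseNet201 row shows how a low recall pulls the F-1 score down even when precision is fairly high.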
Fig. 5
Receiver operating characteristic curves of the DenseNet201, EfficientNetB5, and ResNet50 models in predicting hemodynamically significant arteriovenous fistula stenosis in need of percutaneous transluminal angioplasty (A) and confusion matrices of the DenseNet201 (B), EfficientNetB5 (C), and ResNet50 (D) models.
AUROC = area under the receiver operating characteristics curve
DISCUSSION
The proposed mel spectrogram-based DCNN models successfully predicted significant AVF stenosis requiring PTA. Mel spectrograms generated from auscultation showed patterns corresponding to abnormal high-pitched bruits with systolic accentuation observed in stenotic AVF. Histograms of the magnitude of amplitude for pre-PTA and post-PTA shunt sounds based on high-, medium-, and low-pitch frequencies also confirmed this finding. EfficientNetB5 and ResNet50, but not DenseNet201, showed AUROCs > 0.95 at optimized epochs in predicting significant AVF stenosis, and Grad-CAM heatmaps showed that only ResNet50 reached decisions based on explainable differences in the mel spectrogram.
Mel spectrogram patterns of pre-PTA and post-PTA shunt sounds were consistent with those of the time-frequency domain obtained through the S-transform-based method by Wang et al. [17]. In the study by Wang et al. [17], the spectra of blood flow sounds of non-stenotic AVFs were mainly distributed between 200 and 600 Hz, whereas those of stenotic AVFs were mainly distributed between 600 and 800 Hz. Moreover, stenotic AVF showed narrower features at higher frequencies, which is similar to our results. The narrower spectrum at higher frequencies can be thought to physiologically correspond to the seagull murmur generated by turbulence in the narrow vessels [18]. This murmur is most prominent in the systolic phase as a large amount of blood flow is transmitted rapidly through the arteriovenous anastomosis, allowing turbulence and murmur formation. Previous studies [19, 20, 21] have demonstrated that the frequency of this murmur correlates with the diameter of the fistula and confirmed that a higher pitch of murmur is produced with a narrower fistula. These findings are consistent with the fact that the blood flow sound intensities at higher frequencies tend to increase with stenosis, as demonstrated by our histograms.
When cardiac systole ends, the pumped blood flow decreases, which also reduces the spectral amplitude of the background blood flow sound; turbulence from the remaining blood flow then maintains spectral features at low-pitch frequencies. This explains why pre-PTA shunt sounds showed a similar or slightly lower number of pixels at similar sound intensities at medium- to low-pitch frequencies than post-PTA shunt sounds.
Mel spectrogram-based DCNN models showed high diagnostic performance in predicting significant AVF stenosis, regardless of the type of DCNN architecture used to construct the model. Glangetas et al. [22] previously proposed the idea of an autonomous stethoscope developed by integrating an artificial intelligence algorithm into portable digital stethoscopes, which can be simply turned into a smartphone accessory [23]. While this idea was originally proposed to classify lung sounds, significant AVF stenosis requiring PTA may be monitored and screened using the above DCNN models to allow for timely interventions, which may lead to increased AVF patency and longevity [24, 25].
This study has several limitations. First, this pilot study included a small number of patients, all of whom had venous outflow stenosis and underwent interventions by a single interventional radiologist, which may have contributed to selection bias. The primary focus of this study was to assess the feasibility of applying DCNN models to predict significant AVF stenosis requiring PTA. However, the post-PTA shunt sounds of the same patients from whom pre-PTA shunt sounds were obtained, instead of shunt sounds from completely different patients with AVF, were used as controls owing to ethical issues with performing DSA on patients with AVF without a clinical sign of stenosis. The inclusion of a more diverse group of patients without AVF stenosis should introduce more variation in the mel spectrogram, and the performance of the DCNN model may be lower than that presented herein.
Further studies that include a more diverse group of patients without dysfunctional AVFs are warranted to validate whether these models can be used to screen for significant stenosis in general patients undergoing hemodialysis. Second, owing to the paucity of AVF shunt sounds, the SMOTE algorithm was used to generate synthetic mel spectrograms, and a separate external test set was not used. Third, mel spectrograms are only one type of visual representation of audio data, and for simplicity, the use of other representations, including harmonic-percussive spectrograms or scattergrams, has not been explored [26]. Fourth, recording of the shunt sound was performed collectively at venous access 1–2 cm from the anastomosis site before performing DSA, regardless of the stenosis site. Recording at this site may not have fully captured the degree of stenosis at proximal sites, including the cephalic arch, and the performance of DCNN models may have been improved if auscultation had been performed at the site of stenosis. Finally, DSA was used as a reference instead of other imaging modalities, including Doppler ultrasonography, because DSA is the current gold standard for assessing vascular stenosis [3].
In conclusion, mel spectrogram-based DCNN models, particularly ResNet50, successfully predicted the presence of significant AVF stenosis requiring PTA in this feasibility study and may potentially be used in AVF surveillance.