Deep Learning Analysis to Automatically Detect the Presence of Penetration or Aspiration in Videofluoroscopic Swallowing Study.

Jeoung Kun Kim¹, Yoo Jin Choo², Gyu Sang Choi³, Hyunkwang Shin³, Min Cheol Chang⁴, Donghwi Park⁵.

Abstract

BACKGROUND: Videofluoroscopic swallowing study (VFSS) is currently considered the gold standard for precisely diagnosing and quantitatively investigating dysphagia. However, VFSS interpretation is complex and requires consideration of several factors. Therefore, considering the expected impact on dysphagia management, this study aimed to apply deep learning to automatically detect the presence of penetration or aspiration in the VFSS of patients with dysphagia.
METHODS: The VFSS data of 190 participants with dysphagia were collected. For deep learning, 10 frame images from one swallowing process (five high-peak images and five low-peak images) were selected from each patient's VFSS video. We applied a convolutional neural network (CNN) for deep learning using the Python programming language. To classify VFSS findings (normal swallowing, penetration, or aspiration), a classification was first determined separately from the high-peak and the low-peak images. The two classifications were then integrated into a final classification.
RESULTS: The area under the curve (AUC) of the CNN model on the validation dataset of VFSS images was 0.942 for normal findings, 0.878 for penetration, and 1.000 for aspiration. The macro average AUC was 0.940, and the micro average AUC was 0.961.
CONCLUSION: This study demonstrated that deep learning algorithms, particularly the CNN, could be applied for detecting the presence of penetration and aspiration in VFSS of patients with dysphagia.

Keywords: Deep Learning; Deglutition; Swallowing Reflex; VFSS

Year: 2022 PMID: 35166079 PMCID: PMC8845107 DOI: 10.3346/jkms.2022.37.e42

Source DB: PubMed Journal: J Korean Med Sci ISSN: 1011-8934 Impact factor: 2.153

INTRODUCTION

The swallowing process involves the coordinated contraction and relaxation of the muscles of the tongue, pharynx, larynx, and esophagus, which is controlled by the central nervous system (CNS) from the cerebral cortex to the brainstem.1,2,3 Any lesion along the path from the CNS to the swallowing muscles can cause difficulty in swallowing, which is referred to as dysphagia.4,5 Dysphagia is a common clinical symptom in patients with cerebrovascular, neuromuscular, or neurodegenerative diseases and in those with head and neck cancers.6,7,8

The videofluoroscopic swallowing study (VFSS) is currently considered the gold standard for accurately diagnosing and quantitatively analyzing dysphagia.9 To determine the cause of dysphagia and the appropriate diet, clinicians repeatedly perform frame-by-frame analyses of spatiotemporal and quantitative parameters in recorded VFSS videos.10,11,12,13 Thus, although VFSS allows objective observation of the entire swallowing process, its interpretation is complex and requires consideration of several factors.9

Recently, deep learning, an artificial intelligence technique in which a system learns rules and patterns from the given data, has been increasingly studied in the medical field.14 Because deep learning can detect possible interactions between attributes or variables, it may be useful for diagnosis and prediction.15 Applying recent developments in deep learning research could reduce the burden that the complexity of VFSS interpretation places on clinicians. Moreover, to date, no deep learning study has aimed to detect the presence of penetration or aspiration in the VFSS of patients with dysphagia. Therefore, considering the expected impact on dysphagia management, this study aimed to apply deep learning to automatically detect penetration or aspiration in the VFSS of patients with dysphagia.

METHODS

All procedures were carried out in accordance with the relevant guidelines and regulations. We included patients who visited the outpatient clinic of, or were admitted to, the rehabilitation department of one of two university hospitals (Ulsan University Hospital and Yeungnam University Hospital) because of dysphagia, or who were diagnosed using VFSS, between January 2009 and April 2020. The steps of the modeling process applied in this study are shown in Fig. 1.

Fig. 1

The steps of the modeling process applied in this study.

VFSS = videofluoroscopic swallowing study.

Data collection

The VFSS data of 190 participants with dysphagia were collected. The exclusion criteria were as follows: 1) age < 20 years; 2) previous tracheostomy; 3) facial or cranial anomalies; and 4) a metal plate in the cervical spine or facial bones that could produce artifacts.

Analysis of VFSS

When the VFSS was performed, the patients were instructed to sit upright under a videofluoroscopy machine with the head in a neutral position. The videofluoroscopy frame was bounded by the incisors anteriorly, the cervical vertebrae posteriorly, the nasal border of the soft palate superiorly, and the cervical esophagus inferiorly.16,17 The fluoroscopic images of swallows were digitally recorded and stored at 30 frames/s.16,17 Each VFSS was performed using a bolus of "thin" fluid (1–50 cP); each patient received a 5-mL bolus delivered using a 10-mL syringe.16,17 In the VFSS analysis, penetration was defined as contrast material entering the larynx but remaining above the true vocal cords,18 and aspiration was defined as contrast material passing below the true vocal cords.18 Based on these criteria, two rehabilitation medicine specialists, each with more than 10 years of clinical experience in dysphagia, reviewed the dynamic fluoroscopic images for the presence or absence of penetration or aspiration. Based on the VFSS, patients were classified into normal (no penetration or aspiration), penetration, and aspiration groups.

VFSS image selection

To analyze VFSS by deep learning, we selected five consecutive frame images (at 0.33-s intervals) around the moment the hyoid bone reached its peak (the highest position of the hyoid bone; high-peak images), and another five consecutive frame images from the point at which the hyoid bone had completely descended from the peak (the lowest position of the hyoid bone; low-peak images) (Fig. 2). Thus, 10 frame images (five high-peak and five low-peak images) were selected from one swallowing process in each patient's VFSS video for deep learning (Fig. 2).
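The frame selection amounts to a small index calculation. The sketch below is not the authors' code and rests on stated assumptions: the frame index of each hyoid position is taken as already identified (the text does not say how), the five frames are treated as a window centred on that frame, and the same rule is applied to the low-peak position for illustration. At 30 frames/s, a 0.33-s interval corresponds to every 10th frame. `select_frames` is a hypothetical helper.

```python
from typing import List

FPS = 30            # VFSS recording rate stated in the Methods
INTERVAL_S = 0.33   # spacing between selected frames stated in the Methods
N_FRAMES = 5        # frames per hyoid position (high-peak or low-peak)


def select_frames(peak_frame: int, n_total_frames: int,
                  fps: int = FPS, interval_s: float = INTERVAL_S,
                  n_frames: int = N_FRAMES) -> List[int]:
    """Return n_frames indices spaced interval_s apart, centred on peak_frame.

    Hypothetical helper: assumes the peak frame is already annotated and that
    the window is centred on it.
    """
    step = max(1, round(interval_s * fps))        # 0.33 s at 30 fps -> 10 frames
    span = (n_frames - 1) * step                  # distance from first to last frame
    last_start = n_total_frames - 1 - span
    if last_start < 0:
        raise ValueError("video is too short for the requested window")
    start = peak_frame - (n_frames // 2) * step   # centre the window on the peak
    start = min(max(start, 0), last_start)        # shift it to stay inside the video
    return [start + i * step for i in range(n_frames)]


# Ten images per swallow: five around the hyoid high peak, five around the low peak.
high_peak = select_frames(peak_frame=120, n_total_frames=300)
low_peak = select_frames(peak_frame=165, n_total_frames=300)
print(high_peak + low_peak)  # [100, 110, 120, 130, 140, 145, 155, 165, 175, 185]
```

Shifting the window rather than truncating it keeps exactly five images per position even when a peak lies near the start or end of a recording.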

Fig. 2

ROC curves for the validation data. The AUC of the convolutional neural network model on the validation dataset of VFSS images was 0.942 for normal findings, 0.878 for penetration, and 1.000 for aspiration. Both macro and micro average AUCs were calculated: the macro average AUC was 0.940, and the micro average AUC was 0.961.

AUC = area under the curve, ROC = receiver operating characteristic, VFSS = videofluoroscopic swallowing study.

Deep learning analysis

We applied a convolutional neural network (CNN) for deep learning using the Python programming language. TensorFlow 2.6.0 with Keras and scikit-learn 0.24.1 were used to train the CNN models. The details and performance of the best model are described in Table 1. A CNN consists of one or more convolutional layers, often followed by a subsampling layer; the convolutional layers are followed by one or more fully connected layers, as in a standard neural network.19 To achieve better learning outcomes, we employed several pre-trained CNN models, including EfficientNet (B0, B1, and B3),20 MobileNet,21 InceptionV3,22 and ResNet50.23 Both fine-tuning and training from scratch were employed for each CNN model. VFSS images were used as inputs to classify patients with dysphagia into normal (no penetration or aspiration), penetration, or aspiration groups. Training and validation data were randomly assigned using scikit-learn, keeping the ratios of normal, penetration, and aspiration the same in both datasets. Of the 190 patients, 133 (70%) were included in the training set and 57 (30%) in the validation set. Likewise, of the 950 images each for the high-peak and low-peak positions, 665 (70%) were used for training and 285 (30%) for validation.
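A stratified 70/30 split of this kind can be sketched with scikit-learn's `train_test_split` and its `stratify` argument. This is a minimal sketch under stated assumptions, not the study's code: labels are encoded as integers (0 = normal, 1 = penetration, 2 = aspiration), the split is made at the patient level so that all five images of a patient fall on the same side (consistent with 665 = 5 × 133 and 285 = 5 × 57), and `random_state=0` is an arbitrary choice rather than the study's seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

IMAGES_PER_PATIENT = 5  # five images per hyoid position (high-peak or low-peak)

# Patient-level labels with the cohort composition reported in the text
# (0 = normal, 1 = penetration, 2 = aspiration).
labels = np.array([0] * 113 + [1] * 32 + [2] * 45)
patient_ids = np.arange(labels.size)

# Stratification keeps the class ratios approximately equal in both sets.
train_ids, val_ids = train_test_split(
    patient_ids, test_size=0.3, stratify=labels, random_state=0
)

# Assign each image to the same split as its patient (no patient in both sets).
image_patient = np.repeat(patient_ids, IMAGES_PER_PATIENT)
is_train_image = np.isin(image_patient, train_ids)

print(len(train_ids), len(val_ids))                    # 133 57
print(is_train_image.sum(), (~is_train_image).sum())   # 665 285
for c, name in enumerate(["normal", "penetration", "aspiration"]):
    print(name, round(np.mean(labels[train_ids] == c), 3),
          round(np.mean(labels[val_ids] == c), 3))
```

Splitting patients first and then mapping images to their patient's split prevents near-identical frames from the same swallow from appearing in both training and validation data.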

Table 1

Performance of the deep learning model

Sample size (patients)	Total 190: 133 (70%) for training, 57 (30%) for validation
Sample ratio (patients)	Normal 113 (59.47%); penetration 32 (16.84%); aspiration 45 (23.68%)
Sample size (images)	Total 950 each for high-peak and low-peak images: 665 (70%) for training, 285 (30%) for validation
Sample ratio (high-peak images)	Normal 590 (62.11%); penetration 147 (15.47%); aspiration 213 (22.42%)
Sample ratio (low-peak images)	Normal 700 (73.68%); penetration 40 (4.21%); aspiration 210 (22.11%)
CNN model (high-peak images)	MobileNet with fine-tuning; SGD optimizer, ReLU activation; data augmentation, dropout, and early stopping to reduce overfitting; input image size 320 × 180 × 3; training accuracy 100%; validation accuracy 93.68%
CNN model (low-peak images)	MobileNet with fine-tuning; SGD optimizer, ELU activation; data augmentation, dropout, and early stopping to reduce overfitting; input image size 320 × 180 × 3; training accuracy 100%; validation accuracy 93.68%
VFSS classifier per patient (high-peak images)	Training accuracy 100%; validation accuracy 94.74%
VFSS classifier per patient (low-peak images)	Training accuracy 100%; validation accuracy 94.74%
VFSS integrated classifier	Training accuracy 100%; validation accuracy 94.74%; validation ROC AUC: normal 0.942, penetration 0.878, aspiration 1.000; validation macro average ROC AUC 0.940, micro average ROC AUC 0.961

CNN = convolutional neural network, SGD = stochastic gradient descent, ReLU = rectified linear unit, ELU = exponential linear unit, VFSS = videofluoroscopic swallowing study, ROC = receiver operating characteristic, AUC = area under the curve.
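The memory advantage of convolutional layers (the same coefficients are reused at every spatial location) can be made concrete with a parameter count. The sketch below is illustrative only: it assumes the 320 × 180 × 3 input in Table 1 means 180 rows by 320 columns with three channels, and it uses a hypothetical single 3 × 3 convolution with 32 filters, not the actual MobileNet architecture.

```python
# Input size listed in Table 1 (assumed: height 180, width 320, 3 channels).
H, W, C_IN = 180, 320, 3
C_OUT = 32  # hypothetical number of output feature maps
K = 3       # hypothetical 3 x 3 kernel


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of one k x k convolution shared over all positions, plus one bias per filter."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights of a fully connected layer mapping n_in inputs to n_out outputs, plus biases."""
    return n_in * n_out + n_out


conv = conv_params(K, C_IN, C_OUT)
# A fully connected layer producing an output of the same size as a stride-1,
# "same"-padded convolution needs one weight per input-output pair.
dense = dense_params(H * W * C_IN, H * W * C_OUT)
print(conv)              # 896
print(f"{dense:.3e}")    # 3.185e+11
```

The convolution needs 896 parameters regardless of image size, whereas a fully connected layer with an output of the same size would need about 3.2 × 10^11.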

To classify patients according to VFSS findings (normal, penetration, or aspiration), classification was first performed separately on the high-peak and the low-peak images, using the following criteria for each set of five images: 1) normal: ≥ 4 normal images; 2) penetration: < 4 normal images and no aspiration image; and 3) aspiration: < 4 normal images and ≥ 1 aspiration image. The two classifications from the high-peak and low-peak images were then integrated into a final classification according to the following criteria: 1) normal: normal in both the high-peak and low-peak images; 2) penetration: ≤ 1 normal classification (of the two) and no aspiration classification; and 3) aspiration: ≤ 1 normal classification and ≥ 1 aspiration classification (Table 2).

Table 2

The criteria for the integration of the classification results of high-peak and low-peak images

Classification model	Dysphagia classification criteria
Initial classifier (applied separately to high-peak and low-peak images)	Normal: NI ≥ 4
	Penetration: NI < 4 and AI = 0
	Aspiration: NI < 4 and AI ≥ 1
Integrated classifier (final decision)	Normal: N = 2
	Penetration: N ≤ 1 and A = 0
	Aspiration: N ≤ 1 and A ≥ 1

NI = number of normal images (of five), AI = number of aspiration images (of five), N = number of normal decisions (of two), A = number of aspiration decisions (of two).
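The two-stage rules in Table 2 translate directly into code. This is a minimal sketch, not the authors' implementation; the function names and string labels are hypothetical, and each image is assumed to carry one predicted label from the CNN.

```python
from typing import Sequence

NORMAL, PENETRATION, ASPIRATION = "normal", "penetration", "aspiration"


def classify_image_set(image_labels: Sequence[str]) -> str:
    """Initial classifier for one set of five images (high-peak or low-peak)."""
    ni = sum(label == NORMAL for label in image_labels)      # NI: normal images
    ai = sum(label == ASPIRATION for label in image_labels)  # AI: aspiration images
    if ni >= 4:
        return NORMAL
    return ASPIRATION if ai >= 1 else PENETRATION


def integrate(high_peak: str, low_peak: str) -> str:
    """Integrated classifier combining the high-peak and low-peak decisions."""
    decisions = (high_peak, low_peak)
    n = decisions.count(NORMAL)      # N: normal decisions
    a = decisions.count(ASPIRATION)  # A: aspiration decisions
    if n == 2:
        return NORMAL
    return ASPIRATION if a >= 1 else PENETRATION


high = classify_image_set([NORMAL, NORMAL, PENETRATION, PENETRATION, NORMAL])
low = classify_image_set([NORMAL] * 5)
print(high, low, integrate(high, low))  # penetration normal penetration
```

One consequence of the rules as stated is that a set with four normal images and one aspiration image is labelled normal, because the NI ≥ 4 test is applied first.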

Statistical analysis

Statistical analyses were performed using Python 3.8.10 and scikit-learn 0.24.1. Receiver operating characteristic (ROC) curve analysis was performed, and the area under the curve (AUC) was calculated. Bias-corrected and accelerated (BCa) bootstrap confidence intervals for the average AUC were calculated using R 4.0.5 and the multiROC 1.1.1 package.24
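Per-class, macro, and micro average AUCs can be sketched with scikit-learn's `roc_auc_score` on one-vs-rest binarized labels. This illustration uses randomly generated, hypothetical scores for a hypothetical validation set of 57 patients, not the study's data, and the study computed its confidence intervals with the multiROC R package, whose averaging procedure may differ in detail from scikit-learn's. Under scikit-learn's definitions, the macro average is the unweighted mean of the per-class AUCs, which is consistent with the reported values: (0.942 + 0.878 + 1.000) / 3 = 0.940.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

CLASSES = [0, 1, 2]  # 0 = normal, 1 = penetration, 2 = aspiration


def multiclass_auc(y_true, y_score):
    """Per-class (one-vs-rest), macro average, and micro average ROC AUC."""
    y_bin = label_binarize(y_true, classes=CLASSES)          # shape (n_samples, 3)
    per_class = roc_auc_score(y_bin, y_score, average=None)
    macro = roc_auc_score(y_bin, y_score, average="macro")   # unweighted mean of per-class AUCs
    micro = roc_auc_score(y_bin, y_score, average="micro")   # pools every (sample, class) pair
    return per_class, macro, micro


# Hypothetical validation labels and softmax-like scores.
rng = np.random.default_rng(0)
y_true = np.array([0] * 34 + [1] * 10 + [2] * 13)
y_score = rng.dirichlet(np.ones(3), size=y_true.size)
y_score[np.arange(y_true.size), y_true] += 0.5   # make the toy scores informative
y_score /= y_score.sum(axis=1, keepdims=True)

per_class, macro, micro = multiclass_auc(y_true, y_score)
print(per_class.round(3), round(macro, 3), round(micro, 3))

# Macro averaging of the reported per-class AUCs reproduces the reported value.
print(round((0.942 + 0.878 + 1.000) / 3, 3))   # 0.94
```

The micro average weights every binary decision equally, so it is dominated by the larger classes, whereas the macro average gives the small penetration class the same weight as the others.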

Ethics statement

This study was approved by the Institutional Review Board of Yeungnam University Hospital (2019-10-008). The board decided that informed consent was not required due to the retrospective nature of the study and the use of anonymous clinical data.

RESULTS

A total of 190 patients (mean age, 66.83 ± 15.47 years; 92 men, 88 women) were included in this study (Table 3). Of the 190 patients, 113 (59.47%) were classified into the normal group (no penetration or aspiration), 32 (16.84%) into the penetration group, and 45 (23.68%) into the aspiration group (Table 1). Additionally, of the 950 high-peak images from the 190 patients, 590 (62.11%) were normal, and 147 (15.47%) and 213 (22.42%) showed penetration and aspiration, respectively. Of the 950 low-peak images, 700 (73.68%), 40 (4.21%), and 210 (22.11%) showed normal, penetration, and aspiration findings, respectively.

Table 3

Characteristics of patients with dysphagia who were included in this study

Characteristics		Values
Age, yr		66.83 ± 15.47
Sex, male:female		92:88
Normal:penetration:aspiration		113 (59.47):32 (16.84):45 (23.68)
Cause
	Stroke	92 (48.42)
	Spinal cord injury, cervical level	16 (8.42)
	Parkinson's disease	15 (7.89)
	Motor neuron disease	19 (10.00)
	Dementia	23 (12.11)
	Deconditioning	25 (13.16)

Values are presented as mean ± SD or number (%).

The AUC of the CNN model on the validation dataset of VFSS images was 0.942 for normal findings, 0.878 for penetration, and 1.000 for aspiration. Both macro and micro average AUCs were calculated: the macro average AUC was 0.940, and the micro average AUC was 0.961 (Fig. 2).

DISCUSSION

To the best of our knowledge, this study is the first to use deep learning to detect the presence of penetration or aspiration in the VFSS of patients with dysphagia. The results are promising, and the model achieved high accuracy. Given that AUCs of 0.7–0.8, 0.8–0.9, and > 0.9 are generally considered acceptable, excellent, and outstanding, respectively,25 the deep learning model in this study detected normal swallowing (AUC 0.942) and aspiration (AUC 1.000) with outstanding discrimination and penetration (AUC 0.878) with excellent discrimination.

Although neural networks and other pattern detection methods have been used for the past 50 years, the field of CNNs has recently developed significantly.14 Because of characteristics such as robustness to shifts and distortions in images, limited memory requirements, and easier and better training, CNNs with multiple convolutional layers may be more appropriate for classifying clinical outcomes based on radiologic or other image-based data.19 Provided that the model is trained on a sufficiently large dataset, CNN-based detection of a particular finding has been reported to be robust to distortions such as changes in shape caused by different poses, lighting conditions, and camera angles; partial occlusions; and horizontal and vertical shifts.19 Moreover, because a convolutional layer uses the same coefficients at every spatial location, its memory requirement is drastically reduced.19

Several deep learning-based VFSS analysis methods have been reported in previous studies.9,10,22 Using the single-shot multibox detector, one of the state-of-the-art deep learning methods for object detection, Zhang et al.26 developed a tracking system for detecting the hyoid bone.
However, this method makes it difficult to analyze motion or action in VFSS videos, because it detects a spatial region on a single image rather than analyzing a sequence of images from video data. Lee et al.9,10 reported a state-of-the-art video analysis method that uses an integrated three-dimensional convolutional network to detect the pharyngeal phase and to analyze the swallowing reflex in VFSS videos without manual spatial annotations. Although detecting the pharyngeal phase and analyzing the swallowing reflex can shorten the time clinicians spend reviewing VFSS, both require further analysis to determine the patient's status. To date, most VFSS-based deep learning studies have focused on tracking anatomical structures such as the hyoid bone, analyzing the pharyngeal phase, or measuring the swallowing reflex time. However, in clinical settings, the most important purpose of VFSS is to detect the presence of penetration or aspiration. Therefore, unlike the methods in previous studies, the deep learning program developed in this research could be directly useful to physicians in clinical settings.

This study has some limitations. We could not input the entire VFSS video for deep learning analysis; instead, we trained the CNN model using only two sets of five consecutive frame images selected from the VFSS of each patient. However, we believe that penetration or aspiration in VFSS usually develops in two phases. If the primary cause is a delayed swallowing reflex or reduced laryngeal elevation, penetration or aspiration usually develops when the hyoid bone is at its highest position (high peak). At the low peak, overflow penetration or aspiration can also develop as residue in the pyriform sinuses or valleculae increases while the hyoid bone descends at the end of the swallowing process.
Therefore, five consecutive VFSS images at each hyoid position (high peak and low peak) are likely to capture a considerable proportion of the moments at which penetration and aspiration occur in a VFSS video. The high accuracy of the CNN in this study supports this hypothesis. Nevertheless, for more accurate analysis, deep learning analysis of complete VFSS videos will be necessary in the future. In conclusion, this study demonstrated that deep learning algorithms, particularly CNNs, could be applied to detect the presence of penetration and aspiration in the VFSS of patients with dysphagia.

REFERENCES

1. Rehabilitation medicine: 2. Diagnosis of dysphagia and its nutritional management for stroke patients.

Authors: Hillel M Finestone; Linda S Greene-Finestone
Journal: CMAJ Date: 2003-11-11

2. Management of Dysphagia in stroke patients.

Authors: Reza Shaker; Joseph E Geenen
Journal: Gastroenterol Hepatol (N Y) Date: 2011-05

3. Normal contractile algorithm of swallowing related muscles revealed by needle EMG and its comparison to videofluoroscopic swallowing study and high resolution manometry studies: A preliminary study.

Authors: Donghwi Park; Hyun Haeng Lee; Seok Tae Lee; Yoongul Oh; Jun Chang Lee; Kyoung Won Nam; Ju Seok Ryu
Journal: J Electromyogr Kinesiol Date: 2017-07-25

4. Relationship between manometric and videofluoroscopic measures of swallow function in healthy adults and patients treated for head and neck cancer with various modalities.

Authors: Barbara Roa Pauloski; Alfred W Rademaker; Cathy Lazarus; Guy Boeckxstaens; Peter J Kahrilas; Jerilyn A Logemann
Journal: Dysphagia Date: 2008-10-28

5. Effectiveness of pharmacologic treatment for dysphagia in Parkinson's disease: a narrative review.

Authors: Min Cheol Chang; Jin-Sung Park; Byung Joo Lee; Donghwi Park
Journal: Neurol Sci Date: 2020-11-17

6. The videofluorographic swallowing study.

Authors: Bonnie Martin-Harris; Bronwyn Jones
Journal: Phys Med Rehabil Clin N Am Date: 2008-11

7. Findings of Abnormal Videofluoroscopic Swallowing Study Identified by High-Resolution Manometry Parameters.

Authors: Donghwi Park; Yoongul Oh; Ju Seok Ryu
Journal: Arch Phys Med Rehabil Date: 2015-10-24

8. Clinical characteristics of dysphagic stroke patients with salivary aspiration: A STROBE-compliant retrospective study.

Authors: Kwang Jae Yu; Donghwi Park
Journal: Medicine (Baltimore) Date: 2019-03

9. Diagnosis and Clinical Course of Unexplained Dysphagia.

Authors: Jiwoon Yeom; Young Seop Song; Won Kyung Lee; Byung-Mo Oh; Tai Ryoon Han; Han Gil Seo
Journal: Ann Rehabil Med Date: 2016-02-26

10. Artificial intelligence with multi-functional machine learning platform development for better healthcare and precision medicine.

Authors: Zeeshan Ahmed; Khalid Mohamed; Saman Zeeshan; XinQi Dong
Journal: Database (Oxford) Date: 2020-01-01
