Jeong Hoon Lee, Chang Yoon Lee, Jin Seop Eom, Mingun Pak, Hee Seok Jeong, Hee Young Son.
Abstract
Patients commonly experience voice problems after thyroid surgery even when laryngeal endoscopy shows no abnormal findings. This study aimed to predict a patient's voice recovery at 3 months after surgery from preoperative and postoperative voice spectrograms. We retrospectively collected voice recordings and GRBAS scores from 114 patients who underwent surgery for thyroid cancer. The data for each patient were taken at three time points: preoperatively, and at 2 weeks and 3 months postoperatively. Using a model pretrained to predict GRBAS scores as the backbone, we trained an EfficientNet-based deep-learning model with long short-term memory (LSTM) on the preoperative and 2-week postoperative voice spectrograms to predict the voice at 3 months after surgery. In the correlation analysis, the coefficients for the predicted grade, breathiness, and asthenia scores were 0.741, 0.766, and 0.433, respectively. Based on the scaled prediction results, the areas under the receiver operating characteristic curve (AUC) for the binarized grade, breathiness, and asthenia scores were 0.894, 0.918, and 0.735, respectively. In the follow-up test on 12 patients at 6 months, the average AUC across the five scores was 0.822. This study showed the feasibility of predicting vocal recovery at 3 months using the spectrogram. We expect that this model could be used to relieve patients' psychological anxiety and encourage them to participate actively in speech rehabilitation.
Keywords: GRBAS; deep learning; spectrogram; voice recovery
Year: 2022 PMID: 36080847 PMCID: PMC9460363 DOI: 10.3390/s22176387
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Grade score distribution.
| Grade | Pre op. | Post op. | 3 Months Post op. |
|---|---|---|---|
| 0 | 43 | 25 | 31 |
| 1 | 61 | 60 | 67 |
| 2 | 9 | 24 | 14 |
| 3 | 1 | 5 | 2 |
Figure 1. Workflow scheme of the data-learning process for acoustic vocal samples of patients undergoing thyroid surgery using an artificial intelligence model. (A) Inclusion criteria and steps to train a pretrained model that extracts important features from vocal data. (B) The development of a deep learning model, built on the pretrained model, to predict the GRBAS score at three months after surgery from pre- and postoperative vocal samples.
Prediction performance for the GRBAS score.
| Class | RMSE | Rho | p-value |
|---|---|---|---|
| Grade | 0.399 | 0.796 | <0.001 |
| Roughness | 0.365 | 0.149 | 0.509 |
| Breathiness | 0.409 | 0.784 | <0.001 |
| Asthenia | 0.469 | 0.602 | 0.003 |
| Strain | 0.203 | NA | NA |
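
The RMSE and Rho columns can be computed from paired observed and predicted scores. The following is a minimal Python sketch, assuming that Rho denotes Spearman's rank correlation (the table does not say which coefficient was used). The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks; None if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # correlation is undefined without variation
    return cov / math.sqrt(vx * vy)


# Hypothetical example data (NOT from the study): ordinal scores 0-3
# and continuous model outputs.
observed = [0, 1, 1, 2, 0, 3]
predicted = [0.2, 0.9, 1.4, 1.7, 0.1, 2.5]
print("RMSE:", rmse(observed, predicted))
print("Spearman rho:", spearman_rho(observed, predicted))
```

Because GRBAS scores are ordinal and heavily tied, the ranking step averages tied positions. A rank correlation is undefined when one variable never varies, which is one way an NA entry can arise; the source does not state why Strain is reported as NA.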
Figure 2. The prediction results for the (A) grade, (B) breathiness, and (C) asthenia scores of the test set. The x-axis represents the observed GRBAS score of the patient, and the y-axis represents the value predicted by the deep learning model. The blue line is the regression line showing the relationship between the predicted and observed values.
Figure 3. ROC curves for the grade, breathiness, and asthenia scores. For each score, patients were divided into two groups according to whether the score was 0 or greater than 0.
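
The binarization behind Figure 3 can be sketched in Python as follows. This is a minimal sketch, assuming that the observed score is split into 0 versus greater than 0 and that the continuous model output is used as the ranking score. The AUC is computed as the probability that a positive case receives a higher prediction than a negative case, which equals the area under the ROC curve. The function names and sample values are hypothetical and are not taken from the study.

```python
def binarize(scores):
    """Map an ordinal score to 0 (score == 0) or 1 (score > 0)."""
    return [0 if s == 0 else 1 for s in scores]


def roc_auc(labels, predictions):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative prediction count as half.
    """
    pos = [p for l, p in zip(labels, predictions) if l == 1]
    neg = [p for l, p in zip(labels, predictions) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example data (NOT from the study).
observed_grade = [0, 0, 1, 2, 0, 3, 1]
predicted_grade = [0.1, 0.6, 0.8, 1.9, 0.3, 2.4, 0.5]
labels = binarize(observed_grade)
print("AUC:", roc_auc(labels, predicted_grade))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a cohort of this size; a library routine would be the usual choice for larger data.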
Figure 4. ROC curve for vocal samples of patients 6 months after surgery.
Figure 5. A spectrogram example and its visualization using gradient-based localization for the grade score prediction in the second and third residual blocks of EfficientNet. (A) Spectrogram with heatmap visualization for a patient with a high grade score. (B) Spectrogram for a patient with a normal grade.
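
The heatmaps in Figure 5 can be illustrated with a small Python sketch of how a gradient-based localization map is formed. It assumes the visualization follows the Grad-CAM recipe, which the caption does not name explicitly: each feature-map channel is weighted by the spatial mean of the gradient of the predicted score with respect to that channel, the weighted channels are summed, and negative values are clipped. The function name and the toy arrays are hypothetical; in practice the activations and gradients would come from a chosen layer of the trained network.

```python
def grad_cam(activations, gradients):
    """Combine feature maps into a localization heatmap.

    activations, gradients: nested lists shaped [channel][row][col],
    with identical shapes. Returns a [row][col] map scaled to [0, 1].
    """
    n_ch = len(activations)
    rows, cols = len(activations[0]), len(activations[0][0])

    # Channel weight = spatial mean of the gradient for that channel.
    weights = [
        sum(sum(row) for row in gradients[k]) / (rows * cols)
        for k in range(n_ch)
    ]

    # Weighted sum over channels, then ReLU to keep positive evidence only.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(cols)
        ]
        for i in range(rows)
    ]

    # Scale to [0, 1] for display, unless the map is all zeros.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy 2-channel, 2x2 example (hypothetical values).
acts = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(acts, grads)
print(heatmap)
```

The resulting map has the spatial resolution of the chosen layer and would normally be upsampled to the spectrogram size before being overlaid as a heatmap.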