Tae Keun Yoo, Ein Oh, Hong Kyu Kim, Ik Hee Ryu, In Sik Lee, Jung Sub Kim, Jin Kuk Kim.
Abstract
Wrong-site surgeries can occur due to the absence of an appropriate surgical time-out. However, during a time-out, surgical participants are unable to review the patient's charts because their hands must remain aseptic. To improve the conditions in surgical time-outs, we introduce a deep learning-based smart speaker to confirm the surgical information prior to cataract surgeries. This pilot study utilized a publicly available audio vocabulary dataset and recorded audio data published by the authors. The audio clips of the target words, such as left, right, cataract, phacoemulsification, and intraocular lens, were selected to determine and confirm surgical information in the time-out speech. A deep convolutional neural network model was trained and implemented in the smart speaker, which was developed using a mini development board and a commercial speakerphone. To validate our model on consecutive speech during time-outs, we generated 200 time-out speeches for cataract surgeries by randomly selecting the surgical statuses of the surgical participants. After the training process, the deep learning model achieved an accuracy of 96.3% for the validation dataset of short-word audio clips. Our deep learning-based smart speaker achieved an accuracy of 93.5% for the 200 time-out speeches. The surgical and procedural accuracy was 100%. Additionally, when validated on web-generated time-out speeches and video clips for general surgery, the model exhibited robust performance. In this pilot study, the proposed deep learning-based smart speaker was able to successfully confirm the surgical information during the time-out speech. Future studies should focus on collecting real-world time-out data and automatically connecting the device to electronic health records. Adopting smart speaker-assisted time-out phases will improve patient safety during cataract surgeries, particularly in relation to wrong-site surgeries.
Year: 2020 PMID: 32271836 PMCID: PMC7144990 DOI: 10.1371/journal.pone.0231322
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A deep learning-based smart speaker in ophthalmic surgery to confirm surgical information.
Sound dataset for the target words.
| Words | N | Total size (MB) | Source | Purpose |
|---|---|---|---|---|
| | 451 | 58.1 | Recorded by the authors | Start time-out |
| | 2,367 | 71.2 | Open dataset [ | Surgery site |
| | 2,353 | 71.0 | Open dataset [ | Surgery site |
| | 2,370 | 71.1 | Open dataset [ | Patient ID |
| | 2,373 | 71.3 | Open dataset [ | Patient ID |
| | 2,356 | 70.9 | Open dataset [ | Patient ID |
| | 2,372 | 71.4 | Open dataset [ | Patient ID |
| | 2,357 | 71.1 | Open dataset [ | Patient ID |
| | 2,369 | 71.6 | Open dataset [ | Patient ID |
| | 2,377 | 71.6 | Open dataset [ | Patient ID |
| | 2,352 | 70.7 | Open dataset [ | Patient ID |
| | 2,364 | 71.3 | Open dataset [ | Patient ID |
| | 2,376 | 71.8 | Open dataset [ | Patient ID |
| | 484 | 60.1 | Recorded by the authors | Procedure |
| | 606 | 84.0 | Recorded by the authors | Procedure |
| | 462 | 58.3 | Recorded by the authors | Procedure |
a Researchers recorded the target words provided by the text-to-voice tools.
Fig 2. Deep learning architecture and application of LattePanda to build a smart speaker.
Fig 3. Training and validation results.
(A) Learning curves of the deep learning model. (B) Confusion matrix presenting the classification results for the validation dataset.
Validation accuracy according to different deep learning architectures and training epochs.
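| Models | Epochs | Accuracy (%) |
|---|---|---|
| Deep learning (CNN) | 10 | 92.5 |
| Deep learning (CNN) | 25 | 96.3 |
| Deep learning (CNN) | 50 | 96.1 |
| Deep learning (CNN without batch normalization) | 10 | 88.2 |
| Deep learning (CNN without batch normalization) | 25 | 93.5 |
| Deep learning (CNN without batch normalization) | 50 | 94.0 |
| Deep learning (CNN without dropout) | 10 | 92.1 |
| Deep learning (CNN without dropout) | 25 | 95.5 |
| Deep learning (CNN without dropout) | 50 | 92.9 |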
CNN, Convolutional neural network
Binary classification results to explore the robustness and outcome of detection using the test dataset.
| Model | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Problem 1: “ | | | | |
| Deep learning (CNN) | 0.996 | 97.3 | 96.8 | 97.7 |
| Random forest | 0.991 | 95.7 | 96.4 | 94.9 |
| SVM using RBF kernel | 0.978 | 93.1 | 91.8 | 94.3 |
| Problem 2: “ | | | | |
| Deep learning (CNN) | 0.988 | 95.3 | 95.8 | 94.8 |
| Random forest | 0.896 | 82.4 | 87.0 | 77.7 |
| SVM using RBF kernel | 0.885 | 83.6 | 82.0 | 85.1 |
| Problem 3: “ | | | | |
| Deep learning (CNN) | 0.990 | 95.1 | 93.6 | 96.6 |
| Random forest | 0.980 | 92.9 | 95.4 | 90.3 |
| SVM using RBF kernel | 0.983 | 93.5 | 90.9 | 96.0 |
AUC, area under the receiver operating characteristic curve; RBF, radial basis function; SVM, support vector machine.
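For readers unfamiliar with the four metrics in the table above, the following is a minimal sketch of how they are conventionally computed for a binary classifier. It is not the authors' evaluation code: the function names `binary_metrics` and `auc`, the 0.5 threshold, and the toy labels and scores are all illustrative assumptions.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity from 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of positives that were detected
        "specificity": tn / (tn + fp),  # share of negatives that were rejected
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the table reports both.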
Fig 4. The real-time experiment using our developed smart speaker.
(A) The time-out script was generated by randomly selecting the surgical status. (B) The accuracy of the deep learning model using Samsung Galaxy 10 as a sound source. (C) The accuracy of the deep learning model using Kakao Mini-C as a sound source.
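The random generation of time-out speeches and the speech-level accuracy described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' script generator: the status fields, the use of digit words for the patient ID, the treatment of the three procedure terms as alternatives, and every function name are hypothetical.

```python
import random

# Target words named in the abstract. Treating the three procedure terms as
# mutually exclusive choices is an assumption made for this sketch.
SITES = ("left", "right")
PROCEDURES = ("cataract", "phacoemulsification", "intraocular lens")
# Assumption: patient IDs are spoken as digit words.
DIGITS = ("zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine")


def random_status(rng: random.Random, id_length: int = 3) -> dict:
    """Draw one random surgical status (hypothetical structure)."""
    return {
        "patient_id": tuple(rng.choice(DIGITS) for _ in range(id_length)),
        "site": rng.choice(SITES),
        "procedure": rng.choice(PROCEDURES),
    }


def expected_keywords(status: dict) -> list:
    """Keyword sequence a correct time-out speech should contain, in order."""
    return [*status["patient_id"], status["site"], status["procedure"]]


def confirm(recognized: list, status: dict) -> bool:
    """A speech counts as correct only if every recognized keyword matches."""
    return list(recognized) == expected_keywords(status)


def speech_accuracy(outcomes: list) -> float:
    """Fraction of speeches that were fully confirmed."""
    return sum(outcomes) / len(outcomes)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
statuses = [random_status(rng) for _ in range(200)]
# Stand-in for the recognizer: a perfect transcript, so every speech is confirmed.
outcomes = [confirm(expected_keywords(s), s) for s in statuses]
```

In a real evaluation, the perfect transcript would be replaced by the keyword sequence that the model recognizes from the played audio, and a single wrong word (for example, "left" heard as "right") would mark the whole speech as incorrect.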
Fig 5. Example of the developed smart speaker using a video from YouTube.