Faiha Fareez1,2, Tishya Parikh1,2, Christopher Wavell1,2, Saba Shahab1,2, Meghan Chevalier1,2, Scott Good1,2, Isabella De Blasi1,2, Rafik Rhouma2,3,4, Christopher McMahon2,3, Jean-Paul Lam2,3, Thomas Lo2, Christopher W Smith5,6.
Abstract
Artificial Intelligence (AI) is playing a major role in medical education, diagnosis, and outbreak detection through Natural Language Processing (NLP), machine learning models and deep learning tools. However, to train AI for these medical fields, well-documented and accurate medical conversations are needed. The dataset presented covers a series of medical conversations in the format of Objective Structured Clinical Examinations (OSCE), with a focus on respiratory cases, provided as audio recordings with corresponding text documents. These cases were simulated, recorded, transcribed, and manually corrected with the underlying aim of providing a comprehensive set of medical conversation data to the academic and industry community. Potential applications include detecting speech-to-text errors in speech recognition output, training NLP models to extract symptoms, detecting diseases, and educational uses such as training an avatar to converse with healthcare professional students as a standardized patient during clinical examinations. The application opportunities for the presented dataset are vast, given that this calibre of data is difficult to access and costly to develop.
Year: 2022 PMID: 35710769 PMCID: PMC9203765 DOI: 10.1038/s41597-022-01423-1
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1. Pie chart demonstrating the proportion of cases in the following categories: respiratory (78.7%, blue), musculoskeletal (16.9%, orange), gastrointestinal (2.2%, grey), cardiac (1.8%, red) and dermatological (0.4%, green).
Fig. 2. Histograms displaying the number of conversations with their corresponding length of time in minutes (left) and number of words per conversation (right).
An example of part of a transcribed audio recording and manual correction (from RES0051).
| Speech to text original transcript | Would you mind starting with |
| | telling me what brought you in? |
| | Sure I I have had this cough for |
| | the past five days and it doesn’t |
| | seem to be getting any better so. |
| | I’m I’m just here too. |
| | Ask you what, |
| | what it what it possibly could be. |
| | At has the cough been getting any better? |
| | Staying the same or getting |
| | worse over these last five days? |
| | I think it’s getting worse |
| Manual Correction | D: Would you mind starting with telling me what brought you in? |
| | P: Sure, I have had this cough for the past five days and it doesn’t seem to be getting any better so I’m just here to ask you what it what it possibly could be. |
| | D: Has the cough been getting any better, staying the same or getting worse over these last five days? |
| | P: I think it’s getting worse. |
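One way to quantify the speech-to-text errors illustrated in the example above is a word error rate (WER) between the manual correction (treated as the reference) and the original automatic transcript (treated as the hypothesis). The following is a minimal sketch and not the authors' method: the function names `tokenize`, `word_edit_distance` and `word_error_rate` are hypothetical, and the normalization choices (lowercasing, dropping punctuation and the `D:`/`P:` speaker labels) are assumptions made for illustration.

```python
import re


def tokenize(text: str) -> list[str]:
    """Normalize a transcript into a list of lowercase words.

    Assumption: speaker labels ("D:", "P:") and punctuation are not
    counted as words, and typographic apostrophes equal plain ones.
    """
    text = re.sub(r"\b[DP]:\s*", " ", text)   # drop speaker labels
    text = text.lower().replace("’", "'")     # unify apostrophes
    return re.findall(r"[a-z0-9']+", text)


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitution, insertion, deletion)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word edits needed to turn the hypothesis into the reference,
    divided by the number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript contains no words")
    return word_edit_distance(ref, hyp) / len(ref)


# A short excerpt from the RES0051 example shown above.
corrected = "P: I’m just here to ask you what it what it possibly could be."
automatic = "I’m I’m just here too. Ask you what, what it what it possibly could be."
print(f"WER: {word_error_rate(corrected, automatic):.2f}")
```

Treating the manual correction as the reference means that repeated words and homophone errors in the automatic transcript (such as "too" for "to") count as insertions and substitutions. A different normalization would give different scores.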
| Measurement(s) | conversations |
| Technology Type(s) | audio recording and transcription |
| Factor Type(s) | N/A |
| Sample Characteristic - Organism | simulated medical exams |
| Sample Characteristic - Environment | simulation |
| Sample Characteristic - Location | simulation |