Juan C Quiroz, Liliana Laranjo, Ahmet Baki Kocaballi, Shlomo Berkovsky, Dana Rezazadegan, Enrico Coiera.
Abstract
Clinicians spend a large amount of time on clinical documentation of patient encounters, often impacting quality of care and clinician satisfaction, and causing physician burnout. Advances in artificial intelligence (AI) and machine learning (ML) open the possibility of automating clinical documentation with digital scribes, using speech recognition to eliminate manual documentation by clinicians or medical scribes. However, developing a digital scribe is fraught with problems due to the complex nature of clinical environments and clinical conversations. This paper identifies and discusses major challenges associated with developing automated speech-based documentation in clinical settings: recording high-quality audio, converting audio to transcripts using speech recognition, inducing topic structure from conversation data, extracting medical concepts, generating clinically meaningful summaries of conversations, and obtaining clinical data for AI and ML algorithms.
Keywords: Computational science; Health services; Information technology; Software
Year: 2019 PMID: 31799422 PMCID: PMC6874666 DOI: 10.1038/s41746-019-0190-1
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Digital scribe pipeline. A digital scribe acquires the audio of the clinician–patient conversation, performs automatic speech recognition to generate the conversation transcript, extracts information from the transcript, summarizes the information, and generates medical notes in the electronic health record (EHR) associated with the clinician–patient encounter. Speech recognition, information extraction, and summarization rely on AI and ML models that require large volumes of data for training and evaluation.
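The stage ordering in the pipeline caption can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Encounter` container, the stage function names, and the toy keyword vocabulary are hypothetical, and each stage body is a trivial placeholder standing in for the AI and ML models the paper refers to. The "audio" is simply UTF-8 text so that the example runs without a speech recognizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Encounter:
    """Intermediate products of one clinician-patient encounter (hypothetical container)."""
    audio: bytes
    transcript: str = ""
    concepts: List[str] = field(default_factory=list)
    summary: str = ""
    note: str = ""


Stage = Callable[[Encounter], Encounter]


def recognize_speech(enc: Encounter) -> Encounter:
    # Placeholder for automatic speech recognition: the "audio" here is
    # UTF-8 text, so decoding it stands in for a real ASR model.
    enc.transcript = enc.audio.decode("utf-8")
    return enc


def extract_information(enc: Encounter) -> Encounter:
    # Placeholder for medical concept extraction: a toy keyword lookup,
    # not a mapping to UMLS or any real terminology.
    vocabulary = {"headache", "nausea", "ibuprofen"}
    words = [w.strip(".,?!").lower() for w in enc.transcript.split()]
    enc.concepts = sorted({w for w in words if w in vocabulary})
    return enc


def summarize(enc: Encounter) -> Encounter:
    # Placeholder for summarization: list the extracted concepts.
    if enc.concepts:
        enc.summary = "Mentioned: " + ", ".join(enc.concepts)
    else:
        enc.summary = "No concepts found"
    return enc


def generate_note(enc: Encounter) -> Encounter:
    # Placeholder for writing the note into the EHR.
    enc.note = "ENCOUNTER NOTE\n" + enc.summary
    return enc


# Stage order follows the caption: audio -> transcript -> extracted
# information -> summary -> note.
PIPELINE: List[Stage] = [recognize_speech, extract_information, summarize, generate_note]


def run_pipeline(audio: bytes) -> Encounter:
    enc = Encounter(audio=audio)
    for stage in PIPELINE:
        enc = stage(enc)
    return enc


if __name__ == "__main__":
    result = run_pipeline(b"I have a headache and some nausea. I took ibuprofen.")
    print(result.note)
```

Keeping each stage as a function over a shared record makes the dependency explicit: an error introduced early (for example, a misrecognized word in the transcript) propagates to every later stage.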
The challenges associated with the various tasks a digital scribe must perform.
| Task | Challenge |
|---|---|
| Recording audio | • High ambient noise |
| | • Microphone fidelity |
| | • Multiple speakers |
| | • Microphone positioning relative to clinician and patient |
| Automatic speech recognition | • Varying audio quality |
| | • High ambient noise |
| | • Multiple speakers |
| | • Disfluencies, false starts, interruptions, non-lexical pauses |
| | • Complexity of medical vocabulary |
| | • Variable speaker volume due to distance to microphone and relative positioning |
| | • Differentiating multiple speakers in the audio (speaker diarization) |
| Topic segmentation | • Unstructured conversations |
| | • Non-linear progression of topics during a medical conversation |
| Medical concept extraction | • Noisy output of programs mapping text to UMLS |
| | • Tuning of parameters of tools used to map text to UMLS |
| | • Contextual inference (understanding the appropriate meaning of a word or phrase given the context) |
| | • Phenomena in spontaneous speech such as zero anaphora, thinking aloud, topic drift |
| Summarization | • Summarization of non-verbal unstructured communication |
| | • Integrating medical knowledge to identify relevant information |
| | • Contextual inference |
| | • Resolving conflicting information from the patient |
| | • Updating hypotheses as the patient discloses more information |
| | • Generating summaries to train a summarization ML model |
| Data collection | • Clinician and patient privacy concerns |
| | • Costly data collection and labeling |
| | • Patient consent to be audio recorded and use the data for research purposes |
| | • De-identification and anonymization of data |
| | • Expensive datasets |
| | • Data held privately as an intellectual property asset |
| | • Clinician reluctance to be recorded due to fear of legal liabilities and extra workload |
Fig. 2 Three examples of transitions of clinician–patient conversations lacking clear boundaries and structure. Medical conversation fragments are on the left and the respective topics are on the right. Medical conversations do not appear to follow a classic linear model of defined information seeking activities. The nonlinearity of activities requires digital scribes to link disparate information fragments, merge their content, and abstract coherent information summaries.
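The linking and merging step described in this caption can be sketched in Python under a strong simplifying assumption: each conversation fragment already carries a topic label, which in practice would have to come from a topic segmentation model. The function name `merge_fragments_by_topic`, the topic labels, and the example dialogue are all invented for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def merge_fragments_by_topic(fragments: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (topic, utterance) pairs by topic.

    Topics keep the order in which they first appear, and utterances keep
    their conversational order within each topic.
    """
    merged: Dict[str, List[str]] = {}
    for topic, utterance in fragments:
        merged.setdefault(topic, []).append(utterance)
    return merged


if __name__ == "__main__":
    # Invented example: the conversation returns to "symptoms" after a
    # detour through "medication", so the fragments are non-contiguous.
    conversation = [
        ("symptoms", "The headaches started last week."),
        ("medication", "I have been taking ibuprofen."),
        ("symptoms", "They are worse in the morning."),
        ("family history", "My mother had migraines."),
        ("medication", "It only helps for an hour or so."),
    ]
    for topic, utterances in merge_fragments_by_topic(conversation).items():
        print(f"{topic}: {' '.join(utterances)}")
```

Grouping by label covers only the linking part of the problem. Abstracting a coherent summary from each merged group, and resolving conflicts between fragments, remain open tasks in the sense of the challenges table.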