
Clinical Text Data in Machine Learning: Systematic Review.

Irena Spasic, Goran Nenadic.

Abstract

BACKGROUND: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated the feasibility of unlocking evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data.
OBJECTIVE: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice.
METHODS: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics.
RESULTS: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, and a handful of studies used more. Relatively small datasets were used for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning, which iteratively samples a subset of the data for manual annotation, was explored as a strategy for minimizing annotation effort while maximizing the predictive performance of the model. Supervised learning was used successfully where clinical codes, integrated with free-text notes in electronic health records, served as class labels. Similarly, distant supervision was used to annotate raw text automatically from an existing knowledge base. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of the data. Besides being small in volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. The classification results were most commonly used to support phenotyping, prognosis, care improvement, resource management, and surveillance.
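The active learning strategy summarized in the results can be illustrated by the query step of pool-based uncertainty sampling, one common way of choosing which documents to annotate next. This is a minimal sketch under stated assumptions, not the method of any reviewed study: it assumes a binary classifier that outputs P(positive) for each unlabeled note, and the function names, note ids, and probabilities are hypothetical. It selects the notes with the highest predictive entropy; the retraining step that closes each iteration is omitted.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a binary prediction with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def select_for_annotation(pool_ids: Sequence[str],
                          pool_probs: Sequence[float],
                          batch_size: int) -> list[str]:
    """Uncertainty sampling: return the ids of the `batch_size` unlabeled
    documents whose predictions are most uncertain (highest entropy).
    A human would annotate these next; the model would then be retrained
    on the enlarged labeled set and the loop repeated (not shown here)."""
    ranked = sorted(zip(pool_ids, pool_probs),
                    key=lambda pair: binary_entropy(pair[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:batch_size]]


# Hypothetical classifier outputs P(condition present) for five unlabeled notes.
ids = ["note1", "note2", "note3", "note4", "note5"]
probs = [0.97, 0.52, 0.10, 0.45, 0.80]
print(select_for_annotation(ids, probs, batch_size=2))  # ['note2', 'note4']
```

Confident predictions (0.97, 0.10) are skipped because labeling them would teach the model little; annotation effort goes to the cases near the decision boundary.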
CONCLUSIONS: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing annotation effort. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.

©Irena Spasic, Goran Nenadic. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 31.03.2020.
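Distant supervision, as summarized in the results, replaces manual labels with labels derived from an existing knowledge base. One common form, dictionary matching for named entity recognition, can be sketched in Python as follows. This is an illustration under stated assumptions rather than any reviewed study's pipeline: the `KNOWLEDGE_BASE` entries, semantic types, and function names are hypothetical, and a real system would draw on a clinical terminology and use a proper tokenizer.

```python
import re

# Hypothetical miniature knowledge base mapping token sequences to semantic
# types. Entries are illustrative only; a real system would use a clinical
# terminology instead.
KNOWLEDGE_BASE = {
    ("atrial", "fibrillation"): "DISORDER",
    ("metformin",): "DRUG",
    ("chest", "pain"): "SYMPTOM",
}
MAX_TERM_LEN = max(len(term) for term in KNOWLEDGE_BASE)


def tokenize(text: str) -> list[str]:
    """Lowercased alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distant_label(text: str) -> list[tuple[str, str]]:
    """Annotate raw text with BIO tags by greedy longest match against the
    knowledge base. No manual annotation is involved, so the labels are noisy:
    any span matching a KB term is tagged regardless of context."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            label = KNOWLEDGE_BASE.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                break
        else:
            i += 1  # no KB term starts at this token
    return list(zip(tokens, tags))


print(distant_label("History of atrial fibrillation; started on metformin."))
```

Because matching ignores context, the labels are noisy: in "Denies chest pain." the span *chest pain* is still tagged as a symptom even though it is negated. Tolerating this noise is the trade-off for avoiding manual annotation.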


Keywords:  machine learning; medical informatics; medical informatics applications; natural language processing

Year:  2020        PMID: 32229465     DOI: 10.2196/17984

Source DB:  PubMed          Journal:  JMIR Med Inform


  33 in total

1.  TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Authors:  Shorabuddin Syed; Adam Jackson Angel; Hafsa Bareen Syeda; Carole Franc Jennings; Joseph VanScoy; Mahanazuddin Syed; Melody Greer; Sudeepa Bhattacharyya; Shaymaa Al-Shukri; Meredith Zozus; Fred Prior; Benjamin Tharian
Journal:  Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap       Date:  2022-02

2.  Analyzing patient experiences using natural language processing: development and validation of the artificial intelligence patient reported experience measure (AI-PREM).

Authors:  Marieke M van Buchem; Olaf M Neve; Ilse M J Kant; Ewout W Steyerberg; Hileen Boosman; Erik F Hensen
Journal:  BMC Med Inform Decis Mak       Date:  2022-07-15       Impact factor: 3.298

3.  Intelligent Telehealth in Pharmacovigilance: A Future Perspective. [Review]

Authors:  Heba Edrees; Wenyu Song; Ania Syrowatka; Aurélien Simona; Mary G Amato; David W Bates
Journal:  Drug Saf       Date:  2022-05-17       Impact factor: 5.228

4.  Identifying Patients With Delirium Based on Unstructured Clinical Notes: Observational Study.

Authors:  Wendong Ge; Haitham Alabsi; Aayushee Jain; Elissa Ye; Haoqi Sun; Marta Fernandes; Colin Magdamo; Ryan A Tesh; Sarah I Collens; Amy Newhouse; Lidia Mvr Moura; Sahar Zafar; John Hsu; Oluwaseun Akeju; Gregory K Robbins; Shibani S Mukerji; Sudeshna Das; M Brandon Westover
Journal:  JMIR Form Res       Date:  2022-06-24

5.  Identifying stroke diagnosis-related features from medical imaging reports to improve clinical decision-making support.

Authors:  Xiaowei Xu; Lu Qin; Lingling Ding; Chunjuan Wang; Meng Wang; Zixiao Li; Jiao Li
Journal:  BMC Med Inform Decis Mak       Date:  2022-10-20       Impact factor: 3.298

6.  Extracting Medical Information from Paper COVID-19 Assessment Forms.

Authors:  Jacob D Schultz; Colin G White-Dzuro; Cheng Ye; Joseph R Coco; Janet M Myers; Claude Shackelford; S Trent Rosenbloom; Daniel Fabbri
Journal:  Appl Clin Inform       Date:  2021-03-10       Impact factor: 2.342

7.  Text Data Augmentation for Deep Learning.

Authors:  Connor Shorten; Taghi M Khoshgoftaar; Borko Furht
Journal:  J Big Data       Date:  2021-07-19

8.  Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition.

Authors:  Jianfu Li; Yujia Zhou; Xiaoqian Jiang; Karthik Natarajan; Serguei Vs Pakhomov; Hongfang Liu; Hua Xu
Journal:  J Am Med Inform Assoc       Date:  2021-09-18       Impact factor: 7.942

9.  Resuscitation after global brain ischemia-anoxia. [Review]

Authors:  P Safar; A Bleyaert; E M Nemoto; J Moossy; J V Snyder
Journal:  Crit Care Med       Date:  1978 Jul-Aug       Impact factor: 9.296

10.  Artificial Intelligence-Powered Search Tools and Resources in the Fight Against COVID-19.

Authors:  Larry J Kricka; Sergei Polevikov; Jason Y Park; Paolo Fortina; Sergio Bernardini; Daniel Satchkov; Valentin Kolesov; Maxim Grishkov
Journal:  EJIFCC       Date:  2020-06-02
