
Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation.

Yiqing Zhao, Sunyang Fu, Suzette J Bielinski, Paul A Decker, Alanna M Chamberlain, Veronique L Roger, Hongfang Liu, Nicholas B Larson.

Abstract

BACKGROUND: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events.
OBJECTIVE: The aim of this study was to develop a machine learning-based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing.
METHODS: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) drawn by stratified sampling from a population in Olmsted County, Minnesota (N=74,314).
RESULTS: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample.
CONCLUSIONS: We developed and validated a machine learning-based algorithm that performed well for identifying incident stroke and for determining the type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.

©Yiqing Zhao, Sunyang Fu, Suzette J Bielinski, Paul A Decker, Alanna M Chamberlain, Veronique L Roger, Hongfang Liu, Nicholas B Larson. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.03.2021.
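
The predictive values reported in RESULTS can be recomputed from the counts given there (43/50 and 96/100). The sketch below is illustrative only: the abstract does not say how the 95% CI was calculated, so the Wilson score interval is an assumption made here, and the function name `wilson_interval` is hypothetical. Under that assumption, the interval for 43/50 rounds to the reported 0.74-0.93.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval. Using the Wilson
    method is an assumption; the abstract does not name its CI method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the RESULTS paragraph of the abstract.
ppv = 43 / 50    # positive predictive value
npv = 96 / 100   # negative predictive value

ppv_low, ppv_high = wilson_interval(43, 50)

print(f"PPV = {ppv:.2f} (95% CI {ppv_low:.2f}-{ppv_high:.2f})")
print(f"NPV = {npv:.2f}")
```

The same function can be applied to the negative predictive value counts, although the abstract reports no interval for that figure to compare against.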


Keywords:  electronic health records; machine learning; natural language processing; stroke

Year:  2021        PMID: 33683212      PMCID: PMC7985804          DOI: 10.2196/22951

Source DB:  PubMed          Journal:  J Med Internet Res        ISSN: 1438-8871            Impact factor:   5.428


