Matthew I Miller1, Agni Orfanoudaki2, Michael Cronin1, Hanife Saglam3, Ivy So Yeon Kim4, Oluwafemi Balogun4,5, Maria Tzalidi6, Kyriakos Vasilopoulos6, Georgia Fanaropoulou6, Nina M Fanaropoulou7, Jack Kalin1, Meghan Hutch8,9, Brenton R Prescott4, Benjamin Brush10, Emelia J Benjamin1,5, Min Shin11, Asim Mian12, David M Greer1,4, Stelios M Smirnakis9,13,14, Charlene J Ong15,16,17,18,19.
1. Department of Neurology, Boston University School of Medicine, 85 E. Concord St., Suite 1116, Boston, MA, 02118, USA.
2. Saïd Business School, University of Oxford, Oxford, UK.
3. Department of Neurology, West Virginia University School of Medicine, Morgantown, WV, USA.
4. Boston Medical Center, Boston, MA, USA.
5. Boston University School of Public Health, Boston, MA, USA.
6. School of Medicine, University of Crete, Heraklion, Greece.
7. School of Medicine, Faculty of Health Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece.
8. Department of Preventive Medicine, Northwestern University, Chicago, IL, USA.
9. Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA.
10. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
11. Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA.
12. Department of Radiology, Boston Medical Center, Boston, MA, USA.
13. Harvard Medical School, Boston, MA, USA.
14. Jamaica Plain Veterans Administration Hospital, Boston, MA, USA.
15. Department of Neurology, Boston University School of Medicine, 85 E. Concord St., Suite 1116, Boston, MA, 02118, USA. cjong@bu.edu.
16. Boston Medical Center, Boston, MA, USA. cjong@bu.edu.
17. Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA. cjong@bu.edu.
18. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA. cjong@bu.edu.
19. Harvard Medical School, Boston, MA, USA. cjong@bu.edu.
Abstract
BACKGROUND: Abstraction of critical data from unstructured radiologic reports using natural language processing (NLP) is a powerful tool for automating the detection of important clinical features and enhancing research efforts. We present a set of NLP approaches to identify critical findings in patients with acute ischemic stroke from radiology reports of computed tomography (CT) and magnetic resonance imaging (MRI).
METHODS: We trained machine learning classifiers to identify the categorical outcomes edema, midline shift (MLS), hemorrhagic transformation, and parenchymal hematoma, and we built rule-based systems (RBS) to identify intraventricular hemorrhage (IVH) and continuous MLS measurements within CT/MRI reports. Models were developed on a derivation cohort of 2289 reports from 550 individuals with acute middle cerebral artery territory ischemic strokes and externally validated on reports from a separate institution and from patients with ischemic strokes in any vascular territory.
RESULTS: In all data sets, a deep neural network with pretrained biomedical word embeddings (BioClinicalBERT) achieved the highest discrimination performance for binary prediction of edema (area under the precision-recall curve [AUPRC] > 0.94), MLS (AUPRC > 0.98), hemorrhagic conversion (AUPRC > 0.89), and parenchymal hematoma (AUPRC > 0.76). BioClinicalBERT outperformed lasso regression (p < 0.001) for all outcomes except parenchymal hematoma (p = 0.755). Tailored RBS for IVH and continuous MLS outperformed BioClinicalBERT (p < 0.001) and linear regression (p < 0.001), respectively.
CONCLUSIONS: Our study demonstrates robust performance and external validity of a core NLP tool kit for identifying both categorical and continuous outcomes of ischemic stroke from unstructured radiographic text data. Medically tailored NLP methods have multiple important big data applications, including scalable electronic phenotyping, augmentation of clinical risk prediction models, and facilitation of automatic alert systems in the hospital setting.
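The abstract does not spell out the rules behind the RBS for IVH and continuous MLS. As a rough illustration only, the sketch below shows what a minimal regular-expression system of that kind could look like: reports are split into sentences, a number with a length unit is captured near the phrase "midline shift" and converted to millimeters, and IVH mentions are flagged unless a negation cue precedes them in the same sentence. All patterns and function names (`extract_mls_mm`, `detect_ivh`, `split_sentences`) are hypothetical and are not the authors' published rules.

```python
import re
from typing import List, Optional

# Hypothetical, illustrative rules; NOT the study's actual RBS patterns.
_NUM = r"(\d+(?:\.\d+)?)\s*(mm|cm)"  # a number followed by a length unit

# e.g. "midline shift of 5.2 mm", "MLS measures 4 mm" (stops at '.' or ';')
MLS_AFTER = re.compile(r"(?:midline\s+shift|\bMLS\b)[^.;]*?" + _NUM, re.IGNORECASE)
# e.g. "6 mm of rightward midline shift", "4 mm leftward midline shift"
MLS_BEFORE = re.compile(_NUM + r"\s+(?:of\s+)?(?:\w+\s+){0,2}midline\s+shift", re.IGNORECASE)
MLS_MENTION = re.compile(r"midline\s+shift|\bMLS\b", re.IGNORECASE)
IVH_MENTION = re.compile(
    r"intraventricular\s+(?:hemorrhage|extension|blood)|\bIVH\b", re.IGNORECASE
)
# Crude negation cues, a simplification in the spirit of NegEx.
NEGATION_CUES = re.compile(r"\b(?:no|without|negative\s+for|absence\s+of)\b", re.IGNORECASE)


def split_sentences(text: str) -> List[str]:
    """Split on sentence-final punctuation followed by whitespace (keeps '5.2' intact)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def _is_negated(sentence: str, start: int) -> bool:
    """True if a negation cue appears earlier in the same sentence."""
    return NEGATION_CUES.search(sentence[:start]) is not None


def extract_mls_mm(report: str) -> Optional[float]:
    """Largest MLS value in mm; 0.0 if MLS is only mentioned as absent; None if never mentioned."""
    values: List[float] = []
    negated = False
    for sentence in split_sentences(report):
        for pattern in (MLS_AFTER, MLS_BEFORE):
            for m in pattern.finditer(sentence):
                value = float(m.group(1))
                # Normalize centimeters to millimeters.
                values.append(value * 10 if m.group(2).lower() == "cm" else value)
        mention = MLS_MENTION.search(sentence)
        if mention and _is_negated(sentence, mention.start()):
            negated = True
    if values:
        return max(values)
    return 0.0 if negated else None


def detect_ivh(report: str) -> bool:
    """Flag IVH if any non-negated IVH mention occurs in the report."""
    for sentence in split_sentences(report):
        for m in IVH_MENTION.finditer(sentence):
            if not _is_negated(sentence, m.start()):
                return True
    return False


if __name__ == "__main__":
    report = ("Large left MCA infarct with 6 mm of rightward midline shift. "
              "No intraventricular hemorrhage.")
    print(extract_mls_mm(report))  # 6.0
    print(detect_ivh(report))      # False
```

Hand-written rules like these are brittle: for instance, an unrelated measurement later in a sentence that also mentions midline shift can be captured. Any such system would need to be tuned and checked against annotated reports before use.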