Emily Wheater (1), Grant Mair (1), Cathie Sudlow (1,2,3), Beatrice Alex (4,5), Claire Grover (4,5), William Whiteley (6,7).
1. Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK.
2. Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK.
3. Health Data Research UK Scotland, Edinburgh, UK.
4. The Alan Turing Institute, British Library, 96 Euston Road, London, UK.
5. Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK.
6. Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK. william.whiteley@ed.ac.uk.
7. Nuffield Department of Population Health, University of Oxford, Oxford, UK. william.whiteley@ed.ac.uk.
Abstract
BACKGROUND: Manual coding of phenotypes in brain radiology reports is time-consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging phenotypes in radiology reports written in routine clinical practice in the UK National Health Service (NHS).
METHODS: We used anonymized free-text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and then evaluated it further in the reports from the regional hospital.
RESULTS: Agreement between expert readers was excellent (Cohen's κ = 0.93) in both datasets. In the final test dataset (n = 700) of unseen regional hospital reports, the algorithm performed very well for a report of any ischaemic stroke [sensitivity 89% (95% CI: 81-94); positive predictive value (PPV) 85% (95% CI: 76-90); specificity 100% (95% CI: 99-100)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80-99); PPV 72% (95% CI: 55-84); specificity 100% (95% CI: 99-100)]; brain tumours [sensitivity 96% (95% CI: 87-99); PPV 84% (95% CI: 73-91); specificity 100% (95% CI: 99-100)]; and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (n = 15,015, 14%) and old, deep ischaemic strokes (n = 10,636, 10%) were the commonest findings.
CONCLUSIONS: An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.
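The performance figures in the abstract (sensitivity, PPV, specificity with 95% confidence intervals, and Cohen's κ for reader agreement) can be computed from simple counts. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the Wilson score interval is an assumption (the abstract does not state which interval method was used), and the example counts are invented for illustration rather than taken from the study.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed method; z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def report_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Point estimate and interval for each metric, from a 2x2 table
    comparing algorithm output against the expert reference standard."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "ppv": (tp / (tp + fp), wilson_interval(tp, tp + fp)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


def cohen_kappa(reader_a: Sequence[int], reader_b: Sequence[int]) -> float:
    """Cohen's kappa for two readers labelling the same reports."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must label the same non-empty set of reports")
    n = len(reader_a)
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    labels = set(reader_a) | set(reader_b)
    expected = sum(
        (list(reader_a).count(label) / n) * (list(reader_b).count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both readers used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented counts for one phenotype in a 700-report test set (illustration only).
    for name, (estimate, (low, high)) in report_metrics(tp=89, fp=16, fn=11, tn=584).items():
        print(f"{name}: {estimate:.2f} (95% CI: {low:.2f}-{high:.2f})")
```

Reporting each metric per phenotype, as the abstract does, matters because rare findings yield small denominators and therefore wide intervals, which a single pooled accuracy figure would hide.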