Xiao Wu (1), Yuzhe Zhao (2), Dragomir Radev (3), Ajay Malhotra (4)
1. Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA.
2. Department of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
3. Department of Computer Science, Yale University, New Haven, CT, USA.
4. Department of Radiology and Biomedical Imaging, Yale University School of Medicine, Box 208042, Tompkins East 2, 333 Cedar St, New Haven, CT, 06520-8042, USA. ajay.malhotra@yale.edu
Abstract
PURPOSE: The highly structured nature of medical reports makes them well suited to automated large-scale patient identification. This study aimed to develop a natural language processing (NLP) model to retrospectively identify patients with presence and history of carotid stenosis (CS) from their ultrasound reports.
METHODS: Ultrasound reports from our institution dated between January 2016 and December 2017 were selected. To process the texts, we developed a parser to divide the raw text into fields. For the baseline method, we used bag-of-n-grams and term frequency-inverse document frequency (TF-IDF) as features for linear classifiers, with logistic regression as the baseline model. Convolutional and recurrent neural networks (CNN; RNN) with an attention mechanism were then applied to the dataset to improve classification accuracy.
RESULTS: We had 1220 ultrasound reports for training and 307 for testing, for a total of 1527 reports. For predicting history of CS, both the CNN and RNN-attention models had significantly higher specificity than logistic regression; the RNN-attention model also had a significantly higher F1 score and accuracy. For predicting presence of CS, all models achieved accuracy above 93%. The RNN-attention model achieved 95.4% accuracy, although the difference from logistic regression was not statistically significant; its specificity, however, was significantly higher than that of logistic regression.
CONCLUSIONS: We developed linear, CNN, and RNN models to predict history and presence of CS from ultrasound reports. We have demonstrated NLP to be an efficient, accurate approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and clinical research studies.
KEY POINTS:
• NLP models using either linear classifiers or neural networks can achieve good performance, with overall accuracy above 90% in predicting history and presence of carotid stenosis.
• Convolutional and recurrent neural networks, especially with additional features such as field awareness and an attention mechanism, outperform traditional linear classifiers.
• NLP is shown to be an efficient approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and further clinical research studies.
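The baseline features named in the methods (bag-of-n-grams weighted by term frequency-inverse document frequency) can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption made for illustration: the function names `ngrams` and `tfidf`, the tokenization rule, the unigram-plus-bigram range, the smoothed idf formula, and the toy sentences, which are invented and not taken from real reports. The abstract does not state which variant the study used, and the logistic regression step that would consume these weights is not shown.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercase, tokenize on letters/digits, and return all 1..n_max-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Return one {ngram: weight} dict per document.

    Weight = raw term frequency * smoothed idf, where
    idf = log((1 + N) / (1 + df)) + 1. This smoothing is one common
    variant and an assumption here, not necessarily the study's choice.
    """
    bags = [Counter(ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}
    return [{g: tf * idf[g] for g, tf in bag.items()} for bag in bags]


# Invented toy sentences (hypothetical), not real report text.
reports = [
    "No hemodynamically significant stenosis of the carotid arteries.",
    "Greater than 70 percent stenosis of the right internal carotid artery.",
    "History of carotid endarterectomy. No significant stenosis.",
]
features = tfidf(reports)
```

In this sketch an n-gram that occurs in every document, such as "stenosis", receives the minimum idf of 1, while an n-gram confined to a single document, such as the bigram "carotid artery", is weighted more heavily. A linear classifier trained on such weights can therefore lean on the rarer, more discriminative phrases.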
Keywords:
Carotid stenosis; Natural language processing; Ultrasonography, Doppler
Authors: Luca Saba; Skandha S Sanagala; Suneet K Gupta; Vijaya K Koppula; Amer M Johri; Narendra N Khanna; Sophie Mavrogeni; John R Laird; Gyan Pareek; Martin Miner; Petros P Sfikakis; Athanasios Protogerou; Durga P Misra; Vikas Agarwal; Aditya M Sharma; Vijay Viswanathan; Vijay S Rathore; Monika Turk; Raghu Kolluri; Klaudija Viskovic; Elisa Cuadrado-Godia; George D Kitas; Neeraj Sharma; Andrew Nicolaides; Jasjit S Suri Journal: Ann Transl Med Date: 2021-07
Authors: Ayoub Bagheri; T Katrien J Groenhof; Folkert W Asselbergs; Saskia Haitjema; Michiel L Bots; Wouter B Veldhuis; Pim A de Jong; Daniel L Oberski Journal: J Healthc Eng Date: 2021-07-09 Impact factor: 2.682