
An artificial intelligence approach to COVID-19 infection risk assessment in virtual visits: A case report.

Jihad S Obeid^1,2, Matthew Davis^3, Matthew Turner^3, Stephane M Meystre^2,4, Paul M Heider^2, Edward C O'Bryan^5, Leslie A Lenert^2,6

Abstract

OBJECTIVE: In an effort to improve the efficiency of computer algorithms applied to screening for coronavirus disease 2019 (COVID-19) testing, we used natural language processing and artificial intelligence-based methods with unstructured patient data collected through telehealth visits.
MATERIALS AND METHODS: After segmenting and parsing documents, we conducted an analysis of overrepresented words in patient-reported symptoms. We then developed a word embedding-based convolutional neural network for predicting COVID-19 test results based on patients' self-reported symptoms.
RESULTS: Text analytics revealed that concepts such as smell and taste were more prevalent than expected in patients testing positive. As a result, screening algorithms were adapted to include these symptoms. The deep learning model yielded an area under the receiver-operating characteristic curve of 0.729 for predicting positive results and was subsequently applied to prioritize testing appointment scheduling.
CONCLUSIONS: Informatics tools such as natural language processing and artificial intelligence methods can have significant clinical impacts when applied to data streams early in the development of clinical systems for outbreak response.


Keywords: AI; COVID-19; artificial intelligence; risk assessment; text analytics


Year: 2020 PMID: 32449766 PMCID: PMC7313981 DOI: 10.1093/jamia/ocaa105

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

INTRODUCTION

Coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a member of a family of highly pathogenic human coronaviruses. This novel coronavirus is particularly infectious and caused a global pandemic that reached the United States early in the course of the outbreak. One of the linchpins of controlling the spread of COVID-19 is aggressive testing. Testing for SARS-CoV-2 is resource-intensive: it requires collecting a nasopharyngeal swab specimen under biosafety level 2 conditions and laboratory capacity for reverse-transcription polymerase chain reaction (RT-PCR) assay of SARS-CoV-2 RNA. As individual states in the United States ramp up testing facilities, prioritizing testing based on risk of exposure, clinical symptoms, and preexisting risk factors has become imperative. The Medical University of South Carolina (MUSC) Health system established a free virtual care consultation and screening service for symptomatic individuals in the state of South Carolina. The virtual care visits are conducted through a telehealth system (Zipnosis, Minneapolis, MN), in which providers screen patients and prioritize them for testing at a drive-through testing facility; the data captured by this system include patient-entered text. Because testing was a limited resource, patients faced significant delays in scheduling tests, even with computer screening. The informatics research team at MUSC, as part of its outbreak response strategy, undertook the task of improving access to and use of the data in Zipnosis notes to prioritize and inform testing.
One of the main challenges of this task was that the information piped into the electronic health record (EHR) was not in a structured format but in a text “blob” that contained both information from a template-based patient-facing form and free text entered by the patient. The use of EHR data to identify specific clinical phenotypes has gained significant momentum in recent years. Characterizing patients based on EHR data serves several purposes, including, but not limited to, clinical decision support, population health studies, and identification of participants for research recruitment. As exemplified by the virtual care data feed at MUSC, a large portion of the information within the EHR resides in free-text form in numerous types of clinical notes. In addition to well-established natural language processing (NLP) pipelines developed for extracting information from unstructured data, machine learning–based clinical text classification approaches have also been used to characterize patients using EHR data. More recently, deep learning approaches such as convolutional neural networks (CNNs) have been used both for predictive modeling in the clinical domain and for phenotyping through clinical text classification. In this case report, we describe the application of text analysis and deep learning methods to improve our testing algorithms.

MATERIALS AND METHODS

Context

The virtual urgent care program for COVID-19 was established by MUSC Health, based on Centers for Disease Control and Prevention guidelines, to screen and evaluate presumptive cases in our region. To minimize exposure and lessen the risk of nosocomial infections, patients are advised to use MUSC Health virtual urgent care for screening and medical advice from trained MUSC Health care providers through Zipnosis, a secure online telehealth virtual care system. Referral for testing of patients at high risk or those who need inpatient care is determined during the consultation with the providers. The data from the virtual care system are fed into our EHR system (Epic Systems Corporation, Verona, WI) via a proprietary application programming interface (HL7 v2.x). Data were subsequently extracted from Epic Clarity and moved to a cloud-based “data lake” analytics infrastructure in Azure (Microsoft, Redmond, WA).

Patient population

We included patients with virtual care visits with COVID-19 listed as the reason for the visit. Patients without test results within 14 days following the visit were excluded. For patients with multiple test results, only the final result was considered. The total number of patients included in our analysis was 6813, of whom 498 tested positive and 6315 tested negative.
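The inclusion rules above can be expressed as a small filtering step. The sketch below is a minimal illustration, not the authors' code: it assumes one COVID-19 visit per patient, interprets the 14-day rule as "a result dated within 14 days after the visit," and takes "final result" to mean the latest such result. The `TestResult` record and `build_cohort` function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TestResult:
    patient_id: str
    result_date: date
    positive: bool


def build_cohort(visits, results, window_days=14):
    """Return {patient_id: label} using the latest result within the window.

    visits:  dict mapping patient_id -> date of the COVID-19 virtual visit
             (assumption: one visit per patient)
    results: iterable of TestResult records
    Patients with no result inside the window are simply absent (excluded).
    """
    window = timedelta(days=window_days)
    latest = {}
    for r in results:
        visit_day = visits.get(r.patient_id)
        if visit_day is None:
            continue  # result for someone without a qualifying visit
        if not (visit_day <= r.result_date <= visit_day + window):
            continue  # outside the assumed 14-day follow-up window
        prev = latest.get(r.patient_id)
        if prev is None or r.result_date > prev.result_date:
            latest[r.patient_id] = r  # keep the final (latest) result
    return {pid: r.positive for pid, r in latest.items()}
```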

Text processing

The telehealth system notes were preprocessed using a simple Apache UIMA–based NLP application. A pattern matching–based algorithm split the notes into sections and labeled them, so that boilerplate information and instructions from the Zipnosis template could be filtered out while the relevant sections were retained. Examples of such header-demarcated sections included a “Patient Summary:” section, in which patients reported their symptoms, and a section labeled “Pertinent COVID-19 information,” in which travel information was reported. Simple pattern matching was also used to de-identify the notes to a limited data set, replacing patient names, phone numbers, and addresses with generic tokens to protect patient privacy. Diagnosis codes demarcated by the template were extracted and appended to the end of the clinical note. Stop words were removed prior to tokenization.
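A minimal sketch of header-based section filtering and pattern-based de-identification follows. It is written in plain Python rather than as a UIMA annotator and assumes each section header sits on its own line. The two retained headers come from the text above; the "Instructions" boilerplate header, the phone-number pattern, and passing known patient names in from the structured record are assumptions for illustration. Address replacement is omitted.

```python
import re

# Headers named in the text plus one hypothetical boilerplate header ("Instructions").
HEADERS = ["Patient Summary", "Pertinent COVID-19 information", "Instructions"]
KEEP = {"Patient Summary", "Pertinent COVID-19 information"}
# A header is a whole line consisting of the header text, an optional colon, and spaces.
HEADER_RE = re.compile(
    r"^(%s):?[ \t]*$" % "|".join(re.escape(h) for h in HEADERS), re.MULTILINE
)
# Assumed North American phone format, e.g. (843) 555-1234 or 843-555-1234.
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b")


def split_sections(note):
    """Return {header: body} for each header-demarcated section of a note."""
    matches = list(HEADER_RE.finditer(note))
    sections = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        sections[m.group(1)] = note[m.end():end].strip()
    return sections


def deidentify(text, names):
    """Replace phone numbers and known patient names with generic tokens."""
    text = PHONE_RE.sub("[PHONE]", text)
    for name in names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    return text


def preprocess(note, names):
    """Keep only the relevant sections, in template order, then de-identify."""
    sections = split_sections(note)
    kept = [sections[h] for h in HEADERS if h in KEEP and h in sections]
    return deidentify("\n".join(kept), names)
```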

Text analytics

As part of the analysis prior to machine learning, we examined differences in word frequencies between clinical notes with positive test results and notes with known negative results. We performed a chi-square analysis to identify words that were overrepresented in one corpus relative to the other. This analysis provided insight into key words associated with positive COVID-19 test results.
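The overrepresentation analysis amounts to one 2 × 2 contingency table per word (positive vs negative notes, containing vs not containing the word). The sketch below assumes document frequency and whitespace tokenization, neither of which is specified above. It computes the Pearson chi-square statistic without continuity correction and the 1-degree-of-freedom p-value through the identity P(χ² > x) = erfc(√(x/2)), so it needs only the standard library.

```python
import math
from collections import Counter


def doc_freq(notes):
    """For each word, count how many notes contain it at least once."""
    counts = Counter()
    for note in notes:
        counts.update(set(note.lower().split()))
    return counts


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def overrepresented(pos_notes, neg_notes, top=10):
    """Words more frequent in positive notes, ranked by chi-square statistic."""
    pos_df, neg_df = doc_freq(pos_notes), doc_freq(neg_notes)
    n_pos, n_neg = len(pos_notes), len(neg_notes)
    rows = []
    for word in pos_df.keys() | neg_df.keys():
        a, c = pos_df[word], neg_df[word]   # notes containing the word
        b, d = n_pos - a, n_neg - c         # notes not containing it
        if a / n_pos <= c / n_neg:
            continue                        # not overrepresented in positives
        chi2 = chi_square_2x2(a, b, c, d)
        p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
        rows.append((word, chi2, p))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]
```

With real corpora of thousands of notes, the same ranking would also call for a multiple-comparison correction, since one test is run per vocabulary word.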

Model architecture

We used Keras and TensorFlow version 2.0 to construct and train the CNN model. To construct the features for the deep learning models, the text sequences were tokenized and padded with zeros at the end of each sequence to match the length of the longest sequence in the training set. The input layer had a dimension of 628, slightly exceeding the maximum length of the input token sequences. We used word2vec for the word-embedding layer: the embedding weights were initialized with 200-dimensional word vectors from a word2vec model pretrained on a PubMed corpus, and the embedding layer had a dropout rate of 0.3. This was followed by a convolutional layer with multiple filter sizes (3, 4, and 5) in parallel, with 100 filters each, ReLU (rectified linear unit) activation, a stride of 1, and global max-pooling; the pooled outputs were concatenated and fed into a fully connected 512-node hidden layer with ReLU activation and a dropout rate of 0.3. Finally, the output layer had a single node with a sigmoid activation function for binary classification. Several hyperparameter configurations were tried, for example, randomly initialized embeddings (uniform distribution) of 50, 100, or 200 dimensions; 50, 80, 100, or 200 filters in the convolutional layer; kernel sizes of (2, 3, 4), (3, 4, 5), or (4, 5, 6); and a variety of learning rates and learning rate reduction factors. All runs were tracked with MLflow, and the model with the best performance on the hold-out set was selected. The final learning rate was 4 × 10⁻⁴, with a reduction factor of 0.5 on performance plateau.
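To make the layer sizes above concrete, the sketch below traces output shapes and trainable-parameter counts through the described network. It is a bookkeeping aid, not the authors' Keras code. It assumes "valid" (unpadded) convolutions with stride 1, an embedding matrix with one row per vocabulary entry (the vocabulary size is not reported, so it is a parameter), and trainable embeddings; dropout layers add no parameters and are omitted. The hypothetical `post_pad` helper mirrors end-of-sequence zero padding.

```python
def post_pad(sequences, maxlen, value=0):
    """Zero-pad token-id sequences at the end to a fixed length."""
    out = []
    for seq in sequences:
        seq = list(seq)[:maxlen]  # truncation is an assumption; inputs here fit
        out.append(seq + [value] * (maxlen - len(seq)))
    return out


def cnn_shapes(vocab_size, seq_len=628, emb_dim=200, kernel_sizes=(3, 4, 5),
               n_filters=100, hidden=512):
    """List (layer name, output shape, trainable parameters) for the described CNN."""
    layers = [("embedding", (seq_len, emb_dim), vocab_size * emb_dim)]
    pooled = 0
    for k in kernel_sizes:
        conv_len = seq_len - k + 1                     # valid convolution, stride 1
        params = k * emb_dim * n_filters + n_filters   # kernel weights + biases
        layers.append((f"conv{k}", (conv_len, n_filters), params))
        layers.append((f"maxpool{k}", (n_filters,), 0))  # global max over positions
        pooled += n_filters
    layers.append(("concat", (pooled,), 0))            # merged parallel branches
    layers.append(("dense", (hidden,), pooled * hidden + hidden))
    layers.append(("output", (1,), hidden + 1))        # single sigmoid unit
    return layers
```

Under these assumptions, each convolution branch contributes k × 200 × 100 + 100 parameters (60,100 for k = 3), and the concatenated pooled vector of 3 × 100 = 300 features feeds the 512-node layer with 300 × 512 + 512 = 154,112 parameters.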

Training and evaluation

The data were partitioned into 3 sets by randomly assigning patients to a training set (60%), a cross-validation set (16%), and a hold-out test set (24%), with no overlap of patients across the 3 partitions. The cross-validation set was used for validation during training epochs; the test set was used only after model fitting to assess performance. A logistic regression model using bag-of-words count vectors as features served as a comparator. Performance was evaluated using the area under the receiver-operating characteristic curve (AUC). To assess precision, recall, and F1 score, we downsampled the test set to balance the number of positive and negative notes: we performed 100 cycles of random selection of 120 cases from each class and calculated the mean AUC, precision, recall, and F1 score. We used a probability threshold of 0.2 to optimize the F1 score. We later examined the output of the model on all patients with virtual care visits with COVID-19 listed as the reason for the visit to assess its discriminant power across 3 risk categories based on the predicted probability (low if P ≤ .2, medium if .2 < P < .9, high if P ≥ .9).
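The partitioning and balanced resampling steps can be sketched as follows. This is an illustrative reconstruction rather than the authors' evaluation code: the seeds, the rounding of split sizes, sampling without replacement within each cycle, and treating P > 0.2 as a positive prediction (consistent with the low-risk category P ≤ .2) are assumptions. AUC is computed with the rank-based (Mann–Whitney) formulation instead of a library call.

```python
import random


def split_patients(patient_ids, seed=0):
    """Random patient-level 60/16/24 split; each patient lands in exactly one set."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = round(0.60 * n), round(0.16 * n)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def prf(labels, scores, threshold=0.2):
    """Precision, recall, and F1 when P > threshold counts as a positive prediction."""
    pred = [s > threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y and p)
    fp = sum(1 for y, p in zip(labels, pred) if not y and p)
    fn = sum(1 for y, p in zip(labels, pred) if y and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def balanced_eval(labels, scores, per_class=120, cycles=100, threshold=0.2, seed=0):
    """Mean (AUC, precision, recall, F1) over repeated balanced test subsamples."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    totals = [0.0, 0.0, 0.0, 0.0]
    for _ in range(cycles):
        idx = rng.sample(pos, per_class) + rng.sample(neg, per_class)
        y = [labels[i] for i in idx]
        s = [scores[i] for i in idx]
        for j, v in enumerate((auc(y, s), *prf(y, s, threshold))):
            totals[j] += v
    return tuple(t / cycles for t in totals)
```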

Ethical considerations

This project was intended to improve the screening process of our virtual care visit program at MUSC for COVID-19 testing and did not involve a systematic investigation or experimental procedure. Therefore, the project was determined to be quality improvement and was not subject to Institutional Review Board for Human Research approval, based on the definition of research under the Common Rule [45 CFR 46.102(d)].

RESULTS

Word frequencies

The results from the analysis of overrepresented keywords in clinical notes with positive test results as compared with notes from those with negative test results are shown in Figure 1. All results are highly statistically significant. For example, the words smell, taste, sense, and lost are mentioned at a much higher frequency (P < .0001) by patients who tested positive for SARS-CoV-2 vs those who did not.

Figure 1.

Top 10 words that are overrepresented in patients who tested positive for COVID-19 (coronavirus disease 2019), showing relevant words expressed by patients during the virtual care visit intake process.

Deep learning performance

The deep learning model performed very well in classifying the training set (AUC = 0.962) but, because of the small size of the training sample, yielded only moderate performance on the hold-out test set, with an AUC of 0.729 (Figure 2). This compares with an AUC of 0.704 for the logistic regression model on the same dataset, a modest but important improvement.

Figure 2.

The area under the receiver-operating characteristic curve (AUC) of the convolutional neural network for predicting SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2)–positive results based on the text content of the virtual care visit notes.

The evaluation metrics based on the repeated cycles of randomly selected balanced test sets are shown in Table 1. The CNN outperformed the logistic regression model on all metrics except precision.

Table 1.

Mean values for AUC, precision, recall, and F1 score based on repeated balanced test sets

Model	AUC (95% CI)	Precision	Recall	F1 score
CNN	0.732 (0.697-0.767)	0.754	0.453	0.566 (0.541-0.586)
LR	0.707 (0.665-0.739)	0.800	0.227	0.354 (0.329-0.377)

AUC: area under the receiver-operating characteristic curve; CI: confidence interval; CNN: convolutional neural network; LR: logistic regression.

The overall rate of positive tests among patients seen via virtual care was 5.6%. In discussions with the telehealth providers, we decided to divide patients into 3 risk categories using cutoffs of 0.2 and 0.9. These cutoffs yield a low-risk group with a positive test rate below 3% and a high-risk group with a positive test rate of around 60%, resulting in a manageable follow-up load of a few dozen calls per day. Even though the accuracy of the model was only moderate, it was still useful in discriminating patients into these risk categories (Table 2). Looking across all patients with virtual care visits who were tested, we were able to identify a high-risk group that was potentially useful in prioritizing tests.

Table 2.

Analysis of discriminant power of the model

Category	Tested	Positive	% Positive
High	475	289	60.84
Medium	1,915	127	6.63
Low	9,401	244	2.60
Total	11,791	660	5.60
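The three-tier stratification is a simple thresholding of the predicted probability, and each "% Positive" value in Table 2 is positive/tested within a category (for example, 289/475 ≈ 60.84% for the high-risk group). The sketch below assumes the boundary conventions stated in the Methods (P ≤ .2 is low, P ≥ .9 is high); the function names are hypothetical.

```python
def risk_category(p, low_cut=0.2, high_cut=0.9):
    """Map a predicted probability to the low/medium/high risk groups."""
    if p <= low_cut:
        return "Low"
    if p >= high_cut:
        return "High"
    return "Medium"


def positivity_by_category(probs, positives):
    """Return {category: (tested, positive, % positive)} plus a Total row."""
    table = {c: [0, 0] for c in ("High", "Medium", "Low")}
    for p, y in zip(probs, positives):
        row = table[risk_category(p)]
        row[0] += 1          # tested
        row[1] += int(y)     # tested positive
    table["Total"] = [sum(r[0] for r in table.values()),
                      sum(r[1] for r in table.values())]
    return {c: (t, k, round(100 * k / t, 2) if t else 0.0)
            for c, (t, k) in table.items()}
```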


DISCUSSION

COVID-19 has brought new scenarios to medicine in which patients are systematically screened by computer algorithms for eligibility for viral testing and subsequent care. This case report demonstrates how methods previously regarded as “research” methods can be applied rapidly to improve an institution’s computer screening algorithms for COVID-19. Predicting positive results based on clinical text is challenging. The clinical notes can contain a significant amount of noise, both from the templated text and from patient-entered information that is often irrelevant to the result of SARS-CoV-2 PCR testing. This may explain the modest performance of the model (AUC = 0.729). However, risk stratification of potentially positive individuals may have significant value. At MUSC, during the period of this study, despite application of a computer algorithm, only 5.6% of tested individuals tested positive, which was not an efficient use of limited testing resources. Even with an imperfect model, it was possible to risk-stratify the population, helping direct resources to the patients most in need. Daily predictions from the model were applied to prioritize appointments for drive-through testing (i.e., high-risk patients were called first). The text analytics highlighted important symptoms that had not been captured by the screening form, namely, loss of smell and taste in affected patients. Anosmia and altered sense of taste have been reported by mildly symptomatic patients with SARS-CoV-2 infection and are often the first symptoms noted. In our hands, the presence of these symptoms as reported by the patients themselves turned out to be the most sensitive predictor of positive test results.
Other words relevant to COVID-19 signs and symptoms (e.g., temperature, fever, cough, and words related to dyspnea) were not as prominent as we expected, likely because such symptoms were captured through the semistructured template, which could have masked their overrepresentation. This finding, along with other published literature, led us to modify the online screening form to specifically include questions about smell and taste shortly before the Centers for Disease Control and Prevention updated its “symptoms of coronavirus” guidance to include these specific symptoms. This finding demonstrates the value of a data-driven approach for identifying relevant symptoms in novel infections such as the one at the root of this rapidly evolving pandemic.

Limitations

Fortunately, the number of positive SARS-CoV-2 test results was low at our institution. As a result, the sample size for training a deep learning model such as the CNN described herein is suboptimal. More data are needed to refine the model and provide better risk stratification. The complete clinical picture should be considered in testing decisions, including the severity of symptoms and history of underlying chronic diseases. Patients with preexisting or comorbid conditions are at higher risk of mortality and may need to be prioritized for clinical reasons, even if their risk of a positive test is low.

Future work

Future work will include more advanced NLP extraction including local context analysis to identify negated terms (e.g., “denies fever”) and terms referring to individuals other than the patient (e.g., “spouse has a fever”), term normalization to standard terminologies, and algorithms that generalize to a variety of clinical text notes. Moreover, expanding training sets and developing predictive models that include preexisting risk factors will provide a more comprehensive tool that informs the decisions of our telehealth providers.
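Local context analysis of this kind is commonly done with trigger-word rules in the spirit of NegEx and ConText. The sketch below is a deliberately small illustration of that general idea, not a description of the planned system: the trigger lists, the 5-token look-back window, and the single-token term matching are all assumptions.

```python
import re

# Hypothetical, tiny trigger lists; NegEx/ConText-style systems use curated lists.
NEGATION_TRIGGERS = {"denies", "no", "without", "not"}
OTHER_PERSON_TRIGGERS = {"spouse", "wife", "husband", "son", "daughter",
                         "mother", "father", "roommate"}


def term_context(sentence, term, window=5):
    """Flag a single-token term as negated and/or about someone other than the patient.

    Looks only at the `window` tokens preceding the first mention of `term`.
    Returns None if the term does not occur in the sentence.
    """
    tokens = re.findall(r"[a-z0-9\-]+", sentence.lower())
    if term not in tokens:
        return None
    i = tokens.index(term)
    preceding = set(tokens[max(0, i - window):i])
    return {
        "negated": bool(preceding & NEGATION_TRIGGERS),
        "other_person": bool(preceding & OTHER_PERSON_TRIGGERS),
    }
```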

CONCLUSION

This case report describes our rapid use of artificial intelligence methods to improve the efficiency of COVID-19 testing. The results from our text analysis identified symptoms that informed the electronic triage process before these associations were widely published, and they also showed how artificial intelligence methods could be used to prioritize patients screening positive for testing.

AUTHOR CONTRIBUTIONS

All authors provided substantial input into the conception and design of this work, participated in drafting and revising it critically, and provided final approval of the version to be published.

