Himanshu S Sahoo, Greg M Silverman, Nicholas E Ingraham, Monica I Lupei, Michael A Puskarich, Raymond L Finzel, John Sartori, Rui Zhang, Benjamin C Knoll, Sijia Liu, Hongfang Liu, Genevieve B Melton, Christopher J Tignanelli, Serguei V S Pakhomov.
Abstract
OBJECTIVE: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution.
Keywords: artificial intelligence; clinical decision support systems; follow-up studies; information extraction; natural language processing; signs and symptoms
Year: 2021 PMID: 34423261 PMCID: PMC8374371 DOI: 10.1093/jamiaopen/ooab070
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Count of document-level mentions for acute CDC symptoms for the corpora
| Feature | UMNCor (no. of mentions) | MayoCor (no. of mentions) |
|---|---|---|
| cdc_aches_n | 3 | 3 |
| cdc_aches_p | 18 | 3 |
| cdc_cough_n | 11 | 6 |
| cdc_cough_p | 28 | 22 |
| cdc_diarrhea_n | 12 | 14 |
| cdc_diarrhea_p | 11 | 18 |
| cdc_dyspnea_n | 10 | 15 |
| cdc_dyspnea_p | 28 | 34 |
| cdc_fatigue_n | 2 | 1 |
| cdc_fatigue_p | 15 | 14 |
| cdc_fever_n | 9 | 24 |
| cdc_fever_p | 30 | 18 |
| cdc_headaches_n | 5 | 8 |
| cdc_headaches_p | 8 | 15 |
| cdc_nausea_vomiting_n | 19 | 20 |
| cdc_nausea_vomiting_p | 13 | 27 |
| cdc_rhinitis_congestion_n | 7 | 1 |
| cdc_rhinitis_congestion_p | 8 | 2 |
| cdc_sore_throat_n | 6 | 4 |
| cdc_sore_throat_p | 9 | 3 |
| cdc_taste_smell_loss_n | 2 | 3 |
| cdc_taste_smell_loss_p | 5 | 5 |
| sum | 259 | 260 |
Note: the suffix “_p” following an acute CDC symptom denotes a positive document-level mention of that symptom, while the suffix “_n” denotes a negative document-level mention.
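The naming convention in the note can be illustrated with a minimal sketch that splits a feature label into its symptom and polarity. The function name `parse_feature` is hypothetical and is not taken from the paper; only the `cdc_` prefix and the `_p` / `_n` suffixes come from the table above.

```python
from typing import Tuple


def parse_feature(label: str) -> Tuple[str, bool]:
    """Split a label such as 'cdc_fever_p' into (symptom, is_positive).

    Hypothetical helper for illustration: it assumes every label starts
    with 'cdc_' and ends with '_p' (positive) or '_n' (negative).
    """
    prefix = "cdc_"
    if not label.startswith(prefix) or not label.endswith(("_p", "_n")):
        raise ValueError(f"unexpected feature label: {label!r}")
    symptom = label[len(prefix):-2]  # drop the prefix and the 2-char suffix
    return symptom, label.endswith("_p")


print(parse_feature("cdc_fever_p"))
print(parse_feature("cdc_nausea_vomiting_n"))
```

Under this reading, each symptom contributes two rows to the table, one per polarity.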
Overall microaverage performance measures of the annotation systems for both corpora (confidence intervals are present in Supplementary Appendix D.1-2)
| System | UMNCor precision | UMNCor recall | UMNCor f1-score | MayoCor precision | MayoCor recall | MayoCor f1-score |
|---|---|---|---|---|---|---|
| BioMedICUS | 0.78 | 0.75 | 0.75 | 0.89 | 0.89 | 0.89 |
| CLAMP | 0.84 | 0.85 | 0.85 | 0.91 | 0.92 | 0.91 |
| cTAKES | 0.83 | 0.80 | 0.81 | 0.91 | 0.90 | 0.91 |
| MetaMap | 0.85 | 0.84 | 0.85 | 0.90 | 0.91 | 0.90 |
| COVID-19 Gazetteer | 0.89 | 0.86 | 0.87 | 0.91 | 0.91 | 0.91 |
| MedTagger Custom | 0.82 | 0.82 | 0.82 | 0.92 | 0.92 | 0.92 |
| MedTagger | 0.88 | 0.85 | 0.85 | 0.91 | 0.91 | 0.91 |
Macroaverage performance measures of the annotation systems for both corpora calculated using positive and negative document-level mentions for all the acute CDC symptoms (confidence intervals are present in Supplementary Appendix E.1-2)
| System | UMNCor precision | UMNCor recall | UMNCor f1-score | MayoCor precision | MayoCor recall | MayoCor f1-score |
|---|---|---|---|---|---|---|
| BioMedICUS | 0.71 | 0.75 | 0.72 | 0.73 | 0.74 | 0.73 |
| CLAMP | 0.81 | 0.81 | 0.81 | 0.79 | 0.71 | 0.74 |
| cTAKES | 0.77 | 0.82 | 0.79 | 0.75 | 0.78 | 0.76 |
| MetaMap | 0.80 | 0.82 | 0.81 | 0.75 | 0.71 | 0.73 |
| COVID-19 Gazetteer | 0.82 | 0.88 | 0.84 | 0.77 | 0.79 | 0.78 |
| MedTagger Custom | 0.77 | 0.78 | 0.77 | 0.79 | 0.80 | 0.80 |
| MedTagger | 0.80 | 0.87 | 0.82 | 0.80 | 0.75 | 0.77 |
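The difference between the microaverage and macroaverage tables can be sketched with the common textbook definitions: microaveraging pools the counts over all features before computing the measures, while macroaveraging computes the measures per feature and then takes an unweighted mean. This record does not state exactly how the reported values were computed, so the sketch below is an assumption and not the authors' procedure. The counts in `example` are hypothetical and are not data from the paper.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one feature
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and f1-score from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool counts across all features, then compute the measures once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Compute the measures per feature, then take the unweighted mean."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Hypothetical counts for illustration only (not from the paper).
example = {"cdc_fever_p": (8, 2, 2), "cdc_fatigue_n": (1, 1, 3)}
print("micro:", micro_average(example))
print("macro:", macro_average(example))
```

Because macroaveraging weights every feature equally, rare features such as `cdc_fatigue_n` influence the macroaverage as much as frequent ones, which is one plausible reason the macroaverage values are lower than the microaverage values in the tables.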
Figure 1. Total CPU and RAM utilization over the period of execution of the annotation systems on 9000 notes. A, CPU utilization (in number of cores); B, zoomed-in view of (A); C, RAM utilization; D, zoomed-in view of (C); E, total utilization of CPU (represented as cores*s) and RAM (represented as RAM*s). Statistics for CPU and RAM utilization were collected every 30 s and appended to a file using a bash script that queried the Kubernetes cluster.
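One way to read the cores*s and RAM*s units in panel E is as the time integral of the sampled utilization. A minimal sketch, assuming a simple rectangle rule over the 30 s samples (the caption does not say how the totals were actually computed), is shown below. The function name `total_utilization` and the sample values are hypothetical.

```python
from typing import Sequence


def total_utilization(samples: Sequence[float], interval_s: float = 30.0) -> float:
    """Approximate the time integral of a utilization series.

    Each sample is treated as constant over one sampling interval
    (rectangle rule), so the result has units of sample-unit * seconds,
    for example cores*s for CPU samples measured in cores.
    """
    return sum(samples) * interval_s


# Hypothetical CPU samples in cores, one every 30 s (not data from the paper).
cpu_samples = [0.5, 2.0, 4.0, 1.5]
print(total_utilization(cpu_samples), "cores*s")
```

A single number of this kind lets systems with different runtimes and different peak usage be compared directly, since a short run at high utilization and a long run at low utilization can consume the same total.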
Figure 2. Runtime of the annotators for 9000 notes. The COVID-19 gazetteer had the shortest processing time.
Figure 3. Runtime of the COVID-19 gazetteer with an increasing number of CPU cores on a fixed set of notes. Runtime decreased as the number of cores increased.