Xiao Dong, Jianfu Li, Ekin Soysal, Jiang Bian, Scott L DuVall, Elizabeth Hanchrow, Hongfang Liu, Kristine E Lynch, Michael Matheny, Karthik Natarajan, Lucila Ohno-Machado, Serguei Pakhomov, Ruth Madeleine Reeves, Amy M Sitapati, Swapna Abhyankar, Theresa Cullen, Jami Deckard, Xiaoqian Jiang, Robert Murphy, Hua Xu.
Abstract
Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on coronavirus disease 2019 (COVID-19). Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements but for which controlled terminology terms were published only after the tests were already in clinical use. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 test names to standard LOINC (Logical Observation Identifiers Names and Codes) codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from 8 healthcare systems. On an independent test set, it achieved an accuracy of 97.4%. COVID-19 TestNorm is available as an open-source package for developers and as an online Web application for end users (https://clamp.uth.edu/covid/loinc.php). We believe that it will be a useful tool to support secondary use of EHRs for research on COVID-19.
Keywords: COVID-19; COVID-19 TestNorm; LOINC; natural language processing; testing name normalization
Year: 2020 PMID: 32569358 PMCID: PMC7337837 DOI: 10.1093/jamia/ocaa145
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. An overview of the COVID-19 TestNorm system. COVID-19: coronavirus disease 2019; LOINC: Logical Observation Identifiers Names and Codes.
Semantic categories used by COVID-19 TestNorm
| LOINC Axes | Fine Entity Types | Example Values |
|---|---|---|
| Component | Covid19 | “COVID-19,” “SARS-COV-2” |
| | Covid19_Related | “SARS-related CoV,” “SARS-like CoV” |
| | RNA_Comp | “RNA,” “N gene,” “RdRp gene” |
| | Sequence_Comp | “Whole genome” |
| | Antigen_Comp | “Ag,” “Antigen” |
| | Growth_Comp | “Organism” |
| | Antibody_Comp | “Ab,” “Antibody,” “IgM,” “IgG” |
| | Interpretation_Comp | “Interpretation,” “Recent infection” |
| System | Blood | “Blood,” “Serum,” “Plasma” |
| | Respiratory | “NARES,” “NASAL MUCUS” |
| | NP | “NP,” “Swab,” “NASOPHARYNX” |
| | Saliva | “SALIVA,” “ORAL FLUID” |
| | Other | “UNSPECIFIED,” “UNKNOWN SPECIMEN” |
| Method | RNA_Method | “Non-probe-based,” “NAA,” “PCR” |
| | Sequence_Method | “Sequencing” |
| | Antigen_Method | “Rapid IA,” “Immunoassay,” “IA” |
| | Growth_Method | “Organism specific culture” |
| | Antibody_Method | “Rapid IA,” “Immunoassay,” “IA” |
| | Panel_Method | “Panel,” “Panl” |
| Quantitative_Qualitative | Quantitative | “Cycle Threshold,” “viral load” |
| | Qualitative | “Presence,” “Ord” |
| Institution | Manufacturer | “Abbott” |
COVID-19: coronavirus disease 2019.
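The table groups the fine entity types by LOINC axis. As a minimal sketch of how such categories could be recognized in a local test name, assuming a plain dictionary lookup (the tool's actual lexicons and matching logic are not described here), the hypothetical `LEXICON` below holds only a subset of the example values from the table, and the hypothetical `tag` function returns every entity type whose term occurs as a whole word in the lowercased name. Because “Rapid IA,” “Immunoassay,” and “IA” appear under both Antigen_Method and Antibody_Method, an ambiguous term yields both types, leaving a later rule step to resolve it from the component.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon: a subset of the example values from the table above,
# lowercased. It is NOT the COVID-19 TestNorm dictionary.
LEXICON: Dict[str, List[str]] = {
    "Covid19": ["covid-19", "covid19", "sars-cov-2"],
    "Covid19_Related": ["sars-related cov", "sars-like cov"],
    "RNA_Comp": ["rna", "n gene", "rdrp gene"],
    "Antigen_Comp": ["antigen", "ag"],
    "Antibody_Comp": ["antibody", "ab", "igm", "igg"],
    "Blood": ["blood", "serum", "plasma"],
    "Respiratory": ["nares", "nasal mucus"],
    "NP": ["nasopharynx", "np", "swab"],
    "Saliva": ["saliva", "oral fluid"],
    "RNA_Method": ["non-probe-based", "naa", "pcr"],
    "Antigen_Method": ["rapid ia", "immunoassay", "ia"],
    "Antibody_Method": ["rapid ia", "immunoassay", "ia"],
    "Panel_Method": ["panel", "panl"],
}


def tag(test_name: str) -> Set[str]:
    """Return the fine entity types whose terms occur in test_name (hypothetical helper)."""
    text = test_name.lower()
    found: Set[str] = set()
    for entity_type, terms in LEXICON.items():
        for term in terms:
            # The lookarounds require the term to stand alone, so short terms
            # such as "ab" or "ia" do not match inside longer words ("abbott").
            pattern = r"(?<![a-z0-9])" + re.escape(term) + r"(?![a-z0-9])"
            if re.search(pattern, text):
                found.add(entity_type)
                break
    return found


print(sorted(tag("SARS-CoV-2 RNA NP PCR")))
# ['Covid19', 'NP', 'RNA_Comp', 'RNA_Method']
```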
Figure 2. Coding rules for Logical Observation Identifiers Names and Codes (LOINC) mapping. COVID-19: coronavirus disease 2019; IVD: in vitro diagnostics; NAA: nucleic acid amplification.
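Figure 2 summarizes the coding rules that turn recognized entities into a LOINC code. The function below is a hypothetical, simplified rule set, not the authors' rules: it takes a set of fine entity type names such as those in the semantic-category table, covers only the five molecular codes in the distribution table, and returns `None` for everything else (antibody, antigen, and other tests are deliberately left out of this sketch).

```python
from typing import Optional, Set


def map_to_loinc(types: Set[str]) -> Optional[str]:
    """Map fine entity types to a molecular LOINC code (hypothetical rules)."""
    if not types & {"Covid19", "Covid19_Related"}:
        return None  # no SARS-CoV-2 or SARS-related mention: not handled here
    if not types & {"RNA_Comp", "RNA_Method"}:
        return None  # non-molecular branches are omitted from this sketch
    if "Covid19_Related" in types and "Covid19" not in types:
        # SARS-related coronavirus RNA in respiratory specimen; this sketch
        # ignores the specimen for this code.
        return "94502-2"
    # Specimen checks run from most to least specific.
    if "NP" in types:
        return "94759-8"  # nasopharynx
    if "Respiratory" in types:
        return "94500-6"  # respiratory specimen
    if "Blood" in types:
        return "94660-8"  # serum or plasma
    return "94309-2"  # no recognized specimen: unspecified specimen
```

Ordering the checks this way means a name that mentions both a nasopharyngeal swab and a generic respiratory specimen maps to the more specific nasopharynx code, and a name with no recognized specimen falls through to the unspecified-specimen code.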
Distribution of mapped LOINC codes
| LOINC Code | Count | Percentage (%) | LOINC Long Common Name |
|---|---|---|---|
| Molecular | | | |
| 94759-8 | 240 | 42.25 | SARS-CoV-2 (COVID19) RNA [Presence] in Nasopharynx by NAA with probe detection |
| 94500-6 | 202 | 35.56 | SARS-CoV-2 (COVID19) RNA [Presence] in Respiratory specimen by NAA with probe detection |
| 94309-2 | 75 | 13.20 | SARS-CoV-2 (COVID19) RNA [Presence] in Unspecified specimen by NAA with probe detection |
| 94502-2 | 13 | 2.29 | SARS-related coronavirus RNA [Presence] in Respiratory specimen by NAA with probe detection |
| 94660-8 | 11 | 1.94 | SARS-CoV-2 (COVID19) RNA [Presence] in Serum or Plasma by NAA with probe detection |
| Antibody | | | |
| 94563-4 | 10 | 1.76 | SARS-CoV-2 (COVID19) IgG Ab [Presence] in Serum or Plasma by Immunoassay |
| 94564-2 | 4 | 0.70 | SARS-CoV-2 (COVID19) IgM Ab [Presence] in Serum or Plasma by Immunoassay |
| 94762-2 | 2 | 0.35 | SARS-CoV-2 (COVID19) Ab [Presence] in Serum or Plasma by Immunoassay |
| 94504-8 | 2 | 0.35 | SARS-CoV-2 (COVID19) Ab panel - Serum or Plasma by Immunoassay |
| 94505-5 | 2 | 0.35 | SARS-CoV-2 (COVID19) IgG Ab [Units/volume] in Serum or Plasma by Immunoassay |
| 94507-1 | 1 | 0.18 | SARS-CoV-2 (COVID19) IgG Ab [Presence] in Serum, Plasma or Blood by Rapid immunoassay |
| 94508-9 | 1 | 0.18 | SARS-CoV-2 (COVID19) IgM Ab [Presence] in Serum, Plasma or Blood by Rapid immunoassay |
| Other | | | |
| 56831-1 | 4 | 0.70 | Problem associated signs and symptoms |
| 90101-7 | 1 | 0.18 | Internal control result |
LOINC: Logical Observation Identifiers Names and Codes.
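Each percentage in the table is the row's count divided by the 568 test names. The short check below recomputes the row percentages from the counts as printed (the `COUNTS` dictionary is simply a transcription of the table) and also derives the share of each group, which the table does not list explicitly.

```python
from typing import Dict, Tuple

# Counts transcribed from the distribution table above, grouped as in the table.
COUNTS: Dict[str, Dict[str, int]] = {
    "Molecular": {"94759-8": 240, "94500-6": 202, "94309-2": 75,
                  "94502-2": 13, "94660-8": 11},
    "Antibody": {"94563-4": 10, "94564-2": 4, "94762-2": 2, "94504-8": 2,
                 "94505-5": 2, "94507-1": 1, "94508-9": 1},
    "Other": {"56831-1": 4, "90101-7": 1},
}


def percentages(counts: Dict[str, Dict[str, int]]
                ) -> Tuple[int, Dict[str, float], Dict[str, float]]:
    """Return the total, per-code percentages, and per-group percentages."""
    total = sum(n for group in counts.values() for n in group.values())
    per_code = {code: round(100 * n / total, 2)
                for group in counts.values() for code, n in group.items()}
    per_group = {name: round(100 * sum(group.values()) / total, 2)
                 for name, group in counts.items()}
    return total, per_code, per_group


total, per_code, per_group = percentages(COUNTS)
print(total)      # 568
print(per_group)  # {'Molecular': 95.25, 'Antibody': 3.87, 'Other': 0.88}
```

Because each row is rounded to two decimals independently, the 14 row percentages in the table sum to 99.99 rather than exactly 100.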
Figure 3. Number of unique Logical Observation Identifiers Names and Codes (LOINC) codes by site. MAYO: Mayo Clinic; MHHS: Memorial Hermann Health System; UCSD: University of California, San Diego; UFH: University of Florida Health; UMN: University of Minnesota; UTP: University of Texas Physicians; VA: Veterans Health Affairs.
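Figure 3 reports how many distinct LOINC codes each site's test names were mapped to. Assuming the mapping results are available as hypothetical `(site, loinc_code)` pairs, one per local test name, that count reduces to collecting a set of codes per site:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_codes_by_site(mappings: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct LOINC codes per site from (site, loinc_code) pairs."""
    codes: Dict[str, Set[str]] = defaultdict(set)
    for site, loinc_code in mappings:
        codes[site].add(loinc_code)  # a set ignores repeated codes
    # Sort by site name for a stable, readable result.
    return {site: len(site_codes) for site, site_codes in sorted(codes.items())}
```

Each code is counted once per site regardless of how many local test names at that site map to it.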