Yanshan Wang, Sunyang Fu, Feichen Shen, Sam Henry, Ozlem Uzuner, Hongfang Liu.
Abstract
BACKGROUND: Semantic textual similarity is a common task in the general English domain to assess the degree to which the underlying semantics of 2 text segments are equivalent to each other. Clinical Semantic Textual Similarity (ClinicalSTS) is the semantic textual similarity task in the clinical domain that attempts to measure the degree of semantic equivalence between 2 snippets of clinical text. Due to the frequent use of templates in electronic health record systems, a large amount of redundant text exists in clinical notes, making ClinicalSTS crucial for the secondary use of clinical text in downstream clinical natural language processing applications, such as clinical text summarization, clinical semantics extraction, and clinical information retrieval.
Keywords: ClinicalSTS; challenge; clinical natural language processing; electronic health records; medical natural language processing; n2c2; natural language processing; semantic textual similarity; shared task
Year: 2020 PMID: 33245291 PMCID: PMC7732706 DOI: 10.2196/23375
Source DB: PubMed Journal: JMIR Med Inform
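The ClinicalSTS task described in the abstract assigns each sentence pair a continuous similarity score on a 0–5 scale. As an illustration only (not the method of any participating system), a minimal baseline can score a pair by cosine similarity of bag-of-words vectors, rescaled to that range:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def sts_score(a: str, b: str) -> float:
    """Map cosine similarity onto the 0-5 ClinicalSTS scale."""
    return 5.0 * cosine_similarity(a, b)
```

Real systems in the track (see the techniques table below) instead fine-tune contextual models such as BERT; this sketch only makes the task definition concrete.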
Figure 1. Flowchart of the released data set generation in the 2019 n2c2/OHNLP track on Clinical Semantic Textual Similarity. EHR: electronic health record; PHI: protected health information.
Figure 2. Participation in the 2019 n2c2/OHNLP Clinical Semantic Textual Similarity (ClinicalSTS) track in comparison with the 2018 BioCreative/OHNLP ClinicalSTS track.
Participating teams, affiliations, and number of systems submitted by each.
| Team name | Affiliation | Number of systems |
| ASU | Arizona State University, USA | 3 |
| ChangYC | National Yang-Ming University, Taiwan | 3 |
| CLEARTeamCNRSLille | N/Aa | 3 |
| DMSS | Boston Children’s Hospital and Harvard University, USA | 3 |
| DUTIR | Dalian University of Technology, China | 2 |
| edmondzhang | Orion Health, USA | 3 |
| ezDI | ezDI Inc, USA | 4 |
| HITSZ | Harbin Institute of Technology at Shenzhen, China | 3 |
| IBMResearch | IBM Corporation, USA | 0 |
| JHU | Johns Hopkins University, USA | 4 |
| LSI_UNED | Universidad Rey Juan Carlos, Spain | 3 |
| MAH | Arizona State University, USA | 3 |
| MedDataQuest | Med Data Quest, USA | 3 |
| MICNLP | German Cancer Research Center, Germany | 3 |
| naist_sociocom | Nara Institute of Science and Technology, Japan | 3 |
| NCBI | National Center for Biotechnology Information, USA | 3 |
| nlpatvcu | Virginia Commonwealth University, George Mason University, USA | 3 |
| PUCPR | Pontifical Catholic University of Paraná, Brazil | 2 |
| QUB | Queen’s University Belfast, UK | 4 |
| SBUnlp | Stony Brook University, USA | 3 |
| superficialintelligence0405 | N/A | 3 |
| UAveiro | University of Aveiro, Portugal | 3 |
| UFL | University of Florida, USA | 3 |
| UH_RiTUAL | University of Texas at Houston, USA | 3 |
| Utah-VA | University of Utah and Veterans Affairs, USA | 3 |
| vjaneja | University of Maryland, USA | 1 |
| WSU-MQ | Western Sydney University, Australia | 3 |
| Yale | Yale University, USA | 3 |
| Yuxia | University of Melbourne, Australia | 2 |
| zhouxb | Yunnan University, China | 3 |
aN/A: not available.
Figure 3. Distribution of number of words in sentence pairs in the released training and testing data sets.
Number of sentence pairs with different similarity scores in the released training and testing data sets.
| Similarity score | Training data set, n | Testing data set, n |
| [0,1) | 185 | 98 |
| [1,2) | 236 | 168 |
| [2,3) | 245 | 30 |
| [3,4) | 607 | 34 |
| [4,5] | 369 | 82 |
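The score distribution above buckets each gold score into the half-open intervals [0,1) through [3,4), plus the closed top bin [4,5]. A minimal sketch of that bucketing (the example scores are hypothetical, not drawn from the released data):

```python
from collections import Counter

def score_bin(score: float) -> str:
    """Assign a 0-5 similarity score to its interval: [0,1), ..., [3,4), [4,5]."""
    if score >= 4.0:
        return "[4,5]"
    lo = int(score)
    return f"[{lo},{lo + 1})"

# Hypothetical gold scores; the released data sets bin 1642 training
# and 412 testing pairs this way.
scores = [0.2, 1.5, 3.0, 4.0, 5.0, 2.7]
counts = Counter(score_bin(s) for s in scores)
```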
Overall performance of the valid submitted systems and comparison with the previous year’s results.
| Metric | 2019 n2c2/OHNLP ClinicalSTSa | 2018 BioCreative/OHNLP ClinicalSTS |
| Maximum | .9010 | .8328 |
| Minimum | –.0530 | .7005 |
| Median | .8291 | .8016 |
| Mean | .7183 | .7820 |
| Standard deviation | .2260 | .0476 |
aClinicalSTS: Clinical Semantic Textual Similarity.
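Systems in the track are ranked by the Pearson correlation coefficient between predicted and gold similarity scores, which is the statistic summarized in the table above. A minimal sketch of that computation (the example gold/predicted scores are made up for illustration):

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical gold and predicted similarity scores for five sentence pairs.
gold = [0.5, 1.5, 2.5, 3.5, 4.5]
pred = [0.7, 1.2, 2.9, 3.1, 4.8]
```

A perfect system would reach r = 1.0; the best 2019 submission reached .9010 on the 412-pair test set.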
Performance of the top 10 teams with the corresponding best runs.
| Rank | Team | Run | Pearson correlation | P value |
| 1 | IBMResearch | LM-POSTPROCESS-RUN | .9010 | —a |
| 2 | NCBI | 1 | .8967 | .88 |
| 3 | UFL | XLNet-Run | .8864 | .40 |
| 4 | DMSS | AVERAGE-Run | .8792 | .45 |
| 5 | Yale | 3 | .8784 | .09 |
| 6 | QUB | fine_tuned_models_mean-Run | .8704 | .54 |
| 7 | MICNLP | Step1 | .8694 | <.001 |
| 8 | HITSZ | raw_ensemble | .8685 | .80 |
| 9 | SBUnlp | ensembleall | .8677 | .003 |
| 10 | JHU | BERT-w-stsb-run | .8543 | .005 |
aNot applicable.
Performance comparison (Pearson correlation coefficient) between the Epic and GE sentence pairs.
| Metric | Epic (n=223) | GE (n=189) |
| Maximum | .9148 | .9022 |
| Minimum | .0917 | .0070 |
| Median | .8377 | .7785 |
| Mean | .7792 | .6812 |
| Standard deviation | .1649 | .2257 |
Top 5 systems for sentence pairs from the Epic and GE electronic health record systems.
| Rank | Team | Run | Pearson correlation |
| Epic |  |  |  |
| 1 | Yale | 4 | .9148 |
| 2 | IBMResearch | LM-POSTPROCESS-RUN | .9098 |
| 3 | NCBI | 1 | .9020 |
| 4 | DMSS | AVERAGE-Run | .8949 |
| 5 | UFL | Assemble-Run | .8863 |
| GE |  |  |  |
| 1 | IBMResearch | LM-POSTPROCESS-RUN | .9022 |
| 2 | UFL | XLNet-Run | .9010 |
| 3 | NCBI | 1 | .8938 |
| 4 | Yale | 3 | .8796 |
| 5 | MICNLP | Step1 | .8576 |
Brief summary of the techniques used in the top systems.
| Team | Techniques |
| IBMResearch | Multitask learning, BioBERT, RoBERTa, ClinicalBERT |
| NCBI | Convolutional neural network, multitask learning, BERT |
| UFL | BERT, XLNet |
| DMSS | BERT, XLNet |
| Yale | BERT, graph convolutional neural network |
| QUB | BERT, XLNet |
| MICNLP | BERT, medication graph |
| HITSZ | BERT, cTAKES |
| SBUnlp | BERT, Unified Medical Language System |
| JHU | BERT |
| Utah-VA | Multiple natural language processing features, deep neural network |
Examples of medication-related sentence pairs in the data set.
| Examples | Score |
| sentence1: minocycline [MINOCIN] 100 mg capsule 1 capsule by mouth one time daily. | 3.0 |
| sentence1: oxycodone [ROXICODONE] 5 mg tablet 0.5-1 tablets by mouth every 4 hours as needed. | 1.0 |