Diwakar Mahajan, Ananya Poddar, Jennifer J Liang, Yen-Ting Lin, John M Prager, Parthasarathy Suryanarayanan, Preethi Raghavan, Ching-Huei Tsou.
Abstract
BACKGROUND: Although electronic health records (EHRs) have been widely adopted in health care, effective use of EHR data is often limited by redundant information in clinical notes introduced through templates and copy-paste during note generation. It is therefore imperative to develop solutions that can condense information while retaining its value. A step in this direction is measuring the semantic similarity between clinical text snippets. To address this problem, we participated in the 2019 National NLP Clinical Challenges (n2c2)/Open Health Natural Language Processing Consortium (OHNLP) clinical semantic textual similarity (ClinicalSTS) shared task.
Keywords: deep learning; electronic health records; multi-task learning; natural language processing; semantic textual similarity; transfer learning
Year: 2020 PMID: 33245284 PMCID: PMC7732709 DOI: 10.2196/22508
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Two sample clinical notes for the same patient from consecutive visits. Plain text indicates content shared between the 2 notes; italics (yellow highlight) indicate content that has been modified, and bold (blue highlight) indicates content new in the second note.
Sample sentence pairs and annotations from the clinical semantic textual similarity data set.
| Sentence 1a | Sentence 2a | Score | Domain dependence | Comments |
| --- | --- | --- | --- | --- |
| “The patient was taken to the | “The patient was taken to the | 5.0 | Domain specific | Clinical abbreviations |
| “ | “ | 3.5 | Domain specific | Medication instruction parsing |
| “Cardiovascular assessment findings include | “Cardiovascular assessment findings include | 3.0 | Domain specific | Medical concept similarity and medical concept mapping |
| “He was | “The affected shoulder was | 3.0 | Domain independent | Alignment |
| “Musculoskeletal: | “Musculoskeletal: | 1.5 | Domain independent | Assertion classification (polarity) |
aItalics indicate the phrases within each sentence that correspond to the observations.
bPACU: post anesthesia care unit.
cHFA: hydrofluoroalkane.
dAV: atrioventricular.
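As the sample pairs above show, ClinicalSTS systems assign each sentence pair a similarity score from 0 (unrelated) to 5 (semantically equivalent), and the shared task ranks systems by the Pearson correlation between predicted and gold scores. A minimal sketch of the evaluation metric (the function name `pearson` is ours):

```python
from math import sqrt

def pearson(preds, golds):
    """Pearson correlation coefficient between predicted and gold similarity scores."""
    n = len(preds)
    mp = sum(preds) / n
    mg = sum(golds) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(preds, golds))
    sp = sqrt(sum((p - mp) ** 2 for p in preds))
    sg = sqrt(sum((g - mg) ** 2 for g in golds))
    return cov / (sp * sg)

# Predictions that are a linear function of the gold scores correlate
# perfectly, even when shifted: the metric rewards ranking, not scale.
print(round(pearson([0.5, 1.5, 2.5, 3.5], [1.0, 2.0, 3.0, 4.0]), 4))
```

Because Pearson correlation is invariant to linear rescaling, a system can score well even if its raw outputs sit on a different scale than the 0 to 5 annotations.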
Figure 2. Comparison of the traditional machine learning approach (left), where performance evaluation and error analysis lead to feature selection, with our proposed iterative training using multi-task learning approach (right), where performance evaluation and error analysis lead to data set selection.
Data sets used in multi-task learning.
| Data set | Task | Domain | Size | Example |
| --- | --- | --- | --- | --- |
| STS-Ba | Sentence pair similarity | General | 8600 | Sentence 1: “A young child is riding a horse”; Sentence 2: “A child is riding a horse”; Similarity: 4.75 |
| RQEb | Sentence pair classification | Biomedical | 8900 | Sentence 1: “Doctor X thinks he is probably just a normal 18 month old but would like to know if there are a certain number of respiratory infections that are considered normal for that age”; Sentence 2: “Probably a normal 18 month old but how many respiratory infections are normal”; Ground truth: entailment |
| MedNLIc | Sentence pair classification | Clinical | 14,000 | Sentence 1: “Labs were notable for Cr 1.7 (baseline 0.5 per old records) and lactate 2.4”; Sentence 2: “Patient has normal Cr”; Ground truth: contradiction |
| QQPd | Sentence pair classification | General | 400,000 | Sentence 1: “Why do rockets look white?”; Sentence 2: “Why are rockets and boosters painted white?”; Ground truth: 1 |
| Topic | Sentence classification | Clinical | 1,300,000 | Sentence: “Negative for difficulty urinating, pain with urination, and frequent urination”; Ground truth: SIGNORSYMPTOM |
| MedNERe | Token-wise classification | Clinical | 15,000 | Sentence: “he developed respiratory distress on the AMf of admission, cough day PTAg, CXRh with B/Li LLj PNAk, started ciprofloxacin and levofloxacin”; Ground truth: ciprofloxacin [DRUG] levofloxacin [DRUG] |
aSTS-B: semantic textual similarity benchmark.
bRQE: Recognizing Question Entailment.
cMedNLI: natural language inference data set for the clinical domain.
dQQP: Quora Question Pairs.
eMedNER: medication named entity recognition.
fAM: morning.
gPTA: prior to admission.
hCXR: chest x-ray.
iB/L: bilateral.
jLL: left lower.
kPNA: pneumonia.
Figure 3. Intermediate multi-task learning and fine-tuning architecture. ClinicalSTS: clinical semantic textual similarity; STS-B: semantic textual similarity benchmark; RQE: recognizing question entailment; MedNLI: natural language inference data set for the clinical domain; QQP: Quora question pairs; MedNER: medication named entity recognition data set; ClinicalBERT: bidirectional encoder representations from transformers on clinical text mining.
Pretrained language models used in the ensemble module and their training corpora.
| Language model | Corpora for language model pretraining | Domain |
| --- | --- | --- |
| MT-DNNa | Wikipedia+BookCorpus | General |
| RoBERTab | Wikipedia+BookCorpus+CC-News+OpenWebText+Stories | General |
| BioBERTc | Wikipedia+BookCorpus+PubMed+PMCd | Biomedical |
| IIT-MTL-ClinicalBERTe | Wikipedia+BookCorpus+MIMIC-IIIf | Clinical |
aMT-DNN: multi-task deep neural networks.
bRoBERTa: robustly optimized bidirectional encoder representations from transformers approach.
cBioBERT: bidirectional encoder representations from transformers for biomedical text mining.
dPMC: PubMed Central.
eIIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT.
fMIMIC-III: Medical Information Mart for Intensive Care.
Figure 4. Overview of our end-to-end system. ClinicalBERT: bidirectional encoder representations from transformers on clinical text; IIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT; MT-DNN: multi-task deep neural networks; RoBERTa: robustly optimized BERT approach; BioBERT: bidirectional encoder representations from transformers for biomedical text mining.
Results of each iteration of iterative intermediate training using multi-task learning.
| Experiment and language model | STS-Ba | RQEb | MedNLIc | Topic | MedNERd | QQPe | Pearson correlation coefficient on internal test |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BLf | | | | | | | |
| 1 BERTg | —h | — | — | — | — | — | 0.834 |
| 2 ClinicalBERTi | — | — | — | — | — | — | 0.848 |
| Iterj | | | | | | | |
| 1 ClinicalBERT | ✓k | — | — | — | — | — | 0.852 |
| 2 ClinicalBERT | ✓ | ✓ | ✓ | — | — | — | 0.862 |
| 3 ClinicalBERT | ✓ | ✓ | ✓ | ✓ | — | — | 0.866 |
| 4 ClinicalBERT | ✓ | ✓ | ✓ | ✓ | ✓ | — | |
| 5 ClinicalBERT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 0.856 |
aSTS-B: semantic textual similarity benchmark.
bRQE: Recognizing Question Entailment.
cMedNLI: Natural Language Inference data set for the clinical domain.
dMedNER: medication named entity recognition data set.
eQQP: Quora Question Pairs data set.
fBL: baseline.
gBERT: bidirectional encoder representations from transformers.
hIndicates data set was not used for this experiment.
iClinicalBERT: bidirectional encoder representations from transformers on clinical text mining.
jIter: iteration.
kIndicates data sets that were trained together in multi-task learning.
lItalics signify highest Pearson correlation coefficient obtained on internal test data set.
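The iteration rows above can be read as a greedy search over intermediate data sets: after each round of multi-task training, performance evaluation and error analysis decide which data sets to keep. A simplified sketch of that loop, assuming a hypothetical `evaluate` callback that multi-task-trains on a given data-set combination and returns the internal-test Pearson correlation (the toy scores below are loosely modeled on the table and are illustrative only):

```python
def iterative_dataset_selection(candidates, evaluate):
    """Greedy sketch of iterative intermediate training: add one candidate
    data set per iteration and keep it only if the internal-test Pearson
    correlation improves over the best score so far."""
    selected = []
    best = evaluate(frozenset())          # baseline: no intermediate training
    for ds in candidates:
        score = evaluate(frozenset(selected + [ds]))
        if score > best:                  # keep the data set only if it helps
            best = score
            selected.append(ds)
    return selected, best

# Illustrative scores for a few data-set combinations.
scores = {
    frozenset(): 0.848,
    frozenset({"STS-B"}): 0.852,
    frozenset({"STS-B", "RQE"}): 0.862,
    frozenset({"STS-B", "RQE", "QQP"}): 0.856,
}
sel, best = iterative_dataset_selection(
    ["STS-B", "RQE", "QQP"], lambda s: scores.get(s, 0.0))
print(sel, best)
```

In the paper the selection was guided by error analysis rather than a purely score-driven loop, and some data sets (e.g., RQE and MedNLI) were added together in a single iteration; the sketch only captures the keep-if-it-helps principle, which the table also reflects (adding QQP in iteration 5 lowered the score).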
Ablation study of language models utilized in the ensemble module. The statistical mean of the language model outputs was used as the ensembling method.
| Experiment | IIT-MTL-ClinicalBERTa | BioBERTb | MT-DNNc | RoBERTad | Pearson correlation coefficient on internal test |
| --- | --- | --- | --- | --- | --- |
| 1 | ✓e | —f | — | — | 0.8711 |
| 2 | — | ✓ | — | — | 0.8707 |
| 3 | — | — | ✓ | — | 0.8685 |
| 4 | — | — | — | ✓ | 0.8578 |
| 5 | ✓ | ✓ | — | — | 0.8754 |
| 6 | — | ✓ | ✓ | — | 0.8780 |
| 7 | — | — | ✓ | ✓ | 0.8722 |
| 8 | ✓ | — | — | ✓ | 0.8741 |
| 9 | ✓ | — | ✓ | — | 0.8796 |
| 10 | — | ✓ | — | ✓ | 0.8720 |
| 11 | ✓ | ✓ | ✓ | — | |
| 12 | — | ✓ | ✓ | ✓ | 0.8769 |
| 13 | ✓ | — | ✓ | ✓ | 0.8787 |
| 14 | ✓ | ✓ | — | ✓ | 0.8764 |
| 15 | ✓ | ✓ | ✓ | ✓ | 0.8795 |
aIIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT.
bBioBERT: bidirectional encoder representations from transformers for biomedical text mining.
cMT-DNN: multi-task deep neural networks.
dRoBERTa: robustly optimized bidirectional encoder representations from transformers approach.
eIndicates which language models are included in the ensemble.
fIndicates language model was not used for this experiment.
gItalics signify the highest Pearson correlation coefficient obtained on internal test data set.
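The ablation above ensembles model outputs by taking their statistical mean per sentence pair. A minimal sketch of that ensembling step (the function name and the hypothetical per-model scores are ours, for illustration):

```python
from statistics import mean

def ensemble_mean(model_predictions):
    """Average per-pair similarity scores across models.
    `model_predictions` is a list of equal-length score lists, one per model."""
    return [mean(pair_scores) for pair_scores in zip(*model_predictions)]

# Hypothetical scores for two sentence pairs from three models.
model_a = [0.61, 1.04]
model_b = [1.01, 1.18]
model_c = [2.15, 2.34]
ens = ensemble_mean([model_a, model_b, model_c])
print([round(s, 2) for s in ens])  # → [1.26, 1.52]
```

Averaging is parameter-free, so it needs no held-out data to fit, but it cannot weight stronger models more heavily; that is what the regression-based ensembling in the next table addresses.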
End-to-end ensemble module and official submission results.
| Components | Internal testa: Mean | Internal test: LRb | Internal test: BRc | Internal test: RRd | External testa: Mean | External test: LR | External test: BR | External test: RR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IIT-MTL-ClinicalBERTe & MT-DNNf & BioBERTg | 0.8796 | 0.8795 | 0.8796 | | 0.8978 | 0.8978 | 0.8978 | |
| + medication features | N/Ah | | 0.8832 | 0.8831 | N/A | | 0.8997 | 0.8975 |
| + domain-specific and phrasal similarity features | N/A | 0.8733 | 0.8741 | | N/A | 0.8861 | 0.8920 | |
aItalics signify the Pearson correlation coefficient obtained on the internal and external test data set corresponding to the three configurations (components and ensemble method) that were our official submissions to the 2019 n2c2/OHNLP challenge.
bLR: linear regression.
cBR: Bayesian regression.
dRR: ridge regression.
eIIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT.
fMT-DNN: multi-task deep neural networks.
gBioBERT: bidirectional encoder representations from transformers for biomedical text mining.
hN/A: not applicable.
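Beyond the simple mean, the submission table compares linear, Bayesian, and ridge regression as ensembling (stacking) methods: a regressor is fit to map each pair's per-model scores, plus any extra features, to a final similarity score. A pure-Python sketch of ridge stacking, solving the normal equations (XᵀX + λI)w = Xᵀy by Gaussian elimination; the helper names `ridge_stack` and `predict` are ours and the paper presumably used an off-the-shelf implementation:

```python
def ridge_stack(X, y, lam=1.0):
    """Fit ridge-regression weights, where each row of X holds one sentence
    pair's scores from the ensembled models (a bias term is appended here,
    and for simplicity it is regularized along with the other weights)."""
    rows = [xi + [1.0] for xi in X]       # append bias column
    d = len(rows[0])
    # Build A = XᵀX + λI and b = Xᵀy.
    A = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(d)]
    # Solve A w = b by Gaussian elimination with partial pivoting.
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w

def predict(w, x):
    """Score one sentence pair from its per-model scores x."""
    return sum(wi * xi for wi, xi in zip(w, x + [1.0]))

# Toy fit: the target is the mean of two model scores, so the learned
# weights should approach [0.5, 0.5, 0] as the regularizer shrinks.
w = ridge_stack([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 0.0]],
                [1.0, 2.0, 3.0, 2.0], lam=1e-6)
print(round(predict(w, [2.0, 4.0]), 3))
```

Unlike the mean, a fitted regressor can down-weight weaker models and absorb additional inputs such as the medication and phrasal similarity features, which is why the Mean column shows N/A for the feature-augmented rows.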
Sample sentence pairs with ground truth annotations and predictions from the language models considered for the final ensembled system.
| Sentence 1 | Sentence 2 | Ground truth | IIT-MTL-ClinicalBERTa | BioBERTb | MT-DNNc | RoBERTad |
| --- | --- | --- | --- | --- | --- | --- |
| “The following consent was read to the patient and accepted to order testing.” | “We explained the risks, benefits, and alternatives, and the patient agreed to proceed.” | 2.5 | 0.61 | 1.01 | 2.15 | 2.51 |
| “Negative for coughing up blood, coughing up mucus (phlegm) and wheezing.” | “Negative for abdominal pain, blood in stool, constipation, diarrhea and vomiting.” | 0.5 | 1.04 | 1.18 | 2.34 | 1.74 |
aIIT-MTL-ClinicalBERT: iterative intermediate training using multi-task learning on ClinicalBERT.
bBioBERT: bidirectional encoder representations from transformers for biomedical text mining.
cMT-DNN: multi-task deep neural networks.
dRoBERTa: robustly optimized bidirectional encoder representations from transformers approach.