Irena Spasić, Mark Greenwood, Alun Preece, Nick Francis, Glyn Elwyn.
Abstract
BACKGROUND: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. Dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or for newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and a high degree of term variation.
Year: 2013 PMID: 24112363 PMCID: PMC3853334 DOI: 10.1186/2041-1480-4-27
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Sample output of FlexiTerm. A ranked list of terms and their variants based on their termhood scores.
Figure 2. Annotated occurrences of terms recognised by FlexiTerm. The annotations are visualised using MinorThird.
Data sets used in evaluation
| Data set | Topic | Document type | Source | |
|---|---|---|---|---|
| 1 | molecular biology | abstract | PubMed | |
| 2 | COPD | abstract | PubMed | " |
| 3 | COPD | blog post | Web | |
| 4 | obesity, diabetes | clinical narrative | i2b2 | N/A |
| 5 | knee MRI scan | clinical narrative | NHS | N/A |
Qualitative description of the corpora.
Data sets used in evaluation
| Data set | | | | Tokens | | |
|---|---|---|---|---|---|---|
| 1 | 145 | 100 | 906 | 24,096 | 3,430 | 2,720 |
| 2 | 150 | 100 | 949 | 26,174 | 3,837 | 3,049 |
| 3 | 169 | 100 | 1,949 | 40,461 | 4,404 | 3,422 |
| 4 | 300 | 100 | 3,022 | 55,845 | 5,402 | 4,504 |
| 5 | 73 | 100 | 960 | 13,093 | 946 | 824 |
Quantitative description of the corpora.
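Contingency tables for inter-annotator agreement
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | n11 | n12 | n1+ |
| Annotator A: non-term | n21 | n22 | n2+ |
| Total | n+1 | n+2 | n |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | p11 | p12 | p1+ |
| Annotator A: non-term | p21 | p22 | p2+ |
| Total | p+1 | p+2 | 1 |

General structure of a contingency table, where n and p denote the total numbers and proportions respectively.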
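Contingency tables for inter-annotator agreement on data set 1
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 11,948 | 346 | 12,294 |
| Annotator A: non-term | 1,664 | 10,138 | 11,802 |
| Total | 13,612 | 10,484 | 24,096 |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 0.496 | 0.014 | 0.510 |
| Annotator A: non-term | 0.069 | 0.421 | 0.490 |
| Total | 0.565 | 0.435 | 1 |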
Agreement at the token level.
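Contingency tables for inter-annotator agreement on data set 2
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 7,256 | 1,100 | 8,356 |
| Annotator A: non-term | 1,062 | 16,756 | 17,818 |
| Total | 8,318 | 17,856 | 26,174 |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 0.277 | 0.042 | 0.319 |
| Annotator A: non-term | 0.041 | 0.640 | 0.681 |
| Total | 0.318 | 0.682 | 1 |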
Agreement at the token level.
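Contingency tables for inter-annotator agreement on data set 3
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 2,325 | 204 | 2,529 |
| Annotator A: non-term | 436 | 37,496 | 37,932 |
| Total | 2,761 | 37,700 | 40,461 |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 0.057 | 0.005 | 0.062 |
| Annotator A: non-term | 0.011 | 0.927 | 0.938 |
| Total | 0.068 | 0.932 | 1 |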
Agreement at the token level.
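Contingency tables for inter-annotator agreement on data set 4
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 14,396 | 1,454 | 15,850 |
| Annotator A: non-term | 2,269 | 37,726 | 39,995 |
| Total | 16,665 | 39,180 | 55,845 |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 0.258 | 0.026 | 0.284 |
| Annotator A: non-term | 0.040 | 0.676 | 0.716 |
| Total | 0.298 | 0.702 | 1 |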
Agreement at the token level.
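Contingency tables for inter-annotator agreement on data set 5
| n | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 5,312 | 278 | 5,590 |
| Annotator A: non-term | 252 | 7,251 | 7,503 |
| Total | 5,564 | 7,529 | 13,093 |

| p | Annotator B: term | Annotator B: non-term | Total |
|---|---|---|---|
| Annotator A: term | 0.406 | 0.021 | 0.427 |
| Annotator A: non-term | 0.019 | 0.554 | 0.573 |
| Total | 0.425 | 0.575 | 1 |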
Agreement at the token level.
Inter-annotator agreement
| Data set | Observed agreement | Expected agreement | Cohen's kappa |
|---|---|---|---|
| 1 | 0.917 | 0.501 | 0.834 |
| 2 | 0.917 | 0.566 | 0.809 |
| 3 | 0.984 | 0.878 | 0.869 |
| 4 | 0.934 | 0.587 | 0.840 |
| 5 | 0.960 | 0.511 | 0.918 |
The values of three agreement measures.
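The three values listed per data set are numerically consistent with observed agreement, expected (chance) agreement and Cohen's kappa derived from the token-level contingency tables. The following is a minimal sketch of that calculation, not the authors' code; the function name `agreement` and the nested-list layout of the counts are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def agreement(table: Sequence[Sequence[int]]) -> Tuple[float, float, float]:
    """Return (observed agreement, expected agreement, Cohen's kappa).

    `table` is a square contingency table of counts: rows are one
    annotator's categories, columns are the other annotator's.
    """
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: proportion of items on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total

    # Marginal proportions for each annotator.
    row_p = [sum(table[i]) / total for i in range(size)]
    col_p = [sum(table[i][j] for i in range(size)) / total for j in range(size)]

    # Expected agreement if the annotators labelled independently.
    expected = sum(r * c for r, c in zip(row_p, col_p))

    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa


# Token-level counts for data set 1, taken from its contingency table.
data_set_1 = [[11948, 346], [1664, 10138]]
observed, expected, kappa = agreement(data_set_1)
print(f"observed={observed:.3f} expected={expected:.3f} kappa={kappa:.3f}")
```

For data set 1 the observed and expected agreement round to the tabulated 0.917 and 0.501. Kappa computed from the unrounded counts can differ from the tabulated value in the third decimal place, because the tabulated figure appears to be derived from already rounded proportions.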
Figure 3. The size and distribution of data sets. Comparison of terminological and non-terminological content.
Figure 4. Evaluation results. Comparison to the baseline method with respect to precision, recall and F-measure. The horizontal axis represents the number of proposed terms k (k = 10, 20, …, 500).
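Precision, recall and F-measure at a cut-off k can be sketched as follows. This is an illustrative sketch only: it assumes exact string matching between proposed terms and a gold-standard set, which the text here does not confirm, and the function name `evaluate_top_k` and the toy data are hypothetical.

```python
from typing import List, Set, Tuple


def evaluate_top_k(ranked_terms: List[str], gold_terms: Set[str], k: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for the top-k proposed terms.

    Assumes `ranked_terms` contains no duplicates and that a proposed
    term counts as correct only if it matches a gold term exactly.
    """
    proposed = ranked_terms[:k]
    true_positives = sum(1 for term in proposed if term in gold_terms)

    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(gold_terms) if gold_terms else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F-measure as the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data, not taken from the evaluation.
ranked = ["chest pain", "hospital course", "present illness", "blood pressure"]
gold = {"chest pain", "blood pressure", "ejection fraction"}

for k in (2, 4):
    p, r, f = evaluate_top_k(ranked, gold, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Sweeping k over 10, 20, …, 500 with such a function would produce curves of the kind the figure describes, with precision typically falling and recall rising as more terms are proposed.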
A comparison to the baseline on data set 1
| Rank | FlexiTerm | Baseline |
|---|---|---|
| 1 | transcription factor | t cell |
| | transcription factors | |
| | transcriptional factors | |
| 2 | nf-kappa b | transcription factor |
| 3 | gene expression | nf-kappa b |
| | expression of genes | |
| 4 | transcriptional activity | gene expression |
| | activator of transcription | |
| | transcriptional activation | |
| | activating transcription | |
| | activators of transcription | |
| | transcription activation | |
| | transcriptional activator | |
| 5 | nf-kappab activation | cell line |
| | nf-kappab activity | |
| 6 | human t cells | t lymphocyte |
| | human cells | |
| 7 | cell lines | human monocyte |
| | cell line | |
| 8 | human monocytes | dna binding |
| 9 | activation of nf-kappa b | tyrosine phosphorylation |
| | nf-kappa b activation | |
| | nf-kappa b activity | |
| 10 | protein kinase | b cell |
Top 10 ranked terms by the two methods.
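Several of the variants grouped in the table differ only in word order, function words or plural endings (for example "gene expression" and "expression of genes"). One simple way such variants could be grouped is by an order-independent key of normalised content words. The sketch below is an illustration of that idea and is not claimed to be FlexiTerm's algorithm; the stopword list, the crude plural stripping and all function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"of", "with", "the", "in"}


def crude_stem(token: str) -> str:
    """Strip a trailing plural 's' (a rough stand-in for a real stemmer)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def variant_key(term: str) -> FrozenSet[str]:
    """Order-independent key: the set of stemmed content words."""
    return frozenset(
        crude_stem(token) for token in term.lower().split() if token not in STOPWORDS
    )


def group_variants(terms: Iterable[str]) -> List[List[str]]:
    """Group terms that share the same key, preserving first-seen order."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for term in terms:
        groups[variant_key(term)].append(term)
    return list(groups.values())


terms = [
    "gene expression",
    "expression of genes",
    "cell lines",
    "cell line",
    "nf-kappa b activation",
    "activation of nf-kappa b",
    "protein kinase",
]
for group in group_variants(terms):
    print(group)
```

A key this simple would not merge variants such as "transcriptional factors" with "transcription factor" or "nf-kappa b activity" with "nf-kappa b activation", which need fuller morphological normalisation, nor would it handle misspellings, which need approximate string matching.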
A comparison to the baseline on data set 2
| Rank | FlexiTerm | Baseline |
|---|---|---|
| 1 | chronic obstructive pulmonary disease | chronic obstructive pulmonary disease |
| 2 | patients with copd | obstructive pulmonary disease |
| | copd patients | |
| 3 | pulmonary disease | pulmonary disease |
| 4 | acute exacerbation | copd patient |
| | acute exacerbations | |
| 5 | copd exacerbation | acute exacerbation |
| | copd exacerbations | |
| | exacerbations of copd | |
| | exacerbation of copd | |
| 6 | patients with chronic obstructive pulmonary disease | severe copd |
| | patients with chronic obstructive pulmonary diseases | |
| 7 | lung function | copd exacerbation |
| 8 | exacerbations of chronic obstructive pulmonary disease | lung function |
| | chronic obstructive pulmonary disease exacerbations | |
| | exacerbation of chronic obstructive pulmonary disease | |
| 9 | quality of life | airway inflammation |
| 10 | airway inflammation | exercise capacity |
Top 10 ranked terms by the two methods.
A comparison to the baseline on data set 3
| Rank | FlexiTerm | Baseline |
|---|---|---|
| 1 | pulmonary rehab | pulmonary rehab |
| | pulmanory rehab | |
| 2 | breathe easy | breathe easy |
| 3 | vitamin d | vitamin d |
| 4 | lung transplantation | lung function |
| | lung transplant | |
| | lung transplants | |
| | lung transplantations | |
| 5 | breathe easy groups | severe copd |
| | breath easy groups | |
| | breathe easy group | |
| 6 | chest infection | blood pressure |
| | chest infections | |
| 7 | quality of life | lung disease |
| 8 | blood pressure | lung transplant |
| 9 | lung function | chest infection |
| 10 | rehab room | rehab room |
Top 10 ranked terms by the two methods.
A comparison to the baseline on data set 4
| Rank | FlexiTerm | Baseline |
|---|---|---|
| 1 | hospital course | hospital course |
| | course of hospitalization | |
| 2 | chest pain | present illness |
| 3 | shortness of breath | chest pain |
| 4 | coronary artery | coronary artery |
| | coronary arteries | |
| 5 | present illness | blood pressure |
| 6 | blood pressure | ejection fraction |
| | blood pressures | |
| 7 | coronary artery disease | coronary artery disease |
| 8 | congestive heart failure | myocardial infarction |
| 9 | myocardial infarction | congestive heart failure |
| 10 | ejection fraction | cardiac catheterization |
Top 10 ranked terms by the two methods.
A comparison to the baseline on data set 5
| Rank | FlexiTerm | Baseline |
|---|---|---|
| 1 | mri knee | collateral ligament |
| 2 | collateral ligaments | medial meniscus |
| 3 | medial meniscus | lateral meniscus |
| | medial mensicus | |
| 4 | lateral meniscus | hyaline cartilage |
| 5 | hyaline cartilage | posterior horn |
| 6 | posterior horn | femoral condyle |
| 7 | joint effusion | joint effusion |
| 8 | mri rt knee | mri lt knee |
| | mri knee rt | |
| 9 | mri lt knee | lateral femoral condyle |
| | mri knee lt | |
| 10 | lateral femoral condyle | medial femoral condyle |
Top 10 ranked terms by the two methods.
Computational performance
| 1 | 14 sec | 101 sec |
| 2 | 13 sec | 96 sec |
| 3 | 10 sec | 59 sec |
| 4 | 26 sec | 290 sec |
| 5 | 12 sec | 32 sec |
Completion times across five datasets.