Lana Yeganova, Donald C Comeau, W John Wilbur.
Abstract
BACKGROUND: The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches for the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data.
Year: 2011 PMID: 21658293 PMCID: PMC3111592 DOI: 10.1186/1471-2105-12-S3-S6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Description of general feature types with examples
| Feature description | Example |
|---|---|
| A character of a SF matches the 1st character of a word in a LF | |
| A character of a SF matches the 1st character of a stop-word in a LF | |
| A character of a SF matches the character following a non-alphanumeric non-space character in a LF | |
| A character of a SF matches a character within a token in a LF such that token splits at that character into two substrings, one or both of which are defined words. | |
| The last character of a SF is ‘s’ and last token in a LF ends in ‘s’ or ‘i’ | |
| A letter of a SF matches a capital non-1st character letter in a LF | |
| All characters of a SF appear anywhere in a single token in a LF in the correct order | |
| Look-up table match between a character in a SF and a token in a LF | |
| A substring of a SF matches two or more consecutive characters of a token in a LF | |
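A few of the feature types listed above can be illustrated with a minimal Python sketch. It assumes that SF and LF denote the short form and long form of an abbreviation. The function names, the feature labels, and the stop-word list are hypothetical and chosen only for illustration; this is not the authors' implementation, and it covers only four of the listed feature types.

```python
# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"of", "the", "and", "in", "for", "to", "a", "an"}


def char_match_types(sf_char, long_form):
    """Return illustrative labels for the ways sf_char matches long_form.

    The labels are made up for this sketch and are not the paper's
    feature names.
    """
    c = sf_char.lower()
    found = set()
    for token in long_form.split():
        # Match on the first character of a word or of a stop-word.
        if token[0].lower() == c:
            if token.lower() in STOP_WORDS:
                found.add("first_char_of_stop_word")
            else:
                found.add("first_char_of_word")
        # Matches at later positions inside the token.
        for i in range(1, len(token)):
            if token[i].lower() != c:
                continue
            if not token[i - 1].isalnum():
                # Character follows a non-alphanumeric, non-space character.
                found.add("after_non_alphanumeric")
            elif token[i].isupper():
                # Capital letter that is not the first character of the token.
                found.add("capital_non_first")
    return found


def sf_within_single_token(short_form, long_form):
    """True if all SF characters occur, in order, inside one LF token."""
    sf = short_form.lower()
    for token in long_form.lower().split():
        remaining = iter(token)
        # 'ch in remaining' consumes the iterator, which enforces the order.
        if all(ch in remaining for ch in sf):
            return True
    return False
```

For instance, `char_match_types("b", "beta-blocker")` reports both a first-character match and a match after a non-alphanumeric character, since the letter occurs at the start of the token and again right after the hyphen.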
Simple and compound features generated for the PSF-PLF pair
| 2FC3, 3FC3, 7FC3 | |
| 2FCG2 | |
| 5FC1, 1FC1 | |
| *FC(+0)FCG, *FC(+1)FCG, *FC(+5)FCG | |
| FCG(-3)FC, FCG(+1)FC | |
| *FC(-3)FC$, *FC(-2)FC$, *FC(+2)FC$, *FC(+1)FC$, *FC(+6)FC$, *2FC(+2)FC$ | |
Simple Features and Corresponding Candidate Definitions
| Candidate Definition | H | D | L |
|---|---|---|---|
| CD1 | 2FC3 | 2FCG2 | 1FC1 |
| CD2 | 3FC3 | 2FCG2 | 4FC1 |
| CD3 | 2FC3 | 2FCG2 | 5FC1 |
Comparison of NatLAb with Ab3P system on the Medstract Corpus
| Medstract | Precision | Recall | F-measure |
|---|---|---|---|
| NatLAb | 93% | 95% | 94% |
| Ab3P | 97% | 85% | 91% |
Comparison of NatLAb and Modified NatLAb with BIOADI and Ab3P on the Ab3P Corpus
| Ab3P Corpus | Precision | Recall | F-measure |
|---|---|---|---|
| Modified NatLAb | 93.56% | 89.27% | 91.36% |
| NatLAb | 91.61% | 87.63% | 89.58% |
| Ab3P | 96.50% | 83.20% | 89.36% |
| BIOADI | 95.86% | 84.64% | 89.90% |
Comparison of NatLAb and Modified NatLAb with BIOADI on the BIOADI Corpus
| BIOADI Corpus | Precision | Recall | F-measure |
|---|---|---|---|
| Modified NatLAb | 91.93% | 82.81% | 87.13% |
| NatLAb | 90.74% | 82.25% | 86.29% |
| BIOADI | 93.52% | 79.95% | 86.20% |
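The F-measure values reported in the comparison tables appear consistent with the balanced F-measure, that is, the harmonic mean of precision and recall. The short Python sketch below recomputes it for the Ab3P Corpus rows. The function name `f_measure` and the `ab3p_corpus` dictionary are illustrative, and the balanced (F1) form is an assumption inferred from the numbers rather than stated in this excerpt.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall.

    Inputs and output share the same scale (here: percentages).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) copied from the Ab3P Corpus table.
ab3p_corpus = {
    "Modified NatLAb": (93.56, 89.27),
    "NatLAb": (91.61, 87.63),
    "Ab3P": (96.50, 83.20),
    "BIOADI": (95.86, 84.64),
}

for system, (p, r) in ab3p_corpus.items():
    print(f"{system}: F = {f_measure(p, r):.2f}%")
```

Read this way, the tables show a consistent trade-off: NatLAb and Modified NatLAb have lower precision than Ab3P and BIOADI but higher recall. On the Ab3P Corpus, that recall advantage is enough to give them a higher F-measure than Ab3P.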