Peter Corbett, Ann Copestake.
Abstract
BACKGROUND: Chemical named entities represent an important facet of biomedical text.
Year: 2008 PMID: 19025690 PMCID: PMC2586753 DOI: 10.1186/1471-2105-9-S11-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Named Entity Types
| Type | Description | Example | n (Chemistry) | n (PubMed) |
| CM | compound | citric acid | 6865 | 4494 |
| RN | reaction | methylation | 288 | 401 |
| CJ | adjective | pyrazolic | 60 | 87 |
| ASE | enzyme | demethylase | 31 | 181 |
| CPR | prefix | 1,3- | 53 | 21 |
The last two columns give the number of entities of each type in the Chemistry corpus and in the PubMed corpus, respectively.
Figure 1. Evaluation on chemistry papers.
F scores (at confidence threshold of 0.3) and Mean Average Precision (MAP) values for Figs. 1–5.
| Corpus | System | MAP | F |
| Chemistry | Full | 87.1% | 80.8% |
| Chemistry | No Rescorer | 86.8% | 81.0% |
| Chemistry | No Preclassifier | 82.7% | 74.8% |
| Chemistry | No n-Grams | 79.2% | 72.2% |
| Chemistry | Custom LingPipe | 75.9% | 74.6% |
| Chemistry | Pure LingPipe | 66.9% | 63.2% |
| Chemistry | No Overlaps | 82.9% | 80.8% |
| Chemistry | CM | 87.0% | 81.2% |
| Chemistry | RN | 74.5% | 73.4% |
| Chemistry | CJ | 90.0% | 92.0% |
| Chemistry | ASE | 17.4% | 36.2% |
| PubMed | Full | 86.1% | 83.2% |
| PubMed | No Rescorer | 83.3% | 79.1% |
| PubMed | No Preclassifier | 81.4% | 73.4% |
| PubMed | No n-Grams | 77.6% | 70.6% |
| PubMed | Custom LingPipe | 78.6% | 75.6% |
| PubMed | Pure LingPipe | 71.9% | 66.1% |
| PubMed | CM | 85.6% | 82.3% |
| PubMed | RN | 95.3% | 93.2% |
| PubMed | CJ | 78.7% | 83.1% |
| PubMed | ASE | 83.4% | 86.0% |
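The two metrics reported in the table can be illustrated with a minimal sketch. It assumes each predicted entity carries a confidence score and a flag saying whether it matches a gold-standard entity, and it uses the common textbook definition of average precision; the paper's exact evaluation procedure may differ. The function names, the `(confidence, is_correct)` input format, and the toy data in the usage note are all hypothetical.

```python
from typing import List, Tuple

# Each prediction is (confidence, is_correct); this format is an assumption
# made for illustration, not the system's actual output format.
Prediction = Tuple[float, bool]


def f_score_at_threshold(preds: List[Prediction], n_gold: int,
                         threshold: float = 0.3) -> float:
    """Balanced F score over predictions with confidence >= threshold."""
    kept = [ok for conf, ok in preds if conf >= threshold]
    tp = sum(kept)  # true positives among the kept predictions
    if tp == 0:
        return 0.0
    precision = tp / len(kept)
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def average_precision(preds: List[Prediction], n_gold: int) -> float:
    """Average of the precision values at the rank of each correct prediction.

    Predictions are ranked by descending confidence. Gold entities that are
    never predicted contribute zero, because the sum is divided by n_gold.
    """
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    tp = 0
    total = 0.0
    for rank, (_, ok) in enumerate(ranked, start=1):
        if ok:
            tp += 1
            total += tp / rank
    return total / n_gold if n_gold else 0.0
```

For example, with the toy predictions `[(0.9, True), (0.8, False), (0.6, True), (0.2, True)]` and four gold entities, the threshold of 0.3 keeps the first three predictions, giving precision 2/3 and recall 2/4. A mean average precision would then be the mean of `average_precision` over whatever evaluation units are chosen (for instance documents), which is again an assumption here.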
Figure 2. Evaluation on PubMed abstracts.
Figure 3. Evaluation on chemistry papers, showing effects of disallowing overlapping entities.
Figure 4. Evaluation on chemistry papers, showing performance on different named entity classes.
Figure 5. Evaluation on PubMed abstracts, showing performance on different named entity classes.