Abstract
Collections of documents annotated with semantic entities and relationships are crucial resources for supporting the development and evaluation of text mining solutions in the biomedical domain. Here I present an overview of 36 corpora and an analysis of the semantic annotations they contain. Annotations for entity types were classified into six semantic groups, and an overview of the semantic entities found in each corpus is shown. The results show that while some semantic entities, such as genes, proteins and chemicals, are consistently annotated in many collections, corpora for diseases, variations and mutations are still few, in spite of their importance in the biological domain.
Year: 2014 PMID: 25254099 PMCID: PMC4168744 DOI: 10.12688/f1000research.3216.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Overview of the corpora: the main publication, year of publication, number of citations in Google Scholar (as of December 2013) and URL are shown for each corpus.
| Corpus | Ref. | Year | Cit. | URL |
|---|---|---|---|---|
| AIMed | | 2005 | 270 | |
| AnEM | | 2012 | 9 | |
| AZDC | | 2009 | 19 | |
| Bact. Gene Int. | | 2012 | 11 | |
| BioCreative GM | | 2008 | 126 | |
| BioInfer | | 2007 | 246 | |
| CellFinder | | 2012 | 5 | |
| CG | | 2013 | 3 | |
| CHEMDNER | | 2013 | 7 | |
| CRAFT | | 2012 | 17 | |
| Craven | | 1999 | 374 | |
| DDI | | 2013 | 0 | |
| EBI Disease | | 2008 | 66 | |
| EDGAR | | 2000 | 395 | |
| EPI | | 2012 | 14 | |
| EU-ADR | | 2012 | 4 | |
| GeneReg | | 2010 | 11 | |
| Genia | | 2003 | 575 | |
| Genia Ev. Extr. | | 2008 | 236 | |
| GETM | | 2010 | 13 | |
| GREC | | 2009 | 53 | |
| HPRD50 | | 2007 | 268 | |
| ID | | 2012 | 14 | |
| IEPA | | 2002 | 208 | |
| Linnaeus | | 2010 | 79 | |
| LLL | | 2005 | 163 | |
| Metab. Enzym. | | 2011 | 14 | |
| MutationFinder | | 2007 | 83 | |
| Nagel | | 2009 | 12 | |
| NCBI Disease | | 2012 | 10 | |
| OSIRIS | | 2008 | 20 | |
| PC | | 2013 | 4 | |
| PICAD | | 2011 | 1 | |
| SCAI | | 2008 | 57 | |
| SNPCorpus | | 2011 | 3 | |
| Species | | 2013 | 1 | |
Figure 1. Classification of the corpora according to the semantic annotations they contain.