Chih-Hsuan Wei, Hung-Yu Kao, Zhiyong Lu.
Abstract
Manually curating knowledge from biomedical literature into structured databases is highly expensive and time-consuming, making it difficult to keep pace with the rapid growth of the literature. There is therefore a pressing need to assist biocuration with automated text-mining tools. Here, we describe PubTator, a web-based system for assisting biocuration. PubTator differs from the few existing tools in two ways: it features a PubMed-like interface, which many biocurators find familiar, and it is equipped with multiple challenge-winning text-mining algorithms to ensure the quality of its automatic results. In a formal evaluation with two external user groups, PubTator was shown to improve both the efficiency and the accuracy of manual curation. PubTator is publicly available at http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/.
Year: 2013 PMID: 23703206 PMCID: PMC3692066 DOI: 10.1093/nar/gkt441
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Text-mining tools used for pre-annotating bio-entities in PubMed articles
| Bio-entity | Text-mining tool | Nomenclature | F1 score (%) |
|---|---|---|---|
| Gene (mention) | GeneTUKit | N/A | 82.97 |
| Gene (normalization) | GenNorm | NCBI Gene | 92.89 |
| Disease | DNorm | MEDIC | 80.90 |
| Species | SR4GN | NCBI Taxonomy | 85.42 |
| Chemical | A dictionary-based lookup approach | MeSH | 53.82 |
| Mutation | tmVar | NCBI dbSNP (rs#) or tmVar normalized forms | 93.98 |
The reported F1 scores (http://en.wikipedia.org/wiki/F1_score) were either taken from the tools' corresponding publications or assessed by us on public benchmark datasets. MEDIC is a disease vocabulary created by the Comparative Toxicogenomics Database; all other vocabularies are products of the National Library of Medicine. Separate tools are used to identify gene names in abstracts (mention) and to assign NCBI Gene identifiers to those mentions (normalization).
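The F1 score in the table is the harmonic mean of precision and recall, computed from true-positive (TP), false-positive (FP) and false-negative (FN) counts. The short sketch below illustrates the standard definition only; the counts in the usage note are hypothetical and are not taken from any of the listed tools' evaluations. The function returns a fraction, so multiply by 100 to obtain the percentages shown in the table.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions in [0, 1].

    precision = TP / (TP + FP)  -- share of predicted entities that are correct
    recall    = TP / (TP + FN)  -- share of gold-standard entities that were found
    F1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        # No correct predictions at all: F1 is defined as 0 here.
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For a hypothetical tool that finds 90 correct mentions, produces 10 spurious ones and misses 30, `precision_recall_f1(90, 10, 30)` gives a precision of 0.9, a recall of 0.75 and an F1 of about 0.818 (81.8%). The harmonic mean is used instead of the arithmetic mean so that a tool cannot score well by excelling at only one of the two measures.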
Figure 1. The PubTator homepage with five different search options.
Figure 2. The PubTator search results page. Automatically computed entities are highlighted in colours. Unlike in PubMed, article abstracts can be displayed on this page without navigating to a separate page.
Figure 3. The PubTator annotation page. The two radio buttons (Curatable/Not Curatable) at the top of the page are designed for document triage. The text box and the table below it are used for entity annotation, and the relationship table at the bottom of the page is used for relationship annotation. In Mention View, each row corresponds to an entity mention. In Concept View (the default), different mentions of the same concept (i.e. mentions sharing the same identifier) are combined and displayed in the same row.
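The difference between Mention View and Concept View amounts to grouping mention rows by their identifier. The sketch below is a hypothetical illustration of that grouping and not PubTator's actual implementation. The `Mention` and `ConceptRow` types, their fields, and the placeholder identifiers (`GENE_A`, `DISEASE_X`) are all assumptions. It also assumes that a concept is keyed by both entity type and identifier, and that repeated surface forms are listed once with a total mention count.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mention:
    """One Mention View row (hypothetical schema)."""
    text: str         # surface form as it appears in the abstract
    entity_type: str  # e.g. "Gene", "Disease"
    identifier: str   # normalized concept ID (placeholder values here)
    start: int        # character offset in the abstract


@dataclass
class ConceptRow:
    """One Concept View row: all mentions that share a concept."""
    entity_type: str
    identifier: str
    surface_forms: list[str] = field(default_factory=list)
    count: int = 0


def concept_view(mentions: list[Mention]) -> list[ConceptRow]:
    """Collapse mention rows into one row per (entity type, identifier).

    Rows appear in order of each concept's first mention in the text;
    dicts preserve insertion order in Python 3.7+.
    """
    rows: dict[tuple[str, str], ConceptRow] = {}
    for m in sorted(mentions, key=lambda m: m.start):
        key = (m.entity_type, m.identifier)
        row = rows.setdefault(key, ConceptRow(m.entity_type, m.identifier))
        row.count += 1
        if m.text not in row.surface_forms:
            row.surface_forms.append(m.text)
    return list(rows.values())
```

With four hypothetical mentions ("BRCA1" at offset 0, "breast cancer 1" at 12, "tumour" at 40 and "BRCA1" again at 80, the gene mentions sharing the identifier `GENE_A`), Mention View would show four rows, while `concept_view` returns two: one gene row listing both surface forms with a count of 3, and one disease row.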