Chih-Hsuan Wei, Bethany R Harris, Donghui Li, Tanya Z Berardini, Eva Huala, Hung-Yu Kao, Zhiyong Lu.
Abstract
Today's biomedical research has become heavily dependent on access to the biological knowledge encoded in expert curated biological databases. As the volume of biological literature grows rapidly, it becomes increasingly difficult for biocurators to keep up with the literature because manual curation is an expensive and time-consuming endeavour. Past research has suggested that computer-assisted curation can improve efficiency, but few text-mining systems have been formally evaluated in this regard. Through participation in the interactive text-mining track of the BioCreative 2012 workshop, we developed PubTator, a PubMed-like system that assists with two specific human curation tasks: document triage and bioconcept annotation. On the basis of evaluation results from two external user groups, we find that the accuracy of PubTator-assisted curation is comparable with that of manual curation and that PubTator can significantly increase human curatorial speed. These encouraging findings warrant further investigation with a larger number of publications to be annotated.
Database URL: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
Year: 2012 PMID: 23160414 PMCID: PMC3500520 DOI: 10.1093/database/bas041
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Table 1. The curation tasks and testing corpora for PubTator evaluation
| Group | Gold standard (50 abstracts) | Curation tasks |
|---|---|---|
| NLM | Sampled from the 151 gene indexing assistant test collection | Gene indexing (mention level) |
| TAIR | Sampled from all the papers reviewed by the TAIR group in December 2011 | Gene indexing (document level); document triage |
Table 2. The statistics of testing corpora for PubTator evaluation
| Gold standard | PubMed set (25 docs) | PubTator set (25 docs) |
|---|---|---|
| NLM—gene indexing | 188 Gene mentions | 172 Gene mentions |
| TAIR—gene indexing | 44 Gene identifiers | 29 Gene identifiers |
| TAIR—document triage | 13 Relevant articles | 11 Relevant articles |
Figure 1. Comparison of human curation accuracy for the gene indexing task by using PubMed versus PubTator. (a) NLM mention-level results. (b) TAIR document-level results.
Figure 2. Comparison of human curation accuracy for the document triage task by using PubMed versus PubTator (TAIR).
Figure 3. Comparison of human curation speed for the gene indexing task by using PubMed versus PubTator. The black bars represent the standard deviation of curation time. (a) NLM results. (b) TAIR results.
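The accuracy comparisons in the figures rest on scoring a curator's output against a gold standard. This record does not state which metrics were used, so the sketch below assumes the precision, recall and F-measure commonly reported in text-mining evaluations. The function name `score_against_gold` and the gene identifiers are hypothetical and chosen only for illustration; they are not data from the study.

```python
def score_against_gold(curated, gold):
    """Return (precision, recall, f1) for curated items versus a gold standard.

    Both arguments are iterables of hashable items, for example gene
    identifiers assigned to one document (document-level gene indexing).
    """
    curated, gold = set(curated), set(gold)
    true_positives = len(curated & gold)

    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(curated) if curated else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical identifiers, not taken from the NLM or TAIR corpora.
gold = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
curated = {"GENE_A", "GENE_B", "GENE_E"}

precision, recall, f1 = score_against_gold(curated, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this assumption, the same function would be run once on annotations produced with PubMed and once on annotations produced with PubTator, and the two score triples compared.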