Lars Olsen, Ulrich Johan Kudahl, Ole Winther, Vladimir Brusic.
Abstract
BACKGROUND: As the output of biological assays increases in resolution and volume, the body of specialized biological data, such as functional annotations of gene and protein sequences, enables extraction of the higher-level knowledge needed for practical application in bioinformatics. Whereas common types of biological data, such as sequence data, are extensively stored in biological databases, functional annotations, such as immunological epitopes, are found primarily in semi-structured formats or as free text embedded in the primary scientific literature.
Year: 2013 PMID: 24564403 PMCID: PMC3852072 DOI: 10.1186/1471-2164-14-S5-S14
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flow chart of tasks in the conceptual framework for semi-automated updating of knowledgebases.
Examples of PubMed results from a selection of keyword searches (publication dates from December 1, 2009 to March 29, 2013).
| Keyword | PubMed hits |
|---|---|
| cancer OR tumor OR antigen OR epitope | 552309 |
| (tumor OR cancer) AND (antigen OR epitope) | 45517 |
| tumor AND antigen | 40525 |
| tumor antigen | 22264 |
| tumor AND antigen AND epitope | 3057 |
| tumor AND antigen AND epitope AND T cell | 852 |
| "tumor antigen" | 642 |
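
The table shows how sharply Boolean operators and phrase quoting narrow a PubMed result set. The source does not say how the counts were obtained; the sketch below assumes the NCBI E-utilities `esearch` endpoint and only builds the request URLs for the same date window, without contacting the server. The helper name `build_count_url` is hypothetical, and counts returned today may differ from those in the table.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumed here; the source does not name its query tool).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, mindate="2009/12/01", maxdate="2013/03/29"):
    """Return an esearch URL that asks PubMed for the hit count of `term`."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "rettype": "count",   # return only the number of hits
    }
    return f"{ESEARCH}?{urlencode(params)}"


# The keyword searches listed in the table.
QUERIES = [
    "cancer OR tumor OR antigen OR epitope",
    "(tumor OR cancer) AND (antigen OR epitope)",
    "tumor AND antigen",
    "tumor antigen",
    "tumor AND antigen AND epitope",
    "tumor AND antigen AND epitope AND T cell",
    '"tumor antigen"',
]

if __name__ == "__main__":
    for query in QUERIES:
        print(build_count_url(query))
```

Building the URLs separately from fetching them keeps the query list easy to review before any requests are sent.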
Figure 2. Learning curve for training sets of increasing size. The initial training set consisted of 13 relevant and 13 irrelevant abstracts. The training set was increased to 260 abstracts in increments of 26 abstracts. The test set was fixed at 50 abstracts: 25 relevant and 25 irrelevant.
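
The training schedule in this caption can be sketched as follows. This is an illustration only: it assumes each increment of 26 abstracts stays balanced (13 relevant, 13 irrelevant), which the caption states only for the initial set, and `score_fn` is a hypothetical stand-in for training a classifier on a subset and scoring it on the fixed 50-abstract test set.

```python
from typing import Callable, List, Sequence, Tuple


def training_sizes(start: int = 26, step: int = 26, stop: int = 260) -> List[int]:
    """Training-set sizes from `start` to `stop` inclusive, in steps of `step`."""
    return list(range(start, stop + 1, step))


def learning_curve(
    relevant: Sequence,
    irrelevant: Sequence,
    score_fn: Callable[[list], float],
) -> List[Tuple[int, float]]:
    """Score a model on growing, class-balanced training subsets.

    `score_fn` (hypothetical) receives the training subset and returns a
    performance score measured on a fixed, separate test set.
    """
    curve = []
    for n in training_sizes():
        half = n // 2  # assumed: equal numbers of relevant and irrelevant abstracts
        subset = list(relevant[:half]) + list(irrelevant[:half])
        curve.append((n, score_fn(subset)))
    return curve


if __name__ == "__main__":
    print(training_sizes())
```

With the default arguments the schedule contains ten training-set sizes, so the learning curve has ten points.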
Figure 3. Average frequency of the ten most discriminative terms between relevant abstracts (above the x axis) and irrelevant abstracts (below the x axis). Significance of the difference is based on a t test of term frequency between the two corpora; p-values are listed between the bars. Terms are stemmed so that all word forms are counted.
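
A comparison of this kind can be sketched with the standard library alone. The caption does not say which t test variant or stemmer was used, so the sketch below makes two labeled assumptions: it computes Welch's t statistic (unequal variances), and it approximates stemming by prefix matching. The function names are hypothetical, and converting the statistic to a p-value would need a t distribution from a statistics package.

```python
import math
import re
from statistics import mean, variance


def term_frequency(text: str, stem: str) -> float:
    """Fraction of tokens in `text` that start with `stem`.

    Prefix matching is a crude stand-in for real stemming (assumption).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t.startswith(stem) for t in tokens) / len(tokens)


def welch_t(a, b) -> float:
    """Welch's t statistic for two samples of per-abstract term frequencies."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    return (mean(a) - mean(b)) / se


def compare_term(relevant_texts, irrelevant_texts, stem: str) -> float:
    """t statistic for the frequency of `stem` in relevant vs irrelevant abstracts."""
    freq_rel = [term_frequency(t, stem) for t in relevant_texts]
    freq_irr = [term_frequency(t, stem) for t in irrelevant_texts]
    return welch_t(freq_rel, freq_irr)
```

A positive statistic means the term is more frequent in the relevant corpus; a negative one means it is more frequent in the irrelevant corpus.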
Figure 4. Number of entries in PubMed, UniProt/Swiss-Prot, and COSMIC. Entries in PubMed were filtered by the search term "(tumor OR cancer) AND (antigen OR epitope)".