Ran Rubinstein, Itamar Simon.
Abstract
BACKGROUND: High-throughput genomic research tools are becoming standard in the biologist's toolbox. After processing the genomic data with one of the many available statistical algorithms to identify statistically significant genes, these genes need to be further analyzed for biological significance in light of all the existing knowledge. Literature mining, the process of representing literature data in a fashion that is easy to relate to genomic data, is one solution to this problem.
Year: 2005 PMID: 15661078 PMCID: PMC547913 DOI: 10.1186/1471-2105-6-12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 3. Analysis of a list of genes affected by p53 overproduction. A. The number of genes remaining after filtering the p53-affected genes with terms intended to reveal known p53 targets. B. Average number of articles per gene in the different queries. C. Venn diagram depicting the different functions of p53-affected genes as reflected by a GeneRIF search. (1) Search term is "p53 AND (target OR transcriptional OR activation OR repression)".
Figure 1. The MILANO data input page.
Summary of Medline hit counts for all full-length mRNA genes (16,862 genes) using different search strategies.
| Type of primary term (a) | Positive results (b) | Non-reasonable results (c) | Articles per gene (d) |
| --- | --- | --- | --- |
| Symbol | 10,045 | 20 | 198 |
| Expanded | 12,028 | 140 | 817 |
| Filtered | 11,910 | 22 | 451 |

(a) The Medline search was conducted using three search strategies. Symbol refers to a search in which each gene was represented by its official symbol. Expanded refers to searches in which each gene was represented by the gene symbol, all its synonyms, and the official gene product name. Filtered refers to searches in which non-informative names were filtered out of the expanded list.
(b) Number of queries that returned at least one result.
(c) Number of queries that returned more than 33,000 results. We used 33,000 as a rough cutoff for non-reasonable results, based on the fact that some of the most investigated genes, like p53, appear in fewer than 33,000 abstracts.
(d) The average number of abstracts per gene, counting only genes that appeared at least once and did not appear in more than 33,000 abstracts.
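The three summary columns follow directly from the footnote definitions, and a short sketch may make the arithmetic concrete. The function name `summarize_hits` and the example counts are hypothetical and used for illustration only; this is not the authors' code, and the only value taken from the text is the 33,000 cutoff.

```python
from statistics import mean

# Cutoff for "non-reasonable" results, taken from the table footnote.
NON_REASONABLE_CUTOFF = 33_000


def summarize_hits(hit_counts):
    """Summarize per-gene Medline hit counts following the footnote definitions.

    hit_counts: dict mapping a gene identifier to the number of abstracts
    returned for that gene's query (hypothetical input format).
    """
    # Positive results: queries that returned at least one abstract.
    positive = [n for n in hit_counts.values() if n >= 1]
    # Non-reasonable results: queries that returned more than the cutoff.
    non_reasonable = [n for n in positive if n > NON_REASONABLE_CUTOFF]
    # Articles per gene: average over genes with 1..cutoff abstracts only.
    usable = [n for n in positive if n <= NON_REASONABLE_CUTOFF]
    return {
        "positive_results": len(positive),
        "non_reasonable_results": len(non_reasonable),
        "articles_per_gene": mean(usable) if usable else 0.0,
    }


# Made-up counts, for illustration only (not data from the paper).
example = {"GENE_A": 0, "GENE_B": 120, "GENE_C": 40_000, "GENE_D": 80}
summary = summarize_hits(example)
print(summary)
```

In this toy input, `GENE_A` is excluded from every column because it has no hits, and `GENE_C` counts as both a positive and a non-reasonable result but is left out of the average.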
Figure 2. An example of a MILANO search result, using a short list of gene symbols that the program expanded to include all their informative synonyms, versus p53-related terms. All reported numbers are hyperlinked and will initiate a new search for that specific term combination.
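The grid of term combinations described in this caption can be sketched as a cross product of primary terms (each gene with its synonyms) and secondary terms. How MILANO actually composes its queries is not stated here, so the helper names `or_group` and `build_query_grid`, the quoting scheme, and the choice of synonyms are all assumptions made for illustration.

```python
from itertools import product


def or_group(names):
    """Join names into one parenthesized OR group, quoting each name."""
    return "(" + " OR ".join(f'"{name}"' for name in names) + ")"


def build_query_grid(primary, secondary):
    """Build one query string per (gene, secondary term) combination.

    primary: dict mapping a gene symbol to its list of names (hypothetical format).
    secondary: list of secondary search terms.
    """
    grid = {}
    for (gene, names), term in product(primary.items(), secondary):
        # Each cell combines the gene's OR group with one secondary term.
        grid[(gene, term)] = f"{or_group(names)} AND ({term})"
    return grid


# Illustrative inputs: the synonym list is an assumption, not MILANO output.
primary = {"CDKN1A": ["CDKN1A", "p21"], "SFN": ["SFN"]}
secondary = [
    "p53",
    "p53 AND (target OR transcriptional OR activation OR repression)",
]

grid = build_query_grid(primary, secondary)
for key, query in grid.items():
    print(key, "->", query)
```

Each dictionary entry corresponds to one cell of the result table; in a real tool the query string would be submitted to the literature database and replaced by its hit count.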
Comparative analysis of literature mining tools. Eleven known p53 target genes were analyzed using five methods. The numbers represent the number of co-occurrences of each gene with the term "P53".
| CDKN1A | ||||||
| GADD45A | ||||||
| SFN | 0 | |||||
| IGFBP3 | ||||||
| TNFRSF6 | ||||||
| MUC2 | ||||||
| MYC | ||||||
| PCNA | ||||||
| ACTA2 | 0 | 0 | 0 | 0 | ||
| XRCC5 | ||||||
| TRAF4 |
(a) The search was performed with LocusLink IDs as the primary search terms.
(b) The search was performed with the primary gene symbols as the primary search terms.
(c) The search was performed with UniGene IDs as the primary search terms.