Conrad Plake, Loic Royer, Rainer Winnenburg, Jörg Hakenberg, Michael Schroeder.
Abstract
High-throughput screens such as microarrays and RNAi screens produce huge amounts of data. They typically result in hundreds of genes, which are often further explored and clustered via enriched GeneOntology terms. The strength of such analyses is that they build on high-quality manual annotations provided with the GeneOntology. However, the weakness is that annotations are restricted to process, function and location and that they do not cover all known genes in model organisms. GoGene addresses this weakness by complementing high-quality manual annotation with high-throughput text mining that extracts co-occurrences of genes and ontology terms from the literature. GoGene contains over 4,000,000 associations between genes and gene-related terms for 10 model organisms, extracted from more than 18,000,000 PubMed entries. It covers not only process, function and location of genes, but also biomedical categories such as diseases, compounds, techniques and mutations. By bringing it all together, GoGene provides the most recent and most complete facts about genes and can rank them according to novelty and importance. GoGene accepts keywords, gene lists, gene sequences and protein sequences as input and supports search for genes in PubMed, EntrezGene and via BLAST. Since all associations of genes to terms are supported by evidence in the literature, the results are transparent and can be verified by the user. GoGene is available at http://gopubmed.org/gogene.
Year: 2009 PMID: 19465383 PMCID: PMC2703922 DOI: 10.1093/nar/gkp429
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The GoGene web interface showing a result from a search for 400 gene IDs taken from an outcome of a microarray experiment on pancreatic cancer. On the left, all relevant concepts from GO and MeSH are shown. Clicking on a concept shows the related genes on the right. Each gene entry primarily consists of a title, a short summary and a link to a detailed gene page, where all information on a gene is summarized.
Figure 2. The distribution of Dice coefficients (see text for explanation) from pairwise species comparison. As expected, annotation sets of non-orthologous gene pairs show significantly lower Dice coefficients than those of orthologous gene pairs.
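For readers without access to the full text, the Dice coefficient of two sets A and B is conventionally defined as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no shared elements) to 1 (identical sets). The following is a minimal sketch of how it could be computed for the annotation sets of two genes. The function name, the example term sets and the handling of two empty sets are illustrative assumptions, not taken from GoGene.

```python
def dice_coefficient(terms_a, terms_b):
    """Dice coefficient of two annotation sets: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        # Assumption: two empty annotation sets are scored 0.0 here;
        # the paper's convention for this case is not known.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical annotation sets for a pair of genes (illustrative only).
gene_x_terms = {"cell cycle", "mitosis", "neoplasms"}
gene_y_terms = {"cell cycle", "mitosis", "apoptosis"}

# Two shared terms out of three each: 2 * 2 / (3 + 3)
score = dice_coefficient(gene_x_terms, gene_y_terms)
print(round(score, 3))
```

Under this definition, a gene pair whose annotation sets overlap heavily scores close to 1, which is the behavior the caption describes for orthologous pairs.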
Figure 3. Disease associations mined from literature for all genes (top) compared to disease associations for known cell-cycle genes. Cell-cycle genes are enriched in neoplasms and depleted in nutritional and metabolic diseases.