Xin He, Moushumi Sen Sarma, Xu Ling, Brant Chee, Chengxiang Zhai, Bruce Schatz.
Abstract
BACKGROUND: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process, and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered.
Year: 2010 PMID: 20487560 PMCID: PMC2885378 DOI: 10.1186/1471-2105-11-272
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Conceptual overview of Genelist Analyzer. The program takes a group of genes as input, retrieves the relevant documents for each gene, and identifies the terms that are associated with this gene group (enriched terms). Further interactive analysis allows a user to trace back the documents containing the terms and genes. The example shown in this figure is hypothetical.
Figure 2. The result page of Genelist Analyzer. The top part of the screen provides control of the output; the "Significant Concepts" table displays the main concepts identified along with the relevant statistics (note that gene names are underlined and linked to the external program Gene Summarizer). The Ratio field of a concept is the percentage of genes associated with this concept, and the Significance field displays the statistical confidence score of that concept. The concepts are automatically clustered, and the index of the cluster to which a concept belongs is also shown. The "Genes Found" table displays information about the input genes: the name of each gene and the number of documents retrieved for it. This page was generated from the genes up-regulated by methoprene treatment in honey bees.
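The Ratio field described in the Figure 2 caption can be illustrated with a minimal sketch. The function name `concept_ratio`, the gene identifiers, and the gene-to-concept mapping below are hypothetical and are not taken from Genelist Analyzer itself; only the definition (percentage of input genes associated with a concept) comes from the caption.

```python
def concept_ratio(gene_concepts, concept):
    """Return the percentage of input genes associated with `concept`.

    `gene_concepts` maps each input gene name to the set of concepts
    found for it (a hypothetical data layout used only for illustration).
    """
    if not gene_concepts:
        return 0.0
    hits = sum(1 for concepts in gene_concepts.values() if concept in concepts)
    return 100.0 * hits / len(gene_concepts)


# Hypothetical input: four genes, two of which are associated with "myosin".
example = {
    "geneA": {"myosin", "filament"},
    "geneB": {"cytochrome"},
    "geneC": {"myosin"},
    "geneD": set(),
}

print(concept_ratio(example, "myosin"))
```

With this made-up input, two of the four genes carry the concept, so the ratio is 50 percent.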
Figure 3. The gene-term matrix from Genelist Analyzer. When the user chooses specific concepts of interest from the result screen and clicks the "Analyze" button, the program retrieves the genes that actually contain these concepts, ranked by their relevance, and displays the gene-term matrix. The supporting documents for each gene-term association can be accessed through the hyperlinks in the matrix. This page used the list of genes up-regulated by methoprene treatment in honey bees.
Overrepresented concepts in bee behavior-related genes identified by GO Toolbox and Genelist Analyzer.
| GO Toolbox | Genelist Analyzer |
|---|---|
| Defense response | Defense, cytokine |
| Response to stress, response to heat, response to temperature stimulus | Thermotolerance |
| Protein folding | Chaperone, cochaperone |
| Pigmentation, Dopamine metabolism, Catecholamine metabolism | Pigment, melanin, Laminin |
| Carbohydrate metabolism | Proteoglycan |
| Regulation of circadian rhythm | |
| Circadian sleep/wake cycle, sleep | |
| Transition metal ion homeostasis, Iron ion homeostasis | Ire, ferritin |
| Amino acid and derivative metabolism | Alanine |
| Sex determination | |
| Response to pest, pathogen or parasite | Bacteria, bacterial, gram, pathogen, macrophage, antimicrobial, imd |
Overrepresented concepts in genes responding to methoprene treatment, identified by Genelist Analyzer (top 30 terms) and PAKORA (at P < 0.01).
| Method | Overrepresented terms |
|---|---|
| Genelist Analyzer | Ca2, filament, sodium, light chain, cytochrome, electrophoresis, myosin heavy, sodium channel, heavy chain, cytochrome p450, polyacrylamide gel, thick filament, flight muscle, Na channel, myosin light, pyrethroid, channel gene, indirect flight, basement, basement membrane, kdr, proteasome, chain kinase, tubule, insecticide, iv, ATPase, muscle myosin, myofibril, dh31, indirect |
| PAKORA | phototactic, type, myosin, depressor, lattice, rod, insoluble, separation, resistant, oscillatory, flight, overlap, would, atpase, well, myofibril, built, sarcomere, time, rearing, corresponding, smooth, wall, there, ethyl, disappear, five |
Figure 4. Simple examples for the term significance test. Each table represents the (hypothetical) data for one test term. The second column shows the count of the test term in the document set of a gene, and the third column shows the expected count according to the null distribution (assuming that the term is not related to the gene). The expected count is the product of the frequency of the term in the background collection and the length of the document set of the gene. For example, in the first row of table (A), 5 means the term appears five times in all the documents associated with g1, and 0.1 is the expected count according to the background. (A) An example where the term may be related to the first two genes. (B) An example where the term does not appear to be significantly related to any gene.
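The expected-count calculation described in the Figure 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the background frequency, the document-set lengths, and the rows for g2 and g3 are made up, chosen so that g1 reproduces the observed count of 5 and expected count of 0.1 mentioned in the caption. The sketch only compares observed and expected counts; it does not implement the significance test itself, which the caption does not specify.

```python
def expected_count(background_freq, doc_set_length):
    """Expected occurrences of a term under the null hypothesis
    (the term is unrelated to the gene): background frequency of the
    term multiplied by the length of the gene's document set."""
    return background_freq * doc_set_length


# Assumed background frequency of the test term (hypothetical value).
BACKGROUND_FREQ = 1e-4

# Hypothetical rows: (gene, observed count, document-set length in words).
genes = [("g1", 5, 1000), ("g2", 4, 2000), ("g3", 0, 1500)]


def compare(genes, background_freq):
    """Pair each gene's observed count with its expected count."""
    rows = []
    for gene, observed, length in genes:
        rows.append((gene, observed, expected_count(background_freq, length)))
    return rows


for gene, observed, expected in compare(genes, BACKGROUND_FREQ):
    print(f"{gene}: observed={observed}, expected={expected:.2f}")
```

A large gap between the observed and expected counts, as for g1 here, is the kind of pattern table (A) describes; how that gap is turned into a confidence score is left to the actual test.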