Dokyun Na, Hyungbin Son, Jörg Gsponer.
Abstract
BACKGROUND: Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichment of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent-child terms are identical, neither of which is true in a biological sense. In addition, these tools output lists of GO terms that are often redundant or too specific, which makes them difficult to interpret in the context of the biological question investigated by the user. Therefore, there is a demand for a robust and reliable method for gene categorization and enrichment analysis.
Year: 2014 PMID: 25495442 PMCID: PMC4298957 DOI: 10.1186/1471-2164-15-1091
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Categorization based on the graphical structure of GO (GO Slim approach). A. Section of the GO structure. GO terms of interest are colored in green and categories to which GO terms are assigned are colored in blue. In this example, each of the blue GO terms refers to a particular category. B. Mapping results by the GO Slim approach, which takes only the graphical structure of GO into account.
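To make the structure-only mapping in Figure 1 concrete, the following is a minimal sketch of a GO-Slim-style assignment: a term receives every category whose representative term is the term itself or one of its ancestors. The term identifiers, the toy graph, and the function name `slim_categories` are hypothetical and chosen for illustration only; they are not taken from the paper or from any GO tool.

```python
from collections import deque

# Hypothetical toy DAG as child -> parents (not real GO identifiers).
PARENTS = {
    "G1": ["G0"],
    "G2": ["G0"],
    "G11": ["G1"],
    "G12": ["G1", "G2"],
    "G21": ["G2"],
}

# Hypothetical category terms (the "blue" terms in the figure).
SLIM_TERMS = {"G1": "Category A", "G2": "Category B"}


def slim_categories(term, parents=PARENTS, slim=SLIM_TERMS):
    """Return every category whose slim term is `term` or one of its ancestors."""
    seen, queue, found = {term}, deque([term]), set()
    while queue:
        current = queue.popleft()
        if current in slim:
            found.add(slim[current])
        # Walk upward through all parents; graph structure is the only input.
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found
```

Because only reachability is used, a term with two parents in different branches (here `G12`) lands in both categories with equal weight, regardless of how closely related it is to either one.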
Figure 2. Approach used in Categorizer to assign genes to categories. A. Three steps used in the categorization process: (i) information content calculation, (ii) semantic similarity score calculations for parent–child pairs and (iii) categorization according to the semantic similarity scores. See the main text for details. B. Illustrative (synthetic) example for the calculation of semantic similarity scores. Information content scores (I) are shown for each GO term. G0 is a root term. In this example, a user defined two categories (A and B) and assigned G22 to category A (orange), and G23 to category B (blue). Semantic similarity scores (S) of several terms are also shown.
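The three steps in Figure 2A can be sketched roughly as follows. This record does not give the similarity formula used in Categorizer, so the sketch substitutes Lin's similarity as a stand-in; it should not be read as the measure implemented in the tool. The annotation counts, the toy graph, the term `G3`, and all function names are assumptions made for illustration.

```python
import math

# Hypothetical annotation counts per term (each count includes descendants).
COUNTS = {"G0": 100, "G1": 40, "G2": 50, "G22": 10, "G23": 20, "G3": 5}
# Hypothetical DAG as child -> parents; G0 is the root.
PARENTS = {"G1": ["G0"], "G2": ["G0"], "G22": ["G1"], "G23": ["G2"],
           "G3": ["G22", "G23"]}
# User-defined categories, keyed by their representative term.
CATEGORY_TERMS = {"G22": "A", "G23": "B"}


def information_content(term):
    """Step (i): I(t) = -log p(t), with p(t) the fraction of annotations under t."""
    return -math.log(COUNTS[term] / COUNTS["G0"])


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def lin_similarity(a, b):
    """Step (ii), stand-in: 2 * I(most informative common ancestor) / (I(a) + I(b))."""
    shared = ancestors(a) & ancestors(b)
    mica_ic = max(information_content(t) for t in shared)
    total = information_content(a) + information_content(b)
    # Two root terms carry no information; report 0.0 instead of dividing by zero.
    return 2 * mica_ic / total if total > 0 else 0.0


def categorize(term):
    """Step (iii): return (best category, {category: score}) for a term."""
    scores = {cat: lin_similarity(term, rep) for rep, cat in CATEGORY_TERMS.items()}
    return max(scores, key=scores.get), scores
```

In this toy graph `G3` is a child of both category terms, so a structure-only mapping would treat the two categories equally; the information-content-weighted score instead favors the more specific parent.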
Categories provided with Categorizer
| Groups | Categories |
|---|---|
| Biological processes | Cell cycle, Cytoskeleton, Metabolism, Transcription, Translation, Protein folding, Proteolysis, Signaling, RNA processing, Splicing, Transmembrane transport, Intracellular localization, Protein transport, Nuclear transport, Vesicles, Golgi/ER, Mitochondria, Endo- and exo-cytosis, Lysosome, Peroxisome, Ribosomes, Phagocytosis/phagosome, Autophagy, Apoptosis, DNA repair, DNA replication, Receptors |
| Cellular localization | Cytoplasm, Mitochondria, Golgi, Nucleus, Cytoskeleton, Vesicle/Lysosome, ER, Extracellular |
| Enzyme functions | Hydrolase, Isomerase, Ligase, Lyase, Oxidoreductase, Transferase |
Figure 3. Snapshots of the GUI of Categorizer. A. Initial window for setting up the categorization parameters: category definitions, gene annotations, gene test set, background genes, and categorization options. B. Categorization results: category statistics (left), detailed categorization results (middle), and enrichment analysis results (right).
Figure 4. Comparison of results generated by using Categorizer and a GO-Slim-based approach. A. Overall accuracies of Categorizer, GO Slim and a random predictor. The categories of genetic modifiers obtained from a high-throughput screening study (Zhang et al.) were used as a gold standard. B. Enrichment of the 9 categories. All the genes tested for HD were used as a reference. C. Numbers of genes for each category in the test and randomized reference sets. Since multiple categorization was allowed, one gene may appear in several categories. Significantly enriched categories (p < 10⁻²) are marked with *.
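As an illustration of how a per-category enrichment p-value like the one thresholded in Figure 4 might be computed, here is a minimal sketch using a one-sided hypergeometric test. This record does not state which statistical test Categorizer uses, so the choice of test is an assumption, as are the function name `hypergeom_enrichment_p` and its parameter names.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: test-set genes assigned to the category
    n: size of the test set
    K: reference genes assigned to the category
    N: size of the reference (background) set
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of seeing k or more category members in the test set.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

A category would then be flagged in the way the caption describes by checking `hypergeom_enrichment_p(k, n, K, N) < 1e-2`. Because one gene may appear in several categories, each category is tested on its own counts.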
Figure 5. Performance comparison of different semantic similarity measures and the one implemented in Categorizer. Matthews correlation coefficient (MCC) values were calculated with HD modifiers, as done for the comparison of Categorizer and GO Slim in Figure 4A.
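For reference, the MCC of a binary prediction can be computed from confusion-matrix counts as in the sketch below. This is the standard definition; how the counts were pooled across categories and genes in the study is not stated in this record, so the function only covers the final formula. The name `mcc` and the zero-denominator convention are choices made for this example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return numerator / denominator if denominator > 0 else 0.0
```

The value ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect agreement), which makes it suitable for comparing predictors against a random baseline.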