Da Wei Huang, Brad T Sherman, Qina Tan, Jack R Collins, W Gregory Alvord, Jean Roayaei, Robert Stephens, Michael W Baseler, H Clifford Lane, Richard A Lempicki.
Abstract
The DAVID Gene Functional Classification Tool (http://david.abcc.ncifcrf.gov) uses a novel agglomeration algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules. This organization is accomplished by mining the complex biological co-occurrences found in multiple sources of functional annotation. It is a powerful method to group functionally related genes and terms into a manageable number of biological modules for efficient interpretation of gene lists in a network context.
Year: 2007 PMID: 17784955 PMCID: PMC2375021 DOI: 10.1186/gb-2007-8-9-r183
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flow chart of the procedures for the DAVID Gene Functional Classification Tool and the DAVID Functional Annotation Clustering Tool.
Figure 2. A hypothetical example of detecting gene-gene functional relationships by kappa statistics. (a) The all-redundant and structured terms are broken into 'independent' terms in a flat linear collection. Each gene is associated with some of the terms in the annotation term collection, so a gene-annotation matrix can be built in a binary format, where 1 represents a positive match for the particular gene-term pair and 0 represents the unknown. Thus, each gene has a unique profile of annotation terms represented by a combination of 1s and 0s. (b) For a particular example of genes a and b, a contingency table was constructed for the kappa statistics calculation. The high kappa score (0.66) indicates that genes a and b are in considerable agreement, more so than expected by random chance. By flipping the table 90 degrees, the term-term kappa score can be obtained, based on the agreement of common genes (not shown). For more information see Additional data files 11 and 12.
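The contingency-table calculation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration using the standard Cohen's kappa formula for two binary profiles; the function name `kappa_score` and the two example profiles are hypothetical and are not the data behind the 0.66 score in the figure.

```python
from typing import Sequence


def kappa_score(profile_a: Sequence[int], profile_b: Sequence[int]) -> float:
    """Cohen's kappa for two binary annotation profiles of equal length."""
    if len(profile_a) != len(profile_b) or len(profile_a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(profile_a)
    pairs = list(zip(profile_a, profile_b))

    # Cells of the 2x2 contingency table.
    n11 = pairs.count((1, 1))  # term annotated to both genes
    n10 = pairs.count((1, 0))  # term annotated to gene a only
    n01 = pairs.count((0, 1))  # term annotated to gene b only
    n00 = pairs.count((0, 0))  # term annotated to neither gene

    # Observed agreement and the agreement expected by chance.
    observed = (n11 + n00) / n
    expected = ((n11 + n10) * (n11 + n01) + (n01 + n00) * (n10 + n00)) / (n * n)

    if expected == 1.0:
        # Degenerate case (both profiles all 0s or all 1s): kappa is undefined.
        # Returning 0.0 here is an assumption of this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotation profiles over ten terms (made-up values).
gene_a = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
gene_b = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
print(round(kappa_score(gene_a, gene_b), 2))  # 0.78
```

Applying the same function to two columns of the gene-annotation matrix, rather than two rows, gives the term-term score mentioned in the caption.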
Figure 3. The gene-gene functional relationship can be specifically detected by kappa statistics. (a) Kappa scores were calculated for all possible combinations of human gene-gene pairs (approximately 300 million). Only gene-gene pairs that share a larger number of annotation terms can have good kappa values. The box plot consists of the smallest and largest observations at the two end points (95% confidence interval), as well as a box from the 1st to 3rd quartiles. The blue and red lines represent median and mean observations, respectively. (b) Kappa scores were calculated for all possible human gene-gene pairs, gene-gene pairs with randomized annotation terms, all collected protein-protein interacting pairs, and all 'chemokine' gene pairs, respectively. The distributions of the kappa scores from protein-protein interacting pairs (pink) and 'chemokine' gene pairs (light blue) shift significantly to the high value end compared to the human total (blue); conversely, the kappa score distribution (yellow) of gene pairs with randomized annotation terms remains at the lower value end, below 0.35. Interestingly, for the human genome (blue), over 50% of the kappa scores equal 0 (no detectable relationships) and >95% are lower than 0.35. Altogether, this indicates that kappa statistics can specifically detect gene-gene functional relationships.
Figure 4. Graphical illustration of the heuristic fuzzy partition algorithm. (a) Hypothetically, each element (gene) can be positioned in a virtual two-dimensional space, based on its characteristics (annotation terms). The distance represents the degree of relationship (kappa score) among the genes. (b) Any gene can serve as a medoid to form an initial seeding group. Only the initial groups with enough closely related members (for example, members >3 and kappa score ≥0.4) are qualified (solid-line circles). Conversely, unqualified ones are shown as dashed-line circles. (c) Qualified initial seeding groups are iteratively merged with one another to form larger groups based on the multi-linkage rule, that is, sharing 50% or more of memberships, until all secondary clusters (thicker ovals) are stable. Importantly, the genes not covered by any qualified initial seeding group are considered outliers (in gray). (d) Finally, three final groups (thicker ovals) are formed because they can no longer be merged with any other group. One gene (in red) belonging to two groups represents the fuzziness capability of the algorithm. Outliers (in gray in (c)) are removed for clearer presentation. A step-by-step example can be found in Additional data file 13.
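The seeding and merging steps in the Figure 4 caption can be sketched in a few lines. This is a simplified illustration, not the DAVID implementation: the function name `fuzzy_partition`, its parameter names, and the demo genes and kappa values are all hypothetical. Two details are assumptions of the sketch: a seed qualifies on size alone (the medoid plus every gene at or above the kappa threshold, counted together), and "sharing 50% or more of memberships" is measured against the smaller of the two groups.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def fuzzy_partition(
    genes: List[str],
    kappa: Dict[FrozenSet[str], float],
    kappa_threshold: float = 0.4,
    min_members: int = 4,        # "members >3", counting the medoid (assumption)
    merge_overlap: float = 0.5,  # share of the smaller group (assumption)
) -> Tuple[List[FrozenSet[str]], List[str]]:
    # Step (b): every gene acts as a medoid and collects its close neighbours.
    groups: List[FrozenSet[str]] = []
    for medoid in genes:
        seed = frozenset(
            [medoid]
            + [g for g in genes
               if g != medoid
               and kappa.get(frozenset((medoid, g)), 0.0) >= kappa_threshold]
        )
        if len(seed) >= min_members and seed not in groups:
            groups.append(seed)

    # Step (c): merge any two groups with enough shared members until stable.
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(groups)), 2):
            a, b = groups[i], groups[j]
            if len(a & b) / min(len(a), len(b)) >= merge_overlap:
                union = a | b
                groups = [g for k, g in enumerate(groups) if k not in (i, j)]
                if union not in groups:
                    groups.append(union)
                merged = True
                break

    # Genes not covered by any group are reported as outliers.
    covered = set().union(*groups)
    outliers = [g for g in genes if g not in covered]
    return groups, outliers


# Hypothetical demo: two tight clusters, a gene X related to both, an outlier Z.
genes = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4", "X", "Z"]
kappa: Dict[FrozenSet[str], float] = {}
for cluster in (["A1", "A2", "A3", "A4"], ["B1", "B2", "B3", "B4"]):
    for g, h in combinations(cluster, 2):
        kappa[frozenset((g, h))] = 0.8
for g in ["A1", "A2", "A3", "A4"]:
    kappa[frozenset(("X", g))] = 0.6
kappa[frozenset(("X", "B1"))] = 0.5

groups, outliers = fuzzy_partition(genes, kappa)
for group in groups:
    print(sorted(group))
print("outliers:", outliers)
```

In this demo, X ends up in both final groups, which corresponds to the fuzziness the caption describes, and Z is left as an outlier because no qualified seed covers it.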
Figure 5. A text format report from the Gene Functional Classification Tool. The example shows the output of 16 genes (Additional data file 1) analyzed by the tool with default settings. Without prior knowledge, the tool is able to classify genes into three functional gene groups. On each group header, a set of buttons is provided for in-depth exploration of the annotation for the group. 'T' reports the major enriched annotation terms associated with the group. The 'Heat Map' symbol provides a detailed graphical view of gene-term relationships. 'RG' searches for other related genes that are in the genome but not in the list.
Figure 6. An example of the genes-to-terms 2-D view. All 23 related kinase genes and their associated annotation terms from gene group 3 (kinase group) for demo list 2 are displayed in a 2-D heat map-like interactive graphical view. Green represents a positive association between a gene and a term; conversely, black represents an unknown relationship. The annotation terms are ordered based on their enrichment scores associated with the group. The annotations common to the kinases (big green block) are shown on the left side, and the scattered pattern (green and black) on the right side shows the functional differences.
The top 20 enriched terms for demo list 2 by various traditional functional annotation tools
| No. | GoMiner | DAVID Chart | GOstat | Ontologizer | topGO elim | ADGO |
| 1 | Inflammatory response | Response to pathogenic bacteria | Cell-cell signaling | Response to stimulus | Induction of positive chemotaxis | Inflammatory response/extracellular region |
| 2 | Clathrin coat of coated pit | Chemokine activity | Response to pest, pathogen or parasite | DNA repair | Positive regulation of vascular endothelium | Inflammatory response |
| 3 | Viral genome replication | Cell migration | Response to stress | Cell surface receptor linked signal transduction | Chemokine activity | Cell-cell signaling/extracellular space |
| 4 | Morphogenesis | Clathrin-coated vesicle | Response to external biotic stimulus | Positive regulation of protein metabolic process | Angiogenesis | Soluble fraction/chemokine activity |
| 5 | Cytokine activity | Clathrin vesicle coat | Response to wounding | Cytoskeleton organization and biogenesis | Vascular endothelial growth factor receptor | Extracellular space |
| 6 | Establishment of spindle localization | Clathrin coated vesicle membrane | Negative regulation of biological process | Molecular_function | Extracellular matrix binding | Sensory perception/chemokine activity |
| 7 | Cell communication | Receptor binding | Negative regulation of physiological process | Cell communication | Viral genome replication | Inflammatory response/chemokine activity |
| 8 | Establishment of mitotic spindle localization | Response to other organism | Cytoplasmic vesicle membrane | DNA binding | Extracellular space | Sensory perception/extracellular space |
| 9 | Regulation of cellular process | Kinase activity | Cytoplasmic vesicle membrane | Protein binding | Cell-cell signaling | Chemokine activity |
| 10 | Regulation of biological process | RNA polymerase II transcription factor activity | Negative regulation of cellular process | Cell cortex | Inflammatory response | Chemotaxis/extracellular space |
| 11 | Development | Clathrin coat | Regulation of biological process | Mitochondrial part | Vasculogenesis | G-protein coupled receptor protein signaling pathway/extracellular space |
| 12 | Signal transduction | Establishment of cellular localization | Cell proliferation | GTPase activity | Chemotaxis | Inflammatory response/extracellular space |
| 13 | Viral infectious cycle | Cell differentiation | Phagocytic vesicle | Chemotaxis | Neutrophil activation | Extracellular space/chemokine activity |
| 14 | Positive regulation of protein metabolism | Cell death | Calpain inhibitor activity | Anatomical structure formation | Ammonia ligase activity | G-protein coupled receptor protein signaling pathway/chemokine activity |
| 15 | Regulation of protein-nucleus import | Regulation of isotype switching | Cell adhesion | Lyase activity | Endothelin-converting enzyme 1 activity | Chemotaxis/soluble fraction |
| 16 | Immune cell migration | Membrane-bound vesicle | Negative regulation of cellular physiological process | Interleukin-12 production | U-plasminogen activator receptor activity | Cell-cell signaling/chemokine activity |
| 17 | Organ development | Cell cycle | Vesicle membrane | Nitrogen compound biosynthetic process | Cell adhesion | Cell proliferation/extracellular space |
| 18 | Organogenesis | Membrane fraction | Inflammatory response | DNA recombination | Fructose metabolism | Extracellular region/chemokine activity |
| 19 | Chemotaxis | Angiogenesis | Cell communication | Cytokine biosynthetic process | Response to pathogenic bacteria | G-protein coupled receptor protein signaling pathway/soluble fraction |
| 20 | Taxis | Cell communication | Cell differentiation | Immune system process | Hyaluronic acid binding | Sensory perception/extracellular region |
| Total | 380 terms | 157 terms | 119 terms | 31 terms | 160 terms | 67 terms |
The example gene list was analyzed by GoMiner, DAVID, GOstat, Ontologizer, topGO, and ADGO. The annotation data coverage was set to GO terms of all levels, and all other parameters were left at each tool's default settings. Only the top 20 terms from each tool are shown (see Additional data file 15 for all results). Many of the terms are redundant or found within the same hierarchy. We emphasize the top 20 terms for three reasons: first, the top ranked terms represent the overall quality of the tools in terms of sensitivity and specificity; second, it makes the amount of analytical effort equivalent and comparable throughout the comparisons, including the clustered results; and third, analysts usually spend more time and attention on the top ranked terms due to time and focus constraints.
Sixteen total gene functional groups identified by the Functional Classification Tool
| Gene functional group no. | Associated biology | Group enrichment score |
| 1 | Chemokine/cytokine | 3.37 |
| 2 | Transcription regulation | 2.89 |
| 3 | Signal transduction/membrane receptors | 2.68 |
| 4 | Kinase activity | 2.54 |
| 5 | DNA damage/repair | 2.23 |
| 6 | Iron binding | 2.05 |
| 7 | RNA processing/splicing factors | 1.81 |
| 8 | Organic acid transport | 1.71 |
| 9 | Cation/ion transport | 1.69 |
| 10 | DNA metabolism/chromosome organization | 1.53 |
| 11 | Cellular macromolecule catabolism | 1.41 |
| 12 | Metalloprotease | 1.34 |
| 13 | Microtubule | 1.24 |
| 14 | Protein localization/fusion | 1.17 |
| 15 | Amine metabolism | 1.1 |
| 16 | RAS small GTPase | 1.03 |
The genes of demo list 2 were analyzed by the Functional Classification Tool. The major biology terms associated with each group are manually summarized based on gene-term enrichment buttons provided for each functional group.
The top 20 annotation clusters identified by the DAVID Functional Annotation Clustering Tool
| Annotation cluster | Representative annotation terms | Enrichment score |
| 1 | Negative regulation of biological process | 5.38 |
| 2 | Signal transduction | 4.36 |
| 3 | Inflammatory response | 3.75 |
| 4 | Extracellular region | 3.69 |
| 5 | Cytokine/chemokine activity | 3.12 |
| 6 | Viral genome replication | 2.23 |
| 7 | Cell death/apoptosis | 2.19 |
| 8 | Regulation of biological process | 2.18 |
| 9 | Organ morphogenesis | 2.06 |
| 10 | Regulation of cell cycle | 2.01 |
| 11 | Positive regulation of biological process | 1.87 |
| 12 | Biological process unknown | 1.76 |
| 13 | Physiological interaction between organisms | 1.69 |
| 14 | Antimicrobial humoral response | 1.52 |
| 15 | Transcription cofactor activity | 1.46 |
| 16 | Integral to plasma membrane | 1.44 |
| 17 | Coated vesicle membrane | 1.42 |
| 18 | DNA repair/DNA metabolism | 1.38 |
| 19 | Kinase activity | 1.3 |
| 20 | Myoblast differentiation | 1.3 |
The genes of demo list 2 were analyzed by the Functional Annotation Clustering Tool. The top 20 annotation clusters, out of 65 total clusters, have group enrichment scores of 1.3 or higher on the minus log scale (equivalent to 0.05 or lower on the linear scale). The clusters are ordered by group enrichment score. The representative biology terms associated with the top 20 annotation clusters are manually selected, giving a much clearer and less redundant view of the annotation terms associated with the study.
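The correspondence between 0.05 and 1.3 quoted above is a base-10 minus log conversion, which can be checked with a short sketch. The helper names `to_minus_log10` and `from_minus_log10` are hypothetical and are not part of any DAVID interface.

```python
import math


def to_minus_log10(value: float) -> float:
    """Convert a linear-scale value (for example 0.05) to the minus log scale."""
    return -math.log10(value)


def from_minus_log10(score: float) -> float:
    """Convert a minus-log-scale score back to the linear scale."""
    return 10.0 ** (-score)


print(round(to_minus_log10(0.05), 2))   # 1.3
print(round(from_minus_log10(1.3), 3))  # 0.05
```

Because the scale is inverted, larger enrichment scores correspond to smaller linear-scale values, so the top-ranked clusters in the table are the ones furthest above 1.3.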