Kay Prüfer, Bjoern Muetzel, Hong-Hai Do, Gunter Weiss, Philipp Khaitovich, Erhard Rahm, Svante Pääbo, Michael Lachmann, Wolfgang Enard.
Abstract
BACKGROUND: Genome-wide expression, sequence and association studies typically yield large sets of gene candidates, which must then be further analysed and interpreted. Information about these genes is increasingly being captured and organized in ontologies, such as the Gene Ontology. Relationships between the gene sets identified by experimental methods and biological knowledge can be made explicit and used in the interpretation of results. However, it is often difficult to assess the statistical significance of such analyses since many inter-dependent categories are tested simultaneously.
Year: 2007 PMID: 17284313 PMCID: PMC1800870 DOI: 10.1186/1471-2105-8-41
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Properties of the four category tests
| Test | Variable type | First alternative tested | Second alternative tested | Example |
|---|---|---|---|---|
| Hypergeometric | binary | Category contains a lower proportion of variable "1" than the root category | Category contains a higher proportion of variable "1" than the root category | Gene list with detected genes on an array; "0" is not differentially expressed, "1" is differentially expressed |
| Wilcoxon rank | continuous | Sum of ranks of genes in the category is higher than that of all other genes | Sum of ranks of genes in the category is lower than that of all other genes | Gene list with detected genes on an array; the continuous variable is the probability of not being differentially expressed |
| Binomial | two counts | Frequency of countA in category is lower than in root category | Frequency of countB in category is lower than in root category | CountA is the number of SAGE tags in experiment A, countB is the number of SAGE tags in experiment B |
| 2 × 2 contingency | four counts | Counts are dependent and countA/countB < countC/countD | Counts are dependent and countA/countB > countC/countD | CountA and countC are differences at nonsynonymous sites between and within species, countB and countD are differences at synonymous sites between and within species, respectively |
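The hypergeometric row can be illustrated with a small calculation. The following is a minimal sketch using only the Python standard library; it is not FUNC's own code, and the function name `hypergeom_tail` and its parameter names are hypothetical. It computes the probability of seeing at least (or at most) `k` genes labelled "1" in a category of `n` genes, when the root category holds `N` genes of which `K` are labelled "1".

```python
from math import comb


def hypergeom_tail(k, K, n, N, tail="upper"):
    """Tail probability of the hypergeometric distribution.

    N: number of genes in the root category
    K: number of root genes labelled "1"
    n: number of genes in the tested category
    k: number of genes labelled "1" in the tested category
    tail="upper" gives P(X >= k), i.e. a higher proportion of "1" than the root;
    tail="lower" gives P(X <= k), i.e. a lower proportion of "1" than the root.
    """
    lowest = max(0, n - (N - K))   # smallest possible count of "1" in the category
    highest = min(n, K)            # largest possible count of "1" in the category
    if tail == "upper":
        support = range(max(k, lowest), highest + 1)
    else:
        support = range(lowest, min(k, highest) + 1)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in support) / total


# Toy numbers (invented for illustration): 20 genes in the root, 5 labelled "1",
# a category of 4 genes containing 2 labelled "1".
p_over = hypergeom_tail(2, 5, 4, 20, tail="upper")
p_under = hypergeom_tail(2, 5, 4, 20, tail="lower")
print(f"over-representation p = {p_over:.4f}, under-representation p = {p_under:.4f}")
```

The two tails correspond to the two alternatives listed for the hypergeometric test; both include the observed count, so they sum to more than one.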
Figure 1. Schematic overview of FUNC. See main text for a description.
Figure 2. Illustration of how the global p-value is calculated. On the left ((a) and (c)) the cumulative p-value distribution between 0 and 0.05 is shown for the data set (red line) and the random sets (black or gray lines). For each distribution its maximal rank is determined, and the maximal rank of the data set (red arrow) is compared to the maximal ranks of the random sets ((b) and (d)). The upper two panels exemplify this principle with three random sets, and the lower two panels show the result of testing the ontology molecular function for an excess of amino acid changes in primates (see results and discussion).
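The caption does not spell out how the ranks are formed, so the following Python sketch shows only one possible reading of it and should not be taken as FUNC's implementation. It assumes that, at each p-value cutoff, every set (data or random) is ranked by how many of its category p-values fall at or below that cutoff, that a set's maximal rank is the highest rank it reaches over all cutoffs, and that the global p-value is the fraction of random sets whose maximal rank is at least that of the data set. All function names, the choice of cutoffs, and the tie handling are assumptions.

```python
def cumulative_counts(pvalues, cutoffs):
    """Number of category p-values at or below each cutoff."""
    return [sum(p <= c for p in pvalues) for c in cutoffs]


def max_rank(index, curves):
    """Highest rank reached by curves[index] over all cutoffs.

    Assumed ranking rule: at one cutoff, the rank of a curve is the number
    of curves (itself included) whose count does not exceed its own.
    """
    own = curves[index]
    return max(
        sum(other[j] <= own[j] for other in curves)
        for j in range(len(own))
    )


def global_p(data_pvalues, random_pvalue_sets, cutoffs):
    """Fraction of random sets whose maximal rank is >= that of the data set."""
    curves = [cumulative_counts(data_pvalues, cutoffs)]
    curves += [cumulative_counts(r, cutoffs) for r in random_pvalue_sets]
    ranks = [max_rank(i, curves) for i in range(len(curves))]
    data_rank = ranks[0]
    # Ties count against the data set, which is the conservative choice.
    at_least = sum(r >= data_rank for r in ranks[1:])
    return at_least / len(random_pvalue_sets)


# Toy p-values (invented), with three random sets as in the upper panels.
cutoffs = [0.01, 0.05]
data = [0.001, 0.004, 0.02, 0.3]
random_sets = [
    [0.03, 0.2, 0.5, 0.9],
    [0.008, 0.6, 0.7, 0.8],
    [0.04, 0.045, 0.5, 0.9],
]
print(global_p(data, random_sets, cutoffs))
```

With only three random sets the resolution of such an estimate is coarse; a real analysis would use many more random sets.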
Figure 3. Illustration of the refinement algorithm. (a) Before the refinement, four groups are labelled significant (red) that contain the genes 1–4. (b) On the deepest level of the tree significant categories remain significant (orange). On the next level a significant category (arrow) is tested after all genes in the significant descendant categories (blue box) are removed. In this example, the category remains significant. (c) This procedure is repeated for the category on the next level (arrow) and again all genes in significant descendant categories (blue box) are removed. In this example the category is no longer significant after refinement.
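The bottom-up procedure in this caption can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than FUNC's code: the function `refine`, the dictionaries `children` and `genes`, and the callback `still_significant` are hypothetical names, and the toy significance rule (at least two genes left) merely stands in for one of the statistical tests.

```python
def refine(root, children, genes, significant, still_significant):
    """Return the categories that stay significant after refinement.

    children: dict mapping a category to its child categories
    genes: dict mapping a category to the set of genes annotated to it
    significant: set of categories labelled significant before refinement
    still_significant(category, remaining_genes): hypothetical test callback
    """
    kept = set()
    claimed_cache = {}  # genes held by significant descendants, per category

    def visit(cat):
        if cat in claimed_cache:       # a category can have several parents
            return claimed_cache[cat]
        claimed = set()
        for child in children.get(cat, ()):
            claimed |= visit(child)    # deepest levels are processed first
        # Retest the category without the genes of significant descendants.
        if cat in significant and still_significant(cat, genes[cat] - claimed):
            kept.add(cat)
            claimed = claimed | genes[cat]
        claimed_cache[cat] = claimed
        return claimed

    visit(root)
    return kept


# Toy hierarchy (invented): top > mid > leaf, plus an unrelated sibling.
children = {"top": ["mid", "other"], "mid": ["leaf"]}
genes = {"top": {1, 2, 3, 4, 5}, "mid": {1, 2, 3, 4}, "leaf": {1, 2}, "other": {6}}
significant = {"top", "mid", "leaf"}


def at_least_two(cat, remaining):
    # Stand-in rule, not a real statistical test.
    return len(remaining) >= 2


print(sorted(refine("top", children, genes, significant, at_least_two)))
```

In this toy run `leaf` and `mid` remain significant, while `top` loses its significance once the genes of its significant descendants are removed, mirroring panels (b) and (c).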
Categories evolving fast in humans and chimpanzees
| GO ID | Category | Genes (a) | Amino acid changes, mouse–rat (b) | Amino acid changes, human–chimpanzee (c) | Ratio primates/rodents (d) |
|---|---|---|---|---|---|
| GO:0003674 | molecular_function (e) | 4303 | 84306 | 8604 | 0.1 |
| GO:0004889 | nicotinic acetylcholine-activated cation-selective channel activity | 6 | 85 | 29 | 0.34 |
| GO:0004984 | olfactory receptor activity | 5 | 78 | 32 | 0.41 |
| GO:0005184 | neuropeptide hormone activity | 13 | 95 | 31 | 0.33 |
| GO:0005217 | intracellular ligand-gated ion channel activity | 2 | 117 | 30 | 0.26 |
| GO:0005272 | sodium channel activity | 9 | 124 | 34 | 0.27 |
| GO:0005279 | amino acid-polyamine transporter activity | 19 | 276 | 63 | 0.23 |
| GO:0005523 | tropomyosin binding | 4 | 14 | 12 | 0.86 |
| GO:0008188 | neuropeptide receptor activity | 17 | 212 | 44 | 0.21 |
| GO:0008271 | sulphate porter activity | 4 | 107 | 37 | 0.35 |
| GO:0015194 | L-serine transporter activity | 1 | 7 | 8 | 1.14 |
| GO:0016652 | oxidoreductase activity, acting on NADH or NADPH, NAD or NADP as acceptor | 10 | 140 | 36 | 0.26 |
| GO:0031402 | sodium ion binding | 31 | 684 | 109 | 0.16 |
| GO:0031404 | chloride ion binding | 26 | 459 | 79 | 0.17 |
(a) number of genes analysed in category; (b) number of amino acid changes between mouse and rat; (c) number of amino acid changes between human and chimpanzee; (d) ratio of primates/rodents; (e) the ratio in the top category, in this case molecular function, gives the expected ratio.
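The ratio column can be reproduced from the two count columns by dividing the human-chimpanzee count (the smaller count in the root row) by the mouse-rat count. The short Python check below uses a few rows copied from the table; the variable and function names are hypothetical, and the comparison against the root ratio is only a descriptive illustration, not a significance test.

```python
# (GO ID, mouse-rat changes, human-chimpanzee changes, reported ratio)
ROOT = ("GO:0003674", 84306, 8604, 0.1)
ROWS = [
    ("GO:0005523", 14, 12, 0.86),
    ("GO:0015194", 7, 8, 1.14),
    ("GO:0031402", 684, 109, 0.16),
]


def primate_rodent_ratio(rodent_changes, primate_changes):
    """Ratio of primate to rodent amino acid changes for one category."""
    return primate_changes / rodent_changes


expected = primate_rodent_ratio(ROOT[1], ROOT[2])  # ratio in the root category
for go_id, rodent, primate, reported in ROWS:
    ratio = primate_rodent_ratio(rodent, primate)
    fold = ratio / expected  # how far the category lies above the expectation
    print(f"{go_id}: ratio {ratio:.2f} (reported {reported}), "
          f"{fold:.1f}x the root ratio")
```

All categories in the table have a ratio above the root value of about 0.1, which is what marks them as evolving comparatively fast in primates.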