Leila Taher, Ivan Ovcharenko.
Abstract
MOTIVATION: Several functional gene annotation databases have been developed in recent years and are widely used to infer the biological function of gene sets by scrutinizing the attributes that appear over- and underrepresented. However, this strategy is not directly applicable to the study of non-coding DNA, as the non-coding sequence span varies greatly among different gene loci in the human genome and longer loci have a higher likelihood of being selected purely by chance. Therefore, conclusions involving the function of non-coding elements that are drawn based on the annotation of neighboring genes are often biased. We assessed the systematic bias in several particular Gene Ontology (GO) categories using the standard hypergeometric test, by randomly sampling non-coding elements from the human genome and inferring their function based on the functional annotation of the closest genes. While no category is expected to be significantly over- or underrepresented for a random selection of elements, categories such as 'cell adhesion', 'nervous system development' and 'transcription factor activity' appeared to be systematically overrepresented, while others, such as 'olfactory receptor activity', appeared underrepresented.
Year: 2009 PMID: 19168912 PMCID: PMC2647827 DOI: 10.1093/bioinformatics/btp043
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
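The standard hypergeometric test named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `overrep_pvalue` and `underrep_pvalue` and the example numbers are hypothetical, and the sketch assumes that each sampled element is reduced to its closest gene so that the test is applied to a set of distinct genes.

```python
from math import comb


def overrep_pvalue(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric: N genes in total, K of them
    annotated with the category, n genes drawn without replacement."""
    total = comb(N, n)
    hits = range(max(k, 0), min(K, n) + 1)
    # comb() returns 0 for impossible combinations, so no extra bounds are needed.
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / total


def underrep_pvalue(k, N, K, n):
    """P(X <= k) for the same hypergeometric distribution."""
    total = comb(N, n)
    hits = range(0, min(k, K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / total


# Hypothetical numbers: 20000 genes, 500 in the category,
# 100 genes next to sampled elements, 10 of which carry the annotation.
p_over = overrep_pvalue(10, 20000, 500, 100)
print(f"over-representation p-value: {p_over:.3g}")
```

The test treats every gene as equally likely to be drawn, which is exactly the assumption that varying locus length violates for non-coding elements.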
Fig. 1. Distribution of GO categories with respect to the locus length. Left and right tables list the GO categories particularly associated with short and long loci, respectively.
GO categories significantly associated with genes in shorter loci and in longer loci
| Process/function | Locus length (kb) | P-value |
|---|---|---|
| Genes in shorter loci | ||
| Response | ||
| To unfolded protein | 28.7 | 2.4e-5 |
| To bacterium, defense | 58.5 | 1.3e-5 |
| To biotic stimulus | 58.8 | 1.8e-12 |
| Oxidative phosphorylation | 32.6 | 6.1e-9 |
| Oxidoreductase activity | 38.3 | 1.2e-5 |
| Electron transport | ||
| Mitochondrial | 34.3 | 1.1e-5 |
| ATP synthesis coupled | 36.1 | 1.4e-6 |
| Ribosome | ||
| Structural constituent | 36.8 | 1.7e-8 |
| Biogenesis and assembly | 44.9 | 1.8e-7 |
| Keratinization | 38.2 | 1.2e-6 |
| Epidermal cell differentiation | 43.3 | 1.2e-5 |
| rRNA | ||
| Processing | 50.2 | 9.1e-7 |
| Metabolic process | 51.3 | 8.4e-7 |
| Genes in longer loci | ||
| Morphogenesis | ||
| Embryonic limb | 525.2 | 6.8e-7 |
| Neurite | 185.2 | 1.4e-7 |
| Development | ||
| Limb | 483.1 | 6.0e-8 |
| Lung | 283.2 | 7.3e-6 |
| Respiratory tube | 277.0 | 4.4e-6 |
| Brain | 228.2 | 1.3e-7 |
| Central nervous system | 228.1 | 1.3e-11 |
| Tube | 202.0 | 4.4e-9 |
| Regulation of | ||
| Developmental process, positive | 325.3 | 2.1e-5 |
| Cell differentiation, negative | 316.4 | 2.5e-5 |
| Transcription, positive | 183.4 | 1.8e-6 |
| Axon guidance | 320.5 | 2.5e-5 |
| Signaling | ||
| Cyclic-nucleotide-mediated | 214.7 | 2.7e-6 |
| G-protein | 214.7 | 1.3e-6 |
Fig. 2. The average number of GO categories that show up as significantly over- or underrepresented in experiments with random sets of non-coding elements for different sample sizes.
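The random-sampling experiment behind this kind of plot can be imitated on a toy genome. The sketch below is an assumption-laden illustration: the gene counts, locus lengths, and the function name `fraction_significant` are all hypothetical, and a single made-up category stands in for the full set of GO categories. Elements are placed at random, so a gene is hit with probability proportional to its locus length, and the standard hypergeometric test is then applied to the distinct closest genes.

```python
import random
from math import comb


def upper_tail(k, N, K, n):
    """Hypergeometric P(X >= k): N genes, K in the category, n drawn."""
    hits = range(max(k, 0), min(K, n) + 1)
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def fraction_significant(lengths, in_cat, sample_size, n_experiments=50,
                         alpha=0.05, length_biased=True, seed=0):
    """Fraction of random experiments in which the category looks overrepresented."""
    rng = random.Random(seed)
    genes = range(len(lengths))
    # Length-biased: a random element lands in a locus with probability
    # proportional to the locus length. Otherwise every gene is equally likely.
    weights = lengths if length_biased else [1] * len(lengths)
    N, K = len(lengths), sum(in_cat)
    significant = 0
    for _ in range(n_experiments):
        closest = set(rng.choices(genes, weights=weights, k=sample_size))
        k = sum(in_cat[g] for g in closest)
        if upper_tail(k, N, K, len(closest)) < alpha:
            significant += 1
    return significant / n_experiments


# Toy genome (hypothetical): 100 category genes in long loci, 900 others in short loci.
lengths = [300] * 100 + [50] * 900      # locus lengths in kb
in_cat = [True] * 100 + [False] * 900

for biased in (True, False):
    frac = fraction_significant(lengths, in_cat, sample_size=200, length_biased=biased)
    print(f"length_biased={biased}: significant in {frac:.0%} of experiments")
```

Under these assumptions the long-locus category is flagged in nearly every length-biased experiment even though the elements were placed at random, while uniform gene sampling rarely flags it.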
Fig. 3. Significantly over- and/or underrepresented GO categories (showing only categories which are significant in at least 25% of the experiments). The x-axis represents different sample sizes, only within a range in which the number of GO categories over- and/or underrepresented shows high variation.
Significantly over/underrepresented GO categories (showing only categories which are significant in at least 25% of the experiments)
| GO id | Description | Ratio |
|---|---|---|
| Overrepresentation | ||
| GO:0007156 | Homophilic cell adhesion | 4.7 |
| GO:0007155 | Cell adhesion | 2.3 |
| GO:0007399 | Nervous system development | 1.9 |
| GO:0005509 | Calcium ion binding | 1.7 |
| GO:0007242 | Intracellular signaling cascade | 1.4 |
| GO:0043565 | Sequence-specific DNA binding | 1.4 |
| GO:0007275 | Multicellular organismal development | 1.3 |
| GO:0006468 | Protein amino acid phosphorylation | 1.3 |
| GO:0003700 | Transcription factor activity | 1.2 |
| Underrepresentation | ||
| GO:0007186 | G-protein coupled receptor protein signaling pathway | 0.7 |
| GO:0050896 | Response to stimulus | 0.6 |
| GO:0007608 | Sensory perception of smell | 0.4 |
| GO:0004984 | Olfactory receptor activity | 0.3 |
Overrepresented GO categories have ratios >1, whereas underrepresented GO categories have ratios <1 and consist of shorter loci, on average.
Overrepresented GO categories computed using the usual hypergeometric test (panel A) and accounting for variable locus length (panel B) on the datasets described by Ovcharenko et al. (2004) and Woolfe et al. (2005)
Categories removed by the GO ascertainment correction are highlighted, as well as additional categories found after applying the correction.
Overrepresented GO categories computed using the usual hypergeometric test (panel A) and accounting for variable locus length (panel B) on the datasets described by Bejerano et al. (2004)
Categories removed by the GO ascertainment correction are highlighted, as well as additional categories found after applying the correction.
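The text above does not spell out how variable locus length is accounted for, so the following is only one plausible sketch, not the authors' correction. It assumes a binomial null model in which each element falls into a category with probability equal to that category's share of the total locus length. The function names and all numbers are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(max(k, 0), n + 1))


def length_aware_pvalue(k, n, category_length, total_length):
    """Assumed null: an element hits the category with probability
    category_length / total_length (share of sequence, not of genes)."""
    return binom_upper_tail(k, n, category_length / total_length)


# Hypothetical category: 10% of the genes but 40% of the total locus length.
n_elements, k_hits = 100, 40
gene_share_p = binom_upper_tail(k_hits, n_elements, 0.10)    # gene-count null
length_share_p = length_aware_pvalue(k_hits, n_elements, 30000, 75000)
print(f"gene-count null: {gene_share_p:.3g}, length-aware null: {length_share_p:.3g}")
```

With these made-up numbers, 40 hits out of 100 look highly significant when every gene is assumed equally likely, but unremarkable once the category's share of sequence is taken as the expectation.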