Lili Hao, Xiaomeng Ge, Haolei Wan, Songnian Hu, Martin J Lercher, Jun Yu, Wei-Hua Chen.
Abstract
BACKGROUND: Many functional, structural and evolutionary features of human genes have been observed to correlate with expression breadth and/or gene age. Here, we systematically explore these correlations.
Year: 2010 PMID: 20961448 PMCID: PMC2970608 DOI: 10.1186/1471-2148-10-316
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Correlations of human protein-coding gene properties with expression breadth and phyletic age. The numbers on the x-axis of panels a, c, e indicate the number of tissues in which genes are expressed. Phyletic groups in panels b, d, f are arranged according to their age, with 'cellular organisms' being the oldest and 'primates' the youngest. (a) Broadly expressed human proteins tend to be older, i.e., have homologs in more distantly related species. (b) Genes of different ages have distinct promoter architectures. (c-f) Gene function (according to GO and KEGG annotation) is correlated with both expression breadth (c, e) and phyletic age (d, f).
Correlation of human protein-coding gene properties with phyletic age and expression breadth
| Category | Property | Phyletic age: rᵃ | Phyletic age: P-value | Expression breadth: rᵃ | Expression breadth: P-value |
|---|---|---|---|---|---|
| | Age | - | - | 0.270 | < 10⁻¹⁵*** |
| Structural | Protein length (log) | 0.340 | < 10⁻¹⁵*** | 0.113 | < 10⁻¹⁵*** |
| | Exon number | 0.290 | < 10⁻¹⁵*** | 0.185 | < 10⁻¹⁵*** |
| | CpG+/TATA- promoter | 0.217 | < 10⁻¹⁵*** | 0.394 | < 10⁻¹⁵*** |
| | Length 1st intron (log) | 0.103 | < 10⁻¹⁵*** | 0.063 | 6.0 × 10⁻¹⁴*** |
| | Length of 5' UTR (log) | 0.029 | 0.0005*** | 0.127 | < 10⁻¹⁵*** |
| | GC content of CDS | -0.110 | < 10⁻¹⁵*** | -0.120 | < 10⁻¹⁵*** |
| Functional | Molecular functions | -0.543 | < 10⁻¹⁵*** | -0.178 | < 10⁻¹⁵*** |
| | Pathway class | -0.264 | < 10⁻¹⁵*** | -0.025 | 0.042* |
| | Expression level (log) | 0.162 | < 10⁻¹⁵*** | 0.611 | < 10⁻¹⁵*** |
| Evolutionary | Ka | -0.275 | < 10⁻¹⁵*** | -0.258 | < 10⁻¹⁵*** |
| | Ks | -0.062 | 1.3 × 10⁻¹¹*** | -0.050 | 8.9 × 10⁻⁹*** |
| | Ka/Ks | -0.336 | < 10⁻¹⁵*** | -0.274 | < 10⁻¹⁵*** |

ᵃ Pearson's correlation coefficient. * P < 0.05; ** P < 0.01; *** P < 0.001.
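The coefficients in the table above are Pearson correlations, with some properties log-transformed first and significance shown in star notation. The following is a minimal sketch of that computation in plain Python. The function names and the toy values are hypothetical and are not taken from the paper, and the sketch does not compute P-values.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def significance_stars(p):
    """Map a P-value to the star notation defined in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical toy data (NOT from the paper): protein lengths in amino acids
# and the number of tissues in which each gene is expressed.
protein_length = [120, 250, 310, 480, 900, 1500]
tissues_expressed = [2, 5, 4, 9, 12, 18]

# Properties marked "(log)" in the table are log-transformed before correlating.
log_length = [math.log10(v) for v in protein_length]
r = pearson_r(log_length, tissues_expressed)
print(f"r = {r:.3f}")
```

With real data, each row of the table would correspond to one such call per column pair, using the full set of genes rather than six toy values.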
Influence of expression breadth, phyletic age and expression abundance on protein properties using a generalized linear modelᵃ
| Category | Property | Phyletic age | Expression breadth | Expression abundance |
|---|---|---|---|---|
| Structural | Protein length (log) | < 10⁻¹⁵*** | 4.7 × 10⁻¹³*** | < 10⁻¹⁵*** |
| | Exon number | < 10⁻¹⁵*** | < 10⁻¹⁵*** | 5.50 × 10⁻¹³*** |
| | CpG+/TATA- promoter | 2.30 × 10⁻¹⁴*** | < 10⁻¹⁵*** | 0.0405* |
| | 1st intron length (log) | 0.0061** | 0.0240* | 4.43 × 10⁻⁵*** |
| | Length 5' UTR (log) | 0.00747** | < 10⁻¹⁵*** | 4.31 × 10⁻⁵*** |
| | GC of CDS | < 10⁻¹⁵*** | < 10⁻¹⁵*** | 8.34 × 10⁻¹¹*** |
| Functional | Molecular function (GO) | < 10⁻¹⁵*** | < 10⁻¹⁵*** | 0.0739 |
| | Pathway class (KEGG) | < 10⁻¹⁵*** | 7.17 × 10⁻⁶*** | 0.0121* |
| Evolutionary | Ka | < 10⁻¹⁵*** | < 10⁻¹⁵*** | 0.180 |
| | Ks | 7.84 × 10⁻⁵*** | 2.41 × 10⁻⁷*** | 0.428 |
| | Ka/Ks | < 10⁻¹⁵*** | < 10⁻¹⁵*** | 0.000354*** |
| Main factors | Age | - | < 10⁻¹⁵*** | 0.348 |
| | EST breadth | < 10⁻¹⁵*** | - | < 10⁻¹⁵*** |
| | Expression abundance | 0.348 | < 10⁻¹⁵*** | - |

Numbers are the corresponding P-values. * P < 0.05; ** P < 0.01; *** P < 0.001.
ᵃ The generalized linear model has the form 'property ~ phyletic age + expression breadth + expression abundance'. It yields three P-values, one each for the influence of phyletic age, expression breadth and expression abundance on the property. A P-value below a chosen threshold (0.05, for example) suggests that the factor contributes significantly to the property; several significant P-values suggest that the corresponding factors contribute independently to the property.
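The model form 'property ~ phyletic age + expression breadth + expression abundance' can be illustrated with a small sketch. The source does not state which error family, link function or software was used, so the code below assumes a Gaussian family with identity link, in which case the generalized linear model reduces to ordinary least squares. It also uses a normal approximation for the P-values rather than the exact t distribution. The helper names `solve_linear` and `fit_gaussian_glm` are hypothetical, and the data are synthetic.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda idx: abs(m[idx][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [v - factor * p for v, p in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_gaussian_glm(y, predictors):
    """Fit y ~ intercept + predictors by least squares.

    Assumes Gaussian errors and an identity link (an assumption, not stated
    in the source). Returns (coefficients, two-sided P-values).
    """
    n = len(y)
    x = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        var_j = sigma2 * solve_linear(xtx, unit)[j]  # j-th diagonal of (X'X)^-1
        z = beta[j] / math.sqrt(var_j)
        # Normal approximation to the t-test; reasonable for large samples.
        p_values.append(math.erfc(abs(z) / math.sqrt(2)))
    return beta, p_values


# Synthetic data (NOT from the paper): the property depends on age and
# breadth but not on abundance.
rng = random.Random(0)
n_genes = 200
age = [float(rng.randint(0, 7)) for _ in range(n_genes)]
breadth = [float(rng.randint(1, 18)) for _ in range(n_genes)]
abundance = [rng.uniform(0.0, 10.0) for _ in range(n_genes)]
prop = [2.0 + 0.5 * a + 0.1 * b + rng.gauss(0.0, 1.0)
        for a, b in zip(age, breadth)]

coefs, p_values = fit_gaussian_glm(prop, [age, breadth, abundance])
names = ["intercept", "phyletic age", "expression breadth", "expression abundance"]
for name, p in zip(names, p_values):
    print(f"{name}: P = {p:.3g}")
```

Each row of the table corresponds to one such fit with a different property on the left-hand side; the three P-values for the predictors fill the three columns.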
Figure 2. Young genes are rarely functionally characterized, but the youngest group is the most disease-related. (a) Genes annotated by Gene Ontology with experimental evidence codes (GO-EXP); (b) genes annotated by KEGG; (c) disease-causing genes annotated by OMIM (Online Mendelian Inheritance in Man).