Clara S M Tang, Richard J Epstein.
Abstract
We recently reported that the human genome is "splitting" into two gene subgroups characterised by polarised GC content (Tang et al, 2007), and that such evolutionary change may be accelerated by programmed genetic instability (Zhao et al, 2008). Here we extend this work by mapping two separate hotspots of high evolutionary rate (Ka/Ks) in the human genome: one characterised by low GC content, high intron length, and low gene expression, and the other by high GC content, high exon number, and high gene expression. This finding suggests that at least two different mechanisms mediate adaptive genetic evolution in higher organisms: (1) intron lengthening and reduced repair in hypermethylated, weakly transcribed genes, and (2) duplication and/or insertion events affecting highly transcribed genes, creating low-essentiality satellite daughter genes in nearby regions of active chromatin. Since the latter mechanism is expected to be far more efficient than the former in generating variant genes that increase fitness, these results also provide a potential explanation for the controversial value of sequence analysis in defining positively selected genes.
Year: 2010 PMID: 20454629 PMCID: PMC2862947 DOI: 10.1155/2010/856825
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Ka/Ks profile of the human genome, showing that 75% of all genes are characterised by a Ka/Ks < 0.2; that is, most are under negative selection, whereas only a small percentage is characterised by very high Ka/Ks.
Mean expression score (breadth and SAGE) of varying Ka/Ks groups for low- and high-GC genes. Using Spearman correlation, the data confirm that the Ka/Ks groups so defined vary significantly in gene expression level for both the low-GC (correlation coefficient −0.32, P < 2.2 × 10⁻¹⁶) and high-GC gene subsets (correlation coefficient −0.10, P = 0.00033), as well as in expression breadth (correlation coefficient −0.35, P < 2.2 × 10⁻¹⁶, and correlation coefficient −0.098, P = 0.00067, respectively).
| Ka/Ks | Breadth, low GC | Breadth, high GC | SAGE, low GC | SAGE, high GC |
|---|---|---|---|---|
| 0 | 15.85 | 11.83 | 163.93 | 103.38 |
| 0–0.1 | 14.03 | 10.74 | 58.22 | 67.84 |
| 0.1–0.2 | 11.52 | 10.40 | 39.42 | 74.04 |
| >0.2 | 9.10 | 8.86 | 32.37 | 57.75 |
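The direction of the trend in the table above can be checked with a rank correlation. The following is a minimal Python sketch, not the authors' analysis: it implements Spearman's coefficient with the standard library and applies it to the four tabulated bin means, using bin order as a stand-in for Ka/Ks. The function names and the use of bin order are assumptions made for illustration. The coefficients quoted in the caption were computed across individual genes, so the bin-level values produced here are not comparable to them.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Assumption: bin order (0, 0-0.1, 0.1-0.2, >0.2) stands in for Ka/Ks.
bin_index = [0, 1, 2, 3]
breadth_low_gc = [15.85, 14.03, 11.52, 9.10]   # values from the table
sage_high_gc = [103.38, 67.84, 74.04, 57.75]   # values from the table

rho_breadth = spearman(bin_index, breadth_low_gc)
rho_sage = spearman(bin_index, sage_high_gc)
print(rho_breadth, rho_sage)
```

Breadth in low-GC genes falls in every successive bin, so its bin-level coefficient is −1. SAGE in high-GC genes is not strictly monotonic (67.84 is followed by 74.04), which gives a weaker negative value of about −0.8.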
Figure 2. Comparative relationship between low-GC (upper rows) and high-GC (lower rows) gene groups and intron length (left), exon number (middle), and their ratio (right).
Figure 3. Distribution of genes with various GC content and intron length. (a, left) Dot plot. (b, right) Contour map with nearest neighbour smoothing. (c) Contour map with fixed neighbour smoothing (left, 1%; right, 5%). (d, e) Contour maps of (d) Ka/Ks and (e) expression levels in SAGE of genes, using different sensitivity cutoffs (left, 1%; right, 5%).
Characterisation of gene subsets with differing intron/exon numbers and intron length, in terms of evolutionary rate and gene expression. The Spearman correlation coefficient (0.58, P < 2.0 × 10⁻¹⁶) was calculated for gene subgroups more than 2 SD from the mean (intron length/number and intron number/length).
| | Short and higher intron length/number: mean | Short and higher intron length/number: median | Long and higher intron number/length: mean | Long and higher intron number/length: median | P† |
|---|---|---|---|---|---|
| Ka/Ks | 0.19 | 0.17 | 0.054 | 0.080 | <2 × 10⁻¹⁶ |
| Breadth | 10.27 | 9 | 23 | 11.52 | 0.019 |
| SAGE | 114.44 | 34.66 | 30.05 | 39.91 | 0.021 |
† P-value of nonparametric Mann-Whitney test.
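
The Mann-Whitney test named in the footnote compares two groups by the ranks of their pooled values rather than by their means. The following is a minimal Python sketch of the U statistic with a two-sided P-value from the normal approximation, without tie or continuity correction. It is not the authors' code: the function name and both sample lists are hypothetical, chosen only to show the calculation, and the result does not reproduce any figure in the table.

```python
import math


def mann_whitney_u(a, b):
    """Return (U_a, U_b, p) for two samples.

    p is two-sided and uses the normal approximation with no tie or
    continuity correction, so it is only a rough guide for small samples.
    """
    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b],
                    key=lambda t: t[0])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Tied values share the mean of the ranks they span.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = shared
        i = j + 1

    n_a, n_b = len(a), len(b)
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a

    mu = n_a * n_b / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (min(u_a, u_b) - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, u_b, p


# Hypothetical Ka/Ks values for two gene subsets (not data from the paper).
group_a = [0.21, 0.17, 0.25, 0.19, 0.15]
group_b = [0.05, 0.08, 0.03, 0.10, 0.06]

u_a, u_b, p_value = mann_whitney_u(group_a, group_b)
print(u_a, u_b, p_value)
```

Because every value in the first hypothetical group exceeds every value in the second, U for the first group takes its maximum of 5 × 5 = 25 and U for the second is 0. For real analyses, an established statistics package that handles ties and exact small-sample P-values would be the safer choice.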