Alissa M Resch, Aleksey Y Ogurtsov, Igor B Rogozin, Svetlana A Shabalina, Eugene V Koonin.
Abstract
BACKGROUND: Alternative splicing (AS) in protein-coding sequences has emerged as an important mechanism of regulation and diversification of animal gene function. By contrast, the extent and roles of alternative events including AS and alternative transcription initiation (ATI) within the 5'-untranslated regions (5'UTRs) of mammalian genes are not well characterized.
Year: 2009 PMID: 19371439 PMCID: PMC2674463 DOI: 10.1186/1471-2164-10-162
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
5'UTR statistics for ALT and nonALT gene sets
| | Human nonALT | Human ALT_5'UTR | Mouse nonALT | Mouse ALT_5'UTR |
| --- | --- | --- | --- | --- |
| Total genes | 11,727 | 2,915 | 14,288 | 909 |
| 5'UTR length | 203 | 239.6 | 180.1 | 178.1 |
| Genes with uORFs | 5212 (44%) | 1547 (53%) | 5968 (42%) | 460 (51%) |
| uORF length | 58.7 | 73.4 (ALT); 48.2 (CONSTIT) | 54.1 | 63.1 (ALT); 44 (CONSTIT) |
Genes in the ALT_5'UTR set exhibit 5'UTR transcript diversity (AS and ATI), whereas genes in the nonALT control set do not. Total number of genes in each set (top tier). Average 5'UTR length (nts) for transcripts in each set (middle tier). Total number of genes that contain uORFs (percentage in parentheses), and average uORF lengths (nts) in alternative (ALT) and constitutive (CONSTIT) regions (bottom tier).
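The uORF counts and lengths above depend on how a uORF is defined. As a minimal sketch, assume a uORF is an ATG inside the 5'UTR followed by an in-frame stop codon that also lies inside the 5'UTR; the exact criteria used for the table are not stated here, and the function name `find_uorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(utr5):
    """Return half-open (start, end) intervals of uORFs fully inside utr5.

    Assumption: a uORF starts at any ATG and ends at the first in-frame
    stop codon; ATGs with no in-frame stop inside the 5'UTR are skipped.
    """
    seq = utr5.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon from the first codon after the ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs

def mean_uorf_length(utr5):
    """Average uORF length in nucleotides (start and stop codons included)."""
    lengths = [end - start for start, end in find_uorfs(utr5)]
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Under this definition overlapping uORFs in different frames are counted separately, which may or may not match the counting used in the table.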
Figure 1. Classification of alternative and constitutive nucleotides. The procedure to classify nucleotides as alternative or constitutive is outlined in steps 1–4. Start and stop codons are marked by arrows and asterisks, respectively. Alternative regions of 5'UTR and CDS are colored dark blue and dark red, respectively, whereas constitutive regions are colored light blue and light red. Variable regions (those classified as alternative or constitutive, depending on the isoform) are colored gray. Alternative nucleotide positions are labeled "A" along the genomic sequence, while constitutive nucleotide positions are labeled "C".
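The four steps of the procedure are not spelled out in this caption, so the following is only a simplified sketch of the labeling idea. It assumes that a genomic position is constitutive ("C") when every isoform of the gene includes it and alternative ("A") when only some isoforms do; the 5'UTR versus CDS distinction and the "variable" class are deliberately left out. The function name `classify_positions` and the interval representation are hypothetical.

```python
def classify_positions(isoforms):
    """Label each transcribed genomic position as "C" or "A".

    isoforms: list of isoforms, each a list of half-open (start, end)
    genomic exon intervals. Assumption: "C" means present in all
    isoforms, "A" means present in at least one but not all.
    """
    counts = {}
    for exons in isoforms:
        covered = set()
        for start, end in exons:
            covered.update(range(start, end))
        for pos in covered:
            counts[pos] = counts.get(pos, 0) + 1
    total = len(isoforms)
    return {pos: ("C" if n == total else "A")
            for pos, n in sorted(counts.items())}

# Two toy isoforms: the second one skips the exon at 5..8.
labels = classify_positions([[(0, 3), (5, 8)], [(0, 3)]])
```

Counting the "A" and "C" labels per region would then give nucleotide totals of the kind reported for ALT and CONSTIT regions.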
Number of alternative and constitutive nucleotides in 5'UTR and CDS
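| Species | Region | ALT | CONSTIT | ALT:CONSTIT ratio |
| --- | --- | --- | --- | --- |
| Human | 5'UTR | 548,760 | 189,778 | 3:1 |
| Human | CDS | 1,072,755 | 2,984,181 | 1:3 |
| Mouse | 5'UTR | 132,729 | 59,421 | 2:1 |
| Mouse | CDS | 252,539 | 1,105,911 | 1:4 |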
Number of alternative (ALT) and constitutive (CONSTIT) nucleotides in the 5'UTR and CDS regions of genes in the ALT_5'UTR sets for human and mouse. Ratio of alternative-to-constitutive nucleotides is shown in the column on the right.
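The rounded ratios can be checked directly from the nucleotide counts. The short sketch below assumes that the first two rows of the table are human and the last two are mouse, as the caption's ordering suggests; the helper name `alt_to_constit_ratio` is hypothetical.

```python
def alt_to_constit_ratio(alt, constit):
    """Express ALT:CONSTIT as a rounded n:1 or 1:n ratio string."""
    if alt >= constit:
        return f"{round(alt / constit)}:1"
    return f"1:{round(constit / alt)}"

# (ALT, CONSTIT) nucleotide counts copied from the table above.
counts = {
    ("human", "5'UTR"): (548760, 189778),
    ("human", "CDS"): (1072755, 2984181),
    ("mouse", "5'UTR"): (132729, 59421),
    ("mouse", "CDS"): (252539, 1105911),
}
ratios = {key: alt_to_constit_ratio(*pair) for key, pair in counts.items()}
```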
Frequency of uAUGs and uORFs in alternative and constitutive regions
| | Human | Mouse |
| --- | --- | --- |
| Total uAUGs mapped to genomic | 6228 | 1517 |
| uAUGs mapped to ALT | 4948 (79%) | 1108 (73%) |
| uAUGs mapped to CONSTIT | 1280 (21%) | 409 (27%) |
| uAUGs included in evolutionary conservation analysis in Human-Macaque and Mouse-Rat orthologs | 5608 | 1518 |
| Total uORFs mapped to genomic | 4870 | 1255 |
| uORFs mapped to ALT | 3807 (78%) | 906 (72%) |
| uORFs mapped to CONSTIT | 1063 (22%) | 349 (28%) |
| uORFs fully contained within ALT | 3042 (80%) | 667 (74%) |
| uORFs fully contained within CONSTIT | 901 (85%) | 324 (94%) |
| uORFs included in evolutionary conservation analysis in Human-Macaque and Mouse-Rat orthologs | 1988 | 842 |
Count of uAUGs (top tier) and uORFs (bottom tier) that map to alternative (ALT) and constitutive (CONSTIT) regions of 5'UTR in human and mouse (percentages in parentheses).
Observed and expected frequency of AUG triplets and shuffled triplets
| Triplet | Region | Human observed | Human expected | Mouse observed | Mouse expected |
| --- | --- | --- | --- | --- | --- |
| ATG | ALT | 10.2 | 12.9 | 10.4 | 13 |
| ATG | CONSTIT | 7.9 | 12.4 | 8.6 | 12.7 |
| AGT/GTA/GAT/TAG/TGA | ALT | 60.9 (12.1) | 64.5 (12.9) | 65.8 (13.2) | 65.0 (13.0) |
| AGT/GTA/GAT/TAG/TGA | CONSTIT | 58.7 (11.7) | 62.0 (12.4) | 63.7 (12.8) | 63.5 (12.7) |
Observed and expected frequencies of AUG triplets (top tier) and shuffled triplets (AGT/GTA/GAT/TAG/TGA) that are permutations of AUG (bottom tier), per 1000 nucleotides in ALT and CONSTIT regions of 5'UTR. The significance of the differences between expected and observed frequencies of uAUG and shuffled triplets was estimated using the χ² test. All comparisons using the χ² test are highly significant (P < 10⁻¹⁰) except for two cases in mouse: observed ALT_AGT/GTA/GAT/TAG/TGA versus expected ALT_AGT/GTA/GAT/TAG/TGA (P = 0.41) and observed CONSTIT_AGT/GTA/GAT/TAG/TGA versus expected CONSTIT_AGT/GTA/GAT/TAG/TGA (P = 0.97).
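How the expected frequencies were derived is not stated here. One common choice, used in the sketch below purely as an assumption, is to multiply the mononucleotide frequencies of the region, which gives the same expectation for ATG and for each of its permutations. The function names are hypothetical.

```python
from collections import Counter

def observed_per_kb(seq, triplet):
    """Occurrences of triplet (overlapping allowed) per 1000 nucleotides."""
    seq = seq.upper()
    hits = sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == triplet)
    return 1000.0 * hits / len(seq)

def expected_per_kb(seq, triplet):
    """Expected occurrences per 1000 nt under independent base composition.

    Assumption: expectation = product of the single-base frequencies of
    the three bases in the triplet; this is not confirmed by the source.
    """
    seq = seq.upper()
    base_counts = Counter(seq)
    prob = 1.0
    for base in triplet:
        prob *= base_counts[base] / len(seq)
    return 1000.0 * prob

toy_region = "ATGC" * 5  # toy sequence, not real 5'UTR data
```

With this model, an observed frequency below the expected one indicates depletion of the triplet relative to what base composition alone would produce.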
Figure 2. uORF length distributions in alternative and constitutive regions. uORF length distributions in alternative (ALT) and constitutive (CONSTIT) regions of 5'UTR are significantly different in human (A) and mouse (B) (P = 1.1 × 10⁻²⁶ for human and P = 6.5 × 10⁻⁹ for mouse; Student's t-test). uORFs in ALT and CONSTIT regions are shown as blue and gray bars, respectively, in the histograms.
Conservation of uAUGs and uORFs in alternative and constitutive regions
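| Feature | Region | Human-macaque Con | Human-macaque NonCon | Mouse-rat Con | Mouse-rat NonCon |
| --- | --- | --- | --- | --- | --- |
| ATG | ALT | 3245 (74%) | 1119 (26%) | 642 (59%) | 442 (41%) |
| ATG | CONSTIT | 1005 (81%) | 239 (19%) | 305 (70%) | 129 (30%) |
| AGT/GTA/GAT/TAG/TGA | ALT | 17498 (73%) | 6493 (27%) | 3937 (57%) | 2925 (43%) |
| AGT/GTA/GAT/TAG/TGA | CONSTIT | 6525 (75%) | 2190 (25%) | 1904 (59%) | 1314 (41%) |
| uORF | ALT | 910 (61%) | 593 (39%) | 290 (41%) | 409 (59%) |
| uORF | CONSTIT | 348 (72%) | 137 (28%) | 81 (57%) | 62 (43%) |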
Conservation of uAUGs and uORFs in alternative (ALT) and constitutive (CONSTIT) regions of 5'UTR for human-macaque and mouse-rat. Conservation of AUG triplets in ALT and CONSTIT regions (top tier). The significance of the difference between conserved (Con) and non-conserved (NonCon) AUG frequencies in ALT and CONSTIT regions was estimated using Fisher's exact test (P = 0.000002 for human-macaque; P = 0.00007 for mouse-rat). Conservation of AGT, GTA, GAT, TAG and TGA shuffled triplets in ALT and CONSTIT regions (middle tier). Fisher's exact test for the fraction of conserved AUG triplets produced significant results in human: P = 0.05 (for ALT AUG versus shuffled triplets); P = 3.7 × 10⁻⁶ (for CONSTIT AUG versus shuffled triplets), and in mouse: P = 8.2 × 10⁻⁶ (for CONSTIT AUG versus shuffled triplets). Results for mouse ALT AUG versus shuffled triplets were not significant (P = 0.21). Conservation of uORFs in ALT and CONSTIT regions of 5'UTR (bottom tier). The significance of the difference between conserved and non-conserved uORF frequencies in ALT and CONSTIT regions was estimated using Fisher's exact test (P = 6.5 × 10⁻⁹ for human-macaque; P = 0.001 for mouse-rat).
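For readers who want to see what a Fisher's exact test on one of these 2×2 comparisons involves, here is a minimal pure-Python sketch. It assumes a two-sided test that sums all tables at least as unlikely as the observed one; whether the reported P-values used this or another variant is not stated. It also assumes that the first count of each pair in the table is the conserved one. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same
    margins whose probability does not exceed that of the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-9):  # small tolerance for float ties
            total += p
    return min(1.0, total)

# Human-macaque uAUG counts from the table above (assumed Con, NonCon order):
# ALT 3245 / 1119, CONSTIT 1005 / 239.
p_human_uaug = fisher_exact_two_sided(3245, 1119, 1005, 239)
```

The value computed this way need not reproduce the reported P-value exactly, since the sidedness and software used for the table are unknown.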
Substitution rates for alternative and constitutive regions of 5'UTR
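| Species pair | Region | uORF set | Ka | Ks | K5 (uORFs) | K5 (uORFs excluded) |
| --- | --- | --- | --- | --- | --- | --- |
| Human-macaque | ALT | all | 0.045 ± 0.001 | 0.052 ± 0.001 | 0.047 ± 0.001 | 0.054 ± 0.001 |
| Human-macaque | ALT | ≥ 30 bp | 0.046 ± 0.001 | 0.051 ± 0.002 | | |
| Human-macaque | CONSTIT | all | 0.040 ± 0.001 | 0.046 ± 0.002 | 0.042 ± 0.001 | 0.051 ± 0.002 |
| Human-macaque | CONSTIT | ≥ 30 bp | 0.041 ± 0.002 | 0.048 ± 0.003 | | |
| Mouse-rat | ALT | all | 0.092 ± 0.002 | 0.111 ± 0.002 | 0.099 ± 0.002 | 0.098 ± 0.001 |
| Mouse-rat | ALT | ≥ 30 bp | 0.094 ± 0.004 | 0.114 ± 0.005 | | |
| Mouse-rat | CONSTIT | all | 0.088 ± 0.003 | 0.103 ± 0.004 | 0.093 ± 0.002 | 0.100 ± 0.001 |
| Mouse-rat | CONSTIT | ≥ 30 bp | 0.091 ± 0.004 | 0.108 ± 0.007 | | |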
Substitution rates for alternative and constitutive regions of 5'UTR were estimated for human-macaque (top tier) and mouse-rat (bottom tier). Evolutionary rates for non-synonymous (Ka) and synonymous (Ks) sites from uORFs within ALT and CONSTIT regions were estimated using the Pamilo-Bianchi-Li method. Rates were calculated for all uORFs (all) and for the subset of uORFs ≥ 30 nts in length (≥ 30 bp). Evolutionary rates were also calculated for regions of 5'UTR that contain uORFs (K5 uORFs) and for regions without (K5 uORFs excluded), using the Kimura-2-Parameter method.
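The Kimura two-parameter distance mentioned in the caption can be sketched as follows. This is the standard textbook formula, d = -½ ln((1 − 2P − Q)·√(1 − 2Q)), where P and Q are the proportions of transition and transversion differences; the alignment handling (skipping gaps and ambiguous bases) and the function name are assumptions, and the standard errors shown in the table are not computed.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing anything other than A, C, G or T are skipped.
    math.log raises ValueError if the sequences are too diverged for
    the formula to apply (non-positive argument).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For closely related pairs such as those in the table, the distance is only slightly larger than the raw proportion of differing sites.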
Functional classification of human genes with alternative 5'UTRs
| Gene Ontology keyword | ALT | ALL | P-value |
| --- | --- | --- | --- |
| response to stimulus | 20 | 597 | 4.2E-92 |
| G-protein coupled receptor protein signaling pathway | 52 | 842 | 1.8E-64 |
| signal transduction | 232 | 1778 | 1.2E-20 |
| receptor activity | 189 | 1425 | 1.3E-15 |
| RNA binding | 66 | 559 | 7.5E-11 |
| membrane | 754 | 4608 | 1.8E-09 |
| translation | 26 | 261 | 4.5E-09 |
| protein folding | 24 | 243 | 1E-08 |
| extracellular space | 59 | 480 | 2.7E-08 |
| rhodopsin-like receptor activity | 28 | 267 | 3.6E-08 |
| extracellular region | 92 | 686 | 7.3E-08 |
| biological process | 83 | 627 | 9.8E-08 |
| nucleus | 669 | 4043 | 2.1E-07 |
| integral to membrane | 571 | 3467 | 6.4E-07 |
| DNA binding | 195 | 1285 | 1.5E-06 |
| nucleic acid binding | 93 | 669 | 1.7E-06 |
| proteinaceous extracellular matrix | 23 | 210 | 5.1E-06 |
| intracellular | 303 | 1896 | 7.2E-06 |
| RNA splicing | 22 | 201 | 7.9E-06 |
| regulation of apoptosis | 33 | 101 | 2.1E-05 |
Human genes are partitioned into two groups: genes with alternative 5'UTRs (ALT) and ALL genes. Gene Ontology keyword descriptions are listed in the left column. Keyword frequencies were tabulated for the ALT and ALL sets, and normalized by the total numbers in each set. P-values were calculated using the χ² test.
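A χ² comparison of a keyword's frequency in two gene sets can be sketched with a 2×2 contingency table. The set totals used for normalization are not given here, so the totals in the usage line are placeholders rather than values from the source, and the exact table construction behind the reported P-values is an assumption. The function name is hypothetical.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). For 1 degree of freedom the upper
    tail probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2))

def keyword_test(k_alt, n_alt, k_all, n_all):
    """Compare keyword counts k out of set totals n in two gene sets."""
    return chi2_2x2(k_alt, n_alt - k_alt, k_all, n_all - k_all)

# Placeholder totals (1000 and 10000 are NOT from the source).
stat, p_value = keyword_test(20, 1000, 597, 10000)
```

Because the ALT genes are presumably also part of the ALL set, treating the two rows as independent samples is a simplification.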