Svetlana A Shabalina, Aleksey Y Ogurtsov, Nikolay A Spiridonov, Eugene V Koonin.
Abstract
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that, contrary to the traditional view, the joint contribution of ATI and ATT to transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5' and 3' transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5'-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection, whereas in 3'-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2014 PMID: 24792168 PMCID: PMC4066770 DOI: 10.1093/nar/gku342
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Anatomy of mammalian transcripts: functional domains, constitutive and alternative nucleotides and alternative events. TI, transcription initiation site; AUG, translation initiation site; TT, transcription termination site; translation termination site; ATI, alternative transcription initiation; AS, alternative splicing; ATT, alternative transcription termination. Protein-coding regions are filled by black (in cCDSs) or by dark grey (in grey areas). UTRs are shown in white (for UTRs) and in light grey (for grey areas).
Figure 2. Distributions of introns in longest isoforms transcribed from polymorphic and monomorphic gene loci (A) and in their protein coding regions (B).
(A) Numbers of introns and isoforms in monomorphic (mono) and polymorphic (poly) genes in the human genome (hg18 and hg19). (B) Distributions of constitutive and alternative nucleotides located in the core (cCDS alt) or terminal (5grey alt, 5grey dual, 3grey alt, 3grey dual) coding regions of polymorphic gene loci (Ensembl, hg19)
(A)

| Gene group | Intron # max | Intron # mean | # of isoforms |
|---|---|---|---|
| hg18 | | | |
| Mono+Poly | 10.14 ± 0.63 | 8.9 ± 0.40 | |
| Mono | 7.98 ± 0.61 | 7.98 ± 0.61 | 1 |
| Poly | 13.02 ± 0.13 | 10.65 ± 0.53 | 3.56 ± 0.13 |
| hg19 | | | |
| Mono+Poly | 10.35 ± 0.11 | 8.9 ± 0.12 | |
| Mono | 7.8 ± 0.11 | 7.8 ± 0.11 | 1 |
| Poly | 12.99 ± 0.1 | 9.8 ± 0.08 | 4.07 ± 0.02 |

(B)

| | CDS con | cCDS alt | 5grey alt | 5grey dual | 3grey alt | 3grey dual |
|---|---|---|---|---|---|---|
| # seq | 17 718 | 11 811 | 7 524 | 5 011 | 10 397 | 6 145 |
| average nt | 866.6 | 200.2 | 283.0 | 133.0 | 496.2 | 378.3 |
| # nt | 14 674 986 | 2 364 981 | 2 129 403 | 666 591 | 5 159 400 | 2 324 934 |

Sum terminal alternative (with dual), # nt: 10 280 328

Sum total alternative (with dual), # nt: 12 645 309

Abbreviations: con, constitutive; alt, alternative; dual, dual function; nt, nucleotides; seq, sequences.
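
The sums in panel (B) can be checked with a few lines of arithmetic. The sketch below assumes that the six value columns of the "# nt" row are, in order, constitutive, cCDS alt, 5grey alt, 5grey dual, 3grey alt and 3grey dual (the order used in the caption); the variable names are illustrative only.

```python
# Nucleotide counts from the "# nt" row of panel (B).
# The column order is an assumption taken from the caption.
nt = {
    "con": 14_674_986,
    "cCDS alt": 2_364_981,
    "5grey alt": 2_129_403,
    "5grey dual": 666_591,
    "3grey alt": 5_159_400,
    "3grey dual": 2_324_934,
}

# Alternative nucleotides in the terminal (grey) regions, dual included.
terminal_keys = ["5grey alt", "5grey dual", "3grey alt", "3grey dual"]
terminal_alt = sum(nt[k] for k in terminal_keys)

# All alternative nucleotides: terminal regions plus the core CDS.
total_alt = terminal_alt + nt["cCDS alt"]

# Terminal versus core alternative nucleotides.
terminal_to_core = terminal_alt / nt["cCDS alt"]

print(terminal_alt)                # 10280328
print(total_alt)                   # 12645309
print(round(terminal_to_core, 2))  # 4.35
```

Under this column assignment the two sums reproduce the tabulated totals, and the terminal-to-core ratio of about 4.3 is consistent with the "approximately four times" stated in the abstract.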
Figure 3. Mean numbers of introns in different functional regions of AS, ATI and ATT genes.
Figure 4. Predominant extension of alternative transcripts in the 5′- and 3′-terminal regions. Mean lengths of functional regions (x-axis) and mean numbers of introns (y-axis) are shown. (A) ATI gene group; (B) ATT gene group; (C) ATI + ATT gene group; (D) AS gene group. Zero on the x-axis is the distal (most downstream) start codon in the respective locus.
Rates of synonymous (Ks) and non-synonymous (Kn) nucleotide substitutions in human-macaque orthologous alternative (alt) and constitutive (con) protein coding sequences. (A) Unfiltered dataset. (B) Highly conserved sequences (Ks ≈ 0) excluded
| Region | Kn/Ks | Kn | Ks | Length | % regions with | % regions with overlapping frames |
|---|---|---|---|---|---|---|
| (A) Unfiltered dataset | | | | | | |
| 5′ grey alt | 0.427 | 0.0252 ± 0.0010 | 0.0827 ± 0.0014 | 353 ± 16 | 26 | 4.3 |
| cCDS alt | 0.495 | 0.0279 ± 0.0005 | 0.0733 ± 0.0011 | 257 ± 11 | 34 | 6.2 |
| 3′ grey alt | 0.422 | 0.0255 ± 0.0005 | 0.0780 ± 0.0010 | 570 ± 18 | 27 | 13.6 |
| CDS con | 0.228 | 0.0207 ± 0.0004 | 0.0910 ± 0.0005 | 905 ± 21 | 12 | 18.8 |
| (B) Highly conserved sequences (Ks ≈ 0) excluded | | | | | | |
| 5′ grey alt | 0.255 | 0.0261 ± 0.0012 | 0.1076 ± 0.0016 | 441 ± 16 | | |
| cCDS alt | 0.291 | 0.0292 ± 0.0006 | 0.1029 ± 0.0014 | 309 ± 11 | | |
| 3′ grey alt | 0.250 | 0.0257 ± 0.0006 | 0.1014 ± 0.0012 | 659 ± 18 | | |
| CDS con | 0.229 | 0.0225 ± 0.0004 | 0.0976 ± 0.0008 | 1002 ± 21 | | |
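
As a rough consistency check on the filtered rows (the last four rows of the table), the sketch below divides the mean non-synonymous rate by the mean synonymous rate for each region. It assumes that the second and third value columns are mean Kn and mean Ks, respectively, which is an interpretation of the table rather than something the extracted header states; the dictionary and variable names are illustrative.

```python
# Mean substitution rates for the filtered dataset (Ks ≈ 0 excluded).
# Assumption: second value column = Kn, third value column = Ks.
rates = {
    "5' grey alt": {"kn": 0.0261, "ks": 0.1076},
    "cCDS alt":    {"kn": 0.0292, "ks": 0.1029},
    "3' grey alt": {"kn": 0.0257, "ks": 0.1014},
    "CDS con":     {"kn": 0.0225, "ks": 0.0976},
}

# Ratio of the mean rates per region. This is not necessarily the same
# as a mean of per-sequence Kn/Ks ratios.
ratios = {region: r["kn"] / r["ks"] for region, r in rates.items()}

for region, value in ratios.items():
    print(f"{region}: {value:.3f}")
```

Under this assumption the ratios fall between roughly 0.23 and 0.28, with constitutive coding sequence lowest and core alternative coding sequence highest, the same ordering as the first value column of the filtered rows.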
Figure 5. Distributions of RNA/nucleotide selection pressure ratio values, RNSP (A) and Protein Selection Pressure ratio values, PSP (B) in the 5′ grey area, cCDS and the 3′ grey area.
Distribution of phosphorylation sites (PhS) in constitutive (CDS con) and alternative sequences located in the core (cCDS alt) or terminal (5grey alt, 5grey dual, 3grey alt, 3grey dual) coding regions of polymorphic gene loci
| | cCDS con | cCDS alt | 5grey alt | 5grey dual | 3grey alt | 3grey dual |
|---|---|---|---|---|---|---|
| # PhS | 15 353 | 1 851 | 1 908 | 929 | 3 777 | 805 |
| # Seq with PhS | 3 452 | 562 | 462 | 288 | 892 | 269 |
| # Seq total | 11 354 | 6 040 | 3 698 | 2 437 | 4 369 | 1 826 |
| Density (a) | 1.31 | 1.006 | 0.979 | 0.950 | 1.171 | 1.027 |

(a) The PhS density was calculated for the total lengths of the sequences found in the Human Protein reference database (www.hprd.org).
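
The density row cannot be recomputed from the table alone because the total sequence lengths are not listed. What can be derived directly is the share of sequences in each region class that carry at least one phosphorylation site. The sketch below is an illustration of that arithmetic, not part of the original analysis; the variable names are hypothetical.

```python
# Counts copied from the two sequence rows of the PhS table.
regions = ["cCDS con", "cCDS alt", "5grey alt",
           "5grey dual", "3grey alt", "3grey dual"]
seq_with_phs = [3452, 562, 462, 288, 892, 269]
seq_total = [11354, 6040, 3698, 2437, 4369, 1826]

# Fraction of sequences per region class with at least one PhS.
fraction_with_phs = {
    region: with_phs / total
    for region, with_phs, total in zip(regions, seq_with_phs, seq_total)
}

for region, fraction in fraction_with_phs.items():
    print(f"{region}: {fraction:.1%}")
```

These fractions describe how many sequences are phosphorylated at all, which is a different quantity from the per-length density reported in the table.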