Jun-ichi Takeda, Yutaka Suzuki, Mitsuteru Nakao, Roberto A Barrero, Kanako O Koyanagi, Lihua Jin, Chie Motono, Hiroko Hata, Takao Isogai, Keiichi Nagai, Tetsuji Otsuki, Vladimir Kuryshev, Masafumi Shionyu, Kei Yura, Mitiko Go, Jean Thierry-Mieg, Danielle Thierry-Mieg, Stefan Wiemann, Nobuo Nomura, Sumio Sugano, Takashi Gojobori, Tadashi Imanishi.
Abstract
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56,419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18,297 different alternative splicing variants. A total of 37,670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, to be nested or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast-expanding universe of alternative splicing variants.
Year: 2006 PMID: 16914452 PMCID: PMC1557807 DOI: 10.1093/nar/gkl507
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Statistics of the data processing and of the alternative splicing variants and exons identified
| #Locus | #cDNA | #Total exon | #Alternative exon | #Constitutive exon | |
|---|---|---|---|---|---|
| 25 585 | 56 419 | 389 895 | 44 727 | 345 168 | |
| 24 425 | 55 036 | 389 895 | 44 727 | 345 168 | |
| 10 127 | 35 030 | 331 924 | 44 727 | 287 197 | |
| 6877 | 18 297 | 176 505 | 37 670 | 138 835 | |
| 4568 | 7494 | 18 297 | 7494 | 10 803 | |
| 5565 | 11 156 | 139 911 | 25 236 | 114 675 | |
| 2933 | 4940 | 18 297 | 4940 | 13 357 | |
| 3216 | 4750 | 18 262 | 6398 | 11 864 | |
| 6005 | 13 409 | 148 242 | 28 728 | 119 514 | |
| 797 | 1034 | 5877 | 1401 | 4476 |
a Unmapped cDNAs' exons could not be counted.
Figure 1. Schematic representation of the identification of alternative splicing. Essentially, the illustrated patterns of exon pairs were searched for and selected as alternative splicing exons in both the computational and the manual annotations.
Relation between alternative splicing genes and motifs
| #Motif-related locus | #NOT motif-related locus | Total | |
|---|---|---|---|
| 5523 | 1354 | 6877 | |
| 7241 | 10 307 | 17 548 | |
| 12 764 | 11 661 | 24 425 |
a P-value < 10^-16.
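The association in the table above can be checked with a standard test on the 2x2 counts. The following is a minimal sketch that assumes a Pearson chi-square test with one degree of freedom; the source does not name the test that was used, and the row labels (alternative splicing loci in the first row, remaining loci in the second) are inferred from the row totals rather than stated in the table. The names `observed` and `chi_square_2x2` are illustrative only.

```python
import math

# 2x2 contingency table taken from the rows above.
# ASSUMPTION: row 1 = alternative splicing loci, row 2 = all other loci;
# columns = motif-related, NOT motif-related.
observed = [[5523, 1354],
            [7241, 10307]]


def chi_square_2x2(table):
    """Return the Pearson chi-square statistic and its P-value (1 d.f.).

    ASSUMPTION: the test itself is our choice; the source only reports
    that the P-value is below 10^-16.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability of the
    # chi-square distribution is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_2x2(observed)
print(f"chi-square = {stat:.1f}, P = {p_value:.3g}")
```

Under these assumptions the statistic is on the order of 3 × 10^3, so the P-value underflows double precision, which is consistent with the reported bound of 10^-16.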
Most frequently observed motifs which were affected by alternative splicing variants
| InterPro ID | Motifs in alternative splicing loci | Motifs in all loci | Ratio | Significance of enrichment | Definition |
|---|---|---|---|---|---|
| 003598 | 417 | 495 | 0.84 | <10^-16 | Immunoglobulin C-2 type |
| 000005 | 237 | 245 | 0.97 | <10^-16 | Helix–turn–helix, AraC type |
| 000867 | 73 | 79 | 0.92 | <10^-16 | Insulin-like growth factor-binding protein (IGFBP) |
| 000345 | 114 | 211 | 0.54 | 10^-15 | Cytochrome |
| 003962 | 55 | 78 | 0.71 | 10^-15 | Fibronectin, type III subdomain |
| 002017 | 56 | 88 | 0.64 | 10^-12 | Spectrin repeat |
| 000379 | 62 | 103 | 0.6 | 10^-11 | Esterase/lipase/thioesterase |
| 002035 | 42 | 60 | 0.7 | 10^-11 | von Willebrand factor, type A |
| 000595 | 22 | 25 | 0.89 | 10^-10 | Cyclic nucleotide-binding domain |
| 003034 | 31 | 42 | 0.74 | 10^-9 | DNA-binding SAP |
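The Ratio column is the count in alternative splicing loci divided by the count in all loci. The short sketch below recomputes it from the tabulated counts as a consistency check; the variable names are illustrative, and rounding to two decimals is an assumption about how the column was produced.

```python
# (InterPro ID, motifs in alternative splicing loci, motifs in all loci,
#  ratio as listed in the table above)
rows = [
    ("003598", 417, 495, 0.84),
    ("000005", 237, 245, 0.97),
    ("000867", 73, 79, 0.92),
    ("000345", 114, 211, 0.54),
    ("003962", 55, 78, 0.71),
    ("002017", 56, 88, 0.64),
    ("000379", 62, 103, 0.60),
    ("002035", 42, 60, 0.70),
    ("000595", 22, 25, 0.89),
    ("003034", 31, 42, 0.74),
]

# ASSUMPTION: ratios were rounded to two decimal places.
recomputed = {ipr: round(hits / total, 2) for ipr, hits, total, _ in rows}

# Collect rows whose recomputed ratio differs from the listed one.
mismatches = [ipr for ipr, _, _, listed in rows
              if abs(recomputed[ipr] - listed) > 1e-9]

for ipr, hits, total, listed in rows:
    print(ipr, f"{hits}/{total} = {recomputed[ipr]:.2f} (listed {listed:.2f})")
print("rows that differ:", mismatches)
```

Nine of the ten rows reproduce exactly. For 000595 the counts give 22/25 = 0.88 against a listed 0.89; the sketch cannot tell whether the counts or the ratio carry the discrepancy.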
Most frequently observed GO terms which were affected by alternative splicing variants
| GO ID | GOs in alternative splicing loci | GOs in all loci | Ratio | Significance of enrichment | GO term |
|---|---|---|---|---|---|
| 0003676 | 451 | 1112 | 0.41 | <10^-16 | Nucleic acid binding |
| 0003700 | 327 | 518 | 0.63 | <10^-16 | Transcription factor activity |
| 0003677 | 276 | 603 | 0.46 | <10^-16 | DNA-binding |
| 0004713 | 164 | 318 | 0.52 | <10^-16 | Protein tyrosine kinase activity |
| 0005215 | 164 | 299 | 0.55 | <10^-16 | Transporter activity |
| 0008270 | 148 | 276 | 0.54 | <10^-16 | Zinc ion binding |
| 0005520 | 73 | 79 | 0.92 | <10^-16 | Insulin-like growth factor-binding |
| 0005524 | 379 | 967 | 0.39 | 10^-14 | ATP binding |
| 0003824 | 190 | 429 | 0.44 | 10^-13 | Catalytic activity |
| 0016491 | 116 | 237 | 0.49 | 10^-11 | Oxidoreductase activity |
Figure 2. Patterns of the identified alternative splicing.
Characteristics of the identified alternative splicing exons
| Exon-intron junction type: canonical | Exon-intron junction type: non-canonical | Containing Alu-like element | Containing ESE | Total | |
|---|---|---|---|---|---|
| 26 888 (96.6%) | 954 (3.4%) | 12% | 8% | 27 842 | |
| 129 293 (99.2%) | 1073 (0.8%) | 2% | 10% | 130 366 | |
| 156 181 (98.7%) | 2027 (1.3%) | 4% | 9% | 158 208 | |
Figure 3. Distribution of the length difference between the alternative splicing variants. The percentages show the populations belonging to the corresponding groups.
Numbers of the genes in which alternative splicing variants should influence the possible protein functions
| | #Locus | #cDNA |
|---|---|---|
| Motif changed | 3015 | 8727 |
| Subcellular localization changed | 2982 | 8624 |
| | 1779 | 5179 |
| Transmembrane domain changed | 1348 | 3933 |
| | 129 | 604 |
| | 172 | 390 |
| | 27 | 56 |
Figure 4. Examples of the alternative splicing variants detected as ‘motif-changed’ (A), ‘subcellular localization-changed’ (B) and ‘transmembrane domain-changed’ (C). Exons and introns are represented by green boxes and lines, respectively. The violet boxes are protein coding regions and the yellow boxes are alternative splicing exons. The positions of the detected motifs and transmembrane domains are shown beneath the transcripts. In the uppermost panel, GO terms attached to the transcript indicated by the lower line are shown.
Figure 5. Examples of the ‘uncommon’ patterns of alternative splicing: ‘bridged’ (A), ‘nested’ (B) and ‘multiple CDS’ (C). These ‘uncommon’ patterns were defined as follows: (i) ‘bridged’: a locus in which two alternative splicing variants were arrayed in tandem without sharing any exons and another transcript ‘bridged’ these two variants, sharing at least some of its exons with both of them; (ii) ‘nested’: a locus in which the CDS region of one alternative splicing variant was not shared with another variant; and (iii) ‘multiple CDS’: a locus in which different ORFs >200 amino acids in length were annotated independently for different alternative splicing variants having overlapping CDSs in different reading frames. In the lower panel of (A), the results of RT–PCR are shown. Each photograph shows the amplicons of the indicated RT–PCR using the indicated primers. Tissue origins of the template RNAs are shown in the margin. The asterisk indicates a non-specific band. The coloring of the figures is the same as in Figure 4.
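The ‘multiple CDS’ definition in the caption can be expressed as a small test on two annotated CDSs. The following is a hedged sketch, not the pipeline used in the study: it assumes plus-strand, half-open genomic intervals listed in transcript order, counts ORF length as CDS bases divided by three, and requires every shared coding interval to be out of frame, which is only one reading of the definition. All function names are hypothetical.

```python
def cds_length(segments):
    """Total number of coding bases over (start, end) half-open intervals."""
    return sum(end - start for start, end in segments)


def phase_at(segments, pos):
    """Codon phase (0, 1 or 2) of genomic position pos, or None if non-coding."""
    upstream = 0  # coding bases that precede the current segment
    for start, end in segments:
        if start <= pos < end:
            return (upstream + pos - start) % 3
        upstream += end - start
    return None


def shared_intervals(cds_a, cds_b):
    """Genomic intervals that are coding in both variants."""
    shared = []
    for s1, e1 in cds_a:
        for s2, e2 in cds_b:
            s, e = max(s1, s2), min(e1, e2)
            if s < e:
                shared.append((s, e))
    return shared


def is_multiple_cds(cds_a, cds_b, min_aa=200):
    """True if both ORFs exceed min_aa and all shared CDS is out of frame.

    ASSUMPTION: min_aa=200 follows the caption; the 'all shared intervals'
    rule and the plus-strand coordinate model are ours.
    """
    if cds_length(cds_a) // 3 <= min_aa or cds_length(cds_b) // 3 <= min_aa:
        return False
    shared = shared_intervals(cds_a, cds_b)
    if not shared:
        return False
    # Inside one shared interval both CDSs are contiguous, so the phase
    # difference is constant and checking the first base is sufficient.
    return all(phase_at(cds_a, s) != phase_at(cds_b, s) for s, _ in shared)


variant_a = [(0, 702)]         # 234 codons
out_of_frame = [(1, 703)]      # same length, shifted by one base
in_frame = [(3, 705)]          # shifted by one whole codon
print(is_multiple_cds(variant_a, out_of_frame))  # True
print(is_multiple_cds(variant_a, in_frame))      # False
```

A minus-strand locus would need the segment order and phase counting reversed, and a production check would also confirm that each ORF is independently annotated, as the caption requires.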