German Gaston Leparc, Robi David Mitra.
Abstract
To better understand the complex role that alternative splicing plays in intracellular signaling, it is important to catalog the numerous splice variants involved in signal transduction. Therefore, we developed PASE (Prediction of Alternative Signaling Exons), a computational tool to identify novel alternative cassette exons that code for kinase phosphorylation or signaling protein-binding sites. We first applied PASE to the Caenorhabditis elegans genome. In this organism, our algorithm had an overall specificity of ≥76.4%, including 33 novel cassette exons that we experimentally verified. We then used PASE to analyze the human genome and made 804 predictions, of which 308 were found as alternative exons in the transcript database. We experimentally tested 384 of the remaining unobserved predictions and discovered 26 novel human exons for a total specificity of ≥41.5% in human. By using a test set of known alternatively spliced signaling exons, we determined that the sensitivity of PASE is approximately 70%. GO term analysis revealed that our exon predictions were found in the introns of known signal transduction genes more often than expected by chance, indicating PASE enriches for splice variants that function in signaling pathways. Overall, PASE was able to uncover 59 novel alternative cassette exons in C. elegans and humans through a genome-wide ab initio prediction method that enriches for exons involved in signaling.
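The human specificity is reported as a lower bound because not every unobserved prediction was tested. The short calculation below retraces that arithmetic from the counts in the abstract; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract for the human genome run.
predictions = 804          # total PASE predictions
in_transcript_db = 308     # already present as alternative exons in transcripts
tested = 384               # unobserved predictions tested experimentally
confirmed = 26             # novel exons confirmed among those tested

unobserved = predictions - in_transcript_db   # predictions with no transcript evidence
untested = unobserved - tested                # predictions never assayed
validated = in_transcript_db + confirmed      # all predictions with support

# Untested predictions are counted as failures, so this is a lower bound.
specificity_lower_bound = validated / predictions

print(f"validated = {validated}, untested = {untested}")
print(f"specificity >= {specificity_lower_bound:.1%}")
```

Because the untested predictions are treated as unvalidated, the true specificity can only be equal to or higher than this figure.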
Year: 2007 PMID: 17452356 PMCID: PMC1904267 DOI: 10.1093/nar/gkm187
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 3. Semi-nested RT-PCR approach to detect novel exons from a pool of 18 tissue RNA samples. In the first round of PCR, an external forward primer targeted to a 5′ upstream canonical exonic sequence is used with a reverse primer targeted to the predicted exon in question. A 1:100 dilution of this first round reaction is then used as the template for the second round of PCR. The second round PCR then uses an internal forward primer targeted to an exonic region between the external forward primer and the previously used reverse exon primer. As an example, the novel C. elegans exon in gene ZK180.2 is shown with the expected PCR product band sizes (261 bp first round PCR, 156 bp second round PCR). The second round reaction was cloned and sequenced for validation of the prediction (DNA ladder is labeled as λ).
Figure 1. Overview of the PASE algorithm. (1) PASE first scans intronic sequences of RefSeq genes for candidate cassette exons. (2) Predicted exons that introduce frameshifts or premature stop codons are removed. (3) Exons that overlap blocks of highly conserved genomic sequence elements are then selected. (4) Finally, the remaining cassette exons are searched with Scansite motifs to identify candidate-signaling exons.
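Step (2) of the overview can be illustrated with a minimal sketch. This is not the PASE implementation: the function name, the `phase` parameter, and the toy sequences are assumptions made for illustration, and codons that span the exon boundaries are ignored for simplicity.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_translatable(exon_seq: str, phase: int = 0) -> bool:
    """Return True if inserting the exon keeps the reading frame open.

    phase is the number of leading bases (0, 1 or 2) that complete a codon
    started in the upstream exon. Hypothetical helper; boundary-spanning
    codons are not checked.
    """
    seq = exon_seq.upper()
    if len(seq) % 3 != 0:
        # A cassette exon whose length is not a multiple of 3 shifts the frame.
        return False
    # Walk the codons that lie entirely inside the exon.
    for i in range(phase, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True

# Toy candidates (made-up sequences, not from the paper).
candidates = {
    "exon_a": "GCTAAGGCTCGT",  # 12 nt, no in-frame stop at phase 0
    "exon_b": "GCTTAAGCTCGT",  # 12 nt, TAA in frame at phase 0
    "exon_c": "GCTAAGGCTCG",   # 11 nt, would cause a frameshift
}
kept = [name for name, seq in candidates.items() if is_translatable(seq)]
print(kept)
```

The same sequence can pass or fail depending on the phase inherited from the upstream exon, which is why a filter of this kind needs the flanking exon context and not just the candidate sequence.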
Figure 2. Examples of Scansite motifs used in PASE. (A) Akt kinase (also known as Protein Kinase B) is a member of the basophilic serine/threonine-specific protein kinase family that selectively phosphorylates the serine/threonine residue (position 0) of protein sequences resembling the linear motif R-X-R-X-X-S/T. (B) The non-catalytic Src Homology 3 (SH3) domain of the tyrosine kinase Src mediates specific protein–protein interactions by binding to ligands with the linear motif resembling R-X-X-P-X-X-P.
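The two consensus motifs in this caption can be written as regular expressions to show what a strict match looks like. This is only a sketch: Scansite reports graded bit scores, so strict consensus matching is a cruder stand-in and a real hit need not match the consensus exactly. The helper name and the example peptides are assumptions for illustration.

```python
import re

# Consensus motifs from the caption, with "X" written as "." (any residue).
AKT_CONSENSUS = "R.R..[ST]"      # R-X-R-X-X-S/T
SRC_SH3_CONSENSUS = "R..P..P"    # R-X-X-P-X-X-P

def find_hits(consensus: str, peptide: str):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile(f"(?=({consensus}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(peptide)]

# Illustrative peptides (not taken from the tables in this record).
akt_hits = find_hits(AKT_CONSENSUS, "SGRPRTTSFAES")
sh3_hits = find_hits(SRC_SH3_CONSENSUS, "AARPLPPLPGG")

print(akt_hits)
print(sh3_hits)

# For the Akt-type motif, the phosphoacceptor (position 0) is the last
# residue of the match, i.e. 5 residues after the match start.
phospho_index = akt_hits[0][0] + 5
print(phospho_index)
```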
EST and experimentally validated PASE predictions in C. elegans and human
| | C. elegans | % validated | Human | % validated |
|---|---|---|---|---|
| RefSeq entries | 21 584 | – | 22 615 | – |
| Introns | 118 457 | – | 197 864 | – |
| Translatable exons | 6 008 | 13.1% | 207 176 | 6.2% |
| Conserved exons | 815 | 38.1% | 5 160 | 22.8% |
| Scansite > 10 bits exons | 113 | 37.2% | 5 489 | 19.4% |
| Scansite > 6 bits exons | 823 | 30.8% | 35 190 | 13.7% |
| Conserved & Scansite > 10 bits exons | 20 (18) | 90.0% | 109 (57) | 52.3% |
| Conserved & Scansite > 6 bits exons | 140 (107) | 76.4% | 804 (334) | 41.5% |
The number of validated predictions is in parentheses.
Thirty-three novel C. elegans exons with predicted interaction sites confirmed by cloning and sequencing
| Wormbase ID | Gene Description | Intron | Best hit (human homolog) | Score | Motif |
|---|---|---|---|---|---|
| F29C4.8 | Collagens (type IV and type XIII), and related proteins (COLlagen) | 3 | p85_SH3_m1 | 12.6825 | PPGLRGSPGWPGLPG |
| ZK180.2 | GABA-B ion channel receptor subunit GABABR1 homolog | 3 | PKC_common | 10.8279 | FGWKRVGTVKQNDQP |
| K11C4.5 | Ca2+ release channel (ryanodine receptor) | 9 | Casn_Kin2 | 10.2192 | KDVLEEETEEQEPIW |
| K02H8.1 | Encodes a muscleblind-like protein with predicted nucleic-acid-binding activity | 3 | Grb2_SH2 | 10.023 | NTPIYPPYYNGMMYP |
| C14A11.3a | Guanine nucleotide exchange factor for Rho and Rac GTPases | 1 | PKC_zeta | 8.8355 | KKYGFWGSVFSKYCF |
| T13H5.1 | Protein tyrosine phosphatase | 13 | PLCg_SH3 | 8.831 | KRPHQVPPMKVDPEG |
| C29A12.4 | Neurexin III-alpha/GLIoTactin ( | 23 | Casn_Kin2 | 8.6202 | QITDGDESEDEFDGS |
| H16O14.2 | Encodes a putative nuclear protein | 1 | ErkDD | 8.4135 | KKPPNMHINIPTDEG |
| C44C10.9 | Encodes a Collagen structural gene | 3 | Src_Kin | 8.3384 | YSTPDDIYSAYEKFI |
| F36H1.4c | Abnormal cell LINeage | 5 | GSK3_Kin | 8.2932 | MVHHPNQTISTTPSS |
| T22B2.4 | Predicted RNA-binding protein SEB4 (RRM superfamily) (SUPpressor) | 4 | PIP3_PH | 8.1225 | MGTKKSEFLSRMCVF |
| F10C1.7c | Nuclear envelope protein lamin, intermediate filament superfamily | 2 | PKA_Kin | 8.0228 | RKEFKRETENGTDKM |
| C02F12.1 | Tetraspanin family integral membrane protein | 3 | PDGFR_Kin | 7.5919 | KDKFSNNYMGVYLKN |
| C23F12.1a | Actin-binding cytoskeleton protein, filamin | 3 | p85_SH3_m1 | 7.5172 | EPLGGGVPKQPVQFY |
| Y49F6B.9 | Predicted E3 ubiquitin ligase | 6 | Cam_Kin2 | 7.3624 | HPCPRCKTLIVKEND |
| F33D4.2e | Inositol 1,4,5-trisphosphate receptor (Inositol Triphosphate Receptor) | 19 | DNA_PK | 7.1926 | IGKMSQDSQSDYDSD |
| T12A7.6 | Putative protein, unknown function | 3 | Src_Kin | 7.1049 | YAKTDLIYDDWKFDN |
| Y67D8C.10b | Calcium transporting ATPase (Membrane Calcium ATPase) | 5 | PKA_Kin | 7.0479 | HHREHRDSHHQAQNQ |
| H10D18.5 | Encodes a putative nuclear protein | 2 | p85_SH3_m2 | 6.9925 | IDNKPLFPYMHFAQF |
| T02G6.2 | Encodes a putative nuclear protein predicted to localize in the nucleus | 4 | Clk2_Kin | 6.9219 | RVEYRYHSETLLYDF |
| C32C4.1 | Voltage-gated K+ channel KCNB/KCNC | 5 | Casn_Kin1 | 6.8928 | KEFTGITSGWPFLGA |
| ZC518.1a | Bestrophin (Best vitelliform macular dystrophy-associated protein) | 11 | GSK3_Kin | 6.8602 | RADSPDDSSHDSCSH |
| C06H5.6 | Synaptic vesicle transporter SVOP and related transporters | 6 | PKC_zeta | 6.8268 | IKKYINESVAFNKQT |
| F28B4.2 | Guanine-nucleotide releasing factor | 1 | Fgr_Kin | 6.7201 | VPQYHMQYFTFDKIT |
| F08A10.1a | Ca2+-activated K+ channel proteins (intermediate conductance classes) | 3 | Src_SH2 | 6.6538 | PRRKRVDYDQISMNW |
| F53A3.4 | Prion-like-(Q/N-rich)-domain-bearing protein | 1 | ATM_Kin | 6.5667 | HQQSIQFSQFPPPQL |
| H14N18.4a | Gamma-glutamyltransferase | 7 | Casn_Kin1 | 6.4654 | KDMPDSETINKAPDH |
| Y67D8C.9 | Puromycin-sensitive aminopeptidase and related aminopeptidases | 2 | Itk_SH3 | 6.4406 | QSKKKTPPRVVERLI |
| C14F11.5 | Alpha crystallins (heat shock protein) | 4 | Cdk5_Kin | 6.3331 | TVTPEQRSPGRKAFE |
| C36F7.4a | Immunoglobulin C-2 Type/fibronectin type III domains (neuRonal IGCAM) | 3 | Erk1_Kin | 6.2825 | CKADGNPTPTVIWRR |
| T23E1.1 | It encodes a putative membrane protein family member of bilaterial origin. | 2 | p38_Kin | 6.2769 | PMIFSCNSPMGNWAN |
| ZK180.2 | GABA-B ion channel receptor subunit GABABR1 homolog | 3 | GSK3_Kin | 6.2566 | THAKFASSDSHEPHE |
| K05B2.5 | Monocarboxylate transporter | 10 | Nck_2nd_SH3 | 6.0161 | ADTADGMPQLQDQDN |
Twenty-six novel human exons with predicted interaction sites confirmed by cloning and sequencing
| Gene symbol | Gene description | Intron | Best Scansite | Score | Motif |
|---|---|---|---|---|---|
| KCNH1 | Potassium voltage-gated channel, subfamily H, | 7 | p85_SH2 | 13.1167 | WEEDPYEYIRMKFDV |
| VPS8 | Vacuolar protein sorting-associated 8 | 4 | p85_SH2 | 11.891 | WEPPVEDYISMTFSE |
| CRYGN | Gamma N-crystallin variant | 3 | Abl_SH2 | 10.5251 | GDGAWVLYEEPNYHG |
| CD58 | Lymphocyte function-associated antigen 3 precursor (Ag3) | 3 | PKC_common | 10.1748 | GKNVTVKTIKKKQKR |
| PLEKHA6 | Phosphoinositol 3-phosphate-binding protein-3 | 7 | Cdc2_Kin | 9.7482 | FPYNYPPSPTVHDKM |
| ANKS1A/ODIN | Ankyrin repeat and sterile alpha motif domain | 4 | PDGFR_Kin | 9.3912 | KYGPFDPYINAKNND |
| PLEKHA6 | Phosphoinositol 3-phosphate-binding protein-3 | 13 | PLCg_SH3 | 8.8428 | ESPPAVPPLPSESRF |
| ESR1 | Estrogen receptor 1 / estrogen receptor alpha | 2 | p85_SH3_m2 | 8.5929 | EKPWQQMPLKGHNDY |
| ITGA6 | Integrin alpha chain, alpha 6 | 5 | Akt_Kin | 8.5524 | PPREQPDTFPDVMMN |
| MGC26733 | Hypothetical protein MGC26733 | 18 | Nck_2nd_SH3 | 8.5351 | KVFDECFPDQPQIGH |
| SNX1 | Sorting nexin 1 isoform c | 11 | PLCg_NSH2 | 8.4544 | RYGQSGNYMELAWHC |
| MTMR6 | Myotubularin-related protein 6 | 6 | PKA_Kin | 8.353 | RPKRRMQSWWATQKD |
| RGPD5 | RANBP2-like and GRIP domain containing 5 isoform | 20 | GSK3_Kin | 8.0857 | FWTSTPSSQPESKEP |
| LRP1 | Low density lipoprotein-related receptor 1 | 19 | Casn_Kin1 | 7.6201 | GDGSDEQTCPEPADN |
| UTY | Tetratricopeptide repeat protein isoform 1 | 12 | Abl_SH3 | 7.3266 | IEEAWSLPIPAELTS |
| PMS1 | Postmeiotic segregation 1 | 4 | PDGFR_Kin | 7.2751 | YMKKSGDYVTVVEDV |
| CREB5 | cAMP responsive element binding protein 5 | 3 | M1433 | 6.7301 | MDFSKGHTWTIVMNA |
| RECQL5 | RecQ protein-like 5 isoform 1 | 6 | Crk_SH3 | 6.5986 | ISTFQSPPPLPSRTL |
| PAK3 | p21-activated kinase 3 | 2 | GSK3_Kin | 6.5144 | FQTSRPVTVASSQSE |
| GTDC1 | Glycosyltransferase-like domain containing 1 | 8 | Nck_2nd_SH3 | 6.3065 | LQEKEREPKMQFNTQ |
| LSAMP | Limbic system-associated membrane protein | 1 | PKC_common | 6.1958 | FKQRKKPTLCRCVVE |
| PPP2R1B | Protein phosphatase 2, regulatory subunit A (PR 65), beta | 15 | p38_Kin | 6.1829 | AAVRDIQSPCRAQGP |
| WDFY3 | WD repeat and FYVE domain containing 3 isoform | 2 | Erk1_Kin | 6.8017 | EKQCALLSPKDFKAT |
| RPS6KC1 | Ribosomal protein S6 kinase, 52kDa, polypeptide | 1 | p38_Kin | 6.619 | PGWWVITSPNILANQ |
| OSR1/OXSR1 | Oxidative-stress responsive 1 | 3 | GSK3_Kin | 6.5846 | MVGSFANTNHLSRWW |
| RUNX3 | Runt-related transcription factor 3 | 3 | p38_Kin | 6.2715 | SCSCWLPSPHTDFFQ |
Figure 4. Novel exons in ERα and LRP1 exhibit tissue-specific expression. Flanking exon primers were used in RT-PCR to test for expression of the constitutive splice junctions, while exon-specific semi-nested primers were used to test for expression of the novel alternative exon. (A) Expression of the ERα constitutive exon 2–3 splice junction and the novel alternative exon across 18 tissues. The constitutive splice junction is expressed in several tissues, shown as a 234 bp PCR product. The novel exon is shown as a 244 bp PCR product that is included exclusively in breast and liver. (B) Expression of the LRP1 constitutive exon 18–19 splice junction and the novel alternative exon across 18 tissues. The constitutive splice junction is expressed in most tissues, shown as a 240 bp PCR product. The novel exon is shown as a 128 bp PCR product that is observed in all tissues except uterus. Abbreviations for lanes: DNA ladder (λ), prostate (Pro), breast (Brt), colon (Col), skin (Ski), stomach (Stm), thymus (Thy), lung (Lun), trachea (Tra), placenta (Pla), no-template negative control (−N), pooled tissue control (+P), brain (Brn), retina (Ret), skeletal muscle (SM), testis (Tes), kidney (Kid), ovary (Ovy), pancreas (Pan), uterus (Utr) and liver (Liv). The λ lanes have size markers spaced at 50-nt intervals up to 350 nt; the top two bands correspond to 500 and 766 nt. The strong band corresponds to 200 nt.