Areum Han, Peter Stoilov, Anthony J Linares, Yu Zhou, Xiang-Dong Fu, Douglas L Black.
Abstract
The splicing regulator Polypyrimidine Tract Binding Protein (PTBP1) has four RNA binding domains that each bind a short pyrimidine element, allowing recognition of diverse pyrimidine-rich sequences. This variation makes it difficult to evaluate PTBP1 binding to particular sites based on sequence alone and thus to identify target RNAs. Conversely, transcriptome-wide binding assays such as CLIP identify many in vivo targets, but do not provide a quantitative assessment of binding and are informative only for the cells where the analysis is performed. A general method of predicting PTBP1 binding and possible targets in any cell type is needed. We developed computational models that predict the binding and splicing targets of PTBP1. A Hidden Markov Model (HMM), trained on CLIP-seq data, was used to score probable PTBP1 binding sites. Scores from this model are highly correlated (ρ = -0.9) with experimentally determined dissociation constants. Notably, we find that the protein is not strictly pyrimidine specific, as interspersed guanosine residues are well tolerated within PTBP1 binding sites. This model identifies many previously unrecognized PTBP1 binding sites, and can score PTBP1 binding across the transcriptome in the absence of CLIP data. Using this model to examine the placement of PTBP1 binding sites in controlling splicing, we trained a multinomial logistic model on sets of PTBP1-regulated and unregulated exons. Applying this model to rank exons across the mouse transcriptome identifies known PTBP1 targets and many new exons that were confirmed as PTBP1-repressed by RT-PCR and RNA-seq after PTBP1 depletion. We find that PTBP1-dependent exons are diverse in structure and do not all fit previous descriptions of the placement of PTBP1 binding sites. Our study uncovers new features of RNA recognition and splicing regulation by PTBP1. This approach can be applied to other multi-RRM domain proteins to assess binding site degeneracy and multifactorial splicing regulation.
Year: 2014 PMID: 24499931 PMCID: PMC3907290 DOI: 10.1371/journal.pcbi.1003442
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. PTBP1 binding model.
A. Scheme of the PTBP1 binding model. The two-state Hidden Markov Model (HMM) was trained on PTBP1-bound RNA sequences (48,604 clusters) from published PTBP1 CLIP experiments. Triplets from these CLIP clusters were predictive of the two states, with all of the pyrimidine triplets preferred by State 1. The diagram presents the structure of the PTBP1 HMM and its trained transition probabilities. B. The probabilities that triplets are seen in State 1 or State 2 (emission probabilities) are plotted as black and gray bars, respectively. Asterisks indicate G-containing pyrimidine triplets.
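As a rough illustration of how a two-state HMM over triplets can label a sequence, the sketch below runs Viterbi decoding in Python. It is not the trained model: the state names, the start, transition, and emission probabilities, and the use of overlapping triplets as observations are all assumptions made for this example.

```python
import math

# Hypothetical parameters for illustration only; NOT the trained values.
STATES = ("bound", "unbound")  # stand-ins for State 1 and State 2
START = {"bound": 0.5, "unbound": 0.5}
TRANS = {
    "bound": {"bound": 0.9, "unbound": 0.1},
    "unbound": {"bound": 0.1, "unbound": 0.9},
}
# Toy "bound" emission probability by number of pyrimidines in the triplet.
# Chosen so that the 64 triplet probabilities sum to 1.
BOUND_EMISSION = {0: 0.003, 1: 0.004, 2: 0.02, 3: 0.05}


def emission(state, triplet):
    """Toy emission probability: the bound state favors pyrimidine-rich triplets."""
    if state == "bound":
        return BOUND_EMISSION[sum(base in "CU" for base in triplet)]
    return 1.0 / 64  # unbound state: uniform over all 64 triplets


def triplets(seq):
    """Overlapping triplets of an RNA sequence (T is read as U). Assumed encoding."""
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def viterbi(seq):
    """Return the most probable state path, one state per overlapping triplet."""
    obs = triplets(seq)
    if not obs:
        return []
    score = {s: math.log(START[s]) + math.log(emission(s, obs[0])) for s in STATES}
    back = []
    for t in obs[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(emission(s, t))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these toy parameters, a pure pyrimidine tract decodes entirely to the bound state and a pure purine tract entirely to the unbound state. How the per-position states are turned into a single binding score is not described in this record, so the sketch stops at the state path.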
Figure 2. Validation of the PTBP1 binding model.
A. To validate binding scores, thirteen RNAs with various PTBP1 binding scores were transcribed in vitro and subjected to binding assays. Apparent dissociation constants (Kd) were highly negatively correlated with PTBP1 binding scores (Pearson correlation = −0.9). B. Four RNA sequences with predicted PTBP1 binding scores (full binding data in Figure S3). Potential PTBP1 binding sites are underlined and in bold. Experimental binding affinities were assessed by electrophoretic mobility shift of RNA by PTBP1 and compared with prediction scores. Apparent Kd was defined as the concentration at which half the protein was bound to RNA.
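The comparison in panel A comes down to a Pearson correlation between two paired lists. A minimal Python sketch follows. The function name and the example values are hypothetical, and whether Kd values were log-transformed before correlating is not stated here, so the sketch uses the values as given.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up numbers purely to show the call; these are not measurements.
toy_scores = [1.0, 2.0, 3.0, 4.0]
toy_kd = [90.0, 60.0, 40.0, 10.0]
r = pearson(toy_scores, toy_kd)  # negative: higher score, lower Kd
```

A strongly negative coefficient is the expected direction, since a higher binding score should correspond to tighter binding and therefore a lower Kd.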
Figure 3. Sequence characteristics of PTBP1-dependent alternatively spliced exons.
A. An RNA map shows enrichment of predicted PTBP1 binding sites near PTBP1-dependent exons. The Y-axis plots the average density of predicted PTBP1 binding states within a 24 nt window; adjacent windows overlapped by 8 nt. B. To assess PTBP1 binding signatures of individual exons, known PTBP1-regulated exons were clustered by their PTBP1 binding score profiles and visualized as heat maps. These heat maps indicate wide variation in the positions of PTBP1 binding sites between individual exons. C. Four sequence features, including the PTBP1 binding scores and 3′ splice site strength, show statistically significant differences between regulated and control exon groups (one-tailed Student's t-tests).
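The windowing in panel A can be sketched as follows, assuming each exon region is represented as a per-nucleotide 0/1 track marking positions assigned to the predicted binding state. That representation and the function names are assumptions for illustration. A 24 nt window with an 8 nt overlap means consecutive windows start 16 nt apart.

```python
def window_density(in_binding_state, window=24, overlap=8):
    """Fraction of positions in the binding state for each sliding window.

    in_binding_state: sequence of 0/1 (or bool) values, one per nucleotide.
    Consecutive windows start `window - overlap` nucleotides apart.
    """
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    densities = []
    for start in range(0, len(in_binding_state) - window + 1, step):
        chunk = in_binding_state[start:start + window]
        densities.append(sum(chunk) / window)
    return densities


def average_map(tracks, window=24, overlap=8):
    """Average the per-window densities over several equal-length tracks."""
    per_exon = [window_density(t, window, overlap) for t in tracks]
    return [sum(column) / len(column) for column in zip(*per_exon)]
```

Averaging the per-window densities over a group of exons, aligned at a common landmark such as a splice site, gives one curve of the map.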
Figure 4. Scheme of the PTBP1 splicing regulation model and its application to an exon in Ptbp3.
A. The PTBP1 splicing regulation model was trained on known PTBP1-regulated and non-regulated exons and used to predict new PTBP1-dependent exons. Prediction results were compared to changes in exon inclusion (PSI) measured by RT-PCR and RNA-seq. An exon from Ptbp3 is presented as a prediction example. From intron and exon sequences, PTBP1 binding scores and 3′ splice site strength were calculated and fed into the regulation model. B. The model predicts exon 2 of Ptbp3 as repressed by PTBP1 with high probability (0.89). Ptbp1 knockdown in mouse neuroblastoma cells (N2A) confirmed de-repression of the exon (from PSI = 45 to PSI = 70).
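To make the scoring step concrete, the sketch below shows how a multinomial logistic model converts the four inputs (three PTBP1 binding scores and 3′ splice site strength) into class probabilities with a softmax. The class labels and every coefficient are hypothetical placeholders, since the trained parameters are not given in this record.

```python
import math

# Hypothetical coefficients for illustration; NOT the trained model.
# Each class maps to (intercept, w_upstream, w_exon, w_downstream, w_3ss).
# The set of classes is also an assumption.
WEIGHTS = {
    "repressed": (-1.0, 0.6, 0.3, 0.3, -0.4),
    "activated": (-1.0, -0.2, 0.0, 0.2, 0.1),
    "unregulated": (0.0, 0.0, 0.0, 0.0, 0.0),  # reference class
}


def class_probabilities(upstream, exon, downstream, ss3_strength, weights=WEIGHTS):
    """Softmax over per-class linear scores of the four sequence features."""
    features = (1.0, upstream, exon, downstream, ss3_strength)
    logits = {
        label: sum(w * x for w, x in zip(coeffs, features))
        for label, coeffs in weights.items()
    }
    top = max(logits.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Feature values in the order: upstream intron, exon, downstream intron, 3' splice site.
probs = class_probabilities(8.35, -1.49, -0.80, 0.27)
```

The probability of the repressed class plays the role of p(Repressed), although the numbers produced by these placeholder weights carry no biological meaning.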
PTBP1 repressed exons identified by the splicing model.
| Gene Description | mm9 coordinates | PTB binding score: upstream intron (250 nt) | PTB binding score: exon | PTB binding score: downstream intron (100 nt) | 3′ splice site strength | p(Repressed) | |
| paired box gene 6 | chr2:105523985–105524115(+) | 8.35 | −1.49 | −0.80 | 0.27 | 0.99 | |
| methyl-CpG binding domain protein 5 | chr2:49134101–49135303(+) | 6.46 | −0.93 | 2.27 | −0.32 | 0.98 | |
| Rho GTPase activating protein 24 | chr5:102981145–102981338(+) | 6.47 | 0.10 | −0.05 | 0.46 | 0.97 | |
| transducin-like enhancer of split 1 | chr4:71819247–71819451(−) | 4.71 | 0.05 | −1.26 | −2.56 | 0.94 | |
| acyl-CoA synthetase long-chain family | chr11:54150438–54150515(+) | 4.16 | 1.40 | −0.03 | −0.82 | 0.94 | |
| ryanodine receptor 1, skeletal muscle | chr7:29829938–29829955(−) | 4.78 | −0.08 | −0.88 | −1.71 | 0.94 | |
| ankyrin repeat and KH domain containing 1 | chr18:36784163–36784921(+) | 4.37 | 0.03 | 1.44 | −0.64 | 0.93 | |
| solute carrier family 39 (zinc transporter) | chr14:70713408–70713577(−) | 3.51 | 1.16 | −1.27 | −3.20 | 0.92 | |
| gamma-aminobutyric acid (GABA) A receptor | chr11:41727472–41727495(−) | 1.95 | 2.77 | 0.56 | −3.79 | 0.92 | |
| integrin alpha 7 | chr10:128378878–128378997(+) | 4.14 | 0.29 | 1.13 | −0.25 | 0.92 | |
| IQ motif and Sec7 domain 2 | chrX:148615540–148615635(+) | 4.88 | 0.68 | −0.13 | 0.71 | 0.91 | |
| SWI/SNF related, matrix associated, actin dependent regulator of chromatin | chr19:26825612–26825646(+) | 3.94 | −0.09 | 1.23 | −0.86 | 0.91 | |
| zinc finger, AN1-type domain 3 | chr17:30197755–30197795(+) | 4.17 | 2.34 | 0.90 | 1.80 | 0.91 | |
| ArfGAP with GTPase domain, ankyrin repeat and PH domain 2 | chr10:126527198–126527257(+) | 3.57 | 0.06 | −0.53 | −3.08 | 0.90 | |
| Titin | chr2:76723554–76723832(−) | 2.93 | 1.06 | 1.19 | −1.83 | 0.90 | |
| ROD1 regulator of differentiation 1 (S. pombe) | chr4:59559021–59559054(−) | 3.80 | 0.57 | 1.66 | 0.73 | 0.89 | |
| mitogen-activated protein kinase 8 | chr14:34203859–34203930(−) | 2.35 | 1.17 | 1.01 | −3.48 | 0.89 | |
| synaptosomal-associated protein 91 | chr9:86693373–86693534(−) | 2.60 | 1.89 | −0.35 | −2.17 | 0.88 | |
| formin-like 1 | chr11:103059449–103059547(+) | 3.93 | −0.60 | −1.36 | −2.70 | 0.88 | |
| pleckstrin homology-like domain, family B | chr9:44509029–44509169(−) | 3.20 | 0.57 | 1.05 | −0.66 | 0.87 | |
| RIKEN cDNA 2310035C23 gene | chr1:107637012–107637094(+) | 2.03 | 1.76 | 0.85 | −2.46 | 0.87 | |
| aryl hydrocarbon receptor nuclear translocator | chr3:95270715–95270759(+) | 3.48 | −0.36 | 2.53 | 0.09 | 0.87 | |
| SET and MYND domain containing 2 | chr1:191723697–191723807(−) | 3.33 | 0.33 | −0.28 | −1.64 | 0.86 | |
| adaptor protein complex AP-2, alpha 1 subunit | chr7:52158832–52158897(−) | 3.35 | −0.12 | −0.89 | −2.69 | 0.86 | |
| killer cell lectin-like receptor, subfamily A | chr6:130329011–130329100(−) | 2.82 | 3.18 | 0.62 | 0.77 | 0.86 | |
| sperm associated antigen 9 | chr11:93942054–93942068(+) | 0.99 | 3.01 | 1.62 | −3.03 | 0.86 | |
| collagen, type IV, alpha 3 binding protein | chr13:97386949–97387026(+) | 2.81 | 1.23 | 0.74 | −0.93 | 0.86 | |
| GTPase activating RANGAP domain-like 3 | chr2:32941395–32941464(−) | 4.15 | 0.38 | 0.36 | 0.94 | 0.86 | |
| DENN/MADD domain containing 1A | chr2:37982049–37982168(−) | 3.37 | 0.80 | 1.35 | 0.59 | 0.86 | |
| membrane-spanning 4-domains, subfamily A | chr19:11400297–11400353(−) | 2.79 | 2.35 | 0.37 | −0.05 | 0.86 | |
| cDNA sequence BC030307 | chr10:86169981–86170089(+) | 2.75 | −0.16 | 1.95 | −2.40 | 0.85 | |
| phosphatase and actin regulator 1 | chr13:43154940–43155146(+) | 2.73 | 1.25 | 0.50 | −0.97 | 0.85 | |
| R3H domain containing 2 | chr10:126902187–126902240(+) | 1.66 | 1.98 | 1.96 | −1.57 | 0.84 | |
| CDC14 cell division cycle 14B | chr13:64306579–64306725(−) | 1.42 | 2.75 | 2.44 | −0.58 | 0.84 | |
| ubiquilin 1 | chr13:58282183–58282266(−) | 2.88 | 0.98 | −0.06 | −1.17 | 0.84 | |
| Titin | chr2:76739898–76740179(−) | 2.63 | −0.07 | 1.49 | −2.38 | 0.84 | |
| syntaxin 3 | chr19:11857290–11857400(−) | 3.00 | −1.12 | 2.26 | −3.62 | 0.84 | |
| solute (sodium/calcium) carrier family 8 | chr12:82310340–82310458(−) | 1.84 | 1.25 | 1.76 | −2.25 | 0.84 | |
| zinc finger protein 62 | chr11:49028057–49028156(+) | 3.27 | 1.98 | −0.51 | 0.19 | 0.83 | |
| discs, large homolog 1 (Drosophila) | chr16:31771843–31771941(+) | 1.53 | 1.98 | 1.87 | −1.65 | 0.83 | |
| neurexin II | chr19:6463824–6463847(+) | 3.35 | −1.37 | 1.33 | −2.26 | 0.83 | |
| killer cell lectin-like receptor, subfamily A | chr6:130179953–130180042(−) | 2.68 | 2.11 | −0.39 | −0.63 | 0.83 | |
| phosphatidylinositol binding clathrin assembly | chr7:97330729–97330878(+) | 1.15 | 2.15 | 2.30 | −2.37 | 0.83 | |
| acyl-Coenzyme A dehydrogenase family | chr9:26798168–26798277(−) | 2.61 | 0.88 | −0.31 | −1.86 | 0.83 | |
| epsin 1 | chr7:5033620–5033723(+) | 3.92 | 0.65 | 0.07 | 1.06 | 0.82 | |
| glutamate receptor interacting protein 1 | chr10:119422530–119422685(+) | 2.66 | −0.74 | 2.61 | −3.13 | 0.82 | |
| CUB and Sushi multiple domains 3 | chr15:47587514–47587627(−) | 2.42 | 2.16 | 0.20 | −0.45 | 0.82 | |
| leucine rich repeat (in FLII) interacting protein 1 | chr1:92990137–92990214(+) | 2.02 | 0.40 | 3.44 | −2.21 | 0.82 | |
| serine/arginine-rich splicing factor 11 | chr3:157703405–157703586(+) | 1.09 | 2.05 | −0.56 | −4.75 | 0.82 | |
| transmembrane protein 209 | chr6:30441087–30441184(−) | 3.82 | 0.16 | 0.10 | 0.64 | 0.82 | |
The 50 highest scoring exons predicted to be repressed by PTBP1 based on sequence alone.
Figure 5. Validation of novel PTBP1-repressed exons by RT-PCR.
A. Candidate PTBP1-repressed exons with probability greater than 0.65 were validated by RT-PCR following Ptbp1 knockdown. Data shown are averages ± standard error of PSI (Percent Spliced In) from biological triplicates. Statistical analysis was performed using a paired one-tailed Student's t-test (**p < 0.01, *p < 0.05). B. Exons with low PTBP1 repression probabilities (≤0.2) were also tested by RT-PCR following Ptbp1 knockdown in biological triplicates.
Figure 6. Large-scale validation of novel PTBP1-repressed exons by RNA-seq.
A. Validation of the PTBP1 splicing model using RNA-seq. After Ptbp1 knockdown, we performed RNA-seq experiments and estimated changes in PSI (Percent Spliced In) for 573 cassette exons. The graph shows average delta PSI values for exons, grouped by their predicted probability of repression by PTBP1. The number of exons in the corresponding probability bin is given by n. P-values were calculated from a one-tailed Student's t-test. B. A genome browser screenshot of a novel PTBP1-regulated exon: exon 2 of the Kcnq2 gene. For all internal mouse exons, we created custom genome browser tracks to visualize the PTBP1 splicing model and mapped RNA-seq reads.
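The summary in panel A can be sketched as below. The PSI formula from raw inclusion and exclusion read counts is a common simplification and may differ from the estimator actually used. The bin edges, function names, and example records are hypothetical.

```python
def psi(inclusion_reads, exclusion_reads):
    """Percent Spliced In from read counts (simplified, no length normalization)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads supporting either isoform")
    return 100.0 * inclusion_reads / total


def mean_delta_psi_by_bin(exons, bin_edges=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Group exons by predicted repression probability and average delta PSI.

    exons: iterable of (p_repressed, psi_control, psi_knockdown).
    Returns a list of (low_edge, high_edge, n, mean_delta_psi or None).
    Bins are half-open [low, high), except the last, which includes its upper edge.
    """
    n_bins = len(bin_edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for p, psi_control, psi_knockdown in exons:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if bin_edges[i] <= p < bin_edges[i + 1] or (is_last and p == bin_edges[-1]):
                sums[i] += psi_knockdown - psi_control  # delta PSI after knockdown
                counts[i] += 1
                break
    return [
        (bin_edges[i], bin_edges[i + 1], counts[i],
         sums[i] / counts[i] if counts[i] else None)
        for i in range(n_bins)
    ]
```

For a repressed exon, knockdown of the repressor should raise inclusion, so a positive mean delta PSI in the high-probability bins is the pattern that would support the model.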