Michael K K Leung, Hui Yuan Xiong, Leo J Lee, Brendan J Frey.
Abstract
MOTIVATION: Alternative splicing (AS) is a regulated process that directs the generation of different transcripts from single genes. A computational model that can accurately predict splicing patterns based on genomic features and cellular context is highly desirable, both for understanding this widespread phenomenon and for exploring the effects of genetic variations on AS.
Year: 2014 PMID: 24931975 PMCID: PMC4058935 DOI: 10.1093/bioinformatics/btu277
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Architecture of the DNN used to predict AS patterns. It contains three hidden layers, with hidden variables that jointly represent genomic features and cellular context (tissue types)
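The architecture described in the caption can be illustrated with a minimal forward pass. This is a sketch under stated assumptions, not the authors' implementation: the layer sizes, the sigmoid activations, the random weights, and the choice to concatenate a one-hot tissue vector with the genomic features at the input are all hypothetical, and the paper's actual wiring may differ. The names `dense`, `init_layer`, and `predict_lmh` are invented for this illustration.

```python
import math
import random

def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_lmh(features, tissue_index, layers, n_tissues=5):
    """Return probabilities for low, medium and high inclusion in one tissue."""
    # Assumption: the tissue is encoded one-hot and joined to the genomic features.
    tissue = [1.0 if i == tissue_index else 0.0 for i in range(n_tissues)]
    h = list(features) + tissue
    for weights, biases in layers[:-1]:  # the three hidden layers
        h = sigmoid(dense(h, weights, biases))
    weights, biases = layers[-1]         # output layer over low/medium/high
    return softmax(dense(h, weights, biases))

rng = random.Random(0)
n_features, n_tissues = 8, 5             # toy sizes; five tissues as in the tables
sizes = [n_features + n_tissues, 6, 6, 6, 3]
layers = [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]
probs = predict_lmh([rng.random() for _ in range(n_features)], tissue_index=0, layers=layers)
print(probs)
```

With untrained weights the three probabilities are close to uniform; the point is only the data flow from features and tissue through three hidden layers to a low/medium/high output.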
Comparison of the AUC performance of different methods on the LMH code
| (a) AUCLMH_All | | | | |
|---|---|---|---|---|
| Tissue | Method | Low | Medium | High |
| Brain | MLR | 81.3 ± 0.1 | 72.4 ± 0.3 | 81.5 ± 0.1 |
| Brain | BNN | | 75.2 ± 0.3 | |
| Brain | DNN | | | |
| Heart | MLR | 84.6 ± 0.1 | 73.1 ± 0.3 | 83.6 ± 0.1 |
| Heart | BNN | | 74.7 ± 0.3 | |
| Heart | DNN | | | |
| Kidney | MLR | 86.7 ± 0.1 | 75.6 ± 0.2 | 86.3 ± 0.1 |
| Kidney | BNN | | 78.3 ± 0.4 | |
| Kidney | DNN | | | |
| Liver | MLR | 86.5 ± 0.2 | 75.6 ± 0.2 | 86.5 ± 0.1 |
| Liver | BNN | | 77.9 ± 0.6 | |
| Liver | DNN | | | |
| Testis | MLR | 85.6 ± 0.1 | 72.3 ± 0.4 | 85.2 ± 0.1 |
| Testis | BNN | | | |
| Testis | DNN | | | |
Notes: ± indicates 1 standard deviation; top performances are shown in bold.
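To make the AUC figures concrete, the sketch below computes one AUC per class (low, medium, high) from predicted class probabilities. It assumes a one-vs-rest reading of AUCLMH_All with hard class labels, which may not match the paper's exact definition; the functions `auc` and `one_vs_rest_auc` and the toy predictions are hypothetical.

```python
def auc(scores, labels):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(prob_rows, true_classes, n_classes=3):
    """One AUC per class, treating that class as positive and the rest as negative."""
    return [
        auc([row[c] for row in prob_rows], [t == c for t in true_classes])
        for c in range(n_classes)
    ]

# Toy predictions (made up): columns are P(low), P(medium), P(high).
prob_rows = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.3, 0.4, 0.3],
]
true_classes = [0, 0, 1, 2, 2]  # 0 = low, 1 = medium, 2 = high
aucs = one_vs_rest_auc(prob_rows, true_classes)
print(aucs)
```

The pairwise formulation is quadratic in the number of examples, which is fine for illustration; a rank-based computation would be the usual choice for real data.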
Comparison of the DNI code's performance in terms of the AUC for decrease versus increase (AUCDvI) and change versus no change (AUCChange)
| (a) AUCDvI | (b) AUCChange | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | Brain versus Heart | Brain versus Kidney | Brain versus Liver | Brain versus Testis | Heart versus Kidney | Heart versus Liver | Heart versus Testis | Kidney versus Liver | Kidney versus Testis | Liver versus Testis | Change versus No change |
| MLR | 50.3 ± 0.2 | 48.8 ± 0.8 | 48.3 ± 1.1 | 51.2 ± 0.5 | 50.0 ± 1.5 | 47.8 ± 1.7 | 51.1 ± 0.5 | 49.4 ± 0.8 | 51.9 ± 0.5 | 51.3 ± 0.6 | 74.7 ± 0.1 |
| BNN-MLR | 65.3 ± 0.3 | 73.7 ± 0.2 | 69.1 ± 0.4 | 72.9 ± 0.5 | 72.6 ± 0.3 | 66.7 ± 0.4 | 68.3 ± 0.7 | 54.7 ± 0.6 | 65.0 ± 0.8 | 65.0 ± 0.9 | 76.6 ± 0.8 |
| DNN-MLR | 77.9 ± 0.1 | 81.6 ± 0.1 | 82.4 ± 0.1 | 81.3 ± 0.1 | 82.4 ± 0.1 | 79.9 ± 0.2 | 79.1 ± 0.1 | 79.9 ± 0.8 | |||
| DNN | |||||||||||
Note: ± indicates 1 standard deviation; top performances are shown in bold.
Performance of the DNN evaluated on a different RNA-Seq experiment
| (a) AUCLMH_All | |||
|---|---|---|---|
| Tissue | Low | Medium | High |
| Brain | 88.1 ± 0.5 | 76.1 ± 1.0 | 87.0 ± 0.6 |
| Heart | 90.7 ± 0.5 | 78.4 ± 1.3 | 89.0 ± 1.0 |
Fig. 2. Plot of the change in AUCLMH_All when the values in each feature group are substituted by their median. Feature groups that are more important to the predictive performance of the model have lower values. The groups are sorted by the mean over multiple partitions and folds, with the standard deviations shown. The number of features in each feature group is indicated in brackets
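The median-substitution procedure in this caption can be sketched as follows. This is an illustration under assumptions: the model is a hypothetical hand-written scoring function rather than a trained DNN, a single binary AUC stands in for AUCLMH_All, and the group names and data are made up. The helper names `median_substituted` and `group_importance` are invented here.

```python
import statistics

def auc(scores, labels):
    """Pairwise AUC; ties between a positive and a negative count as half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def median_substituted(rows, group):
    """Copy of `rows` with every column in `group` replaced by that column's median."""
    medians = {j: statistics.median(r[j] for r in rows) for j in group}
    return [[medians.get(j, v) for j, v in enumerate(r)] for r in rows]

def group_importance(model, rows, labels, groups):
    """Change in AUC per feature group; more negative means more important."""
    base = auc([model(r) for r in rows], labels)
    return {
        name: auc([model(r) for r in median_substituted(rows, cols)], labels) - base
        for name, cols in groups.items()
    }

# Hypothetical model: depends strongly on column 0, weakly on column 1, not on column 2.
def toy_model(row):
    return 2.0 * row[0] + 0.1 * row[1]

rows = [
    [0.1, 0.9, 5.0],
    [0.2, 0.1, 3.0],
    [0.8, 0.5, 4.0],
    [0.9, 0.3, 1.0],
]
labels = [False, False, True, True]
groups = {"strong": [0], "weak": [1], "unused": [2]}
changes = group_importance(toy_model, rows, labels, groups)
print(sorted(changes.items(), key=lambda kv: kv[1]))
```

Replacing a group by its median removes the variation the model could use from those features while keeping the inputs in a plausible range, so a large drop in AUC signals that the model relies on that group.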
The top 25 features (unordered) of the splicing code that describes low and high percent inclusion
| Feature description | Low | High |
|---|---|---|
| Strength of the I1 acceptor site | ↓ | ↑ |
| Strength of the I2 donor site | ↓ | ↑ |
| Strength of the I1 donor site | ↑ | ↓ |
| Mean conservation score of first 100 bases in 3′ end of I1 | ↑↓ | ↑↓ |
| Mean conservation score of first 100 bases in 5′ end of I2 | ↑↓ | ↑↓ |
| Counts of Burge's exonic splicing silencer in A | ↓ | ↑ |
| Counts of Chasin's exonic splicing silencer in A | ↓ | ↑ |
| Log base 10 length of exon A | ↓ | ↑ |
| Log base 10 length ratio between A and I2 | ↓ | ↑ |
| Whether exon A introduces frame shift | ↑↓ | ↑↓ |
| Predicted nucleosome positioning in 3′ end of A | ↑↓ | ↑↓ |
| Frequency of AGG in exon A | ↑ | ↓ |
| Frequency of CAA in exon A | ↓ | ↑ |
| Frequency of CGA in exon A | ↓ | ↑ |
| Frequency of TAG in exon A | ↑ | ↓ |
| Frequency of TCG in exon A | ↓ | ↑ |
| Frequency of TTA in exon A | ↑ | ↓ |
| Translatability of C1-A | ↓ | ↑ |
| Translatability of C1-A–C2 | ↓ | ↑ |
| Translatability of C1–C2 | ↑ | ↓ |
| Counts of Yeo's ‘GTAAC’ motif cluster in 5′ end of I2 | ↓ | ↑ |
| Counts of Yeo's ‘TGAGT’ motif cluster in 5′ end of I2 | ↓ | ↑ |
| Counts of Yeo's ‘GTAGG’ motif cluster in 5′ end of I2 | ↓ | ↑ |
| Counts of Yeo's ‘GTGAG’ motif cluster in 5′ end of I2 | ↓ | ↑ |
| Counts of Yeo's ‘GTAAG’ motif cluster in 5′ end of I2 | ↓ | ↑ |
Note: The direction of the arrows indicates whether a feature's value should in general be increased (↑) or decreased (↓) to change the PSI predictions to low or high. Feature details can be found in Section 4 of the Supplementary Material.
Fig. 3. Magnitude of the backpropagated signal to the input of the top 50 features computed when the targets are changed from low to high, and high to low. White indicates that the magnitude of the signal is large, meaning that small perturbations to this input can cause large changes to the model's predictions. The features are approximately sorted left to right by the magnitude
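One way to read the backpropagated signal in this caption is as the gradient of the loss with respect to the inputs when the target is set to a chosen class. The sketch below computes that gradient by hand for a tiny network. It is an assumption-laden illustration: the single hidden layer, the cross-entropy loss, the hand-picked weights, and the names `forward`, `loss`, and `input_gradient` are hypothetical and do not reproduce the paper's model or its exact procedure.

```python
import math

def forward(x, w1, w2):
    """One sigmoid hidden layer followed by a softmax over low/medium/high."""
    h = [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x)))) for row in w1]
    z = [sum(w * v for w, v in zip(row, h)) for row in w2]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return h, [v / s for v in e]

def loss(x, target, w1, w2):
    """Cross-entropy loss when the desired class is `target`."""
    return -math.log(forward(x, w1, w2)[1][target])

def input_gradient(x, target, w1, w2):
    """Backpropagate the cross-entropy loss for `target` down to the inputs."""
    h, p = forward(x, w1, w2)
    # Softmax with cross-entropy: the output error is (probability - one-hot target).
    d_out = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    # Hidden error: output error pulled back through w2, times the sigmoid derivative.
    d_hid = [
        hj * (1.0 - hj) * sum(w2[k][j] * d_out[k] for k in range(len(w2)))
        for j, hj in enumerate(h)
    ]
    # Input gradient: hidden error pulled back through w1.
    return [sum(w1[j][i] * d_hid[j] for j in range(len(w1))) for i in range(len(x))]

# Hand-picked toy weights; the third input is deliberately disconnected.
w1 = [[1.5, -0.2, 0.0], [-1.0, 0.3, 0.0]]
w2 = [[1.0, -1.0], [0.0, 0.0], [-1.0, 1.0]]
x = [0.4, 0.7, 0.9]
HIGH = 2  # class order assumed: 0 = low, 1 = medium, 2 = high

grad = input_gradient(x, HIGH, w1, w2)
magnitudes = [abs(g) for g in grad]
ranking = sorted(range(len(x)), key=lambda i: -magnitudes[i])
print(ranking, magnitudes)
```

Inputs with a large gradient magnitude are those for which a small perturbation moves the prediction most, which is the property the figure visualises; the sign of each gradient component additionally indicates the direction of change.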