Shweta Bhandare, Debra S Goldberg, Robin Dowell.
Abstract
BACKGROUND: The RNA binding proteins (RBPs) human antigen R (HuR) and Tristetraprolin (TTP) are known to exhibit competitive binding but have opposing effects on the bound messenger RNA (mRNA). How cells discriminate between the two proteins is an interesting problem. Machine learning approaches, such as support vector machines (SVMs), may be useful in the identification of discriminative features. However, this method has yet to be applied to studies of RNA binding protein motifs.
Year: 2017 PMID: 28333956 PMCID: PMC5363848 DOI: 10.1371/journal.pone.0174052
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of sequence length of HuR and TTP PAR-CLIP [3] clusters.
| RBP | Number of Sequences | Max Length | Min Length | Average Length |
|---|---|---|---|---|
| HuR | 3642 | 243 | 19 | 56 |
| TTP | 4626 | 172 | 21 | 26 |
Fig 1. Feature Augmentation Technique.
Original features and augmented features for HuR and TTP domains.
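The augmentation named in this caption can be illustrated with a minimal sketch. It assumes the common domain adaptation scheme in which every original feature is kept once as a shared feature and once more under a domain prefix; the function name `augment` and the `_` separator are hypothetical and not taken from the source.

```python
from typing import Dict


def augment(features: Dict[str, float], domain: str) -> Dict[str, float]:
    """Return a shared copy plus a domain-prefixed copy of every feature.

    Assumption: the shared copy keeps the original name, and the
    domain-specific copy is named "<domain>_<feature>".
    """
    augmented = dict(features)  # shared features, no prefix
    for name, value in features.items():
        augmented[f"{domain}_{name}"] = value  # domain-specific copy
    return augmented


# Example: k-mer counts from one HuR sequence (illustrative values only)
hur_example = augment({"UUUUUUUUU": 2.0, "UUUUAUUUU": 1.0}, "HuR")
```

Under this scheme a linear model can place weight on the unprefixed feature when a k-mer matters for both proteins, and on the prefixed copy when it matters for only one.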
Discriminative methods recover known k-mers for HuR.
Top feature k-mers from the k-spectrum kernel and over-represented k-mers from DREME are compared to the published k-mers [18]. For the k-spectrum kernel, determined feature weights are provided.
| Method | Success Rate | Sensitivity | PPV | ROC |
|---|---|---|---|---|
| k-spectrum kernel | 77.3 | 96.8 | 69.7 | 89.4 |
| DREME | 78.6 | 92.5 | 64.1 | 87.3 |
| Published k-mers | k-spectrum kernel (k-mer, weight) | DREME k-mers | | |
| UUUUUUU | UUUUUUUUU, 63.5 | UUUUUUU | | |
| UUUAUUU | UUUUAUUUU, 23.9 | UUUUUUA | | |
| UUUGUUU | UUUUUGUUU, 11.3 | AUUUUUU | | |
| UAUUUAU | UUUAUUUUU, 19.9 | UUUUUUA | | |
| AUUUUUA | AUUUUUUUU, 18.4 | UUUUUAA | | |
| AUUUAUU | UUUUUUUAA, 12.9 | AUUUUUA | | |
| AAUUUUUA | UUUUUAUUU, 10.5 | AUUUAU | | |
| AAUUUUA | AUUUUUAUUU, 7.49 | AUUUAA | | |
| AAUAUUU | AAUUUUUUU, 11.2 | | | |
| CUUUUUUUU | CUUUUUUUU, 15.19 | CUUUUUU | | |
| UCUCUUUU | UUUCUUUUU, 13.04 | UCUCUUUU | | |
| UUUCUUU | UUUUCUUUU, 20.8 | CUUUUUA | | |
| UUUCCUU | UUUUUCUUU, 18.6 | AUUUUCU | | |
| UUUUUUUUC | UUUUUUUUC, 15.9 | CUUUUAU | | |
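
The k-mer features weighted by the k-spectrum kernel can be sketched as counts of every overlapping substring of length k. The snippet below is a minimal illustration of that general idea, not the authors' implementation; the function names and the T-to-U normalization are assumptions made for the example.

```python
from collections import Counter


def k_spectrum(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in an RNA sequence."""
    # Assumption: input may be DNA-style; normalize to the RNA alphabet.
    seq = sequence.upper().replace("T", "U")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int) -> int:
    """Inner product of the two k-mer count vectors."""
    spec_a, spec_b = k_spectrum(a, k), k_spectrum(b, k)
    # Counter returns 0 for k-mers absent from spec_b.
    return sum(count * spec_b[kmer] for kmer, count in spec_a.items())


counts = k_spectrum("UUUAUUUA", 4)
similarity = spectrum_kernel("UUUAUUUA", "UUUA", 4)
```

With a linear SVM trained on such count vectors, each k-mer receives one weight, which is the kind of per-feature weight reported alongside the k-spectrum kernel k-mers.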
Fig 2. Position Frequency Matrix for the top motif identified by DREME for HuR.
Fig 3. Position Frequency Matrix for the top motif identified by DREME for TTP.
Discriminative methods recover known k-mers for TTP.
Top feature k-mers from the k-spectrum kernel and over-represented k-mers from DREME are compared to the published k-mers [4]. For the k-spectrum kernel, determined feature weights are provided.
| Method | Success Rate | Sensitivity | PPV | ROC |
|---|---|---|---|---|
| k-spectrum kernel | 88.8 | 95.5 | 84.1 | 95.6 |
| DREME | 91.4 | 91.57 | 77.09 | 93.4 |
| Published k-mers | k-spectrum kernel (k-mer, weight) | DREME k-mers | | |
| UUAUUUAUU | UUAUUUAUU, 6.44 | AUAAAUA | | |
| UUAUUUA | AUUUAUUUAUU, 5.4 | | | |
| UAUUUAUU | UAUUUAUUU, 5.34 | AUAUUUA | | |
| UUUA | UUUAUUUAU, 5.17 | AUAUUUU | | |
| AUUUA | AUUUAUUUA, 4.72 | AUAUUUA | | |
Feature engineering does not consistently improve model performance.
Performance (success rate, sensitivity, PPV and ROC) for models for HuR and TTP incorporating engineered features.
| RBP | Success Rate | Sensitivity | PPV | ROC |
|---|---|---|---|---|
| HuR | 82.4 | 86.4 | 79.7 | 89.7 |
| TTP | 85.2 | 91.2 | 81.5 | 92.6 |
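
The threshold-based metrics in these tables can be computed from confusion-matrix counts. The sketch below assumes that "success rate" means overall accuracy, which the source does not state explicitly; sensitivity and PPV follow their standard definitions. The function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Return success rate, sensitivity, and PPV as percentages."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts must give non-zero denominators")
    return {
        # Assumption: success rate = fraction of all predictions that are correct
        "success_rate": 100.0 * (tp + tn) / total,
        # Sensitivity = true positives / all actual positives
        "sensitivity": 100.0 * tp / (tp + fn),
        # PPV = true positives / all predicted positives
        "ppv": 100.0 * tp / (tp + fp),
    }


metrics = classification_metrics(tp=80, fp=20, tn=70, fn=30)
```

ROC is not derivable from a single confusion matrix, since it summarizes performance across all decision thresholds, so it is left out of this sketch.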
Engineered features obtain the top weights in the model.
Top ten features along with their weights discovered by the k-spectrum kernel method when feature engineering is incorporated.
| HuR (Feature, Weight) | TTP (Feature, Weight) |
|---|---|
| uCount, 1142.8 | auCount, 1488.15 |
| auCount, 956.9 | uCount, 1462.98 |
| uRepeated, 671.43 | uRepeated, 894.11 |
| UUUUUUUU, 46.69 | aCount, 337.70 |
| UUUAUUUU, 30.21 | UUAUUUAUU, 38.17 |
| AUUUUUUU, 19.88 | UAUUUAUUU, 32.69 |
| UUUUAUUU, 19.64 | AAUAUUUAU, 26.33 |
| UUUUUUUC, 17.99 | AUAUUUAUU, 25.83 |
| UUUUCUUU, 16.68 | UUUAUUUAU, 20.89 |
| UUAUUUUU, 15.63 | AUUUAUUUA, 20.22 |
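
The engineered feature names in this table (`uCount`, `auCount`, `uRepeated`, `aCount`) are not defined in this excerpt. The sketch below shows one plausible reading, purely as an assumption: nucleotide counts for `uCount` and `aCount`, occurrences of the dinucleotide AU for `auCount`, and the length of the longest run of U for `uRepeated`. The original definitions may differ.

```python
import re
from typing import Dict


def engineered_features(sequence: str) -> Dict[str, int]:
    """Compute illustrative composition features for one RNA sequence.

    All four definitions are assumptions made for this example.
    """
    seq = sequence.upper().replace("T", "U")
    u_runs = re.findall(r"U+", seq)  # maximal stretches of consecutive U
    return {
        "uCount": seq.count("U"),    # assumed: number of U nucleotides
        "aCount": seq.count("A"),    # assumed: number of A nucleotides
        "auCount": seq.count("AU"),  # assumed: number of AU dinucleotides
        "uRepeated": max((len(run) for run in u_runs), default=0),  # assumed: longest U run
    }


features = engineered_features("AUUUAUUUUA")
```

Because such counts grow with sequence length while individual k-mer counts stay small, features of this kind sit on a much larger scale than the k-mer features, which is worth keeping in mind when comparing raw weights.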
Four different test scenarios to identify motifs shared and specific to each RBP.
| Test | Positive | Negative |
|---|---|---|
| Scenario 1 | Only HuR | Only TTP + (HuR And TTP) |
| Scenario 2 | Only TTP | Only HuR + (HuR And TTP) |
| Scenario 3 | (HuR And TTP) | Only HuR + Only TTP |
| Scenario 4 | Only HuR | Only TTP |
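
The four scenarios above can be expressed as set operations. This sketch assumes each binding site has a hashable identifier and that "HuR And TTP" means identifiers present in both datasets; how overlap between clusters is actually determined is not specified in this excerpt, and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def build_scenarios(
    hur_ids: Set[str], ttp_ids: Set[str]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Map each scenario name to its (positive, negative) sets."""
    only_hur = hur_ids - ttp_ids   # bound by HuR only
    only_ttp = ttp_ids - hur_ids   # bound by TTP only
    common = hur_ids & ttp_ids     # assumed meaning of "HuR And TTP"
    return {
        "Scenario 1": (only_hur, only_ttp | common),
        "Scenario 2": (only_ttp, only_hur | common),
        "Scenario 3": (common, only_hur | only_ttp),
        "Scenario 4": (only_hur, only_ttp),
    }


scenarios = build_scenarios({"a", "b", "c"}, {"c", "d"})
```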
Discriminative methods evaluate distinct subsets of the HuR and TTP dataset to identify both shared and specific sequence features.
The performance of the k-spectrum kernel on distinct subsets of the data. Top eight k-mers from k-spectrum kernel (by weight) and DREME (by E-value). The test scenarios correspond to Table 6.
| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| | Only HuR vs (Only TTP and Common) | Only TTP vs (Only HuR and Common) | Common vs (HuR and TTP specific) | HuR-specific vs TTP-specific |
| Positive Sequences | 3206 | 4161 | 467 | 3206 |
| Negative Sequences | 5667 | 4712 | 7367 | 4161 |
| Balanced Success Rate | 78.1 | 72.8 | 62.9 | 80.7 |
| ROC | 85.2 | 80.5 | 68.7 | 88.1 |
| UUUUUU | UAAUAUUUA | UUUUAUUUAA | UUUUGUUUU | |
| AAUAUUUAU | UUUAUUUAA | |||
| UUUUUUUU | CUAUUUAUU | UUUAUUUA | ||
| AUUUUUUUUU | ||||
| UUUUGUUUU | UUUUAUAUU | UUUUUUUUUU | ||
| UUUUUUUU | UAUUUUUUUUA | |||
| AUUUAUUUU | AUUUUUU | |||
| AUAAAUAU | UAUUAUUUU | |||
| DREME Motifs | ||||
| UAUUUAU | UUUUAUUU | |||
| UUUUCUU | AUUUAUU | UAUUUAUAA | UUUUUGU | |
| AUUUAUA | UAUUUAUUC | AUUUUUU | ||
| UCUUUUU | AUAAAUU | UAUUUAUUA | ||
| AUUUUUU | AUAUAUA | UAUUUAUAU | UUCUUUU | |
| AAUUUUU | AUUAAUA | UUUUUUUUUU | UUGUUUU | |
| CUCUUUU | UUUAUAA | |||
| AACUUUU | UUUUUCUUUU | AACUUUU |
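
The positive and negative sets in these scenarios differ widely in size, which is why a balanced measure is reported. The sketch below assumes the usual definition of balanced success rate as the mean of sensitivity and specificity; the source does not give the formula, and the function name is hypothetical.

```python
def balanced_success_rate(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity, as a percentage.

    Assumption: this is the definition behind the table's
    "Balanced Success Rate" row.
    """
    if (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("both classes must contain at least one example")
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    return 100.0 * (sensitivity + specificity) / 2.0


# Illustrative counts only, not taken from the table
rate = balanced_success_rate(tp=75, fn=25, tn=25, fp=75)
```

Unlike plain accuracy, this value stays at 50 for a classifier that always predicts the majority class, regardless of how unbalanced the two sets are.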
Fig 4. Analysis of Feature k-mers.
(A) Number of occurrences of HuR specific feature k-mers in each dataset. (B) Number of occurrences of TTP specific k-mers in each dataset. (C) Number of occurrences of common k-mers in each dataset.
Domain adaptation and multi-task learning identify domain-specific (prefaced with HuR or TTP) and shared (no prefix) k-mers.
Performance metrics and top twenty k-mers are compared for the HuR, TTP and combined model (see main text for description of models).
| Test Data/Model | HuR Model ( | TTP Model ( | Combined Model ( |
|---|---|---|---|
| HuR | 89.4 | 87 | 89.4 |
| TTP | 91.2 | 95.6 | 94.5 |
| Combined | NA | NA | 92.5 |
| Features | |||
| UUUUUUUUU | UAUUUAUUU | HuR | |
| UUUUAUUUU | UUAUUUAUU | ||
| UUUUCUUUU | UUUAUUUAU | UUUUUUUUU | |
| UUUAUUUUU | AAUAUUUAU | UUUUAUUUU | |
| AUUUUUUUU | AUUAUUAUU | UAUUUAUUU | |
| UUUUUCUUU | AUAUUUAUU | UUAUUUAUU | |
| UUUUUAUUU | CUAUUUAUU | UUUAUUUUU | |
| UUUUUUUUC | AUUUAUUUA | UUUUUAUUU | |
| CUUUUUUUU | AUUAUUUAU | ||
| UUUCUUUUU | UAUUUUUAU | AUUUUUUUU | |
| UUUUUUUCU | UAAUAUUUA | ||
| UUUUUUUUA | UAUUAUUAU | ||
| UUUUUUCUU | AUUUUUAUU | ||
| UUAUUUUUU | UCUAUUUAU | ||
| UUUUUUUAA | UUAUUAUUA | UUUUUUUUC | |
| UAUUUUUUU | UUAUUAUUU | UUAUUUUUU | |
| UUCUUUUUU | UUUAUUAUU | AUUUUAUUU | |
| AUUUUUAUU | UUUUAUUUA | ||
| UUUUGUUUU | UUUUUAUUU | UUUUUUUUA | |
| UCUUUUUUU | AUUUAUUUU | CUUUUUUUU | |
| UUAUUUUUA | AUUUUAUUU |