Tao Huang, Sibao Wan, Zhongping Xu, Yufang Zheng, Kai-Yan Feng, Hai-Peng Li, Xiangyin Kong, Yu-Dong Cai.
Abstract
Protein concentrations depend not only on mRNA levels but also on translation and degradation rates. Predicting an mRNA's translation rate would provide valuable information for an in-depth understanding of the translation mechanism and the dynamic proteome. In this study, we developed a new computational model to predict translation rates. The model is characterized by (1) integrating various sequence-derived and functional features, (2) applying the maximum relevance, minimum redundancy (mRMR) method and incremental feature selection (IFS) to select the features that optimize the prediction model, and (3) classifying the translation rate of an mRNA into a high or low translation rate category. Evaluated by jackknife cross-validation, the prediction accuracies under rich and starvation conditions were 68.8% and 70.0%, respectively. The following features were found to be correlated with translation rate: codon usage frequency, some gene ontology enrichment scores, the number of RNA-binding proteins known to bind the gene's mRNA product, coding sequence length, protein abundance, and 5'UTR free energy. These findings might provide useful information for understanding the mechanisms of translation and the dynamic proteome. Our translation rate prediction model might become a high-throughput tool for annotating the translation rates of mRNAs at large scale.
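The mRMR ranking step named in the abstract can be sketched as a greedy loop. This is a minimal illustration, not the authors' code: it assumes the features have already been discretized, estimates mutual information from counts, and uses the difference form (relevance minus mean redundancy, often called MID) as the selection criterion. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a,b) * log(p(a,b) / (p(a) * p(b))) over observed pairs.
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items()
    )


def mid_score(name, features, labels, selected):
    """Relevance to the class minus mean redundancy with already-selected features."""
    relevance = mutual_information(features[name], labels)
    if not selected:
        return relevance
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance - redundancy


def mrmr_rank(features, labels):
    """Greedy mRMR ordering of all features (dict: name -> discretized values)."""
    selected = []
    remaining = sorted(features)  # sorted so that ties break deterministically
    while remaining:
        best = max(remaining, key=lambda f: mid_score(f, features, labels, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example (hypothetical data): labels are A AND C; B is an exact copy of A.
labels = [0, 0, 0, 1]
features = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
print(mrmr_rank(features, labels))  # ['A', 'C', 'B']
```

All three toy features are equally relevant, but once A is chosen, its duplicate B pays a large redundancy penalty, so the independent feature C is ranked ahead of it. The resulting order is what an IFS step would then scan prefix by prefix.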
Year: 2011 PMID: 21253596 PMCID: PMC3017080 DOI: 10.1371/journal.pone.0016036
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The number of ORFs with low and high translation rates under rich and starvation conditions.
| Rich condition (rows) vs. starvation condition (columns) | Low translation rate | High translation rate | Total ORFs |
| Low translation rate | 1125 | 209 | 1334 |
| High translation rate | 209 | 1124 | 1333 |
| Total ORFs | 1334 | 1333 | 2667 |
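The table is a 2×2 cross-tabulation, so the fraction of ORFs that keep the same category under both conditions follows directly from its diagonal. This derived figure is not reported in the source; the snippet below only does the arithmetic, reading rows as the rich condition and columns as the starvation condition, as the table is laid out.

```python
# Counts from the table: keys are (rich-condition class, starvation-condition class).
counts = {
    ("low", "low"): 1125,
    ("low", "high"): 209,
    ("high", "low"): 209,
    ("high", "high"): 1124,
}

total = sum(counts.values())                                # all ORFs in the table
same = counts[("low", "low")] + counts[("high", "high")]    # same class in both conditions
switched = total - same                                     # class changes between conditions
agreement = same / total

# Check the row totals printed in the table.
assert counts[("low", "low")] + counts[("low", "high")] == 1334
assert counts[("high", "low")] + counts[("high", "high")] == 1333

print(f"{same}/{total} ORFs keep their class ({agreement:.1%}); {switched} switch")
# → 2249/2667 ORFs keep their class (84.3%); 418 switch
```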
Figure 1. The IFS curves of translation rate prediction under rich and starvation conditions.
The IFS curves for (A) the rich-condition prediction model, which achieved its peak accuracy of 68.8% with 37 features, and (B) the starvation-condition prediction model, which achieved its highest accuracy of 70.0% with 86 features.
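The IFS procedure behind these curves can be sketched as follows: features are added one at a time in their ranked order, each prefix is scored by jackknife (leave-one-out) accuracy, and the prefix length with the highest accuracy is kept. The excerpt does not name the classifier, so this sketch assumes a 1-nearest-neighbour rule with Euclidean distance; the function names and toy data are hypothetical.

```python
import math


def jackknife_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean distance)."""
    correct = 0
    for i, xi in enumerate(X):
        # Predict sample i from its nearest neighbour among all *other* samples.
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked):
    """Score the top-k ranked feature columns for k = 1..len(ranked).

    Returns (best_k, accuracies); ties go to the smaller feature set.
    """
    accuracies = []
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]
        X_k = [[row[c] for c in cols] for row in X]
        accuracies.append(jackknife_accuracy(X_k, y))
    best_k = max(range(len(accuracies)), key=lambda i: accuracies[i]) + 1
    return best_k, accuracies


# Toy data (hypothetical): column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 0.0]]
y = ["low", "low", "high", "high"]
print(incremental_feature_selection(X, y, ranked=[0, 1]))  # (1, [1.0, 0.0])
```

Plotting `accuracies` against k gives an IFS curve; in the figure above, the peaks fall at k = 37 (rich) and k = 86 (starvation).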
Features common to translation rate prediction under both rich and starvation conditions.
| Name | Feature Type | Point-Biserial Correlation (rich) | Point-Biserial Correlation (starvation) |
| ATA | Codon usage frequency | −0.3641809 | −0.320724134 |
| V123 | Amino acid composition | 0.217345654 | 0.249518281 |
| CGA | Codon usage frequency | −0.297473206 | −0.244839127 |
| TCC | Codon usage frequency | 0.251689274 | 0.234058044 |
| NoofRNABindingProteins | Other (Number of RNA-binding proteins known to bind its mRNA product) | 0.22353164 | 0.194339726 |
| GCT | Codon usage frequency | 0.279887045 | 0.266483213 |
| V126 | Amino acid composition | −0.180096048 | −0.149802124 |
| GGA | Codon usage frequency | −0.208428434 | −0.176300373 |
| cds.length | Other (Coding sequence length) | 0.097429773 | −0.03025402 |
| V72 | Polarity | 0.279590151 | 0.307614177 |
| CGG | Codon usage frequency | −0.189139955 | −0.147269889 |
| PA | Other (Protein abundance) | 0.141561548 | 0.120850079 |
| AGG | Codon usage frequency | −0.199042873 | −0.154301709 |
| CCA | Codon usage frequency | 0.282776605 | 0.283726919 |
| ACC | Codon usage frequency | 0.24618065 | 0.230897941 |
| TGC | Codon usage frequency | −0.220759013 | −0.173512017 |
| GO:0005737 | GO (GO:0005737_cytoplasm) | 0.242558032 | 0.206209243 |
| GCC | Codon usage frequency | 0.268835706 | 0.270918872 |
| GTA | Codon usage frequency | −0.212847373 | −0.20408338 |
| GO:0042277 | GO (GO:0042277_peptide binding) | 0.137845496 | 0.139232871 |
| CTT | Codon usage frequency | −0.203855194 | −0.190162108 |
| TCT | Codon usage frequency | 0.194907502 | 0.185575651 |
| TAT | Codon usage frequency | −0.188268811 | −0.173452245 |
| AAC | Codon usage frequency | 0.143590251 | 0.176587498 |
| GO:0006878 | GO (GO:0006878_cellular copper ion homeostasis) | 0.134957407 | 0.131972094 |
| V55 | Normalized van der Waals volume | −0.19022717 | −0.191407228 |
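The point-biserial coefficients in this table relate a continuous feature to the binary high/low label. One standard way to compute the coefficient is shown below: the difference between the two class means, divided by the population standard deviation of all values, scaled by √(pq), where p and q are the class proportions. This equals the Pearson correlation with the label coded as 0/1. Coding the high-rate class as 1 is an assumption here (the excerpt does not state the coding); under that convention, a negative value such as those for ATA or CGA would mean the feature tends to be larger in the low-rate class. The function name and toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def point_biserial(values, is_high):
    """Point-biserial correlation between a continuous feature and a binary class.

    values:  one feature value per ORF
    is_high: one boolean per ORF; True marks the class coded as 1 (assumed: high rate)
    """
    group1 = [v for v, h in zip(values, is_high) if h]
    group0 = [v for v, h in zip(values, is_high) if not h]
    p = len(group1) / len(values)   # proportion of samples in class 1
    q = 1.0 - p                     # proportion of samples in class 0
    s_n = pstdev(values)            # population (not sample) standard deviation
    return (mean(group1) - mean(group0)) / s_n * math.sqrt(p * q)


# Hypothetical toy feature: larger values in the class coded 1 give r > 0.
r = point_biserial([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
print(round(r, 4))  # 0.8944
```

Swapping which class is coded 1 flips the sign but not the magnitude, which is why the coding convention matters when reading the table.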
Figure 2. The number of features of each type in the optimal feature sets.
The number of features of each type in (A) the optimal 37-feature set for the rich condition and (B) the optimal 86-feature set for the starvation condition.