Sergey A Evfratov, Ilya A Osterman, Ekaterina S Komarova, Alexandra M Pogorelskaya, Maria P Rubtsova, Timofei S Zatsepin, Tatiana A Semashko, Elena S Kostryukova, Andrey A Mironov, Evgeny Burnaev, Ekaterina Krymova, Mikhail S Gelfand, Vadim M Govorun, Alexey A Bogdanov, Petr V Sergiev, Olga A Dontsova.
Abstract
The yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on translation yield. However, a detailed understanding of how the mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding the CER fluorescent protein preceded by randomized 5΄ untranslated regions (5΄-UTRs), together with red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells; the cells were separated by the efficiency of CER mRNA translation using a cell sorter, and the sorted pools were subjected to next-generation sequencing. We tested the translation efficiency of the CER gene preceded by each of 48 natural 5΄-UTR sequences, and we introduced random and designed mutations into natural and artificially selected 5΄-UTRs. Several distinct properties could be ascribed to the group of 5΄-UTRs most efficient in translation. In addition to known features, several previously unrecognized features that contribute to translation enhancement were found, such as a low proportion of cytidine residues, multiple Shine–Dalgarno (SD) sequences and AG repeats. AG repeats could be identified as a translation enhancer in several natural 5΄-UTRs, albeit a less efficient one than the SD sequence.
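Two of the sequence features named above can be computed directly from a 5΄-UTR string. The minimal Python sketch below, with hypothetical function names, measures the cytidine fraction and the longest uninterrupted run of AG dinucleotides; it only illustrates the features and is not the authors' analysis pipeline.

```python
import re


def cytidine_fraction(utr: str) -> float:
    """Fraction of positions in the 5'-UTR that are C (0.0 for an empty string)."""
    utr = utr.upper()
    return utr.count("C") / len(utr) if utr else 0.0


def longest_ag_repeat(utr: str) -> int:
    """Length, in AG units, of the longest uninterrupted run of 'AG' dinucleotides."""
    # re.findall returns non-overlapping, leftmost runs such as 'AG', 'AGAG', ...
    runs = re.findall(r"(?:AG)+", utr.upper())
    return max((len(run) // 2 for run in runs), default=0)
```

Applied to the ‘AG-rich 5΄-UTR’ sequence GGAGUCUAAAGAGAGAGAGAGU from the tables in this record, `longest_ag_repeat` returns 6 and `cytidine_fraction` returns 1/22 (about 0.045); for the weakly translated 2/7 sequence CACACAACACCUGAUCAACU the cytidine fraction is 0.4 and there is no AG repeat.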
Year: 2017 PMID: 27899632 PMCID: PMC5389652 DOI: 10.1093/nar/gkw1141
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The principal scheme of the Flowseq experiment. The steps shown are library construction, transformation, sorting and sequencing. (A) Cloning of the randomized DNA fragment into the pRFPCER reporter vector in front of the CER gene; the RFP gene retains its constant 5΄-UTR. (B) Electroporation of the entire plasmid library into E. coli cells. (C) Separation of cells on the basis of CER/RFP fluorescence by a cell sorter. (D) Collection of cell pools (F1–F8) according to the CER/RFP fluorescence ratio. (E) DNA extraction and amplification of the randomized region, followed by next-generation sequencing.
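Once sequencing reads have been assigned to fractions F1–F8, each 5΄-UTR variant can be summarized by where its cells fell. The sketch below computes a read-weighted mean fraction index, a common way to summarize Flow-seq data; the rescaling by the number of cells sorted into each fraction, and all names used, are assumptions rather than the procedure reported for this study.

```python
from typing import Sequence


def mean_fraction_index(
    variant_reads: Sequence[int],
    fraction_total_reads: Sequence[int],
    fraction_cells: Sequence[int],
) -> float:
    """Estimate a mean fraction index (1 = F1 ... 8 = F8) for one 5'-UTR variant.

    Assumption: the variant's reads in each fraction are rescaled by the number of
    cells sorted into that fraction, so deeply sequenced fractions do not dominate.
    """
    if not (len(variant_reads) == len(fraction_total_reads) == len(fraction_cells)):
        raise ValueError("all inputs must have one entry per fraction")
    # Estimated number of cells carrying this variant in each fraction.
    est_cells = [
        reads / total * cells if total else 0.0
        for reads, total, cells in zip(variant_reads, fraction_total_reads, fraction_cells)
    ]
    weight = sum(est_cells)
    if weight == 0:
        raise ValueError("variant has no reads in any fraction")
    return sum(i * w for i, w in enumerate(est_cells, start=1)) / weight
```

With equal sequencing depth and cell counts in every fraction, a variant with 10 reads in F7 and 30 in F8 gets an index of 7.75.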
Figure 2. Distribution of CER and RFP fluorescence intensities for cells transformed with reporter construct libraries carrying 20 nt randomized fragments in the CER 5΄-UTR. Each dot corresponds to a single event. CER fluorescence intensity increases along the X-axis; RFP intensity, used as a control, increases along the Y-axis. An overlay of sorting results for cells transformed with the control 4/7 construct is shown in grey. The positions of all collected fractions, which differ by CER/RFP ratio, are shown. Supplementary Figure S3 contains the corresponding data on the construction and sorting of a library with randomized 30 nt fragments in the 5΄-UTRs.
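Assigning a sorted cell to a fraction amounts to binning its CER/RFP ratio. In the minimal sketch below, the gate edges on log10(CER/RFP) are hypothetical placeholders (the actual gates appear only graphically in this figure), and the gates are assumed to be contiguous, which real sorting gates need not be.

```python
import bisect
import math
from typing import Sequence

# Hypothetical gate edges on log10(CER/RFP): 7 edges define 8 contiguous fractions F1..F8.
GATE_EDGES = (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)


def assign_fraction(cer: float, rfp: float, edges: Sequence[float] = GATE_EDGES) -> int:
    """Return the fraction number (1 = F1, lowest CER/RFP ratio) for one cell."""
    if cer <= 0 or rfp <= 0:
        raise ValueError("fluorescence intensities must be positive")
    log_ratio = math.log10(cer / rfp)
    # bisect_right counts how many edges are <= log_ratio.
    return bisect.bisect_right(edges, log_ratio) + 1
```

Under these placeholder edges, a cell with equal CER and RFP signals (log ratio 0) lands in F5.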
Figure 3. Influence of the nucleotide sequence of 20 nt fragments in 5΄-UTRs on translation efficiency. Panels F1–F8 correspond to fractions sorted by translation efficiency, from the least efficient (F1) to the most efficient (F8). Each bar represents the proportion of nucleotides at the corresponding position in the 5΄-UTR relative to the AUG start codon, as shown below the graphs. Nucleotides are color-coded as shown in the legend at the upper right of the figure: G, red; A, yellow; U, green; C, blue. Supplementary Figure S5 contains the corresponding positional nucleotide composition of 5΄-UTRs with randomized 30 nt fragments.
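The per-position proportions plotted in each panel can be computed from the sequenced fragments of one fraction. This sketch assumes the fragments are already extracted and share the same length, so that each index corresponds to a fixed position relative to the AUG start codon; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def positional_composition(seqs: Sequence[str]) -> List[Dict[str, float]]:
    """For each position, the proportion of G, A, U and C across equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have the same length")
    profile = []
    for pos in range(length):
        # Normalize case and DNA 'T' to RNA 'U' before counting.
        counts = Counter(s[pos].upper().replace("T", "U") for s in seqs)
        profile.append({nt: counts[nt] / len(seqs) for nt in "GAUC"})
    return profile
```

Running it once per fraction (F1 to F8) gives one stacked-bar panel each.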
Figure 4. Influence of secondary structure on translation efficiency. (A) Distribution of folding energy for 5΄-UTRs with 20 nt randomized fragments of varying translation efficiency. Each graph corresponds to a fraction, from the highest (F8) to the lowest (F1) translation efficiency, as marked below the graph. The 25th–75th percentile range is boxed and the median is shown as a horizontal line; outliers below Q1 − 1.5 × interquartile range (IQR) or above Q3 + 1.5 × IQR are shown as circles. (B) The plfold-score of forming a base pair in the 3΄ direction of a given nucleotide for 5΄-UTRs with 20 nt randomized fragments of varying translation efficiency. Each graph corresponds to a fraction, from the highest (F8) to the lowest (F1) translation efficiency. Positions relative to the mRNA sequence are shown above the graphs; a sketch of the 5΄-UTR sequence is shown below them. Lines correspond to individual mRNAs grouped by the position of the maximal plfold-score. Color represents the plfold-score of forming a base pair in the 3΄ direction, from black (pairing is unlikely) to red (pairing is highly probable). Supplementary Figure S6 contains the corresponding data for 5΄-UTRs with randomized 30 nt fragments, as well as the plfold-scores of forming a base pair in the 5΄ direction of a given nucleotide for both libraries.
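The outlier rule in panel (A) is the standard box-plot whisker rule. A minimal sketch applying it to a list of folding energies follows; computing the energies themselves (for example with an RNA folding program) is outside its scope, and the choice of linearly interpolated quartiles (`method="inclusive"`) is an assumption about how the plot was drawn.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float]) -> List[float]:
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, in input order."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

For the made-up energies -10, -9, ..., -2 and 30 (kcal/mol), Q1 = -7.75 and Q3 = -3.25, so only 30 falls outside the whiskers.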
Figure 5. Frequency and distribution of SD-like subsequences in 5΄-UTRs with randomized 20 nt fragments differing in mRNA translation efficiency. (A) Distribution of maximal SD position-specific score matrix (PSSM) scores (X-axis) in 5΄-UTR pools with varying translation efficiency. Plot colors reflect translation efficiency, from the most efficient (green) to the least efficient (red). Bar height corresponds to the frequency of motifs with the indicated similarity to the SD PSSM. (B) Positional distribution of the SD PSSM score for 5΄-UTR pools differing in translation efficiency. Positions relative to the mRNA sequence are shown above the graphs; a sketch of the 5΄-UTR sequence is shown below them. Lines correspond to individual mRNAs grouped by the position of the maximal PSSM score. Color indicates the similarity to the SD sequence of the subsequence starting at a given nucleotide (Supplementary Figure S7B). Translation efficiency increases from the least efficient (F1) to the most efficient (F8) fraction. Supplementary Figure S7 contains the same data for 5΄-UTRs with randomized 30 nt fragments.
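Scanning a 5΄-UTR with a PSSM and keeping the best-scoring window yields the maximal score and its position, the two quantities plotted in panels (A) and (B). The actual SD PSSM is given in Supplementary Figure S7B and is not reproduced here; the sketch below substitutes a toy log-odds matrix built from the consensus AGGAGG with assumed match and mismatch probabilities.

```python
import math
from typing import Dict, List, Tuple

# Toy assumptions for illustration only, not the matrix used in the study.
CONSENSUS = "AGGAGG"
MATCH_P, MISMATCH_P, BACKGROUND_P = 0.7, 0.1, 0.25  # 0.7 + 3 * 0.1 = 1.0 per column


def make_toy_pssm(consensus: str = CONSENSUS) -> List[Dict[str, float]]:
    """Log2-odds PSSM with one column per consensus position."""
    return [
        {nt: math.log2((MATCH_P if nt == c else MISMATCH_P) / BACKGROUND_P) for nt in "ACGU"}
        for c in consensus
    ]


def best_sd_hit(utr: str, pssm: List[Dict[str, float]]) -> Tuple[float, int]:
    """Return (maximal PSSM score, 0-based start of that window); the first window wins ties."""
    utr = utr.upper().replace("T", "U")
    width = len(pssm)
    best_score, best_start = -math.inf, -1
    for start in range(len(utr) - width + 1):
        window = utr[start:start + width]
        score = sum(column[nt] for column, nt in zip(pssm, window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

A sequence shorter than the motif returns `(-inf, -1)`, signalling that no window could be scored.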
Figure 6. Histogram of differences between true and predicted integer class labels (fraction numbers) for multiclass classification by logistic regression.
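The histogram tallies, for each 5΄-UTR, the difference between its true fraction number and the fraction number predicted by the classifier. The logistic regression model itself is not shown; the sketch below assumes the labels are already available and takes the difference as true minus predicted, a sign convention the caption does not specify. The function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def label_difference_histogram(true_labels: Sequence[int], predicted_labels: Sequence[int]) -> Counter:
    """Count each value of (true fraction number - predicted fraction number)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have equal length")
    return Counter(t - p for t, p in zip(true_labels, predicted_labels))


def within_k(hist: Counter, k: int = 1) -> float:
    """Share of predictions that miss the true fraction by at most k classes."""
    total = sum(hist.values())
    return sum(n for diff, n in hist.items() if abs(diff) <= k) / total
```

A histogram concentrated near zero means most misclassifications land in adjacent fractions, which `within_k` summarizes as a single number.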
Translation efficiency of ‘A-rich 5΄-UTR’ mRNA and its mutant forms
| Name | 5΄-UTR sequence | Relative translation efficiency* | Predicted translation efficiency# |
|---|---|---|---|
| Control | GGAGAAGGAGAUAUCAU | 1 | 1 |
| A-rich 5΄-UTR | GGGAUUUAAAAAAAGGCGGAAAAAUAAUGCAU | 6.02 | 0.68 |
| A-rich 5΄-UTR noAUG | GGGAUUUAAAAAAAGGCGGAAAAAUAAU | 2.59 | 0.82 |
| A-rich 5΄-UTR -2G | GGGAUUUAAAAAAAGGC | 4.50 | 0.68 |
| A-rich 5΄-UTR -5G | GGGAUUUAAAAAAA | 0.38 | 1.57 |
| 2/7 | CACACAACACCUGAUCAACU | 0.03 | 0.14 |
*CER/RFP fluorescence ratio, where the CER mRNA carried the 5΄-UTR shown and the RFP mRNA carried the ‘Control’ 5΄-UTR.
#Relative translation efficiencies predicted by the RBS Calculator on the basis of known mRNA features affecting translation.
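Assuming that the ‘Relative translation efficiency’ column is the CER/RFP fluorescence ratio of each construct divided by that of the ‘Control’ construct (which is why Control equals 1), the normalization reduces to a few lines. Background (autofluorescence) subtraction, which a real analysis may need, is omitted, and the function name is hypothetical.

```python
def relative_efficiency(cer: float, rfp: float, cer_control: float, rfp_control: float) -> float:
    """CER/RFP ratio of a construct divided by the CER/RFP ratio of the Control construct."""
    if rfp <= 0 or cer_control <= 0 or rfp_control <= 0:
        raise ValueError("denominator fluorescence values must be positive")
    return (cer / rfp) / (cer_control / rfp_control)
```

With made-up intensities, a construct whose CER signal is six times its RFP signal, measured alongside a Control with equal signals, scores 6.0.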
Translation efficiency of ‘Short SD 5΄-UTR’ mRNA and its mutant forms
| Name | 5΄-UTR sequence | Relative translation efficiency* | Predicted translation efficiency# |
|---|---|---|---|
| Control | GGAGAAGGAGAUAUCAU | 1 | 1 |
| Short SD 5΄-UTR | GGUUUCUUAUUUGGUUCGGAGUGAGAUGCGAU | 4.31 | 0.68 |
| Short SD 5΄-UTR -SD | GGUUUCUUAUUUGGUUC | 0.076 | 0.13 |
| Short SD 5΄-UTR -9UA | GG | 0.073 | 1.18 |
| Short SD 5΄-UTR -3U | GGUUUCUUA | 0.51 | 0.65 |
| Short SD 5΄-UTR -3΄ | GGUUUCUUAUUUGGUUCGGAGU | 2.35 | 0.18 |
| Short SD 5΄-UTR +SD | GGUUUCUUAUUUGGUUC | 0.41 | 0.92 |
| Short SD 5΄-UTR -GUC | GGUUUCUUAUUUGGUUCGGAGU | 3.21 | 0.11 |
| Short SD 5΄-UTR -GCG | GGUUUCUUAUUUGGUUCGGAG | 4.32 | 0.56 |
| 4/13 | CACCCGGAGCAACAACAACU | 0.06 | 0.04 |
| 4/7 | CACACAACACCGGAGCAACU | 0.8 | 0.22 |
*CER/RFP fluorescence ratio, where the CER mRNA carried the 5΄-UTR shown and the RFP mRNA carried the ‘Control’ 5΄-UTR.
#Relative translation efficiencies predicted by the RBS Calculator on the basis of known mRNA features affecting translation.
Translation efficiency of ‘AG-rich 5΄-UTR’ mRNA and its mutant forms
| Name | 5΄-UTR sequence | Relative translation efficiency* | Predicted translation efficiency# |
|---|---|---|---|
| Control | GGAGAAGGAGAUAUCAU | 1 | 1 |
| AG-rich 5΄-UTR | GGAGUCUAAAGAGAGAGAGAGU | 5.09 | 2.05 |
| AG-rich 5΄-UTR -AG1-3 | GGAGUCUAAA | 0.15 | 0.13 |
| AG-rich 5΄-UTR -AG4-6 | GGAGUCUAAAGAGAGA | 1.40 | 0.58 |
| AG-rich 5΄-UTR +SD1 | GGAGUCUAAAGAGAG | 1.52 | 4.18 |
| AG-rich 5΄-UTR +SD2 | GGAGUCUAAAG | 1.56 | 12.37 |
| AG-rich 5΄-UTR -G | GGAGUCUAAAGAGA | 1.14 | 0.31 |
| 4/16 | CCGGAGCACACACAACAACU | 0.02 | 0.04 |
*CER/RFP fluorescence ratio, where the CER mRNA carried the 5΄-UTR shown and the RFP mRNA carried the ‘Control’ 5΄-UTR.
#Relative translation efficiencies predicted by the RBS Calculator on the basis of known mRNA features affecting translation.
Translation efficiency of natural AG-rich 5΄-UTR mRNAs and their mutant forms
| Name | 5΄-UTR sequence | Relative translation efficiency* | Predicted translation efficiency# |
|---|---|---|---|
| Control | GGAGAAGGAGAUAUCAU | 1 | 1 |
| lon | TTAAACTAAGAGAGAGCTCT | 0.25 | 0.36 |
| lon-AG | TTAAACTAA | 0.03 | 0.06 |
| lon+SD | TTAAACTAAG | 1.84 | 3.14 |
| lrp | AATACAGAGAGACAATAATT | 1.04 | 0.67 |
| lrp-AG | AATACA | 0.13 | 0.12 |
| lrp+SD | AATACAG | 5.66 | 1.91 |
| ybaB | CGTGATTGAGAGAGAAACCT | 0.8 | 2.86 |
| ybaB-AG | CGTGATT | 0.04 | 0.32 |
| ybaB+SD | CGTGATTG | 4.84 | 9.31 |
| rplY | TTAAACTAAGAGAGAGCTCT | 0.19 | 0.36 |
| rplY-AG | TTAAACTAA | 0.03 | 0.06 |
| rplY+SD | TTAAACTAAG | 1.81 | 3.14 |
*CER/RFP fluorescence ratio, where the CER mRNA carried the 5΄-UTR shown and the RFP mRNA carried the ‘Control’ 5΄-UTR.
#Relative translation efficiencies predicted by the RBS Calculator on the basis of known mRNA features affecting translation.
Figure 7. Dependence of the CER/RFP translation efficiency on the preceding natural 5΄-UTR sequence, normalized to that of the control RFP mRNA.
Figure 8. Translation efficiency of CER mRNAs with 5΄-UTRs corresponding to single and double mutants of the eno 5΄-UTR. The axes correspond to nucleotides of the eno 5΄-UTR, shown in large letters. The coordinates of each square correspond to the positions of the first and second mutations in the eno 5΄-UTR; nucleotide variants are shown in small letters. Each square represents a 5΄-UTR variant, with translation efficiency shown by color from green (the most efficient translation) to red (the least efficient). White denotes the absence of a 5΄-UTR variant from the data set. Two overlapping SD sequences are boxed. The color code is shown below the plot.
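The grid in Figure 8 covers single and double substitutions of the eno 5΄-UTR. Enumerating every such variant can be sketched as below; the eno 5΄-UTR sequence is not given in this record, so the usage example runs on a short placeholder sequence, and the function name is hypothetical.

```python
from itertools import combinations, product
from typing import List

NUCLEOTIDES = "ACGU"


def single_and_double_mutants(seq: str) -> List[str]:
    """All variants of seq carrying exactly one or exactly two nucleotide substitutions."""
    variants = []
    # Single mutants: 3 alternative nucleotides at each position.
    for i, original in enumerate(seq):
        for nt in NUCLEOTIDES:
            if nt != original:
                variants.append(seq[:i] + nt + seq[i + 1:])
    # Double mutants: 3 x 3 alternatives for every unordered pair of positions.
    for i, j in combinations(range(len(seq)), 2):
        for a, b in product(NUCLEOTIDES, repeat=2):
            if a != seq[i] and b != seq[j]:
                mutant = list(seq)
                mutant[i], mutant[j] = a, b
                variants.append("".join(mutant))
    return variants
```

For a sequence of length L this yields 3L single and 9·L(L − 1)/2 double mutants; the 4-nt placeholder AGGA gives 12 + 54 = 66 variants, each of which would occupy one square (or one diagonal cell, for single mutants) of such a grid.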