| Literature DB >> 25875337 |
Ophir Shalem1, Eilon Sharon1, Shai Lubliner1, Ifat Regev1, Maya Lotan-Pompan1, Zohar Yakhini2, Eran Segal1.
Abstract
The 3'end genomic region encodes a wide range of regulatory process including mRNA stability, 3' end processing and translation. Here, we systematically investigate the sequence determinants of 3' end mediated expression control by measuring the effect of 13,000 designed 3' end sequence variants on constitutive expression levels in yeast. By including a high resolution scanning mutagenesis of more than 200 native 3' end sequences in this designed set, we found that most mutations had only a mild effect on expression, and that the vast majority (~90%) of strongly effecting mutations localized to a single positive TA-rich element, similar to a previously described 3' end processing efficiency element, and resulted in up to ten-fold decrease in expression. Measurements of 3' UTR lengths revealed that these mutations result in mRNAs with aberrantly long 3'UTRs, confirming the role for this element in 3' end processing. Interestingly, we found that other sequence elements that were previously described in the literature to be part of the polyadenylation signal had a minor effect on expression. We further characterize the sequence specificities of the TA-rich element using additional synthetic 3' end sequences and show that its activity is sensitive to single base pair mutations and strongly depends on the A/T content of the surrounding sequences. Finally, using a computational model, we show that the strength of this element in native 3' end sequences can explain some of their measured expression variability (R = 0.41). Together, our results emphasize the importance of efficient 3' end processing for endogenous protein levels and contribute to an improved understanding of the sequence elements involved in this process.Entities:
Mesh:
Substances:
Year: 2015 PMID: 25875337 PMCID: PMC4398552 DOI: 10.1371/journal.pgen.1005147
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 2Scanning mutagenesis of native 3’ end sequences reveals critical elements required to maintain expression.
(A) Illustration of the two scanning mutagenesis strategies used, in the upper panel two 10bp mutation windows were designed with non-overlapping 10bp steps. In the lower panel 9bp mutation windows were designed with overlapping 3bp steps. (B) Profile of the effect of mutations as a function of location for two genes: CDC24 and YTA5. Y-axis shows the expression log2 fold change compared to the wild type sequence with each point representing a single 10bp mutation window centered around the corresponding x-axis value relative to the stop codon. The gray line connects the average of each pair of mutations. (C) Distribution of log2 fold ratio between mutated and wild type 3’ end sequences showing a highly skewed distribution towards negative values. (D) Distribution of absolute expression values (a.u.) for non-mutated native 3’ end sequences (dark red) and mutated 3’ end sequences (gray). For the mutated sequences, the mutation that resulted in the largest reduction in expression was chosen for each native sequence.
Fig 1Illustration of our method and overall expression distribution.
(A) 13,000 designed synthetic sequences were ligated into a low copy plasmid (top part). The plasmid pool was then transformed into yeast to create a heterogeneous pool of yeast cells each expressing YFP to a different level corresponding to one of the unique 13,000 cloned 3’ end sequences. The cells were then sorted using fluorescence activated sorting (FACS) into 16 expression bins by the YFP/mCherry ratio (middle). Next, the reporter 3’ end sequences of cells in each bin were amplified, using bar coded primers for each bin, and sequence barcodes was recovered using next-generation sequencing (NGS). Finally, each sequencing read was mapped to a specific 3’ end sequence and a specific bin (bottom) to achieve the distribution of cells with each synthetic 3’ end sequence across the expression bins. The distribution of each construct was fit to a gamma distribution and the mean expression value was inferred based on this fit. (B) The distribution of library expression values in induced and un-induced promoter states. The induced state displays a tri-modal distribution with 3 peaks corresponding to (1) non-induced promoter state (2) induced promoter state and low expressing 3’ end sequences and (3) induced promoter state with a wide range of 3’ end mediated expression.