
A quantitative model for the rate-limiting process of UGA alternative assignments to stop and selenocysteine codons.

Yen-Fu Chen^1, Hsiu-Chuan Lin^1,2, Kai-Neng Chuang^1,2, Chih-Hsu Lin^3, Hsueh-Chi S Yen^1,2, Chen-Hsiang Yeang^2,3.

Abstract

Ambiguity in genetic codes exists in cases where certain stop codons are alternatively used to encode non-canonical amino acids. In selenoprotein transcripts, the UGA codon may either represent a translation termination signal or a selenocysteine (Sec) codon. Translating UGA to Sec requires selenium and specialized Sec incorporation machinery such as the interaction between the SECIS element and SBP2 protein, but how these factors quantitatively affect alternative assignments of UGA has not been fully investigated. We developed a model simulating the UGA decoding process. Our model is based on the following assumptions: (1) charged Sec-specific tRNAs (Sec-tRNASec) and release factors compete for a UGA site, (2) Sec-tRNASec abundance is limited by the concentrations of selenium and Sec-specific tRNA (tRNASec) precursors, and (3) all synthesis reactions follow first-order kinetics. We demonstrated that this model captured two prominent characteristics observed from experimental data. First, UGA to Sec decoding increases with elevated selenium availability, but saturates under high selenium supply. Second, the efficiency of Sec incorporation is reduced with increasing selenoprotein synthesis. We measured the expressions of four selenoprotein constructs and estimated their model parameters. Their inferred Sec incorporation efficiencies did not correlate well with their SECIS-SBP2 binding affinities, suggesting the existence of additional factors determining the hierarchy of selenoprotein synthesis under selenium deficiency. This model provides a framework to systematically study the interplay of factors affecting the dual definitions of a genetic codon.


Year: 2017 PMID: 28178267 PMCID: PMC5323020 DOI: 10.1371/journal.pcbi.1005367

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

Stop codons can be reassigned to encode amino acids [1, 2]. Failures in stop codon reassignment lead to the production of prematurely terminated proteins [3, 4], but how cellular factors influence alternative definitions of stop codons is not fully understood. While some stop codon reassignments are confined to certain species or organelles, redefinition of UGA to selenocysteine (Sec) in selenoprotein synthesis occurs in all three domains of life [5]. Selenoproteins are proteins that contain the Sec amino acid residue. Translating UGA to Sec requires Sec-tRNASec (Sec-specific tRNA charged with Sec), the Sec insertion sequence (SECIS) element at the 3’ untranslated region (3’UTR) of selenoprotein mRNAs [4, 6, 7], and other regulatory factors such as SBP2 [8-10] and EFSec [11, 12]. Failed UGA to Sec decoding results in translation termination, with UGA being recognized by a release factor (RF) instead. RFs trigger the hydrolysis of ester bonds in peptidyl-tRNA and corresponding release of translated proteins from the ribosome [13, 14]. Translating UGA to Sec is inefficient [15-17] and influenced by the abundance of selenoprotein mRNA, Sec-tRNASec, selenium, SBP2 and the intrinsic properties of SECIS elements [8, 17–21]. Overexpression of selenoprotein mRNA reduces UGA-to-Sec decoding [18, 22], but this effect could be rescued by co-expression of uncharged Sec-specific tRNA (tRNASec) [18, 22] or SBP2 [8]. The efficiency of Sec incorporation has been shown to be positively correlated with tRNASec or selenium supply in cells [20] yet differs among selenoproteins [23]. There are at least 25 selenoproteins in the human proteome [24] and their difference in Sec incorporation efficiency leads to a “selenoprotein hierarchy” under selenium deficiency [23]: proteins with higher Sec incorporation efficiency exploit more Sec-tRNASec and are more rapidly synthesized.
It is well known that hierarchical selenoprotein expression depends on the SECIS-SBP2 interaction [8], but whether this interaction is the sole determinant for selenoprotein hierarchy remains unclear. Despite the extensive studies of selenoprotein translation cited above, a systematic and quantitative characterization of the joint effects of various regulatory factors has not yet been reported. To fill this gap, we developed a simple mechanistic model that captures the quantitative characteristics of the UGA translation process and applied this model to experimental data to investigate how various regulatory factors influence the definition of UGA. We utilized differential protein half-lives from full-length and truncated selenoproteins, retrieved from a single-cell-based global protein stability (GPS) assay [25], to infer UGA definitions under cell culture conditions with variations in selenium supply and selenoprotein expression levels, and used those inferred quantities to estimate the model parameters. We found that the qualitative behavior of selenoprotein translation derived from our model closely resembles that from experimental data. Moreover, we recapitulated the selenoprotein hierarchy by measuring and comparing the stability of proteins expressed from constructs with SECIS elements of four distinct selenoproteins. The estimated Sec incorporation rates are incongruent with the reported SECIS-SBP2 binding equilibrium constants, suggesting the existence of additional factors to explain selenoprotein hierarchy. Our model provides a framework to quantitatively study the regulation of UGA codon redefinition and selenoprotein synthesis.

Results

Experimental results

Inferring UGA definition using differential protein half-lives between full-length and UGA-terminated selenoproteins

The UGA codon defines either Sec or translation termination during selenoprotein synthesis. Decoding UGA to Sec or translation termination results in the expression of full-length (PL) or truncated (PS) selenoproteins, respectively (Fig 1). UGA assignment can therefore be inferred from the ratio of PL to PS. We previously found that PL is more stable than PS, and that the protein stability of neither species is affected by selenium supply [3]. Thus, our deduction of UGA definition can be taken from the observed protein half-life of the total selenoprotein population (PT), which is represented by the mixture of PL and PS. Intuitively, when a higher proportion of UGA is redefined as Sec, PL is favored over PS, resulting in a greater observed selenoprotein half-life and vice versa. This phenomenon can be depicted by the following formula:

$$\frac{1}{\lambda_T} = \frac{x}{\lambda_L} + \frac{1 - x}{\lambda_S}$$

where x and (1-x) are the proportions of UGA defined as Sec and translation termination, respectively, and λL, λS and λT are the degradation rates of PL, PS and PT, respectively. The reciprocal of λ is proportional to the protein half-life. The half-life of PT is therefore a linear combination of those of PL and PS, and x can be deduced by measuring the half-lives of PT, PL and PS.
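As a minimal sketch of this deduction, assume the mixture relation 1/λT = x/λL + (1 − x)/λS, which is the linear combination of half-lives described above. The function name `sec_fraction` and the numerical rates are hypothetical and serve only to illustrate the arithmetic; they are not measurements from the GPS assay.

```python
def sec_fraction(lambda_L, lambda_S, lambda_T):
    """Solve 1/lambda_T = x/lambda_L + (1 - x)/lambda_S for x.

    x is the proportion of UGA decoded as Sec; the lambdas are the
    degradation rates of PL, PS and the mixed population PT.
    """
    if min(lambda_L, lambda_S, lambda_T) <= 0:
        raise ValueError("degradation rates must be positive")
    if lambda_L == lambda_S:
        raise ValueError("PL and PS need different degradation rates")
    # Reciprocal rates are proportional to half-lives.
    return (1.0 / lambda_T - 1.0 / lambda_S) / (1.0 / lambda_L - 1.0 / lambda_S)


# Hypothetical rates in arbitrary units: PL degrades ten times more slowly than PS.
x = sec_fraction(lambda_L=0.1, lambda_S=1.0, lambda_T=0.25)
print(round(x, 3))
```

When λT equals λL the function returns 1 (all UGA read as Sec), and when λT equals λS it returns 0 (all UGA read as stop).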

Fig 1

A model of UGA decoding during selenoprotein synthesis.

The Sec-tRNASec and RFs compete for a UGA site with association constants k1 and k2, respectively. The abundance of Sec-tRNASec is determined by both selenium (Se) and uncharged tRNASec with an equilibrium constant k3. When Sec-tRNASec binds to the UGA site, the mRNA will be translated into full length selenoproteins (PL) with synthesis rate γL and degradation rate λL. When RF binds to the UGA site, the mRNA will be translated into truncated proteins (PS) with synthesis rate γS and degradation rate λS. Φ denotes debris after protein degradation.

Characterization of UGA assignments in SEPHS2 and SEPW1 syntheses

We measured the protein half-lives of PL, PS and PT using GPS, a single-cell-based dual-fluorescent reporter system [25]. GPS allows for simultaneous measurement of protein synthesis (mRNA level), abundance and half-life using red-fluorescent protein (RFP) signal, green-fluorescent protein (GFP) signal and the GFP/RFP ratio, respectively (S1 Fig). The GPS reporter was permanently integrated into the genome of cells in order to tightly control protein expression levels and to avoid artifacts resulting from transient expression. We utilized the synthesis of selenoprotein SEPHS2 as a model to investigate the UGA decoding process. To measure the half-lives of PL and PS, we generated SEPHS2 mutant transcripts that exclusively express PL or PS (see Experimental Methods for details). Consistent with previous findings [3], PL was much more stable than PS (Fig 2A and 2B), and the half-lives of both species were not affected by either selenium supply (Fig 2C) or protein synthesis rate (Fig 2D). To reiterate the concept of deducing UGA definition using differential protein half-lives between PL and PS, a hypothetical curve for wild-type SEPHS2 transcripts (PT) expressing both PL and PS is also shown (Fig 2B, the orange line), with the observed half-life lying between PL and PS. Assignments that favor Sec or stop shift the curve closer to that of PL or PS, respectively.

Fig 2

Protein half-life analysis of full-length and truncated SEPHS2.

(a) Distributions of protein stability measurements of PL or PS by the GPS assay. PL and PS were expressed from SEPHS2 mutant transcripts that exclusively express one form of SEPHS2. % of Max indicates normalized cell counts such that the peak value of each distribution is 100%. (b) The relationship between protein synthesis and abundance for PL and PS. Each dot represents a single cell carrying the indicated GPS reporter with a corresponding protein synthesis (RFP) and protein abundance (GFP). The GFP/RFP ratio, or the slope of the protein synthesis-abundance plot passing through the origin, reflects the protein half-life. A hypothetical line for PT, the total amount of proteins of both forms, is shown. (c-d) GPS analysis of PL and PS under various selenium concentrations (c) or synthesis levels (d). Relative mRNA levels are quantifications of the RFP signals in the GPS assay.

To investigate how various factors affected the alternative translation of UGA, we measured the half-lives of PT under different selenium concentrations and synthesis levels. In accordance with the hypothetical line in Fig 2B, the half-lives of PT (i.e. the GFP/RFP ratio or slope) are situated between those of PL and PS (S2 Fig). As revealed by the corresponding increase in the half-lives of PT (Fig 3A and 3B; S2 Fig), UGA to Sec translation is preferred with increasing selenium availability. In contrast, UGA to Sec translation is disfavored with increased SEPHS2 synthesis, as shown by the corresponding decrease in the half-lives of PT (Fig 3C). Consistent with the idea of binding competition between Sec-tRNASec and RFs at the UGA sites [26], decreasing the abundance of RFs promoted UGA to Sec decoding as revealed by the increase in the half-life of PT (S3 Fig).

Fig 3

The effect of selenium supply and SEPHS2 synthesis level on UGA definition.

(a) The relationship between protein synthesis and abundance for PT analyzed under various selenium concentrations. Both original and processed experimental results are presented in the graph and are represented by “original” and “mean”, respectively. The processed results present the mean abundance at each synthesis level. Since the half-life of PS is not affected by selenium supply, only PS analyzed at 40 nM selenium concentration is shown. (b) GPS assay of PT under various selenium concentrations. (c) GPS assay of PT under 40 nM selenium and various synthesis levels. Relative synthesis levels were estimated from the GPS assay. (d) The ratios of PL and PS abundance were quantified by Western blotting. Relative mRNA levels were estimated from the GPS assay.

The abundance of PT possesses a positive yet nonlinear relation with SEPHS2 synthesis (Fig 3A). The rate of increase for PT abundance declines with elevated SEPHS2 synthesis. Both the half-life of PT and the amount of protein synthesis that yields a declining rate of PT abundance increase with increasing selenium supply (Fig 3A), suggesting selenium as a limiting factor for UGA to Sec translation. However, elevated selenium supply can only push saturation of PT abundance toward higher protein synthesis but not eradicate it (Fig 3A), and the half-lives of PT cannot reach that of PL even at high selenium concentrations (Fig 3B; S2 Fig). Those observations suggest the existence of additional limiting factors beyond selenium supply. We directly quantified the protein abundance of PL and PS by Western blotting as an alternative approach to investigate the UGA decoding process. The ratio of PL/PS abundance served as an indicator for UGA to Sec translation efficiency (Fig 3D). Consistent with results inferred from protein half-lives (Fig 3A–3C), UGA to Sec translation increased with selenium supply, yet became saturated at high selenium concentrations.
The efficiency of UGA to Sec translation declined with synthesis (mRNA levels) at each fixed selenium concentration. Intriguingly, we observed the production of PS even under ample selenium supply (data not shown), suggesting unavoidable binding competition between RFs and Sec-tRNASec at the UGA sites. We analyzed the UGA decoding process of another selenoprotein, SEPW1. The superior stability of the full-length proteins compared to truncated peptides was sustained (S4 Fig). Moreover, the relation between protein abundance and synthesis under various selenium concentrations also possessed similar qualitative characteristics for both SEPW1 (S5A Fig) and SEPHS2 (Fig 3A). In both proteins, protein abundance (GFP) increased with increasing protein synthesis (RFP) and selenium concentrations, yet the GFP-RFP curve slope declined with increasing RFP values. These results indicate that the qualitative behaviors of Sec incorporation are not idiosyncratic to SEPHS2.

Comparison of UGA assignments with four SECIS elements

To investigate the role of SECIS elements on hierarchical selenoprotein translation, we replaced the SECIS element of the SEPHS2 transcript with those from three other selenoprotein transcripts (GPX1, SELK, and SEPX1) and monitored SEPHS2 protein expression by the GPS assay. We show the relations between protein abundance and protein synthesis under 40 nM selenium concentration (Fig 4A) and four selenium concentrations (S6A–S6D Fig). The data from those constructs exhibited a hierarchy of Sec incorporation efficiency. SEPHS2 and GPX1 had higher GFP-RFP slopes than SEPX1 and SELK, indicating superior Sec incorporation of the SECIS elements of SEPHS2 and GPX1 to those of SEPX1 and SELK.

Fig 4

Experimental and predicted Sec incorporation efficiencies in four SECIS constructs.

(a) The relationship between protein synthesis and abundance for PT under 40 nM selenium concentration in the experimental data of four SECIS constructs: GPX1 (red), SEPHS2 (blue), SEPX1 (brown), and SELK (green). A solid circle indicates the mean GFP value for each RFP value. (b) The relationship between protein synthesis and abundance for PT under 40 nM selenium concentration according to the models inferred from experimental data.

Computational modeling results

A mathematical model of the Sec incorporation process

We propose a simple mechanistic model of selenoprotein expression control that accounts for the aforementioned experimental characteristics: (1) PL is more stable than PS, and their half-lives are not affected by synthesis level or selenium supply. (2) Total selenoprotein abundance increases with both mRNA levels and selenium supply. (3) Additional limiting factors account for the saturation of PT at high levels of selenoprotein mRNA and selenium. (4) UGA to Sec translation increases with selenium supply but decreases with selenoprotein mRNA levels, and it saturates at high selenium concentrations due to the existence of the same limiting factor. (5) Constitutive binding competition between RFs and Sec-tRNASec occurs at UGA sites. The model is schematically illustrated in Fig 1 and described below.

Basic reactions and hypotheses

The model is based on the following simplifying assumptions:

(1) Synthesis and degradation reactions of both PL and PS follow first-order kinetics, which stipulate that the reaction rates are proportional to the substrate concentrations.
(2) PL and PS have distinct synthesis and degradation rates. PS possesses a considerably shorter half-life than PL.
(3) RFs and Sec-tRNASec compete for UGA sites.
(4) The total amount of selenoprotein mRNA is distributed among the transcripts participating in the translation of PL (mRNA-Sec-tRNASec), PS (mRNA-RF), and free molecules.
(5) The total amount of tRNASec in a cell is fixed and distributed between free and charged tRNAs.
(6) The conjugation of Sec to tRNASec also follows first-order kinetics with respect to selenium and free tRNASec molecules.

The selenoprotein constructs in our experiments are derived from intron-less cDNAs and thus are immune to nonsense-mediated mRNA decay (NMD), a well-known mRNA quality surveillance mechanism to eliminate mRNA with premature stop codons [27, 28]. Nevertheless, we have incorporated NMD regulation into our model (Eqs 5–7 in Materials and Methods). Under those assumptions, we describe the following reactions at steady state in this model (the reaction schemes below are reconstructed from the Fig 1 legend).

Sec-tRNASec incorporation, PL translation and degradation:

$$\text{mRNA} + \text{Sec-tRNA}^{\text{Sec}} \overset{k_1}{\rightleftharpoons} \text{mRNA-Sec-tRNA}^{\text{Sec}} \xrightarrow{\gamma_L} P_L \xrightarrow{\lambda_L} \Phi$$

RF competition for the UGA site, PS translation and degradation:

$$\text{mRNA} + \text{RF} \overset{k_2}{\rightleftharpoons} \text{mRNA-RF} \xrightarrow{\gamma_S} P_S \xrightarrow{\lambda_S} \Phi$$

Sec-tRNASec synthesis:

$$\text{Se} + \text{tRNA}^{\text{Sec}} \overset{k_3}{\rightleftharpoons} \text{Sec-tRNA}^{\text{Sec}}$$

In addition to the aforementioned reactions, we also imposed three other constraints on the total amounts of selenoprotein mRNA and tRNASec. The first constraint stipulates that the selenoprotein mRNAs are distributed among the molecules bound to Sec-tRNASec, RFs, and free molecules (Eq 10). The second constraint stipulates that the total Sec-tRNASec molecules are distributed between the charged tRNAs interacting with mRNAs and free molecules (Eq 11). The third constraint stipulates that the total tRNASec molecules are distributed between charged and uncharged tRNAs (Eq 12).
The model consists of nine parameters: translation (γ) and degradation (λ) rates of PL and PS (i.e. γL/λL and γS/λS respectively); equilibrium constants for the interactions with Sec-tRNASec and RFs (k1 and k2, respectively); the equilibrium constant of charging Sec to tRNASec (k3); the total amount of tRNASec (Ttotal); and the total amount of RFs (RFtotal). The total amounts of mRNA and protein levels (m and P respectively) of each cell are measured by the RFP and GFP intensities in the GPS assay (Eq 13). Given those parameters as well as the equations and constraints derived from the hypotheses above, the relationship between total protein abundance and total mRNA levels can be expressed as a complex functional formula. To estimate the model parameters, we further simplified the nine parameters in the model and combined them into six independent parameters: the ratios of synthesis and degradation rates γL/λL and γS/λS were calculated from the experimental data of PL and PS alone, respectively; k2 and RFtotal were combined into one parameter kF as they always co-occurred in the equations; we also introduced parameters ρp and ρm to specify the ratios of protein and mRNA abundance from GFP and RFP intensities, respectively, and replaced ρm with an equivalent parameter ρ = ρm/ρp. Consequently, only the following six parameters need to be estimated: k1, kF, k3, Ttotal, ρ and ρp. A detailed description of the model is reported in Materials and Methods.
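The qualitative behavior of such a competition model can be explored with a short numerical sketch. The code below is not the authors' formulation from Materials and Methods; it is a simplified stand-in that assumes three equilibria (mRNA with charged Sec-tRNASec, mRNA with RF folded into kF, and charging of free tRNASec by selenium) plus conservation of total mRNA and total tRNASec. The function name `steady_state`, the arguments `s_L` and `s_S` (standing for γL/λL and γS/λS), and all numerical values are hypothetical.

```python
def steady_state(m_total, se, k1, kF, k3, t_total, s_L, s_S):
    """Return (PL, PS) for a simplified UGA competition model.

    Assumed equilibria (illustrative, not the paper's exact equations):
      m_sec  = k1 * m_free * c_free   # mRNA bound to charged Sec-tRNASec
      m_rf   = kF * m_free            # mRNA bound to RF (kF = k2 * RF_total)
      c_free = k3 * se * u_free       # charged vs. uncharged free tRNASec
    Assumed conservation:
      m_total = m_free + m_sec + m_rf
      t_total = u_free + c_free + m_sec
    """
    def bound(m_free):
        # Solve the tRNASec conservation equation for a given free mRNA level.
        u_free = t_total / (1.0 + k3 * se * (1.0 + k1 * m_free))
        c_free = k3 * se * u_free
        return k1 * m_free * c_free, kF * m_free

    # Total mRNA increases monotonically with m_free, so bisection is enough.
    lo, hi = 0.0, m_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        m_sec, m_rf = bound(mid)
        if mid + m_sec + m_rf > m_total:
            hi = mid
        else:
            lo = mid
    m_sec, m_rf = bound(0.5 * (lo + hi))
    return s_L * m_sec, s_S * m_rf


# Hypothetical parameter values chosen only to show the trend.
for se in (4.0, 40.0, 400.0):
    pl, ps = steady_state(1000.0, se, k1=3.0, kF=10.0, k3=0.1,
                          t_total=500.0, s_L=75.0, s_S=5.0)
    print(se, pl / ps)
```

Under these assumptions the PL/PS ratio rises with selenium but levels off once nearly all tRNASec is charged, and it falls as total mRNA grows, which mirrors the two constraints described in the text.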

Recapitulation of the qualitative characteristics of selenoprotein synthesis and degradation

To verify that this model is sensible, we examined if it could reproduce the qualitative properties observed from experimental data for SEPHS2. Moreover, to ensure that this model consists of all the essential requirements to explain the observed phenomena, we excluded the two constraints (mRNA and tRNASec), both separately and together, and checked whether the reduced models could still recapitulate the same qualitative properties. We selected a specific set of parameter values in the model ({k1, k3, kF, Ttotal, ρ, ρp} = {3, 10, 0.1, 500, 10, 100}; [m] = 1∼4000), varied the amount of selenium supply and mRNA levels, and then generated simulated data for the GPS assay (Fig 5A) and the Western blot experiment (Fig 5B). We compared the simulation outcomes of four models: (1) the model with both mRNA and tRNASec constraints; (2) the model with the mRNA constraint alone; (3) the model with the tRNASec constraint alone; and (4) the model without mRNA and tRNASec constraints. Only the model incorporating both constraints exhibits saturation of the total protein abundance (Fig 5A) and PL/PS (Fig 5B) with increased protein synthesis and selenium supply, respectively. At low mRNA (protein synthesis) levels, PL formation dominates due to its superior stability. Hence, observed protein stability is higher, as indicated by the slope of the protein abundance-synthesis curve (Fig 5A, lower-right panel, left part of the curves) and the higher PL/PS (Fig 5B, lower-right panel, the red curve). As the mRNA level increases, Sec-tRNASec molecule supply becomes exhausted and PS formation dominates. Therefore, the observed protein stability approaches the lower rate of PS (Fig 5A, lower-right panel, right part of the curves), and PL/PS becomes smaller (Fig 5B, lower-right panel, the purple curve). Similarly, at low selenium concentrations, there is an abundant supply of uncharged tRNASec.
Thus, the amount of charged Sec-tRNASec is proportional to the selenium concentration, and the amount of PL produced is roughly proportional to Sec-tRNASec supply (Fig 5B, lower-right panel, left part of the curves). When selenium concentration increases, all tRNASec molecules are charged. Thus, PL formation depends only on the amount of tRNASec and becomes insensitive to selenium concentration (Fig 5B, lower-right panel, right part of the curves). Increasing mRNA levels enhance incorporation of Sec and depletion of uncharged tRNASec molecules, thereby pushing saturation of the PL/PS ratio towards lower selenium concentrations (Fig 5B, lower-right panel).

Fig 5

Prediction of Sec incorporation efficiencies under four different constraints.

(a) Protein abundance among various protein synthesis rates and selenium concentrations was simulated using mathematical models. Four models were compared using a specific parameter set ({k1, k3, kF, Ttotal, ρ, ρp} = {3, 10, 0.1, 500, 10, 100}; [m] = 1∼4000). (b) Simulation of Sec incorporation efficiency (PL/PS ratio) using parameter sets identical to (a). The caption “mRNA total” indicates the number of mRNA molecules in the model.

Both the mRNA and tRNASec constraints are essential to reproduce the qualitative characteristics observed from experimental data. The model with only the mRNA constraint can account for the lower translational efficiency at higher mRNA levels due to the dominance of PS (Fig 5A, upper-right panel), in accordance with our experimental results from GPS assay (Fig 3A). However, since the tRNASec supply is unlimited, the charged Sec-tRNASec abundance is proportional to the selenium concentration. PL formation is therefore linearly dependent on the selenium concentration when it is high (Fig 5B, upper-right panel), which cannot explain the results from our Western blot experiment (Fig 3D). The intervals with zero PL/PS reflect the regimes where charged Sec-tRNASec become a limiting factor. In contrast, the model with only the tRNASec constraint can recapitulate the saturation of PL formation at high selenium concentrations due to limited tRNASec supply (Fig 5B, lower-left panel), in accordance with the results from our Western blot experiment (Fig 3D). However, since free mRNA supply is unconstrained, the maximum capacity to produce PL is quickly reached (due to limited tRNASec supply), and formation of PS dominates subsequent protein synthesis. Thus, the protein abundance-synthesis curves are straight and are collapsed into a single line for all selenium concentrations (Fig 5A, lower-left panel), which cannot explain the experimental results from our GPS assay (Fig 3A).
The model without either constraint does not exhibit non-linearity in either experiment (Fig 5A and 5B, upper-left panels).

Estimation of model parameters

The six independent parameters were connected by complex nonlinear functional relationships. We developed an algorithm to estimate the parameters that fit the functional relationships between single-cell GFP and RFP intensities from our GPS experiments. In brief, each set of parameters π gave rise to a function GFP = f(RFP). We defined the loss function as the squared error between measured and predicted GFP values, summing over all data points: Q²(π) = ∑(GFP − f(RFP))². A grid-search algorithm was employed to find the parameter values that minimized the loss function. The procedures for data processing and parameter estimation are described in Materials and Methods.
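The fitting idea can be illustrated with a small sketch of an exhaustive grid search that minimizes the summed squared error. Everything here is an assumption made for illustration: the log-spaced grids, the helper names, and in particular `toy_model`, a two-parameter saturating curve that merely stands in for the much more complex f(RFP) of the actual model.

```python
import itertools
import math


def log_grid(low, high, n):
    """Return n log-spaced values from low to high inclusive (n >= 2)."""
    step = (math.log10(high) - math.log10(low)) / (n - 1)
    return [10 ** (math.log10(low) + i * step) for i in range(n)]


def squared_error(params, model, rfp, gfp):
    """Q^2: sum over data points of (GFP - f(RFP))^2."""
    return sum((g - model(r, *params)) ** 2 for r, g in zip(rfp, gfp))


def grid_search(model, rfp, gfp, grids):
    """Evaluate every grid combination and return the best one with its loss."""
    best = min(itertools.product(*grids),
               key=lambda p: squared_error(p, model, rfp, gfp))
    return best, squared_error(best, model, rfp, gfp)


def toy_model(rfp, a, b):
    """Hypothetical stand-in for f(RFP): a simple saturating curve."""
    return a * rfp / (1.0 + b * rfp)


# Noise-free synthetic data generated from known (hypothetical) parameters.
rfp = [1.0, 10.0, 100.0, 1000.0]
true_params = (10.0, 0.01)
gfp = [toy_model(r, *true_params) for r in rfp]

grids = [log_grid(0.1, 1000.0, 5), log_grid(1e-4, 1.0, 5)]
best, loss = grid_search(toy_model, rfp, gfp, grids)
print(best, loss)
```

The cost of an exhaustive search grows as the number of grid values raised to the number of parameters, which is why grid resolution is a practical trade-off.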

The parameter estimation algorithm can recover parameter values from simulations

To see how precisely our algorithm recovered the parameter values, we performed a simulation test. We generated 100 random parameter combinations (Eq 18) and simulated the corresponding RFP versus GFP data points for each parameter set. The algorithm estimated the parameter values based on simulated data points. By comparing the input and predicted parameters, we evaluated the success rate of recovering correct parameters (see Materials and Methods). The success rate varied between 70–100% with the highest resolution of grid search (Table 1, the last row). The average recovery rate ranged from 64% to 76% with grid densities increasing from 1024 (4 possible values for each parameter) to 248832 (12 possible values for each parameter) within the same parameter boundary. We also introduced noise in simulated data points and assessed the parameter recovery rates from noisy data (see Materials and Methods, Eq 19). Experimental data indicated that the noise of GFP values for a given RFP value is proportional to the RFP signal level, and the standard deviation of the normalized noise is about 0.3. We varied the standard deviation of the noise in simulated data from 0.3 to 5 and report the recovery rate in Table 2 (see Materials and Methods). The recovery rate fell from 70% to 33% as the normalized standard deviation of noise increased from 0 to 5. The recovery rate dropped below 50% when the normalized noise standard deviation was above 1.0. These results are intuitive, as it is hard to reconstruct a model when noise exceeds the signal level.
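The bookkeeping behind these figures can be sketched briefly. The grid sizes quoted above equal the number of values per parameter raised to the fifth power, and the Table 1 legend defines a successful recovery as an estimate within ten-fold of the true value. The function names are hypothetical, and the default of five searched parameters is inferred from 4⁵ = 1024 and 12⁵ = 248832 rather than stated explicitly in the text.

```python
def grid_size(values_per_parameter, n_parameters=5):
    """Number of parameter combinations evaluated by an exhaustive grid."""
    return values_per_parameter ** n_parameters


def is_recovered(estimate, truth, fold=10.0):
    """True when a positive estimate lies within `fold`-fold of the truth."""
    return truth / fold <= estimate <= truth * fold


def recovery_rate(estimates, truths, fold=10.0):
    """Fraction of simulations in which the parameter was recovered."""
    hits = sum(is_recovered(e, t, fold) for e, t in zip(estimates, truths))
    return hits / len(truths)


print(grid_size(4), grid_size(12))  # 1024 248832

# Hypothetical estimates for three simulations whose true value is 5.0.
print(recovery_rate([1.0, 50.0, 2000.0], [5.0, 5.0, 5.0]))
```

A ten-fold tolerance is a loose criterion, so the reported rates should be read as agreement in order of magnitude rather than as exact recovery.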

Table 1

Parameter recovery rate under different resolutions.

	Parameters
Blocks	k₁	kF	k₃	T_total	ρ	Average
4	61%	58%	100%	68%	69%	64%
6	63%	63%	100%	73%	80%	70%
8	67%	68%	100%	69%	77%	70%
10	68%	73%	100%	77%	79%	74%
12	76%	71%	100%	76%	80%	76%

Table 2

Parameter recovery rate under different levels of data noise.

Parameters
Data std.	k₁	kF	k₃	T_total	ρ	Average
0.0	67%	68%	100%	69%	77%	70%
0.3	53%	50%	77%	64%	54%	55%
0.6	43%	45%	64%	55%	48%	48%
1.0	48%	55%	71%	56%	49%	52%
3.0	39%	38%	61%	44%	42%	41%
5.0	28%	32%	51%	39%	33%	33%

Table 1: The table shows the recovery rate of each parameter from 100 simulations under different algorithm resolutions. A recovery is counted as successful when the recovered parameter value is within ten-fold of the true parameter value. The “Blocks” column indicates the number of grid values used for each parameter in the algorithm. More grid values result in higher resolution.

Table 2: The table shows the recovery rate under different data noise levels. The column “Data std.” indicates the standard deviations of the noisy data sets (see Materials and Methods).

Estimated parameter values from GPS data

We employed the grid search algorithm to estimate the six independent parameters from the SEPHS2 GPS data. Table 3 displays the top 10 parameter sets identified by the algorithm. They are grouped into two degenerate classes of solutions. Within each class, each parameter set gives rise to the same loss function value. Among them, the highest loss function value is 2.1-fold that of the lowest one. The respective values of every parameter differ by at most 1.5-fold across the ten sets, with the largest differences between minimum and maximum values occurring for k1 (1.4-fold) and ρp (1.5-fold). Small differences between the top-ranking parameter values obtained from a global grid search suggest their closeness to the global optimum values.

Table 3

Top ten estimated SEPHS2 parameters from experimental data.

k₁	kF	k₃	T_total	ρ	ρ_p	Q²
17.03	9803.03	0.02	63.44	10.40	0.95	1.48 x 10⁸
17.65	9704.55	0.02	60.63	10.87	0.95	1.48 x 10⁸
17.96	10000.00	0.02	61.39	10.75	0.95	1.48 x 10⁸
17.03	9556.83	0.02	61.90	10.64	0.95	1.48 x 10⁸
17.34	9852.28	0.02	62.67	10.52	0.95	1.48 x 10⁸
13.13	8621.24	0.02	72.39	13.32	0.65	3.17 x 10⁸
12.82	8473.52	0.02	72.90	13.20	0.65	3.17 x 10⁸
13.13	9768.97	0.02	73.66	13.09	0.65	3.17 x 10⁸
13.13	8867.45	0.02	74.43	12.97	0.65	3.17 x 10⁸
12.82	8621.24	0.02	74.18	12.97	0.65	3.17 x 10⁸

We checked how well the model derived from the top-ranking parameter values fit the experimental data. Since the scatter plots of GFP-RFP intensities of the GPS data were noisy, we show the mean of GFP values corresponding to each single RFP value (Fig 6A). The GFP-RFP curves generated by the optimum parameter values (solid circles) fit well with the experimental data (dots) at high selenium concentrations (red, black and blue colors). At the lowest selenium concentration, the model underestimates the GFP value (protein abundance) at each fixed RFP value (mRNA level) (green dots and circles). This shift is likely due to the existence of endogenous selenium in cells with little or no external selenium supply. Beyond qualitative observations in Figs 3A, 3D and 5, we also compared two quantitative scores of goodness of fit (r and root mean square error, RMSE) among three alternative models (with mRNA and tRNA constraints alone and a combination of both constraints) of the data from GPS (S1 Table) and Western blot (S2 Table) assays.

Fig 6

Comparison of experimental and predicted Sec incorporation efficiencies in SEPHS2.

(a) The relationship between protein synthesis and abundance for PT analyzed under various selenium concentrations. Dots denote mean GFP values for each RFP value in the experimental data and are the same as Fig 3A. Solid circles denote the same quantities from model fitting. (b) The relationship between the full length protein quantities and mRNA levels under various selenium concentrations from model prediction.

We also checked whether the estimated parameter values were within biologically sensible ranges according to prior studies (Table 4). In mammalian cells, the ratios of protein synthesis and degradation rates have a broad spectrum of values, ranging from 10⁻³ to 10⁴ [29]. The SEPHS2 protein synthesis/degradation ratio calculated from our control experiments varies from 70 to 80, which falls within this range. We also estimated the possible ranges of mRNA and protein copy numbers of SEPHS2. Previous studies have reported an SEPHS2 mRNA expression level of approximately 10² molecules per cell and a protein expression level of 10³ molecules per cell [29-31] (see Materials and Methods). The mRNA and protein levels in our results are all within these ranges (Fig 6B).

Table 4

Physiological ranges of the model parameters.

Parameter	Description	Typical values	References
k₁	mRNA-Sec-tRNA^Sec association constant	0~100 (nM/hr)	[29, 35]
kF	Product of release factor concentration and mRNA-release factor association constant (k₂)	0~10000	[31]
k₃	Sec-tRNA^Sec synthesis rate	0~100 (nM/hr)	[36]
T_total	Abundance of tRNA^Sec in the cell	10~1000 (nM)	[37–40]
ρ	Ratio of RFP and GFP intensity constant	0~10000	[29, 41]
ρ_p	GFP intensity constant	0~10000 (a.u./nM)	[29, 41]
s1	Ratio of γ_L over λ_L	Estimated from FACS data	[42]
s2	Ratio of γ_S over λ_S	Estimated from FACS data	[42]

To justify the wider applicability of the model estimation algorithm, we estimated the model parameters of SEPW1 from the experimental data (S3 Table). Similar to SEPHS2, the GFP-RFP curves of SEPW1 generated by the inferred model (S5B Fig) recapitulate the qualitative characteristics of the experimental data (S5A Fig).

Comparison of Sec incorporation rates and SECIS-SBP2 binding affinity in selenoproteins

We replaced the SECIS element of the SEPHS2 transcript with those from three other selenoprotein transcripts to investigate the role of SECIS elements in hierarchical selenoprotein expression. We estimated the model parameters of the four SECIS constructs, compared their k₁ and kF values in Table 5, and reported all the inferred parameter values in S4 Table. While all the models possess a similar level of kF, their k₁ values can be separated into two groups: SEPHS2 and GPX1 have higher values (17.0 and 12.2) than SEPX1 and SELK (5.7 and 5.7) (Fig 4B). This order is compatible with the order of GFP-RFP curves in experimental data (Fig 4A and S6 Fig). Similar levels of kF are consistent with the experimental setting, as all the constructs are derived from SEPHS2 and differ only in their SECIS elements. Their RF incorporation efficiency (k₂) and RF concentration should thus be invariant. Likewise, other parameters pertaining to the processing of alternative UGA codon assignments (k₃ and T_total) also exhibit similar levels (S4 Table).

Table 5

Comparison of estimated Sec incorporation strength of four SECIS elements and SECIS-SBP2 dissociation constants.

SECIS element	SECIS-SBP2 K_d (nM)	k₁	kF	k₁/ kF
SEPHS2	7.5	17.0	9803.0	0.0017
GPX1	6.3	12.2	8521.3	0.0014
SEPX1	9.7	5.7	9803.0	0.0006
SELK	3.6	5.7	8522.8	0.0007

The values of k₁ and kF for each SECIS element were estimated from their experimental data. The SECIS-SBP2 dissociation constants K_d are reported from [43].

However, the order of Sec incorporation efficiency among the four SECIS elements (k₁) is not compatible with the order of their SECIS-SBP2 dissociation constants (K_d in Table 5). In particular, SELK possesses the lowest dissociation constant (thus the highest SECIS-SBP2 binding affinity), yet has the lowest Sec incorporation efficiency. The order of SECIS-SBP2 binding affinity among the remaining three SECIS elements (GPX1, SEPHS2, SEPX1) is roughly compatible with the order of their k₁ values (SEPHS2, GPX1, SEPX1).
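The comparison in Table 5 can be checked with a few lines of arithmetic. The sketch below, in Python, recomputes the k₁/kF column from the tabulated k₁ and kF values and lists the SECIS elements by binding affinity and by k₁. The variable names are illustrative only.

```python
# Values copied from Table 5: (K_d in nM, k1, kF)
table5 = {
    "SEPHS2": (7.5, 17.0, 9803.0),
    "GPX1": (6.3, 12.2, 8521.3),
    "SEPX1": (9.7, 5.7, 9803.0),
    "SELK": (3.6, 5.7, 8522.8),
}

# Recompute the k1/kF column, rounded to four decimals as in the table.
ratios = {name: round(k1 / kF, 4) for name, (_, k1, kF) in table5.items()}

# A lower K_d means tighter SECIS-SBP2 binding, so sort ascending.
by_affinity = sorted(table5, key=lambda name: table5[name][0])

# A higher k1 means stronger Sec incorporation, so sort descending.
# SEPX1 and SELK tie at 5.7; Python's stable sort keeps their table order.
by_k1 = sorted(table5, key=lambda name: -table5[name][1])

print(ratios)
print("by affinity:", by_affinity)
print("by k1:      ", by_k1)
```

SELK heads the affinity list while sitting in the lower k₁ group, which is the mismatch described in the text.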

Discussion

Selenoprotein synthesis serves as a remarkable model to study how cellular and environmental factors influence the definition of a dual-use codon. We have proposed a concise mathematical model of selenoprotein synthesis that matches well with both qualitative and quantitative characteristics of experimental results. By combining the power of biological experiments and computational modeling, we have revealed how multiple cis and trans regulatory factors collectively influence the definition of UGA.

The characteristics of experimental data can be explained by the competition between RF and Sec-tRNASec for UGA codons of limited selenoprotein mRNAs, as well as the limited abundance of tRNASec. We formulated these two types of resource limitation as a quantitative, mechanistic model. Simulations according to this model successfully reproduced qualitative characteristics of the experimental data (Fig 5). Beyond qualitative matching, we also proposed an algorithm to estimate model parameters from experimental data. The model derived from the estimated parameters fit well with the experimental data (Fig 6A, S1 Table and S2 Table).

Previous work on the importance of SECIS-SBP2 interactions for the selenoprotein expression hierarchy remains inconclusive. Some studies have indicated that SECIS-SBP2 interactions dictate the selenoprotein hierarchy [8], whereas others have suggested that those interactions alone are insufficient to determine Sec incorporation efficiency [21, 32]. Our deduced Sec incorporation rates attributed to distinct SECIS elements did not correlate well with reported SECIS-SBP2 binding affinities (Table 5). SEPHS2 and GPX1 had substantially higher Sec incorporation rates than SEPX1 and SELK, yet the SECIS-SBP2 binding of SELK was the strongest among the four SECIS elements. Thus, we provide evidence to support the presence of other determining factors for selenoprotein hierarchy.

The order of predicted GFP-RFP curves among the four SECIS elements is consistent with the order of the corresponding experimental curves except for zero selenium concentration (S6 Fig). At zero selenium concentration, the predicted curves of all SECIS elements coincide and are considerably lower than all the experimental curves. This is likely due to the existence of residual selenium in cells even at zero external selenium supply.

The parameters in our model conform to some of the fundamental quantitative features of cell biology, such as the translation and degradation rates of proteins, incorporation rates of Sec-tRNASec and RFs, and the quantities of tRNASec and RFs in cells. Few of these quantities have been reported for mammalian cells, so it is not possible to verify the accuracy of the estimated parameters from existing information. Thus, a thorough verification of the estimated parameter values remains to be conducted.

The concise selenoprotein synthesis model we propose circumvents detailed mechanistic description. It is now possible to build a more detailed, mechanistic model by including all the intermediate steps in the pathway. However, introducing additional free parameters without concomitant measurements merely complicates the model with little improvement in accuracy. Importantly, in our simplified equations, we reveal the existence of a limiting factor beyond selenium concentration in Sec-tRNASec synthesis. Which enzymes or substrates constitute the true limiting factor warrants further investigation. Likewise, incorporation of tRNASec or RFs at a UGA site involves binding of multiple molecules [8–12, 33, 34]. Some of them could possibly be limiting factors additional to excess mRNA and tRNASec supplies.

Despite Sec incorporation being a very specialized process, the process of synthesizing and degrading multiple products with shared and limited resources is ubiquitous in biochemical systems. Some instances include dichotomy between growth and production of organisms, competitive binding of transcription factors and their repressors on promoters, and biosynthesis of metabolites from multiple pathways with shared substrates. Although the models capturing those phenomena may have very different formulations than the models described in this study, the methodology we introduced may be extended to other systems with similar characteristics. Furthermore, presence of multiple exogenous and endogenous limiting factors, such as selenium, selenoprotein transcripts and tRNASec in our study, may yield a more complicated system behavior than the cases with single or no limiting factors.

Materials and methods

Plasmid construction

To generate the SEPHS2 and SEPW1 GPS reporter constructs, SEPHS2 and SEPW1 cDNAs from the Mammalian Gene Collection (GE Healthcare Dharmacon Inc., Lafayette, CO, USA) were cloned into a lentiviral vector carrying the RFP-IRES-GFP GPS cassette using Gateway technology (Life Technologies, Carlsbad, CA, USA). To generate SEPHS2 and SEPW1 mutants that exclusively express PL or PS, the TGA/Sec codon in the SEPHS2 and SEPW1 cDNAs was mutated into TGT/Cys or TAA/stop, respectively, by site-directed mutagenesis (Stratagene, Santa Clara, CA, USA). To replace the SECIS element of SEPHS2 with that of other selenoproteins, SECIS elements of GPX1, SELK and SEPX1 were amplified from the corresponding selenoprotein cDNAs and cloned into the SEPHS2 reporter using Gibson Assembly (New England Biolabs Inc., Ipswich, MA, USA).

Tissue culture

HEK293T cells were maintained in DMEM with 10% fetal bovine serum (FBS, purchased from Hyclone Laboratories, Logan, UT, USA) and antibiotics in a 6% CO2 atmosphere at 37°C. FBS is the main source of selenium in cell culture. To control selenium supply, cells were first depleted of selenium in FBS-free DMEM supplemented with 10 μg/mL insulin and 5 μg/mL transferrin for 24 hrs. Cells were then balanced with indicated concentrations of sodium selenite (Na2SeO3, Sigma-Aldrich, St. Louis, MO, USA) for another 24 hrs. All tissue culture media and supplements were purchased from Gibco Life Technologies, unless otherwise indicated. To produce lentiviruses, HEK293T cells were transfected with pHAGE, pHIV gag/pol, pVsvg, pRev and pTat using TransIT-293 reagent (Mirus Bio LLC, Madison, WI, USA). Viruses were harvested 48 hrs after transfection.

Generation of GPS reporter cell lines and GPS assays

To generate GPS reporter cell lines, cells were infected with lentiviruses carrying GPS reporter constructs. Infection was carried out in media with 8 μg/mL polybrene (Sigma-Aldrich). To collect reporter cell lines with a series of SEPHS2 synthesis levels, cells were infected stepwise with lentiviruses carrying GPS reporter constructs. To prepare samples for FACS analysis, cells were washed with PBS, trypsinized and resuspended in medium containing 2% FBS and analyzed using a BD LSR Fortessa system (BD Biosciences, San Jose, CA, USA). 10⁶ cells were recorded for each sample. FlowJo (Ashland, OR, USA) was used for primary FACS data analysis.

Western blotting

Cells were harvested in cold PBS and lysed in RIPA buffer (150 mM NaCl, 1.0% IGEPAL®CA-630, 0.5% sodium deoxycholate, 0.1% SDS, and 50 mM Tris, pH 8.0). Standard procedures were used for Western blotting. Antibody against GFP (JL-8) was purchased from Clontech Laboratories (Mountain View, CA, USA).

Data processing

The single-cell-based GPS data consist of 10⁶ pairs of RFP-GFP intensities for individual cells. The RFP-GFP relationship in each cell manifests a high level of variation. However, for each small range of RFP values, the corresponding GFP values typically have a Gaussian distribution with a variance proportional to the RFP value. Therefore, we treated the GPS data as instantiations of the following random variables: y = f(x) + ϵ, where x denotes a random variable of RFP intensities with an unspecified distribution and y denotes a random variable of GFP intensities and is a function of x with an additive noise ϵ. The noise ϵ ∼ N(0, xσ²) follows a Gaussian distribution with zero mean and variance xσ². To reduce data noise and size, we applied two filtering procedures to the GPS data. First, we divided the range of RFP and GFP values into 2000 grids and discarded the data points in grids comprising fewer than 30 data points. Second, we sorted the RFP values and selected 0.4% of the data points. The processed data thereby consisted of about 3000 pairs of RFP and GFP values for each selenium concentration.
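The noise model and the two filtering steps can be sketched as follows in Python. This is not the authors' implementation (their Matlab code is in S2 File). Two details are assumptions made only for illustration: the "2000 grids" are read as 2000 bins along each of the RFP and GFP axes, and the 0.4% selection is read as taking evenly spaced points from the RFP-sorted data. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def noisy_gfp(f, rfp, sigma, rng):
    """Draw y = f(x) + eps with eps ~ N(0, x * sigma**2)."""
    return f(rfp) + rng.gauss(0.0, sigma * math.sqrt(rfp))


def filter_sparse_cells(points, n_bins=2000, min_count=30):
    """Drop (rfp, gfp) points lying in 2D grid cells with fewer than min_count points.

    Assumption: n_bins bins per axis; the text does not say how the grid is laid out.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, y_lo = min(xs), min(ys)
    x_w = (max(xs) - x_lo) or 1.0  # avoid division by zero for constant input
    y_w = (max(ys) - y_lo) or 1.0

    def cell(x, y):
        ix = min(int((x - x_lo) / x_w * n_bins), n_bins - 1)
        iy = min(int((y - y_lo) / y_w * n_bins), n_bins - 1)
        return ix, iy

    counts = Counter(cell(x, y) for x, y in points)
    return [(x, y) for x, y in points if counts[cell(x, y)] >= min_count]


def thin_by_rfp(points, fraction=0.004):
    """Sort by RFP and keep an evenly spaced subset (assumed selection rule)."""
    ordered = sorted(points)
    step = max(1, round(1.0 / fraction))
    return ordered[::step]
```

Filtering before thinning means that isolated outlier cells are removed first, so the retained subset follows the dense part of the GFP-RFP cloud.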

A mathematical model of selenoprotein synthesis and degradation

The basic assumptions and reactions of the model are described in the Results and illustrated in Fig 1. Here, we demonstrate the mathematical formulation of the model. We first introduce the following notations:

m_total: concentration of total selenoprotein mRNA molecules
m_free: concentration of free selenoprotein mRNA molecules not interacting with Sec-tRNASec or RFs
SeT_free: concentration of free Sec-tRNASec molecules
m−SeT_0: concentration of the mRNA-Sec-tRNASec complex before mRNA degradation
m−SeT: concentration of the mRNA-Sec-tRNASec complex after mRNA degradation
k₁: association constant of the reaction m + SeT ⇌ m−SeT
P_L: concentration of full-length selenoproteins
γ_L: translation rate of full-length selenoproteins
λ_L: degradation rate of full-length selenoproteins
RF: concentration of RFs
m−RF_0: concentration of the mRNA-RF complex before mRNA degradation
m−RF: concentration of the mRNA-RF complex after mRNA degradation
k₂: association constant of the reaction m + RF ⇌ m−RF
P_S: concentration of truncated selenoproteins
γ_S: translation rate of truncated selenoproteins
λ_S: degradation rate of truncated selenoproteins
Se: selenium concentration
T: concentration of uncharged tRNASec
SeT_total: concentration of charged Sec-tRNASec
T_total: concentration of all tRNASec molecules (charged and uncharged combined)
k₃: association constant of the reaction T + Se ⇌ SeT
α_L: probability that an mRNA-Sec-tRNASec complex escapes mRNA degradation
α_S: probability that an mRNA-RF complex escapes mRNA degradation
e_0: background mRNA decay rate
N: average number of proteins translated from one mRNA molecule during its life

Full-length protein synthesis and degradation

At equilibrium, the concentration of the mRNA-Sec-tRNASec complex prior to mRNA degradation is proportional to the product of the free mRNA and free Sec-tRNASec concentrations. A fraction of these complexes is degraded by the background mRNA decay process. Likewise, at steady state, the total amounts of translated and degraded protein molecules are equal.

Truncated protein synthesis and degradation

The equations for truncated protein synthesis and degradation follow those of full-length proteins by replacing Sec-tRNASec with RFs, with an additional term attributed to NMD. Derivation of the two escape probabilities α (for the mRNA-Sec-tRNASec and mRNA-RF complexes) is described in S1 File. Since mRNA degradation can be neglected in our system, we set both escape probabilities to 1.

Sec-tRNASec synthesis

We simplified the complicated process of Sec-tRNASec synthesis to a first-order reaction that depends bilinearly on the concentrations of selenium and uncharged tRNASec.

mRNA constraint

The mRNA constraint simply states that the selenoprotein mRNAs are allocated among the mRNA-Sec-tRNASec complexes, mRNA-RF complexes, and free mRNAs.

tRNA constraints

There are two constraints involving tRNASec. First, the total amount of charged tRNASec is distributed between the Sec-tRNASec molecules interacting with mRNAs and the free Sec-tRNASec molecules. Second, the total amount of tRNASec is distributed between charged and uncharged species.
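The relations described in prose in this and the preceding subsections can be written compactly. The block below is one reading that is consistent with the verbal definitions. It is not a transcription of the published equations, and it does not attempt to reproduce their numbering. The shorthand C_L and C_S for the mRNA-Sec-tRNASec and mRNA-RF complexes, and the subscripts "free" and "total", are introduced here for illustration only. The charging step is written as an equilibrium relation with k₃ as an association constant, which is an assumption.

```latex
\begin{align*}
% complex formation, escape from mRNA decay, protein steady state
C_{L,0} &= k_1\, m_{\mathrm{free}}\, \mathit{SeT}_{\mathrm{free}}, &
C_L &= \alpha_L\, C_{L,0}, &
\gamma_L\, C_L &= \lambda_L\, P_L, \\
C_{S,0} &= k_2\, m_{\mathrm{free}}\, \mathit{RF}, &
C_S &= \alpha_S\, C_{S,0}, &
\gamma_S\, C_S &= \lambda_S\, P_S, \\
% Sec-tRNA^{Sec} charging (assumed equilibrium form)
\mathit{SeT}_{\mathrm{total}} &= k_3\, \mathit{Se}\; T, \\
% mRNA constraint
m_{\mathrm{total}} &= C_L + C_S + m_{\mathrm{free}}, \\
% tRNA constraints
\mathit{SeT}_{\mathrm{total}} &= C_L + \mathit{SeT}_{\mathrm{free}}, &
T_{\mathrm{total}} &= \mathit{SeT}_{\mathrm{total}} + T.
\end{align*}
```

Under this reading, eliminating T from the charging relation and the second tRNA constraint gives SeT_total = k₃ · Se · T_total / (1 + k₃ · Se), which rises with selenium and saturates at T_total. This matches the saturation behaviour described in the abstract.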

Conversion of fluorescence intensities into mRNA and protein abundance

The GPS assay measures fluorescence intensities rather than molecular abundance. To convert the RFP and GFP intensities into mRNA and protein abundance, we introduced two additional parameters, ρ and ρ_p (Table 4).

Reduction of model parameters

The number of parameters appearing in Eqs 1–8 can be reduced in the following way. First, we collapsed k₂ ∙ RF into a single parameter kF, as the two always co-occur in the equations. Second, only the translation/degradation rate ratios γ_L/λ_L and γ_S/λ_S (s1 and s2 in Table 4) are relevant in our experiments. Third, those ratios can be directly determined from the slopes of the GFP-RFP curves in the two control experiments with complete full-length or truncated protein synthesis (Fig 2B). After this reduction, the full-length and truncated protein concentrations can be expressed in terms of these ratios (Eqs 10 and 11). Combining Eqs 10 and 11 with Eqs 1–8, we specified the dependency of the free mRNA concentration on the total mRNA level. With the free mRNA concentration, we can express the full-length (PL) and truncated (PS) protein concentrations in analytic forms. Hence, the total protein abundance can be established as a function of the total mRNA level.
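To make the chain from total mRNA to protein levels concrete, here is a minimal Python sketch of a steady-state solver. It assumes the simplified balance relations stated in prose above with both escape probabilities set to 1, and an equilibrium charging step SeT_total = k₃ · Se · T. The published equations may differ in detail. It solves for the free mRNA concentration by bisection rather than using a closed form. The function name and the parameter values in the usage note are hypothetical and are not fitted values.

```python
def steady_state(m_total, se, k1, kF, k3, t_total, s1, s2):
    """Return (P_L, P_S, m_free) for a given total mRNA level and selenium level.

    Assumed relations (alpha = 1):
      SeT_total = k3 * Se * T,      T_total = SeT_total + T
      C_L = k1 * m_free * SeT_free, SeT_total = C_L + SeT_free
      C_S = kF * m_free,            m_total = C_L + C_S + m_free
      P_L = s1 * C_L,               P_S = s2 * C_S
    """
    # Charged tRNA pool after eliminating uncharged T.
    set_total = k3 * se * t_total / (1.0 + k3 * se)

    def complexes(m_free):
        # Free Sec-tRNA after eliminating C_L from the tRNA constraint.
        set_free = set_total / (1.0 + k1 * m_free)
        return k1 * m_free * set_free, kF * m_free

    # m_free + C_L + C_S increases monotonically with m_free, so bisect on [0, m_total].
    lo, hi = 0.0, m_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        c_l, c_s = complexes(mid)
        if mid + c_l + c_s > m_total:
            hi = mid
        else:
            lo = mid
    m_free = 0.5 * (lo + hi)
    c_l, c_s = complexes(m_free)
    return s1 * c_l, s2 * c_s, m_free
```

For example, calling `steady_state(100.0, se, k1=10.0, kF=100.0, k3=1.0, t_total=100.0, s1=1.0, s2=1.0)` for increasing `se` gives a full-length fraction P_L / (P_L + P_S) that rises and then levels off, while increasing `m_total` at fixed `se` lowers that fraction. These are the two qualitative behaviours the model is meant to capture.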

A parameter estimation algorithm

We developed a grid-search algorithm to find the parameter values that best fit the experimental data. Among the six undetermined parameters, ρp is an arbitrary parameter that only affects the scale of selenoprotein expression but not the behavior of the translation process in simulation. Thus, we first excluded ρp in the fitting algorithm and manually adjusted ρp after fitting. We generated grids with different combinations of parameters and calculated the fitness of the predicted (RFP, GFP) intensities generated by these parameters with the experimental results. The grids were first generated by logarithmically dividing each parameter into 12 intervals within their boundaries (the range of each parameter value is shown in Table 4). These parameter sets were applied to the mathematical model to convert RFP values into the full-length (PL) and truncated (PS) protein concentrations in the loss function Q2. The total loss function TQ2 is summed over all data points indexed by i. The observed protein abundance is calculated by transforming GFP intensities using ρp.

The loss function has a complicated nonlinear form and thus contains many local optima. Analytic algorithms such as gradient descent will likely find suboptimal solutions whose loss is far from the global minimum. We devised a variation of the divide-and-conquer heuristic approach to alleviate this problem. We started by partitioning the log-scale range of each parameter value by coarse-grained intervals. A small number of multi-dimensional grids were generated from the partitioned parameter space. We then recursively performed the following computations: (1) evaluation of loss function values of parameter configurations on the grids, (2) selection of the top 30 parameter configurations, and (3) subdivision of the selected grids into smaller intervals. Recursion stopped when the grid sizes reached the required resolution of parameter values. The criteria for selecting the parameter configuration from the top-ranking solutions are reported in S1 File.
The Matlab codes of the parameter estimation algorithm are reported in S2 File. The GPS data of SEPHS2, GPX1, SEPX1, SELK and SEPW1 are reported in S3–S7 Files respectively. The top ranking solutions of the four SECIS element constructs and SEPW1 are reported in S8 File.
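The recursive refinement described above (evaluate on a log-spaced grid, keep the best configurations, subdivide around them) can be sketched in Python as follows. This is a generic illustration, not the authors' Matlab code in S2 File. The function names, the number of intervals per round, the fixed number of rounds as a stopping rule, and the way each kept point is turned into a smaller box are all assumptions. Lower bounds must be strictly positive because the grid is logarithmic.

```python
import itertools


def log_grid(lo, hi, n):
    """n + 1 log-spaced points from lo to hi inclusive (lo must be > 0)."""
    return [lo * (hi / lo) ** (i / n) for i in range(n + 1)]


def refine_search(loss, bounds, n_intervals=4, keep=30, rounds=6):
    """Recursive log-scale grid search.

    loss: callable taking a tuple of parameter values.
    bounds: tuple of (lower, upper) pairs, one per parameter.
    Returns (best_point, best_loss).
    """
    boxes = [tuple(bounds)]
    best = None
    for _ in range(rounds):
        scored = []
        for box in boxes:
            axes = [log_grid(lo, hi, n_intervals) for lo, hi in box]
            for point in itertools.product(*axes):
                scored.append((loss(point), point, box))
        scored.sort(key=lambda item: item[0])
        top = scored[:keep]
        if best is None or top[0][0] < best[0]:
            best = top[0]
        # Subdivide: shrink each kept box to one grid step on either side of its point.
        boxes = []
        for _, point, box in top:
            new_box = []
            for value, (lo, hi) in zip(point, box):
                step = (hi / lo) ** (1.0 / n_intervals)
                new_box.append((max(lo, value / step), min(hi, value * step)))
            boxes.append(tuple(new_box))
    return best[1], best[0]
```

Keeping several candidates per round, rather than only the single best, is what gives the search a chance of escaping a local optimum that happens to look best on the coarse grid. The cost per round grows as keep × (n_intervals + 1)^d for d parameters, so the interval count has to stay small when d is five or six.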

Parameter estimation of simulated data

We randomly generated 100 parameter sets within each parameter boundary by a function of UB, LB and X, where UB and LB are the upper and lower bounds, respectively, of each parameter and X is a random number uniformly distributed on the open interval (0, 1). For each parameter set, about 1000 corresponding RFP and GFP values were generated by the mathematical model. The parameter estimation algorithm was applied to the simulated data, and the estimated parameter values were compared with the parameter values from which the simulated data were generated. We also introduced additive noise to the GFP values calculated from the model, using a noise term NorR randomly drawn from a normal distribution with a mean equal to 0. The standard deviation of NorR varied from 0.3 to 5.0 (Table 2). The performance of our algorithm was evaluated by the log10 ratios between the parameter values predicted by the algorithm and the true parameter values. A parameter value prediction was labeled successful if the error of at least one of the predicted parameter sets among the top 15 answers reported by the algorithm was smaller than 1. The recovering rate indicates the ratio of successful predictions among 100 test sets.
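The recovery test can be sketched in Python as below. The exact sampling and noise formulas are not reproduced in the text, so three choices here are assumptions made only for illustration: parameters are drawn by linear interpolation LB + (UB − LB) · X, noise is added directly to the GFP value, and the error of a parameter set is the largest absolute log10 ratio across its parameters. Function names are hypothetical.

```python
import math
import random


def sample_parameters(bounds, rng):
    """Assumed rule: LB + (UB - LB) * X with X uniform on [0, 1)."""
    return {name: lb + (ub - lb) * rng.random() for name, (lb, ub) in bounds.items()}


def add_noise(gfp_values, sd, rng):
    """Assumed rule: GFP + NorR with NorR ~ N(0, sd)."""
    return [g + rng.gauss(0.0, sd) for g in gfp_values]


def log10_error(predicted, true):
    """Largest |log10(predicted / true)| across parameters (assumed aggregation)."""
    return max(abs(math.log10(predicted[k] / true[k])) for k in true)


def is_recovered(candidates, true, top_n=15, tol=1.0):
    """Success if any of the top_n candidate sets has error below tol."""
    return any(log10_error(c, true) < tol for c in candidates[:top_n])


def recovery_rate(results):
    """results: list of (ranked_candidates, true_parameters) pairs."""
    return sum(is_recovered(c, t) for c, t in results) / len(results)
```

An error threshold of 1 on a log10 scale means a prediction counts as successful when every parameter is within a factor of ten of its true value, under the aggregation assumed here.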

Parameter estimation of the experimental data

We applied the parameter estimation algorithm to about 15,000 RFP-GFP pairs measured at five selenium concentrations. For ρp, we manually chose the value that yielded mRNA and protein levels within normal SEPHS2 expression ranges. We referred to MOPED [31] and BioGPS [30] to get the mRNA and protein expression levels of SEPHS2 relative to ACTN1 and ACTN2, and then converted the relative SEPHS2 expression level into absolute concentration using the dataset of absolute concentrations of ACTN1 and ACTN2 [29]. We estimated that the mRNA expression level of SEPHS2 falls within the order of 10² molecules per cell and the protein expression level within 10³ molecules per cell.

The GPS assay system.

GPS is a dual fluorescent reporter system capable of simultaneous measurement of protein synthesis, abundance and stability in single cells [25]. In the GPS system, the reporter cassette enables translation of red fluorescent protein (RFP) and green fluorescent protein (GFP) from a single transcript via cap-dependent translation, as well as translation from the internal ribosome entry site (IRES). While RFP serves as a non-degradable internal control that reports protein synthesis, GFP is fused to the N-terminus of the protein of interest (e.g., SEPHS2) and reports protein abundance. The GFP/RFP ratio represents protein stability, measuring the relative steady-state abundance between RFP and GFP-fusion proteins. Single-cell fluorescent signals were recorded using fluorescence-activated cell sorting (FACS). (TIF)

The relationship between experimental and simulated SEPHS2 expression under various selenium concentrations.

Each dot denotes the GFP (proportional to total protein abundance PT) and RFP (proportional to total mRNA quantity) values of a single cell. Each solid circle denotes the simulated GFP value under each RFP value according to the inferred model. Yellow and orange dots denote the GPS data of mutants expressing only PL and PS, respectively. Their (PL and PS) mean GFP values under each RFP value are represented by solid circles of the corresponding colors. (TIF)

The effect of release factor knockdown on UGA definition.

Distributions of GFP/RFP ratios of PT with or without shRNA-mediated knockdown of RF1. (TIF)

Protein half-life and UGA definition analysis of SEPW1.

(a) Protein stability measurement of PL or PS by the GPS assay. PL and PS were expressed from SEPW1 mutant transcripts that exclusively express one form of SEPW1. (b) The relationship between protein synthesis and abundance for PL and PS in SEPW1 analogous to Fig 2B. (c-d) GPS analysis of PL and PS in SEPW1 under various selenium concentrations (c) or synthesis levels (d). Relative mRNA levels represent quantifications of the RFP signals in the GPS assay. (TIF)

Comparison of experimental and predicted Sec incorporation efficiencies in SEPW1.

(a) The relationship between protein synthesis and abundance for PT analyzed under five selenium concentrations from experimental data. The style follows S2 Fig. (b) The relationship between PT abundance and mRNA levels under five selenium concentrations from model prediction. (TIF)

Comparison of the selenoprotein hierarchy under various selenium concentrations.

The relationship between protein synthesis and abundance for PT analyzed under four selenium concentrations for four SECIS elements. The panels on the left column (a-d) indicate the results from experimental data. The panels on the right column (e-h) indicate the predictions from the inferred models. The selenium concentrations applied are indicated on the left. (TIF)

Quantitative evaluation of experimental and predicted protein abundances based on observed protein synthesis levels and selenium concentrations.

(DOCX)

Quantitative evaluation of experiment and predicted PL/PS ratio corresponding to the relative mRNA levels from the Western blotting assay.

(DOCX)

Estimated parameter values of SEPW1.

(DOCX)

Estimated parameter values of constructs of four SECIS elements.

(DOCX)

Detailed description of the data processing protocol, parameter estimation algorithm, and an augmented model for incorporating mRNA degradation.

(PDF)

The Matlab codes of the parameter estimation algorithm.