Jicong Cao, Eva Maria Novoa, Zhizhuo Zhang, William C W Chen, Dianbo Liu, Gigi C G Choi, Alan S L Wong, Claudia Wehrspaun, Manolis Kellis, Timothy K Lu.
Abstract
Despite significant clinical progress in cell and gene therapies, maximizing protein expression in order to enhance potency remains a major technical challenge. Here, we develop a high-throughput strategy to design, screen, and optimize 5' UTRs that enhance protein expression from a strong human cytomegalovirus (CMV) promoter. We first identify naturally occurring 5' UTRs with high translation efficiencies and use this information with in silico genetic algorithms to generate synthetic 5' UTRs. A total of ~12,000 5' UTRs are then screened using a recombinase-mediated integration strategy that greatly enhances the sensitivity of high-throughput screens by eliminating copy number and position effects that limit lentiviral approaches. Using this approach, we identify three synthetic 5' UTRs that outperform commonly used non-viral gene therapy plasmids in expressing protein payloads. In summary, we demonstrate that high-throughput screening of 5' UTR libraries with recombinase-mediated integration can identify genetic elements that enhance protein expression, which should have numerous applications for engineered cell and gene therapies.
Year: 2021 PMID: 34230498 PMCID: PMC8260622 DOI: 10.1038/s41467-021-24436-7
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Schematic overview of recombinase-mediated 5′ UTR library screening strategy.
Naturally occurring 5′ UTRs were extracted, analyzed, and used as the training set to generate synthetic 5′ UTRs for screening. Oligos encoding the 5′ UTR library were synthesized and cloned into plasmids containing a recombinase-recognition site and a GFP reporter. The resulting plasmids were transfected into the HEK 293T-LP cell line with the corresponding recombinase recognition site, resulting in targeted genomic insertion. The cells were sorted into bins based on GFP intensities, and the 5′ UTR sequences of each bin were amplified, sequenced, counted, and compared. The 5′ UTR candidates that enhanced GFP expression were selected and validated experimentally. Finally, the top-ranked validated 5′ UTRs were combined to test for increased gene expression.
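The sorting step can be pictured with a minimal Python sketch that assigns cells to GFP bins by intensity rank. It borrows the bin boundaries from the Fig. 3c caption (top 0–2.5%, 2.5–5%, and 5–10%); the function name `assign_bins` and the rank-based gating are illustrative assumptions, since a real cell sorter applies fluorescence gates set on the instrument.

```python
# Bin boundaries as fractions of the brightest cells (Fig. 3c caption):
# top 0-2.5%, top 2.5-5%, and top 5-10% of GFP intensity.
BINS = {"bin1": (0.0, 0.025), "bin2": (0.025, 0.05), "bin3": (0.05, 0.10)}


def assign_bins(gfp, bins=BINS):
    """Label each cell with its GFP bin, or None if it falls outside all bins.

    Cells are ranked from brightest to dimmest; a cell whose rank fraction
    lies in [lo, hi) is assigned to that bin.
    """
    n = len(gfp)
    order = sorted(range(n), key=lambda i: gfp[i], reverse=True)
    labels = [None] * n
    for rank, idx in enumerate(order):
        frac = rank / n
        for name, (lo, hi) in bins.items():
            if lo <= frac < hi:
                labels[idx] = name
                break
    return labels


labels = assign_bins(list(range(1000)))  # 1,000 cells with distinct intensities
print({b: labels.count(b) for b in ["bin1", "bin2", "bin3", None]})
```

Cells outside the three gates (here the dimmest 90%) are left unlabeled, mirroring the fact that only the sorted bins were sequenced.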
Fig. 2 Design of the 5′ UTR library of naturally occurring and synthetic 5′ UTRs.
RNA-seq and Ribo-seq datasets of HEK 293T, PC3, and human muscle cells, together with the GTEx database of human muscle tissue, were collected. Natural 5′ UTRs with high and low translation efficiencies (TEs) in HEK 293T and RD cells, 5′ UTRs with various TEs in human muscle cells, and 5′ UTRs with high mRNA counts in human muscle tissue were selected and added to the library. In addition, we designed synthetic 5′ UTRs by: (i) collecting endogenous 5′ UTR sequences of the target cell type (HEK 293T, PC3, or human muscle cells) from public data; (ii) extracting sequence features of the 5′ UTRs, including the nucleotides surrounding the AUG; (iii) training a Random Forest model for each cell type or tissue (HEK 293T, PC3, or human muscle cells) to learn a function that maps sequence features to mRNA expression levels and TEs; and (iv) using genetic algorithms to design a set of 100 bp synthetic sequences predicted to maximize TEs and protein expression levels.
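Step (iv) can be illustrated with a generic genetic algorithm over 100-nt sequences. This is a sketch, not the authors' pipeline: the trained Random Forest is replaced by a hypothetical `toy_score` function (GC content near 50% plus a penalty for upstream ATG codons), and the population size, mutation rate, elitism, and tournament selection are assumed settings.

```python
import random

ALPHABET = "ACGT"


def toy_score(seq):
    """Hypothetical stand-in for a trained predictor such as a Random Forest.

    Rewards GC content near 50% and penalizes upstream ATG codons.
    """
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return -abs(gc - 0.5) - 0.1 * seq.count("ATG")


def mutate(seq, rate, rng):
    """Resample each base uniformly at random with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else b for b in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(score, length=100, pop_size=50, generations=40,
           mut_rate=0.01, elite=5, seed=0):
    """Evolve sequences of `length` nt toward a high `score`; return the best one."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        nxt = ranked[:elite]  # elitism: best sequences survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of 3 random parents, twice.
            p1 = max(rng.sample(parents, 3), key=score)
            p2 = max(rng.sample(parents, 3), key=score)
            nxt.append(mutate(crossover(p1, p2, rng), mut_rate, rng))
        pop = nxt
    return max(pop, key=score)


best = evolve(toy_score)
print(len(best), round(toy_score(best), 3))
```

Searching against a learned objective would only require passing a trained model's prediction function as `score`. Because the top `elite` sequences are carried over unchanged, the best score never decreases between generations.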
Fig. 3 Strategy for constructing HEK 293T cell lines with a landing pad and screening the 5′ UTR library using recombinase-based gene integration.
a Recombinase-based library screening workflow. b Construction of the 5′ UTR library and schematic illustration of recombinase-based gene integration. c We observed high reproducibility for barcode representations between two HEK-LP cell lines independently transfected with the library and a recombinase-expression plasmid; cells were sorted into three bins based on GFP expression (top 0–2.5%, top 2.5–5%, and top 5–10%). log2 values of normalized barcode counts are shown. R is the Pearson correlation coefficient.
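The reproducibility check in panel c can be sketched as: normalize each replicate's barcode counts to counts per million, take log2, and compute Pearson's R over barcodes present in both replicates. The counts-per-million scaling, the pseudocount of 1, and the function names are assumptions; the caption states only that log2 values of normalized barcode counts were compared.

```python
import math


def log2_cpm(counts, pseudocount=1.0):
    """Scale raw barcode counts to counts per million, add a pseudocount, take log2."""
    total = sum(counts.values())
    return {bc: math.log2(c / total * 1e6 + pseudocount) for bc, c in counts.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_r(rep1, rep2):
    """R between two replicates' log2 normalized counts over shared barcodes."""
    a, b = log2_cpm(rep1), log2_cpm(rep2)
    shared = sorted(set(a) & set(b))
    return pearson_r([a[k] for k in shared], [b[k] for k in shared])
```

Normalizing before correlating removes differences in sequencing depth, so a replicate sequenced twice as deeply still gives R close to 1.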
Fig. 4 Selection and validation of 5′ UTR candidates.
a 5′ UTRs that modulate protein expression were ranked by their mean log2 ratios of normalized barcode counts across the three GFP-based bins, relative to the control of unsorted cells. 5′ UTRs with a log2 ratio greater than 0.52 (red dotted line) in all three bins were selected for further validation. b The GFP gene was inserted into the pVAX1 plasmid to make the pVAX1-GFP plasmid, which was used as a control in the GFP expression study. 5′ UTR candidates were inserted directly upstream of the Kozak sequence of the GFP coding sequence to make the pVAX1-UTR-GFP plasmids. c Three 5′ UTR candidates that significantly enhanced protein expression were chosen for further testing. The p-values for NeoUTR1, NeoUTR2, and NeoUTR3 vs pVAX1 are all <0.0001. d The effects of the three 5′ UTRs on GFP expression in RD cells. The p-values for NeoUTR1, NeoUTR2, and NeoUTR3 vs pVAX1 are 0.0001, <0.0001, and 0.0010. e The effects of the three 5′ UTRs on VEGF expression in RD cells. The p-values for NeoUTR1, NeoUTR2, and NeoUTR3 vs pVAX1 are 0.8838, 0.0146, and 0.0675. f The effects of the three 5′ UTRs on CCL21 expression in RD cells. The p-values for NeoUTR1, NeoUTR2, and NeoUTR3 vs pVAX1 are 0.0183, 0.0412, and 0.0002. Relative protein expression in each sample was normalized to that of the pVAX1 plasmid (100% relative expression is marked by a gray dotted line). Source data are provided as a Source data file. Statistical differences between groups were analyzed by ordinary one-way ANOVA with a 95% confidence interval. Data are presented as mean values ± SD for three biological replicates (c, d) or four biological replicates (e, f). (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001 vs pVAX1).
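The selection rule in panel a (a log2 ratio against unsorted cells greater than 0.52, roughly a 1.43-fold enrichment, in all three bins, with candidates ranked by their mean ratio) can be written as a short filter. This sketch assumes inputs are already-normalized, nonzero counts (for example, pseudocounted counts per million) keyed by 5′ UTR identifier; the function names are hypothetical.

```python
import math

THRESHOLD = 0.52  # log2 ratio cutoff (red dotted line in Fig. 4a), about 1.43-fold


def log2_ratios(bin_counts, unsorted_counts):
    """log2(normalized count in a sorted bin / normalized count in unsorted cells)."""
    return {u: math.log2(bin_counts[u] / unsorted_counts[u])
            for u in bin_counts if u in unsorted_counts}


def select_candidates(bins, unsorted_counts, threshold=THRESHOLD):
    """Return (UTR, mean log2 ratio) pairs for UTRs above `threshold` in every bin.

    `bins` is a list of dicts of normalized counts, one per sorted GFP bin.
    Results are ranked by mean log2 ratio, highest first.
    """
    ratios = [log2_ratios(b, unsorted_counts) for b in bins]
    shared = set.intersection(*(set(r) for r in ratios))
    passing = {u: sum(r[u] for r in ratios) / len(ratios)
               for u in shared if all(r[u] > threshold for r in ratios)}
    return sorted(passing.items(), key=lambda kv: kv[1], reverse=True)
```

Requiring the cutoff in every bin, rather than only on average, discards UTRs whose enrichment is driven by a single bin.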
Fig. 5 Effects of combinatorial 5′ UTRs on GFP expression in various cell lines.
a We constructed six distinct 5′ UTR combinations from all ordered pairwise permutations of the three validated 5′ UTR candidates, joined by a CAACAA linker, and inserted each combination into the pVAX1-GFP plasmid directly upstream of the Kozak sequence. b Effects of the 5′ UTR combinations on GFP expression in HEK 293T cells. Statistical differences between groups were analyzed by ordinary one-way ANOVA with a 95% confidence interval. The relative protein expression was normalized to that from the pVAX1-GFP plasmid, set as 100% and marked by a gray dotted line. Data are presented as mean values ± SD for three biological replicates. The p-values for CoNeoUTR2-1, CoNeoUTR3-1, CoNeoUTR1-2, CoNeoUTR3-2, CoNeoUTR1-3, and CoNeoUTR2-3 vs pVAX1 are 0.0025, <0.0001, <0.0001, <0.0001, <0.0001, and <0.0001. c Effects of the single and combinatorial 5′ UTRs on GFP expression in various cell lines. Source data are provided as a Source data file. Statistical differences between groups were analyzed by two-way ANOVA with a 95% confidence interval, and Dunnett's test was used to correct for multiple comparisons in the post-hoc analysis. The relative protein expression was normalized to that from the pVAX1-GFP plasmid, set as 100% and marked by a gray dotted line. Data are presented as mean values ± SD for three biological replicates. In HEK 293T cells, the p-values for NeoUTR1, NeoUTR2, NeoUTR3, CoNeoUTR2-1, CoNeoUTR3-1, CoNeoUTR1-2, CoNeoUTR3-2, CoNeoUTR1-3, and CoNeoUTR2-3 vs pVAX1 are <0.0001, <0.0001, <0.0001, 0.0002, <0.0001, <0.0001, <0.0001, <0.0001, and <0.0001. In RD cells, the p-values for NeoUTR1, NeoUTR2, NeoUTR3, CoNeoUTR2-1, CoNeoUTR3-1, CoNeoUTR1-2, CoNeoUTR3-2, CoNeoUTR1-3, and CoNeoUTR2-3 vs pVAX1 are <0.0001, <0.0001, 0.0055, 0.5989, 0.0836, 0.1336, <0.0001, 0.0549, and <0.0001. In MCF cells, the p-values for NeoUTR1, NeoUTR2, NeoUTR3, CoNeoUTR2-1, CoNeoUTR3-1, CoNeoUTR1-2, CoNeoUTR3-2, CoNeoUTR1-3, and CoNeoUTR2-3 vs pVAX1 are 0.0042, <0.0001, <0.0001, 0.3513, 0.9132, <0.0001, <0.0001, 0.0002, and <0.0001. In C2C12 cells, the p-values for NeoUTR1, NeoUTR2, NeoUTR3, CoNeoUTR2-1, CoNeoUTR3-1, CoNeoUTR1-2, CoNeoUTR3-2, CoNeoUTR1-3, and CoNeoUTR2-3 vs pVAX1 are <0.0001, 0.0042, 0.0306, 0.9997, 0.0089, <0.0001, <0.0001, <0.0001, and <0.0001. (*p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001 vs pVAX1).
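The six constructs in panel a are all ordered pairs of the three candidates, which `itertools.permutations` enumerates directly. The sequences below are hypothetical placeholders (the NeoUTR sequences are not given here), and the naming rule, with `CoNeoUTRa-b` meaning UTR a upstream of UTR b, is an assumption about the convention.

```python
from itertools import permutations

LINKER = "CAACAA"  # linker placed between the two 5' UTRs (Fig. 5a caption)


def combine_utrs(utrs, linker=LINKER):
    """Join every ordered pair of 5' UTRs with the linker.

    `utrs` maps a candidate number (1, 2, 3) to its sequence. Assumed naming:
    "CoNeoUTRa-b" places UTR a upstream (5') of UTR b.
    """
    combos = {}
    for a, b in permutations(sorted(utrs), 2):
        combos[f"CoNeoUTR{a}-{b}"] = utrs[a] + linker + utrs[b]
    return combos


# Hypothetical placeholder sequences, not the real NeoUTR1-3.
demo = combine_utrs({1: "AAAA", 2: "CCCC", 3: "GGGG"})
for name, seq in demo.items():
    print(name, seq)
```

With three candidates, ordered pairs give 3 × 2 = 6 constructs, matching the six CoNeoUTR variants in the caption; unordered pairs would give only three.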