Gleb Kuznetsov, Daniel B Goodman, Gabriel T Filsinger, Matthieu Landon, Nadin Rohland, John Aach, Marc J Lajoie, George M Church.
Abstract
We present a method for identifying genomic modifications that optimize a complex phenotype through multiplex genome engineering and predictive modeling. We apply our method to identify six single nucleotide mutations that recover 59% of the fitness defect exhibited by the 63-codon E. coli strain C321.∆A. By introducing targeted combinations of changes in multiplex, we generate rich genotypic and phenotypic diversity and characterize clones using whole-genome sequencing and doubling time measurements. Regularized multivariate linear regression accurately quantifies individual allelic effects and overcomes bias from hitchhiking mutations and context-dependence of genome editing efficiency that would confound other strategies.
Keywords: Genome engineering; Predictive modeling; Synthetic organisms
Year: 2017 PMID: 28545477 PMCID: PMC5445303 DOI: 10.1186/s13059-017-1217-z
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Workflow for improving phenotypes through model-guided multiplex genome editing. First, an initial set of target alleles (hundreds to thousands) is chosen for testing based on starting hypotheses. These targets may be designed based on differences from a reference strain, synthesis or design errors, or biophysical modeling. Multiplex genome editing creates a set of modified clones enriched with combinations of the targeted changes. Clones are screened for genotype and phenotype, and predictive modeling is used to quantify allele effects. The workflow is repeated to validate and test new alleles. Beneficial alleles are combined to create an optimized genotype
Fig. 2 Mutation dynamics over many cycles of MAGE allele reversion. a Increase in combinatorial diversity and reversion count vs. number of MAGE cycles. b Number of reversions per clone vs. MAGE cycle. c The rate of reversions per MAGE cycle among the different allele categories, showing a higher rate per cycle for cells exposed to all 127 oligos. d The number of de novo mutations per clone over successive MAGE cycles. e Rate of de novo mutations per MAGE cycle. f The average ratio between number of de novo mutations and reverted alleles per MAGE cycle remains constant throughout the experiment. g Doubling time (min) improvement per clone from the C321.∆A starting strain (top dotted line) towards the ECNR2 parent strain (bottom dotted line). Blue line is a LOESS fit
Fig. 3 Genotypic and phenotypic diversity in 87 clones sampled across 50 MAGE cycles enabled model-guided prioritization of top single nucleotide variants (SNVs) for further validation. a Percent of C321.∆A fitness defect recovered across MAGE cycles (shown with bar color and height). The numbers of SNVs reverted or introduced are shown below. b Presence of targeted reversions and de novo mutations in each clone, colored according to fitness. A subset of the most enriched mutations is shown, ordered by enrichment (full dataset available in Additional file 10). c Example model fit using the top eight alleles as features, with 15 samples left out as a test set (blue points) and used to evaluate R². Training points are plotted in orange. The inset shows the distribution of R² values for 100 different simulations, each with 15 random samples left out to calculate R². The example fit was chosen to exemplify a median R² value from this distribution. d Average model fit coefficients for the top eight alleles assigned non-zero values over repeated cross-validated linear regression (see “Methods”) indicate their predicted contribution to fitness improvement
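The repeated hold-out evaluation described in panels c and d can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the effect sizes are invented, and an L1 (lasso) penalty fitted by coordinate descent stands in for whatever regularizer the "Methods" actually specify. The names `fit_lasso`, `alpha`, and `true_effects` are hypothetical.

```python
import random
import statistics


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, alpha=0.005, n_iter=30):
    """Coordinate descent for (1/2n)*||y - b0 - Xb||^2 + alpha*||b||_1."""
    n = len(X)
    y_mean = sum(y) / n
    col_means = [sum(col) / n for col in zip(*X)]
    # Center each allele column so the intercept separates out cleanly.
    columns = [[x - m for x in col] for col, m in zip(zip(*X), col_means)]
    norms = [sum(c * c for c in col) / n for col in columns]
    coefs = [0.0] * len(columns)
    residual = [yi - y_mean for yi in y]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue  # allele is constant in this training set
            rho = sum(c * (r + c * coefs[j]) for c, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, alpha) / norms[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                coefs[j] = new_coef
    intercept = y_mean - sum(b * m for b, m in zip(coefs, col_means))
    return intercept, coefs


def r_squared(y_true, y_pred):
    """Coefficient of determination on a held-out set."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): 87 clones, 8 binary allele features,
# three alleles with invented beneficial effects plus Gaussian noise.
rng = random.Random(0)
n_clones, n_alleles, n_test, n_repeats = 87, 8, 15, 100
true_effects = [0.20, 0.15, 0.10, 0.0, 0.0, 0.0, 0.0, 0.0]
X = [[rng.randint(0, 1) for _ in range(n_alleles)] for _ in range(n_clones)]
y = [sum(e * x for e, x in zip(true_effects, row)) + rng.gauss(0, 0.03)
     for row in X]

r2_values = []
coef_sums = [0.0] * n_alleles
for _ in range(n_repeats):
    order = list(range(n_clones))
    rng.shuffle(order)
    test_idx, train_idx = order[:n_test], order[n_test:]
    b0, b = fit_lasso([X[i] for i in train_idx], [y[i] for i in train_idx])
    preds = [b0 + sum(bj * xj for bj, xj in zip(b, X[i])) for i in test_idx]
    r2_values.append(r_squared([y[i] for i in test_idx], preds))
    coef_sums = [s + bj for s, bj in zip(coef_sums, b)]

mean_coefs = [s / n_repeats for s in coef_sums]
print("median held-out R^2:", round(statistics.median(r2_values), 3))
print("mean coefficients:", [round(c, 3) for c in mean_coefs])
```

Averaging coefficients across the random splits, rather than reading them from a single fit, is what makes the ranking robust to which clones happen to land in the test set.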
Fig. 4 Construction and characterization of final strain C321.∆A.opt. a Doubling time of clones isolated during construction and optimization of C321.∆A. Strain C321.∆A.opt was constructed in seven cycles of MAGE in batches of up to three cycles separated by MASC-PCR screening to pick clones with the maximum number of alleles converted (see “Methods”). The two dotted horizontal lines correspond to the relative doubling times for the original GRO and the wild-type strain. b Testing nsAA-dependent protein expression using the nsAA p-acetyl-L-phenylalanine (pAcF) in sfGFP variants with 0, 1, or 3 residues replaced with UAG codons. Normalized GFP fluorescence was calculated by taking the ratio of absolute fluorescence to OD600 of cells suspended in phosphate buffered saline (PBS) for each sample and normalizing to the fluorescence ratio of non-recoded strain EcNR1.mutS.KO expressing 0 UAG sfGFP plasmid
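The normalization in panel b amounts to a ratio of ratios. A minimal sketch follows, assuming one fluorescence and one OD600 reading per sample; the function name and the numeric readings are hypothetical and not taken from the paper.

```python
def normalized_gfp(fluorescence, od600, ref_fluorescence, ref_od600):
    """Fluorescence per OD600, relative to the reference strain's ratio.

    The reference is the non-recoded strain carrying the 0-UAG sfGFP
    plasmid, as described in the caption.
    """
    if od600 <= 0 or ref_od600 <= 0:
        raise ValueError("OD600 readings must be positive")
    sample_ratio = fluorescence / od600
    reference_ratio = ref_fluorescence / ref_od600
    return sample_ratio / reference_ratio


# Hypothetical readings in arbitrary units, for illustration only.
value = normalized_gfp(fluorescence=12000.0, od600=0.8,
                       ref_fluorescence=30000.0, ref_od600=1.0)
print(value)
```

A value of 1.0 would mean the sample expresses as much GFP per unit of cell density as the reference strain.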
Fig. 5 Interactions among top six alleles show evidence of epistasis. Genotypes and fitness measurements were obtained from 359 intermediate clones generated during the construction of the final strain containing the six best alleles (Additional file 7). Each clone was genotyped using MASC-PCR and doubling time was measured during allele validation experiments and final strain construction. a Individual model coefficients for the top six alleles, as well as three significant interaction terms identified during combinatorial construction. These values are from a linear model with interaction terms between each pair of alleles. The error bars signify the standard error of the mean of the model coefficients and the significance codes for a non-zero effect size are: *** p < 0.001, ** 0.001 ≤ p < 0.01, * 0.01 ≤ p < 0.05, n.s. not significant. All three interaction coefficients remain significant after correction for a family-wise error rate (FWER) of 0.05, i.e. at a per-test threshold of 0.05/C(6,2) ≈ 0.003. b Each data point represents the amount of fitness recovered when adding the allele specified to an identical starting genotype background. Horizontal error bars correspond to the standard deviation of fitness defect among all clones with this starting genotype. Vertical error bars represent the standard deviation of all differences between clones with and without the respective allele. For each plot, the thick colored line represents a simple linear fit through the points, corresponding to the r and p values given in each plot. The dotted line corresponds to the predicted fit for a simple multiplicative model of fitness where the allele always recovers a constant percent of the remaining fitness defect regardless of the background. For all alleles except A4102449G (pink), adding the allele to C321 showed a recovery of the fitness defect (>0 on the y axis), with the percentage of defect recovered decreasing as other alleles are also reverted, consistent with a first-order multiplicative model.
In some cases, the fitness improvement drops more rapidly than predicted by the multiplicative model (i.e. points below the dotted lines), suggesting diminishing returns epistasis. This is supported by the negative-coefficient interaction terms in panel (a). In the case of A4102449G there appears to be a negative effect with the mutation alone, but an increase in the presence of other alleles, suggesting possible sign epistasis
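Two calculations in this caption are easy to check by hand: the per-test significance threshold for the pairwise interaction terms, and the baseline prediction of the multiplicative model. The sketch below assumes the 0.05/C(6,2) expression is a Bonferroni-style division over all allele pairs, and it uses invented recovery fractions. The function names are hypothetical.

```python
import math


def pairwise_threshold(fwer, n_alleles):
    """Per-test threshold when testing every pair of alleles."""
    n_tests = math.comb(n_alleles, 2)
    return fwer / n_tests, n_tests


def multiplicative_recovery(fractions):
    """Stack alleles that each recover a fixed fraction of the REMAINING defect.

    Returns the gain contributed by each allele (as a fraction of the
    original defect) and the total fraction recovered.
    """
    remaining = 1.0
    gains = []
    for fraction in fractions:
        gain = fraction * remaining
        gains.append(gain)
        remaining -= gain
    return gains, 1.0 - remaining


threshold, n_tests = pairwise_threshold(0.05, 6)
print(n_tests, round(threshold, 4))  # 15 0.0033

# Invented example: three alleles that each recover 20% of what is left.
gains, total = multiplicative_recovery([0.2, 0.2, 0.2])
print([round(g, 3) for g in gains], round(total, 3))  # [0.2, 0.16, 0.128] 0.488
```

Under this baseline, each allele's absolute gain shrinks as the background improves even without any interaction. Points falling below that prediction are the ones the caption reads as diminishing returns epistasis.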