Patrick V Phaneuf, Daniel C Zielinski, James T Yurkovich, Josefin Johnsen, Richard Szubin, Lei Yang, Se Hyeuk Kim, Sebastian Schulz, Muyao Wu, Christopher Dalldorf, Emre Ozdemir, Rebecca M Lennen, Bernhard O Palsson, Adam M Feist.
Abstract
Microbes are being engineered for an increasingly large and diverse set of applications. However, designing microbial genomes remains challenging due to the general complexity of biological systems. Adaptive Laboratory Evolution (ALE) leverages nature's problem-solving processes to generate optimized genotypes currently inaccessible to rational methods. The large amount of public ALE data now represents a new opportunity for data-driven strain design. This study describes how novel strain designs, that is, genome sequences not yet observed in ALE experiments or published designs, can be extracted from aggregated ALE data, and demonstrates this by designing, building, and testing three novel Escherichia coli strains with fitnesses comparable to those of ALE mutants. These designs were achieved through a meta-analysis of aggregated ALE mutation data (63 Escherichia coli K-12 MG1655-based ALE experiments, described by 93 unique environmental conditions, 357 independent evolutions, and 13 957 observed mutations), which additionally revealed global ALE mutation trends that inform ALE-derived strain design principles. These trends suggest that ALE-derived strain designs will be largely gene-centric, as opposed to noncoding, and composed of a relatively small number of beneficial variants (approximately 6). These results demonstrate how strain design efforts can be enhanced by the meta-analysis of aggregated ALE data.
Keywords: adaptive laboratory evolution; data-driven strain design; genome design variables; meta-analysis; mutation functional analysis; structural biology
Year: 2021 PMID: 34762392 PMCID: PMC8870144 DOI: 10.1021/acssynbio.1c00337
Source DB: PubMed Journal: ACS Synth Biol ISSN: 2161-5063 Impact factor: 5.110
Figure 1. A general workflow to derive strain designs from aggregated ALE mutations and experimental conditions. The multiple resources used in this workflow are described in the Methods section. Resources and processes are described in the section text.
Figure 2. Dimensions and properties of the mutational data set used in this study. (a) A plot of the different dimensions of the ALE data used within this study as extracted from ALEdb. (b) A visual representation of the different condition types across ALEs in the targeted set. The mapping between individual colors and labels can be found in Supplementary Figures S1–S9.
Figure 3. Mutation types and effects exhibit a bias toward specific genomic feature types. (a) Table of mutation type and mutated feature frequencies. Synonymous SNPs are abbreviated as “syn”, nonsynonymous SNPs are abbreviated as “non-syn”, and truncating mutations are abbreviated as “trunc”. (b) The distribution of mutation sizes and of the number of genomic features affected, by mutation type. Abbreviations: SNP, single nucleotide polymorphism; DEL, deletion; MOB, mobile insertion elements; INS, insertion; SUB, substitution; CNV, copy number variant. (c) The proportion of mutations to individual features that are truncations, across feature types. (d) The number of sequence-truncating and nontruncating mutations for individual genomic features. Abbreviations: TFBS, transcription factor binding site; RBS, ribosomal binding site.
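The per-feature truncation proportion in panel (c) is a simple ratio: for each feature, the number of its mutations that are truncating divided by all of its mutations, grouped by feature type. The sketch below shows that calculation on hypothetical records; the record layout `(feature_id, feature_type, is_truncating)` and the feature names are illustrative assumptions, not the study's data or code.

```python
from collections import defaultdict

# Hypothetical mutation records: (feature_id, feature_type, is_truncating).
# Names and values are illustrative only, not taken from the study.
MUTATIONS = [
    ("geneA", "gene", True),
    ("geneA", "gene", False),
    ("geneB", "gene", False),
    ("geneC", "gene", True),
    ("rbs1", "RBS", False),
    ("tfbs1", "TFBS", False),
]


def truncation_proportions(mutations):
    """Return {feature_type: {feature_id: fraction of that feature's mutations that truncate}}."""
    counts = defaultdict(lambda: [0, 0])  # (feature_type, feature_id) -> [truncating, total]
    for feature, ftype, is_trunc in mutations:
        tally = counts[(ftype, feature)]
        tally[0] += int(is_trunc)
        tally[1] += 1
    result = defaultdict(dict)
    for (ftype, feature), (trunc, total) in counts.items():
        result[ftype][feature] = trunc / total
    return dict(result)


if __name__ == "__main__":
    for ftype, per_feature in sorted(truncation_proportions(MUTATIONS).items()):
        print(ftype, per_feature)
```

Grouping by type first keeps the per-feature values available for a distribution plot rather than collapsing each type to a single mean.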
Figure 4. ALE-adapted genotypes are gene-centric and involve few mutated features per condition. (a) A clustermap of the Pearson correlation coefficients for all genomic feature pairs (656 085). (b) The distribution of cluster sizes from clustering all genomic feature pairs by their correlation. The median cluster size was six, as highlighted. (c) The total number of statistically significant associations between unique features and conditions, by feature type. (d) The number of significantly associated conditions per unique genomic feature.
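Panels (a) and (b) rest on correlating features by their mutation patterns across samples and then grouping correlated features. As a minimal sketch, assume each feature is a hypothetical binary vector (1 if mutated in a given ALE sample). The code computes Pearson correlations over all n(n-1)/2 feature pairs and, in place of the hierarchical clustering behind a clustermap, uses a swapped-in simplification: single-linkage grouping at a fixed, arbitrary correlation threshold. The profiles, `threshold`, and function names are all assumptions.

```python
import math
from itertools import combinations
from statistics import median

# Hypothetical binary mutation profiles: 1 if the feature is mutated in an ALE sample.
PROFILES = {
    "f1": [1, 1, 0, 0, 0, 0],
    "f2": [1, 1, 0, 0, 0, 0],
    "f3": [0, 0, 1, 1, 0, 0],
    "f4": [0, 0, 1, 1, 0, 0],
    "f5": [0, 0, 0, 0, 1, 1],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def correlation_clusters(profiles, threshold=0.8):
    """Single-linkage clusters: two features are joined when their r >= threshold."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):  # all n*(n-1)/2 feature pairs
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    clusters = correlation_clusters(PROFILES)
    print(clusters)
    print("median cluster size:", median(len(c) for c in clusters))
```

On these toy profiles the result is `[['f1', 'f2'], ['f3', 'f4'], ['f5']]` with a median cluster size of 2. A faithful reproduction would more likely use hierarchical linkage on the full correlation matrix.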
Figure 5. Aggregated ALE data reveal common low-frequency mutation targets with potential benefit to a broad set of conditions.
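One way to make "common low-frequency targets" concrete is to score each feature on two axes: how many distinct conditions it is mutated in (breadth) and what fraction of independent evolutions carry it (frequency). The sketch below flags features that are broad but infrequent. The record format, the feature and condition names, and the cutoffs `min_conditions` and `max_fraction` are hypothetical choices, not the study's criteria.

```python
from collections import defaultdict

# Hypothetical records: (independent_evolution_id, condition, mutated_feature).
RECORDS = [
    ("ev1", "cond1", "geneA"),
    ("ev2", "cond1", "geneA"),
    ("ev3", "cond1", "geneA"),
    ("ev4", "cond1", "geneA"),
    ("ev1", "cond1", "geneB"),
    ("ev5", "cond2", "geneB"),
    ("ev8", "cond3", "geneB"),
    ("ev6", "cond2", "geneC"),
]


def broad_low_frequency_targets(records, total_evolutions, min_conditions=3, max_fraction=0.35):
    """Features mutated in >= min_conditions conditions but in <= max_fraction of evolutions."""
    conditions = defaultdict(set)
    evolutions = defaultdict(set)
    for evolution, condition, feature in records:
        conditions[feature].add(condition)
        evolutions[feature].add(evolution)
    hits = []
    for feature in sorted(conditions):
        fraction = len(evolutions[feature]) / total_evolutions
        if len(conditions[feature]) >= min_conditions and fraction <= max_fraction:
            hits.append(feature)
    return hits


if __name__ == "__main__":
    # total_evolutions is passed explicitly because lineages with no
    # mutation in a feature still count toward the denominator.
    print(broad_low_frequency_targets(RECORDS, total_evolutions=10))
```

Here only `geneB` qualifies: it appears in three conditions but in just 3 of 10 lineages, whereas `geneA` is frequent but confined to one condition.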
Figure 6. Clustering of truncating or nontruncating mutations reveals variant designs for glycerol as a carbon source. (a) An oncoplot showing the types of mutations to genomic features on operons of interest (operons cyaA, glpFKX, and ptsHI-crr) and the conditions for the ALE samples hosting these mutations. Values within parentheses represent concentrations in g/L unless otherwise stated. (b) A mutation needle plot for mutated amino acids across GlpK’s amino acid chain. (c) GlpK’s 3D structure and the residues affected by mutations. The residue chain and transparent surfaces are colored according to the legend of the corresponding mutation needle plot. Each mutation is represented by a small opaque sphere labeled with its amino acid position on the corresponding mutation needle plot. The color of each mutation’s sphere corresponds to the mutation’s predicted effect, as described by the legend on the corresponding mutation needle plot. The transparent sphere centered on each mutation’s opaque sphere represents the number of mutations with a specific predicted effect at that position. The angle shown illustrates how all the GlpK–GlpK interface surfaces lie on the same side of the 3D structure, along with the clustering of mutations on or near these surfaces. (d) A mutation needle plot for mutated amino acids across CyaA’s amino acid chain. (e) The accumulation of truncated amino acids downstream of truncating mutations from the mutation needle plot. (f) CyaA’s protein structure and the residues affected by nontruncating mutations. (g) The growth rates of the mutants harboring ALE mutations and designed variants for GlpK under the selection pressure of glycerol as a carbon source. (h) The growth rates of the mutants harboring ALE mutations and designed variants for CyaA under the selection pressure of glycerol as a carbon source. (i) The growth rates of the mutants harboring ALE mutations and designed variants for CyaA under the selection pressure of Δpgi.
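The "accumulation of truncated amino acids" in panel (e) can be read as a per-residue count of how many truncating mutations remove that residue. Assuming a truncating mutation at residue p removes residues p through the end of the chain (the inclusive start is an assumption), the count is a running sum over truncation start positions. The protein length and positions below are hypothetical and do not correspond to CyaA.

```python
from itertools import accumulate


def truncation_accumulation(protein_length, truncation_positions):
    """Per residue (1..protein_length), count truncating mutations that remove it.

    Assumes a truncating mutation at residue p removes residues p..protein_length.
    """
    starts = [0] * protein_length  # starts[i] counts truncations beginning at residue i + 1
    for p in truncation_positions:
        if not 1 <= p <= protein_length:
            raise ValueError(f"position {p} outside 1..{protein_length}")
        starts[p - 1] += 1
    # A running sum turns "truncations starting here" into "truncations covering here".
    return list(accumulate(starts))


if __name__ == "__main__":
    # Hypothetical 10-residue chain with truncations starting at residues 3, 7, and 7.
    print(truncation_accumulation(10, [3, 7, 7]))  # [0, 0, 1, 1, 1, 1, 3, 3, 3, 3]
```

The running sum keeps the cost linear in protein length, so scaling to full-length proteins and many mutations is not a concern.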
Figure 7. Clustering of truncating mutations reveals a variant design for toxic concentrations of isobutyric acid. (a) An oncoplot showing mutations linked to the pykF operon across all ALE experiments in this study’s data. Values within parentheses represent concentrations in g/L unless otherwise stated. (b) A mutation needle plot for mutated amino acids across PykF’s amino acid chain. (c) PykF’s 3D structure and mutated residues. No truncating mutations are included. The residue chain and transparent surfaces are colored according to the legend of the corresponding mutation needle plot. Each mutation is represented by a small opaque sphere labeled with its amino acid position on the corresponding mutation needle plot. The color of each mutation’s sphere corresponds to the mutation’s predicted effect, as described by the legend on the corresponding mutation needle plot. The transparent sphere centered on each mutation’s opaque sphere represents the number of mutations with a specific predicted effect at that position. The angle shown illustrates how most of the mutations cluster in 3D space around the region that hosts most of the catalytic domains. (d) The accumulation of truncated amino acids downstream of truncating mutations from the mutation needle plot. (e) The growth rates of WT, a ΔpykF strain, the pykF ALE mutant, and the pykF designed variant with an inhibitory concentration of isobutyric acid (12.5 g/L). The ΔpykF strain was included to test for any difference between the strains that partially truncate pykF and a strain lacking pykF entirely. (f) The growth rates of WT, a ΔpykF strain, the pykF ALE mutant, and the pykF designed variant with glucose as a carbon source. The ΔpykF strain was included to test for any difference between the strains that partially truncate pykF and a strain lacking pykF entirely.
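The record does not say how the growth rates in panels (e) and (f) were calculated. A common approach, assumed here rather than taken from the study, is to fit the slope of ln(OD) against time over exponential-phase points by ordinary least squares. The sampling times and OD values in the usage example are synthetic.

```python
import math


def growth_rate(times_h, od_values):
    """Least-squares slope of ln(OD) versus time, in 1/h.

    Assumes every point lies in exponential phase and all OD values are positive.
    """
    if len(times_h) != len(od_values) or len(times_h) < 2:
        raise ValueError("need at least two paired time/OD points")
    ys = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic exponential growth: OD = 0.05 * exp(0.6 * t).
    times = [0.0, 1.0, 2.0, 3.0]
    ods = [0.05 * math.exp(0.6 * t) for t in times]
    print(f"{growth_rate(times, ods):.3f} 1/h")  # prints 0.600 1/h
```

Choosing which points count as exponential phase matters more than the fitting method itself. Including lag or stationary-phase points will bias the slope downward.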