Marco Fantini1, Simonetta Lisi1, Paolo De Los Rios2,3, Antonino Cattaneo1,4, Annalisa Pastore5,6.
Abstract
Protein structure is tightly intertwined with function according to the laws of evolution. Understanding how structure determines function has been the aim of structural biology for decades. Here, we asked instead whether it is possible to exploit the function for which a protein was evolutionarily selected to gain information on protein structure and on the landscape explored during the early stages of molecular and natural evolution. To answer this question, we developed a new methodology, which we named CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), that achieves the in vitro evolution of a protein through artificial selection based on function. With CAMELS, we were able to observe many features of the TEM-1 beta-lactamase local fold exclusively by generating and sequencing large libraries of mutational variants. We demonstrated that, whenever a functional phenotypic selection of a protein is available, we can sketch the structural and evolutionary landscape of that protein without utilizing purified proteins, collecting physical measurements, or relying on the pool of natural protein variants.
Keywords: AmpR; DCA; PacBio; SMRT sequencing; Sequel; beta-lactamase; direct coupling analysis; error-prone PCR; evolutionary couplings; molecular evolution; mutagenesis; third-generation sequencing; β-lactamase
Year: 2020 PMID: 31670785 PMCID: PMC7086169 DOI: 10.1093/molbev/msz256
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Schematic representation of the pipeline and structure of the target protein. (A) The coding sequence of the target protein is cloned in a plasmid vector for mutagenesis. After several rounds of mutation and selection for the desired function, the new protein variants are collected in a DNA library. NGS of this library provides sequences that, after processing, are used for generating the prediction. (B) Experimental structure and main features of TEM-1 beta-lactamase (derived from PDB 1ZG4). The N- and C-terminal helices (orange) form a subdomain with the five-stranded central beta sheet (blue), which is linked to the helical subdomains (red) by two hinge regions on the opposite side of the sheet. The catalytic pocket resides at the interface between the beta sheet and the helical domain. Helix H2 (purple) is the innermost helix of the helical domain and has both a catalytic and a structural function.
Sequencing and molecular evolution results. (A) Boxplot showing the number of amino acid mutations (mismatches) observed in the sample of clones sequenced after each generation (Sanger sequencing, black border) and after NGS (red border). The white diamonds indicate the mean. (B) Frequency distribution of the number of amino acid mutations observed in the sequenced libraries (solid lines) and their respective Poisson regressions (dotted lines; gen1: λ=5.12; gen5: λ=12.54; gen12: λ=26.9).
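The Poisson description of mutation counts in panel B can be illustrated with a minimal sketch. It assumes the rate λ is estimated as the sample mean of per-clone mutation counts (the maximum-likelihood estimate); the caption does not say how the regressions were actually fitted. The function names and the toy counts are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def poisson_lambda(mutation_counts):
    """Maximum-likelihood estimate of the Poisson rate: the sample mean."""
    return sum(mutation_counts) / len(mutation_counts)


def poisson_pmf(k, lam):
    """Probability of observing exactly k mutations under Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def observed_vs_expected(mutation_counts):
    """Return (lambda, {k: (observed frequency, Poisson probability)})."""
    lam = poisson_lambda(mutation_counts)
    n = len(mutation_counts)
    freq = Counter(mutation_counts)
    table = {k: (freq[k] / n, poisson_pmf(k, lam)) for k in sorted(freq)}
    return lam, table


# Hypothetical toy data: amino acid mutations per sequenced clone.
counts = [3, 5, 4, 6, 5, 7, 5, 4, 6, 5]
lam, table = observed_vs_expected(counts)
for k, (obs, exp) in table.items():
    print(f"k={k}: observed={obs:.2f}, Poisson={exp:.3f}")
```

Comparing the observed frequencies with the Poisson probabilities at the same λ gives a quick visual check of how well a single-rate model describes a library.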
Sequencing and molecular evolution results. (A) Shannon information entropy (H) per residue position of the sequenced 12th generation library. The colors and annotations follow the secondary structure classification present in the PDB structure 1ZG4 (red: alpha helices, blue: beta sheets, tan: coils). The leader peptide sequence (light gray) is missing in the structure. (B) Comparison of the Shannon information entropy between the UniProt and the in vitro evolved data sets. (C) Relationship between the entropy of the residues obtained in molecular evolution and the mean B factor of the residues observed in the reference structure 1ZG4. Since the reference structure is missing the leader peptide, the first 23 amino acids do not have an associated B factor. (D) t-SNE dimensionality reduction applied to the joined UniProt/error-prone PCR 12th generation library data set. The Hamming distance between sequences was used as the distance metric. Gray and cyan represent the original data set (gray: UniProt; cyan: epPCR library). Overlaid on top in bright colors is the membership of each UniProt sequence in one of the three main families of type A beta-lactamases, retrieved from the corresponding UniProt annotation. The original pUC19 beta-lactamase before molecular evolution is classified as a TEM beta-lactamase (red).
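The two quantities named in this caption, per-position Shannon entropy and the Hamming distance between sequences, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: sequences are assumed to be already aligned to equal length, the entropy is computed in bits (log base 2, which the caption does not specify), and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues observed at one position."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def per_position_entropy(aligned_seqs):
    """Entropy for every column of a set of equal-length aligned sequences."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy alignment (not data from the study).
library = ["MKA", "MKA", "MRA", "MRG"]
entropies = per_position_entropy(library)
print([round(h, 3) for h in entropies])
print(hamming("MKA", "MRG"))
```

A fully conserved position has zero entropy, and the value grows as more residue types share the position more evenly. A matrix of pairwise Hamming distances is the kind of input a t-SNE implementation that accepts precomputed distances would take.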
DCA predictions of the beta-lactamase contact map. (A) DCA plot showing the top L (L = 286, the length of the protein amino acid chain) contact predictions by DCA obtained from the fifth generation of molecular evolution. The graph is an L×L grid where each axis represents the amino acid positions of the lactamase chain, from the N- to the C-terminus. Each point represents the pair of residues described by its coordinates. The graph is separated into two halves. In the lower half, black dots represent pairs of residues that have at least one pair of their respective nonhydrogen atoms <8.5 Å apart in the reference crystallographic structure (PDB ID: 1ZG4). These residues are considered to be in contact with each other. In the upper half, the top L DCA predictions from the molecular evolution data set are plotted above the gray-mirrored silhouette of the crystallographic contacts. Pairs whose residues are less than five positions apart in the lactamase alignment are excluded from this ranking to favor the visualization of long-range interactions. The color of each point indicates the shortest path (the lowest L1 norm in the graph grid space) connecting the point to a contact pair position (a pair of residues that have nonhydrogen atoms <8.5 Å apart in the reference structure). (B) DCA plot showing the top L (L = 286) contact predictions by DCA obtained from the 12th generation of molecular evolution. (C) Plot of the top L DCA predictions of the UniProt data set. (D) Plot of the top L/2 partial correlations of residue positions on DCA score obtained from the 12th generation of molecular evolution.
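The three rules described in this caption (the 8.5 Å contact criterion, the ranking that skips pairs fewer than five positions apart, and the L1 distance from a prediction to the nearest true contact) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the dictionary layout of the scores, and the toy values are all hypothetical, and pairs are assumed to be stored as (i, j) with i < j.

```python
import math


def in_contact(atoms_a, atoms_b, cutoff=8.5):
    """True if any pair of nonhydrogen atom coordinates (in angstroms)
    from the two residues is closer than the cutoff."""
    return any(math.dist(p, q) < cutoff for p in atoms_a for q in atoms_b)


def top_pairs(scores, n_top, min_separation=5):
    """Rank residue pairs by score, skipping pairs that are fewer than
    min_separation positions apart, and keep the n_top best."""
    eligible = [
        (pair, s) for pair, s in scores.items()
        if abs(pair[0] - pair[1]) >= min_separation
    ]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in eligible[:n_top]]


def l1_to_nearest_contact(pair, contacts):
    """Smallest L1 (grid) distance from a predicted pair to any true contact."""
    i, j = pair
    return min(abs(i - a) + abs(j - b) for a, b in contacts)


# Hypothetical toy scores and contacts (not data from the study).
scores = {(1, 2): 0.9, (1, 10): 0.8, (3, 20): 0.5, (4, 30): 0.7}
contacts = {(1, 11), (5, 28)}
predicted = top_pairs(scores, n_top=2)
distances = [l1_to_nearest_contact(p, contacts) for p in predicted]
print(predicted, distances)
```

In this toy case the highest-scoring pair (1, 2) is dropped because its residues are adjacent in sequence, so the ranking starts from the next long-range pair. A distance of zero would mean the prediction coincides with a crystallographic contact.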