SwissSidechain: a molecular and structural database of non-natural sidechains.

David Gfeller, Olivier Michielin, Vincent Zoete.

Abstract

Amino acids form the building blocks of all proteins. Naturally occurring amino acids are restricted to a few tens of sidechains, even when considering post-translational modifications and rare amino acids such as selenocysteine and pyrrolysine. However, the potential chemical diversity of amino acid sidechains is nearly infinite. Exploiting this diversity by using non-natural sidechains to expand the building blocks of proteins and peptides has recently found widespread applications in biochemistry, protein engineering and drug design. Despite these applications, there is currently no unified online bioinformatics resource for non-natural sidechains. With the SwissSidechain database (http://www.swisssidechain.ch), we offer a central and curated platform about non-natural sidechains for researchers in biochemistry, medicinal chemistry, protein engineering and molecular modeling. SwissSidechain provides biophysical, structural and molecular data for hundreds of commercially available non-natural amino acid sidechains, both in l- and d-configurations. The database can be easily browsed by sidechain names, families or physico-chemical properties. We also provide plugins to seamlessly insert non-natural sidechains into peptides and proteins using molecular visualization software, as well as topologies and parameters compatible with molecular mechanics software.

Year:  2012        PMID: 23104376      PMCID: PMC3531096          DOI: 10.1093/nar/gks991

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Amino acid sidechains confer their specific properties to all proteins. They govern the specific folding of polypeptide chains into distinct 3D structures, build complementary binding interfaces between members of macromolecular complexes or create enzyme catalytic sites by spatially arranging different chemical groups (1). Despite the large diversity of existing proteins, it is clear that nature has not exhaustively sampled the set of macromolecules with specific functions that can be chemically synthesized. This is especially true because proteins are made out of a limited number (twenty in most cases) of genetically encoded amino acids. Although this limited number may confer advantages in terms of genetic encoding and synthesis of proteins in vivo, it clearly restricts the functional diversity of proteins. Introducing new chemistry into existing proteins is therefore an attractive strategy for biochemical studies, protein engineering and human therapeutics. A promising approach is to expand the repertoire of amino acids by introducing non-natural amino acids into existing proteins or peptides (2). In this way, regions of chemical space that have not been sampled during evolution can be explored. Non-natural amino acids, and especially l-alpha amino acids where the sidechain has only one single bond with the backbone (i.e. excluding amino acids with branching at Cα or bonds with other backbone atoms), are particularly interesting because they do not affect the chemical properties of the protein backbone. As such, they are more likely to keep the global fold and secondary structure elements of a polypeptide chain unchanged. For instance, a recent study showed that non-natural sidechains dramatically increase the affinity of amyloid fiber inhibitors (3). Similarly, several cyclic and other kinds of modified peptides with non-natural amino acid sidechains have been developed for therapeutic use, such as cilengitide (4) or carfilzomib (5).
Non-natural sidechains have also been used as independent ligands. For instance, l-3,4-dihydroxyphenylalanine is a non-natural amino acid used in the treatment of Parkinson's disease (6), and 5-hydroxytryptophan (oxitriptan) has been used as an antidepressant (7). Another commonly used type of non-natural sidechains consists of d-amino acids. d-amino acids are enantiomers of l-amino acids where the sidechain and the hydrogen atom attached to the Cα atom have been switched. d-amino acids are known to increase resistance to protease degradation and are thus frequently used to enhance the biological stability of peptide-based drugs or biological tools (8,9). In addition to therapeutic use, non-natural sidechains have found many other applications in biochemistry and protein studies (10). These include photo-crosslinking amino acids to probe in vivo protein interactions (11,12), fluorescent amino acids used as markers of specific proteins (13) or phosphorylated amino acid mimetics to probe the effect of post-translational modifications (14). The importance of expanding the building blocks of proteins beyond the 20 natural ones is further emphasized by noting that a remarkably high degree of optimization is often observed in the choice of amino acid sequences for many proteins. Optimized sequences stand as an important hurdle to protein or peptide engineering based only on genetically encoded amino acids. Experimental research on non-natural sidechains is based mainly on solid-phase synthesis technology (15) and on genetic encoding of non-natural sidechains in micro-organisms or cell cultures (2). The former technique uses in vitro chemical peptide synthesis to assemble polypeptide chains with both natural and non-natural residues. This technology is routinely used to synthesize non-natural peptides and can nowadays be applied to small proteins of up to a hundred amino acids (16).
The latter is typically based on generating novel unique codon–tRNA pairs, together with the corresponding aminoacyl tRNA synthetases in a given organism (2). This approach enables production of polypeptide chains containing non-natural sidechains in micro-organisms, which can be used as in vivo biological tools (13) or as modified peptide libraries for ligand screening (17). Recent progress now enables the simultaneous encoding of hundreds of different non-natural sidechains in the same organism (18). Bio- and chemo-informatics, structural modeling and visualization tools are powerful means to guide and interpret experimental studies with both natural and non-natural amino acids. Recently, we and others developed different computational tools to better characterize non-natural sidechains and predict the effect of introducing them into proteins or peptides (19–22). This includes sidechain 3D coordinates, biomolecular parameters describing properties such as bond length flexibility, as well as characterization of conformational diversity in terms of rotamers (23). For the latter, different strategies have been elaborated to predict rotamer probabilities, using either solely free energy–based calculations (20,22), or combining physics-based calculations with statistical analysis of conserved properties in natural sidechain rotamer libraries (19). Here, we introduce the SwissSidechain database, a unified and integrated resource providing access to curated biochemical and structural information for 210 non-natural sidechains. Many of the data are unique to SwissSidechain (e.g. rotamers, biomolecular parameters), whereas the rest have been collected from existing resources to provide rapid access to this information. The SwissSidechain database further includes visualization and molecular modeling tools. Different options to browse the non-natural sidechains based on their chemical properties or their families are also provided.
The database can be accessed at http://www.swisssidechain.ch.

Content of the SwissSidechain database

The SwissSidechain database contains molecular and structural data for 210 non-natural alpha amino acid sidechains, both in l- and d-configurations, in addition to the 20 natural ones. These amino acids have been selected based on two main criteria: first, the presence of non-natural sidechains in publicly available protein structures in the PDB (Protein Data Bank) (24) and second, the commercial availability. Each amino acid in SwissSidechain has been given a three- or four-letter code. For all sidechains present in the PDB, we kept the existing three-letter code. For other sidechains, a four-letter code was created. These include many d-amino acids whose l-configuration is present in the PDB. For them, a ‘d’ was simply added in front of the three-letter code (e.g. NLE stands for l-Norleucine and DNLE for d-Norleucine). For the rest, four-letter codes were chosen so as to recall as much as possible the full chemical name (e.g. AZDA for l-azido-alanine), and always starting with a ‘d’ for non-natural sidechains in d-configuration (e.g. DZDA for d-azido-alanine). SwissSidechain data are available in different files describing the chemical structure (SMILES and 2D chemical drawing) and 3D coordinates (PDB, Mol2) of these sidechains, files describing their physico-chemical properties (molecular weight, volume, logP, pKa, partial charges, bond/angle/torsion constants), as well as files describing the probability of their possible conformations (rotamers). Among the different experimentally measurable chemical properties of non-natural sidechains in SwissSidechain, volumes have been calculated with CHARMM (Chemistry at HARvard Molecular Mechanics) (25) and molecular weights with OpenBabel (26). LogP values give information about the hydrophobicity. Experimental LogP for both the full amino acid and the sidechain only have been manually collected from literature when available. For the rest of the sidechains, LogP values have been predicted with XlogP3 (27). 
pKa values for each protonatable group of the full amino acids have been retrieved from literature when available and computed with MarvinSketch (ChemAxon, http://www.chemaxon.com) otherwise. Model parameters, such as bond, angle and torsion constants, which provide information about sidechain flexibility and dynamics, are provided as parameter files compatible with the CHARMM force field (28) and can be readily used in the CHARMM (25) and GROMACS (GROningen MAchine for Chemical Simulations) (29) molecular mechanics software. Similarly, partial atomic charges, which are useful to predict possible interactions (e.g. H-bond, electrostatic) of a sidechain with its surrounding atoms, are provided as topology files in the CHARMM (25) and GROMACS (29) formats. The computation of these model parameters is described in detail elsewhere (19). Finally, rotamers have been generated as described in Gfeller et al. (19). To include information about sidechain conformations in recent crystal structures, we developed the current version of SwissSidechain based on an up-to-date rotamer library for natural sidechains, instead of using rotamer libraries published in 2002 (23). In practice, we performed a global analysis of natural sidechain conformations in all crystallographic structures available in the PDB (as of May 2012) with a resolution of 1.75 Å or better. This updated rotamer library of natural sidechains, and especially the conserved probabilities of the first dihedral angles of long sidechains, was used in all our analyses to predict rotamers for non-natural sidechains, as described in Gfeller et al. (19). We further expanded rotamer libraries to d-amino acids, both natural and non-natural. For non-chiral sidechains, rotamer probabilities can be readily generated using the properties of mirror images.
In particular, a d-configuration characterized by backbone dihedral angles φ and ψ and sidechain dihedral angles χi is equivalent to the l-configuration with dihedral angles −φ, −ψ and −χi (i = 1, …, N, where N stands for the number of freely rotating dihedral angles along the sidechain). In other words, $P_D(\phi, \psi, \chi_1, \ldots, \chi_N) = P_L(-\phi, -\psi, -\chi_1, \ldots, -\chi_N)$, where $P_D$ stands for the conformational probability of a d-amino acid and $P_L$ stands for the conformational probability of the same amino acid in l-configuration (30). Note that this does not consider interactions with other residues, nor whether the previous and the next residues are in l- or d-configuration. However, such information is typically not explicitly considered in rotamer libraries. For chiral sidechains, the previous relation does not hold because of the additional asymmetric centers. Therefore, we used the same strategy based on molecular dynamics simulations and renormalization that was used for l-sidechains in Gfeller et al. (19) to predict the rotamer libraries for chiral d-sidechains.
To help explore the structural environment of non-natural sidechains incorporated in polypeptide chains, we developed visualization plugins for PyMOL and UCSF Chimera (31). These plugins enable users to mutate residues in protein structures to any sidechain (natural and non-natural, in both l- and d-configurations) present in the SwissSidechain database. This is particularly useful to evaluate the structural environment of a new residue and detect possible clashes that may arise on mutation. Moreover, the resulting structures can be readily used for further investigations with molecular mechanics software. Detailed instructions for installing and running these plugins are provided on the SwissSidechain web site, together with short movies describing standard use for mutating a residue to a non-natural sidechain in a protein structure.
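As an illustration of the mirror-image relation given above for non-chiral sidechains, the following minimal sketch derives a d-rotamer table from an l-rotamer table by negating every dihedral angle. The dictionary layout, the function names `wrap_angle` and `mirror_library`, and the numerical values are assumptions made for this example; they do not reflect the file format or the contents of the SwissSidechain rotamer libraries.

```python
from typing import Dict, Tuple

# Hypothetical layout: a rotamer is keyed by (phi, psi, chi_1, ..., chi_N)
# in degrees, and the value is its probability.
Rotamer = Tuple[float, ...]


def wrap_angle(angle: float) -> float:
    """Map an angle in degrees to the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def mirror_library(l_library: Dict[Rotamer, float]) -> Dict[Rotamer, float]:
    """Negate all dihedral angles; probabilities are carried over unchanged.

    Only meaningful for non-chiral sidechains, where the D-configuration
    is the exact mirror image of the L-configuration.
    """
    return {
        tuple(wrap_angle(-a) for a in angles): prob
        for angles, prob in l_library.items()
    }


# Placeholder L-library with two rotamers (made-up numbers, illustration only).
l_library = {
    (-60.0, -40.0, -65.0, 180.0): 0.7,
    (-60.0, -40.0, 60.0, 70.0): 0.3,
}

d_library = mirror_library(l_library)
for angles, prob in d_library.items():
    print(angles, prob)
```

Applying `mirror_library` twice returns the original table, which is a quick sanity check on the sign convention. For chiral sidechains the text states that this shortcut does not hold, so such a transformation should not be used there.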

Web interface

For each sidechain, all data are collected in dedicated web pages containing 2D and 3D structures, the different names, biochemical properties, links to existing databases and download options. An example is shown in Figure 1. The biochemical parameters listed on the right have been computed as described above. Links to existing databases include the PDB (24), the PDB Ligand Expo (32) and PubChem (33). Files to be downloaded consist of structure files containing 3D coordinates (PDB and Mol2 formats), 2D structures, rotamers (backbone dependent and independent) and topology files for molecular mechanics software such as CHARMM (25) and GROMACS (29). Parameter files are available on the Molecular Dynamics (MD) simulations pages. These data are provided for both l- and d-configurations. The amino acid 3D structures for the l-configuration can be visualized online with Jmol, the open-source Java viewer for chemical structures (http://www.jmol.org). In addition, we provide bulk download options for all data on the download page of SwissSidechain.
Figure 1.

Sidechain general information page. 2D and 3D views of the full amino acids are provided. Names, SMILES and several biochemical parameters are listed on the right part of the page, as well as links to other structural or chemical databases. Download options are provided in the lower part.


Browsing options

SwissSidechain data can be browsed in large alphabetical tables containing the three- or four-letter codes and the sidechain full names, 2D structures, internal links to the sidechain pages, external links to the PDB, as well as download links for structural files. Both a full table and tables restricted to sub-families of natural sidechain derivatives (e.g. methionine derivatives) are provided. Another alternative is to query data based on sidechain physico-chemical properties. Two particularly important parameters are the volume and the hydrophobicity (logP) of the sidechains. In SwissSidechain, we provide an interactive 2D plot of these parameters. This enables users to zoom, visualize and select specific subsets of non-natural sidechains (see Figure 2). Sidechains can be selected by clicking on the corresponding circles or selecting a region of the graph and using the ‘select all’ button. Selected sidechains appear in the box below the graph and links to sidechain pages can be followed from there. This approach is especially useful for protein engineering and ligand design, to select a subset of sidechains with specific properties.
Figure 2.

Sidechain chemical properties browsing plot. Each sidechain is represented by a yellow circle positioned according to the sidechain volume and logP. On mouseover, the amino acid 2D structure and full name appear. Sidechains can be selected by clicking on the circles or using the ‘select all’ button to select all sidechains in the plot. Selected sidechains appear in the box below the plot, where users can follow the links to the sidechain web page (see Figure 1) or download all structural and molecular files. Users can zoom on the graph by selecting subregions (see red rectangle in the overview panel). Specific families of sidechain derivatives can be selected in the menu on the left.
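The region-based selection in the volume/logP plot amounts to a rectangular filter over two properties. The sketch below shows that idea offline, assuming the property values have already been exported to a local table; the `Sidechain` record, the `select_region` function, the codes and the numbers are all hypothetical placeholders, not SwissSidechain data or an interface of the web site.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Sidechain:
    code: str      # hypothetical identifier, not a real SwissSidechain code
    volume: float  # sidechain volume, in the units of the source table
    logp: float    # sidechain logP (hydrophobicity)


def select_region(
    sidechains: Iterable[Sidechain],
    volume_range: Tuple[float, float],
    logp_range: Tuple[float, float],
) -> List[Sidechain]:
    """Return the sidechains whose (volume, logP) point lies in the rectangle."""
    vmin, vmax = volume_range
    lmin, lmax = logp_range
    return [
        sc for sc in sidechains
        if vmin <= sc.volume <= vmax and lmin <= sc.logp <= lmax
    ]


# Placeholder records; the numbers are made up for illustration only.
table = [
    Sidechain("SC1", volume=90.0, logp=-1.5),
    Sidechain("SC2", volume=130.0, logp=0.8),
    Sidechain("SC3", volume=180.0, logp=2.1),
]

selected = select_region(table, volume_range=(100.0, 200.0), logp_range=(0.0, 3.0))
print([sc.code for sc in selected])
```

Both bounds are inclusive here, which is a design choice of this sketch; the behavior of the web interface at the edges of a selected region is not described in the text.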


CONCLUSION AND OUTLOOK

Exploiting naturally evolved, and often well optimized, proteins or peptides while not being restricted to the limited set of 20 natural amino acids is a promising approach for protein engineering with applications in biochemistry, protein structure and function analysis, and for drug design (3,10). Toward this goal, non-natural sidechains are being increasingly used in experimental studies. The SwissSidechain database provides a unified web resource for hundreds of non-natural sidechains. The biochemical parameters allow researchers to rapidly retrieve sidechains with specific properties, such as volume or hydrophobicity. The different visualization tools enable seamless introduction of non-natural sidechains in protein and peptide structures. The model parameters, such as bond and angle flexibility or partial atomic charges are useful to investigate non-natural sidechains with molecular mechanics software. Most of the non-natural sidechains are commercially available, so that researchers interested in using the predictions obtained with the SwissSidechain data can easily test them experimentally. The naming is consistent with the PDB so that published experimental structures can be readily analyzed with the tools provided in SwissSidechain. In the current version of SwissSidechain, we focus on alpha amino acid sidechains that do not modify the backbone atoms (i.e. without branching at Cα, modification of the nitrogen atom or longer backbones such as beta amino acids). These amino acids in their l-configuration are less likely to lead to large structural remodeling when incorporated into proteins or peptides. Therefore they are often used in protein engineering experiments, and predictions of their effect on protein structure and function are in general more accurate. The same amino acids in d-configuration have found many applications in biochemistry and drug discovery. 
Better characterizing d-amino acid properties is also useful to study d-retro-inverso peptides that can provide mimetics of l-peptides (34), or d-peptides obtained by screening l-peptides against d-targets and using the mirror-image property of d-enantiomers (35). Structural modeling of these types of non-natural sidechains is becoming feasible, although it is still more challenging. This is for instance the case with peptides binding to peptide recognition domains, such as Post synaptic density protein, Drosophila disc large tumor suppressor, and Zonula occludens-1 protein or Src Homology 3 domains (36,37). As the binding interface is often relatively short, one may be able to probe in silico the effect of alpha-amino acids in d-configuration (38). For these different reasons, we have included data for the d-configuration of all amino acid sidechains present in the current version of SwissSidechain. As experimental techniques keep progressing, more and more non-natural sidechains will become commercially available or be encountered in protein structures. We plan to include these new sidechains in future updates of the SwissSidechain database. Other developments may include focusing on different families of non-natural sidechains, such as fluorescent or photo-cleavable amino acids.

FUNDING

European Molecular Biology Organization long-term fellowship [ALTF 241-2010 to D.G.]; Swiss Institute of Bioinformatics. Funding for open access charge: Swiss Institute of Bioinformatics. Conflict of interest statement. None declared.
  36 in total

Review 1.  Rotamer libraries in the 21st century.

Authors:  Roland L Dunbrack
Journal:  Curr Opin Struct Biol       Date:  2002-08       Impact factor: 6.809

Review 2.  Expanding the genetic code.

Authors:  Lei Wang; Jianming Xie; Peter G Schultz
Journal:  Annu Rev Biophys Biomol Struct       Date:  2006

3.  Potent D-peptide inhibitors of HIV-1 entry.

Authors:  Brett D Welch; Andrew P VanDemark; Annie Heroux; Christopher P Hill; Michael S Kay
Journal:  Proc Natl Acad Sci U S A       Date:  2007-10-17       Impact factor: 11.205

Review 4.  CHARMM: the biomolecular simulation program.

Authors:  B R Brooks; C L Brooks; A D Mackerell; L Nilsson; R J Petrella; B Roux; Y Won; G Archontis; C Bartels; S Boresch; A Caflisch; L Caves; Q Cui; A R Dinner; M Feig; S Fischer; J Gao; M Hodoscek; W Im; K Kuczera; T Lazaridis; J Ma; V Ovchinnikov; E Paci; R W Pastor; C B Post; J Z Pu; M Schaefer; B Tidor; R M Venable; H L Woodcock; X Wu; W Yang; D M York; M Karplus
Journal:  J Comput Chem       Date:  2009-07-30       Impact factor: 3.376

Review 5.  Mirror image phage display--a method to generate D-peptide ligands for use in diagnostic or therapeutical applications.

Authors:  Susanne Aileen Funke; Dieter Willbold
Journal:  Mol Biosyst       Date:  2009-06-12

6.  Probing protein-protein interactions with a genetically encoded photo-crosslinking amino acid.

Authors:  Hui-wang Ai; Weijun Shen; Amit Sagi; Peng R Chen; Peter G Schultz
Journal:  Chembiochem       Date:  2011-06-15       Impact factor: 3.164

7.  Encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome.

Authors:  Heinz Neumann; Kaihang Wang; Lloyd Davis; Maria Garcia-Alai; Jason W Chin
Journal:  Nature       Date:  2010-02-14       Impact factor: 49.962

8.  Total chemical synthesis of biologically active vascular endothelial growth factor.

Authors:  Kalyaneswar Mandal; Stephen B H Kent
Journal:  Angew Chem Int Ed Engl       Date:  2011-07-08       Impact factor: 15.336

9.  From combinatorial peptide selection to drug prototype (II): targeting the epidermal growth factor receptor pathway.

Authors:  Marina Cardó-Vila; Ricardo J Giordano; Richard L Sidman; Lawrence F Bronk; Zhen Fan; John Mendelsohn; Wadih Arap; Renata Pasqualini
Journal:  Proc Natl Acad Sci U S A       Date:  2010-02-26       Impact factor: 11.205

10.  PubChem's BioAssay Database.

Authors:  Yanli Wang; Jewen Xiao; Tugba O Suzek; Jian Zhang; Jiyao Wang; Zhigang Zhou; Lianyi Han; Karen Karapetyan; Svetlana Dracheva; Benjamin A Shoemaker; Evan Bolton; Asta Gindulyte; Stephen H Bryant
Journal:  Nucleic Acids Res       Date:  2011-12-02       Impact factor: 16.971
