CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering.

Conan K L Wang, Quentin Kaas, Laurent Chiche, David J Craik.

Abstract

CyBase was originally developed as a database for backbone-cyclized proteins, providing search and display capabilities for sequence, structure and function data. Cyclic proteins are interesting because, compared to conventional proteins, they have increased stability and enhanced binding affinity and therefore can potentially be developed as protein drugs. The new CyBase release features a redesigned interface and internal architecture to improve user-interactivity, collates double the amount of data compared to the initial release, and hosts a novel suite of tools that are useful for the visualization, characterization and engineering of cyclic proteins. These tools comprise sequence/structure 2D representations, a summary of grafting and mutation studies of synthetic analogues, a study of N- to C-terminal distances in known protein structures and a structural modelling tool to predict the best linker length to cyclize a protein. These updates are useful because they have the potential to help accelerate the discovery of naturally occurring cyclic proteins and the engineering of cyclic protein drugs. The new release of CyBase is available at http://research1t.imb.uq.edu.au/cybase.

Year:  2007        PMID: 17986451      PMCID: PMC2239000          DOI: 10.1093/nar/gkm953

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Proteins with a macrocyclic backbone consisting of a continuous cycle of peptide bonds have been discovered over recent years in bacteria, plants and animals (1). These macrocyclic proteins are different from small cyclic peptides, such as cyclosporin, in that they are gene-encoded products, with backbone cyclization occurring as a post-translational modification rather than being non-ribosomally synthesized (2). Currently, there are five major classes of naturally occurring cyclic proteins: the cyclic sex-pilin (3) and bacteriocins (4–6) from bacterial sources, the θ-defensins from primates (7), trypsin inhibitors from Asteraceae and Cucurbitaceae family plants (8–10) and the cyclotides from plants of the Violaceae and Rubiaceae (11–13). The cyclotides are by far the largest family of circular proteins, with recent screening programs suggesting that the total number of sequences may be in the thousands (14,15). Interest in cyclic proteins has been inspired by the promising therapeutic advantages of cyclic proteins over their conventional linear counterparts (16–19). One of the major benefits of a circular backbone is improved stability (16,17) and at least one family of cyclic proteins, the cyclotides, has been shown to be highly resistant to enzymatic, thermal and chemical treatment (20,21). This increased stability means that circular proteins are promising scaffolds for drug design applications (22,23). The concept of backbone cyclization can be adopted to improve the bioavailability of linear proteins, thus increasing the therapeutic potential (24–27). Furthermore, rigidification of the often-flexible termini through cyclization can lead to favourable entropy changes and improved receptor binding affinities (8,9). CyBase is a database of cyclic proteins that was initially developed to provide a uniform repository to handle the sequence/structure/function data for circular proteins (28). 
CyBase has now been completely redesigned to manage the continuing growth of circular protein data and to provide improved user-interactivity. A major feature of the new release is a new module to manage data on synthetic circular proteins, which was designed to assist in circular protein engineering. Additionally, a range of analytical and predictive tools has been designed to handle the unique challenges of circular protein characterization and engineering.

IMPROVEMENTS AND DISCUSSION

Although CyBase has been completely redesigned, several core features of the original CyBase release, including the underlying database architecture and the general search and display capabilities, have been retained in the new version. Information on sequence, structure and function is stored in a MySQL database, where the central protein table, which contains information on each characterized cyclic protein, is linked to additional tables that describe nucleic acid sequences, structures, activities and literature references. The data are accessed through a web-based interface, which provides a variety of text- or alignment-based searching methods. Data entries are displayed using dynamically generated data cards, which describe the relevant information, including sequence, classification and cross-links to other entries in CyBase or to external biological databases such as Genbank, UniProt and PDB. In the original CyBase release, the interface was adapted from a popular content management system for community websites written in PHP. In the new release, the interface has been substantially redesigned to increase user-interactivity and improve integration of data with tools. These improvements have been achieved using an additional data abstraction layer implemented in XML that also improves the extensibility and maintainability of the database. As of August 2007, CyBase includes 251 protein sequences, 49 nucleic acid sequences, 39 structures and 91 activity-related entries from five classes of circular proteins. The data content of CyBase is now almost double that of the initial release, and the growth is expected to continue, with a recent study suggesting that in at least one family of circular proteins, the cyclotides, >9000 sequences have yet to be characterized (14). In addition, an increasing number of engineering studies are being applied to circular proteins.
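The relational layout described above, a central protein table linked to satellite tables that feed a data card, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, and every table name, column name and row is a hypothetical placeholder rather than the actual CyBase schema or data.

```python
import sqlite3

# Hypothetical, simplified schema: table and column names are illustrative
# assumptions, not the actual CyBase MySQL schema.
SCHEMA = """
CREATE TABLE protein (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    protein_class TEXT,
    sequence TEXT NOT NULL
);
CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    pdb_id TEXT NOT NULL
);
CREATE TABLE activity (
    id INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    description TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one placeholder entry."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows only; none of these values come from CyBase.
    conn.execute(
        "INSERT INTO protein VALUES (1, 'demo protein', 'demo class', 'GLPVCGET')"
    )
    conn.execute("INSERT INTO structure VALUES (1, 1, 'XXXX')")
    conn.execute("INSERT INTO activity VALUES (1, 1, 'placeholder assay result')")
    return conn


def data_card(conn, protein_id):
    """Collect everything linked to one protein, as a data card would."""
    row = conn.execute(
        "SELECT name, protein_class, sequence FROM protein WHERE id = ?",
        (protein_id,),
    ).fetchone()
    if row is None:
        raise KeyError(protein_id)
    structures = [
        r[0]
        for r in conn.execute(
            "SELECT pdb_id FROM structure WHERE protein_id = ? ORDER BY id",
            (protein_id,),
        )
    ]
    activities = [
        r[0]
        for r in conn.execute(
            "SELECT description FROM activity WHERE protein_id = ? ORDER BY id",
            (protein_id,),
        )
    ]
    return {
        "name": row[0],
        "class": row[1],
        "sequence": row[2],
        "structures": structures,
        "activities": activities,
    }
```

Calling data_card(build_demo_db(), 1) gathers the protein row together with its linked structure and activity rows, which is the kind of lookup a dynamically generated data card would need.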
The new CyBase release provides a new range of tools to aid in cyclic protein visualization, discovery and engineering. In terms of visualization, the ‘Diversity Wheel’ tool generates a novel representation of circular protein sequence variation. The tool accepts a multiple sequence alignment and generates a wheel-like diagram composed of an inner circle, which shows the consensus sequence of the alignment, and radial spikes at each position, which represent the different amino acids observed at that position, as shown in Figure 1. This representation is useful for evolutionary or mutational studies of circular proteins. For cyclotides and squash trypsin inhibitors, a ‘Collier de Perles’ graphical representation (29) of the sequence/structure has been adapted from the KNOTTIN database (30) to handle the cyclic nature of cyclotides. An example ‘Collier de Perles’ representation is shown in Figure 1. This representation provides a link between protein sequences and their structures and is particularly useful for protein engineering, sequence–structure analysis, visualization and comparisons of positions for mutations, polymorphisms and contact analysis (29). For the visualization of structures, a tool based on Jmol (http://www.jmol.net) has been added to the structure cards to allow a quick overview of each structure and to highlight crucial structural features such as the surface hydrophobicity. In combination with the activity entries in CyBase, visualization of the structures assists in identifying structure–activity relationships.
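The data behind a Diversity Wheel, a consensus residue per alignment column plus the other residues seen at that column, can be derived as in the following minimal sketch. The function names, the tie-breaking rule and the toy alignment are assumptions made for illustration; the actual CyBase implementation is not described in the text.

```python
from collections import Counter


def column_profile(alignment):
    """Return (consensus, variants) for an equal-length sequence alignment.

    consensus: most frequent residue in each column (gaps '-' are ignored;
               ties are broken alphabetically, which is an arbitrary choice).
    variants:  for each column, the sorted list of other residues observed.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus, variants = [], []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:  # column made only of gaps
            consensus.append("-")
            variants.append([])
            continue
        top = min(counts, key=lambda aa: (-counts[aa], aa))
        consensus.append(top)
        variants.append(sorted(aa for aa in counts if aa != top))
    return "".join(consensus), variants


def wheel_angles(n_positions):
    """Angle in degrees of each position when laid out evenly on a circle."""
    return [360.0 * i / n_positions for i in range(n_positions)]


# Toy alignment (hypothetical sequences, not taken from CyBase).
toy = ["GLPVC", "GLPIC", "GIPVC"]
consensus, variants = column_profile(toy)
```

For the toy alignment the consensus forms the inner circle and each non-empty entry of variants would be drawn as a spike at the angle given by wheel_angles for that position.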
Figure 1.

Sequence graphical representations incorporated into CyBase. Panel (A) shows a Diversity Wheel representation of sequence diversity from a multiple sequence alignment, where the consensus sequence is positioned in the inner circle and the spike protruding from each position represents the amino acid variation observed at that position. Panel (B) is a Collier de Perles representation of the prototypical cyclotide, kalata B1, showing the sequence and disulphide connectivity. Collier de Perles representations can be generated for proteins belonging to the cyclotide or trypsin squash inhibitor classes. Panel (C) shows a Cyclic Seqplot, which is a representation for NOE data measured from an NMR experiment. The sequence of the peptide is shown on the outside of the circle. Backbone NOEs are drawn as dark bars, where the height of the bar is relative to the strength of the NOE. Medium range and long NOEs are drawn as arcs and lines.

Several tools have been added to CyBase to facilitate cyclic protein discovery. Characterization of cyclic proteins has benefited from approaches in molecular biology and mass spectrometry (to determine sequence information) and NMR (to determine 3D structures). To assist in molecular screening for cyclic protein genes, the CyBase ‘Primer Match’ tool, which was developed from suggestions from users, can rapidly predict primer-binding sites given a list of primer sequences and a template sequence. Identification of cyclic protein genes is important because backbone cyclization is a ‘seamless’ process, which means that the location of the N- and C-termini cannot be determined from the mature peptide alone. Mass spectrometry methods, which measure the mass of the mature peptide or enzyme-digested fragments, are commonly used for rapid protein sequence determination.
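The primer-matching step mentioned above, locating binding sites for a list of primers on a template, can be sketched as follows. This is a simplified illustration that assumes exact, non-degenerate matches over the alphabet A/C/G/T; the function names are hypothetical and the matching rules used by the CyBase ‘Primer Match’ tool are not specified in the text.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(template, query):
    """0-based start index of every exact match, including overlapping ones."""
    hits = []
    start = template.find(query)
    while start != -1:
        hits.append(start)
        start = template.find(query, start + 1)
    return hits


def match_primers(template, primers):
    """Map each primer name to its exact binding sites on the template.

    'forward' lists positions where the primer sequence itself occurs in the
    given strand; 'reverse' lists positions where its reverse complement
    occurs, i.e. where the primer would anneal to the given strand.
    """
    template = template.upper()
    results = {}
    for name, primer in primers.items():
        primer = primer.upper()
        results[name] = {
            "forward": find_all(template, primer),
            "reverse": find_all(template, reverse_complement(primer)),
        }
    return results


# Toy template and primers (hypothetical sequences).
sites = match_primers(
    "ATGGCCAAATTTGGGCCC",
    {"fwd_demo": "GCCAAA", "rev_demo": "CCCAAA"},
)
```

A practical tool would also need to handle degenerate bases and mismatches, which this sketch deliberately leaves out.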
Existing computational tools, which form the core of protein sequence proteomics, do not consider the effects of cyclization on a protein of interest: cyclization changes the mass and the pI of the mature protein and introduces additional fragments when the protein is digested. Accordingly, the CyBase ‘Digest Peptide’ tool allows for the in silico enzyme digestion of cyclic peptides as well as the prediction of properties such as the absorption coefficient and the pI. The CyBase ‘Fingerprint Search’ tool makes it possible to search the entire database using masses of peptide fragments obtained from an enzymatic digestion of the reduced original peptide for rapid protein sequence identification.
Analysis of NMR data, such as chemical shift and NOE patterns, can provide an early indication of the structure of a protein. Chemical shifts and NOE restraints are stored in CyBase and can be presented visually for analysis and comparison. The ‘Alphaplot’ tool generates chemical shift index plots, which are commonly used to identify secondary structure (31). The CyBase ‘Cyclic Seqplot’ tool offers a new representation for short- and long-range NOE patterns, which uses a circular template as shown in Figure 1, and can be used to quickly identify structural elements (e.g. secondary structure).
CyBase provides tools to facilitate the engineering of cyclic proteins. To help identify potential targets for backbone cyclization, the CyBase ‘Termini Distance Distributions’ page provides current statistics on N- to C-termini distances of proteins from the PDB. The distribution of distances from the PDB as of June 2007 is shown in Figure 2 and indicates that a significant number of proteins have N- to C-termini distances below 20 Å, a distance that may only require linkers made of a few residues (32). The distributions of the distances can be compared to random models.
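To make the effect of cyclization on digestion and mass concrete, the sketch below digests a backbone-cyclic sequence and compares cyclic and linear masses. It rests on stated assumptions: a simplified trypsin-like rule (cleavage C-terminal to K or R, with no proline exception), standard monoisotopic residue masses, and no treatment of disulphide bonds or other modifications. The function names are hypothetical and this is not the algorithm of the CyBase ‘Digest Peptide’ tool.

```python
# Standard monoisotopic residue masses in Da (amino acid minus one water).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O in Da


def cyclic_mass(sequence):
    """Mass of a backbone-cyclic peptide: residues only, no terminal water."""
    return sum(MONO[aa] for aa in sequence)


def linear_mass(sequence):
    """Mass of a linear peptide: residues plus one water for the free termini."""
    return cyclic_mass(sequence) + WATER


def cyclic_digest(sequence, cut_after="KR"):
    """Linear fragments produced by cleaving a cyclic peptide.

    Cleavage is assumed to occur C-terminal to any residue in cut_after.
    With no cleavage site the ring stays intact and no fragment is returned;
    with one site the ring opens into a single full-length linear peptide.
    """
    n = len(sequence)
    cuts = [i for i, aa in enumerate(sequence) if aa in cut_after]
    fragments = []
    for k, cut in enumerate(cuts):
        # Start just after the previous cut, wrapping around the ring.
        start = (cuts[k - 1] + 1) % n
        length = (cut - start) % n + 1
        fragments.append("".join(sequence[(start + j) % n] for j in range(length)))
    return fragments


# Toy sequence (hypothetical). A linear digest would give GAK, CDR and PL,
# whereas the cyclic digest joins the two end pieces into PLGAK.
fragments = cyclic_digest("GAKCDRPL")
```

The wrap-around fragment is the kind of additional fragment that linear-only tools miss, and the intact ring is lighter than its linear counterpart by one water.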
The details of two random models, one based on an ellipsoid and the other on a random-walk algorithm, have been described previously (33). This analysis is also useful because the proximity of the N- and C-termini of proteins has been implicated as an important factor in protein stability and folding (34). The CyBase ‘Predict Linker’ tool predicts the size of a poly-alanine linker needed to connect the termini of a given protein. The algorithm uses the MODELLER program (35) to model the cyclized structure with progressively longer closing linkers while avoiding steric clashes. An example model of an artificially cyclized protein is shown in Figure 2. Further analysis of the effect of cyclization can be made using the ‘Cyclization Energy’ tool, which predicts the change in the unfolding free energy, ΔΔGcycl, upon backbone cyclization. The algorithm used for the energy prediction is based on the probability of a given linker length stretching a particular distance over the folded and unfolded states of the protein, and has been described in detail previously (36). Because circular proteins have been shown to be relatively stable, they are promising scaffolds for grafting applications. Using the CyBase ‘Synthetic Analogues’ tool, users can view summaries of grafted or modified peptides and identify which variants were successfully folded and which had interesting activity. The collation of this information is potentially very useful for developing rules for future studies involving synthetic cyclic peptides.
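The N- to C-termini distance that underlies the ‘Termini Distance Distributions’ statistics can be computed from PDB coordinates as in the sketch below. It assumes that the distance is measured between the first and last Cα atoms of a chain in the first model of the file, which is an illustrative choice; the text does not state which atoms CyBase uses. The function names are hypothetical, and the linker modelling performed with MODELLER is not reproduced here.

```python
import math


def ca_coordinates(pdb_text, chain=None):
    """Cα coordinates from the ATOM records of the first model in a PDB file.

    Fixed-column parsing follows the PDB format: atom name in columns 13-16,
    chain ID in column 22, and x, y, z in columns 31-38, 39-46 and 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # keep only the first model of an NMR ensemble
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        coords.append(
            (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        )
    return coords


def termini_distance(pdb_text, chain=None):
    """Distance in Å between the first and last Cα atoms of the chain."""
    coords = ca_coordinates(pdb_text, chain)
    if len(coords) < 2:
        raise ValueError("need at least two CA atoms to measure a distance")
    return math.dist(coords[0], coords[-1])
```

Applying termini_distance to the text of each PDB entry and building a histogram of the results would give a distribution of the kind shown in Figure 2.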
Figure 2.

Cyclization tools incorporated into CyBase. By scanning the distribution of N- to C-termini distances from the PDB as shown in panel (A), the conotoxin MII was identified as a potential target for backbone cyclization. Its relatively short N- to C-termini distance of 9.8 Å means that it is potentially more amenable to backbone cyclization compared to a protein with a longer termini distance. Panel (B) shows a model of a cyclic MII using its native linear structure as a template (PDB ID: 1MII) (37), which has been cyclized in silico using a seven-residue poly-alanine linker (coloured in white).

CONCLUSION

Cyclic proteins are interesting because they offer increased stability compared to conventional proteins and are promising drug scaffolds. CyBase is a database dedicated to cyclic proteins that provides a standardized method for accessing information on protein sequences, nucleic acid sequences, 3D structures and assay results. CyBase also manages data on synthetic analogues of cyclic proteins to assist in drug development projects. Since its initial release, CyBase has grown in size and now provides a suite of tools that are useful for the visualization, analysis, characterization and engineering of cyclic proteins. These include a new ‘Diversity Wheel’ representation, which is useful for analysing circular protein sequence variation, and a ‘Predict Linker’ tool to help in the engineering of cyclic proteins from linear targets. CyBase is available at http://research1t.imb.uq.edu.au/cybase/.

REFERENCES (10 of 36 shown)

1.  High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds.

Authors:  S Luckett; R S Garcia; J J Barker; A V Konarev; P R Shewry; A R Clarke; R L Brady
Journal:  J Mol Biol       Date:  1999-07-09       Impact factor: 5.469

Review 2.  Loops, linkages, rings, catenanes, cages, and crowders: entropy-based strategies for stabilizing proteins.

Authors:  Huan-Xiang Zhou
Journal:  Acc Chem Res       Date:  2004-02       Impact factor: 22.384

Review 3.  Discovery, structure and biological activities of the cyclotides.

Authors:  David J Craik; Norelle L Daly; Jason Mulvenna; Manuel R Plan; Manuela Trabi
Journal:  Curr Protein Pept Sci       Date:  2004-10       Impact factor: 3.272

4.  The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy.

Authors:  D S Wishart; B D Sykes; F M Richards
Journal:  Biochemistry       Date:  1992-02-18       Impact factor: 3.162

5.  Three-dimensional solution structure of alpha-conotoxin MII by NMR spectroscopy: effects of solution environment on helicity.

Authors:  J M Hill; C J Oomen; L P Miranda; J P Bingham; P F Alewood; D J Craik
Journal:  Biochemistry       Date:  1998-11-10       Impact factor: 3.162

6.  Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure.

Authors:  J F Hernandez; J Gagnon; L Chiche; T M Nguyen; J P Andrieu; A Heitz; T Trinh Hong; T T Pham; D Le Nguyen
Journal:  Biochemistry       Date:  2000-05-16       Impact factor: 3.162

7.  Amino and carboxy-terminal regions in globular proteins.

Authors:  J M Thornton; B L Sibanda
Journal:  J Mol Biol       Date:  1983-06-25       Impact factor: 5.469

8.  Thermal, chemical, and enzymatic stability of the cyclotide kalata B1: the importance of the cyclic cystine knot.

Authors:  Michelle L Colgrave; David J Craik
Journal:  Biochemistry       Date:  2004-05-25       Impact factor: 3.162

9.  Isolation and characterization of a highly hydrophobic new bacteriocin (gassericin A) from Lactobacillus gasseri LA39.

Authors:  Y Kawai; T Saito; T Toba; S K Samant; T Itoh
Journal:  Biosci Biotechnol Biochem       Date:  1994-07       Impact factor: 2.043

Review 10.  Peptide AS-48: prototype of a new class of cyclic bacteriocins.

Authors:  Mercedes Maqueda; Antonio Gálvez; Manuel Martínez Bueno; Maria José Sanchez-Barrena; Carlos González; Armando Albert; Manuel Rico; Eva Valdivia
Journal:  Curr Protein Pept Sci       Date:  2004-10       Impact factor: 3.272
