
KNOTTIN: the database of inhibitor cystine knot scaffold after 10 years, toward a systematic structure modeling.

Guillaume Postic^1,2,3,4, Jérôme Gracy^5, Charlotte Périn^1,2,3,4, Laurent Chiche^5, Jean-Christophe Gelly^1,2,3,4.

Abstract

Knottins, or inhibitor cystine knots (ICKs), are ultra-stable miniproteins with multiple applications in drug design and medical imaging. These widespread and functionally diverse proteins are characterized by the presence of three interwoven disulfide bridges in their structure, which form a unique pseudoknot. Since 2004, the KNOTTIN database (www.dsimb.inserm.fr/KNOTTIN/) has been gathering standardized information about knottin sequences, structures, functions and evolution. The website also provides access to bibliographic data and to computational tools that have been specifically developed for ICKs. Here, we present a major upgrade of our database, in terms of both data content and user interface. In addition to the new features, this article describes how the size of KNOTTIN has multiplied over the past ten years (since its last publication), notably with the recent inclusion of predicted ICK structures. Finally, we report how our web resource has proved useful to researchers working on ICKs, and how the new version of the KNOTTIN website will continue to serve this active community.


Year: 2018 PMID: 29136213 PMCID: PMC5753296 DOI: 10.1093/nar/gkx1084

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Inhibitor cystine-knots (ICKs) form a family of ultra-stable miniproteins, found in a wide variety of organisms, with confirmed and potential medical applications. They are characterized by the presence of at least three interwoven disulfide bridges, which form an intramolecular knot and confer on them structural and functional resistance to high temperature, enzymatic degradation, extreme pH and mechanical stress. ICKs are ∼30–50 residues long, which makes them easily accessible to chemical synthesis. The loops connecting the disulfide bridges show high sequence variability, which results in a broad range of functions covered by ICKs, from channel blockage to inhibition of enzymes. All these properties have led to the use of ICKs as scaffolds for the engineering of various pharmaceutical and imaging agents (1,2). Examples are the US FDA-approved Linzess® (linaclotide, Allergan/Ironwood Pharmaceuticals; www.linzess.com) (3), which is used to treat irritable bowel syndrome with constipation, and Prialt® (ziconotide, Azur Pharma; www.prialt.com), which is used to treat chronic pain. The ICK family is made of three groups: (i) knottins, which represent the majority of ICKs (so that the two names are often used interchangeably) and are characterized by a disulfide bond between the knot-forming cysteines III and VI that passes through the ring formed by cystines I-IV and II-V and the intervening backbone; (ii) cyclotides, which have the same disulfide connectivity as knottins, but have their backbone cyclized via an N-terminal to C-terminal peptide bond; and (iii) growth factor cystine-knots, which form the smallest group of the ICK family and include proteins with a different connectivity from that of knottins and cyclotides. Since its launch in 2004, our KNOTTIN database (4) has gathered sequences, structures and bibliographical data about ICKs, except for the few proteins belonging to the growth factor cystine-knot group. Data about cyclotides can also be found in the more specialized database CyBase (5,6).
Our knottin-dedicated database is valuable given the specific properties of these miniproteins in terms of sequence, structure and function. Indeed, the very low sequence identity between knottin families and the high sequence plasticity of knottins (except for the cysteines) require specific procedures to correctly identify and classify these proteins. In the same way, their structural pseudoknot, very low content of regular secondary structures and particular hydrophobic core formed by disulfide bonds require adapted 2D and 3D representations. The whole diversity of knottin functions (such as neurotransmitters, analgesics, anthelmintics, anti-erectile dysfunction agents, antimalarials, antimicrobials, antitumor agents, protease inhibitors, toxins and insecticides) also has to be properly represented and documented, by gathering relevant functional annotations and bibliographic data. Finally, the active community of researchers working on the applied and theoretical aspects of knottins needs easy access to software dedicated to the analysis of knottins, which is provided by the computational tools available on our KNOTTIN website. Here, we present an upgraded version of KNOTTIN (www.dsimb.inserm.fr/KNOTTIN/), 10 years after its last publication (7). This report describes the new features of the database, the way it has evolved and proved useful over the past decade, and future developments.

DESCRIPTION

Website navigation

The content of the KNOTTIN database can be browsed with the horizontal navigation bar of the web interface. The ‘Experimental 3D structures’ and ‘Sequences & 3D models’ menus give access to experimental and theoretical models, respectively. In these pages, users can select one or several proteins, to visualize either their aligned sequences or their 3D structures. Aligned sequences can also be viewed under the ‘Sequence alignments’ menu, which displays pre-compiled multi-sequence alignment files. In each of these three menus, the knottins are grouped by families, which have been determined based on sequence similarity and biological activity. Each knottin of the database is also categorized based on the length of its loops between the knot-forming cysteines (i.e. five loops for the knottins, and six for the cyclotides). Thus, a descriptor of the form (a)b.c(d)[e] is attributed to each knottin, the letters ‘a’ to ‘e’ being the loop lengths. This nomenclature of ICKs was introduced with the initial release of the database in 2004.
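As an illustration of this nomenclature, the following minimal Python sketch derives an (a)b.c(d)[e]-style descriptor from a sequence. It rests on two assumptions that are not stated in the text: that the sequence contains exactly six cysteines, all of them knot-forming, and that ‘a’ to ‘e’ are the five inter-cysteine loop lengths taken in N- to C-terminal order. The function name `loop_descriptor` is hypothetical and is not part of the KNOTTIN tools.

```python
def loop_descriptor(sequence: str) -> str:
    """Return an '(a)b.c(d)[e]'-style descriptor for a six-cysteine sequence.

    Assumptions (not taken from the KNOTTIN paper): every cysteine in the
    sequence is knot-forming, and loops are listed from N- to C-terminus.
    """
    # 0-based positions of all cysteines in the sequence.
    cys = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected exactly 6 cysteines, found {len(cys)}")
    # A loop length is the number of residues strictly between two
    # consecutive cysteines.
    a, b, c, d, e = (cys[i + 1] - cys[i] - 1 for i in range(5))
    return f"({a}){b}.{c}({d})[{e}]"


if __name__ == "__main__":
    # Toy sequence built for illustration only; it is not a real knottin.
    toy = "GC" + "A" * 6 + "C" + "A" * 5 + "C" + "A" * 3 + "CAC" + "A" * 4 + "CG"
    print(loop_descriptor(toy))
```

Sequences with extra cysteines, and cyclotides with their sixth loop closing the backbone, would need the knot-forming cysteines to be identified first, which is the role of a tool such as Knoter1D.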

Querying the system

The database can also be searched. Under the ‘Sequence search’ menu, amino acid sequences can be searched with the BLAST algorithm. The theoretical and experimental models of KNOTTIN can also be searched, under the ‘Conformation search’ menu, based on different structural features, such as torsion angles, secondary structure content and solvent accessibility. Sequences and structures can also be accessed via the ‘Information search’ menu, which lets users search the database by criteria such as family, keywords, crystallography techniques or the aforementioned knottin nomenclature. The database also gathers the literature about the knottin proteins, which can be accessed under the ‘Article search’ menu based on criteria such as article authors, publication date and keywords. Bibliographic data about knottin functions, folding, synthesis, modeling and biotechnological applications can also be browsed under the different sections of the ‘Bibliography’ menu.
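For readers unfamiliar with the torsion angles used as a search criterion, the sketch below computes the dihedral angle defined by four atoms from their Cartesian coordinates, using the standard atan2 formulation. It is a generic illustration, not the code behind the ‘Conformation search’ menu, and the text does not specify which torsions are stored; for a protein backbone, φ is conventionally computed from the atoms C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(u, v):
    """Component-wise difference u - v of two 3D points."""
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees, in (-180, 180], for atoms p0-p1-p2-p3."""
    b0 = _sub(p1, p0)  # first bond vector
    b1 = _sub(p2, p1)  # central bond, the rotation axis
    b2 = _sub(p3, p2)  # last bond vector
    n1 = _cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    # atan2 keeps the sign of the rotation about the central bond.
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four points chosen so that the two planes are perpendicular.
    print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```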

Specific tools

Besides the database, the KNOTTIN website is also a platform that gathers, under the ‘Tools’ menu, software dedicated to the analysis of knottin proteins. These computational methods are Knoter1D and Knoter3D (7), which are aimed at identifying knottins based on their 1D sequences and 3D structures, respectively. The third section of the ‘Tools’ menu is a portal to our Knoter1D3D web server (8) for the prediction of knottin 3D structures based on their sequence. The KNOTTIN website also provides access to statistics about the database content, citations and web traffic. General information about knottins and the database usage is also available for users. Finally, the ‘Links’ menu contains hyperlinks to knottin-related web resources, such as the Cyclotide Webpage (www.cyclotide.com), CyBase (www.cybase.org.au), Tox-Prot (www.uniprot.org/program/Toxins), MvirDB (9) and ConoServer (10,11), as well as to other protein databases and web servers.

PRACTICAL USE

Over the past decade, KNOTTIN has been cited by multiple articles and reviews. In most cases, our database is cited either as a source of information on knottins, or when introducing knottin proteins and their structural characteristics, various functions, or presence in a wide range of species. Numerous citations of KNOTTIN are also related to the families of knottins defined in the database, to which authors refer when identifying or quantifying knottins of interest. This shows that our database, throughout the years, has successfully served as a useful overview of the field of inhibitor cystine knots. In addition to the data stored in KNOTTIN, the computational tools available on the website have also been cited, in particular Knoter1D and Knoter3D (for example in (12–14)). Our database has also been cited by three patents, including one that describes cystine knot peptides engineered for anti-thrombotic therapies (15) and suggests using the KNOTTIN conformation search to determine folding patterns.

NEW FEATURES

Since its launch, the KNOTTIN database has seen its size grow (from 385 sequences in 2004 to 3320 in 2017, and from 85 to 214 native structures), as the number of available sequences in UniProt and native structures in the PDB increased. However, the number of experimentally determined structures of knottins remains relatively limited compared to the number of sequences, due to the difficulties related to the purification and crystallography of these proteins. This lack of available structures is a critical concern for knottin-based drug design, which relies mainly on the study of 3D structures. To overcome this issue, one of the main features of this upgraded version of KNOTTIN is the systematic and automatic modeling, and inclusion in the database, of theoretical models produced with our aforementioned Knoter1D3D tool. This addition of predicted structures for every knottin sequence has increased by one order of magnitude the size of our database in terms of structures, which currently contains >3000 theoretical models. These theoretical models constitute a valuable source of structural data, especially for knottin families for which there is no experimental structure available, such as the bacterial knottins. The prediction of knottin structures has been integrated as a step in the pipeline for generating the data of KNOTTIN (Figure 1). When a protein sequence from UniProt is identified as a knottin by Knoter1D, it is used as an input for Knoter1D3D to produce theoretical models, which are added to the database along with related UniProt data (such as sequence, descriptor, species, tissue, authors, PMID and keywords) and a multi-sequence alignment with other knottins of the same family computed with ClustalW (16). The rest of the KNOTTIN pipeline concerns the detection and the subsequent addition of native knottin structures to the database.
Thus, when a knottin structure is identified in the PDB by Knoter3D, the coordinate file is ‘standardized’: each residue is renumbered so that the knotted cysteines (except Cys IV) correspond to positions 20, 40, 60, 80 and 100; the coordinates are also reoriented by superimposition of the knotted cysteines onto those of the structure of the squash seed trypsin inhibitor (PDB code: 2btcI). These coordinate files are added to the database, along with data regarding structural properties (such as torsion angles, secondary structures and solvent accessibility) computed with STRIDE (17) and PDBgeo (described in (18)).
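The renumbering step can be pictured with the minimal Python sketch below. Only the anchor positions (20, 40, 60, 80 and 100 for Cys I, II, III, V and VI) come from the text; the way the remaining residues are numbered here, consecutively from the nearest preceding anchored cysteine and counting back from 20 before Cys I, is an assumption made for illustration. The function name `standardized_numbering` is hypothetical and does not reflect the actual Knoter3D implementation.

```python
# Anchor numbers for knot-forming cysteines I, II, III, V and VI, keyed by
# their rank (0-5) among the six cysteines. Cys IV (rank 3) has no anchor.
ANCHORS = {0: 20, 1: 40, 2: 60, 4: 80, 5: 100}


def standardized_numbering(n_residues, knot_cys):
    """Return one standardized residue number per residue.

    n_residues: chain length.
    knot_cys: 0-based indices of the six knot-forming cysteines, in order.

    Assumption (not from the paper): residues are numbered consecutively
    from the nearest preceding anchored cysteine; residues before Cys I
    count backwards from 20.
    """
    if len(knot_cys) != 6 or list(knot_cys) != sorted(set(knot_cys)):
        raise ValueError("knot_cys must hold six strictly increasing indices")
    if knot_cys[0] < 0 or knot_cys[-1] >= n_residues:
        raise ValueError("cysteine index outside the chain")
    anchored = [(knot_cys[rank], number) for rank, number in sorted(ANCHORS.items())]
    numbers = []
    for i in range(n_residues):
        # Last anchored cysteine at or before residue i, else Cys I.
        before = [(idx, num) for idx, num in anchored if idx <= i]
        idx, num = before[-1] if before else anchored[0]
        numbers.append(num + (i - idx))
    # Under this scheme a segment of 20 or more residues would run into
    # the next anchor, so such inputs are rejected.
    if any(b <= a for a, b in zip(numbers, numbers[1:])):
        raise ValueError("a segment is too long for this numbering scheme")
    return numbers


if __name__ == "__main__":
    # Toy chain of 27 residues; cysteine indices are illustrative only.
    print(standardized_numbering(27, [1, 8, 14, 18, 20, 25]))
```

A fixed numbering of this kind lets equivalent cysteines be compared across structures without realigning them, which is presumably why the anchors are spaced widely enough to leave room for loops of different lengths.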

Figure 1.

Flowchart describing how the KNOTTIN database is generated. The Knoter1D and Knoter3D processes were defined in the previous release of KNOTTIN (7); the Knoter1D3D process is also described in our previous work (8). The UniProt data are automatically extracted from the corresponding UniProt web pages by a Perl script.

It should be noted that predicted structures of knottins can also be found in other databases of protein theoretical models (such as SWISS-MODEL (19) and ModBase (20)), but these data do not match the models from KNOTTIN, either quantitatively or qualitatively. Indeed, according to the Protein Model Portal (www.proteinmodelportal.org), which contains models from SWISS-MODEL and ModBase, there are only 468 predicted structures of knottins in these two databases. The much greater number of models in KNOTTIN is explained by the fact that our Knoter1D3D comparative modeling procedure can accurately predict knottin structures when the template-to-sequence identity is as low as 10% (8), which is rather common among knottins. Moreover, our use of a modeling method optimized for knottins necessarily improves the quality of the theoretical models, compared to those from other databases generated with regular comparative modeling procedures, which are not adapted to the particular structural features of knottins.

This new version of the KNOTTIN database also comes with several technical improvements. The whole web interface has been redesigned with the aim of being more user-friendly and compatible with modern devices. Notably, native and predicted structures can now be visualized on the website thanks to the JavaScript-based molecular viewer JSmol (21), which does not require users to have Java installed. By default, structures are displayed as ‘cartoon’ representations, with each knottin loop colored differently. The numbering of the cysteines I to VI is displayed, and the disulfide bonds (‘SS’) are represented as ‘balls and sticks’ and colored differently, depending on whether they are knotted or not. The regular secondary structure elements are hidden by default, but users can change our graphical presets by right-clicking the viewer (Figure 2), as they would do with a local molecular viewer. This interactivity of JSmol also allows saving the session with the current parameters, as well as exporting the molecule as an image.

Figure 2.

Visualization of the structural superimposition of three native structures of knottins belonging to the ‘Agouti-related’ family (PDB codes: 1hykA, 1mr0A and 1y7jA). A right click on the JSmol viewer allows users to modify the representations of structures, or to perform other actions.

Finally, some other improvements have been made to the KNOTTIN website, such as a page in the ‘Statistics’ menu dedicated to the articles that cite KNOTTIN, and an updated version of the multi-sequence alignment viewer MView (22). The possibility of downloading the whole database content (sequences, native and predicted structures, and multi-sequence alignments) as a compressed archive is also a new feature. It has been implemented with the aim of being useful for researchers wanting to carry out statistical studies about knottins. These data can also be used as training or benchmark datasets in the development of new computational methods dedicated to knottin proteins. Under the same ‘Data’ menu, users can now contribute to the maintenance and update of the database, by proposing either a new protein sequence (optionally with additional information or a coordinate file) or a published article about knottins. This new functionality is provided through web forms that users can fill out; the input is then manually verified before being integrated into the database.

CONCLUSIONS AND PERSPECTIVES

This new version of the KNOTTIN website differs from the former one by the updated content of its database, as well as by its new interface and the inclusion of new data types. The KNOTTIN database now contains more than 3000 sequences of knottins, and has greatly extended its reach with the addition of predicted structures for all of these sequences. To cope with the daily increase in the number of sequences in the UniProt database, future efforts will be put into the full automation of the update pipeline. Regarding the latest data, it is interesting to observe that, while sequences have been found in animals, plants, fungi, bacteria and viruses, knottins are still absent from archaea, which agrees with previous findings (13). Therefore, particular attention will be paid to new data about these organisms, which represent one of the three domains of life. Finally, KNOTTIN is also a web server providing user-friendly access to our knottin-specific tools. Following this direction, the platform will integrate the future computational methods we will develop for the analysis of knottin proteins.

20 in total

1. Two randomized trials of linaclotide for chronic constipation.

Authors: Anthony J Lembo; Harvey A Schneier; Steven J Shiff; Caroline B Kurtz; James E MacDougall; Xinwei D Jia; James Z Shao; Bernard J Lavins; Mark G Currie; Donald A Fitch; Brenda I Jeglinski; Paul Eng; Susan M Fox; Jeffrey M Johnston
Journal: N Engl J Med Date: 2011-08-11 Impact factor: 91.245

2. Multiple sequence alignment using ClustalW and ClustalX.

Authors: Julie D Thompson; Toby J Gibson; Des G Higgins
Journal: Curr Protoc Bioinformatics Date: 2002-08

3. MView: a web-compatible database search or multiple alignment viewer.

Authors: N P Brown; C Leroy; C Sander
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

Review 4. Cystine-knot peptides: emerging tools for cancer imaging and therapy.

Authors: Shelley E Ackerman; Nicolas V Currier; Jamie M Bergen; Jennifer R Cochran
Journal: Expert Rev Proteomics Date: 2014-08-28 Impact factor: 3.940

5. ConoServer: updated content, knowledge, and discovery tools in the conopeptide database.

Authors: Quentin Kaas; Rilei Yu; Ai-Hua Jin; Sébastien Dutertre; David J Craik
Journal: Nucleic Acids Res Date: 2011-11-03 Impact factor: 16.971

6. CyBase: a database of cyclic protein sequence and structure.

Authors: Jason P Mulvenna; Conan Wang; David J Craik
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

7. SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information.

Authors: Marco Biasini; Stefan Bienert; Andrew Waterhouse; Konstantin Arnold; Gabriel Studer; Tobias Schmidt; Florian Kiefer; Tiziano Gallo Cassarino; Martino Bertoni; Lorenza Bordoli; Torsten Schwede
Journal: Nucleic Acids Res Date: 2014-04-29 Impact factor: 16.971

8. KNOTTIN: the knottin or inhibitor cystine knot scaffold in 2007.

Authors: Jérôme Gracy; Dung Le-Nguyen; Jean-Christophe Gelly; Quentin Kaas; Annie Heitz; Laurent Chiche
Journal: Nucleic Acids Res Date: 2007-11-19 Impact factor: 16.971

9. ModBase, a database of annotated comparative protein structure models and associated resources.

Authors: Ursula Pieper; Benjamin M Webb; Guang Qiang Dong; Dina Schneidman-Duhovny; Hao Fan; Seung Joong Kim; Natalia Khuri; Yannick G Spill; Patrick Weinkam; Michal Hammel; John A Tainer; Michael Nilges; Andrej Sali
Journal: Nucleic Acids Res Date: 2013-11-23 Impact factor: 16.971

10. Dramatic expansion of the black widow toxin arsenal uncovered by multi-tissue transcriptomics and venom proteomics.

Authors: Robert A Haney; Nadia A Ayoub; Thomas H Clarke; Cheryl Y Hayashi; Jessica E Garb
Journal: BMC Genomics Date: 2014-06-11 Impact factor: 3.969
