
KNOTTIN: the knottin or inhibitor cystine knot scaffold in 2007.

Jérôme Gracy, Dung Le-Nguyen, Jean-Christophe Gelly, Quentin Kaas, Annie Heitz, Laurent Chiche.

Abstract

The KNOTTIN database provides standardized information on the small disulfide-rich proteins with a knotted topology called knottins or inhibitor cystine knots. Static pages present the essential historical or recent results about knottin discoveries, sequences, structures, syntheses, folding, functions, applications and bibliography. New tools, KNOTER3D and KNOTER1D, are provided to determine or predict if a user query (3D structure or sequence) is a knottin. These tools are now used to automate the database update. All knottin structures and sequences in the database are now standardized according to the knottin nomenclature based on loop lengths between knotted cysteines, and to the knottin numbering scheme. Therefore, the whole KNOTTIN database (sequences and structures) can now be searched using loop lengths, in addition to keyword and sequence (BLAST, HMMER) searches. Renumbered and structurally fitted knottin PDB files are available for download as well as renumbered sequences, sequence alignments and logos. The knottin numbering scheme is used for automatic drawing of standardized two-dimensional Colliers de Perles of any knottin structure or sequence in the database or provided by the user. The KNOTTIN database is available at http://knottin.cbs.cnrs.fr.

Substances：
Proteins
Cysteine

Year: 2007 PMID: 18025039 PMCID: PMC2238874 DOI: 10.1093/nar/gkm939

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

KNOTTINS: SMALL DISULFIDE-RICH PROTEINS WITH A KNOTTED ARRANGEMENT

The knottins are fascinating miniproteins present in many species and featuring various biological actions such as toxic, inhibitory, anti-microbial, insecticidal, cytotoxic, anti-HIV or hormone-like activity (1). They share a unique knotted topology of three disulfide bridges, with one disulfide penetrating through a macrocycle formed by the two other disulfides and the inter-connecting peptide backbone. This scaffold was first discovered in 1982 in PCI, a carboxypeptidase inhibitor from potato (2). It has since been observed in a large number of unrelated protein families including, e.g. toxins from plants, bugs, molluscs or arachnids, or anti-microbials from plants, insects or arthropods. The KNOTTIN database provides standardized data on sequences, structures and other information on known knottins (3). Proteins sharing this scaffold have been referred to as knottins (4) or inhibitor cystine knots (ICK) (5), or even simply as cystine knots. The most populated knottin families are conotoxins (389 sequences), spider toxins (257 sequences) and cyclotides (105 sequences). The cyclotides are head-to-tail cyclized knottins present in plants from the Violaceae and Rubiaceae [see www.cyclotide.com (6)]. These miniproteins are considered natural combinatorial peptide libraries structurally constrained by the knottin scaffold (7,8), in which hypermutation of essentially all residues is permitted, with the exception of the strictly conserved cysteines of the knot. The main knottin features are therefore a remarkable stability due to the cystine knot, a small size making them readily accessible to chemical synthesis, and an excellent tolerance to sequence variations. Knottins thus appear as appealing leads or frameworks for peptide drug design (1,9–18). One knottin has come to market for the treatment of chronic pain (Prialt from Elan Corp., http://www.elan.com) and others are on the way. Companies have emerged that plan to use knottins as leads or scaffolds in drug design (e.g. see the ‘Microbody’ technology, i.e. generation of drug candidates based on knottins, at www.nascacell.de). The new developments in the field have prompted us to improve the KNOTTIN database content. Moreover, following collaboration with the Swiss Institute of Bioinformatics, UniProtKB/Swiss-Prot entries (19) are now annotated with knottin structural information. Relevant entries can be retrieved with the newly introduced keyword ‘Knottin’.

THE KNOTTIN DATABASE IN 2007

The first database release (3) contained 11 protein families, 85 three-dimensional (3D) structures and 385 sequences (1D). The content has more than doubled in the current release, which was built from UniProt release 11.3 [10 July 2007; UniProtKB/Swiss-Prot 53.3 and UniProtKB/TrEMBL 36.3 (19)] using new automated tools (see below). The KNOTTIN database now contains 28 families, 145 3D structures and 1066 protein sequences. Table 1 displays an overview of the current release content.

Table 1.

Statistics on the current KNOTTIN database content

Family	Cys IV	1D^a	NMR^b	X-ray^c	Organisms (1D^a)	Organisms (3D^a)
All		1066	126	14	299	61
Agouti-related	61	95	4		75	2
Alpha-amylase inhibitor	61	1	2	1	1	1
Bug	61	3	2		3	2
Carboxypeptidase inhibitor	77	13	1	1	4	1
Conotoxin1	61	389	26		52	10
Conotoxin2	78	4	1		4	1
Conotoxin3	77	1			1
Cyclotide	78	105	20		20	8
Fungi1	77	4			4
Fungi2	61	2			2
Gurmarin like	61	1	2		1	1
Horseshoe crab	61	4	3		1	1
Insect antimicrobial	61	13	1		6	1
Phenoloxidase inhibitor	61	5			4
Plant antimicrobial	61	6	1		3	1
Plant defensin	63	2	1		1	1
Plant toxin	78	30	2		17	2
Scorpion1	63	14	2		7	2
Scorpion2	61	5	2		4	2
Scorpion3	63	25	2		6	1
Spider	61	257	44		47	15
Sponge	61	1			1
Terebra	61	1			1
Serine protease inhibitor1	78	35	8	12	16	8
Serine protease inhibitor2	61	4			2
Trematoda	61	15			1
Virus1	61	22	2		6	1
Virus2	61	9			9

^a 1D refers to the sequence database, 3D to the structure database.

^b Number of NMR structures. ^c Number of X-ray structures.

Several families have no 3D structures included in the database. For two of them, Fungi1 and Sponge, 3D structures have been reported that unambiguously classify these proteins as knottins (20,21), although these structures were not deposited into the Protein Data Bank (22). Seven further families were included even though no 3D structure has yet been reported. These families, Conotoxin3, Fungi2, Phenoloxidase inhibitor, Serine protease inhibitor2, Terebra, Trematoda and Virus2, were selected for inclusion because all of them displayed good similarity with known knottins according to the KNOTER1D prediction tool (see below). Some of them were moreover previously predicted as knottins [Phenoloxidase inhibitor (23)], have known disulfide bridges [Serine protease inhibitor2 (24)], were classified by others in protein families already included in the KNOTTIN database (Conotoxin3 is classified in the conotoxin-P family; Virus2 is classified in the conotoxin-O superfamily) or were annotated as knottins by the UniProt depositors (Trematoda). Finally, the Fungi2 and the Terebra families were included purely on the basis of the KNOTER1D predictions and should be viewed with caution. Other improvements of the KNOTTIN website include the availability of alignments in various text formats for download and of sequence logos (25). The logo for the entire database content is shown in Figure 1 and the percent identity between knottins in Figure 2.

Figure 1.

Sequence logo (25) for the alignment of all standardized sequences in the KNOTTIN database. The alignment is truncated between the first and the last cysteine of the knot (standard positions 20 and 100, respectively).

Figure 2.

Frequencies of percent identity observed when comparing each knottin with all other knottins (plain line) or with other knottins in the same family (dashed line).

The textual content of the database (menus ‘Functions’, ‘Folding’, ‘Synthesis’, ‘Modeling & drug design’, ‘Landmarks’, ‘References’, ‘Links’) has been regularly updated. Its display has been improved and reorganized, and the cited references are now included in the database, thus allowing keyword searches. It is not intended to provide exhaustive data on knottins, but rather to provide a permanent, freely available review of essential data on knottins. The KNOTTIN database is freely available at http://knottin.cbs.cnrs.fr or http://knottin.com, but we request that this article be cited when using the KNOTTIN database in research projects.

AUTOMATED DETECTION OF KNOTTINS: KNOTER3D AND KNOTER1D

Previous updates of the KNOTTIN database involved numerous manual steps, e.g. manual inspection of BLAST (26), PSI-BLAST (27) or HMMER (28) search results for each knottin family. To (i) facilitate and speed up the database update, (ii) reduce unavoidable errors due to manual steps and (iii) provide prediction tools for new sequences or structures, we have implemented two new algorithms named KNOTER3D and KNOTER1D that largely automate knottin detection.

Given a protein structure Str1, KNOTER3D first searches for the presence of three disulfide bridges with the I–IV, II–V, III–VI connectivity. If present, the Str1 protein is renumbered such that cysteines I, II, III, V and VI have numbers 20, 40, 60, 80 and 100, respectively. Then the structural core of Str1, i.e. the cystine-stabilized beta-sheet motif (29) (renumbered residues 40, 60–61, 79–81 and 99–100), is superimposed onto the corresponding motif of a reference knottin structure (CPTI-II, PDB ID: 2btcI), and an RMSD below 2.5 Å indicates that the structure Str1 is a knottin.

The automated detection of knottins from their amino acid sequence alone is more difficult. The KNOTER1D prediction tool is based on sequence similarity search, cysteine position analysis and, when possible, database annotation mining if a close homolog of the considered protein is available in the UniProt database. Given a protein sequence Seq1 whose knottin status is unknown, the prediction algorithm searches the KNOTTIN database for sequences similar to Seq1 using BLAST, then compares Seq1 with each similar knottin sequence found, Knot_Seq2, and its associated knottin family, Knot_Fam2, until the similarity score defined below is above a predefined cutoff. The similarity between Seq1 and Knot_Seq2 is measured using a composite score integrating the following sequence analysis data:

S is a sequence similarity score. It is the logarithm of the P-value of the best local pair-wise alignment between Seq1 and Knot_Seq2 detected by BLAST.

C is a score related to knotted cysteines. It is the number of query cysteines that correspond to knotted cysteines when aligning Seq1 onto the multiple sequence alignment of family Knot_Fam2 using CLUSTALW.

L is a loop compatibility score. It is computed using a first-order hidden Markov model based on inter-cysteine loop-length frequencies observed in family Knot_Fam2.

Furthermore, if Uni_Seq1, the UniProt database entry whose amino acid sequence is the most similar to Seq1 according to BLAST, shares more than 80% sequence identity with Seq1, then the previously defined scores S, C and L are complemented with the following database-mining pieces of information. The information can also be provided directly to KNOTER1D.

T is a taxonomic score. A specific and a general taxonomic group are defined for each knottin family, e.g. ‘Cucurbitaceae’ and ‘Viridiplantae’, respectively, for the ‘Serine protease inhibitor1’ family. Then, T = 2 if the Uni_Seq1 species belongs to the specific group of the knottin family Knot_Fam2, T = 1 if the Uni_Seq1 species belongs to the general group only and T = 0 if the Uni_Seq1 species does not belong to the general taxonomic group of the knottin family Knot_Fam2.

F is a functional score. A list of functions is defined for each knottin family. Then F = 1 if the Uni_Seq1 entry function annotation is compatible with any function of the knottin family Knot_Fam2 (e.g. ‘Protease inhibitor’ for the Serine protease inhibitor1 family).

K is a score based on keywords. A list of keywords was established from data in the DE, KW, DR, FT and CC fields of knottins and other disulfide-rich proteins contained in a disulfide-rich subset of the UniProt/Swiss-Prot database. Associated weights were empirically set to positive or negative values according to the ratio between knottin and non-knottin proteins in the subset of UniProt/Swiss-Prot entries matching the considered keyword (e.g. the keyword ‘violacin’, which matches knottin entries only, has a weight set to +6, while the keyword ‘egf’, which defines non-knotted cysteine-rich modules, has a weight set to –8).

Otherwise, if Uni_Seq1 is missing in UniProt, i.e. there is no close homolog of the query sequence in UniProt, then the scores T, F and K are set to 0. A composite knottin similarity score is then built as the weighted sum

TS = wS·S + wC·C + wL·L + wT·T + wF·F + wK·K

A subset of UniProt/Swiss-Prot+TrEMBL containing only disulfide-rich proteins [SS(UniProt/SP-Tr), 137 875 proteins] has been generated. The weights wS, wC, wL, wT, wF, wK and the score cutoff for knottin prediction were then optimized against SS(UniProt/SP-Tr), in order to maximize the discrimination between knottins and non-knottins according to the current release of the KNOTTIN database. Based on the TS composite score, KNOTER1D predicts the query sequence Seq1 as a knottin, a putative knottin, or a non-knottin. For a knottin or putative knottin, KNOTER1D provides:

A multiple alignment of the query sequence onto the closest knottin family.

The sequence numbers of the six cysteines predicted to be involved in the knot.

A renumbered alignment according to the standard knottin numbering.

The database update is now essentially based on KNOTER3D and KNOTER1D. All new structures deposited in the Protein Data Bank (22) since the last update are submitted to the KNOTER3D tool, and newly detected knottins are integrated in the database. In addition, each sequence in SS(UniProt/SP-Tr) is sent to KNOTER1D and summary results are compiled in a list containing one line per sequence, ranked by decreasing values of the composite score TS. Each protein line indicates the TS score, the query accession number, the query knottin family if it is already in the database, the most similar (hit) knottin ID and family, the BLAST score for the query/hit alignment, the T, F, C and L scores, the total number of cysteines in the query, the length of the query and the keywords/weights used (K score). This file is first manually annotated to select or reject new sequences to be included in the KNOTTIN database, and then submitted to Perl scripts that generate the updated ‘sequence’ section of the database. This mostly automated protocol significantly improves the sensitivity and reliability of knottin detection. It also helped in discovering new putative knottin families (Table 1).
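The way the six KNOTER1D scores combine into TS can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the article does not give the optimized weights or cutoffs, so the values of `HYPOTHETICAL_WEIGHTS`, `knottin_cutoff` and `putative_cutoff` below are placeholders, and the names `KnottinScores`, `taxonomic_score`, `composite_score` and `classify` are invented for illustration rather than taken from the KNOTER1D code. The sign convention for S is also left open, since the text only says it is the logarithm of the BLAST P-value.

```python
from dataclasses import dataclass


@dataclass
class KnottinScores:
    S: float        # sequence similarity score (sign convention assumed)
    C: int          # query cysteines aligned to knotted cysteines
    L: float        # loop-length compatibility score
    T: int = 0      # taxonomic score; 0 when no close UniProt homolog
    F: int = 0      # functional score; 0 when no close UniProt homolog
    K: float = 0.0  # keyword score; 0 when no close UniProt homolog


# Placeholder weights: the optimized values are not given in the article.
HYPOTHETICAL_WEIGHTS = {"S": 1.0, "C": 1.0, "L": 1.0, "T": 1.0, "F": 1.0, "K": 1.0}


def taxonomic_score(lineage, specific_group, general_group):
    """T = 2 for the family's specific group, 1 for the general group only, else 0."""
    if specific_group in lineage:
        return 2
    if general_group in lineage:
        return 1
    return 0


def composite_score(scores, weights=HYPOTHETICAL_WEIGHTS):
    """Weighted sum TS = wS*S + wC*C + wL*L + wT*T + wF*F + wK*K."""
    return sum(weights[name] * getattr(scores, name) for name in "SCLTFK")


def classify(ts, knottin_cutoff=20.0, putative_cutoff=10.0):
    """Map TS to one of the three KNOTER1D outcomes (cutoffs are placeholders)."""
    if ts >= knottin_cutoff:
        return "knottin"
    if ts >= putative_cutoff:
        return "putative knottin"
    return "non-knottin"


# Illustrative query with a close UniProt homolog from Cucurbitaceae.
t = taxonomic_score(["Viridiplantae", "Cucurbitaceae"], "Cucurbitaceae", "Viridiplantae")
query = KnottinScores(S=5.0, C=6, L=2.0, T=t, F=1, K=6.0)
print(classify(composite_score(query)))
```

Leaving T, F and K at their default of 0 reproduces the case where no close homolog exists in UniProt, so the prediction then rests on S, C and L alone.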

KNOTTIN STANDARDIZATIONS

The knottin standardization relies only on the detection of the six cysteines of the knot (1,3). Briefly, the cysteines I, II, III, V and VI involved in the knot are renumbered 20, 40, 60, 80 and 100. The cysteine IV, which has different spatial locations in various families, receives a number in the range from 61, e.g. in ‘Conotoxin1’ and in ‘Spider toxins’, to 78, e.g. in ‘Serine protease inhibitor1’ (Table 1). Although this is straightforward for knottins containing only six cysteines, it can become rather cumbersome for knottins containing more than six cysteines. The only way to rigorously and systematically determine the knotted cysteines is from 3D structure analyses. This is done by the KNOTER3D tool available in the Tool menu. However, the standardization can then be transferred by similarity to any knottin sequence. This is now done by the KNOTER1D program (see above), which provides the knottin standardization for any sequence predicted to be a putative knottin. In previous releases, only the 3D structures were standardized. Extension of the knottin standardization to all sequences in the current release has two main advantages: The sequence database can now be searched or sorted using criteria based on loop lengths. This makes it easy, for example, to establish exhaustive statistics on loop lengths across different unrelated families. Such statistics are displayed in Figure 3 and now appear on the database homepage.

Figure 3.

Statistics on loop lengths in the KNOTTIN database. Loop labels follow the knottin nomenclature (3).

Standardized alignments are easily produced in which spatially conserved cysteines of the knot can be recognized (numbers 20, 40, 60, 80, 100) and are correctly aligned. Without such standardization, automatic alignments of non-homologous knottins are likely to be wrong. An example of a standardized alignment is shown in Figure 4 for knottins selected in the ‘Insect anti-microbial’ family.
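The loop-length criteria mentioned above can be sketched in a few lines of Python. This is a minimal sketch, not the database's own code: it assumes the six knotted cysteines are already known (from KNOTER3D or KNOTER1D), that loops ‘a’ to ‘e’ are the segments between consecutive knotted cysteines I to VI in that order, and it ignores the cyclic linker ‘[f]’. The function names and the toy entries are hypothetical.

```python
LOOP_LABELS = ("a", "b", "c", "d", "e")  # assumed order: I-II, II-III, III-IV, IV-V, V-VI

# Disulfide connectivity of the knot, as cysteine ranks (I=1 ... VI=6).
KNOT_CONNECTIVITY = {(1, 4), (2, 5), (3, 6)}


def has_knot_connectivity(knot_cys, bridges):
    """True if the bridges pair the six cysteines as I-IV, II-V, III-VI.

    knot_cys: six sequence positions in increasing order.
    bridges: iterable of (position, position) disulfide pairs.
    """
    rank = {pos: i + 1 for i, pos in enumerate(knot_cys)}
    try:
        found = {tuple(sorted((rank[p], rank[q]))) for p, q in bridges}
    except KeyError:
        return False  # a bridge involves a cysteine outside the six given
    return found == KNOT_CONNECTIVITY


def loop_lengths(knot_cys):
    """Number of residues between consecutive knotted cysteines."""
    pos = list(knot_cys)
    if len(pos) != 6 or any(x >= y for x, y in zip(pos, pos[1:])):
        raise ValueError("expected six strictly increasing positions")
    return {label: pos[i + 1] - pos[i] - 1 for i, label in enumerate(LOOP_LABELS)}


def search_by_loops(entries, **wanted):
    """Names of entries whose loop lengths match every requested value."""
    return [name for name, cys in entries.items()
            if all(loop_lengths(cys)[k] == v for k, v in wanted.items())]


# Toy entries (hypothetical positions, not taken from the database).
entries = {
    "toy_adjacent_III_IV": (1, 8, 14, 15, 21, 27),
    "toy_long_loop_c": (3, 10, 16, 22, 24, 30),
}
print(loop_lengths(entries["toy_adjacent_III_IV"]))
print(search_by_loops(entries, c=0))
```

A loop ‘c’ of length zero in this sketch corresponds to cysteine IV being adjacent to cysteine III, which is the case the standardization renumbers as 61.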

Figure 4.

Standard alignment of selected knottins in the ‘Insect anti-microbial’ family. Cysteines numbered 20, 40, 60, 80 and 100 correspond to cysteines I, II, III, V and VI, respectively. The drop-down menu provides access to various alignment formats useful for data downloads. It also allows the creation of a logo from the alignment and the transfer of the alignment to the PAT webserver (30).

As shown in Figure 3, not all loops display similar length variability. Loops ‘b’, ‘(d)’ and ‘e’ display rather normal distributions, although loop ‘b’ tends to be shorter and loop ‘e’ longer (it is worth noting that the loop ‘e’ hairpin often displays an additional stabilizing disulfide bridge between β-strands, see positions 82 and 98 in Figure 4). Loop ‘[f]’ is the C-to-N linker in cyclic knottins, mainly cyclotides, and is shown as zero length for acyclic knottins. In contrast, loops ‘(a)’ and ‘c’ display strongly biased lengths. Loop ‘c’ of zero length, i.e. cysteine IV is adjacent to cysteine III and is thus renumbered as 61 (Table 1), is present in Conotoxin1 and spider toxins, the two largest knottin families. Loop ‘(a)’ also displays a striking distribution biased toward lengths 3 and 6. These length biases remain to be explained. The renumbered alignment in Figure 4 shows the new drop-down menu, which is useful to display text-formatted data, but also to draw sequence logos or to send the alignment to the PAT webserver (30) for more powerful sequence analyses. The KNOTER1D and KNOTER3D tools are available through the ‘Tool’ menu of the KNOTTIN database and through the ‘Sequence similarity search’ and ‘Tertiary structure analysis’ menus of the PAT webserver (30). The PAT webserver allows submission of protein sequences or structures in many formats including PDB or SwissProt IDs, and combinations with many other tools, while the KNOTTIN database provides ‘Collier de Perles’ standardized two-dimensional representations. The standardization of all knottin sequences provided in the current release is the first main step toward rigorous 3D modeling of all knottins. Reconstruction of all-atom models will then consist of loop modeling of the inter-cysteine segments. A strategy for optimal reconstruction is currently being developed.

28 in total

1. The Protein Data Bank.

Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. The KNOTTIN website and database: a new information system dedicated to the knottin scaffold.

Authors: Jean-Christophe Gelly; Jérôme Gracy; Quentin Kaas; Dung Le-Nguyen; Annie Heitz; Laurent Chiche
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

3. WebLogo: a sequence logo generator.

Authors: Gavin E Crooks; Gary Hon; John-Marc Chandonia; Steven E Brenner
Journal: Genome Res Date: 2004-06 Impact factor: 9.043

4. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

5. Molecular recognition between serine proteases and new bioactive microproteins with a knotted structure.

Authors: D Le Nguyen; A Heitz; L Chiche; B Castro; R A Boigegrain; A Favel; M A Coletti-Previero
Journal: Biochimie Date: 1990 Jun-Jul Impact factor: 4.079

6. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

7. The race-specific elicitor AVR9 of the tomato pathogen Cladosporium fulvum: a cystine knot protein. Sequence-specific 1H NMR assignments, secondary structure and global fold of the protein.

Authors: J Vervoort; H W van den Hooven; A Berg; P Vossen; R Vogelsang; M H Joosten; P J de Wit
Journal: FEBS Lett Date: 1997-03-10 Impact factor: 4.124

8. A common structural motif incorporating a cystine knot and a triple-stranded beta-sheet in toxic and inhibitory polypeptides.

Authors: P K Pallaghy; K J Nielsen; D J Craik; R S Norton
Journal: Protein Sci Date: 1994-10 Impact factor: 6.725

9. Refined crystal structure of the potato inhibitor complex of carboxypeptidase A at 2.5 A resolution.

Authors: D C Rees; W N Lipscomb
Journal: J Mol Biol Date: 1982-09-25 Impact factor: 5.469

10. CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering.

Authors: Conan K L Wang; Quentin Kaas; Laurent Chiche; David J Craik
Journal: Nucleic Acids Res Date: 2007-11-05 Impact factor: 16.971
