Rachael A Mansbach1, Srirupa Chakraborty1,2, Timothy Travers1,2, S Gnanakaran1.
Abstract
Conotoxins are short, cysteine-rich peptides of great interest as novel therapeutic leads and of great concern as lethal biological agents due to their high affinity and specificity for various receptors involved in neuromuscular transmission. Currently, of the approximately 6000 known conotoxin sequences, only about 3% have associated structural characterization, which leads to a bottleneck in rapid high-throughput screening (HTS) for identification of potential leads or threats. In this work, we combine a graph-based approach with homology modeling to expand the library of conotoxin structures and to identify those conotoxin sequences that are of the greatest value for experimental structural characterization. The latter would allow for the rapid expansion of the known structural space for generating high-quality template-based models. Our approach generalizes to other evolutionarily related, short, cysteine-rich venoms of interest. Overall, we present and validate an approach for venom structure modeling and experimental guidance and employ it to produce a 290%-larger library of approximate conotoxin structures for HTS. We also provide a set of ranked conotoxin sequences for experimental structure determination to further expand this library.
Keywords: conotoxins; homology modeling; network analysis; protein structure determination
Year: 2020 PMID: 32422972 PMCID: PMC7281422 DOI: 10.3390/md18050256
Source DB: PubMed Journal: Mar Drugs ISSN: 1660-3397 Impact factor: 5.118
Figure A1. Rost’s phenomenological curve (Equation (1)) of minimum percentage identity for homology modeling as a function of pairwise alignment length, with padding as employed in this work. As the length of the alignment decreases, the minimum percent identity for homology modeling increases, and there is a particularly rapid increase below alignments of about 25 amino acids, where a fairly large proportion of toxins reside.
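To make the shape of this curve concrete, the sketch below evaluates the commonly cited closed form of Rost's curve, p(L) = n + 480 · L^(−0.32 · (1 + e^(−L/1000))). It is an assumption that this form matches Equation (1) as used in this work; the offset `n`, the cap at 100%, and the function name are illustrative choices, and the padding mentioned in the caption is not modeled.

```python
import math


def rost_min_identity(length: int, n: float = 0.0) -> float:
    """Minimum percent identity for an alignment of `length` residues.

    Assumed form: n + 480 * L ** (-0.32 * (1 + exp(-L / 1000))), capped at 100.
    `n` is a hypothetical offset that shifts the whole curve up or down.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return min(100.0, n + 480.0 * length ** exponent)


# Print the threshold for a few short alignment lengths typical of toxins.
for length in (10, 15, 25, 50, 100):
    print(length, round(rost_min_identity(length), 1))
```

The steep rise for short alignments is why many short toxin sequences need near-identical templates under this criterion.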
Figure 1. Schematic of a simple graph-based algorithm for constructing a library of structural templates for homology modeling. For each connected component in the graph of sequences, where an edge represents the ability to homology model one sequence based on another, we employ a greedy approach to find a good library of template structures that cover as much of the sequence space as possible. For computation of the sequence set of interest for experimental characterization, we skip consideration of the structures and run the algorithm on the subset with structure-associated sequences removed.
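A greedy selection of this kind can be sketched as follows. This is a minimal illustration rather than the procedure used in this work: it assumes that "greedy" means repeatedly picking the structure-associated sequence with the most not-yet-covered neighbors, and the tie-breaking rule, the function names, and the toy graph are all hypothetical.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def greedy_templates(adj, with_structure):
    """Greedily pick templates that cover the most uncovered neighbors."""
    library, covered = [], set()
    for component in connected_components(adj):
        # Only sequences with a known structure can serve as templates.
        pool = sorted(component & set(with_structure))
        uncovered = set(component)
        while pool:
            best = max(pool, key=lambda s: len(adj[s] & uncovered))
            gain = adj[best] & uncovered
            if not gain:  # no remaining candidate models anything new
                break
            library.append(best)
            pool.remove(best)
            uncovered -= gain | {best}
            covered |= gain | {best}
    return library, covered


# Toy graph: an edge means one sequence can be modeled on the other.
toy_graph = {
    "A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"},
    "E": {"F"}, "F": {"E"}, "G": set(),
}
library, covered = greedy_templates(toy_graph, {"A", "D", "E", "G"})
print(library, sorted(covered))
```

In this toy case, "D" and "G" have structures but add no new coverage, so they stay out of the library. To rank sequences for experimental characterization, the same function could be run on the subgraph with structure-associated sequences removed, treating every remaining node as a candidate.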
Figure 2. Graph of conotoxins containing (A) four cysteines, (B) six cysteines, (C) eight cysteines, and (D) ten cysteines, where nodes are sequences and edges exist between sequences whose pairwise alignments are long enough and have high enough percent identity to fall above the Rost curve (Equation (1)). We show in orange the set of sequences added to the template libraries, in black the set of sequences corresponding to unselected structures, in blue the set of covered sequences that we homology model based on the templates included in the library, and in green the set of projected sequences whose structures need to be characterized so that the remaining sequences, in magenta, may be homology modeled based on some template. Nodes belonging to both the covered and projected sets are displayed as half green, half blue. The size of each node corresponds to its degree; that is, the number of other sequences that it can be modeled on or used to model. Node locations and edge lengths were chosen for ease of visualization of separate connected components. The graphs were visualized with Gephi 0.9.2 [39].
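The graph construction described in this caption can be sketched as below: an edge is added when a pairwise alignment's percent identity lies above a length-dependent threshold, and node degree then counts how many sequences each node can model or be modeled on. The threshold uses the commonly cited closed form of Rost's curve, which is assumed rather than confirmed to match Equation (1); the function names, sequence labels, and alignment values are hypothetical toy inputs.

```python
import math


def rost_threshold(length, n=0.0):
    """Assumed Rost curve: minimum percent identity at a given length."""
    exponent = -0.32 * (1.0 + math.exp(-length / 1000.0))
    return min(100.0, n + 480.0 * length ** exponent)


def build_graph(alignments, n=0.0):
    """Build an adjacency map from (seq_a, seq_b, length, identity) tuples."""
    adj = {}
    for a, b, length, identity in alignments:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        # Keep the edge only if the alignment falls above the curve.
        if a != b and identity > rost_threshold(length, n):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def rank_by_degree(adj):
    """Sort nodes by decreasing degree, breaking ties by name."""
    pairs = ((node, len(neighbors)) for node, neighbors in adj.items())
    return sorted(pairs, key=lambda item: (-item[1], item[0]))


# Toy alignments: (sequence, sequence, aligned length, percent identity).
toy_alignments = [
    ("s1", "s2", 25, 80.0),
    ("s1", "s3", 25, 50.0),
    ("s2", "s3", 15, 90.0),
    ("s3", "s4", 12, 95.0),
]
graph = build_graph(toy_alignments)
print(rank_by_degree(graph))
```

Note that the 12-residue alignment is rejected even at 95% identity, which reflects how demanding the criterion becomes for very short peptides.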
Figure A2. Graph of conotoxins containing (A) four cysteines, (B) six cysteines, (C) eight cysteines, and (D) ten cysteines, where nodes are sequences and edges exist between sequences whose pairwise alignments are long enough and have high enough percent identity to fall above the Rost curve (Equation (1)). Colors show the relative sequence lengths of each node, but the color scale of each graph is independent of the others. The size of each node corresponds to its degree; that is, the number of other sequences that it can be modeled on or used to model. Node locations and edge lengths were chosen for ease of visualization of separate connected components. The graphs were visualized with Gephi 0.9.2 [39].
List of sequences containing four cysteines in order of interest for experimental characterization, based on the degree (sequence coverage) in alignment graphs (cf. Figure 2). Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications. Node degree corresponds to the number of sequences with pairwise alignments that are long enough and have high enough percent identity to be homology modeled with the given sequence as a template. Cysteines are highlighted in red to guide the eye. We note in the fourth column the pharmacological family, although it is unknown for the majority of sequences, as it requires a separate experimental determination in most cases.
| Sequence | Name(s) | Degree | Pharm. Fam. |
|---|---|---|---|
| AAKVKYSNTPEE | Li1.28 | 10 | Unknown |
| G | Vc1.1[N9A] | 10 | alpha |
| G | MII [L15A] | 8 | Unknown |
| AALEDADMKTEKGFLSSIVGNLGTVGNLVGSV | Pu5.7 | 7 | Unknown |
| RAALEDADMKTEKGVLNAIFSNLGDLGNLVSSV | Pu5.9 | 6 | Unknown |
| AGLTDADLKTEKGFLSGLLNVAGSV | Lt5g | 6 | Unknown |
| G | MII [H9A] | 6 | Unknown |
| VPAEQMMEEL | Lt14.4 | 5 | Unknown |
| TNEGPGRDPAP | Cal5b | 5 | Unknown |
| RPE | Mr1.8 | 4 | Unknown |
| G | TxIA | 4 | alpha |
| SPGSTI | Fe14.1 | 4 | Unknown |
| G | PnIA [A10L,sTy15Y] | 4 | Unknown |
| YAAVVNRASALMAQAVLRD | Ec1.7 | 4 | Unknown |
| NGR | Ac1.1b, CnIH, R1.1, Bt1.6, Mn1.2, C4.3 | 4 | Unknown |
| NGR | Mn1.5 | 3 | Unknown |
| G | LtIA [A4S] | 3 | Unknown |
| G | MII [E11A,L15A] | 3 | Unknown |
| G | O1.3 | 3 | Unknown |
| GG | Cr1.6 | 3 | Unknown |
| GGG | Gly-AnIB | 3 | Unknown |
| DG | Eb1.1, Qc1.18 | 3 | Unknown |
| LDP | Li1.4, Sa1.12 | 3 | Unknown |
| NE | Qc1.1b, LiC22 | 3 | Unknown |
| DE | Li1.24, Sa1.6 | 3 | Unknown |
| G | Li1.11 | 3 | Unknown |
| SFRFIPGGIKEIA | G14.1 | 3 | Unknown |
| VPPEPILEII | Vc14.4 | 3 | Unknown |
| G | Su1.6 | 2 | Unknown |
| AANDKASVQIALTVQE | Dd1.7, Li1.21 | 2 | Unknown |
| TAFGLRL | Cal1b | 2 | Unknown |
| AANAKLFDVGQS | Sa1.7 | 2 | Unknown |
| TVRDA | Li1.16, Sa1.3 | 2 | Unknown |
| NLQIL | S5.3, Eb5.5 | 2 | Unknown |
| E | Cl14c | 2 | Unknown |
| GIW | Cal14.1a | 2 | Unknown |
| GMWDE | Lp1.7 | 2 | Unknown |
| GR | CnIJ | 2 | Unknown |
| IALIATRE | Co1.3 | 2 | Unknown |
| G | PeIA[A7V, S9H,V10A,N11R] | 2 | Unknown |
| DG | Qc1.7 | 2 | Unknown |
| PPG | Bu1.2 | 2 | Unknown |
| LINTR | Vc5.11 | 2 | Unknown |
| NAAANDKASDVIPLALQG | Cn1.6 | 2 | Unknown |
| G | PeIA[A7V, S9H,V10A,N11R,E14A] | 2 | Unknown |
| WDVND | FlfXIVB | 2 | Unknown |
| NGR | Mn1.4b | 2 | Unknown |
| NGR | Ac1.2 | 2 | Unknown |
| G | AuIA | 2 | alpha |
| DE | Pc1b | 1 | Unknown |
| AANLMALLQESL | Pu14.6 | 1 | Unknown |
| G | Ca1.2 | 1 | Unknown |
| FLTQQSPRDFAKSVMQLLHYNWID | Lv5.7 | 1 | Unknown |
| APAELILETI | Bt14.3 | 1 | Unknown |
| EIVNIIDSISDVAKQI | Vn5.5 | 1 | Unknown |
| E | Vc5.7 | 1 | Unknown |
| | Mr5.7 | 1 | Unknown |
| G | Su1.2 | 1 | Unknown |
| DD | PuSG1.1 | 1 | Unknown |
| APNVKDSKASGS | Li1.32 | 1 | Unknown |
| YHE | Sa1.16 | 1 | Unknown |
| G | Li1.14 | 1 | Unknown |
| G | Qc1.9 | 1 | Unknown |
| VMQLRYYNWID | Qc5.3 | 1 | Unknown |
| TG | Co1.4 | 1 | Unknown |
| SVEGVISTIKDFAVKV | Ts5.5 | 1 | Unknown |
| S | Leo-A1 | 1 | Unknown |
| S | Lp5.1 | 1 | Unknown |
| R | MI[del1G] | 1 | Unknown |
| QTPG | EIIA | 1 | Unknown |
| QG | Qc1.12 | 1 | Unknown |
| PE | Ai1.2 | 1 | Unknown |
| NIQII | Tx5.5 | 1 | Unknown |
| NAWLTPEE | |||
| DGFRRLPYR | Pu1.5 | 1 | Unknown |
| KVY | Lt5i | 1 | Unknown |
| IINW | Sr5.7 | 1 | Unknown |
| Y | SIA | 1 | alpha |
| GILELAKTV | Tx5.13, Tr5.3, Vr5.1 | 1 | Unknown |
| GG | Qc1.13 | 1 | Unknown |
| G | Cal14a | 1 | Unknown |
| GIRGN | Vt1.24 | 1 | Unknown |
List of sequences containing six cysteines in order of interest for experimental characterization, based on the degree (sequence coverage) in alignment graphs (cf. Figure 2). Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications. Node degree corresponds to the number of sequences with pairwise alignments that are long enough and have a high enough percent identity to be homology modeled with the given sequence as a template. Cysteines are highlighted in red to guide the eye. We note in the fourth column the pharmacological family, although it is unknown for the majority of sequences, as it requires a separate experimental determination in most cases.
| Sequence | Name(s) | Degree | Pharm. Fam. |
|---|---|---|---|
| LPP | RIIIJ | 29 | Unknown |
| QKGLVPSVITT | S4.4 | 28 | Unknown |
| QPWLVPSKITN | Mn4.2 | 27 | Unknown |
| DDE | MaIr137, G6.2 | 20 | Unknown |
| D | Vn6.8 | 20 | Unknown |
| V | Mi010 | 20 | Unknown |
| E | CaHr91 | 17 | Unknown |
| TVDEA | Vi6.7 | 15 | Unknown |
| E | Eb6.22 | 15 | Unknown |
| G | ABVIC | 15 | Unknown |
| TVGEE | Tr7.4 | 15 | Unknown |
| TATEE | Ar6.24 | 14 | Unknown |
| EA | M6.2 | 13 | Unknown |
| TTEE | Tr7.3 | 13 | Unknown |
| MTMG | MIL3-b (partial) | 12 | Unknown |
| VPEE | Ar6.28 | 12 | Unknown |
| DE | Ac6.2 | 12 | Unknown |
| | TxO1 | 11 | omega |
| D | MIL2-a | 11 | Unknown |
| | Mr6.8 | 11 | Unknown |
| TTAESWWEGE | Mr6.16 | 11 | Unknown |
| G | MIL3-f | 11 | Unknown |
| | Vn6.15 | 11 | Unknown |
| | Mr6.1 | 10 | Unknown |
| S | Pu6.7 | 10 | Unknown |
| G | Pn6.7 | 10 | Unknown |
| SIAGRTTTEE | Ts6.7 | 10 | Unknown |
| DG | MVIA, Cn6.1 | 10 | delta |
| N | SIIIA[del1] | 9 | Unknown |
| KTTAESWWEGE | |||
| | TsMEKL-03 | 8 | Unknown |
| RHG | TIIIA | 8 | mu |
| D | Conotoxin-1 | 8 | Unknown |
| | Cal6.1d | 8 | Unknown |
| KTTAESWWEGE | |||
| | Vn6.5 | 7 | Unknown |
| | MaIr193 | 7 | Unknown |
| | ABVIL | 7 | Unknown |
| WWEGE | Vn6.3 | 7 | Unknown |
| WWWGG | |||
| ELYRFPSRY | Vc6.26 | 7 | Unknown |
| YE | CnVIA, St6.2 | 7 | delta |
| | Lv3-IP01 | 7 | Unknown |
| V | Ar6.2 | 6 | Unknown |
| E | Cal6.1h | 6 | Unknown |
| | Eu3.2 | 6 | Unknown |
| | Co3-IP02, Ts3-IP07, Vr3-IP08, Rt3-IP03, Ca3-IP02, Ec3-IP03 | 6 | Unknown |
| | Tx6.3 | 6 | Unknown |
| | Ar6.19 | 6 | Unknown |
| FP | |||
| EADW | Cl9.4 | 6 | Unknown |
| GPP | PIIIF [Y17S,N18S,L20S] | 5 | Unknown |
| D | Da6.6, Tx6.6 | 5 | Unknown |
| | OIVA [K15N] | 5 | Unknown |
| | Ml6.2 | 5 | Unknown |
| | Ts3.1 | 5 | Unknown |
| Q | Ar6.17 | 5 | Unknown |
| YYDDYDEYYY | Mi029 | 5 | Unknown |
| S | Pu6.15 (partial) | 5 | Unknown |
| VKP | Da6.2 | 5 | Unknown |
| FAVIFT | Co6.1 | 4 | Unknown |
| WWDGE | VeG52 | 4 | Unknown |
| | Fla6.16 | 4 | Unknown |
| | G4.1 | 4 | Unknown |
| MGYILPALSQQT | Co3-D01 | 4 | Unknown |
| MKLMLSALRQQE | Lv3-YH04 | 4 | Unknown |
| S | Pu6.17 | 4 | Unknown |
| | Pn6.3 | 4 | Unknown |
| STS | SO5 | 4 | omega |
| GG | Ca6.2 | 4 | Unknown |
| | Ep6.1 | 4 | Unknown |
| | King-Kong 1 | 4 | Unknown |
| T | LvVIA 2 | 4 | Unknown |
| | M1 | 4 | Unknown |
| | PeIVA | 4 | alpha |
| ATD | Ac6.5 | 4 | Unknown |
| G | LvVID | 3 | Unknown |
| DV | |||
| GG | Cal9.1d | 3 | Unknown |
| KF | Tx3h | 3 | Unknown |
| G | Mr2 | 3 | Unknown |
| E | At6.7 | 3 | Unknown |
| WWEGD | Lt7b | 3 | Unknown |
| | SVIA | 3 | omega |
| | Pn6.5 | 3 | Unknown |
| | G6.12 | 3 | Unknown |
| | At6.2 | 3 | Unknown |
| P | P2a | 3 | Unknown |
| | Gm6.3 | 3 | Unknown |
| SKQ | Mr3.4 | 3 | Unknown |
| | PnIVB | 2 | mu |
| EIILHALGTR | Vr3-T05 | 2 | Unknown |
| | VxVIA, MgJ42 | 2 | Unknown |
| | Bt6.4, ErVIA | 2 | Unknown |
| | Mr3.8 | 2 | Unknown |
| | Eu3.3, Bt3.3 | 2 | Unknown |
| | VcVIC | 2 | Unknown |
| | Vc6.40 | 2 | Unknown |
| EIQHVHMLS | Pu6.23 | 2 | Unknown |
| | Gm3-WP04 | 2 | Unknown |
| DAINVAPGTSITRTETDQE | |||
| RSNGVPT | Di6.11 | 2 | Unknown |
| | Pu6.20 | 2 | Unknown |
| | Im9.11 | 2 | Unknown |
| | Ca3-VP01, Cp3-VP05 | 2 | Unknown |
| YWTE | Bu25 | 2 | Unknown |
| WFGHEE | RVIIA | 2 | Unknown |
| Q | Mr6.29 | 2 | Unknown |
| Q | Vc7.4 | 2 | Unknown |
| QG | MIIIA | 2 | mu |
| T | Pu6.37 | 2 | Unknown |
| S | Vc6.12 | 2 | Unknown |
| T | Pu6.30 | 2 | Unknown |
| TTSTRK | Br7.9 | 2 | Unknown |
| MTKH | ABVIE | 2 | Unknown |
| V | MrIIIF | 2 | Unknown |
| R | S3-I05 | 2 | Unknown |
| R | S3-Y01 | 2 | Unknown |
| VSIWF | Lt9a variant 2 | 2 | Unknown |
| Q | Ar6.5 | 2 | Unknown |
| STDD | Vn6.18 | 2 | Unknown |
| G | Qc3-YDG01 | 2 | Unknown |
| G | Cal6.4c | 2 | Unknown |
| G | Pu6.25 | 2 | Unknown |
| STD | Mr6.23 | 2 | Unknown |
| R | Cp3-H02 | 2 | Unknown |
| R | Bt3-I03, Vx3-I03 | 2 | Unknown |
| WWGEND | Tx7.31 | 2 | Unknown |
| G | Tx3g, Vt3-SR01 | 2 | Unknown |
| SSDEE | Mi034 | 2 | Unknown |
| S | Tx3e, Vt3-TP01, Ec3-TP01-2 | 2 | Unknown |
| T | Im6.7 | 1 | Unknown |
| T | Pc6b | 1 | Unknown |
| | TxVIIA | 1 | gamma |
| G | S1.7 | 1 | Unknown |
| TRG | Cl6.6b | 1 | Unknown |
| | Pn6.6 | 1 | Unknown |
| WREGS | TxMEKL-022/TxMEKL-021 | 1 | Unknown |
| Y | Cl6.8 | 1 | Unknown |
| Y | Vc6.25 | 1 | Unknown |
| | TxMMSK-02, Cp3-WP03, Vr3-WP04, S3-WP01, Rt3-WP01 | 1 | Unknown |
| WRVDSE | Tx7.30 | 1 | Unknown |
| | TsMMSK-021 | 1 | Unknown |
| | Lv3-D02 | 1 | Unknown |
| | Tx3-KP03 | 1 | Unknown |
| | Conotoxin-3 | 1 | Unknown |
| VQPSE | |||
| FTYGG | conkunitzin-G1 | 1 | Unknown |
| V | Mr3.16 | 1 | Unknown |
| | Lv3-V02 | 1 | Unknown |
| TRG | Cl6.10 | 1 | Unknown |
| | S3-E03 | 1 | Unknown |
| | Ts3-SGN01 | 1 | Unknown |
| S | Vn6.16 | 1 | Unknown |
| R | Cp3-V08 | 1 | Unknown |
| S | |||
| | Pu6.13 | 1 | Unknown |
| RD | PuIA | 1 | omega |
| G | MrIIIA | 1 | Unknown |
| G | Tx3-TP01 | 1 | Unknown |
| G | Cp3-D03 | 1 | Unknown |
| E | LtVIB | 1 | Unknown |
| E | Vc6.10 | 1 | Unknown |
| G | Vr3-SP01 | 1 | Unknown |
| GMWGK | TxMEKL-011, LeD51 | 1 | Unknown |
| GVWSE | G6.8 | 1 | Unknown |
| GWDTPAP | |||
| EGHYVSSHLLERQ | Cal6.3a | 1 | Unknown |
| DE | LtIIIA | 1 | iota |
| KFILHALGQWQ | Vc3.4 | 1 | Unknown |
| DD | Cal6.5a | 1 | Unknown |
| KT | Om6.6 | 1 | Unknown |
| L | Gla(3)-TxVI | 1 | Unknown |
| MQGKISSEQHPMFDPIEG | Lt3.6 | 1 | Unknown |
| D | Mi3-E04 | 1 | Unknown |
| D | Mr020 | 1 | Unknown |
| D | PIVE | 1 | kappa |
| DAMQKSKGSGS | LtVIA | 1 | Unknown |
| NPKLSKLTKT | |||
| NSGPT | LiCr95 | 1 | Unknown |
| Q | Ar6.10 | 1 | Unknown |
| Q | Tx3-L02, Vr3-L01, Vt3-L01, S3-L02 | 1 | Unknown |
| | Pu6.2 | 1 | Unknown |
| QK | CnIIIG | 1 | Unknown |
| QQ | TxMMSK-04, Vt3-EP01 | 1 | Unknown |
| R | CnIIIE | 1 | Unknown |
| R | Vr3-Y02, Vt3-Y01, Ts3-Y01 | 1 | Unknown |
| | Mr6.2 | 1 | Unknown |
| R | SxIIIA | 1 | mu |
| APWTVVTATTN | A4.4 | 1 | Unknown |
List of sequences containing eight cysteines in order of interest for experimental characterization, based on the degree (sequence coverage) in alignment graphs (cf. Figure 2). Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications. Node degree corresponds to the number of sequences with pairwise alignments that are long enough and have a high enough percent identity to be homology modeled with the given sequence as a template. Cysteines are highlighted in red to guide the eye. We note in the fourth column the pharmacological family, although it is unknown for the majority of sequences, as it requires a separate experimental determination in most cases.
| Sequence | Name(s) | Degree | Pharm. Fam. |
|---|---|---|---|
| TDV | |||
| TVKWW | Cal12.1p2 | 28 | Unknown |
| GHVP | |||
| TG | R11.10 | 18 | Unknown |
| Q | Vr15b | 14 | Unknown |
| | Cp1.1 | 10 | Unknown |
| Q | Cap15a | 9 | Unknown |
| S | |||
| RDQ | Gla-MrII, Eu12.4 | 9 | Unknown |
| SR | Em11.8 | 8 | Unknown |
| DKWGT | |||
| VLP | Mr11.1 | 6 | Unknown |
| T | Pu11.5 | 6 | Unknown |
| YDAPY | |||
| A | Cal22d | 5 | Unknown |
| GT | Tx11.3 | 5 | Unknown |
| GT | Vc11.4 | 5 | Unknown |
| RGV | |||
| YLWDKN | Cal12.2c | 4 | Unknown |
| T | Vc11.1 | 3 | Unknown |
| | Ep11.12 | 2 | Unknown |
| T | M11.2 | 2 | Unknown |
| R | Im11.14 | 1 | Unknown |
| | Vi11.5 | 1 | Unknown |
| TRSFADLPDDWGM | |||
| | Vc11.6 | 1 | Unknown |
| | Im11.1 | 1 | Unknown |
| RIIPQRRGAQLRHFF | Pu11.9 | 1 | Unknown |
| | BtX, Sx11.2 | 1 | kappa |
| | Lt11.3 | 1 | Unknown |
| | Mr15.2 | 1 | Unknown |
| D | De13b | 1 | Unknown |
| EGGYVRED | Mi045 | 1 | Unknown |
| S | Vt11.3 | 1 | Unknown |
| WPRLYDSD | |||
| SLT | Mr22.1 | 1 | Unknown |
| M | Bt11.4 | 1 | Unknown |
| ASI | Ca11.3 | 1 | Unknown |
List of sequences containing ten cysteines in order of interest for experimental characterization, based on the degree (sequence coverage) in alignment graphs (cf. Figure 2). Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications. Node degree corresponds to the number of sequences with pairwise alignments that are long enough and have a high enough percent identity to be homology modeled with the given sequence as a template. Cysteines are highlighted in red to guide the eye. We note in the fourth column the pharmacological family, although it is unknown for the majority of sequences, as it requires a separate experimental determination in most cases.
| Sequence | Name(s) | Degree | Pharm. Fam. |
|---|---|---|---|
| DRDVQD | |||
| H | Cp20.1 | 19 | Unknown |
| LH | |||
| | Lt15.6 | 5 | Unknown |
| YNRQ | |||
| VY | |||
| HNG | Vc21.1 | 2 | Unknown |
| Q | |||
| SPGKSG | Ac8.1 | 2 | Unknown |
| G | |||
| FYRG | Ca8c | 2 | Unknown |
| T | |||
| GRRA | Pu19.1 | 1 | Unknown |
| SGST | |||
| QRG | G8.3 | 1 | Unknown |
| G | |||
| WG | GVIIIA | 1 | sigma |
| G | |||
| KG | Tx8.1 | 1 | Unknown |
Figure 3. Quality of graph-based template library selection criteria. Comparison of root-mean-square deviation (RMSD) distributions from experimental structures for (A,B) structures within the libraries, with each structure modeled by selecting from all other templates within the given library (“in-library” assessment), and (C,D) structures outside the libraries modeled by selecting from all templates within the given library (“out-of-library” assessment). For each homology modeled structure, we choose the best fit to the experiment. The distributions produced by the simple 25% cutoff libraries are shown in blue; the distributions produced by using the graph-based algorithm are shown in orange. Distributions are transparent for ease of viewing.
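The "best fit to the experiment" step can be illustrated with a short sketch: for each target, keep the lowest RMSD among its candidate models, then summarize the resulting distribution. The function names are hypothetical and the RMSD values are made-up toy numbers, not results from this work.

```python
from statistics import mean, stdev


def best_rmsd_per_target(rmsds_by_target):
    """Keep the lowest RMSD (best fit to experiment) for each target."""
    return {t: min(values) for t, values in rmsds_by_target.items() if values}


def summarize(values):
    """Return (mean, sample standard deviation) of a collection of RMSDs."""
    values = list(values)
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Toy RMSDs in angstroms for models built from different templates.
toy_rmsds = {"t1": [2.1, 1.4, 3.0], "t2": [0.9, 1.2], "t3": [2.6]}
best = best_rmsd_per_target(toy_rmsds)
print(best, summarize(best.values()))
```

Running the same summary on two libraries (for example, a fixed-cutoff library and a graph-based one) would give the kind of side-by-side comparison shown in the figure.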
List of conotoxins with corresponding PDB structure IDs [55] comprising the 4C library. Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications.
| Name(s) | PDB ID | Sequence |
|---|---|---|
| EpI [sTy15>Y], EpI | 1a0m | GCCSDPRCNMNNPDYC |
| PnIB, PnIB [sTy15Y] | 1akg | GCCSLPPCALSNPDYC |
| CnIA | 1b45 | GRCCHPACGKYYSC |
| AuIB, Ac-AuIB, AuIB [ribbon isoform] | 1mxp | GCCSYPPCFATNPDC |
| ImI [R11E] | 1e74 | GCCSDPRCAWEC |
| ImI [R7L] | 1e75 | GCCSDPLCAWRC |
| ImI [D5N] | 1e76 | GCCSNPRCAWRC |
| TIA | 2lr9 | FNWRCCLIPACRRNHKKFC |
| MrIB, MrIB C-term amidated | 1ieo | VGVCCGYKLCHPC |
| EI | 1k64 | RDPCCYHPTCNMSNPQIC |
| GID, GID*, GID*-NH2, GID*[O16P] | 1mtq | IRDECCSNPACRVNNPHVC |
| SI | 1hje | ICCNPACGPKYSC |
| TXIX | 1wct | ECCEDGWCCXAAP |
| GI | 1xga | ECCNPACGRHYSC |
| Conkunitzin-S1 | 1y62 | RPSLCDLPADSGSGTKAEKRIYYNSARKQ-CLRFDYTGQGGNENNFRRTYDCQRTCL |
| PIA, PIA [R1ADMA] | 1zlc | RDPCCSNPVCTVHNPQIC |
| cMII-6 | 2ajw | GCCSNPVCHLEHSNLCGGAAGG |
| PlXIVA | 2fqc | FPRPRICNLACRAGIGHKYPFCHCR |
| GI (SER12)-benzoylphenylalanine | 2fr9 | ECCNPACGRHYYC |
| GI (ASN4)-benzoylphenylalanine | 2frb | ECCYPACGRHYSC |
| OmIA | 2gcz | GCCSHPACNVNNPHICG |
| BuIA, BuIA[P6O], BuIA[P7O] | 2ns3 | GCCSTPPCAVLYC |
| ImI [P6A] | 2ifi | GCCSDARCAWRC |
| ImI [P6K], ImI [P6K] deamidated | 2ifj | GCCSDKRCAWRC |
| ImI, ImI [C2U,C8U], ImI [C2U,C3U,C8U,C12U], ImI deamidated, A c-ImI, ImI [A9S], ImI [C3U,C12U], ImI [P60], ImI [P6APro], ImI [P6A(S)Pro], ImI [P6guaPro], ImI [P6betPro], ImI [P6fluoPro], ImI [P6fluo(S)Pro], ImI [P6phiPro], ImI [P6phi(S)Pro], ImI [P6benzPro], ImI [P6naphPro], ImI [P6phi(3S)Pro], ImI [P6phi(5R)Pro] | 2bypF | GCCSDPRCAWRC |
| CMrVIA [K6P], CMrVIA [K6P] amidated | 2ih7 | VCCGYPLCHPC |
| CMrVIA, CMrVIA amidated | 2b5p | VCCGYKLCHPC |
| Cyclic MrIA | 2j15 | NGVCCGYKLCHPCAG |
| RgIA [P6V] | 2juq | GCCSDVRCRYRCR |
| RgIA [D5E] | 2jur | GCCSEPRCRYRCR |
| RgIA [Y10W] | 2jus | GCCSDPRCRWRCR |
| RgIA | 2jut | GCCSDPRCRYRCR |
| Pc16a | 2ler | SCSCKRNFLCC |
| Midi | 2lu6 | CNCSRWARDHSRCC |
| TxIB | 2lz5 | GCCSDPPCRNKHPDLC |
| Li1.12, TxID | 2m3i | GCCSHPVCSAMSPIC |
| Ar1248 | 2m62 | GVCCGVSFCYPC |
| Lo1a | 2md6 | EGCCSNPACRTNHPEVCD |
| LvIA | 5xgl | GCCSHPACNVDHPEIC |
| Exendin-4/conotoxin chimera (Ex-4[1-27]/pl14a) | 2naw | HGEGTFTSDLSKQMEEEAVRC-FIECLKGIGHKYPFCHCR |
| Bt1.8 | 2nay | GCCSNPACILNNPNQC |
| TXIA(A10L) | 2uz6 | GCCSRPPCILNNPDLC |
| CnVA | 3zkt | ECCHRQLLCCLRFV |
| Cyclic Vc1.1 | 4ttl | GCCSDPRCNYDHPEICGGAAGG |
| GIC | 1ul2 | GCCSHPACAGNNQHIC |
| PeIA, Bt1.4, PeIA[P6O], PeIA[P13O] | 5jmeF | GCCSHPACSVNHPELC |
| Pn10.1 | 5t6v | STCCGYRMCVPC |
| LsIA, LsIA# | 5t90F | SGCCSNPACRVNNPNIC |
| VilXIVA | 6efe | GGLGRCIYNCMNSGGGLSFIQCKTMCY |
List of conotoxins with corresponding PDB structure IDs [55] comprising the 6C library. Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications.
| Name(s) | PDB ID | Sequence |
|---|---|---|
| conotoxin-GS | 1ag7 | ACSGRGSRCPPQCCMGLRCGRGNPQKCIGAHEDV |
| PIIIE, PIIIE [K9S], PIIIE [S17Y,S18N,S20L] | 1jlo | HPPCCLYGKCRRYPGCSSASCCQR |
| MVIIC, S6.6 | 1omn | CKGKGAPCRKTMYDCCSGSCGRRGKC |
| TVIIA | 1eyo | SCSGRDSRCPPVCCMGLMCSRGKCVSIYGE |
| TxVII | 1f3k | CKQADEPCDVFSLDCCTGICLGVCMW |
| TxVIA | 1fu3 | WCKQSGEMCNLLDQNCCDGYCIVLVCT |
| EVIA | 1g1z | DDCIKPYGFCSLPILKNGLCCSGACVGVCADL |
| GIIIB | 1gib | RDCCTPPRKCKDRRCKPMKCCA |
| GVIA | 1ttl | CKSPGSSCSPTSYNCCRSCNPYTKRCY |
| PIVA | 1p1p | GCCGSYPNAACHPCSCKDRPSYCGQ |
| EIVA | 1pqr | GCCGPYPNAACHPCGCKVGRPPYCDRPSGG |
| PIIIA | 1r9i | QRLCCGFPKSCRSRQCKPHRCC |
| MVIIA[R10K] | 1tt3 | CKGKGAKCSKLMYDCCTGSCRSGKC |
| Am2766 | 1yz2 | CKQAGESCDIFSQNCCVGTCAFICIE |
| MrIIIE | 2efz | VCCPFGGCHELCYCCD |
| FVIA | 2km9 | CKGTGKSCSRIAYNCCTGSCRSGKC |
| Im23a, Mr23a | 2lmz | IPYCGQTGAECYSWCIKQDLSKDWCCDFVKDIRMNPPADKCP |
| BuIIIB | 2lo9 | VGERCCKNGKRGCGRWCRDHSRCC |
| KIIIA, KIIA [W8dTrp] | 2lxg | CCNCSSKWCRDHSRCC |
| Ar1446 | 2m61 | CCRLACGLGCHPCC |
| cGm9a | 2mso | SCNNSCQSHSDCASHCICTFRGCGAVNGLP |
| cBru9a | 2msq | SCGGSCFGGCWPGCSCYARTCFRDGLP |
| Mo3964 | 2mw7 | DGECGDKDEPCCGRPDGAKVCNDPWVCILTSSRCENP |
| MfVIA | 2n7f | RDCQEKWEYCIVPILGFVYCCPGLICGPFVCV |
| cyclic PVIIA | 2n8e | CRIPNQKCFQHLDDCCSRKCNRFNKCVLPETGGG |
| conotoxin-muOxi-GVIIJ | 2n8h | GWCGDPGATCGKLRLYCCSGFCDSYTKTCKDKSSA |
| CnIIIC | 2yen | QGCCNGPKGCSSKWCRDHARCC |
| CcTx | 4b1qP | APWLVPSQITTCCGYNPGTMCPSCMCTNTC |
| Reg12i | 6bx9 | CCTALCSRYHCLPCC |
| MoVIB | 6ceg | CKPPGSKCSPSMRDCCTTCISYTKRCRKYY |
List of conotoxins with corresponding PDB structure IDs [55] comprising the 8C library. Name or names of sequences are taken from the Conoserver database [40]. Multiple names for the same sequence indicate the same sequence is produced by different species or has different post-translational modifications.
| Name(s) | PDB ID | Sequence |
|---|---|---|
| G11.1 | 6cei | CAVTHEKCSDDYDCCGSLCCVGICAKTIAPCK |
| RXIA, RXIA[Btr33>W] | 2p4l | GPSFCKADEKPCEYHADCCNCCLSGICAPSTNWILPGCSTSSFFKI |
Figure 4. Schematic of the procedure for producing homology modeled structures from library templates for conotoxin sequences with unknown structure in the covered set. We employ a BLAST alignment procedure and specifically force the cysteines to align, which further refines the templates that were originally chosen for inclusion using the graph-based Rost criterion. The graph inset, taken from the eight-cysteine graph, is an example. The inset showing an example alignment input was created using the alignment obtained from BLAST [42] and visualized with Aliview [43].
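One way to check the cysteine constraint on a candidate alignment is sketched below. It is not the pipeline used in this work: the function names are hypothetical, the gapped alignment is written by hand, and the two sequences (RgIA and ImI from the 4C library) are used only because they are short and familiar.

```python
def cysteines_aligned(target, template):
    """True if every cysteine column in one row is a cysteine in the other."""
    if len(target) != len(template):
        raise ValueError("aligned strings must have equal length")
    return all((a == "C") == (b == "C") for a, b in zip(target, template))


def percent_identity(target, template):
    """Percent identity over columns where neither row has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned strings must have equal length")
    pairs = [(a, b) for a, b in zip(target, template) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hand-written gapped alignment of RgIA against ImI ("-" marks a gap).
rgia = "GCCSDPRCRYRCR"
imi_good = "GCCSDPRCAWRC-"
imi_shifted = "-GCCSDPRCAWRC"

print(cysteines_aligned(rgia, imi_good), round(percent_identity(rgia, imi_good), 1))
print(cysteines_aligned(rgia, imi_shifted))
```

An alignment that fails the cysteine check would be rejected or realigned before being passed to a modeling tool, since misaligned cysteines would imply a different disulfide framework.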
Figure 5. Quality of modeling criteria. (A,B) Distribution of root-mean-square deviation (RMSD) for homology models compared with their corresponding experimental structures, without prior removal of any structural alignment outliers. Each experimental structure present in the library was modeled by selecting from all other templates in the library. The top three models for each structure based on combined MODELLER DOPE and PROCHECK G-FACTOR scores are considered here. (A) Distribution mean = 2.00 Å, standard deviation = 0.97 Å. (B) Distribution mean = 2.25 Å, standard deviation = 1.20 Å. (C,D) Distribution of fraction of native contacts present in each of the homology modeled structures, with respect to the experimental structure. Each experimental structure present in the library was modeled by selecting from all other templates in the library. The top three models for each structure based on combined MODELLER DOPE and PROCHECK G-FACTOR scores are considered here. (C) Distribution mean = 0.797, standard deviation = 0.108. (D) Distribution mean = 0.805, standard deviation = 0.097.
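The fraction of native contacts can be computed along the following lines. The contact definition here is an assumption, since the caption does not state one: two residues are in contact if their representative atoms (for example, Cα) lie within 8 Å and are at least three positions apart in sequence. The function names and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=8.0, min_sep=3):
    """Residue index pairs closer than `cutoff` and >= `min_sep` apart."""
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff:
            pairs.add((i, j))
    return pairs


def fraction_native_contacts(native, model, cutoff=8.0, min_sep=3):
    """Share of the native structure's contacts also present in the model."""
    if len(native) != len(model):
        raise ValueError("structures must have the same number of residues")
    native_pairs = contacts(native, cutoff, min_sep)
    if not native_pairs:
        raise ValueError("native structure has no contacts under this definition")
    model_pairs = contacts(model, cutoff, min_sep)
    return len(native_pairs & model_pairs) / len(native_pairs)


# Toy hairpin of six residues; the model displaces the last residue.
toy_native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
              (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
toy_model = toy_native[:5] + [(-6.0, 9.0, 0.0)]
print(fraction_native_contacts(toy_native, toy_model))
```

Unlike RMSD, this measure does not require superposing the two structures, which makes it a useful complementary check on short, flexible peptides.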
Figure A3. Distribution of root-mean-square deviation (RMSD) for homology models compared with their corresponding experimental structures, after refinement involving the rejection of structural alignment outliers. Each experimental structure present in the library was modeled by selecting from all other templates in the library. The top three models for each structure based on combined MODELLER DOPE and PROCHECK G-FACTOR scores are considered here. (A) Distribution mean = 1.55 Å, standard deviation = 0.92 Å. (B) Distribution mean = 1.17 Å, standard deviation = 0.67 Å.