Dominique Koua, Lucia Kuhn-Nentwig.
Abstract
Spider venoms are rich cocktails of bioactive peptides, proteins, and enzymes that have been intensively investigated over the years. To provide a better understanding of this richness, we propose a three-level family classification system for spider venom components. The classification is supported by an exhaustive set of 219 new profile hidden Markov models (HMMs) that can attribute a given peptide to its precise peptide type, family, and group. The proposed classification has the advantages of being totally independent of variable spider taxonomic names and of being easy to extend. In addition to the new classifiers, we introduce and demonstrate the efficiency of hmmcompete, a new standalone tool that monitors HMM-based family classification and, after post-processing the results, reports the best classifier when multiple models produce significant scores for a given peptide query. The combined use of hmmcompete and the new spider venom component-specific classifiers achieved 96% sensitivity in correctly classifying the known spider toxins from the UniProtKB database. These tools are timely given the growing classification needs created by the increasing number of peptides and proteins generated by transcriptomic projects.
Keywords: classification; hmmcompete; machine learning; profile HMM; spider; toxin
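The abstract states that hmmcompete reports the best classifier when several profile HMMs score significantly against the same query, but it does not spell out the post-processing rules. The sketch below only illustrates the idea under stated assumptions: hits are taken as already computed (for example by HMMER3), "significant" is assumed to mean an E-value at or below a hypothetical cutoff of 1e-3, and "best" is assumed to mean the highest bit score. The `Hit` record, the `best_classifier` function, and the peptide IDs and scores in the example are all hypothetical, not hmmcompete's actual code or output.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One profile-vs-sequence match (hypothetical record layout)."""
    query: str        # peptide identifier from the FASTA input
    model: str        # profile HMM name, e.g. "SN_10_03"
    bit_score: float
    evalue: float


def best_classifier(
    hits: Iterable[Hit], max_evalue: float = 1e-3
) -> Dict[str, Tuple[Hit, List[Hit]]]:
    """For each query, return (best hit, remaining significant hits).

    Assumptions (not taken from hmmcompete itself): a hit is significant
    when its E-value is <= max_evalue, and the best hit is the one with
    the highest bit score, ties broken by the lower E-value.
    """
    significant: Dict[str, List[Hit]] = {}
    for hit in hits:
        if hit.evalue <= max_evalue:
            significant.setdefault(hit.query, []).append(hit)

    report: Dict[str, Tuple[Hit, List[Hit]]] = {}
    for query, query_hits in significant.items():
        ranked = sorted(query_hits, key=lambda h: (-h.bit_score, h.evalue))
        report[query] = (ranked[0], ranked[1:])
    return report


if __name__ == "__main__":
    example_hits = [
        Hit("pep1", "SN_10_03", 85.2, 1e-25),
        Hit("pep1", "SN_10_12", 61.0, 3e-18),
        Hit("pep1", "SN_19_01", 5.0, 0.8),  # not significant, dropped
        Hit("pep2", "SC_03_02", 40.1, 2e-11),
    ]
    for query, (best, alternatives) in best_classifier(example_hits).items():
        print(query, best.model, len(alternatives))
```

Keeping the losing significant hits alongside the winner mirrors the kind of information the `--altpred` option is described as reporting.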
Year: 2017 PMID: 28786958 PMCID: PMC5577579 DOI: 10.3390/toxins9080245
Source DB: PubMed Journal: Toxins (Basel) ISSN: 2072-6651 Impact factor: 4.546
InterPro signatures and Pfam HMMs used for spider toxin classification and related peptide family names used in ToxProt (as of 20 April 2017).
| InterPro Signature Combination | Pfam HMM | ToxProt First Classification Level (TPL1) * | Total of Annotated Sequences in ToxProt |
|---|---|---|---|
| IPR000737; IPR011052 | PF00299 | Protease inhibitor I7 (squash-type serine protease inhibitor) family | 1 |
| IPR002110; IPR020683 | PF00023; PF12796 | Latrotoxin superfamily | 4 |
| IPR002110; IPR020683; IPR013829 | PF00023; PF12796; PF13606 | Latrotoxin superfamily | 3 |
| IPR002223 | PF00014 | Venom Kunitz-type family | 39 |
| IPR002223; IPR020901 | PF00014 | Venom Kunitz-type family | 8 |
| IPR003614 | | | 1 |
| IPR004169 | | Plectoxin superfamily (16); Spider toxin CSTX superfamily (1); No class (1) | 18 |
| IPR004214 | PF02950 | Spider toxin Tx2 family (1); Huwentoxin-1 family (1) | 2 |
| IPR005853; IPR013605 | PF08396 | Omega-agatoxin superfamily | 13 |
| IPR007733; IPR027300 | PF05039 | Spider agouti family | 1 |
| IPR008017 | PF05353 | Delta-atracotoxin family | 7 |
| IPR008197 | | Spider wap-1 family (17); Spider wap-2 family (4) | 21 |
| IPR009243 | | Beta/delta-agatoxin family | 12 |
| IPR009243; IPR004169 | | Beta/delta-agatoxin family | 2 |
| IPR009415 | PF06357 | Shiva superfamily | 13 |
| IPR009415; IPR018071 | PF06357 | Shiva superfamily (14); No class (1) | 15 |
| IPR011142 | | Spider toxin CSTX superfamily | 6 |
| IPR011696 | PF07740 | Huwentoxin-1 family | 114 |
| IPR011696; IPR013140 | PF07740 | Huwentoxin-1 family | 119 |
| IPR011696; IPR016191 | PF07740 | Huwentoxin-1 family | 4 |
| IPR012499 | PF07945 | Shiva superfamily | 7 |
| IPR012522 | PF08025 | Oxyopinin-2 family | 4 |
| IPR012625 | PF08089 | Huwentoxin-2 family | 79 |
| IPR012625; IPR012627 | PF08092 | Magi-1 superfamily | 1 |
| IPR012626 | PF08091 | Insecticidal toxin ABC family | 5 |
| IPR012627 | PF08092 | Magi-1 superfamily | 82 |
| IPR012628 | PF08093 | Magi-5 family | 3 |
| IPR012633 | PF08115 | Spider toxin SFI family | 10 |
| IPR012634 | PF08116 | Spider neurotoxin 21C2 family | 4 |
| IPR013139; IPR012628 | PF08093 | Omega-atracotoxin type 2 family | 5 |
| IPR013605 | PF08396 | Omega-agatoxin superfamily | 14 |
| IPR016328; IPR009243 | | Beta/delta-agatoxin family | 13 |
| IPR017946 | | Arthropod phospholipase D family | 199 |
| IPR017946; IPR000909 | | Arthropod phospholipase D family | 2 |
| IPR018802 | PF10279 | Latarcin superfamily | 11 |
| IPR019553 | PF10530 | Plectoxin superfamily (1); Spider toxin CSTX superfamily (6); U6-lycotoxin family (10); U7-lycotoxin family (11); U8-lycotoxin family (28); U11-lycotoxin family (6) | 62 |
| IPR019553; IPR004169 | PF10530 | U10-lycotoxin family | 5 |
| IPR019553; IPR011142 | PF10530 | Spider toxin CSTX superfamily | 104 |
| IPR020683 | | Latrotoxin superfamily | 1 |
| IPR020683; IPR007094 | | Latrotoxin superfamily | 1 |
| IPR023569 | PF06607 | AVIT (prokineticin) family | 9 |
| IPR024079; IPR001506; IPR006026 | PF01400 | Peptidase M12A family | 1 |
| IPR027300 | | Plectoxin superfamily (5); No class (1) | 6 |
| IPR027300; IPR004169 | | No class | 1 |
| IPR034035; IPR024079; IPR001506; IPR006026 | PF01400 | Peptidase M12A family | 4 |
* When more than one family name is associated with a given signature, the number of sequences annotated as members of each family is given in parentheses.
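Because a signature shared by several families lists per-family counts in parentheses, those counts should add up to the row total. A small consistency check can be sketched as follows; the `family_counts` and `check_row` helpers are hypothetical and not part of the paper's tooling. The examples use the IPR008197 and IPR019553 rows of the table.

```python
import re
from typing import Dict, Iterable

# "<family name> (<count>)"; a missing space before "(" is tolerated.
_ENTRY = re.compile(r"^\s*(?P<family>.+?)\s*\((?P<count>\d+)\)\s*$")


def family_counts(entries: Iterable[str]) -> Dict[str, int]:
    """Parse 'Family (n)' entries into a {family: n} mapping."""
    counts: Dict[str, int] = {}
    for entry in entries:
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"no parenthesised count in {entry!r}")
        counts[match.group("family")] = int(match.group("count"))
    return counts


def check_row(entries: Iterable[str], total: int) -> bool:
    """True when the per-family counts add up to the row total."""
    return sum(family_counts(entries).values()) == total


if __name__ == "__main__":
    # Row IPR008197 (total 21)
    print(check_row(["Spider wap-1 family(17)", "Spider wap-2 family (4)"], 21))
    # Row IPR019553 (total 62)
    print(check_row([
        "Plectoxin superfamily (1)", "Spider toxin CSTX superfamily (6)",
        "U6-lycotoxin family (10)", "U7-lycotoxin family (11)",
        "U8-lycotoxin family (28)", "U11-lycotoxin family (6)",
    ], 62))
```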
Distribution, by ToxProt family, of spider toxin sequences from ToxProt that are not associated with any InterPro signature (25% of the sequences); sequences that also lack a family assignment are counted under "No family name".
| ToxProt Family | Number of Sequences |
|---|---|
| Aptotoxin family | 4 |
| Arthropod phospholipase D family | 13 |
| AVIT (prokineticin) family | 1 |
| Bradykinin-related peptide family | 5 |
| Cupiennin family | 43 |
| Cytoinsectotoxin family | 20 |
| Helical arthropod-neuropeptide-derived (HAND) family | 3 |
| Huwentoxin-1 family | 12 |
| HWTX-LSTX family | 2 |
| Insecticidal toxin DTX family | 3 |
| JZTX-72 family | 3 |
| Latrotoxin superfamily | 1 |
| Litx family | 3 |
| Magi-1 superfamily | 1 |
| Omega-agatoxin superfamily | 2 |
| Omega-lycotoxin family | 7 |
| Phrixotoxin family | 13 |
| Plectoxin superfamily | 7 |
| Shiva superfamily | 2 |
| Spider agouti family | 9 |
| Spider LiTx3-related peptide family | 2 |
| Spider toxin CSTX superfamily | 6 |
| Spider toxin Tx2 family | 7 |
| Spider toxin Tx3-6 family | 7 |
| Spider wap-1 family | 1 |
| U12-lycotoxin family | 6 |
| U2-agatoxin family | 24 |
| Venom metalloproteinase (M12B) family | 1 |
| No family name | 114 |
A total of 229 peptides (17%) from the database are assigned to a ToxProt family but are not associated with any InterPro signature, and 114 sequences (8%) are associated with neither an InterPro signature nor a similarity-based family classification. This indicates that more classifiers are needed, that the sensitivity of the current classifiers must be improved, or both.
Distribution of the new spider toxin profile HMMs and their related InterPro and Pfam signatures.
| Toxin Type (Level 1) | Classifiers * (Level 2 and 3) | Discriminative ToxProt Annotation | InterPro Signatures | Pfam HMMs |
|---|---|---|---|---|
| Spider Cationic peptides (SC), 21 profile HMMs | SC_01_00 | Cytoinsectotoxin family | | |
| | SC_02_00 | Oxyopinin family | IPR012522 | PF08025 |
| | SC_03_00 to SC_03_07 | Latarcin superfamily | IPR018802 | PF10279 |
| | SC_04_01 to SC_04_10 | Cupiennin family; CsTx-16 ** | | |
| | SC_05_00 | Bradykinin-related peptide family | | |
| Spider Neurotoxin (SN), 170 profile HMMs | SN_01_00 | U2-agatoxin family | | |
| | SN_02_00 to SN_02_09 | Plectoxin superfamily; CsTx-19 **, CsTx-28, 34, 36 *** | IPR004169 | |
| | SN_03_01 to SN_03_06 | Spider toxin Tx2 family | IPR004214 | PF02950 |
| | SN_04_00 to SN_04_04 | Omega-agatoxin superfamily | IPR005853; IPR013605 | PF08396 |
| | SN_05_00 to SN_05_06 | Spider agouti family | IPR007733; IPR027300 | PF05039 |
| | SN_06_00 | Delta-atracotoxin family | IPR008017 | PF05353 |
| | SN_07_00 to SN_07_04 | Beta/delta-agatoxin family | IPR009243 | |
| | SN_08_01 to SN_08_02 | Shiva superfamily, Omega-toxin family | IPR009415; IPR018071 | PF06357 |
| | SN_09_00 | Spider toxin Tx3-6 family | | |
| | SN_10_00 to SN_10_67 | Huwentoxin-1 family | IPR011696 | PF07740 |
| | SN_11_00 | Shiva superfamily, Kappa toxin family | IPR012499 | PF07945 |
| | SN_12_01 to SN_12_08 | Huwentoxin-2 family | IPR012625 | PF08089 |
| | SN_13_00 to SN_13_03 | Insecticidal toxin ABC family | IPR012626 | PF08091 |
| | SN_14_00 to SN_14_09 | Magi-1 superfamily | IPR012627 | PF08092 |
| | SN_15_01 to SN_15_02 | Magi-5 family | IPR012628 | PF08093 |
| | SN_16_00 | Spider toxin SFI family | IPR012633 | PF08115 |
| | SN_17_00 | Spider neurotoxin 21C2 family | IPR012634 | PF08116 |
| | SN_18_00 | AVIT (prokineticin) family | IPR023569 | PF06607 |
| | SN_19_00 to SN_19_12 | Spider toxin CSTX superfamily | IPR019553; IPR011142 | PF10530 |
| | SN_20_00 | CsTx-20 ** | | |
| | SN_21_00 | Aptotoxin family | | |
| | SN_22_00 | Helical arthropod-neuropeptide-derived (HAND) family | | |
| | SN_23_00 | Double-knot toxin subfamily | | |
| | SN_24_00 | OAIP 4 subfamily | | |
| | SN_25_00 | HWTX-LSTX family | | |
| | SN_26_00 | Insecticidal toxin DTX family | | |
| | SN_27_00 | JZTX-72 family | | |
| | SN_28_00 | Litx family | | |
| | SN_29_00 | Omega-lycotoxin family | | |
| | SN_30_00 | Phrixotoxin family | | |
| | SN_31_00 | U12-lycotoxin family | | |
| | SN_32_00 to SN_32_02 | MIT-like AcTx family **; CsTx-21 **, CsTx-22 *** | IPR020202 | PF17556 |
| | SN_33_00 | CsTx-26 *** | | |
| | SN_34_00 | CsTx-29 *** | | |
| | SN_35_00 | CsTx-35 *** | | |
| | SN_36_00 | Huwentoxin type 10 ** | | |
| | SN_37_00 | CsTx-37 ** | | |
| | SN_38_00 | CsTx-38 ** | | |
| | SN_39_00 | Spider LiTx3-related peptide family | | |
| | SN_40_00 | Spiderine ** | | |
| Venom Proteins (VP), 28 profile HMMs | VP_01_00 | Protease inhibitor I7 (squash-type serine protease inhibitor) family | IPR000737; IPR011052 | PF00299 |
| | VP_02_00 | Peptidase M12A family | IPR024079; IPR001506; | PF01400 |
| | VP_03_01 to VP_03_02 | Arthropod phospholipase D family | IPR017946 | |
| | VP_04_00 | Venom metalloproteinase (M12B) family | | |
| | VP_05_00 | Hyaluronidase ** | IPR018155 | |
| | VP_06_00 | Arthropod phospholipase A2 ** | IPR001211 | |
| | VP_07_00 | Angiotensin-converting enzyme ** | IPR033591 | |
| | VP_08_00 | Peptidylglycine alpha-amidating monooxygenase ** | IPR000720 | |
| | VP_09_00 | Signal peptidase ** | IPR001733 | |
| | VP_10_00 | Venom serine protease *** | IPR001314 | |
| | VP_11_01 to VP_11_02 | Spider WAP family | IPR008197 | |
| | VP_12_01 to VP_12_04 | Venom Kunitz-type family | IPR002223; IPR020901 | PF00014 |
| | VP_13_01 to VP_13_02 | Cysteine-rich secretory protein | IPR014044; IPR002413 | |
| | VP_14_00 | Thyroglobulin-like protein ** | IPR000716 | |
| | VP_15_00 | Leucine-rich peptide ** | IPR032675 | |
| | VP_16_00 | Protein disulfide-isomerase ** | IPR005792 | |
| | VP_17_00 | Tachylectin 5A ** | IPR002181 | |
| | VP_18_00 | Cystatin ** | IPR027214 | |
| | VP_19_01 to VP_19_04 | Latrotoxin superfamily | IPR002110; IPR020683 | PF00023 |
*: When a level 2 classifier covers very divergent sequences, the family-level profile (xx_nn_00) performed poorly and was therefore eliminated.
**: Sequences from these families were not present in ToxProt; the families are documented in UniProtKB (VenomZone).
***: These families were newly detected or come from unpublished venom gland transcriptomes.
The proposed classifiers are intended not only to reorganize the ToxProt classification but also to improve the overall coverage of all known types of venom components.
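Classifier names encode the three levels of the scheme: a type prefix (level 1), a family number (level 2), and a group number (level 3), with `_00` marking the family-level profile. The sketch below parses names on that reading; the two-digit field widths and the `parse_classifier` helper are assumptions inferred from the names in the table, not a documented format. The per-type profile counts are copied from the table and add up to the 219 classifiers stated in the abstract.

```python
import re
from typing import NamedTuple, Optional

# Level 1 prefixes and profile counts, as listed in the table above.
TOXIN_TYPES = {
    "SC": "Spider Cationic peptides",
    "SN": "Spider Neurotoxin",
    "VP": "Venom Proteins",
}
PROFILE_COUNTS = {"SC": 21, "SN": 170, "VP": 28}

# Assumed layout: <type>_<family: 2 digits>_<group: 2 digits>.
_NAME = re.compile(r"^(SC|SN|VP)_(\d{2})_(\d{2})$")


class ClassifierLabel(NamedTuple):
    toxin_type: str       # level 1
    family: int           # level 2
    group: Optional[int]  # level 3; None for a family-level profile (xx_nn_00)


def parse_classifier(name: str) -> ClassifierLabel:
    """Split a classifier name such as 'SN_10_67' into its three levels."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a classifier name: {name!r}")
    prefix, family, group = match.groups()
    group_no = int(group)
    return ClassifierLabel(TOXIN_TYPES[prefix], int(family), group_no or None)


if __name__ == "__main__":
    print(parse_classifier("SN_10_67"))
    print(parse_classifier("SC_02_00"))
    print(sum(PROFILE_COUNTS.values()))  # 219
```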
Figure 1. The new classifiers can sharply discriminate between closely related structural compositions. Although all the sequences shown share the cysteine framework C-C-CC-C-C, match the Pfam model Toxin_12 (PF07740), and are annotated in ToxProt as members of the huwentoxin-1 family, our new classifiers separated them into different Spider Neurotoxin family 10 groups, each group representing a specific structural variation.
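Figure 1 groups sequences that share the cysteine framework C-C-CC-C-C. One common way to derive such a framework string, sketched here as an assumption rather than as the authors' procedure, is to keep the cysteines, collapse each run of non-cysteine residues into a single dash, and ignore residues before the first and after the last cysteine. The peptide sequence in the example is made up for illustration.

```python
import re


def cysteine_framework(sequence: str) -> str:
    """Return the cysteine framework of a mature peptide sequence."""
    # Greedy span from the first to the last cysteine; a lone C also matches.
    core = re.search(r"C(?:.*C)?", sequence.upper())
    if core is None:
        return ""
    # Each run of non-cysteine residues becomes one separator.
    return re.sub(r"[^C]+", "-", core.group(0))


if __name__ == "__main__":
    hypothetical_peptide = "DCLGWFKSCDPKNDKCCKNYTCSRRDRWCKYDL"
    print(cysteine_framework(hypothetical_peptide))  # C-C-CC-C-C
```

As Figure 1 shows, an identical framework string is not enough to separate groups; the profile HMMs also use the residues between the cysteines.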
Figure 2. hmmcompete outputs. The standard output is a tab-delimited file that can be opened with text editors (Word, OOWriter, Wordpad, nano, vi, gedit, kate, etc.) or spreadsheet applications (Excel, OOCalc, etc.). An example of the tab-delimited output is provided as Supplementary File F5. An HTML version can be saved with the option --htmout; an example of the HTML output is provided as Supplementary File F7.
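Since the main output is tab-delimited, it can also be read programmatically. The actual column layout is shown in Supplementary File F5 and is not reproduced here, so the three columns in the example below (query, profile, score) and the convention that header lines start with `#` are hypothetical. The sketch reads the rows and counts how many queries each profile was reported for.

```python
import csv
import io
from collections import Counter
from typing import List


def read_tabular(text: str) -> List[List[str]]:
    """Split tab-delimited text into rows, skipping blank and '#' lines.

    Treating lines that start with '#' as headers/comments is an assumption.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row and not row[0].startswith("#")]


def count_by_column(rows: List[List[str]], column: int) -> Counter:
    """Count how many rows carry each value in the given column."""
    return Counter(row[column] for row in rows if len(row) > column)


if __name__ == "__main__":
    # Hypothetical layout: query <TAB> profile <TAB> score
    sample = (
        "#query\tprofile\tscore\n"
        "pep1\tSN_10_03\t85.2\n"
        "pep2\tSN_10_03\t70.4\n"
        "pep3\tSC_03_02\t40.1\n"
    )
    print(count_by_column(read_tabular(sample), column=1))
```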
Complete list of hmmcompete options.
| Option Name | Description |
|---|---|
| --hmm <hmmDbPath> | Profile database used for sequence classification (HMMER3 profiles). This is a mandatory argument. |
| -i or --in <seqFastaDb> | Sequence database to be classified (FASTA format). This is a mandatory argument. |
| -h or --help | Help. Print a brief reminder of command-line usage and available options. |
| -v or --version | Print the version number. |
| -o <file_path> or --out <file_path> | Write the main tabular output to <file_path> instead of the default stdout. |
| -d or --desc | Display the profile description in the main output when the profile contains one. Default: ‘Off’. |
| --altpred | Display the number of alternative profile HMMs matching a sequence, together with a summary of each alternative match: the positions of the query sequence matching the profile and the score produced. Default: ‘Off’. |
| --allseq | Also report sequences not matched by any model. Default: ‘Off’, i.e., only query sequences matched by a profile in <hmmDbPath> are reported. |
| --pepreg | Display the region of the target sequence that matched the reported profile HMM. Default: ‘Off’. |
| --hsout <file_path> | Save an output file similar to that of |
| --htmout <file_path> | Save an HTML version of the output. May be useful for web integration. |
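The options above can be combined into a single invocation. The sketch below only builds the argument list from the documented option names; it assumes the executable is called `hmmcompete` and is on the PATH, and the file names in the example are placeholders. `--hsout` is left out because its description is incomplete here.

```python
import shlex
from typing import List, Optional


def hmmcompete_args(
    hmm_db: str,
    fasta: str,
    out: Optional[str] = None,
    desc: bool = False,
    altpred: bool = False,
    allseq: bool = False,
    pepreg: bool = False,
    htmout: Optional[str] = None,
) -> List[str]:
    """Build an hmmcompete command line from the options in the table."""
    args = ["hmmcompete", "--hmm", hmm_db, "--in", fasta]  # both mandatory
    if out is not None:
        args += ["--out", out]
    if desc:
        args.append("--desc")
    if altpred:
        args.append("--altpred")
    if allseq:
        args.append("--allseq")
    if pepreg:
        args.append("--pepreg")
    if htmout is not None:
        args += ["--htmout", htmout]
    return args


if __name__ == "__main__":
    # Placeholder file names; pass `cmd` to subprocess.run(cmd, check=True)
    # to execute it once hmmcompete is installed.
    cmd = hmmcompete_args("spider_toxins.hmm", "transcripts.fasta",
                          out="results.tsv", desc=True, altpred=True,
                          htmout="results.html")
    print(shlex.join(cmd))
```

Building a list rather than a single shell string avoids quoting problems when paths contain spaces.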