Teresa Romero-Gutierrez, Esteban Peguero-Sanchez, Miguel A Cevallos, Cesar V F Batista, Ernesto Ortiz, Lourival D Possani.
Abstract
This communication reports a further examination of venom gland transcripts and venom composition of the Mexican scorpion Thorellius atrox using RNA-seq and tandem mass spectrometry. The RNA-seq, which was performed with the Illumina protocol, yielded more than 20,000 assembled transcripts. Following a database search and annotation strategy, 160 transcripts potentially coding for venom components were identified. A novel sequence was identified that potentially codes for a peptide with similarity to spider ω-agatoxins, which act on voltage-gated calcium channels and were not previously known to exist in scorpion venoms. Analogous transcripts were found in other scorpion species. They could represent members of a new scorpion toxin family, here named omegascorpins. The mass fingerprint by LC-MS identified 135 individual venom components, five of which matched the theoretical masses of putative peptides translated from the transcriptome. The LC-MS/MS de novo sequencing allowed the reconstruction and identification of 42 proteins encoded by assembled transcripts, thus validating the transcriptome analysis. Earlier studies conducted with this scorpion venom permitted the identification of only twenty putative venom components. The present work, performed with more powerful and modern omic technologies, demonstrates that a deeper characterization of scorpion venom components is achievable, including the identification of novel molecules with potential applications in biomedicine and the study of ion channel physiology.
Keywords: RNA-seq; Thorellius; Vaejovidae; proteome; transcriptome; venom; venom gland
Year: 2017 PMID: 29231872 PMCID: PMC5744119 DOI: 10.3390/toxins9120399
Source DB: PubMed Journal: Toxins (Basel) ISSN: 2072-6651 Impact factor: 4.546
Figure 1. Relative diversity of the annotated transcripts putatively coding for venom components, grouped by protein family and subfamily. The abundance of the individual transcripts is not considered. The enzymes are the group with the highest representation.
The nomenclature used for the T. atrox transcripts.
| Species Code | Meaning | Family Code | Meaning | Subtype Code | Meaning | Example |
|---|---|---|---|---|---|---|
| Tat | T. atrox | NaT | Na-channel Toxins | Alp | Alpha-Na Toxin | TatNaTAlp01 |
| | | | | Bet | Beta-Na Toxin | TatNaTBet01 |
| | | KTx | K-channel | Alp | Alpha-K Toxin | TatKTxAlp01 |
| | | | | Bet | Beta-K Toxin | TatKTxBet01 |
| | | | | Kap | Kappa-K Toxin | TatKTxKap01 |
| | | | | Del | Delta-K Toxin | TatKTxDel01 |
| | | | | Scr | Scorpin-like | TatKTxScr01 |
| | | CaT | Ca-channel | Clc | Calcin | TatCaTClc01 |
| | | | | Lio | Liotoxin-like | TatCaTLio01 |
| | | | | Ome | Omegascorpin | TatCaTOme01 |
| | | HDP | Host Defense Peptides | Def | Defensin | TatHDPDef01 |
| | | | | ND1–5 | NDBPs families 1–5 | TatHDPND201 |
| | | | | Ani | Anionic peptide | TatHDPAni01 |
| | | | | Wap | Waprin-like | TatHDPWap01 |
| | | Enz | Enzymes | PA2 | Phospholipase A2 | TatEnzPA201 |
| | | | | PLB | Phospholipase B | TatEnzPLB01 |
| | | | | PLD | Phospholipase D | TatEnzPLD01 |
| | | | | SeP | Serine protease | TatEnzSeP01 |
| | | | | MtP | Metalloprotease | TatEnzMtP01 |
| | | | | Hya | Hyaluronidase | TatEnzHya01 |
| | | PIn | Protease Inhibitors | Srp | Serpin-like | TatPInSrp01 |
| | | | | Kun | Kunitz-type | TatPInKun01 |
| | | Oth | Other venom components | La1 | La1-like | TatOthLa101 |
| | | | | CRI | CRISP | TatOthCRI01 |
| | | | | Und | Undefined | TatOthUnd01 |
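The nomenclature in the table above is regular enough to be split mechanically into its parts. The following is a minimal sketch, assuming that every identifier consists of the species code `Tat`, a three-character family code, a three-character subtype code and a two-digit running number, as the examples in the table suggest. The names `TranscriptId` and `parse_transcript_id` are hypothetical helpers introduced here for illustration and are not part of the original work.

```python
import re
from typing import NamedTuple

# Family codes copied from the nomenclature table.
FAMILY_CODES = {"NaT", "KTx", "CaT", "HDP", "Enz", "PIn", "Oth"}


class TranscriptId(NamedTuple):
    species: str
    family: str
    subtype: str
    number: int


# Assumed layout: "Tat" + 3-char family + 3-char subtype + 2-digit number.
_ID_PATTERN = re.compile(
    r"(?P<species>Tat)"
    r"(?P<family>[A-Za-z]{3})"
    r"(?P<subtype>[A-Za-z0-9]{3})"
    r"(?P<number>\d{2})"
)


def parse_transcript_id(identifier: str) -> TranscriptId:
    """Split an identifier such as 'TatNaTAlp01' into its components."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None or match["family"] not in FAMILY_CODES:
        raise ValueError(f"not a T. atrox nomenclature ID: {identifier!r}")
    return TranscriptId(
        species=match["species"],
        family=match["family"],
        subtype=match["subtype"],
        number=int(match["number"]),
    )
```

For example, `parse_transcript_id("TatHDPND201")` separates the family `HDP` from the subtype `ND2` and the running number 1, whereas raw assembler names such as `comp8310_c0_seq1` are rejected with a `ValueError`.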
Figure 2. The putative sodium channel-acting toxins derived from the T. atrox transcripts. (A) Distribution of the transcripts found into alpha and beta NaTx subfamilies. (B) Alignment of the translated complete CDS potentially coding for α-NaTxs with their closest matches. (C) Alignment of two precursors derived from transcripts potentially coding for β-NaTxs with their closest matches. In all the alignments shown in figures in this report, points indicate sequence identity and dashes indicate gaps. When present, the sequence elements are shown as follows: predicted signal peptides are underlined, mature peptides are in bold type with the cysteine arrays highlighted in blue, and propeptides are in italics. The UniProt/GenBank identifiers precede the name of the scorpion species for the reference sequences. The identity percentages are always calculated for the whole sequences shown, including the signal peptides and propeptides when present.
Figure 3. Potassium channel-acting toxins derived from the T. atrox transcripts. (A) Distribution of the transcripts found with respect to their subfamilies. (B) Two of the precursors of α-KTxs derived from transcripts are shown aligned to the sequences of their closest matches by BLAST. (C) The precursors of the scorpine-like peptides of the β-KTxs subfamily aligned to previously reported sequences from this species and HgeScplp2 as reference. A sequence identical to the one indicated as ViScplp2 was also found in this work. (D) The precursor identified for the κ-KTx aligned to its closest BLAST match. (E) The encoded mature sequence of the δ-KTxs found, aligned to other known scorpion Kunitz-type peptides.
Figure 4. Putative calcium channel-acting toxins derived from the T. atrox transcripts. (A) Distribution of the transcripts found with respect to their types. (B) The precursors of calcins, aligned with the precursors of their closest matches by BLAST. (C) The precursors of the liotoxin-like peptides, aligned to the reference sequences. (D) The mature putative Cav-acting toxin found in this work, and the other similar scorpion transcript-derived sequences from the databases, aligned to the type IV ω-agatoxins from A. aperta as references.
Figure 5. Possible Host Defense Peptides (HDPs) deduced from the transcriptome analysis. (A) Distribution of the transcripts found with respect to their types. The NDBPs are further expanded to show their families. (B) Precursors of the T. atrox β-defensins, aligned to reference precursors from other scorpion defensins. (C–E) The same sequence analysis for the precursors of the NDBPs found from families 2, 3 and 4, respectively.
Physicochemical parameters predicted for the mature NDBPs by the HeliQuest software (http://heliquest.ipmc.cnrs.fr/cgi-bin/ComputParams.py).
| ID | NDBP Family | Length of the Mature Peptide | Hydrophobicity | Hydrophobic Moment | Charge |
|---|---|---|---|---|---|
| TatHDPND202 | NDBP-2 | 47 | 0.263 | 0.065 | +6 |
| TatHDPND201 | NDBP-2 | 43 | 0.212 | 0.089 | +6 |
| TatHDPND301 | NDBP-3 | 19 | 0.903 | 0.377 | +2 |
| TatHDPND302 | NDBP-3 | 25 | 0.495 | 0.327 | +4 |
| TatHDPND405 | NDBP-4 | 13 | 0.819 | 0.606 | +1 |
| TatHDPND406 | NDBP-4 | 13 | 0.742 | 0.458 | +1 |
| TatHDPND407 | NDBP-4 | 13 | 0.746 | 0.456 | +1 |
| TatHDPND402 | NDBP-4 | 13 | 0.778 | 0.779 | 0 |
| TatHDPND403 | NDBP-4 | 13 | 0.752 | 0.792 | +1 |
| TatHDPND401 | NDBP-4 | 13 | 0.793 | 0.595 | +1 |
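To make the two hydrophobicity columns of the table above more concrete, the sketch below computes a mean hydrophobicity and a helical hydrophobic moment for a peptide sequence. It assumes the classical definition of the hydrophobic moment, with successive residues rotated by 100° around an ideal α-helix. The exact procedure and hydrophobicity scale used by HeliQuest are not described in this text, so the per-residue scale must be supplied by the caller, and the function names are hypothetical.

```python
import math
from typing import Mapping


def mean_hydrophobicity(sequence: str, scale: Mapping[str, float]) -> float:
    """Average per-residue hydrophobicity of the sequence."""
    return sum(scale[residue] for residue in sequence) / len(sequence)


def hydrophobic_moment(
    sequence: str,
    scale: Mapping[str, float],
    angle_deg: float = 100.0,  # assumed rotation per residue in an alpha-helix
) -> float:
    """Length-normalized vector sum of residue hydrophobicities on a helical wheel."""
    sin_sum = 0.0
    cos_sum = 0.0
    for index, residue in enumerate(sequence):
        theta = math.radians(index * angle_deg)
        sin_sum += scale[residue] * math.sin(theta)
        cos_sum += scale[residue] * math.cos(theta)
    return math.hypot(sin_sum, cos_sum) / len(sequence)
```

Under this definition, a peptide whose hydrophobic residues cluster on one face of the helix gives a large moment, while an evenly distributed one gives a value close to zero. This is consistent with the short NDBP-4 peptides in the table combining high hydrophobicity with high moments, as expected for amphipathic helices.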
Figure 6. La1-like peptides coded by transcripts from T. atrox. Only the mature sequences were used in the alignment.
Mass fingerprint of the fractions of the T. atrox soluble venom. The venom components found by LC-MS are reported in 20 min intervals. Monoisotopic masses are given for components with a MW below 3000 Da, and average masses for components with a MW above 3000 Da.
| RT ¹ (min) | MW ² (Da) | RT (min) | MW (Da) |
|---|---|---|---|
| | 1462.7, 2057.24, 2117.68, 2265.06, 2796.27, 3111.96, 9115.86, 10663.93, 11123.28 | | 1944.15, 2645.50, 2815.60, 6330.03, 6473.90, 6714.40, 7438.62, 7639.27, 7843.00, 8049.12, 8213.16, 8829.81, 8950.11, 9535.2 |
| | 1076.62, 1205.68, 1212.80, 1673.85, 1817.88, 3427.38, 3499.92, 3586.92, 3878.10, 4197.53, 12306.36 | | 1337.72, 1497.81, 2193.06, 2248.28, 2347.32, 3338.30, 7040.46, 7956.10, 8201.97, 8727.14 |
| | 1331.64, 1799.04, 1886.82, 2333.32, 2411.36, 2447.40, 2592.26, 3777.63, 3945.62, 5813.52 | | 1296.10, 2151.20, 4171.38, 4302.42, 4389.42, 4697.56, 4762.08, 6195.66 |
| | 2377.16, 2850.1, 2944.70, 3606.60, 4485.10, 4595.04, 5279.52, 5654.40 | | 10039.5, 13729.41, 14079.03 |
| | 3332.90, 3535.47, 3718.60, 3787.85, 4113.96, 4125.80, 4204.00, 4279.05, 4290.36, 5196.42, 5756.56, 7011.33, 7123.96, 7236.99, 8126.40, 8328.51 | | 1828.00, 6554.31, 6750.45, 6946.57, 7269.84, 8272.50, 10545.20, 12430.9, 13591.92, 13815.51, 14614.72 |
| | 3223.80, 3243.80, 3569.92, 3767.15, 4250.67, 8468.54, 8581.60, 8716.70, 9056.88, 9490.25 | | 3821.44, 5409.48, 10882.9, 16915.41 |
| | 1198.64, 1648.86, 3267.39, 4036.16, 4348.40, 4561.84, 4815.2 | | 2038.11, 3347.5, 4505.55, 4791.65, 4949.7, 8355.48, 11174.46, 11833.92, 11847.44, 11899.27, 13891.59, 14257.71, 14705.56, 14741.70 |

¹ RT (retention time); ² MW (experimental molecular weight in Daltons).
Figure 7. Relative distribution of the molecular weights (MW) identified in the venom of the scorpion T. atrox. Peptides between 1000 and 5000 Da are the most abundant, accounting for more than 50% of the components identified in the fingerprint.
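A distribution of this kind can be derived from a list of fingerprint masses by counting how many fall into each molecular weight range. The sketch below is one possible way to do so. The bin edges (1000, 5000 and 10000 Da) are an assumption chosen for illustration, since the ranges actually plotted in the figure are not listed here, and the sample data are only the nine masses from the first cell of the fingerprint table rather than the full set of 135 components.

```python
from bisect import bisect_right
from typing import List, Sequence

# Illustrative subset: the nine masses in the first cell of the fingerprint table.
SAMPLE_MASSES = [
    1462.7, 2057.24, 2117.68, 2265.06, 2796.27,
    3111.96, 9115.86, 10663.93, 11123.28,
]

# Hypothetical bin edges in Da; the figure's real ranges may differ.
BIN_EDGES = [1000.0, 5000.0, 10000.0]


def count_by_bin(masses: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count masses per bin: below edges[0], between edges, at or above edges[-1]."""
    counts = [0] * (len(edges) + 1)
    for mass in masses:
        counts[bisect_right(edges, mass)] += 1
    return counts


def fraction_in_range(masses: Sequence[float], low: float, high: float) -> float:
    """Fraction of masses with low <= mass < high."""
    inside = sum(1 for mass in masses if low <= mass < high)
    return inside / len(masses)
```

Applied to the complete fingerprint table, `fraction_in_range(all_masses, 1000, 5000)` would give the share of components that the caption describes as exceeding 50%.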
Molecular masses identified in T. atrox transcriptome.
| Theoretical MW (Da) | Experimental MW (Da) | RT (min) |
|---|---|---|
| 5196.79 | 5196.42 | 80–100 |
| 6195.85 | 6195.66 | 180–200 |
| 3607.43 | 3606.60 | 60–80 |
| 4114.86 | 4113.96 | 100–120 |
| 3788.48 | 3787.85 | 80–100 |
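The comparison behind this table, pairing masses predicted from the transcriptome with masses observed in the fingerprint, can be sketched as a simple tolerance search. The tolerance the authors applied is not stated in this text. The 1.0 Da default below is an assumption, chosen only because every pair in the table differs by less than 1 Da, and `match_masses` is a hypothetical helper name. The experimental list is a small subset of the fingerprint values.

```python
from typing import List, Sequence, Tuple

# First column of the table above (masses predicted from the transcriptome).
THEORETICAL = [5196.79, 6195.85, 3607.43, 4114.86, 3788.48]

# Illustrative subset of fingerprint masses, including some non-matching ones.
EXPERIMENTAL = [3606.60, 3718.60, 3787.85, 4113.96, 4125.80, 5196.42, 6195.66]


def match_masses(
    theoretical: Sequence[float],
    experimental: Sequence[float],
    tol_da: float = 1.0,  # assumed tolerance, not taken from the source
) -> List[Tuple[float, float, float]]:
    """Return (theoretical, experimental, delta) for every pair within tol_da."""
    matches = []
    for theo in theoretical:
        for exp in experimental:
            delta = exp - theo
            if abs(delta) <= tol_da:
                matches.append((theo, exp, round(delta, 2)))
    return matches
```

With the assumed 1.0 Da tolerance all five pairs in the table are recovered. Tightening it to 0.5 Da keeps only the 5196.79 and 6195.85 Da peptides, which shows how sensitive the number of matches is to this choice.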
Amino acid sequences found by LC-MS/MS using the transcriptome of T. atrox as a database for protein identification.
| Transcriptome ID | Score | Coverage | Protein Type | Accession Number of the Reference Protein |
|---|---|---|---|---|
| comp8310_c0_seq1 | 46.06 | 19.1% | Allatostatins-like | XP_013775495 |
| comp32030_c1_seq1 | 28.07 | 34.8% | Angiotensin-converting enzyme | XP_013773749 |
| comp32030_c2_seq1 | 32.73 | 7.7% | Angiotensin-converting enzyme | XP_013773749 |
| comp33161_c0_seq1 | 535.88 | 24.5% | Angiotensin-converting enzyme | XP_013773749 |
| comp33725_c0_seq1 | 65.33 | 16.8% | Angiotensin-converting enzyme | XP_013773749 |
| comp33936_c0_seq1 | 64.74 | 13.1% | Angiotensin-converting enzyme | XP_013773749 |
| TatCaTClc01 | 88.83 | 24.2% | Calcium toxin. Calcin | A0A1L4BJ42 |
| comp32319_c0_seq1 | 18.83 | 7.56% | Ectonucleoside triphosphate diphosphohydrolase 2-like | XP_013778001 |
| comp881_c0_seq1 | 452.55 | 18.7% | Elastase-like protein | CAX51421 |
| TatHDPND201 | 513.12 | 46.7% | HDP. NDBP-2 family | F1AWB0 |
| TatHDPND301 | 22.46 | 94.7% | HDP. NDBP-3 family | ALG64974 |
| ViVlp1 | 762.84 | 28.6% | HDP. NDBP-2 family | AGK88593 |
| ViAMP1 | 188.70 | 70.8% | HDP. NDBP-3 family | ALG64975 |
| TatHDPND401 | 254.77 | 61.5% | HDP. NDBP-4 family | I0DEB5 |
| TatHDPND403 | 37.65 | 100% | HDP. NDBP-4 family | I0DEB5 |
| ViCT2 | 882.53 | 76.9% | HDP. NDBP-4 family | I0DEB3 |
| TatEnzHya01 | 161.40 | 39.1% | Hyaluronidase | API81375 |
| comp15335_c0_seq1 | 92.74 | 10.3% | Hypothetical protein | CAX51393 |
| comp30560_c0_seq1 | 67.45 | 9.1% | Hypothetical protein | AEX09195 |
| comp31101_c0_seq1 | 103.49 | 29.1% | Hypothetical protein (allergen type) | CAX51409 |
| comp30730_c0_seq1 | 16.64 | 6.5% | Hypothetical protein RvY_03950 | GAU91754 |
| ViLa1lp1 | 47.91 | 63.3% | La1-like | AOF40216 |
| TatOthLa101 | 469.44 | 45.5% | La1-like | AOF40202 |
| comp34524_c0_seq1 | 74.23 | 11.5% | Metalloproteinase | XP_009865190 |
| TatEnzMtP04 | 31.25 | 18.8% | Metalloproteinase | AMO02513 |
| comp32637_c0_seq1 | 828.12 | 42% | Nucleotidase | XP_013774694 |
| comp26928_c1_seq1 | 214.51 | 23.2% | Other venom components | N/A |
| comp27809_c1_seq1 | 1255.98 | 24.4% | Other venom components | N/A |
| comp30392_c0_seq1 | 34.49 | 32.9% | Other venom components | CAX51433 |
| comp32982_c0_seq3 | 20.83 | 13.7% | Other venom components | N/A |
| comp43100_c0_seq1 | 70.29 | 15.3% | Other venom components | N/A |
| comp31198_c0_seq1 | 20.19 | 3.13% | Other venom components | N/A |
| TatEnzPA201 | 1616.89 | 45.5% | Phospholipase A2 | API81339 |
| TatEnzPA213 | 253.38 | 27% | Phospholipase A2 | API81335 |
| TatEnzPA215 | 877.33 | 50.2% | Phospholipase A2 | API81335 |
| TatEnzPA202 | 94.96 | 31.3% | Phospholipase A2 | API81335 |
| comp20627_c0_seq1 | 14.64 | 0.9% | Protein kinase C-binding protein NELL2-like | XP_022243213 |
| TatOthCRI06 | 24.43 | 15.9% | Putative cysteine-rich protein | JAT91149 |
| TatOthCRI07 | 10.69 | 23.1% | Putative cysteine-rich protein | API81352 |
| comp30427_c0_seq1 | 16.87 | 2.27% | Steryl-sulfatase-like isoform | XP_0193859 |