Wei-Qiao Rao, Konstantinos Kalogeropoulos, Morten E Allentoft, Shyam Gopalakrishnan, Wei-Ning Zhao, Christopher T Workman, Cecilie Knudsen, Belén Jiménez-Mena, Lorenzo Seneci, Mahsa Mousavi-Derazmahalleh, Timothy P Jenkins, Esperanza Rivera-de-Torre, Si-Qi Liu, Andreas H Laustsen.
Abstract
Snake venoms represent a danger to human health, but also a gold mine of bioactive proteins that can be harnessed for drug discovery purposes. The evolution of snakes and their venom has been studied for decades, particularly via traditional morphological and basic genetic methods alongside venom proteomics. However, while the field of genomics has matured rapidly over the past 2 decades, owing to the development of next-generation sequencing technologies, snake genomics remains in its infancy. Here, we provide an overview of the state of the art in snake genomics and discuss its potential implications for studying venom evolution and toxinology. On the basis of current knowledge, gene duplication and positive selection are key mechanisms in the neofunctionalization of snake venom proteins. This makes snake venoms important evolutionary drivers that explain the remarkable venom diversification and adaptive variation observed in these reptiles. Gene duplication and neofunctionalization have also generated a large number of repeat sequences in snake genomes that pose a significant challenge to DNA sequencing, resulting in the need for substantial computational resources and longer sequencing read length for high-quality genome assembly. Fortunately, owing to constantly improving sequencing technologies and computational tools, we are now able to explore the molecular mechanisms of snake venom evolution in unprecedented detail. Such novel insights have the potential to affect the design and development of antivenoms and possibly other drugs, as well as provide new fundamental knowledge on snake biology and evolution.
Keywords: DNA sequencing; snake genomics; snake toxins; snakes; venom; venom evolution
Year: 2022 PMID: 35365832 PMCID: PMC8975721 DOI: 10.1093/gigascience/giac024
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Schematic diagram of snake evolution based on data from Reptile-database.org [16]. Snakes (Serpentes) are divided into 3 main infraorders, Scolecophidia, Henophidia, and Alethinophidia, which together encompass ∼24 families (7 shown here). Families comprising venomous species are indicated with a skull and crossbones symbol. Colubridae constitutes the largest family of snakes, encompassing 52% of the ∼3,566 snake species currently described. The total number of currently described venomous snake species is 2,901, predominantly falling within the families Homalopsidae, Lamprophiidae, Colubridae, Elapidae, and Viperidae. Only snake species that have undergone whole-genome sequencing and assembly are listed in this figure.
Number of toxin-encoding genes for 22 toxin families in selected venomous and non-venomous reptile species
| Venom protein family | Non-venomous | Venomous | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Venom family abbreviation | | | | | | | | | | | |
| 5′-nucleotidases | 5Nase | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 5 | 1 | ||
| Acetylcholinesterase | ACeH | 22 | 11 | 12 | 16 | 2 | 14 | 7 | ||||
| Kunitz-type peptide | 86 | 39 | 49 | 53 | 3 | 70 | 2 | 2 | ||||
| Bradykinin-potentiating peptides and C-type natriuretic peptides | BNP | 1 | 3 | 1 | 6 | 3 | 2 | 1 | 1 | 1 | 1 | |
| Cysteine-rich secretory proteins | CRISPs | 2 | 1 | 1 | 2 | 3 | 7 | 2 | 2 | 4 | 2 | 1 |
| C-type lectins and C-type lectin-like proteins | CTLPs | 5 | 7 | 6 | 13 | 2 | 22 | 10 | 6 | 5 | 6 | |
| Disintegrins | Dis | 3 | 2 | |||||||||
| Factor V | 5 | 5 | 6 | 5 | 5 | 3 | ||||||
| Factor X | 9 | 11 | 11 | 11 | 11 | |||||||
| Hyaluronidases | HYAL | 5 | 6 | 6 | 1 | 6 | 3 | 6 | 1 | 4 | 1 | 1 |
| L-amino acid oxidases | LAAO | 4 | 5 | 6 | 2 | 3 | 3 | 4 | 1 | 4 | 2 | 2 |
| Nerve growth factors or neurotrophins | NGF | 5 | 5 | 5 | 5 | 3 | 4 | 1 | 2 | 1 | 1 | |
| Phosphodiesterases | PDE | 6 | 6 | 5 | 5 | 1 | 5 | 1 | 1 | |||
| Phospholipases A2 | PLA2 | 1 | 1 | 1 | 1 | 4 | 8 | 1 | 9 | 5 | 3 | 1 |
| Phospholipases B | PLB | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | ||
| Snake venom metalloproteinases | SVMP (PI) | 2 | 1 | |||||||||
| SVMP (PII) | 1 | 4 | 3 | 3 | 7 | |||||||
| SVMP (PIII) | 1 | 1 | 2 | 7 | 4 | 8 | 5 | 6 | 11 | 2 | 20 | |
| Snake venom serine proteinases | SVSP | 4 | 6 | 7 | 1 | 8 | 8 | 22 | 11 | 9 | 15 | 12 |
| Three-finger toxins | 3FTx | 5 | 19 | 4 | 2 | 3 | ||||||
| Vascular endothelial growth factors | VEGF | 4 | 7 | 7 | 5 | 6 | 6 | 1 | 3 | 1 | 1 | |
| Venom ficolins | Veficolins | 11 | 9 | 9 | 11 | 10 | 4 | 1 | ||||
| Vespryns/ohanin-like proteins | 90 | 40 | 52 | 39 | 1 | 42 | 1 | 1 | ||||
| Waprin | 5 | 3 | 3 | 4 | 3 | 1 | 1 | |||||
In venomous snake genomes, the numbers refer to the venom gland genome only. Non-venomous species lack venom glands, and the indicated numbers refer to homologous proteins expressed in other organs.
The green anole (Anolis carolinensis) was selected as outgroup taxon because it is a non-venomous, non-snake squamate with a complete genome sequence available.
Whole-genome sequencing studies on snakes, published or in progress
| Superfamily | Family | Genus | Species | Sequencing platform | DoC | GC% | Scaffold N50 size (kb) | Contig N50 size (kb) | Genome size (Gb) | Protein-encoding genes identified | Venomous | INSDC ID | Source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Colubroidea | Viperidae |
|
| Illumina; PacBio; BAC-SeqSc | 150 IL; 20 PB | 163.5 | 2.1 | Yes | PRJNA691605 | [ |||
|
|
| Illumina; PacBio | 100 | 36.6 | 139 | 15.74 | 1.3 | Yes | PDHV00000000.1 | [ | |||
|
| Illumina; PacBio | 190 IL; 33 PB | 39.9/39.8 | 207,720 | 2,110 | 1.6 | 18,240 | Yes | VORL00000000 | [ |||
|
| Illumina | 40 | 38.5 | 5.1 | 4.1 | 1.1 | Yes | JPMF00000000.1 | [ | ||||
|
| Illumina | 135 | 34.3 | 23.8 | 5.8 | 1.5 | Yes | LVCR00000000.1 | [ | ||||
|
|
| Illumina | 96 | 38.2 | 467 | 3.8 | 1.4 | 20,540 | Yes | BFFQ00000000.1 | [ | ||
|
| Illumina | 86 | 40.6 | 424 | 22 | 1.6 | 20,122 | Yes | BCNE00000000.2 | [ | |||
|
|
| Illumina; PacBio | 1,045,000 | 1.6 | Yes | PRJNA750087 | [ ||||||
|
|
| Illumina | 121 | 41.3 | 126.6 | 11.7 | 1.5 | Yes | JTGP00000000.1 | [ | |||
|
|
| Illumina | ♂ 114 ♀ 238 | 2,120 | 22.42 | 1.4 | 21,194 | Yes | DQ343647.1 | [ | |||
| Colubridae |
|
| Illumina | 13 | 38.3 | 4.3 | 2.39 | 1.4 | No | JTLQ00000000.1 | [ | ||
|
|
| Illumina | 185 | 43.6 | 2,414 | 16.8 | 1.8 | 20,995 | No | QLTV00000000 | [ | ||
|
|
| Illumina | 72 | 41.8 | 516 | 10.45 | 1.4 | Yes | LFLD00000000.1 | [ | |||
|
| Illumina; PacBio | 62 | 41.1 | 100,851 | 4,620 | 1.6 | 18,900 | Yes | PRJNA561996 | ||||
| Elapidae |
|
| Illumina | 28 | 40.6 | 226 | 3.98 | 1.6 | Yes | AZIM00000000.1 | [ | ||
|
|
| 73 | 40.1 | 14,685 | 50.44 | 1.6 | 19,358 | Yes | ULFR00000000.1 | [ | |||
|
|
| Illumina; PacBio | 71 | 40.2 | 5,997 | 31.76 | 1.6 | 19,770 | Yes | PRJEB27871 | [ | ||
|
|
| PacBio; Nanopore; Illumina | 250 | 40.4 | 223,350 | 303.98 | 1.79 | 23,248 | Yes | SOZL00000000.1 | [ | ||
|
|
| Illumina NovaSeq | 120 | 37.2 | 1,346 | 183 | 1.62 | 21,863 | Yes | PRJNA597425 | [ | ||
| Pythonoidea | Pythonidae |
|
| Illumina; Roche 454 | 20 | 39.7 | 214 | 10.66 | 1.4 | 19,793 | No | AEQU00000000.2 | [ |
| Booidea | Boidae |
|
| Illumina; Roche 454; PacBio | 125 | 1.6 | No | [ | |||||
PB stands for PacBio and IL for Illumina.
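The scaffold and contig N50 columns above summarize assembly contiguity. As a minimal sketch, assuming the common definition of N50 (the length L such that sequences of length L or longer together cover at least half of the total assembled length), the statistic could be computed as follows. The function name `n50` and the example lengths are hypothetical and are not taken from any assembly in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (illustrative only).
contig_lengths_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths_kb))  # prints 70
```

Because the statistic depends only on the length distribution, the same function applies to contigs and to scaffolds; a higher value indicates a more contiguous assembly.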
Figure 2: Schematic representation of the next-generation sequencing pipeline for genomic assembly. (1) Multiple companies have marketed sequencing platforms for genomic and transcriptomic studies, the most commonly used being Illumina (left), PacBio (middle), and Nanopore (right). (2) The 3 platforms differ in read length and accuracy of their generated sequences. Whilst Illumina sequencing generally yields short reads with low error rates, Nanopore sequences are substantially longer (≤2 Mb), yet subject to frequent sequencing errors. Last, PacBio generates sequences with lengths and error rates in between the 2 other platforms. (3) After sequencing, reads are computationally processed and assembled into contigs, which in turn (4) serve as the building blocks for scaffolds. The scaffolds are then aligned and annotated to produce the complete target genome.
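Step (3) of the pipeline, assembling reads into contigs, can be illustrated with a toy example. The sketch below merges error-free reads by their longest exact suffix-prefix overlap; it is an assumption-laden simplification for illustration, not the method used by any assembler discussed in the review, which must handle sequencing errors, repeats, and both strands. The function names, the `min_overlap` parameter, and the reads are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads on their exact overlap, or return None if there is none."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Hypothetical, error-free reads listed in genomic order (toy data).
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contig = reads[0]
for read in reads[1:]:
    merged = merge_reads(contig, read)
    if merged is None:
        break  # no overlap: the contig ends here
    contig = merged
print(contig)  # prints ATGGCGTACCTGA
```

Repeat-rich regions defeat this kind of exact matching when the repeat is longer than the read, which is one reason longer reads help with snake genome assembly.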
Selected genomic features compared across several vertebrate lineages [21]
| Tetrapod taxon | Genome size (Gb) | GC content (%) | Transposable elements content, range (%) | Transposable elements content, mean (%) |
|---|---|---|---|---|
| Mammals | 2.2–6.0 | ∼40.9 | 33.4–56.4 | 44.5 |
| Birds | 1.2–2.1 | ∼40.2 | 4.6–10.1 | 7.8 |
| Colubroidea | 1.5–3.0 | 39.3–47.8 | 33.0–56.3 | 46.2 |
| Non-colubroid snakes | 1.7–2.1 | 38.8–43.4 | 28.7–48.7 | 38.7 |
| Scincoidea (skinks) | 1.3–2.6 | 43.2–46.1 | 34.3–44.0 | 37.6 |
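The GC content column reports the percentage of guanine and cytosine bases in a genome. A minimal sketch of that calculation is shown below, assuming that ambiguous bases such as N are excluded from the denominator (a convention chosen here for illustration; the source does not state which convention was used). The function name `gc_percent` and the example sequence are hypothetical.

```python
def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in counted if base in "GC")
    return 100.0 * gc_count / len(counted)


# Toy sequence: 10 unambiguous bases, 4 of them G or C.
print(gc_percent("ATGCGCNNATTA"))  # prints 40.0
```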
Figure 3: Venom-related gene families in the P. flavoviridis genome. (A) Deduced evolutionary history of venom-related gene families through 2 rounds of whole-genome duplication (2R-WGD). An original set of 18 genes (shown in the top box) became 72 (4 copies each). Then, a single copy of each family was likely co-opted to develop toxic functions, resulting in 1 snake venom (SV) copy (shown in a pale red box in the right column) and 3 non-venom (NV) paralogs (shown in the see-through box to the left). (B) Tandem duplications of SVMP genes. (C) Tandem duplications of SVSP genes. (D) Tandem duplications of CTLP genes. Based on Fig. 2 and Fig. S8 from [15].
Figure 4: Syntenic comparison of toxin gene clusters. Comparison showing the 3FTx, CRISP, and SVMP genes in N. naja and C. viridis genomes. Orthologous gene pairs are indicated by the line linked across the corresponding genomic regions. Based on Fig. 4 and Extended Fig. 4 from [27].