Agnatha is an ancient superclass of jawless fish that gave rise to all other vertebrates after diverging from chordates ~535 million years ago (mya) (Janvier, 1981). Currently, agnathans comprise only 43 hagfish and 40 lamprey species (Forey, 1995). These two orders have remained relatively morphologically unchanged since diverging 488–443 mya (Xian‐Guang et al., 2002).
In vertebrates, viruses from birds and mammals are considerably better characterized than amphibian, reptile, and fish viruses (Essbauer & Ahne, 2001). As a subset of fish viruses, agnathan viruses are even more underexplored than teleost viruses. This lack of agnathan virus sampling limits our understanding of the complete fish virome and our ability to monitor emerging fish viruses in the aquaculture industry.
Prior to next‐generation sequencing and bioinformatics approaches, only one lamprey virus was known: the viral haemorrhagic septicaemia virus (VHSV) from the Rhabdoviridae family. Detection of this virus relied on molecular and serology‐based methods (Gadd et al., 2010). More recently, large‐scale meta‐transcriptomics enabled the discovery of eight new viruses in agnathans from the viral families Astroviridae, Caliciviridae, Hantaviridae, Hepeviridae, and Orthomyxoviridae (Shi et al., 2018), demonstrating the as yet unsampled diversity of agnathan viruses.
Screening lesser explored viromes can reveal new and diverse viruses in fish and higher vertebrates. The aim of this study was to identify and characterize novel viral sequences in agnathan RNA‐sequencing (RNA‐seq) datasets and determine the genetic relationship of these sequences to extant viruses.
MATERIALS AND METHODS
Raw RNA‐seq datasets (n = 150; Table S1) encompassing three hagfish species and nine lamprey species were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database using SRA Toolkit v.2.9.6‐1 (Leinonen et al., 2011). The reads were assembled de novo with Trinity v.2.8.4 (Haas et al., 2013) and annotated using DIAMOND v.0.9.31 (Buchfink et al., 2015) against the NCBI non‐redundant (nr) database v.2.11.0 (E‐value = 1e−3). Contigs annotated as viral sequences were filtered to remove duplicates, and multiple hits were merged into one transcript. These consolidated sequences were used as queries in a reciprocal BLASTx search (E‐value = 1e−3) against the NCBI nr database to remove non‐viral sequences. From these results, the top viral hit for each virus‐like transcript was compiled. Two custom Python scripts were used throughout the workflow to consolidate hits and filter output (Harding et al., 2021). Where several virus‐like transcripts were fragments of a single viral genome, their nucleotide sequences were mapped to the closest BLASTx hit. The resulting assembled genome was annotated with reference to related viruses.
Phylogenies were inferred following alignment of novel viral sequences with their closest relatives and outgroup viral family sequences. Amino acid alignments were used where possible, with nucleotide alignments undertaken only if multiple stop codons existed within viral open reading frames. All alignments were conducted using MAFFT v.7.450 (Katoh & Standley, 2013) with default settings, and phylogenetic analysis was conducted using RAxML v.8.2.11 (Stamatakis, 2014). Trees were constructed with 500 bootstrap replicates and outgroup rooted (Russo et al., 2018).
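The hit‐consolidation step described above can be illustrated with a minimal Python sketch. It is not the custom scripts cited in the text (Harding et al., 2021), which are not reproduced here. It assumes the search output is in BLAST‐style tabular format with a subject title column (`stitle`) appended, and it uses a crude keyword match on that title as a hypothetical stand‐in for taxonomy‐based filtering. The function name `top_viral_hits` is illustrative only.

```python
import csv
import io

# Standard 12 columns of BLAST/DIAMOND tabular output, plus a subject title.
# Requesting the extra `stitle` column is an assumption of this sketch.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore", "stitle"]


def top_viral_hits(tsv_text, max_evalue=1e-3):
    """Return the best-scoring virus-like hit per contig, keyed by contig ID."""
    best = {}
    reader = csv.DictReader(io.StringIO(tsv_text), fieldnames=COLUMNS,
                            delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if float(row["evalue"]) > max_evalue:
            continue  # hit is weaker than the E-value threshold
        if "virus" not in row["stitle"].lower():
            continue  # hypothetical keyword filter, not a taxonomy lookup
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row  # keep the highest bit score per contig
    return best
```

Each value in the returned dictionary is the full row of the retained hit, so the subject accession and percent identity remain available for building a summary table. Ranking by bit score rather than E‐value is a design choice of this sketch, made because very strong hits often share an E‐value of zero.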
RESULTS AND DISCUSSION
Agnathan viruses expand the host range of previously known viral families
Out of the 150 agnathan transcriptomes screened, three hagfish and three lamprey transcriptomes contained five novel viral sequences with identity to the Caliciviridae (n = 2), Tobaniviridae (n = 1), Retroviridae (n = 1) and Chuviridae (n = 1) families (Table 1).
TABLE 1
Viral sequences discovered in this study
| Viral sequence discovered | Viral sequence sample source (binomial name) | NCBI SRA accession(s) of sample source | Closest BLASTx viral hit (NCBI accession) | Viral family | % Nucleotide pairwise identity with closest hit | Host species of closest viral hit | Reference of closest relative |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Singapore brook lamprey calicivirus (SBLCV) | Brook lamprey (Lampetra planeri) adult pancreas | SRR5226597 | Dongbei arctic lamprey calicivirus 1 (MG599967) | Caliciviridae | 86 | Arctic lamprey (Lethenteron camtschaticum) | Shi et al. (2018) |
| Normandy brook lamprey calicivirus (NBLCV) | Brook lamprey (Lampetra planeri) larval intestine | SRR6329412 | Dongbei arctic lamprey calicivirus 1 (MG599967) | Caliciviridae | 52 | Arctic lamprey (Lethenteron camtschaticum) | Shi et al. (2018) |
| Inshore hagfish bafinivirus (IHBV) | Inshore hagfish (Eptatretus burgeri) juvenile head | | | | | | |
Two novel caliciviruses, named Singapore brook lamprey calicivirus (SBLCV; Figure 1a) and Normandy brook lamprey calicivirus (NBLCV; Figure 1b), were discovered in two different brook lamprey (Lampetra planeri) datasets from Singapore and France, respectively (Table 1). Caliciviruses are small, non‐enveloped viruses with an ~8 kb single‐stranded positive‐sense RNA (+ssRNA) genome and are known to cause a range of diseases in vertebrates (Vinjé et al., 2019). The SBLCV sequence represents a full‐length calicivirus genome (8547 nt), while the NBLCV sequence (5448 nt) is missing ~3000 nt at the 5′ end of the genome and has a 35 nt sequencing gap within the polymerase region (Figure 1b). SBLCV and NBLCV shared 50% identity (over 5448 nt) with each other and were both most closely related to Dongbei arctic lamprey calicivirus 1 (DALC1; MG599967) (Shi et al., 2018): SBLCV had 86% identity over 8547 nt to DALC1, while NBLCV had 52% identity over 5448 nt to DALC1.
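The pairwise identities quoted above depend on how identity is counted over an alignment. The Python sketch below shows one common convention: matches divided by the number of alignment columns in which neither sequence has a gap. The text does not state which tool or denominator was used for the reported values, so this convention and the function name `percent_identity` are assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which lowers the value when one sequence is partial, as is the case for NBLCV.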
FIGURE 1
Phylogeny and genome structure of two caliciviruses detected in brook lamprey transcriptomes. (a) Genome organization of Singapore brook lamprey calicivirus (SBLCV). Contigs from the brook lamprey pancreas RNA‐seq dataset (SRR5226597), represented by black bars, were aligned to Dongbei arctic lamprey calicivirus 1 (DALC1). An annotated to‐scale consensus sequence of the SBLCV genome was generated from this alignment. (b) Genome organization of Normandy brook lamprey calicivirus (NBLCV). Contigs from the larval brook lamprey intestinal RNA‐seq dataset (SRR6329412) were aligned to DALC1. An annotated to‐scale consensus sequence of the NBLCV genome was generated from this alignment. Conserved enzymatic motifs are indicated above the predicted polyprotein. Nucleotide positions are shown above or below the genome. Incomplete sequences or missing bases are represented by jagged‐ended bars. (c) Maximum likelihood (ML) phylogeny of the ORF1‐encoded non‐structural polyprotein (2783 aa positions) of SBLCV, NBLCV, 58 other caliciviruses, and three dicistroviruses used as an outgroup. (d) ML phylogeny of the ORF2 structural polyprotein (794 aa positions) of SBLCV, NBLCV, 56 other caliciviruses, and three dicistroviruses used as an outgroup. Alignments were created with MAFFT v.7.450 (Katoh & Standley, 2013), trimmed manually, and phylogenies inferred with RAxML v.8.2.11 (Stamatakis, 2014) using the GAMMA BLOSUM62 protein substitution model. Shaded colour represents host range (yellow = birds; orange = mammals; purple = reptiles; blue = fish; green = amphibians); images indicate host type. GenBank accessions prefix the labels of each sequence used. Node labels indicate bootstrap support from 500 replicates (%). Scale bar represents aa substitutions per site.
Phylogenetic analyses of the non‐structural polyprotein (Figure 1c) and VP1 (Figure 1d) were performed to compare the evolutionary relationships between agnathan caliciviruses and other caliciviruses. SBLCV and NBLCV sequences both clustered closely with DALC1 (Figure 1). According to the International Committee on Taxonomy of Viruses (ICTV), a new calicivirus genus is defined by >60% amino acid (aa) difference in the VP1 sequence (Vinjé et al., 2019). The SBLCV VP1 sequence differed from DALC1 by 5% (Figure 1a) and is therefore likely another variant of DALC1, whereas the NBLCV VP1 sequence differed from DALC1 and SBLCV by 73% (Figure 1b) and thus constitutes a separate calicivirus genus. The discovery of further lamprey caliciviruses suggests that a wide range of similar viruses remains undiscovered in other fish species.
Caliciviruses can cause severe disease in fish (Mor et al., 2017; Mikalsen et al., 2014). For instance, fathead minnow calicivirus (FHMCV) and Atlantic salmon calicivirus (ASCV) have been isolated from teleost fish with systemic disease and clinical symptoms such as haemorrhages and lesions at the time of sampling (Mor et al., 2017; Mikalsen et al., 2014). In contrast, San Miguel sea lion virus (SMSV) types 5 and 7 cause asymptomatic systemic infections in opaleye perch, but upon ingestion by mammals, severe clinical diseases such as vesicular exanthemas can occur (Smith et al., 1998).
Since SBLCV and DALC1 were discovered in RNA‐seq datasets obtained from fish not known to exhibit notable clinical symptoms, their pathogenicity is unclear. However, this does not exclude the pathogenic potential of these viruses to agnathans and other fish.
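The genus demarcation criterion applied to the VP1 comparisons above reduces to a threshold check. The short Python sketch below restates it using the two VP1 differences reported in the text. The constant and function names are illustrative, and the check covers only the single criterion quoted here, not a full taxonomic assessment.

```python
# Threshold quoted in the text: >60% amino acid difference in VP1.
VP1_GENUS_THRESHOLD_PCT = 60.0


def is_new_genus(vp1_aa_difference_pct, threshold=VP1_GENUS_THRESHOLD_PCT):
    """True if the VP1 amino acid difference strictly exceeds the threshold."""
    return vp1_aa_difference_pct > threshold


# VP1 differences relative to DALC1, as reported in the text.
reported_differences = {"SBLCV": 5.0, "NBLCV": 73.0}
classification = {name: is_new_genus(diff)
                  for name, diff in reported_differences.items()}
```

With these inputs, `classification` marks NBLCV as exceeding the threshold and SBLCV as falling below it, matching the interpretation given in the text.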
Bafinivirus in a hagfish; a complex virus in a simple fish
The inshore hagfish bafinivirus (IHBV; Figure 2a) was discovered in the RNA‐seq dataset of an inshore hagfish (Eptatretus burgeri) from Japan (Table 1) and shared 53% pairwise identity over 16,804 nt with fathead minnow nidovirus (FHMNV; NC_038295) (Batts et al., 2012). FHMNV belongs to the genus Bafinivirus within the order Nidovirales. Nidoviruses are large, enveloped, helical viruses with +ssRNA genomes and a host range encompassing all vertebrates and some arthropods, such as ticks and crustaceans. Bafinivirus genomes are ~27 kb in length and have thus far been identified only in teleosts (De Groot et al., 2011). Phylogenetic analysis of the nidovirus spike (S) sequence (1,879 aa) was performed to compare the evolutionary relationships among IHBV, FHMNV and other nidoviruses (Figure 2b). IHBV clustered with known teleost viruses that infect salmon, bream and minnow, which suggests the existence of an agnathan clade of bafiniviruses related to those in teleosts, such as FHMNV, Chinook salmon bafinivirus (CSBV; NC_026812) and white bream virus (WBV; NC_008516) (Figure 2). Teleosts and agnathans are separated by 122 million years of evolutionary distance, as presumably are some of their viruses, which suggests that a wide diversity of bafiniviruses remains undiscovered in fish.
FIGURE 2
Alignment and phylogenetic trees of inshore hagfish bafinivirus. (a) Genome organization of the inshore hagfish bafinivirus (IHBV). One contig from the juvenile inshore hagfish head sample (SRR5234495) was aligned to the fathead minnow nidovirus (FHMNV). Conserved enzymatic motifs are indicated above the polyprotein. Nucleotide positions are shown above or below the genome. Black bent arrows represent the start of transcription for each ORF. Incomplete sequences or missing bases are represented by jagged‐ended bars. (b) Maximum likelihood (ML) phylogeny of ORF1b (2362 aa positions) of IHBV and 28 other viruses from the order Nidovirales. (c) ML phylogeny of the spike (S) protein (1879 aa positions) of IHBV and 28 other viruses from the order Nidovirales. Alignments were created with MAFFT v.7.450 (Katoh & Standley, 2013), trimmed manually, and phylogenies inferred with RAxML v.8.2.11 (Stamatakis, 2014) using the GAMMA BLOSUM62 protein substitution model. Shaded colour represents host range (orange = insects; green = mammals; pink = reptiles; blue = fish); images indicate host type. GenBank accessions prefix the labels of each sequence used. Node labels indicate bootstrap support from 500 replicates (%). Scale bar represents aa substitutions per site.
Nidovirus infections can cause disease and lead to death in a wide variety of vertebrate hosts and some crustacean species (Wongteerasupaya et al., 1995). FHMNV, the closest relative of IHBV, is known to cause clinical symptoms such as haemorrhaging of the eyes and skin, and even death, in baitfish farms in the USA (Batts et al., 2012). CSBV is another bafinivirus known to cause severe FHMNV‐like clinical symptoms in a range of fish species, including goldfish (Carassius auratus) and Chinook salmon (Cano et al., 2020). Given the pathogenic potential of related nidoviruses, IHBV could cause severe disease in agnathans and teleosts; thus, further studies are warranted to fully characterize its pathogenicity.
Ancestral retrovirus and chuvirus sequences likely endogenized in agnathan genomes
Occasionally, viruses integrate into the host genome and are passed on through subsequent generations; once fixed in a population, they are termed endogenous viral elements (EVEs) (Holmes, 2011). EVEs that produce transcripts can be differentiated from infecting viruses by the presence of stop codons in the RNA (Harding et al., 2021).
Two EVEs were discovered in the transcriptomes of an inshore hagfish from Japan and a shortheaded lamprey (Mordacia mordax) from Australia. Four chuvirus‐like transcripts were found in the inshore hagfish (Figure 3; Table 1), each sharing 50%–64% pairwise identity over 2300–2400 nt with the Guangdong red‐banded snake chuvirus‐like virus (GRSChuV; MG600009) (Shi et al., 2018). The shortheaded lamprey RNA‐seq dataset revealed two non‐overlapping sequences (Figure 4; Table 1) that collectively shared 54% pairwise identity over 5837 nt with the retrovirus Atlantic salmon swim bladder sarcoma virus (SSSV; NC_007654) (Paul et al., 2006). Stop codons comprise ~1%–4% of these six sequences; as a result, the sequences are unlikely to be translated into any functional proteins.
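The stop‐codon criterion used above to separate EVE transcripts from infecting viruses can be sketched in Python as the fraction of codons in a chosen reading frame that are stop codons. The text does not describe how the ~1%–4% figure was calculated, so the single‐frame calculation, the use of the standard genetic code, and the function name `stop_codon_fraction` are assumptions of this sketch.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_fraction(nt_seq, frame=0):
    """Fraction of complete codons in the given frame (0, 1 or 2) that are stops."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = nt_seq.upper().replace("U", "T")  # accept RNA or DNA input
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        raise ValueError("sequence too short for the requested frame")
    stops = sum(1 for codon in codons if codon in STOP_CODONS)
    return stops / len(codons)
```

An intact open reading frame contains one stop codon, at its end, so a fraction of several percent within a frame that otherwise resembles a viral gene is consistent with a degraded, non‐coding sequence.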
FIGURE 3
Alignment and phylogenetic trees of inshore hagfish chuvirus‐like sequences. (a) Maximum likelihood (ML) phylogeny of the nucleoprotein gene (2550 nt positions) of two inshore hagfish chuvirus‐like contigs, 11 other chuviruses, and two bornaviruses used as an outgroup. Alignment was created with MAFFT v.7.450 (Katoh & Standley, 2013), trimmed manually, and phylogeny inferred with RAxML v.8.2.11 (Stamatakis, 2014) using the General Time Reversible (GTR) GAMMA nucleotide substitution model. Shaded colour represents host range (orange = ticks; pink = reptiles; blue = fish); images indicate host type. GenBank accessions prefix the labels of each sequence used. Node labels indicate bootstrap support from 500 replicates (%). Scale bar represents nt substitutions per site. (b) Alignment of chuvirus‐like contigs to Guangdong red‐banded snake chuvirus‐like virus (GRSChuV). Four contigs in the RNA‐seq datasets of two replicates of an inshore hagfish testis sample (ERR2061165, ERR2061166) are represented by black bars with nt lengths indicated above the bars. Contigs from the same transcriptome are connected by a black line. Nucleotide positions are shown above the genome.
FIGURE 4
Alignment and phylogenetic trees of shortheaded lamprey retrovirus‐like sequences. (a) Maximum likelihood (ML) phylogeny of the pol gene (3179 nt positions) of a shortheaded lamprey retrovirus‐like contig, 33 other retroviruses, and 13 NYNRIN‐like fish proteins. Alignment was created with MAFFT v.7.450 (Katoh & Standley, 2013), trimmed manually, and phylogeny inferred with RAxML v.8.2.11 (Stamatakis, 2014) using the General Time Reversible (GTR) GAMMA nucleotide substitution model. Shaded colour represents host range (orange = mammals; yellow = birds; green = amphibians; blue = fish); images indicate host type. Genbank accessions prefix the labels of each sequence used. Node labels indicate bootstrap support from 500 replicates (%). Scale bar represents nt substitutions per site. (b) Alignment of retrovirus‐like contigs to Atlantic salmon swim bladder sarcoma virus (SSSV). Two contigs in the RNA‐seq datasets of a shortheaded lamprey (SRR2146922) are represented by black bars. Grey boxes superimposed on the black bars represent the putative ORFs. Nucleotide positions are shown above nt sequences. Conserved enzymatic motifs are indicated above the polyprotein
The detection of these EVEs indicates prior infection with an ancestral related virus. Although the evolutionary path of the Chuviridae remains unclear, agnathan chuviruses could inform our understanding of the evolution of these viruses and the threat they may pose to vertebrate hosts.
CONCLUSION
From this study, it is evident that agnathan viruses are largely unexplored and their threat to both agnathans and teleosts is unknown. Lamprey and hagfish viruses range from simpler viruses such as caliciviruses, to complex viruses such as bafiniviruses. The novel viruses discovered in this study are closely related to highly pathogenic teleost viruses and could be similarly pathogenic in agnathans and possibly other fish. By exploring the range of viruses in agnathans, a more complete virome can be established to enable future viral discovery and to understand the pathogenic challenges that fish face.
CONFLICT OF INTEREST
The authors have declared no conflict of interest.
Mor, S. K., Phelps, N. B. D., Ng, T. F. F., Subramaniam, K., Primus, A., Armien, A. G., McCann, R., Puzach, C., Waltzek, T. B., & Goyal, S. M. (2017, August 16). Arch Virol.