
Comparative genomic analysis reveals new evidence of genus boundary for family Iridoviridae and explores qualified hallmark genes.

Ruoxuan Zhao1,2, Congwei Gu1,3, Xiaoxia Zou4, Mingde Zhao1,3, Wudian Xiao1,3, Manli He1,3, Lvqin He1,3, Qian Yang1,3, Yi Geng5, Zehui Yu1,3,6.   

Abstract

Members of the family Iridoviridae (iridovirids) are globally distributed and have adverse economic and ecological impacts on aquaculture and wildlife. Iridovirid taxonomy has previously been studied using a limited number of genomes, an approach that is no longer adequate for current and future virological studies as more iridovirids emerge. In our study, 57 representative iridovirid genomes were selected from a total of 179 whole genomes available on NCBI, and 18 core genes were identified for members of the family Iridoviridae. Average amino acid sequence identity (AAI) analysis indicated that a cut-off value of 70% is more suitable for the current iridovirid genome database than the ICTV-defined 50% threshold for clarifying viral genus boundaries. In addition, the 70% AAI threshold resolved more subgroups at the genus level. This observation was further confirmed by genomic synteny analysis, codon usage preference analysis, genome GC content and length analysis, and phylogenetic analysis. Based on pairwise comparison of the core genes, 9 hallmark genes were selected to allow preliminary identification and investigation of iridovirids at the genus level in a more convenient and economical manner.
© 2022 The Author(s).

Keywords:  Codon usage; Core genes; Iridoviridae; Phylogenetics; Synteny analysis; Taxonomy

Year:  2022        PMID: 35860404      PMCID: PMC9284377          DOI: 10.1016/j.csbj.2022.06.049

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   6.155


Introduction

Members of the Iridoviridae family (designated iridovirids) are a diverse collection of large DNA viruses (approximately 120–300 nm in diameter) with linear, double-stranded, circularly permuted and terminally redundant DNA genomes enclosed within an icosahedral capsid [1]. The family is currently divided into two subfamilies: Alphairidovirinae and Betairidovirinae. The former comprises three genera: Ranavirus, whose members mainly infect fish, amphibians, and reptiles; and Lymphocystivirus and Megalocytivirus, whose members infect only bony fish. The latter subfamily contains four genera (Iridovirus, Chloriridovirus, Decapodiridovirus, and Daphniairidovirus), which mainly infect invertebrates such as crustaceans and insects [2]. In the past two decades, reports of iridovirid infections have markedly increased, reflecting the fact that viruses of this family, once viewed as obscure viruses with little economic or ecological impact, are now known to be widespread in nature and to have a significant impact on modern aquaculture and wildlife [3]. For example, the annual production of freshwater bass in China exceeds 620,000 tons (China Fishery Statistics Yearbook, 2021), and this large-scale aquaculture industry has been severely affected by Santee-Cooper ranavirus infection, especially in warmer seasons, causing considerable economic losses [4]. Some ranaviruses have been linked to declining amphibian populations and represent a range of emerging infectious diseases that may even lead to population extinctions [5], [6]. Moreover, human activities can accelerate the spread of certain iridovirids, as seen in the case of tiger salamander (Ambystoma tigrinum) die-offs throughout western North America [7]. Sequence comparisons using both pairwise sequence similarities and phylogenetic relationships have now become one of the primary sets of characters used to define and differentiate virus taxa [8].
With the identification of shrimp hemocyte iridescent virus (SHIV) [9] and Cherax quadricarinatus iridovirus (CQIV) [10], a new genus (Decapodiridovirus) was established, followed by the seventh genus (Daphniairidovirus) within the family Iridoviridae, which contains a single species (Daphniairidovirus tvaerminne, DIT) (ICTV proposal: 2020.018D). However, as new iridovirids emerge and sequencing technologies and bioinformatics advance, iridovirid taxonomy has become controversial. Some researchers have proposed the construction of new genera to distinguish classified members of the family Iridoviridae [3], [11]. We therefore set out to test whether the current ICTV genus demarcation criterion for iridovirids, namely that members of a given genus share less than 50% amino acid sequence identity (AAI) with members of other genera (file code: 2018.007D), remains workable and, if not, to identify a more appropriate AAI threshold for the family. In this study, we re-annotated and systematically compared 179 Iridoviridae genomic nucleic acid sequences available in the National Center for Biotechnology Information (NCBI) Virus database. Eighteen core genes were redefined based on 57 representative genomes. Importantly, we propose a new AAI cut-off value (70%) that is more suitable for the current iridovirid genome database and more conducive to understanding genus demarcation within the Iridoviridae family. This proposal was further supported by genome-based synteny, phylogeny, and codon usage preference analyses. In addition, 9 hallmark genes were selected for iridovirid identification and investigation at the genus level.

Results

Data collection and removal of replicate genomes

The finalized dataset comprises 196 iridovirid genomes, including 22 species among 7 genera, of which the taxonomy of 179 strains can be found in the ICTV Master Species List 2020 (Supplementary Table S1). Genome sizes range from 100 to 288 kbp, and GC contents range from 26% to 55%. Pairwise comparisons of the 196 genomes were performed with CompareM v0.1.2, and the average amino acid identity (AAI) values were calculated. Genomes with AAI values ≥ 99% were grouped into a cluster and considered replicate genomes. As shown in Fig. 1A, the 196 genomes yielded 35 clusters and 22 singleton viral genomes. Phylogenetic analysis of whole genome sequences confirmed that viruses within the same AAI group lie at similar evolutionary distances (Fig. 1B). Finally, a total of 57 representative genomes (the most studied genome from each cluster) were selected for later analysis (Supplementary Table S2).
Fig. 1

(A) The AAI network built using the genomes of 196 Iridoviridae viruses. The edge represents AAI ≥ 99% between two nodes, and each node and color represent one genome and a cluster, respectively. (B) The viral proteomic tree (ViPTree) based on whole genome sequences. Different colored branches and outermost circles indicate different clusters. Branch length indicates evolutionary distance.


Determination of iridovirids strict core genes

The Prokka v1.14.6 package was used to re-annotate the 57 representative iridovirid genomes, generating 6922 coding sequences (CDS) in total. Pairwise comparisons of all CDS were performed using BLASTp with an E-value threshold of 1e-5, which yielded 485 orthogroups. The conserved genes of the top 28 orthogroups, together with the core genes previously identified by Eaton et al., are shown in Fig. 2 [5]. To qualify as a core gene, an orthogroup must not split into subgroups because of paralogous genes and must be present in all 57 iridovirid genomes. For instance, orthogroup #1 was excluded because it contains the genes cg7, cg18, and cg19, and orthogroup #26 was excluded because only 48 of the 57 representative genomes encode a gene belonging to it (Table 1). In the end, eighteen core genes qualified (Supplementary file_1).
Fig. 2

The BLASTp network of top 28 orthogroups. Each node represents one amino acid sequence. The edge represents percentage of identical matches >0 between two nodes (E-value threshold of 1e-5). Core genes defined by Eaton are colored.

Table 1

Conserved genes and core genes of iridovirids.

| Orthogroup | Number of nodes | Number of iridovirids (a) | Core genes defined by Eaton (b) | Gene name | Qualified core gene (c) |
|---|---|---|---|---|---|
| #1 | 946 | 57 | cg7, cg18, cg19 | Putative tyrosin kinase, Serine-threonine protein kinase | no |
| #2 | 134 | 34 | NA | Hypothetical protein | no |
| #3 | 96 | 57 | cg14 | Ribonuclease III | no |
| #4 | 60 | 57 | cg10 | Myristilated membrane protein | yes |
| #5 | 60 | 57 | cg3 | Putative NTPase I | yes |
| #6 | 60 | 57 | cg2 | DNA-dep RNA pol-II Largest subunit | yes |
| #7 | 58 | 57 | cg9 | Unknown | yes |
| #8 | 58 | 57 | cg17 | Putative XPPG-RAD2-type nuclease | yes |
| #9 | 58 | 57 | cg6 | D5 family NTPase involved in DNA replication | yes |
| #10 | 58 | 57 | cg12 | DNA-dep RNA pol-II second largest subunit | yes |
| #11 | 57 | 57 | new_cg2 | Unknown | yes |
| #12 | 57 | 57 | cg8 | NIF-NLI interacting factor | yes |
| #13 | 57 | 57 | new_cg4 | Deoxynucleoside kinase | yes |
| #14 | 57 | 57 | new_cg6 | Immediate early protein ICP-46 | yes |
| #15 | 57 | 57 | cg1 | Putative replication factor and/or DNA binding-packing | yes |
| #16 | 57 | 57 | cg4 | ATPase-like protein | yes |
| #17 | 57 | 57 | cg5 | Helicase family | yes |
| #18 | 57 | 57 | cg11 | DNA pol Family B exonuclease | yes |
| #19 | 57 | 57 | cg16 | Major capsid protein | yes |
| #20 | 57 | 57 | new_cg5 | Erv1/Alr family | yes |
| #21 | 57 | 57 | new_cg7 | Hypothetical protein | yes |
| #22 | 56 | 56 | new_cg3 | Transcription elongation factor TFIIS | no |
| #23 | 56 | 56 | NA | Hypothetical protein | no |
| #24 | 55 | 53 | NA | Hypothetical protein | no |
| #25 | 55 | 55 | cg13 | Ribonucleotide reductase small subunit | no |
| #26 | 48 | 48 | cg15 | Proliferating cell nuclear antigen | no |
| #27 | 46 | 46 | NA | Hypothetical protein | no |
| #28 | 47 | 47 | NA | Hypothetical protein | no |

a: This value represents the number of viral genomes that encode the corresponding gene. A value of 57 indicates a strict core gene (present in all strains); values < 57 indicate soft core genes (present in only some strains).

b: NA means not defined by Eaton.

c: A core gene qualifies only if it meets two conditions: (1) it is a strict core gene and (2) its orthogroup contains no more than three paralogous genes (number of nodes ≤ 60).


Whole genome AAI analysis

Following the ICTV genus demarcation criterion for Iridoviridae, we built the AAI network of the 57 iridovirid genomes with a cut-off value of 50%, which yielded seven genera (Fig. 3A1). However, average amino acid identity analysis of the 57 representative viral genomes showed that a 50% threshold tends to place dispersed CDS in the same group, whereas a threshold of around 70% concentrates similar proteins into one group (Fig. 3B). With the AAI cut-off at 70%, members of Iridoviridae were divided into fourteen subgroups (Fig. 3A2): Ranavirus and Chloriridovirus were each split into three subgroups, and Megalocytivirus, Lymphocystivirus, and Iridovirus were each divided into two. We therefore propose a new AAI threshold for the genus boundary in the family Iridoviridae and test this proposal in the remainder of this study.
Fig. 3

(A) AAI network of 57 iridovirids genomes (A1: cut-off ≥ 50%, A2: cut-off ≥ 70%). Each node represents one genome. Nodes connected by lines indicate that the AAI value of connected nodes is ≥ 50% or 70%. (B) Violin plot of overall identity analysis of 6922 CDS encoded by 57 iridovirids genomes. Each point represents an identity value.


Synteny analysis

The amino acid sequences and nucleotide sequences of the 18 core genes encoded by the 57 representative iridovirids were compared pairwise, and identity values were calculated (Fig. 4A). A threshold of 75% proved suitable for classifying iridovirids at the single-gene level using amino acid sequences, whereas no clear division boundary emerged from the nucleic acid sequences. Subsequently, the full landscape of collinear relationships among the protein sequences of the representative iridovirids was assessed with an identity threshold of 75% (Fig. 4B). The genera Ranavirus and Chloriridovirus each comprised three subgroups, and the genera Megalocytivirus and Lymphocystivirus were each divided into two subgroups.
Fig. 4

(A) Violin plots of the percentage of identical matches of amino acid sequences and nucleic acid sequences of core genes. Each point represents the percentage of identical matches between the two aligned sequences (Left). Points have been removed for clarity of observation (Right). (B) Synteny analysis of representative iridovirids amino acid sequences (identity threshold at 75%). Each corresponding block represents the collinear comparison of two viruses. If there were no collinear amino acid sequence at a 75% identity between two viruses, the block would be blank.


Phylogenetic analysis

The IQ-TREE program and the iTOL web server were used to perform a maximum likelihood phylogenetic analysis of the concatenated core genes of the 57 representative iridovirids (Fig. 5). More than seven major clades emerged in the phylogenetic tree. The ICTV genera Chloriridovirus and Ranavirus were each divided into three monophyletic clades; Iridovirus, Megalocytivirus, and Lymphocystivirus were each divided into two monophyletic clades. The Megalocytivirus_2 and Ranavirus_2 subgroups each contain only one genome and diverge markedly from the other subgroups of their respective ICTV genera, as indicated by the relatively long branch lengths at these nodes. Genome GC content and size statistics also provided strong evidence of differences among the monophyletic clades (Fig. S1, Fig. 5). For example, Lymphocystivirus 1 and Lymphocystivirus 2 have average genome sizes of 200 and 105 kbp, respectively; Megalocytivirus 1 and Megalocytivirus 2 have GC contents of 55% and 36%, respectively (Fig. S1).
Fig. 5

Phylogenetic tree of iridovirids. Maximum likelihood analysis based on the concatenated core genes of representative iridovirids (best-fit model according to BIC: Q.yeast + R6). The tree is midpoint-rooted. The first column of colored branches and bars represents the ICTV-classified iridovirid genera. The second column of colored bars represents the genera or subgroups classified in this study (AAI identity cut-off ≥ 75%). The third column of colored bars is a heat map of the GC content of the viral genomes (Fig. S1). The grey bars in the last column represent viral genome size (Fig. S1). Branch length indicates evolutionary distance. The size of the point on a branch represents the bootstrap value (>75).


Codon usage bias analysis

Codon usage bias (CUB), the unequal usage of synonymous codons in mature mRNA molecules, is a distinctive property of a viral genome and can be specific even to a single species [12], [13], [14]. Correspondence analysis (CoA) based on the relative synonymous codon usage (RSCU) matrix minimizes the effect of amino acid composition and reduces the dimensionality of the dataset so that multiple variables can be examined together (Fig. 6). An effective number of codons (ENC) plot clarifies the relationship between the ENC and the GC content at the third codon position (GC3), enabling assessment of the effects of natural selection and mutational pressure on viral genome evolution (Fig. 7) [12]. In both the CoA and the ENC plot, Megalocytivirus and Iridovirus were each clearly divided into two subgroups by their CUB properties, and Chloriridovirus and Ranavirus were each divided into three subgroups. Furthermore, the ENC-GC3 plot indicates that the codon usage bias of iridovirids is mainly shaped by mutational pressure and that Ranavirus_3 is the subgroup most affected by natural selection (Fig. 7).
Fig. 6

Correspondence analysis of Ranavirus, Megalocytivirus, Chloriridovirus, Lymphocystivirus, and Iridovirus. Each dot represents the RSCU value of one gene. Density statistics for the two axes are shown above and to the right of the plot, respectively.

Fig. 7

The relationship between the ENC values and GC3s. Each dot represents the ENC value (Y axes) and GC3 value (X axes) of one gene. The solid line indicates the expected curve of ENC and GC3 only in the absence of natural selection. Points on or close to the expected curve mean that the bias is caused by mutation pressure, while points below the curve indicate the presence of other influential factors such as natural selection. Density statistics for the two axes are shown above and to the right of the plot, respectively.
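The expected curve in an ENC-GC3 plot is commonly drawn from Wright's approximation, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The source only cites [12] for this step, so the formula choice and the function names below are assumptions; this is a minimal Python sketch of how a gene's position relative to the curve could be evaluated, not the authors' script.

```python
def expected_enc(gc3s):
    """Expected ENC under mutational pressure alone (Wright's approximation).

    gc3s is the GC content at synonymous third codon positions, as a
    fraction between 0 and 1. Assumed formula; the source cites [12] only.
    """
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def enc_shortfall(observed_enc, gc3s):
    """Distance of a gene below the expected curve (positive = below)."""
    return expected_enc(gc3s) - observed_enc


# Hypothetical gene, for illustration only: ENC = 45.0 at GC3s = 0.5.
shortfall = enc_shortfall(45.0, 0.5)
print(round(expected_enc(0.5), 2), round(shortfall, 2))
```

A gene with a shortfall near zero sits on the curve, which the caption above interprets as bias driven by mutation pressure; a clearly positive shortfall points to additional factors such as natural selection.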


Iridovirids hallmark gene identification

Complete genomes or concatenated core gene sequences are commonly used for virus taxonomic studies, but single gene-based taxonomy is easier and more convenient to conduct. To identify hallmark genes for members of the family Iridoviridae, pairwise comparisons of each core gene of the 57 iridovirids were performed using BLAST (Supplementary file_2; Supplementary file_3 contains the data filtered for pident ≤ 75%). The criterion for selecting a hallmark gene is that the similarity of all amino acid sequences within a group of viruses is ≥ 75%, whereas their similarity to sequences from viruses in other groups is < 75%. Nine core genes met this criterion and were selected as iridovirid hallmark genes that can identify unknown iridovirids at the genus level (Table 2).
Table 2

Selection of iridovirids hallmark genes. Lengths are given as nucleic acid/amino acid.

| Core gene | min_length | avg_len | max_len | Qualified hallmark protein |
|---|---|---|---|---|
| cg1 | 723/240 | 841.7/279.6 | 1203/400 | Yes |
| cg2 | 2352/783 | 3759.2/1252.1 | 4134/1377 | Yes |
| cg3 | 2607/868 | 2853.7/950.2 | 3516/1171 | Yes |
| cg4 | 720/239 | 858.1/285 | 972/323 | No |
| cg5 | 495/164 | 743.4/246.8 | 1395/464 | Yes |
| cg6 | 2145/714 | 2837.2/944.7 | 3060/1019 | No |
| cg8 | 531/176 | 604.1/200.4 | 642/213 | Yes |
| cg9 | 1215/404 | 3364.7/1120.6 | 4152/1383 | No |
| cg10 | 1365/454 | 1531.4/509.5 | 1608/535 | Yes |
| cg11 | 2799/932 | 3130.6/1042.5 | 4773/1590 | Yes |
| cg12 | 1395/464 | 3309.3/1102.1 | 3597/1198 | No |
| cg16 | 1362/453 | 1387.7/461.6 | 1455/484 | No |
| cg17 | 606/201 | 1052.6/349.9 | 1248/415 | No |
| newcg2 | 369/122 | 881.2/292.7 | 1083/360 | Yes |
| newcg4 | 567/188 | 585.4/194.1 | 639/212 | Yes |
| newcg5 | 336/111 | 437.7/144.9 | 714/237 | No |
| newcg6 | 1011/336 | 1189.6/395.5 | 1902/633 | No |
| newcg7 | 402/133 | 482.9/160 | 594/197 | No |
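The hallmark-gene criterion (identity ≥ 75% for every pair within a group, < 75% for every pair from different groups) can be written as a short check. The following Python sketch is only an illustration: the virus labels, group names, identity values, and the function name `is_hallmark_gene` are hypothetical, and treating a missing BLAST hit as 0% identity is an assumption not stated in the source.

```python
from itertools import combinations


def is_hallmark_gene(groups, pairwise_identity, threshold=75.0):
    """Return True if one gene separates the given groups at the threshold.

    groups: virus -> group label.
    pairwise_identity: frozenset({virus_a, virus_b}) -> percent identity.
    Pairs without a reported hit are treated as 0% identity (assumption).
    """
    for a, b in combinations(sorted(groups), 2):
        pident = pairwise_identity.get(frozenset((a, b)), 0.0)
        same_group = groups[a] == groups[b]
        if same_group and pident < threshold:
            return False  # too divergent within a group
        if not same_group and pident >= threshold:
            return False  # too similar across groups
    return True


# Hypothetical data, for illustration only.
groups = {"v1": "G1", "v2": "G1", "v3": "G2"}
gene_x = {
    frozenset(("v1", "v2")): 92.0,
    frozenset(("v1", "v3")): 48.0,
    frozenset(("v2", "v3")): 51.0,
}
gene_y = dict(gene_x)
gene_y[frozenset(("v1", "v3"))] = 80.0  # crosses the group boundary

print(is_hallmark_gene(groups, gene_x), is_hallmark_gene(groups, gene_y))
```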

Discussion

Any viral property or feature can serve as a character for distinguishing one virus from another, including genomic characteristics, viral capsid structure, gene expression program, host range, and pathogenicity [8]. The genus demarcation criterion for Iridoviridae proposed by the ICTV is that members of a given genus share less than 50% amino acid sequence identity with members of other genera. Additional criteria, such as phylogenetic analysis that clearly distinguishes one genus from others, principal host species, presence of a DNA methyltransferase, and characteristic pathology, can also distinguish genera within the family (file code: 2018.007D). Previously, methods used to classify members of the Iridoviridae included molecular analysis of restriction endonuclease (REN) profiles, sequencing of mcp amplicons, DNA hybridization, terminal redundancies, and DNA-DNA homologies [15], [16]. However, the rapid expansion of viral genome databases has led the ICTV to present a consensus statement suggesting a shift from “traditional” taxonomy toward a genome-centered, and perhaps one day largely automated, viral taxonomy [17], [18]. Whole-genome average amino acid identity (AAI) is calculated from the protein-coding genes of a pair of genomes, as determined by whole-genome pairwise sequence comparisons using the BLAST algorithm, and has been widely applied in microbial taxonomy [19]. Rohwer and Edwards successfully grouped phages into taxa by AAI analysis and highlighted genetic markers useful for monitoring phage biodiversity [20]. AAI analysis is also important for revealing bacterial genetic relatedness, whether at the single-gene level (for instance, 16S rRNA and 23S rRNA) or at the whole-genome level [19]. Because the taxonomy of some members of the Iridoviridae family is controversial, we analyzed 179 iridovirid genomes available at NCBI.
The AAI cut-off value (50%) proposed by the ICTV for iridovirid genus demarcation placed some dispersed genomes in the same group (Fig. 3B and Fig. 4A). In our study, an AAI cut-off value of 70% was found to be more suitable for iridovirid classification based on the currently sequenced genomes, indicating that the Iridoviridae family should be divided into more genera, or at least into subgroups. Synteny analysis, phylogenetic analysis of concatenated strict core genes, genome codon usage preference, and GC content and length statistics all supported our classification proposal. We are not the first to call for an update of the taxonomy of the Iridoviridae family. The genus Ranavirus is the most researched and contains most of the iridovirids discovered so far. One of our previous studies of Santee-Cooper ranavirus showed that Asian isolates are quite different from European and American isolates based on mcp phylogeny [4]. Genomic dot plot analysis in this study showed collinearity between the genomes of GIV and SGIV, but few regions of collinearity between these two and other ranaviruses. In addition, GIV and SGIV lack the DNA methyltransferase gene that is seen in other ranaviruses and, as a result, may need to be considered a new genus or recognized as a distinct species in the genus Ranavirus [3]. In our study, SGIV has the greatest evolutionary distance from the other two subgroups (Fig. 5). This is consistent with previous findings that the codon usage bias and genome length of GIV and SGIV differ from those of other members of Ranavirus [21]. Previous phylogenetic analysis showed that scale drop disease virus (SDDV) clusters with megalocytiviruses but forms a separate branch within this genus [11]. Furthermore, the major infection symptoms can differ among members of the same Iridoviridae genus.
For instance, a symptom of SDDV infection in seabream is severe scale loss [22], whereas infection with infectious spleen and kidney necrosis virus (ISKNV) mainly causes diffuse necrosis in the haematopoietic tissues [23]. To date, phylogenetic analysis based on viral genomes or the 26 core genes identified by Eaton is the most commonly used method to elucidate evolutionary relationships among iridovirids, as seen in recent ICTV proposals to revise genera or species of Iridoviridae [5]. However, this approach requires that the viral genomes have been sequenced or that the sequences of all core genes are available. Previously, the major capsid protein (mcp) was thought to be reliable for the evolutionary analysis of iridovirids [10], [24], [25]. However, we found that the mcp gene is not accurate enough to assign viruses at the genus level, and we do not recommend it for this purpose in future research (Supplementary file_2, Supplementary file_3). Instead, the nine hallmark genes identified in this study provide an easy-to-use framework for virologists to group viruses accurately and may form the basis of genus-level taxonomy in the future.

Methods

Genomic data and annotation

All Iridoviridae genomes listed in the National Center for Biotechnology Information (NCBI) Virus database (https://www.ncbi.nlm.nih.gov/labs/virus/) (as of December 2021) were collected. Genomes were re-annotated uniformly with the Prokka v1.14.6 package using the same parameters throughout (settings: --kingdom Viruses, remaining settings: default) [26].

Filtering of replicate genomes

The program CompareM v0.1.2 (https://github.com/dparks1134/CompareM) was used to align the collected genomes pairwise and calculate the AAI values of the extracted CDS. An AAI value of 99% was set as the threshold for grouping similar viral genomes, and the resulting network matrix file was visualized with Cytoscape v3.8.2 [27]. In parallel, genomic phylogenetic analysis was performed to examine the reliability of the AAI analysis. All genomic nucleic acid sequences were merged into a single file and submitted to ViPTreeGen (v.1.1.2) to construct a phylogenetic tree [28]. From each group, the most studied genome was selected as the representative virus for later analysis. GC content and genome size were calculated with seqkit v0.16.1 and visualized with the ggplot2 package in R [29].
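Grouping genomes whose pairwise AAI reaches the threshold amounts to finding connected components in the AAI network. The sketch below shows one way to do this in Python with a small union-find structure; it is not the CompareM or Cytoscape implementation, and the genome labels, AAI values, and the function name `cluster_by_aai` are hypothetical.

```python
from collections import defaultdict


def cluster_by_aai(genomes, pairwise_aai, threshold=99.0):
    """Group genomes into connected components linked by AAI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Walk up to the root, shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), aai in pairwise_aai.items():
        if aai >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = defaultdict(list)
    for g in genomes:
        groups[find(g)].append(g)
    # Largest groups first; singletons end up at the tail of the list.
    return sorted((sorted(m) for m in groups.values()),
                  key=lambda m: (-len(m), m))


# Hypothetical genome labels and AAI values, for illustration only.
genomes = ["virus_A", "virus_B", "virus_C", "virus_D"]
pairwise_aai = {
    ("virus_A", "virus_B"): 99.6,
    ("virus_B", "virus_C"): 99.1,
    ("virus_A", "virus_C"): 98.7,
    ("virus_A", "virus_D"): 62.3,
}
clusters = cluster_by_aai(genomes, pairwise_aai)
print(clusters)
```

Note that this is single-linkage grouping: virus_A and virus_C end up together through virus_B even though their direct AAI is below 99%, which is how an edge-based network view behaves.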

Evaluation of core genes

Re-annotating the collected iridovirid genomes with a uniform pipeline made the annotated CDS more consistent across genomes. All protein sequences generated by the Prokka annotation were merged into a single file (all.fa) using the Linux “cat” command. The all.fa file was then submitted to BLAST (2.11.0+) to calculate the percentage of identical matches (makeblastdb -in all.fa -dbtype prot -out index/all -parse_seqids; blastp -query all.fa -db index/all -out all_blast.out -evalue 1e-5 -num_threads 8 -outfmt 6). After conserved homologous genes had been grouped using Cytoscape, the core genes of iridovirids were obtained by filtering out groups that included paralogous genes or genes that were not shared by all 57 representative genomes.
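The two filtering conditions can be expressed as a small predicate on each orthogroup's node count and genome count. The Python sketch below uses a few rows taken from Table 1; the function name and the reading of "no more than three paralogous genes" as "at most three nodes beyond one per genome" are assumptions made for illustration.

```python
def is_qualified_core_gene(n_nodes, n_genomes_with_gene,
                           total_genomes=57, max_paralogs=3):
    """Apply the two eligibility conditions to one orthogroup.

    (1) strict core: the gene is present in every genome;
    (2) few paralogs: at most max_paralogs extra sequences in the group
        (assumed interpretation of the paralog limit).
    """
    is_strict_core = n_genomes_with_gene == total_genomes
    few_paralogs = (n_nodes - n_genomes_with_gene) <= max_paralogs
    return is_strict_core and few_paralogs


# (number of nodes, number of genomes) for selected orthogroups in Table 1.
orthogroups = {
    "#3": (96, 57),
    "#4": (60, 57),
    "#7": (58, 57),
    "#11": (57, 57),
    "#22": (56, 56),
    "#26": (48, 48),
}
qualified = [name for name, (nodes, n_genomes) in orthogroups.items()
             if is_qualified_core_gene(nodes, n_genomes)]
print(qualified)
```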

AAI analysis

The program CompareM v0.1.2 was used to calculate the average amino acid identity (AAI) of the representative Iridoviridae genomes. AAI values of 50% (according to the ICTV proposal) and 70% (derived in this study) were each set as the threshold for grouping iridovirid genomes, and the resulting matrix files were visualized with Cytoscape v3.8.2.
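Conceptually, AAI between two genomes is the mean identity over proteins that are each other's best match. The following Python sketch illustrates that idea with reciprocal best hits taken from two lists of BLAST-style hits. It is a simplification under stated assumptions, not CompareM's implementation: no identity or alignment-length filters are applied, the forward-direction identity is used, and all protein names and values are hypothetical.

```python
def best_hits(hits):
    """For each query protein, keep the subject with the highest bit score."""
    best = {}
    for query, subject, pident, bitscore in hits:
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


def average_amino_acid_identity(hits_a_vs_b, hits_b_vs_a):
    """Mean percent identity over reciprocal best hits, or None if none exist."""
    forward = best_hits(hits_a_vs_b)
    reverse = best_hits(hits_b_vs_a)
    identities = [
        pident
        for query, (subject, pident, _) in forward.items()
        if subject in reverse and reverse[subject][0] == query
    ]
    return sum(identities) / len(identities) if identities else None


# Hypothetical hits: (query, subject, percent identity, bit score).
hits_a_vs_b = [
    ("a1", "b1", 90.0, 500.0),
    ("a1", "b2", 40.0, 80.0),
    ("a2", "b2", 70.0, 300.0),
    ("a3", "b1", 35.0, 60.0),   # b1's best hit is a1, so this is not reciprocal
]
hits_b_vs_a = [
    ("b1", "a1", 90.0, 500.0),
    ("b2", "a2", 70.0, 300.0),
]
aai = average_amino_acid_identity(hits_a_vs_b, hits_b_vs_a)
print(aai)
```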

Synteny analysis of core genes

Synteny analysis serves as an alternative method to determine viral taxonomy and evolutionary relationships. BLAST v2.11.0+ (E-value threshold of 1e-5) and MCScanX were used to determine the synteny of the concatenated core genes of the representative iridovirids (Table 3). First, the annotated amino acid sequence files of the representative iridovirids were merged into a single dataset, which was indexed with the “makeblastdb” command of BLAST. Second, the merged sequence file iridovirus.fa was aligned against this database with the “blastp” command of BLAST. The comparison results were then filtered at an identity threshold of 75%. Finally, both the annotation information file (gff format) and the filtered alignment file were imported into MCScanX to generate synteny images.
Table 3

The detailed steps of synteny analysis.

Step 1: Create database
    makeblastdb -in iridovirus.fa -dbtype prot -out index/all -parse_seqids
Step 2: BLAST
    blastp -query iridovirus.fa -db index/all -out out.blast -evalue 1e-5 -num_threads 8 -outfmt 6
Step 3: Filtration (identity threshold set as 75%)
    cat out.blast | awk '{ if ($3 > 75) print $0}' > iridovirus.blast
Step 4: MCScanX
    ./MCScanX input_file/iridovirus
Step 5: Visualization
    java dot_plotter -g iridovirus.gff -s iridovirus.collinearity -c dot.ctl -o dot.PNG
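For readers who prefer to script the filtration step rather than use awk, the same filter can be sketched in Python. In BLAST tabular output (-outfmt 6) the third column is the percent identity, and the sketch keeps rows with identity strictly above 75, mirroring the `$3 > 75` test. The function name and the sample rows are hypothetical and serve only as an illustration.

```python
def filter_blast_by_identity(lines, min_identity=75.0):
    """Keep BLAST outfmt 6 rows whose percent identity exceeds min_identity."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Column 3 (index 2) of outfmt 6 is pident.
        if float(fields[2]) > min_identity:
            kept.append(line)
    return kept


# Hypothetical rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore.
rows = [
    ("v1_p1", "v2_p1", "92.5", "300", "22", "0", "1", "300", "1", "300", "1e-150", "550"),
    ("v1_p1", "v3_p4", "48.0", "290", "150", "3", "5", "294", "2", "291", "1e-60", "210"),
    ("v1_p2", "v2_p2", "75.0", "200", "50", "0", "1", "200", "1", "200", "1e-90", "330"),
    ("v1_p3", "v2_p3", "81.2", "410", "77", "1", "1", "410", "1", "409", "0.0", "700"),
]
blast_lines = ["\t".join(row) for row in rows]
kept = filter_blast_by_identity(blast_lines)
print(len(kept))
```

To process a real file, the function can be given an open file handle in place of the list, since it only iterates over lines.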

Phylogenetic analysis

The maximum likelihood phylogenetic tree (ML-Tree) was constructed based on the core genes of the representative iridovirids. The MAFFT software was used to align the sequences with the default settings [30]. The aligned core genes were concatenated using PhyloSuite [31]. ML-Trees were then constructed using IQ-TREE v1.6.12 [32]. Finally, iTOL was used to annotate the phylogenetic trees [33].

Indicators for codon performance

In this study, correspondence analysis (CoA) on RSCU and ENC-plot analysis were performed to evaluate viral codon usage preference as previously described [12]. In brief, each viral coding region was represented as a 59-dimensional vector corresponding to the RSCU value of each synonymous codon (excluding AUG, UGG, and stop codons), calculated with the CodonW program. The effective number of codons (ENC), which ranges from 20 (only one specific codon is used for each amino acid) to 61 (all synonymous codons are used equally), was also calculated. The expected ENC value corresponding to GC3 was calculated as previously described [12]. All data were finally visualized with the R ggplot2 package.
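The 59-dimensional RSCU vector can be illustrated with a short Python sketch. RSCU for a codon is its observed count divided by the mean count of all synonymous codons for the same amino acid. This is not CodonW; the function name, the example sequence, and the choice to report 0.0 for amino acids absent from the sequence are assumptions made for the illustration. The standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, excluding Met (ATG), Trp (TGG), and stops.
SYNONYMOUS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        SYNONYMOUS[aa].append(codon)


def rscu(cds):
    """Return a dict of RSCU values for the 59 informative codons."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for aa, codons in SYNONYMOUS.items():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Observed count over the mean count within the codon family.
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


# Hypothetical coding sequence: Met-Ala-Ala-Ala-Ala-Lys-stop.
values = rscu("ATGGCTGCTGCCGCAAAATAA")
print(len(values), values["GCT"], values["GCC"], values["AAA"])
```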

CRediT authorship contribution statement

Ruoxuan Zhao: Conceptualization, Data curation, Software, Writing – original draft, Writing – review & editing. Congwei Gu: Conceptualization, Data curation, Writing – review & editing. Xiaoxia Zou: Conceptualization, Data curation, Writing – review & editing. Mingde Zhao: Software, Writing – review & editing. Wudian Xiao: Software, Writing – review & editing. Manli He: Methodology, Writing – review & editing. Lvqin He: Methodology, Writing – review & editing. Qian Yang: Data curation, Writing – review & editing. Yi Geng: Conceptualization, Writing – review & editing. Zehui Yu: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

1.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

2.  Multiple alignment of DNA sequences with MAFFT.

Authors:  Kazutaka Katoh; George Asimenos; Hiroyuki Toh
Journal:  Methods Mol Biol       Date:  2009

3.  Isolation and identification of Singapore grouper iridovirus Hainan strain (SGIV-HN) in China.

Authors:  Jingguang Wei; Youhua Huang; Weibin Zhu; Chen Li; Xiaohong Huang; Qiwei Qin
Journal:  Arch Virol       Date:  2019-05-09       Impact factor: 2.574

4.  Prokka: rapid prokaryotic genome annotation.

Authors:  Torsten Seemann
Journal:  Bioinformatics       Date:  2014-03-18       Impact factor: 6.937

5.  Comparison of the major capsid protein genes, terminal redundancies, and DNA-DNA homologies of two New Zealand iridoviruses.

Authors:  R J Webby; J Kalmakoff
Journal:  Virus Res       Date:  1999-02       Impact factor: 3.303

6.  First report of a ranavirus associated with morbidity and mortality in farmed Chinese giant salamanders (Andrias davidianus).

Authors:  Y Geng; K Y Wang; Z Y Zhou; C W Li; J Wang; M He; Z Q Yin; W M Lai
Journal:  J Comp Pathol       Date:  2011-01-21       Impact factor: 1.311

7.  Comparative genomic analysis of the family Iridoviridae: re-annotating and defining the core set of iridovirus genes.

Authors:  Heather E Eaton; Julie Metcalf; Emily Penny; Vasily Tcherepanov; Chris Upton; Craig R Brunetti
Journal:  Virol J       Date:  2007-01-19       Impact factor: 4.099

Review 8.  Invertebrate Iridoviruses: A Glance over the Last Decade.

Authors:  İkbal Agah İnce; Orhan Özcan; Ayca Zeynep Ilter-Akulke; Erin D Scully; Arzu Özgen
Journal:  Viruses       Date:  2018-03-30       Impact factor: 5.048

9.  ICTV Virus Taxonomy Profile: Iridoviridae.

Authors:  V Gregory Chinchar; Paul Hick; Ikbal Agah Ince; James K Jancovich; Rachel Marschang; Qiwei Qin; Kuttichantran Subramaniam; Thomas B Waltzek; Richard Whittington; Trevor Williams; Qi-Ya Zhang
Journal:  J Gen Virol       Date:  2017-05-30       Impact factor: 3.891

10.  Molecular and Ecological Studies of a Virus Family (Iridoviridae) Infecting Invertebrates and Ectothermic Vertebrates.

Authors:  V Gregory Chinchar; Amanda L J Duffus
Journal:  Viruses       Date:  2019-06-09       Impact factor: 5.048
