Ruoxuan Zhao1,2, Congwei Gu1,3, Xiaoxia Zou4, Mingde Zhao1,3, Wudian Xiao1,3, Manli He1,3, Lvqin He1,3, Qian Yang1,3, Yi Geng5, Zehui Yu1,3,6.
Abstract
Members of the family Iridoviridae (iridovirids) are globally distributed and have adverse economic and ecological impacts on aquaculture and wildlife. Iridovirid taxonomy has previously been studied using a limited number of genomes, an approach that no longer suits current and future virological studies as more iridovirids emerge. In our study, 57 representative iridovirid genomes were selected from a total of 179 whole genomes available on NCBI, and 18 core genes were identified for members of the family Iridoviridae. Average amino acid sequence identity (AAI) analysis indicated that a cut-off value of 70% is better suited to the current iridovirid genome database than the ICTV-defined 50% threshold for clarifying viral genus boundaries. In addition, the 70% AAI threshold resolved more subgroups at the genus level. This observation was further confirmed by genomic synteny analysis, codon usage preference analysis, genome GC content and length analysis, and phylogenetic analysis. Based on pairwise comparison of the core genes, 9 hallmark genes were selected to allow preliminary identification and investigation of iridovirids at the genus level in a more convenient and economical manner.
Keywords: Codon usage; Core genes; Iridoviridae; Phylogenetics; Synteny analysis; Taxonomy
Year: 2022 PMID: 35860404 PMCID: PMC9284377 DOI: 10.1016/j.csbj.2022.06.049
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1 (A) The AAI network built using the genomes of 196 Iridoviridae viruses. Each edge represents AAI ≥ 99% between two nodes; each node represents one genome and each color one cluster. (B) The viral proteomic tree (ViPTree) based on whole genome sequences. Different colored branches and outermost circles indicate different clusters. Branch length indicates evolutionary distance.
Fig. 2 The BLASTp network of the top 28 orthogroups. Each node represents one amino acid sequence. Each edge represents a percentage of identical matches > 0 between two nodes (E-value threshold of 1e-5). Core genes defined by Eaton are colored.
Conserved genes and core genes of iridovirids.
| Orthogroups | Number of nodes | Number of iridovirids (a) | Core genes defined by Eaton (b) | Gene name | Qualified core gene (c) |
|---|---|---|---|---|---|
| #1 | 946 | 57 | cg7,cg18,cg19 | Putative tyrosine kinase, Serine-threonine protein kinase | no |
| #2 | 134 | 34 | NA | Hypothetical protein | no |
| #3 | 96 | 57 | cg14 | Ribonuclease III | no |
| #4 | 60 | 57 | cg10 | Myristylated membrane protein | yes |
| #5 | 60 | 57 | cg3 | Putative NTPase I | yes |
| #6 | 60 | 57 | cg2 | DNA-dep RNA pol-II Largest subunit | yes |
| #7 | 58 | 57 | cg9 | Unknown | yes |
| #8 | 58 | 57 | cg17 | Putative XPPG-RAD2-type nuclease | yes |
| #9 | 58 | 57 | cg6 | D5 family NTPase involved in DNA replication | yes |
| #10 | 58 | 57 | cg12 | DNA-dep RNA pol-II second largest subunit | yes |
| #11 | 57 | 57 | new_cg2 | Unknown | yes |
| #12 | 57 | 57 | cg8 | NIF-NLI interacting factor | yes |
| #13 | 57 | 57 | new_cg4 | Deoxynucleoside kinase | yes |
| #14 | 57 | 57 | new_cg6 | Immediate early protein ICP-46 | yes |
| #15 | 57 | 57 | cg1 | Putative replication factor and/or DNA binding-packing | yes |
| #16 | 57 | 57 | cg4 | ATPase-like protein | yes |
| #17 | 57 | 57 | cg5 | Helicase family | yes |
| #18 | 57 | 57 | cg11 | DNA pol Family B exonuclease | yes |
| #19 | 57 | 57 | cg16 | Major capsid protein | yes |
| #20 | 57 | 57 | new_cg5 | Erv1/Alr family | yes |
| #21 | 57 | 57 | new_cg7 | Hypothetical protein | yes |
| #22 | 56 | 56 | new_cg3 | Transcription elongation factor TFIIS | no |
| #23 | 56 | 56 | NA | Hypothetical protein | no |
| #24 | 55 | 53 | NA | Hypothetical protein | no |
| #25 | 55 | 55 | cg13 | Ribonucleotide reductase small subunit | no |
| #26 | 48 | 48 | cg15 | Proliferating cell nuclear antigen | no |
| #27 | 46 | 46 | NA | Hypothetical protein | no |
| #28 | 47 | 47 | NA | Hypothetical protein | no |
a: Number of viral genomes that encode the corresponding gene. A value of 57 indicates a strict core gene (present in all strains); values < 57 indicate soft core genes (present in only some strains).
b: NA means not defined by Eaton.
c: To qualify, a core gene must meet two conditions: (1) it is a strict core gene, and (2) its orthogroup contains no more than three paralogous genes (number of nodes ≤ 60).
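The two conditions in footnote c can be applied mechanically to the table. The following is a minimal sketch, assuming the ≤ 60 bound refers to the "Number of nodes" column (the reading that reproduces the yes/no column above); the `Orthogroup` class and function name are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

TOTAL_GENOMES = 57  # representative genomes in the table
MAX_NODES = 60      # assumed bound: one sequence per genome plus at most three paralogs


@dataclass
class Orthogroup:
    name: str
    nodes: int    # "Number of nodes" column
    genomes: int  # "Number of iridovirids" column


def is_qualified_core(og: Orthogroup) -> bool:
    """Apply both conditions of footnote c to one orthogroup."""
    strict_core = og.genomes == TOTAL_GENOMES  # present in every genome
    few_paralogs = og.nodes <= MAX_NODES       # not inflated by paralogs
    return strict_core and few_paralogs


# Rows copied from the table above.
rows = [
    Orthogroup("#1", 946, 57),
    Orthogroup("#3", 96, 57),
    Orthogroup("#4", 60, 57),
    Orthogroup("#22", 56, 56),
]
for og in rows:
    print(og.name, "yes" if is_qualified_core(og) else "no")
```

For these four rows the sketch prints no, no, yes, no, matching the last column of the table.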
Fig. 3 (A) AAI network of 57 iridovirid genomes (A1: cut-off ≥ 50%, A2: cut-off ≥ 70%). Each node represents one genome. A line between two nodes indicates that their AAI value is ≥ 50% (A1) or ≥ 70% (A2). (B) Violin plot of overall identity analysis of 6922 CDSs encoded by the 57 iridovirid genomes. Each point represents an identity value.
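The effect of raising the cut-off from 50% to 70% can be illustrated by treating the AAI network as a graph and reading groups off as connected components. This is a minimal sketch under that assumption; the genome names and AAI values below are made up for illustration, and the paper does not state that groups were derived this way.

```python
from itertools import combinations


def cluster_by_aai(genomes, aai, cutoff):
    """Group genomes into connected components of the AAI >= cutoff graph."""
    parent = {g: g for g in genomes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # Missing pairs are treated as having no detectable identity.
        value = aai.get((a, b), aai.get((b, a), 0.0))
        if value >= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical toy data, not values from the study.
genomes = ["virus_A", "virus_B", "virus_C", "virus_D"]
aai = {
    ("virus_A", "virus_B"): 85.0,
    ("virus_A", "virus_C"): 60.0,
    ("virus_B", "virus_C"): 62.0,
    ("virus_C", "virus_D"): 40.0,
}

print(cluster_by_aai(genomes, aai, 50.0))  # [['virus_A', 'virus_B', 'virus_C'], ['virus_D']]
print(cluster_by_aai(genomes, aai, 70.0))  # [['virus_A', 'virus_B'], ['virus_C'], ['virus_D']]
```

In the toy data, virus_C joins the A/B group at 50% but separates at 70%, which is the kind of additional subdivision the stricter threshold produces.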
Fig. 4 (A) Violin plots of the percentage of identical matches of amino acid sequences and nucleic acid sequences of core genes. Each point represents the percentage of identical matches between two aligned sequences (left). Points have been removed for clarity (right). (B) Synteny analysis of representative iridovirid amino acid sequences (identity threshold of 75%). Each block represents the collinear comparison of two viruses. A block is blank if the two viruses share no collinear amino acid sequences at 75% identity.
Fig. 5 Phylogenetic tree of iridovirids. Maximum likelihood analysis based on concatenated core genes of representative iridovirids (best-fit model according to BIC: Q.yeast + R6). The tree is midpoint-rooted. The first column of colored branches and bars represents the ICTV-classified iridovirid genera. The second column of colored bars represents the genera or subgroups classified in this study (AAI identity cut-off ≥ 75%). The third column of colored bars is a heat map of viral genome GC content (Fig. S1). The grey bars in the last column represent viral genome size (Fig. S1). Branch length indicates evolutionary distance. Point size on a branch represents the bootstrap value (> 75).
Fig. 6 Correspondence analysis of Ranavirus, Megalocytivirus, Chloriridovirus, Lymphocystivirus, and Iridovirus. Each dot represents the RSCU value of one gene. Density statistics for the two axes are shown above and to the right of the plot, respectively.
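For readers unfamiliar with RSCU (relative synonymous codon usage), the value for a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes this standard definition; the two-family codon table is a deliberately partial illustration, and the function is not the authors' pipeline.

```python
from collections import Counter

# Partial, illustrative table: only two synonymous codon families.
SYNONYMOUS = {
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
    "Lys": ["AAA", "AAG"],
}


def rscu(cds, families=SYNONYMOUS):
    """Return RSCU per codon for the families observed in a coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(family)  # equal usage of synonymous codons
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


print(rscu("GCTGCTGCCGCAAAAAAG"))
# {'GCT': 2.0, 'GCC': 1.0, 'GCA': 1.0, 'GCG': 0.0, 'AAA': 1.0, 'AAG': 1.0}
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.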
Fig. 7 The relationship between ENC values and GC3s. Each dot represents the ENC value (Y axis) and GC3s value (X axis) of one gene. The solid line is the expected ENC curve when codon usage is determined by GC3s alone, in the absence of natural selection. Points on or close to the expected curve indicate that the bias is caused by mutation pressure, while points below the curve indicate the presence of other influential factors such as natural selection. Density statistics for the two axes are shown above and to the right of the plot, respectively.
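The expected curve in an ENC plot is conventionally computed with Wright's formula, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3s fraction. The sketch below assumes this standard formula is the one behind the solid line; the caption does not state it explicitly, and the helper names are illustrative.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC under mutation pressure alone (Wright's formula)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def below_curve(enc: float, gc3s: float, tolerance: float = 0.0) -> bool:
    """True if a gene lies below the expected curve by more than `tolerance`."""
    return enc < expected_enc(gc3s) - tolerance


print(expected_enc(0.5))       # 60.5
print(below_curve(45.0, 0.5))  # True
```

The curve peaks at s = 0.5, so a gene with balanced GC3s and a low ENC sits well below the line, the pattern the caption attributes to factors such as natural selection.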
Selection of iridovirids hallmark genes.
| Core genes | Min length (nucleic acid/amino acid) | Avg length (nucleic acid/amino acid) | Max length (nucleic acid/amino acid) | Qualified hallmark protein |
|---|---|---|---|---|
| cg1 | 723/240 | 841.7/279.6 | 1203/400 | Yes |
| cg2 | 2352/783 | 3759.2/1252.1 | 4134/1377 | Yes |
| cg3 | 2607/868 | 2853.7/950.2 | 3516/1171 | Yes |
| cg4 | 720/239 | 858.1/285 | 972/323 | No |
| cg5 | 495/164 | 743.4/246.8 | 1395/464 | Yes |
| cg6 | 2145/714 | 2837.2/944.7 | 3060/1019 | No |
| cg8 | 531/176 | 604.1/200.4 | 642/213 | Yes |
| cg9 | 1215/404 | 3364.7/1120.6 | 4152/1383 | No |
| cg10 | 1365/454 | 1531.4/509.5 | 1608/535 | Yes |
| cg11 | 2799/932 | 3130.6/1042.5 | 4773/1590 | Yes |
| cg12 | 1395/464 | 3309.3/1102.1 | 3597/1198 | No |
| cg16 | 1362/453 | 1387.7/461.6 | 1455/484 | No |
| cg17 | 606/201 | 1052.6/349.9 | 1248/415 | No |
| new_cg2 | 369/122 | 881.2/292.7 | 1083/360 | Yes |
| new_cg4 | 567/188 | 585.4/194.1 | 639/212 | Yes |
| new_cg5 | 336/111 | 437.7/144.9 | 714/237 | No |
| new_cg6 | 1011/336 | 1189.6/395.5 | 1902/633 | No |
| new_cg7 | 402/133 | 482.9/160 | 594/197 | No |
The detailed steps of synteny analysis.
| Step | Command |
|---|---|
| Step 1: Create database | makeblastdb -in iridovirus.fa -dbtype prot -out index/all -parse_seqids |
| Step 2: BLAST | blastp -query iridovirus.fa -db index/all -out out.blast -evalue 1e-5 -num_threads 8 -outfmt 6 |
| Step 3: Filtration | awk '$3 > 75' out.blast > iridovirus.blast |
| Step 4: MCScanX | ./MCScanX input_file/iridovirus |
| Step 5: Visualization | java dot_plotter -g iridovirus.gff -s iridovirus.collinearity -c dot.ctl -o dot.PNG |
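Step 3 keeps only BLAST hits whose third column exceeds 75. In tabular output (`-outfmt 6`) that column is the percentage of identical matches. The same filter can be sketched in Python for readers who prefer to inspect the hits programmatically; the function name and the sequence identifiers in the example are hypothetical.

```python
def filter_by_identity(lines, min_identity=75.0):
    """Keep tabular BLAST hits whose percent identity (column 3) exceeds the threshold."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if float(fields[2]) > min_identity:  # strict comparison, as in Step 3
            kept.append(line)
    return kept


# Hypothetical hits in -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
hits = [
    "seqA\tseqB\t92.5\t200\t15\t0\t1\t200\t1\t200\t1e-120\t400",
    "seqA\tseqC\t75.0\t180\t45\t0\t1\t180\t1\t180\t1e-80\t250",
    "seqA\tseqD\t40.2\t150\t90\t2\t1\t150\t5\t152\t1e-20\t90",
]
for hit in filter_by_identity(hits):
    print(hit.split("\t")[:3])  # ['seqA', 'seqB', '92.5']
```

Because the comparison is strict, a hit at exactly 75.0% identity is dropped, as it is by the awk filter.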