Cristina Moraru, Arvind Varsani, Andrew M Kropinski.
Abstract
Nucleotide-based intergenomic similarities are useful for understanding how viruses are related to each other and for classifying them. Here we have developed VIRIDIC, which implements the traditional algorithm used by the International Committee on Taxonomy of Viruses (ICTV), Bacterial and Archaeal Viruses Subcommittee, to calculate virus intergenomic similarities. When compared with other software, VIRIDIC gave the best agreement with the traditional algorithm, which is based on the percent identity between two genomes as determined by BLASTN. Furthermore, VIRIDIC proved best at estimating the relatedness between more distantly related phages, a relatedness that other tools can significantly overestimate. In addition to the intergenomic similarities, VIRIDIC also calculates three indicators of how well the alignment captures the relatedness between viruses: the aligned fraction of each genome in a pair and the length ratio between the two genomes. The main output of VIRIDIC is a heatmap integrating the intergenomic similarity values with information on the genome lengths and the aligned genome fractions. Additionally, VIRIDIC can group viruses into clusters based on user-defined intergenomic similarity thresholds. The sensitivity of VIRIDIC is determined by BLASTN. Thus, it can capture relationships between viruses that share even short genomic regions, with similarity as low as 65%. Below this similarity level, protein-based analyses should be used, as they are better suited to capturing distant relationships. VIRIDIC is available at viridic.icbm.de, both as a web service and as a stand-alone tool. It allows fast analysis of large phage genome datasets, especially in the stand-alone version, which can be run on the user's own servers and integrated into bioinformatics pipelines.
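The abstract does not spell out how a similarity value is derived from BLASTN output. The Python sketch below assumes the formula as we understand it from the VIRIDIC method (it is not stated in this abstract): SIM = (id_AB + id_BA) × 100 / (l_A + l_B), where id_AB is the number of identical bases found when genome A is the query against genome B and l_A is the length of genome A, with the intergenomic distance taken as 100 − SIM. The function names and inputs are hypothetical, and the per-HSP identity counts are assumed to be pre-filtered so that no base is counted twice.

```python
def identical_bases(nident_per_hsp):
    """Sum the identical-nucleotide counts (BLAST's `nident`) over all HSPs of one search."""
    return sum(nident_per_hsp)


def intergenomic_similarity(nident_ab, nident_ba, len_a, len_b):
    """Assumed VIRIDIC-style similarity (%): (id_AB + id_BA) * 100 / (len_A + len_B).

    nident_ab: identical-base counts per HSP with genome A as query (hypothetical input).
    nident_ba: the same, with genome B as query.
    """
    if len_a <= 0 or len_b <= 0:
        raise ValueError("genome lengths must be positive")
    return (identical_bases(nident_ab) + identical_bases(nident_ba)) * 100 / (len_a + len_b)


def intergenomic_distance(similarity):
    """Distance taken as the complement of similarity (assumption)."""
    return 100 - similarity


def length_ratio(len_a, len_b):
    """Shorter genome length divided by the longer one; 1.0 for equal lengths."""
    return min(len_a, len_b) / max(len_a, len_b)


# Hypothetical example: two genomes of 40,000 and 38,000 bp.
sim = intergenomic_similarity([20_000, 10_000], [29_000], 40_000, 38_000)
print(round(sim, 2), round(intergenomic_distance(sim), 2), length_ratio(40_000, 38_000))
# prints 75.64 24.36 0.95
```

Summing identities from both search directions and dividing by the combined length keeps the value symmetric, so swapping A and B gives the same similarity.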
VIRIDIC was developed with viruses of Bacteria and Archaea in mind; however, it could potentially be used for eukaryotic viruses as well, as long as they are monopartite.
Keywords: VIRIDIC; nucleotide-based intergenomic distance; nucleotide-based intergenomic similarity; phages; viruses
Year: 2020 PMID: 33172115 PMCID: PMC7694805 DOI: 10.3390/v12111268
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. Comparison between the intergenomic similarity values produced with the default BLASTN alignment parameters (parameter set 1: “-word_size 7 -reward 2 -penalty -3 -gapopen 5 -gapextend 2”) and parameter sets of increasing stringency. Parameter set 2: “-word_size 11 -reward 2 -penalty -3 -gapopen 5 -gapextend 2”. Parameter set 3: “-word_size 20 -reward 1 -penalty -2”. Parameter set 4: “-word_size 28 -reward 1 -penalty -2”. For illustration, the similarity values between each of two viral genomes (NCBI accessions NC_008694 and NC_010807) and all the other genomes in the benchmarking dataset are shown. The X axis shows the intergenomic similarity values calculated with parameter set 1; the Y axis shows the values calculated with each of the four parameter sets. The plot was generated with the ggplot2 R package [22].
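The four parameter sets in Figure 1 map directly onto BLASTN command-line options. The sketch below only assembles argument lists for NCBI `blastn`; it assumes `-task blastn` (the default task, megablast, is tuned for long word sizes) and a tabular output format carrying `nident` and coordinate columns, neither of which is stated in the caption. File names are placeholders.

```python
import shlex

# Scoring options for each parameter set, copied from the Figure 1 caption.
PARAMETER_SETS = {
    1: "-word_size 7 -reward 2 -penalty -3 -gapopen 5 -gapextend 2",
    2: "-word_size 11 -reward 2 -penalty -3 -gapopen 5 -gapextend 2",
    3: "-word_size 20 -reward 1 -penalty -2",
    4: "-word_size 28 -reward 1 -penalty -2",
}

# Assumed tabular output: identities and coordinates used for similarity and aligned fractions.
OUTFMT = "6 qseqid sseqid pident length nident qstart qend sstart send evalue bitscore"


def blastn_command(query_fasta, subject_fasta, parameter_set, out_path):
    """Build (but do not run) a blastn argument list for one genome pair."""
    return [
        "blastn",
        "-task", "blastn",  # assumption: the task is not specified in the caption
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-outfmt", OUTFMT,
        "-out", out_path,
        *shlex.split(PARAMETER_SETS[parameter_set]),
    ]


cmd = blastn_command("genome_A.fasta", "genome_B.fasta", 1, "A_vs_B.tsv")
print(shlex.join(cmd))
```

On a machine with BLAST+ installed, the list can be passed to `subprocess.run(cmd, check=True)`. Keeping the options as one string per set makes it easy to compare them against the caption.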
Figure 2. Plot comparing the intergenomic similarity values generated by different tools (Y axis) with those generated by the traditional method used by the ICTV (X axis). The plot was generated with the ggplot2 R package [22]. Data used for this plot are found in Table S1.
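Figure 2 compares each tool with the traditional ICTV method visually. One simple way to put numbers on that agreement is to compute, over paired similarity values such as those in Table S1, the mean signed difference (positive values indicate overestimation), the mean absolute difference and the Pearson correlation. This is offered as an illustration, not as the analysis the authors performed, and the values in the usage line are made up.

```python
from math import sqrt


def agreement(reference, tool):
    """Compare a tool's similarity values (%) with reference values for the same genome pairs."""
    n = len(reference)
    if n != len(tool) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    diffs = [t - r for r, t in zip(reference, tool)]
    mean_ref = sum(reference) / n
    mean_tool = sum(tool) / n
    cov = sum((r - mean_ref) * (t - mean_tool) for r, t in zip(reference, tool))
    ss_ref = sum((r - mean_ref) ** 2 for r in reference)
    ss_tool = sum((t - mean_tool) ** 2 for t in tool)
    if ss_ref == 0 or ss_tool == 0:
        raise ValueError("correlation is undefined for constant input")
    return {
        "mean_bias": sum(diffs) / n,  # > 0: the tool overestimates on average
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "pearson_r": cov / sqrt(ss_ref * ss_tool),
    }


# Made-up values: the tool overestimates the low-similarity pairs.
print(agreement(reference=[10.0, 50.0, 90.0], tool=[20.0, 55.0, 90.0]))
```

The example shows why correlation alone is not enough: a tool can correlate perfectly with the reference while still overestimating distant relationships, which only the bias term reveals.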
Figure 3. Genome alignments of Escherichia coli phage T7 (NC_001604.1) and Pelagibacter phage HTVC011P (NC_020483.1) using (A) NCBI BLASTN, with the T7 genome as the query, and (B) the progressiveMAUVE plugin of the Geneious software [26].
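Figure 3 illustrates that BLASTN reports a set of local alignments (HSPs) rather than one end-to-end alignment. To turn such hits into an aligned genome fraction, overlapping HSP coordinates have to be merged so that no base is counted twice. The sketch below assumes 1-based, inclusive coordinates as in BLAST tabular output (`qstart`/`qend`, or `sstart`/`send`, which are reversed for minus-strand hits); whether VIRIDIC merges intervals in exactly this way is an assumption.

```python
def covered_bases(intervals):
    """Number of bases covered by 1-based, inclusive intervals, counting overlaps once."""
    # Normalize reversed (minus-strand) coordinates, then sort by start.
    spans = sorted((min(s, e), max(s, e)) for s, e in intervals)
    total = 0
    cur_start = cur_end = None
    for start, end in spans:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def aligned_fraction(intervals, genome_length):
    """Fraction of a genome covered by alignments (0.0 to 1.0)."""
    return covered_bases(intervals) / genome_length


# Hypothetical HSPs on a 1,000 bp genome; the third hit is on the minus strand.
print(aligned_fraction([(1, 100), (50, 150), (400, 301)], 1_000))  # prints 0.25
```

Computing the fraction separately for each genome of a pair (query coordinates for one, subject coordinates for the other) yields the two aligned-fraction indicators described in the abstract.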
Figure 4. VIRIDIC-generated heatmap incorporating intergenomic similarity values (right half) and alignment indicators (left half and top annotation). In the right half, the color coding allows rapid visualization of the clustering of the phage genomes based on intergenomic similarity: the more closely related the genomes, the darker the color. The numbers represent the similarity values for each genome pair, rounded to one decimal place. In the left half, three indicator values are shown for each genome pair, from top to bottom: aligned fraction of genome 1 (the genome in this row), genome length ratio (for the two genomes in this pair) and aligned fraction of genome 2 (the genome in this column). Darker colors emphasize low values, indicating genome pairs where only a small fraction of the genome was aligned (orange-to-white color gradient) or where the two genome lengths differ greatly (black-to-white color gradient). The aligned genome fractions are expected to decrease as the distance between the phages increases. Therefore, darker colors should correspond to genome pairs with low similarity values, and whiter colors to genome pairs with higher similarity values. Similarly, more closely related viruses are expected to have similar lengths. Therefore, a low genome length ratio in a genome pair with high similarity (e.g., MG969412.1 and MG969413.1 have 62.4% similarity but a genome length ratio of only 0.3) signals that the pair needs further investigation before being classified. The genome of the K155 strain of phage T7 (AY264776.1), its permuted variants (AY264776.1_perm1 and AY264776.1_perm2) and its reverse-complemented variant (AY264776_reversed) showed no significant differences in their intergenomic similarity values.
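The caption describes two uses of the similarity matrix: grouping genomes by similarity and spotting pairs whose high similarity coincides with a low genome length ratio. The sketch below shows both under stated assumptions. Clustering here is single linkage (a genome joins a cluster if it reaches the threshold with any member), implemented with union-find; this is a swapped-in technique for illustration and may not match how VIRIDIC forms its clusters. The genome names, lengths, similarity values and flagging cutoffs are hypothetical.

```python
def length_ratio(len_a, len_b):
    return min(len_a, len_b) / max(len_a, len_b)


def flag_pairs(similarity, lengths, min_similarity, max_length_ratio):
    """Pairs that look related but differ greatly in length (worth checking before classifying).

    `similarity` holds each unordered pair once, keyed as (genome_a, genome_b).
    """
    flagged = []
    for (a, b), sim in similarity.items():
        ratio = length_ratio(lengths[a], lengths[b])
        if sim >= min_similarity and ratio <= max_length_ratio:
            flagged.append((a, b, sim, ratio))
    return flagged


def threshold_clusters(genomes, similarity, threshold):
    """Single-linkage clusters: genomes connected by any pair with similarity >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the root, halving the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), sim in similarity.items():
        if sim >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for g in genomes:  # preserves input order within each cluster
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


genomes = ["phageA", "phageB", "phageC", "phageD"]
lengths = {"phageA": 41_000, "phageB": 40_500, "phageC": 40_000, "phageD": 12_000}
similarity = {
    ("phageA", "phageB"): 96.0,
    ("phageB", "phageC"): 71.0,
    ("phageA", "phageC"): 69.5,
    ("phageC", "phageD"): 62.4,
}
print(threshold_clusters(genomes, similarity, 95.0))
print(threshold_clusters(genomes, similarity, 70.0))
print(flag_pairs(similarity, lengths, min_similarity=60.0, max_length_ratio=0.5))
```

Running the clustering once per threshold (for example 95% and 70%, the values commonly used for phage species and genus demarcation) gives nested groupings. Single linkage can chain genomes together, as phageA and phageC end up in one 70% cluster despite a direct similarity of only 69.5%, which is one reason a real tool may choose a different linkage rule.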