Formijn van Hemert1, Maarten Jebbink1, Andries van der Ark2, Frits Scholer3, Ben Berkhout1.
Abstract
Nucleotide skew analysis is a versatile method to study the nucleotide composition of RNA/DNA molecules, in particular to reveal characteristic sequence signatures. For instance, skew analysis of several viral RNA genomes indicated that the nucleotide bias is enriched in the unpaired, single-stranded genome regions, thus creating an even more striking virus-specific signature. The comparison of skew graphs for many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for simpler identification of similarities and dissimilarities between nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes were normalized to correct for differences in length of the viral genome. Cumulative skew data were converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to the trees constructed by a "classical" model of evolutionary nucleotide substitution. Thus, we conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of characteristic sequence signatures of virus genomes. These results indicate that the Euclidean distance analysis of nucleotide skew data is a useful addition to the virology toolbox.
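The procedure outlined in the abstract (length-normalized windows, cumulative skew, pairwise Euclidean distances) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the skew formula `(x - y) / (x + y)`, the parameters `n_steps` and `window_frac`, and the toy sequences are all assumptions introduced here, since the abstract does not specify them. The neighbor-joining step is omitted.

```python
import math


def cumulative_skew(seq, x, y, n_steps=100, window_frac=0.02):
    """Cumulative (x - y) / (x + y) skew sampled at n_steps windows.

    Window and step sizes scale with len(seq), so genomes of different
    length give profiles with the same number of points (assumed scheme).
    """
    seq = seq.upper().replace("T", "U")
    window = max(1, int(len(seq) * window_frac))
    step = (len(seq) - window) / (n_steps - 1)
    profile, total = [], 0.0
    for i in range(n_steps):
        start = int(round(i * step))
        chunk = seq[start:start + window]
        nx, ny = chunk.count(x), chunk.count(y)
        if nx + ny:  # skip windows that contain neither nucleotide
            total += (nx - ny) / (nx + ny)
        profile.append(total)
    return profile


def euclidean(p, q):
    """Euclidean distance between two equal-length skew profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(profiles):
    """Pairwise distances keyed by (name_a, name_b)."""
    names = list(profiles)
    return {(a, b): euclidean(profiles[a], profiles[b])
            for a in names for b in names}


# Hypothetical toy sequences, not real viral genomes.
genomes = {
    "toy_A": "AUGGCUUUACGA" * 50,
    "toy_B": "AUGGCUUUACGA" * 80,  # same repeat unit, longer genome
    "toy_C": "GCCGGACCGGCA" * 50,  # different composition
}
profiles = {name: cumulative_skew(seq, "U", "G")
            for name, seq in genomes.items()}
dist = distance_matrix(profiles)
```

The resulting `dist` mapping is the kind of pairwise matrix that a neighbor-joining routine from a phylogenetics library could take as input. Because every profile has `n_steps` points regardless of genome length, `toy_A` and `toy_B` can be compared directly.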
Year: 2018 PMID: 30510593 PMCID: PMC6232797 DOI: 10.1155/2018/6490647
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Nucleotide skew profiles of the rubella virus genome (JN635259). (a) Rubella virus. (b) 10x randomized.
Figure 2. Trees of virus genomes based on nucleotide skew values (a) or nucleotide substitution (b).
Figure 3. Trees of coronavirus genomes based on nucleotide skew values (a) or nucleotide substitution (b).
Figure 4. Trees of flavivirus genomes based on nucleotide skew values (a) or nucleotide substitution (b).
Figure 5. Trees of picornavirus genomes based on nucleotide skew values (a) or nucleotide substitution (b).
Figure 6. Trees of HIV-1 subtype genomes based on nucleotide skew values (a) or nucleotide substitution (b).
Mean skew values of coronavirus, flavivirus, picornavirus, and HIV-1 subtype RNA genomes. HIV-1 subtype RNAs display a much lower standard deviation (StDev) of skew values than the other viruses.
| Virus | Statistic | C vs G | G vs A | U vs G | U vs A | C vs A | U vs C |
|---|---|---|---|---|---|---|---|
| Coronavirus | AVG | −29.09 | −58.05 | 101.36 | 40.51 | −86.34 | 124.36 |
| | StDev | 46.67 | 33.47 | 21.40 | 24.00 | 43.15 | 45.22 |
| Flavivirus | AVG | −43.38 | 38.95 | −44.53 | −5.01 | −3.66 | −1.19 |
| | StDev | 29.92 | 53.33 | 32.36 | 59.98 | 71.05 | 25.50 |
| Picornavirus | AVG | 9.48 | −32.05 | 37.12 | 2.81 | −24.10 | 26.49 |
| | StDev | 62.05 | 45.85 | 40.06 | 39.51 | 95.34 | 81.95 |
| HIV | AVG | −77.15 | −101.30 | −17.95 | −118.86 | −171.62 | 59.48 |
| | StDev | 3.82 | 7.00 | 3.46 | 6.20 | 7.32 | 5.26 |
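As a purely illustrative calculation, the same Euclidean distance can be applied to the family-level AVG rows of the table. This is not the analysis reported in the paper, which works on cumulative skew data of individual genomes; treating each family as a single six-value vector is an assumption made here only to show the arithmetic. The names `mean_skew`, `family_dist`, and `closest` are hypothetical.

```python
import math

# Column order: C vs G, G vs A, U vs G, U vs A, C vs A, U vs C (AVG rows).
mean_skew = {
    "Coronavirus":  [-29.09, -58.05, 101.36, 40.51, -86.34, 124.36],
    "Flavivirus":   [-43.38, 38.95, -44.53, -5.01, -3.66, -1.19],
    "Picornavirus": [9.48, -32.05, 37.12, 2.81, -24.10, 26.49],
    "HIV":          [-77.15, -101.30, -17.95, -118.86, -171.62, 59.48],
}


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


families = list(mean_skew)
family_dist = {(a, b): euclidean(mean_skew[a], mean_skew[b])
               for a in families for b in families}

# Each unordered pair once, then the pair with the smallest distance.
pairs = [(a, b) for i, a in enumerate(families) for b in families[i + 1:]]
closest = min(pairs, key=family_dist.get)

for a, b in pairs:
    print(f"{a} vs {b}: {family_dist[(a, b)]:.1f}")
```

Family means discard the within-family spread shown in the StDev rows, so these distances say nothing about how tightly individual genomes cluster.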