Juan Ángel Patiño-Galindo, Fernando González-Candelas, Oliver G Pybus.
Abstract
Many viroids and RNA viruses have genomes that exhibit secondary structure, with paired nucleotides forming stems and loops. Such structures violate a key assumption of most methods of phylogenetic reconstruction: that sequence change is independent among sites. However, phylogenetic analyses of these transmissible agents rarely use evolutionary models that account for RNA secondary structure. Here, we assess the effect of using RNA-specific nucleotide substitution models on the phylogenetic inference of viroids and RNA viruses. We obtained data sets comprising full-genome nucleotide sequences from six viroid and ten single-stranded RNA virus species. For each alignment, we inferred consensus RNA secondary structures, then evaluated different DNA and RNA substitution models. We used model selection to choose the best-fitting model and evaluate estimated Bayesian phylogenies. Further, for each data set we generated and compared Robinson-Foulds (RF) statistics to test whether the distributions of trees generated under alternative models differ notably from each other. In all alignments, the best-fitting model was one that considers RNA secondary structure: RNA models that allow a nonzero rate of double substitution (RNA16A and RNA16C) fitted best for both viral and viroid data sets. In 14 of 16 data sets, the use of an RNA-specific model led to significantly longer tree lengths, but only in three cases did it have a significant effect on RF distances. In conclusion, using an RNA model when undertaking phylogenetic inference of viroids and RNA viruses can provide a better model fit than standard approaches, and model choice can significantly affect branch length estimates.
Keywords: RNA secondary structure; RNA virus; phylogenetics; viroid
Year: 2018 PMID: 29325030 PMCID: PMC5814974 DOI: 10.1093/gbe/evx273
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
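The abstract compares tree distributions through Robinson-Foulds (RF) distances. As a minimal sketch of what that statistic measures, the following computes the RF distance between two unrooted topologies as the number of nontrivial bipartitions found in one tree but not the other. The nested-tuple tree encoding and the names `leaves`, `bipartitions`, and `rf_distance` are illustrative assumptions, not the software used in the study.

```python
from typing import FrozenSet, Set


def leaves(tree) -> FrozenSet[str]:
    """Return the taxon names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree) -> Set[FrozenSet[str]]:
    """Collect the nontrivial splits of a tree, treating it as unrooted.

    Each split is stored as the side that does NOT contain a fixed reference
    taxon, so the same split always gets the same representation.
    """
    all_taxa = leaves(tree)
    reference = min(all_taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if reference in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b) -> int:
    """Robinson-Foulds distance: size of the symmetric difference of splits."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same taxon set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon topologies (not taken from the study).
t1 = (("A", "B"), ("C", "D"), "E")
t2 = (("A", "C"), ("B", "D"), "E")
t3 = (("A", "B"), ("C", "E"), "D")
print(rf_distance(t1, t2))  # 4: the two trees share no internal split
print(rf_distance(t1, t3))  # 2: they share the split separating A and B
```

For fully resolved unrooted trees with n taxa, the distance ranges from 0 to 2(n - 3), which is why the first pair above reaches the maximum of 4.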
Summary Statistics of Each Viroid and Virus Data Set Analyzed, Including Size (Number of Taxa and Sequence Length), Overall Mean Genetic Distance, Structure Conservation Index (SCI), Estimated Percentage of Base-Paired Nucleotides, Median MFED Value, and Best-Fitting Evolutionary Model
| Data Set | No. of Taxa | Sequence Length (nt) | Mean Genetic Distance | SCI | % Paired Nucleotides | Median MFED (%) | Best-Fitting Model | Δ AICc (overall best-fitting model vs. best-fitting DNA-only model) |
|---|---|---|---|---|---|---|---|---|
| Viroids | ||||||||
| TASVd | 22 | 374 | 0.036 | 0.91 | 68 | 14.30 | HKY_Γ+RNA16C_Γ | 338 |
| CeVd | 178 | 369 | 0.041 | 0.92 | 70 | 8.40 | GTR_Γ+RNA16E_Γ | 258 |
| CLVd | 14 | 379 | 0.061 | 0.88 | 68 | 15.40 | GTR_Γ+RNA16A_Γ | 352 |
| GYSVd | 24 | 352 | 0.128 | 0.84 | 65 | 8.30 | GTR_Γ+RNA16C_Γ | 336 |
| AGVd | 27 | 368 | 0.02 | 0.91 | 78 | 11.40 | HKY_Γ+RNA16C | 295 |
| PSTVd | 88 | 356 | 0.019 | 0.97 | 69 | 12.80 | HKY_Γ+RNA16A_Γ | 220 |
| Viruses | ||||||||
| HDV | 121 | 1,543 | 0.204 | 0.66 | 46 | 2.60 | GTR_Γ+RNA16D_Γ | 2,237 |
| Sudan Ebolavirus | 7 | 18,875 | 0.032 | 0.9 | 64 | 1.40 | GTR_Γ+RNA16A | >1,000 |
| DENV | 23 | 10,733 | 0.263 | 0.40 | NC | NC | NC | NC |
| DENV-1 | 20 | 10,733 | 0.061 | 0.81 | 60 | (−)1.3 | GTR_Γ+RNA16A_Γ | >1,000 |
| HCV | 42 | 9,605 | 0.292 | 0.40 | NC | NC | NC | — |
| HCV-1b (RNAalifold) | 20 | 9,605 | 0.087 | 0.82 | 66 | 3.80 | GTR_Γ+RNA16A_Γ | >1,000 |
| HCV-1b (SHAPE reactivity) | 20 | 9,605 | 0.087 | 0.82 | 51 | 3.80 | GTR_Γ+RNA16A_Γ | >1,000 |
| HIV-1 | 18 | 9,173 | 0.126 | 0.64 | NC | NC | NC | — |
| HIV-1B (RNAalifold) | 33 | 9,173 | 0.056 | 0.74 | 57 | 0.50 | GTR_Γ+RNA16D_Γ | >1,000 |
| HIV-1B (SHAPE reactivity) | 33 | 9,173 | 0.056 | 0.74 | 23 | 0.50 | GTR_Γ+RNA16D_Γ | 674 |
| FMDV | 19 | 8,192 | 0.135 | 0.75 | 60 | 3.90 | GTR_Γ+RNA16D_Γ | >1,000 |
| Measles | 20 | 15,893 | 0.042 | 0.89 | 63 | 0.10 | GTR_Γ+RNA16A_Γ | >1,000 |
| Rubella | 35 | 9,758 | 0.06 | 0.9 | 65 | 1.20 | GTR_Γ+RNA16A_Γ | >1,000 |
| Mumps | 20 | 15,355 | 0.045 | 0.86 | 61 | (−)0.8 | GTR_Γ+RNA16A_Γ | >1,000 |
| Rabies | 26 | 11,923 | 0.111 | 0.66 | 5 | NC | NC | NC |
| Rabies C1 | 20 | 11,923 | 0.088 | 0.74 | 63 | (−)0.3 | GTR_Γ+RNA16A_Γ | >1,000 |
Note.—NC, not computed.
SCI (Structure Conservation Index) below 0.70.
Percentage of base-paired nucleotides, after obtaining a consensus structure comprising paired sites that are present in >75% of genotypes/subtypes within a species.
The RNA secondary structure only includes the 15 regions along the HIV-1B genome, reported by Siegfried et al. (2014), that have both SHAPE reactivity values and low Shannon entropies, and are thus considered well-defined structures.
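The last column of the table reports a difference in AICc between the overall best-fitting model and the best-fitting DNA-only model. A minimal sketch of that calculation follows, using the standard small-sample correction AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1). The function names, the use of alignment length as the sample size n, the sign convention (DNA-only minus best), and all numeric inputs are assumptions for illustration; none of them are values from the study.

```python
def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Corrected Akaike information criterion for one fitted model.

    n_sites is used as the sample size, which is an assumption of this sketch.
    """
    if n_sites - n_params - 1 <= 0:
        raise ValueError("sample size too small for the AICc correction")
    aic = 2 * n_params - 2 * log_likelihood
    correction = 2 * n_params * (n_params + 1) / (n_sites - n_params - 1)
    return aic + correction


def delta_aicc(best_model_aicc: float, dna_only_aicc: float) -> float:
    """Positive values mean the overall best model is preferred (lower AICc)."""
    return dna_only_aicc - best_model_aicc


# Hypothetical log-likelihoods and parameter counts for a 374-nt alignment.
mixed = aicc(log_likelihood=-1500.0, n_params=60, n_sites=374)
dna_only = aicc(log_likelihood=-1700.0, n_params=50, n_sites=374)
print(round(delta_aicc(mixed, dna_only), 1))
```

Lower AICc indicates better fit after penalizing parameters, so a large positive difference is evidence that the extra parameters of the mixed model are justified by the data.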
Comparisons of Tree Lengths (L) Estimated under DNA and Mixed Models, for All Sites, Paired Sites, and Unpaired Sites
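| Data Set | L (DNA model) | L (mixed model) | Ratio (mixed/DNA) | P value | L (unpaired sites) | L (paired sites, DNA model) | L (paired sites, RNA model) | Ratio (paired-DNA model/unpaired) | Ratio (paired-RNA model/unpaired) |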
|---|---|---|---|---|---|---|---|---|---|
| Viroids | |||||||||
| TASVd | 0.47 | 3.07 | 6.532 | <0.001 | 3.05 | 0.51 | 1.82 | 0.167 | 0.597 |
| AGVd | 4.91 | 4.93 | 1.004 | 0.341 | 0.55 | 0.31 | 1.13 | 0.563 | 2.055 |
| CeVd | 30.41 | 33.81 | 1.111 | <0.001 | 34.18 | 33.4 | 31.42 | 0.977 | 0.919 |
| CLVd | 0.44 | 1.23 | 2.795 | <0.001 | 1.36 | 0.74 | 1.01 | 0.544 | 0.743 |
| GYSVd | 0.77 | 2.06 | 2.675 | <0.001 | 2.11 | 0.88 | 2.21 | 0.417 | 1.047 |
| PSTVd | 17.28 | 17.05 | 0.989 | 0.14 | 17.22 | 17.17 | 17.13 | 0.997 | 0.995 |
| Viruses | |||||||||
| HDV | 9.09 | 12.15 | 1.337 | <0.001 | 12.44 | 7.5 | 15.35 | 0.603 | 1.234 |
| Sudan Ebolavirus | 0.07 | 0.1 | 1.408 | <0.001 | 0.1 | 0.05 | 0.13 | 0.555 | 1.322 |
| DENV-1 | 0.46 | 0.55 | 1.196 | <0.001 | 0.55 | 0.42 | 0.92 | 0.764 | 1.673 |
| HCV-1b (RNAalifold) | 1.16 | 1.73 | 1.495 | <0.001 | 1.73 | 0.89 | 1.84 | 0.513 | 1.064 |
| HCV-1b (SHAPE) | 1.17 | 1.4 | 1.191 | <0.001 | 1.4 | 0.98 | 1.9 | 0.7 | 1.357 |
| HIV-1B (RNAalifold) | 1.48 | 2.21 | 1.493 | <0.001 | 2.21 | 0.92 | 2.2 | 0.416 | 0.995 |
| HIV-1B (SHAPE) | 1.48 | 1.51 | 1.02 | <0.001 | 1.52 | 1.5 | 2.59 | 0.987 | 1.704 |
| FMDV | 2.01 | 2.48 | 1.234 | <0.001 | 2.52 | 1.48 | 2.72 | 0.587 | 1.079 |
| Measles | 0.33 | 0.42 | 1.273 | <0.001 | 0.42 | 0.3 | 0.78 | 0.714 | 1.857 |
| Rubella | 0.71 | 1.04 | 1.465 | <0.001 | 1.04 | 0.6 | 1.33 | 0.577 | 1.277 |
| Mumps | 0.35 | 0.48 | 1.371 | <0.001 | 0.42 | 0.32 | 0.86 | 0.761 | 2.048 |
| Rabies C1 | 0.94 | 1.16 | 1.234 | <0.001 | 1.16 | 0.86 | 2.07 | 0.741 | 1.784 |
P value obtained from comparing the branch length distributions using paired Wilcoxon tests, after a logarithmic transformation.
The RNA secondary structure only includes the 15 regions along the HIV-1B genome, reported by Siegfried et al. (2014), that have both SHAPE reactivity values and low Shannon entropies, and are thus considered well-defined structures.
Topology could not be fixed for branch length inference due to unresolved bipartitions, and a Wilcoxon rank sum test was performed instead of a paired test.
Fig. 1.—Density plots representing, for each data set, the distribution of RF distances obtained by comparing topologies from the same posterior distribution (either including or excluding the RNA model) versus the distribution of RF distances obtained by comparing topologies from two different posterior distributions. The results of the randomization tests are shown as the proportion of comparisons for which an RF distance obtained by comparing states from the same posterior (blue = under mixed model; red = under DNA model) was lower than the RF distance obtained by comparing states from the two different posterior distributions (black = mixed vs. DNA models). Significant values after FDR correction are labeled with “*.”
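The randomization statistic described in the caption can be sketched as follows: repeatedly draw one within-posterior RF distance and one between-posterior RF distance, and report the proportion of draws in which the within-posterior value is lower. The caption does not specify how comparisons were paired or how many were made, so the random pairing, the number of draws, the seed, and the function name `proportion_lower` are all assumptions of this sketch, and the input distances are hypothetical.

```python
import random
from typing import Sequence


def proportion_lower(
    within: Sequence[int],
    between: Sequence[int],
    n_draws: int = 10_000,
    seed: int = 0,
) -> float:
    """Proportion of random pairs where the within-posterior RF distance
    is strictly lower than the between-posterior RF distance."""
    if not within or not between:
        raise ValueError("both distance lists must be non-empty")
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(n_draws):
        if rng.choice(within) < rng.choice(between):
            hits += 1
    return hits / n_draws


# Hypothetical RF distances among sampled trees.
within_mixed = [2, 4, 4, 6, 2, 4, 6, 4]     # mixed model vs. mixed model
between_models = [6, 8, 6, 10, 8, 6, 8, 8]  # mixed model vs. DNA model
print(proportion_lower(within_mixed, between_models))
```

A proportion close to 1 suggests that trees sampled under the two models differ more from each other than trees sampled under the same model do, which is the pattern the figure marks as significant after FDR correction.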