Saki Nagata, Ryoji Kiyohara, Hiroyuki Toh.
Abstract
The hepatitis delta virus is a single-stranded circular RNA virus characterized by high self-complementarity: about 70% of the genome sequence can form base pairs with internal nucleotides. Although many studies have examined the evolution of the hepatitis delta virus, they have not taken the secondary structure into account. In this study, we developed a method to examine the effect of base pairing as a constraint on nucleotide substitutions during the evolution of the hepatitis delta virus. The method revealed that base pairing can reduce the evolutionary rate in the non-coding region of the virus. In addition, the results suggest that non-coding nucleotides without base pairing may be under some constraint, whose intensity is weaker than that imposed by base pairing but stronger than that on synonymous sites.
Keywords: base pairing; hepatitis delta antigen; hepatitis delta virus; one-sample Wilcoxon test
Year: 2021 PMID: 34960619 PMCID: PMC8708965 DOI: 10.3390/v13122350
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Table 1. List of GenBank IDs for HDV used in this study.
| GenBank IDs |
|---|
| HQ005371, MN984413, MN984411, MN984408, MN984443, MN984429, KJ744224, MN984407, HQ005367, MN984422 |
| KF660599, KF660598, MG557658, AB118846, AY261457, AF425645, AF104264, MK234591, MK234593, MK234592 |
| AB037947, KC590319, HF679406, HF679405, HF679404 |
Table 2. Average ratio of bases that bind to other bases.
| Threshold | Average Ratio |
|---|---|
| 0.4 | 0.753 |
| 0.5 | 0.722 |
| 0.6 | 0.672 |
| 0.7 | 0.623 |
Figure 1. Schematic diagram of the aligned HDV genome. The three red rectangles indicate the non-coding regions. The numbers colored red indicate the positions of the non-coding regions in the alignment shown in Supplementary Data S1. The blue rectangle indicates the negative strand coding region of the L-HDAg gene. The numbers colored blue indicate the positions of the coding region in the alignment. The black line indicates the region that includes the complementary sites of the coding region.
Figure 2. Procedure of the distance ratio analysis. (a) Calculation of the distances of two regions between every possible pair of aligned sequences. The symbols d_a(x, y) and d_b(x, y) indicate the evolutionary distance between sequences x and y in region a and in region b, respectively. Each of d_a(x, y) and d_b(x, y) corresponds to one of the five distances edbp(x, y), ednbp(x, y), ed(x, y), […], and […]. For simplicity of explanation, the two regions are drawn as separate regions in an alignment; however, the regions can overlap. For example, when the distance ratio of […] and […] is examined as d_a(x, y) and d_b(x, y), the regions a and b are the non-synonymous and synonymous sites of the same coding region. (b) Calculation of the distance ratios between every possible pair of aligned sequences. If d_b(x, y) equals zero, the corresponding distance ratio is excluded from the set of distance ratios. (c) One-sample Wilcoxon test. There are two alternative hypotheses, median of the ratios > 1.0 and median of the ratios < 1.0, which correspond to the right-sided and left-sided tests, respectively. One of the two was adopted based on the median of the calculated distance ratios: when the median was larger than 1.0 (less than 1.0), the right-sided (left-sided) one-sample Wilcoxon test was applied to the distance ratios.
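The three steps in the Figure 2 legend can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `distance_ratios` and `one_sample_wilcoxon` are hypothetical, the per-pair distances are assumed to be precomputed and stored in dictionaries keyed by sequence pair, and the p-value uses a normal approximation with tie correction and no continuity correction, which may differ from the software used in the study. No multiple-testing correction is applied here.

```python
import math
import statistics


def distance_ratios(dist_a, dist_b):
    """Step (b): ratio of the region-a distance to the region-b distance
    for every sequence pair, skipping pairs whose region-b distance is zero."""
    return [dist_a[pair] / dist_b[pair] for pair in dist_a if dist_b[pair] != 0]


def one_sample_wilcoxon(ratios, mu=1.0):
    """Step (c): one-sided signed-rank test of the median against mu.

    The side follows the sample median, as in the legend. A median exactly
    equal to mu is not covered by the legend; this sketch treats it as
    left-sided (an assumption).
    Returns (side, W+, approximate p-value).
    """
    diffs = [r - mu for r in ratios if r != mu]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all ratios equal mu; the test is undefined")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of the ranks that belong to positive differences.
    w_plus = sum(rank for rank, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)

    side = "greater" if statistics.median(ratios) > mu else "less"
    if side == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z), right-sided
    else:
        p = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z), left-sided
    return side, w_plus, p


if __name__ == "__main__":
    # Toy distances for three sequences; the values are made up for illustration.
    d_a = {("s1", "s2"): 1.0, ("s1", "s3"): 2.0, ("s2", "s3"): 3.0}
    d_b = {("s1", "s2"): 2.0, ("s1", "s3"): 0.0, ("s2", "s3"): 4.0}
    print(distance_ratios(d_a, d_b))
    print(one_sample_wilcoxon([0.25, 0.5, 0.75, 0.625, 0.375, 0.125, 1.0625, 0.875]))
```

For small samples an exact signed-rank distribution would be preferable to the normal approximation used in this sketch.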
Table 3. Medians and corrected p-values of ten distance ratios for three genotypes. The size of the samples of each genotype is indicated by n. The corrected p-values were calculated by the left-sided test (the alternative hypothesis is that the median is less than 1.0) or the right-sided test (the alternative hypothesis is that the median is greater than 1.0). The asterisk indicates the p-value calculated by the right-sided test. The floating point number representation is used to express the p-values.
| Genotype (n) | Statistic | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Median | 0.46 | 0.59 | 1.3 | 0.16 | 0.34 | 0.27 | 0.52 | 1.1 | 0.89 | 0.30 |
| | Corrected p-value | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ |
| | Median | 0.70 | 0.71 | 1.0 | 0.30 | 0.44 | 0.43 | 0.87 | 1.2 | 1.2 | 0.35 |
| | Corrected p-value | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | 8.6 × 10⁻⁴ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ |
| | Median | 0.29 | 0.50 | 1.3 | 0.40 | 0.74 | 0.63 | 0.63 | 1.8 | 1.1 | 0.69 |
| | Corrected p-value | 1.9 × 10⁻² | 1.9 × 10⁻² | 1.4 × 10⁻¹ * | 2.0 × 10⁻² | 1.0 * | 1.0 * | 2.0 × 10⁻² | 2.0 × 10⁻¹ * | 6.4 × 10⁻¹ * | 1.0 |
Table 4. Medians and corrected p-values of ten distance ratios between genotypes. The size of the samples of each genotype is indicated by n. See legend of Table 3 for the asterisk and the representation of the p-value.
| Genotypes (n) | Statistic | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Median | 0.76 | 0.74 | 0.99 | 0.21 | 0.28 | 0.28 | 0.70 | 0.89 | 0.93 | 0.31 |
| | Corrected p-value | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <1.8 × 10⁻⁷ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ |
| | Median | 0.60 | 0.71 | 1.2 | 0.26 | 0.43 | 0.36 | 0.87 | 1.5 | 1.2 | 0.29 |
| | Corrected p-value | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ |
| | Median | 0.61 | 0.72 | 1.2 | 0.32 | 0.48 | 0.43 | 0.82 | 1.3 | 1.1 | 0.38 |
| | Corrected p-value | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ * | <2.2 × 10⁻¹⁵ |
Figure 3. Box plots of ten distance ratios for three genotypes. The ordinate indicates the values of the distance ratios. The symbols on the abscissa, A–J, correspond to the ratios edbp(x, y)/ednbp(x, y), edbp(x, y)/ed(x, y), ednbp(x, y)/ed(x, y), edbp(x, y)/[…], ednbp(x, y)/[…], ed(x, y)/[…], edbp(x, y)/[…], ednbp(x, y)/[…], ed(x, y)/[…], and […]/[…]. The top and bottom of a box indicate the upper and lower quartiles of the distance ratios. The horizontal bar in the box indicates the median of the distance ratios. The top and bottom of the line running through the box indicate the maximum and minimum of the ratios, excluding the outliers. Ratios that lie outside a closed interval are regarded as outliers. The circles above or below the line indicate the outliers.
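The box plot elements described in the Figure 3 legend can be computed as in the sketch below. The closed interval that defines the outliers is not given in this legend, so the sketch assumes the conventional fences of 1.5 times the interquartile range beyond the quartiles; the multiplier `k`, the function name `box_plot_summary`, and the "inclusive" quartile method are all assumptions, and plotting software may compute quartiles slightly differently.

```python
import statistics


def box_plot_summary(ratios, k=1.5):
    """Quartiles, whisker ends, and outliers for one set of distance ratios.

    k is the fence multiplier; 1.5 is the common convention and is an
    assumption here, because the legend does not state the interval.
    """
    q1, median, q3 = statistics.quantiles(ratios, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr

    # Values inside the closed interval [low_fence, high_fence] set the whiskers.
    inside = [r for r in ratios if low_fence <= r <= high_fence]
    outliers = sorted(r for r in ratios if r < low_fence or r > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Exposing `k` as a parameter keeps the assumed interval explicit, so it can be changed once the actual outlier rule is known.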
Figure 4. Box plots of the ten distance ratios between genotypes. The ordinate indicates the values of the distance ratios. The symbols on the abscissa, A–J, correspond to edbp(x, y)/ednbp(x, y), edbp(x, y)/ed(x, y), ednbp(x, y)/ed(x, y), edbp(x, y)/[…], ednbp(x, y)/[…], ed(x, y)/[…], edbp(x, y)/[…], ednbp(x, y)/[…], ed(x, y)/[…], and […]/[…]. See the legend of Figure 3 for the explanation of the box plot.