Emily Abernathy, Min-hsin Chen, Jayati Bera, Susmita Shrivastava, Ewen Kirkness, Qi Zheng, William Bellini, Joseph Icenogle.
Abstract
Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype-specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified, and additional evidence for genotype-specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.
Year: 2013 PMID: 23351667 PMCID: PMC3574052 DOI: 10.1186/1743-422X-10-32
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
List of rubella viruses used in this study
| Virus name* | Genotype | Location and year | GenBank accession no. | Reference | Source and passage^ |
|---|---|---|---|---|---|
| F-Th_USA64 | 1a | Connecticut, USA, 1964 | M15240 | [ | NA |
| ULR_GER84 | 1a | Leipzig, Germany, 1984 | AF435865 | [ | NA |
| TO-W_JAP67 | 1a | Toyama, Japan, 1967 | AB047330 | [ | NA |
| Matsue.JPN/68 | 1a | Matsue, Japan, 1968 | AB222609 | [ | NA |
| Cba_ARG88 | 1B | Cordoba, Argentina, 1988 | DQ085339 | [ | NA |
| Anim_MEX97 | 1C | Baja California, Mexico, 1997 | DQ085341 | [ | NA |
| JC2_NZL91 | 1D | Auckland, New Zealand, 1991 | DQ388281 | [ | NA |
| 6423_ITA97 | 1E | Pavia, Italy, 1997 | DQ085343 | [ | NA |
| GUZ_GER92 | 1G | Stuttgart, Germany, 1992 | DQ388280 | [ | NA |
| BR1-CN79 | 2A | Beijing, China, 1979 | AY258322 | [ | NA |
| AN5_KOR96 | 2B | Seoul, South Korea, 1996 | DQ085342 | [ | NA |
| I-11_ISR68 | 2B | Tel Aviv, Israel, 1968 | DQ085338 | [ | NA |
| C4_RUS67 | 2C | Moscow, Russia, 1967 | DQ388279 | [ | NA |
| C74_RUS97 | 2C | Moscow, Russia, 1997 | DQ085340 | [ | NA |
* The 16 new sequences are in bold.
^ The epidemiological link (source) and passage numbers of the 16 viruses are given when known.
Figure 1. Phylogenetic tree of the 30 viruses. The first 21 nts were deleted from the sequence alignment due to a gap in one sequence. Bayesian Information Criterion (BIC) scores for different models were computed using the MEGA 5.05 program with the default settings. TN93 with a proportional discrete Gamma distribution (+G) and a fraction of invariant sites (+I) was selected as the best-fitting model, having the lowest BIC score. The transition/transversion bias for the TN93 + G + I model was estimated to be 6.85, using MEGA 5.05 under the Kimura 2-parameter model. The maximum likelihood tree constructed by the MEGA 5.05 program with default settings and the TN93 + G + I model is shown.
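The model-selection step described in the caption can be illustrated with a minimal sketch of the standard BIC formula, BIC = k ln(n) − 2 ln(L), where the model with the lowest score is preferred. The log-likelihoods, parameter counts, and alignment length below are hypothetical values chosen for illustration; they are not the values computed by MEGA 5.05 in this study.

```python
import math

def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L). Lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood

# Hypothetical (log-likelihood, number of free parameters) per model.
# These numbers are illustrative assumptions, not results from the paper.
candidates = {
    "K2": (-52000.0, 58),
    "TN93": (-51800.0, 62),
    "TN93+G+I": (-50500.0, 64),
}
n_sites = 9700  # assumed alignment length, for illustration only

scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)  # model with the lowest BIC

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print("selected:", best)
```

Because the penalty term grows with the number of parameters, a richer model such as TN93 + G + I is selected only when its gain in likelihood outweighs the extra parameters.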
Clade-specific amino acid variation
| ORF | aa position | Clade 1 | Clade 2 | Protein (domain)* | Reference |
|---|---|---|---|---|---|
| NSP | 464 | R | H | P150 (unknown) | |
| | 551 | A | T | P150 (Q domain) | [ |
| | 725 | G | D | P150 (HVR) | [ |
| | 859 | T | A | P150 (X) | [ |
| | 864 | A | E/D | P150 (X) | [ |
| | 1064 | S | G | P150 (Protease) | [ |
| | 1082 | T | A | P150 (Protease) | [ |
| | 1147 | L | R/Q | P150 (Protease) | [ |
| SP | 116 | S | T | C (unknown) | |
| | 308 | M | I/L | E2 (antigenicity) | [ |
| | 323 | Q | K | E2 (antigenicity) | [ |
| | 420 | A/V | T | E2 (unknown) | |
| | 700 | E | D | E1 (unknown) | |
* The references include definitions of the domains.
Figure 2. Identity plots of nucleotide (A) and amino acid (B) sequences of 30 rubella viruses. The genes and putative domains are shown at the top of the panels: the methyltransferase (MT), hypervariable region (HVR), X-domain (X), protease (Pro), helicase (Hel) and RNA-dependent RNA polymerase (RdRp) in NSP, and the nucleocapsid (C), membrane glycoprotein 2 (E2) and membrane glycoprotein 1 (E1) in SP. The nt analysis was done by counting the number of identical residues at each position among all (green), clade 1 (blue) or clade 2 (red) viruses using Microsoft Office Excel. Comparisons were made against the consensus sequence from all 30 viruses or against the clade-specific consensus sequences. Thus, any position at which all viruses contain identical nt or aa residues has a value of 1. The nucleotide identity was plotted using a sliding 30-nt window; data are plotted as moving averages of the number of nucleotide changes. Each line in the amino acid identity plot represents the amount of amino acid identity at the indicated position.
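The identity calculation described in the caption was done in Microsoft Office Excel; the following is only a rough Python sketch of the same idea, assuming an alignment of equal-length sequences. The function names and the toy alignment are hypothetical, and the sketch averages per-position identity rather than counts of nucleotide changes.

```python
from collections import Counter

def consensus(seqs):
    """Most common residue at each column of an equal-length alignment."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def identity_per_position(seqs, ref):
    """Fraction of sequences matching `ref` at each column (1.0 = fully conserved)."""
    n = len(seqs)
    return [sum(s[i] == ref[i] for s in seqs) / n for i in range(len(ref))]

def moving_average(values, window=30):
    """Mean of each full window of `window` consecutive values."""
    if window > len(values):
        return []
    total = sum(values[:window])
    out = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one position
        out.append(total / window)
    return out

# Toy alignment (hypothetical); a real analysis would use the 30 genomes.
toy = ["ACGTAC", "ACGTAC", "ACGAAC", "ATGTAC"]
cons = consensus(toy)
ident = identity_per_position(toy, cons)
smooth = moving_average(ident, window=3)  # the figure uses a 30-nt window
print(cons, ident, smooth)
```

Passing only the clade 1 or clade 2 sequences, together with their own consensus, would give the clade-specific curves.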
Figure 3. Sequence variation among 30 rubella viruses. The percentage of nucleotide variability (A) and amino acid variability (B) in each domain, as denoted on the X-axis, of 30 rubella viruses relative to the overall consensus sequence was calculated using Microsoft Office Excel. The Y-axis indicates the percentage of variability. The amino acid alignment was determined by ClustalW according to the Gonnet PAM 250 matrix. Panels (C and D) show the variation of nucleic acid and amino acid sequences for the 6 regions shown in Figure 2. The conserved, semi-conserved, or non-conserved designation of each aa position was assigned based on the most variable aa observed at that position among the 30 viruses. The clade-specific variations in nucleic acid sequences are compared using clade-specific consensus sequences. The range of each domain is indicated at the bottom of the graphs.
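As a rough illustration of a per-domain variability percentage, the sketch below counts the alignment columns within a domain at which at least one sequence differs from the others. This definition of "variability", the domain names and ranges, and the toy alignment are all assumptions made for the example; the caption does not specify the exact Excel calculation used.

```python
def percent_variable(seqs, start, end):
    """Percentage of columns in [start, end) (0-based) that are not identical in all sequences."""
    cols = list(zip(*seqs))[start:end]
    if not cols:
        return 0.0
    variable = sum(1 for col in cols if len(set(col)) > 1)
    return 100.0 * variable / len(cols)

# Hypothetical domain ranges and toy alignment, for illustration only.
domains = {"domain_A": (0, 4), "domain_B": (4, 8)}
toy = ["ACGTACGT", "ACGTACGA", "ATGTTCGT"]

variability = {name: percent_variable(toy, s, e) for name, (s, e) in domains.items()}
for name, pct in variability.items():
    print(f"{name}: {pct:.1f}% variable positions")
```

The same function applies to an amino acid alignment, since it only compares characters column by column.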