Local evolutionary patterns of human respiratory syncytial virus derived from whole-genome sequencing.

Charles N Agoti1, James R Otieno1, Patrick K Munywoki1, Alexander G Mwihuri1, Patricia A Cane2, D James Nokes3, Paul Kellam4, Matthew Cotten5.   

Abstract

Human respiratory syncytial virus (RSV) is associated with severe childhood respiratory infections. A clear description of local RSV molecular epidemiology, evolution, and transmission requires detailed sequence data and can inform new strategies for virus control and vaccine development. We have generated 27 complete or nearly complete genomes of RSV from hospitalized children attending a rural coastal district hospital in Kilifi, Kenya, over a 10-year period using a novel full-genome deep-sequencing process. Phylogenetic analysis of the new genomes demonstrated the existence and cocirculation of multiple genotypes in both RSV A and B groups in Kilifi. Comparison of local versus global strains demonstrated that most RSV A variants observed locally in Kilifi were also seen in other parts of the world, while the Kilifi RSV B genomes encoded a high degree of variation that was not observed in other parts of the world. The nucleotide substitution rates for the individual open reading frames (ORFs) were highest in the regions encoding the attachment (G) glycoprotein and the NS2 protein. The analysis of RSV full genomes, compared to subgenomic regions, provided more precise estimates of the RSV sequence changes and revealed important patterns of RSV genomic variation and global movement. The novel sequencing method and the new RSV genomic sequences reported here expand our knowledge base for large-scale RSV epidemiological and transmission studies.

IMPORTANCE: The new RSV genomic sequences and the novel sequencing method reported here provide important data for understanding RSV transmission and vaccine development. Given the complex interplay between RSV A and RSV B infections, the existence of local RSV B evolution is an important factor in vaccine deployment.
Copyright © 2015, Agoti et al.

Year:  2015        PMID: 25609811      PMCID: PMC4403408          DOI: 10.1128/JVI.03391-14

Source DB:  PubMed          Journal:  J Virol        ISSN: 0022-538X            Impact factor:   5.103


INTRODUCTION

Human respiratory syncytial virus (RSV) is a leading viral cause of severe respiratory infection during infancy and early childhood and among immunocompromised populations (1, 2). Globally, the virus is estimated to be responsible for 30 million episodes of acute lower respiratory tract infections (RTIs) and more than 50,000 deaths annually in children under 5 years of age (3). RSV infections throughout the world consistently occur as annual or biennial epidemics, and persons of all ages can be infected with diverse clinical outcomes ranging from mild upper RTIs to severe pneumonia or bronchiolitis (2, 4). A vaccine against RSV is not yet available (5). Careful analyses of RSV molecular epidemiology, evolution, and transmission are essential for defining the circulating viruses, for characterizing antigenic variation, and for tracking transmission patterns. The outcome of these studies can support new strategies for RSV control and vaccine use and development.
It has long been known that individuals suffer repeated RSV infections throughout life (6, 7). The ability of the virus to continue to infect previously exposed individuals is thought to be linked to an ability to bypass preexisting immune responses (8). Sequence variation in attachment (G) protein in consecutive years (9) is thought to be part of this mechanism; also, the global existence of two groups, A and B, and their alternating infection incidences may play a role (10, 11). The transmissibility of RSV group A (RSVA) is estimated to be slightly higher than that of RSV group B (RSVB) (12), and RSVA infections are more frequent than RSVB infections (12). An additional important feature of RSV infection is the apparently rapid global dispersion of new RSV variants (13). Indeed, genetically similar viruses cluster more by time than by location, suggesting rapid global movement of new variants (14). RSV molecular pathology and epidemiology have been reviewed in detail elsewhere (2, 4, 15, 16).
Historically, RSV molecular epidemiology has focused on the 900-bp region encoding the G protein (15, 17). The G protein together with the fusion (F) protein are important targets of human protective antibody responses (8, 15), with changes in this region thought to be driven by pressure to avoid host immune responses. Although studies of the sequence variability of RSV have concentrated on G gene variability, given the rapid infection pace but relatively low evolutionary rate of RSV, transmission studies over short periods require the stronger evolutionary signal provided by the full virus genome sequence (15,200 nucleotides [nt]: 11 open reading frames [ORFs] and noncoding regions). There is also a need to understand the nature of variation of immune targets other than the G protein. Advances in primer design, sequencing technology, and sequence assembly algorithms now allow full-genome sequencing for a number of viruses, including RSV (18–23), norovirus (24), and Middle East respiratory syndrome (MERS) coronavirus (25, 26). The current work describes RSV genome evolution across a set of clinical samples collected from children who presented with severe RSV disease in a rural coastal Kenyan hospital using a novel RSV whole-genome sequencing (WGS) approach optimized for small amounts of clinical diagnostic samples. The sequence data provide an update of the genome-wide diversity of circulating RSV strains in this part of Kenya, including both RSVA and RSVB and the recently reemerged group B genotype GB3 (27). The novel genomes support previous conclusions on patterns of local RSV variation relative to global RSV diversity and reveal a significant difference in local evolution of RSVA versus RSVB.

MATERIALS AND METHODS

Primer design.

All RSV sequences available (August 2012) with lengths of >14,000 nt were collected and sorted by group, yielding 138 RSVA and 38 RSVB genomes. The sequences for each group were pooled and sliced into 33-nt strings with a 1-nt step size. The 33-mers were filtered to remove sequences with ambiguous nucleotides, and the frequency of each sequence within the set was determined. The 33-nt sequences were then trimmed to a calculated melting temperature (Tm) of 58°C, discarding sequences mapping to human rRNA, with GC contents of <30% or >65%, or with a single nucleotide frequency of >60%. The RSV genome was divided into six 3-kb segments overlapping by 300 nt. All sequences were mapped to an RSVA or RSVB reference strain, and the two most frequent primers mapping within 300 nt of the end of each amplicon were selected. The reverse complement of the downstream sequences was prepared. To ensure amplification of the far ends of the genomes, two additional primers were included from the 5′- and 3′-terminal genomic regions. A summary of the primer sequences and their predicted target sequences across all known RSV genomes is presented in Table 1.
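The k-mer enumeration and filtering steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name `candidate_kmers` and its parameter names are hypothetical, the GC and single-nucleotide filters are applied to the untrimmed 33-mers for simplicity, and the Tm trimming and human rRNA screening steps are omitted.

```python
from collections import Counter


def candidate_kmers(genomes, k=33, gc_min=0.30, gc_max=0.65, max_single_nt=0.60):
    """Slice genomes into k-mers with a 1-nt step, filter them, and count them.

    Hypothetical helper: thresholds follow the text, everything else is assumed.
    """
    counts = Counter()
    for seq in genomes:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Drop k-mers that contain ambiguous nucleotides.
            if set(kmer) - set("ACGT"):
                continue
            # Drop k-mers whose GC content is outside the allowed range.
            gc = (kmer.count("G") + kmer.count("C")) / k
            if gc < gc_min or gc > gc_max:
                continue
            # Drop k-mers dominated by a single nucleotide.
            if max(kmer.count(n) for n in "ACGT") / k > max_single_nt:
                continue
            counts[kmer] += 1
    return counts
```

Calling `candidate_kmers(genomes).most_common()` would then rank the surviving candidates by how often they occur in the pooled genome set, which is the frequency used for primer selection in the text.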
TABLE 1

Summary of RSV primers used in this study

Target | Primer | Sequence (5′ to 3′) | Strand | Position (a) | Tm (°C) (b) | % with 0 MM (c) | % with 0–3 MM (d)
RSVA | rsvas | ACGCGAAAAAATGCGTACAAC | Plus | 1 | 57.13 | 18.28 | 18.97
RSVA | rsva52 | TGTGCATGTTATTACAAGTAGTGATATTTG | Plus | 266 | 56.96 | 95.52 | 98.97
RSVA | rsva50 | GCATGTTATTACAAGTAGTGATATTTGCC | Plus | 269 | 57.51 | 95.17 | 98.97
RSVA | rsva117 | ATAAGAGATGCCATGGTTGGTTTAAGA | Plus | 2849 | 58.44 | 95.86 | 100.00
RSVA | rsva86 | AAGAGATGCCATGGTTGGTTTAAGA | Plus | 2851 | 58.43 | 95.86 | 100.00
RSVA | rsva175 | TTCTCTTAAACCAACCATGGCATCT | Minus | 2878 | 58.43 | 95.86 | 100.00
RSVA | rsva39 | CTTCTCTTAAACCAACCATGGCATC | Minus | 2879 | 58.22 | 95.86 | 100.00
RSVA | rsva1820 | GCAGCATATGCAGCAACAATC | Plus | 5207 | 56.95 | 93.79 | 98.97
RSVA | rsva1914 | CAGCATATGCAGCAACAATCCAA | Plus | 5208 | 58.32 | 93.10 | 98.62
RSVA | rsva1644 | CAACTCCATTGTTATTTGCCCC | Minus | 5674 | 56.05 | 89.66 | 100.00
RSVA | rsva1688 | CAACTCCATTGTTATTTGCCCCA | Minus | 5674 | 57.54 | 89.66 | 100.00
RSVA | rsva704 | ATGTGTTGCCATGAGCAAACTC | Plus | 7893 | 57.95 | 91.03 | 100.00
RSVA | rsva731 | GCCATGAGCAAACTCCTCACT | Plus | 7900 | 58.49 | 71.38 | 99.31
RSVA | rsva341 | TTGTCAGGTAGTATCATTATTTTTGGCATG | Minus | 8196 | 58.53 | 98.97 | 99.31
RSVA | rsva312 | AGGATATTTGTCAGGTAGTATCATTATTTTTGG | Minus | 8203 | 58.08 | 98.97 | 100.00
RSVA | rsva374 | AAGAGAACTCAGTGTAGGTAGAATGTTT | Plus | 10360 | 57.89 | 96.55 | 100.00
RSVA | rsva350 | AGAACTCAGTGTAGGTAGAATGTTTG | Plus | 10363 | 56.64 | 96.55 | 100.00
RSVA | rsva497 | GCTTGATTGAATTTGCTGAGATCTGT | Minus | 10620 | 58.44 | 95.52 | 100.00
RSVA | rsva539 | ATGCTTGATTGAATTTGCTGAGATCTG | Minus | 10622 | 58.68 | 95.52 | 100.00
RSVA | rsva1220 | GATTGGGTGTATGCATCTATAGATAACAAG | Plus | 12386 | 57.94 | 95.86 | 99.31
RSVA | rsva1232 | ATTGGGTGTATGCATCTATAGATAACAAG | Plus | 12387 | 57.17 | 95.86 | 99.31
RSVA | rsva364 | TTATATATCCCTCTCCCCAATCTTTTTCAAA | Minus | 13070 | 58.32 | 96.21 | 100.00
RSVA | rsva385 | ATCAGTTATATATCCCTCTCCCCAATCTT | Minus | 13075 | 58.46 | 96.21 | 100.00
RSVA | rsva4066 | GTTGTATAACAAACTACCTGTGATTTTAATCAG | Minus | 14983 | 57.95 | 88.97 | 99.31
RSVA | rsva5632 | TAACTATAATTGAATACAGTGTTAGTGTGTAGC | Minus | 15063 | 57.95 | 29.31 | 95.17
RSVA | rsvae | ACGAGAAAAAAAGTGTCAAAAACTAATA | Minus | 15223 | 55.09 | 17.59 | 18.28
RSVB | rsvbs | ACGCGAAAAAATGCGTACTACA | Plus | 1 | 57.56 | 43.14 | 43.14
RSVB | rsvb3 | TGGGGCAAATAAGAATTTGATAAGTGC | Plus | 44 | 58.58 | 48.04 | 54.90
RSVB | rsvb1021 | GGGGCAAATAAGAATTTGATAAGTGCTATT | Plus | 45 | 58.75 | 47.06 | 54.90
RSVB | rsvb33 | ATATTAGGAATGCTCCATACATTAGTAGTTG | Plus | 2777 | 57.21 | 88.24 | 100.00
RSVB | rsvb71 | TAAGAGATGCTATGGTTGGTCTAAGAGA | Plus | 2841 | 58.69 | 90.20 | 100.00
RSVB | rsvb50 | AGTCTTGCCATAGCCTCTAACCT | Minus | 2937 | 58.57 | 93.14 | 100.00
RSVB | rsvb95 | CCATTTTTTCGCTTTCCTCATTCCTA | Minus | 2963 | 58.14 | 95.10 | 100.00
RSVB | rsvb7884 | AGTATATGTGGCAACAATCAACTCTG | Plus | 5202 | 57.48 | 81.37 | 100.00
RSVB | rsvb7996 | TATGTGGCAACAATCAACTCTGC | Plus | 5206 | 57.70 | 81.37 | 100.00
RSVB | rsvb7442 | GATGTGGAGGGCTCGGATG | Minus | 5548 | 57.92 | 75.49 | 100.00
RSVB | rsvb7423 | CCATGGTTATTTGCCCCAGATTTAAT | Minus | 5662 | 57.87 | 77.45 | 99.02
RSVB | rsvb3762 | AGAGGTCATTGCTTGAATGGTAGAA | Plus | 7642 | 57.98 | 93.14 | 100.00
RSVB | rsvb3712 | AAGAGCATAGACACTTTGTCTGAAATAAG | Plus | 7762 | 57.89 | 77.45 | 100.00
RSVB | rsvb3652 | GCTTATGGTTATGCTTTTGTGGATATCTAAT | Minus | 8130 | 58.41 | 89.22 | 98.04
RSVB | rsvb3660 | GCAATCATGCTTTCACTTGAGATCAA | Minus | 8247 | 58.67 | 64.71 | 98.04
RSVB | rsvb32 | AAGAAGAGTACTAGAGTATTACTTGAGAGATAA | Plus | 10236 | 57.04 | 90.20 | 100.00
RSVB | rsvb52 | AAATCCAAATCTTAGCAGAGAAAATGATAG | Plus | 10412 | 56.70 | 96.08 | 100.00
RSVB | rsvb47 | CCATGCAGTTCATCTAATACATCACTG | Minus | 10673 | 58.13 | 90.20 | 99.02
RSVB | rsvb168 | TGCATGTCTATATGTACATATTATTGTGACAAG | Minus | 10746 | 58.25 | 91.18 | 99.02
RSVB | rsvb651 | ATCGACATTGTGTTTCAAAATTGCATAAG | Plus | 12640 | 58.40 | 81.37 | 100.00
RSVB | rsvb165 | TTCAAAATTGCATAAGTTTTGGTCTTAGC | Plus | 12653 | 58.06 | 88.24 | 100.00
RSVB | rsvb27 | TTAATGAACATATGATCAGTTATATACCCCTCT | Minus | 13088 | 57.88 | 79.41 | 100.00
RSVB | rsvb60 | AACTTAAAACTGTGACAGCCTTTTATTCT | Minus | 13325 | 58.08 | 89.22 | 100.00
RSVB | rsvb1199 | ATAGTACACTACCTGTTATTTTAATCAGCTTCT | Minus | 14977 | 58.56 | 88.24 | 100.00
RSVB | rsvb989 | TATAGTACACTACCTGTTATTTTAATCAGCTTC | Minus | 14978 | 57.57 | 88.24 | 100.00
RSVB | rsvbe | ACGAGAAAAAAAGTGTCAAAAACTAATGT | Minus | 15216 | 57.47 | 5.88 | 6.86

(a) Primer mapping position in RSVA (GenBank accession number FJ948820) or RSVB (GenBank accession number JQ582843).

(b) Tm (melting temperature) calculated using a Python script that approximates the method of Breslauer et al. (51).

(c) Percentage of full-length RSVA genomes (n = 290) or full-length RSVB genomes (n = 102) showing perfect homology to primer, i.e., 0 mismatches (MM).

(d) Percentage of full-length RSVA genomes (n = 290) or full-length RSVB genomes (n = 102) showing the target sequence for the primer with up to 3 mismatches.
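The mismatch percentages in the last two columns of Table 1 could be computed along the following lines. This is a sketch only: the helper names `min_mismatches` and `percent_matching` are hypothetical, the primer is assumed to be given in the same orientation as the genome (a minus-strand primer would first need to be reverse complemented), and every genome is assumed to be at least as long as the primer.

```python
def min_mismatches(primer, genome):
    """Smallest number of mismatches between the primer and any same-length window."""
    k = len(primer)
    return min(
        sum(a != b for a, b in zip(primer, genome[i:i + k]))
        for i in range(len(genome) - k + 1)
    )


def percent_matching(primer, genomes, max_mm=0):
    """Percentage of genomes with a primer target site within max_mm mismatches."""
    hits = sum(1 for g in genomes if min_mismatches(primer, g) <= max_mm)
    return 100.0 * hits / len(genomes)
```

With `max_mm=0` the function corresponds to the "% with 0 MM" column, and with `max_mm=3` to the "% with 0–3 MM" column.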

Clinical samples.

Viral nucleic acid for sequencing was extracted from RSV-positive clinical specimens (nasopharyngeal swabs [NPS] or washes) collected from children under 5 years old admitted to the Kilifi District Hospital (KDH) with severe or very severe pneumonia between 2002 and 2012. RSV infection was diagnosed with an indirect immunofluorescence antibody technique (IFAT; Light Diagnostics). The parent study that provided these samples has been described previously (28). Informed consent was obtained from a parent or guardian on behalf of each child before specimen collection, and the KEMRI Ethics Review Committee approved all protocols. Additional details on the samples are provided in Table 2.
TABLE 2

Details for samples used in this study

MiSeq | Age (mo) | Sample date (day-mo-yr) | Group | Length (nt) (a) | Coverage (b) | Present in G set (c) | Present in F set (d) | GenBank no. (e) | ENA no. (f)
10028_10007-Jan-02A9,3466,401YesKP317918ERR323212
10028_11627-Apr-02A7,09110,370KP317940ERR323213
10028_12628-Jan-03A9,7765,692YesKP317955ERR323214
11866_65513-Feb-03A12,1517,347YesYesKP317949ERR438932
11865_75824-Mar-04A14,98512,283YesYesKP317956ERR438910
10891_50621-Jan-05A5,3963,554YesYesKP317948ERR376407
10891_56002-Feb-05A5,3962,369YesYesKP317924ERR376413
9696_451420-Feb-06A14,7783,830YesYesKP317944ERR303303
10891_57123-Feb-06A14,8414,640YesYesKP317942ERR376414
10891_58029-Mar-06A8,8646,016KP317943ERR376415
10891_59304-Jan-07A11,4965,316YesKP317937ERR376416
10891_60105-Jan-07A14,7914,454YesYesKP317926ERR376417
10891_51007-Mar-08A14,9674,882YesYesKP317933ERR376408
10891_521117-Mar-08A5,6361,201YesKP317931ERR376409
10899_38122-Feb-09A14,8548,478YesYesKP317950ERR381723
10899_40426-Jan-10A10,11313,351YesKP317916ERR381725
10899_411810-Feb-10A14,71312,405YesYesKP317935ERR381726
11864_54329-Apr-10A14,7167,071YesYesKP317936ERR438905
11862_33326-Aug-10A14,7198,961YesYesKP317921ERR438868
11864_53125-Mar-11A14,7356,891YesYesKP317951ERR438904
11862_282813-Apr-11A15,21410,434YesYesKP317920ERR438864
11862_29423-Mar-12A14,95012,922YesYesKP317953ERR438865
11862_321430-Apr-12A7,1976,180KP317947ERR438867
9697_161006-Jul-02B15,0405,419YesYesKP317939ERR303322
9697_10813-Jan-03B9,7906,853YesKP317930ERR303316
10140_14602-Apr-04B12,03412,174YesYesKP317919ERR331021
9697_71022-Dec-04B15,0804,480YesYesKP317925ERR303313
9697_6225-Dec-04B14,9986,523YesYesKP317954ERR303312
9697_5127-Jan-06B15,2343,682YesYesKP317917ERR303311
9465_102327-Feb-09B14,99516,190YesYesKP317938ERR303268
9465_113113-Feb-10B15,00411,722YesYesKP317941ERR303269
9465_122206-Apr-10B15,26014,855YesYesKP317932ERR303270
9465_61709-May-10B15,33313,719YesYesKP317952ERR303264
9465_7301-Feb-11B15,23314,182YesYesKP317927ERR303265
9465_8214-Apr-11B15,32315,367YesYesKP317945ERR303266
9465_9108-Jul-11B15,23714,709YesYesKP317928ERR303267
9465_3814-Jan-12B14,99512,378YesYesKP317946ERR303261
9465_11913-Feb-12B15,23312,994YesYesKP317934ERR303259
9465_41401-Mar-12B15,17914,802YesYesKP317923ERR303262
10911_9123-Mar-12B14,97712,504YesYesKP317929ERR376442
9465_2516-May-12B14,94112,906YesYesKP317922ERR303260

(a) Final sequence length obtained from de novo assembly of short read data (see Materials and Methods).

(b) Coverage calculated by mapping all reads to final assembled contig. Coverage was calculated as the number of mapped reads/(length of the genome fragment/129).

(c) Samples yielding sufficient sequence for G region analysis (Fig. 5).

(d) Samples yielding sufficient sequence for F region analysis (Fig. 5).

(e) The final genome data were deposited in GenBank with the indicated accession numbers.

(f) Short-read data available at European Nucleotide Archive (http://www.ebi.ac.uk/ena).
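The coverage formula in the Table 2 footnote can be written as a short function. This is an illustrative sketch: the function name `coverage` is hypothetical, and the divisor of 129 is taken directly from the footnote as an effective read length.

```python
def coverage(mapped_reads, contig_length, read_length=129):
    """Coverage = mapped reads / (contig length / read length), per the footnote."""
    if contig_length <= 0 or read_length <= 0:
        raise ValueError("contig_length and read_length must be positive")
    return mapped_reads / (contig_length / read_length)
```

For example, 129,000 reads mapped to a 12,900-nt contig correspond to 129,000/(12,900/129) = 1,290-fold coverage.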

FIG 5

Kilifi versus global changes in the G, F, and NS2 proteins. (A) Kilifi compared to global G protein changes. For each group, the G protein sequences were identified as Kilifi or non-Kilifi (global) and aligned, and a consensus amino acid sequence was generated (at 60% level). The first portion shows the positions of O-linked (red) and N-linked (blue) glycosylation sites, the second portion shows general features of the G protein, and the third portion shows total changes (Kilifi plus global) at each position. The fourth portion shows amino acid differences in each G sequence from the consensus. Amino acid changes observed only in Kilifi are marked in red, and changes observed either globally or in the Kilifi are marked in gray. Gaps are not indicated. N-linked and O-linked glycosylation sites were determined using NetNGlyc 1.0 and NetOGlyc 3.1 (46–48). (B) Kilifi versus global F protein changes. Changes in F protein were determined and are depicted as in panel A. Known motifs of the F protein (49) include signal peptide (SP), heptad repeat C (HRC), 27-mer fragment (p27), putative fusion peptide (FP), heptad repeat A (HRA), domains 1 and 2 (Dom1&2), heptad repeat B (HRB), transmembrane domain (TM), and cytoplasm domain (CP). Antigenic sites I, II, and IV (ASI, ASII, and ASIV) are sites of neutralizing antibody binding (40, 50). (C) Kilifi versus global NS2 protein changes. Changes in NS2 protein were determined and are depicted as in panel A. Known motifs of the NS2 protein include the TRAF3-interacting domain (TRAF3-ID) and C-terminal tetrapeptide sequence (DLNP) (43).

RNA extraction, RT, and PCR.

Viral RNA was extracted with the QIAamp extraction kit (Qiagen, United Kingdom) from a starting NPS specimen volume of 140 μl and final elution volume of 60 μl. Reverse transcription (RT) of RNA molecules was performed with the forward primers for each of the six amplicons. A separate RT reaction was performed for each amplicon. Typically, the 20-μl reaction mixture contained 2 μl of sample RNA. A 5-μl aliquot of the resulting cDNA was used in each 25-μl PCR mixture. The PCR mixture was incubated at 98°C for 30 s, followed by 40 cycles of 98°C for 10 s, 53°C for 30 s, and 72°C for 3.0 min and a final extension of 72°C for 10.0 min. Following PCR, aliquots of the products were run on a 0.6% agarose gel to monitor amplification success, and the products from the 6 reactions for each sample were pooled for Illumina library preparation.

Deep sequencing.

Sequencing of the pooled amplicons was performed with Illumina MiSeq. Samples were multiplexed at 15 to 20 per MiSeq run and processed as paired-end reads (2 × 149 nt), generating approximately 1.5 million reads per sample. Raw sequence data were processed with QUASR (29) to remove low-quality reads (median Phred score of <35) and adapter-containing reads, and de novo assembly with SPAdes (30) was performed. RSV contigs were identified by BLASTN analysis, and low-coverage contigs were excluded. Where necessary, partial but overlapping genome contigs were combined using Sequencher (v5.2.4). All final viral genomes were examined for appropriate assembly based on length and the presence of the expected intact RSV open reading frames.
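The median-Phred read filter can be illustrated with a few lines of Python. This sketch does not reproduce QUASR itself: the function names are hypothetical, and the quality strings are assumed to use the Phred+33 ASCII encoding.

```python
from statistics import median


def median_phred(quality_string, offset=33):
    """Median Phred score of one read, assuming Phred+offset ASCII encoding."""
    return median(ord(c) - offset for c in quality_string)


def passes_quality(quality_string, min_median=35, offset=33):
    """Keep a read only if its median Phred score is at least min_median."""
    return median_phred(quality_string, offset) >= min_median
```

Under the Phred+33 assumption, the character `I` encodes a score of 40 and `5` encodes a score of 20, so a read whose quality string is mostly `I` passes and one that is mostly `5` is removed.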

Protein changes.

After sorting by virus group (RSV group A or B), the genomic region under investigation was translated, the protein sequence was aligned using MAFFT (31), and protein differences from the consensus sequence of the group were visualized and quantitated using Python scripts.
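A minimal sketch of the consensus and difference calculation is given below. It is not the authors' script: the function names and the `X` placeholder for positions without a sufficiently common residue are hypothetical, the input is assumed to be equal-length aligned protein sequences (for example, MAFFT output), and the 60% consensus level is taken from the Fig. 5 legend.

```python
from collections import Counter


def consensus(aligned, level=0.6, unknown="X"):
    """Per-column consensus residue, or `unknown` if no residue reaches `level`."""
    cons = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        cons.append(residue if count / len(column) >= level else unknown)
    return "".join(cons)


def differences(seq, cons, gap="-", unknown="X"):
    """List (1-based position, consensus residue, observed residue) differences.

    Gapped positions and positions without a consensus residue are skipped.
    """
    return [
        (i + 1, c, s)
        for i, (c, s) in enumerate(zip(cons, seq))
        if s != c and s != gap and c != unknown
    ]
```

Each tuple returned by `differences` identifies one amino acid change relative to the group consensus, which is the unit counted in the protein-change comparisons.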

Reference data set.

A comprehensive RSV genome data set was generated from the GenBank database using as a starting set all reported RSV genomes. The search was conducted on 28 September 2014 using the search term “txid11250 [Organism]) AND 13500[SLEN]: 17000[SLEN].” Genomes with multiple ambiguous bases, lacking country of detection or date of collection (year), or from patent depositions were excluded. The newly sequenced Kilifi RSV genomes for each group were combined with those from GenBank in the subsequent analysis. Thinned representative reference sets were prepared by using the usearch algorithm (32).

Phylogenetic analyses.

Phylogenetic trees of the genome sequences and selected genomic regions were constructed using the Bayesian methods in the MrBayes program v3.2.1 (http://mrbayes.sourceforge.net/index.php) under the general time reversible model of evolution. RSVA and RSVB were analyzed separately using both the total data set and the thinned data sets. The viruses within the groups were assigned to genotypes based on the clustering pattern of the G ORF portion sequences with reference sequences representative of the previously described RSV genotypes: for RSVA, strains representing GA1-7, SAA1, and ON1, and for RSVB, strains representing GB1-4, SAB1-SAB4, and BA (11, 33–35). Phylogenetic trees were visualized in FigTree v1.4.2.

Evolutionary analyses.

Nucleotide substitution rates and estimates for time to most recent common ancestor (tMRCA) were obtained from the usearch-thinned data sets, using uclust to remove genomes closer than ID 0.99 (32). The rates and tMRCA estimates were calculated in BEAST v1.7.5 (36) both for full genomes and for the individual ORFs.
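The idea of thinning a data set at an identity threshold of 0.99 can be illustrated with a simple greedy procedure. This is a simplified stand-in and not the usearch/uclust algorithm: the function names are hypothetical, the sequences are assumed to be pre-aligned to equal length, and a sequence is dropped when its identity to an already retained sequence is at or above the threshold.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def thin(aligned, threshold=0.99):
    """Greedy thinning of (name, sequence) pairs.

    A sequence is kept only if its identity to every sequence already kept
    is below the threshold.
    """
    kept = []
    for name, seq in aligned:
        if all(identity(seq, kept_seq) < threshold for _, kept_seq in kept):
            kept.append((name, seq))
    return kept
```

The result depends on input order, so in practice the sequences would need to be sorted in some defined way (for example, by length or date) before thinning.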

Nucleotide sequence accession numbers.

The final set of RSV sequences was deposited in GenBank with the following accession numbers: KP317916 to KP317956.

RESULTS

Two sets of reverse transcription and PCR primers were selected from all available RSVA and RSVB genomic sequence data based on frequency, location, and predicted PCR function (see Table 1 for further details). The general pattern of primer sites and the locations of primer targets in RSVA and RSVB genomes are shown in Fig. 1A. Actual PCR results are shown in Fig. 1B for RSVA and RSVB samples, with PCR products of the expected size obtained for all 6 amplicons. These primers were used as part of a deep-sequencing process for RSV combining the full cDNA preparation and genome amplification, deep sequencing with Illumina MiSeq, and de novo assembly (Fig. 1C) to generate 27 complete or nearly complete genomes (11 group A and 16 group B; median length, 14,990 nt; range, 14,666 to 15,232 nt). An additional number of samples yielded RSV contigs of >5,000 nt in length, and these were also retained for further analysis. A summary of the genomic sequences in this study is provided in Table 2.
FIG 1

(A) PCR primer target sites in RSVA and RSVB. The primer target sequences in representative RSVA (left) and RSVB (right) viruses were determined. Circular markers indicate positions of primer target sites in the test genome, color-coded by number of mismatches with the primer; gray bars indicate lengths and positions of the predicted products. (B) Two examples of reverse transcription-PCR function. The DNA products of reverse transcription and PCR amplification of two samples were resolved by agarose gel electrophoresis and visualized by ethidium bromide staining. Sizes of some of the molecular size markers (in base pairs) are indicated to the left of the gel. Lane m, molecular size markers; lanes 1 to 6, individual 2- to 3-kb RSV amplicons 1 to 6, respectively. (C) Flowchart of the RSV sequencing process.

RSV global phylogenetic clustering and placement of Kilifi genomes.

The 27 Kilifi genomes were combined with RSVA and RSVB genomes from 16 countries from specimens collected between the years 1981 and 2013 (see Materials and Methods). The phylogenetic clustering is shown in Fig. 2A (RSVA) and B (RSVB). RSVA forms 3 major clades: GA1 (including strains only from the United States), GA5 (with U.S. and global strains), and a clade with both GA2 and the ON1 viruses with a 72-nucleotide duplication in the G ORF (33), which included nearly all of the new Kilifi RSVA genomes (GA2_ON1). Multiple subclusters showing temporal clustering were detected within each of these clades.
FIG 2

Phylogenetic analysis of the Kilifi RSVA and RSVB genomes. (A) MrBayes tree of representative global RSVA genome sequences together with the 11 novel Kilifi RSVA genome sequences. (B) MrBayes tree of representative global RSVB genome sequences and the 16 novel Kilifi RSVB genome sequences. Trees were inferred using the Bayesian methods in MrBayes (http://mrbayes.sourceforge.net/index.php) under the GTR model of evolution. The numbers next to the branches indicate the posterior probabilities. The Kilifi taxa are indicated in red font. Thinned global reference sets for RSVA and RSVB were prepared from all available RSV genomes by clustering at 0.99 identity using uclust (32). See Materials and Methods for additional details.

Four clades were designated for the RSVB genomes, with BA containing the majority of the Kilifi sequences (Fig. 2B). Clade GB1_GB4 included viruses detected in the United States between 1983 and 1991. Clades SAB1, GB3, and BA included viruses from multiple countries, including the Kilifi RSVB genomes. Similar to that for RSVA, the clustering was more temporal than geographical. Notably, the BA (Buenos Aires) clade viruses are characterized by the presence of a 60-nucleotide duplication within the G ORF. The 4 viruses within clade GB3 (3 from Kilifi and 1 from Germany) lacked the 60-nucleotide duplication. Neither RSVA nor RSVB genomes from Kilifi showed a monophyletic grouping. Instead, the Kilifi genomes were dispersed throughout the observed RSV evolution, clustering with contemporaneous genomes from the other countries. The phylogenetic tree topologies arising from whole-genome and G protein ORF sequences were highly similar (data not shown).

Comparison of genomes of viruses with identical G protein ORFs.

One motivation for developing full-genome methods was to increase the sensitivity for tracking RSV across short-term transmission chains. We asked whether viruses identical in their G gene regions had differences elsewhere in their genomes. All RSV genomes (both from GenBank and from the new data presented here) with identical G regions were identified, and the number of changes outside the G region was determined. All 7 sets of viruses with identical G regions showed between 1 and 9 nucleotide differences across the full genome (Fig. 3). This increased resolution will be important in future studies examining RSV household transmission patterns to identify who acquires infection from whom.
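The comparison of genomes with identical G regions can be sketched as a grouping step followed by a pairwise count. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the genomes are assumed to be aligned to a common coordinate system, and the G region is given as a zero-based half-open interval `[g_start, g_end)`.

```python
from collections import defaultdict


def identical_g_sets(genomes, g_start, g_end):
    """Group genome names whose aligned G region is identical.

    `genomes` maps a name to an aligned sequence; only groups with at
    least two members are returned.
    """
    groups = defaultdict(list)
    for name, seq in genomes.items():
        groups[seq[g_start:g_end]].append(name)
    return [names for names in groups.values() if len(names) > 1]


def differences_outside_g(seq_a, seq_b, g_start, g_end, gap="-"):
    """Count substitutions outside the G region, ignoring gapped positions."""
    return sum(
        1
        for i, (x, y) in enumerate(zip(seq_a, seq_b))
        if not (g_start <= i < g_end) and x != y and gap not in (x, y)
    )
```

A nonzero count for a pair drawn from the same set indicates that the full genome distinguishes viruses that the G region alone cannot.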
FIG 3

Comparison of RSVB genomes with identical G regions. Each panel represents a genome nucleotide alignment of RSVs that had identical G gene sequences. The G protein ORF portions of the genomes are highlighted gray across the panels and were identical. The vertical lines indicate where there are nucleotide substitutions occurring outside the G gene region between the genomes. The blue blocks indicate a gap in the sequence.

Estimation of RSV tMRCA and evolutionary rates.

Previous data on RSV evolution are largely derived from the G protein coding region. The full genomes generated in this study were combined with the GenBank reference data set, and these allowed an estimation of the global nucleotide substitution rates and the time to most recent common ancestor (tMRCA) for all the recently sequenced RSVA and RSVB viruses. These estimates were calculated for the different ORFs and the whole-genome sequences (Fig. 4). The whole genomes provided more precise estimates of the tMRCA than the individual ORF data for the same set of viruses, as shown by the narrower intervals between the lower and upper 95% highest posterior density (HPD) bounds (Fig. 4B).
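For readers unfamiliar with the 95% HPD interval, the following sketch shows one common way to estimate it from a set of posterior samples: take the narrowest window that contains 95% of the sorted samples. The function name `hpd_interval` is hypothetical, and this is a generic illustration, not the calculation performed internally by BEAST.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Number of samples the interval must contain; round() absorbs float noise.
    window = min(n, max(1, math.ceil(round(mass * n, 9))))
    # Slide a window of that size over the sorted samples and keep the narrowest.
    best = min(
        range(n - window + 1),
        key=lambda i: ordered[i + window - 1] - ordered[i],
    )
    return ordered[best], ordered[best + window - 1]
```

A narrower interval returned for whole-genome samples than for single-ORF samples would correspond to the gain in precision described above.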
FIG 4

(A) Estimates of the nucleotide substitution rates for RSVA and RSVB in the individual ORFs and for the whole-genome sequence. (B) Estimates of tMRCA for RSVA and RSVB for the individual ORFs and for the whole-genome sequence. The analysis was undertaken using the usearch-thinned data sets (37 genome sequences for RSVA and 23 sequences for RSVB). The analysis was performed with BEAST (36).

The G protein ORF showed the highest nucleotide substitution rates for both RSVA and RSVB (Fig. 4A). Elevated changes in G and M2-2 were observed previously using RSV full genomes from U.S. and European cohort data (21). Similar to the tMRCA estimates, the whole-genome estimates for the evolutionary rates showed narrower 95% HPD intervals than those from the individual ORFs. The two regions considered for vaccine targets, G and F, show a strikingly wide difference in rate, and this may be important for selecting conserved vaccine targets.

Changes in G and F coding regions, comparing local and global viruses.

An important consideration for vaccine development is how representative a vaccine strain is for locally circulating viruses. The transmission patterns of a virus, the evolutionary rate of the virus, and patterns of human movement can strongly influence how quickly global strains reach a rural location. To address this important issue, the amino acid changes encoded by the RSV coding sequences observed in Kilifi were compared to the amino acid changes observed for all known RSV genomes from other parts of the world (Fig. 5; Table 3).
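The local-versus-global comparison reduces to set operations on amino acid changes. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, each change is assumed to be represented as a hashable item such as a `(position, consensus residue, observed residue)` tuple, and the percentage is assumed to be expressed relative to the number of distinct changes seen in Kilifi.

```python
def distinct_changes(change_lists):
    """Collapse per-sequence change lists so each change is counted once."""
    return {change for changes in change_lists for change in changes}


def kilifi_summary(kilifi_changes, global_changes):
    """Summarize distinct changes overall, in Kilifi, and unique to Kilifi."""
    kilifi = distinct_changes(kilifi_changes)
    other = distinct_changes(global_changes)
    unique = kilifi - other
    percent = 100.0 * len(unique) / len(kilifi) if kilifi else 0.0
    return {
        "all_distinct": len(kilifi | other),
        "kilifi_distinct": len(kilifi),
        "kilifi_unique": len(unique),
        "percent_unique": percent,
    }
```

The three counts correspond to the three numeric columns of a Kilifi-versus-global summary table, with the percentage attached to the last one.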
TABLE 3

Kilifi versus global evolution

Protein  | No. of distinct changes for all Kilifi and global viruses | No. of distinct changes in Kilifi viruses | No. (%) of distinct changes unique to Kilifi virusesᵃ
RSVA G   | 409 | 68 | 7 (11.8)ᵇ
RSVB G   | 299 | 70 | 30 (42.9)
RSVA F   | 200 | 45 | 9 (22.2)ᶜ
RSVB F   | 81  | 19 | 13 (79.3)
RSVA NS2 | 73  | 18 | 3 (16.7)ᵈ
RSVB NS2 | 38  | 16 | 13 (81.3)

ᵃ Number of distinct amino acid changes observed in Kilifi and not in other parts of the world. “Distinct changes” means that the set of changes is reduced to a unique set with multiple occurrences of a change counted only once.

ᵇ The P value for Fisher's exact test was <0.01 for the number of location-specific distinct changes compared to total distinct changes for RSVA versus RSVB.

ᶜ The P value for Fisher's exact test was <0.05 for the number of location-specific distinct changes compared to total distinct changes for RSVA versus RSVB.

ᵈ The P value for Fisher's exact test was <0.05 for the number of location-specific distinct changes compared to total distinct changes for RSVA versus RSVB.
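
The RSVA-versus-RSVB comparisons reported in the footnotes can be approximated from the counts in Table 3. The following is a minimal sketch and not the authors' code. It assumes that each test compares Kilifi-unique against shared distinct changes among the changes seen in Kilifi viruses, and it takes the three count columns of Table 3 as 409, 68, and 7 for RSVA G, and so on. The function name `fisher_exact_two_sided` is illustrative; the two-sided P value is obtained by summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# (unique to Kilifi, distinct changes in Kilifi) for RSVA and RSVB,
# read from Table 3 under the column assumption stated above.
table3 = {
    "G": ((7, 68), (30, 70)),
    "F": ((9, 45), (13, 19)),
    "NS2": ((3, 18), (13, 16)),
}

p_values = {}
for protein, ((ua, na), (ub, nb)) in table3.items():
    p_values[protein] = fisher_exact_two_sided(ua, na - ua, ub, nb - ub)
    print(f"{protein}: P = {p_values[protein]:.2g}")
```

Using the total number of distinct changes (Kilifi plus global) as the denominator instead would give a different table; the footnote wording does not settle which was intended.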

Kilifi versus global changes in the G, F, and NS2 proteins. (A) Kilifi compared to global G protein changes. For each group, the G protein sequences were identified as Kilifi or non-Kilifi (global) and aligned, and a consensus amino acid sequence was generated (at 60% level). The first portion shows the positions of O-linked (red) and N-linked (blue) glycosylation sites, the second portion shows general features of the G protein, and the third portion shows total changes (Kilifi plus global) at each position. The fourth portion shows amino acid differences in each G sequence from the consensus. Amino acid changes observed only in Kilifi are marked in red, and changes observed either globally or in Kilifi are marked in gray. Gaps are not indicated. N-linked and O-linked glycosylation sites were determined using NetNGlyc 1.0 and NetOGlyc 3.1 (46–48). (B) Kilifi versus global F protein changes. Changes in F protein were determined and are depicted as in panel A. Known motifs of the F protein (49) include signal peptide (SP), heptad repeat C (HRC), 27-mer fragment (p27), putative fusion peptide (FP), heptad repeat A (HRA), domains 1 and 2 (Dom1&2), heptad repeat B (HRB), transmembrane domain (TM), and cytoplasm domain (CP). Antigenic sites I, II, and IV (ASI, ASII, and ASIV) are sites of neutralizing antibody binding (40, 50). (C) Kilifi versus global NS2 protein changes. Changes in NS2 protein were determined and are depicted as in panel A. Known motifs of the NS2 protein include the TRAF3-interacting domain (TRAF3-ID) and C-terminal tetrapeptide sequence (DLNP) (43).
A large percentage of the changes observed in the RSVA G protein were also observed globally, with 88% of the changes seen in Kilifi RSVA G also observed in other parts of the world (Table 3). The Kilifi RSVB viruses appeared to have more local evolution, with only 60% of the observed changes in G shared with global viruses. With reference to the F protein, for Kilifi RSVA viruses, 80% of the observed changes were also found globally, while the Kilifi RSVB viruses showed a higher degree of local evolution, with only 20% of the changes observed in Kilifi viruses also seen in other locations. To determine if this local evolution of RSVB was observed at other sites, the sequence data were stratified by other locations (the United States, Argentina, and Peru), but no significant local patterns were observed. This suggests that the isolation of the Kilifi site was more pronounced than for other sites. Alternatively, this may reflect more intense sampling of RSVB within a limited area.

The RSV envelope proteins are heavily glycosylated. More than 50% of the G protein mass can be carbohydrate (37), and the potential O-linked glycosylation sites (serine or threonine) comprise up to 30% of the G protein amino acid sequence (38). Changes toward or away from asparagine can be associated with a change in the overall glycosylation of the protein and could be associated with adaptive change to local immune responses.
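
Because N-linked glycosylation is predicted at Asn-Xaa-Ser/Thr sequons (with Xaa conventionally any residue except proline), the effect of a change toward or away from asparagine can be checked with a simple pattern scan. The sketch below is a simplification and not a substitute for a dedicated predictor such as NetNGlyc. The peptide is a made-up example rather than an RSV sequence, and the helper names `find_sequons` and `apply_change` are hypothetical.

```python
import re

# N-X-[S/T] sequon with X != P. The lookahead makes the match zero-width,
# so overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """Return the 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein)]


def apply_change(protein, change):
    """Apply a substitution written like 'N5D' (old residue, 1-based position, new residue)."""
    ref, pos, new = change[0], int(change[1:-1]), change[-1]
    if protein[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {protein[pos - 1]}")
    return protein[:pos - 1] + new + protein[pos:]


toy_peptide = "PKTTNTTQSE"  # hypothetical peptide containing one NTT sequon
before = find_sequons(toy_peptide)
after = find_sequons(apply_change(toy_peptide, "N5D"))
print("sequons before:", before)
print("sequons after N5D:", after)
```

A change away from asparagine inside a sequon removes the predicted site, while a change elsewhere in the protein leaves the list unchanged.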
The G protein is subject to heavy O-linked glycosylation in the variable regions, with modification frequently on serine or threonine residues in the vicinity of a proline residue (Fig. 5A). Nearly half of the observed changes in the G protein affect S, T, or P residues (RSVA GA2 37/81 and RSVB 100/241). This is apparent when potential N- and O-linked glycosylation sites are marked on the G protein region (Fig. 5) and is also facilitated by the single nucleotide changes that distinguish codons for these three amino acids. In the RSVA G proteins, an N237D polymorphism observed in many of the viruses lies within the site NTT and would remove a predicted N-linked glycosylation site. Tan et al. (22) also noted that the RSVA-GA2 group showed frequent changes in two predicted glycosylation sites (N237D and S242N). Within the RSVB viruses, 3 of 15 amino acid changes involve asparagine, but none of these changes are in predicted N-linked glycosylation sites (Asn-Xaa-Ser/Thr). Changes in N-linked glycosylation areas are known to affect binding of human convalescent-phase sera to peptides (39).

The RSV F protein contains only 10 to 20% of its mass as carbohydrate, and this is attached exclusively via N-glycosidic bonds (37). For the RSVA viruses, 5 of the protein changes are to or away from asparagine. In RSVB, 3 changes involve asparagine; however, none of these are within predicted glycosylation sites. Many polymorphisms were observed in the F protein p27 domain (Fig. 5B). This peptide is likely to serve as a spacer that is freed by cleavage during F maturation and is not found in the mature protein. The large number of changes may simply reflect the disposable nature of this sequence (40).

The NS2 protein may be important in modulating host innate immune responses (41–43) and may influence movement of infected cells (44). NS2 showed an elevated evolutionary rate (Fig. 4), consistent with a protein interacting with polymorphic host target proteins.
Monitoring the local versus global protein changes in NS2 revealed multiple changes occurring in the amino-terminal domain and in a portion of the domain important for TRAF3 interactions (43). The majority of changes in the Kilifi RSVA NS2 proteins were also observed in other parts of the world; however, the RSVB NS2 protein showed a significantly higher degree of variation observed only in the Kilifi viruses (Fig. 5C).
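
The Kilifi-versus-global comparison used throughout this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: it assumes pre-aligned protein sequences of equal length, a single consensus built from Kilifi and global sequences together at the 60% level, and toy sequences invented for the example. The function names are hypothetical.

```python
from collections import Counter


def consensus(seqs, level=0.6):
    """Column-wise consensus; 'X' where no residue reaches `level`."""
    out = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        out.append(residue if count / len(column) >= level else "X")
    return "".join(out)


def distinct_changes(seqs, cons):
    """Set of changes such as 'N2D' relative to the consensus.

    Using a set means a change seen in many sequences is counted once.
    Gaps and undetermined consensus positions are skipped.
    """
    changes = set()
    for seq in seqs:
        for pos, (c, r) in enumerate(zip(cons, seq), start=1):
            if r != c and r != "-" and c not in "X-":
                changes.add(f"{c}{pos}{r}")
    return changes


# Toy alignment (invented): three Kilifi and three global sequences.
kilifi = ["MNTSA", "MDTSP", "MDTSA"]
global_ = ["MNTSA", "MNTSP", "MNTSA"]

cons = consensus(kilifi + global_)
kilifi_changes = distinct_changes(kilifi, cons)
global_changes = distinct_changes(global_, cons)
unique_to_kilifi = kilifi_changes - global_changes
pct_unique = 100 * len(unique_to_kilifi) / len(kilifi_changes)
print(cons, sorted(kilifi_changes), sorted(unique_to_kilifi), f"{pct_unique:.1f}%")
```

The three quantities computed here correspond to the "distinct changes in Kilifi viruses" and "unique to Kilifi" columns of Table 3, with the union of the two sets giving the combined total.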

DISCUSSION

The current work presents a functional approach for community-wide monitoring of RSV whole-genome genetic diversity suitable for detailed transmission studies. A challenge with deep sequencing of large sample sets of RNA viruses is the design of amplification primers. Traditionally, PCR primers were designed using alignments of sequences from the target virus; previous RSV studies with dideoxy sequencing used a greater number of tiled amplicons (2, 24, 25, 38) to cover the whole genome. With larger and more diverse sequence sets, the alignment step becomes problematic. The approach described here bypasses the alignment step and was tailored for deep-sequencing methods. The RSV method uses only 6 amplicons, which reduces the amplification costs and the required amount of input RNA. Although two primer sets were designed for RSVA and RSVB, the two sets can be pooled to simplify processing of samples of unknown RSV subtype. The computational methods used for primer selection facilitate updating of the primer sets as additional RSV genome sequence data become available. Frequent updating of these primer sets will help avoid the sequence bias that could occur with outdated primer sets.

It is also important that the new full genomes reported here were assembled using de novo assembly methods. Although reference-based methods for assembling genomes from short-read data are rapid and less memory intensive, they fail if a close reference genome is not available. The method presented here determines virus genomic sequences directly from patient material and shows sensitivity similar to that of traditional sequencing methods, but it avoids the potential virus selection that may occur if samples are first passaged through cell culture.

The 27 novel Kilifi RSV genomes (11 RSVA and 16 RSVB) generated in this study were used to assess local versus global RSV variety.
Similar to the patterns previously observed with the G ORF, the full genomic phylogenetic analysis confirmed that Kilifi genomes were interspersed with genomes from other countries, with rapid appearance of variants in Kilifi soon after they are first observed in other parts of the world (45). Kilifi RSV strains are similar to strains that circulate in other regions of the world and reveal only limited local evolution. Phylogenetic clustering appeared to be more influenced by time of virus sample collection than by geographical location, suggesting a fairly rapid global spread of novel RSV variants. It should also be noted that the similarity of the overall topology of phylogenetic trees from whole genomes and G sequences is encouraging: although full-genome sequences are most useful for detailed transmission studies, the relationships determined with the G region are similar to the patterns observed with the full-genome sequences.

The availability of full genomes allowed a comparison of estimates of the tMRCA of the Kilifi RSV strains. The obtained tMRCAs were broadly similar, although the higher evolutionary rates of the G region lead, as expected, to slightly later tMRCAs. The estimates based on the entire genomes lead to earlier dates and narrower confidence intervals than estimates from specific genomic regions. Similar observations were made by Tan et al. (22).

Our comparison of genomes determined to be identical in the G region found nucleotide substitutions elsewhere in their genomes. The genomes with identical G regions invariably were from the same geographical region and from the same epidemic, with the interval between sample collection dates ranging from a few days to months. This observation suggests that nucleotide substitution in the RSV genome in the short term is random, i.e., not concentrated in the regions that appear the most variable in the long term, and supports the use of whole-genome sequencing for monitoring viral transmission chains.
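
The comparison of genomes with identical G regions can be illustrated as follows. This is a minimal sketch under assumptions: genomes are taken to be aligned nucleotide strings of equal length, the G region is given by hypothetical 0-based, end-exclusive alignment coordinates, and the sequences are toy data rather than RSV genomes. Gap handling is deliberately omitted.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def identical_g_pairs(genomes, g_start, g_end):
    """Find genome pairs with identical G regions and count differences elsewhere.

    Returns a list of (name1, name2, n_differences_outside_G).
    """
    results = []
    for (n1, s1), (n2, s2) in combinations(sorted(genomes.items()), 2):
        if s1[g_start:g_end] == s2[g_start:g_end]:
            rest1 = s1[:g_start] + s1[g_end:]
            rest2 = s2[:g_start] + s2[g_end:]
            results.append((n1, n2, hamming(rest1, rest2)))
    return results


# Toy aligned "genomes" (invented); positions 4..7 stand in for the G region.
toy_genomes = {
    "a": "AAAACCCCGGGG",
    "b": "AAATCCCCGGGA",
    "c": "AAAACCTCGGGG",
}
pairs = identical_g_pairs(toy_genomes, 4, 8)
print(pairs)
```

Pairs reported with a nonzero count are indistinguishable by G sequencing alone but separable with the whole genome, which is the property that makes full genomes useful for resolving transmission chains.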
The observed sites of change in the G and F proteins were frequently in exposed regions of the proteins; several involved glycosylation site changes suggestive of immune evasion. In addition, similar to previous reports, the NS2 (Fig. 5C) and M2-2 protein coding regions (not shown) were observed to change at rates higher than that for the full genome. Although these changes could simply be the allowed changes of unconstrained proteins, it is also possible that these sites are important for interacting with the host and may be under some pressure to change. Unfortunately, the sequence data set generated in this study was too small to provide statistically supported evidence of positive selection, but future studies with larger data sets will be facilitated by these methods.

The availability of a collection of RSV genome sequences from a single African location allowed a comparison of local versus global RSV evolution patterns. Important for vaccine design, the RSVA variants observed in a small region of Kenya appear to be in equilibrium with global variants. The same was not observed for RSVB. RSVB variants may spread less efficiently, with a higher fraction of variants observed to be specific to Kilifi and not detected in other parts of the world. This pattern is consistent with RSVB being a less transmissible infection than RSVA (4, 12). However, there are fewer global sequences available for RSVB, so while the Kilifi RSVB variants appear to be unique, this could be a consequence of less surveillance and documentation of RSVB variation globally. Future work will help clarify this phenomenon, as it may have strong consequences for the efficacy of any RSV vaccine used locally.
  50 in total

1.  Circulation patterns of group A and B human respiratory syncytial virus genotypes in 5 communities in North America.

Authors:  T C Peret; C B Hall; G W Hammond; P A Piedra; G A Storch; W M Sullender; C Tsou; L J Anderson
Journal:  J Infect Dis       Date:  2000-05-22       Impact factor: 5.226

2.  Viral etiology of severe pneumonia among Kenyan infants and children.

Authors:  James A Berkley; Patrick Munywoki; Mwanajuma Ngama; Sidi Kazungu; John Abwao; Anne Bett; Ria Lassauniére; Tina Kresfelder; Patricia A Cane; Marietjie Venter; J Anthony G Scott; D James Nokes
Journal:  JAMA       Date:  2010-05-26       Impact factor: 56.272

3.  Natural history of human respiratory syncytial virus inferred from phylogenetic analysis of the attachment (G) glycoprotein with a 60-nucleotide duplication.

Authors:  Alfonsina Trento; Mariana Viegas; Mónica Galiano; Cristina Videla; Guadalupe Carballal; Alicia S Mistchenko; José A Melero
Journal:  J Virol       Date:  2006-01       Impact factor: 5.103

Review 4.  Viral and host factors in human respiratory syncytial virus pathogenesis.

Authors:  Peter L Collins; Barney S Graham
Journal:  J Virol       Date:  2007-10-10       Impact factor: 5.103

Review 5.  Respiratory syncytial virus genetic and antigenic diversity.

Authors:  W M Sullender
Journal:  Clin Microbiol Rev       Date:  2000-01       Impact factor: 26.132

6.  Structure of respiratory syncytial virus fusion glycoprotein in the postfusion conformation reveals preservation of neutralizing epitopes.

Authors:  Jason S McLellan; Yongping Yang; Barney S Graham; Peter D Kwong
Journal:  J Virol       Date:  2011-05-25       Impact factor: 5.103

7.  Multiple functional domains and complexes of the two nonstructural proteins of human respiratory syncytial virus contribute to interferon suppression and cellular location.

Authors:  Samer Swedan; Joel Andrews; Tanmay Majumdar; Alla Musiyenko; Sailen Barik
Journal:  J Virol       Date:  2011-07-27       Impact factor: 5.103

8.  Effects of nonstructural proteins NS1 and NS2 of human respiratory syncytial virus on interferon regulatory factor 3, NF-kappaB, and proinflammatory cytokines.

Authors:  Kirsten M Spann; Kim C Tran; Peter L Collins
Journal:  J Virol       Date:  2005-05       Impact factor: 5.103

9.  Structural basis for immunization with postfusion respiratory syncytial virus fusion F glycoprotein (RSV F) to elicit high neutralizing antibody titers.

Authors:  Kurt A Swanson; Ethan C Settembre; Christine A Shaw; Antu K Dey; Rino Rappuoli; Christian W Mandl; Philip R Dormitzer; Andrea Carfi
Journal:  Proc Natl Acad Sci U S A       Date:  2011-05-17       Impact factor: 11.205

10.  Whole genome characterization of non-tissue culture adapted HRSV strains in severely infected children.

Authors:  Rajni Kumaria; Laxmi Ravi Iyer; Martin L Hibberd; Eric A F Simões; Richard J Sugrue
Journal:  Virol J       Date:  2011-07-28       Impact factor: 4.099
