Literature DB >> 26790002

Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants.

Monica K Borucki1, Victoria Lao1, Mona Hwang1, Shea Gardner1, Danielle Adney2, Vincent Munster3, Richard Bowen2, Jonathan E Allen1.   

Abstract

Middle East respiratory syndrome coronavirus (MERS-CoV) is an emerging human pathogen related to SARS virus. In vitro studies indicate this virus may have a broad host range suggesting an increased pandemic potential. Genetic and epidemiological evidence indicate camels serve as a reservoir for MERS virus but the mechanism of cross species transmission is unclear and many questions remain regarding the susceptibility of humans to infection. Deep sequencing data was obtained from the nasal samples of three camels that had been experimentally infected with a human MERS-CoV isolate. A majority of the genome was covered and average coverage was greater than 12,000x depth. Although only 5 mutations were detected in the consensus sequences, 473 intrahost single nucleotide variants were identified. Many of these variants were present at high frequencies and could potentially influence viral phenotype and the sensitivity of detection assays that target these regions for primer or probe binding.

Entities:  

Mesh:

Year:  2016        PMID: 26790002      PMCID: PMC4720378          DOI: 10.1371/journal.pone.0146251

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Middle East respiratory syndrome coronavirus (MERS-CoV) is an emergent beta coronavirus related to SARS virus, and is capable of causing severe respiratory symptoms in humans. Since the discovery of MERS-CoV in 2012, MERS-CoV infections have been detected in 23 countries with local transmission occurring in over half of the affected countries [1]. Although the mortality rate is high (about 40%), subclinical infections may be more prevalent than severe disease, which occurs mainly in elderly individuals and those with pre-existing conditions. Serologic, genetic, epidemiological, and animal infection studies indicate that dromedary camels (DC; Camelus dromedarius) serve as a reservoir for human infection [2-4]. Seropositive assay results have been obtained from DC samples collected throughout the Arabian Peninsula and parts of Africa, including regions where no human cases have been detected [5], however, assay results may be impacted by the presence of cross reactive coronavirus antibodies [1]. Experimental infection of DCs resulted in a self-limiting upper respiratory infection with viral shedding occurring primarily through nasal secretions [2]. MERS-CoV is a member of Betacoronavirus genus lineage 2c [6] and has a genome of positive-sense RNA 30,119 nt in length [7,8]. The first two thirds of the genome codes for a replicase polyprotein, ORF1ab, which contains multiple nonstructural proteins nested within the two open reading frames (ORF1a and ORF1b). The 3’ end of the genome codes for structural proteins: spike glycoprotein, envelope, membrane, and nucleoprotein, and five nonstructural proteins, ORFs 3, 4a, 4b, 5, and 8b [7]. Illumina sequencing of the MERS genome has indicated the presence of high frequency variants in intrahost viral populations [9-11]. To characterize the genetic variants present in nasal swabs from camels experimentally infected with a low passage human MERS-CoV isolate, we amplified a majority of the MERS genome using multiplexed RT-PCR assays and sequenced the products at high depths of coverage. The data generated from this analysis indicate that high frequency intrahost single nucleotide variants (iSNVs) occur throughout the genome and suggest that variant genotypes are present at high enough frequency to be readily transmitted and potentially to impact viral phenotype.

Materials and Methods

Virus

MERS-infected nasal samples were derived from animal work which was previously approved by the Institutional Animal Care and Use Committee of Colorado State University and was performed in compliance with recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institute of Health with every effort to minimize animal suffering [2]. Nasal swabs were collected on days 1, 3 and 5 post-infection from three camels experimentally infected at Colorado State University with MERS-CoV strain HCoV-EMC/2012, as described in Adney et al., 2014 [2]. The viral stock used for the infections was provided with the permission of the Department of Viroscience, Erasmus Medical Center, Rotterdam, The Netherlands and had been propagated in Vero E6 cells. Vero passage 7 of HCoV-EMC/2012 (“Seed”) was used for the camel infections and served as the pre-in vivo passage tissue culture control. No additional cell culture passages were performed as a control for the subsequent passage of the virus from Vero cells into camels. MERS-CoV was obtained from BEI Resources (EMC/2012, NR-44260) for generation of viral cDNA for testing of PCR primers prior to the arrival of nasal samples and seed stock. Note, the number assigned to each camel in this study differ from those assigned to the same camels in Adney et al. [2] (work performed at Colorado State University (CSU)). Camel 1 from this study corresponds to CSU Camel 2, Camel 2 corresponds to CSU Camel 3, and Camel 3 corresponds to CSU Camel 1.

Primer design

Primers were designed that spanned a majority of the genome yielding RT-PCR products of 2–3 kb which overlapped by approximately 100 nt (S1 Table, S1 Fig). With the exception of the first primer set, each region was covered by two primer sets. The primers were designed to perform as two highly multiplexed reactions as described in Gardner et al., 2014 [12].

RNA extraction and RT-PCR

Total RNA was extracted using TRIzol® LS Reagent (Invitrogen) following the manufacturer’s protocol. Reverse transcription was performed using random hexamers and the Superscript III RT reverse transcriptase kit (Invitrogen). Viral cDNA templates were amplified using Q5® Hot Start High-Fidelity 2X Master Mix (New England BioLabs, MA), following manufacturer's instructions. PCR conditions consisted of 98°C for 30 s, followed by 35 cycles of 98°C for 10 s, 60°C for 20 s, and 72°C for 1 min. The final cycle was 72°C for 2 min. PCR products were prepared for sequencing using the QIAquick PCR Purification kit (Qiagen, CA).

Illumina sequencing

Samples were sequenced using 150 paired end reads in multiplex (4 lanes) using the Illumina HiSeq2500's Rapid Run Mode at the Vincent J. Coates Genomics Sequencing Laboratory at the University of California, Berkeley. A plasmid control was used to determine the error rate of the PCR and Illumina sequencing as described previously [13].

Computational analysis

The methods used for analysis of the Illumina data is described in detail in our previous publication [13]. The open source read mapping software SHRiMP2, which was shown to have high read mapping sensitivity was chosen for the tool’s ability to map as many reads as possible in the face of individual errors within each read [14,15]. Variants showing up in 2 or more samples and for which there was at least 100 overlapping read pairs (ORP) were detected in each sample, and were detected at 473 positions in the genome; further analysis was prioritized for these sites. All primers were screened against the reads to trim primer sequences at the ends of reads using fastq-mcf [16] and default parameters. Genomic regions overlapping primer regions were examined separately to consider the potential for un-trimmed primer errors to remain in reads mapped to the reference genome. Consensus sequence that yielded the highest quality data for each camel was submitted to GenBank along with the Seed consensus sequence. The Illumina reads were submitted to the GenBank Sequence Read Archive (BioProject number PRJNA302555).

Results and Discussion

Genome amplification

The PCR primers designed to span the MERS genome coding regions were tested individually using cDNA derived from the MERS-CoV stock obtained from BEI Resources (EMC/2012, NR-44260). All primer pairs yielded product of correct size (S1 Fig). Primers were then tested in staggered sets divided in two multiplex reactions and the presence of each primer set product was accessed using nested PCR of primer overlap regions as described in Gardner et al., 2014 (S1 Fig) [12]. Discrepancies in overlap product sizes were noted in some primer sets covering the last 4000 nts of the genome. These products were sequenced via Sanger sequencing and this confirmed that the primers bound to and amplified the correct area of the genome, however in some cases the amplified sequence terminated unexpectedly resulting in a truncated sequence output. A third multiplex reaction was generated that contained primers to span this region and in some cases this enabled a complete amplification of this region (Fig 1).
Fig 1

Coverage Plots.

UDS coverage depth across the genome was plotted for each sample. The X axis shows the genome position and the Y axis shows the depth coverage (log scale).

Coverage Plots.

UDS coverage depth across the genome was plotted for each sample. The X axis shows the genome position and the Y axis shows the depth coverage (log scale).

Data quality and coverage

S2 Table shows coverage for each iSNV site for which variation was detected in two or more samples. The coverage and read depth varied between time points and camels. Camel 2 day 3 (C2D3) provided the most sequence data with an average read depth of 15,378 x, and 28,488 nt sites covered by at least 100 ORP. C1D5 provided the most complete data for Camel 1 with an average read depth of 15,123 and 28,057 nt sites covered. All time points collected from Camel 3 yielded fairly complete data with C3D1 providing the highest average coverage at 12,585 x depth and with 28,393 nt sites covered. Average coverage for the Seed stock was 29,454 x and 28,075 nt positions were covered. The sequence analysis conducted in this study could not explicitly differentiate between genomes derived from packaged or unpackaged sources. Thus some of the detected mutations could originate from viral genomes that are unable to replicate and persist in isolation. To reduce the emphasis of mutations associated with limited functional relevance, the analysis focused on iSNVs found in high abundance and/or present in two or more samples, and because multiple time points (days 1, 3, and 5) were analyzed for each camel, our analyses favored genomes that persisted during the infection. The expectation is that even when a viral genome is unable to replicate independent of the surrounding population, its presence at sufficient abundance could imply a functional role in the evolution of viral quasispecies. For example, it has previously been shown that defective genomes may function as “defective interfering particles” and may impact viral infection dynamics such as viral persistence [17-19]. Nevertheless, it is important to consider that mutations that are observed at ultra-low abundance and occur rarely could represent viruses with limited functional impact.

Consensus sequence changes

Despite the presence of many high frequency iSNVs, change in the consensus sequence was rare. Two different iSNVs were detected at nt 6172 of nsp3 resulting in two different amino acids, Phe and Ser, at residue 1965. The Seed virus had a C nt present in 14% of the reads with T as consensus coding for Phe in 86% of reads. Six of the 9 camel samples had a T iSNV in 5–42% of reads and a C as consensus (S2 Table, Fig 2). The residue change found in the camel samples, F1965S, and was particularly prevalent in each of the two samples from Camel 1 (C1) for which there was coverage at this site; C1 day 1 (C1D1) had 42% and C1D5 had 19%. The percent of reads with the T iSNV declined between day 1 and 5 for all samples (C1- 42% to 19% (day 1 to day 5), C2- 6%, 6%, and 4.5%, days 1–3 respectively, and C3 had 8% day 1 with none detected on day 5 (coverage >1000x, S2 Table).
Fig 2

Histogram of high frequency iSNVs resulting in a nonsynonymous change in the consensus sequence of one or more samples.

The iSNV detected at site t6172, caused a Phe to Ser mutation in nsp 3 residue 1965 and resulted in a consensus change that differentiated the camel genotypes from that of the Seed. An iSNV, g24538, was present at low frequency in the Seed stock but increased in frequency during propagation in camels with a change in consensus sequence occurring in C2D5, resulting in an N1028S mutation. Note, only data from samples with adequate coverage at the relevant iSNV position were included in the histogram.

Histogram of high frequency iSNVs resulting in a nonsynonymous change in the consensus sequence of one or more samples.

The iSNV detected at site t6172, caused a Phe to Ser mutation in nsp 3 residue 1965 and resulted in a consensus change that differentiated the camel genotypes from that of the Seed. An iSNV, g24538, was present at low frequency in the Seed stock but increased in frequency during propagation in camels with a change in consensus sequence occurring in C2D5, resulting in an N1028S mutation. Note, only data from samples with adequate coverage at the relevant iSNV position were included in the histogram. Synonymous consensus changes were detected; a8396c occurred in the C2D5 sample (80.5% of reads), and t24059c occurred in the spike protein ORF and this change was associated with high frequency variants in many of the samples. The consensus in the Seed sample was a T which was present in 88% of reads and a C present as a variant in 12% of reads. All camel samples had C as consensus at nt 24059. Five of the 9 camel samples yielded sufficient data to detect variant reads, and the presence of a T iSNV was detected in 4 out of 5 samples at frequencies ranging from 3% to 36%. The apparent selection for this nt change in most of the camel samples is interesting given that it should not impact amino acid sequence or phenotype. A change in consensus sequence occurred at nt 24499 and resulted in an amino acid change, N1015T. The A to C transversion was evident in the deep sequence variant genotypes as well. Seed sequence data had the consensus nt, A, present at 88.5% and a C present at 11.5%. Seven of the 9 camel sequences had coverage at this site and the consensus for these sequences was C, however 2 of the 7 sequences had an A present as a variant at 17.6 and 3.0% for C1D5 and C3D1 respectively. The N1015T mutation detected may be a response to the change from cell culture to an in vivo environment. Scobey et al. recently demonstrated that a T1015N mutation resulted from serial tissue culture passage and increased in vitro fitness and plaque size [20], thus it seems likely that the N residue has reduced fitness in vivo thus resulting in the incremental decrease in frequency seen in the sequence data from these 2 camels. A consensus change at nt 24538 was detected in C2D5 reads resulting in an N1028S mutation (Table 1, Fig 2). This position had relatively low coverage, 344x depth, and the iSNV (a24538g) was detected at 70%. High frequency iSNVs were also detected at residue nt 24538 with frequencies between 4–30% in all camel samples that have >350x coverage, and at 3% in the Seed stock. This site is one of the several HR1 residues that form hydrogen bonds with HR2 and are part of the hydrophilic interactions between the domains that stabilize binding of the domains [21].
Table 1

Nonsynonymous mutations occurring at >1% at sites with at least 100 X ORP coverage.

Nt pos.ResidueData typeSEEDC1D1C1D3C1D5C2D1C2D3C2D5C3D1C3D3C3D5ORF
2169I631LConsensusA0.985A1.000A0.926A0.938A0.928A0.894A0.931A0.964A0.877A0.962nsp2
iSNVC0.015NAC0.074C0.062C0.072C0.106C0.069C0.036C0.123C0.038
3475K1066TConsensusA1.000A1.000A1.000A1.000A1.000A1.000A0.992A1.000A1.000A0.804nsp3
iSNV0NANANA0NAC0.009NA0C0.197
6161L1961VConsensusG1.000G0.9930G1.000G1.000G1.000G1.000G1.000G1.000G0.892
iSNVNAT0.008NANANANANANANAC0.108
6172F1965SConsensusT0.858C0.5830C0.812C0.938C0.939C0.955C0.921C1.000C1.000
iSNVC0.142T0.417NAT0.188T0.062T0.061T0.045T0.079NANA
8050D2591AConsensusA0.941A1.000A1.000A1.000A0.622A0.998A0.994A0.912A0.923A0.922
iSNVC0.060NANANAC0.378C0.002C0.006C0.088C0.077C0.078
8614M2779TConsensusT1.00000T0.942T0.84500T0.951T1.000T0.850nsp4
iSNV0NANAC0.058C0.155NANAC0.050NAC0.150
9798D3174HConsensusG0.99800G1.000G0.89600G1.000G1.000G1.000
iSNVA0.002NANANAC0.104NANANANANA
13234E4319VConsensusA1.00000A1.000A1.00000A0.855A1.000A0.940nsp10
iSNV0NANA0NANANAT0.145NAT0.060
23150E565DConsensusG1.000G1.0000G1.000G0.884G0.948G0.877G0.814G0.875G0.931S gene
iSNV0NANA0T0.116T0.052T0.123T0.186T0.125T0.069
24474A1007TConsensusG1.00000G0.986G0.721G1.000G0.969G0.820G0.821G0.889
iSNVNANANAA0.014A0.279NAA0.031A0.180A0.179A0.111
24499N1015TConsensusA0.88600C0.824C0.959C1.000C1.000C0.970C1.000C1.000
iSNVC0.115NANAA0.176A0.041NANAA0.030NANA
24502N1016SConsensusA0.98600A0.967A1.000A1.000A1.000A0.956A1.000A0.978
iSNVG0.014NANAG0.033NANANAG0.044NAG0.022
24538N1028SConsensusA0.96600A0.710A0.803A1.000G0.698A0.944A0.845A0.956
iSNVG0.034NANAG0.290G0.197NAA0.302G0.056G0.155G0.044
27162Y86*Consensus00000G0.878G0.944000orf5
iSNVNANANANANAA0.122A0.056NANANA
28464R204SConsensusG1.000G0.638G0.865G0.995G0.998G0.999G0.999G0.999G0.997G0.980M gene
iSNVNAT0.362T0.100T0.005T0.001T0.001T0.001T0.001T0.002T0.020
28466S2051ConsensusG0.999G1.000G0.920G0.996G0.997G0.999G0.999G0.998G0.997G0.965
iSNVNANAT0.080T0.004T0.002T0.001T0.001T0.002T0.002T0.025
28587R8CConsensusC0.993C0.996C0.993C0.990C0.996C0.950C0.996C1.000C1.000C1.000N gene
iSNVT0.007A0.004T0.007T0.010T0.004T0.050T0.004NANANA
28778L16PConsensusT1.000T1.000T1.000T1.000T0.992T1.000T0.981T0.982T0.996T0.995
iSNV0000C0.0080C0.020C0.018C0.004C0.005
29734G390VConsensusG1.000G1.000G0.571G0.997G0.935G0.990G0.990G1.000G1.000G0.943
iSNVNANAT0.429C0.003T0.065A0.010A0.005NANAT0.057
29737S391IConsensusG0.999G1.000G1.000G0.928G0.906G0.924G1.000G0.932G1.000G0.904
iSNVC0.001NANAT0.072T0.094T0.076NAT0.063NAT0.089
29747Q394HConsensusG0.9920G1.000G0.985G0.936G0.800G1.000G0.992G0.876G0.911
iSNVA0.008NANANAT0.064T0.190NAA0.008T0.124T0.073
29748R395CConsensusC0.9500C1.000C0.994C0.973C1.000C1.000C0.940C0.955C0.998
iSNVT0.050NANAA0.006T0.027NANAT0.058T0.045G0.002
29782P406LConsensusC1.000000C1.000C1.000C1.000T1.000C1.000C0.787
iSNV0NANANANANA00NAT0.213
29783P406LConsensusA1.000000A1.000A1.000A1.000G1.000A1.000A0.787
iSNV0NANANANANA00NAG0.213
29784M407IConsensusA1.000000A1.000A1.000A1.000T1.000A1.000A0.787
iSNV0NANANANANA00NAT0.213
29788I408SConsensusT1.000000T1.000T1.000T1.000T1.000T1.000T0.858
iSNVNANANANANANANANANAG0.142
29794V410SConsensus00.0000.000T1.000T1.000T1.000T1.000T1.000T0.917T1.000
29794iSNVNANANANANANANANAC0.083NA
A consensus change was detected at nt 29849 for sample C3D1 which had an A present in 66.1% of reads and a G in the remaining 33.9%. All other samples had a G at this site with no variants detected in the 6 samples that had reads covering this site. Nt 29849 is located in the 3’ end of the genome outside of coding regions. This mutation occurs at a nt site within reverse primer NSeq-Rev which is used for PCR amplification and sequencing of the N protein, as described by Corman et al., 2012 [22]. A gap in deep sequence data occurred in the genome region of 26600 to 28300 for samples obtained from Camels 1 and 3, and samples from Camel 2 had relatively low coverage in this region (less than 50x depth) (Fig 1). The reason for this was unclear in that no mutations were detected in the primer regions as determined by sequence data from reads that spanned the primer sites. iSNVs were present and may have impacted sensitivity but no trend was observed between iSNV frequency in primer regions and coverage of in corresponding amplicon regions. The primer set that appears to have failed, primer set 14, had previously yielded product when tested individually (S1 Fig). Lack of extensive deep sequence data for this region is unfortunate given that upE PCR primers designed for MERS CoV detection target this region [19].

Subconsensus (iSNV) sequence changes

Illumina data obtained from overlapping read pairs (ORPs) was used to define the subconsensus viral populations present in each sample. iSNVs were detected in 473 non-primer positions of the genome at various frequencies. The iSNVs analyzed were limited to those present in two or more samples and for which there was at least 100 ORPs. Sequencing coverage is a key factor in iSNV detection sensitivity since deeper coverage generally increases the number of ultra-rare variants detected. When comparing iSNVs across samples, for each genome position, the least frequent variant detected across all samples sets the maximum sensitivity threshold. Samples were checked for their coverage and a binomial model (p<0.01) was used to determine whether a rare variant should be detected in a sample with the available coverage. When coverage is found to be too low, it implies that the lack of detecting a rare variant cannot be used to infer the lack of a true rare variant present and the position is marked with a “NA” in S1 Table.

ORF1ab

The Seed stock had iSNVs present at nts 307–313, part of the nsp1 ORF (residues 9–11). All iSNVs were detected at 4% indicating there is a second genotype present at 4% (Table 1). Six of the 8 camel samples had high coverage at these sites thus it appears that these iSNVs do not survive when transmitted to camels. Two nonsynonymous iSNVs were detected in the nsp2 ORF, at nt 2169 residue 631 Ile was replaced by Leu which was present at 2% in Seed and between 4–12% in camel samples (Table 1, Fig 3). At residue 658 (nt 2251) Cys is replaced by Ser in 11% of Seed sequences, whereas the camel samples have this iSNV present 1% or less despite ample coverage. Interestingly, an I631L mutation was noted in strain MERS-CoV Hu/England-N1/2012 [23], a strain that replicated more slowly in Calu-3 cells compared to EMC/2012 (JX869059) strain [24].
Fig 3

Heat maps of nonsynonymous iSNV frequencies and nt positions.

Heat maps for each camel (A-C) show the frequencies and positions of iSNVs for each camel and time point, and for Seed inoculum. The number (count) of iSNV for each frequency value is shown along with the heat map color key is shown above each camel heatmap.

Heat maps of nonsynonymous iSNV frequencies and nt positions.

Heat maps for each camel (A-C) show the frequencies and positions of iSNVs for each camel and time point, and for Seed inoculum. The number (count) of iSNV for each frequency value is shown along with the heat map color key is shown above each camel heatmap. All nonsynonymous iSNVs detected in the nsp3 ORF occurred outside of the papain-like protease (PLpro) domain (residues 1476–1804). A high frequency iSNV was detected in 20% of the reads from C3D5 at nt 3475 resulting in a change in nsp3 residue 1066 from Lys to Thr. This iSNV was also detected in C2D5 at 1%, but not in the Seed virus. Coverage in all samples ranged from 3312x (C1D1) to 26,316x (Seed). Sample C3D5 had adjacent iSNVs in nsp3, both detected at 11% at nts 6159 and 6161 causing a nonsynonymous change, L1961V. The final nonsynonymous iSNV detected in nsp3 occurred at nt 8050, residue D2591A; present in sample C2D1 at 38%, C3D3 and C3D5 at 8%, and Seed at 6%. The nsp4 ORF had two nonsynonymous iSNVs present in a subset of the camel samples. The first iSNV, nt 8614, resulted in an amino acid change, M2779T, and was present in 4 camel samples at 6–16%. One or more samples from each camel had this iSNV but it was not detected in the Seed sequence. Sequence data was available for 5 of the 9 camel samples, and 4 of the 5 camels had coverage >1000x; the fifth camel sample had coverage of 841x and the Seed sample had coverage of 4289x (S2 Table). A Blastp search showed residue M2779 as conserved in MERS sequences. Additionally, a nonsynonymous iSNV was detected at nt 9798, D3174H, at 10% for C2D1 only. Coverage was similar to that of nt 8614 and a Blastp showed this residue to be conserved. One nonsynonymous iSNVs was detected in the nsp10 ORF at nt 13234. Amino acid change in the variant genotype, E4319V, was detected in C3D1 and C3D5, at 15% and 6% respectively, although analysis was limited to 5 camel samples due to limited coverage at this site. nsp10 tends to be well conserved among coronaviruses and the mutation occurs in a region which could potentially interfere with viral replication via nsp10-nsp14 interactions [25]. nsp14 had an iSNV detected at nt 19468, resulting in amino acid change R2012S in 22% of reads for Seed only. Six of the 9 camel samples had coverage >1000x indicating that this iSNV was selected against once introduced to the camel host environment. Although nsp14 5’end encodes a 3′-to-5′ exoribonuclease (ExoN), this iSNV occurred in a domain within nsp14 encoding a (guanine-N7)-methyltransferase [26,27].

Spike gene

Two consensus changes were detected in the spike ORF, nt t24059c and nt a24499c, which result in a nonsynonymous mutation of N1015T. High frequency iSNVs were detected at both of these sites in addition to other iSNVs which were not associated with a consensus change. Two regions of the spike protein had high frequency nonsynonymous iSNVs present- the RBD and the HR1 domain of the S2 subunit (residues 998–1039) which has a role in fusion [21]. The iSNV which occurred in residue 565 of RBD occurred in a region identified as the patch 2 region (RBM; amino acids 484–567) that interfaces with the host receptor hDPP4 [21]. This iSNV, nt t23150, residue E565D, was present in 3/3 samples from Camels 2 and 3, at a frequency of 5–20% (Table 1) but was not detected in the Seed sample. This residue however, has not been shown to be associated with any defined escape mutations [28]. High frequency nonsynonymous iSNVs were detected at 3 sites in the HR1 domain- A1007T, N1015T, and N1028S, and lower frequency iSNVs (<5%) were detected at residue 1016. High frequency iSNVs coding for a nonsynonymous mutation, A1007T, in the HR1 domain were detected at nt 24474. C2D1 had an adenine present at 28%, and C3D1 and D3 had this iSNV at 18%, and D3 at 11%. Two of the three camel samples with no iSNV detected had relatively low coverage (71x and 387x; S2 Table). There were no iSNVs detected at this site in the Seed data although coverage at this site was fairly good (4313x). Nts 24499 and 24502 had iSNVs present that effected adjacent residues (residues N1015T, N1016S). Four of the 9 camel samples had >1000x coverage at these sites, and two of these samples had iSNVs. C1D5 had iSNVs at both sites at a frequency of 18% for C24499A and 3% for A24502G. C3D1 had iSNVs present at 3% and 4%, and the Seed had 12% and 1% for these sites, respectively.

ORF5

An iSNV resulting in a stop codon at residue 109 was detected in 12% of reads at nt 27162 for camel C2D5, the only sample for which there was adequate coverage at this site (>100x, S2 Table). This would truncate the protein roughly in half. This iSNV was also detected in 45% of reads for this strain of MERS in a previous study and maybe an artifact of passage in cell culture [7].

M gene

A high frequency iSNV was detected in the M protein ORF, nt 28464, resulting in amino acid change R204S. In particular, Camel 1 had the mutation detected in 36% of reads on day 1, 10% of reads on day 2 and 1% on day 3. Sample C3D5 had the mutation in 2% of reads. Adjacent residue, 205, was also variable with Ser mutating to Ile in 8% of reads for C1D3 and 3% of reads for C3D5. Given the proximity of the two reads one would expect the mutations to be linked, thus the lack of variation for C1D1 residue 205 was surprising. Sequence coverage was excellent for all samples at this site and no variation was detected in the Seed data. These mutations occur in a region that may impact the interaction of the M protein with the N protein [29].

N gene

All nonsynonymous changes occurred in the intrinsically disordered regions (IDR) of the nucleoprotein ORF. IDRs have been shown to modulate RNA binding and oligomerization of the N protein [30-32], display exceptional genetic plasticity [33], and may play a role in cross-species transmission of zoonotic coronaviruses including MERS-CoV [34,35]. Two of the 13 residue mutations occurred in the 5’ IDR and occurred in limited number of samples. The iSNV at nt 28587, R8C, was detected only in C2D3 in 5% of the reads and in the Seed at 1%. A second residue variant was detected at nt 28778, L16P, in 5 of the 9 camel samples at between 0.4 and 2.0%, but was not detected in the Seed despite excellent coverage (>218K). The remaining nonsynonymous iSNVs occurred in the IDR at the 3’ end of the ORF, and some were detected at high frequency. In particular, a cluster of iSNVs were detected between nts 29734 and 29748 (residues 390–395) despite rather limited coverage for this region. For example, at G390V mutation was detected at 43% in C1D3 reads (nt 29734), and at 7% for C2D1, and 6% for C3D5, but not in the Seed data. Coverage was limited, less than 5K depth for all samples. An iSNV was detected at nt 29736, S391R, at 8% for the Seed sample. Amino acid change S391I was detected in 6–9% for five camel samples (C1D5, C2D1, C2D3, C3D1, C3D5) but not in the Seed data. Similar to previous sites, coverage was relatively low. Three adjacent iSNVs were detected at between 4–10% in the Seed data at nts 29739–41 resulting in I392F. A variant was detected at nt 29743 which resulted in a Thr to Ile mutation at residue 392 in 4% of reads from C2D5. The same residue was present as a Val variant for the Seed data due to two adjacent iSNVs at 29742 and 29743 present at 4% and 7% respectively. An iSNV at nt 29747 was detected in several camel samples and resulted in a nonsynonymous change, Q394H. The iSNV was detected in reads from both C2D1 and C2D3 at 6% and 19% respectively, and also in C3D3 and C3D5 at 12% and 7% respectively. Lack of detection in C1 data may have been due to very low coverage at this site for D1 and D3 (coverage 42 and 284 respectively), however the Seed coverage was considerably greater, 3424x, indicating that if the iSNV was present in the Seed inoculum it was relatively rare and this positively-charged variant may have been arisen due to the change in host environment (in vitro to in vivo). A search of GenBank data revealed that H394 is not represented in any of the MERS sequence data available, although the corresponding residue for other betacoronaviruses MHV and HCoV-OC43 is positively charged (Arg). Multiple camel samples had an iSNV at nt 29748 resulting in a R395C occurring at a frequency of 3% for C2D1, 6% for C3D1, and 5% for C3D3, although coverage was limited as described above. Interestingly a different iSNV was present in the Seed reads at this site, R395Y, in 5% of reads. These data indicate that this region is a hot spot for genetic diversity, and notably there is no indication that these iSNVs are linked given that they vary in frequency and are often present in different samples. C3D5 had a unique set of high frequency variants in this region caused by 3 consecutive nt changes (29782–84) resulting in 2 amino acid changes, P406L and M407I, both of which were detected at 21%. The adjacent nts 29788 and 29794 were the site of 2 more nonsynonymous changes detected in C3D5 data; I408S and V410A, present at 14% and 8% respectively. It is notable that the other camel samples and the Seed sample all had very high coverage in this region, ranging from 50K-220K in depth. The presence of numerous high frequency variants in the 3’ end of the N protein, well past the end of the ORF8b coding region (nt 29100) underscores the potential genetic diversity of this virus. A study by Agnihothram et al., 2014 revealed that a human strain of MERS-CoV, Hu/England-N1/2012 had 2 amino acid deletions in this region of the N protein, residues 391 and 392 [24].

Insertion and deletion (Indel) variant data

Indels detected at >1% were analyzed to identify hotspots and potential impact. No indels occurred in regions that would affect primer binding. Eight positions were identified that had an insertion present at greater than 1%. Six of the 8 inserts occurred in the nsp3 gene, and all 8 were within ORF1ab. The highest frequency insert, “A”, occurred at nt 6074 and was detected at 5.2% for C3D3 as well being present as an “TTTA” insert for 1.2% of reads for C2D3. Insertions that were present for all or the majority of camel samples were also found in the Seed data. For example Seed had an “AA” insertion at site 2845 at 0.0029%, and camels 2 and 3, days 1, 3, and 5 had this insert detected at between 0.01 and 0.02%. Interestingly camel 1 had only a single “A” present at this site and it was detected at 1.1% (day 1) to 0.3% (days 3, 5). Deletions detected at greater than 1% occurred at 21 sites and ranged in frequency from 1.3% to 15.3%, and the number of bases deleted ranged from 1 to 20 nts. The two highest frequency deletions occurred at nt 26856 and were detected in C2D3 at 15.3% and C2D5 at 6.1%; both with 4 nts deleted.

Nonsynonymous (Ka) and synonymous (Ks) substitution rates

To check for evidence of selective pressure within samples, the Ka/Ks ratio was checked for each gene using one representative passage of each camel (C1D5, C2D1 and C3D5) and the seed stock. Ka/Ks ratios were calculated for each sample using the KaKs calculator [36]. To calculate the ratio for each gene in each sample, the consensus sequence was compared with a concatenation of the rare variants, which are treated as a second sequence. Only one gene in one sample (C2D1, M pro) showed significant divergence from 1 (Ka/Ks = 1.49, p-value < 0.05). Thus, there appeared to be limited evidence for an identifiable selective pressure on the sub-population of each sample.

Conclusion

Next generation sequencing (NGS) platforms such as Illumina enable viral populations to be defined with high granularity and provide insight into the role of variant genotypes in viral virulence and transmission. Previously NGS was used to rapidly obtain consensus sequences of multiple MERS-CoV samples [9]. Unexpectedly, analysis of the consensus sequence data revealed the presence of MERS-CoV variants at relatively high frequencies, thus leading the authors to speculate about the role of viral quasispecies in cross-species transmission. Results of the present study demonstrate the value of ultra-deep Illumina sequencing for identification of both rare and high frequency variants present in the nasal samples of three camels that had been experimentally infected with a human isolate of MERS-CoV [2].

Potential impact of high frequency iSNVs on biosurveillance

Infection of camels with a human isolate of MERS-CoV induced numerous high frequency iSNVs throughout the genome upon a single passage in camels. This indicates that MERS-CoV, similar to other coronaviruses, is prone to extreme genetic diversity in intrahost viral populations that may change rapidly in response to a change in host environment [9,11,37-40]. Until patterns of variation are defined for quasispecies present in non-laboratory passaged MERS-CoV samples, design of reagents for detecting MERS-CoV infections will be vulnerable to the effects of intrahost genetic diversity on reagent sensitivity. For example, at nt 2966, there is an iSNV present at 4% for Seed but greater than 20% in samples from all 3 camels indicating that this iSNV may be selected for in camels. Interestingly, an NCBI Blast search of the 60 nt region including this iSNV site revealed no indication of variability at this site for any of the 99 MERS virus sequences that were present on GenBank. This mutation, although present in a high proportion of the intrahost populations, would not be included in Genbank consensus sequences that are used to identify conserved regions for primer design and thus has the potential to interfere with viral detection assays that include this nt position. Indeed, only 5 mutations were detected in the consensus sequences, although 473 intrahost single nucleotide variants were identified. Many of these variants were present at high frequencies and could potentially influence viral phenotype and adversely affect the sensitivity of detection assays that target these regions for primer or probe binding. PCR amplification of MERS-CoV (strain EMC/2012) using primers tested as individual reactions (A) and in two multiplexed reactions (B). (A) Primer sets are listed above each gel product. (B). Multiplex products include bands of expected size, as well as smaller bands and primer dimers. (PDF) Click here for additional data file.

MERS primer sequences and positions.

(PDF) Click here for additional data file.

Frequency of consensus and iSNV nts, and position coverage based on ORP data (non primer regions).

The iSNV data is distributed between two spreadsheets, the first, S2A Table, shows all iSNVs that detected outside of the primer regions, and the second, S3 Table, shows iSNVs occurring in primer regions. iSNVs detected within primer sites were examined separately to consider the potential for un-trimmed primer errors to remain in reads mapped to the reference genome. In total this table consists of approx. 2000 rows of iSNV data and is available on request as an Excel spreadsheet. (XLSX) Click here for additional data file.

Frequency of consensus and iSNV nts, and position coverage based on ORP data (primer regions).

(PDF) Click here for additional data file.
  39 in total

1.  Quantitative characterization of defective virus emergence by deep sequencing.

Authors:  Collin Timm; Fulya Akpinar; John Yin
Journal:  J Virol       Date:  2013-12-18       Impact factor: 5.103

2.  Identification of human neutralizing antibodies against MERS-CoV and their role in virus adaptive evolution.

Authors:  Xian-Chun Tang; Sudhakar S Agnihothram; Yongjun Jiao; Jeremy Stanhope; Rachel L Graham; Eric C Peterson; Yuval Avnir; Aimee St Clair Tallarico; Jared Sheehan; Quan Zhu; Ralph S Baric; Wayne A Marasco
Journal:  Proc Natl Acad Sci U S A       Date:  2014-04-28       Impact factor: 11.205

3.  Severe respiratory illness caused by a novel coronavirus, in a patient transferred to the United Kingdom from the Middle East, September 2012.

Authors:  A Bermingham; M A Chand; C S Brown; E Aarons; C Tong; C Langrish; K Hoschler; K Brown; M Galiano; R Myers; R G Pebody; H K Green; N L Boddington; R Gopal; N Price; W Newsholme; C Drosten; R A Fouchier; M Zambon
Journal:  Euro Surveill       Date:  2012-10-04

4.  Evaluation of serologic and antigenic relationships between middle eastern respiratory syndrome coronavirus and other coronaviruses to develop vaccine platforms for the rapid response to emerging coronaviruses.

Authors:  Sudhakar Agnihothram; Robin Gopal; Boyd L Yount; Eric F Donaldson; Vineet D Menachery; Rachel L Graham; Trevor D Scobey; Lisa E Gralinski; Mark R Denison; Maria Zambon; Ralph S Baric
Journal:  J Infect Dis       Date:  2013-11-18       Impact factor: 5.226

5.  Replication and shedding of MERS-CoV in upper respiratory tract of inoculated dromedary camels.

Authors:  Danielle R Adney; Neeltje van Doremalen; Vienna R Brown; Trenton Bushmaker; Dana Scott; Emmie de Wit; Richard A Bowen; Vincent J Munster
Journal:  Emerg Infect Dis       Date:  2014-12       Impact factor: 6.883

6.  Multiplex degenerate primer design for targeted whole genome amplification of many viral genomes.

Authors:  Shea N Gardner; Crystal J Jaing; Maher M Elsheikh; José Peña; David A Hysom; Monica K Borucki
Journal:  Adv Bioinformatics       Date:  2014-08-03

Review 7.  Middle East respiratory syndrome: An emerging coronavirus infection tracked by the crowd.

Authors:  Ian M Mackay; Katherine E Arden
Journal:  Virus Res       Date:  2015-02-02       Impact factor: 3.303

Review 8.  Molecular evolution and emergence of avian gammacoronaviruses.

Authors:  Mark W Jackwood; David Hall; Andreas Handel
Journal:  Infect Genet Evol       Date:  2012-05-17       Impact factor: 3.342

9.  Prediction of Intrinsic Disorder in MERS-CoV/HCoV-EMC Supports a High Oral-Fecal Transmission.

Authors:  Gerard Kian-Meng Goh; A Keith Dunker; Vladimir Uversky
Journal:  PLoS Curr       Date:  2013-11-13

10.  Full-genome sequence of human betacoronavirus 2c jordan-n3/2012 after serial passage in Mammalian cells.

Authors:  Kenneth G Frey; Cassie L Redden; Kimberly A Bishop-Lilly; Reed Johnson; Lisa E Hensley; Kanakatte Raviprakash; Thomas Luke; Tad Kochel; Vishwesh P Mokashi; Gabriel N Defang
Journal:  Genome Announc       Date:  2014-05-29
View more
  12 in total

1.  SARS-CoV-2 Mutant Spectra at Different Depth Levels Reveal an Overwhelming Abundance of Low Frequency Mutations.

Authors:  Brenda Martínez-González; María Eugenia Soria; Lucía Vázquez-Sirvent; Cristina Ferrer-Orta; Rebeca Lobo-Vega; Pablo Mínguez; Lorena de la Fuente; Carlos Llorens; Beatriz Soriano; Ricardo Ramos-Ruíz; Marta Cortón; Rosario López-Rodríguez; Carlos García-Crespo; Pilar Somovilla; Antoni Durán-Pastor; Isabel Gallego; Ana Isabel de Ávila; Soledad Delgado; Federico Morán; Cecilio López-Galíndez; Jordi Gómez; Luis Enjuanes; Llanos Salar-Vidal; Mario Esteban-Muñoz; Jaime Esteban; Ricardo Fernández-Roblas; Ignacio Gadea; Carmen Ayuso; Javier Ruíz-Hornillos; Nuria Verdaguer; Esteban Domingo; Celia Perales
Journal:  Pathogens       Date:  2022-06-08

2.  Analysis of intrapatient heterogeneity uncovers the microevolution of Middle East respiratory syndrome coronavirus.

Authors:  Donghyun Park; Hee Jae Huh; Yeon Jeong Kim; Dae-Soon Son; Hyo-Jeong Jeon; Eu-Hyun Im; Jong-Won Kim; Nam Yong Lee; Eun-Suk Kang; Cheol In Kang; Doo Ryeon Chung; Jin-Hyun Ahn; Kyong Ran Peck; Sun Shim Choi; Yae-Jean Kim; Chang-Seok Ki; Woong-Yang Park
Journal:  Cold Spring Harb Mol Case Stud       Date:  2016-11

3.  Coronavirus HKU15 in respiratory tract of pigs and first discovery of coronavirus quasispecies in 5'-untranslated region.

Authors:  Patrick Cy Woo; Susanna Kp Lau; Chi-Ching Tsang; Candy Cy Lau; Po-Chun Wong; Franklin Wn Chow; Jordan Yh Fong; Kwok-Yung Yuen
Journal:  Emerg Microbes Infect       Date:  2017-06-21       Impact factor: 7.163

Review 4.  MERS-CoV: Understanding the Latest Human Coronavirus Threat.

Authors:  Aasiyah Chafekar; Burtram C Fielding
Journal:  Viruses       Date:  2018-02-24       Impact factor: 5.048

5.  Spike gene deletion quasispecies in serum of patient with acute MERS-CoV infection.

Authors:  Xiaoyan Lu; Lori A Rowe; Michael Frace; James Stevens; Glen R Abedi; Osman Elnile; Taleb Banassir; Malak Al-Masri; John T Watson; Abdullah Assiri; Dean D Erdman
Journal:  J Med Virol       Date:  2016-08-22       Impact factor: 2.327

Review 6.  MERS-CoV diagnosis: An update.

Authors:  Sameera Al Johani; Ali H Hajeer
Journal:  J Infect Public Health       Date:  2016-04-20       Impact factor: 3.718

Review 7.  Evolution, Ecology, and Zoonotic Transmission of Betacoronaviruses: A Review.

Authors:  Herbert F Jelinek; Mira Mousa; Eman Alefishat; Wael Osman; Ian Spence; Dengpan Bu; Samuel F Feng; Jason Byrd; Paola A Magni; Shafi Sahibzada; Guan K Tay; Habiba S Alsafar
Journal:  Front Vet Sci       Date:  2021-05-20

8.  Correction: Middle East Respiratory Syndrome Coronavirus Intra-Host Populations Are Characterized by Numerous High Frequency Variants.

Authors:  Monica K Borucki; Victoria Lao; Mona Hwang; Shea Gardner; Danielle Adney; Vincent Munster; Richard Bowen; Jonathan E Allen
Journal:  PLoS One       Date:  2016-04-25       Impact factor: 3.240

9.  Selection and Characterization of Monoclonal Antibodies Targeting Middle East Respiratory Syndrome Coronavirus through a Human Synthetic Fab Phage Display Library Panning.

Authors:  Yoonji Kim; Hansaem Lee; Keunwan Park; Sora Park; Ju-Hyeon Lim; Min Kyung So; Hye-Min Woo; Hyemin Ko; Jeong-Min Lee; Sun Hee Lim; Byoung Joon Ko; Yeon-Su Park; So-Young Choi; Du Hyun Song; Joo-Yeon Lee; Sung Soon Kim; Dae Young Kim
Journal:  Antibodies (Basel)       Date:  2019-07-31

10.  Genetic diversity of MERS-CoV spike protein gene in Saudi Arabia.

Authors:  Sayed S Sohrab; Esam I Azhar
Journal:  J Infect Public Health       Date:  2019-12-09       Impact factor: 3.718

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.