Jonathan Daniel Ip1, Kin-Hang Kok1, Wan-Mui Chan1, Allen Wing-Ho Chu1, Wai-Lan Wu1, Cyril Chik-Yan Yip2, Wing-Kin To3, Owen Tak-Yin Tsang4, Wai-Shing Leung4, Thomas Shiu-Hong Chik4, Kwok-Hung Chan1, Ivan Fan-Ngai Hung5, Kwok-Yung Yuen6, Kelvin Kai-Wang To7. 1. State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China. 2. Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China. 3. Department of Pathology, Princess Margaret Hospital, Hong Kong Special Administrative Region, China. 4. Department of Medicine and Geriatrics, Princess Margaret Hospital, Hong Kong Special Administrative Region, China. 5. Department of Medicine, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China. 6. State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China; Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China. 7. State Key Laboratory for Emerging Infectious Diseases, Carol Yu Centre for Infection, Department of Microbiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong Special Administrative Region, China; Department of Microbiology, Queen Mary Hospital, Hong Kong Special Administrative Region, China. Electronic address: kelvinto@hku.hk.
Abstract
OBJECTIVES: SARS-CoV-2 has evolved rapidly into several genetic clusters. However, data on mutations during the course of infection are scarce. This study aims to determine viral genome diversity in serial samples of COVID-19 patients. METHODS: Targeted deep sequencing of the spike gene was performed on serial respiratory specimens from COVID-19 patients using nanopore and Illumina sequencing. Sanger sequencing was then performed to confirm the single nucleotide polymorphisms. RESULTS: A total of 28 serial respiratory specimens from 12 patients were successfully sequenced using nanopore and Illumina sequencing. A 75-year-old patient with severe disease had a mutation, G22017T, identified in the second specimen. The frequency of G22017T increased from ≤5% (nanopore: 3.8%; Illumina: 5%) from the first respiratory tract specimen (sputum) to ≥60% (nanopore: 67.7%; Illumina: 60.4%) in the second specimen (saliva; collected 2 days after the first specimen). The difference in G22017T frequency was also confirmed by Sanger sequencing. G22017T corresponds to W152L amino acid mutation in the spike protein which was only found in <0.03% of the sequences deposited into a public database. Spike amino acid residue 152 is located within the N-terminal domain, which mediates the binding of a neutralizing antibody. DISCUSSION: A spike protein amino acid mutation W152L located within a neutralizing epitope has appeared naturally in a patient. Our study demonstrated that monitoring of serial specimens is important in identifying hotspots of mutations, especially those occurring at neutralizing epitopes which may affect the therapeutic efficacy of monoclonal antibodies.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has spread rapidly, resulting in more than 28 million laboratory-confirmed COVID-19 cases globally as of 14 September 2020. SARS-CoV-2 mainly causes respiratory tract infection, although extrapulmonary manifestations have been reported [1]. The efficient person-to-person transmission may be related to the high viral load shortly after symptom onset and the large number of asymptomatic individuals [2,3]. Prolonged persistence of RNA in body fluids is common despite the development of neutralizing antibodies [2].
As an RNA virus, the genome replication of SARS-CoV-2 is prone to error, and gene mutations arise frequently. Whole genome sequencing showed that the viral genomes may differ between family members [4]. Major genetic diversity has already been seen [5]. Phylogenetic analysis has demonstrated that even patients within the same geographical region were infected with genetically diverse SARS-CoV-2 strains [6].
Most studies on viral genome focus on the population level. However, understanding viral mutations on an individual patient level can have important implications for pathogenesis and treatment of virus infection. For A(H1N1)pdm09 virus, the D225G substitution of haemagglutinin was more enriched in the lower respiratory tract specimens because this substitution enables the virus to bind more efficiently to α2,3 sialic acid receptors that are predominantly found in the alveoli [7]. Furthermore, analysis of serial specimens allows the detection of resistant mutants arising during antiviral treatment [8].
A recent analysis of bronchoalveolar fluid from COVID-19 patients showed that intra-host viral genome variation is a rare event, with a median of 1 nucleotide variant with minor allele frequency of ≥20% [9], but whether mutations arise during the course of illness was not examined.
In this study, we analysed serial specimens from individual patients to search for nucleotide and amino acid variations that develop over the course of illness. We have specifically chosen the spike protein for targeted sequencing because the spike protein is responsible for the binding of the virus to the host cell angiotensin-converting enzyme 2 (ACE2) receptor and membrane fusion [10]. Our recent study has demonstrated that the level of IgG against spike protein RBD has a high correlation with neutralizing antibody titre [11].
Patients and methods
Patients
This study included archived respiratory specimens from COVID-19 patients with laboratory confirmation by real-time reverse transcription-polymerase chain reaction (RT-PCR) targeting the E or RdRp-Hel gene as described previously [12,13]. The respiratory specimens included nasopharyngeal swab, saliva, endotracheal aspirate and sputum. Serial specimens were defined as specimens collected from the same patient at least 1 day apart. Some patients in this study were enrolled in our previous randomized controlled trial on triple therapy with interferon-β1b, lopinavir–ritonavir and ribavirin [14], or in our previous study on viral load and serological profile [2]. This study has been approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster (UW 13-372).
SARS-CoV-2 spike gene RT-PCR
Viral RNA was extracted from respiratory specimens using Qiagen Viral RNA Mini Kit (Hilden, Germany) or BioMérieux NucliSENS easyMag (Marcy-l'Étoile, France) as described previously [4]. The full-length spike gene was amplified using the SuperScript™ III one-step RT-PCR system with Platinum™ Taq High Fidelity DNA polymerase (Thermo Fisher Scientific, Waltham, MA, USA) using primer set 1, or primer sets 2 and 3 (Table S1), under the following RT-PCR conditions: 50°C for 30 min, 94°C for 2 min; then 40 cycles of 94°C for 30 s, 55°C for 30 s and 68°C for 250 s. Amplified PCR products were either purified with 0.5× AMPure XP beads (Beckman Coulter, California, CA, USA) or gel purified with the QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany). Purified PCR products were then subjected to nanopore or Illumina library preparation. Since RT-PCR may lead to bias in assessing the frequency of nucleotide variants, we performed separate RT-PCRs for nanopore and Illumina sequencing.
Nanopore targeted sequencing
DNA libraries were prepared using the Ligation Sequencing Kit (SQK-LSK109, Oxford Nanopore Technologies) according to the manufacturer's instructions with modifications [14]. Briefly, amplified and purified spike gene PCR products were subjected to PCR barcoding using PCR Barcoding Expansion 1-96 kit (EXP-PBC096). Barcoded samples were then pooled at equal molar ratio prior to end repair, sequencing adapter ligation and clean-up. The end repair incubation time was prolonged from the manufacturer's recommendation of 5 min at 20°C and 5 min at 65°C, to 30 min at 20°C and 30 min at 65°C. The Oxford Nanopore MinION platform and R9.4.1 flow cell were used for sequencing.
Guppy v3.4.5 was used for converting the raw signal data into FASTQ format, demultiplexing and removing nanopore adaptor sequences. Only reads with a minimum Q score of 10 were included for subsequent analysis. The sequencing run was quality checked using MinIONQC [15]. Quality-checked reads were filtered by length to remove primer dimers and chimeras. The filtered reads were then aligned with the SARS-CoV-2 reference genome Wuhan-Hu-1 (GenBank accession number MN908947.3) using the Burrows–Wheeler Aligner (BWA) [16]. SAMtools v1.10 and BCFtools mpileup were used for creating a variant file [17,18]. Only reads with mapping quality ≥30 and base quality ≥20 were used for the pileup. BCFtools [17] call, vcfutils.pl [18] and SEQTK [19] were used for generating the FASTA consensus sequence. The variation frequencies were obtained using SAMtools v1.10 [18] and VarScan2 [20] with reads of mapping quality score ≥30 and basecalling quality score ≥20. Only single nucleotide polymorphisms (SNPs) with a minimum of 250× coverage, p < 0.001, minimum variant frequency of 0.5%, minimum average basecalling quality score of 30 and minimum read support for variant of 10 reads were reported. The raw data have been deposited in the NCBI Sequence Read Archive with accession number PRJNA664839.
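The SNP reporting criteria stated above can be expressed as a single predicate. The following Python sketch is illustrative only and is not the VarScan2 implementation used in the study: the `VariantCall` record, its field names and the function names are hypothetical, and the p value is assumed to have been computed upstream by the variant caller.

```python
from dataclasses import dataclass

# Reporting thresholds quoted in the Methods text.
MIN_DEPTH = 250          # minimum coverage at the position
MIN_FREQUENCY = 0.005    # minimum variant frequency (0.5%)
MIN_AVG_QUALITY = 30     # minimum average basecalling quality of variant reads
MIN_ALT_READS = 10       # minimum number of reads supporting the variant
MAX_P_VALUE = 0.001      # variants are reported only if p < 0.001


@dataclass
class VariantCall:
    """Hypothetical record for one candidate SNP (field names are assumptions)."""
    position: int
    ref: str
    alt: str
    depth: int
    alt_reads: int
    avg_alt_quality: float
    p_value: float


def variant_frequency(call: VariantCall) -> float:
    """Fraction of reads at the position that carry the variant base."""
    return call.alt_reads / call.depth if call.depth else 0.0


def passes_reporting_filter(call: VariantCall) -> bool:
    """True only if every reporting threshold is met."""
    return (
        call.depth >= MIN_DEPTH
        and variant_frequency(call) >= MIN_FREQUENCY
        and call.avg_alt_quality >= MIN_AVG_QUALITY
        and call.alt_reads >= MIN_ALT_READS
        and call.p_value < MAX_P_VALUE
    )


# Read counts from the day 7 nanopore result (158/4116); the quality and
# p value below are made-up placeholders, not values from the study.
example = VariantCall(22017, "G", "T", 4116, 158, 32.0, 1e-6)
print(passes_reporting_filter(example))
```

Expressing the criteria as one conjunction makes explicit that a variant must satisfy all five thresholds simultaneously, so a high-frequency variant at a poorly covered position is still excluded.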
Targeted sequencing of spike gene using Illumina sequencing
For Illumina sequencing, DNA libraries were prepared using the Nextera XT DNA Library Prep Kit (Illumina) following the manufacturer's protocol with 1 ng of the spike gene cDNA as the starting material. In brief, DNA fragments with adaptors were generated by tagmentation reaction at 55°C for 5 min. The tagmented DNA was then indexed and amplified in a 50 μL reaction volume with 12 cycles of PCR, followed by AMPure XP bead clean-up. The quality of the enriched libraries was then validated using Agilent Bioanalyzer, Qubit and real-time RT-qPCR.
The libraries were pooled at equal molar ratio, denatured and diluted to optimal concentration prior to sequencing. The Illumina NovaSeq 6000 was used for sequencing to generate paired-end 151 bp reads. The whole dataset was deposited in the NCBI Sequence Read Archive with accession number PRJNA664541.
Illumina adapters were removed from the reads. Reads with a length of at least 100 bp and at least 90% of bases with quality score of ≥30 were retained during the quality filtering process using FASTP [21]. Paired-end reads were aligned with the reference genome SARS-CoV-2 Wuhan-Hu-1 (MN908947.3) using the BWA [16]. Sorting and read deduplication were performed using SAMtools v1.10 [18] and Picard v2.22.3. The consensus sequence was generated using SEQTK [19] and BCFtools [17] with reads of mapping quality and basecalling quality score ≥30. The coverage depth was generated by the SAMtools depth command. The variation frequencies were obtained using SAMtools v1.10 [18] and VarScan2 [20] with reads of mapping quality score ≥30 and basecalling quality score ≥35. Only SNPs with a minimum of 250× coverage, p < 0.001, minimum variant frequency of 0.5%, minimum average basecalling quality score of 30 and minimum read support for variant of 10 reads were reported.
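The read-level quality criterion described above (length of at least 100 bp and at least 90% of bases with quality score ≥30) can be illustrated with a short Python sketch. This is not the FASTP implementation; it only restates the stated rule, the function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred_scores(quality_string: str, offset: int = 33) -> list:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def passes_read_filter(
    sequence: str,
    quality_string: str,
    min_length: int = 100,
    min_base_quality: int = 30,
    min_fraction: float = 0.90,
) -> bool:
    """Keep a read only if it is long enough and mostly high quality."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string differ in length")
    if len(sequence) < min_length:
        return False
    scores = phred_scores(quality_string)
    high_quality = sum(1 for score in scores if score >= min_base_quality)
    return high_quality / len(scores) >= min_fraction


# Synthetic reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
good_read = ("A" * 120, "I" * 120)
noisy_read = ("A" * 120, "I" * 100 + "#" * 20)   # only about 83% of bases at Q>=30
short_read = ("A" * 90, "I" * 90)

for name, (seq, qual) in [("good", good_read), ("noisy", noisy_read), ("short", short_read)]:
    print(name, passes_read_filter(seq, qual))
```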
Sanger sequencing
Sanger sequencing was performed as described previously [22]. Both strands were sequenced twice with an ABI Prism 3730xl DNA Analyser (Applied Biosystems, Foster City, CA, USA) using forward and reverse primers.
Consensus sequence
The consensus sequences were deposited into GISAID (Accession number EPI_ISL_538524-538551).
Results
This study included 21 patients with at least two serial respiratory tract specimens available. A total of 98 serial respiratory tract specimens, collected between January 23 and March 16, 2020, were retrieved. SARS-CoV-2 spike gene RT-PCR was positive in 19 patients and 55 specimens. Of these 19 patients, 12 patients had at least two serial samples positive on different days (Table S2). Hence, nanopore and Illumina sequencing were performed for 12 patients on 28 specimens. These 12 patients included five males and seven females. The median age was 62 years, with a range between 30 and 75 years. Out of the 12 patients, four (33.3%) required oxygen supplementation, two (16.7%) were admitted to the intensive care unit, one (8.3%) was intubated and one (8.3%) died.
For these 12 patients, the mean filtered coverage was 36 393× for nanopore sequencing and 40 990× for Illumina sequencing runs (Tables S3 and S4). Out of these 12 patients, only one patient had nucleotide differences between samples collected on different days with a difference in variant frequency between initial and subsequent samples exceeding 10% in both nanopore and Illumina sequencing. This patient was a 75-year-old man who required oxygen supplementation during hospitalization. For nanopore sequencing, G22017T (guanine to thymine) was found in 3.8% (158/4116) of the reads in the sputum specimen collected on day 7 after symptom onset (first specimen), while it accounted for 67.8% (12060/17795) of the reads in the saliva specimen collected on day 9 after symptom onset (second specimen) (Fig. 1A). For Illumina sequencing, G22017T was found in 5.0% (356/7082) of the reads in the first specimen and in 60.4% (4578/7574) of the reads in the second specimen (Fig. 1B). Sanger sequencing was performed to verify the mixed population at position 22 017 in the second specimen (Fig. 1C). The patient had received lopinavir–ritonavir and ribavirin one day before the collection of the second specimen. G22017T resulted in the non-synonymous mutation W152L (tryptophan to leucine) in the N-terminal domain of the S1 subunit. This mutation is within the N3 loop of the N-terminal domain, which mediates the binding of a neutralizing antibody, 4A8 [23].
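The variant frequencies reported above follow directly from the read counts, and the criterion of a change exceeding 10% on both platforms can be checked in the same way. The Python sketch below uses only the read counts given in the text; the dictionary layout and function names are illustrative assumptions, and the change is interpreted here as a difference in percentage points.

```python
# (variant reads, total reads) for G22017T, taken from the Results text.
READ_COUNTS = {
    "nanopore": {"day7_sputum": (158, 4116), "day9_saliva": (12060, 17795)},
    "illumina": {"day7_sputum": (356, 7082), "day9_saliva": (4578, 7574)},
}


def percent(variant_reads: int, total_reads: int) -> float:
    """Variant frequency as a percentage of reads at the position."""
    return 100.0 * variant_reads / total_reads


def change_exceeds(counts: dict, threshold: float = 10.0) -> bool:
    """True if the frequency differs by more than `threshold` percentage points."""
    first = percent(*counts["day7_sputum"])
    second = percent(*counts["day9_saliva"])
    return abs(second - first) > threshold


def concordant_change(all_counts: dict, threshold: float = 10.0) -> bool:
    """Require the change to exceed the threshold on every platform."""
    return all(change_exceeds(counts, threshold) for counts in all_counts.values())


for platform, counts in READ_COUNTS.items():
    frequencies = {name: round(percent(*pair), 1) for name, pair in counts.items()}
    print(platform, frequencies)
print("change >10% on both platforms:", concordant_change(READ_COUNTS))
```

Requiring agreement between two independently amplified and sequenced libraries reduces the chance that a frequency shift reflects an amplification or platform artefact rather than a change in the viral population.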
Fig. 1
G22017T spike gene mutation in patient HKU-IHCE0511-006. (A) Nanopore sequencing. (B) Illumina sequencing. (C) Sanger sequence tracing, demonstrating the double peak at position 22017 in the day 9 specimen.
Next, we determined the prevalence of W152L mutation among SARS-CoV-2 isolates deposited in GISAID up to 15 September 2020 (Table S5). Of 92 942 sequences available, 21 (0.023%) had W152L mutation.
Discussion
Summary of principal findings
This study assessed the SARS-CoV-2 spike gene mutations in serial specimens from COVID-19 patients using a combination of nanopore, Illumina and Sanger sequencing. In one patient with severe disease, an important non-synonymous mutation G22017T, which results in W152L (tryptophan to leucine) mutation in the N-terminal domain of the spike protein, was present at a low frequency of ≤5% in the sputum specimen but represented the predominant population (≥60%) in the saliva specimen collected 2 days later. W152L is located within the binding site of a recently identified neutralizing antibody 4A8. Spike protein amino acid residue 152, together with residue 145 in the N3 loop of the N-terminal domain, interacts with the 4A8 antibody via hydrophobic interactions [23]. Since tryptophan has an aromatic side chain while leucine has an aliphatic side chain, the W152L mutation might change the structure of the N3 loop in the N-terminal domain, hence affecting the binding of neutralizing antibodies. This observation may be an example of microevolution driven by the host antibody response, in which the virus evades neutralizing antibodies that arise during the course of illness.
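The correspondence between the nucleotide change G22017T and the amino acid change W152L can be verified with simple coordinate arithmetic. The Python sketch below assumes that the spike coding sequence starts at genome position 21563 in the Wuhan-Hu-1 reference (MN908947.3); this coordinate comes from the GenBank annotation rather than from the present article. The function names are illustrative, and the codon table is deliberately reduced to the two codons needed.

```python
# Assumption: start of the spike coding sequence in MN908947.3 (1-based),
# taken from the GenBank annotation and not stated in the article.
SPIKE_CDS_START = 21563

# Only the codons needed for this example: tryptophan has the single codon
# TGG, and TTG is one of the leucine codons.
CODON_TABLE = {"TGG": "W", "TTG": "L"}


def locate_in_spike(genome_position: int) -> tuple:
    """Return (1-based codon number, 0-based index within the codon)."""
    offset = genome_position - SPIKE_CDS_START
    if offset < 0:
        raise ValueError("position lies upstream of the spike coding sequence")
    return offset // 3 + 1, offset % 3


def substitute(codon: str, index: int, new_base: str) -> str:
    """Replace one base of a codon."""
    return codon[:index] + new_base + codon[index + 1:]


codon_number, index_in_codon = locate_in_spike(22017)
reference_codon = "TGG"                      # residue 152 is tryptophan (W)
mutant_codon = substitute(reference_codon, index_in_codon, "T")

print(codon_number, index_in_codon)          # 152 1
print(CODON_TABLE[reference_codon], "->", CODON_TABLE[mutant_codon])  # W -> L
```

Position 22017 is the second base of codon 152, so the guanine to thymine substitution converts TGG to TTG, which is consistent with the tryptophan to leucine change reported.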
Comparison with other studies
The W152L mutation is located in the N-terminal domain of the spike protein. The shift of W152L from a minor population to the predominant population suggests that this site may be under immune selection pressure in this patient. Neutralizing antibodies against the N-terminal domain could be found in COVID-19 patients [23,24,25], and may neutralize virus infection for several reasons. First, Chi et al. postulated that the antibody may prevent the conformational change of the S protein which is necessary for fusion [23]. Second, antibody against the N-terminal domain may block the binding of the spike protein to an unidentified receptor. The N-terminal domain of the S1 subunit has been shown to participate in binding for other coronaviruses. The human coronaviruses OC43 and HKU1 have been shown to bind to 9-O-acetylated sialic acids, while MERS-CoV binds to non-acetylated sialic acid, via the N-terminal domain [26,27]. The mouse hepatitis virus binds to host cell receptor carcinoembryonic antigen-related cell adhesion molecule via the S1 N-terminal domain [28].
Although nucleotide changes are frequent, amino acid mutation is actually an infrequent event for the spike gene [29], except for D614G which now accounts for the majority of the SARS-CoV-2 reported worldwide [5]. Mutations at spike protein amino acid residue 152 are rarely found, accounting for <0.03% of the strains deposited in GISAID as of 15 September 2020. The low prevalence of the W152L mutation suggests that this mutation may confer reduced fitness or transmissibility. However, since usually only one viral sequence from each patient would be deposited into public databases, the prevalence of the W152L mutation may have been underestimated.
Our patient with the W152L mutation was treated with ribavirin and lopinavir–ritonavir, which were started between the collection of the first and second specimens.
Our previous clinical trial showed that triple combination, which includes ribavirin, lopinavir–ritonavir and interferon-β 1b, can shorten the duration of illness in COVID-19 patients [30]. Ribavirin is known to induce mutations in RNA viruses, and this may have promoted the mutation in this patient. However, out of 4000 bp sequenced, only W152L was found to be the predominant mutant. Therefore, it is unlikely that the mutation is related to the antivirals given.
Limitations of this study
There are several limitations in this study. First, we can only determine the SNPs among specimens with successful RT-PCR of the spike region. Second, the serial specimens of each patient that were successfully sequenced were collected <14 days apart. This is because the viral load reduces substantially during the second week of infection [2,31]. Therefore, we were not able to determine the variants that are present in patients with prolonged viral shedding. Viral load can also be affected by other factors. Older age has been associated with a higher viral load [2]. The type of specimen can also affect the viral load [31]. The duration of viral shedding can be longer in fecal than in respiratory specimens [32]. Third, the amplification of the spike gene may cause distortion in the variant frequencies. We have addressed this problem by performing a separate RT-PCR reaction for nanopore and Illumina sequencing of the same specimen. Furthermore, we only focused on variants whose frequency differed by more than 10% between specimens from the same individual. Fourth, for the patient with the W152L mutation in the saliva specimen collected on day 9 after symptom onset, only a sputum specimen was available on day 7 after symptom onset. It is possible that the W152L mutation was already present as a predominant population in the saliva of the patient on day 7 after symptom onset.
Conclusions and implications for clinical practice and research studies
Our study demonstrates that mutations may arise spontaneously at neutralizing antibody sites during the course of COVID-19. Although these mutations may not be sustained during person-to-person transmission, these amino acid mutations may affect the antiviral activity of neutralizing antibodies. Monitoring for intra-host mutational hotspots is warranted. Serial monitoring of mutations is needed in clinical trials of monoclonal antibody therapeutics.
Transparency declaration
All authors declare no conflict of interest.
Funding
This work was supported by the Consultancy Service for Enhancing Laboratory Surveillance of Emerging Infectious Diseases and Research Capability on Antimicrobial Resistance for the Department of Health of the HKSAR Government, and donations of Richard Yu and Carol Yu, May Tam Mak Mei Yin, the Shaw Foundation Hong Kong, Michael Seak-Kan Tong, Respiratory Viral Research Foundation Limited, Hui Ming, Hui Hoy and Chow Sin Lan Charity Fund Limited, Chan Yin Chuen Memorial Charitable Foundation, Marina Man-Wai Lee, the Hong Kong Hainan Commercial Association South China Microbiology Research Fund, the Jessie & George Ho Charitable Foundation, Perfect Shape Medical Limited, Kai Chong Tong, Tse Kam Ming Laurence, Betty Hing-Chu Lee, and Ping Cham So.
Author's contribution
J.D.I., K.H.K., W.M.C., K.Y.Y. and K.K.W.T. designed the study. J.D.I., W.M.C., C.C.Y.Y., A.W.H.C. and K.H.C. acquired the data. O.T.Y.T., W.S.L., T.S.H.C. and I.F.N.H. were involved in patient recruitment. All authors interpreted the data, revised the manuscript critically for important intellectual content and approved the final report.