
High-coverage SARS-CoV-2 genome sequences acquired by target capture sequencing.

Shaoqing Wen1,2, Chang Sun2, Huanying Zheng3, Lingxiang Wang2, Huan Zhang3, Lirong Zou3, Zhe Liu3, Panxin Du2, Xuding Xu2, Lijun Liang3, Xiaofang Peng3, Wei Zhang3, Jie Wu3, Jiyuan Yang2, Bo Lei2, Guangyi Zeng4, Changwen Ke3, Fang Chen4, Xiao Zhang1,5.   

Abstract

In this study, we designed a set of SARS-CoV-2 enrichment probes to increase the capacity for sequence-based virus detection and, at the same time, obtain the complete genome sequence. This universal SARS-CoV-2 enrichment probe set contains 502 single-stranded, biotin-labeled DNA probes of 120 nt, designed from all available SARS-CoV-2 viral sequences, and it can be used to enrich SARS-CoV-2 sequences without prior knowledge of type or subtype. Following the CDC health and safety guidelines, marked enrichment was demonstrated using a virus strain sample from cell culture, three nasopharyngeal swab samples (cycle threshold [Ct] values: 32.36, 36.72, and 38.44) from patients diagnosed with COVID-19 (positive controls), and four throat swab samples from patients without COVID-19 (negative controls). Moreover, based on these high-quality sequences, we discuss heterozygosity and viral expression during coronavirus replication, as well as the phylogenetic relationship of our sequences with other selected high-quality samples from the Genome Variation Map. Overall, this universal SARS-CoV-2 enrichment probe system can capture and enrich SARS-CoV-2 viral sequences selectively and effectively in different samples, especially clinical swab samples with a relatively low concentration of viral particles.
© 2020 Wiley Periodicals LLC.

Keywords:  SARS coronavirus; gene expression; genetic networks; genetic variability; mutation

Year:  2020        PMID: 32492196      PMCID: PMC7300714          DOI: 10.1002/jmv.26116

Source DB:  PubMed          Journal:  J Med Virol        ISSN: 0146-6615            Impact factor:   20.693


INTRODUCTION

The outbreak of the disease caused by the novel coronavirus SARS‐CoV‐2 has become a global and ongoing health concern. Since a patient with pneumonia of unknown etiology was first reported in the city of Wuhan on 30 December 2019, Chinese and international experts have gradually described the epidemiological, clinical, radiological, laboratory, and genomic features of this virus. At the current stage of research, however, two crucial issues must be addressed. First, according to the latest diagnostic criteria, reverse‐transcriptase polymerase chain reaction (RT‐PCR) assays are the recommended standard for diagnosing SARS‐CoV‐2 infection. However, recent studies have found that some patients have typical imaging findings, including ground‐glass opacity, but negative RT‐PCR results. False‐negative RT‐PCR results can have many causes, especially insufficient detection sensitivity when the viral load is low. Second, more work is needed to monitor viral mutations and their influence on disease severity and progression. Metagenomic sequencing is the latest and most comprehensive approach for obtaining the full‐length SARS‐CoV‐2 genome, but it remains costly. Moreover, metagenomic sequencing libraries contain large amounts of host (human) nucleic acid and of carrier RNA introduced by commercial RNA extraction kits, both of which reduce the number of viral reads obtained. In this context, we developed a set of SARS‐CoV‐2 enrichment probes based on hybridization capture to increase the sensitivity of sequence‐based virus detection and characterization. Hybridization capture was first used to enrich sequencing targets from the human genome and later to enrich sequences from the vertebrate virome.
The enrichment probe set contains 502 single‐stranded, biotin‐labeled DNA probes at 2× tiling, designed from all SARS‐CoV‐2 viral sequences available in the Global Initiative on Sharing All Influenza Data (GISAID; https://www.gisaid.org/) as downloaded on 1 February 2020, and it can be used to enrich SARS‐CoV‐2 sequences without prior knowledge of type or subtype. In addition, probes for human housekeeping genes (GAPDH, PCBP1, EIF3L, POLR2A, EIF3A, TGOLN2, TCEB3, CDK12, and BTBD7) were spiked into the probe set as internal controls for studying viral expression.
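The 2× tiling mentioned above can be illustrated with a minimal sketch. It assumes a single reference of 29 903 nt (the length of MN908947.3), 120‐nt probes, and a step of half a probe length; the function names are hypothetical, and the sketch does not reproduce the authors' design, which was built from all GISAID sequences available at the time and also includes the housekeeping‐gene probes.

```python
from typing import List, Tuple


def tile_probes(genome_len: int, probe_len: int = 120,
                depth: int = 2) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) probe coordinates.

    Probes start every probe_len // depth bases, so interior bases are
    covered `depth` times; a final probe is anchored at the 3' end if the
    regular grid stops short of it.
    """
    if genome_len <= probe_len:
        return [(0, genome_len)]
    step = probe_len // depth
    starts = list(range(0, genome_len - probe_len + 1, step))
    if starts[-1] + probe_len < genome_len:
        starts.append(genome_len - probe_len)
    return [(s, s + probe_len) for s in starts]


def coverage(genome_len: int, probes: List[Tuple[int, int]]) -> List[int]:
    """Count how many probes overlap each genome position."""
    cov = [0] * genome_len
    for start, end in probes:
        for i in range(start, end):
            cov[i] += 1
    return cov


if __name__ == "__main__":
    GENOME_LEN = 29_903  # length of MN908947.3
    probes = tile_probes(GENOME_LEN)
    cov = coverage(GENOME_LEN, probes)
    print(len(probes), min(cov), max(cov))
```

For this single reference the naive grid gives 498 probes, slightly fewer than the 502 reported; the first and last few dozen bases are covered only once because no probe starts upstream of position 1 or extends past the 3′ end.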

MATERIALS AND METHODS

To evaluate sensitivity and specificity, we tested the enrichment probe set on a virus strain sample derived from cell culture, three nasopharyngeal swab samples collected from patients diagnosed with COVID‐19 (positive controls), and four throat swab samples taken from patients without COVID‐19 (negative controls). RNase‐free water served as the blank control. SARS‐CoV‐2 isolation and culture were described previously and followed the CDC guidelines and good laboratory practice for health and safety. Experiments were performed with the approval of the W96‐027B framework. RT‐PCR tests were performed on all samples following a previously described method, using RT‐PCR kits (Bioperfectus) officially approved by China's National Medical Products Administration. The Ct values for all samples are listed in Table 1. Notably, sample GDFS2020329 was only weakly positive by RT‐PCR: its Ct value (38.44) was close to the positivity cut‐off (40).
Table 1

Summary statistics of the enrichment libraries in this study

Sample ID | Sample type | RT‐PCR (Ct value) | Volume (µL) | PCR cycles of library amplification | Total_reads_raw | Total_reads_rmdup | Map_SARS‐CoV‐2_Reads | SARS‐CoV‐2_Endo_Ratio | Cluster_Factor | Mean_depth (×)
20200217A_1 | Virus strain | 33 | 9 | 15 | 5 841 044 | 5 036 902 | 4 813 734 | 0.95569 | 1.15965 | 15 070.33
20200217A_2 | Virus strain | 33 | 9 | 15 | 5 738 560 | 4 797 881 | 4 412 665 | 0.91971 | 1.19606 | 13 816.72
20200217A_3 | Virus strain | 33 | 7 | 15 | 6 125 219 | 5 150 484 | 4 638 866 | 0.90067 | 1.18925 | 14 526.37
20200217A_4 | Virus strain | 33 | 7 | 15 | 8 837 914 | 7 432 367 | 6 694 515 | 0.90072 | 1.18911 | 20 966.49
20200217A_5 | Virus strain | 33 | 6 | 17 | 170 189 070 | 150 142 532 | 145 007 261 | 0.96580 | 1.13352 | 454 007.04
20200217A_6 | Virus strain | 33 | 7 | 17 | 224 071 046 | 199 421 414 | 192 545 532 | 0.96552 | 1.12361 | 602 829.61
GDFS2020309 | Nasopharyngeal swab | 36.72 | 10 | 15 | 90 343 603 | 86 471 821 | 31 821 | 0.00037 | 1.04478 | 98.92
GDFS2020329 | Nasopharyngeal swab | 38.44 | 10 | 15 | 97 086 902 | 79 596 774 | 4750 | 0.00006 | 1.21973 | 14.76
GDFS2020336 | Nasopharyngeal swab | 32.36 | 10 | 15 | 97 608 772 | 64 347 810 | 756 973 | 0.01176 | 1.51689 | 2370.64
MGI056Z14D | Throat swab | - | 20 | 13 | 13 062 316 | 12 121 570 | 0 | 0 | 1.07761 | 0
MGI057Z15A | Throat swab | - | 20 | 13 | 18 048 427 | 15 685 239 | 0 | 0 | 1.15066 | 0
MGI066Z17B | Throat swab | - | 20 | 13 | 19 511 564 | 15 694 790 | 0 | 0 | 1.24319 | 0
MGI076Z19D | Throat swab | - | 20 | 13 | 36 170 265 | 31 850 222 | 0 | 0 | 1.13564 | 0
Blank control | RNase‐free water | - | 20 | 13 | 105 835 | 91 275 | 0 | 0 | 1.15952 | 0

Abbreviation: RT‐PCR, reverse‐transcriptase polymerase chain reaction.

We divided the total RNA of the SARS‐CoV‐2 virus strain (20SF014) into six aliquots, which were processed under slightly different experimental conditions (Table 1). The six virus strain samples, three positive samples, four negative samples, and one blank control were each reverse‐transcribed into complementary DNA, followed by second‐strand synthesis. From the resulting double‐stranded DNA, all libraries were constructed through DNA fragmentation, end repair, adaptor ligation, and PCR amplification. Library hybridization capture was then performed with the SARS‐CoV‐2 enrichment probe set. The enriched libraries were quality‐checked on an Agilent 2100 Bioanalyzer using the Agilent High Sensitivity DNA Kit, and equal amounts of the double‐stranded DNA libraries were pooled and converted into a single‐stranded circular DNA library by DNA denaturation and circularization. DNA nanoballs were generated from the single‐stranded circular DNA by rolling circle amplification, quantified with an Invitrogen Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Foster City, CA), loaded onto the flow cell, and sequenced in PE100 mode on the MGISEQ‐2000 platform (MGI, Shenzhen, China). A detailed experimental protocol, in Chinese and English, is provided in Supporting Information Doc S1.
Cutadapt (version 2.7) and Trimmomatic (version 0.38) were used to clip adaptors and trim low‐quality reads. After removal of adaptor, low‐quality, and low‐complexity reads, the high‐quality reads were first filtered against the human reference genome (hg38) using the Burrows‐Wheeler Aligner (BWA‐MEM).
The remaining non‐human reads were then aligned to the SARS‐CoV‐2 reference (MN908947.3, https://www.ncbi.nlm.nih.gov/nuccore/MN908947) using Bowtie 2 (version 2.3.4.1), and alignments were filtered by mapping quality (-q 30) with SAMtools (version 1.10). Variants were called with SAMtools and VarScan (version 2.3.9; parameters: --strand-filter 0 --min-avg-qual 30 --min-reads2 15 --min-coverage 15). Finally, a consensus sequence was generated for each sample with SAMtools and BCFtools (version 1.9) from the variants called above.
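The alignment and variant‐calling steps above can be sketched as a small Python driver that assembles the commands. This is a hypothetical reconstruction, not the authors' script: only the tool names, the -q 30 filter, and the VarScan thresholds come from the text, while the file and index names, the use of samtools fastq -f 12 to keep pairs with both mates unmapped to hg38, the separate VarScan mpileup2snp and mpileup2indel calls, and the VCF output option are assumptions. Adaptor trimming and the BCFtools consensus step are omitted because their parameters are not reported.

```python
import shlex
import subprocess
from typing import List, Optional, Tuple

# One pipeline step: the argument vector and, optionally, a file for stdout.
Step = Tuple[List[str], Optional[str]]

# Thresholds quoted in the Methods.
MIN_MAPQ = "30"
VARSCAN_OPTS = ["--strand-filter", "0", "--min-avg-qual", "30",
                "--min-reads2", "15", "--min-coverage", "15"]


def build_steps(sample: str, r1: str, r2: str,
                human_index: str = "hg38.fa",         # hypothetical BWA index prefix
                virus_index: str = "MN908947.3",      # hypothetical Bowtie 2 index prefix
                virus_fasta: str = "MN908947.3.fa",   # hypothetical reference FASTA
                varscan_jar: str = "VarScan.v2.3.9.jar") -> List[Step]:
    """Assemble host filtering, viral mapping and variant calling for one sample.

    r1/r2 are adaptor- and quality-trimmed paired FASTQ files.
    """
    s = sample
    return [
        # Align to the human genome with BWA-MEM (SAM is written to stdout).
        (["bwa", "mem", human_index, r1, r2], f"{s}.human.sam"),
        # Keep records whose read and mate are both unmapped (flags 4 + 8 = 12).
        (["samtools", "fastq", "-f", "12", "-1", f"{s}.nonhuman_1.fq",
          "-2", f"{s}.nonhuman_2.fq", f"{s}.human.sam"], None),
        # Align the non-human reads to MN908947.3 with Bowtie 2.
        (["bowtie2", "-x", virus_index, "-1", f"{s}.nonhuman_1.fq",
          "-2", f"{s}.nonhuman_2.fq", "-S", f"{s}.cov2.sam"], None),
        # Keep alignments with mapping quality >= 30, then coordinate-sort.
        (["samtools", "view", "-b", "-q", MIN_MAPQ, "-o", f"{s}.q30.bam",
          f"{s}.cov2.sam"], None),
        (["samtools", "sort", "-o", f"{s}.q30.sorted.bam", f"{s}.q30.bam"], None),
        # Pile up reads against the reference for VarScan.
        (["samtools", "mpileup", "-f", virus_fasta, f"{s}.q30.sorted.bam"],
         f"{s}.mpileup"),
        # Call SNVs and indels with the thresholds quoted in the text.
        (["java", "-jar", varscan_jar, "mpileup2snp", f"{s}.mpileup",
          *VARSCAN_OPTS, "--output-vcf", "1"], f"{s}.snv.vcf"),
        (["java", "-jar", varscan_jar, "mpileup2indel", f"{s}.mpileup",
          *VARSCAN_OPTS, "--output-vcf", "1"], f"{s}.indel.vcf"),
    ]


def run(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each step; return the command lines."""
    lines = []
    for argv, stdout_path in steps:
        line = shlex.join(argv) + (f" > {stdout_path}" if stdout_path else "")
        lines.append(line)
        if dry_run:
            print(line)
        elif stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    run(build_steps("GDFS2020336", "GDFS2020336_R1.fq.gz", "GDFS2020336_R2.fq.gz"))
```

The default dry run only prints the commands (shlex.join needs Python 3.8 or later); passing dry_run=False executes them and requires BWA, SAMtools, Bowtie 2, Java, and the VarScan jar to be installed with the indexes already built.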

RESULTS AND DISCUSSION

The summary statistics for each enrichment library are given in Table 1. For the virus strain libraries (libraries 1‐6) and the three positive samples, 4 797 881 to 199 421 414 and 64 347 810 to 86 471 821 unique reads were obtained, respectively; of these, 4 412 665 to 192 545 532 and 4750 to 756 973 reads, respectively, mapped to the SARS‐CoV‐2 reference sequence (MN908947.3). In the positive samples, the number of mapped reads reflects the viral RNA copy number, which is inversely related to the Ct value. For the four negative samples and the blank control, no reads mapped to the SARS‐CoV‐2 reference sequence. The fraction of SARS‐CoV‐2 endogenous reads in the virus strain enrichment libraries ranged from 90.07% to 96.58%, indicating a marked increase in reads mapping to SARS‐CoV‐2 compared with metagenomic sequencing. Library complexity was evaluated by the cluster factor, defined as "the number of raw reads divided by the number of reads after removing duplicates," for which 1 is the best value. Across all enrichment libraries the cluster factor ranged from 1.04 to 1.52, and it was below 1.5 in every library except GDFS2020336 (1.52). Notably, increasing the number of library‐amplification PCR cycles from 15 to 17 improved library quality. By merging the data from the six virus strain enrichment libraries, we obtained a total of 371 981 580 unique reads, of which 358 112 573 mapped to the SARS‐CoV‐2 reference. Using these unique SARS‐CoV‐2 fragments, we reconstructed six SARS‐CoV‐2 genomes from the virus strain sample, one per library (average mean depth 186 869×; lowest mean depth 13 816×). Only the merged sequence (mean depth 1 121 217×) was used for further analysis. For the three positive samples, we also reconstructed three SARS‐CoV‐2 genomes: GDFS2020309 (mean depth 98.92×; Ct 36.72), GDFS2020329 (14.76×; Ct 38.44), and GDFS2020336 (2370.64×; Ct 32.36).
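The two library metrics and the merged totals quoted above can be checked directly against Table 1. This sketch assumes that SARS‐CoV‐2_Endo_Ratio is the number of mapped reads divided by the number of deduplicated reads, which reproduces the tabulated values; the class name and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Library:
    sample_id: str
    raw_reads: int       # Total_reads_raw
    unique_reads: int    # Total_reads_rmdup
    mapped_reads: int    # Map_SARS-CoV-2_Reads
    mean_depth: float    # Mean_depth

    @property
    def cluster_factor(self) -> float:
        """Raw reads / reads left after duplicate removal (1 = no duplicates)."""
        return self.raw_reads / self.unique_reads

    @property
    def endo_ratio(self) -> float:
        """Assumed definition: SARS-CoV-2-mapped reads / deduplicated reads."""
        return self.mapped_reads / self.unique_reads


# Values copied from Table 1 (virus strain libraries 1-6).
VIRUS_STRAIN: List[Library] = [
    Library("20200217A_1", 5_841_044, 5_036_902, 4_813_734, 15_070.33),
    Library("20200217A_2", 5_738_560, 4_797_881, 4_412_665, 13_816.72),
    Library("20200217A_3", 6_125_219, 5_150_484, 4_638_866, 14_526.37),
    Library("20200217A_4", 8_837_914, 7_432_367, 6_694_515, 20_966.49),
    Library("20200217A_5", 170_189_070, 150_142_532, 145_007_261, 454_007.04),
    Library("20200217A_6", 224_071_046, 199_421_414, 192_545_532, 602_829.61),
]
GDFS2020336 = Library("GDFS2020336", 97_608_772, 64_347_810, 756_973, 2_370.64)


if __name__ == "__main__":
    for lib in VIRUS_STRAIN + [GDFS2020336]:
        print(f"{lib.sample_id}: endo={lib.endo_ratio:.5f} "
              f"cluster={lib.cluster_factor:.5f}")
    depths = [lib.mean_depth for lib in VIRUS_STRAIN]
    print(sum(lib.unique_reads for lib in VIRUS_STRAIN),   # merged unique reads
          sum(lib.mapped_reads for lib in VIRUS_STRAIN))   # merged mapped reads
    print(round(sum(depths)), round(sum(depths) / len(depths)), min(depths))
```

Because mean depth is aligned bases divided by genome length, the six per‐library mean depths add up to the merged depth (about 1 121 217×), and their average is about 186 869×; GDFS2020336 is the one library whose cluster factor exceeds 1.5.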
Finally, five variants were called from the merged virus strain data: one homozygous variant (the SNP T23569C) and four heterozygous variants (three SNPs, C4534T, A5522T, and C23525T, and one deletion, CT16779C). Among the three positive samples, GDFS2020309 had two homozygous variants (C23525T and CT27791C) and one heterozygous variant (T23569C); GDFS2020336 had two homozygous variants (C635T and C29303T); and GDFS2020329 had no variants. Heterozygosity has been reported in previous studies; we propose that this heterogeneity could be caused by mutations arising during viral replication or by co‐infection with multiple coronavirus strains. We collected variant information (GFF3 files) for high‐quality samples from the Genome Variation Map (ftp://download.big.ac.cn/GVM/Coronavirus/gff3/) on 22 March 2020. Following the 2019‐nCoV quality criteria of the National Genomics Data Center (2019nCoVR; https://bigd.big.ac.cn/ncov), we included 601 samples and 45 first‐ and second‐level SNVs (MAF > 0.01, outside dense variation regions; see https://bigd.big.ac.cn/ncov/variation/annotation) in the subsequent analysis. The raw variants in the GFF3 files were recoded into binary format as input for network analysis (Network version 5; www.fluxus-engineering.com; Table S1). Five clades could be identified and labeled, corresponding to the full‐genome tree provided by GISAID (Figure S1). In addition to three larger main clades (S: ORF8‐L84S, defined by the SNP at position 28144; G: S‐D614G, position 23403; and V: NS3‐G251V, position 26144), we defined a new clade, clade I: orf1ab‐V378I (segregating at position 1397). The haplotype of the reference genome (MN908947) lies in the central clade (yellow circle), and our samples (20200217A, GDFS2020309, GDFS2020329, and GDFS2020336) also fall in this clade.
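The recoding of variants into binary format can be sketched as follows. The sketch assumes the GFF3 records have already been parsed into one set of variant labels per sample (parsing is omitted because the file layout is not described here) and treats the MAF > 0.01 criterion as a filter on minor‐allele frequency across samples; the function names are hypothetical, and the output is a plain 0/1 matrix, not a Network input file.

```python
import re
from typing import Dict, List, Set, Tuple


def position(label: str) -> int:
    """Genome position embedded in a label such as 'A23403G' or 'CT27791C'."""
    return int(re.search(r"\d+", label).group())


def binary_matrix(calls: Dict[str, Set[str]],
                  min_maf: float = 0.01) -> Tuple[List[str], Dict[str, List[int]]]:
    """Recode per-sample variant sets into 0/1 rows (1 = sample carries variant).

    A variant is kept only if its minor-allele frequency across all samples
    is strictly greater than min_maf; kept variants are ordered by position.
    """
    n = len(calls)
    counts: Dict[str, int] = {}
    for variants in calls.values():
        for v in variants:
            counts[v] = counts.get(v, 0) + 1
    kept = sorted(
        (v for v, c in counts.items() if min(c / n, 1 - c / n) > min_maf),
        key=position,
    )
    rows = {sample: [int(v in variants) for v in kept]
            for sample, variants in calls.items()}
    return kept, rows


if __name__ == "__main__":
    # Toy input with clade-defining SNVs; real input would come from GVM files.
    calls = {
        "reference": set(),
        "s1": {"T28144C"},
        "s2": {"T28144C"},
        "s3": {"A23403G"},
        "s4": {"A23403G", "G26144T"},
    }
    kept, rows = binary_matrix(calls)
    print(" ".join(kept))
    for sample, row in rows.items():
        print(sample, "".join(map(str, row)))
```

Each row is one haplotype over the retained SNVs, which is the kind of presence/absence coding that network software expects; converting it to Network's own file format is left out.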
Figure 1A shows two peaks in sequencing depth, one over the 5′‐UTR (MN908947.3:1‐256) and another over the N region (MN908947.3:28274‐29533), which may be associated with high expression of these two regions during coronavirus replication. A reasonable explanation for the high depth over the 5′‐UTR is that the 5′‐UTR upstream of ORF1a is necessary for the discontinuous synthesis of subgenomic RNAs in betacoronaviruses and contains the cis‐acting sequences required for viral replication. Clinically, the N‐gene RT‐PCR assay has been found to be more sensitive than assays targeting other genes for SARS‐CoV‐2 detection, which is consistent with the high sequencing depth we observed over the N region. This may be explained by the structural composition of the coronavirus and by differences in how subgenomic mRNA expression is regulated in host cells. In Figure 1B, however, no such depth peaks were found over the 5′‐UTR or N region in the positive samples. A larger sample size is needed to evaluate this divergent expression pattern in the future.
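The depth peaks described above can be quantified by comparing each region's mean depth with the genome‐wide mean. This sketch assumes per‐position depths for MN908947.3 such as those printed by samtools depth -a (chromosome, 1‐based position, depth on each line); the region coordinates are the ones quoted in the text, while the function names and the ratio to genome‐wide mean depth are illustrative choices, not the authors' analysis.

```python
from typing import Dict, List, Sequence, Tuple

# 1-based, inclusive coordinates on MN908947.3 quoted in the text.
REGIONS: Dict[str, Tuple[int, int]] = {
    "5'-UTR": (1, 256),
    "N region": (28274, 29533),
}


def read_depth(path: str) -> List[int]:
    """Read 'chrom<TAB>pos<TAB>depth' lines with every position present, in order."""
    depth: List[int] = []
    with open(path) as handle:
        for line in handle:
            _chrom, _pos, value = line.split()[:3]
            depth.append(int(value))
    return depth


def region_mean(depth: Sequence[int], start: int, end: int) -> float:
    """Mean depth over 1-based inclusive [start, end]; depth[0] is position 1."""
    window = depth[start - 1:end]
    return sum(window) / len(window)


def relative_depth(depth: Sequence[int],
                   regions: Dict[str, Tuple[int, int]] = REGIONS) -> Dict[str, float]:
    """Ratio of each region's mean depth to the genome-wide mean depth."""
    genome_mean = sum(depth) / len(depth)
    if genome_mean == 0:
        raise ValueError("no reads mapped to the reference")
    return {name: region_mean(depth, s, e) / genome_mean
            for name, (s, e) in regions.items()}
```

A ratio well above 1 for a region, as in the virus strain sample, marks a depth peak; ratios near 1 would match the flatter profiles of the positive samples. The zero‐depth guard matters for the negative and blank libraries, which have no mapped reads.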
Figure 1

Sequencing depth statistics and transcripts per million (TPM) statistics. A, Sequencing depth of the virus strain sample, corresponding to SARS‐CoV‐2 genome reference (MN908947.3). B, Sequencing depths of three positive samples. C, TPM statistics of the positive and negative samples

Among the selected human housekeeping genes (GAPDH, PCBP1, EIF3L, POLR2A, EIF3A, TGOLN2, TCEB3, CDK12, and BTBD7), GAPDH showed a relatively high expression level, PCBP1, EIF3L, and POLR2A a moderate level, and the remaining genes a relatively low level. This expression pattern was clearly seen in all positive and negative samples (Figure 1C). Importantly, based on the transcripts per million (TPM) statistics, the positive samples GDFS2020336 (Ct 32.36), GDFS2020309 (Ct 36.72), and GDFS2020329 (Ct 38.44) showed high, moderate, and low viral expression levels (red bars), respectively, nearly equivalent to those of GAPDH, PCBP1, and BTBD7 (Figure 1C).
In the current study, we designed a set of SARS‐CoV‐2 enrichment probes based on the available SARS‐CoV‐2 sequences. To test the enrichment, we made six enrichment libraries from one cultured SARS‐CoV‐2 virus strain and seven enrichment libraries from three positive samples (including one weakly positive sample) and four negative samples, and sequenced them on the MGISEQ‐2000 platform. Overall, the SARS‐CoV‐2 enrichment probe set described here showed marked, SARS‐CoV‐2‐specific enrichment and should be a useful tool for the SARS‐CoV‐2 research community for detecting SARS‐CoV‐2 RNA at low abundance and for monitoring future mutations.
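The TPM comparison in Figure 1C can be outlined with the standard transcripts‐per‐million formula: divide each target's read count by its length in kilobases, then rescale so that the values in a sample sum to one million. The text does not state how reads were assigned to the SARS‐CoV‐2 genome and the housekeeping‐gene targets or which lengths were used, so the function name and all counts and lengths in the usage block are hypothetical placeholders.

```python
from typing import Dict


def tpm(counts: Dict[str, int], lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Transcripts per million for each target in one sample.

    counts: reads assigned to each target (virus genome or housekeeping gene)
    lengths_bp: length of each target in bases
    """
    # Reads per kilobase for each target.
    rpk = {name: counts[name] / (lengths_bp[name] / 1_000) for name in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("no reads assigned to any target")
    # Rescale so the sample's values sum to one million.
    return {name: value * 1_000_000 / total for name, value in rpk.items()}


if __name__ == "__main__":
    # Hypothetical counts and placeholder lengths, for illustration only.
    counts = {"SARS-CoV-2": 6_000, "GAPDH": 900, "BTBD7": 40}
    lengths = {"SARS-CoV-2": 29_903, "GAPDH": 1_500, "BTBD7": 4_000}
    for name, value in tpm(counts, lengths).items():
        print(f"{name}\t{value:.1f}")
```

Because every sample's TPM values sum to the same total, the viral value can be read against the housekeeping genes within a sample, which is the kind of comparison made with GAPDH, PCBP1, and BTBD7 above.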

CONFLICT OF INTERESTS

The authors declare that there are no conflicts of interest.

AUTHOR CONTRIBUTIONS

SQW and XZ were involved in designing the study and preparing the manuscript. HYZ, HZ, LRZ, ZL, LJL, XFP, WZ, JW, JYY, BL, and GYZ performed most of the experiments. SQW, CS, LXW, PXD, and XDX analyzed the data. CWK, FC, and XZ contributed to the critical revision of the manuscript. The corresponding authors were responsible for all aspects of the study and ensured that issues related to the accuracy or integrity of any part of the work were investigated and resolved. All authors reviewed and approved the final version of the manuscript.
REFERENCES

1.  The 2019 novel coronavirus resource.

Authors:  Wen-Ming Zhao; Shu-Hui Song; Mei-Li Chen; Dong Zou; Li-Na Ma; Ying-Ke Ma; Ru-Jiao Li; Li-Li Hao; Cui-Ping Li; Dong-Mei Tian; Bi-Xia Tang; Yan-Qing Wang; Jun-Wei Zhu; Huan-Xin Chen; Zhang Zhang; Yong-Biao Xue; Yi-Ming Bao
Journal:  Yi Chuan       Date:  2020-02-20

2.  Molecular Diagnosis of a Novel Coronavirus (2019-nCoV) Causing an Outbreak of Pneumonia.

Authors:  Daniel K W Chu; Yang Pan; Samuel M S Cheng; Kenrie P Y Hui; Pavithra Krishnan; Yingzhi Liu; Daisy Y M Ng; Carrie K C Wan; Peng Yang; Quanyi Wang; Malik Peiris; Leo L M Poon
Journal:  Clin Chem       Date:  2020-04-01       Impact factor: 8.327

3.  Virome Capture Sequencing Enables Sensitive Viral Diagnosis and Comprehensive Virome Analysis.

Authors:  Thomas Briese; Amit Kapoor; Nischay Mishra; Komal Jain; Arvind Kumar; Omar J Jabado; W Ian Lipkin
Journal:  MBio       Date:  2015-09-22       Impact factor: 7.867

4.  Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases.

Authors:  Tao Ai; Zhenlu Yang; Hongyan Hou; Chenao Zhan; Chong Chen; Wenzhi Lv; Qian Tao; Ziyong Sun; Liming Xia
Journal:  Radiology       Date:  2020-02-26       Impact factor: 11.105

5.  RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak.

Authors:  Liangjun Chen; Weiyong Liu; Qi Zhang; Ke Xu; Guangming Ye; Weichen Wu; Ziyong Sun; Fang Liu; Kailang Wu; Bo Zhong; Yi Mei; Wenxia Zhang; Yu Chen; Yirong Li; Mang Shi; Ke Lan; Yingle Liu
Journal:  Emerg Microbes Infect       Date:  2020-02-05       Impact factor: 7.163

6.  Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.

Authors:  Roujian Lu; Xiang Zhao; Juan Li; Peihua Niu; Bo Yang; Honglong Wu; Wenling Wang; Hao Song; Baoying Huang; Na Zhu; Yuhai Bi; Xuejun Ma; Faxian Zhan; Liang Wang; Tao Hu; Hong Zhou; Zhenhong Hu; Weimin Zhou; Li Zhao; Jing Chen; Yao Meng; Ji Wang; Yang Lin; Jianying Yuan; Zhihao Xie; Jinmin Ma; William J Liu; Dayan Wang; Wenbo Xu; Edward C Holmes; George F Gao; Guizhen Wu; Weijun Chen; Weifeng Shi; Wenjie Tan
Journal:  Lancet       Date:  2020-01-30       Impact factor: 79.321

7.  SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients.

Authors:  Lirong Zou; Feng Ruan; Mingxing Huang; Lijun Liang; Huitao Huang; Zhongsi Hong; Jianxiang Yu; Min Kang; Yingchao Song; Jinyu Xia; Qianfang Guo; Tie Song; Jianfeng He; Hui-Ling Yen; Malik Peiris; Jie Wu
Journal:  N Engl J Med       Date:  2020-02-19       Impact factor: 91.245

8.  Enrichment of sequencing targets from the human genome by solution hybridization.

Authors:  Ryan Tewhey; Masakazu Nakano; Xiaoyun Wang; Carlos Pabón-Peña; Barbara Novak; Angelica Giuffre; Eric Lin; Scott Happe; Doug N Roberts; Emily M LeProust; Eric J Topol; Olivier Harismendy; Kelly A Frazer
Journal:  Genome Biol       Date:  2009-10-16       Impact factor: 13.583

9.  High-coverage SARS-CoV-2 genome sequences acquired by target capture sequencing.

Authors:  Shaoqing Wen; Chang Sun; Huanying Zheng; Lingxiang Wang; Huan Zhang; Lirong Zou; Zhe Liu; Panxin Du; Xuding Xu; Lijun Liang; Xiaofang Peng; Wei Zhang; Jie Wu; Jiyuan Yang; Bo Lei; Guangyi Zeng; Changwen Ke; Fang Chen; Xiao Zhang
Journal:  J Med Virol       Date:  2020-06-19       Impact factor: 20.693
