Minfeng Xiao1,2, Xiaoqing Liu3, Jingkai Ji1,2,4, Min Li1,2,5, Jiandong Li1,2,5, Lin Yang6, Wanying Sun1,2,5, Peidi Ren1,2, Guifang Yang6, Jincun Zhao3,7, Tianzhu Liang1,2, Huahui Ren1, Tian Chen6, Huanzi Zhong1, Wenchen Song1,2, Yanqun Wang3, Ziqing Deng1,2, Yanping Zhao1,2, Zhihua Ou1,2, Daxi Wang1,2, Jielun Cai1, Xinyi Cheng1,2,8, Taiqing Feng6, Honglong Wu9, Yanping Gong9, Huanming Yang1,10, Jian Wang1,10, Xun Xu1,11, Shida Zhu1,12, Fang Chen1,6, Yanyan Zhang13, Weijun Chen14,15, Yimin Li16, Junhua Li17,18,19.
Abstract
BACKGROUND: COVID-19 (coronavirus disease 2019) has caused a major epidemic worldwide; however, much is yet to be known about the epidemiology and evolution of the virus, partly due to the scarcity of full-length SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) genomes reported. One reason is that the challenges underlying sequencing SARS-CoV-2 directly from clinical samples have not been completely tackled, i.e., sequencing samples with low viral load often results in insufficient viral reads for analyses.
Keywords: COVID-19; Emerging infectious diseases; Genomic surveillance; Hybrid capture; Metatranscriptomic sequencing; Multiplex PCR; Quasispecies; Virus evolution; iSNV
Year: 2020 PMID: 32605661 PMCID: PMC7325194 DOI: 10.1186/s13073-020-00751-4
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Metatranscriptomic sequencing data summary of eight SARS-CoV-2-positive clinical samples collected from Guangzhou in February 2020
| Sample ID | Sample type | Ct | No. of sequencing read pairs | No. of SARS-CoV-2 read pairs | Percentage of SARS-CoV-2 read pairs | Coverage (%) | Depth (×) |
|---|---|---|---|---|---|---|---|
| | Nasal swab | 18 | 1,547,648,648 | 85,316,930 | 5.513 | 100 | 113,021 |
| | Sputum | 21 | 1,578,573,142 | 7,489,563 | 0.474 | 99.96 | 12,734 |
| | Throat swab | 24 | 1,647,198,588 | 3,365,330 | 0.204 | 99.91 | 6508 |
| | Nasal swab | 26 | 1,609,367,415 | 7,275,402 | 0.452 | 99.92 | 12,758 |
| | Throat swab | 29 | 1,725,727,056 | 31,148 | 0.002 | 99.87 | 69 |
| | Sputum | 30 | 1,596,713,550 | 46,199 | 0.003 | 99.9 | 95 |
| | Sputum | 32 | 1,481,162,934 | 567,266 | 0.038 | 99.94 | 1133 |
| | Anal swab | 32 | 1,671,721,507 | 25,392 | 0.002 | 99.89 | 14 |
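The percentage column in the table above follows directly from the two read-pair counts. The sketch below recomputes it for the first row and also derives a reads-per-million (RPM) value. The function names are hypothetical, and the RPM definition used here (viral read pairs per million total read pairs) is an assumption, since this record does not spell out the normalization.

```python
def percent_viral(viral_pairs: int, total_pairs: int) -> float:
    """Share of sequenced read pairs assigned to SARS-CoV-2, in percent."""
    return 100.0 * viral_pairs / total_pairs


def reads_per_million(viral_pairs: int, total_pairs: int) -> float:
    """Viral read pairs per million total read pairs (assumed RPM definition)."""
    return 1_000_000 * viral_pairs / total_pairs


# First row of the table: nasal swab, Ct 18.
total_pairs = 1_547_648_648
viral_pairs = 85_316_930

print(round(percent_viral(viral_pairs, total_pairs), 3))  # matches the tabulated 5.513
print(round(reads_per_million(viral_pairs, total_pairs)))
```

Applied to the remaining rows, the same calculation shows how sharply the viral fraction falls as Ct rises, which is the low-viral-load problem the abstract describes.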
Fig. 1 The general workflow of multiple sequencing approaches adopted in this study. We employed a unique dual indexing (UDI) strategy and a DNB-based (DNA nanoball) PCR-free MPS platform to minimize index hopping and relevant sequencing errors [41–43]. a Amplicon-based enrichment: the UDI was integrated in the 2nd PCR. Navy, multiplex PCR primers. b Metatranscriptomic library preparations: the UDI was integrated in the adaptor ligation and universal PCR steps. c Library preparations and hybrid capture-based enrichment: the UDI was integrated in the adaptor ligation and pre-capture PCR steps. Ocher, ssDNA probes. Red and green lines represent adaptor sequences; green dots represent phosphate groups
Fig. 2 Overview of the study design. Eight clinical samples and serial dilutions of a cultured isolate were each subjected to direct metatranscriptomic library construction, amplicon-based enrichment, and hybrid capture-based enrichment. Libraries generated from each method were pooled separately. DNB, DNA nanoball. 14, GZMU0014; 16, GZMU0016; 30, GZMU0030; 31, GZMU0031; 42, GZMU0042; 44, GZMU0044; 47, GZMU0047; 48, GZMU0048. D0, undiluted sample of the cultured isolate; D1–D7, seven serially diluted samples of the cultured isolate, ranging from 1E+07 to 1E+01 genome copies per milliliter, in 10-fold dilution. “-”, negative controls prepared from nuclease-free water and human nucleic acids. PE100, paired-end 100-nt reads; SE400, single-end 400-nt reads
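The 10-fold dilution series named in this caption can be enumerated in a few lines. This is a minimal sketch with a hypothetical helper, assuming that D1 is the most concentrated dilution (1E+07 genome copies per milliliter) and D7 the most dilute (1E+01), which is how the caption's range reads.

```python
def dilution_series(start: float = 1e7, fold: int = 10, steps: int = 7) -> dict:
    """Map dilution labels D1..D<steps> to genome copies per milliliter.

    Assumes D1 holds the starting concentration and every later step
    is `fold` times more dilute than the previous one.
    """
    return {f"D{i + 1}": start / fold**i for i in range(steps)}


for label, copies_per_ml in dilution_series().items():
    print(f"{label}: {copies_per_ml:.0E} copies/mL")
```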
Fig. 3 Sequencing coverage and depth of the cultured isolate and eight clinical samples. a Amplicon sequencing coverage by sample (row) across the SARS-CoV-2 genome. Dark blue, sequencing depth ≥ 100×; heatmap (bottom) sums coverage across all samples. HNA, negative control prepared from human nucleic acids; water, negative control prepared from nuclease-free water. Green horizontal lines on heatmap, amplicon locations. Overlap regions between amplicons range from 59 to 209 bp. b–d Normalized coverage across viral genomes of the clinical samples across methods. e SARS-CoV-2-RPM sequence plotted against genome copies per milliliter for the cultured isolate. Three independent experiments were performed for amplicon sequencing. Dark blue, ~ 400 bp amplicon-based sequencing including human and lambda phage nucleic acid background; soft blue, ~ 200 bp amplicon-based sequencing; fluorescent cyan, ~ 400 bp amplicon-based sequencing excluding human and lambda phage nucleic acid background (NAB); red, capture sequencing; grey, meta sequencing. f SARS-CoV-2-RPM (reads per million) sequence plotted against qRT-PCR Ct value for the clinical samples. Dark blue, amplicon; red, capture; grey, meta. g Estimated minimum amount of bases required by each method for high-confidence downstream analyses. Dark blue, amplicon; red, capture
Fig. 4 Between-sample and within-sample variants of SARS-CoV-2 detected across methods. a SNVs detected between clinical samples against a reference genome (GISAID accession: EPI_ISL_402119) [27]. Alleles with ≥ 80% frequencies were called. *SNVs verified by Sanger sequencing. b Allele frequencies of the identified SNVs. Dark blue, amplicon; red, capture; grey, meta. Minor allele frequencies detected in serial dilutions of the cultured isolate (c) and clinical samples (d) across methods. Dark blue, amplicon vs meta; red, capture vs meta. Minor alleles are defined as those with ≥ 5% and < 50% frequencies. In addition to the general quality filter, iSNVs had to pass depth and strand bias filters as described in the “Methods” section
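The two frequency thresholds stated in this caption can be written as a small classifier. This is an illustrative sketch only: the function and label names are hypothetical, and the depth and strand bias filters mentioned in the caption are not implemented because their parameters are not given in this record.

```python
def classify_allele(freq: float) -> str:
    """Classify an allele by its frequency, given as a fraction in [0, 1].

    Thresholds follow the caption: alleles at >= 80% are called as SNVs,
    and minor alleles are those at >= 5% and < 50%. Anything else is left
    unclassified here. Depth and strand bias filters are NOT applied.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be a fraction between 0 and 1")
    if freq >= 0.80:
        return "snv"
    if 0.05 <= freq < 0.50:
        return "minor_allele"
    return "unclassified"


for f in (0.97, 0.30, 0.60, 0.01):
    print(f, classify_allele(f))
```

Frequencies between 50% and 80% fall outside both definitions in the caption, so the sketch reports them as unclassified rather than guessing at how they were handled.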
General characteristics of the three approaches employed in this study
| | Metatranscriptomic sequencing | Hybrid capture-based sequencing | Multiplex PCR amplicon-based sequencing |
|---|---|---|---|
| | Microbiome + human | Target genome | Target genome |
| | Y | Y | N |
| | Y | Y | N |
| | Y | Y | N |
| | 18 cycles | 18 + 18 cycles | 15 + 25 cycles |
| | 10.5 h | 20.5 h | 7.5 h |
| | | 120 nt × 506 | 40–60 nt × 2 × (113 + 14 + 10) |
| | 112.86 | 65.14 | 48.00 |
| | > 10 Gb | Mb | Mb |
| | High | Moderate | Low |
| | + | ++ | +++ |
| | +++ | ++ | +++ |
| | +++ | ++ | + |
a The price varies greatly with different sequencing output and in different regions