Yi Yan, Ke Wu, Jun Chen, Haizhou Liu, Yi Huang, Yong Zhang, Jin Xiong, Weipeng Quan, Xin Wu, Yu Liang, Kunlun He, Zhilong Jia, Depeng Wang, Di Liu, Hongping Wei, Jianjun Chen.
Abstract
Genome sequencing proved highly valuable in the early stages of the COVID-19 pandemic, for example in pathogen identification and preliminary tracing of the virus. However, rapid acquisition of SARS-CoV-2 genomes from clinical specimens is limited by the specimens' low viral nucleic acid load and complex nucleic acid background. To address this issue, we modified and evaluated an approach that combines SARS-CoV-2-specific amplicon amplification with the Oxford Nanopore PromethION platform. The workflow starts from a throat swab of a COVID-19 patient and combines reverse transcription PCR (RT-PCR) and multiplex amplification in a single step to shorten the experiment time, so that a high-quality SARS-CoV-2 genome can be obtained quickly and reliably within 24 h. A comprehensive evaluation of the method was conducted on 42 samples: sequencing quality correlated well with the viral load of the samples; high-quality SARS-CoV-2 genomes were obtained consistently from samples with Ct values up to 39.14; data yield was assessed for different Ct values, and the recommended sequencing time was 8 h for samples with a Ct value below 20; variation analysis indicated that the method can also detect both existing and emerging genomic mutations; and Illumina sequencing verified that ultra-deep sequencing can greatly reduce the high single-read error rate of Nanopore sequencing, bringing it as low as 0.4 per 10,000 bp. In summary, amplicon amplification enables the acquisition of high-quality SARS-CoV-2 genomes and is an effective method for accelerating the acquisition of genetic resources and tracking the genomic diversity of SARS-CoV-2.
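For scale, the reported error rate can be converted to a per-base value and multiplied by the length of the SARS-CoV-2 genome. This worked conversion assumes the 29,903-nt NC_045512.2 reference as the genome length (the abstract does not say which reference was used) and treats errors as evenly spread, which is a simplification:

$$
\frac{0.4\ \text{errors}}{10\,000\ \text{bp}} = 4\times10^{-5}\ \text{errors/bp},\qquad
29\,903\ \text{bp}\times 4\times10^{-5}\ \text{errors/bp}\approx 1.2\ \text{errors}
$$

In other words, an error rate of this magnitude corresponds to roughly one expected erroneous base across a genome-length stretch of sequence.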
Keywords: Amplicon; Genome; Nanopore sequencing; SARS-CoV-2
Year: 2021 PMID: 33851337 PMCID: PMC8043101 DOI: 10.1007/s12250-021-00378-8
Source DB: PubMed Journal: Virol Sin ISSN: 1995-820X Impact factor: 4.327
Fig. 1 Workflow and schematic overview of SARS-CoV-2 genome sequencing from throat swab samples using the amplicon-Nanopore technique.
Fig. 2 Performance of representative samples in amplicon-Nanopore sequencing. A Proportions of mapped and unmapped reads for 38 samples aligned to the reference genome. Each bar represents one sample; violet portions are reads mapped to the reference and dark-purple portions are unmapped reads. Samples are ordered as in Supplementary Table S1, excluding four low-quality samples. B Read length distribution. The inset shows the distribution of reads by theoretical amplicon length, and the main graph shows the observed read length distribution of each sample. C Overview of sequencing coverage and depth for 10 samples with different Ct values. The left side shows the amplicon locations together with the sequencing coverage and depth across each sample's genome; the right side shows the Ct value of each sample.
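A minimal Python sketch of how the per-sample metrics behind Fig. 2A and 2C (which are also the five quality indicators correlated with Ct value in Fig. 3A, B) could be computed. The function name, its inputs, and the definition of coverage as depth of at least 1 are assumptions for illustration, not the authors' pipeline; in practice, read counts and per-base depths would come from an aligner and a depth tool.

```python
from statistics import mean, median


def quality_indicators(read_lengths, mapped_reads, depth):
    """Summarize sequencing quality for one sample (hypothetical helper).

    read_lengths: lengths (bp) of all reads for the sample.
    mapped_reads: how many of those reads aligned to the reference.
    depth: per-position read depth across the reference genome
           (hypothetical input, e.g. parsed from a per-base depth table).
    """
    total_reads = len(read_lengths)
    if total_reads == 0 or not depth:
        raise ValueError("need at least one read and a non-empty depth profile")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must be between 0 and the number of reads")
    # Assumption: a position counts as covered if at least one read spans it.
    covered_positions = sum(1 for d in depth if d > 0)
    return {
        "data_volume_bp": sum(read_lengths),
        "mapping_rate": mapped_reads / total_reads,
        "coverage": covered_positions / len(depth),
        "mean_depth": mean(depth),
        "median_depth": median(depth),
    }
```

For example, `quality_indicators([400] * 10, 9, [0, 10, 20, 30])` gives a data volume of 4,000 bp, a mapping rate of 0.9, coverage of 0.75, and mean and median depths of 15.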
Fig. 3 Correlation between data quality indicators and Ct value, and assessment of data yield. A, B Correlation between Ct value and five sequencing quality indicators (total data volume, mapping rate, coverage, average sequencing depth, and median sequencing depth) in sequencing batches PAE36018 (A) and PAE38111 (B). C–G Genome coverage over sequencing time for samples in different Ct ranges: C, Ct values below 20 cycles (n = 1); D, Ct values between 20 and 25 (n = 8); E, Ct values between 25 and 30 (n = 12); F, Ct values between 30 and 35 (n = 14); G, Ct values above 35 (n = 3). Each panel shows the coverage of the data mapped to the reference genome at different sequencing time points: the black line shows the proportion of the genome not yet sequenced, the blue dotted line the proportion with sequencing depth greater than 10, the light-blue line the proportion with depth greater than 100, and the green line the proportion with depth greater than 1000. The gray vertical dotted line marks the time at which sequencing essentially reached saturation, and the proportions given in each panel are the values at that saturation time point.
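The curves in panels C–G can be derived from a per-position depth profile recomputed at each sequencing time point. A minimal sketch, assuming a hypothetical input of cumulative per-base depths keyed by sequencing hour; the function names are illustrative and this is not the authors' code:

```python
def depth_profile_summary(depth, thresholds=(10, 100, 1000)):
    """Fraction of the genome left unsequenced and exceeding each depth threshold."""
    n = len(depth)
    if n == 0:
        raise ValueError("empty depth profile")
    summary = {"unsequenced": sum(1 for d in depth if d == 0) / n}
    for t in thresholds:
        # Strictly greater than, matching "depth greater than 10/100/1000".
        summary[f"depth>{t}"] = sum(1 for d in depth if d > t) / n
    return summary


def coverage_over_time(depth_by_hour, thresholds=(10, 100, 1000)):
    """Apply depth_profile_summary to cumulative depth profiles keyed by hour (hypothetical input)."""
    return {hour: depth_profile_summary(depth_by_hour[hour], thresholds)
            for hour in sorted(depth_by_hour)}
```

Plotting each fraction against hour gives the four curves described above. The point where the curves flatten corresponds to saturation, although the exact criterion the authors used to define saturation is not stated in this record.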
Fig. 4 Nucleotide and amino acid variations in 38 sample genomes. The top track shows the positions of the open reading frames (ORFs) on the reference genome, and the middle track shows the amino acid variations in each sample's genome; blue triangles indicate synonymous mutations and red triangles indicate non-synonymous mutations. The bottom track shows the nucleotide composition at the mutation sites identified in the 38 samples, with adenine (A) in red, cytosine (C) in green, thymine (T) in blue, and guanine (G) in purple.
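The synonymous versus non-synonymous distinction in the middle track comes down to translating the reference and mutated codons and comparing the resulting amino acids. A minimal sketch using the standard genetic code; `classify_snv` and its inputs are hypothetical, it assumes the ORF sequence is supplied in frame from its first codon, and it does not handle the −1 ribosomal frameshift within ORF1ab:

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    """Translate one DNA codon to a one-letter amino acid ('*' = stop)."""
    codon = codon.upper()
    index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
    return AMINO_ACIDS[index]


def classify_snv(orf_seq, offset, alt_base):
    """Classify a single-nucleotide change inside an ORF (hypothetical helper).

    orf_seq: in-frame coding sequence starting at the first codon.
    offset: 0-based position of the change within orf_seq.
    Returns (label, amino-acid change such as "A2T").
    """
    orf_seq = orf_seq.upper()
    codon_start = offset - offset % 3
    ref_codon = orf_seq[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("offset falls in an incomplete trailing codon")
    pos_in_codon = offset % 3
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"
```

For a toy ORF `ATGGCT`, changing offset 5 to `C` turns GCT into GCC (both alanine) and is classified as synonymous, while changing offset 3 to `A` turns GCT into ACT and gives the non-synonymous change A2T.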
Fig. 5 Evaluation of Nanopore variant sites and mutated allele frequencies using MiSeq sequencing. A Comparison of SNP mutation frequencies obtained from MiSeq and Nanopore sequencing in the six samples with the most SNP sites. The y-axis shows the proportion of the major mutant base among all bases sequenced at the mutation site. Mutation types are shown in different colors and labeled to the right of each panel. B Correlation of mutated allele frequencies (MuAFs) for SNPs detected in viral genomes by Nanopore and Illumina sequencing. SNPs detected by Nanopore but not Illumina were considered false positives (FP; green), and SNPs detected by Illumina but not Nanopore were considered false negatives (FN; red).
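The two quantities in Fig. 5 can be sketched as follows: the panel A value (reads carrying the major mutant base divided by all bases sequenced at the site) and the FP/FN split in panel B, with Illumina calls treated as the reference set. This is a minimal illustration; the function names, the per-site base-count input, and the representation of a SNP as a (position, ref, alt) tuple are assumptions, and real inputs would come from a pileup or variant caller.

```python
def major_mutant_allele_frequency(base_counts, ref_base):
    """MuAF at one site: count of the most frequent non-reference base / all bases at the site.

    base_counts: e.g. {"A": 3, "C": 0, "G": 95, "T": 2} (hypothetical pileup summary).
    Returns (major mutant base, frequency), or (None, 0.0) if no non-reference base was seen.
    """
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no bases sequenced at this site")
    mutants = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    if not mutants:
        return None, 0.0
    major = max(mutants, key=mutants.get)
    return major, mutants[major] / total


def compare_snp_calls(nanopore_snps, illumina_snps):
    """Split SNP calls into shared, false-positive and false-negative sets, using Illumina as truth."""
    nanopore_snps, illumina_snps = set(nanopore_snps), set(illumina_snps)
    return {
        "shared": nanopore_snps & illumina_snps,
        "false_positive": nanopore_snps - illumina_snps,  # called by Nanopore only
        "false_negative": illumina_snps - nanopore_snps,  # called by Illumina only
    }
```

With the toy counts in the docstring and reference base `A`, the major mutant base is `G` with a MuAF of 0.95. Plotting the Nanopore MuAF against the Illumina MuAF for the shared SNPs gives the kind of scatter described for panel B.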