Abstract
Nanopore sequencing technology (NST) has become a rapid and cost-effective method for the diagnosis and epidemiological surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the coronavirus disease 2019 (COVID-19) pandemic. Compared with short-read sequencing platforms (e.g., Illumina's), nanopore long-read sequencing platforms effectively shorten the time required to complete the detection process. However, owing to the sequencing principle and data characteristics of NST, its sequencing data are less accurate, which limits monitoring and lineage analysis of SARS-CoV-2. In this study, we developed NanoCoV19, an analytical pipeline for rapid SARS-CoV-2 detection and lineage identification that integrates phylogenetic-tree and hotspot mutation analysis. This method can not only distinguish and trace the lineages within the alpha, beta, delta, gamma, lambda, and omicron variants of SARS-CoV-2 but is also rapid and efficient, completing the overall analysis within 1 h. We hope that NanoCoV19 can serve as an auxiliary tool for rapid subtyping and lineage analysis of SARS-CoV-2 and, more importantly, that it can promote further applications of NST in public health and safety plans similar to those formulated to address the COVID-19 outbreak.
Keywords: SARS-CoV-2; coronavirus disease 2019 (COVID-19); hotspot mutation; nanopore sequencing technology; phylogenetic tree
Year: 2022 PMID: 36186464 PMCID: PMC9520466 DOI: 10.3389/fgene.2022.1008792
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Schematic diagram showing the analytical principle of NanoCoV19. (A) Construction of a database of reference sequences and hotspot mutations. (B) Pipeline for lineage analysis of SARS-CoV-2 based on nanopore sequencing data.
FIGURE 2. Analytical results of simulated sequence data for 60 lineages. (A) The result of phylogenetic tree analysis (the red text represents simulated data). (B) The heatmap analysis of hotspot mutations.
FIGURE 3. Assembly accuracy affects phylogenetic tree analysis. (A) FlyE assembly results for simulated data with errors. (B) FlyE assembly results for simulated data without errors. (C) Trycycler assemblies and consensus results for simulated data with errors, combined with 10 high-quality assembly results. (D) Trycycler assemblies and consensus results for simulated data with errors, combined with 23 high-quality assembly results. (E) Structural problems in the assembled draft genomes produced outlier samples whose lineages could not be effectively distinguished.
Running time during each step of the five tests.
| Testing sample | | Alpha | Beta | Gamma | Lambda | Omicron |
|---|---|---|---|---|---|---|
| Compute resource | AMD EPYC 7542 32-core processor, 2 TB memory, 128 processors (16 processors/task) | | | | | |
| Data size | Read number | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| | Base number | 7,759,122 | 7,869,879 | 7,784,216 | 7,485,683 | 7,638,683 |
| | Read length N50 | 9,496 | 9,553 | 9,469 | 9,168 | 9,134 |
| Data analysis | Data preprocessing | 0:05:40 | 0:06:20 | 0:05:01 | 0:07:12 | 0:05:07 |
| | Assembly-FlyE | 0:01:57 | 0:02:01 | 0:02:01 | 0:01:57 | 0:01:56 |
| | Assembly-Canu | 0:02:08 | 0:02:07 | 0:02:06 | 0:01:58 | 0:02:01 |
| | Assembly-Wtdbg2 | 0:00:06 | 0:00:13 | 0:00:07 | 0:00:05 | 0:00:11 |
| | Assembly-raven | 0:00:03 | 0:00:02 | 0:00:03 | 0:00:02 | 0:00:03 |
| | Racon | 0:00:15 | 0:00:21 | 0:00:21 | 0:00:18 | 0:00:18 |
| | Pilon | 0:11:16 | 0:10:44 | 0:10:28 | 0:09:52 | 0:09:08 |
| | Trycycler | 0:00:38 | 0:00:38 | 0:00:43 | 0:00:41 | 0:00:40 |
| | Phylogenetic tree | 0:33:58 | 0:35:27 | 0:23:48 | 0:22:28 | 0:23:02 |
| | Variation calling | 0:00:07 | 0:00:06 | 0:00:11 | 0:00:06 | 0:00:07 |
| Total time | | 0:56:08 | 0:57:59 | 0:44:49 | 0:44:39 | 0:42:33 |
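
The arithmetic behind the "Total time" row can be checked with a short script. The sketch below assumes the total is simply the sum of the listed steps run one after another, which the record does not state explicitly. The names `STEPS`, `STEP_TIMES`, `REPORTED_TOTALS`, `parse_hms`, `total_time`, and `slowest_step` are illustrative and not part of NanoCoV19; the durations are copied from the running-time table.

```python
from datetime import timedelta

# Step names in table order (illustrative constant, not part of NanoCoV19).
STEPS = [
    "Data preprocessing", "Assembly-FlyE", "Assembly-Canu", "Assembly-Wtdbg2",
    "Assembly-raven", "Racon", "Pilon", "Trycycler", "Phylogenetic tree",
    "Variation calling",
]

# Per-step running times (h:mm:ss) copied from the table, one list per sample.
STEP_TIMES = {
    "Alpha": ["0:05:40", "0:01:57", "0:02:08", "0:00:06", "0:00:03",
              "0:00:15", "0:11:16", "0:00:38", "0:33:58", "0:00:07"],
    "Beta": ["0:06:20", "0:02:01", "0:02:07", "0:00:13", "0:00:02",
             "0:00:21", "0:10:44", "0:00:38", "0:35:27", "0:00:06"],
    "Gamma": ["0:05:01", "0:02:01", "0:02:06", "0:00:07", "0:00:03",
              "0:00:21", "0:10:28", "0:00:43", "0:23:48", "0:00:11"],
    "Lambda": ["0:07:12", "0:01:57", "0:01:58", "0:00:05", "0:00:02",
               "0:00:18", "0:09:52", "0:00:41", "0:22:28", "0:00:06"],
    "Omicron": ["0:05:07", "0:01:56", "0:02:01", "0:00:11", "0:00:03",
                "0:00:18", "0:09:08", "0:00:40", "0:23:02", "0:00:07"],
}

# "Total time" row as reported in the table.
REPORTED_TOTALS = {
    "Alpha": "0:56:08", "Beta": "0:57:59", "Gamma": "0:44:49",
    "Lambda": "0:44:39", "Omicron": "0:42:33",
}


def parse_hms(text):
    """Convert an 'h:mm:ss' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_time(times):
    """Sum the step durations, assuming the steps run sequentially."""
    return sum((parse_hms(t) for t in times), timedelta())


def slowest_step(times):
    """Return (step name, duration) for the longest-running step."""
    durations = [parse_hms(t) for t in times]
    index = max(range(len(durations)), key=durations.__getitem__)
    return STEPS[index], durations[index]


if __name__ == "__main__":
    for sample, times in STEP_TIMES.items():
        total = total_time(times)
        step, duration = slowest_step(times)
        matches = str(total) == REPORTED_TOTALS[sample]
        print(f"{sample}: total {total} (matches table: {matches}); "
              f"slowest step: {step} ({duration})")
```

Under the sequential-sum assumption, the computed sums agree with the reported totals for all five samples, and the phylogenetic tree step is the longest step in each test.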