I-Na Lu, Claude P Muller, Feng Q He.
Abstract
Next-generation sequencing (NGS) has revolutionized the scale and depth of biomedical sciences. Because of its unique ability to detect sub-clonal variants within genetically diverse populations, NGS has been successfully applied to analyze and quantify the exceptionally high diversity within viral quasispecies, and many low-frequency drug- or vaccine-resistant mutations of therapeutic importance have been discovered. Although many works have intensively discussed the latest NGS approaches and applications in general, none of them has focused on applying NGS in viral quasispecies studies, mostly due to the limited ability of current NGS technologies to accurately detect and quantify rare viral variants. Here, we summarize several error-correction strategies that have been developed to enhance the detection accuracy of minority variants. We also discuss critical considerations for preparing a sequencing library from viral RNAs and for analyzing NGS data to unravel the mutational landscape.
Keywords: Consensus-based error correction; Next-generation sequencing (NGS); Quasispecies; RNA; Rare variants; Viruses
Year: 2020 PMID: 32278821 PMCID: PMC7144618 DOI: 10.1016/j.virusres.2020.197963
Source DB: PubMed Journal: Virus Res ISSN: 0168-1702 Impact factor: 3.303
Comparison of various NGS approaches in virus quasispecies analysis.
| Principle | Strengths | Weaknesses | Error frequency |
|---|---|---|---|
| | • Preservation of minor variant frequency | • Incapable of correcting reverse transcription polymerase chain reaction (RT-PCR) errors | 1.4 × 10⁻⁵ |
| | • Multiplexing possible | • Incapable of correcting PCR errors that occur during reverse transcription | 5 × 10⁻⁸ |
| | • No probe or primer design required | • A tendency towards G-to-A and C-to-T errors in the absence of uracil-DNA glycosylase and formamidopyrimidine-DNA glycosylase | 7.6 × 10⁻⁶ |
| | • Capability of extremely long-read sequencing (possible to identify multidrug-resistant variants in a single viral genome) | • High single-read error rates (about 1%–5%) | 3 × 10⁻² |
Fig. 1. Library preparation approaches of consensus-based error correction for investigating virus quasispecies. (a) Safe-SeqS uses primers linked to unique molecular identifiers (UIDs) and mouse identifiers (MIDs) for reverse transcription, which not only enables the recognition of every original viral RNA strand after PCR amplification but also allows multiplexing of samples in the same sequencing run. (b) DupSeq applies randomized duplex tags to each double-stranded DNA molecule so that the PCR products derived from the two strands can be related to each other yet remain distinguishable. A consensus wild-type or mutant sequence is called only if the reads from both strands show identical sequences. (c) CirSeq begins by circularizing single-stranded DNA fragments without any exogenous molecular barcodes, followed by rolling-circle amplification, fragmentation and sequencing. (d) INC-Seq also entails circularization of single-stranded DNA fragments followed by rolling-circle amplification of the loop; however, the end product is a long DNA strand (>10 kb) comprising concatenated copies of one strand of the starting molecule, which is sequenced on a long-read platform. For INC-Seq, only in-silico fragmentation is performed for analysis after sequencing. For CirSeq and INC-Seq, the random fragmentation points of the starting molecules serve as endogenous UIDs for consensus-based error correction. For all four methods, after library preparation, pooling and sequencing, sequences originating from the same viral RNA strand of the same sample are collapsed into a single consensus sequence. True mutations (pink circle) can be distinguished from PCR errors (purple star). Due to limited space, sequencing errors are not marked here.
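The collapsing step described in this caption can be illustrated with a minimal Python sketch. It is not the published Safe-SeqS, DupSeq, CirSeq or INC-Seq pipeline: the function name `collapse_families`, the `(sample_id, uid, sequence)` input layout, and the `min_family_size` and `min_agreement` thresholds are all assumptions made for illustration. Reads are also assumed to be already trimmed and aligned to the same start position.

```python
from collections import Counter, defaultdict


def collapse_families(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a sample identifier and UID into one consensus.

    reads: iterable of (sample_id, uid, sequence) tuples (hypothetical layout).
    min_family_size and min_agreement are illustrative thresholds, not
    values taken from the source.
    """
    # Group reads into families: same sample, same UID -> same original strand.
    families = defaultdict(list)
    for sample_id, uid, seq in reads:
        families[(sample_id, uid)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to out-vote a PCR or sequencing error
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            # Majority vote at each position; mask the base if support is weak.
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus


reads = [
    ("s1", "AAAC", "ACGT"),
    ("s1", "AAAC", "ACGT"),
    ("s1", "AAAC", "ACTT"),  # one read disagrees at position 3
    ("s1", "GGTA", "ACAT"),
    ("s1", "GGTA", "ACAT"),
    ("s1", "GGTA", "ACAT"),  # change present in every read of the family
    ("s2", "AAAC", "ACGT"),  # family of one read, dropped
]
result = collapse_families(reads, min_agreement=0.6)
```

In this toy input, the lone `T` in the first family is out-voted, whereas the `A` shared by every read of the second family survives into the consensus. This mirrors the distinction the caption draws between PCR errors and true mutations.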
Fig. 2. A general experimental and computational workflow for improving NGS data quality in virus quasispecies studies. For comparative studies or clinical samples, we start from different samples (1,2,i,…,n). The samples first pass through experimental steps, such as sample preparation, library preparation, library quality control, sample indexing, library pooling and sequencing. Computational steps then follow, such as data cleaning, consensus-based error correction, variant calling and annotation.
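As a rough illustration of the variant-calling step at the end of this workflow, the sketch below tallies per-position variant frequencies from error-corrected consensus sequences. It assumes the consensus sequences are already aligned to the reference without indels and that masked bases are written as `N`. The function name `variant_frequencies` and the `min_depth` parameter are hypothetical; a real analysis would use a dedicated aligner and variant caller.

```python
from collections import Counter


def variant_frequencies(reference, consensus_seqs, min_depth=1):
    """Return {(position, ref_base, alt_base): frequency} with 1-based positions.

    Assumes every consensus sequence starts at reference position 1 and
    contains no insertions or deletions (an illustrative simplification).
    """
    depth = Counter()  # informative consensus sequences covering each position
    alt = Counter()    # non-reference observations per (position, ref, alt)
    for seq in consensus_seqs:
        for pos, (ref_base, base) in enumerate(zip(reference, seq)):
            if base == "N":
                continue  # masked base carries no information
            depth[pos] += 1
            if base != ref_base:
                alt[(pos, ref_base, base)] += 1
    return {
        (pos + 1, ref_base, base): count / depth[pos]
        for (pos, ref_base, base), count in alt.items()
        if depth[pos] >= min_depth
    }


freqs = variant_frequencies("ACGT", ["ACGT", "ACAT", "ACGT", "ANGT"])
```

Because each consensus sequence stands for one original viral RNA strand, the frequency is counted per strand rather than per raw read, which is what keeps PCR duplicates from inflating a minority variant.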