Niko Beerenwinkel, Huldrych F Günthard, Volker Roth, Karin J Metzner.
Abstract
Many viruses, including the clinically relevant RNA viruses HIV (human immunodeficiency virus) and HCV (hepatitis C virus), exist in large populations and display high genetic heterogeneity within and between infected hosts. Assessing intra-patient viral genetic diversity is essential for understanding the evolutionary dynamics of viruses, for designing effective vaccines, and for the success of antiviral therapy. Next-generation sequencing (NGS) technologies allow the rapid and cost-effective acquisition of thousands to millions of short DNA sequences from a single sample. However, this approach entails several challenges in experimental design and computational data analysis. Here, we review the entire process of inferring viral diversity from sample collection to computing measures of genetic diversity. We discuss sample preparation, including reverse transcription and amplification, and the effect of experimental conditions on diversity estimates due to in vitro base substitutions, insertions, deletions, and recombination. The use of different NGS platforms and their sequencing error profiles are compared in the context of various applications of diversity estimation, ranging from the detection of single nucleotide variants (SNVs) to the reconstruction of whole-genome haplotypes. We describe the statistical and computational challenges arising from these technical artifacts, and we review existing approaches, including available software, for their solution. Finally, we discuss open problems, and highlight successful biomedical applications and potential future clinical use of NGS to estimate viral diversity.
Keywords: bioinformatics; error correction; haplotype inference; next-generation sequencing; quasispecies assembly; statistics; viral diversity; viral quasispecies
Year: 2012; PMID: 22973268; PMCID: PMC3438994; DOI: 10.3389/fmicb.2012.00329
Source DB: PubMed; Journal: Front Microbiol; ISSN: 1664-302X; Impact factor: 5.640
Figure 1. Flow chart of sample processing for next-generation sequencing (NGS) of virus samples.
Figure 2. Spatial scales of diversity estimation from NGS data. In this example, it is assumed that the true virus population (top of figure) consists of three haplotypes of relative frequencies 60% (A, blue), 30% (B, orange), and 10% (C, green). Segregating sites are indicated by arrows. Twenty short reads (labeled 1 through 20) are generated by NGS from the virus population subject to sequencing errors (indicated in magenta). Reads are displayed in a multiple sequence alignment (MSA) and in the color of their corresponding parental haplotype. Diversity estimation can be approached at single sites (SNV detection, solid-line rectangle), in windows of the MSA (local haplotype inference, dashed-line rectangle), or over the entire genomic region (global haplotype reconstruction, dotted-line rectangle).
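The single-site scale in Figure 2 can be illustrated with a minimal sketch of SNV calling. It assumes a simple binomial test: a non-reference base is called when its count in one alignment column would be unlikely if every such observation were a sequencing error. The function names, the per-base error rate `eps`, and the significance level `alpha` are illustrative assumptions, not values or methods taken from the review or from any of the tools it lists.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, eps):
    """Return P(X >= k) for X ~ Binomial(n, eps)."""
    return sum(comb(n, i) * eps**i * (1 - eps) ** (n - i) for i in range(k, n + 1))


def call_snv(column, reference_base, eps=0.01, alpha=0.05):
    """Test each non-reference base observed in one MSA column.

    column: string with the base each read shows at this site.
    eps and alpha are assumed values chosen for illustration only.
    """
    n = len(column)
    calls = {}
    for base, k in Counter(column).items():
        if base == reference_base:
            continue
        # Null hypothesis: all k observations of this base are errors.
        p = binomial_tail(k, n, eps)
        calls[base] = {"count": k, "p_value": p, "called": p < alpha}
    return calls


# Hypothetical column: 20 reads, 18 show the reference base A, 2 show G.
print(call_snv("A" * 18 + "GG", "A"))
```

A real analysis would also correct for testing many sites and would use platform-specific error rates; the sketch leaves both out.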
Figure 3. Local read clustering. The local window of the MSA displayed in Figure 2 is considered (dashed-line rectangle), with colors defined as in Figure 2. Reads that are more similar to each other than to other reads are grouped together, which recovers the three original haplotypes A, B, and C of Figure 2 as indicated by the three different colors. Each cluster center sequence is a predicted haplotype (bold, underlined) and the size of its corresponding cluster is an estimate of the frequency of the haplotype (here, 4/9, 3/9, and 2/9, respectively).
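As a rough illustration of the clustering idea in Figure 3, the sketch below groups reads from one window greedily by Hamming distance, reports the consensus of each group as a predicted haplotype, and uses the relative group size as its frequency estimate. The greedy scheme, the threshold `max_dist`, and the example reads are assumptions made for illustration; the tools listed in this record use more elaborate (for example, probabilistic) clustering.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1):
    """Greedy clustering of equal-length reads covering the same window.

    Distinct reads are visited from most to least abundant. A read joins
    the nearest existing center if it is within max_dist mismatches,
    otherwise it seeds a new cluster. max_dist is an assumed threshold.
    """
    counts = Counter(reads)
    centers, members = [], {}
    for seq, n in counts.most_common():
        best = min(centers, key=lambda c: hamming(seq, c), default=None)
        if best is not None and hamming(seq, best) <= max_dist:
            members[best].extend([seq] * n)
        else:
            centers.append(seq)
            members[seq] = [seq] * n
    result = []
    for center in centers:
        group = members[center]
        # Column-wise majority vote gives the predicted haplotype.
        consensus = "".join(Counter(col).most_common(1)[0][0] for col in zip(*group))
        result.append((consensus, len(group) / len(reads)))
    return result


# Hypothetical window with nine reads; two of them carry one error each.
window_reads = (
    ["ACGTAC"] * 3 + ["ACGAAC"]    # haplotype A plus one erroneous read
    + ["ATGTCC"] * 2 + ["ATGTCG"]  # haplotype B plus one erroneous read
    + ["GCGTAT"] * 2               # haplotype C
)
for haplotype, freq in cluster_reads(window_reads):
    print(haplotype, round(freq, 3))
```

With this toy input the three clusters have sizes 4, 3, and 2 out of 9 reads, mirroring the frequency estimates quoted in the caption.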
Figure 4. Read graph-based global haplotype reconstruction. Shown is the read graph for the first 15 reads of the MSA shown in Figure 2. Each read is represented by its index and colored according to its parental haplotype (A, blue, first row; B, orange, second row; and C, green, third row). Reads are connected by a directed edge if they agree on their non-empty overlap. Each path from the begin node to the end node represents a potential global haplotype, but there are more paths in the graph than the original three haplotypes the reads have been derived from.
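The observation that a read graph contains more begin-to-end paths than true haplotypes can be reproduced with a small sketch. Reads are represented as hypothetical `(start, sequence)` pairs, an edge joins two reads when the second starts later, extends further, and agrees with the first on their non-empty overlap, and paths are counted by dynamic programming. The representation and function names are assumptions for illustration, and the sketch does not remove transitive edges as a full implementation typically would.

```python
def agree_on_overlap(a, b):
    """True if read b overlaps read a and both show the same bases there."""
    (sa, xa), (sb, xb) = a, b
    overlap = min(sa + len(xa), sb + len(xb)) - sb
    return overlap > 0 and xa[sb - sa: sb - sa + overlap] == xb[:overlap]


def build_read_graph(reads):
    """Directed edges i -> j when read j starts after and extends read i."""
    edges = {i: [] for i in range(len(reads))}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            extends = a[0] < b[0] and a[0] + len(a[1]) < b[0] + len(b[1])
            if extends and agree_on_overlap(a, b):
                edges[i].append(j)
    return edges


def count_paths(reads, edges, genome_length):
    """Count paths from reads starting at 0 to reads ending at genome_length."""
    # Visit reads from right to left so every successor is finished first.
    order = sorted(range(len(reads)), key=lambda i: reads[i][0], reverse=True)
    paths_to_end = {}
    for i in order:
        start, seq = reads[i]
        n = 1 if start + len(seq) == genome_length else 0
        paths_to_end[i] = n + sum(paths_to_end[j] for j in edges[i])
    return sum(paths_to_end[i] for i, (start, _) in enumerate(reads) if start == 0)


# Two hypothetical haplotypes, AACCGGTT and TACCGGTA, differ only at the
# first and last position; no read spans both sites.
reads = [(0, "AACCG"), (0, "TACCG"), (3, "CGGTT"), (3, "CGGTA")]
edges = build_read_graph(reads)
print(count_paths(reads, edges, genome_length=8))
```

In this toy case every left read is compatible with every right read, so the graph holds four candidate paths although only two haplotypes generated the reads. Selecting a plausible subset of paths is the step where read graph-based methods differ.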
Figure 5. Probabilistic global haplotype reconstruction using a generative mixture model. Each of the three haplotypes colored as in Figure 2 (A, blue; B, orange; and C, green) is represented as a chain of probability tables over the four nucleotides, where darker shading of a base indicates higher probability. The probabilities of traversing from the begin node to one of the haplotypes serve as an estimate for the haplotype frequencies. Each read is regarded as an independent observation from this statistical model.
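A heavily simplified sketch of the mixture model in Figure 5 follows. It assumes the haplotype sequences are already known, builds one probability table per position with a uniform error rate `eps`, and estimates only the mixing proportions (the haplotype frequencies) by expectation-maximization. All names and parameter values are illustrative assumptions; the probabilistic tools listed in this record also infer the haplotypes themselves, which this sketch does not attempt.

```python
def make_tables(haplotype, eps=0.01):
    """One probability table over A, C, G, T per position of a haplotype."""
    return [{b: (1 - eps if b == h else eps / 3) for b in "ACGT"} for h in haplotype]


def read_likelihood(read, start, tables):
    """Probability of a read given one haplotype's chain of tables."""
    p = 1.0
    for offset, base in enumerate(read):
        p *= tables[start + offset][base]
    return p


def em_frequencies(reads, haplotype_tables, n_iter=50):
    """Estimate mixing proportions, treating reads as independent draws.

    reads: list of (start, sequence) pairs; haplotype tables stay fixed.
    """
    k = len(haplotype_tables)
    freqs = [1.0 / k] * k
    lik = [[read_likelihood(seq, start, t) for t in haplotype_tables]
           for start, seq in reads]
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in lik:
            # E-step: posterior probability that each haplotype emitted the read.
            weighted = [f * l for f, l in zip(freqs, row)]
            norm = sum(weighted)
            for h in range(k):
                totals[h] += weighted[h] / norm
        # M-step: new frequency is the average responsibility.
        freqs = [t / len(reads) for t in totals]
    return freqs


# Hypothetical haplotypes and ten full-length reads drawn 6 : 3 : 1.
haplotypes = ["ACGTAC", "ATGTCC", "GCGTAT"]
tables = [make_tables(h) for h in haplotypes]
reads = [(0, "ACGTAC")] * 6 + [(0, "ATGTCC")] * 3 + [(0, "GCGTAT")]
print([round(f, 3) for f in em_frequencies(reads, tables)])
```

The estimated proportions play the role of the begin-node transition probabilities described in the caption.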
Available software tools for viral quasispecies inference.
| Program | Method | Reference |
|---|---|---|
| QuRe | Read graph | Prosperi and Salemi, |
| ShoRAH | Read graph | Zagordi et al., |
| ViSpA | Read graph | Astrovskaya et al., |
| BIOA | Read graph | Mancuso et al., |
| Hapler | Read graph | O'Neil and Emrich, |
| AmpliconNoise | Probabilistic | Quince et al., |
| PredictHaplo | Probabilistic | Prabhakaran et al., |
| QuasiRecomb | Probabilistic | Zagordi et al., |
Applications of 454/Roche pyrosequencing and Illumina NGS technologies in clinical virology.
| Virus | Application | NGS platform | Sequencing strategy | Data analyzed | Reference |
|---|---|---|---|---|---|
| CMV | Epidemiology | 454/Roche | Amplicon-based | Reads | Gorzer et al., |
| CMV | Epidemiology | 454/Roche | Shotgun | Consensus sequence | Jung et al., |
| EBV | Epidemiology | Illumina | Shotgun | SNV, consensus sequence | Liu et al., |
| EBV | Epidemiology | Illumina | Shotgun (amplicons) | SNV | Kwok et al., |
| HBV | Drug resistance | 454/Roche | Amplicon-based | Reads, SNV | Solmone et al., |
| HBV | Drug resistance | 454/Roche | Amplicon-based | SNV | Margeridon-Thermet et al., |
| HBV | Drug resistance | Illumina | Shotgun | SNV | Nishijima et al., |
| HCV | Drug resistance | 454/Roche | Amplicon-based | Reads | Bolcic et al., |
| HCV | Drug resistance | Illumina | Shotgun (cDNA) | SNV | Hiraga et al., |
| HCV | Drug resistance | 454/Roche | Shotgun (amplicons) | SNV, consensus sequences | Lauck et al., |
| HCV | Drug resistance | Illumina | Paired-end (amplicons) | SNV | Nasu et al., |
| HCV | Drug resistance | 454/Roche | Amplicon-based | SNV | Powdrill et al., |
| HCV | Epidemiology | 454/Roche | Amplicon-based | Reads | Escobar-Gutiérrez et al., |
| HCV | Epidemiology | Illumina | Shotgun (cDNA) | SNV, consensus sequences | Ninomiya et al., |
| HIV | Drug resistance | 454/Roche | Amplicon-based | SNV | Hoffmann et al., |
| HIV | Drug resistance | 454/Roche | Amplicon-based | Reads, SNV | Hedskog et al., |
| HIV | Epidemiology | 454/Roche | Shotgun (amplicons) | Consensus sequence | Bruselles et al., |
| HIV | Epidemiology | 454/Roche | Amplicon-based | Consensus sequence | Eshleman et al., |
| HIV | Epidemiology | 454/Roche | Amplicon-based | Reads | Redd et al., |
| HIV | Tropism | 454/Roche | Amplicon-based | Reads | Archer et al., |
| Influenza A virus | Epidemiology | Illumina | Shotgun (amplicons) | SNV | Kuroda et al., |
| Influenza A virus | Epidemiology | 454/Roche | Shotgun (amplicons) | SNV | Bartolini et al., |
| Influenza A virus | Epidemiology | 454/Roche | Shotgun | Reads | Lorusso et al., |
| Norovirus | Epidemiology | 454/Roche | Shotgun (amplicons) | SNV, haplotype reconstruction | Bull et al., |
| Rhinovirus | Epidemiology | Illumina | Shotgun (amplicons) | SNV, consensus sequences | Tapparel et al., |
| Rotavirus | Epidemiology | 454/Roche | Shotgun (cDNA) | Consensus sequences | Jere et al., |
| VZV | Epidemiology | 454/Roche | Shotgun (amplicons) | Consensus sequences | Zell et al., |
BAL, bronchoalveolar lavage; CMV, cytomegalovirus; EBV, Epstein-Barr virus; HBV, hepatitis B virus; HCV, hepatitis C virus; HIV, human immunodeficiency virus; SNV, single nucleotide variant; VZV, varicella zoster virus.