Emma Thomson1, Camilla L C Ip2, Anjna Badhan3, Mette T Christiansen4, Walt Adamson1, M Azim Ansari5, David Bibby3, Judith Breuer4, Anthony Brown5, Rory Bowden2, Josie Bryant4, David Bonsall5, Ana Da Silva Filipe1, Chris Hinds1, Emma Hudson5, Paul Klenerman5, Kieren Lythgow3, Jean L Mbisa3, John McLauchlan1, Richard Myers3, Paolo Piazza2, Sunando Roy4, Amy Trebes2, Vattipally B Sreenu1, Jeroen Witteveldt6, Eleanor Barnes7, Peter Simmonds8.
Abstract
Affordable next-generation sequencing (NGS) technologies for hepatitis C virus (HCV) may potentially identify both viral genotype and resistance genetic motifs in the era of directly acting antiviral (DAA) therapies. This study compared the ability of high-throughput NGS methods to generate full-length, deep, HCV sequence data sets and evaluated their utility for diagnostics and clinical assessment. NGS methods using (i) unselected HCV RNA (metagenomics), (ii) preenrichment of HCV RNA by probe capture, and (iii) HCV preamplification by PCR implemented in four United Kingdom centers were compared. Metrics of sequence coverage and depth, quasispecies diversity, and detection of DAA resistance-associated variants (RAVs), mixed HCV genotypes, and other coinfections were compared using a panel of samples with different viral loads, genotypes, and mixed HCV genotypes/subtypes [geno(sub)types]. Each NGS method generated near-complete genome sequences from more than 90% of samples. Enrichment methods and PCR preamplification generated greater sequence depth and were more effective for samples with low viral loads. All NGS methodologies accurately identified mixed HCV genotype infections. Consensus sequences generated by different NGS methods were generally concordant, and majority RAVs were consistently detected. However, methods differed in their ability to detect minor populations of RAVs. Metagenomic methods identified human pegivirus coinfections. NGS provided a rapid, inexpensive method for generating whole HCV genomes to define infecting genotypes, RAVs, comprehensive viral strain analysis, and quasispecies diversity. Enrichment methods are particularly suited for high-throughput analysis while providing the genotype and information on potential DAA resistance.
Year: 2016 PMID: 27385709 PMCID: PMC5035407 DOI: 10.1128/JCM.00330-16
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 11.677
Sequencing methods and analysis pipelines evaluated at each sequencing center in the United Kingdom
| Center | Method | Method code | Sequencing method | Analysis strategy |
|---|---|---|---|---|
| Oxford | Metagenomic | O-Meta | Illumina RNA-Seq of total plasma RNA | Bespoke bioinformatic pipeline to infer metagenomic, consensus, and subpopulation level information |
| Oxford | IDT | O-Capt | Genotype-specific HCV capture using IDT probes, followed by Illumina RNA-Seq | Bespoke bioinformatic pipeline to infer metagenomic, consensus, and subpopulation level information |
| Glasgow | Metagenomic | G-Meta | Illumina RNA-Seq of total plasma RNA | FastQC, Tanoti, |
| Glasgow | SureSelect | G-Ssel | Genotype-specific HCV capture using SureSelect DNA probes, followed by Illumina RNA-Seq | FastQC, Tanoti, in-house resistance mutation tools, |
| Glasgow | NimbleGen | G-Nimb | Genotype-specific HCV capture using NimbleGen RNA probes, followed by Illumina RNA-Seq | FastQC, Tanoti, in-house resistance mutation tools, |
| UCL | SureSelect | U-Capt | SureSelectXT Target Enrichment library preparation and hybridization and enrichment using custom designed RNA probes, followed by Illumina DNA-Seq | Genome mapping, |
| PHE | Pre-PCR | P-PCR | Genotype-specific nested PCR of 5 or 6 overlapping fragments, followed by Illumina sequencing. | Contig assembly by SPAdes 3.5.0. HCV contigs longer than 250-nt assembled and PCR fragments combined using Sequencher 5.0. Reads were remapped to assembled sequences using BWA 0.7.5. |
See reference 10.
V. Sreenu, G. Nikolov, S. Alotaibi, T. Abdelrahman, K. Brunker, R. Orton, T. Klymenko, G. Wilkie, and E. Thomson, submitted for publication.
Adapted from reference 17.
FIG 1 Relationship between viral loads and read counts for each method. (A to C) The total number of HCV-specific bases read from each sample (y axis, log scale) was compared with viral load separately for target enrichment (A), metagenomic libraries (B), and sequences preamplified by PCR (C), on a common x/y scale. Genotype 1 and non-genotype 1 samples are indicated according to the symbol key. The significance of the association between viral loads and read counts was calculated by Spearman's rank order correlation test; Spearman correlation coefficient (r) values and P values are provided in inset boxes. (D) Distribution of viral loads by method, with logarithmic mean values shown below the x axis. The box-and-whisker plots show the median values and the 67th and 95th percentiles.
FIG 2 Relationship between viral load and completeness of the HCV consensus sequence from each method. (A to C) The proportion of the whole genome sequenced was compared with viral loads separately for target enrichment (A), metagenomics (B), and sequence preamplified by PCR (C) (plotted on a common x/y scale). Sequence completeness was expressed as a percentage, assuming a genome length of 9,650 bases. Genotype 1 and non-genotype 1 samples are indicated according to the symbol key. The significance of the association between viral load and genome coverage was calculated by Spearman's rank order correlation test; values of r and P values are provided in inset boxes.
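The two calculations named in this legend can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the example values are hypothetical, only the genome length of 9,650 bases comes from the legend, and the P value reported in the figure is not computed here.

```python
GENOME_LENGTH = 9650  # genome length assumed in the legend


def completeness_percent(bases_covered, genome_length=GENOME_LENGTH):
    """Percentage of the genome represented in a consensus sequence."""
    return 100.0 * bases_covered / genome_length


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up example values, not data from the study.
viral_loads = [1e3, 5e3, 2e4, 1e5, 8e5]
bases_covered = [4825, 7720, 9167, 9650, 9650]
coverage = [completeness_percent(b) for b in bases_covered]
print(coverage)
print(spearman_r(viral_loads, coverage))
```

Working on ranks rather than raw values means the log-scaled viral loads and the bounded coverage percentages do not need to be transformed before the correlation is computed.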
FIG 3 Variability in read coverage across the HCV genome and divergence from a global consensus for each of the sequencing methods. (A to C) Mean read coverage across the HCV genome by different NGS methods. Mean coverage was calculated as the number of bases at each site as a proportion of total reads for the sequence (expected mean value of 0.00014); mean values were calculated from samples with >100,000 total reads. Genome positions were based on the H77 reference sequence. A genome diagram of HCV drawn to the same scale as the x axis is included below panels A to C. A plot of Z-scores is provided in Fig. S1 in the supplemental material. (D to F) Divergence between the global consensus and individual consensus sequences generated by different methods was calculated for a sliding window of 250 bases centered on every 30th base. Mean divergence values for each sequencing method at each site (expressed as proportional distance [p-distance]) were plotted for positions homologous to the H77 reference strain. Genomic features of the HCV genome are shown below panels D to F, with structural genes shown in red. A comparable plot of mean values for each genotype is shown in Fig. S3 in the supplemental material.
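The sliding-window divergence described for panels D to F can be sketched as below. This is an illustrative reading of the legend, not the authors' code: the function names are hypothetical, the inputs are assumed to be two already aligned sequences of equal coordinate system, and skipping gaps and N bases is an assumption the legend does not state. The window of 250 bases and the step of 30 bases are taken from the legend.

```python
def p_distance(seq_a, seq_b):
    """Proportion of compared sites that differ.

    Sites where either sequence has a gap or an N are skipped (an assumption).
    Returns None when no site can be compared.
    """
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else None


def sliding_p_distance(seq_a, seq_b, window=250, step=30):
    """Yield (center, p-distance) for windows centered on every `step`-th base."""
    length = min(len(seq_a), len(seq_b))
    half = window // 2
    for center in range(0, length, step):
        # Windows are truncated at the ends of the alignment.
        start = max(0, center - half)
        end = min(length, center + half)
        yield center, p_distance(seq_a[start:end], seq_b[start:end])


# Toy aligned sequences, not HCV data.
global_consensus = "A" * 60
method_consensus = "A" * 30 + "C" * 30
print(list(sliding_p_distance(global_consensus, method_consensus, window=20, step=30)))
```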
FIG 4 Comparison of the completeness of consensus sequences and their genetic relatedness to each other. Percentage sequence completeness for coding regions is given for each sample. Consensus sequences were assembled from the panel samples by each NGS method, used to define the HCV genotype, and compared with the genotype identified by a conventional genotyping assay (Genotype column). Samples have been ranked by viral load (VL-IU/ml column) (from highest to lowest). Assembled sequences that correspond to the global consensus are shown on a gray/white scale; those that differed by >5% in nucleotide sequence from each other were considered separate strains and are shown on a green scale. Sample sP799685 generated a diverse range of sequences by different NGS methods, and it was not possible to generate a global consensus sequence by combining sequences (red shading). NA, not available.
FIG 5 Assessment of viral diversity: sequence differences between the global consensus and majority sequences generated by each NGS method, and the association of HCV viral load with diversity. (A and B) Distribution of the numbers of nucleotide and amino acid differences, respectively (y axis, log scale), between the global consensus sequence and the individual majority-rule sequences generated by each NGS method (x axis). Sequences phylogenetically unrelated to the global consensus (shaded green in Fig. 4) or for which there was no global consensus (shaded red in Fig. 4) have been excluded from this analysis. Gray bars represent median values for the distribution. (C) Nonsynonymous/synonymous ratio of substitutions between each assembled sequence and the corresponding global consensus sequence. More-divergent sequences showing ≥5 differences (Diffs) from the global consensus are plotted with gray filled circles. (D) Distribution of nucleotide and amino acid differences between directly sequenced amplicons derived from the NS3 (positions 3288 to 5727) and NS5B (positions 7407 to 9366) regions of 12 samples from the evaluation panel and the corresponding regions from the global consensus obtained by NGS methods.
Error rates of representative sequencing methods for HCV genotype 1a and 2a transcripts
| Method | Transcript | Unresolved sites (>5%) | Shannon entropy at all sites | Shannon entropy at codon position 1 | Shannon entropy at codon position 2 | Shannon entropy at codon position 3 |
|---|---|---|---|---|---|---|
| O-Meta | 1a_AF011751 | 35 | 0.0158 | 0.0150 | 0.0157 | 0.0135 |
| O-Meta | 2a_AB047639 | 42 | 0.0129 | 0.0132 | 0.0143 | 0.0166 |
| O-Capt | 1a_AF011751 | 25 | 0.0079 | 0.0077 | 0.0075 | 0.0075 |
| O-Capt | 2a_AB047639 | 18 | 0.0065 | 0.0065 | 0.0029 | 0.0032 |
All methods had 100% accuracy for the sequence concordance of majority consensus sequence with the sequence of the clone.
Transcripts are shown by the HCV genotype first and the GenBank accession number.
Number of ambiguous sites (discordant reads forming >5% of total).
FIG 6 Mean Shannon entropy values of NGS-generated sequences and relationship with viral load. (A to C) Shannon entropy values for polymorphic sites inferred for NGS sequencing methods based on metagenomic libraries (A), target enrichment (B), and PCR preamplification (C). Viral loads are plotted on log scales. (D and E) Shannon entropy values at each codon position in the consensus sequences inferred by each sequencing method based on the whole genome (D) and the nonstructural regions (E).
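The per-site diversity measures used in the error-rate table and in this figure can be sketched as follows. This is a hedged illustration rather than the pipeline used in the study: the function names are hypothetical, the natural logarithm is an assumption (the log base is not stated here), and the codon-position helper assumes the first site supplied is codon position 1. The >5% cutoff for unresolved sites is the one given in the table footnote.

```python
import math


def shannon_entropy(base_counts):
    """Shannon entropy of the base frequencies at one site (natural log assumed)."""
    total = sum(base_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in base_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


def is_unresolved(base_counts, threshold=0.05):
    """True when reads discordant with the majority base form >5% of the total."""
    total = sum(base_counts.values())
    if total == 0:
        return False
    discordant = total - max(base_counts.values())
    return discordant / total > threshold


def mean_entropy_by_codon_position(site_counts):
    """Mean entropy at codon positions 1, 2, and 3 for in-frame sites."""
    sums = [0.0, 0.0, 0.0]
    sites = [0, 0, 0]
    for index, counts in enumerate(site_counts):
        position = index % 3
        sums[position] += shannon_entropy(counts)
        sites[position] += 1
    return [s / n if n else None for s, n in zip(sums, sites)]


# Toy read counts for three consecutive sites, not data from the study.
pileup = [{"A": 100}, {"C": 94, "T": 6}, {"G": 50, "A": 50}]
print([is_unresolved(site) for site in pileup])
print(mean_entropy_by_codon_position(pileup))
```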
FIG 7 Capacity of NGS to detect mixed-genotype/subtype samples. Observed ratios of NGS read counts between the component genotypes, genotype A (Gt A) and genotype B (Gt B) (y axis), compared to their input ratios (x axis), plotted on a log/log scale. The dotted line represents the expected position of data points if the assays were able to detect both input genotypes (genotypes A and B) with equal efficiency. Samples of mixed genotype of known ratio (the input ratio) were acquired from QCMD or through patient samples or in vitro transcripts of known genotype that were mixed in vitro (listed in Table S1B in the supplemental material).
FIG 8 Frequencies of RAVs in the study samples (untreated subjects). Frequencies of resistance-associated mutations in NS3 genes (A) and NS5A and NS5B genes (B) detected by different sequencing methods, shown on a gray or color background to indicate frequencies. Resistance mutations were present either as minor variants (around 1 to 10% of the population; shown by yellow background) or represented the predominant variant in the population (shown by red background). Frequency information from samples with <10 reads at a site was excluded, as were polymorphisms found within a single sequence. Samples have been grouped by genotype.
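The frequency categories in this legend could be applied to read counts roughly as sketched below. This is an assumption-laden illustration, not the in-house resistance tools used by the centers: the function name, the "not detected" and "intermediate" labels, and the 50% cutoff for a predominant variant are hypothetical. Only the minimum of 10 reads per site and the approximate 1 to 10% range for minor variants come from the legend.

```python
MIN_READS = 10  # sites with fewer reads are excluded, as in the legend


def classify_rav(variant_reads, total_reads,
                 minor_low=0.01, minor_high=0.10, majority=0.50):
    """Classify a resistance-associated variant by its read frequency.

    minor_low and minor_high follow the legend's "around 1 to 10%";
    majority=0.50 is an assumed cutoff for the predominant variant.
    """
    if total_reads < MIN_READS:
        return "excluded"
    frequency = variant_reads / total_reads
    if frequency > majority:
        return "predominant"
    if minor_low <= frequency <= minor_high:
        return "minor"
    if frequency < minor_low:
        return "not detected"
    return "intermediate"  # between the minor range and the majority cutoff


# Toy read counts, not data from the study.
for variant, total in [(5, 9), (50, 1000), (900, 1000), (1, 1000)]:
    print(variant, total, classify_rav(variant, total))
```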