David Bonsall, Tanya Golubchik, Mariateresa de Cesare, Mohammed Limbada, Barry Kosloff, George MacIntyre-Cockett, Matthew Hall, Chris Wymant, M Azim Ansari, Lucie Abeler-Dörner, Ab Schaap, Anthony Brown, Eleanor Barnes, Estelle Piwowar-Manning, Susan Eshleman, Ethan Wilson, Lynda Emel, Richard Hayes, Sarah Fidler, Helen Ayles, Rory Bowden, Christophe Fraser.
Abstract
Viral genetic sequencing can be used to monitor the spread of HIV drug resistance, identify appropriate antiretroviral regimens, and characterize transmission dynamics. Despite decreasing costs, next-generation sequencing (NGS) is still prohibitively costly for routine use in generalized HIV epidemics in low- and middle-income countries. Here, we present veSEQ-HIV, a high-throughput, cost-effective NGS method and computational pipeline tailored specifically to HIV, which can be performed using leftover blood drawn for routine CD4 cell count testing. This method overcomes several major technical challenges that have prevented HIV sequencing from being used routinely in public health efforts; it is fast, robust, and cost-efficient, and generates full genomic sequences of diverse strains of HIV without bias. The complete veSEQ-HIV pipeline provides viral load estimates and quantitative summaries of drug resistance mutations; it also exploits information on within-host viral diversity to construct directed transmission networks. We evaluated the method's performance using 1,620 plasma samples collected from individuals attending 10 large urban clinics in Zambia as part of the HPTN 071-2 study (PopART Phylogenetics). Whole HIV genomes were recovered from 91% of samples with a viral load of >1,000 copies/ml. The cost of the assay (30 GBP per sample) compares favorably with existing VL and HIV genotyping tests, proving an affordable option for combining HIV clinical monitoring with molecular epidemiology and drug resistance surveillance in low-income settings.
Keywords: HIV; HPTN; HPTN 071; Illumina; NGS; PopART; RNA virus; SMARTer; antiretroviral resistance; antiretroviral therapy; bait capture; drug resistance; drug resistance evolution; gene sequencing; human immunodeficiency virus; phylogenetic analysis; phylogenetics; public health; short-read sequencing; sub-Saharan Africa; surveillance studies; viral evolution; viral genomics; viral sequencing
Year: 2020 PMID: 32669382 PMCID: PMC7512176 DOI: 10.1128/JCM.00382-20
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948
FIG 1 veSEQ-HIV includes a sequencing protocol and bioinformatics pipeline, yielding information at the individual and population levels. (A) The veSEQ-HIV method was developed to provide multiple measurements from a single assay, including viral load, HIV genotype, drug resistance, and transmission inference. (B) Overview of veSEQ-HIV: a complete laboratory and computational pipeline for high-throughput sequencing. RNA extraction from plasma samples is carried out in a CL-3 certified laboratory, before transfer to a dedicated genomics facility for library preparation, bait capture, and finally sequencing. Raw sequencing data are preprocessed to remove host and contaminant RNA, and these computationally filtered reads together with their de novo-assembled contigs are used to determine the consensus genome and minority variant frequencies using shiver. QC metrics are then calculated, and the proportion of contaminant reads originating from other samples is estimated with Kallisto. Samples that yield a successful read mapping are then cleaned with phyloscanner to remove contaminant reads; the clean reads are used to infer transmission patterns with phyloscanner and to make drug resistance predictions with HIVdb and drmSEQ.
FIG 2 Viral load is calculated from the number of sequencing reads. (A) The data and linear regression model estimates for the viral load standards. The narrow shaded area is the 95% confidence interval for the regression curve, and the dashed lines are 95% prediction intervals for measurements. The mean squared error of prediction was 0.324 log10 copies/ml. (B) Distribution of independently measured clinical viral loads in a subset of 146 samples used to assess model performance. (C) Relationship between the clinical viral load and the sequence-derived viral load from the model shown in panel A for these 146 samples. (D) Frequency of sequence-derived viral load estimates for all 1,620 samples.
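The calibration idea in panel A can be sketched as a least-squares line fitted to the quantitative standards and then applied to a sample's read count. This is a minimal illustration, not the authors' model: the log-log form, the `standards` values, and the names `fit_line` and `predict_log10_vl` are all assumptions made for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical standards (NOT data from the study):
# (known viral load in copies/ml, deduplicated HIV read count)
standards = [(1e3, 40), (1e4, 400), (1e5, 4000), (1e6, 40000)]

# Assumption: both axes are log10-transformed before fitting.
log_reads = [math.log10(reads) for _, reads in standards]
log_vl = [math.log10(vl) for vl, _ in standards]
slope, intercept = fit_line(log_reads, log_vl)

def predict_log10_vl(read_count):
    """Sequence-derived viral load (log10 copies/ml) for one sample."""
    return intercept + slope * math.log10(read_count)

print(predict_log10_vl(400))
```

A real calibration would also report prediction intervals, as the caption describes; those are omitted here to keep the sketch short.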
FIG 3 veSEQ-HIV is both sensitive and specific. The figure shows the length of recovered HIV genome for all sequenced samples. We consider a position in the genome to be accurately determined when the read depth is at least five. The category “Other” consists of potential intersubtype recombinants. Quantitative standards (HXB2, subtype B) are included in all sequencing runs, but are not displayed in this analysis.
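The depth criterion in this caption can be expressed as a count of positions whose read depth is at least five. The sketch below is illustrative only: `covered_length` and `is_near_full_length` are hypothetical helpers, the per-position depth list is invented, and combining the depth-5 rule with the >8,000 bp near-full-genome cutoff is an assumption.

```python
def covered_length(depths, min_depth=5):
    """Number of genome positions with read depth >= min_depth."""
    return sum(1 for depth in depths if depth >= min_depth)

def is_near_full_length(depths, min_depth=5, min_length=8000):
    """True when more than min_length positions pass the depth threshold."""
    return covered_length(depths, min_depth) > min_length

# Invented per-position depths for one sample (not study data):
# 1,000 uncovered positions, 4,000 at depth 5, 4,500 at depth 20.
depths = [0] * 1000 + [5] * 4000 + [20] * 4500

print(covered_length(depths))           # positions at depth >= 5
print(covered_length(depths, 15))       # positions at depth >= 15
print(is_near_full_length(depths))
```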
Numbers of samples processed using the sequencing pipeline and near-full genomes obtained (>8,000 bp), stratified by sequence-derived HIV-1 viral load (VL)
| VL range (sequence derived, copies/ml) | Samples sequenced | Near-full-length genome |
|---|---|---|
| <10² | 126 | 0 |
| 10²–10³ | 68 | 0 |
| 10³–10⁴ | 220 | 93 |
| >10⁴ | 1,204 | 1,204 |
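As a worked check on the table, the recovery rate in each stratum is the number of near-full-length genomes divided by the number of samples sequenced. The short sketch below uses the counts exactly as tabulated; the variable names are illustrative, and reading the pooled figure for the two strata above 10^3 copies/ml as the 91% quoted in the abstract is an interpretation, not something the table states.

```python
# (VL stratum, samples sequenced, near-full-length genomes), copied from the table.
rows = [
    ("<10^2", 126, 0),
    ("10^2-10^3", 68, 0),
    ("10^3-10^4", 220, 93),
    (">10^4", 1204, 1204),
]

# Per-stratum recovery rate.
for label, sequenced, recovered in rows:
    print(f"{label}: {recovered / sequenced:.1%}")

# Pooled rate for samples with sequence-derived VL above 10^3 copies/ml.
above_1000 = rows[2:]
total_sequenced = sum(sequenced for _, sequenced, _ in above_1000)
total_recovered = sum(recovered for _, _, recovered in above_1000)
pooled_rate = total_recovered / total_sequenced
print(f"pooled (>10^3): {pooled_rate:.1%}")
```

With these counts the pooled rate is 1,297 of 1,424, which rounds to 91%.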
FIG 4 Sequencing success is influenced by viral load. (A) The length of the HIV genomes reconstructed by shiver software, from paired-end Illumina reads, stratified by log viral load, showed reproducible whole-genome coverage for samples with sequence-inferred viral loads of >4 log10 copies/ml and near-complete coverage for the majority of samples with VL between 3 and 4 log10 copies/ml. (B) The viral loads at which genome coverage exceeds 8 kb with minimum depth thresholds of 1 read, 5 reads, and 15 reads (after removal of PCR duplicates) are shown by the intercepts of curves fitted using a sigmoid function. (C) The median (thick lines) and 95th percentile range (ribbons) of read-depth observed across the genome are shown for all samples, grouped by sequence-derived viral load.
FIG 5 Within-host phylogenetic trees of Illumina reads spanning drug resistance sites in pol. Phyloscanner software performs ancestral state reconstructions of phylogenetic trees generated from Illumina reads in “windows” across the genome in order to identify pairs consisting of transmitters (T) and recipients (R). Phylogenetic trees of reads spanning drug resistance mutation sites in pol are shown for two inferred transmission pairs (A and B). Tree tips (circles) are colored by the combinations of drug resistance mutations observed for each unique taxon and scaled to total read counts within each taxon (after removal of PCR duplicates). Heatmaps report the predicted drug susceptibilities for each read using the Stanford HIVdb classification. Sequence-derived viral loads (log10 RNA copies/ml) and the complete list of resistance mutations with associated frequencies, observed across entire genomes, are shown for each individual. Mutations observed at frequencies below 5% are shown in parentheses.
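The labelling convention in the last sentence of this caption (minority variants below 5% in parentheses) is simple to reproduce. The sketch below is illustrative only: `format_mutations` is a hypothetical helper, and the example frequencies are invented rather than taken from the study.

```python
def format_mutations(frequencies, threshold=0.05):
    """Join mutation names, wrapping those below `threshold` in parentheses."""
    labels = []
    for name, freq in frequencies.items():
        labels.append(name if freq >= threshold else f"({name})")
    return ", ".join(labels)

# Invented within-host frequencies for one individual (not study data).
example = {"K103N": 0.92, "M184V": 0.03}
print(format_mutations(example))
```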