Gustavo H Kijak1,2, Eric Sanders-Buell1,2, Phuc Pham1,2, Elizabeth A Harbolick1,2, Celina Oropeza1,2, Anne Marie O'Sullivan1,2, Meera Bose1,2, Charmagne G Beckett3, Mark Milazzo1,2, Merlin L Robb1,2, Sheila A Peel1, Paul T Scott1, Nelson L Michael1, Adam W Armstrong4, Jerome H Kim1, David M Brett-Major5, Sodsai Tovanabutra1,2.
Abstract
The analysis of HIV-1 sequences has helped understand the viral molecular epidemiology, monitor the development of antiretroviral drug resistance, and design candidate vaccines. The introduction of single genome amplification (SGA) has been a major advancement in the field, allowing for the characterization of multiple sequences per patient while preserving linkage among polymorphisms in the same viral genome copy. Sequencing of SGA amplicons is performed by capillary Sanger sequencing, which presents low throughput, requires a high amount of template, and is highly sensitive to template/primer mismatching. In order to meet the increasing demand for HIV-1 SGA amplicon sequencing, we have developed a platform based on benchtop next-generation sequencing (NGS) (IonTorrent) accompanied by a bioinformatics pipeline capable of running on computer resources commonly available at research laboratories. During assay validation, the NGS-based sequencing of 10 HIV-1 env SGA amplicons was fully concordant with Sanger sequencing. The field test was conducted on plasma samples from 10 US Navy and Marine service members with recent HIV-1 infection (sampling interval: 2005-2010; plasma viral load: 5,884-194,984 copies/ml). The NGS analysis of 101 SGA amplicons (median: 10 amplicons/individual) showed within-individual viral sequence profiles expected in individuals at this disease stage, including individuals with highly homogeneous quasispecies, individuals with two highly homogeneous viral lineages, and individuals with heterogeneous viral populations. In a scalability assessment using the Ion Chef automated system, 41/43 tested env SGA amplicons (95%) multiplexed on a single Ion 318 chip showed consistent gene-wide coverage >50×. 
With lower sample requirements and higher throughput, this approach is suitable to support the increasing demand for high-quality and cost-effective HIV-1 sequences in fields such as molecular epidemiology and the development of preventive and therapeutic strategies.
Keywords: Bioinformatics; HIV-1; IonTorrent; Next-generation sequencing; Single genome amplification
Year: 2019 PMID: 30923677 PMCID: PMC6423504 DOI: 10.1016/j.bdq.2019.01.002
Source DB: PubMed Journal: Biomol Detect Quantif
Sample set used in the validation and field test of HIV-1 env SGA NGS.
| Patient | Plasma viral load (copies/ml) | Plasma viral load (log10 copies/ml) | SGA amplicons, validation (n) | SGA amplicons, field test (n) |
|---|---|---|---|---|
| A | 44,668 | 4.65 | | 11 |
| B | 169,824 | 5.23 | | 9 |
| C | 194,984 | 5.29 | | 12 |
| D | 5,884 | 3.77 | | 9 |
| E | 100,000 | 5.00 | | 11 |
| F | 85,114 | 4.93 | | 13 |
| G | 11,749 | 4.07 | | 13 |
| H | 47,863 | 4.68 | | 9 |
| I | 10,471 | 4.02 | | 8 |
| J | 154,882 | 5.19 | 10 | 6 |
Fig. 1. Next-generation sequencing of HIV-1 single genome amplicons. cDNA of HIV-1 env is titrated using serial dilution followed by nested PCR. The dilution that yields ≤30% positive reactions is used for downstream library preparation. Amplicons are subjected to enzymatic fragmentation, which is visualized on a Bioanalyzer. Gel electrophoresis is used for size selection (~400 bp), and ligation of barcodes and sequencing adapters allows dozens of samples to be multiplexed in a single emulsion PCR (ePCR) run. Libraries are then loaded on a 316 chip v2, and Ion sequencing is performed on a PGM instrument. The histogram shows the NGS read length distribution of a typical run. See text for details.
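The ≤30% cutoff can be motivated with a short calculation. The sketch below assumes that template molecules are distributed across PCR wells following a Poisson distribution (an assumption introduced here for illustration; the caption does not state it) and estimates the share of positive wells that would start from a single template. The function name `single_template_fraction` is hypothetical.

```python
import math


def single_template_fraction(positive_fraction: float) -> float:
    """Share of positive wells expected to hold exactly one template.

    Assumes templates are Poisson-distributed across wells (illustrative
    assumption, not taken from the source).
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must be strictly between 0 and 1")
    # P(well is negative) = exp(-lam)  =>  lam = -ln(1 - positive_fraction)
    lam = -math.log(1.0 - positive_fraction)
    # P(exactly one template in a well)
    p_one = lam * math.exp(-lam)
    # Condition on the well being positive
    return p_one / positive_fraction


print(round(single_template_fraction(0.30), 3))  # 0.832
```

Under this assumption, roughly 83% of positive reactions at a 30% positive rate would derive from one template, and that share rises as the template is diluted further.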
Fig. 2. Algorithm for deriving a consensus sequence from an HIV-1 single genome amplicon. Using the tango software, the closest published HIV-1 sequence that can serve as a reference (Reference#01) is obtained by applying BLAST to a random sample of NGS reads. The complete set of NGS reads is then aligned to Reference#01 using an implementation of BWA [45]. Afterwards, the consensus of the alignment (Reference#02) is derived by analyzing the frequency of nucleotide bases, insertions, and deletions in the SAM file using the Nautilus and SaGA software. Reference#02 is then used to guide a new iteration of NGS read alignment and consensus derivation, and the process is repeated until the “final consensus sequence” is obtained. See text for details.
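The iterate-until-stable idea in this caption can be illustrated with a toy sketch. It is not the tango, Nautilus, or SaGA implementation: the aligner is replaced by a naive ungapped placement (a stand-in for BWA), insertions and deletions are ignored, and all function names are hypothetical.

```python
from collections import Counter


def place_read(reference: str, read: str) -> int:
    """Toy stand-in for an aligner: ungapped offset with the fewest mismatches."""
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def call_consensus(reference: str, reads: list[str]) -> str:
    """Majority base per position; uncovered positions keep the reference base."""
    columns = [Counter() for _ in reference]
    for read in reads:
        offset = place_read(reference, read)
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return "".join(
        col.most_common(1)[0][0] if col else ref_base
        for col, ref_base in zip(columns, reference)
    )


def iterate_consensus(reference: str, reads: list[str], max_rounds: int = 10) -> str:
    """Re-place reads and re-call the consensus until it stops changing."""
    for _ in range(max_rounds):
        updated = call_consensus(reference, reads)
        if updated == reference:
            break
        reference = updated
    return reference


# Usage: reads sampled from a "true" sequence, starting from a reference
# that differs from it at one position.
true_seq = "ACGTTGCAAGCTTAGGCATC"
start_ref = "ACGTTGCAAGATTAGGCATC"
reads = [true_seq[i:i + 10] for i in range(11)]
print(iterate_consensus(start_ref, reads) == true_seq)
```

The stopping rule (consensus identical to the reference that guided the alignment) is one plausible reading of "until the final consensus sequence is obtained"; the source does not spell out the exact criterion.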
Fig. 3. Example of derivation of a single genome amplicon NGS consensus. A) At each alignment position (columns), the sequence of the reference (top) is compared with the frequency of nucleotide bases or gaps (rows) tallied from the SAM file (to ease visualization, frequencies are presented here as a heat map). Whenever the most frequent base/gap differs from the reference (red border), the sequence of the consensus is modified accordingly (black boxes). B) By analyzing the CIGAR field in the SAM alignment file, it is possible to tally the sequences from the NGS reads encoded as “I”, which correspond to “insertion to the reference” (i.e., bases present in the NGS reads that do not have a corresponding position in the reference). The plot depicts, at each position of the alignment (x-axis), the frequency of the insertions (y-axis). Data points are color-coded based on the length of the insertion. The arrows depict the sequences of three predominant insertions. Insertions above the operational threshold (dotted line at 50%) are followed up in downstream analysis, where C) the most common motif (in this case “TAA”) is inserted back into the new consensus sequence at the corresponding position.
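The insertion tally described in panel B can be sketched as follows. This is a minimal illustration that assumes alignments are supplied as `(pos, cigar, seq)` tuples taken from the POS, CIGAR, and SEQ fields of a SAM record, and that insertion frequency is computed relative to the read depth at the reference base preceding the insertion. The function name and the choice of denominator are assumptions, not the authors' code.

```python
import re
from collections import Counter, defaultdict

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def predominant_insertions(alignments, threshold=0.5):
    """Return {ref_pos: motif} for insertions seen in more than `threshold`
    of the reads covering ref_pos (the 1-based base just before the insertion)."""
    tallies = defaultdict(Counter)  # ref_pos -> Counter of inserted motifs
    depth = Counter()               # ref_pos -> reads with an aligned base there
    for pos, cigar, seq in alignments:
        ref_pos, read_idx = pos, 0
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op == "I":    # consumes read only: record the inserted bases
                tallies[ref_pos - 1][seq[read_idx:read_idx + n]] += 1
                read_idx += n
            elif op in "M=X":  # consumes read and reference
                for p in range(ref_pos, ref_pos + n):
                    depth[p] += 1
                ref_pos += n
                read_idx += n
            elif op in "DN":   # consumes reference only
                ref_pos += n
            elif op == "S":    # soft clip consumes read only
                read_idx += n
            # H and P consume neither read nor reference
    calls = {}
    for p, motifs in tallies.items():
        if depth[p] and sum(motifs.values()) / depth[p] > threshold:
            calls[p] = motifs.most_common(1)[0][0]
    return calls


# Usage with made-up reads: three of five carry "TAA" after reference base 4.
example = [
    (1, "4M3I4M", "ACGTTAAGGCC"),
    (1, "4M3I4M", "ACGTTAAGGCC"),
    (1, "8M", "ACGTGGCC"),
    (2, "3M3I2M", "CGTTAAGG"),
    (1, "2M1I6M", "ACAGTGGCC"),
]
print(predominant_insertions(example))
```

In the made-up data, the single-base insertion after position 2 appears in one of five reads and falls below the 50% line, so only the "TAA" motif would be carried into the new consensus.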
Fig. 4. NGS-based analysis of mixed bases. A) In amplicon #04 from participant “J”, 5/5 Sanger capillary sequencing chromatograms covering position 2516 (red border) show overlapping peaks for A and G, which would result in an “R” base call under the IUPAC nomenclature. The presence of “A” and “G” is also supported by the NGS reads (heat map at the bottom). B) NGS data show ~80% of reads supporting the A and 20% supporting the G, with no directional bias. Quantitation of NGS reads makes it possible to discriminate more consistently between events that should be base-called as single bases and those that should be called as mixed bases.
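A quantitative rule of this kind could look like the sketch below. The 15% minor-base cutoff is a placeholder chosen for illustration only; the source does not state the threshold used, and the function name `call_base` is hypothetical. Counts are expected for A, C, G, and T only.

```python
# IUPAC nucleotide ambiguity codes for sets of two or more bases.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_base(counts, minor_cutoff=0.15):
    """Call a single base or an IUPAC mixed base from per-base read counts.

    minor_cutoff is a hypothetical threshold, not a value from the source.
    """
    total = sum(counts.values())
    if total == 0:
        return "N"
    # Keep every base whose read fraction reaches the cutoff.
    kept = {base for base, n in counts.items() if n / total >= minor_cutoff}
    if not kept:
        # Cutoff too strict for this column: fall back to the most frequent base.
        kept = {max(counts, key=counts.get)}
    if len(kept) == 1:
        return next(iter(kept))
    return IUPAC[frozenset(kept)]


print(call_base({"A": 80, "G": 20}))  # R
print(call_base({"A": 98, "G": 2}))   # A
```

With the 80%/20% split reported for position 2516, this rule returns "R", matching the Sanger interpretation, while a 98%/2% split would be called as a plain "A".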
Fig. 5. Validation of NGS of HIV-1 single genome amplicons. Ten env amplicons from participant “J” were subjected to Sanger capillary sequencing and NGS. To ease visualization, a highlighter plot is shown, in which each sequence is compared to the consensus of the participant and differences from the consensus are denoted with color-coded tick marks using the LANL Highlighter convention (i.e., green = A, blue = C, orange = G, red = T, and gray = deletion) [13]. Each cognate pair shows identical patterns, indicating 100% concordance between the two techniques.
Fig. 6. Field test of NGS of HIV-1 single genome amplicons. A total of 101 env amplicons from 10 HIV-1-infected individuals were subjected to NGS. A) The phylogenetic tree shows separate clustering of sequences from each individual, supported by 100% bootstrap values. The 10 sequences from participant “J” that had been used in the assay validation are shown in grey. HIV-1 reference sequence HXB2 is depicted. B) Highlighter plots of sequences from each participant show different within-individual diversity profiles, ranging from highly homogeneous (e.g., participant “B”) to more diverse (e.g., participant “D”). Color-coding is as in Fig. 5. The 10 sequences from participant “J” that had been used in the assay validation are indicated by the grey bar. C) Pairwise sequence diversity within each participant.
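Within-participant pairwise diversity, as in panel C, can be computed in several ways. The sketch below uses the uncorrected p-distance (proportion of differing sites) on pre-aligned sequences and skips gapped positions. The caption does not specify the distance measure, so this choice and the function names are assumptions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def pairwise_diversity(sequences):
    """p-distance for every unordered pair of sequences from one participant."""
    return [p_distance(a, b) for a, b in combinations(sequences, 2)]


# Usage with made-up aligned sequences.
print(pairwise_diversity(["ACGTACGT", "ACGTACGA", "ACG-ACGT"]))
```

A highly homogeneous population would yield distances at or near zero for all pairs, whereas two distinct lineages would produce a bimodal set of distances.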
Fig. 7. Scale-up of NGS of HIV-1 single genome amplicons. Forty-three different env amplicons used in the assay validation were multiplexed using distinct barcodes and run together on a 318 chip v2 BC. The top panel shows the loading efficiency and run metrics. The bottom-left graph shows the per-base coverage for each sequence, each represented by a different line color, and the bottom-right graph summarizes the statistics of the 43 sequences. The dotted line indicates 50× coverage.
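The 50× criterion can be checked per barcode with a few lines. This sketch interprets "consistent gene-wide coverage" as every position having a depth above 50, which is an assumption, and uses hypothetical names and made-up depth values.

```python
def passes_coverage(per_base_depth, min_depth=50):
    """True if every position in the amplicon has depth above min_depth."""
    return len(per_base_depth) > 0 and min(per_base_depth) > min_depth


def summarize_run(run, min_depth=50):
    """run: {barcode: list of per-base depths}.
    Returns (sorted passing barcodes, fraction of barcodes passing)."""
    passed = sorted(
        name for name, depths in run.items() if passes_coverage(depths, min_depth)
    )
    return passed, len(passed) / len(run)


# Usage with made-up depths for three barcodes.
run = {"bc01": [120, 80, 60], "bc02": [200, 45, 90], "bc03": [51, 51, 51]}
print(summarize_run(run))
```

Applied to the reported result, 41 passing amplicons out of 43 corresponds to a passing fraction of about 0.95.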