
Next-generation sequencing of HIV-1 single genome amplicons.

Gustavo H Kijak^1,2, Eric Sanders-Buell^1,2, Phuc Pham^1,2, Elizabeth A Harbolick^1,2, Celina Oropeza^1,2, Anne Marie O'Sullivan^1,2, Meera Bose^1,2, Charmagne G Beckett^3, Mark Milazzo^1,2, Merlin L Robb^1,2, Sheila A Peel^1, Paul T Scott^1, Nelson L Michael^1, Adam W Armstrong^4, Jerome H Kim^1, David M Brett-Major^5, Sodsai Tovanabutra^1,2.

Abstract

The analysis of HIV-1 sequences has helped understand the viral molecular epidemiology, monitor the development of antiretroviral drug resistance, and design candidate vaccines. The introduction of single genome amplification (SGA) has been a major advancement in the field, allowing for the characterization of multiple sequences per patient while preserving linkage among polymorphisms in the same viral genome copy. Sequencing of SGA amplicons is performed by capillary Sanger sequencing, which presents low throughput, requires a high amount of template, and is highly sensitive to template/primer mismatching. In order to meet the increasing demand for HIV-1 SGA amplicon sequencing, we have developed a platform based on benchtop next-generation sequencing (NGS) (IonTorrent) accompanied by a bioinformatics pipeline capable of running on computer resources commonly available at research laboratories. During assay validation, the NGS-based sequencing of 10 HIV-1 env SGA amplicons was fully concordant with Sanger sequencing. The field test was conducted on plasma samples from 10 US Navy and Marine service members with recent HIV-1 infection (sampling interval: 2005-2010; plasma viral load: 5,884-194,984 copies/ml). The NGS analysis of 101 SGA amplicons (median: 10 amplicons/individual) showed within-individual viral sequence profiles expected in individuals at this disease stage, including individuals with highly homogeneous quasispecies, individuals with two highly homogeneous viral lineages, and individuals with heterogeneous viral populations. In a scalability assessment using the Ion Chef automated system, 41/43 tested env SGA amplicons (95%) multiplexed on a single Ion 318 chip showed consistent gene-wide coverage >50×. 
With lower sample requirements and higher throughput, this approach is suitable to support the increasing demand for high-quality and cost-effective HIV-1 sequences in fields such as molecular epidemiology, and development of preventive and therapeutic strategies.


Keywords: Bioinformatics; HIV-1; IonTorrent; Next-generation sequencing; Single genome amplification

Year: 2019 PMID: 30923677 PMCID: PMC6423504 DOI: 10.1016/j.bdq.2019.01.002

Source DB: PubMed Journal: Biomol Detect Quantif

Introduction

After more than three decades since the first identification of the Human Immunodeficiency Virus type 1 (HIV-1), the development of safe and effective preventive vaccines, antiretroviral treatments (ART), and cure strategies remain major public health priorities [1]. The study of viral sequences has been central to these efforts, spanning through different stages of product development. Molecular epidemiology analyses are widely employed to model viral evolution [2], inform immunogen selection and design [3], and monitor the circulation of ART resistance mutations [4]. In vaccine efficacy trials, the exploration of immune pressure signatures imprinted in viral genomes from breakthrough cases can help elucidate the mechanism of action [5]. Like other RNA viruses, HIV-1 populations are genetically diverse and behave as quasispecies (i.e., swarms of highly related but distinct viral sequences [6]), due to the high rates of viral replication, mutation, and recombination [7]. The plasticity of HIV-1 quasispecies allows them to infect different cell targets [8], escape host immunity [9], and resist inhibition by ART [10]. A major challenge for the study of this high level of genetic diversity is the limited capacity of bulk PCR/sequencing techniques to capture the complexity of the viral quasispecies [11]. The application of single genome amplification (SGA) of sub-genomic (i.e., 1.5–3 Kb) [12,13] and full-length genome HIV-1 (i.e., 9 Kb) [14,15] has been a major advancement in the field. This technique is based on serial dilution of a viral genome template (usually complementary DNA (cDNA) obtained from viral RNA (vRNA) by reverse transcription) followed by nested PCR of multiple (>10) replicates. Based on the Poisson distribution, the dilution that yields ≤30% positive reactions has an 80% probability of deriving from a single amplifiable template [12,13]. 
This approach allows for the study of multiple viral sequences per patient, preserving linkage among polymorphisms in the same viral genome copy, with limited impact from PCR-induced misincorporation/recombination or bacterial selection during cloning [14]. HIV-1 SGA amplicon-derived sequences published to date have been obtained almost exclusively by capillary sequencing based on the Sanger method (the single exception is the PacBio-based sequencing of pooled single genome amplicons by Dilernia et al. [16]). The Sanger sequencing technique provides reads of ~800 base pairs (bp) with low sequencing error, which allows for the straightforward generation of contigs de novo (i.e., without the need for a pre-existing reference sequence). This method is based on primer-directed sequencing [17], thus requiring prior knowledge of the target sequence. In the case of HIV-1, where inter-strain nucleotide sequence diversity can reach 20% [18], some sequencing reactions may fail due to mismatches between target and primer, and require the selection of a second set of sequencing primers to “fill in” the low-coverage areas in the contig. To achieve the desired level of bidirectional coverage (usually 4×), ~6 μg of PCR amplicon is used as substrate for the multiple dye-termination sequencing reactions. Recent years have seen an increase in demand for HIV-1 sequencing in large cohort studies [19–32]. For instance, the sieve analysis of the RV144 vaccine efficacy trial generated >1000 HIV-1 env gene SGA amplicon sequences from 121 patients [33]. Consequently, interest is increasing in the field for reliable, cost-effective, and scalable alternatives to capillary Sanger sequencing. Here we describe the development, validation, and field-testing of an alternative HIV-1 SGA amplicon sequencing platform based on next-generation sequencing (NGS).
Unlike capillary sequencing, which allows for a maximum of 96 parallel reactions, NGS allows for millions of parallel reactions [34]. While the high cost of first-generation NGS instruments limited their availability to sequencing core facilities, in the early 2010s Life Technologies and Illumina launched more affordable benchtop NGS sequencers (i.e., Ion Torrent PGM [35] and MiSeq [36], respectively), which have allowed for the wider adoption of NGS technologies in research laboratories [37]. Sequence reads obtained by benchtop NGS instruments are of shorter length and lower quality [38] than capillary sequences, thus requiring a large reading redundancy to mitigate sequencing errors [39]. Here we propose a strategy that is based on benchtop NGS, including an accompanying bioinformatics pipeline that can run on conventional desktops/laptops. Overall, our results demonstrate that this NGS strategy performs with comparable accuracy to capillary sequencing. Properly incorporated, the NGS platform can accommodate the increasing needs of HIV-1 SGA amplicon sequencing with its advantages in cost, scalability and ease of data analysis.

Material and methods

Population under study

As a part of proactive public health management, we undertook a characterization of the contemporary HIV epidemic in the US Navy and Marine Corps [40]. Health system and occupational data as well as reposed sera from all Sailors and Marines identified as HIV-infected over a five-year period ending in 2010 were included (n = 496 service members). In addition to exploring holistic relationships which might inform public health engagement to reduce service member HIV infection risk, a cluster analysis was performed through molecular methods [41]. Also, a sub-group of the cohort volunteered and participated in a risk survey [42]. For the current work, samples from 10 random participants were used, meeting the following criteria: 1) plasma viral load >5000 copies/ml, and 2) available sample volume >2.0 ml (which would allow future work on leftover specimens) (Table 1).

Table 1

Sample set used in the validation and field test of HIV-1 env SGA NGS.

Patient	Plasma viral load (copies/ml)	Plasma viral load (log10 copies/ml)	SGA amplicons, validation (n)	SGA amplicons, field test (n)
A	44,668	4.65		11
B	169,824	5.23		9
C	194,984	5.29		12
D	5,884	3.77		9
E	100,000	5.00		11
F	85,114	4.93		13
G	11,749	4.07		13
H	47,863	4.68		9
I	10,471	4.02		8
J	154,882	5.19	10	6


Single genome amplification of HIV-1 env

vRNA was extracted from plasma samples using the QIAamp Viral RNA Mini Kit (QIAGEN, Valencia, CA). Single genome amplicons of full-length HIV-1 env were retrieved from vRNA using reverse transcription (RT) followed by nested PCR as previously described [41]. Briefly, after RT, cDNA was titrated by nested PCR of HIV-1 env with 10 replicates per dilution. The dilution that provided 3/10 positive reactions was used to generate HIV-1 env single genome amplicons [12,13].
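The Poisson argument behind the endpoint dilution can be checked with a short calculation. The following is a minimal sketch, assuming templates are distributed among wells according to a Poisson distribution; the function name is illustrative and is not part of the published pipeline.

```python
import math


def single_template_probability(fraction_positive: float) -> float:
    """Probability that a positive reaction was seeded by exactly one template.

    Assumes the number of amplifiable templates per well follows a Poisson
    distribution whose mean is inferred from the fraction of positive wells.
    """
    if not 0.0 < fraction_positive < 1.0:
        raise ValueError("fraction_positive must be strictly between 0 and 1")
    # P(no template) = exp(-lam), so lam = -ln(1 - fraction_positive)
    lam = -math.log(1.0 - fraction_positive)
    p_exactly_one = lam * math.exp(-lam)
    p_at_least_one = 1.0 - math.exp(-lam)
    return p_exactly_one / p_at_least_one


# 3 positive reactions out of 10 replicates
print(f"{single_template_probability(3 / 10):.3f}")  # prints 0.832
```

At 3/10 positive reactions the estimate is about 83%, in line with the approximately 80% figure cited for dilutions yielding ≤30% positive reactions; more dilute endpoints give a higher single-template probability.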

Library preparation and next-generation sequencing

For each single genome amplicon, 100 ng of second round PCR product was enzymatically sheared to 400 bp followed by barcoding using the Ion Xpress Plus Fragment Library & Ion Xpress Barcode Adapters kits (LifeTechnologies, ThermoFisher Scientific). Quantification was performed using a 2100 Bioanalyzer (DNA 1000 kit, Agilent Technologies, Sunnyvale, CA). DNA size-selection was performed using Blue Pippin 2% dye free cassette with internal standard marker V1 (Sage Science). The size-selected product was equalized using the Ion Equalizer kit (LifeTechnologies, ThermoFisher Scientific) following manufacturer’s instructions. All purifications used Agencourt AMPure XP Reagent (Beckman Coulter). Emulsion PCR (ePCR) and enrichment for 400 bp sheared product used the Ion OneTouch 400 bp Template kit (LifeTechnologies, Carlsbad, CA) on the OneTouch and ES instruments. Sequencing was carried out using the Ion PGM 400 kit and Ion 316 chip v2 on the IonTorrent PGM platform (LifeTechnologies, Carlsbad, CA), following manufacturer’s instructions. For scalability experiments, the ePCR/enrichment of libraries derived from 43 different amplicons were carried out on the Ion Chef instrument with Ion PGM Hi-Q Chef kit (LifeTechnologies, Carlsbad, CA) and were sequenced on Ion 318 chips v2 BC.

Sanger sequencing

To validate the NGS-based platform, Sanger sequencing of HIV-1 env single genome amplicons was performed as previously described [41]. Contigs of Sanger reads were assembled with Sequencher 5.0 (Gene Codes Corporation, Ann Arbor, MI).

Data analysis

Quality control

FastQ files [43] were exported from the PGM using Torrent Suite 4.4 software (LifeTechnologies, ThermoFisher Scientific). Sequence quality control was performed using FastQC (courtesy of Dr. Simon Andrews, Babraham Institute, Cambridge, UK; URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). FastQ files [43] were then imported into CLC Genomics Workbench version 7.0.3 (Aarhus, Denmark) to remove sequencing adapters and trim sequences based on quality (limit = 0.05; maximum of 2 ambiguous nucleotides) and length (minimum length = 200 nucleotides), followed by barcode-based demultiplexing. We developed a pipeline, combining published and novel tools, to obtain the consensus sequence of single genome amplicons sequenced by NGS. The pipeline is composed of three modules. First, the newly developed tango was used to select a reference sequence to serve as seed for the initial reference-guided alignment. Then, the previously published Nautilus [44] and the newly developed SaGA were used for iterative determination of the consensus sequence. Finally, the newly developed ViKiNGS was employed for manual quality control and editing of the final consensus sequence. The description of these modules is presented in detail in the Results section, along with examples of their implementation. The software is accessible to interested users through the execution of individual software licenses with Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc. Please contact the corresponding authors.
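The length and ambiguity criteria applied during read filtering can be illustrated outside of CLC Genomics Workbench. The sketch below is a simplification under stated assumptions: it discards whole reads rather than trimming them, it does not reproduce the quality-based trimming (limit = 0.05), and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def passes_filters(seq: str, min_length: int = 200, max_ambiguous: int = 2) -> bool:
    """Keep reads of at least min_length with at most max_ambiguous non-ACGT bases."""
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return len(seq) >= min_length and ambiguous <= max_ambiguous


# Toy input: one good read, one short read, one read with three ambiguous bases
records = [
    "@r1", "A" * 250, "+", "I" * 250,
    "@r2", "A" * 150, "+", "I" * 150,
    "@r3", "NNN" + "A" * 250, "+", "I" * 253,
]
kept = [name for name, seq, _ in parse_fastq(records) if passes_filters(seq)]
print(kept)  # prints ['r1']
```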

Reference-guided alignment

Filtered reads were aligned to the reference using the Burrows-Wheeler Aligner (BWA) [45] implemented in tmap version 3.2.2 (by Nils Homer, distributed through https://github.com/iontorrent/TMAP) with the following parameters: command = map2; match score = 1; mismatch penalty = 3; gap open penalty = 5; gap extension penalty = 2; and soft-clip only the right portion of the read.

Statistical testing

Sequence alignments were generated using HIVAlign [46], and were manually edited using Geneious 3 (http://www.geneious.com) [47]. The HIV-1 subtype was determined using NCBI Genotyping tool (https://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi) and jumping profile Hidden Markov Model (jpHMM) tool [48] (http://jphmm.gobics.de). The inter-variant sequence diversity within a sample was visualized using Highlighter tool [13]. Phylogenetic analyses were conducted using MEGA6.06 [49] (www.megasoftware.net). Prism version 6.0e (GraphPad Software) and JMP10 (SAS Institute, Cary, NC) were used for summary statistical analyses.

Results and discussion

Model selection

Since its first description, the SGA method has been applied to the study of the three structural HIV-1 genes (i.e., gag [1.5 kilobases (Kb)], pol [3 Kb], and env [2.6 Kb]) [12,13,50], as well as half-length (~5 Kb) [51] and near full-length genomes (~9 Kb) [14,51]. HIV-1 env has been a major focus of SGA amplicon sequencing efforts, due to the importance of the encoded glycoprotein (Gp160) in viral tropism and antibody-mediated immune responses [5,13]. Among HIV-1 structural genes, env presents the highest level of genetic diversity, with nucleotide sequences differing by up to 20% [http://www.hiv.lanl.gov/]. Moreover, env presents marked length polymorphism both intra- and inter-patient [52]. Thus, in the current paper we have focused on the NGS of env SGA amplicons, as a “worst-case scenario” for difficulty of sequence alignment, and the current platform can be applied to other subgenomic regions or full-length HIV-1.

Summary of SGA amplicon library preparation

The preparation of HIV-1 env SGA amplicon NGS libraries is described in detail in the Material and methods section (Fig. 1). Briefly, HIV-1 env SGA amplicons were sheared to 400 bp and were subjected to adapter/barcode ligation. After emulsion PCR, the libraries were sequenced on an IonTorrent PGM 316 chip v2. Reads were exported from the instrument using the Torrent Server. After filtering low-quality and short reads (<200 bp), sequences were exported in FastQ files for alignment.

Fig. 1

Next-generation sequencing of HIV-1 single genome amplicons. cDNA of HIV-1 env is titrated using serial dilution followed by nested PCR. The dilution that yields ≤30% of positive reactions is used for downstream library preparation. Amplicons are subject to enzymatic fragmentation, which is visualized on a Bioanalyzer. Gel electrophoresis is used for size selection (~400 bp), and ligation of barcodes and sequencing adapters allows for multiplexing dozens of samples in a single emulsion PCR (ePCR) run. Libraries are then loaded on a 316 chip v2, and Ion sequencing is performed on a PGM instrument. The histogram shows the NGS read length distribution of a typical run. See text for details.

Alignment

In order to allow for the current bioinformatics pipeline to run on a computer with capabilities usually found in research laboratories, we selected reference-guided alignment over de novo alignment. The challenge with the former method is that it requires a reference sequence to which to align the reads. In the absence of an autologous (i.e., from the same patient) reference, we developed tango, an algorithm that uses a random sample of the NGS reads to select the closest sequence (“Reference#01” in Fig. 2), using BLAST [53], from a local database of published HIV-1 sequences, following the model proposed by Archer et al. [54]. All of the NGS reads were then aligned to Reference#01 using a BWA-based algorithm [45], resulting in a sam file: “Alignment#01”. Due to the vast genetic diversity of HIV-1, Reference#01 will unavoidably represent an “imperfect” reference, which can differ from the query reads in substitutions, deletions and insertions. We used Nautilus [44] to explore Alignment#01, tallying the frequency of each nucleotide base at each position of the alignment (Fig. 3a), and we used SaGA to derive a consensus sequence depicting the base present at ≥50% in each position (in the case where no base was present at ≥50%, the position was represented by an “N”). In the cases where the query reads predominantly presented a deletion, the position was represented in the consensus by a gap (i.e., “-”). The Concise Idiosyncratic Gapped Alignment Report (CIGAR) field in the sam file [55] encodes insertions in the query compared to the reference (the example in Fig. 3b shows the presence of three insertions in the query reads at 80%: one trinucleotide and two hexanucleotides). When insertions at a given position were present in ≥50% of the reads, the most frequent motif was inserted into the consensus (Fig. 3c).
The consensus sequence thus generated, which reconciled differences between Reference#01 and the query reads, was made into a new reference sequence (Reference#02) that guided the alignment of NGS reads in a new iteration (Fig. 2). This process was then repeated, until it resulted in no further improvement; the obtained sequence was the “final sequence”.
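The consensus-calling rule and the iterative refinement described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the Nautilus or SaGA code: the per-column counts are supplied directly, the callable `align_and_call` is a hypothetical stand-in for one round of alignment plus consensus derivation, and "no further improvement" is approximated as the consensus no longer changing.

```python
from collections import Counter
from typing import Callable, List


def call_consensus(columns: List[Counter], threshold: float = 0.5) -> str:
    """Call one symbol per alignment column.

    Each Counter maps 'A', 'C', 'G', 'T' or '-' (deletion) to the number of
    reads supporting it. The most frequent symbol is kept if its frequency is
    at least `threshold`; otherwise the position is reported as 'N'.
    """
    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth == 0:
            consensus.append("N")  # no coverage at this position
            continue
        symbol, support = counts.most_common(1)[0]
        consensus.append(symbol if support / depth >= threshold else "N")
    return "".join(consensus)


def iterate_reference(reference: str,
                      align_and_call: Callable[[str], str],
                      max_rounds: int = 10) -> str:
    """Re-align against each new consensus until it stops changing."""
    for _ in range(max_rounds):
        new_reference = align_and_call(reference)
        if new_reference == reference:
            break
        reference = new_reference
    return reference


columns = [
    Counter(A=9, G=1),           # clear majority base
    Counter({"-": 8, "T": 2}),   # predominant deletion
    Counter(A=4, C=3, G=3),      # no symbol reaches 50%
    Counter(),                   # uncovered position
]
print(call_consensus(columns))  # prints A-NN
```

In a full pipeline, gap symbols would presumably be removed from the consensus before it is used as the reference for the next round.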

Fig. 2

Algorithm for deriving a consensus sequence from an HIV-1 single genome amplicon. Using the tango software, the closest published HIV-1 sequence that can serve as a reference is obtained by applying BLAST on a random sample of NGS reads (Reference#01). The complete set of NGS reads is then aligned to Reference#01 using an implementation of BWA [45]. Afterwards, the consensus of the alignment (Reference#02) is derived by analyzing the frequency of nucleotide bases, insertions, and deletions in the sam file using Nautilus and SaGA software. Reference#02 is used to guide the alignment of NGS reads and consensus derivation, in a new iteration, until the “final consensus sequence” is obtained. See text for details.

Fig. 3

Example of derivation of single genome amplicon NGS consensus. A) At each alignment position (columns), the sequence of the reference (top) is compared with the frequency of nucleotide bases or gaps (rows) tallied based on the analysis of the sam file (to ease visualization, frequencies are here presented as a heat map). Whenever the most frequent base/gap differs from the reference (red border), the sequence of the consensus is modified accordingly (black boxes). B) By analyzing the CIGAR field in the sam alignment file it is possible to tally the sequences from the NGS reads encoded as “I”, which correspond to “insertion to the reference” (i.e., bases present in the NGS reads that do not have a corresponding position in the reference). The plot depicts, at each position of the alignment (x-axis), the frequency of the insertions (y-axis). Data points are color-coded based on the length of the insertion. The arrows depict the sequences of three predominant insertions. Insertions above the operational threshold (dotted line at 50%) are followed up in downstream analysis, where C) the most common motif (in this case “TAA”) is inserted back into the new consensus sequence in the corresponding position.
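The tallying of “I” operations from the CIGAR field can be sketched in a few lines. This is a minimal illustration, not the SaGA implementation: each alignment is given as the POS, CIGAR and SEQ fields of a sam record, the per-position read depth is supplied by the caller, and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_insertions(alignments: Iterable[Tuple[int, str, str]]) -> Dict[int, Counter]:
    """Count inserted motifs, keyed by the reference position they follow.

    alignments: (1-based leftmost reference position, CIGAR string, read sequence).
    """
    insertions: Dict[int, Counter] = defaultdict(Counter)
    for pos, cigar, seq in alignments:
        ref_pos = pos   # next reference base to be consumed
        read_pos = 0    # next read base to be consumed
        for length, op in CIGAR_OP.findall(cigar):
            n = int(length)
            if op == "I":          # consumes read only
                insertions[ref_pos - 1][seq[read_pos:read_pos + n]] += 1
                read_pos += n
            elif op in "M=X":      # consumes read and reference
                ref_pos += n
                read_pos += n
            elif op in "DN":       # consumes reference only
                ref_pos += n
            elif op == "S":        # soft clip: consumes read only
                read_pos += n
            # 'H' and 'P' consume neither read nor reference
    return dict(insertions)


def predominant_insertions(insertions: Dict[int, Counter],
                           depth: Dict[int, int],
                           threshold: float = 0.5) -> Dict[int, str]:
    """Return the most frequent motif wherever insertions reach the threshold."""
    selected = {}
    for pos, motifs in insertions.items():
        if sum(motifs.values()) / depth[pos] >= threshold:
            selected[pos] = motifs.most_common(1)[0][0]
    return selected


alignments = [
    (1, "4M3I4M", "ACGTTAAACGT"),
    (1, "4M3I4M", "ACGTTAAACGT"),
    (1, "8M", "ACGTACGT"),
    (3, "2M3I2M", "GTTAAAC"),
]
tally = tally_insertions(alignments)
print(predominant_insertions(tally, depth={4: 4}))  # prints {4: 'TAA'}
```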
In some instances, SGA amplicon sequences can present ≥1 mixed bases, due to the presence of >1 template in the original PCR reaction [14]. Also, Salazar-Gonzalez et al. have reported on the presence of mixed bases due to misincorporation by Taq polymerase during initial PCR cycles [14], emphasizing the importance of using high-fidelity polymerases. In capillary sequencing, mixed bases are evident as overlapping peaks in the chromatogram, but it is not possible to reliably estimate the frequency of the different variants. Using the current NGS platform, it is possible to quantify the number of forward and reverse reads supporting each variant (Fig. 4). This information is particularly useful when ruling out miscalls due to strand bias. We have designed a GUI, ViKiNGS, to support the quality control and manual editing of the “final sequences” (Suppl Fig. 1).

Fig. 4

NGS-based analysis of mixed bases. A) In amplicon #04 from participant “J”, 5/5 Sanger capillary sequencing chromatograms covering position 2516 (red border) show overlapping peaks for A and G, which would result in an “R” base call based on the IUPAC nomenclature. The presence of “A” and “G” is also supported in the NGS reads (heat map on the bottom). B) NGS data show ~80% of reads supporting the A and 20% supporting the G, with no directional bias. Quantitation of NGS reads makes it possible to discriminate, in a more consistent manner, between events that should be base called as single vs. mixed bases.
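The strand-aware quantification of mixed bases can be illustrated with a small sketch. The 10% minor-variant threshold and the requirement that a variant be supported on both strands are assumptions chosen for illustration; they are not operational values reported for ViKiNGS.

```python
from typing import Dict

# IUPAC codes for two-base mixtures
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert read counts per base into frequencies."""
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


def call_base(forward: Dict[str, int], reverse: Dict[str, int],
              minor_threshold: float = 0.1) -> str:
    """Call a single or mixed base from forward- and reverse-strand read counts.

    A base is retained only if its frequency reaches `minor_threshold` on BOTH
    strands, so a variant seen on one strand only is treated as strand bias.
    """
    fwd, rev = fractions(forward), fractions(reverse)
    supported = {
        base for base in "ACGT"
        if fwd.get(base, 0.0) >= minor_threshold
        and rev.get(base, 0.0) >= minor_threshold
    }
    if len(supported) == 1:
        return next(iter(supported))
    if len(supported) == 2:
        return IUPAC[frozenset(supported)]
    return "N"  # no base, or more than two bases, passed the filter


# ~80% A and ~20% G on both strands: a genuine mixture
print(call_base({"A": 80, "G": 20}, {"A": 78, "G": 22}))  # prints R
# G present on the forward strand only: treated as strand bias
print(call_base({"A": 80, "G": 20}, {"A": 100}))          # prints A
```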

Validation

In order to assess the accuracy of the NGS platform, we used SGA amplicons that had been sequenced with capillary Sanger sequencing. The validation set represented 10 HIV-1 env SGA amplicons from the same individual (participant “J”) with a median inter-variant genetic distance of 0.86% (range: 0.35–1.13%), distinguishable by substitutions and insertions/deletions. In 10/10 cases, the NGS results fully matched the capillary Sanger sequences (Fig. 5).

Fig. 5

Validation of NGS of HIV-1 single genome amplicons. Ten env amplicons from participant “J” were subjected to Sanger capillary sequencing and NGS. To ease visualization, the highlighter plot is shown, where each sequence was compared to the consensus of the participant, and differences from the consensus are denoted with color-coded tick marks using the LANL Highlighter convention (i.e., green = A, blue = C, orange = G, red = T, and gray = deletion) [13]. Each cognate pair shows identical patterns, indicating 100% concordance between the two techniques.

Field test

The applicability of the NGS platform was assessed in a field test of an additional 101 HIV-1 env SGA amplicons from 10 individuals with plasma viral loads ranging from 5,884 to 194,984 copies/ml (Table 1). This sample set included 6 additional SGA amplicons from the same individual used in the assay validation (participant “J”). All of the sequences were subtype B (Suppl Fig. 2) and the phylogenetic analysis supported the grouping of sequences from each individual in separate and distinct clusters (Fig. 6a). The observed profiles included individuals with highly homogeneous viral quasispecies (e.g., participant “B”), individuals with two highly homogeneous viral lineages (e.g., participant “C”), and individuals with heterogeneous viral populations (e.g., participant “G”) (Fig. 6b and c). Of note, 9/9 SGA amplicons from participant “H” presented the same premature stop codon in gp160 (codon 24). The capillary Sanger sequencing of the SGA amplicon showed an open reading frame, while the frame shift observed in NGS was due to a single-base deletion at codon 9. A detailed inspection of the NGS data showed that only a minority of the reads (13%) presented the open reading frame sequence (Suppl Fig. 3). The reason for the aberrant NGS sequence is unknown, but the presence of an unusually high GC-content in the vicinity of the affected region (GC = 75%, compared to genome-wide average GC = 43%) might have affected NGS reading.

Fig. 6

Field test of NGS of HIV-1 single genome amplicons. 101 env amplicons from 10 HIV-1 infected individuals were subjected to NGS. A) The phylogenetic tree shows separate clustering of sequences from each individual, supported by 100% bootstrap values. The 10 sequences from participant “J” that had been used in the assay validation are shown in grey. HIV-1 reference sequence HXB2 is depicted. B) Highlighter plots of sequences from each participant show different within-individual diversity profiles, ranging from highly homogeneous (e.g., participant “B”) to more diverse (e.g., participant “D”). Color-coding is as in Fig. 5. The 10 sequences from participant “J” that had been used in the assay validation are depicted by the grey bar. C) Pair-wise sequence diversity within each participant.
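Within-participant pair-wise diversity of the kind summarized in panel C can be approximated from an alignment as follows. This is a sketch that assumes an uncorrected p-distance with pairwise deletion of gapped or ambiguous positions; the distance model used for the published figure is not stated in this section, and the function names are illustrative.

```python
from itertools import combinations
from statistics import median
from typing import Dict, List


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        differences += a != b
    return differences / compared if compared else 0.0


def pairwise_diversity(aligned: Dict[str, str]) -> List[float]:
    """All pair-wise distances among the sequences of one participant."""
    return [p_distance(aligned[x], aligned[y])
            for x, y in combinations(sorted(aligned), 2)]


# Toy alignment of three amplicons from one participant
aligned = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACG-ACGTAT",
}
distances = pairwise_diversity(aligned)
print(f"median pair-wise diversity: {median(distances):.1%}")  # prints 10.0%
```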

Scalability

Next, we assessed the capacity of the NGS platform to scale up, in order to accommodate the increasing demand for SGA amplicon sequencing. The assay validation and field test had been run on the IonTorrent 316 chip v2, and in the following experiment we tested the 318 chip v2 BC, which provides ~1.8× the sequencing capacity of the 316 v2 chip. Also, to assist with ePCR/enrichment and chip loading, we employed the Ion Chef automated system. We ran 43 different HIV-1 env SGA amplicons, which had already been sequenced during the assay validation and field test. The run resulted in 5.9 million reads, of which 3.9 million (66.1%) had a length >300 bp. The number of reads per barcode ranged from 704 to 187,350 (median: 108,548 reads per barcode; inter-quartile range: 26,132-127,354.5). When the NGS reads were run through the genotyping pipeline, 41/43 (95%) of the SGA amplicons presented consistent gene-wide coverage >50× (the operational cut-off depth following guidelines by Pelak et al. [56]), with 37/43 (86%) of the SGA amplicons having a median coverage >500× (Fig. 7). Consensus sequences matched those obtained in the previous experiment using 316 chip v2.
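The coverage criterion can be expressed as a simple per-amplicon check. The following is a minimal sketch, assuming per-base depths have already been extracted from the alignment; the function name and the toy depth values are illustrative only.

```python
from statistics import median
from typing import Dict, Sequence


def coverage_summary(depths: Sequence[int], min_depth: int = 50) -> Dict[str, float]:
    """Summarize per-base coverage for one amplicon.

    'passes' is True only when every position exceeds `min_depth`
    (gene-wide coverage >50x in the text).
    """
    lowest = min(depths)
    return {
        "min": lowest,
        "median": median(depths),
        "passes": lowest > min_depth,
    }


# Toy per-base depths for two barcoded amplicons
amplicons = {
    "barcode_01": [60, 700, 800, 900],
    "barcode_02": [10, 600, 600],
}
summaries = {name: coverage_summary(d) for name, d in amplicons.items()}
n_pass = sum(s["passes"] for s in summaries.values())
print(f"{n_pass}/{len(summaries)} amplicons with gene-wide coverage >50x")
# prints 1/2 amplicons with gene-wide coverage >50x
```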

Fig. 7

Scale up of NGS of HIV-1 single genome amplicons. 43 different env amplicons used in the assay validation were multiplexed using distinct barcodes and run together on a 318 chip v2 BC. The top panel shows the loading efficiency and run metrics. The bottom-left graph shows the per-base coverage for each sequence, each represented by a different line color, and the bottom-right graph summarizes the statistics of the 43 sequences. The dotted line indicates 50× coverage.

Overall comparison of NGS- and Sanger-based SGA amplicon sequencing platforms

Motivated by the increase in the demand for HIV-1 SGA amplicon sequencing, we have developed an NGS-based SGA amplicon sequencing platform. Among the advantages of the NGS-based system are:

Lower DNA input: the gold standard Sanger sequencing platform requires ~6 μg of 2nd round PCR product to support bidirectional sequencing of HIV-1 env (at 4× coverage), which is usually achieved by running multiple PCRs in parallel. In contrast, the NGS-based platform has a much lower template requirement (~100 ng), which represents a substantial reduction in the cost of library preparation.

No need for HIV-1 sequencing primers: HIV-1 sequence diversity is a major challenge for Sanger-based sequencing. Mismatching between primers and target sequences usually results in gaps in the contigs, which require the selection of new sets of primers by a trained operator and the running of additional sequencing reactions to provide complete coverage. In contrast, the NGS-based platform does not depend on HIV-1 primers for sequencing; however, it does necessitate an HIV-1 sequence for reference-guided alignment. In the current bioinformatics pipeline, the query for a suitable reference capitalizes on the vast public sequence database (to date 53,666 HIV-1 env sequences have been deposited in the Los Alamos National Laboratory HIV database, URL: http://www.hiv.lanl.gov/, accessed 03JAN18).

Higher throughput: The large sequencing landscape within a chip, combined with the capacity to multiplex samples using distinguishable barcodes, lends the NGS platform the capacity to process dozens of different samples per run [56]. The throughput of this platform can be further increased by leveraging automation systems for library preparation and more efficient bioinformatics pipelines for sequence handling.

Platform independence: While the data presented in the current study were obtained using the IonTorrent platform, similar analyses could be performed on other benchtop NGS platforms (e.g., Illumina’s MiSeq). Moreover, the bioinformatics pipeline is implemented in Java, which allows for implementation on Windows, Linux, and Mac OS.

The per-base cost of NGS has been rapidly decreasing (Data from the NHGRI Genome Sequencing Program. URL: https://www.genome.gov/27541954/dna-sequencing-costs-data/ accessed on 11JAN19), making the current platform an affordable alternative to Sanger capillary sequencing.

Despite the abovementioned advantages of the NGS-based platform, it is important to consider that Sanger capillary sequencing is a mature technology, widely employed for more than 2 decades, which can substantially ease assay development and troubleshooting. As shown in the current field test, NGS-based systems can also be affected by artifacts. Moreover, the use of short-read NGS limits the capacity to assess the linkage/phasing of polymorphisms, which could be needed to distinguish whether multiple instances of mixed bases in the same sequence are due to multiple templates in the PCR or to misincorporation by Taq polymerase. As the new technology expands, it will be important to remain vigilant to systematic errors and to search for technical ways to mitigate them. Finally, a third generation of sequencing technologies has been developed, characterized by longer reads (>10 Kb) of single molecules (e.g., SMRT by Pacific Biosciences and MinION by Oxford Nanopore Technologies). Reports published thus far show promising results regarding the sequencing of complex HIV-1 quasispecies [16,57–59].

Conclusion

In the current study, we have demonstrated the applicability of benchtop NGS platforms for the sequencing of HIV-1 single genome amplicons. With lower sample requirements and higher throughput, this approach is suitable to support the increasing demand for high-quality HIV-1 sequences in fields such as molecular epidemiology, and development of preventive and therapeutic strategies.

Conflict of interests

The authors declare no conflict of interests. Gustavo Kijak is currently an employee of GSK Vaccines, Rockville, MD, USA.

Financial support

This project was funded in part by the U.S. Navy Bureau of Medicine and Surgery and the Military Infectious Diseases Research Program, project MIDRP-H014010OTPPOC. These studies were supported by a cooperative agreement (W81XWH-07-2-0067) between the Henry M. Jackson Foundation for the Advancement of Military Medicine and the Department of Defense.

58 in total

1. Nautilus: a bioinformatics package for the analysis of HIV type 1 targeted deep sequencing data.

Authors: Gustavo H Kijak; Phuc Pham; Eric Sanders-Buell; Elizabeth A Harbolick; Leigh Anne Eller; Merlin L Robb; Nelson L Michael; Jerome H Kim; Sodsai Tovanabutra
Journal: AIDS Res Hum Retroviruses Date: 2013-08-02 Impact factor: 2.205

2. Comparative performance of high-density oligonucleotide sequencing and dideoxynucleotide sequencing of HIV type 1 pol from clinical samples.

Authors: H F Günthard; J K Wong; C C Ignacio; D V Havlir; D D Richman
Journal: AIDS Res Hum Retroviruses Date: 1998-07-01 Impact factor: 2.205

Review 3. Resistance to reverse transcriptase inhibitors used in the treatment and prevention of HIV-1 infection.

Authors: Nicolas Sluis-Cremer; Mark A Wainberg; Raymond F Schinazi
Journal: Future Microbiol Date: 2015-10-30 Impact factor: 3.165

Review 4. Evolution of Host Target Cell Specificity During HIV-1 Infection.

Authors: Olivia D Council; Sarah B Joseph
Journal: Curr HIV Res Date: 2018 Impact factor: 1.581

5. Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.

Authors: Melissa Laird Smith; Ben Murrell; Kemal Eren; Caroline Ignacio; Elise Landais; Steven Weaver; Pham Phung; Colleen Ludka; Lance Hepler; Gemma Caballero; Tristan Pollner; Yan Guo; Douglas Richman; Pascal Poignard; Ellen E Paxinos; Sergei L Kosakovsky Pond; Davey M Smith
Journal: Virus Evol Date: 2016-07-08

6. Demographic processes affect HIV-1 evolution in primary infection before the onset of selective processes.

Authors: Joshua T Herbeck; Morgane Rolland; Yi Liu; Sherry McLaughlin; John McNevin; Hong Zhao; Kim Wong; Julia N Stoddard; Dana Raugi; Stephanie Sorensen; Indira Genowati; Brian Birditt; Angela McKay; Kurt Diem; Brandon S Maust; Wenjie Deng; Ann C Collier; Joanne D Stekler; M Juliana McElrath; James I Mullins
Journal: J Virol Date: 2011-05-18 Impact factor: 5.103

7. Geographic and temporal trends in the molecular epidemiology and genetic mechanisms of transmitted HIV-1 drug resistance: an individual-patient- and sequence-level meta-analysis.

Authors: Soo-Yon Rhee; Jose Luis Blanco; Michael R Jordan; Jonathan Taylor; Philippe Lemey; Vici Varghese; Raph L Hamers; Silvia Bertagnolio; Tobias F Rinke de Wit; Avelin F Aghokeng; Jan Albert; Radko Avi; Santiago Avila-Rios; Pascal O Bessong; James I Brooks; Charles A B Boucher; Zabrina L Brumme; Michael P Busch; Hermann Bussmann; Marie-Laure Chaix; Bum Sik Chin; Toni T D'Aquin; Cillian F De Gascun; Anne Derache; Diane Descamps; Alaka K Deshpande; Cyrille F Djoko; Susan H Eshleman; Herve Fleury; Pierre Frange; Seiichiro Fujisaki; P Richard Harrigan; Junko Hattori; Africa Holguin; Gillian M Hunt; Hiroshi Ichimura; Pontiano Kaleebu; David Katzenstein; Sasisopin Kiertiburanakul; Jerome H Kim; Sung Soon Kim; Yanpeng Li; Irja Lutsar; Lynn Morris; Nicaise Ndembi; Kee Peng Ng; Ramesh S Paranjape; Martine Peeters; Mario Poljak; Matt A Price; Manon L Ragonnet-Cronin; Gustavo Reyes-Terán; Morgane Rolland; Sunee Sirivichayakul; Davey M Smith; Marcelo A Soares; Vincent V Soriano; Deogratius Ssemwanga; Maja Stanojevic; Mariane A Stefani; Wataru Sugiura; Somnuek Sungkanuparph; Amilcar Tanuri; Kok Keng Tee; Hong-Ha M Truong; David A M C van de Vijver; Nicole Vidal; Chunfu Yang; Rongge Yang; Gonzalo Yebra; John P A Ioannidis; Anne-Mieke Vandamme; Robert W Shafer
Journal: PLoS Med Date: 2015-04-07 Impact factor: 11.069

8. HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences.

Authors: Oliver Ratmann; Chris Wymant; Caroline Colijn; Siva Danaviah; M Essex; Simon D W Frost; Astrid Gall; Simani Gaiseitsiwe; Mary Grabowski; Ronald Gray; Stephane Guindon; Arndt von Haeseler; Pontiano Kaleebu; Michelle Kendall; Alexey Kozlov; Justen Manasa; Bui Quang Minh; Sikhulile Moyo; Vladimir Novitsky; Rebecca Nsubuga; Sureshnee Pillay; Thomas C Quinn; David Serwadda; Deogratius Ssemwanga; Alexandros Stamatakis; Jana Trifinopoulos; Maria Wawer; Andrew Leigh Brown; Tulio de Oliveira; Paul Kellam; Deenan Pillay; Christophe Fraser
Journal: AIDS Res Hum Retroviruses Date: 2017-05-25 Impact factor: 2.205

9. Rare HIV-1 transmitted/founder lineages identified by deep viral sequencing contribute to rapid shifts in dominant quasispecies during acute and early infection.

Authors: Gustavo H Kijak; Eric Sanders-Buell; Agnes-Laurence Chenine; Michael A Eller; Nilu Goonetilleke; Rasmi Thomas; Sivan Leviyang; Elizabeth A Harbolick; Meera Bose; Phuc Pham; Celina Oropeza; Kultida Poltavee; Anne Marie O'Sullivan; Erik Billings; Melanie Merbah; Margaret C Costanzo; Joanna A Warren; Bonnie Slike; Hui Li; Kristina K Peachman; Will Fischer; Feng Gao; Claudia Cicala; James Arthos; Leigh A Eller; Robert J O'Connell; Samuel Sinei; Lucas Maganga; Hannah Kibuuka; Sorachai Nitayaphan; Mangala Rao; Mary A Marovich; Shelly J Krebs; Morgane Rolland; Bette T Korber; George M Shaw; Nelson L Michael; Merlin L Robb; Sodsai Tovanabutra; Jerome H Kim
Journal: PLoS Pathog Date: 2017-07-31 Impact factor: 6.823

10. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937
