Valerie F Boltz, Jason Rausch, Wei Shao, Junko Hattori, Brian Luke, Frank Maldarelli, John W Mellors, Mary F Kearney, John M Coffin.
Abstract
BACKGROUND: Although next generation sequencing (NGS) offers the potential for studying virus populations in unprecedented depth, PCR error, amplification bias and recombination during library construction have limited its use to population sequencing and measurements of unlinked allele frequencies. Here we report a method, termed ultrasensitive Single-Genome Sequencing (uSGS), for NGS library construction and analysis that eliminates PCR errors and recombinants, and generates single-genome sequences of the same quality as the "gold-standard" HIV-1 single-genome sequencing assay but with more than 100-fold greater depth.
Keywords: Allele linkage; Deep sequencing; HIV; HIV drug resistance; Minority variants; NGS; Primer ID; SGS; Single-genome sequencing; Supermajority correction; Targeted next-generation sequencing
Year: 2016 PMID: 27998286 PMCID: PMC5175307 DOI: 10.1186/s12977-016-0321-6
Source DB: PubMed Journal: Retrovirology ISSN: 1742-4690 Impact factor: 4.602
Fig. 1 Schematic representation of the methods used for NGS library construction. A cDNA library labeled with Primer IDs (top) is divided and used for each method. (a) uSGS. Short PCR primers (25 and 31 bases) containing 5′ dU in place of dT residues (dots in the primers) are used to amplify the cDNA. Products are cleaved at the dU sites, creating dsDNA with 17-nt 3′-overhangs at both ends. The ends are then ligated to the essential NGS adapters and filled out using Klenow Fragment DNA polymerase to generate a fully double-stranded NGS library. (b) Long primer PCR-1. Long primers (90 and 93 nt) containing NGS adapter sequences are used to amplify the cDNA library. (c) Long primer PCR-2. Relatively long primers (50–61 nt) are used in 2 rounds of flanking PCR to amplify the cDNA and attach the adapters. (d) Long primer PCR-3 involves 3 rounds of PCR. The cDNA is amplified with short primers (25 and 31 bases), followed by 2 rounds of flanking PCR using long primers (50–61 nt) to attach the adapters.
Comparison of cDNA amplification efficiency among methods using the same HIV-1 site specific RT-Primer ID primer
| | uSGS AVG | uSGS STDV | LP-PCR-1 AVG | LP-PCR-1 STDV |
|---|---|---|---|---|
| A. | | | | |
| Copies of cDNA by qPCR | 134,193 | 45,940 | 134,193 | 45,940 |
| Total # unique Primer IDs^a | 39,597 | 7491 | 17,614 | 6442 |
| % cDNA amplified | 30% | 13% | 13% | 3.3% |
All results are averages of 3 separate experimental libraries prepared by each method
^a Total number of consensus sequences above the Zhou algorithm cutoff [19]
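The "% cDNA amplified" row appears consistent with dividing the number of unique Primer IDs by the qPCR copy number. The following is a minimal sketch of that arithmetic using the average values from the table; the function name is illustrative and not taken from the original analysis, and the published percentages may have been averaged per library rather than computed from the averages.

```python
def percent_cdna_amplified(unique_primer_ids: int, cdna_copies: int) -> float:
    """Percentage of input cDNA copies recovered as unique Primer ID consensus sequences."""
    if cdna_copies <= 0:
        raise ValueError("cdna_copies must be positive")
    return 100.0 * unique_primer_ids / cdna_copies


# Average values taken from the table above.
usgs_pct = percent_cdna_amplified(39_597, 134_193)     # reported as 30%
lp_pcr1_pct = percent_cdna_amplified(17_614, 134_193)  # reported as 13%

print(f"uSGS: {usgs_pct:.1f}%  LP-PCR-1: {lp_pcr1_pct:.1f}%")
```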
Comparison of Recombination between methods at different consensus majority cutoffs in mixtures of BH10 WT and mutant transcript RNA
| % Majority cutoff | Method/enzyme | Total sequences | % Sequences excluded^a | % Sequences remaining | % Remaining recombinants missed | % Error^b |
|---|---|---|---|---|---|---|
| 50 | uSGS / Kapa Hi Fi Uracil+ | 33,870 | 0.6 | 99.4 | 0.39 | 0.020 |
| 60 | | 33,870 | 1.1 | 98.9 | 0.30 | 0.018 |
| 70 | | 33,870 | 2.6 | 97.4 | 0.20 | 0.015 |
| 80 | | 33,870 | | | | |
| 90 | | 33,870 | 28.4 | 71.6 | 0.12 | 0.010 |
| 100 | | 33,870 | 43.9 | 56.1 | 0.12 | 0.009 |
| 50 | LP-PCR-1 / Taq Gold | 11,008 | 4.7 | 95.3 | 4.07 | 0.033 |
| 60 | | 11,008 | 11.7 | 88.3 | 1.86 | 0.030 |
| 70 | | 11,008 | 27.3 | 72.7 | 0.34 | 0.020 |
| 80 | | 11,008 | | | | |
| 90 | | 11,008 | 77.3 | 22.7 | <0.01 | 0.007 |
| 100 | | 11,008 | 87.2 | 12.8 | <0.06 | 0.006 |
| 50 | LP-PCR-2 / Kapa2G Robust | 23,142 | 1.4 | 98.6 | 0.91 | 0.037 |
| 60 | | 23,142 | 3.3 | 96.7 | 0.62 | 0.033 |
| 70 | | 23,142 | 10.5 | 89.5 | 0.30 | 0.024 |
| 80 | | 23,142 | | | | |
| 90 | | 23,142 | 63.5 | 36.5 | 0.09 | 0.013 |
| 100 | | 23,142 | 85.7 | 14.3 | 0.15 | 0.009 |
| 50 | LP-PCR-3 / Platinum Hi Fi Taq | 20,252 | 4.6 | 95.4 | 6.44 | 0.016 |
| 60 | | 20,252 | 13.8 | 86.2 | 2.00 | 0.015 |
| 70 | | 20,252 | 23.4 | 76.6 | 0.35 | 0.013 |
| 80 | | 20,252 | | | | |
| 90 | | 20,252 | 71.7 | 28.3 | <0.04 | 0.004 |
| 100 | | 20,252 | 92.0 | 8.0 | <0.07 | 0.004 |
^a Consensus sequences were excluded because they failed to reach the required majority at one or more nucleotide positions at the given consensus level, likely due to in vitro PCR recombination
^b Incorrect bases at non-drug-resistance sites
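The majority cutoff in the table can be illustrated with a small consensus caller. This is a hedged sketch, not the authors' pipeline: it assumes the reads sharing one Primer ID are already aligned to equal length, that a position passes when the most common base reaches at least the cutoff percentage, and that failing positions are marked with a dash. The names `consensus_at_cutoff`, `is_excluded` and `max_ambiguous` are hypothetical.

```python
from collections import Counter


def consensus_at_cutoff(reads: list[str], cutoff_percent: int) -> str:
    """Build a consensus for one Primer ID family.

    A position keeps its most common base only if that base is present in at
    least `cutoff_percent` of the reads; otherwise it is written as '-'.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # Integer comparison avoids floating-point edge cases at the cutoff.
        passes = count * 100 >= cutoff_percent * len(reads)
        consensus.append(base if passes else "-")
    return "".join(consensus)


def is_excluded(consensus: str, max_ambiguous: int = 0) -> bool:
    """Assumed exclusion rule: too many positions failed the majority cutoff."""
    return consensus.count("-") > max_ambiguous


# Toy family of five reads: positions 2 and 4 each have one discordant read.
family = ["ACGT", "ACGT", "ACGT", "ACGA", "ATGT"]
for cutoff in (50, 80, 90, 100):
    cons = consensus_at_cutoff(family, cutoff)
    print(cutoff, cons, "excluded" if is_excluded(cons) else "kept")
```

Raising the cutoff in this toy example turns the two 80%-supported positions into dashes, which mirrors the trend in the table: stricter cutoffs exclude more sequences.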
Fig. 2 Neighbor joining trees comparing PCR recombination in each method. Neighbor joining trees rooted on NL4-3 generated from randomly selected sets of 50 supermajority sequences obtained from mixtures of WT and Mutant BH10 pol transcripts. Reference sequences for the BH10 mutant and WT and NL4-3 WT are shown in large blue, black and green squares, respectively. Sequences matching the references are shown in the same colors as the references. The orange circles show an intermediate step in the bioinformatics computations and represent those sequences identified as PCR recombinant species that would be lost from the respective data sets after they were deleted in the final steps of the pipeline
Comparison of individual allele frequencies from different mixtures of BH10 WT and Mutant RNA transcripts analyzed by all methods
| Method | qPCR cDNA input copies | Consensus sequences | Allele frequency expected (%) | % allele frequency detected | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | 65R | 67N | 70R | 74V | 100I | 103N | 181C | 184V | 188C | 190A |
| a. uSGS vs Long Primer PCR-1 at 50% majority cutoff | |||||||||||||
| uSGS | 179,825 | 41,050 | 10 | 15.2 | 15.7 | 15.7 | 15.7 | 15.8 | 15.8 | 16.0 | 16.0 | 16.0 | 16.0 |
| LP-PCR-1 | 179,825 | 32,492 | 10 | 1.90 | 2.00 | 2.03 | 1.07 | 2.80 | 2.94 | 7.04 | 7.06 | 7.07 | 7.08 |
| uSGS | 178,175 | 28,061 | 1 | 1.53 | 1.60 | 1.58 | 1.59 | 1.62 | 1.62 | 1.65 | 1.65 | 1.65 | 1.64 |
| LP-PCR-1 | 178,175 | 20,565 | 1 | 0.22 | 0.23 | 0.24 | 0.12 | 0.31 | 0.32 | 0.69 | 0.70 | 0.70 | 0.70 |
| b. uSGS vs Long Primer PCR-1 at 80% majority cutoff | |||||||||||||
| uSGS | 179,825 | 41,050 | 10 | 8.2 | 8.5 | 8.5 | 8.5 | 8.5 | 8.5 | 8.5 | 8.5 | 8.5 | 8.5 |
| LP-PCR-1 | 179,825 | 32,492 | 10 | 0.07 | 0.07 | 0.07 | 0.07 | 0.07 | 0.07 | 0.27 | 0.27 | 0.27 | 0.27 |
| uSGS | 178,175 | 28,061 | 1 | 0.39 | 0.40 | 0.39 | 0.40 | 0.40 | 0.40 | 0.40 | 0.40 | 0.41 | 0.41 |
| LP-PCR-1 | 178,175 | 20,565 | 1 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.04 | 0.05 | 0.04 | 0.04 |
| c. uSGS vs Long Primer PCR-1 | |||||||||||||
| uSGS | 179,825 | 41,050 | 10 | 18 | 18.8 | 18.9 | 18.8 | 18.5 | 18.75 | 19 | 19.5 | 19.5 | 19.4 |
| LP-PCR-1 | 179,825 | 32,492 | 10 | 6.8 | 8.5 | 8.4 | 6.9 | 8.6 | 8.8 | 9.2 | 9.3 | 9.0 | 8.8 |
| uSGS | 178,175 | 28,061 | 1 | 1.8 | 2.0 | 1.9 | 1.9 | 1.9 | 1.9 | 2.1 | 2.1 | 2.1 | 2.1 |
| LP-PCR-1 | 178,175 | 20,565 | 1 | 0.83 | 0.74 | 0.83 | 0.58 | 0.78 | 0.8 | 0.96 | 1.3 | 0.86 | 0.86 |
| d. uSGS vs Long Primer PCR-2 and Long Primer PCR-3 at 50% majority cutoff | |||||||||||||
| uSGS | 69,000 | 23,048 | 30 | 30 | 31 | 32 | 31 | 32 | 32 | 32 | 32 | 32 | 32 |
| LP-PCR-2 | 69,000 | 17,230 | 30 | 21 | 22 | 22 | 22 | 22 | 23 | 23 | 23 | 23 | 23 |
| LP-PCR-3 | 69,000 | 14,229 | 30 | 23 | 24 | 24 | 24 | 25 | 26 | 28 | 28 | 28 | 28 |
| uSGS | 74,800 | 17,270 | 3 | 4.7 | 4.9 | 4.9 | 4.9 | 4.9 | 4.9 | 4.9 | 4.9 | 4.9 | 4.9 |
| LP-PCR-2 | 74,800 | 14,451 | 3 | 3.1 | 3.2 | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 | 3.3 |
| LP-PCR-3 | 74,800 | 10,915 | 3 | 0.6 | 0.7 | 0.7 | 0.8 | 2.2 | 2.6 | 4.2 | 4.2 | 4.2 | 4.2 |
| uSGS | 67,860 | 16,757 | 0.3 | 0.35 | 0.38 | 0.36 | 0.36 | 0.36 | 0.37 | 0.37 | 0.37 | 0.36 | 0.36 |
| LP-PCR-2 | 67,860 | 15,696 | 0.3 | 0.32 | 0.30 | 0.33 | 0.31 | 0.32 | 0.36 | 0.32 | 0.32 | 0.32 | 0.32 |
| LP-PCR-3 | 67,860 | 9546 | 0.3 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.10 | 0.31 | 0.31 | 0.32 | 0.32 |
| e. uSGS vs Long Primer PCR-2 and Long Primer PCR-3 at 80% majority cutoff | |||||||||||||
| uSGS | 69,000 | 23,048 | 30 | 29 | 30 | 30 | 30 | 30 | 30 | 30 | 31 | 31 | 31 |
| LP-PCR-2 | 69,000 | 17,230 | 30 | 12 | 12 | 12 | 13 | 17 | 17 | 21 | 22 | 22 | 22 |
| LP-PCR-3 | 69,000 | 14,229 | 30 | 1.8 | 1.9 | 2.1 | 2.1 | 4.4 | 5.3 | 24 | 25 | 26 | 26 |
| uSGS | 74,800 | 17,270 | 3 | 4.3 | 4.5 | 4.5 | 4.5 | 4.6 | 4.6 | 4.6 | 4.9 | 4.9 | 4.9 |
| LP-PCR-2 | 74,800 | 14,451 | 3 | 1.1 | 1.1 | 1.2 | 1.2 | 2.0 | 2.0 | 3.2 | 3.3 | 3.3 | 3.3 |
| LP-PCR-3 | 74,800 | 10,915 | 3 | <0.01 | <0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 2.6 | 3.0 | 3.2 | 3.2 |
| uSGS | 67,860 | 16,757 | 0.3 | 0.28 | 0.31 | 0.29 | 0.29 | 0.32 | 0.32 | 0.33 | 0.36 | 0.36 | 0.36 |
| LP-PCR-2 | 67,860 | 15,696 | 0.3 | 0.11 | 0.10 | 0.10 | 0.11 | 0.21 | 0.22 | 0.30 | 0.32 | 0.31 | 0.32 |
| LP-PCR-3 | 67,860 | 9546 | 0.3 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | <0.01 | 0.12 | 0.12 | 0.17 | 0.19 |
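One way to read this table is to ask how evenly the ten alleles are recovered within a single library. The sketch below assumes that all ten alleles sit on the same mutant transcript, so that in the absence of PCR recombination they should be detected at the same frequency; the helper names are hypothetical, and the values are copied from the 30% rows of panel e.

```python
def recovery_ratios(expected_pct: float, detected_pct: list[float]) -> list[float]:
    """Detected frequency divided by expected frequency, per allele."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return [d / expected_pct for d in detected_pct]


def site_spread(detected_pct: list[float]) -> float:
    """Ratio of the highest to the lowest detected frequency across sites.

    Values near 1 mean the linked alleles were recovered evenly.
    """
    if min(detected_pct) <= 0:
        raise ValueError("frequencies must be positive")
    return max(detected_pct) / min(detected_pct)


# Panel e, 30% expected, 80% majority cutoff (alleles 65R through 190A).
usgs = [29, 30, 30, 30, 30, 30, 30, 31, 31, 31]
lp_pcr3 = [1.8, 1.9, 2.1, 2.1, 4.4, 5.3, 24, 25, 26, 26]

print(f"uSGS spread:     {site_spread(usgs):.2f}")
print(f"LP-PCR-3 spread: {site_spread(lp_pcr3):.2f}")
print([round(r, 2) for r in recovery_ratios(30, lp_pcr3)])
```

Under these assumptions, a spread far above 1 flags libraries in which alleles toward one end of the amplicon were lost disproportionately.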
Comparison of different DNA polymerases for library preparation using LP-PCR-1 and uSGS
| Method | Enzyme | cDNA starting copy | Qualified sequences | % Sequences excluded^a | Final # sequences | % Population represented |
|---|---|---|---|---|---|---|
| LP-PCR-1 | AmpliTaq | 34,094 | 11,215 | 40 | 6729 | 20 |
| uSGS | AmpliTaq | 34,094 | 11,859 | 7 | 11,060 | 32 |
| LP-PCR-1 | Kapa | 34,094 | 3566 | 11 | 3188 | 9 |
| uSGS | Kapa | 34,094 | 18,855 | 2 | 18,512 | 54 |
| LP-PCR-1 | Platinum | 14,410 | 2789 | 82 | 558 | 4 |
| uSGS | Platinum | 88,560 | 25,773 | 10 | 22,938 | 26 |
^a Final number of “super majority” consensus sequences after removal of >2 ambiguous sites likely due to in vitro PCR recombination
^b Synthesis of cDNA from WT/Mutant mixture of transcript HIV-1 RNA, divided into 4 parts and parallel libraries sequenced
^c Synthesis of cDNA from WT/Mutant mixtures of transcript HIV-1 RNA independently in two separate experiments
Fig. 3 Snapshot of sequence alignments from libraries constructed from a clinical sample. Small subsets of supermajority sequence alignments obtained from a donor sample using the (a) LP-PCR or (b) uSGS methods of NGS library construction. Dashes (“–” in red) have been placed in the consensus sequences by the bioinformatics pipeline at positions with <80% identity among the sequences in a given daughter set, which is indicative of recombination during PCR. Asterisks mark positions with diverse bases in the uSGS data where no PCR recombination is seen
Linkage of resistance mutations in donors failing anti-retroviral therapy as measured by uSGS
| Patient/sample | Haplotypes | Expected (%) | Observed (%) | P value |
|---|---|---|---|---|
| 1/1 | | 0.05 | 0.06 | 0.58 |
| | | 0.002 | 0.13 | 0.0004 |
| | | 8.41 | 5.58 | 1.18×10⁻⁵ |
| | | 0.04 | 0.06 | 0.49 |
| | | 0.36 | 3.04 | 1.0×10⁻²³⁷ |
| | | 0.0002 | 0.06 | 0.004 |
| | | 0.44 | 0.25 | 0.18 |
| | | 0.11 | 0.13 | 0.52 |
| | | 86.25 | 89.22 | 0.0003 |
| | | 0.05 | 0.06 | 0.58 |
| | | 0.34 | 0.38 | 0.46 |
| | | 3.72 | 0.82 | 3.1×10⁻¹³ |
| | | 0.05 | 0.06 | 0.58 |
| | | 0.05 | 0.06 | 0.58 |
| | | 0.02 | 0.06 | 0.26 |
| 1/2 | | 0.13 | 0.15 | 0.58 |
| | | 0.02 | 0.15 | 0.15 |
| | 67D, 69T, | 0.01 | 0.15 | 0.08 |
| | | 87.42 | 90.25 | 0.01 |
| | | 0.33 | 3.10 | 5.1×10⁻¹⁴ |
| | | 0.27 | 0.15 | 0.45 |
| | | 8.04 | 5.02 | 0.001 |
| | | 0.13 | 0.15 | 0.58 |
| | | 3.64 | 0.89 | 6.1×10⁻⁶ |
| 2/3 | | 1.05 | 1.07 | 0.51 |
| | | 0.48 | 0.49 | 0.52 |
| | | 0.16 | 0.16 | 0.57 |
| | 67D, 69T, 70K, 74L, 101K, 106V, 108V, | 0.08 | 0.08 | 0.62 |
| | | 0.32 | 0.33 | 0.54 |
| | | 0.08 | 0.08 | 0.62 |
| | | 0.08 | 0.08 | 0.62 |
| | | 0.08 | 0.08 | 0.62 |
| | | 0.08 | 0.08 | 0.62 |
| | | 0.08 | 0.08 | 0.63 |
| | | 1.21 | 1.23 | 0.51 |
| | | 0.08 | 0.08 | 0.62 |
| | | 95.76 | 95.64 | 0.44 |
| | | 0.08 | 0.08 | 0.62 |
| | | 0.40 | 0.41 | 0.53 |
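The linkage table compares expected and observed haplotype frequencies. How the expected values were derived is not stated here; a common convention, assumed in this sketch, is to multiply the per-site allele frequencies as if the sites were independent. The function names are hypothetical. The allele percentages are the Donor 1 Sample 1 values for D67N, K101E, V106I, G190A, T215Y and K219R from the allele-frequency table, chosen only as an illustration and not as a reconstruction of any specific row.

```python
from math import prod


def expected_haplotype_pct(allele_pcts: list[float]) -> float:
    """Expected haplotype frequency (%) if the alleles assort independently."""
    if any(not 0 <= p <= 100 for p in allele_pcts):
        raise ValueError("allele frequencies must be percentages in [0, 100]")
    return 100.0 * prod(p / 100.0 for p in allele_pcts)


def observed_over_expected(observed_pct: float, expected_pct: float) -> float:
    """Fold difference between observed and expected haplotype frequency."""
    if expected_pct <= 0:
        raise ValueError("expected_pct must be positive")
    return observed_pct / expected_pct


# Donor 1 Sample 1 allele frequencies (%): D67N, K101E, V106I, G190A, T215Y, K219R.
donor1_sample1 = [100, 95.56, 91.12, 99.81, 99.94, 99.94]
print(f"expected if independent: {expected_haplotype_pct(donor1_sample1):.2f}%")

# A row with expected 0.36% and observed 3.04% is over-represented several fold.
print(f"observed/expected: {observed_over_expected(3.04, 0.36):.1f}")
```

A ratio well above or below 1, together with a small P value, is what marks a haplotype as linked more or less often than independence would predict.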
Fig. 4 Analysis of HIV-1 population structure in a clinical sample using uSGS for library construction. Neighbor joining tree of HIV-1 pol sequences from donor 1 sample 1 and blow-up of an NJ subtree, showing clustering and linkage of WT at RT position 106 (V106V, blue squares) with 101Q (blue square with black outline). Numbers of identical sequences are shown in parentheses, as this NJ tree was extracted from 1585 unique SGS in which identical sequences were collapsed. Within the highlighted subtree, note especially two sequences (2) in which WT RT codon 106 and the rare mutations 101Q and 108I (blue square with black and orange outline) were found to be linked. Detection of such a rare linkage event would be virtually impossible using LP-PCR or conventional SGS. Data were obtained by NGS uSGS; the tree is rooted on Consensus B
Frequency of resistance mutations in donors failing anti-retroviral therapy as measured by uSGS
| Allele | Donor 1 Sample 1 | Donor 1 Sample 2 | Donor 2 Sample 3 |
|---|---|---|---|
| D67N | 100 | 99.85 | 99.92 |
| T69A | <0.06 | 0.15 | 0.08 |
| T69I | 0.06 | <0.15 | <0.08 |
| K70N | <0.06 | <0.15 | 0.08 |
| K70Q | <0.06 | <0.15 | 1.07 |
| K70T | <0.06 | <0.15 | 1.23 |
| L74I | <0.06 | <0.15 | 0.08 |
| K101E | 95.56 | 95.72 | <0.08 |
| K101Q | 4.12 | 3.99 | 0.08 |
| K101R | <0.06 | <0.15 | 0.49 |
| V106I | 91.12 | 91.58 | 0.33 |
| V108A | <0.06 | <0.15 | 0.08 |
| V108I | 0.51 | <0.15 | 0.41 |
| M184I | <0.06 | <0.15 | 0.16 |
| M184V | <0.06 | <0.15 | 99.84 |
| Y188C | 0.06 | <0.15 | <0.08 |
| G190A | 99.81 | 100 | <0.08 |
| G190E | <0.06 | <0.15 | 0.08 |
| G190R | <0.06 | <0.15 | 0.08 |
| G190T | | <0.15 | <0.08 |
| T215F | 0.06 | <0.15 | <0.08 |
| T215R | <0.06 | <0.15 | <0.08 |
| T215Y | 99.94 | 100 | 100 |
| K219R | 99.94 | 99.85 | <0.08 |