Travis C Glenn, Todd W Pierson, Natalia J Bayona-Vásquez, Troy J Kieran, Sandra L Hoffberg, Jesse C Thomas IV, Daniel E Lefever, John W Finger, Bei Gao, Xiaoming Bian, Swarnali Louha, Ramya T Kolli, Kerin E Bentley, Julie Rushmore, Kelvin Wong, Timothy I Shaw, Michael J Rothrock, Anna M McKee, Tai L Guo, Rodney Mauricio, Marirosa Molina, Brian S Cummings, Lawrence H Lash, Kun Lu, Gregory S Gilbert, Stephen P Hubbell, Brant C Faircloth.
Abstract
Next-generation sequencing (NGS) of amplicons is used in a wide variety of contexts. In many cases, NGS amplicon sequencing remains overly expensive and inflexible, with library preparation strategies relying upon the fusion of locus-specific primers to full-length adapter sequences with a single identifying sequence or ligating adapters onto PCR products. In Adapterama I, we presented universal stubs and primers to produce thousands of unique index combinations and a modifiable system for incorporating them into Illumina libraries. Here, we describe multiple ways to use the Adapterama system and other approaches for amplicon sequencing on Illumina instruments. In the variant we use most frequently for large-scale projects, we fuse partial adapter sequences (TruSeq or Nextera) onto the 5' end of locus-specific PCR primers with variable-length tag sequences between the adapter and locus-specific sequences. These fusion primers can be used combinatorially to amplify samples within a 96-well plate (8 forward primers + 12 reverse primers yield 8 × 12 = 96 combinations), and the resulting amplicons can be pooled. The initial PCR products then serve as template for a second round of PCR with dual-indexed iTru or iNext primers (also used combinatorially) to make full-length libraries. The resulting quadruple-indexed amplicons have diversity at most base positions and can be pooled with any standard Illumina library for sequencing. The number of sequencing reads from the amplicon pools can be adjusted, facilitating deep sequencing when required or reducing sequencing costs per sample to an economically trivial amount when deep coverage is not needed. We demonstrate the utility and versatility of our approaches with results from six projects using different implementations of our protocols. Thus, we show that these methods facilitate amplicon library construction for Illumina instruments at reduced cost with increased flexibility. 
A simple web page to design fusion primers compatible with iTru primers is available at: http://baddna.uga.edu/tools-taggi.html. A fast, easy-to-use program to demultiplex amplicon pools with internal indexes is available at: https://github.com/lefeverde/Mr_Demuxy. ©2019 Glenn et al.
Keywords: Fusion primers; Hierarchical indexing; Internal tagging; Libraries; MiSeq; Multiplexing; Next generation sequencing; PCR; Quadruple indexing
Year: 2019 PMID: 31616589 PMCID: PMC6791344 DOI: 10.7717/peerj.7786
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. High-throughput workflow to create and multiplex TaggiMatrix libraries.
Components of the quadruple-indexed amplicon libraries. A specific DNA region is amplified using tagged, locus-specific fusion primers (also known as “indexed fusion primers”) to produce a fusion amplicon. iTru adapters are then incorporated by limited-cycle PCR with i5- and i7-indexed primers to make the complete double-stranded DNA library. Internal indexes, outer i5/i7 indexes, and the set of primers used are shown.
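The combinatorial layout behind a quadruple-indexed library can be sketched in a few lines of Python. This is only an illustration: it assumes forward internal indexes labelled A to H and reverse internal indexes labelled 1 to 12 (the labels used in the internal index table), one pair per well of a 96-well plate, and the outer index names `i5_01` and `i7_01` are hypothetical placeholders rather than real iTru primer names.

```python
from itertools import product

# Internal index labels: A-H for forward primers, 1-12 for reverse primers.
FORWARD_LABELS = list("ABCDEFGH")
REVERSE_LABELS = [str(i) for i in range(1, 13)]


def plate_layout():
    """Map each well of a 96-well plate to one (forward, reverse) internal index pair."""
    return {
        f"{fwd}{rev}": (fwd, rev)
        for fwd, rev in product(FORWARD_LABELS, REVERSE_LABELS)
    }


def quadruple_index(well, i5, i7):
    """Combine a well's internal index pair with an outer i5/i7 pair.

    The i5/i7 arguments are free-form names (hypothetical here).
    """
    fwd, rev = plate_layout()[well]
    return (i5, i7, fwd, rev)


layout = plate_layout()
print(len(layout))  # 8 x 12 = 96 wells
print(quadruple_index("C7", "i5_01", "i7_01"))
```

Because the outer i5/i7 pair is shared by a whole plate and the internal pair is unique to a well, each sample is identified by the full four-part tuple.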
Figure 2. Examples of possible primer types (Table 3), including “flipped” fusion primers.
Elements in the box are combined to form each of these various primer types, shown below the box. Standard locus-specific primer sequences are indicated by the letter “N”, in uppercase for the forward primer and in lowercase for the reverse primer. Green and red nucleotide bases refer to unique index sequences. Blue and pink sequences are Read 1 and Read 2 fusion sequences, respectively.
Figure 3. Sequencing reads that can be obtained from dual-indexed paired-end reads.
(A) Illustration of a double-stranded DNA molecule from a full-length amplicon library (i.e., following the limited-cycle round of PCR). Horizontal arrowheads indicate the 3′ ends. Labels on the double-stranded DNA indicate the function of each section, with shading to help indicate boundaries. (B) Scheme of the four separate primers used for the four sequencing reactions that occur in paired-end dual-indexed sequencing and the reads that each primer produces (number in the circle). One primer, as indicated, is added for each of the four sequencing reads, which are performed in numerical order. Vertical position also indicates this order (the read with the top primer is conducted first). 3A and 3B correspond to workflow A (NovaSeq™ 6000, MiSeq™, HiSeq 2500, and HiSeq 2000) and workflow B (iSeq™ 100, MiniSeq™, NextSeq™, HiSeq X, HiSeq 4000, and HiSeq 3000), respectively, of dual-indexed workflows on paired-end flow cells (Illumina, 2018a; Illumina, 2018b). A dark read indicates bases that are synthesized during sequencing strand extension but whose sequence is not determined (because they are a known, invariant portion of the adapter).
Internal identifying index sequences.
All indexes have an edit distance of ≥ 3. Upper case letters are the indexes; lower case letters add length variation to facilitate sequence diversity at each base position of amplicon pools (see text for details). For Illumina MiSeq and HiSeq models ≤ 2500, adenosine and cytosine are in the red detection channel, whereas guanine and thymine are in the green channel. Indexes and spacers have balanced red (shown here in red) and green (shown here underlined and in blue) representation at each base position within each group of four indexes (i.e., count 1–4, 5–8, 9–12, 13–16, and 17–20).
| Index count | Index label | Sequence | Length |
|---|---|---|---|
| 1 | A | | 5 |
| 2 | B | | 6 |
| 3 | C | | 7 |
| 4 | D | | 8 |
| 5 | E | | 5 |
| 6 | F | | 6 |
| 7 | G | | 7 |
| 8 | H | | 8 |
| 9 | 1 | | 5 |
| 10 | 2 | | 6 |
| 11 | 3 | | 7 |
| 12 | 4 | | 8 |
| 13 | 5 | | 5 |
| 14 | 6 | | 6 |
| 15 | 7 | | 7 |
| 16 | 8 | | 8 |
| 17 | 9 | | 5 |
| 18 | 10 | | 6 |
| 19 | 11 | | 7 |
| 20 | 12 | | 8 |
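The two design properties stated in the caption (pairwise edit distance of at least 3, and red/green channel balance at each base position within a group of four indexes) can be checked with a short script. The sketch below assumes that "edit distance" means Levenshtein distance, and the four tags in `EXAMPLE_TAGS` are made-up placeholders, not the published index sequences.

```python
from itertools import combinations

# Detection channels for MiSeq and HiSeq models <= 2500, as stated in the caption.
RED = set("AC")
GREEN = set("GT")

# Hypothetical tags for illustration only; NOT the published indexes.
EXAMPLE_TAGS = ["ACGTA", "CATGC", "GTACG", "TGCAT"]


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, one row at a time)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance over all pairs of tags."""
    return min(edit_distance(a, b) for a, b in combinations(tags, 2))


def channel_counts(tags):
    """Per-position (red, green) base counts; shorter tags are skipped at later positions."""
    longest = max(len(t) for t in tags)
    counts = []
    for pos in range(longest):
        bases = [t[pos].upper() for t in tags if pos < len(t)]
        red = sum(b in RED for b in bases)
        green = sum(b in GREEN for b in bases)
        counts.append((red, green))
    return counts


print("minimum pairwise edit distance:", min_pairwise_distance(EXAMPLE_TAGS))
print("per-position (red, green) counts:", channel_counts(EXAMPLE_TAGS))
```

A group of four is balanced when every position reports two red and two green bases; calling `.upper()` lets the same check run on tags that include lowercase spacer bases.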
Primer pairs used in the example projects presented.
Project, target locus, forward and reverse primer names and sequences, as well as the sources of the primer sequences are shown.
| Project | Target locus | Forward primer | Reverse primer |
|---|---|---|---|
| Kissing Bug | cyt-b | L14816: CCATCCAACATCTCAGCATGATGAAA | H15173: CCCCTCAGAATGATATTTGTCCTCA |
| Pathogenic Fungi | ITS | ITS1-F_KYO2: TAGAGGAAGTAAAAGTCGTAA | ITS2_KY02: TTYRCTRCGTTCTTCATC |
| | | ITS3-KYO2: AHCGATGAAGAACRYAG | ITS4: TCCTCCGCTTATTGATATGC |
| | | ITS1-F_KYO2: TAGAGGAAGTAAAAGTCGTAA | ITS4: TCCTCCGCTTATTGATATGC |
| Salamander eDNA | 12S | Pleth_12S_F: AAAAAAGTCAGGTCAAGG | Pleth_12S_R: GGTGACGGGCGGTGTGTG |
| Methylation | | hp21-TSS F: ATAGTGTTGTGTTTTTTTGGAGAGTG | hp21-TSS R: ACAACTACTCACACCTCAACTAAC |
| | | hp21-SIE1 F: TTTTTTGAGTTTTAGTTTTTTTAGTAGTGT | hp21-SIE1 R: AACCAAAATAATTTTTCAATCCC |
| Bacterial Community | 16S | Bact-0341-b-S-17: CCTACGGGNGGCWGCAG | S-D-Bact-0785-a-A-21: GACTACHVGGGTATCTAATCC |
| | 16S | 515F: GTGCCAGCMGCCGCGGTAA | 806R: GGACTACHVGGGTWTCTAAT |
| Wisteria | nr824 | w898-824F: CATGTTGCATTCAATCTTGG | w898-824R: GCCTCCATACAAGTTAGTTG |
| | nr997 | w843-997F: GAATCAACGCTGAACGTT | w843-997AluR: GGTTCAATTTATTGATGTG |
| | trnL; trnL/F | WistmLF: AGTTGACGACATTTCCTTAC | WistmLR: GGAGTGAATGGTTTGATCAATG |
| | nad4 | NAD4RSF1: CTACTAGACTACTAGAGGT | NAD4RSRl: GTTTGGCAACAAGCAAACG |
| | cyt-b | COBRSF1: CATATTGACTTTCTCTCGCC | COBRSR1: GAATAGGATGACTCAGCGTC |
Notes.
Parson et al., 2000.
Toju et al., 2012.
White et al., 1990.
Kolli et al., 2019.
Klindworth et al., 2013.
Caporaso et al., 2011.
Trusty et al., 2007a.
Trusty et al., 2007b.
Trusty et al., 2008.
Table 3. General strategies for producing and indexing amplicon libraries for Illumina sequencing.
These examples use iTru primers but, as mentioned in the text, they can instead be implemented with iNext primers. Method 5 is illustrated below, but the present manuscript does not include any dataset that implemented it (see Discussion). Note: this table does not include “flipped” primers.
| Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | |
|---|---|---|---|---|---|
| Standard primers | Standard primers | Indexed primers | Fusion primers | Indexed fusion primers | |
| ↓ | ↓ | ↓ | ↓ | ↓ | |
| PCR | PCR | PCR | PCR | PCR | |
| ↓ | ↓ | [Pool] | ↓ | [Pool] | |
| | Indexed fusion primers | | | | |
| ↓ | ↓ | ↓ | | | |
| Y-yoke | PCR | Y-yoke | | | |
| ↓ | [Pool] | ↓ | | | |
| iTru PCR | ↓ | iTru PCR | iTru PCR | iTru PCR | |
| | iTru PCR | | | | |
| ↓ | ↓ | ↓ | ↓ | ↓ | |
| Completed library | Completed library | Completed library | Completed library | Completed library | |
| − | + | + | − | + | Base diversity in reads |
| − | + | + | − | + | Poolable to reduce library preparation costs |
| 2 | 20 | 20 | 2 | 20 | Number of primers |
| 192 | 193 | 97 | 192 | 97 | Minimum number of PCRs for 96 samples |
| − | − | + | − | + | PCR bias varies among samples |
| Low | Low | Med | Med | High | Optimization difficulty |
| Low | High | Med | Med | High | Relative primer cost |
| High | Med | Med | Med | Low | Relative library preparation cost |
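The "Minimum number of PCRs for 96 samples" row follows from counting the PCR steps in each column of the table. A minimal sketch of that arithmetic is below; the function name and the `plate_size` parameter are illustrative, and treating each plate of up to 96 samples as exactly one pool is an assumption made here to extend the count beyond a single plate.

```python
import math


def min_pcrs(method, n_samples, plate_size=96):
    """Minimum number of PCRs for a given method (1-5) in the strategies table.

    Assumption: samples are pooled one plate at a time, so each plate
    contributes a single pooled reaction wherever the table shows [Pool].
    """
    pools = math.ceil(n_samples / plate_size)
    if method in (1, 4):
        # One locus-specific PCR and one iTru PCR per sample; no pooling.
        return 2 * n_samples
    if method == 2:
        # Two PCRs per sample, then one iTru PCR per pool.
        return 2 * n_samples + pools
    if method in (3, 5):
        # One PCR per sample, then one iTru PCR per pool.
        return n_samples + pools
    raise ValueError(f"unknown method: {method}")


for m in range(1, 6):
    print(f"Method {m}: {min_pcrs(m, 96)} PCRs for 96 samples")
```

For 96 samples this reproduces the table row (192, 193, 97, 192, 97).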
Detailed information for example projects presented to validate our approach.
Summarized information for all example projects used to demonstrate TaggiMatrix. The “Method” column refers to methods in Table 3; the “Pool name” column applies only to projects in which individual samples were pooled prior to the final pooling proportionate to targeted read number; the “Samples in pool” column gives the number of samples (including replicates) pooled together before final pooling for sequencing; the “Target reads” column gives the approximate number of reads per pool (i.e., not per individual sample) we targeted when pooling samples with other libraries. Note that these data were generated on many independent MiSeq runs.
| # | Organism | Project Goal | Target Loci | Library type | Method | Pool name | Samples in pool | Target reads | Actual reads | Summary |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Kissing bug | Diet analysis | cyt-b | iTru | 1 | N/A | 1 | 100k | 916k | Identified five vertebrate sources of blood meals. |
| 2 | Fungal Community | Fungal identification | Full-ITS1 | iTru | 2 | Homokaryon | 48 | 400k | 515k | Identified the primary fungal OTU from each culture |
| | | | | | | Het. multispore | 48 | 400k | 619k | |
| | | | | | | Het. tissue | 47 | 400k | 444k | |
| | | | Full-ITS2 (standard & “flipped”) | iTru | 2 | Homokaryon | 48 | 400k | 268k | |
| | | | | | | Het. multispore | 48 | 400k | 310k | |
| | | | | | | Het. tissue | 47 | 400k | 257k | |
| | | | Incomplete-ITS1&ITS2 | iTru | 2 | Homokaryon | 48 | 400k | 460k | |
| | | | | | | Het. multispore | 48 | 400k | 579k | |
| | | | | | | Het. tissue | 47 | 400k | 514k | |
| 3 | Salamander | Environmental DNA | 12S | iTru | 4 & 5 | Reference samples | 7 | 10k | 8k | Detected 6/7 species of salamander expected in community |
| | | | | | | eDNA samples | 30 | 12M | 4.4M | |
| 4 | Human | Methylation | | iTru | 4 | N/A | 1 | 40k | 121k | Compared methylation patterns between cell types |
| 5 | Peromyscus | Microbiome | 16S | iTru | 5 | Ash Basin | 44 | 1.5M | 3.8M | Detected 90,862 bacterial OTUs |
| | | | | | | Pond B | 35 | 1.5M | 2.8M | |
| | | | | | | Tim’s Branch | 48 | 1.5M | 0.7M | |
| | | | | | | Upper Three Runs | 44 | 1.5M | 2.9M | |
| 6 | Wisteria | Population genetics | nr824 | iNext | 5 | N/A | 1 | 150k | 79k | Demonstrated mixed ancestry and no population structure in an introduced population |
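Pooling "proportionate to targeted read number" and the gap between target and actual reads can both be expressed as simple ratios. The sketch below uses the four Peromyscus pools from the table as input; the function names are illustrative, and treating the pooling proportion as each pool's share of the summed targets is an assumption about how the proportions would be computed, not a description of the authors' bench protocol.

```python
# (target reads, actual reads) for the four Peromyscus 16S pools in the table.
POOLS = {
    "Ash Basin": (1_500_000, 3_800_000),
    "Pond B": (1_500_000, 2_800_000),
    "Tim's Branch": (1_500_000, 700_000),
    "Upper Three Runs": (1_500_000, 2_900_000),
}


def pooling_fractions(pools):
    """Share of the combined pool each library should receive, by target reads."""
    total = sum(target for target, _ in pools.values())
    return {name: target / total for name, (target, _) in pools.items()}


def yield_ratios(pools):
    """Actual reads divided by target reads for each pool."""
    return {name: actual / target for name, (target, actual) in pools.items()}


for name, frac in pooling_fractions(POOLS).items():
    ratio = yield_ratios(POOLS)[name]
    print(f"{name}: pooling fraction {frac:.2f}, actual/target {ratio:.2f}")
```

A ratio above 1 means the pool received more reads than targeted; a ratio below 1 (as for Tim's Branch) means it received fewer.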
Figure 4. Total cost of experiments across the five methods given a number of samples.
Line plot of the price of each method according to the number of samples. The starting point on the x-axis (x = 0) represents the buy-in cost of oligos.
Buy-in and per sample costs among methods.
Costs associated with the implementation of the different methods. Segment (A) presents the buy-in costs of oligos and iTru primers and the per-sample costs of library prep, which consist of both fixed and variable costs depending on pooling at early stages. Segment (B) is the cost of library prep (not considering primers/adapters) per sample given a number of samples. Segment (C) is the total experimental cost of primers/adapters and library prep according to the number of samples in the experiment; the first section is in terms of number of samples, and the second section is in terms of plates, each plate consisting of 96 samples. Costs for iTru are calculated using list prices of aliquots from baddna.uga.edu. Costs for ‘oligos’ are calculated using list prices from Integrated DNA Technologies (IDT; Coralville, IA). Other costs are from list prices from various vendors in January 2019. Please see Files S1 and S6 for additional details on price calculations and for total experiment prices given a number of samples.
| (A) | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 |
|---|---|---|---|---|---|
| Fixed cost | $18.86 | $3.12 | $1.39 | $4.44 | $1.39 |
| Variable cost | – | $4.07 | $17.52 | – | $4.07 |
Figure 5. Decision tree to select an amplicon library preparation method.
A guide to deciding which amplicon library preparation method best suits the experimental goals and budget of your project.