Natalia J Bayona-Vásquez1,2,3, Travis C Glenn1,3,4,5, Troy J Kieran1, Todd W Pierson1,6, Sandra L Hoffberg4,7, Peter A Scott8,9, Kerin E Bentley4,10, John W Finger1,5,11, Swarnali Louha3, Nicholas Troendle4,12, Pindaro Diaz-Jaimes2, Rodney Mauricio4, Brant C Faircloth13.
Abstract
Molecular ecologists frequently use genome reduction strategies that rely upon restriction enzyme digestion of genomic DNA to sample consistent portions of the genome from many individuals (e.g., RADseq, GBS). However, researchers often find the existing methods expensive to initiate and/or difficult to implement consistently, especially because it is difficult to multiplex sufficient numbers of samples to fill entire sequencing lanes. Here, we introduce a low-cost and highly robust approach for the construction of dual-digest RADseq libraries that build on adapters and primers designed in Adapterama I. Major features of our method include: (1) minimizing the number of processing steps; (2) focusing on a single strand of sample DNA for library construction, allowing the use of a non-phosphorylated adapter on one end; (3) ligating adapters in the presence of active restriction enzymes, thereby reducing chimeras; (4) including an optional third restriction enzyme to cut apart adapter-dimers formed by the phosphorylated adapter, thus increasing the efficiency of adapter ligation to sample DNA, which is particularly effective when only low quantity/quality DNA samples are available; (5) interchangeable adapter designs; (6) incorporating variable-length internal indexes within the adapters to increase the scope of sample indexing, facilitate pooling, and increase sequence diversity; (7) maintaining compatibility with universal dual-indexed primers and thus, Illumina sequencing reagents and libraries; and, (8) easy modification for the identification of PCR duplicates. We present eight adapter designs that work with 72 restriction enzyme combinations. We demonstrate the efficiency of our approach by comparing it with existing methods, and we validate its utility through the discovery of many variable loci in a variety of non-model organisms. 
Our 2RAD/3RAD method is easy to perform, has low startup costs, has increased utility with low-concentration input DNA, and produces libraries that can be highly multiplexed and pooled with other Illumina libraries. ©2019 Bayona-Vásquez et al.
Keywords: HiSeq; Illumina; In-line barcodes; Multiplexing; Next generation sequencing; NovaSeq; Reduced representation library; Restriction enzyme; ddRAD; iTru
Year: 2019 PMID: 31616583 PMCID: PMC6791345 DOI: 10.7717/peerj.7724
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overview of 2RAD/3RAD library construction.
Genomic DNA is digested with two restriction enzymes (A and B). Adapters are ligated to the digested DNA, but only the bottom strand has functional adapters. The top strand has shorter, non-functional versions of the adapters. The ligation products are then used in a limited-cycle PCR with iTru5 and iTru7 primers to form fully active double-stranded DNA molecules. The color scheme follows that of Glenn et al. (2019) and Hoffberg et al. (2016).
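The double digest described above can be illustrated in silico. The following is a minimal sketch, not part of the published protocol: it cuts a linear sequence with two enzymes and keeps only fragments that have one end from each enzyme, which are the fragments expected to receive both a Read 1 and a Read 2 adapter. The pairing of XbaI and EcoRI, the function names, and the toy sequence are illustrative assumptions; only the recognition sequences and cut positions are standard.

```python
import re

# Recognition sequence and top-strand cut offset for two palindromic enzymes.
# Pairing these two enzymes here is an illustrative assumption.
ENZYMES = {
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "EcoRI": ("GAATTC", 1),  # G^AATTC
}


def cut_sites(seq, enzyme):
    """Return top-strand cut positions of one enzyme in a linear sequence."""
    site, offset = ENZYMES[enzyme]
    # A lookahead pattern also reports overlapping occurrences of the site.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, enzyme_a, enzyme_b):
    """Return (start, end, left_enzyme, right_enzyme) for each internal fragment."""
    cuts = sorted(
        [(pos, enzyme_a) for pos in cut_sites(seq, enzyme_a)]
        + [(pos, enzyme_b) for pos in cut_sites(seq, enzyme_b)]
    )
    # Pair each cut with the next one; the two sequence ends are ignored
    # because they are not produced by an enzyme.
    return [(p1, p2, e1, e2) for (p1, e1), (p2, e2) in zip(cuts, cuts[1:])]


def ab_fragments(seq, enzyme_a, enzyme_b):
    """Keep only fragments flanked by one cut from each enzyme."""
    return [f for f in double_digest(seq, enzyme_a, enzyme_b) if f[2] != f[3]]


# Hypothetical toy sequence with one XbaI site and two EcoRI sites.
toy = "AAAA" + "TCTAGA" + "CCCCCCCC" + "GAATTC" + "GGGG" + "GAATTC" + "TT"
fragments = ab_fragments(toy, "XbaI", "EcoRI")
```

In this toy case only the XbaI-to-EcoRI fragment is retained; the EcoRI-to-EcoRI fragment is dropped because both of its ends come from the same enzyme.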
Enzyme combinations and characteristics.
Four design sets each for Read 1 (R1) and Read 2 (R2) are given. For 2RAD, any two-enzyme combination of Read 1 and Read 2 in black can be used. For 3RAD, the third enzyme (in blue) blocks adapter-dimer formation of the Read 1 adapter (File S3). Digestion efficiency is given for three NEB buffers (2.1, 3.1, and CutSmart®), with the best conditions highlighted in green, and poor or important non-standard conditions in red. Sensitivity to methylation in the template sequence is given, as is the optimal temperature for digestion and the number of bases in the recognition sequence. Note: some restriction enzymes are available as high-fidelity versions (HF; e.g., NheI-HF, SpeI-HF, and NsiI-HF), all of which have 100% efficiency in CutSmart® Buffer.
| Read 1 adapter sets | Read 2 adapter sets | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Set | Enzyme | NEB buffer | CpG meth | Cut temp | Base cutter | Set | Enzyme | NEB buffer | CpG meth | Cut temp | Base cutter | ||||
| 2.1 | 3.1 | CutSmart® | 2.1 | 3.1 | CutSmart® | ||||||||||
| R1.A | +∕ − | 37 | 6 | R2.1 | EcoRI-HF | +∕ − | 37 | 6 | |||||||
| XbaI | 75 | − | 37 | 6 | MfeI-HF | − | 37 | 6 | |||||||
| SpeI | − | 37 | 6 | ApoI | 75 | 75 | − | 50* | 6 | ||||||
| R1.B | 50 | 50 | + | 37 | 6 | R2.2 | BamHI-HF | 50 | − | 37 | 6 | ||||
| MspI | 50 | − | 37 | 4 | BclI | 75 | − | 50* | 6 | ||||||
| TaqαI | 75 | − | 65 | 4 | BstYI | 75 | − | 60** | 6 | ||||||
| R1.C | 75 | 50 | − | 37 | 6 | ||||||||||
| 75 | 50 | − | 37 | 6 | R2.3 | DdeI | − | 37 | 4 | ||||||
| NsiI | 75 | − | 37 | 6 | |||||||||||
| R1.D | 75 | − | 25 | 6 | R2.4 | HindIII-HF | − | 37 | 6 | |||||||
| NdeI | − | 37 | 6 | HindIII | 50 | 50 | − | 37 | 6 | ||||||
| MseI | 75 | − | 37 | 4 | |||||||||||
| AseI | 50 | − | 37 | 6 | |||||||||||
| BfaI | − | 37 | 4 | ||||||||||||
Example 2RAD/3RAD adapter stub sequences.
Groups of four adapters form a balanced set; all eight complete sets are available in File S3. Non-complementary sequences are given in lower case. Tag sequences are in italics. Adapters must be hydrated and annealed prior to use (File S4).
| Adapter | Oligo name | Sequence (5′ to 3′) |
|---|---|---|
| iTru_NheI_R1_A | iTru_NheI_R1_stub_A | ACGACGCTCTTCCGATCT |
| | iTru_NheI_R1_RCp_A | /5phos/CTAGC |
| iTru_EcoRI_R2_1 | iTru_EcoRI_R2_RC_stub_1 | AATTA |
| | iTru_EcoRI_R2_1 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| iTru_ClaI_R1_B | iTru_ClaI_R1_stub_B | ACGACGCTCTTCCGATCT |
| | iTru_ClaI_R1_RCp_B | /5phos/CGAT |
| iTru_BamHI_R2_2 | iTru_BamHI_R2_RC_stub_2 | GATCG |
| | iTru_BamHI_R2_2 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| iTru_PstI_R1_C | iTru_PstI_R1_stub_C | ACGACGCTCTTCCGATCT |
| | iTru_PstI_R1_RCp_C | /5phos/G |
| iTru_DdeI_R2_3 | iTru_DdeI_R2_RC_stub_3 | TNAC |
| | iTru_DdeI_R2_3 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
| iTru_CviQI_R1_D | iTru_CviQI_R1_stub_D | ACGACGCTCTTCCGATCT |
| | iTru_CviQI_R1_RCp_D | /5phos/TAC |
| iTru_HindIII_R2_4 | iTru_HindIII_R2_RC_stub_4 | AGCTA |
| | iTru_HindIII_R2_4 | GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT |
Figure 2. Specific adapter sequences and products created during the ligation of 3RAD libraries.
The full adapter sequences for the 3RAD enzyme combination NheI, XbaI and EcoRI-HF (Table 1) are given in the top center boxes. The relevant recognition sequences for the three restriction endonucleases are given in the top outer boxes. The products that are formed from ligation of the triple-enzyme digests and adapters are shown at the bottom.
Figure 3. 2RAD/3RAD workflow for samples with unique or repeated adapter indexes.
DNA is normalized, digested with restriction enzymes, and ligated to adapters. If indexes within adapters uniquely identify all samples (right), samples can be pooled before clean-up and PCR. If indexes do not uniquely identify individuals, PCR must be done separately on each sample, and samples must be normalized and cleaned before pooling. Then, samples are size-selected and quantified to determine if a final P5/P7 PCR should be performed before sequencing.
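The pooling decision in this workflow depends on whether the internal adapter indexes alone distinguish every sample. A minimal sketch of that check follows; the sample names, index labels, and function names are hypothetical, and the sketch assumes each sample is described by one (Read 1 index, Read 2 index) pair.

```python
from collections import Counter


def can_pool_before_pcr(sample_to_indexes):
    """True if every sample has a unique (R1 index, R2 index) pair."""
    counts = Counter(sample_to_indexes.values())
    return all(n == 1 for n in counts.values())


def shared_index_groups(sample_to_indexes):
    """Return index pairs used by more than one sample, with the samples involved."""
    groups = {}
    for sample, pair in sample_to_indexes.items():
        groups.setdefault(pair, []).append(sample)
    return {pair: names for pair, names in groups.items() if len(names) > 1}


# Hypothetical plate layouts: index labels are placeholders, not real adapter names.
unique_layout = {"s1": ("A", "1"), "s2": ("A", "2"), "s3": ("B", "1")}
repeated_layout = {"s1": ("A", "1"), "s2": ("A", "1"), "s3": ("B", "1")}

pool_early = can_pool_before_pcr(unique_layout)
pool_early_repeated = can_pool_before_pcr(repeated_layout)
clashes = shared_index_groups(repeated_layout)
```

When the check fails, the samples listed in `clashes` are the ones that would need separate PCRs, so that the outer indexes can tell them apart.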
Figure 4. Sequencing reads that can be obtained from full-length 2RAD/3RAD library molecules.
The top double-stranded molecule shows a 2RAD/3RAD library molecule prepared as described in the text (File S1). The horizontal arrows beneath the library molecule indicate Illumina sequencing primers (binding to the complementary strand of the library molecules). The tip of the arrowhead indicates the 3′ end of the primer and the direction of elongation for sequencing. Four sequencing reads are shown for each library molecule, with one read for each index and each strand of the genomic DNA, including internal indexes. Reads are arranged 1 to 4 (numbered in magenta) from top to bottom. The arrow immediately 3′ of each primer indicates the data obtained from that primer, with coloring consistent with the 2RAD/3RAD library molecule.
3RAD example projects.
Classification and genome size of taxa, number of samples tested for each, Illumina read length (nt), number of loci obtained after the assembly method, number of loci and SNPs obtained after filtering to retain only polymorphic loci shared by at least 75% of samples, and the average coverage among loci and individuals. The number of loci can be quite large, and the certainty of homology variable, with distantly related samples, particularly if they have large genomes.
| Taxon | Class | Genome size (c-value) | Groups | Indiv. | PE Read Length (nt) | Loci | Final Loci | SNPs | Mean Coverage (x) |
|---|---|---|---|---|---|---|---|---|---|
| Kinosternidae | Reptilia | 2.9 | 7 | 24 | 75 | 233,072 | 4,034 | 27,881 | 12 |
| Ixodidae | Arachnida | 2.7 | 4 | 16 | 150 | 332,057 | 4,484 | 13,136 | 36 |
| | Amphibia | 25.4 | 1 | 21 | 150 | 425,729 | 30 | 360 | 7 |
| | Magnoliopsida | ? | 1 | 24 | 75 | 30,029 | 1,669 | 5,820 | 44 |
| | Reptilia | 2.7 | 3 | 12 | 75 | 103,240 | 16,695 | 25,578 | 11 |
| | Arachnida | 2.4 | 2 | 7 | 150 | 128,899 | 19,843 | 69,518 | 36 |
| | Insecta | 0.7 | 5 | 16 | 75 | 92,687 | 7,779 | 12,099 | 23 |
| | Actinopterygii | 0.9 | 5 | 24 | 75 | 18,629 | 2,140 | 5,429 | 54 |
| | Chondrichthyes | 3.9 | 6 | 24 | 150 | 42,705 | 7,183 | 17,555 | 18 |
| | Chondrichthyes | 3.6 | 7 | 15 | 150 | 44,125 | 5,263 | 12,272 | 27 |
Notes.
Genome sizes are approximations from Gregory (2018, November 20). Animal Genome Size Database. Retrieved from http://www.genomesize.com. We could find no published genome size for Wisteria in the literature, so we omitted it. For all other examples, we averaged reported genome sizes for that species or its closest available relatives; for examples including multiple species (e.g., Kinosternidae), we weighted this average by the taxonomic composition of the sample.
From pyRAD assembly of homologous loci across all Kinosternidae.
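The filtering criterion in the table caption (polymorphic loci shared by at least 75% of samples) can be sketched as follows. This is a simplified illustration under stated assumptions, not the pyRAD implementation: loci are represented as a hypothetical dictionary mapping each locus ID to the consensus sequence recovered per sample, and a locus counts as polymorphic when at least two distinct sequences are observed.

```python
def filter_loci(loci, n_samples, min_fraction=0.75):
    """Keep loci that are polymorphic and present in enough samples.

    loci: dict of locus_id -> dict of sample -> sequence (hypothetical layout).
    n_samples: total number of samples in the project.
    min_fraction: minimum fraction of samples that must carry the locus.
    """
    kept = {}
    for locus_id, seq_by_sample in loci.items():
        shared = len(seq_by_sample) / n_samples >= min_fraction
        polymorphic = len(set(seq_by_sample.values())) > 1
        if shared and polymorphic:
            kept[locus_id] = seq_by_sample
    return kept


# Hypothetical toy data for four samples.
toy_loci = {
    "locusA": {"s1": "ACGT", "s2": "ACGT", "s3": "ACGA"},              # 3/4 samples, variable
    "locusB": {"s1": "TTGA", "s2": "TTGC"},                            # 2/4 samples only
    "locusC": {"s1": "GGCC", "s2": "GGCC", "s3": "GGCC", "s4": "GGCC"},  # fixed
}
final_loci = filter_loci(toy_loci, n_samples=4)
```

Only `locusA` passes in this toy example: `locusB` is found in too few samples and `locusC` shows no variation.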
Figure 5. Agarose gel with 3RAD, 2RAD, and ddRAD library products performed on pUC19 vector with an input quantity of 0.5 ng.
The band close to the 200 bp size standard (arrow above) corresponds to a proper library construct. The band below the 100 bp size standard (arrow below) corresponds to adapter-dimers (File S5). The gel indicates that 3RAD libraries outperformed the other two library types by reducing adapter-dimers and therefore increasing the quantity of desired library constructs.
Figure 6. Scatterplot of the average coverage of all loci (polymorphic and fixed) for each sample relative to sequencing depth of each sample.
Eurycea have the largest genome size and therefore the lowest average coverage per locus with approximately 1,000,000 reads. Average coverage increases as the genome size decreases (Fig. S3).