A long PCR-based approach for DNA enrichment prior to next-generation sequencing for systematic studies.

Simon Uribe-Convers, Justin R Duke, Michael J Moore, David C Tank.

Abstract

PREMISE OF THE STUDY: We present an alternative approach for molecular systematic studies that combines long PCR and next-generation sequencing. Our approach can be used to generate templates from any DNA source for next-generation sequencing. Here we test our approach by amplifying complete chloroplast genomes, and we present a set of 58 potentially universal primers for angiosperms to do so. Additionally, this approach is likely to be particularly useful for nuclear and mitochondrial regions.

METHODS AND RESULTS: Chloroplast genomes of 30 species across angiosperms were amplified to test our approach. Amplification success varied depending on whether PCR conditions were optimized for a given taxon. To further test our approach, some amplicons were sequenced on an Illumina HiSeq 2000.

CONCLUSIONS: Although here we tested this approach by sequencing plastomes, long PCR amplicons could be generated using DNA from any genome, expanding the possibilities of this approach for molecular systematic studies.

Keywords:  angiosperms; chloroplast enrichment; long PCR; next-generation sequencing; plastome; universal chloroplast PCR primers

Year:  2014        PMID: 25202592      PMCID: PMC4104715          DOI: 10.3732/apps.1300063

Source DB:  PubMed          Journal:  Appl Plant Sci        ISSN: 2168-0450            Impact factor:   1.936


Advancements in next-generation sequencing (NGS) technologies have permitted the assembly of large, genome-scale data sets that have shed light on the evolutionary history of many taxa (e.g., Parks et al., 2009; Moore et al., 2010; Xi et al., 2012; Eaton and Ree, 2013; Tennessen et al., 2013). For plant phylogenetics, there has been a major focus on methods for chloroplast phylogenomics (e.g., Parks et al., 2009; Moore et al., 2010), although methods for collecting phylogenomic data sets from the nuclear and mitochondrial genomes have also been developed (e.g., Straub et al., 2012; Eaton and Ree, 2013). Stull et al. (2013) developed a custom RNA probe set designed to capture angiosperm plastomes via solution-based hybridization. While their capture system was broadly successful, Stull et al. (2013) found that the most variable spacer regions were often captured at much-reduced coverage compared to more conserved regions, and were sometimes missed entirely if the target taxon was phylogenetically divergent from one of the 22 plastomes used in the bait design. Moreover, the current cost of the capture probes makes this method most efficient for projects dealing with hundreds of species. Another commonly employed method for plant phylogenomic studies is genome skimming (Straub et al., 2012), which takes advantage of the fact that organellar DNA and nuclear ribosomal DNA are present at high copy numbers in genomic DNA. However, a significant limitation of this method for systematic studies is that only high-copy number regions are recovered consistently across all samples, whereas regions with lower representation are only recovered in some samples and missed completely in others (Straub et al., 2011). This can be problematic for molecular systematic studies where missing data may result in misleading phylogenetic results (Lemmon et al., 2009). 
Moreover, being limited to high-copy regions in the genome becomes restrictive for experimental design as it excludes putatively highly informative regions in the genome such as single-copy nuclear genes (e.g., the single-copy orthologous genes [COSII] and the pentatricopeptide repeat [PPR] gene family; Wu et al., 2006, and Yuan et al., 2009, respectively). As an alternative, we present an NGS approach that combines long PCR and Illumina sequencing to strategically compile phylogenomic data sets for molecular systematic studies. Long PCR, or long-range PCR, uses a combination of two polymerases—a nonproofreading polymerase at high concentration and a proofreading polymerase at a lower concentration—to amplify DNA fragments that range between 3 and 15 kilobases (kb), although cases of extremely large fragments (22–42 kb) have been reported (e.g., Cheng et al., 1994). Long PCR has been used extensively in human genome projects (e.g., Craig et al., 2008) and to sequence complete mitochondrial genomes (e.g., Knaus et al., 2011; Alexander et al., 2013), using both Sanger sequencing and NGS technologies. Here, we use long PCR to generate chloroplast DNA templates for systematic studies using NGS. While we focus on whole chloroplast amplification, this approach is directly translatable to targeted studies where only particular regions of the plastome are of interest (e.g., the inverted repeat or the small single-copy region). In addition, long PCR could also be very useful for the enrichment of mitochondrial and/or nuclear regions where intron sizes are large or unknown, as well as for regions that are difficult to assemble bioinformatically, such as repetitive regions. 
Our focus on the chloroplast genome is driven by its phylogenetic informativeness at essentially all taxonomic scales and its relative ease of amplification (e.g., Downie and Palmer, 1992; Graham and Olmstead, 2000; Moore et al., 2007; Parks et al., 2009; Moore et al., 2010), which have made the chloroplast the workhorse of molecular plant systematics since the beginning of the field. Moreover, the availability of a large number of angiosperm plastome sequences has facilitated the design of potentially universal PCR primers. To test this approach, we amplified the chloroplast genomes of 30 species (17 genera) across angiosperms using a set of 58 chloroplast PCR primers that were designed to potentially be universal in angiosperms and that may work in some gymnosperm lineages.

METHODS AND RESULTS

Representatives of 17 different genera (30 spp.) spanning 12 orders of angiosperms sensu APG III (Angiosperm Phylogeny Group, 2009) were chosen to test this approach (Table 1). Special focus was given to three genera in Orobanchaceae: Lamourouxia Kunth (one species), Bartsia L. (two species), and Castilleja Mutis ex L. f. (12 species). High-quality genomic DNA was extracted from ca. 0.02 g of silica gel–dried or herbarium tissue using a modified 2× cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1987), yielding 30–70 ng/μL of DNA per sample. Using the 83 plastid gene angiosperm alignments of Moore et al. (2010; Appendix S1), we developed 58 primers with a goal of maximizing universality across angiosperms (Table 2). Conserved regions for primer design were identified by eye, and the primers were tested with IDT OligoAnalyzer tools (Integrated DNA Technologies, Coralville, Iowa, USA) to ensure that melting temperatures (Tm) were greater than 50°C and that there were no significant hairpins or self-dimerization problems. From these, 16 overlapping primer combinations were chosen to amplify the entire chloroplast genome in appropriately sized, overlapping fragments, making sure to allow at least 100 bp of overlap between regions (Fig. 1, Table 2) to minimize the decrease in sequencing depth usually associated with the ∼30 bp immediately adjacent to the primer sites (Cronn et al., 2008; Harismendy and Frazer, 2009; Cronn et al., 2012).
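Many of the primers in Table 2 contain IUPAC degenerate bases (e.g., R, Y, M), which is one common way of accommodating sequence variation across taxa. The following is a minimal sketch, not the authors' design procedure, of how a degenerate primer can be expanded into the concrete sequences it represents and how its GC content range can be inspected before checking properties such as Tm in a dedicated tool. The function names `expand` and `gc_range` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def gc_range(primer):
    """Return the lowest and highest GC fraction among the expanded sequences."""
    fractions = [(s.count("G") + s.count("C")) / len(s) for s in expand(primer)]
    return min(fractions), max(fractions)


# atpH.17F from Table 2: two Y positions, so four concrete sequences.
atpH_17F = "CTGCYGCTTCYGTTATTGCT"
variants = expand(atpH_17F)
print(len(variants))
print(gc_range(atpH_17F))
```

Each expanded sequence could then be checked individually for melting temperature and secondary structure in whichever oligo analysis tool is preferred.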
Table 1.

List of species included in this study, with voucher information, tissue sources, and NGS assembly statistics when available. [a]

| Species | Order/Family | Collection no. | Herbarium | Type of tissue | Collection date | No. of amplified regions | Region no. not amplified [b] | Base pairs sequenced [c] | No. of contigs | CAL bp (min–max) | Ave. assembly depth | No. of masked bp [d] | % of masked bp | N50 | % called bases [e] | No. of ambiguous bases | % of ambiguous bases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bartsia inaequalis Benth. | Lamiales/Orobanchaceae | Uribe-Convers 2010-22 | ID | Silica gel–dried | 5 July 2010 | 16 | n/a | 125,283 | 25 | 5011 (204–28,257) | 656 | 2126 | 1.7 | 19,294 | 99.9729 | 34 | 0.02714 |
| Castilleja covilleana L. F. Hend. | Lamiales/Orobanchaceae | Tank 1046 | ID | Silica gel–dried | 13 July 2009 | 16 | n/a | 133,595 | 10 | 13,360 (1222–48,767) | 641 | 101 | 0.08 | 37,107 | 99.9948 | 7 | 0.00524 |
| Castilleja elmeri Fernald | Lamiales/Orobanchaceae | Olmstead 2001-78 | WTU | Silica gel–dried | 4 July 2001 | 16 | n/a | 122,614 | 11 | 11,147 (464–34,602) | 664 | 440 | 0.36 | 33,049 | 99.9976 | 3 | 0.00245 |
| Castilleja linariifolia Benth. | Lamiales/Orobanchaceae | Tank 2001-49 | WTU | Silica gel–dried | 21 July 2001 | 16 | n/a | 122,046 | 8 | 15,256 (819–50,680) | 642 | 260 | 0.21 | 28,529 | 99.9984 | 2 | 0.00164 |
| Castilleja miniata Douglas ex Hook. | Lamiales/Orobanchaceae | Tank 1048-b | ID | Silica gel–dried | 13 July 2009 | 16 | n/a | 134,704 | 4 | 33,676 (6157–75,123) | 844 | 35 | 0.03 | 75,123 | 99.9970 | 4 | 0.00297 |
| Castilleja pallescens (A. Gray) Greenm. | Lamiales/Orobanchaceae | Tank 2009-8 | ID | Silica gel–dried | 6 June 2009 | 16 | n/a | 125,490 | 4 | 31,372 (3039–73,629) | 764 | 29 | 0.02 | 73,629 | 99.9984 | 2 | 0.00159 |
| Bartsia stricta (Kunth) Benth. | Lamiales/Orobanchaceae | Uribe-Convers 2010-24 | ID | Silica gel–dried | 7 July 2010 | 15 | 13 | 119,828 | 14 | 8559 (425–67,195) | 707 | 1045 | 0.87 | 67,195 | 99.9967 | 4 | 0.00334 |
| Castilleja applegatei Fernald | Lamiales/Orobanchaceae | Tank 2001-35 | WTU | Silica gel–dried | 24 June 2001 | 15 | 10 | 119,647 | 14 | 8546 (204–28,559) | 642 | 394 | 0.33 | 18,856 | 99.9983 | 2 | 0.00167 |
| Castilleja virgata (Domb. ex Wedd.) Edwin | Lamiales/Orobanchaceae | Olmstead 2009-22 | WTU | Silica gel–dried | 5 Mar. 2009 | 15 | 7 | 113,650 | 21 | 5412 (178–39,914) | 698 | 1525 | 1.34 | 14,541 | 99.9938 | 7 | 0.00616 |
| Castilleja ortegae Standl. | Lamiales/Orobanchaceae | Egger 1213 | WTU | Silica gel–dried | 22 Feb. 2002 | 15 | 13 | 108,071 | 3 | 36,024 (269–97,615) | 925 | 198 | 0.18 | 97,615 | 99.9991 | 1 | 0.00093 |
| Castilleja lineariloba (Benth.) T. I. Chuang & Heckard | Lamiales/Orobanchaceae | Tank 2002-04 | WTU | Silica gel–dried | 27 Apr. 2004 | 14 | 9, 10 | 122,182 | 23 | 5312 (179–36,972) | 540 | 810 | 0.66 | 11,656 | 99.9844 | 19 | 0.01555 |
| Castilleja victoriae Fairbarns & J. M. Egger | Lamiales/Orobanchaceae | Fairbarns s.n. | WTU | Silica gel–dried | 21 July 2005 | 14 | 10, 14 | 111,371 | 10 | 11,137 (616–44,011) | 688 | 547 | 0.49 | 18,398 | 99.9982 | 2 | 0.00180 |
| Lamourouxia virgata Kunth | Lamiales/Orobanchaceae | Zak & Jaramillo, 3387 | F | Herbarium | 16 Jan. 1988 | 14 | 9, 10 | 108,767 | 30 | 3626 (214–36,850) | 652 | 2255 | 2.07 | 11,012 | 99.9669 | 36 | 0.03310 |
| Castilleja oresbia Greenm. | Lamiales/Orobanchaceae | Tank 2001-27 | WTU | Silica gel–dried | 19 June 2001 | 10 | 6, 9, 10, 13, 14, 16 | 83,384 | 20 | 4169 (222–36,830) | 717 | 1544 | 1.85 | 9986 | 99.9676 | 27 | 0.03238 |
| Castilleja arvensis Cham. & Schltdl. | Lamiales/Orobanchaceae | Tank 2005-27 | WTU | Silica gel–dried | 16 Apr. 2005 | 6 | 4, 6, 7, 8, 9, 10, 13, 14, 15, 16 | 73,378 | 15 | 4892 (186–36,621) | 701 | 1187 | 1.62 | 9803 | 99.9877 | 9 | 0.01227 |
| Penstemon montanus Greene var. idahoensis (D. D. Keck) Cronq. | Lamiales/Plantaginaceae | Brunsfeld 4159 | ID | Herbarium | 14 June 2001 | 16 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Balsamorhiza sagittata (Pursh) Nutt. | Asterales/Asteraceae | Willard 2013-42 | ID | Silica gel–dried | 3 July 2013 | 15 | 5 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Lomatium dissectum (Nutt.) Mathias & Constance | Apiales/Apiaceae | Poor 21 | ID | Herbarium | 27 May 2004 | 15 | 14 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Nuphar polysepala Engelm. | Nymphaeales/Nymphaeaceae | Morales-Briones 412 | ID | Silica gel–dried | 8 July 2013 | 15 | 5 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Salix scouleriana Barratt ex Hook. | Malpighiales/Salicaceae | Brunsfeld 7213 | ID | Herbarium | 11 June 2008 | 15 | 9 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Crataegus columbiana Howell | Rosales/Rosaceae | Hetrick 1005 | ID | Herbarium | 10 Apr. 1996 | 13 | 9, 14, 17 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Polygonum douglasii Greene | Caryophyllales/Polygonaceae | Smith 8040 | ID | Herbarium | 23 June 2005 | 12 | 5, 6, 9, 15 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Umbellularia californica (Hook. & Arn.) Nutt. | Laurales/Lauraceae | Halse 6901 | ID | Herbarium | 28 Mar. 2002 | 12 | 6, 8, 9, 10 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Bromus tectorum L. | Poales/Poaceae | Clippinger 2 | ID | Herbarium | 1 May 2004 | 11 | 5, 6, 9, 11, 17 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Alnus rhombifolia Nutt. | Fagales/Betulaceae | Gray 52 | ID | Herbarium | 7 Aug. 1989 | 10 | 5, 6, 8, 9, 10, 14 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Poa bulbosa L. | Poales/Poaceae | Willard 2013-26 | ID | Silica gel–dried | 3 July 2013 | 10 | 5, 6, 9, 12, 13, 14 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Senecio integerrimus Nutt. var. exaltatus (Nutt.) Cronq. | Asterales/Asteraceae | Willard 2013-21 | ID | Silica gel–dried | 3 July 2013 | 10 | 3, 5, 6, 8, 9, 11 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Abies amabilis Douglas ex J. Forbes | Pinales/Pinaceae | 1419-46 | WA Park Arb. | Silica gel–dried | 24 May 2009 | 9 | 4, 6, 7, 9, 10, 11, 12 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Capsella bursa-pastoris (L.) Medik. | Brassicales/Brassicaceae | Brunsfeld 6313 | ID | Herbarium | 1 June 2005 | 8 | 4, 6, 8, 9, 10, 13, 14, 17 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Lupinus leucophyllus Douglas ex Lindl. | Fabales/Fabaceae | Willard 2013-03 | ID | Silica gel–dried | 3 July 2013 | 8 | 1, 6, 8, 9, 10, 12, 13, 14 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Abies fraseri (Pursh) Poir. | Pinales/Pinaceae | 1005-47 | WA Park Arb. | Silica gel–dried | 24 May 2009 | 7 | 4, 5, 6, 7, 8, 9, 10, 11, 12 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Balsamorhiza hookeri Nutt. | Asterales/Asteraceae | Smith 9421 | ID | Herbarium | 4 June 2007 | 7 | 4, 5, 6, 7, 8, 9, 10, 11, 13 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Abies grandis (Douglas ex D. Don) Lindl. | Pinales/Pinaceae | 1084-49 | WA Park Arb. | Silica gel–dried | 24 May 2009 | 6 | 1, 3, 4, 6, 7, 8, 9, 10, 11, 12 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| Average | | | | | | | | 11,493 | 14.13 | 13,166.60 | 698.73 | 833.07 | 0.79 | 35,052.87 | 99.99 | 10.6 | 0.01 |

Note: CAL = contig average length; F = Field Museum of Natural History Herbarium; ID = University of Idaho Stillinger Herbarium; WA Park Arb. = Washington Park Arboretum; WTU = University of Washington Herbarium.

[a] All data from the 16 chosen primer combinations.

[b] The number of the regions is the same as the order in Fig. 1.

[c] Base pairs (bp) sequenced is the sum of all contigs when including only one copy of the inverted repeat.

[d] Number of bases masked because the minimum sequencing depth of 5× was not achieved.

[e] Percentage of unambiguously called bases.

Table 2.

Universal angiosperm primers used for chloroplast genome amplifications. The 16 primer combinations chosen for this study are those listed with a region number, with approximate amplicon sizes in kilobases (kb) indicated. [a]

| Region no. | Approx. size (kb) | Primer (F/R) | Primer sequence (5′–3′) |
|---|---|---|---|
| 1 | 8 | trnH.GUG.6R | CCTTRATCCACTTGGCTACAT |
| 1 | | psbK.195R | ACTTACAGCAGCTTGCCAAAC |
| 2/2a | 10.3/6.3 | trnQ.UUG.50R | GGACGGAAGGATTCGAACC |
| 2a | | atpH.17F | CTGCYGCTTCYGTTATTGCT |
| 2b | 4 | atpF.65R | CGGTATTAAACCCGAAACTCC |
| 2/2b | | rpoC2.4805F | GYCGTATYGATTGGTTRAAAGG |
| 3 | 7 | atpI.705R | CRGCTAAAGTTGCAAAAATAAGAGCT |
| 3 | | rpoC1.1670F | GRGATCAAATGGCTGTTCAT |
| 4 | 9 | rpoC2.520R | GTTCGTACAGCAGTATCYACAAC |
| 4 | | petN.3R | GCCCAAGCRAGACTTACTATATCC |
| 5 | 10.5 | trnC.GCA.47F | CCCAGTTCAAATCCGGGT |
| 5 | | psaB.2170F | GCRGCTTTCTTGATTGCYTC |
| 6 | 10 | trnfM.CAU.21R | GGTTATGAGCCTTGCGAGCTA |
| 6 | | trnT.UGU.17F | GGTTAGAGCATCGCATTTGTAATG |
| 7 | 10.3 | rps4.380R | GGTTTGCARCGATAACTTGGKATATC |
| 7 | | rbcL.178R | GTCCATGTACCAGTAGARGATTC |
| 8 | 9.2 | rbcL.2F | TGTCACCACAAACAGARACTAAAG |
| 8 | | psbJ.3F | GGCYGATACTACTGGAAGRAT |
| 9 | 9.8 | petA.920F | CTTCAAGAYCCATTACGTGTHCAAG |
| 9 | | psbB.160R | TRCCYTGTCTCCACATTGGAT |
| 10 | 10.9 | psbB.3F | GGGTTTRCCTTGGTATCGTGT |
| 10 | | rps3.17F.new | ATCCACTTGGTTTYMGACTTGG |
| 11 | 8.7 | rpl16.3R | AACCAACGAGTCACACACTAAGC |
| 11/16 | | ycf2.5100R | CAGATCATGAATGTTTGGAATCCAT |
| 12 | 10 | ycf2.2300F | TCGGGATCCTRATGCATATAGATAC |
| 12 | | rps12.190F | GTTGCCAGAGTACGMTTAACCT |
| 13 | 11 | rps12.360R | CCCTTGTTGACGATCCTTTACTC |
| 13 | | ycf1.59R | CCGACCACAACGACCGAAT |
| 14/15 | 11.2 | trnN.GUU.7R | CCGCTCTACCACTGAGCTAC |
| 14 | | ndhA.535F | GCTGCTCAATCDATTAGTTATGAA |
| 15 | 10.5 | ndhI.194R | CGAACRCATACTTCACAAGCAA |
| 16 | 8.2 | psbA.640F | GCTATGCATGGTTCYTTGGTAAC |
| | | rps16.50R | CGAACATCAATTGCAACGATTCGATA |
| | | rps16.50F | TATCGAATCGTTGCAATTGATGTTCG |
| | | psbK.200F | GGCAAGCTGCTGTAAGTTTTCGA |
| | | atpF.70F | GGGTTTAATACCGATATTTTAGCAAC |
| | | trnR.UCU.45F | GGTATAGGTTCAAATCCTATTGGAC |
| | | trnQ.UUG.47F | CGGAGGTTCGAATCCTTCC |
| | | trnK.UUU.3R | GAGATGGCAACTCAATCGTTG |
| | | trnK.UUU.3F | CAACGATTGAGTTGCCATCTC |
| | | atpA.430F | CGTTCYGTATATGARCCTCTTCAAAC |
| | | atpA.820F | ATCGMCAAATGTCTCTTCTATTAMG |
| | | ccsA.890R | TCCAAGTAATAAANGCCCAAGTTTC |
| | | trnR.ACG.15F | GAGGATTAGAGCACGTGG |
| | | ycf1.70F | GTGGTCGGACTCTATTATGGAT |
| | | trnL.UAG.18F | GGTAGACACGCTGCTCTTAGG |
| | | trnL.UAG.19F | GTAGACACGCTGCTCTTAGGAAG |
| | | rps12.320R | GGGTTCCTCGAACAATGTGATATC |
| | | rpl2.550F | GTGCTGTAGCGAAACTGATTG |
| | | rpl2.640F | TCAGCAACAGTCGGACARGT |
| | | psbT.3F | TGGAAGCATTGGTTTATACATTYCT |
| | | atpB.1290R | ARGGTTGTGATAAGAAACGYTCAA |
| | | trnT.UGU.42F | GATGGTCATCGGTTCGATTC |
| | | psbC.3R | AGTTCCATTAAAGAGCGTTTCC |
| | | psbD.860F | CYGGTTTATGGATGAGYGCT |
| | | rpoB.900R | CGTCGACCAATCYTTCCTAATTC |
| | | rpoB.470R | CCRGGRCTTTGCAATATTTGATTG |
| | | rpoC2.430R | ATRGGTAAATCAATCATTTGYCCTTG |

Overlap between regions in bp [b]

| Regions | Overlap (bp) |
|---|---|
| 1 & 2 | 542 |
| 1 & 2a | 542 |
| 2a & 2b | 627 |
| 2b & 3 | 2059 |
| 2 & 3 | 2059 |
| 3 & 4 | 1274 |
| 4 & 5 | 860 |
| 5 & 6 | 618 |
| 6 & 7 | 764 |
| 7 & 8 | 153 |
| 8 & 9 | 1216 |
| 9 & 10 | 135 |
| 10 & 11 | 771 |
| 11 & 12 | 2781 |
| 12 & 13 | 142 |
| 13 & 14 | 392 |
| 14 & 15 | 1911 |
| 16 & 1 | 840 |

[a] All primers are shown in the 5′ to 3′ direction; the name of each primer consists of three parts: the gene in which the primer is anchored, the approximate position of the primer within that gene, and either an “F” or an “R.” It is important to note that the F and R designations do not indicate that the primer should be used as a forward or reverse primer; rather, they indicate the 5′ to 3′ orientation of the primer with respect to the gene. A primer that is designated as an “F” primer has its 5′ to 3′ orientation in the same orientation as the gene (i.e., on the forward strand), whereas an “R” primer is oriented in the direction opposite to the 5′ to 3′ orientation of the gene (i.e., on the reverse strand).

[b] Overlap between regions is given in number of base pairs (bp), without taking the length of the primers into consideration.
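The F/R naming convention can be illustrated with primers that sit at the same position in a gene but point in opposite directions: such a pair should be exact reverse complements of one another. The short sketch below checks this for rps16.50R/rps16.50F and trnK.UUU.3R/trnK.UUU.3F from Table 2. The helper `reverse_complement` is a hypothetical name, and the translation table also covers IUPAC degenerate codes so that it could be applied to any primer in the table.

```python
# Complement table for A/C/G/T plus IUPAC degenerate codes
# (R<->Y, K<->M, B<->V, D<->H; S, W and N are their own complements).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


rps16_50R = "CGAACATCAATTGCAACGATTCGATA"
rps16_50F = "TATCGAATCGTTGCAATTGATGTTCG"
trnK_3R = "GAGATGGCAACTCAATCGTTG"
trnK_3F = "CAACGATTGAGTTGCCATCTC"

print(reverse_complement(rps16_50R) == rps16_50F)  # True
print(reverse_complement(trnK_3R) == trnK_3F)      # True
```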

Fig. 1.

The final annotated chloroplast genome assembly of Bartsia inaequalis with the 16 overlapping primer combinations indicated. Note that the primer combinations for regions 11, 12, 13, and 16 amplify both inverted repeat A and B in a single reaction. Photos by Simon Uribe-Convers.

PCRs were performed using a combination of two high-quality Taq polymerases, QIAGEN Taq DNA Polymerase (5 units/µL) and QIAGEN HotStar HiFidelity DNA Polymerase (2.5 units/µL) (QIAGEN, Valencia, California, USA), to obtain amplification of fragments between 5 kb and 12 kb. The QIAGEN HotStar HiFidelity DNA Polymerase was diluted to 0.2 units/µL by combining 0.1 µL of 5× QIAGEN HotStar HiFidelity PCR buffer, 0.36 µL of double-deionized water (ddH2O), and 0.04 µL of QIAGEN HotStar HiFidelity DNA Polymerase (2.5 units/µL). 
Each PCR had a total volume of 25 µL, was prepared on ice, and contained the following reagents: 2.5 µL of 10× PCR buffer (QIAGEN CoralLoad or colorless, with 15 mM MgCl2), 1.0 µL MgCl2 (QIAGEN 25 mM), 0.75 µL of deoxyribonucleotide triphosphates (dNTPs, each at 10 mM), 5.0 µL of 5× QIAGEN Q solution, 2.5 µL of both forward and reverse primers (each at 5 µM), 0.25 µL (1.25 units) of QIAGEN Taq DNA Polymerase, 0.5 µL of the diluted QIAGEN HotStar HiFidelity DNA Polymerase solution, 9 µL of ddH2O, and 1.0 µL of DNA template. Long PCR profiles were as follows: preheat at 93°C, initial denaturation at 93°C for 3 min followed by 35 cycles of denaturation at 93°C for 15 s, annealing at 48–68°C (depending on the primer pair) for 30 s, and extension at 68°C for 5–12 min (1 min/kb of target). To assess amplification, 2 µL of the final reactions were examined on a 1% agarose gel with appropriate size standards and the final products were kept at 4°C. The complete, step-by-step long PCR protocol can be found in Appendix 1.
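The recipe above lends itself to a quick arithmetic check and to scaling for a batch of reactions. The following sketch uses the per-reaction volumes stated in the text; the function names and the optional pipetting overage are illustrative assumptions and are not part of the published protocol.

```python
# Per-reaction volumes in microliters, as listed in the text (25 µL reaction).
RECIPE_UL = {
    "10x PCR buffer": 2.5,
    "MgCl2 (25 mM)": 1.0,
    "dNTPs (10 mM each)": 0.75,
    "5x Q solution": 5.0,
    "forward primer (5 uM)": 2.5,
    "reverse primer (5 uM)": 2.5,
    "Taq DNA Polymerase": 0.25,
    "diluted HotStar HiFidelity": 0.5,
    "ddH2O": 9.0,
    "DNA template": 1.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale all reagents except template; `overage` is an assumed pipetting excess."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in RECIPE_UL.items() if name != "DNA template"}


def final_conc(stock_conc, volume_ul, total_ul=25.0):
    """Final concentration after dilution, in the same units as the stock."""
    return stock_conc * volume_ul / total_ul


def extension_minutes(amplicon_kb, min_per_kb=1.0):
    """Extension time at the stated rate of 1 min per kb of target."""
    return amplicon_kb * min_per_kb


print(sum(RECIPE_UL.values()))   # total reaction volume in µL
print(final_conc(5.0, 2.5))      # final primer concentration in µM
print(extension_minutes(10.3))   # minutes for a 10.3-kb amplicon
```

The volume check confirms that the listed reagents add up to the stated 25 µL, and the primer calculation reproduces the 0.5 µM final concentration given in Appendix 1.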
Appendix 1.

Protocol for long PCR for amplification of 4–20-kb targets. Developed by the Tank Laboratory, University of Idaho; published January 2014.

| Product | Contents | Catalog no. |
|---|---|---|
| QIAGEN Taq DNA Polymerase [1] | 250 units Taq DNA Polymerase, 10× PCR Buffer, 5× Q-Solution, 25 mM MgCl2 | 201205 |
| QIAGEN HotStar HiFidelity DNA Polymerase [2] | 100 units HotStar HiFidelity DNA Polymerase [2], 10× HotStar PCR Buffer, 5× Q-solution, 25 mM MgSO4 | 202602 |

[1] Almost any high-quality Taq polymerase should work; however, cheap Taq polymerases (e.g., QIAGEN TopTaq or Promega GoTaq) do not work and result in large smears, rather than discrete bands.

[2] QIAGEN HotStar HiFidelity DNA Polymerase was the only high-fidelity polymerase used in this study.

[3] Q-solution does seem to be an important additive, thus the use of QIAGEN Taq. However, this does work using Q-solution with other high-quality Taq polymerases such as Promega’s or New England Biolab’s standard Taq (i.e., if you have a stock of Q-solution, but no QIAGEN Taq).

For the three genera of Orobanchaceae in which PCR optimization was performed, amplification of the fragments was straightforward and had an average success rate of 89.7% (range = 73–100%). The most difficult regions to amplify were regions 2 (trnQ(UUG)-rpoC2), 9 (petA-psbB), 10 (psbB-rps3), and 14 (trnN(GUU)-ndhA), which are among the largest fragments (10.3 kb, 9.8 kb, 10.9 kb, and 11.2 kb, respectively; Table 2). It was possible to split region 2 into two smaller fragments, 2a (trnQ(UUG)-atpH: 6.3 kb) and 2b (atpF-rpoC2: 4 kb), which facilitated its amplification in several taxa. This was not the case for regions 9, 10, and 14, for which multiple long PCR experiments using varying amounts of DNA template were necessary to obtain successful amplifications. Amplification outside of Orobanchaceae was highly variable, with an average success rate of 70.8% (range = 22–100%) with regions 5, 6, 9, 10, and 11 showing the lowest success. Importantly, the results for these taxa were obtained after just two rounds of PCR where the annealing temperatures were changed to either 48°C or 55°C. Although we did not optimize the long PCRs for each group, we are confident that optimization on a per group basis (e.g., increasing template volume, altering annealing temperatures, and/or long PCR profiles) and/or the use of fresh tissue for DNA extractions would improve success rates. Furthermore, if genomic rearrangements and/or primer mismatches are present in certain groups, primer combinations other than the 16 that were used here could be tested (Table 2). Nevertheless, we successfully amplified all 16 regions in seven species, whereas in the remaining 23 species it was only possible to amplify between six (1 sp.) and 15 (8 spp.) regions (Table 1). These results translate to 21 species having at least 12 regions amplified (114.7 kb based on potential amplicon size), representing ca. 74% of the chloroplast genome when considering only one copy of the inverted repeat. 
Even the species with the smallest number of amplified fragments (Castilleja arvensis Cham. & Schltdl.) was represented by ∼73 kb of data, exemplifying the effectiveness of this approach. It is notable that many of the DNAs that were tested were extracted from herbarium tissues that ranged from five to 25 yr old when isolated. In addition, we tested these primers in several species of Abies Mill. (Pinaceae; Table 1) with surprising success, amplifying between six and nine regions without any PCR optimization. We caution that our long PCR protocol works best using recent DNA extractions that have not been through multiple freeze-thaw cycles. Ideally, long PCR should be conducted using new DNA extractions that are stored at 4°C while performing experiments. Additionally, discrete PCR bands were only obtained using high-quality Taq polymerases. 
When conventional polymerases were used (e.g., GoTaq [Promega Corporation, Madison, Wisconsin, USA] or TopTaq [QIAGEN]), the resulting PCR products were smears rather than discrete bands and were not used for sequencing. To confirm that our long PCR approach was compatible with NGS and that our primers would yield complete chloroplast genomes, the amplicons from each of the 15 Orobanchaceae taxa were purified by precipitation in a 20% polyethylene glycol 8000 (PEG)/2.5 M NaCl solution and washed in 70% ethanol. The amplicons were sheared by nebulization at 30 psi for 70 s, yielding an average shear size of 500 bp as measured by a Bioanalyzer High-Sensitivity Chip (Agilent Technologies, Santa Clara, California, USA). DNA normalization is a critical step when pooling samples for multiplexing in NGS; however, due to the large number of plastomes per cell and the very few samples that were being sequenced in such a high-throughput sequencing platform, no DNA quantification was made and the sheared amplicons were pooled by species at equal volume ratios. Sequencing libraries were constructed using the Illumina TruSeq library preparation kit and protocol (Illumina, San Diego, California, USA) and were standardized at 2 nM prior to sequencing. Library concentrations were determined using the KAPA qPCR kit (KK4835; Kapa Biosystems, Woburn, Massachusetts, USA) on an ABI StepOnePlus Real-Time PCR System (Life Technologies, Grand Island, New York, USA). The resulting libraries were multiplexed in one Illumina HiSeq 2000 lane (∼187.5 million reads per lane [Glenn, 2011]) at the Vincent J. Coates Genomics Sequencing Laboratory at the University of California, Berkeley, yielding ∼12.5 million 100-bp single-end reads for each taxon (GenBank Sequence Read Archive accessions: SRR1023085, SRR1023089, SRR1023095, SRR1023112, SRR1023113, SRR1023126, SRR1023128–SRR1023136). Average depth of coverage of our sequencing experiment was ∼8333× (taking 150 kb as the average plastome size). 
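The depth-of-coverage figure quoted above follows from a simple relation: reads per sample times read length, divided by the size of the target. A minimal sketch of that arithmetic, using the numbers stated in the text (the function name `mean_depth` is hypothetical):

```python
def mean_depth(n_reads, read_len_bp, target_bp):
    """Expected mean sequencing depth for a target of the given size."""
    return n_reads * read_len_bp / target_bp


lane_reads = 187.5e6      # approximate reads per HiSeq 2000 lane, as cited
n_taxa = 15               # Orobanchaceae taxa multiplexed in the lane
plastome_bp = 150_000     # average plastome size assumed in the text

reads_per_taxon = lane_reads / n_taxa          # 12.5 million
depth = mean_depth(reads_per_taxon, 100, plastome_bp)
print(round(depth))  # 8333
```

This is the raw depth before read cleaning and duplicate removal, which is why the assembled depth reported in Table 1 is much lower.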
The results obtained here clearly do not maximize the potential of the Illumina HiSeq 2000 for plastome sequencing. To take full advantage of the large amount of data produced by a HiSeq 2000 for plastome sequencing, it would be theoretically possible to sequence ∼4170 samples per lane and still reach the 30× minimum threshold generally regarded as ideal for plastome sequencing (Straub et al., 2012). However, high-level multiplexing in NGS with this or any other high-throughput method requires careful normalization of DNA concentrations across samples and sufficient adapter barcodes; commonly used commercial kits currently offer either 96 (NEXTflex DNA Barcode kit; Bioo Scientific, Austin, Texas, USA) or 386 (Fluidigm, San Francisco, California, USA). Alternatively, one could choose to perform this type of experiment on an NGS platform that yielded a lesser amount of data, e.g., 1 million 250-bp paired-end reads on an Illumina MiSeq Reagent Nano Kit version 2, which would allow a 30× sequencing depth for 96 samples (or 50× sequencing depth for 64 samples). Because of the high depth of coverage of our sequencing experiment, reads were cleaned at high stringency (minimum quality = 30/40, maximum number of low-quality bases per read = 5, maximum number of duplicate reads = 10, minimum number of duplicate reads = 2) and assembled against a reference genome (Sesamum indicum L., GenBank accession no. JN637766) using the Alignreads pipeline version 2.25 (Straub et al., 2011) with the following options: percent identity = medium, minimum coverage depth = 5, and single nucleotide polymorphism (SNP) minimum coverage depth = 25 with 80% of those reads supporting the SNP. 
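The multiplexing estimates in this paragraph can be reproduced by asking how many plastome-sized targets fit into a run at a chosen minimum depth. The sketch below is illustrative only; `max_samples` and `depth_per_sample` are hypothetical helpers, and the MiSeq calculation assumes that "1 million 250-bp paired-end reads" means one million read pairs (two million 250-bp reads), which is the interpretation under which the quoted depths are reached.

```python
def max_samples(total_reads, read_len_bp, target_bp, min_depth):
    """Number of samples that can share a run while keeping min_depth each."""
    return total_reads * read_len_bp / (target_bp * min_depth)


def depth_per_sample(total_reads, read_len_bp, target_bp, n_samples):
    """Mean depth per sample when a run is split evenly across n_samples."""
    return total_reads * read_len_bp / (target_bp * n_samples)


PLASTOME_BP = 150_000

# HiSeq 2000 lane: ~187.5 million 100-bp reads, 30x minimum depth.
hiseq_samples = max_samples(187.5e6, 100, PLASTOME_BP, 30)
print(round(hiseq_samples))  # 4167, in line with the ~4170 quoted

# MiSeq Nano run, ASSUMING 1 million read pairs = 2 million 250-bp reads.
miseq_reads = 2 * 1_000_000
print(depth_per_sample(miseq_reads, 250, PLASTOME_BP, 96))
print(depth_per_sample(miseq_reads, 250, PLASTOME_BP, 64))
```

These are upper bounds that assume perfectly even pooling and no loss of reads to quality filtering, so practical sample numbers would be lower.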
The resulting assemblies had an average depth of ∼700×, an average of 0.79% bases that were masked for not reaching the minimum sequencing depth of 5×, and an average N50 of 35,053 bp (Table 1; contigs and ACE files deposited in the Dryad Digital Repository: http://doi.org/10.5061/dryad.kc75n; Uribe-Convers et al., 2014). We noticed a small decrease in sequencing depth in regions immediately adjacent to some primer sites, which is a phenomenon that has been reported in the past (Whittall et al., 2010; Knaus et al., 2011; reviewed in Cronn et al., 2012). Given that our shortest overlap between amplicons is 135 bp (between regions 9 and 10; Table 2), with the rest spanning hundreds of base pairs (Table 2), and that our experiment yielded a high sequencing depth, we had no problems calling bases unambiguously (99.99% on average, Table 1). The Bartsia inaequalis Benth. assembly (Fig. 1; GenBank accession no. KF922718) was annotated using DOGMA (Wyman et al., 2004) and visualized in GenomeVx (Conant and Wolfe, 2008).
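For readers less familiar with the assembly statistics in Table 1, the sketch below shows the standard definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly) and the percentage of masked bases. The contig lengths in the example are hypothetical; the masked-base example uses the Bartsia inaequalis values from Table 1.

```python
def n50(contig_lengths):
    """Contig length at which the running total first reaches half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def percent_masked(masked_bp, total_bp):
    """Percentage of bases masked for falling below the minimum depth."""
    return 100 * masked_bp / total_bp


example_contigs = [50, 40, 30, 20, 10]   # hypothetical lengths, total 150
print(n50(example_contigs))              # 40

# Bartsia inaequalis: 2126 masked bp out of 125,283 bp sequenced (Table 1).
print(round(percent_masked(2126, 125_283), 2))  # 1.7
```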

CONCLUSIONS

We present an alternative approach for systematic studies that combines long PCR and NGS to strategically compile phylogenomic data sets for molecular systematic studies. This approach is on par with genome skimming in terms of costs, but it has the advantage of being a targeted approach and has the potential to produce data more uniformly across samples, i.e., minimizing missing data across taxa. Although this approach was only tested with chloroplast data, we emphasize that the long PCR amplicons can be generated using DNA from any genome, expanding the possibilities of long PCR and NGS for molecular systematic studies. This last point is important for studies targeting the mitochondrion or low-copy regions of the genome that otherwise might be missed or not shared across all samples using genome skimming approaches. For example, this approach may be particularly useful for the enrichment of nuclear regions, where intron sizes are large or unknown.

Appendix 1. Continued.

| Reagents to prepare the HotStar Taq dilution | Volumes for 25 reactions (total 12.5 µL) | Volumes for 50 reactions (total 25 µL) | Volumes for 100 reactions (total 50 µL) |
|---|---|---|---|
| 5× HotStar HiFidelity PCR buffer | 2.5 µL | 5.0 µL | 10 µL |
| H2O | 9.0 µL | 18 µL | 36 µL |
| QIAGEN HotStar Taq | 1.0 µL | 2.0 µL | 4.0 µL |

| Cocktail | ×1 (25 µL reaction) |
|---|---|
| 10× PCR buffer (QIAGEN CoralLoad PCR Buffer or colorless, 15 mM MgCl2) | 2.5 µL |
| MgCl2 (25 mM) | 1.0 µL (3 mM final conc.; adjustable) |
| dNTP (10 mM each) | 0.75 µL (3 µL of 2.5 mM each) |
| Q solution (5×) | 5.0 µL |
| 5′ primer (5 µM) | 2.5 µL (0.5 µM final conc.) |
| 3′ primer (5 µM) | 2.5 µL (0.5 µM final conc.) |
| Taq DNA polymerase (QIAGEN) | 0.25 µL (1.25 units) [1] |
| QIAGEN HotStar DNA polymerase (diluted) | 0.50 µL |
| H2O | to 25 µL (9 µL if using 1.0 µL DNA) |

[1] The success rate was lower when a smaller quantity was used, but the best DNAs work with ≥0.125 µL.

Primer combinations for long PCR amplification of the chloroplast genome.¹,²

| Region no. | Approx. size (kb) | Primers (F/R)³ | Primer sequence (5′–3′) |
| --- | --- | --- | --- |
| 1 | 8 | trnH.GUG.6R | CCTTRATCCACTTGGCTACAT |
| | | psbK.195R | ACTTACAGCAGCTTGCCAAAC |
| 2 | 10.3 | trnQ.UUG.50R | GGACGGAAGGATTCGAACC |
| | | rpoC2.4805F | GYCGTATYGATTGGTTRAAAGG |
| 2a⁴ | 6.3 | trnQ.UUG.50R | GGACGGAAGGATTCGAACC |
| | | atpH.17F | CTGCYGCTTCYGTTATTGCT |
| 2b⁴ | 4 | atpF.65R | CGGTATTAAACCCGAAACTCC |
| | | rpoC2.4805F | GYCGTATYGATTGGTTRAAAGG |
| 3 | 7 | atpI.705R | CRGCTAAAGTTGCAAAAATAAGAGCT |
| | | rpoC1.1670F | GRGATCAAATGGCTGTTCAT |
| 4 | 9 | rpoC2.520R | GTTCGTACAGCAGTATCYACAAC |
| | | petN.3R | GCCCAAGCRAGACTTACTATATCC |
| 5 | 10.5 | trnC.GCA.47F | CCCAGTTCAAATCCGGGT |
| | | psaB.2170F | GCRGCTTTCTTGATTGCYTC |
| 6 | 10 | trnfM.CAU.21R | GGTTATGAGCCTTGCGAGCTA |
| | | trnT.UGU.17F | GGTTAGAGCATCGCATTTGTAATG |
| 7 | 10.3 | rps4.380R | GGTTTGCARCGATAACTTGGKATATC |
| | | rbcL.178R | GTCCATGTACCAGTAGARGATTC |
| 8 | 9.2 | rbcL.2F | TGTCACCACAAACAGARACTAAAG |
| | | psbJ.3F | GGCYGATACTACTGGAAGRAT |
| 9 | 9.8 | petA.920F | CTTCAAGAYCCATTACGTGTHCAAG |
| | | psbB.160R | TRCCYTGTCTCCACATTGGAT |
| 10 | 10.9 | psbB.3F | GGGTTTRCCTTGGTATCGTGT |
| | | rps3.17F.new | ATCCACTTGGTTTYMGACTTGG |
| 11 | 8.7 | rpl16.3R | AACCAACGAGTCACACACTAAGC |
| | | ycf2.5100R | CAGATCATGAATGTTTGGAATCCAT |
| 12 | 10 | ycf2.2300F | TCGGGATCCTRATGCATATAGATAC |
| | | rps12.190F | GTTGCCAGAGTACGMTTAACCT |
| 13⁵ | 11 | rps12.360R | CCCTTGTTGACGATCCTTTACTC |
| | | ycf1.59R | CCGACCACAACGACCGAAT |
| 14 | 11.2 | trnN.GUU.7R | CCGCTCTACCACTGAGCTAC |
| | | ndhA.535F | GCTGCTCAATCDATTAGTTATGAA |
| 14′⁶ | 7 | trnR.ACG.15F | GAGGATTAGAGCACGTGG |
| | | ccsA.890R | TCCAAGTAATAAANGCCCAAGTTTC |
| 15 | 10.5 | ndhI.194R | CGAACRCATACTTCACAAGCAA |
| | | trnN.GUU.7R | CCGCTCTACCACTGAGCTAC |
| 16 | 8.2 | psbA.640F | GCTATGCATGGTTCYTTGGTAAC |
| | | ycf2.5100R | CAGATCATGAATGTTTGGAATCCAT |

¹ Universal primers designed by M.J.M.; compiled and tested by D.C.T. and S.U.C.

² Ta should be ∼5°C below the Tm of the primers; however, temperatures of 55°C have worked for all primer combinations.

³ The name of each primer consists of three parts: (1) the gene in which the primer is anchored, (2) the approximate position of the primer within that gene (based on the all-angiosperm alignment of Moore et al., 2007), and (3) either an “F” or an “R.” The F and R designations do not indicate that the primer should be used as a forward or reverse primer; rather, they indicate the 5′ to 3′ orientation of the primer with respect to the gene. In other words, an F primer has its 5′ to 3′ orientation in the same direction as the gene (i.e., on the forward strand, or from start to stop), whereas an R primer is oriented opposite to the 5′ to 3′ orientation of the gene (i.e., on the reverse strand).

⁴ Regions 2a and 2b can be used to amplify region 2 in two pieces.

⁵ Regions 11, 12, and 13 represent a large portion of the inverted repeat (IR); thus, one amplification serves for both IRa and IRb.

⁶ Region 14′ amplifies ca. 2/3 of region 14.
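
The primer naming convention described above (gene, approximate position, F/R orientation) and the degenerate bases used in several primers lend themselves to a small helper. The sketch below is illustrative only: the regular expression, the function names, and the returned field names are assumptions made here, and the optional trailing tag is included solely to accommodate the name `rps3.17F.new`. The lookup table follows the standard IUPAC nucleotide ambiguity codes.

```python
import re

# Standard IUPAC nucleotide codes: each symbol stands for a set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# gene . position + F/R, with an optional trailing tag such as ".new".
# This pattern is an assumption inferred from the names in the table.
NAME_RE = re.compile(
    r"^(?P<gene>.+)\.(?P<pos>\d+)(?P<orient>[FR])(?:\.(?P<tag>\w+))?$"
)


def parse_primer_name(name):
    """Split a primer name into gene, position, orientation, and tag."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized primer name: {name!r}")
    return {
        "gene": m["gene"],            # e.g. "trnH.GUG" (tRNA gene plus anticodon)
        "position": int(m["pos"]),    # approximate position within the gene
        "orientation": m["orient"],   # "F" = same 5'-3' direction as the gene
        "tag": m["tag"],              # None unless a suffix such as "new" is present
    }


def degeneracy(seq):
    """Number of distinct oligonucleotides a degenerate primer represents."""
    n = 1
    for base in seq.upper():
        n *= len(IUPAC[base])
    return n
```

For example, `parse_primer_name("trnH.GUG.6R")` yields the gene `trnH.GUG`, position 6, and orientation `R`, and `degeneracy("GYCGTATYGATTGGTTRAAAGG")` returns 8 because the rpoC2.4805F sequence contains two Y positions and one R position.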
