Seth M Bybee, Heather Bracken-Grissom, Benjamin D Haynes, Russell A Hermansen, Robert L Byers, Mark J Clement, Joshua A Udall, Edward R Wilcox, Keith A Crandall.
Abstract
Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCruncher, which takes raw next-gen sequence reads, performs quality control checks, and converts the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses relative to traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach.
Year: 2011 PMID: 22002916 PMCID: PMC3236605 DOI: 10.1093/gbe/evr106
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
PCR-based library preparation for targeted sequencing. (A) During PCR 1, a locus-specific primer amplifies a targeted region of DNA while also attaching an adapter sequence that has been incorporated at the end of the locus-specific primer. (B) PCR 2 uses the adapter sequence attached during PCR 1 and adds the desired barcode and 454 primer, resulting in amplicon libraries ready for purification, quantification, pooling (C), and subsequent emPCR. (C) Following pyrosequencing, reads are separated via MID and provided to BarcodeCruncher for further refinement before phylogenetic reconstruction.
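The read-separation step in panel (C) can be illustrated with a minimal sketch. This is not BarcodeCruncher's implementation: it assumes the 10-bp MID sits at the very start of each read, that MIDs are matched exactly, and that reads shorter than a hypothetical 100-bp cutoff are dropped. The function names, the sample names, and the MID sequences are all invented for illustration.

```python
from collections import defaultdict

MID_LENGTH = 10        # the abstract reports 10-bp multiplexing identifiers
MIN_READ_LENGTH = 100  # assumed cutoff for "short" reads; not taken from the tool


def demultiplex(reads, mid_to_sample, min_length=MIN_READ_LENGTH):
    """Group reads by sample using the leading MID.

    Reads with an unknown MID, or whose sequence after the MID is
    shorter than min_length, are discarded.
    """
    by_sample = defaultdict(list)
    for read in reads:
        mid, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        sample = mid_to_sample.get(mid)
        if sample is None or len(insert) < min_length:
            continue
        by_sample[sample].append(insert)
    return dict(by_sample)


def to_fasta(by_sample):
    """Render the grouped reads as FASTA text, one record per read."""
    lines = []
    for sample in sorted(by_sample):
        for index, sequence in enumerate(by_sample[sample], start=1):
            lines.append(f">{sample}_read{index}")
            lines.append(sequence)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical MIDs and toy reads, for demonstration only.
    mids = {"ACGTACGTAC": "taxonA", "TGCATGCATG": "taxonB"}
    reads = ["ACGTACGTAC" + "A" * 120, "TGCATGCATG" + "C" * 50]
    print(to_fasta(demultiplex(reads, mids)))
```

A real pipeline would also need to tolerate sequencing errors in the MID and assign each read to a gene (the tables below refer to a BLAST step against reference sequences), neither of which this sketch attempts.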
Comparison of the 454 Bioinformatic Amplicon Filter versus Shotgun Filter
| | Amplicon (Ambiguity = 1) | Shotgun (Ambiguity = 0) | Shotgun (Ambiguity = 1) |
| Total number of reads | 211,633 | 221,347 | 221,347 |
| Number of rejected reads (no Blast hit) | 5,995 | 10,039 | 10,009 |
| Number of rejected short reads (<100 bp) | 8,396 | 9,499 | 9,545 |
| Average length of best read (bp) | 260.82 | 304.65 | 306.61 |
| Average length of used reads (bp) | 209.44 | 234.18 | 234.31 |
| Number of unique assembled contigs | 438 | 483 | 469 |
(A) Amplicons Provided (“P”) to the 454 and Reads Recovered (“R”) for Each Targeted Gene Region and (B) Percentages Returned, Average Read Length, and Average Number of Reads for Amplicons of Optimal 454 Size versus Nonoptimal 454 Size
Fractional Analysis of Raw Returned 454 Sequence Data
| | Shotgun | | | | Amplicon | | | |
| | Total Reads | One-Half Reads | One-Fourth Reads | One-Eighth Reads | Total Reads | One-Half Reads | One-Fourth Reads | One-Eighth Reads |
| Raw reads | 221,347 | 110,674 | 55,337 | 27,668 | 211,633 | 105,817 | 52,908 | 26,454 |
| Provided to BarcodeCruncher | 219,879 | 109,943 | 54,994 | 27,487 | 210,321 | 105,156 | 52,577 | 26,320 |
| Blasted to reference sequence | 56,021 | 24,641 | 12,789 | 7,044 | 37,782 | 17,481 | 10,905 | 6,076 |
| % Used reads | 99.34% | 99.34% | 99.38% | 99.35% | 99.38% | 99.38% | 99.37% | 99.49% |
| % Blasted | 25.31% | 22.26% | 23.11% | 25.46% | 17.85% | 16.52% | 20.61% | 22.97% |
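The two percentage rows can be recomputed from the count rows above them. The short check below is a sketch with the table values typed in by hand; the variable and function names are illustrative. It assumes "% Used reads" is reads provided to BarcodeCruncher divided by raw reads, and "% Blasted" is reads blasted to a reference divided by raw reads, since those denominators reproduce the tabulated figures.

```python
# Counts copied from the table, ordered: total, one-half, one-fourth, one-eighth.
RAW = {
    "shotgun": [221347, 110674, 55337, 27668],
    "amplicon": [211633, 105817, 52908, 26454],
}
PROVIDED = {
    "shotgun": [219879, 109943, 54994, 27487],
    "amplicon": [210321, 105156, 52577, 26320],
}
BLASTED = {
    "shotgun": [56021, 24641, 12789, 7044],
    "amplicon": [37782, 17481, 10905, 6076],
}


def percentages(parts, wholes):
    """Return part/whole as percentages rounded to two decimals."""
    return [round(100 * part / whole, 2) for part, whole in zip(parts, wholes)]


if __name__ == "__main__":
    for filter_name in ("shotgun", "amplicon"):
        used = percentages(PROVIDED[filter_name], RAW[filter_name])
        blasted = percentages(BLASTED[filter_name], RAW[filter_name])
        print(filter_name, "% used:", used, "% blasted:", blasted)
```

Dividing the blasted counts by the reads provided to BarcodeCruncher instead of by raw reads gives slightly higher values that do not match the table, which is why raw reads are used as the denominator here.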
Phylogenetic estimate of targeted sequence data. All data associated with in-group taxa were generated using 454 pyrosequencing technology. The ML phylogram reflects the topology recovered from both Bayesian and RAxML analyses. The Bayesian topology had polytomies at nodes with little support, but all other nodes/relationships were in common. Bootstrap supports >70% and posterior probabilities >90% are shown below and above each branch, respectively.