Shadi Shokralla, Joel F Gibson, Hamid Nikbakht, Daniel H Janzen, Winnie Hallwachs, Mehrdad Hajibabaei.
Abstract
DNA barcoding is an efficient method to identify specimens and to detect undescribed/cryptic species. Sanger sequencing of individual specimens is the standard approach in generating large-scale DNA barcode libraries and identifying unknowns. However, the Sanger sequencing technology is, in some respects, inferior to next-generation sequencers, which are capable of producing millions of sequence reads simultaneously. Additionally, direct Sanger sequencing of DNA barcode amplicons, as practiced in most DNA barcoding procedures, is hampered by the need for a relatively high yield of target amplicon, coamplification of nuclear mitochondrial pseudogenes, confusion with sequences from intracellular endosymbiotic bacteria (e.g. Wolbachia) and instances of intraindividual variability (i.e. heteroplasmy). Any of these situations can lead to failed Sanger sequencing attempts or ambiguity of the generated DNA barcodes. Here, we demonstrate the potential application of next-generation sequencing platforms for parallel acquisition of DNA barcode sequences from hundreds of specimens simultaneously. To facilitate retrieval of sequences obtained from individual specimens, we tag individual specimens during PCR amplification using unique 10-mer oligonucleotides attached to DNA barcoding PCR primers. We employ 454 pyrosequencing to recover full-length DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run (i.e. two lanes of a 16-lane run). We obtained an average of 143 sequence reads for each individual specimen. The sequences produced are full-length DNA barcodes for all but one of the included specimens. In a subset of samples, we also detected Wolbachia, nontarget species, and heteroplasmic sequences. Next-generation sequencing is of great value because of its protocol simplicity, greatly reduced cost per barcode read, faster throughput and added information content.
Keywords: COI; DNA; Lepidoptera; Wolbachia; biodiversity; genomics; heteroplasmy; taxonomy
Year: 2014 PMID: 24641208 PMCID: PMC4276293 DOI: 10.1111/1755-0998.12236
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Figure 1. Schematic diagram of parallel barcode recovery using multiple identifier (MID) tagging and next-generation sequencing (NGS) protocol.
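The read-sorting step implied by MID tagging can be illustrated with a minimal Python sketch. It assumes that each read begins with its 10-mer tag and that tags are matched exactly. The function name `demultiplex`, the two tags, the specimen labels and the toy reads are all hypothetical placeholders, not the tags or data used in the study. Primer trimming, reverse-orientation reads and sequencing errors within the tag are deliberately ignored.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, tag_len=10):
    """Bin reads by their leading tag (exact match only).

    Returns (bins, unassigned): bins maps specimen -> list of reads with
    the tag removed; unassigned holds reads whose tag is not recognised.
    """
    bins = defaultdict(list)
    unassigned = []
    for read in reads:
        tag = read[:tag_len].upper()
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            unassigned.append(read)
        else:
            # Keep only the amplicon part of the read, upper-cased.
            bins[specimen].append(read[tag_len:].upper())
    return dict(bins), unassigned


# Hypothetical 10-mer tags and specimen labels (assumptions for illustration).
tags = {
    "ACGAGTGCGT": "specimen_001",
    "ACGCTCGACA": "specimen_002",
}

# Toy reads: tag followed by a short placeholder amplicon fragment.
fragment = "GGTCAACAAATCATAAAGATATTGG"
reads = [
    "ACGAGTGCGT" + fragment,
    "ACGCTCGACA" + fragment,
    "acgagtgcgt" + fragment.lower(),  # lower-case input is normalised
    "TTTTTTTTTT" + fragment,          # unknown tag, left unassigned
]

bins, unassigned = demultiplex(reads, tags)
counts = {specimen: len(seqs) for specimen, seqs in bins.items()}
print(counts)
print(len(unassigned), "read(s) unassigned")
```

Exact matching keeps the sketch short; a production pipeline would typically tolerate a small number of tag mismatches and would cluster the reads in each bin before calling a barcode.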
Figure 2. Comparison of DNA sequence data recovered by Sanger sequencing and 454 pyrosequencing. A) Green bars represent number of full-length COI barcode sequences. Yellow bar represents number of partial COI barcode sequences. Red bars represent failed target barcode attempts. Orange bar represents number of heteroplasmic COI sequences. Purple bar represents number of coamplified nontarget COI sequences (i.e. ‘contaminants’). Light green bar represents number of Wolbachia sequences. B) Number of organisms recovering single or multiple sequence clusters during 454 pyrosequencing.
Figure 3. Neighbour-joining diagram of 352 DNA sequences recovered by 454 pyrosequencing and Sanger sequencing. Short sequences (<600 bp) have not been included. Distance measurement is calculated in number of base substitutions per site based on the Kimura 2-parameter method. The tree backbone represents the 454 pyrosequences and green triangles represent sequences produced by Sanger sequencing (>600 bp). Red circles represent sequences determined to be heteroplasmic. Blue squares represent individual specimens that also recovered a Wolbachia sequence.
Figure 4. Portion of a sequence electropherogram as produced by Sanger sequencing and composite sequence clusters as recovered by 454 pyrosequencing of a single specimen. Highlighted bases represent differences from the Sanger sequence. Arrows indicate the presence of peaks in the electropherogram corresponding to alternate sequences.
General specifications of the most commonly used next-generation sequencing (NGS) platforms as compared to Sanger sequencing
| | 454 GS FLX+ | 454 Junior | Illumina HiSeq | Illumina MiSeq | Ion Torrent PGM | Sanger ABI 3730xl |
|---|---|---|---|---|---|---|
| Max. read length | 700 bp | 400 bp | 2 × 150 bp | 2 × 300 bp | 400 bp | 1–1.5 kb |
| Max. output/run | 450–700 Mb | 35 Mb | 150–180 Gb | 13.2–15 Gb | 1.2–2 Gb | 96 kb |
| Max. reads/run | 700 k–1 M S-R | 70 k S-R | 1.2 billion PE-R | 44–50 M PE-R | 4–5.5 M S-R | 96 S-R |
| Time per run | 23 h | 10 h | 40 h | 65 h | 7.3 h | 4 h |
S-R, single reads; PE-R, paired-end reads. Information in this table was obtained from manufacturers’ web pages accessed on 7 January 2014.
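As a rough cross-check of the figures quoted in the abstract against this table, the short Python sketch below works through the arithmetic. It assumes that the nominal 454 GS FLX+ read count in the table applies to the run used, which the text does not state. The reported mean of 143 reads per specimen is a rounded value, so the totals are approximate.

```python
# Run fraction: two lanes of a 16-lane run, as stated in the abstract.
lanes_used, lanes_total = 2, 16
run_fraction = lanes_used / lanes_total

# Approximate number of reads assigned to specimens (rounded mean).
specimens = 190
mean_reads_per_specimen = 143
assigned_reads = specimens * mean_reads_per_specimen

# Nominal reads per full run for the 454 GS FLX+ column of the table
# (assumption: this platform figure is applicable to the study's run).
run_reads_low, run_reads_high = 700_000, 1_000_000
nominal_low = run_reads_low * run_fraction
nominal_high = run_reads_high * run_fraction

print(f"run fraction used: {run_fraction:.1%}")
print(f"approx. assigned reads: {assigned_reads}")
print(f"nominal reads for that fraction: {nominal_low:.0f} to {nominal_high:.0f}")
```

Under these assumptions, roughly 27,000 assigned reads compare with a nominal 87,500 to 125,000 reads for the same fraction of a run. The difference is not explained in this record and should not be read as a measured yield.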