Shanlin Liu, Chentao Yang, Chengran Zhou, Xin Zhou
Abstract
Over the past decade, biodiversity researchers have dedicated tremendous effort to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since the early 2000s, further dramatic reductions in barcoding cost are unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints on barcoding cost have not only led to unbalanced barcoding efforts around the globe but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial links to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction (PCR) amplicons generated from individual specimens. The new pipeline generated accurate barcode sequences comparable to Sanger standards, even for haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering low-quantity amplicons: HIFI-Barcode recovered barcodes from more than 78% of the PCRs that did not show clear bands on the electrophoresis gel. Moreover, sequencing on the single-molecule platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline offers an improved way to produce full-length reference barcodes at about one-tenth of the current cost, enabling the construction of comprehensive barcode libraries for local fauna and pointing toward a feasible route to DNA barcoding of global biomes.
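Recovering one barcode per specimen from a single pooled sequencing run requires sorting reads back to their specimen of origin. A common way to do this is to tag each specimen's amplicons with a short index sequence and group reads by that index. The sketch below is a minimal, hypothetical illustration, assuming exact index matches at the 5′ end of each read and equal-length indices; the function `demultiplex`, the index sequences, and the sample names are invented for illustration and are not taken from the HIFI-Barcode pipeline.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by an exact-match index at the read's 5' end.

    Hypothetical sketch: assumes all indices share one length and that
    each read begins with its specimen's index. Reads whose prefix matches
    no known index are kept, unmodified, under the key None.
    """
    index_lengths = {len(idx) for idx in index_to_sample}
    if len(index_lengths) != 1:
        raise ValueError("this sketch assumes non-empty, equal-length indices")
    k = index_lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        sample = index_to_sample.get(read[:k])
        # Strip the index so downstream assembly sees only the amplicon.
        bins[sample].append(read[k:] if sample is not None else read)
    return dict(bins)


# Hypothetical 4-nt indices for two specimens.
indices = {"ACGT": "specimen_01", "TGCA": "specimen_02"}
reads = ["ACGTTTAGGC", "TGCAGGCCTA", "ACGTTTAGGA", "NNNNTTTTTT"]
print(demultiplex(reads, indices))
```

Real pipelines usually tolerate sequencing errors in the index (for example by allowing one mismatch) and may index both amplicon ends; exact matching keeps this sketch short.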
Keywords: Biodiversity; COI; DNA Barcode; High-throughput sequencing; meta-barcoding
Year: 2017 PMID: 29077841 PMCID: PMC5726475 DOI: 10.1093/gigascience/gix104
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Schematic illustration of the HIFI-Barcode pipeline.
Figure 2: HIFI-Barcode assembly pipeline.
Read distribution on the Illumina (HiSeq) and Pacbio platforms

| Platform and plate* | Raw reads | Clean reads | 5′ and 3′ reads | In-between reads | Recovered indices | Sample size¹ | Single unique² | Full-length barcodes |
|---|---|---|---|---|---|---|---|---|
| HiSeq 1 | 8 567 336 | 4 824 443 | 1 910 616 | 1 898 372 | 96 | 39 805 (64 705; 2444) | 61 | 96 |
| HiSeq 2 | 11 531 498 | 4 439 345 | 1 306 054 | 2 676 915 | 96 | 27 210 (101 512; 279) | 45 | 88 |
| Pacbio 2 | 1 201 158 | 28 770 (total number³) | 26.4 (average pass³) | 17 102 (assigned³) | 82 | 208 (1696; 1) | NA | 82 |

*The numbers 1 and 2 in the first column are plate IDs. ¹Reads per sample, given as average (max; min). ²Number of clusters that retained only a single representative candidate after read-assignment filtering. ³For the Pacbio row, these three columns report circular consensus sequence (CCS) statistics instead: total number, average number of passes, and number assigned.
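The Sample size column summarizes reads per sample as average (max; min). A minimal sketch of producing that summary from per-sample read counts follows. The input mapping and function names are hypothetical, and the sketch rounds the mean down; whether the original table rounded or truncated its averages is not stated.

```python
def summarize_read_counts(counts):
    """Return (average, max, min) reads per sample; average is rounded down."""
    if not counts:
        raise ValueError("no samples")
    values = list(counts.values())
    average = sum(values) // len(values)  # integer division rounds down
    return average, max(values), min(values)


def format_summary(counts):
    """Format the summary in the table's 'average (max; min)' style."""
    average, high, low = summarize_read_counts(counts)
    return f"{average} ({high}; {low})"


# Hypothetical per-sample read counts, not data from the table.
example = {"s1": 120, "s2": 45, "s3": 300}
print(format_summary(example))  # 155 (300; 45)
```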
Figure 3: Comparison between HIFI-Barcodes and Sanger references. (A) Success rates for the first plate. For all 96 samples, both Sanger (left semicircle) and HIFI-Barcode (right semicircle) successfully produced a full-length COI barcode. Samples outlined in red are marked on the phylograms. (B) Phylogenetic tree of all HIFI-Barcodes and Sanger references. (C) Close-up view of representative individuals. (D) Degenerate sites in the Sanger references were resolved by HIFI-Barcodes.
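Panel D concerns degenerate sites: positions where Sanger base-calling reported an IUPAC ambiguity code (for example R, meaning A or G) instead of a single nucleotide. A minimal sketch of listing which such positions a HIFI-Barcode resolves to a specific base is shown below, assuming the two sequences are already aligned to equal length. The function name and toy sequences are hypothetical; the ambiguity table is the standard IUPAC nucleotide code.

```python
# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolved_degenerate_sites(sanger, hifi):
    """List (position, Sanger code, HIFI base) for degenerate Sanger sites.

    Assumes pre-aligned, equal-length sequences. A site is reported only
    when the Sanger call is ambiguous and the HIFI base is one of the
    bases that the ambiguity code allows.
    """
    if len(sanger) != len(hifi):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos, (code, base) in enumerate(zip(sanger.upper(), hifi.upper())):
        allowed = IUPAC.get(code)
        if allowed and len(allowed) > 1 and base in allowed:
            sites.append((pos, code, base))
    return sites


print(resolved_degenerate_sites("ACRTYN", "ACGTCA"))
# [(2, 'R', 'G'), (4, 'Y', 'C'), (5, 'N', 'A')]
```

A HIFI base that falls outside the allowed set (for example C at an R site) is not reported here; such cases would be genuine discrepancies rather than resolved ambiguities.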
Figure 4: Discrepancies between Sanger references and HIFI-Barcodes in the first plate. Entropy weight was calculated from read depth after aligning Illumina raw reads to the assembled HIFI-Barcodes, revealing potential heteroplasmy (A) and differences between ambiguous Sanger base calls and the specific nucleotides identified in HIFI-Barcodes (B).
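The caption does not give the exact entropy-weight formula. One common way to quantify how mixed the read support is at an alignment position is the Shannon entropy of the base frequencies, H = -sum(p_b * log2(p_b)), where p_b is the fraction of reads showing base b. The sketch below is a stand-in for illustration, not necessarily the weighting used in the paper; it assumes per-site base counts have already been obtained by aligning reads to the assembled barcode, and the `threshold` value is hypothetical.

```python
import math


def site_entropy(counts):
    """Shannon entropy (bits) of base counts at one alignment column.

    0.0 means every read agrees; 1.0 means an even two-base mix, as might
    be seen at a heteroplasmic site or where a Sanger call was ambiguous.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


def flag_mixed_sites(columns, threshold=0.5):
    """Return positions whose entropy exceeds a hypothetical threshold."""
    return [i for i, c in enumerate(columns) if site_entropy(c) > threshold]


columns = [
    {"A": 200},            # unanimous support
    {"C": 100, "T": 100},  # even two-base mix
    {"G": 190, "A": 10},   # mostly one base
]
print([round(site_entropy(c), 3) for c in columns])  # [0.0, 1.0, 0.286]
print(flag_mixed_sites(columns))  # [1]
```

Entropy alone ignores how deep the coverage is; a weighting that also accounts for read depth, as the caption implies, would need an additional term that this sketch does not attempt to guess.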
Figure 5: Success rates for the second plate. For each sample, the upper, left, and right pies represent PCR, Pacbio, and HIFI-Barcode, respectively. Gray indicates failure; other colors indicate success.