Jérôme Morinière1, Bruno Cancian de Araujo1, Athena Wai Lam1, Axel Hausmann1,2, Michael Balke1,2, Stefan Schmidt1, Lars Hendrich1, Dieter Doczkal1, Berthold Fartmann3, Samuel Arvidsson3, Gerhard Haszprunar1,2.
Abstract
The German Barcoding initiatives BFB and GBOL have generated a reference library of more than 16,000 metazoan species, which is now ready for applications concerning next generation molecular biodiversity assessments. To streamline the barcoding process, we have developed a meta-barcoding pipeline: we pre-sorted a single Malaise trap sample (obtained during one week in August 2014, southern Germany) into 12 arthropod orders and extracted DNA from pooled individuals of each order separately, in order to facilitate DNA extraction and avoid time-consuming single specimen selection. Aliquots of each ordinal-level DNA extract were combined to roughly simulate a DNA extract from a non-sorted Malaise sample. Each DNA extract was amplified using four primer sets targeting the CO1-5' fragment. The resulting PCR products (150-400 bp) were sequenced separately on an Illumina MiSeq platform, resulting in 1.5 million sequences and 5,500 clusters (coverage ≥10; CD-HIT-EST, 98%). Using a total of 120,000 DNA barcodes of identified Central European Hymenoptera, Coleoptera, Diptera, and Lepidoptera downloaded from BOLD, we established a reference sequence database for a local custom BLAST. This allowed us to identify 529 Barcode Index Numbers (BINs) from our sequence clusters derived from pooled Malaise trap samples. We introduce a scoring matrix based on the sequence match percentages of each amplicon in order to gain plausibility for each detected BIN, leading to 390 high score BINs in the sorted samples, whereas 268 of these high score BINs (69%) could be identified in the combined sample. The results indicate that a time-consuming presorting process will yield approximately 30% more high score BINs compared to the non-sorted sample in our case. These promising results indicate that a fast, efficient and reliable analysis of next generation data from Malaise trap samples can be achieved using this pipeline.
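The percentages quoted in the abstract can be checked from the two BIN counts it reports. The short sketch below only redoes that arithmetic; the variable names are illustrative and not taken from the paper.

```python
# BIN counts as quoted in the abstract.
sorted_high_score_bins = 390    # high score BINs found in the sorted samples
combined_high_score_bins = 268  # of those, also found in the combined sample

# Share of the sorted-sample BINs recovered from the combined sample.
recovered_share = combined_high_score_bins / sorted_high_score_bins

# BINs that were only detected after presorting.
missed_bins = sorted_high_score_bins - combined_high_score_bins
missed_share_of_sorted = missed_bins / sorted_high_score_bins
gain_relative_to_combined = missed_bins / combined_high_score_bins

print(f"recovered in combined sample: {recovered_share:.1%}")
print(f"only in sorted samples: {missed_bins} BINs "
      f"({missed_share_of_sorted:.1%} of the sorted total)")
print(f"gain relative to the combined sample: {gain_relative_to_combined:.1%}")
```

Measured against the sorted total, the 122 additional BINs come to roughly 31%, which appears to be the basis for the "approximately 30%" figure; measured against the 268 BINs of the combined sample, the same difference is a larger relative gain.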
Year: 2016 PMID: 27191722 PMCID: PMC4871420 DOI: 10.1371/journal.pone.0155497
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Arthropod orders sorted and combined by sample number.
| Sample Number | Arthropod order |
|---|---|
| 1 | Araneae & Opiliones |
| 2 | Collembola |
| 3 | Dermaptera & Blattodea |
| 4 | Mecoptera & Neuroptera |
| 5 | Psocoptera |
| 6 | Trichoptera |
| 7 | Hemiptera |
| 8 | Coleoptera |
| 9 | Orthoptera |
| 10 | Lepidoptera |
| 11 | Hymenoptera |
| 12 | Diptera |
| 13 | Combined fraction of numbers 1–12 |
Primers and corresponding PCR conditions used in this study.
| Amplicon | Sequence | Reference | PCR conditions |
|---|---|---|---|
| | 5’-TAA ACT TCA GGG TGA CCA AAA ATC A-3’ | [ | 2’:94°C – 5x[30”:94°C – 40”:45°C – 1’:72°C] – 35x[30”:94°C – 40”:50°C – 1’:72°C] – 10’:72°C |
| | 5’-GGT CAA CAA ATC ATA AAG ATA TTG G-3’ | [ | |
| | 5’-ATT CAA CCA ATC ATA AAG ATA TTG G-3’ | [ | 2’:94°C – 5x[1’:94°C – 90”:45°C – 90”:72°C] – 35x[1’:93°C – 90”:50°C – 90”:72°C] – 10’:72°C |
| | 5’-CCT GGT AAA ATT AAA ATA TAA ACT TC-3’ | [ | |
| | 5’-GAA AAT CAT AAT GAA GGC ATG AGC-3’ | [ | 2’:95°C – 5x[1’:95°C – 1’:46°C – 30”:72°C] – 35x[1’:95°C – 1’:53°C – 30”:72°C] – 5’:72°C |
| | 5’-TCC ACT AAT CAC AAR GAT ATT GGT AC-3’ | [ | |
| | 5’-TAA ACT TCA GGG TGA CCA AAR AAY CA-3’ | [ | 2’:96°C – 3x[15”:96°C – 30”:48°C – 90”:65°C] – 30x[15”:96°C – 30”:55°C – 90”:65°C] – 10’:72°C |
| | 5’-GGW ACW GGW TGA ACW GTW TAY CCY CC-3’ | [ | |
Categories of scoring according to the sequence identity percentage.
| Interval | Score |
|---|---|
| 97.00–97.99% | 70 |
| 98.00–98.99% | 150 |
| 99.00–99.99% | 240 |
| 100% | 340 |
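The score categories above can be expressed as a simple lookup. The following is a minimal sketch rather than the authors' implementation: the function names are hypothetical, assigning a score of 0 below 97% identity is an assumption, and summing the per-amplicon scores into one BIN score is likewise an assumption, since the table gives only the per-match categories.

```python
from typing import Iterable

# Score categories from the table: (lower bound of % identity, score),
# ordered from the highest band to the lowest.
SCORE_BANDS = [(100.0, 340), (99.0, 240), (98.0, 150), (97.0, 70)]


def identity_score(percent_identity: float) -> int:
    """Return the score for a single sequence match percentage."""
    for lower_bound, score in SCORE_BANDS:
        if percent_identity >= lower_bound:
            return score
    return 0  # assumption: matches below 97% identity receive no score


def bin_score(amplicon_identities: Iterable[float]) -> int:
    """Combine per-amplicon scores for one BIN.

    Assumption: scores are summed across amplicons; the source table does
    not state how the individual scores are combined.
    """
    return sum(identity_score(p) for p in amplicon_identities)


# Hypothetical BIN matched by three of the four amplicons.
print(bin_score([100.0, 99.2, 97.1]))
```

Under the summing assumption, a BIN detected by several amplicons at high identity accumulates a higher score than one supported by a single weak match, which is the sense in which a score of this kind adds plausibility to a detection.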
Efficiency of amplicons used.
| Order | Hco | dgHco | miniBC | LepF |
|---|---|---|---|---|
| Lepidoptera | 28% | 23% | 25% | 25% |
| Hymenoptera | 33% | 49% | 5% | 13% |
| Coleoptera | 26% | 43% | 20% | 11% |
| Diptera | 23% | 47% | 17% | 13% |
| total | 26% | 45% | 16% | 14% |
Number of BINs (compared to the total) is displayed as percentage values.