Min Tang, Meihua Tan, Guanliang Meng, Shenzhou Yang, Xu Su, Shanlin Liu, Wenhui Song, Yiyuan Li, Qiong Wu, Aibing Zhang, Xin Zhou.
Abstract
The advent of high-throughput sequencing (HTS) technologies has revolutionized conventional biodiversity research by enabling the parallel capture of DNA sequences with species-level diagnostic power. However, polymerase chain reaction (PCR)-based implementations are biased by differences in primer-binding efficiency across lineages of organisms. A PCR-free HTS approach would alleviate this artefact and substantially improve multi-locus methods that use full mitogenomes. Here we developed a novel multiplex sequencing and assembly pipeline that allows the simultaneous acquisition of full mitogenomes from pooled animals without DNA enrichment or amplification. By concatenating assemblies from three de novo assemblers, we obtained high-quality mitogenomes for all 49 pooled taxa: 36 species exceeded 15 kb and the remainder exceeded 10 kb, including 20 complete mitogenomes, and nearly all protein coding genes (99.6%) were recovered. Assembly quality was carefully validated against Sanger sequences, reference genomes and the conservation of protein coding genes across taxa. The new method was effective even for closely related taxa, e.g. three Drosophila spp., demonstrating its broad utility for biodiversity research and mito-phylogenomics. Finally, an in silico simulation showed that recruiting multiple mito-loci improved taxon detection at a fixed sequencing depth. Combined, these results demonstrate the plausibility of a multi-locus mito-metagenomics approach as the next phase of the current single-locus metabarcoding method.
Year: 2014 PMID: 25294837 PMCID: PMC4267667 DOI: 10.1093/nar/gku917
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic illustration of the pipeline.
List of taxa analyzed and corresponding assembly results
Figure 2. Assembled mitogenome scaffolds for all 49 taxa. The assembled mitogenome of each taxon is shown after its name. Color bars represent the 13 protein coding genes. Fragmented scaffolds assigned to the same taxon are connected by dashed lines. To aid visualization, each mitogenome is manually broken at the beginning of ND2; the associated scaffold containing the rDNA and/or control region is aligned to the right side of the super-scaffold and connected to it by a bold black line. Complete circular mitogenomes are marked with a circle symbol after the taxon name.
Figure 3. Completeness of assembled protein coding genes. Presence and absence of assembled protein coding genes are represented by pink ovals and blank cells, respectively. Gray ovals represent putative genes (all ATP8) that were successfully assembled but could not be annotated because of their poor representation in public databases. Only 5 of the 637 protein coding genes (49 taxa × 13 genes) were missing from the final assemblies.
Average depth and length of assembled mito-scaffolds
| Average depth (×) | ≤3 | 3–5 | 5–10 | >10 |
|---|---|---|---|---|
| Number of scaffolds | 407 | 106 | 31 | 105 |
| Average length (bp) | 362 | 598 | 860 | 7519 |
Figure 4. Categories of assembled mito-scaffolds. Sixty scaffolds (green dots) were successfully assigned to input taxa in the taxon assignment procedure; three scaffolds (yellow dots) potentially belonged to input taxa; one scaffold (pink dot) was confirmed by read coverage to result from a nucleotide polymorphism; seven scaffolds (blue dots) were identified as being of bacterial origin, most of them from insect endosymbionts; and 23 scaffolds were identified as erroneous assemblies because some of their regions were not covered by any reads.
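The last criterion in Figure 4, flagging a scaffold as misassembled when some of its positions are covered by no reads, can be illustrated with a minimal Python sketch. It assumes reads have already been mapped back to each scaffold and are supplied as 0-based, half-open (start, end) intervals; the function names are hypothetical, and the caption does not describe the authors' actual implementation at this level of detail.

```python
from typing import Iterable, List, Tuple


def per_base_depth(length: int, alignments: Iterable[Tuple[int, int]]) -> List[int]:
    """Per-base read depth for a scaffold of `length` bp.

    `alignments` are 0-based, half-open (start, end) intervals of mapped reads
    (an assumed input format). A difference array keeps this linear in length.
    """
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(length, end)  # clip to the scaffold
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for i in range(length):
        running += diff[i]
        depth.append(running)
    return depth


def uncovered_regions(depth: List[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) runs of positions with zero read depth."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d > 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


def looks_misassembled(length: int, alignments: Iterable[Tuple[int, int]]) -> bool:
    """Hypothetical rule: any zero-coverage stretch marks the scaffold as suspect."""
    return bool(uncovered_regions(per_base_depth(length, alignments)))


# A 300-bp scaffold whose reads leave positions 200-249 uncovered.
print(uncovered_regions(per_base_depth(300, [(0, 100), (80, 200), (250, 300)])))
```

In practice, per-base depth would more likely come from an alignment tool's depth report than from raw intervals; the difference-array step only shows how a zero-coverage gap is detected.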
Figure 5. Validation of Drosophila assemblies using Sanger sequences and reference mitogenomes. Pairwise divergences are plotted along the assembled length of the mitogenomes of Drosophila melanogaster, Drosophila erecta and Drosophila pseudoobscura. P-distances were calculated using a 50-bp sliding window with a 1-bp step. Protein coding genes, rDNA and the control region are marked along the mitogenome. Two regions are shown in detail, with the NGS assemblies aligned to reference mitogenomes obtained from GenBank and to Sanger sequences (available here only for the 5′ end of the CO1 barcode region, as shown in the bottom panel). The NGS assemblies are confirmed by both the reference mitogenomes and the Sanger sequences, and they successfully recover the characteristic gap regions of D. melanogaster and D. pseudoobscura.
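The sliding-window p-distance in Figure 5 is simply the proportion of differing sites recomputed in each window. The Python sketch below assumes the assembly and the reference are already aligned to equal length, and it skips columns where either sequence has a gap ('-'). That gap treatment (pairwise deletion) is an assumption, since the caption does not state how gaps were handled.

```python
from typing import List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (assumed
    pairwise-deletion treatment). Returns NaN if no sites are comparable.
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x.upper() != y.upper()
    return diffs / compared if compared else float("nan")


def sliding_p_distance(a: str, b: str, window: int = 50,
                       step: int = 1) -> List[Tuple[int, float]]:
    """(window_start, p-distance) for every full window along the alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start, p_distance(a[start:start + window], b[start:start + window]))
        for start in range(0, len(a) - window + 1, step)
    ]


# Toy alignment: identical for 50 sites, then 10 mismatches at the 3' end.
profile = sliding_p_distance("A" * 60, "A" * 50 + "C" * 10)
print(profile[0], profile[-1])
```

With the default 50-bp window and 1-bp step, a 60-column alignment yields 11 windows. The divergence rises from 0.0 in the first window to 0.2 in the last, which shows how a localized block of differences appears as a peak in the plot.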
Figure 6. In silico simulation for taxon detection using multiple loci. Subsets of the clean raw reads of different sizes (2G, 5G and 8G) were randomly selected as in silico datasets to test the efficiency of a multi-locus identification approach. The selected reads, and contigs assembled from them with SOAPdenovo 2.0, were searched with BLAST against the 632 protein coding gene references extracted from the assembled mitogenomes. A taxon was considered detected when sequence coverage reached 90% at 100% sequence identity for at least one reference protein coding gene. This was first tested for the standard CO1 barcode region and then extended to the full CO1 gene and to the other protein coding genes, added one by one. Taxon recovery rates improved with increased sequencing volume and with the inclusion of multiple loci.
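The detection rule in Figure 6 (at least one reference gene covered to 90% by hits at 100% identity) can be sketched in Python as follows. The sketch assumes BLAST hits have already been parsed into 0-based, half-open reference intervals with their percent identity. Tabular BLAST output reports 1-based subject coordinates, which are reversed for minus-strand hits, so they would need converting first. The data layout, gene identifiers and function names are hypothetical. Restricting `loci` to a growing set (e.g. a CO1 barcode entry, then full CO1, then further genes) mimics the stepwise locus inclusion described in the caption.

```python
from typing import Dict, Iterable, List, Set, Tuple

# (ref_start, ref_end, percent_identity); 0-based, half-open on the reference gene.
Hit = Tuple[int, int, float]


def covered_fraction(ref_len: int, hits: Iterable[Hit],
                     min_identity: float = 100.0) -> float:
    """Fraction of reference positions covered by the union of qualifying hits."""
    intervals = sorted(
        (max(0, s), min(ref_len, e)) for s, e, ident in hits if ident >= min_identity
    )
    covered, cur_s, cur_e = 0, None, None
    for s, e in intervals:
        if s >= e:
            continue
        if cur_e is None or s > cur_e:        # disjoint: close the current run
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                                 # overlapping or adjacent: extend
            cur_e = max(cur_e, e)
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / ref_len if ref_len else 0.0


def detected_taxa(references: Dict[str, Tuple[str, str, int]],
                  hits: Dict[str, List[Hit]],
                  loci: Set[str],
                  min_coverage: float = 0.90) -> Set[str]:
    """Taxa with at least one reference gene from `loci` covered >= min_coverage
    at 100% identity. `references` maps gene_id -> (taxon, locus, length)."""
    found = set()
    for gene_id, (taxon, locus, ref_len) in references.items():
        if locus in loci and covered_fraction(ref_len, hits.get(gene_id, [])) >= min_coverage:
            found.add(taxon)
    return found


references = {
    "t1_CO1": ("taxon1", "CO1", 1000), "t1_ND5": ("taxon1", "ND5", 1700),
    "t2_CO1": ("taxon2", "CO1", 1000), "t2_ND5": ("taxon2", "ND5", 1700),
}
hits = {
    "t1_CO1": [(0, 600, 100.0), (550, 950, 100.0)],  # merged: 950/1000 covered
    "t2_CO1": [(0, 950, 99.0)],                      # below 100% identity, ignored
    "t2_ND5": [(0, 1600, 100.0)],                    # 1600/1700 covered
}
print(detected_taxa(references, hits, {"CO1"}), detected_taxa(references, hits, {"CO1", "ND5"}))
```

In this toy example, only taxon1 is detected from CO1 alone. Adding ND5 also recovers taxon2, which illustrates in miniature why including more loci can raise recovery at a fixed sequencing depth.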