Alex Crampton-Platt, Martijn J T N Timmermans, Matthew L Gimmel, Sujatha Narayanan Kutty, Timothy D Cockerill, Chey Vun Khen, Alfried P Vogler.
Abstract
In spite of the growth of molecular ecology, systematics and next-generation sequencing, the discovery and analysis of diversity is not currently integrated with building the tree-of-life. Tropical arthropod ecologists are well placed to accelerate this process if all specimens obtained through mass-trapping, many of which will be new species, could be incorporated routinely into phylogeny reconstruction. Here we test a shotgun sequencing approach, whereby mitochondrial genomes are assembled from complex ecological mixtures through mitochondrial metagenomics, and demonstrate how the approach overcomes many of the taxonomic impediments to the study of biodiversity. DNA from approximately 500 beetle specimens, originating from a single rainforest canopy fogging sample from Borneo, was pooled and shotgun sequenced, followed by de novo assembly of complete and partial mitogenomes for 175 species. The phylogenetic tree obtained from this local sample was highly similar to that from existing mitogenomes selected for global coverage of major lineages of Coleoptera. When all sequences were combined only minor topological changes were induced against this reference set, indicating an increasingly stable estimate of coleopteran phylogeny, while the ecological sample expanded the tip-level representation of several lineages. Robust trees generated from ecological samples now enable an evolutionary framework for ecology. Meanwhile, the inclusion of uncharacterized samples in the tree-of-life rapidly expands taxon and biogeographic representation of lineages without morphological identification. Mitogenomes from shotgun sequencing of unsorted environmental samples and their associated metadata, placed robustly into the phylogenetic tree, constitute novel DNA "superbarcodes" for testing hypotheses regarding global patterns of diversity.
Keywords: Coleoptera; Illumina MiSeq; biodiversity; bulk samples; community ecology; metagenome skimming; mitochondrial genomes; mitochondrial metagenomics; phylogeny; tree-of-life
Year: 2015 PMID: 25957318 PMCID: PMC4540967 DOI: 10.1093/molbev/msv111
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Number of Read Pairs after Various Filtering Steps.
| Filtering Step | Short Insert Library | Long Insert Library | Total |
|---|---|---|---|
| Raw reads | 16,996,158 | 16,898,216 | 33,894,374 |
| Adapters removed | 10,701,469 | 11,961,260 | 22,662,729 |
| BLAST filtered | 1,060,340 | 1,344,092 | 2,404,432 |
| BLAST filtered + QC | 846,156 | 1,260,119 | 2,106,275 |
| Estimated mitochondrial | 119,647 | 171,431 | 291,078 |
a. Mitochondrial-like reads used for Celera Assembler.
b. Mitochondrial-like reads used for IDBA-UD.
c. Estimate requiring both reads per pair to align to a mitochondrial genome sequence (1e-5, ≥100 bp).
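The attrition across filtering steps can be made explicit with a short calculation. The sketch below uses only the read-pair counts from the table above; the names `STEPS` and `retention` are illustrative, and treating each row as a fraction of the raw total is an assumption made here for summary purposes (the final row is an estimate with its own alignment criteria, not a strict filter output).

```python
# Read-pair counts copied from the table above: (step, short insert, long insert).
STEPS = [
    ("Raw reads", 16_996_158, 16_898_216),
    ("Adapters removed", 10_701_469, 11_961_260),
    ("BLAST filtered", 1_060_340, 1_344_092),
    ("BLAST filtered + QC", 846_156, 1_260_119),
    ("Estimated mitochondrial", 119_647, 171_431),
]


def retention(steps):
    """Return (step name, total read pairs, fraction of the raw total) per row."""
    raw_total = steps[0][1] + steps[0][2]
    rows = []
    for name, short_insert, long_insert in steps:
        total = short_insert + long_insert
        rows.append((name, total, total / raw_total))
    return rows


if __name__ == "__main__":
    for name, total, fraction in retention(STEPS):
        print(f"{name:<24} {total:>12,} {fraction:8.2%}")
```

The per-row sums reproduce the Total column, and the estimated mitochondrial read pairs come to just under 1% of the raw read pairs.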
Assembly Metrics after Contig Generation with Different Assemblers for all Contigs and the Mitochondrial Subset.
| Assembly (contigs) | Contigs | Minimum (bp) | Maximum (bp) | Mean (bp) | N50 (bp) | Total Length (bp) | Coverage |
|---|---|---|---|---|---|---|---|
| Celera (all) | 11,120 | 88 | 19,685 | 1,050 | 1,038 | 11,677,235 | 16.59 |
| IDBA-UD (all) | 12,588 | 230 | 19,201 | 830 | 794 | 10,457,969 | 4.23 |
| Celera (mt) | 498 | 1,007 | 19,685 | 5,616.6 | 12,804 | 2,797,049 | 27.99 |
| IDBA-UD (mt) | 564 | 1,001 | 19,201 | 4,760.4 | 14,490 | 2,684,874 | 28.12 |
| Non-redundant (mt) | 504 | 1,001 | 19,382 | 5,906.8 | 15,650 | 2,977,013 | 27.71 |
a. Value given by Celera Assembler for “ContigsOnly.”
b. Not given by the assembler. Calculated as ((mean input read length × number of reads aligned by IDBA-UD)/total length).
c. Average value obtained with Qualimap from mapping quality-controlled reads to the 498 mitochondrial contigs with SMALT (default parameters except −y 0.98).
d. Average value obtained with Qualimap from mapping quality-controlled reads to the 564 mitochondrial contigs with SMALT (default parameters except −y 0.98).
e. Average value obtained with Qualimap from mapping quality-controlled reads to the 504 mitochondrial contigs with SMALT (default parameters except −y 0.98).
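For readers unfamiliar with the metrics in this table, the following minimal sketch shows how N50 is conventionally computed and how the coverage formula quoted in the footnotes can be evaluated. It is not the assemblers' own code. The function names and the toy inputs are hypothetical; only the `MT_ROWS` values are taken from the table, where they are used to check that the reported mean equals total length divided by contig count for the mitochondrial rows.

```python
def n50(lengths):
    """Length of the contig at which the running sum of contig lengths,
    taken from longest to shortest, first reaches half of the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def estimated_coverage(mean_read_length, reads_aligned, total_length):
    """Coverage as described in the footnote:
    (mean input read length x number of reads aligned) / total length."""
    return mean_read_length * reads_aligned / total_length


# (assembly, contigs, reported mean in bp, total length in bp) from the table.
MT_ROWS = [
    ("Celera (mt)", 498, 5616.6, 2_797_049),
    ("IDBA-UD (mt)", 564, 4760.4, 2_684_874),
    ("Non-redundant (mt)", 504, 5906.8, 2_977_013),
]

if __name__ == "__main__":
    # Hypothetical toy contig lengths, not data from the study.
    print(n50([2, 3, 4, 5, 6, 10]))
    # Hypothetical inputs: 250 bp reads, 1,000 aligned reads, 50 kb assembly.
    print(estimated_coverage(250, 1000, 50_000))
    for name, contigs, mean_bp, total_bp in MT_ROWS:
        print(name, round(total_bp / contigs, 1), mean_bp)
```

N50 weights long contigs more heavily than the mean does, which is why the mitochondrial subsets show N50 values well above their mean contig lengths.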
The Number of Mitochondrial Contigs per Assembly and in the Non-redundant Set for Four Size Classes.
| Assembly | 1–5 kb | 5–10 kb | 10–15 kb | ≥15 kb (circular) |
|---|---|---|---|---|
| Celera Assembler | 333 | 64 | 28 | 73 (44) |
| IDBA-UD | 424 | 41 | 18 | 81 (49) |
| Non-redundant set | 334 | 45 | 18 | 107 (77) |
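As a consistency check, the size-class counts in each row can be summed and compared with the number of mitochondrial contigs reported per assembly in the assembly metrics table (498, 564 and 504). The sketch below does this and also reports the share of ≥15 kb contigs flagged as circular; the dictionary and function names are illustrative only.

```python
# Counts per size class (1-5 kb, 5-10 kb, 10-15 kb, >=15 kb) from the table above.
SIZE_CLASSES = {
    "Celera Assembler": (333, 64, 28, 73),
    "IDBA-UD": (424, 41, 18, 81),
    "Non-redundant set": (334, 45, 18, 107),
}

# Circular contigs among the >=15 kb class (values in parentheses in the table).
CIRCULAR = {"Celera Assembler": 44, "IDBA-UD": 49, "Non-redundant set": 77}

# Mitochondrial contig counts from the assembly metrics table.
MT_CONTIGS = {"Celera Assembler": 498, "IDBA-UD": 564, "Non-redundant set": 504}


def size_class_totals(size_classes):
    """Sum the four size classes for each assembly."""
    return {name: sum(counts) for name, counts in size_classes.items()}


def circular_share(size_classes, circular):
    """Fraction of >=15 kb contigs that were circular, per assembly."""
    return {name: circular[name] / counts[3] for name, counts in size_classes.items()}


if __name__ == "__main__":
    totals = size_class_totals(SIZE_CLASSES)
    shares = circular_share(SIZE_CLASSES, CIRCULAR)
    for name in SIZE_CLASSES:
        print(f"{name:<18} total={totals[name]} "
              f"matches={totals[name] == MT_CONTIGS[name]} "
              f"circular share={shares[name]:.2f}")
```

The row sums agree with the contig counts in the assembly metrics table for all three sets.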
Fig. 1. The frequency and coverage of each protein-coding gene (PCG) in the alignments for the non-redundant data set. (a) The frequency and source (Celera Assembler [CA], IDBA-UD, or both assemblies) of each unique sequence in the alignment. In all cases, both CA and IDBA-UD assembled the majority of sequences. The inclusion of both assemblies provided a small number of novel sequences for all genes, indicating that neither assembly fully captures the diversity of the sample. (b) The coverage by the reads of each base in the alignments is similar between genes and does not explain the variation in frequency in (a).
Fig. 2. The mitochondrial phylogeny for beetles. The 240 reference mitochondrial genomes are highlighted by superfamily, while 124 contigs contributed by this study, each comprising a minimum of ten PCGs, are shown in black. These novel sequences are distributed throughout the tree, representing most major lineages. The number of novel sequences within each superfamily, and the proportion these represent of the corresponding clade in the community phylogeny (fig. 3b), are highlighted. Tree topology and branch lengths are derived from the maximum clade credibility consensus phylogeny from the PhyloBayes analysis. Open circles indicate posterior probabilities ≥0.5, filled circles indicate posterior probabilities ≥0.95.
Fig. 3. The minimum ten-gene and community phylogenies. (a) The minimum ten-gene phylogeny: Maximum clade credibility consensus tree for 124 contigs comprising a minimum of ten PCGs (10+ contigs), colored by superfamily inferred from placement in the phylogenetic tree of beetles (fig. 2). Contigs for which there is a corresponding morphological identification confirming their placement (based on a cox1 barcode bait sequence) are shown with solid lines; those that either do not contain the barcode region or did not match any bait sequence are shown with dashed lines. (b) The community phylogeny: Maximum clade credibility consensus tree for all 175 contigs longer than 2 kb and containing the most frequently recovered gene, nad4l. The colored branches represent the contigs that are in both trees ([a] and [b]), with the sequences unique to tree (b) shown in black. The gene composition of each contig is represented by the presence of gray squares adjacent to the tips, with nad4l highlighted by a black box and the presence of a cox1 barcode sequence indicated by a black star. In both (a) and (b), open circles indicate posterior probabilities ≥0.5, filled circles indicate posterior probabilities ≥0.95.
Success Rates for Assignation of 96 Baited 10+ Contigs to Higher Taxonomic Ranks.
| Taxonomic Rank | Mito34 Topology | Mito240 Topology | BLAST Top Hit (+incorrect) | MEGAN (+incorrect) |
|---|---|---|---|---|
| Family | 30 | 72 | 77 (18) | 53 (13) |
| Superfamily | 29 | 23 | 6 | 5 |
| Infraorder | 31 | — | 5 | 9 |
| Suborder | 6 | 1 | 7 | 12 |
| Order | — | — | — | 10 |
| Subclass | — | — | 1 | 6 |
| Infraclass | — | — | — | 1 |
a. The table gives the lowest hierarchical level to which a contig was assigned, summed across the 96 contigs in the analysis. Note that the assignations based on the phylogenies were often conservative due to the incompleteness of the reference sets; however, in all cases these conservative assignations were correct at the rank at which they were made. In contrast, the majority of BLAST-based assignations were at family level (all but one of the top hits and most MEGAN assignations), but several were incorrect (shown in brackets), so these identifications were instead scored at the taxonomic rank at which they were correct.
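The counts in this table can be checked against the stated total of 96 contigs with a few lines of code. The sketch below sums each method's column and reports the share of contigs scored at family level; dashes in the table are entered as zero, and all names are illustrative. Adding the bracketed incorrect counts to the family-level counts is an interpretation of the footnote, not a figure stated in the source; for BLAST top hits it gives 95, which agrees with "all but one" of 96.

```python
RANKS = ["Family", "Superfamily", "Infraorder", "Suborder",
         "Order", "Subclass", "Infraclass"]

# Contigs scored at each rank, in RANKS order; dashes in the table entered as 0.
ASSIGNED = {
    "Mito34 topology": [30, 29, 31, 6, 0, 0, 0],
    "Mito240 topology": [72, 23, 0, 1, 0, 0, 0],
    "BLAST top hit": [77, 6, 5, 7, 0, 1, 0],
    "MEGAN": [53, 5, 9, 12, 10, 6, 1],
}

# Bracketed values in the Family row: family-level assignations that were wrong.
INCORRECT_AT_FAMILY = {"BLAST top hit": 18, "MEGAN": 13}

N_CONTIGS = 96


def summarize(assigned):
    """Return {method: (total contigs, share scored at family level)}."""
    return {method: (sum(counts), counts[0] / sum(counts))
            for method, counts in assigned.items()}


if __name__ == "__main__":
    for method, (total, family_share) in summarize(ASSIGNED).items():
        print(f"{method:<17} total={total} family share={family_share:.1%}")
    for method, wrong in INCORRECT_AT_FAMILY.items():
        attempted = ASSIGNED[method][0] + wrong
        print(f"{method:<17} family-level assignations made: {attempted}")
```

Every column sums to 96, so each contig is counted exactly once per method at the lowest rank at which its assignation was scored.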