Rémi Allio, Alex Schomaker-Bastos, Jonathan Romiguier, Francisco Prosdocimi, Benoit Nabholz, Frédéric Delsuc.
Abstract
Thanks to the development of high-throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now allows routine inference of phylogenetic relationships from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture-based studies. Here, we developed MitoFinder, a user-friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae) for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA-UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, allowing us to confirm species identification using CO1 barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By utilizing the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, allowing testing for potential mitonuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data and whole genome shotgun sequencing in diverse taxa. The MitoFinder software is available from GitHub (https://github.com/RemiAllio/MitoFinder).
Keywords: DNA barcoding; bioinformatics; insects; invertebrates; metagenomics; systematics
Year: 2020; PMID: 32243090; PMCID: PMC7497042; DOI: 10.1111/1755-0998.13160
Source DB: PubMed; Journal: Mol Ecol Resour; ISSN: 1755-098X; Impact factor: 7.090
FIGURE 1 Conceptualization of the pipeline used to assemble and extract UCE and mitochondrial signal from ultraconserved element sequencing data [Colour figure can be viewed at wileyonlinelibrary.com]
Summary statistics on assembly results according to the assembler used
| Assembler | Assembly time | UCEs: number of contigs | UCEs: number of loci | UCEs: matrix size | UCEs: % variable sites | UCEs: % missing data |
|---|---|---|---|---|---|---|
| IDBA‐UD (5 CPU) | 0 hr:11 min:02 s | 30,544 | 2,581 | 132,403 | 43.9 | 6.7 |
| MEGAHIT (5 CPU) | 0 hr:12 min:35 s | 114,392 | 2,579 | 147,589 | 43.2 | 12.5 |
| MetaSPAdes (5 CPU) | 0 hr:25 min:42 s | 113,303 | 2,582 | 156,456 | 44.3 | 6.1 |
| Trinity (35 CPU) | 1 hr:06 min:22 s | 43,481 | 2,579 | 127,803 | 40.5 | 17.8 |
The values are averages over the 501 assemblies, except for the assembly time, which is a median value. The two parts of the table report specific statistics for (a) ultraconserved elements data, and (b) mitochondrial data. Note that 35 CPUs were used for Trinity whereas 5 CPUs were used for the other assemblers.
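The matrix statistics reported in the table can be illustrated with a minimal Python sketch. The source does not define how the percentages were computed, so the definitions here are assumptions: matrix size is taken as the number of aligned sites, a site is counted as variable when it shows more than one distinct non-missing state, and `-`, `?` and `N` are treated as missing data in a nucleotide alignment. The function name `alignment_stats` and the toy alignment are hypothetical.

```python
# Assumption: these characters mark missing data in a nucleotide alignment.
MISSING = set("-?Nn")


def alignment_stats(seqs):
    """Return (matrix_size, pct_variable_sites, pct_missing_data).

    seqs is a list of aligned sequences (strings of equal length).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    # Missing data: share of all cells (taxa x sites) holding a missing symbol.
    total_cells = length * len(seqs)
    missing_cells = sum(1 for s in seqs for ch in s if ch in MISSING)

    # Variable sites: columns with more than one distinct non-missing state.
    variable_sites = 0
    for column in zip(*seqs):
        states = {ch.upper() for ch in column if ch not in MISSING}
        if len(states) > 1:
            variable_sites += 1

    pct_variable = 100 * variable_sites / length
    pct_missing = 100 * missing_cells / total_cells
    return length, pct_variable, pct_missing


# Toy alignment: 4 taxa, 5 sites, one variable site, two missing cells.
toy = ["ACGTA", "ACGTA", "ATG-A", "ATGNA"]
size, pct_var, pct_miss = alignment_stats(toy)
print(size, pct_var, pct_miss)
```

Averaging these per-assembly values over all libraries would give figures of the kind shown in the table, under the stated assumptions.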
FIGURE 2 Comparison of the efficiency of the assemblers in terms of: (a) computational time, (b) number of potential mitochondrial contigs identified, and (c) number of mitochondrial genes annotated. Violin plots reflect the data distribution with a horizontal line indicating the median. Note that for the three metagenomic assemblers, 5 CPUs were used compared to 35 CPUs for Trinity. Plots were obtained using PlotsOfData (Postma & Goedhart, 2019)
Statistical comparison between the performances of the different assemblers
| Number of mtDNA contigs | IDBA‐UD | MEGAHIT | MetaSPAdes | Trinity |
|---|---|---|---|---|
| IDBA‐UD | | ** (+) | *** (+) | NS (–) |
| MEGAHIT | ** (–) | | * (+) | *** (–) |
| MetaSPAdes | *** (–) | * (–) | | *** (–) |
| Trinity | NS (+) | *** (+) | *** (+) | |

| Number of mtDNA genes | IDBA‐UD | MEGAHIT | MetaSPAdes | Trinity |
|---|---|---|---|---|
| IDBA‐UD | | *** (–) | *** (–) | *** (–) |
| MEGAHIT | *** (+) | | * (–) | *** (+) |
| MetaSPAdes | *** (+) | * (+) | | *** (+) |
| Trinity | *** (+) | *** (+) | *** (–) | |
Statistical significance was estimated with a paired nonparametric test (paired Wilcoxon test). ***p < .001; **p < .01; *p < .05; NS = p > .05. The signs (+)/(–) indicate the result of the comparison between the row and the column.
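As an illustration of how one cell of such a comparison could be derived, here is a minimal pure-Python sketch of an exact two-sided paired Wilcoxon signed-rank test. It is not the authors' code, and the source does not say which software was used. The names `paired_wilcoxon`, `significance_label` and `compare` are hypothetical. Two conventions are assumptions: (+) is taken to mean that row values tend to exceed column values, and p = .05 exactly is labelled NS.

```python
def paired_wilcoxon(x, y):
    """Exact two-sided Wilcoxon signed-rank test on paired samples.

    Returns (w_plus, w_minus, p_value). Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0

    # Rank |differences|; ranks are stored doubled so averages stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # doubled average of ranks i+1..j+1
        i = j + 1

    total2 = sum(ranks2)
    w2_plus = sum(r for r, d in zip(ranks2, diffs) if d > 0)

    # counts[s]: number of sign assignments whose doubled positive-rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]

    # Two-sided p-value: outcomes at least as far from the null mean as observed.
    dev_obs = abs(2 * w2_plus - total2)
    extreme = sum(c for s, c in enumerate(counts) if abs(2 * s - total2) >= dev_obs)
    return w2_plus / 2, (total2 - w2_plus) / 2, extreme / 2 ** n


def significance_label(p):
    """Map a p-value to the star notation used in the table footnote."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


def compare(row_values, col_values):
    """Format one table cell, e.g. '* (+)' (direction convention assumed)."""
    w_plus, w_minus, p = paired_wilcoxon(row_values, col_values)
    sign = "+" if w_plus > w_minus else "-" if w_plus < w_minus else "="
    return f"{significance_label(p)} ({sign})"


# Made-up per-library counts for two assemblers (not data from the study).
print(compare([1, 2, 3, 4, 5, 6], [0, 0, 0, 0, 0, 0]))
```

The exact enumeration scales poorly in pure Python; for hundreds of paired libraries a library routine such as `scipy.stats.wilcoxon` would likely be the more practical choice.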
FIGURE 3 Phylogenomic relationships of ants (Formicidae). (a) Mitonuclear phylogenetic differences among subfamily relationships based on the UCE and mtDNA supermatrices obtained with the MetaSPAdes assembler. Clades corresponding to subfamilies were collapsed. Inter‐subfamily relationships with UFBS < 95% were collapsed. Nonmaximal node support values are reported. (b) The topology obtained reflects the results of phylogenetic analyses based on the amino acid mitochondrial supermatrix (using MetaSPAdes as assembler). Histograms reflect the percentage of UCEs (light grey) and mitochondrial genes (dark grey) recovered for each species. Illustrative pictures (*): Diacamma sp. (Ponerinae; top left), Formica sp. (Formicinae; top right) and Messor barbarus (Myrmicinae; bottom right)