Christoph Hahn, Lutz Bachmann, Bastien Chevreux.
Abstract
We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasite species Gyrodactylus thymalli and Gyrodactylus derjavinoides, including their respective teleost hosts European grayling (Thymallus thymallus) and rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements, and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from NGS data sets derived from total genomic DNA in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim to metagenomic/pooled data sets using simulated data. MITObim is an easy-to-use tool even for biologists with modest bioinformatics experience. The software is made available as an open source pipeline under the MIT license at https://github.com/chrishah/MITObim.
Year: 2013 PMID: 23661685 PMCID: PMC3711436 DOI: 10.1093/nar/gkt371
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Read statistics before and after error correction and trimming for the two samples sequenced using an Illumina HiSeq 2000™ instrument
| Sample name | Parasite | Host | Amount genomic DNA (µg) | No. of bases (Gb) | Insert size | No. of bases post error correction and trimming (Gb) | Average read length post error correction and trimming |
|---|---|---|---|---|---|---|---|
| thy | Gyrodactylus thymalli | Thymallus thymallus | 3.3 | 4.25 | 242 ± 73 | 2.87 | 97 |
| der | Gyrodactylus derjavinoides | Oncorhynchus mykiss | 2.4 | 3.46 | 234 ± 85 | 1.72 | 95 |
The comparatively high losses for Illumina-type sequencing data are due to the rigorous removal of reads representing low-coverage genomic data by the error correction and trimming routines.
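The size of these losses can be worked out from the base counts in the table. The following is a minimal Python sketch of that arithmetic; the variable and function names are illustrative and not part of MITObim.

```python
# Base counts (Gb) before and after error correction and trimming,
# copied from the read statistics table.
read_stats = {
    "thy": {"raw_gb": 4.25, "clean_gb": 2.87},
    "der": {"raw_gb": 3.46, "clean_gb": 1.72},
}


def fraction_lost(raw_gb: float, clean_gb: float) -> float:
    """Fraction of sequenced bases removed by error correction and trimming."""
    return 1.0 - clean_gb / raw_gb


losses = {name: fraction_lost(**stats) for name, stats in read_stats.items()}
for name, loss in losses.items():
    print(f"{name}: {loss:.1%} of bases removed")
```

With the tabulated values, roughly a third of the bases are removed for sample thy (about 32.5%) and about half for sample der (about 50.3%).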
Figure 1. Schematic workflow of the MITObim procedure. Step one: mitochondrial reads are mapped to conserved regions of a related reference sequence, and an initial reference for the species in question is built from the mapping result. Step two: reads that overlap the previously identified regions are fished from the readpool. Step three: this subset of reads is mapped and a new, extended reference is created. Steps two and three are repeated iteratively until all gaps are closed and the number of reads remains stationary. Black rectangles, nuclear reads; red rectangle, mitochondrial genome of a distantly related species; green rectangles, mitochondrial reads and the growing mitochondrial reference.
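The bait-and-extend loop of steps two and three can be illustrated with a toy Python sketch. This is not the MITObim implementation: it assumes error-free reads, exact k-mer matching, a linear target sequence and a naive greedy extension, whereas the real pipeline relies on a proper mapping assembler and tolerates divergence from the reference. All function names, the toy genome and the parameter values are hypothetical.

```python
def kmers(seq, k):
    """All substrings of length k in seq (empty set if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait(readpool, reference, k):
    """Step two (toy): fish out reads sharing at least one exact k-mer with the reference."""
    ref_kmers = kmers(reference, k)
    return [read for read in readpool if not kmers(read, k).isdisjoint(ref_kmers)]


def overlap(left, right, min_len):
    """Longest suffix of `left` equal to a prefix of `right`; 0 if shorter than min_len."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def extend(reference, reads, min_len):
    """Step three (toy): grow the reference with reads overhanging either end."""
    for read in reads:
        n = overlap(reference, read, min_len)
        if 0 < n < len(read):  # read overhangs the right end
            reference += read[n:]
            continue
        n = overlap(read, reference, min_len)
        if 0 < n < len(read):  # read overhangs the left end
            reference = read[:len(read) - n] + reference
    return reference


def iterative_baiting(seed, readpool, k=5, max_iterations=50):
    """Repeat baiting and extension until the number of baited reads is stationary."""
    reference, previous = seed, -1
    for iteration in range(1, max_iterations + 1):
        baited = bait(readpool, reference, k)
        if len(baited) == previous:  # read number stationary: stop
            break
        previous = len(baited)
        reference = extend(reference, baited, k)
    return reference, previous, iteration


# Hypothetical 42-base "mitochondrial genome" in which every 3-mer is unique.
genome = "ACGTACAGTCATGCTAGCACTGATCGACCTTGGAATATTCCA"
mito_reads = [genome[i:i + 12] for i in range(0, len(genome) - 12 + 1, 3)]
nuclear_reads = ["TTTTGGGGTTTT", "AAAACCCCAAAA"]  # share no 5-mer with the genome
seed = genome[15:27]  # stands in for a short barcode seed

assembled, n_reads, iterations = iterative_baiting(seed, mito_reads + nuclear_reads)
print(assembled == genome, n_reads, iterations)
```

In this toy setting the seed grows outward in both directions with each round, the nuclear reads are never baited, and the loop stops once a round fishes no additional reads.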
Results of case study 4
| COI reference | No. of iterations | Accuracy (%) | Length (%) | Reads (%) | fp rate (%) |
|---|---|---|---|---|---|
| | 102 | 100 | 99.98 | 99.30 | 0.27 |
| | 90 | 100 | 99.72 | 99.53 | 0.47 |
| | 100 | 99 | 96.97 | 96.82 | 1.14 |
| | 103 | 100 | 99.62 | 99.43 | 0.57 |
| | 94 | 100 | 99.84 | 99.47 | 0.00 |
For each seed COI sequence, the table gives the number of iterations until read number convergence, the per-base accuracy in percent, the percentage of the reference mitochondrial genome length recovered, the fraction of reads recovered from the respective specific readpool and the fraction of false positive reads, i.e. reads of heterospecific origin incorporated from the entire mixed readpool.
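The two read-based columns can be illustrated with a short Python sketch. It assumes that reads are tracked by identifier, that "Reads (%)" is the share of the target species' reads that were incorporated, and that "fp rate (%)" is the share of incorporated reads that are of heterospecific origin. The caption does not state the denominator of the fp rate, so that choice is an assumption, and the function name and read identifiers are hypothetical.

```python
def recovery_metrics(incorporated, target_reads):
    """Return (reads recovered %, false positive %) for one seed.

    incorporated: identifiers of all reads pulled in from the mixed readpool
    target_reads: identifiers of the reads that truly belong to the target species
    """
    incorporated, target = set(incorporated), set(target_reads)
    true_pos = incorporated & target   # correctly recovered reads
    false_pos = incorporated - target  # heterospecific reads pulled in
    reads_pct = 100 * len(true_pos) / len(target)
    # Assumed denominator: all incorporated reads.
    fp_pct = 100 * len(false_pos) / len(incorporated) if incorporated else 0.0
    return reads_pct, fp_pct


# Made-up example: 200 target reads, 198 of them recovered, plus 2 foreign reads.
target = [f"t{i}" for i in range(200)]
pulled_in = target[:198] + ["o0", "o1"]
reads_pct, fp_pct = recovery_metrics(pulled_in, target)
print(reads_pct, fp_pct)
```

The example numbers are invented for illustration and do not correspond to any row of the table.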
Summary of the quality assessment for case studies 1 to 3
| Sample name | Case study 1 | Case study 2 | Case study 3 | ||||
|---|---|---|---|---|---|---|---|
| thy | 8/4 (99.87%) | 0/0 (100%) | 0/0 (100%) | 0/0 (100%) | 0/0 (100%) | ||
| der | 8/7 (99.62%) | 0/1 (99.99%) | 0/0 (100%) | 0/1 (99.99%) | |||
The number of single nucleotide substitutions/indels (percentage identity) is given for each starting reference/seed used, relative to the direct mapping result obtained with the correct mitochondrial reference sequence.
Figure 2. Number of MITObim iterations needed to reach a stationary mitochondrial read number, depending on the initial reference used. Data sets ‘thy’ and ‘der’ are represented by dashed and dotted lines, respectively. The initial reference used is indicated by color.