Hansi Weissensteiner, Lukas Forer, Christian Fuchsberger, Bernd Schöpf, Anita Kloss-Brandstätter, Günther Specht, Florian Kronenberg, Sebastian Schönherr.
Abstract
Next-generation sequencing (NGS) makes it possible to investigate mitochondrial DNA (mtDNA) characteristics such as heteroplasmy (i.e. intra-individual sequence variation) in greater detail. While several pipelines for analyzing heteroplasmies exist, issues with usability, accuracy of results and interpretation of the final data limit their use. Here we present mtDNA-Server, a scalable web server for the analysis of mtDNA studies of any size, with a special focus on usability as well as on reliable identification and quantification of heteroplasmic variants. The mtDNA-Server workflow includes parallel read alignment, heteroplasmy detection, identification of artefacts and contamination, variant annotation, and several quality control metrics that are often neglected in current mtDNA NGS studies. All computational steps are parallelized with Hadoop MapReduce and executed through the graphical interface of Cloudgene. We validated the underlying heteroplasmy and contamination detection model by generating four artificial sample mix-ups on two different NGS devices. Our evaluation data show that mtDNA-Server detects heteroplasmies and artificial recombinations down to the 1% level with perfect specificity and outperforms existing approaches in sensitivity. mtDNA-Server can currently analyze the 1000 Genomes Project Phase 3 data (n = 2,504) in less than 5 h and is freely accessible at https://mtdna-server.uibk.ac.at.
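The basic quantity behind heteroplasmy detection is the fraction of reads that support a minor allele at an mtDNA position. The sketch below illustrates that quantity together with a MapReduce-style split of the work, assuming each aligned read has already been reduced to (position, base) pairs: a map step emits one key per observed base, a reduce step sums counts per position, and a position is reported when the minor-allele level reaches 1%. The function names, the 100-read minimum depth and the plain frequency cutoff are illustrative assumptions; this is not mtDNA-Server's detection model, which is not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical representation: one aligned read reduced to (position, base) pairs.
Observation = Tuple[int, str]


def map_read(observations: Iterable[Observation]) -> Iterator[Observation]:
    """Map step: emit one (position, base) key for every base a read covers."""
    for position, base in observations:
        yield position, base


def reduce_counts(pairs: Iterable[Observation]) -> Dict[int, Counter]:
    """Reduce step: aggregate base counts per mtDNA position."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for position, base in pairs:
        counts[position][base] += 1
    return counts


def call_heteroplasmies(
    counts: Dict[int, Counter],
    min_level: float = 0.01,  # assumed 1% detection limit
    min_coverage: int = 100,  # assumed minimum read depth per position
) -> List[Tuple[int, str, str, float]]:
    """Report (position, major, minor, minor-allele level) at or above the threshold.

    A plain frequency cutoff; it does not model base quality or other error sources.
    """
    calls = []
    for position in sorted(counts):
        base_counts = counts[position]
        coverage = sum(base_counts.values())
        if coverage < min_coverage or len(base_counts) < 2:
            continue
        (major, _), (minor, minor_count) = base_counts.most_common(2)
        level = minor_count / coverage
        if level >= min_level:
            calls.append((position, major, minor, round(level, 4)))
    return calls


if __name__ == "__main__":
    # Toy data: 980 reads show T and 20 reads show C at one position (2% minor allele).
    reads = [[(16189, "T")]] * 980 + [[(16189, "C")]] * 20
    pairs = (pair for read in reads for pair in map_read(read))
    print(call_heteroplasmies(reduce_counts(pairs)))  # [(16189, 'T', 'C', 0.02)]
```

In an actual Hadoop job, the framework would shuffle the map output by position so that each reducer sees all bases for its positions; here the two steps are simply chained in one process.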
Year: 2016 PMID: 27084948 PMCID: PMC4987870 DOI: 10.1093/nar/gkw247
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overall mtDNA-Server workflow for FASTQ and BAM input.
Possible data sources and file sizes for mtDNA-Server
| mtDNA data source | Sample file size (BAM format) | Mean coverage |
|---|---|---|
| Ancient DNA | <10 MB / sample | ≤100-fold |
| Whole exome sequencing | <20 MB / sample | ≤1,000-fold |
| Whole genome low coverage | 1-80 MB / sample | ≤3,000-fold |
| Whole genome high coverage | 200 MB / sample | ≤20,000-fold |
| Targeted mtDNA sequencing | Up to 1 GB / sample | ∼50,000-fold |
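The mean coverage column can be related to sequencing output with the usual estimate: mean depth ≈ (mapped reads × read length) / mtDNA length, with 16,569 bp for the revised Cambridge Reference Sequence (rCRS). A sketch, assuming a fixed read length and that every counted read maps to the mitochondrial genome; the 100 bp read length is an illustrative assumption, not a value from the study.

```python
import math

RCRS_LENGTH_BP = 16_569  # length of the revised Cambridge Reference Sequence (rCRS)


def mean_coverage(mapped_reads: int, read_length_bp: int,
                  genome_length_bp: int = RCRS_LENGTH_BP) -> float:
    """Approximate mean depth: total aligned bases divided by genome length."""
    return mapped_reads * read_length_bp / genome_length_bp


def reads_needed(target_depth: float, read_length_bp: int,
                 genome_length_bp: int = RCRS_LENGTH_BP) -> int:
    """Invert the estimate: mapped reads required for a target mean depth."""
    return math.ceil(target_depth * genome_length_bp / read_length_bp)


if __name__ == "__main__":
    # Mean-coverage upper bounds from the table, with an assumed 100 bp read length.
    for depth in (100, 1_000, 3_000, 20_000, 50_000):
        print(f"{depth:>6}-fold -> {reads_needed(depth, 100):>9,} mapped reads")
```

Under these assumptions, the ∼50,000-fold targeted setting corresponds to roughly 8.3 million mapped 100 bp reads, while 100-fold needs only about 16,600.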
Figure 2. Four plots from the final HTML report: boxplot of heteroplasmy levels per sample (V), bar plot of heteroplasmic variant frequencies (VI), positions of the heteroplasmic variants on the mitochondrial genome across all analyzed samples (VII) and per-sample coverage plots (X).
mtDNA-Server versus LoFreq on IonTorrent PGM
| Sample mix-up (IonTorrent PGM) | mtDNA-Server precision | mtDNA-Server sensitivity | mtDNA-Server specificity | LoFreq precision | LoFreq sensitivity | LoFreq specificity |
|---|---|---|---|---|---|---|
| 1:2 | 81.48% | |||||
| 1:10 | ||||||
| 1:50 | 55.56% | |||||
| 1:100 | 11.11% | |||||
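The precision, sensitivity and specificity columns follow the standard confusion-matrix definitions: precision = TP/(TP + FP), sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). A minimal sketch, assuming the expected heteroplasmies of each mix-up serve as the truth set; the counts in the example are made up and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing variant calls against the expected mix-up profile."""
    tp: int  # expected heteroplasmies that were called
    fp: int  # called variants not expected from the mix-up
    tn: int  # positions correctly left uncalled
    fn: int  # expected heteroplasmies that were missed

    @staticmethod
    def _ratio(num: int, den: int) -> float:
        # Convention assumed here: report 0.0 when the denominator is empty.
        return num / den if den else 0.0

    def precision(self) -> float:
        return self._ratio(self.tp, self.tp + self.fp)

    def sensitivity(self) -> float:
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return self._ratio(self.tn, self.tn + self.fp)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    c = Confusion(tp=18, fp=2, tn=978, fn=2)
    print(f"precision={c.precision():.2%} sensitivity={c.sensitivity():.2%} "
          f"specificity={c.specificity():.2%}")
```

Because nearly every one of the 16,569 positions is a true negative, specificity stays close to 100% even with a few false positives, so precision is the more telling column for false calls.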
mtDNA-Server versus LoFreq on Illumina HiSeq
| Sample mix-up (Illumina HiSeq) | mtDNA-Server precision | mtDNA-Server sensitivity | mtDNA-Server specificity | LoFreq precision | LoFreq sensitivity | LoFreq specificity |
|---|---|---|---|---|---|---|
| 1:2 | 93.10% | 99.98% | ||||
| 1:10 | 89.29% | 99.98% | ||||
| 1:50 | 82.76% | 88.9% | 99.99% | |||
| 1:100 | 85.2% | 83.87% | 99.99% | |||
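Each mix-up ratio implies an expected heteroplasmy level at the positions where the two mixed samples differ. Assuming that "1:k" means one part of the minor sample per k parts of the major sample, and that both samples contribute mtDNA in proportion to that ratio, the minor sample's alleles should appear at 1/(k + 1); under this assumption, the four ratios correspond to about 33%, 9%, 2% and 1%. The profile encoding and function names below are hypothetical.

```python
from fractions import Fraction
from typing import Dict

REF = "ref"  # hypothetical placeholder for "same as reference"


def expected_minor_level(k: int) -> Fraction:
    """Expected level of the minor sample's allele in a 1:k mix-up (assumed 1 part per k)."""
    return Fraction(1, k + 1)


def expected_heteroplasmies(major: Dict[int, str], minor: Dict[int, str],
                            k: int) -> Dict[int, Fraction]:
    """Positions where the two mixed profiles differ, with the minor sample's expected level.

    Each profile maps position -> alternative base; absent positions are reference.
    """
    positions = set(major) | set(minor)
    return {
        pos: expected_minor_level(k)
        for pos in sorted(positions)
        if major.get(pos, REF) != minor.get(pos, REF)
    }


if __name__ == "__main__":
    for k in (2, 10, 50, 100):
        print(f"1:{k} -> {float(expected_minor_level(k)):.2%}")
```

Real levels would also depend on mtDNA copy number and pipetting accuracy, so the computed values are targets rather than exact expectations.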