
MToolBox: a highly automated pipeline for heteroplasmy annotation and prioritization analysis of human mitochondrial variants in high-throughput sequencing.

Claudia Calabrese1, Domenico Simone1, Maria Angela Diroma1, Mariangela Santorsola1, Cristiano Guttà1, Giuseppe Gasparre1, Ernesto Picardi2, Graziano Pesole2, Marcella Attimonelli1.   

Abstract

MOTIVATION: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to readily identify the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data.
RESULTS: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, and also includes a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy, together with annotation files listing prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets.
AVAILABILITY AND IMPLEMENTATION: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/.
© The Author 2014. Published by Oxford University Press.

Year:  2014        PMID: 25028726      PMCID: PMC4201154          DOI: 10.1093/bioinformatics/btu483

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Recent discoveries in human mitochondrial genetics, driven by the advent of next-generation sequencing, have revealed that individuals exhibit a complex mixture of mitochondrial genotypes (He et al.) and carry low-level heteroplasmic variants (Payne et al.). Moreover, the deeper the sequencing coverage, the higher the number of mitochondrial DNA (mtDNA) variants and the wider the range of heteroplasmy levels found per individual (Diroma et al.; He et al.; Payne et al.). In this context, deep sequencing of mtDNA raises the demand for effective pipelines that accurately measure heteroplasmy and readily identify the most functionally important variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated bioinformatics pipeline to reconstruct and analyze human mtDNA from high-throughput sequencing (HTS) data. The MToolBox workflow includes a computational strategy to assemble mitochondrial genomes from whole-exome sequencing (WXS) and/or whole-genome sequencing (WGS) data (Picardi and Pesole, 2012). This strategy has been updated to detect insertions and deletions (ins/dels) and to assess the heteroplasmic fraction (HF) of each variant allele with its confidence interval (CI), both reported as sample-specific meta-information in an enhanced version of the Variant Call Format (VCF) file (version 4.0). The MToolBox pipeline also analyzes the reconstructed genomes for haplogroup assignment (Rubino et al.) and variant prioritization.

2 METHODS

2.1 Mitochondrial reads extraction, genome reconstruction and VCF file generation

The MToolBox pipeline integrates into a single automated workflow a computational strategy for extracting mtDNA data from WXS and WGS data (Picardi and Pesole, 2012), to which important new features have been added. MToolBox accepts either raw data or prealigned reads as input (Fig. 1a). In both cases, reads are mapped or remapped by the mapExome.py script (Fig. 1b) onto either the Reconstructed Sapiens Reference Sequence (RSRS; Behar et al.) or the revised Cambridge Reference Sequence (rCRS; Andrews et al.), at the user's choice. Subsequently, reads mapped onto mtDNA are realigned onto the nuclear genome (GRCh37/hg19) to discard Nuclear mitochondrial Sequences (NumtS; Simone et al.; Fig. 1c) and amplification artifacts. The resulting Sequence Alignment/Map (SAM) file (Fig. 1d) can optionally be processed to realign reads around a set of known ins/dels, annotated in HmtDB (Rubino et al.) and MITOMAP (Ruiz-Pesini et al.), and to remove putative PCR duplicates (Fig. 1e–h and Supplementary Information). This step generates a dataset of highly reliable aligned mitochondrial reads, which the assembleMTgenome.py script uses to reconstruct a complete mitochondrial genome (Fig. 1i). The script now integrates the mtVariantCaller.py module for detecting nucleotide mismatches and ins/dels. All genomic variants are filtered by quality score and read depth, and annotated in a VCF file (v.4.0) with the corresponding HF and CI values (Fig. 1j and Supplementary Information).
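As an illustration of how an HF value and its CI can be derived from read counts, the following minimal sketch computes the fraction of reads supporting a variant allele and a Wilson score interval around it. The choice of the Wilson interval, the function names and the default z value are assumptions made for this example; the text above does not specify which CI method MToolBox uses (see Supplementary Information).

```python
import math
from typing import Tuple


def heteroplasmic_fraction(allele_reads: int, depth: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    if not 0 <= allele_reads <= depth:
        raise ValueError("allele_reads must lie between 0 and depth")
    return allele_reads / depth


def wilson_interval(allele_reads: int, depth: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method).

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p = heteroplasmic_fraction(allele_reads, depth)
    z2 = z * z
    denom = 1.0 + z2 / depth
    centre = (p + z2 / (2.0 * depth)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / depth + z2 / (4.0 * depth * depth)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    hf = heteroplasmic_fraction(50, 1000)
    low, high = wilson_interval(50, 1000)
    print(f"HF={hf:.3f} CI=({low:.3f}, {high:.3f})")
```

For a fixed HF the interval narrows as read depth grows, which is one reason low-level heteroplasmy is easier to quantify at high coverage.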
Fig. 1.

The main steps of the MToolBox workflow: (a–d) read mapping and NumtS filtering; (e–h) post-mapping processing; (i–m) genome assembly, haplogroup prediction and variant annotation. In brackets, programs or modules particularly important for the associated process. Solid connectors indicate mandatory pipeline steps; dashed connectors (e–g) indicate that the corresponding post-mapping steps can be optional, otherwise the OUT2.sam file directly undergoes the assembly process (h). Please refer to Supplementary Information for a detailed description of MToolBox workflow steps


2.2 Haplogroup prediction and prioritization analysis of mitochondrial variants

MToolBox provides an output file with the reconstructed contig sequence(s) (Contigs.fa) (Fig. 1k and Supplementary Information). Each set of contigs is subjected to haplogroup prediction, relying on the RSRS-based Phylotree resource (van Oven and Kayser, 2009), by mt-classifier (Fig. 1l), an updated version of the fragment-classify tool (Rubino et al.), which now includes a module for functional annotation and prioritization of mitochondrial variants (Fig. 1m and Supplementary Information). The prioritization analysis aligns each sample-specific reconstructed contig against the related macro-haplogroup-specific consensus sequence (Supplementary Information) to identify private variants that deserve further clinical investigation. The prioritization also takes into account the pathogenicity of each mutated allele, determined with different algorithms, and the nucleotide variability of each variant site; amino acid variability is also considered if the variant site is codogenic (Supplementary Information). For each mutated allele, additional annotations are reported, i.e. annotations from the HmtDB and MITOMAP resources and the allele's occurrence among 1000 Genomes Project samples (Supplementary Information). Variants of assembled genomes are also reported with respect to rCRS (Supplementary Information), to ensure full compatibility of the resulting annotation with the current clinical literature (Bandelt et al.).
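The prioritization logic described above can be sketched as a two-stage process: keep only variants absent from the macro-haplogroup consensus, then rank them. The sketch below is a simplified illustration, not the mt-classifier implementation. The `Variant` fields, the 0 to 1 score scales, the ranking rule (higher pathogenicity first, then lower site variability) and the example positions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    position: int          # mtDNA coordinate (illustrative values only)
    allele: str            # mutated allele
    pathogenicity: float   # hypothetical score in [0, 1], higher = more damaging
    nt_variability: float  # hypothetical score in [0, 1], higher = more variable site


def prioritize(sample_variants: Iterable[Variant],
               haplogroup_consensus: Set[Tuple[int, str]]) -> List[Variant]:
    """Return private variants, most interesting first.

    A variant is treated as private when its (position, allele) pair is
    absent from the macro-haplogroup consensus. Ranking rule (assumed):
    higher pathogenicity first, ties broken by lower site variability.
    """
    private = [v for v in sample_variants
               if (v.position, v.allele) not in haplogroup_consensus]
    return sorted(private, key=lambda v: (-v.pathogenicity, v.nt_variability))


if __name__ == "__main__":
    consensus = {(100, "T")}
    sample = [
        Variant(100, "T", 0.10, 0.50),  # shared with the consensus: dropped
        Variant(200, "G", 0.90, 0.01),
        Variant(300, "A", 0.40, 0.20),
        Variant(400, "C", 0.90, 0.30),
    ]
    for v in prioritize(sample, consensus):
        print(v.position, v.allele, v.pathogenicity, v.nt_variability)
```

Comparing against the haplogroup consensus rather than the reference sequence removes variants that merely mark the sample's lineage, so the ranked list contains only sample-specific changes.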

3 RESULTS

The performance of MToolBox in heteroplasmy detection was tested on four artificial heteroplasmic samples, whose sequencing was simulated at different mean depths (Supplementary Information). MToolBox showed high specificity and sensitivity in detecting all the artificial heteroplasmies tested at an average coverage depth equal to or above 1000×. MToolBox was extensively applied to WXS data from 1000 Genomes (1000 Genomes Project Consortium; Supplementary Information), to obtain a VCF file of mtDNA variants from 2419 individuals (available at https://sourceforge.net/projects/mtoolbox/files/1000Genomes_data/). The reliability of the reconstructed mitochondrial genomes was confirmed by their haplogroup predictions, the majority of which were consistent with the ancestry of the related individual (Supplementary Information). The accuracy of heteroplasmy detection and quantification was confirmed by the results from four mother–child pairs, which showed the expected pattern of mtDNA inheritance (Supplementary Information).
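For readers who wish to reproduce this kind of evaluation on simulated data, the sketch below shows one way to compute sensitivity and specificity by comparing the set of sites where heteroplasmy was simulated with the set of sites where it was called. The function name and the example site lists are hypothetical; the actual evaluation protocol is described in the Supplementary Information.

```python
from typing import Set, Tuple


def sensitivity_specificity(expected: Set[int],
                            detected: Set[int],
                            all_sites: Set[int]) -> Tuple[float, float]:
    """Compare called heteroplasmic sites with the simulated truth set.

    expected  : sites where heteroplasmy was simulated
    detected  : sites where heteroplasmy was called
    all_sites : every site that was evaluated
    """
    tp = len(expected & detected)              # simulated and called
    fn = len(expected - detected)              # simulated but missed
    fp = len(detected - expected)              # called but not simulated
    tn = len(all_sites - expected - detected)  # neither simulated nor called
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # The rCRS is 16569 bp long; the site lists below are made up.
    genome_sites = set(range(1, 16570))
    simulated = {100, 200, 300, 400}
    called = {100, 200, 300, 500}
    sens, spec = sensitivity_specificity(simulated, called, genome_sites)
    print(f"sensitivity={sens:.2f} specificity={spec:.6f}")
```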

4 DISCUSSION

A highly automated pipeline for mtDNA analysis from HTS data is not available to date. To fill this gap, we developed MToolBox, an effective workflow with customizable parameters that can analyze multiple samples in a single run. MToolBox is the only tool that outputs a VCF file, the standard format for large-scale genotyping information, suitably customized for mitochondrial data by including the heteroplasmic fraction and its related CI. The MitoSeek tool (Guo et al.) also performs mitochondrial HTS data analyses, including somatic and structural variant recognition. Additionally, MToolBox provides the user with essential analyses of reconstructed mitochondrial genomes, i.e. haplogroup assignment and variant prioritization, exploiting a broad collection of annotation resources. Thus, MToolBox may provide valuable support for the recognition of candidate mitochondrial mutations in clinical studies.

Funding: This work was supported by Progetto Strategico ‘Invecchiamento’ e ‘Medicina Personalizzata’ (CNR, Italy) and the PRIN2009 fund assigned to M.A. The computational work was executed on the IT resources made available by the ReCaS project (PONa3_00052).

Conflicts of interest: none declared.
REFERENCES

1.  A "Copernican" reassessment of the human mitochondrial DNA tree from its root.

Authors:  Doron M Behar; Mannis van Oven; Saharon Rosset; Mait Metspalu; Eva-Liis Loogväli; Nuno M Silva; Toomas Kivisild; Antonio Torroni; Richard Villems
Journal:  Am J Hum Genet       Date:  2012-04-06       Impact factor: 11.025

2.  Mitochondrial genomes gleaned from human whole-exome sequencing.

Authors:  Ernesto Picardi; Graziano Pesole
Journal:  Nat Methods       Date:  2012-05-30       Impact factor: 28.547

Review 3.  The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies.

Authors:  Hans-Jürgen Bandelt; Anita Kloss-Brandstätter; Martin B Richards; Yong-Gang Yao; Ian Logan
Journal:  J Hum Genet       Date:  2013-12-05       Impact factor: 3.172

4.  MitoSeek: extracting mitochondria information and performing high-throughput mitochondria sequencing analysis.

Authors:  Yan Guo; Jiang Li; Chung-I Li; Yu Shyr; David C Samuels
Journal:  Bioinformatics       Date:  2013-03-06       Impact factor: 6.937

5.  Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation.

Authors:  Mannis van Oven; Manfred Kayser
Journal:  Hum Mutat       Date:  2009-02       Impact factor: 4.878

6.  Heteroplasmic mitochondrial DNA mutations in normal and tumour cells.

Authors:  Yiping He; Jian Wu; Devin C Dressman; Christine Iacobuzio-Donahue; Sanford D Markowitz; Victor E Velculescu; Luis A Diaz; Kenneth W Kinzler; Bert Vogelstein; Nickolas Papadopoulos
Journal:  Nature       Date:  2010-03-03       Impact factor: 49.962

7.  HmtDB, a genomic resource for mitochondrion-based human variability studies.

Authors:  Francesco Rubino; Roberta Piredda; Francesco Maria Calabrese; Domenico Simone; Martin Lang; Claudia Calabrese; Vittoria Petruzzella; Mila Tommaseo-Ponzetta; Giuseppe Gasparre; Marcella Attimonelli
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

8.  The reference human nuclear mitochondrial sequences compilation validated and implemented on the UCSC genome browser.

Authors:  Domenico Simone; Francesco Maria Calabrese; Martin Lang; Giuseppe Gasparre; Marcella Attimonelli
Journal:  BMC Genomics       Date:  2011-10-20       Impact factor: 3.969

9.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

10.  Extraction and annotation of human mitochondrial genomes from 1000 Genomes Whole Exome Sequencing data.

Authors:  Maria Angela Diroma; Claudia Calabrese; Domenico Simone; Mariangela Santorsola; Francesco Maria Calabrese; Giuseppe Gasparre; Marcella Attimonelli
Journal:  BMC Genomics       Date:  2014-05-06       Impact factor: 3.969
