Hansi Weissensteiner, Dominic Pacher, Anita Kloss-Brandstätter, Lukas Forer, Günther Specht, Hans-Jürgen Bandelt, Florian Kronenberg, Antonio Salas, Sebastian Schönherr.
Abstract
Mitochondrial DNA (mtDNA) profiles can be classified into phylogenetic clusters (haplogroups), which is of great relevance for evolutionary, forensic and medical genetics. With the extensive growth of the underlying phylogenetic tree summarizing the published mtDNA sequences, manual haplogroup classification would be too time-consuming. The previously published classification tool HaploGrep provided an automatic way to address this issue. Here, we present the completely updated version HaploGrep 2, which offers several advanced features, including a generic rule-based system for immediate quality control (QC). This allows the detection of artificial recombinants and missing variants as well as the annotation of rare and phantom mutations. Furthermore, high-throughput data in the form of VCF files are now directly supported. For data output, several graphical reports are generated in real time, such as a multiple sequence alignment format, a VCF format and extended haplogroup QC reports, all viewable directly within the application. In addition, HaploGrep 2 generates a publication-ready phylogenetic tree of all input samples, encoded relative to the revised Cambridge Reference Sequence. Finally, new distance measures and optimizations of the algorithm increase accuracy and speed up the application. HaploGrep 2 can be accessed freely and without any registration at http://haplogrep.uibk.ac.at.
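To make the classification step more concrete, here is a minimal sketch of ranking candidate haplogroups against a sample profile with a weighted Kulczynski measure, the metric named in the footnote of the runtime table. The general Kulczynski form averages two shares: the fraction of a haplogroup's expected variants found in the sample, and the fraction of the sample's variants explained by that haplogroup. The per-variant weights (defaulting to 1.0), the function names `kulczynski` and `rank_haplogroups`, and the toy haplogroups `HG_A` to `HG_C` are hypothetical. This is not HaploGrep 2's actual weighting scheme or Phylotree data. The score is computed as a similarity, so higher means a better match.

```python
from typing import Dict, List, Mapping, Optional, Set, Tuple


def kulczynski(sample: Set[str], expected: Set[str],
               weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted Kulczynski similarity (hypothetical helper; higher = better).

    K = 0.5 * (w(found) / w(expected) + w(found) / w(sample)),
    where found = sample & expected and w() sums per-variant weights.
    Variants absent from `weights` count as 1.0; weights are assumed positive.
    """
    if not sample or not expected:
        return 0.0
    weights = weights or {}

    def w(variants: Set[str]) -> float:
        return sum(weights.get(v, 1.0) for v in variants)

    found = sample & expected
    return 0.5 * (w(found) / w(expected) + w(found) / w(sample))


def rank_haplogroups(sample: Set[str], tree: Mapping[str, Set[str]],
                     weights: Optional[Mapping[str, float]] = None,
                     top: int = 3) -> List[Tuple[str, float]]:
    """Score every candidate haplogroup and return the best `top` hits."""
    scores = [(hg, kulczynski(sample, expected, weights))
              for hg, expected in tree.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


# Toy data: haplogroup names and variant sets are hypothetical, not Phylotree.
TOY_TREE: Dict[str, Set[str]] = {
    "HG_A": {"263G", "750G", "1438G"},
    "HG_B": {"263G", "750G", "1438G", "2706G", "7028T"},
    "HG_C": {"263G", "750G", "16519C"},
}
SAMPLE = {"263G", "750G", "1438G", "2706G", "7028T", "16311C"}

ranking = rank_haplogroups(SAMPLE, TOY_TREE)
print(ranking)
```

In this toy case, HG_B ranks first (K = 11/12, about 0.917) because it explains five of the six sample variants and all of its own expected variants are present. HG_A follows at 0.75 and HG_C at 0.5.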
Year: 2016 PMID: 27084951 PMCID: PMC4987869 DOI: 10.1093/nar/gkw233
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
HaploGrep 2 runtime and concordance over different metrics
| NGS mtDNA dataset | Sample size | Full concordance over all metrics | HaploGrep 2 Runtime (including QC) |
|---|---|---|---|
| 1000G Phase 1 | 1,074 | 98.0% | 5.7 s |
| Li | 2,000 | 95.1% | 7.7 s |
| 1000G Phase 3 | 2,504 | 93.9% | 13.0 s |
Full concordance is the percentage of samples that are assigned to the same haplogroup by every metric. The HaploGrep 2 runtime refers to the calculation of the Kulczynski distance, including all quality checks from the rule-based engine. The detailed results are provided in Supplementary Tables S2, S3 and S4.
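The "full concordance" column can be reproduced from per-metric classifications by counting the samples whose haplogroup call is identical under every metric. Below is a minimal sketch. It assumes the calls are available as one dictionary per metric, keyed by sample ID. The function name `full_concordance`, the metric names, the sample IDs and the haplogroup labels are placeholders. They are not the data behind the Supplementary Tables.

```python
from typing import Mapping


def full_concordance(calls: Mapping[str, Mapping[str, str]]) -> float:
    """Percentage of samples assigned the same haplogroup by every metric.

    `calls` maps metric name -> {sample_id: haplogroup}. Only samples that
    every metric classified are counted (an assumption of this sketch).
    """
    per_metric = list(calls.values())
    if not per_metric:
        raise ValueError("need calls from at least one metric")
    shared = set(per_metric[0])
    for assignments in per_metric[1:]:
        shared &= set(assignments)
    if not shared:
        return 0.0
    agreeing = sum(
        1 for sample in shared
        if len({assignments[sample] for assignments in per_metric}) == 1
    )
    return 100.0 * agreeing / len(shared)


# Placeholder metric names, sample IDs and haplogroups (not 1000G results).
CALLS = {
    "metric_1": {"S1": "HG_A", "S2": "HG_B", "S3": "HG_C", "S4": "HG_D"},
    "metric_2": {"S1": "HG_A", "S2": "HG_B", "S3": "HG_C1", "S4": "HG_D"},
    "metric_3": {"S1": "HG_A", "S2": "HG_B", "S3": "HG_C", "S4": "HG_D"},
}
print(f"{full_concordance(CALLS):.1f}%")
```

Here a single disagreement (S3) out of four samples yields 75.0%. The same calculation over 1,074 samples would give the kind of figure reported in the table.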
Figure 1. Excerpt of the 1000G Phase 1 data generated with the newly provided ‘Graphical Phylogenetic Tree’. Polymorphisms at the tips of the phylogeny are candidates for new haplogroups. See, for instance, the samples belonging to haplogroup D4j15, which are confirmed to be related (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130606_sample_info/20130606_sample_info.xlsx), or samples HG00699 and HG00421, which are not related. Polymorphisms marked in red do not occur in Phylotree and may require additional attention, whereas mutations in blue are private polymorphisms of this group that are already known in Phylotree. The annotation of amino acid changes and mutational hotspots (green) can be defined by the user. Hotspots at positions 16182, 16183 and 16519, AC insertions and deletions at 515–524, insertions at 16193, variation around position 310 and point heteroplasmies can thereby be excluded from the phylogenetic reconstruction.
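The hotspot exclusion described in the caption can be expressed as a filter over a sample's variant list. Below is a minimal sketch with several assumptions. Variants are written relative to the rCRS as position plus allele, insertions as `position.<n><bases>` (e.g. `16193.1C`), and deletions with a trailing lowercase `d`. Point heteroplasmies are encoded as two-base IUPAC codes (R, Y, K, M, S, W). "Variation around position 310" is taken to mean the window 303 to 315. Any indel within 515–524 is treated as part of the AC repeat. The names `keep_for_tree` and `filter_for_tree` are hypothetical, and this is not HaploGrep 2's own implementation.

```python
import re
from typing import Iterable, List

# Notation assumed for this sketch (rCRS-relative, HaploGrep-like):
#   "16519C"    substitution at position 16519
#   "16193.1C"  first inserted base after position 16193
#   "523d"      deletion of position 523
#   "4216Y"     two-base IUPAC code, read here as a point heteroplasmy
VARIANT = re.compile(r"^(\d+)(?:\.(\d+))?([ACGTNRYKMSWBDHV]+|d)$")

HOTSPOT_POSITIONS = {16182, 16183, 16519}
AC_INDEL_RANGE = range(515, 525)    # positions 515-524 inclusive
AROUND_310 = range(303, 316)        # assumed window around position 310
HETEROPLASMY_CODES = set("RYKMSW")  # two-base IUPAC ambiguity codes


def keep_for_tree(variant: str, exclude_heteroplasmy: bool = True) -> bool:
    """Return False if `variant` falls into one of the excluded classes."""
    match = VARIANT.match(variant)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    position = int(match.group(1))
    is_insertion = match.group(2) is not None
    allele = match.group(3)
    is_deletion = allele == "d"

    if position in HOTSPOT_POSITIONS:
        return False
    if is_insertion and position == 16193:
        return False
    if (is_insertion or is_deletion) and position in AC_INDEL_RANGE:
        return False
    if position in AROUND_310:
        return False
    if exclude_heteroplasmy and allele in HETEROPLASMY_CODES:
        return False
    return True


def filter_for_tree(variants: Iterable[str],
                    exclude_heteroplasmy: bool = True) -> List[str]:
    """Keep only the variants used for the phylogenetic reconstruction."""
    return [v for v in variants if keep_for_tree(v, exclude_heteroplasmy)]


profile = ["73G", "263G", "310.1T", "523d", "524.1AC", "2706G",
           "4216Y", "16182C", "16193.1C", "16311C", "16519C"]
print(filter_for_tree(profile))
```

For this toy profile, only `73G`, `263G`, `2706G` and `16311C` survive. Passing `exclude_heteroplasmy=False` keeps `4216Y` as well. This reflects the caption's point that these exclusions are user-configurable rather than fixed.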