Alexander W Röck, Arne Dür, Mannis van Oven, Walther Parson.
Abstract
The assignment of haplogroups to mitochondrial DNA haplotypes adds substantial value to quality control, not only in forensic genetics but also in population and medical genetics. The availability of Phylotree, a widely accepted phylogenetic tree of human mitochondrial DNA lineages, led to the development of several (semi-)automated software solutions for haplogrouping. However, existing haplogrouping tools use only haplogroup-defining mutations, whereas private mutations (beyond the haplogroup level) can be additionally informative and allow for enhanced haplogroup assignment. This is especially relevant in the case of (partial) control region sequences, which are mainly used in forensics. The present study makes three major contributions toward a more reliable, semi-automated estimation of mitochondrial haplogroups. First, a quality-controlled database consisting of 14,990 full mtGenomes downloaded from GenBank was compiled. Together with Phylotree, these mtGenomes serve as a reference database for haplogroup estimates. Second, the concept of fluctuation rates, i.e. a maximum likelihood estimation of the stability of mutations based on 19,171 full control region haplotypes for which raw lane data are available, is presented. Finally, an algorithm is put forward for estimating the haplogroup of an mtDNA sequence based on the combined database of full mtGenomes and Phylotree, which also incorporates the empirically determined fluctuation rates. On the basis of examples from the literature and EMPOP, the algorithm is validated, and both the strength of this approach and its utility for quality control of mitochondrial haplotypes are demonstrated.
Keywords: EMPOP; Fluctuation rates; Haplogroup; Phylotree; mtDNA
Year: 2013 PMID: 23948335 PMCID: PMC3819997 DOI: 10.1016/j.fsigen.2013.07.005
Source DB: PubMed Journal: Forensic Sci Int Genet ISSN: 1872-4973 Impact factor: 4.882
Samples from Table 2 of Ref. [7] with updated haplogroup classification based on Phylotree Build 15 according to HaploGrep and EMMA.
| No. | Related GenBank sample in Ref. | (Updated) haplogroup of related GenBank sample from Ref. | Likely haplogroup | HaploGrep | QV | EMMA | Costs | Rank 1 results EMMA | Missing mutations | Private mutations |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | M20 | M20 | M20 | 90.8 | M20 | 0.72 | | T16362C T16519C | None |
| 2 | | N10a | N10a | N10a | 86.6 | N10a | 3.06 | | C16111T T16224C T16519C | A16258T T16311C |
| 3 | | N10b | N10b | N10b | 100.0 | N10b | 0.00 | N10b | None | None |
| | | | | | | | 0.00 | | None | −309.2C |
| 4 | | E1a1a | M52a | M52a | 74.9 | M52a | 4.62 | M52a | None | C16114A T16126C C16218T C16291T T16356C G16391A |
| 5 | | E1a1a | E1a1a | E1a1a | 100.0 | E1a1a, E1a1a1 | 0.00 | E1a1a | None | None |
| | | | | | | | 0.00 | E1a1a1 | None | None |
| | | | | | | | 0.00 | | None | −309.1C |
| 6 | | U5b2a1b | U5b2a1b | U5b2a1b | 93.5 | U5b2a1b | 0.61 | U5b2a1b, | None | G16319A |
| 7 | | H1a3c | H1a3 | H1a3c | 90.4 | H1a3c | 0.39 | | None | T146C |
| 8 | | J2a1a1a | J2a1a1 | J2a1a1a | 100.0 | J2a1a1a, J2a1a1a2 | 0.00 | J2a1a1a | None | None |
| | | | | | | | 0.00 | J2a1a1a2 | None | None |
| | | | | | | | 0.00 | | None | None |
| 9 | | K2b1a1 | K2b1a | K2b1a1 | 94.1 | K2b1a1 | 0.00 | | None | None |
| 10 | | U5b2a1a1 | U5b2a1a1 | U5b2a1a2 | 75.1 | U5b2a1a1 | 0.00 | | None | None |
| 11 | | K1a4a1e | K1a(K1a4a1) | K1a4a1e | 94.4 | K1a4a1e | 0.52 | K1a4a1e | None | T204C |
| | | | | | | | 0.52 | | −524.3A −524.4C | T204C |
| 12 | | J1c6 | J1c6 | J1c + 16261 + 189 | 92.2 | J1c-16261-189, J1c12 | 0.63 | J1c-16261-189 | T16126C | None |
| | | | | | | | 0.63 | J1c12 | T16126C | None |
| 13 | | K1a4a1e | K1a(K1a4a1) | K1a4a1e | 94.4 | K1a4a1e | 1.51 | K1a4a1e | None | T204C A272G |
| | | | | | | | 1.51 | | −524.3A −524.4C | T204C A272G |
| 14 | | U1a1 | U1a1 | U1a1b | 95.1 | U1a1b | 0.00 | | None | −16193.1C |
| 15 | | J2a1a1a2 | J2a1a | J2a1a1 | 86.0 | J2a1a1 | 0.00 | | None | None |
| 16 | | U1a1 | U1a1 | U1a1b | 95.1 | U1a1b | 0.00 | | −309.2C | −16193.1C |
| 17 | | V7a1, excluded for EMMA | V7a | V7a | 76.1 | V7a, V7a1 | 1.55 | V7a | None | A73G A95C |
| | | | | | | | 1.55 | V7a1 | None | A73G A95C |
| | | | | | | | 1.55 | | −309.2C | A73G A95C |
| 18 | | V7a1, excluded for EMMA | V7a | V7a | 85.8 | V7a, V7a1 | 1.55 | V7a | None | A73G A95C |
| | | | | | | | 1.55 | V7a1 | None | A73G A95C |
| | | | | | | | 1.55 | | −309.2C | A73G A95C |
| 19 | | B2d | B4b(B2d) | B4b | 89.0 | B2d | 0.00 | | None | −16193.1C −309.2C −309.3C |
| 20 | | M43a | M(M43a) | D4e2a | 103.3 | D4e2a, M10, M10a, M74, M74b, M74b2, D4j-16311, D4j11, M43a | 0.38 | D4e2a | None | T16311C −573.2C −573.3C |
| | | | | | | | 0.47 | M10 | None | T16362C −573.2C −573.3C |
| | | | | | | | 0.47 | M10a | None | T16362C −573.2C −573.3C |
| | | | | | | | 0.53 | M74 | None | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | M74b | None | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | M74b2 | None | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | D4j-16311 | None | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | D4j11 | None | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | | −309.1C | −573.1C −573.2C −573.3C |
| | | | | | | | 0.53 | | −309.1C | −573.1C −573.2C −573.3C |
| | | | | | | | 0.63 | | None | T16311C T16519C −573.2C −573.3C |
| | | | | | | | 0.63 | | −573.4C | T16311C T16519C |
| 21 | | M74b | M74 | D4j1b2 | 79.8 | M74b | 2.71 | | C16214T | T16093C T16172C T16297C |
| 22 | | V7a1, excluded for EMMA | V7a | V7a | 90.7 | V7a, V7a1 | 0.80 | V7a | None | A95C |
| | | | | | | | 0.80 | V7a1 | None | A95C |
| 23 | | V-@72 | HV(V) | V + @72 | 100.0 | V-@72, V1a1, V1a, V1a1b | 0.00 | V-@72 | None | None |
| | | | | | | | 0.25 | | −309.1C | T16519C |
| 24 | | sample not in GenBank anymore | HV1(HV1b) | HV1b3 | 96.0 | HV1b3 | 0.00 | | None | None |
| 25 | | Z1a1a | Z1a1a | Z1a | 100.0 | Z1a, Z1a1, Z1a1a, Z1a2 | 0.00 | Z1a | None | None |
| | | | | | | | 0.00 | | −309.1C | None |
| 26 | | J2a2a, excluded for EMMA | J2a2a | J2a2a | 86.8 | J2a2a | 0.00 | | None | None |
Likely haplogroup: haplogroup assigned in Ref. [7] based on Phylotree Build 13.
HaploGrep: haplogroup classification by HaploGrep using Phylotree Build 15.
QV: quality value assigned by HaploGrep.
EMMA: haplogroups of rank 1 results estimated by EMMA.
Costs: estimated costs by EMMA; the default cost range of 0.3 was applied for all calculations.
Rank 1 results EMMA: source for rank 1 haplogroups by EMMA; haplogroup labels indicate virtual haplotypes from Phylotree, GenBank accession numbers indicate mtGenomes.
Missing mutations: mutations within the test profile's range that are not present in the test profile but are present in the database profile.
Private mutations: mutations within the test profile's range that are present in the test profile but absent in the database profile.
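
The definitions of missing and private mutations given in the table notes amount to two set differences restricted to the sequenced range of the test profile. The following is a minimal Python sketch of that comparison, not EMMA's implementation: the function names, the example profiles, and the range boundaries are hypothetical and chosen only for illustration. The second helper, `within_cost_range`, rests on an assumption that the source does not spell out, namely that the cost range of 0.3 keeps every candidate whose cost lies within 0.3 of the lowest cost.

```python
import re

# Hypothetical helpers for illustration; labels follow the table's
# notation, e.g. "T16362C" or "-309.2C" for an insertion.

def position(mutation):
    """Return the numeric position embedded in a mutation label."""
    match = re.search(r"\d+", mutation)
    if match is None:
        raise ValueError(f"no position found in {mutation!r}")
    return int(match.group())

def in_ranges(mutation, ranges):
    """True if the mutation's position lies inside any (start, end) range."""
    pos = position(mutation)
    return any(start <= pos <= end for start, end in ranges)

def compare_profiles(test_profile, database_profile, ranges):
    """Return (missing, private) mutations within the test profile's range."""
    test = {m for m in test_profile if in_ranges(m, ranges)}
    database = {m for m in database_profile if in_ranges(m, ranges)}
    missing = database - test   # in the database profile, absent from the test profile
    private = test - database   # in the test profile, absent from the database profile
    return missing, private

def within_cost_range(costs, cost_range=0.3):
    """ASSUMPTION: keep candidates whose cost is within cost_range of the minimum."""
    best = min(costs.values())
    kept = [name for name, cost in costs.items() if cost <= best + cost_range]
    return sorted(kept, key=lambda name: (costs[name], name))

# Illustrative input: a control-region test profile (range boundaries assumed).
ranges = [(16024, 16569), (1, 576)]
missing, private = compare_profiles(
    {"A73G", "A263G", "T16519C"},
    {"A73G", "T16362C", "A8860G"},
    ranges,
)
candidates = within_cost_range({"D4e2a": 0.38, "M10": 0.47, "X1": 0.70})
```

In this sketch `A8860G` is ignored because it lies outside the sequenced range, so a coding-region difference never counts against a control-region test profile.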