Arne Dür, Nicole Huber, Alexander Röck, Cordula Berger, Christina Amory, Walther Parson.
Abstract
In this paper we present a new algorithm for splitting (partial) human mitogenomes into components with high similarity to haplogroup motifs of Phylotree. The algorithm reads a (partial) mitogenome coded by its differences to the reference (rCRS) and outputs the estimated haplogroups of the putative components. The algorithm requires no special information on the raw data of the sequencing process and is therefore suited for the post hoc analysis of mixtures from any sequencing technology. The software EMMA 2 implementing the algorithm will be made available via the EMPOP (https://empop.online) database and extends the nine-year-old software EMMA for haplogrouping single mitogenomes to mixtures with at most three components.
Keywords: EMPOP; Fluctuation Rates; Mitochondrial DNA; Mixture Deconvolution; mtDNA Phylogeny
Year: 2022 PMID: 35860401 PMCID: PMC9283771 DOI: 10.1016/j.csbj.2022.06.053
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Motif table for the first example in section 3.1.
| Nucleotide | Component 1 | Component 2 |
|---|---|---|
| C | 1 | 0 |
| T | 0 | 1 |
Motif table for the second example in section 3.1.
| Nucleotide | Component 1 | Component 2 | Component 3 |
|---|---|---|---|
| C | 1 | 1 | 0 |
| G | 0 | 0 | 1 |
| T | 0 | 0 | 0 |
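As a reading aid, the two motif tables above can be related to the IUPAC ambiguity codes used in the mixture profiles further down. The following is a minimal sketch, not the EMMA 2 implementation: it assumes that each column of a motif table marks the single nucleotide carried by one component at a given position, and that the mixed position is reported as the IUPAC code for the set of nucleotides present. The function name `mixture_code` and the dictionary layout are hypothetical, chosen only for illustration.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases present.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def mixture_code(motif_table):
    """Return the IUPAC code for one position of a mixture.

    motif_table maps a nucleotide to a list of 0/1 indicators, one per
    component (hypothetical layout mirroring the tables above).
    """
    rows = list(motif_table.values())
    n_components = len(rows[0])
    # Assumption: every component carries exactly one nucleotide here.
    for j in range(n_components):
        if sum(row[j] for row in rows) != 1:
            raise ValueError(f"component {j + 1} must carry exactly one nucleotide")
    present = frozenset(base for base, row in motif_table.items() if any(row))
    return IUPAC[present]


# First motif table: component 1 carries C, component 2 carries T.
first = {"C": [1, 0], "T": [0, 1]}
# Second motif table: components 1 and 2 carry C, component 3 carries G.
second = {"C": [1, 1, 0], "G": [0, 0, 1], "T": [0, 0, 0]}

print(mixture_code(first))   # Y
print(mixture_code(second))  # S
```

Under these assumptions the first table corresponds to a position reported as Y (C and T) and the second to S (C and G). The unused T row in the second table contributes nothing to the code.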
Artificial mixtures deconvolved.
| | 2 components | 3 components |
|---|---|---|
| covered by rank 1 combinations | 997/1000 (99.7%) | 95/100 (95%) |
| covered by rank 2 combinations | 3/3 (100%) | 4/5 (80%) |
Haplogroup estimates for IV.3 in rank 1.
| Number of components | Costs | Haplogroup combinations |
|---|---|---|
| 1 | 20.39–20.86 | R |
| 2 | 2.74–3.19 | |
| 3 | 2.74–3.21 | R&V&U4c1 |
Haplogroup/NUMT estimates for the second example in rank 1.
| Number of components | Costs | Haplogroup combinations |
|---|---|---|
| 1 | 23.37–23.85 | A2+(64) |
| 2 | 15.78–16.26 | A2+(64)&CDSN1036 |
| 3 | 13.65–14.12 | A2+(64)&CDSN660&CDSN1036 |
a-i: Deconvolution and haplogroup estimates for GEDNAP two-component mixtures.
| (a): GEDNAP 30 Stain 2 (CR): 16093Y 16172Y 16183M 16189Y 16209Y 16219R 16278Y 16304Y 16519Y 73R 146Y 185R 263G 291.1a 309.1C 315.1C 456Y 523DEL 524DEL | ||
|---|---|---|
| 1 | 4.72–5.17 | R |
| 2 | 2.09–2.56 | |
| 3 | 2.40–2.89 | R&H5a1j&U6a3a2, H&H5&U6a3a2, H&H5a1j&U6a3+185, H1&H5&U6a, H5&R30a1b&U6a3a2, H5&R30a1b&U6a3+185, H5&U6a3a2&U6a3c, H5a1j&H5a1+16093&U6a3c, H5a1j& |
| True component 16209C 16304C 16519C 263G 309.1C 315.1C 456T 523DEL 524DEL | H5a1j | |
| True component 16093C 16172C 16183C 16189C 16219G 16278T 73G 146C 185A 263G 291.1A 309.1C 315.1C 523DEL 524DEL | U6a3a2, | |
a-c: Deconvolution and haplogroup estimates for GEDNAP three-component mixtures.
| (a): GEDNAP 32 Stain 2 (CR): 16256Y 16270Y 16311Y 16352Y 16399R 73R 152Y 263G 309.1C 315.1C | ||
|---|---|---|
| 1 | 2.66–2.88 | R |
| 2 | 1.26–1.75 | R0&U5a1+@16192, R1&U5a1a1+152 |
| 3 | 1.20–1.69 | |
| True component 16311C 152C 263G 309.1C 315.1C | H+152, H1+152, H1+16311, H13a1+152, H13a1a1a, H13a1a2+16311, H13a2b1, H14b, H16+152, H1bc, H1e1a4, H2+152+16311, H2a2a2, H3+152, H3+16311, H30b1, H3q1, H47a, H72, H76, H80, HV+16311 | |
| True component 16256T 16270T 16399G 73G 263G 309.1C 315.1C | U5a1+@16192 | |
| True component 16256T 16352C 263G 309.1C 309.2C 315.1C | H14a | |