Dana Kristjansson, Jon Bohlin, Truc Trung Nguyen, Astanand Jugessur, Theodore G Schurr.
Abstract
BACKGROUND: We combined an unsupervised learning methodology for analyzing mitogenome sequences with maximum likelihood (ML) phylogenetics to make detailed inferences about the evolution and diversification of mitochondrial DNA (mtDNA) haplogroup U5, which appears at high frequencies in northern Europe.
Keywords: Clade; Haplotype; Migration; Phylogeny; Scandinavia
Year: 2022 PMID: 35525961 PMCID: PMC9080151 DOI: 10.1186/s12864-022-08572-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 4.547
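hierBAPS groups and their representative subclade(s) based on the human mtDNA U5 haplogroup
| Subhaplogroup | Representative subclade(s) | Subclades included | Sequences (n) | % |
|---|---|---|---|---|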
| U5a2 | U5a2, U5a2b, U5a2c, U5a2d | U5a2, U5a2 + 16294 T, U5a2b, U5a2b1a, U5a2b1b, U5a2b1c, U5a2b1d, U5a2b2, U5a2b2a, U5a2b3, U5a2b3a, U5a2b3a1, U5a2b4, U5a2b4a, U5a2c, U5a2c1, U5a2c3a, U5a2c4, U5a2d, U5a2d1, U5a2d1a | 90 | 10.3 | |
| U5a2 | U5a2e | U5a2e | 11 | 1.3 | |
| U5a1 | U5a1, U5a1g, U5a1i | U5a1, U5a1b, U5a1b + 16362C, U5a1b1, U5a1b1a, U5a1b1b, U5a1b1b1, U5a1b1c, U5a1b1c1, U5a1b1c2, U5a1b1d + 16093C, U5a1b1d1, U5a1b1e, U5a1b1g, U5a1b1h, U5a1b2, U5a1b3, U5a1b3a, U5a1b3a1, U5a1b4, U5a1d, U5a1d1, U5a1e, U5a1f1a, U5a1f2, U5a1g, U5a1g1, U5a1i, U5a1i1, U5a1j | 128 | 14.6 | |
| U5b1 + U5b3 | U5b1, U5b1a, U5b1d, U5b1f, U5b1i, U5b3 | U5b1, U5b1a, U5b1d1a, U5b1d1b, U5b1d1c, U5b1d2, U5b1f, U5b1f1, U5b1f1a, U5b1i, U5b3, U5b3a1a, U5b3a2, U5b3b1, U5b3b2, U5b3e, U5b3h | 44 | 5 | |
| U5b1 + U5b3 | U5b1 + 16189C!, U5b1b, U5b1c | U5b1 + 16189C!, U5b1b, U5b1b2, U5b1b2a, U5b1b2b, U5b1c, U5b1c1a, U5b1c1a1, U5b1c2, U5b1c2a, U5b1c2b | 72 | 8.2 | |
| U5a1 | U5a1a2 | U5a1a2, U5a1a2a, U5a1a2a1, U5a1a2a1a, U5a1a2b1 | 26 | 3 | |
| U5a1 | U5a1h | U5a1h | 7 | 0.8 | |
| 1 | |||||
| U5a1 | U5a1d2 | U5a1d2a, U5a1d2a1, U5a1d2b | 18 | 2.1 | |
| U5a1 | U5a1c | U5a1c | 28 | 3.2 | |
| U5a1 | U5a1a1 | U5a1a1, U5a1a1a, U5a1a1b, U5a1a1c, U5a1a1d, U5a1a1h, U5a1a1i | 89 | 10.2 | |
| U5b2 | U5b2a | U5b2a, U5b2a1b, U5b2a3, U5b2a3a, U5b2a4, U5b2a4a, U5b2a5, U5b2a5a, U5b2a6 | 25 | 2.9 | |
| U5b2 | U5b2, U5b2c | U5b2, U5b2c1, U5b2c2, U5b2c2b | 11 | 1.3 | |
| U5b2 | U5b2a2 | U5b2a2, U5b2a2a1, U5b2a2b, U5b2a2b1, U5a2a2c | 29 | 3.3 | |
| U5b2 | U5b2b | U5b2b, U5b2b2, U5b2b3a1a, U5b2b4, U5b2b4a, U5b2b5 | 19 | 2.2 | |
| U5b2 | U5b2b1 | U5b2b1a, U5b2b1a1, U5b2a1a2, U5b2b1b | 10 | 1.1 | |
| U5b1 + U5b3 | U5b1b1a | U5b1b1a, U5b1b1a1, U5b1b1a1a, U5b1b1a1a1, U5b1b1a1b, U5b1b1a2, U5b1b1a3 | 83 | 9.3 | |
| U5b1 + U5b3 | U5b1b1 | U5b1b1, U5b1b1 + 152C!, U5b1b1b, U5b1b1d, U5b1b1e, U5b1b1f, U5b1b1g1, U5b1b1g1a | 39 | 4.5 | |
| U5b2 | U5b2a1a + 16311 T! | U5b2a1a + 16311 T!, U5b2a1a1, U5b2a1a1a, U5b2a1a1d | 32 | 3.7 | |
| U5b2 | U5b2a1a2 | U5b2a1a2 | 5 | 0.6 | |
| U5a2 | U5a2a | U5a2a, U5a2a1, U5a2a1 + 152C!, U5a2a1a, U5a2a1b, U5a2a1b1, U5a2a1c, U5a2a1e | 70 | 8 | |
| U5a2 | U5a2a2a | U5a2a2a | 8 | 0.9 | |
| U5b1 + U5b3 | U5b1e1 | U5b1e1, U5b1e1a | 25 | 2.9 | |
| U5b1 + U5b3 | U5b1e1 (+ T8337C) | U5b1e1 (+ T8337C) | 6 | 0.7 |
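As a quick arithmetic check on the table above, the n column can be summed within each of the four major subhaplogroups (U5a1, U5a2, U5b1 + U5b3, U5b2) and each row's share recomputed against the grand total. The Python sketch below transcribes the 23 legible rows (the row that survives only as `| 1 |` is left out because its values are not recoverable); the helper names `totals_by_major` and `percent` are illustrative, not from the paper.

```python
from collections import Counter

# (major subhaplogroup, representative subclade(s), n) transcribed from the table above.
# The row that survives only as "| 1 |" is omitted because its values are not recoverable.
ROWS = [
    ("U5a2", "U5a2, U5a2b, U5a2c, U5a2d", 90),
    ("U5a2", "U5a2e", 11),
    ("U5a1", "U5a1, U5a1g, U5a1i", 128),
    ("U5b1 + U5b3", "U5b1, U5b1a, U5b1d, U5b1f, U5b1i, U5b3", 44),
    ("U5b1 + U5b3", "U5b1 + 16189C!, U5b1b, U5b1c", 72),
    ("U5a1", "U5a1a2", 26),
    ("U5a1", "U5a1h", 7),
    ("U5a1", "U5a1d2", 18),
    ("U5a1", "U5a1c", 28),
    ("U5a1", "U5a1a1", 89),
    ("U5b2", "U5b2a", 25),
    ("U5b2", "U5b2, U5b2c", 11),
    ("U5b2", "U5b2a2", 29),
    ("U5b2", "U5b2b", 19),
    ("U5b2", "U5b2b1", 10),
    ("U5b1 + U5b3", "U5b1b1a", 83),
    ("U5b1 + U5b3", "U5b1b1", 39),
    ("U5b2", "U5b2a1a + 16311 T!", 32),
    ("U5b2", "U5b2a1a2", 5),
    ("U5a2", "U5a2a", 70),
    ("U5a2", "U5a2a2a", 8),
    ("U5b1 + U5b3", "U5b1e1", 25),
    ("U5b1 + U5b3", "U5b1e1 (+ T8337C)", 6),
]


def totals_by_major(rows):
    """Sum sequence counts within each major subhaplogroup; return (Counter, grand total)."""
    counts = Counter()
    for major, _representative, n in rows:
        counts[major] += n
    return counts, sum(counts.values())


def percent(n, total, digits=1):
    """Share of all sequences, rounded to one decimal place like the % column."""
    return round(100 * n / total, digits)


counts, total = totals_by_major(ROWS)
for major, n in sorted(counts.items()):
    print(f"{major}: n = {n} ({percent(n, total)}% of {total})")
```

With these rows the totals come to U5a1 = 296, U5a2 = 179, U5b1 + U5b3 = 269 and U5b2 = 131, or 875 sequences in all. Against that total the printed % column is reproduced for every row except U5b1b1a, where 83/875 rounds to 9.5 rather than the printed 9.3.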
Shared mtDNA polymorphisms per hierBAPS group (a)
| Position | Location | I | II | III | IV | V | VI | VII | IX | X | XI | XII | XIII | XIV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Representative subclade(s) | U5a2, U5a2b, U5a2c, U5a2d | U5a2e | U5a1, U5a1g, U5a1i | U5b1, U5b1a, U5b1d, U5b1f, U5b1i, U5b3 | U5b1 + T16189C!, U5b1b, U5b1c | U5a1a2 | U5a1h | U5a1d | U5a1c | U5a1a1 | U5b2a | U5b2, U5b2c | U5b2a2 |
| | Number of shared mitochondrial polymorphisms within each U5 hierBAPS group (n) | 19 | 21 | 16 | 18 | 19 | 29 | 45 | 20 | 28 | 21 | 27 | 33 | 35 |
| 146 | HVS-II | T | T | T | T | T | T | T | |||||||
| 150 | HVS-II | T | T | C | T | T | T | ||||||||
| 151 | HVS-II | T | C | ||||||||||||
| 152 | HVS-II | C! | T | T | T | T | T | ||||||||
| 195 | HVS-II | T | T | T | T | T | |||||||||
| 247 | HVS-II | G | G | G | G | G | G | G | G | G | G | G | |||
| 523 | HVS-III | A | A | A | |||||||||||
| 524 | HVS-III | C | C | C | |||||||||||
| 769 | 12S_rRNA | G | G | G | G | G | G | G | G | G | G | G | G | G | |
| 1303 | 12S_rRNA | A | G | ||||||||||||
| 1700 | 16S_rRNA | C | T | C | |||||||||||
| 1721 | 16S_rRNA | C | T | T | T | ||||||||||
| 2757 | 16S_rRNA | A | |||||||||||||
| 3027 | 16S_rRNA | T | C | ||||||||||||
| 3107 | preserves historical genome annotation numbering | d | d | - | d | ||||||||||
| 3192 | 16S_rRNA | T | C | ||||||||||||
| 3197 | 16S_rRNA | C | C | C | C | C | C | T | C | C | C | C | C | C | |
| 3212 | 16S_rRNA | C | T | ||||||||||||
| 3552 | ND1 (Ala—3rd position in codon) | T | C | ||||||||||||
| 3591 | ND1 (Leu—3rd position in codon) | A | G | ||||||||||||
| 3768 | ND1 (Leu—3rd position in codon) | G | A | ||||||||||||
| 4592 | ND2 (Ser—3rd position in codon) | C | T | ||||||||||||
| 4732 | ND2 (Asn—2nd position in codon) | A | G | G | |||||||||||
| 5452 | ND2 (Thr—2nd position in codon) | C | |||||||||||||
| 5495 | ND2 (Phe—3rd position in codon) | T | C | ||||||||||||
| 5656 | position between tRNA-Ala and tRNA-Asn | G | A | ||||||||||||
| 7146 | CO1 (Ala—1st position in codon) | A | A | A | A | A | A | A | A | A | A | A | A | ||
| 7256 | CO1 (Asn—3rd position in codon) | C | C | C | C | C | C | C | C | C | C | C | C | C | |
| 7521 | tRNA-Asp | G | G | G | G | G | G | G | G | G | G | G | G | G | |
| 7768 | CO2 (Met—3rd position in codon) | G | G | A | G | G | G | ||||||||
| 7853 | CO2 (Val—1st position in codon) | G | |||||||||||||
| 8337 | tRNA-Lys | T | |||||||||||||
| 8701 | ATP6 (Ala—1st position codon) | A | A | A | A | A | A | A | A | A | A | A | A | A | |
| 8705 | ATP6 (Met—2nd position codon) | T | |||||||||||||
| 9477 | CO3 (Val—1st position codon) | A | A | A | A | A | A | A | G | A | A | A | A | A | |
| 9540 | CO3 (Leu—1st position codon) | T | T | T | T | T | T | T | T | T | T | T | T | T | T |
| 10,283 | ND3 (Leu—3rd position codon) | A | |||||||||||||
| 10,398 | ND3 (Ala—1st position codon) | A | A | A | A | A | A | A | A | A | A | A | A | A | |
| 10,810 | ND4 (Leu—3rd position codon) | T | T | T | T | T | T | T | T | T | T | T | T | T | |
| 10,873 | ND4 (Pro—3rd position codon) | T | T | T | T | T | T | T | T | T | T | T | T | T | |
| 10,915 | ND4 (Cys—3rd position codon) | T | T | T | T | T | T | C | T | T | T | T | T | T | |
| 10,927 | ND4 (Phe—3rd position codon) | T | |||||||||||||
| 11,296 | ND4 (Leu—3rd position codon) | T | C | ||||||||||||
| 11,653 | ND4 (Val—3rd position codon) | A | |||||||||||||
| 11,914 | ND4 (Thr—3rd position codon) | G | G | G | G | G | G | G | G | G | G | G | |||
| 11,938 | ND4 (Leu—3rd position codon) | T | C | ||||||||||||
| 12,308 | tRNA-Leu | G | G | G | G | G | A | G | G | G | G | G | G | ||
| 12,346 | ND5 (His—1st position codon) | T | C | ||||||||||||
| 12,372 | ND5 (Leu—3rd position codon) | A | A | A | A | A | A | G | A | A | A | A | A | A | |
| 12,406 | ND5 (Val—1st position codon) | G | |||||||||||||
| 12,616 | ND5 (Leu—1st position codon) | T | |||||||||||||
| 12,618 | ND5 (Leu—3rd position codon) | A | G | ||||||||||||
| 12,634 | ND5 (Ile—1st position codon) | A | |||||||||||||
| 12,705 | ND5 (Ile—3rd position codon) | C | |||||||||||||
| 13,105 | ND5 (Val—1st position codon) | A | A | A | A | A | A | A | A | A | A | A | A | ||
| 13,145 | ND5 (Ser—2nd position codon) | G | |||||||||||||
| 13,276 | ND5 (Val—2nd position codon) | A | |||||||||||||
| 13,617 | ND5 (Ile—3rd position codon) | C | C | C | C | C | T | C | C | C | C | C | C | ||
| 13,630 | ND5 (Thr—1st position codon) | A | |||||||||||||
| 13,637 | ND5 (Gln—2nd position codon) | A | G | C | G | ||||||||||
| 14,182 | ND6 (Val—1st position codon) | C | T | C | C | ||||||||||
| 14,518 | ND6 (Gly—1st position codon) | A | |||||||||||||
| 14,793 | CYB (His—2nd position codon) | G | G | G | G | A | G | G | |||||||
| 15,218 | CYB (Thr—1st position codon) | G | G | A | G | G | |||||||||
| 15,497 | CYB (Gly—1st position codon) | G | |||||||||||||
| 15,511 | CYB (Asn—3rd position codon) | T | |||||||||||||
| 15,924 | tRNA-Thr | A | |||||||||||||
| 16,114 | HVS-I | C | |||||||||||||
| 16,129 | HVS-I | G | G | G | G | G | G | G | |||||||
| 16,187 | HVS-I | C | C | C | C | C | C | C | |||||||
| 16,189 | HVS-I | T | T | C! | T!! | ||||||||||
| 16,192 | HVS-I | T | T | C | T | T | |||||||||
| 16,223 | HVS-I | C | C | C | C | C | T | C | C | C | |||||
| 16,230 | HVS-I | A | A | A | A | G | A | A | A | A | A | ||||
| 16,239 | HVS-I | T | C | ||||||||||||
| 16,256 | HVS-I | T | T | C | T | ||||||||||
| 16,270 | HVS-I | C | T | T | |||||||||||
| 16,278 | HVS-I | C | C | C | C | C | C | C | C | ||||||
| 16,294 | HVS-I | C | |||||||||||||
| 16,311 | HVS-I | C! | T | T | T | T | T | ||||||||
| 16,320 | HVS-I | C | T | ||||||||||||
| 16,362 | HVS-I | T | |||||||||||||
| 16,398 | HVS-I | G | A | ||||||||||||
| 16,399 | HVS-I | G | G | A | G | ||||||||||
| 16,465 | HVS-I | C | |||||||||||||
| 16,519 | HVS-I | T | T | ||||||||||||
(a) The ancestral state is represented by the RSRS. Blank cells indicate that the nucleotide position was not a factor in determining the hierBAPS group. All hierBAPS groups also contain the following mutations: 825T, 1018G, 2758G, 2885T, 3594C, 4104A, 4312C, 8468C, 8655C, 10664C, 10688G, 11467G, 12705C, 13276A, 13506C, 13650C. Mutations are reckoned in the forward evolutionary direction relative to the RSRS. For a transversion, the derived allele is shown in lowercase instead of uppercase. An exclamation mark signifies a back mutation to the ancestral (RSRS) state: (!) for a single back mutation and (!!) for a double back mutation. Yellow-colored boxes indicate mutations that are diagnostic for a particular haplogroup or subclade according to Phylotree. ATP, ATP synthase; CO, cytochrome c oxidase; CYB, cytochrome b.
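The mutation notation described in this footnote is regular enough to parse mechanically. The sketch below is a hypothetical helper (`parse_mutation` and the `Mutation` record are not part of the paper or of Phylotree's tooling) that assumes labels of the form `16189C!`, `T8337C` or `16,311 T!`: an optional ancestral base, a position (commas and stray spaces tolerated), the derived base (lowercase marks a transversion) and zero, one or two exclamation marks for back mutations. Deletions such as the `d` entries at position 3107 are deliberately not handled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Optional ancestral base, position (digits with optional thousands commas),
# optional whitespace, derived base, then up to two "!" back-mutation marks.
_PATTERN = re.compile(r"^([ACGT])?(\d[\d,]*)\s*([ACGTacgt])(!{0,2})$")


@dataclass(frozen=True)
class Mutation:
    position: int
    ancestral: Optional[str]  # only present in forms such as "T8337C"
    derived: str              # normalised to uppercase
    transversion: bool        # a lowercase derived allele marks a transversion
    back_mutations: int       # 0, 1 ("!") or 2 ("!!")


def parse_mutation(token: str) -> Mutation:
    """Parse one mutation label written relative to the RSRS (hypothetical helper)."""
    match = _PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {token!r}")
    ancestral, position, derived, bangs = match.groups()
    return Mutation(
        position=int(position.replace(",", "")),
        ancestral=ancestral,
        derived=derived.upper(),
        transversion=derived.islower(),
        back_mutations=len(bangs),
    )


print(parse_mutation("T8337C"))
# Mutation(position=8337, ancestral='T', derived='C', transversion=False, back_mutations=0)
```

Keeping the back-mutation count as an integer rather than a flag mirrors the footnote's distinction between (!) and (!!); anything outside the assumed grammar raises `ValueError` instead of being silently misread.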
Fig. 1 A phylogenetic tree with 23 hierBAPS groups of haplogroup U5 mitogenome sequences. Roman numerals denote the hierBAPS subclades. The hierBAPS subclades were superimposed on a phylogenetic tree generated by maximum likelihood analysis to help visualize the phylogenetic relationships among the sequences. Yellow represents subhaplogroup U5a and blue represents subhaplogroup U5b. RSRS, Reconstructed Sapiens Reference Sequence.
Fig. 2 A phylogenetic tree of haplogroup U5 mitogenome sequences. hierBAPS groups are separated by light blue shading. Roman numerals denote the hierBAPS subclades and their representative Phylotree-based subhaplogroups. The geographic regions are defined as follows: Africa (Burkina Faso, Berber, Fulbe, Fulani), Western Europe (Ireland, Germany, United Kingdom), Southern Europe (France, Italy, Spain, Sardinia), Scandinavia (Denmark, Norway, Sweden), Finland, Saami (includes Saami from Scandinavia and Finland), Central Europe (Czech Republic, Hungary (Roma), Poland, Serbia, Slovenia, Slovakia), Eastern Europe (Baltic, Belarus, Caucasus, Russia), and Asia (India, Iran). Sequences of unknown origin are shown in grey.
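The region scheme in this caption amounts to a lookup from country or population label to region, with Saami treated as a region of their own regardless of country and anything unmatched falling into the grey "unknown" category. A minimal Python sketch, assuming labels match the caption exactly; `region_of` and its `is_saami` flag are hypothetical names, not from the paper.

```python
from typing import Optional

# Region definitions transcribed from the Fig. 2 caption. Labels are matched exactly,
# so spelling variants in real sample metadata would need normalising first.
REGIONS = {
    "Africa": ["Burkina Faso", "Berber", "Fulbe", "Fulani"],
    "Western Europe": ["Ireland", "Germany", "United Kingdom"],
    "Southern Europe": ["France", "Italy", "Spain", "Sardinia"],
    "Scandinavia": ["Denmark", "Norway", "Sweden"],
    "Finland": ["Finland"],
    "Central Europe": ["Czech Republic", "Hungary (Roma)", "Poland", "Serbia", "Slovenia", "Slovakia"],
    "Eastern Europe": ["Baltic", "Belarus", "Caucasus", "Russia"],
    "Asia": ["India", "Iran"],
}

# Invert the table once so each lookup is a single dictionary access.
COUNTRY_TO_REGION = {label: region for region, labels in REGIONS.items() for label in labels}


def region_of(origin: Optional[str], is_saami: bool = False) -> str:
    """Assign a sequence to a Fig. 2 region (hypothetical helper)."""
    if is_saami:
        return "Saami"  # Saami from Scandinavia and Finland form their own region
    if origin is None:
        return "Unknown"  # drawn in grey in Fig. 2
    return COUNTRY_TO_REGION.get(origin, "Unknown")


print(region_of("Sardinia"))  # Southern Europe
```

Checking the Saami flag before the country lookup is what lets a Norwegian or Finnish Saami sequence land in the Saami category rather than in Scandinavia or Finland.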
Fig. 3 Phylogeny of hierBAPS subclade XVII (subhaplogroup U5b1b1a) within the haplogroup U5 phylogenetic tree. The phylogeny shows detailed branching for sequences by country or ethnic origin. Time estimates (kya) are shown for mtDNA subhaplogroups (see Table S6). Blank ages indicate that the confidence intervals (CIs) extend to the present day. For clusters older than 200 years (encircled with a black border), the estimated rate is based on calibrated ages in years before present (BP) provided by the literature. The size of each circle is proportional to the number of sequences of the same subhaplogroup, with the smallest size corresponding to one sequence. Colors indicate geographic region as in Fig. 2: Western Europe (dark blue), Southern Europe (orange), Scandinavia (light blue), Finland (magenta), Saami (lilac), Central Europe (fluorescent green), Eastern Europe (salmon), Asia (mustard).
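The caption does not name the dating method (the details are in Table S6). One common approach for dating mtDNA clusters is the ρ (rho) statistic: the mean number of mutations separating a cluster's founder haplotype from each sampled descendant, multiplied by the number of years per mutation. Reading the caption's calibration step as deriving years per mutation from a cluster whose age is taken from the literature, a minimal sketch assuming ρ and a linear clock looks like this. The mutation counts and the 6,000-year calibration age are made-up placeholders, and the standard error that real analyses report alongside ρ is omitted.

```python
from statistics import mean


def rho(mutation_counts):
    """Mean number of mutations from a cluster's founder haplotype to each sampled sequence."""
    if not mutation_counts:
        raise ValueError("a cluster needs at least one sequence")
    return mean(mutation_counts)


def calibrate_rate(mutation_counts, calibrated_age_years):
    """Years per mutation implied by a cluster whose age (years BP) comes from the literature."""
    return calibrated_age_years / rho(mutation_counts)


def estimate_age_kya(mutation_counts, years_per_mutation):
    """Linear-clock age of a cluster in thousands of years (kya)."""
    return rho(mutation_counts) * years_per_mutation / 1000


# Placeholder numbers, not data from the paper: each list holds, per sampled sequence,
# the number of mutations separating it from its cluster's founder.
years_per_mutation = calibrate_rate([1, 3], calibrated_age_years=6000)  # 3000.0
print(estimate_age_kya([3, 4, 2, 3], years_per_mutation))  # 9.0
```

Here the hypothetical calibration cluster (ρ = 2, dated to 6,000 years) implies 3,000 years per mutation, so a cluster with ρ = 3 is dated to about 9.0 kya under the linear-clock assumption.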
Fig. 4 Phylogeny of hierBAPS subclade III (subhaplogroup U5a/U5a1) within the U5 phylogenetic tree. The phylogeny shows detailed branching for each sequence by country or ethnic origin. Time estimates are given in kya (Table S7). Blank ages indicate confidence intervals (CIs) that extend to the present day. For clusters older than 200 years (encircled with a black border), the estimated rate is based on calibrated ages in years before present (calBP) provided by the literature. The size of each circle is proportional to the number of sequences of the same subhaplogroup, with the smallest size corresponding to one sequence. Colors indicate geographic region as in Fig. 2: Western Europe (dark blue), Southern Europe (orange), Scandinavia (light blue), Finland (magenta), Saami (lilac), Central Europe (fluorescent green), Eastern Europe (salmon), Asia (mustard).
Fig. 5 The frequency of haplogroup U5 mtDNAs in global populations based on the literature (see Table S3). The proportions of each subhaplogroup are listed, based on the four major hierBAPS groups from FamilyTreeDNA's U5 project. The sample sizes for each data set were as follows: Western Europe (n = 537), Scandinavia (n = 397), Saami (n = 78), Finland (n = 344), Southern Europe (n = 124), Central Europe (n = 166), and Eastern Europe (n = 157). Countries within Asia (n = 11) and Africa (n = 4) were combined due to their small sample sizes; the low frequency of U5 mtDNAs in these regions is supported by the literature (Table S3).
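The pooling rule in this caption, where regions with too few sequences are merged into one category, can be written as a threshold on sample size. In the Python sketch below the sample sizes are those listed in the caption, while the `min_n = 20` cut-off and the `pool_small_regions` helper are assumptions for illustration; the paper does not state a numeric threshold.

```python
# Sample sizes as listed in the Fig. 5 caption.
SAMPLE_SIZES = {
    "Western Europe": 537,
    "Scandinavia": 397,
    "Saami": 78,
    "Finland": 344,
    "Southern Europe": 124,
    "Central Europe": 166,
    "Eastern Europe": 157,
    "Asia": 11,
    "Africa": 4,
}


def pool_small_regions(sizes, min_n=20):
    """Merge every region with fewer than min_n sequences into one combined category.

    min_n = 20 is an illustrative assumption; the paper gives no numeric cut-off.
    """
    kept = {region: n for region, n in sizes.items() if n >= min_n}
    small = [region for region, n in sizes.items() if n < min_n]
    if small:
        # Name the pooled category after its members, in their original order.
        kept[" + ".join(small)] = sum(sizes[region] for region in small)
    return kept


print(pool_small_regions(SAMPLE_SIZES)["Asia + Africa"])  # 15
```

With that cut-off only Asia (n = 11) and Africa (n = 4) fall below the threshold, giving a combined "Asia + Africa" category of 15 sequences while the seven larger regions are left unchanged.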