Pille Hallast, Chiara Batini, Daniel Zadik, Pierpaolo Maisano Delser, Jon H Wetton, Eduardo Arroyo-Pardo, Gianpiero L Cavalleri, Peter de Knijff, Giovanni Destro Bisol, Berit Myhre Dupuy, Heidi A Eriksen, Lynn B Jorde, Turi E King, Maarten H Larmuseau, Adolfo López de Munain, Ana M López-Parra, Aphrodite Loutradis, Jelena Milasin, Andrea Novelletto, Horolma Pamjav, Antti Sajantila, Werner Schempp, Matt Sears, Aslıhan Tolun, Chris Tyler-Smith, Anneleen Van Geystelen, Scott Watkins, Bruce Winney, Mark A Jobling.
Abstract
Many studies of human populations have used the male-specific region of the Y chromosome (MSY) as a marker, but MSY sequence variants have traditionally been subject to ascertainment bias. Also, dating of haplogroups has relied on Y-specific short tandem repeats (STRs), involving problems of mutation rate choice, and possible long-term mutation saturation. Next-generation sequencing can ascertain single nucleotide polymorphisms (SNPs) in an unbiased way, leading to phylogenies in which branch-lengths are proportional to time, and allowing the times-to-most-recent-common-ancestor (TMRCAs) of nodes to be estimated directly. Here we describe the sequencing of 3.7 Mb of MSY in each of 448 human males at a mean coverage of 51×, yielding 13,261 high-confidence SNPs, 65.9% of which are previously unreported. The resulting phylogeny covers the majority of the known clades, provides date estimates of nodes, and constitutes a robust evolutionary framework for analyzing the history of other classes of mutation. Different clades within the tree show subtle but significant differences in branch lengths to the root. We also apply a set of 23 Y-STRs to the same samples, allowing SNP- and STR-based diversity and TMRCA estimates to be systematically compared. Ongoing purifying selection is suggested by our analysis of the phylogenetic distribution of nonsynonymous variants in 15 MSY single-copy genes.
Keywords: Y-STRs; Y-chromosome phylogeny; purifying selection; single nucleotide polymorphisms; targeted resequencing
Year: 2014 PMID: 25468874 PMCID: PMC4327154 DOI: 10.1093/molbev/msu327
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
NGS Studies of Human Y-Chromosome Diversity.
| Study | Approach | Mb | Mean Read Depth | No. of Samples | Sample Choice | SNPs Imputed? | SNPs Found | Overlap with This Study (%) |
|---|---|---|---|---|---|---|---|---|
|  | WGS | Unclear | 1.8× | 77 | 4 HapMap populations | No (ML tree) | 2,870 | 635/13,261 |
|  | WGS | 8.97 | 28.4× | 36 | Various (Complete Genomics data set, plus hg A male) | Yes | 5,865 (+56 MNPs, 741 indels) | 1,776/13,261 (13.4) |
|  | WGS | 9.9 | Median 3.1× at var sites | 69 | 9 populations (7 from HGDP) | Yes | 11,640 | 2,420/13,261 (18.25) |
|  | WGS | 8.97 | 2.16× | 1,208 | Sardinian population | Yes | 11,763 (no singletons) | 2,229/13,261 (16.8) |
|  | SC | 1.50 | 50× | 68 | Phylogenetic | Not stated | 2,386 | 665/13,261 (5.01) |
| This study | SC | 3.7 | 51× | 448 | 19 pops. + phylogenetic | No | 13,261 | Novel: 8,742/13,261 (65.9) |
WGS, whole-genome sequence; SC, sequence capture; ML, maximum likelihood; HGDP, Human Genome Diversity Panel.
a Based on available file containing 2,788 variants in 75 individuals.
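
The overlap percentages in the table are plain ratios against the 13,261 SNPs found in this study. As a minimal sketch, the Python below recomputes them from the shared-SNP counts shown in the table rows; the helper name `overlap_percent` is illustrative and does not come from the paper.

```python
# Total high-confidence SNPs reported for this study (from the table).
TOTAL_SNPS = 13_261


def overlap_percent(shared: int, total: int = TOTAL_SNPS) -> float:
    """Return shared/total expressed as a percentage."""
    return 100.0 * shared / total


# Shared-SNP counts copied from the table rows, in table order.
shared_counts = [635, 1_776, 2_420, 2_229, 665]

for shared in shared_counts:
    print(f"{shared:>5}/{TOTAL_SNPS}: {overlap_percent(shared):.2f}%")

# SNPs not seen in the other studies, as in the "This study" row.
print(f"novel: {overlap_percent(8_742):.1f}%")
```

The same function also gives a percentage for the first row, which the table lists only as a raw count.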
Distribution of sequenced regions on the MSY. At the top is shown a schematic representation of the Y chromosome and the analyzed subregion, with the distribution of the ampliconic, X-transposed, X-degenerate, and heterochromatic regions indicated (Skaletsky et al. 2003). The graph shows read depth in sequenced regions (blue) and density of discovered SNPs (red). Target coordinates for bait design (bottom) are according to GRCh37. Also shown are the locations of single-copy MSY genes (Skaletsky et al. 2003; Bellott et al. 2014), as triangles pointing in the direction of transcription. TXLNGY (Putative gamma-taxilin 2) replaces the former CYorf15A and CYorf15B (Skaletsky et al. 2003).
Venn diagram showing overlap of SNPs between NGS studies of the MSY. The total number of independent SNPs across all five studies (this study plus Francalacci et al. [2013], Poznik et al. [2013], Scozzari et al. [2014], and Wei, Ayub, Chen, et al. [2013]) is 33,479.
Maximum-parsimony tree of MSY SNP haplotypes. (a) Major haplogroups are indicated by colors, and selected haplogroup-defining mutations are indicated on branches. Deep-rooting branches have been contracted for display. The colored bar to the right indicates population group of origin: ASC: Asia, Central; ASE: Asia, East; BRI: British Isles; SCA: Scandinavia; ENW: Europe, North West; ESW: Europe, South West; ESC: Europe, South Central; ESE: Europe, South East; MNE: Middle and Near East; MEX: Mexico; AUS: Australia; AFP: Africa, food-producers; AHG: Africa, hunter-gatherers. Supplementary figure S1, Supplementary Material online, gives tips labeled with individual sample names. (b) Simplified tree showing the true lengths for deep-rooting branches. Diagonal dashed lines indicate the positions of branch contractions in part (a).
TMRCA Estimates for Selected Clades within the Phylogeny.
| Clade | No. of Samples | TMRCA/ka | TMRCA Range Based on Mutation Rate CI/ka |
|---|---|---|---|
| Root | 448 | 125.8 | 50.3–419.5 |
| B-M182 | 14 | 45.6 | 18.2–152.0 |
| B2a-M150 | 2 | 16.6 | 6.7–55.5 |
| B2b-M112 | 12 | 38.1 | 15.2–127.1 |
| CR-P143 | 378 | 47.9 | 19.2–159.8 |
| C-M216 | 9 | 39.4 | 15.8–131.5 |
| DR-M168 | 427 | 48.7 | 19.5–162.4 |
| D-M174 | 5 | 34.3 | 13.7–114.4 |
| DE-M145 | 49 | 48.1 | 19.2–160.3 |
| E-P29 | 44 | 37.9 | 15.2–126.4 |
| E1b1a-M2 | 24 | 6.9 | 2.7–22.9 |
| E1b1b-M215 | 17 | 17.7 | 7.1–58.9 |
| FR-M213 | 369 | 35.2 | 14.1–117.2 |
| HF5 | 7 | 31.6 | 12.7–105.5 |
| G-M201 | 23 | 23.1 | 9.2–77.0 |
| G2a-L31 | 20 | 16.4 | 6.6–54.8 |
| H-M69 | 6 | 27.4 | 11.0–91.5 |
| I-M170 | 76 | 20.6 | 8.2–68.6 |
| I1-M253 | 46 | 3.5 | 1.4–11.5 |
| I2-P215 | 30 | 17.1 | 6.8–57.0 |
| J-M304 | 33 | 23.3 | 9.3–77.7 |
| IJ-P123 | 109 | 31.0 | 12.4–103.4 |
| J2-M172 | 28 | 21.1 | 8.4–70.3 |
| J2a-M410 | 18 | 15.2 | 6.1–50.8 |
| J2b-M102 | 10 | 11.3 | 4.5–37.8 |
| LT | 12 | 32.6 | 13.1–108.8 |
| L-M11 | 5 | 14.2 | 5.7–47.3 |
| T-M70 | 7 | 21.0 | 8.4–70.1 |
| M/LT/NO/QR-M9 | 230 | 32.6 | 13.0–108.6 |
| NO-M214 | 39 | 30.0 | 12.0–99.9 |
| N-M231 | 20 | 13.4 | 5.4–44.8 |
| N1c1-M178 | 15 | 4.6 | 1.8–15.2 |
| O-P191 | 19 | 25.6 | 10.2–85.3 |
| P-M45 | 179 | 24.2 | 9.7–80.6 |
| Q-M242 | 5 | 22.6 | 9.0–75.4 |
| R-M207 | 174 | 19.3 | 7.7–64.4 |
| R1a-M198 | 27 | 6.2 | 2.5–20.8 |
| R1b-L278 | 146 | 14.3 | 5.7–47.7 |
| R1b-M269 | 145 | 4.9 | 2.0–16.3 |
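
The range column tracks the point estimate closely: dividing each bound by the TMRCA gives almost the same pair of factors for every clade. That is what one would expect if the range comes from rescaling one set of dates by the two ends of a mutation-rate interval, since a date varies inversely with the assumed rate. The sketch below checks this on a few rows copied from the table; the names `rows` and `scaling_factors` are illustrative, and no mutation rate is assumed.

```python
# (clade, TMRCA/ka, lower bound/ka, upper bound/ka), copied from the table.
rows = [
    ("Root", 125.8, 50.3, 419.5),
    ("B-M182", 45.6, 18.2, 152.0),
    ("E1b1a-M2", 6.9, 2.7, 22.9),
    ("I1-M253", 3.5, 1.4, 11.5),
    ("R1b-M269", 4.9, 2.0, 16.3),
]


def scaling_factors(tmrca: float, lower: float, upper: float) -> tuple[float, float]:
    """Return the range bounds expressed as multiples of the point estimate."""
    return lower / tmrca, upper / tmrca


for clade, tmrca, lower, upper in rows:
    f_lower, f_upper = scaling_factors(tmrca, lower, upper)
    print(f"{clade:<10} lower = {f_lower:.2f} x TMRCA, upper = {f_upper:.2f} x TMRCA")
```

For these rows the factors stay near 0.4 and 3.3. The small deviations are in line with the table values being rounded to one decimal place.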
Relationship between SNP- and STR-based TMRCA estimates. SNP-based node estimates are plotted against STR-based estimates for (a) 21 STRs, (b) 17 STRs, and (c) 13 STRs, here using ASD with the “ancestral haplotype” root specification. The black dashed line in each case indicates x = y. Underlying data and correlation coefficients are given in supplementary tables S6 and S7, Supplementary Material online.
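
For readers unfamiliar with ASD (average squared distance), the following is a minimal sketch of an ASD-based TMRCA estimate against an ancestral haplotype. It assumes a symmetric single-step mutation model, under which the expected squared distance from the ancestral repeat count grows as mutation rate times generations. The haplotypes, the mutation rate, and the generation time below are hypothetical placeholders, not values from the paper, and the function names are illustrative.

```python
from statistics import mean


def asd_to_ancestor(haplotypes, ancestral):
    """Average squared distance, in repeat units, between sampled Y-STR
    haplotypes and an assumed ancestral haplotype, averaged over loci."""
    n_loci = len(ancestral)
    per_haplotype = []
    for hap in haplotypes:
        if len(hap) != n_loci:
            raise ValueError("haplotype length must match the ancestral haplotype")
        per_haplotype.append(mean((h - a) ** 2 for h, a in zip(hap, ancestral)))
    return mean(per_haplotype)


def asd_tmrca_years(asd, mu_per_locus_per_generation, years_per_generation):
    """Convert ASD to years, assuming E[ASD] = mu * generations."""
    return asd / mu_per_locus_per_generation * years_per_generation


# Hypothetical repeat counts at four loci (not data from the paper).
ancestral = [14, 12, 24, 11]
haplotypes = [
    [14, 12, 24, 11],
    [15, 12, 24, 11],
    [14, 12, 23, 11],
    [14, 13, 24, 12],
]

asd = asd_to_ancestor(haplotypes, ancestral)
# Hypothetical rate (per locus per generation) and generation time (years).
years = asd_tmrca_years(asd, mu_per_locus_per_generation=2.5e-3, years_per_generation=30)
print(f"ASD = {asd}, TMRCA estimate = {years:.0f} years")
```

The estimate scales inversely with the chosen STR mutation rate, which is the rate-choice problem the abstract refers to.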