Rui Martiniano, Bianca De Sanctis, Pille Hallast, Richard Durbin.
Abstract
Joint phylogenetic analysis of ancient DNA (aDNA) with modern phylogenies is hampered by low sequence coverage and post-mortem deamination, which often result in overconservative or incorrect assignments. We provide a new, efficient likelihood-based workflow, pathPhynder, that takes advantage of all the polymorphic sites in the target sequence. It evaluates the number of ancestral and derived alleles present on each branch and reports the most likely placement of an ancient sample in the phylogeny and a haplogroup assignment, together with alternatives and supporting evidence. To illustrate the application of pathPhynder, we show improved Y chromosome assignments for published aDNA sequences, using a newly compiled Y variation data set (120,908 markers from 2,014 samples) that significantly enhances Y haplogroup assignment for low-coverage samples. We apply the method to all published male aDNA samples from Africa, giving new insights into ancient migrations and the relationships between ancient and modern populations. The same software can be used to place samples with large amounts of missing data into other large non-recombining phylogenies, such as the mitochondrial tree.
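The idea of counting ancestral and derived alleles per branch can be illustrated with a small Python sketch. Everything in it is hypothetical: the five-node tree `PARENT`, the `MARKERS` list and the `max_conflicts` stopping rule are assumptions for illustration. The walk is a simplified stand-in for pathPhynder's best-path procedure, not its actual implementation. Each marker belongs to one branch. A read base equal to the derived allele counts as support, and one equal to the ancestral allele counts as conflict. The sample is placed on the deepest supported branch of the root-to-tip path with the most support.

```python
from collections import defaultdict

# Hypothetical toy tree (child -> parent). Each non-root node names the branch above it.
PARENT = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}

# Hypothetical markers: (branch, ancestral allele, derived allele, observed base).
MARKERS = [
    ("A", "C", "T", "T"),
    ("A", "G", "A", "A"),
    ("A1", "C", "G", "G"),
    ("A2", "T", "C", "T"),  # ancestral base on A2: conflict on that path
    ("B", "A", "G", "A"),   # ancestral base on B: conflict on that path
]


def tally(markers):
    """Per branch, count derived (support) and ancestral (conflict) observations."""
    counts = defaultdict(lambda: [0, 0])  # branch -> [support, conflict]
    for branch, anc, der, obs in markers:
        if obs == der:
            counts[branch][0] += 1
        elif obs == anc:
            counts[branch][1] += 1
        # any other base is uninformative and ignored
    return counts


def path_from_root(node):
    """Branches on the path from the root down to `node`, root-most first."""
    path = []
    while PARENT[node] is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def best_path(markers, max_conflicts=1):
    """Simplified best-path placement; `max_conflicts` is a hypothetical stopping rule."""
    counts = tally(markers)
    tips = [n for n in PARENT if n not in PARENT.values()]
    # Choose the root-to-tip path carrying the most derived (supporting) markers.
    best_tip = max(tips, key=lambda t: sum(counts[b][0] for b in path_from_root(t)))
    path = path_from_root(best_tip)
    placement = None
    for branch in path:
        support, conflict = counts[branch]
        if conflict > max_conflicts:
            break  # too much ancestral evidence on this branch: stop descending
        if support > 0:
            placement = branch  # deepest branch with derived support so far
    if placement is None:
        return None
    above = path[: path.index(placement)]
    s_above = sum(counts[b][0] for b in above)
    c_above = sum(counts[b][1] for b in above)
    return placement, (s_above, c_above), tuple(counts[placement])


if __name__ == "__main__":
    placement, (s_up, c_up), (s_on, c_on) = best_path(MARKERS)
    print(f"{placement} [{s_up}-{c_up}; {s_on}-{c_on}]")  # prints "A1 [2-0; 1-0]"
```

Here `best_path(MARKERS)` returns `("A1", (2, 0), (1, 0))`. That means two supporting markers and no conflicts above branch A1, plus one supporting marker on it. This mirrors the `[support-conflict above; support-conflict on branch]` notation used in Fig. 1C.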
Keywords: Y chromosome haplogroups; ancient DNA; phylogenetic placement
Year: 2022 PMID: 35084493 PMCID: PMC8857924 DOI: 10.1093/molbev/msac017
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Overview of pathPhynder workflow. We illustrate the method using a small simulated data set of 6 reference samples and 112 SNPs. (A) The initial step is the assignment of phylogenetically informative SNPs in the reference data set to branches. This can be achieved with phynder by estimating the likelihood of each SNP at any given branch of the tree. (B) A pileup from aDNA reads is generated at each SNP, then filtered for mismatches and potential deamination. Here, because SNP3 is defined by alleles G and C, the T base is excluded as likely to be caused by post-mortem deamination. (C) Best path method: aDNA sample genotypes for each SNP are assigned to the corresponding branch of the tree and binned into support and conflict categories. In this case the best path is supported by 56 derived markers (green), of which 55 are above the assigned branch and one is on the branch, with no conflicting markers along the chosen path [55-0; 1-0]. (D) Maximum likelihood method: the likelihoods for placing the query sample on each edge of the tree are converted to posterior probabilities using Bayes’ rule and branches with posterior probability greater than 0.01 are indicated (largest posterior in green). The blue circle shows the lowest branch in the tree for which the sum of posterior probabilities for the whole clade below that branch (including the branch in question) is greater than 0.99, providing a conservative assignment when placement is uncertain. The arrows point to the correct location for the query sample.
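Panels B and D can be sketched in Python as well. This is a simplified illustration under stated assumptions, not the actual likelihood used by phynder or pathPhynder. The toy tree and pileups are hypothetical. It assumes a symmetric per-site error rate `eps`, a uniform prior over edges, and it treats markers on the placement edge itself as derived. Bases matching neither allele are discarded. Optionally, T reads at C/T sites and A reads at G/A sites are also dropped as possible deamination. Per-edge log-likelihoods are normalized into posteriors (Bayes' rule with a flat prior). The conservative call is the deepest node whose clade holds more than 0.99 of the posterior mass; the 0.01 and 0.99 thresholds follow the caption.

```python
import math
from collections import Counter

# Hypothetical toy tree (child -> parent). Each non-root node names the branch above it.
PARENT = {"root": None, "A": "root", "B": "root", "A1": "A", "A2": "A"}

# Hypothetical pileups: (branch, ancestral allele, derived allele, read bases).
PILEUPS = [
    ("A", "C", "T", ["T", "T"]),  # C/T site: T reads could be C->T damage
    ("A", "G", "C", ["C", "T"]),  # T matches neither allele: mismatch
    ("A1", "A", "G", ["G"]),
    ("B", "A", "C", ["A"]),
]


def is_deamination_candidate(base, anc, der):
    """True if `base` could be C->T (or G->A on the other strand) damage at this SNP."""
    alleles = {anc, der}
    return (alleles == {"C", "T"} and base == "T") or (alleles == {"G", "A"} and base == "A")


def filter_pileup(bases, anc, der, drop_transitions=True):
    """Keep bases matching one SNP allele; optionally drop possible deamination."""
    kept = []
    for b in bases:
        if b not in (anc, der):
            continue  # mismatch, like the T at the G/C site in panel B
        if drop_transitions and is_deamination_candidate(b, anc, der):
            continue
        kept.append(b)
    return kept


def call_markers(pileups, drop_transitions=True):
    """Majority-vote call per SNP -> list of (branch, is_derived)."""
    calls = []
    for branch, anc, der, bases in pileups:
        kept = filter_pileup(bases, anc, der, drop_transitions)
        if kept:  # SNPs with no usable reads are skipped
            calls.append((branch, Counter(kept).most_common(1)[0][0] == der))
    return calls


def ancestors_or_self(node):
    """Set containing `node` and every node above it, up to and including the root."""
    out = set()
    while node is not None:
        out.add(node)
        node = PARENT[node]
    return out


def posteriors(calls, eps=0.01):
    """Posterior over edges under a uniform prior and a symmetric error rate `eps`."""
    edges = [n for n in PARENT if PARENT[n] is not None]
    lls = {}
    for e in edges:
        above = ancestors_or_self(e)
        # Derived is expected iff the marker's branch lies on the root-to-edge path
        # (markers on the edge itself are treated as derived: a simplification).
        lls[e] = sum(math.log(1 - eps) if d == (b in above) else math.log(eps) for b, d in calls)
    top = max(lls.values())
    w = {e: math.exp(ll - top) for e, ll in lls.items()}  # subtract max for stability
    z = sum(w.values())
    return {e: v / z for e, v in w.items()}


def conservative_assignment(post, threshold=0.99):
    """Deepest node whose clade (node plus descendants) has posterior mass > threshold."""
    best, best_depth = "root", 0
    for node in post:
        mass = sum(p for e, p in post.items() if node in ancestors_or_self(e))
        depth = len(ancestors_or_self(node)) - 1
        if mass > threshold and depth > best_depth:
            best, best_depth = node, depth
    return best


if __name__ == "__main__":
    post = posteriors(call_markers(PILEUPS))
    for edge, p in sorted(post.items(), key=lambda kv: -kv[1]):
        if p > 0.01:
            print(f"{edge}: {p:.3f}")
    print("conservative:", conservative_assignment(post))
```

With these inputs the highest posterior (about 0.98) falls on A1. The conservative call is its parent branch A, whose clade holds more than 0.99 of the mass, which is the kind of fallback the blue circle in panel D represents. The transition filter trades sensitivity for robustness. In this example it discards the first marker entirely, even though its T reads may be genuine.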
Fig. 2. Improvement of Y-chromosome lineage resolution for 52 low-coverage samples assigned to higher-level branches in the literature. Blue crosses: published assignments. Orange crosses: reassignments by pathPhynder, including ISOGG haplogroup. The phylogenetic tree (inset) provides an example of this process for sample ASH008 (Feldman ).
Fig. 3. pathPhynder placement of ancient African samples into the Y-chromosome phylogeny. (A) A and B lineages, which are carried mostly by present-day San, Mbuti, and Biaka Pygmy populations and by ancient hunter-gatherer groups. (B) E1b1b1a1 lineages carried by Moroccan Iberomaurusian-period samples and one Jordanian PPNB individual. (C) E1b1b1b1 lineages mostly present in Algerian Mozabite populations and shared with Moroccan Early Neolithic samples. (D) E1b1b1b2 lineages present in PN samples from East Africa and in Levantine Natufians, to whom they are ancestrally related.