K D Ahlquist, Mayra M Bañuelos, Alyssa Funk, Jiaying Lai, Stephen Rong, Fernando A Villanea, Kelsey E Witt.
Abstract
The archaic ancestry present in the human genome has captured the imagination of both scientists and the wider public in recent years. This excitement is the result of new studies pushing the envelope of what we can learn from the archaic genetic information that has survived for over 50,000 years in the human genome. Here, we review the most recent ten years of literature on the topic of archaic introgression, including the current state of knowledge on Neanderthal and Denisovan introgression, as well as introgression from other as-yet unidentified archaic populations. We focus this review on four topics: 1) a reimagining of human demographic history, including evidence for multiple admixture events between modern humans, Neanderthals, Denisovans, and other archaic populations; 2) state-of-the-art methods for detecting archaic ancestry in population-level genomic data; 3) how these novel methods can detect archaic introgression in modern African populations; and 4) the functional consequences of archaic gene variants, including how those variants were co-opted into novel function in modern human populations. The goal of this review is to provide a simple-to-access reference for the relevant methods and novel data, which have changed our understanding of the relationship between our species and its siblings. This body of literature reveals the large degree to which the genetic legacy of these extinct hominins has been integrated into the human populations of today.
Keywords: Denisovans; Neanderthals; archaic introgression; human evolution
Year: 2021 PMID: 34028527 PMCID: PMC8480178 DOI: 10.1093/gbe/evab115
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Summary of inferred periods of archaic introgression between anatomically modern humans and archaic humans. Time is represented vertically (but not to scale), with the present time on top, and deep time roughly corresponding to the Holocene-Late Pleistocene, Late Pleistocene, and Middle Pleistocene. Anatomically modern human populations are represented in blue, two Neanderthal populations in red, two Denisovan populations in green, and superarchaic in yellow (this represents one or more populations of hominin that may have contributed to the genome ancestry of modern humans). Possible deep structure in African populations is represented in purple. Horizontal lines indicate gene flow between two populations, but may represent single or multiple gene flow events between the same two populations. Arrows indicate the scientific source which postulates each introgression event. The star notes that ancient African substructure and superarchaic introgression were postulated as alternative hypotheses to explain the same data pattern. It should be noted that in cases where older scientific articles postulated introgression from a population which later came to be understood as separate populations, we assigned the introgression to a specific population, such as European and Siberian Neanderthals, and Oceanian and Siberian Denisovans.
Classification of different methods for genome-wide inference of archaic introgression. An underlying but unknown model (M) gives rise to a pattern of mating and reproduction that can be represented by a data structure, such as an ancestral recombination graph (A), for which we can observe genomic data (D). The information in D can further be simplified by calculating summary statistics (S). Our objective is to gain information about M. In practice, this objective is approached in a number of ways: (a) Comparison of genome variation using either archaic or unadmixed (usually African) reference genomes. (b) Using summary statistics (S) to compute the likelihood (L) under M, or the probability (P) of S under alternate models M0. Computing S from D summarizes salient information about A, from which it is possible to make inferences about M using null hypothesis testing (D-statistics), Maximum Likelihood Estimation (SFS, LD), or Bayesian inference (gene tree methods). (c) Attempting to infer the ARG (A) directly from D, or from simulations (ARGweaver, Relate, and tsinfer). (d) Using the ARG or simulated ARG to predict introgressed branch lengths. Predictions about coalescent wait times informed by A are used to classify genome segments (ARGweaver-D). (e) Simulating data summaries (S), which could be user-defined statistics (Approximate Bayesian Computation) or automatically learned summaries (Machine Learning). Here, many mappings of M to S are generated from the simulations, and used to learn an inverse mapping from S to M in empirical data. This applies to ABC (P(M|S)) or ML (f: S→M).
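As an illustration of approach (b), the null hypothesis test based on D-statistics can be sketched in a few lines. This is a minimal sketch, not the implementation used by any of the tools reviewed here. It assumes the common ABBA-BABA formulation over four populations, with per-site derived-allele frequencies supplied as plain lists. The function name `d_statistic` and the labels `p1`, `p2`, `p3`, and `outgroup` are hypothetical, and the toy frequencies are invented for the example.

```python
from typing import Sequence


def d_statistic(
    p1: Sequence[float],
    p2: Sequence[float],
    p3: Sequence[float],
    outgroup: Sequence[float],
) -> float:
    """Return an ABBA-BABA D-statistic from derived-allele frequencies.

    Assumed roles (a common convention, not taken from the source):
    p1 = unadmixed reference population, p2 = test population,
    p3 = archaic genome, outgroup = population defining the ancestral state.
    Each argument holds one frequency per site, in the range 0..1.
    """
    abba = 0.0
    baba = 0.0
    for f1, f2, f3, fo in zip(p1, p2, p3, outgroup):
        # ABBA: derived allele shared by p2 and p3, absent from p1 and outgroup.
        abba += (1 - f1) * f2 * f3 * (1 - fo)
        # BABA: derived allele shared by p1 and p3, absent from p2 and outgroup.
        baba += f1 * (1 - f2) * f3 * (1 - fo)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / total


# Toy data: five sites, one haploid genome per population (frequencies 0 or 1).
d = d_statistic(
    p1=[0, 0, 1, 0, 0],
    p2=[1, 1, 0, 0, 1],
    p3=[1, 1, 1, 1, 0],
    outgroup=[0, 0, 0, 0, 0],
)
print(f"D = {d:.3f}")
```

Under the null model of no gene flow, ABBA and BABA patterns are expected in equal numbers, so D should be close to zero. A positive value here would point to excess allele sharing between the test population and the archaic genome. In practice significance is usually assessed with a block jackknife across the genome, which this sketch omits.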
Methods for Detecting and Identifying Archaic Admixture
| Method | Publication | Link |
|---|---|---|
| Admixfrog | | |
| ArchIE | | |
| Conditional Random Field (CRF) | | |
| Conditional Site Frequency Spectrum (CSFS) | | |
| ARGweaver-D | | |
| Convolutional Neural Network (CNN) | | |
| diCal-admix, CSD | | |
| HMM | | |
| IBDmix | | |
| IMa3 | | |
| Legofit | Rogers (2019) | |
| Moments, moments.LD | | |
| Relate | | |
| RD, U, and Q95 Statistics | | |
| S* | | |
| SPrime | | |
| tsinfer | | |
| VolcanoFinder | | |
Note.—The publication where each method is described is given. Where available, links to code repositories are also provided.
Select SNPs and Genes with Archaic Origin and Their Functional Effects
| No. | SNPs and Genes Linked to Phenotype | Reference | No. | SNPs and Genes Linked to Phenotype | Reference |
|---|---|---|---|---|---|
| 1 | | | 21 | | |
| 2 | | | 22 | rs1834481, interleukin-18 levels | |
| 3 | Increase in plasma prothrombin time (rs6013) | | 23 | rs11175593, Crohn's disease | |
| 4 | | Huerta-Sanchez 2014 (D, T) | 24 | rs11564258, | |
| 5 | rs28387074, decreased concentration of hemoglobin | | 25 | rs3118914, reduced height | |
| 6 | | | 26 | rs72728264, decrease in mean corpuscular hemoglobin | |
| 7 | | | 27 | | |
| 8 | | | 28 | | |
| 9 | | | 29 | | |
| 10 | | | 30 | | |
| 11 | rs12531711, systemic lupus erythematosus, primary biliary cirrhosis | | 31 | | |
| 12 | | | 32 | | |
| 13 | rs3025343, smoking behavior | | 33 | | |
| 14 | rs7076156, Crohn's disease | | 34 | | |
| 15 | rs12571093, optic disc size | | 35 | rs75493593, type-2 diabetes | |
| 16 | | | 36 | rs75418188, type-2 diabetes | |
| 17 | | | 37 | rs117767867, type-2 diabetes | |
| 18 | | | 38 | | |
| 19 | rs11030043, symptoms involving urinary system | | 39 | rs17632542, reduced risk of prostate cancer | |
| 20 | | | | | |
Note.—All SNPs and genes have evidence for archaic introgression and functional effect. E—European, indicating that the SNP or gene was identified in a modern European population. N—Neanderthal, indicating that the source of the SNP or gene was a Neanderthal population. T—Tibetan, indicating that the SNP or gene was identified in a modern Tibetan population. D—Denisovan, indicating that the source of the SNP or gene was a Denisovan population. Citations without a modern population indicated (E or T) were detected using a broad panel of modern populations. Citations without Neanderthal or Denisovan indicated were detected using a method that generated a more general result of archaic introgression, without a specific population specified.
Distribution of a select subset of functionally associated SNPs of Neanderthal and Denisovan origin, and genes associated with functional phenotypes in the autosomes. Details on SNP-phenotype and gene–phenotype pairs shown in this figure can be found in table 2.