Literature DB >> 33597750

Million-year-old DNA sheds light on the genomic history of mammoths.

Tom van der Valk^1,2,3, Patrícia Pečnerová^4,5,6, David Díez-Del-Molino^7,4,5, Anders Bergström⁸, Jonas Oppenheimer⁹, Stefanie Hartmann¹⁰, Georgios Xenikoudakis¹⁰, Jessica A Thomas¹⁰, Marianne Dehasque^7,4,5, Ekin Sağlıcan¹¹, Fatma Rabia Fidan¹¹, Ian Barnes¹², Shanlin Liu¹³, Mehmet Somel¹¹, Peter D Heintzman¹⁴, Pavel Nikolskiy¹⁵, Beth Shapiro^16,17, Pontus Skoglund⁸, Michael Hofreiter¹⁰, Adrian M Lister¹², Anders Götherström^7,18, Love Dalén^19,20,21.

Abstract

Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale1, the oldest genomic data recovered so far are from a horse specimen dated to 780-560 thousand years ago2. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.

Entities: Chemical

Mesh：

Substances：
DNA, Ancient

Year: 2021 PMID： 33597750 PMCID： PMC7116897 DOI： 10.1038/s41586-021-03224-9

Source DB: PubMed Journal: Nature ISSN： 0028-0836 Impact factor: 49.962

The recovery of genomic data from specimens that are many thousands of years old has improved our understanding of prehistoric population dynamics, ancient introgression events, and the demography of extinct species[3-5]. However, some evolutionary processes occur over time scales that have often been considered beyond the temporal limits of ancient DNA research. For example, many present-day mammal and bird species originated during the Early and Middle Pleistocene[6,7]. Palaeogenomic investigations of their speciation process would thus require recovery of ancient DNA from specimens that are at least several hundreds of thousands of years (ka) old. Mammoths (Mammuthus sp.) appeared in Africa approximately 5 million years ago (Ma) and subsequently colonised much of the Northern Hemisphere[8,9]. During the Pleistocene (2.6 Ma - 11.7 ka), the mammoth lineage underwent evolutionary changes that resulted in early species known as the southern (Mammuthus meridionalis) and steppe (M. trogontherii) mammoths, which later gave rise to the Columbian (M. columbi) and woolly (M. primigenius) mammoths[10]. Although the exact relationships among these taxa are uncertain, the prevailing view is that the Columbian mammoth evolved during an early colonisation of North America c. 1.5 Ma, whereas the woolly mammoth first appeared in northeastern Siberia c. 0.7 Ma[8,10]. M. trogontherii-like mammoths, considered to be a single species, inhabited Eurasia since at least c. 1.7 Ma, with the last populations going extinct in Europe at c. 0.2 Ma[8]. To investigate the origin and evolution of woolly and Columbian mammoths, we recovered genomic data from three northeastern Siberian mammoth molars dated to the Early and Middle Pleistocene (Fig. 1a; Extended Data Fig. 1; Extended Data Fig. 2). These molars originate from the well-documented and fossiliferous Olyorian Suite of northeastern Siberia[11], which has been dated using rodent biostratigraphy tied to the global sequence of palaeomagnetic reversals as well as to correlated faunas with absolute dating from eastern Beringia (Extended Data Fig. 2, Supplementary Section 1). One of the specimens (Krestovka) is morphologically similar to the steppe mammoth, a species originally defined from the European Middle Pleistocene (Supplementary Section 1), and was collected from Lower Olyorian deposits that have been dated to 1.2 - 1.1 Ma. The second specimen (Adycha), which is also of trogontherii-like morphology (Supplementary Section 1), is of less certain age within the Olyorian (1.2 - 0.5 Ma). However, the morphology of the Adycha specimen (Extended data Fig. 1) strongly suggests that it dates to the Early Olyorian, 1.2 - 1.0 Ma. The third specimen (Chukochya) has a morphology consistent with an early form of woolly mammoth (Extended data Fig. 1) and was discovered in a section where only Upper Olyorian deposits are exposed, implying an approximate age of 0.8 - 0.5 Ma (Supplementary Section 1).

Fig. 1

DNA-based phylogenies and specimen age estimates.

a, Geographic origin of the mammoth genomes analysed in this study. b, Phylogenetic tree built in FASTME based on pairwise genetic distances, assuming balanced minimum evolution using all nuclear sites as well as 100 resampling replicates based on 100,000 sites each. c, Bayesian reconstruction of the mitochondrial tree, with the molecular clock calibrated using radiocarbon dates of ancient samples for which a finite radiocarbon date was available, as well as assuming a lognormal prior on the divergence between the African savannah elephant (not shown in the tree) and mammoths with a mean of 5.3 Ma. Blue bars reflect 95% highest posterior densities. Circles depict the position of the newly sequenced genomes. d, Densities for age estimates of samples Adycha and Chukochya based on autosomal divergence to African savannah elephant (L. africana) and e, Densities for age estimates of samples Krestovka, Adycha and Chukochya based on mitochondrial genomes as inferred from the Bayesian mitochondrial reconstruction.

Extended Data Fig. 1

Mammoth molars and morphometric comparisons.

a-b, upper third molars in lateral and cross-sectional views; c, partial lower third molar in lateral and occlusal views. a, Chukochya (PIN-3341-737); b, Krestovka (PIN-3491-3) flipped horizontally; c, Adycha (PIN-3723-511), occlusal view flipped horizontally. Note the more closely-spaced lamellae and thinner enamel in a (primigenius-like) than b and c (trogontherii-like). d, Hypsodonty index vs lamellar length index of upper M3s; e, Enamel thickness index vs basal lamellar length index of lower M3s. Olyorian specimens yielding DNA are labelled by site name. Green dashed line: convex hull summarising Early to early Middle Pleistocene (ca. 1.5-0.5 Ma) North American Mammuthus samples (data points not shown). Green and blue squares: Early and Late Olyorian North-East Siberian samples, respectively; red and green circles: European M. meridionalis and M. trogontherii, respectively; blue circles, M. primigenius from North-East Siberia and Alaska. Note (i) similarity of Krestovka and Adycha to other Early Olyorian molars and to European steppe mammoths (M. trogontherii), (ii) similarity of early North American mammoths to these (Early Olyorian in particular), (iii) similarity of Chukochya to M. primigenius. For site details, measurement definitions and data, see Supplementary Section 1.

Extended Data Fig. 2

Sample age based on biostratigraphy, paleomagnetic reversals and genomic data.

Chart shows the stratigraphic position of the Kutuyakhian fauna, Phenacomys complex, Early Olyorian and Late Olyorian faunas in relation to important European, northwest Asian and northern North American stratigraphic benchmarks. ELMA - European Land Mammal Ages (small mammals), LMA - Land Mammal Ages (large mammals), MN/MQ - European Small Mammal Biozones, EEBU – East European biochronological units. Biostratigraphic and palaeomagnetic based chronological constraints for the specimens are provided, in comparison with the DNA-based age estimations.

We extracted DNA from the three molars using methods designed to recover highly degraded DNA fragments[12,13], converted the extracts into libraries[14], and sequenced these on Illumina platforms (Supplementary Section 2; Supplementary Table 1). The reads were merged and mapped against the African savannah elephant (Loxodonta africana) genome (LoxAfr4)[15] and an Asian elephant (Elephas maximus) mitochondrial genome[16]. We found that the DNA recovered from the Early and Middle Pleistocene specimens was considerably more fragmented and had higher levels of cytosine deamination than DNA from Late Pleistocene permafrost samples (Extended Data Figs. 3, 4, Supplementary Section 4). To circumvent this, we used conservative filters and an iterative approach designed to minimise spurious mappings of short reads (Supplementary Section 5). This approach allowed us to recover complete (>37X coverage) mitogenomes from all three specimens, and 49, 884, and 3,671 million base pairs of nuclear genomic data for Krestovka, Adycha, and Chukochya, respectively (Supplementary Table 3).

Extended Data Fig. 3

DNA fragment length distributions for nine mammoths.

Reads are aligned to the LoxAfr4 autosomes. For the three Early-Middle Pleistocene samples (Krestovka, Adycha, Chukochya), reads of 25-200 bp length are shown, whereas 30-200 bp reads are shown for the remaining samples. Ultrashort reads (<35 bp) are denoted in red and were shown to be enriched for spurious alignments and therefore excluded from downstream analyses (Supplementary Section 4). The mean read lengths (μ) were calculated using only the retained reads (≥35 bp).

Extended Data Fig. 4

Post-mortem cytosine deamination damage profiles at CpG sites.

The most ancient samples (Krestovka, Adycha, Chukochya) carry a greater frequency of cytosine deamination compared to younger permafrost preserved woolly mammoth samples (Oimyakon and Wrangel) and the Columbian mammoth (M. columbi) specimen.

DNA-based age estimates

To estimate specimen ages using mitogenome data, we conducted a Bayesian molecular clock analysis, calibrated using samples with finite radiocarbon dates (tip calibration) and a log-normal prior assuming a 5.3 Ma genomic divergence between the African elephant and mammoth lineages[15] (root calibration). This provided specimen age estimates of 1.65 Ma (95% HPD: 2.08-1.25 Ma), 1.34 (1.69-1.06 Ma), and 0.87 Ma (1.07-0.68 Ma) for Krestovka, Adycha, and Chukochya, respectively (Fig. 1c,e). We also used the autosomal genomic data to investigate the age of the higher-coverage Adycha (0.3X) and Chukochya (1.4X) specimens by estimating the number of derived changes since their common ancestor with the African elephant (Supplementary Section 6). We used an approach based on the accumulation of derived variants over time[17], assuming a constant mutation rate. This resulted in inferred ages of 1.28 Ma (95% CI 1.64-0.92 Ma) for the Adycha specimen and 0.62 Ma (95% CI 1.00-0.24 Ma) for the Chukochya specimen (Fig. 1d). Although we caution that this analysis is based on low-coverage data and the confidence intervals are wide, these estimates are similar to those obtained from the mitochondrial data. The DNA-based age estimates for the Chukochya and Adycha specimens are consistent with the independently derived geological age inferences from biostratigraphy and palaeomagnetism, whereas molecular clock dating of the Krestovka specimen suggests an older age compared to that obtained from biostratigraphy. This could mean that the Krestovka specimen had been reworked from an older geological deposit or that the mitochondrial clock rate has been underestimated. However, the confidence intervals of the genetic and geological age estimates of the Krestovka specimen are separated by only 0.05 Ma, and all estimates support an age greater than one million years.

A genetically divergent mammoth lineage

A phylogeny based on autosomal data shows that the three Early/Middle Pleistocene samples fall outside the diversity of all Late Pleistocene Eurasian mammoth genomes (Fig. 1b), including two woolly mammoth genomes from Europe (Scotland; 48 ka) and Siberia (Kanchalan; 24 ka) generated as part of this study. The phylogenetic positions of Adycha and Chukochya are consistent with these genomes being from a population directly ancestral to all Late Pleistocene woolly mammoths, whereas the Krestovka mammoth genome diverged prior to the split between Columbian and woolly mammoth genomes (Fig. 1b). Similarly, Bayesian reconstruction of a mitogenome phylogeny that included 168 Late Pleistocene mammoth specimens[18,19] places the Early Pleistocene Krestovka and Adycha specimens as basal to all previously published mammoth mitogenomes, whereas the Middle Pleistocene Chukochya mitogenome is basal to one of the three clades previously described for Late Pleistocene woolly mammoths[20] (Fig. 1c). Estimates of sequence divergence times based on both genome-wide and mitochondrial data indicate a deep split between Krestovka and all other mammoths analysed in this study. We estimate that the Krestovka mitogenome diverged from all other mammoth mitogenomes between 2.66 and 1.78 Ma (95% HPD, Fig. 1c). We obtained a similar divergence time estimate (95% CI 2.65 - 1.96 Ma) from the autosomal data, but caution that this analysis is based on limited genomic data (Supplementary Section 7). Moreover, estimates of relative divergence using F(A|B) statistics[4] show that the Krestovka nuclear genome carries fewer derived alleles than any other mammoth genome at sites where the high-coverage woolly mammoth genomes are heterozygous, further supporting that it diverged after the split with Asian elephant but before any of the other mammoth genomes analysed here (Extended Data Fig. 5, Supplementary Section 8).

Extended Data Fig. 5

F(A|B) statistics.

The statistics reflect relative divergence between the genomes on the left and the right side. Lower values indicate reduced derived allele sharing between the sample indicated on the left and the right of the graph, at sites for which the genome on the right panel is heterozygous. The lower the value, the more drift has occurred between the genomes and thus the older their genetic divergence.

Overall, these analyses suggest that two evolutionary lineages (i.e. two isolated populations persisting through time) of mammoths inhabited eastern Siberia during the latter stages of the Early Pleistocene. One of these lineages, which is represented by the Krestovka specimen, diverged from other mammoths prior to the first appearance of mammoths in North America. The second lineage comprises the Adycha specimen along with all Middle and Late Pleistocene woolly mammoths.

Origin of the Columbian mammoth

Intriguingly, several lines of evidence suggest that, compared to all other mammoths, the Columbian mammoth derives a much higher proportion of its ancestry from the lineage represented by the Krestovka mammoth. Analyses using D-statistics[4] revealed a strong signal of excess derived allele sharing between the Columbian mammoth and Krestovka (Fig. 2a, Supplementary Section 8). This is at odds with the average phylogenetic position of Krestovka being basal to all other mammoth genomes, since under a scenario without subsequent admixture the D-statistic would not deviate from zero. We further investigated this pattern using TreeMix[21]. Without modelling migration (admixture) events, none of the models fit the data (residuals >10x SE). Instead, we observed a good fit when modelling one migration event (admixture weight = 42%; residuals <2x SE) (Supplementary section 8), indicating that part of the Columbian mammoth’s ancestry is derived from the Krestovka lineage.

Fig. 2

Inferred genomic history of mammoths.

a, D-statistics where each dot reflects a comparison involving one woolly mammoth genome and one genome depicted on the right side of the panel (where L. africana = African savannah elephant, P. antiquus = straight-tusked elephant, Mammuthus sp. = all mammoth specimens in this study, M. columbi = Columbian mammoth, and M. primigenius = woolly mammoth), iterating through all possible sample combinations using the mastodon (Mammut americanum) as an outgroup. No elevated allele sharing between any of the mammoth genomes and the reference (African savannah elephant) is observed, suggesting no pronounced reference biases in the Early/Middle Pleistocene genomes. A strong affinity between Columbian mammoths and sample Krestovka is observed, as well as a relationship between the North American woolly mammoth (Wyoming) and the Columbian mammoth. b, Best fitting admixture graph model for one admixture event, suggesting a hybrid origin for the Columbian mammoth. c, Hypothesized evolutionary history of mammoths during the last 3 Ma, based on currently available genomic data. Brown dots represent mammoth specimens for which genomic data has been analysed in this study, with error bars representing 95% highest posterior density intervals from the mitogenome-based age estimates obtained for the three Early and Middle Pleistocene specimens. Arrows depict gene flow events identified from the autosomal genomic data. The European steppe mammoth (M. trogontherii) survived well into the later stages of the Middle Pleistocene, and we hypothesize that it most likely branched off from a common ancestor shared with the woolly mammoth at ~1 Ma.

To further assess the evolutionary context of the Krestovka lineage within the population history of mammoths, we used two complementary admixture graph model approaches[22,23]. We exhaustively tested all possible phylogenetic combinations relating the three ancient individuals with one Siberian woolly mammoth, one Columbian mammoth and one Asian elephant. We set the latter as outgroup, only including sites identified as polymorphic in six Asian elephant genomes to limit the effects of incorrectly called genotypes (Supplementary Section 8). None of the graph models without admixture events provided good fits to the data, thus ruling out a simple tree-like population history. In contrast, graph models with just one admixture event provided a perfect fit, explaining all 45 f-statistic combinations without significant outliers. Based on the point estimates obtained from the two different admixture graph model approaches, the Columbian mammoth is estimated to be the result of an admixture event where 38-43% of its ancestry was derived from a lineage related to Krestovka, and 57-62% from the woolly mammoth lineage (Fig. 2b, Extended Data Fig. 6). We obtained additional support for the complex ancestry of the Columbian mammoth by employing a hidden Markov model aimed at identifying admixed genomic regions from an unknown source (i.e. ghost admixture)[24] (Supplementary Section 9). This analysis, which was done without including any of the Early and Middle Pleistocene specimens, suggested that roughly 41% of the Columbian mammoth genome originates from a lineage genetically differentiated from the woolly mammoth (Extended Data Fig. 7a). We subsequently built pairwise-distance phylogenetic trees for the genomic regions identified as being the result of ghost admixture and found them closely related to the Krestovka genome (Extended Data Fig. 7b, Supplementary Section 9). In contrast, when excluding these regions, the remaining part of the Columbian mammoth genome falls within the diversity of Late Pleistocene woolly mammoths (Extended Data Fig. 7c, Supplementary Section 9).

Extended Data Fig. 7

Ghost introgression analysis of the Columbian mammoth genome.

a, The number of private alleles per 1000 bp within genomic regions identified as woolly mammoth (M. primigenius) ancestry or ghost ancestry. b, Maximum-likelihood phylogenies for those genomic regions identified as ghost ancestry in the Colombian mammoth (M. columbi) genome. c, Maximum-likelihood phylogenies for regions identified as un-admixed ancestry.

Finally, our D-statistics analysis also identified higher levels of derived allele sharing between the Columbian mammoth and a woolly mammoth from Wyoming (Fig. 2a). Based on f4-ratios, we estimate 10.7-12.7% excess shared ancestry between these genomes (Supplementary Section 9), consistent with an earlier study[15]. Since the Columbian mammoth carries a large proportion of Krestovka ancestry, gene flow from the Columbian mammoth into North American woolly mammoths would have resulted in a larger proportion of allele sharing between Krestovka and the Wyoming woolly mammoth. Our finding of no excess allele sharing between the Krestovka genome and any of the sequenced woolly mammoths, including the individual from Wyoming (Supplementary Table 7), therefore indicates that this second phase of gene flow may have been unidirectional, from woolly mammoth into the Columbian mammoth. This implies that the composition of the Columbian mammoth’s genome, as identified in the D-statistics, admixture graph models, and ghost-admixture analysis, is the result of two admixture events, where an initial ~50% contribution from each of the Krestovka and woolly mammoth lineages was followed by an additional ~12% gene flow from North American woolly mammoths (Fig. 2c).

Insights into mammoth adaptive evolution

The woolly mammoth evolved into a cold-tolerant, open-habitat specialist through a series of adaptive changes[8]. The antiquity of our genomes makes it possible to investigate when these adaptations evolved. To do this, we identified protein-coding changes for which all Late Pleistocene woolly mammoths carried the derived allele and all African and Asian elephants carried the ancestral allele (n = 5,598; Supplementary Table 8). Among the variants that could be called in the Early and Middle Pleistocene genomes, we find that 85.2% (782 out of 918) and 88.7% (2,578 out of 2,906) of the mammoth-specific protein-coding changes were already present in the genomes of Adycha (trogontherii-like) and Chukochya (early woolly mammoth), respectively (Supplementary Section 10, Supplementary Table 9). Moreover, we did not detect significant differences in the ratio of shared non-synonymous versus synonymous sites among our sequenced Early, Middle, and Late Pleistocene genomes (Supplementary Table 9). Thus, despite the transitions in climate and mammoth morphology at the onset of the Middle Pleistocene, we do not observe any marked change in the rate of protein-coding mutations during this time period. Previous analyses have identified specific genetic changes that are thought to underlie a suite of woolly mammoth adaptations to the Arctic environment[25]. For these variants (n = 91), we assessed whether the Adycha and Chukochya genomes shared the same amino acid changes as those observed in Late Pleistocene woolly mammoths (Supplementary Table 10). We find that among genes possibly involved in hair growth, circadian rhythm, thermal sensation, and white and brown fat deposits, the vast majority of coding changes were present in both the Adycha (87%) and Chukochya (89%) genomes (Supplementary Table 10). This suggests that Siberian trogontherii-like mammoths (i.e. Adycha) had already developed a woolly fur as well as several physiological adaptations to a cold high-latitude environment (Supplementary Section 11). However, in one of the best studied genes in the woolly mammoth, TRPV3, which encodes a temperature-sensitive transient receptor channel, potentially involved in thermal sensation and hair growth[25], we find that only two out of four amino-acid changes identified in Late Pleistocene woolly mammoths were present in the early woolly mammoth genome (Chukochya). This indicates that non-synonymous changes in this gene occurred over several hundreds of thousands of years, rather than during a single brief burst of adaptive evolution.

Discussion

Our genomic analyses suggest that the Columbian mammoth is a product of admixture between woolly mammoths and a previously unrecognised ancient mammoth lineage represented by the Krestovka specimen. Given the finding that each of these lineages initially contributed roughly half of their genome to this ancient admixture, we propose that the origin of the Columbian mammoth constitutes a hybrid speciation event[26]. This hybridisation event appears not to have imparted any shift in average molar morphology of North American populations[10], but can explain the mitochondrial-nuclear discordance in the Columbian mammoth[18] where all known Columbian mammoth mitogenomes are nested within the woolly mammoth’s mitogenome diversity (Fig. 1c). Based on the mitogenome phylogeny, we estimate that the most recent common female ancestor of all Late Pleistocene Columbian mammoths lived approximately 420 ka (95% HPD 511 - 338 ka), providing a likely minimum age for when this hybridization event occurred (Fig. 1c). Since mammoths had already appeared in North America by 1.5 Ma, these findings imply that prior to the hybridisation event, North American mammoths belonged to the Krestovka lineage. Given the morphology of the Krestovka specimen, this corroborates the model proposed by Lister & Sher[10] that the earliest North American mammoths were derived from a trogontherii-like Eurasian ancestor, rather than originating from an expansion of the southern mammoth (M. meridionalis) into North America[27]. Our findings demonstrate that genomic data can be recovered from Early Pleistocene specimens, opening up the possibility of studying adaptive evolution across speciation events. The mammoth genomes presented here offer a glimpse of this potential. Even though the transition from trogontherii-like (Adycha) to woolly (Chukochya) mammoths represents a significant change in molar morphology (Extended data Fig. 1), we do not observe an increased rate of genome-wide selection during this time period. Moreover, many key adaptations identified in Late Pleistocene mammoth genomes were already present in the Early Pleistocene Adycha genome. We thus find no evidence for an increased rate of adaptive evolution associated with the origin of the woolly mammoth. This is consistent with previous work suggesting that the major shift in habitat and morphology of mammoths happened earlier, between meridionalis-like and trogontherii-like mammoths[8,10]. The retrieval of DNA older than one million years confirms previous theoretical predictions[1] that the ancient genetic record can be extended beyond what has been previously shown. We anticipate that additional recovery and analyses of Early and Middle Pleistocene genomes will further improve our understanding of the complex nature of evolutionary change and speciation. Our results highlight the importance of perennially frozen environments for extending the temporal limits of DNA recovery, and hint at a future deep-time chapter of ancient DNA research that will likely be predominantly fueled by specimens from high latitudes.

Methods

Morphometry of mammoth molars

Mammoth molars were measured according to the method described in Lister & Sher[10] (Supplementary Section 1). Samples considered are as follows: Mammuthus meridionalis, ca. 2.0 Ma, Upper Valdarno, Italy (type locality) (n=34); M. trogontherii, ca. 0.6 Ma, Süssenborn, Germany (type locality) (n=48); M. primigenius, Late Pleistocene of North-East Siberia (Russia) and Alaska (USA) (n=28). Early (n=8) and Late (n=15) Olyorian samples are from localities in the Yana-Kolyma lowland (Early Olyorian is ~1.2 – 0.8 Ma, Late Olyorian is 0.8 – 0.5 Ma; Extended Data Fig. 2). North American Early to early Middle Pleistocene samples (ca. 1.5 – 0.5 Ma) are from Old Crow (Yukon, Canada), Leisey Shell Pit 1A and Punta Gorda (Florida, USA), and the Ocotillo Formation (California, USA) (combined n=16). Original data are from Lister & Sher[10], where further details on sites and collections can be found.

DNA extraction and sequencing

Samples from Early-Middle Pleistocene mammoth molars (Krestovka, Adycha, Chukochya) as well as Late Pleistocene samples (Scotland, Kanchalan) were processed in dedicated ancient DNA laboratories following standard ancient DNA practices (Supplementary Section 2). Following DNA extraction[12], we constructed double- or single-stranded Illumina libraries[14,28], which were treated to remove uracils caused by post-mortem cytosine deamination[13]. We subsequently sequenced these libraries using Illumina platforms, generating from 200 to 2,350 million paired-end reads (2× 50 or 2×150 bp) per specimen (Supplementary Table 1).

Sequence data processing and mapping

We combined our sequence data with previously published genomic data from elephantids generated by Palkopoulou et al.[15] (Supplementary Table 2). For the five samples sequenced in this study, we trimmed adapters and merged paired-end reads using SeqPrep v1.1[29], initially retaining reads either ≥25 bp (Krestovka, Adycha, Chukochya) or ≥30 bp (Scotland, Kanchalan), and with a minor modification in the source code that allowed us to choose the best base quality score in the merged region instead of aggregating the scores[5] (Supplementary Section 3). For genomic data from the straight-tusked elephant, and the Scotland and Kanchalan mammoths, which had been treated with the afu UDG enzyme leaving post-mortem DNA damage at the ends of the molecules (Supplementary Tables 2 and 3), we removed the first and last two base pairs from all reads before mapping. The merged reads were mapped to a composite reference, consisting of the African savannah elephant nuclear genome (LoxAfr4), woolly mammoth mitogenome (DQ188829), and the human genome (hg19) using BWA aln v0.7.8 with deactivated seeding (-l 16,500), allowing for more substitutions (-n 0.01) and up to two gaps (-o 2)[30,31]. The human genome was included as a decoy to filter out spurious mappings in genomic conserved regions[32]. Next, we removed PCR duplicates from the alignments using a custom python script[5]. After obtaining initial quality metrics for the genomes, we removed reads <35 base pairs from the BAM-files using samtools v1.10[33] and awk for all remaining analysis (Supplementary Section 4).

Ancient DNA authenticity and quality assessment

All ancient genomes were treated to reduce post-mortem DNA damage. For the most ancient samples (Krestovka, Adycha, Chukochya), we took several steps to assess the authenticity and quality of the data (Supplementary Section 4). First, only reads that mapped uniquely to non-repetitive regions of the LoxAfr4 reference and had a mapping quality ≧30 were retained, whereas reads that mapped equally well to the human genome reference (hg19) in our composite reference were removed to reduce possible biases caused by contaminant human reads[32]. Second, we employed a method based on the rate of mismatches per base pair to the reference to assess the rate of spurious mappings for all reads between 20-35 bp and at 5 bp intervals between 35-50 bp (Supplementary Section 4). This allowed us to identify a sample-specific minimum read length cutoff, above which we consider reads to be correctly mapped and endogenous (Supplementary Section 4, Supplementary Table 3). Based on this, we applied the longest sample-specific cutoff (≥35 bp, Krestovka) for all samples. We used mapDamage v2.0.6[34] to obtain read length distributions for all ancient samples. Finally, an assessment of cytosine deamination profiles at CpG sites, which are unaffected by UDG treatment[13], was done using the platypus option in PMDtools (github.com/pontussk/PMDtools)[35]. A full set of ancient DNA quality statistics are available in Supplementary Tables 1–3.

Allele sampling

To minimize coverage-related biases, all subsequent analyses were based on pseudo-haploidized sequences that were generated by randomly selecting a single high quality base call at each autosomal genomic site using ANGSD v0.921[36]. For base calling we only considered reads ≥35 bp, a mapping and base quality ≥30, and reads without multiple best hits (-uniqueOnly 1). Finally, we masked all sites within repetitive regions as identified with RepeatMasker v.4.0.7[37], CpG sites, sites with more than two alleles among all individuals, and sites with coverage above the 95th percentile of the genome-wide average to reduce false calls from duplicated genomic regions.

Reconstruction of mitogenomes, tip-dating, and mtDNA phylogeny

Mitochondrial genomes for the five newly sequenced samples were assembled using MIA[38] with the Asian elephant (NC_005129)[16] mitogenome as reference for Adycha, Krestovka, and Chukochya and the mammoth mitogenome (NC_007596) as reference for the Late Pleistocene woolly mammoth samples from Scotland and Kanchalan, restricting the input reads to those ≥35 bp for each (Supplementary Section 5). This yielded mitochondrial assemblies with coverage of 37.8×, 47.5×, and 77.1× for Adycha, Krestovka, and Chukochya, and 99.6x and 179.5x for Scotland and Kanchalan, respectively. These assemblies were then aligned using Muscle v3.8.31[39] together with previously published elephantid mitogenomes[18,19,40]. Following alignment partitioning, the HKY model with a gamma-distributed rate heterogeneity[41] and a proportion of invariant sites or just a proportion of invariant sites, was identified as best-fitting for each alignment partition using jModelTest v2.1.10[42] (Supplementary Section 5). To estimate the age of the three oldest Mammuthus samples (Adycha, Krestovka, Chukochya), we performed a Bayesian reconstruction of the phylogenetic tree using BEAST v1.10.4[43]. We calibrated the molecular clock using tip ages for all ancient samples with a finite radiocarbon date, as well as a lognormal prior of 5.3 Ma on the genetic divergence of Loxodonta and Elephas/Mammuthus as obtained from previous genomic studies[15] (Supplementary Table 4). In addition, we tested for an older divergence (7.6 Ma) between Loxodonta and Mammuthus that is more consistent with the fossil record[16] (see Supplementary Section 5). For both priors, we used a standard deviation of 500,000 years. We assumed a strict molecular clock and the flexible skygrid coalescent model[44] to account for the complex cross-generic demographic history of the included taxa. The ages of all samples beyond the limit of radiocarbon dating were estimated by sampling from lognormal distributions with priors based on stratigraphic context and previous genetic studies, using two MCMC chains of 100 million generations, sampling every 10,000 and discarding the first 10% as burn-in (Supplementary Table 5, Supplementary Section 5).

Genetic dating based on autosomal data

Specimen age estimates for Adycha and Chukochya (Krestovka was excluded as too few autosomal bases were available for this analysis) were estimated based on the autosomal data following the method described in Meyer et al. [17], using the American mastodon (Mammut americanum), which is an outgroup to all elephantids, and the African savannah and Asian elephant genomes as outgroups. We inferred the ancestral state for a given base in the African elephant reference genome by requiring that the alignments of the mastodon, two African elephants and five Asian elephants are present and identical at that nucleotide. We used the high coverage and radiocarbon dated Wrangel Island woolly mammoth genome as a calibration point[5]. Each difference to the ancestral state was then counted for the Wrangel genome and the focal Mammuthus genome for all sites at which both genomes had a called base. We calculated the relative age of each individual as (nW − nM)/nW, based on the number of derived changes in the Wrangel genome (nW) and the other Mammuthus genome (nM), using an assumed divergence time of 5.3 million years[15] to the common ancestor of African elephant and woolly mammoth. Age variance estimates were calculated in windows of 5 Mb and we computed bootstrap confidence intervals as 1.96× standard error around the date estimates (Supplementary Section 6).

Nuclear genetic relationships and phylogeny

We reconstructed phylogenetic trees based on the whole genome Identical-By-State (IBS) matrix for all individuals using the “doIBS” function in ANGSD. We calculated pairwise genetic distances between individuals using the full dataset, as well as 100 resampling replicates based on 100,000 sites each. Second, we obtained the phylogenetic tree using a balanced minimum evolution (ME) method as implemented in FASTME[45] (Fig. 1b, Supplementary Section 7). Next, we inferred relative population split times using an approach that examines single nucleotide polymorphic (SNP) positions that are heterozygous in an individual from one population and measures the fraction of these sites at which a randomly sampled allele from an individual of a second population carries the derived variant, polarized by an outgroup (F(A|B) statistics)[4]. We ascertained heterozygous sites in three high-coverage genomes — and M. primigenius (Oimyakon and Wrangel)[5] — using the SAMtools v.1.10[33] ‘mpileup’ command and bcftools. We only included SNPs with a quality ≥30, and filtered out all SNP in repetitive regions, within 5 bp from indels, at CpG sites and sites below 1/3 or above two times the genome-wide average coverage. For each of the Mammuthus genomes, we then estimated the proportion of sites for which a randomly drawn allele at the ascertained heterozygous sites matches the derived state.

D, f4 statistics, AdmixtureGraphs and TreeMix

We first used Admixtools v5[22] to calculate D- and f4-statistics for all possible quadruple combinations of samples iterating through the three different groups (P1, P2, P3,) based on the randomly sampled alleles, conditioning on all sites that are polymorphic among the 6 Asian elephant genomes[22]. The mastodon was used as an outgroup in all comparisons (Supplementary Table 6, 7). Direct estimates of genomic ancestries using f4-ratios were additionally calculated for specific pairs in AdmixTools (Supplementary section 9)[22]. Second, we used the admixturegraph R package[23] to assess the genetic relationship among the Mammuthus genomes using admixture graph models, fitting graphs to all possible f-statistics involving a given set of genomes. To resolve the relationships of the Adycha, Krestovka and Chukochya individuals within the population history of mammoths, we exhaustively tested all 135,285 possible admixture graphs (with up to two admixture events) relating these three individuals, one woolly mammoth (Wrangel), one Columbian mammoth, and one Asian elephant, setting the latter as outgroup (Supplementary Section 8). We repeated the admixturegraph analysis using the above described f4-statistic with qpBrute[46], which in addition allowed us to estimate shared genetic drift and branch lengths using f2 and f3 statistics. At each step, insertion of a new node was tested at all branches of the graph, except the outgroup branch. Where a node could not be inserted without producing f4 outliers (i.e. |Z| >=3), all possible admixture combinations were also attempted. The resulting list of all fitted graphs was then passed to the MCMC algorithm implemented in the admixturegraph R package, to compute the marginal likelihood of the models and their Bayes Factors. Finally, we estimated genetic relationships and admixture among the Mammuthus samples using TreeMix v1.12[21]. We first estimated the allele frequencies among the randomly sampled alleles and subsequently ran the TreeMix model accounting for linkage disequilibrium (LD) by grouping sites in blocks of 1,000 SNPs (-k 1,000) setting the E. maximus samples as root. Standard errors (-SE) and bootstrap replicates (-bootstrap) were used to evaluate the confidence in the inferred tree topology. After constructing a maximum-likelihood tree, migration events were added (−m) and iterated 10 times for each value of m (1–10) to check for convergence in the likelihood of the model as well as the explained variance following each addition of a migration event. The inferred maximum-likelihood trees were visualized with the in-built TreeMix R script plotting functions.

Introgression in the Columbian mammoth

We further tested for admixture in the Columbian and Scotland mammoths using a hidden Markov model[24]. This method identifies genomic regions within a given individual that possibly came from an admixture event with a distant lineage not present in the dataset based on the distribution of private sites. Briefly, we estimated the number of callable sites, the SNP density (as a proxy for per-window mutation rate) and the number of private variants with respect to all other elephant genomes except Krestovka in 1 kb windows. We applied settings without gene flow, or with one gene flow event with starting probabilities and decoding described in Supplementary Section 9. We tested for ghost admixture in the Columbian mammoth using sites private to the Columbian mammoth with respect to all other genomes in this study except Krestovka. We subsequently obtained fasta-alignments for those autosomal regions identified as “unadmixed” and “ghost-admixed” in the Columbian mammoths by calling a random base at each covered position using ANGSD. Minimal evolution phylogenies were then obtained for both alignments as described in the ‘Nuclear genetic relationships and phylogeny’ section.

Genetic adaptations of the woolly mammoth

To investigate the timing of genetic adaptations in the woolly mammoth lineage, we used last v1170[47] to build a chain file to lift over our sampled allele dataset mapped to LoxAfr4 to the annotated LoxAfr3 reference genome. Following construction of a reference index using lastdb (-P0 -uNEAR -R01), we aligned the two references using lastal (-m50 -E0.05 -C2). The alignment was converted to MAF format (last-split -m1) and finally to a chain file with the maf-convert tool (last.cbrc.jp). The Picard Liftover tool (‘Picard Toolkit’, 2019) was then used to lift over the identified variants to the LoxAfr3 reference. Using the African savannah elephant genome annotation (LoxAfr3.gff), we identified all amino-acid changes where all Late Pleistocene woolly mammoth genomes carry the derived state and all other elephantid genomes carry the ancestral allele using VariantEffectPredictor[48]. For all identified amino-acid changes, we assessed the state (derived or ancestral) among the three oldest samples (Krestovka, Adycha, Chukochya) and the Columbian mammoth (Supplementary Table 8–10). In addition, we conducted a Gene Ontology enrichment on all genes for which the woolly mammoth genomes (including Chukochya and Adycha) are derived, using GOrilla[49]. Finally, we used PAML v1.3.1[50] to identify genes that potentially have been under positive selection in Late Pleistocene woolly mammoths (Supplementary Table 11, Supplementary Section 10).

Mammoth molars and morphometric comparisons.

Sample age based on biostratigraphy, paleomagnetic reversals and genomic data.

DNA fragment length distributions for nine mammoths.

Post-mortem cytosine deamination damage profiles at CpG sites.

F(A|B) statistics.

qpGraph model.

The most parsimonious graph model (highest Bayes Factor) of the phylogenetic relationships among mammoths lineages augmented with one admixture event. Branch lengths are given in f-statistic units multiplied by 1,000. Discontinuous lines show admixture events between lineages, with percentages representing admixture proportions.

Ghost introgression analysis of the Columbian mammoth genome.

28 in total

Review 1. Depression in patients with epilepsy: an overview.

Authors: M M Robertson
Journal: Semin Neurol Date: 1991-06 Impact factor: 3.420

2. Ancient admixture in human history.

Authors: Nick Patterson; Priya Moorjani; Yontao Luo; Swapan Mallick; Nadin Rohland; Yiping Zhan; Teri Genschoreck; Teresa Webster; David Reich
Journal: Genetics Date: 2012-09-07 Impact factor: 4.562

3. [Syndromes of compression of the nerves (author's transl)].

Authors: A Mumenthaler
Journal: Ther Umsch Date: 1974-01

4. [Use of homeoplastic auditory ossicles for chain defects within the scope of tympanoplasty].

Authors: J Baumann
Journal: Z Laryngol Rhinol Otol Date: 1971-02

5. [Acute Mendelsohn's syndrome. (Case report)].

Authors: W Dick
Journal: Anaesthesist Date: 1969-06 Impact factor: 1.041

6. Effects of N-aralkyl substitution of beta-agonists on alpha- and beta-adrenoceptor subtypes: pharmacological studies and binding assays.

Authors: N Decker; M C Quennedey; B Rouot; J Schwartz; J Velly
Journal: J Pharm Pharmacol Date: 1982-02 Impact factor: 3.765

7. Quantitative relationships between an influenza virus and neutralizing antibody.

Authors: H P Taylor; S J Armstrong; N J Dimmock
Journal: Virology Date: 1987-08 Impact factor: 3.616

8. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius.

Authors: Ian Barnes; Beth Shapiro; Adrian Lister; Tatiana Kuznetsova; Andrei Sher; Dale Guthrie; Mark G Thomas
Journal: Curr Biol Date: 2007-06-07 Impact factor: 10.834

9. Inference of population splits and mixtures from genome-wide allele frequency data.

Authors: Joseph K Pickrell; Jonathan K Pritchard
Journal: PLoS Genet Date: 2012-11-15 Impact factor: 5.917

10. admixturegraph: an R package for admixture graph manipulation and fitting.

Authors: Kalle Leppälä; Svend V Nielsen; Thomas Mailund
Journal: Bioinformatics Date: 2017-06-01 Impact factor: 6.937

30 in total

1. A mammoth contribution to ancient genomics.

Authors: Darren J Burgess
Journal: Nat Rev Genet Date: 2021-04 Impact factor: 53.242

2. Russia's war in Ukraine is disrupting studies of ancient life.

Authors: Freda Kreier
Journal: Nature Date: 2022-06-30 Impact factor: 49.962

Review 3. Paleoproteomics.

Authors: Christina Warinner; Kristine Korzow Richter; Matthew J Collins
Journal: Chem Rev Date: 2022-07-15 Impact factor: 72.087

4. 100-My history of bornavirus infections hidden in vertebrate genomes.

Authors: Junna Kawasaki; Shohei Kojima; Yahiro Mukai; Keizo Tomonaga; Masayuki Horie
Journal: Proc Natl Acad Sci U S A Date: 2021-05-18 Impact factor: 11.205

5. Performance and automation of ancient DNA capture with RNA hyRAD probes.

Authors: Tomasz Suchan; Mariya A Kusliy; Naveed Khan; Loreleï Chauvey; Laure Tonasso-Calvière; Stéphanie Schiavinato; John Southon; Marcel Keller; Keiko Kitagawa; Johannes Krause; Alexander N Bessudnov; Alexander A Bessudnov; Alexander S Graphodatsky; Silvia Valenzuela-Lamas; Jarosław Wilczyński; Sylwia Pospuła; Krzysztof Tunia; Marek Nowak; Magdalena Moskal-delHoyo; Alexey A Tishkin; Alexander J E Pryor; Alan K Outram; Ludovic Orlando
Journal: Mol Ecol Resour Date: 2021-10-15 Impact factor: 8.678

6. Additional evaluations show that specific BWA-aln settings still outperform BWA-mem for ancient DNA data alignment.

Authors: Adrien Oliva; Raymond Tobler; Bastien Llamas; Yassine Souilmi
Journal: Ecol Evol Date: 2021-12-17 Impact factor: 2.912

7. Promiscuous molecules for smarter file operations in DNA-based data storage.

Authors: Kyle J Tomek; Kevin Volkel; Elaine W Indermaur; James M Tuck; Albert J Keung
Journal: Nat Commun Date: 2021-06-10 Impact factor: 14.919

Review 8. Malaria in Europe: A Historical Perspective.

Authors: Mahmoud A Boualam; Bruno Pradines; Michel Drancourt; Rémi Barbieri
Journal: Front Med (Lausanne) Date: 2021-06-30

9. The elusive parasite: comparing macroscopic, immunological, and genomic approaches to identifying malaria in human skeletal remains from Sayala, Egypt (third to sixth centuries AD).

Authors: Alvie Loufouma Mbouaka; Michelle Gamble; Christina Wurst; Heidi Yoko Jäger; Frank Maixner; Albert Zink; Harald Noedl; Michaela Binder
Journal: Archaeol Anthropol Sci Date: 2021-06-14 Impact factor: 1.989

10. Sedimentary ancient DNA shows terrestrial plant richness continuously increased over the Holocene in northern Fennoscandia.

Authors: Dilli P Rijal; Peter D Heintzman; Youri Lammers; Nigel G Yoccoz; Kelsey E Lorberau; Iva Pitelkova; Tomasz Goslar; Francisco J A Murguzur; J Sakari Salonen; Karin F Helmens; Jostein Bakke; Mary E Edwards; Torbjørn Alm; Kari Anne Bråthen; Antony G Brown; Inger G Alsos
Journal: Sci Adv Date: 2021-07-30 Impact factor: 14.136