
Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny.

Enrico Cappellini1,2, Frido Welker3,4, Luca Pandolfi5, Jazmín Ramos-Madrigal3, Diana Samodova6, Patrick L Rüther6, Anna K Fotakis3, David Lyon6, J Víctor Moreno-Mayar7, Maia Bukhsianidze8, Rosa Rakownikow Jersie-Christensen6, Meaghan Mackie3,6, Aurélien Ginolhac9, Reid Ferring10, Martha Tappen11, Eleftheria Palkopoulou12, Marc R Dickinson13, Thomas W Stafford14, Yvonne L Chan15, Anders Götherström16, Senthilvel K S S Nathan17, Peter D Heintzman18,19, Joshua D Kapp18, Irina Kirillova20, Yoshan Moodley21, Jordi Agusti22,23, Ralf-Dietrich Kahlke24, Gocha Kiladze25, Bienvenido Martínez-Navarro22,23,26, Shanlin Liu3,27, Marcela Sandoval Velasco3, Mikkel-Holger S Sinding3,28, Christian D Kelstrup6, Morten E Allentoft7, Ludovic Orlando7,29, Kirsty Penkman13, Beth Shapiro18,30, Lorenzo Rook5, Love Dalén15, M Thomas P Gilbert3,31, Jesper V Olsen32, David Lordkipanidze8, Eske Willerslev33,34,35,36.   

Abstract

The sequencing of ancient DNA has enabled the reconstruction of speciation, migration and admixture events for extinct taxa[1]. However, the irreversible post-mortem degradation[2] of ancient DNA has so far limited its recovery, outside permafrost areas, to specimens that are not older than approximately 0.5 million years (Myr)[3]. By contrast, tandem mass spectrometry has enabled the sequencing of approximately 1.5-Myr-old collagen type I[4], and suggested the presence of protein residues in fossils of the Cretaceous period[5], although with limited phylogenetic use[6]. In the absence of molecular evidence, the speciation of several extinct species of the Early and Middle Pleistocene epoch remains contentious. Here we address the phylogenetic relationships of the Eurasian Rhinocerotidae of the Pleistocene epoch[7-9], using the proteome of dental enamel from a Stephanorhinus tooth that is approximately 1.77-Myr old, recovered from the archaeological site of Dmanisi (South Caucasus, Georgia)[10]. Molecular phylogenetic analyses place this Stephanorhinus as a sister group to the clade formed by the woolly rhinoceros (Coelodonta antiquitatis) and Merck's rhinoceros (Stephanorhinus kirchbergensis). We show that Coelodonta evolved from an early Stephanorhinus lineage, and that this latter genus includes at least two distinct evolutionary lines. The genus Stephanorhinus is therefore currently paraphyletic, and its systematic revision is needed. We demonstrate that sequencing the proteome of Early Pleistocene dental enamel overcomes the limitations of phylogenetic inference based on ancient collagen or DNA. Our approach also provides additional information about the sex and taxonomic assignment of other specimens from Dmanisi.
Our findings reveal that proteomic investigation of ancient dental enamel, which is the hardest tissue in vertebrates[11] and is highly abundant in the fossil record, can push the reconstruction of molecular evolution further back into the Early Pleistocene epoch, beyond the currently known limits of ancient DNA preservation.

Year:  2019        PMID: 31511700      PMCID: PMC6894936          DOI: 10.1038/s41586-019-1555-y

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   49.962


Phylogenetic placement of extinct species increasingly relies on aDNA sequencing. Efforts to improve the molecular tools underlying aDNA recovery have enabled the reconstruction of ~0.4 Ma and ~0.7 Ma old DNA sequences from temperate deposits[3] and subpolar regions[12], respectively. However, no aDNA data have so far been generated from species that became extinct beyond this time range. In contrast, ancient proteins represent a more durable source of genetic information, reported to survive, in eggshell, up to 3.8 Ma[13]. Ancient protein sequences can carry taxonomic and phylogenetic information useful to trace the evolutionary relationships between extant and extinct species[14,15]. However, so far, the recovery of ancient mammal proteins from sites too old or too warm to be compatible with aDNA preservation is mostly limited to collagen type I (COL1). Being highly conserved[16], this protein is not an ideal phylogenetic marker. For example, regardless of endogeneity[17], collagen-based phylogenetic placement of Dinosauria in relation to extant Aves appears to be unstable[6]. This suggests that the exclusive use of COL1 in deep-time molecular phylogenetics is constraining. Here, we sought to overcome these limitations by testing whether dental enamel preserves a richer set of ancient proteins that survive longer than COL1. Dated to ~1.77 Ma by a combination of 40Ar/39Ar dating, paleomagnetism and biozonation[18,19], the archaeological site of Dmanisi (Georgia, South Caucasus; Fig. 1a) represents a context currently considered outside the scope of aDNA recovery. This site has been excavated since 1983, resulting in the discovery, along with stone tools and contemporaneous fauna (Table S1), of almost one hundred hominin fossils, including five skulls representing the georgicus paleodeme within Homo erectus[10]. These are the earliest fossils of the genus Homo outside Africa.
Figure 1

Dmanisi location, stratigraphy, and Stephanorhinus specimen GNM Dm.5/157-16635.

a, Geographic location of Dmanisi in the South Caucasus. The base map was generated using public domain data from www.naturalearthdata.com. b, Generalised stratigraphic profile indicating origin and age of the analysed specimens. c, Isolated left lower molar (m1 or m2) of Stephanorhinus ex gr. etruscus-hundsheimensis, from Dmanisi (labial view). Scale bar: 1 cm.

The geology of the Dmanisi deposits favours the preservation of faunal materials (Supplementary Information: Extended Methods and Results), as the primary aeolian deposits provide rapid burial in fine-grained, calcareous sediments. We studied 12 bone and 14 enamel+dentine samples from 23 specimens of large mammals from multiple excavation units within stratum B1 (Fig. 1b, Extended Data Fig. 1, Extended Data Table 1, Table S3). This is an ashfall deposit that contains faunal remains in different geomorphic contexts. All of these are firmly dated between 1.85-1.76 Ma[19]. High-resolution tandem MS was used to confidently sequence ancient protein residues from the set of faunal remains, after digestion-based (protocols A and B), or digestion-free (protocol C), sample preparation (Methods and Supplementary Information). Ancient DNA analysis was unsuccessfully attempted on a subset of five bone and dentine specimens (Methods).
Extended Data Figure 1

Generalized stratigraphic profiles for Dmanisi, indicating specimen origins.

a, Type section of the Dmanisi M5 Excavation block. b, Stratigraphic profile of excavation area M6. M6 preserves a larger gully associated with the pipe-gully phase of stratigraphic-geomorphic development in Stratum B1. The thickness of Stratum B1 gully fill extends to the basalt surface, but includes “rip-ups” of Strata A1 and A2, showing that B1 deposits post-date Stratum A. c, Stratigraphic section of excavation area M17. Here, Stratum B1 was deposited after erosion of Stratum A deposits. The stratigraphic position of the Stephanorhinus sample Dm.5/157-16635 is highlighted with a red diamond. The Masavara basalt is ca. 50 cm below the base of the shown profile. d, Northern section of Block 2. Following collapse of a pipe and erosion to the basalt, the deeper part of this area was filled with local gully fill of Stratum B1/x/y/z. Note the uniform burial of all Stratum B1 deposits by Strata B2-B4. Sampled specimens are indicated by CGG five-digit numbers. See Extended Data Table 1 for both CGG and GNM specimen numbers.

Extended Data Table 1

Genome and proteome survival in 23 Dmanisi fossil fauna specimens.

For each specimen, the Centre for GeoGenetics (CGG) reference number and the Georgian National Museum (GNM) specimen field number are reported. *or the narrowest possible taxonomic identification achievable using comparative anatomy methods. †Only collagens survive. B = Bone, D = Dentine, E = Enamel. Extractions of enamel might include some residual dentine. Accordingly, both tissues are either listed separately (○D, ●E, in case of no collagen preservation), or together (●E+D, in case of collagen preservation). Open circles (○) indicate no molecular preservation; (●) closed circles indicate molecular preservation.

CGG ref. numb.; GNM specimen number; Morphological identification*; Anatomy; Ancient DNA and protein extraction (Methods A, B, C); Phylogenetic analysis
16486; Dm.bXI.sqA6.V._.; Canis etruscus; P4 sin.; ○E+D;
16626; Dm.6/154.2/4.A4.17; Artiodactyla; tibia sin.; ○B;
16628; Dm.7/154.2.A2.27; Cervidae; mc III&IV dex.; ●B†;
16629; Dm.5/154.3.A4.32; Cervidae; hemimandible sin. with dp2, dp3, dp4, m1; ○B / ●E+D;
16630; Dm.6/151.4.A4.12; Pseudodama nestii; hemimandible dex. with p2-m3; ○B / ○D, ●E;
16631; Dm.69/64.3.B1.53; Cervidae; maxilla sin. with P3; ○B / ○D, ●E;
16632; Dm.5/154.2.A4.38; Equus stenonis; i3 dex.; ●E+D; Fig. S10
16633; Dm.5/153.3.A2.33; Equus stenonis; mc III & mc II sin.; ○B;
16634; Dm.7/151.2.B1/A4.1; Equus stenonis; m/1 or m/2 dex.; ○D, ●E;
16635; Dm.5/157.profile cleaning; Stephanorhinus sp.; m/1 sin.; ○D, ●E; Fig. 4, Fig. S11
16636; Dm.6/153.1.A4.13; Rhinocerotidae; tibia dex.; ○B;
16637; Dm.7/154.2.A4.8; Bovidae; mt III&IV sin.; ●B†;
16638; Dm.5/154.1.B1.1; Bovidae; hemimandible dex. with p3-m3; ○B / ○D, ●E; Fig. S12
16639; Dm.8/154.4.A4.22; Bovidae; maxilla dex. with P2-M2; ○D, ●E; Fig. S13
16640; Dm.6/151.2.A4.97; Bison georgicus; mt III&IV sin.; ○B;
16641; Dm.8/152.3.B1.2; Bison georgicus; m3 dex.; ○D, ●E; Fig. S14
16642; Dm.8/153.4.A4.5; Canis etruscus; hemimandible sin. with p1-m2; ○D, ●E;
16856; Dm.M6/7.II.296; Cervidae; m2 sin.; ●D† / ○D, ●E / ●E+D;
16857; Dm.bXI.profile cleaning; Indet.; long bone fragment of a herbivore; ●B† / ○B / ○B; Fig. S15, Extended Data Fig. 6
16858; Dm.bXI.North.B1a.collection; Cervidae; metapodium fragment; ○B / ○B / ○B;
16859; D4.collection; Indet.; fragments of pelvis and ribs of a large mammal; ○B / ○B / ○B;
16860; Dm.65/62.1.A1.; Cervidae; P4 sin.; ○D, ●E / ○D, ●E;
16861; Dm.64/63.1.B1z.collection; Equus stenonis; fragment of an upper tooth; ○D, ●E / ○D, ●E;
Neg. contr. (blank); ; ; ; NC / NC / NC;

We recovered endogenous proteins from 15 out of 23 studied specimens. Digestion-based peptide extraction from bone, dentine and enamel specimens led to the sporadic recovery (6/19) of a limited number of collagen fragments. In contrast, digestion-free peptide extraction of enamel+dentine and bone specimens resulted in high rates of enamel proteome recovery (13/14 specimens, Extended Data Table 1). The small proteome[20,21] of mature dental enamel consists of structural enamel proteins, i.e. amelogenin (AMELX), enamelin (ENAM), amelotin (AMTN), and ameloblastin (AMBN), and enamel-specific proteases secreted during amelogenesis, i.e. matrix metalloproteinase-20 (MMP20) and kallikrein 4 (KLK4). The presence of non-specific proteins, such as serum albumin (ALB), has also been previously reported in mature dental enamel[20] (Extended Data Table 2). The depth of coverage for these proteins varied considerably across their sequence, with some positions covered by over 1000 peptide spectrum matches (Extended Data Fig. 2). The high depth of coverage also allowed us to identify multiple isoforms of AMELX (Extended Data Fig. 3).
Extended Data Table 2

Proteome composition and coverage.

Aggregated data from different extraction methods and/or tissues from the same specimen. In those cells reporting two values separated by the “|” symbol, the first value refers to MaxQuant (MQ) searches performed selecting unspecific digestion, while the second value refers to MQ searches performed selecting trypsin digestion. For those cells including one value only, it refers to MQ searches performed selecting unspecific digestion. Final amino acid coverage, incorporating both MQ and PEAKS searches, is reported in the last column. *supporting all peptides. See Extended Data Table 1 for tissue sources per specimen and both CGG and GNM specimen numbers.

Specimen; Protein name; Sequence length; Razor and unique peptides; Matched spectra*; Coverage after MaxQuant searches (%); Final coverage after MaxQuant and PEAKS searches (%); Final coverage (aa)
16628; Collagen alpha-1(I); 1158; 5; 8; 3.2; 3.2; 37

16629; Amelogenin X; 209; 79; 190; 36.8; 36.8; 77
16629; Ameloblastin; 440; 51; 84; 25.0; 25.0; 110
16629; Enamelin; 1129; 58; 133; 6.2; 6.5; 73
16629; Collagen alpha-1(I); 1453; 3; 3; 2.0; 2.0; 29
16629; Collagen alpha-1(III); 1464; 2; 3; 1.4; 1.4; 20
16629; Amelotin; 212; 2; 2; 4.7; 4.7; 10

16630; Enamelin; 1129; 180 | 3; 530 | 5; 11.8 | 2.7; 15.4; 174
16630; Ameloblastin; 440; 105; 231; 30.9; 31.4; 138
16630; Amelogenin X; 213; 116; 529; 62.0; 62.9; 134
16630; Amelogenin Y; 192; 4; 9; 13.0; 22.9; 44
16630; Amelotin; 212; 5; 6; 8.0; 8.0; 17

16631; Enamelin; 916; 175; 751; 11.0; 11.7; 107
16631; Amelogenin X; 213; 156; 598; 48.8; 61.5; 131
16631; Amelogenin Y; 90; 5; 18; 15.6; 25.6; 23
16631; Ameloblastin; 440; 71; 133; 24.1; 25.2; 111
16631; MMP20; 482; 2; 2; 3.9; 3.9; 19

16632; Enamelin; 1144; 401; 2160; 17.9; 19.1; 219
16632; Amelogenin X; 192; 280; 960; 84.4; 84.4; 162
16632; MMP20; 424; 49; 67; 33.3; 33.3; 141
16632; Serum albumin; 607; 11; 18; 6.1; 6.1; 37
16632; Collagen alpha-1(I); 1513; 4; 4; 2.6; 2.6; 40

16634; Amelogenin X; 185; 68; 157; 53.5; 53.5; 99
16634; Ameloblastin; 440; 47; 58; 23.4; 23.4; 103
16634; Enamelin; 920; 33; 87; 4.5; 4.5; 41
16634; MMP20; 483; 4; 4; 5.6; 5.6; 27

16635; Amelogenin X; 206; 394 | 3; 2793 | 5; 73.8 | 7.8; 85.9; 177
16635; Enamelin; 1150; 382 | 2; 2966 | 2; 18.3 | 1.6; 25.1; 289
16635; Ameloblastin; 442; 131; 463; 31.3; 39.3; 166
16635; Amelotin; 267; 26; 148; 9.9; 9.9; 20
16635; Serum albumin; 607; 34; 64; 18.5; 24.5; 149
16635; MMP20; 483; 15; 25; 11.8; 15.3; 74

16637; Collagen alpha-1(I); 1453; 2; 2; 1.7; 1.7; 25
16637; Collagen alpha-1(II); 1421; 2; 2; 1.9; 1.9; 27
16637; Collagen alpha-1(III); 1464; 2; 2; 1.6; 1.6; 23

16638; Enamelin; 1129; 235 | 7; 1155 | 13; 11.8 | 4.7; 12.9; 146
16638; Amelogenin X; 192; 185 | 3; 734 | 5; 52.0 | 10.9; 60.4; 116
16638; Ameloblastin; 440; 64 | 2; 120 | 4; 30.0 | 5.7; 36.4; 160
16638; MMP20; 481; 6; 7; 8.1; 9.1; 44

16639; Enamelin; 1129; 202; 726; 12.0; 12.6; 142
16639; Amelogenin X; 213; 167; 624; 59.2; 67.6; 144
16639; Ameloblastin; 440; 88; 155; 26.8; 30.5; 134
16639; Amelogenin Y; 192; 13; 13; 18.8; 18.8; 36

16641; Amelogenin X; 213; 91; 251; 64.3; 65.3; 139
16641; Ameloblastin; 440; 69; 122; 28.9; 28.9; 127
16641; Enamelin; 1129; 24; 75; 7.8; 7.8; 88
16641; Amelotin; 212; 3; 3; 7.1; 7.1; 15

16642; Amelogenin X; 185; 89; 245; 42.7; 42.7; 79
16642; Enamelin; 733; 14; 19; 2.5; 2.5; 18
16642; Ameloblastin; 421; 3; 3; 7.1; 7.1; 30
16642; MMP20; 483; 2; 2; 3.5; 3.5; 17

16856; Amelogenin X; 209; 66 | 4; 365 | 25; 38.8; 45.5; 95
16856; Enamelin; 916; 58 | 13; 153 | 70; 8.2; 10.2; 93
16856; Ameloblastin; 440; 21; 31; 14.8; 14.8; 65
16856; Collagen alpha-1(I); 1047; 8 | 10; 9 | 11; 14.5; 16.9; 177
16856; Collagen alpha-2(I); 1054; 4 | 8; 5 | 9; 10.6; 10.6; 112
16856; Serum albumin; 583; 0 | 8; 0 | 12; 16.6; 16.6; 97
16856; Amelogenin Y; 90; 3; 7; 10.0; 10.0; 9

16857; Collagen alpha-1(I); 1047; 18 | 14; 24 | 18; 21.7; 23.4; 245
16857; Collagen alpha-2(I); 1274; 16 | 11; 17 | 11; 17.7; 24.3; 310

16860; Amelogenin X; 192; 46; 98; 30.7; 32.3; 62
16860; Ameloblastin; 440; 19; 37; 9.1; 9.1; 40
16860; Enamelin; 900; 15; 25; 3.8; 3.8; 34

16861; Amelogenin X; 185; 14; 15; 36.8; 38.9; 72
16861; Ameloblastin; 343; 2; 2; 4.4; 4.4; 15
16861; Enamelin; 915; 2; 2; 1.2; 1.2; 11

Neg. Contr. Gr. 1: 235, 275, 706; ND
Neg. Contr. Gr. 2: 630, 875, 889; ND
Neg. Contr. Gr. 3: 1214, 1218; Amelogenin X; 122; 5; 7; 18.0; 18.0; 22

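The last two columns of Extended Data Table 2 are linked by simple arithmetic: the number of covered residues should be close to the final coverage percentage multiplied by the sequence length. The following Python sketch illustrates that consistency check on a few rows copied from the table. The function names and the one-residue tolerance are illustrative assumptions, not part of the published analysis.

```python
# Rows copied from Extended Data Table 2:
# (specimen, protein, sequence length, final coverage %, final coverage in aa)
ROWS = [
    ("16628", "Collagen alpha-1(I)", 1158, 3.2, 37),
    ("16632", "Amelogenin X", 192, 84.4, 162),
    ("16635", "Amelogenin X", 206, 85.9, 177),
    ("16635", "Enamelin", 1150, 25.1, 289),
    ("16635", "Serum albumin", 607, 24.5, 149),
    ("16635", "MMP20", 483, 15.3, 74),
]


def covered_residues(length, coverage_pct):
    """Number of residues implied by a coverage percentage."""
    return round(length * coverage_pct / 100)


def inconsistent_rows(rows, tolerance=1):
    """Return rows whose reported aa count differs from the implied count.

    The tolerance (an assumption) absorbs rounding of the reported percentage.
    """
    flagged = []
    for specimen, protein, length, pct, reported_aa in rows:
        implied = covered_residues(length, pct)
        if abs(implied - reported_aa) > tolerance:
            flagged.append((specimen, protein, implied, reported_aa))
    return flagged


if __name__ == "__main__":
    for specimen, protein, length, pct, aa in ROWS:
        print(specimen, protein, covered_residues(length, pct), aa)
    print("inconsistent:", inconsistent_rows(ROWS))
```

A check of this kind is mainly useful for spotting transcription errors when the table is re-keyed from a flattened source.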
Extended Data Fig. 2

Proteomic sequence coverage for specimen Dm.5/157-16635 (Stephanorhinus).

a, c, e, g, i, j, PSM sequence coverage of proteins AMBN, ENAM, AMELX, AMTN, MMP20 and ALB, respectively. Annotations include: “amino acid position, amino acid called in that position (number of PSMs/peptides covering that position)” for the phylogenetically informative SAPs within Rhinocerotidae. b, d, f, h, Frequency (%) of phosphorylated (green) and non-phosphorylated (red) PSMs per amino acid position for AMBN, ENAM, AMELX and AMTN, respectively. Numbers within the bars provide the PSM counts. k, Violin plot of PSM coverage distribution for all covered sites (n=693) and those of phylogenetic relevance (SAPs, n=30). The boxplots define the range of the data, with whiskers extending to 1.5 times the interquartile range, 25th and 75th percentiles (boxes), and medians (dots). All panels are based on MQ results only. Supplementary File “Key MS-MS Spectra” contains spectral examples and fragment ion series alignments for each of the marked SAPs.

Extended Data Figure 3

Peptide and ion fragment coverage of amelogenin X (AMELX) isoforms 1 and 2 from specimen Dm.M6/7.II.296-16856 (Cervidae).

Peptides specific to amelogenin X (AMELX) isoforms 1 and 2 appear in the upper and lower parts of the figure, respectively. No amelogenin X isoform 2 is currently reported in public databases for the Cervidae group. Accordingly, the amelogenin X isoform 2-specific peptides were identified by MaxQuant spectral matching against bovine (Bos taurus) amelogenin X isoform 2 (UniProt accession number P02817-2). Amelogenin X isoform 2, also known as leucine-rich amelogenin peptide (LRAP), is a naturally occurring amelogenin X isoform translated from an alternatively spliced transcript.

Multiple lines of evidence support the authenticity and the endogenous origin of the sequences recovered. Dental enamel proteins are extremely tissue-specific and confined to the dental enamel mineral matrix[20]. The amino acid composition of the intra-crystalline protein fraction, measured by amino acid racemisation analysis, indicates that the dental enamel behaves as a closed system, unaffected by exchange of amino acid and protein residues with the burial environment (Extended Data Fig. 4). The measured rate of asparagine and glutamine deamidation, a spontaneous form of hydrolytic damage consistently observed in ancient samples[22], is particularly advanced. Deamidation in Dmanisi enamel is higher than in the control enamel sample, supporting the antiquity of the peptides recovered (Fig. 2a, Supplementary Information). Other forms of non-enzymatic modifications are also present. Tyrosine (Y) experienced mono- and di-oxidation while tryptophan (W) was extensively converted into multiple oxidation products (Fig. 2b, Supplementary Information). Oxidative degradation of histidine (H) and conversion of arginine (R) leading to ornithine accumulation were also observed (Supplementary Information). These modifications are absent, or much less frequent, in the control sample. Similarly, unlike in the control, the peptide length distribution in the Dmanisi dataset is dominated by shorter fragments, generated by advanced, diagenetically-induced, terminal hydrolysis[23] (Fig. 2c, d). Together, these lines of evidence define the substantial biomolecular damage affecting the proteomes retrieved and independently support the authenticity of the sequences reconstructed.
To demonstrate beyond reasonable doubt the correct peptide sequence assignments of our MS2 spectra, we performed manual validation of peptide-spectrum-matches, conducted fragment ion intensity predictions, and generated synthetic peptides, for a range of phylogenetically informative and phosphorylated peptides (Methods and Supplementary Information: Key MS2 Spectra).
Extended Data Figure 4

Amino Acid Racemisation.

Extent of intra-crystalline racemisation in enamel for the free amino acid (FAA, x-axis) fraction and the total hydrolysable amino acids (THAA, y-axis) fraction for four amino acids (Asx, Glx, Ala and Phe). Note differences in axis scale. Intra-crystalline data from Proboscidea enamel from a range of UK sites[64] are shown for comparison (black crosses). Taxa from both Dmanisi and the UK exhibit a similar relationship between FAA and THAA racemisation, and R2 values have been calculated based on a polynomial relationship (order = 2, all >0.93).

Figure 2

Enamel proteome degradation.

a, Deamidation of asparagine (N) and glutamine (Q). Violin plots based on 1000 bootstrap replicates. The boxplots define the range of the data, with whiskers extending to 1.5 times the interquartile range, 25th and 75th percentiles (boxes), and medians (dots). Tissue source (B = Bone, D = Dentine, E = Enamel) and the number of peptides used for the calculation are shown at the bottom. b, Extent of tryptophan (W) oxidation leading to several diagenetic products, measured as relative spectral counts. c, Alignment of peptides (positions 124-137, Enamelin) retrieved by digestion-free acid demineralisation from Pleistocene Stephanorhinus ex gr. etruscus-hundsheimensis specimen (GNM Dm.5/157-16635). d, Barplot of peptide length distribution of specimen Dm.5/157-16635 and Medieval (CTRL) undigested ovicaprine dental enamel proteomes.

We confidently detect phosphorylation (Fig. 3, Extended Data Figs. 2, 5), a stable physiological post-translational modification (PTM) that is tightly regulated in vivo and was previously detected in dental enamel proteins[24,25]. Most of the phosphorylated sites we identified belong to the S-x-E/phS motif, recognised by the secreted kinases of the Fam20C family, which are involved in phosphorylation of extracellular proteins and regulation of biomineralization[26]. Spectra supporting the identification of serine phosphorylation were validated manually and by comparison with MS2 obtained from synthetic peptides (Supplementary Information), confirming the automated MaxQuant identifications. Phosphorylated serine and threonine residues may be subjected to spontaneous dephosphorylation. However, by complexing with the Ca2+ ions in the enamel hydroxyapatite matrix, the peptide-bound phosphate groups can remain stable over millennia, as recently observed in ancient bone[27]. Previous studies demonstrated that, when complexed with mineral matrix, ~3.8 Ma protein residues can be retrieved from sub-tropical environments[13]. Limited availability of free water in the enamel matrix further reduces spontaneous dephosphorylation via beta-elimination. Altogether, these observations demonstrate that the heavily modified dental enamel proteome retrieved from the ~1.77 Ma old Dmanisi faunal material is endogenous and almost complete.
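To make the S-x-E/phS motif concrete, the following Python sketch scans a protein sequence for serines that are followed, two positions downstream, by glutamate or another serine. It is a minimal illustration only: the function name and the example sequence are hypothetical, and a plain sequence cannot show whether the downstream serine is itself phosphorylated, so the matches are candidate sites rather than confirmed ones.

```python
import re

# Serine at position 0, any residue at +1, E or S at +2.
# The lookahead keeps overlapping candidates (e.g. S-x-S-x-E).
MOTIF = re.compile(r"S(?=.[ES])")


def candidate_sites(sequence):
    """Return 0-based positions of serines in an S-x-E or S-x-S context."""
    return [match.start() for match in MOTIF.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "AASLEKSPSAA"  # hypothetical sequence, for illustration only
    print(candidate_sites(toy_sequence))
```

In practice the candidate positions would be compared against the phosphorylated residues localised in the MS2 spectra.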
Figure 3

Sequence motif analysis of ancient enamel proteome phosphorylation.

Indicated is the overrepresentation of specific amino acids within six positions N- and C-terminal of the phosphorylated amino acids (position 0). See Extended Data Figure 5 for MS2 examples of both S-x-E and S-x-phS phosphorylated motifs.

Extended Data Figure 5

Ancient enamel proteome phosphorylation.

Annotated spectra including phosphorylated serine (phS). a, Phosphorylation in the S-x-E motif (AMEL). b, Phosphorylation in the S-x-phS motif (AMBN). Phosphorylation was independently observed in all three separate analyses of Dm.5/157-16635, including multiple spectra and peptides (see Extended Data Fig. 2).

Next, we used the palaeoproteomic sequence information to improve taxonomic assignment and achieve sex attribution for some of the Dmanisi faunal remains. Phylogenetic analysis of the five largest enamel+dentine proteomes, and of a moderately large bone proteome, allowed us to confirm or improve the morphological identification of their specimens of origin (Extended Data Fig. 6; Figs. S10-15). In addition, confident identification of peptides specific for the isoform Y of amelogenin, coded on the non-recombinant portion of the Y chromosome, indicates that four tooth specimens, namely Dm.6/151.4.A4.12-16630 (Pseudodama), Dm.69/64.3.B1.53-16631 (Cervidae), Dm.8/154.4.A4.22-16639 (Bovidae), and Dm.M6/7.II.296-16856 (Cervidae), belonged to male individuals[21] (Extended Data Fig. 7a-d).
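The logic behind sex attribution can be sketched as follows: a peptide supports a male assignment only if it matches amelogenin Y and does not also match amelogenin X. The Python example below is a simplified illustration under stated assumptions. The two reference sequences are invented toy strings, not real AMELX or AMELY sequences, the function names are hypothetical, and isoleucine is collapsed to leucine because the two residues have identical mass.

```python
# Toy reference sequences (invented for illustration; NOT real amelogenins).
AMELX_TOY = "MPLPPHPGHPGYINFSYEVLTPLKWYQSM"
AMELY_TOY = "MPLPPHPGSPGYINLSYEVLTPLKWYQSM"


def normalise(sequence):
    """Collapse I to L, since the two residues are isobaric."""
    return sequence.replace("I", "L")


def classify(peptide, amelx=AMELX_TOY, amely=AMELY_TOY):
    """Label a peptide by the isoform(s) whose sequence contains it."""
    pep = normalise(peptide)
    in_x = pep in normalise(amelx)
    in_y = pep in normalise(amely)
    if in_y and not in_x:
        return "AMELY-specific"
    if in_x and not in_y:
        return "AMELX-specific"
    return "shared" if in_x else "unmatched"


def sex_call(peptides, min_y_peptides=1):
    """Call 'male' when enough AMELY-specific peptides are observed.

    Absence of AMELY peptides is not treated as evidence for a female,
    because it may simply reflect poor preservation.
    """
    n_y = sum(classify(p) == "AMELY-specific" for p in peptides)
    return "male" if n_y >= min_y_peptides else "undetermined"


if __name__ == "__main__":
    observed = ["HPGSPGYINL", "SYEVLTPLKWYQ", "HPGHPGYINF"]
    print([classify(p) for p in observed], sex_call(observed))
```

The threshold `min_y_peptides` is an assumption; a real analysis would also require validated spectra for each isoform-specific peptide.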
Extended Data Figure 6

Phylogenetic relationships between the comparative reference dataset and specimen Dm.bXI-16857.

Consensus tree from Bayesian inference. The posterior probability of each bipartition is shown as a percentage to the left of each node.

Extended Data Figure 7

Amelogenin Y-specific matches.

a) Specimen Dm.6/151.4.A4.12-16630 (Pseudodama). b) Specimen Dm.69/64.3.B1.53-16631 (Cervidae). c) Specimen Dm.8/154.4.A4.22-16639 (Bovidae). d) Specimen Dm.M6/7.II.296-16856 (Cervidae). Note the presence of deamidated glutamine (deQ) and asparagine (deN), oxidised methionine (oxM), and phosphorylated serine (phS).

An enamel+dentine fragment, from the lower molar of a Stephanorhinus ex gr. etruscus-hundsheimensis (Dm.5/157-16635; Fig. 1c, Supplementary Information), returned the highest proteomic sequence coverage, encompassing a total of 875 amino acids, across 987 peptides (6 proteins; Extended Data Fig. 2; Supplementary Information). Following alignment of the enamel protein sequences retrieved from Dm.5/157-16635 against their homologues from all the extant rhinoceros species, plus the extinct woolly rhinoceros (†Coelodonta antiquitatis) and Merck’s rhinoceros (†Stephanorhinus kirchbergensis), phylogenetic reconstructions place the Dmanisi specimen closer to the extinct woolly and Merck’s rhinoceroses than to the extant Sumatran rhinoceros (Dicerorhinus sumatrensis), as an early divergent sister lineage (Fig. 4; Extended Data Fig. 8).
Figure 4

Phylogenetic relationships between the comparative enamel proteome dataset and specimen Dm.5/157-16635 (Stephanorhinus ex gr. etruscus-hundsheimensis).

Consensus tree from Bayesian inference on the concatenated alignment of six enamel proteins, using Homo sapiens as an outgroup. For each bipartition, we show the posterior probability obtained from the Bayesian inference. Additionally, for bipartitions where the Bayesian and the Maximum-likelihood inference support are different, we show (right) the support obtained in the latter. Scale indicates estimated branch lengths.

Extended Data Figure 8

Effect of missingness on the tree topology.

a, Maximum-likelihood phylogeny obtained using PhyML and the protein alignment excluding the ancient Dmanisi rhinoceros Dm.5/157-16635. b, Topologies obtained from 100 random replicates of the woolly rhinoceros (Coelodonta antiquitatis). In each replicate the number of missing sites was similar to that observed in the Dm.5/157-16635 specimen (72.4% missingness). The percentage shown for each topology indicates the number of replicates in which that particular topology was recovered. c, Similar to b, but for the Javan rhinoceros (Rhinoceros sondaicus). d, Similar to b, but for the black rhinoceros (Diceros bicornis).
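The replicate procedure described in this legend can be illustrated with a short Python sketch that masks a random 72.4% of the sites of an aligned reference sequence, 100 times. This is an assumption-laden illustration, not the authors' script: the function names, the "?" missing-data symbol, the fixed seed and the toy sequence are all hypothetical, and sites already missing in the reference are not treated specially.

```python
import random

MISSING = "?"  # assumed missing-data symbol


def mask_to_missingness(sequence, fraction, rng):
    """Replace a random subset of sites so that `fraction` of them are missing."""
    n_missing = round(len(sequence) * fraction)
    positions = set(rng.sample(range(len(sequence)), n_missing))
    return "".join(
        MISSING if index in positions else residue
        for index, residue in enumerate(sequence)
    )


def make_replicates(sequence, fraction=0.724, n_replicates=100, seed=0):
    """Generate independently masked copies of one aligned sequence."""
    rng = random.Random(seed)
    return [
        mask_to_missingness(sequence, fraction, rng)
        for _ in range(n_replicates)
    ]


if __name__ == "__main__":
    toy_alignment_row = "ACDEFGHIKLMNPQRSTVWY" * 50  # hypothetical 1000-site row
    replicates = make_replicates(toy_alignment_row)
    print(len(replicates), replicates[0].count(MISSING))
```

Each masked replicate would then replace the original sequence in the alignment before the tree is re-estimated, so that the recovered topologies can be tallied.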

Our phylogenetic reconstruction confidently recovers the expected differentiation of the Rhinoceros genus from other genera considered, in agreement with previous cladistic[28] and genetic analyses[29] (Supplementary Information). This topology defines two-horned rhinoceroses as monophyletic and the one-horned condition as plesiomorphic, as previously proposed (Supplementary Information). We caution, however, that the higher-level relationships we observe between the rhinoceros monophyletic clades might be affected by demographic events, such as incomplete lineage sorting[30] and/or gene flow between groups[31], due to the limited number of markers considered. A confident and stable reconstruction of the structure of the Rhinocerotidae family needs the strong support only high-resolution whole-genome sequencing can provide. Regardless, the highly supported placement of the Dmanisi rhinoceros in the (Stephanorhinus, Woolly, Sumatran) clade will remain unaffected, should deeper phylogenetic relationships between the Rhinoceros genus and other family members be revised (Extended Data Fig. 8). The phylogenetic relationships of the genus Stephanorhinus within the family Rhinocerotidae, as well as those of the several species recognized within this genus, are contentious. Stephanorhinus was initially included in the extant South-East Asian genus Dicerorhinus represented by the Sumatran rhinoceros species (D. sumatrensis)[32]. This hypothesis has been rejected and, based on morphological data, Stephanorhinus has been identified as a sister taxon of the woolly rhinoceros[33]. Furthermore, ancient DNA analysis supports a sister relationship between the woolly rhinoceros and D. sumatrensis [7,34,35]. As the Stephanorhinus ex gr. etruscus-hundsheimensis sequences from Dmanisi branch off basal to the common ancestor of the woolly and Merck’s rhinoceroses, these two species most likely derived from an early Stephanorhinus lineage expanding eastward from western Eurasia. 
Throughout the Plio-Pleistocene, Coelodonta adapted to continental and later to cold-climate habitats in central Asia. Its earliest representative, C. thibetana, displayed some clear Stephanorhinus-like anatomical features[33]. The presence in eastern Europe and Anatolia of the genus Stephanorhinus [35] is documented at least since the late Miocene, and the Dmanisi specimen most likely represents an Early Pleistocene descendent of the Western-Eurasian branch of this genus. Ultimately, our phylogenetic reconstructions show that, as currently defined, the genus Stephanorhinus is paraphyletic, in line with previous morphological and palaeobiogeographical evidence (Supplementary Information). Accordingly, a systematic revision of the genera Stephanorhinus and Coelodonta, as well as their closest relatives, is needed. In this study, we show that enamel proteome sequencing can overcome the time limits of ancient DNA preservation and the reduced phylogenetic content of COL1 sequences. Given the abundance of teeth in the palaeontological record, the approach presented here holds the potential to address a wide range of questions pertaining to the Early and Middle Pleistocene evolutionary history of a large number of mammals, including hominins, at least in temperate climates.

Methods

Dmanisi & sample selection

Dmanisi is located about 65 km southwest of the capital city of Tbilisi in the Kvemo Kartli region of Georgia, at an elevation of 910 meters above sea level (Lat: 41° 20’ N, Lon: 44° 20’ E)[10,18]. The 23 fossil specimens we analysed were retrieved from stratum B1, in excavation blocks M17, M6, block 2, and area R11 (Extended Data Table 1, Extended Data Fig. 1). Stratum B deposits date between 1.78 Ma and 1.76 Ma[19]. All the analysed specimens were collected between 1984 and 2014 and their taxonomic identification was based on traditional comparative anatomy. After the sample preparation and data acquisition for all the Dmanisi specimens was concluded, we applied the whole experimental procedure to a medieval ovicaprine (sheep/goat) dental enamel+dentine specimen that was used as control. For this sample, we used extraction protocol “C”, and generated tandem MS data using a Q Exactive HF mass spectrometer (Thermo Fisher Scientific). The data were searched against the goat proteome, downloaded from the NCBI Reference Sequence Database (RefSeq) archive on 31st May 2017 (Supplementary Information). The ovicaprine specimen was found at the “Hotel Skandinavia” site in the city of Århus, Denmark and stored at the Natural History Museum of Denmark, Copenhagen.

Biomolecular preservation

We assessed the potential of ancient protein preservation prior to proteomic analysis by measuring the extent of amino acid racemisation in a subset of samples (6/23)[36]. Enamel chips, with all dentine removed, were powdered, and two subsamples per specimen were subject to analysis of their free (FAA) and total hydrolysable (THAA) amino acid fractions. Samples were analysed in duplicate by RP-HPLC, with standards and blanks run alongside each one of them (Supplementary Information). The D/L values of aspartic acid/asparagine, glutamic acid/glutamine, phenylalanine and alanine (D/L Asx, Glx, Phe, Ala) were assessed (Extended Data Fig. 4) to provide an overall estimate of intra-crystalline protein decomposition (IcPD).

Proteomics

All the sample preparation procedures for palaeoproteomic analysis were conducted in laboratories dedicated to the analysis of ancient DNA and ancient proteins in clean rooms fitted with filtered ventilation and positive pressure, in line with recent recommendations for ancient protein analysis[37]. A mock “extraction blank”, containing no starting material, was prepared, processed and analysed together with each batch of ancient samples.

Sample preparation

The external surface of bone samples was gently removed, and the remaining material was subsequently powdered. Enamel fragments, occasionally mixed with small amounts of dentine, were removed from teeth with a cutting disc and subsequently crushed into a rough powder. Ancient protein residues were extracted from approximately 180-220 mg of mineralised material, unless otherwise specified, using three different extraction protocols, hereafter referred to as “A”, “B” and “C” (Supplementary Information):

Extraction Protocol A - FASP

Tryptic peptides were generated using a filter-aided sample preparation (FASP) approach[38], as previously performed on ancient samples[39].

Extraction Protocol B - GuHCl Solution and Digestion

Bone or enamel+dentine powder was demineralised in 1 mL 0.5 M EDTA pH 8.0. After removal of the supernatant, all demineralised pellets were re-suspended in a 300 μL solution containing 2 M guanidine hydrochloride (GuHCl, Thermo Scientific), 100 mM Tris pH 8.0, 20 mM 2-Chloroacetamide (CAA), 10 mM Tris (2-carboxyethyl)phosphine (TCEP) in ultrapure H2O[40,41]. A total of 0.2 μg of mass spectrometry-grade rLysC (Promega P/N V1671) enzyme was added before the samples were incubated for 3-4 hours at 37°C with agitation. Samples and negative controls were subsequently diluted to 0.6 M GuHCl, and 0.8 μg of mass spectrometry-grade Trypsin (Promega P/N V5111) was added. Next, samples and negative controls were incubated overnight under mechanical agitation at 37°C. On the following day, samples were acidified, and the tryptic peptides were purified on C18 Stage-Tips, as previously described[42].
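The dilution from 2 M to 0.6 M GuHCl is a C1V1 = C2V2 calculation. The following is a minimal sketch of that arithmetic, assuming the 300 μL resuspension volume is the volume being diluted and ignoring the small volume of added enzyme. The function name is hypothetical and the diluent itself is not specified in this section.

```python
def final_volume_for_dilution(c_start_molar, v_start_ul, c_target_molar):
    """Total volume (in uL) after diluting from c_start_molar to c_target_molar."""
    if not 0 < c_target_molar <= c_start_molar:
        raise ValueError("target concentration must be positive and not exceed the start")
    # C1 * V1 = C2 * V2  ->  V2 = C1 * V1 / C2
    return c_start_molar * v_start_ul / c_target_molar


# Values from the text: 300 uL at 2 M GuHCl, diluted to 0.6 M.
start_volume_ul = 300.0
total_ul = final_volume_for_dilution(2.0, start_volume_ul, 0.6)
diluent_ul = total_ul - start_volume_ul

print(f"final volume: {total_ul:.0f} uL, diluent to add: {diluent_ul:.0f} uL")
```

Under these assumptions the final volume is about 1 mL, that is, roughly 700 μL of diluent added to the 300 μL sample.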

Extraction Protocol C - Digestion-Free ACID Demineralisation

Dental enamel powder, with possible trace amounts of dentine, was demineralised in 1.2 M HCl at room temperature, after which the solubilised protein residues were directly cleaned and concentrated on Stage-Tips, as described above. The sample prepared on Stage-Tip “#1217” was processed with 10% TFA instead of 1.2 M HCl. All the other parameters and procedures were identical to those used for all the other samples extracted with protocol “C”.

Tandem mass spectrometry

Different sets of samples (Supplementary Information §5.1, 5.2) were analysed by nanoflow liquid chromatography coupled to tandem mass spectrometry (nanoLC-MS/MS) on an EASY-nLC™ 1000 or 1200 system connected to a Q-Exactive, a Q-Exactive Plus, or a Q-Exactive HF (Thermo Scientific, Bremen, Germany) mass spectrometer. Before and after each MS/MS run measuring ancient or extraction blank samples, two successive MS/MS runs were included in the sample queue in order to prevent carryover contamination between the samples. These consisted, first, of an MS/MS run ("MS/MS blank" run) with an injection exclusively of the buffer used to re-suspend the samples (0.1% TFA, 5% ACN), followed by a second MS/MS run ("MS/MS wash" run) with no injection.
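The interleaving of blank and wash runs can be sketched as a simple queue builder. This is an illustration only: it assumes that a single blank/wash pair is shared between two consecutive sample runs, which the text does not state explicitly, and the sample labels are placeholders.

```python
def build_run_queue(sample_runs):
    """Return the acquisition order with a blank + wash pair around every sample run."""
    guard_pair = ["MS/MS blank", "MS/MS wash"]  # buffer-only injection, then no injection
    queue = list(guard_pair)                    # pair before the first sample
    for run in sample_runs:
        queue.append(run)
        queue.extend(guard_pair)                # pair after each sample
    return queue


# Placeholder run names; real queues would list ancient samples and extraction blanks.
queue = build_run_queue(["sample_1", "extraction_blank", "sample_2"])
for position, run in enumerate(queue, start=1):
    print(position, run)
```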

Data analysis

Raw data files generated during MS/MS spectral acquisition were searched using MaxQuant[43], version 1.5.3.30, and PEAKS[44], version 7.5. A two-stage peptide-spectrum matching approach was adopted (Supplementary Information §5.3). Raw files were initially searched against a target/reverse database of collagen and enamel proteins retrieved from the UniProt and NCBI Reference Sequence Database (RefSeq) archives[45,46], taxonomically restricted to mammalian species. A database of partial “COL1A1” and “COL1A2” sequences from cervid species[47] was also included. The results from the preliminary analysis were used for a first, provisional reconstruction of protein sequences (MaxQuant search 1, MQ1). For specimens whose dataset resulted in a narrower, though not fully resolved, initial taxonomic placement, a second MaxQuant search (MQ2) was performed using a new protein database taxonomically restricted to the “order” taxonomic rank as determined after MQ1. For the MQ2 matching of the MS/MS spectra from specimen Dm.5/157-16635, partial sequences of serum albumin and enamel proteins from the Sumatran (Dicerorhinus sumatrensis), Javan (Rhinoceros sondaicus), Indian (Rhinoceros unicornis), woolly (Coelodonta antiquitatis), Merck’s (Stephanorhinus kirchbergensis) and black (Diceros bicornis) rhinoceroses were also added to the protein database. All the protein sequences from these species were reconstructed from draft genomes for each species (Dalén and Gilbert, unpublished data, Supplementary Information). For each MaxQuant and PEAKS search, enzymatic digestion was set to “unspecific” and the following variable modifications were included: oxidation (M), deamidation (NQ), N-term Pyro-Glu (Q), N-term Pyro-Glu (E), hydroxylation (P), phosphorylation (S). The error tolerance was set to 5 ppm for the precursor ions, and to 20 ppm (MaxQuant) or 0.05 Da (PEAKS) for the fragment ions.
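To make the mass tolerances concrete, the sketch below shows how a relative (ppm) error is computed and how an absolute tolerance in Da compares with it at a given m/z. The m/z values are invented for illustration, and the function names are hypothetical rather than part of MaxQuant or PEAKS.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm):
    """True if the observed m/z lies within the ppm tolerance of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tolerance_ppm


def da_to_ppm(tolerance_da, mz):
    """Express an absolute tolerance in Da as ppm at a given m/z."""
    return tolerance_da / mz * 1e6


# Illustrative precursor at m/z 1000 against the 5 ppm precursor tolerance.
print(within_ppm(1000.004, 1000.0, 5))   # about 4 ppm off: accepted
print(within_ppm(1000.006, 1000.0, 5))   # about 6 ppm off: rejected

# A fixed 0.05 Da fragment tolerance is wider in ppm terms at low m/z.
print(round(da_to_ppm(0.05, 1000.0), 1), round(da_to_ppm(0.05, 250.0), 1))
```

The last line illustrates why the two fragment tolerances are not directly interchangeable: 0.05 Da corresponds to about 50 ppm at m/z 1000 but about 200 ppm at m/z 250.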
For searches of data generated from sample fractions partially or exclusively digested with trypsin, another MaxQuant and PEAKS search was conducted using the “enzyme” parameter set to “Trypsin/P”. Carbamidomethylation (C) was set: (i) as a fixed modification, for searches of data generated from sets of sample fractions exclusively digested with trypsin, or (ii) as a variable modification, for searches of data generated from sets of sample fractions partially digested with trypsin. For searches of data generated exclusively from undigested sample fractions, carbamidomethylation (C) was not included as a modification, either fixed or variable. The datasets re-analysed with the MQ2 search were also processed with the PEAKS software using the entire workflow (PEAKS de novo to PEAKS SPIDER) in order to detect hitherto unreported single amino acid polymorphisms (SAPs). Any amino acid substitution detected by the “SPIDER” homology search algorithm was validated by repeating the MaxQuant search (MQ3). In MQ3, the protein database used for MQ2 was modified to include the amino acid substitutions detected by the “SPIDER” algorithm.
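The dependence of the search settings on how a set of sample fractions was digested can be summarised as a small lookup. This is a sketch of the rules as described above, not a MaxQuant or PEAKS configuration file. The digestion labels and dictionary keys are hypothetical, and it assumes the carbamidomethylation setting applies to every search run on a given data set.

```python
def search_settings(digestion):
    """List the searches for one data set.

    digestion is one of 'undigested', 'partial_trypsin' or 'trypsin_only'
    (hypothetical labels for the three cases described in the text).
    """
    carbamidomethylation = {
        "undigested": None,             # not included at all
        "partial_trypsin": "variable",  # some fractions were alkylated and digested
        "trypsin_only": "fixed",        # all fractions were alkylated and digested
    }
    if digestion not in carbamidomethylation:
        raise ValueError(f"unknown digestion type: {digestion}")

    enzymes = ["unspecific"]            # every data set gets an unspecific search
    if digestion != "undigested":
        enzymes.append("Trypsin/P")     # extra search for trypsin-containing sets

    return [
        {"enzyme": enzyme, "carbamidomethylation_C": carbamidomethylation[digestion]}
        for enzyme in enzymes
    ]


for kind in ("undigested", "partial_trypsin", "trypsin_only"):
    print(kind, search_settings(kind))
```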

Ancient protein sequence reconstruction

The peptide sequences confidently identified by the MQ1, MQ2 and MQ3 searches were aligned using the software Geneious[48] (v. 5.4.4, substitution matrix BLOSUM62). The peptide sequences confidently identified by the PEAKS searches were aligned using an in-house R script. A consensus sequence for each protein from each specimen was generated in FASTA format, without filtering on depth of coverage. Amino acid positions that were not confidently reconstructed were replaced by an “X”. Novel SAPs discovered through PEAKS were only accepted if they were further validated by repeating the MaxQuant search (MQ3). All isoleucines were converted into leucines, as standard MS/MS cannot differentiate between these two isobaric amino acids. For possible deamidated sites, we checked whether there were positions in our reference sequence database where both Q and E, or both N and D, occurred at the same position, and where we also had matching ancient sequences. For sample Dm.5/157-16635, only one such position existed, and this was replaced by an “X” in our consensus sequence. Based on parsimony, for other Q, E, N and D positions we called the amino acid present in the reference proteome, regardless of their phylogenetic relevance. The output of the MQ2 and MQ3 searches was used to extend the coverage of the ancient protein sequences initially identified in the MQ1 iteration. For specimen Dm.5/157-16635, all the experimentally identified peptides, as well as the respective best matching MS/MS spectra covering the sites informative for Rhinocerotidae phylogenetic inference, are provided as Supplementary Information (“Key MS-MS Spectra” file). All the reported MS/MS spectra are annotated using the advanced annotation mode of MaxQuant. 
Selected spectra matching to peptides covering phylogenetically informative amino acid positions were manually inspected, validated and annotated by an experienced mass spectrometrist, in all cases in full agreement with bioinformatic sequence assignment (Supplementary Information, “Key MS-MS Spectra” file). We utilized MS²PIP fragment ion spectral intensity prediction[49] (version: v20190107; model: HCD) to demonstrate that the experimentally observed fragment ion intensities are highly correlated with the theoretical ones (Fig. S3). Finally, we generated synthetic peptides for 19 selected peptides covering Rhinocerotidae SAPs in Dm.5/157-16635.
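The consensus rules described in this section (isoleucine reported as leucine, unresolved positions written as “X”) can be illustrated with a toy example. This is not the Geneious workflow or the in-house R script. The peptides and their placements are invented, and treating conflicting calls as unresolved is an assumption made for the sketch.

```python
def normalise_isobaric(sequence):
    """Report isoleucine as leucine, since standard MS/MS cannot tell them apart."""
    return sequence.replace("I", "L")


def consensus_from_peptides(protein_length, placed_peptides):
    """Build a consensus from (start_index, peptide) pairs placed on a reference.

    Positions with no coverage, or with conflicting calls (an assumption of
    this sketch), are written as 'X'.
    """
    calls = [set() for _ in range(protein_length)]
    for start, peptide in placed_peptides:
        for offset, residue in enumerate(normalise_isobaric(peptide)):
            calls[start + offset].add(residue)
    return "".join(next(iter(c)) if len(c) == 1 else "X" for c in calls)


# Invented peptides on a 10-residue toy protein (0-based start positions).
peptides = [(0, "MPIK"), (2, "LKQ"), (7, "SY")]
consensus = consensus_from_peptides(10, peptides)
print(">toy_protein")
print(consensus)
```

In this toy case the overlapping peptides agree once I is read as L, and the three positions without peptide coverage are reported as “X”.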

Post translational modifications

Deamidation

After removal of likely contaminants, the extent of glutamine and asparagine deamidation was estimated for individual specimens, by using the MaxQuant output files as previously published[41] (Supplementary Information).

Other Spontaneous Chemical Modifications

Spontaneous post-translational modifications (PTMs) associated with chemical protein damage were searched using the PEAKS PTM tool and the dependent peptides search mode[50] in MaxQuant. In the PEAKS PTM search, all modifications in the Unimod database were considered. The mass error was set to 5.0 ppm and 0.5 Da for precursor and fragment, respectively. For PEAKS, the de novo ALC score was set to a threshold of 15 % and the peptide hit threshold to 30. The results were filtered by an FDR of 5 %, de novo ALC score of 50 %, and a protein hit threshold of ≥ 20. The MaxQuant dependent peptides search was carried out with the same search settings as described above and with a dependent peptide FDR of 1 % and a mass bin size of 0.0065 Da.

Phosphorylation

Class I phosphorylation sites were selected with localisation probabilities of ≥0.98 in the Phosph(ST)Sites MaxQuant output file. Sequence windows of ±6 aa from all identified sites were compared against a background file containing all non-phosphorylated peptides using a linear kinase sequence motif enrichment analysis in IceLogo (version 1.3.8)[51].
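The site selection and window extraction described above can be sketched as follows. The protein sequence, site positions and probabilities are invented, the dictionary keys are hypothetical rather than MaxQuant column names, and padding short windows with “_” at protein termini is an assumption of this sketch.

```python
def sequence_window(protein, site_index, flank=6, pad="_"):
    """Return the residue at site_index with `flank` residues on each side,
    padded with `pad` where the window runs past a protein terminus."""
    left = protein[max(0, site_index - flank):site_index]
    right = protein[site_index + 1:site_index + 1 + flank]
    return (pad * (flank - len(left)) + left + protein[site_index]
            + right + pad * (flank - len(right)))


# Invented example: one toy protein and two candidate phosphosites (0-based).
protein = "ACDEFGHSKLMNPQR"
sites = [
    {"position": 7, "localisation_probability": 0.99},
    {"position": 3, "localisation_probability": 0.80},
]

# Keep only sites meeting the >= 0.98 localisation threshold used in the text.
class_one = [s for s in sites if s["localisation_probability"] >= 0.98]
windows = [sequence_window(protein, s["position"]) for s in class_one]
print(windows)
```

Windows of equal length (13 residues for ±6 aa) are what a motif enrichment tool compares against the background set.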

Phylogenetic Analysis

Reference datasets

We assembled a reference dataset consisting of publicly available protein sequences from representative ungulate species belonging to the following families: Equidae, Rhinocerotidae, Suidae and Bovidae (Supplementary Information §7 and §8). As Cervidae and carnivores are, to varying extents, absent from protein sequence databases, we did not attempt phylogenetic placement of samples from these taxa. Instead, we conducted our phylogenetic analysis on the five best-performing enamel proteomes (Dm.5/154.2.A4.38-16632, Dm.5/157-16635, Dm.5/154.1.B1.1-16638, Dm.8/154.4.A4.22-16639, Dm.8/152.3.B1.2-16641) and the largest bone proteome (Dm.bXI.North.B1a.collection-16658) we recovered (see Extended Data Table 2). We extended this dataset with the protein sequences from extinct and extant rhinoceros species including: the woolly rhinoceros (†Coelodonta antiquitatis), the Merck’s rhinoceros (†Stephanorhinus kirchbergensis), the Sumatran rhinoceros (Dicerorhinus sumatrensis), the Javan rhinoceros (Rhinoceros sondaicus), the Indian rhinoceros (Rhinoceros unicornis), and the black rhinoceros (Diceros bicornis). Their corresponding protein sequences were obtained following translation of high-throughput DNA sequencing data, after filtering out reads with mapping quality lower than 30 and nucleotides with base quality lower than 20, and calling the majority rule consensus sequence using ANGSD[52]. For the woolly and Merck’s rhinoceroses we excluded the first and last five nucleotides of each DNA fragment in order to minimize the effect of post-mortem ancient DNA damage[53]. Each consensus sequence was formatted as a separate blast nucleotide database. We then performed a tblastn[54] alignment using the corresponding white rhinoceros sequence as a query, favouring ungapped alignments in order to recover translated and spliced protein sequences. 
Resulting alignments were processed using the ProSplign algorithm from the NCBI Eukaryotic Genome Annotation Pipeline[55] to recover the spliced alignments and translated protein sequences.
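The read filtering and majority-rule consensus step can be illustrated with a toy pileup. This is a plain-Python sketch of the idea, not ANGSD. The reads are invented, ties are resolved by first occurrence, and uncovered positions are written as “N”, all of which are assumptions of the example.

```python
from collections import Counter


def majority_consensus(ref_length, reads, min_mapq=30, min_baseq=20, trim=0):
    """Majority-rule consensus from (start, sequence, base_qualities, mapq) reads.

    Reads below min_mapq and bases below min_baseq are ignored. `trim` drops
    that many bases from both ends of each read (5 for the ancient samples).
    """
    counts = [Counter() for _ in range(ref_length)]
    for start, sequence, qualities, mapq in reads:
        if mapq < min_mapq:
            continue
        for i, (base, quality) in enumerate(zip(sequence, qualities)):
            if i < trim or i >= len(sequence) - trim:
                continue  # read ends are most affected by post-mortem damage
            if quality >= min_baseq:
                counts[start + i][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in counts)


# Invented reads on a 4-bp toy reference (0-based start positions).
reads = [
    (0, "ACGT", [30, 30, 30, 30], 60),
    (0, "ACGA", [30, 30, 30, 10], 60),  # last base fails the base-quality filter
    (0, "TTTT", [30, 30, 30, 30], 10),  # whole read fails the mapping-quality filter
    (1, "CGT", [30, 30, 30], 40),
]
untrimmed = majority_consensus(4, reads)
trimmed = majority_consensus(4, reads, trim=1)
print(untrimmed, trimmed)
```

The second call uses `trim=1` only because the toy reads are short; it shows how end-trimming reduces coverage at the edges of each fragment.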

Construction of phylogenetic trees

For each specimen, multiple sequence alignments for each protein were built using MAFFT[56] and concatenated into a single alignment per specimen. These were inspected visually to correct obvious alignment mistakes, and all the isoleucine residues were substituted with leucine to account for indistinguishable isobaric amino acids at the positions where the ancient protein carried one of these amino acids. Based on these alignments, we inferred the phylogenetic relationship between the ancient samples and the species included in the reference dataset by using three approaches: distance-based neighbour-joining, maximum likelihood and Bayesian phylogenetic inference (Supplementary Information). Neighbour-joining trees were built using the phangorn[57] R package, restricting to sites covered in the ancient samples. Genetic distances were estimated using the JTT model, considering pairwise deletions. We estimated bipartition support through a non-parametric bootstrap procedure using 500 pseudoreplicates. We used PhyML 3.1[58] for maximum likelihood inference based on the whole concatenated alignment. For likelihood computation, we used the JTT substitution model with two additional parameters for modelling rate heterogeneity and the proportion of invariant sites. Bipartition support was estimated using a non-parametric bootstrap procedure with 500 replicates. Bayesian phylogenetic inference was carried out using MrBayes 3.2.6[59] on each concatenated alignment, partitioned per gene. While we chose the JTT substitution model in the two approaches above, here we allowed the Markov chain to sample parameters for the substitution rates from a set of predetermined matrices, as well as the shape parameter of a gamma distribution for modelling across-site rate variation and the proportion of invariable sites. The MCMC algorithm was run with 4 chains for 5,000,000 cycles. Sampling was conducted every 500 cycles and the first 25% were discarded as burn-in. 
Convergence was assessed using Tracer v. 1.6.0, which estimated an ESS greater than 5,500 for each specimen, indicating reasonable convergence for all runs.
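Two of the numerical settings above lend themselves to a short illustration: how a non-parametric bootstrap pseudoreplicate is drawn from an alignment, and how many MCMC samples remain after burn-in. The alignment is invented, the function names are hypothetical, and the sample count ignores any sample taken at the initial state. None of this reproduces phangorn, PhyML or MrBayes internals.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def retained_samples(cycles, sample_every, burnin_fraction):
    """Return (total, discarded, retained) MCMC samples for one run."""
    total = cycles // sample_every
    discarded = int(total * burnin_fraction)
    return total, discarded, total - discarded


# Invented three-taxon alignment, resampled 500 times as in the text.
alignment = {"taxon_A": "MKLVSA", "taxon_B": "MKLLSA", "taxon_C": "MRLVTA"}
rng = random.Random(1)  # fixed seed so the sketch is repeatable
pseudoreplicates = [bootstrap_alignment(alignment, rng) for _ in range(500)]

# 5,000,000 cycles, sampled every 500, first 25% discarded.
mcmc = retained_samples(5_000_000, 500, 0.25)
print(len(pseudoreplicates), mcmc)
```

With the stated settings, 10,000 samples are drawn, 2,500 are discarded as burn-in and 7,500 are retained. A tree would then be inferred from each pseudoreplicate to obtain bipartition support.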

Ancient DNA Analysis

The samples were processed using strict aDNA guidelines in a clean lab facility at the Natural History Museum of Denmark, University of Copenhagen. DNA extraction was attempted on five of the ancient animal samples (Supplementary Information §9, §13). Powdered samples (120-140 mg) were extracted using a silica-in-solution method[12,60]. To prepare the samples for NGS, 20 μL of DNA extract was built into a blunt-end library using the NEBNext DNA Sample Prep Master Mix Set 2 (E6070) with Illumina-specific adapters. The libraries were PCR-amplified with inPE1.0 forward primers and custom-designed reverse primers with a 6-nucleotide index[61]. Two extracts (MA399 and MA2481, from specimens 16859 and 16635 respectively) yielded detectable DNA concentrations (Table S9). The libraries generated from specimens 16859 and 16635 were processed on different flow cells. They were pooled with others for sequencing on an Illumina 2000 platform (MA399_L1, MA399_L2), using 100bp single read chemistry, and on an Illumina 2500 platform (MA2481_L1), using 81bp single read chemistry. The data were base-called using the Illumina software CASAVA 1.8.2 and sequences were demultiplexed with a requirement of a full match of the six nucleotide indexes that were used. Raw reads were processed using the PALEOMIX pipeline following published guidelines[62], mapping against the cow nuclear genome (Bos taurus 4.6.1, accession GCA_000003205.4), the cow mitochondrial genome (Bos taurus), the red deer mitochondrial genome (Cervus elaphus, accession AB245427.2), and the human nuclear genome (GRCh37/hg19), using BWA backtrack[63] v0.5.10 with the seed disabled. All other parameters were set as default. PCR duplicates from mapped reads were removed using the Picard tool MarkDuplicates [http://picard.sourceforge.net/].
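The full-match requirement for the six-nucleotide indexes can be sketched as an exact dictionary lookup. The index sequences and reads below are invented for illustration (only the library names come from the text), and this does not reproduce CASAVA itself.

```python
def demultiplex(indexed_reads, index_to_library):
    """Assign reads to libraries only when the 6-nt index matches exactly."""
    assigned = {library: [] for library in index_to_library.values()}
    unassigned = []
    for index_sequence, read in indexed_reads:
        library = index_to_library.get(index_sequence)  # exact match, no mismatches
        if library is None:
            unassigned.append(read)
        else:
            assigned[library].append(read)
    return assigned, unassigned


# Hypothetical index sequences; library names as given in the text.
index_to_library = {"ACGTAC": "MA399_L1", "TTGCAA": "MA2481_L1"}
indexed_reads = [
    ("ACGTAC", "read_1"),
    ("TTGCAA", "read_2"),
    ("ACGTAA", "read_3"),  # one mismatch to ACGTAC: rejected under a full-match rule
]
assigned, unassigned = demultiplex(indexed_reads, index_to_library)
print(assigned, unassigned)
```

Requiring a full match trades a small loss of reads for a lower risk of assigning reads to the wrong library.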

Sample Dm.5/157-16635 Morphological Measurements

We followed the methodology introduced by Guérin[32]. The maximal length of the tooth is measured with a digital calliper at the lingual side of the tooth and parallel to the occlusal surface. All measurements are given in mm (Supplementary Information §3).

Generalized stratigraphic profiles for Dmanisi, indicating specimen origins.

a, Type section of the Dmanisi M5 Excavation block. b, Stratigraphic profile of excavation area M6. M6 preserves a larger gully associated with the pipe-gully phase of stratigraphic-geomorphic development in Stratum B1. The thickness of Stratum B1 gully fill extends to the basalt surface, but includes “rip-ups” of Strata A1 and A2, showing that B1 deposits post-date Stratum A. c, Stratigraphic section of excavation area M17. Here, Stratum B1 was deposited after erosion of Stratum A deposits. The stratigraphic position of the Stephanorhinus sample Dm.5/157-16635 is highlighted with a red diamond. The Masavara basalt is ca. 50 cm below the base of the shown profile. d, Northern section of Block 2. Following collapse of a pipe and erosion to the basalt, the deeper part of this area was filled with local gully fill of Stratum B1/x/y/z. Note the uniform burial of all Stratum B1 deposits by Strata B2-B4. Sampled specimens are indicated by CGG five-digit numbers. See Extended Data Table 1 for both CGG and GNM specimen numbers.

Proteomic sequence coverage for specimen Dm.5/157-16635 (Stephanorhinus).

a, c, e, g, i, j, PSM sequence coverage of proteins AMBN, ENAM, AMELX, AMTN, MMP20 and ALB, respectively. Annotations include: “amino acid position, amino acid called in that position (number of PSMs/peptides covering that position)” for the phylogenetically informative SAPs within Rhinocerotidae. b, d, f, h, Frequency (%) of phosphorylated (green) and non-phosphorylated (red) PSMs per amino acid position for AMBN, ENAM, AMELX and AMTN, respectively. Numbers within the bars provide the PSM counts. k, Violin plot of PSM coverage distribution for all covered sites (n=693) and those of phylogenetic relevance (SAPs, n=30). The boxplots define the range of the data, with whiskers extending to 1.5 times the interquartile range, 25th and 75th percentiles (boxes), and medians (dots). All panels are based on MQ results only. Supplementary File “Key MS-MS Spectra” contains spectral examples and fragment ion series alignments for each of the marked SAPs.

Peptide and ion fragment coverage of amelogenin X (AMELX) isoforms 1 and 2 from specimen Dm.M6/7.II.296-16856 (Cervidae).

Peptides specific to amelogenin X (AMELX) isoforms 1 and 2 appear in the upper and lower parts of the figure, respectively. No amelogenin X isoform 2 is currently reported in public databases for the Cervidae group. Accordingly, the amelogenin X isoform 2-specific peptides were identified by MaxQuant spectral matching against bovine (Bos taurus) amelogenin X isoform 2 (UniProt accession number P02817-2). Amelogenin X isoform 2, also known as leucine-rich amelogenin peptide (LRAP), is a naturally occurring amelogenin X isoform translated from an alternatively spliced transcript.

Amino Acid Racemisation.

Extent of intra-crystalline racemisation in enamel for the free amino acid (FAA, x-axis) fraction and the total hydrolysable amino acid (THAA, y-axis) fraction for four amino acids (Asx, Glx, Ala and Phe). Note differences in axis scale. Intra-crystalline data from Proboscidea enamel from a range of UK sites[64] are shown for comparison (black crosses). Taxa from both Dmanisi and the UK exhibit a similar relationship between FAA and THAA racemisation, and R² values have been calculated based on a polynomial relationship (order = 2, all >0.93).

Ancient enamel proteome phosphorylation.

Annotated spectra including phosphorylated serine (phS). a, Phosphorylation in the S-x-E motif (AMEL). b, Phosphorylation in the S-x-phS motif (AMBN). Phosphorylation was independently observed in all three separate analyses of Dm.5/157-16635, including multiple spectra and peptides (see Extended Data Fig. 2).

Phylogenetic relationships between the comparative reference dataset and specimen Dm.bXI-16857.

Consensus tree from Bayesian inference. The posterior probability of each bipartition is shown as a percentage to the left of each node.

Amelogenin Y-specific matches.

a) Specimen Dm.6/151.4.A4.12-16630 (Pseudodama). b) Specimen Dm.69/64.3.B1.53-16631 (Cervidae). c) Specimen Dm.8/154.4.A4.22-16639 (Bovidae). d) Specimen Dm.M6/7.II.296-16856 (Cervidae). Note the presence of deamidated glutamine (deQ) and asparagine (deN), oxidised methionine (oxM), and phosphorylated serine (phS).

Effect of missingness on the tree topology.

a, Maximum-likelihood phylogeny obtained using PhyML and the protein alignment excluding the ancient Dmanisi rhinoceros Dm.5/157-16635. b, Topologies obtained from 100 random replicates of the woolly rhinoceros (Coelodonta antiquitatis). In each replicate the proportion of missing sites was similar to that observed in the Dm.5/157-16635 specimen (72.4% missingness). The percentage shown for each topology indicates the proportion of replicates in which that particular topology was recovered. c, Similar to b, but for the Javan rhinoceros (Rhinoceros sondaicus). d, Similar to b, but for the black rhinoceros (Diceros bicornis).
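The replicate construction described in this legend can be sketched as random masking of sites. This is an illustration under stated assumptions: sites are masked uniformly at random (the legend does not say how positions were chosen), masked sites are written as “X”, and the input sequence is invented.

```python
import random


def mask_sites(sequence, missing_fraction, rng, missing_char="X"):
    """Replace a random subset of sites with missing_char."""
    n_masked = round(len(sequence) * missing_fraction)
    masked = set(rng.sample(range(len(sequence)), n_masked))
    return "".join(missing_char if i in masked else residue
                   for i, residue in enumerate(sequence))


# Invented 1,000-residue sequence standing in for a concatenated alignment row.
reference_row = "ACDEFGHKLM" * 100
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# 100 replicates at the 72.4% missingness reported for Dm.5/157-16635.
replicates = [mask_sites(reference_row, 0.724, rng) for _ in range(100)]
print(replicates[0][:60])
```

Each masked replicate would then replace the original sequence in the alignment before re-running the tree inference, so that the recovered topologies can be tallied.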

Genome and proteome survival in 23 Dmanisi fossil fauna specimens.

For each specimen, the Centre for GeoGenetics (CGG) reference number and the Georgian National Museum (GNM) specimen field number are reported. *or the narrowest possible taxonomic identification achievable using comparative anatomy methods. †Only collagens survive. B = Bone, D = Dentine, E = Enamel. Extractions of enamel might include some residual dentine. Accordingly, both tissues are either listed separately (○D, ●E, in case of no collagen preservation), or together (●E+D, in case of collagen preservation). Open circles (○) indicate no molecular preservation; (●) closed circles indicate molecular preservation.

Proteome composition and coverage.

Aggregated data from different extraction methods and/or tissues from the same specimen. In those cells reporting two values separated by the “|” symbol, the first value refers to MaxQuant (MQ) searches performed selecting unspecific digestion, while the second value refers to MQ searches performed selecting trypsin digestion. For those cells including one value only, it refers to MQ searches performed selecting unspecific digestion. Final amino acid coverage, incorporating both MQ and PEAKS searches, is reported in the last column. *supporting all peptides. See Extended Data Table 1 for tissue sources per specimen and both CGG and GNM specimen numbers.

Supplementary Material

Supplementary information is available in the online version of the paper.