Serena Aneli1, Tina Saupe2, Francesco Montinaro2,3, Anu Solnik4, Ludovica Molinaro2, Cinzia Scaggion5, Nicola Carrara6, Alessandro Raveane7, Toomas Kivisild2,8, Mait Metspalu2, Christiana L Scheib2,9, Luca Pagani1,2. 1. Department of Biology, University of Padua, Padova, Italy. 2. Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia. 3. Department of Biology-Genetics, University of Bari, Bari, Italy. 4. Core Facility, Institute of Genomics, University of Tartu, Tartu, Estonia. 5. Department of Geosciences, University of Padua, Padova, Italy. 6. Anthropology Museum, University of Padova, Padova, Italy. 7. Laboratory of Hematology-Oncology, European Institute of Oncology IRCCS, Milan, Italy. 8. Department of Human Genetics, KU Leuven, Leuven, Belgium. 9. St John's College, Cambridge, United Kingdom.
Abstract
The geographical location and shape of Apulia, a narrow land stretching out in the sea at the South of Italy, made this region a Mediterranean crossroads connecting Western Europe and the Balkans. Such movements culminated at the beginning of the Iron Age with the Iapygian civilization which consisted of three cultures: Peucetians, Messapians, and Daunians. Among them, the Daunians left a peculiar cultural heritage, with one-of-a-kind stelae and pottery, but, despite the extensive archaeological literature, their origin has been lost to time. In order to shed light on this and to provide a genetic picture of Iron Age Southern Italy, we collected and sequenced human remains from three archaeological sites geographically located in Northern Apulia (the area historically inhabited by Daunians) and radiocarbon dated between 1157 and 275 calBCE. We find that Iron Age Apulian samples are still distant from the genetic variability of modern-day Apulians, they show a degree of genetic heterogeneity comparable with the cosmopolitan Republican and Imperial Roman civilization, even though a few kilometers and centuries separate them, and they are well inserted into the Iron Age Pan-Mediterranean genetic landscape. Our study provides for the first time a window on the genetic make-up of pre-Roman Apulia, whose increasing connectivity within the Mediterranean landscape, would have contributed to laying the foundation for modern genetic variability. In this light, the genetic profile of Daunians may be compatible with an at least partial autochthonous origin, with plausible contributions from the Balkan peninsula.
The geographical location and shape of Apulia, a narrow land stretching out in the sea at the South of Italy, made this region a Mediterranean crossroads connecting Western Europe and the Balkans. Such movements culminated at the beginning of the Iron Age with the Iapygian civilization which consisted of three cultures: Peucetians, Messapians, and Daunians. Among them, the Daunians left a peculiar cultural heritage, with one-of-a-kind stelae and pottery, but, despite the extensive archaeological literature, their origin has been lost to time. In order to shed light on this and to provide a genetic picture of Iron Age Southern Italy, we collected and sequenced human remains from three archaeological sites geographically located in Northern Apulia (the area historically inhabited by Daunians) and radiocarbon dated between 1157 and 275 calBCE. We find that Iron Age Apulian samples are still distant from the genetic variability of modern-day Apulians, they show a degree of genetic heterogeneity comparable with the cosmopolitan Republican and Imperial Roman civilization, even though a few kilometers and centuries separate them, and they are well inserted into the Iron Age Pan-Mediterranean genetic landscape. Our study provides for the first time a window on the genetic make-up of pre-Roman Apulia, whose increasing connectivity within the Mediterranean landscape, would have contributed to laying the foundation for modern genetic variability. In this light, the genetic profile of Daunians may be compatible with an at least partial autochthonous origin, with plausible contributions from the Balkan peninsula.
The Mediterranean’s Iron Age populations, between 1100 and 600 BCE, lived in a time of previously unprecedented connectivity (Hodos 2020). Although the technological advances in seafaring had allowed great opportunities for long-distance mobility just a few millennia earlier, it was during the Iron Age that the cosmopolitan role of Mare Nostrum arose, by promoting the spreading of cultures, goods, languages, technical advances as well as heterogeneous ancestral genetic components coming from far and wide (Abulafia 2011; Hodos 2020). Notable examples of such distant connections are the Greek and Phoenician settlements across the Central and Western Mediterranean shores beginning from the 9th and 8th centuries (Fernandes et al. 2020; Marcus et al. 2020).The Iron Age is conventionally marked as beginning 950 BCE in the Italian peninsula and its Tyrrenian islands (Nijboer 2006; Hodos 2020; Aneli et al. 2021). These areas too joined the cosmopolitan wave sweeping across Southern Europe, by hosting numerous trading posts along its shores. A patchwork of communities appeared in this period within the Italian borders, each characterized by unique and well-defined cultures and identities, which were later encapsulated and blurred by the Roman colonization. The shift from Republican to Imperial Rome, with the consequent inclusion of non-Roman civilizations from and beyond the Italian peninsula, has already been shown to be connected with a major eastward genetic shift in Central Italian samples (Antonio et al. 2019). Genetic material from Imperial Romans from Central Italy is indeed the first to colocalize with contemporary Italians in a Principal Component space, pointing to this time period (after 27 BCE) as a crucial one in shaping contemporary Italian genetic makeup. Whether such a shift can also be observed in the rest of the Italian peninsula and how the Republic to Imperial transition may have impacted the genetic landscape of local, pre-Roman populations remains an open question.Despite the numerous written records and archaeological findings, questions about Iron Age populations, their origins, and mutual relationships remain unanswered. Among the many groups occupying Italy in the Iron Age, the Daunians, a Iapygian population from northern Apulia, were first mentioned in the 7–6th century BCE (Lombardo 2014; Norman 2016). Similarly to their neighboring populations, Peucetians and Messapians (living in central and southern Apulia, respectively), the name of the Daunians comes from ancient Greek documents and, given the absence of written Daunian records, the scant information we have on their social, political, and religious life are wholly reliant upon the material record, such as their one-of-a-kind stelae (Norman 2016). For instance, we know that they were mainly farmers, animal breeders, horsemen, and maritime traders with an established trade network extending across the sea with Illyrian tribes (Tagliente 1986; Kirigin et al. 2010; Norman 2016). A fascinating aspect of this population, as opposed to their neighbors in Apulia, was their tenacious resistance to external influences. For instance, they did not acquire either social or cultural Hellenic elements and no Greek alphabet inscriptions have been found in their settlements. Indeed, they retained a strong cultural identity and political autonomy until the Roman arrival in the late 4th–early 3rd century BCE (Norman 2016).Despite the extensive archaeological literature, their origin has been lost to time and, as early as in the Hellenistic period, various legends already existed connecting them to either Illyria (an ancient region broadly identifiable with the Balkan peninsula), Arkadia (present-day Peloponnesus), or Crete (Lombardo 2014; Norman 2016).To shed light on this population and provide a glance of the Iron Age genetic landscape of the Southern Italian peninsula, we collected human remains from three Iron Age necropoleis geographically located within the area historically inhabited by Daunians: Ordona (ancient Herdonia), one of the largest settlements estimated at 600 hectares (Norman 2016), Salapia, and San Giovanni Rotondo (fig. 1). Here, we provide for the first time a genetic investigation of Iron Age Apulian individuals associated with the Daunian culture, offering insight into the genetic landscape of pre-Roman Southern Italy.
Fig. 1.
Geographical location, dating, and genetic structure. (A) Geographical location of the newly generated Apulian ancient individuals (green, red, and blue diamonds and triangles) and other published Iron Age samples from Italy and surrounding areas (color scale mirroring their “years before present” dates). (See also table 1 and supplementary data 1 and 2, Supplementary Material online.) (B) Dating and number of SNPs from 1240K set covered. Samples with thick edges have been radiocarbon dated (mean presented), whereas for the others, we represented their archaeological dating (table 1 and supplementary data 1 online). (C) PCA of newly generated individuals with reference ancient individuals projected onto the genetic variation inferred from modern-day Western Eurasian populations (supplementary data 2 and 3, Supplementary Material online). For the ancient reference populations which have not been attributed to the Iron Age, we plotted their centroids.
Geographical location, dating, and genetic structure. (A) Geographical location of the newly generated Apulian ancient individuals (green, red, and blue diamonds and triangles) and other published Iron Age samples from Italy and surrounding areas (color scale mirroring their “years before present” dates). (See also table 1 and supplementary data 1 and 2, Supplementary Material online.) (B) Dating and number of SNPs from 1240K set covered. Samples with thick edges have been radiocarbon dated (mean presented), whereas for the others, we represented their archaeological dating (table 1 and supplementary data 1 online). (C) PCA of newly generated individuals with reference ancient individuals projected onto the genetic variation inferred from modern-day Western Eurasian populations (supplementary data 2 and 3, Supplementary Material online). For the ancient reference populations which have not been attributed to the Iron Age, we plotted their centroids.
Table 1.
Archaeological Information, Dating, Genome Coverage, Genetic Sex, mtDNA, and Y Chromosome Haplogroups of the Individuals of This Study Selected for Genome-Wide Analysis.
Individual
Site
Date
Genome Coverage
Gen. Sex
mtDNA HG
Y Chromosome HG
No. SNPs
ORD001
Ordona
2516 ± 25 BP; 779–544 calBCE
0.0396
XX
H5c
—
53,277
ORD004
Ordona
2374 ± 21 BP; 536–384 calBCE
0.0429
XX
U8b1b1
R1b-M269
56,070
ORD006
Ordona
2476 ± 29 BP; 770–425 calBCE
0.12
XY
H + 16291T
—
158,562
ORD009
Ordona
2433 ± 27 BP; 749–406 calBCE
0.995
XX
H5c
—
805,966
ORD010
Ordona
984 ± 25 BP; 995–1156 calCE
0.91
XX
X2i
—
769,336
ORD011
Ordona
Not dated
0.0889
XY
H1e
R1b-P312
116,800
ORD014
Ordona
2438 ± 25 BP; 749–408 calBCE
0.108
XY
I5a2 + 16086C
J2b2-L283
141,736
ORD019
Ordona
Not dated
0.0484
XY
T2e
I2d-Z2093/Y3670
61,292
SAL001
Salapia
2946 ± 30 BP; 1260–1048 calBCE
0.049
XY
H1 + 16189!
J2b-M241
64,574
SAL003
Salapia
2287 ± 24 BP; 401–354 calBCE
0.113
XX
K1a2a
—
144,490
SAL007
Salapia
Not dated
0.0529
XX
K1a + 195T!
—
64,034
SAL010
Salapia
Not dated
0.031
XY
U5a1
J2b-M241
39,174
SAL011
Salapia
2241 ± 23 BP; 389–206 calBCE
0.0496
XY
U5b1
I2d-M223
62,155
SGR001
San Giovanni Rotondo
1285 ± 23 BP; 670–774 calCE
0.044
XY
U3a
I1-M253
59,213
SGR002
San Giovanni Rotondo
2451 ± 22 BP; 750–415 calBCE
0.102
XY
U5b1d1
R1b-M269
131,437
SGR003
San Giovanni Rotondo
2205 ± 26 BP; 368–176 calBCE
0.0572
XX
H1 + 16311T!
—
76,687
Note.—Dates in BP are raw radiocarbon dates, calibrated dates are the 95.4% probability and were calibrated using IntCal20 (Reimer et al. 2020). See also supplementary data 1, Supplementary Material online. Gen., genetic; HG, haplogroup.
Archaeological Information, Dating, Genome Coverage, Genetic Sex, mtDNA, and Y Chromosome Haplogroups of the Individuals of This Study Selected for Genome-Wide Analysis.Note.—Dates in BP are raw radiocarbon dates, calibrated dates are the 95.4% probability and were calibrated using IntCal20 (Reimer et al. 2020). See also supplementary data 1, Supplementary Material online. Gen., genetic; HG, haplogroup.
Results
We extracted DNA from 34 human skeletal remains (petrous bone = 23 and teeth = 11) from three necropoleis (Ordona = 19; Salapia = 12; San Giovanni Rotondo = 3; hereafter ORD, SAL, and SGR, respectively) at the Ancient DNA Laboratory of the Institute of Genomics, University of Tartu in Estonia (supplementary data 1), processing between 250 and 1,200 mg of bone material for each sample. All three necropoleis are geographically located less than 50 km from each other in modern Apulia, Southeastern Italy. Ordona and Salapia are archaeologically dated to the Daunian period (6th–3rd century BCE) except one individual (ORD010) archaeologically dated to the Mediaeval time period (which in Europe dates approximately from the 5th to the late 15th century CE) (fig. 1). Based on the museum record, the samples from the San Giovanni Rotondo necropole have been archaeologically inferred to be from the Iron Age period. After excluding 13 samples due to low DNA concentration (<0.5 ng/µl) and screening 21 libraries at a low depth (±20M reads per library), we additionally sequenced to higher depth 16 libraries with endogenous DNA between 1.81% and 38.82% and mitochondrial DNA-based (mtDNA) contamination estimated at less than 2.89% (supplementary data 1).The sequencing runs were merged resulting in 16 individuals for genome-wide analysis: eight from Ordona, five from Salapia, and three from San Giovanni Rotondo. The final data set includes individuals with an average genomic coverage between 0.031 and 0.995× and a number of single-nucleotide polymorphisms (SNPs) overlapping with Human Origins 1240k between ∼40,000 and ∼810,000 (supplementary data 1).Out of those 16 individuals, we selected ten individuals based on their proximity within the principal component analysis (PCA) space (fig. 1) for radiocarbon dating and estimated their age between 1157 and 275 calBCE with a median date of 521 calBCE (supplementary data 1). The radiocarbon dates confirm the archaeological dates of individuals from Salapia and Ordona as well as the Iron Age affiliation of the two San Giovanni Rotondo samples (SGR002 and SGR003). Two additional samples (ORD010 and SGR001) with a shift toward the Near East in the PCA (fig. 1) were radiocarbon dated to 995–1156 calCE (95.4%) and 670–774 calCE (95.4%), respectively (supplementary data 1). Based on their dates, the samples were used as an external control for further analyses focusing on Iron Age Apulia (IAA).We determined the mitochondrial DNA (mtDNA) haplotype of each individual and the Y chromosome (Ychr) haplogroup for nine male individuals (see Materials and Methods and supplementary data 1). We found that the mtDNA haplotypes mostly belong to the mtDNA lineages H1, H5, K1, and U5, haplotypes found in previous studies of individuals from this time period in the Italian Peninsula (Antonio et al. 2019). Besides the Ychr haplogroup R1b, which is the most frequent haplogroup during the Bronze Age in the Italian Peninsula and on the islands Sardinia and Sicily (Allentoft et al. 2015; Haak et al. 2015; Antonio et al. 2019; Fernandes et al. 2020; Saupe et al. 2021), we found the Ychr lineages I1-M253, I2d-M223, and J2b-M241. The haplogroup I2d-M223 was one of the main Y chromosome lineages in Western Europe until the Late Neolithic whereas J2b-M241 first appears in the Bronze Age (Allentoft et al. 2015; Mathieson et al. 2015; Schuenemann et al. 2017; Fernandes et al. 2020; Marcus et al. 2020). We found one Early Mediaeval individual (SGR001) belonging to haplogroup I1-M253, which is common in Northern Europe and previously also detected in a 6th Century Langobard burial from North Italy (Amorim et al. 2018).We used READ (Monroy Kuhn et al. 2018) to identify pairs with first or second degree of genetic relatedness from autosomal data (see Materials and Methods and supplementary data 1, for details, supplementary fig. 11, Supplementary Material online). We found one first-degree relationship between the two female individuals ORD001 (age: 9–11 years, 779–544 calBCE; 95.4%) and ORD009 (age: 40–45 years, 749–406 calBCE; 95.4%) from Ordona sharing the identical mtDNA haplogroup H5c and indicating a mother–daughter or full sibling relationship (supplementary data 1 and fig. 1).
The Making of Modern Italians
To explore the genetic make-up of the IAA population, we performed a PCA projecting the ancient individuals onto the genetic variation of modern Eurasian samples (fig. 1 and supplementary data 2 and 3, Supplementary Material online). Our samples are largely scattered between modern peninsular Italians and Sardinians, and, in contrast to what was generally described (Schiffels et al. 2016; Saag et al. 2019) for other European Iron Age populations (e.g., Northern_Europe_IA, Western_Europe_IA, and Levant_IA in fig. 1), they are still clearly distant from the genetic variability of modern-day inhabitants of Apulia. The downward shift of Iron Age Apulians from the present-day ones is further confirmed by the significantly negative f4(Modern Apulians, IAA; X, Mbuti), where X is a Neolithic/Chalcolithic/Copper Age population (fig. 2 and supplementary data 4, Supplementary Material online). Within the heterogeneity reported by the PCA, which does not mirror the archaeological sites, the two Mediaeval individuals are shifted toward modern Middle Eastern and Caucasus populations (ORD010 and SGR001), whereas the others are stretched along the PC2. This pattern partially mirrors the chronological date with the most recent being more similar to present-day Southern Europeans and is further strengthened when considering the PC3 distribution (supplementary fig. 2, Supplementary Material online). Three samples located at the bottom of the PCA (ORD004, ORD019, SAL007) and one (SAL010) falling in the middle did not include modern Apulians among the top 25 results of an f3 outgroup analysis (supplementary fig. 3, Supplementary Material online). All of them showed an affinity to Copper and Bronze Age Italians (Saupe et al. 2021) as well as the Aegean and the Mediterranean worlds (including Minoans, Greece, Croatians, and Gibraltar). A similar distribution is mirrored in the multidimensional scaling (MDS) built from the f3 outgroup measures, where the oldest IAA individual (SAL001: 1260–1048 calBCE; 95.4%) lies farthest from the modern samples, whereas the Mediaeval ones (ORD010 and SGR001) are the closest (supplementary fig. 4, Supplementary Material online). An additional MDS plot is reported in supplementary figure 5, Supplementary Material online: There, we computed the median distances from the centroid of the IAA samples (excluding the two Mediaeval samples), the Roman Imperials and the Republicans, obtaining 0.0093, 0.0140, and 0.0101, respectively. Notably, the distance distributions were not statistically different, thus suggesting that the extent of genetic heterogeneity of IAA individuals is comparable to that of the cosmopolitan Republican and Imperial Roman samples, the latter having experienced a remarkable degree of admixture with foreign Mediterranean groups, pushed by strong geopolitical contingencies (Antonio et al. 2019). We also investigated whether the coverage was somehow responsible for the sample scattering by computing the Pearson correlation between the coverage itself and the distance from the IAA centroid. We found out that the two measures are not correlated (R = −0.17, P = 0.57), thus showing that most of the sample variation was not due to technical reasons (supplementary fig. 6, Supplementary Material online).
Fig. 2.
Genetic relationship of Iron Age and Middle Age Apulians and their ancestral composition. (A) Heatmap representing the Z score values of f4(Modern Apulia, IAA; X, Mbuti), where X is an ancient Italian population. We also added the two Middle Age samples for comparison. Tests with Z scores between −3 and 3 or with less than 5,000 SNPs were not included (P, Palaeolithic; M, Mesolithic; N, Neolithic; C, Chalcolithic; CA, Copper Age; BA, Bronze Age; IA, Iron Age; A, Antiquity). (B) qpAdm proportions of ancient Apulian samples and other reference populations (Modern Apulians, Minoans, Croatia_EIA, and the Roman individuals from the Republican and the Imperial period) using “base” sources: Caucasus and Western hunter-gatherer component (HG), Anatolia Neolithic (Anatolia_N), Iranian Neolithic (Iran_N), Steppe-related ancestry. (C) qpAdm proportions using also the Minoans as source (Materials and Methods). In (B) and (C), we plotted the model with the highest number of sources and P value (complete results may be seen in supplementary fig. 7, Supplementary Material online).
Genetic relationship of Iron Age and Middle Age Apulians and their ancestral composition. (A) Heatmap representing the Z score values of f4(Modern Apulia, IAA; X, Mbuti), where X is an ancient Italian population. We also added the two Middle Age samples for comparison. Tests with Z scores between −3 and 3 or with less than 5,000 SNPs were not included (P, Palaeolithic; M, Mesolithic; N, Neolithic; C, Chalcolithic; CA, Copper Age; BA, Bronze Age; IA, Iron Age; A, Antiquity). (B) qpAdm proportions of ancient Apulian samples and other reference populations (Modern Apulians, Minoans, Croatia_EIA, and the Roman individuals from the Republican and the Imperial period) using “base” sources: Caucasus and Western hunter-gatherer component (HG), Anatolia Neolithic (Anatolia_N), Iranian Neolithic (Iran_N), Steppe-related ancestry. (C) qpAdm proportions using also the Minoans as source (Materials and Methods). In (B) and (C), we plotted the model with the highest number of sources and P value (complete results may be seen in supplementary fig. 7, Supplementary Material online).The peculiar positioning of the IAA individuals casts doubt on when the major population shift resulting in modern Italian genetic composition took place. The shift toward the modern Italian genetic variability can be seen with the Republican-Imperial Roman samples (Antonio et al. 2019), the latter being more “similar” to modern Italians (fig. 1). Whether Apulian individuals dating back to the Republican or Imperial phase would also show a repositioning toward modern genetic variability remains an open question, although the later Mediaeval samples of this study point to population changes having taken place after the Iron Age in the area.
The Pan-Mediterranean Genetic Landscape of IAA
The geographic location of Apulia, a narrow peninsula stretching out in the sea at the South of Italy, has made this region an important Mediterranean crossroads connecting Western Europe, the Balkans, the Aegean, and Levant worlds. This is reflected in the PCA where IAA individuals are closely related to other Iron Age populations from the Mediterranean and surrounding areas (e.g., Montenegro, Bulgaria, and Sardinia) (fig. 1 and supplementary fig. 4, Supplementary Material online). Nomadic or cosmopolitan groups scatter like IAA: three Punic individuals from Sardinia (Italy_Sardinia_IA_Punic; Marcus et al. 2020), three Moldova Scythians already reported to be genetically similar to Southern Europeans (Krzewińska et al. 2018), Spanish individuals from the Hellenistic and the Romans periods (Olalde et al. 2019), and an individual from the 12th century Iron Age Ashkelon (Feldman, Master, et al. 2019) which clusters with ORD001.In order to shed light onto the genetic composition of the IAA individuals, we modeled them as a combination of the main ancestries documented across Western Europe at that time: Western Hunter-Gatherers (WHG), Anatolian Neolithic (AN), Steppe-related and, interchangeably, Caucasus Hunter-Gatherers (CHG), or Iranian Neolithic (IN) using the qpWave/qpAdm framework (fig. 2 and supplementary fig. 7, and Materials and Methods). Broadly, the contributions of such ancestries to the genetic variability of ancient European populations vary according to their geographical positions: in particular, northernmost locations received higher proportions of WHG, Steppe-related ancestry and, consequently, CHG ancestries, whereas Southern European groups carried variable Iranian Neolithic or CHG traces (Lazaridis et al. 2014). In view of this, we observed that although the IAA individuals could generally be modeled as a two-way admixture between AN and Steppe (0.63 ± 0.08 and 0.37 ± 0.08, respectively), the alternative model AN + CHG/IN could also fit for a subset of them, particularly in case of the samples ORD004, ORD010, and SAL010 with higher or comparable P values (supplementary fig. 7, first row with two sources). When three or four sources were tested, the presence of WHG ancestry in the majority of our individuals emerges and together with AN, Steppe, and CHG/IN, forms a supported model for IAA samples (fig. 2 and supplementary fig. 7). Notably, for the individuals stretching downward in the PCA (ORD004, ORD019, SGR002, and the Mediaeval ORD010) a three-way admixture involving AN, Steppe, and CHG/IN is generally preferable. To better understand the putative contribution of more recent populations, we modeled our samples with base sources (WHG, AN, Steppe-related, and CHG/IN) and, alternatively, Minoans (Odigitria and Lasithi), Amhara_NAF, and Roman Republicans (fig. 2 and supplementary fig. 7, and Materials and Methods). Amhara_NAF can be used as a proxy for the non-African component in modern Ethiopian individuals that was tentatively linked to the Sea People, a Bronze Age nomadic seafaring population (Feldman, Master, et al. 2019; Molinaro et al. 2019). Together with Minoans and Roman Republicans, this component can be broadly modeled as a Pan-Mediterranean population (constituted by AN and IN/CHG components) with the addition of WHG and Steppe-related ancestry in Roman Republicans. When modeled also with Minoans and Amhara_NAF, which roughly proxies the same ancestral signature, the majority of the samples required an additional CHG/IN contribution (two-way admixtures in supplementary fig. 7) as well as Steppe-related and WHG. We further observed that, as previously seen, the WHG contribution is less clear in those samples stretching downward in the PCA. Although the CHG/IN additional contribution may simply proxy the presence of Steppe-related ancestry in IAA, the absence of which in Minoans has already been reported (Lazaridis et al. 2017), the same cannot be said about Roman Republicans (two-way admixtures in supplementary fig. 7), which harbored a considerable amount of Steppe component (Antonio et al. 2019). However, this signature is not confirmed with f4 analyses (supplementary fig. 8), where just Mycenaean groups report less CHG ancestry than our samples.Even taking into consideration some uncertainty, especially for the samples with lower coverage, the broader picture emerging from qpAdm analyses is the pervasive presence of coeval Italian ancestries (Rome Republicans), as well as previous Bronze Age sources (Minoans and Sea People) spread far and wide across the Mediterranean Sea. In such a melting pot scenario, the genetic heterogeneity of IAA individuals, who lived in close temporal and geographical proximity, stand out (fig. 1 and supplementary figs. 4 and 9, Supplementary Material online). An example of the cosmopolitan nature of Iron Age Mediterranean is the first-degree relationship between ORD009 and ORD001, whose positions in the PCA strikingly differ, with the individual ORD001 being stretched toward Middle Eastern and Caucasus modern populations, potentially as a consequence, in case of a mother–daughter relationship, of the foreign origin of the father (fig. 1 and supplementary fig. 4, Supplementary Material online). F4 analyses in the form f4(ORD009, ORD001; X, Mbuti) report a slightly significant (Z score between −2 and −3) excess of Greece_N, Portugal_LN_C, Lebanon_Roman, and Italy_Sicily_EBA in ORD001, which may explain its eastward shift (supplementary fig. 10). Moreover, ORD001, together with SGR002, but not ORD009 harbored more CHG when compared with Lebanon_Hellenistic and Lebanon_IA3 samples, respectively (supplementary fig. 10).We also investigated whether the PCA scattering was due to varying African or Levantine contributions with f4(Rome Republican, IAA, Levant_N/YRI, Mbuti) and tried the same on Mediaeval ancient Apulians (ORD010 and SGR001). However, none of the tested ancient Apulians shows a significant excess of YRI ancestry when compared with the contemporary Roman Republicans, even though ORD014, SAL007, and SAL011 show negative f4 values with a Z score between −2 and −3 (supplementary fig. 10).
The Origin of Daunians
The apparent genetic heterogeneity of IAA individuals connected to the Daunian population and the cosmopolitan landscape of Iron Age Mediterranean populations hinder a full reconstruction of the demographic processes leading to the Daunians. Nevertheless, a few milestones can be spotted.When we performed f3 analyses to investigate the nearest possible source for each IAA individual using Minoans, Iron Age Croatians, and the local Roman Republicans (fig. 3), we found that none of the IAA individuals shows higher affinities with Minoans. Three of them, which clustered close to modern Italians in the PCA (ORD001, ORD014, and SGR003, fig. 1), show higher affinity with the Iron Age Croatian sample (ORD004 followed this pattern too, but with lower f3 values). However, the remaining majority are closest to the Roman Republicans, which can be interpreted as representative of local Iron Age peninsular Italy ancestry, as also indicated by our MDS results.
Fig. 3.
Genetic affinities of Iron Age Apulian samples with the putative populations of origin: Minoans (Minoan_Lassithi), Illyrians (here proxied by the Croatia_EIA individual), and the Roman Republicans (here proxying the autochthonous Iron Age Italian ancestry). (A) Outgroup f3 statistics of IAA samples compared with the putative Daunian sources. Samples have been sorted and grouped according to the source whose f3 values were higher. (B) Heatmap showing the Z score values of f4(putative sources, IAA_P; X, Mbuti), where X is an ancient Italian population and IAA_P is the entire set of Iron Age Apulian samples taken together. Tests with Z scores between −3 and 3 or with less than 5,000 SNPs were not included (P, Palaeolithic; M, Mesolithic; N, Neolithic; C, Chalcolithic; CA, Copper Age; BA, Bronze Age; IA, Iron Age; A, Antiquity).
Genetic affinities of Iron Age Apulian samples with the putative populations of origin: Minoans (Minoan_Lassithi), Illyrians (here proxied by the Croatia_EIA individual), and the Roman Republicans (here proxying the autochthonous Iron Age Italian ancestry). (A) Outgroup f3 statistics of IAA samples compared with the putative Daunian sources. Samples have been sorted and grouped according to the source whose f3 values were higher. (B) Heatmap showing the Z score values of f4(putative sources, IAA_P; X, Mbuti), where X is an ancient Italian population and IAA_P is the entire set of Iron Age Apulian samples taken together. Tests with Z scores between −3 and 3 or with less than 5,000 SNPs were not included (P, Palaeolithic; M, Mesolithic; N, Neolithic; C, Chalcolithic; CA, Copper Age; BA, Bronze Age; IA, Iron Age; A, Antiquity).Moreover, the WHG contribution, which was a necessary component to explain IAA and Roman Republicans according to qpAdm output, is absent from Minoans and Iron Age Croatians, thus making it a putative signature of an at least partial autochthonous origin (fig. 2 and supplementary fig. 7, Supplementary Material online). These results are confirmed for Minoans, but not for Croatia_EIA, by the f4 analyses (fig. 3 and supplementary fig. 8). Indeed, the significantly negative f4(Minoans, IAA; WHG, Mbuti) (fig. 3 and supplementary fig. 8), as well as the f4(Greece_Minoan_Lassithi/Greece_BA_Mycenaean/Greece_BA_Mycenaean_Pylos, IAA; X, Mbuti), reported a significant excess of WHG ancestry (X being Tagliente2, Loschbour, Italy_Central_M, Switzerland_Bichon, and so on) in all IAA individuals, with the exception of ORD004, SAL001, and SGR003 (also the Mediaeval samples ORD010 and SGR001 deviate from this pattern, supplementary data 6, Supplementary Material online). An excess of Bronze Age Steppe-related component, which at that time was already present along the Italian peninsula (Saupe et al. 2021), was clearly present in SGR002, ORD006, and ORD009 by the f4(IAA, Greece_Minoan_Lassithi; X, Mbuti) (supplementary data 6, Supplementary Material online).The excess of WHG ancestry, tentatively suggesting a local origin, is somewhat blurred by the genetic similarity of the two most probable sources—Illyrians (Croatia EIA) and an autochthonous one (Roman Republicans), which together make part of the same Mediterranean continuum. Indeed, although no significantly negative f4(Croatia_EIA, IAA; X, Mbuti) values have been found (with the exception of X being Italy_C_BA, Italy_Iceman_CA, Portugal_MBA, and Anatolia_MLBA in SAL007, which retains the highest proportion of AN in qpAdm in fig. 2 and supplementary fig. 7, Supplementary Material online), the same pattern is obtained when IAA individuals are compared with Roman Republicans (f4(Roman Republicans, IAA; X, Mbuti), supplementary data 6, Supplementary Material online). Imperial individuals from a few centuries later (f4(Rome_Imperial, IAA; X, Mbuti)) show quite the opposite pattern: no positive results have been produced with the exception of ORD004 (X = Lebanon_Roman), SAL001 (X = Italy_Sardinia_Roman_o), SAL010 (X = Russia_LateMaikop), and ORD010. Conversely, the negative f4 values point toward WHG, Neolithic, and Bronze Age Steppe-related ancestries (supplementary data 6, Supplementary Material online).Another signal coming from qpAdm analyses is the apparent excess of CHG ancestry in IAA; however, the predominant contribution of CHG to the Steppe-related ancestry that, by the Iron Age, had already spread to the Mediterranean area makes it hard to properly detect a CHG signature independent from the Steppe wave, possibly brought by pan-Mediterranean influxes. When directly investigated with an f4 framework, IAA shows generally more CHG than Mycenaean, less CHG than contemporary Croatian_EIA and, in some cases (ORD019, SGR002, and the Mediaeval SGR001 with Z scores higher than 2) more CHG than older Croatian samples (_N = Neolithic and _MN = Middle Neolithic) (supplementary fig. 8, Supplementary Material online).
Discussion
The new genomic sequences of Daunian samples reveal that Iron Age (pre-Roman) Southern Italy (Apulia) can be placed within a Pan-Mediterranean genetic continuum that stretches from Crete (Minoans; Lazaridis et al. 2017) and the Levant (Sea People; Feldman, Master, et al. 2019; Molinaro et al. 2019) to the Republican Rome and the Iberian Peninsula (Antonio et al. 2019), mainly composed by AN and IN/CHG genetic features with the addition of WHG and Steppe-related influences in Continental Italy. Pre-Roman Italian populations, being part of this broader landscape, are not directly superimposable with contemporary Italians which instead seem to be influenced by the homogenizing effect of Imperial Roman and late Antiquity events.Within the described Pan-Mediterranean landscape, the IAA/Daunians show a sizeable heterogeneity, comparable to the one of the broader and cosmopolitan Republican and even more so Imperial Roman civilization, and the highest genetic affinity to Republican Romans and Iron Age Croatians, whereas Minoans and other Iron Age Greek samples show absent or reduced WHG contribution when compared with IAA. This makes a Cretan or Arkadian origin less likely, even though some tales have connected them with the Greek hero Diomedes and many ancient historians have claimed such origins for their neighbor Messapians and Peucetians.The Daunians maintained strong commercial and political relations with the Illyrian people, controlling together the area spanning from the Dalmatia to the Gargano peninsula (Kirigin et al. 2010) and had many cultural affinities with them (Nava and Descoeudres 1990). The material culture, involving peculiar anthropomorphic statue stelae, has provided some information on Daunian culture and may also help in unraveling their mysterious origin. In particular, the forearm decorations on a female stela have been interpreted as tattoos and, whereas tattooing practices were considered barbarian among the Greeks (Jones 1987), they were customary in populations from Tracia and Illyria and, more generally, among the women of status from the Balkans (Norman 2011, 2016).It is not clear whether these connections indicate a movement of people or a sharing of cultural ideas and a conclusive answer to the origin of the Daunians remains elusive. From a parsimony perspective, the genetic results point to an autochthonous origin (e.g., a genetic continuity of Daunians with the population that inhabited the area prior to the examined historical period), here mainly marked by the presence of WHG signature, although we cannot exclude additional influences from Croatia (ancient Illyria), as described by available historical sources and by the material remains (De Juliis 1988; Norman 2016).
Materials and Methods
Samples Information and Ethical Statement
The ancient human remains analyzed in this work belong to the collections of the Museum of Anthropology, University of Padua and have been found and cataloged by different archaeological campaigns carried out during the 20th century. Given the samples were kept at the Museum of Anthropology of the University of Padova, we informed the territorially competent Soprintendenza (Soprintendenza Archeologia, Belle Arti e Paesaggio per l’Area metropolitana di Venezia e per le Province di Belluno, Padova e Treviso) about our sampling strategy and scientific research project, which was approved and our communication assigned the following protocol numbers: prot. 27903 (September 10, 2020) and prot. 29131 (November 23, 2020).
Ordona
The necropolis of Herdonia (today’s Ordona, within the Apulian Foggia province in Italy) was studied by different archaeological campaigns interested in the Daunians, Roman as well as the Mediaeval settlements in 1978 and 1981 (Corrain 1986). The human remains collected on such occasions were later extensively cataloged and studied and new paleopathological evidence was also brought back to light (Scaggion and Carrara 2016). Inhabited starting from the Neolithic period, Herdonia became an important Daunian center from the 6th century BCE.For this study, in collaboration with the University of Padova and Museum of Anthropology of Padova, samples of human remains (petrous bone = 19) were taken. The samples ORD001, ORD004, ORD006, ORD009, ORD010, and ORD014 were dated at 14CHRONO Centre for Climate, the Environment, and Chronology in Belfast, United Kingdom (supplementary data 1).
Salapia
The necropolis of Salapia, an ancient town located 10 km from contemporary Cerignola, within the Apulian Foggia province in Italy. The osteological samples were brought by Prof. Santo Tiné to the Museum of Anthropology of the University of Padova with no further information on the archaeological context, and osteological studies were carried out by Cleto Corrain and colleagues in 1971 (Corrain et al. 1972). From this site, samples of 12 human remains (petrous bone = 5, teeth = 7) were taken. The samples SAL001, SAL003, and SAL011 were dated at 14CHRONO Centre for Climate, the Environment, and Chronology in Belfast, United Kingdom (supplementary data 1).
San Giovanni Rotondo
The San Giovanni Rotondo samples come from the osteo-archaeological collection of the Museum of Anthropology of the University of Padova and are not associated to any further record with the exception of a broad “Iron Age” archaeological label and may be part of the samples brought to the Museum by Prof. Santo Tiné in the 1960s. From this site, samples of human remains (petrous bone = 3) were taken. The samples SGR001, SGR002, and SGR003 were dated at 14CHRONO Centre for Climate, the Environment, and Chronology in Belfast, United Kingdom (supplementary data 1).
DNA Extraction
In all ancient DNA work, we strive to reduce the impact of destructive sampling and retain as much information for future researchers as possible. We sample individuals with multiple elements present, ideally an antimere to the sample taken (when possible); we take detailed photographs of material prior to sampling; we sample the minimum amount that is necessary and we use protocols that will allow for different types of biomolecular or chemical analysis (either simultaneously or in the future).All of the laboratory work was performed in dedicated ancient DNA laboratories at the Estonian Biocentre, Institute of Genomics, University of Tartu, Tartu, Estonia. The library quantification and sequencing were performed at the Core Facility of the Institute of Genomics, Tartu, Estonia. The main steps of the laboratory work are detailed below.In total 34 samples from human remains were extracted for DNA analysis (supplementary data 1, Supplementary Material online). The first layer of pars petrous was removed with a sterilized drill bite to avoid exogenous contamination. A 10-mm core of the inner ear was sampled from the pars petrous. The drill bits and core drill were sterilized in between samples with 6% (w/v) bleach followed by distilled water and then ethanol rinse. Root portions of teeth were removed with a sterile drill wheel.The root and the petrous portions were soaked in 6% (w/v) bleach for 5 min. Samples were rinsed three times with 18.2 MΩcm H2O and soaked in 70% (v/v) Ethanol for 2 min. The tubes were shaken during the procedure to dislodge particles. The samples were transferred to a clean paper towel on a rack inside a class IIB hood with the UV light on and allowed to dry for 2–3 h.Afterward, the samples were weighed to calculate the accurate volume of EDTA (20× EDTA [µl] of sample mass [mg]) and Proteinase K (0.5× Proteinase K [µl] of sample mass [mg]). EDTA and Proteinase K were added into PCR-clean 5 ml or 15 ml conical tubes (Eppendorf) along with the samples inside the IIB hood and the tubes were incubated 72 h on a slow shaker at room temperature.The DNA extracts (of root portions and pars petrous portions) were concentrated to 250 µl using the Vivaspin Turbo 15 (Sartorius) and purified in large volume columns (High Pure Viral Nucleic Acid Large Volume Kit, Roche) using 2.5 ml of PB buffer, 1 ml of PE buffer, and 100 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). For the elution of the endogenous DNA, the silica columns were transferred to a collection tube to dry and followed in 1.5 ml DNA lo-bind tubes (Eppendorf) to elute. The samples were incubated with 100 µl EB buffer at 37 °C for 10 min and centrifuged at 13,000 rpm for 2 min. After centrifugation, the silica columns were removed and the samples were stored at −20 °C. Only one extraction was performed per extraction for screening and 30 µl used for libraries.
Library Preparation
Sequencing libraries were built using NEBNext DNA Library Prep Master Mix Set for 454 (E6070, New England Biolabs) and Illumina-specific adaptors (Meyer and Kircher 2010) following established protocols(Meyer and Kircher 2010; Orlando et al. 2013; Malaspinas et al. 2014). The end repair module was implemented using 18.75 μl of water, 7.5 μl of buffer, and 3.75 μl of enzyme mix, incubating at 20°C for 30 min. The samples were purified using 500 μl PB and 650 μl of PE buffer and eluted in 30 μl EB buffer (MinElute PCR Purification Kit, QIAGEN). The adaptor ligation module was implemented using 10 μl of buffer, 5 μl of T4 ligase, and 5 μl of adaptor mix (Meyer and Kircher 2010), incubating at 20°C for 15 min. The samples were purified as in the previous step and eluted in 30 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). The adaptor fill-in module was implemented using 13 μl of water, 5 μl of buffer, and 2 μl of Bst DNA polymerase, incubating at 37°C for 30 min and at 80°C for 20 min. Libraries were amplified using the following PCR set up: 50 μl DNA library, 1× PCR buffer, 2.5 mM MgCl2, 1 mg/ml BSA, 0.2 μM inPE1.0, 0.2 mM dNTP each, 0.1U/μl HGS Taq Diamond, and 0.2 μM indexing primer. Cycling conditions were: 5′ at 94 °C, followed by 18 cycles of 30 s each at 94, 60, and 68 °C, with a final extension of 7 min at 72 °C. The samples were purified and eluted in 35 μl of EB buffer (MinElute PCR Purification Kit, QIAGEN). Three verification steps were implemented to make sure library preparation was successful and to measure the concentration of dsDNA/sequencing libraries—fluorometric quantitation (Qubit, Thermo Fisher Scientific), parallel capillary electrophoresis (Fragment Analyser, Agilent Technologies), and qPCR.
DNA Sequencing
DNA was sequenced using the Illumina NextSeq500/550 High-Output single-end 75 cycle kit. As a norm, 15 samples were sequenced together on one flow cell; additional data was generated for 16 samples to increase coverage (supplementary data 1, Supplementary Material online).
Mapping
Before mapping, the sequences of the adapters, indexes, and poly-G tails occurring due to the specifics of the NextSeq 500 technology were cut from the ends of DNA sequences using cutadapt-1.11 (Martin 2011). Sequences shorter than 30 bp were also removed with the same program to avoid random mapping of sequences from other species. The sequences were aligned to the reference sequence GRCh37 (hs37d5) using Burrows-Wheeler Aligner (BWA 0.7.12) (Li and Durbin 2009) and the command mem with reseeding disabled.After alignment, the sequences were converted to BAM format and only sequences that mapped to the human genome were kept with samtools 1.3 (Li et al. 2009). Afterward, the data from different flow cell lanes were merged and duplicates were removed using picard 2.12 (https://github.com/broadinstitute/picard/tree/2.12.1, last accessed January 24, 2022). Indels were realigned using GATK 3.5 (McKenna et al. 2010) and reads with a mapping quality less than 10 were filtered out using samtools 1.3 (Li et al. 2009).
aDNA Authentication
As a result of degradation over time, aDNA can be distinguished from modern DNA by certain characteristics: short fragments and a high frequency of C=>T substitutions at the 5′ ends of sequences due to cytosine deamination. The program mapDamage2.0 (Jónsson et al. 2013) was used to estimate the frequency of 5′ C=>T transitions. Rates of contamination were estimated on mitochondrial DNA by calculating the percentage of nonconsensus bases at haplogroup-defining positions as detailed in (Jones et al. 2017). Each sample was mapped against the RSRS downloaded from phylotree.org and checked against haplogroup-defining sites for the sample-specific haplogroup (supplementary data 1).Samtools 1.9 (Li et al. 2009) option stats was used to determine the number of final reads, average read length, and average coverage. The final average endogenous DNA content of all individuals (proportion of reads mapping to the human genome) was 10.39% (0.09–38.82%) (supplementary data 1).
Calculating Genetic Sex Estimation
Genetic sex was calculated using the methods described in (Skoglund et al. 2013), estimating the fraction of reads mapping to Y chromosome out of all reads mapping to either X or Y chromosome. Additionally, sex was determined using a method described in (Fu et al. 2016), calculating the X and Y ratio by the division of the coverage by the autosomal coverage. Here, the sex was calculated for samples with a coverage >0.01× and only reads with a mapping quality >10 were counted for the autosomal, X, and Y chromosomes (supplementary data 1).
Determining mtDNA Haplogroups
Mitochondrial DNA haplogroups were determined using HaploGrep2 on the command line. For the determination, the reads were realigned to the reference sequence RSRS and the parameter -rsrs were given to estimate the haplogroups using HaploGrep2 (van Oven and Kayser 2009; Weissensteiner et al. 2016) (supplementary data 1). Subsequently, the identical results between the individuals were checked visually by aligning mapped reads to the reference sequence using samtools 0.1.19 (Li et al. 2009) command tview and confirming the haplogroup assignment in PhyloTree. Additionally, private mutations were noted for further kinship analysis. The polymorphisms were estimated using the online platform of HaploGrep2. Here, the variant calling files (vcf) were uploaded to the online platform and the known polymorphisms in the RSRS were converted to rCRS (supplementary data 1).
Y Chromosome Variant Calling and Haplotyping
A total of 161,140 binary Y chromosome SNPs that have been detected as polymorphic in previous high coverage whole Y chromosome sequencing studies (Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016) were called in nine male individuals with more than 0.01× autosomal coverage using ANGSD-0.916 (Hallast et al. 2015; Karmin et al. 2015; Poznik et al. 2016) –doHaploCall option. A subset of 826 sites had at least one of the nine individuals with a derived allele (supplementary data 1). Basal haplogroup affiliations of each sample were determined by assessing the proportion of derived allele calls in a set of primary haplogroup defining internal branches, as defined in (Karmin et al. 2015), using 1,677 informative sites.
Kinship Analysis
All newly generated individuals with an average human coverage more than 0.03× were selected to assess kinship relationships up to the third degree (Saupe et al. 2021; Wang et al. 2021; Žegarac et al. 2021). We divided the individuals in different groups regarding their geographical area and resulting relationships. First, we were called all individuals using ANGSD-0.916 (Korneliussen et al. 2014) command –doHaploCall to sample a random base for the positions that are present at MAF > 0.1 in the 1000 Genomes GBR population (1000 Genomes Project Consortium et al. 2015) giving a total of 4,045,514 SNPs for autosomal kinship analysis. The ANGSD output files were converted to .tped format and used as an input for kinship analyses with READ (Monroy Kuhn et al. 2018), known to be reliable when working with sequencing depth in the order of 0.1× and routinely used in aDNA studies. First, we tested all individuals together and found one first degree relationship between ORD001 and ORD009 and two second degree relationships between ORD004, SAL007, and SAL011. We tested those relationships using all individuals from Ordona including SAL007 and SAL011 and all individuals from Salapia including ORD004. We compared the estimated mtDNA haplogroups, radiocarbon dates, genetic sexes, average human coverages, and the age of death from the individuals to confirm or dismiss relationships (supplementary data 1 and fig. 1, Supplementary Material online). Based on the outputs, we dismissed the possible second degree relationships.
Data Set Preparation and Preprocessing for Autosomal Analysis
We assembled a genome-wide data set of ancient and modern samples by converting the following data sets, where needed, in PLINK format using convertf from the EIGENSOFT 8.0.0 (Patterson et al. 2006) and merging them together with PLINK 1.9 (Chang et al. 2015): 1) the ancient individuals and the Mbuti Congolese individuals (HGDP; Patterson et al. 2012) from the “1240K” data set from Dr David Reich laboratory (https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data, version 44.3 release, last accessed January 21, 2021), 2) the modern samples from “1240K+HO” of the same release, 3) the Italian Chalcolithic and Bronze Age samples produced by a recent study (Saupe et al. 2021), 4) the genotypes of an Italian hunter-gatherer from the Late Epigravettian site of Riparo Tagliente dated around 17kya (Bortolini et al. 2021), 5) additional genome-wide data of 129 modern-day Italian samples covering the entire country (Raveane et al. 2019), 6) genome-wide data of 217 present-day Apulian individuals (Sallustio et al. 2015), of which 41 individuals (unrelated and filtered for PCA outliers) were used for qpadm and f3 analyses, and 7) 48 haploid genomes representing the Eurasian component of modern Ethiopians (here called Amhara_NAF for “Non African Fraction”; Molinaro et al. 2019). We excluded the ancient samples showing issues in their group assignment (i.e., “Ignore” in the “1240K” data set) or in their DNA quality (e.g., those not containing the string “PASS,” with the exception of the Iceman individual, or those with less than 5,000 SNPs) and one of a pair of related samples. We further refined the list of ancient samples by retaining only those coming from geographical regions relevant for this study (supplementary data 2, Supplementary Material online) (Keller et al. 2012; Gamba et al. 2014; Lazaridis et al. 2014, 2016, 2017; Allentoft et al. 2015; Günther et al. 2015; Jones et al. 2015, 2017; Mathieson et al. 2015, 2018; Olalde et al. 2015, 2018, 2019; Broushaki et al. 2016; Cassidy et al. 2016; Fu et al. 2016; Hofmanová et al. 2016; Kılınç et al. 2016; Martiniano et al. 2016, 2017; Omrak et al. 2016; Schiffels et al. 2016; González-Fortes et al. 2017, 2019; Haber et al. 2017, 2019, 2020; Lipson et al. 2017; Saag et al. 2017, 2019; Schuenemann et al. 2017; Unterländer et al. 2017; van den Brink et al. 2017; Amorim et al. 2018; de Barros Damgaard, Marchi, et al. 2018; de Barros Damgaard, Martiniano, et al. 2018; Fregel et al. 2018; Harney et al. 2018; Krzewińska et al. 2018; Mittnik et al. 2018; Valdiosera et al. 2018; Veeramah et al. 2018; Zalloua et al. 2018; Antonio et al. 2019; Brace et al. 2019; Feldman, Fernández-Domínguez, et al. 2019; Feldman, Master, et al. 2019; Järve et al. 2019; Narasimhan et al. 2019; Sikora et al. 2019; Villalba-Mouco et al. 2019; Wang et al. 2019; Agranat-Tamir et al. 2020; Brunel et al. 2020; Fernandes et al. 2020; Furtwängler et al. 2020; Gokhman et al. 2020; Marcus et al. 2020; Margaryan et al. 2020; Rivollat et al. 2020; Skourtanioti et al. 2020). The same geographical filter was applied on the modern samples from “1240K+HO” and only those flagged as “PASS (genotyping)” have been selected in order to minimize variation due to different genotyping techniques (supplementary data 3, Supplementary Material online) (Patterson et al. 2012; Lazaridis et al. 2014, 2016; Biagini et al. 2019; Jeong et al. 2019). In case of duplicated samples, we selected the item showing the highest number of SNPs among the 1240K SNP set. Finally, we added the newly generated ancient Apulian samples to the merge.We used two different SNPs set for the following analyses. The autosomal “1240K” SNP set (1,150,639 SNPs) was used for the analyses involving just ancient samples, such as the f-statistics and qpAdm analyses (see below). Conversely, the autosomal “HO” SNP set was used for those analyses relying also on modern populations (e.g., PCA and ADMIXTURE). In the latter case, we further refine the “HO” SNP set by removing monomorphic SNPs and those with more than 5% missing count in modern populations with PLINK, thus obtaining 503,062 SNPs.
Principal Component Analysis
We performed PCA using the program smartpca implemented in EIGENSOFT software 8.0.0 (Patterson et al. 2006) with the parameters “lsqproject” and “shrinkmode.” In particular, we projected the ancient samples’ genetic variation onto the principal components inferred from the modern samples of the “1240K+HO” data set. Modern Italian samples coming from Raveane et al. (2019), the modern Apulians from Sallustio et al. (2015), and the Amhara_NAF were also projected.
Admixture
We used the same data set of modern and ancient individuals presented in the PCA to explore the genetic components of each individual. For the analysis, we exploited the model-based algorithm implemented in ADMIXTURE (Alexander et al. 2009) projecting ancient individuals (-P flag) into the genetic structure calculated on the modern data set, due to missing data in the ancient individuals. We performed unsupervised Admixture for K ∈ {2.10} for modern individuals and used the “per-cluster” inferred allele frequencies to project the ancient individuals. We visualized the Q output using R 3.6 (R Core Team 2021).
f4 Statistics
We performed a series of f4 statistics using the program qpDstat (option f4mode) implemented in the software ADMIXTOOLS 7.0.2 (Patterson et al. 2012) in different forms. f4 statistics are computed for a set of four different populations (W, X, Y, Z) and the sign of the resultant Z score indicates the direction of the gene flow: if it is positive, the gene flow occurred either between W and Y or X and Z, whereas if it is negative the gene flow occurred either between W and Z or X and Y). We grouped together ancient published samples from the same cultural/archaeological assemblages (labels used for f3 and f4 analyses can be found in the column “Analysis.Label” of supplementary data 2, Supplementary Material online). Conversely, ancient Apulian samples were both analyzed as individual samples and grouped together according to their archaeological age (Iron Age samples were grouped together under the acronym “IAA,” Iron Age Apulians). About 10 individuals from the Mbuti population from Congo coming from the “1240K” data set were used as an outgroup (population Z).
Outgroup f3 Statistics
To investigate genetic affinities between ancient Apulian samples with other ancient and modern human populations, we performed a series of Outgroup f3 analyses in the form f3(X, Y; outgroup = Mbuti) using the qp3Pop function implemented in ADMIXTOOLS 7.0.2 (Patterson et al. 2012). As above-described, we used the “Analysis.Label” groups for the ancient samples, whereas we used the “Group.Label” already present in the “1240K+HO” annotation file for modern samples.We also computed the distance matrices from the Outgroup f3 results comparing ancient Apulian samples (both Iron and Middle Age samples) with other published ancient samples by subtracting the f3 values from 1 and we performed a classical (metric) MDS with default options using the function cmdscale in the stats package of R (version 4.0.4; R Core Team 2021).
qpAdm
We exploited the qpWave/qpAdm framework within the software ADMIXTOOLS 7.0.2 to investigate the genetic composition of ancient Apulian samples and other ancient samples of interest with respect to human ancestries that contributed to coeval European population in the Mediterranean area. In particular, we tested a number of left sources from 2 to 5 in combination with a list of reference sources (right). We used the options “allsnps = YES” and “inbreed = NO” (the last option was used because some populations were composed of just one haploid individual). We tested four different sets of left sources. The “base” set included Western hunter-gatherers (Loschbour_published.DG), Anatolian Neolithic (26 samples flagged as “Anatolia_N” in the Analysis.Label column of supplementary data 2, Supplementary Material online), Iranian Neolithic (samples Iran_GanjDareh_N), Caucasus hunter-gatherers (KK1.SG and SATP.SG), and Yamnaya (10 Russia_Samara_EBA_Yamnaya samples). To this set of base sources, we separately added as left sources: Minoans (both Greece_Minoan_Lassithi and Greece_Minoan_Odigitria, but taken individually), Amhara_NAF, and Roman Republican samples (Italy_Central_Republic_IA). In each run, we consistently used the same set of right sources (if present in the left set, they were removed from the right): Russia_Ust_Ishim_HG_published.DG (which was always kept as the first of the list), Russia_Kostenki14, Russia_AfontovaGora3, Eastern hunter-gatherers (I0061 = Russia_HG_Karelia, I0124 = Russia_HG_Samara, I0211 = Russia_HG_Karelia, and UzOO77 = Russia_EHG), Spain_ElMiron, Belgium_UP_GoyetQ116_1_published_all, Levant_N (Israel_PPNB), Russia_MA1_HG.SG, Israel_Natufian_published, Czech_Vestonice16, and Caucasus hunter-gatherers.
Ethical Statement
All ancient samples newly sequenced for this study are kept at the Museum of Anthropology of the University of Padova, hence we informed the territorially competent Soprintendenza (Soprintendenza Archeologia, Belle Arti e Paesaggio per l’Area metropolitana di Venezia e per le Province di Belluno, Padova e Treviso) about our sampling strategy and scientific research project, which was approved and our communication assigned the following protocol numbers: prot. 27903 (September 10, 2020) and prot. 29131 (November 23, 2020). We sampled individuals with multiple elements present, ideally an antimere to the sample taken (when possible) and took detailed photographs of material prior to performing the micro destructive sampling. Furthermore, we sampled the minimum amount that is necessary and we used protocols that allow for different types of biomolecular or chemical analysis (either simultaneously or in the future) to be carried out on the same bone material.
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.Click here for additional data file.
Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043
Authors: Rosa Fregel; Fernando L Méndez; Youssef Bokbot; Dimas Martín-Socas; María D Camalich-Massieu; Jonathan Santana; Jacob Morales; María C Ávila-Arcos; Peter A Underhill; Beth Shapiro; Genevieve Wojcik; Morten Rasmussen; André E R Soares; Joshua Kapp; Alexandra Sockell; Francisco J Rodríguez-Santos; Abdeslam Mikdad; Aioze Trujillo-Mederos; Carlos D Bustamante Journal: Proc Natl Acad Sci U S A Date: 2018-06-12 Impact factor: 11.205
Authors: Fabio Sallustio; Sharon N Cox; Grazia Serino; Claudia Curci; Francesco Pesce; Giuseppe De Palma; Aikaterini Papagianni; Dimitrios Kirmizis; Mario Falchi; Francesco P Schena Journal: Eur J Hum Genet Date: 2014-10-08 Impact factor: 4.246
Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis Journal: Nature Date: 2015-10-01 Impact factor: 49.962
Authors: Marc Haber; Claude Doumet-Serhal; Christiana Scheib; Yali Xue; Petr Danecek; Massimo Mezzavilla; Sonia Youhanna; Rui Martiniano; Javier Prado-Martinez; Michał Szpak; Elizabeth Matisoo-Smith; Holger Schutkowski; Richard Mikulski; Pierre Zalloua; Toomas Kivisild; Chris Tyler-Smith Journal: Am J Hum Genet Date: 2017-07-27 Impact factor: 11.025
Authors: Verena J Schuenemann; Alexander Peltzer; Beatrix Welte; W Paul van Pelt; Martyna Molak; Chuan-Chao Wang; Anja Furtwängler; Christian Urban; Ella Reiter; Kay Nieselt; Barbara Teßmann; Michael Francken; Katerina Harvati; Wolfgang Haak; Stephan Schiffels; Johannes Krause Journal: Nat Commun Date: 2017-05-30 Impact factor: 14.919
Authors: Alissa Mittnik; Chuan-Chao Wang; Saskia Pfrengle; Mantas Daubaras; Gunita Zariņa; Fredrik Hallgren; Raili Allmäe; Valery Khartanovich; Vyacheslav Moiseyev; Mari Tõrv; Anja Furtwängler; Aida Andrades Valtueña; Michal Feldman; Christos Economou; Markku Oinonen; Andrejs Vasks; Elena Balanovska; David Reich; Rimantas Jankauskas; Wolfgang Haak; Stephan Schiffels; Johannes Krause Journal: Nat Commun Date: 2018-01-30 Impact factor: 14.919