Daniel M Fernandes, Olivia Cheronet, Pere Gelabert, Ron Pinhasi.
Abstract
The estimation of genetic relatedness between individuals is playing an increasingly important role in the ancient DNA field. In recent years, the number of individuals sequenced from single sites has been increasing, reflecting a growing interest in understanding the familial and social organisation of ancient populations. Although a few methods have been developed specifically for ancient DNA, notably to tackle issues such as low-coverage homozygous data, they require a minimum average genomic coverage of 0.1–1× per analysed pair of individuals. Here we present an updated version of a method that enables estimates of 1st- and 2nd-degree relatedness with as little as 0.026× average coverage, or around 18,000 SNPs from 1.3 million aligned reads per sample with an average length of 62 bp; this is four times less data than 0.1× coverage at similar read lengths. By using simulated data to estimate false positive error rates, we further show that even a threshold as low as 0.012×, or around 4000 SNPs from 600,000 reads, will always show 1st-degree relationships as related. Lastly, by applying this method to published data, we are able to identify previously undocumented relationships using individuals that had been excluded from prior kinship analysis due to their very low coverage. This methodological improvement has the potential to enable relatedness estimation on ancient whole-genome shotgun data during routine low-coverage screening, and therefore to improve project management when decisions need to be made on which individuals are to be further sequenced.
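The coverage figures in the abstract follow from simple arithmetic: average depth is the number of aligned bases divided by the genome length. The sketch below reproduces them under the assumption of a haploid human genome length of about 3.1 Gb; the abstract does not say which reference length was used, so that constant is an assumption.

```python
# Assumed haploid human reference length in bp (~3.1 Gb); not stated in the abstract.
GENOME_SIZE_BP = 3.1e9


def average_coverage(n_reads: int, mean_read_length: float,
                     genome_size: float = GENOME_SIZE_BP) -> float:
    """Average depth = total aligned bases / genome length."""
    return n_reads * mean_read_length / genome_size


def reads_for_coverage(target_coverage: float, mean_read_length: float,
                       genome_size: float = GENOME_SIZE_BP) -> float:
    """Inverse relation: reads needed to reach a target average depth."""
    return target_coverage * genome_size / mean_read_length


if __name__ == "__main__":
    low = average_coverage(1_300_000, 62)     # about 0.026x
    floor = average_coverage(600_000, 62)     # about 0.012x
    reads_01x = reads_for_coverage(0.1, 62)   # about 5 million reads
    print(f"{low:.3f}x {floor:.3f}x {reads_01x:,.0f} reads")
    # How many times more reads 0.1x needs than the 1.3 million-read setting
    print(f"ratio: {reads_01x / 1_300_000:.2f}")
```

With 62 bp reads, 0.1× needs roughly 5 million reads, close to four times the 1.3 million reads behind the 0.026× figure.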
Year: 2021 PMID: 34711884 PMCID: PMC8553948 DOI: 10.1038/s41598-021-00581-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Top 30 results, ordered by HRC, of all 4950 pairwise tests for the 100 individuals from the CHS population, for each of the eight subsampling fractions between 0.5 and 10%. Results are ordered independently for each fraction, so the x-axis order is not expected to match between fractions. Known relationships are shown as filled symbols, and each fraction is shown in a distinct colour. All triangles (2nd degree) are expected to fall within the lighter grey area, and all circles (1st degree) within the darker grey area. Allele frequencies from the CHS population were used.
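With 100 individuals there are 100 × 99 / 2 = 4950 unordered pairs, which is where the number of pairwise tests in Figure 1 comes from. The sketch below enumerates the pairs and keeps the 30 highest HRC values, as would be done separately for each subsampling fraction. The individual IDs and HRC values are hypothetical placeholders, since computing a real coefficient requires the genotype data and the method itself.

```python
from itertools import combinations
from math import comb
import random


def rank_pairs(hrc_by_pair, top_n=30):
    """Return the top_n (pair, HRC) items, highest HRC first."""
    return sorted(hrc_by_pair.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical sample IDs standing in for the 100 CHS individuals.
individuals = [f"IND{i:03d}" for i in range(100)]

# Every unordered pair is tested once: C(100, 2) = 4950 tests.
pairs = list(combinations(individuals, 2))
assert len(pairs) == comb(100, 2) == 4950

# Placeholder coefficients for illustration only; real values would come from the
# relatedness estimate for each pair at a given subsampling fraction.
rng = random.Random(0)
hrc = {pair: rng.gauss(0.0, 0.02) for pair in pairs}

top30 = rank_pairs(hrc)
for (a, b), value in top30[:3]:
    print(a, b, round(value, 4))
```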
Figure 2. Coefficients of the 44 1st- and 2nd-degree relationships from the Neolithic site of Koszyce[15]. Each vertical set of four coloured points corresponds to one pair of individuals tested with different numbers of aligned reads, as indicated in the legend. Relationships to the left of the solid vertical line are known to be 1st-degree, as per Schroeder et al.[15], and those to the right of the line are expected to be 2nd-degree. Grey areas define the 1st- and 2nd-degree range intervals.
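The grey areas in the figures amount to fixed HRC intervals per degree of relatedness. A minimal classifier sketch follows, assuming windows of ±0.0625 around expected HRC values of 0.25 (1st degree) and 0.125 (2nd degree). These cut-offs are not given in this excerpt, but they agree with every degree call in the table of published individuals; the label for values above the 1st-degree range is likewise an assumption.

```python
from bisect import bisect_right

# Assumed HRC class boundaries (hypothetical, not quoted from the paper):
# 2nd degree = [0.0625, 0.1875), 1st degree = [0.1875, 0.3125).
BOUNDARIES = [0.0625, 0.1875, 0.3125]
LABELS = ["unrelated", "2nd", "1st", "above 1st-degree range"]


def classify_hrc(hrc: float) -> str:
    """Map an HRC value to a degree class using half-open intervals [lo, hi)."""
    return LABELS[bisect_right(BOUNDARIES, hrc)]


# A few rows from the table of published individuals: (ID 1, ID 2, HRC, our degree estimate).
TABLE_ROWS = [
    ("N44", "N45", 0.0962, "2nd"),
    ("SB492A3", "SB493A2", 0.2644, "1st"),
    ("VK245", "VK45", 0.1878, "1st"),
    ("VK234", "VK242", 0.0687, "2nd"),
]

if __name__ == "__main__":
    for a, b, hrc, expected in TABLE_ROWS:
        print(a, b, hrc, classify_hrc(hrc), "matches" if classify_hrc(hrc) == expected else "differs")
```

Using half-open intervals means a value that sits exactly on a boundary, such as 0.1875, is assigned to the higher class.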
Application of the method to other published ancient individuals.
| Publication | Individual ID 1 | Individual ID 2 | Associated period | Published degree estimate | Our method’s HRC | SNPs used | Our degree estimate (posterior probability) |
|---|---|---|---|---|---|---|---|
| Fernandes et al. (2017) | N44 | N45 | Chalcolithic | 2nd | 0.0962 | 14,457 | 2nd |
| Brace et al. (2019) | SB492A3 | SB493A2 | Neolithic | 1st | 0.2644 | 25,805 | 1st |
| Saag et al. (2019) | V14 | X05 | Bronze Age | – | 0.2595 | 14,374 | 1st |
| | V16 | X14 | Bronze Age | 2nd | 0.1448 | 12,580 | 2nd |
| Margaryan et al. (2020) | VK234 | VK236 | Viking Age | 1st | 0.2173 | 22,702 | 1st |
| | VK236 | VK25 | Viking Age | 1st | 0.2163 | 22,659 | 1st |
| | VK234 | VK25 | Viking Age | 1st | 0.2071 | 22,753 | 1st |
| | VK245 | VK45 | Viking Age | 1st | 0.1878 | 17,196 | 1st |
| | VK237 | VK239 | Viking Age | – | 0.1265 | 18,032 | 2nd |
| | VK236 | VK242 | Viking Age | 2nd | 0.1214 | 20,349 | 2nd |
| | VK236 | VK238 | Viking Age | 2nd | 0.1208 | 21,321 | 2nd |
| | VK238 | VK242 | Viking Age | 2nd | 0.1188 | 19,694 | 2nd |
| | VK240 | VK245 | Viking Age | 2nd | 0.1178 | 21,463 | 2nd |
| | VK240 | VK45 | Viking Age | 2nd | 0.1175 | 17,850 | 2nd |
| | VK25 | VK44 | Viking Age | 2nd | 0.1155 | 21,530 | 2nd |
| | VK236 | VK44 | Viking Age | 2nd | 0.1125 | 21,293 | 2nd |
| | VK234 | VK44 | Viking Age | 2nd | 0.1113 | 21,216 | 2nd |
| | VK238 | VK44 | Viking Age | 2nd | 0.1108 | 20,458 | 2nd |
| | VK242 | VK44 | Viking Age | 2nd | 0.1065 | 19,489 | 2nd |
| | VK242 | VK25 | Viking Age | 2nd | 0.0970 | 20,903 | 2nd |
| | VK245 | VK46 | Viking Age | 2nd | 0.0895 | 21,696 | 2nd |
| | VK238 | VK25 | Viking Age | 2nd | 0.0813 | 21,947 | 2nd |
| | VK45 | VK46 | Viking Age | 2nd | 0.0805 | 19,269 | 2nd |
| | VK234 | VK238 | Viking Age | 2nd | 0.0759 | 21,797 | 2nd |
| | VK234 | VK242 | Viking Age | 2nd | 0.0687 | 20,617 | 2nd |
BAM files were downloaded from the European Nucleotide Archive, subsampled to a maximum of 1,300,000 reads, and then processed through our pipeline. The allele frequencies used were from individuals with European ancestry in the 1000 Genomes Phase 3 dataset. Estimates in bold are based on fewer than 10,000 SNPs and therefore carry greater uncertainty. For these, we give the posterior probability of each degree in parentheses, and Supplementary Figure 2 shows the corresponding simulated range plots for these pairs.
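The note states only that each BAM was capped at 1,300,000 reads, not how the reads were chosen. One way to draw a uniform random subset of at most that many reads from a stream of unknown length is reservoir sampling (Algorithm R), sketched below over generic read records. This is a swapped-in illustrative technique, not necessarily what the authors' pipeline does; note that `samtools view -s` subsamples by fraction rather than by a fixed count.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

MAX_READS = 1_300_000  # cap stated in the table note


def subsample_reads(reads: Iterable[T], max_reads: int = MAX_READS, seed: int = 42) -> List[T]:
    """Uniformly sample at most max_reads records from a stream (Algorithm R).

    If the stream holds max_reads records or fewer, all of them are kept in order.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, read in enumerate(reads):
        if i < max_reads:
            # Fill the reservoir with the first max_reads records.
            reservoir.append(read)
        else:
            # Record i replaces a random slot with probability max_reads / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_reads:
                reservoir[j] = read
    return reservoir


if __name__ == "__main__":
    # Hypothetical read names standing in for records parsed from a BAM file.
    fake_reads = (f"read_{n}" for n in range(2_000_000))
    kept = subsample_reads(fake_reads)
    print(len(kept))
```

Because it makes a single pass and holds only the reservoir in memory, this approach does not need the total read count in advance, which a fraction-based subsampler would.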
Figure 3. Coefficient distribution ranges for (a) 500 and (b) 5000 pairs of simulated individuals using different numbers of SNPs, based on Phase 1 CHS allele frequencies, showing overlaps between classes and how the curves narrow towards the hard thresholds between classes at higher SNP numbers. (c) False positive rates, identified as simulated relationships crossing the thresholds between classes. From 30,000 SNPs upwards no overlap was observed, even with up to 5000 simulated pairs of individuals, although simulating more pairs would eventually produce an overlap, with error rates tending further towards 0.
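Panel (c) counts simulated relationships whose coefficient crosses a class threshold. The sketch below computes that crossing rate for simulated 1st-degree pairs, assuming a 1st-degree window of [0.1875, 0.3125) on the HRC scale. The simulator is a toy stand-in, not the paper's allele-frequency-based simulation: it adds Gaussian noise whose standard deviation shrinks as 1/sqrt(number of SNPs), with an arbitrary per-SNP constant, only to show why wider distributions at low SNP counts produce more crossings.

```python
import random
from typing import List, Sequence

# Assumed 1st-degree window and expected value on the HRC scale (hypothetical).
FIRST_DEGREE = (0.1875, 0.3125)
EXPECTED_FIRST = 0.25


def crossing_rate(simulated: Sequence[float], lo: float, hi: float) -> float:
    """Fraction of simulated coefficients lying outside [lo, hi)."""
    outside = sum(1 for v in simulated if not (lo <= v < hi))
    return outside / len(simulated)


def toy_simulated_hrc(n_snps: int, n_pairs: int, per_snp_sd: float = 2.0,
                      seed: int = 1) -> List[float]:
    """Toy stand-in for simulated 1st-degree pairs: expected value plus Gaussian
    noise with SD = per_snp_sd / sqrt(n_snps). per_snp_sd is arbitrary."""
    rng = random.Random(seed)
    sd = per_snp_sd / n_snps ** 0.5
    return [rng.gauss(EXPECTED_FIRST, sd) for _ in range(n_pairs)]


if __name__ == "__main__":
    for n in (1_000, 5_000, 30_000):
        sims = toy_simulated_hrc(n, 5000)
        print(n, crossing_rate(sims, *FIRST_DEGREE))
```

As in panel (c), an observed crossing rate of zero only bounds the true error rate from above by roughly 1/(number of simulated pairs); it does not prove that the rate is exactly 0.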