Nathan Nakatsuka, Éadaoin Harney, Swapan Mallick, Matthew Mah, Nick Patterson, David Reich.
Abstract
We report a method called ContamLD for estimating autosomal ancient DNA (aDNA) contamination by measuring the breakdown of linkage disequilibrium in a sequenced individual due to the introduction of contaminant DNA. ContamLD leverages the idea that contaminants should have haplotypes uncorrelated to those of the studied individual. Using simulated data, we confirm that ContamLD accurately infers contamination rates with low standard errors: for example, less than 1.5% standard error in cases with less than 10% contamination and 500,000 sequences covering SNPs. This method is optimized for application to aDNA, taking advantage of characteristic aDNA damage patterns to provide calibrated contamination estimates, and is available at https://github.com/nathan-nakatsuka/ContamLD.
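The idea that contamination breaks down linkage disequilibrium can be illustrated with a toy two-SNP mixture model. This is only a sketch under stated assumptions, not ContamLD's actual likelihood: the function names, the haplotype frequencies, and the single mixing weight `c` (the assumed probability that a pair of reads behaves as if the two alleles were drawn independently) are all hypothetical.

```python
import math

def pair_probabilities(hap_freqs, c):
    """Probability of each allele pair at two linked SNPs (toy model).

    hap_freqs: dict mapping (allele_a, allele_b) -> panel haplotype frequency.
    c: assumed mixing weight for the "uncorrelated" component (hypothetical).
    """
    # Marginal allele frequencies at each SNP.
    p_a = {a: sum(f for (x, _), f in hap_freqs.items() if x == a) for a in (0, 1)}
    p_b = {b: sum(f for (_, y), f in hap_freqs.items() if y == b) for b in (0, 1)}
    # Mixture of the linked (haplotype) term and the independent term.
    return {
        (a, b): (1 - c) * hap_freqs[(a, b)] + c * p_a[a] * p_b[b]
        for (a, b) in hap_freqs
    }

def log_likelihood(counts, hap_freqs, c):
    """Log-likelihood of observed allele-pair counts for a given c."""
    probs = pair_probabilities(hap_freqs, c)
    return sum(n * math.log(probs[k]) for k, n in counts.items() if n > 0)

def estimate_mixing(counts, hap_freqs, steps=1000):
    """Grid-search maximum-likelihood estimate of c on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda c: log_likelihood(counts, hap_freqs, c))

# Hypothetical panel with strong LD between the two SNPs.
panel = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Expected pair counts if 10% of pairs are uncorrelated.
counts = {k: 1000 * p for k, p in pair_probabilities(panel, 0.1).items()}
estimate = estimate_mixing(counts, panel)
```

Using expected counts keeps the example deterministic. With real reads the counts would be noisy and the estimate would carry a standard error.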
Keywords: Ancient DNA; Autosomal DNA; Contamination; Linkage disequilibrium; Nuclear DNA
Year: 2020 PMID: 32778142 PMCID: PMC7418405 DOI: 10.1186/s13059-020-02111-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 ContamLD estimates when the target individual, contaminant, and haplotype panel are from the CEU population. Contamination estimates when the simulated contamination rate is between 0.00 and 0.15. a Estimates with damage-restricted correction (option 1). b Estimates with an external correction from an uncontaminated sample (option 2). The black dotted line is y = x, which would correspond to a perfect estimate of contamination. Error bars are 1.96*standard error (95% confidence interval determined via jackknife resampling across chromosomes)
Fig. 2 Genetic divergence between uncontaminated individual and contamination sources or haplotype panels impacts ContamLD estimates. a Ancient Iberian (I3756, 1.02× coverage) contaminated with CEU with haplotype panels generated from CEU, TSI, CHB, and YRI populations. b Contamination estimates from the same ancient Iberian contaminated with TSI, CHB, or YRI and analyzed with a CEU panel; from an ancient East Asian (DA362.SG, 1.10× coverage) contaminated with CEU and analyzed with a CHB panel; and from an ancient South African (I9028.SG, 1.21× coverage) contaminated with CEU and analyzed with a YRI panel. The black dotted line is y = x, corresponding to a perfect estimate of contamination. All estimates use the damage-restricted correction (option 1)
Fig. 3 ContamLD estimates for ancient European samples of different coverages after damage-restricted correction (option 1). An ancient Iberian of 0.46× coverage, an ancient Hungarian of 0.27× coverage, and an ancient Ukrainian of 0.015× coverage (~16,000 SNPs) were contaminated with CEU and analyzed using a CEU panel with ContamLD option 1 (damage-restricted correction). The black dotted line is y = x. Error shading is 1.96*standard error (95% confidence interval)
Fig. 4 Contamination estimates with ContamLD and ANGSD for ancient individuals with different levels of contamination added. Sixty-five ancient individuals with average coverage over 0.5× had increasing levels of artificial contamination added (from I10895, an ~1200 BP ancient West Eurasian individual) and were then analyzed with ContamLD (with panels most genetically similar to the ancient individual and using damage-restricted correction, option 1) and ANGSD. Details of all estimates (including standard errors) are provided in Additional file 3: Table S2. The black dotted line is y = x, which would correspond to a perfect estimate of the contamination
Fig. 5 Contamination estimates from ContamLD, ANGSD, and ContamMix in 439 ancient individuals of variable ancestry. ANGSD estimates (method 1) are plotted on the x-axis, and on the y-axis are either a ContamMix or b ContamLD estimates. In red are samples that were flagged in ContamLD as “Very_High_Contamination” based on having uncorrected estimates over 15%. All ContamLD estimates below 0 were set to 0
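The two post-processing rules stated in this caption (flagging on the uncorrected estimate, and setting negative estimates to 0) can be sketched as follows. The function name and its return shape are hypothetical. Only the flag label and the 15% threshold come from the caption, and applying the floor to the corrected estimate is an assumption.

```python
VERY_HIGH_THRESHOLD = 0.15  # 15%, as stated in the caption

def postprocess_estimate(uncorrected, corrected, threshold=VERY_HIGH_THRESHOLD):
    """Return (plotted_estimate, flag) for one individual.

    uncorrected: contamination estimate before any correction.
    corrected: estimate after correction (assumed to be the plotted value).
    """
    # The flag is based on the uncorrected estimate exceeding the threshold.
    flag = "Very_High_Contamination" if uncorrected > threshold else None
    # Negative estimates are set to 0 for plotting.
    plotted = max(corrected, 0.0)
    return plotted, flag
```

For instance, `postprocess_estimate(0.02, -0.01)` returns `(0.0, None)`, while an individual with an uncorrected estimate of 0.20 is flagged regardless of its corrected value.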