Tatiana R Feuerborn, Eleftheria Palkopoulou, Tom van der Valk, Johanna von Seth, Arielle R Munters, Patrícia Pečnerová, Marianne Dehasque, Irene Ureña, Erik Ersmark, Vendela Kempe Lagerholm, Maja Krzewińska, Ricardo Rodríguez-Varela, Anders Götherström, Love Dalén, David Díez-Del-Molino.
Abstract
BACKGROUND: After over a decade of developments in field collection, laboratory methods and advances in high-throughput sequencing, contamination remains a key issue in ancient DNA research. Currently, human and microbial contaminant DNA still impose challenges on cost-effective sequencing and accurate interpretation of ancient DNA data.
Keywords: Ancient DNA; Competitive mapping; DNA contamination removal; Palaeogenomics
Year: 2020 PMID: 33256612 PMCID: PMC7708127 DOI: 10.1186/s12864-020-07229-y
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Mapping statistics for target, non-target and human references. a Right panel, percentage of reads from each sample mapping to each of the three reference genomes. Left panel, same as before but zoomed to percentages below 1.2%. b Proportion of reads from the faunal BAM file that mapped to the human part of the concatenated reference genome
Fig. 2 Schematic view of the competitive mapping analyses. FASTQ files represent ‘raw’ sequencing files and BAM files represent alignments to a reference genome. Color boxes indicate different types of data: blue, files that need further processing; red, discarded data; and green, data for downstream analyses. a Schematic view of the analyses performed in this manuscript. An example using a mammoth sample is shown. First, normal mapping to the elephant, human and dog references to check for endogenous content as well as non-target and human contamination in the sequencing files. Second, competitive mapping to a concatenated reference of elephant and human to detect human contamination in the alignments. Third, normal mapping of the human data to the elephant reference to check that the human contaminant sequences map preferentially to conserved regions of the genome. b Schematic view of a typical competitive mapping pipeline using a mammoth sample as an example. After competitive mapping, only the sequences mapping to the elephant part of the concatenated reference are used for downstream analyses
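The filtering step in panel b of Fig. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes the alignments are available as SAM text lines, and that human contigs in the concatenated reference carry a hypothetical `human_` name prefix. The function name `split_competitive` and the optional `min_mapq` threshold are also assumptions made for this example.

```python
from collections import Counter

# Hypothetical naming convention: human contigs in the concatenated
# reference are assumed to be prefixed with "human_".
HUMAN_PREFIX = "human_"


def split_competitive(sam_lines, human_prefix=HUMAN_PREFIX, min_mapq=0):
    """Partition SAM records by the part of the concatenated reference they hit.

    Returns (kept, discarded, counts): records on target contigs are kept for
    downstream analyses; unmapped, low-MAPQ and human-mapped records are
    discarded.
    """
    kept, discarded = [], []
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & 0x4 or rname == "*":  # 0x4 is the SAM "unmapped" flag bit
            counts["unmapped"] += 1
            discarded.append(line)
        elif mapq < min_mapq:  # optional filter (an assumption of this sketch)
            counts["low_mapq"] += 1
            discarded.append(line)
        elif rname.startswith(human_prefix):  # read aligned to the human part
            counts["human"] += 1
            discarded.append(line)
        else:  # read aligned to the target (e.g. elephant) part
            counts["target"] += 1
            kept.append(line)
    return kept, discarded, counts
```

From the returned counts, the proportion of mapped reads assigned to the human part of the concatenated reference could be computed as `counts["human"] / (counts["human"] + counts["target"])`, provided at least one read mapped.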
Fig. 3 Characterization of endogenous and human contaminant reads in faunal BAM files. a Comparisons of PMDR and mRL for all mammoth samples. b mRL for mammoth sequences mapping to the elephant or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 313.5, p-value = 0.00223). c PMDR for mammoth sequences mapping to the elephant or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 397, p-value = 1.016e-10). d Comparisons of PMDR and mRL for all ancient dog samples. e mRL for dog sequences mapping to the dog or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 1929, p-value = 1.251e-08). f PMDR for dog sequences mapping to the dog or the human parts of the concatenated reference (Wilcoxon rank-sum test, W = 1743, p-value = 1.511e-05). In all cases, **: p-value < 0.01 and ****: p-value < 0.0001
Fig. 4 Data lost per sample after competitive mapping. Fraction of data lost in each sample at genome-wide level and only in conserved regions. Colors indicate different species
Fig. 5 Proportions of sequences mapping to human, target and non-target references from the FASTQ and BAM files. a Correlation between the proportion of reads mapping to human and to the non-target species in the raw FASTQ sequencing files (r² = 0.81, F = 303.8, p-value < 2.2e-16). b No correlation between the proportion of reads mapping to human in the raw FASTQ sequencing files and the proportion of reads mapping to human from the faunal BAM file (r² = 0.01, F = 1.67, p-value = 0.2). c Correlation between the number of reads mapping to human in the raw FASTQ sequencing files and the number of reads mapping to human from the faunal BAM file (r² = 0.15, F = 13.5, p-value < 2e-16)