Literature DB >> 23613487

mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.

Hákon Jónsson¹, Aurélien Ginolhac, Mikkel Schubert, Philip L F Johnson, Ludovic Orlando.

Abstract

MOTIVATION: Ancient DNA (aDNA) molecules in fossilized bones and teeth, coprolites, sediments, mummified specimens and museum collections represent fantastic sources of information for evolutionary biologists, revealing the agents of past epidemics and the dynamics of past populations. However, the analysis of aDNA generally faces two major issues. Firstly, sequences consist of a mixture of endogenous and various exogenous backgrounds, mostly microbial. Secondly, high nucleotide misincorporation rates can be observed as a result of severe post-mortem DNA damage. Such misincorporation patterns are instrumental to authenticate ancient sequences versus modern contaminants. We recently developed the user-friendly mapDamage package that identifies such patterns from next-generation sequencing (NGS) sequence datasets. The absence of formal statistical modeling of the DNA damage process, however, precluded rigorous quantitative comparisons across samples.
RESULTS: Here, we describe mapDamage 2.0 that extends the original features of mapDamage by incorporating a statistical model of DNA damage. Assuming that damage events depend only on sequencing position and post-mortem deamination, our Bayesian statistical framework provides estimates of four key features of aDNA molecules: the average length of overhangs (λ), nick frequency (ν) and cytosine deamination rates in both double-stranded regions ( ) and overhangs ( ). Our model enables rescaling base quality scores according to their probability of being damaged. mapDamage 2.0 handles NGS datasets with ease and is compatible with a wide range of DNA library protocols. AVAILABILITY: mapDamage 2.0 is available at ginolhac.github.io/mapDamage/ as a Python package and documentation is maintained at the Centre for GeoGenetics Web site (geogenetics.ku.dk/publications/mapdamage2.0/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Entities: Chemical Disease Species

Mesh：

Substances：
Cytosine

Year: 2013 PMID： 23613487 PMCID： PMC3694634 DOI： 10.1093/bioinformatics/btt193

Source DB: PubMed Journal: Bioinformatics ISSN： 1367-4803 Impact factor: 6.937

1 INTRODUCTION

DNA in historical samples is subject to a plethora of environmental conditions and degradation reactions (Sawyer ). Abasic sites, strand breaks, interstrand cross-links and a wide diversity of atypic nucleotidic bases are formed following oxidative and hydrolytic degradation (Lindahl, 1993; Pääbo ), even in the most favorable preservation conditions. Post-mortem DNA damage limits our ability to access ancient DNA (aDNA) sequences and increases the risk of exogenous modern contamination, as undamaged DNA molecules are more prone to enzymatic manipulation. Nucleotide misincorporation patterns, which are mostly driven by deaminated forms of cytosines (uracils), have been suggested as a powerful approach to authenticate aDNA sequences generated on next-generation sequencing (NGS) platforms (Briggs ) and motivated the creation of the mapDamage package (Ginolhac ). Such patterns could vary according to the specific molecular approach used for constructing (Meyer ) and/or amplifying (Ginolhac ) second-generation DNA libraries. For instance, for one of the most popular protocols (Meyer and Kircher, 2010), we observe inflated cytosine deamination rates at 5′-overhangs, an increase in substitution rates toward sequencing starts and complementary increase in rates toward reads ends (Briggs ). Conversely, a novel procedure targeting single-stranded templates has shown elevated substitution rates at both ends (Meyer ). Statistical modeling of such patterns has been developed by Briggs with strand break, overhangs and cytosine deamination as key factors. Using read alignment to reference genomes and maximum likelihood optimization, this approach has delivered the first quantitative estimates of damage parameters. However, the likelihood framework originally implemented scales poorly with the size of NGS datasets, and extensive running times have prevented common usage. Here, we present an extension of mapDamage, which implements a fast approximation of the DNA damage model using a Bayesian framework. mapDamage 2.0 opens the possibility of comparing DNA damage levels across temporal and environmental gradients. Posterior distributions of damage parameters also enable penalizing the quality score of likely damaged bases, reducing noise in downstream single-nucleotide polymorphism (SNP) calling procedures.

2 APPROACH

Here we build on the DNA damage model described in Briggs . We make the simplifying assumption that mutations and post-mortem DNA damage are independent within a fragment, with occurrences depending only on the relative position from the sequence ends.

3 METHODS

The general idea is to mutate bases following an Hasegawa, Kishino and Yano (HKY) transition matrix (Hasegawa ) and then independently add post-mortem damage on top of mutated bases. In this framework, we have multinomial distributions describing the position-specific substitutions for any given base ( and ). Θ is the HKY transition matrix, and is defined as the DNA damage transition matrix. We assume post-mortem cytosine deamination is the main driver of nucleotide misincorporations in agreement with experimental evidence (Briggs ), providing Where the base-specific damage probabilities are defined as The motivation for the base-specific damage probabilities is best explained by the Markov chain in Figure 1 where the first jump decides if the position is before or after a nick; then a substitution could be observed following deamination in overhang or double-stranded DNA regions. A similar Markov chain could be drawn for substitutions (Supplementary Section 1).

Fig. 1.

A schematic view describing the DNA damage Markov chain, which extends the DNA substitution model. The states and correspond to the final nucleotides in the sequences

A schematic view describing the DNA damage Markov chain, which extends the DNA substitution model. The states and correspond to the final nucleotides in the sequences For rescaling base quality scores, we assume that and substitutions either originate from true biological differences or from damage driven misincorporations. We can derive an estimate for the probability that a (similar for ) misincorporation at position i along the reads is due to damage using We can now correct base quality scores provided in alignment BAM files [ at position i for read r] using

4 DISCUSSION

We applied mapDamage2.0 on a series of aDNA sequence datasets generated from a range of periods, source materials and environments (Supplementary Section 3). Posterior predictive intervals and empirical frequencies are in general agreement, as shown for the ancient plague dataset (Supplementary Table S2 and Supplementary Figs S4–S9) (Schuenemann ), demonstrating the adequacy of our method. We observed a ratio of cytosine deamination rates for double- and single-stranded regions orders of magnitude greater than estimates based on in vitro experiments in aqueous solution (0.007 in Lindahl, 1993 versus 0.026–0.070 for Schuenemann in Supplementary Table S1). This suggests that tissue- and sample-specific micro-environmental characteristics drive different DNA damage kinetics in situ. We also found a significant rank correlation between the posterior mean for single-stranded cytosine deamination and sample age (Supplementary Table S3) in agreement with Sawyer . However, remains of similar age and location showed diverse parameter estimates (Supplementary Table S2), suggesting a prominent role of micro-environmental characteristics over age in diagenesis. We also applied our quality rescaling scheme to the sequence data of an Australian Aboriginal individual who died in 1920s (Rasmussen ). This increased the overlap of genotype calls to dbSNP v137, suggesting that lower false-positive SNP calls were achieved (Supplementary Table S4).

5 CONCLUSION

We have developed a computational method for inferring aDNA damage parameters from NGS sequence datasets, with minimal changes to the DNA damage model presented by Briggs . Our model is compatible with the specificities of different sequencing and library building protocols. We believe that downscaling quality scores of likely damaged bases is the first from a long list of possible applications for damage parameter posterior distributions, limiting the impact of nucleotide misincorporations in downstream sequence analyses. The knowledge of such distributions could also be instrumental for improving mapping procedures to reference genomes (Schubert ).

11 in total

1. Illumina sequencing library preparation for highly multiplexed target capture and sequencing.

Authors: Matthias Meyer; Martin Kircher
Journal: Cold Spring Harb Protoc Date: 2010-06

Review 2. Genetic analyses from ancient DNA.

Authors: Svante Pääbo; Hendrik Poinar; David Serre; Viviane Jaenicke-Despres; Juliane Hebler; Nadin Rohland; Melanie Kuch; Johannes Krause; Linda Vigilant; Michael Hofreiter
Journal: Annu Rev Genet Date: 2004 Impact factor: 16.830

3. Patterns of damage in genomic DNA sequences from a Neandertal.

Authors: Adrian W Briggs; Udo Stenzel; Philip L F Johnson; Richard E Green; Janet Kelso; Kay Prüfer; Matthias Meyer; Johannes Krause; Michael T Ronan; Michael Lachmann; Svante Pääbo
Journal: Proc Natl Acad Sci U S A Date: 2007-08-21 Impact factor: 11.205

4. A high-coverage genome sequence from an archaic Denisovan individual.

Authors: Matthias Meyer; Martin Kircher; Marie-Theres Gansauge; Heng Li; Fernando Racimo; Swapan Mallick; Joshua G Schraiber; Flora Jay; Kay Prüfer; Cesare de Filippo; Peter H Sudmant; Can Alkan; Qiaomei Fu; Ron Do; Nadin Rohland; Arti Tandon; Michael Siebauer; Richard E Green; Katarzyna Bryc; Adrian W Briggs; Udo Stenzel; Jesse Dabney; Jay Shendure; Jacob Kitzman; Michael F Hammer; Michael V Shunkov; Anatoli P Derevianko; Nick Patterson; Aida M Andrés; Evan E Eichler; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal: Science Date: 2012-08-30 Impact factor: 47.728

Review 5. Instability and decay of the primary structure of DNA.

Authors: T Lindahl
Journal: Nature Date: 1993-04-22 Impact factor: 49.962

6. Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death.

Authors: Verena J Schuenemann; Kirsten Bos; Sharon DeWitte; Sarah Schmedes; Joslyn Jamieson; Alissa Mittnik; Stephen Forrest; Brian K Coombes; James W Wood; David J D Earn; William White; Johannes Krause; Hendrik N Poinar
Journal: Proc Natl Acad Sci U S A Date: 2011-08-29 Impact factor: 11.205

7. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.

Authors: M Hasegawa; H Kishino; T Yano
Journal: J Mol Evol Date: 1985 Impact factor: 2.395

8. An Aboriginal Australian genome reveals separate human dispersals into Asia.

Authors: Morten Rasmussen; Xiaosen Guo; Yong Wang; Kirk E Lohmueller; Simon Rasmussen; Anders Albrechtsen; Line Skotte; Stinus Lindgreen; Mait Metspalu; Thibaut Jombart; Toomas Kivisild; Weiwei Zhai; Anders Eriksson; Andrea Manica; Ludovic Orlando; Francisco M De La Vega; Silvana Tridico; Ene Metspalu; Kasper Nielsen; María C Ávila-Arcos; J Víctor Moreno-Mayar; Craig Muller; Joe Dortch; M Thomas P Gilbert; Ole Lund; Agata Wesolowska; Monika Karmin; Lucy A Weinert; Bo Wang; Jun Li; Shuaishuai Tai; Fei Xiao; Tsunehiko Hanihara; George van Driem; Aashish R Jha; François-Xavier Ricaut; Peter de Knijff; Andrea B Migliano; Irene Gallego Romero; Karsten Kristiansen; David M Lambert; Søren Brunak; Peter Forster; Bernd Brinkmann; Olaf Nehlich; Michael Bunce; Michael Richards; Ramneek Gupta; Carlos D Bustamante; Anders Krogh; Robert A Foley; Marta M Lahr; Francois Balloux; Thomas Sicheritz-Pontén; Richard Villems; Rasmus Nielsen; Jun Wang; Eske Willerslev
Journal: Science Date: 2011-09-22 Impact factor: 47.728

9. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA.

Authors: Susanna Sawyer; Johannes Krause; Katerina Guschanski; Vincent Savolainen; Svante Pääbo
Journal: PLoS One Date: 2012-03-30 Impact factor: 3.240

10. Improving ancient DNA read mapping against modern reference genomes.

Authors: Mikkel Schubert; Aurelien Ginolhac; Stinus Lindgreen; John F Thompson; Khaled A S Al-Rasheid; Eske Willerslev; Anders Krogh; Ludovic Orlando
Journal: BMC Genomics Date: 2012-05-10 Impact factor: 3.969

370 in total

1. Ancient mitochondrial genome reveals trace of prehistoric migration in the east Pamir by pastoralists.

Authors: Chao Ning; Shizhu Gao; Boping Deng; Hongxiang Zheng; Dong Wei; Haoze Lv; Hongjie Li; Li Song; Yong Wu; Hui Zhou; Yinqiu Cui
Journal: J Hum Genet Date: 2015-10-29 Impact factor: 3.172

2. Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome.

Authors: Lara M Cassidy; Rui Martiniano; Eileen M Murphy; Matthew D Teasdale; James Mallory; Barrie Hartwell; Daniel G Bradley
Journal: Proc Natl Acad Sci U S A Date: 2015-12-28 Impact factor: 11.205

3. Measles virus and rinderpest virus divergence dated to the sixth century BCE.

Authors: Ariane Düx; Sebastian Lequime; Philippe Lemey; Sébastien Calvignac-Spencer; Livia Victoria Patrono; Bram Vrancken; Sengül Boral; Jan F Gogarten; Antonia Hilbig; David Horst; Kevin Merkel; Baptiste Prepoint; Sabine Santibanez; Jasmin Schlotterbeck; Marc A Suchard; Markus Ulrich; Navena Widulin; Annette Mankertz; Fabian H Leendertz; Kyle Harper; Thomas Schnalke
Journal: Science Date: 2020-06-19 Impact factor: 47.728

4. Ancient DNA and human history.

Authors: Montgomery Slatkin; Fernando Racimo
Journal: Proc Natl Acad Sci U S A Date: 2016-06-06 Impact factor: 11.205

5. Ancient mtDNA sequences from the First Australians revisited.

Authors: Tim H Heupink; Sankar Subramanian; Joanne L Wright; Phillip Endicott; Michael Carrington Westaway; Leon Huynen; Walther Parson; Craig D Millar; Eske Willerslev; David M Lambert
Journal: Proc Natl Acad Sci U S A Date: 2016-06-06 Impact factor: 11.205

6. Parallel detection of ancient pathogens via array-based DNA capture.

Authors: Kirsten I Bos; Günter Jäger; Verena J Schuenemann; Åshild J Vågene; Maria A Spyrou; Alexander Herbig; Kay Nieselt; Johannes Krause
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-01-19 Impact factor: 6.237

7. Partial uracil-DNA-glycosylase treatment for screening of ancient DNA.

Authors: Nadin Rohland; Eadaoin Harney; Swapan Mallick; Susanne Nordenfelt; David Reich
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-01-19 Impact factor: 6.237

Review 8. Reconstructing ancient genomes and epigenomes.

Authors: Ludovic Orlando; M Thomas P Gilbert; Eske Willerslev
Journal: Nat Rev Genet Date: 2015-06-09 Impact factor: 53.242

9. Genome-wide ancestry of 17th-century enslaved Africans from the Caribbean.

Authors: Hannes Schroeder; María C Ávila-Arcos; Anna-Sapfo Malaspinas; G David Poznik; Marcela Sandoval-Velasco; Meredith L Carpenter; José Víctor Moreno-Mayar; Martin Sikora; Philip L F Johnson; Morten Erik Allentoft; José Alfredo Samaniego; Jay B Haviser; Michael W Dee; Thomas W Stafford; Antonio Salas; Ludovic Orlando; Eske Willerslev; Carlos D Bustamante; M Thomas P Gilbert
Journal: Proc Natl Acad Sci U S A Date: 2015-03-09 Impact factor: 11.205

Review 10. Almost 20 years of Neanderthal palaeogenetics: adaptation, admixture, diversity, demography and extinction.

Authors: Federico Sánchez-Quinto; Carles Lalueza-Fox
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-01-19 Impact factor: 6.237