Hussein Al-Asadi (1,2), Kushal K Dey (2), John Novembre (1,3), Matthew Stephens (2,3).
1. Committee on Evolutionary Biology, University of Chicago, Chicago, IL, USA.
2. Department of Statistics, University of Chicago, Chicago, IL, USA.
3. Department of Human Genetics, University of Chicago, Chicago, IL, USA.
Abstract
MOTIVATION: Quality control plays a major role in the analysis of ancient DNA (aDNA). One key step in this quality control is the assessment of DNA damage: aDNA contains unique signatures of DNA damage that distinguish it from modern DNA, so analyses of damage patterns can help confirm that the DNA sequences obtained are from endogenous aDNA rather than from modern contamination. Predominant signatures of DNA damage include a high frequency of cytosine-to-thymine (C-to-T) substitutions at the ends of fragments, and elevated rates of purines (A and G) before the 5' strand-breaks. Existing QC procedures help assess damage by simply plotting, for each sample, the C-to-T mismatch rate along the read and the composition of bases before the 5' strand-breaks. Here we present a more flexible and comprehensive model-based approach to infer and visualize damage patterns in aDNA, implemented in the R package aRchaic. This approach is based on a 'grade of membership' model (also known as an 'admixture' or 'topic' model) in which each sample has an estimated grade of membership in each of K damage profiles that are themselves estimated from the data.

RESULTS: We illustrate aRchaic on data from several aDNA studies and on modern individuals from the 1000 Genomes Project Consortium (2012). Here, aRchaic clearly distinguishes modern from ancient samples irrespective of DNA extraction, lab and sequencing protocols. Additionally, through an in-silico contamination experiment, we show that the aRchaic grades of membership reflect relative levels of exogenous modern contamination. Together, the outputs of aRchaic provide a concise visual summary of DNA damage patterns, as well as of other processes generating mismatches in the data.

AVAILABILITY AND IMPLEMENTATION: aRchaic is available for download from https://www.github.com/kkdey/aRchaic.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
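To make the 'grade of membership' idea concrete, the following is a minimal Python sketch of such a model fitted by EM to a matrix of mismatch counts. It is not the aRchaic implementation (aRchaic is an R package, and its actual features and fitting procedure are not described here). The function name `fit_gom`, the four mismatch features, the two simulated damage profiles and the sample mixing fractions are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_gom(counts, K, n_iter=200, seed=0):
    """Fit a grade-of-membership (multinomial topic) model by EM.

    counts : (N samples x F mismatch features) array of non-negative counts.
    Returns omega (N x K grades of membership, rows sum to 1),
    theta (K x F damage profiles, rows sum to 1) and the
    log-likelihood (up to a constant) recorded before each update.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    N, F = counts.shape
    omega = rng.dirichlet(np.ones(K), size=N)   # random initial memberships
    theta = rng.dirichlet(np.ones(F), size=K)   # random initial profiles
    loglik = []
    for _ in range(n_iter):
        # Expected feature frequencies per sample under the current fit.
        P = np.maximum(omega @ theta, 1e-300)
        loglik.append(float(np.sum(counts * np.log(P))))
        # EM step written as multiplicative updates from the old parameters.
        R = counts / P
        new_omega = omega * (R @ theta.T)
        new_theta = theta * (omega.T @ R)
        omega = new_omega / new_omega.sum(axis=1, keepdims=True)
        theta = new_theta / new_theta.sum(axis=1, keepdims=True)
    return omega, theta, loglik

# Hypothetical mismatch features and simulated profiles (assumptions).
features = ["C>T near 5' end", "C>T read interior", "G>A near 3' end", "other"]
ancient_profile = np.array([0.55, 0.05, 0.30, 0.10])
modern_profile = np.array([0.10, 0.10, 0.10, 0.70])

# Fraction of each sample's mismatches drawn from the ancient-like profile:
# two ancient-like samples, two modern-like samples, one 50/50 mixture.
ancient_fraction = [1.0, 1.0, 0.0, 0.0, 0.5]

rng = np.random.default_rng(1)
counts = np.array([
    rng.multinomial(5000, f * ancient_profile + (1 - f) * modern_profile)
    for f in ancient_fraction
])

omega, theta, loglik = fit_gom(counts, K=2)
print(np.round(omega, 2))   # estimated grades of membership per sample
print(np.round(theta, 2))   # estimated profiles over the features
```

In this toy setting, a sample simulated as a mixture of the two profiles would be expected to receive intermediate grades of membership, which mirrors the idea that memberships can reflect relative levels of modern contamination. Cluster labels are arbitrary, so which column corresponds to the ancient-like profile depends on the initialization.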
Authors: Helena Malmström; Emma M Svensson; M Thomas P Gilbert; Eske Willerslev; Anders Götherström; Gunilla Holmlund Journal: Mol Biol Evol Date: 2007-01-25 Impact factor: 16.240
Authors: Matthias Meyer; Martin Kircher; Marie-Theres Gansauge; Heng Li; Fernando Racimo; Swapan Mallick; Joshua G Schraiber; Flora Jay; Kay Prüfer; Cesare de Filippo; Peter H Sudmant; Can Alkan; Qiaomei Fu; Ron Do; Nadin Rohland; Arti Tandon; Michael Siebauer; Richard E Green; Katarzyna Bryc; Adrian W Briggs; Udo Stenzel; Jesse Dabney; Jay Shendure; Jacob Kitzman; Michael F Hammer; Michael V Shunkov; Anatoli P Derevianko; Nick Patterson; Aida M Andrés; Evan E Eichler; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo Journal: Science Date: 2012-08-30 Impact factor: 47.728
Authors: Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo Journal: Nature Date: 2013-12-18 Impact factor: 49.962
Authors: Iñigo Olalde; Selina Brace; Morten E Allentoft; Ian Armit; Kristian Kristiansen; Thomas Booth; Nadin Rohland; Swapan Mallick; Anna Szécsényi-Nagy; Alissa Mittnik; Eveline Altena; Mark Lipson; Iosif Lazaridis; Thomas K Harper; Nick Patterson; Nasreen Broomandkhoshbacht; Yoan Diekmann; Zuzana Faltyskova; Daniel Fernandes; Matthew Ferry; Eadaoin Harney; Peter de Knijff; Megan Michel; Jonas Oppenheimer; Kristin Stewardson; Alistair Barclay; Kurt Werner Alt; Corina Liesau; Patricia Ríos; Concepción Blasco; Jorge Vega Miguel; Roberto Menduiña García; Azucena Avilés Fernández; Eszter Bánffy; Maria Bernabò-Brea; David Billoin; Clive Bonsall; Laura Bonsall; Tim Allen; Lindsey Büster; Sophie Carver; Laura Castells Navarro; Oliver E Craig; Gordon T Cook; Barry Cunliffe; Anthony Denaire; Kirsten Egging Dinwiddy; Natasha Dodwell; Michal Ernée; Christopher Evans; Milan Kuchařík; Joan Francès Farré; Chris Fowler; Michiel Gazenbeek; Rafael Garrido Pena; María Haber-Uriarte; Elżbieta Haduch; Gill Hey; Nick Jowett; Timothy Knowles; Ken Massy; Saskia Pfrengle; Philippe Lefranc; Olivier Lemercier; Arnaud Lefebvre; César Heras Martínez; Virginia Galera Olmo; Ana Bastida Ramírez; Joaquín Lomba Maurandi; Tona Majó; Jacqueline I McKinley; Kathleen McSweeney; Balázs Gusztáv Mende; Alessandra Modi; Gabriella Kulcsár; Viktória Kiss; András Czene; Róbert Patay; Anna Endrődi; Kitti Köhler; Tamás Hajdu; Tamás Szeniczey; János Dani; Zsolt Bernert; Maya Hoole; Olivia Cheronet; Denise Keating; Petr Velemínský; Miroslav Dobeš; Francesca Candilio; Fraser Brown; Raúl Flores Fernández; Ana-Mercedes Herrero-Corral; Sebastiano Tusa; Emiliano Carnieri; Luigi Lentini; Antonella Valenti; Alessandro Zanini; Clive Waddington; Germán Delibes; Elisa Guerra-Doce; Benjamin Neil; Marcus Brittain; Mike Luke; Richard Mortimer; Jocelyne Desideri; Marie Besse; Günter Brücken; Mirosław Furmanek; Agata Hałuszko; Maksym Mackiewicz; Artur Rapiński; Stephany Leach; Ignacio Soriano; Katina T Lillios; João Luís Cardoso; Michael Parker Pearson; Piotr Włodarczak; T Douglas Price; Pilar Prieto; Pierre-Jérôme Rey; Roberto Risch; Manuel A Rojo Guerra; Aurore Schmitt; Joël Serralongue; Ana Maria Silva; Václav Smrčka; Luc Vergnaud; João Zilhão; David Caramelli; Thomas Higham; Mark G Thomas; Douglas J Kennett; Harry Fokkens; Volker Heyd; Alison Sheridan; Karl-Göran Sjögren; Philipp W Stockhammer; Johannes Krause; Ron Pinhasi; Wolfgang Haak; Ian Barnes; Carles Lalueza-Fox; David Reich Journal: Nature Date: 2018-02-21 Impact factor: 49.962
Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean Journal: Nature Date: 2012-11-01 Impact factor: 49.962