J Víctor Moreno-Mayar1,2,3, Thorfinn Sand Korneliussen4, Jyoti Dalal1,2, Gabriel Renaud4, Anders Albrechtsen5, Rasmus Nielsen4,6,7, Anna-Sapfo Malaspinas1,2.
1. Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.
2. Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
3. National Institute of Genomic Medicine (INMEGEN), 14610 Mexico City, Mexico.
4. Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, 1350 Copenhagen.
5. Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen, Denmark.
6. Department of Statistics, CA 94720, USA.
7. Department of Integrative Biology, University of California, Berkeley, CA 94720, USA.
Abstract
MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field, where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use, and few are aimed at low-depth nuclear data, a common feature of aDNA datasets.
RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or when error rates are increased. With a running time below 5 min, our method is applicable to large-scale aDNA genomic studies.
AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ and R and is available at github.com/sapfo/contaminationX and popgen.dk/angsd.
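The intuition behind an X-chromosome-based estimate can be illustrated with a toy calculation. Males carry a single X chromosome, so at a known polymorphic site every endogenous read should show the same allele; reads showing another allele come from either contamination or sequencing error. The Python sketch below is not the authors' estimator and is not taken from contaminationX or ANGSD. It assumes a deliberately simplified model in which the endogenous individual carries the major allele at every site, a contaminating read carries the minor allele with probability equal to a reference-panel frequency, and the error rate `eps` is known. All function and parameter names are hypothetical.

```python
import math


def log_likelihood(c, sites, eps):
    """Binomial log-likelihood of contamination fraction c (constant terms dropped).

    sites: iterable of (mismatches, depth, minor_allele_freq) tuples, where
    mismatches is the number of reads not showing the assumed endogenous allele.
    """
    total = 0.0
    for mismatches, depth, freq in sites:
        # Assumed toy model: a read shows the non-endogenous allele either
        # because it is a contaminant carrying the minor allele (c * freq)
        # or because of a sequencing error on any other read.
        p = c * freq + (1.0 - c * freq) * eps
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        total += mismatches * math.log(p) + (depth - mismatches) * math.log(1.0 - p)
    return total


def estimate_contamination(sites, eps=0.001, grid_size=1001):
    """Grid-search maximum likelihood estimate of c on the interval [0, 1]."""
    best_c, best_ll = 0.0, -math.inf
    for step in range(grid_size):
        c = step / (grid_size - 1)
        ll = log_likelihood(c, sites, eps)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c


# Hypothetical input: 1000 sites, 100 reads each, 5 mismatching reads per site,
# minor allele frequency 0.5 in the assumed contaminating population.
example_sites = [(5, 100, 0.5)] * 1000
estimate = estimate_contamination(example_sites)
```

A grid search is used here only because it is easy to follow; a real implementation would also need to estimate the error rate from the data and handle the low-depth case described in the abstract, where each site is covered by very few reads.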