
admixr-R package for reproducible analyses using ADMIXTOOLS.

Martin Petr, Benjamin Vernot, Janet Kelso.

Abstract

SUMMARY: We present a new R package admixr, which provides a convenient interface for performing reproducible population genetic analyses (f3, D, f4, f4-ratio, qpWave and qpAdm), as implemented by command-line programs in the ADMIXTOOLS software suite. In a traditional ADMIXTOOLS workflow, the user must first generate a set of text configuration files tailored to each individual analysis, often using a combination of shell scripting and manual text editing. The non-tabular output files then need to be parsed to extract values of interest prior to further analyses. Our package simplifies this process by automating all low-level configuration and parsing steps, making analyses as simple as running a single R command. Furthermore, we provide a set of R functions for processing, filtering and manipulating datasets in the EIGENSTRAT format. By unifying all steps of the workflow under a single R framework, this package enables the automation of analytic pipelines, significantly improving the reproducibility of population genetic studies.
AVAILABILITY AND IMPLEMENTATION: The source code of the R package is available under the MIT license. Installation instructions, reference manual and a tutorial can be found on the package website at https://bioinf.eva.mpg.de/admixr.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 30668635      PMCID: PMC6736366          DOI: 10.1093/bioinformatics/btz030

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The growing number of ancient and modern genome sequences has transformed our understanding of the evolutionary history of humans and other species. Several statistical methods have been developed to make inferences about past population movements and admixtures from genomic data. Chief among these has been a series of population genetic methods (D, f3, f4, f4-ratio, qpWave and qpAdm) for estimating the amounts of genetic drift shared between populations, testing admixture hypotheses and estimating admixture proportions, implemented as command-line utilities in the ADMIXTOOLS software suite (Patterson et al., 2012). Although ADMIXTOOLS has been used in many recent studies of human ancient DNA (Fu et al., 2016; Haak et al., 2015; Hajdinjak et al., 2018; Lazaridis et al., 2016), the tools in this package are rather cumbersome to use. First, each individual analysis or hypothesis test relies on a set of configuration files, which have to be generated using a combination of shell scripting and manual editing. Second, after running an ADMIXTOOLS command on the command line, the user needs to extract relevant values from a non-tabular text file before they can be imported into software such as R for further analysis and plotting. This workflow is slow and potentially error-prone, especially if the user wishes to quickly iterate through different hypotheses involving many different populations or samples. Most importantly, however, it makes it challenging to conduct fully reproducible research. To overcome these challenges, we present a new R package for population admixture analyses which uses the ADMIXTOOLS software suite for the underlying calculations, but provides a unified and convenient R interface. The package completely automates the generation, processing and parsing of all intermediate files, hiding all low-level details from the user and allowing them to focus on the analysis itself.
Importantly, unifying the entire analytic workflow in a single environment makes it possible to implement and share fully automated, reproducible analytic pipelines.

2 Implementation

The admixr package is implemented in the R programming language. It consists of several wrapper functions (calling ADMIXTOOLS commands internally from R), and a set of complementary functions for filtering and processing datasets in the EIGENSTRAT file format required by ADMIXTOOLS (Patterson et al., 2012). An EIGENSTRAT dataset is represented by an S3 object of the class EIGENSTRAT, which is created using the eigenstrat() constructor function, and encapsulates the paths to a trio of ‘ind’, ‘snp’ and ‘geno’ files:

    > snps <- eigenstrat("~/path/to/eigenstrat/data")
    > snps
    EIGENSTRAT object
    =================
    components:
      ind file: ~/path/to/eigenstrat/data.ind
      snp file: ~/path/to/eigenstrat/data.snp
      geno file: ~/path/to/eigenstrat/data.geno

All other functions in the package accept this object as their first argument, and either perform a requested calculation on it (returning an R data frame for further analysis), or return a new, modified EIGENSTRAT S3 object (in the case of filtering and processing functions) which can be used in additional downstream steps or calculations. The core functionality of the package consists of the following set of R functions: f3(), d(), f4(), f4ratio(), qpWave() and qpAdm(), each implemented as a wrapper around one of the command-line programs distributed as part of the ADMIXTOOLS package.
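To make the S3 design above concrete, here is a minimal sketch in base R of how an object that bundles the three file paths could be built and printed. It is an illustration only, not the admixr source: the names new_eigenstrat() and EIGENSTRAT_sketch are hypothetical, and the real eigenstrat() constructor may validate its input or store additional fields.

```r
# Hypothetical re-implementation for illustration; NOT the admixr source code.
# The constructor derives the three component paths from a shared prefix.
new_eigenstrat <- function(prefix) {
  obj <- list(
    ind  = paste0(prefix, ".ind"),
    snp  = paste0(prefix, ".snp"),
    geno = paste0(prefix, ".geno")
  )
  class(obj) <- "EIGENSTRAT_sketch"  # assumed class name, chosen for this sketch
  obj
}

# S3 print method: called whenever an object of this class is printed.
print.EIGENSTRAT_sketch <- function(x, ...) {
  cat("EIGENSTRAT object\n=================\ncomponents:\n")
  cat("  ind file: ", x$ind, "\n", sep = "")
  cat("  snp file: ", x$snp, "\n", sep = "")
  cat("  geno file: ", x$geno, "\n", sep = "")
  invisible(x)
}

snps <- new_eigenstrat("~/path/to/eigenstrat/data")
print(snps)
```

Because the object only stores paths, passing it as the first argument of every function is cheap, and a filtering step can return a new object pointing at modified files without touching the originals.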

3 Example usage

Performing even the most trivial analysis using ADMIXTOOLS imposes considerable overhead on the user. For example, to estimate the proportion of Neandertal ancestry in a set of individuals, the user would typically calculate an f4-ratio statistic such as:

$$\alpha = \frac{f_4(\mathrm{Altai}, \mathrm{Chimp}; X, \mathrm{Mbuti})}{f_4(\mathrm{Altai}, \mathrm{Chimp}; \mathrm{Vindija}, \mathrm{Mbuti})} \tag{1}$$

The user first needs to create a file with a list of samples in each position of both f statistics and a parameter file specifying the paths to a trio of EIGENSTRAT component files, then manually run the qpF4ratio command-line program, and finally capture and parse its output to obtain relevant values (see Supplementary Information for a complete example workflow using a traditional ADMIXTOOLS approach). Note that changing the analysis setup [such as including a different set of populations in Equation (1)], performing the analysis on a subset of the genome, or modifying the analysis in another way requires changes to its configuration files. This becomes a substantial burden when iterating through a complex set of population genetic hypotheses. In contrast, using the admixr package, the same analysis can be performed with just the following snippet of R code:

    result <- f4ratio(
      X = c("French", "Han", "Papuan"),
      A = "Altai", B = "Vindija", C = "Mbuti", O = "Chimp",
      data = eigenstrat("")
    )

Internally, the f4ratio() function performs all configuration and parsing work, and returns an R data frame which can be immediately used for further statistical analysis and plotting:

    > result
          A       B      X     C     O    alpha   stderr Zscore
      Altai Vindija French Mbuti Chimp 0.019696 0.003114  6.324
      Altai Vindija    Han Mbuti Chimp 0.024379 0.003364  7.248
      Altai Vindija Papuan Mbuti Chimp 0.032167 0.003499  9.193

All other admixr wrapper functions have a similar interface and are described in more detail in the tutorial vignette on the package website.
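As an illustration of the "further statistical analysis" that a tabular result makes possible, the sketch below rebuilds the example data frame by hand (so that it runs without admixr or ADMIXTOOLS installed) and derives approximate 95% confidence intervals. The normal approximation and the added column names ci_lower, ci_upper and z_check are assumptions made for this example; they are not part of the package output.

```r
# Values copied from the example output shown in the text; in a real
# analysis this data frame would be the return value of f4ratio().
result <- data.frame(
  A = "Altai", B = "Vindija",
  X = c("French", "Han", "Papuan"),
  C = "Mbuti", O = "Chimp",
  alpha  = c(0.019696, 0.024379, 0.032167),
  stderr = c(0.003114, 0.003364, 0.003499),
  Zscore = c(6.324, 7.248, 9.193),
  stringsAsFactors = FALSE
)

# Approximate 95% interval, ASSUMING the estimate is normally distributed
# around alpha with the reported standard error.
z <- qnorm(0.975)
result$ci_lower <- result$alpha - z * result$stderr
result$ci_upper <- result$alpha + z * result$stderr

# Sanity check: the reported Z-score should be close to alpha / stderr
# (small differences are expected because the printed values are rounded).
result$z_check <- result$alpha / result$stderr

print(result[, c("X", "alpha", "ci_lower", "ci_upper", "Zscore", "z_check")])
```

Since every wrapper returns an ordinary data frame, the same few lines work unchanged if the set of test populations in X grows from three to several hundred.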

4 Additional functionality

The fact that ADMIXTOOLS requires the data to be in EIGENSTRAT format presents additional challenges for quality control, processing and filtering, as this format is not supported by standard bioinformatics tools. Our R package therefore provides additional functionality to simplify the processing and filtering of EIGENSTRAT genotype data. This includes:

- Reading and writing of ind, snp and geno file components.
- Filtering of SNPs based on regions specified in a BED file.
- Restricting analyses to sites carrying transversion SNPs.
- Renaming samples or grouping them into larger population groups.
- Merging of EIGENSTRAT datasets.
- Counting the number of sites present or missing in each sample.
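To show what one of these filters amounts to, the following base R sketch restricts a small SNP table to transversions. The table is made-up toy data laid out in the six columns of an EIGENSTRAT snp file (SNP ID, chromosome, genetic position, physical position, reference allele, alternative allele), and is_transition() is a hypothetical helper written for this example; the package's own implementation operates on the files themselves and may differ in detail.

```r
# Toy SNP table with the six columns of an EIGENSTRAT 'snp' file
# (made-up records, for illustration only).
snp <- data.frame(
  id    = c("rs1", "rs2", "rs3", "rs4"),
  chrom = c(1, 1, 2, 2),
  gen   = c(0, 0, 0, 0),
  pos   = c(1000, 2000, 1500, 3000),
  ref   = c("A", "C", "A", "G"),
  alt   = c("G", "T", "C", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where the allele pair is a transition
# (A<->G or C<->T). Any other pair of distinct bases is a transversion.
is_transition <- function(ref, alt) {
  paste0(ref, alt) %in% c("AG", "GA", "CT", "TC")
}

# Keep only transversion sites.
tv_only <- snp[!is_transition(snp$ref, snp$alt), ]
print(tv_only)
```

A filter of this kind is often applied to ancient DNA because damage-induced substitutions appear as transitions, so transversion-only analyses are less affected by them.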
References

1.  Ancient admixture in human history.

Authors:  Nick Patterson; Priya Moorjani; Yontao Luo; Swapan Mallick; Nadin Rohland; Yiping Zhan; Teri Genschoreck; Teresa Webster; David Reich
Journal:  Genetics       Date:  2012-09-07       Impact factor: 4.562

2.  Massive migration from the steppe was a source for Indo-European languages in Europe.

Authors:  Wolfgang Haak; Iosif Lazaridis; Nick Patterson; Nadin Rohland; Swapan Mallick; Bastien Llamas; Guido Brandt; Susanne Nordenfelt; Eadaoin Harney; Kristin Stewardson; Qiaomei Fu; Alissa Mittnik; Eszter Bánffy; Christos Economou; Michael Francken; Susanne Friederich; Rafael Garrido Pena; Fredrik Hallgren; Valery Khartanovich; Aleksandr Khokhlov; Michael Kunst; Pavel Kuznetsov; Harald Meller; Oleg Mochalov; Vayacheslav Moiseyev; Nicole Nicklisch; Sandra L Pichler; Roberto Risch; Manuel A Rojo Guerra; Christina Roth; Anna Szécsényi-Nagy; Joachim Wahl; Matthias Meyer; Johannes Krause; Dorcas Brown; David Anthony; Alan Cooper; Kurt Werner Alt; David Reich
Journal:  Nature       Date:  2015-03-02       Impact factor: 49.962

3.  Reconstructing the genetic history of late Neanderthals.

Authors:  Mateja Hajdinjak; Qiaomei Fu; Alexander Hübner; Martin Petr; Fabrizio Mafessoni; Steffi Grote; Pontus Skoglund; Vagheesh Narasimham; Hélène Rougier; Isabelle Crevecoeur; Patrick Semal; Marie Soressi; Sahra Talamo; Jean-Jacques Hublin; Ivan Gušić; Željko Kućan; Pavao Rudan; Liubov V Golovanova; Vladimir B Doronichev; Cosimo Posth; Johannes Krause; Petra Korlević; Sarah Nagel; Birgit Nickel; Montgomery Slatkin; Nick Patterson; David Reich; Kay Prüfer; Matthias Meyer; Svante Pääbo; Janet Kelso
Journal:  Nature       Date:  2018-03-21       Impact factor: 49.962

4.  Genomic insights into the origin of farming in the ancient Near East.

Authors:  Iosif Lazaridis; Dani Nadel; Gary Rollefson; Deborah C Merrett; Nadin Rohland; Swapan Mallick; Daniel Fernandes; Mario Novak; Beatriz Gamarra; Kendra Sirak; Sarah Connell; Kristin Stewardson; Eadaoin Harney; Qiaomei Fu; Gloria Gonzalez-Fortes; Eppie R Jones; Songül Alpaslan Roodenberg; György Lengyel; Fanny Bocquentin; Boris Gasparian; Janet M Monge; Michael Gregg; Vered Eshed; Ahuva-Sivan Mizrahi; Christopher Meiklejohn; Fokke Gerritsen; Luminita Bejenaru; Matthias Blüher; Archie Campbell; Gianpiero Cavalleri; David Comas; Philippe Froguel; Edmund Gilbert; Shona M Kerr; Peter Kovacs; Johannes Krause; Darren McGettigan; Michael Merrigan; D Andrew Merriwether; Seamus O'Reilly; Martin B Richards; Ornella Semino; Michel Shamoon-Pour; Gheorghe Stefanescu; Michael Stumvoll; Anke Tönjes; Antonio Torroni; James F Wilson; Loic Yengo; Nelli A Hovhannisyan; Nick Patterson; Ron Pinhasi; David Reich
Journal:  Nature       Date:  2016-07-25       Impact factor: 49.962

5.  The genetic history of Ice Age Europe.

Authors:  Qiaomei Fu; Cosimo Posth; Mateja Hajdinjak; Martin Petr; Swapan Mallick; Daniel Fernandes; Anja Furtwängler; Wolfgang Haak; Matthias Meyer; Alissa Mittnik; Birgit Nickel; Alexander Peltzer; Nadin Rohland; Viviane Slon; Sahra Talamo; Iosif Lazaridis; Mark Lipson; Iain Mathieson; Stephan Schiffels; Pontus Skoglund; Anatoly P Derevianko; Nikolai Drozdov; Vyacheslav Slavinsky; Alexander Tsybankov; Renata Grifoni Cremonesi; Francesco Mallegni; Bernard Gély; Eligio Vacca; Manuel R González Morales; Lawrence G Straus; Christine Neugebauer-Maresch; Maria Teschler-Nicola; Silviu Constantin; Oana Teodora Moldovan; Stefano Benazzi; Marco Peresani; Donato Coppola; Martina Lari; Stefano Ricci; Annamaria Ronchitelli; Frédérique Valentin; Corinne Thevenet; Kurt Wehrberger; Dan Grigorescu; Hélène Rougier; Isabelle Crevecoeur; Damien Flas; Patrick Semal; Marcello A Mannino; Christophe Cupillard; Hervé Bocherens; Nicholas J Conard; Katerina Harvati; Vyacheslav Moiseyev; Dorothée G Drucker; Jiří Svoboda; Michael P Richards; David Caramelli; Ron Pinhasi; Janet Kelso; Nick Patterson; Johannes Krause; Svante Pääbo; David Reich
Journal:  Nature       Date:  2016-05-02       Impact factor: 49.962

