pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data.

Katharine L Korunes, Kieran Samuk.

Abstract

Population genetic analyses often use summary statistics to describe patterns of genetic variation and provide insight into evolutionary processes. Among the most fundamental of these summary statistics are π and dXY, which are used to describe genetic diversity within and between populations, respectively. Here, we address a widespread issue in π and dXY calculation: systematic bias generated by missing data of various types. Many popular methods for calculating π and dXY operate on data encoded in the variant call format (VCF), which condenses genetic data by omitting invariant sites. When calculating π and dXY using a VCF, it is often implicitly assumed that missing genotypes (including those at sites not represented in the VCF) are homozygous for the reference allele. Here, we show how this assumption can result in substantial downward bias in estimates of π and dXY that is directly proportional to the amount of missing data. We discuss the pervasive nature and importance of this problem in population genetics, and introduce a user-friendly UNIX command line utility, pixy, that solves this problem via an algorithm that generates unbiased estimates of π and dXY in the face of missing data. We compare pixy to existing methods using both simulated and empirical data, and show that pixy alone produces unbiased estimates of π and dXY regardless of the form or amount of missing data. In summary, our software solves a long-standing problem in applied population genetics and highlights the importance of properly accounting for missing data in population genetic analyses.
© 2021 John Wiley & Sons Ltd.
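
The bias described in the abstract can be illustrated with a small sketch. This is not pixy's code. It is a toy comparison that assumes haploid allele calls coded as 0 (reference), 1 (alternate), or None (missing), and the function names and the four-site data set are hypothetical. One estimator counts differences and comparisons only among called alleles and divides the summed differences by the summed comparisons; the other first fills every missing call with the reference allele, which is the implicit assumption the abstract warns about.

```python
from itertools import combinations

def site_counts(alleles):
    """Return (differences, comparisons) among the called alleles at one site."""
    called = [a for a in alleles if a is not None]
    pairs = list(combinations(called, 2))
    diffs = sum(1 for a, b in pairs if a != b)
    return diffs, len(pairs)

def pi_missing_aware(sites):
    """Sum of pairwise differences over sum of pairwise comparisons,
    using only genotypes that were actually called."""
    total_diffs = 0
    total_comps = 0
    for alleles in sites:
        d, c = site_counts(alleles)
        total_diffs += d
        total_comps += c
    return total_diffs / total_comps if total_comps else float("nan")

def pi_missing_as_reference(sites):
    """Same ratio, but every missing call is first replaced by the
    reference allele (0), mimicking the implicit VCF assumption."""
    filled = [[0 if a is None else a for a in alleles] for alleles in sites]
    return pi_missing_aware(filled)

# Hypothetical toy data: 4 haploid samples, 4 sites.
sites = [
    [0, 1, 0, 1],              # fully called, polymorphic
    [0, 1, None, None],        # polymorphic, two calls missing
    [0, 0, 0, 0],              # fully called, invariant
    [None, None, None, None],  # site absent from the call set
]

aware = pi_missing_aware(sites)
naive = pi_missing_as_reference(sites)
print(f"missing-aware pi:      {aware:.4f}")
print(f"missing-as-reference:  {naive:.4f}")
```

In this toy data set the reference-filling estimator adds comparisons at sites where nothing was observed, so its denominator grows faster than its numerator and the estimate comes out lower. The same counting scheme extends to dXY by comparing alleles between two populations instead of within one.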

Keywords:  bioinformatics/phyloinformatics; genomics/proteomics; molecular evolution; population genetics - empirical; software

Year:  2021        PMID: 33453139      PMCID: PMC8044049          DOI: 10.1111/1755-0998.13326

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090

