Characterizing bias in population genetic inferences from low-coverage sequencing data.

Eunjung Han, Janet S Sinsheimer, John Novembre.

Abstract

The site frequency spectrum (SFS) is of primary interest in population genetic studies, because the SFS compresses variation data into a simple summary from which many population genetic inferences can proceed. However, inferring the SFS from sequencing data is challenging because genotype calls from sequencing data are often inaccurate due to high error rates. If not accounted for, this genotype uncertainty can lead to serious bias in downstream analyses based on the inferred SFS. Here, we compare two approaches to estimating the SFS from sequencing data: one approach infers individual genotypes from aligned sequencing reads and then estimates the SFS based on the inferred genotypes (call-based approach), and the other approach directly estimates the SFS from aligned sequencing reads by maximum likelihood (direct estimation approach). We find that the SFS estimated by the direct estimation approach is unbiased even at low coverage, whereas the SFS estimated by the call-based approach becomes biased as coverage decreases. The direction of the bias in the call-based approach depends on the pipeline used to infer genotypes. Estimating genotypes by pooling individuals in a sample (multisample calling) results in underestimation of the number of rare variants, whereas estimating genotypes in each individual and merging them later (single-sample calling) leads to overestimation of rare variants. We characterize the impact of these biases on downstream analyses, such as demographic parameter estimation and genome-wide selection scans. Our work highlights that, depending on the pipeline used to infer the SFS, one can reach different conclusions in population genetic inference with the same data set. Thus, careful attention to the analysis pipeline and SFS estimation procedures is vital for population genetic inferences.
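The contrast between the two approaches can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a simple binomial read model with a known error rate, a made-up true SFS, a flat-prior per-individual caller standing in for single-sample calling, and an EM algorithm as one possible way to obtain a maximum-likelihood SFS from genotype likelihoods. All function names and parameter values (`simulate_gl`, `em_sfs`, `ERR`, and so on) are hypothetical.

```python
import numpy as np
from math import comb

ERR = 0.01  # assumed per-read base error rate (hypothetical value)


def simulate_gl(n_sites=20000, n_ind=10, depth=2.0, seed=0):
    """Simulate genotype likelihoods, shape (sites, individuals, 3), and the true SFS."""
    rng = np.random.default_rng(seed)
    n_chr = 2 * n_ind
    # Assumed truth: 90% monomorphic sites, the rest with P(k) proportional to 1/k.
    p = np.zeros(n_chr + 1)
    p[1:n_chr] = 1.0 / np.arange(1, n_chr)
    p[1:n_chr] *= 0.1 / p[1:n_chr].sum()
    p[0] = 0.9
    k = rng.choice(n_chr + 1, size=n_sites, p=p)
    # Place the k alt alleles on random chromosomes, then pair them into diploids.
    ranks = rng.random((n_sites, n_chr)).argsort(axis=1)
    geno = (ranks < k[:, None]).reshape(n_sites, n_ind, 2).sum(axis=2)
    # Poisson read depth; each read shows alt with a genotype-dependent probability.
    p_alt = np.array([ERR, 0.5, 1.0 - ERR])
    total = rng.poisson(depth, size=geno.shape)
    alt = rng.binomial(total, p_alt[geno])
    ref = total - alt
    # P(reads | genotype), dropping the binomial coefficient (constant in genotype).
    gl = p_alt ** alt[..., None] * (1.0 - p_alt) ** ref[..., None]
    true_sfs = np.bincount(k, minlength=n_chr + 1) / n_sites
    return gl, true_sfs


def call_based_sfs(gl):
    """Call each genotype as the argmax likelihood, then tabulate alt allele counts."""
    # With zero reads all likelihoods tie and argmax returns homozygous reference.
    counts = gl.argmax(axis=2).sum(axis=1)
    return np.bincount(counts, minlength=2 * gl.shape[1] + 1) / gl.shape[0]


def site_count_likelihoods(gl):
    """P(reads at a site | sample alt allele count k), by dynamic programming."""
    n_sites, n_ind, _ = gl.shape
    h = np.zeros((n_sites, 2 * n_ind + 1))
    h[:, 0] = 1.0
    for i in range(n_ind):
        new = h * gl[:, i, 0:1]                        # individual i carries 0 alt
        new[:, 1:] += 2.0 * h[:, :-1] * gl[:, i, 1:2]  # 1 alt (two arrangements)
        new[:, 2:] += h[:, :-2] * gl[:, i, 2:3]        # 2 alt
        h = new
    binom = np.array([comb(2 * n_ind, k) for k in range(2 * n_ind + 1)], dtype=float)
    return h / binom


def em_sfs(h, n_iter=200):
    """EM for the SFS that maximizes the product over sites of sum_k sfs[k] * h[site, k]."""
    sfs = np.full(h.shape[1], 1.0 / h.shape[1])
    for _ in range(n_iter):
        post = h * sfs                          # E-step: posterior over k per site
        post /= post.sum(axis=1, keepdims=True)
        sfs = post.mean(axis=0)                 # M-step: average the posteriors
    return sfs


if __name__ == "__main__":
    gl, true_sfs = simulate_gl()
    direct = em_sfs(site_count_likelihoods(gl))
    called = call_based_sfs(gl)
    for name, est in [("direct", direct), ("call-based", called)]:
        print(name, "L1 error:", np.abs(est - true_sfs).sum())
```

In this toy setting the call-based estimate treats every called genotype as certain, so sequencing errors in individual calls accumulate into spurious low-frequency variants, whereas the direct estimate integrates over genotype uncertainty at each site. The simulation makes no attempt to model multisample calling or the specific pipelines compared in the paper.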

Keywords:  accuracy; base-calling errors; maximum likelihood; site frequency spectrum

Year: 2013; PMID: 24288159; PMCID: PMC3935184; DOI: 10.1093/molbev/mst229

Source DB: PubMed; Journal: Mol Biol Evol; ISSN: 0737-4038; Impact factor: 16.240


