Pseudoreplication in genomic-scale data sets.

Robin S Waples, Ryan K Waples, Eric J Ward.

Abstract

In genomic-scale data sets, loci are closely packed within chromosomes and hence provide correlated information. Averaging across loci as if they were independent creates pseudoreplication, which reduces the effective degrees of freedom (df') compared to the nominal degrees of freedom (df). This issue has been known for some time, but its consequences have not been systematically quantified across the entire genome. Here, we measured pseudoreplication (quantified by the ratio df'/df) for a common metric of genetic differentiation (F_ST) and a common measure of linkage disequilibrium between pairs of loci (r^2). Based on data simulated using models (SLiM and msprime) that allow efficient forward-in-time and coalescent simulations while precisely controlling population pedigrees, we estimated df' and df'/df by measuring the rate of decline in the variance of mean F_ST and mean r^2 as more loci were used. For both indices, df' increases with N_e and genome size, as expected. However, even for large N_e and large genomes, df' for mean r^2 plateaus after a few thousand loci, and a variance components analysis indicates that the limiting factor is uncertainty associated with sampling individuals rather than genes. Pseudoreplication is less extreme for F_ST, but df'/df ≤ 0.01 can occur in data sets using tens of thousands of loci. Commonly used block-jackknife methods consistently overestimated var(F_ST), producing very conservative confidence intervals. Predicting df' based on our modelling results as a function of N_e, L, S, and genome size provides a robust way to quantify precision associated with genomic-scale data sets.
© 2021 John Wiley & Sons Ltd. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
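
The abstract's central idea, that the variance of a mean declines more slowly than 1/L when loci are correlated, can be illustrated with a toy calculation. The sketch below is not the authors' SLiM/msprime pipeline and does not compute F_ST or r^2. It assumes a hypothetical equicorrelation model in which every pair of loci shares the same correlation `rho`, and it defines an "effective number of loci" as the per-locus variance divided by the variance of the mean. The function names and the model are illustrative assumptions only.

```python
import random
import statistics


def simulate_mean(n_loci, rho, rng):
    """Mean of n_loci unit-variance values with pairwise correlation rho."""
    shared = rng.gauss(0.0, 1.0)  # component common to all loci in a replicate
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    total = sum(a * shared + b * rng.gauss(0.0, 1.0) for _ in range(n_loci))
    return total / n_loci


def effective_loci(n_loci, rho, n_reps=1000, seed=1):
    """Estimate the effective number of independent loci from replicates.

    Per-locus variance is 1 by construction, so the effective number is
    1 / var(mean), where var(mean) is measured across replicate data sets.
    """
    rng = random.Random(seed)
    means = [simulate_mean(n_loci, rho, rng) for _ in range(n_reps)]
    return 1.0 / statistics.variance(means)


def expected_effective_loci(n_loci, rho):
    """Closed form for the toy model: L / (1 + (L - 1) * rho)."""
    return n_loci / (1.0 + (n_loci - 1) * rho)


if __name__ == "__main__":
    rho = 0.01  # assumed pairwise correlation, chosen only for illustration
    for n in (10, 100, 1000):
        est = effective_loci(n, rho)
        exp = expected_effective_loci(n, rho)
        print(f"L={n:5d}  estimated L'={est:8.1f}  "
              f"expected L'={exp:8.1f}  L'/L={exp / n:.3f}")
```

In this toy model the effective number of loci can never exceed 1/rho, however many loci are added. That is loosely analogous to the plateau in df' described above, although the real genomic correlation structure depends on linkage, N_e, and sampling and is far more complex.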

Keywords: F_ST; N_e; degrees of freedom; genome size; jackknife variance; linkage disequilibrium; simulations

Year: 2021; PMID: 34351073; PMCID: PMC9415146; DOI: 10.1111/1755-0998.13482

Source DB: PubMed; Journal: Mol Ecol Resour; ISSN: 1755-098X; Impact factor: 8.678


