Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data.

Alex Mas-Sandoval, Nathaniel S Pope, Knud Nor Nielsen, Isin Altinkaya, Matteo Fumagalli, Thorfinn Sand Korneliussen.

Abstract

BACKGROUND: The site frequency spectrum summarizes the distribution of allele frequencies throughout the genome, and it is widely used as a summary statistic to infer demographic parameters and to detect signals of natural selection. The use of high-throughput low-coverage DNA sequencing data can lead to biased estimates of the site frequency spectrum due to high levels of uncertainty in genotyping.
RESULTS: Here we design and implement a method to efficiently and accurately estimate the multidimensional joint site frequency spectrum for large numbers of haploid or diploid individuals across an arbitrary number of populations, using low-coverage sequencing data. The method maximizes a likelihood function that represents the probability of the sequencing data observed given a multidimensional site frequency spectrum using genotype likelihoods. Notably, it uses an advanced binning heuristic paired with an accelerated expectation-maximization algorithm for a fast and memory-efficient computation, and can generate both unfolded and folded spectra and bootstrapped replicates for haploid and diploid genomes. On the basis of extensive simulations, we show that the new method requires remarkably less storage and is faster than previous implementations whilst retaining the same accuracy. When applied to low-coverage sequencing data from the fungal pathogen Neonectria neomacrospora, results recapitulate the patterns of population differentiation generated using the original high-coverage data.
CONCLUSION: The new implementation allows for accurate estimation of population genetic parameters from arbitrarily large, low-coverage datasets, thus facilitating cost-effective sequencing experiments in model and non-model organisms.
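
The likelihood maximization described under RESULTS can be illustrated with a minimal sketch of a plain expectation-maximization update for a one-dimensional spectrum. This is not the authors' implementation: it omits the binning heuristic, the acceleration step, and the multidimensional case. The function name `em_sfs` and the input layout (`site_lik[s, j]` holding the probability of the sequencing data at site `s` given `j` derived alleles in the sample) are assumptions made for illustration.

```python
import numpy as np

def em_sfs(site_lik, n_iter=200, tol=1e-10):
    """Estimate a 1D site frequency spectrum by plain (unaccelerated) EM.

    Assumption: site_lik[s, j] = P(data at site s | derived allele count j),
    computed beforehand from genotype likelihoods.
    """
    site_lik = np.asarray(site_lik, dtype=float)
    n_sites, n_cat = site_lik.shape
    sfs = np.full(n_cat, 1.0 / n_cat)  # flat starting spectrum
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each allele count at each site
        joint = site_lik * sfs
        marg = joint.sum(axis=1, keepdims=True)
        post = joint / marg
        # Log-likelihood of the spectrum used in this E-step
        ll = np.log(marg).sum()
        # M-step: the new spectrum is the mean posterior across sites
        sfs = post.mean(axis=0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return sfs

# Toy input with no genotype uncertainty: four sites whose derived allele
# counts are known to be 0, 0, 1 and 2, so the estimate matches the
# observed proportions of each count.
toy_lik = np.eye(3)[[0, 0, 1, 2]]
print(em_sfs(toy_lik))
```

With low-coverage data the rows of `site_lik` are spread over several allele counts rather than concentrated on one, and the same update weights each site by its posterior instead of relying on a called genotype.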
© The Author(s) 2022. Published by Oxford University Press GigaScience.

Keywords:  genotype likelihoods; high-throughput sequencing; maximum likelihood; next-generation sequencing; population genetics; site frequency spectrum; threading

Year: 2022  PMID: 35579549  PMCID: PMC9112775  DOI: 10.1093/gigascience/giac032

Source DB: PubMed  Journal: Gigascience  ISSN: 2047-217X  Impact factor: 7.658

