Fast and accurate site frequency spectrum estimation from low coverage sequence data.

Eunjung Han, Janet S Sinsheimer, John Novembre.

Abstract

MOTIVATION: The distribution of allele frequencies across polymorphic sites, also known as the site frequency spectrum (SFS), is of primary interest in population genetics. It is a complete summary of sequence variation at unlinked sites and, more generally, its shape reflects underlying population genetic processes. One practical challenge is that inferring the SFS from low coverage sequencing data directly from genotype calls can lead to significant bias. To reduce this bias, previous studies have estimated the SFS directly from sequencing data by first computing the site allele frequency (SAF) likelihood for each site (i.e. the likelihood that a site has each possible allele frequency, conditional on the observed sequence reads) using a dynamic programming (DP) algorithm. Although this method produces an accurate SFS, the cost of computing the SAF likelihood is quadratic in the number of samples sequenced.
RESULTS: To overcome this computational challenge, we propose the 'score-limited DP' algorithm, which computes the SAF likelihood in time linear in the number of genomes. This algorithm works because, in the lower triangular matrix that arises in the DP algorithm, all non-negligible values of the SAF likelihood are concentrated in a few cells around the best-guess allele counts. We show that our score-limited DP algorithm has accuracy comparable to that of the original DP algorithm but is faster. This speed improvement makes SFS estimation practical when using low coverage NGS data from a large number of individuals.
AVAILABILITY AND IMPLEMENTATION: The program will be available via a link from the Novembre lab website (http://jnpopgen.org/).
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
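
The idea summarised in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes diploid individuals, takes per-individual genotype likelihoods as input, and uses the recursion commonly given for the SAF likelihood (each individual contributes 0, 1 or 2 derived alleles, followed by a binomial normalisation). The function names, the `eps` pruning threshold and the pruning rule itself are hypothetical stand-ins for the 'score-limited' idea, chosen only to show why keeping a narrow band of cells around the best-guess allele count reduces the work per individual.

```python
from math import comb


def saf_likelihood_full(gls):
    """Quadratic-time DP over all allele counts.

    gls: list of (L0, L1, L2) genotype likelihoods, one tuple per diploid
    individual, for 0, 1 and 2 copies of the derived allele.
    Returns a list of length 2n + 1 indexed by derived allele count.
    """
    h = [1.0]
    for g0, g1, g2 in gls:
        new = [0.0] * (len(h) + 2)
        for j, v in enumerate(h):
            new[j] += v * g0            # individual adds 0 derived alleles
            new[j + 1] += v * 2.0 * g1  # heterozygote: two orderings
            new[j + 2] += v * g2        # individual adds 2 derived alleles
        h = new
    m = len(h) - 1  # total number of chromosomes, 2n
    return [v / comb(m, j) for j, v in enumerate(h)]


def saf_likelihood_banded(gls, eps=1e-9):
    """Same recursion, but cells below eps * (current maximum) are trimmed
    from both ends after every individual (assumed pruning rule).

    `lo` is the allele count of the first cell kept in `band`.
    """
    lo, band = 0, [1.0]
    for g0, g1, g2 in gls:
        new = [0.0] * (len(band) + 2)
        for j, v in enumerate(band):
            new[j] += v * g0
            new[j + 1] += v * 2.0 * g1
            new[j + 2] += v * g2
        cutoff = eps * max(new)
        start, end = 0, len(new)
        while new[start] < cutoff:
            start += 1
        while new[end - 1] < cutoff:
            end -= 1
        lo += start
        band = new[start:end]
    m = 2 * len(gls)
    out = [0.0] * (m + 1)
    for j, v in enumerate(band):
        out[lo + j] = v / comb(m, lo + j)
    return out
```

With confident genotype likelihoods the band stays only a few cells wide, so the inner loop does roughly constant work per individual instead of work proportional to the number of individuals processed so far. This sketch works with raw probabilities and is therefore only suitable for small samples; a practical implementation would need log-space or rescaled arithmetic to avoid underflow and overflow.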

Year:  2014        PMID: 25359894      PMCID: PMC4341071          DOI: 10.1093/bioinformatics/btu725

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  5 in total

1.  Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data.

Authors:  Alex Mas-Sandoval; Nathaniel S Pope; Knud Nor Nielsen; Isin Altinkaya; Matteo Fumagalli; Thorfinn Sand Korneliussen
Journal:  Gigascience       Date:  2022-05-17       Impact factor: 7.658

2.  Correcting Bias in Allele Frequency Estimates Due to an Observation Threshold: A Markov Chain Analysis.

Authors:  Toni I Gossmann; David Waxman
Journal:  Genome Biol Evol       Date:  2022-04-10       Impact factor: 4.065

3.  Whole-genome sequencing for an enhanced understanding of genetic variation among South Africans.

Authors:  Ananyo Choudhury; Michèle Ramsay; Scott Hazelhurst; Shaun Aron; Soraya Bardien; Gerrit Botha; Emile R Chimusa; Alan Christoffels; Junaid Gamieldien; Mahjoubeh J Sefid-Dashti; Fourie Joubert; Ayton Meintjes; Nicola Mulder; Raj Ramesar; Jasper Rees; Kathrine Scholtz; Dhriti Sengupta; Himla Soodyall; Philip Venter; Louise Warnich; Michael S Pepper
Journal:  Nat Commun       Date:  2017-12-12       Impact factor: 14.919

4.  Global patterns in genomic diversity underpinning the evolution of insecticide resistance in the aphid crop pest Myzus persicae.

Authors:  Kumar Saurabh Singh; Erick M G Cordeiro; Bartlomiej J Troczka; Adam Pym; Joanna Mackisack; Thomas C Mathers; Ana Duarte; Fabrice Legeai; Stéphanie Robin; Pablo Bielza; Hannah J Burrack; Kamel Charaabi; Ian Denholm; Christian C Figueroa; Richard H Ffrench-Constant; Georg Jander; John T Margaritopoulos; Emanuele Mazzoni; Ralf Nauen; Claudio C Ramírez; Guangwei Ren; Ilona Stepanyan; Paul A Umina; Nina V Voronova; John Vontas; Martin S Williamson; Alex C C Wilson; Gao Xi-Wu; Young-Nam Youn; Christoph T Zimmer; Jean-Christophe Simon; Alex Hayward; Chris Bass
Journal:  Commun Biol       Date:  2021-07-07

5.  Comparison of Single Genome and Allele Frequency Data Reveals Discordant Demographic Histories.

Authors:  Annabel C Beichman; Tanya N Phung; Kirk E Lohmueller
Journal:  G3 (Bethesda)       Date:  2017-11-06       Impact factor: 3.154
