
Inference of site frequency spectra from high-throughput sequence data: quantification of selection on nonsynonymous and synonymous sites in humans.

Peter D Keightley, Daniel L Halligan.

Abstract

Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
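The two assumptions named in the abstract, binomial sampling of nucleotide types in heterozygotes and random sequencing error, can be illustrated with a per-individual genotype likelihood. The following is a minimal sketch, not the authors' implementation: it assumes a biallelic diploid site and a symmetric error that turns one allele into the other, and the function names `read_alt_prob` and `genotype_likelihoods` are hypothetical. The full method described in the abstract goes further and estimates the site frequency spectrum across a sample of individuals, which this sketch does not attempt.

```python
from math import comb


def read_alt_prob(genotype, error_rate):
    """Probability that a single read shows the alternate allele.

    genotype: number of alternate alleles carried (0, 1 or 2).
    Assumption (not from the paper): the site is biallelic and an error
    flips the true allele into the other allele with probability error_rate.
    """
    true_alt_fraction = genotype / 2.0
    return (true_alt_fraction * (1.0 - error_rate)
            + (1.0 - true_alt_fraction) * error_rate)


def genotype_likelihoods(alt_reads, total_reads, error_rate):
    """Binomial likelihood of the read counts under each diploid genotype.

    Returns a list [L(hom ref), L(het), L(hom alt)].
    """
    likelihoods = []
    for genotype in (0, 1, 2):
        p = read_alt_prob(genotype, error_rate)
        likelihoods.append(
            comb(total_reads, alt_reads)
            * p ** alt_reads
            * (1.0 - p) ** (total_reads - alt_reads)
        )
    return likelihoods


# Example: 2 of 4 reads show the alternate allele, 1% error per base read.
hom_ref, het, hom_alt = genotype_likelihoods(2, 4, 0.01)
```

With low coverage, as in the 1000 Genomes pilot data mentioned above, a heterozygote can easily yield reads of only one nucleotide type, which is why the counts are treated probabilistically rather than called directly.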

Year:  2011        PMID: 21596896      PMCID: PMC3176106          DOI: 10.1534/genetics.111.128355

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


References: 24 in total

1.  On the number of segregating sites in genetical models without recombination.

Authors:  G A Watterson
Journal:  Theor Popul Biol       Date:  1975-04       Impact factor: 1.570

2.  What can we learn about the distribution of fitness effects of new mutations from DNA sequence data?

Authors:  Peter D Keightley; Adam Eyre-Walker
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2010-04-27       Impact factor: 6.237

3.  Evolutionary constraints in conserved nongenic sequences of mammals.

Authors:  Peter D Keightley; Gregory V Kryukov; Shamil Sunyaev; Daniel L Halligan; Daniel J Gaffney
Journal:  Genome Res       Date:  2005-10       Impact factor: 9.043

4.  Sequencing errors and molecular evolutionary analysis.

Authors:  A G Clark; T S Whittam
Journal:  Mol Biol Evol       Date:  1992-07       Impact factor: 16.240

5.  The distribution of fitness effects of new deleterious amino acid mutations in humans.

Authors:  Adam Eyre-Walker; Megan Woolfit; Ted Phelps
Journal:  Genetics       Date:  2006-03-17       Impact factor: 4.562

6.  Estimation of allele frequencies from high-coverage genome-sequencing projects.

Authors:  Michael Lynch
Journal:  Genetics       Date:  2009-03-16       Impact factor: 4.562

7.  A map of human genome variation from population-scale sequencing.

Authors:  Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal:  Nature       Date:  2010-10-28       Impact factor: 49.962

8.  Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes.

Authors:  Lél Eory; Daniel L Halligan; Peter D Keightley
Journal:  Mol Biol Evol       Date:  2010-01       Impact factor: 16.240

9.  Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines.

Authors:  Peter D Keightley; Urmi Trivedi; Marian Thomson; Fiona Oliver; Sujai Kumar; Mark L Blaxter
Journal:  Genome Res       Date:  2009-05-13       Impact factor: 9.043

10.  Assessing the evolutionary impact of amino acid mutations in the human genome.

Authors:  Adam R Boyko; Scott H Williamson; Amit R Indap; Jeremiah D Degenhardt; Ryan D Hernandez; Kirk E Lohmueller; Mark D Adams; Steffen Schmidt; John J Sninsky; Shamil R Sunyaev; Thomas J White; Rasmus Nielsen; Andrew G Clark; Carlos D Bustamante
Journal:  PLoS Genet       Date:  2008-05-30       Impact factor: 5.917

Cited by: 28 in total

1.  Are Synonymous Sites in Primates and Rodents Functionally Constrained?

Authors:  Nicholas Price; Dan Graur
Journal:  J Mol Evol       Date:  2015-11-12       Impact factor: 2.395

2.  Genotype-Frequency Estimation from High-Throughput Sequencing Data.

Authors:  Takahiro Maruki; Michael Lynch
Journal:  Genetics       Date:  2015-07-29       Impact factor: 4.562

3.  Quantifying population genetic differentiation from next-generation sequencing data.

Authors:  Matteo Fumagalli; Filipe G Vieira; Thorfinn Sand Korneliussen; Tyler Linderoth; Emilia Huerta-Sánchez; Anders Albrechtsen; Rasmus Nielsen
Journal:  Genetics       Date:  2013-08-26       Impact factor: 4.562

4.  Characterizing bias in population genetic inferences from low-coverage sequencing data.

Authors:  Eunjung Han; Janet S Sinsheimer; John Novembre
Journal:  Mol Biol Evol       Date:  2013-11-27       Impact factor: 16.240

Review 5.  Comparative population genomics: power and principles for the inference of functionality.

Authors:  David S Lawrie; Dmitri A Petrov
Journal:  Trends Genet       Date:  2014-03-20       Impact factor: 11.639

Review 6.  From next-generation resequencing reads to a high-quality variant data set.

Authors:  S P Pfeifer
Journal:  Heredity (Edinb)       Date:  2016-10-19       Impact factor: 3.821

Review 7.  Weak selection and protein evolution.

Authors:  Hiroshi Akashi; Naoki Osada; Tomoko Ohta
Journal:  Genetics       Date:  2012-09       Impact factor: 4.562

8.  Evidence for increased levels of positive and negative selection on the X chromosome versus autosomes in humans.

Authors:  Krishna R Veeramah; Ryan N Gutenkunst; August E Woerner; Joseph C Watkins; Michael F Hammer
Journal:  Mol Biol Evol       Date:  2014-05-15       Impact factor: 16.240

9.  Genome-wide estimation of linkage disequilibrium from population-level high-throughput sequencing data.

Authors:  Takahiro Maruki; Michael Lynch
Journal:  Genetics       Date:  2014-05-28       Impact factor: 4.562

Review 10.  Inferring population size changes with sequence and SNP data: lessons from human bottlenecks.

Authors:  L M Gattepaille; M Jakobsson; M G B Blum
Journal:  Heredity (Edinb)       Date:  2013-02-20       Impact factor: 3.821
