High sensitivity to aligner and high rate of false positives in the estimates of positive selection in the 12 Drosophila genomes.

Penka Markova-Raina, Dmitri Petrov.

Abstract

We investigate the effect of aligner choice on inferences of positive selection using site-specific models of molecular evolution. We find that, independently of the choice of aligner, the rate of false positives is unacceptably high. Our study is a whole-genome analysis of all protein-coding genes in 12 Drosophila genomes annotated in either all 12 species (~6690 genes) or in the six melanogaster group species. We compare six popular aligners (PRANK, T-Coffee, ClustalW, ProbCons, AMAP, and MUSCLE) and find that the aligner choice strongly influences the estimates of positive selection. Differences persist when we use (1) different stringency cutoffs, (2) different selection inference models, (3) alignments with or without gaps and/or additional masking, (4) per-site versus per-gene statistics, and (5) closely related melanogaster group species versus the more distant 12 Drosophila genomes. Furthermore, we find that these differences are consequential for downstream analyses such as determination of over/under-represented GO terms associated with positive selection. Visual analysis indicates that most sites inferred as positively selected are, in fact, misaligned at the codon level, resulting in false-positive rates of 48%-82%. PRANK, which has been reported to outperform other aligners in simulations, performed best in our empirical study as well. Unfortunately, PRANK still had a false-positive rate of 50%-55%, which is high and unacceptable for most applications. We identify misannotations and indels, many of which appear to be located in disordered protein regions, as the primary culprits for the high misalignment-related error levels, and we discuss possible workaround approaches to this apparently pervasive problem in genome-wide evolutionary analyses.
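As an illustration of the kind of comparison the abstract describes, the following minimal Python sketch measures how much the sets of genes called positively selected agree between aligners, and computes a false-positive rate from a count of visually inspected sites. It is not the authors' pipeline: the gene names, the site counts, and the helper names `jaccard`, `pairwise_overlap`, and `false_positive_rate` are hypothetical, and Jaccard overlap is only one possible agreement measure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls):
    """Map each unordered pair of aligners to the Jaccard overlap of their gene calls."""
    return {
        (x, y): jaccard(calls[x], calls[y])
        for x, y in combinations(sorted(calls), 2)
    }


def false_positive_rate(n_called, n_misaligned):
    """Fraction of called sites that inspection judged to be misaligned."""
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    return n_misaligned / n_called


# Hypothetical toy data: genes flagged as positively selected per aligner.
calls = {
    "PRANK": {"geneA", "geneB"},
    "MUSCLE": {"geneB", "geneC", "geneD"},
    "ClustalW": {"geneB", "geneD", "geneE"},
}

overlaps = pairwise_overlap(calls)
for pair, score in overlaps.items():
    print(pair, round(score, 2))

# Hypothetical counts: 20 inspected sites, 11 judged misaligned.
print(false_positive_rate(20, 11))
```

Low pairwise overlap in such a table would correspond to the aligner sensitivity reported above, while the second function mirrors how a rate such as 50%-55% follows from manual inspection counts.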

Year:  2011        PMID: 21393387      PMCID: PMC3106319          DOI: 10.1101/gr.115949.110

Source DB:  PubMed          Journal:  Genome Res        ISSN: 1088-9051            Impact factor:   9.043


