Improving genome-wide scans of positive selection by using protein isoforms of similar length.

José Luis Villanueva-Cañas, Steve Laurie, M Mar Albà.

Abstract

Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, a single protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank(+F). Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives.
We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
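
As an illustration of the isoform-selection idea described above, the following is a minimal sketch and not the authors' implementation. The abstract states only that PALO picks the combination of isoforms "most similar in length"; the scoring used here (smallest difference between the longest and shortest chosen isoform, with ties broken in favour of the larger total length) is an assumption, as are the function names, the input layout, and the exhaustive search, which is only practical for small families.

```python
from itertools import product


def select_similar_length_isoforms(family):
    """Pick one isoform per gene so that the chosen lengths are as similar as possible.

    `family` maps a gene name to a dict of {isoform_id: protein_length}.
    Assumed score (not taken from the paper): minimise max(length) - min(length),
    breaking ties by preferring the larger summed length.
    """
    if not family:
        return {}
    genes = sorted(family)
    for gene in genes:
        if not family[gene]:
            raise ValueError(f"gene {gene!r} has no isoforms")

    best_combo, best_score = None, None
    # Exhaustive search over every combination of one isoform per gene.
    for combo in product(*(sorted(family[g].items()) for g in genes)):
        lengths = [length for _, length in combo]
        score = (max(lengths) - min(lengths), -sum(lengths))
        if best_score is None or score < best_score:
            best_combo, best_score = combo, score
    return {gene: isoform for gene, (isoform, _) in zip(genes, best_combo)}


def select_longest_isoforms(family):
    """Baseline ("Longest"): take the longest isoform of each gene."""
    return {gene: max(sorted(isos), key=isos.get) for gene, isos in family.items()}


# Hypothetical gene family: lengths are invented for illustration only.
example_family = {
    "human": {"h1": 520, "h2": 410},
    "mouse": {"m1": 405},
    "dog": {"d1": 415, "d2": 300},
}

print(select_longest_isoforms(example_family))
# {'human': 'h1', 'mouse': 'm1', 'dog': 'd1'}  (lengths 520, 405, 415)
print(select_similar_length_isoforms(example_family))
# {'dog': 'd1', 'human': 'h2', 'mouse': 'm1'}  (lengths 415, 410, 405)
```

In this toy family the longest human isoform is over 100 residues longer than its orthologs, which is the kind of length mismatch the abstract associates with extra indels; the length-matched combination avoids it.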

Year:  2013        PMID: 23377868      PMCID: PMC3590775          DOI: 10.1093/gbe/evt017

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


