Does choice in model selection affect maximum likelihood analysis?

Jennifer Ripplinger, Jack Sullivan.

Abstract

In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. 
Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.
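
The arithmetic behind three of the model-selection methods named in the abstract (hLRT, AIC, BIC) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline. The log-likelihoods are hypothetical numbers for an imaginary 1,000-site alignment and are not data from the study. Parameter counts cover only the substitution model. The BIC sample size is taken to be the number of sites, which is a common convention. Decision theory (DT) is omitted because it needs branch-length estimates under every candidate model. The scores used are AIC = -2 lnL + 2K and BIC = -2 lnL + K ln n, plus one pairwise likelihood-ratio step, 2(lnL1 - lnL0), compared against a chi-square distribution on K1 - K0 degrees of freedom. The chi-square tail is written out in closed form for integer degrees of freedom to avoid a SciPy dependency.

```python
import math

# Hypothetical fits of six nested substitution models to one alignment on a
# fixed tree: model -> (maximized log-likelihood, free substitution parameters).
# Branch lengths are not counted; on a fixed topology they add the same
# constant to every model's score and so do not change the ranking.
N_SITES = 1000
FITS = {
    "JC69": (-5230.4, 0),     # equal rates and base frequencies
    "K2P": (-5120.7, 1),      # + transition/transversion ratio
    "HKY85": (-5112.9, 4),    # + 3 free base frequencies
    "HKY85+G": (-5046.0, 5),  # + gamma shape for among-site rate variation
    "GTR": (-5109.8, 8),      # 5 relative rates + 3 base frequencies
    "GTR+G": (-5041.2, 9),
}


def aic(lnl, k):
    return -2.0 * lnl + 2.0 * k


def bic(lnl, k, n=N_SITES):
    # Assumes the sample size is the number of alignment sites.
    return -2.0 * lnl + k * math.log(n)


def best_model(criterion):
    """Return the model with the lowest score under `criterion`."""
    return min(FITS, key=lambda m: criterion(*FITS[m]))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in x.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def lrt(null, alt):
    """Likelihood-ratio test of a nested null model against a more general one."""
    (lnl0, k0), (lnl1, k1) = FITS[null], FITS[alt]
    stat = 2.0 * (lnl1 - lnl0)
    return stat, chi2_sf(stat, k1 - k0)


if __name__ == "__main__":
    print("AIC picks", best_model(aic))  # AIC picks GTR+G
    print("BIC picks", best_model(bic))  # BIC picks HKY85+G
    stat, p = lrt("HKY85+G", "GTR+G")
    print(f"HKY85+G vs GTR+G: LR = {stat:.1f}, p = {p:.3f}")  # LR = 9.6, p = 0.048
```

With these made-up numbers, AIC prefers the more parameter-rich GTR+G. BIC prefers HKY85+G because its per-parameter penalty (ln 1000, about 6.9) exceeds AIC's penalty of 2. This mirrors, but does not reproduce, the abstract's observation that AIC tends to select more complex models. A full hLRT chains such pairwise tests in a fixed order. The chi-square reference is only approximate when the simpler model places a parameter on its boundary, as in a test for +G.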

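The abstract reports that topological differences between ML trees built under alternative best-fit models are mainly confined to poorly supported nodes. One simple way to check such a claim is to compare the two trees' bipartitions (splits). The sketch below is an illustration under assumptions, not the procedure used in the study. The trees are hypothetical Newick strings with bootstrap percentages as internal node labels and no branch lengths. Both trees share the same taxa. The 70% cutoff for "poorly supported" is an illustrative choice. The code reports the unnormalized Robinson-Foulds distance, which is the number of splits found in one tree but not the other. It also reports whether every conflicting split falls below the cutoff. It does not replace the S-H or SOWH tests, which ask whether likelihood differences between topologies are significant.

```python
def parse_clades(newick):
    """Map each internal clade (frozenset of leaf names) to its support label.

    Assumes a plain Newick string without branch lengths or whitespace, with
    support values written as internal node labels, e.g. "((A,B)98,C,D);".
    """
    s = newick.strip().rstrip(";")
    pos = 0
    clades = {}

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    def node():
        nonlocal pos
        if s[pos] != "(":
            return {read_label()}  # a leaf
        pos += 1  # skip "("
        leaves = set()
        while True:
            leaves |= node()
            closing = s[pos] == ")"
            pos += 1  # skip "," or ")"
            if closing:
                break
        label = read_label()
        clades[frozenset(leaves)] = float(label) if label else None
        return leaves

    node()
    return clades


def splits(clades, taxa, ref):
    """Express clades as unrooted bipartitions: keep the side lacking `ref`."""
    out = {}
    for clade, support in clades.items():
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # drop trivial splits
            out[frozenset(side)] = support
    return out


def compare_trees(newick_a, newick_b, threshold=70.0):
    """Return (unnormalized RF distance, all conflicting splits below threshold)."""
    clades_a, clades_b = parse_clades(newick_a), parse_clades(newick_b)
    taxa = max(clades_a, key=len)  # the outermost clade holds every leaf
    ref = min(taxa)  # any fixed taxon can anchor the bipartitions
    sa, sb = splits(clades_a, taxa, ref), splits(clades_b, taxa, ref)
    conflicts = [sa[s] for s in sa.keys() - sb.keys()]
    conflicts += [sb[s] for s in sb.keys() - sa.keys()]
    weak = all(sup is not None and sup < threshold for sup in conflicts)
    return len(conflicts), weak


if __name__ == "__main__":
    # Hypothetical ML trees for six taxa under two different best-fit models.
    tree_1 = "((A,B)98,((C,D)55,E)90,F);"
    tree_2 = "((A,B)97,(C,(D,E)48)91,F);"
    print(compare_trees(tree_1, tree_2))  # (2, True)
```

In this toy case, the trees are incongruent because one groups C with D and the other groups D with E. Both conflicting splits have bootstrap support below 70%. This is the pattern the abstract describes as alternative resolutions of poorly supported nodes.
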
Year:  2008        PMID: 18275003     DOI: 10.1080/10635150801898920

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


Related articles: 25 in total (the first 10 are listed below).

1.  Assessment of substitution model adequacy using frequentist and Bayesian methods.

Authors:  Jennifer Ripplinger; Jack Sullivan
Journal:  Mol Biol Evol       Date:  2010-07-08       Impact factor: 16.240

2.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Authors:  Koichiro Tamura; Daniel Peterson; Nicholas Peterson; Glen Stecher; Masatoshi Nei; Sudhir Kumar
Journal:  Mol Biol Evol       Date:  2011-05-04       Impact factor: 16.240

3.  Phylogeny of gammaproteobacteria.

Authors:  Kelly P Williams; Joseph J Gillespie; Bruno W S Sobral; Eric K Nordberg; Eric E Snyder; Joshua M Shallom; Allan W Dickerman
Journal:  J Bacteriol       Date:  2010-03-05       Impact factor: 3.490

4.  Approximating model probabilities in Bayesian information criterion and decision-theoretic approaches to model selection in phylogenetics.

Authors:  Jason Evans; Jack Sullivan
Journal:  Mol Biol Evol       Date:  2010-07-29       Impact factor: 16.240

5.  Information Criteria for Comparing Partition Schemes.

Authors:  Tae-Kun Seo; Jeffrey L Thorne
Journal:  Syst Biol       Date:  2018-07-01       Impact factor: 15.683

6.  Signal processing for metagenomics: extracting information from the soup.

Authors:  Gail L Rosen; Bahrad A Sokhansanj; Robi Polikar; Mary Ann Bruns; Jacob Russell; Elaine Garbarine; Steve Essinger; Non Yok
Journal:  Curr Genomics       Date:  2009-11       Impact factor: 2.236

7.  Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling.

Authors:  Alexandros Stamatakis; Markus Göker; Guido W Grimm
Journal:  Evol Bioinform Online       Date:  2010-05-24       Impact factor: 1.625

8.  Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets.

Authors:  Arong Luo; Huijie Qiao; Yanzhou Zhang; Weifeng Shi; Simon Yw Ho; Weijun Xu; Aibing Zhang; Chaodong Zhu
Journal:  BMC Evol Biol       Date:  2010-08-09       Impact factor: 3.260

9.  Comparing Semiquantitative and Qualitative Methods of Vascular 18F-FDG PET Activity Measurement in Large-Vessel Vasculitis.

Authors:  Himanshu R Dashora; Joel S Rosenblum; Kaitlin A Quinn; Hugh Alessi; Elaine Novakovich; Babak Saboury; Mark A Ahlman; Peter C Grayson
Journal:  J Nucl Med       Date:  2021-06-04       Impact factor: 10.057

10.  Assessing parameter identifiability in phylogenetic models using data cloning.

Authors:  José Miguel Ponciano; J Gordon Burleigh; Edward L Braun; Mark L Taper
Journal:  Syst Biol       Date:  2012-05-30       Impact factor: 15.683
