ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online.

Abstract

ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at http://darwin.uvigo.es/software/modeltest_server.html.

Substances: Nucleotides

Year: 2006 PMID: 16845102 PMCID: PMC1538795 DOI: 10.1093/nar/gkl042

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Models of nucleotide substitution play a significant role in the study of DNA sequences. The choice of model can change our view of the evolution of a given genomic region, and therefore influence the conclusions derived from its analysis (1–3). Hence, the use of a given model needs to be properly justified. The program ModelTest (4) is a widely used standalone application for the selection of models of nucleotide substitution. This program implements different statistical frameworks for model selection, including hierarchical likelihood ratio tests (hLRT), the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Currently, ModelTest runs on computers with different operating systems, including Mac OS (with a graphical user interface), Windows (DOS console) and UNIX-like systems (command line). To unify these different implementations, and to make the program accessible to a wider range of researchers, the ModelTest server offers a single site for the online selection of models of nucleotide substitution.

MODELTEST SERVER

Server implementation

The ModelTest web server starts with a HyperText Markup Language (HTML) form where the user can specify the input file and several options for the analysis (Figure 1). Several JavaScript functions on this page validate the input and enable or disable options according to the selections made by the user. All the user data are submitted to a Common Gateway Interface (CGI) program written in Perl that manages the analysis. This CGI program receives the uploaded input file, executes ModelTest according to the user specifications and writes the output as HTML in a new browser window.
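
The server's option dependencies (a sample size is needed for AICc or BIC, and a number of taxa is needed when branch lengths are counted as parameters) are the kind of checks the form performs before submission. The following is a minimal Python sketch of such checks, written for illustration only: the class, field names and messages are hypothetical and do not reproduce the server's JavaScript or Perl code.

```python
from dataclasses import dataclass
from typing import List, Optional

CRITERIA = ("AIC", "AICc", "BIC")


@dataclass
class ModelTestRequest:
    """Hypothetical container for the options submitted through the form."""
    scores_file: str                     # name of the uploaded likelihood-scores file
    lrt_level: float = 0.01              # level of each individual LRT (server default)
    criterion: str = "AIC"               # AIC, AICc or BIC
    sample_size: Optional[int] = None    # required for AICc and BIC
    count_branch_lengths: bool = True    # branch lengths counted as parameters (default)
    n_taxa: Optional[int] = None         # required when branch lengths are counted


def validate(req: ModelTestRequest) -> List[str]:
    """Return a list of problems; an empty list means the job can be submitted."""
    problems = []
    if not req.scores_file:
        problems.append("an input file with likelihood scores is required")
    if not 0.0 < req.lrt_level < 1.0:
        problems.append("the LRT level must lie strictly between 0 and 1")
    if req.criterion not in CRITERIA:
        problems.append(f"unknown criterion: {req.criterion}")
    if req.criterion in ("AICc", "BIC") and (req.sample_size or 0) <= 0:
        problems.append(f"{req.criterion} requires the sample size of the alignment")
    if req.count_branch_lengths and (req.n_taxa or 0) < 3:
        problems.append("counting branch lengths requires the number of taxa (at least 3)")
    return problems


# Options similar to those used for the example dataset: AIC, branch lengths counted, 20 taxa.
print(validate(ModelTestRequest("example1.scores", n_taxa=20)))
```

Collecting every problem in a list, rather than stopping at the first one, lets a form report all missing options at once.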

Figure 1

The web page for the ModelTest server, with the options used for the analysis of the example dataset.

Analysis options

The capabilities of the server are the same as those of the program ModelTest. The user needs to specify a text input file containing the likelihood scores for 56 models of DNA substitution. This file is most easily obtained by executing in PAUP* (5) a command script that can be downloaded from the help page of the server. Further instructions can be found in the program manual (also available from the help page of the server) or in (6,7). The only option within the hierarchical likelihood ratio test framework (4,8,9) is the statistical confidence level. For each individual likelihood ratio test, this level is set by default to 0.01, but the user can specify any value. The user should note that five or six likelihood ratio tests will be performed, which inflates the overall type I error rate; using a 0.01 level for each test is therefore roughly equivalent to a Bonferroni correction that maintains a global 0.05 level (0.05/5 = 0.01). The user can choose among three information criteria: the AIC (10–12), an AIC corrected for small sample sizes (AICc) (13,14) and the BIC (15). Users are referred to (1–3,13,14,16–20) for background on these methods. If the AICc or BIC option is selected, the user must also indicate the sample size of the DNA sequence alignment from which the model likelihoods were obtained. This is a difficult choice, because the concept of sample size for a sequence alignment has not yet been well defined. In practice, most people use the length of the alignment as a surrogate for sample size, although other options exist (2,21). Furthermore, because model likelihoods are conditional on a given DNA sequence alignment and a tree topology, branch lengths should be considered parameters of the models as well, which is the option selected by default. In this case the user needs to specify the number of sequences, so that the program can automatically calculate the number of branch length parameters.
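
As a minimal sketch of how the parameter count K enters each criterion, the Python functions below assume an unrooted, fully resolved tree (2n - 3 branch lengths for n sequences) and the standard definitions AIC = -2 lnL + 2K, AICc = AIC + 2K(K + 1)/(N - K - 1) and BIC = -2 lnL + K ln N, where N is the sample size. Under this assumption, counting branch lengths for 20 sequences adds 37 parameters to every model. The function names and log-likelihood values are hypothetical and are not taken from ModelTest.

```python
import math


def n_parameters(k_model: int, n_taxa: int = 0, count_branch_lengths: bool = True) -> int:
    """Free parameters K: substitution-model parameters plus, optionally,
    the 2n - 3 branch lengths of an unrooted, fully resolved tree (assumption)."""
    if count_branch_lengths:
        if n_taxa < 3:
            raise ValueError("an unrooted tree needs at least 3 taxa")
        return k_model + 2 * n_taxa - 3
    return k_model


def aic(lnl: float, k: int) -> float:
    return -2.0 * lnl + 2.0 * k


def aicc(lnl: float, k: int, n: int) -> float:
    # Small-sample correction; undefined when n <= k + 1.
    if n <= k + 1:
        raise ValueError("AICc requires sample size n > K + 1")
    return aic(lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)


def bic(lnl: float, k: int, n: int) -> float:
    return -2.0 * lnl + k * math.log(n)


# Hypothetical log-likelihoods; HKY+G has 5 model parameters (kappa, three base
# frequencies, alpha), GTR+G has 9 (five relative rates, three frequencies, alpha).
models = {"HKY+G": (-5000.0, 5), "GTR+G": (-4997.0, 9)}
for name, (lnl, k_model) in models.items():
    k = n_parameters(k_model, n_taxa=20)
    print(name, k, round(aic(lnl, k), 2), round(aicc(lnl, k, 1000), 2), round(bic(lnl, k, 1000), 2))
```

Here the alignment length (1000 sites) is used as the sample size, following the common surrogate mentioned above.
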
Because the number of branch lengths is the same for all models, including them as parameters will not change the AIC or BIC ranking of the models, but it can change the AICc differences, because the small-sample correction depends non-linearly on the number of parameters (2). Alternatively, the user can choose to ignore branch lengths and not count them as model parameters. In addition, the user can choose whether all models are included in the model-averaging calculations, or only the best-ranked models whose cumulative weight reaches a specified value. Finally, the user can give the analysis a name. The server offers a help page where all the options are explained in detail, as well as links to the PAUP* command script and to the ModelTest PDF manual.
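
The weights used to rank models and to choose the models for averaging are usually derived from criterion differences: w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the difference between model i and the best model. The sketch below computes these weights and keeps the top-ranked models until their cumulative weight reaches a chosen value, which is our reading of the cumulative-weight option; the function names are hypothetical and the scores are made up.

```python
import math
from typing import Dict, List, Tuple


def akaike_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert criterion scores (lower is better) into weights that sum to 1."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


def confidence_set(weights: Dict[str, float], level: float = 0.95) -> List[Tuple[str, float]]:
    """Top-ranked models, in order, until their cumulative weight reaches `level`.
    With level = 1.0 every model is kept."""
    ranked = sorted(weights.items(), key=lambda mw: mw[1], reverse=True)
    chosen, cumulative = [], 0.0
    for model, w in ranked:
        chosen.append((model, w))
        cumulative += w
        if cumulative >= level:
            break
    return chosen


# Hypothetical AIC scores for three candidate models.
scores = {"HKY+G": 10074.0, "GTR+G": 10076.0, "HKY+I": 10082.0}
weights = akaike_weights(scores)
for model, w in confidence_set(weights, level=0.95):
    print(f"{model}: {w:.4f}")
```

The same transformation can be applied to AICc or BIC differences; only the input scores change.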

Output

Once the user sends the data to the server by pressing the submit button, the output page opens in a new window within a few seconds (Figure 2). The output begins with a header indicating the job number, the title of the analysis, the submission date, the IP address of the local computer and the input file name. After this header, the standard output of ModelTest appears. This output covers two model selection frameworks: the hLRT and one of the three information criteria (AIC, AICc or BIC). The hLRT section includes the sequence of likelihood ratio tests performed, a description of the selected model including parameter estimates, and a set of commands that can be appended to a NEXUS file (22) containing the sequence alignment so that PAUP* implements this model automatically. The information criterion section includes a full description of the model selected under the chosen criterion, a set of PAUP* commands to implement this model, a ranking of all models according to their weights for the assessment of model selection uncertainty, and a table of parameter importances and model-averaged estimates of model parameters.
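
Each test listed in the hLRT section compares a simpler (null) model with a more complex model in which it is nested, using the statistic LR = 2(lnL1 - lnL0), referred to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The standard-library sketch below illustrates one such test; the helper functions are hypothetical. The plain chi-square reference is a simplification: tests in which the null hypothesis places a parameter on the boundary of its range (such as no rate variation among sites) are usually evaluated with a mixture distribution instead, which this sketch omits.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add the closed-form correction terms.
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df + 1) // 2):
        p += term
        term *= x / (2 * i + 1)
    return p


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int, alpha: float = 0.01):
    """Return (LR statistic, p-value, reject null?) for two nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # guard against small negative values
    p_value = chi2_sf(stat, df)
    return stat, p_value, p_value < alpha


# Hypothetical log-likelihoods: K80 (null) versus HKY, which adds three base-frequency parameters.
stat, p_value, reject = likelihood_ratio_test(lnl_null=-5010.2, lnl_alt=-5003.9, df=3)
print(f"LR = {stat:.2f}, P = {p_value:.4g}, reject K80: {reject}")
```

The default level of 0.01 mirrors the server's default for each individual test.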

Figure 2

Output window of the ModelTest server corresponding to the analysis of the example dataset.

EXAMPLE DATASET

The example file ‘example1.nex’ includes an alignment of 20 DNA sequences, each 1000 nt long, simulated under the coalescent (23) with an effective population size of 1000 and a mutation rate of 2 × 10⁻⁵ substitutions per site per generation. The model of nucleotide substitution used was the Hasegawa–Kishino–Yano model (HKY) (24) with unequal base frequencies (fA = 0.4, fC = 0.2, fG = 0.1, fT = 0.3), a transition/transversion ratio of 2, and rate variation among sites (25) [alpha (α) shape of the gamma (Γ) distribution = 0.5]. This example dataset was loaded into PAUP*, and upon execution of the ‘modelblockPAUPb10’ script, the file ‘example1.scores’ was obtained. This file, as well as the original DNA alignment, is available from the help page of the ModelTest server. The file ‘example1.scores’ was then analyzed with the ModelTest server (Figure 1: input file = example1.scores; confidence level for the LRTs = 0.01; model selection criterion = AIC; branch lengths counted as parameters, with number of taxa = 20; averaging confidence interval = 1). The output of the server for this dataset, partially shown in Figure 2, is included as Supplementary Data. The output starts with the hLRT section, which details the six sequential LRTs performed. The model selected is HKY + Γ, which corresponds exactly to the model of nucleotide substitution used to simulate the original sequence alignment. The output includes the parameter estimates obtained in PAUP* and a set of PAUP* commands to implement this model. In the AIC section, the output indicates that this criterion also selects HKY + Γ as the best model among the 56 candidates. Again, the output includes the parameter estimates obtained in PAUP* and a set of PAUP* commands to implement this model. Next comes a table in which models are ordered according to their Akaike weights.
Here, the best model accounts for only 20.75% of the total weight, and the 12 best models are needed to accumulate more than 95% of the total weight (96.22%). This indicates considerable model selection uncertainty, suggesting that several models could be used to make inferences from this dataset. The last table in the output gives the importance (0–1) of the different parameters and their model-averaged estimates. It shows that considering unequal base frequencies is very important (importance = 0.9935), that considering certain substitution types (AG or CT) is more important than considering others, and that rate variation can also be important [alpha (Γ) = 0.5849]. The model-averaged estimates are obtained by averaging over all 56 models. In general, they are quite similar to those obtained under the best-fit model (HKY + Γ).
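
The importance values and model-averaged estimates in this table can be sketched as follows, assuming the formulation common in the phylogenetic model-averaging literature: the importance of a parameter is the sum of the weights of all models that include it, and its model-averaged estimate is the weighted mean of its estimates over those models, with the weights rescaled by that importance. The model names, weights and estimates below are hypothetical and are not the values obtained for the example dataset.

```python
from typing import Dict, Optional


def parameter_importance(weights: Dict[str, float], estimates: Dict[str, Optional[float]]) -> float:
    """Sum of the weights of the models that include the parameter
    (a model includes it when its estimate is not None)."""
    return sum(weights[m] for m, est in estimates.items() if est is not None)


def model_averaged_estimate(weights: Dict[str, float],
                            estimates: Dict[str, Optional[float]]) -> Optional[float]:
    """Importance-rescaled weighted mean of the estimates; None if no model includes the parameter."""
    importance = parameter_importance(weights, estimates)
    if importance == 0.0:
        return None
    weighted = sum(weights[m] * est for m, est in estimates.items() if est is not None)
    return weighted / importance


# Hypothetical weights (summing to 1) and gamma shape estimates; None means the model has no +G.
weights = {"HKY+G": 0.5, "GTR+G": 0.3, "HKY+I": 0.2}
alpha = {"HKY+G": 0.5, "GTR+G": 0.6, "HKY+I": None}
print(parameter_importance(weights, alpha))
print(model_averaged_estimate(weights, alpha))
```

Rescaling by the importance keeps the averaged estimate on the parameter's own scale, since models without the parameter contribute no estimate.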

CONCLUSIONS

The ModelTest server is a useful online application for the selection of models of nucleotide substitution that makes ModelTest available to a wider range of users across many different platforms. The server includes three different frameworks for model selection and offers a series of tools for the assessment of model selection uncertainty and for model averaging.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online

8 in total

1. NEXUS: an extensible file format for systematic information.

Authors: D R Maddison; D L Swofford; W P Maddison
Journal: Syst Biol Date: 1997-12 Impact factor: 15.683

2. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests.

Authors: David Posada; Thomas R Buckley
Journal: Syst Biol Date: 2004-10 Impact factor: 15.683

3. ProtTest: selection of best-fit models of protein evolution.

Authors: Federico Abascal; Rafael Zardoya; David Posada
Journal: Bioinformatics Date: 2005-01-12 Impact factor: 6.937

4. Model selection in ecology and evolution.

Authors: Jerald B Johnson; Kristian S Omland
Journal: Trends Ecol Evol Date: 2004-02 Impact factor: 17.712

5. MODELTEST: testing the model of DNA substitution.

Authors: D Posada; K A Crandall
Journal: Bioinformatics Date: 1998 Impact factor: 6.937

6. Among-site rate variation and its impact on phylogenetic analyses.

Authors: Z Yang
Journal: Trends Ecol Evol Date: 1996-09 Impact factor: 17.712

7. Evolution of the mitochondrial cytochrome oxidase II gene in collembola.

Authors: F Frati; C Simon; J Sullivan; D L Swofford
Journal: J Mol Evol Date: 1997-02 Impact factor: 2.395

8. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA.

Authors: M Hasegawa; H Kishino; T Yano
Journal: J Mol Evol Date: 1985 Impact factor: 2.395
