Abstract
BACKGROUND: Several ways of incorporating indels into phylogenetic analysis have been suggested. Simple indel coding has two strengths: (1) biological realism and (2) efficiency of analysis. In this method, each indel with a different start and/or end position is considered to be a separate character. The presence/absence of these indel characters is then added to the data set. ALGORITHM: We have written a program, GapCoder, to automate this procedure. The program can read aligned data sets in PIR format, find the indels, and add the indel-based characters. The output is a NEXUS format file, which includes a table showing which region each indel character is based on. If regions are excluded from analysis, this table makes it easy to identify the corresponding indel characters for exclusion. DISCUSSION: Manual implementation of the simple indel coding method can be very time-consuming, especially in data sets where indels are numerous and/or overlapping. GapCoder automates this method and is therefore particularly useful during procedures where phylogenetic analyses need to be repeated many times, such as when different alignments or various taxon or character sets are being explored. GapCoder is currently available for Windows from http://www.home.duq.edu/~youngnd/GapCoder.
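The coding step described above can be illustrated with a minimal Python sketch. This is not GapCoder's code: the function names `find_gaps` and `code_indels`, the dictionary-based alignment, and the toy sequences are hypothetical. Coding a taxon as `?` when a longer gap spans the indel follows the general description of simple indel coding and is an assumption here, since the abstract only mentions presence/absence.

```python
import re

def find_gaps(seq, gap="-"):
    """Return (start, end) pairs, 1-based and inclusive, for each run of gaps."""
    pattern = re.escape(gap) + "+"
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq)]

def code_indels(alignment, gap="-"):
    """Code indels as presence/absence characters.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns (indels, codes): the sorted list of distinct (start, end) indels,
    and a dict mapping taxon name -> string of '0', '1' or '?' states.
    """
    gaps = {taxon: find_gaps(seq, gap) for taxon, seq in alignment.items()}
    # Each distinct start/end combination is a separate character.
    indels = sorted({g for runs in gaps.values() for g in runs})
    codes = {}
    for taxon, runs in gaps.items():
        states = []
        for start, end in indels:
            if (start, end) in runs:
                states.append("1")   # this exact indel is present
            elif any(s <= start and end <= e for s, e in runs):
                states.append("?")   # assumption: a longer gap spans it
            else:
                states.append("0")   # indel absent
        codes[taxon] = "".join(states)
    return indels, codes

# Toy alignment (invented for illustration only).
alignment = {
    "A": "ACGTACGTAC",
    "B": "AC--ACGTAC",
    "C": "AC----GTAC",
    "D": "ACGTAC--AC",
}
indels, codes = code_indels(alignment)
print(indels)      # [(3, 4), (3, 6), (7, 8)]
print(codes["C"])  # ?10
```

In this sketch leading and trailing gaps are coded like any other indel; an analysis that treats them as missing data would need to filter them out first.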
Year: 2003 PMID: 12689349 PMCID: PMC153505 DOI: 10.1186/1471-2105-4-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Sample input file, PIR format
Figure 2. Sample input file, modified FASTA format
Figure 3. Sample output file. Output files are in the NEXUS format and ready to be input into PAUP or other programs that use this format. The indel characters have been added to the matrix, and a table of correspondences is appended as a comment, showing each indel character and the position of the indel on which it is based. The Equate command allows 0 and 1 to be used while maintaining the data type as 'DNA'. This allows one to perform maximum likelihood and other analyses that require this data type, though if a model of DNA substitution is applied, it may be most appropriate to exclude the indel characters from the analysis, because indels probably do not evolve according to the same model as substitutions.
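The output layout described in this caption can be sketched roughly as follows. This is a hypothetical Python illustration, not GapCoder's actual output routine: the function name `to_nexus`, the mapping `0=A 1=C` in the Equate statement, and the wording of the comment table are all assumptions. The only NEXUS conventions relied on are that comments are enclosed in square brackets and that the Format statement accepts an Equate option.

```python
def to_nexus(alignment, indels, codes):
    """Build NEXUS text with indel characters appended to each sequence.

    alignment: dict taxon -> aligned sequence (equal lengths)
    indels:    list of (start, end) positions, 1-based and inclusive
    codes:     dict taxon -> string of indel states, one per entry in indels
    """
    nseq = len(next(iter(alignment.values())))
    nchar = nseq + len(indels)
    width = max(len(taxon) for taxon in alignment)
    lines = [
        "#NEXUS",
        "Begin data;",
        f"  Dimensions ntax={len(alignment)} nchar={nchar};",
        # Assumed mapping: lets 0 and 1 appear in a matrix typed as DNA.
        '  Format datatype=DNA gap=- missing=? equate="0=A 1=C";',
        "  Matrix",
    ]
    for taxon, seq in alignment.items():
        lines.append(f"  {taxon.ljust(width)}  {seq}{codes[taxon]}")
    lines += ["  ;", "End;", ""]
    # Table of correspondences, written as a NEXUS comment.
    lines.append("[ Indel characters and the positions they are based on:")
    for number, (start, end) in enumerate(indels, start=nseq + 1):
        lines.append(f"  character {number}: positions {start}-{end}")
    lines.append("]")
    return "\n".join(lines)

# Toy data (invented for illustration only).
alignment = {"A": "ACGTACGTAC", "B": "AC--ACGTAC", "D": "ACGTAC--AC"}
indels = [(3, 4), (7, 8)]
codes = {"A": "00", "B": "10", "D": "01"}
print(to_nexus(alignment, indels, codes))
```

Because the table lists the character number next to the alignment positions, excluding a region from analysis comes down to looking up which indel characters fall inside it. Taxon names containing spaces would need quoting, which this sketch does not handle.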