Martin O Jones, Georgios D Koutsovoulos, Mark L Blaxter.
Abstract
BACKGROUND: The increasing availability of molecular sequence data means that the accuracy of future phylogenetic studies is likely to be limited by systematic bias and taxon choice rather than by data. To take advantage of growing datasets, user-friendly tools are required to facilitate phylogenetic analyses and to reduce duplication of dataset assembly efforts. Current phylogenetic pipelines are dependency-heavy and have significant technical barriers to use.
Year: 2011 PMID: 21261969 PMCID: PMC3037854 DOI: 10.1186/1471-2105-12-30
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The iPhy user interface. The user is viewing the "Add Sequences" tab. The interface shows the current user name (1) and a set of summary statistics about the dataset (2). A modified taxonomy has been selected (3) and is displayed in the taxonomy panel (4). The user has expanded several nodes of the taxonomy and selected several of them. Sequences belonging to the selected nodes, or to their descendants, will be added to the current dataset based on either annotation (5) or similarity to known sequences (6). From this panel the user can also upload new data files (7).
Figure 2. A supermatrix for Nematoda assembled in iPhy. The colour of the box at each intersection shows the length of the consensus DNA sequence for a given gene in a given species. Genes and species are ordered by total number of characters. A high-resolution version of this figure is available as additional file 4: heatmap.svg.
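The ordering described in the Figure 2 caption can be illustrated with a minimal sketch. It is not iPhy's own code: the function name `order_supermatrix`, the input layout (a dictionary keyed by species and gene), and the choice of descending order are all assumptions made for illustration, since the caption does not state the sort direction.

```python
def order_supermatrix(lengths):
    """Arrange consensus-sequence lengths into a species-by-gene grid.

    `lengths` maps (species, gene) -> consensus sequence length.
    This input layout is a hypothetical one, chosen for the example.
    """
    species_totals = {}
    gene_totals = {}
    for (species, gene), n_chars in lengths.items():
        species_totals[species] = species_totals.get(species, 0) + n_chars
        gene_totals[gene] = gene_totals.get(gene, 0) + n_chars

    # Assumption: largest totals first; ties are broken alphabetically.
    species_order = sorted(species_totals, key=lambda s: (-species_totals[s], s))
    gene_order = sorted(gene_totals, key=lambda g: (-gene_totals[g], g))

    # A missing gene for a species is recorded as length 0.
    grid = [[lengths.get((s, g), 0) for g in gene_order] for s in species_order]
    return species_order, gene_order, grid
```

Each cell of the returned grid would then be mapped to a colour to draw the heatmap.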
Descriptive statistics of subset alignments and trees
| Subset | L14 AT content (mean) | L14 AT content (SD) | L8 AT content (mean) | L8 AT content (SD) | L9 AT content (mean) | L9 AT content (SD) | Tree length (mean) | Tree length (variance) |
|---|---|---|---|---|---|---|---|---|
| whole dataset | 0.5118 | 0.0819 | 0.4686 | 0.0675 | 0.5221 | 0.0745 | | |
| | 0.5614 | 0.0750 | 0.5175 | 0.0740 | 0.5593 | 0.0600 | 3.837 | 0.0500 |
| | 0.5266 | 0.0570 | 0.4845 | 0.0524 | 0.5206 | 0.0461 | 3.489 | 0.0374 |
| | 0.5182 | 0.0692 | 0.4860 | 0.0678 | 0.5334 | 0.0503 | 3.033 | 0.0375 |
Each row gives statistics for one subset of the data or for the entire dataset: the mean and standard deviation of AT content for each gene separately, and the mean and variance of the total tree length as reported by MrBayes.
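The per-gene AT statistics in the table can be reproduced in outline as follows. This is a sketch under stated assumptions, not the calculation used by iPhy: gaps and ambiguity codes are excluded from the denominator, and the sample standard deviation is used, neither of which the source specifies. The function names `at_content` and `at_summary` are hypothetical.

```python
from statistics import mean, stdev


def at_content(seq):
    """Fraction of A/T among the unambiguous bases of one sequence.

    Assumption: gaps and ambiguity codes (anything outside ACGT) are ignored.
    Returns None if the sequence has no unambiguous bases.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return None
    return sum(b in "AT" for b in bases) / len(bases)


def at_summary(sequences):
    """Mean and sample standard deviation of AT content across sequences.

    `sequences` holds the sequences of one gene, one per species.
    Assumption: sample (n - 1) standard deviation.
    """
    values = [v for v in map(at_content, sequences) if v is not None]
    if len(values) < 2:
        raise ValueError("need at least two sequences with unambiguous bases")
    return mean(values), stdev(values)
```

Calling `at_summary` once per gene on the sequences of a given subset would yield one mean and SD pair per gene, matching the layout of a table row.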
Figure 3. Analyses of slices from the Nematoda dataset. The figure shows the results of analyses of taxon subsets selected automatically from the Nematoda dataset, using various criteria, for a three-gene supermatrix. (A) most_chars: the species (one per order) with the most characters for the three genes; (B) least_bias: the species showing the lowest base composition bias; (C) slowest_rate: the species with the slowest inferred overall rate of evolution. (D) For comparison, we show the tree derived from an alignment of full-length SSU rRNA sequences for twelve of the fourteen species included in the iPhy slices in parts (A), (B) and (C). For two of the species (Caenorhabditis sp. 5 and Ditylenchus africanus) no SSU rRNA sequence was available, so we have included closely related species (Caenorhabditis briggsae and Ditylenchus angustus). Clade membership sensu Blaxter 1998 [36] is shown on the tree. For each iPhy subset the figure shows, from left to right: the tree resulting from phylogenetic analysis; a heatmap showing the AT content of each of the three genes; and a stacked bar chart showing the number of characters for each gene. Scale bars above each tree show the branch length associated with 0.1 changes per site. Order names are given in parentheses. The keys at the bottom of the figure show, from left to right, the mapping of colours to AT content for the heatmap and the mapping of colours to loci for the bar chart. The scale bar shows the length of bar representing 1000 characters.
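The most_chars criterion in part (A), one species per order with the most characters, might be sketched as below. The function name, the record layout, and the tie-breaking rule (first record seen wins) are assumptions for illustration; the source does not describe how iPhy implements the selection.

```python
def select_most_chars(records):
    """Pick, for each order, the species with the most characters.

    `records` is an iterable of (species, order, n_chars) tuples, where
    n_chars is the species' total character count across the chosen genes.
    This layout is hypothetical. Ties keep the first record encountered.
    """
    best = {}
    for species, order, n_chars in records:
        current = best.get(order)
        if current is None or n_chars > current[1]:
            best[order] = (species, n_chars)
    return {order: species for order, (species, _) in best.items()}
```

The least_bias and slowest_rate criteria could follow the same pattern, with the character count replaced by a composition-bias score or an inferred rate and the comparison reversed to prefer the lowest value.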