Daniel Dalevi, Todd Z DeSantis, Jakob Fredslund, Gary L Andersen, Victor M Markowitz, Philip Hugenholtz.
Abstract
BACKGROUND: Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger with the rapid expansion of sequence databases. Hierarchical group names are typically assigned manually in trees, an approach that becomes infeasible for very large topologies.
Year: 2007 PMID: 17949484 PMCID: PMC2228325 DOI: 10.1186/1471-2105-8-402
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Curation of groups using GRUNT before and after database updates. (a) An ungrouped tree with interior nodes labeled s1 to s5 and parent branch e0. The grouping function (addG) determines that e0 satisfies a minimum branch length (mL) and/or bootstrap support (mS), and that the group contains at least a minimum number of taxa (mC). (b) The naming rules (see text) are applied, and the group name S is proposed and recorded in the XML file. (c) The name is assigned to the newly formed group. (d) As part of an update, new sequences are added to the existing tree, and one new sequence, s', is placed basal to group S. (e) The ungrouping function (rmvG) removes groups whose parent branches fall below mL and/or mS; in this example, e1 is not supported, so group S is removed. (f) The grouping and naming tools are then reapplied; they identify the new stable parent branch e2, which re-forms group S. Note that the name of group S may differ from that in (d), depending on the taxon composition of the newly formed group.
Figure 2. Number of newly defined groups when iterating the minimum group size (mC) from 1000 down to 5 in decrements of 5, for four minimum branch lengths (mL). For clarity, only mC values from 150 down to 5 are shown. A non-linear Y-axis scale highlights differences in assignments for large groups; missing data points indicate that no groups were assigned in that iteration. The total numbers of defined groups for these settings were 4197, 1582, 699 and 356 for mL = 0.01, 0.02, 0.03 and 0.04, respectively. The default mL setting for grouping is 0.02 (boxed).
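The counting behind Figure 2 can be sketched in Python under a strong simplifying assumption: each candidate clade is reduced to a hypothetical `(number of taxa, parent branch length)` pair, nesting and bootstrap support are ignored, and a clade counts as "newly defined" in the first iteration in which it passes both `size >= mC` and `length >= mL`. Iterating from large to small mC means large groups are defined first. The function name `groups_per_iteration` and the example clades are illustrative only, not data from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (number of taxa, parent branch length).
Clade = Tuple[int, float]


def groups_per_iteration(clades: List[Clade], mL: float,
                         start: int = 1000, stop: int = 5, step: int = 5) -> Dict[int, int]:
    """Count clades newly grouped at each mC, iterating mC from start down to stop."""
    grouped = [False] * len(clades)
    counts: Dict[int, int] = {}
    for mC in range(start, stop - 1, -step):  # 1000, 995, ..., 5
        new = 0
        for i, (size, length) in enumerate(clades):
            if not grouped[i] and size >= mC and length >= mL:
                grouped[i] = True
                new += 1
        if new:  # iterations with no new groups are omitted (missing data points)
            counts[mC] = new
    return counts


clades = [(1200, 0.05), (300, 0.03), (12, 0.025), (7, 0.01), (3, 0.04)]
print(groups_per_iteration(clades, mL=0.02))  # {1000: 1, 300: 1, 10: 1}
```

As in the figure, raising mL shrinks the total: with `mL=0.04` only the largest clade qualifies, while `mL=0.01` also admits the seven-taxon clade at mC = 5.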