| Literature DB >> 24891820 |
Abstract
Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy to use and fully integrated software package currently exists to carry out this task. Here, we present a new Python-based software package that uses well defined XML schema to manage both data and metadata. It builds on previous versions by 1) including new processing steps, such as Safe Taxonomic Reduction, 2) using a user-friendly GUI that guides the user to complete at least the minimum information required and includes context-sensitive documentation, and 3) a revised storage format that integrates both tree- and meta-data into a single file. These data can then be manipulated according to a well-defined, but flexible, processing pipeline using either the GUI or a command-line based tool. Processing steps include standardising names, deleting or replacing taxa, ensuring adequate taxonomic overlap, ensuring data independence, and safe taxonomic reduction. This software has been successfully used to store and process data consisting of over 1000 trees ready for analyses using standard supertree methods. This software makes large supertree creation a much easier task and provides far greater flexibility for further work.Entities:
Keywords: Supertree; data curation; meta-data; phylogeny
Year: 2014 PMID: 24891820 PMCID: PMC4031428 DOI: 10.3897/BDJ.2.e1053
Source DB: PubMed Journal: Biodivers Data J ISSN: 1314-2828
Figure 1.The STK GUI, based on Diamond (Ham et al. 2009). The GUI is split into two main panes, with the left being used to store data and the right showing the context-based help (top), data entry (middle) and comments (bottom).
Figure 2.Data structure of the STK metadata. Each project consists of several sources, which in turn contain bibliographic information and one or more source trees. The blue boxes show the hierarchy for a single source tree. The data structure has been simplified here and more meta data can be stored for each source tree.
Figure 3.Example of a processing pipeline that can be created with the STK. Data are collected and then are put through the processing pipeline in order to create a matrix. The resulting matrix (in either Nexus format (.nex) or Hennig format (.tnt)) can then be analysed in any suitable software such as PAUP* (Swofford 2003) or TNT (Goloboff et al. 2008).
Figure 4.Result of the taxonomic overlap check which highlights which source trees are not sufficiently well connected to the rest of the dataset.