Arlin Stoltzfus, Brian O'Meara, Jamie Whitacre, Ross Mounce, Emily L Gillespie, Sudhir Kumar, Dan F Rosauer, Rutger A Vos.
Abstract
BACKGROUND: Recently, various evolution-related journals adopted policies to encourage or require archiving of phylogenetic trees and associated data. Such attention to practices that promote sharing of data reflects rapidly improving information technology, and rapidly expanding potential to use this technology to aggregate and link data from previously published research. Nevertheless, little is known about current practices, or best practices, for publishing trees and associated data so as to promote re-use.
Year: 2012 PMID: 23088596 PMCID: PMC3583491 DOI: 10.1186/1756-0500-5-574
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. The character-state data model used in evolutionary comparative analysis. The model is illustrated here with an example showing members of a protein family, with a single set of labels for Operational Taxonomic Units (OTUs), 2 phylogenies, and 3 types of characters (modified from [5]). The biological entities to be compared (whether genes, species, individuals, or some other unit) are known as “OTUs” or “Taxa”. Each OTU may be characterized as having a “state” for a given “character”; e.g., the OTU C_elegans_17537797 has the state “A” (alanine) for the 2nd amino acid character. Phylogenetic trees (typically directed acyclic graphs in which no node has more than one immediate ancestor) connect all the OTUs, representing their descent from a common ancestor.
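The data model in Figure 1 can be sketched as two small structures: a matrix that maps each OTU to one state per character, and a tree stored as a child-to-parent map, which makes "no node has more than one immediate ancestor" hold by construction. This is a minimal Python sketch under our own assumptions, not the representation used by any particular file format or tool. The class names `CharacterMatrix` and `Tree`, the character labels `aa1` to `aa3`, the OTU labels `otu_B` and `otu_C`, and every state except the “A” of C_elegans_17537797 at the 2nd character are hypothetical toy values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CharacterMatrix:
    """Hypothetical container: character labels plus one state per (OTU, character)."""
    characters: list[str]
    states: dict[str, list[str]]  # OTU label -> states, in the same order as `characters`

    def state(self, otu: str, character: str) -> str:
        """Return the state that `otu` has for the named character."""
        return self.states[otu][self.characters.index(character)]


@dataclass
class Tree:
    """Hypothetical rooted tree stored as child -> parent, so no node can have two parents."""
    parent: dict[str, Optional[str]]  # None marks the root

    def ancestors(self, node: str) -> list[str]:
        """Walk from `node` up to the root; raise ValueError if the walk revisits a node."""
        path, seen = [], {node}
        while (up := self.parent[node]) is not None:
            if up in seen:
                raise ValueError(f"cycle detected at {up!r}")
            seen.add(up)
            path.append(up)
            node = up
        return path

    def connects(self, otus: Iterable[str]) -> bool:
        """True if there is one root, no cycles, and every OTU appears in the tree."""
        roots = [n for n, p in self.parent.items() if p is None]
        if len(roots) != 1 or not set(otus) <= set(self.parent):
            return False
        if any(p is not None and p not in self.parent for p in self.parent.values()):
            return False  # a parent label that is not itself a node
        try:
            for n in self.parent:
                self.ancestors(n)  # raises ValueError on a cycle
        except ValueError:
            return False
        return True


matrix = CharacterMatrix(
    characters=["aa1", "aa2", "aa3"],
    states={
        "C_elegans_17537797": ["M", "A", "K"],  # "A" at the 2nd character, as in Figure 1
        "otu_B": ["M", "S", "K"],               # toy values
        "otu_C": ["M", "A", "R"],               # toy values
    },
)
tree = Tree(parent={"root": None, "n1": "root",
                    "C_elegans_17537797": "n1", "otu_B": "n1", "otu_C": "root"})

print(matrix.state("C_elegans_17537797", "aa2"))  # A
print(tree.ancestors("otu_B"))                    # ['n1', 'root']
print(tree.connects(matrix.states))               # True
```

Because a dict key maps to a single value, the single-ancestor property needs no check; the other tree properties (exactly one root, no cycles, every OTU present) are verified explicitly in `connects`.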
Links to resources mentioned in the text (contact the authors if a resource is no longer available at the given address)
| Resource | Description |
|---|---|
| APG tree | Authoritative phylogeny from Angiosperm Phylogeny Group |
| Dryad | Public archive of data associated with peer-reviewed bioscience articles |
| ICBN | International Code of Botanical Nomenclature |
| ICSP | International Committee on Systematics of Prokaryotes |
| ICZN | International Commission on Zoological Nomenclature |
| JDAP | Joint Data Archiving Policy that directs authors to submit supporting data to an appropriate public archive |
| Mesquite | Interactive software for comparative analysis; its email list is a common venue for addressing interoperability issues |
| MIAPA | Open project to develop a Minimum Information About a Phylogenetic Analysis standard |
| MorphoBank | Web tool for sharing and publishing comparative data linked to images and specimen vouchers |
| NAR database issue | List of secondary resources with alignments and trees (under protein sequences: domain databases) |
| NESCent | National Evolutionary Synthesis Center that supports many interoperability projects |
| NeXML | Open project to develop an XML format for comparative data and trees |
| NIH policy | Data sharing policy applicable to NIH-funded research |
| NSF policy | Data sharing policy applicable to NSF-funded research |
| NWO policy | Data sharing policy applicable to NWO-funded research; policy specified on p. 19, items 30 onwards |
| Phylomatic | Software that supports grafting and pruning to create plant phylogenies from APG mega-tree |
| TDWG | Biodiversity information standards organization with an active “Phylogenetic standards” interest group |
| TimeTree | Secondary resource synthesizing data on divergence times |
| iPlant TNRS | Taxonomic Name Resolution Service for plant names |
| ToLWeb | Secondary resource to assemble a curated tree of life |
| TreeBASE | Public archive for published trees and character data |
| uBio | Taxonomic name resolution service for life |
Figure 2. Comparison of file formats commonly used to represent trees. The features of various formats in common use are compared, with a square indicating support for a feature and an open circle indicating partial or incomplete support. The Newick format represents trees (and no other information) as nested parenthetical groups representing internal nodes, along with taxon names and, optionally, branch lengths (as described in http://evolution.genetics.washington.edu/phylip/newicktree.html). NEXUS [13] uses Newick strings, but may also store character information, processing commands (e.g., to exclude certain OTUs or characters), and notes. NEXUS has no formal mechanism for proposing extensions, but it has been widely adopted. PhyloXML [20] can store trees and molecular data, as well as accession numbers, geographic information, and other data. NeXML [21] is an XML-based format intended as a replacement for NEXUS. Both PhyloXML and NeXML have a formal syntax defined in an XSD schema. For further information, see [21].
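To make the Newick description concrete, here is a minimal recursive-descent parser sketch in Python. It is a simplification assumed for illustration, not a complete implementation of the format: it discards whitespace between tokens and does not handle quoted labels, bracketed comments, or other extensions, and it leaves underscores in labels untouched. `Node`, `parse_newick`, and `leaves` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str = ""
    length: Optional[float] = None  # branch length, if one was given after ':'
    children: list["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse a simplified Newick string (no quoted labels or [comments])."""
    s = "".join(text.split())  # drop all whitespace; unquoted labels cannot contain blanks
    if not s.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def read_until(stops: str) -> str:
        """Consume characters up to (not including) the first character in `stops`."""
        nonlocal pos
        start = pos
        while s[pos] not in stops:  # safe: the final ';' always stops the loop
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":  # internal node: comma-separated children inside parentheses
            pos += 1
            node.children.append(parse_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.label = read_until(":,);")  # taxon name or internal-node label (may be empty)
        if s[pos] == ":":  # optional branch length
            pos += 1
            node.length = float(read_until(",);"))
        return node

    root = parse_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return root


def leaves(node: Node) -> list[str]:
    """Return the tip labels (OTUs) in left-to-right order."""
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaves(child)]


t = parse_newick("((A:0.1,B:0.2)n1:0.05,C:0.3)root;")
print(leaves(t))                                  # ['A', 'B', 'C']
print(t.children[0].label, t.children[0].length)  # n1 0.05
```

Each parenthesized group becomes an internal node whose children are the comma-separated items inside it; an optional label after `)` names that internal node, and `:` introduces a branch length. For real data, an established, maintained parser is a safer choice than a hand-written one.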
Figure 3. A taxonomy of barriers experienced by users. Barriers may occur at many different steps along the path of re-use. For instance, an author may decide not to archive data because of the perceived burden. If the author does not archive the data, it is difficult for users to discover that the data exist. A user who learns of unarchived data from a publication can obtain them only by writing to the author, a process that is known to be subject to delays and refusals. Even when the data are placed in an archive, they may be difficult for users to discover (e.g., journal web sites typically do not offer any kind of content searching for supplementary data) or to access (e.g., users may be required to pay for access). Finally, it is not unusual for archived data to contain errors and ambiguities that make them difficult to use in scientific research.