Eric Talevich, Brandon M Invergo, Peter J A Cock, Brad A Chapman.
Abstract
BACKGROUND: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines.
Year: 2012 PMID: 22909249 PMCID: PMC3468381 DOI: 10.1186/1471-2105-13-209
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Built-in functions and tree methods
| Parse a file in the given format and return a single tree. | ||
| Iteratively parse a file and return each of the trees it contains. | ||
| Write a sequence of trees to file in the given format. | ||
| Convert between two tree file formats. | ||
| Plot the given tree using matplotlib (or pylab). | ||
| Draw an ascii-art phylogram of the given tree. | ||
| Display a tree or clade as a graph, using the graphviz engine. | ||
| Convert a Tree object to a NetworkX graph object. | ||
| Find all tree elements matching the given attributes. | ||
| Find each clade containing a matching element. | ||
| Return the first matching element found by | ||
| List the clades directly between the current node and the target. | ||
| List of all of the tree or clade’s internal nodes. | ||
| List of all of the tree or clade’s “leaf” nodes. | ||
| List of all clade objects between two targets in the tree/clade. | ||
| Most recent common ancestor (clade) of all the given targets. | ||
| Count the number of terminal nodes within the tree. | ||
| Create a mapping of tree clades to depths (by branch length). | ||
| Calculate the sum of the branch lengths between two targets. | ||
| Return True if tree downstream of node is strictly bifurcating. | ||
| If the given terminals comprise a complete subclade, return the MRCA. | ||
| True if target is a descendent of the tree. | ||
| True if all direct descendents are terminal. | ||
| Calculate the sum of all the branch lengths in the tree. | ||
| Deletes target from the tree, relinking its children to its parent. | ||
| Collapse all the descendents of the tree, leaving only terminals. | ||
| Sort clades in-place according to the number of terminal nodes. | ||
| Prunes a terminal clade from the tree. | ||
| Generate | ||
| True if the node has no descendents. | ||
| Reroot the tree with the specified outgroup clade. | ||
| Reroot the tree at the midpoint between the two most distant terminals. | ||
| Serialize the tree as a string in the specified file format. | ||
| Convert the tree to its PhyloXML subclass equivalent. | ||
| Create a new Tree object given a clade. | ||
| Create a randomized bifurcating tree, given a list of taxa. | ||
| Construct an alignment from the aligned sequences in this tree. | ||
| Parse a BASEML results file. | ||
| Parse a CODEML results file. | ||
| Parse a yn00 results file. | ||
| Dynamically build a program-specific control file. | ||
| Parse a control file to create a program-specific class instance. | ||
| Print all of the program options and their current settings. | ||
| Set the value of a program option. | ||
| Return the value of a program option. | ||
| Return the current values of all the program options. | ||
| Run a PAML program and parse the results. | ||
Public methods and functions provided in the Bio.Phylo module and sub-modules. An up-to-date version of this information is available at http://biopython.org/DIST/docs/api/.
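As a rough illustration of how the I/O functions and tree methods described above fit together, the following sketch reads a small tree from an in-memory Newick string, queries it, and writes it back out. The four-taxon tree, its taxon names and its branch lengths are hypothetical values chosen for this example, and the calls assume a reasonably recent Biopython release running under Python 3.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical four-taxon tree; names and branch lengths are illustration values.
NEWICK = "((A:1.0,B:2.0):0.5,(C:1.5,D:0.5):1.0);"

# read() expects exactly one tree in the input.
tree = Phylo.read(StringIO(NEWICK), "newick")

# parse() iterates over every tree found in a multi-tree input.
trees = list(Phylo.parse(StringIO(NEWICK + "\n(E:1.0,F:1.0);"), "newick"))

# Basic queries on the tree.
leaf_names = [leaf.name for leaf in tree.get_terminals()]
n_leaves = tree.count_terminals()
total_length = tree.total_branch_length()
bifurcating = tree.is_bifurcating()

# Targets can be given as attribute dictionaries that match clades by name.
mrca_ab = tree.common_ancestor({"name": "A"}, {"name": "B"})
dist_ab = tree.distance({"name": "A"}, {"name": "B"})
dist_ac = tree.distance({"name": "A"}, {"name": "C"})

# write() accepts a single tree or a sequence of trees and returns the
# number of trees written.
buffer = StringIO()
n_written = Phylo.write(tree, buffer, "newick")
round_trip = Phylo.read(StringIO(buffer.getvalue()), "newick")

print(leaf_names, n_leaves, total_length, dist_ab, dist_ac)
```

File names can be passed in place of the `StringIO` handles; the in-memory handles are used here only to keep the sketch self-contained.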
Figure 1. Tree visualization. An example tree with the code to generate each plot shown below each plot. First, a phyloXML tree of the Apaf-1 protein family [27] is downloaded, read by Bio.Phylo, and plotted with default settings. The tree is then rerooted at the midpoint of its two most divergent tips and ladderized such that sibling clades with a larger number of descendents are listed first. The clade of genes belonging to vertebrate species is identified as the common ancestor of the human and zebrafish, after inspection of the original tree. The vertebrate clade is highlighted with the color fuchsia and an increased branch width, and the rest of the tree is colored gray. Finally, the tree is plotted again
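The figure's own code is not reproduced in this text, so the following is only a sketch of the steps the caption describes. It makes several assumptions: a small in-memory Newick tree with hypothetical taxon names (`human`, `zebrafish`, `fly`, `worm`) stands in for the Apaf-1 phyloXML file, the branch width of 3 is an arbitrary choice, and `Phylo.draw_ascii` replaces the graphical plot so that matplotlib is not required.

```python
from io import StringIO

from Bio import Phylo

# Stand-in tree with hypothetical taxa; the figure uses a downloaded
# phyloXML file of the Apaf-1 family instead.
NEWICK = "(((human:0.1,zebrafish:0.3):0.2,fly:0.6):0.1,worm:0.9);"
tree = Phylo.read(StringIO(NEWICK), "newick")

# Reroot at the midpoint between the two most distant terminals.
tree.root_at_midpoint()

# Ladderize so that sibling clades with more terminals are listed first.
tree.ladderize(reverse=True)

# Identify the vertebrate clade as the common ancestor of two known members.
vertebrata = tree.common_ancestor({"name": "human"}, {"name": "zebrafish"})

# Color the whole tree gray, then override the color and width of the
# vertebrate clade. Drawing functions apply a clade's color to its descendants.
tree.root.color = "gray"
vertebrata.color = "fuchsia"
vertebrata.width = 3  # arbitrary width chosen for this sketch

# Text rendering; colors are not shown here. With matplotlib installed,
# Phylo.draw(tree) would produce a graphical plot that honors them.
Phylo.draw_ascii(tree)
```

Setting the root color before the clade color mirrors the order in the caption, although the result should not depend on that order because each clade stores its own color.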
Performance
| Task | Input | Python 2.7.3 | Python 3.2.3 | PyPy 1.9 |
| Read a very large Newick tree | Smith 2011 angiosperm supertree (55473 terminal nodes) | 17.45 | 16.85 | 1.214 |
| Read the same large tree in phyloXML | Smith 2011, converted to phyloXML with | 3.805 | 4.318 | 3.937 |
| Write the same large tree as Newick | Smith 2011 | 0.5238 | 0.7704 | 0.4378 |
| Write the same large tree as phyloXML | Smith 2011 | 10.39 | 10.85 | 24.17 |
| Read a medium-sized Newick tree | Davies 2004 angiosperm supertree (440 terminal nodes) | 0.1097 | 0.1087 | 0.007312 |
| Parse many Newick trees | Davies 2004, copies rerooted at each node (816 trees) | 84.91 | 84.29 | 6.812 |
| Reroot at each node | Davies 2004 | 1.347 | 1.167 | 0.3450 |
| Collapse all splits with bootstrap values less than 50 | Davies 2004 | 2.266 | 2.312 | 2.411 |
| Total branch length | Davies 2004 | 0.01322 | 0.01310 | 0.01448 |
| Ladderize the tree | Davies 2004 | 0.1274 | 0.1190 | 0.1127 |
| Count terminal nodes | Davies 2004 | 0.006838 | 0.006323 | 0.005914 |
Performance of Bio.Phylo functions and tree methods under different Python versions on several benchmark tasks. Reported execution times are the median of 101 replications of each task, in seconds (Additional file 2). Benchmarks were evaluated with Python versions 2.7.3 and 3.2.3 and PyPy version 1.9 on an Intel Xeon E5405 2.00 GHz processor with 8 GB memory, running under 64-bit Ubuntu Linux 12.04 with Biopython 1.60 installed.
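The benchmark scripts themselves are in Additional file 2 and are not shown here. As a minimal sketch of the measurement the caption describes (the median over 101 replications of a task), one possible approach uses only the Python standard library. The helper name `median_runtime` and the placeholder workload are hypothetical and are not taken from the paper.

```python
import statistics
import time


def median_runtime(task, replications=101):
    """Return the median wall-clock time, in seconds, of repeated calls to task()."""
    if replications < 1:
        raise ValueError("replications must be at least 1")
    timings = []
    for _ in range(replications):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def stand_in_task():
    # Placeholder workload. A real benchmark would call a tree operation
    # here instead, such as counting the terminals of a parsed tree.
    return sum(i * i for i in range(10_000))


median_seconds = median_runtime(stand_in_task)
print(f"median of 101 runs: {median_seconds:.6f} s")
```

An odd number of replications means the median is always one of the measured times rather than an average of two, and the median is less sensitive than the mean to occasional slow runs caused by other system activity.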