Cristian Román Palacios, April Wright, Josef Uyeda.
Abstract
The number of terminals in phylogenetic trees has increased significantly over the last decade. This trend reflects recent advances in next-generation sequencing, the accessibility of public data repositories, and the increased use of phylogenies in many fields. Although R is central to the analysis of phylogenetic data, manipulation of phylogenetic comparative datasets remains slow, complex, and poorly reproducible. Here, we describe treedata.table, the first R package to extend the functionality and syntax of data.table to deal explicitly with phylogenetic comparative datasets. treedata.table significantly increases speed and reproducibility during the data manipulation steps of the phylogenetic comparative workflow in R. The latest release of treedata.table is available through CRAN (https://cran.r-project.org/web/packages/treedata.table/). Additional documentation can be accessed through rOpenSci (https://ropensci.github.io/treedata.table/).
©2021 Román Palacios et al.
Keywords: Data.table; Evolution; Phylogenetic comparative analyses; Phylogenetics; R Package
Year: 2021 PMID: 34900417 PMCID: PMC8628621 DOI: 10.7717/peerj.12450
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Temporal change in phylogenetic tree sizes between 1978 and 2020 based on 927 publications for different animal and plant groups.
We used LOESS smoothing to depict the trend in tree size over time. Data were retrieved from the Open Tree of Life (Redelings et al., 2019) using the rotl R package (Michonneau, Brown & Winter, 2016). A linear regression that accounted for lineage identity indicated a significant increase in tree size over time (R² = 0.2077, p < 0.001).
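The kind of trend analysis described in this caption can be sketched in base R. This is only an illustration under stated assumptions: the data below are simulated (they are not the 927 Open Tree of Life studies), the column names `year`, `lineage`, and `log_tips` are hypothetical, and modelling log tree size as a function of year plus a lineage factor is one plausible reading of "accounted for lineage identity", not necessarily the authors' exact model.

```r
# Synthetic stand-in for study-level data; NOT the Open Tree of Life records.
set.seed(1)
n <- 200
trees <- data.frame(
  year    = sample(1978:2020, n, replace = TRUE),
  lineage = factor(sample(c("animals", "plants"), n, replace = TRUE))
)
# Simulated log tree sizes that grow with publication year (illustrative only).
trees$log_tips <- 2 + 0.05 * (trees$year - 1978) + rnorm(n, sd = 1)

# LOESS smoother for the temporal trend, evaluated across the observed years.
lo <- loess(log_tips ~ year, data = trees)
grid <- data.frame(year = seq(min(trees$year), max(trees$year)))
grid$trend <- predict(lo, newdata = grid)

# Linear regression with lineage as a covariate (assumed model form).
fit <- lm(log_tips ~ year + lineage, data = trees)
year_slope <- coef(fit)[["year"]]
year_p <- summary(fit)$coefficients["year", "Pr(>|t|)"]
```

Here `grid$trend` holds the smoothed curve that would be drawn over the raw points, while `year_slope` and `year_p` summarize the direction and significance of the change over time after adjusting for lineage.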
Brief descriptions of the functions implemented in treedata.table.
We list functions under eight different categories and provide a brief outline of their main uses.
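| Category | Function | Description |
|---|---|---|
| Tree/data matching | as.treedata.table | Initial step of the workflow in treedata.table |
| Drop taxa from treedata.table objects | droptreedata.table | Drops taxa from a treedata.table object |
| Data manipulation | [ | Performs data.table operations on an object of class treedata.table |
| Data extraction | [[ | Extracts a named vector from an object of class treedata.table |
| | extractVector | Returns a named vector from a treedata.table object |
| | pulltreedata.table | Returns a character matrix or tree(s) from a treedata.table object |
| Run functions from other packages | tdt | Runs a function on a treedata.table object |
| Detect character type | detectCharacterType | Detects whether a character is continuous or discrete |
| | detectAllCharacters | Applies detectCharacterType over an entire character matrix |
| | filterMatrix | Filters a matrix, returning either all continuous or all discrete characters |
| Examine treedata.table objects | summary | Summarizes treedata.table objects |
| | print | Print method |
| | head, tail | Returns the first or last part of a treedata.table object |
| Inspect column/row names | hasNames | Row and column name check |
| | forceNames | Force names for rows, columns or both |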
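The functions listed above can be combined into a short workflow. The following is a minimal sketch, assuming the `anolis` example dataset bundled with treedata.table (a tree in `anolis$phy` and a trait table in `anolis$dat`); the object names `td`, `cuba`, `svl`, and `td_small` are illustrative only, and the taxa dropped in step 4 are an arbitrary choice.

```r
library(ape)
library(treedata.table)

# Example data shipped with treedata.table: a tree and a trait data.frame.
data(anolis)

# 1. Match the tree to the character matrix. By default the column holding
#    the tip labels is detected automatically.
td <- as.treedata.table(tree = anolis$phy, data = anolis$dat)

# 2. data.table-style manipulation with `[`: filter rows and select columns.
#    The tree is pruned so that it keeps matching the remaining rows.
cuba <- td[island == "Cuba", .(ecomorph, island, SVL)]

# 3. Extract a single trait as a vector named by tip label (two equivalent routes).
svl  <- td[["SVL"]]
svl2 <- extractVector(td, "SVL")

# 4. Drop taxa from the tree and the data in one step.
td_small <- droptreedata.table(
  tdObject = td,
  taxa = c("chamaeleonides", "eugenegrahami")
)

# 5. Pull the matched components back out for use elsewhere.
dat <- pulltreedata.table(td, type = "dat")
phy <- pulltreedata.table(td, type = "phy")
```

Because every step returns either a treedata.table object or a named vector, the tree and the trait data stay aligned throughout; functions such as `tdt` and `detectAllCharacters` from the table above can then be applied to the same object.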
Functions in different R packages (including treedata.table) that perform similar operations on matched tree/data objects.
| Package | Function | Tree/data-matched object manipulation | Reference |
|---|---|---|---|
| treedata.table | … | … | This study |
| … | … | Not supported | … |
Figure 2. Results for the treedata.table microbenchmark during tree/data matching steps.
Timing estimates for the tree/data matching steps under treedata.table are shown relative to treeplyr. We show the median and lower/upper quartile times for each package.
Figure 3. Results for the treedata.table microbenchmark during data manipulation.
We compare the performance of treedata.table against data.table, base R, treeplyr, and dplyr. We show the median and lower/upper quartile times for each package.