Stilianos Louca (1,2), Michael Doebeli (1,2,3). 1. Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada. 2. Department of Zoology, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada. 3. Department of Mathematics, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
Abstract
Motivation: Biodiversity databases now comprise hundreds of thousands of sequences and trait records. For example, the Open Tree of Life includes over 1 491 000 metazoan and over 300 000 bacterial taxa. These data provide unique opportunities for analysis of phylogenetic trait distribution and reconstruction of ancestral biodiversity. However, existing tools for comparative phylogenetics scale poorly to such large trees, to the point of being almost unusable.
Results: Here we present a new R package, named 'castor', for comparative phylogenetics on large trees comprising millions of tips. On large trees castor is often 100-1000 times faster than existing tools.
Availability and implementation: The castor source code, compiled binaries, documentation and usage examples are freely available at the Comprehensive R Archive Network (CRAN).
Contact: louca.research@gmail.com.
Supplementary information: Supplementary data are available at Bioinformatics online.