Jacob L Steenwyk (1), Thomas J Buida (2), Abigail L Labella (1), Yuanning Li (1), Xing-Xing Shen (3), Antonis Rokas (1)
1. Department of Biological Sciences, Vanderbilt University, VU Station B #35-1634, Nashville, TN, 37235, United States of America.
2. 9 City Place #312, Nashville, TN, 37209, United States of America.
3. Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Institute of Insect Sciences, Zhejiang University, Hangzhou, 310058, China.
Abstract
MOTIVATION: Diverse disciplines in biology process and analyze multiple sequence alignments (MSAs) and phylogenetic trees to evaluate their information content, infer evolutionary events and processes, and predict gene function. However, automated processing of MSAs and trees remains a challenge due to the lack of a unified toolkit. To fill this gap, we introduce PhyKIT, a toolkit for the UNIX shell environment with 30 functions that process MSAs and trees, including but not limited to estimation of mutation rate, evaluation of sequence composition biases, calculation of the degree of violation of a molecular clock, and collapsing bipartitions (internal branches) with low support.
RESULTS: To demonstrate the utility of PhyKIT, we detail three use cases: (1) summarizing information content in MSAs and phylogenetic trees to diagnose potential biases in sequence or tree data; (2) evaluating gene-gene covariation of evolutionary rates to identify functional relationships, including novel ones, among genes; and (3) identifying lack-of-resolution events, or polytomies, in phylogenetic trees, which are suggestive of rapid radiation events or lack of data. We anticipate PhyKIT will be useful for processing, examining, and deriving biological meaning from increasingly large phylogenomic datasets.
AVAILABILITY: PhyKIT is freely available on GitHub (https://github.com/JLSteenwyk/PhyKIT), PyPI (https://pypi.org/project/phykit/), and the Anaconda Cloud (https://anaconda.org/JLSteenwyk/phykit) under the MIT license, with extensive documentation and user tutorials (https://jlsteenwyk.com/PhyKIT).
SUPPLEMENTARY INFORMATION: Supplementary data are available on figshare (doi: 10.6084/m9.figshare.13118600) and at Bioinformatics online.