John Yin (1), Chao Zhang (2), Siavash Mirarab (3)
(1) Department of Mathematics, University of California at San Diego, La Jolla, CA, USA.
(2) Bioinformatics and Systems Biology, University of California at San Diego, La Jolla, CA, USA.
(3) Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA, USA.
Abstract
MOTIVATION: Evolutionary histories can change from one part of the genome to another. The potential for discordance among gene trees has motivated the development of summary methods that reconstruct a species tree from an input collection of gene trees. ASTRAL is a widely used summary method and has been able to scale to relatively large datasets. However, the size of genomic datasets is growing quickly. Despite its relative efficiency, the current single-threaded implementation of ASTRAL is falling behind data growth trends and is not able to analyze the largest available datasets in a reasonable time.
RESULTS: ASTRAL uses dynamic programming and is not trivially parallel. In this paper, we introduce ASTRAL-MP, the first version of ASTRAL that can exploit parallelism; it also uses randomization techniques to speed up some of its steps. Importantly, ASTRAL-MP can take advantage of not just multiple CPU cores but also one or several graphics processing units (GPUs). The ASTRAL-MP code scales very well with increasing CPU cores, and its GPU version, implemented in OpenCL, can achieve speedups of up to 158× compared to ASTRAL-III. Using GPUs and multiple cores, ASTRAL-MP is able to analyze datasets with 10 000 species or datasets with more than 100 000 genes in less than 2 days.
AVAILABILITY AND IMPLEMENTATION: ASTRAL-MP is available at https://github.com/smirarab/ASTRAL/tree/MP.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.