
SwiftOrtho: A fast, memory-efficient, multiple genome orthology classifier.

Xiao Hu, Iddo Friedberg.

Abstract

BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require large amounts of memory and CPU time, resources typically available only on computing clusters.
FINDINGS: Here we present SwiftOrtho, a new graph-based orthology analysis tool optimized for speed and memory usage on large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity caused by long k-mers. In addition, it uses an affinity propagation algorithm to reduce memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB of RAM. Using several standard orthology data sets, we also show that SwiftOrtho is highly accurate.
CONCLUSIONS: SwiftOrtho enables accurate comparative genomic analysis of thousands of genomes on low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
© The Author(s) 2019. Published by Oxford University Press.
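The FINDINGS paragraph describes a homology search that relies on k-mers over a reduced amino acid alphabet, combined with spaced seeds. The Python below is a minimal sketch of that general idea. It collapses similar residues into one letter and keeps only the "care" positions of a spaced-seed window as an index key. It then reports protein pairs that share at least one key. The residue grouping REDUCED_GROUPS, the seed pattern SEED, and every function name here are hypothetical illustrations. They are not SwiftOrtho's actual alphabet, seeds, or code, and the seed is kept short so the example stays small.

```python
"""Sketch: candidate homolog pairs from spaced-seed keys over a reduced alphabet.

Hypothetical parameters (NOT SwiftOrtho's actual settings):
- REDUCED_GROUPS: illustrative groups of similar residues.
- SEED: spaced-seed pattern; '1' = position used in the key, '0' = ignored.
"""
from collections import defaultdict
from itertools import combinations

# Illustrative residue groups; every residue maps to the first letter of its group.
REDUCED_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]
REDUCE = {aa: grp[0] for grp in REDUCED_GROUPS for aa in grp}

SEED = "1101011"  # hypothetical spaced seed: window span 7, weight 5


def reduce_seq(seq):
    """Map each residue to its group representative; unknown residues become 'X'."""
    return "".join(REDUCE.get(aa, "X") for aa in seq.upper())


def spaced_kmers(seq, seed=SEED):
    """Yield (offset, key): key keeps only the '1' positions of each seed window."""
    care = [i for i, c in enumerate(seed) if c == "1"]
    r = reduce_seq(seq)
    for start in range(len(r) - len(seed) + 1):
        yield start, "".join(r[start + i] for i in care)


def candidate_pairs(proteins, seed=SEED):
    """Return sorted protein-ID pairs that share at least one spaced-seed key."""
    index = defaultdict(set)  # key -> IDs of proteins containing that key
    for pid, seq in proteins.items():
        for _, key in spaced_kmers(seq, seed):
            index[key].add(pid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


if __name__ == "__main__":
    demo = {"a": "MKVLITGA", "b": "MRILVSGA", "c": "PPPPPPPP"}
    print(candidate_pairs(demo))
```

In this toy input, "MKVLITGA" and "MRILVSGA" differ at four of eight positions. Both still reduce to "LKLLLSGA", so they share seed keys and are reported as a candidate pair. Substitutions that fall on a seed's '0' positions are also tolerated. In a full pipeline, candidate pairs would then be scored by alignment, a step this sketch omits. The offsets yielded by `spaced_kmers` are where such an alignment could be anchored.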

Keywords:  clustering; homology search; orthologs; orthology analysis; orthology inference; paralogs

Year: 2019; PMID: 31648300; PMCID: PMC6812468; DOI: 10.1093/gigascience/giz118

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524

