Bruno T L Nichio, Jeroniza Nunes Marchaukoski, Roberto Tadeu Raittz.
Abstract
Defining homology relationships among sequences is now essential for biological research. Within homology, the analysis of orthologous sequences is of great importance for computational biology, genome annotation, and phylogenetic inference. Since 2007, as the number of new sequences deposited in large biological databases has grown, researchers have begun to evaluate computational methodologies and tools in order to select the most promising ones for predicting orthologous groups. The literature in this field describes the problems that most available tools show: limited accuracy, long analysis time (a growing concern as the volume of submitted data increases and demands faster techniques), and a lack of automation, so that the process still requires manual intervention. Searching BMC, Google Scholar, NCBI PubMed, and Expasy, we examined more than 600 articles in pursuit of the most recent techniques and tools developed to solve most of the problems still existing in orthology detection. We list the main computational tools developed between 2011 and 2017, taking into account differences in the type of orthology analysis, outlining the main features of each tool, and pointing to the problems that each one tries to address. We also observed that several tools still use the BLAST "all-against-all" methodology as their main algorithm, which entails limitations such as a limited number of queries, computational cost, and long processing time.
However, promising new tools are being developed, such as OrthoVenn (which uses Venn diagrams to show the relationships among the ortholog groups generated by its algorithm); proteinOrtho (which improves the accuracy of ortholog groups); ReMark (which integrates the pipeline to make the input process automatic); OrthAgogue (which uses algorithms designed to minimize processing time); and proteinOrtho (developed to deal with large amounts of biological data). We compared the main features of four tools and tested them using four prokaryotic genomes. We hope that this review will be useful to researchers and will help them select the most appropriate tool for their work in the field of orthology.
Keywords: bioinformatics; comparative analysis; genomic dynamics; orthology prediction; phylogeny
Year: 2017 PMID: 29163633 PMCID: PMC5674930 DOI: 10.3389/fgene.2017.00165
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Growth in the number of citations of the new orthology tools from 2011 to 2017. The use and development of new software for predicting ortholog groups increased from 2011 to 2017. The blue line represents the citations of emerging orthology inference tools in each year, and the red line shows the increasing trend.
Figure 2. Overview of the tools and their main features for orthology prediction. The tools are subdivided into four categories according to their characteristics, to better show their main applicability in orthology prediction: better accuracy, higher detection speed, large data sets, or process automation (pipelines).
Features of software tools for ortholog studies from 2011 to 2017.
| Tool | Main feature | Platform | Dependencies | Year |
| --- | --- | --- | --- | --- |
| Hieranoid | Combines an efficient graph-based methodology with aspects of compute-intensive tree-based methods to infer orthology | Linux/Unix; web server | Perl, BioPerl, BLAST, Muscle, and Kalign | 2013 |
| MorFeus | Calculates a network score for the resulting orthology network to find remotely conserved orthologous proteins | Linux/Unix; web server | Python, Biopython, networkx, gnuplot, and BLAST+ | 2014 |
| OrthAgogue | High-speed estimation of homology relationships in large data sets | Linux/Unix | BLAST, cmph, and TBB | 2013 |
| OrthoInspector | Fast detection of orthology and in-paralogy using a unique algorithm | Cross-platform (Java) | BLASTP+ package, Java, MySQL | 2011 |
| OrthoFinder | Solves fundamental biases in whole-genome comparisons and improves the accuracy of ortholog group inference | Linux/Unix | BLASTP+ package, Python, MCL graph clustering algorithm, MAFFT, and FastTree | 2015 |
| Ortholog-Finder | Identifies genuine orthologs among distantly related species by phylogenetic analysis using ORF data | Linux/Unix | BLAST+, MAFFT, BioPerl, OrthoMCL, Java, ClustalW | 2016 |
| Orthograph | Developed for large data sets, maintaining high sensitivity and accuracy with BRH approaches | Linux/Mac OS X | BLAST+, Perl, MySQL, MAFFT, SWIPE | 2017 |
| Orthonome | Designed to boost the accuracy of multi-species ortholog predictions and reduce the trade-off with ortholog capture rates | Web server | Linux Bash, C++, Perl, and Python packages | 2017 |
| OrthoVenn | Shows the relationships of orthologous clusters across multiple species using Venn diagrams | Web server | BLAST and MCL algorithms | 2015 |
| PanOCT | Automates the clustering of orthologs for pan-genomic analysis of bacterial strains and closely related species | Linux/Unix | BLAST+, Perl | 2012 |
| PhosphOrtholog | Developed for cross-species mapping of orthologous protein PTMs | Web server | BLOSUM62, comma-separated (.csv) input files | 2015 |
| PorthoDom | Developed to speed up the detection of orthologous proteins using domain sequences | Cross-platform | C program, Python, Pfam database, HMMER | 2015 |
| PorthoMCL | Designed to find orthologs in a large number of genomes | Linux and Unix (OS X) | BLAST, Perl, Python, MCL | 2017 |
| ProteinOrtho | Deals with hundreds of bacterial species in sets containing millions of proteins while using little memory | Linux/Unix 64 bits | BLAST+, Perl, and Python | 2011 |
| ReMark | Identifies orthologs automatically, with parameters adjusted according to the user's interest | Cross-platform (Java) | Recursive and Markov clustering (MCL) algorithms, Reciprocal BLAST Best Hits (RBBH), Java | 2011 |
| SPOCS | Graph-based ortholog prediction that generates a table for visualizing the relationships between ortholog groups | Web server; Linux/Mac OS X | BLAST+, C++ libraries | 2013 |
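Several of the tools listed above build on reciprocal best BLAST hits (named RBBH for ReMark and BRH for Orthograph). The idea can be illustrated with a minimal Python sketch. It assumes the hits are already available as `(query, subject, bitscore)` tuples for each search direction; the function names and the toy data are hypothetical, and this is not the implementation used by any of the listed tools.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    On ties, the first hit seen is kept.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data, invented for illustration: genome A searched against B, and B against A.
a_vs_b = [("a1", "b1", 250.0), ("a1", "b2", 90.0),
          ("a2", "b2", 180.0), ("a3", "b1", 120.0)]
b_vs_a = [("b1", "a1", 245.0), ("b2", "a2", 175.0), ("b2", "a1", 88.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)  # [('a1', 'b1'), ('a2', 'b2')]
```

Here `a3` is left unpaired: its best hit is `b1`, but the best hit of `b1` is `a1`. Real tools add further steps, such as score thresholds, in-paralog handling, or clustering, that this sketch leaves out.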
Usability of the software tools.
| Tool | Strengths | Limitations |
| --- | --- | --- |
| Hieranoid | Uses a hierarchical approach to ortholog inference based on the InParanoid method. Can use BLAST or, optionally, Usearch. | Needs many dependencies to run locally: Perl, BioPerl, HHSearch, and BLAST or Usearch. The web server limits the number of queries. |
| MorFeus | Uses symmetrical best hits and orthology network scoring to detect remotely conserved orthologs. Available as a web server and locally. | Needs many dependencies, such as BLAST, to run locally, and has not been updated since 2014. |
| OrthAgogue | A multithreaded application for massive data sets with high-speed estimation of homology relations. | Requires as input a tabular file generated by BLAST, so the BLAST package is necessary. |
| OrthoInspector | Incorporates an original algorithm, facilitates data querying, and automates the process. | Requires creating a PostgreSQL or MySQL database, and depends on BLAST. |
| OrthoFinder | Simple command that takes multi-FASTA files as input (one per species); minimizes a previously undetected gene length bias in orthogroup inference. | Needs many dependencies to run, including BLAST and MCL. |
| Ortholog-Finder | Identifies genuine orthologs among distant species using HGF filters for phylogenetic analysis. | Needs many dependencies, and does not support maximum likelihood or Bayesian methods. |
| Orthonome | The algorithm provides a superior combination of ortholog capture rate and accuracy on draft or complete drosophilid genomes. | Limited number of queries on the web server; performance depends on the quality of the assemblies and annotations. |
| OrthoVenn | Interactive Venn diagram visualization of the generated clusters. Provides Gene Ontology information on the function of each protein. | Web server only; limited number of queries. |
| Orthograph | An easy-to-install program that facilitates comparative analyses of transcriptomic and other coding sequence data. | Needs many dependencies to run, such as BLAST, MySQL, MAFFT, HMMER, and SWIPE. |
| PanOCT | Designed for prokaryotes; identifies ortholog and co-ortholog relationships. | Depends on Perl packages and BLAST+, and is limited to analyses of up to 25 genomes. |
| PhosphOrtholog | Maps post-translational modifications (PTMs) between orthologous proteins across species. Uses UniProt/Swiss-Prot as the reference database. | Web only; requires comma-separated input files and is restricted to the human, mouse, rat, and fly proteomes. |
| PorthoDom | Uses protein domains to speed up proteinOrtho. Uses Pfam annotation for accuracy. | Somewhat laborious to run; needs many packages, the Pfam database, and the HMMER package, in addition to the ProteinOrtho tool. |
| PorthoMCL | Identifies orthologs in a large number of genomes. | Although fast and easy to use, it requires BLAST, Perl, and Python. |
| ProteinOrtho | Reduces the memory needed to build ortholog groups; finds co-orthologs in large databases containing different species. | Depends on BLAST+, Perl, and Python to run. |
| ReMark | Makes the process more automated, with parameters adjusted according to the user's interest. | Has not been updated since March 2011. Needs BLAST and Java. |
| SPOCS | Flexible; automates ortholog detection without the need for multiple steps. | Needs Boost C++ and BLAST installed beforehand; generates text or HTML output. |
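Most of the tools above depend on BLAST output, and OrthAgogue in particular takes a BLAST-generated tabular file as input. As a rough illustration of what such a file contains, the following Python sketch reads the default 12-column tabular layout produced by BLAST+ with `-outfmt 6`. The function name, the e-value cutoff of `1e-5`, and the example rows are assumptions made for illustration; they are not taken from any of the reviewed tools.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast_tabular(handle, max_evalue=1e-5):
    """Return (query, subject, bitscore) tuples from tabular BLAST output.

    Self-hits and hits with an e-value above `max_evalue` (an assumed,
    illustrative cutoff) are skipped.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST_COLUMNS, row))
        if record["qseqid"] == record["sseqid"]:
            continue
        if float(record["evalue"]) > max_evalue:
            continue
        hits.append((record["qseqid"], record["sseqid"], float(record["bitscore"])))
    return hits


# Invented example rows: one good hit, one self-hit, one weak hit.
example = (
    "a1\tb1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t390\n"
    "a1\ta1\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-130\t410\n"
    "a2\tb2\t35.0\t60\t39\t0\t5\t64\t9\t68\t0.5\t28.1\n"
)
hits = read_blast_tabular(io.StringIO(example))
print(hits)  # [('a1', 'b1', 390.0)]
```

In an all-against-all run, a file of this kind holds the hits for every pair of proteomes, which is one reason the processing time noted in the abstract grows quickly with the number of genomes.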