
iHam and pyHam: visualizing and processing hierarchical orthologous groups.

Clément-Marie Train, Miguel Pignatelli, Adrian Altenhoff, Christophe Dessimoz.

Abstract

SUMMARY: The evolutionary history of gene families can be complex due to duplications and losses. This complexity is compounded by the large number of genomes simultaneously considered in contemporary comparative genomic analyses. As provided by several orthology databases, hierarchical orthologous groups (HOGs) are sets of genes that are inferred to have descended from a common ancestral gene within a species clade. This implies that the set of HOGs defined for a particular clade corresponds to the ancestral genes found in its last common ancestor. Furthermore, by keeping track of HOG composition along the species tree, it is possible to infer the emergence, duplications and losses of genes within a gene family of interest. However, the lack of tools to manipulate and analyse HOGs has made it difficult to extract, display and interpret this type of information. To address this, we introduce iHam (interactive HOG analysis method), an interactive JavaScript widget to visualize and explore gene family history encoded in HOGs, and pyHam (python HOG analysis method), a Python library for programmatic processing of gene families. These complementary open source tools greatly ease adoption of HOGs as a scalable and interpretable concept to relate genes across multiple species.
AVAILABILITY AND IMPLEMENTATION: iHam's code is available at https://github.com/DessimozLab/iHam or can be loaded dynamically. pyHam's code is available at https://github.com/DessimozLab/pyHam or via the pip package 'pyham'.
© The Author(s) 2018. Published by Oxford University Press.

Year:  2019        PMID: 30508066      PMCID: PMC6612847          DOI: 10.1093/bioinformatics/bty994

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The evolution of a gene family describes the history of all the genes that share a common ancestral gene. These genes, called homologs, are classified as orthologs if they started diverging through speciation and as paralogs if they started diverging through duplication (Fitch, 1970). In comparative genomics, gene families are a fundamental resource: they capture the links between organisms from a gene-centric perspective and allow us to understand how genes and genomes have evolved over time. The evolutionary history of gene families can be studied by visualizing reconciled gene trees, using web-based resources such as Ensembl (Herrero et al.), HOGENOM/HOVERGEN (Dufayard et al.), EggNOG (Huerta-Cepas et al.), PhylomeDB (Huerta-Cepas et al.) or tools such as ETE (Huerta-Cepas et al.) and SylvX (Chevenet et al.). However, when considering large families across many species, reconciled gene trees can become prohibitively complex to infer and interpret.
As a scalable alternative to reconciled gene trees, the concept of Hierarchical Orthologous Groups (HOGs) is increasingly adopted. HOGs generalize Fitch’s definition of orthology to more than two species, by grouping sequences that have descended from a common ancestral gene within a clade of interest. Thus, the set of all HOGs defined for a given clade corresponds to the set of ancestral genes in the common ancestor of that clade. Furthermore, if HOGs are available for nested clades (e.g. vertebrates versus mammals), the differences between their HOG repertoires imply gene duplication and loss events on the branch separating them: a HOG split implies a duplication, while a HOG disappearance implies a loss. HOGs are inferred by several leading orthology databases such as OrthoDB (Zdobnov et al., 2017), EggNOG (Huerta-Cepas et al.), HieranoiDB (Kaduk et al.) or OMA (Altenhoff et al.). In OMA, for instance, some HOGs connect large gene families of over 100 000 members across thousands of genomes.
Because of this complexity, manual exploration of gene families encoded in HOGs can be challenging. As of now, there is a lack of tools for visualizing, exploring and processing HOGs to tackle specific biological questions. In this application note, we introduce two tools to facilitate the visualization and analysis of HOGs: interactive HOG analysis method (iHam) for web-based interactive visualization and exploration of individual HOGs and python HOG analysis method (pyHam) to perform aggregate analyses.
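
The rule stated above (a HOG split implies a duplication, a HOG disappearance implies a loss) can be illustrated with a minimal Python sketch. It does not use iHam or pyHam; the gene names, HOG labels and the `classify_branch` helper are hypothetical, and each HOG is simply modelled as the set of extant genes it contains.

```python
from typing import Dict, FrozenSet

# Hypothetical toy family: every HOG is the set of extant genes it contains.
VERTEBRATE_HOGS: Dict[str, FrozenSet[str]] = {
    "V1": frozenset({"human_a1", "human_a2", "mouse_a1", "mouse_a2", "fish_a"}),
    "V2": frozenset({"fish_b"}),
}
MAMMAL_HOGS: Dict[str, FrozenSet[str]] = {
    "M1": frozenset({"human_a1", "mouse_a1"}),
    "M2": frozenset({"human_a2", "mouse_a2"}),
}


def classify_branch(
    ancestral: Dict[str, FrozenSet[str]],
    descendant: Dict[str, FrozenSet[str]],
) -> Dict[str, str]:
    """Label each ancestral HOG by its fate on the branch to the nested clade."""
    events = {}
    for name, genes in ancestral.items():
        # Descendant HOGs nested inside this ancestral HOG.
        nested = [d for d, members in descendant.items() if members <= genes]
        if not nested:
            events[name] = "lost"        # no member left in the nested clade
        elif len(nested) == 1:
            events[name] = "retained"    # one-to-one correspondence
        else:
            events[name] = "duplicated"  # the HOG split into several HOGs
    return events


print(classify_branch(VERTEBRATE_HOGS, MAMMAL_HOGS))
# {'V1': 'duplicated', 'V2': 'lost'}
```

Here "lost" refers only to the branch leading to mammals: the gene `fish_b` still exists outside that clade.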

2 iHam

iHam is an interactive JavaScript tool to visualize the evolutionary history of a specific gene family encoded in HOGs. The viewer is composed of two panels (Fig. 1A): a species tree which lets the user select a node to focus on a particular taxonomic range of interest, and a matrix that organizes extant genes according to their membership in species (rows) and HOGs (columns). The tree-guided matrix representation of HOGs makes it easier to: (i) delineate orthologous groups at given taxonomic ranges, (ii) infer duplication and loss events in the species tree, (iii) gauge the cumulative effect of duplications and losses on gene repertoires and (iv) identify potential mistakes in genome assembly, annotation or orthology inference (e.g. losses concentrated on terminal branches suggest incomplete genomes, and implausible species coverage within a HOG suggests an orthology inference error).
Fig. 1.

(A) iHam. An excerpt of the Tetraspanin family at the Haplorrhini level: the tree depicts relationships between species, squares depict genes and HOGs are delineated by vertical bars. (B) pyHam can be used to map gene losses, duplications or new appearances (‘gained’) onto species trees (here, using the NCBI taxonomy tree)

Users can customize the view in different ways: genes can be colored according to protein length or GC-content, low-confidence HOGs can be masked and irrelevant species clades can be collapsed. iHam is a reusable web widget that can be easily embedded into a website; for instance, it is used to display HOGs in OMA (http://omabrowser.org; Altenhoff et al.). Implemented as a JavaScript library using the TnT framework (Pignatelli, 2016), iHam requires as input only HOGs in the standard OrthoXML format (Schmitt et al.) and the underlying species tree in Newick or PhyloXML format (supported resources listed in Table 1).

Table 1. Support for iHam and pyHam by various HOG inference resources

| Resource | Species tree format | OrthoXML | iHam support | pyHam support |
|---|---|---|---|---|
| OMA browser | PhyloXML and Newick | All HOGs, or one HOG at a time | Yes | Yes |
| OMA standalone | PhyloXML and Newick | All HOGs | Yes | Yes |
| Ensembl | Newick | One HOG at a time | Yes | Yes |
| HieranoiDB | Newick | One HOG at a time | Yes | Yes |
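
To make the species-by-HOG matrix described in this section more concrete, the following Python sketch parses a tiny OrthoXML document with the standard library and builds such a matrix for a chosen taxonomic level. It is not iHam's implementation: the `hog_matrix` helper and the toy data are hypothetical, and the sketch assumes that each `orthologGroup` carries a `TaxRange` property naming its level.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{http://orthoXML.org/2011/}"  # OrthoXML namespace

# Hypothetical toy family: one vertebrate HOG that duplicated before mammals.
TOY_ORTHOXML = """
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3" origin="toy" originVersion="1">
  <species name="HUMAN" NCBITaxId="9606"><database name="toy" version="1"><genes>
    <gene id="1" protId="HUMAN_A1"/><gene id="2" protId="HUMAN_A2"/>
  </genes></database></species>
  <species name="MOUSE" NCBITaxId="10090"><database name="toy" version="1"><genes>
    <gene id="3" protId="MOUSE_A1"/><gene id="4" protId="MOUSE_A2"/>
  </genes></database></species>
  <species name="DANRE" NCBITaxId="7955"><database name="toy" version="1"><genes>
    <gene id="5" protId="DANRE_A"/>
  </genes></database></species>
  <groups>
    <orthologGroup id="HOG1">
      <property name="TaxRange" value="Vertebrata"/>
      <geneRef id="5"/>
      <paralogGroup>
        <orthologGroup>
          <property name="TaxRange" value="Mammalia"/>
          <geneRef id="1"/><geneRef id="3"/>
        </orthologGroup>
        <orthologGroup>
          <property name="TaxRange" value="Mammalia"/>
          <geneRef id="2"/><geneRef id="4"/>
        </orthologGroup>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
""".strip()


def hog_matrix(orthoxml_text, level):
    """Return {species: [genes in HOG 1, genes in HOG 2, ...]} at one level."""
    root = ET.fromstring(orthoxml_text)
    # Map every gene id to its species name and protein identifier.
    gene_info = {}
    for species in root.iter(NS + "species"):
        for gene in species.iter(NS + "gene"):
            gene_info[gene.get("id")] = (species.get("name"), gene.get("protId"))
    # One column per orthologGroup labelled with the requested level.
    columns = []
    for group in root.iter(NS + "orthologGroup"):
        ranges = [p.get("value") for p in group.findall(NS + "property")
                  if p.get("name") == "TaxRange"]
        if level not in ranges:
            continue
        column = defaultdict(list)
        for ref in group.iter(NS + "geneRef"):  # includes nested sub-HOGs
            species_name, prot_id = gene_info[ref.get("id")]
            column[species_name].append(prot_id)
        columns.append(dict(column))
    rows = sorted({name for column in columns for name in column})
    return {name: [column.get(name, []) for column in columns] for name in rows}


print(hog_matrix(TOY_ORTHOXML, "Mammalia"))
# {'HUMAN': [['HUMAN_A1'], ['HUMAN_A2']], 'MOUSE': [['MOUSE_A1'], ['MOUSE_A2']]}
```

Selecting "Vertebrata" instead merges both columns into a single HOG that also contains the zebrafish gene, which mirrors how choosing a deeper node in the species tree changes the HOG boundaries in the matrix.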

3 pyHam

pyHam makes it possible to extract useful information from HOGs encoded in standard OrthoXML format. It is available both as a Python library and as a set of command-line scripts. Input HOGs in OrthoXML format are available from multiple bioinformatics resources, including OMA, Ensembl and HieranoiDB (Table 1). The main features of pyHam are: (i) given a clade of interest, extract all the relevant HOGs, each of which ideally corresponds to a distinct ancestral gene in the last common ancestor of the clade; (ii) given a branch on the species tree, report the HOGs that duplicated on the branch, were lost on the branch, first appeared on the branch or were simply retained; (iii) repeat the previous point along the entire species tree and plot an overview of the gene evolution dynamics along the tree (Fig. 1B) and (iv) given a set of nested HOGs for a specific gene family of interest, generate a local iHam web page to visualize its evolutionary history.
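
As a rough illustration of features (i) and (iii), the sketch below models one nested HOG as a small Python data structure, extracts the sub-HOGs defined at a given clade and tallies duplications per branch. It deliberately does not call pyHam's API: the `Hog` class, both helper functions and the toy family are hypothetical, and the sketch assumes that several sub-HOGs at the same level under one parent indicate a duplication on that branch. Detecting losses would additionally require the species tree and is left out.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hog:
    level: str                                            # taxonomic range of this HOG
    children: List["Hog"] = field(default_factory=list)   # nested sub-HOGs
    genes: List[str] = field(default_factory=list)        # genes attached directly


def all_genes(hog):
    """Collect every extant gene below a HOG, including nested sub-HOGs."""
    genes = list(hog.genes)
    for child in hog.children:
        genes.extend(all_genes(child))
    return genes


def hogs_at(hog, level):
    """Feature (i), simplified: gene sets of the sub-HOGs defined at `level`."""
    if hog.level == level:
        return [sorted(all_genes(hog))]
    found = []
    for child in hog.children:
        found.extend(hogs_at(child, level))
    return found


def duplications_per_branch(hog, tally=None):
    """Feature (iii), simplified: count duplications on each (parent, child) branch."""
    tally = Counter() if tally is None else tally
    per_level = Counter(child.level for child in hog.children)
    for child_level, n in per_level.items():
        if n > 1:  # n sub-HOGs at one level need at least n - 1 duplications
            tally[(hog.level, child_level)] += n - 1
    for child in hog.children:
        duplications_per_branch(child, tally)
    return tally


# Hypothetical toy family with two duplications.
family = Hog("Vertebrata", genes=["fish_a"], children=[
    Hog("Mammalia", children=[
        Hog("Primates", genes=["human_a1", "chimp_a1"]),
        Hog("Rodentia", genes=["mouse_a1", "rat_a1"]),
    ]),
    Hog("Mammalia", children=[
        Hog("Primates", genes=["human_a2"]),
        Hog("Primates", genes=["human_a3", "chimp_a3"]),
    ]),
])

print(hogs_at(family, "Mammalia"))
# [['chimp_a1', 'human_a1', 'mouse_a1', 'rat_a1'], ['chimp_a3', 'human_a2', 'human_a3']]
print(dict(duplications_per_branch(family)))
# {('Vertebrata', 'Mammalia'): 1, ('Mammalia', 'Primates'): 1}
```

Each list returned by `hogs_at` stands for one inferred ancestral gene at the chosen level, so the toy family corresponds to two ancestral mammalian genes and three ancestral primate genes.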

4 Conclusion

The combination of iHam and pyHam enables users to unlock the full potential of HOGs.
References

1.  Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases.

Authors:  Jean-François Dufayard; Laurent Duret; Simon Penel; Manolo Gouy; François Rechenmann; Guy Perrière
Journal:  Bioinformatics       Date:  2005-02-15       Impact factor: 6.937

2.  Letter to the editor: SeqXML and OrthoXML: standards for sequence and orthology information.

Authors:  Thomas Schmitt; David N Messina; Fabian Schreiber; Erik L L Sonnhammer
Journal:  Brief Bioinform       Date:  2011-06-11       Impact factor: 11.622

3.  Distinguishing homologous from analogous proteins.

Authors:  W M Fitch
Journal:  Syst Zool       Date:  1970-06

4.  ETE: a python Environment for Tree Exploration.

Authors:  Jaime Huerta-Cepas; Joaquín Dopazo; Toni Gabaldón
Journal:  BMC Bioinformatics       Date:  2010-01-13       Impact factor: 3.169

5.  Ensembl comparative genomics resources.

Authors:  Javier Herrero; Matthieu Muffato; Kathryn Beal; Stephen Fitzgerald; Leo Gordon; Miguel Pignatelli; Albert J Vilella; Stephen M J Searle; Ridwan Amode; Simon Brent; William Spooner; Eugene Kulesha; Andrew Yates; Paul Flicek
Journal:  Database (Oxford)       Date:  2016-05-02       Impact factor: 3.451

6.  TnT: a set of libraries for visualizing trees and track-based annotations for the web.

Authors:  Miguel Pignatelli
Journal:  Bioinformatics       Date:  2016-04-22       Impact factor: 6.937

7.  HieranoiDB: a database of orthologs inferred by Hieranoid.

Authors:  Mateusz Kaduk; Christian Riegler; Oliver Lemp; Erik L L Sonnhammer
Journal:  Nucleic Acids Res       Date:  2016-10-13       Impact factor: 16.971

8.  OrthoDB v9.1: cataloging evolutionary and functional annotations for animal, fungal, plant, archaeal, bacterial and viral orthologs.

Authors:  Evgeny M Zdobnov; Fredrik Tegenfeldt; Dmitry Kuznetsov; Robert M Waterhouse; Felipe A Simão; Panagiotis Ioannidis; Mathieu Seppey; Alexis Loetscher; Evgenia V Kriventseva
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

9.  PhylomeDB v4: zooming into the plurality of evolutionary histories of a genome.

Authors:  Jaime Huerta-Cepas; Salvador Capella-Gutiérrez; Leszek P Pryszcz; Marina Marcet-Houben; Toni Gabaldón
Journal:  Nucleic Acids Res       Date:  2013-11-25       Impact factor: 16.971

10.  The OMA orthology database in 2018: retrieving evolutionary relationships among all domains of life through richer web and programmatic interfaces.

Authors:  Adrian M Altenhoff; Natasha M Glover; Clément-Marie Train; Klara Kaleb; Alex Warwick Vesztrocy; David Dylus; Tarcisio M de Farias; Karina Zile; Charles Stevenson; Jiao Long; Henning Redestig; Gaston H Gonnet; Christophe Dessimoz
Journal:  Nucleic Acids Res       Date:  2018-01-04       Impact factor: 16.971
