Functional Gene Networks: R/Bioc package to generate and analyse gene networks derived from functional enrichment and clustering.

Sara Aibar, Celia Fontanillo, Conrad Droste, Javier De Las Rivas.

Abstract

Functional Gene Networks (FGNet) is an R/Bioconductor package that generates gene networks derived from the results of functional enrichment analysis (FEA) and annotation clustering. The sets of genes enriched with specific biological terms (obtained from a FEA platform) are transformed into a network by establishing links between genes based on common functional annotations and common clusters. The network provides a new view of FEA results, revealing gene modules with similar functions and genes that are related to multiple functions. In addition to building the functional network, FGNet analyses the similarity between the groups of genes and provides a distance heatmap and a bipartite network of functionally overlapping genes. The application includes an interface to directly perform FEA queries using different external tools (DAVID, GeneTerm Linker, TopGO or GAGE) and a graphical interface to facilitate its use.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 25600944      PMCID: PMC4426835          DOI: 10.1093/bioinformatics/btu864

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Due to the increasing number of omic studies, efficient functional analysis of large lists of genes or proteins is essential to understand the biological processes in which they are involved. Functional enrichment analysis (FEA) is the most popular bioinformatic methodology to obtain significant functional information from sets of cooperating genes. FEA methods search biological databases (such as Gene Ontology and KEGG pathways, among others) and use statistical testing to find biological terms and functional annotations that are significantly enriched in a list of genes. However, in most cases these analyses return very long lists of biological terms associated with genes, which are difficult to digest and interpret. Some tools cluster the FEA results, like DAVID-FAC (Huang et al., 2009) and GeneTerm Linker (Fontanillo et al., 2011), but their output is provided as large tables and few tools are available to integrate and visualize these results.

Here we present Functional Gene Networks (FGNet), an R/Bioconductor package that uses FEA results to perform network-based analyses and visualization. The main network is built by establishing links between genes annotated to similar functional terms. In this way, FGNet generates a network representing the links and associations between the clusters of genes and enriched terms. The network summarizes and facilitates the interpretation of the biological processes significantly enriched in the initial list of genes, revealing important information such as the distance and overlap between clusters and the identification of modules and hubs. The tool can also help to disclose new associations among genes cooperating in hidden biological processes that are not yet annotated, which can be revealed by the topology of the functional network.

2 Methods

2.1 Input: functional enrichment and clustering

FGNet builds functional networks based on the groups obtained from clustering gene-term sets (gtsets: genes and terms associated by an enrichment p-value) returned by a FEA. The package includes an interface to run queries with gene lists using four FEA tools: DAVID with Functional Annotation Clustering (which returns clustered gtsets, Cl) (Huang et al., 2009); GAGE (which also provides clusters) (Luo et al., 2009); GeneCodis with GeneTerm Linker (which returns metagroups, Mg) (Fontanillo et al., 2011); and TopGO (which only returns gtsets) (Alexa et al.). The package can also be applied to the results from other FEA tools, as long as the input results are transformed into tables of genes and associated terms.
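
As an illustration of the last point, the following minimal sketch reads a table of genes and associated terms into gene-term sets. The tab-separated layout, the column names (`Cluster`, `Terms`, `Genes`), the separators and the function `read_gtsets` are assumptions made for this example; they are not the input format or the API defined by FGNet.

```python
import csv
import io

# Hypothetical tab-separated export from an enrichment tool: one row per
# gene-term set, with a cluster id, ';'-separated terms and ','-separated genes.
RAW_TABLE = (
    "Cluster\tTerms\tGenes\n"
    "1\ttermA;termB\tGENE1,GENE2\n"
    "2\ttermC\tGENE2,GENE3\n"
)


def read_gtsets(text):
    """Parse the (assumed) table layout into a list of gene-term sets."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "cluster": row["Cluster"],
            "terms": row["Terms"].split(";"),
            "genes": row["Genes"].split(","),
        }
        for row in rows
    ]


gtsets = read_gtsets(RAW_TABLE)
for gtset in gtsets:
    print(gtset["cluster"], gtset["terms"], gtset["genes"])
```

Any tool whose output can be brought into a comparable genes-by-terms table can, in principle, feed the network construction described in the next section.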

2.2 Construction of the functional network

The functional network is built from the analysis of all the gtsets provided by the FEA tool. These sets are used to generate a boolean matrix $M$ of genes by gtsets, in which each element $m_{gs} = 1$ if gene $g$ is in set $s$ (and $0$ otherwise). This membership matrix is then transformed into an $n \times n$ adjacency matrix $A$, where $n$ is the total number of genes and $a_{ij}$ is the number of gtsets $s$ in which a gene pair is included: $a_{ij} = (1 - \delta_{ij}) \sum_{s} m_{is} m_{js}$, where $\delta_{ij}$ is a Kronecker delta ($\delta_{ij} = 1$ if $i = j$, $\delta_{ij} = 0$ if $i \neq j$). This adjacency matrix is used to generate the functional network by establishing a weighted link between each pair of genes $(g_i, g_j)$ in which $a_{ij} > 0$. Finally, the clustering of gtsets provided by the FEA tool is used to generate a second gene adjacency matrix with the number of common clusters/metagroups (Fig. 1A), which is used to define and allocate gene groups. The network produced is provided as an igraph object for further analysis, and can be exported to other network-based tools like Cytoscape.
Fig. 1.

Schematic workflow. A query gene list is analysed through a FEA tool and the generated gene-term sets are used to build: (A) gene adjacency matrices; (B) a functional network (general view); (C) a distance heatmap and (D) an intersection network (to highlight multifunctional genes).
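
The construction in Section 2.2 can be sketched in a few lines of plain Python: a boolean genes-by-gtsets membership matrix is turned into a gene adjacency matrix that counts shared gtsets, with the diagonal set to zero. The toy gtsets and the function names are hypothetical and chosen only for illustration; this is not the package's own code.

```python
from itertools import combinations

# Hypothetical toy gene-term sets (gtsets); names are illustrative only.
GTSETS = {
    "set1": {"geneA", "geneB", "geneC"},
    "set2": {"geneB", "geneC"},
    "set3": {"geneC", "geneD"},
}


def membership_matrix(gtsets):
    """Boolean matrix M (genes x gtsets): 1 if the gene is in the set."""
    genes = sorted(set().union(*gtsets.values()))
    set_names = sorted(gtsets)
    matrix = [[1 if g in gtsets[s] else 0 for s in set_names] for g in genes]
    return genes, set_names, matrix


def adjacency_matrix(m):
    """Number of shared gtsets for each gene pair; the diagonal stays 0."""
    n = len(m)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # plays the role of the (1 - delta_ij) factor
                a[i][j] = sum(x * y for x, y in zip(m[i], m[j]))
    return a


def weighted_edges(genes, a):
    """Keep one weighted link per gene pair that shares at least one gtset."""
    pairs = combinations(range(len(genes)), 2)
    return {(genes[i], genes[j]): a[i][j] for i, j in pairs if a[i][j] > 0}


genes, set_names, M = membership_matrix(GTSETS)
A = adjacency_matrix(M)
edges = weighted_edges(genes, A)
print(edges)
```

In this toy input, geneB and geneC share two gtsets and therefore get an edge of weight 2, while geneA and geneD share none and remain unlinked.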


2.3 Visualization and plots of the functional network

The main plot of the network presents the functionally associated genes (Fig. 1B). Edges link the genes that are in the same gtsets. Nodes within the same Cl/Mg are placed together using a force-directed Fruchterman–Reingold layout, within a common background colour. Genes in only one Cl/Mg are plotted with the colour of that group, while genes that are included in more than one Cl/Mg are left white.
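
The colouring rule can be expressed as a small lookup, sketched below under assumed inputs. The group memberships, the palette and the function `node_colours` are hypothetical; the layout step (Fruchterman–Reingold) is not reproduced here.

```python
# Hypothetical group membership and colour palette; all names are illustrative.
GROUPS = {"Mg1": {"g1", "g2"}, "Mg2": {"g2", "g3"}}
PALETTE = {"Mg1": "tomato", "Mg2": "skyblue"}


def node_colours(groups, palette):
    """Colour genes in exactly one group with that group's colour, else white."""
    membership = {}
    for name, genes in groups.items():
        for gene in genes:
            membership.setdefault(gene, []).append(name)
    return {
        gene: palette[names[0]] if len(names) == 1 else "white"
        for gene, names in membership.items()
    }


colours = node_colours(GROUPS, PALETTE)
print(sorted(colours.items()))
```

Here g2 belongs to both groups and is therefore left white, which is the visual cue used to spot candidate multifunctional genes.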

2.4 Analysis of functional modules in the network

To facilitate the analysis and quantification of the modules and the overlap between groups, FGNet also provides a distance matrix and a heatmap (Fig. 1C), plus an intersection network (Fig. 1D). The distance matrix is calculated based on the pairwise binary distance in the adjacency matrix of common Cls/Mgs. These distances are analysed by hierarchical clustering with average linkage and plotted as a heatmap that reveals the proximity and similarity between the groups of genes (Cls/Mgs). The intersection network is a bipartite network that includes only the genes associated with several Cls/Mgs (white nodes in Fig. 1B,D), showing their connectivity to those Cls/Mgs. This intersection network facilitates the identification of multifunctional genes. For more details, see the FGNet documentation in Bioconductor.
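
A minimal sketch of these two analyses follows, assuming that the binary distance is computed between the gene-membership sets of each pair of groups, and that it corresponds to the Jaccard distance (the definition used by the `binary` method of `dist` in R). The toy groups and function names are hypothetical, and the hierarchical clustering and heatmap steps are omitted.

```python
from itertools import combinations

# Hypothetical metagroup membership; gene and group names are illustrative.
GROUPS = {
    "Mg1": {"g1", "g2", "g3"},
    "Mg2": {"g3", "g4"},
    "Mg3": {"g5", "g6"},
}


def binary_distance(a, b):
    """Jaccard distance: share of genes found in only one of the two groups."""
    union = a | b
    if not union:
        return 0.0  # two empty groups: treated as identical in this sketch
    return 1.0 - len(a & b) / len(union)


def distance_matrix(groups):
    """Pairwise distances between groups, keyed by the sorted group pair."""
    names = sorted(groups)
    return {
        (p, q): binary_distance(groups[p], groups[q])
        for p, q in combinations(names, 2)
    }


def intersection_network(groups):
    """Bipartite gene-group edges, restricted to genes in more than one group."""
    gene_groups = {}
    for name, genes in groups.items():
        for gene in genes:
            gene_groups.setdefault(gene, set()).add(name)
    return sorted(
        (gene, name)
        for gene, names in gene_groups.items()
        if len(names) > 1
        for name in names
    )


distances = distance_matrix(GROUPS)
bipartite_edges = intersection_network(GROUPS)
print(distances)
print(bipartite_edges)
```

In this toy input, Mg1 and Mg2 share one of four genes, so they are closer to each other (distance 0.75) than to Mg3 (distance 1.0), and g3 is the only gene kept in the intersection network.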

3 Example of use

We applied the method to several datasets and confirmed that the functional network greatly facilitates the analysis of enrichment results. Figure 1 shows the results of FGNet for a list of 175 genes differentially expressed in human samples of entorhinal cortex neurons from Alzheimer’s disease (AD) patients (obtained from the Gene Expression Omnibus database, GEO: dataset GSE4757). Performing a FEA through GeneTerm Linker, we obtained six metagroups that we labelled according to their main annotations: (Mg1) cell adhesion; (Mg2) voltage-gated ion/potassium channels; (Mg3) axon and cell projection; (Mg4) dendrite and neuronal cell body; (Mg5) synaptic neuroactive ligand-receptor interaction and (Mg6) MAPK signaling and Alzheimer. The network of these six Mgs (Fig. 1B) provides a global overview of the functionally overlapping genes and allows the identification of hub genes that interconnect groups. For example, CNTNAP1 and NLGN4X appear as hubs in Mg1. CNTNAP1 (which regulates the distribution of K+ channels) links Mg1 and Mg2, and NLGN4X (which facilitates synaptic neurotransmission) links Mg1 with Mg4 and Mg5. NLGN4X is the gene with the highest betweenness centrality in this network. Another important hub is APOE, recently associated with Alzheimer's disease. The distance matrix (Fig. 1C) quantifies the similarity between gene groups, showing that the closest Mgs are 3, 4 and 6, which share eight nodes. This is also presented in the intersection network (Fig. 1D). Finally, the functional network can reveal further information about some Mgs. For example, if a Mg shares many genes with several other Mgs, this indicates that the Mg carries the most common features that define the studied biological state. This is the case for Mg6, which, in fact, is annotated to Alzheimer's disease.
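
To make the notion of a hub concrete, the sketch below computes betweenness centrality on a small toy graph using Brandes' algorithm. It is a simplified, unweighted illustration: the graph and node names are hypothetical, edge weights are ignored, and the numbers are unrelated to the Alzheimer's disease network described above.

```python
from collections import deque

# Hypothetical toy network: one node ("hub") bridges two small gene groups.
GRAPH = {
    "a": ["b", "hub"],
    "b": ["a", "hub"],
    "c": ["d", "hub"],
    "d": ["c", "hub"],
    "hub": ["a", "b", "c", "d"],
}


def betweenness(adj):
    """Unweighted betweenness centrality of an undirected graph (Brandes)."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                          # nodes in order of discovery
        preds = {v: [] for v in adj}        # predecessors on shortest paths
        sigma = {v: 0 for v in adj}         # number of shortest paths
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):           # accumulate dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: c / 2 for v, c in centrality.items()}


scores = betweenness(GRAPH)
top_hub = max(scores, key=scores.get)
print(top_hub, scores[top_hub])
```

All shortest paths between the two groups pass through the bridging node, so it receives the highest score, which is the pattern used to flag genes that interconnect metagroups.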

Funding

This work was supported by the “Accion Estrategica en Salud” (AES) of the “Instituto de Salud Carlos III” (ISCiii) from the Spanish Government (projects granted to J.D.L.R.: PS09/00843 and PI12/00624); and by the “Consejeria de Educación” of the “Junta Castilla y Leon” (JCyL) and the European Social Fund (ESF) with grants given to S.A. and C.D.

Conflict of Interest: none declared.
References

1.  Da Wei Huang, Brad T Sherman, Richard A Lempicki (2009). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc.

2.  Celia Fontanillo, Ruben Nogales-Cadenas, Alberto Pascual-Montano, Javier De las Rivas (2011). Functional analysis beyond enrichment: non-redundant reciprocal linkage of genes and biological terms. PLoS One.

3.  Weijun Luo, Michael S Friedman, Kerby Shedden, Kurt D Hankenson, Peter J Woolf (2009). GAGE: generally applicable gene set enrichment for pathway analysis. BMC Bioinformatics.
