
linkcomm: an R package for the generation, visualization, and analysis of link communities in networks of arbitrary size and type.

Alex T Kalinka, Pavel Tomancak.

Abstract

SUMMARY: An essential element when analysing the structure, function, and dynamics of biological networks is the identification of communities of related nodes. A recently proposed algorithm enhances this process by clustering the links between nodes, rather than the nodes themselves, thereby allowing each node to belong to multiple overlapping or nested communities. The R package 'linkcomm' implements this algorithm and extends it in several respects: (i) the clustering algorithm handles networks that are weighted, directed, or both weighted and directed; (ii) several visualization methods are implemented that facilitate the representation of the link communities and their relationships; (iii) a suite of functions is included for the downstream analysis of the link communities, including novel community-based measures of node centrality; (iv) the main algorithm is written in C++ and designed to handle networks of any size; and (v) several clustering methods are available for networks that can be handled in memory, and the number of communities can be adjusted by the user.

AVAILABILITY: The program is freely available from the Comprehensive R Archive Network (http://cran.r-project.org/) under the terms of the GNU General Public License (version 2 or later).

Year:  2011        PMID: 21596792      PMCID: PMC3129527          DOI: 10.1093/bioinformatics/btr311

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The advent of high-throughput technologies in the biological sciences has resulted in a large amount of data that can often be represented as systems of interacting elements, such as genes or proteins. To understand how the nodes in these networks relate to one another and how the topologies of the networks influence how they work, an extremely useful analytical approach is to identify sets of related nodes, known as communities (Radicchi et al., 2004). Until recently, this was done by clustering the nodes in the network. A major drawback of that approach is that each node can belong to only a single community; in densely connected networks, subnetworks often overlap to such an extent that this becomes unsuitably restrictive. A superior method that circumvents this constraint is to cluster the links between nodes, thereby allowing nodes to belong to multiple communities and consequently revealing the overlapping and nested structure of the network (Ahn et al., 2010; Evans and Lambiotte, 2009). We implement the algorithm outlined by Ahn et al. (2010), which employs the Jaccard coefficient for assigning similarity between links, e_ik and e_jk, that share a node, k:

$$S(e_{ik}, e_{jk}) = \frac{|n_{+}(i) \cap n_{+}(j)|}{|n_{+}(i) \cup n_{+}(j)|} \qquad (1)$$

where n+(i) refers to the first-order node neighbourhood of node i. After pairwise similarities have been assigned to all of the links in the network, the links are hierarchically clustered and the resulting dendrogram is cut at the point that maximizes the partition density, that is, the density of links within the clusters normalized against the minimum and maximum numbers of links possible in each cluster.
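The two quantities described above can be sketched in a few lines of Python. This is a minimal illustration for an unweighted, undirected edge list, not the package's C++ implementation. It assumes, following Ahn et al. (2010), that n+(i) contains node i itself as well as its neighbours, and that the partition density takes the form D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where community c has m_c links and n_c nodes and M is the total number of links. The function names are hypothetical.

```python
from itertools import combinations


def inclusive_neighbours(edges):
    """Map each node i to n+(i): the node itself plus its first-order neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def link_similarities(edges):
    """Jaccard similarity (Equation 1) for every pair of links sharing a node k."""
    nbrs = inclusive_neighbours(edges)
    sims = {}
    for k, around_k in nbrs.items():
        # Each pair of neighbours (i, j) of k defines a link pair (e_ik, e_jk).
        for i, j in combinations(sorted(around_k - {k}), 2):
            shared = nbrs[i] & nbrs[j]
            union = nbrs[i] | nbrs[j]
            sims[(i, k), (j, k)] = len(shared) / len(union)
    return sims


def partition_density(communities):
    """Partition density for a list of communities, each given as a list of links."""
    total_links = sum(len(links) for links in communities)
    weighted = 0.0
    for links in communities:
        m = len(links)
        n = len({node for link in links for node in link})
        if n > 2:  # assumption: a community of one link (n == 2) contributes zero
            weighted += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * weighted / total_links
```

For a triangle a-b-c with a pendant link c-d, the links e_ac and e_bc have similarity 1 (nodes a and b have identical inclusive neighbourhoods), whereas e_ac and e_dc have similarity 0.25. Cutting a dendrogram would then amount to evaluating `partition_density` at each merge height and keeping the partition with the largest value.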

2 IMPLEMENTATION

We extend the algorithm so that it can handle networks that are weighted, directed, or both weighted and directed, using the Tanimoto coefficient suggested by Ahn et al. (2010):

$$S(e_{ik}, e_{jk}) = \frac{\mathbf{a}_i \cdot \mathbf{a}_j}{|\mathbf{a}_i|^2 + |\mathbf{a}_j|^2 - \mathbf{a}_i \cdot \mathbf{a}_j} \qquad (2)$$

where a_i refers to a vector describing the weights of links between node i and the nodes in the first-order neighbourhoods of both nodes i and j (equal to 0 in the event of an absent link). For directed networks, links to nodes shared by both node i and node j are given a user-defined weight below 1 if they are in the opposite orientation.

For networks whose numbers of edges can be comfortably handled in memory (a threshold that is adjustable to suit the resources available to each user), several different hierarchical clustering algorithms can be chosen. For networks that are too large to be handled in memory, single-linkage clustering is used to enhance performance (see Supplementary Material).

To facilitate analysis of the communities generated by the algorithm, we have included a suite of functions that allow the user to explore the structure of the communities as they relate to each other. These include functions to extract the nested structure of communities and to further cluster the communities themselves using the Jaccard coefficient and the numbers of nodes shared by pairs of communities, thereby allowing the user to visualize the structure of the network across multiple scales (see Fig. 1D).

In addition, we provide functions that calculate a novel community-based measure of node centrality. This measure weights the number of communities a node belongs to by the average pairwise similarity between the communities:

$$C_c(i) = \sum_{i \in j}^{N} \left(1 - \frac{1}{m} \sum_{i \in j \cap k}^{m} S(j,k)\right) \qquad (3)$$

where the main sum is over the N communities to which node i belongs, and S(j,k) refers to the similarity between communities j and k, calculated as the Jaccard coefficient for the number of shared nodes between each community pair. This similarity is averaged over the m communities that are paired with community j and to which node i also belongs.
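As a rough illustration of the two measures described above, the following Python sketch computes a Tanimoto coefficient for two weight vectors and a community centrality score for a node. It is written under stated assumptions rather than taken from the package: the weight vectors are assumed to be already built over a common node ordering, communities are represented as plain sets of nodes, and a node that belongs to a single community is assumed to score 1 (its inner average is treated as 0). All function names are hypothetical.

```python
def tanimoto(a_i, a_j):
    """Tanimoto coefficient between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a_i, a_j))
    norm_i = sum(x * x for x in a_i)
    norm_j = sum(y * y for y in a_j)
    return dot / (norm_i + norm_j - dot)


def jaccard(nodes_a, nodes_b):
    """Jaccard coefficient for the nodes shared by two communities."""
    return len(nodes_a & nodes_b) / len(nodes_a | nodes_b)


def community_centrality(node, communities):
    """Community centrality of `node`; `communities` maps an id to a set of nodes."""
    member_of = [c for c, nodes in communities.items() if node in nodes]
    score = 0.0
    for j in member_of:
        # The m communities paired with j that also contain the node.
        others = [k for k in member_of if k != j]
        if others:
            mean_sim = sum(
                jaccard(communities[j], communities[k]) for k in others
            ) / len(others)
        else:
            mean_sim = 0.0  # assumption: no partner communities, no penalty
        score += 1.0 - mean_sim
    return score
```

With this form, a node that belongs to many dissimilar communities scores close to the number of communities it belongs to, whereas membership of several nearly identical communities adds little.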
Fig. 1. Visualizing link communities. (A) Example output from the link clustering algorithm in the R package ‘linkcomm’. The plot shows the link communities that result from cutting the dendrogram at a point where the partition density is maximized. (B) The network of interactions between the transcription factor diminutive (dm) and its targets, visualized using a novel graph layout algorithm (see text). (C) A community-membership matrix showing colour-coded community membership for nodes that belong to the most communities. (D) A hierarchical clustering dendrogram showing clusters of link communities (meta-communities), which are based on the numbers of nodes shared by pairs of communities (see text).

We also provide several visualization methods for representing the link communities (Figs 1A–C). Foremost among these is an implementation of a novel method for visualizing link communities (Fig. 1B) (http://scaledinnovation.com). This algorithm anchors communities evenly around the circumference of a circle in their dendrogram order (to minimize crossing over of links) and positions nodes within the circle according to how many links they possess in each of the communities. Nodes that have links in many communities are therefore pushed towards the centre of the circle, which makes this method well suited to representing ego networks in which one or a small number of nodes belong to multiple communities (Fig. 1B).
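The layout idea can be illustrated with a short Python sketch. The text does not give the exact positioning rule, so the version below makes an explicit assumption: each node is placed at the weighted mean of the community anchor points, with weights equal to the number of links the node has in each community. The function names and the `radius` parameter are hypothetical.

```python
import math


def anchor_positions(community_order, radius=1.0):
    """Place communities evenly around a circle, in the order given."""
    n = len(community_order)
    return {
        c: (
            radius * math.cos(2 * math.pi * idx / n),
            radius * math.sin(2 * math.pi * idx / n),
        )
        for idx, c in enumerate(community_order)
    }


def node_position(link_counts, anchors):
    """Weighted mean of anchor points (an assumed rule, not the package's).

    `link_counts` maps a community id to the number of links the node has in it.
    """
    total = sum(link_counts.values())
    x = sum(anchors[c][0] * w for c, w in link_counts.items()) / total
    y = sum(anchors[c][1] * w for c, w in link_counts.items()) / total
    return x, y
```

Under this rule, a node whose links all fall in one community sits on that community's anchor, while a node with equal numbers of links in communities spread around the circle is drawn to the centre, which matches the behaviour described for ego networks.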

3 RESULTS AND DISCUSSION

We ran the algorithm on a large gene co-expression network derived from Drosophila melanogaster embryonic in situ expression data (Tomancak et al., 2007). This weighted network contains 1031 nodes and 106 357 links, giving an average degree of 206. Links between genes indicate that the genes are co-expressed in at least one tissue during the final stages of embryonic development, and the weights attached to the links refer to the similarity of expression patterns for pairs of genes, calculated using the Jaccard coefficient (based on the numbers of shared tissues). The algorithm produced 873 non-trivial communities (composed of more than two edges). Further clustering of these communities allowed us to extract 11 meta-communities, in which nodes may again appear multiple times across different meta-communities (Fig. 1D). Using our measure of community centrality (Equation 3), we find that genes expressed in the gut, epidermis and pharynx structures tend to appear in many communities and hence tend to be expressed in many different tissues. Conversely, genes expressed in the yolk, fat body, eye, brain and ventral cord tend to be expressed in fewer tissues (Supplementary Tables S1 and S2). These results allow us to identify genes that may have more or less specific roles during the final stages of embryonic development.

In future versions of the package we aim to implement a visualization method that will allow the user to zoom interactively into the network, so that large networks can be plotted in their entirety without losing access to information at the local scale (Saalfeld et al., 2009).
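As a quick consistency check on the network size quoted above, and assuming the links are treated as undirected so that each link contributes to the degree of two nodes, the mean degree follows from the link count E and node count N:

```latex
\langle k \rangle = \frac{2E}{N} = \frac{2 \times 106\,357}{1031} = \frac{212\,714}{1031} \approx 206.3
```

This agrees with the reported average degree of 206.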
REFERENCES

1. Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, Domenico Parisi (2004) Defining and identifying communities in networks. Proc Natl Acad Sci U S A.

2. Yong-Yeol Ahn, James P Bagrow, Sune Lehmann (2010) Link communities reveal multiscale complexity in networks. Nature.

3. T S Evans, R Lambiotte (2009) Line graphs, link partitions, and overlapping communities. Phys Rev E Stat Nonlin Soft Matter Phys.

4. Stephan Saalfeld, Albert Cardona, Volker Hartenstein, Pavel Tomancak (2009) CATMAID: collaborative annotation toolkit for massive amounts of image data. Bioinformatics.

5. Pavel Tomancak, Benjamin P Berman, Amy Beaton, Richard Weiszmann, Elaine Kwan, Volker Hartenstein, Susan E Celniker, Gerald M Rubin (2007) Global analysis of patterns of gene expression during Drosophila embryogenesis. Genome Biol.