Data and programs in support of network analysis of genes and their association with diseases.

Panagiota I Kontou, Athanasia Pavlopoulou, Niki L Dimou, Georgios A Pavlopoulos, Pantelis G Bagos.

Abstract

This article describes the network-based approaches used to depict the relationships between human genetic diseases and their associated genes. To this end, monopartite disease-disease and gene-gene networks were constructed from bipartite gene-disease association networks. The latter were created by collecting and integrating data from three diverse resources, each with different content, ranging from rare monogenic disorders to common complex diseases. Topological and clustering graph analyses were also performed. The methodology and programs presented here relate to the research article entitled "Network analysis of genes and their association with diseases" [1].

Keywords:  Disease-disease networks; Gene-disease associations; Gene-gene networks

Year:  2016        PMID: 27508260      PMCID: PMC4969244          DOI: 10.1016/j.dib.2016.07.022

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

- The need for integrating complementary data from different sources into biological networks is further highlighted in this study.
- Important, previously unknown associations between genes and diseases were revealed.
- Based on the constructed disease-disease networks, diseases with apparently distinct phenotypic manifestations were found to share a common genetic background. This finding could be utilized in network pharmacology.

Data

The overall procedure of the data analysis is illustrated in Fig. 1. The Perl (Supplementary Files 1-5) and R (Supplementary File 6) programs used for the analysis are indicated in the figure. A complete description of the data and methodology is presented in [1].

Fig. 1. Flow diagram of the data analysis.

Experimental design, materials and methods

Data collection

Disease-gene association data were collected and integrated from three diverse, publicly available, comprehensive resources (NCBI's OMIM [2], NIH's GAD [3] and the NHGRI GWAS Catalog [4]). As a given disease can be associated with more than one gene, a script was written in Perl to separate the multiple entries (Supplementary File 1; separate.pl).
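The separation step can be sketched as follows. This is a minimal Python illustration, not the authors' separate.pl: the record layout (a disease name followed by a comma-delimited gene field), the function name `separate_entries`, and the placeholder disease and gene names are all assumptions made for the example.

```python
def separate_entries(records, delimiter=","):
    """Expand (disease, "GENE1,GENE2") records into one (disease, gene) pair per gene.

    The delimiter and the two-column layout are assumptions; the actual
    database exports may use a different format.
    """
    pairs = []
    for disease, gene_field in records:
        for gene in gene_field.split(delimiter):
            gene = gene.strip()
            if gene:  # skip empty fragments, e.g. from a trailing delimiter
                pairs.append((disease, gene))
    return pairs


# Placeholder records (hypothetical names, not taken from OMIM, GAD or the GWAS Catalog).
records = [
    ("Disease A", "GENE1, GENE2"),
    ("Disease B", "GENE2,"),
]
pairs = separate_entries(records)
for disease, gene in pairs:
    print(disease, gene, sep="\t")
```

Each output row then represents exactly one edge of the bipartite gene-disease network.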

Disease and gene nomenclature

To maintain a consistent nomenclature and classification for diseases in our analysis, the naming conventions described in the International Classification of Diseases (ICD) were used. The disease terms from the three databases were converted to ICD terms with the use of a Perl script (Supplementary File 2; ICD.pl). Likewise, to maintain a uniform gene nomenclature across all datasets, all genes from the three databases, along with the ones from UniProtKB [5], were converted to the official HGNC (HUGO Gene Nomenclature Committee) [6] gene symbols using a Perl script (Supplementary File 3; Hugo.pl).
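A symbol-normalization step of this kind might look like the sketch below. It is a hedged Python illustration rather than a reproduction of Hugo.pl: the lookup table `alias_to_official`, the function name `normalize_symbols`, and the placeholder symbols are hypothetical, and a real table would have to be built from an HGNC download.

```python
def normalize_symbols(pairs, alias_to_official):
    """Map the gene in each (disease, gene) pair to its official symbol.

    alias_to_official is an assumed dictionary keyed by upper-case alias.
    Genes without a match are collected separately instead of being guessed.
    """
    normalized = set()  # a set merges pairs that collapse onto the same official symbol
    unmapped = set()
    for disease, gene in pairs:
        official = alias_to_official.get(gene.strip().upper())
        if official is None:
            unmapped.add(gene)
        else:
            normalized.add((disease, official))
    return normalized, unmapped


# Hypothetical alias table and input pairs, for illustration only.
alias_to_official = {"ALIAS1": "GENE1", "GENE1": "GENE1"}
pairs = [("Disease A", "alias1"), ("Disease A", "GENE1"), ("Disease B", "XYZ")]
normalized, unmapped = normalize_symbols(pairs, alias_to_official)
print(sorted(normalized), sorted(unmapped))
```

Keeping unmapped symbols in a separate set makes it possible to review them manually rather than silently dropping associations.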

Network processing and analysis

The bipartite networks of gene-disease associations were converted to monopartite networks of gene-gene and disease-disease interactions using a Perl script (Supplementary File 4; Bipartite.pl). This functionality is not available in other network analysis packages, and we incorporated it in a publicly available web server, PowerClust, which is available at: http://www.compgen.org/tools/powerclust. PowerClust is an easy-to-use web application for clustering analysis, network processing and visualization. Moreover, randomization procedures were performed to determine whether the highly connected nodes in the original networks have a degree that cannot occur simply by chance given the other properties of the networks (Supplementary File 5; Random.pl). Finally, the robustness of the topological features of the projected gene-gene and disease-disease networks was assessed by employing a bipartite-specific rewiring algorithm [7] to test whether the degree distributions of the projected monopartite networks remain stable in the randomized gene-gene/disease-disease networks compared to the initial ones (Supplementary File 6; Rewire.R). The JOINT gene-disease network (generated by combining data from the individual databases) is provided as a Cytoscape network file.
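The projection and randomization ideas can be illustrated with a short Python sketch. It is not the authors' Bipartite.pl, Random.pl or Rewire.R, and the edge-swap routine is a generic degree-preserving swap rather than necessarily the cited rewiring algorithm. The function names `project` and `rewire`, the choice of edge weight (number of shared neighbours), and the placeholder node names are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations


def project(pairs):
    """Project (disease, gene) pairs onto disease-disease and gene-gene networks.

    Two diseases are linked if they share at least one gene (and two genes if
    they share at least one disease); the weight counts the shared neighbours.
    """
    genes_of = defaultdict(set)
    diseases_of = defaultdict(set)
    for disease, gene in pairs:
        genes_of[disease].add(gene)
        diseases_of[gene].add(disease)

    def one_mode(neighbours):
        weights = defaultdict(int)
        # Every pair of nodes attached to the same opposite-side node
        # gains one shared neighbour.
        for members in neighbours.values():
            for a, b in combinations(sorted(members), 2):
                weights[(a, b)] += 1
        return dict(weights)

    disease_disease = one_mode(diseases_of)  # grouped by shared gene
    gene_gene = one_mode(genes_of)           # grouped by shared disease
    return disease_disease, gene_gene


def rewire(pairs, n_swaps, seed=0):
    """Randomize a bipartite edge list while keeping every node's degree fixed.

    Each accepted step swaps the gene endpoints of two randomly chosen edges.
    """
    rng = random.Random(seed)
    edges = list(set(pairs))
    present = set(edges)
    if len(edges) < 2:
        return edges
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (d1, g1), (d2, g2) = edges[i], edges[j]
        new1, new2 = (d1, g2), (d2, g1)
        # Reject swaps that would duplicate an existing edge; this also
        # rejects the trivial cases d1 == d2 and g1 == g2.
        if new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


# Placeholder network (hypothetical node names).
pairs = [("D1", "G1"), ("D1", "G2"), ("D2", "G2"), ("D3", "G2"), ("D3", "G3")]
disease_disease, gene_gene = project(pairs)
randomized_dd, randomized_gg = project(rewire(pairs, n_swaps=100, seed=1))
```

Comparing the degree distribution of `disease_disease` with that of many `randomized_dd` replicates gives one way to judge whether a hub's degree exceeds what the bipartite degree sequence alone would produce.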

Specifications Table

Subject area:  Systems biology
More specific subject area:  Gene-disease networks
Type of data:  Figure, text files, Cytoscape network file
How data were acquired:  Data were acquired from the publicly available databases: OMIM, GAD, GWAS, UniProtKB, ICD, HGNC
Data format:  Processed, analyzed
Experimental factors:  Gene-disease association data were analyzed using Perl and R scripts and Cytoscape.
Experimental features:  Gene-gene and disease-disease networks were constructed.
Data source location:  Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece
Data accessibility:  Data are provided with this article.

  7 in total

1.  Genetic association studies.

Authors:  Heather J Cordell; David G Clayton
Journal:  Lancet       Date:  2005 Sep 24-30       Impact factor: 79.321

2.  Network analysis of genes and their association with diseases.

Authors:  Panagiota I Kontou; Athanasia Pavlopoulou; Niki L Dimou; Georgios A Pavlopoulos; Pantelis G Bagos
Journal:  Gene       Date:  2016-06-02       Impact factor: 3.688

3.  Genenames.org: the HGNC resources in 2015.

Authors:  Kristian A Gray; Bethan Yates; Ruth L Seal; Mathew W Wright; Elspeth A Bruford
Journal:  Nucleic Acids Res       Date:  2014-10-31       Impact factor: 19.160

4.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.

Authors:  Danielle Welter; Jacqueline MacArthur; Joannella Morales; Tony Burdett; Peggy Hall; Heather Junkins; Alan Klemm; Paul Flicek; Teri Manolio; Lucia Hindorff; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2013-12-06       Impact factor: 16.971

5.  OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders.

Authors:  Joanna S Amberger; Carol A Bocchini; François Schiettecatte; Alan F Scott; Ada Hamosh
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 19.160

6.  Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data.

Authors:  Sylvain Poux; Michele Magrane; Cecilia N Arighi; Alan Bridge; Claire O'Donovan; Kati Laiho
Journal:  Database (Oxford)       Date:  2014-03-12       Impact factor: 3.451

7.  Fast randomization of large genomic datasets while preserving alteration counts.

Authors:  Andrea Gobbi; Francesco Iorio; Kevin J Dawson; David C Wedge; David Tamborero; Ludmil B Alexandrov; Nuria Lopez-Bigas; Mathew J Garnett; Giuseppe Jurman; Julio Saez-Rodriguez
Journal:  Bioinformatics       Date:  2014-09-01       Impact factor: 6.937

