Giulia Babbi, Pier Luigi Martelli, Giuseppe Profiti, Samuele Bovo, Castrense Savojardo, Rita Casadio.
Abstract
BACKGROUND: Genetic investigations, boosted by modern sequencing techniques, make it possible to dissect the genetic component of different phenotypic traits. These efforts result in the compilation of lists of genes related to diseases and show that an increasing number of diseases are associated with multiple genes. Investigating functional relations among genes associated with the same disease helps to highlight the molecular mechanisms of pathogenesis.
Keywords: Functional enrichment; Gene/disease relationship; Protein functional annotation; Protein-protein interaction
Year: 2017 PMID: 28812536 PMCID: PMC5558190 DOI: 10.1186/s12864-017-3911-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Distribution of gene-disease associations. The Y-axis scale is logarithmic. (a) Number (#) of genes associated with diseases. 2672 diseases are distributed with respect to the number of associated genes. 2051 diseases are monogenic; 621 diseases are associated with multiple genes (from 2 to 69). (b) Number (#) of diseases associated with genes. 3658 genes are distributed with respect to the number of associated diseases. 2544 genes are associated with a single disease; 1114 genes are associated with multiple diseases (from 2 to 16).
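The two distributions in Fig. 1 can be derived from a flat list of gene-disease pairs. The following is a minimal sketch, assuming the associations are available as `(gene, disease)` tuples; the function name and the toy identifiers are hypothetical and not taken from eDGAR.

```python
from collections import Counter, defaultdict


def association_distributions(pairs):
    """Build the two histograms described in the Fig. 1 caption.

    Returns (genes_per_disease, diseases_per_gene), where each Counter maps
    "number of partners" -> "how many diseases (or genes) have that many".
    """
    genes_by_disease = defaultdict(set)
    diseases_by_gene = defaultdict(set)
    for gene, disease in pairs:
        genes_by_disease[disease].add(gene)
        diseases_by_gene[gene].add(disease)
    genes_per_disease = Counter(len(g) for g in genes_by_disease.values())
    diseases_per_gene = Counter(len(d) for d in diseases_by_gene.values())
    return genes_per_disease, diseases_per_gene


# Toy input (hypothetical identifiers, not real eDGAR records).
toy_pairs = [("G1", "D1"), ("G2", "D1"), ("G3", "D2"), ("G1", "D3")]
panel_a, panel_b = association_distributions(toy_pairs)
print(dict(panel_a))  # diseases grouped by number of associated genes
print(dict(panel_b))  # genes grouped by number of associated diseases
```

In this toy input, one disease has two genes and two are monogenic; applied to the full data set, the key `1` of the first histogram would correspond to the 2051 monogenic diseases reported in the caption.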
Gene annotation in eDGAR
| | All diseases: # genes (a) | All diseases: # associated diseases (b) | Diseases associated with multiple genes: # genes (a) | Diseases associated with multiple genes: # associated diseases (b) |
|---|---|---|---|---|
| Total number | 3658 | 2672 | 2600 | 621 |
| Protein coding genes | 3628 (100%) | 2655 (100%) | 2576 (100%) | 619 (100%) |
| with PDB entry | 1682 (46.4%) | 1625 (61.2%) | 1176 (45.7%) | 512 (82.7%) |
| Membrane proteins | 1891 (52.1%) | 1644 (61.9%) | 1364 (53.0%) | 517 (83.5%) |
| Enzymes (with EC number) | 1112 (30.7%) | 1045 (39.4%) | 688 (26.7%) | 363 (58.6%) |
| Reported in TRRUST (as TF) | 253 (7.0%) | 358 (13.5%) | 179 (6.9%) | 157 (25.4%) |
| Reported in TRRUST (as target) | 783 (21.6%) | 969 (36.5%) | 570 (22.1%) | 405 (65.4%) |
| Annotated with GO MF | 3419 (94.2%) | 2575 (97.0%) | 2419 (93.9%) | 617 (99.7%) |
| Annotated with GO BP | 3538 (97.5%) | 2619 (98.6%) | 2514 (97.6%) | 618 (99.8%) |
| Annotated with GO CC | 3576 (98.6%) | 2644 (99.6%) | 2533 (98.3%) | 618 (99.8%) |
| Associated with KEGG pathways | 2057 (56.7%) | 1868 (70.4%) | 1430 (55.5%) | 549 (88.7%) |
| Associated with REACTOME | 2278 (62.8%) | 2007 (75.6%) | 1595 (61.9%) | 563 (91.0%) |
| With physical BIOGRID interactions | 3307 (91.3%) | 2502 (94.2%) | 2346 (91.2%) | 609 (98.4%) |
| With genetic BIOGRID interactions | 351 (9.7%) | 472 (17.8%) | 259 (10.1%) | 247 (39.9%) |
| With STRING interactions | 2992 (82.5%) | 2341 (88.2%) | 2146 (83.3%) | 609 (98.4%) |
| Part of CORUM complexes | 714 (19.7%) | 706 (26.6%) | 558 (21.7%) | 340 (54.9%) |
| Part of CENSUS complexes | 696 (19.2%) | 689 (26.0%) | 501 (19.4%) | 296 (47.8%) |
| In tandem repeats | 381 (10.5%) | 448 (16.9%) | 280 (10.9%) | 234 (37.8%) |
(a) Percentages are computed with respect to the number of protein coding genes.
(b) Percentages are computed with respect to the number of diseases associated with protein coding genes.
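As a worked check of the two footnotes, the percentages in the "with PDB entry" row can be recomputed from the counts in the table. This is only an arithmetic sketch; the helper name `pct` and the variable names are illustrative.

```python
def pct(count, total):
    """Percentage of `count` over `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Denominators taken from the "Protein coding genes" row of the table.
PC_GENES_ALL = 3628        # protein coding genes, all diseases
PC_DISEASES_ALL = 2655     # diseases associated with protein coding genes
PC_GENES_MULTI = 2576      # protein coding genes, multiple-gene diseases
PC_DISEASES_MULTI = 619    # multiple-gene diseases with protein coding genes

# "with PDB entry" row.
pdb_row = (
    pct(1682, PC_GENES_ALL),
    pct(1625, PC_DISEASES_ALL),
    pct(1176, PC_GENES_MULTI),
    pct(512, PC_DISEASES_MULTI),
)
print(pdb_row)  # (46.4, 61.2, 45.7, 82.7)
```

The denominators are the protein coding counts (3628, 2655, 2576, 619), not the totals in the first row, which is why the "Protein coding genes" row reads 100% in every column.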
Features shared by genes involved in the same heterogeneous or polygenic diseases
| | # diseases | # pairwise relations | # protein coding genes |
|---|---|---|---|
| Total number | 621 | 25,100 | 2576 |
| With pairs of genes: | | | |
| In same cytogenetic band | 136 (21.9%) | 326 (1.3%) | 335 (13.0%) |
| In tandem repeat | 51 (8.2%) | 58 (0.2%) | 92 (3.6%) |
| In TF/target pairs | 39 (6.3%) | 81 (0.3%) | 94 (3.6%) |
| Co-regulated by the same TF (not involved in the disease) | 273 (44.0%) | 2308 (9.2%) | 626 (24.3%) |
| Sharing MF GO | 586 (94.4%) | 19,075 (76.0%) | 2369 (92.0%) |
| Sharing BP GO | 597 (96.1%) | 22,948 (91.4%) | 2502 (97.1%) |
| Sharing CC GO | 604 (97.3%) | 23,645 (94.2%) | 2519 (97.8%) |
| Sharing KEGG pathway | 349 (56.2%) | 3129 (12.5%) | 1074 (41.7%) |
| Sharing REACTOME pathway | 474 (76.3%) | 9806 (39.1%) | 1554 (60.3%) |
| Interacting in PDB | 96 (15.5%) | 207 (0.8%) | 199 (7.7%) |
| In the same CORUM complex | 86 (13.8%) | 469 (1.9%) | 225 (8.7%) |
| In the same CENSUS complex | 45 (7.2%) | 166 (0.7%) | 119 (4.6%) |
| Directly linked in STRING | 291 (46.9%) | 1535 (6.1%) | 932 (36.2%) |
| Indirectly linked in STRING | 115 (18.5%) | 4355 (17.4%) | 1346 (52.3%) |
| Directly linked in BIOGRID (physical interaction) | 250 (40.3%) | 944 (3.8%) | 799 (31.0%) |
| Indirectly linked in BIOGRID (physical interaction) | 160 (25.8%) | 5228 (20.8%) | 1607 (62.4%) |
| Directly linked in BIOGRID (genetic interaction) | 9 (1.4%) | 13 (0.1%) | 19 (0.7%) |
| Indirectly linked in BIOGRID (genetic interaction) | 25 (4.0%) | 45 (0.2%) | 62 (2.4%) |
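The three columns of this table can be read as three ways of counting the same pairwise tests. The sketch below assumes that a "pairwise relation" is an unordered pair of distinct genes associated with the same disease, and that a feature is "shared" when two annotation sets intersect; both are assumptions made for illustration, and all identifiers are hypothetical.

```python
from itertools import combinations


def gene_pairs(genes_by_disease):
    """Yield (disease, gene1, gene2) for every unordered gene pair of a disease."""
    for disease, genes in genes_by_disease.items():
        for g1, g2 in combinations(sorted(set(genes)), 2):
            yield disease, g1, g2


def shared_feature_summary(genes_by_disease, annotations):
    """Count diseases, pairs, and genes involved in at least one sharing pair."""
    diseases, genes, n_pairs = set(), set(), 0
    for disease, g1, g2 in gene_pairs(genes_by_disease):
        # A pair shares the feature if the two annotation sets overlap.
        if annotations.get(g1, set()) & annotations.get(g2, set()):
            diseases.add(disease)
            genes.update((g1, g2))
            n_pairs += 1
    return len(diseases), n_pairs, len(genes)


# Toy data (hypothetical genes, diseases, and annotation labels).
toy_diseases = {"D1": ["A", "B", "C"], "D2": ["A", "D"], "D3": ["E"]}
toy_annotations = {"A": {"t1"}, "B": {"t1", "t2"}, "C": {"t3"}, "D": {"t4"}}

all_pairs = list(gene_pairs(toy_diseases))
summary = shared_feature_summary(toy_diseases, toy_annotations)
print(len(all_pairs))  # 4 pairs: 3 from D1, 1 from D2, 0 from D3
print(summary)         # (1, 1, 2): one disease, one pair, two genes
```

Because a single disease with many genes contributes many pairs, the percentage of diseases and the percentage of pairs in the same row can differ widely, as they do for most rows of the table.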
Fig. 2. Distribution of best IC values of GO terms for genes involved in multigenic diseases. (a) GO terms shared by genes; (b) GO terms after enrichment with NET-GE. For each multigenic disease, IC values of gene-associated GO terms (of the three different roots) are evaluated (Eq. 1). In the figure, the highest IC for each disease is shown. The frequency is computed with respect to the total number of multigenic diseases (621). When IC = 0, genes associated with a multigenic disease do not share GO terms (panel a) or do not enrich GO terms (panel b).
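Eq. 1 is not reproduced in this record. The sketch below therefore assumes a commonly used definition of information content, IC = -log2(N_term / N_root), where N_term is the number of genes annotated with a GO term and N_root the number annotated with the root of its ontology. This definition, the function names, and the toy counts are assumptions and may differ from the paper's Eq. 1.

```python
import math


def information_content(n_term, n_root):
    """Assumed IC definition: -log2 of the fraction of genes carrying the term."""
    if not 0 < n_term <= n_root:
        raise ValueError("expected 0 < n_term <= n_root")
    return -math.log2(n_term / n_root)


def best_ic(term_gene_counts, n_root):
    """Highest IC among a disease's GO terms; 0.0 when there are no terms."""
    return max(
        (information_content(n, n_root) for n in term_gene_counts.values()),
        default=0.0,
    )


# Toy counts (hypothetical): genes annotated per shared term, out of 1000 at the root.
print(information_content(250, 1000))                  # 2.0
print(best_ic({"term_a": 500, "term_b": 125}, 1000))   # 3.0
print(best_ic({}, 1000))                               # 0.0
```

Under this assumed definition, rarer (more specific) terms score higher, and a disease with no shared or enriched term falls back to 0, which matches the IC = 0 case described in the caption.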
NET-GE functional enrichment of groups of genes involved in the same disease
| | # diseases | # annotations |
|---|---|---|
| KEGG pathways | 412 (66.3%) | 2753 |
| REACTOME pathways | 488 (78.6%) | 4130 |
| GO MF terms | 530 (85.3%) | 4851 |
| GO BP terms | 551 (88.7%) | 17,029 |
| GO CC terms | 477 (76.8%) | 3910 |
Fig. 3. eDGAR page for hypoparathyroidism (OMIM 146200). In the figure, each gene is highlighted with a different color; the transcription factor annotation and the known interactions are reported, together with the simple graph describing them. A summary of the KEGG pathways enriched with NET-GE and the shared GO terms for BP is also provided.