Antonio Mora, Katerina Michalickova, Ian M Donaldson.
Abstract
BACKGROUND: Multigenic diseases are often associated with protein complexes or interactions involved in the same pathway. We wanted to estimate to what extent this is true given a consolidated protein interaction data set. The study stresses data integration and data representation issues.
Year: 2013 PMID: 23398688 PMCID: PMC3598893 DOI: 10.1186/1471-2105-14-47
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Disease groups and their significant overlaps. Cytoscape is used to visualize the disease groups and their overlaps. Additional file 5 can be directly loaded into Cytoscape to replicate the figure and explore the disease groups. Each node represents a group of related diseases associated with two to 59 genes. Edges represent one or more genes that are shared between disease groups, where the width of the edge is proportional to the Jaccard index. The graph is sparse. Only 837 edges exist between the 497 multigenic disease groups, and 130 of these overlaps are significant (hypergeometric test, p-value < 0.01 after FDR adjustment). Only the significant overlaps are shown here, but all are available in the provided file. A number of connected components group together related disease groups such as cancer and eye disorders (red box magnified in inset).
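The edge weighting described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names are placeholders, and the `edge_width` scaling and its `max_width` parameter are assumptions made only to show proportionality.

```python
def jaccard_index(genes_a, genes_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty groups share nothing
    return len(a & b) / len(union)


def edge_width(jaccard, max_width=10.0):
    """Hypothetical drawing rule: width grows linearly with the Jaccard index."""
    return max_width * jaccard


# Placeholder gene sets for two disease groups (not taken from the paper).
group_x = {"GENE_A", "GENE_B", "GENE_C"}
group_y = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

j = jaccard_index(group_x, group_y)  # 2 shared genes out of 5 distinct genes
print(j, edge_width(j))
```

An index of 1 means the two groups contain exactly the same genes, and an index of 0 means they share none, in which case no edge would be drawn.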
Figure 2. Alport syndrome and subunits of type 4 collagen. Alport syndrome is associated with three genes: COL4A5 is associated with the X-linked form (MIM 301050), while COL4A3 and COL4A4 are associated with the autosomal-recessive form (MIM 203780). These forms were grouped into disease group 80. Searching iRefScape for interactions between any of the proteins in this disease group returns one record from IntAct (EBI-2461456) describing a complex (hexagon) containing all three proteins (ovals).
Figure 3. Disease group overlaps with complexes. Cytoscape is used to visualize disease groups and their overlaps with n-ary interaction records. Additional file 7 can be directly loaded into Cytoscape to replicate the figure and explore the overlaps. Disease groups (circular nodes) and n-ary records (hexagonal nodes) that have significant overlaps are connected by edges whose width is proportional to the Jaccard index [31] for the overlap. All disease-group overlaps with complexes are provided in the additional file. Here, significant overlaps (hypergeometric test, p-value < 0.0025 after FDR adjustment) are shown on the right and involve 105 disease groups. The region inside the red box shows an overlap between the Cornelia de Lange syndrome disease group and n-ary records that contain subunits of the cohesin complex. The region inside the green box (and magnified in the left inset) shows two disease groups (Benign familial hematuria and Alport syndrome) that both overlap with the same n-ary record (see text for details).
Figure 4. Liddle syndrome. Mutations in either SCNN1B or SCNN1G are associated with Liddle syndrome (disease group 772). Both are subunits of the heterotrimeric (alpha, beta, gamma) nonvoltage-gated, amiloride-sensitive sodium channel. Both proteins were observed together with SCNN1A (the alpha subunit of the channel) as interactors with syntaxin 1A (STX1A). The original paper [32] contains evidence for a direct interaction between STX1A and the gamma subunit and for a complex that includes all four proteins using in-vitro translated components (Figure 1 in [32]). The complex is represented as four binary interactions in the BioGRID database. These interactions are identified as part of a potential spoke-represented complex by iRefScape (a grey hexagon appears after selecting View Tools/Show spoke-represented complexes from the iRefScape menu).
Figure 5. Glycine encephalopathy. Mutations in any one of three genes can cause Glycine encephalopathy (see MIM 605899 and disease group 542). All three potential pairwise interactions are found as predicted interactions in the OPHID database. No other database in iRefIndex includes these interactions. The three proteins are all part of the glycine decarboxylase complex, a loosely associated multienzyme complex of four proteins that catalyzes the oxidative cleavage of glycine to carbon dioxide, ammonia, and a methylene group in a multistep reaction. The fourth subunit (DLDH_HUMAN, a.k.a. DLD or GCSL) has no interactions with any of the above three subunits in iRefIndex. DLD is also a subunit of the branched-chain alpha-keto acid dehydrogenase complex (BCKD). Mutations in DLD or any other of the three catalytic subunits of this complex can lead to Maple Syrup Urine Disease (MIM 248600), a disease with a similar phenotype. This complex is not detected by any of the methods shown in this study since interactions between its subunits are not present in iRefIndex.
Figure 6. Diagram of DiG overlaps with each of three different interaction data types. A Venn diagram shows that some DiGs correlate with a complex or complexes found in only one of the three protein interaction data sources (24 DiGs when using n-ary data, 25 for regenerated complexes and 20 for binary interactions between the DiG proteins). At the same time, some DiGs are significantly similar to complexes found in more than one of the three data sources.
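The "only one source" regions of such a Venn diagram amount to set differences. A minimal sketch follows: DiG IDs 80, 870, 772, 687, 542 and 1092 are taken from the three tables below, while ID 9999 is a hypothetical DiG added to stand for a group that matches all three data types.

```python
def exclusive_members(target, *others):
    """Return the members of `target` that appear in none of the `others`."""
    return set(target) - set().union(*others)


# DiG IDs with a significant match in each data type (9999 is hypothetical).
nary = {80, 870, 9999}
regenerated = {772, 687, 9999}
binary = {542, 1092, 9999}

print(sorted(exclusive_members(nary, regenerated, binary)))   # n-ary only
print(sorted(exclusive_members(regenerated, nary, binary)))   # regenerated only
print(sorted(nary & regenerated & binary))                    # found by all three
```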
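DiGs related only to complexes found in n-ary data
| DiG ID | DiG name | Genes | Best match (icrigid) | Complex span | Significant (p < 0.05) | Significant after FDR |
| --- | --- | --- | --- | --- | --- | --- |
| 6 | 3-methylcrotonyl-coa carboxylase | 2 | 1209480 | 9 | 9 | 2 |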
| 65 | Albinism | 3 | 1209579 | 1 | 1 | 1 |
| 80 | alport syndrome | 3 | 781937 | 4 | 4 | 1 |
| 89 | Amyloidosis | 6 | 1221863 | 26 | 22 | 9 |
| 168 | bethlem myopathy | 3 | 1027975 | 5 | 5 | 1 |
| 192 | Bradyopsia | 2 | 969965 | 1 | 1 | 1 |
| 259 | Ceroid | 8 | 1232651 | 6 | 3 | 1 |
| 302 | combined cellular and humoral immune defects | 2 | 1220789 | 1 | 1 | 1 |
| 313 | congenital disorder of glycosylation | 23 | 725907 | 28 | 26 | 4 |
| 578 | Hematuria | 2 | 781937 | 4 | 4 | 1 |
| 595 | hereditary hemorrhagic telangiectasia | 2 | 618400 | 1 | 1 | 1 |
| 690 | immune dysfunction | 2 | 1122614 | 5 | 5 | 2 |
| 758 | leigh syndrome | 14 | 1211293 | 44 | 36 | 13 |
| 812 | maple syrup urine disease | 4 | 1225549 | 12 | 11 | 1 |
| 870 | mitochondrial complex | 16 | 1211293 | 28 | 24 | 5 |
| 975 | omenn syndrome | 3 | 1220789 | 1 | 1 | 1 |
| 998 | Osteoporosis | 5 | 869728 | 5 | 5 | 2 |
| 1108 | Propionicacidemia | 2 | 1209480 | 5 | 5 | 2 |
| 1266 | stickler syndrome | 3 | 878437 | 4 | 4 | 1 |
| 1341 | tumoral calcinosis | 4 | 682939 | 3 | 3 | 1 |
| 1345 | ullrich congenital muscular dystrophy | 3 | 1027975 | 5 | 5 | 1 |
| 1477 | celiac disease | 4 | 1220318 | 2 | 1 | 1 |
| 1512 | intervertebral disc disease | 2 | 893696 | 1 | 1 | 1 |
| 1520 | Leprosy | 4 | 651466 | 7 | 7 | 1 |
Summary of the 24 DiGs that can be related to protein complexes found only in iRefIndex n-ary data. As an example, Mitochondrial complex deficiency (DiG ID = 870) groups 16 genes. At least one of these genes is present in 28 complexes in the iRefIndex n-ary data (complex span = 28), and 24 of those complexes are significantly similar to the DiG under a hypergeometric test (p-value < 0.05). After adjusting the p-values for multiple testing using the FDR method, only 5 complexes remain significantly similar to the DiG, which suggests that their subunits are related to the diseases in the DiG. The best match among those 5 is the complex with icrigid = 1211293, which corresponds to the CORUM complex with interaction identifier 15317750.
“3-methylcrotonyl-coa carboxylase” stands for “3-methylcrotonyl-coa carboxylase 1 deficiency” and “3-methylcrotonyl-coa carboxylase 2 deficiency”, while “Ceroid” stands for “Ceroid lipofuscinosis” and “Mitochondrial complex” stands for Mitochondrial complex I, II, III and IV deficiencies.
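The drop from raw to FDR-adjusted significance described above can be illustrated with a short sketch. The record only says "the FDR method"; the Benjamini-Hochberg procedure used here is an assumption, and the p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the "FDR method" is taken to be Benjamini-Hochberg.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for one DiG tested against four complexes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(sum(p < 0.05 for p in raw), "significant before adjustment")
print(sum(p < 0.025 for p in adj), "significant after adjustment at 0.025")
```

With these invented values, all four tests pass a raw 0.05 cut-off, but only two survive a 0.025 cut-off on the adjusted values, which mirrors how the last two table columns can differ.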
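DiGs related only to regenerated complexes
| DiG ID | DiG name | Genes | Best match (regenerated complex identifier) | Complex span | Significant (p < 0.05) | Significant after FDR |
| --- | --- | --- | --- | --- | --- | --- |
| 63 | alagille syndrome | 2 | MI:0463(grid).pubmed:10958687.MI:0004(affinity chromatography technology).1144108.MI:0463(grid).pubmed:10958687.MI:0096(pull down).1144108 | 4 | 4 | 1 |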
| 190 | Brachydactyly | 7 | MI:0463(grid).pubmed:9525338.MI:0096(pull down).248458 | 13 | 12 | 1 |
| 246 | central hypoventilation syndrome | 6 | MI:0463(grid).pubmed:10829012.MI:0096(pull down).4168707 | 13 | 12 | 1 |
| 271 | Chondrodysplasia | 6 | MI:0463(grid).pubmed:9525338.MI:0096(pull down).248458 | 13 | 13 | 1 |
| 290 | cockayne syndrome | 2 | MI:0463(grid).pubmed:10944529.MI:0004(affinity chromatography technology).660979 | 11 | 11 | 2 |
| 363 | Deafness | 59 | MI:0463(grid).pubmed:12485990.MI:0096(pull down).1981308 | 134 | 56 | 1 |
| 424 | endometrial carcinoma | 5 | MI:0463(grid).pubmed:8942985.MI:0096(pull down).813561.MI:0463(grid).pubmed:10029069.MI:0096(pull down).813561.MI:0463(grid).pubmed:9774676.MI:0096(pull down).813561.MI:0469(intact).pubmed:9774676.MI:0096(pull down).813561 | 105 | 103 | 5 |
| 451 | Exostoses | 2 | MI:0463(grid).pubmed:17353931.MI:0004(affinity chromatography technology).3748087.MI:0469(intact).pubmed:17353931.MI:0006(anti bait coip).3748087 | 13 | 13 | 1 |
| 512 | gastric cancer | 11 | MI:0469(intact).pubmed:19411071.MI:0006(anti bait coip).3231405 | 194 | 190 | 2 |
| 687 | Ichthyosis | 13 | MI:0469(intact).pubmed:17373842.MI:0006(anti bait coip).1386965 | 25 | 21 | 1 |
| 714 | jackson-weiss syndrome | 2 | MI:0463(grid).pubmed:20388777.MI:0004(affinity chromatography technology).3027803 | 16 | 16 | 1 |
| 730 | Keratosis | 6 | MI:0463(grid).pubmed:11790773.MI:0004(affinity chromatography technology).2791010 | 65 | 60 | 5 |
| 772 | liddle syndrome | 2 | MI:0463(grid).pubmed:14996668.MI:0004(affinity chromatography technology).382851 | 14 | 14 | 8 |
| 829 | medullary thyroid carcinoma | 2 | MI:0463(grid).pubmed:8183561.MI:0004(affinity chromatography technology).1871880 | 26 | 26 | 1 |
| 904 | Mycobacterium | 10 | MI:0463(grid).pubmed:10848598.MI:0004(affinity chromatography technology).1004542 | 45 | 42 | 2 |
| 933 | nephrotic syndrome | 4 | MI:0463(grid).pubmed:11733557.MI:0004(affinity chromatography technology).3453124.MI:0463(grid).pubmed:11733557.MI:0096(pull down).3453124 | 8 | 8 | 1 |
| 959 | noonan-like/multiple giant cell lesion syndrome | 2 | MI:0463(grid).pubmed:9344843.MI:0004(affinity chromatography technology).5369795 | 100 | 100 | 1 |
| 999 | Osteosarcoma | 2 | MI:0463(grid).pubmed:12242661.MI:0004(affinity chromatography technology).3633225 | 340 | 340 | 318 |
| 1049 | pfeiffer syndrome | 2 | MI:0463(grid).pubmed:20388777.MI:0004(affinity chromatography technology).3027803 | 16 | 16 | 1 |
| 1051 | pheochromocytoma | 6 | MI:0463(grid).pubmed:10829012.MI:0096(pull down).4168707 | 115 | 109 | 1 |
| 1071 | pituitary hormone | 5 | MI:0463(grid).pubmed:10788441.MI:0096(pull down).3747010 | 9 | 9 | 1 |
| 1165 | rhabdomyosarcoma | 4 | MI:0463(grid).pubmed:17662948.MI:0004(affinity chromatography technology).2068280 | 30 | 30 | 1 |
| 1233 | Sitosterolemia | 2 | MI:0471(mint).pubmed:16870176.MI:0007(anti tag coip).3242301 | 2 | 2 | 1 |
| 1260 | squamous cell carcinoma | 3 | MI:0463(grid).pubmed:15659383.MI:0401(biochemical).4258047 | 52 | 52 | 12 |
| 1574 | Tuberculosis | 3 | MI:0463(grid).pubmed:7673114.MI:0096(pull down).4202157 | 9 | 9 | 1 |
Summary of the 25 DiGs that are significantly similar to protein complexes found only in iRefIndex regenerated complex data. As an example, Ichthyosis (DiG ID = 687) groups 13 genes. At least one of these genes is present in 25 complexes in the iRefIndex regenerated n-ary data (complex span = 25), but only 21 of them are significantly similar to the DiG under a hypergeometric test (p-value < 0.05). After adjusting the p-values for multiple testing using the FDR method, only 1 of those complexes remains significantly similar to the DiG.
DiGs enriched only in binary interactions
| DiG ID | DiG name | Genes | Interactions found | p-value |
| --- | --- | --- | --- | --- |
| 156 | basal cell carcinoma | 4 | 2 | 1.1e-05 |
| 263 | charcot-marie-tooth disease | 26 | 4 | 2.0e-04 |
| 305 | Immunodeficiency | 12 | 4 | 3.8e-07 |
| 310 | retinal dystrophy | 22 | 4 | 5.4e-05 |
| 379 | diabetes mellitus | 44 | 12 | 7.7e-11 |
| 538 | Glutaricaciduria | 4 | 3 | 1.3e-08 |
| 542 | glycine encephalopathy | 3 | 3 | 6.3e-10 |
| 543 | glycogen storage disease | 19 | 6 | 1.1e-08 |
| 581 | Hemochromatosis | 5 | 2 | 3.3e-05 |
| 626 | Hypercholesterolemia | 9 | 3 | 4.4e-06 |
| 628 | Hyperekplexia | 5 | 2 | 3.3e-05 |
| 644 | Hyperphenylalaninemia | 4 | 2 | 1.1e-05 |
| 780 | Lissencephaly | 5 | 2 | 3.3e-05 |
| 850 | Methemoglobinemia | 4 | 2 | 1.1e-05 |
| 996 | Osteopetrosis | 8 | 2 | 2.7e-04 |
| 1081 | polycystic kidney | 4 | 2 | 1.1e-05 |
| 1092 | Porphyria | 6 | 5 | 1.4e-12 |
| 1153 | retinitis pigmentosa | 43 | 8 | 1.6e-06 |
| 1300 | Thrombocythemia | 3 | 2 | 2.2e-06 |
| 1536 | myocardial infarction | 13 | 4 | 7.4e-07 |
Summary of the 20 DiGs that are enriched in iRefIndex binary interactions and can only be found using this method. As an example, Glycine encephalopathy (DiG ID = 542) groups 3 genes, which allow 3 possible pairwise interactions. Finding all 3 of them in the PIN is a very unlikely event (hypergeometric p-value = 6.3e-10), so the existence of a functional group of proteins related to the disease can be hypothesized.
“immunodeficiency” stands for immunodeficiencies due to defects in CD3, MAPBP-interacting protein, with hyper IgM and X-linked.
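The enrichment test summarized above can be sketched with an exact hypergeometric tail. The population definition is an assumption not stated in this record: every unordered pair of the 16272 PIN nodes counts as a possible edge, and the 113733 PIN edges (iRefIndex 9.0 figures from the comparison table) are the "successes". The draws are the possible gene pairs within a DiG.

```python
from fractions import Fraction
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Exact P(X >= k) for a hypergeometric variable, using integer arithmetic."""
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return float(Fraction(tail, total))


def binary_enrichment(n_genes, interactions_found, pin_nodes=16272, pin_edges=113733):
    """p-value for finding at least `interactions_found` edges among the DiG gene pairs.

    Assumption: the population is all unordered node pairs in the PIN.
    """
    possible_pairs_in_pin = comb(pin_nodes, 2)
    pairs_in_dig = comb(n_genes, 2)
    return hypergeom_tail(interactions_found, possible_pairs_in_pin, pin_edges, pairs_in_dig)


for name, genes, found in [
    ("glycine encephalopathy", 3, 3),
    ("basal cell carcinoma", 4, 2),
    ("Porphyria", 6, 5),
]:
    print(f"{name}: {binary_enrichment(genes, found):.1e}")
```

Under this assumption the computed values come out close to the tabulated p-values for these three DiGs, which suggests the assumption is at least compatible with the table.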
Sources of significant n-ary data
| Source database(s) | Best matches to DiGs |
| --- | --- |
| IntAct | 24 |
| CORUM | 16 |
| HPRD | 13 |
| DIP | 6 |
| InnateDB | 5 |
| DIP & IntAct | 5 |
| MINT | 4 |
| BIND & CORUM | 2 |
| DIP, MINT & InnateDB | 1 |
| BIND | 1 |
| CORUM, IntAct & HPRD | 1 |
| CORUM & IntAct | 1 |
Number of best matches to DiGs per source database. Seven databases contribute complexes that match at least one DiG. In 87% of the cases the complex belongs to only one database; the other 13% can be found in 2 or 3 databases at the same time. The table shows that using only a subset of databases instead of a consolidated data set may miss DiG matches.
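The 87% figure and the count of seven databases can be re-derived from the table above. The sketch below copies the table counts into a dictionary (with the MINT spelling normalised) and treats any label containing "," or "&" as a multi-database record; the parsing helper is an illustrative assumption.

```python
import re

# Best-match counts per source label, copied from the table above.
best_matches = {
    "IntAct": 24, "CORUM": 16, "HPRD": 13, "DIP": 6, "InnateDB": 5,
    "DIP & IntAct": 5, "MINT": 4, "BIND & CORUM": 2,
    "DIP, MINT & InnateDB": 1, "BIND": 1, "CORUM, IntAct & HPRD": 1,
    "CORUM & IntAct": 1,
}


def split_sources(label):
    """Split a label such as 'DIP, MINT & InnateDB' into database names."""
    return [name.strip() for name in re.split(r"[,&]", label) if name.strip()]


total = sum(best_matches.values())
single = sum(n for label, n in best_matches.items() if len(split_sources(label)) == 1)
databases = {name for label in best_matches for name in split_sources(label)}

print(f"{single} of {total} best matches come from a single database "
      f"({round(100 * single / total)}%); {len(databases)} databases contribute")
```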
Sources of significant regenerated data
| Source database(s) | Best matches to DiGs |
| --- | --- |
| BioGRID | 73 |
| IntAct | 8 |
| MINT | 3 |
| BioGRID & IntAct | 3 |
| BioGRID & MINT | 1 |
| IntAct & MINT | 1 |
Number of best matches to DiGs per source database. Three databases contribute regenerated complexes that match at least one DiG, and BioGRID is the main contributor. The table shows that regenerating potentially spoke-represented complexes may be important for detecting protein groups that match DiGs.
Figure 7. Diagram of DiG overlaps with three different interaction data types and KEGG data. A Venn diagram of the number of disease groups that have significant overlaps with each of the three interaction data types and the KEGG pathway database. A. n-ary data. B. regenerated complex data. C. binary data. D. KEGG data. Diseases lacking significant overlaps with KEGG are evenly spread among the three interaction types.
Comparison of results using iRefIndex version 8.0 and iRefIndex 9.0
| Metric | iRefIndex 8.0 | iRefIndex 9.0 |
| --- | --- | --- |
| # Human-Human Interaction records | 319447 | 361059 |
| # Genes in Morbid Map with PI info in iRefIndex | 1685 | 1719 |
| # Genes in Morbid Map without PI info in iRefIndex | 256 | 222 |
| # DiGs with non-translatable genes | 166 | 156 |
| # n-ary groups | 4827 | 5677 |
| # regenerated groups | 7830 | 9947 |
| # binary nodes in PIN | 15597 | 16272 |
| # binary edges in PIN | 98853 | 113733 |
| # significant matches DiG-nary | 81 | 94 |
| # significant matches DiG-regenerated | 96 | 105 |
| # binary-enriched DiGs | 84 | 87 |
| # all significant DiGs | 220 | 227 |
| # DiG matching only nary data | 16 | 24 |
| # DiG matching only regenerated data | 22 | 25 |
| # DiG matching only binary data | 21 | 20 |
| # databases with nary data matching DiGs | 6 | 7 |
| # databases with regenerated data matching DiGs | 3 | 3 |
| # nary groups with lpr < 22 (low-throughput) | 53 | 61 |
| # Matches found using KEGG and not found using iRefIndex | 77 | 68 |
The first 8 rows show how knowledge about the human interactome grows from one database release to the next: the number of interaction records (319447 to 361059), the number of disease-associated genes in OMIM that have interaction information in iRefIndex (1685 to 1719), the number of complexes coming from n-ary (4827 to 5677) and regenerated (7830 to 9947) data, and the number of binary nodes and edges in the PIN all increase. This leads to more matches between DiGs and each of the three data sources (n-ary, regenerated, binary) and, in total, an increase from 220 to 227 matching DiGs from one release to the next. The number of DiGs matching only one type of data also grows for n-ary (16 to 24) and regenerated (22 to 25) data, although it drops slightly for binary data (21 to 20). The number of databases with n-ary data matching DiGs grows from 6 to 7.
Figure 8. An explanation of interaction data types and their representation. Experiments that can be used to detect interactions may produce binary data (panels A-C) or n-ary data (panels D-F). N-ary data is commonly represented using one of three different models: the spoke model (panels G-H), the matrix model (panels I-J) or the bipartite model (panels K-L). Some n-ary data is represented using a spoke model, and an attempt can be made to detect these records in MITAB files and reconstruct the list of component proteins in a potential complex (panel M). These lists are referred to as regenerated complexes in this paper. The full explanation of this figure is provided in the Methods section.
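The spoke and matrix models, and the idea of regenerating a complex from spoke-style records, can be sketched as follows. This is a simplified illustration under stated assumptions, not the iRefIndex procedure: records are reduced to hypothetical `(source, pubmed, method, bait, prey)` tuples, and grouping on the first four fields is a guess suggested by the shape of the regenerated complex identifiers in the tables above. The protein names and PubMed ID are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def spoke_expand(bait, preys):
    """Spoke model: one binary pair between the bait and each prey."""
    return [(bait, prey) for prey in preys]


def matrix_expand(members):
    """Matrix model: one binary pair for every pair of complex members."""
    return list(combinations(sorted(members), 2))


def regenerate_complexes(records):
    """Group binary records that share source, publication, method and bait.

    Groups with more than two distinct proteins are reported as potential
    spoke-represented complexes (an assumed criterion for this sketch).
    """
    groups = defaultdict(set)
    for source, pubmed, method, bait, prey in records:
        groups[(source, pubmed, method, bait)].update((bait, prey))
    return {key: members for key, members in groups.items() if len(members) > 2}


# Placeholder records: one bait pulled down with three preys in one experiment,
# plus an unrelated binary interaction from another experiment.
records = [("grid", "pubmed:0000", "pull down", "BAIT", prey)
           for _, prey in spoke_expand("BAIT", ["PREY1", "PREY2", "PREY3"])]
records.append(("grid", "pubmed:0000", "two hybrid", "X", "Y"))

for key, members in regenerate_complexes(records).items():
    print(key, sorted(members))
print(len(matrix_expand({"BAIT", "PREY1", "PREY2", "PREY3"})), "pairs under the matrix model")
```

A four-protein complex therefore yields three pairs under the spoke model and six under the matrix model, which is why the choice of representation changes what a binary-interaction search can find.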