Hufeng Zhou, Jingjing Jin, Haojun Zhang, Bo Yi, Michal Wozniak, Limsoon Wong.
Abstract
BACKGROUND: Pathway data are important for understanding the relationships between genes, proteins and many other molecules in living organisms. Pathway gene relationships are crucial information for guidance, prediction, reference and assessment in biochemistry, computational biology, and medicine. Many well-established databases (e.g., KEGG, WikiPathways, and BioCyc) are dedicated to collecting pathway data for public access. However, the effectiveness of these databases is hindered by issues such as incompatible data formats, inconsistent molecular representations, inconsistent molecular relationship representations, inconsistent referrals to pathway names, and incomplete data across different databases.
Year: 2012 PMID: 23282057 PMCID: PMC3521174 DOI: 10.1186/1752-0509-6-S2-S2
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Four types of IntPath unified gene relationships.
| Unified Gene Relationship | Explanation |
|---|---|
| ECrel | Enzyme-enzyme relation, indicating two enzymes catalyzing successive reaction steps. |
| PPrel | Protein-protein interaction, such as binding and modification, or proteins that have control over the same process. |
| GErel | Gene expression interaction, indicating relation of transcription factor and target gene product. |
| GPrel | Proteins belong to the same molecular complex, not necessarily interacting directly. |
The table above explains the types of relationships in IntPath.
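The four relationship labels lend themselves to a small typed representation. The following is a minimal sketch, not IntPath's actual data model: the class names, the `parse_relation` helper, and the gene identifiers `geneA` and `geneB` are hypothetical, and whether a relation is stored as directed is an assumption the source does not settle.

```python
from enum import Enum
from typing import NamedTuple


class RelationType(Enum):
    """The four unified relationship labels listed in the table above."""
    ECREL = "ECrel"  # enzyme-enzyme relation (successive reaction steps)
    PPREL = "PPrel"  # protein-protein interaction
    GEREL = "GErel"  # gene expression interaction (TF and target gene product)
    GPREL = "GPrel"  # membership in the same molecular complex


class GeneRelation(NamedTuple):
    """One gene pair plus its label (hypothetical record layout)."""
    source: str
    target: str
    relation: RelationType


def parse_relation(source: str, target: str, label: str) -> GeneRelation:
    """Build a record, raising ValueError for a label outside the four types."""
    return GeneRelation(source, target, RelationType(label))


# Placeholder gene identifiers, not taken from any database.
edge = parse_relation("geneA", "geneB", "PPrel")
print(edge.relation.value)
```

Using an enumeration means an unknown label fails loudly at parse time instead of silently entering the data.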
The number of pathways, genes and gene pairs from different databases after normalization.
| | KEGG | WikiPathways | HumanCyc |
|---|---|---|---|
| Pathways | 237 | 135 | 290 |
| Genes | 5,935 | 3,445 | 1,082 |
| Gene Pairs | 29,566 | 18,035 | 5,961 |
| | KEGG | WikiPathways | MouseCyc |
| Pathways | 218 | 140 | 323 |
| Genes | 6,306 | 4,084 | 1,194 |
| Gene Pairs | 32,235 | 25,004 | 10,792 |
| | KEGG | WikiPathways | YeastCyc |
| Pathways | 98 | 125 | 184 |
| Genes | 1,735 | 863 | 542 |
| Gene Pairs | 2,922 | 57 | 1,440 |
| | KEGG | WikiPathways | MTBRvCyc |
| Pathways | 110 | 8 | 234 |
| Genes | 1,078 | 152 | 493 |
| Gene Pairs | 3,775 | 62 | 2,764 |
Summary of the number of pathways, genes, and gene pairs after normalization from different databases.
Figure 1. Pie charts depicting overlapping gene proportions. The red part shows the proportion of unique genes, while the blue part shows the proportion of genes that overlap.
Figure 2. Pie charts depicting overlapping gene pair proportions. The red part shows the proportion of unique gene pairs, while the blue part shows the proportion of gene pairs that overlap.
Summary of overlapping gene proportions.
| | KEGG vs WikiPathways | WikiPathways vs HumanCyc | HumanCyc vs KEGG |
|---|---|---|---|
| Overlap Genes | 2,485 | 396 | 824 |
| Unique Genes | 4,410 | 3,735 | 5,369 |
| Jaccard Coefficient | 0.360 | 0.096 | 0.133 |
| | KEGG vs WikiPathways | WikiPathways vs MouseCyc | MouseCyc vs KEGG |
| Overlap Genes | 2,611 | 532 | 919 |
| Unique Genes | 5,168 | 4,214 | 5,662 |
| Jaccard Coefficient | 0.336 | 0.112 | 0.140 |
| | KEGG vs WikiPathways | WikiPathways vs YeastCyc | YeastCyc vs KEGG |
| Overlap Genes | 801 | 402 | 480 |
| Unique Genes | 996 | 601 | 1,317 |
| Jaccard Coefficient | 0.446 | 0.400 | 0.267 |
| | KEGG vs WikiPathways | WikiPathways vs MTBRvCyc | MTBRvCyc vs KEGG |
| Overlap Genes | 141 | 60 | 432 |
| Unique Genes | 948 | 525 | 707 |
| Jaccard Coefficient | 0.129 | 0.103 | 0.379 |
Summary of the number of overlap genes, number of unique genes, and Jaccard coefficient among three representative databases.
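The tabulated values are consistent with the standard Jaccard coefficient, the size of the intersection divided by the size of the union. As an illustrative check rather than the authors' code, the sketch below recomputes the human KEGG vs WikiPathways row from the gene counts reported in the normalization table (5,935 and 3,445 genes) and the overlap of 2,485. The function name is hypothetical.

```python
def jaccard_from_counts(size_a: int, size_b: int, overlap: int) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| from set sizes and overlap size."""
    union = size_a + size_b - overlap
    if union == 0:
        return 0.0  # convention chosen here for two empty sets
    return overlap / union


# Human gene counts after normalization, taken from the tables above.
kegg_genes = 5935
wikipathways_genes = 3445
overlap_genes = 2485

# Genes found in exactly one of the two databases.
unique_genes = kegg_genes + wikipathways_genes - 2 * overlap_genes
coefficient = jaccard_from_counts(kegg_genes, wikipathways_genes, overlap_genes)

print(unique_genes)            # 4410, matching the "Unique Genes" row
print(f"{coefficient:.3f}")    # 0.360, matching the "Jaccard Coefficient" row
```

The same arithmetic shows that the "Unique Genes" row counts genes present in only one of the two databases being compared, so overlap plus unique equals the size of the union.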
Summary of overlapping gene pair proportions.
| | KEGG vs WikiPathways | WikiPathways vs HumanCyc | HumanCyc vs KEGG |
|---|---|---|---|
| Overlap Gene Pairs | 1,198 | 468 | 1,270 |
| Unique Gene Pairs | 45,205 | 23,060 | 32,987 |
| Jaccard Coefficient | 0.026 | 0.020 | 0.037 |
| | KEGG vs WikiPathways | WikiPathways vs MouseCyc | MouseCyc vs KEGG |
| Overlap Gene Pairs | 875 | 1,242 | 2,068 |
| Unique Gene Pairs | 55,489 | 33,312 | 38,891 |
| Jaccard Coefficient | 0.016 | 0.036 | 0.050 |
| | KEGG vs WikiPathways | WikiPathways vs YeastCyc | YeastCyc vs KEGG |
| Overlap Gene Pairs | 35 | 9 | 419 |
| Unique Gene Pairs | 2,909 | 1,479 | 3,524 |
| Jaccard Coefficient | 0.012 | 0.006 | 0.106 |
| | KEGG vs WikiPathways | WikiPathways vs MTBRvCyc | MTBRvCyc vs KEGG |
| Overlap Gene Pairs | 9 | 8 | 358 |
| Unique Gene Pairs | 3,819 | 2,810 | 5,823 |
| Jaccard Coefficient | 0.002 | 0.003 | 0.058 |
Summary of the number of overlap gene pairs, number of unique gene pairs, and Jaccard coefficient among three representative databases.
Figure 3. Venn diagram of pathways in different databases. Venn diagram depicting overlapping pathways across the three databases.
Data overlap for the same chosen pathways in different source databases.
| TCA cycle pathway | | KEGG vs WikiPathways | KEGG vs MouseCyc | MouseCyc vs WikiPathways |
|---|---|---|---|---|
| Gene | Count | 31 vs 30 | 31 vs 13 | 13 vs 30 |
| | Overlap | 24 | 13 | 11 |
| | Jaccard Coefficient | 0.65 | 0.42 | 0.34 |
| Gene Pair | Count | 100 vs 30 | 100 vs 24 | 24 vs 30 |
| | Overlap | 10 | 9 | 7 |
| | Jaccard Coefficient | 0.083 | 0.078 | 0.149 |
| Fatty Acid Biosynthesis | | KEGG vs WikiPathways | KEGG vs HumanCyc | HumanCyc vs WikiPathways |
| Gene | Count | 6 vs 22 | 6 vs 2 | 2 vs 22 |
| | Overlap | 3 | 2 | 1 |
| | Jaccard Coefficient | 0.12 | 0.33 | 0.04 |
| Gene Pair | Count | 12 vs 29 | 12 vs 2 | 2 vs 29 |
| | Overlap | 1 | 1 | 0 |
| | Jaccard Coefficient | 0.025 | 0.077 | 0.0 |
| TCA cycle pathway | | KEGG vs WikiPathways | KEGG vs MTBRvCyc | MTBRvCyc vs WikiPathways |
| Gene | Count | 35 vs 34 | 35 vs 10 | 10 vs 34 |
| | Overlap | 34 | 10 | 10 |
| | Jaccard Coefficient | 0.97 | 0.29 | 0.29 |
| Gene Pair | Count | 107 vs 37 | 107 vs 19 | 19 vs 37 |
| | Overlap | 3 | 9 | 5 |
| | Jaccard Coefficient | 0.021 | 0.077 | 0.098 |
This table shows the calculation of gene/gene pair differences and overlap between the different source databases for the same chosen pathways.
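A per-pathway comparison of this kind can be sketched with plain set operations. The example below is illustrative only: the gene identifiers and pair lists are made-up toy data, the function names are hypothetical, and treating a gene pair as unordered (so that A-B and B-A count as the same pair) is an assumption the source does not state.

```python
from typing import Iterable, Set, Tuple, FrozenSet


def jaccard(a: Set, b: Set) -> float:
    """Jaccard coefficient of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unordered_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Normalize pairs so that (x, y) and (y, x) compare equal (assumption)."""
    return {frozenset(pair) for pair in pairs}


def genes_of(pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Collect every gene that appears in at least one pair."""
    return {gene for pair in pairs for gene in pair}


# Toy data for one pathway as recorded in two hypothetical source databases.
pairs_db_a = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
pairs_db_b = [("g2", "g1"), ("g4", "g5")]

gene_jaccard = jaccard(genes_of(pairs_db_a), genes_of(pairs_db_b))
pair_jaccard = jaccard(unordered_pairs(pairs_db_a), unordered_pairs(pairs_db_b))

print(gene_jaccard)  # 0.6: three shared genes out of five in total
print(pair_jaccard)  # 0.25: one shared pair out of four in total
```

Even in this toy case the pair-level coefficient is well below the gene-level one, the same pattern the table shows for real pathways: two databases can agree on most genes of a pathway while recording largely different relationships between them.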
Examples of inconsistent referrals to pathway names in M. musculus.
| IntPath | KEGG | WikiPathways | MouseCyc |
|---|---|---|---|
| Fatty Acid Biosynthesis | Fatty acid biosynthesis | Fatty Acid Biosynthesis | 1. fatty acid biosynthesis initiation II; 2. very long chain fatty acid biosynthesis; 3. fatty acid biosynthesis initiation III |
| Cholesterol Biosynthesis | | Cholesterol Biosynthesis | 1. cholesterol biosynthesis III (via desmosterol); 2. cholesterol biosynthesis II (via 24,25-dihydrolanosterol); 3. cholesterol biosynthesis I; 4. superpathway of cholesterol biosynthesis |
| TCA cycle | Citrate cycle (TCA cycle) | TCA cycle | TCA Cycle |
| Glycolysis and Gluconeogenesis | Glycolysis/Gluconeogenesis | Glycolysis and Gluconeogenesis | 1. glycolysis I; 2. glycolysis II |
The table shows several examples of the same pathways with inconsistent referrals to pathway names in different databases.
Number of related pathways.
| | KEGG | HumanCyc | WikiPathways |
|---|---|---|---|
| KEGG | 5 | 3 | 29 |
| HumanCyc | 3 | 34 | 12 |
| WikiPathways | 29 | 12 | 4 |
| | KEGG | MouseCyc | WikiPathways |
| KEGG | 6 | 6 | 32 |
| MouseCyc | 6 | 61 | 14 |
| WikiPathways | 32 | 14 | 10 |
| | KEGG | YeastCyc | WikiPathways |
| KEGG | 1 | 10 | 11 |
| YeastCyc | 10 | 25 | 74 |
| WikiPathways | 11 | 74 | 15 |
| | KEGG | MTBRvCyc | WikiPathways |
| KEGG | 1 | 7 | 8 |
| MTBRvCyc | 7 | 35 | 2 |
| WikiPathways | 8 | 2 | 0 |
Summary of the number of identified related pathways within and among databases.
Summary of number of pathways, average number of genes per pathway and average number of gene pairs per pathway before and after integration.
| | No. of Pathways BEFORE integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
|---|---|---|---|
| WikiPathways | 135 pathways | 46.3 | 166.2 |
| HumanCyc | 290 pathways | 7.2 | 33.0 |
| KEGG | 237 pathways | 72.4 | 171.3 |
| | No. of unique Pathways AFTER integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 100 pathways | 42.7 | 157.4 |
| HumanCyc | 225 pathways | 7.2 | 31.6 |
| KEGG | 201 pathways | 72.6 | 165.3 |
| Integrated Pathways | 57 pathways | 59.5 | 263.6 |
| | No. of Pathways BEFORE integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 140 pathways | 57.8 | 209.1 |
| MouseCyc | 323 pathways | 8.0 | 61.4 |
| KEGG | 218 pathways | 74.6 | 194.8 |
| | No. of unique Pathways AFTER integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 97 pathways | 56.8 | 242.8 |
| MouseCyc | 204 pathways | 7.4 | 43.0 |
| KEGG | 172 pathways | 77.9 | 187.3 |
| Integrated Pathways | 85 pathways | 52.6 | 260.9 |
| | No. of Pathways BEFORE integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 125 pathways | 11.8 | 0.5 |
| YeastCyc | 184 pathways | 6.5 | 13.4 |
| KEGG | 98 pathways | 35.2 | 34.7 |
| | No. of unique Pathways AFTER integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 45 pathways | 15.1 | 0.2 |
| YeastCyc | 85 pathways | 5.8 | 11.6 |
| KEGG | 80 pathways | 38.0 | 35.0 |
| Integrated Pathways | 76 pathways | 14.1 | 25.2 |
| | No. of Pathways BEFORE integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 8 pathways | 22.3 | 7.8 |
| MTBRvCyc | 234 pathways | 5.7 | 18.9 |
| KEGG | 110 pathways | 32.5 | 47.5 |
| | No. of unique Pathways AFTER integration | Average No. of genes/pathway | Average No. of gene pairs/pathway |
| WikiPathways | 0 pathways | | |
| MTBRvCyc | 171 pathways | 5.9 | 21.0 |
| KEGG | 94 pathways | 35.4 | 51.7 |
| Integrated Pathways | 35 pathways | 12.3 | 25.4 |
This table shows the number of pathways from major pathway databases before and after integration.
Figure 4. IntPath system overview. This figure shows the components of the IntPath database, the relationships between those components, and which components are supported by the web service and which by the web interface.
Figure 5. Core functions of IntPath. This figure shows the core functions of IntPath and the relationships between those core functions, the database, and the web service.