Sara Rahmati, Mark Abovsky, Chiara Pastrello, Max Kotlyar, Richard Lu, Christian A Cumbaa, Proton Rahman, Vinod Chandran, Igor Jurisica.
Abstract
PathDIP was introduced to increase proteome coverage of literature-curated human pathway databases. PathDIP 4 now integrates 24 major databases. To further reduce the number of proteins with no curated pathway annotation, pathDIP integrates pathways with physical protein-protein interactions (PPIs) to predict significant physical associations between proteins and curated pathways. For human, it provides pathway annotations for 5366 pathway orphans. Integrated pathway annotation now includes six model organisms and ten domesticated animals. A total of 6401 core and ortholog pathways have been curated from the literature or by annotating orthologs of human proteins in the literature-curated pathways. Extended pathways are the result of combining these pathways with protein-pathway associations that are predicted using organism-specific PPIs. Extended pathways expand proteome coverage from 81 088 to 120 621 proteins, making pathDIP 4 the largest publicly available pathway database for these organisms and providing a necessary platform for comprehensive pathway-enrichment analysis. PathDIP 4 users can customize their search and analysis by selecting the organism, identifier type and subset of pathways. Enrichment results and detailed annotations for the input list can be obtained in different formats and views. To support automated bioinformatics workflows, Java, R and Python APIs are available for batch pathway annotation and enrichment analysis. PathDIP 4 is publicly available at http://ophid.utoronto.ca/pathDIP.
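As a rough illustration of what pathway-enrichment analysis computes, the sketch below applies a one-sided hypergeometric test to the overlap between a query list and a single pathway. This is an assumption for illustration only: the abstract does not state which statistic pathDIP 4 uses, and the function name `hypergeom_enrichment_p` and the toy numbers are hypothetical. It does not call the pathDIP APIs.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_size, query_size, overlap):
    """Return P(X >= overlap) for X ~ Hypergeometric(N, K, n).

    N = background_size (annotated proteins in the organism),
    K = pathway_size (proteins in the pathway),
    n = query_size (proteins in the input list),
    overlap = query proteins that fall in the pathway.
    Assumed test for illustration; not confirmed as pathDIP's method.
    """
    upper = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 20 background proteins, a pathway of 5,
# a query list of 4, of which 2 are pathway members.
p_value = hypergeom_enrichment_p(20, 5, 4, 2)
p_trivial = hypergeom_enrichment_p(20, 5, 4, 0)  # an overlap of at least 0 is certain
```

In practice the same calculation would be repeated for every pathway and followed by a multiple-testing correction. A larger background, such as one that includes extended pathway annotations, changes `background_size` and `pathway_size` and therefore the resulting p-values.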
Year: 2020 PMID: 31733064 PMCID: PMC7145646 DOI: 10.1093/nar/gkz989
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Coverage of different annotation sets (core, ortholog and extended pathways) for unique proteins and the number of pathways per annotation set across different species, as well as the number and ratio of proteins annotated only through network-based predictions.
Classification of pathway source databases according to their context-specificity, and their colour-coding in pathDIP 4. This categorization facilitates selecting pathway databases in the context that is most suitable to each study.
Figure 1. (A) Distribution of the number of non-human organisms covered by each ortholog pathway. Pathways with an organism count of ‘16’ are human pathways for which we provide ortholog pathways in all non-human organisms, while pathways with an organism count of ‘0’ are human pathways with no ortholog pathways in pathDIP 4. (B) Recovery rate for core protein-pathway pairs through ortholog and network-based predictions. In chicken and cow, where almost all available PPIs used for network-based predictions are orthologous ones, networks do not improve recovery, whereas in fly, yeast and worm, in which experimental PPIs are more prevalent, networks improve recovery drastically. (C) Comparison of protein coverage across species for the three largest pathway databases: pathDIP 4 (core, ortho, extended), full Reactome (i.e. combination of core and ortholog pathways) and full WikiPathways. Extended pathways in pathDIP 4 annotate more proteins than the other two databases in every species except chicken and pig. Among all 24 source databases, Reactome and WikiPathways provide the two largest numbers of core pathways in human (separately) and in non-human organisms (combined) (details in Supplementary Tables S1A and B). Plots generated using R package ggplot2 (version 3.2.1).
Coverage and recovery rate of core pathways, extended pathways using only experimental PPIs, and extended pathways using full (combination of experimental and predicted) PPIs for human proteins (top row) and their pathway memberships (bottom row) in pathDIP 4.
| Human pathways | Curated | Extended using experimental PPIs (0.95) | Extended using experimental and predicted PPIs (0.95) |
|---|---|---|---|
Extended pathways are based on 95% confidence cut-off.
Figure 2. (A) Conservation of DNA-replication pathways across multiple species. For DNA-replication pathways in each species, orthologs to human were considered, and the overlap of orthologs between consecutive pairs of species is shown. Each bar represents an organism and each color represents genes present in ortholog (bottom), expanded (middle) and core (top) DNA-replication pathways. Flow among bars depicts the number of proteins with orthologs from the starting organism to the landing one. Flow color grows darker with the number of organisms in which a set of proteins is conserved. (B) Overlap among core pathways in different species after enrichment analysis using osteoarthritis genes or their orthologs. (C) Heatmap of Jaccard indices of pathways across different species. Human pathways were obtained using core pathways, while for every other species only ortholog pathways were used. Plots generated based on R package alluvial (version 6.3.0) modified by authors, UpSetR (version 1.3.3), and ggplot2 (version 3.1.0).
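The Jaccard index behind a heatmap such as panel (C) is the size of the intersection of two sets divided by the size of their union. The sketch below computes a pairwise matrix for a few species. The pathway sets and the helper names `jaccard` and `jaccard_matrix` are hypothetical toy values, and the exact sets the authors compared are an assumption here.

```python
def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_matrix(sets_by_species):
    """Return a nested dict with the Jaccard index for every ordered pair of species."""
    species = sorted(sets_by_species)
    return {
        s: {t: jaccard(sets_by_species[s], sets_by_species[t]) for t in species}
        for s in species
    }


# Hypothetical toy data: enriched pathway identifiers per species.
toy_pathways = {
    "human": {"PW1", "PW2", "PW3", "PW4"},
    "mouse": {"PW1", "PW2", "PW3"},
    "yeast": {"PW1", "PW5"},
}
matrix = jaccard_matrix(toy_pathways)
```

Here `matrix["human"]["mouse"]` is 3/4 because three pathways are shared out of four in the union. The matrix is symmetric with ones on the diagonal for non-empty sets, so it can be passed directly to a heatmap routine.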