R Ranjani Rani1, D Ramyachitra2, A Brindhadevi1.
Abstract
The availability of large amounts of protein-protein interaction (PPI) data has enabled research on biological networks that reveals the structure of protein complexes, pathways, and their cellular organization. A key demand in computational biology is to recognize the modular structure of such biological networks. Detecting protein complexes from a PPI network is one of the most challenging and significant problems in the post-genomic era. In bioinformatics, a frequently employed approach for clustering networks is Markov Clustering (MCL). Much of the research on protein complex detection has been carried out on static PPI networks, which suffer from several drawbacks. To address this problem, this paper proposes an approach that detects dynamic protein complexes through Markov Clustering based on the Elephant Herd Optimization approach (DMCL-EHO). The proposed method first divides the PPI network into a set of dynamic subnetworks at various time points by incorporating gene expression data, and then applies clustering analysis to every subnetwork using MCL together with the Elephant Herd Optimization approach. Experiments were carried out on different PPI network datasets, and the proposed method surpasses various existing approaches in terms of accuracy measures. This paper also identifies the common protein complexes that are significantly enriched in gold-standard datasets, along with the pathway annotations of the detected protein complexes obtained from the KEGG database.
Year: 2019 PMID: 31366992 PMCID: PMC6668483 DOI: 10.1038/s41598-019-47468-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
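The first step described in the abstract, splitting the static PPI network into per-time-point subnetworks using gene expression data, might be sketched as follows. The activity rule is an assumption made only for illustration (a protein counts as active at a time point when its expression is at least its own mean across time points); the abstract does not state the rule the authors use, and the function name and toy data are hypothetical.

```python
from statistics import mean


def dynamic_subnetworks(edges, expression):
    """Split a static PPI edge list into one edge list per time point.

    edges: iterable of (protein_a, protein_b) pairs.
    expression: dict mapping protein -> list of expression values,
        one value per time point (all lists the same length).
    Assumption (not from the source): a protein is active at time t
    when its expression at t is at least its own mean over all times.
    """
    n_times = len(next(iter(expression.values())))
    thresholds = {protein: mean(values) for protein, values in expression.items()}
    subnetworks = []
    for t in range(n_times):
        active = {p for p, values in expression.items() if values[t] >= thresholds[p]}
        # Keep an interaction only when both partners are active at time t.
        subnetworks.append([(a, b) for a, b in edges if a in active and b in active])
    return subnetworks


# Toy data: three proteins, three time points.
edges = [("A", "B"), ("B", "C"), ("A", "C")]
expression = {
    "A": [5.0, 1.0, 5.0],
    "B": [4.0, 4.0, 0.5],
    "C": [0.2, 3.0, 3.0],
}
subnets = dynamic_subnetworks(edges, expression)
```

In this toy example each time point retains a different single interaction, which is the kind of temporal variation a static network cannot represent.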
Figure 1. Overall flowchart of the proposed method.
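For readers unfamiliar with Markov Clustering, the textbook MCL loop (expansion followed by inflation on a column-stochastic matrix) can be sketched as below. This is a minimal illustration of standard MCL, not the authors' DMCL-EHO implementation; the `inflation` argument plays the role of the inflation constant that the paper tunes, and the convergence tolerance and cluster read-out rule are assumptions.

```python
import numpy as np


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster an undirected graph with a basic Markov Clustering loop."""
    # Add self-loops, then normalize each column to sum to 1.
    m = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))
    m /= m.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        previous = m
        m = m @ m                       # expansion: two-step random walks
        m = np.power(m, inflation)      # inflation: strengthen strong flows
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = {
        frozenset(np.flatnonzero(row > 1e-6).tolist())
        for row in m
        if row.max() > 1e-6
    }
    return sorted(sorted(cluster) for cluster in clusters)


# Toy graph: two disconnected triangles (nodes 0-2 and nodes 3-5).
triangle = np.ones((3, 3)) - np.eye(3)
graph = np.block([[triangle, np.zeros((3, 3))], [np.zeros((3, 3)), triangle]])
clusters = mcl(graph, inflation=2.0)
```

Higher inflation values generally yield more, smaller clusters, which is why the choice of this parameter matters for complex detection.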
The association between the components of DMCL-EHO and the protein complex
| DMCL-EHO | Protein Complex |
|---|---|
| Elephant | The temporary proteins in dynamic subnetwork. |
| Population | Static PPI Network |
| Clan | Dynamic PPI Network |
| Fitness of Clan | Clustering result of the proposed method |
| Fittest Clan | Best result of the proposed method |
| Position of an elephant | Value of Parameters |
List of datasets and gold standard benchmark databases.
| S. No | Saccharomyces cerevisiae dataset | No of Proteins | No of Interactions | Homo sapiens dataset | No of Proteins | No of Interactions | Saccharomyces cerevisiae & Homo sapiens dataset | Organism | No of Proteins | No of Interactions |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gavin2 | 1430 | 6531 | HPRD | 10080 | 39209 | DIP | Yeast | 5221 | 24918 |
| | | | | | | | DIP | Human | 5048 | 9141 |
| 2 | Gavin6 | 1855 | 7669 | HPID | 27049 | 16390 | BioGRID | Yeast | 7161 | 53791 |
| | | | | | | | BioGRID | Human | 23373 | 365293 |
| 3 | Krogan-Core | 2708 | 7123 | PIPs | 32179 | 14979 | STRING | Yeast | 6691 | 184596 |
| | | | | | | | STRING | Human | 19566 | 1258291 |
| 4 | Krogan-Extended | 3581 | 14076 | - | - | - | - | - | - | - |
| 5 | Collins | 1622 | 9074 | - | - | - | - | - | - | - |
| 6 | Gavin + Krogan | 2964 | 13507 | - | - | - | - | - | - | - |
| 7 | WI-PHI | 5955 | 50000 | - | - | - | - | - | - | - |

Gold standard benchmark databases

| S. No | Benchmark | No of Proteins | No of Interactions | No of Complexes |
|---|---|---|---|---|
| 1 | CYC2008 | 1627 | 408 | 408 |
| 2 | MIPS | 1189 | 11119 | 203 |
| 3 | SGD | 1279 | 19854 | 323 |
| 4 | PCDq | 9268 | 32198 | 1264 |
Figure 2. Comparison of Number of Clusters with various Datasets and Algorithms against CYC2008 Benchmark Dataset.
Figure 7. Comparison of Accuracy with various Datasets and Algorithms against CYC2008 Benchmark Dataset.
Figure 8. Comparison of Number of Clusters and Coverage Ratio with HPRD Dataset and Algorithms against PCDq Benchmark Dataset.
Figure 9. Comparison of Precision, Recall, F-Measure and Accuracy with HPRD Dataset and Algorithms against PCDq Benchmark Dataset.
Figure 3. Comparison of Coverage Ratio with various Datasets and Algorithms against CYC2008 Benchmark Dataset.
Figure 4. Comparison of Precision with various Datasets and Algorithms against CYC2008 Benchmark Dataset.
Figure 10. Comparison of Accuracy with Random Deletion of Protein Interactions on DIP Dataset against CYC2008 benchmark database.
Figure 11. Comparison of Accuracy with Random Insertion of Protein Interactions on DIP Dataset against CYC2008 benchmark database.
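The robustness experiments in Figures 10 and 11 perturb the network by randomly deleting or inserting interactions. A perturbation of that kind could be sketched as below; the function name, the fractions, and the fixed seed are assumptions for illustration, since this record does not give the perturbation levels or sampling procedure the authors used.

```python
import random


def perturb_interactions(edges, delete_fraction=0.0, insert_fraction=0.0, seed=0):
    """Return a copy of an undirected edge set with random deletions and insertions.

    Fractions are relative to the original number of interactions.
    Inserted interactions are drawn from protein pairs not already connected.
    """
    rng = random.Random(seed)
    original = sorted({tuple(sorted(edge)) for edge in edges})
    nodes = sorted({protein for edge in original for protein in edge})
    n_delete = int(len(original) * delete_fraction)
    n_insert = int(len(original) * insert_fraction)

    # Random deletion: keep a random subset of the original interactions.
    kept = set(rng.sample(original, len(original) - n_delete))

    # Random insertion: sample from pairs that are absent in the original network.
    present = set(original)
    absent = [
        (a, b)
        for i, a in enumerate(nodes)
        for b in nodes[i + 1:]
        if (a, b) not in present
    ]
    added = set(rng.sample(absent, min(n_insert, len(absent))))
    return kept | added


# Toy network: a 4-cycle on hypothetical proteins P1..P4.
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P1", "P4")]
deleted = perturb_interactions(toy_edges, delete_fraction=0.5)
inserted = perturb_interactions(toy_edges, insert_fraction=0.5)
```

Rerunning the clustering on such perturbed networks and comparing accuracy against the unperturbed result gives a simple measure of tolerance to missing and spurious interactions.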
Various Parameter Values of proposed and existing methods for protein complex detection.
| Parameters | Variable | MCL | SR-MCL | CSO | PSO-MCL | ACO-MCL | AFA-MCL | F-MCL | FOCA | EHO-MCL |
|---|---|---|---|---|---|---|---|---|---|---|
| Inflation constant | ic | 2 | 2 | automatic | automatic | automatic | automatic | automatic | automatic | Automatic |
| Lowest ic | Lic | 1 | 1 | 1 | 1 | 1 | 1 | 1 | ||
| Highest ic | Hic | 6 | 6 | 6 | 6 | 6 | 6 | 6 | ||
| Balance | B | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Penalty proportion | Pp | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 | 1.25 |
| Number of population | K | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | |
| Maximum generation | mGen | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | |
| Cognitive and social acceleration coefficients | c1 and c2 | 2 | ||||||||
| Maximum velocity | MaxV | 0.5 | ||||||||
| Evaporation rate | ρ | 0.1 | ||||||||
| Heuristic information | H | 1.2 | ||||||||
| Pseudo-random proportion selection rule | q0 | 0.9 | ||||||||
| Visual range | Vis | 0.9 | ||||||||
| Step length | S | 0.05 | 0.05 | 0.05 | ||||||
| Light absorption coefficient | λ | 1.0 | ||||||||
| Maximum attractiveness | Ma | 1.0 | ||||||||
| Scale regulates | α | 0.5 | ||||||||
| Scale regulates | β | 0.1 | ||||||||
| Number of clans | allclan | 20 |
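To show how an Elephant Herd Optimization search could pick the inflation constant within the bounds listed above (lowest 1, highest 6), here is a simplified one-dimensional sketch using the scale factors α = 0.5 and β = 0.1 from the table. It follows the commonly described EHO operators (clan updating and separating) in reduced form and is not the authors' implementation. The cost function is a hypothetical stand-in; in the proposed method the fitness is the clustering result on a dynamic subnetwork. The clan count and clan size are small illustrative values.

```python
import random


def eho_minimize(cost, low, high, n_clans=2, clan_size=5, generations=5,
                 alpha=0.5, beta=0.1, seed=0):
    """Minimize a 1-D cost over [low, high] with a simplified EHO loop."""
    rng = random.Random(seed)

    def clamp(x):
        return max(low, min(high, x))

    clans = [[rng.uniform(low, high) for _ in range(clan_size)]
             for _ in range(n_clans)]
    best = min((x for clan in clans for x in clan), key=cost)

    for _ in range(generations):
        for clan in clans:
            clan.sort(key=cost)                  # clan[0] is the matriarch
            matriarch = clan[0]
            center = sum(clan) / len(clan)
            # Clan updating: the matriarch moves to the scaled clan center,
            # every other elephant moves toward the matriarch.
            updated = [clamp(beta * center)]
            for x in clan[1:]:
                updated.append(clamp(x + alpha * (matriarch - x) * rng.random()))
            # Separating: the worst elephant is replaced by a random position.
            updated.sort(key=cost)
            updated[-1] = rng.uniform(low, high)
            clan[:] = updated
            best = min([best] + clan, key=cost)  # remember the best position seen
    return best


def toy_cost(inflation):
    """Hypothetical cost with its minimum at inflation = 2.5."""
    return (inflation - 2.5) ** 2


best_inflation = eho_minimize(toy_cost, low=1.0, high=6.0)
```

Each candidate position corresponds to one parameter value, matching the "Position of an elephant" row in the component table; evaluating the cost would in practice mean running MCL with that inflation value and scoring the clusters.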
Figure 6. Comparison of F-Measure with various Datasets and Algorithms against CYC2008 Benchmark Dataset.
Statistical Significance of proposed and existing approaches based on F-Measure and Accuracy.
| | MCODE | MCL | COACH | ClusterONE | RNSC | Maulik U | CFinder | CMC | CSO | SR-MCL | PSO-MCL | ACO-MCL | AFA-MCL | F-MCL | FOCA | EHO-MCL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MCODE | 0 | 0.005 | 0.241 | 0.028 | 0.015 | 0.508 | 0.012 | 0.017 | 0.019 | 0.026 | 0.018 | 0.006 | 0.006 | 0.006 | 0.005 | 0.004 |
| MCL | 0.005 | 0 | 0.047 | 0.022 | 0.059 | 0.035 | 0.333 | 0.878 | 0.52 | 0.008 | 0.007 | 0.008 | 0.006 | 0.006 | 0.006 | 0.005 |
| COACH | 0.241 | 0.037 | 0 | 0.646 | 0.022 | 0.508 | 0.013 | 0.093 | 0.008 | 0.009 | 0.009 | 0.005 | 0.006 | 0.007 | 0.005 | 0.005 |
| ClusterONE | 0.174 | 0.089 | 0.521 | 0 | 0.089 | 0.017 | 0.015 | 0.059 | 0.025 | 0.012 | 0.008 | 0.008 | 0.007 | 0.008 | 0.005 | 0.005 |
| RNSC | 0.05 | 0.074 | 0.022 | 0.022 | 0 | 0.025 | 0.022 | 0.015 | 0.016 | 0.028 | 0.008 | 0.007 | 0.007 | 0.009 | 0.005 | 0.005 |
| Maulik U | 0.333 | 0.065 | 0.103 | 0.035 | 0.005 | 0 | 0.035 | 0.027 | 0.025 | 0.015 | 0.005 | 0.006 | 0.006 | 0.006 | 0.005 | 0.003 |
| CFinder | 0.024 | 0.138 | 0.013 | 0.029 | 0.028 | 0.035 | 0 | 0.285 | 0.025 | 0.015 | 0.005 | 0.005 | 0.005 | 0.008 | 0.005 | 0.002 |
| CMC | 0.035 | 0.093 | 0.017 | 0.022 | 0.203 | 0.03 | 0.059 | 0 | 0.015 | 0.025 | 0.015 | 0.005 | 0.005 | 0.009 | 0.005 | 0.001 |
| CSO | 0.038 | 0.045 | 0.015 | 0.019 | 0.169 | 0.027 | 0.025 | 0.017 | 0 | 0.028 | 0.012 | 0.004 | 0.005 | 0.006 | 0.005 | 0.003 |
| SR-MCL | 0.027 | 0.035 | 0.035 | 0.025 | 0.017 | 0.021 | 0.035 | 0.015 | 0.013 | 0 | 0.022 | 0.005 | 0.004 | 0.005 | 0.005 | 0.004 |
| PSO-MCL | 0.023 | 0.028 | 0.018 | 0.015 | 0.005 | 0.019 | 0.012 | 0.011 | 0.009 | 0.018 | 0 | 0.013 | 0.004 | 0.004 | 0.004 | 0.004 |
| ACO-MCL | 0.015 | 0.019 | 0.013 | 0.01 | 0.005 | 0.015 | 0.011 | 0.005 | 0.009 | 0.008 | 0.013 | 0 | 0.016 | 0.003 | 0.005 | 0.003 |
| AFA-MCL | 0.012 | 0.002 | 0.01 | 0.009 | 0.005 | 0.01 | 0.005 | 0.009 | 0.005 | 0.009 | 0.008 | 0.139 | 0 | 0.017 | 0.009 | 0.002 |
| F-MCL | 0.009 | 0.005 | 0.009 | 0.005 | 0.005 | 0.005 | 0.005 | 0.007 | 0.006 | 0.004 | 0.005 | 0.005 | 0.007 | 0 | 0.003 | 0.003 |
| FOCA | 0.004 | 0.007 | 0.006 | 0.002 | 0.005 | 0.004 | 0.005 | 0.005 | 0.005 | 0.005 | 0.004 | 0.005 | 0.005 | 0.005 | 0 | 0.002 |
| EHO-MCL | 0.005 | 0.004 | 0.004 | 0.003 | 0.002 | 0.005 | 0.002 | 0.008 | 0.002 | 0.003 | 0.003 | 0.002 | 0.004 | 0.005 | 0.001 | 0 |
Top 5 Common Protein Complexes, Gene Ontology Functions and KEGG Pathways of the Predicted Complexes of proposed method.
| S.no | Complex name | Real Complexes | Correctly Predicted Complexes | Wrong Complexes | BP | MF | CC | Pathways |
|---|---|---|---|---|---|---|---|---|
| 1 | Paf1p complex | YOL145C, YLR418C, YBR279W, YML069W, YGL207W, YGL244W | YOL145C, YLR418C, YBR279W, YGL244W | YEL037C | Positive regulation of transcription elongation from RNA polymerase I promoter (GO:2001209) | RNA polymerase II C-terminal domain phosphoserine binding (GO:1990269) | Cdc73/Paf1 complex (GO:0016593) | NIL |
| 2 | Condensin complex | YFR031C, YLR086W, YDR325W, YBL097W, YNL088W | YFR031C, YLR086W, YDR325W, YBL097W, YNL088W | NIL | tRNA gene clustering (GO:0070058) | Chromatin binding (GO:0003682) | Condensed nuclear chromosome (GO:0000794) | Cell cycle - yeast |
| 3 | RNA polymerase II mediator complex | YHR058C, YDR308C, YHR041C, YNR010W, YOL135C, YBR253W, YOR174W, YMR112C, YPR168W | YHR058C, YDR308C, YHR041C, YBR253W, YOR174W, YMR112C, YPR168W | YKL081W, YCR033W | Positive regulation of transcription from RNA polymerase II promoter (GO:0045944) | Transcription factor activity, RNA polymerase II transcription factor binding (GO:0001076) | Mediator complex (GO:0016592) | NIL |
| 4 | RNA polymerase I subunit | YNL248C, YJR063W, YJL148W, YOR340C, YPR010C, YPR187W, YBR154C, YOR224C, YNL113W | YNL248C, YJR063W, YJL148W, YOR340C, YPR010C, YPR187W, YBR154C, YNL113W | YIL7095W | Ribosome biogenesis (GO:0042254) | DNA-directed RNA polymerase activity (GO:0003899) | DNA-directed RNA polymerase I complex (GO:0005736) | RNA polymerase, Pyrimidine metabolism, Purine metabolism, Metabolic pathways |
| 5 | Small Subunit (SSU) processome complexes | YLR409C, YLR222C, YJL069C, YDR398W, YGR128C, YJL109C, YDR324C, YDR449C, YDL148C, YLR129W | YLR409C, YLR222C, YJL069C, YDR398W, YGR128C, YJL109C, YDR324C, YDR449C, YDL148C, YLR129W | NIL | Ribosomal small subunit biogenesis (GO:0042274) | snoRNA binding(GO:0030515) | Small-subunit processome (GO:0032040) | Ribosome biogenesis in eukaryotes |
| 1 | NOT core complex | YDL165W, YCR093W, YAL021C, YIL038C, YGR134W, YNL288W, YDR252W, YER068W, YNR052C, YPR072W | YDL165W, YCR093W, YAL021C, YIL038C, YDR252W, YER068W, YNR052C, YPR072W | YMR149W, YJR035W, YJR112W | Nuclear-transcribed mRNA catabolic process, deadenylation-dependent decay (GO:0000288) | NIL | CCR4-NOT core complex (GO:0030015) | RNA degradation |
| 2 | Mitochondrial F1F0 ATP synthase | YLR295C, YDL004W, YBL099W, YBR039W, YJR121W, YPL078C, YKL016C, YEL027W, YEL051W, YDL185W, YLR447C | YLR295C, YDL004W, YBL099W, YBR039W, YJR121W, YPL078C, YKL016C, YEL027W, YEL051W | YNL189W, YER031C, YGL181W | ATP hydrolysis coupled proton transport (GO:0015991) | Proton-transporting ATPase activity, rotational mechanism (GO:0046961) | Mitochondrial proton-transporting ATP synthase complex (GO:0005753) 4.1E-3 | Oxidative phosphorylation, P-Value: 9.4E-13 Enrichment Score: 2.8E-12 Metabolic pathways |
| 3 | Putative ferric reductase | YBR207W, YLR214W, YER145C, YLR047C, YOL152W, YKL220C, YFL041W, YMR319C, YLL051C | YBR207W, YLR214W, YER145C, YLR047C, YLL051C | YOR227W, YKL196C | Iron ion homeostasis (GO:0055072) | Ferroxidase activity (GO:0004322) | Plasma membrane (GO:0005886) | NIL |
| 4 | Component of spindle pole body | YKL042W, YDR356W, YPL124W, YMR117C, YAL047C, YHR172W, YNL126W, YML124C, YLR212C, YNL188W | YDR356W, YPL124W, YMR117C, YAL047C, YHR172W, YNL126W, YML124C, YLR212C, YNL188W | YML048W | Microtubule nucleation (GO:0007020) | Structural constituent of cytoskeleton (GO:0005200) | Microtubule organizing center part (GO:0044450) | NIL |
| 5 | Proteinase yscE | YKL206C, YER012W, YJL001W, YFR050C, YMR314W, YOL038W, YBL041W, YML092C, YGR135W, YOR362C, YER094C | YKL206C, YER012W, YJL001W, YFR050C, YOL038W, YBL041W, YML092C, YGR135W, YOR362C, YER094C | YOL061W, YIL006W | Proteasomal ubiquitin-independent protein catabolic process (GO:0010499) | Threonine-type endopeptidase activity (GO:0004298) | Proteasome storage granule (GO:0034515) | Proteasome |
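As a worked illustration of how a predicted complex can be scored against a reference complex, the sketch below computes an overlap (neighborhood affinity) score and derives precision, recall and F-measure from it. The scoring formula and the 0.2 matching threshold are conventions widely used in the complex-detection literature and are assumptions here, because this record does not state the exact definitions the authors applied. The example treats the predicted Paf1p cluster as the union of the "Correctly Predicted Complexes" and "Wrong Complexes" columns of the table above, which is also an assumption.

```python
def overlap_score(predicted, reference):
    """Neighborhood affinity: |P ∩ R|^2 / (|P| * |R|)."""
    p, r = set(predicted), set(reference)
    return len(p & r) ** 2 / (len(p) * len(r))


def precision_recall_f(predicted_complexes, reference_complexes, threshold=0.2):
    """Precision, recall and F-measure under an overlap-score threshold."""
    matched_predicted = sum(
        any(overlap_score(p, r) >= threshold for r in reference_complexes)
        for p in predicted_complexes
    )
    matched_reference = sum(
        any(overlap_score(p, r) >= threshold for p in predicted_complexes)
        for r in reference_complexes
    )
    precision = matched_predicted / len(predicted_complexes)
    recall = matched_reference / len(reference_complexes)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Paf1p complex row from the table above.
paf1_reference = ["YOL145C", "YLR418C", "YBR279W", "YML069W", "YGL207W", "YGL244W"]
paf1_predicted = ["YOL145C", "YLR418C", "YBR279W", "YGL244W", "YEL037C"]

score = overlap_score(paf1_predicted, paf1_reference)  # 4^2 / (5 * 6) = 16/30
metrics = precision_recall_f([paf1_predicted], [paf1_reference])
```

With four shared proteins, a predicted cluster of five and a reference complex of six, the score is 16/30 (about 0.53), which would count as a match under the assumed 0.2 threshold.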
Figure 12. Top 5 common protein complexes, gene ontology functions and KEGG pathways of the predicted complexes of proposed method on Krogan-Extended Dataset.
Figure 13. Top 5 common protein complexes, gene ontology functions and KEGG pathways of the predicted complexes of proposed method on DIP Dataset.
Figure 14. Comparison of Average Execution Time of the proposed algorithm with the existing algorithms.