Rongquan Wang, Caixia Wang, Huimin Ma.
Abstract
BACKGROUND: Accurate identification of protein complexes in protein-protein interaction (PPI) networks is crucial for understanding the principles of cellular organization. Most computational methods ignore the fact that proteins in a protein complex are functionally similar and are co-localized and co-expressed, that is, present at the same place and at the same time. Moreover, the parameters of current methods are specified by users, so these methods cannot adapt effectively to different input PPI networks. RESULT: To address these issues, this study proposes a new method, MP-AHSA, which detects protein complexes using Multiple Properties (MP) and optimizes the parameters of the MP algorithm with an Adaptation Harmony Search Algorithm (AHSA). First, a weighted PPI network is constructed using functional annotations, and multiple biological properties together with the Markov cluster algorithm (MCL) are used to mine protein complex cores. Then, a fitness function is defined, and a protein complex forming strategy is designed to detect attachment proteins and form protein complexes. Next, a protein complex filtering strategy is formulated to filter the resulting candidate protein complexes. Finally, an adaptation harmony search algorithm is developed to determine the MP algorithm's parameters automatically.
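The core-mining step relies on MCL. As a hedged illustration of what MCL itself does, here is a minimal pure-Python version of the textbook algorithm: add self-loops, column-normalize, then alternate expansion (matrix squaring) and inflation (element-wise power plus renormalization) until the matrix stops changing. The self-loop weight of 1.0, the convergence tolerance and the `prune` cut-off used to read clusters are assumptions. The default `inflation=2.0` simply mirrors the setting listed for the MCL baseline in the parameter table. This is plain MCL, not the paper's core-mining procedure, which additionally uses multiple biological properties.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Make every column sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(nodes: List[str], edges: Dict[Tuple[str, str], float],
        inflation: float = 2.0, max_iter: int = 100,
        tol: float = 1e-9, prune: float = 1e-4) -> List[Set[str]]:
    """Cluster a weighted undirected graph with the Markov cluster algorithm."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Weighted adjacency matrix; self-loops of weight 1.0 are an assumption.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        m[idx[u]][idx[v]] = m[idx[v]][idx[u]] = w
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        # inflation: strengthen strong flows, weaken weak ones
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each non-empty row (an attractor) lists the nodes whose flow ends in it.
    clusters: List[Set[str]] = []
    for i in range(n):
        members = {nodes[j] for j in range(n) if m[i][j] > prune}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

For example, two triangles joined by a weak bridge (`c`-`d`, weight 0.1) should come out as two clusters, `{a, b, c}` and `{d, e, f}`.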
Keywords: Adaptation harmony search algorithm; Core-attachment structure; Fitness function; Multiple properties; Protein complex; Protein-protein interaction network
Year: 2022 PMID: 36207692 PMCID: PMC9541083 DOI: 10.1186/s12859-022-04923-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The detection process of protein complexes from a protein-protein interaction (PPI) network
Symbols and their explanations used in this paper
| ID | Symbol | Explanation |
|---|---|---|
| 1 | PPI | Protein-protein interaction |
| 2 | MP | Multiple properties |
| 3 | MP-AHSA | Multiple properties and an adaptation harmony search algorithm |
| 4 | MCL | Markov cluster algorithm |
| 5 | TCSS method | Topological clustering semantic similarity |
| 6 | GO | Gene ontology |
| 7 | CC | Cellular component |
| 8 | BP | Biological process |
| 9 | MF | Molecular function |
| 10 | WCC | Weighted local clustering coefficient |
| 11 | LN(c) | The set consisting of protein c and its first neighbors |
| 12 | V(G) | The set of proteins in G |
| 13 | SLD | Subcellular localization data |
| 14 | GED | Gene expression data |
| 15 | CEV | Co-expression threshold value |
| 16 | G | Weighted PPI network |
| 17 | GCE | Gene co-expression threshold |
| 18 | PCCs | The set of protein complex cores |
| 19 | PCC | A protein complex core |
| 20 | Neighbor(PCC) | The neighbors of protein complex core |
| 21 | cohesiveness(C) | The cohesiveness score of cluster C |
| 22 | density(C) | The weighted density of cluster C |
| 23 | awm(C) | The average weighted modularity of cluster C |
| 24 | VC | The set of proteins in the cluster C |
| 25 | EC | The set of interactions in the cluster C |
| 26 | WC | The set of weights between the protein pair in the cluster C |
| 27 | fitness(C) | The fitness function score of cluster C |
| 28 | N(PCC) | The potential attachment proteins of the cluster PCC |
| 29 | attachscore(v,PCC) | The sum of weights between protein v and the protein complex core PCC |
| 30 | CPC | A candidate protein complex |
| 31 | FPC | A filtered protein complex |
| 32 | FPCs | The set of filtered protein complexes |
| 33 |  | The functional annotation term shared by the largest number of proteins in the identified protein complex |
| 34 | HSA | The harmony search algorithm |
| 35 | HMCR | The harmony memory considering rate of the AHSA method |
| 36 | PAR | The pitch adjusting rate of the AHSA method |
| 37 | FW | The fret width of the AHSA method |
| 38 | OFfitness | The objective function: the sum of the fitness function scores of the detected protein complexes |
| 39 | K | The number of identified protein complexes |
| 40 | Fitness(Ci) | The fitness function score of the ith identified protein complex Ci |
| 41 | i | The iteration number |
| 42 | HMs | The harmony memory |
| 43 | HM | A harmony |
| 44 | R1,R2 | Random values generated within [0,1] |
| 45 | fitnessmax,fitnessmin | The maximum and minimum values of |
| 46 | HMnew | The newly generated harmony |
| 47 | HMsmin | The worst harmony in HMs |
| 48 | Maxiter | The maximum number of iterations (termination condition) |
| 49 | HMsbest | The harmony in HMs that yields the best clustering |
| 50 | IPCs | The identified protein complexes |
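Several symbols above describe how a protein complex core (PCC) is grown: its neighbors, LN(c), attachscore(v, PCC), weighted density and cohesiveness. The sketch below implements the ones whose meaning is stated in the table (LN(c), Neighbor(PCC), attachscore as a sum of edge weights). The formulas for `weighted_density` (internal weight over the number of protein pairs) and `cohesiveness` (ClusterONE-style w_in / (w_in + w_bound), without ClusterONE's penalty term) are assumptions. awm(C) and fitness(C) are omitted because their definitions are not given here. All function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Dict[str, float]]  # protein -> {neighbour: edge weight}


def build_graph(edges: Iterable[Tuple[str, str, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def ln(g: Graph, c: str) -> Set[str]:
    """LN(c): protein c together with its first neighbours."""
    return set(g[c]) | {c}


def core_neighbours(g: Graph, core: Set[str]) -> Set[str]:
    """Neighbor(PCC): proteins adjacent to the core but not inside it."""
    return {v for u in core for v in g[u]} - core


def attachscore(g: Graph, v: str, core: Set[str]) -> float:
    """attachscore(v, PCC): sum of weights between protein v and the core."""
    return sum(w for u, w in g[v].items() if u in core)


def internal_weight(g: Graph, cluster: Set[str]) -> float:
    # Each internal edge is seen from both endpoints, hence the division by 2.
    return sum(w for u in cluster for v, w in g[u].items() if v in cluster) / 2


def weighted_density(g: Graph, cluster: Set[str]) -> float:
    """Assumed definition: internal weight over the number of protein pairs."""
    n = len(cluster)
    return internal_weight(g, cluster) / (n * (n - 1) / 2) if n > 1 else 0.0


def cohesiveness(g: Graph, cluster: Set[str]) -> float:
    """Assumed ClusterONE-style score w_in / (w_in + w_bound), no penalty term."""
    w_in = internal_weight(g, cluster)
    w_bound = sum(w for u in cluster for v, w in g[u].items() if v not in cluster)
    return w_in / (w_in + w_bound) if w_in + w_bound else 0.0
```

A forming step could then rank candidate attachments with, for instance, `sorted(core_neighbours(g, core), key=lambda v: attachscore(g, v, core), reverse=True)`. How MP-AHSA actually thresholds these scores is not specified here.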
Fig. 2 The MP-AHSA algorithm for detecting protein complexes from a PPI network
Main parameters of AHSA
| ID | Parameter | Abbreviation | Value range and setting |
|---|---|---|---|
| 1 | Harmony memory | HMs | 30 |
| 2 | Harmony memory considering rate | HMCR | |
| 3 | Pitch adjusting rate | PAR | |
| 4 | Fret width | FW | |
| 5 | The maximum number of iterations | Maxiter | 300 |
| 6 | Gene co-expression threshold | GCE | |
| 7 | The inflation parameter of MCL | | |
| 8 | The ratio of initial seeds | | |
Main modifications of AHSA
| Parameters | Adaptive adjustment |
|---|---|
i is the iteration number.
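To make the roles of HMCR, PAR, FW, HMs and Maxiter concrete, here is a hedged sketch of an adaptive harmony search that maximizes an objective over box-bounded parameters. The specific adaptive rules AHSA uses are not recoverable from the table above. The schedule here is an assumption: HMCR is fixed, PAR rises linearly with the iteration number i, and FW (as a fraction of each parameter's range) shrinks exponentially. The defaults `hms=30` and `max_iter=300` follow the parameter table. The objective is a placeholder; in MP-AHSA it would be OFfitness.

```python
import random
from typing import Callable, List, Sequence, Tuple


def adaptive_harmony_search(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    hms: int = 30,                                  # harmony memory size
    max_iter: int = 300,                            # Maxiter
    hmcr: float = 0.9,                              # fixed HMCR (assumed value)
    par_range: Tuple[float, float] = (0.3, 0.9),    # PAR grows linearly (assumption)
    fw_range: Tuple[float, float] = (0.1, 0.001),   # FW decays exponentially (assumption)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximise `objective` over a box; return the best harmony and its score."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    for i in range(1, max_iter + 1):
        t = i / max_iter
        par = par_range[0] + (par_range[1] - par_range[0]) * t
        fw = fw_range[0] * (fw_range[1] / fw_range[0]) ** t
        new: List[float] = []
        for d, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:                 # R1 < HMCR: reuse a stored value
                x = memory[rng.randrange(hms)][d]
                if rng.random() < par:              # R2 < PAR: pitch adjustment
                    x += rng.uniform(-1.0, 1.0) * fw * (hi - lo)
            else:                                   # otherwise draw a fresh value
                x = rng.uniform(lo, hi)
            new.append(min(max(x, lo), hi))         # keep HMnew inside the bounds
        score = objective(new)
        worst = min(range(hms), key=scores.__getitem__)   # HMsmin
        if score > scores[worst]:
            memory[worst], scores[worst] = new, score
    best = max(range(hms), key=scores.__getitem__)        # HMsbest
    return memory[best], scores[best]
```

In an MP-AHSA-like setting, each harmony could hypothetically encode parameters such as GCE, the MCL inflation and the ratio of initial seeds, and `objective` would run the MP pipeline and return the sum of the complexes' fitness scores.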
Detailed properties of the experimental PPI networks used in the study
| Dataset | Nodes | Edges | Density |
|---|---|---|---|
| Collins | 1622 | 9074 | 0.006902317076 |
| Gavin | 1855 | 7669 | 0.004459796985 |
| Krogan | 2674 | 7075 | 0.001979684934 |
| String | 1366 | 5071 | 0.005439265468 |
| DIP | 4696 | 21822 | 0.001979524413 |
| Biogrid | 4093 | 13178 | 0.001573628198 |
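The Density column matches the standard density of a simple undirected graph, 2|E| / (|V|(|V| - 1)). For example, for Collins this is 2 × 9074 / (1622 × 1621) ≈ 0.006902. The short sketch below recomputes the column from the node and edge counts in the table.

```python
def network_density(n_nodes: int, n_edges: int) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts copied from the table above.
PPI_NETWORKS = {
    "Collins": (1622, 9074),
    "Gavin": (1855, 7669),
    "Krogan": (2674, 7075),
    "String": (1366, 5071),
    "DIP": (4696, 21822),
    "Biogrid": (4093, 13178),
}

if __name__ == "__main__":
    for name, (n, m) in PPI_NETWORKS.items():
        print(f"{name}: {network_density(n, m):.12f}")
```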
Properties of the standard protein complexes used in the study
| Dataset | Num | PC | AS |
|---|---|---|---|
| standard protein complexes 1 | 812 | 2773 | 8.92 |
| standard protein complexes 2 | 1045 | 2778 | 8.97 |
Num: number of protein complexes; PC: number of proteins; AS: average size of the protein complexes
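AS is not PC / Num (2773 / 812 ≈ 3.42, whereas AS = 8.92). This suggests that AS averages the size of each complex, and that many proteins belong to more than one complex. The summed sizes come to roughly 812 × 8.92 ≈ 7243 memberships for 2773 distinct proteins. The hedged sketch below shows how the three columns could be computed from a list of complexes; the function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def complex_set_stats(complexes: Iterable[Set[str]]) -> Tuple[int, int, float]:
    """Return (Num, PC, AS) for a collection of protein complexes."""
    cs = [set(c) for c in complexes]
    num = len(cs)                                           # Num: number of complexes
    pc = len(set().union(*cs))                              # PC: distinct proteins
    avg = sum(len(c) for c in cs) / num if num else 0.0     # AS: mean complex size
    return num, pc, round(avg, 2)
```

With overlapping complexes, for example `[{"A", "B", "C"}, {"C", "D"}, {"A", "E", "F", "G"}]`, PC counts protein A and protein C only once. AS, by contrast, counts every membership.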
Parameters of each method used in the study
| ID | Year | Algorithm | Parameters |
|---|---|---|---|
| 1 | 2004 | MCL | inflation=2 |
| 2 | 2008 | IPCA | S=3,P=2, |
| 3 | 2008 | COACH | w=0.225 |
| 4 | 2009 | CMC | |
| 5 | 2010 | SPICi | Graph mode=0, minimum support threshold=0.5, minimum cluster size=3, minimum density threshold=0.5 |
| 6 | 2012 | ClusterONE | Density=auto, Overlap threshold=0.8 |
| 7 | 2013 | PEWCC | Overlap=0.8, -r=0.1, Re-join=0.3 |
| 8 | 2015 | WPNCA | lambda=0.3, minimum cluster size=3 |
| 9 | 2016 | WEC | Balance factor ( |
| 10 | 2018 | ClusterEPs | NEPs of complexes (minimum support threshold=0.4, maximum support threshold=0.05); NEPs of non-complexes (maximum support threshold=0.05, minimum support threshold=0.4); maximum overlap=0.9, maximum size of clusters=100 |
| 11 | 2018 | ClusterSS | numEpochs=500, learnRate=0.2, thresholdIn=1.0, thresholdOut=1.02, negativeTime=20, minimum cluster size=3 |
| 12 | 2019 | SE-DMTG | minimum cluster size=3 |
| 13 | 2020 | MPC-C | Overlap threshold=0.8,minimum cluster size=3 |
| 14 | 2021 | GCC-v | Minimum cluster size=3 |
Fig. 3 Comparative analysis of the protein complexes identified by different approaches on the Collins PPI network, evaluated against the two standard protein complex sets. The comparison is based on a total score, the sum of ACC, F-measure, MMR, Frac, and Jaccard (see Evaluation metrics)
Fig. 4 Comparative analysis of the protein complexes identified by different approaches on the Gavin PPI network, evaluated against the two standard protein complex sets. The comparison is based on a total score, the sum of ACC, F-measure, MMR, Frac, and Jaccard (see Evaluation metrics)
Fig. 5 Comparative analysis of the protein complexes identified by different approaches on the Krogan PPI network, evaluated against the two standard protein complex sets. The comparison is based on a total score, the sum of ACC, F-measure, MMR, Frac, and Jaccard (see Evaluation metrics)
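Figs. 3-5 rank methods by a total score that sums five metrics. Their exact definitions belong to the paper's Evaluation metrics section, which is not included here, so this sketch only illustrates the summation plus one metric. For Frac, it assumes the commonly used convention: the fraction of reference complexes matched by at least one predicted complex with overlap score |A ∩ B|² / (|A||B|) ≥ 0.25. Function names and the toy metric values are hypothetical.

```python
from typing import Dict, List, Set

METRICS = ("ACC", "F-measure", "MMR", "Frac", "Jaccard")


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """Assumed matching score |A ∩ B|^2 / (|A| * |B|) between two complexes."""
    if not a or not b:
        return 0.0
    k = len(a & b)
    return k * k / (len(a) * len(b))


def frac(predicted: List[Set[str]], reference: List[Set[str]],
         threshold: float = 0.25) -> float:
    """Fraction of reference complexes matched by at least one predicted complex."""
    if not reference:
        return 0.0
    matched = sum(1 for r in reference
                  if any(overlap_score(p, r) >= threshold for p in predicted))
    return matched / len(reference)


def total_score(scores: Dict[str, float]) -> float:
    """Sum of the five evaluation metrics used to compare methods."""
    missing = [m for m in METRICS if m not in scores]
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return sum(scores[m] for m in METRICS)
```

Because the total is an unweighted sum, each metric contributes equally regardless of its typical range.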
Fig. 6 The 390th protein complex in standard protein complexes 1 as detected by different methods on the Gavin dataset. True-positive, false-positive, and false-negative proteins are shown in red, blue, and yellow, respectively
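The color coding in Fig. 6 corresponds to simple set operations between a predicted complex and the reference complex. True positives are in both, false positives only in the prediction, and false negatives only in the reference. A minimal sketch with hypothetical protein identifiers:

```python
from typing import Dict, Set


def classify_proteins(predicted: Set[str], reference: Set[str]) -> Dict[str, Set[str]]:
    """Split proteins into true positives, false positives and false negatives."""
    return {
        "TP": predicted & reference,   # shown in red in Fig. 6
        "FP": predicted - reference,   # shown in blue
        "FN": reference - predicted,   # shown in yellow
    }


def precision_recall(predicted: Set[str], reference: Set[str]) -> tuple:
    """Protein-level precision |TP|/|predicted| and recall |TP|/|reference|."""
    tp = len(predicted & reference)
    return (tp / len(predicted) if predicted else 0.0,
            tp / len(reference) if reference else 0.0)
```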
Functional enrichment analysis of protein complexes detected by different methods in Collins, Gavin and Krogan datasets
| Method | < E-20 | [E-20,E-15) | [E-15,E-10) | [E-10,E-5) | [E-5,0.01) | Total (p < 0.01) |
|---|---|---|---|---|---|---|
| Collins dataset | ||||||
| MCL | 62(39.24%) | 9(5.7%) | 25(15.82%) | 40(25.32%) | 4(2.53%) | 140(88.61%) |
| IPCA | 108(31.58%) | 37(10.82%) | 63(18.42%) | 97(28.36%) | 16(4.68%) | 321(93.86%) |
| COACH | 64(25.5%) | 22(8.76%) | 39(15.54%) | 80(31.87%) | 14(5.58%) | 219(87.25%) |
| CMC | 54(30.51%) | 17(9.6%) | 22(12.43%) | 63(35.59%) | 8(4.52%) | 164(92.66%) |
| SPICi | 62(51.24%) | 10(8.26%) | 19(15.7%) | 25(20.66%) | 3(2.48%) | **119(98.35%)** |
| ClusterONE | 47(23.15%) | 19(9.36%) | 45(22.17%) | 61(30.05%) | 11(5.42%) | 183(90.15%) |
| PEWCC | 128(30.05%) | 21(4.93%) | 104(24.41%) | 120(28.17%) | 18(4.23%) | 391(91.78%) |
| WPNCA | 90(33.46%) | 33(12.27%) | 61(22.68%) | 52(19.33%) | 7(2.6%) | 243(90.33%) |
| WEC | 394(40.74%) | 81(8.38%) | 174(17.99%) | 261(26.99%) | 23(2.38%) | 933(96.48%) |
| ClusterEPs | 4(0.68%) | 13(2.21%) | 95(16.18%) | 350(59.63%) | 74(12.61%) | 536(91.31%) |
| ClusterSS | 22(10.05%) | 19(8.68%) | 48(21.92%) | 93(42.47%) | 18(8.22%) | 200(91.32%) |
|  | 28(12.96%) | 25(11.57%) | 45(20.83%) | 85(39.35%) | 19(8.8%) | 202(93.52%) |
| SE-DMTG | 58(34.73%) | 22(13.17%) | 29(17.37%) | 46(27.54%) | 6(3.59%) | 161(96.41%) |
| MPC-C | 75(27.37%) | 35(12.77%) | 49(17.88%) | 86(31.39%) | 10(3.65%) | 255(93.07%) |
| GCC-v | 11(5.16%) | 19(8.92%) | 28(13.15%) | 107(50.23%) | 29(13.62%) | 194(91.08%) |
| MP-AHSA | 75(27.17%) | 36(13.04%) | 48(17.39%) | 94(34.06%) | 15(5.43%) | 268(97.1%) |
| Gavin dataset | ||||||
| MCL | 24(10.91%) | 22(10.0%) | 35(15.91%) | 72(32.73%) | 22(10.0%) | 175(79.55%) |
| IPCA | 121(26.08%) | 58(12.5%) | 70(15.09%) | 106(22.84%) | 41(8.84%) | 396(85.34%) |
| COACH | 124(34.35%) | 34(9.42%) | 52(14.4%) | 83(22.99%) | 18(4.99%) | 311(86.15%) |
| CMC | 71(24.15%) | 15(5.1%) | 40(13.61%) | 76(25.85%) | 21(7.14%) | 223(75.85%) |
| SPICi | 47(24.87%) | 15(7.94%) | 30(15.87%) | 54(28.57%) | 17(8.99%) | 163(86.24%) |
| ClusterONE | 52(20.16%) | 11(4.26%) | 36(13.95%) | 78(30.23%) | 20(7.75%) | 197(76.36%) |
| PEWCC | 76(11.45%) | 51(7.68%) | 108(16.27%) | 224(33.73%) | 77(11.6%) | 536(80.72%) |
| WPNCA | 128(26.45%) | 32(6.61%) | 100(20.66%) | 158(32.64%) | 19(3.93%) | 437(90.29%) |
| WEC | 261(28.87%) | 82(9.07%) | 151(16.7%) | 234(25.88%) | 66(7.3%) | 794(87.83%) |
| ClusterEPs | 74(27.31%) | 35(12.92%) | 47(17.34%) | 62(22.88%) | 22(8.12%) | 240(88.56%) |
| ClusterSS | 27(6.47%) | 24(5.76%) | 57(13.67%) | 178(42.69%) | 50(11.99%) | 336(80.58%) |
|  | 30(7.59%) | 21(5.32%) | 68(17.22%) | 165(41.77%) | 42(10.63%) | 326(82.53%) |
| SE-DMTG | 82(35.65%) | 35(15.22%) | 38(16.52%) | 48(20.87%) | 13(5.65%) | 216(93.91%) |
| MPC-C | 124(31.16%) | 38(9.55%) | 58(14.57%) | 152(38.19%) | 10(2.51%) | **382(95.98%)** |
| GCC-v | 13(4.45%) | 15(5.14%) | 27(9.25%) | 101(34.59%) | 44(15.07%) | 200(68.49%) |
| MP-AHSA | 100(27.17%) | 30(8.15%) | 58(15.76%) | 125(33.97%) | 32(8.7%) | 345(93.75%) |
| Krogan dataset | ||||||
| MCL | 31(8.38%) | 23(6.22%) | 40(10.81%) | 118(31.89%) | 31(8.38%) | 243(65.68%) |
| IPCA | 101(17.35%) | 70(12.03%) | 90(15.46%) | 218(37.46%) | 39(6.7%) | 518(89.0%) |
| COACH | 68(19.71%) | 33(9.57%) | 53(15.36%) | 118(34.2%) | 27(7.83%) | 299(86.67%) |
| CMC | 36(13.64%) | 19(7.2%) | 38(14.39%) | 92(34.85%) | 21(7.95%) | 206(78.03%) |
| SPICi | 10(4.46%) | 17(7.59%) | 42(18.75%) | 68(30.36%) | 25(11.16%) | 162(72.32%) |
| ClusterONE | 34(14.17%) | 16(6.67%) | 34(14.17%) | 109(45.42%) | 14(5.83%) | 207(86.25%) |
| PEWCC | 146(37.53%) | 50(12.85%) | 71(18.25%) | 95(24.42%) | 16(4.11%) | **378(97.17%)** |
| WPNCA | 106(28.73%) | 52(14.09%) | 61(16.53%) | 114(30.89%) | 17(4.61%) | 350(94.85%) |
| WEC | 171(33.14%) | 64(12.4%) | 88(17.05%) | 141(27.33%) | 19(3.68%) | 483(93.6%) |
| ClusterEPs | 53(12.93%) | 32(7.8%) | 57(13.9%) | 237(57.8%) | 14(3.41%) | 393(95.85%) |
| ClusterSS | 35(7.73%) | 33(7.28%) | 50(11.04%) | 188(41.5%) | 34(7.51%) | 340(75.06%) |
|  | 42(17.43%) | 33(13.69%) | 43(17.84%) | 92(38.17%) | 12(4.98%) | 222(92.12%) |
| SE-DMTG | 33(9.14%) | 33(9.14%) | 69(19.11%) | 173(47.92%) | 23(6.37%) | 331(91.69%) |
| MPC-C | 93(20.39%) | 70(15.35%) | 110(24.12%) | 160(35.09%) | 7(1.54%) | 440(96.49%) |
| GCC-v | 11(3.53%) | 9(2.88%) | 28(8.97%) | 148(47.44%) | 29(9.29%) | 225(72.12%) |
| MP-AHSA | 75(14.71%) | 35(6.86%) | 90(17.65%) | 232(45.49%) | 27(5.29%) | 459(90.0%) |
The highest score in each row is shown in bold.
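The rows of the enrichment table can be read as a histogram of per-complex p-values. In each row, the five significant bins add up to the last column, and the percentages appear to be taken over all predicted complexes. For example, MCL on Collins has 140 significant complexes, and 88.61% is consistent with 158 predicted complexes. The sketch below reproduces that layout under two assumptions: each complex is represented by its smallest enrichment p-value, and the first, unlabeled bin is p < E-20. It is an illustration, not the authors' enrichment pipeline.

```python
from bisect import bisect_right
from typing import Dict, List

# Upper bounds of the five significant bins (the "< E-20" bin is an assumption).
BIN_EDGES = [1e-20, 1e-15, 1e-10, 1e-5, 0.01]
BIN_LABELS = ["<E-20", "[E-20,E-15)", "[E-15,E-10)", "[E-10,E-5)", "[E-5,0.01)"]


def enrichment_row(best_p_values: List[float]) -> Dict[str, str]:
    """Bin each complex's smallest p-value and format cells as count(percent%)."""
    n = len(best_p_values)
    counts = [0] * len(BIN_LABELS)
    for p in best_p_values:
        if p < 0.01:  # only significant complexes are counted
            # bisect_right makes each bin closed on the left, open on the right
            counts[bisect_right(BIN_EDGES, p)] += 1

    def cell(c: int) -> str:
        # Percentages are relative to all predicted complexes, not just significant ones.
        return f"{c}({round(100 * c / n, 2)}%)" if n else f"{c}(0%)"

    row = {label: cell(c) for label, c in zip(BIN_LABELS, counts)}
    row["Total"] = cell(sum(counts))
    return row
```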