Jiancheng Zhong, Chao Tang, Wei Peng, Minzhu Xie, Yusui Sun, Qiang Tang, Qiu Xiao, Jiahong Yang.
Abstract
BACKGROUND: Several existing methods for identifying essential proteins achieve better results by incorporating biological information, most commonly gene expression data. However, gene expression data are prone to fluctuations, which may reduce the accuracy of essential protein identification. We therefore propose a method that combines gene expression data with protein-protein interaction (PPI) network data and calculates the similarity of the "active" and "inactive" states of gene expression within a cluster of the PPI network. Our experiments show that the method can improve the accuracy of essential protein prediction.
Keywords: Edge clustering coefficient; Essential proteins; Jaccard similarity index; The PPI networks
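The two ingredients named in the abstract and keywords can be illustrated with a minimal sketch. It assumes that a gene counts as "active" at a time point when its expression exceeds a threshold, and it uses the common definition of the edge clustering coefficient (triangles through an edge divided by the maximum possible number). The function names, the threshold and the toy data are hypothetical, and this record does not give the formula by which JDC combines the two quantities, so no combined score is shown.

```python
def binarize(expression, threshold):
    """Mark each time point as active (True) when expression exceeds the threshold."""
    return [value > threshold for value in expression]


def jaccard(active_a, active_b):
    """Jaccard index of the sets of time points at which two genes are active."""
    set_a = {i for i, on in enumerate(active_a) if on}
    set_b = {i for i, on in enumerate(active_b) if on}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def edge_clustering_coefficient(adj, u, v):
    """Triangles containing edge (u, v) divided by min(deg(u) - 1, deg(v) - 1)."""
    triangles = len(adj[u] & adj[v])
    limit = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / limit if limit > 0 else 0.0


# Toy PPI network (hypothetical): adjacency sets of an undirected graph.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

# Toy expression profiles over four time points (hypothetical values).
gene_a = binarize([0.2, 0.9, 0.8, 0.1], threshold=0.5)
gene_b = binarize([0.1, 0.7, 0.3, 0.6], threshold=0.5)

similarity = jaccard(gene_a, gene_b)                 # one shared active point out of three
ecc_ab = edge_clustering_coefficient(adj, "A", "B")  # edge inside a triangle
ecc_cd = edge_clustering_coefficient(adj, "C", "D")  # pendant edge, no triangle
```

Binarizing first is one way to reduce sensitivity to fluctuations in the raw expression values, since only the active/inactive pattern enters the similarity.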
Year: 2021 PMID: 33985429 PMCID: PMC8120700 DOI: 10.1186/s12859-021-04175-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An illustration of JDC
Fig. 2 ROC curves and AUC values of the JDC method and other methods using the individual features. a Yeast data. b E. coli data
Fig. 3 ROC curves and AUC values of the JDC method and other methods using the individual features in the top 20% ranked proteins. a Yeast data. b E. coli data
SN, SP, FPR, PPV, NPV, F-measure, ACC and MCC of Various Methods on Total Ranked Proteins
| Methods | SN | SP | FPR | PPV | NPV | F-measure | ACC | MCC |
|---|---|---|---|---|---|---|---|---|
| JDC | | | | | | | | |
| DC | 0.4002 | 0.8217 | 0.1783 | 0.4002 | 0.8217 | 0.4002 | 0.7251 | 0.2219 |
| BC | 0.3505 | 0.8069 | 0.1931 | 0.3505 | 0.8069 | 0.3505 | 0.7023 | 0.1574 |
| CC | 0.3548 | 0.8082 | 0.1918 | 0.3548 | 0.8082 | 0.3548 | 0.7043 | 0.163 |
| SC | 0.3676 | 0.812 | 0.188 | 0.3676 | 0.812 | 0.3676 | 0.7102 | 0.1796 |
| EC | 0.3676 | 0.812 | 0.188 | 0.3676 | 0.812 | 0.3676 | 0.7102 | 0.1796 |
| IC | 0.401 | 0.822 | 0.178 | 0.401 | 0.822 | 0.401 | 0.7255 | 0.223 |
| NC | 0.4353 | 0.8321 | 0.1679 | 0.4353 | 0.8321 | 0.4353 | 0.7412 | 0.2674 |
| PeC | 0.4036 | 0.8227 | 0.1773 | 0.4036 | 0.8227 | 0.4036 | 0.7267 | 0.2263 |
| WDC | 0.4576 | 0.839 | 0.161 | 0.458 | 0.8388 | 0.4578 | 0.7516 | 0.2967 |
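The eight measures in this table follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the paper's data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall)
    sp = tn / (tn + fp)                      # specificity
    fpr = fp / (fp + tn)                     # false positive rate = 1 - SP
    ppv = tp / (tp + fp)                     # positive predictive value (precision)
    npv = tn / (tn + fn)                     # negative predictive value
    f_measure = 2 * sn * ppv / (sn + ppv)    # harmonic mean of SN and PPV
    acc = (tp + tn) / (tp + fp + tn + fn)    # accuracy
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )                                        # Matthews correlation coefficient
    return {
        "SN": sn, "SP": sp, "FPR": fpr, "PPV": ppv,
        "NPV": npv, "F-measure": f_measure, "ACC": acc, "MCC": mcc,
    }


# Hypothetical counts: 100 essential proteins among 500, and the top 100
# ranked proteins are predicted essential, so FP equals FN.
metrics = classification_metrics(tp=40, fp=60, tn=340, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

When the number of proteins predicted essential equals the number of known essential proteins, FP equals FN, so SN, PPV and F-measure coincide and SP equals NPV. Most rows of the table show exactly this pattern, which would be consistent with such a cutoff, although this record does not state how the cutoff was chosen.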
Fig. 4 Comparison of the essential proteins among the top 1%, 5%, 10%, 15%, 20% and 25% of proteins ranked by JDC and by other methods on the yeast data. a Top 1%. b Top 5%. c Top 10%. d Top 15%. e Top 20%. f Top 25%
Fig. 5 Comparison of the essential proteins among the top 1%, 5%, 10%, 15%, 20% and 25% of proteins ranked by JDC and by other methods on the E. coli data. a Top 1%. b Top 5%. c Top 10%. d Top 15%. e Top 20%. f Top 25%
The overlapping relationships between JDC and nine other prediction measures for the top 100 proteins
| Centrality (Mi) | Proteins shared with JDC | Non-essential proteins in Mi − JDC | Non-essential proteins in JDC − Mi | Percentage of essential proteins in Mi − JDC | Percentage of essential proteins in JDC − Mi |
|---|---|---|---|---|---|
| DC | 16 | 46 | 15 | 45.24 | 82.14 |
| IC | 17 | 46 | 18 | 44.58 | 78.31 |
| EC | 8 | 61 | 18 | 33.70 | 80.43 |
| SC | 8 | 61 | 18 | 33.70 | 80.43 |
| BC | 15 | 49 | 18 | 42.35 | 78.82 |
| CC | 13 | 52 | 17 | 40.23 | 80.46 |
| NC | 36 | 34 | 14 | 46.88 | 78.13 |
| PeC | 67 | 12 | 8 | 63.64 | 75.76 |
| WDC | 55 | 20 | 12 | 55.56 | 73.33 |
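The percentage columns can be checked from the count columns. The sketch below assumes a particular reading of the table, inferred from the numbers rather than stated in this record: the first numeric column is the number of top-100 proteins that a method shares with JDC, and the next two columns count the non-essential proteins among the proteins found only by that method and only by JDC, respectively. The function name is hypothetical.

```python
def difference_set_percentages(shared, nonessential_method_only,
                               nonessential_jdc_only, top_n=100):
    """Percentage of essential proteins in each difference set of two top-N lists."""
    # Both lists hold top_n proteins, so each difference set has the same size.
    only = top_n - shared
    pct_method_only = 100 * (only - nonessential_method_only) / only
    pct_jdc_only = 100 * (only - nonessential_jdc_only) / only
    return round(pct_method_only, 2), round(pct_jdc_only, 2)


# Count columns of the DC and PeC rows.
dc = difference_set_percentages(16, 46, 15)
pec = difference_set_percentages(67, 12, 8)
print(dc)   # (45.24, 82.14)
print(pec)  # (63.64, 75.76)
```

Under this reading the computed values reproduce the last two columns of the DC and PeC rows: among the proteins on which the two rankings disagree, the ones selected only by JDC contain a higher share of essential proteins.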
Fig. 6 Jackknife curve of various prediction methods. a Yeast data. b E. coli data
Fig. 7 The modularity of interactions among the top 100 essential proteins predicted by JDC and WDC
Number of essential proteins predicted by JDC, PeC and WDC on the Fly and Human networks
| Network | Method | Top100 | Top200 | Top300 | Top400 | Top500 | Top600 |
|---|---|---|---|---|---|---|---|
| Fly | JDC | 79 | 85 | | | | |
| | PeC | 46 | 52 | 58 | 66 | 70 | 73 |
| | WDC | 43 | 64 | 68 | 73 | | |
| Human Colon | JDC | 93 | 438 | | | | |
| | PeC | 182 | 272 | 357 | 522 | | |
| | WDC | 87 | 178 | 271 | 355 | 435 | 512 |
| Human Liver | JDC | 437 | | | | | |
| | PeC | 176 | 352 | 516 | | | |
| | WDC | 83 | 171 | 258 | 345 | 430 | 509 |
Number of essential proteins predicted by various centrality methods on the NF-PIN dynamic network, compared with JDC
| Centrality | Top100 | Top200 | Top300 | Top400 | Top500 | Top600 | Exceed times |
|---|---|---|---|---|---|---|---|
| JDC | 80 | | | | | | |
| NF-DC | 55 | 111 | 167 | 221 | 261 | 303 | 0 |
| NF-EC | 55 | 110 | 157 | 202 | 239 | 276 | 0 |
| NF-SC | 55 | 116 | 161 | 204 | 239 | 276 | 0 |
| NF-BC | 50 | 97 | 133 | 188 | 226 | 254 | 0 |
| NF-CC | 45 | 87 | 122 | 161 | 193 | 230 | 0 |
| NF-IC | 55 | 111 | 167 | 221 | 261 | 303 | 0 |
| NF-LAC | | 141 | 198 | 243 | 280 | 322 | 1 |
| NF-NC | 80 | 147 | 197 | 252 | 290 | 324 | 0 |
Number of essential proteins predicted by various centrality methods on the TS-PIN dynamic network, compared with JDC
| Centrality | Top100 | Top200 | Top300 | Top400 | Top500 | Top600 | Exceed times |
|---|---|---|---|---|---|---|---|
| JDC | 80 | | | | | | |
| TS-DC | 71 | 143 | 198 | 250 | 297 | 347 | 0 |
| TS-EC | 71 | 143 | 209 | 259 | 300 | 334 | 0 |
| TS-SC | 78 | 144 | 210 | 266 | 308 | 351 | 0 |
| TS-BC | 55 | 117 | 165 | 215 | 252 | 287 | 0 |
| TS-CC | 55 | 114 | 173 | 221 | 273 | 326 | 0 |
| TS-IC | 71 | 143 | 198 | 247 | 297 | 347 | 0 |
| TS-LAC | | 138 | 196 | 246 | 300 | 350 | 1 |
| TS-NC | 82 | 142 | 200 | 253 | 301 | 350 | 0 |
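The Top-N tables can be reproduced from a ranked protein list and a set of known essential proteins. The sketch below assumes that "Exceed times" counts the cutoffs at which a method finds strictly more essential proteins than JDC; that interpretation, the function names and the toy rankings are assumptions, not taken from the paper.

```python
def essential_in_top(ranked, essential, cutoffs):
    """Count known essential proteins among the first n ranked proteins, per cutoff."""
    return [sum(1 for protein in ranked[:n] if protein in essential)
            for n in cutoffs]


def exceed_times(method_counts, reference_counts):
    """Number of cutoffs at which a method strictly exceeds the reference method."""
    return sum(m > r for m, r in zip(method_counts, reference_counts))


# Toy data (hypothetical): six proteins, three of them essential.
essential = {"p1", "p3", "p4"}
cutoffs = (2, 4, 6)

reference_ranking = ["p1", "p3", "p4", "p2", "p5", "p6"]
other_ranking = ["p2", "p1", "p5", "p3", "p4", "p6"]

reference_counts = essential_in_top(reference_ranking, essential, cutoffs)
other_counts = essential_in_top(other_ranking, essential, cutoffs)

print(reference_counts)                              # [2, 3, 3]
print(other_counts)                                  # [1, 2, 3]
print(exceed_times(other_counts, reference_counts))  # 0
```

Evaluating the same function at every cutoff from 1 to the length of the ranking gives the cumulative counts that a jackknife curve plots.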