Anik Banik¹, Souvik Podder¹, Sovan Saha², Piyali Chatterjee³, Anup Kumar Halder⁴,⁵, Mita Nasipuri⁶, Subhadip Basu⁶, Dariusz Plewczynski⁴,⁵
Abstract
Proteins are vital for key cellular activities in living organisms, but not all of them are essential. Identifying essential proteins through biological experiments is considerably more laborious and time-consuming than the computational approaches used in recent years. However, conventional methods sometimes perform poorly in specific scenarios, which makes them challenging to apply in practice. More accurate and efficient computational prediction models are therefore required for essential protein identification. This research proposes a methodology for predicting essential proteins in a refined yeast protein-protein interaction network (PPIN). The rule-based refinement uses protein complex and local interaction density information derived from the neighborhood properties of proteins in the network. Identifying and pruning non-essential proteins is equally crucial in this approach. In the first phase, node and edge weights are applied to identify and discard non-essential proteins from the interaction network, with three cut-off levels considered for each of the node and edge weights. Once the PPIN has been filtered, the second phase successively applies two centrality-based approaches, (1) local interaction density (LID) and (2) local interaction density with protein complex (LIDC), to identify the essential proteins in the yeast PPIN. The proposed methodology outperforms existing state-of-the-art techniques, reaching an F-score of 0.56 against 0.47 for LIDC, the best-performing existing method compared.
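The abstract names local interaction density (LID) as the first centrality applied to the pruned network, without defining it on this page. As a minimal sketch, assuming LID(v) is the number of interactions among a protein's neighbors divided by the number of those neighbors (an assumed definition, not taken from this record), it can be computed on an adjacency-set view of a PPIN. The helper names and the toy proteins `P1`–`P4` are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into {protein: set of interacting proteins}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_interaction_density(adj, v):
    """Assumed LID(v): interactions among v's neighbours / number of neighbours.

    Returns 0.0 for proteins with no neighbours (or absent from the network).
    """
    neighbours = adj.get(v, set())
    if not neighbours:
        return 0.0
    # Each unordered pair of neighbours is checked once, so every edge counts once.
    inner_edges = sum(
        1 for a, b in combinations(sorted(neighbours), 2) if b in adj[a]
    )
    return inner_edges / len(neighbours)


# Hypothetical toy PPIN: P1, P2 and P3 form a triangle; P4 hangs off P1.
adj = build_adjacency([("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4")])
ranking = sorted(adj, key=lambda p: local_interaction_density(adj, p), reverse=True)
for protein in ranking:
    print(protein, round(local_interaction_density(adj, protein), 3))
```

Under this definition, P2 and P3, whose neighbors are fully interconnected, outrank the higher-degree P1, because P1's extra neighbor P4 dilutes its density.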
Keywords: edge weight; essential protein; local interaction density; node weight; yeast PPIN
Year: 2022 PMID: 36078056 PMCID: PMC9454873 DOI: 10.3390/cells11172648
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 7.666
Computational studies based on essential protein prediction.
| Features Utilized | Description | Database | References |
|---|---|---|---|
| Subcellular localization | An efficient method to identify essential proteins for different species by integrating protein subcellular | PPIN of | [ |
| Protein | A new method for predicting essential proteins based on participation degree in protein complex and subgraph density. | PPIN of | [ |
| Orthology, gene expression, PPIN | Predicting essential proteins by integrating orthology, gene expressions, and PPIN. | PPIN of | [ |
| CC and orthology | United neighborhood closeness centrality and orthology for predicting essential proteins. | PPIN of | [ |
| Node, edge | Identification of essential proteins using improved node and edge clustering coefficient. | PPIN of | [ |
| Centrality scores | CytoNCA: a cytoscape plugin for centrality analysis and evaluation of protein interaction networks. | _ | [ |
| Protein | Identification of essential proteins based on a new combination of local interaction density and protein complexes. | PPIN of | [ |
| PPIN, | Prediction of essential proteins by integration of PPI network topology and protein complex information. | PPIN of | [ |
Figure 1. Schematic diagram of the node-weight computation. Node weighting retains the proteins with the highest connectivity. The root node (protein) is shown in orange and its neighbors (proteins) in blue; filtered-out nodes (proteins) are shown in red.
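The node-weight formula itself is not given in this record. The sketch below only illustrates the pruning mechanics: it assumes a stand-in node weight (degree normalized by the largest degree in the network) and three hypothetical cut-off values, and drops every protein whose weight falls below the chosen cut-off together with its interactions. `node_weights`, `prune_nodes`, and `CUTOFFS` are illustrative names, not the authors' code.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into {protein: set of interacting proteins}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


# Hypothetical cut-off values; the paper's actual thresholds are not given here.
CUTOFFS = {"low": 0.25, "medium": 0.5, "high": 0.75}


def node_weights(adj):
    """Stand-in node weight: degree normalised by the largest degree in the PPIN."""
    max_degree = max((len(nbrs) for nbrs in adj.values()), default=0)
    if max_degree == 0:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / max_degree for v, nbrs in adj.items()}


def prune_nodes(adj, cutoff):
    """Remove proteins whose weight falls below cutoff, along with their interactions."""
    weights = node_weights(adj)
    keep = {v for v, w in weights.items() if w >= cutoff}
    return {v: adj[v] & keep for v in keep}


# Hypothetical toy PPIN: hub H linked to A, B, C, D, plus one extra edge A-B.
adj = build_adjacency([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")])
for level, cutoff in CUTOFFS.items():
    print(level, sorted(prune_nodes(adj, cutoff)))
```

As in the network statistics reported for this study, a stricter cut-off leaves fewer proteins; here the high cut-off keeps only the hub.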
Figure 2. Schematic diagram of the edge-weight computation. Edge weighting retains only the reliable edges in a PPIN. Edge weights are calculated for the edges connected to the nodes (proteins) marked in pink, whereas the neighbors (proteins) and their connected edges are highlighted in blue.
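This record does not state how edge weights are computed. As a hedged stand-in, the sketch below uses one common form of the edge clustering coefficient (ECC), the quantity behind the NC method listed in the comparison table: the number of neighbors the two endpoints share, divided by min(deg(u) − 1, deg(v) − 1). Edges below a hypothetical cut-off are removed, and proteins left with no edges drop out, mirroring the smaller protein counts after edge reduction in the network statistics table.

```python
def build_adjacency(edges):
    """Turn an undirected edge list into {protein: set of interacting proteins}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = shared neighbours / min(deg(u) - 1, deg(v) - 1); 0.0 if undefined."""
    shared = len(adj[u] & adj[v])
    denominator = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / denominator if denominator > 0 else 0.0


def prune_edges(adj, cutoff):
    """Keep edges whose weight reaches cutoff; proteins left without edges drop out."""
    kept = [
        (u, v)
        for u in adj
        for v in adj[u]
        if u < v and edge_clustering_coefficient(adj, u, v) >= cutoff  # each edge once
    ]
    return build_adjacency(kept)


# Hypothetical toy PPIN: triangle A-B-C, plus a tail C-D-E.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
reduced = prune_edges(adj, cutoff=0.5)  # hypothetical cut-off
print(sorted(reduced))
```

The triangle edges, which are supported by a shared neighbor, survive; the tail edges have no such support and are discarded along with D and E.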
Figure 3. Schematic diagram of the LIDC computation, which combines three scores: (1) LID, (2) IDC, and (3) a ranking score. Disconnected neighbors (proteins) are highlighted in blue, whereas interconnected neighbors (proteins) are shown in pink. The protein complex is shown in yellow.
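The caption lists the three ingredients of LIDC but not how they are combined. The following is only a sketch under stated assumptions: LID is taken as interactions among neighbors per neighbor, IDC is assumed to be the share of a protein's neighbors that belong to a common complex with it, the two are blended with a hypothetical weight `alpha`, and the "ranking score" is represented simply by the resulting sort order. None of these formulas or names come from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into {protein: set of interacting proteins}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def lid(adj, v):
    """Assumed LID: interactions among v's neighbours / number of neighbours."""
    nbrs = adj.get(v, set())
    if not nbrs:
        return 0.0
    inner = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return inner / len(nbrs)


def idc(adj, complexes, v):
    """Hypothetical IDC: share of v's neighbours that sit in a complex with v."""
    nbrs = adj.get(v, set())
    if not nbrs:
        return 0.0
    partners = set().union(*(c for c in complexes if v in c)) - {v}
    return len(nbrs & partners) / len(nbrs)


def lidc_ranking(adj, complexes, alpha=0.5):
    """Blend LID and IDC with a hypothetical weight and return proteins, best first."""
    scores = {v: (1 - alpha) * lid(adj, v) + alpha * idc(adj, complexes, v) for v in adj}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: complex {A, B, C} forms a triangle; D attaches to C only.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
complexes = [{"A", "B", "C"}]
for protein, score in lidc_ranking(adj, complexes):
    print(protein, round(score, 3))
```

Complex membership lifts A and B above C, whose neighbor D lies outside the complex, while D, with neither dense neighbors nor complex partners, ranks last.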
Figure 4. Essential and non-essential proteins in the yeast PPIN at the low cut-off. Yellow nodes are predicted non-essential proteins and red nodes are predicted essential proteins; blue nodes are proteins removed in the pre-filtering stage.
Figure 5. Validation of the proposed methodology. All methods are compared using the jackknife methodology over six ranking ranges (top 100–600 proteins).
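Assuming the jackknife methodology here follows the usage common in essential-protein studies (count how many known essential proteins appear among each method's top-k ranked proteins, for increasing k), each curve in the figure reduces to the computation below. The ranking and the essential-protein set are hypothetical toy data, and `jackknife_counts` is an illustrative name.

```python
def jackknife_counts(ranked, essential, top_ks=(100, 200, 300, 400, 500, 600)):
    """Number of known essential proteins among the top-k of a ranking, for each k."""
    return {k: sum(1 for p in ranked[:k] if p in essential) for k in top_ks}


# Hypothetical data: 600 ranked protein IDs, every third one known to be essential.
ranked = [f"p{i}" for i in range(600)]
essential = {f"p{i}" for i in range(0, 600, 3)}
for k, hits in jackknife_counts(ranked, essential).items():
    print(f"top {k}: {hits} essential")
```

To compare methods, the same function is run on each method's ranking; a method whose counts are higher at every k identifies more true essential proteins in its top ranks.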
Performance comparison of the proposed method with other methodologies.
| Methods | Precision | Recall | F-Score |
|---|---|---|---|
| DC (Jeong et al. 2001) | 0.41 | 0.35 | 0.38 |
| BC (Joy et al. 2005) | 0.35 | 0.31 | 0.33 |
| NC (Wang et al. 2012) | 0.46 | 0.40 | 0.43 |
| LID (Luo and Qi 2015) | 0.45 | 0.39 | 0.42 |
| PeC (Li et al. 2012) | 0.46 | 0.40 | 0.43 |
| CoEWC (Zhang et al. 2013) | 0.47 | 0.41 | 0.44 |
| WDC (Tang et al. 2014) | 0.48 | 0.42 | 0.45 |
| ION (Peng et al. 2012) | 0.53 | 0.41 | 0.46 |
| UC (Li et al. 2017) | 0.48 | 0.42 | 0.45 |
| LIDC (Luo and Qi 2015) | 0.50 | 0.44 | 0.47 |
| Proposed Methodology | 0.77 | 0.44 | 0.56 |
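The F-score column is consistent with the standard F1 score, the harmonic mean of precision and recall. The check below recomputes it from the precision and recall values in the table above and compares the result with the reported two-decimal figures.

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from the comparison table above.
REPORTED = {
    "DC": (0.41, 0.35, 0.38),
    "BC": (0.35, 0.31, 0.33),
    "NC": (0.46, 0.40, 0.43),
    "LID": (0.45, 0.39, 0.42),
    "PeC": (0.46, 0.40, 0.43),
    "CoEWC": (0.47, 0.41, 0.44),
    "WDC": (0.48, 0.42, 0.45),
    "ION": (0.53, 0.41, 0.46),
    "UC": (0.48, 0.42, 0.45),
    "LIDC": (0.50, 0.44, 0.47),
    "Proposed Methodology": (0.77, 0.44, 0.56),
}

for method, (p, r, f) in REPORTED.items():
    print(f"{method:22s} recomputed F = {f_score(p, r):.3f}  reported = {f:.2f}")
```

Every recomputed value rounds to the reported one. The table also shows where the gain comes from: the proposed methodology matches LIDC's recall (0.44), so its higher F-score is driven entirely by precision (0.77 versus 0.50).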
Network statistics of the pruned yeast PPIN at three cut-off levels.
| Cut-Off Levels | Proteins after Node Reduction | Interactions after Node Reduction | Proteins after Edge Reduction | Interactions after Edge Reduction | Essential Proteins | Non-Essential Proteins |
|---|---|---|---|---|---|---|
| Low | 1393 | 14,063 | 985 | 3907 | 198 | 787 |
| Medium | 1374 | 13,924 | 969 | 3847 | 194 | 775 |
| High | 1340 | 13,714 | 931 | 3733 | 187 | 744 |
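The network statistics can be cross-checked with simple arithmetic. At every cut-off level, the essential and non-essential counts add up to the number of proteins left after edge reduction, and the ratios below show how much of the node-reduced network survives edge reduction. The figures are copied from the table; the fifth value in each tuple is the interaction count remaining after edge reduction.

```python
# (proteins, interactions) after node reduction, (proteins, interactions) after
# edge reduction, then essential and non-essential counts, copied from the table above.
STATS = {
    "Low": (1393, 14063, 985, 3907, 198, 787),
    "Medium": (1374, 13924, 969, 3847, 194, 775),
    "High": (1340, 13714, 931, 3733, 187, 744),
}


def summarise(stats):
    """Check the class counts and derive retention and essential-share ratios."""
    summary = {}
    for level, (p_node, e_node, p_edge, e_edge, ess, non_ess) in stats.items():
        # Essential + non-essential should cover every protein left after edge reduction.
        if ess + non_ess != p_edge:
            raise ValueError(f"{level}: class counts do not add up")
        summary[level] = {
            "proteins_kept": p_edge / p_node,
            "interactions_kept": e_edge / e_node,
            "essential_share": ess / p_edge,
        }
    return summary


for level, ratios in summarise(STATS).items():
    print(level, {name: f"{value:.1%}" for name, value in ratios.items()})
```

At all three levels, edge reduction keeps roughly 70% of the node-reduced proteins but only about 28% of their interactions, and essential proteins make up about 20% of the final network.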
Performance of the proposed method at three cut-off levels.
| Cut-Off Levels | Precision | Recall | F-Score |
|---|---|---|---|
| Low | 0.75 | 0.41 | 0.53 |
| Medium | 0.76 | 0.42 | 0.54 |
| High | 0.77 | 0.44 | 0.56 |