Bo Xu, Yu Liu, Chi Lin, Jie Dong, Xiaoxia Liu, Zengyou He.
Abstract
Identifying protein complexes from protein-protein interaction networks (PPINs) is important for understanding cellular organization and function. However, PPINs produced by high-throughput studies have a high false discovery rate and represent only a snapshot of interaction information. Reconstructing higher-quality PPINs is therefore essential for protein complex identification. Here we present a Multi-Level PPINs reconstruction (MLPR) method for protein complex detection. From existing PPINs, we generated all pairwise combinations of proteins. Each protein pair is represented as a vector that integrates six different sources, and protein pairs with the same vector are mapped to the same fingerprint ID. Next, a fingerprint similarity network is constructed, in which each vertex represents a protein pair fingerprint ID and is connected by edges to its 10 most similar fingerprints. After a random walk on the fingerprint similarity network, each vertex receives a score at the steady state. Based on these scores, we took the top-ranked protein pairs as reliable PPIs and used each score as the weight of the edge between the two distinct proteins. Finally, we expanded clusters starting from seed vertices in the new weighted, reliable PPINs. Applied to yeast PPINs, our algorithm achieved a higher F-value in protein complex detection than state-of-the-art methods. As assessed by gene ontology, the interactions in our reconstructed PPI network have more significant biological relevance than those in existing PPI datasets. In addition, the performance of existing popular protein complex detection methods is significantly improved on our reconstructed network.
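The scoring and selection steps in the abstract (random walk to a steady state, then keeping top-ranked pairs with their scores as edge weights) can be sketched as follows. This is a minimal sketch, not the authors' implementation: the abstract does not say which random-walk variant MLPR uses, so this assumes a random walk with restart (restart probability `alpha`, restart set `restart_nodes`) on an undirected weighted fingerprint graph. The function names, the restart set, and the toy inputs are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
Pair = Tuple[str, str]


def random_walk_with_restart(
    n: int,
    edges: Dict[Edge, float],
    restart_nodes: Sequence[int],
    alpha: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate p <- (1 - alpha) * W p + alpha * r to its steady state.

    W moves the walker from vertex i to neighbor j with probability
    w_ij / strength(i); r is uniform over restart_nodes (an assumption).
    """
    neighbors: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for (i, j), w in edges.items():
        neighbors[i].append((j, w))
        neighbors[j].append((i, w))
    strength = [sum(w for _, w in nb) for nb in neighbors]

    r = [0.0] * n
    for s in restart_nodes:
        r[s] += 1.0 / len(restart_nodes)

    p = r[:]
    for _ in range(max_iter):
        nxt = [alpha * ri for ri in r]
        for i in range(n):
            if strength[i] == 0.0:
                # Isolated vertex: return its mass to the restart distribution.
                for t in range(n):
                    nxt[t] += (1 - alpha) * p[i] * r[t]
                continue
            for j, w in neighbors[i]:
                nxt[j] += (1 - alpha) * p[i] * w / strength[i]
        converged = sum(abs(a - b) for a, b in zip(nxt, p)) < tol
        p = nxt
        if converged:
            break
    return p


def reliable_weighted_ppis(
    pair_to_fp: Dict[Pair, int], scores: Sequence[float], top_n: int
) -> Dict[Pair, float]:
    """Keep the top_n protein pairs by fingerprint score; the score becomes the edge weight."""
    ranked = sorted(pair_to_fp, key=lambda pair: (-scores[pair_to_fp[pair]], pair))
    return {pair: scores[pair_to_fp[pair]] for pair in ranked[:top_n]}


# Toy example: three fingerprints on a path 0 - 1 - 2, restarting at fingerprint 0.
scores = random_walk_with_restart(3, {(0, 1): 1.0, (1, 2): 1.0}, restart_nodes=[0])
pair_to_fp = {("A", "B"): 0, ("A", "C"): 1, ("B", "D"): 2, ("C", "D"): 0}
print(reliable_weighted_ppis(pair_to_fp, scores, top_n=2))  # keeps (A, C), then (A, B)
```

A restart term is assumed here because, on a connected undirected graph, a plain random walk converges to a distribution proportional to each vertex's weighted degree and would ignore which fingerprints are believed to be reliable. Seeding the restart set with fingerprints of pairs already present in the input PPINs is one plausible choice, but it is an assumption. In the toy run, the central fingerprint 1 outscores the restart fingerprint 0, so the scores reflect connectivity and not only proximity to the restart set.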
Keywords: PPI network; PPI prediction; bioinformatics; network reconstruction; protein complex
Year: 2018 PMID: 30087694 PMCID: PMC6067004 DOI: 10.3389/fgene.2018.00272
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. The workflow of our method.
Figure 2. High-level reconstructed network. The first level is the existing PPI networks. The second level is the protein pairs annotated with six sources. The third level is the protein pair fingerprint similarity network.
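The step from the second level (protein pairs annotated with six sources) to the third level (the fingerprint similarity network) can be sketched as below. This is a minimal sketch under stated assumptions: each pair's six-source vector is taken as already computed (the toy vectors are made up), and fingerprint similarity is assumed to be cosine similarity, since the abstract does not name the measure. The default `k=10` follows the abstract's "top 10 similar fingerprints"; the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def assign_fingerprints(
    pair_vectors: Dict[Pair, Sequence[float]],
) -> Tuple[Dict[Pair, int], List[Tuple[float, ...]]]:
    """Give protein pairs with identical feature vectors the same fingerprint ID."""
    vector_to_id: Dict[Tuple[float, ...], int] = {}
    pair_to_fp: Dict[Pair, int] = {}
    for pair, vec in pair_vectors.items():
        key = tuple(vec)
        if key not in vector_to_id:
            vector_to_id[key] = len(vector_to_id)  # IDs in order of first appearance
        pair_to_fp[pair] = vector_to_id[key]
    fingerprints = list(vector_to_id)  # dicts keep insertion order, so index == ID
    return pair_to_fp, fingerprints


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similarity_network(
    fingerprints: List[Tuple[float, ...]], k: int = 10
) -> Dict[Tuple[int, int], float]:
    """Connect each fingerprint to its k most similar others; edges are undirected."""
    edges: Dict[Tuple[int, int], float] = {}
    for i, fi in enumerate(fingerprints):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(fingerprints) if j != i]
        sims.sort(key=lambda t: (-t[0], t[1]))  # most similar first, ties by ID
        for s, j in sims[:k]:
            edges[(min(i, j), max(i, j))] = s
    return edges


# Hypothetical six-source vectors (one value per source) for four protein pairs.
pair_vectors = {
    ("A", "B"): [1, 0, 1, 0, 0, 1],
    ("A", "C"): [1, 0, 1, 0, 0, 1],  # same vector as (A, B), so same fingerprint
    ("B", "C"): [0, 1, 0, 0, 1, 0],
    ("C", "D"): [1, 1, 1, 0, 0, 1],
}
pair_to_fp, fingerprints = assign_fingerprints(pair_vectors)
edges = top_k_similarity_network(fingerprints, k=1)
print(pair_to_fp)
print(edges)
```

Collapsing identical vectors means the similarity network has one vertex per distinct vector rather than one per protein pair. The brute-force neighbor search here is quadratic in the number of fingerprints and is meant only for illustration.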
Basic statistics of the datasets.
| Dataset | Proteins | Interactions |
|---|---|---|
| BioGRID | 5,640 | 59,748 |
| Collins | 1,622 | 9,074 |
| DIP | 4,928 | 17,201 |
| Gavin | 1,430 | 6,531 |
| KroganCore | 2,708 | 7,123 |
| KroganExtended | 3,672 | 14,317 |
Relevance of protein pairs in different datasets.
| TOP6000 | 0.995667 | 0.994168 | 0.812531 |
| TOP7000 | 0.991143 | 0.992 | 0.798143 |
| TOP8000 | 0.98588 | 0.989379 | 0.786205 |
| TOP9000 | 0.977005 | 0.985892 | 0.782048 |
| TOP10000 | 0.9651 | 0.9779 | 0.778 |
| TOP11000 | 0.956455 | 0.970909 | 0.773364 |
| TOP12000 | 0.951083 | 0.967 | 0.757 |
| TOP13000 | 0.942385 | 0.958692 | 0.742077 |
| TOP14000 | 0.933286 | 0.949429 | 0.728571 |
| TOP15000 | 0.9256 | 0.941133 | 0.7178 |
| TOP16000 | 0.917625 | 0.933063 | 0.710625 |
| BioGRID | 0.782369 | 0.816847 | 0.593902 |
| Collins | 0.96793 | 0.971126 | 0.73672 |
| DIP | 0.791407 | 0.740771 | 0.541248 |
| Gavin | 0.904942 | 0.897901 | 0.656148 |
| KroganCore | 0.83083 | 0.834901 | 0.603959 |
| KroganExtended | 0.783614 | 0.802542 | 0.579613 |
Figure 3. The performance of our MLPR method on our reconstructed PPINs.
Figure 4. The performance of our MLPR method on our reconstructed PPINs.
Figure 9. Performance comparison between our method and five other methods on the Collins dataset.
Figure 10. F-values of our method and five other methods on our reconstructed networks.
Figure 12. Recall of our method and five other methods on our reconstructed networks.
Figure 13. False-positive protein complexes with low P-values and high local density.