Jialu Hu¹,², Junhao He¹, Jing Li³, Yiqun Gao¹, Yan Zheng¹, Xuequn Shang⁴.
Abstract
Proteins play essential roles in almost all life processes, and predicting protein function is important for understanding molecular function and evolution. Network alignment provides a fast and effective framework for automatically and systematically identifying functionally conserved proteins. However, the rapid growth of genomic, interaction and annotation data has created an increasing demand for more accurate and efficient tools that can handle multiple PPI networks. Here, we present NetCoffee2, a novel global alignment algorithm based on graph feature vectors, which discovers functionally conserved proteins and predicts the functions of proteins of unknown function. To test its performance, NetCoffee2 and three other notable algorithms were applied to eight real biological datasets, and functional analyses were performed to evaluate the biological quality of the resulting alignments. The results show that NetCoffee2 is superior to the existing algorithms IsoRankN, NetCoffee and multiMAGNA++ in terms of both coverage and consistency. The binary and source code are freely available under the GNU GPL v3 license at https://github.com/screamer/NetCoffee2.
Keywords: Functional conserved proteins; Network alignment; Optimization; PPI networks; Simulated annealing
Year: 2019 PMID: 31881842 PMCID: PMC6933650 DOI: 10.1186/s12864-019-6302-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The calculation of the similarity matrix between two networks G1 and G2. a A 5-tuple feature vector (γ, σ, τ, η, θ) was calculated for each node. Here, the γ vector, (1, 0.88, 0.33, 0.75, 1), is the normalized major eigenvector of the adjacency matrix of the graph. The σ and η vectors give the number of 1-step and 2-step neighbors of each node. The τ and θ vectors describe the influence of each node on its 1-step and 2-step neighbors. b The σ, τ, η and θ vectors were each normalized by their maximal element. c The similarity matrix was calculated with a Gaussian-based similarity measure, in which u and v are a pair of nodes and x is the Euclidean distance between the feature vectors of u and v.
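The feature-vector and similarity pipeline of Fig. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not NetCoffee2's implementation: γ is obtained by power iteration and scaled so its largest entry is 1 (as the example vector (1, 0.88, 0.33, 0.75, 1) suggests), σ and η count 1-step and 2-step neighbors, and each vector is normalized by its maximum. The τ and θ "influence" scores, the Gaussian kernel form and its `bandwidth` parameter are hypothetical stand-ins, because the caption does not define them. All function names are illustrative.

```python
import math
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected PPI network as adjacency sets
Features = Tuple[float, float, float, float, float]  # (gamma, sigma, tau, eta, theta)


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build adjacency sets from an edge list."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, set()).add(b)
        g.setdefault(b, set()).add(a)
    return g


def major_eigenvector(g: Graph, iters: int = 1000) -> Dict[str, float]:
    """gamma: principal eigenvector of the adjacency matrix, scaled so its max is 1.

    Power iteration on (A + I) has the same eigenvectors as on A but
    does not oscillate on bipartite graphs.
    """
    x = {v: 1.0 for v in g}
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in g[v]) for v in g}
        top = max(y.values())
        x = {v: y[v] / top for v in g}
    return x


def scale_by_max(d: Dict[str, float]) -> Dict[str, float]:
    """Divide each entry by the largest one (panel b); all-zero input is left as is."""
    top = max(d.values()) or 1.0
    return {k: val / top for k, val in d.items()}


def feature_vectors(g: Graph) -> Dict[str, Features]:
    gamma = major_eigenvector(g)
    # 2-step neighbours: nodes exactly two hops away.
    n2 = {v: set().union(*(g[u] for u in g[v])) - g[v] - {v} for v in g}
    sigma = {v: float(len(g[v])) for v in g}
    eta = {v: float(len(n2[v])) for v in g}
    # HYPOTHETICAL influence scores (the caption does not define tau/theta):
    # each neighbour u contributes 1/deg(u), the share of u's links that reach v.
    tau = {v: sum(1.0 / len(g[u]) for u in g[v]) for v in g}
    theta = {v: sum(1.0 / len(g[w]) for w in n2[v]) for v in g}
    sigma, tau, eta, theta = (scale_by_max(f) for f in (sigma, tau, eta, theta))
    return {v: (gamma[v], sigma[v], tau[v], eta[v], theta[v]) for v in g}


def similarity_matrix(
    g1: Graph, g2: Graph, bandwidth: float = 1.0
) -> Dict[Tuple[str, str], float]:
    """Similarity of every pair (u in G1, v in G2) from their feature distance x.

    ASSUMED kernel: exp(-x^2 / (2 * bandwidth^2)); the paper's formula is not shown.
    """
    f1, f2 = feature_vectors(g1), feature_vectors(g2)
    return {
        (u, v): math.exp(-math.dist(f1[u], f2[v]) ** 2 / (2 * bandwidth**2))
        for u in f1
        for v in f2
    }


# Two isomorphic toy networks: a triangle with one pendant node.
g1 = undirected([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
g2 = undirected([("x", "y"), ("y", "z"), ("z", "x"), ("z", "w")])
S = similarity_matrix(g1, g2)
print(round(S[("c", "z")], 3), round(S[("d", "z")], 3))
```

Because the two toy networks are isomorphic, corresponding nodes (for example c and z) receive identical feature vectors and a similarity of 1, while structurally different nodes (the pendant d against the hub z) score noticeably lower.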
Statistics of the PPI networks of five species: Mus musculus (MM), Saccharomyces cerevisiae (SC), Drosophila melanogaster (DM), Arabidopsis thaliana (AT) and Homo sapiens (HS)
| Species | No. of nodes | No. of edges | BP annotated (%) | MF annotated (%) | CC annotated (%) |
|---|---|---|---|---|---|
| MM | 3611 | 4704 | 87.03 | 87.59 | 88.00 |
| SC | 5708 | 42674 | 94.55 | 94.48 | 90.94 |
| DM | 8715 | 26362 | 65.81 | 64.46 | 64.30 |
| AT | 5665 | 19247 | 84.99 | 78.78 | 78.44 |
| HS | 17344 | 100589 | 70.14 | 71.86 | 72.95 |
The annotation columns give the percentage of proteins carrying functional annotations in each Gene Ontology category: biological process (BP), molecular function (MF) and cellular component (CC).
Algorithm performance was tested on eight datasets, denoted D1, D2, ..., D8.
| Species | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 |
|---|---|---|---|---|---|---|---|---|
| MM |  |  |  |  |  |  |  |  |
| SC |  |  |  |  |  |  |  |  |
| DM |  |  |  |  |  |  |  |  |
| AT |  |  |  |  |  |  |  |  |
| HS |  |  |  |  |  |  |  |  |
Fig. 2 Coverage of NetCoffee, IsoRankN, multiMAGNA++ and NetCoffee2 on the eight test datasets. Coverage was measured as the percentage of proteins included in the alignments.
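Coverage as described in the caption can be computed with a few lines of Python. This is a sketch under two assumptions that the caption does not state: the denominator is the total number of proteins across all input networks, and only matchsets containing at least two proteins count as aligned. The names `coverage`, `match_sets` and `total_proteins`, and the toy data, are hypothetical.

```python
from typing import Iterable, Set


def coverage(match_sets: Iterable[Set[str]], total_proteins: int) -> float:
    """Percentage of input proteins that fall into some alignment matchset.

    ASSUMPTIONS: singleton sets are not counted as aligned, and
    total_proteins is the number of proteins in all input networks.
    """
    aligned: Set[str] = set()
    for s in match_sets:
        if len(s) >= 2:  # a singleton is not an alignment
            aligned |= s
    return 100.0 * len(aligned) / total_proteins


# Toy alignment of two hypothetical networks with 10 proteins in total.
toy_sets = [{"m1", "s1"}, {"m2", "s2", "s3"}, {"m4"}]
print(coverage(toy_sets, total_proteins=10))  # 50.0
```

Five of the ten toy proteins sit in matchsets of size two or more, so the coverage is 50%; counting the singleton {"m4"} would raise it to 60%, which is why the singleton convention matters when comparing tools.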
Consistency was measured by mean entropy (ME) and mean normalized entropy (MNE)
| Algorithm | Consistency | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| IsoRankN | ME | 1.09 | 1.07 | 1.15 | 1.07 | 1.18 | 1.20 | 1.19 | 1.20 | 1.144 |
|  | MNE | 0.58 | 0.56 | 0.58 | 0.59 | 0.53 | 0.60 | 0.60 | 0.58 | 0.58 |
| NetCoffee | ME | * | * | 0.99 | 0.85 | 1.05 | 1.00 | 1.07 | 1.17 | 1.022 |
|  | MNE | * | * | 0.54 | 0.54 | 0.53 | 0.55 | 0.58 | 0.57 | 0.55 |
| multiMAGNA++ | ME | 0.94 | 0.94 | 0.91 | 0.93 | 0.98 | 1.00 | 1.16 | 1.18 | 1.005 |
|  | MNE | 0.55 | 0.54 | 0.53 | 0.58 | 0.52 | 0.57 | 0.63 | 0.59 | 0.56 |
| NetCoffee2 | ME | 1.04 | 0.73 | 0.94 | 0.87 | 1.04 | 1.05 | 1.01 | 1.10 | 0.973 |
|  | MNE | 0.54 | 0.46 | 0.52 | 0.54 | 0.52 | 0.55 | 0.56 | 0.55 | 0.53 |
Smaller ME and MNE values indicate a more functionally coherent matchset. *No NetCoffee results are reported for D1 and D2 because NetCoffee cannot be applied to pairwise network alignment; its averages are therefore taken over D3 to D8.
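ME and MNE can be illustrated with the entropy definition commonly used for this metric in the network-alignment literature: for a matchset S whose proteins carry GO terms t_1 to t_d with relative frequencies p_i, H(S) = −Σ p_i log p_i, the normalized entropy is H(S)/log d, and ME and MNE average these over matchsets. The sketch below assumes that definition; it also assumes that each (protein, term) annotation is counted once so the p_i sum to 1, uses the natural logarithm, sets the normalized entropy to 0 when only one term occurs, and skips unannotated matchsets. The paper's exact conventions may differ, and all identifiers and GO terms here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional, Set, Tuple


def matchset_entropy(
    match_set: Set[str], annotations: Dict[str, Set[str]]
) -> Optional[Tuple[float, float]]:
    """Return (entropy, normalized entropy) of one matchset, or None if unannotated."""
    counts = Counter(t for p in match_set for t in annotations.get(p, set()))
    total = sum(counts.values())
    if total == 0:
        return None  # ASSUMPTION: unannotated matchsets are skipped
    h = sum(-p * math.log(p) for p in (c / total for c in counts.values()))
    d = len(counts)
    nh = h / math.log(d) if d > 1 else 0.0  # ASSUMPTION: a single term gives 0
    return h, nh


def mean_entropies(
    match_sets: Iterable[Set[str]], annotations: Dict[str, Set[str]]
) -> Tuple[float, float]:
    """Mean entropy (ME) and mean normalized entropy (MNE) over annotated matchsets."""
    scores = [
        s for s in (matchset_entropy(m, annotations) for m in match_sets) if s is not None
    ]
    if not scores:
        raise ValueError("no annotated matchsets")
    me = sum(h for h, _ in scores) / len(scores)
    mne = sum(nh for _, nh in scores) / len(scores)
    return me, mne


# Hypothetical annotations: the first matchset agrees, the second is split.
toy_annotations = {"m1": {"termA"}, "s1": {"termA"}, "m2": {"termA"}, "s2": {"termB"}}
toy_sets = [{"m1", "s1"}, {"m2", "s2"}]
me, mne = mean_entropies(toy_sets, toy_annotations)
print(round(me, 3), round(mne, 3))  # 0.347 0.5
```

The first toy matchset shares a single term and scores 0, while the second splits evenly between two terms and reaches the maximum normalized entropy of 1; this is why smaller values indicate greater functional coherence.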