Jing Sun, Runyu Jing, Di Wu, Tuanfei Zhu, Menglong Li, Yizhou Li.
Abstract
The main objective of this study is to explore how complex networks, under different definitions of vertices and edges, can describe the structure of proteins. A protein folds into a specific conformation for its function depending on interactions between residues. Consequently, many studies have treated a protein structure as a complex system in which the individual components (residues) are vertices and the interactions between residues are edges. What is the proper way to represent a protein structure as a network? To assess the effect of different definitions of vertices and edges in constructing amino acid interaction networks, protein domains, the structural units of proteins, were described using this method. The identification performance on 2847 proteins with one or more domains showed that protein structure was described well when R_Cα was around 5.0-7.5 Å. In this study, the optimal cutoff for constructing the protein structure networks was 5.0 Å (Cα-Cα distance), and the best community division method was community structure detection based on edge betweenness.
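The network construction described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `build_contact_network`, the input format (a list of Cα coordinates), and the toy coordinates are all assumptions, and reading coordinates from a PDB file is not shown.

```python
import math
from itertools import combinations

def build_contact_network(ca_coords, cutoff=5.0):
    """Return an adjacency dict: residue index -> set of neighbour indices.

    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms, one per
    residue (hypothetical input format).
    cutoff: largest C-alpha to C-alpha distance, in angstroms, that still
    counts as an edge.
    """
    adjacency = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency

# Toy chain (made-up coordinates): four residues spaced 3.8 A apart on a line
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

network_5 = build_contact_network(toy_coords, cutoff=5.0)  # sequence neighbours only
network_8 = build_contact_network(toy_coords, cutoff=8.0)  # also links i and i+2
```

Raising the cutoff only ever adds edges, so the choice of cutoff directly controls how dense the network is before any community division is applied.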
Year: 2013 PMID: 23533536 PMCID: PMC3600232 DOI: 10.1155/2013/365410
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
The composition of proteins contained in the dataset.
| Number of domains | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Number of proteins | 1450 | 1077 | 230 | 66 | 19 | 3 | 2 |
Figure 1. The amino acid residue interaction network for PDB code 1BZ7, chain A. The 3D structure is shown above, together with its side, top, and upward views. Here, each vertex is a Cα atom, and an edge joins two vertices whose Cα-Cα distance is within the cutoff, which is set at 7.5 Å. Each point in the figure represents an amino acid in the chain, which is also a vertex of the network. Lines between the vertices are the edges of the network and illustrate the interactions between amino acids. To allow comparison between the community division and this complex network, each vertex is colored according to its domain assignment in SCOP. Here, reddish purple and blue represent different domain regions in this chain.
Figure 2. Flowchart of the amino acid interaction network together with the community division method, for PDB code 1BZ7, chain A. Each point in the figure represents an amino acid in the chain, which is also a vertex of the network. Lines between the vertices are the edges of the network and illustrate the interactions between amino acids. Reddish purple and blue represent different domain regions in this chain according to the assignment in SCOP. First, an amino acid complex network was constructed with each vertex defined as a Cα atom and edges defined by Cα-Cα distance with the cutoff set at 7.5 Å, as shown in (a). Second, community division was carried out based on edge betweenness, and the edge with the highest edge betweenness score was removed, as shown in (b). Third, more edges were removed according to the algorithm; (c) shows the network after three edges were removed. Fourth, the community division was finished when the correct number of edges had been removed, as shown in (d): the two domains are clearly separated, and five edges were removed for this protein. Finally, if the community division is continued, more communities are found in the complex network. (e) shows the result for chain A of protein 1BZ7 after removing 500 edges; the many additional communities are wrong according to the assignment in SCOP.
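The edge-removal procedure in the caption can be sketched as follows. This is a plain-Python illustration of community division by edge betweenness, not the authors' implementation: the function names, the toy graph, and the stopping rule (stop once a requested number of communities exists) are assumptions made for the example.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search that counts shortest paths from the source
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1.0, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back towards the source, crediting each edge
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was visited from both ends, so halve the totals
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adj):
    """Connected components of the graph, as a list of vertex sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in part:
                    part.add(w)
                    stack.append(w)
        seen |= part
        parts.append(part)
    return parts

def split_into_communities(adj, n_communities):
    """Remove the highest-betweenness edge repeatedly until the graph has
    n_communities components. Returns (components, edges_removed)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    removed = 0
    while len(components(adj)) < n_communities:
        scores = edge_betweenness(adj)
        if not scores:  # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        removed += 1
    return components(adj), removed

# Toy graph (made up): two triangles joined by the single bridge 2-3
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
parts, removed = split_into_communities(toy, n_communities=2)
```

Betweenness is recomputed after every removal, because deleting one edge reroutes the shortest paths that used it. Continuing past the target number of communities keeps splitting the graph, which is the over-division that panel (e) illustrates.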
The accuracies of all proteins defined by R_Cα based on edge betweenness.
| Threshold | Accuracy |
|---|---|
| 3 Å | 2.15 |
| 3.5 Å | 2.17 |
| 4 Å | 78.96 |
| 4.5 Å | 83.42 |
| 5 Å | 86.68 |
| 5.5 Å | 86.45 |
| 6 Å | 85.54 |
| 6.5 Å | 85.76 |
| 7 Å | 86.21 |
| 7.5 Å | 85.92 |
| 8 Å | 85.21 |
| 8.5 Å | 84.75 |
| 9 Å | 84.28 |
| 9.5 Å | 83.71 |
| 10 Å | 83.86 |
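As a small worked check of the table above, the following sketch picks out the cutoff with the highest accuracy and the largest jump between adjacent cutoffs. The values are copied from the table; the helper names are hypothetical.

```python
# Accuracy by C-alpha cutoff (angstroms), copied from the table above
accuracy_by_cutoff = {
    3.0: 2.15, 3.5: 2.17, 4.0: 78.96, 4.5: 83.42, 5.0: 86.68,
    5.5: 86.45, 6.0: 85.54, 6.5: 85.76, 7.0: 86.21, 7.5: 85.92,
    8.0: 85.21, 8.5: 84.75, 9.0: 84.28, 9.5: 83.71, 10.0: 83.86,
}

def best_cutoff(table):
    """Return the (cutoff, accuracy) pair with the highest accuracy."""
    return max(table.items(), key=lambda item: item[1])

def largest_jump(table):
    """Return the adjacent pair of cutoffs with the biggest accuracy gain."""
    cutoffs = sorted(table)
    pairs = zip(cutoffs, cutoffs[1:])
    return max(pairs, key=lambda p: table[p[1]] - table[p[0]])

print(best_cutoff(accuracy_by_cutoff))   # (5.0, 86.68)
print(largest_jump(accuracy_by_cutoff))  # (3.5, 4.0)
```

The maximum at 5.0 Å matches the optimal cutoff reported in the abstract. The sharp rise between 3.5 Å and 4.0 Å is consistent with the typical spacing of about 3.8 Å between consecutive Cα atoms: below that distance a Cα network would have almost no edges. That reading is an interpretation, not a statement from the source.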
The accuracies of all proteins defined by R_cent based on edge betweenness.
| Threshold | Accuracy |
|---|---|
| 3 Å | 2.14 |
| 3.5 Å | 2.59 |
| 4 Å | 3.79 |
| 4.5 Å | 7.42 |
| 5 Å | 33.99 |
| 5.5 Å | 78.87 |
| 6 Å | 84.53 |
| 6.5 Å | 85.04 |
| 7 Å | 85.16 |
| 7.5 Å | 85.52 |
| 8 Å | 84.89 |
| 8.5 Å | 84.48 |
| 9 Å | 83.83 |
| 9.5 Å | 83.56 |
| 10 Å | 83.40 |
The accuracies of all proteins defined by R_atom based on edge betweenness.
| Threshold | Accuracy |
|---|---|
| 0 Å | 85.06 |
| 0.5 Å | 85.36 |
| 1.0 Å | 85.58 |
| 1.5 Å | 85.59 |
| 2 Å | 85.06 |
| 2.5 Å | 84.39 |
| 3 Å | 83.73 |
| 3.5 Å | 83.50 |
| 4 Å | 83.95 |
| 4.5 Å | 83.93 |
| 5 Å | 83.51 |
| 5.5 Å | 83.45 |
| 6 Å | 83.31 |
Acc(R_Cα) and Acc(R_cent) of all proteins based on random walks with a 7 Å cutoff, at different step sizes.
| Step size | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| Acc(R_Cα) | 77.37 | 78.56 | 79.84 | 80.21 | 80.93 | 81.23 | 81.43 | 81.93 |
| Acc(R_cent) | 76.39 | 77.62 | 78.56 | 79.12 | 79.64 | 80.05 | 80.13 | 80.70 |
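To illustrate what a step size means for a random-walk method, the sketch below computes where a walker starting at one vertex is likely to be after a fixed number of steps. It assumes the step size is the walk length, as in Walktrap-style community detection; the source does not name the algorithm, so this is an illustration of the general idea, and the function name and toy graph are hypothetical.

```python
def walk_distribution(adj, start, steps):
    """Probability of being at each vertex after `steps` moves of a simple
    random walk that starts at `start` and picks a neighbour uniformly."""
    prob = {v: 0.0 for v in adj}
    prob[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in prob.items():
            if p == 0.0 or not adj[v]:
                nxt[v] += p  # isolated vertex: the walker stays put
                continue
            share = p / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        prob = nxt
    return prob

# Toy graph (made up): two triangles joined by the single bridge 2-3
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}

after_3 = walk_distribution(toy_graph, start=0, steps=3)
own_side = after_3[0] + after_3[1] + after_3[2]  # mass left in the starting triangle
```

Short walks tend to stay inside a densely connected group, so vertices whose walk distributions look alike are candidates for the same community. The walk length sets how far that comparison looks across the network.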
The accuracies of all proteins defined by R_Cα based on random walks.
| Threshold | Accuracy |
|---|---|
| 3 Å | 0 |
| 3.5 Å | 0 |
| 4 Å | 67.14 |
| 4.5 Å | 69.65 |
| 5 Å | 73.84 |
| 5.5 Å | 79.87 |
| 6 Å | 80.39 |
| 6.5 Å | 81.09 |
| 7 Å | 81.93 |
| 7.5 Å | 81.85 |
| 8 Å | 80.97 |
| 8.5 Å | 80.48 |
| 9 Å | 80.46 |
| 9.5 Å | 79.95 |
| 10 Å | 79.71 |
The accuracies of all proteins defined by R_cent based on random walks.
| Threshold | Accuracy |
|---|---|
| 3 Å | 0 |
| 3.5 Å | 0 |
| 4 Å | 0 |
| 4.5 Å | 0 |
| 5 Å | 0 |
| 5.5 Å | 5.05 |
| 6 Å | 59.20 |
| 6.5 Å | 78.34 |
| 7 Å | 80.63 |
| 7.5 Å | 80.63 |
| 8 Å | 80.77 |
| 8.5 Å | 80.20 |
| 9 Å | 79.60 |
| 9.5 Å | 79.64 |
| 10 Å | 79.41 |
The accuracies of all proteins defined by R_atom based on random walks.
| Threshold | Accuracy |
|---|---|
| 0 Å | 80.39 |
| 0.5 Å | 80.58 |
| 1.0 Å | 80.82 |
| 1.5 Å | 80.70 |
| 2 Å | 80.79 |
| 2.5 Å | 80.08 |
| 3 Å | 79.55 |
| 3.5 Å | 79.35 |
| 4 Å | 79.24 |
| 4.5 Å | 78.98 |
| 5 Å | 78.68 |
| 5.5 Å | 78.36 |
| 6 Å | 77.49 |
The optimal accuracies of each dataset based on edge betweenness.
| Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Threshold (R_Cα) | 7.00 Å | 5.50 Å | 5.50 Å | 5.00 Å | 5.50 Å | 5.00 Å | 5.50 Å | 5.50 Å |
| Accuracy (R_Cα) | 84.67 | 89.08 | 87.07 | 86.52 | 87.35 | 87.26 | 86.95 | 86.50 |
| Threshold (R_cent) | 6.50 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.50 Å |
| Accuracy (R_cent) | 82.51 | 86.93 | 86.50 | 85.74 | 86.17 | 86.58 | 85.85 | 85.49 |
| Threshold (R_atom) | 1.00 Å | 1.00 Å | 0.50 Å | 1.00 Å | 1.50 Å | 1.00 Å | 1.00 Å | 1.00 Å |
| Accuracy (R_atom) | 82.89 | 87.54 | 86.24 | 86.13 | 86.94 | 86.61 | 85.61 | 85.80 |
The optimal accuracies of each dataset based on random walks.
| Dataset | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Threshold (R_Cα) | 6.00 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.50 Å | 7.00 Å | 7.50 Å | 7.00 Å |
| Step size (R_Cα) | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| Accuracy (R_Cα) | 75.34 | 85.00 | 82.46 | 81.61 | 83.20 | 83.39 | 82.25 | 81.93 |
| Threshold (R_cent) | 7.00 Å | 7.00 Å | 8.00 Å | 8.00 Å | 8.00 Å | 8.00 Å | 7.50 Å | 7.00 Å |
| Step size (R_cent) | 10 | 10 | 10 | 10 | 9 | 10 | 10 | 10 |
| Accuracy (R_cent) | 74.62 | 84.95 | 80.97 | 80.89 | 81.84 | 82.67 | 80.61 | 80.79 |
| Threshold (R_atom) | 0.50 Å | 1.50 Å | 0.50 Å | 1.00 Å | 1.50 Å | 1.00 Å | 1.00 Å | 1.00 Å |
| Step size (R_atom) | 10 | 10 | 10 | 9 | 10 | 10 | 10 | 10 |
| Accuracy (R_atom) | 74.85 | 84.66 | 81.20 | 81.11 | 82.36 | 82.97 | 81.45 | 80.95 |