Yizhou Li, Zhining Wen, Jiamin Xiao, Hui Yin, Lezheng Yu, Li Yang, Menglong Li.
Abstract
BACKGROUND: The rapid accumulation of data on non-synonymous single nucleotide polymorphisms (nsSNPs, also called SAPs) should allow us to further our understanding of the underlying disease-associated mechanisms. Here, we use complex networks to study the role of an amino acid in both local and global structures and determine the extent to which disease-associated and polymorphic SAPs differ in terms of their interactions with other residues.
Year: 2011 PMID: 21223604 PMCID: PMC3027113 DOI: 10.1186/1471-2105-12-14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Frequency distributions of (a) degree, (b) closeness, (c) betweenness, and (d) clustering coefficient for disease-associated and polymorphic SAPs. In Figure 1c, the x-axis shows the betweenness values scaled by sequence length. The intervals (bin widths) used for the frequency calculation were 1, 0.03, 1 and 0.1 for panels a to d, respectively.
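The network measures named in the Figure 1 caption can be illustrated on a toy residue network. The sketch below is not the authors' implementation: the contact graph, the function names, and the closeness definition (number of reachable residues divided by the sum of shortest-path lengths to them) are assumptions made for illustration, and betweenness is left out for brevity. It also shows one way to bin values into fixed-width intervals, as done for the frequency distributions.

```python
import math
from collections import Counter, deque


def degree(graph, node):
    """Number of residues in direct contact with `node`."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of neighbor pairs of `node` that are themselves in contact."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def closeness(graph, node):
    """Reachable-node count divided by the summed shortest-path lengths.

    Assumption: the source excerpt does not give the exact definition used.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search on the unweighted graph
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def frequency_distribution(values, width):
    """Map bin index (floor(value / width)) to the fraction of values in it."""
    counts = Counter(math.floor(v / width) for v in values)
    return {b: c / len(values) for b, c in sorted(counts.items())}


# Hypothetical contact graph: keys are residue indices, values are contacts.
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
degrees = [degree(toy_graph, n) for n in sorted(toy_graph)]
degree_freq = frequency_distribution(degrees, width=1)
```

Values that fall exactly on a bin boundary can land in the neighboring bin because of floating-point rounding when `width` is not an integer, so a real analysis may need explicit edge handling.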
Performance for each feature set by 5-fold cross-validation.
| Feature set | Sensitivity (%) | Specificity (%) | ACC (%) | MCC |
|---|---|---|---|---|
| All feature set | 89.8 | 72.7 | 83.0 | 0.64 |
| 37-feature set | 91.0 | 72.3 | 83.6 | 0.65 |
| f-set 1<sup>c</sup> | 79.7 | 65.4 | 74.1 | 0.45 |
| f-set 2<sup>d</sup> | 81.2 | 63.6 | 74.3 | 0.45 |
| f-set 3<sup>e</sup> | 85.4 | 67.0 | 78.1 | 0.54 |
Optimized parameters for random forest are listed in parentheses. Detailed description of each feature set is given in the Results section.
a the optimal value of ntree (the number of trees to be grown).
b the optimal value of mtry (the number of variables selected to determine the decision at a node of the tree).
c the conservation feature set.
d the network feature set.
e the environmental feature set.
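For reference, the four reported measures (sensitivity, specificity, ACC and MCC) follow from the counts of a binary confusion matrix using their standard definitions. The sketch below assumes that disease-associated SAPs are the positive class; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts.

    tp/fn: positives (assumed here: disease-associated SAPs) predicted
           correctly/incorrectly.
    tn/fp: negatives (assumed here: polymorphic SAPs) predicted
           correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fn=10, tn=70, fp=30)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes differ in size.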
Figure 2. Importance score of each feature, determined using the random forest algorithm in the R package. Scores were averaged over 100 runs. Here, the suffixes W, M and N indicate wild-type, mutant and environment, respectively. For environmental features, suffix numbers indicate different neighboring residues. det_Frequency denotes the frequency difference between wild-type and mutant residues, whereas det_PSSM denotes the PSSM score difference between wild-type and mutant residues.
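The two operations mentioned in the Figure 2 caption, a wild-type versus mutant difference feature and importance scores averaged over repeated runs, can be sketched as follows. The sign convention (wild-type minus mutant), the function names, the feature name `degree_W`, and the example scores are all assumptions made for illustration; the caption does not specify them.

```python
def delta_feature(wild_type_value, mutant_value):
    """Difference feature such as det_PSSM or det_Frequency.

    Assumption: computed as wild-type minus mutant; the source does not
    state the direction of the subtraction.
    """
    return wild_type_value - mutant_value


def mean_importance(runs):
    """Average per-feature importance scores over repeated runs.

    `runs` is a list of dicts mapping feature name to importance score;
    every run is assumed to report the same set of features.
    """
    totals = {}
    for run in runs:
        for name, score in run.items():
            totals[name] = totals.get(name, 0.0) + score
    return {name: total / len(runs) for name, total in totals.items()}


# Hypothetical scores from two runs (the caption reports 100 runs).
runs = [
    {"det_PSSM": 0.4, "degree_W": 0.2},
    {"det_PSSM": 0.6, "degree_W": 0.4},
]
averaged = mean_importance(runs)
```

Averaging over many runs reduces the run-to-run variation that comes from the random sampling inside a random forest.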
Performance of different methods based on an independent dataset.
| Methods | Sensitivity (%) | Specificity (%) | ACC (%) | MCC |
|---|---|---|---|---|
| All feature set | 86.6 | 71.9 | 80.8 | 0.59 |
| 37-feature set | 87.3 | 72.1 | 81.3 | 0.60 |
| SIFT | 79.5 | 71.3 | 76.3 | 0.51 |
| PolyPhen-2 | 74.1 | 78.1 | 75.7 | 0.51 |
| Bongo | 21.6 | 84.7 | 46.6 | 0.09 |