Abstract
BACKGROUND: A profile-comparison method with position-specific scoring matrix (PSSM) is among the most accurate alignment methods. Currently, cosine similarity and correlation coefficients are used as scoring functions of dynamic programming to calculate similarity between PSSMs. However, it is unclear whether these functions are optimal for profile alignment methods. By definition, these functions cannot capture nonlinear relationships between profiles. Therefore, we attempted to discover a novel scoring function, which was more suitable for the profile-comparison method than existing functions, using neural networks.
Keywords: Derivative-free optimization; Dynamic programming; Evolutionary strategy; Neural network; Profile alignment
Year: 2018 PMID: 29467815 PMCID: PMC5815186 DOI: 10.1186/s13015-018-0123-6
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
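The two existing scoring functions named in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and toy vectors are hypothetical, and a real PSSM column would hold one value per amino acid. The last example uses two vectors with an exact quadratic relationship whose correlation coefficient is zero, which is the sense in which such functions miss nonlinear relationships.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Raises ZeroDivisionError if either vector has zero norm.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def correlation_coefficient(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors (hypothetical values, shorter than a real profile column).
linear = correlation_coefficient([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# b is exactly a squared, yet the linear correlation vanishes.
a = [-2.0, -1.0, 0.0, 1.0, 2.0]
b = [x * x for x in a]
quadratic = correlation_coefficient(a, b)
```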
Fig. 1 Schematic diagram of the learning network. Uppercase letters in bold italics, lowercase letters in bold italics, and lowercase letters in italics represent matrix, vector, and scalar values, respectively. The network comprises two input vectors (one for each PSSV), two first-layer weight matrices and a second-layer weight vector w2, a bias vector b1 and a bias scalar b2, a middle layer vector, and an output value y (the similarity score between PSSV A and PSSV B). The activation function is represented by φ(). The square bracket represents the index of each vector
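One plausible reading of the network in Fig. 1 is a single middle layer fed by both input vectors, followed by a weighted sum. The sketch below follows that reading under stated assumptions: the names `x_a`, `x_b`, `W1a`, `W1b`, and `u` are hypothetical labels, the way the two inputs are combined is inferred from the caption rather than confirmed, and `tanh` is only a stand-in because the caption does not say which activation function φ is.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def similarity_score(x_a, x_b, W1a, W1b, b1, w2, b2, phi=math.tanh):
    """Forward pass: u = phi(W1a x_a + W1b x_b + b1), y = w2 . u + b2.

    All parameter names are illustrative; phi defaults to tanh as an
    assumption, not as the activation used in the source.
    """
    pre_activation = [
        p + q + b
        for p, q, b in zip(matvec(W1a, x_a), matvec(W1b, x_b), b1)
    ]
    u = [phi(v) for v in pre_activation]          # middle layer vector
    return sum(w * h for w, h in zip(w2, u)) + b2  # scalar output y


# Toy two-dimensional example with an identity activation.
identity = [[1.0, 0.0], [0.0, 1.0]]
y = similarity_score(
    x_a=[1.0, 0.0], x_b=[0.0, 1.0],
    W1a=identity, W1b=identity, b1=[0.0, 0.0],
    w2=[1.0, 1.0], b2=0.5,
    phi=lambda v: v,
)
```

Because the output is a single scalar per pair of vectors, a function of this shape can be dropped in wherever cosine similarity or the correlation coefficient would otherwise score a pair of profile positions.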
Gap optimization of the existing scoring function
| | Open | Extension | Sensitivity | Precision |
|---|---|---|---|---|
| Cosine | −1.0 | −0.1 | 0.6837 | 0.6550 |
| CC | −1.5 | −0.1 | 0.6882 | 0.6613 |
Open and extension indicate optimized open- and extension-gap penalties, respectively, and cosine and CC represent aligners using cosine similarity and correlation coefficient as scoring functions, respectively
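To show where the open- and extension-gap penalties enter, here is a minimal sketch of dynamic programming with affine gaps (a Gotoh-style recurrence). Several things are assumptions rather than details from the source: the alignment is global, a gap of length k costs `gap_open + (k - 1) * gap_extend`, and the function and argument names are hypothetical. The `score` argument stands for any pairwise scoring function between two profile positions.

```python
NEG_INF = float("-inf")


def affine_alignment_score(profile_a, profile_b, score, gap_open, gap_extend):
    """Best global alignment score with affine gap penalties.

    gap_open and gap_extend are negative numbers, as in the table above.
    """
    n, m = len(profile_a), len(profile_b)
    # M: last column is a match; X: gap in b; Y: gap in a.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = score(profile_a[i - 1], profile_b[j - 1])
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


def toy_score(p, q):
    """Toy stand-in for a profile scoring function: +1 if equal, else -1."""
    return 1.0 if p == q else -1.0


best = affine_alignment_score([1, 2, 3, 4], [1, 4], toy_score, -1.0, -0.1)
```

In the toy call, the best alignment matches the two shared positions and bridges the middle with one gap of length two, so the penalties contribute one opening and one extension.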
Comparison of Nepal with other alignment methods
| | Remote | Medium | All |
|---|---|---|---|
| Sensitivity | | | |
| Nepal | 0.5317 | 0.8343 | 0.7012 |
| Cosine | 0.5045** | 0.8246** | 0.6838** |
| CC | 0.5135** | 0.8269** | 0.6891** |
| MIQS | 0.2775** | 0.7316** | 0.5319** |
| BL62 | 0.2333** | 0.6955** | 0.4923** |
| Precision | | | |
| Nepal | 0.5031 | 0.8102 | 0.6751 |
| Cosine | 0.4753** | 0.7999** | 0.6571** |
| CC | 0.4858** | 0.8032** | 0.6636** |
| MIQS | 0.2654** | 0.7134** | 0.5164** |
| BL62 | 0.2317** | 0.6902** | 0.4885** |
Cosine and CC indicate profile-comparison methods using cosine similarity and the correlation coefficient, respectively; MIQS and BL62 indicate sequence-comparison methods using the MIQS and BLOSUM62 substitution matrices, respectively
** P < 0.01, Wilcoxon signed rank test with Bonferroni correction
a Sequence identity (%) of each division
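The record does not define sensitivity and precision. A common convention in alignment benchmarking, assumed here, is that sensitivity is the fraction of reference-aligned residue pairs recovered by the predicted alignment, and precision is the fraction of predicted pairs that appear in the reference. The sketch below follows that convention; the function name and the pair representation are hypothetical.

```python
def sensitivity_precision(predicted_pairs, reference_pairs):
    """Score a predicted alignment against a reference alignment.

    Each alignment is an iterable of (index_in_a, index_in_b) tuples for
    aligned residue pairs. This definition is an assumption, not taken
    from the source.
    """
    predicted = set(predicted_pairs)
    reference = set(reference_pairs)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Toy example: two of three predicted pairs are in a four-pair reference.
sens, prec = sensitivity_precision(
    predicted_pairs=[(0, 0), (1, 1), (2, 3)],
    reference_pairs=[(0, 0), (1, 1), (2, 2), (3, 3)],
)
```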
Fig. 2 (a) Absolute connection weight for each attribute corresponding to the profile value of each amino acid. Filled and open bars represent positive and negative signs of the original connection weights, respectively. (b) The propensity for the residue to be buried within the protein
Fig. 3 Transition of similarity scores depending on site swapping. In each panel, the two vectors shown represent PSSV A and PSSV B. The middle panel represents the original PSSVs and the similarity scores calculated using the correlation coefficient (CC) and Nepal. The top and bottom panels show the resulting PSSVs and similarity scores