Bi-Qing Li, Yu-Chao Zhang, Guo-Hua Huang, Wei-Ren Cui, Ning Zhang, Yu-Dong Cai.
Abstract
Aptamers are oligonucleic acid or peptide molecules that bind to specific target molecules. As a novel and powerful class of ligands, aptamers are thought to have excellent potential for applications in biosensing, diagnostics and therapeutics. In this study, a new method for predicting aptamer-target interacting pairs was proposed by integrating features derived from both aptamers and their targets. Aptamers were represented by nucleotide composition features, and targets by traditional amino acid composition and pseudo amino acid composition features. The predictor was constructed based on Random Forest, and the optimal features were selected by using the maximum relevance minimum redundancy (mRMR) method and the incremental feature selection (IFS) method. As a result, an accuracy of 81.34% and an MCC of 0.4612 were obtained for the training dataset, and an accuracy of 77.41% and an MCC of 0.3717 were achieved for the testing dataset. An optimal feature set of 220 features was selected; these features were considered to contribute significantly to the prediction of interacting aptamer-target pairs. Analysis of the optimal feature set indicated several important factors in determining aptamer-target interactions. It is anticipated that our prediction method may become a useful tool for identifying aptamer-target pairs, and that the features selected and analyzed in this study may provide useful insights into the mechanism of interactions between aptamers and targets.
Year: 2014 PMID: 24466214 PMCID: PMC3899287 DOI: 10.1371/journal.pone.0086729
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. IFS curve showing MCC against the number of features used, based on the data in Additional File S5.
When the first 220 features in the ranked feature list were used, MCC reached its maximum of 0.4612. These 220 features were taken as the optimal feature set for our prediction problem.
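The IFS step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `incremental_feature_selection`, the `score_subset` callback and the toy scorer are all hypothetical. In the study, the score would presumably be the MCC of a Random Forest trained on each feature subset, and the ranked list would come from mRMR.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
) -> Tuple[List[str], List[float]]:
    """Score every top-k prefix of a ranked feature list and keep the best prefix.

    ranked_features: features ordered from most to least important
                     (e.g. by mRMR; the ranking itself is not computed here).
    score_subset:    hypothetical callback returning a quality score
                     (e.g. cross-validated MCC) for a feature subset.
    """
    curve = [
        score_subset(ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    # max() returns the first maximum, so ties favour the smaller subset.
    best_k = max(range(len(curve)), key=curve.__getitem__) + 1
    return list(ranked_features[:best_k]), curve


def toy_score(subset: Sequence[str]) -> float:
    """Toy stand-in for a classifier score: rewards three 'informative'
    features and slightly penalises every other feature."""
    informative = {"f1", "f2", "f3"}
    hits = sum(1 for f in subset if f in informative)
    return hits - 0.1 * (len(subset) - hits)


ranked = ["f1", "f2", "f3", "f4", "f5"]
best_subset, ifs_curve = incremental_feature_selection(ranked, toy_score)
print(best_subset)
```

The returned `ifs_curve` is the list of scores that a plot such as Figure 1 would show, with the selected subset ending at the peak.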
Prediction performance on training dataset and testing dataset.
| Dataset | Sn | Sp | Ac | MCC |
| --- | --- | --- | --- | --- |
| Training dataset | 0.4879 | 0.9218 | 0.8134 | 0.4612 |
| Testing dataset | 0.4828 | 0.8713 | 0.7741 | 0.3717 |
Sn: sensitivity.
Sp: specificity.
Ac: accuracy.
MCC: Matthews correlation coefficient.
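For reference, the four measures can be computed from a binary confusion matrix as in the sketch below, using their standard definitions. The function names and the example counts are hypothetical, not data from the study. The second helper relies on the identity Ac = p·Sn + (1 − p)·Sp, where p is the fraction of positive samples, to show what class balance the reported values imply, assuming all three values were computed on the same set of predictions.

```python
import math
from typing import Dict


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Standard Sn, Sp, Ac and MCC from confusion-matrix counts."""
    sn = _safe_div(tp, tp + fn)                   # sensitivity
    sp = _safe_div(tn, tn + fp)                   # specificity
    ac = _safe_div(tp + tn, tp + fn + tn + fp)    # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, denom)     # Matthews correlation coefficient
    return {"Sn": sn, "Sp": sp, "Ac": ac, "MCC": mcc}


def implied_positive_fraction(sn: float, sp: float, ac: float) -> float:
    """Solve Ac = p*Sn + (1 - p)*Sp for p, the fraction of positive samples."""
    return (sp - ac) / (sp - sn)


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fn=10, tn=45, fp=5)

# Values reported in the table above.
p_train = implied_positive_fraction(sn=0.4879, sp=0.9218, ac=0.8134)
p_test = implied_positive_fraction(sn=0.4828, sp=0.8713, ac=0.7741)
print(round(p_train, 3), round(p_test, 3))
```

Both datasets come out at roughly one positive pair for every three negative pairs. With that imbalance, accuracy is dominated by the negative class, so MCC is the more informative summary.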
Figure 2. The feature extraction procedure for the aptamer-target pair 21402046-AlkB-12: AlkB.
The sequences of the aptamer and the target are shown, from which 290 features were extracted. The IFS procedure then selected 220 of these 290 features as the final optimal feature set, forming a 220-dimensional vector used as input to the model.
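A composition-based encoding of an aptamer-target pair might look like the sketch below. It is illustrative only: this record does not specify the exact encoding that yields 290 features, so the choice of mono- and dinucleotide composition, the function names and the example sequences are assumptions, and pseudo amino acid composition is omitted. The sketch produces a 40-dimensional vector, not the 290 features used in the study.

```python
from collections import Counter
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq: str, k: int, alphabet: str) -> List[float]:
    """Relative frequency of every length-k word over `alphabet` in `seq`.

    Words containing symbols outside the alphabet are ignored.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[w] for w in words)
    return [counts[w] / total if total else 0.0 for w in words]


def pair_features(aptamer: str, target: str) -> List[float]:
    """Concatenate aptamer and target composition features into one vector.

    Assumption: RNA aptamers are mapped to the DNA alphabet (U -> T).
    """
    aptamer = aptamer.upper().replace("U", "T")
    return (
        kmer_composition(aptamer, 1, NUCLEOTIDES)    # 4 mononucleotide features
        + kmer_composition(aptamer, 2, NUCLEOTIDES)  # 16 dinucleotide features
        + kmer_composition(target, 1, AMINO_ACIDS)   # 20 amino acid features
    )


# Hypothetical sequences, not taken from the study.
vector = pair_features("ACGUACGGUA", "MKVLAAGIVG")
print(len(vector))
```

Each composition block sums to 1, so aptamers and targets of different lengths map to vectors on the same scale.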
Figure 3. Histograms showing the distributions of the 220 optimal features.