ShaoPeng Wang, Yu-Hang Zhang, Jing Lu, Weiren Cui, Jerry Hu, Yu-Dong Cai.
Abstract
Advances in biochemistry and molecular biology have revealed an increasingly important role for compounds in several biological processes. Like aptamer-protein interactions, aptamer-compound interactions are attracting growing attention. However, selecting suitable aptamers against compounds with traditional methods, such as systematic evolution of ligands by exponential enrichment (SELEX), is time-consuming. Effective computational methods for finding aptamers against compounds are therefore urgently needed. This study extracted important features of aptamer-compound interactions using two feature selection methods, Maximum Relevance Minimum Redundancy (mRMR) and incremental feature selection (IFS). Each aptamer-compound pair was represented by properties derived from both partners: the frequencies of single nucleotides and dinucleotides for the aptamer, and the constitutional, electrostatic, quantum-chemical, and space conformational descriptors for the compound. To confirm the importance of the selected features, we further discuss their associations with aptamer-compound interactions. In addition, an optimal prediction model based on the nearest neighbor algorithm was built to identify aptamer-compound interactions; it has the potential to be a useful tool for identifying novel aptamer-compound interactions. The program is available upon request.
Year: 2016 PMID: 26955638 PMCID: PMC4756144 DOI: 10.1155/2016/8351204
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Distribution of the features investigated in this study.
| Feature type | Number of features |
|---|---|
| Features of aptamer | |
| Frequency of single nucleotide | 4 |
| Frequency of dinucleotide | 16 |
| Features of compound | |
| Constitutional | 24 |
| Electrostatic | 57 |
| Geometrical | 12 |
| Quantum-chemical | 171 |
| Topological | 37 |
| Total | 321 |
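The 20 aptamer features in the table above (4 single-nucleotide and 16 dinucleotide frequencies) can be illustrated with a minimal sketch. The record does not state how the frequencies were normalized, so the code assumes that counts are divided by the number of positions (or overlapping adjacent pairs) and that the alphabet is A, C, G, T, with U mapped to T for RNA aptamers. The function name `aptamer_features` is hypothetical and is not taken from the authors' program.

```python
from itertools import product

# Assumption: a DNA alphabet; RNA sequences are handled by mapping U to T.
NUCLEOTIDES = "ACGT"


def aptamer_features(seq):
    """Return 4 single-nucleotide followed by 16 dinucleotide frequencies."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)

    # Single-nucleotide frequencies: count divided by sequence length.
    mono = [seq.count(base) / n if n else 0.0 for base in NUCLEOTIDES]

    # Overlapping adjacent pairs, e.g. "ACGT" -> ["AC", "CG", "GT"].
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    total = len(pairs)

    # Dinucleotide frequencies in the fixed order AA, AC, AG, AT, CA, ...
    di = [
        pairs.count(a + b) / total if total else 0.0
        for a, b in product(NUCLEOTIDES, repeat=2)
    ]
    return mono + di


features = aptamer_features("ACGT")
print(len(features))  # 20
```

Collecting overlapping pairs explicitly avoids the non-overlapping behaviour of `str.count` on two-character patterns, which would undercount runs such as "AAA".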
Distribution of the top 10% features in the MaxRel features list.
| Feature type | Number of features | Feature names |
|---|---|---|
| Features of aptamer | 0 | - |
| Constitutional | 2 | Number of double bonds; number of O atoms |
| Electrostatic | 11 | DPSA-1 difference in CPSAs (PPSA1-PNSA1) [Zefirov's PC]; HA dependent HDCA-2 [Zefirov's PC]; Max partial charge for H atom [Zefirov's PC]; PNSA-3 atomic charge weighted PNSA [Zefirov's PC]; HACA-2 [Zefirov's PC]; HACA-1 [Zefirov's PC]; min(#HA_#HD) [Zefirov's PC]; count of H-acceptor sites [Zefirov's PC]; HA dependent HDSA-1/TMSA [Zefirov's PC]; DPSA-3 difference in CPSAs (PPSA3-PNSA3) [Zefirov's PC]; HA dependent HDCA-1 [Zefirov's PC] |
| Geometrical | 0 | - |
| Quantum-chemical | 18 | Tot dipole of the molecule; tot point-charge comp. of the molecular dipole; ESP-HA dependent HDSA-2 [quantum-chemical PC]; ESP-HA dependent HDCA-2 [quantum-chemical PC]; ESP-HACA-2 [quantum-chemical PC]; HA dependent HDSA-2 [quantum-chemical PC]; final heat of formation; ESP-Max net atomic charge for H atom; ESP-DPSA-1 difference in CPSAs (PPSA1-PNSA1) [quantum-chemical PC]; HA dependent HDCA-2 [quantum-chemical PC]; HOMO - LUMO energy gap; ESP-HA dependent HDSA-1 [quantum-chemical PC]; min(#HA_#HD) [quantum-chemical PC]; ESP-count of H-acceptor sites [quantum-chemical PC]; ESP-min(#HA_#HD) [quantum-chemical PC]; count of H-acceptor sites [quantum-chemical PC]; DPSA-1 difference in CPSAs (PPSA1-PNSA1) [quantum-chemical PC]; HA dependent HDCA-1 [quantum-chemical PC] |
| Topological | 1 | Average structural information content (order 1) |
Figure 1. The proportion of features listed in the top 10% of the MaxRel features list in each feature type.
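To make the difference between the MaxRel list and the mRMR list concrete, here is a minimal sketch of both rankings. It assumes the standard mutual-information formulation of mRMR (relevance to the class minus mean redundancy with already selected features) and assumes the feature values have already been discretized; the record does not describe the discretization or the exact program used. All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in bits) between two equally long discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def maxrel_ranking(columns, labels):
    """Rank feature indices by relevance to the class label only."""
    return sorted(
        range(len(columns)),
        key=lambda i: mutual_information(columns[i], labels),
        reverse=True,
    )


def mrmr_ranking(columns, labels):
    """Greedy mRMR: maximize relevance minus mean redundancy with selected features."""
    relevance = [mutual_information(col, labels) for col in columns]
    remaining = list(range(len(columns)))
    selected = []

    def score(i):
        if not selected:
            return relevance[i]
        redundancy = sum(
            mutual_information(columns[i], columns[j]) for j in selected
        ) / len(selected)
        return relevance[i] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (illustrative only): feature 1 duplicates feature 0,
# while feature 2 carries complementary information about the class.
labels = [0, 1, 2, 3]
columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(maxrel_ranking(columns, labels))
print(mrmr_ranking(columns, labels))
```

On this toy input all three features are equally relevant, so MaxRel keeps them in their original order, whereas mRMR moves the duplicated feature to the end because it adds nothing beyond the feature already selected.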
Figure 2. Four IFS curves for the four basic prediction engines, with MCC on the y-axis and the number of considered features on the x-axis. The MCC values indicate the performance of prediction models built with different classifiers and different combinations of features to represent interactions. Using NNA as the classifier and the first 80 features in the mRMR features list to represent interactions yields the best performance, with the highest MCC value of 0.670.
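The IFS procedure behind such a curve can be sketched as follows: for each k, a classifier is evaluated on the first k features of the ranked list, and the k with the highest MCC is kept. This sketch makes several assumptions that the record does not confirm: a 1-nearest-neighbor classifier with squared Euclidean distance, leave-one-out evaluation, and binary labels coded as 0 and 1. The function names are hypothetical, and the columns of `X` are assumed to be already ordered by the feature ranking.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def loo_nearest_neighbor(X, y, k):
    """Leave-one-out 1-NN predictions using only the first k ranked features."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in range(k))

    preds = []
    for i in range(len(X)):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(X[i], X[j]),
        )
        preds.append(y[nearest])
    return preds


def ifs_curve(X, y):
    """Return [(k, MCC)] for k = 1 .. number of features."""
    curve = []
    for k in range(1, len(X[0]) + 1):
        preds = loo_nearest_neighbor(X, y, k)
        tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
        tn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 0)
        fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 0)
        fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)
        curve.append((k, mcc(tp, tn, fp, fn)))
    return curve


def best_feature_count(curve):
    """Number of top-ranked features giving the highest MCC."""
    return max(curve, key=lambda point: point[1])[0]


# Toy data (illustrative only): the first feature separates the classes,
# the second one misleads the nearest-neighbor search.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [0.9, 0.9]]
y = [0, 0, 1, 1]
curve = ifs_curve(X, y)
print(curve, best_feature_count(curve))
```

In the reported experiment the analogous maximum was reached at 80 features with an MCC of 0.670; the toy data here only demonstrates the mechanics of the search.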
Predicted results of some specific examples obtained by the optimal prediction model.
| Compound | Aptamer | Predicted class | True class |
|---|---|---|---|
| Arsenate | 20000526-arsenic-5 | Positive | Positive |
| Isoleucine | 15772067-isoleucine-1 | Positive | Positive |
| Dopamine | 9245404-dopamine-4 | Positive | Positive |
| Chitin | 10743940-chitin-5 | Positive | Positive |
| N-Acetylneuraminic acid | 23042406-Neu5Ac-1 | Positive | Positive |
| Isoleucine | 14980623-sialyllactose-1 | Positive | Negative |
| Dopamine | 18983163-ochratoxin A-3 | Positive | Negative |
| Chitin | 10786843-L tyrosine-3 | Positive | Negative |
| Tyrosine | 20000526-arsenic-Ma-1 | Positive | Negative |
| N-Acetylneuraminic acid | 21076782-L-tryptophan-1 | Positive | Negative |
Figure 3. (a) The distribution of the 80 optimal features. (b) The proportion of features among the 80 optimal features in each feature type.