Daqing Zhang, Jianfeng Xiao, Nannan Zhou, Mingyue Zheng, Xiaomin Luo, Hualiang Jiang, Kaixian Chen.
Abstract
The blood-brain barrier (BBB) is a highly complex physical barrier that determines which substances can enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and the feature subset selection are the most important factors affecting prediction accuracy. Most studies treat them as two independent problems, although it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize the kernel parameters and the feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Optimizing the SVM parameters and the feature subset simultaneously with a genetic algorithm is therefore a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among these properties, lipophilicity can enhance BBB penetration, while all the others are negatively correlated with it.
Year: 2015 PMID: 26504797 PMCID: PMC4609370 DOI: 10.1155/2015/292683
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Workflow of GA/SVM model for BBB penetration prediction.
Figure 2. Workflow of genetic algorithms.
Figure 3. Encoding of the chromosome.
Figure 4. Mating strategy of GA.
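The abstract describes a genetic algorithm that searches over SVM parameters and the feature subset at the same time. The following is a minimal sketch of that idea, not the authors' implementation: the chromosome layout, the crossover and mutation rules, the parameter ranges in `PARAM_BOUNDS`, and the names `evolve` and `toy_fitness` are all assumptions made for illustration. The three parameters are assumed to be `C`, `gamma`, and `epsilon` of an RBF-kernel SVR, and the fitness function is a stand-in for a cross-validated error.

```python
import random
from typing import Callable, Tuple

# A chromosome carries both halves of the search: the selected feature
# indices and three real-valued SVM parameters (assumed: C, gamma, epsilon).
Chromosome = Tuple[Tuple[int, ...], Tuple[float, ...]]

# Hypothetical search ranges for the three parameters.
PARAM_BOUNDS = ((0.1, 100.0), (0.01, 1.0), (0.01, 0.5))


def random_chromosome(n_total: int, k: int, rng: random.Random) -> Chromosome:
    features = tuple(sorted(rng.sample(range(n_total), k)))
    params = tuple(rng.uniform(lo, hi) for lo, hi in PARAM_BOUNDS)
    return features, params


def crossover(a: Chromosome, b: Chromosome, k: int, rng: random.Random) -> Chromosome:
    # Child features come from the union of both parents' features,
    # so the subset size stays fixed at k.
    pool = sorted(set(a[0]) | set(b[0]))
    features = tuple(sorted(rng.sample(pool, k)))
    # Each parameter is inherited from one parent chosen at random.
    params = tuple(rng.choice(pair) for pair in zip(a[1], b[1]))
    return features, params


def mutate(c: Chromosome, n_total: int, rate: float, rng: random.Random) -> Chromosome:
    features = list(c[0])
    for i in range(len(features)):
        unused = [f for f in range(n_total) if f not in features]
        if unused and rng.random() < rate:
            features[i] = rng.choice(unused)  # swap in an unused feature
    params = tuple(
        rng.uniform(lo, hi) if rng.random() < rate else p
        for p, (lo, hi) in zip(c[1], PARAM_BOUNDS)
    )
    return tuple(sorted(features)), params


def evolve(fitness: Callable[[Chromosome], float], n_total: int, k: int,
           pop_size: int = 30, generations: int = 40,
           rate: float = 0.1, seed: int = 0) -> Chromosome:
    """Minimize `fitness` over feature subsets of size k and SVM parameters."""
    rng = random.Random(seed)
    pop = [random_chromosome(n_total, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)             # lower score is better
        elite = pop[: pop_size // 2]      # the better half survives unchanged
        children = [
            mutate(crossover(*rng.sample(elite, 2), k, rng), n_total, rate, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return min(pop, key=fitness)


def toy_fitness(c: Chromosome) -> float:
    # Stand-in for a cross-validated SVR error: penalize missing
    # "informative" features 0, 1, 2 and a gamma far from 0.5.
    features, params = c
    return len({0, 1, 2} - set(features)) + abs(params[1] - 0.5)


if __name__ == "__main__":
    print(evolve(toy_fitness, n_total=20, k=4))
```

In a real run, `toy_fitness` would be replaced by a function that trains an SVR on the selected columns with the chromosome's parameters and returns its cross-validated MSE. Because both the subset and the parameters live in one chromosome, every fitness evaluation scores them jointly, which is the point the abstract makes against tuning them separately.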
Performance comparison of models with different numbers of features.
| Number of features | Training (CV = 10): MSE | Training (CV = 10) | Prediction: test set | Prediction: training set | Parameters of SVM | Parameters of SVM | Parameters of SVM |
|---|---|---|---|---|---|---|---|
| 4 | 0.1197 | 0.674 | 0.722 | 0.740 | 38.8833 | 0.6081 | 0.1491 |
| 5 | 0.1042 | 0.715 | 0.770 | 0.805 | 16.3419 | 0.7973 | 0.2743 |
| 6 | 0.0945 | 0.744 | 0.840 | 0.829 | 13.3573 | 0.7158 | 0.1513 |
| 7 | 0.0959 | 0.74 | 0.821 | 0.843 | 34.3067 | 0.5218 | 0.1595 |
| 8 | 0.0883 | 0.761 | 0.834 | 0.883 | 60.9596 | 0.5871 | 0.2357 |
| 9 | 0.0815 | 0.777 | 0.847 | 0.864 | 3.7770 | 0.8764 | 0.1663 |
| 10 | 0.0823 | 0.776 | 0.858 | 0.903 | 15.2236 | 0.6247 | 0.1434 |
| 11 | 0.0714 | 0.804 | 0.861 | 0.891 | 5.6937 | 0.6531 | 0.1573 |
| 12 | 0.0780 | 0.787 | 0.864 | 0.905 | 7.2787 | 0.7428 | 0.1515 |
| 13 | 0.0817 | 0.778 | 0.862 | 0.922 | 4.1957 | 0.7791 | 0.1574 |
| 14 | 0.0812 | 0.778 | 0.882 | 0.917 | 14.8391 | 0.5002 | 0.2054 |
| 15 | 0.0734 | 0.799 | 0.870 | 0.919 | 4.9915 | 0.5231 | 0.1077 |
Figure 5. (a) Performance comparison of models with different numbers of features. (b) Evolution of the best 6-feature model.
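The training columns in the table above report a mean squared error under 10-fold cross-validation (CV = 10). As a rough sketch of how such a figure can be computed, the code below pools squared errors over held-out folds. The helper names `k_fold_indices`, `cv_mse`, and `fit_mean` are hypothetical, the folds are contiguous rather than shuffled, and the stand-in model simply predicts the training mean; none of this is taken from the source.

```python
from statistics import mean
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_mse(xs: Sequence, ys: Sequence[float],
           fit: Callable[[list, list], Callable], k: int = 10) -> float:
    """Mean squared error pooled over all held-out predictions."""
    sq_errors = []
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)   # train without the held-out fold
        sq_errors += [(predict(xs[i]) - ys[i]) ** 2 for i in fold]
    return mean(sq_errors)


def fit_mean(train_x: list, train_y: list) -> Callable:
    """Stand-in model (not an SVM): always predicts the training mean."""
    m = mean(train_y)
    return lambda x: m


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [0.0, 0.0, 2.0, 2.0]
    print(cv_mse(xs, ys, fit_mean, k=2))
```

In practice the data would usually be shuffled before splitting, and `fit` would train the SVM regression model on the selected features.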
Comparison of most relevant QSAR studies on BBB permeability.
| Descriptors | | | Methods | | Predictive accuracy on test set | Reference |
|---|---|---|---|---|---|---|
| Δlog P | 20 | — | Linear Regression | 0.69 | — | Young et al. |
| Excess molar refraction, dipolarity/polarisability, H-bond acidity, and basicity | 148 | 30 | LFER | 0.75 | | Platts et al. |
| Δ | 55 | — | Linear Regression | 0.82 | — | Lombardo et al. |
| PSA, the octanol/water partition coefficient, and the conformational flexibility | 56 | 7 | MLR | 0.85 | | Iyer et al. |
| CODESSA/DRAGON (482) | 200 | 110 | PLS | 0.83 | | Golmohammadi et al. |
| Molecular (CODESSA-PRO) descriptors (5) | 113 | 19 | MLR | 0.78 | | Katritzky et al. |
| Molecular fragment (ISIDA) descriptors | 112 | 19 | MLR | 0.90 | | Katritzky et al. |
| PSA, log | 144 | 10 | Combinatorial QSAR (KNN | 0.91 | | Zhang et al. |
| Abraham solute descriptors and indicators | 328 | — | LFER | 0.75 | — | Abraham et al. |
| Abraham solute descriptors and indicators | 164 | 164 | LFER | 0.71 | | Abraham et al. |
| CODESSA/Marvin/indicator (6) | 260 | 63 | GA based SVM | 0.83 | | This research, GA/SVM, final model |
| CODESSA/Marvin/indicator (236) | 260 | 63 | GA based SVM | 0.97 | | This research, Grid/SVM |
| CODESSA/Marvin/indicator (6) | 260 | 63 | GA based SVM | 0.86 | | This research, Grid/SVM |
Figure 6. Prediction accuracy of the final model on training set (a) and test set (b).
Features used in the final model.
| Name | Meaning |
|---|---|
| M_log | log |
| HA_dependent_HDSA-2_[Zefirov's_PC] | H-bond donor surface area related (CODESSA) |
| M_PSA_7.4 | PSA at pH 7.4 (Marvin) |
| AbsCarboxy | Carboxylic acid indicator (Abraham) |
| HA_dependent_HDCA-2/SQRT(TMSA)_[Zefirov's_PC] | H-bond donor charged area related (CODESSA) |
| Average_Complementary_Information_content_(order_0) | Topology descriptor (CODESSA) |
The most frequently used features for all 6-feature models^a.
| Number | Feature name | Occurrence | Meaning |
|---|---|---|---|
| 11 | AbsCarboxy | 36 | Indicator for carboxylic acid† |
| 268 | ESP-FHASA_Fractional_HASA_(HASA/TMSA)_Quantum-Chemical_PC | 14 | H-acceptor surface area/total molecular surface area# |
| 101 | Topographic_electronic_index_(all_bonds)_Zefirov's_PC | 12 | Topological electronic index for all bonded pairs of atoms^b ‡ |
| 8 | M_PSA_7.4 | 11 | PSA at pH 7.4^c § |
| 267 | ESP-HASA_H-acceptors_surface_area_Quantum-Chemical_PC | 10 | H-acceptor surface area# |
| 5 | delta_log | 9 | log |
| 7 | M_PSA_7.0 | 9 | PSA at pH 7.0§ |
| 138 | HA_dependent_HDCA-2_[Zefirov's_PC] | 9 | H-donors charged surface area# |
| 6 | M_PSA_6.5 | 8 | PSA at pH 6.5§ |
| 1 | M_log | 7 | log |
^a Rows with the same symbol can be categorized into the same group.
^b The topological electronic index characterizes the distribution of molecular charge: $T = \sum_{(i,j)} |q_i - q_j| / r_{ij}^2$, where the sum runs over all bonded pairs of atoms, $q_i$ is the net charge on the $i$th atom, and $r_{ij}$ is the distance between the two bonded atoms $i$ and $j$.
^c 7.4 is the pH in blood.
^d 6.5 is the pH in intestine.
^e $D$ is the ratio of the sum of the concentrations of all species of a compound in octanol to the sum of the concentrations of all species of the compound in water. For neutral compounds, log D is equal to log P.
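The two definitions in the footnotes above (the topological electronic index and the distribution coefficient D) translate directly into code. The sketch below is illustrative only: the function names, the toy charges, the bond lengths, and the concentrations are hypothetical and are not taken from the paper's data set.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple


def topological_electronic_index(
    charges: Dict[int, float],
    bonds: Iterable[Tuple[int, int, float]],
) -> float:
    """Sum of |q_i - q_j| / r_ij**2 over bonded atom pairs (i, j, r_ij)."""
    return sum(abs(charges[i] - charges[j]) / r ** 2 for i, j, r in bonds)


def log_d(octanol: Sequence[float], water: Sequence[float]) -> float:
    """log10 of D: summed species concentrations in octanol over those in water."""
    return math.log10(sum(octanol) / sum(water))


if __name__ == "__main__":
    # Toy triatomic: one central atom bonded to two equivalent neighbours.
    charges = {0: -0.4, 1: 0.2, 2: 0.2}
    bonds = [(0, 1, 2.0), (0, 2, 2.0)]
    print(topological_electronic_index(charges, bonds))

    # Neutral compound: a single species, so log D equals log P.
    print(log_d([100.0], [1.0]))
    # Partly ionized compound: the ionized species stays in water and lowers D.
    print(log_d([100.0, 0.0], [1.0, 9.0]))
```

The last two calls show why log D falls below log P once a compound ionizes: the neutral species partitions as before, but the extra ionized concentration in water enlarges the denominator.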
Figure 7. Top features for all 6-feature models (50 in all).
Most frequently used features for all top models (the number of features ranges from 4 to 15).
| Number | Descriptor name | Occurrence | Meaning |
|---|---|---|---|
| 11 | AbsCarboxy | 84 | Indicator for carboxylic acid† |
| 5 | delta_log | 35 | log |
| 7 | M_PSA_7.0 | 32 | PSA at pH 7.0§ |
| 1 | M_log | 28 | log |
| 138 | HA_dependent_HDCA-2_[Zefirov's_PC] | 27 | H-donors charged surface area# |
| 268 | ESP-FHASA_Fractional_HASA_(HASA/TMSA)_Quantum-Chemical_PC | 27 | H-acceptor surface area/total molecular surface area# |
| 6 | M_PSA_6.5 | 25 | PSA at pH 6.5§ |
| 167 | PPSA-1_Partial_positive_surface_area_[Quantum-Chemical_PC] | 25 | Partial positive surface area§ |
| 101 | Topographic_electronic_index_(all_bonds)_Zefirov's_PC | 23 | Topological electronic index for all bonded pairs of atoms‡ |
| 8 | M_PSA_7.4 | 21 | PSA at pH 7.4§ |
Rows with the same symbol could be categorized into the same group.
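The "Occurrence" column in the two feature tables counts how often each feature number appears across a set of top models. A minimal sketch of that tally follows. The function name `feature_occurrence` and the three example models are hypothetical; only the feature numbers 11 (AbsCarboxy) and 8 (M_PSA_7.4) are borrowed from the tables for readability.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def feature_occurrence(models: Iterable[Iterable[int]]) -> List[Tuple[int, int]]:
    """Count how many models use each feature number, most frequent first."""
    # set(model) ensures a feature is counted at most once per model.
    counts = Counter(f for model in models for f in set(model))
    # Sort by descending count, then by feature number for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical top models, each given as a tuple of feature numbers.
    top_models = [(11, 8, 1), (11, 268, 5), (11, 8, 101)]
    for number, occurrence in feature_occurrence(top_models):
        print(number, occurrence)
```

Sorting by count reproduces the layout of the tables, where the carboxylic acid indicator leads because it appears in more top models than any other feature.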
Figure 8. The most frequently used features for all top models.