| Literature DB >> 28979308 |
Habib MotieGhader1, Sajjad Gharaghani2, Yosef Masoudi-Sobhanzadeh1, Ali Masoudi-Nejad1.
Abstract
Feature selection is of great importance in Quantitative Structure-Activity Relationship (QSAR) analysis. This problem has previously been addressed with meta-heuristic algorithms such as GA, PSO, and ACO. In this work, two novel hybrid meta-heuristic algorithms for QSAR feature selection are proposed: Sequential GA and LA (SGALA) and Mixed GA and LA (MGALA), both based on the Genetic Algorithm and Learning Automata. SGALA exploits the advantages of the Genetic Algorithm and Learning Automata sequentially, whereas MGALA exploits them simultaneously. We applied the proposed algorithms to select the smallest possible number of features from three different datasets and observed that MGALA and SGALA achieved the best outcomes, both individually and on average, compared with the other feature selection algorithms. Comparing the proposed algorithms further, we found that MGALA and SGALA converged to the optimal result faster than the GA, ACO, PSO, and LA algorithms. Finally, the feature subsets selected by the GA, ACO, PSO, LA, SGALA, and MGALA algorithms were used as input to an LS-SVR model; the resulting models showed that the LS-SVR model built on the SGALA and MGALA selections had greater predictive ability than those built on the selections from all other mentioned algorithms. These results corroborate that the proposed algorithms are superior to all other mentioned algorithms in both predictive efficiency and rate of convergence.
Keywords: Drug Design; Feature Selection; Genetic Algorithm; Learning Automata; QSAR
Year: 2017 PMID: 28979308 PMCID: PMC5603862
Source DB: PubMed Journal: Iran J Pharm Res ISSN: 1726-6882 Impact factor: 1.696
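Each algorithm in this study searches over binary feature subsets and, as Figure 15 notes, tries to minimize an RMSE value, with the selected descriptors ultimately fed to an LS-SVR model. Below is a minimal sketch of such a subset-evaluation step, assuming an RBF-kernel least-squares SVR is used to score a candidate subset; the function names, the hyperparameters `gamma` and `sigma2`, and their default values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    """RBF (Gaussian) kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def lssvr_fit(X, y, gamma=100.0, sigma2=10.0):
    """Least-squares SVR: solve the KKT linear system for the bias b and dual weights alpha."""
    n = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma   # ridge-like regularization 1/gamma on the diagonal
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]              # b, alpha

def lssvr_predict(X_train, b, alpha, X_new, sigma2=10.0):
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

def rmse_for_subset(X, y, mask, gamma=100.0, sigma2=10.0):
    """Fitness of a binary feature mask: training RMSE of an LS-SVR built on the
    selected descriptor columns (smaller is better)."""
    Xs = X[:, mask.astype(bool)]
    b, alpha = lssvr_fit(Xs, y, gamma, sigma2)
    pred = lssvr_predict(Xs, b, alpha, Xs, sigma2)
    return float(np.sqrt(np.mean((pred - y) ** 2)))
```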

Figure 1. Learning automaton connection with the environment (16).
Figure 2. Flowchart of the proposed Genetic Algorithm.
Figure 3. A sample QSAR dataset and the corresponding random chromosome. Every feature in the dataset corresponds to a gene in the chromosome; a gene takes the value 1 if the corresponding feature is selected and 0 otherwise.
Figure 4. Crossover operator. (A) Two random chromosomes before crossover. (B) The two new chromosomes after crossover.
Figure 5. Mutation operator. (A) A random chromosome before mutation. (B) The resulting chromosome after mutation.
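Figures 3-5 describe the binary chromosome encoding and the crossover and mutation operators of the proposed GA. The sketch below illustrates that encoding, assuming single-point crossover and single-gene bit-flip mutation (the figures show the operators only graphically); `N_FEATURES`, the helper names, and the reuse of the 0.7 and 0.3 values from the parameter table as operator-application probabilities are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 10            # one gene per molecular descriptor (Figure 3)

def random_chromosome():
    # gene = 1 -> the corresponding descriptor is selected, 0 -> it is not
    return rng.integers(0, 2, size=N_FEATURES)

def crossover(parent_a, parent_b, rate=0.7):
    # single-point crossover (Figure 4): swap the gene tails of two parents
    if rng.random() > rate:
        return parent_a.copy(), parent_b.copy()
    point = rng.integers(1, N_FEATURES)
    child_a = np.concatenate([parent_a[:point], parent_b[point:]])
    child_b = np.concatenate([parent_b[:point], parent_a[point:]])
    return child_a, child_b

def mutate(chromosome, rate=0.3):
    # bit-flip mutation (Figure 5): occasionally flip one randomly chosen gene
    child = chromosome.copy()
    if rng.random() < rate:
        gene = rng.integers(0, N_FEATURES)
        child[gene] = 1 - child[gene]
    return child
```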
Figure 6. An automaton equivalent to the chromosome in Figure 3.
Figure 7. Flowchart of the proposed Learning Automata.
Figure 8. An example of the reward and penalty relation.
Figure 9. The trend of rewarding feature f2.
Figure 10. The trend of penalizing feature f2.
Figure 11. The trend of penalizing feature f1.
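Figures 8-11 show how the Learning Automata rewards or penalizes the selection probability of individual features such as f1 and f2. The sketch below uses the standard linear reward-penalty update on a probability vector over candidate features; the learning rates `a` and `b` and the function names are assumptions, since the paper specifies its update scheme only through the flowcharts and figures.

```python
import numpy as np

def reward(p, i, a=0.1):
    """Reward action i (feature f_i): raise its probability, shrink the others."""
    p = p.copy()
    p[i] += a * (1.0 - p[i])
    others = np.arange(len(p)) != i
    p[others] *= (1.0 - a)
    return p                      # probabilities still sum to 1

def penalize(p, i, b=0.1):
    """Penalize action i: lower its probability, redistribute mass to the others."""
    p = p.copy()
    r = len(p)
    others = np.arange(r) != i
    p[others] = b / (r - 1) + (1.0 - b) * p[others]
    p[i] *= (1.0 - b)
    return p

# three candidate features f1, f2, f3, initially equally likely to be selected
p = np.full(3, 1.0 / 3.0)
p = reward(p, 1)      # f2 improved the model -> reward it   (cf. Figure 9)
p = penalize(p, 1)    # f2 hurt the model     -> penalize it (cf. Figure 10)
p = penalize(p, 0)    # f1 hurt the model     -> penalize it (cf. Figure 11)
```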
Figure 12. Flowchart of the proposed Mixed GA and LA (MGALA).
Compound list, observed and predicted pIC50 values, and basic structures of TTK inhibitors.
The letters a, b, c, d, and e in the first column correspond to the basic structures of the TTK inhibitors, and t marks compounds of the test set.
Results of algorithms for ten different runs (Laufer et al. dataset)

| Algorithm | Statistic | R² train | RMSE train | Running time (s) |
|---|---|---|---|---|
| GA | Avg. | 0.8351 | 0.44203 | 70.1 |
| | Min | 0.8274 | 0.4369 | 65 |
| | Max | 0.839 | 0.4523 | 76 |
| | Std. | 0.0035 | 0.0048 | 2.982 |
| | Best result (feature names) | D/Dr05, MATS5m, MATS3v, ATS6e, SPAM, RDF035m, Mor08m, nCt | | |
| ACO | Avg. | 0.825 | 0.454 | 93.144 |
| | Min | 0.800 | 0.435 | 79.65 |
| | Max | 0.840 | 0.486 | 106.5 |
| | Std. | 0.010 | 0.014 | 8.736 |
| | Best result (feature names) | AMW, nCIR, RBN, DECC, BELp1, Mor17u, E3u, R1p+ | | |
| PSO | Avg. | 0.81796 | 0.46421 | 4.75012 |
| | Min | 0.8001 | 0.4255 | 4.2001 |
| | Max | 0.8473 | 0.4868 | 5.721 |
| | Std. | 0.0137 | 0.0178 | 0.4568 |
| | Best result (feature names) | RDF095m, C-008, RBN, ISH, SPAM, GATS6e, MATS6e, nCaH | | |
| LA | Avg. | 0.8263 | 0.4535 | 134.3 |
| | Min | 0.814 | 0.438 | 118 |
| | Max | 0.8382 | 0.4695 | 170 |
| | Std. | 0.0071 | 0.0092 | 19.4784 |
| | Best result (feature names) | BEHv1, MATS3m, SPAM, RDF095m, Mor03u, Mor03m, E1u, nCaH | | |
| SGALA | Avg. | 0.8470 | 0.4256 | 80.4 |
| | Min | 0.8315 | 0.4074 | 72 |
| | Max | 0.86 | 0.4469 | 89 |
| | Std. | 0.0071 | 0.0099 | 5.5892 |
| | Best result (feature names) | RBN, X1A, BIC4, GATS5v, RDF035m, E2m, HATS1u, H8m | | |
| MGALA | Avg. | 0.8647 | 0.4003 | 118.4 |
| | Min | 0.8596 | 0.3868 | 111 |
| | Max | 0.8737 | 0.4079 | 125 |
| | Std. | 0.0040 | 0.0060 | 4.4581 |
| | Best result (feature names) | RBN, PW3, SAM, RDF095m, RDF120m, nSO2, C-027, H-046 | | |
Parameters of Algorithms

| Parameter | GA | ACO | PSO | LA | SGALA | MGALA |
|---|---|---|---|---|---|---|
| Initial population | 100 | 100 | 100 | 100 | 100 | 100 |
| Generations | 100 | - | - | - | 60 | 100 |
| Epochs | - | 100 | 100 | 100 | 40 | - |
| Crossover rate | 0.7 | - | - | - | 0.7 | 0.7 |
| Mutation rate | 0.3 | - | - | - | 0.3 | 0.3 |
| Memory | - | - | - | 3 | 3 | 3 |
| Inertia weight (w) | - | - | 0.8 | - | - | - |
| Acceleration constants | - | - | 1.5 | - | - | - |
| Rho | - | 0.7 | - | - | - | - |
Figure 13. Flowchart of the proposed Sequential GA and LA (SGALA).
Figure 14. The variations of (A) R² and (B) RMSE for the Table 3 results.
The statistical parameters of the GA-LS-SVR, ACO-LS-SVR, PSO-LS-SVR, LA-LS-SVR, SGALA-LS-SVR, and MGALA-LS-SVR models

| Model | | | | | | |
|---|---|---|---|---|---|---|
| GA-LS-SVR | 468.323 | 1413.690 | 0.861 | 0.409 | 0.760 | 0.591 |
| ACO-LS-SVR | 74.4881 | 88.5071 | 0.9028 | 0.3440 | 0.8980 | 0.4842 |
| PSO-LS-SVR | 8.2448 | 23.6189 | 0.9290 | 0.2965 | 0.8147 | 0.5578 |
| LA-LS-SVR | 27.965 | 19.1907 | 0.964 | 0.210 | 0.786 | 0.545 |
| SGALA-LS-SVR | 119.877 | 465.674 | 0.880 | 0.381 | 0.875 | 0.443 |
| MGALA-LS-SVR | 1007.3 | 293.604 | 0.940 | 0.268 | 0.770 | 0.564 |
The statistical parameters of the external test set for GA-LS-SVR, ACO-LS-SVR, PSO-LS-SVR, LA-LS-SVR, SGALA-LS-SVR, and MGALA-LS-SVR models
| Statistical parameter | GA-LS-SVR | ACO-LS-SVR | PSO-LS-SVR | LA-LS-SVR | SGALA-LS-SVR | MGALA-LS-SVR |
|---|---|---|---|---|---|---|
| | 0.698 | 0.803 | 0.731 | 0.744 | 0.830 | 0.725 |
| | 0.759 | 0.898 | 0.815 | 0.786 | 0.875 | 0.770 |
| | 0.665 | 0.897 | 0.772 | 0.776 | 0.859 | 0.750 |
| | 0.758 | 0.880 | 0.815 | 0.769 | 0.875 | 0.759 |
| | 0.123 | 0.001 | 0.053 | 0.012 | 0.018 | 0.026 |
| | 0.001 | 0.020 | 0.000 | 0.0201 | 0.000 | 0.014 |
| | 0.526 | 0.870 | 0.646 | 0.709 | 0.764 | 0.661 |
| | 0.734 | 0.777 | 0.815 | 0.685 | 0.875 | 0.689 |
| | 0.954 | 0.952 | 0.950 | 0.969 | 0.963 | 0.964 |
| | 1.041 | 1.047 | 1.048 | 1.026 | 1.035 | 1.031 |
Figure 15. The average convergence behaviour of all the mentioned algorithms on the Laufer et al. dataset. The number of generations is 100, and the goal of the algorithms is to minimize the RMSE value; MGALA and SGALA converge to lower RMSE values than the others.
Figure 16. Plot of predicted pIC50 versus observed values using the (A) GA-LS-SVR, (B) ACO-LS-SVR, (C) PSO-LS-SVR, (D) LA-LS-SVR, (E) SGALA-LS-SVR, and (F) MGALA-LS-SVR models.
Results of algorithms for ten different runs (Guha et al. and Calm et al. datasets)

Guha et al. dataset: 320, 79, (9), 12. Calm et al. dataset: 115, 45, (10), 7.

| Algorithm | Statistic | Guha et al.: R² train | Guha et al.: RMSE train | Guha et al.: Running time (s) | Calm et al.: R² train | Calm et al.: RMSE train | Calm et al.: Running time (s) |
|---|---|---|---|---|---|---|---|
| GA | Avg. | 0.626 | 0.413 | 220.184 | 0.953 | 0.348 | 40.332 |
| | Min | 0.616 | 0.401 | 189.797 | 0.946 | 0.283 | 36.935 |
| | Max | 0.647 | 0.419 | 291.226 | 0.969 | 0.376 | 48.873 |
| | Std. | 0.009 | 0.005 | 27.303 | 0.006 | 0.027 | 3.740 |
| | Best result (feature names) | MOLC#5, EMAX#1, MOMI#3, GRAV#3, CHDH#2, CHDH#3, SCDH#1, SAAA#1, SAAA#3, CHAA#2, ACHG#0 | | | BEHm2, ATS1m, MATS1m, DISPe, RDF020u, E3s, HTp | | |
| ACO | Avg. | 0.596 | 0.428 | 101.835 | 0.934 | 0.413 | 15.344 |
| | Min | 0.564 | 0.415 | 86.5 | 0.923 | 0.379 | 12.196 |
| | Max | 0.613 | 0.446 | 116.54 | 0.945 | 0.448 | 26.201 |
| | Std. | 0.016 | 0.009 | 10.237 | 0.007 | 0.022 | 4.081 |
| | Best result (feature names) | MOLC#4, WTPT#2, WTPT#5, MDEC#12, MDEN#33, MREF#1, GRVH#3, NITR#5, FNSA#2, SADH#3, CHDH#3, FLEX#5 | | | AMW, Me, X4v, IDDE, L3m, HTp, nROR | | |
| PSO | Avg. | 0.603 | 0.425 | 7.057 | 0.931 | 0.422 | 5.728 |
| | Min | 0.583 | 0.410 | 0.627 | 0.9164 | 0.3821 | 4.233 |
| | Max | 0.632 | 0.436 | 11.147 | 0.9445 | 0.4687 | 7.781 |
| | Std. | 0.018 | 0.010 | 2.662 | 0.008 | 0.027 | 1.127 |
| | Best result (feature names) | 2SP2#1, CHAA#2, CHDH#2, WNSA#1, WTPT#4, PNSA#2, N2P#1, SADH#2, SADH#1, NITR#5, SURR#1, MOLC#3 | | | R1u, nF, S2K, nROR, L3m, AMW, HTp | | |
| LA | Avg. | 0.61146 | 0.4215 | 265.617 | 0.940 | 0.393 | 56.644 |
| | Min | 0.5945 | 0.4022 | 215.16 | 0.934 | 0.372 | 44.469 |
| | Max | 0.646 | 0.430 | 307.315 | 0.947 | 0.413 | 74.579 |
| | Std. | 0.016 | 0.008 | 31.762 | 0.003 | 0.012 | 9.514 |
| | Best result (feature names) | V3CH#15, WTPT#4, MDEC#34, MDEO#12, MREF#1, EMIN#1, MOMI#1, VOL#150, homo#0, WPSA#1, FNHS#1, RNH#1 | | | ATS2e, RPCG, DISPe, L3m, H0e, HTp, F082 | | |
| SGALA | Avg. | 0.6409 | 0.4051 | 270.678 | 0.955 | 0.340 | 61.969 |
| | Min | 0.624 | 0.382 | 245.521 | 0.946 | 0.312 | 46.548 |
| | Max | 0.680 | 0.414 | 298.413 | 0.962 | 0.374 | 76.731 |
| | Std. | 0.017 | 0.009 | 18.779 | 0.005 | 0.019 | 9.553 |
| | Best result (feature names) | KAPA#2, KAPA#4, ALLP#1, ALLP#2, V4PC#12, N6CH#16, N7CH#20, NITR#5, FNSA#3, RNCG#1, SCDH#2, FNHS#1 | | | IDDE, SHP2, DISPe, L3m, H0e, HTp, Hy | | |
| MGALA | Avg. | 0.704 | 0.371 | 627.026 | 0.960 | 0.322 | 124.922 |
| | Min | 0.684 | 0.352 | 555.923 | 0.958 | 0.302 | 89.526 |
| | Max | 0.727 | 0.390 | 709.474 | 0.965 | 0.329 | 150.749 |
| | Std. | 0.013 | 0.010 | 53.150 | 0.001 | 0.007 | 20.326 |
| | Best result (feature names) | KAPA#2, KAPA#4, ALLP#4, V5CH#17, S6CH#18, MOLC#1, SADH#2, CHDH#1, CHDH#3, SAAA#2, ACHG#0, SURR#5 | | | GGI1, DISPe, RDF020u, E3s, H0e, HTp, Hy | | |
Figure 17. Williams plot of standardized residual versus leverage (h* = 0.54). (A) GA-LS-SVR, (B) ACO-LS-SVR, (C) PSO-LS-SVR, (D) LA-LS-SVR, (E) SGALA-LS-SVR, (F) MGALA-LS-SVR models.
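In Figure 17 the leverage on the x-axis is the diagonal of the hat matrix computed from the matrix of selected descriptors, and the warning threshold is conventionally taken as h* = 3(p + 1)/n for p descriptors and n training compounds; the quoted h* = 0.54 is consistent with this formula (for example, 3 × (8 + 1)/50 = 0.54), although the paper's exact counts are not restated here. A minimal sketch with illustrative function names:

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X^T X)^-1 X^T, one leverage per compound;
    X holds the selected descriptors, one row per training compound."""
    core = np.linalg.inv(X.T @ X)
    return np.einsum('ij,jk,ik->i', X, core, X)

def warning_leverage(n_compounds, n_descriptors):
    """Common warning threshold h* = 3(p + 1) / n used in Williams plots."""
    return 3.0 * (n_descriptors + 1) / n_compounds

# e.g. warning_leverage(50, 8) == 3 * (8 + 1) / 50 = 0.54
```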
Figure 18. The variations of R² for the Table 6 results. (A) R² values for all the algorithms on the Guha et al. dataset; MGALA and SGALA have the best R² values, respectively. (B) R² values for all the algorithms on the Calm et al. dataset; MGALA and SGALA again have the best R² values, respectively.