Susana P Gaudêncio, Florbela Pereira.
Abstract
Biofouling is the undesirable growth of micro- and macro-organisms on artificial water-immersed surfaces. Its prevention and the associated maintenance impose high costs (billions of euros per year) on aquaculture, shipping and other industries that rely on coastal and off-shore infrastructure. To date, there are still no sustainable, economical and environmentally safe solutions to overcome this challenging phenomenon. A computer-aided drug design (CADD) approach comprising ligand- and structure-based methods was explored for predicting the antifouling activities of marine natural products (MNPs). In the CADD ligand-based method, 141 organic molecules extracted from the ChEMBL database and literature with antifouling screening data were used to build the quantitative structure-activity relationship (QSAR) classification model. An overall predictive accuracy score of up to 71% was achieved with the best QSAR model for external and internal validation using test and training sets. A virtual screening campaign of 14,492 MNPs from Encinar's website and 14 MNPs that are currently in the clinical pipeline was also carried out using the best QSAR model developed. In the CADD structure-based approach, the 125 MNPs that were selected by the QSAR approach were used in molecular docking experiments against the acetylcholinesterase enzyme. Overall, 16 MNPs were proposed as the most promising marine drug-like leads as antifouling agents, e.g., macrocyclic lactam, macrocyclic alkaloids, indole and pyridine derivatives.
Keywords: acetylcholinesterase enzyme (AChE); antifouling activity; blue biotechnology; computer-aided drug design (CADD); machine learning (ML) techniques; marine natural products (MNPs); molecular docking; quantitative structure–activity relationship (QSAR); virtual screening
Year: 2022 PMID: 35200658 PMCID: PMC8879326 DOI: 10.3390/md20020129
Source DB: PubMed Journal: Mar Drugs ISSN: 1660-3397 Impact factor: 5.118
Structural clusters and antifouling activity class counts within the seven structural clusters.
| Cluster 1 | # 2 (Active Class), Tr | # 2 (Active Class), Te | Average MW (Da) 3, Tr | Average MW (Da) 3, Te | Average ALogP 4, Tr | Average ALogP 4, Te |
|---|---|---|---|---|---|---|
| I—acyclic derivative | 11 (11) | 0 (0) | 361.65 | 0 | 2.86 | 0 |
| II— | 28 (9) | 3 (1) | 328.09 | 334.64 | 3.18 | 3.22 |
| III— | 19 (14) | 1 (0) | 363.92 | 493.04 | 2.50 | 3.65 |
| IV—terpenoid derivative | 22 (5) | 6 (3) | 264.64 | 341.76 | 3.00 | 4.49 |
| V—diketopiperazine derivative | 15 (10) | 3 (2) | 392.54 | 415.15 | 3.06 | 3.10 |
| VI—chalcone derivative | 16 (3) | 0 (0) | 352.37 | 0 | 4.56 | 0 |
| VII—miscellaneous | 16 (5) | 1 (0) | 1164.53 | 975.69 | −0.88 | −1.57 |
1 Cluster code and chemical structure of the cluster scaffold. 2 Number of molecules in the training (Tr) and the test (Te) sets. 3 Molecular weight (MW) within the cluster for the training and test sets. 4 Octanol–water partition coefficient prediction within the cluster for the training and test sets.
Evaluation of the predictive performance of FPs and 1D&2D molecular descriptors for modeling the antifouling activity using the RF algorithm for the training set with an OOB estimation. The best models are highlighted in bold.
| Descriptors (#) | TP 1 | TN 2 | FN 3 | FP 4 | SE 5 | SP 6 | Q 7 | MCC 8 |
|---|---|---|---|---|---|---|---|---|
| MACCS (166) 9 | 41 | 51 | 16 | 19 | 0.719 | 0.729 | 0.724 | 0.446 |
| Sub (307) 9 | 41 | 53 | 16 | 17 | 0.719 | 0.757 | **0.740** | **0.476** |
| PubChem (881) 9 | 43 | 48 | 14 | 22 | 0.754 | 0.686 | 0.717 | 0.438 |
| CDK (1024) 9 | 42 | 47 | 15 | 23 | 0.737 | 0.671 | 0.701 | 0.406 |
| ExtCDK (1024) 9 | 41 | 49 | 16 | 21 | 0.719 | 0.700 | **0.709** | **0.417** |
| 1D&2D (1376) | 40 | 53 | 17 | 17 | 0.702 | 0.757 | **0.732** | **0.459** |
1 True positive. 2 True negative. 3 False negative. 4 False positive. 5 Sensitivity, the ratio of true positive to the sum of true positive and false negative. 6 Specificity, the ratio of true negative to the sum of true negative and false positive. 7 Overall predictive accuracy, the ratio of the sum of true positive and true negative to the sum of true positive, true negative, false positive and false negative. 8 Matthews correlation coefficient. 9 Fingerprints, FPs.
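The four metrics can be recomputed from the confusion-matrix counts in each row. The sketch below is a minimal illustration, not the authors' code; the function name `classification_metrics` is hypothetical. It takes sensitivity as TP/(TP+FN) and specificity as TN/(TN+FP), which reproduces the tabulated MACCS values, and returns `None` for any metric whose denominator is zero.

```python
import math


def classification_metrics(tp, tn, fn, fp):
    """Return SE, SP, Q and MCC from confusion-matrix counts.

    Hypothetical helper for illustration. A metric is None when its
    denominator is zero (for example, no active molecules in the set).
    """
    total = tp + tn + fn + fp
    se = tp / (tp + fn) if (tp + fn) else None   # sensitivity (recall on actives)
    sp = tn / (tn + fp) if (tn + fp) else None   # specificity (recall on inactives)
    q = (tp + tn) / total if total else None     # overall predictive accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else None
    return {"SE": se, "SP": sp, "Q": q, "MCC": mcc}


# MACCS row of the table: TP = 41, TN = 51, FN = 16, FP = 19
maccs = classification_metrics(tp=41, tn=51, fn=16, fp=19)
print({name: round(value, 3) for name, value in maccs.items()})
```

Rounded to three decimals, the MACCS counts give SE 0.719, SP 0.729, Q 0.724 and MCC 0.446, matching the table.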
Evaluation of the predictive performance of RDF descriptors and descriptor selection for modeling the antifouling activity using the RF algorithm for the training set with an OOB estimation. The best models are highlighted in bold.
| Model | # | SE 1 | SP 2 | Q 3 | MCC 4 |
|---|---|---|---|---|---|
| Sub + RDF | 691 | 0.667 | 0.714 | 0.693 | 0.380 |
| Selection 5 | 50 | 0.667 | 0.714 | 0.693 | 0.380 |
| Selection 5 | 100 | 0.684 | 0.757 | 0.724 | 0.442 |
| Selection 5 | 150 | 0.702 | 0.786 | | |
| Selection 5 | 200 | 0.684 | 0.757 | 0.724 | 0.442 |
| ExtCDK + RDF | 1408 | 0.667 | 0.743 | 0.709 | 0.410 |
| Selection 5 | 12 | 0.754 | 0.729 | 0.740 | 0.481 |
| Selection 5 | 25 | 0.737 | 0.786 | | |
| Selection 5 | 50 | 0.702 | 0.771 | 0.740 | 0.474 |
| Selection 5 | 100 | 0.684 | 0.771 | 0.732 | 0.457 |
| 1D&2D + RDF | 1760 | 0.719 | 0.714 | 0.717 | 0.432 |
| Selection 5 | 50 | 0.807 | 0.800 | 0.803 | 0.605 |
| Selection 5 | 100 | 0.825 | 0.786 | 0.803 | 0.607 |
| Selection 5 | 150 | 0.807 | 0.800 | 0.803 | 0.605 |
| Selection 5 | 200 | 0.842 | 0.786 | **0.811** | **0.625** |
| Selection 5 | 250 | 0.772 | 0.800 | 0.787 | 0.571 |
1 Sensitivity, the ratio of true positive to the sum of true positive and false negative. 2 Specificity, the ratio of true negative to the sum of true negative and false positive. 3 Overall predictive accuracy, the ratio of the sum of true positive and true negative to the sum of true positive, true negative, false positive and false negative. 4 Matthews correlation coefficient. 5 The descriptor selection was evaluated based on the importance assigned by the RF model with the R program.
Exploration of different ML algorithms using the 200 selected descriptors.
| Model | SE 1 | SP 2 | Q 3 | MCC 4 |
|---|---|---|---|---|
| RF | 0.667 | 0.750 | 0.714 | 0.417 |
| SVM | 0.830 | 0.500 | 0.643 | 0.344 |
| dMLP | 0.670 | 0.750 | 0.714 | 0.417 |
1 Sensitivity, the ratio of true positive to the sum of true positive and false negative. 2 Specificity, the ratio of true negative to the sum of true negative and false positive. 3 Overall predictive accuracy, the ratio of the sum of true positive and true negative to the sum of true positive, true negative, false positive and false negative. 4 Matthews correlation coefficient.
The predictions of the best RF model by the seven structural clusters for the training and test sets. The best models are highlighted in bold.
| Cluster | # | SE 1 | SP 2 | Q 3 | MCC 4 |
|---|---|---|---|---|---|
| Training set | |||||
| I | 11 | 1.000 | - | | |
| II | 28 | 0.889 | 0.789 | | |
| III | 19 | 1.000 | 0.400 | 0.842 | 0.574 |
| IV | 22 | 0.800 | 0.941 | | |
| V | 15 | 0.900 | 0.000 | 0.600 | - |
| VI | 16 | 0.000 | 1.000 | 0.813 | - |
| VII | 16 | 0.400 | 0.812 | 0.688 | 0.234 |
| All | 127 | 0.842 | 0.786 | 0.811 | 0.625 |
| Test set | |||||
| II | 3 | 1.000 | 1.000 | | |
| III | 1 | - | 1.000 | | |
| IV | 6 | 0.333 | 1.000 | 0.667 | 0.447 |
| V | 3 | 1.000 | 0.000 | 0.667 | - |
| VII | 1 | - | 0.000 | 0.000 | - |
| All | 14 | 0.667 | 0.750 | 0.714 | 0.417 |
1 Sensitivity, the ratio of true positive to the sum of true positive and false negative. 2 Specificity, the ratio of true negative to the sum of true negative and false positive. 3 Overall predictive accuracy, the ratio of the sum of true positive and true negative to the sum of true positive, true negative, false positive and false negative. 4 Matthews correlation coefficient.
Figure 1. The twenty most important 1D&2D + RDF descriptors selected in the RF classification models. The three most important are Burden-modified eigenvalue descriptors weighted by relative I-state, mass and Sanderson electronegativities, respectively. Nine are Broto–Moreau autocorrelations (4th–5th, 7th–8th, 14th, 16th–18th and 20th), weighted by I-state, mass, mass, first ionization potential, mass, polarizabilities, charge, Sanderson electronegativities and I-state, respectively. Two are Moran autocorrelation descriptors (6th and 15th), weighted by charge and mass, respectively. Four are electrotopological state atom-type descriptors: 9th (>C<), 11th (weak hydrogen bond acceptors), 13th (-CH2-) and 19th (H bonded to B, Si, P, Ge, As, Se, Sn or Pb). The remaining two are a PaDEL weighted path descriptor, 10th (sum of path lengths starting from nitrogens), and a topological charge descriptor, 12th (mean topological charge index of order 1).
Figure 2. Chemical structure of the morpholine derivative.
Structures and calculated free binding energies (∆GB, in kcal/mol) of the sixteen selected MNPs and of the positive (synoxazolidinone A and C; donepezil) and negative (phenolic derivative) controls.
| CAS | Chemical Structure | Name/Structural | Natural Source | Prob_A | ∆GB (kcal/mol) 1 |
|---|---|---|---|---|---|
| 147362-39-8 | | cylindramide/lactam | marine sponge 2 | 0.684 | −11.3 |
| 126622-63-7 | | haliclamine B/macrocyclic | marine sponge 3 | 0.682 | −8.2 |
| 126622-64-8 | | haliclamine A/macrocyclic | marine sponge 3 | 0.682 | −7.8 |
| 156310-18-8 | | ingamine B/macrocyclic | marine sponge 4 | 0.682 | −7.8 |
| 155944-26-6 | | madangamines A/macrocyclic alkaloid | marine sponge 4 | 0.694 | −7.7 |
| 105305-54-2 | | serain 3/ | marine sponge 5 | 0.686 | −7.5 |
| 142677-10-9 | | chondriamide B/indole | red alga 6 | 0.682 | −7.5 |
| 134029-43-9 | | nortopsentin A/indole | marine sponge 7 | 0.702 | −7.3 |
| 134029-44-0 | | nortopsentin B/indole | marine sponge 7 | 0.698 | −7.3 |
| 134029-45-1 | | nortopsentin C/indole | marine sponge 7 | 0.700 | −7.3 |
| 105418-77-7 | | serain 1/ | marine sponge 5 | 0.686 | −7.2 |
| 142677-09-6 | | chondriamide A/indole | red alga 6 | 0.682 | −7.2 |
| 223596-72-3 | | isobromodeoxytopsent/ | marine sponge 8 | 0.680 | −7.2 |
| 134779-34-3 | | nortopsentin D/indole | marine sponge 7 | 0.688 | −7.1 |
| 157536-35-1 | | keramaphidin B/macrocyclic alkaloid | marine sponge 9 | 0.684 | −7.1 |
| 59697-14-2 | | nemertelline/ | marine worm 10 | 0.680 | −7.0 |
| positive control | | synoxazolidinone A | - | - | −6.5 |
| positive control | | synoxazolidinone C | - | - | −6.7 |
| positive control | | donepezil | - | - | −6.5 |
| negative control | | phenolic | - | - | −5.1 |
1 AChE enzyme: center X: 25.435 Y: 69.621 Z: 278.986; 2 Halichondria cylindrata; 3 Haliclona sp.; 4 Xestospongia ingens; 5 Reniera sarai; 6 Chondria sp.; 7 Spongosorites ruetzleri and Haliclona sp.; 8 Spongosorites sp.; 9 Amphimedon sp.; 10 Amphiporus angulatus.
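One way to read the table is to compare each candidate's docking energy with the controls. The sketch below is an illustrative assumption rather than the selection rule stated by the authors: it keeps candidates whose calculated ∆GB is more negative than the best positive control. The function name `stronger_than_controls` is hypothetical, and only three of the sixteen hits are copied from the table to keep the example short.

```python
# (name, calculated free binding energy in kcal/mol), copied from the table above
CANDIDATES = [
    ("haliclamine B", -8.2),
    ("nemertelline", -7.0),
    ("cylindramide", -11.3),
]
POSITIVE_CONTROLS = [
    ("synoxazolidinone A", -6.5),
    ("synoxazolidinone C", -6.7),
    ("donepezil", -6.5),
]


def stronger_than_controls(candidates, controls):
    """Return candidate names that dock more favourably than every control.

    Hypothetical rule: a lower (more negative) energy means stronger
    predicted binding, so the threshold is the lowest control energy.
    Names are returned from strongest to weakest predicted binder.
    """
    threshold = min(energy for _, energy in controls)
    ranked = sorted(candidates, key=lambda item: item[1])
    return [name for name, energy in ranked if energy < threshold]


print(stronger_than_controls(CANDIDATES, POSITIVE_CONTROLS))
```

With the values above, the threshold is −6.7 kcal/mol (synoxazolidinone C), and all three candidates pass it.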
Figure 3. Interaction profiles of the best-docked poses for the two hits (a) cylindramide and (b) haliclamine B.
Figure 4. Interaction profiles of the best-docked poses for the two macrocyclic hits (cylindramide and haliclamine B), the best non-macrocycle hit (indole derivative) and the positive (synoxazolidinone A and C; donepezil) and negative (phenolic derivative) controls.
Hyperparameter settings of the best dMLP model.
| Hyperparameter | Setting |
|---|---|
| Initializer | Glorot uniform |
| Number of hidden layers | 2 |
| Number of neurons in the 1st and 2nd layers | 200 |
| Number of neurons in the 3rd layer | 2 |
| Activation 1st–2nd layers | ReLU |
| Activation 3rd layer | Sigmoid |
| Batch size | 36 |
| Optimizer | Adadelta |
| Loss | Binary crossentropy |
| Epochs | 100 |
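The settings in the table can be written down as a small configuration, sketched below. This is an illustration under stated assumptions, not the authors' implementation: the source does not name the framework, so the use of Keras (via TensorFlow) is an assumption, as are the names `DMLP_SETTINGS`, `count_parameters` and `build_dmlp`. The input size of 200 is taken from the "200 selected descriptors" mentioned earlier. TensorFlow is imported lazily so that the parameter count can be checked without it.

```python
# Hyperparameters copied from the table above (key names are hypothetical).
DMLP_SETTINGS = {
    "initializer": "glorot_uniform",
    "hidden_layers": 2,
    "hidden_units": 200,
    "output_units": 2,
    "hidden_activation": "relu",
    "output_activation": "sigmoid",
    "batch_size": 36,
    "optimizer": "adadelta",
    "loss": "binary_crossentropy",
    "epochs": 100,
}


def count_parameters(n_inputs, settings=DMLP_SETTINGS):
    """Count the weights and biases of the fully connected network."""
    sizes = (
        [n_inputs]
        + [settings["hidden_units"]] * settings["hidden_layers"]
        + [settings["output_units"]]
    )
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


def build_dmlp(n_inputs, settings=DMLP_SETTINGS):
    """Build and compile the network with Keras (assumed framework)."""
    from tensorflow import keras  # lazy import: only needed to build the model

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_inputs,)))
    for _ in range(settings["hidden_layers"]):
        model.add(keras.layers.Dense(
            settings["hidden_units"],
            activation=settings["hidden_activation"],
            kernel_initializer=settings["initializer"],
        ))
    model.add(keras.layers.Dense(
        settings["output_units"],
        activation=settings["output_activation"],
        kernel_initializer=settings["initializer"],
    ))
    model.compile(optimizer=settings["optimizer"], loss=settings["loss"], metrics=["accuracy"])
    return model


print(count_parameters(200))
```

Training would then look like `build_dmlp(200).fit(X, y, batch_size=36, epochs=100)`, where `X` holds the descriptor matrix and `y` is a two-column one-hot encoding of the active and inactive classes (both hypothetical variables here).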