Wissal Liman1, Mehdi Oubahmane2, Ismail Hdoufane2, Imane Bjij3, Didier Villemin4, Rachid Daoud1, Driss Cherqaoui2, Achraf El Allali1.
Abstract
Hepatitis C virus (HCV) infection is a serious disease that threatens human health. Despite consistent efforts to inhibit the virus, it has infected more than 58 million people and causes 300,000 deaths per year. The HCV nonstructural protein NS5A plays a critical role in the viral life cycle, as it is a major contributor to viral replication and assembly. Its importance is therefore evident in all currently approved HCV combination treatments. The present study identifies new potential compounds for possible medical use against HCV using quantitative structure-activity relationship (QSAR) modeling. A set of 36 NS5A inhibitors was used to build QSAR models with genetic algorithm multiple linear regression (GA-MLR) and Monte Carlo optimization, implemented in the software CORAL. The Monte Carlo method was used to build QSAR models from SMILES-based optimal descriptors. Four splits were performed, and 24 QSAR models were developed and verified through internal and external validation. The model created for split 3 produced the highest determination coefficients on the validation set (R2 = 0.991 and Q2 = 0.943). In addition, this model provides information about the structural features responsible for increasing and decreasing inhibitory activity, which was used to design eight novel NS5A inhibitors. The GA-MLR model, which has satisfactory statistical parameters (R2 = 0.915 and Q2 = 0.941), confirmed the predicted inhibitory activity of these compounds. Absorption, distribution, metabolism, elimination, and toxicity (ADMET) predictions showed that the newly designed compounds were nontoxic and exhibited acceptable pharmacological properties. These results could accelerate the discovery of new drugs against HCV.
Keywords: HCV; NS5A; QSAR; chemoinformatics; drug discovery; molecular descriptors
Year: 2022 PMID: 35566079 PMCID: PMC9099611 DOI: 10.3390/molecules27092729
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Statistical parameters of the built QSAR models and their corresponding equations.
| Split | Set | N | R2 | CCC | | | | | | | | | | | | Equation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Training | 13 | 0.872 | 0.931 | 0.800 | 0.812 | 0.671 | 0.543 | 75 | 0.820 | | | | | | pEC50 = 0.655 (±0.344) + 0.101 (±0.004) × DCW(1,30) |
| | Inv.Train | 13 | 0.899 | 0.921 | 0.565 | 0.867 | 0.580 | 0.469 | 98 | 0.845 | | | | | | |
| | Calibration | 5 | 0.964 | 0.906 | 0.982 | 0.893 | 0.909 | 0.853 | 0.942 | 0.456 | 0.367 | 81 | 0.798 | 0.615 | 0.156 | |
| | Validation | 5 | 0.891 | 0.853 | 0.824 | 0.710 | 0.685 | 0.592 | 0.194 | | | | | | | |
| 2 | Training | 13 | 0.889 | 0.941 | 0.314 | 0.856 | 0.561 | 0.388 | 81 | 0.843 | | | | | | pEC50 = −0.094 (±0.234) + 0.084 (±0.002) × DCW(1,30) |
| | Inv.Train | 13 | 0.937 | 0.962 | 0.519 | 0.915 | 0.517 | 0.392 | 166 | 0.905 | | | | | | |
| | Calibration | 5 | 0.989 | 0.963 | 0.994 | 0.975 | 0.927 | 0.927 | 0.950 | 0.438 | 0.342 | 290 | 0.858 | 0.695 | 0.094 | |
| | Validation | 5 | 0.860 | 0.904 | 0.691 | 0.566 | 0.592 | 0.767 | 0.132 | | | | | | | |
| 3 | Training | 13 | 0.873 | 0.932 | 0.801 | 0.837 | 0.634 | 0.498 | 76 | 0.848 | | | | | | pEC50 = 0.532 (±0.184) + 0.103 (±0.003) × DCW(2,30) |
| | Inv.Train | 13 | 0.865 | 0.871 | 0.581 | 0.829 | 0.685 | 0.510 | 71 | 0.805 | | | | | | |
| | Calibration | 5 | 0.975 | 0.960 | 0.987 | 0.909 | 0.937 | 0.935 | 0.949 | 0.428 | 0.315 | 120 | 0.851 | 0.755 | 0.072 | |
| | Validation | 5 | 0.990 | 0.911 | 0.719 | 0.942 | 0.532 | 0.728 | 0.076 | | | | | | | |
| 4 | Training | 13 | 0.941 | 0.970 | 0.831 | 0.923 | 0.390 | 0.301 | 178 | 0.894 | | | | | | pEC50 = 0.457 (±0.157) + 0.149 (±0.003) × DCW(1,30) |
| | Inv.Train | 13 | 0.843 | 0.914 | 0.601 | 0.801 | 0.650 | 0.483 | 59 | 0.807 | | | | | | |
| | Calibration | 5 | 0.924 | 0.945 | 0.961 | 0.637 | 0.908 | 0.906 | 0.897 | 0.568 | 0.412 | 37 | 0.657 | 0.752 | 0.101 | |
| | Validation | 5 | 0.943 | 0.892 | 0.317 | 0.399 | 0.647 | 0.777 | 0.081 | | | | | | | |
N: number of samples; s: standard error of estimation; F: Fisher ratio.
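Each equation in the table is a one-descriptor linear model, so a prediction reduces to a single multiply-add once the DCW value of a compound is known. The sketch below simply encodes the four intercept and slope pairs from the Equation column. The names `SPLIT_MODELS` and `predict_pec50` are hypothetical, the ± uncertainties are ignored, and the DCW value in the usage line is an arbitrary illustration rather than a value from the study.

```python
# Intercept and slope of pEC50 = intercept + slope * DCW,
# copied from the Equation column (uncertainties omitted).
SPLIT_MODELS = {
    1: (0.655, 0.101),
    2: (-0.094, 0.084),
    3: (0.532, 0.103),
    4: (0.457, 0.149),
}


def predict_pec50(split: int, dcw: float) -> float:
    """Return the pEC50 predicted by the model of the given split."""
    intercept, slope = SPLIT_MODELS[split]
    return intercept + slope * dcw


# Illustrative call: the DCW value 100.0 is made up, not taken from the paper.
print(round(predict_pec50(3, 100.0), 3))
```

Computing DCW itself requires the optimized correlation weights of all SMILES attributes, which this record does not list in full.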
Figure 1. Experimental versus calculated pEC50 values for the models (i.e., four splits).
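Agreement between experimental and calculated values, as plotted in Figure 1, is commonly summarized with the kind of statistics reported for the splits. The following is a minimal sketch, assuming R2 is taken as the squared Pearson correlation and CCC as Lin's concordance correlation coefficient; the function name `regression_metrics` is hypothetical, and the RMSE computed here uses a 1/n denominator, so it may differ from the standard error s reported by the authors.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, CCC, RMSE and MAE for paired observed/predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Population (1/n) variances and covariance.
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    return {
        # Squared Pearson correlation (assumed definition of R2).
        "r2": cov ** 2 / (var_o * var_p),
        # Lin's concordance correlation coefficient.
        "ccc": 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2),
        "rmse": sqrt(sum(r ** 2 for r in residuals) / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

A constant offset between observed and predicted values leaves R2 at 1 but lowers CCC, which is why the two are reported side by side.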
Figure 2. (a) Experimental vs. predicted pEC50 values computed by GA-MLR. (b) Williams plot.
Promoters of increase and decrease of pEC50 endpoint from split 3.
| No. | SAk | CWs | CWs | CWs | NT a | NiT b | NC c | Defect d |
|---|---|---|---|---|---|---|---|---|
| Promoter of endpoint increase | ||||||||
| 1 | C...(....... | 0.137 | 0.188 | 0.464 | 13 | 13 | 5 | 0.000 |
| 2 | C...O...C... | 2.041 | 1.674 | 2.516 | 6 | 8 | 3 | 0.015 |
| 3 | O........... | 1.351 | 1.397 | 1.810 | 13 | 13 | 5 | 0.000 |
| 4 | N...(....... | 0.054 | 0.063 | 0.110 | 13 | 13 | 5 | 0.000 |
| 5 | O...C...C... | 0.093 | 0.324 | 0.338 | 6 | 11 | 4 | 0.034 |
| 6 | N...C...1... | 0.222 | 0.101 | 0.043 | 7 | 6 | 3 | 0.006 |
| 7 | N...C....... | 0.423 | 0.479 | 0.573 | 13 | 13 | 5 | 0.000 |
| 8 | C...N...C... | 0.648 | 0.821 | 0.356 | 9 | 9 | 3 | 0.008 |
| 9 | C...C....... | 0.325 | 0.382 | 0.273 | 13 | 13 | 5 | 0.000 |
| 10 | N........... | 0.330 | 0.111 | 0.147 | 13 | 13 | 5 | 0.000 |
| 11 | N...C...C... | 0.798 | 0.643 | 0.829 | 11 | 13 | 5 | 0.009 |
| 12 | Nmax.8...... | 0.798 | 0.510 | 0.590 | 2 | 2 | 0 | 1.000 |
| 13 | Omax.6...... | 0.169 | 0.527 | 0.871 | 2 | 5 | 2 | 0.061 |
| Promoter of endpoint decrease | ||||||||
| 1 | 1........... | −0.1812 | −0.0120 | −0.0134 | 13 | 13 | 5 | 0.000 |
| 2 | =...O...(... | −0.0579 | −0.1326 | −0.2212 | 13 | 13 | 5 | 0.000 |
| 3 | C...=....... | −0.2642 | −0.0515 | −0.1487 | 13 | 13 | 5 | 0.000 |
| 4 | O...(....... | −1.1001 | −0.8219 | −1.2695 | 13 | 13 | 5 | 0.000 |
a, b, c NT, NiT, and NC are the numbers of SMILES (samples) that include a given attribute (SAk) in the training set, the invisible training set (Inv.Train), and the calibration set, respectively. d Defect[SAk] is the difference between the probabilities of SAk in the training and calibration sets, divided by the sum of the numbers of SMILES containing SAk in the training and calibration sets.
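The defect definition in the footnote can be checked against the table with a few lines of arithmetic. The sketch below assumes set sizes of 13 (training) and 5 (calibration), as given in the N column of the statistics table, and assumes that the defect is set to 1 when the attribute is absent from one of the two sets, which is consistent with the Nmax.8 row (NC = 0, defect 1.000) but is not stated explicitly in this record. The function name `defect` is hypothetical.

```python
def defect(n_train: int, size_train: int, n_calib: int, size_calib: int) -> float:
    """Defect of a SMILES attribute from its counts in two sets.

    n_train / n_calib: number of SMILES containing the attribute.
    size_train / size_calib: total number of SMILES in each set.
    """
    if n_train == 0 or n_calib == 0:
        # Assumption: an attribute missing from either set gets defect 1.
        return 1.0
    p_train = n_train / size_train
    p_calib = n_calib / size_calib
    return abs(p_train - p_calib) / (n_train + n_calib)


# Row "C...O...C..." of the table: NT = 6 of 13, NC = 3 of 5.
print(round(defect(6, 13, 3, 5), 3))
```

Attributes present in every SMILES of both sets, such as `C...(.......` with NT = 13 and NC = 5, have equal probabilities and therefore a defect of zero.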
Figure 3. Chemical structures (25a–25h) of the newly designed compounds with favorable structural features.
The newly designed compounds and their predicted pEC50 using the Monte Carlo optimization and the GA-MLR models.
| Designed Compound | Promoters of Endpoint | pEC50 | pEC50 |
|---|---|---|---|
| | | 9.68 | 10.01 |
| | Combination of sp3 carbon with branching; maximum number of oxygen is 6 | 9.88 | 10.15 |
| | Combination of two sp3 carbons; maximum number of oxygen is 6 | 11.78 | 10.13 |
| | Presence of sp3 oxygen surrounded by two sp3 carbons | 12.18 | 11.18 |
| | Presence of sp3 carbon surrounded by sp3 oxygen and sp3 carbon | 12.27 | 11.23 |
| | Combination of sp3 nitrogen and sp3 carbon in aliphatic ring; maximum number of oxygen is 6; maximum number of nitrogen is 8 | 11.95 | 10.71 |
| | Presence of sp3 carbon surrounded by sp3 nitrogen and sp3 carbon; maximum number of oxygen is 6; maximum number of nitrogen is 8 | 12.05 | 12.28 |
| | Combination of two sp3 carbons; combination of sp3 nitrogen and sp3 carbon in aliphatic ring; maximum number of oxygen is 6; maximum number of nitrogen is 8 | 11.69 | 10.04 |
| | Presence of sp3 carbon surrounded by sp3 oxygen and sp3 carbon; combination of sp3 nitrogen and sp3 carbon in aliphatic ring; maximum number of nitrogen is 8 | 12.19 | 11.44 |
Pharmacokinetic and ADME properties of the designed molecules and the lead compound evaluated using AdmetSAR and Osiris property explorer.
| Pharmacokinetic Properties | MW | Lipophilicity (logP) | Solubility log(mol/L) | TPSA (Å2) | HBA | HBD | BBB | HIA |
|---|---|---|---|---|---|---|---|---|
| | 835.50 | 6.71 | −2.88 | 157.99 | 13 | 4 | 0.012 | 0.010 |
| | 849.52 | 6.67 | −3.11 | 157.99 | 13 | 4 | 0.018 | 0.010 |
| | 863.53 | 6.95 | −3.20 | 157.99 | 13 | 4 | 0.015 | 0.009 |
| | 865.51 | 5.37 | −3.32 | 167.22 | 14 | 4 | 0.009 | 0.012 |
| | 879.53 | 5.68 | −3.49 | 167.22 | 14 | 4 | 0.009 | 0.007 |
| | 864.53 | 4.73 | −3.12 | 170.02 | 14 | 5 | 0.013 | 0.189 |
| | 878.54 | 4.85 | −3.06 | 170.02 | 14 | 5 | 0.010 | 0.078 |
| | 878.54 | 5.39 | −3.06 | 184.01 | 14 | 6 | 0.023 | 0.049 |
| | 894.54 | 4.26 | −3.28 | 193.24 | 15 | 6 | 0.015 | 0.035 |
Molecular weight (MW), topological polar surface area (TPSA), hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), blood–brain barrier (BBB), human intestinal absorption (HIA).
Chemical structures and the studied biological activity data.
| Chemical structure | pEC50 |
|---|---|
| | 10.01 |
| | 5.55 |
| | |
| | 5.62 |
| | 5.89 |
| | 9.16 |
| | 7.25 |
| | 6.21 |
| | 7.36 |
| | 6.88 |
| | 6.61 |
| | 7.00 |
| | 7.09 |
| | 8.96 |
| | 6.45 |
| | 6.45 |
| | |
| | 7.47 |
| | 8.46 |
| | 8.09 |
| | 8.64 |
| | 10.41 |
| | |
| | 5 |
| | 5.55 |
| | |
| | 9.82 |
| | 9.77 |
| | 10.41 |
| | 9.46 |
| | 9.01 |
| | |
| | 8.47 |
| | 9.41 |
| | 8.68 |
| | 9.57 |
| | 9.74 |
| | 9.89 |
| | 9.92 |
| | 10.23 |
| | 10.30 |
t: test set. *: ramification position.
Description of the SMILES attributes.
| SMILES Notation | Description |
|---|---|
| SK | One symbol or two symbols that cannot be examined separately |
| SSK | Combination of two SMILES atoms |
| SSSK | Combination of three SMILES atoms |
| HARD | Existence of some chemical element |
| Cmax | Number of rings |
| Nmax | Number of nitrogen atoms |
| Omax | Number of oxygen atoms |
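To make the SK, SSK, and SSSK attributes concrete, the sketch below splits a SMILES string into symbols and lists single symbols, adjacent pairs, and adjacent triples, padded with dots to the 12-character form seen in the promoter table (for example `C...O...C...`). This is a simplified illustration, not CORAL's actual algorithm: the tokenizer only treats bracket atoms, `Cl`, `Br`, and `%nn` ring closures as multi-character symbols, the attributes keep the left-to-right order of the SMILES string, and any ordering convention applied by CORAL is not reproduced. The names `tokenize` and `smiles_attributes` are hypothetical.

```python
import re

# Multi-character SMILES symbols first, then any single character.
_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list:
    """Split a SMILES string into symbols (simplified tokenizer)."""
    return _TOKEN.findall(smiles)


def _pad(symbols) -> str:
    # Each symbol occupies 4 characters; the attribute is 12 characters wide.
    return "".join(s.ljust(4, ".") for s in symbols).ljust(12, ".")


def smiles_attributes(smiles: str) -> dict:
    """Return SK, SSK and SSSK attributes in SMILES reading order."""
    t = tokenize(smiles)
    return {
        "SK": [_pad(t[i:i + 1]) for i in range(len(t))],
        "SSK": [_pad(t[i:i + 2]) for i in range(len(t) - 1)],
        "SSSK": [_pad(t[i:i + 3]) for i in range(len(t) - 2)],
    }


# Ethanol as a small example.
for name, values in smiles_attributes("CCO").items():
    print(name, values)
```

Global attributes such as Nmax or Omax would be derived from whole-molecule counts rather than from adjacent symbols, so they are left out of this sketch.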