
Multi-Condition QSAR Model for the Virtual Design of Chemicals with Dual Pan-Antiviral and Anti-Cytokine Storm Profiles.

Alejandro Speck-Planche1, Valeria V Kleandrova2.   

Abstract

Respiratory viruses are infectious agents, which can cause pandemics. Although nowadays the danger associated with respiratory viruses continues to be evidenced by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) as the virus responsible for the current COVID-19 pandemic, other viruses such as SARS-CoV-1, the influenza A and B viruses (IAV and IBV, respectively), and the respiratory syncytial virus (RSV) can lead to globally spread viral diseases. Also, from a biological point of view, most of these viruses can cause an organ-damaging hyperinflammatory response known as the cytokine storm (CS). Computational approaches constitute an essential component of modern drug development campaigns, and therefore, they have the potential to accelerate the discovery of chemicals able to simultaneously inhibit multiple molecular and nonmolecular targets. We report here the first multicondition model based on quantitative structure-activity relationships and an artificial neural network (mtc-QSAR-ANN) for the virtual design and prediction of molecules with dual pan-antiviral and anti-CS profiles. Our mtc-QSAR-ANN model exhibited an accuracy higher than 80%. By interpreting the different descriptors present in the mtc-QSAR-ANN model, we could retrieve several molecular fragments whose assembly led to new molecules with drug-like properties and predicted pan-antiviral and anti-CS activities.
© 2022 The Authors. Published by American Chemical Society.


Year:  2022        PMID: 36120024      PMCID: PMC9476185          DOI: 10.1021/acsomega.2c03363

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Respiratory viruses represent infectious agents that can lead to life-threatening medical conditions. The danger associated with infections caused by respiratory viruses continues to be demonstrated by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)[1] as the virus responsible for the current COVID-19 pandemic. Although SARS-CoV-1, SARS-CoV-2, and the influenza A virus (IAV) are some of the most studied viral pathogens[1−4] due to their transmissibility and lethality worldwide, evolved strains of other respiratory viruses such as the influenza B virus (IBV) and the respiratory syncytial virus (RSV) also have the potential to cause epidemics or pandemics.[5,6] Two alarming aspects associated with the infections caused by all the respiratory viruses mentioned above deserve special attention. On the one hand, the presence of at least two of these respiratory viruses can involve coinfection,[7−9] a poorly understood phenomenon that can increase morbidity and mortality while also making antiviral drugs less effective. On the other hand, most of these viruses can cause the organ-damaging hyperinflammatory response known as the cytokine storm (CS),[10−12] where the proteins known as caspase-1 and tumor necrosis factor-alpha (TNF-alpha) play a determinant role. 
In this sense, caspase-1 is essential for the maturation and/or release of two CS inflammatory proteins,[13−15] interleukin 1β (IL-1β) and interleukin 18 (IL-18), while TNF-alpha is a CS protein itself,[16] triggering several inflammation-related proteins such as caspase-1[17] and the CS protein interleukin 6 (IL-6).[18] All these elements, together with the fact that the current antiviral therapies are very narrow in terms of mechanisms of action and the number of targeted viruses, indicate the need to find new chemicals able to act as pan-antivirals[19] and to simultaneously inhibit the CS proteins caspase-1 and TNF-alpha.[20] It is well-known that computational methods are of paramount importance in modern drug development.[21] However, in the context of finding new chemicals with the potential to treat viral infections, in silico approaches such as quantitative structure–activity relationships (QSAR), 3D-QSAR, molecular docking, molecular dynamics simulation, virtual screening, and/or shape-based pharmacophore have been applied to discover molecules with either antiviral activity[22−28] or inhibitory potency against CS-related proteins,[29−32] but never both biological profiles at the same time. Also, such computational approaches have one or more limitations, such as the use of small data sets of compounds belonging to the same chemical family, the study of inhibitory data based on only one biological target (a virus, a viral protein, or a CS-related biomolecule), the reliance on only one assay protocol, and the lack of sufficient physicochemical and structural information, which can provide deeper insight when searching for new molecular entities with the desired bioactivities. 
Because of the significance of computational drug discovery, over the last 10 years, several research groups have emphasized the development and application of the methodology known as Perturbation Theory and Machine Learning (PTML)[33] through which it has been possible to overcome all the drawbacks of the in silico approaches mentioned above. Thus, PTML models have been able to integrate chemical and biological data at different levels of complexity in scientific areas as diverse as antimicrobial research,[34−38] oncology,[39−42] neurosciences,[43] immunology,[44] and peptide/protein science.[45−47] In doing so, PTML models have been able to predict multiple activities, toxicities, and/or pharmacokinetic end points, while considering dissimilar biological targets (e.g., proteins, microorganisms, cells, rodents, etc.) and many assay protocols. To date, there is no computational model reported in the scientific literature able to guide the search for chemicals with both pan-antiviral and anti-CS activities. Bearing in mind all these ideas, we report here, for the first time, the theoretical foundations of the PTML methodology in the context of both pan-antiviral and anti-inflammatory therapies. Particularly, we have created a multicondition QSAR model based on artificial neural networks (mtc-QSAR-ANN) as a tool for the virtual design and prediction of drug-like molecules with dual pan-antiviral and anti-CS profiles.

Results and Discussion

The Mtc-QSAR-ANN Model

In this work, we developed an mtc-QSAR-ANN model able to consider 23 different experimental conditions (cj) when predicting the inhibitory activity against either multiple respiratory viruses or the CS proteins caspase-1 and TNF-alpha (Table 1). Each of those experimental conditions cj is a combination of three elements: the measure of inhibitory activity (ma), the biological target on which the experiment was performed (bt), and the assay information involving different test protocols (ai). Notice that the mtc-QSAR-ANN model used a classification approach, predicting molecules with BAi(cj) = 1 (active) or BAi(cj) = −1 (inactive), where BAi(cj) represents the categorical variable of inhibitory activity. All the chemical and biological data used in this work can be found in the Supporting Information (Tables S1 and S2).
Table 1

Different Experimental Conditions under Which the Molecules of the Present Dataset Were Assayed

cj (a) | ma (b) | cutoff value (nM) (c) | bt (d) | ai (e)
c1 | EC50 (nM) | ≤3800 | IAV (A-Puerto Rico-8-1934 (H1N1)) | F (organism-based format)
c2 | EC50 (nM) |  | IAV (A-Puerto Rico-8-1934 (H1N1)) | F (cell-based format)
c3 | EC50 (nM) |  | IAV (A-Puerto Rico-8-1934 (H1N1)) | F (assay format)
c4 | EC50 (nM) | ≤10000 | IBV | F (organism-based format)
c5 | EC50 (nM) |  | IBV | F (cell-based format)
c6 | EC50 (nM) | ≤300 | RSV | F (organism-based format)
c7 | EC50 (nM) |  | RSV | F (cell-based format)
c8 | EC50 (nM) | ≤7500 | SARS-CoV-1 | F (organism-based format)
c9 | EC50 (nM) | ≤1200 | SARS-CoV-2 | F (organism-based format)
c10 | IC50 (nM) | ≤8600 | IAV (A-Puerto Rico-8-1934 (H1N1)) | F (organism-based format)
c11 | IC50 (nM) |  | IAV (A-Puerto Rico-8-1934 (H1N1)) | F (cell-based format)
c12 | IC50 (nM) | ≤10000 | IBV | F (organism-based format)
c13 | IC50 (nM) | ≤3600 | RSV | F (organism-based format)
c14 | IC50 (nM) | ≤1080 | SARS-CoV-1 | F (organism-based format)
c15 | IC50 (nM) | ≤6070 | SARS-CoV-2 | F (organism-based format)
c16 | IC50 (nM)p | ≤1100 | Caspase-1 | B (assay format)
c17 | IC50 (nM)p |  | Caspase-1 | B (single protein format)
c18 | IC50 (nM)p |  | Caspase-1 | B (cell-based format)
c19 | IC50 (nM)p | ≤1635 | TNF-alpha | B (single protein format)
c20 | IC50 (nM)p |  | TNF-alpha | F (assay format)
c21 | IC50 (nM)p |  | TNF-alpha | B (assay format)
c22 | IC50 (nM)p |  | TNF-alpha | B (cell-based format)
c23 | IC50 (nM)p |  | TNF-alpha | F (cell-based format)

(a) Codes for the different experimental conditions cj, which are combinations of the elements ma (measures of activity), bt (biological targets), and ai (assay information containing diverse experimental protocols).

(b) Measures of inhibitory activity. EC50 (nM) is the effective concentration leading to a 50% reduction in the cytopathicity caused by a virus or inhibition of viral replication, IC50 (nM) is the concentration required for 50% inhibition of the virus, and IC50 (nM)p is the concentration required for 50% inhibition of a protein associated with the cytokine storm.

(c) Value of activity from which a chemical was annotated as active [BAi(cj) = 1].

(d) Targets (respiratory viruses or proteins associated with the cytokine storm).

(e) Assay information related to the different test protocols. Each annotation combines the columns “assay type” (first letter) and “BioAssay Ontology” (phrase between parentheses), which were extracted from the ChEMBL file containing inhibitory activity data.
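
The annotation rule described for Table 1 (a chemical is active when its measured value does not exceed the cutoff of its experimental condition) can be sketched as follows. This is a minimal illustration, not the authors' code: the names CUTOFF_NM and label_activity are hypothetical, and only the cutoffs that are explicitly listed in Table 1 are included; conditions whose cutoff cell is blank are left out rather than guessed.

```python
from typing import Optional

# Cutoffs (nM) copied from Table 1 for the conditions whose cutoff cell is
# filled in. Conditions with a blank cell are deliberately omitted.
CUTOFF_NM = {
    "c1": 3800.0, "c4": 10000.0, "c6": 300.0, "c8": 7500.0, "c9": 1200.0,
    "c10": 8600.0, "c12": 10000.0, "c13": 3600.0, "c14": 1080.0,
    "c15": 6070.0, "c16": 1100.0, "c19": 1635.0,
}


def label_activity(condition: str, value_nm: float) -> Optional[int]:
    """Return BAi(cj): 1 (active) if value <= cutoff, otherwise -1.

    Returns None when no cutoff is listed here for the condition.
    """
    cutoff = CUTOFF_NM.get(condition)
    if cutoff is None:
        return None
    return 1 if value_nm <= cutoff else -1


# Hypothetical measurements, used only to exercise the rule.
example_active = label_activity("c9", 500.0)
example_inactive = label_activity("c9", 5000.0)
```

Returning None for an unlisted condition keeps the sketch from silently assigning a class where the cutoff is not known.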

The best mtc-QSAR-ANN model found by us has the notation MLP 15-68-2, which means that our model is based on a multilayer perceptron network containing 15 input nodes, one for each of the 15 multicondition descriptors of the type D[GTI]cj that entered the mtc-QSAR-ANN model (Table 2). According to this notation, 68 neurons were used in the hidden layer, which relied on a hyperbolic tangent function, while the number two indicates that the mtc-QSAR-ANN model predicted the two aforementioned values of BAi(cj) in the output layer by relying on a softmax function.
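
To make the MLP 15-68-2 notation concrete, the following sketch runs a forward pass through a network of that shape: 15 inputs, 68 tanh hidden neurons, and 2 softmax outputs. It is only an illustration of the architecture. The weights are random placeholders (not the trained values), the helper names are hypothetical, and which output corresponds to the active class is an assumption.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 15, 68, 2  # the "MLP 15-68-2" shape


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and zero biases (NOT the trained model)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax; the outputs sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    """tanh hidden layer followed by a softmax output layer."""
    h = [math.tanh(v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden = init_layer(N_INPUTS, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUTS, rng)

# Hypothetical descriptor vector; real inputs would be the 15 D[GTI]cj values.
descriptors = [0.01 * i for i in range(N_INPUTS)]
probs = forward(descriptors, hidden, output)  # assumed order: [inactive, active]
```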
Table 2

Symbols and Codes of the D[GTI]cj Descriptors Present in the mtc-QSAR-ANN Model

symbology (a) | code (b) | concept
D[Xv(C)6]ma | DGTI01 | deviation of the Kier–Hall (valence) connectivity index based only on cluster subgraphs of order 6
D[NXv(C)4]ma | DGTI02 | deviation of the normalized Kier–Hall (valence) connectivity index based only on cluster subgraphs of order 4
D[Xv(P)6]bt | DGTI03 | deviation of the Kier–Hall (valence) connectivity index based only on path subgraphs of order 6
D[NSM(Gas)1]bt | DGTI04 | deviation of the normalized spectral moment of order 1 based on bonds weighted by the Gasteiger–Marsili charges
D[NSM(Gas)2]bt | DGTI05 | deviation of the normalized spectral moment of order 2 based on bonds weighted by the Gasteiger–Marsili charges
D[NXv(PC)4]bt | DGTI06 | deviation of the normalized Kier–Hall (valence) connectivity index based only on path-cluster subgraphs of order 4
D[Ne(P)5]bt | DGTI07 | deviation of the normalized edge (bond) connectivity index based only on path subgraphs of order 5
D[NJ]bt | DGTI08 | deviation of the normalized Balaban index
D[SM(Hyd)1]ai | DGTI09 | deviation of the spectral moment of order 1 based on bonds weighted by the hydrophobicity contributions
D[3k(alpha)]ai | DGTI10 | deviation of Kier’s shape index based only on path subgraphs of order 3
D[J]ai | DGTI11 | deviation of the Balaban index
D[NSM(Psa)6]ai | DGTI12 | deviation of the normalized spectral moment of order 6 based on bonds weighted by the polar surface area
D[NSM(Mol)5]ai | DGTI13 | deviation of the normalized spectral moment of order 5 based on bonds weighted by the molar refractivity
D[NXv(Ch)5]ai | DGTI14 | deviation of the normalized Kier–Hall (valence) connectivity index based only on chain subgraphs of order 5
D[NKFI]ai | DGTI15 | deviation of the normalized Kier flexibility index

(a) The D[GTI]cj descriptors with ending “ma” characterize both the molecular structure and the measures of inhibitory activity. Those D[GTI]cj descriptors with the ending “bt” describe the chemical structure as well as the biological targets (respiratory viruses and proteins associated with the cytokine storm). Finally, the D[GTI]cj descriptors with ending “ai” characterize the chemical structure and information related to different experimental assay protocols.

(b) From now on, for the sake of simplicity, the codes will be used instead of the original symbols to explain either the statistical significance or the physicochemical interpretation of the D[GTI]cj descriptors.

The mtc-QSAR-ANN model displayed a good global performance, with accuracy (Acc) values of 85.14% and 80.19% in training and test sets, respectively. Other statistical indices such as sensitivity [Sn(%)], specificity [Sp(%)], and Matthews’ correlation coefficient (MCC)[48] supported the good statistical quality and predictive power of our mtc-QSAR-ANN model (Table 3). For instance, [Sn(%)] and [Sp(%)] were higher than 79%. Also, the MCC values (0.700 in the training set and 0.600 in the test set) indicated a strong convergence between the predicted and observed values of the categorical variable of biological activity BAi(cj).
Table 3

Statistical Performance of the mtc-QSAR-ANN Model

symbols (a) | training set | test set
NActive | 1156 | 374
CCCActive | 981 | 297
Sn (%) | 84.86 | 79.41
NInactive | 1435 | 469
CCCInactive | 1225 | 379
Sp (%) | 85.37 | 80.81
MCC | 0.700 | 0.600

(a) NActive, number of molecules/cases labeled as active; NInactive, number of molecules/cases annotated as inactive; CCCActive, number of molecules/cases correctly classified as active; CCCInactive, number of molecules/cases correctly classified as inactive; Sn (%), statistical sensitivity (percentage of molecules/cases correctly classified as active); Sp (%), statistical specificity (percentage of molecules/cases correctly classified as inactive); MCC, Matthews’ correlation coefficient.
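
The global indices in Table 3 follow directly from the four counts reported for each set. The sketch below recomputes Sn, Sp, Acc, and MCC from those counts using the standard confusion-matrix definitions; the function name classification_stats is hypothetical and the code is a check of the arithmetic, not the software used by the authors.

```python
import math


def classification_stats(n_active, ccc_active, n_inactive, ccc_inactive):
    """Derive Sn, Sp, Acc (all in %) and MCC from the counts of Table 3."""
    tp, fn = ccc_active, n_active - ccc_active        # actives: hit / missed
    tn, fp = ccc_inactive, n_inactive - ccc_inactive  # inactives: hit / missed
    sn = 100.0 * tp / (tp + fn)
    sp = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


training = classification_stats(1156, 981, 1435, 1225)
test = classification_stats(374, 297, 469, 379)
```

Rounded to the precision used in the text, these counts reproduce the reported accuracies (85.14% and 80.19%) and MCC values (0.700 and 0.600).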

We also examined the local sensitivities and specificities, which were computed separately for each element of the experimental conditions. In the training set, in the case of the measures of activity (ma), the sensitivity [Sn(%)]ma and the specificity [Sp(%)]ma were higher than or equal to 79.52% and 83.78%, respectively. Both magnitudes exhibited values above 72% in the test set. Considering the local measures in the case of the biological targets (bt), [Sn(%)]bt and [Sp(%)]bt had acceptable values in the range 62.86%–92% for the whole data set. The only exception was the biological target SARS-CoV-1, for which [Sn(%)]bt = 44.44% was achieved in the test set. Last, in the case of the assay information involving different experimental protocols (ai), the [Sn(%)]ai and [Sp(%)]ai values were in the interval 75%–100% in both training and test sets. Only [Sn(%)]ai for the assay information denoted as “F (assay format)” was below this range, with values of 53.49% and 38.46% for training and test sets, respectively. We suggest that the incorrectly classified/predicted chemicals are a consequence of the lack of universality of our D[GTI]cj descriptors (a limitation shared by any molecular descriptor reported to date in the scientific literature), as well as the complexity of the experimental data involving different measures of activity (ma), multiple targets (bt), and great variability of the assay information (ai). 
In any case, from the analyses of both global and local statistical indices, we can infer that our mtc-QSAR-ANN model has a good statistical quality and predictive power. Detailed information regarding the D[GTI]cj descriptors and the classification results is available in the Supporting Information (Tables S3–S6). In addition to analyzing the performance of the mtc-QSAR-ANN model, we also assessed the reliability of its classifications/predictions by determining the applicability domain according to the descriptors’ space approach.[49,50] In doing so, we defined a local score of applicability domain (LSAD) for each D[GTI]cj descriptor as well as a total score (TSAD),[50,51] which was the sum of the LSAD values. Because our mtc-QSAR-ANN model contains 15 D[GTI]cj descriptors, only molecules/cases with TSAD = 15 were considered to fall within the applicability domain. Of the 3434 molecules/cases in the data set, only four were outside the applicability domain (TSAD = 14). Yet, the deletion of these four outliers did not improve the performance of the mtc-QSAR-ANN model (Supporting Information Table S7). Last, we would like to emphasize that although the present mtc-QSAR-ANN model is based on a single neural network, when performing virtual screening, it can behave as a consensus tool. Notice that the previous analysis of the local measures [Sn(%)]ma, [Sp(%)]ma, [Sn(%)]bt, [Sp(%)]bt, [Sn(%)]ai, and [Sp(%)]ai demonstrates that our mtc-QSAR-ANN model can predict the activity of any molecule against a defined target by considering several experimental conditions (see Table 1). Therefore, if for a given molecule, we combine the results and the reliability (the latter provided by the applicability domain) of such predictions against a specific target under different experimental conditions, it will be possible to obtain a consensus view on the activity or inactivity of that molecule against the target under study.
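
The applicability-domain scoring described above can be sketched as follows. The text only states that TSAD is the sum of the per-descriptor LSAD values and that TSAD = 15 means "inside the domain"; the rule used here for LSAD (1 when a descriptor value lies within the minimum–maximum range observed in the training set, 0 otherwise) is an assumption based on the usual range-based descriptors' space approach, and all function names and the toy data are hypothetical.

```python
def descriptor_ranges(training_rows):
    """Per-descriptor (min, max) over the training set, one pair per column."""
    return [(min(col), max(col)) for col in zip(*training_rows)]


def lsad(value, bounds):
    """Local score (assumed rule): 1 if the value is inside the range, else 0."""
    low, high = bounds
    return 1 if low <= value <= high else 0


def tsad(row, ranges):
    """Total score: sum of the local scores over all descriptors."""
    return sum(lsad(v, b) for v, b in zip(row, ranges))


def in_domain(row, ranges):
    """Inside the domain only when every descriptor is in range."""
    return tsad(row, ranges) == len(ranges)


# Toy training set with 3 descriptors (the real model uses 15).
toy_training = [
    [0.10, -0.20, 0.05],
    [0.30, 0.10, 0.00],
    [0.20, 0.00, 0.15],
]
toy_ranges = descriptor_ranges(toy_training)
inside = [0.25, -0.10, 0.10]
outside = [0.25, 0.50, 0.10]  # second descriptor exceeds the training maximum
```

With 15 descriptors, a case such as `outside` would obtain TSAD = 14, which matches the score reported for the four cases outside the domain.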

Physicochemical and Structural Interpretation of the Molecular Descriptors

Before using our mtc-QSAR-ANN model as a tool to design new molecules, we interpreted the different D[GTI]cj descriptors from a physicochemical and structural point of view. In doing so, for each D[GTI]cj descriptor present in the mtc-QSAR-ANN model, we calculated two mean values, one for the chemicals annotated and correctly classified as active and the other for those molecules annotated and correctly classified as inactive (Table 4).[50,51] Such calculations were carried out by considering only the molecules in the training set. The comparison of the two mean values permitted us to gain insight into how the value of each D[GTI]cj descriptor should be varied (increased or diminished) to increase both the pan-antiviral and anti-CS activities.
Table 4

Tendencies of Variation of the D[GTI]cj Descriptors in the mtc-QSAR-ANN Model

descriptors | active mean (a) | inactive mean (a) | tendency (b)
DGTI01 | 1.4391 × 10^-2 | 5.9700 × 10^-2 | decrease
DGTI02 | 2.8440 × 10^-2 | 3.1588 × 10^-2 | decrease
DGTI03 | 1.6610 × 10^-3 | -5.2028 × 10^-2 | increase
DGTI04 | -3.1118 × 10^-2 | 1.2448 × 10^-1 | decrease
DGTI05 | 1.0951 × 10^-2 | -1.2250 × 10^-1 | increase
DGTI06 | 1.1698 × 10^-2 | 1.0999 × 10^-1 | decrease
DGTI07 | -2.9689 × 10^-2 | -3.8392 × 10^-2 | increase
DGTI08 | 1.3665 × 10^-2 | 1.3667 × 10^-1 | decrease
DGTI09 | -2.4072 × 10^-2 | 8.7241 × 10^-2 | decrease
DGTI10 | 4.9842 × 10^-3 | -2.3010 × 10^-2 | increase
DGTI11 | 1.6154 × 10^-2 | 4.7476 × 10^-2 | decrease
DGTI12 | 1.3308 × 10^-2 | -1.1988 × 10^-3 | increase
DGTI13 | 1.4227 × 10^-3 | 9.9303 × 10^-2 | decrease
DGTI14 | 8.0425 × 10^-3 | 2.4919 × 10^-2 | decrease
DGTI15 | 8.4249 × 10^-3 | 4.8861 × 10^-2 | decrease

(a) Class-based means: these are the averages calculated for each D[GTI]cj descriptor by considering chemicals (from the training set) belonging to a defined class (active or inactive).

(b) Variation of the value of a D[GTI]cj descriptor that should be expected to increase the inhibitory activity against the respiratory viruses and the proteins associated with the cytokine storm.
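
The tendency column of Table 4 can be reproduced by comparing the two class-based means of each descriptor: when the mean of the actives is larger than that of the inactives, the descriptor should be increased, otherwise decreased. The sketch below applies that comparison to the tabulated means; the names CLASS_MEANS and tendency are hypothetical.

```python
# (active mean, inactive mean) per descriptor, as read from Table 4.
CLASS_MEANS = {
    "DGTI01": (1.4391e-2, 5.9700e-2),
    "DGTI02": (2.8440e-2, 3.1588e-2),
    "DGTI03": (1.6610e-3, -5.2028e-2),
    "DGTI04": (-3.1118e-2, 1.2448e-1),
    "DGTI05": (1.0951e-2, -1.2250e-1),
    "DGTI06": (1.1698e-2, 1.0999e-1),
    "DGTI07": (-2.9689e-2, -3.8392e-2),
    "DGTI08": (1.3665e-2, 1.3667e-1),
    "DGTI09": (-2.4072e-2, 8.7241e-2),
    "DGTI10": (4.9842e-3, -2.3010e-2),
    "DGTI11": (1.6154e-2, 4.7476e-2),
    "DGTI12": (1.3308e-2, -1.1988e-3),
    "DGTI13": (1.4227e-3, 9.9303e-2),
    "DGTI14": (8.0425e-3, 2.4919e-2),
    "DGTI15": (8.4249e-3, 4.8861e-2),
}


def tendency(active_mean, inactive_mean):
    """Direction in which a descriptor should move to favor activity."""
    return "increase" if active_mean > inactive_mean else "decrease"


tendencies = {code: tendency(*means) for code, means in CLASS_MEANS.items()}
to_increase = sorted(c for c, t in tendencies.items() if t == "increase")
```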

We also relied on a plot of the sensitivity values (SVs) of the D[GTI]cj descriptors, which indicate their degrees of importance in the mtc-QSAR-ANN model (Figure 1). The D[GTI]cj descriptors with the largest SVs are the most influential, and therefore, they represent the most important physicochemical properties and structural features that a molecule should have to enhance its pan-antiviral and anti-CS activities.
Figure 1

Sensitivity values as measures of the importance of the D[GTI]cj descriptors present in the mtc-QSAR-ANN model.

Five D[GTI]cj descriptors are derived from the atom-based connectivity indices and therefore characterize the molecular accessibility of the chemicals,[52,53] i.e., the ability of certain regions/fragments/functional groups of the chemicals to interact with the different biological targets (or components of them). These D[GTI]cj descriptors are DGTI01, DGTI02, DGTI03, DGTI06, and DGTI14. Thus, to modify the molecular accessibility so as to increase the pan-antiviral and anti-CS activities, the following physicochemical and structural requirements are needed. First, functional groups of the type G–JL3 should be avoided. In such functional groups, G, J, and L are non-hydrogen atoms, and the three L atoms can be either equal or different. In G–JL3, when G is attached to at least two non-hydrogen atoms besides J (e.g., the trifluoromethyl attached to a benzene ring), the information on the diminution of the molecular accessibility is provided by the decrease of the values of DGTI01 and DGTI02. If the atom G is attached to only one non-hydrogen atom besides J (the same trifluoromethyl group attached to the carbon of a methylene group), then the information will be described only by the diminution of DGTI02 (its value is also reduced by increasing the total number of non-hydrogen atoms in a molecule). We would like to highlight that if, for some reason, the functional groups G–JL3 are present in a molecule, then G, J, and L should preferably be noncarbon atoms such as oxygen or nitrogen. The descriptors DGTI01 and DGTI02 rank 12th and 13th among the most important D[GTI]cj descriptors in the mtc-QSAR-ANN model, respectively. In addition, we have DGTI03 (the ninth most important descriptor), whose increase indicates a larger number of linear fragments containing six bonds (hexane-like skeletons). 
In such fragments, atoms such as S and P, as well as aliphatic portions and aromatic atoms, are also highly beneficial. In this sense, Figure 2 depicts a series of generic molecular fragments whose presence can favor the variation of the values of the D[GTI]cj descriptors, including DGTI03.
Figure 2

Different molecular fragments whose presence favorably affects the values of the D[GTI]cj descriptors. The symbols have the following meanings: A = O or S; X = C or N; Y1 = O, −CH2– or −NH–; Y2 and Y3 can be any atom; Y4 = O or −NH–; Z1 and Z2 can be F or any functional group whose electronegative atom (can be only O or N) is the one attached to the aromatic ring.
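
As background for the path-based connectivity indices discussed above (e.g., the order-6 path index underlying DGTI03), the following sketch computes a path connectivity index of a given order on a hydrogen-suppressed molecular graph: each path with the requested number of bonds contributes the inverse square root of the product of its atoms' delta values. It is a simplified illustration under stated assumptions: simple vertex degrees are used as delta values (these coincide with the Kier–Hall valence deltas only for saturated carbon skeletons), the D[GTI]cj deviation step is not reproduced, and the function names are hypothetical.

```python
def chain(n):
    """Adjacency list of a linear skeleton with n heavy atoms."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def path_connectivity(adjacency, order):
    """Sum over simple paths with `order` bonds of prod(delta) ** -0.5."""
    if order < 1:
        raise ValueError("order must be at least 1")
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == order + 1:
            # Each undirected path is found twice; keep one orientation.
            if path[0] < path[-1]:
                product = 1.0
                for atom in path:
                    product *= degree[atom]
                total += product ** -0.5
            return
        for nxt in adjacency[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in adjacency:
        extend([start])
    return total


butane = chain(4)
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
heptane = chain(7)

chi1_butane = path_connectivity(butane, 1)
chi1_isobutane = path_connectivity(isobutane, 1)
chi6_heptane = path_connectivity(heptane, 6)   # one path spanning six bonds
chi6_butane = path_connectivity(butane, 6)     # no such path exists
```

In this simplified setting, a skeleton contributes to the order-6 path index only if it contains at least one unbranched run of six bonds, which is why longer linear fragments raise such an index.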

In the case of DGTI06 and DGTI14, they measure the diminution of the molecular accessibility in regions containing methylbutane-like skeletons and five-membered rings, respectively. This means that to increase the pan-antiviral and anti-CS activities, the number of those fragments must be reduced as much as possible; in the case where they are present, those fragments should preferably contain noncarbon atoms. The values of DGTI06 and DGTI14 can also be diminished by augmenting the number of non-hydrogen atoms in a molecule. Note that DGTI06 and DGTI14 are the second and fifth most influential D[GTI]cj descriptors in the mtc-QSAR-ANN model, respectively. Our mtc-QSAR-ANN model also contains five D[GTI]cj descriptors (DGTI04, DGTI05, DGTI09, DGTI12, and DGTI13) based on the spectral moments of the bond adjacency matrix,[54−59] which means that they characterize how much any given physicochemical property is concentrated in different regions of a molecule. On the one hand, the favorable diminution of the value of DGTI04 (ranked 11th in terms of significance) is equivalent to increasing the number of electronegative atoms (mainly N, O, S, F, and Cl) of a molecule. The presence of these same atoms favorably increases the value of DGTI05, but these electronegative atoms should exist in fragments formed by two bonds (without counting bond multiplicity). 
We would like to highlight that DGTI05 is the most influential D[GTI]cj descriptor in the mtc-QSAR-ANN model and the presence of fragments such as amide, ester, urea, carbamate, and trifluoromethyl, heteroaromatic rings such as pyrimidine and pyridazine, and benzene rings substituted with fluorine atoms or polar groups (amine, amide, hydroxyl, urea, carbamate, etc.) can desirably increase the value of DGTI05. On the other hand, DGTI09 (the seventh most important) accounts for the global hydrophobicity of a molecule, and thus, most of the fragments favoring DGTI05 will also be suitable for DGTI09 (except for ester, trifluoromethyl, and fluorobenzene as well as alkyl groups), including benzene rings where a hydrogen atom has been replaced by nitrogen or oxygen. We also have DGTI12, which describes the augmentation of the polar surface area in fragments containing six bonds or less. In particular, the value of DGTI12 (ranking 14th in terms of significance) can be increased through the presence of any functional group containing the elements N, O, S, and/or P (e.g., amine, amide, ester, carbamate, urea, sulfonamide, sulfoxide, sulfone, and phosphate) and the aforementioned heteroaromatic rings. For the case of DGTI13 (the least important descriptor), its decrease is equivalent to the presence of low-refractivity atoms such as N, O, and F. Consequently, the presence of functional groups such as amide, primary and secondary amines, and pyrido[4,3-b]pyrazines (with a nitrogen or oxygen atom replacing hydrogen at positions 3 and 7) is encouraged. However, because DGTI13 encompasses fragments having five bonds or less (with emphasis on four-bond fragments), functional groups with high-refractivity atoms (S, P, and halogens except for fluorine) should be avoided, mainly sulfonamides, sulfones, and phosphates (or any phosphorus-based group where phosphorus is attached to four other non-hydrogen atoms). 
If a functional group containing a high-refractivity atom is present, that group should be in the periphery of a molecule. The last five D[GTI]cj descriptors contain structural information which is different from the D[GTI]cj descriptors already discussed (see Figure ). For instance, DGTI07 is based on the edge-connectivity index and, as such, is a measure of the molecular volume of a chemical.[60−62] Specifically, the increase of DGTI07 (ranked the tenth most influential) indicates a larger number of linear fragments containing five bonds (pentane-like skeletons). We also have DGTI08 and DGTI11, which are derived from the Balaban index,[63] and therefore, they are indicators of the shape of a molecule. The diminution of the values of DGTI08 and DGTI11 corresponds to a decrease in the number of ramifications in the molecules and an increase in the number of rings. From a structural point of view, the only difference between DGTI08 and DGTI11 is that the former is size-independent. In the mtc-QSAR-ANN model, DGTI08 and DGTI11 are the sixth and fourth most significant D[GTI]cj descriptors, respectively. In the case of DGTI10 (the eighth most important), its increase describes the shape of the molecules[64] by augmenting the number of linear fragments (propane-like skeletons). Last, we have DGTI15, which is the third most influential among all the D[GTI]cj descriptors in the mtc-QSAR-ANN model and a measure of the flexibility of the molecules.[65] In the context of the present work, the diminution of the value of DGTI15 is expected, particularly by increasing the number of fused rings. Bearing in mind all the ideas mentioned when explaining the different D[GTI]cj descriptors, their joint interpretation can be summarized in the following manner. 
Electronegative atoms should be distributed through the entire molecular structure of a chemical, with an emphasis on functional groups based on nitrogen and oxygen (a sulfoxide group is allowed mainly in the periphery of a molecule as a replacement of an amide group). Ramifications should be avoided, and if present, they should preferably appear at one of the ends of the molecule. The use of two fused ring systems separated by a linker containing four or five atoms (with at least two of them being aliphatic carbons) is the most important aspect. In this sense, each fused ring system should be formed by two heteroaromatic rings or a combination of a benzene ring with an aliphatic portion. Benzene rings, when present in the molecules, should be substituted at multiple positions, with nitrogen, oxygen, and/or fluorine as the preferred atoms to replace the hydrogen atoms.
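
The statement made above about the Balaban-based descriptors (lower values go together with fewer ramifications and more rings) can be checked on small skeletons with the standard Balaban index J = [m / (μ + 1)] Σ (Di·Dj)^(-1/2), where m is the number of bonds, μ the number of rings (cyclomatic number), and Di the sum of topological distances from atom i to all other atoms; the sum runs over bonds. The sketch below uses unweighted hydrogen-suppressed graphs (no heteroatom or bond-order corrections) and does not reproduce the normalization or deviation steps of DGTI08 and DGTI11; function names are hypothetical.

```python
from collections import deque


def distance_sums(adjacency):
    """Sum of shortest-path distances from each atom to all others (BFS)."""
    sums = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in dist:
                    dist[nbr] = dist[atom] + 1
                    queue.append(nbr)
        sums[source] = sum(dist.values())
    return sums


def balaban_j(adjacency):
    """Balaban index of a connected, unweighted molecular graph."""
    n_atoms = len(adjacency)
    n_bonds = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    n_rings = n_bonds - n_atoms + 1  # cyclomatic number
    d = distance_sums(adjacency)
    total = sum(
        (d[a] * d[b]) ** -0.5
        for a in adjacency for b in adjacency[a] if a < b
    )
    return n_bonds / (n_rings + 1) * total


pentane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
neopentane = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cyclopentane = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}

j_pentane = balaban_j(pentane)
j_neopentane = balaban_j(neopentane)
j_cyclopentane = balaban_j(cyclopentane)
```

For these three five-atom skeletons, the branched one gives the largest J and the cyclic one the smallest, in line with the interpretation given in the text.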

Virtual Design of New Chemicals with Pan-Antiviral and Anti-CS Profiles

By strictly following the joint interpretation of the 15 D[GTI]cj descriptors present in the mtc-QSAR-ANN model, we designed eight structurally related molecules (Figure 3). In doing so, we connected and/or fused some key molecular fragments,[66] i.e., those whose presence was suggested as desirable for the favorable variation of the values of most of the D[GTI]cj descriptors. Such fragments included fused ring systems, as well as amide, sulfoxide, carbamate, and urea groups. We also decorated the designed molecules with certain atoms/functional groups such as fluorine and hydroxymethyl (both in the periphery of the molecule), as well as nitrogen atoms from secondary amines.
Figure 3

New molecules designed by assembling several fragments according to the physicochemical and structural interpretation of the D[GTI]cj descriptors.

We then evaluated the designed molecules with our mtc-QSAR-ANN model to confirm that the virtual design was correctly performed. The resulting predictions are shown in Table 5. The experimental conditions c1 to c15 correspond to the predicted antiviral activities against the different viral strains, while the intervals c16–c18 and c19–c23 correspond to the inhibitory potencies against caspase-1 and TNF-alpha, respectively. All the data regarding the eight designed molecules can be found in detail in the Supporting Information (Tables S8–S11).
Table 5

Summary of the Predictions Performed by the mtc-QSAR-ANN Model for the Designed Molecules

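| cj (a,b) | DP-001 | DP-002 | DP-003 | DP-004 | DP-005 | DP-006 | DP-007 | DP-008 |
|---|---|---|---|---|---|---|---|---|
| c1 | 52.88 | 60.88 | 24.08 | 29.62 | 71.47 | 86.00 | 70.60 | 71.84 |
| c2 | 88.40 | 91.62 | 71.30 | 78.93 | 96.18 | 99.06 | 95.29 | 93.36 |
| c3 | 90.56 | 91.98 | 73.59 | 78.25 | 94.58 | 95.56 | 89.16 | 89.46 |
| c4 | 75.59 | 78.16 | 98.34 | 98.56 | 88.62 | 86.73 | 96.56 | 96.56 |
| c5 | 95.54 | 96.10 | 98.93 | 99.09 | 98.04 | 98.28 | 99.23 | 99.20 |
| c6 | 65.55 | 71.68 | 61.22 | 68.75 | 63.53 | 82.51 | 79.92 | 74.79 |
| c7 | 87.66 | 84.65 | 89.23 | 86.12 | 76.52 | 63.78 | 51.61 | 57.04 |
| c8 | 12.44 | 17.24 | 9.92 | 14.98 | 33.60 | 76.73 | 83.15 | 78.64 |
| c9 | 80.37 | 80.37 | 88.57 | 90.72 | 90.01 | 95.24 | 98.51 | 98.41 |
| c10 | 70.77 | 77.26 | 43.63 | 50.75 | 84.86 | 92.18 | 82.03 | 83.10 |
| c11 | 88.97 | 92.17 | 73.60 | 80.72 | 96.22 | 98.95 | 93.86 | 91.29 |
| c12 | 62.41 | 67.01 | 97.46 | 97.89 | 82.87 | 81.14 | 94.97 | 94.70 |
| c13 | 42.66 | 50.53 | 55.69 | 64.14 | 46.43 | 71.93 | 75.12 | 68.21 |
| c14 | 18.20 | 25.63 | 17.83 | 26.53 | 49.93 | 86.47 | 90.15 | 86.64 |
| c15 | 80.27 | 81.67 | 91.00 | 93.23 | 92.40 | 96.54 | 98.81 | 98.63 |
| c16 | 9.93 | 5.63 | 41.45 | 24.99 | 8.38 | 11.35 | 60.80 | 73.98 |
| c17 | 98.31 | 98.08 | 96.43 | 96.55 | 99.04 | 99.03 | 97.20 | 97.20 |
| c18 | 88.70 | 89.80 | 89.77 | 91.90 | 94.64 | 95.43 | 96.21 | 95.50 |
| c19 | 59.44 | 66.91 | 18.14 | 22.38 | 60.12 | 74.77 | 28.50 | 23.29 |
| c20 | 44.64 | 42.04 | 46.34 | 44.05 | 48.14 | 54.20 | 55.20 | 54.41 |
| c21 | 0.43 | 0.31 | 7.54 | 4.32 | 0.07 | 0.10 | 4.16 | 7.26 |
| c22 | 54.11 | 55.17 | 61.31 | 61.79 | 62.25 | 73.36 | 71.89 | 70.13 |
| c23 | 92.53 | 90.32 | 82.20 | 77.51 | 86.73 | 75.53 | 50.58 | 58.71 |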

(a) This refers to the different experimental conditions as reported in Table .

(b) The numbers in this table are the predicted values of probability for each molecule to be considered active. In the Supporting Information (Table S10), these probability values appear in a column named Prob.(%)Act.
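As a quick cross-check of the predictions table, the probabilities can be tallied per molecule. The sketch below is illustrative only: it assumes that a probability above 50% counts as "predicted active", and the names `TABLE_5`, `CUTOFF`, and `count_active` are hypothetical helpers rather than part of the authors' workflow.

```python
# Predicted probabilities (%) of being active, transcribed from Table 5.
# Each row: condition label followed by the values for DP-001 ... DP-008.
TABLE_5 = """
c1 52.88 60.88 24.08 29.62 71.47 86.00 70.60 71.84
c2 88.40 91.62 71.30 78.93 96.18 99.06 95.29 93.36
c3 90.56 91.98 73.59 78.25 94.58 95.56 89.16 89.46
c4 75.59 78.16 98.34 98.56 88.62 86.73 96.56 96.56
c5 95.54 96.10 98.93 99.09 98.04 98.28 99.23 99.20
c6 65.55 71.68 61.22 68.75 63.53 82.51 79.92 74.79
c7 87.66 84.65 89.23 86.12 76.52 63.78 51.61 57.04
c8 12.44 17.24 9.92 14.98 33.60 76.73 83.15 78.64
c9 80.37 80.37 88.57 90.72 90.01 95.24 98.51 98.41
c10 70.77 77.26 43.63 50.75 84.86 92.18 82.03 83.10
c11 88.97 92.17 73.60 80.72 96.22 98.95 93.86 91.29
c12 62.41 67.01 97.46 97.89 82.87 81.14 94.97 94.70
c13 42.66 50.53 55.69 64.14 46.43 71.93 75.12 68.21
c14 18.20 25.63 17.83 26.53 49.93 86.47 90.15 86.64
c15 80.27 81.67 91.00 93.23 92.40 96.54 98.81 98.63
c16 9.93 5.63 41.45 24.99 8.38 11.35 60.80 73.98
c17 98.31 98.08 96.43 96.55 99.04 99.03 97.20 97.20
c18 88.70 89.80 89.77 91.90 94.64 95.43 96.21 95.50
c19 59.44 66.91 18.14 22.38 60.12 74.77 28.50 23.29
c20 44.64 42.04 46.34 44.05 48.14 54.20 55.20 54.41
c21 0.43 0.31 7.54 4.32 0.07 0.10 4.16 7.26
c22 54.11 55.17 61.31 61.79 62.25 73.36 71.89 70.13
c23 92.53 90.32 82.20 77.51 86.73 75.53 50.58 58.71
"""

MOLECULES = [f"DP-{i:03d}" for i in range(1, 9)]
CUTOFF = 50.0  # assumed threshold (%) above which a prediction counts as active


def count_active(table_text, cutoff=CUTOFF):
    """Return, per molecule, the number of conditions with probability > cutoff."""
    counts = dict.fromkeys(MOLECULES, 0)
    for line in table_text.strip().splitlines():
        _condition, *values = line.split()
        for molecule, value in zip(MOLECULES, values):
            if float(value) > cutoff:
                counts[molecule] += 1
    return counts


active_counts = count_active(TABLE_5)
for molecule, n_active in active_counts.items():
    print(f"{molecule}: predicted active in {n_active} of 23 conditions")
```

Under this assumed cutoff, the tally gives one number per molecule that can be compared directly across the eight designs.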

The numbers in Table 5 reflect the probabilities (predicted by the mtc-QSAR-ANN model) of the designed molecules to be considered active. Because most of the probability values are higher than 50%, we can infer that the designed molecules seem to have pan-antiviral activity while also inhibiting the CS-related proteins caspase-1 and TNF-alpha. From a physicochemical and structural point of view, fragments such as the two systems of fused rings (each of them with one polar part and one hydrophobic region), the presence and location of the two fluorine atoms, and even the hydroxymethyl group (from DP-001 to DP-004) positively account for the dual pan-antiviral and anti-CS profiles of all the designed molecules. In any case, notice that DP-006, DP-007, and DP-008 are the most desirable molecules since they were predicted as active in a larger number of experimental conditions cj and with the highest probabilities in most of these conditions. This indicates that the nitrogen atom from a secondary amine is necessary, as is the seemingly more appropriate placement of the amide and sulfoxide groups. Also, the carbamate group is slightly preferred over the urea moiety. We would like to emphasize that because the eight designed molecules were predicted by the mtc-QSAR-ANN model against the 23 experimental conditions cj, 184 predictions in total were performed; in all the predictions, the designed molecules fell within the applicability domain of the mtc-QSAR-ANN model. We also examined the druglikeness of the designed molecules by calculating a series of physicochemical properties[67] that are shown in Table 6. 
The purpose here was to determine whether these molecules simultaneously complied with Lipinski’s rule of five,[68] Ghose’s filter,[69] and Veber’s recommendations.[70] The values of the physicochemical properties of the designed molecules fell within the cutoff values established by these three approaches, confirming their adequate druglikeness.
Table 6

Physicochemical Properties Suggesting the Druglikeness of the Designed Molecules

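| ID (a) | HD | HA | MW | M log P | A log P | AMR | NAT | NRB | TPSA |
|---|---|---|---|---|---|---|---|---|---|
| DP-001 | 3 | 9 | 448.51 | 1.247 | 1.614 | 112.72 | 52 | 6 | 133.17 |
| DP-002 | 3 | 9 | 448.51 | 1.247 | 1.614 | 112.72 | 52 | 6 | 133.17 |
| DP-003 | 4 | 10 | 429.44 | 0.914 | 1.548 | 106.65 | 51 | 6 | 125.99 |
| DP-004 | 4 | 10 | 429.44 | 0.914 | 1.548 | 106.65 | 51 | 6 | 125.99 |
| DP-005 | 4 | 10 | 450.51 | 1.137 | 1.330 | 112.99 | 51 | 5 | 127.77 |
| DP-006 | 3 | 10 | 451.49 | 1.137 | 1.977 | 111.04 | 50 | 6 | 124.97 |
| DP-007 | 4 | 10 | 430.45 | 1.762 | 1.904 | 108.41 | 51 | 6 | 108.56 |
| DP-008 | 4 | 10 | 430.45 | 1.762 | 1.904 | 108.41 | 51 | 6 | 108.56 |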

(a) In the table, the abbreviations have the following meanings: the number of atoms behaving as hydrogen bond donors (HD), the number of atoms acting as hydrogen bond acceptors (HA), the molecular weight (MW), the logarithm of Moriguchi’s octanol/water partition coefficient (M log P), the logarithm of Ghose–Crippen’s octanol/water partition coefficient (A log P), Ghose–Crippen’s molar refractivity (AMR), the number of atoms (NAT), the number of rotatable bonds (NRB), and the topological polar surface area (TPSA).
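The three druglikeness filters can be checked mechanically against the values in the table. The following is a minimal sketch, not the authors' procedure: it assumes the commonly cited cutoffs (Lipinski: HD ≤ 5, HA ≤ 10, MW ≤ 500, M log P ≤ 4.15; Ghose: 160 ≤ MW ≤ 480, −0.4 ≤ A log P ≤ 5.6, 40 ≤ AMR ≤ 130, 20 ≤ NAT ≤ 70; Veber: NRB ≤ 10, polar surface area ≤ 140 Å²). The pairing of M log P with Lipinski's rule and A log P with Ghose's filter is also an assumption, and all function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Props:
    hd: int       # hydrogen bond donors
    ha: int       # hydrogen bond acceptors
    mw: float     # molecular weight
    mlogp: float  # Moriguchi log P
    alogp: float  # Ghose-Crippen log P
    amr: float    # Ghose-Crippen molar refractivity
    nat: int      # number of atoms
    nrb: int      # rotatable bonds
    tpsa: float   # topological polar surface area


# Values transcribed from Table 6.
MOLECULE_PROPS = {
    "DP-001": Props(3, 9, 448.51, 1.247, 1.614, 112.72, 52, 6, 133.17),
    "DP-002": Props(3, 9, 448.51, 1.247, 1.614, 112.72, 52, 6, 133.17),
    "DP-003": Props(4, 10, 429.44, 0.914, 1.548, 106.65, 51, 6, 125.99),
    "DP-004": Props(4, 10, 429.44, 0.914, 1.548, 106.65, 51, 6, 125.99),
    "DP-005": Props(4, 10, 450.51, 1.137, 1.330, 112.99, 51, 5, 127.77),
    "DP-006": Props(3, 10, 451.49, 1.137, 1.977, 111.04, 50, 6, 124.97),
    "DP-007": Props(4, 10, 430.45, 1.762, 1.904, 108.41, 51, 6, 108.56),
    "DP-008": Props(4, 10, 430.45, 1.762, 1.904, 108.41, 51, 6, 108.56),
}


def passes_lipinski(p: Props) -> bool:
    # Assumed cutoffs; 4.15 is the Moriguchi log P variant of the log P limit.
    return p.hd <= 5 and p.ha <= 10 and p.mw <= 500 and p.mlogp <= 4.15


def passes_ghose(p: Props) -> bool:
    # Assumed cutoff ranges for Ghose's filter.
    return (160 <= p.mw <= 480 and -0.4 <= p.alogp <= 5.6
            and 40 <= p.amr <= 130 and 20 <= p.nat <= 70)


def passes_veber(p: Props) -> bool:
    # Assumed cutoffs for Veber's recommendations.
    return p.nrb <= 10 and p.tpsa <= 140


druglike = {
    name: passes_lipinski(p) and passes_ghose(p) and passes_veber(p)
    for name, p in MOLECULE_PROPS.items()
}
for name, ok in druglike.items():
    print(f"{name}: {'passes' if ok else 'fails'} all three filters")
```

Keeping each filter as a separate predicate makes it easy to swap in different cutoffs if another variant of a rule is preferred.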

Last, to assess the novelty of the designed molecules, we performed a search in prestigious external databases such as ChEMBL[71] and ZINC.[72] The idea was to investigate whether these databases contain chemicals that structurally resemble our designed molecules. By applying a similarity cutoff value of 85%, we did not find any molecule similar to ours.
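A similarity search with a cutoff can be sketched as follows. The text does not state which fingerprint or similarity metric was used, so this example assumes the Tanimoto coefficient over sets of "on" fingerprint bits; the fingerprints and library entries below are toy, hypothetical data and do not correspond to any real ChEMBL or ZINC record.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar(query: set, library: dict, cutoff: float = 0.85) -> list:
    """Return names of library entries whose similarity to the query >= cutoff."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= cutoff]


# Toy, hypothetical fingerprints (sets of bit positions).
query_fp = {1, 2, 3, 4, 5, 6, 7, 8}
library = {
    "hypothetical_A": {1, 2, 3, 4, 5, 6, 7, 9},  # 7 shared bits out of 9 in the union
    "hypothetical_B": {1, 2, 3},                 # 3 shared bits out of 8 in the union
}

hits = find_similar(query_fp, library, cutoff=0.85)
print(hits if hits else "no library entry reaches the 85% cutoff")
```

An empty result list corresponds to the outcome reported in the text: no database entry reaches the chosen similarity cutoff.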

Conclusion

Computational methods have the potential to accelerate the discovery of therapeutic chemicals in the context of antiviral research. However, given the complexity of the infections caused by respiratory viruses, such methods should go beyond the single task of performing virtual screening by considering only one viral target. The mtc-QSAR-ANN model developed by us is a confirmation that chemical and biological data can be successfully integrated, allowing the simultaneous prediction of pan-antiviral activity against different respiratory viruses and inhibitory potency against CS-related proteins. We have also demonstrated the importance of the physicochemical and structural interpretations of the molecular descriptors, which enabled the use of the mtc-QSAR-ANN model as a tool to design novel molecules with virtually dual pan-antiviral and anti-CS profiles. The methodology underpinning the creation and application of our mtc-QSAR-ANN model opens encouraging opportunities for the in silico design of chemicals with the desired properties.

Materials and Methods

The development and application of our mtc-QSAR-ANN model are illustrated in Figure 4. All the chemical and biological data used in this work were retrieved from version 29 of the ChEMBL database.[71,73] The curation of our data set was carried out according to certain previously reported guidelines.[35,40,66,74]
Figure 4

Development and use of an mtc-QSAR-ANN model. The D[GTI]cj descriptors were calculated by applying the Box-Jenkins approach; such calculations were carried out in Microsoft Excel. Before finding the mtc-QSAR-ANN model, the data set was randomly split into training and test sets, which accounted for 75% and 25% of the data set, respectively. The abbreviation “INTP” signifies the physicochemical and structural interpretations of the D[GTI]cj descriptors.

By using the software named MODESLAB v1.5,[75] we calculated (using the SMILES codes stored in a txt file) the following topology-based molecular descriptors (TI): atom connectivity indices, bond-based connectivity indices, spectral moments of the bond adjacency matrix, Kier’s shape descriptors, Kier’s flexibility index, and other classical topological indices. Other steps such as the calculation of a set of size-independent descriptors (NTI), the application of the Box–Jenkins approach to obtain the multicondition descriptors D[GTI]cj, the split of the data set into training and test sets, the selection of the most adequate D[GTI]cj descriptors, and the creation of the mtc-QSAR-ANN model were performed by strictly following a recent work.[76] Therefore, we will mention here only specific aspects. When calculating the D[GTI]cj descriptors,[76] the a priori probabilities ps(cj) were defined through an equation involving n(cj) and N(cj), which are the number of chemicals/cases annotated as active and the total number of cases, respectively. Both numbers (calculated by considering only chemicals/cases in the training set) depend on a specific element of the experimental condition cj. For instance, if cj = ma, then ps(ma), n(ma), and N(ma) correspond to the a priori probability, the number of active chemicals/cases, and the total number of chemicals/cases, all of them by considering only the element named measure of activity (ma). 
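To make the roles of n(cj) and N(cj) concrete, the sketch below computes an a priori probability for each value of a single element of cj. One reading consistent with the definitions is the ratio n(cj)/N(cj); that ratio is an assumption here, and the authors' exact formalism may differ. The function name and the training cases are hypothetical.

```python
from collections import Counter


def a_priori_probabilities(cases):
    """Estimate an a priori probability per element value as n/N (assumed form).

    cases: iterable of (element_value, is_active) pairs taken from the
    training set only, e.g. the measure of activity (ma) of each case.
    """
    total = Counter()   # N(cj): total number of cases per element value
    active = Counter()  # n(cj): number of active cases per element value
    for value, is_active in cases:
        total[value] += 1
        if is_active:
            active[value] += 1
    return {value: active[value] / total[value] for value in total}


# Hypothetical training cases for the element "measure of activity".
training_cases = [
    ("IC50", True), ("IC50", False), ("IC50", True),
    ("EC50", True), ("EC50", False),
]

p_ma = a_priori_probabilities(training_cases)
for value, p in p_ma.items():
    print(f"{value}: {p:.3f}")
```

Calling the same function on cases keyed by another element of cj gives the corresponding probabilities for that element.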
The same equation was applied separately to the elements named biological target (bt) and assay information (ai). When selecting the most appropriate D[GTI]cj descriptors using the software IMMAN v1.0,[77] we computed Jeffreys’s information index (100 bins) to rank them according to their potential discriminatory power. Then, we performed a correlation analysis, keeping only those D[GTI]cj descriptors with pairwise correlation values in the interval −0.7 < PCC < +0.7 (PCC stands for Pearson’s correlation coefficient). To find the best mtc-QSAR-ANN model, we employed the ANN package of the software STATISTICA v13.5.0.17.[78] In doing so, we utilized a customized configuration [strategy for creating predictive models: automated network search; network types, MLP; minimum hidden units (neurons), 15; maximum hidden units (neurons), 70; networks to train, 3000; networks to retain, 50; activation functions for the hidden layer, logistic and hyperbolic tangent; activation functions for the output layer, logistic, hyperbolic tangent, and softmax (the latter was implicitly used by default)]. The options regarding weight decay and initialization were left deactivated. We considered only those MLP networks with a number of epochs equal to or less than 400. In the end, the best MLP network (most suitable mtc-QSAR-ANN model) was the one exhibiting the highest values of the local measures [Sn (%)]ma, [Sp (%)]ma, [Sn (%)]bt, [Sp (%)]bt, [Sn (%)]ai, and [Sp (%)]ai. We would like to emphasize that our choice of ANN via MLP as the machine learning algorithm with the aforementioned configuration is based on our previous experience in the creation and application of PTML models with good statistical quality and predictive power.[34,66,76,79]
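The correlation-based pruning step can be illustrated with a small sketch. It assumes a greedy procedure: descriptors are visited in ranked order (best first) and a descriptor is kept only if its Pearson correlation with every descriptor already kept lies strictly between −0.7 and +0.7. The greedy ordering, the function names, and the toy descriptor values are assumptions for illustration; the text does not specify how IMMAN resolves correlated pairs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def filter_correlated(ranked, limit=0.7):
    """Greedily keep descriptors whose |PCC| with all kept ones is below limit.

    ranked: list of (name, values) pairs, most discriminative first.
    """
    kept = []
    for name, values in ranked:
        if all(abs(pearson(values, kept_values)) < limit for _, kept_values in kept):
            kept.append((name, values))
    return [name for name, _ in kept]


# Toy, hypothetical descriptor columns, already ranked best-first.
ranked_descriptors = [
    ("D_a", [1.0, 2.0, 3.0, 4.0, 5.0]),
    ("D_b", [2.1, 3.9, 6.2, 8.0, 9.9]),    # nearly collinear with D_a
    ("D_c", [1.0, -1.0, 0.5, -0.5, 0.0]),  # weakly correlated with D_a
]

selected = filter_correlated(ranked_descriptors)
print(selected)
```

Visiting descriptors in ranked order means that, within a correlated pair, the one with the higher discriminatory rank is the one retained.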
  67 in total

Review 1.  General theory for multiple input-output perturbations in complex molecular systems. 1. Linear QSPR electronegativity models in physical, organic, and medicinal chemistry.

Authors:  Humberto González-Díaz; Sonia Arrasate; Asier Gómez-SanJuan; Nuria Sotomayor; Esther Lete; Lina Besada-Porto; Juan M Ruso
Journal:  Curr Top Med Chem       Date:  2013       Impact factor: 3.295

2.  Multi-target Drug Discovery via PTML Modeling: Applications to the Design of Virtual Dual Inhibitors of CDK4 and HER2.

Authors:  Valeria V Kleandrova; Marcus T Scotti; Luciana Scotti; Alejandro Speck-Planche
Journal:  Curr Top Med Chem       Date:  2021       Impact factor: 3.295

3.  Computational insights into the inhibition of influenza viruses by rupestonic acid derivatives: pharmacophore modeling, 3D-QSAR, CoMFA and COMSIA studies.

Authors:  Karthikeyan Muthusamy; Palani Kirubakaran; Gopinath Krishnasamy; Raja Rajeshwari Thanashankar
Journal:  Comb Chem High Throughput Screen       Date:  2015       Impact factor: 1.339

4.  From knowledge generation to knowledge archive. A general strategy using TOPS-MODE with DEREK to formulate new alerts for skin sensitization.

Authors:  Ernesto Estrada; Grace Patlewicz; Yaquelin Gutierrez
Journal:  J Chem Inf Comput Sci       Date:  2004 Mar-Apr

5.  ChEMBL: a large-scale bioactivity database for drug discovery.

Authors:  Anna Gaulton; Louisa J Bellis; A Patricia Bento; Jon Chambers; Mark Davies; Anne Hersey; Yvonne Light; Shaun McGlinchey; David Michalovich; Bissan Al-Lazikani; John P Overington
Journal:  Nucleic Acids Res       Date:  2011-09-23       Impact factor: 16.971

6.  ChEMBL: towards direct deposition of bioassay data.

Authors:  David Mendez; Anna Gaulton; A Patrícia Bento; Jon Chambers; Marleen De Veij; Eloy Félix; María Paula Magariños; Juan F Mosquera; Prudence Mutowo; Michal Nowotka; María Gordillo-Marañón; Fiona Hunter; Laura Junco; Grace Mugumbate; Milagros Rodriguez-Lopez; Francis Atkinson; Nicolas Bosc; Chris J Radoux; Aldo Segura-Cabrera; Anne Hersey; Andrew R Leach
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

7.  Estimation of the Timing and Intensity of Reemergence of Respiratory Syncytial Virus Following the COVID-19 Pandemic in the US.

Authors:  Zhe Zheng; Virginia E Pitzer; Eugene D Shapiro; Louis J Bont; Daniel M Weinberger
Journal:  JAMA Netw Open       Date:  2021-12-01

Review 8.  Antiviral combinations for severe influenza.

Authors:  Jake Dunning; J Kenneth Baillie; Bin Cao; Frederick G Hayden
Journal:  Lancet Infect Dis       Date:  2014-09-08       Impact factor: 25.071

Review 9.  SARS: epidemiology.

Authors:  Moira Chan-Yeung; Rui-Heng Xu
Journal:  Respirology       Date:  2003-11       Impact factor: 6.424
