Anti-flavi: A Web Platform to Predict Inhibitors of Flaviviruses Using QSAR and Peptidomimetic Approaches.

Abstract

Flaviviruses are arboviruses comprising more than 70 viruses that cover broad geographic ranges and are responsible for significant mortality and morbidity globally. Owing to the lack of efficient inhibitors targeting flaviviruses, the design of novel and efficient anti-flavi agents is an important problem. Therefore, in the current study, we have developed a dedicated prediction algorithm, anti-flavi, to identify the inhibition ability of chemicals and peptides against flaviviruses through a quantitative structure-activity relationship (QSAR) based method. We extracted 2168 non-redundant chemicals and 117 peptides with reported IC50 values from the ChEMBL and AVPdb databases, respectively. The regression-based models developed on training/testing datasets of 1952 chemicals and 105 peptides displayed Pearson's correlation coefficients (PCC) of 0.87 and 0.84 using support vector machine (SVM), and 0.87 and 0.83 using random forest (RF), for chemicals and peptides, respectively. We also explored a peptidomimetics approach, in which the most contributing descriptors of peptides were used to identify chemicals having anti-flavi potential. Conversely, the selected descriptors of chemicals performed well in predicting anti-flavi peptides. Moreover, the developed models proved to be highly robust when checked through approaches such as independent validation and decoy datasets. We hope that our web server will prove a useful tool to predict and design efficient anti-flavi agents. The anti-flavi web server is freely available at URL http://bioinfo.imtech.res.in/manojk/antiflavi.

Keywords: QSAR; flaviviruses; inhibitor; machine learning techniques; peptidomimetics; prediction algorithm; random forest; support vector machine

Year: 2018 PMID: 30619195 PMCID: PMC6305493 DOI: 10.3389/fmicb.2018.03121

Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640

Introduction

According to the World Health Organization, flaviviruses are responsible for serious outbreaks worldwide and are hence considered a global health burden[1] (Liang et al., 2015; Wilder-Smith and Byass, 2016). For example, epidemics of dengue virus (DENV; in 100 countries in Africa, the Eastern Mediterranean, the Americas, the Western Pacific, and South-East Asia), Zika virus (ZIKV; in 42 countries), yellow fever virus (YFV; in the Angolan capital city and in China), and many others have been reported recently. Flaviviruses are arboviruses, which are known for their shifting epidemiology in response to changing societal factors, e.g., population growth and urbanization (Petersen and Marfin, 2005). Among mosquito species, Aedes mosquitoes play a prominent role in flavivirus transmission because of their ability to thrive in diverse ecological niches beyond their native tropical forest niche. The flavivirus genome is a positive-sense, non-segmented, single-stranded RNA of 9.0 to 13 kb (Simmonds et al., 2017). It encodes a single long open reading frame (ORF) flanked by a 5′ end carrying a methylated nucleotide cap and a non-polyadenylated 3′ end, which form secondary structures required for genome replication (Bollati et al., 2010). The ORF is translated into a single large polyprotein, which is processed by host and viral proteases into 10 proteins: 3 structural and 7 non-structural. The three structural proteins are capsid (C), premembrane/membrane (prM/M), and envelope (E), whereas the seven non-structural proteins are NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5 (Blitvich and Firth, 2017). Various inhibitors, including chemicals, peptides, and peptidomimetics, have been designed against flaviviruses to target different stages of their life cycle and different viral proteins. For example, in DENV, NITD-448 inhibits E protein-mediated membrane fusion (Lim et al., 2013) and P02 hampers viral replication (Zhou et al., 2008).
DN59 inhibits flaviviral infection by interacting with viral particles (Lok et al., 2012); ST-148 is active against all four DENV serotypes (Byrd et al., 2013); BP13944 (Yang et al., 2014) and keto amides (Steuer et al., 2011) are known dengue protease inhibitors; and ivermectin targets the helicase activity of DENV, YFV, and JEV (Mastrangelo et al., 2012; Lai et al., 2017). Further, NITD-618 is an effective NS4B inhibitor against all DENV serotypes (Lim et al., 2013), and ribavirin impedes DENV methyltransferase activity and HCV replication (Chang et al., 2011; Tomlinson and Watowich, 2011). Moreover, NITD 008 and NITD 203 are RNA-dependent RNA polymerase inhibitors that target all four DENV serotypes, WNV, and YFV (Caillet-Saguy et al., 2014). Lycorine displays antiviral activity against many flaviviruses, such as YFV, WNV, and DENV-1 (Harms et al., 1991; Caillet-Saguy et al., 2014). Although several inhibitors have been tested, only a few have proved effective against the circulating mutant strains of these viruses. Few computational resources are available in the literature for predicting the antiviral potential of a compound. Our group has developed several web servers, namely AVPpred for predicting effective antiviral peptides (Thakur et al., 2012), AVP-IC50Pred for predicting the antiviral activity of a peptide in terms of its half maximal inhibitory concentration (Qureshi et al., 2015), AVCpred for predicting general antiviral compounds (Qureshi et al., 2017), and HIVProtI for predicting and designing inhibitors specifically against Human Immunodeficiency Virus proteins (Qureshi et al., 2018). Flaviviruses have emerged as a worldwide threat, affecting more than 50% of the global population (∼40% by DENV alone) (Holbrook, 2017); there is therefore a need to accelerate the development of efficient therapeutics.
Hence, in the current study we present anti-flavi, a web platform for predicting and designing novel antiviral compounds specifically against flaviviruses.

Materials and Methods

Data Collection

For the development of predictive models, flaviviral inhibitors were extracted from the ChEMBL database (Gaulton et al., 2017), whereas anti-flaviviral peptides were retrieved from the AVPdb database (Qureshi et al., 2014). ChEMBL is a comprehensive repository of manually curated bioactive molecules with drug-like properties. It has previously been used to develop various algorithms, e.g., AVCpred (Qureshi et al., 2017), HIVProtI (Qureshi et al., 2018), CLC-Pred (Lagunin et al., 2018), and Pred-hERG (Braga et al., 2015). AVPdb is a comprehensive database of experimentally verified antiviral peptides (AVPs) and was previously used to develop algorithms such as AVP-IC50Pred (Qureshi et al., 2015). In the current study, we fetched data for inhibitors (chemicals and peptides) whose “target” was the whole “organism.” Chemicals active against the whole organism were extracted using specific keywords: “Dengue virus,” “Hepatitis C virus,” “West Nile Virus,” “Yellow Fever virus,” and “Japanese encephalitis virus.” Most inhibition profiles were reported as the half maximal inhibitory concentration (IC50), so we proceeded with this measure. Likewise, anti-flaviviral peptides with inhibition profiles reported as IC50 were extracted from AVPdb. Initially, we obtained 65, 2038, 33, 22, and 10 chemical inhibitors against DENV (serotypes 1–4), HCV, WNV, YFV, and JEV, respectively, along with 117 unique anti-flaviviral peptides. After filtering the inhibitors for relevant information and removing redundant entries, we obtained 2168 chemicals and 117 peptides (7 to 25 amino acids long). The regression-based models were constructed on the negative logarithm of the half maximal inhibitory concentration, pIC50 = −log10(IC50 [M]) (Kalliokoski et al., 2013; Zhou et al., 2015; Bag and Ghorai, 2016).
For algorithm development, the complete dataset was randomly divided three times (in triplicate) into training/testing (90%) and independent validation (10%) sets, and the best-performing of the three splits was used for algorithm development.
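The two data-preparation steps above (the pIC50 transformation and the triplicate 90/10 split) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: we assume IC50 values arrive in nanomolar (the standard unit of many ChEMBL activity records) and are converted to molar before taking −log10, and the helper names `pic50_from_nm` and `split_once` are hypothetical. The rounding rule for the 10% validation set is also our assumption; the text does not state it.

```python
import math
import random

def pic50_from_nm(ic50_nm):
    """pIC50 = -log10(IC50 in mol/L), for an IC50 given in nanomolar (assumed unit)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)  # 1 nM = 1e-9 M

def split_once(n_items, validation_fraction=0.1, seed=0):
    """One random split of item indices into training/testing and independent validation sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_val = int(n_items * validation_fraction)  # rounding down is an assumption
    return sorted(indices[n_val:]), sorted(indices[:n_val])

# An IC50 of 1 uM (1000 nM) corresponds to pIC50 6; 10 nM corresponds to pIC50 8.
print(round(pic50_from_nm(1000.0), 6), round(pic50_from_nm(10.0), 6))

# Three independent 90/10 splits of 2168 chemicals ("triplicate"); in the workflow,
# a model would be built on each split and the best-performing one retained.
for seed in (1, 2, 3):
    train_idx, val_idx = split_once(2168, 0.1, seed)
    print(seed, len(train_idx), len(val_idx))
```

Fixing the seed per split keeps each of the three partitions reproducible, which matters when the best split is later reported.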

Quantitative Structure Activity Relationship Based Model Development

The quantitative structure–activity relationship (QSAR) is a mathematical relationship between the biological activity and the physicochemical properties of a compound (Cherkasov et al., 2014). It uses descriptors that represent the chemical characteristics of a molecule in numerical form, e.g., 1D, 2D, and 3D descriptors. We used the PaDEL software to calculate molecular descriptors and fingerprints (Yap, 2011), which were then used to develop models of anti-flaviviral compounds. PaDEL initially yielded 16384 descriptors in the 2D, 3D, and fingerprint categories. This strategy has also been employed for algorithm development in several previous studies (Qureshi et al., 2017, 2018; Rajput et al., 2018).
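To make the idea of a QSAR relationship concrete, the sketch below fits the simplest possible QSAR: pIC50 as a linear function of a single descriptor, estimated by ordinary least squares. The descriptor values and activities are invented toy numbers, and the function name `fit_linear_qsar` is hypothetical; the models in this study use SVM and RF on many PaDEL descriptors, not a one-descriptor linear fit.

```python
from statistics import mean

def fit_linear_qsar(descriptor, activity):
    """Ordinary least-squares fit of activity = intercept + slope * descriptor."""
    x_bar, y_bar = mean(descriptor), mean(activity)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data: one descriptor value per compound and its pIC50.
descriptor_values = [1.0, 2.0, 3.0, 4.0]
pic50_values = [5.1, 5.9, 7.1, 7.9]
b0, b1 = fit_linear_qsar(descriptor_values, pic50_values)
print(f"pIC50 = {b0:.2f} + {b1:.2f} * descriptor")
```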

Format Conversion

Before extracting PaDEL descriptors, we performed format conversion so that 3D descriptors could be obtained along with 2D descriptors and fingerprints. For chemicals, the SMILES retrieved from ChEMBL were converted to SDF format using the obabel software (O’Boyle et al., 2011). The anti-flavi peptides (7 to 25 amino acids long) were available as amino acid sequences, so they were first converted to PDB structures using the PEPstrMOD software (Singh et al., 2015), and the PDB files were then converted to SDF format using obabel. The PDB-to-SDF conversion was needed because PaDEL did not compute the complete set of descriptors from the peptide PDB files, as it did from SDF files.
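The conversion step can be scripted around the Open Babel command line. The sketch below only builds and (optionally) runs `obabel <input> -O <output>` commands; the file names are hypothetical, and adding `--gen3d` for SMILES input is our assumption (SMILES carry no coordinates, so 3D descriptors need generated geometry), since the exact options the authors used are not stated. PEPstrMOD output is already a 3D structure, so no coordinate generation is requested for the PDB case.

```python
import shutil
import subprocess

def obabel_command(input_path, output_path, gen3d=False):
    """Build an Open Babel command; obabel infers formats from the file extensions."""
    cmd = ["obabel", input_path, "-O", output_path]
    if gen3d:
        cmd.append("--gen3d")  # generate 3D coordinates (assumed choice for SMILES input)
    return cmd

def convert(input_path, output_path, gen3d=False):
    """Run the conversion if obabel is on PATH; return True on a zero exit status."""
    if shutil.which("obabel") is None:
        return False
    result = subprocess.run(obabel_command(input_path, output_path, gen3d), capture_output=True)
    return result.returncode == 0

# Chemicals: SMILES exported from ChEMBL -> SDF with 3D coordinates (hypothetical file names).
print(obabel_command("chembl_inhibitors.smi", "chembl_inhibitors.sdf", gen3d=True))
# Peptides: PEPstrMOD-predicted structure (already 3D) -> SDF.
print(obabel_command("peptide_001.pdb", "peptide_001.sdf"))
```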

Ten-Fold Cross Validation

The model was first developed on the training/testing set, which was divided into 10 approximately equal parts. In each iteration, one part was held out for testing while the remaining nine were used for training. This process was repeated 10 times so that every part served once as the test set. The performance over all 10 iterations was then averaged to assess the developed model (Rajput et al., 2015; Thakur et al., 2016). Finally, the model developed on the training/testing set was evaluated on the independent validation set.
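The 10-fold procedure above can be sketched as follows. This is an illustration only: the learner is a hypothetical one-descriptor least-squares line standing in for SVM or RF, the data are synthetic, and scoring each fold by PCC and then averaging is our reading of "averaged over the 10 iterations".

```python
import math
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def fit_line(xs, ys):
    """Hypothetical stand-in learner (1-D least squares) in place of SVM or RF."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def cross_validate(xs, ys, fit=fit_line, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k PCC values."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predicted = [model(xs[i]) for i in test_idx]
        actual = [ys[i] for i in test_idx]
        scores.append(pearson(predicted, actual))
    return sum(scores) / len(scores)

# Synthetic data: a noisy linear descriptor-activity relationship.
rng = random.Random(42)
xs = [rng.uniform(0, 5) for _ in range(100)]
ys = [4.0 + 0.8 * x + rng.gauss(0, 0.3) for x in xs]
print(round(cross_validate(xs, ys), 3))
```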

Support Vector Machine

For developing the regression-based predictive models, we used the support vector machine (SVM) learning algorithm (Hearst, 1998). In regression mode, SVM uses an ε-insensitive loss function, which ignores errors that lie within a specified distance ε of the actual value (Bouboulis et al., 2015). Support vector regression (SVR) can be linear or non-linear; non-linear SVR is more complex, as it uses a kernel function to map the data implicitly into a higher-dimensional feature space without computing that space explicitly. We employed the SVM module to develop all the SVM-based models.
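The ε-insensitive loss described above can be written in a few lines. This sketch shows only the loss term (the function names are hypothetical); it does not cover the margin term, the regularization constant, or the kernel used by an actual SVR solver.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon=0.1):
    """SVR loss: errors within +/- epsilon of the actual value cost nothing."""
    return max(0.0, abs(actual - predicted) - epsilon)

def mean_svr_loss(actuals, predictions, epsilon=0.1):
    """Average epsilon-insensitive loss over a data set."""
    losses = [epsilon_insensitive_loss(a, p, epsilon) for a, p in zip(actuals, predictions)]
    return sum(losses) / len(losses)

print(epsilon_insensitive_loss(6.0, 6.05, epsilon=0.1))  # 0.0: error lies inside the epsilon tube
print(epsilon_insensitive_loss(6.0, 6.5, epsilon=0.1))   # about 0.4: only the excess over epsilon counts
```

Unlike squared error, this loss grows linearly outside the tube and is exactly zero inside it, which is why only points on or outside the tube become support vectors.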

Random Forest

Random forest (RF) is an ensemble-learning method based on decision trees, each built from a bootstrap sample of the training data. The output for an unknown sample is the mode of the trees' predicted classes for classification, or the mean of their predictions for regression. We used the RF implementation in the Waikato Environment for Knowledge Analysis (WEKA) package for prediction model development (Frank et al., 2004).
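The two ingredients named above, bootstrap sampling and averaging tree outputs for regression, can be illustrated with a deliberately tiny forest. This is a sketch, not WEKA's algorithm: using depth-1 trees (stumps) and one randomly chosen feature per tree are simplifications of our own, whereas full random-forest implementations grow deeper trees and sample candidate features at each split. All names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature):
    """Depth-1 regression tree: one threshold on one feature, each leaf predicts its mean."""
    best = None
    values = sorted(set(row[feature] for row in rows))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # the feature is constant in this sample: predict the sample mean
        overall = mean(targets)
        return lambda row: overall
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean

def fit_forest(rows, targets, n_trees=25, seed=0):
    """Bagged ensemble of stumps; each tree sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feature = rng.randrange(n_features)
        trees.append(fit_stump([rows[i] for i in sample], [targets[i] for i in sample], feature))
    return lambda row: mean(tree(row) for tree in trees)  # regression: average the trees

# Toy data (hypothetical): activity jumps from 0 to 10 when the descriptor exceeds 0.5.
rows = [[i / 20] for i in range(20)]
targets = [10.0 if row[0] > 0.5 else 0.0 for row in rows]
forest = fit_forest(rows, targets)
print(forest([0.9]), forest([0.1]))
```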

Feature Selection

Feature selection is an important technique for identifying the most informative features among all available features. We used the WEKA package for feature selection. First, the RemoveUseless filter was applied for preprocessing. Then, attributes were selected using CfsSubsetEval (attribute evaluator) with BestFirst (search method) (Frank et al., 2004). This yielded the most relevant, representative features for each model.
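The idea behind correlation-based feature selection (CFS) is to favor subsets whose features correlate with the activity but not with each other, scored by the merit k·r̄cf / sqrt(k + k(k−1)·r̄ff), where r̄cf is the mean feature-activity correlation and r̄ff the mean feature-feature correlation of a k-feature subset. The sketch below is a simplified stand-in for the WEKA pipeline, not WEKA's code: dropping constant columns mimics one part of what RemoveUseless does, absolute Pearson correlation is our assumed association measure, and a plain greedy forward search replaces BestFirst (which additionally allows backtracking). Feature names and data are hypothetical.

```python
import math

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def remove_constant(columns):
    """Drop features that never vary (one kind of attribute RemoveUseless filters out)."""
    return {name: col for name, col in columns.items() if len(set(col)) > 1}

def cfs_merit(subset, columns, target):
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], target)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def greedy_forward_cfs(columns, target):
    """Repeatedly add the feature that most improves merit; stop when nothing improves it."""
    selected, best_merit = [], 0.0
    remaining = set(columns)
    while remaining:
        candidate = max(sorted(remaining), key=lambda f: cfs_merit(selected + [f], columns, target))
        merit = cfs_merit(selected + [candidate], columns, target)
        if merit <= best_merit:
            break
        selected.append(candidate)
        best_merit = merit
        remaining.remove(candidate)
    return selected, best_merit

columns = {
    "good": [1, 2, 3, 4, 5, 6],
    "copy": [2, 4, 6, 8, 10, 12],  # redundant with "good"
    "noise": [3, 1, 4, 1, 5, 9],
    "const": [7, 7, 7, 7, 7, 7],
}
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
usable = remove_constant(columns)
selected, merit = greedy_forward_cfs(usable, target)
print(selected, round(merit, 3))
```

Only one of the two perfectly redundant features survives, because adding the second raises r̄ff as much as r̄cf and so cannot increase the merit.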

Performance Measure

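The performance of the developed QSAR models was evaluated using Pearson’s correlation coefficient (R, PCC). Pearson’s correlation coefficient, or bivariate correlation, measures the association between two variables (here, actual and predicted values) and is calculated as:

$$R = \frac{n\sum_{i=1}^{n} \mathrm{Pred}_i\,\mathrm{Act}_i - \sum_{i=1}^{n} \mathrm{Pred}_i \sum_{i=1}^{n} \mathrm{Act}_i}{\sqrt{n\sum_{i=1}^{n} \mathrm{Pred}_i^{2} - \left(\sum_{i=1}^{n} \mathrm{Pred}_i\right)^{2}}\;\sqrt{n\sum_{i=1}^{n} \mathrm{Act}_i^{2} - \left(\sum_{i=1}^{n} \mathrm{Act}_i\right)^{2}}}$$

where $n$ is the size of the data set, and $\mathrm{Pred}_i$ and $\mathrm{Act}_i$ are the predicted and actual efficiencies of the $i$-th entry. R ranges from −1 to +1: +1 indicates a perfect positive correlation between the two variables, whereas −1 indicates a perfect negative correlation.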
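The Pearson correlation used to score the models can be computed directly from the sums in the standard computational formula for R over predicted and actual values. A minimal sketch (the function name `pcc` is hypothetical):

```python
import math

def pcc(predicted, actual):
    """Pearson's correlation coefficient via the sum-based computational formula."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    sum_p, sum_a = sum(predicted), sum(actual)
    sum_pa = sum(p * a for p, a in zip(predicted, actual))
    sum_p2 = sum(p * p for p in predicted)
    sum_a2 = sum(a * a for a in actual)
    numerator = n * sum_pa - sum_p * sum_a
    denominator = math.sqrt(n * sum_p2 - sum_p ** 2) * math.sqrt(n * sum_a2 - sum_a ** 2)
    return numerator / denominator

print(pcc([5.0, 6.0, 7.0], [5.2, 5.9, 7.1]))
```

The sum-based form mirrors the textbook formula but can lose precision on large or nearly constant data; on Python 3.10+, `statistics.correlation` offers a library alternative.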

Model Performance

We checked the appropriateness of the developed models by plotting actual vs. predicted inhibition (Qureshi et al., 2018) for the training/testing and independent validation data sets. The scatter plot depicts the relationship between the two sets of values; better predictive ability is indicated when the points lie on or close to the trend line.
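The numbers behind actual vs. predicted and residual plots can be summarized without plotting. A minimal sketch, assuming residuals are defined as actual minus predicted pIC50; the values and the helper names are hypothetical. A well-behaved model gives residuals scattered around zero with a small root-mean-square error.

```python
def residuals(actual, predicted):
    """Residual = actual - predicted; a good model scatters these around zero."""
    return [a - p for a, p in zip(actual, predicted)]

def residual_summary(actual, predicted):
    """Mean residual, root-mean-square error, and largest absolute residual."""
    res = residuals(actual, predicted)
    n = len(res)
    return {
        "mean_residual": sum(res) / n,
        "rmse": (sum(r * r for r in res) / n) ** 0.5,
        "max_abs_residual": max(abs(r) for r in res),
    }

actual = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
print(residual_summary(actual, predicted))  # mean 0.0, RMSE 0.5, max |residual| 0.5
```

A residual plot is then simply `predicted` on the x-axis against `residuals(actual, predicted)` on the y-axis, with a horizontal reference line at zero.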

Decoy Set

We used a decoy set to check the robustness of the developed models. Several tools are available for generating decoys of chemicals, e.g., DUD (Huang et al., 2006), DecoyFinder (Cereto-Massague et al., 2012), and RADER (Wang et al., 2017). In our study, decoys were generated with the most recent of these tools, the RApid DEcoy Retriever (RADER) software (Wang et al., 2017), for the 2168 anti-flavi chemicals; the decoys have 1D physicochemical properties similar to those of the actives but different 2D topology.
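The decoy criterion above (similar 1D properties, dissimilar 2D topology) can be illustrated with a toy filter. This is not RADER's algorithm: the property names (molecular weight, logP), the tolerances, the Tanimoto cutoff of 0.3, the fingerprints stored as sets of "on" bit positions, and the ranking by normalized property distance are all hypothetical choices for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0

def pick_decoys(active, candidates, tolerances, max_similarity=0.3, top_n=1):
    """Keep candidates whose 1D properties all lie within tolerance of the active's
    but whose 2D fingerprints are dissimilar; rank by normalized property distance."""
    scored = []
    for cand in candidates:
        diffs = {k: abs(cand["props"][k] - active["props"][k]) for k in tolerances}
        if any(diffs[k] > tol for k, tol in tolerances.items()):
            continue  # 1D properties not matched
        if tanimoto(active["fp"], cand["fp"]) >= max_similarity:
            continue  # too similar in 2D topology to serve as a decoy
        distance = sum(diffs[k] / tol for k, tol in tolerances.items())
        scored.append((distance, cand["id"]))
    scored.sort()
    return [cid for _, cid in scored[:top_n]]

active = {"id": "A1", "props": {"mw": 350.0, "logp": 2.5}, "fp": {1, 2, 3, 4, 5}}
candidates = [
    {"id": "D1", "props": {"mw": 355.0, "logp": 2.6}, "fp": {10, 11, 12}},
    {"id": "D2", "props": {"mw": 352.0, "logp": 2.4}, "fp": {1, 2, 3, 4, 6}},  # too similar
    {"id": "D3", "props": {"mw": 500.0, "logp": 2.5}, "fp": {20, 21}},         # MW mismatch
]
print(pick_decoys(active, candidates, {"mw": 25.0, "logp": 0.5}))
```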

Clustering

We clustered the chemicals using ChemMine Tools (Backman et al., 2011), applying the multidimensional scaling (MDS) clustering method in both 2D and 3D with a similarity cutoff of 0.4. The peptide sequences were clustered using the CLuster ANalysis of Sequences (CLANS) software (Frickey and Lupas, 2004), which is based on an all-against-all BLAST search.
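To show what a similarity cutoff such as 0.4 means for chemical clustering, the sketch below links any two molecules whose Tanimoto similarity is at least the cutoff and reports the connected groups (single-linkage threshold clustering, via union-find). It is a simplified stand-in, not ChemMine Tools' implementation, and the set-of-bits fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between fingerprints stored as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def cluster_by_similarity(fingerprints, cutoff=0.4):
    """Single-linkage clustering: molecules share a cluster if a chain of
    pairwise similarities >= cutoff connects them."""
    parent = list(range(len(fingerprints)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if tanimoto(fingerprints[i], fingerprints[j]) >= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(len(fingerprints)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {10, 11}]
print(cluster_by_similarity(fps))  # [[0, 1], [2], [3]]
```

Raising the cutoff produces more, smaller clusters and more singletons, which is the sense in which a diverse data set "remains unclustered" at a stringent threshold.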

Results

The 16,384 features of the anti-flavi chemicals and peptides were subjected to feature selection. Preprocessing with the RemoveUseless filter reduced them to 8700 and 3822 features for chemicals and peptides, respectively. These were further processed with the CfsSubsetEval and BestFirst attribute selectors and reduced to 124 and 19 features for chemicals and peptides, respectively. Detailed information on the selected descriptors of chemicals and peptides is provided in the Supplementary Tables. The models were developed using these reduced, relevant features.

Performance of QSAR-Based Models

The 2168 anti-flavi chemicals were randomly divided into training/testing and independent validation data sets of 1952 and 216 compounds, respectively. The best-performing model achieved a PCC of 0.87 with both SVM and RF during 10-fold cross-validation (Table 1). On the independent validation data set, this model achieved PCCs of 0.87 and 0.86 with SVM and RF, respectively (detailed in the Supplementary Material). The 117 anti-flavi peptides were divided into 105 training/testing and 12 independent validation sequences. Of the three randomized models, the best one achieved PCCs of 0.84 and 0.83 with SVM and RF, respectively, during 10-fold cross-validation on the training/testing data set (Table 1), while the independent validation data set gave PCCs of 0.84 and 0.86 with SVM and RF, respectively (detailed in the Supplementary Material). We checked the robustness of the models on the independent validation data sets of both chemicals and peptides using actual vs. predicted plots and residual plots (residuals vs. predicted values). The actual vs. predicted plots for the independent validation data sets using SVM indicate statistically significant agreement between the actual and predicted pIC50 values. Most points lie close to the trend line, which shows that the models developed on the training/testing data sets are robust. The corresponding scatter plots for RF are available in the Supplementary Material.

Similarly, the residual plots support the robustness of the developed models, as most residuals lie close to the zero line for both the SVM and RF (Supplementary Material) models.

Figure: Scatter plot of actual vs. predicted inhibition for the independent validation data set using the support vector machine models for (A) anti-flavi chemicals and (B) anti-flavi peptides.

Figure: Residual plot of residuals vs. predicted inhibition for the independent validation data set using the support vector machine models for (A) anti-flavi chemicals and (B) anti-flavi peptides.

The robustness of the models was further checked using the decoy data set. We selected the top-ranked hit for each of the 2168 chemicals, which resulted in 1417 decoys. The predicted pIC50 values of the decoys range from 3.03 to 7.83 (Supplementary Material).

Peptidomimetics Approach

We evaluated the peptidomimetics approach for anti-flaviviral inhibitors by swapping the most contributing features of peptides (19) and chemicals (124) between the two data sets, and by using the combined (hybrid) feature set (143), with 10-fold cross-validation and SVM. Applying the 124 chemical features to the 117 anti-flavi peptides and the 19 peptide features to the 2168 chemicals gave PCCs of 0.53 and 0.74, respectively. Interestingly, combining the top contributing features of anti-flavi chemicals and peptides (143 features) gave PCCs of 0.87 on chemicals and 0.83 on peptides. Detailed results are given in Table 2.

We also clustered the anti-flavi chemicals and peptides. Clustering showed that the anti-flavi chemicals are highly diverse, falling into 58 different clusters at a similarity cutoff of 0.4. A 2D plot of the clusters is provided in the Supplementary Material.

Figure: 3D plot showing the chemical space of the 2168 anti-flavi chemicals, embedded in 3D space with 58 different clusters.

We also clustered the peptide sequences to check the diversity of our anti-flavi data set. The p-value range for clustering was set between 1e-90 and 0.1. At this stringent p-value, only 10 clusters were formed; most peptides were singletons, and the remaining sequences were unclustered.

Figure: Clustering plot of the 117 anti-flavi peptides showing the clustering pattern at p-values between 1e-90 and 0.1.

Webserver

Anti-flavi integrates SVM and RF predictive models to predict the inhibition efficiency of chemicals and peptides using QSAR-based approaches. To predict anti-flavi chemicals, the user can submit multiple structures in SDF format; the output is tabulated with the SMILES, 2D structure, important chemical descriptors, and inhibition efficiency. To predict the flaviviral inhibition potential of peptides, the input is provided in PDB format, and the output reports the percentage inhibition of the peptide along with other specifications such as SMILES, a 2D image, and descriptors. Because predictions for unknown chemicals and peptides usually take 2–5 min, the user can note the job ID and retrieve the results at any time from the “check job status” page. The anti-flavi web server also displays the clustering analyses of both chemicals and peptides in the “analysis” section. We also provide a format conversion facility, where the user can draw or paste a structure and obtain it in SMILES, SDF, or MOL format. The overall architecture of anti-flavi is summarized in the following figure.

Figure: Architecture of the anti-flavi web server.

Discussion

Flaviviruses have emerged as an expanding threat to human health globally (Daep et al., 2014). Various efforts have been made to develop effective anti-flavi drugs aimed at specific replication, structural, non-structural, and host proteins, as well as non-specific targets (Sampath and Padmanabhan, 2009; Bollati et al., 2010; García et al., 2017). To tackle the severity of RNA viruses, the European Union launched the VIZIER (Coutard and Canard, 2010) and SILVER[2] projects for drug discovery against viruses. Computational efforts can complement experimental ones to speed up the antiviral drug discovery process. In this regard, the present study focused on developing the first dedicated computational platform against flaviviruses. We used anti-flavi chemicals and peptides to develop the predictive models, and also explored the peptidomimetics approach in addition to the individual chemical and peptide models. Interestingly, the models developed on chemicals performed better than those developed on peptides or with swapped (peptidomimetic) features. Additionally, the robustness of the developed models was cross-checked by plotting actual vs. predicted inhibition values for the training/testing and independent validation data sets. Finally, the predictive models proved to be statistically significant, which indicates their ability to predict the anti-flaviviral activity of unknown agents (Fatemi et al., 2015). Peptidomimetics have proved successful as drug inhibitors; examples have been reported against WNV (Lim et al., 2011; Hammamy et al., 2013) and HIV (Kazmierski et al., 2006), among others. Our study also supports this: applying the most contributing features of chemicals to peptides, and vice versa, gave PCCs of 0.53 and 0.74, respectively. Notably, performance increased (PCCs of 0.83 for peptides and 0.87 for chemicals) when the most contributing features of chemicals and peptides were used together.
Therefore, our study suggests that the concept of peptidomimetics can also be applied to anti-flaviviral agents. We evaluated the performance of the models through independent validation and decoy data sets; the good performance of the models on both further supports their robustness. We attempted to compare our algorithm with existing ones, but a direct comparison was not possible because no method for predicting anti-flaviviral agents was available. The diversity of the chemicals and peptides was also explored using different clustering methods for each type of agent. The clustering analyses revealed a high level of diversity among the anti-flavi agents at stringent thresholds, with most chemicals and peptides tending to remain unclustered rather than forming clusters of similar molecules. Effective inhibitors against flaviviruses are urgently needed, and integrating computational approaches with experimental ones should speed up the discovery of anti-flavi agents. We used 10-fold cross-validation to develop a robust prediction algorithm, which was further validated with independent validation and decoy data sets. To our knowledge, this is the first time a peptidomimetics approach has been incorporated into a prediction algorithm against flaviviruses. This computational method should therefore be useful to microbiologists and virologists working to develop novel and effective antiviral agents. The algorithm can be used to shortlist potentially effective anti-flavi agents for direct experimental testing, rather than starting with high-throughput screening. A limitation of our study is that the predictive models were developed on the major flaviviral species (DENV, HCV, WNV, YFV, and JEV) rather than on all species.

Author Contributions

MK conceived the idea and helped in overall supervision. AR and MK performed the data collection, model development, and analyses, and wrote the manuscript. AR developed the web server.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Table 1

Performance of training/testing and independent validation data sets of anti-flavi chemicals and peptides on 10-fold cross validation using Support Vector Machine and Random Forest techniques.

Data	Descriptors	Selected features	MLT	PCC (training/testing)	n (training/testing)	PCC (independent validation)	n (independent validation)
Chemicals	16383	124	SVM	0.87	1952	0.87	216
Peptides	16383	19	SVM	0.84	105	0.84	12
Chemicals	16383	124	RF	0.87	1952	0.86	216
Peptides	16383	19	RF	0.83	105	0.86	12
MLT, machine learning technique; n, number of chemicals or peptides in the data set.

Table 2

Table depicting the performance of swapped most-contributing features of chemicals and peptides over each other during 10-fold cross validation employing support vector machine.

Data	Descriptors	Features used	PCC	n
Chemicals	16383	19 (peptide-derived)	0.74	2168
Peptides	16383	124 (chemical-derived)	0.53	117
Chemicals	16383	143 (hybrid)	0.87	2168
Peptides	16383	143 (hybrid)	0.83	117
