
AlzhCPI: A knowledge base for predicting chemical-protein interactions towards Alzheimer's disease.

Jiansong Fang1,2, Ling Wang3, Yecheng Li3, Wenwen Lian4, Xiaocong Pang4, Hong Wang1, Dongsheng Yuan1, Qi Wang1,2, Ai-Lin Liu4, Guan-Hua Du4.   

Abstract

Alzheimer's disease (AD) is a complicated progressive neurodegenerative disorder. To confront AD, scientists are searching for multi-target-directed ligands (MTDLs) to delay disease progression. The in silico prediction of chemical-protein interactions (CPI) can accelerate target identification and drug discovery. Previously, we developed 100 binary classifiers to predict the CPI for 25 key targets against AD using the multi-target quantitative structure-activity relationship (mt-QSAR) method. In this investigation, we aimed to apply the mt-QSAR method to enlarge the model library to predict CPI towards AD. Another 104 binary classifiers were constructed to predict the CPI for 26 preclinical AD targets based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms. Internal 5-fold cross-validation and external test set validation were applied to evaluate the performance on the training sets and test sets, respectively. The area under the receiver operating characteristic (ROC) curve for the test sets ranged from 0.629 to 1.0, with an average of 0.903. In addition, we developed a web server named AlzhCPI to integrate the comprehensive information of the 204 binary classifiers, which has potential applications in network pharmacology and drug repositioning. AlzhCPI is available online at http://rcidm.org/AlzhCPI/index.html. To illustrate the applicability of AlzhCPI, the developed system was employed for the systems pharmacology-based investigation of shichangpu against AD to enhance the understanding of the mechanisms of action of shichangpu from a holistic perspective.

Year:  2017        PMID: 28542505      PMCID: PMC5460905          DOI: 10.1371/journal.pone.0178347

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Alzheimer’s disease (AD) is the most common neurodegenerative disease in elderly people and is accompanied by the progressive impairment of memory and cognitive function [1]. The pathological hallmarks of AD are mainly extracellular senile plaques (SPs) and intracellular neurofibrillary tangles (NFTs), as well as selective cholinergic neuronal loss [2]. Current drugs for AD treatment that target cholinergic and glutamatergic neurotransmission, such as donepezil and memantine, show limited benefits to most AD patients [3, 4]. Therefore, there is an urgent need to develop an effective treatment that could not only improve symptoms but also modify the disease process. The aetiology of AD is multifactorial. Considering the complexity of AD, the classic “one drug, one target” solution is not effective enough [5]. Indeed, many research projects in the field have focused on developing multi-target/multifunctional therapies to modify the disease process [6-9].

Experimental identification of hits that interact with multiple proteins is costly, time consuming, and labour intensive. In silico target prediction is a fast and cheap alternative to experimental target identification approaches, which could accelerate the discovery of “multi-target-directed ligands (MTDLs)” against AD. The central issue of target prediction is to identify the chemical-protein interactions (CPI) between chemicals and proteins. Two main computational methods are used to predict the CPI for a given ligand, as summarized in a recent review [10]: the ligand-based target prediction (LBTP) approach [11, 12] and the structure-based target prediction (SBTP) approach [13, 14]. As an LBTP approach, the multi-target quantitative structure-activity relationship (mt-QSAR) method is highly predictive and convenient and can simultaneously predict activities against different targets by using large and heterogeneous chemical datasets [15]. Cheng et al. built 200 mt-QSAR models for 100 GPCRs and 100 kinases using the support vector machine (SVM) algorithm and found that the models performed better than those built using the chemogenomic method [16].

Inspired by Cheng’s work [16], we built 100 binary classifiers to predict the chemical-protein interactions for 25 key targets against AD using the mt-QSAR method. The validated models were used to explore the polypharmacology against AD, and the prediction results were confirmed by the reported bioactivity data and our in vitro experimental validation, resulting in several highly potent MTDLs [17]. However, there are still some pitfalls and disadvantages that limit their application. First, the models only include drug candidate targets that entered into phase I clinical trials, excluding those in preclinical trials. Second, no criteria for target naming and classification were defined, which is inconvenient and unsystematic. Furthermore, no publicly available knowledge base has been developed to integrate the binary classifiers that we built. Thus, it is still necessary to improve and update this research to predict CPI towards AD.

The current work aims to apply the mt-QSAR method to enlarge the model system (AlzhCPI) to predict CPI towards AD. The schematic workflow of AlzhCPI is shown in Fig 1. Based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms, the updated system assembled 204 binary classifiers to integrate the chemical and pharmacological information derived from the BindingDB database. All developed classifiers were validated by 5-fold cross-validation and test set validation. To provide a free service for the scientific community, a web server named AlzhCPI was developed to integrate comprehensive information on the 204 binary classifiers into a web-based information system. To illustrate the use of AlzhCPI, the developed system was employed for a systems pharmacology-based investigation of shichangpu against AD, which aided in analysing the mechanisms of action of shichangpu.
Fig 1

The schematic workflow of AlzhCPI to predict chemical-protein interactions towards Alzheimer's disease based on the multi-target quantitative structure-activity relationship (mt-QSAR) method.

Materials and methods

Data set construction

Following a similar procedure to the previous study, the Thomson Reuters Integrity Database [18], the Therapeutic Target Database (TTD) [19], and text mining from references [20-22] were used to collect targets for AD in preclinical trials, resulting in 26 preclinical targets. Together with 25 important targets that had entered into at least phase I clinical trials, 51 targets related to AD were obtained (Fig 2). After that, the names of the targets were imported into the UniProt database [23] to acquire the corresponding encoding gene, UniProt ID, entry name, and standardized protein name (S1 Table). The chemical structures and bioactivity data of the ligands for the 26 preclinical targets were downloaded from the Binding Database (http://www.bindingdb.org, accessed July 2015) [24].
Fig 2

Summary of 51 key targets in AlzhCPI.

The ligands were standardized using the following criteria: (i) duplicate molecules were deleted; (ii) salts were converted to the corresponding acid or base and solvent molecules were removed from hydrates; and (iii) a molecule was considered to be positive (designated +1) if its Ki, EC50 or IC50 was ≤ 10 μM. After filtering, 21,468 active ligands were obtained. The decoy compounds (designated -1) for the 26 targets were generated in three main ways (S2 Table): (i) random extraction from the Specs database; (ii) direct extraction from DUD subsets; and (iii) generation in the DUD online database from known active compounds. The ratio of decoys to active ligands was 3:1. Both the active and decoy compounds were randomly divided into a training set and a test set at a ratio of 3:1.
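The labelling and splitting rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ligand identifiers, potency values, decoy names and the helper names `label_ligand` and `split_three_to_one` are hypothetical, and the actual randomization procedure used in the study is not specified in the text.

```python
import random

ACTIVITY_CUTOFF_UM = 10.0  # Ki, EC50 or IC50 cutoff from the text, in micromolar


def label_ligand(potency_um):
    """Return +1 if the ligand counts as active (potency <= 10 uM), else None."""
    return 1 if potency_um <= ACTIVITY_CUTOFF_UM else None


def split_three_to_one(items, seed=0):
    """Shuffle a list and split it into training and test parts at about 3:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = (len(shuffled) * 3) // 4
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical ligands: (identifier, potency in uM).
potencies = [0.5, 2.0, 10.0, 50.0, 8.0, 120.0, 0.01, 9.9, 3.3, 7.0]
ligands = [("lig%d" % i, p) for i, p in enumerate(potencies)]

actives = [name for name, p in ligands if label_ligand(p) == 1]
# Three decoys (label -1) per active ligand; the names are placeholders.
decoys = ["decoy%d" % i for i in range(3 * len(actives))]

train_actives, test_actives = split_three_to_one(actives)
train_decoys, test_decoys = split_three_to_one(decoys)
```

Splitting actives and decoys separately, as here, keeps the 3:1 decoy-to-active ratio in both the training and the test set, which matches the counts reported in Table 1.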

Chemical descriptors calculation

Two kinds of fingerprints were calculated to describe the small molecules. The first was the ECFP_6 fingerprint, which was calculated by the Discovery Studio 4.0 software [25]. Extended connectivity fingerprints (ECFP) represent a much larger set of features than a set of predefined substructures. The other was the MACCS fingerprint computed by PaDEL-Descriptor 2.18 [26]. MACCS uses a dictionary of MDL Public Keys, which contains the 166 most common substructure patterns. A detailed description of these fingerprints can be found in the original literature [27, 28].

mt-QSAR method

In traditional QSAR studies, one binary classifier can only predict the activity of a compound against one specific target. The essence of mt-QSAR is to decompose the multi-label problem into multiple binary classification problems. As a consequence, to predict one molecule against 26 preclinical targets related to AD, 104 mt-QSAR classifiers were constructed based on two fingerprints (ECFP_6 and MACCS) and two machine learning algorithms (naive Bayesian and recursive partitioning). For each target, four classifiers (NB_ECFP6, NB_MACCS, RP_ECFP6 and RP_MACCS) can be used to predict the activity of a given molecule.
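The decomposition described above amounts to one binary classifier per (target, algorithm, fingerprint) combination. A minimal sketch of that bookkeeping, assuming a hypothetical helper `classifier_names` and using only the naming scheme given in the text:

```python
from itertools import product

ALGORITHMS = ("NB", "RP")          # naive Bayesian, recursive partitioning
FINGERPRINTS = ("ECFP6", "MACCS")  # the two fingerprints used in the study


def classifier_names(targets):
    """Map each target to its four classifier names, e.g. 'NB_ECFP6'."""
    return {
        target: [f"{alg}_{fp}" for alg, fp in product(ALGORITHMS, FINGERPRINTS)]
        for target in targets
    }


# Three of the 26 preclinical targets, as an example.
library = classifier_names(["HTR2A", "ADORA2A", "IDE"])
n_classifiers = sum(len(names) for names in library.values())
```

With 26 targets the same function yields 26 x 4 = 104 classifiers, the number reported for the preclinical targets.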

Naive Bayesian

The naive Bayesian (NB) models were developed using Discovery Studio 4.1 [25]. An advantage of NB classifiers is that they can process an abundance of data, can learn fast and are tolerant of random noise. A more detailed introduction can be found in the following references [29, 30]. In general, NB is a simple probabilistic classifier based on applying Bayesian theory with strong (naive) independence assumptions, which relates the conditional and marginal probabilities of two events. It generates the posterior probabilities based on the core of the function, given by Eq 1. The specific meaning of each parameter can be found in our previous study.
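Eq 1 is not reproduced in this text. For orientation only, the standard form of Bayes' theorem on which NB classifiers rely is given below; the symbols are assumptions chosen for illustration (A: the compound is active against the target, B: the observed fingerprint features f_1, ..., f_n) and may differ from the notation of the original equation.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},
\qquad
P(A \mid f_1, \dots, f_n) \;\propto\; P(A) \prod_{i=1}^{n} P(f_i \mid A)
```

The second expression shows the effect of the naive independence assumption: the likelihood of the full fingerprint factorizes into per-feature terms.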

Recursive partitioning

Recursive partitioning (RP), using Discovery Studio 4.1 [25], was applied to develop decision trees to categorize the data set into active compounds and decoys. RP is a statistical method for multivariable analysis that operates by developing a decision tree to classify the members. Models are constructed by successively splitting a data set into smaller and smaller subsets using a set of hierarchical rules. The result of an RP model is more intuitive than other algorithms because it can be demonstrated by a “decision tree” or “graph” [31, 32]. In this study, 5-fold cross-validation was adopted to determine the degree of pruning to obtain the best predictive accuracy. The specific parameters were set as follows: minimum number of samples at each node and maximum tree depth, where the maximum tree depth was 10, 20 and 20.

Measurement of prediction quality

Internal 5-fold cross-validation and external test set validation were applied to evaluate the training sets and test sets, respectively. In 5-fold cross-validation, the data set was divided into five equal parts; in each round, four parts (80% of the samples) were used to train the model and the remaining part (20%) served as the internal validation set. The quality of all Bayesian and RP classifiers was evaluated based on the numbers of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). The sensitivity (SE), specificity (SP), overall prediction accuracy (Q), and Matthews correlation coefficient (MCC) were calculated using Eqs 2–5, respectively. In addition, the area under the receiver operating characteristic (ROC) curve (AUC) was calculated. The ROC curve shows the separation ability of a binary classifier by iteratively setting the possible classifier threshold [33]. For a useful classifier, the AUC value falls in the range 0.5 ≤ AUC ≤ 1: AUC = 1.0 means a perfect classifier, whereas AUC = 0.5 indicates that the classifier has no discriminative power.
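Eqs 2–5 are not reproduced in this text. The sketch below uses the standard textbook definitions of SE, SP, Q and MCC, which are assumed (not confirmed) to match the original equations; the confusion-matrix counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute SE, SP, Q and MCC from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    se = tp / (tp + fn)                   # sensitivity: recall on actives
    sp = tn / (tn + fp)                   # specificity: recall on decoys
    q = (tp + tn) / (tp + tn + fp + fn)   # overall prediction accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SE": se, "SP": sp, "Q": q, "MCC": mcc}


# Hypothetical test set with 20 actives and 60 decoys.
metrics = classification_metrics(tp=18, tn=57, fp=3, fn=2)
```

Because decoys outnumber actives 3:1, Q alone can look favourable even for a weak model; MCC accounts for all four counts and is therefore the more informative single number on such imbalanced sets.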

Compound filtering in the case study

A total of 132 chemical structures in the herb Acorus tatarinowii Schott (shichangpu) were obtained from the Traditional Chinese Medicine System Pharmacology Database [34] (TCMSP, http://tcmspnw.com), the potential target database of TCM [35] (TCM-PTD, http://tcm.zju.edu.cn/ptd), the Traditional Chinese Medicine Integrated Database [36] (TCMID, http://www.megabionet.org/tcmid/) and relevant references [37, 38]. Given that the content of most chemicals was very low, 22 typical ingredients with contents in the volatile oil higher than 0.1% were kept for further study, according to previous publications [39, 40]. The SMILES structures of the 22 compounds are given in S3 Table.

Target prediction for approved drugs and shichangpu against AD

The putative targets for approved drugs and shichangpu against AD were predicted by AlzhCPI. Considering that each classifier has its strengths and weaknesses, it is more reasonable to predict the activity of one given compound by combining the results from the four classifiers. Herein, a chemical-protein interaction is defined as a potential interaction if the molecule is predicted to be active by at least two out of the four single classifiers within one target.
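The consensus rule above (at least two of the four classifiers must predict activity) can be written as a simple vote count. The sketch below is illustrative only: the function names and the example predictions are hypothetical, and the real predictions come from the Discovery Studio models rather than from Python.

```python
def is_potential_interaction(predictions, min_votes=2):
    """True if at least `min_votes` of a target's classifiers predict activity.

    `predictions` maps classifier name (e.g. 'NB_ECFP6') to True/False.
    """
    return sum(bool(v) for v in predictions.values()) >= min_votes


def predict_targets(per_target_predictions, min_votes=2):
    """Return the sorted targets that pass the consensus rule for one compound."""
    return sorted(
        target
        for target, preds in per_target_predictions.items()
        if is_potential_interaction(preds, min_votes)
    )


# Hypothetical predictions for one compound against three targets.
example = {
    "ACHE": {"NB_ECFP6": True, "NB_MACCS": True, "RP_ECFP6": False, "RP_MACCS": True},
    "PTGS2": {"NB_ECFP6": True, "NB_MACCS": False, "RP_ECFP6": False, "RP_MACCS": False},
    "IDE": {"NB_ECFP6": False, "NB_MACCS": False, "RP_ECFP6": False, "RP_MACCS": False},
}
hits = predict_targets(example)
```

In this example only ACHE is retained: a single positive vote, as for PTGS2, is not enough under the two-of-four criterion.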

Network construction and analysis

To reveal the underlying mode of action between compounds and targets, compound-target networks were constructed. The networks were generated and analysed using Cytoscape 3.2.0 [41]. The degree of a node was calculated by the network analysis plugin in Cytoscape, which defines the number of edges connected to a node, implying the significance of the node in a network.
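Node degree in a compound-target network is simply the number of edges touching each node. The study computed it with the Cytoscape network analysis plugin; the following is a small stand-alone sketch of the same quantity, with hypothetical node names and edges, assuming each compound-target pair appears at most once.

```python
from collections import Counter


def node_degrees(edges):
    """Count the edges attached to every node of a compound-target network."""
    degree = Counter()
    for compound, target in edges:
        degree[compound] += 1
        degree[target] += 1
    return degree


# Hypothetical predicted interactions (compound, target).
edges = [
    ("compound_A", "target_1"),
    ("compound_A", "target_2"),
    ("compound_B", "target_1"),
    ("compound_C", "target_1"),
]
degree = node_degrees(edges)

# Average number of compounds per target = number of edges / number of targets.
targets = {target for _, target in edges}
mean_compounds_per_target = len(edges) / len(targets)
```

Here `target_1` has the highest degree (3), which in the sense used above marks it as the most connected, and thus potentially most significant, node.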

Results and discussion

Data set analysis

To explore the chemical diversity of the data set used in the training set and test set, the Tanimoto similarity index was calculated using the ECFP_2 fingerprint in Discovery Studio 4.1 [25]. Tanimoto similarity index is an indicator to reflect chemical diversity within a data set, and a smaller value indicates that compounds within the data set have better diversity. As given in Table 1, similar to previous results for 25 targets, the Tanimoto indexes range from 0.054 to 0.338 for 26 training sets and 0.013 to 0.270 for 26 test sets, which indicates that the entire data set of 51 targets is diverse enough.
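For reference, the Tanimoto similarity of two fingerprints is the number of shared "on" bits divided by the number of bits set in either. The sketch below assumes that the index reported per data set is the mean pairwise similarity, which the text does not state explicitly; the fingerprints are hypothetical bit sets, not real ECFP_2 output.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as collections of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mean_pairwise_tanimoto(fingerprints):
    """Average Tanimoto similarity over all pairs (needs at least two fingerprints)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(x, y) for x, y in pairs) / len(pairs)


# Hypothetical fingerprints, each listing the indices of its on-bits.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity_index = mean_pairwise_tanimoto(fingerprints)
```

A value near 0 means the compounds share few features (high diversity), whereas a value near 1 means the set is dominated by close analogues.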
Table 1

Detailed statistical description of the entire data set based on the multi-label classification strategy.

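Encoding gene | Training set (ECFP2): Inhibitors | Decoys | Total | Tanimoto index | Test set (ECFP2): Inhibitors | Decoys | Total | Tanimoto index
--- | --- | --- | --- | --- | --- | --- | --- | ---
HTR2A | 2200 | 6600 | 8800 | 0.288 | 742 | 2226 | 2968 | 0.198
ADORA2A | 2360 | 7080 | 9440 | 0.279 | 783 | 2349 | 3132 | 0.179
CHRM2 | 380 | 1140 | 1520 | 0.249 | 128 | 384 | 512 | 0.15
PDE9A | 110 | 330 | 440 | 0.114 | 33 | 99 | 132 | 0.046
GRM2 | 310 | 930 | 1240 | 0.28 | 106 | 318 | 424 | 0.234
GRM3 | 50 | 150 | 200 | 0.305 | 16 | 48 | 64 | 0.203
MAPK8 | 780 | 2340 | 3120 | 0.192 | 266 | 798 | 1064 | 0.091
MAPK9 | 330 | 990 | 1320 | 0.13 | 108 | 324 | 432 | 0.06
MAPK10 | 510 | 1530 | 2040 | 0.183 | 174 | 522 | 696 | 0.056
MAPK14 | 40 | 120 | 160 | 0.181 | 19 | 57 | 76 | 0.171
HS90AA1 | 750 | 2250 | 3000 | 0.215 | 248 | 744 | 992 | 0.1361
PIN1 | 60 | 180 | 240 | 0.125 | 23 | 69 | 92 | 0.0544
MAPT | 40 | 120 | 160 | 0.1125 | 12 | 36 | 48 | 0.0209
PTGS2 | 1760 | 5280 | 7040 | 0.542 | 583 | 1749 | 2332 | 0.164
NOS2 | 570 | 1710 | 2280 | 0.33 | 184 | 552 | 736 | 0.288
MPO | 60 | 180 | 240 | 0.338 | 19 | 57 | 76 | 0.211
CHUK | 120 | 360 | 480 | 0.173 | 41 | 123 | 164 | 0.098
IKBKB | 600 | 1800 | 2400 | 0.22 | 198 | 594 | 792 | 0.123
TNF | 560 | 1680 | 2240 | 0.184 | 192 | 576 | 768 | 0.083
ALOX12 | 120 | 360 | 480 | 0.24 | 40 | 120 | 160 | 0.119
CTSD | 1250 | 3750 | 5000 | 0.246 | 423 | 1269 | 1692 | 0.093
PDK1 | 440 | 1320 | 1760 | 0.261 | 149 | 447 | 596 | 0.2
HMGCR | 600 | 1800 | 2400 | 0.233 | 199 | 597 | 796 | 0.136
IDE | 60 | 180 | 240 | 0.054 | 20 | 60 | 80 | 0.013
PPARG | 1730 | 5190 | 6920 | 0.264 | 582 | 1746 | 2328 | 0.171
CES1 | 290 | 870 | 1160 | 0.305 | 100 | 300 | 400 | 0.27
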
The distribution of the target and ligand space in AlzhCPI was also investigated. As presented in Fig 3A, the target space (n = 51) can be divided into seven subfamilies according to multiple mechanisms involved in the pathogenesis of AD [20], namely modulating neurotransmission (n = 23), the tau pathology approach (n = 10), Aβ-related treatment approaches (n = 4), targeting intracellular signalling cascades (n = 3), the anti-inflammatory approach (n = 7), the mitochondrial dysfunction approach (n = 2), and the metabolic dysfunction approach (n = 3). Detailed information on the target classification is given in S4 Table. The number of corresponding ligands for seven subfamilies was 20,473, 4,762, 2,995, 1,169, 5,047, 2,262 and 3,501, respectively (Fig 3B). The above analysis demonstrates that the entire data set has diverse ligand and target coverage.
Fig 3

Targets (A) and active compounds (B) classification within the entire data set in AlzhCPI.

The prediction quality for each sub-family was also evaluated by calculating the average MCC and AUC values in the 5-fold cross-validation (S5 Table). High performance was obtained for each sub-family. For example, the average MCC value of the NB_ECFP6 models for each sub-family ranges from 0.952 to 0.990, while their average AUC value falls in the range of 0.994 to 0.999.

Model evaluation and comparison

The classification performance of the 104 classifiers for the 26 preclinical targets was evaluated, and the results are given in Tables 2 and 3. The statistical results for the training sets in Table 2 were obtained using 5-fold cross-validation. Of the 104 models, 80 (77%) obtain an MCC value higher than 0.8, whereas 98 (94%) give an AUC value higher than 0.9. Overall, the MCC values range from 0.564 to 1, with an average of 0.887, and the AUC values range from 0.815 to 1, with an average of 0.968. Furthermore, 90 of the 104 models (87%) have Q values higher than 0.9, with an average of 0.954. The more detailed performance of the training sets can be found in S6 Table. These results indicate that the overall predictive accuracy of the mt-QSAR models is satisfactory.
Table 2

Performance of the 5-fold cross-validation for 26 targets towards Alzheimer's disease using NB and RP classifiers.

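Encoding gene | ECFP6 NB: MCC | ECFP6 NB: AUC | ECFP6 RP: MCC | ECFP6 RP: AUC | MACCS NB: MCC | MACCS NB: AUC | MACCS RP: MCC | MACCS RP: AUC
--- | --- | --- | --- | --- | --- | --- | --- | ---
HTR2A | 0.992 | 1 | 0.944 | 0.988 | 0.732 | 0.948 | 0.938 | 0.989
ADORA2A | 0.989 | 1 | 0.947 | 0.989 | 0.89 | 0.981 | 0.984 | 0.995
CHRM2 | 0.984 | 0.999 | 0.877 | 0.976 | 0.779 | 0.963 | 0.928 | 0.978
PDE9A | 0.994 | 0.999 | 0.913 | 0.97 | 0.939 | 0.993 | 0.947 | 0.971
GRM2 | 0.989 | 1 | 0.955 | 0.987 | 0.754 | 0.962 | 0.892 | 0.979
GRM3 | 1 | 1 | 0.882 | 0.968 | 0.906 | 0.984 | 0.889 | 0.961
MAPK8 | 0.991 | 1 | 0.916 | 0.973 | 0.707 | 0.941 | 0.893 | 0.966
MAPK9 | 0.98 | 0.996 | 0.852 | 0.961 | 0.763 | 0.945 | 0.822 | 0.939
MAPK10 | 0.952 | 0.993 | 0.866 | 0.956 | 0.65 | 0.915 | 0.849 | 0.943
MAPK14 | 1 | 1 | 0.905 | 0.935 | 0.916 | 0.98 | 0.795 | 0.897
HS90AA1 | 0.975 | 0.997 | 0.928 | 0.984 | 0.689 | 0.941 | 0.911 | 0.97
PIN1 | 0.978 | 0.999 | 0.914 | 0.964 | 0.978 | 0.998 | 0.812 | 0.922
MAPT | 0.937 | 0.998 | 0.725 | 0.886 | 0.794 | 0.904 | 0.724 | 0.815
PTGS2 | 0.956 | 0.997 | 0.93 | 0.982 | 0.698 | 0.935 | 0.965 | 0.991
NOS2 | 0.976 | 0.999 | 0.886 | 0.968 | 0.702 | 0.929 | 0.887 | 0.97
MPO | 0.956 | 0.996 | 0.914 | 0.963 | 0.781 | 0.956 | 0.918 | 0.953
CHUK | 0.983 | 0.992 | 0.955 | 0.961 | 0.729 | 0.971 | 0.882 | 0.947
IKBKB | 0.993 | 1 | 0.932 | 0.983 | 0.775 | 0.954 | 0.905 | 0.967
TNF | 0.867 | 0.985 | 0.814 | 0.933 | 0.564 | 0.854 | 0.798 | 0.938
ALOX12 | 0.989 | 1 | 0.924 | 0.98 | 0.88 | 0.986 | 0.936 | 0.989
CTSD | 0.961 | 0.994 | 0.976 | 0.994 | 0.729 | 0.949 | 0.942 | 0.992
PDK1 | 0.995 | 0.997 | 0.981 | 0.996 | 0.985 | 0.994 | 0.983 | 0.991
HMGCR | 0.991 | 1 | 0.974 | 0.996 | 0.935 | 0.998 | 0.97 | 0.995
IDE | 0.851 | 0.988 | 0.679 | 0.881 | 0.68 | 0.923 | 0.753 | 0.829
PPARG | 0.981 | 0.998 | 0.955 | 0.991 | 0.745 | 0.947 | 0.934 | 0.988
CES1 | 0.956 | 0.999 | 0.934 | 0.972 | 0.676 | 0.913 | 0.89 | 0.969
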
Table 3

Performance of the test set validation for 26 targets towards Alzheimer's disease using NB and RP classifiers.

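Encoding gene | ECFP6 NB: MCC | ECFP6 NB: AUC | ECFP6 RP: MCC | ECFP6 RP: AUC | MACCS NB: MCC | MACCS NB: AUC | MACCS RP: MCC | MACCS RP: AUC
--- | --- | --- | --- | --- | --- | --- | --- | ---
HTR2A | 0.953 | 0.997 | 0.884 | 0.967 | 0.678 | 0.931 | 0.838 | 0.959
ADORA2A | 0.653 | 0.949 | 0.681 | 0.911 | 0.553 | 0.868 | 0.26 | 0.714
CHRM2 | 0.797 | 0.961 | 0.738 | 0.889 | 0.664 | 0.915 | 0.651 | 0.939
PDE9A | 0.96 | 0.994 | 0.836 | 0.954 | 0.643 | 0.982 | 0.771 | 0.855
GRM2 | 0.956 | 0.989 | 0.893 | 0.955 | 0.544 | 0.876 | 0.687 | 0.917
GRM3 | 0.832 | 0.897 | 0.797 | 0.911 | 0.785 | 0.874 | 0.788 | 0.847
MAPK8 | 0.927 | 0.991 | 0.801 | 0.928 | 0.651 | 0.903 | 0.746 | 0.898
MAPK9 | 0.829 | 0.956 | 0.681 | 0.869 | 0.633 | 0.901 | 0.615 | 0.874
MAPK10 | 0.787 | 0.937 | 0.695 | 0.879 | 0.541 | 0.852 | 0.594 | 0.84
MAPK14 | 0.965 | 0.984 | 0.894 | 0.921 | 0.75 | 0.935 | 0.393 | 0.7
HS90AA1 | 0.821 | 0.935 | 0.807 | 0.897 | 0.585 | 0.88 | 0.745 | 0.857
PIN1 | 0.854 | 0.964 | 0.791 | 0.906 | 0.728 | 0.899 | 0.698 | 0.887
MAPT | 0.832 | 0.97 | 0.408 | 0.748 | 0.591 | 0.854 | 0.415 | 0.779
PTGS2 | 0.854 | 0.983 | 0.756 | 0.919 | 0.587 | 0.874 | 0.898 | 0.976
NOS2 | 0.893 | 0.983 | 0.752 | 0.901 | 0.543 | 0.841 | 0.668 | 0.894
MPO | 0.787 | 0.994 | 0.666 | 0.865 | 0.383 | 0.629 | 0.492 | 0.752
CHUK | 0.735 | 0.939 | 0.731 | 0.856 | 0.726 | 0.928 | 0.677 | 0.921
IKBKB | 0.895 | 0.973 | 0.832 | 0.911 | 0.696 | 0.907 | 0.718 | 0.915
TNF | 0.697 | 0.915 | 0.501 | 0.791 | 0.171 | 0.722 | 0.502 | 0.814
ALOX12 | 0.849 | 0.97 | 0.752 | 0.906 | 0.718 | 0.901 | 0.804 | 0.932
CTSD | 0.885 | 0.974 | 0.92 | 0.95 | 0.647 | 0.913 | 0.867 | 0.941
PDK1 | 0.946 | 0.959 | 0.955 | 0.976 | 0.923 | 0.961 | 0.937 | 0.955
HMGCR | 0.964 | 1 | 0.963 | 0.987 | 0.913 | 0.995 | 0.929 | 0.984
IDE | 0.864 | 0.983 | 0.321 | 0.729 | 0.114 | 0.69 | 0.401 | 0.704
PPARG | 0.897 | 0.965 | 0.884 | 0.948 | 0.661 | 0.916 | 0.803 | 0.928
CES1 | 0.683 | 0.929 | 0.809 | 0.919 | 0.472 | 0.792 | 0.662 | 0.861
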
To further evaluate the built mt-QSAR models, external test set validation was also performed. As shown in Table 3, the test sets of the 104 mt-QSAR classifiers achieve an overall acceptable performance. The MCC values range from 0.114 to 0.965, with an average value of 0.724. The AUC values range from 0.629 to 1.0, with an average of 0.903. Among the 26 preclinical targets, the four models for the insulin-degrading enzyme (IDE_HUMAN) perform the worst, with average MCC and AUC values of 0.501 and 0.777, respectively. The main reason is that few active compounds are included in the training set (n = 60), resulting in a narrow applicability domain for the generated classifiers, which therefore predict the test set (n = 20) poorly. The detailed performance of the test sets is given in S7 Table. The updated AlzhCPI is composed of 204 binary classifiers towards 51 important targets related to AD.

To compare the performance of the four types of classifiers (NB_ECFP6, NB_MACCS, RP_ECFP6 and RP_MACCS), a boxplot (Fig 4A) was plotted to show the minimum, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum of the MCC values of the test sets. As shown in Fig 4A, among the four types of classifiers, the NB_ECFP6 models (Q2 = 0.953) outperform the other three, and the NB_MACCS classifiers (Q2 = 0.651) perform the worst. However, there are no obvious differences between the performance of RP_ECFP6 (Q2 = 0.816) and RP_MACCS (Q2 = 0.757). Interestingly, with the ECFP6 fingerprint the NB models (Q2 = 0.953) outperform the RP models (Q2 = 0.816), whereas with the MACCS fingerprint the RP models (Q2 = 0.757) outperform the NB models (Q2 = 0.651). This indicates that the relative performance of the two algorithms depends on which fingerprint is used.
Fig 4

Boxplot shows the minimum, lower quartile (Q1), median (Q2), upper quartile (Q3), and maximum of Matthews correlation coefficient (MCC) on test sets based on four types of classifiers (A) and different fingerprints and algorithms (B).

Similarly, Fig 4B depicts the distributions of the MCC values based on the different fingerprints and algorithms. The boxplot result indicates that the classifiers (Q2 = 0.879) derived from the ECFP6 fingerprint outperform those (Q2 = 0.708) derived from the MACCS fingerprint. In addition, there is a significant difference in the performance of the NB (Q2 = 0.832) and RP (Q2 = 0.798) models. Thus, the same conclusion can be drawn that both algorithms have their respective advantages. More detailed data for the boxplot can be found in S8 Table. As discussed above, it is necessary to integrate the results of the four single classifiers to predict CPIs. In fact, the advantage of an integrated model for identifying CPI was demonstrated in our previous study, which resulted in several highly active MTDLs against AD. In this study, the same integration criterion is adopted: a CPI is defined as a potential interaction if the molecule is predicted to be active by at least two out of the four single classifiers within one target [17].

Implementation of AlzhCPI

In the present study, the multi-target quantitative structure-activity relationship (mt-QSAR) method was applied using the naive Bayesian (NB) and recursive partitioning (RP) algorithms. A web server, namely AlzhCPI, was designed using HTML and CSS technology to provide all the results of our models. In this web server, users can find important fragments for multiple targets against AD given by the naive Bayesian classifier, the case study of the prediction of polypharmacology for known AD drugs, and details of the 204 binary classifiers towards 51 important targets related to AD. In addition, users can download the XML files of the 204 models and import them into the PipelinePilot/Discovery Studio software to predict the activities of a given molecule. We anticipate that this server will facilitate the target identification and virtual screening of active compounds for the treatment of AD.

Case study based on AlzhCPI: Systematic analysis of the multiple bioactivities of shichangpu through a network pharmacology approach

AD is caused by multiple genes or their products. Single-target therapy has been found ineffective due to insufficient understanding of the complex disease. Traditional Chinese medicine (TCM), which treats disease based on the concept of “multiple components and multiple targets”, has accumulated rich theories and a great deal of valuable experience in the prevention and treatment of AD [42]. Shichangpu is the most frequently used herbal medicine among anti-AD TCM prescriptions [43-45]. Thus, there is an urgent need to systematically analyse the mechanisms of action of shichangpu from a holistic perspective. Based on AlzhCPI, the potential targets of 22 key compounds of shichangpu against AD were identified, and the associations between the molecules and target proteins are listed in S9 Table. The predicted results were also integrated to construct the compound–target–mechanism network. As shown in Fig 5, shichangpu is predicted to act on 20 targets, which cover six mechanisms involved in the pathogenesis of AD. This suggests that shichangpu may treat AD through modulating neurotransmission, the tau pathology approach, the metabolic dysfunction approach, Aβ-related treatment, the anti-inflammatory approach and the intracellular signalling cascade approach.
Fig 5

The compound–target–mechanism network of shichangpu based on AlzhCPI.

Ellipse, hexagon and triangle represent drug nodes, protein nodes and mechanism nodes, respectively.

The degree analysis revealed that one target could interact with multiple molecules (5.75 compounds per target on average), and one compound could also target several proteins related to AD (5.23 targets per compound on average). There were 13 compounds out of 22 that could target at least 5 proteins, which may imply that these compounds are the main pharmacologically active ingredients. Among the 13 compounds, both methyl eugenol and asaraldehyde were predicted to be active against 10 targets. In addition, 10 targets out of 20 could simultaneously interact with at least 5 compounds. Among the 10 proteins, ACHE and PTGS2 achieved the highest degree (n = 21 and 18, respectively) of linking to molecular nodes, indicating that they may play key roles in the pharmacological action of shichangpu.

Conclusion

In this paper, based on the naive Bayesian (NB) and recursive partitioning (RP) algorithms, a model library first built in a previous study was updated by constructing 104 binary classifiers against 26 preclinical AD targets using the mt-QSAR method. The internal 5-fold cross-validation and external test set validation confirmed the prediction reliability of the models. In addition, a web server named AlzhCPI was implemented to provide comprehensive information on the 204 binary classifiers and is freely available to the scientific community. A case study illustrated the use of AlzhCPI to systematically analyse the multiple bioactivities of shichangpu through a network pharmacology approach. The results showed that shichangpu could act on 20 targets related to AD, which were involved in multiple mechanisms, supporting the TCM concept of “multiple components and multiple targets”. AlzhCPI has potential applications in network pharmacology, drug repositioning, and virtual screening for MTDLs towards AD. The methodology and tools here may provide guidance for constructing similar platforms for other complex diseases.

Supporting information

S1 Table. Detailed information on the 51 targets. (XLSX)

S2 Table. The generation of decoy compounds. (XLSX)

S3 Table. The SMILES structures of 22 key compounds in shichangpu. (XLSX)

S4 Table. The detailed information on the target classification for 51 targets. (XLSX)

S5 Table. The prediction quality for each sub-family. (XLSX)

S6 Table. The performance of the 5-fold cross-validation for 26 targets towards Alzheimer’s disease using NB and RP classifiers. (XLSX)

S7 Table. The performance of the test set validation for 26 targets towards Alzheimer’s disease using NB and RP classifiers. (XLSX)

S8 Table. The detailed parameter information from the boxplot of the test sets. (XLSX)

S9 Table. The associations between molecules and targets predicted by AlzhCPI for shichangpu. (XLSX)
  39 in total

1.  Cytoscape: a software environment for integrated models of biomolecular interaction networks.

Authors:  Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker
Journal:  Genome Res       Date:  2003-11       Impact factor: 9.043

Review 2.  Alzheimer's disease: strategies for disease modification.

Authors:  Martin Citron
Journal:  Nat Rev Drug Discov       Date:  2010-05       Impact factor: 84.694

3.  PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints.

Authors:  Chun Wei Yap
Journal:  J Comput Chem       Date:  2010-12-17       Impact factor: 3.376

4.  GSA: a GPU-accelerated structure similarity algorithm and its application in progressive virtual screening.

Authors:  Xin Yan; Qiong Gu; Feng Lu; Jiabo Li; Jun Xu
Journal:  Mol Divers       Date:  2012-10-19       Impact factor: 2.943

Review 5.  Shifting from the single to the multitarget paradigm in drug discovery.

Authors:  José L Medina-Franco; Marc A Giulianotti; Gregory S Welmaker; Richard A Houghten
Journal:  Drug Discov Today       Date:  2013-01-20       Impact factor: 7.851

6.  LBVS: an online platform for ligand-based virtual screening using publicly accessible databases.

Authors:  Minghao Zheng; Zhihong Liu; Xin Yan; Qianzhi Ding; Qiong Gu; Jun Xu
Journal:  Mol Divers       Date:  2014-09-03       Impact factor: 2.943

7.  ADME evaluation in drug discovery. 10. Predictions of P-glycoprotein inhibitors using recursive partitioning and naive Bayesian classification techniques.

Authors:  Lei Chen; Youyong Li; Qing Zhao; Hui Peng; Tingjun Hou
Journal:  Mol Pharm       Date:  2011-03-25       Impact factor: 4.939

Review 8.  Traditional Chinese medicines and Alzheimer's disease.

Authors:  Tzong-Yuan Wu; Chih-Ping Chen; Chip-Ping Chen; Tzyy-Rong Jinn
Journal:  Taiwan J Obstet Gynecol       Date:  2011-06       Impact factor: 1.705

9.  Synthesis and evaluation of multi-target-directed ligands against Alzheimer's disease based on the fusion of donepezil and ebselen.

Authors:  Zonghua Luo; Jianfei Sheng; Yang Sun; Chuanjun Lu; Jun Yan; Anqiu Liu; Hai-Bin Luo; Ling Huang; Xingshu Li
Journal:  J Med Chem       Date:  2013-11-12       Impact factor: 7.446

Review 10.  A century of Alzheimer's disease.

Authors:  Michel Goedert; Maria Grazia Spillantini
Journal:  Science       Date:  2006-11-03       Impact factor: 47.728
