Ashleigh van Heerden, Roelof van Wyk, Lyn-Marie Birkholtz.
Abstract
The rapid development of antimalarial resistance motivates the continued search for novel compounds with a mode of action (MoA) different from that of current antimalarials. Phenotypic screening has delivered thousands of promising hit compounds without prior knowledge of the compounds' exact target or MoA. Whilst the latter is not initially required to progress a compound in a medicinal chemistry program, identifying the MoA early can accelerate hit prioritization, hit-to-lead optimization and preclinical combination studies in malaria research. The effects of drug treatment on a cell can be observed at the systems level as changes in the transcriptome, proteome and metabolome. Machine learning (ML) algorithms are powerful tools able to deconvolute such complex chemically-induced transcriptional signatures to identify the pathways on which a compound acts and, in this manner, provide an indication of the MoA of a compound. In this study, we assessed different ML approaches for their ability to stratify antimalarial compounds based on varied chemically-induced transcriptional responses. We developed a rational gene selection approach that could identify predictive features for MoA to train and generate ML models. The best performing model could stratify compounds with similar MoA with a classification accuracy of 76.6 ± 6.4%. Moreover, only a limited set of 50 biomarkers was required to stratify compounds with similar MoA and define chemo-transcriptomic fingerprints for each compound. These fingerprints were unique for each compound, and compounds with similar targets/MoA clustered together. The ML model was specific and sensitive enough to group new compounds into MoAs associated with their predicted target and was robust enough to be extended to generate chemo-transcriptomic fingerprints for additional life cycle stages such as immature gametocytes.
This work therefore contributes a new strategy to rapidly, specifically and sensitively indicate the MoA of compounds based on chemo-transcriptomic fingerprints and holds promise to accelerate antimalarial drug discovery programs.
Keywords: biomarker; chemo-transcriptomic fingerprint; gene expression profile; machine learning; mode of action; multinomial logistic regression
Year: 2021 PMID: 34268139 PMCID: PMC8277430 DOI: 10.3389/fcimb.2021.688256
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Figure 1. Workflow of data acquisition and model building. (A) Accepted gene expression profiles (GEPs) of treated P. falciparum formed our inclusive database (pink) and also underwent feature selection (purple). In both instances the GEPs were merged and normalized. The resultant transcripts from the accepted GEPs (pink) produced our inclusive database. Similarly, the transcripts resulting from feature selection produced our rational selection database (purple). (B) The transcripts in the inclusive database and in the rational selection database were used separately to build models. Treatment time points were randomly split at an 80/20 ratio, whereby 80% was used as a training set for model tuning, training and K-fold cross-validation. The remaining 20% of treatment time points were used as a test set to evaluate model performance on untrained data. Prediction accuracy of treatment MoA from both the K-fold cross-validation and prediction on the test set was used to determine the best classification algorithm for antiplasmodial MoA. For this classification algorithm, the models trained on either the inclusive database or the rational selection database were then compared, using a sliding gene-scale approach, to assess which selection approach best identified the minimum number of transcripts needed for robust MoA prediction.
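The 80/20 split and K-fold cross-validation described in panel (B) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the random seed, the choice of K = 10 and the placeholder time-point labels are all assumptions made for the example.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=0):
    """Randomly split items into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary choice
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx); every sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical treatment time points; the labels are placeholders only.
time_points = [f"treatment_{i}" for i in range(50)]
train_set, test_set = train_test_split(time_points)
folds = list(k_fold_indices(len(train_set), k=10))
```

The test set is set aside before any fold is formed, so tuning and cross-validation never see the held-out 20%.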
The top 50 biomarkers identified from the ML modeling.
| Treatment | Mode of action/target protein | PlasmoDB Gene ID | Gene product description |
|---|---|---|---|
| ML-7 & W7 | Ca2+/calmodulin-dependent protein kinase inhibitor | | Secreted ookinete protein |
| | | | Protein MAK16 |
| | | | Adenylosuccinate lyase |
| | | | Pre-mRNA-processing ATP-dependent RNA helicase PRP5 |
| | | | Conserved |
| | | | Conserved |
| TSA | Histone deacetylase | | mRNA-capping enzyme subunit beta |
| | | | Major facilitator superfamily-related transporter |
| | | | Nucleotidyltransferase |
| | | | Serine/threonine protein kinase, FIKK family |
| | | | Conserved |
| | | | Vacuolar protein sorting-associated protein 52 |
| | | | Eukaryotic translation initiation factor eIF2A |
| Febrifugine | Prolyl-tRNA synthetase | | Ribonuclease H2 subunit A |
| | | | tRNA N6-adenosine threonylcarbamoyltransferase |
| | | | 50S ribosomal protein L22, apicoplast |
| Staurosporine A | Serine/threonine kinases | | Cysteine desulfuration protein SufE |
| | | | Conserved |
| | | | Kinesin-like protein |
| | | | Debranching enzyme-associated ribonuclease |
| | | | DNA replication licensing factor MCM4 |
| Artemisinin | Free radicals formation and protein & heme alkylation | | Conserved |
| | | | Conserved |
| DFMO | Ornithine decarboxylase | | Actin-depolymerizing factor 1 |
| | | | Structural maintenance of chromosomes protein 4 |
| | | | tRNA methyltransferase |
| | | | 40S ribosomal protein S17 |
| | | | Conserved |
| | | | Allantoicase |
| Cyclosporine A | Binds sphingomyelin | | Conserved |
| | | | Phosphoinositide-specific phospholipase C |
| | | | Conserved |
| | | | GTP-binding protein |
| | | | Splicing factor 3A subunit 1 |
| Chloroquine & Quinine | Heme metabolism | | Cytoplasmic tRNA 2-thiolation protein 1 |
| | | | Serine/threonine protein kinase |
| | | | Conserved |
| | | | AP2 domain transcription factor |
| | | | Conserved |
| | | | Conserved |
| PMSF | Serine protease | | Inositol-3-phosphate synthase |
| | | | Conserved |
| | | | DnaJ protein |
| | | | Cysteine proteinase falcipain 3 |
| | | | Golgi apparatus membrane protein TVP23 |
| Ionomycin | Ca2+-binding ionophore | | Gametocyte-specific protein |
| MMV’048 & UCT’943 | Phosphatidylinositol 4-kinase (PI4K) | | Conserved protein, unknown function |
| | | | |
| | | | Sodium-dependent phosphate transporter |
| | | | Ribosomal protein L16, mitochondrial |
Figure 2. Robustness and accuracy of different ML algorithms in stratifying treatments with similar MoA using either the 2463-transcript inclusive database or the 174-transcript rational selection database. The ML algorithms are grouped into statistics-driven elementary ML, ensemble, or deep learning classifiers. Classifiers were trained on either the 2463-transcript inclusive database (blue) or the 174-transcript rational selection database (gray). Classifiers were hyperparameter tuned before undergoing 10-fold cross-validation. Bars indicate the average accuracy of the classifier obtained from 10-fold cross-validation on the training data, and error bars show the standard deviation of the performance measures. Triangles indicate the accuracy of the classifier in stratifying the MoA of test data. Some ML algorithms could be built with more than one R package, and these variations were also interrogated. SVC, support vector classification with the e1071 R package; P, polynomial; L, linear; MLR, multinomial logistic regression; rF, random forest with the randomForest R package; RF, random forest with the h2o R package; XGBoost, built with the xgboost R package; GBM, gradient boosting machine; ANN, artificial neural network. MLR, RF, GBM and ANN were built using the h2o R package.
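As a rough illustration of how a bar and its error bar in this figure relate to the per-fold results, the sketch below computes classification accuracy and then the mean and standard deviation across folds. The per-fold numbers are invented for the example, and using the sample standard deviation (rather than the population form) is an assumption; the caption does not say which was used.

```python
from statistics import mean, stdev

def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted MoA class matches the known one."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

def summarize_folds(fold_accuracies):
    """Mean and sample standard deviation across cross-validation folds."""
    return mean(fold_accuracies), stdev(fold_accuracies)

# Hypothetical per-fold accuracies; these are NOT values from the study.
fold_accuracies = [0.70, 0.80, 0.75, 0.85, 0.65, 0.80, 0.75, 0.70, 0.80, 0.70]
bar_height, error_bar = summarize_folds(fold_accuracies)
print(f"{bar_height * 100:.1f} ± {error_bar * 100:.1f}%")
```

The test-set accuracy (the triangles) would be a single call to `accuracy` on the held-out data, so it carries no error bar.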
Figure 3. Influence of limiting the number of transcripts used as training features on MoA stratification by MLR models. MLR classifiers were trained on either ML-inferred transcripts (blue) or rationally selected transcripts (light gray). Using variable importance, transcripts were ranked according to their importance in the classification decisions of the MLR classifier. With the ranked transcripts, a sliding gene-scale approach was applied in which the top transcripts were used to build minimodels, with each sequential minimodel trained on a decreasing number of transcripts. Each minimodel underwent 10-fold cross-validation and was also assessed for accuracy of MoA stratification on test data. Bars indicate the average accuracy obtained from 10-fold cross-validation, and triangles indicate model accuracy on the untrained test set. Error bars indicate the standard deviation of the average accuracy.
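The sliding gene-scale idea can be sketched as a loop over shrinking top-N feature sets. Everything named here is hypothetical: the importance values, the list of sizes, and in particular `toy_evaluate`, which merely stands in for "train an MLR minimodel on these transcripts and return its cross-validated accuracy".

```python
def sliding_gene_scale(importance, sizes, evaluate):
    """Score minimodels built on the top-N most important transcripts.

    importance: dict mapping transcript ID -> variable importance
    sizes:      feature counts to try, e.g. from largest to smallest
    evaluate:   callable taking a list of transcript IDs, returning a score
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    return {n: evaluate(ranked[:n]) for n in sizes}

# Hypothetical importances for 174 placeholder transcript IDs.
importance = {f"gene_{i}": 1.0 / (i + 1) for i in range(174)}
total = sum(importance.values())

def toy_evaluate(transcripts):
    """Stand-in scorer: share of total importance captured (NOT a real model)."""
    return sum(importance[t] for t in transcripts) / total

results = sliding_gene_scale(importance, sizes=[174, 100, 50, 25, 10],
                             evaluate=toy_evaluate)
for n, score in results.items():
    print(n, round(score, 3))
```

In practice one would look for the smallest N at which the score stays close to that of the full feature set.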
Figure 4. Novelty of the expression patterns of the 50 rationally selected biomarker transcripts associated with similar MoA. (A) The top 50 biomarkers (gray) from the rational selection 50-transcript MLR minimodel were compared to DEGs associated with MoA as identified by Hu et al. (2009), shown in black. Biomarker transcripts that were identified for the same compound and MoA in both studies are shown as stacked bars. (B) Correlation in gene expression of the 50 biomarkers between compounds within our database. Log2 fold change values for each transcript were extracted for all the compound treatments to plot the heatmap. Similar expression patterns (red blocks) are seen within compound treatments with similar MoA. The biomarkers are grouped according to the MoA (black blocks) they were identified from.
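For readers unfamiliar with the log2 fold change values plotted in panel (B), a minimal sketch follows. The caption does not state how the values were derived, so the treated-over-control ratio and the pseudocount are both assumptions made for illustration.

```python
from math import log2

def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression.

    The pseudocount (an assumption here) avoids taking the log of zero.
    """
    return log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical expression values for three placeholder transcripts.
treated = {"gene_a": 3.0, "gene_b": 1.0, "gene_c": 5.0}
control = {"gene_a": 1.0, "gene_b": 3.0, "gene_c": 5.0}
heatmap_column = {g: log2_fold_change(treated[g], control[g]) for g in treated}
```

Positive values indicate higher expression under treatment, negative values lower expression, and zero no change.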
Figure 5. The ML-defined biomarkers result in unique suprahexagonal chemo-transcriptomic fingerprints that cluster per compound MoA. The dimensionality of the expression profiles of the 50 biomarkers from various compound treatments was reduced using self-organizing maps (SOM), visualized as suprahexagonal chemo-transcriptomic fingerprints. Each hexagon within each suprahexagonal map defines a cluster of biomarkers colored according to log2 fold change (FC) expression values, and the fingerprints were hierarchically clustered using Ward linkage on the Euclidean distance between expression profiles. Known protein targets or biological processes affected by the compound treatments are indicated. CDPK, Ca2+/calmodulin-dependent protein kinase; Ser/Thr kinase, serine/threonine kinase; Ser protease, serine protease; HDAC, histone deacetylase; ODC, ornithine decarboxylase; Stauro, staurosporine; Iono, ionomycin; CQ, chloroquine; Art, artemisinin; Febr, febrifugine; Quin, quinine; ASA-9, 2-aminosuberic acid derivative; CycloS, cyclosporine A; PMSF, phenylmethylsulfonyl fluoride; TSA, trichostatin A; SAHA, suberoylanilide hydroxamic acid; DFMO, difluoromethylornithine; TSA1, data from Hu et al. (2009); TSA2, data from Andrews et al. (2012).
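Ward linkage on Euclidean distance can be illustrated with a small pure-Python sketch. It greedily merges the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. This is a teaching example under stated assumptions, not the tooling used for the figure: the compound labels and log2 FC values are invented, and it returns flat clusters rather than a full dendrogram.

```python
from itertools import combinations

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (centroid_a, n_a), (centroid_b, n_b) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return n_a * n_b / (n_a + n_b) * sq_dist

def ward_cluster(profiles, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    profiles: dict mapping name -> list of log2 FC values
    Returns a list of clusters, each a sorted list of names.
    """
    # Each cluster is stored as (member names, centroid, size).
    clusters = [([name], list(vec), 1) for name, vec in profiles.items()]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]][1:], clusters[ij[1]][1:]),
        )
        (names_i, c_i, n_i), (names_j, c_j, n_j) = clusters[i], clusters[j]
        centroid = [(n_i * x + n_j * y) / (n_i + n_j) for x, y in zip(c_i, c_j)]
        merged = (names_i + names_j, centroid, n_i + n_j)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(names) for names, _, _ in clusters]

# Hypothetical biomarker profiles for four placeholder compounds.
profiles = {
    "cmpd_A1": [2.0, 1.8, -0.1, 0.0],
    "cmpd_A2": [1.9, 2.1, 0.1, -0.2],
    "cmpd_B1": [-0.1, 0.2, -1.9, -2.2],
    "cmpd_B2": [0.1, -0.1, -2.1, -1.8],
}
groups = ward_cluster(profiles, n_clusters=2)
```

Compounds with similar profiles end up in the same group, which mirrors the caption's point that fingerprints cluster by MoA.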
Figure 6. Performance of the biomarkers in stratifying new compounds into MoA classes. (A) Correlation plots were generated from biomarker expression profiles for compounds within our database as well as new compounds (TSA, MMV666810 and MMV642850) tested on immature gametocytes (stage II/III) to assess the usefulness of the biomarkers in different life cycle stages of the parasite. Plots were visualized using corrplot based on Pearson correlations. Similar correlation patterns, shown in blocks, were observed for compound treatments with similar MoA. Areas of high correlation (positive or negative) are indicated in blocks for particular compound groups. (B) SOM suprahexagonal chemo-transcriptomic fingerprints for each of the new compounds included in the validation compared to example compounds within the same MoA class. TSA, trichostatin A; MMV666810, MMV’810; MMV642850, MMV’850; im Gc, immature stage II/III gametocytes.
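The pairwise Pearson correlations underlying panel (A) can be sketched as below. The figure itself was drawn with the corrplot R package; this Python version only illustrates the calculation, and the profile names and values are invented for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(profiles):
    """Nested dict of pairwise correlations between named profiles."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names}
            for a in names}

# Hypothetical biomarker log2 FC profiles for placeholder treatments.
profiles = {
    "known_cmpd": [1.5, -0.5, 2.0, -1.0],
    "new_cmpd": [1.2, -0.3, 1.8, -1.1],
    "unrelated": [-1.0, 0.8, -1.5, 0.9],
}
matrix = correlation_matrix(profiles)
```

A new compound whose profile correlates strongly with those of a known MoA class would appear in the same block of the correlation plot.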