Maria-Anna Trapotsi, Layla Hosseini-Gerami, Andreas Bender.
Abstract
The elucidation of a compound's Mechanism of Action (MoA) is a challenging task in the drug discovery process, but it is important in order to rationalise phenotypic findings and to anticipate potential side-effects. Bioinformatic approaches, advances in machine learning techniques and the increasing deposition of high-throughput data in public databases have contributed significantly to recent advances in the field, but it is not straightforward to decide which data and methods are most suitable in a given case. In this review, we focus on these methods and data and their applications in generating MoA hypotheses for subsequent experimental validation. We discuss compound-specific data such as -omics, cell morphology and bioactivity data, as well as commonly used supplementary prior knowledge such as network and pathway data, and provide information on databases where this data can be accessed. In terms of methodologies, we discuss both well-established methods (connectivity mapping, pathway enrichment) and emerging methods (neural networks and multi-omics integration). Finally, we review case studies where the MoA of a compound was successfully suggested from computational analysis by incorporating multiple data modalities and/or methodologies. Our aim for this review is to provide researchers with insights into the benefits and drawbacks of both the data and methods in terms of level of understanding, biases and interpretation, and to highlight future avenues of investigation which we foresee will improve the field of MoA elucidation, including greater public access to -omics data and methodologies which are capable of data integration. This journal is © The Royal Society of Chemistry.
Year: 2021 PMID: 35360890 PMCID: PMC8827085 DOI: 10.1039/d1cb00069a
Source DB: PubMed Journal: RSC Chem Biol ISSN: 2633-0679
Fig. 1 Overview of the different types of data/information used in MoA studies and the various levels that MoA can be defined on, as reviewed in this paper. This includes experimental data, such as transcriptomics data, and data resources which are used to provide biological context to experimental data, such as pathway and network data. Created with BioRender.
Experimental data types commonly used in MoA analysis, the level of biology represented, and some advantages and disadvantages of the data, which are usually generated with high-throughput unbiased techniques
| Data type | MoA biology represented | Advantages | Disadvantages | Experimental techniques |
|---|---|---|---|---|
| Bioactivity | Compound-target binding and functional effects ( | Relatively easy and cheap to measure (high-throughput screening or HTS) | Target binding | There is a broad spectrum of assays to test bioactivity of compounds to targets. |
| | | | Does not inform about specific changes in cell signalling following target binding | |
| | | | Not all target–ligand interactions are efficacy (MoA) related ( | |
| Transcriptomics | Changes in gene expression arising from modulated signalling (and transcription factor activity) | High-throughput techniques developed for large data acquisition | High level of noise in data arising from fluctuations in biological activity | Micro-array and RNA-sequencing |
| | | Provides a ‘snapshot’ of cellular changes in signalling following compound administration | Assumes gene expression is static, rather than a dynamic process | |
| | | An array of standard analysis methods has been developed | Does not necessarily translate to protein expression due to | |
| Cell image | Changes in cellular morphology ( | High-throughput imaging techniques developed for large data acquisition | May not produce a meaningful signal if the compound is not able to alter cell morphology. | High-throughput imaging assays |
| | | Feature extraction software and methods are evolving | Features are often highly correlated and biologically ambiguous | |
| | | Young field | Requires orthogonal data to be able to relate changes to modulated genes/proteins | |
| | | Few case studies for MoA analysis | Phenotypic effects may be subtle and hence the biological signal can be overwhelmed by sources of technical variation | |
| Proteomics | Changes in protein abundance arising from modulated signalling induced by a compound (transcription, translation, protein degradation) | Extends upon transcriptomics data by capturing changes in post-translational regulation | Data generation is costly and cumbersome | LC-MS/MS |
| | | | High biological variability/low reproducibility as well as significant technical variability | |
| | | | ‘Missing value problem’ | |
| Metabolomics | Changes in metabolite abundance arising from modulated signalling induced by a compound (and metabolic enzymes) | Contains downstream products of transcriptomic and proteomic processes | Data generation is costly and cumbersome, requiring multiple technical methods to capture the entire metabolome | NMR, LC-MS |
| | | Can also identify potential toxicity | High biological variability/low reproducibility as well as technical variability due to | |
| | | | Lack of comprehensive metabolite annotation and ability to relate to other biochemical components ( | |
| Phosphoproteomics | Changes in protein phosphorylation (protein signalling) induced by a compound | Captures the signalling proteins modulated, thus the specific biological pathways relevant to MoA | Phosphorylation site annotation is not trivial and functional relevance is often unclear | MS |
| | | Links ‘higher-level’ bioactivity data and ‘lower-level’ | Time-consuming assays limiting data availability | |
| | | High-throughput assays in development | High biological variability/low reproducibility, as well as technical variability arising from MS instruments | |
Fig. 2 Schematic description of the cell painting assay, demonstrated with the compound warfarin. Created with BioRender using cell images from the Image Data Resource (IDR0036).
Supplementary data commonly used in MoA analysis, the level of biology represented, and some advantages and disadvantages of the data
| Data type | MoA biology represented | Advantages | Disadvantages |
|---|---|---|---|
| Network | Global interactome of molecular entities ( | Can be used as prior knowledge with | High false positive and false negative rates for interactions ( |
| | | Standardised formats have been developed for effective data integration and sharing in line with FAIR principles | Curation bias: well-studied entities are usually ‘hub’ nodes, which bias downstream analyses |
| | | Interaction filtering is possible based on types of evidence, allowing for greater flexibility | Simultaneously noisy and incomplete |
| Pathway | Describes cascades of molecular interactions which have a defined entry point, signalling mediators, and cellular effect | Enables groups of genes/proteins to be characterised in terms of shared biological functions for ease of interpretation | Static representation of a dynamic process |
| | | | Interactions between pathways often not considered |
| | | | Curation bias: well-studied processes are more comprehensive and detailed, and overrepresented in pathway databases |
Fig. 3 The merged mTOR signalling pathway from KEGG (blue), Reactome (orange) and WikiPathways (green), visualised in PathMe Viewer. The intersection sizes represent the number of entities in common vs. the number of entities in each pathway. We observe that, for the same pathway, the information from the three sources varies. Visualisation created with PathMe Viewer.
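The kind of comparison summarised in Fig. 3 can be sketched as a pairwise overlap calculation between entity sets. This is only an illustrative sketch: the function name `overlap_summary`, the source labels and the entity identifiers `E1`…`E6` are placeholders introduced here, not actual pathway membership from KEGG, Reactome or WikiPathways, and it is not how PathMe itself computes intersections.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def overlap_summary(pathways: Dict[str, Set[str]]) -> Dict[Tuple[str, str], dict]:
    """Count shared entities and the Jaccard index for each pair of sources."""
    summary = {}
    for (name_a, set_a), (name_b, set_b) in combinations(pathways.items(), 2):
        shared = set_a & set_b
        union = set_a | set_b
        summary[(name_a, name_b)] = {
            "shared": len(shared),
            "jaccard": len(shared) / len(union) if union else 0.0,
        }
    return summary


# Placeholder entity sets (hypothetical, for illustration only).
toy_pathways = {
    "source_A": {"E1", "E2", "E3", "E4"},
    "source_B": {"E2", "E3", "E5"},
    "source_C": {"E3", "E6"},
}

pairwise = overlap_summary(toy_pathways)
core = set.intersection(*toy_pathways.values())  # entities present in every source
```

A low Jaccard index between two representations of the same pathway would indicate that downstream enrichment results may depend on which database is chosen.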
Fig. 4 Connectivity map procedure (adapted from original article). (A) The biological state of interest should be represented as a gene expression signature (query), from which the top up- and down-regulated genes are interrogated. (B) The query signature is compared against reference profiles to compute connectivity. (C) The reference profiles are ranked in terms of both magnitude and direction (positive or negative) of connectivity to the query signature.
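Steps (A) to (C) can be sketched with a simplified Kolmogorov-Smirnov-style score. This is a minimal sketch in the spirit of connectivity mapping, not the exact published implementation: the function names, the convention that each reference profile is a gene list ordered from most up-regulated to most down-regulated, and the rule of returning zero when both tag sets are enriched in the same direction are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def ks_enrichment(tags: Sequence[str], ranked_genes: Sequence[str]) -> float:
    """Signed KS-style enrichment of `tags` in a ranked list (most up-regulated first)."""
    n = len(ranked_genes)
    position = {gene: i + 1 for i, gene in enumerate(ranked_genes)}
    hits = sorted(position[g] for g in tags if g in position)
    t = len(hits)
    if t == 0:
        return 0.0
    # a: tags accumulate earlier than expected; b: later than expected.
    a = max((j + 1) / t - v / n for j, v in enumerate(hits))
    b = max(v / n - j / t for j, v in enumerate(hits))
    return a if a > b else -b


def connectivity_score(up: Sequence[str], down: Sequence[str],
                       ranked_genes: Sequence[str]) -> float:
    """Positive if the reference mimics the query, negative if it reverses it."""
    ks_up = ks_enrichment(up, ranked_genes)
    ks_down = ks_enrichment(down, ranked_genes)
    if ks_up * ks_down > 0:  # both tag sets enriched at the same end: no clear connection
        return 0.0
    return ks_up - ks_down


def rank_references(up: Sequence[str], down: Sequence[str],
                    references: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Step (C): order reference profiles from most positive to most negative connectivity."""
    scores = {name: connectivity_score(up, down, ranked)
              for name, ranked in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, with a toy ranking `genes = ["g1", ..., "g10"]`, the query `up=["g1", "g2"]`, `down=["g9", "g10"]` scores positively against `genes` and negatively against the reversed list, so the two references end up at opposite ends of the ranking.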
Fig. 5 The GO hierarchy is skewed, and contains redundant terms. Tools such as GOATOOLS can be used to correct for the skewed nature of the GO hierarchy. Here, three terms (A, B and S) have the same level of hierarchy but different descendants, which illustrates the complexity of using GO terms for enrichment analysis. Figure adapted from Klopfenstein et al.[157] with permission from the authors, copyright 2018.
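For context, the enrichment analysis referred to in this caption is commonly an over-representation test. The following is a minimal sketch of a one-sided hypergeometric test using only the standard library; it is not the GOATOOLS API, and the function and parameter names are assumptions introduced here.

```python
from math import comb


def hypergeom_enrichment_p(hits: int, query_size: int,
                           term_size: int, background_size: int) -> float:
    """P(X >= hits) when `query_size` genes are drawn without replacement from a
    background of `background_size` genes, `term_size` of which carry the term."""
    upper = min(query_size, term_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

Because `term_size` enters the test directly, two GO terms at the same depth but with very different numbers of annotated descendants can yield very different p-values for the same number of hits, which is one reason the skew illustrated in the figure matters. In practice the p-values would also need multiple-testing correction across all terms tested.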
Fig. 6 (A) Model overview. Multi-Omics Factor Analysis (MOFA) takes a number of data matrices as input from different data modalities and decomposes these matrices into a matrix of factors for each sample and weight matrices, one for each data modality. (B) Downstream analysis of the MOFA model, including variance decomposition (assessing the proportion of variance explained by each factor in each data modality), inspection of factors and imputation of missing values. Created with BioRender.
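The decomposition described in panel (A) and the variance decomposition in panel (B) can be written compactly. The notation below is assumed for illustration and does not appear in the caption: $M$ modalities, $N$ samples, $D_m$ features in modality $m$, $K$ factors, and feature-centred data matrices.

```latex
% Assumed notation: Y^m is N x D_m, Z is N x K (factors), W^m is D_m x K (weights).
\begin{align}
  \mathbf{Y}^{m} &= \mathbf{Z}\,\mathbf{W}^{m\top} + \boldsymbol{\varepsilon}^{m},
    \qquad m = 1,\dots,M, \\
  R^{2}_{m,k} &= 1 -
    \frac{\bigl\lVert \mathbf{Y}^{m} - \mathbf{z}_{k}\,\mathbf{w}^{m\top}_{k} \bigr\rVert_{F}^{2}}
         {\bigl\lVert \mathbf{Y}^{m} \bigr\rVert_{F}^{2}}, \\
  \hat{\mathbf{Y}}^{m} &= \mathbf{Z}\,\mathbf{W}^{m\top}.
\end{align}
```

Here $\mathbf{z}_k$ and $\mathbf{w}^m_k$ denote the $k$-th columns of $\mathbf{Z}$ and $\mathbf{W}^m$. The first line is the shared-factor model, the second gives the fraction of variance in modality $m$ explained by factor $k$, and the third is the reconstruction from which missing entries can be imputed.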
Applications of different methods and data modalities to gain understanding of compound MoA
| Application type | Data type(s) | Method(s) | Scientific findings | General learnings | Ref. |
|---|---|---|---|---|---|
| Integration of data | Phosphoproteomics | Causal reasoning | Generated detailed mechanistic hypotheses, | Phosphoproteomics data was used to enhance network inference using transcriptomics data, but the approach was limited by data availability | Ji |
| | Transcriptomics | | | | |
| | Network | | | | |
| | Pathway | | | | |
| | Proteomics | Pathway enrichment | Proteomics analysis showed specific compound-induced increases and decreases in the expression of proteins relevant to cytoskeletal regulation and signal transduction pathways in neurons; these were related to changes at the mRNA level to hypothesise the signalling cascades modulated by the compound | Pathway enrichment analysis on a set of proteins/genes derived from proteomics and transcriptomics data of a compound can put the genes/proteins into biological context and improve understanding of a compound's mechanism of action | Weinreb |
| | Transcriptomics | | | | |
| | Pathway | | | | |
| | Transcriptomics | Machine learning | Each type of molecular data was mapped to a network of molecular interactions | Machine learning network models on multiple -omics spaces are able to prioritise disease-relevant mechanisms of action | Patel-Murray |
| | Proteomics | | Network optimization of this large interactome highlighted the functional changes induced by the compounds | | |
| | Metabolomics | | | | |
| | Epigenomics | | | | |
| | Cell image | Machine learning | Cell image data used in bioassay prediction increased hit rates of two internal Janssen projects, and hits were chemically diverse | Cell image data can be useful and, in some cases, complementary to chemical structural information for bioactivity predictions | Simm |
| | Bioactivity | | | | |
| | Cell images derived from cell types treated with: | Deep learning | Immune signalling was modelled with images of cells and used to develop accurate disease models, which facilitated the discovery and MoA understanding of immune-modulating drugs | Cell Painting data derived from different types of treatment can be used to develop disease models and identify potential treatments, while also elucidating their MoA | Cuccarese |
| | (a) Recombinant proteins | | Methodology applied in the context of COVID-19 identified drugs currently in clinical trials for COVID-19 | | |
| | (b) CRISPR-based genetic modifications | | | | |
| | (c) Small molecules | | | | |
| Integration of methods | Transcriptomics | Connectivity mapping | Application of two methodologies enabled the researchers to generate novel NF-κB hypotheses for the MoA of pinosylvin | Two complementary methodologies were applied to generate novel hypotheses for the MoA of the anti-inflammatory compound pinosylvin; similar mechanisms suggested by two separate methods increased confidence in the hypothesis | Kibble |
| | | Group factor analysis | | | |
| Integration of data and methods | Cell image | Pathway enrichment | Functional enrichment analysis on Nomilin, Zardaverine and Hydrocotarnine identified genes involved in the regulation of cytoskeletal remodelling and growth activation; hence cellular changes in the cytoskeleton, in addition to determining cell morphology, produce changes in gene expression | Significant associations between alterations in cell morphology and gene expression were identified | Nassiri and McCall |
| | Transcriptomics | Machine learning | A set of genes was associated with an image-based feature, resulting in a better understanding of the biological responses to compound perturbations | | |
| | Pathway | | | | |
| | Transcriptomics | Machine learning | Phenothiazine was predicted to interact with the androgen receptor (AR) based on its high transcriptional similarity with enzalutamide (despite low chemical similarity), which is indicated for prostate cancer | The combination of transcriptional similarity with pathway enrichment analysis provided new (and experimentally validated) therapeutic indications for compounds across different diseases, whereas chemical similarity alone would not have led to this hypothesis | Iwata |
| | Bioactivity | Pathway enrichment | An | | |
| | Pathway | | | | |
| | Proteomics | Clustering | Pathway enrichment analysis on biopsies identified factors (proteins and phosphosites) that are up-regulated specifically in hepatocellular carcinoma upon sorafenib treatment | Proteomics and phosphoproteomics data from biopsies can contribute to precision medicine based on phenotypic data to identify new targets, biomarkers and signalling pathways that mediate drug resistance | Dazert |
| | Phosphoproteomics | Pathway enrichment | | | |