Rongting Yue, Abhishek Dutta.
Abstract
Omics-based approaches have become increasingly influential in identifying disease mechanisms and drug responses. Because diseases and drug responses are co-expressed and regulated through interactions across the relevant omics data, the traditional practice of analyzing omics data from single, isolated layers cannot always yield valuable inferences. In addition, drugs have adverse effects that may harm patients, and launching new medicines is costly. To address these difficulties, systems biology is applied to predict potential molecular interactions by integrating omics data from the genomic, proteomic, transcriptomic, and metabolic layers. Combined with known drug reactions, the resulting models improve the therapeutic performance of medicines by repurposing existing drugs and by combining drug molecules without off-target effects. Based on the identified computational models, drug administration control laws are designed to balance toxicity and efficacy. This review introduces biomedical applications and analyses of interactions among gene, protein, and drug molecules for modeling disease mechanisms and drug responses. Therapeutic performance can be improved by combining the predictive and computational models with drug administration designed by control laws. The challenges to clinical use are also discussed.
Year: 2022 PMID: 36192551 PMCID: PMC9528884 DOI: 10.1038/s41540-022-00247-4
Source DB: PubMed Journal: NPJ Syst Biol Appl ISSN: 2056-7189
Fig. 1 A systemic view of disease.
Interactions among the genomic, proteomic, and transcriptomic levels reveal the regulatory processes within organisms. Drug molecules intervene in these processes by binding to specific targets. The omics data and the chemical molecules therefore need to be analyzed together to study disease mechanisms and drug reactions as a whole.
Fig. 2 Analyses of disease and drug effect through single and multiple layers of omics data.
A systemic view enables scientists to establish disease models at a higher hierarchical level through multi-layer data integration. Through vertical analysis within a single layer, disease-related molecules are identified by abnormal gene expression values. Through horizontal analysis across different layers, the interaction information is used to track diseases or drug effects throughout the entire biological process.
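The vertical, single-layer analysis described in this caption can be illustrated with a minimal sketch that flags genes with abnormal expression. It assumes positive expression values, uses a log2 fold change together with Welch's t-statistic, and applies cut-offs that are purely illustrative; the gene names and values are made up, and a real analysis would typically use p-values with multiple-testing correction.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def flag_degs(expr_disease, expr_normal, min_abs_log2fc=1.0, min_abs_t=3.0):
    """Return genes whose expression differs strongly between the two groups.

    Both thresholds are hypothetical defaults chosen for this toy example.
    """
    degs = []
    for gene in expr_disease:
        d, n = expr_disease[gene], expr_normal[gene]
        log2fc = math.log2(mean(d) / mean(n))  # assumes strictly positive means
        t = welch_t(d, n)
        if abs(log2fc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            degs.append(gene)
    return degs

# Toy expression values (invented), three samples per group.
disease = {"GENE_A": [40.0, 44.0, 42.0], "GENE_B": [10.0, 11.0, 9.0]}
normal = {"GENE_A": [10.0, 11.0, 9.0], "GENE_B": [10.5, 9.5, 10.0]}

print(flag_degs(disease, normal))
```

In this toy data only `GENE_A` changes between the groups, so it is the only gene flagged.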
Methods for constructing gene co-expression networks.
| Algorithms and applications | Advantages | Potential limitations |
|---|---|---|
| Quasi-Clique Merger algorithm for finding co-expressed gene clusters[ | Integrates multiple microarray datasets, even including the data without normal samples[ | Requires large datasets to ensure a high level of significance for correlations of gene expressions. |
| Context Likelihood of Relatedness algorithm for inferring edges in the network to identify cross-species gene interactions[ | Captures nonlinear changes in gene expressions[ | Cannot determine the direction of the correlation between gene pairs without the Pearson correlation coefficient. |
| GENIE3 for inferring gene co-expression network[ | Rapidly detects gene networks from large multifactorial gene expression data. | Requires prior knowledge of the transcription factors. |
| WGCNA for detecting functional gene clusters[ | The approximately scale-free network structure preserves connectivity when nodes are removed at random. | Sensitive to the number of genes and the choice of parameters (e.g., the soft threshold). |
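To make the soft threshold mentioned for WGCNA concrete, the following minimal sketch builds an unsigned co-expression adjacency by raising the absolute Pearson correlation to a power `beta`. The gene names, expression values, and the default `beta=6` are assumptions for illustration only; this is not the WGCNA package itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for every gene pair."""
    genes = sorted(expr)
    adj = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adj[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adj

# Toy expression profiles (invented), four samples per gene.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with G1
    "G3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with G1
    "G4": [1.0, 3.0, 2.0, 4.0],  # partially correlated with G1
}

adj = soft_threshold_adjacency(expr)
for pair, weight in sorted(adj.items()):
    print(pair, round(weight, 3))
```

Raising the correlation to a power suppresses weak correlations while keeping strong ones, which is why the result is sensitive to the chosen exponent. Taking the absolute value also discards the sign of the correlation, so `G1`-`G3` receives the same weight as `G1`-`G2`.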
Fig. 3 Interactions among drugs and target proteins offer chances for drug combination, co-administration, and repurposing.
Drug molecules may bind to off-target proteins and induce side effects (red arrows), which should be avoided. The therapeutic interactions, however, should be preserved, because they act as desired to treat disease (green arrows). Interactions between drugs provide an opportunity for enhanced therapeutic performance through drug co-administration and combination (cyan arrow). Drug similarity conveyed by drug-target interactions provides a chance for drug repurposing (orange arrows). Drug targets may be expanded to similar proteins through protein-protein interactions for drug repurposing (purple arrows). The drug molecules shown as examples are epinephrine (“Drug 1”) and benzene (“Drug 2”), drawn with icons from BioRender.
Databases for gene and protein interaction analyses.
| Application | Database | Description | Reference |
|---|---|---|---|
| DTI, PPI | Comparative Toxicogenomics Database | Information of chemicals, pathways, disease, organisms, genes, drug-gene interactions. Data are mainly collected from references. | [ |
| Gene regulation and interaction | GEO (Gene Expression Omnibus) | An NCBI database. Gene expression data (e.g., RNA, genome methylation, and proteins) from submitted microarray and other studies. | [ |
| Genomics of cancer | Cancer Genome Atlas (TCGA) | Cancer molecular data including genome, epigenome, transcriptome and proteome. | [ |
| Biological pathway | GeneGo MetaBase | Bioinformatics including signaling and metabolic pathways, interactions among drugs and proteins as well as kinetic information of drugs. | [ |
| PPI, DTI, signaling pathway | KEGG (Kyoto Encyclopedia of Genes and Genomes) | Information of pathways, genome, chemicals and diseases based on diagrams of interaction and reaction. It is complementary to the majority of the existing molecular biology databases that contain information on individual molecules or individual genes. | [ |
| PPI | STRING (Search Tool for the Retrieval of Interacting Genes) | Functional links in PPI based on experimental data. Interactions are predicted by comparative genomics and text mining based on the scoring system. | [ |
| Gene regulation and interaction | CCLE (Cancer Cell Line Encyclopedia) | Gene expression data for human cancer analysis, including information on mutation, gene methylation, and the associations between cell lines and genomics. | [ |
| DTI, DDI | PubChem (NCBI) | Characteristics of chemical molecules and activities from experimental results or literature. For drug analysis, it provides information on the chemical structure for each drug and the validated chemical depiction information. | [ |
| Gene regulation and interaction | GO (Gene Ontology) | Biological annotations including structure, function and dynamics in pathways, molecules and organism level for a variety of species. | [ |
| DTI | STITCH (Search Tool for Interactions of Chemicals) | Profiles of chemical-protein interactions. The data sources include experimental results and text mining. The database covers more than 9 million proteins from almost 2,000 organisms. | [ |
| DTI | ChEMBL | Biological activities and characteristics of molecules such as chemicals and proteins that contribute to the study of drug target and drug discovery. | [ |
| Gene regulation and transcription | UniGene (NCBI) | Gene sequences from animals and plants. The well-characterized sequences are derived from algorithm-based classification, which helps to identify uniqueness among genes. The source of intact gene sequences is GenBank. | [ |
| DTI, DDI | DrugBank | Drug-target and drug-drug interaction information such as chemical sequence, three-dimensional structure, and pharmacological pathway involvement. | [ |
| Genomics of breast cancer | METABRIC (Molecular Taxonomy of Breast Cancer International Consortium) | Clinical and expression data for breast tumors. The collected breast cancer specimens are grouped for discovery and validation. It helps to assess the survival prediction of cancer patients. | [ |
| PPI | HPRD (Human Protein Reference Database) | Human protein-protein interactions manually curated from the literature and data. | [ |
| Signaling pathway | Reactome | Bioinformatics information including pathway, proteins and drugs for model visualization and analysis. | [ |
| DDI | Online Mendelian Inheritance in Man (OMIM) | Disease data including disease loci, known disease genes and the known disorder-gene associations such as the molecular relationship between genetic variation and phenotypic expression. | [ |
| PPI | Human Protein Reference Database (HPRD) | Protein information based on interactions described in published reports. The interaction set is expected to be biased toward known disease genes. | [ |
Fig. 4 Statistical analyses of gene expression values from RNA-sequencing data identify DEGs.
The signaling pathways that contain the highest ratio of DEGs are regarded as disease-related. All or part of the proteins in these pathways are selected to form the target protein set. The PPI network and the DTI network are constructed by parsing PPI and DTI data from databases. The drug molecules in the DTI network are then used to parse and construct the DDI network. Finally, the DDI and PPI networks are connected through the DTI network to form the heterogeneous network. The task of predicting potential drugs for a given disease thus becomes the task of predicting interactions between the proteins and the drug molecules. Abbreviations: DEGs, differentially expressed genes; PPI, protein-protein interaction; DDI, drug-drug interaction; DTI, drug-target interaction.
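The construction described in this caption can be sketched as merging three edge lists into one typed graph. The drug and protein identifiers below are invented, and `candidate_score` is a naive path-counting heuristic introduced only for illustration; it is not a prediction method taken from the review.

```python
from collections import defaultdict

def build_heterogeneous_network(ppi, dti, ddi):
    """Merge PPI, DTI and DDI edge lists into one undirected, typed graph."""
    graph = defaultdict(set)
    for layer, edges in (("PPI", ppi), ("DTI", dti), ("DDI", ddi)):
        for u, v in edges:
            graph[u].add((v, layer))
            graph[v].add((u, layer))
    return graph

def neighbors(graph, node, layer):
    """Neighbors of `node` reached through edges of the given layer."""
    return {v for v, edge_layer in graph[node] if edge_layer == layer}

def candidate_score(graph, drug, protein):
    """Count two-step paths from drug to protein (hypothetical heuristic).

    Path type 1: drug -DDI-> similar drug -DTI-> protein
    Path type 2: drug -DTI-> known target -PPI-> protein
    """
    via_drug = sum(protein in neighbors(graph, d, "DTI")
                   for d in neighbors(graph, drug, "DDI"))
    via_target = sum(protein in neighbors(graph, p, "PPI")
                     for p in neighbors(graph, drug, "DTI"))
    return via_drug + via_target

# Toy edge lists (invented identifiers).
ppi = [("P1", "P2"), ("P2", "P3")]
dti = [("D1", "P1"), ("D2", "P2")]
ddi = [("D1", "D2")]

net = build_heterogeneous_network(ppi, dti, ddi)
print(candidate_score(net, "D1", "P2"))
print(candidate_score(net, "D1", "P3"))
```

Here `D1` reaches `P2` both through the interacting drug `D2` and through its known target `P1`, whereas no two-step path links `D1` to `P3`.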
Fig. 5 A convolutional neural network.
This neural network comprises an input layer, convolutional layers (which extract features from hyperplanes of the input data by projection/convolution), pooling layers (which reduce the spatial size and mitigate sensitivity to feature location), a flatten layer (which flattens the features and feeds them into the artificial neural network), a fully connected layer (which learns a nonlinear function of the extracted features), and an output layer. The inputs are raw features such as molecular sequence patterns, gene regulation annotations and patterns, molecular interaction network motifs, molecular structures and structural associations, drug chemical structures, and drug side-effect reports. For new samples, the output can be a predicted class (e.g., molecular binding profiles) or a regression value (e.g., quantified molecular binding affinities).
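The layer sequence in this caption can be traced with a minimal forward pass over a one-dimensional input. The kernels, weights, and input signal are arbitrary illustrative values rather than trained parameters, and the "convolution" is the unflipped sliding dot product commonly used in deep learning.

```python
import math

def conv1d(signal, kernel):
    """Valid 1-D convolution as used in deep learning (no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    return [max(0.0, x) for x in xs]

def max_pool(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def extract_features(signal, kernels, pool=2):
    """Convolution -> ReLU -> pooling per kernel, then flatten into one list."""
    features = []
    for kernel in kernels:
        features.extend(max_pool(relu(conv1d(signal, kernel)), pool))
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(signal, kernels, weights, bias):
    """Fully connected output unit on the flattened features."""
    features = extract_features(signal, kernels)
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)

# Toy binary input, e.g. a presence/absence pattern along a sequence (invented).
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
kernels = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]
weights = [0.5, -0.5, 0.5, 0.1, 0.1, 0.1]
bias = -1.0

print(extract_features(signal, kernels))
print(forward(signal, kernels, weights, bias))
```

The final sigmoid gives a value between 0 and 1 that could be read as a binding probability in a classification setting; replacing it with the raw weighted sum would correspond to the regression case.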
Learning-based methods for predicting molecular interactions.
| Type | Advantages | Disadvantages | Applications |
|---|---|---|---|
| Supervised learning | Uses the full label information of omics data. | Relies heavily on the amount of labeled data. Preprocessing for noise and features may be needed, which can cause information loss. | Logistic regression for genome-wide prediction of relevant functions, diseases, and traits[ |
| Unsupervised learning | Needs no data labels. Suitable when labeled data are scarce and expensive to obtain. | Loses the informative features provided by labels. | Autoencoder for denoising a single-cell RNA-sequencing model[ |
| Semi-supervised learning | Combines the feature extraction benefits of unsupervised learning with full use of the informative labeled data. | Algorithms work only under proper assumptions. The trained model loses generalization on test data if the assumptions do not hold. | Autoencoder-based semi-supervised learning for predicting DTI[ |
Graph learning methods.
| Methods | Advantages | Limitations |
|---|---|---|
| Graph Convolutional Network[ | Aggregates graph information and makes use of the structural information of graphs. | Less computationally efficient for large graphs. Lacks generalization to graphs with different structures[ |
| Graph Attention Network[ | Computationally efficient owing to node-level parallel processing. Does not need knowledge of the entire graph structure. | Cannot distinguish well between local and global structures[ |
| (Variational) Graph Autoencoder[ | Reduces data dimensionality and speeds up the training process. | Captures as much information from the dataset as possible rather than only the information relevant to the problem. The reconstruction process also loses information. |
| Graph Generative Adversarial Nets[ | Can augment datasets and impute missing values. | Unstable gradient updates and vanishing gradients in the generator[ |
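As an illustration of how a graph convolutional network aggregates structural information, the following minimal sketch implements one propagation step in the commonly used form ReLU(D^(-1/2) (A + I) D^(-1/2) H W). The three-node graph, the feature matrix, and the weight matrix are toy assumptions, not data from the review.

```python
import math

def normalized_adjacency(adj):
    """Symmetric normalization D^(-1/2) (A + I) D^(-1/2) with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One graph convolution: aggregate neighbors, transform, apply ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Toy path graph 0 - 1 - 2 (e.g. three interacting molecules, invented).
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
features = [[1.0], [0.0], [0.0]]  # only node 0 carries a signal
weights = [[1.0]]                 # identity transform for clarity

out = gcn_layer(adj, features, weights)
print(out)
```

After one layer the signal on node 0 has spread to its direct neighbor, node 1, but not to node 2, which shows that each layer aggregates only one-hop information. The dense adjacency matrix also hints at why the cost grows for large graphs.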
Fig. 6 Schematic diagrams in dynamic modeling.
a Gene regulatory network adapted from ref. [25]. Genes 1, 2, and 3 are coding genes. Gene 1 regulates its own expression and that of Gene 2. The protein produced by Gene 1 regulates Gene 3 expression through a signaling factor/protein (which is produced from the protein expressed by Gene 2). Drugs can intervene in the regulation by binding to proteins that change gene expression. b Diagram of signal transduction. Signals are received and enter the nucleus to change gene expression. Proteins are synthesized to regulate the phenotypic behavior of cells or tissue. Errors in signaling pathways (e.g., dysregulation) may halt apoptosis, which results in unlimited growth and division.
Fig. 7 Dynamic modeling and analyses.
a Scheme of parameter estimation. After data acquisition, model parameters are fitted by minimizing the difference between experimental data and model output. Sensitivity analysis, uncertainty quantification, and identifiability analysis help assess the performance and robustness of the fit. b Loop of model predictive control. Based on the model prediction, this control strategy updates the control input (e.g., drug administration) during each time interval so that the system dynamics track a reference trajectory (e.g., the desired decrease in tumor cells). In essence, it solves a constrained optimization problem, where the constraints can be maximal drug doses and minimal normal cell populations.
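The model predictive control loop in panel b can be sketched as a receding-horizon search. The one-step tumor model, its growth and kill rates, the dose grid, and the cost weights below are hypothetical and chosen only to keep the example small. The dose constraint is enforced by restricting the search to an allowed grid, a constraint on normal cell populations is not modeled, and a practical controller would use a proper constrained optimizer instead of brute-force enumeration.

```python
from itertools import product

def step(tumor, dose, growth=0.10, kill=0.30):
    """Hypothetical one-step model: growth minus dose-proportional kill."""
    return tumor * (1.0 + growth - kill * dose)

def mpc_dose(tumor, reference, doses=(0.0, 0.25, 0.5, 0.75, 1.0),
             dose_weight=0.001):
    """Return the first dose of the plan with the lowest predicted cost.

    The horizon equals len(reference); doses outside `doses` are never
    considered, which enforces the maximal-dose constraint.
    """
    best_cost, best_first = float("inf"), 0.0
    for plan in product(doses, repeat=len(reference)):
        x, cost = tumor, 0.0
        for ref, u in zip(reference, plan):
            x = step(x, u)
            cost += (x - ref) ** 2 + dose_weight * u ** 2
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    return best_first

def simulate_mpc(tumor0=1.0, n_steps=10, horizon=3, decay=0.85):
    """Closed loop: re-plan at every interval, apply only the first dose."""
    tumor, history = tumor0, []
    for t in range(n_steps):
        # Reference trajectory: desired geometric decrease of the tumor size.
        reference = [tumor0 * decay ** (t + k + 1) for k in range(horizon)]
        dose = mpc_dose(tumor, reference)
        tumor = step(tumor, dose)
        history.append((dose, tumor))
    return history

history = simulate_mpc()
for dose, tumor in history:
    print(f"dose={dose:.2f} tumor={tumor:.3f}")
```

Only the first dose of each optimized plan is applied before the problem is solved again with the updated state, which is what distinguishes this loop from computing a single open-loop schedule.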
Control algorithms for drug administration.
| Control Objective | Algorithm | Advantages | Disadvantages | Applications |
|---|---|---|---|---|
| Maintain drug concentration at certain levels | PID control | Easy to implement; flexible structure for functional expansion. | Design of PID controllers usually needs linear models. The linear approximation of the nonlinear model leads to information loss that eventually decreases control performance. | For anesthesia delivery[ |
| Balance between drug toxicity, drug cost and therapeutic performance | Optimal control | Balance drug cost, toxicity and therapeutic performance. | More computations are required. Weighting variables | For optimal imatinib treatment for leukemia model[ |
| Handle constraints in the model; Balance between drug toxicity, drug cost and therapeutic performance | Model predictive control | Consider physical constraints based on drug toxicity and patient conditions. | Solving the constrained optimization problem needs additional work to ensure stability, optimality and feasibility. | For stabilizing HIV infection[ |
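For the first row of the table, a minimal sketch of PID control holding a drug concentration at a target level is given below. It assumes a linear one-compartment model dC/dt = -k_elim * C + u, integrated with forward Euler; the elimination rate, the controller gains, and the infusion limits are illustrative values, not clinically derived parameters.

```python
class PID:
    """Discrete PID controller with output clamping (illustrative gains)."""

    def __init__(self, kp, ki, kd, dt, u_min=0.0, u_max=5.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured):
        error = target - measured
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        u = self.kp * error + self.ki * self.integral + self.kd * derivative
        # An infusion rate cannot be negative or exceed the pump limit.
        return min(self.u_max, max(self.u_min, u))

def simulate_pid(target=1.0, k_elim=0.5, dt=0.1, n_steps=200):
    """Hypothetical one-compartment model driven by the PID infusion rate."""
    pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=dt)
    conc, trace = 0.0, []
    for _ in range(n_steps):
        rate = pid.update(target, conc)
        conc += dt * (-k_elim * conc + rate)  # forward Euler step
        trace.append(conc)
    return trace

trace = simulate_pid()
print(round(trace[-1], 3))
```

The controller works well here because the assumed model is linear. As the table notes, a nonlinear drug response would first have to be linearized, which is where control performance can degrade.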