Dohoon Lee¹, Sun Kim²,³,⁴,⁵
Abstract
Cells survive and proliferate through complex interactions among diverse molecules across multiomics layers. Conventional experimental approaches for identifying these interactions have built a firm foundation for molecular biology, but they scale poorly relative to the rapid accumulation of multiomics data generated by high-throughput technologies. Consequently, the need for data-driven computational modeling of interactions within cells has been highlighted in recent years. The complexity of multiomics interactions stems primarily from their nonlinearity: accurately modeling them requires capturing intricate conditional dependencies, synergies, or antagonisms among the genes or proteins considered, which hinders experimental validation. Artificial intelligence (AI) technologies, including deep learning models, are well suited to handling complex nonlinear relationships between features, and they scale to large amounts of data. They therefore have great potential for modeling multiomics interactions. Although many AI-driven models exist for computational biology applications, relatively few explicitly incorporate prior knowledge into their architectures or training procedures. Guiding models with domain knowledge will greatly reduce the amount of data needed for training and constrain their vast expressive power to the biologically relevant space. It can therefore enhance a model's interpretability, reduce spurious interactions, and help establish its validity and utility. To facilitate further development of knowledge-guided AI technologies for modeling multiomics interactions, we review representative bioinformatics applications of deep learning models for multiomics interactions developed to date, categorized by guidance mode.
Keywords: Artificial intelligence; Computational biology; Deep learning; Molecular biology
Year: 2021 PMID: 34844399 PMCID: PMC9082244 DOI: 10.3345/cep.2021.01438
Source DB: PubMed Journal: Clin Exp Pediatr ISSN: 2713-4148
Fig. 1. Interactions between omics layers that are modeled by weakly guided deep learning models. The schematic diagram shows six types of interactions formulated as tasks for deep learning models: (1) DNA/RNA binding specificity prediction, (2) mRNA splicing prediction, (3) gene expression prediction from genomic sequences, (4) prediction of DNA methylation states and levels from genomic sequences, (5) capturing the relationship between the genome and the epigenome, and (6) simultaneous integration of multiple omics features. Black lines denote DNA, purple lines denote mRNA, and green lines denote miRNA. Black and white circles denote the methylation states of CpG sites, and the other colored circles represent proteins.
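Task (1) in Fig. 1 is commonly framed as sequence classification with a convolutional network. The following NumPy sketch is an illustration under stated assumptions, not code from any reviewed model: a DNA sequence is one-hot encoded, a motif filter (hand-set here, learned in practice) is scanned along it, and the rectified scores are max-pooled and passed to a logistic output. The names `one_hot`, `conv_scan`, and `binding_score`, and the GATA example motif, are hypothetical.

```python
import numpy as np

ALPHABET = "ACGT"  # column order of the one-hot encoding (assumed convention)


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA sequence as an (L, 4) one-hot matrix; N or other symbols stay all-zero."""
    seq = seq.upper()
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:
            mat[i, j] = 1.0
    return mat


def conv_scan(x: np.ndarray, filt: np.ndarray) -> np.ndarray:
    """Slide a (k, 4) filter along an (L, 4) sequence without padding.

    Like deep learning 'convolution' layers, this is a cross-correlation and
    returns L - k + 1 scores, one per window.
    """
    k = filt.shape[0]
    return np.array([np.sum(x[i:i + k] * filt) for i in range(x.shape[0] - k + 1)])


def binding_score(seq: str, filters: np.ndarray, w: np.ndarray, b: float) -> float:
    """Convolution -> ReLU -> global max pooling -> logistic output.

    filters: (n_filters, k, 4) motif detectors; w: (n_filters,) output weights.
    Hypothetical hand-set parameters stand in for trained ones.
    """
    x = one_hot(seq)
    pooled = np.array([np.maximum(conv_scan(x, f), 0.0).max() for f in filters])
    return float(1.0 / (1.0 + np.exp(-(pooled @ w + b))))
```

For example, with the single filter `one_hot("GATA")`, the sequence `TTGATATT` reaches the maximum scan score of 4 at its third position, whereas `CCCCCCCC` scores 0 in every window, so a positive output weight ranks the first sequence as the more likely binding site.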
Fig. 2. Strong biological guidance of deep learning models. (A) Graph neural networks (GNNs) are suitable for modeling interaction networks. Gene expression values for each sample are assigned to the corresponding nodes in the network to instantiate the network as an input for GNNs. Information from each gene is propagated to its neighbors by graph convolution. After a few iterations of graph convolution and pooling, information from all nodes is aggregated through a readout function, and the aggregated information is used to predict output values. (B) Knowledge-primed neural network. Nodes in a knowledge-primed neural network directly correspond to genes or proteins, and edges represent interactions and transcriptional regulation between them. After the network is trained to predict the observed biological outcome upon certain stimuli, the model is clearly interpretable through its edge weights; thus, the core regulators of the process can be identified. (C) DCell and DrugCell incorporate hierarchical representations of biological knowledge into their network structure, called a visible neural network (VNN). Input nodes denote the mutational states of genes, whereas nodes in hidden layers correspond to biological concepts; nodes closer to the output layer represent broader concepts. The VNN output, an embedding of the genotype, is subsequently used for phenotype prediction in DCell and drug response prediction in DrugCell.
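The two guidance styles in Fig. 2 can be contrasted in a minimal NumPy sketch. For panel (A) it assumes the symmetric normalization used by GCN-style graph convolution, skips the graph pooling step, and uses a mean readout. For panels (B) and (C) it represents prior knowledge as a binary mask that zeroes every weight lacking a known interaction or gene-to-concept annotation. The function names, the toy four-gene hierarchy, and the one-unit-per-term simplification are hypothetical; DCell, DrugCell, and knowledge-primed networks each differ in their actual layer design.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """GCN-style symmetric normalization D^-1/2 (A + I) D^-1/2, with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_predict(adj, expr, weights, w_out) -> float:
    """Panel (A): one sample's expression values instantiate the network,
    graph convolutions pass each gene's signal to its neighbors, and a mean
    readout pools all nodes into a single prediction (pooling omitted)."""
    a_norm = normalized_adjacency(adj)
    h = expr[:, None]                        # (n_genes, 1): one value per gene node
    for w in weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # aggregate neighbors, transform, ReLU
    graph_embedding = h.mean(axis=0)         # readout over the whole graph
    return float(graph_embedding @ w_out)


def masked_layer(x, weight, mask):
    """Panels (B) and (C): connections absent from prior knowledge (mask == 0) carry no signal."""
    return np.tanh(x @ (weight * mask))


def vnn_forward(genotype, layers):
    """Stack masked layers, e.g. genes -> specific terms -> broader terms (one unit per term here)."""
    h = genotype
    for weight, mask in layers:
        h = masked_layer(h, weight, mask)
    return h


# Hypothetical toy hierarchy: 4 genes, 2 specific terms, 1 root term.
gene_to_term = np.array([[1, 0],
                         [1, 0],
                         [0, 1],
                         [0, 1]], dtype=float)
term_to_root = np.ones((2, 1))
```

Because a masked weight always enters the computation multiplied by zero, its gradient is zero as well, so training only adjusts connections supported by prior knowledge; this is what lets the remaining edge weights be read as evidence about specific genes or concepts, as in panel (B).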