Yujie You1, Xin Lai2, Yi Pan3, Huiru Zheng4, Julio Vera2, Suran Liu1, Senyi Deng5, Le Zhang6,7,8.
Abstract
Artificial intelligence is an advanced method to identify novel anticancer targets and discover novel drugs from biological networks, because such networks effectively preserve and quantify the interactions between the components of cell systems underlying human diseases such as cancer. Here, we review and discuss how to employ artificial intelligence approaches to identify novel anticancer targets and discover drugs. First, we describe the scope of artificial intelligence biology analysis for novel anticancer target investigations. Second, we review and discuss the basic principles and theory of commonly used network-based and machine learning-based artificial intelligence algorithms. Finally, we showcase the applications of artificial intelligence approaches in cancer target identification and drug discovery. Taken together, artificial intelligence models provide a quantitative framework to study the relationship between network characteristics and cancer, thereby leading to the identification of potential anticancer targets and the discovery of novel drug candidates.
Year: 2022 PMID: 35538061 PMCID: PMC9090746 DOI: 10.1038/s41392-022-00994-0
Source DB: PubMed Journal: Signal Transduct Target Ther ISSN: 2059-3635
Fig. 1 The historical milestones of network-based and ML-based biology analysis. (Created with BioRender.com)
Fig. 2 Artificial intelligence integrates multiomics data (e.g., epigenetics, genomics, proteomics, and metabolomics) for the identification of cancer therapeutic targets. (Created with BioRender.com)
Commonly used repositories related to human diseases, drug targets, genomics, and biological networks
| Category | Database name | Description |
|---|---|---|
| Disease | Online Mendelian Inheritance in Man (OMIM) | A comprehensive, authoritative, and timely knowledgebase of human genes and genetic disorders. |
| Disease | Pathologisch Anatomisch Landelijk Geautomatiseerd Archief (PALGA) | A database of histopathology and cytopathology records. |
| Drug target | DrugBank | A web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions, and their targets. |
| Drug target | Therapeutic Targets Database (TTD) | A database of known and explored therapeutic protein and nucleic acid targets, the diseases they are targeted for, and related information. |
| Drug target | PubChem | An open repository for chemical structures and their biological test results. |
| Drug target | ChEMBL | An open database containing binding, functional and ADMET information for many drug-like bioactive compounds. |
| Genomics data | Gene Expression Omnibus (GEO) | A public functional genomics data repository that accepts array- and sequence-based data. |
| Genomics data | The Cancer Genome Atlas (TCGA) | Clinical data for various human cancers, together with genomic mutations, mRNA expression, miRNA expression, methylation, etc. |
| Genomics data | Cancer Cell Line Encyclopedia (CCLE) | A compilation of gene expression, chromosomal copy number and massively parallel sequencing data from 947 human cancer cell lines. |
| Genomics data | ENCyclopedia Of DNA Elements (ENCODE) | A systematic map of regions of transcription, transcription factor association, chromatin structure, and histone modification. |
| Genomics data | Catalogue Of Somatic Mutations In Cancer (COSMIC) | Curated, comprehensive information on somatic mutations in human cancer. |
| Biological network | Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) | A database of known and predicted protein interactions. |
| Biological network | Gene Ontology (GO) | The world's largest source of information on the functions of genes. |
| Biological network | Kyoto Encyclopedia of Genes and Genomes (KEGG) | A collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. |
Fig. 3 The flow chart of the shortest path algorithm. The red paths in the bottom network are the identified shortest paths from node S to node T.
Fig. 4 The flow chart of the module detection algorithm.
The formula to compute degree centrality, coreness centrality, betweenness centrality and eigenvector centrality
| Node centrality | Formula | Description | Eq. |
|---|---|---|---|
| Degree centrality | $C_D(i) = d_i$ | $d_i$ is the degree of vertex $i$. | (3) |
| Coreness centrality | $C_C(i) = \sum_{j \in N(i)} k_s(j)$ | $N(i)$ is the set of neighbours of vertex $i$; $k_s(j)$ is the k-shell index of vertex $j$. | (4) |
| Betweenness centrality | $C_B(i) = \sum_{j \neq k \neq i} \frac{g_{j,k}(i)}{g_{j,k}}$ | $g_{j,k}$ is the number of all shortest paths between $j$ and $k$; $g_{j,k}(i)$ is the number of shortest paths between $j$ and $k$ containing $i$. | (5) |
| Eigenvector centrality | $C_E(i) = \frac{1}{\lambda} \sum_{j} a_{i,j} x_j$ | $a_{i,j} = 1$ if vertex $i$ is linked to vertex $j$ and $a_{i,j} = 0$ otherwise; $x_j$ is the degree of vertex $j$; $\lambda$ is a constant. | (6) |
Fig. 5 Four types of node centralities of biological networks. (a) Degree centrality; (b) Coreness centrality; (c) Betweenness centrality; (d) Eigenvector centrality.
Fig. 6 An illustration of a simple decision tree model.
Thirteen network topological features for decision tree classification[147]. The score is a combination of the classification accuracy and the Gini index[148]
| Topological measures | Concept | Score |
|---|---|---|
| Structural holes | Rank nodes by their connectivity and lack of redundancy | 13.37 |
| Node degree | The number of connections of a node | 13.36 |
| Node coreness | Considers both the degree of nodes and their positions in a network | 12.05 |
| k-Step Markov | The probability that a random walk of length k makes the system reach a certain vertex | 10.47 |
| Subgraph | The number of times a given vertex participates in different connected subgraphs of a network | 10.36 |
| Within-module z-score | Measure how nodes are related | 8.88 |
| Katz status index | Rank a vertex as highly important if many nodes are connected to it | 8.64 |
| Closeness | The average length of the shortest path between nodes | 8.18 |
| Proximity prestige | The average shortest path length of a node | 8.12 |
| Eigenvector centrality | The influence of directly adjacent nodes on the central node | 8.09 |
| Betweenness | A node acts as a bridge along the shortest path between two other nodes | 7.93 |
| Bary centre score | Rank the nodes by the total shortest path of the vertex | 5.70 |
| Clustering coefficient | Measure the degree of cohesiveness | 0.15 |
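
The caption above says each score combines classification accuracy with the Gini index, but the exact combination is not given here. The sketch below only illustrates the Gini part: how the Gini impurity of a threshold split on one topological feature could be computed. The feature values, labels and function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting samples at `threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical toy data: node degree per protein, and whether the protein
# is labelled as a target (1) or not (0).
degrees = [1, 2, 3, 8, 9, 12]
is_target = [0, 0, 0, 1, 1, 1]

print(gini(is_target))                     # impurity before any split
print(split_gini(degrees, is_target, 5))   # impurity after splitting at degree 5
```

A feature whose best split drives the weighted impurity close to zero separates the two classes well, which is one plausible reason a measure such as node degree would rank highly.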
Fig. 7 An example of an ADTree model. The root nodes indicate the ratio between positive and negative class examples. The numbers in parentheses within each decision node (rectangles) indicate the order in which the rule was found. The amount of node conservation between each of the trees is indicated by the colour of the box. Ovals (prediction nodes) contain the value for the weighted vote. The numbers next to the arrows correspond to the threshold for the prediction.
Commonly used neural networks in ML-based biology analysis
| Category | Model | Characteristic |
|---|---|---|
| Non-graph neural network | DNN | A deep neural network (DNN), also called a multi-layer perceptron, is a neural network with multiple hidden layers. |
| Non-graph neural network | CNN | A convolutional neural network (CNN) extracts local information from the input data by convolution. |
| Graph-based neural network | GCN | A graph convolutional network (GCN) applies convolution on networks to obtain local information between nodes and their neighbour nodes. |
| Graph-based neural network | GAE | A graph autoencoder (GAE) uses an autoencoder to extract the embedded features of the network. |
| Graph-based neural network | GAN | A graph attention network (GAN) uses an attention mechanism instead of convolution to obtain local or global information between nodes. |
| Graph-based neural network | DeepWalk | DeepWalk is a network embedding model that represents graph nodes as low-dimensional, dense feature vectors. |
Fig. 8 Graph-based neural networks for ML-based biology analysis. Graph-based neural networks take the topology of biological network data (such as gene-gene, protein-protein and drug-target networks) as input, and perform link prediction, classification and clustering by analyzing the biological information encoded in the network topology. (Created with BioRender.com)
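
To make the idea of convolution over a network topology concrete, here is a minimal sketch of one graph convolution step in plain Python. It assumes the widely used symmetric normalization, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), which the caption itself does not specify. The adjacency matrix, feature matrix and weights are hypothetical toy values.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One graph convolution step: normalize, aggregate neighbours, transform."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weights)
    return [[max(0.0, x) for x in row] for row in out]  # ReLU


# Hypothetical toy network: two interacting proteins with one-hot features.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, -1.0], [1.0, -1.0]]
hidden = gcn_layer(adj, features, weights)
```

Stacking several such layers lets information travel further across the network. Link prediction, classification or clustering heads would then operate on the resulting node embeddings.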
Fig. 9 The workflow to identify novel anticancer targets by network-based biology analysis. (Created with BioRender.com)
Fig. 10 The workflow to evaluate the druggability of potential target proteins. (Created with BioRender.com)
Fig. 11 The graph-based neural network for DTI prediction, combining bottom-up and top-down biology analysis approaches. (Created with BioRender.com)
The brief description of the ADMET properties[256]
| Property | Description |
|---|---|
| Absorption | The ability of an orally administered drug to cross the membranes of many cells and reach its site of action. |
| Distribution | After absorption or systemic administration into the bloodstream, a drug is distributed to its site of action through the circulatory system. |
| Metabolism | The process of chemically converting a drug to a metabolite, also called biotransformation. |
| Excretion | The collective term for irreversibly removing a drug from the body. |
| Toxicity | The extent to which a drug damages an entire organism, an organism's substructure, or an organ. |
Fig. 12 The graph-based neural network captures features related to drug properties from the drug molecular structure to predict the ADMET properties of drugs. (Created with BioRender.com)
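
| Step | Shortest path algorithm from source node $S$ |
|---|---|
| 1 | create an empty set $P$ and a set $Q$ containing all nodes |
| 2 | for each vertex $V$ in $Q$ do |
| 3 | $d(S,V) \leftarrow \infty$ |
| 4 | end for |
| 5 | $d(S,S) \leftarrow 0$ |
| 6 | while $Q$ is not empty do |
| 7 | $U \leftarrow$ vertex in $Q$ with minimal $d(S,U)$ |
| 8 | remove $U$ from $Q$ |
| 9 | for each neighbour $V$ of $U$ do |
| 10 | $alt \leftarrow d(S,U) + d_{U,V}$ |
| 11 | if $alt < d(S,V)$ then |
| 12 | $d(S,V) \leftarrow alt$ |
| 13 | end if |
| 14 | end for |
| 15 | add $U$ to the set $P$ |
| 16 | end while |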
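
The shortest path procedure (Fig. 3) can be sketched as a small runnable Python function. This is a minimal sketch of Dijkstra's algorithm, assuming non-negative edge weights and a dict-of-dicts graph. It swaps in a priority queue for the set Q, and the toy network, node names and function name are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Return (distance, path) from source to target using Dijkstra's algorithm."""
    dist = {node: float("inf") for node in graph}   # d(S, V) <- infinity
    dist[source] = 0.0                              # d(S, S) <- 0
    prev = {}                                       # predecessor on the best path
    finalized = set()                               # plays the role of the set P
    heap = [(0.0, source)]                          # priority queue standing in for Q
    while heap:
        d_u, u = heapq.heappop(heap)                # vertex with minimal d(S, U)
        if u in finalized:
            continue
        finalized.add(u)
        if u == target:
            break
        for v, weight in graph[u].items():
            alt = d_u + weight                      # alt <- d(S, U) + d_{U,V}
            if alt < dist[v]:
                dist[v] = alt
                prev[v] = u
                heapq.heappush(heap, (alt, v))
    if dist[target] == float("inf"):
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical weighted toy network with source S and target T.
toy = {
    "S": {"A": 1.0, "B": 4.0},
    "A": {"S": 1.0, "B": 2.0, "T": 6.0},
    "B": {"S": 4.0, "A": 2.0, "T": 1.0},
    "T": {"A": 6.0, "B": 1.0},
}
distance, path = shortest_path(toy, "S", "T")
```

On this toy network the route S, A, B, T (total weight 4) beats the direct alternatives S, B, T (5) and S, A, T (7).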
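
| Step | Module detection algorithm |
|---|---|
| 1 | $M \leftarrow$ the total number of edges in the network |
| 2 | for each vertex $i$ do |
| 3 | assign $i$ to a single module of its own |
| 4 | $k_i \leftarrow$ degree of vertex $i$ |
| 5 | $a_i \leftarrow k_i / (2M)$ |
| 6 | end for |
| 7 | for each pair of vertices $i$, $j$ do |
| 8 | if $i$ and $j$ are connected then $e_{i,j} \leftarrow 1/(2M)$ |
| 9 | else $e_{i,j} \leftarrow 0$ |
| 10 | end for |
| 11 | while more than one module remains do |
| 12 | $\Delta Q \leftarrow e_{i,j} + e_{j,i} - 2 a_i a_j$ |
| 13 | consolidate related communities |
| 14 | direction $\leftarrow$ the greatest increase (or smallest decrease) in $Q$ |
| 15 | end while |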
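
The module detection steps (Fig. 4) resemble greedy agglomerative modularity optimization. The sketch below works under that assumption, for an undirected, unweighted network without self-loops. It repeatedly merges the pair of modules with the largest ΔQ and returns the partition with the highest modularity Q seen along the way. The function name and toy edge list are hypothetical.

```python
from itertools import combinations


def greedy_modularity(edges):
    """Agglomerative module detection; returns (best Q, best partition)."""
    m = len(edges)                                   # M: total number of edges
    nodes = sorted({v for edge in edges for v in edge})
    comm = {v: {v} for v in nodes}                   # each vertex starts as one module
    a = {v: 0.0 for v in nodes}                      # a_i = k_i / 2M
    e = {}                                           # e[(i, j)]: 1/2M per connecting edge
    for u, v in edges:
        for i, j in ((u, v), (v, u)):
            e[(i, j)] = e.get((i, j), 0.0) + 1.0 / (2 * m)
            a[i] += 1.0 / (2 * m)
    q = -sum(x * x for x in a.values())              # Q of the all-singletons partition
    best_q, best = q, [set(c) for c in comm.values()]
    while len(comm) > 1:
        # Direction: greatest increase (or smallest decrease) in Q.
        dq, i, j = max(
            (e.get((i, j), 0.0) + e.get((j, i), 0.0) - 2 * a[i] * a[j], i, j)
            for i, j in combinations(sorted(comm), 2)
        )
        # Consolidate module j into module i.
        for k in comm:
            if k not in (i, j):
                e[(i, k)] = e.get((i, k), 0.0) + e.pop((j, k), 0.0)
                e[(k, i)] = e.get((k, i), 0.0) + e.pop((k, j), 0.0)
        e[(i, i)] = (e.get((i, i), 0.0) + e.pop((i, j), 0.0)
                     + e.pop((j, i), 0.0) + e.pop((j, j), 0.0))
        a[i] += a.pop(j)
        comm[i] |= comm.pop(j)
        q += dq
        if q > best_q:
            best_q, best = q, [set(c) for c in comm.values()]
    return best_q, best


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
best_q, modules = greedy_modularity(toy_edges)
```

Scanning every pair of modules at each step keeps the sketch short but is slow on large networks. Practical implementations restrict the search to connected pairs and use priority queues.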
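
| Centrality | Step | Operation |
|---|---|---|
| Degree | 1 | for each vertex $i$ do |
| Degree | 2 | $d_i \leftarrow$ the number of ties that vertex $i$ has |
| Degree | 3 | $C_D(i) = d_i$ |
| Degree | 4 | end for |
| Coreness | 5 | for each vertex $i$ do |
| Coreness | 6 | $C_C(i) \leftarrow 0$ |
| Coreness | 7 | $N(i) \leftarrow$ the set of the neighbours adjacent to vertex $i$ |
| Coreness | 8 | for each vertex $j$ in $N(i)$ do |
| Coreness | 9 | $k_s(j) \leftarrow$ the k-shell index of vertex $j$ |
| Coreness | 10 | $C_C(i) \leftarrow C_C(i) + k_s(j)$ |
| Coreness | 11 | end for |
| Coreness | 12 | end for |
| Betweenness | 13 | for each vertex $i$ do |
| Betweenness | 14 | $C_B(i) \leftarrow 0$ |
| Betweenness | 15 | for each pair of vertices $j$, $k$ other than $i$ do |
| Betweenness | 16 | $g_{j,k} \leftarrow$ number of all shortest paths between $j$ and $k$ |
| Betweenness | 17 | $g_{j,k}(i) \leftarrow$ number of shortest paths between $j$ and $k$ containing $i$ |
| Betweenness | 18 | $C_B(i) \leftarrow C_B(i) + g_{j,k}(i)/g_{j,k}$ |
| Betweenness | 19 | end for |
| Betweenness | 20 | end for |
| Eigenvector | 21 | for each vertex $i$ do |
| Eigenvector | 22 | $C_E(i) \leftarrow 0$ |
| Eigenvector | 23 | for each vertex $j$ do |
| Eigenvector | 24 | if vertex $i$ is linked to vertex $j$ then $a_{i,j} = 1$ |
| Eigenvector | 25 | else $a_{i,j} = 0$ |
| Eigenvector | 26 | $x_j \leftarrow$ the degree of vertex $j$ |
| Eigenvector | 27 | $C_E(i) \leftarrow C_E(i) + \frac{1}{\lambda} \cdot a_{i,j} x_j$ |
| Eigenvector | 28 | end for |
| Eigenvector | 29 | end for |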
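
The four node centralities (Eqs. 3 to 6) can be sketched in plain Python for a small undirected, unweighted network. Two choices here are assumptions rather than the source's method. Betweenness is computed by brute force over unordered vertex pairs. Eigenvector centrality starts from the degree vector, as in the pseudocode, and is then iterated with rescaling towards the standard fixed point. The toy network and function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def k_shell(adj):
    """k-shell index of every vertex, found by peeling low-degree vertices."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining, ks = set(adj), {}
    while remaining:
        k = min(deg[v] for v in remaining)
        peel = [v for v in remaining if deg[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.remove(v)
            ks[v] = k
            for u in adj[v]:
                if u in remaining:
                    deg[u] -= 1
                    if deg[u] <= k:
                        peel.append(u)
    return ks


def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths to each vertex."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                sigma[u] = 0
                queue.append(u)
            if dist[u] == dist[v] + 1:
                sigma[u] += sigma[v]
    return dist, sigma


def centralities(adj, iterations=100):
    """Return degree, coreness, betweenness and eigenvector centrality dicts."""
    cd = {i: len(adj[i]) for i in adj}                    # Eq. 3
    ks = k_shell(adj)
    cc = {i: sum(ks[j] for j in adj[i]) for i in adj}     # Eq. 4
    paths = {s: bfs_counts(adj, s) for s in adj}
    cb = {i: 0.0 for i in adj}                            # Eq. 5
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = paths[j]
        if k not in dist_j:
            continue
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sigma_i = paths[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                cb[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    x = {i: float(cd[i]) for i in adj}                    # Eq. 6, x_j starts as degree
    for _ in range(iterations):
        nxt = {i: sum(x[j] for j in adj[i]) for i in adj}
        lam = max(nxt.values()) or 1.0                    # rescaling constant
        x = {i: nxt[i] / lam for i in adj}
    return cd, cc, cb, x


# Hypothetical toy network: a triangle A-B-C with a tail C-D-E.
toy_adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
           "D": {"C", "E"}, "E": {"D"}}
cd, cc, cb, ce = centralities(toy_adj)
```

In this toy network vertex C has the highest value under all four measures, while D scores 3 on betweenness despite having the same degree as A and B, because it bridges E to the rest of the network.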

| Step | Alternating decision tree (ADTree) scoring |
|---|---|
| 1 | root node $\leftarrow$ the bias in the dataset |
| 2 | $a_i \leftarrow$ attribute value |
| 3 | $t_i \leftarrow$ threshold |
| 4 | $s \leftarrow$ the sum of all scores acquired |
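
The surviving ADTree steps (root bias, attribute value, threshold, summed score) can be illustrated with a short sketch. It assumes the usual ADTree convention: each decision node compares an attribute with a threshold and contributes one of two prediction values, nested rules are visited only on the branch that fired, and the sign of the total score gives the class. The rules, attribute names and numbers are hypothetical and are not taken from Fig. 7.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A decision node with one prediction value per outcome."""
    attribute: str
    threshold: float
    score_below: float            # added when the attribute value is < threshold
    score_at_or_above: float      # added otherwise
    children_below: list = field(default_factory=list)
    children_at_or_above: list = field(default_factory=list)


def adtree_score(root_score, rules, sample):
    """Sum the root bias and every prediction value reached by the sample."""
    s = root_score
    stack = list(rules)
    while stack:
        rule = stack.pop()
        if sample[rule.attribute] < rule.threshold:
            s += rule.score_below
            stack.extend(rule.children_below)
        else:
            s += rule.score_at_or_above
            stack.extend(rule.children_at_or_above)
    return s


# Hypothetical rules over topological features of a candidate protein.
rules = [
    Rule("degree", 10.0, -0.7, 0.9,
         children_at_or_above=[Rule("betweenness", 0.2, -0.3, 0.4)]),
    Rule("coreness", 3.0, -0.2, 0.6),
]
sample = {"degree": 15.0, "betweenness": 0.5, "coreness": 2.0}
score = adtree_score(0.5, rules, sample)
label = "positive" if score > 0 else "negative"
```

For this sample the score is 0.5 + 0.9 + 0.4 - 0.2 = 1.6, so the sketch would classify it as positive. The magnitude of the score can be read as a rough confidence.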