Chinedu Anthony Anene, Faraz Khan, Findlay Bewicke-Copley, Eleni Maniati, Jun Wang.
Abstract
Determining the tissue- and disease-specific circuit of biological pathways remains a fundamental goal of molecular biology. Many components of these biological pathways remain unknown, hindering the full and accurate characterization of biological processes of interest. Here we describe ACSNI, an algorithm that combines prior knowledge of biological processes with a deep neural network to effectively decompose gene expression profiles (GEPs) into multi-variable pathway activities and identify unknown pathway components. Experiments on public GEP data show that ACSNI predicts cogent components of mTOR, ATF2, and HOTAIRM1 signaling that recapitulate regulatory information from genetic perturbation and transcription factor binding datasets. Our framework provides a fast and easy-to-use method to identify components of signaling pathways as a tool for molecular mechanism discovery and to prioritize genes for designing future targeted experiments (https://github.com/caanene1/ACSNI).
Keywords: autoencoder; cell signaling; dimension reduction; gene expression; gene-regulatory networks; machine learning; neural network; pathways; systems biology
Year: 2021 PMID: 34179848 PMCID: PMC8212143 DOI: 10.1016/j.patter.2021.100270
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1. Overview of the ACSNI method and the robust signal reconstruction capabilities
(A) ACSNI reduces the expression of a gene set into a small number of subprocesses and derives corresponding gene interactions, while constraining the optimization to reduce technical noise. Given two inputs, the expression matrix and the binary gene set membership (pathway representation), the algorithm starts by splitting the transcriptome into two parts: (1) the expression of the genes in the gene set and (2) the expression of the rest of the transcriptome (transposed). It then extracts the subprocess activities across samples (step i: W) from the expression profiles of the gene set, interacts the subprocesses with the expression of the rest of the transcriptome to extract subprocess-gene interaction scores (step ii: N), and classifies the scores to infer the extended network of the pathway represented by the gene set (step iii: P).
(B) Box plot of the percentage of the simulated signal recovered from the analysis of 50 gene sets against five GTEx tissue expression samples. Ex represents the gene set signal, In the expected signal, and Rn the random signal.
(C) Box plot depicting the effect of random noise (log normal) on the percentage of gene set signal recovered from the analysis of 50 gene sets against five GTEx tissue expression samples.
(D) Box plot showing the false discovery rates (FDR) from the analysis of 50 gene sets against five GTEx tissue expression samples (estimated as the number of genes from the random signal divided by the total number of predicted genes in B).
(E) Scatterplot of the relationship between FDR and size of the gene set.
(F) Box plot of the Jaccard index of the similarity of predicted genes between two independent splits or with one shuffled split (Null) across 50 curated gene sets.
(G) Box plot comparing the Jaccard index of predicted genes between two independent expression splits for randomly generated (R, n = 20) and curated (C, n = 50) gene sets.
See also Figures S1 and S4.
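The workflow in panel A and the evaluation quantities in panels D and F can be illustrated with a minimal Python sketch. This is not the ACSNI implementation: the autoencoder of step i is replaced here by a single mean z-score activity, the interaction score of step ii by a Pearson correlation, and the classification of step iii by a fixed cutoff. All function names, the toy data, and the cutoff of 0.8 are assumptions made for illustration only.

```python
import math

def zscore(values):
    """Standardise one gene's expression across samples (all zeros if constant)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def split_expression(expr, gene_set):
    """Split {gene: per-sample values} into gene-set genes and all other genes."""
    inside = {g: v for g, v in expr.items() if g in gene_set}
    outside = {g: v for g, v in expr.items() if g not in gene_set}
    return inside, outside

def subprocess_activity(inside):
    """Step i stand-in (W): mean z-score per sample. NOT the ACSNI autoencoder."""
    standardised = [zscore(v) for v in inside.values()]
    return [sum(col) / len(col) for col in zip(*standardised)]

def interaction_scores(activity, outside):
    """Step ii stand-in (N): correlation of each outside gene with the activity."""
    return {g: pearson(activity, v) for g, v in outside.items()}

def classify(scores, cutoff=0.8):
    """Step iii stand-in (P): keep genes whose |score| reaches a hypothetical cutoff."""
    return {g for g, s in scores.items() if abs(s) >= cutoff}

def jaccard(a, b):
    """Jaccard index of two gene sets (defined as 0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def false_discovery_rate(predicted, random_genes):
    """Panel D definition: predicted genes from the random signal / all predicted genes."""
    return len(predicted & random_genes) / len(predicted) if predicted else 0.0

# Toy data (hypothetical): 8 samples, one latent signal shared by the gene set
# and by two genes outside it; two further genes carry unrelated patterns.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
expr = {
    "set_a": [2 * s for s in signal],
    "set_b": [s + 10 for s in signal],
    "hit_pos": [3 * s + 1 for s in signal],
    "hit_neg": [-s for s in signal],
    "rand_1": [1, -1, 1, -1, 1, -1, 1, -1],
    "rand_2": [4, 3, 2, 1, 1, 2, 3, 4],
}
inside, outside = split_expression(expr, gene_set={"set_a", "set_b"})
scores = interaction_scores(subprocess_activity(inside), outside)
predicted = classify(scores)
print(sorted(predicted))
print(false_discovery_rate(predicted, {"rand_1", "rand_2"}))
print(jaccard(predicted, {"hit_pos", "rand_1"}))
```

In this toy case the two genes that follow the latent signal are predicted and the two unrelated genes are not, so the FDR is zero. A real analysis would use several subprocesses per gene set, as the caption describes, rather than a single averaged activity.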
Figure 2. ACSNI identified pathway components and signaling crosstalk
(A and B) (A) Bar plot of the top 25 significantly enriched biological processes associated with the 1,166 genes predicted as mTOR signaling components. Bars are ordered from top to bottom according to FDR values. (B) Association of ACSNI-predicted mTOR signaling genes (TCGA) with DE genes in KLF6 and EPAS1 knockout in ccRCC cells, divided into datasets (KLF6: 786 = GEO: GSE115763, EPAS1: 786 = GEO: GSE115389). Chi-squared test results (Chisq, p) and the ACSNI-predicted (P) and background (B) groups are indicated at the top of the panel. DE and nDE represent differentially and non-differentially expressed genes at an adjusted p value of <0.05, respectively.
(C) Bar plot comparing enrichment of DE genes from EPAS1-KO (red) and KLF6-KO (blue) across the different ACSNI-derived subprocesses of mTOR signaling (TCGA).
(D) (Left) Heatmap of the coefficient of determination (R²) of the linear model of ACSNI-derived mTOR activity and disease status (cancer or normal adjacent tissues) across ten subprocesses. The higher the R², the more closely the subprocess is related to disease status. (Right) Expression of mTOR subprocess w8-associated genes in ccRCC cell lines following the inhibition of the mTOR pathway with everolimus (Eve) compared with vehicle control (Cont) (GEO: GSE106819). Within the plot, differential expression at FDR < 0.05 (DE) is indicated (upregulated, red; downregulated, blue; unchanged, white).
(E) Ranked dot plot of the ratio of transcription factor (TF) ChIP density at the promoter regions (±1 kb from TSS) of the ACSNI-predicted ATF2 signaling genes relative to background genes in artery aorta. DNA-binding domains are highlighted in red.
(F) Bar plot comparing the ratio of TF ChIP density at the promoter regions (±1 kb from TSS) of the ACSNI-predicted ATF2 signaling genes across the different ACSNI-derived subprocesses from artery aorta.
(G) Gene ontology analysis of biological processes associated with predicted HOTAIRM1 genes and DE genes in HOTAIRM1.
See also Figures S2–S4.
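The chi-squared comparison in panel B (predicted versus background genes, DE versus nDE) amounts to a test on a 2×2 contingency table. The sketch below shows one way to compute it with the Python standard library. It assumes a Pearson chi-squared test without continuity correction, which the caption does not specify, and the gene counts are hypothetical, not taken from the paper.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table
    [[a, b], [c, d]], e.g. rows = predicted/background, columns = DE/nDE.
    Returns (statistic, p value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 30 of 100 predicted genes are DE, 10 of 100 background genes are DE.
stat, p_value = chi_squared_2x2(30, 70, 10, 90)
print(stat, p_value)
```

With small expected counts, an exact test or a continuity correction may be more appropriate than this uncorrected statistic.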
| REAGENT OR RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| TCGA ccRCC dataset | Cancer Genome Atlas Research Network, 2013 | |
| GTEx expression datasets | Lonsdale et al., 2013 | |
| Validation ccRCC dataset | Yao et al., 2017 | |
| KLF6-ko data in ccRCC cells | Syafruddin et al., 2019 | |
| EPAS1-ko data in ccRCC cells | Zou et al., 2019 | |
| KLF6-ko data in blood cells | Adelman et al., 2019 | |
| EPAS1-ko data in endothelial cells | Yoo et al., 2015 | |
| mTOR-inhibition data in ccRCC cells | Kornakiewicz et al., 2018 | |
| HOTAIRM1-ko data in kidney cells | Hamilton et al., 2020 | |
| TF ChIP-Seq datasets | Oki et al., 2018 | |
| Cancer-cell dependency data | Tsherniak et al., 2017 | |
| Pathway interaction gene sets | Schaefer et al., 2009 | |
| MSigDB v7.2 | Liberzon et al., 2011 | |
| | This paper | |
| Trimmomatic v0.39 | Bolger et al., 2014 | |
| HISAT2 v2.1.0 | Pertea et al., 2016 | |
| HTSeq v0.11.1 | Anders et al., 2015 | |
| Python v3.8.6 | PSF | |
| R v4.0.3 | CRAN | |