Jianying Li, Pierre R Bushel, Lin Lin, Kevin Day, Tianyuan Wang, Francesco J DeMayo, San-Pin Wu, Jian-Liang Li.
Abstract
Gene expression is controlled by multiple regulators and their interactions. Data from genome-wide gene expression assays can be used to estimate the molecular activities of regulators within a model organism and to extrapolate them to biological processes in humans. This approach is valuable in studies that aim to better understand complex human biological systems which may be involved in disease and hence have potential clinical relevance. To achieve this, it is necessary to infer gene interactions that are not directly observed (i.e., latent or hidden) by way of structural equation modeling (SEM) on the expression levels or activities of the downstream targets of regulator genes. Here we developed an R Shiny application, termed "Structural Equation Modeling of In silico Perturbations (SEMIPs)", to compute a two-sided t-statistic (T-score) from gene expression data as a surrogate for gene activity in a given human specimen. SEMIPs can be used either in correlational studies between outcome variables of interest or in subsequent model fitting on multiple variables. The application implements a 3-node SEM that consists of two upstream regulators as input variables and one downstream reporter as an outcome variable, and examines the significance of interactions among these variables. SEMIPs enables scientists to investigate gene interactions among three variables through computational and mathematical modeling (i.e., in silico). In a case study using SEMIPs, we have shown that putative direct downstream genes of the GATA Binding Protein 2 (GATA2) transcription factor are sufficient to infer its activities in silico for the conserved progesterone receptor (PGR)-GATA2-SRY-box transcription factor 17 (SOX17) genetic network in the human uterine endometrium.
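As a rough illustration of the T-score idea, the sketch below computes a two-sample t-statistic within a single specimen. The abstract does not say how the two groups are formed, so the split of the signature into up- and down-regulated genes, the pooled-variance form of the statistic, and the names `t_score`, `up_genes`, and `down_genes` are all assumptions made for illustration, not the SEMIPs implementation.

```python
import math
from statistics import mean, variance


def t_score(expression, up_genes, down_genes):
    """Pooled-variance two-sample t-statistic for one specimen.

    expression: dict mapping gene symbol -> expression value in this specimen.
    up_genes / down_genes: hypothetical split of a regulator's signature
    (an assumption; the source does not describe the grouping).
    """
    # Keep only signature genes that were actually measured in this specimen.
    up = [expression[g] for g in up_genes if g in expression]
    down = [expression[g] for g in down_genes if g in expression]
    n_up, n_down = len(up), len(down)
    if n_up < 2 or n_down < 2:
        raise ValueError("need at least two measured genes in each group")

    # Pooled sample variance across the two groups.
    pooled_var = ((n_up - 1) * variance(up) + (n_down - 1) * variance(down)) / (
        n_up + n_down - 2
    )
    std_err = math.sqrt(pooled_var * (1 / n_up + 1 / n_down))
    if std_err == 0:
        raise ValueError("zero variance in both groups; t-statistic undefined")
    return (mean(up) - mean(down)) / std_err
```

Under these assumptions, a large positive score means the up-regulated signature genes are expressed well above the down-regulated ones in that specimen, which is read as high regulator activity. Computing the score for every specimen yields one activity value per sample.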
Keywords: In silico perturbation; R Shiny; gene expression; molecular interaction; structural equation modeling
Year: 2021 PMID: 34899830 PMCID: PMC8652139 DOI: 10.3389/fgene.2021.727532
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. The workflow and application of SEMIPs. The four rectangles and arrows on the left indicate our hypothesis testing and generation schema; the components bounded by the dotted orange rectangle are features provided in the R Shiny web application. A biological hypothesis about the relationship between two interacting factors (Fac1 and Fac2) and their endpoint is tested in a model system (i.e., mouse) through a 3-node SEM, indicated by the green rectangle. The hypothesis is translated to another species (human, in our research) via T-score computation (the upper blue arrow, labeled “assisted by”) and verified with the SEM (the lower blue arrow, labeled “achieved through SEM”). This process is accomplished with our R Shiny app, indicated by the two curved arrows. γ11 and γ21 are correlation coefficients and ξ1 is the model residual. The two-class bootstrap resampling is shown in the red rectangle. The hypothesis-generating and exploring steps are explained in the bottom two rectangles.
FIGURE 2. The SEMIPs user interface. The main panel contains four tabs: “T-Scores”, “Bootstrap”, “SEM”, and “Instructions”. The right panel shows the screen after the “T-Scores” tab is selected and the T-scores are generated. In the left panel, the application accepts two inputs: 1) a list of signatures (in Entrez gene symbol format) and 2) a data matrix of expression measurements, with the top lines shown for viewing. Clicking the green “Go!” button launches T-score generation; the button is grayed out while the process is running. The first 10 rows of the T-score matrix are shown; the entire matrix can be downloaded by clicking the “Download T-Scores” button.
FIGURE 3. A two-class bootstrap resampling simulation (elimination with or without replacement). From the initial GATA2 significant gene list, represented by the yellow rectangle, the downstream target genes (“N”) are eliminated in the without-replacement simulation (left side), giving rise to the shrunken significant gene list represented by a smaller yellow rectangle. In the elimination-with-replacement simulation (right side), the same number of genes as in the targeted subset (“N”) is eliminated, giving rise to the shrunken significant gene list, which is then restored to its original size by adding back “N” genes randomly drawn from the gene pool (the random draw is represented by the far-right green oval and the gene pool by the blue cylinder). In elimination without replacement, the resulting shrunken GATA2 gene list is used to calculate the T-scores, which are then fed into the SEM indicated by the green rectangle. In elimination with replacement, the restored gene list is used to calculate the T-scores, which are then fed into the SEM. The simulation can be repeated for a large “number of bootstraps” to generate a non-parametric distribution for assessing statistical significance.
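The two resampling arms in this caption can be sketched as list operations. The caption leaves open exactly which genes are eliminated in the with-replacement arm, so this sketch assumes the targeted subset is removed in both arms and that replacements are drawn only from pool genes outside the original list. The function names and the `seed` parameter are hypothetical, and the downstream T-score and SEM steps are left out.

```python
import random


def eliminate_without_replacement(signature, targets):
    """Drop the targeted genes; the returned list is shorter than the input."""
    target_set = set(targets)
    return [gene for gene in signature if gene not in target_set]


def eliminate_with_replacement(signature, targets, gene_pool, rng):
    """Drop the targeted genes, then restore the original list size with
    genes drawn at random from the pool (assumed: no gene from the original
    list, including the removed targets, may be drawn back in)."""
    shrunken = eliminate_without_replacement(signature, targets)
    n_removed = len(signature) - len(shrunken)
    excluded = set(signature)
    candidates = [gene for gene in gene_pool if gene not in excluded]
    return shrunken + rng.sample(candidates, n_removed)


def bootstrap_gene_lists(signature, targets, gene_pool, n_bootstraps, seed=0):
    """Generate one restored gene list per bootstrap iteration."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    return [
        eliminate_with_replacement(signature, targets, gene_pool, rng)
        for _ in range(n_bootstraps)
    ]
```

Each restored list would then be scored and passed to the model, and the collection of resulting fit statistics forms the non-parametric reference distribution against which the shrunken-list result is compared.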
FIGURE 4. Major model fitting statistics for the joint regulation of SOX17 gene expression levels by GATA2 and PGR activities in the GSE58144 dataset (GEO accession), illustrated in the 3-node SEM. The two exogenous variables are “Gene Signature of GATA2 Direct Downstream Targets” and “PGR Gene Signature”, and the one endogenous variable is “SOX17 Expression Levels”.
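To make the structure of the 3-node model concrete, the sketch below estimates the two paths from the exogenous variables to the endogenous one. The source does not describe the fitting routine or the fit statistics the application reports, so ordinary least squares is swapped in here as a stand-in. For a model with only observed variables and a single outcome, the path estimates should coincide with it. The function name `fit_three_node` and its return layout are hypothetical.

```python
from statistics import mean


def fit_three_node(x1, x2, y):
    """Least-squares path estimates for y = g11 * x1 + g21 * x2 + residual.

    x1, x2: per-specimen values of the two exogenous variables
            (e.g. GATA2 and PGR signature T-scores).
    y:      per-specimen values of the endogenous variable
            (e.g. SOX17 expression levels).
    All variables are mean-centred, so no intercept is estimated.
    """
    c1 = [v - mean(x1) for v in x1]
    c2 = [v - mean(x2) for v in x2]
    cy = [v - mean(y) for v in y]

    # Sums of squares and cross-products for the 2x2 normal equations.
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))

    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("the two regulators are perfectly collinear")

    g11 = (s22 * s1y - s12 * s2y) / det
    g21 = (s11 * s2y - s12 * s1y) / det
    residuals = [c - g11 * a - g21 * b for a, b, c in zip(c1, c2, cy)]
    return g11, g21, residuals
```

A dedicated SEM package would additionally report standard errors and global fit indices, which this sketch does not attempt to reproduce.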