Saskia Trescher, Jannes Münchmeyer, Ulf Leser.
Abstract
BACKGROUND: Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Elucidation of regulatory mechanisms can be approached by a multitude of experimental methods, yet integrating the resulting heterogeneous, large, and noisy data sets into comprehensive tissue- or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed that model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes and other factors. Subsequent optimization finds the parameters that minimize the divergence between predicted and measured expression intensities. In various settings, these methods have produced promising results in estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they differ vastly in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem, and the data sets used for evaluation.
Keywords: Gene regulation; Mathematical optimization; Regulatory network; Systems biology
Year: 2017 PMID: 28347313 PMCID: PMC5369021 DOI: 10.1186/s12918-017-0419-z
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 Transcription of DNA into RNA. Transcription factors (TFs) bind to distal or proximal TF binding sites (TFBS), enhancing the binding of RNA polymerase and activating the transcription of DNA into RNA
Overview of methods for estimating regulatory activity from transcriptome data comparing input data, modelling, computational aspects and outcome variables
| Method | Input | Model | Computation | Output |
|---|---|---|---|---|
| Approach by Schacht et al. | - mRNA expression data | Linear model | - Optimization criterion: minimize sum of absolute errors | - parameter for each TF: |
| RACER | - mRNA expression data | Linear models: | - Optimization criterion: minimize sum of squared errors with L1 norm penalty on linear coefficients | 1) sample-specific TF and miRNA activities |
| RABIT | - differential mRNA expression data | Linear model: | - Frisch-Waugh-Lovell method, select subset of significant TFs via model selection procedure and remove TFs with insignificant correlation across tumors | - regulatory activity score for each TF (t value of linear regression coefficient of |
| ISMARA | - gene expression or chromatin state measurements | Linear model | - Optimization criterion: minimize sum of errors | - inferred motif activity profiles |
| biRte | - mRNA differential expression | Likelihood model: | - data-specific marginal likelihoods using estimation of hidden state variables via MCMC | - Estimation of active regulators |
| ARACNE | - microarray expression profiles | none | - local estimation of pairwise gene expression profile mutual information | - Reconstruction of gene regulatory network |
Gene expression data is named “g” with index i, estimated parameters with “β”, TF binding information with “b”, TFs with “t”, samples with “s”, miRNAs with “mi” and model constants with “c”. Other variables are explained in the text
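The linear models summarized in the table share a common shape: the expression g of gene i is predicted from TF binding information b, one parameter β per TF and a constant c, and the parameters are chosen to minimize an error criterion. As a minimal sketch, assuming a single sample and the squared-error criterion with an L1 penalty listed for RACER, such a fit could be written with plain coordinate descent as below. The function names, the solver and the data layout are illustrative assumptions, not the implementation of any of the reviewed tools.

```python
def soft_threshold(x, lam):
    """Shrink x towards zero by lam (the proximal step of the L1 penalty)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def fit_l1_linear(b, g, lam, n_iter=200):
    """Fit g[i] ~ c + sum_t beta[t] * b[i][t] by coordinate descent.

    Minimizes 0.5 * sum_i (g[i] - prediction_i)^2 + lam * sum_t |beta[t]|.
    b: one row per gene, one column per TF (hypothetical binding scores).
    g: measured expression per gene.
    Returns (c, beta).
    """
    n_genes, n_tfs = len(b), len(b[0])
    beta = [0.0] * n_tfs
    c = sum(g) / n_genes
    for _ in range(n_iter):
        # The unpenalized constant is the mean residual of the TF terms.
        c = sum(
            g[i] - sum(beta[t] * b[i][t] for t in range(n_tfs))
            for i in range(n_genes)
        ) / n_genes
        for t in range(n_tfs):
            rho = 0.0  # correlation of TF t with the partial residual
            z = 0.0    # squared norm of the binding column of TF t
            for i in range(n_genes):
                others = c + sum(
                    beta[k] * b[i][k] for k in range(n_tfs) if k != t
                )
                rho += b[i][t] * (g[i] - others)
                z += b[i][t] ** 2
            beta[t] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return c, beta
```

With `lam = 0` the sketch reduces to ordinary least squares; increasing `lam` drives the parameters of weakly supported TFs to exactly zero, which is the selection effect an L1 penalty is used for.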
Fig. 2 General scheme of a TF – gene network where all T TFs are connected to each other and can regulate all of the G genes
Fig. 3 Flow chart of the approach by Schacht et al. The input data sets (marked in blue) are partly filtered and passed to a linear regression model (yellow) which calculates an activity value for each TF (green)
Fig. 4 Scheme of RACER method. The input data sets (marked in blue) are passed to a two-step linear regression model (yellow) which calculates sample-specific activity values for each regulator and determines the most predominant regulators (green)
Fig. 5 Flow chart of RABIT method. The input data sets (marked in blue) are passed to a linear regression model (yellow) which calculates sample-specific activity values for each regulator and determines general regulatory activities (green)
Fig. 6 ISMARA model scheme. The input data sets (marked in blue) are passed to a linear regression model (yellow) which calculates motif activities and determines associated regulators (green)
Fig. 7 Scheme of biRte method. The input data sets (marked in blue) are passed to a likelihood model (yellow) which determines active regulators (green)
Fig. 8 ARACNE flow chart. The input data set (marked in blue) is used to calculate pairwise mutual information, from which indirect interactions are removed (yellow), allowing a reconstruction of the gene regulatory network (green)
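The two ARACNE steps named in this caption, pairwise mutual information followed by removal of indirect interactions, can be illustrated with a short sketch. It assumes expression profiles that have already been discretized (a simplification; a plug-in estimate on discrete levels is swapped in here for whatever estimator the tool itself uses) and prunes each fully connected gene triplet by dropping its weakest edge, in the spirit of the data processing inequality. Thresholds and tolerances are omitted, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * math.log(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


def reconstruct_network(profiles):
    """Return the set of gene pairs kept after pruning indirect edges.

    profiles: dict mapping gene name -> discretized expression profile.
    """
    genes = sorted(profiles)
    # Step 1: mutual information for every gene pair.
    mi = {
        frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {pair for pair, value in mi.items() if value > 0}
    # Step 2: in every fully connected triplet, mark the weakest edge
    # as a presumed indirect interaction.
    indirect = set()
    for a, b, c in combinations(genes, 3):
        triplet = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if all(edge in edges for edge in triplet):
            indirect.add(min(triplet, key=lambda edge: mi[edge]))
    return edges - indirect
```

For a chain in which gene A influences B and B influences C, the A-C pair typically carries the least mutual information of the three, so it is the edge this pruning step removes.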
HGNC symbols of the top 10 regulators found by each method for COAD (165 samples), LIHC (404 samples) and PAAD (180 samples), using only mRNA data as input (left panel) or multiple input data sets (right panel; RACER: mRNA, miRNA, CNV and DNA methylation; RABIT: mRNA, CNV and DNA methylation; biRte: mRNA and CNV). TFs with equal activity values are marked with *. TFs found in the top 10 of several methods are marked in bold (when found by RACER, RABIT and biRte), blue (RACER and RABIT), red (RABIT and biRte) or yellow (RACER and biRte)
Fig. 9 Number of overlapping TFs in the top 100 of ranked TFs per method (for RABIT, the overlap with the top 76/67/57 TFs (having activity > 0) in COAD/LIHC/PAAD is shown)
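The overlap counts described in this caption amount to intersecting the top-ranked TF sets of each pair of methods. A minimal sketch, assuming each method yields a list of TF identifiers ordered from most to least active (the function name and input layout are hypothetical):

```python
from itertools import combinations


def top_k_overlap(rankings, k=100):
    """Count shared TFs among the top k of every pair of methods.

    rankings: dict mapping method name -> list of TF identifiers,
    best first. A list shorter than k is used in full, which mirrors
    the case of a method reporting fewer than k active TFs.
    """
    top = {method: set(tfs[:k]) for method, tfs in rankings.items()}
    return {
        (m1, m2): len(top[m1] & top[m2])
        for m1, m2 in combinations(sorted(top), 2)
    }
```

Keys of the returned dictionary are method pairs in alphabetical order, so each pair appears exactly once.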