Konstantine Tchourine¹, Christine Vogel², Richard Bonneau³.
Abstract
Large-scale inference of eukaryotic transcription-regulatory networks remains challenging. One underlying reason is that existing algorithms typically ignore crucial regulatory mechanisms, such as RNA degradation and post-transcriptional processing. Here, we describe InfereCLaDR, which incorporates such elements and advances prediction in Saccharomyces cerevisiae. First, InfereCLaDR employs a high-quality Gold Standard dataset that we use separately as prior information and for model validation. Second, InfereCLaDR explicitly models transcription factor activity and RNA half-lives. Third, it introduces expression subspaces to derive condition-responsive regulatory networks for every gene. InfereCLaDR's final network is validated against known data and trends and yields multiple insights. For example, it predicts long half-lives for transcripts of nucleic acid metabolism genes, and it predicts members of the cytosolic chaperonin complex as targets of the proteasome regulator Rpn4p. InfereCLaDR demonstrates that more biophysically realistic modeling of regulatory networks improves prediction accuracy in both eukaryotes and prokaryotes.
Keywords: RNA degradation rates; RNA stability; biophysical modeling; gene regulatory networks; machine learning; network inference; network remodeling; Saccharomyces cerevisiae; systems biology; transcriptional regulatory networks
Year: 2018 PMID: 29641998 PMCID: PMC5987223 DOI: 10.1016/j.celrep.2018.03.048
Source DB: PubMed Journal: Cell Rep Impact factor: 9.423
Figure 1. Network Inference Is Sensitive to RNA Half-Lives in Both Eukaryotes and Prokaryotes in a Condition- and Gene-Specific Manner
(A and B) AUPR (area under the precision-recall curve) is shown as a function of the preset RNA half-life in (A) Saccharomyces cerevisiae and (B) Bacillus subtilis. Each line denotes one of the 20 independent Gold Standard re-samples, and colored dots mark the maximum AUPR for each re-sample.
(C) Over two thousand expression datasets group into 20 bi-clusters (unrelated to the 20 Gold Standard re-samples) with gene- and condition-specific properties. Red and blue denote high and low expression levels, respectively. Gene cluster names correspond to the most highly enriched functional category. Condition cluster names represent the most highly enriched terms in the metadata. The heatmap shows the 997 genes from the Gold Standard. Note that the final network was derived from expression data of 5,716 genes mapped onto these clusters.
(D) Shades of red denote the optimal half-life, in minutes, for each of the 20 bi-clusters. The color scale is designed to resolve half-lives below 50 min, which account for 16 of the 20 predictions.
For the full plot of AUPR and AUROC trajectories for every bi-cluster, see Figures S4 and S5, respectively. See also Figures S1 and S6A–S6H.
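The AUPR curves in (A), (B), and (D) depend on the preset half-life because the half-life sets the decay term of the kinetic model. Below is a minimal sketch. It assumes the first-order model used by Inferelator-style methods, dx/dt = -x/τ + f(TF activities), with t½ = τ ln 2. The finite-difference response is an assumption. So is choosing the half-life by the median AUPR across re-samples, whereas the caption itself marks per-re-sample maxima in (A) and (B). The function names and toy numbers are hypothetical.

```python
import math
from statistics import median


def time_constant(half_life_min):
    """Return tau = 1/alpha for first-order RNA decay (t_half = tau * ln 2)."""
    return half_life_min / math.log(2)


def response(x_now, x_next, dt_min, half_life_min):
    """Finite-difference estimate of tau * dx/dt + x for one gene.

    Assumed model: dx/dt = -x/tau + f(TF activities). Rearranging gives
    tau * dx/dt + x = tau * f, so this value is what a regression on TF
    activities would try to explain. At steady state (x_next == x_now)
    it reduces to x itself.
    """
    tau = time_constant(half_life_min)
    return tau * (x_next - x_now) / dt_min + x_now


def best_half_life(aupr_grid, half_lives):
    """Return the candidate half-life with the highest median AUPR.

    aupr_grid[r][j] is the AUPR of Gold Standard re-sample r when the
    network is inferred with half_lives[j]; the median-based rule is an
    assumption, not the paper's documented criterion.
    """
    medians = [median(row[j] for row in aupr_grid) for j in range(len(half_lives))]
    j_best = max(range(len(half_lives)), key=medians.__getitem__)
    return half_lives[j_best]


# Toy values, not taken from the paper.
print(response(2.0, 2.0, dt_min=10, half_life_min=5))  # → 2.0
print(best_half_life([[0.20, 0.31, 0.28],
                      [0.22, 0.29, 0.30],
                      [0.19, 0.33, 0.27]], [5, 20, 60]))  # → 20
```

Under this model, a longer half-life increases τ and gives the rate of change more weight relative to the current expression level. That is one way to see why AUPR shifts as the preset half-life varies.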
Figure 3. InfereCLaDR Outperforms Previous Network Inference Approaches by Predicting New High-Confidence Condition-Specific Interactions
(A) The improvement in the precision-recall curve results from the use of a high-quality Gold Standard, bi-cluster-specific network inference, and optimization of bi-cluster-specific RNA half-lives. We compare InfereCLaDR (red line) with the Inferelator without bi-clustering or half-life optimization (black dotted line), with the Inferelator using the MacIsaac gold standard of interactions (orange dashed line), and with context likelihood of relatedness (CLR), Genie3, and iRafNet (purple dash-dotted, blue dashed, and green dash-dotted lines, respectively). Each curve is constructed from the median precision and recall values across the 20 re-samples. For the improvement in AUROC, see Figure S1F.
(B) The number of newly predicted interactions (i.e., interactions not in the Gold Standard), obtained using the optimized bi-cluster-specific half-lives and the full Gold Standard for training, compares favorably with the number of new predictions from the original Inferelator and from Genie3. The height of each section within a bar corresponds to the number of new interactions confirmed by the corresponding type of evidence in orthogonal data. Direct evidence refers to physical protein-DNA interactions, and indirect evidence refers to knockout and overexpression assays (Table S1). The number above each bar denotes the fraction of new interactions supported by at least one orthogonal source. Prec, precision.
(C) High-scoring regulatory interactions correlate between InfereCLaDR and Inferelator, but InfereCLaDR predicts many new interactions. The vertical and horizontal blue lines show the precision = 0.5 rank cutoff for InfereCLaDR and Inferelator, respectively (Supplemental Experimental Procedures). The lower the rank, the higher the confidence in the prediction. The red line maps the InfereCLaDR rank to the same rank in Inferelator. See also Figures S6I–S6K.
(D) Most (56%) of the regulatory interactions newly predicted by InfereCLaDR have orthogonal support. In comparison, Inferelator’s predictions that were removed in InfereCLaDR have little support, suggesting that they had been false positives. Black bars denote the bottom right quadrant in (C) (gained), gray bars denote the top left quadrant in (C) (lost), and white bars denote the interactions in the bottom left quadrant of (C) (conserved).
(E) InfereCLaDR’s gained interactions are often specific to experimental conditions. Each bar displays how many regulatory interactions were above the rank-based cutoff (Supplemental Experimental Procedures) for the given number of clusters. Interactions that only appear in one cluster are very condition-specific, whereas interactions that appear in all four clusters are more general and independent of experimental conditions. The graph shows only the high-confidence predictions that were above the cutoff for at least one cluster prior to rank-combining.
(F) InfereCLaDR’s gained predictions are often specific to non-standard conditions. Each interaction was assigned a bi-cluster based on the gene cluster of the target gene and a condition cluster in which this interaction had the best rank. Red cells represent bi-clusters with significantly more gained interactions, blue cells represent bi-clusters with significantly fewer gained interactions, and white cells represent no enrichment (Supplemental Experimental Procedures).
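Panels (A) through (D) rest on a few computations over a ranked list of TF-target pairs:

* a precision-recall curve and its area;
* a rank cutoff at precision 0.5;
* a gained/lost/conserved split;
* the fraction of pairs with orthogonal support.

Below is a minimal illustration under stated assumptions, not the paper's code. It uses the step-wise (average-precision) estimate of AUPR. It reads the precision = 0.5 cutoff as the deepest rank whose cumulative precision is still at least 0.5; the actual definition is in the Supplemental Experimental Procedures. It labels only pairs ranked by both methods. All function names and the toy TF_A/TF_B data are hypothetical.

```python
def precision_by_rank(ranked_pairs, gold):
    """Cumulative precision and recall after each prediction in ranked order.

    ranked_pairs: (tf, target) pairs sorted from most to least confident
    gold:         non-empty set of Gold Standard (tf, target) pairs
    """
    tp, precision, recall = 0, [], []
    for k, pair in enumerate(ranked_pairs, start=1):
        tp += int(pair in gold)
        precision.append(tp / k)
        recall.append(tp / len(gold))
    return precision, recall


def aupr(ranked_pairs, gold):
    """Area under the PR curve as a step function (average precision)."""
    precision, recall = precision_by_rank(ranked_pairs, gold)
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        area += p * (r - prev_recall)  # only steps that raise recall add area
        prev_recall = r
    return area


def rank_cutoff(ranked_pairs, gold, min_precision=0.5):
    """Deepest 1-based rank whose cumulative precision is still >= min_precision."""
    precision, _ = precision_by_rank(ranked_pairs, gold)
    passing = [k for k, p in enumerate(precision, start=1) if p >= min_precision]
    return max(passing, default=0)


def classify(rank_new, rank_old, cutoff_new, cutoff_old):
    """Label interactions ranked by both methods as gained, lost, or conserved.

    rank_new / rank_old: dict (tf, target) -> rank (1 = most confident)
    in InfereCLaDR and in Inferelator; pairs below both cutoffs are left out.
    """
    labels = {}
    for pair in rank_new.keys() & rank_old.keys():
        in_new = rank_new[pair] <= cutoff_new
        in_old = rank_old[pair] <= cutoff_old
        if in_new and in_old:
            labels[pair] = "conserved"
        elif in_new:
            labels[pair] = "gained"
        elif in_old:
            labels[pair] = "lost"
    return labels


def support_fraction(pairs, evidence_sets):
    """Fraction of pairs found in at least one orthogonal evidence set."""
    if not pairs:
        return 0.0
    supported = [p for p in pairs if any(p in s for s in evidence_sets)]
    return len(supported) / len(pairs)


# Toy data, not taken from the paper.
gold = {("TF_A", "g1"), ("TF_B", "g1")}
ranked = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_B", "g1"), ("TF_B", "g3")]
print(round(aupr(ranked, gold), 3))  # → 0.833
print(rank_cutoff(ranked, gold))     # → 4
```

The step-wise area avoids the optimistic bias of linear interpolation between PR points. A different integration rule would change the absolute AUPR values but not the ranking logic.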
Figure 2. InfereCLaDR Recapitulates Known Differences between RNA Half-Lives of Different Genes and Conditions and Identifies New Relationships
(A–H) The boxes show median RNA half-lives with the first and third quartiles. (A)–(D) show the distribution of predicted values across the 20 Gold Standard re-samples, and (E)–(H) show experimentally measured values across genes. Predicted values are produced by InfereCLaDR; observed values come from experimental datasets. Magenta denotes subsets of genes, i.e., nucleobase-containing small-molecule metabolism (NCSM) and translation, either predicted under minimally perturbed conditions (i.e., chemostat and log-phase growth) or measured by Neymotin et al. (2014). We highlight the NCSM category because its long half-lives were the most prominent predicted pattern under minimally perturbed conditions. Light blue denotes all genes predicted or observed under unperturbed conditions. Green denotes half-lives of all genes predicted or observed under conditions of transcription inhibition (Shalem et al., 2008). See also Figure S4.
Expression Data Bi-clustering, RNA Half-Life Fitting, and a High-Quality Gold Standard Underlie InfereCLaDR’s Superior Performance
| Method | Re-samples Outperforming Inferelator + GS | Re-samples Outperformed by InfereCLaDR | Median AUPR |
|---|---|---|---|
| InfereCLaDR (GS + clustering + RNA half-life) | 19/20 | – | 0.328 |
| Inferelator + GS + clustering | 20/20 | 17/20 | 0.319 |
| Inferelator + GS + RNA half-life | 19/20 | 19/20 | 0.305 |
| Inferelator + GS | – | 19/20 | 0.290 |
| Inferelator + MacIsaac | 0/20 | 20/20 | 0.146 |
| Genie3 + GS | 0/20 | 20/20 | 0.042 |
| iRafNet + GS | 0/20 | 20/20 | 0.031 |
Each modification independently outperforms the original Inferelator using the Gold Standard (second column). “Inferelator + MacIsaac” shows the results of the Inferelator when the MacIsaac standard of interactions was used for training and the Gold Standard for evaluation. Combining all modifications gives better performance than using any of them separately (third column). Columns two and three give the number of re-samples (out of 20) in which the method in that row outperformed Inferelator + GS (column two), or in which InfereCLaDR outperformed the method in that row (column three), in terms of AUPR. The fourth column shows the median AUPR across re-samples. See Sub-sampling the Gold Standard for RNA Half-Life Fitting and Error Estimation, RNA Half-Life Estimation, and Supplemental Experimental Procedures for further details. See also Table S2.
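Columns two to four of the table can be derived from per-re-sample AUPR values. Below is a minimal sketch. It assumes the AUPR lists are aligned by re-sample and that a tie counts as no win. `wins` and `table_row` are hypothetical helpers, and the numbers in the test are toy values, not the paper's.

```python
from statistics import median


def wins(aupr_a, aupr_b):
    """Number of re-samples in which method A has a strictly higher AUPR than B.

    Both lists are indexed by re-sample; a tie is not counted as a win.
    """
    if len(aupr_a) != len(aupr_b):
        raise ValueError("methods must be scored on the same re-samples")
    return sum(a > b for a, b in zip(aupr_a, aupr_b))


def table_row(aupr_method, aupr_baseline, aupr_best):
    """Columns two to four of the table for one method.

    aupr_baseline: per-re-sample AUPR of Inferelator + GS
    aupr_best:     per-re-sample AUPR of InfereCLaDR
    """
    n = len(aupr_method)
    return (
        f"{wins(aupr_method, aupr_baseline)}/{n}",  # outperforms Inferelator + GS
        f"{wins(aupr_best, aupr_method)}/{n}",      # outperformed by InfereCLaDR
        median(aupr_method),                        # median AUPR
    )
```

Counting wins per re-sample, rather than comparing single AUPR values, shows whether an improvement is consistent across Gold Standard splits or driven by a few favorable ones.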
Figure 4. Condition-Specific Networks Reveal New Predictions and Regulatory Relationships beyond What a Global Network Can Show
The figure displays the final high-confidence regulatory network split into four parts, according to the experimental condition cluster in which each interaction was detected with the strongest confidence. Transcription factors (TFs) are shown in black (center), and target genes are colored (periphery). Different colors indicate different gene clusters, as shown in the legend. Edge colors denote predicted transcriptional activation (red) and repression (blue); a stronger color denotes higher confidence. A large font size marks the five TFs that are most specific to each condition cluster. TFs that were not among the top five in cluster specificity in any of the clusters are not shown.
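The split into four panels places each interaction in the condition cluster where it ranks best. Below is a minimal sketch. It assumes each TF-target pair has a rank and a signed regression weight in every cluster, and it uses the sign to decide activating versus repressive. The tie-breaking rule, the `assign_condition_cluster` name, and the toy cluster names are hypothetical. The caption does not define the cluster-specificity score used to pick the top five TFs, so that step is not sketched.

```python
def assign_condition_cluster(ranks_by_cluster, weights_by_cluster):
    """Place each interaction in the condition cluster where its rank is best.

    ranks_by_cluster:   dict cluster -> dict (tf, target) -> rank (1 = best)
    weights_by_cluster: dict cluster -> dict (tf, target) -> signed weight
    Returns dict (tf, target) -> (cluster, "activating" | "repressive").
    Ties keep the cluster whose name sorts first; weights <= 0 are labeled
    repressive (both are assumptions made for this sketch).
    """
    best = {}
    for cluster in sorted(ranks_by_cluster):  # sorted so ties resolve deterministically
        for pair, rank in ranks_by_cluster[cluster].items():
            if pair not in best or rank < best[pair][1]:
                best[pair] = (cluster, rank)
    edges = {}
    for pair, (cluster, _) in best.items():
        weight = weights_by_cluster[cluster][pair]
        edges[pair] = (cluster, "activating" if weight > 0 else "repressive")
    return edges
```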
InfereCLaDR Top-Ranking Predictions and Their Precision Values
A

| RAP1 | Prec. | GCN4 | Prec. | SFP1 | Prec. | MSN2 | Prec. | SOK2 | Prec. | YAP1 | Prec. | HSF1 | Prec. | RPN4 | Prec. | ABF1 | Prec. | TEC1 | Prec. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SEC14 | 0.67 | LYS1 | 0.85 | RPS24A | 0.98 | YDR391C | 0.71 | YRO2 | 0.67 | | 0.48 | OPI10 | 0.81 | RPN12 | 0.93 | | 0.31 | | 0.26 |
| YEL1 | 0.62 | HOM2 | 0.84 | RPL18B | 0.92 | CMK2 | 0.64 | UIP4 | 0.67 | TAH18 | 0.46 | GGA1 | 0.66 | RPT2 | 0.92 | COA3 | 0.28 | CRH1 | 0.25 |
| PMT4 | 0.54 | BAT1 | 0.82 | RPS16A | 0.92 | YLR257W | 0.43 | YNL194C | 0.61 | ISF1 | 0.44 | RTC3 | 0.64 | PRE10 | 0.92 | GET2 | 0.24 | YOL019W | 0.23 |
| FEN1 | 0.52 | HIS2 | 0.82 | RPL8A | 0.90 | STF2 | 0.38 | GAD1 | 0.60 | YNR034W-A | 0.25 | YRO2 | 0.62 | RPT5 | 0.92 | BSD2 | 0.23 | RAX2 | 0.20 |
| ALG7 | 0.50 | STR2 | 0.80 | RPS18B | 0.91 | RCN2 | 0.32 | JIP4 | 0.59 | MBR1 | 0.22 | APJ1 | 0.61 | RPN7 | 0.88 | | 0.22 | YOL014W | 0.18 |
| | 0.44 | SDT1 | 0.76 | RPS18A | 0.90 | HAL5 | 0.25 | MSC1 | 0.59 | | 0.20 | | 0.57 | RPN1 | 0.85 | | 0.22 | TGL2 | 0.16 |
| RPL18B | 0.43 | RIB3 | 0.75 | RPL40B | 0.90 | OXR1 | 0.24 | TFS1 | 0.56 | TCB2 | 0.18 | | 0.53 | RPN10 | 0.83 | MIM1 | 0.22 | YPS3 | 0.15 |
| RPL16B | 0.41 | PSF2 | 0.75 | RPL42B | 0.88 | PNC1 | 0.24 | YNL195C | 0.56 | GAT2 | 0.18 | YGR250C | 0.51 | PRE7 | 0.82 | YDR541C | 0.18 | PHM8 | 0.14 |
| YLR412C-A | 0.40 | POS5 | 0.74 | RPS22A | 0.88 | DOA1 | 0.23 | YJR096W | 0.54 | YLR460C | 0.16 | CUR1 | 0.51 | YBR062C | 0.81 | | 0.18 | | 0.12 |
| HXK2 | 0.40 | SRY1 | 0.74 | RPL9A | 0.88 | MRP8 | 0.22 | OM45 | 0.49 | ATR1 | 0.16 | | 0.45 | PUP1 | 0.81 | SLC1 | 0.17 | | 0.11 |

B

| Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. | Target | Prec. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PLB2 | 0.83 | FIG2 | 0.49 | HNM1 | 0.84 | YHR022C | 0.47 | AIM20 | 0.83 | FLO9 | 0.33 | IDH2 | 0.71 | VHR2 | 0.33 | OAC1 | 0.74 | | 0.60 |
| TDA4 | 0.71 | PRM1 | 0.48 | SAH1 | 0.81 | CRS5 | 0.37 | HOF1 | 0.84 | | 0.24 | PDH1 | 0.30 | | 0.33 | BAT1 | 0.72 | | 0.40 |
| YLR413W | 0.66 | SAG1 | 0.46 | FAS1 | 0.81 | | 0.22 | ALK1 | 0.82 | | 0.17 | AAT2 | 0.27 | MET3 | 0.32 | FRS2 | 0.66 | | 0.28 |
| | 0.65 | TIR1 | 0.46 | SAM2 | 0.78 | SNO4 | 0.20 | CLB1 | 0.81 | | 0.15 | | 0.22 | | 0.30 | ILV5 | 0.52 | YLR407W | 0.27 |
| FAS1 | 0.50 | PAU24 | 0.39 | CHO1 | 0.75 | FRE7 | 0.16 | | 0.79 | | 0.15 | | 0.14 | MET10 | 0.28 | LEU1 | 0.49 | | 0.26 |
| | 0.46 | AAC3 | 0.39 | EPT1 | 0.67 | PUG1 | 0.13 | BUD4 | 0.78 | | 0.14 | WTM1 | 0.14 | MET5 | 0.27 | | 0.43 | | 0.24 |
| SUR2 | 0.36 | TIR4 | 0.35 | ADO1 | 0.63 | YEL073C | 0.13 | CDC5 | 0.77 | | 0.14 | IDP1 | 0.13 | MEP2 | 0.27 | MAE1 | 0.33 | | 0.21 |
| | 0.35 | EUG1 | 0.31 | OPI3 | 0.63 | PDC6 | 0.12 | | 0.76 | | 0.14 | | 0.12 | | 0.21 | PAB1 | 0.26 | | 0.20 |
| | 0.35 | TIR2 | 0.27 | EHT1 | 0.59 | GCY1 | 0.11 | KIN3 | 0.75 | | 0.13 | | 0.10 | GNP1 | 0.21 | | 0.25 | | 0.19 |
| | 0.35 | FIG1 | 0.26 | YIP3 | 0.55 | FMP48 | 0.11 | HST3 | 0.67 | | 0.13 | | 0.10 | DAL80 | 0.18 | | 0.22 | MCH5 | 0.16 |
(A) New targets (i.e., interactions not in the Gold Standard) of the ten transcription factors (top row) that are most connected in the Gold Standard. The table also lists the precision values of these interactions, with darker green denoting higher precision.
(B) New targets of ten TFs of medium connectivity. Precision values are calculated using the entire matrix of prediction confidence scores, containing 5,716 genes and 557 TFs. The list of true positives was defined by the Gold Standard. Bold targets correspond to interactions that were not found in any of the four orthogonal datasets listed in Table S1; i.e., these regulatory interactions are entirely new. See Table S3 and Data S1 for more details.
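Each precision value in the table can be read as the precision of the full ranked list of TF-target scores, truncated at that interaction's rank, with Gold Standard pairs counted as true positives. Below is a minimal sketch under that reading, which is an assumption; Table S3 and Data S1 describe the authors' procedure. The names are hypothetical and the TF_A/TF_B data are toy values. For the real matrix, `scores` would hold 5,716 × 557 entries.

```python
def precision_at_rank(scores, gold):
    """Map every (tf, target) pair to the precision of the list cut at its rank.

    scores: dict (tf, target) -> confidence score over the whole matrix
    gold:   set of Gold Standard pairs, used as the true positives
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    tp, out = 0, {}
    for k, pair in enumerate(ranked, start=1):
        tp += int(pair in gold)
        out[pair] = tp / k
    return out


def top_new_targets(scores, gold, tf, n=10):
    """The n highest-scoring targets of `tf` absent from the Gold Standard,
    each with its precision rounded to two decimals as in the table."""
    precision = precision_at_rank(scores, gold)
    ranked = sorted(scores, key=scores.get, reverse=True)
    new = [pair for pair in ranked if pair[0] == tf and pair not in gold]
    return [(pair[1], round(precision[pair], 2)) for pair in new[:n]]


# Toy data, not taken from the paper.
scores = {("TF_A", "g1"): 0.9, ("TF_A", "g2"): 0.8,
          ("TF_B", "g3"): 0.7, ("TF_A", "g4"): 0.6}
gold = {("TF_A", "g1"), ("TF_B", "g3")}
print(top_new_targets(scores, gold, "TF_A"))  # → [('g2', 0.5), ('g4', 0.5)]
```

Because precision is computed over all TFs at once, a new target's value reflects the global confidence of its rank, not how it compares only with other targets of the same TF. This is consistent with the caption's statement that the entire matrix is used.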