Pei-Chen Peng1,2, Pierre Khoueiry3,4, Charles Girardot3, James P Reddington3, David A Garfield3,5, Eileen E M Furlong3, Saurabh Sinha1,6.
Abstract
Transcription factor (TF) binding is determined by sequence as well as chromatin accessibility. Although the role of accessibility in shaping TF-binding landscapes is well recorded, its role in evolutionary divergence of TF binding, which in turn can alter cis-regulatory activities, is not well understood. In this work, we studied the evolution of genome-wide binding landscapes of five major TFs in the core network of mesoderm specification, between Drosophila melanogaster and Drosophila virilis, and examined its relationship to accessibility and sequence-level changes. We generated chromatin accessibility data from three important stages of embryogenesis in both Drosophila melanogaster and Drosophila virilis and recorded conservation and divergence patterns. We then used multivariable models to correlate accessibility and sequence changes to TF-binding divergence. We found that accessibility changes can in some cases, for example, for the master regulator Twist and for earlier developmental stages, more accurately predict binding change than is possible using TF-binding motif changes between orthologous enhancers. Accessibility changes also explain a significant portion of the codivergence of TF pairs. We noted that accessibility and motif changes offer complementary views of the evolution of TF binding and developed a combined model that captures the evolutionary data much more accurately than either view alone. Finally, we trained machine learning models to predict enhancer activity from TF binding and used these functional models to argue that motif and accessibility-based predictors of TF-binding change can substitute for experimentally measured binding change, for the purpose of predicting evolutionary changes in enhancer activity.
Keywords: cis-regulatory evolution; chromatin accessibility; enhancer activity; interspecies; sequence motif; transcription factor binding
Year: 2019 PMID: 31114856 PMCID: PMC6601868 DOI: 10.1093/gbe/evz103
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1.—Examining evolutionary changes in TF binding and accessibility across developmental time points. (A) Regulatory network of five key TFs in mesoderm specification. Source: Khoueiry et al. (2017). (B) Data from Drosophila melanogaster and Drosophila virilis TF ChIP and DNase I hypersensitivity assays were collected; D. virilis DHS data were generated for this study. Colored boxes indicate time points (TP1–5) for which each type of genomic profile is available. Orthologous developmental stages between species were mapped according to hours of development in each species, after egg laying (AEL). (C, D) Pairwise Pearson correlations of interspecies ChIP changes, sorted by TF (C) or by time points (D). (E) Normalized accessibility scores of orthologous enhancers for three time points (TP1, 3, and 5). Colors indicate point density, with warmer colors denoting greater density. Pearson correlations between D. melanogaster TF ChIP and D. virilis TF ChIP are also shown. (F) Pairwise Pearson correlations of interspecies accessibility changes. Data and analysis shown in (C–E) pertain to over 2,500 pairs of putative orthologous enhancers involved in mesoderm specification as defined in text.
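The pairwise correlations in panels (C), (D), and (F) can be illustrated with a minimal sketch. It assumes the interspecies change is a simple per-enhancer difference (D. virilis minus D. melanogaster), which the caption does not state. The condition names and scores below are hypothetical toy values, not data from the study.

```python
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def interspecies_change(mel_scores, vir_scores):
    # ASSUMPTION: change is defined as D. virilis minus D. melanogaster.
    return [v - m for m, v in zip(mel_scores, vir_scores)]

# Hypothetical ChIP scores for four orthologous enhancer pairs per condition:
# (D. melanogaster scores, D. virilis scores).
chip = {
    "cond_A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "cond_B": ([0.5, 0.5, 0.5, 0.5], [2.5, 4.5, 6.5, 8.5]),
    "cond_C": ([4.0, 4.0, 4.0, 4.0], [3.0, 2.0, 1.0, 0.0]),
}

# One vector of interspecies changes per condition.
delta = {name: interspecies_change(mel, vir) for name, (mel, vir) in chip.items()}

# Pearson correlation of the change vectors for every pair of conditions.
pairwise = {
    (a, b): pearson(delta[a], delta[b]) for a, b in combinations(sorted(delta), 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
```

In this toy input, two conditions change in the same direction across enhancers and the third changes in the opposite direction, so the matrix contains both strongly positive and strongly negative entries.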
Fig. 3.—Changes in motif presence and accessibility are both predictive of TF occupancy change. (A) Correlation between measured ΔChIP and ΔChIP predicted by models based on motif presence changes, denoted by “pΔChIP(ΔSTAP).” For each TF-TP condition, average PCC from 5-fold cross-validation is reported. (B) Similar to (A), except that the ΔChIP predictions are now based on changes in accessibility, denoted by “pΔChIP(ΔAcc).” These values are similar to those reported in figure 2, but with slightly modified models (see text). (C) Comparison of motif-based models (x axis) and accessibility-based models (y axis). P values of PCC (r) with sample size of 2,754 are also shown. (D, E) Predictions of ΔChIP based on both motif changes and accessibility changes, denoted by “pΔChIP(ΔSTAP + ΔAcc),” are better than using only motif changes (D) or only accessibility changes (E).
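The evaluation scheme in this caption (average held-out PCC from 5-fold cross-validation, for motif-only, accessibility-only, and combined predictors) can be sketched as follows. The model form is not given in this excerpt, so ordinary least squares is swapped in purely as a stand-in. The data are synthetic, and every name (`d_stap`, `d_acc`, `d_chip`, `cv_pcc`) is hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def solve(a, b):
    """Solve a square linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(w, rows):
    return [w[0] + sum(wi * v for wi, v in zip(w[1:], r)) for r in rows]

def cv_pcc(rows, y, folds=5, seed=0):
    """Average PCC between measured and predicted targets on held-out folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    pccs = []
    for f in range(folds):
        test = idx[f::folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        w = fit_linear([rows[i] for i in train], [y[i] for i in train])
        pred = predict(w, [rows[i] for i in test])
        pccs.append(pearson([y[i] for i in test], pred))
    return sum(pccs) / folds

# SYNTHETIC stand-in data: the target depends on both predictors plus noise.
rng = random.Random(1)
n = 1000
d_stap = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical motif-score change
d_acc = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical accessibility change
d_chip = [0.8 * s + 0.8 * a + rng.gauss(0, 1) for s, a in zip(d_stap, d_acc)]

pcc_motif = cv_pcc([[s] for s in d_stap], d_chip)
pcc_acc = cv_pcc([[a] for a in d_acc], d_chip)
pcc_both = cv_pcc([list(p) for p in zip(d_stap, d_acc)], d_chip)
print(round(pcc_motif, 2), round(pcc_acc, 2), round(pcc_both, 2))
```

Because the synthetic target is built from both predictors, the combined model scores higher than either single-predictor model here. That outcome is a property of the toy data and is not evidence about the real measurements.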
Classifiers Trained from Combinatorial Transcription Factor Binding Data Can Accurately Predict Enhancer Activities
| | Meso | VM | SM |
|---|---|---|---|
| TN | 100 | 145 | 144 |
| FN | 13 | 21 | 16 |
| TP | 89 | 44 | 50 |
| FP | 31 | 23 | 23 |
| Balanced accuracy | 0.82 | 0.77 | 0.81 |
Note.—Balanced accuracy from leave-one-out cross-validation is shown for models built for each activity class: mesoderm (“Meso”), visceral muscle (“VM”), and somatic muscle (“SM”). Models were trained (and tested) on 223 experimentally characterized enhancers in Drosophila melanogaster; for each activity class, enhancers with that activity were positives, whereas enhancers of the other two classes were negatives. The numbers of correctly and incorrectly classified enhancers for each model are listed.
TN, true negative; FN, false negative; TP, true positive; and FP, false positive.
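As a worked check, balanced accuracy can be recomputed from the counts in the table. The sketch assumes the count rows appear in the order the legend lists them (TN, FN, TP, FP) and uses the standard definition of balanced accuracy as the mean of sensitivity and specificity.

```python
# Confusion-matrix counts copied from the table above, one entry per class.
# ASSUMPTION: the count rows are ordered TN, FN, TP, FP, as in the legend.
counts = {
    "Meso": {"TN": 100, "FN": 13, "TP": 89, "FP": 31},
    "VM": {"TN": 145, "FN": 21, "TP": 44, "FP": 23},
    "SM": {"TN": 144, "FN": 16, "TP": 50, "FP": 23},
}

def balanced_accuracy(tn, fn, tp, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

scores = {
    name: round(balanced_accuracy(c["TN"], c["FN"], c["TP"], c["FP"]), 2)
    for name, c in counts.items()
}
print(scores)  # {'Meso': 0.82, 'VM': 0.77, 'SM': 0.81}
```

Under that row order, the recomputed values match the balanced accuracies reported in the last row of the table.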
Fig. 2.—Accessibility changes alone are modest predictors of TF occupancy changes between species. (A) Scatter plot of Drosophila melanogaster ChIP scores versus Drosophila virilis ChIP scores for Twi at TP1. Points represent orthologous enhancers that are accessible in at least one species. Colors indicate change of accessibility score. (B) Correlation coefficient between measured ΔChIP and ΔChIP predicted based on ΔAcc, denoted as “pΔChIP(ΔAcc).” P values of PCC (r) with sample size of 2,754 are also shown. (C) Scatter plot of ΔChIP versus pΔChIP for Twi at TP1. Warmer colors indicate greater point density. (D) Correlation between ChIP and accessibility in D. melanogaster (x axis) is compared with correlation between interspecies ΔChIP and pΔChIP(ΔAcc).
Fig. 4.—A strategy to assess predictions of binding change through the lens of enhancer activity. (A) Change in regulatory activity between orthologous enhancers is estimated from difference between output scores of activity classifiers that use Drosophila melanogaster and Drosophila virilis ChIP profiles, respectively, as input. (B) An alternative estimate of change in regulatory activity between orthologous enhancers, similar to strategy in (A), except that D. virilis activity classifier uses “imputed” D. virilis ChIP profiles as input. Imputation of D. virilis ChIP scores is based on D. melanogaster ChIP scores and ΔChIP scores predicted from motif- and/or accessibility-level interspecies changes.
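The two estimates described in this caption can be sketched in a few lines. The classifier here is a hypothetical logistic scorer with made-up weights, since the actual activity classifiers are not specified in this excerpt. The sign conventions (ΔChIP as D. virilis minus D. melanogaster, activity change as D. melanogaster minus D. virilis) are likewise assumptions made for illustration.

```python
import math

# HYPOTHETICAL classifier parameters, used for illustration only.
WEIGHTS = [1.2, -0.4, 0.8]
BIAS = -0.5

def activity_score(chip_profile):
    """Stand-in activity classifier: logistic score computed from a ChIP profile."""
    z = BIAS + sum(w * c for w, c in zip(WEIGHTS, chip_profile))
    return 1.0 / (1.0 + math.exp(-z))

def impute_virilis(mel_chip, predicted_delta):
    # ASSUMPTION: delta ChIP is defined as D. virilis minus D. melanogaster,
    # so the imputed D. virilis profile is the D. melanogaster profile plus delta.
    return [m + d for m, d in zip(mel_chip, predicted_delta)]

def activity_change(mel_chip, vir_chip):
    # ASSUMPTION: activity change is the D. melanogaster score minus the D. virilis score.
    return activity_score(mel_chip) - activity_score(vir_chip)

# Toy ChIP profiles for one orthologous enhancer pair (three TF conditions).
mel = [2.0, 1.0, 0.5]
vir = [1.0, 1.5, 0.5]
predicted_delta = [-0.8, 0.4, 0.1]  # hypothetical output of a binding-change model

# Strategy (A): both classifiers receive measured ChIP profiles.
delta_ac_measured = activity_change(mel, vir)

# Strategy (B): the D. virilis classifier receives the imputed profile instead.
delta_ac_imputed = activity_change(mel, impute_virilis(mel, predicted_delta))

print(round(delta_ac_measured, 3), round(delta_ac_imputed, 3))
```

If the predicted binding change equals the true change, the imputed profile reproduces the measured one and the two estimates coincide. The closer the two estimates are across many enhancer pairs, the better predicted binding change substitutes for measured binding change.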
Changes in Motif Presence and Accessibility Can Be Used to Predict Enhancer Activity Change
| (A) PCC | Meso | VM | SM |
|---|---|---|---|
| ΔSTAP | 0.33 | 0.42 | 0.36 |
| ΔAcc | 0.21 | 0.30 | 0.33 |
| ΔSTAP and ΔAcc | 0.30 | 0.47 | 0.37 |
| Random control | 0.07 | 0.16 | 0.09 |
| #Samples | 110 | 110 | 56 |
| (B) AUROC | | | |
| ΔSTAP | 0.65 | 0.70 | 0.82 |
| ΔAcc | 0.57 | 0.68 | 0.70 |
| ΔSTAP and ΔAcc | 0.65 | 0.75 | 0.78 |
| Random control | 0.56 | 0.59 | 0.57 |
| #Samples | 110 | 110 | 56 |
Note.—(A) PCCs between two different estimates of activity change: ΔAC, based on measured Drosophila virilis ChIP score profiles, and an alternative estimate based on D. virilis ChIP score profiles imputed from Drosophila melanogaster scores and predictions of binding change (ΔChIP), which in turn were made from changes in sequence (“ΔSTAP”), accessibility (“ΔAcc”), or both. As a random control baseline, we used D. virilis ChIP scores imputed from D. melanogaster scores and a permuted version of the ΔChIP matrix. (B) AUROC values representing how well the imputation-based estimates can classify high versus low ΔAC enhancer pairs. These analyses were performed for three expression domains: mesoderm (Meso), visceral muscle (VM), and somatic muscle (SM). Enhancer pairs that exhibited the highest and lowest ΔAC values (in the top and bottom 10 percentiles for classes “Meso” and “VM,” and in the top and bottom 5 percentiles for class “SM”) are reported.
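The AUROC evaluation in (B) can be illustrated with a small sketch: enhancer pairs in the top and bottom tails of ΔAC become the "high" and "low" classes, and the imputation-based estimate serves as the classification score. The sketch assumes that signed ΔAC values are ranked (not absolute values), which the note does not specify. All numbers are toy values, and the helper names are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def top_bottom_indices(values, fraction):
    """Indices of the lowest and highest `fraction` of values (at least one each)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    k = max(1, int(len(values) * fraction))
    return order[:k], order[-k:]

# Toy activity-change estimates for ten orthologous enhancer pairs.
delta_ac = [0.9, -0.8, 0.1, 0.7, -0.6, 0.0, 0.2, -0.1, 0.05, -0.3]  # from measured ChIP
estimate = [0.5, -0.4, 0.3, -0.3, -0.2, 0.1, 0.0, 0.2, -0.1, 0.0]   # from imputed ChIP

# ASSUMPTION: signed delta AC values are ranked; 20% tails are used here only
# because the toy data set is tiny.
low, high = top_bottom_indices(delta_ac, 0.2)

scores = [estimate[i] for i in high + low]
labels = [1] * len(high) + [0] * len(low)
print(auroc(scores, labels))  # 0.75
```

A permuted control in the spirit of the note would shuffle the predicted binding changes before imputation and repeat the same calculation to obtain a baseline AUROC.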