Chengqi Wang, Michael Q Zhang, Zhihua Zhang.
Abstract
As a class of cis-regulatory elements, enhancers were first identified nearly 30 years ago as genomic regions able to markedly increase the transcription of genes. Enhancers can regulate gene expression in a cell-type-specific and developmental-stage-specific manner. Although experimental technologies have been developed to identify enhancers genome-wide, the design principles of these regulatory elements and the way they rewire the transcriptional regulatory network in space and time are far from clear. At present, developing predictive methods for enhancers, particularly for their cell-type-specific activity, is a central task in computational biology. In this review, we survey current computational approaches for active enhancer prediction and discuss future directions.
Year: 2013 PMID: 23685394 PMCID: PMC4357786 DOI: 10.1016/j.gpb.2013.04.002
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Features used in enhancer prediction algorithms. Comparative genomic features are usually derived from comparisons between DNA sequences of closely related species. TF binding features come from two sources: known TF binding motifs and ChIP experiments. Epigenetic features can be measured by various technologies. See the main text for details.
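Sequence-based TF binding features are often derived by scanning candidate regions with position weight matrices (PWMs) built from known motifs. The sketch below illustrates that idea under stated assumptions: the 4-bp count matrix, uniform background, pseudocount, and score threshold are all hypothetical, only the forward strand is scanned, and the helper names (`log_odds_pwm`, `scan`, `motif_hit_count`) are illustrative rather than taken from any method in the table. The resulting hit count is one possible per-region feature.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix for a 4-bp motif
# (rows = motif positions, columns = counts of A, C, G, T).
PFM = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_pwm(pfm, background=0.25, pseudocount=0.5):
    """Convert counts to log2(p / background) scores, smoothing with a pseudocount."""
    pwm = []
    for row in pfm:
        total = sum(row) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in row])
    return pwm


def scan(seq, pwm):
    """Score every forward-strand window of length len(pwm)."""
    width = len(pwm)
    scores = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            scores.append(float("-inf"))  # windows with N or other symbols never count as hits
            continue
        scores.append(sum(pwm[j][BASES.index(b)] for j, b in enumerate(window)))
    return scores


def motif_hit_count(seq, pwm, threshold):
    """Per-region feature: number of windows scoring at or above the threshold."""
    return sum(1 for s in scan(seq.upper(), pwm) if s >= threshold)


pwm = log_odds_pwm(PFM)
print(motif_hit_count("ttACGTggACGTaaAGGT", pwm, threshold=4.0))  # → 2
```

With this toy matrix only exact `ACGT` windows clear the threshold of 4.0; a real pipeline would also scan the reverse complement and calibrate the threshold against a background model.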
Features of computational methods for enhancer prediction
| Feature | Method |
|---|---|
| Comparative genomic features | Aparicio’s method |
| | Visel’s method (2008) |
| | Chen’s method |
| | Yip’s method |
| Sequence-based TF binding related features | Narlikar’s method |
| | Chen’s method |
| | Lee’s method |
| | Yip’s method |
| Experiment-based TF binding related features | Visel’s method (2009) |
| | Zinzen’s method |
| | May’s method |
| | Chen’s method |
| Epigenetic features | Heintzman’s method |
| | Won’s method |
| | Firpi’s method |
| | SEGWAY |
| | Kharchenko’s method |
| | He’s method |
| | Ernst’s method |
| | ChromaGenSVM |
| | Yip’s method |
| | Chen’s method |
| | Bonn’s method |
Note: Some studies employed more than one type of feature to build an enhancer recognition model. For example, Chen et al. used all four types of features to develop an active enhancer recognition model [8].
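Several methods in the table work on epigenetic features by segmenting the genome into hidden chromatin states; Won’s method, for instance, is listed as a hidden Markov model in the performance table. A minimal Viterbi decoding sketch follows, assuming two hidden states (background and enhancer), genomic bins summarized as presence/absence calls for two hypothetical histone marks, and made-up start, transition, and emission probabilities. It illustrates the technique only and does not reproduce the parameters or model structure of any published method.

```python
import math

STATES = ["background", "enhancer"]

# All probabilities are illustrative values, not fitted parameters.
START = {"background": 0.9, "enhancer": 0.1}
TRANS = {
    "background": {"background": 0.95, "enhancer": 0.05},
    "enhancer": {"background": 0.2, "enhancer": 0.8},
}
# Probability that each of two hypothetical binarized marks is present in a bin,
# given the hidden state; marks are treated as independent Bernoulli variables.
EMIT_P = {"background": [0.05, 0.10], "enhancer": [0.80, 0.60]}


def log_emission(state, obs):
    """Log-probability of one bin's binary observation vector under a state."""
    return sum(math.log(p if x else 1.0 - p) for p, x in zip(EMIT_P[state], obs))


def viterbi(observations):
    """Most probable hidden-state path for a sequence of genomic bins (log space)."""
    v = [{s: math.log(START[s]) + log_emission(s, observations[0]) for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        v.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for s at position t.
            prev, score = max(
                ((r, v[t - 1][r] + math.log(TRANS[r][s])) for r in STATES),
                key=lambda pair: pair[1],
            )
            v[t][s] = score + log_emission(s, observations[t])
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


bins = [(0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 0)]
print(viterbi(bins))
# → ['background', 'background', 'enhancer', 'enhancer', 'enhancer', 'background', 'background']
```

Working in log space avoids numerical underflow when the product of many per-bin probabilities is taken over long genomic regions.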
Figure 2. Flow scheme of model building. To improve model interpretability and reduce overfitting, more sophisticated computational strategies apply a feature selection algorithm to choose a subset of relevant features for model building. An appropriate classification model is then used to distinguish active enhancers from non-enhancers. There are two major classes of classification model. Discriminative models find the optimal classification boundary in the feature space (lower left panel). Probabilistic graphical models use a graph to model the joint distribution of states and their associated features (lower right panel). ANN, artificial neural network; BN, Bayesian network; HMM, hidden Markov model; SVM, support vector machine.
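The two steps in Figure 2 (feature selection, then a discriminative classifier) can be sketched as follows. This is an illustration under stated assumptions, not any listed method: features are ranked with a simple filter score (absolute difference in class means divided by the pooled standard deviation), and the classifier is logistic regression trained by batch gradient descent. The toy feature matrix, labels, learning rate, and epoch count are hypothetical.

```python
import math


def select_features(X, y, k):
    """Filter-style selection: rank features by |mean_pos - mean_neg| / pooled std."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
        var = (sum((v - mp) ** 2 for v in pos) + sum((v - mn) ** 2 for v in neg)) / len(X)
        scores.append(abs(mp - mn) / math.sqrt(var + 1e-9))
    return sorted(range(len(X[0])), key=lambda j: scores[j], reverse=True)[:k]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xi in enumerate(row):
                grad_w[j] += err * xi / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, row, cutoff=0.5):
    """1 = predicted active enhancer, 0 = non-enhancer."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= cutoff)


# Hypothetical training regions: columns 0 and 2 separate the classes, column 1 is noise.
X = [[2.1, 0.3, 1.8], [1.9, 0.7, 2.2], [2.4, 0.5, 1.5],
     [0.2, 0.6, 0.1], [0.4, 0.2, 0.3], [0.1, 0.4, 0.2]]
y = [1, 1, 1, 0, 0, 0]

keep = select_features(X, y, k=2)  # → [0, 2]
X_sel = [[row[j] for j in keep] for row in X]
w, b = train_logistic(X_sel, y)
print(keep, predict(w, b, [2.0, 1.9]), predict(w, b, [0.3, 0.2]))  # → [0, 2] 1 0
```

A filter score is used here because it is cheap and independent of the classifier; wrapper and embedded selection schemes are common alternatives.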
Model building strategies and performance of enhancer prediction methods
| Category | Method | Operational model | Positive predictive value (%) | Validation criterion |
|---|---|---|---|---|
| Discriminative model | Heintzman’s method | Thresholds of histone modification profiles | 39.5 | Mapped to distal p300 binding sites in HeLa cells |
| | Visel’s method (2009) | Thresholds of p300 binding profiles | 87.7 | With reproducible enhancer activity in transgenic mouse |
| | Narlikar’s method | Linear regression | 62 | With reproducible enhancer activity |
| | Zinzen’s method | Support vector machine | 71.4 | With reproducible enhancer activity in transgenic |
| | Firpi’s method | Time-delay neural network | 66.3 | Overlapped with p300 binding sites, DNase I hypersensitivity sites or TRAP220 binding sites in HeLa cells |
| | Lee’s method | Support vector machine | 74.5 | Overlapped with DNase I hypersensitive enhancers in embryonic mouse whole brain cells |
| | ChromaGenSVM | Support vector machine | 57 | Overlapped with p300 binding sites, DNase I hypersensitivity sites or TRAP220 binding sites in HeLa cells |
| Probabilistic graphical model | Won’s method | Hidden Markov model | 54.8 | Overlapped with p300 binding sites, DNase I hypersensitivity sites or TRAP220 binding sites in HeLa cells |
| | Bonn’s method | Bayesian network | 78 | Overlapped with previously identified TF binding sites in |
| Other | Chen’s method | Multinomial logistic | 83 | Overlapped with at least one TF peak from 7 mouse embryonic stem cell ChIP-seq datasets |
| | Yip’s method | Random forest | 67 | With enhancer activity |
Note: The performance values shown are those reported in the original studies, based on comparison with experimental results. Positive predictive value (%) = TP / (TP + FP) × 100, where TP is the number of true positives and FP the number of false positives.
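Heintzman’s and Visel’s (2009) methods are described above as thresholds on histone modification or p300 binding profiles, and every method is scored by PPV. The sketch below shows both ideas together, assuming one hypothetical signal value per candidate region, a hypothetical threshold, and made-up validation outcomes; it illustrates the PPV formula in the note rather than reproducing any reported figure.

```python
def call_by_threshold(signals, threshold):
    """Predict a region as an enhancer if its signal meets the threshold."""
    return [s >= threshold for s in signals]


def positive_predictive_value(predicted, validated):
    """PPV (%) = TP / (TP + FP) * 100, computed over regions predicted positive."""
    tp = sum(1 for p, v in zip(predicted, validated) if p and v)
    fp = sum(1 for p, v in zip(predicted, validated) if p and not v)
    if tp + fp == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return 100.0 * tp / (tp + fp)


# Hypothetical per-region signal values and experimental validation outcomes.
signal = [12.0, 3.5, 9.8, 15.2, 1.1, 7.4, 11.0, 0.4]
validated = [True, False, True, True, False, False, False, False]

predicted = call_by_threshold(signal, threshold=7.0)
print(positive_predictive_value(predicted, validated))  # → 60.0
```

Here 5 regions pass the threshold and 3 of them are validated, so PPV = 3 / (3 + 2) × 100 = 60%. PPV ignores missed enhancers (false negatives), so on its own it says nothing about sensitivity.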