Agnieszka Podsiadło, Mariusz Wrzesień, Wiesław Paja, Witold Rudnicki, Bartek Wilczyński.
Abstract
BACKGROUND: Transcriptional regulation in multicellular organisms is a complex process involving multiple modular regulatory elements for each gene. Building whole-genome models of transcriptional networks requires mapping all relevant enhancers and then linking them to their target genes. Previous methods of enhancer identification, based either on sequence information or on epigenetic marks, have different limitations stemming from the incompleteness of each of these datasets taken separately.
Year: 2013 PMID: 24565409 PMCID: PMC4029456 DOI: 10.1186/1752-0509-7-S6-S16
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Comparison of prediction quality from histone marks. Difference in prediction quality achieved with BNFinder on epigenetic features for datasets of different sizes: (a) 64 examples from [16], AUC of 0.75 on average; (b) 8008 examples from [17], AUC of 0.93 on average. Both results were obtained with cross-validated training.
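The AUC values in Figure 1 can be read as the probability that a classifier scores a randomly chosen enhancer above a randomly chosen non-enhancer. The sketch below computes AUC directly from that pairwise definition, with ties counted as one half; the function name `roc_auc` and the scores are hypothetical illustrations, not BNFinder output.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as one half). Quadratic time; fine for small illustrations."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the gap between 0.75 and 0.93 in Figure 1 is substantial. Production code would typically use a rank-based O(n log n) formulation instead of the double loop.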
Classification using different feature sets and classifiers
| Dataset | BNFinder | SVM | RF |
|---|---|---|---|
| EPI | 0.88 | 0.86 | |
| MOT | 0.5 | 0.87 | |
| ALL | 0.93 | 0.97 | |
Classification with repeat-masked negative sets
| Dataset | SVM | RF |
|---|---|---|
| EPI | 0.88 | 0.87 |
| MOT | 0.95 | |
| ALL | 0.97 | |
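One common way to build a negative set that avoids repetitive DNA is to sample random genomic windows from a soft-masked genome, where repeat-annotated bases are written in lower case, and to reject windows that contain too many masked bases. The sketch below assumes that convention; the function names, the `max_repeat` threshold, and the toy sequence are hypothetical, and the source does not state how its repeat-masked negatives were actually generated. A real pipeline would also exclude known enhancer regions.

```python
import random


def repeat_fraction(seq):
    """Fraction of soft-masked (lower-case) bases in a sequence window."""
    return sum(c.islower() for c in seq) / len(seq) if seq else 0.0


def sample_negatives(genome, width, n, max_repeat=0.0, seed=0, max_tries=10_000):
    """Draw up to n random windows whose repeat content is at most max_repeat.

    Windows containing unknown bases (N/n) are skipped as well.
    Returns (start, upper-cased window) pairs.
    """
    if width > len(genome):
        raise ValueError("window wider than genome")
    rng = random.Random(seed)
    picked = []
    for _ in range(max_tries):
        if len(picked) == n:
            break
        start = rng.randrange(len(genome) - width + 1)
        window = genome[start:start + width]
        if repeat_fraction(window) <= max_repeat and "N" not in window.upper():
            picked.append((start, window.upper()))
    return picked


# Toy soft-masked sequence: the lower-case middle block stands in for a repeat.
genome = "ACGT" * 10 + "acgt" * 10 + "ACGT" * 10
for start, window in sample_negatives(genome, width=8, n=3):
    print(start, window)
```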
Figure 2. Feature importance computed with the Boruta package. Relative importance of different features as computed by the Boruta package [29]. Each boxplot corresponds to one feature and shows its importance z-scores over 500 randomizations. Histone modifications are the most important features (z-score above 10), followed by all motif features (z-score above 3); all of them are clearly separated from the randomized control variables (red, z-scores below 3).
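Boruta compares each real feature against "shadow" features, which are permuted copies that cannot carry signal. In each run, a feature scores a hit if its importance exceeds the best shadow importance; a feature that wins far more often than a fair coin would is confirmed, and one that wins far less often is rejected. The sketch below implements only that decision step on precomputed z-scores. The z-score values, the 20-run count, and `alpha=0.01` are hypothetical choices, and the real package additionally retrains a random forest each run, applies multiple-testing correction, and drops rejected features as it goes.

```python
from math import comb


def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_decision(feature_runs, shadow_max_runs, alpha=0.01):
    """Simplified Boruta-style verdict for a single feature.

    feature_runs:    importance z-score of the feature in each run.
    shadow_max_runs: highest z-score among the shadow (permuted) features
                     in the same run.
    A run is a 'hit' when the feature beats the best shadow. Under the null
    hypothesis of no relevance, hits ~ Binomial(n_runs, 0.5).
    """
    n = len(feature_runs)
    hits = sum(f > s for f, s in zip(feature_runs, shadow_max_runs))
    if binom_sf(hits, n) < alpha:      # many more hits than chance
        return "confirmed"
    if binom_sf(n - hits, n) < alpha:  # many fewer hits than chance (p = 0.5 is symmetric)
        return "rejected"
    return "tentative"


# Hypothetical z-scores over 20 runs, loosely echoing Figure 2's ranges.
shadow_max = [3.0] * 20
print(boruta_decision([10.5] * 20, shadow_max))  # confirmed (histone-like feature)
print(boruta_decision([1.0] * 20, shadow_max))   # rejected (noise-like feature)
```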
Figure 3. Accuracy loss as a function of removing multiple chromatin features.
Figure 4. Accuracy loss as a function of removing single chromatin features.
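Figures 3 and 4 report how much accuracy drops when chromatin features are withheld. The sketch below shows the general ablation procedure: retrain without the chosen feature(s) and subtract the new test accuracy from the full-feature accuracy. For brevity it swaps in a nearest-centroid classifier rather than the BNFinder, SVM, or RF models used in the study; the data, the feature meanings, and the definition of loss as a simple accuracy difference are all hypothetical assumptions.

```python
def fit_centroids(X, y):
    """Per-class mean vector (nearest-centroid classifier)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def accuracy(train, test, keep):
    """Train and test using only the feature columns listed in `keep`."""
    (X_tr, y_tr), (X_te, y_te) = train, test
    cols = lambda X: [[row[j] for j in keep] for row in X]
    model = fit_centroids(cols(X_tr), y_tr)
    preds = [predict(model, x) for x in cols(X_te)]
    return sum(p == t for p, t in zip(preds, y_te)) / len(y_te)


def single_removal_loss(train, test):
    """Accuracy loss when each feature is dropped on its own (cf. Figure 4)."""
    n = len(train[0][0])
    full = accuracy(train, test, range(n))
    return {j: full - accuracy(train, test, [k for k in range(n) if k != j]) for j in range(n)}


def cumulative_removal_loss(train, test, order):
    """Accuracy loss after dropping features one after another (cf. Figure 3)."""
    n = len(train[0][0])
    full = accuracy(train, test, range(n))
    removed, losses = set(), []
    for j in order:
        removed.add(j)
        losses.append(full - accuracy(train, test, [k for k in range(n) if k not in removed]))
    return losses


# Hypothetical features per region: [histone_mark, motif_score, noise].
train = ([[5, 1, 0], [6, 0, 1], [5, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 1]],
         [1, 1, 1, 0, 0, 0])
test = ([[5, 0, 1], [0, 1, 0]], [1, 0])
print(single_removal_loss(train, test))              # {0: 1.0, 1: 0.0, 2: 0.0}
print(cumulative_removal_loss(train, test, [2, 0]))  # [0.0, 1.0]
```

In this toy data only the first column separates the classes, so removing it alone costs all of the accuracy while removing the others costs nothing.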
Validation of classifiers on the REDfly database
| Dataset | REDfly Meso | REDfly |
|---|---|---|
| EPI | 0.77 | 0.62 |
| MOT | 0.74 | |
| ALL | 0.75 | |
Classification quality with different cross-validation schemes
| Dataset | Cross-validation 9:1 | Cross-validation 1:9 |
|---|---|---|
| EPI | 88.2 ± 0.6% | 87.3 ± 0.2% |
| MOT | 89.9 ± 0.9% | 87.2 ± 0.6% |
| ALL | 98.1 ± 0.5% | 97.2 ± 0.4% |
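A plausible reading of the two schemes, assumed here rather than stated in the source, is that the data are split into ten folds and "9:1" trains on nine folds and tests on one (standard 10-fold cross-validation), while "1:9" trains on a single fold and tests on the remaining nine, a much harsher setting with little training data. The sketch below generates both kinds of splits and formats per-fold accuracies as mean ± sample standard deviation; the function names, the seed, and the use of the sample standard deviation are hypothetical choices.

```python
import random
import statistics


def folds(n, k=10, seed=0):
    """Shuffle example indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_splits(n, scheme="9:1", k=10, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold.

    "9:1": train on k-1 folds, test on the held-out fold (standard k-fold).
    "1:9": train on a single fold, test on the remaining k-1 folds.
    """
    if scheme not in ("9:1", "1:9"):
        raise ValueError(f"unknown scheme {scheme!r}")
    fs = folds(n, k, seed)
    for i, fold in enumerate(fs):
        rest = [j for f, other in enumerate(fs) if f != i for j in other]
        yield (rest, fold) if scheme == "9:1" else (fold, rest)


def summarize(accuracies):
    """Format per-fold accuracies as 'mean ± sample SD' in percent."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)
    return f"{100 * mean:.1f} ± {100 * sd:.1f}%"


for scheme in ("9:1", "1:9"):
    train_idx, test_idx = next(cv_splits(100, scheme))
    print(scheme, len(train_idx), len(test_idx))  # 9:1 -> 90 10, 1:9 -> 10 90
print(summarize([0.88, 0.89, 0.87]))  # 88.0 ± 1.0%
```

Under this reading, the small drop from the 9:1 to the 1:9 column suggests that the classifiers remain accurate even when trained on a tenth of the data.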