Daniela Hombach, Jana Marie Schwarz, Peter N Robinson, Markus Schuelke, Dominik Seelow.
Abstract
BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these matrices accurately represent in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.
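The abstract refers to scoring sequences with binding matrices. As a minimal sketch, assuming a position frequency matrix is converted to log2-odds scores against a uniform background with a fixed pseudocount (our illustrative choices, not necessarily the scoring scheme used by the authors or by any of the three model sources), a sequence can be scored by its best-matching window on either strand. The names `log_odds_matrix`, `max_site_score` and the toy counts are hypothetical.

```python
import math

# Column order of every matrix row: A, C, G, T.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds_matrix(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background and a fixed pseudocount (illustrative choices).
    """
    matrix = []
    for column in counts:
        total = sum(column) + len(BASES) * pseudocount
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in column]
        )
    return matrix


def reverse_complement(seq):
    # Unknown characters (e.g. N) become N so they are skipped during scanning.
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def max_site_score(seq, matrix):
    """Return the best window score over both strands (-inf if no valid window)."""
    width = len(matrix)
    seq = seq.upper()
    best = -math.inf
    for strand in (seq, reverse_complement(seq)):
        for start in range(len(strand) - width + 1):
            window = strand[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing ambiguous bases
            score = sum(matrix[i][BASES.index(b)] for i, b in enumerate(window))
            best = max(best, score)
    return best


# Hypothetical toy counts (not from JASPAR, HT-SELEX or PBM data):
# a 4-bp motif that prefers A, C, G, T at positions 1-4.
toy_counts = [[8, 0, 1, 1], [0, 9, 0, 1], [1, 0, 9, 0], [0, 1, 0, 9]]
pssm = log_odds_matrix(toy_counts)
print(max_site_score("TTACGTTT", pssm) > max_site_score("TTTTTTTT", pssm))  # True
```

A sequence containing the preferred site receives a higher maximum score than one without it; per-sequence maxima of this kind are what a ROC analysis then ranks for positives (ChIP-seq sites) against negatives (exonic sequences).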
Keywords: Genetic variation; PSSM; TFBS prediction; Transcription factor binding sites
Year: 2016 PMID: 27209209 PMCID: PMC4875604 DOI: 10.1186/s12864-016-2729-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Average AUC scores and representative ROC plots for different binding model sources. a Average AUC scores generated for the different binding model sources. b ROC plot for TFAP2C. c ROC plots for TFAP2A for the entire ENCODE test set (left) and the high-confidence set (right). d Underlying TF binding models for TFAP2A
Fig. 2 Direct comparison of binding models generated by different methods. Depicted are AUC scores for TFs stored in both JASPAR (manually curated models) and HT-SELEX. AUC scores were generated using ROCR. If multiple binding models were available for one TF, we depict the average AUC value
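The AUC values above were computed with the R package ROCR. As an illustration only, not a reproduction of ROCR, the AUC can be computed as the probability that a randomly chosen positive (ChIP-seq) sequence outscores a randomly chosen negative (exonic) sequence, with ties counted as one half; averaging over a TF's models mirrors the caption's rule for TFs with several binding models. The function names and input layout are hypothetical.

```python
from statistics import mean


def roc_auc(positive_scores, negative_scores):
    """AUC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    O(n * m); adequate for a sketch, a rank-based version scales better.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


def average_auc_per_tf(model_scores):
    """Map {tf: [(pos_scores, neg_scores), ...]} to {tf: mean AUC over its models}."""
    return {
        tf: mean(roc_auc(pos, neg) for pos, neg in models)
        for tf, models in model_scores.items()
    }


# Toy scores: TF1 has two hypothetical binding models.
example = {"TF1": [([3.0, 4.0], [1.0, 2.0]), ([1.0], [1.0])]}
print(average_auc_per_tf(example))  # {'TF1': 0.75}
```

An AUC of 1.0 means every positive outscores every negative, while 0.5 corresponds to ranking no better than chance.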
Fig. 3 Representative plots for conservation analyses. We determined the maximum phastCons (a) and phyloP (b) scores in each experimentally confirmed binding site of BCL11A (left panel) and ZBTB33 (right panel) and calculated the averages of the maximum scores
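The averaging described for Fig. 3 can be sketched as follows, assuming the per-base phastCons or phyloP values for each experimentally confirmed binding site have already been extracted (retrieving them from the conservation tracks is outside this sketch). The function name is hypothetical.

```python
from statistics import mean


def average_max_conservation(site_scores):
    """Average, across binding sites, of each site's maximum per-base score.

    site_scores: one list of per-base conservation scores (e.g. phastCons or
    phyloP) per binding site; empty sites are ignored.
    """
    maxima = [max(scores) for scores in site_scores if scores]
    if not maxima:
        raise ValueError("no binding site with conservation scores")
    return mean(maxima)


# Hypothetical per-base scores for two binding sites.
print(average_max_conservation([[0.25, 1.0, 0.5], [0.5, 0.75]]))  # 0.875
```

Taking the maximum per site means a single highly conserved base is enough to mark a site as conserved; phyloP values can be negative, which `max` handles without change.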
Fig. 4 Overview of tested TFs for the entire data set (a) and the high-confidence data (b). ENCODE: entire set of ENCODE TFBSs (2012 freeze). High-confidence set: TFs reaching at least 80% of the maximum possible binding score (published by ENCODE) in at least 100 binding instances. Please note that the intersections are not drawn to scale
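One reading of the high-confidence criterion in Fig. 4 is sketched below: a TF is kept if at least 100 of its ChIP-seq binding instances score at or above 80% of the maximum possible ENCODE score. The maximum score is passed in as a parameter rather than assumed, and the input layout and function name are hypothetical.

```python
def high_confidence_tfs(peak_scores, max_score, fraction=0.8, min_sites=100):
    """Return TFs (sorted) with >= min_sites instances scoring >= fraction * max_score.

    peak_scores: {tf: [score of each ChIP-seq binding instance]}.
    """
    threshold = fraction * max_score
    return sorted(
        tf
        for tf, scores in peak_scores.items()
        if sum(1 for s in scores if s >= threshold) >= min_sites
    )


# Hypothetical data with an assumed maximum score of 1000.
demo = {
    "TF_A": [900] * 100,          # exactly 100 instances above threshold: kept
    "TF_B": [900] * 99 + [100],   # only 99 above threshold: dropped
    "TF_C": [850] * 150,          # kept
}
print(high_confidence_tfs(demo, max_score=1000))  # ['TF_A', 'TF_C']
```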