Trevis M Alleyne1, Lourdes Peña-Castillo, Gwenael Badis, Shaheynoor Talukder, Michael F Berger, Andrew R Gehrke, Anthony A Philippakis, Martha L Bulyk, Quaid D Morris, Timothy R Hughes.
Abstract
MOTIVATION: Recognition of specific DNA sequences is a central mechanism by which transcription factors (TFs) control gene expression. Many TF-binding preferences, however, are unknown or poorly characterized, in part due to the difficulty associated with determining their specificity experimentally, and an incomplete understanding of the mechanisms governing sequence specificity. New techniques that estimate the affinity of TFs to all possible k-mers provide a new opportunity to study DNA-protein interaction mechanisms, and may facilitate inference of binding preferences for members of a given TF family when such information is available for other family members.
Year: 2008 PMID: 19088121 PMCID: PMC2666811 DOI: 10.1093/bioinformatics/btn645
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
List of 75 mouse homeodomains unique at 15 AA positions that contact DNA
| Alx3 | Dobox4 | Hlxb9 | Hoxc12 | Lhx6 | Pax4 | Rhox6 |
| Bapx1 | Dobox5 | Hmbox1 | Hoxc8 | Meis1 | Pax6 | Six1 |
| Barhl1 | Duxl | Hmx1 | Ipf1 | Meox1 | Pax7 | Six3 |
| Barx1 | Emx2 | Hmx2 | Irx2 | Msx1 | Pbx1 | Six4 |
| Bsx | En1 | Homez | Irx3 | Nkx1-1 | Pitx1 | Tcf1 |
| Cdx1 | Esx1 | Hoxa1 | Isl2 | Nkx2-2 | Pknox1 | Tcf2 |
| Cphx | Evx1 | Hoxa10 | Isx | Nkx6-1 | Pou1f1 | Tgif1 |
| Crx | Gsc | Hoxa13 | Lbx2 | Obox1 | Pou2f1 | Tgif2 |
| Cutl1 | Gsh2 | Hoxa2 | Lhx1 | Obox6 | Pou4f3 | Tlx2 |
| Dbx1 | Hdx | Hoxa6 | Lhx2 | Og2x | Pou6f1 | |
| Dlx1 | Hlx1 | Hoxb13 | Lhx3 | Otp | Rhox11 | |
Leave-one-out cross-validation measures for 8mer Z-score profile prediction algorithms on 32 896 8mers for 75 homeodomains
Algorithms are sorted in descending order of median rank across all columns, where ties are resolved using mean rank. The first row shows the agreement between 19 experimental replicates and their corresponding true Z-score profiles as measured using PBM. Columns labelled ‘predicted versus real’ show the mean or median performance between each predicted profile and its true, measured Z-score profile. Columns labelled ‘control’ show the difference between the median predicted versus real performance and the median of the performance between all pairs of predicted and actual profiles. Cells in a given column are coloured according to their position in the range of that column. Rows labelled top6 and top15 show the results obtained using the 6 and 15 most important amino acid positions, respectively, according to the RF importance score on the 57AA set.
Fig. 1. 2D clustergram of Z-scores for 2042 8mers and 75 mouse homeodomains, as observed in either real PBM data (left) or NN predictions (right), with some of the established classes of homeodomains labelled. NN predictions were made using 6AA positions and leave-one-out cross-validation. The 2042 8mers were selected because they comprise the top 100 8mers by Z-score over the DBDs shown.
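One plausible reading of this selection rule is that the 8mers shown are the union of each DBD's own top 100 8mers by Z-score. The sketch below implements that reading only. The function name `union_of_top_kmers` and the toy Z-scores are hypothetical, and the caption does not confirm that this is exactly how the 2042 8mers were chosen.

```python
def union_of_top_kmers(zscores, top_n=100):
    """Return the set of k-mers ranked in the top `top_n` for at least one DBD.

    `zscores` maps a DBD name to a dict of {kmer: Z-score}.
    Assumption: the selection is a per-DBD top-N followed by a union.
    """
    selected = set()
    for profile in zscores.values():
        ranked = sorted(profile, key=profile.get, reverse=True)
        selected.update(ranked[:top_n])
    return selected


# Toy example with made-up Z-scores and top_n=2 instead of 100.
toy_zscores = {
    "Hoxa2": {"TAATTAAA": 9.1, "TTAATTAA": 8.0, "ACGTACGT": 0.2},
    "Irx3": {"TAATTAAA": 6.5, "ACATGTAA": 7.0, "ACGTACGT": 0.1},
}
toy_selected = union_of_top_kmers(toy_zscores, top_n=2)
```

Because related homeodomains share many high-scoring 8mers, a union of this kind is far smaller than 75 × 100 entries.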
Fig. 2. Comparison of the accuracy of NN predictions versus experimental replicates. Scatterplots show the measured Z-scores for all 32 896 non-redundant eight-base DNA sequences from one PBM versus a second PBM for the same DBD (top) or versus the Z-score predicted using NN (6AA variant; bottom). Median performance metrics are given. Evx1 has a single NN (Hoxa2); Irx2 has a single NN (Irx3); Lhx1 has two NN (Alx3 and Lhx3).
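The figure of 32 896 is what one obtains when each eight-base sequence is treated as equivalent to its reverse complement, which is assumed here to be the meaning of "non-redundant". The sketch below checks this by enumeration. The function names are illustrative only.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Pick one representative for a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_nonredundant(k):
    """Count k-mers after collapsing each one with its reverse complement."""
    return len({canonical("".join(bases)) for bases in product("ACGT", repeat=k)})


# Closed form for even k: the 4**(k/2) palindromic k-mers are their own
# reverse complement, so the count is (4**k + 4**(k/2)) / 2.
n_8mers = count_nonredundant(8)
closed_form = (4**8 + 4**4) // 2
```

Both the enumeration and the closed form give 32 896 for k = 8, since 256 of the 65 536 possible 8mers are reverse-complement palindromes.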
Fig. 3. Node purity importance scores for 57 homeodomain amino acid positions for 75 rounds of leave-one-out cross-validation, sorted by median value (purple).
Fig. 4. Association between top-100 overlap scores for pairs of 8mer profile inference methods. Scatterplots show the top-100 overlap values for 75 homeodomains when Z-score profiles are predicted using one inference method versus another method for the same proteins. All axes range from 0 to 100. The names on the diagonal label the axes. Predictions are made using the 15 homeodomain DNA-contacting residues. Homeodomains are coloured according to whether they have ≥5 (red), 3–4 (blue) or 1–2 (green) mismatches to their nearest sequence neighbour.
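The two quantities in this caption can be sketched as follows. The sketch assumes that the top-100 overlap is the number of 8mers shared between the 100 highest-scoring 8mers of two profiles, which is consistent with the 0 to 100 axis range, and that mismatches are counted position by position over the DNA-contacting residues. All function names and the toy sequences are hypothetical.

```python
def top_n_overlap(profile_a, profile_b, n=100):
    """Number of k-mers shared by the top-n lists of two {kmer: Z-score} profiles."""
    def top(profile):
        return set(sorted(profile, key=profile.get, reverse=True)[:n])
    return len(top(profile_a) & top(profile_b))


def mismatches(seq_a, seq_b):
    """Count positions at which two equal-length residue strings differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("residue strings must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nearest_neighbour_mismatches(name, residues):
    """Fewest mismatches between `name` and any other protein in `residues`."""
    return min(
        mismatches(residues[name], seq)
        for other, seq in residues.items()
        if other != name
    )


def colour_class(n_mismatches):
    """Colour bins as given in the caption: >=5 red, 3-4 blue, 1-2 green."""
    if n_mismatches >= 5:
        return "red"
    if n_mismatches >= 3:
        return "blue"
    if n_mismatches >= 1:
        return "green"
    raise ValueError("proteins are expected to differ at one position or more")


# Toy data (made up): 5-residue strings stand in for the 15 contact positions.
toy_residues = {"P1": "RKQNV", "P2": "RKQNA", "P3": "AAAAA"}
toy_colours = {
    name: colour_class(nearest_neighbour_mismatches(name, toy_residues))
    for name in toy_residues
}
toy_overlap = top_n_overlap(
    {"AAAA": 3, "CCCC": 2, "GGGG": 1},
    {"AAAA": 1, "CCCC": 3, "GGGG": 2},
    n=2,
)
```

The zero-mismatch case raises an error because the 75 homeodomains in the table above are stated to be unique at the 15 DNA-contacting positions.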