Konstantin Tretyakov, Sven Laur, Jaak Vilo.
Abstract
Transcription factors are proteins that bind to motifs on the DNA and thus affect gene expression regulation. The qualitative description of the corresponding processes is therefore important for a better understanding of essential biological mechanisms. However, wet lab experiments targeted at the discovery of the regulatory interplay between transcription factors and binding sites are expensive. We propose a new, purely computational method for finding putative associations between transcription factors and motifs. This method is based on a linear model that combines sequence information with expression data. We present various methods for model parameter estimation and show, via experiments on simulated data, that these methods are reliable. Finally, we examine the performance of this model on biological data and conclude that it can indeed be used to discover meaningful associations. The developed software is available as a web tool and Scilab source code at http://biit.cs.ut.ee/gmat/.
Year: 2011 PMID: 21297945 PMCID: PMC3031503 DOI: 10.1371/journal.pone.0014559
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The matrices G (top), M (bottom left) and T (bottom right). Each row of G corresponds to a certain gene, as does each row of M. Each column of G corresponds to a certain experiment, as does each column of T. The rows of M can be regarded as descriptive attributes for the rows of G, and the columns of T as attributes for the columns of G.
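The shapes described in the caption can be made concrete with a small simulation. The sketch below assumes the model reads G ≈ M·A·T, with G (genes × experiments), M (genes × motifs), T (TFs × experiments) and a parameter matrix A (motifs × TFs). It estimates A by ordinary least squares through pseudoinverses, which is an illustrative choice and not necessarily one of the estimation methods used in the paper. All sizes, values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
n_genes, n_motifs, n_tfs, n_experiments = 50, 4, 3, 20

# M: genes x motifs (binary motif presence); T: TFs x experiments (TF expression).
M = rng.integers(0, 2, size=(n_genes, n_motifs)).astype(float)
T = rng.normal(size=(n_tfs, n_experiments))

# A_true: motifs x TFs, a sparse matrix of made-up motif-TF associations.
A_true = np.zeros((n_motifs, n_tfs))
A_true[0, 1] = 1.5
A_true[2, 0] = -0.8

# G: genes x experiments, generated from the linear model plus small noise.
G = M @ A_true @ T + 0.01 * rng.normal(size=(n_genes, n_experiments))

# Least-squares estimate of A (assumption: plain least squares via pseudoinverses).
A_hat = np.linalg.pinv(M) @ G @ np.linalg.pinv(T)

print(np.round(A_hat, 2))
```

With low noise the printed estimate should lie close to `A_true`; on real data, regularization or one of the other estimators mentioned in the abstract would likely be needed.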
G = MAT analysis of the Spellman dataset.
| Motif | TF | Score |
| F$GAL4_01 (Binding site for GAL4) | | 0.30 |
| F$GAL4_01 (Binding site for GAL4) | | 0.26 |
| F$GAL4_01 (Binding site for GAL4) | | 0.18 |
| F$MCM1_02 (Binding site for MCM1 and SFF) | | 0.12 |
| F$MCM1_02 (Binding site for MCM1 and SFF) | | 0.12 |
The table presents the five motif-TF pairs with the largest (most positive) values of the corresponding parameters in the matrix A. Motifs are listed in the leftmost column and are identified by their Transfac identifiers. The middle column contains the TFs, which are identified by their gene names. The rightmost column contains the corresponding parameter values.
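Selecting the highest-scoring pairs from an estimated parameter matrix is a simple ranking step. The following is a minimal sketch, assuming the parameters are held in a NumPy array with motifs as rows and TFs as columns. The function `top_pairs`, the demo matrix and the placeholder names are hypothetical and do not come from the paper's software.

```python
import numpy as np

def top_pairs(A, motif_names, tf_names, k=5):
    """Return the k (motif, TF, score) triples with the largest entries of A."""
    A = np.asarray(A, dtype=float)
    # Sort the flattened indices by descending value and keep the first k.
    order = np.argsort(A, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, A.shape)
    return [(motif_names[i], tf_names[j], float(A[i, j])) for i, j in zip(rows, cols)]

# Made-up 2 x 3 parameter matrix with placeholder names.
A_demo = np.array([[0.30, -0.10, 0.02],
                   [0.05, 0.26, -0.40]])
pairs = top_pairs(A_demo, ["motif_a", "motif_b"], ["tf_x", "tf_y", "tf_z"], k=3)
for motif, tf, score in pairs:
    print(f"{motif}\t{tf}\t{score:.2f}")
```

Because the ranking uses signed values, strongly negative parameters are ignored, which matches the "most positive" wording of the caption.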
Figure 2. The ROC AUC score of different estimation methods, averaged over 100 runs.
Note the increase in performance of the basic techniques brought by the use of randomization and a further increase due to centering. Also note the high performance of the correlation-based estimate.
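An evaluation of this kind can be sketched on simulated data: generate a sparse true parameter matrix, estimate it from noisy expression data, and score how well the estimate ranks true associations above the rest. The sketch below assumes a plain least-squares estimator, so it does not reproduce the randomized, centered or correlation-based variants named in the caption. The simulation sizes, noise level and helper names (`roc_auc`, `simulate_once`) are hypothetical.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def simulate_once(rng, n_genes=40, n_motifs=5, n_tfs=4,
                  n_experiments=15, n_true=3, noise=0.5):
    """One simulated run: generate data from G = M A T + noise and score the estimate."""
    M = rng.integers(0, 2, size=(n_genes, n_motifs)).astype(float)
    T = rng.normal(size=(n_tfs, n_experiments))
    # Sparse true parameters: n_true entries set to +1 or -1, the rest zero.
    A_true = np.zeros(n_motifs * n_tfs)
    idx = rng.choice(A_true.size, size=n_true, replace=False)
    A_true[idx] = rng.choice([-1.0, 1.0], size=n_true)
    A_true = A_true.reshape(n_motifs, n_tfs)
    G = M @ A_true @ T + noise * rng.normal(size=(n_genes, n_experiments))
    # Assumption: least-squares estimate via pseudoinverses.
    A_hat = np.linalg.pinv(M) @ G @ np.linalg.pinv(T)
    # A true association is any nonzero entry; the score is the magnitude of the estimate.
    return roc_auc(A_true.ravel() != 0, np.abs(A_hat).ravel())

rng = np.random.default_rng(0)
aucs = [simulate_once(rng) for _ in range(100)]
mean_auc = float(np.mean(aucs))
print(f"mean ROC AUC over {len(aucs)} runs: {mean_auc:.3f}")
```

Averaging over many independent runs, as in the figure, reduces the effect of any single lucky or unlucky draw of the simulated matrices.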
G = MAT analysis of the Gasch dataset.
| Motif | TF | Score |
| Y$GAL1_15 (Binding site for MIG1) | | 0.63 |
| Y$HSP12_01 (Binding site for ABF1) | | 0.52 |
| Y$HSP12_01 (Binding site for ABF1) | | 0.50 |
| Y$CHA1_04 (Binding site for ABF1) | | 0.50 |
| Y$GAL1_15 (Binding site for MIG1) | | 0.48 |
The table presents the five motif-TF pairs with the largest values of the corresponding parameters in the matrix A. The parameter values are given in the rightmost column.
G = MAT for GO annotation on the Spellman dataset.
| GO term | TF | Score |
| GO:0000747 (Conjugation with cellular fusion) | | 0.07 |
| GO:0043332 (Mating projection tip) | | 0.06 |
| GO:0005762 (Mitochondrial large ribosomal subunit) | | 0.05 |
| GO:0006999 (Nuclear pore organization and biogenesis) | | 0.05 |
| GO:0005763 (Mitochondrial small ribosomal subunit) | | 0.05 |
The table presents the five (GO term, TF) pairs with the largest values of the corresponding parameters in the matrix A.