Krishna B S Swamy, Chung-Yi Cho, Sufeng Chiang, Zing Tsung-Yeh Tsai, Huai-Kuang Tsai.
Abstract
Transcription factors (TFs) regulate gene expression by binding to specific binding sites (TFBSs) in gene promoters. TFBS motifs may contain one or more variable positions. Although the prevailing assumption is that nucleotide variants at such positions are functionally equivalent, there is increasing evidence that such variants play a role in regulation of gene expression. In this article, we propose a method for studying the relationship between the expression of target genes and nucleotide variants in TFBS motifs at a genome-wide scale in Saccharomyces cerevisiae, especially the combinatorial effects of variants at two positions. Our analysis shows that nucleotide variations in more than one-third of variable positions and in 20% of dependent position pairs are highly correlated with gene expression. We define such positions as 'functional'. However, some positions are only functional as dependent pairs, but not individually. In addition, a significant proportion of the functional positions have been well conserved across all yeast-related species studied. We also find that some positions require the presence of co-occurring TFs, while others maintain their functionality in the absence of a co-occurring TF. Our analysis supports the importance of nucleotide variants at variable positions of TFBSs in gene regulation.
Year: 2009 PMID: 19767613 PMCID: PMC2790881 DOI: 10.1093/nar/gkp743
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of the proposed method: (a) the TFBSs were downloaded from MYBS (using PHO4 as an example); (b) the target genes are grouped into two groups, b and ¬b, according to the nucleotide at a certain variable position; (c) the target genes are grouped into two groups (b and ¬b) by considering dependent position pairs; (d) the Pearson correlation coefficient for any two genes in the same group was calculated; and (e) the KS test was used to determine whether the degrees of co-expression between two groups are significantly different.
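Steps (d) and (e) of the flowchart can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy expression profiles are hypothetical, the function names are invented here, and the p-value uses a standard asymptotic approximation to the Kolmogorov distribution, which is rough for samples this small.

```python
from bisect import bisect_right
from itertools import combinations
from math import exp, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # undefined for a constant profile


def pairwise_coexpression(profiles):
    """Step (d): correlation for every pair of genes within one group."""
    return [pearson(p, q) for p, q in combinations(profiles, 2)]


def ks_two_sample(a, b):
    """Step (e): two-sample KS statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    # D is the largest gap between the two empirical CDFs.
    d = max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in set(a) | set(b)
    )
    if d == 0.0:
        return 0.0, 1.0
    n_eff = len(a) * len(b) / (len(a) + len(b))
    lam = (sqrt(n_eff) + 0.12 + 0.11 / sqrt(n_eff)) * d
    # Truncated alternating series for the Kolmogorov tail probability.
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical toy profiles (rows = genes, columns = time points).
group_b = [[1, 2, 3, 4], [2, 4, 6, 9], [0, 1, 2, 2]]
group_not_b = [[1, 2, 3, 4], [4, 1, 3, 2], [2, 4, 1, 3]]

d_stat, p_value = ks_two_sample(
    pairwise_coexpression(group_b), pairwise_coexpression(group_not_b)
)
```

A small p-value would suggest that the degree of co-expression differs between the b and ¬b groups; the significance threshold used in the study is not stated in this excerpt.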
Information on the studied TFs
| TF | Positions | Variable positions | Individual-position | Deg-pair | Dependence | Dependent-position pairs |
|---|---|---|---|---|---|---|
| ABF1 | 13 | 8 | 8 | 28 | 10 | 10 |
| CAD1 | 10 | 1 | 0 | ‐ | ‐ | ‐ |
| CIN5 | 10 | 6 | 5 | 15 | 11 | 0 |
| DIG1 | 8 | 1 | 0 | ‐ | ‐ | ‐ |
| FHL1 | 10 | 2 | 1 | 1 | 1 | 0 |
| FKH1 | 8 | 2 | 0 | 1 | 0 | 0 |
| FKH2 | 8 | 2 | 0 | 1 | 0 | 0 |
| HAP1 | 11 | 11 | 1 | 55 | 27 | 8 |
| HAP4 | 8 | 2 | 1 | 1 | 1 | 0 |
| INO2 | 9 | 4 | 1 | 6 | 3 | 0 |
| MCM1 | 10 | 6 | 2 | 15 | 5 | 3 |
| RAP1 | 9 | 6 | 2 | 15 | 3 | 2 |
| REB1 | 7 | 1 | 0 | ‐ | ‐ | ‐ |
| RLM1 | 10 | 1 | 0 | ‐ | ‐ | ‐ |
| SKN7 | 10 | 1 | 0 | ‐ | ‐ | ‐ |
| STB1 | 9 | 1 | 1 | ‐ | ‐ | ‐ |
| STE12 | 8 | 1 | 0 | ‐ | ‐ | ‐ |
| SUM1 | 9 | 5 | 1 | 10 | 4 | 0 |
| SWI4 | 7 | 3 | 1 | 3 | 1 | 1 |
| SWI6 | 6 | 4 | 3 | 6 | 6 | 6 |
| TYE7 | 8 | 1 | 0 | ‐ | ‐ | ‐ |
| UME6 | 9 | 4 | 1 | 6 | 2 | 2 |
| YAP1 | 7 | 1 | 0 | ‐ | ‐ | ‐ |
| YDR026C | 9 | 1 | 0 | ‐ | ‐ | ‐ |
| Total | 213 | 75 | 28 | 163 | 74 | 32 |
Positions: the length of the consensus of a TFBS; Variable positions: the number of variable positions; Individual-position: the number of positions that are functional (see ‘Materials and Methods’ section); Deg-pair: the number of variable position pairs; Dependence: the number of variable position pairs that are dependent (passed the χ²-test); Dependent-position pairs: the number of variable position pairs that are functional (see ‘Materials and Methods’ section).
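The Deg-pair column is consistent with counting every unordered pair of variable positions, that is n(n − 1)/2 for n variable positions. The sketch below checks this against a few rows of the table and shows how a chi-square statistic for the dependence of two positions could be computed. The contingency-table layout (nucleotide at position i by nucleotide at position j) and the function name are assumptions made for illustration; the excerpt does not give the test's threshold, so no p-value is computed.

```python
from math import comb

# (variable positions, Deg-pair) copied from the table above.
table_rows = {
    "ABF1": (8, 28),
    "CIN5": (6, 15),
    "HAP1": (11, 55),
    "SUM1": (5, 10),
    "SWI4": (3, 3),
}

# TFs whose Deg-pair value is not C(n, 2); expected to be empty.
mismatches = [
    tf for tf, (n_variable, deg_pair) in table_rows.items()
    if comb(n_variable, 2) != deg_pair
]


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table.

    Assumed layout: table[a][b] counts TFBS instances with nucleotide a
    at position i and nucleotide b at position j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for a, row in enumerate(table):
        for b, observed in enumerate(row):
            expected = row_totals[a] * col_totals[b] / total
            if expected > 0:  # skip cells in empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat
```

A perfectly dependent toy table such as `[[10, 0], [0, 10]]` gives a statistic of 20, whereas an independent one such as `[[5, 5], [5, 5]]` gives 0.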
Figure 2. Fisher’s exact test for the association between co-occurring TFs and variable positions of a given TF. Here, Xᵢⱼ represents the number of target genes of TF α with nucleotide i (A, T, C or G) at the predicted functional variable position in the TFBS and with/without (j = 0 or 1, respectively) co-occurring TF β. Nᵢ, where i = 1, 2, 3, 4, indicates the number of target genes whose TFBSs contain nucleotide i (A, T, C or G) at their functional variable positions. K1 indicates the number of target genes containing the TFBSs of TF α and TF β in their upstream or promoter region. K2 represents the frequency of target genes that only contain the TFBS of TF α. M is the total number of target genes used. The exact probability of observing the particular arrangement of the target genes of TF α was calculated by the hypergeometric distribution.
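The caption's formula did not survive in this copy, so the following is a sketch of the standard hypergeometric probability of a contingency table with fixed margins, written with the caption's symbols (row totals Nᵢ, column totals K1 and K2, grand total M). It is an assumption that the authors used exactly this form. The two-sided 2 × 2 helper, which would compare one nucleotide against all others, is likewise illustrative, and both function names are invented here.

```python
from fractions import Fraction
from math import factorial, prod


def table_probability(table):
    """Exact probability of one table given its row and column totals.

    table[i][j] plays the role of X_ij: genes with nucleotide i,
    column 0 = with co-occurring TF beta, column 1 = without.
    P = (prod N_i! * prod K_j!) / (M! * prod X_ij!)
    """
    row_totals = [sum(row) for row in table]        # N_i
    col_totals = [sum(col) for col in zip(*table)]  # K1, K2
    m = sum(row_totals)                             # M
    numerator = prod(factorial(n) for n in row_totals) * prod(
        factorial(k) for k in col_totals
    )
    denominator = factorial(m) * prod(factorial(x) for row in table for x in row)
    return Fraction(numerator, denominator)


def fisher_2x2_two_sided(a, b, c, d):
    """Two-sided Fisher p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    p_observed = table_probability([[a, b], [c, d]])
    p_value = Fraction(0)
    # Enumerate every table that shares the observed margins.
    for x in range(max(0, c1 - r2), min(r1, c1) + 1):
        p = table_probability([[x, r1 - x], [c1 - x, r2 - c1 + x]])
        if p <= p_observed:
            p_value += p
    return p_value
```

Using `Fraction` keeps the probabilities exact, which avoids floating-point ties when tables are compared with the observed one.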
Figure 3. Overview of the results with individual functional positions: ‘TFBS’ lists the TFs with variable positions in their TFBS motifs. ‘Positions’ shows the functional positions in our study. ‘b group’ represents the nucleotide at a functional variable position considered functionally significant. The abbreviations at the top of the table correspond to different microarray conditions: G1 (glucose), G2 (glucT2), Ca (calcium), M (mec1), F (fkh), Sn (snf), A (alpha), Cd (cdc15), Sp (sporulation) and D (diauxic). In this figure, the positions that satisfy our criteria (see ‘Materials and Methods’ section) are shown in grey.
Functional-dependent variable position pairs
| TFBS | (i, j) | (bᵢ, bⱼ, C) |
|---|---|---|
| ABF1 | (1, 9)₁ | (A, A, |
| | (4, 6)₁ | (A, C, |
| | ( | (A, A, |
| | (4, 9)₁ | (A, A, |
| | (4, 10)₁ | (A, C, |
| | (6, 7)₁ | (G, C, |
| | (6, 8)₁ | (T, T, |
| | (6, 10)₁ | (C, C, |
| | (8, 10)₁ | (T, G, |
| | (9, 10)₁ | (A, C, |
| HAP1 | (1, 3)₃ | (G, T, |
| | (2, 3)₃ | (G, A, |
| | (2, 8)₂ | (C, T, |
| | (3, 5)₃ | (T, T, |
| | (3, 6)₃ | (T, T, |
| | (3, 9)₃ | (T, C, |
| | (3, 10)₃ | (T, G, |
| | (3, 11)₃ | (A, G, |
| MCM1 | (5, 6)₂ | (T, T, |
| | (5, 7)₂ | (T, T, |
| | (6, 7)₂ | (T, T, |
| RAP1 | (2, 7)₃ | (C, A, |
| | (5, 6)₃ | (A, T, |
| SWI4 | (2, 4)₃ | (A, G, |
| SWI6 | (1, 4)₃ | (A, A, |
| | (1, 5)₃ | (A, A, |
| | (1, 6)₃ | (A, A, |
| | (4, 5)₁ | (A, A, |
| | (4, 6)₁ | (A, A, |
| | (5, 6)₁ | (A, A, |
| UME6 | (2, 7)₂ | (G, C, |
| | (7, 8)₃ | (C, C, |
In this table, i and j denote, respectively, the first and second positions of a variable position pair; in (bᵢ, bⱼ, C), bᵢ and bⱼ are the nucleotides at positions i and j, respectively, and C corresponds to the following microarray conditions: G1 (glucose), G2 (glucT2), Ca (calcium), M (mec1), F (fkh), Sn (snf), A (alpha), Cd (cdc15), Sp (sporulation), and D (diauxic). Subscripts 1–3, respectively, denote the position pairs in Category 1 (functional individually and in combination), Category 2 (only functional as combinations) and Category 3 (one position was functional individually but the other position was not). The superscript asterisk indicates position pairs in Category 1 that are only functional as a combination under specific conditions.
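The three categories follow directly from how many positions of a functional pair are also functional on their own. A minimal sketch, assuming the pair has already been found functional in combination, and assuming that the three individually functional positions of SWI6 are 4, 5 and 6 (the positions listed for SWI6 in the co-occurring TF table); `pair_category` is a hypothetical helper name.

```python
def pair_category(pair, individually_functional):
    """Category of a position pair that is functional in combination.

    1: both positions are also functional individually
    2: neither position is functional individually
    3: exactly one position is functional individually
    """
    i, j = pair
    hits = (i in individually_functional) + (j in individually_functional)
    return {2: 1, 0: 2, 1: 3}[hits]


# Assumption: SWI6 positions 4, 5 and 6 are individually functional.
swi6_functional = {4, 5, 6}
swi6_pairs = [(1, 4), (1, 5), (1, 6), (4, 5), (4, 6), (5, 6)]
swi6_categories = {pair: pair_category(pair, swi6_functional) for pair in swi6_pairs}
```

Under that assumption the sketch reproduces the subscripts shown for SWI6 in the table: Category 3 for the pairs involving position 1 and Category 1 for the pairs among positions 4, 5 and 6.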
The frequency of each group of nucleotide variations
| Nucleotide | Frequency | (bᵢ, bⱼ) | Frequency |
|---|---|---|---|
| A | 19 | (A, A) | 13 |
| AC | 7 | (A, C) | 4 |
| AG | 8 | (A, G) | 5 |
| AT | 13 | (A, T) | 4 |
| C | 15 | (C, A) | 3 |
| G | 13 | (C, C) | 2 |
| T | 16 | (C, T) | 1 |
| | | (G, A) | 4 |
| | | (G, C) | 2 |
| | | (G, G) | 2 |
| | | (G, T) | 2 |
| | | (T, C) | 5 |
| | | (T, G) | 5 |
| | | (T, T) | 7 |
‘Nucleotide’ represents the individual nucleotide types A, T, G and C; bᵢ and bⱼ are the nucleotides that form functional-dependent pairs (bᵢ, bⱼ); ‘Frequency’ denotes the frequency of the ‘Nucleotide’ and nucleotide pairs (bᵢ, bⱼ).
Relationships between co-occurring TFs and functional variable positions
| TFBS | Position | Less-preferred | More-preferred | Co-TF |
|---|---|---|---|---|
| ABF1 | 4 | A | G | MBP1 |
| 6 | TYE7 | |||
| 7 | T | CBF1 | ||
| 8 | T | REB1 | ||
| 9 | INO2 | |||
| 10 | CBF1 | |||
| HAP1 | ||||
| CIN5 | 3 | T | MBP1 | |
| T | PUT3 | |||
| T | SWI4 | |||
| T | SWI6 | |||
| 4 | MBP1 | |||
| PUT3 | ||||
| SWI4 | ||||
| SWI6 | ||||
| 8 | T | MBP1 | ||
| T | PUT3 | |||
| T | SWI4 | |||
| T | SWI6 | |||
| FHL1 | 8 | A | G | RAP1 |
| MCM1 | 8 | T | RPN4 | |
| RAP1 | 6 | RPN4 | ||
| 7 | C | INO2 | ||
| T | SUM1 | |||
| SWI6 | 4 | A | C | MBP1 |
| 5 | A | G | MBP1 | |
| 6 | A | T | MBP1 |
Functional positional variants in TFBS motifs that are associated with co-occurring TFs are listed. ‘TFBS’ corresponds to TFs with individual functional variable positions. ‘Position’ is the list of individual functional variable positions. ‘Less-preferred’ and ‘More-preferred’ correspond to nucleotide variants that are significantly and non-significantly associated with co-occurring TFs. ‘Co-TF’ is the list of co-occurring TFs for each TF listed in the first column of the table. The nucleotides that matched our predictions in the ‘More-preferred’ category are shown in bold font.