Enrique Blanco, Xavier Messeguer, Temple F Smith, Roderic Guigó.
Abstract
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because the promoter regions of co-expressed genes often show no discernible sequence conservation. In our approach, therefore, we have not compared the nucleotide sequences of promoters directly. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels, which we refer to here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm originally developed to align restriction enzyme maps. We have optimized the parameters of the algorithm on a small but well-curated collection of human–mouse orthologous gene pairs. Results on this dataset, as well as on an independent, much larger dataset from the CISRED database, indicate that TF-map alignments can uncover conserved regulatory elements that typical sequence alignments cannot detect.
Year: 2006 PMID: 16733547 PMCID: PMC1464811 DOI: 10.1371/journal.pcbi.0020049
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. TF-Map Alignment of the Promoters of Two Hypothetical Co-Regulated Genes
(A) The sequence of a promoter is searched for occurrences of known binding motifs for TFs. Matches are annotated with the position of the match in the primary sequence and the label of the TF. Because TFs can bind to motifs showing no sequence conservation, labels of the same TF at different positions may correspond to different underlying nucleotide sequences. We refer here to these sequences of (“label,” “position”) pairs as TF-maps. In practice, TF-maps are more complex. First, we register not only the position of each match but also its length. Second, whereas in this example sequence motifs are associated with TFs by means of a (binary) look-up table, in our work we have instead used collections of PWMs. Matches to TFBSs are thus scored, and this score is also registered.
(B) TF-maps of the promoter regions of two hypothetical co-regulated genes, X and Y. Each letter corresponds to a different TF. We assume that the 200 nucleotides upstream of the annotated TSS have been considered, with position 1 corresponding to position −200 relative to the TSS.
(C) Global pairwise alignment of the TF-maps of the two co-regulated genes X and Y. Only positions with identical labels can be aligned. Essentially, the alignment finds the longest common subsequence of labels, subject to maximizing the sum of the scores (not shown) of the aligned positions and minimizing the differences between the distances that separate adjacent aligned positions in the two primary sequences.
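The scanning step in panel (A), with the PWM refinements it mentions (registering position, length, and score), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the matrix format (one dict of per-base weights per column), the min-max relative score, and the names `PWM`, `Site`, `relative_score`, and `scan` are all hypothetical. The 0.85 default echoes the 85% threshold quoted for Figure 3, but the tools used in the paper may normalize scores differently. Only the forward strand is scanned, and the input is assumed to be uppercase ACGT.

```python
from dataclasses import dataclass

# Hypothetical matrix format: one dict of per-base weights for each motif column.
PWM = list[dict[str, float]]


@dataclass(frozen=True)
class Site:
    label: str       # TF associated with the matrix
    position: int    # 1-based start of the match in the promoter
    length: int      # motif length, registered alongside the position
    score: float     # relative score in [0, 1]


def relative_score(pwm: PWM, window: str) -> float:
    """Min-max normalized score (raw - min) / (max - min). Assumes uppercase ACGT."""
    raw = sum(col[base] for col, base in zip(pwm, window))
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return (raw - lo) / (hi - lo)  # assumes a non-uniform matrix (hi > lo)


def scan(seq: str, pwms: dict[str, PWM], threshold: float = 0.85) -> list[Site]:
    """Slide every PWM along seq (forward strand only); keep windows >= threshold."""
    sites = []
    for label, pwm in pwms.items():
        k = len(pwm)
        for start in range(len(seq) - k + 1):
            score = relative_score(pwm, seq[start:start + k])
            if score >= threshold:
                sites.append(Site(label, start + 1, k, score))
    # The TF-map is the list of labeled sites ordered along the promoter.
    return sorted(sites, key=lambda s: (s.position, s.label))
```

For example, with a toy matrix that gives weight 1 to each base of `TATA` and 0 otherwise, `scan("GGTATAGG", {"TBP": tata})` keeps only the exact match at position 3; lowering the threshold to 0.5 also admits the partial matches at positions 1 and 5, which is how the threshold controls the density of the resulting TF-map.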
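The objective in panel (C) (align only identical labels, reward the scores of aligned sites, penalize mismatched spacing between consecutive aligned sites) can be illustrated with a simplified, chaining-style dynamic program. This sketch does not reproduce the published recurrence adapted from restriction-map alignment. The gain (sum of the two site scores), the spacing penalty weight `lam`, the absence of any penalty for unaligned sites, and the names `Element` and `align_tf_maps` are hypothetical choices made for illustration.

```python
from typing import NamedTuple, Optional


class Element(NamedTuple):
    label: str      # TF label of the predicted site
    position: int   # coordinate of the site on the promoter
    score: float    # PWM score of the site


Cell = tuple[int, int]


def align_tf_maps(x: list[Element], y: list[Element],
                  lam: float = 0.1) -> tuple[float, list[Cell]]:
    """Best-scoring chain of label-identical pairs (i, j), increasing in both maps."""
    # Only label-identical cells are ever filled, so the DP matrix is sparse.
    cells = [(i, j) for i in range(len(x)) for j in range(len(y))
             if x[i].label == y[j].label]
    best: dict[Cell, float] = {}            # best chain score ending at a cell
    back: dict[Cell, Optional[Cell]] = {}   # predecessor cell, None at chain start
    for i, j in cells:  # row-major order: every valid predecessor is already scored
        gain = x[i].score + y[j].score      # reward for aligning this pair
        score, pred = gain, None            # option 1: start a new chain here
        for (pi, pj), prev in best.items():
            if pi < i and pj < j:           # option 2: extend an earlier chain
                dx = x[i].position - x[pi].position
                dy = y[j].position - y[pj].position
                cand = prev + gain - lam * abs(dx - dy)  # penalize spacing mismatch
                if cand > score:
                    score, pred = cand, (pi, pj)
        best[(i, j)], back[(i, j)] = score, pred
    if not best:
        return 0.0, []
    cell: Optional[Cell] = max(best, key=best.get)
    top = best[cell]
    path: list[Cell] = []
    while cell is not None:                 # trace back from the best cell
        path.append(cell)
        cell = back[cell]
    return top, path[::-1]
```

With `x = [A@10, B@30, C@50]` and `y = [A@12, C@52, B@80]` (all scores 1.0), the A and C sites are 40 nucleotides apart in both maps, so the sketch aligns them with score 4.0, while the B sites cannot join that chain without crossing it. The inner loop makes this quadratic in the number of filled cells, which is acceptable only because label identity keeps that number small.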
Figure 2. Graphical Representation of the Sparse Dynamic Programming Matrix S When Obtaining the TF-Map Alignment between the Human and Mouse Promoters of the skeletal alpha-actin Gene, Using Different Collections of PWMs for TFBSs
The axes of the matrix list the TF labels of the predicted TFBSs in the human and mouse promoters. Although the total number of predicted TFBSs differs between collections, the occupancy of the matrix remains consistently low.
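Why occupancy stays low can be checked with a short calculation: a cell of S is filled only when its row and column labels are identical, so the number of filled cells is the sum, over labels, of the product of each label's counts in the two maps. Under the simplifying assumption that K labels occur with roughly equal frequency, the occupancy is then about 1/K, independent of how many sites were predicted. The function name `occupancy` below is hypothetical.

```python
from collections import Counter


def occupancy(labels_x: list[str], labels_y: list[str]) -> float:
    """Fraction of cells in the |x|-by-|y| matrix whose row and column labels match."""
    if not labels_x or not labels_y:
        return 0.0
    cx, cy = Counter(labels_x), Counter(labels_y)
    # Each label contributes (count in x) * (count in y) filled cells;
    # Counter returns 0 for labels absent from y.
    filled = sum(n * cy[label] for label, n in cx.items())
    return filled / (len(labels_x) * len(labels_y))
```

For instance, `occupancy(list("ABCA"), list("AAD"))` fills 2 × 2 = 4 of 12 cells, giving an occupancy of one third.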
Sequence and TF-Map Alignments of Different Genomic Regions between the Human and Mouse Orthologous Pairs in the HR Set and between the Human and Chicken Orthologous Pairs in the HC Set
Summary of the TF-Map Alignments Obtained between the Promoter of the transthyretin Gene and the Promoters of the Genes Co-Regulated with It according to the CISRED Database
Results of the Estimation of the Optimal Parameters of the TF-map Alignment Algorithm in the HR Set of Orthologous Human–Mouse Promoter Sequences
Figure 3. Results of the TF-Map Alignment of the Human and Mouse Promoters of the phospholipase A1 member A Gene
Here, the 2,000 nucleotides upstream of the annotated TSS have been considered (with position 1 corresponding to −2,000). The TF-maps of these sequences were obtained using TRANSFAC 6.3 [10]. These maps contained 676 predicted binding sites in human and 595 in mouse (threshold 85%), and they are represented graphically at the top right. Each box represents a different binding site, and its color corresponds to the associated TF. The resulting TF-map alignment is represented graphically at the bottom right. The explicit alignment (the TFs and the coordinates of the underlying TFBSs in the human and mouse promoters) is given on the left. As the figure shows, although the region proximal to the TSS is no denser in predicted TFBSs than other regions, most of the aligned elements cluster near the TSS. Indeed, more than half of the elements in the TF-map alignment lie within 500 nucleotides of the TSS. The program GFF2PS [48] was used to produce the graphical representation of the input predictions and the final alignment.
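The coordinate convention (position 1 of an upstream window of length W corresponds to −W, so position W corresponds to −1) and the "within 500 nucleotides of the TSS" count can be made concrete with a small helper. This is a sketch under the assumptions that there is no position 0 and that "within 500" is inclusive; the function names and the example positions are hypothetical, not data from the figure. The same helper covers the 200-nucleotide window of Figure 1 with `window=200`.

```python
def to_tss_relative(position: int, window: int = 2000) -> int:
    """Map a 1-based window coordinate to a TSS-relative one (1 -> -window, window -> -1)."""
    return position - window - 1


def fraction_near_tss(positions: list[int], window: int = 2000,
                      radius: int = 500) -> float:
    """Fraction of positions lying at most `radius` nucleotides upstream of the TSS."""
    if not positions:
        return 0.0
    near = sum(1 for p in positions if -to_tss_relative(p, window) <= radius)
    return near / len(positions)
```

With hypothetical aligned-site positions `[1, 1000, 1501, 1990]` in a 2,000-nucleotide window, the TSS-relative coordinates are −2000, −1001, −500, and −11, so `fraction_near_tss` returns 0.5.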