Boris Lenhard¹, Albin Sandelin¹, Luis Mendoza¹,², Pär Engström¹, Niclas Jareborg¹,³, Wyeth W Wasserman¹,⁴.
Abstract
BACKGROUND: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. Annotating and interpreting them requires additional data resources and significant improvements in computational methods for detecting regulatory regions. One increasingly popular approach, termed 'phylogenetic footprinting', is based on the preferential conservation of functional sequences by selective pressure over the course of evolution. Mutations are more likely to be disruptive if they occur in functional sites, which produces a measurable difference in evolution rates between functional and non-functional genomic segments.
Year: 2003 PMID: 12760745 PMCID: PMC193685 DOI: 10.1186/1475-4924-2-13
Source DB: PubMed Journal: J Biol ISSN: 1475-4924
Figure 1. Cross-species comparisons of the β-globin gene promoter. (a) Analysis of the human promoter without phylogenetic filtering generates numerous predictions, most of which are biologically irrelevant. (b) Comparison with the chicken promoter fails to detect conserved sites, even when screened with an artificially low conservation cutoff of 25%. (c) Comparison with the mouse promoter identifies conserved sites, including a documented GATA-binding site [49] (boxed). (d) Comparison with the cow promoter identifies more conserved sites. (e) Comparison with the macaque (Macaca cynomolgus) promoter produces a plot similar to the single-sequence analysis in (a). Unless indicated otherwise, all plots were generated with all available vertebrate matrices, a 70% conservation cutoff, a 50 bp window and an 85% transcription factor score threshold. The y axis in all graphs gives the percentage of identical nucleotides within a sliding window of fixed length (default 50 bp). The x axis gives the nucleotide position in the human sequence at which the window starts.
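The three settings named in the Figure 1 legend (matrix score threshold, window size, conservation cutoff) can be illustrated with a minimal Python sketch of a phylogenetic-footprinting filter. It assumes a gapped pairwise alignment of the human and orthologous promoters is already available, represents each matrix as a hypothetical list of per-position base-to-weight dictionaries, rescales raw matrix scores to the 0 to 1 range as (S - S_min) / (S_max - S_min), and keeps a human prediction only if every base of it lies in at least one window whose identity meets the cutoff. The function names and that retention rule are illustrative assumptions, not ConSite's actual code.

```python
"""Hypothetical sketch of a phylogenetic-footprinting filter (not ConSite's code)."""


def relative_score(pwm, site):
    """Rescale a raw matrix score to [0, 1] as (S - S_min) / (S_max - S_min)."""
    raw = sum(col[base] for col, base in zip(pwm, site))
    s_min = sum(min(col.values()) for col in pwm)
    s_max = sum(max(col.values()) for col in pwm)
    if s_max == s_min:  # uninformative matrix: every site scores the same
        return 1.0
    return (raw - s_min) / (s_max - s_min)


def scan(seq, pwm, threshold=0.85):
    """Return 0-based start positions whose relative score is >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width]
        # Skip windows containing N or other ambiguity codes.
        if set(site) <= set("ACGT") and relative_score(pwm, site) >= threshold:
            hits.append(i)
    return hits


def _columns(aln_h, aln_o):
    """Per-column identity flags and human coordinates (None at human gaps)."""
    if len(aln_h) != len(aln_o):
        raise ValueError("aligned sequences must have equal length")
    aln_h, aln_o = aln_h.upper(), aln_o.upper()
    same = [a == b and a != "-" for a, b in zip(aln_h, aln_o)]
    coords, pos = [], 0
    for c in aln_h:
        coords.append(None if c == "-" else pos)
        if c != "-":
            pos += 1
    return same, coords


def conservation_profile(aln_h, aln_o, window=50):
    """(human start, percent identity) for each window of alignment columns."""
    same, coords = _columns(aln_h, aln_o)
    profile = []
    for s in range(len(same) - window + 1):
        # Report the first human base at or after the window start.
        start = next((p for p in coords[s:] if p is not None), None)
        profile.append((start, 100.0 * sum(same[s:s + window]) / window))
    return profile


def footprint(aln_h, aln_o, pwm, threshold=0.85, window=50, cutoff=70.0):
    """Keep human hits whose every base lies in some window >= cutoff."""
    same, coords = _columns(aln_h, aln_o)
    conserved = set()
    for s in range(len(same) - window + 1):
        if 100.0 * sum(same[s:s + window]) / window >= cutoff:
            conserved.update(p for p in coords[s:s + window] if p is not None)
    human = aln_h.replace("-", "")
    width = len(pwm)
    return [i for i in scan(human, pwm, threshold)
            if all(p in conserved for p in range(i, i + width))]


# Toy GATA matrix: weight 1 for the consensus base, 0 otherwise.
gata = [{b: 1.0 if b == c else 0.0 for b in "ACGT"} for c in "GATA"]
print(footprint("AAGATAAA", "AAGATAAA", gata, window=4))  # [2]
print(footprint("AAGATAAA", "CCCCCCCC", gata, window=4))  # []
```

The toy call uses `window=4` only because the example sequences are shorter than the 50 bp default; against an unrelated orthologue no window reaches 70% identity, so the single-sequence hit at position 2 is filtered out.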
The reference collection of 14 gene pairs and 40 verified transcription-factor-binding sites used for testing
| Gene name | Human sequence | Rodent sequence | Transcription factor | Binding sequence | Location | MEDLINE ID |
|---|---|---|---|---|---|---|
| Skeletal muscle actin | AF182035* | M12347 | SP1 | GCGGGGTGGCGCG | -64/-51 | 11017083 |
| | | | SRF | ACCCAAATATGGCT | -100/-86 | 1922033 |
| | | | TEF-1 | GACATTCCTGCG | -73/-51 | 11017083 |
| Aldolase A | X12447* | J05517 | MEF2 | CCTAAATATAGGTC | -125/-111 | 8413246 |
| αB crystallin | M28638* | U04320 | SP1 | AGGAGGAGGGGCA | -343/-330 | 11017083 |
| | | | SRF | GCCCAAGATAGTTG | -393/-379 | 11017083 |
| Cardiac α myosin heavy chain | Z20656 | U71441 and M62404* | MEF2 | TTAAAAATAACTGA | -327/-313 | 8366095 |
| | | | TEF-1 | AGGAGGAATGTGC | -239/-226 | 7961957 |
| | | | SRF | CTCCAAATTTAGGC | -62/-48 | 8782063 |
| CEBPα | U34070* | M62362 | AP2 α | GGCCGGGGGCGGA | -243/-232 | 9520389 |
| | | | TBP | TATAAAA | -30/-24 | 96003748 |
| Cell division cycle protein 2 | L06298 and X66172* | U69555 | E2F | TCTTTCGCGC | -131/-119 | 94094909 |
| | | | cETS | GGGAAG | -109/-104 | 951721551 |
| Cholesterol 7α hydroxylase | L13460 | U01962* | HNF3 β | TCTGTTTGTTCT | -175/-166 | 9799805 |
| | | | cEBP | ATGTTATGTCA | -227/-217 | 28182075 |
| Early growth response protein 1 | AJ243425 | M22326* | SRF | TGCTTCCCATATATGGCCATGT | -88/-67 | 90097904 |
| | | | SRF | CCAGCGCCTTATATGGAGTGGC | -358/-337 | 90097904 |
| | | | SRF | GAAACGCCATATAAGGAGCAGG | -412/-391 | 90097904 |
| Glucose-6-phosphatase | AF051355* | U57552 | HNF3 β | CCAAAGA | -72/-66 | 9369482 |
| | | | HNF3 β | ACAAACG | -91/-85 | 9369482 |
| | | | HNF3 β | GTTTTTGAG | -82/-74 | 9369482 |
| | | | HNF3 β | TGTGTGC | -180/-174 | 9369482 |
| | | | HNF3 β | TGTTTGC | -139/-133 | 9369482 |
| | | | HNF1 | AGTTAATCATTGGCC | -226/-212 | 9369482 |
| Leptin | U43589 | U36238* | SP1 | GGGCGG | -100/-95 | 9492033 |
| | | | cEBP | GTTGCGCAAG | -58/-49 | 9492033 |
| | | | TBP | TATAAG | -33/-28 | 9492033 |
| Lipoprotein lipase | M29549* | M63335 | NFY | CAAT | -65/-61 | 1918010 |
| | | | cEBP | TAGCCAAT | -68/-61 | 1918010 |
| | | | TBP | TATAA | -27/-23 | 1918010 |
| Muscle creatine kinase | M21487 | AF188002 and M21390* | SRF | CCATGTAAGG | -1236/-1227 | 93233638 |
| | | | AP2 α | GGCCTGGGGA | -1220/-1211 | 93233638 |
| | | | MEF2 | TCTAAAAATAAC | -1078/-1067 | 93233638 |
| | | | MYF | GGGCCAGCTGTCCC | -253/-240 | 96347575 |
| | | | MYF | CCAACACCTGCTGC | -1157/-1144 | 96347575 |
| | | | P53 | ATACAAGGCC | -176/-167 | 96047120 |
| | | | P53 | ATACAAGGCC | -158/-149 | 96047120 |
| Rb susceptibility gene | L11910* | M86180, U49920 and S66110 | SP1 | GGGCGG | -202/-188 | 1881452 |
| Troponin I | L21905* | | MEF2 | AGACTATAATAGCC | -976/-962 | 9774679 |
| | | | MYF | TAAACAGGTGCAGC | -879/-865 | 9774679 |
GenBank accession numbers [41] are given for the human and rodent sequences. The transcription-factor-binding site (TFBS) sequences refer to the human or rodent sequence(s) marked with an asterisk. 'Location' gives the start/end positions of each TFBS relative to the transcription start site.
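To compare predictions against the table, each 'Location' string has to be turned into sequence coordinates. The minimal sketch below parses a start/end pair such as -64/-51 (tolerating stray spaces) and maps it to a 0-based, half-open slice of a promoter string. It assumes the common promoter convention that there is no position 0, so -1 is the base immediately upstream of +1, and that the index of +1 in the string (the hypothetical `tss_index` parameter) is known; the convention and all function names are assumptions for illustration.

```python
import re


def parse_location(loc):
    """Parse a TSS-relative 'start/end' string such as '-64/-51'."""
    m = re.fullmatch(r"\s*([+-]?\d+)\s*/\s*([+-]?\d+)\s*", loc)
    if m is None:
        raise ValueError(f"unrecognised location: {loc!r}")
    a, b = int(m.group(1)), int(m.group(2))
    return min(a, b), max(a, b)


def to_index(rel, tss_index):
    """Map a TSS-relative position to a 0-based index in the sequence.

    Assumes no position 0: +1 sits at tss_index and -1 directly before it.
    """
    if rel == 0:
        raise ValueError("position 0 is undefined in this convention")
    return tss_index + rel if rel < 0 else tss_index + rel - 1


def site_slice(loc, tss_index):
    """Half-open (start, stop) slice covering the site in the sequence."""
    start, end = parse_location(loc)
    return to_index(start, tss_index), to_index(end, tss_index) + 1


print(parse_location("-100/ -86"))  # (-100, -86)
print(site_slice("-64/-51", 500))   # (436, 450)
```

Under this convention a site at -64/-51 covers 14 bases, and a site spanning the start site, such as -2/+3, covers 5.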
Figure 2. The impact of phylogenetic footprinting analysis. Both (a-c) a high-quality set (14 genes and 40 verified sites) and (d-f) a larger collection of promoters (57 genes and 110 sites, from the TRANSFAC database [20,21]) were analyzed. (a,d) Comparison of selectivity (defined as the average number of predictions per 100 bp, using all models) between orthologous-sequence and single-sequence analysis modes. (b,e) Comparison of sensitivity (the proportion of the 40 or 110 verified sites, respectively, detected with a given setting) between orthologous-sequence and single-sequence analysis modes. (c,f) Ratios of the number of sites detected in single-sequence mode to the number detected in orthologous-sequence mode; the pair:single-sequence ratios are displayed for both sensitivity (detected verified sites) and selectivity (all predicted sites).
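The two measures in the Figure 2 legend can be written down directly. In the minimal sketch below, selectivity is taken as predictions per 100 bp averaged over promoters, and sensitivity as the fraction of verified sites hit by at least one prediction for the same factor with overlapping coordinates. The per-promoter averaging, the overlap rule and all names are assumptions for illustration, since the legend does not spell out the exact matching criterion. The verified sites in the usage lines come from the skeletal muscle actin rows of the table; the predictions are invented toy values, not results from the figure.

```python
def selectivity(n_predictions, lengths_bp):
    """Mean number of predictions per 100 bp, averaged over promoters."""
    rates = [100.0 * n / length for n, length in zip(n_predictions, lengths_bp)]
    return sum(rates) / len(rates)


def overlaps(a, b):
    """True if closed intervals (start, end) a and b share a position."""
    return a[0] <= b[1] and b[0] <= a[1]


def sensitivity(verified, predicted):
    """Fraction of verified sites overlapped by a same-factor prediction.

    Both arguments are lists of (factor, start, end) tuples.
    """
    detected = sum(
        1 for f, s, e in verified
        if any(pf == f and overlaps((s, e), (ps, pe)) for pf, ps, pe in predicted)
    )
    return detected / len(verified)


def pair_single_ratio(pair_value, single_value):
    """Ratio of a measure in orthologous (pair) mode to single-sequence mode."""
    return pair_value / single_value


# Verified sites from the table; predictions are toy values only.
verified = [("SP1", -64, -51), ("SRF", -100, -86), ("TEF-1", -73, -51)]
single = [("SP1", -66, -55), ("SRF", -98, -89), ("MEF2", -40, -30)]
pair = [("SP1", -66, -55)]
print(sensitivity(verified, single), sensitivity(verified, pair))
print(pair_single_ratio(sensitivity(verified, pair), sensitivity(verified, single)))
print(selectivity([12, 3], [600, 100]))  # 2.5
```

Pooling all predictions over the total sequence length is an equally plausible reading of "average number of predictions per 100 bp"; the two agree only when promoters have equal length.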
Figure 3. The ConSite result report and visualization tools for the analysis of two orthologous genomic sequences. (a) Graphical view, showing conservation profile plots for the two orthologous sequences and the control panel for altering visualization parameters. (b) Pop-up window with information about an individual TFBS. (c) Detailed alignment view, giving sequence-level details of putative TFBSs conserved between the two orthologous sequences.