Ken Daigoro Yokoyama, David D Pollock.
Abstract
Functional modification of regulatory proteins can affect hundreds of genes throughout the genome, and is therefore thought to be almost universally deleterious. This belief, however, has recently been challenged. A potential example comes from transcription factor SP1, for which statistical evidence indicates that motif preferences were altered in eutherian mammals. Here, we set out to discover possible structural and theoretical explanations, evaluate the role of selection in SP1 evolution, and discover effects on coregulatory proteins. We show that SP1 motif preferences were convergently altered in birds as well as mammals, inducing coevolutionary changes in over 800 regulatory regions. Structural and phylogenic evidence implicates a single causative amino acid replacement at the same SP1 position along both lineages. Furthermore, paralogs SP3 and SP4, which coregulate SP1 target genes through competitive binding to the same sites, have accumulated convergent replacements at the homologous position multiple times during eutherian and bird evolution, presumably to preserve competitive binding. To determine plausibility, we developed and implemented a simple model of transcription factor and binding site coevolution. This model predicts that, in contrast to prevailing beliefs, even small selective benefits per locus can drive concurrent fixation of transcription factor and binding site mutants under a broad range of conditions. Novel binding sites tend to arise de novo, rather than by mutation from ancestral sites, a prediction substantiated by SP1-binding site alignments. Thus, multiple lines of evidence indicate that selection has driven convergent evolution of transcription factors along with their binding sites and coregulatory proteins.
Year: 2012 PMID: 23019068 PMCID: PMC3514965 DOI: 10.1093/gbe/evs085
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Convergence of SP1 Cis-Regulatory Conversions within Birds and Placental Mammals
| | Human GC Box: Obs. | Human GC Box: Exp. | Human GC Box: P | Human GA Box: Obs. | Human GA Box: Exp. | Human GA Box: P |
|---|---|---|---|---|---|---|
| Mammal root: GA box | ||||||
| Bird GC box | 63 | 43 | 5.7e−5 | 55 | 55 | 0.32 |
| Bird GA box | 47 | 48 | 0.39 | 83 | 61 | 1.6e−5 |
| Mammal root: GC box | ||||||
| Bird GC box | 109 | 90 | 2.6e−4 | 37 | 34 | 0.17 |
| Bird GA box | 51 | 45 | 0.06 | 27 | 17 | 3.0e−4 |
(a) The observed numbers of GA/GC box co-occurrences across orthologous genes.
(b) The expected number of GA/GC box co-occurrences, assuming a random distribution of motifs without regard to gene orthology.
(c) P values representing the significance of enrichment using Fisher’s exact test.
(d) Genes are separated according to the motif inferred in the common mammalian ancestor; genes inferred to contain both motifs at the root of mammals have been excluded.
(e) Genes in this category have gained a GC box independently along the human and bird lineages; no GC box was present at the root of mammals.
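The expected counts and P values described in the footnotes can be illustrated with a minimal sketch. It assumes the expected co-occurrence is the product of the two marginal counts divided by the number of orthologous genes, and that enrichment is tested with a one-sided Fisher's exact test; the source does not state the sidedness. The function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
from math import comb


def expected_cooccurrence(n_a, n_b, n_total):
    """Expected overlap if the two motifs are distributed independently across genes."""
    return n_a * n_b / n_total


def fisher_enrichment_p(overlap, n_a, n_b, n_total):
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null.

    n_a     -- genes carrying motif A in species 1 (assumed input)
    n_b     -- orthologs carrying motif B in species 2 (assumed input)
    n_total -- total number of orthologous gene pairs considered
    """
    denom = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(overlap, min(n_a, n_b) + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only (not the paper's data).
n_total, n_human_gc, n_bird_gc, observed = 1000, 200, 150, 45
print(expected_cooccurrence(n_human_gc, n_bird_gc, n_total))
print(fisher_enrichment_p(observed, n_human_gc, n_bird_gc, n_total))
```

Exact integer binomials from `math.comb` avoid floating-point underflow in the tail sum, which matters for the very small P values typical of tables like this one.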
Figure. Evolution of SP transcription factors. (A) SP1 binds preferentially to the GC box in placental mammals and birds (red) and to the ancestral GA box consensus in other vertebrates (black). Modifications in binding motif preferences along the phylogeny are denoted by red-filled circles. “Variable regions” in zinc finger 2 (zf2-VR), containing all nonconserved sites in zinc finger 2 within vertebrates, are shown for SP1, SP3, and SP4. Site −13 (highlighted) is putatively responsible for the change in SP1 binding preferences. (B) Zinc finger 2 (zf2) of human SP1, SP3, and SP4. Each zinc finger contains an alpha-helix and two beta sheets (Philipsen and Suske 1999; Dhanasekaran et al. 2006). Red and gray columns denote sites nonconserved across vertebrates; all are contained in the boxed variable region (zf2-VR), comprising sites −13 to −8. Site +3 binds directly to the convergent A/C fourth site of the GC box. (C) SP1 binds to the DNA via zinc fingers 1–3 (zf1-zf3), where zf2 binds to the three central nucleotides of the GC box (GGG) (Philipsen and Suske 1999; Bouwman and Philipsen 2002; Dhanasekaran et al. 2006). Site −13 (red) is only 9.5 Å from site +3 (green) and directly contacts the neighboring site (site +4) (Bouwman and Philipsen 2002; Oka et al. 2004; Dhanasekaran et al. 2006).
Enrichment of Functional Categories in SP1 Target Genes
| Category | P | Human GC Box Promoters: Obs | Human GC Box Promoters: Exp | Human GC Box Promoters: Ratio | Anc GA, Hum GC | Anc No GA, Hum GC | Anc GA, Hum No GC |
|---|---|---|---|---|---|---|---|
| Protein binding | 2e−27 | 2,354 | 2,034 | 1.16× | 1.27× | 1.10× | 1.04× |
| Transferase activity | 9e−9 | 450 | 364 | 1.24× | 1.35× | 1.23× | 0.91× |
| Protein amino acid phosphorylation | 8e−8 | 208 | 154 | 1.35× | 1.43× | 1.32× | 0.92× |
| Protein serine/threonine activity | 4e−7 | 148 | 107 | 1.39× | 1.52× | 1.35× | 0.97× |
| Nucleotide binding | 9e−7 | 696 | 606 | 1.15× | 1.19× | 1.18× | 0.98× |
| Purine nucleotide biosynthetic process | 1e−5 | 12 | 4 | 2.76× | 4.40× | 2.10× | 0.0 |
| CpG island promoters | 0.0 | 3,583 | 2,405 | 1.49× | 1.49× | 1.49× | 1.07× |
| Methylation: BG02ES (human embryonic stem cells) | 6e−4 | 29 | 45 | 0.64× | 0.23× | 0.88× | 0.69× |
| Methylation: H1hESC (human embryonic stem cells) | 8e−4 | 15 | 28 | 0.54× | 0.19× | 0.75× | 0.83× |
| Methylation: HAL (human adult liver) | 7e−3 | 15 | 24 | 0.62× | 0.22× | 0.86× | 0.94× |
(a) P values represent the significance of enrichment according to Fisher’s exact test.
(b) The observed number of genes in each category.
(c) The expected number of genes in each category.
(d) The observed-to-expected ratio.
(e) Observed-to-expected ratios for human GC box target genes with a GA box in the ancestor. Note the consistent over-enrichment of GO categories and under-enrichment of promoter methylation.
(f) Observed-to-expected ratios for human GC box target genes without a GA box in the ancestor.
(g) Observed-to-expected ratios for genes with a GA box in the ancestor and without a GC box in humans.
(h) Under-enrichment for methylated promoters.
Figure. Birth-death rates of the SP1-binding motif in mammals. Birth rates (α) denote the probability (per year) that an unoccupied position will gain a binding site; death rates (β) give the probability (per year) that an existing binding site is lost. Branches in the mammalian phylogeny were partitioned into three groups: early eutherian mammals (red), late eutherian mammals (black), and GA box-preferring noneutherian mammals (blue). Birth and death rates of each group were estimated for the GC box (GGGCGG), GA box (GGGAGG), and the nonfunctional motif GGGTGG (Letovsky and Dynan 1989; Wierstra 2008).
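One way to read the birth rate α and death rate β is as the two rates of a two-state (site absent / site present) continuous-time Markov chain. The sketch below takes that reading as an assumption; it is not the authors' estimation procedure, and the rate values in the usage lines are hypothetical.

```python
from math import exp


def equilibrium_occupancy(alpha, beta):
    """Long-run probability that a position holds a binding site."""
    return alpha / (alpha + beta)


def prob_present(t, alpha, beta, present_at_start):
    """Probability that a site is present after t years in a two-state Markov chain.

    alpha -- birth rate per year (absent -> present)
    beta  -- death rate per year (present -> absent)
    """
    pi = equilibrium_occupancy(alpha, beta)
    p0 = 1.0 if present_at_start else 0.0
    # Occupancy relaxes exponentially toward equilibrium at rate alpha + beta.
    return pi + (p0 - pi) * exp(-(alpha + beta) * t)


# Hypothetical rates, for illustration only (not estimates from the paper).
alpha, beta = 1e-9, 1e-8
print(equilibrium_occupancy(alpha, beta))
print(prob_present(50e6, alpha, beta, present_at_start=True))
```

Under this reading, a higher birth rate or a lower death rate for one motif on a given set of branches raises the equilibrium fraction of positions occupied by that motif.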
Figure. Population frequencies of an adaptive mutant transcription factor and its binding sites. (A) Shown are the population frequencies of the adaptive mutant transcription factor allele (blue), which first occurs in a single heterozygous individual at generation (population size: n = 1,000). The total population frequency of the novel binding consensus (BOXC) and the initial wild-type binding motif (BOXA) are shown in red and black, respectively. We assume a small adaptive benefit for the adaptive transcription factor SPC binding to BOXC (relative fitness , where ) over the wild-type transcription factor and its motif (relative fitness 1). Maladaptive binding events (SPC binding to BOXA or the wild-type transcription factor binding to BOXC) have reduced fitness (, where ). Population frequencies of SPC, BOXA, and BOXC are given on the left for the first 20,000 generations and on the right for 150,000 generations. (B) Evolution of the adaptive trans-factor and binding sites under a semi-dominant model. SPC binding to BOXC is assigned relative fitness for individuals heterozygous for the transcription factor genotype () and for individuals homozygous for the mutant transcription factor. (C) The single binding site locus model. In contrast to the previous model, each locus is restricted to no more than one binding motif (either BOXA or BOXC).
Motifs in Opossum that Frequently Align with the GC Box in Humans
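| # | Motif | All Promoters: Aligned | All Promoters: Total | All Promoters: Fraction (%) | Promoters Containing a GA Box in Opossum: Aligned | Promoters Containing a GA Box in Opossum: Total | Promoters Containing a GA Box in Opossum: Fraction (%) |
|---|---|---|---|---|---|---|---|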
| 1 | GGGCGG | 971 | 4,035 | 24.1 | 436 | 1,837 | 23.7 |
| 2 | GGGAGG | 228 | 5,007 | 4.6 | 228 | 5,007 | 4.6 |
| 3 | AGGCGG | 216 | 1,795 | 12.0 | 111 | 853 | 13.0 |
| 4 | GGGTGG | 203 | 2,102 | 9.7 | 99 | 942 | 10.5 |
| 5 | GGGCAG | 105 | 1,660 | 6.3 | 42 | 705 | 6.0 |
| 6 | GGGCTG | 77 | 1,803 | 4.3 | 35 | 743 | 4.7 |
| 7 | GGGGGG | 59 | 3,402 | 1.7 | 34 | 1,921 | 1.8 |
| 8 | GGGCCG | 52 | 1,493 | 3.5 | 30 | 610 | 4.9 |
| 9 | GGGCGT | 45 | 640 | 7.0 | 18 | 233 | 7.7 |
| 10 | GAGCGG | 40 | 990 | 4.0 | 18 | 436 | 4.1 |
| 11 | GGACGG | 36 | 618 | 5.8 | 18 | 284 | 6.3 |
| 12 | GGCCGG | 34 | 1,647 | 2.1 | 19 | 716 | 2.7 |
| 13 | GGGCGA | 32 | 596 | 5.4 | 15 | 248 | 6.0 |
| 14 | GGGCGC | 29 | 1,287 | 2.3 | 15 | 535 | 2.8 |
| 15 | TGGCGG | 27 | 887 | 3.0 | 6 | 315 | 1.9 |
(a) The number of sites in opossum aligned to a human GC box.
(b) The total number of motifs in opossum.
(c) The percentage of each motif's occurrences that align to human GC boxes.
(d) Motifs that have higher rates of conversion to the GC box than the GA box.
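The Fraction column can be recomputed from the Aligned and Total counts, and motifs can then be compared against the GA box. The sketch below uses the "All Promoters" counts from the table and assumes that "rate of conversion" corresponds to this aligned fraction; that correspondence is an assumption, not something the table states.

```python
# (motif, aligned, total) for the "All Promoters" columns of the table above.
ROWS = [
    ("GGGCGG", 971, 4035), ("GGGAGG", 228, 5007), ("AGGCGG", 216, 1795),
    ("GGGTGG", 203, 2102), ("GGGCAG", 105, 1660), ("GGGCTG", 77, 1803),
    ("GGGGGG", 59, 3402), ("GGGCCG", 52, 1493), ("GGGCGT", 45, 640),
    ("GAGCGG", 40, 990), ("GGACGG", 36, 618), ("GGCCGG", 34, 1647),
    ("GGGCGA", 32, 596), ("GGGCGC", 29, 1287), ("TGGCGG", 27, 887),
]
GA_BOX = "GGGAGG"


def aligned_fraction(aligned, total):
    """Percentage of a motif's opossum occurrences that align to a human GC box."""
    return 100.0 * aligned / total


fractions = {motif: aligned_fraction(a, t) for motif, a, t in ROWS}

# Motifs whose aligned fraction exceeds that of the GA box (assumed criterion).
higher_than_ga = [
    motif for motif, _, _ in ROWS
    if motif != GA_BOX and fractions[motif] > fractions[GA_BOX]
]

for motif in higher_than_ga:
    print(f"{motif}: {fractions[motif]:.1f}%")
```

Several single-mismatch motifs align to human GC boxes at a higher fraction than the GA box does, even though the GA box contributes a comparatively large absolute number of aligned sites.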
Effects of Amino Acid Replacements on SP1 Zinc Finger 2 (zf2) Structure
| Peptide | RMSD vs. human zf2 (−13M) | RMSD vs. M-13V peptide |
|---|---|---|
| M-13V | 0.135 | 0 |
| M-13I | 0.091 | 0.174 |
| T-11S | 0.048 | 0.131 |
| S-9M | 0 | 0.135 |
| S-9V | 0.045 | 0.130 |
| Y-8F | 0 | 0.135 |
| S-9L | 0 | 0.135 |
| T-11S/S-9L | 0.039 | 0.137 |
| S-9V/Y-8F | 0.045 | 0.130 |
| T-11S/S-9M | 0.039 | 0.137 |
| T-11S/Y-8F | 0.039 | 0.137 |
(a) Lowest free energy peptide structures are labeled according to amino acid replacements relative to the human zf2 peptide.
(b) RMSD values of each peptide compared with the lowest free energy human zf2 structure.
(c) RMSD values compared with the M-13V peptide.
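For reference, RMSD between two structures with matched atoms can be computed as below. This is a minimal sketch that assumes the two coordinate sets are already superimposed and list the same atoms in the same order; it does not perform the structural alignment itself, and the coordinates in the usage lines are hypothetical rather than taken from the predicted zf2 structures.

```python
from math import dist, sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return sqrt(squared / len(coords_a))


def per_residue_distance(coords_a, coords_b):
    """Distance between matched atoms, e.g. alpha carbons at each residue position."""
    return [dist(a, b) for a, b in zip(coords_a, coords_b)]


# Hypothetical alpha-carbon coordinates, for illustration only.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
variant = [(0.0, 0.0, 0.0), (3.8, 0.1, 0.0), (7.6, 0.3, 0.0)]
print(rmsd(reference, variant))
print(per_residue_distance(reference, variant))
```

A per-residue distance profile of this kind shows where along the peptide two structures diverge, which a single RMSD value cannot.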
Figure. Structural changes of SP1 zinc finger 2 (zf2) following replacements at site −13. (Top) Comparisons of predicted lowest-energy zf2 structures between the native human peptide (−13M), and peptides following replacements to the ancestral valine (M-13V) and bird isoleucine (M-13I) at site −13. Structural alignments were conducted according to residues on the 5′-end of the peptide (residues −16 to −12). Both −13M and M-13I peptides showed displacement of residues 5′ to the DNA-contacting alpha-helix (sites −6 to −1) compared with the ancestral valine peptide. No such displacement was seen between −13M and M-13I. All three peptides aligned closely at the 3′-end of the alpha-helix (sites +6 to +10), reflecting structural modifications at the 5′-end of the alpha-helix. (Bottom) Distances between alpha carbons prior to and within the alpha-helix (blue and orange, respectively). Comparisons between the native human peptide and M-13V (left) and between M-13I and M-13V (center) show closely aligned residues at the 3′-end of the alpha-helix and increasing displacement toward the 5′-end. These modifications begin around site +3, which directly contacts the A/C evolving site of the SP1-binding motif (Philipsen and Suske 1999; Bouwman and Philipsen 2002; Dhanasekaran et al. 2006). No such region-specific displacement was observed between −13M and M-13I (right).