Christopher S Nelson, Chris K Fuller, Polly M Fordyce, Alexander L Greninger, Hao Li, Joseph L DeRisi.
Abstract
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2-binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Year: 2013 PMID: 23625967 PMCID: PMC3695516 DOI: 10.1093/nar/gkt259
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic of FOXP2 domains and truncated construct used in MITOMI experiments showing C2H2 zinc-finger domain, leucine zipper domain, forkhead box DNA-binding domain and histidine repeat epitope tag (6xHis). Human lineage substitutions are at positions 303 and 325. The R553H mutation linked to verbal dyspraxia lies within the DNA-binding domain. A polyglutamine (polyQ) stretch was removed by truncation of the shaded region. We 6xHis-tagged the C-terminus for recruitment and retention on chip.
Previously reported models of the FOXP2-binding site
| Publication | Data type | System | Motif |
|---|---|---|---|
| Vernes | ChIP-chip | SH-SY5Y cells overexpressing FOXP2 | TCTTCGT |
| Vernes | EMSA | | AATTTG |
| Enard | Gene expression | Humanized mice | TATTTAT |
| Vernes | ChIP-chip | Wild-type embryonic mice | ARKTAMYT |
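The last motif in the table, ARKTAMYT, uses IUPAC degenerate nucleotide codes (R = A/G, K = G/T, M = A/C, Y = C/T). As a minimal sketch, such a motif can be expanded into a regular expression to check which concrete sequences it covers. The function names below are illustrative and do not come from the paper.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC",
    "S": "CG", "W": "AT", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular-expression pattern."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Single bases stay literal; degenerate codes become character classes.
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

def count_expansions(motif: str) -> int:
    """Number of concrete sequences the degenerate motif stands for."""
    total = 1
    for code in motif.upper():
        total *= len(IUPAC[code])
    return total

pattern = iupac_to_regex("ARKTAMYT")
print(pattern)                       # A[AG][GT]TA[AC][CT]T
print(count_expansions("ARKTAMYT"))  # 16
print(bool(re.fullmatch(pattern, "AATTACTT")))  # True
```

The four degenerate positions each allow two bases, so the motif stands for 16 distinct 8mers.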
Figure 2. Results from FOXP2 MITOMI 2.0 binding assays against a pseudorandom 8mer library. (A) Histograms of MITOMI data showing the ratios of DNA signal intensities to protein signal intensities for human R553H mutant, human WT and chimpanzee alleles. R553H shows no binding to any sequence in the library, whereas chimp and human FOXP2 produce strong binding to a subset of oligonucleotides. (B) Comparison of chimp and human binding ratios (rNN) for all oligonucleotides in the DNA library. Oligonucleotide #175 (used for later targeted analysis) is labeled in red. (C) Top scoring human MatrixREDUCE 7mer affinity logo generated using AffinityLogo (27). The height of each letter depicts the predicted energetic cost or benefit (ΔΔG/RT) of a particular nucleotide at that position in the motif. The centerline indicates zero energetic change. (D) Top scoring chimp MatrixREDUCE 7mer affinity logo.
Figure 3. Affinity measurements for systematic mutations of the binding site and flanking sequences. (A) Fold change in affinity (mutated Ka/unmutated Ka) shown in log-scale. At every position, three values are shown for substitutions with each alternate base relative to the starting sequence. Error bars represent the standard error of the mean. Chimp and human data are displayed in red and blue, respectively. (B) PSAM affinity logo based on the affinities displayed in part A for the human allele. As in Figure 2, the height of each of four base letters depicts the measured energetic cost or benefit (ΔΔG/RT) of adding that base at that position in the motif. The centerline indicates zero energetic change. (C) PSAM affinity logo based on the affinities displayed in part A for the chimp allele.
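The fold change in panel A and the ΔΔG/RT heights in panels B and C are two views of the same measurement. The sketch below assumes the standard thermodynamic relation ΔG = −RT ln Ka, which gives ΔΔG/RT = −ln(Ka,mutated / Ka,unmutated). The function names are illustrative, and the sign convention used when drawing the published logos may be the opposite, with favorable bases drawn upward.

```python
import math

def ddg_over_rt(ka_mutated: float, ka_unmutated: float) -> float:
    """Energetic change (in units of RT) for a substitution.

    Assumes dG = -RT ln(Ka); positive values mean weaker binding.
    """
    return -math.log(ka_mutated / ka_unmutated)

def fold_change(ddg_rt: float) -> float:
    """Inverse mapping: fold change in affinity for a given ddG/RT."""
    return math.exp(-ddg_rt)

# A substitution that lowers affinity 10-fold costs ln(10) RT units.
cost = ddg_over_rt(ka_mutated=0.1, ka_unmutated=1.0)
print(round(cost, 3))               # 2.303
print(round(fold_change(cost), 3))  # 0.1
```

Under this convention, an unchanged affinity gives zero, which corresponds to the centerline of the logo.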
Figure 4. FOXP2 target-binding motif as revealed by ChIP-seq analysis. (A) Motif derived from MEME analysis of 71 ChIP-seq peak sequences with 50 bp of flanking sequence included. Motifs are displayed with small sample correction error bars. (B) Histogram of the relative positioning of the 71 FOXP2 ChIP-seq sites relative to the start of the nearest neighboring gene.
Consistent ChIP-seq peaks near genes
| Peak | Max PSAM score | Top 0.1% | ‘TGTTTAC’ | Nearest gene | Description | RefSeq# | Distance to TSS | Intronic? |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.45 | Yes | No | NFIA | Nuclear factor I/A | NM_001145512 | 93 | |
| 2 | 1.00 | Yes | Yes | TPRG1L | Tumor protein p63-regulated gene 1-like protein | NM_182752 | 199 | |
| 3 | 1.00 | Yes | Yes | BROX | BRO1 domain and CAAX motif containing | NM_144695 | 802 | Intronic |
| 4 | 1.00 | Yes | Yes | RBM17 | RNA-binding motif protein 17 | NM_001145547 | 2444 | |
| 6 | 1.00 | Yes | Yes | PSMA1 | Proteasome subunit α type-1 | NM_148976 | 64 934 | Intronic |
| 7 | 0.05 | No | No | ZBTB16 | Zinc-finger and BTB domain containing 16 | NM_006006 | 102 327 | Intronic |
| 9 | 0.10 | Yes | No | NAB2 | NGFI-A-binding protein 2 (EGR1-binding protein 2) | NM_005967 | 273 | |
| 10 | 0.22 | Yes | No | TPCN1 | Two pore segment channel 1 | NM_001143819 | 1534 | Intronic |
| 11 | 1.00 | Yes | Yes | BTG1 | B-cell translocation gene 1, anti-proliferative | NM_001731 | 50 | |
| 13 | 1.00 | Yes | Yes | KLHDC2 | Kelch domain containing 2 | NM_014315 | 46 | |
| 14 | 0.14 | Yes | No | KIAA0586 | Uncharacterized protein | NM_001244189 | 120 | Intronic |
| 15 | 1.00 | Yes | Yes | BAHCC1 | Bromo adjacent homology domain and coiled-coil containing 1 | NM_001080519 | 5616 | |
| 16 | 1.00 | Yes | Yes | DHX8 | DEAH (Asp-Glu-Ala-His) box polypeptide 8 | NM_004941 | 47 | |
| 17 | 1.00 | Yes | Yes | DHX40 | DEAH (Asp-Glu-Ala-His) box polypeptide 40 | NM_024612 | 49 | |
| 18 | 0.04 | No | No | SPOP | Speckle-type POZ protein (SPOP) | NM_001007226 | 99 | |
| 19 | 1.00 | Yes | Yes | PHLPP1 | PH domain leucine-rich repeat-containing protein phosphatase 1 | NM_194449 | 216 | |
| 20 | 1.00 | Yes | Yes | LTBP4 | Latent-transforming growth factor β-binding protein 4 | NM_001042544 | 2595 | Intronic |
| 21 | 1.00 | Yes | Yes | JUNB | Jun B proto-oncogene | NM_002229 | 149 | |
| 22 | 0.14 | Yes | No | FBXO46 | F-box protein 46 | NM_001080469 | 5927 | Intronic |
| 23 | 1.00 | Yes | Yes | BBC3 | BCL2-binding component 3 | NM_001127240 | 517 | Intronic |
| 24 | 1.00 | Yes | Yes | FUZ | Fuzzy homolog ( | NM_025129 | 363 | |
| 25 | 0.14 | Yes | No | SPAST | Spastin | NM_014946 | 115 | |
| 27 | 0.07 | Yes | No | ARHGAP25 | Rho GTPase-activating protein 25 | NM_001007231 | 3084 | |
| 29 | 1.00 | Yes | Yes | PCMTD2 | Protein- | NM_018257 | 5581 | Intronic |
| 32 | 1.00 | Yes | Yes | HSF2BP | Heat shock transcription factor 2-binding protein | NM_007031 | 48 420 | Intronic |
| 33 | 1.00 | Yes | Yes | PIGP | Phosphatidylinositol | NM_153682 | 12 250 | |
| 34 | 1.00 | Yes | Yes | C21orf77 | C21orf77 | NM_144659 | 6138 | |
| 35 | 1.00 | Yes | Yes | CBX7 | Chromobox protein homolog 7 | NM_175709 | 6758 | Intronic |
| 36 | 1.00 | Yes | Yes | CECR3 | Cat eye syndrome chromosome region, candidate 3 (non-protein coding) | NR_038398 | 173 | |
| 38 | 1.00 | Yes | Yes | FOXP1 | Forkhead box P1 | NM_032682 | 100 | |
| 39 | 1.00 | Yes | Yes | MAML3 | Mastermind-like protein 3 | NM_018717 | 131 | |
| 40 | 1.00 | Yes | Yes | YTHDC1 | YTH domain-containing protein 1 | NM_001031732 | 19 | |
| 41 | 0.14 | Yes | No | UBE2B | Ubiquitin-conjugating enzyme E2B | NM_003337 | 74 | |
| 42 | 1.00 | Yes | Yes | POLK | DNA-directed DNA polymerase κ | NM_016218 | 10 564 | Intronic |
| 43 | 1.00 | Yes | Yes | NR3C1 | Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) | NM_000176 | 47 709 | Intronic |
| 44 | 0.01 | No | No | GPANK1 | G patch domain and ankyrin repeats 1 | NM_001199237 | 346 | |
| 46 | 1.00 | Yes | Yes | CCDC28A | Coiled-coil domain containing 28A | NM_015439 | 236 | |
| 47 | 0.14 | Yes | No | FAM8A1 | Family with sequence similarity 8, member A1 | NM_016255 | 6 | |
| 48 | 0.45 | Yes | No | DTNBP1 | Dystrobrevin-binding protein 1 | NM_032122 | 110 549 | Intronic |
| 49 | 1.00 | Yes | Yes | RUNX2 | Runt-related transcription factor 2 | NM_004348 | 23 847 | Intronic |
| 50 | 1.00 | Yes | Yes | CITED2 | Cbp/p300-interacting transactivator, with Glu/Asp-rich carboxy-terminal domain, 2 | NM_006079 | 869 | |
| 51 | 0.00 | No | No | PRKRIP1 | PRKR interacting protein 1 (IL11 inducible) | NM_024653 | 157 | |
| 52 | 1.00 | Yes | Yes | ELN | Elastin | NM_000501 | 15 646 | Intronic |
| 53 | 1.00 | Yes | Yes | CBLL1 | Cas-Br-M (murine) ecotropic retroviral transforming sequence-like 1 | NM_024814 | 205 | |
| 54 & 55 | 1.00 | Yes | Yes | FOXP2 | Forkhead box P2 | NR_033766.1 | 84 | |
| 56 | 1.00 | Yes | Yes | FOXK1 | Forkhead box K1 | NM_001037165 | 62 596 | Intronic |
| 57 | 1.00 | Yes | Yes | HIBADH | 3-hydroxyisobutyrate dehydrogenase | NM_152740 | 140 | |
| 58 | 1.00 | Yes | Yes | THSD7A | Thrombospondin type-1 domain-containing protein 7A | NM_015204 | 291 190 | Intronic |
| 59 | 1.00 | Yes | Yes | TNRC18 | Trinucleotide repeat-containing gene 18 protein | NM_001080495 | 459 | |
| 61 | 1.00 | Yes | Yes | PVT1 | Pvt1 oncogene (non-protein coding) | NR_003367 | 57 522 | Intronic |
| 62 | 1.00 | Yes | Yes | ZNF395 | Zinc-finger protein 395 | NM_018660 | 15 465 | |
| 63 | 0.05 | No | No | FNTA | Farnesyltransferase, CAAX box, α | NM_002027 | 37 | |
| 64 | 1.00 | Yes | Yes | OSR2 | Protein odd-skipped-related 2 | NM_001142462 | 77 | |
| 65 | 0.22 | Yes | No | TNFRSF10B | Tumor necrosis factor (ligand) superfamily, member 10 | NM_003810 | 34 | |
| 66 | 0.45 | Yes | No | FBXO32 | F-box protein 32 | NM_058229 | 79 | |
| 67 | 0.14 | Yes | No | ASAP1 | ArfGAP with SH3 domain, ankyrin repeat and PH domain 1 | NM_018482 | 5653 | Intronic |
| 69 | 1.00 | Yes | Yes | BRD3 | Bromodomain containing 3 | NM_007371 | 32 | |
| 70 | 0.50 | Yes | No | TBL1X | Transducin (β)-like 1X-linked | NM_005647 | 159 | |
Peaks within 5 kb of either end of a gene model are shown along with PSAM motif scores. Max PSAM score refers to the maximum local alignment score. If the PSAM score is in the top 0.1% of scores for random 7mers, this is noted in the ‘Top 0.1%’ column. The ‘TGTTTAC’ column notes whether the peak contains the consensus TGTTTAC. Nearest gene, description and RefSeq# characterize the gene model nearest each peak. The nucleotide distance to TSS is sometimes >5 kb because some peaks are downstream of the gene model, and some are within large introns, as noted in the last column (intergenic peaks are described in Supplementary Table S6).
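The ‘Max PSAM score’ and ‘TGTTTAC’ columns can be illustrated with a minimal sketch. It assumes that a PSAM assigns each base at each position an affinity relative to the optimal base (optimal = 1.0), that a window's score is the product of those values, and that both strands are scanned. The toy matrix uses a placeholder penalty of 0.1 for every non-consensus base. It is not the measured FOXP2 PSAM, and the function names are illustrative.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def toy_psam(consensus: str, penalty: float = 0.1) -> list:
    """Placeholder PSAM: 1.0 for the consensus base, `penalty` otherwise."""
    return [{b: (1.0 if b == c else penalty) for b in "ACGT"} for c in consensus]

def window_score(psam: list, window: str) -> float:
    """Product of per-position relative affinities for one window."""
    score = 1.0
    for column, base in zip(psam, window):
        score *= column[base]
    return score

def max_psam_score(psam: list, seq: str) -> float:
    """Best window score over both strands of the sequence."""
    width = len(psam)
    best = 0.0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - width + 1):
            best = max(best, window_score(psam, strand[i:i + width]))
    return best

def contains_consensus(seq: str, consensus: str = "TGTTTAC") -> bool:
    """True if the consensus occurs on either strand."""
    return consensus in seq or consensus in reverse_complement(seq)

psam = toy_psam("TGTTTAC")
peak = "GGGTGTTTACGG"  # made-up sequence containing the consensus
print(max_psam_score(psam, peak), contains_consensus(peak))  # 1.0 True
```

With a matrix normalized this way, a peak containing an exact consensus match scores 1.00, which is consistent with the rows above where ‘TGTTTAC’ is ‘Yes’.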
Gene ontology term analysis of consistent peaks from the ENCODE ChIP-seq data, with number of genes in each category noted
| Cell line | GO # | GO term | | No. of genes |
|---|---|---|---|---|
| PFSK-1 | 0008134 | Transcription factor binding | 0.0016 | 54 |
| PFSK-1 | 0030528 | Transcription regulator activity | 0.0016 | 83 |
| SK-N-MC | 0003690 | Double-stranded DNA binding | 0.0558 | 6 |
| SK-N-MC | 0003700 | Sequence-specific DNA-binding transcription factor activity | 0.0470 | 23 |
| SK-N-MC | 0016563 | Transcription activator activity | 0.0558 | 13 |
| SK-N-MC | 0016564 | Transcription repressor activity | 0.0189 | 13 |
Figure 5. Conservation of sequences within ChIP-seq peaks containing instances of the FOXP2 motif. (A) Example of two FOXP2 ChIP-seq peaks aligned with elements of strong conservation upstream of BAHCC1 on Chr17: 79 366 750–79 370 250 (hg18/NCBI36). Alignments are shown for two high-confidence peak regions with high scoring instances of our MEME motif, and the vertebrate conservation score for the underlying sequence. (B) The mean of the phastCons conservation score over the FOXP2 peak regions is displayed relative to the position of the strong FOXP2-binding motif, with the genomic background average conservation score in red. The first principal component of the phastCons conservation is plotted in blue on the same scale, noted on the right-hand axis.
FOXP2-binding sites within ChIP-seq peaks where the human sequence is novel relative to chimps and other primates
Coordinates listed are relative to the hg18/NCBI36 draft of the human genome. Nearest gene models within 5 kb of these peaks are noted; blank spaces signify that there is no gene model within 5 kb of the peak. The motif values are given in bits for alignment of the site in question to the 7mer human MEME matrix; only those sites with the highest possible 7mer motif value (13.63 bits) are displayed. Gray shading denotes a gene with brain-specific function.
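A motif value in bits can be read as a log-odds score of a site against a position probability matrix. The sketch below assumes the common definition, a sum over positions of log2(p(base) / background), with a uniform background of 0.25. The background model actually used for the MEME matrix may differ. The matrix is a placeholder built from the consensus TGTTTAC, not the published one, so its maximum will not equal 13.63 bits.

```python
import math

def placeholder_ppm(consensus: str, p_consensus: float = 0.97) -> list:
    """Hypothetical probability matrix: most mass on the consensus base."""
    p_other = (1.0 - p_consensus) / 3.0
    return [{b: (p_consensus if b == c else p_other) for b in "ACGT"}
            for c in consensus]

def log_odds_bits(ppm: list, site: str, background: float = 0.25) -> float:
    """Log-odds score of a site, in bits, against a uniform background."""
    return sum(math.log2(col[base] / background) for col, base in zip(ppm, site))

def max_bits(ppm: list, background: float = 0.25) -> float:
    """Highest score any site of this width can reach."""
    return sum(math.log2(max(col.values()) / background) for col in ppm)

ppm = placeholder_ppm("TGTTTAC")
best = max_bits(ppm)
# A site reaches the maximum only if it carries the top base at every position.
print(math.isclose(log_odds_bits(ppm, "TGTTTAC"), best))  # True
print(log_odds_bits(ppm, "TGTTTGC") < best)               # True
```

Filtering for sites at the highest possible value, as in the table described above, therefore keeps only sites that match the preferred base at all seven positions.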