Alvaro Perdomo-Sabogal1, Katja Nowick2, Ilaria Piccini3, Ralf Sudbrak4, Hans Lehrach5, Marie-Laure Yaspo5, Hans-Jörg Warnatz5, Robert Querfurth6. 1. Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany Paul-Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany. 2. Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany Paul-Flechsig Institute for Brain Research, University of Leipzig, Leipzig, Germany nowick@bioinf.uni-leipzig.de Robert.Querfurth@gmx.de. 3. Institute of Genetics of Heart Diseases (IfGH), Department of Cardiovascular Medicine, University Hospital Münster, 48149 Münster, Germany Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany. 4. European Centre for Public Heath Genomics, UNU-MERIT, Unsiversity Maastricht,PO Box 616, 6200 MD Maastricht, The Netherlands Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany. 5. Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany. 6. Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Berlin, Germany nowick@bioinf.uni-leipzig.de Robert.Querfurth@gmx.de.
Abstract
A substantial fraction of phenotypic differences between closely related species are likely caused by differences in gene regulation. While this has already been postulated over 30 years ago, only few examples of evolutionary changes in gene regulation have been verified. Here, we identified and investigated binding sites of the transcription factor GA-binding protein alpha (GABPa) aiming to discover cis-regulatory adaptations on the human lineage. By performing chromatin immunoprecipitation-sequencing experiments in a human cell line, we found 11,619 putative GABPa binding sites. Through sequence comparisons of the human GABPa binding regions with orthologous sequences from 34 mammals, we identified substitutions that have resulted in 224 putative human-specific GABPa binding sites. To experimentally assess the transcriptional impact of those substitutions, we selected four promoters for promoter-reporter gene assays using human and African green monkey cells. We compared the activities of wild-type promoters to mutated forms, where we have introduced one or more substitutions to mimic the ancestral state devoid of the GABPa consensus binding sequence. Similarly, we introduced the human-specific substitutions into chimpanzee and macaque promoter backgrounds. Our results demonstrate that the identified substitutions are functional, both in human and nonhuman promoters. In addition, we performed GABPa knock-down experiments and found 1,215 genes as strong candidates for primary targets. Further analyses of our data sets link GABPa to cognitive disorders, diabetes, KRAB zinc finger (KRAB-ZNF), and human-specific genes. Thus, we propose that differences in GABPa binding sites played important roles in the evolution of human-specific phenotypes.
A substantial fraction of phenotypic differences between closely related species are likely caused by differences in gene regulation. While this has already been postulated over 30 years ago, only few examples of evolutionary changes in gene regulation have been verified. Here, we identified and investigated binding sites of the transcription factor GA-binding protein alpha (GABPa) aiming to discover cis-regulatory adaptations on the human lineage. By performing chromatin immunoprecipitation-sequencing experiments in a human cell line, we found 11,619 putative GABPa binding sites. Through sequence comparisons of the humanGABPa binding regions with orthologous sequences from 34 mammals, we identified substitutions that have resulted in 224 putative human-specific GABPa binding sites. To experimentally assess the transcriptional impact of those substitutions, we selected four promoters for promoter-reporter gene assays using human and African green monkey cells. We compared the activities of wild-type promoters to mutated forms, where we have introduced one or more substitutions to mimic the ancestral state devoid of the GABPa consensus binding sequence. Similarly, we introduced the human-specific substitutions into chimpanzee and macaque promoter backgrounds. Our results demonstrate that the identified substitutions are functional, both in human and nonhuman promoters. In addition, we performed GABPa knock-down experiments and found 1,215 genes as strong candidates for primary targets. Further analyses of our data sets link GABPa to cognitive disorders, diabetes, KRAB zinc finger (KRAB-ZNF), and human-specific genes. Thus, we propose that differences in GABPa binding sites played important roles in the evolution of human-specific phenotypes.
Regulation of gene expression is a major mechanism shaping phenotypic traits of organisms (Britten and Davidson 1971; King and Wilson 1975; Wray 2007). Gene expression is regulated to a large extent by transcription factors (TFs), proteins that bind to specific DNA sites of typically 5–15 bp, known as TF binding sites (TFBS) or cis-regulatory sites. Even though there is some sequence variation in the binding sites recognized by a certain TF, residing nucleotide substitutions can have a substantial impact on the affinity of the TF and hence on the levels at which the target genes are transcribed (Lin et al. 2007). Consequently, evolutionary changes in the sequence of a TFBS can change the expression of the target genes and lead to phenotypic differences and speciation.Previous studies aiming at identifying species-specific cis-regulatory sites were driven by the interest in a particular gene (Huby et al. 2001; Rockman et al. 2005; Romanelli et al. 2009), or based on genome-wide bioinformatics approaches, for instance, the search for certain substitution patterns in multiple species alignments (Haygood et al. 2007; Taylor et al. 2008; Molineris et al. 2011). However, experimentally supported examples for species-specific cis-regulatory sites are still sparse. This is partially due to laborious experimental approaches required to confirm the functionality of a binding site. Some in vitro studies successfully demonstrated functionality of species-specific TFBSs using promoter reporter gene assays (Chabot et al. 2007), and a selected number of determined changes have also been tested in vivo in zebrafish and mice (Stedman et al. 2004; Rockman et al. 2005; Brinkman-Van der Linden et al. 2007; Prabhakar et al. 2008; Capra et al. 2013).Here, we investigated binding sites of the GA-binding protein alpha subunit (GABPa) by using chromatin immunoprecipitation and comparative genomic approaches. GABPa belongs to the E-twenty-six (ETS) (E-26) family of DNA-binding factors and regulates a broad range of genes involved in cell cycle control, apoptosis, differentiation, hormonal regulation, and other critical cellular functions. GABPa is also recognized as an important transcriptional regulator in myeloid cell differentiation and neuromuscular function (Rosmarin et al. 2004; Yang et al. 2011). It further regulates the expression of genes implicated in mitochondrial functions (Yang et al. 2014) during early cleavage events of the embryo (Ristevski et al. 2004), and inactivation of GABPa induces early embryonic lethality in mice (Ristevski et al. 2004; Jaworski et al. 2007).The GABPa binding motif has been confirmed by several studies (Batchelor et al. 1998; Lin et al. 2007). GABPa forms hetero-tetramers composed of two GABPa and two GABPb subunits to bind tandem repeats of the GGAA consensus motif (Batchelor et al. 1998). Repetitions of GABPa binding sites influence transcription levels in a synergistic manner (Yu et al. 1997). It is also common for GABPa to regulate bidirectional promoters (Collins et al. 2007). Importantly, the DNA-binding domain of the GABPa protein is entirely conserved in primates, mouse, dog, and cow, making it likely that GABPa binds to the same motif in all these species. Therefore, analyzing changes in DNA sequence resulting in gain or loss of GABPa binding sites will facilitate the identification of genes whose regulation has been altered by GABPa along the lineages leading to humans and hominids. In addition, given that GABPa preferentially binds in close proximity to the transcriptional start site (TSS) (Valouev et al. 2008), it is possible to evaluate potential binding site alterations straightforwardly by using promoter-reporter gene assays in human and nonhuman primate cells.We performed chromatin immunoprecipitation-sequencing (ChIP-Seq) assays in humanembryonic kidney (HEK293T) cells to identify TFBS for GABPa and independently supported the functionality of the binding sites by using two different RNA interference molecules to knock-down its expression. To search for GABPa consensus binding site emergence along the human lineage, we reconstructed the ancestral sequences of the binding sites within the ChIP-Seq peak regions based on multiple species alignments (fig. 1). We identified human and hominid-specific GABPa binding sites and exemplarily evaluated the functionality of four human wild-type (wt) promoters on gene expression by performing promoter reporter gene assays in HEK293T cells. By mutating the human/hominid-specific GABPa binding sites, we could also evaluate the activity of the ancestral states of these promoters lacking the GABPa consensus binding site. In addition, to test whether the human/hominid binding sites drive similar expression in a chimpanzee and rhesus macaque background, we also introduced the human/hominid-specific substitutions into the chimpanzee and rhesus macaque promoters. Lastly, we also introduced all promoter reporter gene constructs in COS-1 cells to test whether the activity of the binding sites would change within the background of a nonhuman primate cell line.
F
Overview of the experiments and analyses performed in this study. (A) Schematic representation of ChIP-Seq experiments with a GABPa-specific antibody in HEK293T cells to identify binding sites for GABPa. (B) Peak detection using the peak-calling algorithm available in QuEST. (C) Search for the GABPa consensus binding sequence and deduction of a PWM using the motif discovery tool Multiple Em for Motif Elicitation (MEME). (D) Search for additional binding sites within the peak region using the motif alignment and scan tool MAST. (E) Extraction of multiple species alignments of the MAST-identified binding sites from the UCSC MultiZ 44 vertebrate alignments. (F) Ancestral sequence reconstruction along with the phylogeny of 34 mammalian species from the UCSC 44-vertebrate alignments using the tools PAML and ANCESTORS. (G) GABPa motif matching within reconstructed ancestral binding site sequences using MAST. The dots in panels C and E represent sequence conservation.
Overview of the experiments and analyses performed in this study. (A) Schematic representation of ChIP-Seq experiments with a GABPa-specific antibody in HEK293T cells to identify binding sites for GABPa. (B) Peak detection using the peak-calling algorithm available in QuEST. (C) Search for the GABPa consensus binding sequence and deduction of a PWM using the motif discovery tool Multiple Em for Motif Elicitation (MEME). (D) Search for additional binding sites within the peak region using the motif alignment and scan tool MAST. (E) Extraction of multiple species alignments of the MAST-identified binding sites from the UCSC MultiZ 44 vertebrate alignments. (F) Ancestral sequence reconstruction along with the phylogeny of 34 mammalian species from the UCSC 44-vertebrate alignments using the tools PAML and ANCESTORS. (G) GABPa motif matching within reconstructed ancestral binding site sequences using MAST. The dots in panels C and E represent sequence conservation.
Results
Identification of GABPa Binding Sites by Chromatin Immunoprecipitation
To identify binding sites for GABPa, we performed ChIP-Seq experiments with a GABPa-specific antibody in HEK293T cells (fig. 1). This antibody has been validated for ChIP-Seq of GABPa, resulting in highly specific GABPa peak regions (Valouev et al. 2008). Our peak calling from the ChIP-Seq reads resulted in a list of 6,208 GABPa binding peaks (see Materials and Methods for details).In order to derive a GABPa consensus binding motif from the ChIP-Seq peak regions, we used 200 bp DNA sequences equally surrounding the center of the 6,208 peaks as input for the de novo motif discovery algorithm MEME (fig. 1 and supplementary table S1, Supplementary Material online) (Bailey et al. 2009). This allowed us to build a consensus binding sequence and a position-specific weight matrix (PWM) for GABPa of 11 bp in length (figs. 1 and 2). The PWM-contributing sites were in 93% located close to the peak centers (fig. 3), indicating proper peak-calling from ChIP-Seq reads and further demonstrating the validity of our ChIP-Seq data. The identified PWM is very similar to the GABPa PWMs available in the JASPAR and TRANSFAC TFBS databases, and it is almost identical to the one found by Valouev et al. (2008) in Jurkat cells (fig. 2). Taken together, the binding sites and motif identified here seem to be reliable.
F
Comparison of GABPa motifs from different studies and databases. Sequence logos represent the different PWMs.
F
GABPa peaks map close to gene starts and harbor GABPa binding sites residing closely to the peak centers. (A) Distance of the sites contributing to the MEME motif (6,031 of 6,208 in total) to the ChIP peak centers. (B) Number of GABPa motifs found per each ChIP-Seq peak. (C) Distribution of motif occurrences within 200 bp surrounding the ChIP peak centers. (D) Distance of peak calls to the nearest TSSs of UCSC genes within 10 kb, centered on the TSS. (E) ± 400 bp zoom of the peak calls to the nearest TSS. The x axis shows the distance to the nearest TSS in base pairs. Negative values represent upstream, positive values downstream regions.
Comparison of GABPa motifs from different studies and databases. Sequence logos represent the different PWMs.GABPa peaks map close to gene starts and harbor GABPa binding sites residing closely to the peak centers. (A) Distance of the sites contributing to the MEME motif (6,031 of 6,208 in total) to the ChIP peak centers. (B) Number of GABPa motifs found per each ChIP-Seq peak. (C) Distribution of motif occurrences within 200 bp surrounding the ChIP peak centers. (D) Distance of peak calls to the nearest TSSs of UCSC genes within 10 kb, centered on the TSS. (E) ± 400 bp zoom of the peak calls to the nearest TSS. The x axis shows the distance to the nearest TSS in base pairs. Negative values represent upstream, positive values downstream regions.Under default parameters, the MEME algorithm assumes that each peak contains zero or one sequence motif. This assumption is advantageous to find nonrepetitive motif elements. However, as more than one motif is likely present in each peak region (Yu et al. 1997; fig. 3), we searched for additional binding sites. To this end, we used the motif alignment and scan tool MAST (Bailey et al. 2009; fig. 1). The MAST analysis revealed 11,619 PWM hits in 5,797 peak regions of 200 bp length, with the majority of peaks containing two binding sites, closely followed by peaks with single sites (fig. 3 and supplementary table S2, Supplementary Material online).Out of the 6,208 genomic peaks for GABPa, 4,277 (69%) peaks were located within 300 bp up- and downstream of the TSSs of 11,848 UCSC transcripts corresponding to 3,994 putative target genes (Entrez IDs). When we extended the window to ± 5 kb centered to the TSSs, the number of peaks mapping to transcripts increased to 5,321 (86%), corresponding to 15,046 transcripts from 5,218 putative targets (supplementary table S3, Supplementary Material online). Further extending the window to ± 10 kb, the numbers increased to 5,465 peaks (88%) mapping to 18,730 transcripts and corresponding to 5,784 putative genes. The majority of peaks reside close to the TSS (fig. 3). We also detected that GABPa binding sites seem to experience depletion between −150 and −100 bp distance from the closest TSS (fig. 3). For downstream analyses we used mappings within ±5 kb centered on UCSC-annotated TSSs.To further validate these putative GABPa binding regions, we performed gene reporter assays and GABPa knock-down experiments in the same cell lines (see below). In addition, we also generated a replicate ChIP-Seq data set and compared it to the peaks we obtained from the first ChIP-seq experiment. We found that 91% of the regions in the initial experiment (5,647 peaks) were also present in the regions from the replicate experiment (supplementary fig. S1, Supplementary Material online). According to the recommendation of the ENCODE consortium for ChIP-Seq experiments, we also calculated the Irreproducible Discovery Rate (IDR) (Landt et al. 2012) for the two replicates and found that 3,677 peaks (∼60%) overlapped at IDR less than 0.05 (supplementary fig. S2, Supplementary Material online), demonstrating reasonable consistency among the two replicates. Furthermore, we also analyzed agreement of our ChIP-Seq data with the ChIP-Seq data obtained by ENCODE in five different cell lines. We found a representative overlap between our data and these data sets, with approximately 60% of our peaks overlapping with at least one of the other data sets (supplementary fig. S3, Supplementary Material online).
Identification of Newly Evolved GABPa Binding Sites
In order to identify human-specific GABPa binding sites, we used all 11,619 GABPa binding sites to extract multiple sequence alignments from the UCSC MultiZ 44 vertebrate alignments. We obtained 11,008 alignments (fig. 1), while for the remaining 611 binding site regions there was either no alignment available or the aligned regions were not contiguous. In addition, we used the eight available nonhuman primate genomes (Chimpanzee, Gorilla, Orangutan, Macaque, Marmoset, Tarsier, Mouse Lemur, and Bushbaby), which allowed us to identify those binding sites that are specific to the Human, Hominini (Human and Chimpanzee), Homininae (Homini and Gorilla), and Hominidae (Homininae and Orangutan) lineages. We could reconstruct a hominid ancestral sequence for all except for 65 binding sites which miss aligned sequences (supplementary table S4, Supplementary Material online). We then searched within these ancestral sequences for the presence of GABPa motifs (supplementary table S5, Supplementary Material online).We discovered 224 GABPa binding sites that are specific to humans. These binding sites correspond to 219 ChIP-Seq peaks mapping within ±5 kb centered on TSSs of 217 genes (supplementary table S6, Supplementary Material online). Three of these genes are human-specific (CEP170, RPL41, and GUSBP4), and three more are Hominidae-specific (STAG3L4, USP6, and ZNF383) (Zhang et al. 2010) (supplementary table S7, Supplementary Material online). Two peaks with human-specific binding sites did not map to known genes. Manual inspection revealed that one of these peaks is located 317 nt upstream of the transfer RNA Phe (anticodon GAA) gene (uc021qjx.1), whereas the other one is 3,810 nt upstream of a human cDNA (uc021suf.1). Among the 217 promoters that gained human-specific GABPa binding sites, we detected 12 KRAB-ZNF TFs. Our subsequent test revealed that KRAB-ZNFs are indeed significantly overrepresented among genes with human-specific GABPa binding sites in their promoters (P value = 0.01116, Fisher’s exact test). For the ancestral sequences of Hominini, Homininae, and Hominids, we identified 57, 244, and 310 lineage-specific binding sites (supplementary table S8, Supplementary Material online) mapping to 44, 240, and 326 genes, respectively (supplementary table S9, Supplementary Material online). Binding site appearances for all ancestral branches leading to human are shown in supplementary figure S4, Supplementary Material online.To gain insights into the functions of those genes belonging to the 217 promoters that gained human-specific GABPa binding sites, we further performed enrichment tests based on Gene Ontology (GO) annotations. We found several enriched GO terms, for example, terms associated with heart development, RNA processing of tRNAs and mRNAs, mammary morphogenesis and development, lipid biosynthesis and metabolism, signaling pathways involved in ventral spinal cord interneuron and spinal cord motor neuron cell fate specification, and dorsal/ventral neural tube patterning (supplementary table S10, Supplementary Material online).
Functional Analysis of Newly Evolved GABPa Binding Sites Using Reporter Gene Assays
From the set of promoters with human-specific GABPa binding sites, we selected four promoters for further experimental validation: the promoters of ZNF197, ANTXR1, and TMBIM6 and the bidirectional promoter of ZNF398/ZNF425. ZNF197, ZNF398, and ZNF425 were chosen as representatives of the KRAB-ZNF family. The promoter of ZNF197 has two binding sites; one is conserved among mammals, whereas the other one is specific to humans. The anthrax toxin receptor-1 gene (ANTXR1) harbors three GABPa binding sites in the human promoter, but only two in chimpanzee and rhesus. In addition, this gene is highly expressed in HEK293T cells. Even though we were particularly interested in human-specific binding site gains, we included the TMBIM6 promoter, which harbors a hominid-specific binding site, because of its strong ChIP-Seq peak and expression levels of the corresponding gene (Sultan et al. 2008). Interestingly, MAST analysis predicted another GABPa binding site next to the hominid-specific binding site for TMBIM6, which is deeply conserved. However, this variation was found in only 0.94% of all 11,008 binding sites and it does not match the GABPa core consensus sequence “GGAA” (see fig. 4). Lastly, the promoter of the KRAB-ZNF genes ZNF398/ZNF425 was chosen as a representative of a bi-directional promoter having TSSs located approximately 130 bp apart. It harbors two overlapping GABPa binding sites caused by two human-specific single nucleotide changes. This promoter was cloned in both directions to account for bidirectional transcription. A genomic view of ChIP-Seq peaks, cloned fragments, sequence differences to the human reference sequence, and binding site predictions can be found in supplementary figure S5, Supplementary Material online.
F
Genomic view of GABPa ChIP-Seq reads spanning the TMBIM6 promoter including GABPa binding site predictions and multiple species alignment of the first exon. ChIP-Seq reads are colored in blue (forward reads) and red (reverse reads). The first exon (5′-UTR) is shown as blue bar with a black arrow indicating the TSS in HEK293T cells as determined by RNA-Seq. GABPa binding site predictions are shown as green and black boxes. Within the blowup in the lower part, including the UCSC multiple species alignment of exon 1, binding sites are shown as sequence logos of the GABPa PWM aligned to their matching positions. Within the alignment, dots indicate identity to the human reference sequence, whereas orange vertical bars indicate bases that are not depicted. Orange numbers below represent the sum of bases not depicted. The blue (C) illustrates the presence of a single cytosine in all nonhaplorhini at the indicated site.
Genomic view of GABPa ChIP-Seq reads spanning the TMBIM6 promoter including GABPa binding site predictions and multiple species alignment of the first exon. ChIP-Seq reads are colored in blue (forward reads) and red (reverse reads). The first exon (5′-UTR) is shown as blue bar with a black arrow indicating the TSS in HEK293T cells as determined by RNA-Seq. GABPa binding site predictions are shown as green and black boxes. Within the blowup in the lower part, including the UCSC multiple species alignment of exon 1, binding sites are shown as sequence logos of the GABPa PWM aligned to their matching positions. Within the alignment, dots indicate identity to the human reference sequence, whereas orange vertical bars indicate bases that are not depicted. Orange numbers below represent the sum of bases not depicted. The blue (C) illustrates the presence of a single cytosine in all nonhaplorhini at the indicated site.Orthologous promoters were cloned from human, chimpanzee, and rhesus macaque genomic DNA. For each promoter, two fragments were cloned, one representing the wt and the other one a mutated (mut) form. For human mutated forms, the binding sites were modified by one or two single nucleotide mutations to mimic the ancestral state incompatible with GABPa binding. Inversely, original sequences for chimpanzee and macaque were altered to generate the human-specific GABPa binding sites. All wt and mutated promoters were cloned into a modified firefly luciferase reporter gene vector (pGL3, see Materials and Methods) and verified by whole-insert sequencing. Reporter gene expression was measured in humanHEK293T cells and COS-1 cells derived from African green monkey and normalized to a cotransformed plasmid stably expressing Renilla luciferase.For all selected promoters we detected a measurable effect of the sequence differences in GABPa binding sites on the promoter activity (see figs. 5 and 6). The humanZNF197 wt promoter that harbors one conserved and one human-specific GABPa binding site showed significantly higher activity than the chimpanzee and rhesus macaque wt promoters (figs. 5A and 6). The introduction of a single nucleotide mutation to create the human-specific binding site within the sequence background of the chimpanzee and rhesus macaque promoter sequences resulted in a significant increase of the promoter activity in both cell lines, lifting reporter activities almost to the level of the human wt sequence. However, the activity of the human promoter did not change upon the binding site disruption to the ancestral state, thus indicating that also other sequences adjacent to the humanGABPa binding site might contribute to the overall promoter activity.
F
Normalized firefly luciferase activities of human, chimpanzee, and rhesus macaque wt and mutated (mut) promoters. Bars represent average firefly to Renilla ratio in black for human (HSA), in gray for chimpanzee (PTR) and in white for macaque (MAC) promoters. For each species, the left bar refers to the wt and the right bar to the mutated promoter. (+binding site) or (−binding site) indicate presence or absence of GABPa binding sites in wt promoters and indicate introduction or disruption of sites in mutated promoters. For each promoter, measured activities were normalized to the construct with the lowest promoter activity level in HEK293T cells (set to one; supplementary table S16, Supplementary Material online). Standard errors were calculated from at least six replicates. (*) indicates significant differences between wt and mutated promoter activities according to a one-tailed Welch’s t-test, while (#) indicates significant difference of wt chimpanzee or macaque promoters compared with human wt activity according to a two-tailed Welch’s t-test (supplementary tables S17–S21, Supplementary Material online). The raw data for all constructs and the statistical significance are available in supplementary material (supplementary tables S17–S21, Supplementary Material online).
F
The introduction and disruption of GABPa consensus binding sites significantly influence reporter gene activities. For each gene, the number of predicted binding sites within 200 bp surrounding the peak centers is indicated. Species are denoted by HSA, Homo sapiens; PTR, Pan troglodytes (chimpanzee); and MAC, Macacca mulatta (macaque). Sequences are shown for wt and mutated sites. Underlined bases indicate differences from the human wt sequence. Mutated bases are colored in green or red indicating generation or disruption of a GABPa binding site, respectively. Green arrows depict higher activity of mutated over wt promoter, red arrows indicate lower activity, and yellow arrows represent no change. Differences in mutated and wt promoter activities are given as log2 ratios of average luciferase to Renilla ratios. Significance levels, as determined by Welch’s t-test for unequal variances, are indicated as (*) P value < 0.05, (**) P value < 0.01, (***) P value < 0.001, and (ns) not significant.
Normalized firefly luciferase activities of human, chimpanzee, and rhesus macaque wt and mutated (mut) promoters. Bars represent average firefly to Renilla ratio in black for human (HSA), in gray for chimpanzee (PTR) and in white for macaque (MAC) promoters. For each species, the left bar refers to the wt and the right bar to the mutated promoter. (+binding site) or (−binding site) indicate presence or absence of GABPa binding sites in wt promoters and indicate introduction or disruption of sites in mutated promoters. For each promoter, measured activities were normalized to the construct with the lowest promoter activity level in HEK293T cells (set to one; supplementary table S16, Supplementary Material online). Standard errors were calculated from at least six replicates. (*) indicates significant differences between wt and mutated promoter activities according to a one-tailed Welch’s t-test, while (#) indicates significant difference of wt chimpanzee or macaque promoters compared with human wt activity according to a two-tailed Welch’s t-test (supplementary tables S17–S21, Supplementary Material online). The raw data for all constructs and the statistical significance are available in supplementary material (supplementary tables S17–S21, Supplementary Material online).The introduction and disruption of GABPa consensus binding sites significantly influence reporter gene activities. For each gene, the number of predicted binding sites within 200 bp surrounding the peak centers is indicated. Species are denoted by HSA, Homo sapiens; PTR, Pan troglodytes (chimpanzee); and MAC, Macacca mulatta (macaque). Sequences are shown for wt and mutated sites. Underlined bases indicate differences from the human wt sequence. Mutated bases are colored in green or red indicating generation or disruption of a GABPa binding site, respectively. Green arrows depict higher activity of mutated over wt promoter, red arrows indicate lower activity, and yellow arrows represent no change. Differences in mutated and wt promoter activities are given as log2 ratios of average luciferase to Renilla ratios. Significance levels, as determined by Welch’s t-test for unequal variances, are indicated as (*) P value < 0.05, (**) P value < 0.01, (***) P value < 0.001, and (ns) not significant.The humanANTXR1 promoter with two conserved and one human-specific binding site showed significantly higher activity than the chimpanzee and rhesus macaque wt orthologs in at least one of the two cell lines (figs. 5 and 6). Introduction of the human binding site into chimpanzee and macaque promoters raised activity levels significantly in three of the four cases, namely for chimpanzee in both cell lines and for macaque in HEK293T cells. Single nucleotide mutations of the human-specific binding site for creating the ancestral state caused significantly decreased reporter activity in COS-1 cells and also decreased activity in HEK293T cells.The promoter of TMBIM6 contains a hominid-specific GABPa binding site (see fig. 4). As expected, the activities of the human and chimpanzee wt promoter were higher than the activity of the rhesus macaque wt promoter in both cell lines. Surprisingly, the human wt promoter activity was found to be significantly higher than that of the chimpanzee wt promoter (figs. 5 and 6). This suggests an influence of sequence differences between humans and chimpanzees on the promoter activity. The introduction of the hominid-specific site into the macaque promoter resulted in a highly significant change in activity, incrementing the intensities above chimpanzee wt activity. Disruption of the hominid-specific binding site in human decreased the activity slightly below chimpanzee wt activity, whereas disruption of the chimpanzee binding site decreased the activity below macaque wt activity.The bidirectional promoter of ZNF398/ZNF425 showed significant differences in the activities of the human, chimpanzee, and rhesus macaque wt promoters in the direction of ZNF398 in COS-1 cells and ZNF425 in HEK293T cells (figs. 5B, 5E and 6). The introduction of the human binding site into chimpanzee and rhesus macaque promoters resulted in a significant increase in activity in both cell lines. The reversion of the human binding site to the chimpanzee sequence caused more than a 2-fold reduction in promoter activity in both cell lines.In summary, the introduction of humanGABPa binding sites into chimpanzee or rhesus macaque promoters resulted in significant increase in reporter gene expression in 17 of 18 cases in both cell lines. In contrast, the disruption of GABPa binding sites in human and chimpanzee promoters led to a significant decrease of the reporter gene activity in 9 out of 12 cases. In none of the cases we observed conflicting effects, since binding site introduction did never lead to significant activity decrease, and binding site disruption did never result in any significant activity increase.
Identification of Differential Gene Expression after GABPa Knock-Down
To obtain an independent confirmation of the functionality of all GABPa binding sites determined here by ChIP-Seq, we carried out two knock-down experiments of GABPa in HEK293T cells. Knock-down was performed through transfection of cells with two independent types of silencing molecules (Qiagen SI00423311 and Invitrogen HSS103907), followed by genome-wide expression profiling at two different time points (RNA extraction at 24 and 72 h after transfection) (see Materials and Methods). We only considered genes that were significantly differentially expressed (DE) with both siRNA molecules. Among 14,873 expressed genes, we found 1,156 (24 h after transfection) and 3,238 (72 h) to be DE. For the 72 h experiment, more DE genes than expected by chance harbored GABPa ChIP-Seq peaks in their regulatory sequence (P < 0.001) (see Materials and Methods). In total, out of 4,531 genes that are expressed at the 72 h time point and have at least one GABPa binding site, about a quarter (1,215) showed significant changes in expression after GABPa knock-down. These genes are strong candidate target genes of GABPa (supplementary table S11, Supplementary Material online).
Analyses of DE Genes
Next, we used GO enrichment analyses to determine the terms associated with those 1,215 strong candidate target genes of GABPa (see Materials and Methods). A number of functional categories were significantly enriched; including several GO terms associated with RNA processing, ribosome biogenesis, regulation of lipid metabolism, protein ubiquitination, neuron projection development and innate, and adaptive immune response, among others (supplementary table S12, Supplementary Material online).In addition, we also analyzed 2,023 genes found to be DE after GABPa knock-down (72 h time point, see Materials and Methods) that did not presented GABPa binding sites in their promoter regions (supplementary table S11, Supplementary Material online). We found very similar GO groups enriched, suggesting that among the DE genes are not only primary targets of GABPa, but also a big group of secondary targets involved in similar processes. For instance, for both groups (DE genes with ChIP-Seq peaks and DE genes without ChIP-Seq peaks for GABPa) we found significant enrichments for biological processes associated with lipid and fat biosynthesis and metabolism such as lipoprotein transport, glycosphingolipid metabolic process, lipid biosynthetic process, fatty acid transmembrane transport, and regulation of fatty acids, mitochondrial biogenesis, among others (supplementary table S13, Supplementary Material online). In addition, we also found in all GO enrichment analyses an overrepresentation of terms related to tissue development, such as regulation of endothelial tube morphogenesis and endothelial cell proliferation, and epithelial cell differentiation involved in mammary gland alveolus development.
Analysis of DE Genes with Human-Specific GABPa Binding Sites
Of the 217 genes with human-specific binding sites, we found 52 that showed significant changes in gene expression 72 h after GABPa knock-down with both siRNA molecules (supplementary table S7, Supplementary Material online). We found an enrichment of genes with human-specific binding sites among the DE genes (Fisher’s Exact test, P value = 5.79−10). Four (CEP170, RPL41, GUSBP4, and STAG3L4) out of the six human and great ape specific genes having human-specific GABPa binding sites showed significant changes in gene expression after GABPa knock-down at least with one siRNA molecule (supplementary table S7, Supplementary Material online). This indicates that many newly evolved binding sites are functional.Since GABPa has been associated with several human medical conditions (Bauer-Mehren et al. 2010), we explored if some of these 52 genes have also been associated with the very same altered phenotypes. We used the gene-disease association database “DisGeNET” (Bauer-Mehren et al. 2010, 2011), finding information for 17 genes, many of them have been associated with diabetes (PCMT1, UGGT2), Parkinson disease (RPL6, TDP2, HSPA8), and breast cancer (ACOT13, ANTXR1, BAG4, EMG1, HSPA8, NEK7, YPEL3, ZNF398), among other diseases (supplementary table S14, Supplementary Material online).
Discussion
By using ChIP-Seq and knock-down experiments, we found that GABPa regulates a substantial fraction of human genes. In total, we identified binding sites of GABPa in the vicinity of 5,321 genes. Out of them, 217 genes have gained human-specific GABPa binding sites. We found that GABPa binds to almost one-third (31%) of the promoters of genes expressed in HEK293T cells. Analyses of differential expression after GABPa knock-down allowed us to identify 1,215 genes with at least one GABPa binding site and significant changes in gene expression. This constitutes a set of strong candidates for being primary targets of GABPa. We also found that around 2,000 genes are likely to be secondary targets of GABPa. In general, our results indicated that GABPa is mainly involved in the regulation, direct or indirect, of genes that are important for neurological processes, lipid biosynthesis and metabolism, endothelial and epithelial cell morphogenesis, and mammary morphogenesis and development.
Newly Evolved GABPa Binding Sites Are Functional
To find binding sites that are specific to humans or hominids, we reconstructed the ancestral sequences for the 11,008 humanGABPa binding sites in HEK293T cells based on the UCSC 44-vertebrate whole-genome alignments. This approach relies on the accuracy of the UCSC alignments. UCSC multiZ alignments of human–chimpanzee and human–macaque have been estimated to be problematic (while not necessarily wrong) for 0.004% and 0.02% of the aligned nucleotides, respectively (Prakash and Tompa 2007). Theoretically, this would imply that for the 11,008 human–macaque alignments corresponding each to 11bp of a GABPa binding site, a fraction of 24 nt was problematically aligned. However, this fraction is likely even smaller as most problematic alignments have been found in intronic and intergenic regions (Prakash and Tompa 2007). The majority of the GABPa binding sites reside in proximal promoter regions where mammalian genomic sequences are particularly conserved (Mahony et al. 2007), allowing for accurate overall alignments. We thus expect that our results were not significantly influenced by erroneous alignments.To test whether the identified binding sites are functional, we selected four of them for experimental validation, three of them carrying human-specific, and one carrying a hominoid-specific GABPa binding site. We carried out dual luciferase reporter assays of human, chimpanzee, and macaque promoters in humanHEK293T and African green monkey-derived COS-1 cells. The relative reporter activity was similar in both cell lines, thus experimentally indicating independence from the species-specific cellular background. Importantly, this also indicates that the activity of human-specific binding sites can be demonstrated even within the genomic/proteomic background of a catharine monkey cell line. The insertion of the humanGABPa binding site into chimpanzee or rhesus macaque promoters resulted in significant increase in reporter gene expression in the majority of the cases. Conversely, disruption of GABPa binding sites in human and chimpanzee promoters led to a significant decrease of the reporter gene activity.Mutation analyses for the promoters of ZNF197 and ANTXR1 showed that an insertion of a human-specific GABPa binding site into chimpanzee and macaque promoters resulted in a significant and consistent increase in reporter activity. In contrast, we did not observe a decrease in activity when disrupting the newly evolved binding sites in the human promoters of these two genes. In general, this does not necessarily render the human-specific binding sites irrelevant; under different conditions these binding sites may still have a functional impact. Instead, our findings could indicate the presence of additional mutations that facilitate the binding of one or more factors to compensate for the activating property of the new GABPa binding sites. Indeed, both promoters (ZNF197 and ANTXR1) harbor additional human-specific mutations within less than 100 bp to the newly evolved GABPa binding sites. This could illustrate a scenario where an increase in gene expression was beneficial at some time during evolution, with a subsequent period of decreased evolutionary pressure. In the case of TMBIM6 the human wt promoter drives higher reporter activity compared to the chimpanzee wt promoter, even though both species share a GABPa consensus-binding site. However, the human promoter (including exon 1) harbors two additional single nucleotide mutations in very close proximity that might account for the observed difference (see fig. 4). Disruption of two overlapping GABPa binding sites in the bidirectional promoter of ZNF398/ZNF425 resulted in greater than 2-fold activity reduction in the direction of ZNF398 and greater than 1.2-fold reduction in the direction of ZNF425. To convert the rhesus macaque-binding site to the human binding site, it was necessary to introduce six mutations, accompanied by a 3 bp deletion. Interestingly, this strong intervention had only a moderate effect in the direction of ZNF425. Conversely, in the direction of ZNF398, we observed an almost 5-fold increase in activity. This might indicate context-specificity, in other words, which gene is affected depends on the tissue or cell type.Notably, in a previous study where GABPa binding sites were introduced into six promoters that are not regulated by GABPa in their wt form, only one promoter was activated (Collins et al. 2007). This may indicate that the introduction of GABPa binding sites per se is mostly insufficient to affect gene expression. However, we found that by inserting human-specific GABPa binding sites into chimpanzee and rhesus macaque promoter regions, the reporter gene activity increased consistently. Our results thus support the conclusion that binding sites need to be placed in the right genomic context to exert an influence on gene expression.Our knock-down experiments further revealed that 1,215 of the genes with at least one GABPa binding site changed in expression after GABPa knock-down. Interestingly, genes with human-specific binding sites were enriched among the genes with differential expression after knock-down. This indicates that at least a representative number of the predicted newly evolved binding sites contribute to transcriptional regulation by GABPa in humans, including ANTXR1 and ZNF398, which we had also tested in our promoter assays.
Human-Specific GABPa Binding Sites Regulate Genes That Are Potentially Important for Human Evolution and Human Diseases
Among the genes with GABPa binding sites specific to the human lineage, we found a significant overrepresentation of KRAB-ZNF TFs genes. KRAB-ZNFs, a relatively young and fast evolving family of TF genes with several primate-specific family members (Hamilton et al. 2006; Huntley et al. 2006; Nowick et al. 2010), seem prone to acquire new GABPa binding sites. KRAB-ZNF genes play a role in a number of biological processes, including development and brain functions (Najmabadi et al. 2011; Zhang et al. 2011) and have been considered as excellent candidates for contributing to postzygotic isolation and other barriers that could lead to speciation processes (Nowick et al. 2013). Consequently, it would be interesting to further investigate how GABPa, presumably via regulating the expression of KRAB-ZNF genes expressed in brain cells, has contributed to the evolution of human-specific phenotypes.Among the genes with human-specific GABPa binding sites, we also found enrichment for genes involved in RNA processing, especially of tRNAs, and signaling pathways involved in ventral spinal cord interneuron and spinal cord motor neuron cell fate specification and dorsal/ventral neural tube patterning (supplementary table S10, Supplementary Material online). For instance, the promoters of DARS and PARS2, genes that encode for aminoacyl tRNA synthetases (aspartyl- and prolyl-tRNAs) and catalyze the attachment of amino acids to their cognate tRNAs, carry human-specific GABPa binding sites. Both of these genes were DE with at least one siRNA molecule. Functional changes in these two aminoacyl tRNA synthetases have been related with neuronal disorders of the peripheral nervous system, amyotrophic lateral sclerosis, and ataxia (Park et al. 2008), brain stem and spinal cord leukoencephalopathy with cerebellar and dorsal column dysfunctions (Sissler et al. 2007; Taft et al. 2013), and Alpers syndrome (Sofou et al. 2015).ALDOA, HSPA8, and TP73 are three further examples of genes with human-specific GABPa binding sites that showed significant changes in gene expression after GABPa knock-down. They, as well as TMBIM6, which we have tested with our reporter assays, have been associated with cognitive diseases such as autism, Alzheimer’s disease (Harris et al. 2007; Kumar et al. 2009), and Parkinson’s disease (Naidoo 2009; Lauterbach 2013), control of neocortical regionalization and brain pathologies like loss of Cajal-Retzius cells (Meyer et al. 2004), abnormal accumulation of cerebrospinal fluid in the brain (Yang et al. 2000), and neuron apoptosis (Pozniak et al. 2002). In agreement with our results, it has been shown that expression changes in GABPa might have consequences for mitochondrial functions in patients with Alzheimer’s disease (Sheng et al. 2012).Genes that harbor human-specific GABPa binding sites and that significantly change their expression after GABPa knock-down have also been associated with the same medical conditions with which GABPa has been associated. For instance, protein-L-isoaspartate (D-aspartate) O-methyltransferase (PCMT1), UDP-glucose glycoprotein glucosyltransferase 2 (UGGT2), as well as TMBIM6, have been linked to diabetes types I and II. PCMT1 encodes the Protein-L-isoaspartate (D-aspartate) O-methyltransferase, a repair enzyme for which reduced expression levels have been associated with delaying the appearance and decreasing the severity of diabetes type I (Wägner et al. 2007, 2008). Similarly, patients with diabetes type II showed significantly lower UGGT2 expression (Marchetti et al. 2007). Noteworthy, we also found enrichments for processes associated with triglyceride biosynthesis and cholesterol transport, both being connected with the occurrence and prevalence of several humanmetabolic diseases, for example, abnormal blood sugar levels. Thus, we suggest that human-specific GABPa binding sites might be associated with human-specific changes in metabolic functions, potentially increasing the risk for developing diabetic phenotypes and other metabolic syndromes.It was previously demonstrated that GABPa is associated with the transcriptional regulation of genes directly involved in controlling cell migration in breast epithelial cells (Odrowaz and Sharrocks 2012). Interestingly, among the genes with human-specific binding sites, we found enrichment of genes involved in mammary gland development (supplementary table S10, Supplementary Material online). For instance, PHB2, also known as Repressor of Estrogen Receptor Activity (REA), is important for mammary gland cell proliferation, breast alveolus development, and postnatal breast development (Mussi et al. 2006). PHB2 has a human-specific GABPa binding site 26 bp downstream the TSS and shows significant change in expression after GABP knock-down. Within this study, and given the uniqueness of the human breast, this particular human-specific GABPa binding site is the best candidate for a functional contribution to this human trait.
Conclusions
In summary, we identified human-specific TFBSs for GABPa using ChIP-Seq data together with comparative genomic analysis and experimentally demonstrated the functionality of selected newly evolved GABPa binding sites. Our results depict a scenario that may connect gene regulation by GABPa to human speciation and to shaping human-specific phenotypes, including brain, breast, and metabolic functions among others. Our work can also serve as an example for using a bioinformatics approach with the increasing number of publicly available ChIP-Seq data for TFs to study binding site evolution on a “TF-ome”-wide level.
Materials and Methods
Chromatin Immunoprecipitation-Sequencing
We performed ChIP-Seq according to a published protocol (Warnatz et al. 2011). Briefly, 5 × 107 HEK293T cells were cross-linked for 10 min at room temperature with 1% formaldehyde, nuclei were prepared following the published protocol and chromatin was sheared to 100–500 bp size by 45 cycles of 30 s on/off at the highest amplitude using a Bioruptor water bath sonicator (Diagenode). Nuclear extracts were immunoprecipitated with 10 µg rabbit anti-GABP-α (H-180X, Santa Cruz Biotechnology sc-22810) and 70 µl Protein G-Dynabeads (Invitrogen). After washing of beads, protein-DNA complexes were eluted, crosslinks were reversed overnight, and DNA was purified according to the manufacturer’s protocol. For sequencing library preparation, 2 ng ChIP DNA and 10 ng Input DNA were subjected to end-repair, addition of adenine bases and ligation of sequencing adapters, followed by DNA amplification through polymerase chain reaction (PCR) and subsequent gel purification for sequencing on an Illumina Genome Analyzer GAII according to the manufacturer’s protocol for 36 bp reads. Reads were aligned to the human genomic sequence (hg18) using Eland, resulting in 6,955,499 GABPa ChIP reads with unique match to the genome (allowing up to two mismatches) and 2,948,346 corresponding reads from the input DNA. A replicate ChIP-seq experiment performed later for validation of the initial experiment resulted in 16,856,422 GABPa ChIP reads and 26,104,399 reads from the input DNA.
Peak Calling, Gene Mapping, MEME, and MAST Analysis
ChIP-Seq reads were analyzed in three steps as published previously (Valouev et al. 2008). To find regions of high sequencing read density (peaks) within 6.96 million reads from GABPa ChIP-Seq, we used the peak-calling algorithm available in QuEST. In total, we found 6,208 genomic peaks for GABPa (fig. 1 and supplementary table S15, Supplementary Material online). From a replicate experiment with 16.86 million ChIP-Seq reads, we identified 8,311 genomic peaks of GABPa, which were used for calculation of the overlap of regions among replicates and for calculation of the IDR as described before (Landt et al. 2012). The 6,208 peaks from the first replicate were mapped to all UCSC known transcripts that start within ±5 kb of the peak (hg19). For annotation we used 82,961 UCSC known transcripts (Hsu et al. 2006) corresponding to 23,460 Entrez genes. UCSC transcript IDs were converted to Entrez gene IDs using UCSCs knownToLocusLink table. 1,116 peaks were mapped to UCSC transcripts that do not correspond to Entrez genes (supplementary table S3, Supplementary Material online). A majority of these correspond to partial novel nucleic acids, CDS, human cDNAs, among others. After extraction of peak-associated sequences comprising 200 bp surrounding each peak center via the UCSC table browser (Karolchik et al. 2004), we applied MEME (Bailey et al. 2009) to identify overrepresented motifs. 97% of the 6,208 peaks contributed to the motif identified by MEME (supplementary table S1, Supplementary Material online). Using default parameters, MEME assumes that each sequence contains zero or one motif. The derived PWM was then used to run the MEME tool MAST that reports the occurrences of PWM hits for each sequence in the input set at a particular stringency (set to P = 0.001) (fig. 1 and supplementary table S2, Supplementary Material online).
Multiple Species Alignment Extraction and Conversion
UCSC provides the 44 way-vertebrate alignments in a multiple alignment format (MAF) that consists of short blocks (1–200 bp) of multiple alignments that can be concatenated. We extracted the alignments corresponding to GABPa binding sites within the ChIP peak regions via UCSCs table browser (Karolchik et al. 2004) and converted the MAF-formatted alignments into the commonly used FASTA format, while excluding nonsyntenic blocks and species with missing sequence data (e.g., insertions not included in the MAF alignments; supplementary table S4, Supplementary Material online).
Ancestral Sequence Reconstruction
Ancestral sequences were calculated using ANCESTORS (Blanchette et al. 2004) obtained from http://ancestor.bioinfo.uqam.ca/programs/anc.tar. The program requires a multiple species alignment and a phylogenetic tree including branch lengths. Therefore, we reconstructed the ancestral sequences along with the phylogeny of 34 mammalian species from the UCSC 44-vertebrate alignments (fig. 1) (Diallo et al. 2007). The approach implemented in ANCESTORS is suitable for reconstructing ancestral sequences, including the most likely scenario of insertions and deletions observed in alignments while retaining an extremely high degree of accuracy (Diallo et al. 2007). To calculate branch lengths, all alignments were concatenated and run through BASEML, a maximum likelihood-based program of the PAML package (Yang 2007). The nucleotide substitution model HKY was used in both programs. Phylogeny was taken from UCSC (phyloP44wayPlacMammal) available at http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons44way/vertebrate.mod.
Cloning and Plasmid Preparation
PCR primers were designed using the Primer3 online service and extended by 29 bp Gateway attB tails (Invitrogen) at the 5′-end of each primer. Touch-down PCR was performed as described previously (Ralser et al. 2006), except for the supplementation of each reaction with 0.001U Pfu polymerase. Mutations were introduced by primer-mediated mutagenesis. To facilitate cloning, the Gateway cloning cassette (Invitrogen) was amplified with the forward primer attP1 and the reverse primer attP2 and cloned into the pGL3 reporter vector (Promega). PCR products were purified and cloned upstream of the luciferase gene in the modified pGL3 vector using BP Clonase II Enzyme Mix (Invitrogen) following the manufacturer’s instructions. Plasmids were transformed into the methylation-deficient Dam–
Escherichia coli strain GM2929. Inserts of positive clones were validated using capillary Sanger sequencing (Services in Molecular Biology, Berlin, Germany). DNA concentration was measured on a Nanodrop UV spectrophotometer (NanoDrop Technologies) and standardized to 50 ng/µl for transfections.
Cell Culture, Transient Transfection, and Reporter Gene Activity Assays
HEK293T and COS-1 cells were cultivated in Dulbecco’s modified Eagle’s medium (DMEM, Gibco) supplemented with 100 U/ml penicillin/G-streptomycin (Biochrom) and 10% heat-inactivated fetal bovine serum (Biochrom) at 37 °C and 5% CO2. We seeded approximately 15,000 (HEK293T) and approximately 5,000 (COS-1) cells per well in clear-bottom 96-well plates (Costar). Twenty-four hours after seeding, we cotransfected 150 ng of experimental firefly luciferase plasmid together with 10 ng of Renilla luciferase control plasmid (pRL-TK, Promega) in five replicates using Lipofectamine 2000 following the manufacturer’s recommendations. Cells were lyzed 24 h posttransfection. We measured firefly luciferase and Renilla luciferase activities using the Centro LB960 luminometer (Berthold) and the Dual Luciferase Kit (Promega). We followed the protocol suggested by the manufacturer with the exception of injecting 25 µl each of the firefly luciferase and Renilla luciferase substrate reagents. All measurements were performed at least in three technical and two biological replicates, including new dilution and concentration adjustments of reporter plasmids.
Inhibition of GABPa Expression by RNA Interference
RNA interference experiments for GABPa were performed using two independent types of silencing molecules against GABPa, namely one unmodified synthetic small interfering RNA (Qiagen SI00423311) and one chemically modified synthetic small interfering RNA (Invitrogen HSS103907). HEK293T cells were seeded in 12-well plates together with siRNA-HiPerFect complexes according to the HiPerFect fast forward protocol (Qiagen). For the mock transfections, cells were treated with HiPerFect reagent only. Knock-down transfections were performed in triplicates; mock transfections were performed in quadruplicates. Total RNA was extracted from cultured cells at 24 and 72 h posttransfection using the RNeasy mini kit (Qiagen) following the manufacturer’s instructions. All RNA samples were DNase-treated, purified, quantified, and inspected for integrity. For hybridizations on microarrays, biotinylated cRNA was synthesized using the GeneChip expression 3′-amplification one-cycle target labeling and control reagents (Affymetrix). Following integrity control, the cRNA was hybridized to the Affymetrix GeneChip HG-U133Plus2. The arrays were washed, stained, and scanned following the recommended protocols from Affymetrix.Gene expression was analyzed using the “affy” package from Bioconductor (Gautier et al. 2004). In particular, we determined Robust Multi-Array Robust Multi-array Average (RMA) normalized expression values for each probeset (Bolstad et al. 2003), filtered for probe sets that were not reliably detected in any sample (detection P value >0.05), and combined rma values of probesets referring to the same gene into one mean expression value as described before (Nowick et al. 2009). We determined genes that were DE between knock-down and mock transfected samples for each time point using the package “multtest” (Pollard et al. 2005) implemented in Bioconductor. We then overlapped the DE genes of each time point with the candidate genes from our ChIP-Seq experiments, obtaining 392 and 1,215 genes overlapping for 24 and 72 h experiments, respectively. Permutation tests indicated that the observed overlap is higher than expected by chance for the 72 h time point (P value = 0.001), conversely to the results obtained for 24 h treatment (P value = 0.137). This suggests that for the 72 h treatment we obtained more genes than we would expect by chance. Consequently, we chose the data of the 72 h treatment for further analyses.
GO Enrichment Analyses
In total, we performed three GO analyses using the hyper geometric test implemented in FUNC (Prüfer et al. 2007). In the first analysis, we tested for enrichment of GO groups in the 217 genes located ±5 kb around the human-specific GABPaTFBS compared to the genes with at least one ChIP-Seq peak in the promoter region (± 5) (supplementary tables S6 and S11, Supplementary Material online). In the second analysis, we assessed enrichment among the set of genes that were DE after GABPa knock-down with both siRNA molecules (Qiagen: SI00423311 and Invitrogen: HSS103907) and had at least one GABPa binding site in our ChIP-Seq data compared with the set of DE genes not having ChIP-Seq peaks in the promoter region (supplementary tables S11 and S12, Supplementary Material online). In the third test, we analyzed whether DE genes after GABPa knock-down and without a GABPa binding site in our ChIP-Seq data were enriched for particular GO groups (supplementary tables S11 and S13, Supplementary Material online). We followed the enrichment tests by refinement of the enriched GO groups as implemented in FUNC (Prüfer et al. 2007) applying P value cutoffs after refinement of P < 0.05 for the first and third tests test, and P < 0.01 for the second one.
Supplementary Material
Supplementary figures S1–S5 and tables S1–S21 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Authors: Martin S Taylor; Tim Massingham; Yoshihide Hayashizaki; Piero Carninci; Nick Goldman; Colin A M Semple Journal: Nat Genet Date: 2008-11 Impact factor: 38.330
Authors: Marc Sultan; Marcel H Schulz; Hugues Richard; Alon Magen; Andreas Klingenhoff; Matthias Scherf; Martin Seifert; Tatjana Borodina; Aleksey Soldatov; Dmitri Parkhomchuk; Dominic Schmidt; Sean O'Keeffe; Stefan Haas; Martin Vingron; Hans Lehrach; Marie-Laure Yaspo Journal: Science Date: 2008-07-03 Impact factor: 47.728
Authors: Jane M Lin; Patrick J Collins; Nathan D Trinklein; Yutao Fu; Hualin Xi; Richard M Myers; Zhiping Weng Journal: Genome Res Date: 2007-06 Impact factor: 9.043
Authors: A M Wägner; P Cloos; R Bergholdt; P Boissy; T L Andersen; D B Henriksen; C Christiansen; S Christgau; F Pociot; J Nerup Journal: Diabetologia Date: 2007-01-10 Impact factor: 10.122
Authors: P Marchetti; M Bugliani; R Lupi; L Marselli; M Masini; U Boggi; F Filipponi; G C Weir; D L Eizirik; M Cnop Journal: Diabetologia Date: 2007-09-30 Impact factor: 10.122
Authors: Els C M Brinkman-Van der Linden; Nancy Hurtado-Ziola; Toshiyuki Hayakawa; Lisa Wiggleton; Kurt Benirschke; Ajit Varki; Nissi Varki Journal: Glycobiology Date: 2007-06-18 Impact factor: 4.313
Authors: Ryan J Taft; Adeline Vanderver; Richard J Leventer; Stephen A Damiani; Cas Simons; Sean M Grimmond; David Miller; Johanna Schmidt; Paul J Lockhart; Kate Pope; Kelin Ru; Joanna Crawford; Tena Rosser; Irenaeus F M de Coo; Monica Juneja; Ishwar C Verma; Prab Prabhakar; Susan Blaser; Julian Raiman; Petra J W Pouwels; Marianna R Bevova; Truus E M Abbink; Marjo S van der Knaap; Nicole I Wolf Journal: Am J Hum Genet Date: 2013-05-02 Impact factor: 11.025
Authors: Hansell H Stedman; Benjamin W Kozyak; Anthony Nelson; Danielle M Thesier; Leonard T Su; David W Low; Charles R Bridges; Joseph B Shrager; Nancy Minugh-Purvis; Marilyn A Mitchell Journal: Nature Date: 2004-03-25 Impact factor: 49.962
Authors: Veronica M Pravata; Villo Muha; Mehmet Gundogdu; Andrew T Ferenbach; Poonam S Kakade; Vasudha Vandadi; Ariane C Wilmes; Vladimir S Borodkin; Shelagh Joss; Marios P Stavridis; Daan M F van Aalten Journal: Proc Natl Acad Sci U S A Date: 2019-07-11 Impact factor: 11.205
Authors: Vladimir M Jovanovic; Melanie Sarfert; Carlos S Reyna-Blanco; Henrike Indrischek; Dulce I Valdivia; Ekaterina Shelest; Katja Nowick Journal: Front Genet Date: 2021-05-17 Impact factor: 4.599