Identifying synonymous regulatory elements in vertebrate genomes.

Ivan Ovcharenko, Marcelo A Nobrega.

Abstract

Synonymous gene regulation, defined by regulatory elements driving shared temporal and/or spatial aspects of gene expression, is most probably predicated on genomic elements that contain similar modules of certain transcription factor binding sites (TFBS). We have developed a method to scan vertebrate genomes for evolutionarily conserved modules of TFBS in a predefined configuration, and created a tool, named SynoR, that identifies synonymous regulatory elements (SREs) in vertebrate genomes. SynoR performs de novo identification of SREs utilizing known patterns of TFBS in active regulatory elements (REs) as seeds for genome scans. Layers of multiple-species conservation allow the use of differential phylogenetic sequence conservation filters in the search for SREs, and the results are displayed so as to provide an extensive annotation of the genes containing the detected REs. Gene Ontology categories are utilized to further functionally classify the identified genes, and integrated GNF Expression Atlas 2 data allow the cataloging of tissue-specificities of the predicted SREs. SynoR is publicly available at http://synor.dcode.org.

Year:  2005        PMID: 15980499      PMCID: PMC1160227          DOI: 10.1093/nar/gki466

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The complex patterns of gene expression in vertebrates arise from the combinatorial interaction of multiple transcription factors with target cis-regulatory units consisting of modules of transcription factor binding sites (TFBS). Although in simpler organisms, such as yeast, bacteria and viruses, REs are usually associated with the promoters of their target genes, in more complex organisms, especially vertebrates, REs modulate promoter activity and are often positioned remotely from the genes they regulate, sometimes as far away as a megabase from the transcriptional start site of a gene (1). Therefore, the general architectural features of complex gene regulatory networks, consisting of multiple REs distributed over long distances upstream and downstream of a gene, make their identification challenging. Comparative genomics was shown to be a powerful tool in facilitating the genomic search for REs (2), but despite the progress in identifying REs, the location of the majority of vertebrate REs remains unknown, owing partly to our limited understanding of the fundamental components of REs and of whether their organizational rules (if any) can be used as signatures for genome-wide identification. Toward this end, it has recently been shown that, in invertebrates, searching for TFBS clustered in a defined configuration allows the identification of REs with a predefined function (3,4). Several tools have been created to identify specific modules of TFBS in promoters of co-expressed genes, including Toucan (5) and Crème 2.0 (6). Recently, those observations were expanded to genome scans of vertebrates using defined TFBS motifs as seeds (7,8). Our goal was to expand on these observations and to develop and test a new strategy to carry out genome-wide scans for REs using evolutionarily conserved TFBS (cTFBS) motifs.
Known TFBS structures of REs, defined as a cluster of TFBS and their defined spatial order and distribution, were used as seeds to search for novel REs that can dramatically differ from the original REs at the sequence level, but are synonymous in function: synonymous REs (SREs). We created a publicly available tool, SynoR (http://synor.dcode.org), which provides users with the ability to extend the knowledge derived from a single gene's regulatory genomic structure to the whole genome and identify novel genes with synonymous regulation.

METHODS

Genome-wide annotation of cTFBS

The ECR Browser tool generates whole-genome blastz-based alignments of vertebrate and invertebrate genomes (9). To generate a dataset of cTFBS for SynoR scans, we have established an automated annotation of cTFBS based on the ECR Browser alignments. This was created by using the rVista 2.0 tool (10) with ‘optimized-for-function’ position weight matrix thresholds (11). The automated ECR Browser/rVista 2.0 annotation gradually expands the list of available genome alignments with cTFBS for the subsequent SynoR processing. Table 1 summarizes the number of cTFBS in the human genome as compared with different species.
Table 1

cTFBS in alignments of the human genome (hg17) to the mouse (mm5), chicken (galGal2), frog (xenTro1) and fugu (fr1) genomes (assembly indexes from the UCSC genome browser)

Organism        Mouse        Chicken     Frog      Fugu
No. of cTFBS    13 069 048   1 945 164   859 769   402 784

Defining TFBS modules as seeds for the genome scans

Transcription factor (TF) molecules interact with each other and bind to DNA to establish a gene transcription signal. The number of different TFs in a module, the number of TFBS, the spatial constraints, the order of TFBS and the relative strands of TFBS differ for different regulatory pathways. SynoR requires user input describing a TFBS module structure to initiate a genome scan. In practical terms, three tiers of information on TFBS modules might be available: (i) a list of TFs known to participate in a particular regulatory pathway, (ii) a set of spatial constraints separating different TFBS and (iii) the order and orientation of individual TFBS in a module. While the TF content is essential for the genome scans, the other two tiers of information effectively refine the module signature and are provided as optional features. SynoR restricts the selection of TFBS to those available from the TRANSFAC Professional database (12), which is utilized by the rVista 2.0 tool in genome scans.
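
The three tiers of seed information can be illustrated with a small sketch. It is not SynoR's implementation: the `Site` and `SeedModule` records, the `find_modules` function and the inclusive `max_gap` convention are hypothetical names and choices made for illustration, and orientation constraints from tier (iii) are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Site:
    """One conserved TFBS hit (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    tf: str


@dataclass(frozen=True)
class SeedModule:
    """User-supplied seed: tier (i) is required, tiers (ii) and (iii) are optional."""
    tfs: Tuple[str, ...]            # (i) TF content of the module
    max_gap: Optional[int] = None   # (ii) largest allowed gap between adjacent sites, in bp
    ordered: bool = False           # (iii) require sites in the order listed in `tfs`


def find_modules(sites: List[Site], seed: SeedModule) -> List[Tuple[Site, ...]]:
    """Return every chain of non-overlapping sites that satisfies the seed."""
    by_pos = sorted(sites, key=lambda s: (s.chrom, s.start))
    hits: List[Tuple[Site, ...]] = []

    def extend(chain: List[Site], remaining: List[str], next_idx: int) -> None:
        if not remaining:
            hits.append(tuple(chain))
            return
        last = chain[-1]
        for k in range(next_idx, len(by_pos)):
            cand = by_pos[k]
            if cand.chrom != last.chrom:
                break  # sites are sorted, so the rest lie on other chromosomes
            gap = cand.start - last.end
            if seed.max_gap is not None and gap > seed.max_gap:
                break  # later candidates start even farther away
            if gap < 0:
                continue  # skip sites overlapping the previous one
            wanted = remaining[:1] if seed.ordered else remaining
            if cand.tf in wanted:
                rest = list(remaining)
                rest.remove(cand.tf)
                extend(chain + [cand], rest, k + 1)

    for i, first in enumerate(by_pos):
        wanted = seed.tfs[:1] if seed.ordered else seed.tfs
        if first.tf in wanted:
            rest = list(seed.tfs)
            rest.remove(first.tf)
            extend([first], rest, i + 1)
    return hits
```

With `SeedModule(tfs=("SRF", "SP1"), max_gap=39)` the function reports SRF and SP1 sites that lie fewer than 40 bp apart in either order; setting `ordered=True` restricts hits to the listed order.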

Identification of statistically enriched Gene Ontology (GO) categories

To predict the putative biological function of the identified elements, SynoR performs a GO category enrichment analysis for the genes that either flank or contain the non-coding subset of these elements. The tool uses the binomial distribution as an approximation of the hypergeometric distribution for this analysis (applicable because the number of genes in each GO category is much smaller than the number of genes in the genome). The statistical analysis reports GO categories that contain significantly more genes than would be expected by chance, given the number of identified genes with GO annotation and the distribution of all the genes in the genome among different GO categories. At the final step of the analysis, SynoR applies Holm's sequential Bonferroni correction (13) to correct for multiple testing. Significantly enriched GO categories (P-value < 0.05) are reported to the user.
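
A minimal sketch of this enrichment test, assuming a one-sided binomial tail per category followed by Holm's step-down adjustment, is given here. The function names and the dictionary-based inputs are illustrative assumptions and do not describe SynoR's internal code.

```python
from math import comb
from typing import Dict, List, Tuple


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def holm_adjust(pvalues: List[float]) -> List[float]:
    """Holm's sequential Bonferroni adjustment, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def enriched_categories(
    hits_per_category: Dict[str, int],
    genome_per_category: Dict[str, int],
    n_hits: int,
    n_genome: int,
    alpha: float = 0.05,
) -> List[Tuple[str, float]]:
    """Return (category, adjusted P) pairs with adjusted P below alpha."""
    names = list(hits_per_category)
    raw = [
        binomial_tail(hits_per_category[c], n_hits, genome_per_category[c] / n_genome)
        for c in names
    ]
    adjusted = holm_adjust(raw)
    return [(c, p) for c, p in zip(names, adjusted) if p < alpha]
```

Here `n_hits` stands for the number of identified genes with GO annotation and `n_genome` for the number of annotated genes in the genome, so the per-category success probability is the fraction of genome genes that belong to that category.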

Establishing tissue-specificities of identified genes

To predict tissue specificity (if any) of the identified genes with non-coding elements, SynoR analyzes the GNF Expression Atlas 2 data (14). It performs a two-step clustering analysis of tissue specificity in expression of the genes. First, the data are clustered into groups of co-expressed genes using the Cluster 3.0 tool (15) and the results are visualized in a microarray expression profile style plot (Figure 2). In the second step, SynoR identifies the set of tissues in which the genes are either significantly overexpressed or suppressed. To do so, the tool calculates, for each tissue j, the difference $\delta_j$ between the number of overexpressed and the number of suppressed genes. An estimate of the average difference $\langle\delta\rangle$ and the corresponding standard deviation $\sigma_\delta$ are calculated from the distribution of $\delta_j$ across all the tissues. This allows defining a z-value describing the deviation of the observed difference in the number of overexpressed and suppressed genes from the expectation for a given tissue:

$$z_j = \frac{\delta_j - \langle\delta\rangle}{\sigma_\delta}$$

The expression in tissues with an absolute z-value >2.0 is reported as significantly increased/decreased, and in tissues with an absolute z-value >1.0 as changed.
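
The second step can be sketched as follows, taking per-tissue counts of overexpressed and suppressed genes as input. This is an illustrative sketch only: the text does not say whether a population or sample standard deviation is used (the population form is assumed here), and the function names and the "unchanged" label are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict


def tissue_z_values(over: Dict[str, int], under: Dict[str, int]) -> Dict[str, float]:
    """z-value per tissue from counts of overexpressed and suppressed genes."""
    delta = {tissue: over[tissue] - under[tissue] for tissue in over}
    values = list(delta.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return {tissue: 0.0 for tissue in delta}
    return {tissue: (d - mu) / sigma for tissue, d in delta.items()}


def classify(z: float) -> str:
    """Apply the reporting thresholds of 2.0 and 1.0 on the absolute z-value."""
    if abs(z) > 2.0:
        return "significantly increased" if z > 0 else "significantly decreased"
    if abs(z) > 1.0:
        return "changed"
    return "unchanged"  # label chosen for this sketch
```

A tissue in which far more genes are overexpressed than suppressed, relative to the other tissues, receives a large positive z-value and is reported as significantly increased.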
Figure 2

GNF Expression Atlas 2 analysis for genes identified in the SRF/SP1 SynoR scan of the human genome. A subset of 46 genes including the SRF gene is presented. Cardiac myocytes, which show significant overexpression, are identified by a solid red background. Light red and light green backgrounds correspond to the overexpressed and suppressed tissue categories. Different columns correspond to different tissues listed on top and different rows correspond to the identified genes listed on the right. For intergenic elements, the number in parentheses following the gene name gives the distance between the element and the gene.

In the search for tissue-specificities, performed by the Cluster 3.0 tool, SynoR eliminates absolute differences in expression between different genes from the analysis. To do so, the expression pattern of each gene across different tissues is normalized by dividing the expression score in a particular tissue by the highest expression score of that gene across all the tissues. This brings highly expressed genes and genes with a low level of expression to the same scale and highlights the differences in gene expression across different tissues. In addition, GNF Expression Atlas 2 expression patterns in cancer cell lines and cell lines without profound tissue-specificity are excluded from the analysis to provide sampling of co-expression in normal tissues, thus providing a link between a predicted SRE and normal tissue specificity.
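
The per-gene normalization described above amounts to dividing each expression score by that gene's maximum across tissues. A minimal sketch follows; the function name and the handling of all-zero profiles are assumptions for illustration.

```python
from typing import Dict


def normalize_by_max(profile: Dict[str, float]) -> Dict[str, float]:
    """Scale one gene's expression profile so that its highest tissue equals 1."""
    peak = max(profile.values())
    if peak <= 0:
        # Assumption: a gene with no positive signal is mapped to all zeros.
        return {tissue: 0.0 for tissue in profile}
    return {tissue: score / peak for tissue, score in profile.items()}


# Two genes with the same tissue pattern but very different absolute levels
# end up with identical normalized profiles.
high = normalize_by_max({"heart": 800.0, "liver": 200.0})
low = normalize_by_max({"heart": 8.0, "liver": 2.0})
```

After this step only the shape of each gene's profile across tissues contributes to clustering, which is the stated purpose of the normalization.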

RESULTS

Design and features of the SynoR tool

SynoR utilizes pre-computed annotations of cTFBS in vertebrate genomes (as obtained through multi-species genome alignments) adopted from the ECR Browser (9). It scans the genome distribution of cTFBS in search of modules of TFBS in defined spatial configurations that match the seed profile defined by the user (Figure 1). SynoR overlaps the identified TFBS modules with gene annotation [‘UCSC known genes’ (16)] to categorize them into promoter elements, UTRs, introns, intergenic elements and coding exons. The ratio of newly identified elements overlapping with coding exons is expected to be small in comparisons of evolutionarily distant species (such as humans and fish or humans and chickens), serving as an immediate quantifier of the specificity of predictions. The online results page also includes the multi-species conservation analysis of all the identified modules. Genes bracketing the identified noncoding elements or including them are selected for a further two-step analysis of shared activity. First, the Gene Ontology (GO) (17) categories for each gene, reflecting their biological function, are defined. Enrichment in GO categories that match the known functional activity of the seed RE allows us to evaluate the sensitivity of genome scans. Next, the analysis using the GNF Expression Atlas 2 (14) is performed to define the tissue specificity of the genes. Comparative analysis of the expression of identified genes versus the average expression of all the genes in the genome highlights a subset of tissues in which the genes bracketing the identified SREs are preferentially expressed.
Figure 1

The schematic profile of SynoR genome scans and data analysis.

A priori knowledge of TFs or TFBS modules involved in a particular biochemical process [such as neuronal development (18), heart formation (19) and muscle development (20)] is helpful in determining the seed signatures used in the SynoR scans. Studies that generate additional positional sequence information for active, multiple TFBS (10,21) can be effectively used to establish the configuration of spatial constraints and TFBS ordering, thus increasing the specificity of a SynoR search.

SREs associated with synergistic activation of gene expression in cardiac myocytes

The GNF Expression Atlas 2 summarizes the expression patterns of human, mouse and rat genes in several selected tissues using whole-genome microarray experiments (14). These data provide evidence of tissue specificity of genes bracketing predicted SREs. If a particular SRE is associated with a gene expressed in a set of defined tissues, for example, these tissues should also correspond to the expression pattern of the candidate genes sharing the SRE motif identified by SynoR. To assess the applicability of SynoR's tissue specificity analysis of predicted SREs, we scanned the human genome for combinatorial modules of two cTFBS, SRF (serum response factor) and SP1. Multiple lines of evidence support the notion that these TFs cooperatively participate in orchestrating gene expression in the heart and the vascular tissues (22,23). We applied SynoR to predict targets of synergistic SRF/SP1 gene regulation in the human genome, using as a seed motif the presence of these TFBS separated by <40 bp. A human/mouse conservation threshold was utilized. A total of 114 non-coding modules were identified in this scan, 23 (20%) of which overlapped with promoter regions. Taking into account the density of human/mouse ECRs in the human genome (24), the probability of observing this high a ratio of promoter-associated elements by chance is $<10^{-5}$, suggesting an enrichment in functional SREs identified in this scan. Expression analysis of the genes that either contain or flank the identified SREs revealed a very distinct tissue specificity of these genes. Sixty-four percent of them (88 out of 138) are specifically expressed in cardiac myocytes, while others are expressed in smooth muscle, heart and other tissues (Figure 2).
This general observation is in agreement with the experimental data on expression of the studied TFs, supporting the notion that GNF Expression Atlas 2 data integrated with SynoR predictions may provide an effective and straightforward annotation of tissue specificity for the identified elements and the search patterns. Together, these data support the idea that using this pair of cTFBS as a seed for a genome-wide scan successfully identifies SREs, which are most probably responsible for the shared pattern of expression of their corresponding genes. Further studies are required to assess the in vivo functional activity of these elements and to investigate their possible roles in cardiovascular diseases.

Features categorization

To illustrate the application of SynoR's categorization of identified elements based on gene annotation, we scanned the human genome for a module of three NRSF human/mouse cTFBS. NRSF (neuron-restrictive silencer factor) plays a key role in neuronal differentiation (25) and in mediating transcriptional repression of neuron-specific genes in non-neuronal cells (26). Ten noncoding modules were identified, of which three were within promoters, four in introns, and four in intergenic intervals. One of the three promoters corresponds to that of Barhl1, a gene associated with neuronal migration (27), with an expression pattern resembling that of the NRSF regulatory pathway. The remaining two promoters identified in this scan correspond to uncharacterized genes, and these results thus raise the possibility that these genes represent new members of the NRSF pathway.

DISCUSSION

The identification of non-coding sequences conserved among vertebrates has served as the most important pillar leading to the identification of functional gene REs in the human genome (1,2). Here we introduce a new tool based on cTFBS, named SynoR, that performs genome scans for REs with shared biological activity. The fundamental inference behind the conceptualization of SynoR is that regulatory elements with similar function (SREs) operate under similar organizational principles, namely the modular distribution of a defined set of TFBS. This principle has been previously validated in several eukaryotes including yeast, worm and flies, and recent evidence suggests that SREs can also be identified in humans (7). Our results support this notion, and SynoR was created as a publicly available tool for the search of SREs, allowing for a broad range of user-defined options guiding the search of SREs in multiple regulatory pathways. SynoR is endowed with multiple checkpoint mechanisms to define the precise functional annotation of the identified elements, which include multispecies evolutionary conservation analysis, GO functional characterization and GNF Expression Atlas 2 analysis of tissue specificity of the genes bracketing the identified SREs. The results provided by SynoR allow for the immediate comparison between the functions of the genes in the vicinity of the identified SREs, supporting the reliability of SynoR genome scans, while presenting the user with a categorized set of identified elements with distinct functions and evolutionary traits. In summary, we present a strategy to identify SREs in eukaryotic genomes, and describe the design of a new tool, SynoR, aiding in the identification of noncoding sequences that are most likely to correspond to regulatory elements, which can be tested in the laboratory.
  26 in total

1.  TRANSFAC: an integrated system for gene expression regulation.

Authors:  E Wingender; X Chen; R Hehl; H Karas; I Liebich; V Matys; T Meinhardt; M Prüss; I Reuter; F Schacherer
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Scanning human gene deserts for long-range enhancers.

Authors:  Marcelo A Nobrega; Ivan Ovcharenko; Veena Afzal; Edward M Rubin
Journal:  Science       Date:  2003-10-17       Impact factor: 47.728

3.  Combinatorial code of growth factors and neuropeptides define neuroendocrine differentiation in PC12 cells.

Authors:  Delphine Beaujean; Claudia Rosenbaum; Hans-Werner Müller; Jacques J Willemsen; Jacques Lenders; Stefan R Bornstein
Journal:  Exp Neurol       Date:  2003-11       Impact factor: 5.330

4.  The UCSC Genome Browser Database.

Authors:  D Karolchik; R Baertsch; M Diekhans; T S Furey; A Hinrichs; Y T Lu; K M Roskin; M Schwartz; C W Sugnet; D J Thomas; R J Weber; D Haussler; W J Kent
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

5.  Toucan: deciphering the cis-regulatory logic of coregulated genes.

Authors:  Stein Aerts; Gert Thijs; Bert Coessens; Mik Staes; Yves Moreau; Bart De Moor
Journal:  Nucleic Acids Res       Date:  2003-03-15       Impact factor: 16.971

6.  A murine model of Holt-Oram syndrome defines roles of the T-box transcription factor Tbx5 in cardiogenesis and disease.

Authors:  B G Bruneau; G Nemer; J P Schmitt; F Charron; L Robitaille; S Caron; D A Conner; M Gessler; M Nemer; C E Seidman; J G Seidman
Journal:  Cell       Date:  2001-09-21       Impact factor: 41.582

7.  Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons.

Authors:  G G Loots; R M Locksley; C M Blankespoor; Z E Wang; W Miller; E M Rubin; K A Frazer
Journal:  Science       Date:  2000-04-07       Impact factor: 47.728

8.  Cardiac-specific activity of an Nkx2-5 enhancer requires an evolutionarily conserved Smad binding site.

Authors:  Ching-Ling Lien; John McAnally; James A Richardson; Eric N Olson
Journal:  Dev Biol       Date:  2002-04-15       Impact factor: 3.582

9.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

10.  Barhl1, a gene belonging to a new subfamily of mammalian homeobox genes, is expressed in migrating neurons of the CNS.

Authors:  A Bulfone; E Menguzzato; V Broccoli; A Marchitiello; C Gattuso; M Mariani; G G Consalez; S Martinez; A Ballabio; S Banfi
Journal:  Hum Mol Genet       Date:  2000-05-22       Impact factor: 6.150

