Uncovering the functional constraints underlying the genomic organization of the odorant-binding protein genes.

Pablo Librado, Julio Rozas.

Abstract

Animal olfactory systems have a critical role for the survival and reproduction of individuals. In insects, the odorant-binding proteins (OBPs) are encoded by a moderately sized gene family, and mediate the first steps of the olfactory processing. Most OBPs are organized in clusters of a few paralogs, which are conserved over time. Currently, the biological mechanism explaining the close physical proximity among OBPs is not yet established. Here, we conducted a comprehensive study aiming to gain insights into the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large conserved arrangements. These organizations also include other non-OBP genes, which often encode proteins integral to plasma membrane. Moreover, the conservation degree of such large clusters is related to the following: 1) the promoter architecture of the confined genes, 2) a characteristic transcriptional environment, and 3) the chromatin conformation of the chromosomal region. Our results suggest that chromatin domains may restrict the location of OBP genes to regions having the appropriate transcriptional environment, leading to the OBP cluster structure. However, the appropriate transcriptional environment for OBP and the other neighbor genes is not dominated by reduced levels of expression noise. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.

Keywords:  chemosensory system; chromatin domain; expression noise; gene cluster constraint; olfactory reception

Year:  2013        PMID: 24148943      PMCID: PMC3845639          DOI: 10.1093/gbe/evt158

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Animal olfactory systems allow for the detection of food, predators, and mates, and thus play a critical role in the survival and reproduction of individuals (Krieger and Ross 2002; Matsuo et al. 2007). In Drosophila, the early steps of odor processing occur in chemosensory hairs (i.e., the sensilla), which are located in the third antennal segment and the maxillary palp. The main biochemical events include the uptake of volatile molecules through the cuticle pores, transport across the sensilla lymph, and interaction with olfactory receptors. The latter steps are mediated by the odorant-binding proteins (OBPs), which may have an active role in olfactory coding such as contributing to odor discrimination (Swarup et al. 2011) and receptor activation (Laughlin et al. 2008; Biessmann et al. 2010). OBPs are small (10–30 kDa; 130–220 aa long), highly abundant, globular, and water-soluble proteins (Kruse et al. 2003; Tegoni et al. 2004).

These molecules are encoded by a moderately sized multigene family (in the 12 Drosophila species, the number of OBP members ranges from 41 to 62), with an evolution that is consistent with the birth-and-death model (Vieira et al. 2007). In arthropods, most OBP genes are organized in clusters of a few paralogs (Hekmat-Scafe et al. 2002; Foret and Maleszka 2006), an arrangement that is moreover conserved over time (Vieira and Rozas 2011). Nevertheless, it is not well established whether the conservation of these OBP clusters represents the outcome of an uneven distribution of chromosomal rearrangement breakpoints, or whether the clusters are instead constrained by natural selection for some functional reason (Zhou et al. 2009; Sanchez-Gracia and Rozas 2011; Vieira and Rozas 2011). For example, functionally linked genes, such as those encoding subunits of the same complex (Chamaon et al. 2002), proteins of the same pathway (Lee and Sonnhammer 2003), or genes with expression patterns restricted to the head, embryo, or testes (Boutanaev et al. 2002) are often clustered in the Drosophila melanogaster genome. As clusters of functionally linked genes may include nonhomologous members, the OBP gene organization may be preserved by functional constraints imposed from neighboring genes.

The presence of shared cis-regulatory elements, such as bidirectional promoters or pleiotropic enhancers, may explain the OBP gene organization (Li et al. 2006; Yang and Yu 2009). For example, central regions of some Drosophila syntenic clusters are enriched for highly conserved noncoding elements that regulate the transcription of genes with the appropriate composition of core promoter elements (CPEs) (Engstrom et al. 2007). Notably, the CPE composition and expression pattern are two features characterizing the broad and peaked promoter architectures (a classification based on the distribution of transcription start sites) (Hoskins et al. 2011). Although genes with peaked promoters are often expressed in specific tissues or developmental stages, those with broad promoters usually have constitutive transcription (Kharchenko et al. 2011; Rach et al. 2011). Therefore, shared cis-regulatory elements may differentially restrict the movement of genes with particular promoter architectures or transcriptional patterns.

Chromatin conformation (Filion et al. 2010; Kharchenko et al. 2011) could also affect gene organization given its role in the regulation of gene expression (i.e., the so-called position effect). For example, human unfolded chromatin (30-nm chromatin fibers) encompasses high-density gene regions (Gilbert et al. 2004), which usually exhibit elevated expression breadth (EB) (Caron et al. 2001; Lercher et al. 2002). Interestingly, transcriptional activation after chromatin unfolding induces stochastic fluctuations in transcript abundance (i.e., expression noise [EN]) (Becskei et al. 2005). Such EN is often deleterious, particularly for broadly expressed genes, because it yields imbalances in the stoichiometry of proteins (Fraser et al. 2004). These features led Batada and Hurst (2007) to hypothesize that broadly expressed genes are clustered in regions of constitutively unfolded chromatin to minimize EN. Several lines of thought support this model. For example, as head-to-head gene pairs share their promoter regions, a chromatin unfolding event can facilitate the transcriptional activation of both genes. Therefore, chromatin unfolding events will be less frequent in head-to-head than in other gene pair arrangements, leading to reduced levels of EN (Wang et al. 2010). Because EN is often deleterious, natural selection may favor the maintenance of the head-to-head gene pair organization in clusters.

Chromosomal proteins determining the chromatin state, such as nuclear membrane (Capelson et al. 2010; Vaquerizas et al. 2010), insulator (Maeda and Karch 2007; Wallace et al. 2009; Negre et al. 2010), and chromatin remodeling (Kalmykova et al. 2005; Li and Reinberg 2011) proteins, may therefore play a relevant role in maintaining gene clusters. In this regard, the function of the JIL-1 protein kinase deserves special attention for its role in defining the decondensed interbands of polytene chromosomes, which characterize active and unfolded chromatin (Jin et al. 1999; Regnard et al. 2011; Kellner et al. 2012). Moreover, JIL-1 kinase, which phosphorylates Serine 10 and 28 of Histone 3, physically interacts with the lamin Dm0 (a structural nuclear membrane protein) (Bao et al. 2005) and Chromator (localized in the spindle matrix of the nucleoskeleton) (Gan et al. 2011) proteins. Recently, the lamin Dm0 protein has been shown to colocalize with conserved microsynteny in Drosophila (Ranz et al. 2011), whereas Chromator changes the chromatin folding state (Rath et al. 2006). Therefore, high-order regulatory mechanisms involving chromatin conformation may underlie the conservation of some gene clusters.

Here, we analyzed the mechanisms underlying the OBP genomic organization. We found that the OBP clusters are embedded within large arrangements, which also include other non-OBP genes. The conservation degree of such large arrangements is moreover related to a number of functional and expression features, such as a transcriptional environment not dominated by reduced levels of EN. Indeed, the stochastic fluctuations in the OBP transcript abundance may have a critical role in the combinatorial nature of the olfactory coding process.

Materials and Methods

DNA Sequence Data and Assignment of Orthologous Groups

We downloaded the D. melanogaster gene and protein sequences and their orthologous relationships (release fb_2011_04) with the additional 11 Drosophila species (Drosophila 12 Genomes 2007) from FlyBase (release 5.40). The orthology data set contains predicted and curated pairwise relationships between the Drosophila species (i.e., one-to-one, one-to-many, and many-to-many relationships). We clustered these ortholog pairs into groups with multiple species using the Markov Clustering Algorithm software with default parameters (inflation = 2 and scheme = 7).

Gene Clustering

We define a conserved cluster as a group of neighbor genes maintained over time; this definition allowed us to study clusters of linked genes, regardless of whether they are homologous. To infer such conserved gene clusters, we used the MCMuSeC software (Ling et al. 2009), which allows clusters to undergo internal rearrangement events (Luc et al. 2003), as well as tandem gene duplications (recent duplicates originated from members of the same cluster). For each inferred cluster, we measured the conservation level as the branch length score (BLS), that is, the total divergence time (Tamura et al. 2004) since the cluster origin. The larger the BLS value, the more ancient the gene cluster. We evaluated the significance of each BLS value separately for each cluster size (n). Indeed, small-sized clusters (with a low number of genes) have a lower probability of being disrupted by chromosomal rearrangements than larger ones. For each cluster size, we generated an empirical null distribution of the expected BLS value by randomly sampling 10,000 groups of n contiguous D. melanogaster genes, and computed their BLS values using the information from the 12 Drosophila species. We defined the probability of an observed BLS value (pBLS) as the fraction of sampled clusters with a BLS value lower than or equal to the observed (supplementary table S1, Supplementary Material online). We also used computer simulations to examine whether the chromatin and expression factors that correlate with the pBLS value (e.g., JIL-1 binding intensity or EN) are specific constraints of the OBP gene organization, or correspond to genome-wide characteristics. We generated null empirical distributions by randomly sampling 10,000 replicates of 31 D. melanogaster clusters without OBP genes, but with the same number of genes and similar pBLS (±0.01) as that observed for clusters including OBP genes. 
For each replicate, we calculated the correlation between the characteristic chromatin and expression factors and the pBLS value. The probability of an observed correlation (P value) was estimated as the proportion of samples with correlation values higher than the observed. A low probability (i.e., P < 0.05) value indicates that the surveyed factor is not as common among the genome-wide Drosophila gene clusters as it is in the clusters including OBP genes.
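The empirical pBLS described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `bls_of` callable (a stand-in for whatever computes a BLS value from a group of genes), and the treatment of the genome as a single ordered gene list are all assumptions made for illustration.

```python
import random


def empirical_pbls(observed_bls, null_bls):
    """Fraction of null BLS values lower than or equal to the observed BLS."""
    if not null_bls:
        raise ValueError("null distribution is empty")
    return sum(1 for b in null_bls if b <= observed_bls) / len(null_bls)


def sample_null_bls(gene_order, n, bls_of, replicates=10_000, seed=0):
    """Null BLS values from random windows of n contiguous genes.

    gene_order: genes in chromosomal order (simplified here to one list;
                a real analysis would not let windows span chromosomes).
    bls_of:     hypothetical callable mapping a tuple of genes to a BLS value.
    """
    if n > len(gene_order):
        raise ValueError("cluster size exceeds the number of genes")
    rng = random.Random(seed)
    last_start = len(gene_order) - n
    null = []
    for _ in range(replicates):
        start = rng.randint(0, last_start)  # inclusive on both ends
        null.append(bls_of(tuple(gene_order[start:start + n])))
    return null
```

Because the null distribution is built separately for each cluster size n, a pBLS close to 1 means that the cluster is older than nearly all random groups of the same size.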

Expression Data

We obtained gene expression data for all of the D. melanogaster genes from FlyAtlas (Chintapalli et al. 2007). We used the whole fly expression intensity (EI) information, and all of the 26 conditions incorporated in FlyAtlas, including larval and adult tissues. We considered a gene to be transcribed if the present call value was greater than zero. In addition to the EI value, we also computed the EB as the fraction of tissues where the gene is transcribed (regardless of the expression level in a given tissue), the sex-specific expression (SSE) as the transcription in sexual tissues (i.e., testis, ovary, male accessory glands, virgin spermatheca, and mated spermatheca) relative to the rest of tissues, and the EN as the coefficient of variation (COV) of the EI values. As the FlyAtlas expression data were determined from highly inbred flies (the Canton-S stock) reared under homogeneous conditions (22 °C with a 12 h:12 h light regime), the COV values are not explained by differences in the genetic or environmental background, but rather represent an excellent proxy to evaluate the stochastic fluctuations in transcript abundance (EN). The mean expression measures for each cluster were calculated as the average expression values of the spanned genes.
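As a rough illustration of these expression summaries, the sketch below computes EB, EN, and a per-cluster mean in plain Python. The function names and input layout are hypothetical, and the use of the sample standard deviation for the COV is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def expression_breadth(present_calls):
    """EB: fraction of tissues with a present call greater than zero."""
    if not present_calls:
        raise ValueError("no tissues supplied")
    return sum(1 for call in present_calls if call > 0) / len(present_calls)


def expression_noise(intensities):
    """EN: coefficient of variation (standard deviation / mean) of EI values.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    m = mean(intensities)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for mean 0")
    return stdev(intensities) / m


def cluster_mean(per_gene_values):
    """Cluster-level measure: average of the values of the spanned genes."""
    return mean(per_gene_values)
```

For example, a gene with present calls `[0, 2, 4, 0]` across four tissues would have an EB of 0.5 under this definition.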

Functional Genomic Data

The ChIP-chip binding intensity for the JIL-1 protein and the nine chromatin states defined in Kharchenko et al. (2011) were downloaded from the modENCODE project database (BG3 D. melanogaster cell line). The nine-state chromatin model classifies each D. melanogaster nucleotide position into one out of nine chromatin states (i.e., Promoter and TSS, Transcription elongation [TE], Regulatory regions, Open chromatin, Active genes on the male X chromosome, Polycomb-mediated repression, Pericentromeric heterochromatin, Heterochromatin-like embedded in euchromatin, and Transcriptionally silent, intergenic) on the basis of the combinatorial profile of 18 histone marks (Kharchenko et al. 2011). The promoter architecture information, which integrates cap analysis of gene expression (CAGE), RNA ligase mediated rapid amplification of cDNA ends (RLM-RACE) and cap-trapped expressed sequence tags data, was obtained from Hoskins et al. (2011). We performed the promoter analysis using all promoter annotations, but also confirmed the results by restricting the analysis to promoters with only validated support (evidence from two or more data types; e.g., CAGE and RLM-RACE). We used the FlyBase Gene Ontology (GO) annotation (release fb_2011_04) to gauge whether genes clustered with OBP genes are functionally related. We analyzed the GO overrepresentation using the Topology-Elim algorithm (Grossmann et al. 2007), which considers the hierarchical dependencies of the GO terms, and was implemented in the Ontologizer 2.0 software (Bauer et al. 2008).

Phylogeny-Based Analysis

The age of the genes (the divergence time since its origin) is a relevant factor to be considered when analyzing the mechanisms involved in gene cluster conservation. For example, recent gene duplications usually evolve faster than older ones (Luz et al. 2006) and often exhibit an SSE pattern. Moreover, the maximum BLS value of a particular cluster depends on the age of the encompassed genes. We inferred the maximum BLS cluster value as the minimum age of the encompassed genes, using the topological dating approach (Huerta-Cepas and Gabaldon 2011) with the BadiRate software (Librado et al. 2012).

Statistical Multivariate Analysis

We examined the relationships among the pBLS and a number of genomic and gene expression factors by different association tests (supplementary table S2, Supplementary Material online). On the one hand, we analyzed bivariate associations by using the following: 1) the Wilcoxon exact test, 2) the Pearson correlation coefficient, 3) the Spearman’s rank correlation coefficient, and 4) the maximal information coefficient (MIC) (Reshef et al. 2011). We used the Wilcoxon exact test to compare clusters with low (<0.90) and high (>0.99) pBLS values. As this test requires a categorization of a continuous variable (the pBLS value), it is often conservative. For this reason, we also computed the Pearson correlation coefficient, which captures the linear continuous dependence between variables. Nevertheless, the Pearson correlation coefficient is very sensitive to outliers and skewed distributions, which may generate spurious associations between variables. Indeed, the assumptions required to calculate the probability associated to the Pearson correlation coefficient may not hold in our data; for instance, the pBLS values are not normally distributed (Kolmogorov–Smirnov test: P < 2.2e−16). In such case, the Spearman’s rank correlation coefficient is recommendable. This test, however, is not without problems, such as the use of the midrank approach for handling ties. The MIC-based test does not assume normality of the data and allows detecting a wide range of bivariate associations, including monotonic (e.g., linear, exponential) and nonmonotonic (e.g., sinusoidal) relationships. However, the P value of the MIC score can only be obtained by simulations. Currently only a few precomputed tables are available, which precludes computing exact P values, especially for our genome-wide data set (sample size of 3,434). Given these pros and cons, we reported the Spearman’s rank correlation coefficient throughout the manuscript. 
In addition, it is worth noting that all conclusions extracted from the Spearman’s rank correlation coefficient were also supported by other tests, especially the main findings (supplementary table S2, Supplementary Material online). On the other hand, as the examined variables are clearly intercorrelated, we also conducted a partial correlation and a path analysis. We assessed the goodness of fit of our empirical data to the underlying path model by evaluating the chi-squared significance. The Wilcoxon exact test, the Pearson and Spearman’s correlation coefficients, the partial correlation, and the path analysis were performed using the R programming language (version 2.7.2). The MIC score was computed using the Java binary provided by the authors, and its P values were determined using the precomputed tables available at the MINE web site. We conducted the multiple testing correction using the Benjamini–Hochberg procedure (Benjamini and Hochberg 1995) at a 5% false discovery rate (FDR), which was implemented in the multtest package of the R programming language. We also used in-house developed Perl scripts for handling all genomic and expression data files.
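For readers unfamiliar with the Benjamini–Hochberg correction, the following is a minimal Python sketch of the standard step-up adjustment. It is offered only as an illustration of the procedure; the analysis itself was run with the multtest package in R, and the function name here is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest P value), and
    monotonicity is enforced by taking a running minimum from the largest
    P value down to the smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then called significant at a 5% FDR when its adjusted P value is below 0.05.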

Results

Gene Cluster Identification

We inferred a total of 31 conserved clusters that include both OBP and other nonhomologous genes (see Materials and Methods; table 1). These 31 clusters are maintained, on average, in 5.9 Drosophila species, comprise a mean of 8.3 genes and, more importantly, recover most of the OBP clusters defined in Vieira et al. (2007). For example, the cluster with highest gene density comprises four OBP genes (Obp19a, Obp19b, Obp19c, and Obp19d; cluster 1 in Vieira et al. [2007]) and one non-OBP gene in 7,330 bp. This cluster has been detected in 11 species, having a pBLS (cluster constraint probability) value of 0.995, and an adjusted pBLS (after correcting for the FDR [Benjamini and Hochberg 1995]) of 0.977 (table 1). In total, 14 of these clusters are significant (pBLS > 0.95), although only 10 remain after correcting for multiple testing (adjusted pBLS > 0.95). Therefore, these clusters are likely to be under functional constraints.
Table 1

The Drosophila melanogaster Clusters Including OBP Genes

| D. melanogaster Gene Cluster | Region | No. of Genes | No. of OBPs | No. of Genomes Conserved | pBLS | Adjusted pBLS |
|---|---|---|---|---|---|---|
| Obp8a | X:9100153 … 9111401 | 4 | 1 | 8 | 0.925719 | 0.872071 |
| Obp18a | X:19029114 … 19064675 | 3 | 1 | 2 | 0.733075 | 0.714666 |
| Obp19a-d | X:20284679 … 20292009 | 5 | 4 | 11 | 0.995459 | 0.976539* |
| Obp22a | 2L:1991705 … 2008966 | 4 | 1 | 4 | 0.734812 | 0.714666 |
| Obp28a | 2L:7426866 … 7497360 | 10 | 1 | 3 | 0.930950 | 0.874085 |
| Obp44a | 2R:4018938 … 4022588 | 2 | 1 | 12 | 0.921412 | 0.871778 |
| Obp46a | 2R:6194535 … 6209405 | 4 | 1 | 9 | 0.945767 | 0.887918 |
| Obp47a | 2R:6785747 … 6829206 | 4 | 1 | 5 | 0.893760 | 0.843170 |
| Obp47b | 2R:7189426 … 7197334 | 4 | 1 | 12 | 0.992088 | 0.964959* |
| Obp49a | 2R:8574114 … 8645028 | 10 | 1 | 7 | 0.997471 | 0.983415* |
| Obp50a-c | 2R:10257836 … 10260511 | 3 | 3 | 6 | 0.799992 | 0.753622 |
| Obp50d | 2R:10257836 … 10261264 | 4 | 1 | 5 | 0.793360 | 0.753622 |
| Obp50e | 2R:10262077 … 10299077 | 5 | 1 | 5 | 0.834610 | 0.786371 |
| Obp51a | 2R:10911880 … 10943746 | 2 | 1 | 4 | 0.603538 | 0.603538 |
| Obp56a-c | 2R:15585228 … 15588573 | 3 | 3 | 11 | 0.937764 | 0.879417 |
| Obp56d-f | 2R:15573111 … 15602373 | 9 | 3 | 3 | 0.895767 | 0.843170 |
| Obp56g | 2R:15656966 … 15671525 | 2 | 1 | 9 | 0.747767 | 0.714666 |
| Obp56h | 2R:15703059 … 15720473 | 2 | 1 | 10 | 0.840740 | 0.786371 |
| Obp56i | 2R:15703059 … 15768425 | 4 | 1 | 3 | 0.687717 | 0.676687 |
| Obp57a-c | 2R:16391061 … 16426819 | 10 | 3 | 4 | 0.951438 | 0.892469 |
| Obp57d-e | 2R:16413832 … 16449834 | 15 | 2 | 2 | 0.959350 | 0.903065 |
| Obp58b-d; Obp59a | 2R:18554661 … 18595219 | 11 | 4 | 5 | 0.988070 | 0.958908* |
| Obp69a | 3L:12332216 … 12410803 | 7 | 1 | 9 | 0.990356 | 0.962628* |
| Obp73a | 2R:5950890 … 6004962 | 6 | 1 | 9 | 0.986228 | 0.957306* |
| Obp76a | 3L:19561538 … 19683092 | 20 | 1 | 3 | 0.999983 | 0.999483* |
| Obp83a-b | 3R:1786045 … 1852962 | 6 | 2 | 4 | 0.839688 | 0.786371 |
| Obp83cd; Obp83ef; Obp83g | 3R:1880432 … 2129375 | 29 | 3 | 3 | 0.999967 | 0.999483* |
| Obp84a | 3R:3050136 … 3113354 | 12 | 1 | 6 | 0.998575 | 0.985275* |
| Obp93a | 3R:16774436 … 16966087 | 33 | 1 | 2 | 0.997325 | 0.983415* |
| Obp99a | 3R:25456026 … 25501141 | 7 | 4 | 5 | 0.976460 | 0.933660 |
| Obp99b-d | 3R:25444756 … 25548111 | 17 | 3 | 2 | 0.97025 | 0.923146 |
| Average | | 8.3 | 1.7 | 5.9 | | |

Note.—The “no. of genes” and “no. of OBPs” columns indicate the total number of protein coding and OBP genes in the clusters, respectively. The “no. of genomes conserved” column represents the number of Drosophila species where the gene cluster region is identified.

*Significant clusters (adjusted pBLS > 0.95).

To determine specific features of the OBP gene organization, we compared clusters including OBP genes with all clusters identified in the Drosophila genomes. We inferred a total of 3,434 clusters (supplementary table S1, Supplementary Material online) that, on average, are conserved in 5.9 Drosophila species and encompass 6.4 genes (fig. 1). A total of 1,290 of the 3,434 clusters have a pBLS higher than 0.95, although only 58 remain significant after controlling for FDR. Because the FDR correction constitutes a conservative criterion (i.e., FDR methodologies lose statistical power as the number of tests increases [Carvajal-Rodriguez et al. 2009]), the actual number of clusters under functional constraint is likely to be higher than these 58 cases. Given that the raw pBLS value, which is not adjusted for multiple testing, is a continuous estimate of the cluster constraint strength, classifying clusters into significant and nonsignificant unbalanced categories will yield a further loss of statistical power (Pearson 1913). To avoid the negative effects of categorization, we analyzed the effect of competing factors on raw pBLS estimates using different association measures (supplementary table S2, Supplementary Material online), although only the values of the Spearman’s rank correlation coefficient are reported throughout the manuscript.
Fig. 1

Frequency distribution of the 3,434 Drosophila clusters, conditioned on the cluster size (i.e., number of genes per cluster) and the BLS value (total time of cluster conservation, in million years). The 58 significant clusters after correcting for multiple testing are depicted in red.

Genes Clustered with OBP Genes Encode Plasma Membrane Proteins

We studied the existence of functional relationships among the genes clustered with OBPs by GO enrichment analysis (in total, 198 non-OBP genes). We compared the functionally annotated non-OBP genes in the 31 focal clusters (162 out of the 198 genes have GO annotations) with those present in all of the 3,434 Drosophila clusters (9,353 out of 11,811 genes). We found that the most characteristic GO terms among the genes clustered with OBPs are regulation of neurotransmitter transport, sodium channel activity, axon, neurotransmitter receptor activity, and integral to plasma membrane. After multiple testing correction (Benjamini and Hochberg 1995), only the latter category remained significant (hypergeometric test, P = 1.34e−15; table 2). As this analysis does not take into account the pBLS value of the clusters, we also separately reanalyzed the data from three different pBLS bins, each containing a similar number of genes. Notably, we found that the integral to plasma membrane GO term is enriched among the genes most conserving their neighborhood with the OBP genes.
Table 2

The 15 GO Terms Most Overrepresented among Genes Clustered with OBPs

| GO Term | Population Count | Sample Count | P Value | Adjusted P Value |
|---|---|---|---|---|
| Integral to plasma membrane | 180 | 22 | 6.78e−14 | 1.06e−10* |
| Sodium channel activity | 35 | 4 | 0.0022 | 0.4634 |
| GTPase activator activity | 62 | 5 | 0.0030 | 0.4634 |
| Retinal binding | 6 | 2 | 0.0036 | 0.4634 |
| Phototransduction | 41 | 4 | 0.0040 | 0.4634 |
| Metal ion transport | 130 | 7 | 0.0047 | 0.4634 |
| Monovalent inorganic cation transport | 137 | 7 | 0.0062 | 0.4634 |
| Locomotion | 253 | 10 | 0.0071 | 0.4634 |
| Neurotransmitter receptor activity | 49 | 4 | 0.0075 | 0.4634 |
| Locomotory behavior | 144 | 7 | 0.0081 | 0.4634 |
| Axon | 52 | 4 | 0.0093 | 0.4634 |
| Regulation of neurotransmitter secretion | 10 | 2 | 0.0104 | 0.4634 |
| Regulation of neurotransmitter transport | 10 | 2 | 0.0104 | 0.4634 |
| Sodium ion transport | 56 | 4 | 0.0120 | 0.4634 |
| Calcium-dependent phospholipid binding | 11 | 2 | 0.0126 | 0.4634 |

Note.—The “Population Count” and “Sample Count” columns indicate the number of genes with GO annotation in the population (9,353 genes in the 3,434 Drosophila clusters) and sample (162 in genes clustered with OBPs), respectively. The “P value” column indicates the probability of observing such number of genes in the sample, given the number of genes in the population. *Overrepresented GO terms (adjusted P < 0.05).
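The overrepresentation P values rest on a hypergeometric test, which can be sketched in a few lines of Python. This is a plain upper-tail hypergeometric probability with a hypothetical function name; it does not reproduce the Topology-Elim algorithm of Ontologizer, which additionally accounts for the GO hierarchy, so its output should not be expected to match the table values exactly.

```python
from math import comb


def hypergeom_upper_tail(population, annotated, sample, observed):
    """P(X >= observed) when drawing `sample` genes without replacement
    from `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(observed, upper + 1)
    )
    # Exact integer arithmetic; the final division yields a float.
    return favorable / comb(population, sample)


# Counts taken from the text: 9,353 annotated genes in all clusters,
# 162 annotated genes clustered with OBPs, 180 and 22 for the GO term.
p_membrane = hypergeom_upper_tail(9353, 180, 162, 22)
```

With about three genes expected by chance (162 × 180 / 9,353), observing 22 yields a very small probability, in line with the strong enrichment reported for this term.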

The Cluster Conservation Correlates with the Type of Cis-Regulatory Elements

We analyzed the relevance of cis-regulatory elements in maintaining clusters including OBP genes. In particular, we examined whether the pBLS value of such clusters is associated with the promoter architecture of the confined genes (i.e., the peaked or broad promoters as a proxy for the type of CPEs [Hoskins et al. 2011]). We found a significant correlation (Spearman’s rank correlation coefficient: ρ = 0.415, P = 0.044; table 3), that is, the higher the pBLS value, the higher the proportion of broad-type promoters. Remarkably, this trend is also observed for all of the 3,434 Drosophila clusters (Spearman’s rank correlation coefficient: ρ = 0.044, P = 0.016), indicating that gene clusters may have distinctive cis-regulatory elements.
Table 3

Summary of the Associations between pBLS and EB, EI, and EN

| | OBP Clusters (BC) | OBP Clusters (PC) | Clusters with OBPs (BC) | Clusters with OBPs (PA) | All Clusters (PA) |
|---|---|---|---|---|---|
| EB | ρ = 0.099 (P = 0.596) | t = 1.770 (P = 0.089) | ρ = 0.548 (P = 0.001) | β = 0.423 (P = 0.004) | β = 0.114 (P = 2.3e−9) |
| EI | ρ = −0.197 (P = 0.288) | t = −2.831 (P = 0.009) | ρ = 0.087 (P = 0.641) | β = −0.032 (P = 0.821) | β = 0.201 (P < 2e−16) |
| EN | ρ = 0.138 (P = 0.458) | t = 2.382 (P = 0.025) | ρ = 0.403 (P = 0.024) | β = 0.290 (P = 0.043) | β = 0.011 (P = 0.489) |

Note.—Relationship between pBLS and the EB, EI, and EN. The “OBP clusters,” “Clusters with OBPs” and “All clusters” columns show results for clusters of OBP genes, for clusters including OBP genes, and for all 3,434 Drosophila clusters, respectively. “BC,” “PC,” and “PA” stand for bivariate correlation, partial correlation, and path analysis, respectively.

The presence of cis-regulatory elements shared among genes can restrict the movement of the target genes. For example, genes transcribed from shared promoters are common in many species, resulting in an excess of head-to-head gene pair arrangements (Trinklein et al. 2004; Kensche et al. 2008; Xu et al. 2009). We analyzed whether clusters including OBP genes have distinctive head-to-head, tail-to-tail, or head-to-tail gene pair organizations, but we detected no significant correlation with their pBLS value (Spearman’s rank correlation coefficient, P > 0.05; supplementary fig. S1, Supplementary Material online). In contrast, the results of the genome-wide analysis (including all 3,434 Drosophila clusters) were all significant (Spearman’s rank correlation coefficient: ρ = −0.095, P = 4.85e−8; ρ = 0.214, P < 2e−16; ρ = 0.110, P < 2.92e−10 for the head-to-tail, tail-to-tail and head-to-head gene pair arrangements, respectively). Therefore, the sharing of cis-regulatory elements between contiguous genes is not a major factor in explaining the maintenance of OBP gene organization.

EB and EN Are Associated with the Conservation of Clusters That Include OBP Genes

As genes with broad-type promoters are often broadly expressed (Hoskins et al. 2011), we examined expression pattern effects on cluster conservation. We found that the pBLS value of the clusters including OBP genes significantly correlates with EB (Spearman’s rank correlation coefficient: ρ = 0.548, P = 0.001) and EN (Spearman’s rank correlation coefficient: ρ = 0.403, P = 0.024), but not with EI (Spearman’s rank correlation coefficient: ρ = 0.087, P = 0.641) (table 3). Nevertheless, these variables are highly intercorrelated: broadly expressed genes often exhibit high EI (Newman et al. 2006) and low EN (Lehner 2008). In addition, other factors, such as gene age (GA), may also obscure the causes of cluster conservation. For example, newly arising genes exhibit low EI and high gene loss rates (Wolf et al. 2009). We determined the causal relationships among the factors involved in the OBP gene organization using path analysis (fig. 2), assigning GA as the exogenous variable (i.e., not affected by factors of the underlying model). After factoring out the intercorrelated variables, EB (β = 0.423, P = 0.004) and EN (β = 0.290, P = 0.043) remained significant, that is, clusters including OBP genes are expressed in many tissues, exhibiting high stochastic fluctuations in transcript abundance, regardless of their EI (β = −0.032, P = 0.821). Interestingly, this result differs from the genome-wide analyses (3,434 clusters), where the pBLS value is affected by the EI (β = 0.201, P < 2e−16) and EB (β = 0.114, P = 2e−9), but not by the EN (β = 0.011, P = 0.489). However, the transcriptional effects on both data sets (with or without OBP genes) are not directly comparable, because they contain a different number and type of clusters. To evaluate whether EI and EN are specific features of the OBP gene organization, we thus performed computer simulations. 
We found that the EN effect (path coefficient from EN to pBLS) is higher for clusters including OBP genes than for random samples of 31 comparable clusters (P = 0.035), whereas the EI effect is lower (P = 0.034). Unlike comparable genome-wide clusters, clusters with OBP genes are not only influenced by EB but also by the EN, which does not support the clustering model of EN minimization.
Fig. 2

Transcriptional environment in clusters that include OBP genes. Path analysis model for the causal relationships among cluster constraint probability (pBLS), the minimum age of a gene in the cluster (GA), the EB, the EI, and the EN. The GA is the exogenous variable. The numbers on the lines indicate the path coefficients. Solid and dashed arrows represent significant and nonsignificant relationships.

OBP Genes in Conserved Clusters Also Exhibit Elevated Levels of EN

We analyzed whether the positive relationship between EN and cluster conservation remains significant after excluding non-OBP genes from the 31 conserved clusters. To that end, we controlled for intercorrelated expression features. For example, we found that OBP genes in clusters with low pBLS, such as the Obp22a and Obp50a genes, are often transcribed in sexual tissues, which may suggest that the OBP gene organization has an SSE component (Spearman’s rank correlation coefficient: ρ = −0.420, P = 0.017; fig. 3A). However, we found that this association is just a by-product of the OBP GA (partial correlation analysis, t = −1.262, P = 0.219; fig. 3B), supporting the observation that newly arising genes often exhibit an SSE pattern (Yeh et al. 2012). Indeed, only the EI and EN of the OBP genes are directly associated with cluster conservation (partial correlation analysis, t = −2.831 and t = 2.382, P = 0.009 and P = 0.025, respectively). Overall, this supports the idea that EN may play a major role in shaping the OBP gene organization.
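The partial correlation logic used above (an association that disappears once a third variable is held constant) can be sketched as follows. This is an illustrative first-order partial correlation on raw values with the usual t statistic. Whether the original analysis used ranks or raw values is not stated here, and the toy vectors are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def t_statistic(r, n, n_controls=1):
    """t value and degrees of freedom for a partial correlation r."""
    df = n - 2 - n_controls
    return r * sqrt(df / (1 - r ** 2)), df

# Toy example (hypothetical): x and y both depend on z but are otherwise
# unrelated, so they correlate overall yet not once z is controlled for.
z = [-1, -1, -1, -1, 1, 1, 1, 1]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]
raw = pearson(x, y)
controlled = partial_corr(x, y, z)
```

Here z plays the role of a confounder such as GA: the raw x-y correlation is positive, whereas the partial correlation is zero.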
Fig. 3. Genomic features of OBP genes. Relationship between pBLS and the SSE value using (A) all OBP genes and (B) after removing the recent OBP duplicates (red points).

Clusters Including OBP Genes Exhibit Distinctive Transcriptional Regulation by High-Order Chromatin Structures

We studied the effect of high-order chromatin structures (i.e., the nine specific chromatin states defined in Kharchenko et al. [2011]) on the conservation of clusters including OBP genes. We found a significant positive relationship between the pBLS value of these clusters and the proportion of nucleotides in the TE chromatin state (Spearman’s rank correlation coefficient: ρ = 0.480, P = 0.006; fig. 4A). This chromatin state exhibits a distinct composition of proteins and histone marks (Kharchenko et al. 2011). As JIL-1 kinase is preferentially localized at the coding (Regnard et al. 2011) and promoter (Kellner et al. 2012) regions of the regulated genes, we analyzed its binding intensity separately for the coding, untranslated, intergenic, and intronic regions of the 31 focal clusters. We observed a strong positive correlation between the pBLS value and the JIL-1 binding intensity, although, after correcting for multiple testing, it remains statistically significant only for the coding regions (Spearman’s rank correlation coefficient: ρ = 0.617; P = 2e−4; fig. 4B). Taken together, these results suggest that transcriptional regulation by high-order chromatin structures confines the OBP gene organization to chromatin domains with the appropriate transcriptional environment (supplementary fig. S2, Supplementary Material online).
Fig. 4. Chromatin features of the clusters that include OBP genes. Relationships between the cluster constraint probability (pBLS) and (A) the proportion of nucleotides annotated as TE and (B) JIL-1 binding intensity in coding regions. The ρ values are the correlation coefficients of these associations. Distribution of the correlation coefficients between pBLS values and (C) the proportion of TE and (D) JIL-1 binding intensities in Drosophila clusters obtained by computer simulations (10,000 replicates of 31 clusters). The arrow indicates the correlation coefficients observed for clusters including OBP genes (P < 1e−5 and P = 0.010, for the TE proportion and JIL-1 binding intensity, respectively). The shaded area in the right tail represents the 5% of the total distribution area.

We further examined whether the high JIL-1 binding intensity and TE chromatin state represent particular features of clusters including OBP genes. Remarkably, the genome-wide cluster data set also shows a significant correlation between the pBLS values and the JIL-1 binding intensities (Spearman’s rank correlation coefficient: ρ = 0.305; P < 2e−16) and the TE chromatin state (Spearman’s rank correlation coefficient: ρ = 0.312; P < 2e−16). However, our computer simulations show that the correlation strengths are much higher for clusters including OBP genes than for random groups of 31 comparable clusters (P < 1e−5 and P = 0.010 for the TE chromatin state and for JIL-1, respectively; fig. 4C and D), which suggests that the JIL-1 binding intensity and TE chromatin state are relevant factors explaining the conservation of clusters including OBP genes.
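The simulation comparison described here can be illustrated with a short resampling sketch: draw many random groups of clusters from a genome-wide pool, compute Spearman’s ρ in each, and report the fraction of replicates with a ρ at least as high as the observed one. The function names and the toy pool are hypothetical, and the criteria used to choose "comparable" clusters are not modeled (sampling is uniform here, which is an assumption).

```python
import random
from math import sqrt

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

def resampling_p(observed_rho, pool_x, pool_y, size=31, replicates=10000, seed=0):
    """Fraction of random groups whose rho is at least the observed rho."""
    rng = random.Random(seed)
    indices = range(len(pool_x))
    hits = 0
    for _ in range(replicates):
        pick = rng.sample(indices, size)
        rho = spearman([pool_x[i] for i in pick], [pool_y[i] for i in pick])
        if rho >= observed_rho:
            hits += 1
    return hits / replicates

# Toy pool (hypothetical values standing in for pBLS and a chromatin feature).
pool_x = list(range(50))
pool_y = [(i * 7) % 50 for i in range(50)]
p = resampling_p(0.3, pool_x, pool_y, size=10, replicates=200)
```

The defaults mirror the 10,000 replicates of 31 clusters mentioned in the text; the resulting value is a one-sided empirical P value for the right tail of the simulated distribution.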

Discussion

Cluster Inference

Several methods have been developed to detect gene clusters conserved across a phylogeny (Lathe et al. 2000; Tamames 2001; Zheng et al. 2005). These methods differ in their underlying biological assumptions; therefore, their appropriateness depends on the biological question to be addressed. For example, the Synteny Database (Catchen et al. 2009) uses synteny information (i.e., it requires the same gene order and orientation across two genomes) to infer ortholog and paralog relationships, whereas the original version of the OperonDB algorithm (Ermolaeva et al. 2001) searches for clusters of physically close gene pairs conserved across different species to predict operons. The later version of OperonDB (Pertea et al. 2009) improves the sensitivity of the method by allowing rearrangement events inside the candidate cluster regions. There is compelling evidence indicating that some functional clusters can undergo internal rearrangements without transcriptional consequences (Itoh et al. 1999; Lathe et al. 2000); this observation led to the formulation of the gene team model (Luc et al. 2003), which we applied here to infer Drosophila clusters. Nevertheless, the gene team model implemented in the MCMuSeC software (Ling et al. 2009) also has some statistical problems. First, the inferred clusters can contain overlapping information, that is, a particular gene may be present in more than one cluster. Because such a feature violates the independence premise assumed for most statistical tests, we have confirmed that all of our conclusions hold after excluding overlapping clusters (1,634 out of 3,434 clusters have nonoverlapping information, and 25 of these encompass OBP genes). Second, the statistical power to estimate conserved gene clusters increases with the species divergence time. Indeed, the 12 Drosophila species (Drosophila 12 Genomes 2007) used in this study are not divergent enough to detect small clusters (i.e., up to three genes).
To detect such small clusters, it would be more appropriate to use more divergent species. However, the 12 Drosophila genomes provide a reasonable tradeoff between the quality of the assemblies and annotations (e.g., identification of orthologs and low sequence fragmentation in scaffolds) and the statistical power. This issue has important implications because two main classes of clusters have been described (Weber and Hurst 2011): small clusters of highly coexpressed genes (likely constrained by shared CREs) and large clusters of housekeeping and unrelated (i.e., nonhomologous) genes. Although small clusters may be underestimated when using genome data from the 12 Drosophila species, this bias should not be relevant for the second cluster class. Thus, our results do not rule out a relevant role of shared CREs in shaping genome architecture, but rather highlight the importance of high-order chromatin coregulatory mechanisms in the OBP gene organization. We mostly found large clusters (an average of 6.4 and 8.3 genes for clusters with and without OBP genes, respectively), which comprise a number of nonhomologous genes that exhibit high EB; these features characterize housekeeping gene clusters (i.e., the large-size cluster class).
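The overlap check mentioned earlier in this section (a gene present in more than one inferred cluster) can be sketched as a simple filter. The exact criterion used to define nonoverlapping clusters is not given in the text, so the strict rule below, which keeps only clusters sharing no gene with any other cluster, is an assumption; the cluster and gene identifiers are hypothetical.

```python
from collections import Counter

def nonoverlapping(clusters):
    """Keep clusters none of whose genes occurs in any other cluster.

    `clusters` maps a cluster identifier to a list of gene identifiers.
    """
    # Count in how many clusters each gene appears (once per cluster).
    counts = Counter(g for genes in clusters.values() for g in set(genes))
    return {
        name: genes
        for name, genes in clusters.items()
        if all(counts[g] == 1 for g in genes)
    }

# Hypothetical example: c1 and c2 share geneC, so only c3 is retained.
clusters = {
    "c1": ["geneA", "geneB", "geneC"],
    "c2": ["geneC", "geneD"],
    "c3": ["geneE", "geneF"],
}
independent = nonoverlapping(clusters)
```

A less strict alternative would be to keep one representative from each group of overlapping clusters, which retains more data at the cost of an arbitrary choice of representative.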

Clusters Including OBP Genes Are Conserved by Functional Constraints

We identified 31 clusters including at least one OBP gene, and ten remained significant after correcting for multiple testing (table 1). Although natural selection may appear as the most immediate explanation for the conservation of the OBP genome organization, it could also represent a by-product of the uneven distribution of rearrangements along chromosomes (Ranz et al. 2001; Pevzner and Tesler 2003; Ruiz-Herrera et al. 2006; Bhutkar et al. 2008). Indeed, orthologous chromosome regions affected by a reduced number of rearrangements may maintain their cluster-like structure in the absence of functional constraints (von Grotthuss et al. 2010). However, such an explanation is unlikely to be the main reason for the maintenance of clusters including OBP genes. Indeed, homologous chromosome regions depleted in rearrangement breakpoints (and hence in rearrangements) are not common across Drosophila species (Ranz et al. 2001; Bhutkar et al. 2008; Schaeffer et al. 2008). In fact, the recombination rate, which is highly associated with the rearrangement rate, varies widely among closely related species (True et al. 1996). Consistently, we found no association between the recombination rate (Comeron et al. 2012) and pBLS values across the 31 focal clusters (Spearman’s rank correlation coefficient: ρ = −0.14, P = 0.47), or across all 3,434 Drosophila clusters (Spearman’s rank correlation coefficient: ρ = 0.02, P = 0.16) (supplementary fig. S3, Supplementary Material online). This lack of association results from the fact that we evaluated the statistical significance of the clusters using the observed divergence time of microsynteny conservation as the null distribution. As this empirical null distribution depends upon the mode of chromosome evolution, it already captures the information of the uneven rearrangement distribution observed along Drosophila chromosomes. Therefore, it is unlikely that the OBP gene organization is a by-product of the rearrangement rate heterogeneity.
In contrast, it may be constrained by natural selection for some functional meaning. As conserved clusters of functionally or transcriptionally linked genes may include nonparalogous members, we defined a cluster as a group of genes that maintain their neighborhood across species regardless of whether they are homologous. This approach differs from that used by Vieira and Rozas (2011), who only considered clusters of OBP paralogs. These authors observed that OBP genes are found physically closer than expected by chance, although odorant receptor (OR) genes are not. In contrast, we found that some ORs are clustered with other nonhomologous genes (supplementary tables S1 and S3, Supplementary Material online). Similarly, clusters of OBP paralogs are conserved, but embedded within large arrangements that also include other non-OBP genes (table 1). For example, one of the most conserved Drosophila clusters includes lush (table 1 and fig. 5), which encodes an OBP involved in social aggregation and mating behavior (Xu et al. 2005), but also Shal (a potassium channel), ash1 (involved in oviposition and oogenesis), asf1 (dendrite morphogenesis), and tey (synaptic target recognition). Noticeably, the genes within this cluster also exhibit similar patterns of transcription across different developmental stages (fig. 5; Graveley et al. 2011). Overall, this suggests that some functional and transcriptional links maintain the lush genome cluster.
Fig. 5. The cluster including the lush (Obp76a) gene. The cluster (pBLS value of 0.999983) includes lush (Obp76a) and 19 other non-OBP genes (blue boxes). The coordinates (from 19,570 k to 19,680 k) correspond to the 3L chromosome of Drosophila melanogaster. The intensity peaks below the genes indicate the EI values across 30 developmental stages (in different colors).

High-Order Chromatin Regulatory Mechanisms Provide the Appropriate Transcriptional Environment for Cluster Maintenance

Chromatin domains may restrict the location of genes to regions having the appropriate transcriptional environment (Noordermeer et al. 2011; Thomas et al. 2011), which may maintain the OBP gene organization. Nonhistone chromatin proteins regulating the chromatin state are therefore of particular interest. For example, lamin Dm0, which physically interacts with JIL-1 kinase (Bao et al. 2005), binds to gene clusters conserved across Drosophila species (Ranz et al. 2011). Remarkably, we found a strong association between the JIL-1 binding intensity and the maintenance of clusters including OBP genes (Spearman’s rank correlation coefficient: ρ = 0.617, P = 0.010; fig. 4D). Moreover, genes regulated by JIL-1 kinase exhibit elevated levels of EB (Regnard et al. 2011) and EN (JIL-1 releases the paused RNA polymerase II at the proximal promoter [Kellner et al. 2012], favoring transcriptional elongation bursts that increase EN [Becskei et al. 2005; Kaern et al. 2005; Rajala et al. 2010]). Consistent with this idea, we have shown that the OBP gene organization is associated with elevated levels of EB (P = 0.004) and EN (P = 0.043). It has been shown that housekeeping genes may be particularly confined to chromosome regions possessing the appropriate transcriptional environment; indeed, mutations that alter their location may exert important deleterious pleiotropic effects in diverse tissues and developmental stages (Wang and Zhang 2010). Batada and Hurst (2007) have suggested that broadly expressed genes are located in chromosome regions with low stochastic transcriptional fluctuations to minimize the deleterious effects of EN. However, the functional constraints underlying the conservation of the OBP gene organization do not support this hypothesis. First, clusters with OBP genes often exhibit a high proportion of broad-type promoters, which yield elevated levels of EB.
Although these two features (broad-type promoters and EB) are associated with reduced levels of EN (Tirosh and Barkai 2008; Wang and Zhang 2010; Xi et al. 2011), we detected a positive relationship between the stochastic transcriptional fluctuation (EN) and the pBLS value of these clusters (table 3). Second, although EN can be alleviated by increasing EI (Lehner 2008), the most conserved clusters include OBP genes not only with the highest EN (partial correlation analysis, P = 0.025) but also with the lowest EI (partial correlation analysis; P = 0.009; table 3). In fact, the EI effect on pBLS is lower for clusters with OBP genes than for random samples of 31 comparable clusters (P = 0.034). Finally, even though head-to-head gene pair arrangements can minimize EN (Wang et al. 2011), clusters with OBP genes do not exhibit a significant correlation between the pBLS value and the frequency of head-to-head gene pairs. Therefore, a suitable transcriptional environment need not always have reduced levels of EN; indeed, a clustering model based on elevated EN levels may explain the OBP gene organization. Some theoretical models predict that, under certain circumstances, EN can even be beneficial as a source for natural variation, particularly for proteins acting in changing environments (e.g., stress response proteins such as oxidative kinases [Dong et al. 2011]). Some empirical results are consistent with this model. In yeast, for example, the elevated EN of plasma-membrane transporters appears to be driven by positive selection (Zhang et al. 2009). The genes clustered with OBPs also encode membrane proteins and, interestingly, many of these proteins have transporter activity (table 2). In fact, the extensive transcriptional diversification of the OBPs suggests that, apart from transporting odorants of the external environment, some OBPs also act as general carriers of hydrophobic molecules through the extracellular matrix (Arya et al. 2010).
Therefore, higher EN levels may allow for the detection of wider ranges of concentrations of hydrophobic molecules. Fluctuations in OBP transcript abundance may represent an important mechanism to increase phenotypic plasticity. Mutations affecting OBP mRNA stability (Wang et al. 2007) and reduced OBP expression levels (Swarup et al. 2011) can actually elicit different Drosophila behaviors to particular odorants, that is, fluctuations in OBP transcript abundance can play a key role in the combinatorial nature of the olfactory coding process. Therefore, natural selection may have favored assembling OBP genes in chromosomal regions with high EN, which in turn may have led to the observed structure of OBP genes in clusters of functionally and transcriptionally related genes.

Supplementary Material

Supplementary tables S1–S3 and figures S1–S3 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
References (10 of 92 listed)

1.  Automated identification of conserved synteny after whole-genome duplication.

Authors:  Julian M Catchen; John S Conery; John H Postlethwait
Journal:  Genome Res       Date:  2009-05-22       Impact factor: 9.043

2.  Detecting gene clusters under evolutionary constraint in a large number of genomes.

Authors:  Xu Ling; Xin He; Dong Xin
Journal:  Bioinformatics       Date:  2009-01-21       Impact factor: 6.937

3.  The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages.

Authors:  Yuri I Wolf; Pavel S Novichkov; Georgy P Karev; Eugene V Koonin; David J Lipman
Journal:  Proc Natl Acad Sci U S A       Date:  2009-04-07       Impact factor: 11.205

4.  Bidirectional promoters generate pervasive transcription in yeast.

Authors:  Zhenyu Xu; Wu Wei; Julien Gagneur; Fabiana Perocchi; Sandra Clauder-Münster; Jurgi Camblong; Elisa Guffanti; Françoise Stutz; Wolfgang Huber; Lars M Steinmetz
Journal:  Nature       Date:  2009-01-25       Impact factor: 49.962

5.  OperonDB: a comprehensive database of predicted operons in microbial genomes.

Authors:  Mihaela Pertea; Kunmi Ayanbule; Megan Smedinghoff; Steven L Salzberg
Journal:  Nucleic Acids Res       Date:  2008-10-23       Impact factor: 16.971

6.  Positive selection for elevated gene expression noise in yeast.

Authors:  Zhihua Zhang; Wenfeng Qian; Jianzhi Zhang
Journal:  Mol Syst Biol       Date:  2009-08-18       Impact factor: 11.429

7.  Plasticity of the chemoreceptor repertoire in Drosophila melanogaster.

Authors:  Shanshan Zhou; Eric A Stone; Trudy F C Mackay; Robert R H Anholt
Journal:  PLoS Genet       Date:  2009-10-09       Impact factor: 5.917

8.  A comprehensive map of insulator elements for the Drosophila genome.

Authors:  Nicolas Nègre; Christopher D Brown; Parantu K Shah; Pouya Kheradpour; Carolyn A Morrison; Jorja G Henikoff; Xin Feng; Kami Ahmad; Steven Russell; Robert A H White; Lincoln Stein; Steven Henikoff; Manolis Kellis; Kevin P White
Journal:  PLoS Genet       Date:  2010-01-15       Impact factor: 5.917

9.  A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests.

Authors:  Antonio Carvajal-Rodríguez; Jacobo de Uña-Alvarez; Emilio Rolán-Alvarez
Journal:  BMC Bioinformatics       Date:  2009-07-08       Impact factor: 3.169

10.  A comparative analysis of divergently-paired genes (DPGs) among Drosophila and vertebrate genomes.

Authors:  Liang Yang; Jun Yu
Journal:  BMC Evol Biol       Date:  2009-03-11       Impact factor: 3.260
