Literature DB >> 30239702

Genomic Landscape of Methylation Islands in Hymenopteran Insects.

Hyeonsoo Jeong1, Xin Wu1, Brandon Smith1, Soojin V Yi1.   

Abstract

Recent genome-wide DNA methylation analyses of insect genomes accentuate an intriguing contrast compared with those in mammals. In mammals, most CpGs are heavily methylated, with the exceptions of clusters of hypomethylated sites referred to as CpG islands. In contrast, DNA methylation in insects is localized to a small number of CpG sites. Here, we refer to clusters of methylated CpGs as "methylation islands (MIs)," and investigate their characteristics in seven hymenopteran insects with high-quality bisulfite sequencing data. Methylation islands were primarily located within gene bodies. They were significantly overrepresented in exon-intron boundaries, indicating their potential roles in splicing. Methylated CpGs within MIs exhibited stronger evolutionary conservation compared with those outside of MIs. Additionally, genes harboring MIs exhibited higher and more stable levels of gene expression compared with those that do not harbor MIs. The effects of MIs on evolutionary conservation and gene expression are independent and stronger than the effect of DNA methylation alone. These results indicate that MIs may be useful to gain additional insights into understanding the role of DNA methylation in gene expression and evolutionary conservation in invertebrate genomes.

Entities:  

Mesh:

Year:  2018        PMID: 30239702      PMCID: PMC6195173          DOI: 10.1093/gbe/evy203

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

Methylation of cytosine residues is a widespread epigenetic modification in eukaryotes. In animal genomes, the primary targets of DNA methylation are cytosines in the context of CpG dinucleotides. DNA methylation of CpGs (often referred to as CpG methylation) has been extensively investigated in mammalian model systems to reveal its critical roles in key regulatory processes such as genomic imprinting, disease, and development (Razin and Cedar 1994; Tremblay et al. 1995; Robertson and Wolffe 2000; Saze et al. 2003; Beck 2018). In the recent decade, the phylogenetic scope of CpG methylation studies has dramatically expanded, thanks to the advances of sequencing methods to profile whole genome methylomes. The influx of whole genome methylation data from previously little explored taxa, in turn, has further advanced our understanding of the genomic as well as phylogenetic distribution of DNA methylation. For example, even though DNA methylation has been traditionally viewed as a repressive marker, it has now become clear that the functional consequences of DNA methylation depend on the genomic target of DNA methylation. Specifically, DNA methylation near transcription start sites is associated with transcriptional repression of downstream genes (Jones 2012; Schübeler 2015). Methylation of repetitive elements curbs the activity of these sequences, thereby protecting the genome from the harmful effects of transposition of these elements (Yoder et al. 1997; Schübeler 2015). DNA methylation of gene bodies, on the other hand, is generally associated with active transcription of genes, even though the exact cause and effect relationship of this association is not yet resolved (Jjingo et al. 2012; Jones 2012). Recent whole genome profiling of diverse species has further revealed that DNA methylation is phylogenetically more widespread than previously envisioned (Feng et al. 2010; Zemach et al. 2010). The lack of DNA methylation in some prominent model organisms (i.e., fruit flies and Caenorhabditis elegans) represents lineage-specific loss of DNA methylation (Glastad et al. 2011; Yi 2012; Rosic et al. 2018). With the new wealth of methylome data from closely related species, some lineages, such as hymenopteran insects (bees, wasps, and ants) and nematodes, are emerging as useful model systems to understand the evolution and function of DNA methylation (Lyko et al. 2010; Wang et al. 2013; Greer et al. 2015; Rosic et al. 2018). Notably, the pattern of genomic DNA methylation is highly variable among different animals. In vertebrates (especially in mammals), the majority of genomic CpG dinucleotides are heavily methylated. Exceptions to this are found in clusters of hypomethylated CpGs, referred to as “CpG islands” (Bird et al. 1985; Bird 1992). DNA methylation of CpG islands is associated with regulation of gene expression (Schübeler 2015; Mendizabal et al. 2016). Consequently, CpG islands have been widely used as units of investigation for DNA methylation studies. Furthermore, commercially available DNA methylation chips tend to target CpGs found in CpG islands. In contrast, genomic CpG dinucleotides in invertebrates are typically devoid of DNA methylation (Suzuki and Bird 2008; Feng et al. 2010; Zemach and Zilberman 2010). For example, in hymenopteran genomes, DNA methylation is limited to a subset of CpG dinucleotides, often found within genes (Wang et al. 2013; Beeler et al. 2014; Bewick et al. 2017). Figure 1 is a representative example of the differences between the methylomes of humans and honey bees. An interesting observation is that in honey bees and other hymenopteran species, methylated CpGs tend to localize in clusters, as seen in figure 1 (Huh et al. 2014; Wang et al. 2016). Therefore, whereas the heavily methylated human methylome is punctuated by hypomethylated CpG clusters (CpG islands), the lowly methylated honey bee methylome is punctuated by clusters of hypermethylated CpGs (fig. 1).
. 1.

—Contrasting methylation landscapes found in honey bees and humans. Methylated CpGs are sparse but clustered in honey bees and other hymenopteran insects. These “Methylation Islands” are ∼250 bp in length and stand out in the otherwise lowly methylated insect genomes. In contrast, the human methylation landscape is heavily methylated throughout with breaks of hypomethylated CpG islands that are typically ∼1 kb in length.

—Contrasting methylation landscapes found in honey bees and humans. Methylated CpGs are sparse but clustered in honey bees and other hymenopteran insects. These “Methylation Islands” are ∼250 bp in length and stand out in the otherwise lowly methylated insect genomes. In contrast, the human methylation landscape is heavily methylated throughout with breaks of hypomethylated CpG islands that are typically ∼1 kb in length. As studies using CpG islands as units of epigenetic variation have been highly successful in illuminating the functional implications of epigenetic variation, here we investigated the distribution and functional implications of clusters of methylated CpGs in insect genomes. We refer to them as “methylation islands (MIs).” In this study, we analyzed seven methylomes of hymenopteran insects, which offer well-annotated genomes and high quality methylomes, to define MIs and characterize their genomic distribution, and investigated potential functional consequences using RNA-seq data. Our analyses show that clusters of hypomethylated CpGs, namely MIs, have profound associations with gene sequence conservation and gene expression.

Materials and Methods

Bisulfite-Sequencing and RNA-Seq Data Analysis

Raw bisulfite-sequencing data of selected species were obtained from the SRA, and accession numbers can be found in supplementary file, Supplementary Material online. Reads were subjected to quality as well as adapter trimming using Trim-galore and subsequently aligned and deduplicated to their respective reference genomes using Bismark v0.14.4 (Krueger and Andrews 2011; Martin 2011). Additionally, reads were aligned to the unmethylated lambda phage genome (NCBI reference NC_001416.1) to estimate the bisulfite conversion efficiency for each sample. Alignment summaries and conversion rates can be found in supplementary file S1, Supplementary Material online. RNA-Seq data of Apismellifera, Nasoniavitripennis, and Trichogrammapretiosum were downloaded from the SRA, and accession numbers can be found in supplementary file S1, Supplementary Material online. Transcriptome sequencing reads were preprocessed using FastQC to assess average quality scores (Andrews 2010). We removed potential adaptor sequences using Trimmomatic (Bolger et al. 2014). Tophat2 was used to align transcriptome reads to a reference genome and FeatureCount was used to quantify transcripts (Kim et al. 2013; Liao et al. 2014). We removed lowly expressed genes that had less than five read counts.

Identification of mCGs and MIs

Methylated CpGs (mCGs) were first identified using the Bis-Class algorithm, which takes into account correlation of methylation levels for adjacent CpGs (Huh et al. 2014). Our custom script for identifying methylation islands scans for clustered mCGs using the following steps (supplementary fig. S2, Supplementary Material online): Each scaffold is scanned (5′ -> 3′) in 200-bp windows. Each window is evaluated for its mCG fraction, which is defined as: If the mCG fraction of the window is below the threshold of 0.02, the algorithm moves onto the next downstream mCG and uses it as the starting position for the new 200-bp window and evaluates its mCG fraction (supplementary fig. S2A, Supplementary Material online). This process is repeated until a window’s mCG fraction is greater than or equal to the threshold. A 200-bp window satisfying the mCG fraction threshold is extended by 50 bp and re-evaluated for its mCG fraction. This extension continues until the mCG fraction of the extended window falls below the threshold, after which the last mCG in the previously evaluated window is chosen as the ending position of the methylation island (supplementary fig. S2B, Supplementary Material online). Therefore, the starting and ending positions of methylation islands are always mCGs. Once a methylation island has been identified, the algorithm begins evaluating 200-bp windows again starting at the next downstream mCG. Steps 2 and 3 are repeated until the last mCG on the scaffold is reached (supplementary fig. S2C, Supplementary Material online).

Computing the Conservation Score

Orthologous gene sets were generated using ProteinOrtho (ver. 5.1.6) with the default setting (Lechner et al. 2011). The orthologous gene sets including protein sequences from all species were further processed to calculate conservation scores. Clustal-Omega (ver. 1.2.4) was used for sequence alignment (Sievers et al. 2014). The conservation score of each amino acid position was quantified based on the Jensen–Shannon (JS) divergence, which is a robust method for identifying protein sequence conservation (Lin 1991; Capra and Singh 2007). Conservation scores were analyzed with a linear mixed model using amino acid location (outside-MI or inside-MI) and the existence of corresponding DNA methylation sites (unmethylated CpGs or methylated CpGs) as the main factors and the interaction and random factors of gene and species information. To ensure adequate representation, we only analyzed genes that had at least five amino acids with mCpGs and non-mCpGs and five amino acids inside and outside of MIs.

Results

Identification of Methylation Islands in Invertebrate Genomes

We selected seven Hymenopteran insects (A.mellifera, Camponotus floridanus, Harpegnathos saltator, N.vitripennis, Polistes canadensis, Solenopsis invicta, and T.pretiosum) with well-annotated genome information and available deep-coverage whole-genome bisulfite sequencing (BS-seq) data for this study (table 1). The average fractional methylation of methylated CpGs (mCGs) varies among species ranging from 0.44 to 0.74 (table 1). The global average fractional methylation of all CpG sites was ranged from 0.008 to 0.025 in all seven species (table 1). Previously it was shown that methylated CpGs in some species tend to occur in clusters (Wang et al. 2013; Huh et al. 2014). Indeed, mCGs in our data set exhibited clustering (supplementary fig. S1, Supplementary Material online). Specifically, the distances between two adjacent mCGs were significantly shorter than the distance between two randomly selected CGs from the genome, for all species considered (supplementary table S3, Supplementary Material online).
Table 1

Genome Composition of the Species Used in This Study Along with Methylation Statistics

SpeciesGenome Size (Mb)# Protein-Coding Genes# of mCGs (% of all CGs)Avg. Fractional Methylation of mCGs
Apis mellifera234.0715,31478,846 (0.78%)0.584
Camponotus floridanus232.6811,04285,746 (0.84%)0.635
Harpegnathos saltator294.4611,838112,212 (0.53%)0.662
Nasonia vitripennis295.7813,354114,261 (0.85%)0.737
Polistes canadensis211.219,87615,744 (0.24%)0.386
Solenopsis invicta396.0214,451157,829 (0.98%)0.526
Trichogramma pretiosum196.2213,20060,298 (0.60%)0.345
Genome Composition of the Species Used in This Study Along with Methylation Statistics To capture the clusters of mCGs, henceforth referred to as methylation islands (MIs), we utilized a sliding window approach to identify regions of high mCG density and employ them as units of measurement to explore DNA methylation (detailed description of MI definition and search can be found in supplementary file S1, Supplementary Material online). Specifically, our algorithm identified MIs as regions harboring >2% of mCGs (∼3-fold enrichment compared with the genome average, table 1) in windows longer than 200 bp (Materials and Methods).

Characteristics of MIs

Our approach identified thousands of MIs in the seven species. As expected from the pattern of clustering, a large portion of mCGs was located within the MIs, even though the total lengths of MIs were only a minute fraction of the genome size (table 2). The average length and number of the identified MIs were positively correlated with the total number of mCGs (Pearson correlation coefficients = 0.97) rather than the size of the genome (tables 1 and 2). For example, P. canadensis had the lowest number of MIs (n = 1,342) even though its genome size is 20 Mb larger than that of T. pretiosum (the number of MIs in T. pretiosum is 4,889). Similarly, the average MI length was the longest in S. invicta, the species with the most mCGs, and the shortest in P. canadensis, the species with the least mCGs.
Table 2

Summary Statistics of MIs Detected in Each Species

Apis melliferaCamponotus floridanusHarpegnathos saltatorNasonia vitripennisPolistes canadensisSolenopsis invictaTrichogramma pretiosum
# of predicted MIs5,1266,3278,3759,6441,34210,5744,889
# of mCGs in MIs (% of total mCGs)29,254 (37.1%)47,804 (55.8%)78,490 (69.9%)85,007 (74.4%)8,293 (52.7%)112,819 (71.5%)30,141 (50%)
Total MI length (bp) (% of genome)1,043,247 (0.45%)1,803,969 (0.77%)2,969,693 (1.01%)3,355,006 (1.13%)210,235 (0.099%)4,291,930 (1.08%)1,136,846 (0.58%)
Avg. MI length (bp)213.15286.12355.59348.88157.66406.89233.53
Avg. mCG density per MI (# of mCGs/MI length)0.030.020.030.020.070.030.03
# of MIs overlapping with genesa (% of all MIs)4,958 (96.7%)6,082 (96.1%)7,845 (93.7%)9,079 (94.1%)1,020 (76%)9,843 (93.1%)4,603 (94.2%)
# of MIs overlapping exclusively with genesa (% of all MIs)4,788 (93.4%)5,961 (94.2%)7,606 (90.8%)8,873 (92.0%)1,010 (75.3%)9,477 (89.6%)4,469 (91.4%)
# of MIs overlapping with exons/exclusively with exons (% of all MIs)4,830/3,117 (94.2%/60.8%)5,763/2,634 (91.1%/41.6%)7,319/2,704 (87.4%/32.3%)8,184/3,381 (84.9%/35.1%)741/524 (55.2%/39.0%)8,839/3,412 (83.6%/32.3%)4,433/2,926 (90.7%/59.8%)
# of MIs overlapping with introns/exclusively with introns (% of all MIs)1,794/178 (35.0%/3.5%)3,404/382 (53.8%/6.0%)5,011/592 (59.8%/7.1%)5,739/1,206 (59.5%/12.5%)478/273 (35.6%/20.3%)6,300/1,160 (59.6%/11.0%)1,881/242 (38.5%/4.9%)
# of MIs overlapping with exon–intron boundaries/only one exon–intron boundary (% of all MIs)1,637/705 (31.9%/13.8%)3,051/1,312 (48.2%/20.7%)4,461/1,690 (53.3%/20.2%)4,672/1,635 (48.4%/17.0%)205/92 (15.3%/6.9%)5,252/2,123 (49.7%/20.1%)1,649/611 (33.7%/12.5%)
# of MIs overlapping with promoters (% of all MIs)172 (3.4%)117 (1.8%)146 (1.7%)199 (2.1%)30 (2.2%)308 (2.9%)213 (4.4%)

Defined as the region spanning the transcript start site to the transcription termination site.

Summary Statistics of MIs Detected in Each Species Defined as the region spanning the transcript start site to the transcription termination site. In the honey bee, clusters of DNA methylation predominantly overlap with gene bodies (96.7%, defined as the region between the transcription start and transcription termination sites), particularly exons (94.2%; table 2). Of all the MIs found in the honey bee, 60.8% were exclusively residing (100% overlap) within exons (supplementary fig. S4, Supplementary Material online). MIs are also found in introns, though less frequently. In fact, only 3.5% of honey bee MIs were found exclusively in introns. Interestingly, 31.9% of the honey bee MIs extended across an exon–intron boundary. Previous studies have speculated on the role of DNA methylation at exon–intron boundaries in signaling splice junctions and/or playing a role in alternative splicing (Lyko et al. 2010; Herb et al. 2012; Li-Byarlay et al. 2013; Galbraith et al. 2015). Consequently, we asked whether MIs were preferentially located at exon–intron boundaries. To answer this question, we randomly permuted MIs onto concatenated genes, and examined how often MIs were found in exon–intron boundaries (detailed methodology can be found in supplementary file, Supplementary Material online). We found significant (empirically determined P value <0.001) enrichment of MIs at exon–intron boundaries for all seven species (supplementary fig. S5, Supplementary Material online). It was previously shown that mCGs in some hymenopteran insects, including honey bees (A. mellifera), tended to occur near the 5′ end of a gene (Lyko et al. 2010; Hunt et al. 2013a; Wang et al. 2013; Galbraith et al. 2015; Lindsey et al. in press). Interestingly, MIs from A. mellifera as well as T. pretiosum were found slightly biased to the 3′ end of a gene (fig. 2). On the other hand, MIs in C. floridanus, H. saltator, P. canadensis, and N. vitripennis were 5′ biased (fig. 2).
. 2.

—Characterization of MIs across genic regions in seven hymenopteran species. (A) Box plots displaying the coverage of MIs across different types of genic regions (gene body, introns, and exons). (B) Violin plots comparing the relative position of MIs with regards to the TSS for genes with MIs.

—Characterization of MIs across genic regions in seven hymenopteran species. (A) Box plots displaying the coverage of MIs across different types of genic regions (gene body, introns, and exons). (B) Violin plots comparing the relative position of MIs with regards to the TSS for genes with MIs.

MIs Are Enriched in Evolutionarily Conserved Genes and the Amino Acids inside of MIs Are More Conserved than the Amino Acids outside of MIs

Previous studies often classified genes as methylated or unmethylated, typically based on the mean fractional methylation (Lyko et al. 2010; Sarda et al. 2012; Wang et al. 2013; Glastad et al. 2016). These studies have shown that methylated genes are more evolutionarily conserved than unmethylated genes (Lyko et al. 2010; Wang et al. 2013; Galbraith et al. 2015; Glastad et al. 2016). Here, we further examined evolutionary conservation of genes based on the presence or absence of MIs. We first processed protein sequences of all species to identify orthologous gene sets (Materials and Methods). This yielded a total of 12,249 gene sets out of which 5,403 (44%) were single copy orthologs found in all seven species and thus termed as the Complete Orthologous (CO) gene set. Of the remaining gene sets, 6,429 (52%) were found in two or more species and classified as the Incomplete Orthologous (IO) gene set. Genes unique to each species (n = 21,523) were termed as the Unique Gene (UG) set (fig. 3).
. 3.

—MIs are enriched in evolutionarily conserved genes. (A) Bar plots illustrating the number of genes in each gene type (all genes [AG], complete orthologous genes [CO], incomplete orthologous genes [IO], and unique genes in each species [UG]). (B) Bar plots indicating the proportion of genes having different types of methylation features.

—MIs are enriched in evolutionarily conserved genes. (A) Bar plots illustrating the number of genes in each gene type (all genes [AG], complete orthologous genes [CO], incomplete orthologous genes [IO], and unique genes in each species [UG]). (B) Bar plots indicating the proportion of genes having different types of methylation features. We then examined the frequencies of 1) genes with MI, 2) genes without MI but with at least one mCG, and 3) genes without both MI and mCG in each type of gene set. The frequency of genes with MIs is higher in the CO compared with those in the IO and UG while the frequency of genes without MI but with mCGs is similar between CO and IO (fig. 3). We tested if genes with MIs were overrepresented in CO compared with IO using Fisher’s exact test, which yielded an average odds ratio of 2.84. In comparison, the same test using the number of genes without MI but with mCGs yields an average odds ratio of 1.31 (supplementary table S1, Supplementary Material online). These two odds ratios are statistically significantly different, indicating that clusters of mCGs (which by definition are MIs), more so than individual mCGs, tend to be highly enriched in conserved gene sets (table 3 and supplementary table S1, Supplementary Material online).
Table 3

Statistical Significance of Differences between Odds Ratios (OR) of Genes with and without MI Using Z Approximation

SpeciesOR of Genes w/MIaOR of Genes w/o MI but w/mCGbDifference of Log. OR (δ)SE(δ)P Value
Apis mellifera2.311.650.340.073.8E-07
Camponotus floridanus2.791.330.740.072.2E-16
Harpegnathos saltator3.230.931.250.082.2E-16
Nasonia vitripennis3.291.141.060.092.2E-16
Polistes Canadensis1.441.230.160.105.7E-02
Solenopsis invicta3.841.191.170.072.2E-16
Trichogramma pretiosum2.971.740.530.081.2E-11

Note.—Odds ratios were calculated and summarized in supplementary table S1, Supplementary Material online.

Odds ratio of the number of genes with MIs and the number of the remaining genes between CO and IO types, respectively, were tested using Fisher’s exact test.

Odds ratio of the number of genes without MIs but with mCGs and the number of the remaining genes between CO and IO types, respectively, were tested using Fisher’s exact test.

Statistical Significance of Differences between Odds Ratios (OR) of Genes with and without MI Using Z Approximation Note.—Odds ratios were calculated and summarized in supplementary table S1, Supplementary Material online. Odds ratio of the number of genes with MIs and the number of the remaining genes between CO and IO types, respectively, were tested using Fisher’s exact test. Odds ratio of the number of genes without MIs but with mCGs and the number of the remaining genes between CO and IO types, respectively, were tested using Fisher’s exact test. We further examined whether the existence of DNA methylation and/or MI have effects on the conservation of specific amino acids. To do so, we mapped genomic coordinates of mCGs occurring in the coding regions of the corresponding positions in the protein sequence and quantified the conservation scores using the Jensen–Shannon (JS) divergence of protein sequence conservation (Lin 1991; Capra and Singh 2007). A linear mixed model was fitted to predict the conservation scores of amino acids based on the existence of mCG sites in the corresponding DNA coding sequence and the location of amino acids positioned either inside or outside of MIs (Materials and Methods). We found that amino acids harboring mCGs had higher conservation scores compared with amino acids without mCGs (fig. 4). Specifically, amino acids positioned inside MIs showed higher conservation scores compared with amino acids positioned outside of MIs (P value <2.2×10−16). Strikingly, amino acids that did not have mCG sites but were located inside MIs had similar or even higher conservation scores than amino acids in MIs with mCG sites (fig. 4; see also supplementary fig. S6, Supplementary Material online). Even though the exact relationship between conservation scores and the location in regard to MIs was variable in different hymenopteran species (supplementary fig. S6, Supplementary Material online), sites within MIs consistently exhibited higher conservation than those outside of MIs. These results demonstrate that protein sequence conservation has a stronger association with methylation islands than individually methylated CG sites.
. 4.

—Effect of MI and DNA methylation on amino acid conservation. Linear mixed models were fitted to estimate the conservation score of amino acid sites using amino acid location (outside-MI or inside-MI) and the existence of corresponding DNA methylation sites (nonmethylated CpGs or methylated CpGs) as the main factors with the interaction and random factors being gene and species information, respectively. The conservation score of each amino acid position is quantified based on the Jensen–Shannon (JS) divergence.

—Effect of MI and DNA methylation on amino acid conservation. Linear mixed models were fitted to estimate the conservation score of amino acid sites using amino acid location (outside-MI or inside-MI) and the existence of corresponding DNA methylation sites (nonmethylated CpGs or methylated CpGs) as the main factors with the interaction and random factors being gene and species information, respectively. The conservation score of each amino acid position is quantified based on the Jensen–Shannon (JS) divergence.

Gene Expression Is Influenced by the Presence of MIs

Previous studies demonstrated that gene body methylation is prevalent in evolutionarily conserved genes and correlates with constitutive and high gene expression (Elango et al. 2009; Lyko et al. 2010; Wang et al. 2013; Galbraith et al. 2015; Glastad et al. 2016). Here, we investigated expression patterns with respect to the presence of MIs. We compared normalized gene expression levels between MI- and non-MI-genes for three of the species that we had RNA-seq data for (fig. 5), and observed that expression levels of MI-genes were higher than that of non-MI genes in all cases. In addition, highly conserved genes (i.e., CO) tend to have higher expression levels than lowly conserved genes across all species (i.e., IO and UG), indicating that gene expression increases according to the degree of conservation. These results align with other studies that have shown gene body methylation to be associated with sequence conservation and robust expression (Sarda et al. 2012; Huh et al. 2013; Hunt et al. 2013b). Interestingly, while expression levels of non-MI genes clearly decreased as conservation level decreased, expression levels of MI genes remained consistent regardless of gene sequence conservation (fig. 5).
. 5.

—Gene expression levels between MI- and non-MI-genes in each gene type. The y-axis is log2-transformed gene expression level (normalized by gene length). The x-axis represents gene types of all genes (AG), complete orthologous genes (CO), incomplete orthologous genes (IO), and unique genes in each species (UG).

—Gene expression levels between MI- and non-MI-genes in each gene type. The y-axis is log2-transformed gene expression level (normalized by gene length). The x-axis represents gene types of all genes (AG), complete orthologous genes (CO), incomplete orthologous genes (IO), and unique genes in each species (UG). We sought to characterize expression change in relation to gain or loss of MIs in conserved genes. Since gene expression varies extensively among species and conditions, a direct comparison between species is difficult. To overcome this limitation, we tested how a change in the MI state of CO genes associated with the overall correlation of gene expression between species. We first assigned each gene pair a binary classification for its MI state. A gene pair is considered to have the “same MI state” if it lacks an MI in both species or it possesses at least one MI in both. Conversely, a gene is labeled as having a “different MI state” if only one species in the pair has an MI. There are greater number of “same MI state” genes than “different MI state” genes in the orthologous gene pairs, consistent with the previous observation that conserved genes tend to share similar MI states (table 4). We then conducted pairwise comparisons of gene expression for each species pair. We observed a significant difference in Spearman’s rank correlation coefficients for all three pairwise comparisons between genes with “same MI state” and “different MI state.” Specifically, “same MI state” genes exhibited stronger correlations, indicating that the presence of MIs in conserved genes is associated with stable and constitutive transcriptional activity in insects (table 4).
Table 4

Statistical Significance between Pairwise Correlation Coefficients of “Same State MI” and “Different State MI” Genes

Same State MIs
Different State MIs
Spearman’s ρNumber of GenesSpearman’s ρNumber of GenesP value
Apis melliferaNasonia vitripennis0.6073,5900.5571,7689.30E-03
Apis melliferaTrichogramma pretiosum0.3743,5870.3011,7794.50E-03
Nasonia vitripennisTrichogramma pretiosum0.4683,9270.3511,4312.20E-16

Note.—The correlation coefficients were estimated between two species’ gene expression level using Spearman’s rho correlations.

Statistical Significance between Pairwise Correlation Coefficients of “Same State MI” and “Different State MI” Genes Note.—The correlation coefficients were estimated between two species’ gene expression level using Spearman’s rho correlations. To further test whether the existence of MIs affected gene expression levels, we compared relative expression levels of exons located in MIs (MI-exon) and that of exons located outside of MIs (non-MI-exon) within the same gene. The median expression level of MI-exons was generally higher than that of non-MI-exons (fig. 6). This was particularly evident for the CO and IO gene groups where the locally weighted smoothing line was >0 for all three species, suggesting expression bias inclined toward MI-exons. Overall, we consistently observed expression bias toward higher expression of MI-exons regardless of gene type and species, indicating a robust relationship between MI-exons and increased gene expression.
. 6.

—Comparison of average expression levels between exons located in MIs (MI-exon) and exons located outside of MIs (non-MI-exon) within the same gene. For complete orthologous genes (CO), incomplete orthologous genes (IO), and unique genes in each species (UG), we calculated the expression fold change (log2 transformed) between MI-exons and non-MI-exons within the same gene, where each dot represents one gene. A locally weighted smoothing line is included to map the general direction bias of relative expression change; when the line is >0 it indicates higher expression in MI-exons compared with non-MI-exons and vice versa for when the line dips <0. We repeated this analysis for three species, (A) Apis mellifera, (I) Nasonia vitripennis, and (C) Trichogramma pretiousum.

—Comparison of average expression levels between exons located in MIs (MI-exon) and exons located outside of MIs (non-MI-exon) within the same gene. For complete orthologous genes (CO), incomplete orthologous genes (IO), and unique genes in each species (UG), we calculated the expression fold change (log2 transformed) between MI-exons and non-MI-exons within the same gene, where each dot represents one gene. A locally weighted smoothing line is included to map the general direction bias of relative expression change; when the line is >0 it indicates higher expression in MI-exons compared with non-MI-exons and vice versa for when the line dips <0. We repeated this analysis for three species, (A) Apis mellifera, (I) Nasonia vitripennis, and (C) Trichogramma pretiousum.

DNMT3 Knockdown Further Highlights Significance of MIs on Alternative Splicing

We performed additional analyses to further explore roles of MI using data from a previous knockdown experiment of DNMT3, the enzyme responsible for de novo methylation, in A. mellifera (Li-Byarlay et al. 2013). There was a modest drop in both mCGs and MIs in the knockdown sample (table 5), which is consistent with the role of DNMT3 in DNA methylation. In total, 89.8% of mCGs remained consistent between control and knockdown bees, which was similarly reflected in the MI measurements where 83.2% of MIs remained the same (table 5). In the 205 genes that lost MIs in the knockdown sample, we found no significant expression differences between control and knockdown genes (supplementary fig. S7, Supplementary Material online). We further examined genes that were similarly methylated in both conditions but lost MIs in the knockdown bee. Gene ontology analysis revealed functions related to nucleotide binding (P value = 0.017) and methyltransferase activity (P value = 0.032), though none of the terms were statistically significant following adjustment for the false discovery rate (supplementary table S4, Supplementary Material online).
Table 5

Summary Statistics of DNA Methylation and MIs in Control and dnmt3 Gene Knockdown Honey Bees

 Controldnmt3 Gene Knockdown
# total mCG sites78,84675,897
# genes with mCG sites6,3086,277
Avg. # of mCGs per gene12.311.9
# MIs (MI genes)5,126 (3,280)4,946 (3,207)
# MIs only present in group (MI genes)501 (222)372 (147)
# MIs at exon–intron boundary only present in group11638
Summary Statistics of DNA Methylation and MIs in Control and dnmt3 Gene Knockdown Honey Bees Interestingly, among the 501 MIs lost in the knockdown bees, 116 (23.1%) were on exon–intron boundaries. The observation that MIs in exon–intron boundaries tend to be excluded from those that were lost in the knockdown (P < 0.05, Fisher’s exact test) is consistent with the importance of DNA methylation, and MIs, residing at splice sites (Li-Byarlay et al. 2013). In the 372 MIs that were gained in the knockdown, those residing in exon–intron boundaries were significantly underrepresented (P value <0.0001, Fisher’s exact test), which might indicate that regulation of splicing is overall impeded in knockdown bees (Li-Byarlay et al. 2013).

Discussion

One of the most remarkable classical findings of DNA methylation studies in mammals is that unmethylated CpGs tend to occur in clusters, or “CpG islands (CGIs)” (Bird et al. 1985; Bird 1992; Suzuki and Bird 2008). They have been used as central “markers” to study epigenetic variation for several decades (Suzuki and Bird 2008; Illingworth and Bird 2009; Yi 2017). As methylome data from nonmodel species including many invertebrates begin to accumulate, the intriguing contrast between methylomes of mammals and invertebrates (fig. 1) becomes clearer, motivating us to ask several questions: given that methylated CpGs are marked exceptions to the generally unmethylated invertebrate genomes, do methylated cytosines occur in clusters in these species, and if so, do they carry specific functional consequences? As the first step to answer these questions, we used relatively well-characterized genomes and methylomes of seven hymenopteran insects in this study. Previous analyses of hymenopteran methylomes often defined methylated and unmethylated genes based on whether they harbor or lack methylated cytosines, or used the average methylation level of a gene as a summary statistic for comparisons (Bonasio et al. 2012; Smith et al. 2012; Wang et al. 2013). Even though these prior studies were successful in illuminating novel aspects of invertebrate DNA methylation, because the proportion of methylated sites in each gene is typically small, taking averages could dilute the true signal of DNA methylation (Lyko et al. 2010; Bonasio et al. 2012; Smith et al. 2012; Wang et al. 2013). In addition, some studies indicated that methylated CpGs in these species occur in clusters (Wang et al. 2013; Huh et al. 2014), which we show is true in the seven species analyzed here. We used a sliding window approach that is conceptually similar to the algorithms used to identify CpG islands in mammalian genomes. We reasoned that if MIs represented functional units, they might occur in similar numbers across closely related species as in the case of mammals (Illingworth et al. 2010). This idea led to identifying MIs as regions exhibiting >3-fold enrichment of methylated CpGs compared with the rest of the genome. Interestingly, mammalian CGI algorithms typically use 3-fold enrichment of unmethylated CpGs as one of the criteria (Gardiner-Garden and Frommer 1987; Takai and Jones 2002). This similarity is another interesting parallel between the methylomes of mammals and hymenopteran insects. Nevertheless, it is known that the criteria to define CGIs require some adjustments when used in nonhuman species, to account for species-specific nucleotide compositions (Matsuo et al. 1993; Aerts et al. 2004). Similarly, the criteria we used here likely will require finer adjustments according to specific genomes that will be targets of the study. An important discovery regarding CGIs is that genes harboring CGIs in their promoters are more highly and more stably expressed compared with genes that lack CGIs in the promoters (Antequera 2003; Saxonov et al. 2006; Elango and Yi 2008) Moreover, this trend was consistently observed in diverse vertebrates (Elango and Yi 2008). We show that MIs in hymenopteran insects also have deep implications for gene expression. First, MIs are significantly overrepresented in exon–intron boundaries, which is consistent with their presumed role in regulation of intron splicing or alternative splicing (Flores et al. 2012; Herb et al. 2012; Li-Byarlay et al. 2013; Maunakea et al. 2013; Galbraith et al. 2015). This discovery has potential uses in aiding annotation of previously unannotated genes, particularly exonic regions. Interestingly, when DNMT3 was knocked down (Li-Byarlay et al. 2013), MIs occurring on exon–intron boundaries tended to be maintained in a higher frequency than expected. Second, genes harboring MIs exhibit higher and more stable levels of gene expression compared with those without MIs, a pattern that was also evident at the exon level and may be indicative of their inclusion in alternative transcripts. We also investigated whether the gain or loss of MIs could be connected to these expression traits, thus getting us closer to understanding the cause-and-effect relationship between MIs and expression. Regrettably, currently available data sets are limited to moderately diverged species trios (A. mellifera, N. vitripennis, and T. pretiosum), prohibiting accurate identification of orthology of individual MIs. Nevertheless, we could clearly demonstrate that expression levels across species were more strongly correlated when MIs were maintained in coding sequences. It is notable that the effect of MIs on expression is in the same direction as the effect of CGIs in case of mammals (promoting higher and more stable gene expression). These findings suggest that characterizing DNA methylation in insects beyond singular CpGs or broader regions could offer additional insights for understanding the functional role of DNA methylation. Click here for additional data file.
  57 in total

1.  Conservation and divergence of methylation patterning in plants and animals.

Authors:  Suhua Feng; Shawn J Cokus; Xiaoyu Zhang; Pao-Yang Chen; Magnolia Bostick; Mary G Goll; Jonathan Hetzel; Jayati Jain; Steven H Strauss; Marnie E Halpern; Chinweike Ukomadu; Kirsten C Sadler; Sriharsa Pradhan; Matteo Pellegrini; Steven E Jacobsen
Journal:  Proc Natl Acad Sci U S A       Date:  2010-04-15       Impact factor: 11.205

Review 2.  DNA methylation and genomic imprinting.

Authors:  A Razin; H Cedar
Journal:  Cell       Date:  1994-05-20       Impact factor: 41.582

3.  RNA interference knockdown of DNA methyl-transferase 3 affects gene alternative splicing in the honey bee.

Authors:  Hongmei Li-Byarlay; Yang Li; Hume Stroud; Suhua Feng; Thomas C Newman; Megan Kaneda; Kirk K Hou; Kim C Worley; Christine G Elsik; Samuel A Wickline; Steven E Jacobsen; Jian Ma; Gene E Robinson
Journal:  Proc Natl Acad Sci U S A       Date:  2013-07-12       Impact factor: 11.205

Review 4.  Cytosine methylation and the ecology of intragenomic parasites.

Authors:  J A Yoder; C P Walsh; T H Bestor
Journal:  Trends Genet       Date:  1997-08       Impact factor: 11.639

5.  Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

Authors:  Fabian Sievers; Andreas Wilm; David Dineen; Toby J Gibson; Kevin Karplus; Weizhong Li; Rodrigo Lopez; Hamish McWilliam; Michael Remmert; Johannes Söding; Julie D Thompson; Desmond G Higgins
Journal:  Mol Syst Biol       Date:  2011-10-11       Impact factor: 11.429

6.  Implications of CpG islands on chromosomal architectures and modes of global gene regulation.

Authors:  Samuel Beck; Catherine Rhee; Jawon Song; Bum-Kyu Lee; Lucy LeBlanc; Laurie Cannon; Jonghwan Kim
Journal:  Nucleic Acids Res       Date:  2018-05-18       Impact factor: 16.971

7.  On the presence and role of human gene-body DNA methylation.

Authors:  Daudi Jjingo; Andrew B Conley; Soojin V Yi; Victoria V Lunyak; I King Jordan
Journal:  Oncotarget       Date:  2012-04

8.  Function and evolution of DNA methylation in Nasonia vitripennis.

Authors:  Xu Wang; David Wheeler; Amanda Avery; Alfredo Rago; Jeong-Hyeon Choi; John K Colbourne; Andrew G Clark; John H Werren
Journal:  PLoS Genet       Date:  2013-10-10       Impact factor: 5.917

9.  TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.

Authors:  Daehwan Kim; Geo Pertea; Cole Trapnell; Harold Pimentel; Ryan Kelley; Steven L Salzberg
Journal:  Genome Biol       Date:  2013-04-25       Impact factor: 13.583

10.  Comparative genomics of the miniature wasp and pest control agent Trichogramma pretiosum.

Authors:  Amelia R I Lindsey; Yogeshwar D Kelkar; Xin Wu; Dan Sun; Ellen O Martinson; Zhichao Yan; Paul F Rugman-Jones; Daniel S T Hughes; Shwetha C Murali; Jiaxin Qu; Shannon Dugan; Sandra L Lee; Hsu Chao; Huyen Dinh; Yi Han; Harsha Vardhan Doddapaneni; Kim C Worley; Donna M Muzny; Gongyin Ye; Richard A Gibbs; Stephen Richards; Soojin V Yi; Richard Stouthamer; John H Werren
Journal:  BMC Biol       Date:  2018-05-18       Impact factor: 7.431

View more
  4 in total

Review 1.  The Tsetse Metabolic Gambit: Living on Blood by Relying on Symbionts Demands Synchronization.

Authors:  Mason H Lee; Miguel Medina Munoz; Rita V M Rio
Journal:  Front Microbiol       Date:  2022-06-09       Impact factor: 6.064

Review 2.  The impact of epigenetic information on genome evolution.

Authors:  Soojin V Yi; Michael A D Goodisman
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2021-04-19       Impact factor: 6.671

Review 3.  Insects Provide Unique Systems to Investigate How Early-Life Experience Alters the Brain and Behavior.

Authors:  Rebecca R Westwick; Clare C Rittschof
Journal:  Front Behav Neurosci       Date:  2021-04-21       Impact factor: 3.558

4.  Self-organization of plasticity and specialization in a primitively social insect.

Authors:  Solenn Patalano; Adolfo Alsina; Carlos Gregorio-Rodríguez; Martin Bachman; Stephanie Dreier; Irene Hernando-Herraez; Paulin Nana; Shankar Balasubramanian; Seirian Sumner; Wolf Reik; Steffen Rulands
Journal:  Cell Syst       Date:  2022-08-30       Impact factor: 11.091

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.