Muhammad Ammar Raza1, Ningning Yu1, Dan Wang1, Liwen Cao1, Susheng Gan1,2, Liping Chen1. 1. Department of Horticulture, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou 310058, P. R. China. 2. Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, USA.
Abstract
Wide hybridization is a common and efficient breeding strategy for enhancing crop yield and quality. An interesting phenomenon is that the reciprocal hybrids usually show different phenotypes, and its underlying mechanism is not well understood. Here, we reported our comparative analysis of the DNA methylation patterns in Solanum lycopersicum, Solanum pimpinellifolium and their reciprocal hybrids by methylated DNA immunoprecipitation sequencing. The reciprocal hybrids had lower levels of DNA methylation in CpG islands and LTR retroelements when compared with those of their parents. Importantly, remarkable differences in DNA methylation patterns, mainly in introns and CDS regions, were revealed between the reciprocal hybrids. These different methylated regions were mapped to 79 genes, 14 of which were selected for analysis of gene expression levels. While there was an inverse correlation between DNA methylation and gene expression in promoter regions, the relationship was complicated in gene body regions. Further association analysis revealed that there were 15 differentially methylated genes associated with siRNAs, and that the methylation levels of these genes were inversely correlated with respective siRNAs. All these data raised the possibility that the direction of hybridization induced the divergent epigenomes leading to changes in the transcription levels of reciprocal hybrids.
Wide hybridization is a common and efficient breeding strategy for enhancing crop yield and quality. An interesting phenomenon is that the reciprocal hybrids usually show different phenotypes, and its underlying mechanism is not well understood. Here, we reported our comparative analysis of the DNA methylation patterns in Solanum lycopersicum, Solanum pimpinellifolium and their reciprocal hybrids by methylated DNA immunoprecipitation sequencing. The reciprocal hybrids had lower levels of DNA methylation in CpG islands and LTR retroelements when compared with those of their parents. Importantly, remarkable differences in DNA methylation patterns, mainly in introns and CDS regions, were revealed between the reciprocal hybrids. These different methylated regions were mapped to 79 genes, 14 of which were selected for analysis of gene expression levels. While there was an inverse correlation between DNA methylation and gene expression in promoter regions, the relationship was complicated in gene body regions. Further association analysis revealed that there were 15 differentially methylated genes associated with siRNAs, and that the methylation levels of these genes were inversely correlated with respective siRNAs. All these data raised the possibility that the direction of hybridization induced the divergent epigenomes leading to changes in the transcription levels of reciprocal hybrids.
Wide hybridization or distant crossing refers to the inter-specific or above-specific hybridization. It is a useful breeding strategy that has been used for crop improvement by transferring many desired traits from wild species to crops. The hybridization involves the selection of maternal and paternal parents, and often one parent can be maternal or paternal, and the other paternal or maternal, respectively, which is referred to as reciprocal crossings. It is well known that the reciprocal hybrids share identical nuclei but have different maternal cytoplasms. Therefore, the major difference between the reciprocal hybrids is the source of their cytoplasms.Reciprocal hybrids often show different phenotypes. A good example is crosses between Arabidopsis thaliana and A. arenosa. When A. thaliana was used as the maternal parent and A. arenosa as the paternal parent, many viable seeds were produced. In contrast, the reciprocal cross did not yield any viable seeds. Several genetic mechanisms have been proposed to explain the differences between the reciprocal hybrids. For example, maternal effects model is defined as the causal influence of the maternal genotype or phenotype on the offspring phenotype, while the parent-origin effects model hypothesizes that female and male gametes contribute different sets of active alleles or else different complements of gene products to the seed. Cytoplasmic inheritance., and cytoplasmic-nuclear interaction models mainly elucidate the interactions between chloroplast and mitochondrial genomes and the exogenous nuclear genome on the influence of cytoplasmic inheritance, respectively. Gene imprinting, the dominance model, over-dominant effects and epistasis hypothesis have also been proposed to explain the phenomenon. However, the underlying biological mechanisms accounting for the differences between reciprocal hybrids remain to be deciphered. Because the main differences between the reciprocal hybrids are in their cytoplasms, it is reasonable to assume that the most essential question is how the cytoplasm influences gene expression. Recent studies have pointed out that gene expression can be regulated by epigenetic modification, especially DNA methylation. Therefore, whether different cytoplasms can lead to the changes in methylation profile, and then affect gene expression levels is the key point to explain the phenotypic differences in reciprocal hybrids.Cytoplasm is where many biochemical reactions take place, including the formation of mature small RNAs (sRNAs) such as miRNAs and siRNAs that are often involved in regulation of gene expression., miRNAs are a class of small non-coding RNA molecules that regulate eukaryotic gene expression by cleaving their respective target genes at the posttranscriptional level. They specifically bind to mRNAs based on sequence pairing, leading to the degradation of their target mRNAs., In reciprocal crossings between A. thaliana ecotypes Ler and C24, researchers observed increased levels of 21-nt miRNAs in both hybrids compared with those in their parents. siRNAs are a class of double stranded RNA molecules, 20–25 bp in length, and may direct chromatin modification through RNA-directed DNA methylation (RdDM) and transposon silencing. Changes in 24-nt siRNA levels in two A. thaliana ecotypes showed correlated with changes in DNA methylation, creating allelic variants that might contribute to the phenotypic heterosis. An increase in genome-wide DNA methylation in A. thaliana reciprocal hybrids, possibly due to RdDM pathway, might alter circadian rhythms and gene expression in F1 hybrids. The maternal siRNAs as regulators of parental genome imbalance and gene expression could be transmitted to hybrids and regulate the seed size in reciprocal crosses. It was also reported that there were different profiles of 24-nt sRNAs in reciprocal hybrids, which might lead to the diversity of phenotypes. However, such questions as what the differences of DNA methylation profiles between two reciprocal hybrids are, whether the differences are induced by cytoplasmic siRNAs, and whether or how they regulate gene expression and subsequent phenotypes remain elusive.In our previous studies, reciprocal crossings between cultivated species Solanum lycopersicum cv. Micro-Tom (for simplicity hereafter M was used for Micro-Tom) and wild type S. Pimpinellifolium line WVa700 (W was similarly used for WVa700) were performed, and the resulting reciprocal hybrids (M × W and W × M were similarly used for Micro-Tom_WVa700 and WVa700_Micro-Tom, respectively) displayed different phenotypes in terms of plant height (43.79 cm for M × W and 41.89cm for W × M), fruit shape index (0.95 for M × W and 0.92 for W × M) and single fruit weight (3.54g for M × W and 4.40g for W × M). To unveil the underlying possible epigenetic mechanisms, we profiled, using methylated DNA immunoprecipitation sequencing (MeDIP-seq) method, and compared the DNA methylation patterns of the entire genomes of M, W and their reciprocal hybrids (M × W and W × M). We found that both reciprocal hybrids had lower methylation levels in CpG islands (CGIs) and major types of retroelements than their parents, suggesting that the crossing process per se may reduce the methylation levels of reciprocal hybrids in these regions. Importantly, the two reciprocal hybrids showed different DNA methylation patterns across their entire genomes, with major differences in gene body regions. We found 79 different methylated genes (DMGs) between the reciprocal hybrids, and 14 of which were randomly selected for analyses of their expression levels using qRT-PCR. We found that DNA methylation correlated negatively with gene expression in the promoter regions, while in gene body regions the correlation was complicated. Further association analysis of DNA methylation with siRNAs expression revealed that the methylation levels mainly inversely correlated with the siRNAs expression levels in gene body regions of the related genes. Our research suggested that the direction of hybridization induced the differences of epigenomes in the reciprocal hybrids. And the differences of DNA methylation profiles might be modified by siRNAs existed in maternal cytoplasms, leading to the distinct gene expression levels between the reciprocal hybrids.
2. Materials and methods
2.1. Plant materials
Inbred lines of S. lycopersicum cv. Micro-Tom (for simplicity, we used “M” for Micro-Tom in this study) (2n = 24) and S. pimpinellifolium line WVa700 (similarly “W” for WVa700) (2n = 24) were used as parents, and reciprocal crosses between the parental lines were made to produce two different types of hybrids M × W (i.e., Micro-Tom × WVa700) and W × M (WVa700 × Micro-Tom), respectively. A total of 80 tomato plants, with 20 plants for each of the four genotypes (W, M, W × M and M × W), were grown in a greenhouse at 23 °C with a photoperiod of 16-h light and 8-h dark and 60% relative humidity.
2.2. Extraction, 5mC antibody immunoprecipitation and sequencing of DNA
Total genomic DNA was extracted from 45-day-old leaves of the reciprocal hybrids and their parent plants using CTAB method as described by Murray and Thompson. At this stage, the differences in leaf phenotype and plant morphology among the four genotypes of tomato plants became evident. Three seedlings were pooled in each genotype sample for genomic DNA extraction. The quality of DNA was analysed using agarose gel electrophoresis. Five μg DNA was then randomly fragmented into small pieces with most of them 100–500 bp in length and purified by QIA quick PCR Purification Kit (Qiagen, Hilden, Germany). Four DNA libraries were constructed, namely the M, W, M × W and W × M DNA libraries. After DNA-end repair and 3’-dA overhang, DNA was ligated to an Illumina sequencing primer adaptor. The double-stranded DNA was subsequently denatured and immunoprecipitated using 5mC antibody. Real time quantitative PCR analysis was performed to validate the quality of immunoprecipitated DNA fragments. DNA with high quality was used for PCR amplification and bands between 200-300 bp were selected, and subsequently sequenced using the HiSeq 2000 system (SBS KIT-HS V3, Illumina).
2.3. Bioinformatics analysis
Sequence data were obtained through filtering out the adapter, reads containing more than 10% N and low quality reads. MeDIP-Seq data was mapped to reference genome (ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG2.3_release/ITAG2.3_genomic.fasta (12 June 2015, data last accessed)) by soapaligner, only unique alignments with no more than 2 mismatches were chosen for further analyses. The uniquely mapped reads were used to analyse the distribution of reads in genome chromosomes and different genomic regions. Peak scanning was performed using MACS (version 1.4.0) to identify genomic regions that were enriched with methylated reads. The scan graphs of two samples were merged to identify putative different methylation regions. The read numbers of individual methylated region from any two of the four genotypes were used to calculate the normalized log2 values and P-value, with chi-square statistics and a false discovery rate (FDR) statistics. The filtering criteria for P-value and the fold change were ≤0.05 and ≥2, respectively. DMRs were divided into two groups: the hypo-methylated (down) DMRs and the hyper-methylated (up) DMRs. The genes overlapped with DMRs were deemed as DMGs. DMGs were functionally clustered via Gene Ontology analysis to explore their biological functions, using the DAVID functional annotation tool with P ≤ 0.05. KEGG pathway enrichment analysis was carried out to identify significant enriched metabolic pathway or signal transduction pathway. The calculation formula of analysis is the same as that of GO function analysis. Q-value ≤ 0.05 was used as the threshold to determine significant enrichment pathway in DEGs.
2.4. Bisulfite sequencing
To validate the results of MeDIP-seq, the bisulfite sequencing polymerase chain reaction was performed using the same genomic DNA materials. Briefly, 2 μg DNA was treated with sodium bisulfate using the EZ DNA Methylation Gold Kit according to the manufacturer’s instruction (Zymo Research, Orange Country, US) and then incubated at 98 °C for10 minutes and 64 °C for 2.5 hours. 10 pairs of primers for selected genes were designed by the MethPrimer online software (http://www.urogene.org/methprimer) and were shown as Supplementary Table S1. The bisulfite-treated DNA was PCR amplified by with these primers. PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (Promega, Madison, WI, USA), and then cloned into the pMD18-T vector (Takara, Dalian, China) and sequenced (Sunny, Shanghai, China). At least 15 clones were sequenced for each sample. The methylation level was calculated by the ratio of unconverted cytosines over the total cytosines, including three types: CG, CHG and CHH. The final bisulfate sequencing results were analysed according to the Kismeth website (http://Katahdin.mssm.edu/Kismeth (8 January 2016, data last accessed)).
2.5. Quantitative real-time PCR
Total RNA from the 45-day-old leaves of the two reciprocal hybrids was extracted using the Trizol method per the manufacturer’s instruction (Invitrogen, Carlsbad, CA) and used to construct small RNA libraries. Three biological replicates were included for all the samples. The first strand cDNA was synthesized with oligo (dT) primer from 2 μg RNA using the Reverse Transcriptase M-MLV kit (Takara). Primers used for real-time PCR were listed in Supplementary Table S1. The real-time PCR was performed using a SYBR® Premix Ex TaqTM (Takara) on the ABI STEPONE Real-Time PCR System. Relative quantification was performed using the CT (2-△△Ct) method. Each sample was performed in triplicate for real-time PCR reaction.
2.6. Small RNA sequencing and analysis of siRNAs
Small RNA high-throughput sequencing of four libraries (representing M, W, W × M and M × W genotypes, respectively) was as previously described. The small RNA-seq clean data have been deposited in the SRA database of NCBI with accession number SRX722032, SRX722033, SRX722034, and SRX722035. The leaf materials used for the small RNA libraries were same as those for DNA extraction and MeDIP-seq. After eliminating the contaminant reads and low quality reads, tags were used to predict siRNA using tag2siRNA software. The clean reads were analysed by length distribution and common sequences. Then, the siRNAs sequences were matched to the DMGs via blast searching of NCBI databases (http://blast.ncbi.nlm.nih.gov/Blast.cgi (18 August 2016, data last accessed)).
3. Results
3.1. Genome-wide DNA methylation profiles of two parental lines and their reciprocal hybrids
Whole genome-wide DNA methylation patterns of two parental lines (M and W) and their reciprocal hybrids (M × W and W × M) were profiled using MeDIP-seq. The MeDIP-seq data have been deposited in the SRA database of NCBI with accession number SRR5397373, SRR5397378, SRR5397384 and SRR5397386. We obtained up to 4.5 Gb of MeDIP-seq data for each genotype (Table 1). After removal of adapter sequences, contaminations, and low-quality reads, more than 90 million reads, regarded as clean data, from each genotype were analysed. Total reads were mapped to the reference genome (ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG2.3_release/ITAG2.3_genomic.fasta (12 June 2015, data last accessed)), with mapping rates higher than 96% for individual genotypes. Reads were distributed on each chromosome region, with good genome coverage (see Supplementary Fig. S1). 57.3–58.79% of the mapped reads were uniquely mapped to reference genome for each genotype (Table 1). The distribution of MeDIP-seq reads in different CpG density regions was shown, and the majority of the reads in all samples were located in regions with 5 to 25 CpGs (see Supplementary Fig. S2).
Table 1
General information of read alignment and peak scanning in four samples
Parents
Reciprocal hybrids
Sample
Micro-Tom
WVa700
Micro-Tom_WVa700
WVa700_Micro-Tom
Data Size (Gb)
4.5
4.6
4.6
4.6
Reads Number (No.)
91836734
93207132
93085356
92982838
Mapped Reads (No.)
88334928
90089549
89689126
89518553
Mapping Rate (%)
96.19
96.66
96.35
96.27
Effective Chain Deptha
0.53
0.54
0.53
0.53
Unique Mapped Bases (No.)
53987715
54363987
54273327
53301136
Unique Mapped Rate (%)b
58.79
58.33
58.30
57.32
Peak Number (No.)
55100
50316
48935
48179
Average Peak Length (bp)
1654
1725
1725
1765
Peak Median Length (bp)
1442
1500
1506
1543
Peak Total Length (bp)
91129487
86803207
84405012
85016601
Peak Coverage (%)
11.66
11.10
10.80
10.88
aEffective Chain Depth = Mapped bases/the Size of the Reference Genome.
General information of read alignment and peak scanning in four samplesaEffective Chain Depth = Mapped bases/the Size of the Reference Genome.bUnique Mapping Rate = Unique Mapped Reads Count/Total Reads.We further analysed the methylation enrichment in different components of genome. CGIs are mainly located in promoter regions and play important roles in the epigenetic regulation of gene expression. Therefore, we compared the methylation status of CGIs and CGI shores (spanning 2kb fragment up- and down-stream of each CGI). We found that in both parents the CGIs had more methylation reads than the CGI shores had. In contrast, CGIs did not enrich much more methylated reads than the CGI shores enriched in the reciprocal hybrids (Fig. 1a). In particular, CGI shores had more methylated reads than CGIs had in M × W. More interestingly, the methylation levels of CGIs were lower in reciprocal hybrids than in their parents, suggesting that the process of hybridization led to the decrease in DNA methylation levels in these regions. Furthermore, W × M had higher methylation levels than M × W in CGIs (Fig. 1a), which was one of the major differences between the reciprocal hybrids in DNA methylation profiles.
Figure 1
Distribution of reads around (a) CpG islands and (b) intragenic regions. The X axis indicates the position around CpG islands (intragenic regions) and the Y axis indicates the normalized read number. These panels can reflect the methylation levels around CpG islands and gene body regions. (See online article for colour version of this figure).
Distribution of reads around (a) CpG islands and (b) intragenic regions. The X axis indicates the position around CpG islands (intragenic regions) and the Y axis indicates the normalized read number. These panels can reflect the methylation levels around CpG islands and gene body regions. (See online article for colour version of this figure).We also analysed the distribution of the reads in intragenic region and nearby 2kb sequences, and found that 2kb upstream region showed higher methylation levels than other regions in all four genotypes (parents and reciprocal hybrids). The DNA methylation levels decreased sharply before intragenic region, indicating that promoter regions were hypermethylated and might play a key role in the modification of DNA methylation (Fig. 1b). Interestingly, W (wild species) and the two reciprocal hybrids showed similar methylation levels in promoter regions, while M (cultivated species) had the lowest methylation levels.Considering that methylation of repeat elements was usually associated with genomic instability through structural changes such as transposition, translocation, and recombination, we compared the distribution of reads in such different repeat elements as LTR (long terminal repeat)/Gypsy, LTR/Copia, DNA/CMC-Enspm and SINE (short interactive nuclear element)/SINE. These four main repeat elements had the vast majority of the reads in all four samples (Table 2), and the LTR retroelements were especially rich with more than 90% reads, suggesting that such retroelements may play an important part in DNA methylation modification. It is noteworthy that the methylation levels in the major type of LTR retroelements of the two reciprocal hybrids were different, and lower than those of their parents (Table 2).
Table 2
Proportion of peaks of the major repeat elements in four samples
Parents
Reciprocal hybrids
Sample
Micro-Tom
WVa700
Micro-Tom_WVa700
WVa700_Micro-Tom
LTRa/Gypsy (%)
69.95
67.22
66.42
67.01
LTR/Copia (%)
22.21
23.51
23.67
23.53
DNA/CMC-EnSpm (%)
1.61
1.99
2.10
2.04
SINEb-SINE (%)
1.33
1.73
1.85
1.81
aLTR: long terminal repeat
bSINE: short interactive nuclear element
Proportion of peaks of the major repeat elements in four samplesaLTR: long terminal repeatbSINE: short interactive nuclear element
3.2. Divergent DNA methylation patterns in the reciprocal hybrids
Our previous research showed that the two reciprocal hybrids displayed significantly different phenotypes in terms of fruit shape index, single fruit weight and plant height. We here investigated DNA methylation patterns of the reciprocal hybrids to explore the possibility that these phenotypic changes were associated with or induced by changes in DNA methylation. Peak scanning software (MACS 1.4.0) was used to identify reads-enriched regions. Methylated peak, an important parameter in analysing DNA methylation profile,, was used to detect the highly methylated regions (HMRs) and only uniquely mapped reads were utilized in the whole genome peak scanning. As shown in Table 1, 48935 and 48179 peaks were identified in the reciprocal hybrids W × M and M × W samples, respectively. The methylation levels in the 2kb region upstream of 5’UTR, in the gene body (including 5’UTR, coding sequence (CDS) region, intron, and 3’UTR) and in the 2kb downstream region of 3’UTR in the genomes of the two hybrids were further analysed. The ratio of peaks was used to represent the methylation density of individual specific genome elements. As shown in Fig. 2, the 2kb upstream regions had the most abundant peaks, followed by the CDS regions and the 2kb downstream regions. Notably, the frequencies of the methylated peaks in both CDS region and intron were higher in W × M (with 13.32% in the CDS region and 9.02% in intron) than in M × W (12.85%, 8.72%) (Fig. 2).
Figure 2
Distribution of peaks in different gene elements in the reciprocal hybrids, including 2kb upstream region, 5’UTR, CDS region, Intron, 3’UTR and 2kb downstream region. The proportion of peak was shown in each specific gene element. (a) Micro-Tom_WVa700 (i.e. M × W); (b) WVa700_ Micro-Tom (W × M). (See online article for colour version of this figure).
Distribution of peaks in different gene elements in the reciprocal hybrids, including 2kb upstream region, 5’UTR, CDS region, Intron, 3’UTR and 2kb downstream region. The proportion of peak was shown in each specific gene element. (a) Micro-Tom_WVa700 (i.e. M × W); (b) WVa700_ Micro-Tom (W × M). (See online article for colour version of this figure).The differences between the two reciprocal hybrids as well as the two parents were further investigated by searching differentially methylated regions (DMRs), and the DMRs data-based whole-genome methylomes of M × W and W × M were shown in Fig. 3 There were 109 DMRs (between the two reciprocal hybrids) and 3208 DMRs (between the two parents), obviously many more differences between the parents than those between the two reciprocal hybrids. Comparison of gene methylation status between the two reciprocal hybrids revealed that there were 60 hypermethylated and 49 hypomethylated DMRs in six gene elements of W × M. Most of the DMRs were observed in introns (31.19%) and CDS regions (23.85%) (Table 3). Interestingly, more than one DMR were found in some of the DMGs. For examples, Solyc03g025770.2 gene was hypermethylated in the 2kb upstream region and CDS region, and Solyc00g090430.2 gene was hypomethylated in the 2kb upstream region, 3’UTR, intron and CDS region in W × M. Excluding repeat regions, there were 79 genes with different methylation patterns between the two reciprocal hybrids.
Figure 3
Comprehensive maps of the entire genome methylome of the reciprocal hybrids. Inner black circle represents gene density corresponding to each chromosome. Outer two circles represent the hyper (up-regulated) - and hypo (down-regulated) - methylated regions of the two reciprocal hybrids. (See online article for colour version of this figure).
Table 3
Numbers of differentially methylated genes between the two reciprocal hybrids in six gene elements
Numbers of differentially methylated genes between the two reciprocal hybrids in six gene elementsComprehensive maps of the entire genome methylome of the reciprocal hybrids. Inner black circle represents gene density corresponding to each chromosome. Outer two circles represent the hyper (up-regulated) - and hypo (down-regulated) - methylated regions of the two reciprocal hybrids. (See online article for colour version of this figure).
3.3. Gene ontology (GO) and KEGG pathway analysis of DMGs
In this study, we referred any gene with methylation peaks in its promoter or gene body regions as a methylated gene. As mentioned above, a total of 79 DMGs were found upon comparison of the two reciprocal hybrids, including 41 hypermethylated genes and 38 hypomethylated genes in W × M. These differentially methylated genes were annotated, including a cell division protease ftsH, a TPR repeat protein, a histone-lysine n-methyltransferase, a b-zip transcription factor-like, among others (Supplementary Table S2). GO analysis revealed that the DMGs in M × W and W × M were functionally belonging to one or more following categories: biological process (BP), cellular component (CC), and molecular function (MF). These DMGs were further divided into 17 functional clusters, with 10 in BP, 5 in CC and 2 in MF. Interestingly, hypermethylated gene Solyc08g013860.2 in M × W was related to mitochondrial part, participating in dicarboxylic acid metabolic process, and also hypermethylated in the parental line M than the other parental line W (Supplementary Table S3). Per the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, we observed that gene Solyc01g099150.2 and gene Solyc02g092780.1 were highly enriched in pathways: linoleic acid metabolism (Ko00591) and other types of O-glycan biosynthesis (Ko00514), with Q-value ≤ 0.05. Furthermore, two DMGs (Solyc05g056450.2 and Solyc08g013860.2) were involved in fructose and mannose metabolism and in carbon fixation in photosynthetic pathways (Supplementary Table S4).
3.4. Validation of MeDIP-seq data using bisulfite sequencing
To validate the MeDIP-seq data, we performed bisulfite sequencing of ten randomly selected DMRs of associated genes which showed the same methylation levels between the two parents. While, compared with W × M, four DMRs of associated genes [Solyc00g020840.2 (Intron), Solyc00g282510.1 (CDS), Solyc08g065650.1 (CDS) and Solyc00g020940.1 (Upstream 2k)] had higher methylation levels and six DMRs of associated genes [Solyc05g008030.1 (Intron), Solyc03g115180.1 (Intron), Solyc06g065730.2 (Intron) Solyc03g115180.1 (CDS), Solyc07g009320.2 (Intron), Solyc03g025770.2.1 (CDS)] had lower methylation level in M × W, essentially the same as those results revealed by MeDIP-seq. The bisulfite sequencing results proved that methylation levels of both Solyc05g008030.1 (Intron) and Solyc03g115180.1 (Intron) were up-regulated in W × M, while methylation levels of Solyc00g282510.1 (CDS) and Solyc00g020840.2 (Intron) were down-regulated (Fig. 4). Solyc08g065650.1 (CDS) and Solyc00g020940.1 (Upstream 2k) had higher levels of methylation in M × W, while Solyc03g115180.1 (CDS), Solyc07g009320.2 (Intron), Solyc03g025770.2 (CDS) and Solyc06g065730.2 (Intron) had higher methylation levels in W × M (see Supplementary Fig. S3). Hence, the bisulfite sequencing results were consistent with the MeDIP-seq data, demonstrating that the genome-wide DNA methylation data from MeDIP-seq was reasonably reliable.
Figure 4
Validation of DNA methylation levels of four randomly selected DMR-associated genes as examples. (a) and (b) Bisulfite sequencing of hypomethylated genes (Solyc05g008030.1 and Solyc03g115180.1) in M × W; (c) and (d) Bisulfite sequencing of hypermethylated genes (Solyc00g282510.1 and Solyc00g020840.2) in M × W. M × W represents Micro-Tom_WVa700, and W × M represents WVa700_Micro-Tom. The Y axis indicates the methylation of each genotype. The dot plot shows only the cytosines as circles, colored according to the type of cytosine. The filled circles represent methylated cytosines and empty circles represent un-methylated cytosines. There were 10 positive clones for every primer pairs of each genotype. TML stands for total methylation level. (See online article for colour version of this figure).
Validation of DNA methylation levels of four randomly selected DMR-associated genes as examples. (a) and (b) Bisulfite sequencing of hypomethylated genes (Solyc05g008030.1 and Solyc03g115180.1) in M × W; (c) and (d) Bisulfite sequencing of hypermethylated genes (Solyc00g282510.1 and Solyc00g020840.2) in M × W. M × W represents Micro-Tom_WVa700, and W × M represents WVa700_Micro-Tom. The Y axis indicates the methylation of each genotype. The dot plot shows only the cytosines as circles, colored according to the type of cytosine. The filled circles represent methylated cytosines and empty circles represent un-methylated cytosines. There were 10 positive clones for every primer pairs of each genotype. TML stands for total methylation level. (See online article for colour version of this figure).
3.5. Association analysis of gene expression and DNA methylation
To investigate the correlation between DNA methylation and gene expression, we randomly selected 14 DMGs, including seven genes with up-regulated methylation levels and the other seven genes with down-regulated methylation levels in M × W, for qRT-PCR analysis of their expression levels in the reciprocal hybrids (Fig. 5). Correlated with the different methylation patterns, the selected genes showed different expression levels between the two reciprocal hybrids. Specifically, in the M × W hybrid, three genes, namely Solyc04g071320.2, Solyc02g094350.2 and Solyc07g062890.2, were hypermethylated in the 2kb region upstream of the gene body, correlated with lower levels of transcripts. In contrast, the gene named Solyc05g018390.2 was hypomethylated in the 2kb region upstream of its gene body and had higher transcript levels. Notably, DNA methylation in the gene body regions is also mostly (but not always) inversely correlated with gene expression. Eight genes, namely Solyc05g008030.1, Solyc06g065730.2, Solyc00g282510.1, Solyc01g111020.2, Solyc02g092780.1, Solyc03g115180.1, Solyc07g009320.2 and Solyc03g025770.2, exhibited inverse correlation between the DNA methylation levels and their gene expression levels in the gene body regions. For examples, Solyc01g111020.2 had higher methylation level in its intron and CDS region, but dramatically down-regulated expression level in W × M. Solyc02g092780.1 showed a lower methylation level and a higher expression level in W × M (Fig. 5 and Supplementary Table S2). However, the intron of Solyc06g051760.2 and both the intron and CDS of Solyc03g097460.2 were hypermethylated in M × W, but both genes had higher levels of transcripts (Fig. 5 and Supplementary Table S2). These results suggested that there may be a negative correlation between the DNA methylation levels in the promoter regions and the gene expression levels, but in the gene body regions the correlation was complicated.
Figure 5
Relative gene expression levels of 14 randomly selected DMR-associated genes between the reciprocal hybrids. Asterisks indicate significant differences [Student’s t test (n = 3)] between the reciprocal hybrids. *P <0.05, **P < 0.01. The blocks with different colors indicate different gene elements. Error bars represent SD of triplicate experiments. (See online article for colour version of this figure).
Relative gene expression levels of 14 randomly selected DMR-associated genes between the reciprocal hybrids. Asterisks indicate significant differences [Student’s t test (n = 3)] between the reciprocal hybrids. *P <0.05, **P < 0.01. The blocks with different colors indicate different gene elements. Error bars represent SD of triplicate experiments. (See online article for colour version of this figure).
3.6. Association analysis of DNA methylation and small RNAs
It has been recently reported that siRNAs, generated via RNA interfering (RNAi) pathway, can target homologous genomic DNA sequences for cytosine methylation through the RdDM pathway., To investigate the possible relationship between siRNAs of different cytoplasms and differentially methylated patterns of the reciprocal hybrids, we analysed the accumulation of 24-nt siRNAs from small RNA high-throughput sequencing (small RNA-seq) data. Firstly, we compared the distribution of clean reads in chromosome from MeDIP-seq and small RNA-seq in the two reciprocal hybrids (see Supplementary Fig. S4). After removing reads mapped to rRNAs, tRNAs, and small nuclear and nucleolar RNAs, we newly data-mined 15,021,181 and 12,515,123 reads of 24-nt siRNAs from M × W and W × M, respectively (Supplementary Table S5). In both reciprocal hybrids, siRNAs were mapped to DMGs. We found that the expression levels of siRNAs, which were mapped to DMGs, were different between the two reciprocal hybrids. As shown in Fig. 6, those 15 genes with different methylation levels between the reciprocal hybrids were covered by different numbers of siRNAs. For example, Solyc00g0208400.2 was hypermethylated in M × W and mapped with 282 siRNAs, whereas in W × M there was no associated siRNAs at all. However, there was also opposite circumstance. Solyc06g065730.2 gene was hypermethylated in W × M and mapped with 201 siRNAs, while in M × W the number of associated siRNAs was 307 (Supplementary Table S6). As a result, the methylation levels of ten associated DMGs showed a negative correlation with their siRNAs’ expression levels, while methylation levels of the other five DMGs showed a positive correlation with siRNAs’ expression levels between the two reciprocal hybrids (Fig. 6).
Figure 6
Methylation levels and siRNAs profiles of 15 DMGs in the reciprocal hybrids. The X axis shows 15 DMGs. The Y axis indicates the number of methylated reads or the count of siRNAs. The siRNAs sequences were matched to the DMGs via NCBI blast search. The counts of methylated reads and mapped siRNAs in each gene were displayed in Supplementary Table S6. (See online article for colour version of this figure).
Methylation levels and siRNAs profiles of 15 DMGs in the reciprocal hybrids. The X axis shows 15 DMGs. The Y axis indicates the number of methylated reads or the count of siRNAs. The siRNAs sequences were matched to the DMGs via NCBI blast search. The counts of methylated reads and mapped siRNAs in each gene were displayed in Supplementary Table S6. (See online article for colour version of this figure).
4. Discussion
Wide hybridization, as a useful strategy for hybrid offspring to change genetic composition and phenotype, has been used in basic research and practice in plants., It’s well observed that reciprocal hybrids usually display different phenotypes. Several genetic models such as maternal effects, parent-origin effects and cytoplasmic inheritance have been proposed to determine if reciprocal differences are interacting with different genetic backgrounds., It has also been found that epigenetic changes influence gene expression and subsequently govern relative characteristics., However the underlying molecular, genetic and epigenetic mechanisms remain elusive. In our current study, we established genome-wide methylation and siRNAs profiles between the two reciprocal hybrids and performed a comparative analysis of these profiles and their association with gene expression. This research revealed an epigenetic mechanism possibly accounting for the different phenotypes between the reciprocal hybrids.In the present study, we established the global methylation profiles in S. lycopersicum, S. pimpinellifolium and their reciprocal hybrids by high-throughput sequencing, and identified the differentially methylated genes between the reciprocal hybrids. By bisulfite sequencing, we confirmed the reliability of MeDIP-seq results. At the genome-wide level, we compared the differences of methylation patterns between the reciprocal hybrids and their parents. Firstly, the W × M hybrid had higher methylation levels of methylation CGIs than its reciprocal hybrid M × W had (Fig. 1a). CpG islands normally remain un-methylated, whereas the methylation of CpG islands, especially in promoter regions, is often associated with gene silencing., Therefore, we hypothesized that the differences in CGIs methylation between the reciprocal hybrids might be correlated with the different gene expression levels between them. Secondly, sharply-decreased methylation levels after 2kb upstream region were observed, and the parents showed remarkable differences of methylation levels in these promoter regions (Fig. 1b). Interestingly, the parental line M (cultivated species) showed lower methylation levels than the other parental line W (wild species) in 2kb upstream region. It has been reported that 2kb upstream region (the promoter region) as a repressive epigenetic mark had a role in down-regulating gene expression., The lower methylation levels of promoter region might be inversely correlated with higher transcription levels of certain genes in M than in W. Thirdly, we analysed the distribution of reads in different repeat elements and found that LTR retroelements had vast majority of the reads (Table 2). It is noteworthy that the reciprocal hybrids had different methylation levels in these major types of retroelements. Generally, when there was a genomic hybridization, parental genomes suffered from gene shock caused by genome recombination. Methylation of these repeat elements was an important modification in the maintenance of genomic stability. The differences of methylation levels between the reciprocal hybrids in major types of retroelements might be correlated with the different cytoplasmic genome backgrounds during the process of hybridization.Specifically, we analysed the methylation patterns in different gene elements, including 2kb region upstream of 5’UTR, 5’UTR, CDS region, intron, 3’UTR and 2kb downstream of 3’UTR. The data showed that 2kb upstream region had the highest methylation level, whereas the main differences between the two reciprocal hybrids concentrated in introns and CDS regions (Fig. 2). It meant that the differences in DNA methylation between the reciprocal hybrids were predominantly located in gene body regions. Totally, we searched 109 DMRs and mapped them to 79 genes (Table 3 and Supplementary Table S2). Hence, an interesting phenomenon was that some methylated genes had more than one DMR, indicating that there might be the genomic region-specific DNA methylation patterns. After annotating the differentially methylated genes, the functions of these genes were revealed. Per GO analysis, we found that one DMG Solyc08g013860.2 was related to mitochondrial part and participated in dicarboxylic acid metabolic process, also showed differential methylation levels between the two parents (Supplementary Table S3). KEGG pathway analysis revealed that gene Solyc01g099150.2 and gene Solyc02g092780.1 were highly enriched in pathways: linoleic acid metabolish and other types of O-glycan biosynthesis. Furthermore, two DMGs (Solyc05g056450.2 and Solyc08g013860.2) were involved in carbon fixation in photosynthetic organisms and fructose and mannose metabolism pathways (Supplementary Table S4). It has been reported that carbohydrate content and sugar metabolism can regulate tomato fruit development., And the reciprocal hybrids showed significantly different phenotypes in fruit shape index and single fruit weight, as mentioned above. These results suggested that there existed differences of epigenetic regulation in the reciprocal hybrids that might be responsible for phenotypic diversities.As a common epigenetic process, DNA methylation can inactivate mobile elements and silence the repetitive elements to regulate gene expression or influence other epigenetic courses., To explore how the differences in methylation levels are responsible for the distinctive phenotypes in the reciprocal hybrids, we carried out the association analysis of DNA methylation and gene expression. In our study, randomly selected 14 DMGs were used to analyse the relationship between DNA methylation and gene expression, and most of them showed remarkably different transcription levels between the two reciprocal hybrids, suggesting that DNA methylation changes in the reciprocal hybrids are correlated with altered gene expression. Specifically, the negative relationship between DNA methylation and transcription has been reported in several papers., There are also some reports putting forward that gene body methylation, especially intronic DNA methylation, is positively correlated with gene expression., Our results showed that gene expression was negatively correlated with DNA methylation in the promoter region, while in the gene body region the relationship between DNA methylation and gene expression was complicated (Fig. 5). Here, we assumed that, DNA methylation in the promoter regions as well as the gene body regions modulated the expression of related genes, and the differences in transcription levels may subsequently contribute to the different phenotypes between the reciprocal hybrids. To fully address the point, the transcriptomes of all the DMGs revealed should be established by RNA sequencing and comparatively analysed in further research.In the process of reciprocal hybridization, the source of cytoplasm (mainly comes from maternal parent) is a primary difference between the reciprocal hybrids. Therefore, small RNAs, which are matured in cytoplasms are different between the reciprocal hybrids. Moreover, 24-nt siRNAs have been reported to direct DNA methylation and regulate gene expression., In the current study, we analysed the siRNAs which were generated in M × W and W × M hybrids, and associated these siRNAs with DMGs in the reciprocal hybrids. And we found 15 genes (among 79 DMGs) were associated with siRNAs in both reciprocal hybrids. Interestingly, siRNAs’ expression levels were mostly negatively correlated with DNA methylation levels (Fig. 6). It is notable that most DMRs of associated genes that were mapped with siRNAs were located in the gene body regions (Supplementary Table S6). Thus, these results are consistent with a recent study in Arabidopsis that methylation levels were inversely correlated with their siRNAs profiles in genomic transcribed region of protein coding genes. So it is reasonable to hypothesize that different sources of siRNAs induced the different DNA methylation patterns between the two reciprocal hybrids, and the correlation might be also associated with specific genomic regions.In conclusion, this study explored the epigenomic differences of tomato reciprocal hybrids. The reciprocal hybrids showed divergent methylation levels in different genomic regions, especially in the gene body regions. Gene expression analysis suggested that methylation in the promoter region inversely regulates transcription, while in gene body regions the correlation between DNA methylation and gene expression is complicated. Moreover, association analysis of DNA methylation and siRNAs implied the differences of DNA methylation might be induced by siRNAs existed in different cytoplasms. And these correlations were relative to specific genomic regions. Taken together, this research proposed that the direction of hybridization caused variance in DNA methylation profiles between the reciprocal hybrids. Furthermore, the distinct methylation levels might be regulated by siRNAs and then lead to the changes in gene expression levels and ultimately the different phenotypes in the reciprocal hybrids.
Supplementary data
Supplementary data are available at DNARES online.
Funding
National Natural Science Foundation of China (Grant No. 31272159) and National Natural Science Foundation of China (Grant No. 31572118) and Zhejiang Provincial Natural Science Foundation of China (Grant No. LZ12C15001). Funding to pay the Open Access publication charges for this article was provided by the National Natural Science Foundation of China (Grant No. 31572118).
Conflict of interest
None declared.Click here for additional data file.
Authors: A Amante-Bordeos; L A Sitch; R Nelson; R D Dalmacio; N P Oliva; H Aswidinnoor; H Leung Journal: Theor Appl Genet Date: 1992-07 Impact factor: 5.699
Authors: Michael Groszmann; Ian K Greaves; Zayed I Albertyn; Graham N Scofield; William J Peacock; Elizabeth S Dennis Journal: Proc Natl Acad Sci U S A Date: 2011-01-25 Impact factor: 11.205
Authors: Madeleine P Ball; Jin Billy Li; Yuan Gao; Je-Hyuk Lee; Emily M LeProust; In-Hyun Park; Bin Xie; George Q Daley; George M Church Journal: Nat Biotechnol Date: 2009-03-29 Impact factor: 54.908