Literature DB >> 28595569

Genome-wide comparative transcriptome analysis of CMS-D2 and its maintainer and restorer lines in upland cotton.

Jianyong Wu1, Meng Zhang2, Bingbing Zhang2, Xuexian Zhang2, Liping Guo2, Tingxiang Qi2, Hailin Wang2, Jinfa Zhang3, Chaozhu Xing4.   

Abstract

BACKGROUND: Cytoplasmic male sterility (CMS) conferred by the cytoplasm from Gossypium harknessii (D2) is an important system for hybrid seed production in Upland cotton (G. hirsutum). The male sterility of CMS-D2 (i.e., A line) can be restored to fertility by a restorer (i.e., R line) carrying the restorer gene Rf1 transferred from the D2 nuclear genome. However, the molecular mechanisms of CMS-D2 and its restoration are poorly understood.
RESULTS: In this study, a genome-wide comparative transcriptome analysis was performed to identify differentially expressed genes (DEGs) in flower buds among the isogenic fertile R line and sterile A line derived from a backcross population (BC8F1) and the recurrent parent, i.e., the maintainer (B line). A total of 1464 DEGs were identified among the three isogenic lines, and the Rf1-carrying Chr_D05 and its homeologous Chr_A05 had more DEGs than other chromosomes. The results of GO and KEGG enrichment analysis showed differences in circadian rhythm between the fertile and sterile lines. Eleven DEGs were selected for validation using qRT-PCR, confirming the accuracy of the RNA-seq results.
CONCLUSIONS: Through genome-wide comparative transcriptome analysis, the differential expression profiles of CMS-D2 and its maintainer and restorer lines in Upland cotton were identified. Our results provide an important foundation for further studies into the molecular mechanisms of the interactions between the restorer gene Rf1 and the CMS-D2 cytoplasm.

Entities:  

Keywords:  CMS-D2; Circadian rhythm; RNA-seq; Restorer gene; Upland cotton

Mesh:

Year:  2017        PMID: 28595569      PMCID: PMC5465541          DOI: 10.1186/s12864-017-3841-0

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Cotton is the most important fiber crop and an important oil-producing crop worldwide. As in other crop plants, utilization of heterosis is an important way to improve yield in cotton production. To date, most commercial cotton hybrids have been produced by artificial emasculation and pollination (AEP) in China [1] and India (http://www.cicr.org.in/), which is a time-consuming, labor-intensive and costly process. In addition, the purity of hybrid seeds produced by AEP cannot be guaranteed as some artificial emasculation may not completely remove the pollen. The cytoplasmic male sterility (CMS) system is an ideal tool for hybrid seed production, and it has been widely used to facilitate the use of heterosis in many crops [2]. CMS-D2 is one of the two major types of CMS [3-6] in cotton and has contributed to cotton heterosis utilization. Rf1 is the restorer gene and can recover the fertility of CMS-D2. Considering the importance of the CMS and restoration system, numerous molecular mapping studies have been conducted on of Rf1 in cotton [7-13]. Recently, a backcross population (BC8F1) with plants distinguished as male fertile (F) or sterile (S) was generated and used to map the Rf1 gene by our group [14]. However, there have been few studies on the molecular mechanism of the restorer gene. Over the past several years, next-generation sequencing (NGS) has been used in numerous research areas, resulting in high-throughput production of massive DNA and RNA data [15]. As a powerful tool for studying global transcriptional networks, transcriptome sequencing provides high-resolution data and has been widely used in many crops. In cotton, it has been used to study boll development [16], fiber development [17-19], leaf senescence [20], gland morphogenesis [21], abiotic stress responses [22-24], biotic stress responses [25, 26], RNA editing in relation to CMS-D8 [27], and genic male sterility [28]. Differential display and gene chips were used to study the expression levels of differentially expressed genes (DEGs) associated with the fertility of CMS-D8 in cotton [29, 30]. However, the global gene expression patterns of CMS-D2 and its interaction with its restorer gene Rf1 are still unknown. Now that the genome sequences of G. raimondii [31, 32], G arboreum [33], and G hirsutum [34, 35] have been published, gene annotation can be better performed, which will improve genome-wide transcriptome sequencing and analysis in cotton. To better understanding the gene expression profiles affected by the restorer gene Rf1 in Upland cotton with the CMS-D2 cytoplasm, RNA-seq by the Illumina NGS technology was used in this study to identify DEGs in flower buds of fertile (i.e., restorer R line) and sterile (i.e., CMS A line) plants of a backcross population (BC8F1) and its recurrent parent, i.e., the maintainer B line. GO and KEGG enrichment analysis showed that genes related to circadian rhythms were significantly affected by the presence of the restorer gene. The results from this study will serve as a foundation for further studies of the molecular mechanisms of interaction between the restorer gene Rf1 and the CMS-D2 cytoplasm.

Methods

Plant materials

In our previous study [14], the sterile line ZBA with the CMS-D2 cytoplasm was crossed with the restorer line Zhonghui46, and then the maintainer B line (designated dB3) with the normal fertile Upland cotton (AD1) cytoplasm was used as the recurrent male parent to backcross with the F1 plants to construct a BC8F1 population. In this segregating population, the sterile plants (designated dZB3) were considered to be the CMS-D2 A line, and the fertile plants (designated dZK3) were considered to be the restorer R line. All materials were provided by Institute of Cotton Research (ICR), Chinese Academy of Agricultural Science (CAAS). The BC8F1 population and recurrent parent were grown in the Experimental Farm, ICR-CAAS, Anyang, Henan province, China. A randomized complete block design with three biological replications was used, and crop management practices followed local recommendations. On sunny days of about 30 °C, flowering buds of about 3 mm in length (at roughly the stage of male meiosis) were collected and combined from 50 plants for each genotype in each replication. All harvested samples were snap-frozen in liquid nitrogen and stored at −80 °C before use.

RNA extraction, RNA-seq library construction and sequencing

Total RNA was isolated using the Sigma Spectrum Plant Total RNA kit (Sigma-Aldrich, USA) according to the manufacturer’s protocol. The concentration of each RNA sample was measured using a NanoDrop 2000 spectrophotometer (NanoDrop Technologies Inc., USA). Nine individual libraries (three samples for each of the three genotypes) were constructed with an Illumina RNA TruSeq kit (Illumina, USA) per the manufacturer’s instructions using 5 μg of total RNA. Subsequently, PCR amplification was performed using Phusion DNA polymerase (NEB, USA) for 15 PCR cycles, and f cDNA fragments of 300–500 bp were isolated from a 2% low range ultra agarose gel (Bio-Rad, USA). After quantification by TBS380 (Picogreen, Invitrogen, USA), the paired-end libraries were then sequenced using the Illumina HiSeq™ 2500 system (2 × 151 bp read length) at Shanghai Majorbio Bio-pharm Biotechnology Co., Ltd. (Shanghai, China).

Data processing and expression analysis

SeqPrep (https://github.com/jstjohn/SeqPrep) and Sickle (https://github.com/najoshi/sickle) were used to remove low-quality reads (i.e., Q value <25), adapter sequences, reads with ambiguous bases (‘N’), and fragments of less than 20 bp in length. All clean reads were mapped to the G. hirsutum TM-1 reference genome (http://mascotton.njau.edu.cn/info/1054/1118.htm) using the TopHat software [36] which allowed no more than a 2-nucleotide mismatch. Gene annotation and expression quantification were performed using the software Cufflinks (http://cufflinks.cbcb.umd.edu/), and the FPKM (fragments per kilobase of exon per million fragments) method was used to identify DEGs based on a false discovery rate (FDR) of <0.05 and estimated absolute log2fold change > 1 between different genotypes. A heatmap was constructed using the web server ClustVis (http://biit.cs.ut.ee/clustvis/) with default parameters.

Functional annotation

GO and KEGG functional annotations for the transcripts were retrieved using blast2go (http://www.blast2go.com/b2ghome) and blastx/blastp searches against the KEGG genes (http://www.genome.jp/kegg/genes.html) database, respectively. GO term and KEGG pathway enrichment analysis was performed on the significantly differentially expressed transcripts using the Goatools software (https://github.com/tanghaibao/Goatools) and KOBAS software (http://kobas.cbi.pku.edu.cn) [37], with a corrected P-value ≤0.05 as the threshold.

Quantitative RT-PCR (qRT-PCR) validation

First-strand cDNA was generated from 1 μg total RNA from individual replications using a PrimerScript RT Reagent kit (Perfect Real Time, TaKaRa, Japan). Quantitative real-time RT-PCR was performed using SYBR® Premix Ex TaqTM (Perfect Real Time, TaKaRa, Japan) according to the manufacturer’s instructions. Primers for qPCR were designed using the Primer Express software (Applied Biosystems, Foster City, CA, USA), synthesized commercially (Tianyi Huiyuan Biotechnology, Beijing, China), and are shown in Additional file 1. PCR analysis was performed using a CFX96TM instrument (Bio-Rad, USA). Each reaction contained 2 μl cDNA template, 800 nM of each primer and 10 μl 2 × SYBR® Premix Ex TaqTM, with ddH2O to bring the final volume to 20 μl. The reaction was pre-denatured at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s, annealing at 58 °C for 20 s and extension at 72 °C for 30 s. A melting curve was generated for each sample at the end of each run to determine the specificity of the amplified products. Each gene was analyzed in triplicate, and controls without template were also included. Actin was used as an internal control. The threshold cycle (Ct) values of each reaction were determined automatically by the instrument software, and the relative amount of each gene to the internal control was calculated using the eq. 2−ΔΔCt, where ΔΔCt = (Ct target − Ct actin) sample X − (Ct target − Ct actin) sample 1. The whole assay protocol was repeated three times to ensure the reliability of the assay data. The standard deviations of the data were determined from the three independent experiments. The statistical significance of expression differences was analyzed using the Student’s t-test.

Identification of SNPs

Single nucleotide polymorphism (SNP) loci for candidate genes were identified in the assembled transcript sequences using the Samtools (http://samtools.sourceforge.net/) and VarScan (http://varscan.sourceforge.net/) software.

Results

Transcriptome sequencing and mapping

In this study, near-isogenic A, B and R lines each comprising three individual biological samples of 3 mm-long flowering buds at the stage of male meiosis were used to construct cDNA libraries for a deep Illumina sequencing. After filtering the raw reads, 48,365,894, 46,208,878, and 40,915,284 clean reads for the three replicates of the maintainer B line (dB3), 35,886,986, 46,397,948, and 39,667,094 clean reads for the male sterile A line (dZB3) in the BC8F1 population, and 45,856,082, 42,6816,76, and 52,325,842 clean reads for the fertile restorer R line (dZK3) in the BC8F1 population were obtained (Additional file 2). More than 90% of these clean reads were mapped to the G. hirsutum TM-1 reference genome (Additional file 3). The deep RNA-seq had a 90.55–91.89% genome coverage of the predicted genes in Upland cotton. In total, 62,001 of the 70,478 predicted transcripts in the reference TM-1 genome were identified in this study and were used for a further analysis.

GO and KEGG classification of the expressed genes

Blast2go was used to retrieve the GO functional annotations, and the results showed that 46,150 of the 62,001 predicted transcripts were successfully assigned GO annotations within the three main GO categories and 57 sub-categories (Fig. 1a). ‘Metabolic process’ (32,285 genes; representing 69.9% of transcripts in the biological process category), ‘cellular process’ (28,157 genes; 61.0%), and ‘single-organism process’ (23,292 genes; 50.5%) had the highest numbers of genes in the biological process category. ‘Cell’ (21,221 genes; representing 46.0% of transcripts in the cellular component category), ‘cell part’ (20,897 genes; 45.3%) and ‘organelle’ (14,269 genes; 30.9%) had the most genes in the cellular component category. ‘Catalytic activity’ (23,001 genes; representing 49.8% of transcripts in the molecular function category), ‘binding’ (22,866 genes; 49.5%) and ‘transporter activity’ (2677 genes; 5.8%) were the most important sub-categories in the molecular function category (Additional file 4). In addition, a total of 23,211 transcripts were categorized into 175 pathways (Additional file 5), among which metabolic pathways, biosynthesis of secondary metabolites and ribosome pathways contained the most transcripts (Fig. 1b).
Fig. 1

Gene ontology classification (a) and COG functional categories (b) of unigenes

Gene ontology classification (a) and COG functional categories (b) of unigenes

Global Transcriptome changes

The number of reads mapped to the predicted transcripts of the TM-1 reference genome was calculated as the expression level for each gene. The following three comparisons of gene expression levels were performed: B (dB3) vs. A (dZB3), which had the isogenic nuclear genomes (containing the recessive non-functional rf1 allele) but different cytoplasms and fertility; B (dB3) vs. R (dZK3), both of which were isogenic and fertile but differed in their cytoplasms and Rf1 alleles; and A (dZB3) vs. R (dZK3), both of which had the same CMS-D2 cytoplasm but differed in fertility and Rf1 alleles. A total of 728 (442 upregulated and 286 downregulated), 918 (524 upregulated and 394 downregulated) and 456 (176 upregulated and 280 downregulated) DEGs were identified in the above three comparisons, respectively (Additional files 6–8). These DEGs represented a total of 1464 non-redundant genes, including 1368 that were distributed across the 26 chromosomes of G. hirsutum and 96 genes on 56 scaffolds (Fig. 2). It is interesting to note that Chr_D05 (with restorer gene Rf1) and the homeologous Chr_A05 (99.5 DEGs vs. 48.7 DEGs) carried more DEGs than the other chromosomes. Furthermore, among the 1464 DEGs, three possible mitochondrial targeted protein-coding genes (Gh_D01G1128, Gh_D06G0518 and Gh_A03G1169) and five possible chloroplast targeted protein- coding genes (Gh_A13G2212, Gh_A05G2854, Gh_A12G0821, Gh_A12G0217 and Gh_D11G3195) were differentially expressed between dZK3 and dB3, and three possible chloroplast targeted protein-coding genes (Gh_Sca078114G01, Gh_D01G0297 and Gh_A07G1517) were differentially expressed between dZB3 and dB3. These DEGs may be affected by the CMS-D2 cytoplasm.
Fig. 2

Distribution of the differentially expressed genes on different chromosomes. a Location distribution of DEGs on different chromosomes. b DEG numbers on different chromosomes. The Y-axis represents different chromosomes. xis and numbers behind each bar represent the DEG numbers on each chromosome

Distribution of the differentially expressed genes on different chromosomes. a Location distribution of DEGs on different chromosomes. b DEG numbers on different chromosomes. The Y-axis represents different chromosomes. xis and numbers behind each bar represent the DEG numbers on each chromosome The distribution of unique and common DEGs for the three comparisons is shown in Fig. 3. The results indicated that 251 of 728 DEGs were unique to B (dB3) vs. A (dZB3), 408 of 918 were unique to B (dB3) vs. R (dZK3), and 192 of 456 were unique to A (dZB3) vs. R (dZK3). Compared with R (dZK3, containing the restorer gene), 136 common DEGs were identified in both B (dB3) and A (dZB3) containing the non-restoring gene. Compared with B (dB3, with normal Upland cotton cytoplasm), 349 common DEGs were identified in both A (dZB3) and R (dZK3), which contained the CMS-D2 cytoplasm. Compared with the male sterile A line (dZB3), 103 common DEGs were identified in the fertile B (dB3) and R (dZK3) lines.
Fig. 3

Venn diagram showing the distribution of unique and common DEGs among three comparisons

Venn diagram showing the distribution of unique and common DEGs among three comparisons

GO and KEGG enrichment analysis of DEGs

For the 728 DEGs between B (dB3) and A (dZB3), ‘metabolic process’, ‘catalytic activity’ and ‘single-organism process’ were the three most common GO terms (Additional file 9), and ‘metabolic pathways’, ‘biosynthesis of secondary metabolites’ and ‘microbial metabolism in diverse environments’ were the three most common KEGG pathways (Additional file 10). Seven DEGs associated with the GO terms ‘molecular transducer activity’ and ‘electron carrier activity’ were specifically upregulated and downregulated, respectively in dB3. For the 918 DEGs between B (dB3) and R (dZK3), ‘metabolic process’, ‘cellular process’ and ‘catalytic activity’ were the three most common GO terms (Additional file 11), while the three most common pathways (Additional file 12) were the same as in B (dB3) and A (dZB3). Six DEGs associated with the ‘cell junction’ and ‘symplast’ were specifically upregulated in R (dZK3). For the 456 DEGs between A (dZB3) and R (dZK3), ‘metabolic process’, ‘cellular process’ and ‘binding’ were the three most common GO terms (Additional file 13), and ‘metabolic pathways’, ‘biosynthesis of secondary metabolites’ and ‘drug metabolism cytochrome P450’ were the three most common pathways (Additional file 14). Eleven DEGs associated with growth, six with structural molecule activity and five with electron carrier activity were specific upregulated in dZB3. To identify significant GO categories and KEGG pathways among the three comparisons, further GO and KEGG enrichment analyses were performed. The GO categories ‘negative regulation of circadian rhythm’, ‘transcription regulatory region DNA binding’ and ‘regulatory region nucleic acid binding’ had the highest enrichment ratios between the maintainer B line (dB3) and the CMS-D2 A (dZB3) line (Additional file 15), while ‘long-day photoperiodism’, ‘negative regulation of sequence-specific DNA binding transcription factor activity’ and ‘negative regulation of circadian rhythm’ had the highest enrichment ratios between the A line (dZB3) and the restorer R (dZK3) line (Additional file 16). ‘Allene-oxide cyclase activity’, ‘response to wounding’ and ‘oxidoreductase activity’ had the highest enrichment ratios between the B (dB3) and the R (dZK3) lines (Additional file 17). The three primary KEGG pathways with the highest ratios were ‘circadian rhythm’, ‘alpha-linolenic acid metabolism’ and ‘sesquiterpenoid and triterpenoid biosynthesis’ between the B (dB3) and A (dZB3) lines (Additional file 18); ‘circadian rhythm’, ‘protein processing in endoplasmic reticulum’ and ‘photosynthesis’ between the A (dZB3) and R (dZK3) lines (Additional file 19); and ‘protein processing in endoplasmic reticulum’, ‘alpha-linolenic acid metabolism’ and ‘thyroid hormone synthesis’ between the B (dB3) and R (dZK3) lines (Additional file 20). The results showed that the circadian rhythm pathway was an important and common pathway that was affected during meiosis.

Analysis of DEGs on Chr_D05 and DEGs related to circadian rhythms

In our previous study [14], the restorer gene Rf1 was shown to be located on Chr_D05 near position 54,287,522. In this study, Gh_D05G3189 and Gh_D05G3427 near the target region were found to be specifically expressed in the fertile R lines but were not expressed in the A or B lines. To further understand the effect of DEGs from regions adjacent to Rf1, GO enrichment analysis of 105 DEGs on Chr_D05 was performed. The results demonstrated that ‘sesquiterpene synthase activity’ and ‘(+)-delta-cadinene synthase activity’ were the two major GO terms with the highest enrichment ratios, while ‘sesquiterpenoid and triterpenoid biosynthesis’, ‘protein processing in endoplasmic reticulum’ and ‘carotenoid biosynthesis’ were the three major pathways identified in KEGG enrichment analysis. To examine the correlation between the expression of the DEGs in different samples, a heatmap analysis was performed based on the FPKM values of the 105 DEGs on Chr_D05 with the restorer gene and 16 DEGs related to the circadian rhythm (Fig. 4). The results showed that DEGs participating in sesquiterpene synthase activity and (+)-delta-cadinene synthase activity were all expressed preferentially in the B line, while most of the genes related to protein processing in the endoplasmic reticulum were highly expressed in the R line. Furthermore, it was interesting to find that most DEGs related to the circadian rhythm were highly expressed in the R and A lines with the CMS-D2 cytoplasm, implying a possible connection between the circadian rhythm and the CMS-D2 cytoplasm.
Fig. 4

Heatmap showing the FPKM values of DEGs on Chr_D05 and DEGs related to circadian rhythm. The FPKM values for the DEGs in the three samples were used for hierarchical analysis. The heatmap shows the expression abundance of the DEGs. The colors correspond to FPKM values, ranging from blue (low expression) to red (high expression). Those genes in green boxes represent DEGs related to circadian rhythm

Heatmap showing the FPKM values of DEGs on Chr_D05 and DEGs related to circadian rhythm. The FPKM values for the DEGs in the three samples were used for hierarchical analysis. The heatmap shows the expression abundance of the DEGs. The colors correspond to FPKM values, ranging from blue (low expression) to red (high expression). Those genes in green boxes represent DEGs related to circadian rhythm

Validation of RNA-seq data by qRT-PCR

To validate the RNA-seq data using real-time qRT-PCR, 11 DEGs were selected based on high fold-changes (Gh_A12G1505), specific expression in certain genotypes (Gh_A08G0004), chromosomal location on Chr_D05 (Gh_D05G0902, Gh_D05G1016, Gh_D05G3189, and Gh_D05G3427), and association with the circadian rhythm (Gh_D02G0690, Gh_A11G0920, Gh_A11G0926, Gh_D09G1513, and Gh_D12G1525). The expression patterns of these genes are shown in Fig. 5. The results showed that except for the Gh_D09G1513 gene, the expression patterns as determined by qRT-PCR were consistent with those obtained by RNA-seq, confirming the accuracy of the RNA-seq results in this study.
Fig. 5

qRT-PCR analysis of gene expression compared with the RNA-seq data. The gray columns represented the relative expression levels of the genes; the dotted lines represent the RNA-seq reads. A: sterile line, B: maintainer line, R: restorer line

qRT-PCR analysis of gene expression compared with the RNA-seq data. The gray columns represented the relative expression levels of the genes; the dotted lines represent the RNA-seq reads. A: sterile line, B: maintainer line, R: restorer line

SNP identification of DEGs on Chr_D05

The DEGs located on Chr_D05 with the restorer gene Rf1 were chosen for identification of SNPs among the three lines (genotypes). For the 105 DEGs on Chr_D05, 11 SNP loci in 11 DEGs were identified between the sequences from the R line and those from the non-restoring genome, i.e., the A and B lines, including seven loci in exons and four loci downstream of the coding sequences (Table 1). Among these genes, Gh_D05G3129, Gh_D05G3141, Gh_D05G3211 and Gh_D05G3427 were located within the predicted target region of Rf1. Therefore, some of them may be related to the fertility-restoring gene, especially Gh_D05G3427, which is a proton pump-interactor 1-like gene that was expressed specifically in the restorer line.
Table 1

SNP information for DEGs on Chr_D05

Gene ChromosomeStartEndTM-1Three lineAnnotationMutationtypeGene Annotationfpkm
BARBAR
Gh_D05G0901 D0576043147604314AAACdownstream17.3 kDa class I heat shock protein17.482813.663629.6254
Gh_D05G0972 D058,177,1188,177,118AAAGexonicprobable aquaporin PIP2–21.852725.440793.00306
Gh_D05G2138 D0520,020,31920,020,319TTTCdownstreamprotein DMR6-LIKE OXYGENASE 2-like0.325691.999192.05069
Gh_D05G2233 D0521,298,39621,298,396AAATdownstreamuncharacterized protein7.0529214.126820.0803
Gh_D05G3043 D0540,631,40340,631,403GGGAexoniclipid phosphate phosphatase 2-like1.776452.155163.76462
Gh_D05G3129 D0546,402,20946,402,209AAAGdownstreamcytochrome P450 like_TBP3.203271.1101411.0941
Gh_D05G3141 D0546,856,96146,856,961AAACexonicsmall ubiquitin-related modifier 1-like14.507920.229340.4531
Gh_D05G3211 D0549,817,51249,817,512TTTAexonicelongation factor 259.366348.785826.7643
Gh_D05G3427 D0555,765,42355,765,423AAATexonicproton pump-interactor 1-like005.70722
Gh_D05G3508 D0557,898,84957,898,849AAAGexonicsynonymous(+)-delta-cadinene synthase7.730050.796990.82779
Gh_D05G3696 D0561,550,77561,550,775AAAGexonicprobable LRR receptor-like serine/threonine-protein kinase1.435510.940600.59621
SNP information for DEGs on Chr_D05

Discussion

Illumina sequencing and sequence annotation

The CMS system is considered the most important tool and is ideal for cotton hybrid seeds production. A restorer line containing a restorer gene is the determinant for the CMS system. Thus, to understand restorer genes, a large number of molecular mapping studies have been conducted. However, there have been no reports about how the restorer gene Rf1 affects gene expression. In the present study, transcriptome sequencing was performed to generate large amounts of cDNA sequence data and profile transcriptome changes in a restorer gene backcross population (BC8F1) with CMS cytoplasm and its backcross parent (maintainer line) without the CMS-D2 cytoplasm. With the genome sequence of G. hirsutum used as the reference genome, more than 90% of clean reads were mapped to the reference genome. In total, 62,001 of the 70,478 predicted transcripts in the reference genome were identified in this study through gene annotation. Thus, the transcriptomic data in this study met the basic requirements needed for a comparative analysis. Finally, 1464 DEGs were identified among the three lines, many of which could serve as potential targets for future studies aimed at discovering the molecular mechanism of nucleo-cytoplasmic interactions.

DEGs in the restorer Gene located on chromosome c5

The 1464 DEGs were mapped to 26 chromosomes and 56 scaffolds of G. hirsutum. Chr_D05 and its homeologous chromosome Chr_A05 were the two chromosomes with the most DEGs. In our previous study, the restorer gene Rf1 was mapped to Chr_D05 [14]. This implied that the expression profiles of these genes may be affected by the restorer gene. Sesquiterpene synthase activity, (+)-delta-cadinene synthase activity and carotenoid biosynthesis were identified as important pathways according to the GO enrichment analysis of the 105 DEGs on Chr_D05. Cotton (+)-delta-cadinene synthase has been reported as a sesquiterpene cyclase that catalyzes a branch-point step leading to the biosynthesis of sesquiterpene phytoalexins, including gossypol [38-40]. In plants, carotenoids are crucial for various biological processes, such as photosynthesis, photoprotection, and regulation of growth and development [41-44], as well as responses to the environment [45, 46]. During field tests, the fertility of CMS-D2 restorer containing the restorer gene was affected by the environment. Therefore, whether there are correlations between terpene biosynthesis and functions of the restorer gene requires further study. In our study, Gh_D05G3427, which had a SNP and specifically expression in the restorer line, was identified in the predicted target region of Rf1 on Chr_D05. It is a proton pump-interactor 1-like gene (PPI1). Previous studies have demonstrated that the PPI1 is a novel protein that can interact with the C-terminal autoinhibitory domain of the plasma membrane (PM) H(+)-ATPase [47]. PM H + −ATPases are important for plant nutrient acquisition and can be detected at the whole plant level [48-50]. Furthermore, some PM H + −ATPases only expressed in anther tissues have been identified [51-53], implying that this type of genes is important for male gametogenesis. In this study, the PM H + −ATPases regulatory gene Gh_D05G3427 was identified specifically in the restorer line. Thus, it could be a potentially important gene that interacts with the restorer gene and affects male gametophyte development. Further study of this gene is needed to elucidate the genetic and molecular mechanism of fertility restoration associated with Rf1.

The circadian rhythm pathway and its relationship with pollen development

Previous research has shown that the circadian rhythm pathway is involved in the promotion of reproductive organs development in the vegetative stage in higher plants [54-56], photosynthesis [57, 58], starch metabolism [59-61], phytohormone response [61-63], hypocotyl elongation [64, 65], and plant–pathogen interaction [66]. Additionally, some research has indicated that the circadian rhythm pathway is involved in the male sterility transition [67, 68]. In this current study, several genes associated with the circadian rhythm were identified, some of which comprise interlocking transcriptional feedback loops that play important roles in the plant central clock. Some loops integrate environmental factors, such as light and temperature, into the central clock through the input signaling pathway and import the rhythm signal into downstream signaling pathways through output signaling pathways [69, 70]. Here, circadian rhythm differences between the fertile and sterile lines were also identified, and the differential expression profiles of the genes related to the circadian rhythm were confirmed by qRT-PCR. However, how the restorer gene regulates the circadian rhythm, which in turn regulates male fertility, needs a further study.

Conclusions

Through genome-wide comparative transcriptome analysis, 1464 DEGs were identified in flower buds among the fertile R line, maintainer B line and sterile A line. The Rf1-carrying Chr_D05 and the homeologous Chr_A05 had more DEGs than the other chromosomes. qRT-PCR further confirmed the accuracy of the RNA-seq results. The circadian rhythm pathway was identified as an important pathway differing between the fertile and sterile lines by GO and KEGG enrichment analysis. In the predicted target region of Rf1 on Chr_D05, Gh_D05G3427 was found to be expressed specifically in the restorer line and to have a restorer line specific SNP. Our results provide useful data for future investigations into the molecular mechanisms of nucleo-cytoplasmic interaction in CMS cotton. Primers for quantitative RT-PCR (XLS 125 kb) Trimmed sequencing data (XLS 62 kb) Mapping percentage to the TM-1 reference genome (XLS 122 kb) GO classification of the expressed genes (XLS 60 kb) KEGG classification of the expressed genes (XLS 84 kb) Information on the differentially expressed genes between B and A (XLS 93 kb) Information on the differentially expressed genes between B and R (XLS 61 kb) Information on the differentially expressed genes between A and R (XLS 63 kb) GO analysis of DEGs between B and A (XLS 90 kb) KEGG analysis of DEGs between B and A (XLS 76 kb) GO analysis of DEGs between B and R (XLS 56 kb) KEGG analysis of DEGs between B and R (XLS 90 kb) GO analysis of DEGs between A and R (XLSX 41 kb) KEGG analysis of DEGs between A and R (XLS 56 kb) GO enrichment analysis of DEGs between B and A (XLS 68 kb) GO enrichment analysis of DEGs between A and R (XLS 2620 kb) GO enrichment analysis of DEGs between B and R (XLS 560 kb) KEGG enrichment analysis of DEGs between B and A (XLS 684 kb) KEGG enrichment analysis of DEGs between A and R (XLS 378 kb) KEGG enrichment analysis of DEGs between B and R (XLS 60 kb)
  58 in total

1.  PLANT PLASMA MEMBRANE H+-ATPases: Powerhouses for Nutrient Uptake.

Authors:  Michael G Palmgren
Journal:  Annu Rev Plant Physiol Plant Mol Biol       Date:  2001-06

Review 2.  Regulation of output from the plant circadian clock.

Authors:  Esther Yakir; Dror Hilman; Yael Harir; Rachel M Green
Journal:  FEBS J       Date:  2007-01       Impact factor: 5.542

3.  RNA editing events in mitochondrial genes by ultra-deep sequencing methods: a comparison of cytoplasmic male sterile, fertile and restored genotypes in cotton.

Authors:  Hideaki Suzuki; Jiwen Yu; Scott A Ness; Mary A O'Connell; Jinfa Zhang
Journal:  Mol Genet Genomics       Date:  2013-06-29       Impact factor: 3.291

Review 4.  Transcriptome analysis using next-generation sequencing.

Authors:  Kai-Oliver Mutz; Alexandra Heilkenbrinker; Maren Lönne; Johanna-Gabriela Walter; Frank Stahl
Journal:  Curr Opin Biotechnol       Date:  2012-09-25       Impact factor: 9.740

Review 5.  Carotenoid oxidation products as stress signals in plants.

Authors:  Michel Havaux
Journal:  Plant J       Date:  2013-12-28       Impact factor: 6.417

6.  Intracellular localisation of PPI1 (proton pump interactor, isoform 1), a regulatory protein of the plasma membrane H(+)-ATPase of Arabidopsis thaliana.

Authors:  M C Bonza; T Fusca; U Homann; G Thiel; M I De Michelis
Journal:  Plant Biol (Stuttg)       Date:  2009-11       Impact factor: 3.081

7.  Genome sequence of the cultivated cotton Gossypium arboreum.

Authors:  Fuguang Li; Guangyi Fan; Kunbo Wang; Fengming Sun; Youlu Yuan; Guoli Song; Qin Li; Zhiying Ma; Cairui Lu; Changsong Zou; Wenbin Chen; Xinming Liang; Haihong Shang; Weiqing Liu; Chengcheng Shi; Guanghui Xiao; Caiyun Gou; Wuwei Ye; Xun Xu; Xueyan Zhang; Hengling Wei; Zhifang Li; Guiyin Zhang; Junyi Wang; Kun Liu; Russell J Kohel; Richard G Percy; John Z Yu; Yu-Xian Zhu; Jun Wang; Shuxun Yu
Journal:  Nat Genet       Date:  2014-05-18       Impact factor: 38.330

8.  Comparative transcriptomes profiling of photoperiod-sensitive male sterile rice Nongken 58S during the male sterility transition between short-day and long-day.

Authors:  Wei Wang; Zhenwei Liu; Zhibin Guo; Gaoyuan Song; Qin Cheng; Daiming Jiang; Yingguo Zhu; Daichang Yang
Journal:  BMC Genomics       Date:  2011-09-25       Impact factor: 3.969

9.  Experimental validation of a predicted feedback loop in the multi-oscillator clock of Arabidopsis thaliana.

Authors:  James C W Locke; László Kozma-Bognár; Peter D Gould; Balázs Fehér; Eva Kevei; Ferenc Nagy; Matthew S Turner; Anthony Hall; Andrew J Millar
Journal:  Mol Syst Biol       Date:  2006-11-14       Impact factor: 11.429

10.  Genome-wide analysis of DNA methylation in photoperiod- and thermo-sensitive male sterile rice Peiai 64S.

Authors:  Jihong Hu; Xiaojun Chen; Hongyuan Zhang; Yi Ding
Journal:  BMC Genomics       Date:  2015-02-19       Impact factor: 3.969

View more
  18 in total

1.  Expression profiling of immature florets of IR58025A, a wild-abortive cytoplasmic male sterile line of rice and its cognate, isonuclear maintainer line, IR58025B.

Authors:  K Pranathi; M B Kalyani; R M Sundaram; B C Viraktamath; S M Balachandran; S K Hajira; P Koteshwar Rao; S R Kulakarni; G Rekha; M Anila; M B V N Koushik; P Senguttuvel; A S Hariprasad; S K Mangrautia; M S Madhav
Journal:  3 Biotech       Date:  2019-06-21       Impact factor: 2.406

2.  Genome-wide comparative transcriptome analysis of the A4-CMS line ICPA 2043 and its maintainer ICPB 2043 during the floral bud development of pigeonpea.

Authors:  Abhishek Bohra; Abhishek Rathore; Prasad Gandham; Rachit K Saxena; S J Satheesh Naik; Dibendu Dutta; Indra P Singh; Farindra Singh; Meenal Rathore; Rajeev K Varshney; Narendra P Singh
Journal:  Funct Integr Genomics       Date:  2021-02-26       Impact factor: 3.410

3.  Transcriptome, cytological and biochemical analysis of cytoplasmic male sterility and maintainer line in CMS-D8 cotton.

Authors:  Li Yang; Yuanlong Wu; Meng Zhang; Jinfa Zhang; James McD Stewart; Chaozhu Xing; Jianyong Wu; Shuangxia Jin
Journal:  Plant Mol Biol       Date:  2018-07-31       Impact factor: 4.076

4.  A genome-wide analysis of pentatricopeptide repeat (PPR) protein-encoding genes in four Gossypium species with an emphasis on their expression in floral buds, ovules, and fibers in upland cotton.

Authors:  Zongfu Han; Yuxiang Qin; Xihua Li; Jiwen Yu; Ruzhong Li; Chaozhu Xing; Mingzhou Song; Jianyong Wu; Jinfa Zhang
Journal:  Mol Genet Genomics       Date:  2019-08-24       Impact factor: 3.291

5.  Exploration of miRNAs and target genes of cytoplasmic male sterility line in cotton during flower bud development.

Authors:  Hushuai Nie; Yumei Wang; Ying Su; Jinping Hua
Journal:  Funct Integr Genomics       Date:  2018-04-07       Impact factor: 3.410

6.  The cellulose synthase (CesA) gene family in four Gossypium species: phylogenetics, sequence variation and gene expression in relation to fiber quality in Upland cotton.

Authors:  Sujun Zhang; Zhenxing Jiang; Jie Chen; Zongfu Han; Jina Chi; Xihua Li; Jiwen Yu; Chaozhu Xing; Mingzhou Song; Jianyong Wu; Feng Liu; Xiangyun Zhang; Jinfa Zhang; Jianhong Zhang
Journal:  Mol Genet Genomics       Date:  2021-01-13       Impact factor: 3.291

7.  A super PPR cluster for restoring fertility revealed by genetic mapping, homocap-seq and de novo assembly in cotton.

Authors:  Bin Gao; Gaofeng Ren; Tianwang Wen; Haiping Li; Xianlong Zhang; Zhongxu Lin
Journal:  Theor Appl Genet       Date:  2021-11-22       Impact factor: 5.699

8.  Transcriptomic analysis of grapevine Dof transcription factor gene family in response to cold stress and functional analyses of the VaDof17d gene.

Authors:  Zemin Wang; Yi Wang; Qian Tong; Guangzhao Xu; Meilong Xu; Huayang Li; Peige Fan; Shaohua Li; Zhenchang Liang
Journal:  Planta       Date:  2021-02-01       Impact factor: 4.116

9.  Transcriptome analysis identified aberrant gene expression in pollen developmental pathways leading to CGMS in cotton (Gossypium hirsutum L.).

Authors:  Rasmieh Hamid; Hassan Marashi; Rukam S Tomar; Saeid Malekzadeh Shafaroudi; Pritesh H Sabara
Journal:  PLoS One       Date:  2019-06-24       Impact factor: 3.240

10.  Comparative transcriptome analysis between inbred and hybrids reveals molecular insights into yield heterosis of upland cotton.

Authors:  Kashif Shahzad; Xuexian Zhang; Liping Guo; Tingxiang Qi; Lisheng Bao; Meng Zhang; Bingbing Zhang; Hailin Wang; Huini Tang; Xiuqin Qiao; Juanjuan Feng; Jianyong Wu; Chaozhu Xing
Journal:  BMC Plant Biol       Date:  2020-05-27       Impact factor: 4.215

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.