
De novo transcriptome sequence assembly from coconut leaves and seeds with a focus on factors involved in RNA-directed DNA methylation.

Ya-Yi Huang1, Chueh-Pai Lee2, Jason L Fu1, Bill Chia-Han Chang2, Antonius J M Matzke3, Marjori Matzke3.   

Abstract

Coconut palm (Cocos nucifera) is a symbol of the tropics and a source of numerous edible and nonedible products of economic value. Despite its nutritional and industrial significance, coconut remains under-represented in public repositories for genomic and transcriptomic data. We report de novo transcript assembly from RNA-seq data and analysis of gene expression in seed tissues (embryo and endosperm) and leaves of a dwarf coconut variety. Assembly of 10 GB sequencing data for each tissue resulted in 58,211 total unigenes in embryo, 61,152 in endosperm, and 33,446 in leaf. Within each unigene pool, 24,857 could be annotated in embryo, 29,731 could be annotated in endosperm, and 26,064 could be annotated in leaf. A KEGG analysis identified 138, 138, and 139 pathways, respectively, in transcriptomes of embryo, endosperm, and leaf tissues. Given the extraordinarily large size of coconut seeds and the importance of small RNA-mediated epigenetic regulation during seed development in model plants, we used homology searches to identify putative homologs of factors required for RNA-directed DNA methylation in coconut. The findings suggest that RNA-directed DNA methylation is important during coconut seed development, particularly in maturing endosperm. This dataset will expand the genomics resources available for coconut and provide a foundation for more detailed analyses that may assist molecular breeding strategies aimed at improving this major tropical crop.
Copyright © 2014 Huang et al.


Keywords:  RNA-seq; coconut; endosperm; epigenetics; monocot


Year:  2014        PMID: 25193496      PMCID: PMC4232540          DOI: 10.1534/g3.114.013409

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


The coconut palm (Cocos nucifera L., Arecaceae) is one of the most important crops in tropical zones, and it plays a significant role in the economy and culture in many tropical countries (Gunn ). Coconut is the only species within the genus Cocos and it is morphologically classified into two types: tall and dwarf. Tall coconuts are outbreeding, whereas the dwarf variety, which is thought to result from human selection, is mainly self-pollinating (Gunn ). Despite the importance of coconut for humans and tropical ecosystems, available genetic sequences are relatively scarce as compared with other economically important palms, oil palm and date palm, for which whole genome sequences and transcriptome data are available (Yang ; Al-Dous ; Bourgis ; Fang ; Uthaipaisanwong ; Yin ; Zhang ; Al-Mssallem ; Dussert ; Singh ). In the case of coconut, much of the recent effort has been limited to marker development to assist with cultivar identification (Perera , 2003; Perera ; Gunn ). Molecular studies at a genome-wide scale using next-generation sequencing are even scarcer. The sequence of the coconut chloroplast genome was recently determined (Huang ). However, a whole genome sequence has not yet been reported and the only published genome-wide study of coconut is a transcriptome analysis of a tall variety using combined tissues of leaves and fruit flesh (Fan ). Additional work is thus needed to increase coconut genomic and transcriptomic resources, which may provide new insights for the discovery of novel genes linked to important agronomic traits. In addition to being a prominent tropical crop, coconut is also interesting to investigate from the perspective of comparative seed development. Flowering plants can be divided into two major groups. Coconut, rice, wheat, barley, and maize are examples of monocotyledonous plants (monocots), whereas Arabidopsis thaliana (Arabidopsis) and legumes are representatives of the dicotyledonous group (dicots). 
Double fertilization in both groups produces seeds containing a diploid embryo and a triploid endosperm, which acts as a nutrient store for the developing or germinating embryo. In some monocots, including cereals and coconut, the endosperm persists in the mature seed and supplies an important source of nutrition in the human diet. Coconut has the second largest seed in the world, exceeded only by Lodoicea maldivica, another palm species endemic to the Seychelles. Seed size, which is a determinant of crop yield, is dependent on many factors, including maternal influences, epigenetic processes, and endosperm growth (Sreenivasulu and Wobus 2013). Similar to other monocots and dicot plants with albuminous seeds such as coffee and castor bean (Joët ), the embryo in coconut is tiny compared with the abundant endosperm. In late developing and mature coconut seeds, the endosperm, which supplies the major edible portion of coconut, can be 100-times the weight of the corresponding embryo (Supporting Information, Figure S1). Epigenetic processes, including those involving small RNAs and components of the RNA-directed DNA methylation (RdDM) machinery (Matzke and Mosher 2014), have important roles during seed development. Epigenetic alterations of chromatin, which include not only DNA methylation but also various histone modifications, are necessary to regulate seed-specific genes and to protect genome integrity by silencing transposons. In addition, epigenetic modifications are required in the endosperm to establish parental imprinting (Gehring 2013), an epigenetic phenomenon of allele-specific expression that can influence seed size (Sreenivasulu and Wobus 2013). Embryo and endosperm transcriptomes have been determined for several other plant species, including Arabidopsis (Belmonte ) and rice (Xu ; Gao ). 
Comparative transcriptomes of fruit, seed, and mesocarp tissues with an emphasis on fatty acid composition and metabolism at different developmental stages were determined for oil palm and date palm (Bourgis ; Dussert ). However, detailed transcriptome studies during seed development have not yet been extended to many nonmodel plants. Coconut represents an interesting nonmodel plant that produces exceptionally large seeds containing copious amounts of endosperm at later stages of development and small but macroscopically visible embryos. Coconut seeds thus provide an opportunity to investigate the expression of epigenetic factors in a developmental context that is unique. With these considerations in mind, we have performed de novo transcriptome assembly of RNA-seq libraries prepared from seed tissues (mature embryos and gelatinous endosperm) and leaves of a dwarf coconut plant. We report the findings from our analysis of the transcriptome data from these three tissues with a focus on identification of putative homologs of factors required for RdDM.

Materials and Methods

Plant materials and RNA extraction

For this study, we used a dwarf variety of coconut provided by Mr. Chi-Tai Lin, a private breeder who imported the first batch of green dwarf coconuts with strong taro fragrance to Taiwan from Thailand approximately 30 yr ago and started to breed this particular variety in Hengchun peninsula, Pingtung County, southern Taiwan. RNA-Seq data were collected from maturing gelatinous endosperm (coconut approximately 5 months old, embryo invisible at this stage), nearly mature embryo (coconut at approximately 8 months old), and young leaf (seedling at approximately 8 months old) (Figure S1). Total RNA from each tissue was extracted using a Plant Total RNA Miniprep Purification Kit (GMbiolab Co, Ltd., Taichung, Taiwan). We used the ratio of absorbance at 260 nm and 280 nm (A260/280) and gel electrophoresis to measure the purity and integrity of the extracted RNA. The mRNA molecules were isolated from high-quality RNA (concentration >300 ng/µl; A260/230 > 1.7; A260/280 = 1.8–2.1).
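The RNA quality criteria quoted above can be written as a simple filter. The following is a minimal sketch, not the authors' procedure: the function name and the sample readings are hypothetical, and only the three thresholds (concentration >300 ng/µl, A260/230 > 1.7, A260/280 between 1.8 and 2.1) come from the text.

```python
def passes_rna_qc(conc_ng_per_ul: float, a260_230: float, a260_280: float) -> bool:
    """Return True if a sample meets the three thresholds quoted in the text."""
    return (
        conc_ng_per_ul > 300          # concentration > 300 ng/µl
        and a260_230 > 1.7            # A260/230 > 1.7
        and 1.8 <= a260_280 <= 2.1    # A260/280 within 1.8-2.1
    )


# Hypothetical spectrophotometer readings: (concentration, A260/230, A260/280).
samples = {
    "embryo": (412.0, 2.05, 1.95),
    "endosperm": (250.0, 2.00, 1.90),  # fails on concentration
    "leaf": (380.0, 1.50, 2.00),       # fails on A260/230
}

qc_result = {name: passes_rna_qc(*values) for name, values in samples.items()}
for name, ok in qc_result.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

Gel electrophoresis, which the text uses to judge RNA integrity, is not captured by this numeric check.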

Library construction and sequencing

We used the TruSeq RNA Sample Preparation kit (Illumina, San Diego, CA) for cDNA library construction. First, poly-T oligo-attached magnetic beads were used to isolate mRNA from total RNA. The isolated mRNA was fragmented and reverse-transcribed to single-stranded cDNA with random primers, forming a mixture of DNA/RNA hybrids. The second-strand cDNA was then synthesized and purified. During second-strand synthesis, deoxythymidine triphosphate (dTTP) was substituted with deoxyuridine triphosphate (dUTP), which helped enforce strand specificity (Wang ). The resulting double-stranded cDNA was end-repaired and adenylated, followed by poly A tailing and paired-end Y-adaptor ligation, and then treated with uracil-DNA glycosylase (UDG) to digest the dUTP-marked strand. The selected strand was amplified and screened by agarose gel electrophoresis. Fragments ranging from 350 bp to 520 bp were selected for sequencing. Paired-end sequencing on the Illumina HiSeq 2000/2500 was performed at YourGene Bioscience Co. (New Taipei City, Taiwan) with a maximum read length of 201 bp and a minimum read length of 101 bp. All raw reads were deposited in the Sequence Read Archive (SRA) at NCBI under the accession number SRP041201.

De novo assembly, gene annotation, and expression, GO, and KEGG annotation

Bases of raw reads resulting from Illumina sequencing were trimmed with an error probability of 0.05 using CLC Genomics Workbench 6.0.1 (CLC bio, Aarhus, Denmark). Trimmed reads shorter than 35 bp were filtered out. Clean reads were assembled initially with Velvet 1.2.07 (Zerbino and Birney 2008) using five k-mer values (35, 45, 55, 65, and 75), and the resulting contigs were merged. The final contigs produced by Velvet were passed to Oases 0.2.06 (Schulz ) for transcript assembly and isoform construction. For each locus, the transcript isoform with the highest confidence score was selected as the unigene; if two isoforms of the same locus had the same confidence score, the longer one was selected. Unigene sequences were deposited in the Transcriptome Shotgun Assembly (TSA) Sequence Database at NCBI under the accession numbers GBGL00000000 for embryo, GBGK00000000 for endosperm, and GBGM00000000 for leaf transcriptomes. To evaluate the expression of unigenes, we first mapped trimmed reads to the unigene sequences using the gapped alignment mode of Bowtie 2.2.1.0 (Langmead ). After alignment, we quantified gene expression with the software package eXpress 1.3.0 (Roberts and Pachter 2013), which reports unigene abundance as fragments per kilobase of transcript per million mapped reads (FPKM). Unigenes were annotated by BLAST (blastx) search against the nonredundant (nr) protein database maintained by NCBI with an E-value cutoff of 10^-5. The nr protein database is a combination of SwissProt, SwissProt updates, the Protein Identification Resource (PIR), and the Protein Data Bank (PDB). 
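The unigene selection rule and the FPKM unit described above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the actual pipeline: the `Isoform` record, its field names, and the example values are hypothetical, and the `fpkm` function uses the textbook definition, whereas eXpress estimates abundances probabilistically using effective transcript lengths.

```python
from typing import Dict, Iterable, NamedTuple


class Isoform(NamedTuple):
    """Hypothetical record for one assembled transcript isoform."""
    locus: str
    name: str
    confidence: float
    length: int


def select_unigenes(isoforms: Iterable[Isoform]) -> Dict[str, Isoform]:
    """Keep one isoform per locus: highest confidence, ties broken by length."""
    best: Dict[str, Isoform] = {}
    for iso in isoforms:
        current = best.get(iso.locus)
        # Tuple comparison applies confidence first, then length.
        if current is None or (iso.confidence, iso.length) > (current.confidence, current.length):
            best[iso.locus] = iso
    return best


def fpkm(fragments: float, length_bp: int, total_mapped_fragments: float) -> float:
    """Textbook FPKM: fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical isoforms for two loci.
example = [
    Isoform("Locus_1", "iso_a", 0.9, 500),
    Isoform("Locus_1", "iso_b", 0.9, 800),   # same score, longer: selected
    Isoform("Locus_1", "iso_c", 0.5, 2000),  # longer but lower score
    Isoform("Locus_2", "iso_d", 0.3, 300),
]
unigenes = select_unigenes(example)
print({locus: iso.name for locus, iso in unigenes.items()})
print(fpkm(fragments=100, length_bp=1000, total_mapped_fragments=1_000_000))
```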
To determine the similarity of our transcriptome data with previous transcriptome data from palms, we performed BLAST search against available EST sequences of Cocos nucifera (1005 sequences), Elaeis guineensis (40,920 sequences), and Phoenix dactylifera (411 sequences) downloaded from GenBank. Moreover, we also aligned our unigenes against transcripts of fruit and seed tissues of oil palm (minimum 20 reads per million; 20,077 sequences in total) (Dussert ). To obtain the Gene Ontology (GO) annotation and perform enrichment analysis, accession numbers acquired from BLAST search were queried in the GO database using BLAST2GO 2.6.4 (Conesa ; Gotz ). For analysis of individual GO category enrichment, the reference dataset was obtained from the summation of unigenes matching to the same accession numbers. The significance of the enrichment was evaluated by Fisher’s exact test. To understand high-level functions and utilities of the biological systems, we also performed the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway annotation as described by Kanehisa .
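As an illustration of the enrichment test mentioned above, the sketch below computes a one-sided Fisher's exact test P value (over-representation) from the hypergeometric distribution. It is not the BLAST2GO implementation; the one-sided alternative, the function name, and the example counts are assumptions made for illustration.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: unigenes in the reference set
    K: reference unigenes annotated with the GO term
    n: unigenes in the test set (drawn from the reference set)
    k: test-set unigenes annotated with the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: 30 of 200 test unigenes carry a term that
# annotates 400 of 10,000 reference unigenes.
p_value = fisher_enrichment_p(k=30, n=200, K=400, N=10_000)
print(f"P = {p_value:.3g}")
```

Exact integer arithmetic keeps the sum stable for large counts; only the final division is done in floating point.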

Factors involved in RNA-directed DNA methylation

Factors involved in RNA-directed DNA methylation (RdDM) were manually selected from annotated unigene pools. The sequences indicated in Table 3 were validated by Sanger sequencing and submitted to GenBank. They can be accessed through the following accession numbers: KJ851186–KJ851206. Real-time PCR (qPCR) was used for transcription profiling of four key RdDM-related genes (DRM, NRPD1, NRPE1, and MET1) in three and two different developmental stages of endosperm and embryo, respectively. The three endosperm stages were gelatinous, early white solid (thin layer), and late white solid (thick layer). The two embryos were collected from the same inflorescence that was used for collection of late white solid endosperm. Primers were designed based on assembled sequences, and a complete primer list is provided in Table S1.
Table 3

Putative RdDM factors and related RNAi proteins identified in coconut tissues

Columns: Name; Transcript ID; Length (nucleotide); Length (amino acid); BLASTX best hit organism (name and ID); Length (AA); Coverage (%); Identity (%); Accession
RPA1Endo_Locus_1915154171661Oryza sativa (BAD35511)17489556KJ851186
RPB1Leaf_21153181536Oryza brachyantha (XP_006659128)18558694KJ851187
NRPD1Leaf_Locus_31931712568Oryza brachyantha (XP_006653694)13869957KJ851188
NRPE1Endo_Locus_1000952441672Aegilops tauschii (EMT23234)20179552KJ851189
DCL1Endo_Locus_371954901534Vitis vinifera (XP_002268369)19719385KJ851190
DCL3aEndo_Locus_2573637771206Vitis vinifera (XP_002280293)16489561KJ851191
DCL3bEndo_Locus_67693715966Oryza sativa (ABB20894)11167843KJ851192
DCL4Embryo_Locus_788155761617Setaria italica (XM_004976122)16328663KJ851193
AGO1Endo_Locus_6733410864Vitis vinifera (XP_002271225)10857689KJ851194
AGO2Embryo_Locus_4735918251Oryza sativa (NP_001053871)10348246KJ851195
AGO4Endo_Locus_254172258659Vitis vinifera (XP_002275928)9138777KJ851196
AGO10Embryo_Locus_155171079313Vitis vinifera (XP_002279408)9953148KJ851197
RDR 1Endo_Locus_54052406134Vitis vinifera (XP_002284914)11219974KJ851198
RDR 2Embryo_Locus_433072041616Vitis vinifera (XP_002280099)11279067KJ851199
RDR 5Leaf_Locus_250011809590Populus trichocarpa (XP_006380469)8999760KJ851200
RDR 6Endo_Locus_130613140711Nicotiana tabacum (ADI52625)11977534KJ851201
MET1Endo_Locus_2612147751518Elaeis guineensis (ABW96888)15439897KJ851202
DRMLeaf_Locus_26272022554Elaeis guineensis (ABW96890)5918286KJ851203
CMTEndo_Locus_41912245603Elaeis guineensis (ABW96889)9259489KJ851204
ROS1Endo_Locus_23310620108Citrus sinensis (AGU16984)15735267KJ851205
DRD1Endo_Locus_3125044593Theobroma cacao (EOX97924) 8998996262KJ851206

Note: In a search of a list of date palm expressed genes (Supplementary Data 1, Al-Mssallem et al. 2013), we could find one RPA1 (KacstDP.mRNA.S000004.203), three AGO proteins (KacstDP.mRNA.S000249.22, KacstDP.mRNA.S001251.1 and KacstDP.mRNA.S000009.162) and one RDR2 (KacstDP.mRNA.S000670.5) proteins. In a BLAST search of oil palm unannotated, de novo assembled transcripts (Supplemental Data 1, 2 and 3, Dussert et al. 2013), we identified four RPB1 (CL1714Contig3, CLContig342, CL1Contig406 and CL1Contig7380), one NRPE1 (CL1715Contig1), one DCL3b (CL1841Contig3), five AGO1 (CL1Contig1334, CL1Contig4087, CL1Contig7749, CL1Contig2199 and CL1721Contig1), four AGO4 (CL1Contig2911, CL1Contig7017, CL665Contig1 and CL665Contig3), one RDR1 (CL1Contig877), one DRM (CL1Contig4186) and one DRD1 (CL3348Contig3) proteins.

Results and Discussion

Sequencing and de novo assembly

A summary of the sequencing results and de novo assembly is presented in Table 1. A total of 10 GB of raw sequence was obtained for each tissue. Of the three tissues, the leaf transcriptome has the most sequencing reads (121,151,552), with an average read length of 126 bp and an insert size of 286 bp, whereas the embryo transcriptome has the fewest (81,128,552), with an average read length of 168.5 bp. After quality trimming, the average read lengths are 158.3 bp, 97.9 bp, and 120.36 bp in the transcriptomes of embryo, endosperm, and leaf, with insert sizes of 430 bp, 306 bp, and 286 bp, respectively. The de novo assembly of clean reads produced the most total transcripts in endosperm (229,866) and the fewest in embryo (86,254). The former has 61,152 total unigenes with an average length of 684 bp and a GC content of 45%, whereas the latter has 58,211 total unigenes with an average length of 732 bp and a GC content of 41%. Interestingly, although the leaf transcriptome has more total transcripts (159,509) than embryo, it has far fewer total unigenes (33,446) than the embryo transcriptome. However, its unigenes have the longest average length (744 bp) and highest GC content (48%). The length distribution of total unigenes found in the three tissues is shown in Figure S2.
Table 1

Sequencing results and de novo assembly

Sequencing Results and De Novo Assembly | Embryo | Endosperm | Leaf
Sequencing results
Total Illumina reads (No) | 81,128,552 | 103,080,366 | 121,151,552
Average read length (bp) | 168.5 | 101 | 126
Total base (No) | 13,670,161,012 | 10,411,116,966 | 15,265,095,552
Total reads after QT (No) | 78,553,435 | 99,253,456 | 120,061,360
Average read length after QT (bp) | 158.3 | 97.9 | 120.36
Total clean base (No) | 12,435,008,760 | 9,682,020,834 | 13,609,335,965
Insert size (bp) | 430 | 306 | 286
De novo assembly
Total transcripts (No) | 86,254 | 229,866 | 159,509
Total unigenes (No) | 58,211 | 61,152 | 33,446
Average contig length of unigene (bp) | 732 | 684 | 744
Unigenes with multiple hits (No) | 24,857 | 29,731 | 26,064
Unigenes with unique hits (No) | 23,836 | 22,278 | 20,844
N50 | 951 | 969 | 912
GC content of unigene (%) | 41 | 45 | 48
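Table 1 reports an N50 value for each assembly. For readers unfamiliar with the statistic, the sketch below shows the usual definition: the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembled length. The function name and the toy lengths are hypothetical, and the assembler's own N50 calculation may differ in detail.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 is undefined for an empty set of contigs")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety


# Toy example: total length is 20, so half is 10; 6 + 5 = 11 reaches it at contig length 5.
toy_n50 = n50([2, 3, 4, 5, 6])
print(toy_n50)
```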

Gene expression and annotation

Our BLAST search of unigenes against the NCBI nr protein database with an E-value cutoff of 10^-5 identified 24,857 (42.7%), 29,731 (48.6%), and 26,064 (77.9%) annotated unigenes with multiple matches from embryo, endosperm, and leaf transcriptomes, respectively (Table 1). Of the annotated unigenes with a unique match, 2987 were shared by all three tissues, 3225 were shared by endosperm and embryo, 4852 were shared by endosperm and leaf, and 2311 were shared by embryo and leaf (Figure 1). The numbers of tissue-specific unigenes are 12,772 in endosperm, 12,321 in embryo, and 12,128 in leaf (Figure 1).
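The annotation percentages quoted above follow directly from the unigene totals in Table 1. A minimal sketch of the arithmetic (variable and function names are illustrative only):

```python
# Total unigenes and annotated unigenes per tissue, as reported in Table 1.
total_unigenes = {"embryo": 58_211, "endosperm": 61_152, "leaf": 33_446}
annotated_unigenes = {"embryo": 24_857, "endosperm": 29_731, "leaf": 26_064}


def annotation_rate(annotated: int, total: int) -> float:
    """Percentage of unigenes with at least one BLAST match."""
    return 100 * annotated / total


rates = {
    tissue: round(annotation_rate(annotated_unigenes[tissue], total), 1)
    for tissue, total in total_unigenes.items()
}
for tissue, rate in rates.items():
    print(f"{tissue}: {rate}% annotated")
```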
Figure 1

Venn chart showing unique and shared unigenes found in three coconut transcriptomes.

Complete gene expression lists (FPKM > 1) from the embryo, endosperm, and leaf transcriptomes are provided as Table S2, Table S3, and Table S4. Expression profiling of the top 50 expressed genes (Figure 2) reflects the physiological characteristics and/or function of the different tissues. As expected, photosynthetic genes are found almost exclusively in the leaf transcriptome, with the most abundant transcript encoding the small subunit of ribulose-1,5-bisphosphate carboxylase (Table 2). In endosperm, which was sampled at the gelatinous stage when cells are actively dividing (Abraham and Mathew 1963), transcripts for cytoskeletal and translational proteins are abundant. Alpha-tubulin, which forms microtubules in the mitotic spindle, is the most highly transcribed gene in endosperm, and translationally controlled tumor protein (Amson ), which binds to microtubules (Santa Brigida ) and regulates cell division (Brioudes ), is in the top 10 (Table 2). Other highly expressed genes in coconut endosperm (Table 2), including annexin, enolase, and metallothionein type 2A (MT2A), are also among those expressed at a high level in endosperm of castor bean (Lu ) and Brassica napus (Huang ).
Figure 2

Expression profile of top 50 expressed genes in the three tissues. The colors denote absence (white) and presence (red) of a particular gene transcript. Photosynthetic genes are almost exclusively found in the leaf transcriptome. Seed storage (7S globulin) and heat shock proteins are prominent in the embryo. Translational and cytoskeleton proteins are abundant in embryo and in endosperm, but rarely found in leaf. Cell wall–associated hydrolase and major intrinsic proteins are evenly distributed in three tissues. Uncharacterized proteins exist in all three tissues, but unigenes without matched sequences in GenBank are found only in embryo and in leaf, not in endosperm.

Table 2

Top expressed genes identified in three tissues of coconut

Embryo: Transcript Annotation | Embryo: FPKM | Endosperm: Transcript Annotation | Endosperm: FPKM | Leaf: Transcript Annotation | Leaf: FPKM
Metallothionein type 2a-FL | 12328.2 | Alpha-tubulin | 11020.0 | Chloroplast RubisCO small subunit | 71647.2
7S globulin | 8069.0 | Dehydration responsive protein | 7930.5 | Os08g0560900 | 29745.4
Aldose reductase-like protein | 2412.7 | Unnamed protein product | 6377.5 | Ubiquitin | 17929.8
Long chain acyl-CoA synthetase 4-like | 2410.7 | Elongation factor 1-alpha | 5994.5 | Mitochondrial protein | 17284.7
Cell wall–associated hydrolase | 1910.6 | Translationally controlled tumor protein | 5012.9 | ASCAB9 | 13504.6
GA-stimulated transcript-like protein 6 | 1817.0 | Cell wall–associated hydrolase | 4972.3 | Chloroplast chlorophyll a/b binding protein | 12838.5
Zn3H1 domain-containing protein 49-like | 1391.0 | SORBIDRAFT_04g032970 | 4285.5 | Photosystem I reaction center subunit XI | 12695.6
1-Cys peroxiredoxin | 1322.4 | Sorbitol dehydrogenase-like protein | 4042.6 | No annotation | 8909.8
Eukaryotic translation initiation factor 1A-like | 1316.9 | RRNA intron-encoded homing endonuclease | 3847.1 | Unknown | 8812.3
OsI_08334 | 1304.3 | Ribosomal protein L32 | 3650.3 | Early light-induced protein 2 | 8496.1
AKIN beta1 | 1239.4 | Polyubiquitin | 3581.0 | Unnamed protein product | 6946.9
Heat shock protein 17a | 1179.5 | Metallothionein type 2a-FL | 3522.5 | Oxygen-evolving enhancer protein 2 | 5528.4
Tonoplast intrinsic protein | 1175.2 | Early nodulin 55-2 precursor | 3063.2 | Predicted protein | 5031.4
Actin | 1089.8 | Unknown | 2991.1 | POPTRDRAFT_726168 | 5019.5
ZEAMMB73_780902 | 1013.8 | Thiazole biosynthetic enzyme | 2620.5 | Glycine hydroxymethyltransferase | 4332.1
No annotation | 974.0 | Predicted protein | 2568.3 | Cell wall–associated hydrolase | 4316.6
Aldose reductase isoform 1 | 963.7 | Glutaredoxin-1 | 2499.4 | Lipid transfer protein | 4310.7
ZEAMMB73_726804 | 881.6 | Annexin | 2453.6 | Probable histone H2A.4 isoform 1 | 4241.2
ZEAMMB73_749085 | 857.3 | SORBIDRAFT_01g005010 | 2298.0 | LOC100788142 | 4080.4
Histidine decarboxylase | 820.8 | Thioredoxin h1 | 2243.8 | MTR_5g051050 | 3984.3
OsI_32485 | 764.1 | 1-Cys peroxiredoxin | 2138.4 | 40S ribosomal protein s2 | 3525.6
Unnamed protein product | 736.0 | Enolase | 2113.7 | Senescence-associated protein 4 | 3431.4
60S ribosomal protein L7-like | 693.9 | Calmodulin | 2091.7 | Glycolate oxidase | 3425.2
In mature coconut embryos, genes encoding 7S globulin storage protein are among the most highly transcribed (Table 2), as they are in oil palm embryos at later stages of development (Morcillo ). However, the most highly transcribed gene in embryos encodes MT2A (Table 2). MTs are small cysteine-rich proteins with proposed roles in stress responses and metal storage, transport, and detoxification (Leszczyszyn ). The first plant MTs were discovered in wheat embryos (Leszczyszyn ), and our finding of high expression of MT2A in mature coconut embryos is consistent with a crucial but still undefined role late in embryogenesis. High expression of MTs has also been observed in ripening pineapple (Moyle ) and banana (Liu ), as well as developing Douglas fir seeds (Chatthai ) and oil palm embryoids and embryonic callus (Low ). Uncharacterized proteins exist in all three tissues, but unigenes without matched sequences in GenBank are found only in embryo and leaf, not endosperm (Figure 2). Our BLAST search showed that a substantial proportion of unigenes in the embryo transcriptome (57%) and leaf transcriptome (22%) do not have any match in GenBank (Figure 3). 
Matches between coconut unigenes and entries in the nr protein database showed that monocot species give the best matches (highest total score), particularly members of the Poaceae (true grass) family, which account for 16%, 38%, and 28% of the best matches in embryo, endosperm, and leaf transcriptomes (Figure 3). Somewhat surprisingly, the species with the second-best match was Vitis vinifera (common grape vine), which accounts for 11%, 30%, and 23% of the best matching sequences in embryo, endosperm, and leaf transcriptomes. The significance of these similarities in protein sequence between two rather distantly related plants (monocot coconut and dicot grape) is not yet clear, but a relatively high percentage of matches to Vitis was also found for other monocots such as date palm (Al-Dous ), pineapple (Ong ), and banana (Passos ). Other eudicots together account for 10%–25% of the best matches. A small proportion of unigenes matched members of basal angiosperms, gymnosperms, and ferns. Very few unigenes matched sequences of nonvascular plants (bryophytes) or green algae.
Figure 3

Species distribution of coconut transcripts (FPKM >1.00) resulting from de novo assembly. Sections <2% are not labeled.

Although our search against the nr protein database found only a few matches to palm sequences (1%–3%) (Figure 3), the search against palm EST sequences (42,336 sequences in total) showed that the matches between our datasets and palm EST sequences are 13,445 (23.10%) for embryo, 15,835 (25.90%) for endosperm, and 14,730 (44.04%) for leaf transcriptomes. However, the BLAST search against 20,077 oil palm transcripts (Dussert ) showed that the matches for our embryo, endosperm, and leaf unigenes are 15,906 (27.3%), 16,034 (26.22%), and 16,010 (47.84%), respectively.

GO annotation

Our GO annotation at level two assigned 88,224, 123,063, and 106,751 GO terms to annotated unigenes with unique hits of embryo (20,844), endosperm (22,278), and leaf (23,836) transcriptomes (Figure 1) under the categories of cellular component, molecular function, and biological process. On average, three, five, and four GO terms were assigned per unigene for embryo, endosperm, and leaf transcriptomes, respectively. A similar distribution pattern of GO classification was observed among the three tissues (Figure S3), with the majority of the GO terms being assigned to biological process distributed in 21 subcategories (∼49%), followed by cellular component in eight subcategories (∼31%) and molecular function in 12 subcategories (∼20%). Within biological process, proteins participating in cellular and metabolic processes are the most abundant, accounting for 23% and 21% of the total GO terms of this category in the three tissues. Within cellular component, the dominant proteins are those assigned to cell and organelle, which account for 37% and 29% of the total GO terms of this category. Within molecular function, proteins with catalytic activity and binding account for 43% and 41% of the total GO terms of this category. A similar distribution pattern has also been demonstrated in a tall coconut variety by Fan , as well as in banana (Passos ) and pineapple (Ong ). We performed further GO analysis at level eight. Details of each main category can be found in the Supporting Information (Figure S4 and Figure S5 and Table S5, Table S6, Table S7). In addition, we assessed the enrichment of annotated GO terms at level eight for the category of biological process. This analysis indicated that three GO terms are significantly enriched (P < 0.01) in embryo, 24 are significantly enriched in endosperm, and 14 are significantly enriched in leaf transcriptomes (Figure 4). 
As anticipated, GO terms associated with photosynthesis and light responses are prominent in the leaf transcriptome. Consistent with gelatinous endosperm comprising actively dividing cells engaged in epigenetic processes, GO terms associated with chromatin modifications and assembly, RNA metabolism, and mitosis are enriched in the endosperm transcriptome (Figure 4).
Figure 4

Analysis of GO enrichment at level eight.


KEGG analysis

A KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis was performed to identify active biological pathways in the coconut tissues under investigation. This analysis assigned 5686 unigenes to 138 pathways for the embryo transcriptome. There were 7707 unigenes assigned to 138 pathways and 5686 unigenes assigned to 139 pathways for endosperm and leaf transcriptomes, respectively (Table S8). Our results are comparable with a previous RNA-seq transcriptome study of pooled coconut tissues (spear leaves, young leaves, and fruit flesh, Hainan Tall cultivars) that identified 57,304 unigenes, 23,168 of which could be mapped to 215 KEGG pathways (Fan ). As noted in the previous study (Fan ), it is interesting to consider genes involved in fatty acid biosynthesis and metabolism. Derived from coconut meat (mature endosperm), coconut oil contains a high proportion of medium chain fatty acids, including commercially important lauric acid (Kumar 2011). In their pooled coconut transcriptomes, Fan found 347 genes involved in five pathways of fatty acid synthesis and metabolism (fatty acid biosynthesis, unsaturated fatty acid, citrate cycle, fatty acid metabolism, and fatty acid elongation). We identified 230, 361, and 335 unigenes for embryo, endosperm, and leaf transcriptomes in these pathways (Table S8). Unigenes for fatty acid biosynthesis, elongation, and metabolism were most highly represented in the endosperm transcriptome (Table S8). A comparative transcriptome analysis in three oil palm fruit and seed tissues revealed expression in endosperm of many genes involved in fatty acid synthesis (Dussert ).

RdDM factors

RNA interference (RNAi) is an umbrella term describing gene silencing pathways that use Dicers, Argonautes (AGO), and RNA-dependent RNA polymerases (RDR) to make and use small RNAs (20–30 nt in length) that elicit sequence-specific gene silencing (Martínez de Alba ). Genes encoding these factors have amplified and functionally diversified in plants (Xie ; Kapoor ; Liew ; Nakasugi ; Liu ). The Arabidopsis genome encodes four DICER-LIKE (DCL), 10 AGO, and six RDR proteins (Xie ), whereas rice has eight DCL, 19 AGO, and five RDR genes (Kapoor ). In the combined data set from the three coconut tissues, we identified at least partial transcripts for four DCL, four AGO, and four RDR proteins (Table 3), including those specialized for RdDM (discussed below). It is likely that additional members of these families remain to be identified in coconut once a whole genome sequence is available and RNA-seq technology is extended to other tissue types, developmental stages, and environmental conditions. 
RdDM is a specialized nuclear branch of RNAi in plants that, in Arabidopsis, uses DCL3, AGO4, and RDR2. In addition, RdDM requires two plant-specific, RNA polymerase II (Pol II)-related RNA polymerases, called Pol IV and Pol V, as well as a number of accessory factors including putative chromatin remodelers of the defective in RNA-directed DNA methylation 1 (DRD1) subfamily of SNF2 ATPases (Bargsten et al.). Pol IV is responsible for generating transcripts that are copied by RDR2 to produce double-stranded RNA precursors, which are then processed into 24-nt siRNAs by DCL3. After being loaded onto AGO4, which interacts with Pol V, the 24-nt siRNAs are thought to base-pair with Pol V-generated scaffold RNAs and guide the DNA methyltransferase domains rearranged methyltransferase-2 (DRM2) to catalyze cytosine methylation at the target DNA region. Methylation occurs at cytosines in all sequence contexts (CG, CHG, and CHH, where H is A, T, or C) and can be maintained at CG and CHG nucleotide groups during subsequent rounds of DNA replication in the absence of the RNA trigger by the maintenance methyltransferases methyltransferase-1 (MET1) and chromomethylase-3 (CMT3), respectively (Matzke and Mosher 2014). We identified partial or nearly full-length transcripts for a number of factors involved in RdDM, including the largest subunits of Pol IV and Pol V (NRPD1 and NRPE1, respectively), as well as AGO4, DRD1, and the three major DNA methyltransferases, MET1, DRM, and CMT3 (Table 3). Notably, the relative abundance of most of these factors is highest in endosperm tissue compared with leaves and embryos (Figure 5). In particular, the relative abundance of the largest subunits of Pol IV and Pol V is higher in endosperm than in embryos or leaves. By contrast, the largest subunits of Pol I (RPA1) and Pol II (RPB1) have similar relative abundances in the three tissues.
Although RDR2 is expressed in endosperm, its relative abundance there is not markedly higher than in the other tissues (Figure 5). Interestingly, a gene encoding a protein similar to the repressor of silencing-1/Demeter (ROS1/DME) family of DNA glycosylase/lyases involved in active DNA demethylation is highly expressed in endosperm. This finding is significant in view of the requirement for DME in establishing parental imprints in Arabidopsis and rice endosperm (Gehring 2013).
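The three cytosine sequence contexts described above (CG, CHG, and CHH, where H is A, T, or C) can be made concrete with a short sketch. The Python below is illustrative only and is not part of the analysis reported here: the function names and the example sequence are hypothetical, only the strand as given is scanned, and the lookup of maintenance enzymes simply restates the assignments in the text (MET1 for CG, CMT3 for CHG, none named for CHH).

```python
from collections import Counter

# Maintenance methyltransferases as stated in the text above.
# No maintenance enzyme is named there for CHH, hence None.
MAINTENANCE = {"CG": "MET1", "CHG": "CMT3", "CHH": None}


def cytosine_context(seq, i):
    """Classify the cytosine at index i of seq as 'CG', 'CHG', or 'CHH'.

    Returns None if seq[i] is not C, if the context runs past the end of
    the sequence, or if it contains a base other than A, C, G, or T.
    Only the given strand is considered.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    nxt = seq[i + 1 : i + 3]  # up to two bases downstream of the C
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or any(base not in "ACGT" for base in nxt):
        return None
    # Here nxt[0] is A, C, or T (that is, H).
    return "CHG" if nxt[1] == "G" else "CHH"


def count_contexts(seq):
    """Count classifiable cytosines in seq by sequence context."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return dict(Counter(c for c in contexts if c is not None))


if __name__ == "__main__":
    example = "ACGTCAGCTTC"  # hypothetical sequence, for illustration only
    for context, n in sorted(count_contexts(example).items()):
        print(context, n, "maintained by", MAINTENANCE[context])
```

In this hypothetical sequence, the cytosines at positions 1, 4, and 7 (0-based) fall in CG, CHG, and CHH contexts, respectively, and the final cytosine is left unclassified because its downstream bases are missing. A real analysis would also need to scan the reverse-complement strand.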
Figure 5

Relative abundance of RdDM-associated gene transcripts found in three tissues of coconut.

Consistent with the importance of RdDM during seed development, the expression of DRM, NRPD1, NRPE1, and MET1 increases during endosperm maturation but is reduced in fully mature endosperm comprising a thick solid layer (Figure 6A). A decrease in expression of these factors is seen in older embryos as compared with younger embryos (Figure 6B).
Figure 6

(A) Relative quantity of four RdDM-related genes (DRM, NRPD1, NRPE1, and MET1) in three different developmental stages of endosperm. (B) Relative quantity of the same four genes in two different developmental stages of embryo.

Similar to rice (Song et al.), transcripts of two DCL3-related enzymes were detected in coconut (Table 3). DCL3a, which is conserved in dicots and monocots (Kapoor et al.), was represented by transcripts in all three tissues examined, with the highest level in endosperm (Figure 5). By contrast, DCL3b/DCL5 (Margis et al.; Fei et al.), which is monocot-specific and expressed preferentially in reproductive tissues and developing seeds (Kapoor et al.; Song et al.), was detected only in endosperm tissue (Figure 5). In rice, DCL3a is responsible for producing 24-nt siRNAs that induce RdDM, whereas DCL3b/DCL5 produces phased 24-nt siRNAs that are also likely to induce RdDM during reproduction (Song et al.; Fei et al.). Our results extend the findings of a distinct DCL3b/DCL5 enzyme to developing seeds of a nonmodel monocot plant. We also identified transcripts of other DCL, AGO, and RDR family members that are involved in small RNA-mediated silencing at the posttranscriptional level. Some of these are also relatively highly expressed in endosperm compared with leaves and embryo. These include DCL1 and AGO1, which act in microRNA pathways important for development, as well as DCL4 and RDR6, which produce endogenous 21-nt siRNAs for viral defense and trans-acting siRNAs that influence developmental timing (Willmann et al.; Martínez de Alba et al.; Poulsen et al.). Interestingly, transcripts of AGO2 and AGO10, which act similarly to AGO1 in miRNA pathways but in more specialized contexts (Poulsen et al.), are most abundant in embryos. In one or more of the tested tissues, we also detected transcripts for RDR proteins whose functions are less well characterized, including RDR1 and RDR5 (Willmann et al.) (Table 3 and Figure 5).
Our results suggest that RdDM and other small RNA-mediated silencing pathways are active in coconut seeds, particularly in maturing endosperm. This finding is consistent with previous studies of other plants indicating that RdDM is late-acting in endosperm development (Belmonte et al.; Gehring 2013). Future studies will focus on examining transcription of these factors more thoroughly at different stages of endosperm and embryo development.

Conclusions

Our study increases transcriptomic resources for coconut and provides a foundation for further functional and molecular studies that will inform efforts to improve coconut through molecular breeding and genetic engineering technologies. Our transcript profiles for leaves, endosperm, and embryos reveal highly expressed genes that can eventually help to identify strong tissue-specific promoters for future use in coconut biotechnology. Our analysis of RdDM factors in coconut expands the range of plants for which sequence information on these proteins is available and broadens our knowledge of epigenetic contributions to seed development in a nonmodel crop plant.
  55 in total

1.  Developing pineapple fruit has a small transcriptome dominated by metallothionein.

Authors:  Richard Moyle; David J Fairbairn; Jonni Ripi; Mark Crowe; Jose R Botella
Journal:  J Exp Bot       Date:  2004-11-01       Impact factor: 6.992

Review 2.  Seed-development programs: a systems biology-based comparison between dicots and monocots.

Authors:  Nese Sreenivasulu; Ulrich Wobus
Journal:  Annu Rev Plant Biol       Date:  2013-02-28       Impact factor: 26.379

3.  Comparative transcriptome and metabolite analysis of oil palm and date palm mesocarp that differ dramatically in carbon partitioning.

Authors:  Fabienne Bourgis; Aruna Kilaru; Xia Cao; Georges-Frank Ngando-Ebongue; Noureddine Drira; John B Ohlrogge; Vincent Arondel
Journal:  Proc Natl Acad Sci U S A       Date:  2011-06-27       Impact factor: 11.205

4.  Large-scale collection and annotation of gene models for date palm (Phoenix dactylifera, L.).

Authors:  Guangyu Zhang; Linlin Pan; Yuxin Yin; Wanfei Liu; Dawei Huang; Tongwu Zhang; Lei Wang; Chengqi Xin; Qiang Lin; Gaoyuan Sun; Mohammed M Ba Abdullah; Xiaowei Zhang; Songnian Hu; Ibrahim S Al-Mssallem; Jun Yu
Journal:  Plant Mol Biol       Date:  2012-06-27       Impact factor: 4.076

Review 5.  RNA-directed DNA methylation: an epigenetic pathway of increasing complexity.

Authors:  Marjori A Matzke; Rebecca A Mosher
Journal:  Nat Rev Genet       Date:  2014-05-08       Impact factor: 53.242

6.  Comparative transcriptome analysis of three oil palm fruit and seed tissues that differ in oil content and fatty acid composition.

Authors:  Stéphane Dussert; Chloé Guerin; Mariette Andersson; Thierry Joët; Timothy J Tranbarger; Maxime Pizot; Gautier Sarah; Alphonse Omore; Tristan Durand-Gasselin; Fabienne Morcillo
Journal:  Plant Physiol       Date:  2013-06-04       Impact factor: 8.340

7.  Independent origins of cultivated coconut (Cocos nucifera L.) in the old world tropics.

Authors:  Bee F Gunn; Luc Baudouin; Kenneth M Olsen
Journal:  PLoS One       Date:  2011-06-22       Impact factor: 3.240

8.  Streaming fragment assignment for real-time analysis of sequencing experiments.

Authors:  Adam Roberts; Lior Pachter
Journal:  Nat Methods       Date:  2012-11-18       Impact factor: 28.547

9.  Probing the endosperm gene expression landscape in Brassica napus.

Authors:  Yi Huang; Liang Chen; Liping Wang; Kannan Vijayan; Sieu Phan; Ziying Liu; Lianglu Wan; Andrew Ross; Daoquan Xiang; Raju Datla; Youlian Pan; Jitao Zou
Journal:  BMC Genomics       Date:  2009-06-02       Impact factor: 3.969

10.  An RNA-seq transcriptome analysis of histone modifiers and RNA silencing genes in soybean during floral initiation process.

Authors:  Lim Chee Liew; Mohan B Singh; Prem L Bhalla
Journal:  PLoS One       Date:  2013-10-16       Impact factor: 3.240

