
Complete sequence and comparative analysis of the chloroplast genome of coconut palm (Cocos nucifera).

Ya-Yi Huang, Antonius J M Matzke, Marjori Matzke.

Abstract

Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available.


Year:  2013        PMID: 24023703      PMCID: PMC3758300          DOI: 10.1371/journal.pone.0074736

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Chloroplasts (cp) are cell organelles that carry out photosynthesis, thus converting light energy into chemical energy in green plants and algae. Chloroplasts contain their own genome, which in flowering plants usually consists of a circular double-stranded DNA molecule ranging from 120 to 160 kb in length [1]. The cp genome is divided into four parts comprising a large single copy region (LSC) and a small single copy region (SSC), which are separated by a pair of inverted repeats (IRs). Cp genomes typically encode four rRNAs, around 30 tRNAs and up to 80 unique proteins [2]–[4]. With the advent of high-throughput sequencing technologies and their use in obtaining complete plastid genomes [5], [6], the number of fully sequenced cp genomes has increased rapidly. To date, the Complete Organelle Genome Sequences Database (http://amoebidia.bcm.umontreal.ca/pg-gobase/complete_genome/ogmp.html) lists 324 complete cp genome sequences spanning 268 distinct organisms. The complete cp genome sequences include those of date palm (Phoenix dactylifera L.) and oil palm (Elaeis guineensis Jacq.). Both are members of the palm family (Arecaceae), which is the third most economically important family of plants after the grasses and legumes [7]. Complete sequence information on cp genomes from three additional palms (Calamus caryotoides, Pseudophoenix vinifera and Bismarckia nobilis) has recently been deposited in GenBank [8]. However, the complete cp genome sequence of coconut palm (Cocos nucifera L.), which is a universal symbol of the tropics and as important as oil palm [7], has not yet been reported. Coconut is one of the most important crops in tropical zones, where it is a source of food, drink, fuel, medicines and construction material [9]. In addition, coconut oil is used for cooking and for pharmaceutical and industrial applications [10].
Although coconut trees display considerable morphological diversity, they are considered taxonomically a single species (and the only species) within the genus Cocos. Based on stature and breeding, coconut cultivars can be divided into two groups: tall and dwarf [11]. The former typically grows up to 35 to 40 meters and is mainly outcrossing, whereas the latter can only grow up to 25 to 30 meters and is usually selfing. Dwarf coconuts, which are less common than the tall variety, are usually found growing close to humans and have traits that likely result from human selection [10]. Here we report the complete cp genome sequence of a dwarf coconut plant, which is thought to be descended from coconut trees originally imported into Taiwan from Thailand (personal communication from a private breeder).

Materials and Methods

Whole genome sequencing and de novo assembly

Fresh young leaf material (ca. 2 g) was collected from a coconut seedling growing under ambient conditions in the greenhouse of Academia Sinica, and genomic DNA (gDNA) was extracted using a modified CTAB protocol [12]. We used the ratio of absorbance at 260 nm and 280 nm (A260/280) and gel electrophoresis to assess the purity and integrity of the extracted gDNA. High-quality DNA (concentration >100 ng/µl; A260/230 >1.7; A260/280 = 1.8–2.0) was sequenced using the Illumina GAIIx platform (YOURGENE BIO SCIENCE Co., New Taipei City, Taiwan). Short reads (70 bp) from paired-end sequencing were trimmed with a 0.05 error probability. The trimmed reads were de novo assembled using CLC Genomics Workbench 6.0.1 (CLC Bio, Aarhus, Denmark). The de Bruijn graph approach with a k-mer length of 22 bp and a coverage cutoff value of 10X was applied for assembly. The average read length and insert size were 151 bp and 340 bp, respectively. Assembled contigs shorter than 200 bp were removed, while those with coverage greater than 10X were selected for a BLAST search against the plastid genomes of date palm [2], oil palm [3], and other chloroplast sequences (199 sequences in total) with an e-value cutoff of 10⁻⁵. Gaps between contigs were filled by PCR amplification with specific primers that were designed based on contig sequences or homologous sequence alignments (Table S1). The PCR products were purified with the GEL/PCR DNA clean-up kit (Favorgen Biotech Corp.) and then sequenced by conventional Sanger sequencing. The sequencing data, along with the gene annotation, have been submitted to GenBank under accession number KF285453.
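The contig selection step can be sketched in Python. This is only an illustration and not the CLC Genomics Workbench procedure: the `Contig` record and the `select_contigs` function are hypothetical names, and only the two thresholds (a minimum length of 200 bp and a coverage above 10X) are taken from the text.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 200   # contigs shorter than this are discarded (threshold from the text)
MIN_COVERAGE = 10.0   # contigs must exceed this coverage (threshold from the text)


@dataclass(frozen=True)
class Contig:
    """Hypothetical record describing one assembled contig."""
    name: str
    length: int       # contig length in bp
    coverage: float   # mean read depth


def select_contigs(contigs):
    """Keep contigs that are at least 200 bp long and covered at more than 10X."""
    return [
        c for c in contigs
        if c.length >= MIN_LENGTH_BP and c.coverage > MIN_COVERAGE
    ]


# Toy input: the names and values below are invented for illustration.
contigs = [
    Contig("contig_1", 150, 250.0),     # too short
    Contig("contig_2", 48_000, 310.5),  # passes both filters
    Contig("contig_3", 1_200, 4.2),     # coverage too low
]
selected = select_contigs(contigs)
print([c.name for c in selected])
```

Contigs that pass this filter would then be the candidates for the BLAST comparison against reference plastid genomes.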

Genome annotation, base composition, repeat structure, and codon usage

Preliminary gene annotation was carried out with the online program DOGMA [13] and BLAST searches. To verify the exact gene and exon boundaries, we used MUSCLE [14] to align putative gene sequences with their homologues retrieved from BLAST searches in GenBank. All tRNA genes were further confirmed with the online tRNAscan-SE search server [15]. The online program Tandem Repeats Finder [16] was used to locate repeat sequences (>10 bp in length) with the following settings: (2, 7, 7) for the alignment parameters (match, mismatch, indels); 80 for the minimum alignment score to report a repeat; and a maximum period size of 500. Codon usage was calculated for all exons of protein-coding genes (pseudogenes were excluded). Base composition was calculated with Artemis [17].

Analysis of RNA editing

Potential RNA editing sites in protein-coding genes of coconut cpDNA were predicted with the online program suite Predictive RNA Editor for Plants (PREP) (http://prep.unl.edu/) [18] using a cutoff value of 0.8. This program contains 35 reference genes for detecting RNA editing sites in plastid genomes. The predicted editing sites were verified by reverse transcription polymerase chain reaction (RT-PCR) experiments. In addition to the genes predicted by the program, we also investigated the rpl22, rpl23, rps3, rps7, ycf1, ycf2, and ycf4 genes, in which RNA editing sites were reported in the cp genome of oil palm [3]. The Plant Total RNA Miniprep Purification Kit (GMbiolab Co., Ltd.) was used to extract total RNA from a leaf of the same seedling used for DNA extraction. First-strand cDNA was synthesized with the QuantiTect Reverse Transcription Kit (Qiagen) following the manufacturer's protocol. Gene-specific primers for cDNA amplification were designed based on homologous sequence alignment. A maximum of 1 µl of the reaction mixture was used as template for PCR amplification. The PCR products were purified with the GEL/PCR DNA clean-up kit (Favorgen Biotech Corp.). Purified PCR products were sequenced using an ABI PRISM® 3700. A complete primer list is provided in Table S1.
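Verifying an editing site by RT-PCR comes down to comparing the genomic coding sequence with the sequenced cDNA and looking for C-to-T differences (C-to-U in the transcript). The Python sketch below shows that comparison under simplifying assumptions: both sequences are in frame, ungapped and of equal length. The function name and the toy sequences are hypothetical.

```python
def find_c_to_u_edits(genomic_cds, cdna):
    """List C-to-T differences between a genomic CDS and its cDNA.

    Both arguments are DNA strings of equal length, in frame and ungapped.
    Each hit is (1-based nucleotide position, genomic codon, cDNA codon,
    position of the edit within the codon).
    """
    if len(genomic_cds) != len(cdna):
        raise ValueError("sequences must have the same length")
    genomic_cds = genomic_cds.upper()
    cdna = cdna.upper()
    edits = []
    for i, (g, c) in enumerate(zip(genomic_cds, cdna)):
        if g == "C" and c == "T":
            codon_start = i - i % 3
            edits.append((
                i + 1,
                genomic_cds[codon_start:codon_start + 3],
                cdna[codon_start:codon_start + 3],
                i % 3 + 1,
            ))
    return edits


# Toy sequences (invented): an ACG start codon edited to ATG, and TCA edited to TTA.
genomic = "ACGACATCA"
transcript = "ATGACATTA"
for edit in find_c_to_u_edits(genomic, transcript):
    print(edit)
```

In practice, partial editing shows up as mixed peaks in the sequencing trace, which a plain string comparison like this one cannot detect.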

Phylogenetic analysis

Forty-seven protein-coding genes were extracted from 25 taxa, including Amborella, Nuphar, 17 species of monocots, four species of magnoliids, and two species of eudicots. The GenBank accession number of each taxon is provided in Table 1. These taxa were selected because they have complete or nearly complete plastid genomes deposited in GenBank. Nucleotide sequences of each gene were first aligned with MUSCLE [14] through the online server of the European Bioinformatics Institute (http://www.ebi.ac.uk/Tools/msa/muscle). The aligned sequences were then concatenated by copying and pasting in a text editor. Maximum likelihood (ML) analysis was performed with the program Garli version 2.0, with parameters estimated from the data. The GTR substitution model, with rate variation among sites modeled by a discrete gamma distribution, was used for the tree search. All positions containing gaps or missing data were eliminated. Branch support was evaluated with 1,000 bootstrap (BS) replicates.
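The concatenation and gap-removal steps can also be scripted rather than done by hand. The following Python sketch is one possible way to do so and is not the procedure used in this study: the function names are hypothetical, the toy sequences are invented, and treating `N` and `?` as missing data is an assumption.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) end to end."""
    taxa = list(alignments[0])
    for aln in alignments:
        if set(aln) != set(taxa):
            raise ValueError("every alignment must contain the same taxa")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def drop_incomplete_columns(alignment, missing="-?N"):
    """Remove every column in which any taxon has a gap or missing-data symbol."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("aligned sequences must have equal length")
    taxa = list(alignment)
    columns = zip(*(alignment[t].upper() for t in taxa))
    kept = [col for col in columns if not any(ch in missing for ch in col)]
    return {t: "".join(col[i] for col in kept) for i, t in enumerate(taxa)}


# Toy alignments of two genes for three taxa (sequences are invented).
gene_a = {"Cocos": "ATG-CA", "Elaeis": "ATGGCA", "Phoenix": "ATGGCN"}
gene_b = {"Cocos": "TTAC", "Elaeis": "TT-C", "Phoenix": "TCAC"}

supermatrix = concatenate_alignments([gene_a, gene_b])
clean = drop_incomplete_columns(supermatrix)
for taxon, seq in clean.items():
    print(taxon, seq)
```

Dropping a column as soon as one taxon lacks data is the strictest form of gap handling; it keeps the matrix simple at the cost of discarding some informative sites.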
Table 1

Accessions and references for taxa used in phylogenetic reconstruction and genome comparison in this study.

| Group | Taxon | GenBank accession number | Reference |
| --- | --- | --- | --- |
| Basal angiosperms | Amborella trichopoda | NC_005086 | Goremykin et al. 2003 [25] |
| Basal angiosperms | Nuphar advena | NC_008788 | Raubeson et al. 2007 [26] |
| Monocots | Acorus americanus | EU273602 | Unpublished |
| Monocots | Colocasia esculenta | JN105690 | Ahmed et al. 2012 [28] |
| Monocots | Cymbidium aloifolium | KC876122 | Yang et al. 2013 [29] |
| Monocots | Bismarckia nobilis | JX088664 | Barrett et al. 2013 [8] |
| Monocots | Calamus caryotoides | JX088663 | Barrett et al. 2013 [8] |
| Monocots | Chamaedorea seifrizii | JX088667 | Barrett et al. 2013 [8] |
| Monocots | Cocos nucifera | KF285453 | Produced in this study |
| Monocots | Elaeis guineensis | JF274081 | Uthaipaisanwong et al. 2012 [3] |
| Monocots | Phoenix dactylifera | GU811709 | Yang et al. 2010 [2] |
| Monocots | Pseudophoenix vinifera | JX088662 | Barrett et al. 2013 [8] |
| Monocots | Dasypogon bromeliifolius | JX088665 | Barrett et al. 2013 [8] |
| Monocots | Kingia australis | JX051651 | Barrett et al. 2013 [8] |
| Monocots | Typha latifolia | GU195652 | Jansen et al. 2007 [30] |
| Monocots | Alpinia zerumbet | JX088668 | Barrett et al. 2013 [8] |
| Monocots | Heliconia collinsiana | JX088660 | Barrett et al. 2013 [8] |
| Monocots | Musa acuminata | HF677508 | Martin et al. 2013 [59] |
| Monocots | Xiphidium caeruleum | JX088669 | Barrett et al. 2013 [8] |
| Magnoliids | Chloranthus spicatus | EF380352 | Hansen et al. 2007 [31] |
| Magnoliids | Drimys granadensis | DQ887676 | Cai et al. 2006 [5] |
| Magnoliids | Magnolia denudata | JN867577 | Unpublished |
| Magnoliids | Piper cenocladum | DQ887677 | Cai et al. 2006 [5] |
| Eudicots | Ceratophyllum demersum | NC_009962 | Moore et al. 2007 [27] |
| Eudicots | Nandina domestica | DQ923117 | Moore et al. 2006 [32] |

Results and Discussion

Sequencing and de novo assembly

Illumina sequencing produced 6,413,504 paired-end reads with an average read length of 151 bp and a total of 968,439,104 bases. After quality trimming, 6,328,120 reads with an average length of 145.3 bp and a total of 919,475,836 bases remained. The subsequent de novo assembly and reference-guided BLAST search resulted in five major contigs separated by five gaps, which were then filled by Sanger sequencing. In addition to gap closure and confirmation of the four junction regions (LSC/IRA, LSC/IRB, SSC/IRA, SSC/IRB), we also validated the accuracy of our whole genome sequencing by randomly selecting genes and spacers for PCR-based sequencing. Priority was given to long genes (e.g., ycf1, ycf2, rpoC1) or long spacers (between rpoB and psbD, ycf2 and ndhB, and ndhC and trnV-UAC). A few regions where the direction of transcription switches from clockwise to counterclockwise (or vice versa) were also validated.
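The read statistics quoted above are internally consistent, which can be checked with a few lines of Python. All input values are taken from the text; only the variable names are introduced here.

```python
# Values reported in the text.
raw_reads = 6_413_504
raw_mean_length = 151            # bp
trimmed_reads = 6_328_120
trimmed_total_bases = 919_475_836

# Total bases before trimming = number of reads x average read length.
raw_total_bases = raw_reads * raw_mean_length
print(raw_total_bases)                                  # 968439104

# Average read length after trimming.
trimmed_mean_length = trimmed_total_bases / trimmed_reads
print(round(trimmed_mean_length, 1))                    # 145.3

# Share of reads that survived quality trimming, in percent.
retained_pct = 100 * trimmed_reads / raw_reads
print(round(retained_pct, 2))
```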

Organization of chloroplast genome

Analysis of the data obtained from high-throughput sequencing demonstrated that the cp genome of coconut is a typical quadripartite molecule (Fig. 1) within which a pair of inverted repeats (IRs) is separated by a large single copy region (LSC) and a small single copy region (SSC). The genome is 154,731 bp in length (IRs = 53,110 bp; LSC = 84,230 bp; SSC = 17,391 bp) and is predicted to encode 130 genes and four pseudogenes. The former includes 84 protein-coding genes, 38 tRNA genes, and eight rRNA genes while the latter is represented by pseudo ycf1, rps19, and two copies of ycf15. Of those genes, three protein-coding genes (ycf2, ndhB, and rps7), four rRNA genes (rrn16, rrn23, rrn4.5, and rrn5), and eight tRNA genes are present in two copies (Fig. 1).
Figure 1

Coconut chloroplast genome map.

Genes shown on the outside of the large circle are transcribed clockwise, while genes shown on the inside are transcribed counterclockwise. Thick lines of the small circle indicate IRs. Genes containing introns are marked with “*”. Pseudogenes are marked with “Ψ”.

Fourteen of the protein-coding genes and eight of the tRNA genes contain introns, and four pairs of genes overlap (4 bp between atpE and atpB; 10 bp between ndhK and ndhC; 53 bp between psbC and psbD; and 57 bp between pseudo ycf1 and ndhF). Each intron-containing gene has only one intron, except ycf3 and clpP, which have two introns. Most protein-coding genes have the standard AUG initiator codon; however, rpl2 and ndhD have an ACG initiator codon, rps19 starts with a GUG codon, and the initiator codon of cemA is ambiguous. The frequency of codon usage in the coconut cp genome is summarized in Table 2. As in many cp genomes of angiosperms [2], [3], [19]–[22], a strong bias toward A or T in the third position of synonymous codons is observed in the coconut cp genome. The most and least prevalent amino acids are leucine (2624) and cysteine (323), respectively.
Table 2

Codon usage and codon-anticodon recognition pattern in cp genome of coconut.

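| Amino acid | Codon | No. | RSCU | tRNA | Amino acid | Codon | No. | RSCU | tRNA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Phe | UUU | 906 | 1.23 | | Ala | GCA | 383 | 0.59 | trnA-UGC |
| | UUC | 564 | 0.77 | trnF-GAA | | GCC | 202 | 0.31 | |
| Leu | UUA | 785 | 0.60 | | | GCG | 123 | 0.19 | |
| | UUG | 556 | 0.42 | trnL-CAA | | GCU | 586 | 0.91 | |
| | CUA | 371 | 0.28 | trnL-UAG | Tyr | UAU | 769 | 1.59 | |
| | CUC | 188 | 0.14 | | | UAC | 196 | 0.41 | trnY-GUA |
| | CUG | 173 | 0.13 | | His | CAC | 144 | 0.45 | trnH-GUG |
| | CUU | 551 | 0.42 | | | CAU | 493 | 1.55 | |
| Ile | AUA | 718 | 0.64 | trnI-CAU | Gln | CAA | 668 | 1.49 | trnQ-UUG |
| | AUC | 487 | 0.43 | trnI-GAU | | CAG | 226 | 0.51 | |
| | AUU | 1045 | 0.93 | | Asn | AAC | 274 | 0.44 | trnN-GUU |
| Met | AUG | 613 | 1.00 | trn(f)M-CAU | | AAU | 967 | 1.56 | |
| Val | GUA | 517 | 0.74 | trnV-UAC | Lys | AAG | 353 | 0.53 | |
| | GUC | 188 | 0.27 | trnV-GAC | | AAA | 988 | 1.47 | trnK-UUU |
| | GUG | 190 | 0.27 | | Asp | GAC | 209 | 0.39 | trnD-GUC |
| | GUU | 497 | 0.71 | | | GAU | 863 | 1.61 | |
| Ser | AGC | 104 | 0.10 | trnS-GCU | Glu | GAA | 1009 | 1.49 | trnE-UUC |
| | AGU | 414 | 0.40 | | | GAG | 346 | 0.51 | |
| | UCA | 440 | 0.43 | trnS-UGA | Cys | UGC | 78 | 0.48 | trnC-GCA |
| | UCC | 338 | 0.33 | trnS-GGA | | UGU | 245 | 1.52 | |
| | UCG | 179 | 0.17 | | Trp | UGG | 444 | 1.00 | trnW-CCA |
| | UCU | 574 | 0.56 | | Arg | AGA | 512 | 0.65 | trnR-UCU |
| Pro | CCA | 312 | 0.59 | trnP-UGG | | AGG | 161 | 0.20 | |
| | CCC | 207 | 0.39 | | | CGA | 345 | 0.44 | |
| | CCG | 131 | 0.25 | | | CGC | 89 | 0.11 | |
| | CCU | 407 | 0.77 | | | CGG | 123 | 0.16 | |
| Thr | ACA | 417 | 0.64 | trnT-UGU | | CGU | 344 | 0.44 | trnR-ACG |
| | ACC | 241 | 0.37 | trnT-GGU | Gly | GGA | 712 | 0.83 | trnG-UCC |
| | ACG | 149 | 0.23 | | | GGC | 143 | 0.17 | trnG-GCC |
| | ACU | 504 | 0.77 | | | GGG | 276 | 0.32 | |
| | | | | | | GGU | 587 | 0.68 | |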

RSCU: Relative Synonymous Codon Usage.
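The codon counts in Table 2 are enough to recompute the per-amino-acid totals and to quantify the third-position A/U bias. The Python sketch below does this; the counts are transcribed from Table 2, while the dictionary layout and variable names are choices made for this illustration.

```python
# Codon counts transcribed from Table 2, grouped by amino acid.
CODON_COUNTS = {
    "Phe": {"UUU": 906, "UUC": 564},
    "Leu": {"UUA": 785, "UUG": 556, "CUA": 371, "CUC": 188, "CUG": 173, "CUU": 551},
    "Ile": {"AUA": 718, "AUC": 487, "AUU": 1045},
    "Met": {"AUG": 613},
    "Val": {"GUA": 517, "GUC": 188, "GUG": 190, "GUU": 497},
    "Ser": {"AGC": 104, "AGU": 414, "UCA": 440, "UCC": 338, "UCG": 179, "UCU": 574},
    "Pro": {"CCA": 312, "CCC": 207, "CCG": 131, "CCU": 407},
    "Thr": {"ACA": 417, "ACC": 241, "ACG": 149, "ACU": 504},
    "Ala": {"GCA": 383, "GCC": 202, "GCG": 123, "GCU": 586},
    "Tyr": {"UAU": 769, "UAC": 196},
    "His": {"CAC": 144, "CAU": 493},
    "Gln": {"CAA": 668, "CAG": 226},
    "Asn": {"AAC": 274, "AAU": 967},
    "Lys": {"AAG": 353, "AAA": 988},
    "Asp": {"GAC": 209, "GAU": 863},
    "Glu": {"GAA": 1009, "GAG": 346},
    "Cys": {"UGC": 78, "UGU": 245},
    "Trp": {"UGG": 444},
    "Arg": {"AGA": 512, "AGG": 161, "CGA": 345, "CGC": 89, "CGG": 123, "CGU": 344},
    "Gly": {"GGA": 712, "GGC": 143, "GGG": 276, "GGU": 587},
}

# Total number of codons per amino acid.
totals = {aa: sum(codons.values()) for aa, codons in CODON_COUNTS.items()}
most_common = max(totals, key=totals.get)
least_common = min(totals, key=totals.get)
print(most_common, totals[most_common])
print(least_common, totals[least_common])

# Third-position A/U bias, using only amino acids with synonymous codons.
au_ending = 0
synonymous_total = 0
for codons in CODON_COUNTS.values():
    if len(codons) < 2:
        continue  # Met and Trp have a single codon and carry no bias information
    for codon, count in codons.items():
        synonymous_total += count
        if codon[2] in "AU":
            au_ending += count
au3_fraction = au_ending / synonymous_total
print(f"A/U at third position: {100 * au3_fraction:.1f}% of synonymous codons")
```

A fraction well above 50% indicates that A- or U-ending codons are preferred, in line with the AT-rich composition of the genome.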

Although RT-PCR analysis validated that C-to-U editing changed the ACG start codon to AUG in the ndhD gene, the ACG start codon in the rpl2 gene appeared to remain unedited in repeated experiments. However, we cannot eliminate the possibility that a low level of editing occurs in rpl2. Although less frequent than AUG, translation initiated at an ACG or GUG start codon is not unprecedented in plants. A previous study demonstrated that an AUG initiator codon is not required to specify the initiation site for proper translation in the cp genome [23]. GUG codons have been shown to be more efficient than ACG in initiating translation and have a relative strength varying from 15 to 30% of AUG activity [24]. In angiosperms, a GUG start codon has been found in the cemA gene [5], [25]–[27] and the rps19 gene [2], [3], [5], [8], [26], [28]–[32]. A transcript starting with an ACG start codon has been observed in the ndhD gene in some species of Nicotiana [33], [34].

Repeats

Using a criterion of 100% match between repeat copies, Tandem Repeats Finder identified 13 sets of repeats longer than 10 bp, including eight tandem repeats, three direct repeats, and two inverted repeats (Table 3). Three of the repeats are found in the ycf2 genes, which are in the IR regions. The remaining repeats are found in the LSC region: one at the 3′ end of the rps3 gene, seven in spacers, and two in introns. This repeat content is similar to that found in date palm and oil palm. In fact, five of the repeats found in coconut (No. 2, 3, 6, 11 and 12 in Table 3) are shared by both oil palm and date palm, though the copy number may differ. In addition, repeats No. 5 and No. 8 in coconut are shared with oil palm, while repeats No. 4 and No. 13 are shared with date palm.
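To make the "100% match" criterion concrete, the Python sketch below scans a sequence for perfect tandem repeats whose unit is longer than 10 bp. It is a naive exact-match scan written for illustration and is not the alignment-based algorithm of Tandem Repeats Finder. The function name and the flanking bases in the toy sequence are hypothetical; the repeat unit is that of repeat No. 12 in Table 3, and the 500 bp maximum period follows the setting given in the Methods.

```python
def find_exact_tandem_repeats(seq, min_unit=11, max_unit=500, min_copies=2):
    """Return (1-based start, unit length, copy number, unit) for perfect tandem repeats.

    Only exact, directly adjacent copies are reported; units of min_unit bp or
    longer correspond to the ">10 bp" threshold used in the text.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for unit in range(min_unit, min(max_unit, n // min_copies) + 1):
        i = 0
        while i + unit * min_copies <= n:
            copies = 1
            # Count how many consecutive copies of the unit follow position i.
            while seq[i:i + unit] == seq[i + copies * unit:i + (copies + 1) * unit]:
                copies += 1
            if copies >= min_copies:
                hits.append((i + 1, unit, copies, seq[i:i + unit]))
                i += unit * copies   # jump past the whole repeat array
            else:
                i += 1
    return hits


# Toy sequence: three copies of the 12-bp unit of repeat No. 12 with invented flanks.
toy = "GGGGG" + "ACTACTATACTA" * 3 + "CCCCC"
hits = find_exact_tandem_repeats(toy)
print(hits)
```

Direct and inverted repeats that are not adjacent would need a different search, for example an index of all k-mers and their reverse complements.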
Table 3

Repeat sequences and their distribution in cpDNA of coconut.

| No. | Size (bp) | Start position | Repeat number | Type | Repeat sequence | Region |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 30 | 64504, 64537 | 2 | D | TATACTATAATAAATATACTATAATAAATA | LSC; spacer between psbE and petL |
| 2 | 24 | 91629, 91653, 91677 | 3 | T | GATATCGATATTGATGATAGTGAC | IRB; ycf2 gene |
| 3 | 24 | 146981, 147005, 147029 | 3 | T | ATATCGTCACTATCATCAATATCG | IRA; ycf2 gene |
| 4 | 21 | 149421, 149442 | 2 | T | GAAGTGACTTGGACAAAAAGA | IRA; ycf2 gene |
| 5 | 20 | 31427, 31447 | 2 | T | TTAAAAGATATACTCTGGAA | LSC; spacer between trnT and psbD |
| 6 | 20 | 82734, 82754 | 2 | T | CTCGTTTACAAATATCCAAA | LSC; 3′ end of rps3 gene |
| 7 | 19 | 64518, 64537 | 2 | T | TATACTATAATAAATATAC | LSC; spacer between psbE and petL |
| 8 | 17 | 12731, 12748 | 2 | T | TTCTTTATTTGTATTTG | LSC; intron of atpF gene |
| 9 | 13 | 28852, 28873 | 2 | D | TATTATATATAAA | LSC; spacer between petN and psbM |
| 10 | 13 * | 59048 | 1 | I | TATTATATATAAA | LSC; spacer between petN and psbM, spacer between accD and psaI |
| 11 | 12 | 3749, 3773, 3793 | 3 | D | AATTAAATAATA | LSC; intron of trnK |
| 12 | 12 | 35106, 35118, 35141 | 3 | T | ACTACTATACTA | LSC; spacer between trnG and trnfM |
| 13 | 12 ** | 35167 | 1 | I | ACTACTATACTA | LSC; spacer between trnG and trnfM |

D: direct repeat; T: tandem repeat; I: inverted repeat.

*: inverted repeat sequence of repeat No. 9;

**: inverted sequence of repeat No. 12.

Repetitive sequences in cp genomes may recombine and induce rearrangements [35]–[37], which could play a crucial role in stabilization of cpDNA [38]. Compared with other angiosperms, cp genomes of the palm family generally have fewer and shorter repeats (Table 4). Of the 13 repeats found in coconut cpDNA, the longest is 30 bp. The oil palm cp genome has seven repeats, the longest being 40 bp [3], while date palm has 11 repeats, the longest being 39 bp [2]. By contrast, more than 20 repeats, with the longest extending up to 132 bp, were reported in Poaceae [39], [40]. About 232 repeats, ranging from 30 to 61 bp in length, were reported in Cymbidium orchid [29]. In Citrus, 29 repeats ranging from 30 to 59 bp in length were detected [41]. In the Solanaceae family, as many as 42 repeats, with the most extensive being 56 bp, have been reported [42]. The cp genome of Gossypium has 54 repeats, the longest being 64 bp [43]. In the Geraniaceae family, some cp genomes contain up to 9% (or higher) repetitive DNA [4], [44] and many of the repeats are longer than 100 bp [4].
Table 4

Comparison of repeat numbers and repeat lengths among 16 angiosperms.

| Clade | Family | Taxon | Total repeats (No.) | Longest repeat (bp) | Reference |
| --- | --- | --- | --- | --- | --- |
| Monocots | Orchidaceae | Cymbidium aloifolium | 232 | 61 | Yang et al. 2013 [29] |
| Monocots | Arecaceae | Cocos nucifera | 13 | 30 | Produced in this study |
| Monocots | Arecaceae | Elaeis guineensis | 7 | 40 | Uthaipaisanwong et al. 2012 [3] |
| Monocots | Arecaceae | Phoenix dactylifera | 11 | 39 | Yang et al. 2010 [2] |
| Monocots | Poaceae | Bambusa emeiensis | 39 | 132 | Zhang et al. 2011 [39] |
| Monocots | Poaceae | Hordeum vulgare | 31 | >55 | Saski et al. 2007 [40] |
| Monocots | Poaceae | Sorghum bicolor | 26 | >55 | Saski et al. 2007 [40] |
| Monocots | Poaceae | Agrostis stolonifera | 19 | >55 | Saski et al. 2007 [40] |
| Dicots | Geraniaceae | Geranium palmatum | 100–150 | >200 | Guisinger et al. 2011 [4] |
| Dicots | Geraniaceae | Pelargonium hortorum | ca. 200 | >200 | Guisinger et al. 2011 [4] |
| Dicots | Rutaceae | Citrus sinensis | 29 | 53 | Bausher et al. 2006 [41] |
| Dicots | Malvaceae | Gossypium hirsutum | 54 | 72 | Lee et al. 2006 [43] |
| Dicots | Solanaceae | Atropa belladonna | 40 | 45–49 | Daniell et al. 2006 [42] |
| Dicots | Solanaceae | Nicotiana tabacum | 33 | >55 | Daniell et al. 2006 [42] |
| Dicots | Solanaceae | Solanum lycopersicum | 40 | >55 | Daniell et al. 2006 [42] |
| Dicots | Solanaceae | Solanum tuberosum | 31 | 50–54 | Daniell et al. 2006 [42] |

In view of the correlation between repetitive DNA content and sequence rearrangement, significant structural rearrangements are likely to be observed in cp genomes rich in repetitive sequences. This idea has been validated in many cases listed above such as Poaceae [35], [39], [40], [42] and Geraniaceae [4], [44]–[46]. Conversely, the relatively low content of repetitive DNA in cp genomes of the palm family suggests a relatively higher degree of stability and conservation across different palm species. Consistent with this notion, our investigation revealed neither significant recombination (Fig. S1) nor dramatic variation (Table 5) in the cp genomes of six palm species.
Table 5

Comparison of cp genomes among six palm species.

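| Characteristics | Calamus | Pseudophoenix | Phoenix | Bismarckia | Elaeis | Cocos |
| --- | --- | --- | --- | --- | --- | --- |
| Size (bp) | 157,270 | 157,829 | 158,462 | 158,211 | 156,973 | 154,731 |
| LSC (bp) | 85,525 | 85,736 | 86,198 | 86,390 | 85,192 | 84,230 |
| SSC (bp) | 17,595 | 17,587 | 17,712 | 17,459 | 17,639 | 17,391 |
| IR (bp) | 54,150 | 54,506 | 54,552 | 54,362 | 54,142 | 53,110 |
| GC content (%) | 37.36 | 37.32 | 37.23 | 37.47 | 37.40 | 37.44 |
| Total number of genes | 131 | 131 | 131 | 131 | 131 | 129 |
| Protein-coding genes | 85 | 85 | 85 | 85 | 85 | 84 |
| Protein-coding genes, G+C (%) | 38 | 38 | 38 | 38 | 38 | 37 |
| Protein-coding genes, bases (bp) | 192,481 | 191,886 | 192,511 | 120,079 | 10,782 | 90,130 |
| rRNAs | 8 | 8 | 8 | 8 | 8 | 8 |
| rRNAs, G+C (%) | 55 | 55 | 55 | 55 | 55 | 55 |
| rRNAs, bases (bp) | 9,050 | 9,051 | 9,050 | 9,050 | 7,040 | 9,040 |
| tRNAs | 38 | 38 | 38 | 38 | 38 | 38 |
| tRNAs, G+C (%) | 53 | 53 | 53 | 53 | 53 | 44 |
| tRNAs, bases (bp) | 10,748 | 10,756 | 10,766 | 10,789 | 10,782 | 10,570 |
| Number of pseudogenes | 1 | 1 | 1 | 1 | 1 | 2 |
| Genes with intron(s) | 22 | 22 | 22 | 22 | 22 | 22 |
| Protein-coding genes with intron(s) | 14 | 14 | 14 | 14 | 14 | 14 |
| tRNAs with intron(s) | 8 | 8 | 8 | 8 | 8 | 8 |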

RNA editing sites

RNA editing is a posttranscriptional process that is mainly observed in mitochondrial and cp genomes of higher plants [47]. This process may introduce substitutions or indels, which in turn can result in transcript alteration [33], [47], [48]. In coconut cpDNA, the PREP-cp program predicted 83 RNA editing sites in 27 genes. Our RT-PCR analysis confirmed editing at 64 of those sites (Table 6). An additional six editing sites not predicted by the program were detected in accD, matK, ndhB, ndhG, ndhH, and rpoA. Of the genes investigated, ndh genes have the highest number of editing sites.
Table 6

RNA editing predicted by PREP-cp program and confirmed by RT-PCR.

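| Gene | Nucleotide position | Codon change | Editing position within codon | Amino acid change | PREP predicted | RT-PCR results |
| --- | --- | --- | --- | --- | --- | --- |
| accD | 154 | CGG - TGG | 1 | R-W | + | - |
| | 794 | TCG - TTG | 2 | S-L | - | + |
| | 1157 | TCA - TTA | 2 | S-L | + | + |
| | 1159 | CAT - TAT | 1 | H-Y | + | - |
| | 1403 | CCT - CTT | 2 | P-L | + | - |
| atpA | 914 | TCA - TTA | 2 | S-L | + | + |
| | 1148 | TCA - TTA | 2 | S-L | + | +/− |
| atpB | 1184 | TCA - TTA | 2 | S-L | + | +* |
| atpF | 92 | CCA - CTA | 2 | P-L | + | +/−* |
| atpI | 428 | CCC - CTC | 2 | P-L | + | + |
| | 629 | TCA - TTA | 2 | S-L | + | + |
| ccsA | 647 | ACT - ATT | 2 | T-I | + | - |
| clpP | 82 | CAT - TAT | 1 | H-Y | + | +* |
| | 559 | CAT - TAT | 1 | H-Y | + | +* |
| matK | 188 | TCA - TTA | 2 | S-L | + | - |
| | 653 | CCA - CTA | 2 | P-L | + | - |
| | 734 | TTC - TTT | 3 | D-F | - | + |
| | 919 | CAT - TAT | 1 | H-Y | + | - |
| | 1267 | CAC - TAC | 1 | H-Y | + | + |
| ndhA | 50 | TCG - TTG | 2 | S-L | + | + |
| | 476 | TCA - TTA | 2 | S-L | + | + |
| | 566 | TCA - TTA | 2 | S-L | + | + |
| | 961 | CCT - TCT | 1 | P-S | + | + |
| | 1073 | TCC - TTC | 2 | S-F | + | - |
| ndhB | 149 | TCA - TTA | 2 | S-L | + | +/− |
| | 467 | CCA - CTA | 2 | P-L | + | + |
| | 542 | ACG - ATG | 2 | T-M | + | + |
| | 586 | CAT - TAT | 1 | H-Y | + | + |
| | 704 | TCC - TTC | 2 | S-F | + | + |
| | 737 | CCA - CTA | 2 | P-L | - | +* |
| | 830 | TCA - TTA | 2 | S-L | + | + |
| | 836 | TCA - TTA | 2 | S-L | + | + |
| | 1112 | TCA - TTA | 2 | S-L | + | + |
| | 1193 | TCA - TTA | 2 | S-L | + | + |
| | 1255 | CAT - TAT | 1 | H-Y | + | + |
| | 1481 | CCA - CTA | 2 | P-L | + | +/− |
| ndhD | 2 | ACG - ATG | 2 | T-M | + | +/− |
| | 59 | TCA - TTA | 2 | S-L | + | + |
| | 383 | TCA - TTA | 2 | S-L | + | + |
| | 674 | TCG - TTG | 2 | S-L | + | + |
| | 947 | ACA - ATA | 2 | T-I | + | + |
| | 1193 | TCA - TTA | 2 | S-L | + | + |
| | 1310 | TCA - TTA | 2 | S-L | + | + |
| ndhF | 62 | TCA - TTA | 2 | S-L | + | +/− |
| | 290 | TCA - TTA | 2 | S-L | + | +/− |
| | 392 | TCC - TTC | 2 | S-F | + | + |
| | 442 | CAT - TAT | 1 | H-Y | + | + |
| | 586 | CTT - TTT | 1 | L-F | + | - |
| | 1393 | CAC - TAC | 1 | H-Y | + | - |
| | 2093 | TCC - TTC | 2 | S-F | + | - |
| ndhG | 314 | ACA - ATA | 2 | T-I | + | - |
| | 347 | CCA - CTA | 2 | P-L | - | + |
| ndhH | 505 | CAT - TAT | 1 | S-L | + | + |
| | 545 | TCT - TTT | 2 | S-F | - | +/−* |
| | 726 | TAC - TAT | 3 | Y-Y | - | - |
| ndhK | 131 | TCG - TTG | 2 | S-L | + | + |
| | 372 | GTC - GTT | 3 | S-L | - | - |
| | 518 | ATG - ACG | 2 | M-T | - | - |
| | 677 | TCA - TTA | 2 | S-L | - | - |
| petB | 418 | CGG - TGG | 1 | R-W | + | +* |
| | 611 | CCA - CTA | 2 | P-L | + | + |
| psaI | 80 | TCT - TTT | 2 | S-F | + | + |
| | 85 | CAT - TAT | 1 | H-Y | + | + |
| rpl2 | 2 | ACG - ATG | 2 | T-M | + | - |
| rpl20 | 26 | ACA - ATA | 2 | T-I | + | - |
| | 308 | TCA - TTA | 2 | S-L | + | - |
| rpl22 | 242 | TCA - TTA | 2 | S-L | - | - |
| rpl23 | 71 | TCA - TTA | 2 | S-L | - | +/−* |
| | 89 | TCT - TTT | 2 | S-F | - | +/−* |
| rpoA | 200 | TCT - TTT | 2 | S-F | - | + |
| | 368 | TCA - TTA | 2 | S-L | + | + |
| | 527 | TCC - TTC | 2 | S-F | + | + |
| | 830 | TCA - TTA | 2 | S-L | + | + |
| | 887 | TCG - TTG | 2 | S-L | + | - |
| rpoB | 467 | TCG - TTG | 2 | S-L | + | +/− |
| | 545 | TCA - TTA | 2 | S-L | + | + |
| | 560 | TCG - TTG | 2 | S-L | + | + |
| | 617 | CCG - CTG | 2 | P-L | + | +/− |
| | 1994 | TCT - TTT | 2 | S-F | + | + |
| | 2420 | TCA - TTA | 2 | S-L | + | +/− |
| rpoC1 | 41 | CCA - CTA | 2 | P-L | + | + |
| | 511 | CGG - TGG | 1 | R-W | + | + |
| | 617 | TCA - TTA | 2 | S-L | + | + |
| | 1663 | CAT - TAT | 1 | H-Y | + | - |
| rpoC2 | 1381 | CAT - TAT | 1 | H-Y | + | - |
| | 2275 | CGG - TGG | 1 | R-W | + | - |
| | 2309 | TCG - TTG | 2 | S-L | + | + |
| rps2 | 134 | ACA - ATA | 2 | T-I | + | + |
| | 248 | TCA - TTA | 2 | S-L | + | + |
| rps3 | 30 | TTC - TTT | 3 | I-I | - | - |
| | 470 | ACA - ATA | 2 | S-L | - | +/−* |
| | 583 | CAT - TAT | 1 | S-L | - | +* |
| | 627 | ATC - ATT | 3 | I-I | - | - |
| rps7 | 300 | GCC - GCT | 3 | A-A | - | - |
| rps8 | 141 | AAT - AAC | 3 | N-N | - | - |
| | 182 | TCA - TTA | 2 | S-L | + | + |
| rps14 | 80 | TCA - TTA | 2 | S-L | + | + |
| | 149 | CCA - CTA | 2 | P-L | + | + |
| ycf1 | 3423 | TAC - TAT | 3 | Y-Y | - | - |
| | 3429 | GAT - GAC | 3 | D-D | - | - |
| | 3449 | ATT - ACT | 2 | I-T | - | - |
| | 3852 | ATC - ATT | 3 | I-I | - | - |
| | 4487 | CTT - CCT | 2 | L-P | - | - |
| ycf2 | 549 | TCG - TCA | 3 | S-S | - | - |
| | 607 | GAA - AAA | 1 | E-K | - | - |
| ycf3 | 44 | TCT - TTT | 2 | S-F | + | + |
| | 185 | ACG - ATG | 2 | T-M | + | + |
| | 191 | CCA - CTA | 2 | P-L | + | + |
| | 407 | TCC - TTC | 2 | S-F | + | + |
| ycf4 | 254 | TCA - TTA | 2 | S-L | - | +/−* |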

“+”: editing;

“-”: no editing;

“+/−”: partial editing;

“*”: editing sites shared with oil palm [3].

The editing types in coconut were all non-silent and 100% C-to-U. One occurrence of editing altered the initiator codon ACG to AUG in the ndhD gene. Of these editing events, 62 (82.67%) occurred at the second base of the codon, 12 (16%) were at the first base of the codon, and only one (1.33%) was at the third base of the codon. The conversions of amino acids include 63 hydrophilic to hydrophobic (S to L, S to F, H to Y, T to M, R to W, T to I, and D to F), 11 hydrophobic to hydrophobic (P to L), and one hydrophobic to hydrophilic (P to S). A comparative study of RNA editing across eight land plants demonstrated an evolutionary trend of decline (or complete loss) in the number of editing sites, silent editing, editing in the first or third position, and editing types other than C to U [47]. In angiosperms, the editing is almost exclusively a C to U substitution [49] and the total number of editing sites ranges from 20 to 37 [47], [50]–[53]. Compared with other angiosperms, coconut has more than twice as many editing sites, although the editing characteristics are similar (Table 7). Moreover, because of the evolutionary conservation of RNA editing, closely related taxa usually share more editing sites [47]. For example, more editing sites are shared within Poaceae than between grasses and dicots [54]. Similarly, related Nicotiana species share more editing sites with each other than with plants from other genera [34].
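The percentages quoted for the codon positions of the editing events follow directly from the reported counts, as the short Python check below shows. The counts are taken from the text; the variable names are introduced here for illustration.

```python
# Number of editing events at each codon position, as reported in the text.
edits_by_codon_position = {1: 12, 2: 62, 3: 1}
total_edits = sum(edits_by_codon_position.values())

percentages = {
    position: round(100 * count / total_edits, 2)
    for position, count in edits_by_codon_position.items()
}
for position in sorted(percentages):
    print(f"codon position {position}: {percentages[position]}%")

# Amino acid conversion classes reported in the text; they should add up to the same total.
amino_acid_changes = {
    "hydrophilic to hydrophobic": 63,
    "hydrophobic to hydrophobic": 11,
    "hydrophobic to hydrophilic": 1,
}
print(sum(amino_acid_changes.values()) == total_edits)
```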
Table 7

Comparison of RNA editing in six species of angiosperms.

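| Characteristic | Arabidopsis | Nicotiana | Cocos | Elaeis | Zea | Oryza |
| --- | --- | --- | --- | --- | --- | --- |
| Total editing sites | 34 | 37 | 75 | 32 | 26 | 21 |
| C to U editing (%) | 100 | 100 | 100 | 78.12 | 100 | 100 |
| U to C editing (%) | 0 | 0 | 0 | 15.63 | 0 | 0 |
| G to A editing (%) | 0 | 0 | 0 | 6.25 | 0 | 0 |
| Silent editing | 0 | 0 | 0 | 10 | 1 | 0 |
| Non-silent editing | 34 | 37 | 75 | 18 | 25 | 21 |
| Intron editing | 0 | 0 | 0 | 4 | 0 | 0 |
| 1st codon editing (%) | 14.7 | 5.4 | 16 | 15.4 | 4 | 4.8 |
| 2nd codon editing (%) | 85.3 | 91.9 | 82.67 | 46.1 | 92 | 95.2 |
| 3rd codon editing (%) | 0 | 2.7 | 1.33 | 23.5 | 4 | 0 |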

The rps19 pseudogenization and IR fluctuation

Dot plot analysis demonstrated that the gene content and organization of coconut cpDNA are nearly identical to those of other palm species (Fig. S1). Nevertheless, some variation could be detected. For instance, other palm species have two copies of the rps19 gene located near the IRA/LSC and IRB/SSC junctions respectively, whereas coconut has only one copy of rps19 at the IRB/SSC junction. At the IRA/LSC junction we found an rps19-like sequence of 174 bp, which is likely a pseudogene, judging from its shorter length compared to the regular rps19 gene (279 bp). We speculate that the pseudogenization of rps19 at the IRA/LSC junction is due to IR fluctuation in coconut cpDNA. A comparative study among cpDNAs of six palm species (Table 5) indicated that coconut has the smallest cp genome (154,731 bp) and the shortest IRs (53,110 bp). The largest cp genome with the longest IRs is found in Phoenix (158,462 bp and 54,552 bp, respectively). Like other cp genomes [2], [3], the palm cp genomes, including coconut, are all AT-rich. Graphical alignment showed that the IRs have both expanded and contracted during the evolution of the palm family, though dramatic changes were not detected (Fig. 2).
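The size comparison can be reproduced from the region lengths in Table 5. The Python sketch below checks that LSC + SSC + IR adds up to the reported genome size for each species and picks out the extremes. The values are transcribed from Table 5; reading the IR row as the combined length of both repeat copies is an inference from the fact that the three rows sum to the genome size.

```python
# Region lengths in bp from Table 5: (LSC, SSC, combined IRs, reported genome size).
PALM_GENOMES = {
    "Calamus": (85_525, 17_595, 54_150, 157_270),
    "Pseudophoenix": (85_736, 17_587, 54_506, 157_829),
    "Phoenix": (86_198, 17_712, 54_552, 158_462),
    "Bismarckia": (86_390, 17_459, 54_362, 158_211),
    "Elaeis": (85_192, 17_639, 54_142, 156_973),
    "Cocos": (84_230, 17_391, 53_110, 154_731),
}

for genus, (lsc, ssc, irs, size) in PALM_GENOMES.items():
    # The quadripartite genome is the sum of LSC, SSC and the two IR copies.
    assert lsc + ssc + irs == size, genus
    print(f"{genus}: {size} bp total, {irs // 2} bp per IR copy")

smallest_genome = min(PALM_GENOMES, key=lambda g: PALM_GENOMES[g][3])
largest_genome = max(PALM_GENOMES, key=lambda g: PALM_GENOMES[g][3])
shortest_irs = min(PALM_GENOMES, key=lambda g: PALM_GENOMES[g][2])
longest_irs = max(PALM_GENOMES, key=lambda g: PALM_GENOMES[g][2])
print(smallest_genome, largest_genome, shortest_irs, longest_irs)
```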
Figure 2

IR expansion into the LSC and SSC regions.

Comparison of IR boundaries among six palm species. Numbers in red denote distance between rpl22 and junction of LSC and IRB. Numbers in blue denote distance between rps19 and junction of LSC and IRA. Numbers in gray denote distance between psbA and junction of LSC and IRA.

Fluctuations of the IR regions have occurred sporadically during the evolutionary history of angiosperms [55]. Two of the most extreme cases are found in Pelargonium hortorum of the Geraniaceae and a group of legumes that includes pea and broad bean. The single IR region has expanded to 76 kb [46] in the former, whereas one copy of the IRs is completely lost from cp genomes of the latter [1]. The structurally conserved feature of the IR regions is resistant to recombinational loss [56]. The presence of the IR regions may thus help to stabilize the cp genome. The most direct evidence for this suggestion is that more rearrangements occurred within the group of legumes that have lost a copy of the IR than within those that have not [57]. Another piece of evidence is the acceleration of synonymous substitution rates in the remaining copy of the duplicated region [56]. Consequently, we can infer that the evolutionary rates of cp genomes in the palm family are relatively mild, judging from the comparatively minor fluctuation of the IR regions.

Phylogenetic analysis and events of gene gain and loss

Our phylogenetic reconstruction, based on 47 chloroplast protein-coding genes and rooted with Amborella, supported three major monophyletic groups: magnoliids, monocots, and eudicots (Fig. 3). Within monocots, Acorus (Acorales) diverged from other monocots first, followed by Colocasia (Alismatales), then by Cymbidium (Asparagales), which is sister to the monophyletic commelinids. The commelinids contain two sister clades. Within the first clade, Arecales groups with the family Dasypogonaceae. In the second clade, Poales is sister to a subclade that includes Zingiberales and Commelinales (Fig. 3). This topology is consistent with a phylogenetic study of commelinids based on 83 plastid genes [8]. Moreover, our inference of relationships within the Arecales is also congruent with a thorough study of the palm family using a supermatrix method with 16 data partitions [58].
Figure 3

Phylogenetic tree of monocots.

Numbers above/below the branches are bootstrap value (only values higher than 50% are shown). Black square denotes rps19 duplication, gray square denotes rps19 pseudogenization, white square denotes complete loss of duplicate rps19, and blue square denotes pseudo ycf1 and ndhF overlap.
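Mapping a gene event onto a tree under parsimony amounts to finding the smallest number of state changes that explains the states observed at the tips. The Python sketch below illustrates this with the Fitch algorithm on a toy four-taxon tree. The tip labels, the topology and the function name are hypothetical and are not taken from Fig. 3, and the source does not state which parsimony implementation was used.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass on a rooted binary tree.

    tree   : nested 2-tuples whose leaves are tip names (strings)
    states : dict mapping each tip name to its observed character state
    Returns (set of most-parsimonious states at this node,
             minimum number of state changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change is needed.
        return common, left_cost + right_cost
    # The children disagree: one change must occur on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy tree and character states for a duplicated gene copy.
toy_tree = ((("tip_A", "tip_B"), "tip_C"), "tip_D")
toy_states = {
    "tip_A": "pseudogene",
    "tip_B": "intact",
    "tip_C": "intact",
    "tip_D": "intact",
}
root_states, changes = fitch(toy_tree, toy_states)
print(root_states, changes)
```

Here a single change on the branch leading to `tip_A` explains the data, so the ancestral state at the root is reconstructed as intact.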

Phylogenetic tree of monocots.

We then mapped the related gene duplication and pseudogenization events onto the tree according to parsimony criteria. Our results indicate that the duplication of the rps19 gene near the IRA/LSC junction likely occurred before the divergence of Asparagales from the remaining monocots, which consist of Arecales, Dasypogonaceae (Dasypogon and Kingia; a family not yet placed in an order), Poales, Commelinales, and Zingiberales (Fig. 3). After the lineages differentiated, the duplicated rps19 independently became a pseudogene in Cocos of the Arecales, Heliconia of the Zingiberales, and Nandina of the Ranunculales, and it has been completely lost in Xiphidium of the Commelinales and Ceratophyllum of the Ceratophyllales (Fig. 3). In monocots, the overlap between ndhF and pseudo ycf1 was found in a clade that contains Arecales and Dasypogonaceae. However, it was also found in Drimys of the Canellales and Chloranthus of the Chloranthales, both of which belong to the magnoliids. Following the parsimony rule, we concluded that the overlap between ndhF and pseudo ycf1 in monocots and magnoliids arose from three independent events.

In summary, we have presented here the first complete cp genome sequence from coconut palm. Although the cp genome of coconut is the smallest found so far among palms, it shares the overall organization, gene content and repeat structure observed in the cp genomes of other sequenced palm species. Nevertheless, the coconut cp genome has unique features, including pseudogenization of the rps19-like gene and an unusually high number of RNA editing sites. The phylogenetic analysis of angiosperms supported a closer relationship of coconut to oil palm than to date palm. Our data will contribute to the growing number of molecular and genomic resources available for studying coconut palm biology.

Dot plot analysis. The cp genomes are nearly identical in the palm family. (TIF)

Primers used for gap-filling PCR and RT-PCR. (DOCM)
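The parsimony reasoning used above for the ndhF/pseudo ycf1 overlap can be illustrated with a small sketch. It applies the Fitch small-parsimony algorithm, a standard way to count the minimum number of state changes of a binary character on a rooted binary tree; the source does not state which parsimony procedure or software was used, so this is an illustration only. The topology below is a deliberately simplified toy tree, not the tree in Fig. 3, and `taxon_A` to `taxon_D` are hypothetical placeholder taxa that lack the overlap. Only the four named genera and their overlap status are taken from the text.

```python
# Fitch small-parsimony sketch on a toy tree (NOT the published topology).
# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).

# Character state: 1 = ndhF/pseudo ycf1 overlap present, 0 = absent.
# The four named genera come from the text; taxon_A..taxon_D are hypothetical.
STATES = {
    "Cocos": 1, "Dasypogon": 1, "Drimys": 1, "Chloranthus": 1,
    "taxon_A": 0, "taxon_B": 0, "taxon_C": 0, "taxon_D": 0,
}

# Assumed toy topology: Cocos and Dasypogon form one clade, while
# Drimys and Chloranthus each sit next to a taxon without the overlap.
TREE = (
    ((("Cocos", "Dasypogon"), "taxon_A"), "taxon_B"),
    (("Drimys", "taxon_C"), ("Chloranthus", "taxon_D")),
)


def fitch(node, states):
    """Return (possible_states, min_changes) for the subtree at `node`."""
    if isinstance(node, str):
        # Leaf: the observed state, no changes needed.
        return {states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change here.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    root_states, changes = fitch(TREE, STATES)
    print("possible root states:", root_states)
    print("minimum number of changes:", changes)
```

On this toy tree the minimum is three changes with the overlap absent at the root, which corresponds to three independent gains: one in the Cocos and Dasypogon clade, one in Drimys, and one in Chloranthus. With a different topology the count could differ, so the result depends entirely on the assumed tree.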
