Hasan Awad Aljohi1,2, Wanfei Liu1,3, Qiang Lin1,3, Yuhui Zhao3, Jingyao Zeng3, Ali Alamer1,2, Ibrahim O Alanazi1,2, Abdullah O Alawad2, Abdullah M Al-Sadi4, Songnian Hu1,3, Jun Yu1,3.
Abstract
Coconut (Cocos nucifera L.), a member of the palm family (Arecaceae), is one of the most economically important crops in the tropics, serving as an important source of food, drink, fuel, medicine, and construction material. Here we report an assembly of the coconut (C. nucifera, Oman local Tall cultivar) mitochondrial (mt) genome based on next-generation sequencing data. This genome, 678,653 bp in length and 45.5% in GC content, encodes 72 proteins, 9 pseudogenes, 23 tRNAs, and 3 ribosomal RNAs. Within the assembly, we find that the chloroplast (cp) derived regions account for 5.07% of the total assembly length, including 13 proteins, 2 pseudogenes, and 11 tRNAs. The mt genome has a relatively large fraction of repeat content (17.26%), including both forward (tandem) and inverted (palindromic) repeats. Sequence variation analysis shows that the Ti/Tv ratio of the mt genome is lower than that of the nuclear genome and lower than the neutral expectation. Using public RNA-Seq data for coconut, we identify 734 RNA editing sites supported by at least two datasets. In summary, our data provide the second complete mt genome sequence in the family Arecaceae, essential for further investigations on mitochondrial biology of seed plants.
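The Ti/Tv comparison in the abstract can be illustrated with a minimal Python sketch. It assumes substitutions are available as (reference, alternate) base pairs, and it reads "neutral expectation" as the case where all twelve single-base changes are equally likely; the record does not define the term, so that reading is an assumption. The function names `is_transition` and `ti_tv_ratio` are hypothetical and are not taken from the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(substitutions):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in substitutions:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ti / tv


# Assumed neutral model: every one of the 12 possible changes occurs once.
all_changes = [(r, a) for r in "ACGT" for a in "ACGT" if r != a]
neutral_ratio = ti_tv_ratio(all_changes)  # 4 transitions / 8 transversions
```

Under that assumed model the ratio is 0.5, because each base has one transition partner and two transversion partners.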
Year: 2016 PMID: 27736909 PMCID: PMC5063475 DOI: 10.1371/journal.pone.0163990
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Circular display of the C. nucifera mt genome.
We display (from outside to inside): physical map scaled in kb; coding sequences transcribed in the clockwise and counterclockwise directions (nad in red; cob, matR and mttB in green; cox in blue; atp in purple; ccm in orange; rpl in yellow; rps in dark red; rRNA in dark green; tRNA in dark blue; orf in dark purple; and others in black); chloroplast-derived regions (green); repeats (forward repeats in green, palindromic repeats in red and tandem repeats in blue); RNA editing sites (synonymous in green and non-synonymous in red); gene conservation scores (black); percentage of properly paired HiSeq mate-pair (MP) reads with insert sizes of 5 kb and 8 kb (blue); and the four regions (thick lines indicate IRs and thin lines indicate LSC and SSC). * indicates pseudogenes.
The gene content of the C. nucifera mt genome.
| Function | Genes |
|---|---|
| Genes of Mitochondrial Origin (109/85) | |
| Complex I (9) | |
| Complex II (1) | |
| Complex III (1) | |
| Complex IV (4/3) | |
| Complex V (5) | |
| Cytochrome c biogenesis (5/4) | |
| Ribosome large subunit (3) | |
| Ribosome small subunit (14/10) | |
| Intron maturase (1) | |
| SecY-independent transporter (1) | |
| rRNA genes (3) | |
| tRNA genes (29/18) | |
| Hypothetical genes (26/19) | |
| Pseudogenes (7) | |
| Genes of Chloroplast Origin (34/29) | |
| Functional genes (11/10) | |
| Hypothetical genes (4/3) | |
| rRNA genes (3) | |
| tRNA genes (13/11) | |
| Pseudogenes (3/2) | |
| Genes of Nuclear Origin (2): | |
Note: The two numbers in parentheses after each item in the first column are the total and unique gene counts; the number in parentheses after a gene name is its copy number.
Fig 2. Phylogenetic trees of 31 mt proteins from 19 plant species.
The left panel shows a maximum parsimony tree and the right panel a maximum likelihood tree, both built with MEGA 6.06. The C. nucifera mt proteins form a cluster with those of P. dactylifera and B. umbellatus among monocotyledons.
Codon usage and codon-anticodon recognition pattern in the C. nucifera mt genome.
| AA | C | No. | R | tRNA | AA | C | No. | R | tRNA | AA | C | No. | R | tRNA | AA | C | No. | R | tRNA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 520 | 1.11 | | Ser | UCU | 370 | 1.41 | | Tyr | UAU | 349 | 1.31 | AUA | Cys | UGU | 140 | 1.06 | |
| | UUC | 416 | 0.89 | GAA2* | | UCC | 275 | 1.05 | | | UAC | 184 | 0.69 | GUA | | UGC | 125 | 0.94 | GCA2* |
| Leu | UUA | 332 | 1.21 | | | UCA | 276 | 1.05 | UGA3* | Ter | UAA | 30 | 1.17 | | Ter | UGA | 20 | 0.78 | |
| | UUG | 344 | 1.25 | | | UCG | 219 | 0.84 | CGA | | UAG | 27 | 1.05 | | Trp | UGG | 258 | 1.00 | CCA |
| | CUU | 334 | 1.21 | | Pro | CCU | 323 | 1.29 | | His | CAU | 307 | 1.35 | | Arg | CGU | 224 | 1.16 | ACG* |
| | CUC | 232 | 0.84 | | | CCC | 201 | 0.80 | | | CAC | 149 | 0.65 | GUG* | | CGC | 134 | 0.69 | |
| | CUA | 230 | 0.84 | | | CCA | 327 | 1.30 | UGG2 | Gln | CAA | 317 | 1.33 | UUG | | CGA | 256 | 1.32 | |
| | CUG | 179 | 0.65 | | | CCG | 153 | 0.61 | | | CAG | 158 | 0.67 | | | CGG | 161 | 0.83 | |
| Ile | AUU | 462 | 1.17 | AAU2 | Thr | ACU | 283 | 1.29 | | Asn | AAU | 348 | 1.24 | | Ser | AGU | 236 | 0.90 | |
| | AUC | 394 | 1.00 | GAU2** | | ACC | 239 | 1.09 | | | AAC | 215 | 0.76 | GUU2* | | AGC | 194 | 0.74 | GCU |
| | AUA | 330 | 0.83 | UAU6* | | ACA | 209 | 0.96 | UGU* | Lys | AAA | 463 | 1.04 | UUU4 | Arg | AGA | 301 | 1.21 | |
| Met | AUG | 429 | 1.00 | CAU4** | | ACG | 144 | 0.66 | | | AAG | 425 | 0.96 | | | AGG | 197 | 0.79 | |
| Val | GUU | 303 | 1.20 | | Ala | GCU | 414 | 1.58 | | Asp | GAU | 410 | 1.29 | | Gly | GGU | 355 | 1.23 | |
| | GUC | 207 | 0.82 | | | GCC | 227 | 0.86 | | | GAC | 226 | 0.71 | GUC | | GGC | 172 | 0.59 | GCC* |
| | GUA | 279 | 1.11 | | | GCA | 252 | 0.96 | | Glu | GAA | 486 | 1.21 | UUC | | GGA | 387 | 1.34 | |
| | GUG | 220 | 0.87 | | | GCG | 158 | 0.60 | | | GAG | 320 | 0.79 | | | GGG | 243 | 0.84 | |
Note: AA, Amino acid; C, Codon; R, relative synonymous codon usage;
a, the content of tRNA including anticodon and tRNA; the cp-derived tRNA is indicated with asterisks (*).
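The R column (relative synonymous codon usage) can be reproduced from the codon counts with a short Python sketch. It assumes the usual definition, the observed count of a codon divided by the mean count across its synonymous family, which matches the tabulated values for the families checked here (Phe and Ter). The function name `rscu` and the two-family input are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict


def rscu(codon_counts, codon_to_aa):
    """Relative synonymous codon usage: count / mean count of the family.

    Every synonymous codon of a family must be present in codon_counts
    (with 0 if unobserved), because the family size is taken from the input.
    """
    totals = defaultdict(int)
    family_size = defaultdict(int)
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += count
        family_size[aa] += 1

    values = {}
    for codon, count in codon_counts.items():
        aa = codon_to_aa[codon]
        if totals[aa] == 0:
            values[codon] = 0.0  # family never observed
        else:
            values[codon] = count * family_size[aa] / totals[aa]
    return values


# Counts copied from the table above (Phe and Ter families only).
counts = {"UUU": 520, "UUC": 416, "UAA": 30, "UAG": 27, "UGA": 20}
family = {"UUU": "Phe", "UUC": "Phe", "UAA": "Ter", "UAG": "Ter", "UGA": "Ter"}
table_r = {codon: round(value, 2) for codon, value in rscu(counts, family).items()}
```

For Phe, the mean family count is (520 + 416) / 2 = 468, so UUU gives 520 / 468 ≈ 1.11, as in the table.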
Fig 3. Circular display of C. nucifera mt transcriptomes.
We display (from outside to inside): physical map scaled in kb; coding sequences transcribed in the clockwise and counterclockwise directions (nad in red; cob, matR and mttB in green; cox in blue; atp in purple; ccm in orange; rpl in yellow; rps in dark red; rRNA in dark green; tRNA in dark blue; orf in dark purple; and others in black); histograms of transcriptome data (plus strand in red and minus strand in green, showing normalized average coverage per 100 bp on a scale from 0 to 100) for samples Health_leaf1, CYD_leaf, Callus, RWDS_leaf, Endosperm, Embryo, Health_leaf2 and Leaf_fruit; coding sequences transcribed in the clockwise and counterclockwise directions; and the four regions (thick lines indicate IRs and thin lines indicate LSC and SSC). * indicates pseudogene.
Mt transcriptome profiles of the 8 coconut RNA-Seq datasets.
| Cultivar | Tissue | SRA accession No. | Length | Original fragments | High quality fragments | Percent | mt mapping fragments | mt mapping percent |
|---|---|---|---|---|---|---|---|---|
| Malayan Red Dwarf | Healthy_leaf1 | SRR1063404 | 202 | 36,009,632 | 32,555,041 | 90.41% | 1,337,565 | 3.71% |
| Malayan Red Dwarf | CYD_leaf (CYD-infected leaf) | SRR1063407 | 202 | 35,467,948 | 32,141,745 | 90.62% | 101,295 | 0.29% |
| West Coast Tall | Callus (Embryogenic callus) | SRR1137438 | 152 | 50,839,994 | 42,267,444 | 83.14% | 121,356 | 0.24% |
| Chowghat Green Dwarf | RWDS_leaf (root wilt disease susceptible leaf) | SRR1173229 | 202 | 119,333,177 | 113,394,045 | 95.02% | 289,707 | 0.24% |
| Dwarf | Endosperm | SRR1265939 | 202 | 51,540,183 | 48,892,847 | 94.86% | 60,531 | 0.12% |
| Dwarf | Embryo | SRR1273070 | 337 | 40,564,276 | 37,752,443 | 93.07% | 21,021 | 0.05% |
| Dwarf | Healthy_leaf2 (Young leaf) | SRR1273180 | 252 | 60,030,680 | 54,291,251 | 90.44% | 882,592 | 1.47% |
| Hainan Tall | Leaf_fruit (Spear leaf, young leaf and fruit flesh) | SRR606452 | 180 | 27,465,703 | 27,063,513 | 98.54% | 447,384 | 1.63% |
Note: CYD, coconut yellow decline;
a, the percent corresponds to high quality fragments.
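The two percentage columns can be recomputed from the fragment counts with a small Python sketch. Recomputing the rows suggests that both the "Percent" and the "mt mapping percent" values match division by the original fragment count; this is an observation from the numbers, not a statement from the source. The helper name `mapping_summary` is hypothetical.

```python
def mapping_summary(original, high_quality, mt_mapped):
    """Return (high-quality %, mt-mapping %) relative to original fragments."""
    if original <= 0:
        raise ValueError("original fragment count must be positive")
    hq_percent = round(100 * high_quality / original, 2)
    mt_percent = round(100 * mt_mapped / original, 2)
    return hq_percent, mt_percent


# Counts copied from the table above.
healthy_leaf1 = mapping_summary(36_009_632, 32_555_041, 1_337_565)
embryo = mapping_summary(40_564_276, 37_752_443, 21_021)

# For comparison only: mt fragments relative to high quality fragments,
# which does not reproduce the tabulated 3.71% for Healthy_leaf1.
alt_mt_percent = round(100 * 1_337_565 / 32_555_041, 2)
```

Using the high quality fragments as the denominator gives 4.11% for Healthy_leaf1, so the tabulated 3.71% is reproduced only with the original fragment count.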
The mt read coverage of the 8 coconut RNA-Seq datasets.
| Coverage | Type | Health_leaf1 | CYD_leaf | Callus | RWDS_leaf | Endosperm | Embryo | Health_leaf2 | Leaf_fruit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bases | 306960 | 441453 | 288308 | 190574 | 371400 | 425207 | 285658 | 217158 |
| 0 | Percent | 45.23% | 65.05% | 42.48% | 28.08% | 54.73% | 62.65% | 42.09% | 32.00% |
| 1–4 | Bases | 198091 | 137894 | 253794 | 276303 | 156258 | 145853 | 226692 | 242753 |
| 1–4 | Percent | 29.19% | 20.32% | 37.40% | 40.71% | 23.02% | 21.49% | 33.40% | 35.77% |
| 5–9 | Bases | 39147 | 35099 | 52538 | 70598 | 53185 | 42874 | 45553 | 61987 |
| 5–9 | Percent | 5.77% | 5.17% | 7.74% | 10.40% | 7.84% | 6.32% | 6.71% | 9.13% |
| 10–99 | Bases | 98440 | 50247 | 70713 | 114064 | 83923 | 52614 | 89257 | 113677 |
| 10–99 | Percent | 14.51% | 7.40% | 10.42% | 16.81% | 12.37% | 7.75% | 13.15% | 16.75% |
| 100–999 | Bases | 20793 | 10058 | 7332 | 17646 | 12001 | 12088 | 15898 | 32165 |
| 100–999 | Percent | 3.06% | 1.48% | 1.08% | 2.60% | 1.77% | 1.78% | 2.34% | 4.74% |
| ≥1000 | Bases | 15222 | 3902 | 5968 | 9468 | 1886 | 17 | 15595 | 10913 |
| ≥1000 | Percent | 2.24% | 0.57% | 0.88% | 1.40% | 0.28% | 0.00% | 2.30% | 1.61% |
| ≥1 | Percent | 54.77% | 34.94% | 57.52% | 71.92% | 45.28% | 37.34% | 57.90% | 68.00% |
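A coverage table of this shape can be derived from per-base read depths, as in the following minimal Python sketch. It assumes one depth value per genome position is available (for example, exported from an alignment tool); the bin edges follow the table, and the names `BINS` and `coverage_table` are hypothetical. The genome length is the 678,653 bp stated in the abstract.

```python
GENOME_LENGTH = 678_653  # mt genome length from the abstract

# (label, inclusive lower bound, inclusive upper bound or None for open-ended)
BINS = [
    ("0", 0, 0),
    ("1-4", 1, 4),
    ("5-9", 5, 9),
    ("10-99", 10, 99),
    ("100-999", 100, 999),
    (">=1000", 1000, None),
]


def coverage_table(depths):
    """Count positions per depth bin and express each as a percent of all positions."""
    total = len(depths)
    if total == 0:
        raise ValueError("no depth values supplied")
    counts = {label: 0 for label, _, _ in BINS}
    for depth in depths:
        for label, low, high in BINS:
            if depth >= low and (high is None or depth <= high):
                counts[label] += 1
                break
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Toy depths, not real data: one value per position.
toy = coverage_table([0, 0, 3, 1, 7, 50, 500, 2000])

# Consistency check against the table: uncovered bases in Health_leaf1.
uncovered_percent = round(100 * 306_960 / GENOME_LENGTH, 2)
```

The last line reproduces the 45.23% entry for Health_leaf1, and the ≥1 row is the complement of the 0 row (100% − 45.23% = 54.77%).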
Fig 4. Expression patterns of mt genes among 8 RNA-Seq datasets.
Expression levels are normalized with DESeq.