Yongjun Fang, Hao Wu, Tongwu Zhang, Meng Yang, Yuxin Yin, Linlin Pan, Xiaoguang Yu, Xiaowei Zhang, Songnian Hu, Ibrahim S Al-Mssallem, Jun Yu.
Abstract
Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of date palm (Phoenix dactylifera L.) into a circular molecule of 715,001 bp in length. The mt genome of P. dactylifera encodes 38 proteins, 30 tRNAs, and 3 ribosomal RNAs, which constitute a gene content of 6.5% (46,770 bp) over the full length. The remaining 93.5% of the genome sequence comprises cp (chloroplast)-derived sequences (10.3% of the whole genome length) and non-coding sequences. In the non-coding regions, there are 0.33% tandem repeats and 2.3% long repeats. Our transcriptomic data from eight tissues (root, seed, bud, fruit, green leaf, yellow leaf, female flower, and male flower) showed higher gene expression levels in male flower, root, bud, and female flower than in the other four tissues. We identified 120 potential SNPs among three date palm cultivars (Khalas, Fahal, and Sukry), seven of which lie in coding sequences. A phylogenetic analysis, based on 22 conserved genes of 15 representative plant mitochondria, placed P. dactylifera at the root of all sequenced monocot mt genomes. In addition, consistent with previous discoveries, there are three co-transcribed gene clusters (18S-5S rRNA, rps3-rpl16, and nad3-rps12) in P. dactylifera, which are highly conserved among all known mitochondrial genomes of angiosperms.
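The proportions quoted in the abstract follow directly from the two reported lengths. A minimal sketch in Python, using only the figures stated above (the constant and function names are illustrative and do not come from the paper):

```python
# Figures taken from the abstract; names are illustrative only.
GENOME_LENGTH_BP = 715_001   # assembled circular mt genome
GENE_CONTENT_BP = 46_770     # combined length of the annotated genes

def percent_of_genome(length_bp: int, genome_bp: int = GENOME_LENGTH_BP) -> float:
    """Return length_bp as a percentage of the genome length."""
    return 100.0 * length_bp / genome_bp

gene_pct = percent_of_genome(GENE_CONTENT_BP)
non_gene_pct = 100.0 - gene_pct

print(f"gene content: {gene_pct:.1f}%")            # 6.5%
print(f"remaining sequence: {non_gene_pct:.1f}%")  # 93.5%
```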
Year: 2012 PMID: 22655034 PMCID: PMC3360038 DOI: 10.1371/journal.pone.0037164
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A circular display of the P. dactylifera mitochondrial genome.
We display (starting from outside to inside): physical map scaled in kb, coding sequences transcribed in the clockwise (red) and counterclockwise directions (blue), chloroplast-derived regions (green boxes), sequence repeats (black), histogram of transcriptome data (green bar, standing for average RPKM value per 200 bp, transformed using natural logs and ranging from 0 to 10), GC content variations (brown bar in a 500 bp sliding window and 500 bp increments), and SOLiD mate-pair (MP) read validation (sliding window 2 kb, MP insertion size 5–6 kb, Step size 15 kb). This figure was generated by using the Circos program [68]. Ψ indicates pseudogene.
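The GC content track in the caption is a windowed statistic. As a rough sketch of how such a track could be computed (this is not the authors' code; the function name is hypothetical, and the handling of the final partial window and of the circular wrap-around is an assumption):

```python
def gc_content_windows(seq: str, window: int = 500, step: int = 500):
    """Yield (start, gc_fraction) for each full window along seq.

    Defaults mirror the caption (500 bp window, 500 bp increments).
    Trailing bases that do not fill a whole window are skipped, and the
    sequence is treated as linear; both are assumptions of this sketch.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, gc / window

# Toy usage with a short made-up sequence and a 4 bp window.
toy = "GGCCATATGCGC"
toy_track = list(gc_content_windows(toy, window=4, step=4))
print(toy_track)  # [(0, 1.0), (4, 0.0), (8, 1.0)]
```

Because the window and the increment are equal, the windows do not overlap, so each base contributes to exactly one bar in the track.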
Comparative analysis of genomic features among 15 mt genomes.
| Genome | Size (bp) | AT (%) | Gene number (Total/Protein/tRNA/rRNA) | Coding (%) | Repeats (%) | Cp (%) | Group I introns | Group II introns (Cis/Trans-spliced) | RNA editing sites |
| | 67,737 | 59.1 | 76/46/27/3 | 90.7 | 1.5 | − | 14 | 13/0 | − |
| | 186,609 | 57.6 | 110/76/29/3 | 20.3 | 7.8 | − | 7 | 25/0 | − |
| | 414,903 | 53.1 | 70/39/26/3 | 10.1 | 21 | 4.4 | 0 | 20/5 | 1084 |
| | 368,801 | 56.1 | 171/140/26/5 | 10.3 | 13.4 | 2.1 | 0 | 14/6 | 370 |
| | 221,853 | 54.8 | 100/79/17/3 | 17.3 | 5.2 | 3.6 | 0 | 18/5 | 427 |
| | 366,924 | 55.2 | 131/117/21/3 | 10.6 | 10.6 | 1.1 | 0 | 18/5 | 441 |
| | 430,597 | 55.0 | 183/156/23/4 | 9.9 | 11.7 | 2.5 | 0 | 17/6 | − |
| | 773,279 | 55.9 | 161/74/31/3 | 5.0 | 2.9 | 8.8 | 0 | −/− | − |
| Phoenix | 715,001 | 54.8 | 90/43/23/3 | 6.5 | 2.3 | 10.3 | 0 | 20/4 | 592 |
| | 509,941 | 56.1 | 61/35/21/5 | 6.3 | 5.5 | − | 0 | −/− | − |
| | 452,528 | 55.7 | 78/39/34/9 | 8.6 | 15.9 | 3.0 | 0 | 17/6 | − |
| | 490,520 | 56.2 | 81/53/22/3 | 11.1 | 30.4 | 6.3 | 0 | 17/6 | 446 |
| | 468,628 | 56.3 | 54/32/18/3 | 6.7 | 16.2 | − | 0 | −/− | − |
| | 704,100 | 56.1 | 55/33/18/3 | 5.0 | 36.4 | − | 0 | −/− | − |
| | 569,630 | 56.1 | 213/163/33/4 | 6.2 | 19.1 | 4.4 | 0 | 15/7 | − |
We summarized several genomic features from 15 representative mt genomes, including AT content of the mt genomes, the percentage of gene-coding sequences, and the percentage of chloroplast-derived sequences in mt genome sequences. We only used the genus names for the reference genomes.
Information about these mt genomes is from reference [69], and information about other plant mt genomes is from either the original publications or NCBI databases (see Table S1).
To be consistent, repetitive sequence contents in the 15 plant mt genomes were all computed using REPuter (length >50 bp; mismatch ≤3).
Figure 2. Phylogeny inferred from 22 genes common to 15 plant mt genomes.
We constructed an ML tree using PHYML (version 3.0) [67] (Chara vulgaris as outgroup, see Materials and Methods for details). Nodes supported by more than 90% of bootstrap replicates are indicated. The P. dactylifera mt genome is placed at the basal position of the monocots (red).
The gene content of P. dactylifera mt genome.
| Genes of Mitochondrial Origin | |
| Complex I | |
| Complex III | |
| Complex IV | |
| Complex V | |
| Cytochrome c biogenesis | |
| Ribosome large subunit | |
| Ribosome small subunit | |
| Intron maturase | |
| SecY-independent transporter | |
| rRNA genes | |
| tRNA genes | |
| Pseudogenes | |
| Hypothetical genes | 5 ORFs |
| | |
| Genes with intact ORFs | |
| Pseudogenes | |
| tRNA genes | |
| | |
| RNA polymerase | RNA_pol |
Genes with intact ORFs in cp-derived regions are identified based on >95% identity and >95% length coverage to the known cp genes.
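The two-threshold rule in this footnote can be sketched as a simple filter. The function name, the tuple layout, and the example hits below are hypothetical; only the >95% identity and >95% length-coverage cut-offs come from the text:

```python
def passes_thresholds(identity_pct, aligned_len, gene_len,
                      min_identity=95.0, min_coverage=95.0):
    """Apply the >95% identity and >95% length-coverage rule.

    Coverage is taken here as aligned length over the full length of the
    known cp gene, which is an assumption of this sketch.
    """
    coverage_pct = 100.0 * aligned_len / gene_len
    return identity_pct > min_identity and coverage_pct > min_coverage

# Made-up hits: (gene, identity %, aligned length, cp gene length).
# Gene names and numbers are for illustration only, not from the paper.
hits = [
    ("geneA", 98.2, 990, 1000),   # passes both thresholds
    ("geneB", 96.0, 800, 1000),   # fails coverage (80%)
    ("geneC", 90.0, 1000, 1000),  # fails identity
]
intact = [gene for gene, ident, aln, full in hits
          if passes_thresholds(ident, aln, full)]
print(intact)  # ['geneA']
```

Both comparisons are strict, matching the ">" in the footnote, so a hit at exactly 95% identity or coverage would be rejected.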
Figure 3. The distribution of tRNAs in vascular and angiosperm plant mitochondrial and chloroplast genomes.
Native tRNA genes in mitochondrial and chloroplast genomes are shown as open squares and open circles, respectively. Solid squares indicate cp-derived tRNAs found in the mt genome, and ψ stands for pseudogene. Ten tRNAs (anticodons highlighted in bold) have been gradually lost during genome evolution, and four tRNAs (anticodons underlined) have been gradually replaced by their cp-derived counterparts. The eight mt genomes are listed according to their relative phylogenetic positions in Figure 2.
The distribution of nine P. dactylifera chloroplast-derived mt regions in five known plant mt genomes.
| Position | Length (bp) | Identity (%) | GC | Arabidopsis | Vitis | Bambusa | Oryza | Zea |
| 130051–131335 | 1285 | 90 | 0.4084 | − | + | − | − | − |
| 87871–88837 | 967 | 88 | 0.4224 | − | − | + | + | + |
| 328935–329833 | 899 | 90 | 0.4405 | − | + | + | + | + |
| 179701–180523 | 823 | 91 | 0.4702 | − | − | + | − | − |
| 535882–536375 | 494 | 94 | 0.4231 | − | + | − | − | − |
| 271397–271847 | 451 | 87 | 0.3792 | − | + | + | − | − |
| 483235–483594 | 360 | 89 | 0.4389 | − | + | − | − | − |
| 586885–587154 | 270 | 95 | 0.4870 | − | − | + | + | + |
| 500598–500864 | 267 | 88 | 0.4607 | − | + | − | − | − |
We selected homologous sequences with identity >70% and length coverage >90% for the comparative analysis. The results for two dicots (Arabidopsis and Vitis) and three monocots (Bambusa, Oryza, and Zea) are listed here. The presence (+) and absence (−) of the corresponding cp regions are indicated based on identity and length coverage. Only the genus names are used for the reference mt genomes.
The sequence identity between the cp sequence insertions in P. dactylifera mt genome and their cp homologs.
The GC content of the cp-derived sequences in P. dactylifera mt genome.
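The Length column agrees with the Position column when the coordinates are read as inclusive at both ends. A short check in Python, with the values copied from the table above (the helper name is illustrative):

```python
# (start, end, reported_length) copied from the Position and Length columns.
regions = [
    (130051, 131335, 1285),
    (87871, 88837, 967),
    (328935, 329833, 899),
    (179701, 180523, 823),
    (535882, 536375, 494),
    (271397, 271847, 451),
    (483235, 483594, 360),
    (586885, 587154, 270),
    (500598, 500864, 267),
]

def span_length(start: int, end: int) -> int:
    """Length of a region whose start and end positions are both included."""
    return end - start + 1

mismatches = [(s, e, n) for s, e, n in regions if span_length(s, e) != n]
total_bp = sum(span_length(s, e) for s, e, _ in regions)

print(mismatches)  # [] : every reported length matches its coordinates
print(total_bp)    # 5816 bp across the nine regions
```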
Figure 4. Venn diagram of shared RNA editing sites among three plant mt genomes.
Intra-varietal SNPs among the three cultivars.
| SNP | Khalas | Fahal | Sukry |
| Transitions | | | |
| A/G | 72 | 75 | 68 |
| G/A | 83 | 99 | 85 |
| T/C | 66 | 66 | 64 |
| C/T | 76 | 85 | 70 |
| Total | 297 | 325 | 287 |
| Transversions | | | |
| A/C | 78 | 88 | 74 |
| A/T | 28 | 32 | 33 |
| C/A | 60 | 60 | 60 |
| C/G | 11 | 14 | 10 |
| G/C | 10 | 12 | 12 |
| G/T | 65 | 63 | 63 |
| T/A | 31 | 34 | 33 |
| T/G | 71 | 75 | 62 |
| Total | 354 | 378 | 347 |
Major and minor genotypes are separated with oblique lines (/).
Numbers of sites are calculated for each cultivar.
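The two subtotals in the table separate purine-to-purine and pyrimidine-to-pyrimidine changes (A/G, G/A, T/C, C/T) from all other substitutions. A minimal sketch that reproduces the Khalas subtotals from the individual rows; the function name and the ratio at the end are illustrative additions, not values reported in the source:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(major: str, minor: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {major, minor}
    return pair <= PURINES or pair <= PYRIMIDINES

# Khalas counts copied from the table above, keyed by "major/minor".
khalas = {
    "A/G": 72, "G/A": 83, "T/C": 66, "C/T": 76,
    "A/C": 78, "A/T": 28, "C/A": 60, "C/G": 11,
    "G/C": 10, "G/T": 65, "T/A": 31, "T/G": 71,
}

transitions = sum(n for k, n in khalas.items() if is_transition(*k.split("/")))
transversions = sum(n for k, n in khalas.items() if not is_transition(*k.split("/")))

print(transitions, transversions)             # 297 354
print(round(transitions / transversions, 2))  # 0.84
```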
Inter-varietal SNPs.
| Comparison | Coding | Non-coding | Total |
| Khalas vs. Fahal | 2 | 79 | 81 |
| Khalas vs. Sukry | 6 | 91 | 97 |
| Fahal vs. Sukry | 6 | 50 | 56 |
| Khalas vs. Fahal vs. Sukry | 7 | 113 | 120 |
“Coding” and “Non-coding” indicate numbers of inter-varietal SNPs found among the groups.
Figure 5. Percentage of long repeats and tandem repeats of 15 mt genomes.
We analyzed long repeats (repeat unit >50 bp) using REPuter [63] and tandem repeats using Tandem Repeats Finder [64] (see Materials and Methods for details). Genus names represent the sequenced mitochondrial genomes, which are arranged according to their relative phylogenetic positions in Figure 2.
The transcript coverage of P. dactylifera mt genome.
| Reads | Length (bp) | Percentage (%) |
| = 0 | 494769 | 69.20 |
| 1–9 | 136935 | 19.15 |
| 10–99 | 69116 | 9.67 |
| 100–999 | 10734 | 1.50 |
| 1000–9999 | 3020 | 0.42 |
| >10000 | 427 | 0.060 |
Read number at each genome position. “= 0” means no transcription activity was observed; the larger the number, the higher the gene expression level. Average coverage of 40 highly conserved genes (∼51,000 bp in length) in the P. dactylifera mt genome is 44.26×.
Total length of genomic sequences defined for transcript expression level.
The proportion of transcribed region relative to the whole mt genome.
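The table is a histogram of per-position read counts, and the transcribed proportion is whatever falls outside the “= 0” class. A rough sketch of that binning, assuming a list of per-position read counts as input (the function names and the toy depths are hypothetical; the class boundaries and the table lengths come from the source):

```python
from bisect import bisect_right

# Lower bounds of the read-count classes in the table. Treating the last
# class as ">= 10000" is an assumption; the table labels it ">10000".
LOWER_BOUNDS = [0, 1, 10, 100, 1000, 10000]
LABELS = ["= 0", "1-9", "10-99", "100-999", "1000-9999", ">= 10000"]

def bin_depths(depths):
    """Count genome positions per read-count class."""
    counts = dict.fromkeys(LABELS, 0)
    for d in depths:
        counts[LABELS[bisect_right(LOWER_BOUNDS, d) - 1]] += 1
    return counts

def transcribed_percent(counts):
    """Percentage of positions covered by at least one read."""
    total = sum(counts.values())
    return 100.0 * (total - counts["= 0"]) / total

# Toy per-position read counts (made up for illustration).
toy_counts = bin_depths([0, 0, 3, 12, 150, 0, 9, 10000])
print(toy_counts)
print(transcribed_percent(toy_counts))  # 62.5

# The same calculation applied to the lengths listed in the table.
table_counts = dict(zip(LABELS, [494769, 136935, 69116, 10734, 3020, 427]))
print(sum(table_counts.values()))                   # 715001, the genome length
print(round(transcribed_percent(table_counts), 2))  # 30.8
```

The six lengths sum to the full genome length of 715,001 bp, so the classes partition the genome and the transcribed share is the complement of the 69.20% with no reads.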
Figure 6. Gene expression profiles of P. dactylifera mitochondrion among 8 tissues.
We used 40 house-keeping (conserved over diverse plant lineages) genes for hierarchical clustering (Manhattan distance method). Red and green indicate high and low levels of gene expression, respectively.
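Hierarchical clustering starts from a pairwise distance between expression profiles and merges the closest pair first. The sketch below shows only that first step with the Manhattan distance named in the caption; the linkage rule is not stated in the caption, so no full clustering is attempted, and the tissue values are made up for illustration:

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def closest_pair(profiles):
    """Return (distance, name1, name2) for the two most similar profiles."""
    names = sorted(profiles)
    pairs = (
        (manhattan(profiles[p], profiles[q]), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    )
    return min(pairs)

# Made-up expression values for three genes per tissue; not data from the paper.
toy = {
    "root":  [5.0, 3.0, 4.0],
    "bud":   [5.5, 3.0, 4.5],
    "fruit": [1.0, 0.5, 2.0],
}
print(closest_pair(toy))  # (1.0, 'bud', 'root')
```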