Literature DB >> 35847622

Characterization and phylogenetic analysis of the complete mitochondrial genome sequence of Diospyros oleifera, the first representative from the family Ebenaceae.

Yang Xu1, Yi Dong1, Wenqiang Cheng1, Kaiyun Wu1, Haidong Gao2, Lei Liu2, Lei Xu2, Bangchu Gong1.   

Abstract

Plant mitochondrial genomes are a valuable source of genetic information for a better understanding of phylogenetic relationships. However, no mitochondrial genome of any species in Ebenaceae has been reported. In this study, we reported the first mitochondrial genome of an Ebenaceae model plant Diospyros oleifera. The mitogenome was 493,958 bp in length, contained 39 protein-coding genes, 27 transfer RNA genes, and 3 ribosomal RNA genes. The rps2 and rps11 genes were missing in the D. oleifera mt genome, while the rps10 gene was identified. The length of the repetitive sequence in the D. oleifera mt genome was 31 kb, accounting for 6.33%. A clear bias in RNA-editing sites were found in the D. oleifera mt genome. We also detected 28 chloroplast-derived fragments significantly associated with D. oleifera mt genes, indicating intracellular tRNA genes transferred frequently from chloroplasts to mitochondria in D. oleifera. Phylogenetic analysis based on the mt genomes of D. oleifera and 27 other taxa reflected the exact evolutionary and taxonomic status of D. oleifera. Ka/Ks analysis revealed that 95.16% of the protein-coding genes in the D. oleifera mt genome had undergone negative selections. But, the rearrangement of mitochondrial genes has been widely occur among D. oleifera and these observed species. These results will lay the foundation for identifying further evolutionary relationships within Ebenaceae.
© 2022 The Author(s).

Entities:  

Keywords:  Diospyros oleifera; Mitochondrial genome; Phylogenetic analysis

Year:  2022        PMID: 35847622      PMCID: PMC9283892          DOI: 10.1016/j.heliyon.2022.e09870

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Mitochondria are the main organelle involved in energy metabolism in plants [1, 2]. They supply ATP via oxidative phosphorylation for metabolism, cell differentiation, apoptosis, cell growth, and cell division and are abundant in energy-consuming tissues involved in essential biological functions [1, 2, 3, 4]. Therefore, mitochondria play an important role in plant productivity and development [2, 5, 6]. According to endosymbiotic theory, plant mitochondria are believed to have descended from free-living bacteria-independent microorganisms, which explains the presence of their genomes [5, 7]. During evolution, the plant mitochondrial (mt) genome underwent dramatic changes in, for example, the gene order, genome structure, and migration of sequences from other organelles [5, 7, 8, 9]. Thus, plants have about 100–10,000 times larger and more structurally complex mitochondrial (mt) genomes than animals [10, 11, 12]. The mt genomes of plants demonstrate significant genome size variation, from 66 kb [13] to 11.3 Mb [14]; the number of protein-coding genes varies from 14 to 67 [15]; and the number of tRNA genes varies from 3 to 27 [9]. There are variations in mitochondrial genomes not only between plant species but also within the same species [9, 12, 16, 17], in stark contrast to the conserved structure of plant chloroplast genomes [16, 17, 18]. Thus, mt genomes have been used as a valuable source of genetic information and for investigation of essential cellular processes in many phylogenetic studies [18, 19, 20, 21]. While, these characteristics of plant mt genomes (bigger size, more structural complexity, and low conservation across species) make plant mitochondrial genome assembly difficult [1, 8, 10]. To date, more than 5000 plant chloroplast genomes have been sequenced, but only about 400 mt genome sequences are available (www.ncbi.nlm.nih.gov/genome/organelle/,11/11/2021). In addition, sequenced plants largely differ in their classification, and only three complete mitochondrial genomes of species from the order Ericales have been identified. Diospyros L., from the Ebenaceae family, is a plant genus that includes over 500 species widely distributed across tropical and subtropical regions [22] and that is one of the largest angiosperm genera [23]. Among these species, Diospyros oleifera and Diospyros kaki have been cultivated as an important fruit crop in China, Korea, Japan for centuries, due to its edible fruit is rich in vitamins, sugars, nutrients and antioxidants vital for optimum health with various medicinal and chemical [24, 25]. Morphological, molecular, and genomic studies have shown that D. oleifera can be used as a model plant [24, 26]. Chloroplast genome sequencing has been performed in 15 species of Diospyros [26, 27], and nuclear genome sequencing has been performed in D. oleifera [23, 28] and Diospyros lotus [29, 30]. However, to date, no mt genome of any species in Ebenaceae has been reported. Fortunately, advancements in long read sequencing, such as PacBio and Oxford Nanopore, have made organelle genome sequencing easier and faster. Therefore, in this study, we constructed the complete mt genome of D. oleifera based on PacBio and Illumina data, performed a phylogenetic analysis, and compared the complete mt genomes of D. oleifera and related genera. These results will help better understand the features of the D. oleifera mitochondrial genome and lay the foundation for identifying further evolutionary relationships within Ebenaceae.

Materials and methods

Samples and mitogenome sequencing

Due to the advancement of sequence technology, long reads, used for de novo assembly of organelle genomes without the need for organelle DNA isolation, could be easily generated from high throughput sequencing. The well-established methodology is quite efficient and well accepted in the scientific community [1, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42]. In this study, Mature leaves of D. oleifera (at latitude 34.27569 and longitude 107.75079) were used to isolate total DNA following the protocol for the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA) and the SMRTbell Libraries protocol for PacBio data (Pacific Biosciences, Menlo Park, CA, USA). All these whole-genome Illumina HiSeq and PacBio sequencing data were deposited in the NCBI GenBank (accession no. PRJNA562043) and the Persimmon Genome Website (http://www.kakiwi.zju.edu.cn/cgi-bin/persimmon/about_genome.cgi). Sequencing reads of the mitochondria were filtered and extracted from these WGD sequencing data of D. oleifera. Raw data of second-generation sequencing were filtered using fastp version 0.20.0 software (https://github.com/OpenGene/fastp) [43]. The three-generation sequencing data of mitochondrial reads were error-corrected, trimmed, and de-novo-assembled using a Canu assembler (version 1.5) with default parameters [44]. Then, the contig sequence was obtained. The gene databases of plant mitochondria (the mitochondrial gene sequences of species published on the NCBI) were compared using blast v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi), and contigs that matched with the mitochondrial gene as the seed sequence were selected. The original data were used to extend and circularize the contigs to obtain the ring-dominant structure (or secondary ring), and then, the assembly was polished using NextPolish 1.3.1 (https://github.com/Nextomics/NextPolish) [45]. The assembly results were calibrated using second- and third-generation data, and the parameters were set as rerun = 3 and -max_depth = 100. Then, the final assembly results were obtained.

Genome annotation

The assembled D. oleifera mt genome was annotated using the GeSeq tool [46]. To confirm the annotated results, the assembled D. oleifera mt genome was also BLAST-searched against protein-coding genes and ribosomal RNA (rRNA) genes of available plant mt genomes at the NCBI. Then, the sequence coordinates of the identified protein-coding genes (PCGs) were manually verified for start and stop codons. The annotations of transfer RNA (tRNA) genes were also confirmed by tRNAscan-SE [47]. ViennarNA-2.4.14 [48] was used to visualize the secondary structure of tRNA. The physical circular map was drawn using the Organellar Genome DRAW (OGDraw) v1.2 program [49]. The final annotated mt genome sequences of D. oleifera have been deposited in the NCBI GenBank (accession no. MW970112). Strand asymmetry was calculated according to the formulas: AT-skew = [A − T]/[A + T] and GCskew = [G − C]/[G + C] [50]. The possible RNA-editing sites in the PCGs of D. oleifera were predicted using the online predictive RNA editor for plant mitochondrial genes (PREP-Mt) [51] suite of servers (http://prep.unl.edu/). The codon frequencies were calculated using the Codon Usage tool in the Sequence Manipulation Suite (bioinformatics.org/sms2/codon_usage.html) [52]. The relative synonymous codon usage (RSCU [53]) was calculated using the CAI Python package of Lee [54].

Analysis of repeated sequences

Three kinds of repeats (simple sequence, tandem, and dispersed) were detected in the D. oleifera mitochondrial genome. The MIcroSAtellite (MISA) identification tool Perl script was used to detect simple sequence repeats [55]. The repeats of mono-, di-, tri-, tetra-, penta-, and hexanucleotide bases with 12, 6, 4, 3, 3, and 3 repeat numbers, respectively, were identified. Tandem repeats (>6 bp repeat units) were detected using Tandem Repeats Finder v4.09 software (http://tandem.bu.edu/trf/trf.submit.options.html) [56] with default parameters (matching probability of 80 and indel probability of 10). Direct and inverted repeats were detected using the vmatch (v2.3.0) Perl script with the minimal repeat size set to 30 bp.

Chloroplast-to-mitochondrion-DNA transformation

The D. oleifera cp genome (NC_030787.1) was downloaded from the NCBI Organelle Genome Resources Database. The protein-coding and tRNA genes, which were transferred from chloroplasts to mitochondria, were identified using Blastn software with the following screening criteria: matching rate ≥70%, E-value ≤ 1e − 10, and length ≥30 bp.

Phylogenetic tree construction and Ka/Ks analysis

The D. oleifera and Twenty-seven other species with complete or nearly complete mitogenomes were used in phylogenetic analyses, representing twenty families. Two species from Conifers were used as outgroup. All species were listed in Table 1. The mt genomes were downloaded from the NCBI Organelle Genome Resources Database, and the conserved protein-coding genes (atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFC, ccmFN, cob, cox1, cox2, cox3, matR, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rpl5, rps12, rps13, rps3, and rps4) were extracted and aligned using MAFFT v7.402 [57] with default parameters. ModelTest-NG v0.1.3 was used to determine the best-fit model, and a maximum likelihood (ML) tree was generated using RAxMLv8.2.12 with the best-fit substitution model (GTRGAMMA) at 1000 bootstrap replicates [58].
Table 1

GenBank accession numbers of mitochondrial genomes for species sampled in this study.

Classification StatusOrderFamiliySpeciesLength (bp)Accession number
IngroupAsteridsApialesApiaceaeDaucus carota#281,132NC_017855
AsteridsAquifolialesAquifoliaceaeIlex pubescens517,520NC_045078
AsteridsAsteralesAsteraceaeChrysanthemum boreale211,002NC_039757
AsteridsAsteralesAsteraceaeHelianthus annuus300,945NC_023337
AsteridsAsteralesAsteraceaeLactuca sativa363,324NC_042756
AsteridsAsteralesAsteraceaeLactuca serriola363,328NC_042378
AsteridsAsteralesCampanulaceaeCodonopsis lanceolata403,704NC_037949
AsteridsAsteralesCampanulaceaePlatycodon grandiflorus1,249,593NC_035958
AsteridsEricalesEbenaceaeDiospyros oleifera∗#493,958MW970112
AsteridsEricalesEricaceaeRhododendron simsii#802,707WJXA01000014
AsteridsEricalesEricaceaeVaccinium macrocarpon#459,678NC_023338
AsteridsGentianalesRubiaceaeScyphiphora hydrophyllacea354,155MT610041
AsteridsLamialesLamiaceaeSalvia miltiorrhiza499,236NC_023209
AsteridsLamialesLentibulariaceaeUtricularia reniformis857,234NC_034982
AsteridsLamialesOleaceaeOlea europaea710,808MW262896
AsteridsLamialesPhrymaceaeMimulus guttatus525,671NC_018041
AsteridsSolanalesConvolvulaceaeIpomoea nil265,768NC_031158
AsteridsSolanalesSolanaceaeCapsicum annuum511,530KJ865410
AsteridsSolanalesSolanaceaeNicotiana tabacum430,597NC_006581
AsteridsSolanalesSolanaceaeSolanum lycopersicum#446,257NC_035963
CommelinidsPoalesPoaceaeOryza sativa637,692JF281153
CommelinidsPoalesPoaceaeZea mays680,603DQ645539.1
RosidsBrassicalesBrassicaceaeArabidopsis thaliana367,808NC_037304
RosidsFabalesFabaceaeGlycine max402,558NC_020455
RosidsRosalesRosaceaeMalus domestica#396,947NC_018554
RosidsVitalesVitaceaeVitis vinifera773,279NC_012119
OutgroupConifersGinkgoalesGinkgoaceaeGinkgo biloba346,544KM672373
ConifersPinalesPinaceaePinus taeda1,191,054MF991879

∗Represents the new mitogenome in this study. #Represents these species were used for mitogenome synteny and rearrangements through Mauve software.

GenBank accession numbers of mitochondrial genomes for species sampled in this study. ∗Represents the new mitogenome in this study. #Represents these species were used for mitogenome synteny and rearrangements through Mauve software. The synonymous (Ks) and nonsynonymous (Ka) substitution rates of the protein-coding genes in the D. oleifera mt genome were analyzed using the 27 species. In this analysis, KaKs_Calculator (v2.0) [59] with the MLWL model was used to calculate Ka/Ks. Genome synteny and rearrangements among the using six representative species (Table 1) mitogenomes were analyzed using the progressive Mauve algorithm as implemented in Mauve ver. 2.4.0 software [60].

Results and discussion

Genomic features of the D. oleifera mt genome

The plant mitochondrial genome greatly varies in size, from 66 kb in Viscum scurruloideum [13] to 11.3 Mb in Silene conica [14]. We assembled the complete mt genome of D. oleifera in a single circular contig of 493,958 bp (GenBank accession number MW970112). The relatively medium size of the D. oleifera mt genome is similar to that of Vaccinium macrocarpon (459,678 bp) [3] and some asterids, such as Solanum lycopersicum (446,257 bp) [61], Salvia miltiorrhiza (499,236 bp) [62], and Capsicum annuum (511,530 bp) [63]; smaller than that of Rhododendron simsii (802,707 bp) [64] and Olea europaea (710,808 bp) [65]; and larger than that of Daucus carota (281,132 bp) [66] and Malus domestica (396,947 bp) [67]. In the D. oleifera mt genome, 69 genes (39 protein-coding genes, 27 tRNA genes, and 3 rRNA genes (rrn5, rrn18, and rrn26)) were annotated. The functional categorization and physical locations of the annotated genes are shown in Figure 1. The 38 different proteins (rps19 has two copies) could be divided into 10 classes (Table 2): ATP synthase (five genes), cytochrome C biogenesis (four genes), ubiquinol cytochrome c reductase (one gene), cytochrome C oxidase (three genes), maturases (one gene), transport membrane protein (one gene), NADH dehydrogenase (nine genes), ribosomal proteins (LSU; four genes), ribosomal proteins (SSU; nine genes), and succinate dehydrogenase (two genes). ATG was used as the starting codon by almost all the protein-coding genes, and the four stop codons (TAA, TGA, TAG, and CGA) had utilization rates of 48.71%, 30.77%, 17.95%, and 2.57%, respectively.
Figure 1

The circular map of D. oleifera mt genome. Gene map showing 69 annotated genes of different functional groups.

Table 2

Gene profile and organization of the D. oleifera mt genome.

Group of genesGene nameLengthStart codonStop codonAmino acid
ATP synthaseatp11530ATGTGA510
atp4579ATGTAA193
atp6807ATGTGA269
atp8480ATGTAA160
atp9285ATGTAG95
Cytohrome c biogenesisccmB621ATGTGA207
ccmC753ATGTGA251
ccmFC∗1353ATGTAA451
ccmFN1755ATGTGA585
Ubichinol cytochrome c reductasecob1182ATGTGA394
Cytochrome c oxidasecox1∗1584ACG (ATG)TAA528
cox2∗∗780ATGTAA260
cox3798ATGTGA266
MaturasesmatR1968ATGTAG656
Transport membrance proteinmttB375ATGTAG125
NADH dehydrogenasenad1∗∗∗∗978ATGTAA326
nad2∗∗∗∗1467ATGTAA489
nad3357ATGTAA119
nad4∗∗∗1488ATGTGA496
nad4L273ATGTAA91
nad5∗∗∗∗2013ATGTAA671
nad6618ATGTAA206
nad7∗∗∗∗1185ATGTAG395
nad9588ATGTAG196
Ribosomal proteins (LSU)rpl10489ATGTAA163
rpl16435NDTAA145
rpl2∗1005ATGTAA335
rpl5564ATGTAA188
Ribosomal proteins (SSU)rps10∗333ACG (ATG)CGA111
rps12378ATGTGA126
rps13351ATGTGA117
rps14303ATGTAG101
rps19(2)(231,231)ATGTAA77
rps3∗1752ATGTAG584
rps41326ATGTAA442
rps7447ATGTAA149
Succinate dehydrogenasesdh3306ATGTGA102
sdh4432ATGTGA144
Ribosomal RNAsrrn181904
rrn263373
rrn5119
Transfer RNAstrnA-TGC∗67
trnC-GCA71
trnD-GTC74
trnE-TTC72
trnF-GAA74
trnG-GCC72
trnH-GTG74
trnI-AAT69
trnI-GAT∗74
trnK-TTT73
trnM-CAT(4)(73,74,74,77)
trnN-GTT(2)(72,72)
trnP-TGG75
trnQ-TTG72
trnR-ACG74
trnR-TCT∗72
trnS-GCT88
trnS-GGA87
trnS-TGA87
trnT-TGT∗75
trnV-GAC72
trnW-CCA74
trnY-GTA83
The circular map of D. oleifera mt genome. Gene map showing 69 annotated genes of different functional groups. Gene profile and organization of the D. oleifera mt genome. Previous studies have shown that rps10 is missing in the mt genomes of most plants, such as Arabidopsis thaliana, Brassica napus, and Beta vulgaris, and that its function is replaced by the nuclear gene [9]. However, the rps10 gene was found in the D. oleifera mt genome. The absence of rps2 and rps11 genes in the D. oleifera mt genome, consistent with R. simsii [64] and V. macrocarpon [3], supports Adams’ speculation that rps2 and rps11 genes were lost in the early evolution of eukaryotic plants [3]. Similar to Nicotiana tabacum [68] and M. luteus [4], the D. oleifera mt genome has no rps1 gene, whereas rps1 is present in the V. macrocarpon mt genome [3] and two copies of rps1 are present in the R. simsii mt genome [64]. The persimmon mitochondria have 27 tRNAs (23 typical tRNA genes, one more trnN-GTT and three more trnM-CAT). The average length of these tRNAs is 67–88 bp, with a total length of 1479 bp. The number of tRNAs in the D. oleifera mt genome is more than that in other asterids, such as V. macrocarpon (18) [3], R. simsii (23) [64], M. luteus (24) [4], and N. tabacum (21) [68]. This may be because some tRNAs in the D. oleifera mt genome have multiple copies; for example, trnN-GTT has two copies and trnM-CAT has four copies. The secondary structures are shown in Figure 2. Following terms for Agris et al. [69, 70], secondary structures of most tRNAs were recovered as ordinal cloverleaf structures, which includes amino acid accepting stem (AAS), dihydrouridine stem and loop (DSL), anticodon stem and loop (ASL), thymidine stem and loop (TSL), furthermore, trnI-GAT, trnS-GCT, trnS-GGA, trnS-TGA, trnY-GTA were with an addition variable stem and loop (VSL). And, consist with many report [19, 21, 38, 39, 42], G-T (U) matches were also found in mostly tRNA secondary structures in the D. oleifera mt genome.
Figure 2

Secondary structures of tRNAs of D. oleifera. Each region of tRNA is named as follows [69, 70]: Amino acid accepting stem, AAS (upper arm); dihydrouridine stem and loop, DSL (left arm); anticodon stem and loop, ASL (lower arm); thymidine stem and loop, TSL (right arm); variable stem and loop, VSL (between ASL and TSL).

Secondary structures of tRNAs of D. oleifera. Each region of tRNA is named as follows [69, 70]: Amino acid accepting stem, AAS (upper arm); dihydrouridine stem and loop, DSL (left arm); anticodon stem and loop, ASL (lower arm); thymidine stem and loop, TSL (right arm); variable stem and loop, VSL (between ASL and TSL). The total gene length added up to 8% of the total mt genome length, with protein-coding regions comprising only 6.5% (32 kb) of the genome length. The gene content of D. oleifera is similar to that of the published mt genomes of asterids, especially Mimulus guttatus (7.4%) [4] and Helianthus annuus (8.5%) [71]. We found 54 genes with no introns, accounting for 78.26% of the total, consistent with the result conclusion that 63.2%–100% of mitochondrial genes in most plants have no introns [8, 9]. In addition, 30 introns were found in the other 15 D. oleifera mt genes; nad1, nad2, nad5, and nad7 had 4 introns; nad4, 3 introns; and cox2, 2 introns. The nucleotide composition of the whole mt genome (Table 3) was found to be A (27.27%), T (27.03%), C (22.90%), and G (22.80%). The overall GC content was 45.7%, consistent with that of other asterids (V. macrocarpon 45.33% [3], D. carota 45.41% [66], Ilex pubescens 45.55% [35], Camellia sinensis 45.70% [33], and R. simsii 45.86% [64]). The GC skew was positive in CDS regions and negative in the mitochondrial genome. Strikingly, the GC content of the PCGs (43.11%) was lower than that of other CDS regions (tRNAs and rRNAs).
Table 3

Composition and skewness of the D. oleifera mt genome.

D.oleiferaSize (bp)A%T%G%C%A + T%G + C%AT-skewGC-skew
Mitogenome49395827.2727.0322.822.954.345.70.004-0.002
PCGs3240026.5930.321.921.2156.8943.11-0.0650.016
tRNAs202122.4626.0328.9522.5648.4951.51-0.0730.124
rRNAs539625.822.0729.2322.9147.8752.130.0780.121
Composition and skewness of the D. oleifera mt genome.

Repeat sequences analysis

Simple sequence repeats (SSRs, or microsatellites) are DNA stretches consisting of short, tandem units of sequence repetitions 1–6 base pairs in length [72]. We identified 87 SSRs in the D. oleifera mt genome. The proportions of different repeat units are shown in Table 1. Tetranucleotide repeats were the most abundant SSR type, constituting 68.97% of all identified SSRs, and there were 7 SSRs in di-, tri-, and pentanucleotide repeats, accounting for 8.05% of all identified SSRs. There were only three mono- and hexanucleotide repeats in the D. oleifera mt genome. AAAG/CTTT motifs (16) were most recurrent motifs, representing 18.39% of all identified SSRs (Table S1). Tandem repeats (satellite DNA) are core repeating units of 1–200 bases repeated several times in tandem [73]. As shown in Table 4, 12 tandem repeats 6 to 30 bp long were observed in the D. oleifera mt genome.
Table 4

Distribution of perfect tandem repeats in the D. oleifera mt genome.

NO.SizeCopyRepeat sequencePercent MatchesStartEnd
1301.9TACTACAATCCGTACGATAACTAGAATCCG82123393123450
2182.2GCTTGATTCGGTGTAAAC90143948143987
3202TTTGATTTCATCTTCATATAC90176075176115
4142.8GGAGCTGACACCCT84210479210515
5152.4AAATAAAAAAATAAA90273479273514
6192.1AACAACCTATCTTGCGACA90308468308506
7156.7ACAACCTATTATGCG70308469308572
8182.1AATACTAATAGAATAGAA90335217335254
9182.4CATAGTCGCGAGCTGTTT81400200400242
1064.2AAAGAA100409196409220
11185.2TATTGATGATAGTGACGA92456597456686
1296.8ATTGATGAT73456613456673
Distribution of perfect tandem repeats in the D. oleifera mt genome. In addition, 760 non-tandem repeats, with 30 bp or more in length, were detected in the D. oleifera mt genome. Of the 760 non-tandem repeats, 426 were direct, 332 were palindromic, and 2 were reverse. The longest direct-type repeat was 115 bp long, while the longest inverted repeat was 331 bp long (Table S2). As shown in Figure 3, the 30–39 bp repeats were most abundant for both repeat types.
Figure 3

The repeats in the D. oleifera mt genome. A: The synteny between the mt genome and its copy showing the direct repeats. B: The length distribution of reverse and inverted repeats in the D. oleifera mt genome. The number on the histograms represents the repeat number of designated lengths shown on the horizontal axi.

The repeats in the D. oleifera mt genome. A: The synteny between the mt genome and its copy showing the direct repeats. B: The length distribution of reverse and inverted repeats in the D. oleifera mt genome. The number on the histograms represents the repeat number of designated lengths shown on the horizontal axi. The repetitive sequence in the D. oleifera mt genome was 31 kb, accounting for 6.33% of the total mitochondria. This is considered a medium proportion of repeats, higher than that in Boea hygrometrica, (1.5%) and V. macrocarpon (3%) and lower than that in N. tabacum (13%) [68] and D. carota (16%) [66]. The different proportions of repeats may be because the mitochondria of B. hygrometrica, V. macrocarpon, and D. oleifera are mainly short repeating units, whereas those of tobacco and carrots are mainly longer repeating units [66].

The prediction of RNA editing

The number of RNA-editing sites varies in different species and is usually frequent in angiosperm and gymnosperm mitochondria. We predicted 515 RNA-editing sites within all the 38 protein-coding genes (Table 5) in the D. oleifera mt genome, which is more than those in A. thaliana (441) [5], Suaeda glauca (261) [73], Eucalyptus grandis (470) [74], and Citrullus lanatus (463) [75] and less than those in gymnosperms with larger mt genomes, such as Taxus cuspidata (974), Pinus taeda (1179), Cycas revoluta (1206), and Ginkgo biloba (1306) [32]. However, whether the number of RNA-editing sites is positively correlated with the size of the mt genome requires further research.
Table 5

Prediction of RNA editing sites.

TypeEffectNumberPercentage (%)
HydrophilicCGT (R) => TGT (C)2813.40
CGC (R) => TGC (C)13
CAT (H) => TAT (Y)20
CAC (H) => TAC (Y)8
HydrophobicGCT (A) => GTT (V)330.29
GCG (A) => GTG (V)7
GCC (A) => GTC (V)2
CTT (L) => TTT (F)13
CTC (L) => TTC (F)5
CCT (P) => CTT (L)19
CCG (P) => CTG (L)35
CCC (P) => TTC (F)6
CCC (P) => CTC (L)7
CCA (P) => CTA (L)45
CCT (P) => TTT (F)14
Hydrophilic-hydrophobicTCT (S) => TTT (F)4447.57
TCG (S) => TTG (L)49
TCC (S) => TTC (F)29
TCA (S) => TTA (L)78
CGG (R) => TGG (W)30
ACT (T) => ATT (I)4
ACG (T) => ATG (M)6
ACA (T) => ATA (I)5
Hydrophilic-stopCGA (R) => TGA (X)30.77
CAA (Q) => TAA (X)1
Hydrophobic-hydrophilicCCT (P) => TCT (S)217.77
CCC (P) => TCC (S)9
CCA (P) => TCA (S)6
CCG (P) => TCG (S)4
Prediction of RNA editing sites. The selection of mitochondrial RNA-editing sites in D. oleifera shows a high degree of compositional bias. All RNA-editing sites are the C-T editing type, which is consistent with the fact that C-T is the most common editing type found in plant mt genomes [76, 77, 78]. In previous studies, almost half of the mitochondrial RNA editing occurred at the second codon position [73, 77]. The proportion of RNA-editing sites at the second codon position in the D. oleifera mt genome is also about 45.72% (235), slightly less than that at the first codon position (259; 50.39%). However, no editing site was found at the third position of triplet codons, consistent with the fact that RNA-editing sites are rare in plant mt genomes [73, 78]. Due to mitochondrial RNA editing, the D. oleifera mt genome has more RNA-editing sites but fewer editing types (Table 5). There were only 29 codon transfer types, corresponding to 14 amino acid transfer types, among the 515 RNA-editing sites. The types of transfer are comparable to those of most gymnosperms (30–40 codons; around 20 amino acids) [32, 76] but less than those of monocotyledonous and dicotyledonous plants (50–60 codons; around 30 amino acids) [74, 75, 78]. Among the 29 codon transfer types, TCA => TTA was the most common, with 78 sites. A leucine tendency after RNA editing, supported by the fact that 45.24% (233 sites) of the edits are converted to leucine, was found in the amino acids of predicted editing codons. After RNA editing, 43.59% of the amino acids remained hydrophobic. However, 47.57% of the amino acids were predicted to change from hydrophilic to hydrophobic, while 7.77% were predicted to change from hydrophobic to hydrophilic. The number and type of RNA-editing sites differed among the mt genomes of D. oleifera and other species. Like with most angiosperms [73, 76], ribosomal proteins (except rps4) and ATPase subunits (except atp6) had a relatively small number of RNA-editing-derived substitutions (1–12 sites), while the transcripts of NADH dehydrogenase subunits and cytochrome c biogenesis genes were significantly edited (11–36 sites; Figure 4), and ccmFn and ccmB had the most RNA-editing sites predicted (36, 35).
Figure 4

The distribution of RNA-editing sites in the D. oleifera mt protein-coding genes. The blue bars represent the number of RNA-editing sites of each gene.

The distribution of RNA-editing sites in the D. oleifera mt protein-coding genes. The blue bars represent the number of RNA-editing sites of each gene. In D. oleifera, 10,611 amino acids were encoded. The most frequently used amino acids were Leu (10.25%), Ser (9.23%), and Arg (6.86%), and the least common amino acids were Trp (1.52%) and Met (2.65%) (Figure 4). The relative synonymous codon usage (RSCU) value for D. oleifera for the third codon position is shown in Figure 5. Consistent with most of the currently studied mitochondrial genomes [1, 73, 76], the use of both two- and four-fold degenerate codons was biased toward the use of codons abundant in A or T.
Figure 5

Relative synonymous codon usage in the D. oleifera mt genome.

Relative synonymous codon usage in the D. oleifera mt genome.

Chloroplast-derived mitogenomic sequences

The transfer of DNA sequences among chloroplast and mt genomes has been frequently observed in the mt genomes of plants [79]. In many cases, the chloroplast DNA content in the mt genomes of most plants is 3%–6%, sometimes reaching up to about 10% [80]. The D. oleifera mt genome contained 28 chloroplast insertions, ranging in length from 32 to 5703 bp (Figure 6, Table 6), with a total length of 32.83 kb, accounting for 6.65% of the total length of the genome, which is greater than the mitochondrial genome lengths of Liriodendron tulipifera (3%) [31], and N. tabacum (2.5%) [68]; comparable to those of C. lanatus (6%) [75], E. grandis (6%) [74], and Oryza sativa (6.3%) [81]; and less than those of Vitis vinifera (8.8%) [36] and Cucurbita pepo (11.5%) [75].
Figure 6

DNA and gene transfer between Chloroplast and Mitochondrial genomes in D. oleifera. The track shows complete genomes of cp and mt in green and red respectively.

Table 6

Chloroplast insertions in the mitochondrial genome of D. oleifera.

Chloroplast insertionStartEndLengthChloroplast genes carriedMitochondrial gene
11020391077415703rps12-rrn16-rrn23-trnA-UGC-trnI-GAU-trnV-GACnad5/trnA-TGC/trnI-GAT/trnV-GAC/
21370541427565703rps12-rrn16-rrn23-trnA-UGC-trnI-GAU-trnVGACnad5/trnA-TGC/trnI-GAT/trnV-GAC
31329351369434009rps12-rrn23-rrn4.5-rrn5-trnN-GUU-trnR-ACGtrnN-GTT/trnR-ACG/
41078521118604009rps12rrn23-rrn4.5-rrn5-trnN-GUU-trnR-ACGtrnN-GTT/trnR-ACG
51481831506212439ycf15-ycf2ORF
694174966122439rps12-ycf15-ycf2ORF
755481569731493atpB-atpEORF
866698677741127psbE-psbF-psbJ-psbLORF
92471325651939rpoB-rpoC1ORF
106873769666939petG-petL-trnPUGG-trnWCCAtrnW-CCA
11103614104477888rps12-rrn16rrn18
12140318141181888rps12-rrn16rrn18
13124702125419719rps12-ndhA-ndhHnad5
146513765375245petAnad1
153189232085197trnDGUCnad1/trnD-GTC
164705547223171trnSGGAtrnS-GGA
173667236818147psbCnad1
181096119096psbAORF
199282937092trnSGCUtrnS-GGA
2013314013322082rps12-trnN-GUUnad1/trnN-GTT
2111157511165582rps12-trnN-GUUnad1/trnN-GTT
22548085488679TrnM-CAUnad1/trnM-CAT
2315566115573577TrnI-CAUnad1/ccmC/orf
24890608913477rps12-trnI-CAUnad1/ccmC/orf
25958379589761rps12-ycf2nad1/orf
2614889814895861ycf2nad1/orf
2715566115569232trnICAUORF
28891038913432rps12-trnICAUORF
DNA and gene transfer between Chloroplast and Mitochondrial genomes in D. oleifera. The track shows complete genomes of cp and mt in green and red respectively. Chloroplast insertions in the mitochondrial genome of D. oleifera. Among the transfer DNA sequences, some chloroplast protein-coding genes, such as atpB, atpE, rps12, rpoB, petA, psaA, and psbC, lost their integrity while migrating from the cp to the mitochondria, and only partial sequences of those cp-derived PCGs could be found in the D. oleifera mt genome (Table 6). In the D. oleifera mt genome, 11 chloroplast-derived tRNAs with a complete sequence were identified: trnA-UGC, trnD-GUC, trnI-GAU, trnM-CAU, trnN-GUU, trnP-UGG, trnR-ACG, trnS-GCU, trnS-GGA, trnV-GAC, and trnW-CCA. The different completeness levels of the transferred PCGs and tRNA genes showed that tRNA genes are much more conserved in the mt genome than PCGs, indicating that tRNA genes play an indispensable role in mitochondria. The transfer of these tRNAs can be traced back to the retention of an earlier horizontal gene transfer event. In accordance with the present results, cp-derived trnM-CAU first appeared in gymnosperms [82]; cp-derived trnD-GUC mainly appeared in dicotyledons, not in monocotyledons [76]; and cp-derived trnM-CAU and trnD-GUC were both found in the D. oleifera mt genome. However, the absence of cp-derived trnH-GTG, which is commonly found in angiosperms [3, 74, 76, 82], and the presence of cp-derived trnA-UGC, lost during early evolution of terrestrial plants [80, 83], indicate that special evolutionary events may be occurring during D. oleifera formation.

Phylogenetic, Ka/Ks and gene arrangement analysis

To detect the evolutionary status of the D. oleifera mt genome, a phylogenetic analysis was performed on D. oleifera, together with 27 other species: 23 eudicots (19 asterids and 4 rosids), 2 monocotyledons, and 2 gymnosperms (designated as outgroups). Phylogenetic relationships were analyzed using the concatenated dataset (28 PCGs: atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFC, ccmFN, cob, cox1, cox2, cox3, matR, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, rpl5, rps12, rps13, rps3 and rps4) through ML phylogenetic analysis. The abbreviations and accession numbers of the mt genomes investigated in this study are listed in Table S1. As outgroups, the two gymnosperms were distinct from the other angiosperms. The phylogenetic tree (Figure 7) strongly supported the separation of asterids from rosids and the separation of eudicots from monocots. Moreover, the taxa from 20 families (Apiaceae, Aquifoliaceae, Asteraceae, Brassicaceae, Campanulaceae, Convolvulaceae, Ericaceae, Ebenaceae, Fabaceae, Ginkgoaceae, Lamiaceae, Lentibulariaceae, Oleaceae, Phrymaceae, Pinaceae, Poaceae, Rosaceae, Rubiaceae, Solanaceae, and Vitaceae) were well clustered. In addition, the monophyly of D. oleifera, which belongs to the single genus of Diospyros in the Ebenaceae family, was well supported based on mt genomes (Figure 7). Consistent with previous comparative genome studies [23, 28, 29], this study also found that the clade united V. macrocarpon and R. simsii and then formed a sister cluster with the Ebenaceae family with high confidence (bootstrap value of 100%). In general, the phylogenetic tree topology was in line with the evolutionary relationships among those species, indicating the consistency of traditional taxonomy with the molecular classification.
Figure 7

The phylogenetic relationships of D. oleifera with other 27 plant species using the maximum likelihood (ML) analysis. The bootstrapping values are listed in each node. The number after the species name is the GenBank accession number. Colors indicate the groups that the specific species belongs.

The phylogenetic relationships of D. oleifera with other 27 plant species using the maximum likelihood (ML) analysis. The bootstrapping values are listed in each node. The number after the species name is the GenBank accession number. Colors indicate the groups that the specific species belongs. To evaluate selective pressures during the evolutionary dynamics of protein-coding genes among closely related species, the nonsynonymous (Ka) and synonymous (Ks) substitution ratio (Ka/Ks) was calculated. For the Ka/Ks calculation, 28 PCGs from the D. oleifera mt genome were compared with the mt genomes of 27 species. As shown in Figure 8, for the gene-specific substitution rates, Ka/Ks ranged from 0.031 at the cox1 gene in V. macrocarpon to 4.321 at the atp4 gene in D. carota. In 58 cases (except Glycine max, O. sativa, Platycodon grandiflorus, Scyphiphora hydrophyllacea and Z. mays), the Ka/Ks values of D. oleifera gene-specific substitution rates were higher than 1, compared with 22 other species, suggesting positive selection during evolution. Among the 22 species, nine substitution genes with higher Ka/Ks values were found between the D. oleifera and V. vinifera mt genomes and six genes between the D. oleifera and V. macrocarpon mt genomes. The atp4 and atp8 genes exhibited the highest average rate (1.348 and 0.751) and 15 and 5 Ka/Ks values above 1, respectively, suggested to be the result of positive or relaxed selection [2]. However, most genes had undergone negative selection pressures during evolution, supported by the fact that the Ka/Ks values of 654 proteins, accounting for 91.59% of the proteins in D. oleifera, were less than 1 compared to the other plant species. The atp1 and cox1 genes have the smallest average Ka/Ks values (0.212 and 0.272), indicating strong purifying selection [34, 84]. These results show that mt genes are highly conserved during the evolutionary process in green plants.
Figure 8

The Ka/Ks values of 28 protein-coding genes of D. oleifera versus 27 species.

The Ka/Ks values of 28 protein-coding genes of D. oleifera versus 27 species. Because of no mt genome of any species in Ebenaceae has been reported, Synteny of entire mitochondrial genomes was only compared among four Asterids (including three Ericales, one Apiales, and one Solanales) and one Rosids species in this study to assess the degree of structural rearrangement between different lineages. Figure 9 and Figs.1 showed that the rearrangement of mitochondrial genes has been widely occur among these six species, which is accords with many mitogenome observations [20, 37, 38, 39, 41, 42]. When using D. oleifera as a reference genome, The dot-plot analyses showed sequences or synteny were seldom shared, and only short stretches of synteny among species (Figs.2). These Large rearrangement events have indicated differentiation within these six species mitogenome. Understandably, species that have close evolutionary relationships share more clusters [20, 41, 42], for example, In general, longer synteny sequences with higher similarity were found between D. oleifera and V. macrocarpon than that between D. oleifera and M. domestica.
Figure 9

Synteny analysis of D. oleifera and other five species mitogenomes as generated with Mauve. The sizes and relative positions of the homologous fragments varied across mitogenomes.

Synteny analysis of D. oleifera and other five species mitogenomes as generated with Mauve. The sizes and relative positions of the homologous fragments varied across mitogenomes. Among genus Diospyros, The most well-known species is D. kaki, which has been cultivated as an important fruit crop, due to its edible fruit [25]. However, D. kaki are hexaploid (2n = 6× = 90) or nonaploid (2n = 9× = 135) and their origin, and polyploidization mechanisms are unclear, which has hampered genome sequencing and molecular breeding [23, 28]. Phylogenetic analyses based on the nuclear [23, 28] and chloroplast [26, 27] genome and mtDNA non-coding fragments [85] have indicated that D. oleifera is more closely related to D. kaki [24] and could be used as a model plant for studies of Diospyros [24, 26]. So, as the nuclear and mt genome of hexaploid cultivated persimmon both remains unpublished, the availability of the D. oleifera mt genome provides more alternative comparable reference for D. kaki than D. lotus does. In addition, our results will lay the foundation for identifying further evolutionary relationships within Ebenaceae. However, due to the lack of adequate representative mitogenomes, more Ebenaceae mitogenomes are needed to be sequenced to better resolve the phylogeny and evolutionary biology of Ebenaceae.

Conclusions

Here, we presented the first mitochondrial genome assembly and annotation of an Ebenaceae model plant Diospyros oleifera as well as the mitochondrial genome in the family Ebenaceae. The mitogenome was 493,958 bp in length, contained 39 protein-coding genes, 27 transfer RNA genes, and 3 ribosomal RNA genes. Comparative analysis of gene structure, codon usage, repeat regions and RNA-editing sites shows that rps2 and rps11 genes are missing, and a clear bias of RNA-editing sites is existing in the D. oleifera mt genome. In addition, the phenomenon that intracellular tRNA genes transferred frequently from chloroplasts to mitochondria was also observed in D. oleifera. Moreover, Phylogenetic analysis based on the mt genomes of D. oleifera and 27 other taxa indicates consistency in molecular and taxonomic classification. Furthermore, The Ka/Ks analysis based on code substitution revealed that most of the coding genes had undergone negative selections, indicating the conservation of mt genes during the evolution. These results will help in better understanding the features of the D. oleifera mitochondrial genome and lay the foundation for identifying further evolutionary relationships within Ebenaceae. However, due to the lack of adequate representative mitogenomes, more Ebenaceae mitogenomes are needed to be sequenced to better resolve the phylogeny and evolutionary biology of Ebenaceae.

Declarations

Author contribution statement

Yang Xu: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. Yi Dong: Performed the experiments. Wenqiang Cheng; Haidong Gao, Lei Liu and Lei Xu: Analyzed and interpreted the data. Kaiyun Wu: Contributed reagents, materials, analysis tools or data. Bangchu Gong: Conceived and designed the experiments; Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

Ph.D. Yang Xu was supported by National Key R & D Program of China [2018YFD1000606]. Ph.D. Yang Xu was supported by National Key R & D Program of China [2019YFD1000600]. Bangchu Gong was supported by Key Project for New Agricultural Cultivar Breeding in Zhejiang Province, China [2021C02066-10].

Data availability statement

Data associated with this study [The final annotated mt genome sequences of D. oleifera] has been deposited at NCBI GenBank under the accession number MW970112.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
  76 in total

1.  Generic delimitation and relationships in Ebenaceae sensu lato: evidence from six plastid DNA regions.

Authors:  Sutee Duangjai; Bruno Wallnöfer; Rosabelle Samuel; Jérôme Munzinger; Mark W Chase
Journal:  Am J Bot       Date:  2006-12       Impact factor: 3.844

2.  NextPolish: a fast and efficient genome polishing tool for long-read assembly.

Authors:  Jiang Hu; Junpeng Fan; Zongyi Sun; Shanlin Liu
Journal:  Bioinformatics       Date:  2020-04-01       Impact factor: 6.937

3.  The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants.

Authors:  Y Sugiyama; Y Watase; M Nagase; N Makita; S Yagura; A Hirai; M Sugiura
Journal:  Mol Genet Genomics       Date:  2004-12-03       Impact factor: 3.291

4.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

5.  The American cranberry mitochondrial genome reveals the presence of selenocysteine (tRNA-Sec and SECIS) insertion machinery in land plants.

Authors:  Diego Fajardo; Brandon Schlautman; Shawn Steffan; James Polashock; Nicholi Vorsa; Juan Zalapa
Journal:  Gene       Date:  2013-12-14       Impact factor: 3.688

6.  PREP-Mt: predictive RNA editor for plant mitochondrial genes.

Authors:  Jeffrey P Mower
Journal:  BMC Bioinformatics       Date:  2005-04-12       Impact factor: 3.169

7.  Horizontal transfer of DNA from the mitochondrial to the plastid genome and its subsequent evolution in milkweeds (apocynaceae).

Authors:  Shannon C K Straub; Richard C Cronn; Christopher Edwards; Mark Fishbein; Aaron Liston
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

8.  GeSeq - versatile and accurate annotation of organelle genomes.

Authors:  Michael Tillich; Pascal Lehwark; Tommaso Pellizzer; Elena S Ulbricht-Jones; Axel Fischer; Ralph Bock; Stephan Greiner
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

9.  Mitochondrial Fostering: The Mitochondrial Genome May Play a Role in Plant Orphan Gene Evolution.

Authors:  Seth O'Conner; Ling Li
Journal:  Front Plant Sci       Date:  2020-12-03       Impact factor: 5.753

10.  Identification, characterization and functional analysis of grape (Vitis vinifera L.) mitochondrial transcription termination factor (mTERF) genes in responding to biotic stress and exogenous phytohormone.

Authors:  Xiangjing Yin; Yu Gao; Shiren Song; Danial Hassani; Jiang Lu
Journal:  BMC Genomics       Date:  2021-02-26       Impact factor: 3.969

View more
  1 in total

1.  Assembly and comparative analysis of the complete mitochondrial genome of Bupleurum chinense DC.

Authors:  Yonggang Qiao; Xinrui Zhang; Zheng Li; Yun Song; Zhe Sun
Journal:  BMC Genomics       Date:  2022-09-21       Impact factor: 4.547

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.