Literature DB >> 36131243

Assembly and comparative analysis of the complete mitochondrial genome of Bupleurum chinense DC.

Yonggang Qiao1, Xinrui Zhang2, Zheng Li2, Yun Song2, Zhe Sun2.   

Abstract

BACKGROUND: Bupleurum chinense(B. chinense) is a plant that is widely distributed globally and has strong pharmacological effects. Though the chloroplast(cp) genome of B. chinense has been studied, no reports regarding the mitochondrial(mt) genome of B. chinense have been published yet.
RESULTS: The mt genome of B.chinense was assembled and functionally annotated. The circular mt genome of B. chinense was 435,023 bp in length, and 78 genes, including 39 protein-coding genes, 35 tRNA genes, and 4 rRNA genes, were annotated. Repeat sequences were analyzed and sites at which RNA editing would occur were predicted. Gene migration was observed to occur between the mt and cp genomes of B. chinense via the detection of homologous gene fragments. In addition, the sizes of plant mt genomes and their GC content were analyzed and compared. The sizes of mt genomes of plants varied greatly, but their GC content was conserved to a greater extent during evolution. Ka/Ks analysis was based on code substitutions, and the results showed that most of the coding genes were negatively selected. This indicates that mt genes were conserved during evolution.
CONCLUSION: In this study, we assembled and annotated the mt genome of the medicinal plant B. chinense. Our findings provide extensive information regarding the mt genome of B. chinense, and help lay the foundation for future studies on the genetic variations, phylogeny, and breeding of B. chinense via an analysis of the mt genome.
© 2022. The Author(s).

Entities:  

Keywords:  Bupleurum chinense; Mitochondrial genome; Phylogenetic analysis; Repeat sequence

Mesh:

Substances:

Year:  2022        PMID: 36131243      PMCID: PMC9490909          DOI: 10.1186/s12864-022-08892-z

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   4.547


Background

Bupleurum chinense DC. is a perennial herb belonging to the Umbelliferae family [1]. Approximately 200 species of Bupleurum are distributed worldwide. Bupleurum L. has been used as a medicinal material in China for many years. The Chinese Pharmacopoeia states that the B. chinense and B. scorzonerifolium species are mainly used as drugs [2]. B. chinense is mainly grown in North and Northwest China. It is also distributed in relatively smaller amounts in other regions. The main components of B. chinense are saikosaponins, sterols, volatile oils, fatty acids, and polysaccharides [3]. These components have anti-pyretic, anti-inflammatory, and immune functions, and pharmacological effects that prevent liver injury [4-6]. In recent years, the pharmacological functions of B. chinense have been explored continuously, because of which it is currently considered as a natural resource with important economic and medicinal value [7, 8]. Mitochondria are important organelles in plant cells that participate in many metabolic processes related to the synthesis and degradation of intracellular compounds and energy production [9, 10]. Endosymbiosis origin theory states that mitochondria originate from endosymbiotic bacteria that can carry out aerobic respiration and are phagocytosed by primitive eukaryotic cells. After the bacteria are swallowed by primitive eukaryotic cells, they gradually evolve into organelles with specific functions in the long-term mutually beneficial process of symbiosis [11-14]. Mitochondria are semi-autonomous, possess relatively independent genetic material, and have an independent and self-sufficient system for protein synthesis [15, 16]. As an energy factory in the cell, mitochondria provide energy for intracellular biosynthesis and degradation. They represent the main site at which cells carry out aerobic respiration and participate in various life activities, such as intracellular differentiation, apoptosis, growth, and division [17-19]. Therefore, the mt genome is a valuable source of genetic information for the study of plant phylogeny and necessary cellular processes, and is of great significance in the study of species evolution, species identification, and genetic transformation [20, 21]. There are significant differences in the length, gene sequence, and gene content of plant mt genomes. This phenomenon is observed not only in different species, but also in the same species [22-24]. Some researchers found Double-strand break repair processes drive evolution of the mt genome in Arabidopsis, gene conversion and mismatch repair activity observed in the mt genome of the Arabidopsis mutants [22]. With the continuous development of sequencing technology, the mt genomes of a variety of plants have been published [25-27]; however, no report on the B. chinense mt genome has been published yet. In this study, we sequenced and annotated the B. chinense mt genome and conducted a thorough analysis with regard to genomic characteristics, repetitive sequences, RNA editing, codon preference, and comparative genomics. We performed system evolution analysis to understand the genetic variations in B. chinense more effectively, along with reports regarding breeding and plant research on B. chinense, as this would provide a theoretical foundation for conducting further research.

Results

Genomic features of the B. chinense mt genome

The B. chinense mt genome is a circular sequence with a length of 435,023 bp. The genome is composed of the A (27.19%), G (22.49%), T (27.77%), and C (22.55%) bases. The GC content is 45.04%. The functional classification and physical location of the annotated genes are shown in Fig. 1. 78 genes, including 39 protein-coding genes, 35 tRNA genes, and 4 rRNA genes, were annotated in the mt genome. We identified 2283 open reading frames (ORFs).
Fig. 1

The circular map of the B. chinense mt genome

The circular map of the B. chinense mt genome The results of this process are shown in Table 1. The B. chinense mt genome encodes 35 different proteins, which can be divided into 9 categories, and it contains two copies of nad7, cob, rps4, and mttB. The encoded proteins can be classified as NADH dehydrogenases (9 genes), ATP synthases (6 genes), cytochrome C biogenesis accessory proteins (4 genes), cytochrome C oxidases (3 genes), maturases (1 gene), ubiquinol cytochrome c reductases (1 gene), ribosomal proteins (SSU) (7 genes), ribosomal proteins (LSU) (3 genes), and transport membrane proteins (1 gene). The start codon of all protein-coding genes was ATG, and the use rates of the TAA, TGA, and TAG stop codons were different. The use rates of TAA, TGA, and TAG were 51.3, 25.6, and 23.1%, respectively. The use rate of the TAA stop codon was the highest.
Table 1

Functional classification of genes and physical location of the B. chinense mt genome

Group of genesGene nameLength(bp)Start codonStop codonAmino acid
NADH dehydrogenasenad1978ATGTAA326
nad21467ATGTAA489
nad3357ATGTAA119
nad41488ATGTGA496
nad4L303ATGTAA101
nad52013ATGTAA671
nad6618ATGTAA206
nad7(2)1185ATGTAG395
nad9573ATGTAA191
ATP synthaseatp11527ATGTGA509
atp4579ATGTAA193
atp6795ATGTGA265
atp8690ATGTAG230
atp8822ATGTAG274
atp9225ATGCAA(TAA)75
Cytochrome c biogenesisccmB621ATGTGA207
ccmC753ATGTGA251
ccmFc1317ATGCGA(TGA)439
ccmFn1740ATGTGA580
Cytochrome c oxidasecox11497ATGTAA499
cox2795ATGTAA265
cox3798ATGTGA266
MaturasesmatR1968ATGTAG656
Ubiquinol cytochrome c reductasecob(2)1695ATGTAG565
Ribosomal proteins (LSU)rpl10462ATGTGA154
rpl16516ATGTAA172
rpl5555ATGTAA185
Ribosomal proteins (SSU)rps1606ATGTAA202
rps10330ATGTAA110
rps12378ATGTGA126
rps13351ATGTAA117
rps14303ATGTAA101
rps4(2)918ATGTAA306
rps7447ATGTAA149
Transport membrane proteinmttB(2)375ATGTAG125
Ribosomal RNAsrrn181764
rrn26(2)3252
rrn5117
Transfer RNAstrnC-GCA71
trnD-GUC74
trnE-UUC72
trnF-AAA(2)(85,85)
trnF-GAA(2)(74,74)
trnG-GCC(2)(72,72)
trnH-GUG74
trnI-UAU76
trnK-CUU(2)(73,73)
trnK-UUU73
trnL-UAA70
trnM-CAU(6)(73,73,77,73,73,73)
trnN-GUU72
trnP-CGG88
trnP-UGG(2)(75,75)
trnQ-UUG(2)(72,72)
trnS-GCU(2)(88,88)
trnS-UGA87
trnV-CAC(3)(72,72,72)
trnW-CCA74
trnY-GUA83

Numbers after gene names are the number of copies

Functional classification of genes and physical location of the B. chinense mt genome Numbers after gene names are the number of copies Studies have shown that the mt genome of most terrestrial plants contains 3 rRNA genes [11, 13]. Here, 3 rRNA genes from the B. chinense mt genome, namely rrn18 (1764 bp), rrn26 (3252 bp), and rrn5 (117 bp), were annotated. In addition, 21 different tRNAs, which were involved in the transportation of a total of 17 amino acids, were identified in the B. chinense mt genome. This could be explained by the fact that two or more tRNAs might transport the same amino acid to different codons. For example, trnF-AAA and trnF-GAA are associated with the synonymous codons UUU and UUC, which are involved in the transportation of phenylalanine.

Repeat sequence analysis

Interspersed repeat sequences are repetitive sequences that are scattered in the genome. In the B. chinense mt genome, we identified a total of 844 interspersed repeats with a length greater than or equal to 30 bp; of these, 425 were forward repeats and 419 were palindrome repeats. The length of the longest forward repeat sequence was 12,012 bp and that of the longest palindrome repeat sequence was 16,761 bp. The distribution of the lengths of forward repeats and palindrome repeats is shown in the Fig. 2; the abundance of both types of repeats was the highest when repeats were in the range of 30-39 bp.
Fig. 2

The interspersed repeat sequences in the B. chinense mt genome. A The four interspersed repeat types are distributed throughout the genome; the two black lines represent the mt genome, and the same repeats are associated with the line segments. B Distribution of lengths of interspersed repeats in the mt genome. The abscissa indicates the type of interspersed repeats, and the ordinate indicates the number of scattered repeats

The interspersed repeat sequences in the B. chinense mt genome. A The four interspersed repeat types are distributed throughout the genome; the two black lines represent the mt genome, and the same repeats are associated with the line segments. B Distribution of lengths of interspersed repeats in the mt genome. The abscissa indicates the type of interspersed repeats, and the ordinate indicates the number of scattered repeats Microsatellites, also known as simple sequence repeats (SSRs), are DNA fragments with a length of 1-6 bp [28]. Microsatellites are widely used in species research due to their advantages, which include their polymorphism, codominant inheritance, relative abundance, and wide genome coverage [29, 30]. As shown in Table 2, in the B. chinense mt genome, the detected SSR sites included monomer, dimer, trimer, tetramer, pentamer, and hexamer repeats. Among these, the number of monomer repeats was the largest, and accounted for 47.75% of the total SSRs, followed by dimer repeats, which accounted for 36.75% of the total SSRs; the number of pentamer and hexamer repeats was the least. Monomer repeats composed of A/T bases accounted for 91.6% of monomer SSRs, and dimer repeats composed of AG/CT bases accounted for 59.2% of dimer SSRs.
Table 2

Distribution of SSRs in the B. chinense mt genome

SSR typeRepeatsNumber of SSRsTotal
monomerA/T175191
C/G16
dimerAC/GT11147
AG/CT87
AT/AT42
CG/CG7
trimerAAC/GTT316
AAG/CTT7
AAT/ATT3
ACT/AGT2
AGG/CCT1
tetramerAAAC/GTTT135
AAAG/CTTT16
AAAT/ATTT1
AACC/GGTT2
AAGC/CTTG4
AAGT/ACTT1
AATC/ATTG1
AATG/ATTC3
ACTC/AGTG1
AGCC/CTGG1
AGCT/AGCT2
ATCC/ATGG1
CCCG/CGGG1
pentamerAAAAG/CTTTT17
AAAAT/ATTTT1
AAAGT/ACTTT1
AAGCC/CTTGG1
ACCCG/CGGGT1
ACTAG/AGTCT1
CCCGG/CCGGG1
hexamerAAAGAT/ATCTTT24
AAATAG/ATTTCT1
AGCCCT/AGGGCT1
Distribution of SSRs in the B. chinense mt genome Tandem repeat sequences, also known as satellite DNAs, refer to repetitive sequences formed by the association between short sequences with 1 to 200 bases as repeating units in tandem, and are widely present in eukaryotic genomes and some prokaryotes [31, 32]. Tandem Repeats Finder v4.09 [33] was used to identify tandem repeats in the B. chinense mt genome. As shown in Table 3, a total of 10 tandem repeats ranging in length from 9 to 71 bp that had a match degree greater than 95% were found in the genome.
Table 3

Distribution of tandem repeats in the B. chinense mt genome

No.SizeRepeat SequenceCopyPercent MatchesStartEnd
112GCGCCTGAGCCA5.310014,01514,078
212GCGCTGGCTCAG5.310074,82074,883
312AATAATAGATAT2.1100133,437133,461
430AGTAAGTAAACTACCTCTCCTACCTAGTCG5.5100186,768186,933
530GTAAACTACCTCTCCTACCTAGTCGAGTAA3.498241,705241,805
634CACTTAGCTCTTATTGCGACATTCCTTGATATGG2.1100292,143292,214
729TTGAAGCTTTGCCAGGAAGCTTCTACTTG2.4100325,136325,204
89TCATAAATC3.0100337,032337,058
971

TATATGAAACGAAAACATTCGTTCAAGTTATAG

CTTTCCGGTAAGGAAATGGCATTGCTAGCTTTT

CCATA

2.396347,890348,051
1030AGGTAGTTTACTTACTCGACTAGGTAGGAG3.498371,343371,443
Distribution of tandem repeats in the B. chinense mt genome TATATGAAACGAAAACATTCGTTCAAGTTATAG CTTTCCGGTAAGGAAATGGCATTGCTAGCTTTT CCATA

Prediction of RNA editing sites

In all eukaryotes, the addition, loss, or substitution of bases in the coding region of the transcribed RNA is called RNA editing [34, 35]. In the mt and chloroplast genomes of plants, RNA editing is manifested as the conversion of specific cytosines to uracils, and it changes the genetic information in the genome [36]. In this study, RNA editor (PREP) [37] (http://prep.unl.edu/) was used to predict sites at which RNA editing would occur. A total of 517 RNAs were predicted in 34 protein-coding genes (Table 4) of the B. chinense mt genome (Fig. 3). Among the editing sites, the ribosomal protein (SSU) encoding gene rps14 contained the least predicted RNA editing sites, i.e., 2 sites, and the NADH dehydrogenase encoding gene nad4 contained the most predicted editing sites, i.e., 44 sites. After RNA editing, the hydrophobicity of 42.75% of amino acids remained unchanged, while 8.12% of hydrophobic amino acids became hydrophilic, and 48.16% of hydrophilic amino acids became hydrophobic. All RNA-editing sites in the B. chinense mt genome are the C-T editing type, among these, 30.95% (160) of the editing sites were located on the first base of the triplet codon, and 65.18% of the editing sites were located on the second base of the triplet codon (337). At certain instances, both the first and second bases of the triplet codon were edited. This caused the conversion of proline (CCC) to phenylalanine (TTC, TTT). RNA editing not only leads to changes in the encoded amino acids, but may also lead to the premature termination of the coding process [38]. In the B. chinense mt genome, this phenomenon could be observed in the coding genes atp6, atp9, ccmFc, cob, and rpl16. The predicted results also show that the amino acids generated after codon editing had the highest tendency to convert to leucine after RNA editing; 43.91% (227 positions) of amino acids were converted to leucine, and the second-highest number of amino acids were converted to phenylalanine, and this accounted for 23.40% of all conversions (121 sites).
Table 4

Prediction of RNA editing sites

TypeRNA-editingNumberPercentage
hydrophobicCCA (P) = > CTA (L)4330.56%
CCC (P) = > CTC (L)7
CCC (P) = > TTC (F)6
CCG (P) = > CTG (L)39
CCT (P) = > CTT (L)22
CTC (L) = > TTC (F)6
CTT (L) = > TTT (F)12
GCC (A) = > GTC (V)1
GCG (A) = > GTG (V)6
GCT (A) = > GTT (V)2
CCT (P) = > TTT (F)14
hydrophilicCAC (H) = > TAC (Y)612.19%
CAT (H) = > TAT (Y)13
CGC (R) = > TGC (C)13
CGT (R) = > TGT (C)31
hydrophobic-hydrophilicCCA (P) = > TCA (S)108.12%
CCC (P) = > TCC (S)13
CCT (P) = > TCT (S)16
CCG (P) = > TCG (S)3
hydrophilic-hydrophobicACA (T) = > ATA (I)348.16%
ACC (T) = > ATC (I)2
ACG (T) = > ATG (M)9
ACT (T) = > ATT (I)4
CGG (R) = > TGG (W)32
TCA (S) = > TTA (L)77
TCC (S) = > TTC (F)32
TCG (S) = > TTG (L)39
TCT (S) = > TTT (F)51
hydrophilic-stopCAA (Q) = > TAA (X)30.97%
CAG (Q) = > TAG (X)1
CGA (R) = > TGA (X)1
Fig. 3

Distribution of RNA editing sites among B. chinense mt protein-coding genes

Prediction of RNA editing sites Distribution of RNA editing sites among B. chinense mt protein-coding genes

Analysis of codon composition

We used self-coded Perl script to analyze the codon composition of the B. chinense mt genome. The results are shown in Table 5, the number of codons in all coding genes was 12,704, and the GC1, GC2, and GC3 content and the average GC content of 3 bases (all GC) in the B. chinense mt genome were less than 50%, indicating that the codons of the B. chinense mt genome were biased because of the use of both A and T bases. The effective codon number (Nc) was 55.48, which is indicative of the weak codon preference of the mt genome. The relative usage of synonymous codons (RSCU) in the B. chinense mt genome is shown in Fig. 4. There were 30 codons with RSCU> 1, indicating that the usage frequency of these codons is greater than that of other synonymous codons. Among these, 28 codons ending with the A/T base were identified, and these accounted for 93.33% of the codons, indicating that frequently used codons tend to end with the A/T base.
Table 5

Overall characteristics of codon usage in the B. chinense mt genome

ParameterCodon numberGC1GC2GC3GC_allNc
value12,70448.6143.4537.7743.2855.48
Fig. 4

Relative synonymous codon usage (RSCU) in the B. chinense mt genome. The different amino acids are shown on the x-axis. RSCU values are the number of times a particular codon is observed relative to the number of times that codon would be expected for a uniform synonymous codon usage

Overall characteristics of codon usage in the B. chinense mt genome Relative synonymous codon usage (RSCU) in the B. chinense mt genome. The different amino acids are shown on the x-axis. RSCU values are the number of times a particular codon is observed relative to the number of times that codon would be expected for a uniform synonymous codon usage

Analysis of homologous fragments of mitochondria and chloroplasts

Using BLAST v2.10.1, we screened the fragments of the B. chinense mt genome and cp genome [39] exhibiting > 70% similarity, and performed homologous fragment analysis (Fig. 5). We screened out 25 homologous fragments with a total length of 11,144 bp, which accounted for 2.56% of the mt genome (Table 6). These homologous fragments contained 8 annotated genes, of which 6 were tRNA genes, namely, trnV-GAC, trnW-CCA, trnN-GUU, trnD-GUC, trnM-CAU, and trnI-UAU, and the other two were rRNA gene (rrn18) and the cytochrome c biogenesis gene (ccmC).
Fig. 5

Comparison of a homologous fragment in the B. chinense cp genome to that in the mt genome. The green line segment of the circle represents the B. chinense cp genome, and the red line segment represents the B. chinense mt genome. The green line segment in the circle connects the start and end points of the transferred gene fragments. The width of the green line segment represents the size of the transferred fragment

Table 6

Comparison of a homologous fragment in the B. chinense cp genome to that in the mt genome

identity/%Length(bp)Mismatchesgap opensmt startmt endcp startcp endmt genecp gene
1100219900228,120225,92299,756101,954trnV-GACrrn16/trnV-GAC/ycf15
2100219900225,922228,120139,185141,383trnV-GACrrn16/trnV-GAC/ycf15
386.822903961384,04884,93682,22383,116rpl14/rpl16
477.92148123544159,455160,88249,99651,437ndhC/ndhJ/ndhK
581.3436706536163,167162,52367,47468,108trnW-CCApetG/trnP-UGG/trnW-CCA
610024000279,182278,943102,429102,668rrn16
710024000278,943279,182138,471138,710rrn16
894.36621312042,34842,56026,18326,395rpoB
994.366213120250,603250,81526,18326,395rpoB
1074.0998881773864,66163,803138,729139,592rrn18rrn16
1174.0998881773863,80364,661101,547102,410rrn18rrn16
1295.61411450311,442311,329131,516131,629trnN-GUUtrnN-GUU
1395.61411450311,329311,442109,510109,623trnN-GUUtrnN-GUU
1485.632174223130,537130,36431,63431,804trnD-GUCtrnD-GUC
151007800325,406325,329100177trnH-GUGtrnH-GUG
1692.3087860183,473183,39653,65953,736trnM-CAUtrnM-CAU
1792.3087860211,255211,33253,65953,736trnM-CAUtrnM-CAU
1888.3127772131,510131,43487,59287,666ccmC/trnI-UAUtrnI-CAU
1988.3127772131,434131,510153,473153,547ccmC/trnI-UAUtrnI-CAU
2097.3683810181,820181,783102,705102,742rrn16
2197.3683810181,783181,820138,397138,434rrn16
2297.3683810212,945212,908138,397138,434rrn16
2397.3683810212,908212,945102,705102,742rrn16
241003400311,355311,322109,513109,546trnN-GUU
251003400311,322311,355131,593131,626trnN-GUU

The dashes means empty

Comparison of a homologous fragment in the B. chinense cp genome to that in the mt genome. The green line segment of the circle represents the B. chinense cp genome, and the red line segment represents the B. chinense mt genome. The green line segment in the circle connects the start and end points of the transferred gene fragments. The width of the green line segment represents the size of the transferred fragment Comparison of a homologous fragment in the B. chinense cp genome to that in the mt genome The dashes means empty

Substitution rates of protein-coding genes

It is important to determine the number of non-synonymous substitutions (Ka) and synonymous substitutions (Ks) as it is of great significance for the phylogenetic reconstruction of related species and for understanding the evolutionary dynamics of protein-coding sequences [40, 41]. The Ka/Ks value can be used to determine whether a specific protein-coding gene was under selection pressure during evolution. In the case of a neutral selection, Ks = Ka or Ka/Ks = 1. If the Ka value is higher than the Ks value, it is indicative of positive selection (Ka/Ks > 1), while if Ks > Ka or Ka/Ks < 1, it is indicative of negative selection [42]. The 18 protein-coding genes from the B. chinense mt genome were compared with the mt genomes of Daucus carota (NC017855) [43] and B. falcalum (NC035962) and analyzed using Ka/Ks values. As shown in the Fig. 6, upon comparing the mt genome of B. chinense with that of B. falcalum, the Ka/Ks value of protein-coding genes such as ccmB and nad4 was found to be > 1, indicating that positive selection had occurred during evolution. In comparison to D. carota, the Ka/Ks values of rps1 and rps14 were > 1, indicating that both the coding genes in the B. chinense mt genome had been positively selected. The Ka/Ks value was < 1 for most protein-coding genes, which indicates that these genes had undergone negative selection during evolution.
Fig. 6

Ka/Ks values of 18 protein-coding genes of B. chinense versus those of ten species

Ka/Ks values of 18 protein-coding genes of B. chinense versus those of ten species

Comparison of the mt genome size and GC content between B. chinense and other species

The main characteristics of plant organelle genomes are their genome size and GC content. Seventeen plant mt genomes were selected and their sizes and GC contents were compared with those of the B. chinense mt genome. These 17 plant species included 2 species of Cruciferae, 3 species of Solanaceae, 7 species of Leguminosae, 1 specie of Umbelliferae, 1 specie of Labiatae, and 3 species of Gramineae. The species names and accession numbers are shown in Table 7. As shown in Fig. 7, plant mt genome sizes varied greatly, and the size of the selected plant mt genomes ranged from 219.766 Kb (Brassica juncea) to 566.589 Kb (Senna tora). The difference in the GC content of mt genomes was relatively small, with both being approximately 45%, which indicates that although the size of plant mt genomes differs greatly, their GC content is relatively conserved during the evolutionary process.
Table 7

NCBI accession numbers of mt genomes used in this study

SpeciesFamilyCategoryAccession number
Bupleurum chinenseUmbelliferaeBupleurum L.OK_166971
Brassica junceaCruciferaeBrassica L.NC_016123
Brassica napusCruciferaeBrassica L.NC_008285
Raphanus sativusCruciferaeRaphanus L.NC_018551.
Hyoscyamus nigerSolanaceaeHyoscyamus L.NC_026515
Nicotiana tabacumSolanaceaeNicotiana L.NC_006581
Solanum melongenaSolanaceaeSolanum L.NC_050334
Glycine sojaLeguminosaeGlycine Willd.NC_039768
Glycyrrhiza uralensisLeguminosaeGlycyrrhiza L.NC_053919
Senna occidentalisLeguminosaeCassia L.NC_038221
Senna toraLeguminosaeCassia L.NC_038053
Sophora flavescensLeguminosaeSophora L.NC_043897
Vigna angularisLeguminosaeVigna SaviNC_021092
Vigna radiataLeguminosaeVigna SaviNC_015121
Daucus carotaUmbelliferaeDaucus L.NC_017855
Salvia miltiorrhizaLabiataeSalvia L.NC_023209
Oryza rufipogonGramineaeOryza L.NC_013816
Sorghum bicolorGramineaeSorghum MoenchNC_008360
Triticum aestivumGramineaeTriticum L.NC_036024
Ginkgo bilobaGinkgoaceaeGinkgo L.NC_027976
Fig. 7

Sizes and GC contents of 18 plant mt genomes

NCBI accession numbers of mt genomes used in this study Sizes and GC contents of 18 plant mt genomes

Phylogenetic analysis

In order to understand the process of evolution of the B. chinense mt genome, this article conducted a phylogenetic analysis of the B. chinense mt genome and the published mt genomes of 19 plants. Phylogenetic trees were constructed based on maximum likelihood and Bayesian analysis, respectively. The names of the selected species and their NCBI accession numbers are shown in Table 7. The phylogenetic analysis selects Ginkgo biloba as an outgroup, the classification results of the phylogenetic tree constructed based on the two analysis methods are consistent. The results showed that Cruciferae, Solanaceae, Leguminosae, Labiatae, Gramineae, and Umbelliferae were well-clustered (Fig. 8). The clustering in the phylogenetic tree is consistent with the relationships of these species at the family and genus levels, indicating that mt genome-based clustering results are reliable. Based on the phylogenetic tree, results were obtained. 20 species of plants were found to be clustered into 3 major groups. Brassica napus, B. juncea, and Raphanus sativus, which belonged to the Cruciferae family, Vigna angularis, V. radiata, Glycine soja, Glycyrrhiza uralensis, Sophora flavescens, S. tora and S. occidentalis, which belonged to the Leguminosae family, were grouped together. B. chinense and the Umbelliferae plant D. carota were clustered into a small group, and the relationship between them was the closest, then clustered with Hyoscyamus niger, Nicotiana tabacum, Solanum melongena and Salvia miltiorrhiza into the second group; Oryza rufipogon, Sorghum bicolor, and Triticum aestivum, which belonged to the Gramineae family, were clustered into the third category.
Fig. 8

The phylogenetic relationships of B.chinense with other 19 plant species. Ginkgo biloba served as outgroup. A Phylogenetic trees constructed using the maximum-likelihood method. B Phylogenetic tree constructed using a Bayesian method

The phylogenetic relationships of B.chinense with other 19 plant species. Ginkgo biloba served as outgroup. A Phylogenetic trees constructed using the maximum-likelihood method. B Phylogenetic tree constructed using a Bayesian method

Discussion

Mitochondria provide plant cells with the energy needed for life processes. Plant mitochondria have a relatively complex genome [44] that exhibits abundant sequence-related changes. They have multiple types of repetitive sequences and relatively conserved coding sequences [45, 46]. The rapid development of genome sequencing technology has accelerated the study of the mitochondrial genome. Our study describes the basic characteristics of the B. chinense mt genome for the first time and our findings provide an important basis for understanding the function, inheritance, and evolutionary trajectory of the mt genome. The B. chinense mt genome is a circular sequence with a length of 435,023 bp and 45.04% GC content. We performed BLAST analysis and annotated sequences using software, and found that there were 39 protein-coding genes, 35 tRNA genes, 4 rRNA genes, and 2283 ORFs in the mt genome. GC content is a significant factor for assessing species. The GC content of the B. chinense mt genome is 45.68%, which is comparable to other sequenced plant mt genomes (D. carota, 45.42% [43]; B. juncea, 45.24% [15]; S. flavescens, 44.86% [47]), but higher than the B.chinense cp genome(37.68%) [39]. Since sequence repetitions can cause intermolecular recombination in mitochondria, it is particularly important to perform repetitive sequence analysis [48]. Repetitive sequences in the B. chinense mt genome, including simple repetitive sequences, scattered repetitive sequences, and tandem repetitive sequences, were analyzed. The results showed that the B. chinense mt genome contained abundant repetitive sequences, and 400 SSR loci were detected; among these, the number of single nucleotide repetitions was the largest. The identified SSRs were mainly composed of the A and T bases. Since the A and T bases were connected via two hydrogen bonds, the energy required to break the bonds is much less than that for the GC bonds and will change more easily. Kuang [49] and Qian [50] have shown that due to the bonds between A and T, SSRs containing AT repeat motifs are more likely to appear in the cp genome as well as in the mt genome. In addition, 844 scattered repeats with a length greater than or equal to 30 bp were identified, and 10 tandem repeats with a greater than 95% match were found. RNA editing occurs during a post-transcriptional process in the cp and mt genomes of higher plants, and can alter the genetic information at the mRNA level, which enables more efficient protein folding [51]. In this study, 517 RNA editing sites were identified in 34 coding genes of the B. chinense mt genome, with a total of 31 codon transfer types. Among the codon transfer types, TCA = > TTA was the most common, with 77 editing sites. After RNA editing, 8.12% of hydrophobic amino acids became hydrophilic, and 48.16% of hydrophilic amino acids became hydrophobic. Consistent results exists in the Diospyros oleifera mt genome [52], where the most abundant transfer type in this plant was TCA = > TTA, number 78, which has been edited to change the hydrophobicity of more than half of the amino acids [52]. The selection of B. chinense mt genome editing sites showed a strong bias, with all editing sites being C-T editing, which is the most common editing type in plant mt genomes according to several studies [53, 54]. In previous studies, RNA edits that occurred at the second position of a codon accounted for more than half of the total [20, 55]. In the B. chinense mt genome, 65.18% of the editing sites were also located at the second-position base of the triplet codon, whose result is consistent with those of previous studies. In addition, after RNA editing, the encoded amino acid will change into stop codons (TAA, TAG, TGA). In the B. chinense mt genome, 0.97% of the amino acid is edited into a stop codon, which resulted in the coding process being stopped prematurely, thus altering the function of the gene. The transfer of plant DNA between organelles and nuclear genomes as well as between species occurs frequently, and sequencing analysis has led to the discovery of DNA transfer events between different genomes (mitochondrial, nuclear and chloroplast) in many plants [56, 57]. Previous studies found that DNA transfer events is mainly organelle genome to nuclear genome DNA fragment transfer, followed by the nuclear genome and plastid genome to mitochondrial genome transfer [58, 59]. Plant mt DNA transfers its sequence fragments to nuclear DNA (rarely to cp DNA), and integrates some nuclear and cp DNA sequences [60, 61]. In high plants, the total length of transferred DNA varies depending on the plant species, lengths ranging from 50 kb (Arabidopsis thaliana) to 1.1 Mb (Oryza sativa subsp. Japonica) [62]. In this study, a total length of 11,144 bp was found to be transferred from the cp genome to the mt genome, accounting for 2.56% of the mt genome. The proportion of the transferred fragments in the mt genomes is similar to the previously reported data for Acer truncatum (2.36%) [63] and Salix suchowensis (2.8%) [64], but lower than Suaeda glauca (5.18%) [20]. In the transfer of DNA fragments from the cp genome to the mt genome of angiosperms, the transfer of tRNA genes is the most common [51]. We identified 25 homologous fragments that had been transferred from the cp genome to the mt genome, these homologous fragments contained 8 annotated genes, of which 6 were tRNA genes. This result is similar to those of Ma [63] et al., who found that the transfer fragment of the A. truncatum cp genome to the mt genome contained six integrated genes, of which 5 are tRNA genes. We analyzed the codons of the B. chinense mt genome, and determined the values of related parameters, such as GC1s, GC2s, GC3s, GC_all, Nc, and RSCU. Nc values ranged from 20 to 61; if values were closer to 20, it indicated a stronger codon preference and vice versa [65]. The Nc value of the codon of the B. chinense mt genome was 55.48, which indicated that the codon preference of the B. chinense mt genome was weak. The RSCU value can reflect the ratio of the actual frequency of use of a codon to the theoretical frequency when there is no usage bias; if RSCU = 1, it means that codon usage is unbiased, and if RSCU< 1, it means that the actual frequency of use of the codon is lower than the frequency of use of other synonymous codons, and if it is vice versa, it is higher than the frequency of use of other synonymous codons [66]. The results of the analysis show that there were 30 codons for which the RSCU> 1, and most of these ended with A/T bases. The results of Ka/Ks analysis of the mt genomes of B. chinense, D. carota, and B. falcalum showed that most of the genes were negatively selected during the evolution process, indicating that the protein-coding genes of the B. chinense mt genome are relatively well-conserved. However, the Ka/Ks value of protein-coding genes such as ccmB, nad4, rps1 and rps14 were found to be > 1, indicating that positive selection occurred during the evolution of these coding genes. Other plant mt genomes also have protein-coding genes with Ka/Ks ratios > 1 [64, 67], and a high gene Ka/Ks ratio plays an important role in further studies on gene selection and evolution of species. The size and GC content of the B. chinense mt genome were compared with D. carota mt genome. It was found that the size of the mt genome differed greatly, but its GC content was relatively conserved during the evolutionary process. In addition, based on the information obtained from the mt genome, a phylogenetic analysis of the B. chinense mt genome and the published mt genomes of 19 plant species was performed. In conclusion, the evolutionary relationships among these species are consistent with the topology of the phylogenetic tree, indicating the consistency of traditional and molecular taxonomy.

Conclusion

In this study, the mt genome of B. chinense was sequenced, assembled, and annotated, and the DNA and amino acid sequences of annotated genes were analyzed thoroughly. The B. chinense mt genome is circular and 435,023 bp in length. Seventy-eight genes, of which 39 protein-coding genes, 35 tRNA genes, and 4 rRNA genes, were annotated in the mt genome. Then, the repeat sequences, RNA editing process, and codon preferences of the B. chinense mt genome were analyzed. Gene transfer between the mt and cp genomes in B. chinense was observed via the detection of gene homologous fragments. In addition, our results also show that although plant mt genomes vary greatly in size, their GC content is relatively conserved during the evolutionary process. The results of Ka/Ks analysis, which was based on coding substitutions, show that most coding genes have undergone negative selection, indicating that mt genes were conserved during the process of evolution. This study provides extensive information regarding the mt genome of B. chinense. Importantly, it lays the foundation for future research on genetic variation, systematic evolution, and breeding of B. chinense using the mt genome.

Methods

Plant growth conditions, DNA extraction, and de novo sequencing

B. chinense plants were planted in the traditional Chinese medicine resource garden at the School of Life Sciences, Shanxi Agricultural University(Taigu, Shanxi, China). Plants were kept in the dark for 14 d to obtain etiolated B. chinense seedlings. The material was scrubbed with 70% alcohol to remove the dust and soil from the surface of B. chinense, snap-frozen in liquid nitrogen, and placed in a pre-cooled 50-mL sealed bag. We collected about 20 g of etiolated B. chinense seedlings, transported them using dry ice, and transferred them to the GENEPIONEER laboratory (Nanjing, China). Mt DNA of B. chinense was extracted from the sample and sequenced using the Nanopore (2000cUV-Vis) sequencing platform. To obtain a high-quality B. chinense mt genome, we used fastp (v0.20.0, https://github.com/OpenGene/fastp) software to filter the raw data, and discard the sequencing junction and primer sequences in the reads, filter out reads with an average quality value of less than Q5, and filter out reads for which the number (N) was greater than 5, and obtain high quality reads. The triple sequenced data were filtered using Filtlong v0.2.1 software, and counted using Perl scripts.

Assembly and annotation of the mt genome

The original tri-generational data were spliced using Canu assembly software [68] to obtain the contig sequence, which was compared to the plant mt gene database using BLAST v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The contig of the mt gene used for comparison was used as the seed sequence, and was extended and cyclized using the original data to determine the master structure (or sub-loop) of the ring; assembly was performed using NextPolish v1.3.1 [69] (https://github.com/Nextomics/NextPolish) using second- and third-generation data. The results were corrected and the results of the final assembly process were obtained after manually checking for errors. Mt annotation was performed using the following steps: the encoded proteins and rRNAs were compared to published plant mt sequences using BLAST, and further manual adjustments were made based on closely related species. The tRNA was annotated using tRNAscanSE [70] (http://lowelab.ucsc.edu/tRNAscan-SE/). ORFs were annotated using Open Reading Frame Finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). The mt genome was constructed using OrganellarGenomeDRAW [71] (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html).

Analysis of repeat sequences

Interspersed repeat sequences were identified using a combination of vmatch v2.3.0 (http://www.vmatch.de/) software and Perl scripts. The minimum length was set to 30 bp, and four types of sequences were identified: forward, backward, reverse, and complementary. Simple repetitive sequence analysis was performed using MISA online software [72] (https://webblast.ipk-gatersleben.de/misa/). We identified 8, 4, 4, 3, 3, and 3 repeats with 1, 2, 3, 4, 5, and 6 bases, respectively, in this analysis. The minimum distance between the two SSRs was set at 100 bp. Tandem repeats with lengths > 6 bp and > 95% matching repeat units were detected using Tandem Repeats Finder v4.09 software [33] (http://tandem.bu.edu/trf/trf.submit.options.html). The codon composition of the mt genome of B. chinense was analyzed using a self-encoded Perl script, to screen for a unique CDS and determine the number of codons per gene, GC content (GC1, GC2, and GC3), average GC content of 3 bases (GC all), effective number of codons (Nc, effective number of codons), and RSCU of synonymous codons.

Chloroplast to mitochondrion DNA transformation and RNA editing analyses

The cp genome sequence of B. chinense (NC_046774.1) [39] was downloaded from NCBI Organelle Genome Resources Database. BLAST v2.10.1 software was used to identify the homologous fragments in the mt genome and cp genome. Screening criteria were set to ensure that the matching rate was ≥70%. The editing sites in the mt RNA of B. chinense were identified using the mt gene-encoding proteins of plants as reference proteins. The analysis was conducted using the Plant Predictive RNA Editor (PREP) suite [34] (http://prep.unl.edu/).

Ka/Ks analysis and phylogenetic tree construction

Synonymous (Ks) and nonsynonymous (Ka) substitution rates of protein-coding genes were analyzed in the mt genome of B. chinense using B. falcalum and D. carota as references. Ka/Ks analysis aligned the CDS sequence using mafft v7.427, Ka/Ks was calculated using the Ka/Ks Calculator v2.0 [42] software MLWL model. The mt genome sequences of 19 species from different families were aligned using MAFFT [73] software. Connect the aligned sequences end-to-end, trim them with trimAl (v1.4.rev15) (parameter: -gt 0.7), and use jmodeltest-2.1.10 software to predict the model after trimming, and determine that the model is of GTR type. Then use RAxML [74] software, select GTRGAMMA model, bootstrap = 1000, build the maximum likelihood evolutionary tree. Bayesian analysis was performed using MrBayes v3.2.7, Markov Chain Monte Carlo (MCMC) iterative operation for 1 million generations, sampling every 100 generations. As a result, the initial 25% of the phylogenetic tree is removed (burn-in), and a majority consistent tree is finally obtained.
  67 in total

1.  Computational Detection of Plant RNA Editing Events.

Authors:  Alejandro A Edera; M Virginia Sanchez-Puerta
Journal:  Methods Mol Biol       Date:  2021

2.  NextPolish: a fast and efficient genome polishing tool for long-read assembly.

Authors:  Jiang Hu; Junpeng Fan; Zongyi Sun; Shanlin Liu
Journal:  Bioinformatics       Date:  2020-04-01       Impact factor: 6.937

3.  Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system.

Authors:  T Ikemura
Journal:  J Mol Biol       Date:  1981-09-25       Impact factor: 5.469

4.  Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis.

Authors:  Jaime I Davila; Maria P Arrieta-Montiel; Yashitola Wamboldt; Jun Cao; Joerg Hagmann; Vikas Shedge; Ying-Zhi Xu; Detlef Weigel; Sally A Mackenzie
Journal:  BMC Biol       Date:  2011-09-27       Impact factor: 7.431

5.  Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica.

Authors:  Shengxin Chang; Tiantian Yang; Tongqing Du; Yongjuan Huang; Jianmei Chen; Jiyong Yan; Jianbo He; Rongzhan Guan
Journal:  BMC Genomics       Date:  2011-10-11       Impact factor: 3.969

6.  Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Authors:  Sergey Koren; Brian P Walenz; Konstantin Berlin; Jason R Miller; Nicholas H Bergman; Adam M Phillippy
Journal:  Genome Res       Date:  2017-03-15       Impact factor: 9.043

7.  The alternative reality of plant mitochondrial DNA: One ring does not rule them all.

Authors:  Alexander Kozik; Beth A Rowan; Dean Lavelle; Lidija Berke; M Eric Schranz; Richard W Michelmore; Alan C Christensen
Journal:  PLoS Genet       Date:  2019-08-30       Impact factor: 5.917

8.  The Mitogenome of Norway Spruce and a Reappraisal of Mitochondrial Recombination in Plants.

Authors:  Alexis R Sullivan; Yrin Eldfjell; Bastian Schiffthaler; Nicolas Delhomme; Torben Asp; Kim H Hebelstrup; Olivier Keech; Lisa Öberg; Ian Max Møller; Lars Arvestad; Nathaniel R Street; Xiao-Ru Wang
Journal:  Genome Biol Evol       Date:  2020-01-01       Impact factor: 3.416

9.  Assembly and comparative analysis of the first complete mitochondrial genome of Acer truncatum Bunge: a woody oil-tree species producing nervonic acid.

Authors:  Qiuyue Ma; Yuxiao Wang; Shushun Li; Jing Wen; Lu Zhu; Kunyuan Yan; Yiming Du; Jie Ren; Shuxian Li; Zhu Chen; Changwei Bi; Qianzhong Li
Journal:  BMC Plant Biol       Date:  2022-01-13       Impact factor: 4.215

10.  INTRAMITOCHONDRIAL FIBERS WITH DNA CHARACTERISTICS. I. FIXATION AND ELECTRON STAINING REACTIONS.

Authors:  M M NASS; S NASS
Journal:  J Cell Biol       Date:  1963-12       Impact factor: 10.539

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.