Literature DB >> 27148547

The Whole Genome Assembly and Comparative Genomic Research of Thellungiella parvula (Extremophile Crucifer) Mitochondrion.

Xuelin Wang1, Changwei Bi1, Yiqing Xu2, Suyun Wei1, Xiaogang Dai3, Tongming Yin3, Ning Ye1.   

Abstract

The complete nucleotide sequences of the mitochondrial (mt) genome of an extremophile species Thellungiella parvula (T. parvula) have been determined with the lengths of 255,773 bp. T. parvula mt genome is a circular sequence and contains 32 protein-coding genes, 19 tRNA genes, and three ribosomal RNA genes with a 11.5% coding sequence. The base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order shows a slight bias of 55% AT. Fifty-three repeats were identified in the mitochondrial genome of T. parvula, including 24 direct repeats, 28 tandem repeats (TRs), and one palindromic repeat. Furthermore, a total of 199 perfect microsatellites have been mined with a high A/T content (83.1%) through simple sequence repeat (SSR) analysis and they were distributed unevenly within this mitochondrial genome. We also analyzed other plant mitochondrial genomes' evolution in general, providing clues for the understanding of the evolution of organelles genomes in plants. Comparing with other Brassicaceae species, T. parvula is related to Arabidopsis thaliana whose characters of low temperature resistance have been well documented. This study will provide important genetic tools for other Brassicaceae species research and improve yields of economically important plants.

Entities:  

Year:  2016        PMID: 27148547      PMCID: PMC4842374          DOI: 10.1155/2016/5283628

Source DB:  PubMed          Journal:  Int J Genomics        ISSN: 2314-436X            Impact factor:   2.326


1. Introduction

Plant mitochondrial genome contains unique features, including large-scale sequence, frequent recombination, and merging foreign DNA [1]. Mitochondria are significant to life and play an important role in the cell. Mitochondria tackle oxidative phosphorylation very efficiently which is related to energy releasing in the electron transport and each molecule of glucose oxidized can produce about 30 molecules of ATP; however, only two molecules of ATP are produced through nuclear-controlled glycolysis. Toxic reactive oxygen species, a byproduct produced by mitochondria through oxidative phosphorylation increasing with age, can damage proteins, DNA, lipids, and their production [2]. As a typically active organelle with regularly undergoing fission and fusion, mitochondria have been thought to play an important part in mitochondrial outer membrane permeabilization process [3]. Another important feature of mitochondrial genome is its maternal inheritance, which shows its difference with nuclear DNA [2]. In plants, nuclear genes carry out relative copies of molecules of recombination-derived subgenomic DNA from mitochondria [4]. Additionally, a large set of mitochondria ribosomal proteins moves rapidly to the nucleus in certain angiosperm groups [5]. Although the evolution of plant mitochondrial genome sequence is slow, its structural evolution is swift [6]. The reported Arabidopsis thaliana mitochondrial genome sequence gives a more comprehensive and detailed view of internal duplication and foreign DNA uptake has influenced the size, structure, and evolutionary potential. Evolutionary surprises about plant mitochondrial genomes will continue to be brought out. However, within the high rate of mutation, only a very small part of plant mitochondrial genomes can be assembled. The Brassicaceae family contains many species that are able to grow well in extreme conditions. Their stress tolerance mechanisms have been found in these species which are able to survive in multiple abiotic stresses environments including freezing temperature and high salinity. The abiotic stresses are disadvantageous to improve the yield of plants, so in order to improve productivity of economically important plants such as winter rape (Brassica napus), mustard (Brassica juncea), cabbage (Brassica oleracea), and radish (Raphanus sativus), it is crucial to study how crucifers withstand natural stress. T. parvula is a species in the Brassicaceae (subclade Eutremeae), and its whole genome has been reported (GenBank ID: AFAN00000000.1) and is similar to Arabidopsis thaliana. So far, Arabidopsis thaliana's low-temperature-resistant ability has been well documented, which is also another advantage to study native crucifers [7]. Thellungiella parvula mitochondrion plays significant roles in ATP synthesis, metabolism, and stress response. Because of its efficient mobilization of resources in poor or degraded soils, T. parvula perform better not only under salt conditions, but also in cold and freezing circumstance. T. parvula can live in the environment of 5°C, which means its temperature-sensitive processes such as seed germination and the production of pollen and seeds were resistant to cold temperatures. Moreover, the land plant shows dramatically different flowering phenotypes while shifting to cold environment [7]. It is expected that the research of the T. parvula mitochondrial complete genome sequence would serve as a reference and a useful genetic resource to the future research.

2. Results

2.1. T. parvula Mitochondrial Genome Content and Organization

Thellungiella parvula mitochondrial genome was assembled from the whole genome sequence using Roche-454 Sequencing technologies. The circular mitochondrial genome of T. parvula is 255,773 bp in size (GenBank accession number KT988071), with the overall base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order and A + T content of 55.0% (Table 1). The overall GC content of the T. parvula mitogenome is 45%, which is similar to Arabidopsis thaliana, the first published mitochondrial genome sequence of the Brassicaceae family, originally from NCBI (NC_001284).
Table 1

Overview of the Thellungiella parvula mitochondrial genome sequence.

NoteSize
A content27.5%
T content27.5%
C content22.7%
G content22.3%
A + T content55%
Longest gene2009 bp
Mean intron length3740 bp
Sequencing information such as uniquely aligned reads and average read depth of gene was shown in Table 6. Among all the 25 contigs, the longest is 73000 bp in length; its length value and N50 value are relatively large, suggesting that these sequencing data will be accurate and useful to assemble. Also, the encoding length of the largest contig is 51610 bp which accounts for 71%. Mapping these contigs to the completed mitochondrial genome sequence results of Arabidopsis thaliana showed that 20 of the contigs were assembled only once, while the other 5 contigs were linked twice or three times. Some certain sequences also have multiple copies in the mitogenome by analyzing these sequence results. Contig 1 was used at the beginning and we linked other contigs sequentially according to the connecting map (Figure 1). Finally, the T. parvula single master circle of mitochondrial genome covering the great mass of the contigs was generated with a complete length of 255773 bp.
Table 6

Sequence information of the Thellungiella parvula mitochondrial genome.

Total raw reads (millions) 2.156
Uniquely aligned reads (millions) 0.075
Average read length (bp) 479
Duplication (%) 6.76
Sample input (µg or ng) 50~100 µg
Average read depth of gene 37
Average read depth of contigs 52.477
Figure 1

Circular gene map of the Thellungiella parvula mitochondrial genome. The T. parvula mitogenome consists of 54 genes, including 3 ribosomal RNA (rRNA) genes (5S rRNA, 26S rRNA, and rrnS), 19 transfer RNA (tRNA) genes, and 32 protein-coding genes.

Using web-based tool Public MITOFY Analysis [5] and tRNA scan-SE, 54 genes were identified, including 3 ribosomal RNA (rRNA) genes (5S rRNA, 26S rRNA, and rrnS), 19 transfer RNA (tRNA) genes, and 32 protein-coding genes (PCGs) (Table 2) (Figure 1). In addition, the positions of these genes in the T. parvula mitochondrial genome are shown in Table 1. Among genes of known function, trnE-TTC has two copies. By comparison with other Arabidopsis thaliana, the mitochondrial genome of T. parvula has the same types of function genes as crucifers.
Table 2

Gene profile and organization of the Thellungiella parvula mitochondrial genome.

Gene/elementsStrandPositionSize (bp)Start codonStop codonAmino acidAnticodonRead depth
tRNATyr H832–91483 GTA34.8
Nad21458–371861467ATGTAA49338.025
Nad49133–170811488ATGTGA49334.8
Rpl2 L20135–229261050ATGTGA34934.8
Atp927712–87567917ATGTGA16935.45
rrnS H158451–160299184940.2
ccmC H30512–31255744ATGTGA24734.8
Rps4 L38009–390971089ATGTAA36233.9
Nad743566–438271185ATGTAG39433.9
Atp8 L51211–51687477ATGTGA15831.2
Nad6 H57368–57985618ATGTAA20537.9
CcmFn-158827–600481179ATGTAG39236.1
CcmFn-2 H213390–214040651ACGTAG21637.9
Nad163688–249532978ACGTAA32537.68
Cox3 L68658–69455798ATGTGA26536.1
tRNAGly L71129–7120274GCC36.1
CcmFc71321–736321359ATGTAA45236.1
Cox2 H80299–82430783ATGTAA26036.1
tRNASer L91957–9204387GCT36.1
cob L92354–935351182ATGTGA39336.1
Rpl5 L95345–95902558ATGTAA18536.1
Rpl16 L96020–96547528ACGTAA17836.1
Rps3 L96438–979641527ACGTAG50836.1
tRNASer H106353–10643986GCT36.1
Rps12 L116115–116492378ATGTGA12536.1
Nad3 L116538–116897360ATGTAA11936.1
tRNALys H118042–11811473TTT36.1
CcmB H119597–120217621ATGTAA20636.1
tRNAAsp H123824–12389774GTC36.1
Rrn26 H130530–133703317456.1
tRNAHis H139455–13952874GTG40.2
tRNAGlu H139709–13978072TTC40.2
tRNASer H139828–13991588GCT40.2
trnfM H144494–14456774CAT40.2
tRNAMet H153356–15342873CAT40.2
Rrn5 H160405–16052311940.2
Nad5167199–2063402009ATGTAA66256.46
matR L170736–1727091974ATGTAG75740.2
Atp1 H178158–1796811524ATGTAG50740.2
tRNAIle H188375–18845581CAT33.9
Rps7 H189123–189569447ATGTAA14833.9
tRNACys H195438–19550871GCA33.9
tRNAPro L195611–19568575TGG33.9
tRNAGlu H195757–19582872TTC33.9
tRNALeu H198814–19889885CAA33.9
tRNATrp H201484–20155774CCA35.5
Nad9 L209345–209917573ATGTAA19037.9
tRNAGln H216365–21643672TTG37.9
Atp6 L223161–223604444ATGTAA14737.9
Atp4 L231218–231796579ATGTAA19233.3
Nad4L L232069–232371303ATGTAA10033.3
Cox1-1 L251655–252425771ATGTAA25635.8
Cox1-2 L250843–251604762ATGTAA25335.8
tRNAAsn H255697–25576872GTT37.1
The percentage of genes of known function is 11.5% (Table 7), 32 protein-coding genes in this mitogenome contain a total of 29,118 bp, and 9,706 codons are coded by nucleotide. Analyzing all the 32 protein-coding genes, the nad5 is the longest (2009 bp) and the shortest is the nad3 (119 bp). In these codons, 157 (1.62%) encode cysteine and 1071 (11.06%) encode leucine, which are the least and the most prevalent amino acids separately. AUU and UGC are used frequently to encode leucine and cysteine, respectively. Furthermore, nine genes have introns: four genes (rpl2, ccmFn1, ccmFc, and cox2) have a single intron, two genes (nad4, nad2) have three introns, and three genes (nad1, nad5, and nad7) have four introns. The complete NADH dehydrogenase subunit 2 (nad2) in T. parvula mitochondrial genome is encoded by five exons. These exons are located in two distant locations of the mitochondrial genome. One genomic region encodes exons a and b; the other encodes exons c, d, and e. As a result, the trans-splicing reactions are required to link exons b and c [8, 9].
Table 7

The distribution of reads in the Thellungiella parvula mitochondrial genome.

Note Percentage (%)
rRNA 2.010
tRNA 0.6
Intergenic 76.132
Intron 8.457
Exon 4.501
The mitochondrial genome of T. parvula encoding tRNA and rRNA accounts for 0.6% and 2.01%, respectively. Noncoding sequences are present in a high rate (intergenic regions 76.132%, intron regions 8.457%, and exon regions 4.501%) (Table 7). Fourteen SNPs are identified: nine are synonymous and five are not synonymous (cox3, cob, ccmB, atp9, and rpl5). Additionally, we built a GBrowse (Figure 2) for the T. parvula mitogenome (http://bio.njfu.edu.cn/cgi-bin/gb2/gbrowse/Thellungiella_parvula/).
Figure 2

The GBrowse of the Thellungiella parvula mitogenome, showing the detailed location of mitogenome, GC content, CDs, tRNA, and other useful information.

2.2. Gene Organization of T. parvula Mitochondrial Genome

It has been observed that 19 of the conserved 32 protein-coding genes produce components of the electron transport chain and ATP synthase: nine subunits of complex I (nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, and nad9), one subunit of complex III (cob), four subunits of complex IV (cox1-1, cox1-2, cox2, and cox3), and five subunits of complex V (atp1, atp4, atp6, atp8, and atp9). Five additional proteins are involved in the biogenesis of cytochrome c (ccmB, ccmC, ccmFN-1, ccmFn-2, and ccmFC). Another seven genes encode ribosomal proteins (rpl2, rpl5, rpl16, rps3, rps4, rps7, and rps12). The remaining genes encode maturase (matR). What is more, the gene ccmFN in T. parvula mitochondrial genome is divided into two reading frames. The 19 tRNA genes, especially including three trnS-GCTs, are significantly sufficient to decode all codons. Out of 54 genes, fourteen tRNA genes (tRNA , tRNA , tRNA , tRNA , tRNA , tRNA , trnfM, tRNA , tRNA , tRNA , tRNA , tRNA , tRNA , and tRNA ) and seven protein-coding genes (ccmC, Nad6, CcmFn-2, Cox2, ccmB, Atp1, and Rps7) are encoded on the H-strand; others are encoded on the L-strand (Table 2) [10]. Three rRNA genes (rrn5, rrn26, and rrnS) are highly conserved between the two mitochondrial genomes of Brassicaceae.

2.3. Repeat Structure and Sequence Analysis

A repeat is defined as a pair of sequences which has over 90% similarity [1]. Overall, 53 repeats are identified in the mitochondrial genome of T. parvula, including 28 tandem repeats (TRs), 24 direct repeats, and one palindromic repeat. An amount of 28 TRs is recognized in T. parvula, which are fewer than A. thaliana (48 TRs) (Table 3). Among these repeats, the size of these tandem repeats is 6–69 bp. There are four long TRs (>100 bp) (with copy numbers of 51, 51, 69, and 69 separately) located in the intergenic spacer regions of nad3-trnK-TTT, nad5-nad1, cox1-1-trnN-cp, and cox1-1-trnN-cp of the T. parvula mitochondrial genome (Table 3). Tandem repeated sequences are allocated unevenly; repeat-rich sequences are widely concentrated near the central and the end regions.
Table 3

Tandem repeats in the mitogenome of Thellungiella parvula.

Serial numberStart siteStop sitePeriod size (nt)
12233231331
25809585722
3204972055328
4545655459918
5604686052625
6604746053125
7842028423114
81022211022519
911268211270914
1011391811399326
1111485211490327
1211743911753751
1312358112366736
1412378312382020
1518331818338133
1620885120888819
1721128921135829
1821671521678827
1922239422242918
2022621222625518
2123453123463051
2223563723570927
2323649823654916
2423648523654932
2523723523728719
2625333525349269
2725338125355469
2825438125440512
We used Blastn to align the T. parvula mitochondrial genome to locate and identify disperse repeats which has a minimal length of 100 bp and more than 90% identity between the two repeat copies, including forward repeats, inverted repeats, and palindromic repeats. As a result, the repeats in the Thellungiella parvula mitochondrial genome include sixteen forward repeats and eight inverted repeats (disperse repeats). We paid more attention to the 24 large repeats because they are related to reversible genomic structural changes. The large repeat of T. parvula mitochondrial genome is 10,084 bp in size, accounting for 3.9% of the whole mitogenome. Sixteen forward repeats and eight inverted repeats were among 106 and 453 bp (Figure 3). Most of these repeats are located in the no-protein-coding area; however, two inverted repeats are in the protein-coding regions and rRNA gene, respectively. These two genes in mitogenome that participated in energy metabolism probably have important effects on T. parvula. What is more, the length of palindromic repeat is 32 bp, which is distributed in the intergenic spacer regions of ccmFc and cox2. The disperse repeats research will offer significant information for the future population phylogeny study.
Figure 3

Frequency distribution of repeat lengths in the mitochondrial genome of Thellungiella parvula.

Simple sequence repeats (SSRs), also known as microsatellites, are usually 1–6 bp in length per unit and are widely distributed over the mitochondrial genome [11]. With a higher degree of polymorphism, codominant microsatellites are used as molecular marker assisting breeding [12], population genetics [13], genetic linkage map construction, and gene mapping [14]. A total of 199 simple sequences repeats were detected in the T. parvula mitochondrial genome, of which 4 are pentanucleotides, 6 are trinucleotides, and 189 are others. These SSRs have a high A/T content (83.1%) and the 3 dinucleotides are composed of AT (Table 4). The higher A/T content in SSRs of mitochondrial genome also contributes to a slight bias of 55% AT in the T. parvula mitogenome. Moreover, it is clear that SSRs distributed in the protein-coding areas are much fewer than noncoding regions, which account for 9.06% and 90.94%, respectively. Additionally, 32.01% of SSRs exist in the introns and 58.93% of SSRs exist in the intergenic spacer regions. However, SSRs account for only 0.1% of the whole protein-coding areas, which may further indicate an uneven distribution of SRRs within the mitochondrial genome of T. parvula.
Table 4

Distribution of SSRs in the mitogenome of T. parvula.

TypeSSRNumber of repeatsStartEndLocation
MonomerT9763771
A820742081Nad2(intron)
A81093210939Nad4(intron)
C81209612103Nad4(intron)
T101225312262Nad4(intron)
C91443514443Nad4(intron)
A111752517535IGS(nad4, rpl2)
T101917319182IGS(nad4, rpl2)
A112094420954Rpl2(intron)
A82513525142IGS(rpl2, atp9)
T82755827565IGS(rpl2, atp9)
C113550235512Nad2(intron)
C103983739846IGS(rps4, nad7)
A104350643515IGS(nad4, rpl2)
A94636146369Nad7(intron)
G94784347851Nad7(intron)
T85503855045IGS(nad6, atp8)
T125917359184CcmFn1(intron)
T116062960639IGS(ccmFn1, nad1)
T86306063067IGS(ccmFn1, nad1)
C86507965086IGS(cox3, nad1)
A86915269159Cox3
T97007870086IGS(cox3, trnG-GCC)
A87098970996IGS(cox3, trnG-GCC)
T87162771634ccmFC
T87350873515ccmFC
T87378273789IGS(ccmFC, cox2)
T88355383560IGS(cox2, atp9)
A98587685884IGS(cox2, atp9)
T89211492121IGS(cob, trnS-GCT)
A99436094368IGS(cob, rpl5)
T89647496481Rpl16
T99765097658IGS(rps3, trnS-GCT)
G119947499484IGS(rps3, trnS-GCT)
A8106875106882IGS(rps12, trnS-GCT)
T8107254107261IGS(rps3, trnS-GCT)
T8107957107964IGS(rps3, trnS-GCT)
C8117336117343IGS(nad3, trnK-TTT)
T8134167134174IGS(Rrn26, trnH-cp)
T8137531137538IGS(Rrn26, trnH-cp)
T9153501153509IGS(rrnS, trnM-cp)
T9154495154503IGS(rrnS, trnM-cp)
C8156682156689IGS(rrnS, trnM-cp)
G8158985158992rrnS
T8160651160658IGS(rrn5, nad5)
G9166753166761IGS(rrn5, nad5)
A10173435173444Nad5(intron)
A8174660174667Nad5(intron)
T8178106178113IGS(nad1, atp1)
T8179843179850IGS(atp1, trnI-CAT)
A8180779180786IGS(atp1, trnI-CAT)
T8194226194233IGS(rps7, trnC-MT)
A11195829195839IGS(trnE-TTC, trnL-CAA)
T9210120210128IGS(nad9, ccmFn2)
A8212337212344IGS(nad9, ccmFn2)
G10231222231231Atp4
T8233295233302IGS(nad4L, nad5)
A8234008234015IGS(nad1, nad5)
A8234745234752IGS(nad1, nad5)
T8238579238586IGS(nad4L, nad5)
G9245887245895IGS(nad4L, nad5)
A10250105250114IGS(nad1, cox1)

TrimerAAG122738327394IGS(rpl2, atp9)
AAC127853778548IGS(ccmFC, cox2)
AAG128014280153IGS(ccmFC, cox2)
ACA128653586546IGS(atp9, cox2)
AAT128759387604IGS(atp9, trnS-GCT)
CTT21134442134462IGS(rrn26, trnH-cp)

PentamerGTTCT151035610370Nad4(intron)
ATATG155287152885IGS(atp8, nad6)
TAATA15101932101946IGS(rps3, trnS-GCT)
ATAGA15143566143580IGS(trnfM, trnS-GCT)

2.4. Comparison with Other Brassicaceae Species

T. parvula completed mitochondrial genome sequence provides a unique view of mitochondrial genome structure, organization, and gene complement. Comparing with other Brassicaceae species, we conclude that the T. parvula has a similar mitochondrial sequence organization to that of most species, such as Arabidopsis thaliana (GenBank ID: Y08501.2) which is unquestionably a kind of stress-sensitive crucifers, so it is meaningful to compare its mitogenome with that of Arabidopsis thaliana. From the web-based tool Public MITOFY Analysis, the T. parvula mitochondrial genome has 5 ATPase and 9 NADH which their numbers are equal to A. thaliana mitochondrial genome, but ribosomal proteins are slightly higher in T. parvula [4] than that in A. thaliana [3, 5]. Also, it may reflect the different environment pressures and habitats between the species adapted. Ribosomal proteins S1 (rps1), rps10, and rps11 have been completely lost in both species of the mitochondrial genome, which may indicate that genes transfer to nuclear compartment. In general, the base content of the A. thaliana mitochondrial genome is as follows: A (27.9%), T (27.3%), C (22.6%), and G (22.2%), which demonstrate a slight bias of A + T (55.2%) rich feature similar to that in T. parvula (Table 5). Except for mttB and nad1 genes employing the ACG start codon, most of mitochondrial protein-coding genes of A. thaliana initiate with ATG; the usage of the stop one is rather complete or incomplete [15]. Four types of stop codons are used by the coding genes: TAA (atp4, atp6, ccmB, ccmC, ccmFc, cox1, nad1, nad2, nad3, nad4L, nad9, rpl5, rpl6, rps4, rps7, and rps14), TAG (atp1, matR, sdh4, nad7, and rpl2), TGA (atp8, atp9, ccmFn, cob, cox2, cox3, nad4, nad6, and rps12), and incomplete stop codons T- for rps2, rps19. Approximately, a total of 1098 and 646 tandem repeats were found in the A. thaliana and T. parvula genomes, respectively. In our initial analysis, this halophyte, with a genome about 30% smaller than that of A. thaliana, shows striking difference in gene complement. The differences are partly due to tandem duplications in A. thaliana of single copy genes in T. parvula and some genes amplification which may have known or assumed functions in stress defense responses [16].
Table 5

Overview of the Arabidopsis thaliana mitochondrial genome sequence.

NoteSize
A content27.9%
T content27.3%
C content22.6%
G content22.2%
A + T content55.2%
Longest gene2019 bp

2.5. Phylogenetic Analysis

A phylogenetic context was based on amino acid sequences of the 15 protein-encoding genes (atp1, atp9, ccmB, cob, cox1, cox3, nad1, nad3, nad4, nad4L, nad6, nad7, nad9, rps3, and rps4) of 21 species (Brassica napus, Brassica rapa subsp. campestris, Brassica juncea, Raphanus sativus, Brassica carinata, Thellungiella parvula, Arabidopsis thaliana, Carica papaya, Vigna radiata, Citrullus lanatus, Cucumis sativus, Cucurbita pepo, Nicotiana tabacum, Silene latifolia, Beta vulgaris subsp. vulgaris, Beta vulgaris subsp. maritima, Oryza sativa Indica, Sorghum bicolor, Tripsacum dactyloides, Zea perennis, and Cycas taitungensis) (Figure 4). We got these twenty complete genome sequences from the GenBank of NCBI Organelle Genome Resources database (http://www.ncbi.nlm.nih.gov/genome/organelle/). Among these species, 20 are angiosperm species and one is a gymnosperm plant (Cycas taitungensis) that was set as an out-group. Twenty angiosperm plants are divided into two major clades: monocotyledons and dicotyledons plants. Bootstrap (Figure 5) analysis shows that there are 12 out of 18 nodes with bootstrap values >90%, and 8 of these have a bootstrap value of 100%. Those strong lineages conclude their sister relationship to Caricaceae and Brassicaceae (100% bootstrap values). In the NJ tree, T. parvula is near to Arabidopsis thaliana; although the bootstrap value is only 82%, the results are consistent with previous molecular studies [16]. Through the phylogenetic analyzing, the mitochondrial genome of T. parvula is evolutionarily close to that of Brassica napus, Brassica Rapa subsp. campestris, Brassica juncea, Raphanus sativus, and Brassica carinata. Then, the mitogenomes of Carica papaya and Zea perennis belong to Caricaceae and Gramineae, separately.
Figure 4

A phylogenetic context based on amino acid sequences of the fifteen protein-encoding genes (atp1, atp9, ccmB, cob, cox1, cox3, nad1, nad3, nad4, nad4L, nad6, nad7, nad9, rps3, and rps4) of twenty-one species (Brassica napus, Brassica rapa subsp. campestris, Brassica juncea, Raphanus sativus, Brassica carinata, Thellungiella parvula, Arabidopsis thaliana, Carica papaya, Vigna radiata, Citrullus lanatus, Cucumis sativus, Cucurbita pepo, Nicotiana tabacum, Silene latifolia, Beta vulgaris subsp. vulgaris, Beta vulgaris subsp. maritima, Oryza sativa Indica, Sorghum bicolor, Tripsacum dactyloides, Zea perennis, and Cycas taitungensis). These conserved genes were aligned with ClustalW and the phylogenetic tree was constructed using the Neighbor-Joining method in MEGA 6.

Figure 5

Maximum likelihood tree based on fifteen protein-encoding genes (atp1, atp9, ccmB, cob, cox1, cox3, nad1, nad3, nad4, nad4L, nad6, nad7, nad9, rps3, and rps4) of twenty-one species (Brassica napus, Brassica rapa subsp. campestris, Brassica juncea, Raphanus sativus, Brassica carinata, Thellungiella parvula, Arabidopsis thaliana, Carica papaya, Vigna radiata, Citrullus lanatus, Cucumis sativus, Cucurbita pepo, Nicotiana tabacum, Silene latifolia, Beta vulgaris subsp. vulgaris, Beta vulgaris subsp. maritima, Oryza sativa Indica, Sorghum bicolor, Tripsacum dactyloides, Zea perennis, and Cycas taitungensis). Numbers on each node are bootstrap support values. Cycas taitungensis is set as an out-group.

In the plants evolution, the numbers of protein-coding genes in mitochondrial genomes tend to decrease [17]. As a representative of gymnosperms, the mitochondrial genome of Cycas taitungensis has conserved almost all genes inherited from the seed plants, which was mitochondrial genome progenitor [18]. In contrast, Silene latifolia has lost all of the ribosome coding genes except for rpl5 and other genes like the sdh [19]. Succinate dehydrogenase genes are usually missing. Almost all the monocotyledons and dicotyledons plants have lost rps11 gene. This loss in the mitochondrial genome probably occurred in the early stage of the divergence process between monocotyledons and dicotyledons. The protein-coding gene like mttB that encodes a transport membrane protein in T. parvula, Brassica napus, and Brassica carinata is an exceptional losing one. More specially, contrasting with Cycas taitungensis, other twenty plants like Thellungiella parvula, Brassica carinata, Brassica napus, and Zea perennis have lost rpl10, sdh3, and sdh4 genes, which can be inferred as rpl10, sdh3, and sdh4 gradually developed into pseudogenes during the evolution of angiosperms. The loss of rps2 in sixteen dicotyledons plants (Brassica carinata, Brassica juncea, Brassica napus, Brassica rapa subsp. campestris, Raphanus sativus, Thellungiella parvula, Arabidopsis thaliana, Carica papaya, Vigna radiata, Citrullus lanatus, Cucumis sativus, Cucurbita pepo, Nicotiana tabacum, Silene latifolia, Beta vulgari subsp. vulgaris, and Beta vulgaris subsp. maritima) probably occurred before the formation of Magnoliopsida plants, since most ribosome protein genes are frequently absent in clades of angiosperm mitochondrial genomes, which can be considered unnecessary to some extent [19]. Some chloroplast tRNA genes through random intercellular transfers have been inserted to mitochondrial genomes [20]. Many evolutionarily original species still maintained ancient transfers to some degree; meanwhile, new transfers exist in individual genera or species. These chloroplast genome genes trnH, trnM, and trnS (GGA) are regularly found in the mitochondrial genomes of C. taitungensis and the nineteen species mitogenomes (Brassica carinata, Brassica juncea, Brassica napus, Brassica rapa subsp. campestris, Raphanus sativus, Thellungiella parvula, Arabidopsis thaliana, Carica papaya, Vigna radiata, Citrullus lanatus, Cucumis sativus, Cucurbita pepo, Nicotiana tabacum, Silene latifolia, Beta vulgari subsp. vulgaris, Beta vulgaris subsp. Maritima, Oryza sativa Indica, Sorghum bicolor, and Zea perennis). It may be considered that this gene transfer phenomenon has occurred before the formation of angiosperms. Through analyzing species mitochondrial genome sequence in the phylogenetic tree (Figure 4), many mitochondrial-like tRNA genes may be lost in various plants, but the trnE, trnP, and trnR genes exist stably in mitochondrial genomes of these seed plants. Additionally, according to the analyzing of mitogenome sequence and phylogenetic context (Figure 4), seed plants' mitochondrial genomes are inserted into trnH but disappeared in the mitogenome of Tripsacum dactyloides. Comparing with the typical gymnosperm Cycas taitungensis, which has the largest numbers of mitochondrial-like genes, the T. parvula mitochondrial genome has lost 10 genes, including seven protein-coding genes (rps1, rps10, rps11, rps13, rps14, rps19, and sdh3) and three tRNA genes (tRNA-Phe, tRNA-Arg, and tRNA-Val) (Table 2). Similarly, these 10 genes, which are lost in T. parvula, cannot be found in its related species Arabidopsis thaliana. As a result, these genes are not essential in gene expression or they can be duplicated by other genes [21]. However, other lost mitochondrial genes may be considered as some functional losses [22]. Furthermore, this may imply that gene loss will be regarded as the mitogenome evolutionary compaction in seed plants [23].

3. Discussion

Plants mitochondrial genome sequences are increased rapidly with the availability of sequencing technology [24]. Crops, a group that includes putative keystone species [25], are chosen as an obvious target to satisfy the most urgent desire to study and improve agronomically important species [26]. These plants are expanding potential comparisons at the biochemical and physiological level in particular environments with other existing genomic and genetic models [27]. As a result, T. parvula is such a plant. On one hand, it resembles the prototypical model Arabidopsis thaliana in development and evolution [28]. On the other hand, it is a plant with halophytic characters and exceptionally high mobilization of extreme environment, including salinity, cold, freezing temperatures, and the ability to grow in degraded soils [29]. With the T. parvula mitochondrial genome sequence presented in this study and also with reference to whole genome sequence of T. parvula [16], we will enlarge the exploration of genome sequence that favored extremophile adaptations in further researches. The reported Brassicaceae species including T. parvula and Arabidopsis thaliana are 255773 bp and 366924 bp in size, respectively, both of them with a little smaller mitochondrial genome among higher plants [6]. As expected, genes contents of these mitochondrial genomes and the completed sequences are highly conserved between T. parvula and A. thaliana. Large repeats numbers (>1 kb) are less in Brassicaceae (Arabidopsis and Thellungiella) than in other species [30]. The repeat sequence detected in the T. parvula mitochondrial genome could be used as molecular marker and in population genetics studies, genetic linkage map construction, and gene mapping. Phylogenetic analysis also proved the two clades of Angiospermae and the isolation of Magnoliopsida with Liliopsida. Moreover, it proved the place of T. parvula in crucifer. So far, it has also been reported that the mitochondrial genomes of these crucifers do not contain much nuclear sequence plastid genomes, which can be one of the reasons why Brassicaceae species mitochondrial genomes are small. It seems that pathways and functions related to stress could be different in evolutionarily stress-tolerant plants ofBrassicaceae [31]. As a result, the mitogenome of T. parvula will be a useful tool in exploring extreme condition tolerance and mechanisms of adaptive evolution. Additionally, within these differences, we expect to learn the unique lifestyle of T. parvula and its adaptation to essential ecological niche. The accomplishment of mitochondrial genome sequencing of both model plant and important crops in succession enlarged the higher plant mitochondrial genome database. However, there are still untypical plant mitogenomes. For instance, Citrullus melo L. has a large mitochondrial genome [32]. The mitogenome of Geraniaceae has the higher rate of nucleic acid replacement compared with homologous species [33]. These untypical mitochondrial genomes can also provide clues to the higher plants mitochondrial genome evolution mechanism. How the unknown mitochondrial genome sequences have emerged remains unsolved. We can carry out a research using mutants of crucifers related to genome distribution and the integrity of genome structure on the problems above in the future.

4. Methods

4.1. Plant Material and DNA Extraction

The original data are from GenBank of NCBI Genome Resources database (http://www.ncbi.nlm.nih.gov/genome?term=Thellungiella+parvula%5Borgn%5D&cmd=DetailsSearch). In their research, 10-day-old seedlings of T. parvula isolated total DNA are applied. The seeds were obtained from a single plant. At the collection site, the soil bulk density was 1.225 g/cm3 with 32.4% salts by weight [16]. By using the Nucleon Phytopure Genomic DNA Extraction kit (GE Healthcare), genomic DNA is prepared [16].

4.2. DNA Library Preparation and 454 Sequencing

The original data was constructed by both shotgun and paired-end libraries. To construct the two shotgun libraries, genomic DNA was nebulized to fragments of 500–800 bp randomly in size. Additional DNA was used to construct paired-end libraries with size spans of 3 kb (three libraries), 8 kb (two libraries), and 20 kb (two libraries). 454 Genome Sequencer FLX-Titanium was used to construct, amplify, and sequence all libraries according to the manufacturer's kits and protocols (454 Life Sciences) [16].

4.3. Mitogenome Sequence Assembly (2,155,540 Reads)

We expected to produce a gap-free, scaffold-level mitochondrial genome of the T. parvula through this research. Using Roche-454 FLX Titanium sequencing which has an advantage of read length, 2,155,540 reads with a total length of 6.2 G were obtained. The original sequence reads were a mixture of DNA with nucleus and other organelles. In order to assemble the mitogenome, we researched 359 plants mitochondrial genomes sequences from NCBI Organelle Genome Resources and found the common features; then, we used Blastn to isolate mitochondrion reads from the whole genome reads based on these completed reference mitogenomes. The sequences were assembled using Newbler 3.0 (454-Roche) with default parameters. During this process, we found 3801 contigs. Contigs with long length and high reads depth were separated with shorter ones [34]. The longest was 73000 bp in length. Contigs with reads depth between 50x and 100x may contain essential mitochondrial genes, so we filtered them out. To visualize the contigs connections, we used perl scripts and Newbler 3.0 generated file “454AllContigGraph.txt”. At the same time, according to reads depths of the contigs, false links and some wrong forks were removed manually [34]. Referring to the basis of the connecting map for the normal mitochondrion genome, we connected contigs to develop a circular mitochondrial genome and mapped them to Arabidopsis thaliana mitochondrial genome. After connecting 25 contigs, there were still some gaps. Contigs' gaps were filled up with a method like Ma et al.'s [35] study: first, we mapped mitochondrion reads onto both ends (3-60 bp) of the assembled contigs and then extended the contigs by joining the reads, which were partly overlapped (≥95% identical, e ≤ 1E − 30) with the contigs. Finally, a 255,773 bp nucleotide sequence was finished.

4.4. Genome Annotation

Genes were located using the web-based tool Public MITOFY Analysis [5], coupled with synonymous and nonsynonymous SNPs. tRNAscan-SE was used to identify the transfer RNA genes with default settings [36]. GC content was analyzed by perl script. The OGDraw version 1.1 was used to draw the circular mitochondrial genome map [37] (http://ogdraw.mpimp-golm.mpg.de/cgi-bin/ogdraw.pl).

4.5. Repeat Analysis

T. parvula mitochondrial genome tandem repeats were analyzed using Tandem Repeats Finder program [38] with default settings. Blastn alignments were used to identify and locate disperse repeats including the forward and palindromic repeats with a minimal length of 100 bp and over 90% identity between the two copies. For the simple sequence repeats (SSR), we used MISA [39] with the size of one to six nucleotides and the thresholds of eight, four, four, three, three, and three, respectively. Additionally, all kinds of repeats that were detected from the above programs were confirmed manually to simplify the results.

4.6. Phylogenetic Tree

The evolutionary tree of twenty species (Brassica napus (NC_008285), Brassica rapa subsp. campestris (NC_016125), Brassica juncea (NC_016123), Raphanus sativus (NC_018551), Brassica carinata (NC_016120), Arabidopsis thaliana (NC_001284), Carica papaya (NC_012116), Vigna radiata (NC_015121), Citrullus lanatus (NC_014043), Cucumis sativus (NC_016005), Cucurbita pepo (NC_014050), Nicotiana tabacum (NC_006581), Silene latifolia (NC_014487), Beta vulgaris subsp. vulgaris (NC_002511), Beta vulgaris subsp. maritima (NC_015099), Oryza sativa Indica (NC_007886), Sorghum bicolor (NC_008360), Tripsacum dactyloides (NC_008362), Zea perennis (NC_008331), and Cycas taitungensis (NC_010303)) has been inferred using the Neighbor-Joining method. Fifteen genes (atp1, atp9, ccmB, cob, cox1, cox3, nad1, nad3, nad4, nad4L, nad6, nad7, nad9, rps3, and rps4) were extracted by local perl scripts among these twenty-one plants. The evolutionary analyses including aligning multiple sequence by ClustalW, algorithm of Neighbor-Joining maximum likelihood (ML), Neighbor-Joining (NJ), and maximum parsimony (MP) used for tree building were conducted in MEGA 6.0 [19]. Numbers designate bootstrap (BP) index values (%) were calculated using 1000 replicates to estimate the support of the data for each internal branch of the phylogeny.
  39 in total

1.  Small, repetitive DNAs contribute significantly to the expanded mitochondrial genome of cucumber.

Authors:  J W Lilly; M J Havey
Journal:  Genetics       Date:  2001-09       Impact factor: 4.562

Review 2.  Life at the extreme: lessons from the genome.

Authors:  Dong-Ha Oh; Maheshi Dassanayake; Hans J Bohnert; John M Cheeseman
Journal:  Genome Biol       Date:  2012       Impact factor: 13.583

3.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

4.  The genome of the extremophile crucifer Thellungiella parvula.

Authors:  Maheshi Dassanayake; Dong-Ha Oh; Jeffrey S Haas; Alvaro Hernandez; Hyewon Hong; Shahjahan Ali; Dae-Jin Yun; Ray A Bressan; Jian-Kang Zhu; Hans J Bohnert; John M Cheeseman
Journal:  Nat Genet       Date:  2011-08-07       Impact factor: 38.330

5.  The complete chloroplast genome sequence of Mahonia bealei (Berberidaceae) reveals a significant expansion of the inverted repeat and phylogenetic relationship with other angiosperms.

Authors:  Ji Ma; Bingxian Yang; Wei Zhu; Lianli Sun; Jingkui Tian; Xumin Wang
Journal:  Gene       Date:  2013-07-27       Impact factor: 3.688

6.  The complete mitochondrial genome of the Rhodeus shitaiensis (Teleostei, Cypriniformes, Acheilognathidae).

Authors:  Fan Li; Kwang-Tsao Shao; Yeong-Shin Lin; Chia-Hao Chang
Journal:  Mitochondrial DNA       Date:  2013-09-19

7.  Polymorphic simple sequence repeat regions in chloroplast genomes: applications to the population genetics of pines.

Authors:  W Powell; M Morgante; R McDevitt; G G Vendramin; J A Rafalski
Journal:  Proc Natl Acad Sci U S A       Date:  1995-08-15       Impact factor: 11.205

8.  Mitochondria and apoptosis: a quick take on a long view.

Authors:  Melissa J Parsons; Douglas R Green
Journal:  F1000 Biol Rep       Date:  2009-02-24

9.  Development of Simple Sequence Repeats (SSR) markers in Setaria italica (Poaceae) and cross-amplification in related species.

Authors:  Heng-Sheng Lin; Chih-Yun Chiang; Song-Bin Chang; Chang-Sheng Kuoh
Journal:  Int J Mol Sci       Date:  2011-11-11       Impact factor: 5.923

10.  Evolutionary dynamics of the mitochondrial genome in the evaniomorpha (hymenoptera)—a group with an intermediate rate of gene rearrangement.

Authors:  Meng Mao; Tracey Gibson; Mark Dowton
Journal:  Genome Biol Evol       Date:  2014-07       Impact factor: 3.416

View more
  3 in total

Review 1.  Halophytism: What Have We Learnt From Arabidopsis thaliana Relative Model Systems?

Authors:  Yana Kazachkova; Gil Eshel; Pramod Pantha; John M Cheeseman; Maheshi Dassanayake; Simon Barak
Journal:  Plant Physiol       Date:  2018-09-20       Impact factor: 8.340

2.  Organellar genome assembly methods and comparative analysis of horticultural plants.

Authors:  Xuelin Wang; Feng Cheng; Dekai Rohlsen; Changwei Bi; Chunyan Wang; Yiqing Xu; Suyun Wei; Qiaolin Ye; Tongming Yin; Ning Ye
Journal:  Hortic Res       Date:  2018-01-10       Impact factor: 6.793

3.  Characterization of the complete chloroplast genome of Camellia brevistyla, an oil-rich and evergreen shrub.

Authors:  Yupeng Wang; Jiyuan Li; Zhengqi Fan; Dongyang Wu; Hengfu Yin; Xinlei Li
Journal:  Mitochondrial DNA B Resour       Date:  2020-01-21       Impact factor: 0.658

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.