Abstract
Sesamum indicum is an important oilseed crop. The complete chloroplast (cp) genome of S. indicum (GenBank acc. no. JN637766) is 153,324 bp in length and has a pair of inverted repeat (IR) regions of 25,141 bp each. The lengths of the large single copy (LSC) and small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to those of typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both coding and noncoding regions. In summary, regional constraints strongly affect the sequence evolution of the cp genomes, while functional constraints affect it only weakly. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat (SSR) loci were detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so they may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci were also identified in the chloroplast genome of S. indicum, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequence of S. indicum reported in this paper is a prerequisite for modifying this important oilseed crop by cp genetic engineering techniques.
Year: 2012 PMID: 22606240 PMCID: PMC3351433 DOI: 10.1371/journal.pone.0035872
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The gene map of the Sesamum indicum cp genome.
A pair of thick lines on the inner circle represents the inverted repeats (IRa and IRb; 25,141 bp each), which separate the large single copy region (LSC; 85,170 bp) from the small single copy region (SSC; 17,872 bp). Genes drawn inside the circle are transcribed clockwise, while those drawn outside the circle are transcribed counterclockwise. Intron-containing genes are marked by asterisks. The numbers on the outermost circle indicate the locations of the 7 repeats: direct (black numbers), palindromic (blue numbers), and dispersed repeats (red numbers) (cf. Table 4).
Genes contained in the Sesamum indicum cp genome (total 114 genes).
| Category for genes | Group of genes | Name of genes |
| Self replication | rRNA genes | rrn16(×2), rrn23(×2), rrn4.5(×2), rrn5(×2) |
| | tRNA genes | 30 trn genes (6 contain an intron, 7 in the IR regions) |
| | Small subunit of ribosome | rps2, rps3, rps4, rps7(×2), rps8, rps11, rps12(*), rps14, rps15, rps16*, rps18, rps19 |
| | Large subunit of ribosome | rpl2*(×2), rpl14, rpl16*, rpl20, rpl22, rpl23(×2), rpl32, rpl33, rpl36 |
| | DNA dependent RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 |
| Genes for photosynthesis | Subunits of NADH-dehydrogenase | ndhA*, ndhB*(×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ, ycf3** |
| | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| | Subunits of cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN |
| | Subunits of ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI |
| | Large subunit of rubisco | rbcL |
| Other genes | Translational initiation factor | infA |
| | Maturase | matK |
| | Protease | clpP** |
| | Envelope membrane protein | cemA |
| | Subunit of Acetyl-CoA-carboxylase | accD |
| | c-type cytochrome synthesis gene | ccsA |
| Genes of unknown functions | Open Reading Frames (ORF, ycf) | ycf1, ycf2(×2), ycf4, ycf15(×2) |
One and two asterisks after a gene name indicate genes containing one and two introns, respectively. Genes located in the IR regions are indicated by the (×2) symbol after the gene name. The rps12 gene is divided: the 5′-rps12 is located in the LSC region and the 3′-rps12 in the IR region.
Distribution of large repeat loci in the Sesamum indicum cp genome.
| Repeat Number | Size (bp) | Repeat | Location | Repeat Unit | Region |
| | 72 | direct | CDS( | | IRb,a |
| | 44 | palindromic | | | LSC |
| | 41 | palindromic dispersed repeats | IGS( | | IR,SSC |
| | 33 | palindromic | IGS( | | LSC |
| | 30 | palindromic dispersed repeats | | | LSC |
| | 30 | palindromic dispersed repeats | CDS( | | IRb,a |
| | 26 | direct | CDS( | | IRb,a |
Repeat units larger than 22 bp are presented in this table, and their locations are shown in Figure 1. The underline represents the SSR in the CDS and the bold numbers represent the shared SSR with Olea.
Base compositions in the Sesamum indicum cp genome.
| | T(U) | C | A | G | Sequence length (bp) |
| LSC region | 32.5% | 18.6% | 31.1% | 17.8% | 85,170 |
| IRa region | 28.4% | 22.5% | 28.2% | 20.9% | 25,141 |
| IRb region | 28.2% | 20.9% | 28.4% | 22.5% | 25,141 |
| SSC region | 33.8% | 17.0% | 33.8% | 15.5% | 17,872 |
| Total | 31.3% | 19.4% | 30.5% | 18.8% | 153,324 |
| Protein coding genes (CDS) | 31.5% | 17.6% | 30.4% | 20.5% | 68,097 |
| 1st position | 23.0% | 18.8% | 30.2% | 27.5% | 22,699 |
| 2nd position | 33.0% | 20.6% | 28.8% | 18.0% | 22,699 |
| 3rd position | 38.0% | 13.6% | 32.1% | 16.1% | 22,699 |
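The region lengths and base percentages in the table above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the helper names `at_percent` and `weighted_at_percent` are assumptions made for illustration, and the numbers are copied from the table. It checks that LSC + IRa + IRb + SSC adds up to the reported genome length and that the length-weighted A+T content of the four regions agrees with the "Total" row.

```python
# Region -> (length in bp, T%, C%, A%, G%), copied from the base composition table.
REGIONS = {
    "LSC": (85170, 32.5, 18.6, 31.1, 17.8),
    "IRa": (25141, 28.4, 22.5, 28.2, 20.9),
    "IRb": (25141, 28.2, 20.9, 28.4, 22.5),
    "SSC": (17872, 33.8, 17.0, 33.8, 15.5),
}
REPORTED_TOTAL_BP = 153324  # genome length stated in the abstract


def at_percent(row):
    """A+T percentage of one region (hypothetical helper)."""
    _, t, _, a, _ = row
    return t + a


def weighted_at_percent(regions):
    """Length-weighted A+T percentage over all regions (hypothetical helper)."""
    total = sum(r[0] for r in regions.values())
    return sum(r[0] * at_percent(r) for r in regions.values()) / total


total_bp = sum(r[0] for r in REGIONS.values())
print("lengths add up:", total_bp == REPORTED_TOTAL_BP)
for name, row in REGIONS.items():
    print(name, "A+T %:", round(at_percent(row), 1))
print("weighted A+T %:", round(weighted_at_percent(REGIONS), 1))
```

The weighted value can be compared with the sum of the T(U) and A entries in the "Total" row (31.3% + 30.5%); small differences are expected because the table percentages are rounded to one decimal place.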
Figure 2. The comparison of the LSC, IR and SSC border regions among five cp genomes.
Figure 3. The comparisons of 19 intron regions of the chloroplast genomes in the three different comparisons of Sesamum vs. Olea, Sesamum vs. Nicotiana, and Sesamum vs. Panax. The Y-axis indicates the sequence divergences.
Figure 4. Small inversion mutations and associated secondary structures between the cp genomes of Sesamum (S) and the cp genome of Olea (O).
Distribution of simple sequence repeat (SSR) loci in the Sesamum indicum cp genome.
| Base | Length | No. SSRs | Coordinated Basepairs* |
| | 10 | 6 | 239−248, 4,381−4,390, 8,578−8,587, 72,464−72,473, 121,267−121,276, 135,512−135,521 |
| | 10 | 2 | 51,903−51,912, 66,525−66,534 |
| | 11 | 1 | 36,699−36,709 |
| | 10 | 11 | |
| | 11 | 1 | 81,161−81,171 |
| | 10 | 1 | |
| | 12 | 1 | 42,984−42,995 |
| | 10 | 1 | 31,905−31,914 |
| | 12 | 1 | |
| | 12 | 1 | 55,471−55,482 |
| | 12 | 1 | 23,018−23,029 |
| | 12 | 1 | |
The coordinated basepairs are nucleotide positions counted from the IRa/LSC junction (Figure 1). The underline represents the SSR in the CDS and the bold numbers represent the SSR shared with the Olea chloroplast genome.
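As an illustration of how loci like those in the SSR table could be located, the sketch below scans a sequence for mononucleotide runs and reports 1-based inclusive coordinates, the same convention as in the table (239−248 spans 10 bp). It is an assumption-laden sketch rather than the authors' procedure: the function name `find_mono_ssrs` is hypothetical, the 10 bp threshold is inferred from the shortest length listed in the table, only single-base repeats are considered, and the input is a toy string rather than the JN637766 sequence.

```python
import re


def find_mono_ssrs(seq, min_len=10):
    """Return (base, start, end) for each homopolymer run of at least
    min_len bases; coordinates are 1-based and inclusive."""
    # One captured base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start() + 1, m.end())
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence: a 10 bp A run at positions 3-12 and a 12 bp T run at 15-26.
toy = "GC" + "A" * 10 + "CG" + "T" * 12 + "GGG"
for base, start, end in find_mono_ssrs(toy):
    print(base, start, end, "length", end - start + 1)
```

Counting the hits per base would show directly how strongly A and T runs dominate, which is the observation the abstract links to the A-T richness of the genome.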
Figure 5. A maximum likelihood tree (-lnL = 428640.9970) of the asterid clade of angiosperms using whole chloroplast genome sequences.
The tree was polarized by two outgroup taxa, Spinacia and Arabidopsis. The GTR+G+I base substitution model was adopted based on Modeltest. The molecular clock was calibrated using two internal splitting points of the members of Araliaceae (70 mya) and Solanaceae (53 mya). The numbers above each node indicate the Bayesian support percentages. Taxon abbreviations are Solanum(bu): Solanum bulbocastanum, Solanum(tu): Solanum tuberosum, Solanum(ly): Solanum lycopersicum, Nicotiana(sy): Nicotiana sylvestris, Nicotiana(ta): Nicotiana tabacum, Nicotiana(to): Nicotiana tomentosiformis, Olea(eb): Olea europaea cv. bianchera, Olea(ee): Olea europaea subsp. europaea, Olea(em): Olea europaea subsp. maroccana, Olea(ec): Olea europaea subsp. cuspidata, and Olea(ew): Olea europaea subsp. woodiana.
Comparisons of protein coding genes (CDS), introns, and intergenic spacers (IGS) at the IR, LSC, and SSC regions of the chloroplast genomes.
| | | Sesamum/Olea | | | | | | | Sesamum/Nicotiana | | | | | | | Sesamum/Panax | | | | | | |
| Region | | NG | LD (IE) | NP | ND | Ks | Ka | Ka/Ks | NG | LD (IE) | NP | ND | Ks | Ka | Ka/Ks | NG | LD (IE) | NP | ND | Ks | Ka | Ka/Ks |
| CDS | LSC | 62 | 60 (22) | 1417 | 0.0323 | 0.0815 | 0.0189 | 0.23 | 62 | -306 (22) | 2471 | 0.0564 | 0.1686 | 0.0280 | 0.17 | 62 | -21 (21) | 2711 | 0.0617 | 0.1511 | 0.0404 | 0.27 |
| | IR | 12 | -536 (17) | 127 | 0.0091 | 0.0113 | 0.0082 | 0.73 | 12 | -659 (17) | 212 | 0.0152 | 0.0270 | 0.0170 | 0.63 | 12 | -187 (17) | 233 | 0.0167 | 0.0296 | 0.0182 | 0.61 |
| | SSC | 12 | -213 (30) | 1004 | 0.0706 | 0.1398 | 0.0558 | 0.40 | 12 | -393 (30) | 1599 | 0.1125 | 0.2534 | 0.0878 | 0.35 | 12 | -330 (30) | 1699 | 0.1198 | 0.2447 | 0.1010 | 0.41 |
| | TOTAL | 86 | -689 (69) | 2548 | 0.0353 | 0.0660 | 0.0276 | 0.42 | 86 | -1358 (69) | 4282 | 0.0595 | 0.1630 | 0.0388 | 0.24 | 86 | -538 (68) | 4643 | 0.0644 | 0.1502 | 0.0497 | 0.33 |
| Intron | LSC | 13 | -8 | 492 | 0.0546 | - | - | - | 13 | -309 | 836 | 0.0939 | - | - | - | 13 | -288 | 922 | 0.1035 | - | - | - |
| | IR | 5 | -4 | 22 | 0.0061 | - | - | - | 5 | 347 | 35 | 0.0106 | - | - | - | 5 | 17 | 54 | 0.0149 | - | - | - |
| | SSC | 1 | -18 | 92 | 0.0868 | - | - | - | 1 | -68 | 141 | 0.1326 | - | - | - | 1 | 57 | 128 | 0.1292 | - | - | - |
| | TOTAL | 19 | -30 | 606 | 0.0442 | - | - | - | 19 | -30 | 1012 | 0.0763 | - | - | - | 19 | -214 | 1104 | 0.0817 | - | - | - |
| IGS | LSC | 81 | -1425 | 2416 | 0.0826 | - | - | - | 81 | -675 | 3982 | 0.1462 | - | - | - | 82 | -221 | 4074 | 0.1494 | - | - | - |
| | IR | 19 | -117 | 106 | 0.0202 | - | - | - | 19 | 99 | 180 | 0.0337 | - | - | - | 19 | -101 | 212 | 0.0399 | - | - | - |
| | SSC | 12 | 126 | 105 | 0.1401 | - | - | - | 12 | -397 | 610 | 0.2039 | - | - | - | 12 | -627 | 571 | 0.2082 | - | - | - |
| | TOTAL | 112 | -1416 | 326 | 0.0760 | - | - | - | 112 | -973 | 4772 | 0.1343 | - | - | - | 113 | -949 | 4857 | 0.1379 | - | - | - |
| TOTAL | | 217 | -2135 | 3480 | 0.0487 | - | - | - | 217 | -2361 | 10066 | 0.0901 | - | - | - | 218 | -1701 | 10604 | 0.0946 | - | - | - |
This table summarizes the calculations from three different comparisons: Sesamum vs. Olea, Sesamum vs. Nicotiana, and Sesamum vs. Panax. The rps12 gene is included in the LSC region. Abbreviations: NG, the number of genes; LD, the length difference; IE, the indel events; NP, the number of polymorphic sites; ND, the nucleotide differences; Ks, the synonymous substitution differences; and Ka, the nonsynonymous substitution differences.
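The Ka/Ks column of the CDS rows can be recomputed from the Ks and Ka columns. The sketch below is a hedged illustration, not the authors' pipeline: the dictionary `CDS` and the helper `ka_ks` are hypothetical names, and the values are copied from the table above. It divides Ka by Ks for each region and comparison, rounds to two decimals, and collects any entry that disagrees with the printed ratio.

```python
# comparison -> region -> (Ks, Ka, Ka/Ks as printed), copied from the CDS rows.
CDS = {
    "Sesamum/Olea": {
        "LSC": (0.0815, 0.0189, 0.23),
        "IR": (0.0113, 0.0082, 0.73),
        "SSC": (0.1398, 0.0558, 0.40),
        "TOTAL": (0.0660, 0.0276, 0.42),
    },
    "Sesamum/Nicotiana": {
        "LSC": (0.1686, 0.0280, 0.17),
        "IR": (0.0270, 0.0170, 0.63),
        "SSC": (0.2534, 0.0878, 0.35),
        "TOTAL": (0.1630, 0.0388, 0.24),
    },
    "Sesamum/Panax": {
        "LSC": (0.1511, 0.0404, 0.27),
        "IR": (0.0296, 0.0182, 0.61),
        "SSC": (0.2447, 0.1010, 0.41),
        "TOTAL": (0.1502, 0.0497, 0.33),
    },
}


def ka_ks(ks, ka):
    """Ratio of nonsynonymous to synonymous substitution differences."""
    return ka / ks


# Collect every table entry whose recomputed ratio differs from the printed one.
mismatches = [
    (comparison, region)
    for comparison, rows in CDS.items()
    for region, (ks, ka, printed) in rows.items()
    if round(ka_ks(ks, ka), 2) != printed
]
print("entries that disagree with the table:", mismatches)

# Region with the lowest Ks in each comparison.
for comparison, rows in CDS.items():
    regions = {k: v for k, v in rows.items() if k != "TOTAL"}
    print(comparison, "lowest Ks:", min(regions, key=lambda r: regions[r][0]))
```

In every comparison the ratios are below 1, which is conventionally read as purifying selection. The IR shows the lowest Ks and Ka values but the highest Ka/Ks ratio, so the slow evolution of the IR reflects a low overall substitution rate rather than a lower ratio of nonsynonymous to synonymous change.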
Figure 6. The levels of evolutionary divergences among the SSC, LSC, and IR regions of cp genomes.
The Y-axis represents the sequence divergences. The IR region evolves more slowly than the SSC or LSC regions, regardless of whether CDS, intron, or IGS sequences are compared.
Figure 7. The correlation pattern of indel numbers and indel sizes among three cp genomes.
The X-axis and Y-axis represent the indel sizes in base pair and indel numbers, respectively.