Ze Li, Hongxu Long, Lin Zhang, Zhiming Liu, Heping Cao, Mingwang Shi, Xiaofeng Tan.
Abstract
Tung tree (Vernicia fordii) is an economically important tree widely cultivated in China for industrial oil production. To better understand the molecular basis of tung tree chloroplasts, we sequenced and characterized the chloroplast genome using the PacBio RS II sequencing platform. The chloroplast genome is 161,528 bp in length and comprises a pair of inverted repeats (IRs) of 26,819 bp each, separated by a small single-copy region (SSC; 18,758 bp) and a large single-copy region (LSC; 89,132 bp). The genome contains 114 unique genes, coding for 81 proteins, four ribosomal RNAs and 29 transfer RNAs. An expansion of the IR regions that integrates an additional copy of the rps19 gene was identified. Compared with the chloroplast genome of Jatropha curcas, a species from the same family, the tung tree chloroplast genome differs by 85 single nucleotide polymorphisms (SNPs) and 82 indels. Phylogenetic analysis suggests that V. fordii is sister to J. curcas within the Eurosids I. The nucleotide sequence provides vital molecular information for understanding the biology of this important oil tree.
Year: 2017 PMID: 28500291 PMCID: PMC5431841 DOI: 10.1038/s41598-017-02076-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Gene map of the tung tree chloroplast genome sequenced on the PacBio RS II platform. The thick lines indicate the inverted repeats (IRa and IRb), which separate the genome into large single-copy (LSC) and small single-copy (SSC) regions. Genes shown inside the circle are transcribed clockwise, and those shown outside the circle are transcribed counter-clockwise.
Characteristics of the tung tree plastome.
| Feature | Length in bp (percent) or count |
|---|---|
| Total cp genome | 161,528 (100.00) |
| LSC | 89,132 (55.18) |
| SSC | 18,758 (11.61) |
| IR | 26,819 (16.60) |
| Coding regions | 91,388 (57.20) |
| Protein-coding regions | 82,034 (50.79) |
| Introns | 17,821 (11.03) |
| rRNA | 9,048 (5.60) |
| tRNA | 2,742 (1.70) |
| IGS | 52,599 (32.56) |
| Overall GC size | 58,188 (36.02) |
| Overall A size | 52,378 (32.43) |
| Overall T size | 50,962 (31.55) |
| Overall G size | 29,615 (18.33) |
| Overall C size | 28,573 (17.69) |
| GC content in protein-coding regions | 30,780 (37.52) |
| GC content in IGS | 15,394 (29.27) |
| GC content in introns | 6,595 (37.01) |
| GC content in tRNA | 2,742 (53.17) |
| GC content in rRNA | 5,014 (55.42) |
| Total genes | 135 |
| Protein-coding genes | 81 |
| rRNA genes | 4 |
| tRNA genes | 29 |
| Genes with introns | 16 |
| Genes duplicated by IR | 21 |
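The base composition rows in the table above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the table, while the variable names and the `percent` helper are assumptions introduced here and are not part of the original analysis.

```python
# Base counts and genome length copied from the table above.
base_counts = {"A": 52378, "T": 50962, "G": 29615, "C": 28573}
genome_length = 161528


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# The four base counts should add up to the full genome length.
total_bases = sum(base_counts.values())

# The GC count is the sum of the G and C counts.
gc_count = base_counts["G"] + base_counts["C"]
gc_percent = percent(gc_count, genome_length)

print(total_bases, gc_count, gc_percent)
```

With the tabulated values, the four base counts add up to the reported genome length, and the GC count and percentage match the "Overall GC size" row.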
Genes located in the tung tree cp genome.
| Gene categories | Groups of genes | Name of genes |
|---|---|---|
| Genes for photosynthesis | Subunits of photosystem I | |
| | Subunits of photosystem II | |
| | Subunits of ATP synthase | |
| | Subunits of cytochrome b/f complex | |
| | Subunits of NADH-dehydrogenase | |
| | Large subunit of RuBisco | |
| Self replication | Ribosomal RNAs | |
| | Transfer RNAs | |
| | Proteins of small ribosomal subunit | |
| | Proteins of large ribosomal subunit | |
| | Subunits of RNA polymerase | |
| Other genes | Acetyl-CoA carboxylase | |
| | Cytochrome c biogenesis | |
| | Envelope membrane protein | |
| | Maturase | |
| | Protease | |
| | Translation initiation factor | |
| Unknown | Conserved hypothetical chloroplast reading frames | |
a: Genes located in the IR regions.
b: Genes having introns.
Comparison of general features of Euphorbiaceae plastid genomes.
| Genome feature | V. fordii | | | |
|---|---|---|---|---|
| Total length (bp) | 161528 | 163856 | 161191 | 161453 |
| LSC length (bp) | 89132 | 91756 | 89209 | 89295 |
| SSC length (bp) | 18758 | 17852 | 18362 | 18250 |
| IR length (bp) | 26819 | 27124 | 26810 | 26954 |
| GC content (%) | 36.02 | 35.36 | 35.74 | 35.87 |
| Total genes | 135 | 130 | 128 | 128 |
| Genes duplicated in IR | 21 | 17 | 19 | 16 |
| rRNA gene duplicated in IR | 4 | 4 | 4 | 4 |
| Protein gene | 81 | 78 | 78 | 79 |
| tRNA gene | 29 | 28 | 30 | 30 |
| rRNA gene | 4 | 4 | 4 | 4 |
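A chloroplast genome with the quadripartite structure described in the abstract has a total length equal to LSC + SSC + 2 × IR, because the inverted repeat occurs twice. The sketch below applies that check to the four columns of the table above. The dictionary keys and the function name are hypothetical labels introduced here; only the first column is identified as V. fordii, on the basis of its matching lengths.

```python
# Region lengths (bp) copied from the comparison table, one entry per column.
genomes = {
    "column 1 (V. fordii)": {"total": 161528, "lsc": 89132, "ssc": 18758, "ir": 26819},
    "column 2": {"total": 163856, "lsc": 91756, "ssc": 17852, "ir": 27124},
    "column 3": {"total": 161191, "lsc": 89209, "ssc": 18362, "ir": 26810},
    "column 4": {"total": 161453, "lsc": 89295, "ssc": 18250, "ir": 26954},
}


def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Collect any column whose regions do not add up to the reported total.
mismatches = [
    name
    for name, g in genomes.items()
    if quadripartite_length(g["lsc"], g["ssc"], g["ir"]) != g["total"]
]

print(mismatches)  # prints [] when every column is internally consistent
```

For the values tabulated above, the list is empty, so all four reported totals are consistent with their region lengths.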
Repeat sequences in the tung tree cp genome.
| No. | Length (bp) | Repeat type | Repeat 1 start position | Repeat 2 start position | Repeat 1 location | Repeat 2 location |
|---|---|---|---|---|---|---|
| 1 | 22 | C | 48854 | 78423 | trnfM-CAU_trnS-UGA | trnfM-CAU_trnS-UGA |
| 2 | 26 | C | 104466 | 146168 | ycf15_trnV-GAC | ycf15_trnV-GAC |
| 3 | 21 | F | 7396 | 146181 | rpoA_psbN | rpoA_psbN |
| 4 | 24 | F | 25372 | 25405 | psbJ_atpB | psbJ_atpB |
| 5 | 67 | F | 33717 | 33766 | trnV-UAC_ndhC | trnV-UAC_ndhC |
| 6 | 29 | F | 33738 | 33833 | trnV-UAC_ndhC | trnV-UAC_ndhC |
| 7 | 29 | F | 33787 | 33833 | trnV-UAC_ndhC | trnV-UAC_ndhC |
| 8 | 35 | F | 43695 | 45919 | psaA | psaA |
| 9 | 53 | F | 53710 | 75960 | trnS-UGA_trnE-UUC | trnS-UGA_trnE-UUC |
| 10 | 23 | F | 55290 | 55312 | trnS-UGA_trnE-UUC | trnS-UGA_trnE-UUC |
| 11 | 23 | F | 56751 | 56772 | trnD-GUC_psbM | trnD-GUC_psbM |
| 12 | 30 | F | 78078 | 78104 | atpA_trnS-GCU | atpA_trnS-GCU |
| 13 | 26 | F | 78159 | 78176 | atpA_trnS-GCU | atpA_trnS-GCU |
| 14 | 34 | F | 78333 | 78353 | atpA_trnS-GCU | atpA_trnS-GCU |
| 15 | 25 | F | 83781 | 83806 | rps16_trnK-UUU | rps16_trnK-UUU |
| 16 | 22 | F | 83784 | 83831 | rps16_trnK-UUU | rps16_trnK-UUU |
| 17 | 23 | F | 83809 | 83831 | rps16_trnK-UUU | rps16_trnK-UUU |
| 18 | 62 | F | 96620 | 96656 | ycf2 | ycf2 |
| 19 | 26 | F | 96620 | 96692 | ycf2 | ycf2 |
| 20 | 26 | F | 117126 | 117141 | ycf1 | ycf1 |
| 21 | 25 | F | 134591 | 134653 | ndhF_trnN-GUU | ndhF_trnN-GUU |
| 22 | 62 | F | 153942 | 153978 | trnL-CAA_trnI-CAU | trnL-CAA_trnI-CAU |
| 23 | 26 | F | 153942 | 154014 | trnL-CAA_trnI-CAU | trnL-CAA_trnI-CAU |
| 24 | 21 | P | 7396 | 104458 | rpoA_psbN | rpoA_psbN |
| 25 | 24 | P | 22678 | 22678 | psbJ_atpB | psbJ_atpB |
| 26 | 22 | P | 33682 | 33682 | trnV-UAC_ndhC | trnV-UAC_ndhC |
| 27 | 29 | P | 39744 | 80476 | rps4_ycf3 | rps4_ycf3 |
| 28 | 21 | P | 39749 | 49846 | rps4_ycf3 | rps4_ycf3 |
| 29 | 26 | P | 48023 | 48023 | trnfM-CAU_trnS-UGA | trnfM-CAU_trnS-UGA |
| 30 | 22 | P | 48862 | 79778 | trnfM-CAU_trnS-UGA | trnfM-CAU_trnS-UGA |
| 31 | 52 | P | 57302 | 57302 | psbM_rpoB | psbM_rpoB |
| 32 | 22 | P | 74732 | 74732 | atpH_atpF | atpH_atpF |
| 33 | 58 | P | 89003 | 89003 | trnH-GUG_ycf2 | trnH-GUG_ycf2 |
| 34 | 62 | P | 96620 | 153942 | ycf2 | ycf2 |
| 35 | 26 | P | 96620 | 153942 | ycf2 | ycf2 |
| 36 | 62 | P | 96656 | 153978 | ycf2 | ycf2 |
| 37 | 26 | P | 96692 | 154014 | ycf2 | ycf2 |
| 38 | 22 | P | 117973 | 117973 | ycf1 | ycf1 |
| 39 | 28 | P | 131410 | 131410 | ndhD_ndhF | ndhD_ndhF |
| 40 | 23 | R | 4544 | 4544 | rpl36_rps11 | rpl36_rps11 |
| 41 | 21 | R | 19798 | 19798 | trnW-CCA_psbE | trnW-CCA_psbE |
| 42 | 21 | R | 22672 | 22672 | psbJ_atpB | psbJ_atpB |
| 43 | 24 | R | 48637 | 48637 | trnfM-CAU_trnS-UGA | trnfM-CAU_trnS-UGA |
| 44 | 26 | R | 65276 | 65276 | rpoC1 | rpoC1 |
| 45 | 22 | R | 78009 | 78009 | atpA_trnS-GCU | atpA_trnS-GCU |
| 46 | 22 | R | 78029 | 78061 | atpA_trnS-GCU | atpA_trnS-GCU |
| 47 | 31 | R | 78030 | 78030 | atpA_trnS-GCU | atpA_trnS-GCU |
| 48 | 26 | R | 104466 | 104466 | ycf15_trnV-GAC | ycf15_trnV-GAC |
| 49 | 26 | R | 146168 | 146168 | trnN-GUU_rps7 | trnN-GUU_rps7 |
C: complement repeats, F: forward repeats, P: palindrome repeats, R: reverse repeats.
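The four repeat types in the legend can be illustrated with a short sketch based on their conventional definitions: a forward repeat is the same sequence again, a reverse repeat is the sequence read backwards, a complement repeat is its base-wise complement, and a palindromic repeat is its reverse complement. The function name and the toy sequences are assumptions made for illustration. The sketch uses exact matching only and is not the repeat-finding procedure used for the table, which the text here does not describe.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify how `second` relates to `first`.

    Returns "F" (forward), "R" (reverse), "C" (complement),
    "P" (palindromic, i.e. reverse complement), or None if no
    exact relationship holds.
    """
    if second == first:
        return "F"
    if second == first[::-1]:
        return "R"
    complemented = first.translate(COMPLEMENT)
    if second == complemented:
        return "C"
    if second == complemented[::-1]:
        return "P"
    return None


# Toy example: the four relatives of the sequence AACGT.
for candidate in ["AACGT", "TGCAA", "TTGCA", "ACGTT"]:
    print(candidate, classify_repeat("AACGT", candidate))
```

Exact matching keeps the example short; a practical repeat search would typically also tolerate a limited number of mismatches.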
Figure 2. Variation analysis of intergenic spacer (IGS) regions between V. fordii and J. curcas or M. esculenta.
Figure 3. Comparison of the border regions of LSC, IR and SSC among six chloroplast genomes of basal eudicots.
Figure 4. The maximum parsimony (MP) phylogenetic tree based on 36 protein-coding genes in the chloroplast genome. The number at each node is the bootstrap support from 1000 replicates.