Wan Zhang, Yunlin Zhao, Guiyan Yang, Jiao Peng, Shuwen Chen, Zhenggang Xu
Abstract
Camellia oleifera is one of the four largest woody edible-oil plants in the world and has high ecological and medicinal value. Because of frequent interspecific hybridization, its genetics and evolutionary history have been difficult to study. This study used C. oleifera collected on Hainan Island. The unique island environment makes the quality of its tea oil higher than that of other species grown on the mainland, and long-term geographic isolation may have affected its gene structure. To better understand the molecular biology of this species, protect its excellent germplasm resources, and promote population genetic and phylogenetic studies of Camellia plants, we used high-throughput sequencing to obtain the chloroplast genome sequence of Hainan C. oleifera. The complete chloroplast genome of Hainan C. oleifera was 156,995 bp long, with a typical quadripartite structure consisting of a large single-copy (LSC) region of 86,648 bp, a small single-copy (SSC) region of 18,297 bp, and a pair of inverted repeats (IRs) of 26,025 bp each. The genome encoded 141 genes (115 unique genes), including 88 protein-coding genes, 45 tRNA genes, and eight rRNA genes. Nine genes contained one intron, two genes contained two introns, and four overlapping genes were detected. The total GC content of the Hainan C. oleifera chloroplast genome was 37.29%. Comparing its structural characteristics with those of mainland C. oleifera and eight other closely related Theaceae species showed that contraction and expansion of the IR/LSC and IR/SSC boundaries affected chloroplast genome length. The chloroplast genome sequences of these Theaceae species were highly similar, and comparative analysis indicated that Theaceae chloroplast genomes are conserved in structure and evolution.
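Because the IR is present twice in the circular genome, the reported totals can be cross-checked from the region values: total length is LSC + SSC + 2 × IR, and whole-genome GC content is the length-weighted mean of the region GC contents. A minimal Python sketch using the Hainan values from the composition table (the per-region GC percentages are rounded there, so agreement is only to about two decimal places):

```python
# Hainan C. oleifera region lengths (bp) and GC fractions, from the composition table.
# Each entry: (length, GC fraction, number of copies in the circular genome).
REGIONS = {
    "LSC": (86_648, 0.3529, 1),
    "SSC": (18_297, 0.3055, 1),
    "IR": (26_025, 0.4298, 2),  # IRa and IRb are identical copies
}


def total_length(regions):
    """Genome length = sum of region lengths, counting both IR copies."""
    return sum(length * copies for length, _, copies in regions.values())


def overall_gc(regions):
    """Whole-genome GC fraction as the length-weighted mean of region GC fractions."""
    gc_bases = sum(length * gc * copies for length, gc, copies in regions.values())
    return gc_bases / total_length(regions)


if __name__ == "__main__":
    print(total_length(REGIONS))                # 156995
    print(round(100 * overall_gc(REGIONS), 2))  # 37.29
```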
A total of 51 simple sequence repeat (SSR) loci were detected in the chloroplast genome of Hainan C. oleifera. None of the Camellia chloroplast genomes contained pentanucleotide repeats, and these SSRs could serve as useful markers in phylogenetic studies. We also detected seven long repeats. The base composition of all repeats was biased toward A/T, consistent with the codon usage bias. Codon usage and phylogenetic analyses showed that Hainan C. oleifera is closely related to C. crapnelliana. This study provides an effective genomic resource for studying the evolutionary history of the Theaceae family.
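A simplified, MISA-style sketch in Python of how perfect SSRs can be located in a sequence; it is not the pipeline used in the study. The thresholds in `MIN_REPEATS` are assumptions chosen to be consistent with the SSR table (no mononucleotide runs, at least four copies of a di- or trinucleotide motif, at least three copies of longer motifs). Table entries 11 and 16, whose lengths exceed motif length × repeat number, would not be reproduced by a perfect-repeat scan like this one.

```python
# Hypothetical minimum copy numbers per motif length (assumptions, not the authors' settings).
MIN_REPEATS = {2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True unless the motif is a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Greedy left-to-right scan for perfect SSRs; returns (start, end, motif, copies), 1-based."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(min_repeats):  # try the shortest motif first
            motif = seq[i:i + k]
            if len(motif) < k or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hits.append((i + 1, i + copies * k, motif, copies))
                i += copies * k  # resume scanning after the repeat
                break
        else:
            i += 1  # no SSR starts at this position
    return hits
```

Tallying `len(motif)` over the hits is one way to check for the absence of pentanucleotide repeats noted above.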
Keywords: Camellia oleifera; Chloroplast genome; Codon usage; Evolution pressure; Island plant; Repeat analysis; SSR
Year: 2019 PMID: 31289703 PMCID: PMC6599451 DOI: 10.7717/peerj.7210
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Gene map of the Hainan C. oleifera chloroplast genome.
The outermost colored blocks show the physical locations of genes on the chloroplast genome, with different colors representing genes of different functional classes; the inner circle shows the locations of the LSC, SSC, and IR regions. Genes drawn outside the circle are transcribed counterclockwise, and genes drawn inside are transcribed clockwise. In the inner circle, dark gray represents GC content and light gray represents AT content.
Genes present in the Hainan C. oleifera chloroplast genome.
| Group of genes | Gene names |
|---|---|
| Photosystem I | |
| Photosystem II | |
| Cytochrome b/f complex | |
| ATP synthase | |
| NADH dehydrogenase | |
| RubisCO large subunit | |
| RNA polymerase | |
| Ribosomal proteins (SSU) | |
| Ribosomal proteins (LSU) | |
| Proteins of unknown function | |
| Transfer RNAs | |
| Ribosomal RNAs | |
| Other genes | |
Notes:
One or two asterisks after a gene name indicate that the gene contains one or two introns, respectively.
The numbers in parentheses indicate the copy number of the gene.
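The notation above (one asterisk per intron, a parenthesized copy number) can be parsed mechanically to derive totals such as "141 genes (115 unique)". A minimal Python sketch; the example entries in the test are illustrative placeholders, not the table's actual contents, and the regular expression is an assumption about how entries are written.

```python
import re

# name, then zero or more '*' (one per intron), then an optional '(n)' copy number
ENTRY = re.compile(r"^(?P<name>[A-Za-z0-9-]+?)(?P<stars>\**)(?:\((?P<copies>\d+)\))?$")


def parse_gene(entry):
    """Return (gene name, intron count, copy number) for one table entry."""
    match = ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised gene entry: {entry!r}")
    return match.group("name"), len(match.group("stars")), int(match.group("copies") or 1)


def summarise(entries):
    """Total gene copies, distinct genes, and genes with one or two introns."""
    parsed = [parse_gene(e) for e in entries]
    return {
        "total_genes": sum(copies for _, _, copies in parsed),
        "distinct_genes": len({name for name, _, _ in parsed}),
        "one_intron": sum(1 for _, introns, _ in parsed if introns == 1),
        "two_introns": sum(1 for _, introns, _ in parsed if introns == 2),
    }
```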
Basic composition of the chloroplast genomes of Hainan C. oleifera and nine other Theaceae species.
| Total length (bp)/GC content | LSC length (bp)/GC content | SSC length (bp)/GC content | IR length (bp)/GC content | Number of genes | Protein-coding genes | |
|---|---|---|---|---|---|---|
| 156,995/37.29% | 86,648/35.29% | 18,297/30.55% | 26,025/42.98% | 141 | 88 | |
| 156,971/37.31% | 86,515/35.30% | 18,288/30.54% | 26,084/42.98% | 133 | 87 | |
| 157,166/37.30% | 86,719/35.32% | 18,293/30.59% | 26,077/42.96% | 133 | 91 | |
| 157,127/37.29% | 86,656/35.32% | 18,285/30.52% | 26,093/42.94% | 127 | 91 | |
| 157,103/37.31% | 86,645/35.34% | 18,276/30.54% | 26,091/42.95% | 135 | 91 | |
| 157,102/37.30% | 86,647/35.32% | 18,275/30.58% | 26,090/42.95% | 126 | 91 | |
| 157,076/37.30% | 86,649/35.33% | 18,279/30.54% | 26,074/42.95% | 127 | 91 | |
| 156,997/37.30% | 86,655/35.30% | 18,406/30.60% | 25,968/43.01% | 136 | 91 | |
| 156,903/37.32% | 86,568/35.34% | 18,203/30.63% | 26,066/42.96% | 133 | 92 | |
| 156,576/37.34% | 86,204/35.36% | 18,259/30.59% | 26,056/42.98% | 135 | 92 | |
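Each row of the table should satisfy total = LSC + SSC + 2 × IR. The Python sketch below applies that check to the values exactly as listed. It flags only the last row, whose parts sum to 156,575 bp rather than the listed 156,576 bp; the check cannot tell which of the listed values is off.

```python
# (total, LSC, SSC, IR) lengths in bp for the ten rows of the composition table, in order.
ROWS = [
    (156_995, 86_648, 18_297, 26_025),
    (156_971, 86_515, 18_288, 26_084),
    (157_166, 86_719, 18_293, 26_077),
    (157_127, 86_656, 18_285, 26_093),
    (157_103, 86_645, 18_276, 26_091),
    (157_102, 86_647, 18_275, 26_090),
    (157_076, 86_649, 18_279, 26_074),
    (156_997, 86_655, 18_406, 25_968),
    (156_903, 86_568, 18_203, 26_066),
    (156_576, 86_204, 18_259, 26_056),
]


def quadripartite_mismatches(rows):
    """Return (row index, listed total - (LSC + SSC + 2 * IR)) for rows that do not add up."""
    mismatches = []
    for index, (total, lsc, ssc, ir) in enumerate(rows):
        diff = total - (lsc + ssc + 2 * ir)
        if diff != 0:
            mismatches.append((index, diff))
    return mismatches


if __name__ == "__main__":
    print(quadripartite_mismatches(ROWS))  # [(9, 1)]
```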
Figure 2. Comparison of the LSC, SSC, and IR border positions in the chloroplast genome sequences of 10 Theaceae species.
Figure 3. Visualized alignment of the chloroplast genome sequences of 10 Theaceae species, with the Hainan C. oleifera chloroplast genome as the reference.
The x-axis shows position coordinates in the Hainan C. oleifera chloroplast genome, and the y-axis shows the sequence similarity of each sample genome to the reference genome. Arrows indicate annotated genes and their direction of transcription: blue marks protein-coding sequences (exons), green marks tRNA or rRNA genes, and red marks conserved non-coding sequences (CNS).
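Similarity profiles like this one are typically computed as percent identity in sliding windows along an alignment to the reference. The source does not state the tool or its settings, so the Python sketch below is only an illustration of the idea; the default `window` and `step` values are arbitrary assumptions.

```python
def windowed_identity(ref, other, window=100, step=25):
    """Percent identity of `other` to `ref` in sliding windows over a pairwise alignment.

    Both strings must already be aligned to the same length, with '-' for gaps.
    Returns (1-based start column, percent identity) for each window. A column counts
    as identical only if both characters match and neither is a gap. Windows that
    would run past the end are not emitted, so a short tail may be left uncovered.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    if not ref:
        return []
    results = []
    for start in range(0, max(len(ref) - window, 0) + 1, step):
        ref_win = ref[start:start + window]
        other_win = other[start:start + window]
        same = sum(1 for a, b in zip(ref_win, other_win) if a == b and a != "-")
        results.append((start + 1, 100.0 * same / len(ref_win)))
    return results
```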
Figure 4. Nucleotide sequence alignment of the matK gene and the trnH-psbA intergenic region among different C. oleifera samples.
(A) Partial nucleotide sequence of matK gene; (B) partial nucleotide sequence of trnH-psbA intergenic region.
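Comparisons like those in panels A and B come down to listing the alignment columns at which the samples differ. A minimal Python sketch, assuming the sequences are already aligned to equal length; the sample names and sequences in the test are made-up placeholders, not the matK or trnH-psbA data.

```python
def variable_sites(aligned):
    """Return 1-based alignment columns where the sequences differ (a gap counts as a state)."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same aligned length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]
```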
Simple sequence repeats in Hainan C. oleifera chloroplast genome.
| ID | Repeat unit | Repeat number | Length (bp) | Start | End | Region | Annotation |
|---|---|---|---|---|---|---|---|
| 1 | TA | 4 | 8 | 2330 | 2337 | LSC | |
| 2 | AT | 4 | 8 | 4652 | 4659 | LSC | |
| 3 | AGAT | 3 | 12 | 6696 | 6707 | LSC | |
| 4 | TC | 4 | 8 | 9184 | 9191 | LSC | |
| 5 | GTCT | 3 | 12 | 11990 | 12001 | LSC | |
| 6 | AT | 4 | 8 | 20077 | 20084 | LSC | |
| 7 | AT | 5 | 10 | 20842 | 20851 | LSC | |
| 8 | AT | 4 | 8 | 21871 | 21878 | LSC | |
| 9 | GA | 4 | 8 | 30227 | 30234 | LSC | |
| 10 | AG | 4 | 8 | 31910 | 31917 | LSC | |
| 11 | TCTT | 3 | 109 | 34002 | 34110 | LSC | |
| 12 | GA | 4 | 8 | 37363 | 37370 | LSC | |
| 13 | AT | 4 | 8 | 38208 | 38215 | LSC | |
| 14 | TTTC | 3 | 12 | 45247 | 45258 | LSC | |
| 15 | TA | 4 | 8 | 48399 | 48406 | LSC | |
| 16 | AT | 4 | 75 | 49339 | 49413 | LSC | |
| 17 | AT | 4 | 8 | 56936 | 56943 | LSC | |
| 18 | TA | 4 | 8 | 60906 | 60913 | LSC | |
| 19 | AAAT | 3 | 12 | 62713 | 62724 | LSC | |
| 20 | TC | 4 | 8 | 63432 | 63439 | LSC | |
| 21 | AT | 4 | 8 | 64367 | 64374 | LSC | |
| 22 | AT | 4 | 8 | 65980 | 65987 | LSC | |
| 23 | TTC | 4 | 12 | 70097 | 70108 | LSC | |
| 24 | TA | 4 | 8 | 70419 | 70426 | LSC | |
| 25 | AT | 4 | 8 | 79616 | 79623 | LSC | |
| 26 | TA | 4 | 8 | 80572 | 80579 | LSC | |
| 27 | AT | 5 | 10 | 84341 | 84350 | LSC | |
| 28 | AT | 4 | 8 | 85931 | 85938 | LSC | |
| 29 | TA | 5 | 10 | 87305 | 87314 | IRa | |
| 30 | GA | 4 | 8 | 88915 | 88922 | IRa | |
| 31 | GA | 4 | 8 | 89902 | 89909 | IRa | |
| 32 | TCTA | 3 | 12 | 94648 | 94659 | IRa | |
| 33 | TA | 4 | 8 | 95494 | 95501 | IRa | |
| 34 | AG | 4 | 8 | 97417 | 97424 | IRa | |
| 35 | TA | 4 | 8 | 99589 | 99596 | IRa | |
| 36 | CT | 4 | 8 | 108654 | 108661 | IRa | |
| 37 | CCCT | 3 | 12 | 110067 | 110078 | IRa | |
| 38 | AT | 4 | 8 | 116288 | 116295 | SSC | |
| 39 | GAAA | 3 | 12 | 118178 | 118189 | SSC | |
| 40 | AATA | 3 | 12 | 118328 | 118339 | SSC | |
| 41 | AAAT | 3 | 12 | 121323 | 121334 | SSC | |
| 42 | AT | 4 | 8 | 123859 | 123866 | SSC | |
| 43 | GAGG | 3 | 12 | 133565 | 133576 | IRb | |
| 44 | AG | 4 | 8 | 134983 | 134990 | IRb | |
| 45 | TA | 4 | 8 | 144048 | 144055 | IRb | |
| 46 | CT | 4 | 8 | 146220 | 146227 | IRb | |
| 47 | TA | 4 | 8 | 148143 | 148150 | IRb | |
| 48 | ATAG | 3 | 12 | 148984 | 148995 | IRb | |
| 49 | TC | 4 | 8 | 153735 | 153742 | IRb | |
| 50 | TC | 4 | 8 | 154722 | 154729 | IRb | |
| 51 | AT | 5 | 10 | 156329 | 156338 | IRb |
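The Region column follows from each SSR's start coordinate once the region boundaries are known. A Python sketch, assuming the usual LSC, IRa, SSC, IRb order with position 1 at the start of the LSC (the source does not state the coordinate origin); with the Hainan lengths it places the sample SSR starts tested below in the same regions as the table.

```python
def region_bounds(lsc, ssc, ir):
    """1-based inclusive (start, end) for LSC, IRa, SSC and IRb, laid out in that order."""
    bounds = {}
    pos = 1
    for name, length in (("LSC", lsc), ("IRa", ir), ("SSC", ssc), ("IRb", ir)):
        bounds[name] = (pos, pos + length - 1)
        pos += length
    return bounds


def classify(position, bounds):
    """Return the name of the region containing a 1-based genome position."""
    for name, (start, end) in bounds.items():
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} lies outside the genome")


# Region lengths reported for Hainan C. oleifera.
HAINAN = region_bounds(lsc=86_648, ssc=18_297, ir=26_025)
```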
Long repeat sequences in the chloroplast genome of 10 Theaceae species.
| Type | Repeat sizes (bp) | Location | Region | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| F | 56 | 56 | 56 | 56 | 56 | 48 | IRa | |||||
| P | 56 | 56 | 56 | 56 | 56 | 48 | IRa, IRb | |||||
| P | 48 | 48 | 60 | IRa, IRb | ||||||||
| F | 56 | 48 | 48 | 56 | 56 | 56 | 48 | IRb | ||||
| P | 48 | IGS ( | LSC, IRa | |||||||||
| F | 48 | IGS ( | LSC, IRb | |||||||||
| F | 47 | IGS ( | LSC, IRb | |||||||||
| P | 46 | IGS ( | LSC | |||||||||
| P | 46 | 46 | 42 | 42 | 42 | 42 | 46 | 46 | 46 | LSC | ||
| F | 42 | 42 | 41 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | IGS ( | IRa, SSC |
| P | 42 | 42 | 41 | 42 | 42 | 42 | 42 | 42 | 42 | 42 | IGS ( | IRb, SSC |
| F | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 30 | IRa | |
| P | 38 | 38 | 38 | 38 | 38 | 38 | 38 | IRa, IRb | ||||
| F | 38 | 38 | 30 | 30 | 38 | 38 | 38 | 38 | 30 | IRb | ||
| F | 38 | IGS ( | IRb | |||||||||
| P | 38 | IRb | ||||||||||
| P | 38 | IRa, IRb | ||||||||||
| F | 34 | IGS ( | IRa | |||||||||
| P | 34 | IGS ( | IRa, IRb | |||||||||
| F | 34 | IGS ( | IRb | |||||||||
| R | 32 | IGS ( | LSC | |||||||||
| R | 31 | IGS ( | LSC | |||||||||
| F | 31 | IGS ( | LSC | |||||||||
| F | 34 | IGS ( | LSC | |||||||||
| P | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | IGS ( | LSC |
| R | 30 | IGS ( | LSC | |||||||||
| F | 30 | IGS ( | LSC | |||||||||
| P | 30 | 30 | 30 | IRa, IRb | ||||||||
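The Type codes F, P and R conventionally denote forward, palindromic (reverse-complement) and reverse repeats, as in REPuter output; the source here does not name its repeat-finding tool, so this reading is an assumption. The Python sketch below shows how a second copy is classified relative to the first under that convention (C, complement, is included for completeness). It classifies two given copies and does not search the genome for them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(first, second):
    """Classify `second` relative to `first`: 'F' forward, 'P' palindromic
    (reverse complement), 'R' reverse, 'C' complement; None if unrelated.
    The first matching rule wins (a sequence equal to its own reverse
    complement is reported as 'F')."""
    if len(first) != len(second):
        return None
    if second == first:
        return "F"
    if second == first.translate(COMPLEMENT)[::-1]:
        return "P"
    if second == first[::-1]:
        return "R"
    if second == first.translate(COMPLEMENT):
        return "C"
    return None
```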
Figure 5. Codon usage for the 20 amino acids and stop codons in the chloroplast genome of the island plant Hainan C. oleifera.
The color of the histogram corresponds to the color of the codon.
Figure 6. Codon usage distributions, shown as heat maps, for 10 Theaceae species.
Red represents higher relative synonymous codon usage (RSCU) values and blue represents lower RSCU values.
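RSCU for a codon is its observed count divided by the mean count of all synonymous codons for the same amino acid, so 1 means no preference and values above 1 mean the codon is used more often than expected under equal usage. A minimal Python sketch, assuming in-frame coding sequences and the standard genetic code; any filtering of short or partial CDSs that the study may have applied is not modeled.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
# with the first base varying slowest; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count in-frame codons across coding sequences, skipping codons with non-ACGT bases."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODE:
                counts[codon] += 1
    return counts


def rscu(cds_list):
    """RSCU = observed codon count / mean count of the synonymous codons for its amino acid."""
    counts = codon_counts(cds_list)
    synonyms = {}
    for codon, aa in CODE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in synonyms.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for its codons
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values
```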
Figure 7. Phylogenetic relationships of 22 Camellia species inferred from different data partitions.
(A) Whole chloroplast genome; (B) protein-coding regions; (C) LSC region; (D) SSC region. Species with similar evolutionary relationships are shown in the same color.
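The tree-building method behind Figure 7 is not described in this excerpt. As a much simpler, distance-based stand-in (not the study's method), the Python sketch below computes uncorrected p-distances between aligned sequences of one partition and reports the closest sequence to a query. The sample names and sequences in the test are made-up placeholders.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest_neighbour(aligned, query):
    """Name of the sequence with the smallest p-distance to `aligned[query]`."""
    others = {name: seq for name, seq in aligned.items() if name != query}
    return min(others, key=lambda name: p_distance(aligned[query], others[name]))
```

Running this separately on each partition (whole genome, protein-coding regions, LSC, SSC) is one crude way to see whether different partitions support different nearest relatives.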