Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Quercus acutissima.

Xuan Li1, Yongfu Li2, Mingyue Zang3, Mingzhi Li4, Yanming Fang5.   

Abstract

Quercus acutissima, an important endemic and ecological plant of the Quercus genus, is widely distributed throughout China. However, there have been few studies on its chloroplast genome. In this study, the complete chloroplast (cp) genome of Q. acutissima was sequenced, analyzed, and compared to those of five other species in the Fagaceae family. The size of the Q. acutissima chloroplast genome is 161,124 bp, including one large single copy (LSC) region of 90,423 bp and one small single copy (SSC) region of 19,068 bp, separated by two inverted repeat (IR) regions totaling 51,632 bp. The GC content of the whole genome is 36.08%, while those of LSC, SSC, and IR are 34.62%, 30.84%, and 42.78%, respectively. The Q. acutissima chloroplast genome encodes 136 genes, including 88 protein-coding genes, eight ribosomal RNA genes, and 40 transfer RNA genes. In the repeat structure analysis, 31 forward and 22 inverted long repeats and 65 simple-sequence repeat loci were detected in the Q. acutissima cp genome. The abundance of simple-sequence repeat loci in the genome suggests the potential for future population genetic work. The genome comparison revealed that the LSC region is more divergent than the SSC and IR regions, and that divergence is higher in noncoding regions than in coding regions. Phylogenetic analysis of 25 species indicated that members of the Quercus genus do not form a clade and that Q. acutissima is closely related to Q. variabilis. This study identified the unique characteristics of the Q. acutissima cp genome, which will provide a theoretical basis for species identification and biological research.

Keywords:  Quercus; chloroplast genome; phylogenetic relationship

Year:  2018        PMID: 30126202      PMCID: PMC6121628          DOI: 10.3390/ijms19082443

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

Oak trees provide humans with materials used in food, clothing, and housing, while oak forests supply living organisms with comfortable habitats, good air, and sufficient and pure moisture. Oak trees are linked to Chinese culture, and are also often called eucalyptus or pecking trees. In China, eucalyptus is regarded as a mysterious tree, growing silently, watching its ancestors forge ahead, and passing from generation to generation. Many countries regard oaks as sacred trees, and consider them to be magical and a symbol of longevity, strength, and pride. The genus Quercus L. (oak) contains more than 400 species that are widespread in the northern hemisphere [1]. These species play important roles in China’s forest ecosystem. The taxonomy, genetic structure, and breeding of Quercus are complicated by its wide variety of species, diverse forms, complex habitat conditions, and gene exchange between species.

Many studies have used nuclear simple sequence repeat (SSR) and chloroplast DNA markers to study phylogeny and population variation [2,3]. Previous studies found a conflict (inconsistency) between phylogenies based on plastid data and those based on nuclear data in Senecioneae and Neotropical Catasetinae [4,5]. Therefore, it is not sufficient to study Quercus simply by using plastid regions. With the rapid development of next-generation sequencing, genome acquisition is now cheaper and faster than with traditional Sanger sequencing. Complete chloroplast (cp) genome data will be needed to infer the phylogenetic relationships of Quercus or Fagaceae in future studies. The genus is characterized by a high variability of morphological and ecological traits, the occurrence of mixed stands, the presence of large population sizes, and high levels of gene flow within the Quercus complex [6,7,8,9,10,11]. A new classification of Quercus L. was proposed by Denk with eight sections: Cyclobalanopsis, Cerris, Ilex, Lobatae, Quercus, Ponticae, Protobalanus, and Virentes [12]. In China, Quercus is divided into five morphology-based sections: Quercus, Aegilops, Heterobalanus, Engleriana, and Echinolepides [13,14,15]. Due to incomplete sampling, the use of markers with insufficient phylogenetic signal, and complex evolutionary problems, the relationships among Quercus species are not fully understood.

Q. acutissima is an ecologically and economically important tree species in deciduous broad-leaved forests in the temperate zone of East Asia, widely distributed on the Hu Huanyong line or in Southeast China (latitude from 18° to 41° N and longitude from 91° to 123° E) [16]. This line, from Heilongjiang Province to Tengchong, Yunnan Province, runs roughly straight at an inclination of about 45°. The development, origin, and reproduction of China are linked with Q. acutissima. Therefore, we need to protect, cultivate, and utilize Q. acutissima, and the species has received substantial attention in phylogeny and biogeography studies. Most previous studies have focused on its population structure [17], breeding [18], forest management [19], and physiology [20]. Studies on the genetic variation of Q. acutissima using simple sequence repeat (SSR) and cpDNA markers have been carried out in China and South Korea [16,21]. According to this research, the distribution of Q. acutissima often overlaps with that of other oak trees, i.e., Q. variabilis and Q. chenii [22]. A variety of species is often found in a population, although this has usually been determined from a comparison of morphology rather than at the molecular level. Therefore, an analysis of the complete cp genome of Q. acutissima will help to identify the species further.

In the present study, we constructed the whole chloroplast genome of Q. acutissima by using next-generation sequencing and applying a combination of de novo and reference-guided assembly. Here, we describe the whole chloroplast genome sequence of Q. acutissima and characterize its long repeats and simple sequence repeats (SSRs). We compare the chloroplast genome of Q. acutissima with the chloroplast genomes of other members of Fagaceae. It is expected that the results will provide a theoretical basis for the determination of phylogenetic status and for future scientific research.

2. Results and Discussion

2.1. Features of Q. acutissima cpDNA

A total of 63 million paired-end reads were produced, yielding 9.82 Gb of clean data. Data from all of the reads were deposited in the NCBI Sequence Read Archive (SRA) under accession number MH607377. The complete cp genome is 161,124 bp in size (Figure 1). It displays a typical quadripartite structure, with a pair of IRs (25,816 bp each) separated by the large single copy (LSC; 90,423 bp) and small single copy (SSC; 19,069 bp) regions (Figure 1 and Table 1). The DNA G + C contents of the LSC, SSC, and IR regions and of the whole genome are 34.62, 30.84, 42.78, and 36.08 mol %, respectively, which is similar to the chloroplast genomes of other Quercus species (Figure A1; Table 2). DNA G + C content is a very important indicator of species affinity [23]. The G + C content of the IR regions is higher than that of the LSC and SSC regions, a pattern that is very common in other plants [23,24]. GC skew has been shown to be an indicator of DNA leading strands, lagging strands, replication origins, and replication termini [25,26,27].
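
The quadripartite layout implies that the genome size should equal LSC + SSC + 2 × IR. The following is a minimal sketch of that check; the function and variable names are illustrative and not taken from the study. With the SSC length of 19,069 bp given in this section the sum closes exactly, whereas the 19,068 bp listed in Table 1 leaves it 1 bp short.

```python
# Consistency check for a quadripartite chloroplast genome.
# Lengths (bp) are the values reported in Section 2.1; names are illustrative.
GENOME_BP = 161_124
LSC_BP = 90_423
SSC_BP = 19_069      # Table 1 lists 19,068 instead
IR_BP = 25_816       # length of one inverted repeat copy


def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Return LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_total(LSC_BP, SSC_BP, IR_BP)
print(total, total == GENOME_BP)
print(quadripartite_total(LSC_BP, 19_068, IR_BP) - GENOME_BP)
```

A check of this kind is a quick way to spot transcription slips between the text and the summary tables.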
Figure 1

Chloroplast genome map of Q. acutissima. Genes inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes of different functions are color-coded. The darker gray in the inner circle shows the GC content, while the lighter gray shows the AT content.

Table 1

Summary of chloroplast genome features of six Fagaceae species.

Genome Features | Q. acutissima | Q. variabilis | Q. dolicholepis | C. mollissima | L. balansae | F. engleriana
Genome size (bp) | 161,124 | 161,077 | 161,237 | 160,799 | 161,020 | 158,346
LSC length (bp) | 90,423 | 90,387 | 90,461 | 90,432 | 90,596 | 87,667
SSC length (bp) | 19,068 | 19,056 | 19,048 | 18,995 | 19,160 | 18,895
IR length (bp) | 51,632 | 51,634 | 51,728 | 51,372 | 51,264 | 51,784
Number of genes | 136 | 134 | 134 | 130 | 134 | 131
Number of protein-coding genes | 88 | 86 | 86 | 83 | 87 | 83
Number of tRNA genes | 40 | 40 | 40 | 37 | 39 | 40
Number of rRNA genes | 8 | 8 | 8 | 8 | 8 | 8

Figure A1

BLAST result of the chloroplast genome and the GC skew of Q. acutissima. BLAST 1 represents L. balansae; BLAST 2 represents Q. variabilis; BLAST 3 represents Q. dolicholepis.

Table 2

Base composition of the Q. acutissima chloroplast genome.

Region | A (%) | T (U) (%) | C (%) | G (%) | A + T (%) | G + C (%)
LSC | 31.99 | 33.4 | 17.74 | 16.88 | 65.39 | 34.62
SSC | 34.46 | 34.71 | 16.24 | 14.6 | 69.17 | 30.84
IR | 28.61 | 28.61 | 21.39 | 21.39 | 57.22 | 42.78
Total | 31.69 | 32.24 | 18.46 | 17.62 | 63.93 | 36.08

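
A base composition table of this kind can be derived directly from a sequence. The sketch below is only an illustration under simple assumptions: the function name is hypothetical, the input is a plain nucleotide string, and ambiguous characters such as N are ignored rather than apportioned.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return percentages of A, T, C, G, A+T and G+C in a nucleotide string.

    Characters other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    pct = {base: 100.0 * counts[base] / total for base in "ATCG"}
    pct["A+T"] = pct["A"] + pct["T"]
    pct["G+C"] = pct["G"] + pct["C"]
    return {key: round(value, 2) for key, value in pct.items()}


# Toy sequence, not real data: 3 A, 3 T, 2 G, 2 C
print(base_composition("AAATTTGGCC"))
```

Applied separately to the LSC, SSC, and IR segments of an assembled genome, the same function would give one row per region.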
Plant chloroplast genomes may have 63–209 genes, but most are concentrated between 110 and 130, with a highly conserved composition and arrangement, including photosynthetic genes, chloroplast transcriptional expression-related genes, and some other protein-coding genes [28]. In the Q. acutissima chloroplast genome, 136 functional genes were predicted and divided into six groups, including eight rRNA genes, 40 tRNA genes, and 88 protein-coding genes (Table 1 and Table 3). In addition, 14 tRNA genes, eight rRNA genes, and 15 protein-coding genes are duplicated in the IR regions (Figure 1). The LSC region includes 62 protein-coding and 25 tRNA genes, while the SSC region includes 13 protein-coding genes (Table A1).
Table 3

List of genes annotated in the cp genomes of Q. acutissima sequenced in this study.

Function | Genes
RNAs, transfer | trnH-GUG, trnK-UUU, trnQ-UUG, trnS-GCU, trnG-GCC, trnR-UCU, trnC-GCA, trnD-GUC, trnY-GUA, trnE-UUC, trnT-GGU, trnM-CAU, trnS-UGA, trnG-GCC, trnfM-CAU, trnS-GGA, trnT-UGU, trnL-UAA, trnF-GAA, trnV-UAC, trnM-CAU, trnT-GGU, trnW-CCA, trnP-UGG, trnP-GGG, trnI-CAU *, trnL-CAA *, trnV-GAC, trnI-GAU *, trnA-UGC, trnR-ACG, trnN-GUU, trnL-UAG, trnN-GUU, trnR-ACG, trnA-UGC, trnV-GAC
RNAs, ribosomal | rrn23 *, rrn16 *, rrn5 *, rrn4.5 *
Transcription and splicing | rpoC1 *, rpoC2, rpoA, rpoB
Translation, ribosomal proteins |
Small subunit | rps2, rps3, rps4, rps7, rps8, rps11, rps12 **, rps14, rps15, rps16 *, rps18, rps19
Large subunit | rpl2 *, rpl14, rpl16 *, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
Photosynthesis |
ATP synthase | atpE, atpB, atpA, atpF *, atpH, atpI
Photosystem I | psaI, psaB, psaA, psaC, psaJ, ycf3 *, ycf4
Photosystem II | psbD, psbC, psbZ, psbT, psbH, psbK, psbI, psbJ, psbF, psbE, psbM, psbN, psbL, psbA, psbB
Calvin cycle | rbcL
Cytochrome complex | petN, petA, petL, petG, petB *, petD *
NADH dehydrogenase | ndhB *, ndhI, ndhK, ndhC, ndhF, ndhD, ndhG, ndhE, ndhA, ndhH, ndhJ
Others | infA, ycf15 *, ycf1 *, ycf2 *, accD, cemA, ccsA, clpP **

* Genes containing one intron; ** genes containing two introns.

Table A1

The number of genes in the Q. acutissima cp genome.

Region | Number of CDS | Number of tRNA | Number of rRNA | Total
LSC region | 62 | 25 | 0 | 87
SSC region | 13 | 1 | 0 | 14
IRA region | 6 | 7 | 4 | 17
IRB region | 7 | 7 | 4 | 18

Based on the protein-coding sequences and tRNA genes, the frequency of codon usage was estimated for the Q. acutissima cp genome and is summarized in Table A2. In total, all genes are encoded by 6311 codons. Among these, leucine, with 2824 (44.4%) codons, is the most frequent amino acid in the cp genome, and cysteine, with 293 (1.1%), is the least frequent (Table A2). A- and U-ending codons are common. The most preferred synonymous codons (relative synonymous codon usage (RSCU) values > 1) end with A or U [23,29].
Table A2

Codon-anticodon recognition patterns and codon usage of the Q. acutissima chloroplast genome.

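Amino Acid | Codon | No. | RSCU | tRNA
Ala | GCG | 164 | 0.47 |
Ala | GCC | 224 | 0.64 |
Ala | GCU | 630 | 1.79 |
Ala | GCA | 388 | 1.1 |
Cys | UGU | 221 | 1.44 |
Cys | UGC | 86 | 0.56 | trnC-GCA
Asp | GAC | 209 | 0.39 | trnD-GTC
Asp | GAU | 870 | 1.61 |
Glu | GAA | 1064 | 1.5 | trnE-TTC
Glu | GAG | 357 | 0.5 |
Phe | UUU | 983 | 1.3 |
Phe | UUC | 535 | 0.7 | trnF-GAA
Gly | GGU | 580 | 1.27 |
Gly | GGG | 330 | 0.72 |
Gly | GGA | 706 | 1.55 |
Gly | GGC | 206 | 0.45 | trnG-GCC
His | CAU | 486 | 1.54 |
His | CAC | 145 | 0.46 | trnH-GTG
Ile | AUC | 458 | 0.58 |
Ile | AUA | 758 | 0.97 |
Ile | AUU | 1139 | 1.45 |
Lys | AAG | 379 | 0.5 |
Lys | AAA | 1062 | 1.4 |
Leu | UUG | 572 | 1.22 | trnL-CAA
Leu | UUA | 894 | 1.9 |
Leu | CUU | 583 | 1.24 |
Leu | CUA | 373 | 0.79 | trnL-TAG
Leu | CUC | 204 | 0.43 |
Leu | CUG | 198 | 0.42 |
Met | AUG | 620 | 1 | trnI-CAT
Asn | AAU | 1004 | 1.5 |
Asn | AAC | 304 | 0.46 |
Pro | CCA | 313 | 1.13 | trnP-TGG
Pro | CCC | 226 | 0.82 |
Pro | CCU | 409 | 1.48 |
Pro | CCG | 161 | 0.58 |
Gln | CAG | 215 | 0.45 |
Gln | CAA | 731 | 1.55 | trnQ-TTG
Arg | CGU | 337 | 1.26 | trnR-ACG
Arg | AGA | 500 | 1.87 | trnR-TCT
Arg | CGA | 358 | 1.34 |
Arg | AGG | 183 | 0.68 |
Arg | CGG | 118 | 0.44 |
Arg | CGC | 109 | 0.41 |
Ser | AGC | 125 | 0.37 | trnS-GCT
Ser | UCU | 557 | 1.66 |
Ser | UCA | 397 | 1.18 | trnS-TGA
Ser | UCC | 349 | 1.04 | trnS-GGA
Ser | AGU | 391 | 1.17 |
Ser | UCG | 193 | 0.58 |
Thr | ACU | 538 | 1.6 |
Thr | ACG | 160 | 0.48 |
Thr | ACC | 247 | 0.73 | trnT-GGT
Thr | ACA | 402 | 1.19 | trnT-TGT
Val | GUU | 508 | 1.41 |
Val | GUC | 181 | 0.5 | trnV-GAC
Val | GUA | 547 | 1.52 |
Val | GUG | 207 | 0.57 |
Trp | UGG | 462 | 1 | trnW-CCA
Tyr | UAC | 212 | 0.42 | trnY-GTA
Tyr | UAU | 792 | 1.58 |
Stop | UAA | 47 | 1.6 |
Stop | UAG | 22 | 0.75 |
Stop | UGA | 19 | 0.65 |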

RSCU: Relative Synonymous Codon Usage.
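
For reference, the RSCU of a codon is its observed count divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The sketch below uses a hypothetical helper name and is fed the cysteine and leucine counts from Table A2; it illustrates the standard definition rather than the exact software used in the study.

```python
def rscu(codon_counts: dict) -> dict:
    """Relative synonymous codon usage for one amino acid.

    codon_counts maps each synonymous codon to its observed count.
    RSCU = count / (mean count over the synonymous codons).
    """
    mean = sum(codon_counts.values()) / len(codon_counts)
    return {codon: round(n / mean, 2) for codon, n in codon_counts.items()}


# Counts taken from Table A2
cys = rscu({"UGU": 221, "UGC": 86})
leu = rscu({"UUG": 572, "UUA": 894, "CUU": 583,
            "CUA": 373, "CUC": 204, "CUG": 198})
print(cys)
print(leu)
```

Both amino acids reproduce the tabulated RSCU values, and in each case the codons with RSCU above 1 end in A or U, in line with the preference described in the text.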

In total, we found 23 intron-containing genes, including 15 protein-coding genes and eight tRNA genes (Table 4). Twenty-one genes (13 protein-coding and eight tRNA genes) contain one intron, and two genes (ycf3 and clpP) contain two introns. The trnK-UUU gene has the largest intron (2505 bp), and the trnL-UAA gene has the smallest (483 bp). Studies have shown that ycf3 is required for stable accumulation of photosystem I complexes [30]. Therefore, we speculate that the ycf3 intron gain of Q. acutissima may be helpful for further study of the mechanism of photosynthesis evolution.
Table 4

The lengths of exons and introns in genes with introns in the Q. acutissima chloroplast genome.

GeneLocationExon I (bp)Intron I (bp)Exon II (bp)Intron II (bp)Exon III (bp)
rps16 LSC42898195
atpF LSC144780411
rpoC1 LSC4328271626
ycf3 LSC127718228778155
clpP LSC69844294649228
petB LSC6841642
petD LSC9640474
rpl16 LSC91102399
rpl2 RepeatA390628471
ndhB RepeatA777680756
rps12 RepeatA10537231
ndhA SSC5511040541
rps12 RepeatB 23253626
ndhB RepeatB777680756
rpl2 RepeatB390628471
trnG-GCC LSC2373437
trnK-UUU LSC37250535
trnL-UAA LSC3548350
trnV-UAC LSC3663037
trnI-GAU RepeatA4295035
trnA-UGC RepeatA3880035
trnA-UGC RepeatB3880035
trnI-GAU RepeatB4295035

2.2. Comparative Analysis of Genomic Structure

Chloroplast sequences are often used to measure the genetic diversity within a species, the gene flow between species, and the size of ancestral populations of separated sister species [31]. Thus, it is necessary to understand the chloroplast differences between species. The complete cp genome sequence of Q. acutissima was compared to those of Q. variabilis, Q. dolicholepis, Castanea mollissima, Lithocarpus balansae, and Fagus engleriana. F. engleriana has the smallest cp genome with the largest IR region (51,784 bp), and Q. dolicholepis has the largest cp genome (Table 1). We assumed that the different lengths of the SSC and IR regions are the main reason for the variation in sequence length. To assess genome divergence, sequence identity was calculated for the chloroplast DNA of the six species using the program mVISTA with Q. variabilis as a reference (Figure 2). The results of this comparison revealed that LSC regions are more divergent than SSC and IR regions and that higher divergence is found in noncoding than in coding regions. The complete cp genome sequence of F. engleriana is quite different from those of the five other plants. There was no significant difference between the chloroplast genome sequences of evergreen and deciduous trees. At the same time, the sliding-window analysis indicated that the variation in the cp genome among the six species is located in the LSC and SSC regions (Figure A2). Significant variation was found in the coding regions of some genes, including psbI, rpl33, petB, rpl2, rps16, rpoC2, ndhK, ycf2, ycf1, and ndhI. The highest divergence in noncoding regions was found in the intergenic regions of trnK-rps16, rps16-trnQ, psbK-psbI, trnS-trnG, atpH-atpI, atpI-rps2, rpoB-trnC, trnC-petN, psbM-trnD, trnD-trnY, trnE-trnM, trnT-petD, psbZ-trnG, trnT-trnL, trnF-ndhJ, rbcL-accD, psaI-ycf4, ycf4-cemA, petA-psbL, psaJ-rpl33, clpP-psbB, rpl14-rpl16, ndhF-rpl32, ccsA-ndhD, ndhD-psaC, and rps15-ycf1.
Figure 2

Complete chloroplast genome comparison of six species using the chloroplast genome of Q. variabilis as a reference. The grey arrows and thick black lines above the alignment indicate the genes’ orientations. The Y-axis represents the identity from 50% to 100%.

Figure A2

Percentage of variation in the complete cp genomes of the six species. The regions are oriented according to their locations in the genome.
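
A sliding-window summary such as the one in Figure A2 reports, for each window along the alignment, the share of positions that differ among the genomes. The window and step sizes used in the study are not stated here, so the sketch below takes them as parameters; the function name and the toy alignment are illustrative only, and gaps are simply treated as a fifth character.

```python
def window_variability(aligned, window, step):
    """Percentage of variable columns per window of a multiple alignment.

    aligned: list of equal-length aligned sequences (strings).
    Returns a list of (1-based window start, percent variable sites).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    # A column is variable if it holds more than one distinct character.
    variable = [len(set(column)) > 1 for column in zip(*aligned)]
    result = []
    for start in range(0, length - window + 1, step):
        n_variable = sum(variable[start:start + window])
        result.append((start + 1, 100.0 * n_variable / window))
    return result


# Toy alignment: only position 3 differs among the three sequences
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACCTACGTAC"]
print(window_variability(toy, window=5, step=5))
```

Plotting the returned percentages against the window start positions gives a profile in which divergent stretches, such as intergenic spacers in the LSC, stand out as peaks.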

The contraction and expansion of the IR regions at their borders play important roles in evolution. They are common evolutionary events and a major cause of changes in the size of the chloroplast genome, and they may also cause variation in the length of angiosperm plastid genomes [32,33,34]. Detailed comparisons of the IR–SSC and IR–LSC boundaries among the cp genomes of the above six Fagaceae species are presented in Figure 3. The IR regions are relatively highly conserved in the Quercus genus: the rpl2 gene in the Quercus cp genome is shifted by 62 bp from IRb to LSC at the LSC/IRb border, and by 62 bp from IRa to LSC at the IRa/LSC border. Compared to other species in the genus, the range of the IRa/SSC regions changes greatly. Between evergreen and deciduous species, we found significant differences at the IRb/SSC border. Some reports showed that ycf1 is necessary for plant viability and encodes Tic214, an important component of the Arabidopsis TIC complex [35,36]. The ycf1 gene crosses the SSC/IRb border, with 1041 bp of ycf1_like within IRb (incompletely duplicated in IRb). The SSC/IRa junction is located in the ycf1 region in the chloroplast genomes of all six Fagaceae species, and ycf1 extends into the SSC region by different lengths depending on the genome (Q. acutissima, 4619 bp; Q. variabilis, 4620 bp; Q. dolicholepis, 4611 bp; C. mollissima, 4623 bp; L. balansae, 4626 bp; F. engleriana, 4633 bp); the IRa region includes 1041, 1041, 1068, 1059, 828, and 1049 bp of the ycf1 gene, respectively.
Figure 3

Comparison of the large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions in chloroplast genomes of four species. Genes are denoted by colored boxes. The gaps between the genes and the boundaries are indicated by the base lengths (bp). Extensions of the genes are indicated above the boxes.

2.3. Long-Repeat and SSR Analysis

For the repeat structure analysis (Table 5), 31 forward and 22 inverted repeats were detected in the Q. acutissima cp genome. Most of these repeats are between 19 and 46 bp. The longest forward repeat is 46 bp in length and is located in the LSC region. A total of 35, 18, and eight repeats were found in the LSC, SSC, and IR regions, respectively. Seven forward repeats are located in the IR regions, including one repeat associated with the ycf1 gene and one repeat related to the trnV-UAC and trnA-UGC genes. Most repeats in the intergenic spacers are distributed in the LSC region. Ten repeats are distributed in the SSC region, and only four of them are in the intergenic spacers.
Table 5

Long repeat sequence in the Q. acutissima chloroplast genome.

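ID | Repeat Start 1 | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region
1 | 6831 | F | 46 | 6853 | 0 | 1.47 × 10^-18 | IGS | LSC
2 | 11,847 | R | 31 | 11,847 | 0 | 1.58 × 10^-9 | IGS | LSC
3 | 6818 | R | 26 | 6818 | 0 | 1.62 × 10^-6 | rps16 | LSC
4 | 47,242 | F | 25 | 47,264 | 0 | 6.49 × 10^-6 | IGS | LSC
5 | 6831 | F | 24 | 6875 | 0 | 2.59 × 10^-5 | IGS | LSC
6 | 115,801 | F | 24 | 135,722 | 0 | 2.59 × 10^-5 | ycf1 | IRA; IRB
7 | 113,545 | F | 23 | 113,576 | 0 | 1.04 × 10^-4 | IGS | IRA
8 | 118,844 | R | 23 | 118,844 | 0 | 1.04 × 10^-4 | IGS | IRA
9 | 137,948 | F | 23 | 137,979 | 0 | 1.04 × 10^-4 | IGS | IRB
10 | 11,371 | F | 22 | 41,193 | 0 | 4.15 × 10^-4 | trnG-GCC (exon), trnG-GCC | LSC
11 | 9536 | F | 21 | 39,849 | 0 | 1.66 × 10^-3 | trnS-UGA, trnS-GCU | LSC
12 | 10,319 | F | 21 | 18,682 | 0 | 1.66 × 10^-3 | IGS | LSC
13 | 117,049 | R | 21 | 117,049 | 0 | 1.66 × 10^-3 | ndhF | SSC
14 | 36,478 | F | 20 | 53,719 | 0 | 6.64 × 10^-3 | IGS | LSC
15 | 53,720 | F | 20 | 130,481 | 0 | 6.64 × 10^-3 | IGS | LSC; SSC
16 | 55,907 | R | 20 | 55,907 | 0 | 6.64 × 10^-3 | atpB | LSC
17 | 57,271 | F | 20 | 142,064 | 0 | 6.64 × 10^-3 | trnV-UAC, trnA-UGC | LSC; IRB
18 | 105,331 | F | 20 | 105,349 | 0 | 6.64 × 10^-3 | IGS | IRA
19 | 146,178 | F | 20 | 146,196 | 0 | 6.64 × 10^-3 | IGS | IRB
20 | 4930 | F | 19 | 36,476 | 0 | 2.66 × 10^-2 | IGS | LSC
21 | 8915 | R | 19 | 8915 | 0 | 2.66 × 10^-2 | IGS | LSC
22 | 13,541 | R | 19 | 76,642 | 0 | 2.66 × 10^-2 | atpA | LSC
23 | 18,685 | R | 19 | 118,842 | 0 | 2.66 × 10^-2 | clpP | LSC; SSC
24 | 21,297 | R | 19 | 54,183 | 0 | 2.66 × 10^-2 | rpoC2 | LSC
25 | 36,479 | F | 19 | 130,481 | 0 | 2.66 × 10^-2 | IGS | LSC; SSC
26 | 39,957 | R | 19 | 39,957 | 0 | 2.66 × 10^-2 | IGS | LSC
27 | 62,040 | R | 19 | 62,040 | 0 | 2.66 × 10^-2 | IGS | LSC
28 | 64,751 | R | 19 | 64,751 | 0 | 2.66 × 10^-2 | IGS | LSC
29 | 69,026 | R | 19 | 69,026 | 0 | 2.66 × 10^-2 | IGS | LSC
30 | 71,277 | R | 19 | 71,277 | 0 | 2.66 × 10^-2 | IGS | LSC
31 | 72,561 | R | 19 | 72,561 | 0 | 2.66 × 10^-2 | IGS | LSC
32 | 4430 | R | 18 | 4430 | 0 | 1.06 × 10^-1 | IGS | LSC
33 | 4437 | F | 18 | 24,828 | 0 | 1.06 × 10^-1 | rpoC1 (intron) | SSC
34 | 4935 | F | 18 | 52,105 | 0 | 1.06 × 10^-1 | IGS | LSC
35 | 4938 | F | 18 | 118,695 | 0 | 1.06 × 10^-1 | IGS | LSC
36 | 6813 | F | 18 | 6847 | 0 | 1.06 × 10^-1 | IGS | LSC
37 | 6813 | F | 18 | 6869 | 0 | 1.06 × 10^-1 | IGS | LSC
38 | 6817 | F | 18 | 127,945 | 0 | 1.06 × 10^-1 | ndhA (intron) | LSC
39 | 7369 | F | 18 | 7387 | 0 | 1.06 × 10^-1 | IGS | LSC; SSC
40 | 7465 | R | 18 | 7465 | 0 | 1.06 × 10^-1 | IGS | LSC; SSC
41 | 8589 | R | 18 | 34,768 | 0 | 1.06 × 10^-1 | IGS | LSC; SSC
42 | 9996 | R | 18 | 9996 | 0 | 1.06 × 10^-1 | IGS | LSC
43 | 10,283 | F | 18 | 31,730 | 0 | 1.06 × 10^-1 | IGS | LSC
44 | 10,322 | R | 18 | 118,843 | 0 | 1.06 × 10^-1 | IGS | LSC; IRA
45 | 10,548 | F | 18 | 133,365 | 0 | 1.06 × 10^-1 | ycf1 | LSC
46 | 31,728 | F | 18 | 125,951 | 0 | 1.06 × 10^-1 | IGS | LSC
47 | 39,812 | F | 18 | 40,698 | 0 | 1.06 × 10^-1 | trnS-UGA | LSC; SSC
48 | 40,022 | R | 18 | 69,093 | 0 | 1.06 × 10^-1 | IGS | LSC
49 | 40,700 | F | 18 | 123,827 | 0 | 1.06 × 10^-1 | IGS | LSC
50 | 43,446 | F | 18 | 45,670 | 0 | 1.06 × 10^-1 | psaB | SSC
51 | 40,022 | R | 18 | 69,093 | 0 | 1.06 × 10^-1 | IGS | LSC
52 | 40,700 | F | 18 | 123,827 | 0 | 1.06 × 10^-1 | IGS | LSC
53 | 43,446 | F | 18 | 45,670 | 0 | 1.06 × 10^-1 | psaB, psaA | LSC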

F: forward; R: inverted; IGS: intergenic spacer.
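
To make the two repeat types concrete, the sketch below finds exact forward repeats (the same k-mer at two positions) and inverted repeats (a k-mer and its reverse complement) by indexing every k-mer of a sequence. This is a deliberately simple illustration with hypothetical function names; it is not how REPuter works internally, it only handles exact matches of one fixed length, and it does not merge overlapping hits into maximal repeats.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeats(seq: str, k: int):
    """Return (forward, inverted) lists of 1-based start pairs for exact k-mers."""
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i + 1)
    forward, inverted = [], []
    for kmer, starts in index.items():
        # Forward repeat: the same k-mer occurs at two different positions.
        forward.extend((a, b) for n, a in enumerate(starts) for b in starts[n + 1:])
        # Inverted repeat: the reverse complement occurs at or after position a.
        for a in starts:
            inverted.extend((a, b) for b in index.get(revcomp(kmer), []) if a <= b)
    return sorted(forward), sorted(inverted)


# Toy sequences, not real data
print(exact_repeats("GATTACAGGGATTA", 5))
print(exact_repeats("AAACCCGGGTTT", 6))
```

In the second toy sequence the pair (4, 4) is a palindromic k-mer that is its own reverse complement, which is one way a repeat can be listed with identical start positions for both copies.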

As chloroplast-specific SSRs are uniparentally inherited and are prone to slipped-strand mispairing, they are often used in population genetics, species identification, and evolutionary research on wild plants [37,38]. In addition, chloroplast genome sequences are highly conserved, and SSR primers for chloroplast genomes can be transferred across species and genera. Yoko et al. used six maternally inherited chloroplast (cpDNA) simple sequence repeat (SSR) markers to study the genetic variation in Q. acutissima [39]. In this study, a total of 65 SSRs were found in Q. acutissima, most of them distributed in the LSC and SSC regions and some in the IR regions. These included 61 mononucleotide SSRs (93.85%) and four dinucleotide SSRs (6.15%) (Table 6). Compared with other Quercus species, fewer types of SSRs were identified in Q. acutissima [40]. Among them, two SSRs belonged to the C type, and the others all belonged to the A/T types. These results are consistent with the hypothesis that cpSSRs are generally composed of short polyadenine (polyA) or polythymine (polyT) repeats and rarely contain tandem guanine (G) or cytosine (C) repeats [41]. We also found that 12 SSRs were located in genes, and the rest were located in intergenic regions. These cpSSR markers could be used to examine the genetic structure, diversity, differentiation, and maternity in Q. acutissima and related species in future studies.
Table 6

Simple sequence repeats (SSRs) in the Q. acutissima chloroplast genome.

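ID | Repeat Motif | Length (bp) | Start | End | Region | Gene
1 | (A)10 | 9 | 1809 | 1818 | LSC |
2 | (C)14 | 13 | 4433 | 4446 | LSC |
3 | (T)11 | 10 | 4697 | 4707 | LSC |
4 | (A)10 | 9 | 4939 | 4948 | LSC | trnK-UUU
5 | (T)11 | 10 | 7001 | 7011 | LSC |
6 | (T)10 | 9 | 7746 | 7755 | LSC |
7 | (A)10 | 9 | 8174 | 8183 | LSC |
8 | (A)12 | 11 | 8590 | 8601 | LSC | psbK
9 | (A)11 | 10 | 8920 | 8930 | LSC |
10 | (A)10 | 9 | 9465 | 9474 | LSC |
11 | (A)10 | 9 | 10,161 | 10,170 | LSC |
12 | (A)11 | 10 | 13,547 | 13,557 | LSC |
13 | (T)12 | 11 | 15,345 | 15,356 | LSC |
14 | (T)10 | 9 | 16,160 | 16,169 | LSC |
15 | (A)12 | 11 | 18,692 | 18,703 | LSC | rpoC2
16 | (T)12 | 11 | 21,295 | 21,306 | LSC | rpoC2
17 | (T)14 | 13 | 25,299 | 25,312 | LSC |
18 | (T)10 | 9 | 28,563 | 28,572 | LSC |
19 | (T)10 | 9 | 29,651 | 29,660 | LSC |
20 | (T)11 | 10 | 30,275 | 30,285 | LSC |
21 | (C)14 | 13 | 30,428 | 30,441 | LSC |
22 | (T)11 | 10 | 31,731 | 31,741 | LSC |
23 | (A)10 | 9 | 32,094 | 32,103 | LSC |
24 | (A)10 | 9 | 33,986 | 33,995 | LSC |
25 | (A)13 | 12 | 34,775 | 34,787 | LSC |
26 | (A)10 | 9 | 34,955 | 34,964 | LSC |
27 | (A)10 | 9 | 36,485 | 36,494 | LSC |
28 | (AT)6 | 11 | 39,819 | 39,830 | LSC |
29 | (T)10 | 9 | 41,238 | 41,247 | LSC | trnfM-CAU
30 | (T)11 | 10 | 53,217 | 53,227 | LSC |
31 | (A)10 | 9 | 53,726 | 53,735 | LSC |
32 | (T)15 | 14 | 54,110 | 54,124 | LSC |
33 | (A)11 | 10 | 54,990 | 55,000 | LSC |
34 | (T)10 | 9 | 55,713 | 55,722 | LSC |
35 | (T)10 | 9 | 59,591 | 59,600 | LSC |
36 | (T)10 | 9 | 60,063 | 60,072 | LSC |
37 | (T)10 | 9 | 64,092 | 64,101 | LSC | accD
38 | (A)11 | 10 | 64,266 | 64,276 | LSC |
39 | (AT)7 | 13 | 64,570 | 64,583 | LSC |
40 | (T)14 | 13 | 64,945 | 64,958 | LSC |
41 | (T)13 | 12 | 66,170 | 66,182 | LSC |
42 | (T)11 | 10 | 68,616 | 68,626 | LSC | petA
43 | (T)11 | 10 | 70,730 | 70,740 | LSC |
44 | (T)11 | 10 | 71,398 | 71,408 | LSC |
45 | (T)11 | 10 | 73,389 | 73,399 | LSC |
46 | (AT)6 | 11 | 77,274 | 77,285 | LSC | clpP
47 | (TA)7 | 13 | 82,928 | 82,941 | LSC | petD
48 | (A)11 | 10 | 85,781 | 85,791 | LSC |
49 | (T)10 | 9 | 86,100 | 86,109 | LSC |
50 | (T)10 | 9 | 88,820 | 88,829 | LSC |
51 | (T)11 | 10 | 114,070 | 114,080 | IRA |
52 | (T)12 | 11 | 118,582 | 118,593 | SSC |
53 | (A)11 | 10 | 118,695 | 118,705 | SSC |
54 | (T)11 | 10 | 119,000 | 119,010 | SSC |
55 | (A)10 | 9 | 119,794 | 119,803 | SSC |
56 | (T)11 | 10 | 122,199 | 122,209 | SSC | ndhD
57 | (A)10 | 9 | 122,546 | 122,555 | SSC |
58 | (AT)8 | 15 | 123,832 | 123,847 | SSC |
59 | (T)11 | 10 | 125,812 | 125,822 | SSC |
60 | (T)11 | 10 | 125,954 | 125,964 | SSC |
61 | (T)11 | 10 | 130,262 | 130,272 | SSC |
62 | (A)10 | 9 | 130,487 | 130,496 | SSC |
63 | (T)10 | 9 | 133,465 | 133,474 | SSC | ycf1
64 | (T)13 | 12 | 134,042 | 134,054 | SSC | ycf1
65 | (A)11 | 10 | 137,468 | 137,478 | SSC |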
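
Perfect mono- and dinucleotide SSRs of the kind listed in Table 6 can be located with a regular expression that looks for a motif followed by enough copies of itself. The sketch below is an illustration only and is not the MISA implementation; the function name is hypothetical, and the minimum repeat counts (10 for mononucleotides, 6 for dinucleotides) are an assumption chosen to match the shortest entries in the table, since the thresholds are not stated in the text.

```python
import re


def find_ssrs(seq: str, min_repeats=None):
    """Find perfect SSRs; return (motif, repeat count, 1-based start, end).

    min_repeats maps motif length to the minimum number of repeats.
    The default thresholds are an assumption, not taken from the study.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit_len > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" repeats are homopolymers, not dinucleotide SSRs
            repeats = len(match.group(0)) // unit_len
            hits.append((motif, repeats, match.start() + 1, match.end()))
    return hits


# Toy sequence, not real data: an (A)10 run and an (AT)6 run
toy_seq = "CG" + "A" * 10 + "CCGG" + "AT" * 6 + "GC"
print(find_ssrs(toy_seq))
```

Each hit can then be assigned to a region or gene by comparing its coordinates with the genome annotation.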

2.4. Phylogenetic Analysis

Phylogenetic analysis was performed on an alignment of concatenated nucleotide sequences of the complete chloroplast genomes of 25 angiosperm species (Figure 4). We used the Bayesian inference (BI) method based on RAxML to build a phylogenetic tree, with Malus prunifolia and Ulmus gaussenii as the outgroup. Support is generally high for almost all relationships inferred from the chloroplast genome data with the BI method (support values range from 0.8956 to 1). It is noteworthy that the species in the genus Quercus do not form a clade. Several evergreen tree species group together to form one clade. Q. acutissima and Q. variabilis are sister species and are frequently found mixed among Chinese endemic species; the second clade splits into two subclades. F. engleriana is in the top position, while Q. acutissima appears to be more closely related to Q. variabilis, Q. dolicholepis, and Q. baronii. In general, the topologies of the other branches (genera Fagus, Trigonobalanus, Lithocarpus, and Castanopsis) are almost the same as those based on two nuclear loci (ITS and CRC) [3].
Figure 4

Bayesian inference (BI) phylogenetic tree reconstruction including 25 species based on all chloroplast genomes. Malus prunifolia and Ulmus gaussenii were used as the outgroup.

3. Materials and Methods

3.1. Sampling, DNA Extraction, Sequencing, and Assembly

Samples were taken from Q. acutissima trees planted at Nanjing Forestry University and on Zijin Mountain in Nanjing, China (32°04′ N, 118°48′ E and 32°04′ N, 118°50′ E, respectively). Fresh leaves were collected, packed in ice, and immediately stored at −80 °C until analysis. Genomic DNA was isolated using a modified CTAB method [42]. Agarose gel electrophoresis and a one-drop spectrophotometer (OD-1000, Shanghai Cytoeasy Biotech Co., Ltd., Shanghai, China) were used to assess DNA integrity and quality. Shotgun libraries (250 bp) were constructed from pure DNA according to the manufacturer’s instructions. Sequencing was performed on an Illumina HiSeq 2500 platform (Nanjing, China), yielding at least 9.82 Gb of clean data for Q. acutissima. First, all of the raw reads were trimmed by FastQC. Next, we performed a BLAST analysis between the trimmed reads and reference genomes (Q. variabilis and Q. dolicholepis) to extract cp-like reads. Finally, we used the chloroplast-like reads to assemble sequences using NOVOPlasty [43]. NOVOPlasty assembled part of the reads and extended the sequence as far as possible until a circular genome formed. The assembly was accepted when its length was within the expected range, the overlap was larger than 200 bp, and the sequence formed a ring.
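
The circularity criterion above, an overlap between the two ends of the assembled sequence, can be illustrated with a short check. This is only a sketch of the idea with hypothetical function names, not NOVOPlasty's own procedure: it looks for the longest exact match between the start and the end of a contig and trims one copy when the overlap is long enough.

```python
def terminal_overlap(contig: str, min_overlap: int = 1) -> int:
    """Length of the longest prefix that also ends the contig (0 if none)."""
    for size in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:size] == contig[-size:]:
            return size
    return 0


def circularize(contig: str, min_overlap: int):
    """Trim the duplicated end if the contig closes on itself, else return None."""
    size = terminal_overlap(contig, min_overlap)
    return contig[:-size] if size else None


# Toy contig, not real data: the first 5 bases are repeated at the end
toy_contig = "ACGTTGCAAGGC" + "ACGTT"
print(terminal_overlap(toy_contig))
print(circularize(toy_contig, min_overlap=5))
print(circularize(toy_contig, min_overlap=6))
```

For a real assembly, `min_overlap` would be set to the threshold of interest (for example 200 bp), and inexact overlaps would need an alignment-based comparison rather than exact string matching.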

3.2. Annotation and Analysis of the cpDNA Sequences

CpGAVAS was used to annotate the sequences; DOGMA (http://dogma.ccbb.utexas.edu/) and BLAST were used to check the results of the annotation [44,45]. tRNAscan-SE was used to identify the tRNAs [46]. The circular gene map of Q. acutissima was drawn using the OGDRAW v1.2 program [47] (http://ogdraw.mpimp-golm.mpg.de/). An analysis of variation in synonymous codon usage, relative synonymous codon usage (RSCU) values, codon usage, and the GC content of the complete plastid genome and commonly analyzed CDS was conducted. MISA (available online: http://pgrc.ipk-gatersleben.de/misa/misa.html) [48] and REPuter (available online: https://bibiserv.cebitec.uni-bielefeld.de/reputer/) [49] were used to identify the SSRs and long repeats, respectively.

3.3. Genome Comparison

MUMmer [50] was used for pairwise sequence alignment of the cp genomes. The mVISTA [51] program was applied to compare the complete cp genome of Q. acutissima to the published cp genomes of related species, i.e., Q. variabilis (KU240009), Q. dolicholepis (KU240010), C. mollissima (HQ336406), L. balansae (KP299291), and F. engleriana (KX852398), in shuffle-LAGAN mode [52], using the annotation of Q. variabilis as a reference.

3.4. Phylogenetic Analysis

Phylogenies were constructed by Bayesian inference (BI) analysis using the cp genome sequences of 25 species obtained from the NCBI Organelle Genome and Nucleotide Resources database. The sequences were initially aligned using MAFFT [53]. The multiple sequence alignment was then visualized and manually adjusted in BioEdit [54]. IQ-TREE was used to select the best-fitting nucleotide substitution models [55]. TVM + F + R4 and GTR + G were selected as the best substitution models for the BI analyses. BI analyses were conducted using MrBayes [56]. Malus prunifolia (NC_031163) and Ulmus gaussenii (NC_037840) were used as the outgroups.

4. Conclusions

In this study, we reported and analyzed the complete cp genome of Q. acutissima, an endemic and ecologically important tree species in China. The chloroplast genome was shown to be conserved, with characteristics similar to those of other Quercus species. In comparison with the cp genomes of five other Fagaceae species, the LSC region was shown to be the most divergent of the four regions, and noncoding regions showed higher divergence than coding regions. Phylogenetic analysis found Q. acutissima to be closely related to Q. variabilis. The phylogenetic position of the species in the Fagaceae family is consistent with previous studies. The results of this study provide an assembly of the whole chloroplast genome of Q. acutissima, which might facilitate genetics, breeding, and biological discoveries in the future.
  33 in total

1.  The contributions of replication orientation, gene direction, and signal sequences to base-composition asymmetries in bacterial genomes.

Authors:  E R Tillier; R A Collins
Journal:  J Mol Evol       Date:  2000-03       Impact factor: 2.395

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  The tobacco plastid accD gene is essential and is required for leaf development.

Authors:  Vasumathi Kode; Elisabeth A Mudd; Siriluck Iamtham; Anil Day
Journal:  Plant J       Date:  2005-10       Impact factor: 6.417

4.  Patterns and causes of incongruence between plastid and nuclear Senecioneae (Asteraceae) phylogenies.

Authors:  Pieter B Pelser; Aaron H Kennedy; Eric J Tepe; Jacob B Shidler; Bertil Nordenstam; Joachim W Kadereit; Linda E Watson
Journal:  Am J Bot       Date:  2010-04-26       Impact factor: 3.844

5.  Uncovering the protein translocon at the chloroplast inner envelope membrane.

Authors:  Shingo Kikuchi; Jocelyn Bédard; Minako Hirano; Yoshino Hirabayashi; Maya Oishi; Midori Imai; Mai Takase; Toru Ide; Masato Nakai
Journal:  Science       Date:  2013-02-01       Impact factor: 47.728

6.  Complete Chloroplast Genome Sequence of Corroborates Structural Heterogeneity of Inverted Repeats in Wild Progenitors of Cultivated Bananas and Plantains.

Authors:  Santoshkumar M Shetty; Maria Ulfa Md Shah; Kavyashree Makale; Yusmin Mohd-Yusuf; Norzulaani Khalid; Rofina Yasmin Othman
Journal:  Plant Genome       Date:  2016-07       Impact factor: 4.089

7.  Plastid Genome Comparative and Phylogenetic Analyses of the Key Genera in Fagaceae: Highlighting the Effect of Codon Composition Bias in Phylogenetic Inference.

Authors:  Yanci Yang; Juan Zhu; Li Feng; Tao Zhou; Guoqing Bai; Jia Yang; Guifang Zhao
Journal:  Front Plant Sci       Date:  2018-02-01       Impact factor: 5.753

8.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

Authors:  Chang Liu; Linchun Shi; Yingjie Zhu; Haimei Chen; Jianhui Zhang; Xiaohan Lin; Xiaojun Guan
Journal:  BMC Genomics       Date:  2012-12-20       Impact factor: 3.969

9.  Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

Authors:  Linda A Raubeson; Rhiannon Peery; Timothy W Chumley; Chris Dziubek; H Matthew Fourcade; Jeffrey L Boore; Robert K Jansen
Journal:  BMC Genomics       Date:  2007-06-15       Impact factor: 3.969

10.  Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Paeonia ostii.

Authors:  Shuai Guo; Lili Guo; Wei Zhao; Jiang Xu; Yuying Li; Xiaoyan Zhang; Xiaofeng Shen; Mingli Wu; Xiaogai Hou
Journal:  Molecules       Date:  2018-01-26       Impact factor: 4.411
