Characterization of the whole chloroplast genome of Chikusichloa mutica and its comparison with other rice tribe (Oryzeae) species.

Zhiqiang Wu¹, Cuihua Gu², Luke R Tembrock³, Dong Zhang⁴, Song Ge¹.

Abstract

Chloroplast genomes are a significant genomic resource in plant species and have been used in many research areas. Complete genomic information from wild relatives of crop species could supply a valuable genetic reservoir for breeding. Chikusichloa mutica is one of the most important wild distant relatives of cultivated rice. In this study, we sequenced and characterized its complete chloroplast (cp) genome and compared it with those of other species in the same tribe. The whole cp genome sequence is 136,603 bp in size and exhibits a typical quadripartite structure, with large and small single-copy regions (LSC, 82,327 bp; SSC, 12,598 bp) separated by a pair of 20,839-bp inverted repeats (IRA and IRB). A total of 110 unique genes are annotated, including 76 protein-coding genes, 4 ribosomal RNA genes and 30 tRNA genes. The genome structure, gene order, GC content, and other features are similar to those of other angiosperm cp genomes. In comparisons of cp genomes between the Oryzinae and Zizaniinae subtribes, the main differences were found in the junction regions and in the distribution of simple sequence repeats (SSRs). The cp genomes of the two Chikusichloa species differed by only 40 bp in length, and 108 polymorphic sites, including 83 single nucleotide substitutions (SNPs) and 25 insertion-deletions (Indels), were found between them. The complete cp genome of C. mutica will be an important genetic tool for future breeding programs and for understanding the evolution of wild rice relatives.

Year: 2017 PMID: 28542519 PMCID: PMC5443529 DOI: 10.1371/journal.pone.0177553

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The grass family (Poaceae) is one of the most diverse angiosperm families and contains numerous economically important crop species [1], including rice (Oryza sativa), the world's most economically important crop [2]. Because of its economic value, this species, and the genus Oryza more broadly, has been used as a model system for numerous genetic and evolutionary studies [3, 4]. The rice species (Oryza) and their many wild relatives are classified into two well-supported subtribes, Oryzinae and Zizaniinae, in the subfamily Ehrhartoideae [5, 6]. Many species in each subtribe have economic value and have been used as food for centuries, such as the two cultivated rice species (Oryza sativa and O. glaberrima) in Oryzinae [7] and the wild rice species Zizania latifolia and Z. aquatica in Zizaniinae [8]. In addition, many wild relatives in the tribe Oryzeae possess highly useful genetic resources for rice breeding, such as traits for increasing yield [9] and for tolerance to environmental stress [10]. While the species in the subtribe Oryzinae have been studied in depth with regard to their genetic importance [2, 11, 12, 13], the species in Zizaniinae have been less thoroughly examined, except for their organelle genomes [14, 15, 16]. Chikusichloa is one such Zizaniinae genus, for which knowledge of the chloroplast genome remains limited. Chikusichloa comprises only three perennial species in Southeast Asia, all of which are uncommon within their range. The range of Chikusichloa extends from Indonesia (Sumatra) in the south to Japan and China in the north, and its habitat consists of wet, swampy areas within forests. C. aquatica Koidz. grows in wet valleys and on stream sides in China and Japan; C. mutica Keng grows on damp stream sides in forests of China and Indonesia; and C. brachyathera Ohwi is found only in the Ryukyu Islands [17].
Completing their organelle genomes would supply a rich repository of genetic material for future breeding programs. Chloroplasts, the photosynthetic organelles of plant and algal cells, originated from cyanobacteria through endosymbiosis approximately one billion years ago [18] and have retained their own genome, which is usually uniparentally inherited [19]. Many essential metabolites, such as fatty acids, starch, pigments, and amino acids, are synthesized in chloroplasts [20]. Over time, chloroplast genomes have undergone dramatic variation, but a conserved structure has been maintained within land plants. The chloroplast genome is small and circular, 120–165 kb in length, with a quadripartite structure in which a pair of inverted repeats (IRs) is separated by a large single-copy region (LSC) and a small single-copy region (SSC) [21, 22]. Owing to the development of high-throughput sequencing technologies [23] and the conserved features of chloroplast genomes [21, 24], the chloroplast genomes of over 1,000 species in Viridiplantae have been completely sequenced and published in the NCBI Organelle Genome Resources database (http://www.ncbi.nlm.nih.gov/genome/organelle/). The highly conserved gene order, stable gene content, and slow mutation rate of chloroplast genomes [24, 25, 26] have made them an important genetic resource for exploring evolutionary variation in land plants. For example, dozens of chloroplast molecular markers, or even whole chloroplast genomes, have been used in plant molecular systematics and taxonomy [27, 28], plant biogeography [29], and DNA barcoding [30]. In addition, chloroplast genetic engineering offers certain unique advantages over nuclear transformation, including high transgene expression [31, 32] and transgene containment through maternal inheritance [33]. Thus, complete chloroplast genomes of wild rice relatives are a valuable genetic resource.
In this study, using traditional Sanger sequencing and sets of conserved universal primers for grass species, we assembled a high-quality complete chloroplast genome of Chikusichloa mutica and deposited the annotated sequence in the NCBI database. We also conducted a comprehensive comparison with the other published Chikusichloa chloroplast genome, that of C. aquatica (KR078265) [16], to detect all polymorphisms between the two whole chloroplast genomes. Using whole chloroplast genomes, we reconstructed the phylogenetic relationships of rice tribe species with published chloroplast genomes and compared their genomic features and structural variation.

Material and methods

Complete chloroplast genome of Chikusichloa mutica

Fresh leaves of Chikusichloa mutica were collected from a plant (originally collected in the wild by Prof. Song Ge, #GS0601, for [34]) grown in the greenhouse of the Institute of Botany, Chinese Academy of Sciences, Beijing. Total cellular DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method and purified by phenol extraction [34]. PCR amplification and Sanger sequencing were used to complete the whole chloroplast genome of C. mutica. Based on the conserved features of chloroplast genomes in land plants [21, 24] and our previous results [14, 15], we used the chloroplast primers from Wu et al [35] to amplify the entire chloroplast genome in overlapping fragments. PCR conditions were 4 min of initial denaturation at 94°C; 35 cycles of 45 s at 94°C, 45 s annealing at 52°C, and 90 s extension at 72°C; and a final 10-min incubation at 72°C. The PCR products were purified as described in Tang et al [34] and directly sequenced on an ABI 3730 (Applied Biosystems, Foster City, CA, USA). The final Sanger sequences were trimmed and assembled with the ContigExpress program from the Vector NTI Suite 6.0 (Informax Inc., North Bethesda, MD).
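As a quick check on the protocol above, the total hold time of the thermocycling program can be tallied. This is an illustrative sketch only: the step list simply restates the conditions given in the text, the names PROGRAM and total_hold_seconds are hypothetical, and ramp times between temperatures are ignored (an assumption), so the result is a lower bound on run time.

```python
# PCR program restated from the Methods as (step, temperature in °C, seconds, repeats).
# Ramp times between temperatures are ignored (assumption), so totals are a lower bound.
PROGRAM = [
    ("initial denaturation", 94, 4 * 60, 1),
    ("denaturation", 94, 45, 35),
    ("annealing", 52, 45, 35),
    ("extension", 72, 90, 35),
    ("final incubation", 72, 10 * 60, 1),
]


def total_hold_seconds(program):
    """Sum of all hold times, counting each cycled step once per cycle."""
    return sum(seconds * repeats for _, _, seconds, repeats in program)


if __name__ == "__main__":
    total = total_hold_seconds(PROGRAM)
    print(f"{total} s = {total / 60:.0f} min of hold time")
```

The 35 cycles of 3 min each dominate the roughly two hours of hold time; actual instrument run time will be longer once ramping is included.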

Chloroplast genome annotation

The final assembled chloroplast sequence was submitted to DOGMA (Dual Organellar GenoMe Annotator, http://dogma.ccbb.utexas.edu/) for annotation. The initial DOGMA draft contained many errors caused by variation in exon–intron boundaries or by questionable positioning of start and stop codons. To finalize the annotation, we inspected all inaccurate positions and made manual adjustments based on BLAST searches against published chloroplast genomes of related species. Both tRNA and rRNA genes were identified by combining BLASTN searches against related species in the rice tribe [14] with the DOGMA tools. The final annotation was submitted to GenBank, and a diagram of the annotated chloroplast genome was plotted using Circos 0.67 [36] (Fig 1).

Fig 1

The simplified schematic diagram showing the chloroplast genome information and variation maps of Chikusichloa mutica.

From outside to inside, the tracks represent: 1) forward-strand coding genes; 2) reverse-strand coding genes; 3) the number and distribution of single nucleotide substitutions (SNPs) (black bars); 4) the number and distribution of non-repeat insertion-deletions (Indels) (purple bars); 5) the number and distribution of homopolymer structures (grey bars); 6) the number and distribution of repeat Indels (green bars). The color key for the functional groups of chloroplast coding genes is shown at the bottom. The diagram was generated with Circos v0.67 (http://circos.ca/).

Polymorphism detection

To compare polymorphisms in detail between whole chloroplast genomes within Chikusichloa, the published C. aquatica genome (KR078265) [16] was compared with our newly completed chloroplast genome of C. mutica. Because chloroplast genome structure is conserved within the grass family [14, 37], the two genome sequences could be aligned by synteny. MAFFT v7.221 [38] was used to align the whole chloroplast genomes under the FFT-NS-2 setting, followed by manual adjustment. The number and positions of polymorphic sites, including SNPs (single nucleotide polymorphisms) and Indels (insertions/deletions), were then extracted from the two aligned sequences with DnaSP v5.10 [39].
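The extraction step can be illustrated with a minimal Python sketch. It is not DnaSP's algorithm; it assumes two rows of a pairwise alignment with '-' as the only gap character, counts every aligned column with two different symbols as a SNP, and treats each maximal run of gapped columns as one Indel event. The function name count_polymorphisms is hypothetical.

```python
def count_polymorphisms(seq_a, seq_b):
    """Return (snp_columns, indel_events) for two rows of a pairwise alignment.

    Assumptions (illustrative, not DnaSP's exact rules):
      * '-' is the only gap character; ambiguity codes are not treated specially;
      * columns gapped in both rows are ignored;
      * each maximal run of consecutive gapped columns is one Indel event,
        reported as a half-open (start, end) column interval.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], []
    gap_start = None
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == "-" and b == "-":
            continue  # empty column, e.g. left over from a multiple alignment
        if a == "-" or b == "-":
            if gap_start is None:
                gap_start = i  # a new Indel event begins here
            continue
        if gap_start is not None:
            indels.append((gap_start, i))  # close the current Indel event
            gap_start = None
        if a != b:
            snps.append(i)  # a substitution column
    if gap_start is not None:
        indels.append((gap_start, len(seq_a)))
    return snps, indels


if __name__ == "__main__":
    snps, indels = count_polymorphisms("ACGT--ACGTA", "ACCTGGAC-TA")
    print(len(snps), "SNPs;", len(indels), "Indels")
```

Classifying each Indel further as a homopolymer, repeat-related or independent event, as done in the Results, would need extra sequence-context checks that this sketch does not attempt.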

Simple sequence repeats (SSRs)

Simple sequence repeats (SSRs), also known as microsatellites, are tandem repeats of 1–6 bp motifs. They are common genomic features with high rates of polymorphism owing to their slipped-strand mispairing mutation mechanism [40], and they have been widely used as co-dominant molecular markers in marker-assisted breeding, population genetics, and genetic linkage mapping [41]. To identify the distribution of SSRs across the chloroplast genome, the publicly available Perl script MISA (http://pgrc.ipk-gatersleben.de/misa/) was used. SSRs with motif sizes of one to six nucleotides were identified, with minimum thresholds of 6, 5, 4, 3, 3, and 3 repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide SSRs, respectively. Chikusichloa mutica and 13 other species in the rice tribe were examined for SSRs. Potamophila parviflora (GU592210) and Microlaena stipoides (GU592211) were excluded from this analysis because their chloroplast genomes are incomplete.
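The threshold scheme above can be sketched in Python with regular expressions. This is a simplified, MISA-like scan written for illustration, not MISA itself: it reports only perfect repeats, does not merge adjacent (compound) SSRs, and skips motifs that are themselves repeats of a shorter unit (such as 'AA' or 'ATAT') so that a homopolymer is not reported again as a dinucleotide. Because regex matches do not overlap, it may occasionally miss a run that overlaps a skipped match. The names MIN_REPEATS, is_primitive and find_ssrs are hypothetical.

```python
import re

# Minimum number of repeat units for mono- to hexa-nucleotide motifs (values from the text).
MIN_REPEATS = {1: 6, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True unless the motif is a repeat of a shorter unit ('AT' -> True, 'ATAT' -> False)."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif_length, motif, repeat_units) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif copy; \1{n-1,} demands at least n-1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' or 'ATAT', covered by shorter motifs
                hits.append((match.start(), k, motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    for start, k, motif, units in find_ssrs("GCAAAAAAAAGCATATATATATG"):
        print(f"{start}\t{k}-nt\t({motif}){units}")
```

Raising the mononucleotide entry of MIN_REPEATS from 6 to 8 gives the stricter count shown in parentheses in Table 3.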

Chloroplast phylogenomics analysis

As an important target in plant systematics, the chloroplast genome has been widely used to resolve phylogenetic relationships among plant lineages [19]. To further determine and validate the phylogenetic relationships of C. mutica with other Oryzeae species, published chloroplast genomes of 15 species from the subfamily Ehrhartoideae (Table 2) and one species (Phyllostachys propinqua) from Bambusoideae were included in the analysis. Together with C. mutica, whole chloroplast genome data from a total of 17 species were analyzed. Because chloroplast structure is conserved among grasses [14, 37, 42], the complete chloroplast genomes of the 17 species were aligned to construct the phylogenetic tree. The alignment was performed with MAFFT v7.221 [38] using the same settings described above for the polymorphism analysis. The final alignment (S1 File) was analyzed with three phylogenetic-inference methods: maximum parsimony (MP) in PAUP* 4.0b10 [43], Bayesian inference (BI) in MrBayes 3.1.2 [44], and maximum likelihood (ML) in PHYML version 2.4.5 [45], using the settings described previously [14].

Table 1

Base composition in various regions of the Chikusichloa mutica chloroplast genome.

Regions	A%	T%	C%	G%	GC%	Length (bp)
Total	30.63	30.34	19.44	19.60	39.04	136,603
LSC	31.25	31.54	18.38	18.82	37.20	82,327
SSC	35.84	30.79	17.25	16.12	33.37	12,598
IR (A, B)	27.73	27.91	21.29	23.08	44.37	20,839
CDS^a	29.39	31.16	18.27	21.18	39.45	55,521
1st codon position	28.97	23.28	19.01	28.74	47.75	18,507
2nd codon position	27.51	32.92	21.10	18.47	39.57	18,507
1st+2nd codon positions	28.24	28.10	20.06	23.60	43.66	37,014
3rd codon position	31.69	37.27	14.71	16.33	31.04	18,507

LSC: large single-copy region; SSC: small single-copy region; IR: inverted repeat; CDS: protein-coding region.

a: if some genes have two copies, only one copy is included.

Results

Genome assembly and features

Using the full set of primers from Wu et al [35], we sequenced and assembled the complete chloroplast genome of C. mutica. Each amplicon was Sanger-sequenced in both directions to obtain high-quality base calls. After assembly and editing, the whole chloroplast genome sequence was 136,603 bp in length. The genome was annotated following the methods of Wu and Ge [14] and deposited in GenBank under accession number KU696970. The chloroplast genome of C. mutica has a typical quadripartite structure, consisting of a pair of 20,839-bp inverted repeats (IRs) separated by a 12,598-bp small single-copy region (SSC) and an 82,327-bp large single-copy region (LSC) (Fig 1; S1 Fig; Table 1). It is AT-rich, as is typical of land plants [18], with a GC content of only 39.04%, similar to most published chloroplast genomes in the rice tribe (Table 2). The GC content of the IR regions (44.37%) was higher than that of the LSC (37.20%) and SSC (33.37%) regions (Table 1). The higher GC content of the IRs reflects the high GC content (54.78%) of the four ribosomal RNA (rRNA) genes located there. Across the rice tribe species, the average GC content was 38.99% (±0.0004), with the highest GC content in the IR region (44.34%) and the lowest in the SSC region (33.31%) (Table 2).
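The region lengths and GC values above can be cross-checked with simple arithmetic: a quadripartite genome's length is LSC + SSC + 2 × IR, and its overall GC content is the length-weighted mean of the region GC values, with the IR counted twice. The Python sketch below uses the values from Table 1; the function names are hypothetical, and gc_percent ignores any symbol other than A, C, G and T (an assumption about how ambiguous bases are handled).

```python
# Region lengths (bp) and GC contents (%) of the C. mutica chloroplast genome, from Table 1.
REGION_LENGTHS = {"LSC": 82_327, "SSC": 12_598, "IR": 20_839}
REGION_GC = {"LSC": 37.20, "SSC": 33.37, "IR": 44.37}


def total_length(lengths):
    """Quadripartite genome length: LSC + SSC + two copies of the inverted repeat."""
    return lengths["LSC"] + lengths["SSC"] + 2 * lengths["IR"]


def weighted_gc(lengths, gc):
    """Genome-wide GC% as the length-weighted mean of region GC%, counting the IR twice."""
    weights = {"LSC": lengths["LSC"], "SSC": lengths["SSC"], "IR": 2 * lengths["IR"]}
    return sum(weights[r] * gc[r] for r in weights) / sum(weights.values())


def gc_percent(seq):
    """GC% of a sequence, ignoring any symbol other than A, C, G and T (assumption)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


if __name__ == "__main__":
    print(total_length(REGION_LENGTHS))                    # 136603
    print(f"{weighted_gc(REGION_LENGTHS, REGION_GC):.2f}")  # 39.03
```

The weighted mean comes out at about 39.03%, which matches the reported 39.04% to within the rounding of the region-level percentages.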

Table 2

Comparison of major features of 18 Poaceae chloroplast genomes from Ehrhartoideae and Bambusoideae subfamilies.

Subfamily	Tribe (Subtribe)	Species	Total length (bp)	Total GC (%)	LSC length (bp)	LSC GC (%)	IR length (bp)	IR GC (%)	SSC length (bp)	SSC GC (%)	GenBank accession
Ehrhartoideae	Oryzeae (Oryzinae)	Oryza sativa ssp. indica	134,496	39.00	80,553	37.11	20,798	44.35	12,347	33.32	NC_008155
		Oryza sativa ssp. japonica	134,551	39.00	80,604	37.11	20,802	44.35	12,343	33.37	AY522330
		Oryza nivara	134,494	39.01	80,544	37.12	20,802	44.35	12,346	33.33	NC_005973
		Oryza barthii	134,674	38.99	80,685	37.10	20,804	44.34	12,381	33.33	NC_027460
		Oryza glumipatula	134,583	38.99	80,613	37.09	20,807	44.34	12,356	33.32	NC_027461
		Oryza punctata	134,911	39.00	80,955	37.10	20,813	44.36	12,330	33.37	NC_027676
		Oryza officinalis	134,604	38.97	80,623	37.08	20,797	44.35	12,387	33.28	NC_027463
		Oryza australiensis	135,224	38.95	81,074	37.07	20,840	44.33	12,470	33.18	KJ830774
		Oryza brachyantha	134,604	38.98	80,411	37.10	20,832	44.31	12,529	33.31	KT992850
		Leersia tisserantii	136,550	38.88	81,865	37.01	21,329	44.05	12,027	33.23	JN415112
	Oryzeae (Zizaniinae)	Zizania latifolia	136,461	39.00	82,115	37.13	20,878	44.42	12,590	33.18	KT161956
		Zizania aquatica	136,364	39.02	82,013	37.14	20,879	44.41	12,593	33.31	KJ870999
		Rhynchoryza subulata	136,303	39.00	82,029	37.14	20,840	44.36	12,594	33.40	JN415114
		Chikusichloa aquatica	136,563	39.04	82,314	37.21	20,838	44.37	12,573	33.41	KR078265
		Chikusichloa mutica	136,603	39.04	82,327	37.20	20,839	44.37	12,598	33.37	KU696970^a
		Potamophila parviflora	134,551	39.07	80,604	37.19	20,800	44.32	12,347	33.58	GU592210^b
	Ehrharteae	Microlaena stipoides	134,551	39.22	80,613	37.28	20,793	44.18	12,343	33.77	GU592211^b
Bambusoideae	Bambusodae	Phyllostachys propinqua	139,704	38.88	83,227	36.96	21,800	44.23	12,877	33.14	JN415113

a Sequenced in this study;

b unfinished chloroplast genome.

To understand the structural differences among chloroplast genomes in the rice tribe, we compared 15 genomes from the rice tribe and one from bamboo (Table 2). Total length varied by approximately 2 kb among the complete rice tribe genomes, from 134,494 bp to 136,603 bp, with the Zizaniinae species generally longer than the Oryzinae species. Most of this length difference lies in the LSC region, which ranges from 80,411 bp to 82,327 bp (Table 2). The other regions, the two IRs and the SSC, are relatively conserved in length within the rice tribe.
Chloroplast genomes have been shown to be conserved in gene content and gene order across the grass family [46]. In the final annotation, we predicted a total of 128 functional genes in the C. mutica chloroplast genome: 110 unique genes plus 18 genes duplicated in the IR regions (Fig 1, S1 Table). The 110 unique genes comprise 76 protein-coding genes and 34 RNA genes (30 tRNA genes and four rRNA genes) (S1 Table). The 18 duplicated genes in the IRs comprise six protein-coding genes, eight tRNA genes, and four rRNA genes (S1 Table). Sixteen genes contained introns; 14 contained a single intron (eight protein-coding and six tRNA genes), and ycf3 contained two introns. The rps12 gene was trans-spliced, with its 5′-end exon located in the LSC region and its two 3′-end exons duplicated in the IR regions. The trnK-UUU gene had the largest intron (2,487 bp), within which the matK gene is located. The 76 protein-coding genes totaled 55,521 bp, and the GC content at the first, second, and third codon positions was 47.75%, 39.57%, and 31.04%, respectively (Table 1).
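GC content by codon position, as reported above and in Table 1, can be computed by taking every third base of an in-frame coding sequence. A minimal Python sketch, assuming the input is one or more complete genes concatenated in frame (as when pooling the 76 protein-coding genes, with duplicated genes counted once) and that any trailing partial codon is dropped; gc_by_codon_position is a hypothetical name.

```python
def gc_by_codon_position(cds):
    """GC% at codon positions 1, 2 and 3 of an in-frame coding sequence.

    Assumes the sequence starts at the first base of a codon; a trailing
    partial codon is ignored. Only G and C are counted as GC.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    percentages = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base, starting at this codon position
        gc = sum(1 for base in bases if base in "GC")
        percentages.append(100.0 * gc / len(bases) if bases else 0.0)
    return percentages


if __name__ == "__main__":
    # Toy in-frame sequence: ATG GCC AAA TAA (start, Ala, Lys, stop).
    print(gc_by_codon_position("ATGGCCAAATAA"))  # [25.0, 25.0, 50.0]
```

Concatenating whole genes keeps every gene in frame, which is why the three position classes in Table 1 each contain exactly one third of the 55,521 coding bases (18,507).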
The lower GC content at the third codon position is consistent with previous findings that third codon positions are AT-biased in land plant chloroplasts.
SSR markers have been widely used in plant genetics studies and will become an increasingly important genomic resource with the development of next-generation sequencing (NGS) technologies [41]. In this study, we identified a total of 133 SSR loci in the whole chloroplast genome of C. mutica (counting mononucleotide repeats of at least eight units), including 115 mononucleotide, four dinucleotide, three trinucleotide, ten tetranucleotide, and one pentanucleotide SSRs (Table 3). The majority of these SSR loci were mononucleotide repeats (86.47%), of which 91.30% were A/T motifs, consistent with previous reports that chloroplast SSRs are commonly composed of polyadenine (polyA) or polythymine (polyT) repeats [47]. We also compared chloroplast SSRs across the rice tribe (Table 3). The main difference came from mononucleotide SSRs: each Zizaniinae chloroplast genome possessed more than 110 mononucleotide SSRs of at least eight units, whereas each Oryzinae genome sampled possessed 100 or fewer. The numbers of SSRs with other motif lengths were similar across all species examined.

Table 3

Comparison of the number of SSRs in 14 chloroplast genomes from the rice tribe.

Species	Mono-nucleotide, ≥6 units (≥8 units)	Di-nucleotide (≥5 units)	Tri-nucleotide (≥4 units)	Tetra-nucleotide (≥3 units)	Penta-nucleotide (≥3 units)	Hexa-nucleotide (≥3 units)	Total (with mono-nucleotide ≥8 units)
Oryza sativa ssp. japonica	511 (89)	4	3	8	0	1	527 (105)
Oryza nivara	509 (85)	4	3	9	1	0	526 (102)
Oryza barthii	511 (87)	4	3	9	0	2	529 (105)
Oryza glumipatula	509 (87)	4	3	9	0	0	525 (103)
Oryza punctata	497 (91)	4	3	10	0	0	514 (108)
Oryza officinalis	500 (93)	5	3	9	1	0	518 (111)
Oryza australiensis	500 (94)	4	4	9	0	0	517 (111)
Oryza brachyantha	514 (89)	3	3	7	0	0	527 (102)
Leersia tisserantii	505 (100)	2	1	9	2	0	519 (114)
Rhynchoryza subulata	509 (111)	5	2	8	0	0	524 (126)
Zizania latifolia	509 (111)	3	4	10	1	1	528 (130)
Zizania aquatica	515 (116)	3	3	9	2	0	532 (133)
Chikusichloa aquatica	497 (113)	4	3	10	1	0	515 (131)
Chikusichloa mutica	503 (115)	4	3	10	1	0	521 (133)

Dynamic variation of the junctions

The typical quadripartite chloroplast genome has four junctions (JLA, JLB, JSA, and JSB) between the two IRs (IRA and IRB) and the two single-copy regions (LSC and SSC) (Fig 2) [21, 48]. Expansion or contraction of the IR regions shifts these junctions and provides a valuable signal for phylogenetic analysis [48]. Such dynamic variation in the IR regions can also change chloroplast genome size. For example, previous studies have shown that junction variation in Oryza exceeds that in Zizania [15]. Between C. mutica and C. aquatica, no junction length variation was found, similar to the result for the two Zizania species (Fig 2). This limited junction variation indicates a conserved structure in the subtribe Zizaniinae. We also compared junction variation between the Zizaniinae and Oryzinae subtribes (Fig 2).
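Given the region lengths, the four junction coordinates follow directly, and the distance from a border gene to a junction is a simple subtraction. The Python sketch below assumes the conventional linear layout LSC-IRb-SSC-IRa starting at position 1, 1-based inclusive gene coordinates, and a junction lying between the base it names and the next base; these conventions and the function names are assumptions for illustration, not the exact rules used to draw Fig 2. The example uses the C. mutica region lengths from Table 1, while the gene coordinates are hypothetical.

```python
def junction_positions(lsc, irb, ssc, ira):
    """Last base before each junction for a genome laid out as LSC-IRb-SSC-IRa.

    JLB: LSC/IRb, JSB: IRb/SSC, JSA: SSC/IRa, JLA: IRa/LSC (the end of the
    linearized sequence, which wraps back to position 1 of the LSC).
    """
    jlb = lsc
    jsb = jlb + irb
    jsa = jsb + ssc
    jla = jsa + ira
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


def distance_to_junction(gene_start, gene_end, junction):
    """Relate a gene (1-based, inclusive) to a junction lying after base `junction`.

    Returns ("gap", n) with n intergenic bases between gene and junction, or
    ("spans", n) with n bases of the gene extending past the junction.
    """
    if gene_end <= junction:
        return ("gap", junction - gene_end)
    if gene_start > junction:
        return ("gap", gene_start - junction - 1)
    return ("spans", gene_end - junction)


if __name__ == "__main__":
    # C. mutica: LSC 82,327 bp, IRs 20,839 bp, SSC 12,598 bp (Table 1).
    print(junction_positions(82_327, 20_839, 12_598, 20_839))
    # Hypothetical gene coordinates around a junction at base 100.
    print(distance_to_junction(80, 120, 100))  # ('spans', 20)
```

A gene that spans a junction, such as ndhH at JSA in the comparisons below, is reported by its overhang rather than by an intergenic gap.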

Fig 2

The variations of border distances between adjacent genes and four junction regions among 16 grasses’ chloroplast genomes.

Boxes above or below the main line indicate the adjacent border genes, which are represented by the colored boxes shown at the bottom. The LSC, SSC and two IR regions are also color coded. Distances are not drawn to scale.
For JLA, located in the rps19-psbA intergenic region, the distance between rps19 and JLA varied from 41 bp to 49 bp and the distance between psbA and JLA from 81 bp to 83 bp in Oryzinae. In Zizaniinae, these distances ranged from 41 bp to 44 bp and from 81 bp to 82 bp, respectively. For JLB, positioned between rpl22 and rps19, the distance between rpl22 and JLB varied from 24 bp to 30 bp in Oryzinae, whereas in Zizaniinae it was consistently 24 bp. Thus, at both of these junctions, variation in Oryzinae was greater than in Zizaniinae. However, distances at JSA and JSB were more variable than at JLA and JLB. In all species, the ndhH gene spanned JSA; its overlap with the junction varied from 163 bp to 625 bp in Oryzinae, whereas in Zizaniinae the overlap was consistently 181 bp. For JSB, near the ndhF gene, the distance varied from 17 bp to 42 bp in Oryzinae but from 89 bp to 93 bp in Zizaniinae. These comparisons indicate that structural variation at the junctions is wider in Oryzinae than in Zizaniinae, and that JLA and JLB are less variable in length than JSA and JSB. Because the JSB distance in Zizaniinae was more than twice that in Oryzinae, JSB variation could be used as a molecular marker to separate the two subtribes.

Polymorphic variation

The two Chikusichloa chloroplast genomes differ by only 40 bp in length, with C. mutica longer than C. aquatica (Table 2). In addition to total length, we assessed SNP and Indel variation between the entire chloroplast genomes of C. mutica and C. aquatica (Fig 1 and Table 4). In total, only 83 SNPs and 25 Indels were identified. Of the SNPs, 58, 8 (16 when both IR copies are counted) and 9 were in the LSC, IR and SSC regions, respectively; of the 25 Indels, 21, 1 (2) and 2 were in the LSC, IR and SSC regions. Of these, 41, 8 (16) and 7 SNPs and 20, 1 (2) and 2 Indels in the LSC, IR and SSC regions, respectively, were located in non-coding regions. Thus, most polymorphisms (64 SNPs and 24 Indels) were in non-coding regions. Nineteen SNPs and one Indel were found in coding regions, the Indel lying 21 bp into the rps18 gene. Thirteen of the coding SNPs were synonymous substitutions and only six were non-synonymous (S2 Table); these six non-synonymous substitutions occurred in six different genes: matK, rpoB, rpoC2, ndhJ, rpl16 and ndhD. Of the 83 SNPs, 41 were transitions and 42 were transversions; of the 25 Indels, 16 were homopolymer repeats, four were repeat-related Indels and five were independent Indels. Eleven of the 16 homopolymer variants were A/T homopolymers, consistent with previous findings [47].
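The two classifications used above, transitions versus transversions and synonymous versus non-synonymous coding SNPs, can be sketched in Python. This is an illustration, not the pipeline used to produce S2 Table: it assumes each coding SNP has already been mapped to a reference codon and an alternative codon on the coding strand, and it uses the standard genetic code, whose amino-acid assignments are the same as those of the bacterial, archaeal and plant plastid code (NCBI translation table 11). The function and constant names are hypothetical.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes.

    Any other change between two different bases is a transversion.
    """
    ref, alt = ref.upper(), alt.upper()
    return ref != alt and ({ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES)


# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon, alt_codon):
    """'synonymous' if both codons encode the same amino acid (or both are stops)."""
    same = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
    return "synonymous" if same else "non-synonymous"


if __name__ == "__main__":
    print(is_transition("A", "G"), is_transition("A", "C"))  # True False
    print(classify_codon_change("CTT", "CTC"))  # synonymous (Leu -> Leu)
    print(classify_codon_change("GAT", "GAA"))  # non-synonymous (Asp -> Glu)
```

Real plastid genes may also use alternative start codons and RNA editing, which this sketch ignores.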

Table 4

The number and distribution of polymorphisms of chloroplast genome between two Chikusichloa species.

Type	Region	Coding regions	Non-coding regions	Sum
SNP	LSC	17	41	58
	IR	0	16	16
	SSC	2	7	9
	Total	19	64	83

Type	Region	Coding: Indel	Coding: Poly	Coding: Repeat	Non-coding: Indel	Non-coding: Poly	Non-coding: Repeat	Sum
INDEL	LSC	0	0	1	2	16	2	21
	IR	0	0	0	2	0	0	2
	SSC	0	0	0	1	0	1	2
	Total	0	0	1	5	16	3	25

Indel: independent (non-repeat) Indel; Poly: homopolymer Indel; Repeat: repeat-related Indel. IR counts include both IR copies.

Phylogeny

The chloroplast genome has been widely used as an important source of molecular markers in plant systematics [49, 50]. More recently, with the development of high-throughput sequencing, whole chloroplast genomes have been used in phylogenetic studies, an approach known as chloroplast phylogenomics [14, 19, 27]. The conserved structure of grass chloroplast genomes has also been reported from other lineages [14, 37] (S2 Fig). In this study, using the whole chloroplast genome alignment and three inference methods to resolve the relationships among 16 species from the subfamily Ehrhartoideae, with one bamboo species as the outgroup (Fig 3), we recovered two clades corresponding to the subtribes Oryzinae and Zizaniinae with high support (bootstrap values of 100 for ML and MP and a posterior probability of 1.0 for BI). Within each clade, the relationships among species matched the topologies of previous studies based on partial chloroplast and/or nuclear genes [6, 34]. In subtribe Zizaniinae, the two Chikusichloa species, C. mutica and C. aquatica, clustered together as sister species with equal branch lengths, whereas the two Zizania species were resolved on branches of different lengths. The differing branch lengths within Oryzinae suggest heterogeneous evolutionary histories among these lineages with regard to chloroplast evolution.

Fig 3

The chloroplast phylogenomic trees were generated from 17 grass species.

Three methods, Bayesian inference (BI), maximum parsimony (MP) and maximum likelihood (ML), were used to build the tree. Numbers above the branches are posterior probabilities for BI and bootstrap values for MP and ML. Branch lengths are proportional to the number of substitutions, as indicated by the scale bar.

Discussion

In this study, using traditional Sanger sequencing, we completely sequenced the chloroplast genome of Chikusichloa mutica. As C. mutica is an important resource in rice germplasm, its complete chloroplast genome provides a valuable genetic resource for breeding and molecular analysis. Furthermore, the set of conserved primers used in this study could be widely applied to all rice tribe species, as well as to Poaceae in general [14, 35]. The chloroplast genome of C. mutica is highly conserved in structure compared with other published grass chloroplast genomes, with the same gene content and gene number [14, 15, 16, 51]. Compared with C. aquatica, the other Chikusichloa species with a sequenced chloroplast genome, C. mutica showed very limited variation across the whole chloroplast genome (Fig 1).

Sequencing and assembly strategy

Since the first two complete chloroplast genomes were reported from liverwort [52] and tobacco [53] in 1986, knowledge of the organization and evolution of chloroplast genomes has increased rapidly. Currently, more than 1,000 fully sequenced chloroplast genomes have been deposited in public databases, owing to recent developments in NGS technologies [23] and innovations in assembly algorithms [54]. However, traditional Sanger sequencing still yields higher sequencing quality than NGS technologies, although it is more laborious and costly [22]. With NGS and the corresponding assembly methods, dozens or hundreds of chloroplast genomes can be completed in less time [55, 56], but the quality of these assemblies should be carefully scrutinized [22]. For example, Wu et al [22] sequenced one wild rice chloroplast genome with the Sanger method and compared it with another published genome assembled from NGS short reads; they found that the two assemblies differed in both coding and noncoding regions. Although NGS methods can provide high coverage of the assembled genome, some problems remain unresolved. For example, short-read NGS data are difficult to assemble across repeat regions [57]. A further complication is that longer reads, which could span such repeats, appear to contain more sequencing errors [58]. Despite its higher cost and time investment, traditional Sanger sequencing therefore remains one of the most effective ways to complete high-quality genomes. By using this method to complete a high-quality chloroplast genome for the wild rice relative C. mutica, this study provides many informative markers for future studies.
Nevertheless, the high error rates of newer sequencing technologies are being substantially reduced, and these technologies are likely to change how genomes are sequenced. Third-generation sequencing technologies have already been widely used in many species [59, 60]. For example, Pacific Biosciences' Single Molecule Real-Time (SMRT) long-read sequencing can generate reads averaging ~20 kb in length, but the raw-read error rate can be as high as 15% [61]. However, when SMRT reads are combined with short reads such as Illumina reads, or self-corrected given sufficient sequencing depth, the accuracy of the assembled genome can exceed 99.99%.
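Why self-correction with enough depth can recover high accuracy from noisy reads can be illustrated with an idealized majority-vote model. The Python sketch below assumes that read errors at a site are independent and that any erroneous read counts as a vote against the true base (a pessimistic simplification); real SMRT errors are mostly insertions and deletions, and real correction pipelines are far more involved. It is not a model of any particular assembler, the 99.99% figure above is not derived from it, and the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(error_rate, coverage):
    """Probability that a strict majority of `coverage` reads reports the true base.

    Idealized model: errors are independent across reads, and any error counts
    against the true base. Ties (possible at even coverage) count as failures.
    """
    if not 0.0 <= error_rate <= 1.0 or coverage < 1:
        raise ValueError("need 0 <= error_rate <= 1 and coverage >= 1")
    p_correct = 1.0 - error_rate
    needed = coverage // 2 + 1  # smallest strict majority
    return sum(
        comb(coverage, k) * p_correct**k * error_rate ** (coverage - k)
        for k in range(needed, coverage + 1)
    )


if __name__ == "__main__":
    for depth in (1, 5, 15, 25):
        print(depth, f"{majority_vote_accuracy(0.15, depth):.6f}")
```

Under these assumptions, with the 15% raw error rate quoted above, a single read is correct 85% of the time, whereas a strict majority of 25 reads is correct with probability above 99.99%.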

Conserved chloroplast genome features in the grass family

The typical and stable quadripartite structure of chloroplast genomes, in which a pair of IRs separates the LSC and SSC regions, has been reported in thousands of species [21, 26]. First, this conserved structure has been found in all published chloroplast genomes of the grass family [14, 34, 37]. Across Poaceae, whole chloroplast genome length varies from 132 kb to 141 kb [14, 37]. The SSC region, at approximately 12.5 kb, is more stable in length than the LSC and IR regions; by contrast, the LSC region varies from 78.0 kb to 83.5 kb and the IR region from 19.0 kb to 22.0 kb. Variation in genome length is mainly due to expansions and contractions of intergenic regions. The C. mutica genome is intermediate in length relative to other Poaceae chloroplast genomes (Table 2). Secondly, the four junctions of the chloroplast genome [48] were consistently located in the same gene regions (Fig 2). Dynamic placement of the junctions reflects variation in the IR regions [21], so junction positions can be used in phylogenetic analyses [48]. For example, the distances at all four junctions were identical in the two Chikusichloa species but differed among other species (Fig 2). Thirdly, the gene content of all published grass chloroplast genomes is the same as that of C. mutica (S1 Table). A total of 78 unique protein-coding genes, 30 tRNA genes and 4 rRNA genes have been annotated among all grass species [14, 37]. Grasses have lost the infA, accD, ycf1 and ycf2 genes since diverging from their most recent common ancestor with dicots [62]. Although chloroplast genomes in the grass family are highly conserved, numerous microstructural variations (such as small insertions and deletions and SSR variation) have been found, and these constitute a valuable resource for phylogenetic and population analyses [22, 63].
The high-quality chloroplast genome of C. mutica reported here will be a valuable asset for discovering chloroplast variation in other Poaceae species.
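The region lengths above are linked by simple arithmetic: because the IR is present in two copies, total length = LSC + SSC + 2 × IR, and the positions of the four junctions follow from cumulative sums. The Python sketch below checks this for the C. mutica region lengths given in the abstract and tests them against the Poaceae ranges quoted above. It assumes the common linear layout LSC, IRb, SSC, IRa starting at position 1; the class, the method names and the junction labels (JLB, JSB, JSA, JLA) are illustrative choices, not taken from a specific tool. The gene-to-junction distances compared in Fig 2 require the gene annotation and cannot be derived from region lengths alone.

```python
from dataclasses import dataclass

# Poaceae length ranges quoted in the text (bp). The SSC (~12.5 kb) is
# described as stable and is given no range, so it is not checked here.
POACEAE_RANGES = {
    "total": (132_000, 141_000),
    "lsc": (78_000, 83_500),
    "ir": (19_000, 22_000),
}


@dataclass(frozen=True)
class Quadripartite:
    lsc: int  # large single-copy region (bp)
    ssc: int  # small single-copy region (bp)
    ir: int   # length of ONE inverted-repeat copy (bp)

    @property
    def total(self) -> int:
        # Both IR copies (IRa and IRb) count toward the genome length.
        return self.lsc + self.ssc + 2 * self.ir

    def junctions(self) -> dict:
        """Last base (1-based) before each junction.

        Assumed layout: LSC -> IRb -> SSC -> IRa, with position 1 being
        the first LSC base; the genome is circular, so JLA wraps to 1.
        """
        jlb = self.lsc        # LSC | IRb
        jsb = jlb + self.ir   # IRb | SSC
        jsa = jsb + self.ssc  # SSC | IRa
        jla = jsa + self.ir   # IRa | LSC (equals the total length)
        return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}

    def within_poaceae_ranges(self) -> dict:
        values = {"total": self.total, "lsc": self.lsc, "ir": self.ir}
        return {k: lo <= values[k] <= hi for k, (lo, hi) in POACEAE_RANGES.items()}


# Region lengths reported for C. mutica in the abstract.
mutica = Quadripartite(lsc=82_327, ssc=12_598, ir=20_839)
print(mutica.total)  # 136603
print(mutica.junctions())
# {'JLB': 82327, 'JSB': 103166, 'JSA': 115764, 'JLA': 136603}
print(mutica.within_poaceae_ranges())
# {'total': True, 'lsc': True, 'ir': True}
```

The region lengths sum exactly to the reported 136,603 bp, and each checked value falls inside the quoted Poaceae range, consistent with the statement that C. mutica is intermediate in length.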

Limited variation within the Chikusichloa genus

Polymorphic markers in chloroplast genomes between different species have provided an abundance of informative loci for plant systematics and DNA barcoding research [49, 64]. In this study, we comprehensively compared the polymorphisms, including SNPs and Indels, between the two fully sequenced chloroplast genomes of C. mutica (KU696970) and C. aquatica (KR078265). We found extremely limited variation, with only 83 SNPs and 24 Indels in the 136,640-bp alignment matrix between the two species. Most polymorphisms in coding genes are synonymous; only six SNPs, each in a different gene, are non-synonymous. This suggests that few of these polymorphisms are likely to be adaptive. In contrast, 744 SNPs and 137 Indels were reported between the Zizania species Z. latifolia and Z. aquatica [15]. Several reasons might explain this difference between the two genera. First, if the two Zizania species diverged earlier than the two Chikusichloa species, more variation could have accumulated. However, the estimated divergence times within the two genera are nearly equal, at approximately 4 million years ago (MYA) [34], so differences in divergence time do not explain the difference in polymorphism. Second, species distributions might drive the difference: all three Chikusichloa species occur in Southeast Asia, whereas Zizania has a broad geographic distribution, with Z. latifolia in Asia and Z. aquatica in North America [8]. This disjunct distribution, which indicates a broad radiation and/or a long-distance dispersal event, might explain the higher level of polymorphism in Zizania. Lineage-specific variation in each chloroplast genome may partly reflect such long-distance separation [25, 65]. This is consistent with the phylogenetic relationships (Fig 3): the two Chikusichloa species are separated by extremely short branches, whereas the branches separating the two Zizania species are longer.
Other factors could also contribute to these differences, such as the efficiency of the plastid DNA polymerase, differences in molecular evolutionary rate, and demographic history. Additional work is needed to clarify the causes of the different levels of polymorphism found in Zizaniinae.
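The SNP and Indel counts above come from a pairwise whole-genome alignment. The exact counting rules used in the study are not stated in this section, so the Python sketch below shows only one plausible convention: a SNP is an ungapped column with differing bases, and an indel event is a maximal run of gap columns in the same sequence. It also shows how a coding SNP can be classified as synonymous or non-synonymous by translating the reference and mutant codons. The function names, the toy sequences, and the treatment of ambiguous bases (counted like any other base) are assumptions for illustration.

```python
from itertools import product

# Standard genetic code; the plastid code (NCBI table 11) uses the same
# amino-acid assignments. Codons are enumerated in TCAG order; "*" = stop.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def count_polymorphisms(a: str, b: str) -> tuple:
    """Count SNPs and indel events between two aligned sequences.

    Assumed conventions: a SNP is an ungapped column with different bases
    (ambiguity codes such as N are treated like any other base); an indel
    event is a maximal run of consecutive gap ("-") columns in the same
    sequence. Other studies may merge or split gap runs differently.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    snps = indels = 0
    gap_in = None  # which sequence carried a gap in the previous column
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # gap in both sequences: ignore the column
        if x == "-" or y == "-":
            current = "a" if x == "-" else "b"
            if current != gap_in:
                indels += 1  # a new gap run starts here
            gap_in = current
            continue
        gap_in = None  # an ungapped column ends any gap run
        if x != y:
            snps += 1
    return snps, indels


def classify_coding_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a base change at 0-based `pos` of an in-frame, ungapped
    coding sequence by comparing the reference and mutant codons."""
    start = pos - pos % 3
    codon = cds[start:start + 3].upper()
    offset = pos % 3
    mutant = codon[:offset] + alt.upper() + codon[offset + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutant]:
        return "synonymous"
    return "non-synonymous"


# Toy example (hypothetical sequences, not data from this study).
print(count_polymorphisms("ATGAAA--CTTGGA", "ATGAAGTTCTAG-A"))  # (2, 2)
print(classify_coding_snp("ATGAAA", 5, "G"))  # synonymous (AAA -> AAG, both Lys)
print(classify_coding_snp("ATGAAA", 3, "G"))  # non-synonymous (AAA -> GAA, Lys -> Glu)
```

Applied to the totals reported above, 83 SNPs in a 136,640-column alignment amounts to roughly 6 SNPs per 10 kb.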

Conclusion

Using traditional high-quality Sanger sequencing, we presented the complete chloroplast genome of Chikusichloa mutica, performed comparative analyses with related species of the rice tribe, and deposited the genome in GenBank under accession number KU696970. The gene content, gene number and genome organization of C. mutica are identical to those of all other chloroplast genomes from Poaceae. The whole-genome comparison revealed limited variation between the two Chikusichloa species, with only 83 SNPs and 24 Indels between them. Phylogenetic analysis using whole-genome sequences from 17 grass species demonstrated the close relationship of the two Chikusichloa species and confirmed their phylogenetic position relative to other rice tribe species. The full chloroplast genome of C. mutica will facilitate biological studies of this important wild rice relative. Furthermore, the chloroplast genome sequence is a valuable genetic resource for population studies of this species and can help shed light on its genetic mechanisms and evolutionary history.

The full chloroplast reference genome of Chikusichloa mutica.

Genes drawn inside the outer circle are transcribed counterclockwise, and genes drawn outside it are transcribed clockwise. In the inner circle, darker gray indicates GC content and lighter gray indicates AT content. Genes belonging to different functional groups are color coded. LSC = large single copy; IR = inverted repeat; SSC = small single copy. (TIF)

Whole chloroplast genome sequence identity plots for two Chikusichloa species and two Zizania species, with O. sativa ssp. japonica (AY522330) as the reference genome.

The vertical scale indicates percent sequence identity (50%-100%). The horizontal axis shows base positions in the AY522330 chloroplast genome. Genome regions are color coded as protein-coding, rRNA, tRNA, intron, and conserved noncoding sequences (CNS), as shown in the key at the bottom. The diagram was generated with mVISTA (http://genome.lbl.gov/vista/mvista/submit.shtml). (EPS)
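mVISTA computes the identity values in this plot from its own alignments, and its exact windowing and scoring are not reproduced here. As a rough illustration of what the vertical axis represents, the Python sketch below computes percent identity in sliding windows along the coordinates of the reference genome from a single pairwise alignment. The function name, window size, step and gap scoring are assumptions.

```python
def window_identity(ref: str, query: str, window: int = 100, step: int = 1) -> list:
    """Percent identity in sliding windows along reference coordinates.

    `ref` and `query` are one pairwise alignment (equal length, "-" = gap).
    Columns with a gap in the reference are dropped so that windows are
    measured in reference bases; a gap in the query scores as a mismatch.
    The window size, step and gap scoring are assumptions, not mVISTA's
    documented settings. Returns (1-based reference start, percent) pairs.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the query matches the reference base, else 0.
    matches = [int(r.upper() == q.upper()) for r, q in zip(ref, query) if r != "-"]
    # Prefix sums give each window's match count in constant time.
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    return [
        (s + 1, 100.0 * (prefix[s + window] - prefix[s]) / window)
        for s in range(0, len(matches) - window + 1, step)
    ]


# Toy example with 5-bp windows (hypothetical sequences).
print(window_identity("ACGTACGTAC", "ACGTACG-AT", window=5, step=5))
# [(1, 100.0), (6, 60.0)]
```

Dropping reference-gap columns keeps the x-axis in reference positions, which matches how the horizontal axis of this plot is described.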

Whole chloroplast genome alignment of 17 species from the grass family.

(NEX)

Gene content encoded in the C. mutica chloroplast genome.

(DOCX)

Polymorphic information from comparisons between two Chikusichloa species.

(XLSX)

57 in total

1. Comparative phylogeography of the wild-rice genus Zizania (Poaceae) in eastern Asia and North America.

Authors: Xin-Wei Xu; Jin-Wei Wu; Mei-Xia Qi; Qi-Xiang Lu; Peter F Lee; Sue Lutz; Song Ge; Jun Wen
Journal: Am J Bot Date: 2015-01-22

2. Genome size is not correlated with effective population size in the Oryza species.

Authors: Bin Ai; Zhao-Shan Wang; Song Ge
Journal: Evolution Date: 2012-05-14

3. Next-generation sequencing platforms.

Authors: Elaine R Mardis
Journal: Annu Rev Anal Chem (Palo Alto Calif) Date: 2013

4. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors: Kazutaka Katoh; Daron M Standley
Journal: Mol Biol Evol Date: 2013-01-16

5. Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors: Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal: Proc Natl Acad Sci U S A Date: 2007-11-28

6. Phylogeography of the Sino-Himalayan fern Lepisorus clathratus on "the roof of the world".

Authors: Li Wang; Zhi-Qiang Wu; Nadia Bystriakova; Stephen W Ansell; Qiao-Ping Xiang; Jochen Heinrichs; Harald Schneider; Xian-Chun Zhang
Journal: PLoS One Date: 2011-09-30

7. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space.

Authors: Fredrik Ronquist; Maxim Teslenko; Paul van der Mark; Daniel L Ayres; Aaron Darling; Sebastian Höhna; Bret Larget; Liang Liu; Marc A Suchard; John P Huelsenbeck
Journal: Syst Biol Date: 2012-02-22

8. Resolving deep relationships of PACMAD grasses: a phylogenomic approach.

Authors: Joseph L Cotton; William P Wysocki; Lynn G Clark; Scot A Kelchner; J Chris Pires; Patrick P Edger; Dustin Mayfield-Jones; Melvin R Duvall
Journal: BMC Plant Biol Date: 2015-07-11

9. Construction, alignment and analysis of twelve framework physical maps that represent the ten genome types of the genus Oryza.

Authors: HyeRan Kim; Bonnie Hurwitz; Yeisoo Yu; Kristi Collura; Navdeep Gill; Phillip SanMiguel; James C Mullikin; Christopher Maher; William Nelson; Marina Wissotski; Michele Braidotti; David Kudrna; José Luis Goicoechea; Lincoln Stein; Doreen Ware; Scott A Jackson; Carol Soderlund; Rod A Wing
Journal: Genome Biol Date: 2008-02-28

10. Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology.

Authors: Richard Cronn; Aaron Liston; Matthew Parks; David S Gernandt; Rongkun Shen; Todd Mockler
Journal: Nucleic Acids Res Date: 2008-08-27
