Complete Chloroplast Genomes of Erianthus arundinaceus and Miscanthus sinensis: Comparative Genomics and Evolution of the Saccharum Complex.

Shin-Ichi Tsuruta1, Masumi Ebina2, Makoto Kobayashi2, Wataru Takahashi2.   

Abstract

The genera Erianthus and Miscanthus, both members of the Saccharum complex, are of interest as potential resources for sugarcane improvement and as bioenergy crops. Recent studies have mainly focused on the conservation and use of wild accessions of these genera as breeding materials. However, the sequence data are limited, which hampers the studies of phylogenetic relationships, population structure, and evolution of these grasses. Here, we determined the complete chloroplast genome sequences of Erianthus arundinaceus and Miscanthus sinensis by using 454 GS FLX pyrosequencing and Sanger sequencing. Alignment of the E. arundinaceus and M. sinensis chloroplast genome sequences with the known sequence of Saccharum officinarum demonstrated a high degree of conservation in gene content and order. Using the data sets of 76 chloroplast protein-coding genes, we performed phylogenetic analysis in 40 taxa including E. arundinaceus and M. sinensis. Our results show that S. officinarum is more closely related to M. sinensis than to E. arundinaceus. We estimated that E. arundinaceus diverged from the subtribe Sorghinae before the divergence of Sorghum bicolor and the common ancestor of S. officinarum and M. sinensis. This is the first report of the phylogenetic and evolutionary relationships inferred from maternally inherited variation in the Saccharum complex. Our study provides an important framework for understanding the phylogenetic relatedness of the economically important genera Erianthus, Miscanthus, and Saccharum.

Year:  2017        PMID: 28125648      PMCID: PMC5268433          DOI: 10.1371/journal.pone.0169992

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The Poaceae is the grass family comprised of approximately 700 genera and more than 10,000 species and grouped into two major clades, BEP (the subfamilies Bambusoideae, Ehrhartoideae, and Pooideae) and PACMAD (the subfamilies Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae) [1-3]. The Andropogoneae is one of the tribes of the Panicoideae that includes many economically important C4 grasses such as maize (Zea mays L.), sorghum (Sorghum bicolor L. Moench), and sugarcane (Saccharum spp.). The genera Saccharum, Erianthus, and Miscanthus are members of the subtribe Saccharinae within the Andropogoneae [4]. Erianthus and Miscanthus exhibit diverse important agricultural traits such as high productivity, high percentage of dry matter, good ratooning ability, vigor, and resistance to environmental stresses [5-7]. These genera are cross-compatible with Saccharum species [8], and sugarcane breeders have created intergeneric hybrids between commercial Saccharum spp. hybrids and these genera [9-12]. Thus, Erianthus and Miscanthus have attracted attention as potential genetic resources for sugarcane improvement [13-15]. In addition to their favorable agricultural traits, the low ash content and high heating value make Erianthus and Miscanthus promising cellulosic feedstocks at energy conversion plants; they can be used for methanol synthesis by gasification and for direct combustion [6, 7, 16]. Ongoing studies focus on the conservation and use of wild Erianthus and Miscanthus accessions as breeding materials [17, 18]. Despite current interest, the taxonomy and phylogenetic relatedness of Saccharum and these related genera have been controversial until recently, because the common criterion, variation of the awn on the lemma, used for differentiation within these genera does not clearly distinguish between the genera [12]. 
Therefore, Erianthus and Miscanthus have been regarded by some taxonomists as being synonymous with Saccharum and have been grouped into the so-called ‘Saccharum complex’ [19], which includes the members of Saccharum L., Erianthus Michx., Miscanthus Anderss., Narenga Bor, Sclerostachya A. Camus. This theory is widely accepted by sugarcane breeders [20]. Phylogenetic analyses based on molecular data have been employed to reconstruct the phylogeny of the Saccharum complex. In these studies, DNA variation detected by using DNA markers developed from nuclear genomes [8, 10, 14, 17, 21–24], was used to assess genetic diversity among wild accessions in these genera. Welker et al. [25] showed that a phylogenetic tree inferred from low-copy nuclear loci was useful for understanding the relationships between polyploid taxa and identifying allopolyploidization events in Saccharum and related genera. In addition, the data sets of partial sequences [21] and DNA markers [8, 14, 26–28] developed from organelle genomes were also used to estimate the phylogenetic relationships between the species and genera of the Saccharum complex. These studies have provided valuable insight into the phylogenetic relations within the Saccharum complex: (1) Saccharum is more closely related to Miscanthus than to Erianthus; (2) Erianthus is more closely related to Sorghum than to the other members of the Saccharum complex; (3) the evolutionary history of Erianthus may differ from that of other members of the Saccharum complex. These results have indicated the potential of this approach to elucidate the phylogenetic relationships within the Saccharum complex. Because the chloroplast (cp) genome has conserved gene content and uniparental inheritance [29], polymorphism within the chloroplast genome is a valuable tool for phylogenetic and evolutionary studies [30]. 
To date, only 12 cpDNA markers [14] and 28 partial sequences are registered for Erianthus arundinaceus in GenBank; therefore, there is a clear need for additional sequence information on the E. arundinaceus cp genome. Comparison of the complete cp genome sequences could reveal novel genome features such as single-nucleotide polymorphisms (SNPs), insertions/deletions (indels), and microsatellites. This information would improve the analyses of the relationships in the Saccharum complex, especially for Erianthus. Multiple alignments of complete cp genomes reveal sequence variability, which is needed for the development of DNA markers for taxonomic and evolutionary studies. In the Saccharum complex, the complete cp genome sequences were first reported for Saccharum officinarum in 2004 [31, 32] and more recently for Miscanthus sinensis [33]. However, because the E. arundinaceus cp genome has not been fully sequenced, a whole-genome comparison between these major genera of the Saccharum complex has not yet been possible. Recent advances in pyrosequencing, which allows high-throughput sequence analysis of a wide range of genomes, have simplified sequencing, considerably increased its speed, and reduced its cost. This approach enables faster and more efficient determination of whole cp genome sequences, and has been applied to many plant species, including those in the Poaceae [33, 34]. In this study, we present the complete cp genome sequence of Erianthus arundinaceus determined using pyrosequencing. On the basis of this sequence, we designed a primer set that is useful both for validation of ambiguous sites such as homopolymeric and gap regions in Poaceae cp genomes and for sequencing of entire cp genomes; we used these primers to sequence the whole cp genome of Miscanthus sinensis. Our analysis of these cp genomes provides detailed data on the distribution of SNPs, indels, and microsatellites in Saccharum and related genera. We also discuss the evolution of the Saccharum complex based on the sequence variations of these cp genomes.

Results

Assembly and annotation of the chloroplast genomes of Erianthus arundinaceus and Miscanthus sinensis

The E. arundinaceus cp genome was sequenced by pyrosequencing on the 454 GS FLX system. A total of 481,406 sequence reads (average, 336 bp; range, 30–897 bp) were generated, representing approximately 162 Mbp of sequence. After the reads were filtered by local BLASTN analysis with the S. officinarum cp genome (GenBank accession No. NC_006084) as a reference, 5,052 reads (average, 362 bp) were retained, giving 12-fold coverage of the cp genome. There were 30 homopolymeric stretches (≥10 bp), which may lead to errors in the assembled sequences [35]. The accuracy of these regions and of the inverted repeat (IR) junction regions in the assembled sequences was confirmed by PCR-based sequencing. Thus, the complete E. arundinaceus cp genome sequence was obtained. To determine the complete sequence of the M. sinensis cp genome, we used Sanger sequencing with primers designed from the E. arundinaceus cp genome sequence. Sixteen overlapping regions were amplified with specific primers (Table 1), and a total of 320 sequence reads were obtained by using 258 primers, of which 253 (98.1%) were identical to both the M. sinensis and S. officinarum cp genome sequences and 251 (97.3%) were also identical to that of Sorghum bicolor (S1 Table).
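The read statistics above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes depth is approximated as (number of reads × mean read length) / genome length, and it uses the final E. arundinaceus genome length (141,210 bp) as the denominator, which is an assumption made here rather than a calculation described in the text. Under these assumptions the depth comes out at roughly 13-fold, close to the reported 12-fold; the small difference may reflect how the coverage was actually computed.

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_len: int) -> float:
    """Approximate sequencing depth: total read bases divided by genome length."""
    return n_reads * mean_read_len / genome_len

# Figures quoted in the text.
total_mbp = 481_406 * 336 / 1e6                # yield of all 454 reads, in Mbp
cp_depth = fold_coverage(5_052, 362, 141_210)  # reads retained after BLASTN filtering

# Share of the 258 primers identical to the other cp genomes.
pct_sinensis_officinarum = round(100 * 253 / 258, 1)
pct_sorghum = round(100 * 251 / 258, 1)

print(f"total yield ~{total_mbp:.0f} Mbp")
print(f"cp depth    ~{cp_depth:.1f}x")
print(f"primer identity: {pct_sinensis_officinarum}% and {pct_sorghum}%")
```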
Table 1

Primer pairs used for amplification of Miscanthus sinensis cp genome.

Primer pair   Primer sequences (5′ to 3′; forward followed by reverse)   Tm (°C)   PCR product location¹   Start²   End²   Length (bp)
ES01   TTGTGAGCATTACGTTCGTGCGCTGAGTGGTTGATAGCTCCG   60   LSC   140   12113   11974
ES02   TGATCGTGATTTGGAACCTGTTCATTGAAGCATCTCGCACCTT   58   LSC   11835   19267   7433
ES03   AATGAAAGGGTCTGGTTGGACCAATTGCATGCGTCTAATC   58   LSC   18929   26001   7073
ES04   AGAGTGCCTAATCACGAGGATCCCCTCTTGTATCATCAACCCATCG   60   LSC   25073   37410   12338
ES05   AACAAAGGGCGATGAATCAGAACCGTTCAAGCTGTTCCTG   56   LSC   36910   38656   1747
ES06   GTCGAATTTGCAGAAGGGACGAGGAGTTCTTGTCGCACTCCTTTGTG   60   LSC   37314   50226   12913
ES07   GTGGATTAATCGGACGAGGAACTGCAGCTCCTGCTTCTTC   58   LSC   49983   57728   7746
ES08   GCAGGCGCAGATCTATGAATCCTTTGCTCTGATGGTTGGAATC   58   LSC   56347   63939   7593
ES09   GGCTAGTTGAGTAGTTTTGATTAAGGAGACCGTGGAGGATCCACAATAG   60   LSC   63690   76120   12431
ES10   CCATGAACAGGCTCCGTAAGCGTTATGATACTGAATCTCATGCC   58   LSC   75648   82881   7234
ES11   TGGATTATGACGTGGATTGTATCGGTAGGACTGGTGCCGACAGTTCATC   58   LSC-IRA   82318   94864   12547
ES12   CCAAACATATGCGGATCAAATCACGGAATATTGGAGTTAACCATATTATC   56   IRA-SSC   94130   106419   12290
ES13   CCAAATTCCAGATTCCAGCAAAACCATTGCTTCGTCTGGT   54   SSC   105400   112986   7587
ES14   CCCATGTGAGATACGGAGGATGAAATTCTCGAGCCCAAAG   56   SSC   111573   119368   7796
ES15   TGTAAATACCCTAATATAGGTTCGCCCAAACATATGCGGATCAAATCACG   56   SSC-IRB   118477   130428   11952
ES16   GTAGGACTGGTGCCGACAGTTCATCTAGGTATTAGTACTATGGCATTC   60   IRB-LSC   129694   290   12013

1 LSC: Large single-copy, SSC: Small single-copy, IRA: Inverted repeat A, IRB: Inverted repeat B.

2 Position (base pairs) in the M. sinensis chloroplast genome sequence.
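The product lengths in Table 1 follow from the start and end coordinates on a circular genome: length = end − start + 1, except for the last amplicon (ES16), which crosses the origin of the circular map. The sketch below checks a few rows under that assumption; the function name `amplicon_length` and the choice of rows are illustrative, and the genome length is the 141,416 bp reported for M. sinensis.

```python
GENOME_LEN = 141_416  # M. sinensis cp genome length reported in the text

def amplicon_length(start: int, end: int, genome_len: int = GENOME_LEN) -> int:
    """Length of a PCR product on a circular genome (1-based, inclusive coordinates)."""
    if end >= start:
        return end - start + 1
    # Product wraps around the origin: tail of the genome plus the head.
    return (genome_len - start + 1) + end

# (start, end, reported length) for a few primer pairs from Table 1.
TABLE1_ROWS = {
    "ES01": (140, 12113, 11974),
    "ES05": (36910, 38656, 1747),
    "ES11": (82318, 94864, 12547),
    "ES16": (129694, 290, 12013),  # spans the IRB-LSC junction across the origin
}

mismatches = [
    name for name, (start, end, reported) in TABLE1_ROWS.items()
    if amplicon_length(start, end) != reported
]
print("rows that disagree with Table 1:", mismatches)
```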

The complete cp genomes of E. arundinaceus (141,210 bp) and M. sinensis (141,416 bp) had typical circular structures (Fig 1). The cp genome of E. arundinaceus included a large single-copy (LSC) region (83,170 bp) and a small single-copy (SSC) region (12,516 bp), which were separated by a pair of IRs (IRa and IRb; 22,762 bp each); that of M. sinensis consisted of an LSC (83,141 bp), an SSC (12,681 bp), and two IRs (22,797 bp each). The GC content was 38.5% in the E. arundinaceus genome and 38.4% in the M. sinensis genome; these values were similar to those of other Panicoideae including S. officinarum [31], M. sinensis [33], and S. bicolor [36]. The number of genes was 143 in E. arundinaceus and 141 in M. sinensis, including 86 and 84 protein-coding genes, respectively. Each genome contained 8 ribosomal RNA (rRNA) genes and 49 transfer RNA (tRNA) genes. Coding genes accounted for 58.9% (E. arundinaceus) and 58.4% (M. sinensis) of the genomes (Table 2). The difference in the gene number was due to a difference in ycf68 in the IR regions, which appeared to be a pseudogene in M. sinensis because of a frame-shift mutation. S. officinarum and E. arundinaceus have the complete ycf68 open reading frame, whereas S. bicolor has a frame-shift mutation at the same position as in M. sinensis. The members of the Saccharum complex also have lost accD, ycf1, and ycf2, which are also absent in the cp genomes of other Panicoideae grasses [33, 36, 37]. We also found that the start codons of the rpl2 and rps19 genes are likely to convert to ACG and GTG via RNA editing during translation both in E. arundinaceus and M. sinensis, as reported in other species [37-39].
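GC content, as quoted above, is the percentage of G and C among the unambiguous bases of a sequence. A minimal sketch follows; the function name `gc_content` and the decision to ignore ambiguity codes such as N are assumptions made for illustration, not a description of the software used in the study.

```python
def gc_content(seq: str) -> float:
    """Percentage of G+C among the A/C/G/T bases of a sequence (ambiguity codes ignored)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)

# Toy example; a real genome would be read from a FASTA file.
toy = "ATGCATTTAAGC"
print(f"GC = {gc_content(toy):.1f}%, AT = {100 - gc_content(toy):.1f}%")
```

The AT percentage is simply the complement, which is how the paired GC/AT values in Table 2 sum to 100.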
Fig 1

Chloroplast genome maps of Erianthus arundinaceus and Miscanthus sinensis.

The genes of different functional groups are indicated in different colors. Genes on the inside and outside of the maps are transcribed clockwise and counter-clockwise, respectively. The thick lines on the inner circles indicate inverted repeats (IRa and IRb), which separate the genomes into the small single-copy (SSC) and large single-copy (LSC) regions.

Table 2

Characteristics of the chloroplast genomes in three genera of the Saccharum complex and Sorghum bicolor.

Species   Total genome length¹   LSC length¹   SSC length¹   IR length¹   GC/AT contents (%)   Total genes²   CDS³   rRNA   tRNA   GenBank accession No.
S. officinarum   141,182   83,048   12,544   22,795   38.4/61.6   136   87 (8)⁴   8 (4)⁴   41 (8)⁴   NC_006084 [31]
E. arundinaceus   141,210   83,170   12,516   22,762   38.5/61.5   136   87 (8)   8 (4)   41 (8)   LC160130 [This study]
M. sinensis   141,416   83,141   12,681   22,797   38.4/61.6   134   85 (7)   8 (4)   41 (8)   LC160131 [This study]
M. sinensis   141,372   83,163   12,659   22,775   38.4/61.6   134   85 (7)   8 (4)   41 (8)   NC_028721 [33]
M. sacchariflorus   141,332   83,207   12,575   22,775   38.4/61.6   134   85 (7)   8 (4)   41 (8)   NC_028720 [33]
S. bicolor   140,754   82,688   12,502   22,782   38.5/61.5   135   86 (7)   8 (4)   41 (8)   NC_008602 [36]

1 Length is indicated in base pairs.

2 Including genes detected in this study (not annotated in GenBank).

3 Including ycf15 and ycf68.

4 The numbers of duplicated genes are shown in parentheses.
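Because the cp genome is quadripartite, the total length in Table 2 should equal LSC + SSC + 2 × IR, and the total gene count should equal CDS + rRNA + tRNA. The following sketch checks both identities for every row of the table; the dictionary layout and labels are illustrative.

```python
# (total, LSC, SSC, IR, total genes, CDS, rRNA, tRNA) as listed in Table 2.
TABLE2 = {
    "S. officinarum":           (141_182, 83_048, 12_544, 22_795, 136, 87, 8, 41),
    "E. arundinaceus":          (141_210, 83_170, 12_516, 22_762, 136, 87, 8, 41),
    "M. sinensis (this study)": (141_416, 83_141, 12_681, 22_797, 134, 85, 8, 41),
    "M. sinensis [33]":         (141_372, 83_163, 12_659, 22_775, 134, 85, 8, 41),
    "M. sacchariflorus":        (141_332, 83_207, 12_575, 22_775, 134, 85, 8, 41),
    "S. bicolor":               (140_754, 82_688, 12_502, 22_782, 135, 86, 8, 41),
}

def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Genome length implied by one LSC, one SSC, and two copies of the IR."""
    return lsc + ssc + 2 * ir

inconsistent = []
for species, (total, lsc, ssc, ir, genes, cds, rrna, trna) in TABLE2.items():
    if quadripartite_total(lsc, ssc, ir) != total or cds + rrna + trna != genes:
        inconsistent.append(species)

print("rows with inconsistent totals:", inconsistent)
```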


Sequence variations in cp genomes

We compared the sequences determined in this study with those previously registered in GenBank (28 partial sequences including 10 regions for E. arundinaceus and the whole cp genome sequence for M. sinensis). For E. arundinaceus, sequence variations were identified at ten sites in seven regions, of which four sites (in trnG–trnfM, atpB–rbcL, the trnK intron, and the rpl16 intron) were in repeat regions (poly A or T). Base substitutions were detected at six sites (atpA–rps14, three sites in the rpl16 intron, rps16–trnQ, and rps3). In the atpA–rps14 intergenic spacer region, we found an adenine-to-cytosine transversion (A in Japanese accessions and C in Indonesian accessions), which could reflect geographical variation (S1 Fig). Detailed comparison between the M. sinensis sequence determined in this study and the previously reported one [33] (NC_028721) detected three SNPs and nine indels. Of these, an indel in rpoC2 and a SNP in ycf3 resulted in amino acid sequence changes (S2 Table).
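Base substitutions are conventionally classed as transitions (purine to purine, or pyrimidine to pyrimidine) or transversions (purine to pyrimidine, or the reverse). Under that convention, the A/C difference in the atpA–rps14 spacer is a transversion. A short sketch of the classification, with an illustrative function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref!r} -> {alt!r}")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"

print(classify_substitution("A", "C"))  # the atpA-rps14 spacer difference
print(classify_substitution("A", "G"))
```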

Whole-genome comparison in the Saccharum complex

A global alignment of the Saccharum complex cp genomes with the Zea mays cp genome (NC_001666) as a reference is shown in Fig 2. High sequence similarity was detected in the protein-coding regions. The IR regions showed lower levels of sequence divergence than the single-copy regions, although there was some gene loss. The gene order was identical in E. arundinaceus, M. sinensis, and S. officinarum. However, detailed comparisons within the Saccharum complex revealed a number of SNPs and indels (Table 3). The rates of nonsynonymous (dN) and synonymous (dS) substitutions and their ratio (dN/dS) among the 76 protein-coding genes, with Z. mays as a reference, are shown in Table 4. The dN (0.0039) and dS (0.0170) values of E. arundinaceus were slightly higher than those of the other genera of the Saccharum complex. The dN/dS values of the Saccharum complex were smaller than 1.0, similar to those of other Poaceae [40-42]; these values suggest purifying selection on the cp protein-coding genes in these genera.
Fig 2

Alignment of whole chloroplast genome sequences from four Panicoideae species.

Chloroplast genomes were aligned by using the mVISTA program with the Zea mays sequence as a reference. The X- and Y-scales indicate the coordinates within cp genomes and the percentage of identity (50%–100%), respectively. Genome regions (exons, introns, and conserved non-coding sequences) are color-coded. Gray arrows indicate the direction of transcription of each gene. The genes encoding transfer RNAs (trn) are indicated under gray arrows using the single-letter amino acid code (e.g., K: trnK).

Table 3

SNPs and indels between Erianthus arundinaceus and Miscanthus sinensis chloroplast genomes.

Category   SNP   Indel   Total
Photosystem I   psaA (2), psaC (1)   -   3
Photosystem II   psbB (4), psbC (4), psbD (1), psbE (4), psbM (1), psbN (1), psbT (1), psbZ (1)   -   17
ATP synthase   atpA (1), atpB (4), atpI (2)   -   7
Cytochrome   petA (2), petB (2), petD (3)   petB (1)   8
NADPH   ndhA (1), ndhB (1), ndhC (2), ndhD (6), ndhF (7), ndhG (3), ndhH (6), ndhI (1), ndhJ (3), ndhK (2)   ndhA (1)   33
Transcription   rpoA (4), rpoB (11), rpoC1 (7), rpoC2 (14)   -   36
Ribosomal proteins (Small subunit)   rps2 (1), rps3 (4), rps8 (1), rps11 (1), rps14 (2), rps15 (1), rps18 (2)   -   12
Ribosomal proteins (Large subunit)   rpl14 (1), rpl16 (1), rpl20 (2), rpl22 (1), rpl32 (1), rpl33 (1)   -   7
Other   infA (1), ycf3 (2), ycf4 (1), ycf68 (1), matK (12), ccsA (5)   matK (1), ccsA (1)   24
Rubisco   rbcL (5)   -   5
Non-coding   Intron (38), IGS¹ (360)   124   522
Total   546 (148)²   128   674

1 Intergenic spacer region.

2 Parenthesis shows SNPs in protein-coding genes.

Table 4

Substitution rates on 76 protein-coding chloroplast genes in three genera of the Saccharum complex and Sorghum bicolor.

Species   dN¹   dS¹   dN/dS
S. officinarum   0.0030±0.0007   0.0144±0.0030   0.2460
E. arundinaceus   0.0039±0.0010   0.0170±0.0030   0.2433
M. sinensis   0.0036±0.0010   0.0152±0.0030   0.2548
S. bicolor   0.0038±0.0009   0.0183±0.0028   0.2140

1 dN: the rates of nonsynonymous, dS: the rates of synonymous substitutions, Zea mays was used as a reference.
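The dN/dS ratio is commonly read as an indicator of selection: values below 1 suggest purifying selection, values near 1 neutrality, and values above 1 positive selection. The sketch below applies that rule of thumb to the ratios reported in Table 4. It also prints the ratio of the rounded mean dN and dS values, which differs slightly from the reported ratio, presumably because the published figure was not derived from the rounded means; that explanation is an assumption, as the text does not say how the ratio was obtained.

```python
# (mean dN, mean dS, reported dN/dS) from Table 4.
TABLE4 = {
    "S. officinarum":  (0.0030, 0.0144, 0.2460),
    "E. arundinaceus": (0.0039, 0.0170, 0.2433),
    "M. sinensis":     (0.0036, 0.0152, 0.2548),
    "S. bicolor":      (0.0038, 0.0183, 0.2140),
}

def selection_regime(omega: float) -> str:
    """Conventional reading of dN/dS (omega)."""
    if omega < 1:
        return "purifying"
    if omega > 1:
        return "positive"
    return "neutral"

regimes = {}
for species, (dn, ds, omega) in TABLE4.items():
    regimes[species] = selection_regime(omega)
    print(f"{species:16s} reported={omega:.4f}  ratio of means={dn / ds:.4f}  {regimes[species]}")
```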

The distribution of microsatellites (also called simple sequence repeats) in the cp genomes of E. arundinaceus and M. sinensis is shown in Table 5. A total of 40 microsatellite regions (≥8 bp) were identified in E. arundinaceus, including 36 mono-, 3 tri-, and one tetranucleotide repeats. In M. sinensis, a total of 38 regions were identified, including 36 mono-, one tri-, and one tetranucleotide repeats. The majority of repeats were located in non-coding regions, whereas some were found in genes such as psbC, rpoB, ndhK, infA, and rpl22. Two microsatellites (in rps16–trnQ/UUG and trnR/UCU–trnfM/CAU) were found in E. arundinaceus but not in M. sinensis.
Table 5

Microsatellites in Erianthus arundinaceus and Miscanthus sinensis chloroplast genomes.

Location   Motif   E. arundinaceus sequence   Start   End   M. sinensis sequence   Start   End
matK-trnK/UUU   Mono   (A/T)8   3548   3555   (A/T)10   3550   3560
matK-trnK/UUU   Mono   (A/T)15   3756   3771   (A/T)13   3760   3773
trnK/UUU-rps16   Mono   (A/T)11   4118   4129   (A/T)10   4120   4130
rps16-trnQ/UUG   Tri   (ATT)4   5844   5856   -   -   -
rps16-trnQ/UUG   Mono   (A/T)11   6417   6428   (A/T)13   6214   6227
psbK-psbI   Mono   (A/T)10   7757   7767   (A/T)14   6471   6485
trnS/GCU-psbD   Mono   (A/T)12   9067   9079   (A/T)10   8746   8756
psbC   Mono   (G/C)10   11033   11043   (G/C)10   11032   11042
trnG/GCC-trnfM/CAU   Mono   (A/T)10   13447   13457   (A/T)12   13440   13452
trnT/GGU-trnE/UUC   Mono   (A/T)15   16614   16629   (A/T)11   16638   16649
trnT/GGU-trnE/UUC   Mono   (A/T)11   16708   16719   (A/T)9   16728   16736
trnD/GUC-psbM   Mono   (A/T)14   18717   18731   (A/T)11   18736   18747
psbM-petN   Mono   (A/T)14   19267   19281   (A/T)11   19279   19290
trnC/GCA-rpoB   Mono   (A/T)11   21124   21135   (A/T)13   21111   21124
rpoB   Mono   (A/T)10   31970   31980   (A/T)10   31961   31971
atpI-atpH   Mono   (A/T)10   34148   34158   (A/T)14   34140   34154
atpI-atpH   Mono   (A/T)9   34684   34692   (A/T)12   34696   34708
atpF intron   Mono   (A/T)9   35871   35879   (A/T)10   35892   35902
atpA-trnR/UCU   Mono   (A/T)9   38743   38751   (A/T)10   38764   38774
trnR/UCU-trnfM/CAU   Tri   (ATT)7   38901   38922   -   -   -
psaA-ycfIII   Mono   (A/T)13   44325   44338   (A/T)13   45675   45688
trnT/UGU-trnL/UAA   Mono   (A/T)11   48910   48921   (A/T)8   48896   48903
trnL/UAA-trnF/GAA   Mono   (A/T)10   50274   50284   (A/T)8   50282   50289
ndhK   Mono   (A/T)14   52432   52446   (A/T)14   52438   52452
trnM/CAU-atpE   Tetra   (AGGT)4   54731   54747   (AGGT)3   54736   54747
atpE-rbcL   Mono   (A/T)12   56799   56811   (A/T)11   56982   56993
atpE-rbcL   Mono   (A/T)12   57457   57469   (A/T)11   57454   57465
rpl23-psaI   Mono   (A/T)10   59774   59784   (A/T)8   59777   59784
psaI-ycf4   Mono   (A/T)10   60225   60235   (A/T)9   60226   60234
petA-psbJ   Mono   (A/T)11   63645   63656   (A/T)9   63644   63652
psbE-petL   Mono   (A/T)14   65805   65819   (A/T)10   65604   65614
rpl33-rps18   Mono   (A/T)12   68260   68272   (A/T)12   68219   68231
petB intron   Mono   (A/T)12   74056   74068   (A/T)11   78555   78566
infA   Mono   (A/T)10   79145   79155   (A/T)10   79104   79114
infA   Mono   (A/T)10   79163   79173   (A/T)10   79122   79132
rpl16 intron   Mono   (A/T)13   81363   81376   (A/T)15   81328   81343
rps3-rpl22   Mono   (A/T)10   82624   82634   (A/T)10   81698   81708
rpl22   Tri   (CTT)4   83063   83075   (CTT)4   83028   83040
rpl32-trnL/UAG   Mono   (A/T)9   109365   109373   (A/T)11   109392   109403
ndhA intron   Mono   (A/T)7   116146   116152   (A/T)12   116318   116330
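Microsatellite loci like those in Table 5 can be located by scanning a sequence for tandem repeats of short motifs. The sketch below is a generic regex-based scan, not the tool used in the study: the function name `find_microsatellites`, the minimum of three copies for multi-base motifs, and the motif size limit of four are assumptions chosen to match the repeat classes listed in the table (mono-, tri-, and tetranucleotide, with a mononucleotide threshold of 8 bp).

```python
import re

def _is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(
        n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n)
    )

def find_microsatellites(seq, min_mono=8, min_repeats=3, max_motif=4):
    """Return (start, end, motif, copies) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    # Mononucleotide runs of at least min_mono bases.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_mono - 1), seq):
        hits.append((m.start() + 1, m.end(), m.group(1), len(m.group(0))))
    # Di- to tetranucleotide motifs repeated at least min_repeats times.
    for k in range(2, max_motif + 1):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_primitive(motif):
                hits.append((m.start() + 1, m.end(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence containing (A)10, (ATT)4 and (AGGT)3.
toy = "GC" + "A" * 10 + "GCGC" + "ATT" * 4 + "CG" + "AGGT" * 3 + "C"
for hit in find_microsatellites(toy):
    print(hit)
```

Because a tandem repeat can be read in more than one phase, the motif reported for a locus depends on where the scan first matches; tools differ on this, so motif labels may need normalising before loci are compared between genomes.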

Phylogenetic analyses

Phylogenetic analyses were performed on an alignment of concatenated nucleotide sequences of 76 protein-coding genes from 40 angiosperm species (39 monocots and one dicot). After all positions containing gaps and missing data were excluded, the final dataset contained a total of 17,396 nucleotide positions. Maximum likelihood (ML) analysis resulted in a single tree with the highest log-likelihood (lnL) of −89413.4029. Of the 37 nodes, 29 had bootstrap values of ≥95% and 24 of these had bootstrap values of 100% (Fig 3). Maximum parsimony (MP) analysis generated a single most parsimonious tree with a length of 11,454 (consistency index, 0.57; retention index, 0.86; data not shown). The ML and MP trees had similar topology, which was also similar to those of the published phylogenetic trees of grasses based on complete cp genomes [37, 43]. The 39 monocot taxa were divided into two major groups, one containing Poales, including the Saccharum complex, and the other containing all other monocots. E. arundinaceus, M. sinensis, and S. officinarum were grouped into the PACMAD clade, which is one of the major Poaceae lineages. S. officinarum was more closely related to M. sinensis than to E. arundinaceus, in line with previous phylogenetic analyses [14, 44].
Fig 3

Phylogenetic analysis of 40 species including three genera of the Saccharum complex.

A phylogenetic tree was generated using the maximum-likelihood method based on the concatenated nucleotide sequences of 76 protein-coding chloroplast genes. Numbers beside the nodes indicate the bootstrap values (%) from 1,000 replicates.


Divergence time estimates

Using 76 concatenated protein-coding genes from the PACMAD clade, including the Saccharum complex, we estimated the divergence time with the Bayesian approach assuming a relaxed lognormal clock with the constrained calibration point of the oldest C4 lineage in Chloridoideae. As shown in Fig 4, the BEP and the PACMAD clades diverged 81.97 million years ago (mya). Within the PACMAD clade, Panicum virgatum (Paniceae) diverged from the other species 24.50 mya (range, 20.04–44.20 mya). E. arundinaceus was estimated to have diverged from the other genera of the Saccharum complex 9.14 mya (range, 0.91–17.99 mya), whereas M. sinensis and S. officinarum diverged approximately 3.64 mya (range, 0.01–9.01 mya).
Fig 4

Divergence times of the PACMAD clade.

A Bayesian relaxed-clock approach based on 76 concatenated protein-coding chloroplast genes was used to estimate divergence times.


Discussion

Features of the chloroplast genomes of E. arundinaceus and M. sinensis

In this study, we determined the complete cp genome sequences of two members of the Saccharum complex, E. arundinaceus and M. sinensis, using 454 GS FLX pyrosequencing and Sanger sequencing. Pyrosequencing has been increasingly used for the sequencing of entire cp genomes, including those of species from several genera of the Poaceae family [33, 36, 37], because of its high throughput and low cost. However, homopolymer stretches (mononucleotide repeats) cause errors in pyrosequencing data; these errors are generally difficult to correct by increasing sequence read depth [45, 46]. In addition, alignment gaps are often allowed in the assembled sequences [45]. In this study, we designed 258 primers, which made it possible to complete sequencing of the entire E. arundinaceus cp genome, and applied these primers to M. sinensis. These primers have high identity with other plant cp genome sequences such as those of S. officinarum and S. bicolor (S1 Table), and could be used, together with pyrosequencing, both for resequencing of ambiguous sites such as homopolymeric and gap regions in Poaceae cp genomes and for sequencing of entire cp genomes. Homopolymers are often present in cp genomes and may be used as microsatellite markers. Because the cp genome sequences are highly conserved among grasses, microsatellite primers for cp genomes are transferable across species and genera. In addition, although the cp genome sequences are highly conserved overall, homopolymers are highly polymorphic and are valuable markers for the analysis of differentiation and population structure. Inter- and intraspecific variations of cp microsatellites have been used to estimate the genetic diversity and phylogenetic relationships among species and genera [47]. With a threshold of ≥8 bp, we found 40 microsatellite loci for E. arundinaceus and 38 for M. sinensis, including tri- and tetranucleotide repeats, which were located mostly in non-coding regions. This information could be useful for the development of microsatellite markers for the analysis of genetic diversity in Erianthus, Miscanthus, and related genera.

Comparison of the sequences within and among Saccharum complex species

Comparison of the sequences determined in this study and the sequences previously registered in GenBank identified some polymorphisms. Most of them were found in homopolymeric regions in E. arundinaceus. A base substitution identified in the atpA–rps14 intergenic spacer region may reflect geographic heterogeneity. Comparison of the whole cp genome sequences of two M. sinensis accessions detected SNPs and indels at 12 sites. These results indicated the presence of intraspecific mutations in the highly conserved cp genome and could be useful for the analysis of genetic diversity and evolution of Erianthus, Miscanthus, and related genera. However, Yook et al. [48] have reported (on the basis of phenotypic and nuclear SSR genotypic analyses) that some M. sinensis accessions, including those used for cp genome sequencing, might be hybrids with M. sacchariflorus. Further studies are required to validate intraspecific mutations in M. sinensis. The gene contents differ slightly among the three genera of the Saccharum complex because of a frame-shift mutation that resulted in a premature stop codon and loss of the hypothetical gene ycf68 in M. sinensis. Similar mutations have been reported in some other plant species [49]. Intact copies of another hypothetical gene, ycf15, were detected in both E. arundinaceus and M. sinensis cp genomes, although in some other species this gene contains several internal stop codons and is thus nonfunctional [49]. The validity of ycf15 and ycf68 as protein-coding genes is questionable: according to Raubeson et al. [50], their pattern of evolution is not consistent with them encoding proteins. Therefore, these genes were excluded from subsequent analysis in this study, and further investigation is required to understand their functions.

Phylogenetic relationships and evolution

Our phylogenetic analysis based on the variation of the nucleotide sequences of 76 protein-coding genes in cp genomes separated Poales from other monocot groups with a bootstrap value of 100%, which is largely consistent with a recent analysis of other cp genome sequences [40]. Our data suggest that S. officinarum is more closely related to M. sinensis than to E. arundinaceus. We estimated that S. officinarum and M. sinensis diverged 3.6 mya, which is in good agreement with divergence times previously estimated on the basis of nuclear (3.1–3.8 mya) genome diversity [51, 52]. A study based on restriction fragment length polymorphism analysis, which used 12 cp-specific probes and examined 32 Saccharum complex genotypes, showed that Erianthus diverged from other lineages early in the evolution of the subtribe Saccharinae [14]. Our analysis estimated the divergence time as 9.1 mya. In addition, E. arundinaceus diverged from the subtribe Sorghinae before the divergence of S. bicolor and the common ancestor of S. officinarum and M. sinensis. The present study showed that the cp genome of E. arundinaceus is more closely related to that of S. bicolor than to those of other members of the Saccharum complex. These data support the suggestion of Sobral et al. [14] that the evolutionary history of Erianthus may differ from that of other members of the Saccharum complex. In the Old World, Erianthus species comprise four cytotypes: diploid (2n = 2x = 20), triploid (2n = 3x = 30), tetraploid (2n = 4x = 40), and hexaploid (2n = 6x = 60), with a basic number of x = 10 [4]. The present study does not clarify how Erianthus was established, and additional investigations are required. Inclusion of different cytotypes in phylogenetic analysis based on cp genome sequences may provide useful information on the origin and establishment of this genus. Maternal origin of hybrids and polyploids of several species has been investigated using cpDNA variations [53-55]. 
The use of combined data on nuclear and cpDNA variations may help determine the origin and evolutionary history of polyploids [56]. In the subtribe Saccharinae, comparative analysis of nuclear genome variations in Saccharum and Miscanthus suggested that a whole-genome duplication occurred in their common ancestor [51]. This molecular phylogenetic approach, which is used to elucidate the origin and history of polyploidization, could also contribute to characterization of the phylogenetic relationships of Erianthus. Therefore, understanding nuclear genome variations, especially in low-copy nuclear loci [52, 57], together with cp genome variations would also be useful for clarifying the evolution of the Erianthus polyploid complex. Understanding its evolution could help us to gain more insight into the phylogenetic relationships of the Saccharum complex genera and provide useful information on their ancestor and polyploidization, which is critical for genetic studies and breeding in these genera.

Conclusion

Comparison of the complete cp genomes provided detailed information on genetic variations among three economically important genera, Saccharum, Erianthus, and Miscanthus. Comparison of the sequences indicated that S. officinarum and M. sinensis are more closely related to each other than to E. arundinaceus. We suggest that E. arundinaceus diverged from the subtribe Sorghinae before the divergence of S. bicolor and the common ancestor of S. officinarum and M. sinensis. This is the first report of phylogenetic and evolutionary relationships among the three genera of the Saccharum complex inferred from maternally inherited variations in whole cp genomes and gene data sets. Our results provide an important framework for understanding the phylogeny and evolutionary history of the Saccharum complex. Molecular data for the other genera of the complex, Narenga and Sclerostachya, are limited and further studies on these genera are needed to improve our understanding of the phylogeny and evolution of the Saccharum complex.

Materials and Methods

Plant materials and DNA extraction

The E. arundinaceus accession JW630 (Genebank accession number JP173957 at the Genetic Resources Center of the National Institute of Agrobiological Sciences, Japan; https://www.gene.affrc.go.jp/index_en.php) is a wild hexaploid collected in Shizuoka prefecture, Japan (the northernmost area of the wild E. arundinaceus range in Japan). The M. sinensis accession Niigata 410 (JP177091) is a wild diploid collected in Niigata prefecture, Japan. Plants were cultivated in a greenhouse at the National Agriculture and Food Research Organization, Institute of Livestock and Grassland Science (NARO-ILGS), and genomic DNA was isolated from fresh green leaves using the CTAB method [58].

E. arundinaceus chloroplast genome sequencing and assembly

The E. arundinaceus cp genome was sequenced by pyrosequencing. Total E. arundinaceus genomic DNA was sheared by nebulization and then amplified by emulsion PCR. Amplification products were sequenced on a 454 GS FLX Titanium platform (Roche, Basel, Switzerland) [59]. Chloroplast sequence reads were extracted by local BLASTN searches using the cp genome of S. officinarum [31] as a reference and assembled with Newbler software (v 2.5; Roche). Homopolymer regions (poly A/T and poly G/C) and the junctions between single-copy regions (LSC and SSC) and IRs were amplified and confirmed using primers designed from the E. arundinaceus cp sequence (S1 Table) and PrimeSTAR HS DNA polymerase (TaKaRa, Shiga, Japan). PCR products were purified with a QuickStep2 PCR Purification system (Edge Biosystems, Gaithersburg, MD, USA), cycle-sequenced with a BigDye Terminator v3.1 Cycle Sequencing Kit (Life Technologies, Foster City, CA, USA), and analyzed on an ABI3130xl genetic analyzer (Life Technologies) with the primers listed in S1 Table.

M. sinensis chloroplast genome sequencing

The M. sinensis cp genome was determined by Sanger sequencing of PCR products. Sixteen primers to amplify overlapping products (1,747–12,913 bp) were designed from the E. arundinaceus cp genome sequence for initial amplification of the M. sinensis cp genome (Table 1). Amplification reactions and cycle-sequencing were performed as described above for E. arundinaceus. A total of 258 primers (S1 Table) were used to sequence the entire M. sinensis cp genome.

Annotation, microsatellite analysis, and comparison of the chloroplast genomes

The complete sequences of the E. arundinaceus and M. sinensis cp genomes were annotated using Dual Organellar GenoMe Annotator (DOGMA) software [60]. The predicted annotations were manually checked and verified by comparison with sequences from other PACMAD clade species. The circular chloroplast genome maps were drawn with GenomeVx software [61]. Microsatellites were predicted using MSATCOMMANDER 1.03 software [62]. We defined microsatellites as ≥10 repeats (10 bases) for mononucleotides, ≥8 repeats (16 bases) for dinucleotides, ≥5 repeats (15 bases) for trinucleotides, ≥4 repeats (16 bases) for tetranucleotides, ≥4 repeats (20 bases) for pentanucleotides, and ≥4 repeats (24 bases) for hexanucleotides. Genome structures among the genera of the Saccharum complex were compared using mVISTA software in Shuffle-LAGAN mode [63]; the sequence annotation of Z. mays was used.
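The repeat-count thresholds above can be illustrated with a short Python sketch. It is not the MSATCOMMANDER algorithm, only a regular-expression scan that applies the same minimum repeat numbers. The function names and the example sequence are hypothetical, and details such as the handling of compound or overlapping repeats are assumptions.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 8, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_microsatellites(seq):
    """Return (start, motif, repeat_count) tuples with 0-based starts, sorted by position."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif copy followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. 'AA' or 'ATAT', which belong to a shorter motif class.
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

# Hypothetical test sequence: an A run of 12, (AT)8, and (AGC)4 (below threshold).
example = "GGC" + "A" * 12 + "CGC" + "AT" * 8 + "GCA" + "AGC" * 4 + "T"
print(find_microsatellites(example))
```

In this example the A run and the AT repeat meet the thresholds, whereas the AGC repeat has only four copies and is not reported.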

Substitution rates

Substitution rates were calculated using the PAMLX package [64]. The program CODEML in PAMLX was employed to estimate the rates of nonsynonymous (dN) and synonymous (dS) substitutions and their ratio (dN / dS) in 76 cp protein-coding genes aligned by using PAL2NAL [65]. The maximum likelihood (ML) tree (see below) was used as a topologically constrained tree. The F3 × 4 model was adopted for codon frequencies under the branch-site model (model = 2, NSsites = 2, and cleandata = 1).
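As a rough illustration of these settings, the following Python sketch writes a minimal CODEML control file. Only model, NSsites, cleandata, and the F3×4 codon frequency model (CodonFreq = 2) come from the text; the file names, runmode, and seqtype values are assumptions, and the full control file used in the study is not given in the source.

```python
def codeml_control(seqfile, treefile, outfile):
    """Return the text of a minimal CODEML control file (a sketch, not the study's file)."""
    settings = [
        ("seqfile", seqfile),    # codon alignment, e.g. produced by PAL2NAL
        ("treefile", treefile),  # constrained ML tree topology
        ("outfile", outfile),
        ("runmode", 0),          # assumption: evaluate the user-supplied tree
        ("seqtype", 1),          # assumption: 1 = codon sequences
        ("CodonFreq", 2),        # 2 = F3x4 codon frequency model
        ("model", 2),            # from the text
        ("NSsites", 2),          # from the text
        ("cleandata", 1),        # from the text: remove sites with gaps or ambiguity
    ]
    return "\n".join(f"{key} = {value}" for key, value in settings) + "\n"

# Hypothetical file names, for illustration only.
control_text = codeml_control("genes76.codon.phy", "ml_tree.nwk", "codeml_out.txt")
print(control_text)
```

Generating the file from one function keeps the settings identical when the same analysis is repeated for each of the 76 gene alignments.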

Phylogenetic analysis

Nucleotide sequences of 76 cp protein-coding genes of 37 monocot angiosperms and one dicot angiosperm (Artemisia frigida) available in the GenBank database, and those of E. arundinaceus and M. sinensis were concatenated and aligned using Clustal W [66]. After manual editing, phylogenetic analyses using ML and maximum parsimony (MP) were performed with MEGA6 [67] using subtree-pruning-regrafting and nearest-neighbor-interchange algorithms, respectively. The gaps in the alignment were treated as missing data and statistical support at each node was assessed by bootstrapping [68] with 1,000 replicates. Bootstrap values are indicated on the tree.
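The bootstrap step can be sketched in Python as follows. MEGA6 performs this resampling internally, so the code only illustrates the principle: each replicate is built by sampling alignment columns with replacement, a tree is inferred from every replicate, and the support for a node is the percentage of replicate trees that contain it. The function names and the toy alignment are hypothetical, and tree inference itself is omitted.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Build one pseudo-alignment by sampling columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}

def bootstrap_support(n_replicates_with_clade, n_replicates=1000):
    """Percentage of replicate trees that contain a given clade."""
    return 100.0 * n_replicates_with_clade / n_replicates

# Toy alignment; gaps ('-') are simply carried along, as with missing data.
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "AC-TAC"}
replicate = bootstrap_replicate(toy, random.Random(1))
print(replicate)
```

With 1,000 replicates, a node recovered in every replicate tree receives a bootstrap value of 100%.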

Estimation of divergence time of the Saccharum complex

A set of 76 protein-coding genes was aligned and used for the estimation of divergence time. The analysis was performed with nine species including three species of the Saccharum complex with a focus on the PACMAD clade (Fig 4) using the BEAST2 program, which infers tree topology, branch lengths, and node ages by using Bayesian inference and Markov Chain Monte Carlo (MCMC) analysis [69]. The AIC (Akaike Information Criterion) analysis was performed by using jModelTest 2.1.6 [70] to identify the best-fitting substitution model. BEAUti in the BEAST2 program was used to set the criteria for the analysis. We used the GTR (general-time reversible) model of nucleotide substitution with five categories of gamma-distributed rates. An uncorrelated lognormal model of rate variation among branches was assumed and a Yule prior on the birth rate of new lineages was employed [71]. Previous studies estimated that the major diversification of the grass groups occurred 80 mya and that the Andropogoneae crown diverged 20 mya [72, 73]; these two time points were used to calibrate the age of the stem nodes. Two independent MCMC runs were performed for 10 million generations with tree sampling every 1,000 generations. The results were checked with Tracer 1.6 [74], and the sampled trees were summarized by using TreeAnnotator v.2.1.2 available in the BEAST2 package, and edited by using FigTree v.1.4.2 [75]. The mean and the estimated 95% highest posterior density interval for the divergence time are given for the major PACMAD lineages.

Sequence variations detected among the whole cp genome (sequenced in this study) and partial sequences (registered in Genbank) from Erianthus arundinaceus.

Origin is indicated as follows: JPN, Japan; IDN, Indonesia; IND, India; THA, Thailand. (PPTX)

Sequencing primers designed from the Erianthus arundinaceus cp genome.

(XLSX)

Summary of sequence variations detected between chloroplast genome sequences of two Miscanthus sinensis accessions.

(DOCX)
  47 in total

1.  Chloroplast microsatellites: new tools for studies in plant ecology and evolution.

Authors:  J Provan; W Powell; P M. Hollingsworth
Journal:  Trends Ecol Evol       Date:  2001-03-01       Impact factor: 17.712

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes.

Authors:  Shu-Miaw Chaw; Chien-Chang Chang; Hsin-Liang Chen; Wen-Hsiung Li
Journal:  J Mol Evol       Date:  2004-04       Impact factor: 2.395

4.  Genome sequencing in microfabricated high-density picolitre reactors.

Authors:  Marcel Margulies; Michael Egholm; William E Altman; Said Attiya; Joel S Bader; Lisa A Bemben; Jan Berka; Michael S Braverman; Yi-Ju Chen; Zhoutao Chen; Scott B Dewell; Lei Du; Joseph M Fierro; Xavier V Gomes; Brian C Godwin; Wen He; Scott Helgesen; Chun Heen Ho; Chun He Ho; Gerard P Irzyk; Szilveszter C Jando; Maria L I Alenquer; Thomas P Jarvie; Kshama B Jirage; Jong-Bum Kim; James R Knight; Janna R Lanza; John H Leamon; Steven M Lefkowitz; Ming Lei; Jing Li; Kenton L Lohman; Hong Lu; Vinod B Makhijani; Keith E McDade; Michael P McKenna; Eugene W Myers; Elizabeth Nickerson; John R Nobile; Ramona Plant; Bernard P Puc; Michael T Ronan; George T Roth; Gary J Sarkis; Jan Fredrik Simons; John W Simpson; Maithreyan Srinivasan; Karrie R Tartaro; Alexander Tomasz; Kari A Vogt; Greg A Volkmer; Shally H Wang; Yong Wang; Michael P Weiner; Pengguang Yu; Richard F Begley; Jonathan M Rothberg
Journal:  Nature       Date:  2005-07-31       Impact factor: 49.962

5.  Structural features and transcript-editing analysis of sugarcane (Saccharum officinarum L.) chloroplast genome.

Authors:  Tercilio Calsa Júnior; Dirce Maria Carraro; Matheus Romanos Benatti; Alexandre Corrêa Barbosa; João Paulo Kitajima; Helaine Carrer
Journal:  Curr Genet       Date:  2004-11-04       Impact factor: 3.886

6.  Assessing genetic diversity in a sugarcane germplasm collection using an automated AFLP analysis.

Authors:  P Besse; G Taylor; B Carroll; N Berding; D Burner; C L McIntyre
Journal:  Genetica       Date:  1998-10       Impact factor: 1.082

7.  Phylogenetics of Miscanthus, Saccharum and related genera (Saccharinae, Andropogoneae, Poaceae) based on DNA sequences from ITS nuclear ribosomal DNA and plastid trnLintron and trnL-F intergenic spacers.

Authors:  Trevor R Hodkinson; Mark W Chase; M Dolores Lledó; Nicolas Salamin; Stephen A Renvoize
Journal:  J Plant Res       Date:  2002-08-28       Impact factor: 2.629

8.  Complete nucleotide sequence of the sugarcane (Saccharum officinarum) chloroplast genome: a comparative analysis of four monocot chloroplast genomes.

Authors:  Takayuki Asano; Takahiko Tsudzuki; Sakiko Takahashi; Hiroaki Shimada; Koh-ichi Kadowaki
Journal:  DNA Res       Date:  2004-04-30       Impact factor: 4.458

9.  VISTA: computational tools for comparative genomics.

Authors:  Kelly A Frazer; Lior Pachter; Alexander Poliakov; Edward M Rubin; Inna Dubchak
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

10.  Relaxed phylogenetics and dating with confidence.

Authors:  Alexei J Drummond; Simon Y W Ho; Matthew J Phillips; Andrew Rambaut
Journal:  PLoS Biol       Date:  2006-03-14       Impact factor: 8.029

