Dong-Mei Li, Jie Li, Dai-Rong Wang, Ye-Chun Xu, Gen-Fa Zhu.
Abstract
BACKGROUND: Zingiberoideae is a large and diverse subfamily of the family Zingiberaceae. Four genera in the subfamily each contain 50 or more species: Globba (100), Hedychium (> 80), Kaempferia (50) and Zingiber (150). Despite the agricultural, medicinal and horticultural importance of these species, genomic resources and suitable molecular markers for them are currently sparse.
Keywords: Chloroplast genome; Divergent hotspots; Genome evolution; Phylogeny; Zingiberaceae; Zingiberoideae
Year: 2021 PMID: 34814832 PMCID: PMC8611967 DOI: 10.1186/s12870-021-03315-9
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Fig. 1 Chloroplast genome map of G. lancangensis (GenBank accession number: MT473704; the outermost three rings) and CGView comparison [31] of eighteen Zingiberoideae chloroplast genomes (the inner rings with different colors). Genes belonging to different functional groups are shown in different colors in the outermost ring. Genes shown on the outside of the outermost ring are transcribed counter-clockwise and those on the inside clockwise. The gray arrowheads indicate the direction of the genes. The tRNA genes are indicated by the one-letter code of amino acids with anticodons. LSC, large single copy region; IR, inverted repeat; SSC, small single copy region. The second outermost ring (darker gray) corresponds to the GC content and the third outermost ring (lighter gray) to the AT content of the G. lancangensis chloroplast genome, as drawn by OGDRAW [30]. The innermost black ring indicates the chloroplast genome size of G. lancangensis. The second and third innermost rings indicate GC content and GC skew deviations in the chloroplast genome of G. lancangensis, respectively: GC skew + indicates G > C, and GC skew – indicates G < C. From the fourth innermost colored ring outwards to the 21st ring, in order: G. lancangensis MT473704, G. marantina MT473705, G. multiflora MT473706, G. schomburgkii MK262735, G. schomburgkii var. angustata MT473707, H. coccineum MT473708, H. coronarium MK262736, H. neocarneum MT473709, H. spicatum NC_047248, K. galanga MK209001, K. elegans MK209002, K. rotunda ‘Red Leaf’ MT473710, K. rotunda ‘Silver Diamonds’ MT473711, Z. montanum MK262727, Z. officinale NC_044775, Z. recurvatum MT473712, Z. spectabile JX088661 and Z. zerumbet MK262726; similar and highly divergent locations in the chloroplast genomes are represented by continuous and interrupted track lines, respectively. The species sequenced in this study are marked in bold
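The GC content and GC skew rings mentioned in the caption follow the usual definitions: GC content is (G + C) / length, and GC skew is (G − C) / (G + C), so a positive skew means G > C. The following is a minimal Python sketch of both quantities over fixed windows. The function names, the window size and the toy sequence are illustrative assumptions and are not taken from the study or from CGView.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C): positive when G > C, negative when G < C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def windowed(seq: str, window: int, func):
    """Apply func to consecutive, non-overlapping windows of seq."""
    return [func(seq[i:i + window]) for i in range(0, len(seq), window)]


toy = "GGGCATATCCCG"  # illustrative sequence only, not real data
print(gc_content(toy))
print(windowed(toy, 6, gc_skew))
```

In this toy case the first window is G-rich (positive skew) and the second is C-rich (negative skew), which is the sign convention the caption describes.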
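Comparison of features of the ten chloroplast genomes from the nine Zingiberoideae species studied
| Genome feature | G. lancangensis | G. marantina | G. multiflora | G. schomburgkii | G. schomburgkii var. angustata | H. coccineum | H. neocarneum | K. rotunda ‘Red Leaf’ | K. rotunda ‘Silver Diamonds’ | Z. recurvatum |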
|---|---|---|---|---|---|---|---|---|---|---|
| Genome size (bp) | 163,306 | 162,774 | 163,199 | 163,325 | 163,432 | 163,968 | 163,903 | 162,630 | 162,875 | 163,151 |
| LSC length (bp) | 88,545 | 87,989 | 87,994 | 88,451 | 88,556 | 88,632 | 88,541 | 87,172 | 87,306 | 87,780 |
| SSC length (bp) | 15,393 | 15,425 | 15,715 | 15,525 | 15,526 | 15,798 | 15,824 | 15,800 | 15,917 | 15,787 |
| IR length (bp) | 29,684 | 29,680 | 29,745/ 29,742 | 29,673/ 29,676 | 29,675/ 29,678 | 29,769 | 29,769 | 29,829/ 29,833 | 29,826 | 29,792 |
| GC content (%) | ||||||||||
| Total genome | 35.73 | 35.92 | 35.85 | 35.85 | 35.83 | 36.08 | 36.08 | 36.18 | 36.13 | 36.12 |
| LSC | 33.35 | 33.58 | 33.60 | 33.51 | 33.47 | 33.83 | 33.85 | 34.02 | 33.97 | 33.91 |
| SSC | 29.07 | 29.40 | 28.83 | 29.17 | 29.16 | 29.56 | 29.51 | 29.60 | 29.44 | 29.53 |
| IR | 41.02/41.03 | 41.07 | 41.02/41.03 | 41.09/41.10 | 41.09/41.10 | 41.15 | 41.15 | 41.08/41.09 | 41.08/41.09 | 41.14 |
| CDS | 36.59 | 36.68 | 36.68 | 36.68 | 36.68 | 37.22 | 37.22 | 36.94 | 36.96 | 36.95 |
| Genes (total/different) | 141/113 | 140/113 | 140/113 | 141/111 | 140/113 | 141/113 | 141/113 | 140/113 | 140/113 | 140/113 |
| CDS (total/different) | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 | 87/79 |
| tRNA (total/different) | 46/30 | 45/30 | 45/30 | 46/28 | 45/30 | 46/30 | 46/30 | 45/30 | 45/30 | 45/30 |
| rRNA (total/different) | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 | 8/4 |
| Genes with introns | 18 | 17 | 17 | 18 | 17 | 17 | 17 | 17 | 17 | 17 |
| Different CDS in LSC | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 61 |
| Different CDS in SSC | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| Different CDS in IRA | 9 | 8 | 8 | 9 | 8 | 8 | 8 | 9 | 8 | 8 |
| Different CDS in IRB | 8 | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 8 | 8 |
| Different genes in IRs | 21 | 20 | 20 | 20 | 20 | 20 | 20 | 21 | 20 | 20 |
| GenBank accession | MT473704 | MT473705 | MT473706 | MK262735 | MT473707 | MT473708 | MT473709 | MT473710 | MT473711 | MT473712 |
Note: LSC large single copy region, SSC small single copy region, IR inverted repeat, CDS protein coding genes
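Because a chloroplast genome of this type consists of one LSC, one SSC and two IR copies, the reported genome size should equal LSC + SSC + IRa + IRb. The sketch below checks that relationship for two columns of the table, identified by their GenBank accessions. The dictionary layout and function name are illustrative assumptions; only the numbers come from the table.

```python
# Lengths in bp, copied from the table above for two accessions.
# IRa and IRb are stored separately because some genomes in the table
# report slightly different lengths for the two inverted-repeat copies.
genomes = {
    "MT473704": {"total": 163_306, "lsc": 88_545, "ssc": 15_393,
                 "ira": 29_684, "irb": 29_684},
    "MT473709": {"total": 163_903, "lsc": 88_541, "ssc": 15_824,
                 "ira": 29_769, "irb": 29_769},
}


def region_sum(g: dict) -> int:
    """Sum of the four regions of a quadripartite chloroplast genome."""
    return g["lsc"] + g["ssc"] + g["ira"] + g["irb"]


for accession, g in genomes.items():
    status = "ok" if region_sum(g) == g["total"] else "mismatch"
    print(accession, region_sum(g), g["total"], status)
```

The same check can be run on the remaining columns as a quick sanity test when transcribing the table.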
Genes present in the ten sequenced chloroplast genomes in subfamily Zingiberoideae
| Category for genes | Group of genes | Name of genes |
|---|---|---|
| Photosynthesis | Subunits of photosystem I | |
| | Subunits of photosystem II | |
| | Subunits of cytochrome b/f complex | |
| | Subunits of ATP synthase | |
| | Subunits of NADH dehydrogenase | |
| | Subunit of rubisco | |
| Self-replication | RNA polymerase | |
| | Large subunit of ribosomal proteins | |
| | Small subunit of ribosomal proteins | |
| | Ribosomal RNAs | |
| | Transfer RNAs | |
| Other genes | Subunit of acetyl-CoA carboxylase | |
| | c-type cytochrome synthesis gene | |
| | Envelope membrane protein | |
| | Protease | |
| | Translational initiation factor | |
| | Maturase | |
| Unknown function | Conserved open reading frames | |
Note: (×2): gene with two copies; (×4): gene with four copies; *: gene containing one intron; **: gene containing two introns; ①: the psbZ gene is present only in the chloroplast genomes of H. neocarneum and H. coccineum; ②: the lhbA gene is missing from the chloroplast genomes of H. neocarneum and H. coccineum; ③: trnS-GCU and trnT-UGU are present in two copies only in the chloroplast genome of G. schomburgkii and in a single copy in the other nine chloroplast genomes sequenced in this study; ④: trnS-GGA and trnT-GGU are missing from the chloroplast genome of G. schomburgkii
Fig. 2 Codon content of all protein-coding genes of the ten sequenced chloroplast genomes in subfamily Zingiberoideae. a Proportions of amino acids and stop codons in the protein-coding sequences of the ten sequenced chloroplast genomes. b Heat map of codon distribution across all protein-coding genes of the ten sequenced chloroplast genomes. Red indicates higher RSCU [32, 33] values and blue indicates lower RSCU values
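Relative synonymous codon usage (RSCU) is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it from a list of coding sequences. It assumes the standard codon assignments, which the plant plastid code shares, and the function name and toy input are hypothetical. It is not the pipeline used in the study.

```python
from collections import Counter, defaultdict

# Standard codon assignments in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_list):
    """Return {codon: RSCU} for every codon family seen in cds_list."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        # Walk the sequence in frame, ignoring a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


toy_values = rscu(["ATGGCTGCTGCCTAA"])  # illustrative CDS only
print(toy_values["GCT"], toy_values["GCC"], toy_values["GCA"])
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which corresponds to the red end of the heat map, and a value below 1 to the blue end.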
Fig. 3 Comparison of the simple sequence repeats (SSRs) among the ten sequenced chloroplast genomes in subfamily Zingiberoideae. a Number of different SSR types detected in the ten Zingiberoideae chloroplast genomes. b Frequency of the identified SSRs in different repeat class types. c Frequencies of the identified SSRs in the LSC, SSC and IR regions. d SSR distribution in protein-coding regions, intron regions and intergenic regions of the ten Zingiberoideae chloroplast genomes
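SSRs of the kind counted in this figure are tandem repeats of a 1 to 6 bp motif. The sketch below shows one simple way to find them with regular expressions. The minimum repeat counts are hypothetical placeholders, since the thresholds used in the study are not given here, and the scan is a simplification that may miss repeats overlapping a skipped match.

```python
import re

# Hypothetical minimum repeat counts per motif length (an assumption,
# not the thresholds used in the study).
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit
            # (for example "AA" or "ATAT"), which belong to a smaller class.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


toy = "GC" + "A" * 10 + "GCG" + "AT" * 6 + "GGC"  # illustrative only
print(find_ssrs(toy))
```

Counting the returned tuples by motif length gives the per-class tallies of panel a, and intersecting the start positions with region or annotation coordinates would give the breakdowns of panels c and d.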
Fig. 4 Long repeat sequences among the ten sequenced chloroplast genomes in subfamily Zingiberoideae. a Totals of the four long repeat types in the ten Zingiberoideae chloroplast genomes. b Numbers of long repeat sequences by length
Fig. 5 Comparison of the LSC, SSC and IR region boundaries among 18 chloroplast genomes in subfamily Zingiberoideae. Ψ, pseudogenes. The figure is not to scale with respect to sequence length and shows only relative changes at or near the IR/SC borders. The ten chloroplast genomes sequenced in this study are marked in bold
Fig. 6 Comparative plots of percent sequence identity of 18 chloroplast genomes in subfamily Zingiberoideae. The chloroplast genome of G. lancangensis was used as the reference genome (upper plot). Percent sequence identity was visualized with mVISTA [34]. Gray arrows and thick black lines indicate gene orientation. Purple bars represent exons, sky-blue bars represent untranslated regions (UTRs), red bars represent conserved non-coding sequences (CNS), gray bars represent mRNA and white regions represent sequence differences among all analyzed chloroplast genomes. The horizontal axis indicates the coordinates within the chloroplast genome. The vertical scale represents percent identity, ranging from 50 to 100%. The ten chloroplast genomes sequenced in this study are marked in bold
Fig. 7 Nucleotide diversity (Pi) values of various regions in 18 chloroplast genomes in subfamily Zingiberoideae. a Protein-coding regions. Peak regions with a Pi value > 0.0128 are labeled with the corresponding gene names. b Intron and intergenic regions. Peak regions with a Pi value > 0.033 are labeled with the corresponding intergenic region names
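Nucleotide diversity (Pi) is the average proportion of sites that differ between two sequences, taken over all pairs in an alignment. The sketch below computes it for a whole alignment and in sliding windows. It assumes pre-aligned sequences of equal length and ignores positions with gaps or ambiguous bases per pair, which may differ from how the software used in the study handles such sites. Function names, window and step sizes are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites across an alignment."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        compared = diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            # Only compare positions where both sequences have A, C, G or T.
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        total += diffs / compared if compared else 0.0
    return total / len(pairs)


def sliding_pi(aligned, window, step):
    """Return (window_start, Pi) for each full window along the alignment."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]  # illustrative only
print(nucleotide_diversity(toy_alignment))
print(sliding_pi(toy_alignment, window=4, step=4))
```

Regions whose Pi exceeds a chosen cutoff, such as the thresholds quoted in the caption, would then be flagged as candidate divergent hotspots.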
Evaluation of the identification capability of thirteen regions among four genera in subfamily Zingiberoideae
| Species | Bootstrap values of thirteen regions on ML trees | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 62 | 56 | 66 | 95 | 64 | 59 | 60 | 71 | 92 | 85 | 56 | 92 | 93 |
| | 99 | 100 | 99 | 92 | 80 | 100 | 92 | 87 | 98 | 100 | 99 | 87 | 71 |
| | 99 | 56 | 100 | 99 | 98 | 82 | 99 | 71 | 92 | 99 | 56 | 98 | 71 |
| | 89 | 99 | 99 | 92 | 98 | 72 | 99 | 28 | 100 | 75 | 91 | 89 | 96 |
| | 89 | 99 | 99 | 92 | 98 | 72 | 99 | 28 | 100 | 75 | 91 | 89 | 96 |
| | 28 | 100 | 50 | 36 | 90 | 99 | 88 | 73 | 69 | 63 | 89 | 42 | 88 |
| | 100 | 27 | 100 | 36 | 20 | 26 | 100 | 100 | 99 | 100 | 100 | 26 | 20 |
| | 28 | 24 | 82 | 71 | 17 | 70 | 30 | 54 | 29 | 28 | 31 | 26 | 14 |
| | 72 | 27 | 82 | 71 | 20 | 70 | 30 | 73 | 29 | 28 | 31 | 62 | 20 |
| | 40 | 86 | 98 | 58 | 61 | 72 | 99 | 93 | 33 | 7 | 94 | 60 | 35 |
| | 95 | 86 | 98 | 96 | 61 | 38 | 99 | 93 | 33 | 22 | 94 | 53 | 35 |
| | 96 | 100 | 100 | 99 | 97 | 95 | 87 | 99 | 82 | 97 | 87 | 93 | 62 |
| | 96 | 100 | 100 | 99 | 97 | 95 | 87 | 99 | 82 | 97 | 87 | 93 | 62 |
| | 63 | 55 | 56 | 87 | 84 | 72 | 84 | 98 | 54 | 63 | 56 | 61 | 35 |
| | 49 | 87 | 80 | 98 | 60 | 95 | 61 | 75 | 13 | 41 | 27 | 61 | 43 |
| | 85 | 94 | 100 | 93 | 94 | 93 | 100 | 94 | 91 | 81 | 44 | 47 | 21 |
| | 63 | 98 | 56 | 86 | 60 | 72 | 81 | 64 | 13 | 89 | 20 | 53 | 35 |
| | 88 | 98 | 56 | 86 | 84 | 72 | 84 | 64 | 54 | 89 | 20 | 55 | 40 |
| ratio(%) | 77.78 | 83.33 | 100 | 88.89 | 83.33 | 88.89 | 88.89 | 88.89 | 66.67 | 72.22 | 66.67 | 77.78 | 44.44 |
Note: ratio (%) = [(total number of species − number of species with bootstrap values below 50%) / total number of species] × 100%; ①: ycf1 is treated here as a protein-coding gene of the chloroplast genome. The species sequenced in this study are marked in bold
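The ratio defined in the note can be reproduced directly from the bootstrap columns. The sketch below applies the formula to the first and last value columns of the table. The function name is hypothetical, and the values are copied from the 18 species rows above.

```python
def identification_ratio(bootstraps, threshold=50):
    """Percentage of species whose bootstrap value is not below threshold."""
    total = len(bootstraps)
    below = sum(1 for b in bootstraps if b < threshold)
    return round((total - below) / total * 100, 2)


# First and last value columns of the table (18 species each).
first_region = [62, 99, 99, 89, 89, 28, 100, 28, 72,
                40, 95, 96, 96, 63, 49, 85, 63, 88]
last_region = [93, 71, 71, 96, 96, 88, 20, 14, 20,
               35, 35, 62, 62, 35, 43, 21, 35, 40]

print(identification_ratio(first_region))  # 77.78
print(identification_ratio(last_region))   # 44.44
```

Both results agree with the ratio row of the table: four of 18 values fall below 50 in the first column and ten of 18 in the last.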
Positively selected amino acid sites and parameter estimates for fourteen genes in subfamily Zingiberoideae
| Gene | Ln L | Estimates of parameters | Positively selected sites |
|---|---|---|---|
| | −2726.384456 | p0 = 0.96351 (p1 = 0.03649) ω = 11.82130 | 4W 1.000**, 9L 0.987*, 218H 0.958*, 299R 0.968* |
| | −1712.097311 | p0 = 0.97744 (p1 = 0.02256) ω = 13.95433 | 87T 0.957*, 180L 0.994**, 200Y 1.000**, 201K 1.000** |
| | −1991.585023 | p0 = 0.98899 (p1 = 0.01101) ω = 57.72615 | 132F 0.964*, 189S 1.000**, 190S 1.000**, 191T 1.000**, 192V 1.000** |
| | −2119.544926 | p0 = 0.98391 (p1 = 0.01609) ω = 43.84612 | 133V 0.953*, 181T 0.957*, 246P 0.955* |
| | −162.569004 | p0 = 0.97242 (p1 = 0.02758) ω = 999.00000 | 20L 1.000** |
| | −2346.044248 | p0 = 0.97658 (p1 = 0.02342) ω = 14.08875 | 169L 0.980*, 225I 0.996**, 226Y 0.997**, 247C 0.963*, 255I 0.955*, 407L 0.980*, 424L 0.999**, 449S 1.000** |
| | −806.763954 | p0 = 0.94834 (p1 = 0.05166) ω = 8.41059 | 118K 0.998**, 125Y 1.000** |
| | −3213.853612 | p0 = 0.98613 (p1 = 0.01387) ω = 11.40945 | 147N 0.972*, 606D 0.971* |
| | −6806.754011 | p0 = 0.98887 (p1 = 0.01113) ω = 11.89405 | 711Y 0.995**, 1174W 0.984* |
| | −759.324938 | p0 = 0.73184 (p1 = 0.26816) ω = 772.95793 | 1M 0.955*, 2P 0.996**, 3T 0.961*, 4I 0.956*, 5K 0.956*, 6Q 1.000**, 7L 0.998**, 8I 0.999**, 9R 0.974*, 10N 0.999**, 11A 0.998**, 12R 0.959*, 13Q 1.000**, 14P 0.966*, 15I 0.989*, 16R 0.959*, 17N 0.999**, 18V 0.999**, 19T 1.000**, 20K 1.000**, 21S 0.998**, 22P 0.998**, 23A 0.963*, 24L 0.998**, 25R 0.986*, 26E 0.998**, 27C 0.964*, 28P 0.998**, 29Q 1.000**, 30R 0.998**, 31R 0.999**, 32G 0.999**, 33T 0.999**, 34C 0.962*, 35T 0.956*, 36R 0.958*, 37V 0.998**, 38Y 0.960*, 94R 0.952*, 116Q 0.952* |
| | −553.014018 | p0 = 0.98952 (p1 = 0.01048) ω = 49.80264 | 27P 0.973* |
| | −10584.294185 | p0 = 0.88238 (p1 = 0.11762) ω = 7.72414 | 14L 0.994**, 16M 0.985*, 48R 0.961*, 142L 0.990**, 212A 0.989*, 215R 0.981*, 606D 0.975*, 663R 0.994**, 809Y 0.952*, 928P 0.992**, 1293V 0.986*, 1302I 0.964*, 1341M 1.000**, 1433K 0.992**, 1439N 0.999**, 1452K 0.984*, 1453K 0.982*, 1466K 0.999**, 1469S 0.998**, 1473D 0.999**, 1499D 0.966*, 1506Q 0.991**, 1528E 0.988*, 1576F 0.990**, 1586Y 1.000**, 1590K 0.990**, 1604P 0.990**, 1621A 0.987*, 1628L 0.991**, 1629N 0.993**, 1632D 0.993**, 1651G 0.987*, 1667S 0.995**, 1757L 0.961* |
| | −10373.971098 | p0 = 0.93261 (p1 = 0.06739) ω = 20.73253 | 220P 0.993**, 998D 0.993**, 1069I 0.993**, 1324L 0.994**, 1343F 1.000**, 1411S 0.993**, 1665I 0.993**, 1758R 0.993**, 1977A 0.993**, 2121D 0.999**, 2191R 0.994**, 2261L 0.963*, 2263H 0.993**, 2265T 0.999**, 2266G 0.995**, 2267E 0.993**, 2268R 0.993**, 2269F 0.999**, 2271I 0.993**, 2272P 0.994** |
| | −960.017298 | p0 = 0.91059 (p1 = 0.08941) ω = 4.34827 | 181M 0.962*, 184L 0.971* |
Note: the degrees of freedom for each gene were 38; * and ** indicate posterior probabilities higher than 0.95 and 0.99, respectively
Fig. 8 Molecular phylogenetic tree based on the SNPs from 56 chloroplast genomes of family Zingiberaceae. C. indica, C. pulverulentus and C. viridis were set as the outgroups. The tree was constructed by maximum likelihood analysis of the SNP matrix using MEGA software [32]. The stability of each tree node was tested by the bootstrap method with 1000 replicates. Bootstrap values ≥ 50% are indicated by the numbers next to the branches. The ten chloroplast genomes sequenced in this study are marked in bold