Sajjad Asaf, Abdul Latif Khan, Muhammad Aaqil Khan, Qari Muhammad Imran, Sang-Mo Kang, Khdija Al-Hosni, Eun Ju Jeong, Ko Eun Lee, In-Jung Lee.
Abstract
The plastid genomes of different plant species exhibit significant variation, thereby providing valuable markers for exploring evolutionary relationships and population genetics. Glycine soja (wild soybean) is recognized as the wild ancestor of cultivated soybean (G. max) and represents a valuable genetic resource for soybean breeding programmes. In the present study, the complete plastid genome of G. soja was sequenced using Illumina paired-end sequencing and compared, for the first time, with previously reported plastid genome sequences from nine other Glycine species. The G. soja plastid genome was 152,224 bp in length and possessed a typical quadripartite structure, consisting of a pair of inverted repeats (IRa/IRb; 25,574 bp) separated by small (17,896 bp) and large (83,181 bp) single-copy regions, with a 51-kb inversion in the large single-copy region. The genome encoded 134 genes, including 87 protein-coding genes, eight ribosomal RNA genes, and 39 transfer RNA genes, and possessed 204 randomly distributed microsatellites, including 15 forward, 25 tandem, and 34 palindromic repeats. Whole-plastid genome comparisons revealed an overall high degree of sequence similarity between G. max and G. gracilis and some divergence in the intergenic spacers of other species. Greater numbers of indels and SNP substitutions were observed in the comparison with G. cyrtoloba than with the other species. The sequence of the accD gene from G. soja was highly divergent from those of the other species except for G. max and G. gracilis. Phylogenomic analyses of the complete plastid genomes and 76 shared genes yielded an identical topology and indicated that G. soja is closely related to G. max and G. gracilis. The complete G. soja genome sequenced in the present study is a valuable resource for investigating the population and evolutionary genetics of Glycine species and can be used to identify related species.
Year: 2017 PMID: 28763486 PMCID: PMC5538705 DOI: 10.1371/journal.pone.0182281
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Gene map of the Glycine soja plastid genome.
Thick lines in the red area indicate the extent of the inverted repeat regions (IRa and IRb; 25,574 bp), which separate the genome into small (SSC; 17,896 bp) and large (LSC; 83,181 bp) single-copy regions. Genes located inside the circle are transcribed clockwise, and those outside the circle are transcribed counterclockwise. Genes belonging to different functional groups are colour-coded. The dark grey in the inner circle corresponds to the GC content, and the light grey corresponds to the AT content. The green colour arc indicates the location of the 51-kb inversion.
Summary of complete chloroplast genomes for ten Glycine species.
| Region | Measure | G. soja (new) | G. soja (old) | G. max | G. gracilis | G. canescens | G. cyrtoloba | G. dolichocarpa | G. falcata | G. stenophita | G. syndetika | G. tomentella |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSC | Length (bp) | 83,181 | 83,174 | 83,174 | 83,175 | 83,579 | 83,174 | 83,815 | 84,027 | 83,937 | 83,839 | 83,773 |
| LSC | GC (%) | 32.8 | 32.8 | 32.8 | 32.8 | 32.7 | 32.7 | 32.7 | 32.7 | 32.8 | 32.7 | 32.7 |
| LSC | Length (%) | 54.64 | 54.64 | 54.64 | 54.64 | 54.64 | 54.58 | 54.8 | 54.9 | 54.99 | 54.87 | 54.85 |
| SSC | Length (bp) | 17,896 | 17,895 | 17,896 | 17,895 | 17,880 | 17,838 | 17,807 | 17,846 | 17,817 | 17,859 | 17,829 |
| SSC | GC (%) | 28.7 | 28.8 | 28.8 | 28.8 | 28.6 | 28.6 | 28.7 | 28.7 | 28.8 | 28.7 | 28.7 |
| SSC | Length (%) | 11.75 | 11.75 | 11.75 | 11.75 | 11.75 | 11.70 | 11.65 | 11.66 | 11.67 | 11.68 | 11.67 |
| IR | Length (bp) | 25,574 | 25,574 | 25,574 | 25,574 | 25,530 | 25,485 | 25,591 | 25,575 | 25,432 | 25,542 | 25,563 |
| IR | GC (%) | 41.8 | 41.9 | 41.9 | 41.9 | 41.9 | 41.9 | 41.9 | 41.9 | 41.8 | 41.9 | 41.9 |
| IR | Length (%) | 16.80 | 16.8 | 16.8 | 16.8 | 16.77 | 16.72 | 16.74 | 16.71 | 16.66 | 16.71 | 16.73 |
| Total | GC (%) | 35.4 | 35.4 | 35.4 | 35.4 | 35.3 | 35.3 | 35.3 | 35.3 | 35.3 | 35.3 | 35.3 |
| Total | Length (bp) | 152,224 | 152,217 | 152,218 | 152,218 | 152,218 | 152,381 | 152,804 | 153,023 | 152,618 | 152,783 | 152,728 |
G. soja (new) = sequenced in this study; G. soja (old) = previously reported sequence. Columns follow the order of the original footnote key.
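The "Length (%)" rows express each region as a share of the total plastome length. A minimal sketch of that arithmetic for the newly sequenced G. soja genome follows; the names `REGIONS`, `GENOME_LENGTH`, and `region_share` are illustrative and not taken from the study, and the rounding convention used in the table is not stated, so the last digit may differ.

```python
# Region lengths (bp) for the newly sequenced G. soja plastome, copied from the table above.
REGIONS = {"LSC": 83_181, "SSC": 17_896, "IR": 25_574}
GENOME_LENGTH = 152_224  # reported total length (bp)


def region_share(length_bp, genome_bp=GENOME_LENGTH):
    """Return a region's share of the genome as a percentage."""
    return 100.0 * length_bp / genome_bp


for name, length in REGIONS.items():
    print(f"{name}: {region_share(length):.2f}%")

# In a quadripartite plastome the IR occurs twice (IRa and IRb),
# so it is counted twice when the parts are summed.
quadripartite_sum = REGIONS["LSC"] + REGIONS["SSC"] + 2 * REGIONS["IR"]
print("sum of parts minus reported total (bp):", quadripartite_sum - GENOME_LENGTH)
```

Comparing the sum of the parts with the reported total is a quick consistency check on any row of the table.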
Comparison of coding and non-coding region sizes among ten Glycine species.
| Region | Measure | G. soja (new) | G. soja (old) | G. max | G. gracilis | G. canescens | G. cyrtoloba | G. dolichocarpa | G. falcata | G. stenophita | G. syndetika | G. tomentella |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Protein-coding | Length (bp) | 79,250 | 77,835 | 77,769 | 77,811 | 77,607 | 72,294 | 77,649 | 77,598 | 77,646 | 77,604 | 77,601 |
| Protein-coding | GC (%) | 36.2 | 36.1 | 36.1 | 36.1 | 36.1 | 36.8 | 36.1 | 36.1 | 36.1 | 36.1 | 36.1 |
| Protein-coding | Length (%) | 52.06 | 51.13 | 51.12 | 51.11 | 50.98 | 47.44 | 50.91 | 50.71 | 50.8 | 50.7 | 50.8 |
| tRNA | Length (bp) | 2,925 | 2,817 | 2,792 | 2,799 | 2,792 | 2,792 | 2,792 | 2,792 | 2,792 | 2,792 | 2,792 |
| tRNA | GC (%) | 52.4 | 52.9 | 52.9 | 53.0 | 52.8 | 52.8 | 52.8 | 52.9 | 52.8 | 52.8 | 52.8 |
| tRNA | Length (%) | 1.92 | 1.85 | 1.83 | 1.83 | 1.83 | 1.83 | 1.82 | 1.82 | 1.82 | 1.82 | 1.82 |
| rRNA | Length (bp) | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 | 9,054 |
| rRNA | GC (%) | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 | 54.9 |
| rRNA | Length (%) | 5.94 | 5.94 | 5.94 | 5.94 | 5.94 | 5.94 | 5.93 | 5.91 | 5.93 | 5.93 | 5.94 |
| Non-coding | GC (%) | 33.23 | 33.45 | 33.26 | 33.23 | 33.45 | 33.23 | 33.432 | 33.23 | 33.454 | 33.45 | 33.26 |
| Non-coding | Length (bp) | 60,995 | 62,511 | 62,603 | 62,554 | 62,765 | 68,241 | 63,309 | 63,579 | 63,123 | 63,333 | 63,281 |
G. soja (new) = sequenced in this study; G. soja (old) = previously reported sequence. Columns follow the order of the original footnote key.
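The final length row of this table can be cross-checked against the other rows: it equals the total plastome length minus the protein-coding, tRNA, and rRNA lengths. The sketch below works through the first three data columns; reading that row as the non-coding remainder is an inference from the arithmetic, and the names `COLUMNS` and `non_coding_length` are illustrative.

```python
# (total, protein-coding, tRNA, rRNA) lengths in bp for the first three data
# columns of the two summary tables. Column 1 is the newly sequenced G. soja.
COLUMNS = {
    1: (152_224, 79_250, 2_925, 9_054),
    2: (152_217, 77_835, 2_817, 9_054),
    3: (152_218, 77_769, 2_792, 9_054),
}


def non_coding_length(total, protein_coding, trna, rrna):
    """Length left over once all annotated coding classes are subtracted."""
    return total - (protein_coding + trna + rrna)


for column, lengths in COLUMNS.items():
    print(column, non_coding_length(*lengths))
```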
Base composition of the G. soja plastid genome.
| Region | T/U (%) | C (%) | A (%) | G (%) | Length (bp) |
|---|---|---|---|---|---|
| Genome | 32.3 | 17.4 | 32.4 | 18.0 | 152,224 |
| LSC | 33.6 | 16.0 | 33.6 | 16.8 | 83,181 |
| SSC | 35.3 | 13.6 | 36.0 | 15.1 | 17,896 |
| IR | 29.0 | 21.7 | 29.2 | 20.1 | 25,574 |
| tRNA | 25.2 | 23.1 | 22.4 | 29.3 | 2,925 |
| rRNA | 18.9 | 23.4 | 26.2 | 31.5 | 9,054 |
| Protein-coding genes | 32.2 | 17.0 | 31.5 | 19.2 | 79,250 |
| 1st codon position | 24.1 | 18.3 | 31.6 | 25.8 | 26,416 |
| 2nd codon position | 33.2 | 19.8 | 29.7 | 17.1 | 26,416 |
| 3rd codon position | 39.2 | 12.7 | 33.2 | 14.7 | 26,416 |
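The GC percentages reported in the summary tables follow directly from these base frequencies, since GC (%) is the sum of the G and C columns. The sketch below shows how such a row could be computed from a sequence string and checks the whole-genome row; `base_composition` and `gc_percent` are hypothetical helper names, and the toy sequence is not real data.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of each base in a nucleotide string; U is counted as T."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T/U")
    return {base: 100.0 * counts[base] / total for base in "TCAG"}


def gc_percent(composition):
    """GC content is the sum of the G and C percentages."""
    return composition["G"] + composition["C"]


toy = base_composition("ATGCATTTAA")  # toy sequence, not from the genome
print(toy, gc_percent(toy))

# Whole-genome row from the table above.
genome_row = {"T": 32.3, "C": 17.4, "A": 32.4, "G": 18.0}
print(round(gc_percent(genome_row), 1))
```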
The codon-anticodon recognition pattern and codon usage for the G. soja plastid genome.
| Amino acid | Codon | No. | RSCU | tRNA | Amino acid | Codon | No. | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 1099 | 1.28 | | Ala | GCA | 395 | 1.18 | |
| Phe | UUC | 503 | 0.7 | | Ala | GCG | 122 | 0.5 | |
| Leu | UUA | 932 | 1.9 | | Tyr | UAU | 846 | 1.5 | |
| Leu | UUG | 557 | 1.1 | | Tyr | UAC | 165 | 0.47 | |
| Leu | CUU | 589 | 1.29 | | Stop | UAG | 1 | 0.74 | |
| Leu | CUC | 172 | 0.4 | | Stop | UGA | 0 | 0.80 | |
| Leu | CUA | 381 | 0.87 | | Stop | UAA | 5 | 1.44 | |
| Leu | CUG | 164 | 0.32 | | His | CAU | 503 | 1.49 | |
| Ile | AUU | 1170 | 1.51 | | His | CAC | 134 | 0.50 | |
| Ile | AUC | 392 | 0.5 | | Gln | CAA | 764 | 1.53 | |
| Ile | AUA | 827 | 0.89 | | Gln | CAG | 200 | 0.49 | |
| Met | AUG | 499 | 1 | | Asn | AAU | 1045 | 1.44 | |
| Val | GUU | 533 | 1.50 | | Asn | AAC | 286 | 0.55 | |
| Val | GUC | 158 | 0.46 | | Lys | AAA | 1181 | 1.44 | |
| Val | GUA | 534 | 1.47 | | Lys | AAG | 331 | 0.55 | |
| Val | GUG | 173 | 0.54 | | Asp | GAU | 827 | 1.55 | |
| Ser | UCU | 591 | 1.56 | | Asp | GAC | 204 | 0.44 | |
| Ser | UCC | 298 | 1.23 | | Glu | GAA | 1042 | 1.48 | |
| Ser | UCA | 442 | 1.03 | | Glu | GAG | 313 | 0.51 | |
| Ser | UCG | 181 | 0.48 | | Cys | UGU | 231 | 1.50 | |
| Ser | AGU | 405 | 1.24 | | Cys | UGC | 85 | 0.49 | |
| Ser | AGC | 120 | 0.42 | | Trp | UGG | 442 | 1 | |
| Pro | CCU | 403 | 1.59 | | Arg | CGU | 339 | 1.36 | |
| Pro | CCC | 202 | 0.86 | | Arg | CGC | 91 | 0.51 | |
| Pro | CCA | 334 | 1.07 | | Arg | CGA | 361 | 1.24 | |
| Pro | CCG | 122 | 0.47 | | Arg | CGG | 100 | 0.48 | |
| Thr | ACU | 571 | 1.68 | | Arg | AGA | 485 | 1.77 | |
| Thr | ACC | 210 | 0.76 | | Arg | AGG | 156 | 0.61 | |
| Thr | ACA | 421 | 1.08 | | Gly | GGU | 585 | 1.28 | |
| Thr | ACG | 139 | 0.45 | | Gly | GGC | 157 | 0.42 | |
| Ala | GCU | 623 | 1.72 | | Gly | GGA | 691 | 1.52 | |
| Ala | GCC | 189 | 0.59 | | Gly | GGG | 282 | 0.77 | |
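Relative synonymous codon usage (RSCU) is conventionally defined as a codon's observed count divided by the mean count of all codons encoding the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The sketch below applies that standard definition to one codon family at a time; the function name `rscu` is illustrative, the software and gene set behind the printed RSCU column are not given here, and the output is not expected to reproduce that column digit for digit.

```python
def rscu(family_counts):
    """RSCU for one synonymous codon family.

    family_counts maps each codon of a single amino acid to its observed count.
    RSCU = observed count / mean count across the family.
    """
    total = sum(family_counts.values())
    if total == 0:
        return {codon: 0.0 for codon in family_counts}
    mean = total / len(family_counts)
    return {codon: count / mean for codon, count in family_counts.items()}


# Counts taken from the "No." column of the table above.
phe = rscu({"UUU": 1099, "UUC": 503})
met = rscu({"AUG": 499})  # single-codon family, so RSCU is 1 by definition
print(phe, met)
```

Within any family the RSCU values sum to the number of synonymous codons, which is a convenient sanity check on a codon usage table.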
Genes in the sequenced G. soja chloroplast genome.
| Category | Group of genes | Name of genes |
|---|---|---|
| | Large subunit of ribosomal proteins | |
| | Small subunit of ribosomal proteins | |
| | DNA-dependent RNA polymerase | |
| | rRNA genes | |
| | tRNA genes | |
| | Photosystem I | |
| | Photosystem II | |
| | NADH oxidoreductase | |
| | Cytochrome b6/f complex | |
| | ATP synthase | |
| | Rubisco | |
| | Maturase | |
| | Protease | |
| | Envelope membrane protein | |
| | Subunit of acetyl-CoA carboxylase | |
| | c-type cytochrome synthesis gene | |
| | Conserved open reading frames | |
Length of exons and introns in intron-containing genes from the Glycine soja plastid genome.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| | LSC | 144 | 736 | 414 | | |
| | LSC | 69 | 710 | 297 | 775 | 225 |
| | SSC | 552 | 1269 | 756 | | |
| | IR | 777 | 692 | 756 | | |
| | LSC | 6 | 808 | 642 | | |
| | LSC | 8 | 728 | 476 | | |
| | IR | 393 | 681 | 468 | | |
| | LSC | 9 | 1165 | 402 | | |
| | LSC | 441 | 785 | 1638 | 719 | 159 |
| | | 114 | - | 26 | 531 | 232 |
| | LSC | 39 | 887 | 228 | | |
| | LSC | 126 | 697 | 228 | 745 | 150 |
| | IR | 38 | 810 | 35 | | |
| | IR | 42 | 948 | 35 | | |
| | LSC | 37 | 508 | 50 | | |
| | LSC | 37 | 2583 | 29 | | |
| | LSC | 39 | 586 | 37 | | |
replicated genes
*The rps12 coding sequence is split between 5′-rps12 and 3′-rps12, which are located in the large single-copy region and inverted repeat region, respectively.
Fig 2. Analysis of repeated sequences in 10 Glycine plastid genomes.
A, Total of three repeat types; B, Length distribution of forward repeat sequences; C, Length distribution of tandem repeat sequences; D, Length distribution of palindromic repeat sequences.
Fig 3. Analysis of simple sequence repeats (SSRs) in the ten Glycine plastid genomes.
A, Number of SSR types; B, Frequency of identified SSR motifs in different repeat class types; C, Frequency of identified SSRs in coding regions; D, Frequency of identified SSRs in the small single-copy (SSC), large single-copy (LSC), and inverted repeat (IR) regions.
Simple sequence repeats (SSRs) in the Glycine soja plastid genome.
| Unit | Length | No. | SSR start |
|---|---|---|---|
| | 18 | 1 | 51,531 |
| | 16 | 2 | 92,627, 142,764 |
| | 15 | 2 | 76,538, 119,451 |
| | 14 | 2 | 33,433, 82,862 |
| | 13 | 4 | 24,610, 51,701, 110,244, 111,377 |
| | 12 | 7 | 6,968, 9,644, 9,656, 58,365, 62,260, 75,661, 82,660 |
| | 11 | 15 | 14,313, 42,712, 54,965, 59,329, 70,698, 78,955, 79,488, 81,034, 81,302, 109,835, 111,046, 111,519, 111,927, 112,225, 122,146 |
| | 10 | 22 | 2,991, 4,452, 7,568, 25,542, 31,495, 34,893, 38,160, 38,510, 45,234, 46,902, 54,259, 56,682, 62,419, 66,716, 67,450, 69,278, 93,297, 109,698, 110,547, 114,419, 124,220, 142,100 |
| | 12 | 1 | 9,644 |
| | 19 | 1 | 5,177 |
| | 17 | 1 | 5,159 |
| | 16 | 1 | 24,676 |
| | 14 | 1 | 32,841 |
| | 13 | 1 | 48,415 |
| | 12 | 2 | 54,297, 118,666 |
| | 11 | 8 | 33,695, 48,440, 65,081, 67,502, 68,320, 78,342, 79,508, 122,331 |
| | 10 | 5 | 31,746, 32,806, 68,072, 80,714, 116,632 |
| | 9 | 9 | 13,837, 35,671, 54,930, 58,400, 60,678, 64,792, 69,490, 82,699, 120,175 |
| | 8 | 24 | 100, 1,607, 2,068, 3,635, 4,513, 4,526, 13,370, 16,835, 28,206, 47,399, 51,596, 51,773, 51,795, 58,249, 60,155, 65,092, 69,374, 76,625, 79,531, 82,378, 92,346, 116,291, 123,690, 143,053 |
| | 9 | 2 | 25,492, 28,221 |
| | 8 | 15 | 3,673, 6,261, 85,791, 86,793, 94,040, 105,226, 105,546, 107,047, 120,875, 128,352, 129,853, 130,173, 141,359, 148,606, 149,608 |
| | 9 | 1 | 120,511 |
| | 15 | 1 | 28,637 |
| | 13 | 1 | 14,614 |
| | 12 | 1 | 29,635 |
| | 11 | 1 | 73,972 |
| | 10 | 6 | 2,980, 14,647, 23,469, 47,482, 61,211, 83,153 |
| | 9 | 15 | 4,840, 6,885, 18,582, 24,528, 28,614, 32,259, 32,318, 45,719, 47,151, 58,337, 80,973, 99,425, 115,619, 120,102, 135,973 |
| | 12 | 1 | 2,123 |
| | 11 | 1 | 111,544 |
| | 10 | 4 | 83,359, 95,785, 139,612, 152,038 |
| | 9 | 15 | 23,601, 39,016, 61,479, 69,713, 76,888, 89,691, 91,515, 94,335, 102,444, 109,943, 117,624, 133,154, 141,063, 143,883, 145,707 |
| | 11 | 1 | 57,126 |
| | 9 | 6 | 22,369, 40,828, 45,626, 83,824, 116,434, 151,574 |
| | 10 | 2 | 83,313, 152,084 |
| | 9 | 5 | 5,366, 20,175, 68,568, 103,665, 131,733 |
| | 9 | 2 | 58,920, 90,061 |
| | 9 | 1 | 66,702 |
| | 15 | 2 | 18,423, 18,450 |
| | 13 | 1 | 119,923 |
| | 12 | 1 | 78,291 |
| | 12 | 1 | 67,682 |
| | 12 | 1 | 117,190 |
| | 15 | 2 | 107,707, 127,685 |
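Each row of the SSR table pairs a repeat length with the 1-based start coordinates at which a repeat of that length occurs. A minimal sketch of how perfect SSRs can be located with a regular expression is given below. It is not the tool used in the study; the function name `find_ssrs`, its parameters, and the thresholds in the usage lines are hypothetical, and the test sequence is a toy string.

```python
import re


def find_ssrs(seq, unit_len, min_repeats):
    """Find perfect tandem repeats of a motif of length unit_len.

    Returns a list of (start, unit, length) tuples, where start is 1-based
    and length is the total repeat length in bp.
    """
    # One captured motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip motifs that are a shorter motif repeated (e.g. "AA" as a dinucleotide).
        if any(
            unit == unit[:d] * (unit_len // d)
            for d in range(1, unit_len)
            if unit_len % d == 0
        ):
            continue
        hits.append((match.start() + 1, unit, match.end() - match.start()))
    return hits


toy = "GC" + "A" * 10 + "GCG" + "AT" * 5 + "G"  # toy sequence, not genome data
mono = find_ssrs(toy, unit_len=1, min_repeats=8)
di = find_ssrs(toy, unit_len=2, min_repeats=4)
print(mono, di)
```

Dedicated SSR tools also handle compound and imperfect repeats, which this sketch deliberately leaves out.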
Fig 4. Visual alignment of plastid genomes from Glycine soja (new and old) and nine other Glycine species.
VISTA-based identity plot showing the sequence identity among the ten Glycine species, using G. soja (new) as a reference. Vertical scale indicates the percentage of identity, ranging from 50% to 100%. Horizontal axis indicates the coordinates within the chloroplast genome. Arrows indicate the annotated genes and their transcriptional direction. A thick black line indicates the inverted repeat (IR) regions.
Fig 5. Pairwise distance of 76 genes from Glycine soja (new and old) and nine other Glycine species.
Fig 6. Distance between adjacent genes and junctions of the small single-copy (SSC), large single-copy (LSC), and two inverted repeat (IR) regions of the plastid genomes from ten Glycine species.
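A pairwise distance between two aligned gene sequences can be illustrated with the simplest measure, the uncorrected p-distance (the proportion of aligned sites that differ). The sketch below is only an illustration of the idea; the distance model actually used for this figure is not stated in this section and may well be a corrected one, and `p_distance` is a hypothetical helper name.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


# Toy aligned fragments, not taken from any Glycine gene.
print(p_distance("ATGCAT", "ATGTAT"))
```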
Boxes above and below the main line indicate the adjacent bordering genes. The figure is not to scale in regard to sequence length and only shows relative changes at or near the IR/SC borders.
Fig 7. Phylogenetic trees of ten Glycine species.
The whole-genome dataset was analysed using four different methods: neighbour-joining (NJ), maximum parsimony (MP), maximum likelihood (ML), and Bayesian inference (BI). Numbers above the branches represent bootstrap values in the NJ, MP, and ML trees and posterior probabilities in the BI trees. A red dot represents the position of G. soja (KY241814).