Bin Zhu, Lijuan Hu, Fang Qian, Zuomin Gao, Chenchen Gan, Zhaochao Liu, Xuye Du, Hongcheng Wang.
Abstract
Moricandia arvensis, a plant species originating from the Mediterranean, has been classified as a rare C3-C4 intermediate species, and it is a possible bridge during the evolutionary process from C3 to C4 plant photosynthesis in the family Brassicaceae. Understanding the genomic structure, gene order, and gene content of chloroplasts (cp) of such species can provide a glimpse into the evolution of photosynthesis. In the present study, we obtained a well-annotated cp genome of M. arvensis using long PacBio and short Illumina reads with a de novo assembly strategy. The M. arvensis cp genome was a quadripartite circular molecule with a length of 153,312 bp, including two inverted repeat (IR) regions of 26,196 bp each, separated by a small single copy (SSC) region of 17,786 bp and a large single copy (LSC) region of 83,134 bp. We detected 112 unigenes in this genome, comprising 79 protein-coding genes, 29 tRNAs, and four rRNAs. Forty-nine long repeat sequences and 51 simple sequence repeat (SSR) loci of 15 repeat types were identified. The analysis of Ks (synonymous) and Ka (non-synonymous) substitution rates indicated that the genes associated with "subunits of ATP synthase" (atpB), "subunits of NADH-dehydrogenase" (ndhG and ndhE), and "self-replication" (rps12 and rpl16) showed relatively higher Ka/Ks values than the other genes. The gene content, gene order, and LSC/IR/SSC boundaries and adjacent genes of the M. arvensis cp genome were highly conserved compared to those in related C3 species. Our phylogenetic analysis demonstrated that M. arvensis was clustered into a subclade with cultivated Brassica species and Raphanus sativus, indicating that M. arvensis was not involved in an independent evolutionary origin event. These results will open the way for further studies on the evolutionary process from C3 to C4 photosynthesis and hopefully provide guidance for utilizing M. arvensis as a resource for improving photosynthesis efficiency in cultivated Brassica species.
Year: 2021 PMID: 34237086 PMCID: PMC8266105 DOI: 10.1371/journal.pone.0254109
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Detailed characteristics of the complete cp genome of Moricandia arvensis.
| Category | Items | Descriptions |
|---|---|---|
| Construction of cp genome | LSC region (bp) | 83,134 |
| | IRA region (bp) | 26,196 |
| | SSC region (bp) | 17,786 |
| | IRB region (bp) | 26,196 |
| | Genome size (bp) | 153,312 |
| Gene content | Total genes | 132 |
| | Protein-coding genes | 87 |
| | tRNAs | 37 |
| | rRNAs | 4 |
| | Two copy genes | 20 |
| | Genes on LSC region | 84 |
| | Genes on IRA region | 18 |
| | Genes on SSC region | 12 |
| | Genes on IRB region | 18 |
| | Gene total length (bp) | 78,396 |
| | Average gene length (bp) | 933 |
| | Gene length / Genome (%) | 51.13 |
| GC content | GC content of LSC region (%) | 34.15 |
| | GC content of IRA region (%) | 42.34 |
| | GC content of SSC region (%) | 29.17 |
| | GC content of IRB region (%) | 42.34 |
| | Overall GC content (%) | 36.37 |
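The figures in this table can be cross-checked against each other. The short Python sketch below (variable names are illustrative, not from the paper) sums the four region lengths, computes the overall GC content as a length-weighted mean of the regional values, and recomputes the share of the genome covered by genes.

```python
# Region lengths (bp) and GC content (%) copied from the table above.
regions = {
    "LSC": (83134, 34.15),
    "IRA": (26196, 42.34),
    "SSC": (17786, 29.17),
    "IRB": (26196, 42.34),
}

# The quadripartite genome is the sum of its four regions.
genome_size = sum(length for length, _ in regions.values())

# Overall GC content as the length-weighted mean of the regional percentages.
overall_gc = sum(length * gc for length, gc in regions.values()) / genome_size

# Share of the genome covered by annotated genes, from the tabulated total gene length.
gene_total_length = 78396
gene_fraction = 100 * gene_total_length / genome_size

print(genome_size)              # 153312
print(round(overall_gc, 2))     # 36.37
print(round(gene_fraction, 2))  # 51.13
```

All three recomputed values agree with the tabulated genome size, overall GC content, and gene length share.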
Fig 1. Gene map of the complete M. arvensis cp genome.
Genes on the outside and inside of the circle are transcribed in clockwise and counterclockwise directions, respectively. Genes belonging to different functional groups are color coded. The color intensity in the inner circle corresponds to GC content. The SSC, LSC, and inverted repeat regions (IRA and IRB) are indicated.
Summary of assembled gene functions of the Moricandia arvensis cp genome.
| Category for genes | Groups of genes | Name of genes |
|---|---|---|
| Genes involved in photosynthesis | Subunits of photosystem | |
| | Subunits of cytochrome b/f complex | |
| | Large subunit of Rubisco | |
| | Subunits of ATP synthase | |
| | Subunits of NADH-dehydrogenase | |
| Self-replication | Ribosomal RNA genes | |
| | Transfer RNA genes | |
| | Small subunit of ribosome | |
| | Large subunit of ribosome | |
| | DNA-dependent RNA polymerase | |
| Other genes | Maturase | |
| | Envelope membrane protein | |
| | Subunit of acetyl-CoA carboxylase | |
| | C-type cytochrome synthesis gene | |
| | Protease | |
| Functionally unknown genes | Conserved open reading frames | |
a, b, c: The letters indicate genes with two copies, genes harboring one intron, and genes harboring two introns, respectively.
Fig 2. Alignment of the cp genomes of M. arvensis and five closely related species.
The alignment was performed by mVISTA with M. arvensis as the reference. Local collinear blocks within each alignment are indicated by the same color and linked.
Fig 3. Analysis of the boundaries of LSC/SSC/IR and adjacent genes among six Brassicaceae cp genomes.
Whole cp genome sequences of M. arvensis and of five closely related species (B. rapa, B. oleracea, B. juncea, R. sativus, and O. diffuses) were downloaded from GenBank.
Repeat sequences in the Moricandia arvensis cp genome.
| No. | Repeat length (bp) | Type | Repeat 1 start (location) | Repeat 2 start (location) | Region |
|---|---|---|---|---|---|
| 1 | 30 | F | 106671 | 106703 | IRA |
| 2 | 34 | F | 129713 | 129745 | IRB |
| 3 | 34 | F | 106667 | 106699 | IRA |
| 4 | 37 | F | 97748 | 119335 | IRA;SSC |
| 5 | 39 | F | 43158 | 97745 | SSC;IRA |
| 6 | 40 | F | 148328 | 148349 | IRB |
| 7 | 42 | F | 97743 | 119330 | IRA;SSC |
| 8 | 47 | F | 148321 | 148342 | IRB |
| 9 | 47 | F | 88057 | 88078 | IRA |
| 10 | 52 | F | 38049 | 40273 | LSC |
| 11 | 55 | F | 38046 | 40270 | LSC |
| 12 | 58 | F | 38043 | 40267 | LSC |
| 13 | 64 | F | 38007 | 40231 | LSC |
| 14 | 71 | F | 38013 | 40237 | LSC |
| 15 | 73 | F | 38028 | 40252 | LSC |
| 16 | 74 | F | 38010 | 40234 | LSC |
| 17 | 76 | F | 38025 | 40249 | LSC |
| 18 | 79 | F | 38016 | 40240 | LSC |
| 19 | 81 | F | 38020 | 40244 | LSC |
| 20 | 30 | P | 7725 | 44261 | LSC |
| 21 | 30 | P | 61749 | 61749 | LSC |
| 22 | 30 | P | 106671 | 129713 | IRA;IRB |
| 23 | 30 | P | 106703 | 129745 | IRA;IRB |
| 24 | 34 | P | 106667 | 129713 | IRA;IRB |
| 25 | 34 | P | 106699 | 129745 | IRA;IRB |
| 26 | 37 | P | 119335 | 138661 | SSC;IRB |
| 27 | 38 | P | 79499 | 79499 | LSC |
| 28 | 39 | P | 51159 | 51159 | LSC |
| 29 | 39 | P | 43158 | 138662 | LSC;IRB |
| 30 | 40 | P | 28339 | 28339 | LSC |
| 31 | 42 | P | 296 | 296 | LSC |
| 32 | 42 | P | 119330 | 138661 | SSC;IRB |
| 33 | 44 | P | 73294 | 73294 | LSC |
| 34 | 44 | P | 61981 | 61981 | LSC |
| 35 | 44 | P | 76808 | 76808 | LSC |
| 36 | 45 | P | 112700 | 112700 | SSC |
| 37 | 46 | P | 9315 | 9315 | IRA;LSC |
| 38 | 46 | P | 209 | 209 | LSC |
| 39 | 46 | P | 4686 | 4686 | LSC |
| 40 | 47 | P | 64216 | 64216 | LSC |
| 41 | 47 | P | 88057 | 148321 | IRA;IRB |
| 42 | 47 | P | 88078 | 148342 | IRA;IRB |
| 43 | 50 | P | 55437 | 55437 | LSC |
| 44 | 50 | P | 288 | 296 | LSC |
| 45 | 53 | P | 66075 | 66075 | LSC |
| 46 | 53 | P | 112692 | 112700 | SSC |
| 47 | 58 | P | 64205 | 64216 | LSC |
| 48 | 41 | R | 30823 | 30823 | LSC |
| 49 | 47 | R | 80843 | 80843 | LSC |
F: forward repeats; R: reverse repeats; P: palindrome repeats.
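The Region column can be related to the start coordinates with a simple lookup. The sketch below is a minimal illustration, assuming that coordinates are 1-based and that the regions are laid out in the order LSC, IRA, SSC, IRB starting at position 1; that ordering is an assumption inferred from the coordinates in the table, and `region_of` is a hypothetical helper, not part of the authors' pipeline.

```python
from bisect import bisect_left

# Region lengths (bp) from the genome summary. The LSC -> IRA -> SSC -> IRB
# order starting at position 1 is an assumption, not stated in the source.
REGION_LENGTHS = [("LSC", 83134), ("IRA", 26196), ("SSC", 17786), ("IRB", 26196)]

# Cumulative 1-based end coordinate of each region.
REGION_ENDS = []
_end = 0
for _name, _length in REGION_LENGTHS:
    _end += _length
    REGION_ENDS.append(_end)


def region_of(position):
    """Return the name of the region containing a 1-based genome position."""
    if not 1 <= position <= REGION_ENDS[-1]:
        raise ValueError(f"position {position} is outside the genome")
    # The first region whose end coordinate is >= position contains it.
    return REGION_LENGTHS[bisect_left(REGION_ENDS, position)][0]


# Repeat no. 4 from the table: starts at 97748 and 119335.
print(region_of(97748), region_of(119335))  # IRA SSC
```

Under this assumed layout, repeat no. 4 maps to IRA and SSC, matching its tabulated Region entry.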
Distribution of SSRs in the Moricandia arvensis cp genome.
| Type | Unit | Length (bp) | No. | Position on genome | Location |
|---|---|---|---|---|---|
| P1 | A | 10 | 4 | 13019–13028 | |
| | | | | 29879–29888 | |
| | | | | 35524–35533 | |
| | | | | 109311–109320 | |
| | | 11 | 2 | 13753–13763 | |
| | | | | 27132–27143 | |
| | | 12 | 2 | 122601–122612 | |
| | | | | 137357–137368 | |
| | | 13 | 3 | 41894–41906 | |
| | | | | 64295–64307 | |
| | | | | 80068–80080 | |
| | | 14 | 1 | 113672–113685 | |
| | T | 10 | 9 | 25406–25415 | |
| | | | | 49134–49143 | |
| | | | | 50320–50329 | |
| | | | | 51211–51220 | |
| | | | | 70416–70425 | |
| | | | | 78955–78964 | |
| | | | | 80675–80684 | |
| | | | | 112043–112052 | |
| | | | | 127127–127136 | |
| | | 11 | 4 | 17669–17679 | |
| | | | | 29564–29574 | |
| | | | | 45504–45514 | |
| | | | | 123187–123197 | |
| | | 12 | 4 | 74235–74246 | |
| | | | | 99079–99090 | |
| | | | | 124947–124958 | |
| | | | | 125265–125276 | |
| | | 13 | 3 | 65602–65614 | |
| | | | | 77163–77175 | |
| | | | | 81549–81561 | |
| | | 22 | 1 | 124755–124776 | |
| P2 | AT | 10 | 6 | 26644–26653 | |
| | | | | 35661–35670 | |
| | | | | 107443–107452 | |
| | | | | 120298–120307 | |
| | | | | 128995–129004 | |
| | | | | 142979–142988 | |
| | | 14 | 1 | 13471–13484 | |
| | | 20 | 1 | 30833–30852 | |
| | TA | 10 | 4 | 19041–19050 | |
| | | | | 62092–62101 | |
| | | | | 93458–93467 | |
| | | | | 111594–111603 | |
| P3 | ATT | 12 | 1 | 45957–45968 | |
| P4 | CAAA | 12 | 1 | 28186–28197 | |
| | TAAA | 12 | 2 | 45780–45791 | |
| | ATAG | 12 | 1 | 111356–111367 | |
| P6 | GAAAGT | 18 | 1 | 56632–56649 | |
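To illustrate how SSR loci of this kind can be located, the following Python sketch scans a sequence for perfect tandem repeats. The minimum repeat counts (10, 5, 4, 3, 3, and 3 for mono- to hexanucleotide units) are an assumption chosen to be consistent with the shortest lengths in the table; the tool and settings actually used by the authors are not given in this section, and `find_ssrs` is a hypothetical helper.

```python
# Minimum number of unit repeats per unit size (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(unit):
    """True if the unit is not itself a repetition of a shorter motif (e.g. 'AA')."""
    return not any(
        len(unit) % k == 0 and unit == unit[:k] * (len(unit) // k)
        for k in range(1, len(unit))
    )


def find_ssrs(seq):
    """Return (type, unit, length, start, end) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for size, min_rep in MIN_REPEATS.items():
            unit = seq[i:i + size]
            if len(unit) < size or set(unit) - set("ACGT") or not is_primitive(unit):
                continue
            # Count consecutive copies of the unit starting at position i.
            reps = 1
            while seq[i + reps * size:i + (reps + 1) * size] == unit:
                reps += 1
            if reps >= min_rep:
                length = reps * size
                hits.append((f"P{size}", unit, length, i + 1, i + length))
                i += length  # continue scanning after this SSR
                break
        else:
            i += 1  # no SSR starts here
    return hits


# Toy sequence: a 10-bp poly-A run and an (AT)5 run separated by spacers.
example = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GG"
hits = find_ssrs(example)
print(hits)  # [('P1', 'A', 10, 4, 13), ('P2', 'AT', 10, 17, 26)]
```

The greedy left-to-right scan reports non-overlapping loci only; dedicated SSR tools may also handle compound or imperfect repeats, which this sketch does not.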
Summary of codon usage and amino acid patterns of the Moricandia arvensis cp genome.
| Codon | Number | Amino acid | Ratio of codon | RSCU | No. | Ratio |
|---|---|---|---|---|---|---|
| GCA | 366 | Ala | 1.44% | 1.11 | 1319 | 5.18% |
| GCC | 200 | | 0.79% | 0.61 | | |
| GCG | 144 | | 0.57% | 0.44 | | |
| GCU | 609 | | 2.39% | 1.85 | | |
| AGA | 463 | Arg | 1.82% | 1.82 | 1528 | 6.00% |
| AGG | 153 | | 0.60% | 0.60 | | |
| CGA | 349 | | 1.37% | 1.37 | | |
| CGC | 104 | | 0.41% | 0.41 | | |
| CGG | 123 | | 0.48% | 0.48 | | |
| CGU | 336 | | 1.32% | 1.32 | | |
| AAC | 284 | Asn | 1.12% | 0.46 | 1245 | 4.89% |
| AAU | 961 | | 3.78% | 1.54 | | |
| GAC | 195 | Asp | 0.77% | 0.38 | 1015 | 3.99% |
| GAU | 820 | | 3.22% | 1.62 | | |
| UGC | 75 | Cys | 0.29% | 0.49 | 309 | 1.21% |
| UGU | 234 | | 0.92% | 1.51 | | |
| CAA | 707 | Gln | 2.78% | 1.55 | 911 | 3.58% |
| CAG | 204 | | 0.80% | 0.45 | | |
| GAA | 1029 | Glu | 4.04% | 1.51 | 1359 | 5.34% |
| GAG | 330 | | 1.30% | 0.49 | | |
| GGA | 706 | Gly | 2.77% | 1.66 | 1705 | 6.70% |
| GGC | 167 | | 0.66% | 0.39 | | |
| GGG | 276 | | 1.08% | 0.65 | | |
| GGU | 556 | | 2.18% | 1.30 | | |
| CAC | 147 | His | 0.58% | 0.50 | 593 | 2.33% |
| CAU | 446 | | 1.75% | 1.50 | | |
| AUA | 710 | Ile | 2.79% | 0.97 | 2206 | 8.67% |
| AUC | 398 | | 1.56% | 0.54 | | |
| AUU | 1098 | | 4.31% | 1.49 | | |
| CUA | 365 | Leu | 1.43% | 0.82 | 2662 | 10.46% |
| CUC | 162 | | 0.64% | 0.37 | | |
| CUG | 162 | | 0.64% | 0.37 | | |
| CUU | 558 | | 2.19% | 1.26 | | |
| UUA | 905 | | 3.56% | 2.04 | | |
| UUG | 510 | | 2.00% | 1.15 | | |
| AAA | 1107 | Lys | 4.35% | 1.53 | 1451 | 5.70% |
| AAG | 344 | | 1.35% | 0.47 | | |
| AUG | 555 | Met | 2.18% | 1.00 | 555 | 2.18% |
| UUC | 490 | Phe | 1.93% | 0.64 | 1521 | 5.98% |
| UUU | 1031 | | 4.05% | 1.36 | | |
| CCA | 285 | Pro | 1.12% | 1.12 | 1022 | 4.02% |
| CCC | 190 | | 0.75% | 0.74 | | |
| CCG | 136 | | 0.53% | 0.53 | | |
| CCU | 411 | | 1.62% | 1.61 | | |
| AGC | 121 | Ser | 0.48% | 0.37 | 1956 | 7.69% |
| AGU | 394 | | 1.55% | 1.21 | | |
| UCA | 393 | | 1.54% | 1.21 | | |
| UCC | 286 | | 1.12% | 0.88 | | |
| UCG | 192 | | 0.75% | 0.59 | | |
| UCU | 570 | | 2.24% | 1.75 | | |
| UAA | 41 | TER | 0.16% | 1.86 | 66 | 0.26% |
| UAG | 17 | | 0.07% | 0.77 | | |
| UGA | 8 | | 0.03% | 0.36 | | |
| ACA | 406 | Thr | 1.60% | 1.25 | 1300 | 5.11% |
| ACC | 228 | | 0.90% | 0.70 | | |
| ACG | 143 | | 0.56% | 0.44 | | |
| ACU | 523 | | 2.06% | 1.61 | | |
| UGG | 430 | Trp | 1.69% | 1.00 | 430 | 1.69% |
| UAC | 176 | Tyr | 0.69% | 0.37 | 951 | 3.74% |
| UAU | 775 | | 3.05% | 1.63 | | |
| GUA | 489 | Val | 1.92% | 1.46 | 1343 | 5.28% |
| GUC | 165 | | 0.73% | 0.49 | | |
| GUG | 190 | | 0.75% | 0.57 | | |
| GUU | 499 | | 1.96% | 1.49 | | |
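The RSCU (relative synonymous codon usage) column follows from the codon counts: each codon's observed count is divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below applies this standard definition to the Ala and Leu counts from the table; the function name `rscu` is illustrative.

```python
def rscu(counts):
    """RSCU for one synonymous codon family.

    counts maps each synonymous codon to its observed count. The expected
    count under equal usage is the family total divided by the family size.
    """
    expected = sum(counts.values()) / len(counts)
    return {codon: n / expected for codon, n in counts.items()}


# Codon counts for alanine and leucine, copied from the table above.
ala = {"GCA": 366, "GCC": 200, "GCG": 144, "GCU": 609}
leu = {"CUA": 365, "CUC": 162, "CUG": 162, "CUU": 558, "UUA": 905, "UUG": 510}

ala_rscu = {codon: round(value, 2) for codon, value in rscu(ala).items()}
leu_rscu = {codon: round(value, 2) for codon, value in rscu(leu).items()}

print(ala_rscu)  # {'GCA': 1.11, 'GCC': 0.61, 'GCG': 0.44, 'GCU': 1.85}
print(leu_rscu["UUA"])  # 2.04
```

Values above 1 mark codons used more often than expected under equal usage, which in this table are mostly codons ending in A or U.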
Fig 4. Phylogenetic analysis of 61 Brassicaceae species based on the shared common protein-coding sequences.
The evolutionary history was inferred using the Maximum Likelihood (ML) method based on the Tamura-Nei model. The bootstrap values are shown next to the nodes. The initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated by the ML approach, and then selecting the topology with the highest log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.