Zhenchao Zhang¹, Meiqi Tao¹, Xi Shan¹, Yongfei Pan¹, Chunqing Sun¹, Lixiao Song², Xuli Pei³, Zange Jing³, Zhongliang Dai¹.
Abstract
Broccoli (Brassica oleracea var. italica) is an important B. oleracea cultivar with high economic and agronomic value. However, comparative genome analyses are still needed to clarify variation among cultivars and phylogenetic relationships within the family Brassicaceae. Herein, the complete chloroplast (cp) genome of broccoli was sequenced on the Illumina platform to provide basic information for genetic studies and to establish phylogenetic relationships within Brassicaceae. The genome was 153,364 bp long and comprised a large single copy (LSC) region of 83,136 bp and a small single copy (SSC) region of 17,834 bp, separated by two inverted repeat (IR) regions of 26,197 bp each. The overall GC content was 36.36%, and the GC contents of the SSC, LSC, and IR regions were 29.1%, 34.15%, and 42.35%, respectively. The genome harbored 133 genes, including 88 protein-coding genes, 37 tRNA genes, and 8 rRNA genes, 17 of which were duplicated in the IRs. Leucine was the most abundant amino acid and cysteine the least abundant. Codon usage analyses revealed a bias toward A/T-ending codons. A total of 35 repeat sequences and 92 simple sequence repeats (SSRs) were detected, and the SC-IR boundary regions varied among the seven cp genomes compared. A phylogenetic analysis suggested that broccoli is closely related to Brassica oleracea var. italica MH388764.1, Brassica oleracea var. italica MH388765.1, and Brassica oleracea NC_0441167.1. Our results should be useful for species identification, population genetics analyses, and further biological research on broccoli.
Year: 2022 PMID: 35202392 PMCID: PMC8870505 DOI: 10.1371/journal.pone.0263310
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Physical map of the B. oleracea var. italica cp genome.
Summary of the cp genome of B. oleracea var. italica.
| Features | Numerical value | Features | Numerical value |
|---|---|---|---|
| Genome size (bp) | 153,364 | GC content in SSC region (%) | 29.1 |
| LSC length (bp) | 83,136 | Gene number | 133 |
| SSC length (bp) | 17,834 | Protein-coding gene number | 88 |
| IR length (bp) | 26,197 | tRNA gene number | 37 |
| AT content (%) | 63.64 | rRNA gene number | 8 |
| GC content (%) | 36.36 | Gene number in LSC regions | 85 |
| GC content in IR region (%) | 42.35 | Gene number in SSC regions | 14 |
| GC content in LSC region (%) | 34.15 | Gene number in IR regions | 34 |
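As a consistency check on the summary table, the sketch below lays the four regions end to end and recomputes the genome-wide GC content as the length-weighted mean of the regional values. It assumes the conventional LSC, IRb, SSC, IRa order (not stated explicitly in the table); that order places positions such as 88,070 (ycf2) in IRb and 119,318 (ndhA) in SSC, in line with the region labels of the repeat table below. Because the regional GC values are themselves rounded, the result is only an approximation.

```python
# (region, length in bp, GC %) copied from the summary table.
# Assumption: regions follow the conventional LSC-IRb-SSC-IRa order.
REGIONS = [
    ("LSC", 83_136, 34.15),
    ("IRb", 26_197, 42.35),
    ("SSC", 17_834, 29.10),
    ("IRa", 26_197, 42.35),
]


def region_coordinates(regions):
    """Return 1-based, inclusive (start, end) coordinates for consecutive regions."""
    coords, start = {}, 1
    for name, length, _gc in regions:
        coords[name] = (start, start + length - 1)
        start += length
    return coords


def length_weighted_gc(regions):
    """Genome-wide GC % as the length-weighted mean of the regional GC %."""
    total = sum(length for _, length, _ in regions)
    return sum(length * gc for _, length, gc in regions) / total


coords = region_coordinates(REGIONS)
print(coords)
print(round(length_weighted_gc(REGIONS), 2))
```

The last region ends at 153,364 bp, matching the genome size, and the weighted GC content rounds to the tabulated 36.36%.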
Gene content of the cp genome of B. oleracea var. italica.
| Category | Group of genes | Gene names | Number |
|---|---|---|---|
|  | Ribosomal RNA genes |  | 8 |
|  | Transfer RNA genes |  | 37 |
|  | Small subunit ribosomal proteins (SSU) |  | 15 |
|  | Large subunit ribosomal proteins (LSU) |  | 11 |
|  | RNA polymerase |  | 4 |
|  | Photosystem I |  | 5 |
|  | Photosystem II |  | 15 |
|  | NADH dehydrogenase |  | 12 |
|  | Cytochrome b/f complex |  | 6 |
|  | ATP synthase |  | 6 |
|  | Large subunit of rubisco |  | 1 |
|  | Subunit of acetyl-CoA |  | 1 |
|  | Envelope membrane protein |  | 1 |
|  | Maturase |  | 1 |
|  | Protease |  | 1 |
|  | C-type cytochrome synthesis gene |  | 1 |
|  | Conserved hypothetical chloroplast ORF |  | 8 |
Note:
a, b: genes with two and three copies, respectively.
*, **: genes with one and two introns, respectively.
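The group counts in the gene-content table can be cross-checked against the totals in the summary table. The sketch below sums the groups, subtracts the RNA genes to get the protein-coding count, and halves the IR gene number on the assumption that it counts both IR copies (consistent with the 17 IR duplicates reported in the abstract).

```python
# Counts per functional group, copied from the gene-content table.
GENE_GROUPS = {
    "Ribosomal RNA genes": 8,
    "Transfer RNA genes": 37,
    "Small subunit ribosomal proteins (SSU)": 15,
    "Large subunit ribosomal proteins (LSU)": 11,
    "RNA polymerase": 4,
    "Photosystem I": 5,
    "Photosystem II": 15,
    "NADH dehydrogenase": 12,
    "Cytochrome b/f complex": 6,
    "ATP synthase": 6,
    "Large subunit of rubisco": 1,
    "Subunit of acetyl-CoA": 1,
    "Envelope membrane protein": 1,
    "Maturase": 1,
    "Protease": 1,
    "C-type cytochrome synthesis gene": 1,
    "Conserved hypothetical chloroplast ORF": 8,
}
RNA_GROUPS = ("Ribosomal RNA genes", "Transfer RNA genes")

# Gene numbers per region, copied from the summary table.
REGION_GENES = {"LSC": 85, "SSC": 14, "IR": 34}

total_genes = sum(GENE_GROUPS.values())
protein_coding = total_genes - sum(GENE_GROUPS[g] for g in RNA_GROUPS)
# Assumption: the IR figure counts genes in both IRa and IRb.
ir_duplicates = REGION_GENES["IR"] // 2

print(total_genes, protein_coding, sum(REGION_GENES.values()), ir_duplicates)
```

Both tables agree on 133 genes, of which 88 are protein-coding, and the IR count corresponds to 17 duplicated genes.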
Lengths of exons and introns of genes in the B. oleracea var. italica cp genome.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
|  | LSC | 37 | 2557 | 35 |  |  |
|  | LSC | 46 | 859 | 221 |  |  |
|  | LSC | 18 | 716 | 54 |  |  |
|  | LSC | 145 | 721 | 410 |  |  |
|  | LSC | 432 | 778 | 1611 |  |  |
|  | LSC | 126 | 782 | 228 | 732 | 153 |
|  | LSC | 37 | 311 | 50 |  |  |
|  | LSC | 39 | 601 | 35 |  |  |
|  | LSC | 68 | 570 | 295 | 938 | 228 |
|  | LSC | 6 | 784 | 639 |  |  |
|  | LSC | 8 | 733 | 475 |  |  |
|  | LSC | 9 | 1058 | 405 |  |  |
|  | IRb | 393 | 684 | 435 |  |  |
|  | IRb | 777 | 679 | 762 |  |  |
|  | IRb | 231 | - | 27 | 539 | 102 |
|  | IRb | 37 | 809 | 35 |  |  |
|  | IRb | 38 | 800 | 35 |  |  |
|  | SSC | 552 | 1098 | 531 |  |  |
|  | IRa | 38 | 800 | 35 |  |  |
|  | IRa | 37 | 809 | 35 |  |  |
|  | IRa | 777 | 679 | 762 |  |  |
|  | IRa | 393 | 684 | 435 |  |  |
|  | IRa | 102 | - | 231 | 539 | 27 |
Differences in annotated genes between the newly generated genome (MN649876.1) and other Brassica oleracea genomes.
| Position |  |  |  |  |  |
|---|---|---|---|---|---|
| 1656..1692,4248..4285 |  |  | - | - |
| 8335..8368,9081..9123 | - | - | - | - |
| 35478..35549 |  |  | - | - |
| 46097..46131,46445..46494 |  |  | - | - |
| 49885..49920,50520..50559 |  |  | - | - |
| 85288..85362 |  |  | - | - |
| 93224..93306 |  |  | - | - |
| 113194..113275 |  |  | - | - |
| 133052..133088,133887..133924 |  |  | - | - |
| 133989..134029,134837..134869 |  |  | - | - |
| 136816..136887 |  |  | - | - |
| 143195..143277 |  |  | - | - |
| 153253..153363 | - | - | - | - |
Differences in genome size and genome divergence (SNPs and indels) between the newly generated genome (MN649876.1) and other Brassica oleracea genomes.
| Type | No. | Start | End |  |  |  |  |  | Gene |
|---|---|---|---|---|---|---|---|---|---|
| Genome size (bp) |  | - | - | 153364 | 153365 | 153364 | 153364 | 153364 | - |
| Indel | 1 | 15623 | 15624 | GT | - | - | - | - |  |
|  |  | 15623 | 15623 | - | - | - | - | G |  |
|  | 2 | 26564 | 26564 | A | - | - | - | - | - |
|  |  | 26563 | 26564 | - | - | - | - | AT |  |
|  | 3 | 124803 | 124803 | T | - | - | - | - |  |
|  |  | 124803 | 124804 | - | TT | - | - | - |  |
| SNP | 1 | 70595 | 70595 | A | - | - | T | - | - |
|  | 2 | 79477 | 79477 | T | - | - | A | - | - |
|  | 3 | 264 | 264 | A | - | - | - | T | - |
|  | 4 | 265 | 265 | A | - | - | - | G | - |
|  | 5 | 266 | 266 | C | - | - | - | T | - |
|  | 6 | 267 | 267 | A | - | - | - | T | - |
|  | 7 | 3778 | 3778 | T | - | - | - | A | - |
|  | 8 | 7351 | 7351 | T | - | - | - | A | - |
|  | 9 | 70595 | 70595 | A | - | - | - | T | - |
|  | 10 | 75696 | 75696 | C | - | - | - | G | - |
|  | 11 | 79477 | 79477 | T | - | - | - | A | - |
|  | 12 | 70595 | 70595 | A | - | T | - | - | - |
|  | 13 | 79477 | 79477 | T | - | A | - | - | - |
|  | 14 | 124794 | 124794 | T | - | A | - | - |  |
|  | 15 | 124802 | 124802 | A | - | T | - | - |  |
|  | 16 | 124970 | 124970 | A | - | G | - | - |  |
|  | 17 | 124971 | 124971 | G | - | A | - | - |  |
|  | 18 | 124972 | 124972 | A | - | T | - | - |  |
|  | 19 | 124977 | 124977 | C | - | T | - | - |  |
|  | 20 | 124979 | 124979 | A | - | G | - | - |  |
|  | 21 | 124985 | 124985 | T | - | A | - | - |  |
|  | 22 | 124986 | 124986 | C | - | T | - | - |  |
|  | 23 | 124987 | 124987 | G | - | T | - | - |  |
|  | 24 | 36 | 36 | A | G | - | - | - | - |
|  | 25 | 56 | 56 | A | G | - | - | - | - |
|  | 26 | 7351 | 7351 | T | A | - | - | - | - |
|  | 27 | 70595 | 70595 | A | T | - | - | - | - |
|  | 28 | 79477 | 79477 | T | A | - | - | - | - |
Note: Ref and Alt denote the reference and alternative alleles, respectively.
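The genome-size row of this table follows from the indel rows: SNPs do not change length, so each genome's length is the reference length plus the net allele-length change of its indels. In the sketch below the first allele column is treated as the reference (its 153,364 bp matches the new assembly), and the other four columns get the hypothetical labels genome_2 to genome_5 in table order, because their accession headers did not survive extraction.

```python
# Hypothetical labels: genome_1 is the first allele column (assumed reference);
# genome_2..genome_5 are the remaining columns in table order.
REF_SIZE = 153_364
COMPARED = ["genome_2", "genome_3", "genome_4", "genome_5"]

# (genome, reference allele, alternative allele) for each indel in the table.
INDELS = [
    ("genome_5", "GT", "G"),  # indel 1, near position 15623
    ("genome_5", "A", "AT"),  # indel 2, near position 26564
    ("genome_2", "T", "TT"),  # indel 3, near position 124803
]


def predicted_sizes(ref_size, indels, genomes):
    """Reference length plus the net length change of each genome's indels."""
    sizes = {g: ref_size for g in genomes}
    for genome, ref_allele, alt_allele in indels:
        sizes[genome] += len(alt_allele) - len(ref_allele)
    return sizes


print(predicted_sizes(REF_SIZE, INDELS, COMPARED))
```

Only the second column gains a base (153,365 bp); in the last column the one-base deletion at indel 1 and the one-base insertion at indel 2 cancel out, which reproduces the genome-size row.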
Codon usage in the B. oleracea var. italica cp genome.
| Amino Acid | Codon | Number | RSCU | tRNA | Amino Acid | Codon | Number | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Ter | UAA | 52 | 1.7931 |  | Met | GUG | 1 | 0.0051 |  |
| Ter | UAG | 22 | 0.7587 |  | Met | UUG | 1 | 0.0051 |  |
| Ter | UGA | 13 | 0.4482 |  | Asn | AAC | 304 | 0.4628 |  |
| Ala | GCA | 382 | 1.116 |  | Asn | AAU | 1010 | 1.5372 |  |
| Ala | GCC | 206 | 0.602 |  | Pro | CCA | 310 | 1.1632 |  |
| Ala | GCG | 146 | 0.4264 |  | Pro | CCC | 195 | 0.7316 |  |
| Ala | GCU | 635 | 1.8552 |  | Pro | CCG | 141 | 0.5292 |  |
| Cys | UGC | 80 | 0.48 |  | Pro | CCU | 420 | 1.576 |  |
| Cys | UGU | 247 | 1.52 |  | Gln | CAA | 740 | 1.5514 |  |
| Asp | GAC | 202 | 0.3854 |  | Gln | CAG | 214 | 0.4486 |  |
| Asp | GAU | 846 | 1.6146 |  | Arg | AGA | 472 | 1.7982 |  |
| Glu | GAA | 1061 | 1.5212 |  | Arg | AGG | 161 | 0.6132 |  |
| Glu | GAG | 334 | 0.4788 |  | Arg | CGA | 357 | 1.3602 |  |
| Phe | UUC | 518 | 0.64 |  | Arg | CGC | 109 | 0.4152 |  |
| Phe | UUU | 1101 | 1.36 |  | Arg | CGG | 125 | 0.4764 |  |
| Gly | GGA | 733 | 1.6592 |  | Arg | CGU | 351 | 1.3374 |  |
| Gly | GGC | 168 | 0.3804 |  | Ser | AGC | 125 | 0.3654 |  |
| Gly | GGG | 291 | 0.6588 |  | Ser | AGU | 413 | 1.2066 |  |
| Gly | GGU | 575 | 1.3016 |  | Ser | UCA | 420 | 1.227 |  |
| His | CAC | 149 | 0.491 |  | Ser | UCC | 293 | 0.8556 |  |
| His | CAU | 458 | 1.509 |  | Ser | UCG | 199 | 0.5814 |  |
| Ile | AUA | 726 | 0.9471 |  | Ser | UCU | 604 | 1.7646 |  |
| Ile | AUC | 432 | 0.5634 |  | Thr | ACA | 429 | 1.2424 |  |
| Ile | AUU | 1142 | 1.4895 |  | Thr | ACC | 247 | 0.7156 |  |
| Lys | AAA | 1167 | 1.5356 |  | Thr | ACG | 147 | 0.4256 |  |
| Lys | AAG | 353 | 0.4644 |  | Thr | ACU | 558 | 1.6164 |  |
| Leu | CUA | 395 | 0.8376 |  | Val | GUA | 512 | 1.434 |  |
| Leu | CUC | 189 | 0.4008 |  | Val | GUC | 182 | 0.51 |  |
| Leu | CUG | 173 | 0.3672 |  | Val | GUG | 201 | 0.5632 |  |
| Leu | CUU | 587 | 1.245 |  | Val | GUU | 533 | 1.4928 |  |
| Leu | UUA | 955 | 2.0256 |  | Trp | UGG | 452 | 1 |  |
| Leu | UUG | 530 | 1.1238 |  | Tyr | UAC | 188 | 0.381 |  |
| Met | AUG | 602 | 2.9901 |  | Tyr | UAU | 799 | 1.619 |  |
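The RSCU column is the observed count of a codon divided by the mean count over all synonymous codons of its family, so a value above 1 marks a codon used more often than expected under uniform synonymous usage. The sketch below applies that standard definition to three families copied from the table; it reproduces the tabulated values to about three decimal places (small differences in the fourth decimal may come from rounding in the source).

```python
# Codon counts for three families, copied from the codon-usage table.
CODON_COUNTS = {
    "Leu": {"UUA": 955, "UUG": 530, "CUU": 587, "CUC": 189, "CUA": 395, "CUG": 173},
    "Phe": {"UUU": 1101, "UUC": 518},
    "Ter": {"UAA": 52, "UAG": 22, "UGA": 13},
}


def rscu(counts: dict[str, int]) -> dict[str, float]:
    """RSCU = observed count / mean count over the synonymous codons of the family."""
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


for aa, counts in CODON_COUNTS.items():
    values = rscu(counts)
    preferred = sorted(c for c, v in values.items() if v > 1)
    print(aa, {c: round(v, 4) for c, v in values.items()}, "RSCU > 1:", preferred)
```

By construction the RSCU values of a family sum to its number of codons (6 for Leu), and in these three families most codons with RSCU > 1 end in A or U.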
Fig 2. Codon content of the 20 amino acids and stop codons in all protein-coding genes of the broccoli cp genome.
Repeat sequences in the broccoli chloroplast genome.
| ID | Repeat start 1 | Type | Size (bp) | Repeat start 2 | Mismatch (bp) | E-value | Gene | Region |
|---|---|---|---|---|---|---|---|---|
| 1 | 61539 | F | 47 | 61583 | 0 | 3.34E-19 | IGS | LSC;LSC |
| 2 | 37725 | F | 46 | 39949 | -3 | 5.48E-13 | psaB;psaA | LSC;LSC |
| 3 | 75674 | P | 45 | 75674 | -1 | 7.21E-16 | petD;petD | LSC;LSC |
| 4 | 37704 | F | 43 | 39928 | -3 | 2.85E-11 | psaB;psaA | LSC;LSC |
| 5 | 28145 | P | 40 | 28145 | 0 | 5.47E-15 | IGS | LSC;LSC |
| 6 | 73171 | P | 40 | 73175 | -3 | 1.46E-09 | IGS | LSC;LSC |
| 7 | 97778 | F | 37 | 119318 | -3 | 7.35E-08 | IGS;ndhA | IRb;SSC |
| 8 | 119318 | P | 37 | 138687 | -3 | 7.35E-08 | ndhA;IGS | SSC;IRa |
| 9 | 9182 | P | 36 | 9182 | 0 | 1.40E-12 | IGS | LSC;LSC |
| 10 | 172 | P | 36 | 172 | -2 | 7.94E-09 | IGS | LSC;LSC |
| 11 | 106664 | F | 34 | 106696 | -2 | 1.13E-07 | IGS | IRb;IRb |
| 12 | 106664 | P | 34 | 129772 | -2 | 1.13E-07 | IGS | IRb;IRa |
| 13 | 106696 | P | 34 | 129804 | -2 | 1.13E-07 | IGS | IRb;IRa |
| 14 | 129772 | F | 34 | 129804 | -2 | 1.13E-07 | IGS | IRa;IRa |
| 15 | 6223 | P | 32 | 6223 | 0 | 3.59E-10 | IGS | LSC;LSC |
| 16 | 88070 | F | 32 | 88091 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRb |
| 17 | 88070 | P | 32 | 148379 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRa |
| 18 | 88091 | P | 32 | 148400 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRa |
| 19 | 148379 | F | 32 | 148400 | -3 | 4.80E-05 | ycf2;ycf2 | IRa;IRa |
| 20 | 7603 | F | 31 | 34397 | -3 | 1.74E-04 | trnS-GCU;trnS-UGA | LSC;LSC |
| 21 | 61473 | P | 30 | 61473 | 0 | 5.74E-09 | IGS | LSC;LSC |
| 22 | 7604 | P | 30 | 43913 | -1 | 5.16E-07 | trnS-GCU;trnS-GGA | LSC;LSC |
| 23 | 42810 | F | 30 | 97787 | -2 | 2.25E-05 | ycf3;IGS | LSC;IRb |
| 24 | 42810 | P | 30 | 138685 | -2 | 2.25E-05 | ycf3;IGS | LSC;IRa |
| 25 | 64605 | P | 30 | 64605 | -2 | 2.25E-05 | IGS | LSC;LSC |
| 26 | 122596 | P | 30 | 123176 | -2 | 2.25E-05 | IGS;ycf1 | SSC;SSC |
| 27 | 124278 | P | 30 | 124278 | -2 | 2.25E-05 | ycf1;ycf1 | SSC;SSC |
| 28 | 3753 | F | 30 | 120284 | -3 | 6.29E-04 | trnK-UUU;ndhA | LSC;SSC |
| 29 | 34398 | P | 30 | 43913 | -3 | 6.29E-04 | trnS-UGA;trnS-GGA | LSC;LSC |
| 30 | 34466 | P | 30 | 43851 | -3 | 6.29E-04 | trnS-UGA;trnS-GGA | LSC;LSC |
| 31 | 65897 | P | 30 | 65948 | -3 | 6.29E-04 | IGS | LSC;LSC |
| 32 | 124225 | F | 30 | 124252 | -3 | 6.29E-04 | ycf1;ycf1 | SSC;SSC |
| 33 | 173 | R | 30 | 34492 | -3 | 6.29E-04 | IGS | LSC;LSC |
| 34 | 185 | R | 30 | 112594 | -3 | 6.29E-04 | IGS | LSC;SSC |
| 35 | 34491 | R | 30 | 174 | -3 | 6.29E-04 | trnS-UGA;IGS | LSC;LSC |
Note: IRa and IRb represent the pair of inverted repeats; SSC and LSC represent the small and large single copy regions, respectively. F, P, and R denote forward, palindromic, and reverse repeats, respectively.
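To make the F and P repeat types concrete, the sketch below indexes every k-mer of a sequence and reports exact forward repeats (the same k-mer at two starts) and palindromic repeats (a k-mer whose reverse complement also occurs on the same strand). It is a naive illustration written for this note, not the tool used in the study: unlike REPuter-style searches it allows no mismatches, does not extend hits to maximal length, and omits reverse and complement repeats.

```python
from collections import defaultdict

# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def exact_repeats(seq: str, k: int) -> list[tuple[str, int, int]]:
    """Return (type, start1, start2) for exact F and P repeats of length k (1-based)."""
    seq = seq.upper()
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    hits = []
    for kmer, starts in index.items():
        # Forward repeat: the same k-mer occurs at two different starts.
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                hits.append(("F", i + 1, j + 1))
        # Palindromic repeat: the reverse complement occurs on the same strand.
        # .get() avoids inserting new keys while iterating over the index.
        for i in starts:
            for j in index.get(reverse_complement(kmer), []):
                if j >= i:  # keep each unordered pair once
                    hits.append(("P", i + 1, j + 1))
    return sorted(hits, key=lambda h: (h[1], h[2], h[0]))


print(exact_repeats("AACCGGTTTTCCGGTT", 6))
```

A k-mer that is its own reverse complement (here ACCGGT at position 2) is reported as a palindrome paired with itself, which is how rows such as ID 5 (start 28145 paired with 28145) arise in the table.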
Number of SSRs distributed in the SSC, LSC, and IR regions.
| Region | Exon | Intron | Intergenic | Number | Proportion |
|---|---|---|---|---|---|
| SSC | 13 | 4 | 5 | 22 | 23.90% |
| LSC | 6 | 12 | 40 | 58 | 63.00% |
| IR | 2 | 0 | 10 | 12 | 13.00% |
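The proportions in this table are each region's SSR count divided by the genome-wide total. The short calculation below, using the counts copied from the table, recovers the 92 SSRs reported in the abstract and the totals per genomic location.

```python
# SSR counts (exon, intron, intergenic) per region, copied from the table.
SSR_COUNTS = {"SSC": (13, 4, 5), "LSC": (6, 12, 40), "IR": (2, 0, 10)}

region_totals = {region: sum(c) for region, c in SSR_COUNTS.items()}
grand_total = sum(region_totals.values())
proportions = {region: 100 * n / grand_total for region, n in region_totals.items()}
# Column sums: transpose the per-region tuples and add them up.
location_totals = dict(zip(("exon", "intron", "intergenic"), map(sum, zip(*SSR_COUNTS.values()))))

print(grand_total, {r: round(p, 1) for r, p in proportions.items()}, location_totals)
```

Rounded to one decimal place the proportions are 23.9%, 63.0%, and 13.0%, matching the table.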
Distribution of SSRs in the broccoli cp genome.
| SSR type | Unit | Length (bp) | Number | Genomic position (gene) |
|---|---|---|---|---|
| P1 | A | 10 | 8 | 12446–12455, 12867–12876, 41568–41577, 50050–50059_(trnV-UAC), 66012–66021, 109314–109323_(ndhF), 122605–122614, 138192–138201 |
|  |  | 11 | 7 | 26937–26947, 60220–60230, 64103–64113, 82925–82935, 112599–112609, 119762–119772_(ndhA), 137413–137423 |
|  |  | 13 | 3 | 67123–67135, 124958–124970_(ycf1), 126037–126049_(ycf1) |
|  |  | 14 | 2 | 113669–113682_(ccsA), 140343–140356 |
|  |  | 16 | 1 | 30260–30275 |
|  | T | 10 | 18 | 15624–15633_(rpoC2), 16763–16772_(rpoC2), 25247–25256_(rpoB), 28408–28417, 29383–29392, 48803–48812, 53171–53180, 55633–55642, 64126–64135, 70288–70297_(clpP), 80677–80686_(rpl16), 81154–81163_(rpl16), 81550–81559_(rpl16), 81661–81670, 98300–98309, 123187–123196_(ycf1), 124444–124453_(ycf1), 127178–127187_(ycf1) |
|  |  | 11 | 11 | 3978–3988_(trnK-UUU), 6815–6825, 7777–7787, 12467–12477, 17512–17522_(rpoC2), 74110–74120_(petB), 99078–99088, 120304–120314_(ndhA), 120317–120327_(ndhA), 123208–123218_(ycf1), 126007–126017_(ycf1) |
|  |  | 12 | 4 | 4061–4072_(trnK-UUU), 63338–63349, 70265–70276_(clpP), 123096–123107_(ycf1) |
|  |  | 13 | 3 | 47324–47336, 77038–77050_(rpoA), 125310–125322_(ycf1) |
|  |  | 14 | 4 | 3769–3782_(trnK-UUU), 50869–50882, 96145–96158, 124990–125003_(ycf1) |
|  |  | 15 | 1 | 12160–12174_(atpF) |
|  |  | 16 | 2 | 7336–7351, 111781–111796 |
|  |  | 19 | 1 | 124803–124821_(ycf1) |
|  | C | 11 | 1 | 62109–62119 |
| P2 | AT | 10 | 4 | 7917–7926, 107448–107457, 129044–129053, 143015–143024 |
|  |  | 12 | 2 | 13319–13330, 112614–112625 |
|  |  | 14 | 3 | 3756–3769_(trnK-UUU), 30560–30573, 120287–120300_(ndhA) |
|  | TA | 10 | 8 | 4557–4566, 6234–6243, 7841–7850, 18884–18893_(rpoC2), 26480–26489, 61869–61878, 93476–93485, 122815–122824_(ycf1) |
|  |  | 18 | 1 | 111597–111614 |
| P3 | AAT | 12 | 1 | 12612–12623 |
|  | TTA | 12 | 1 | 26447–26458 |
|  | ATT | 12 | 1 | 45612–45623 |
| P4 | CAAA | 12 | 1 | 27991–28002 |
|  | TTCT | 12 | 1 | 34268–34279 |
|  | TAAA | 12 | 1 | 45436–45447 |
|  | TATC | 12 | 1 | 47120–47131 |
|  | ATAG | 12 | 1 | 111359–111370_(ndhF) |
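The shortest SSRs in the table (10 bp for mono- and dinucleotide motifs, 12 bp for tri- and tetranucleotide motifs) are consistent with minimum repeat numbers of 10, 5, 4, and 3 motif copies, the thresholds commonly used with MISA-style searches. The sketch below is a simple perfect-SSR scanner built on those thresholds; the threshold values are an assumption inferred from the table, not settings reported by the authors, and the scanner does not merge compound SSRs.

```python
# Assumed MISA-style minimum copy numbers per motif length (mono- to tetranucleotide).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3}


def is_compound(unit: str) -> bool:
    """True if the motif is itself a repeat of a shorter motif (e.g. 'AA', 'ATAT')."""
    n = len(unit)
    return any(n % d == 0 and unit == unit[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[str, int, int, int]]:
    """Return (motif, start, end, length) for perfect SSRs; positions are 1-based, inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            # Count how many times the motif repeats back to back.
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if copies >= min_n and not is_compound(unit):
                span = copies * unit_len
                hits.append((unit, i + 1, i + span, span))
                i += span  # resume scanning after this SSR
            else:
                i += 1
    return hits


demo = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GC" + "CAAA" * 3 + "G"
print(find_ssrs(demo))
```

As in the table, A and T runs (and AT and TA motifs) are reported separately, because the scanner records the motif exactly as it appears on the given strand.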
Fig 3. Statistical summary of repeat sequences in the cp genome of broccoli.
Fig 4. Comparison of the boundaries between the LSC, IR, and SSC regions in the chloroplast genomes of seven species.
Genes are depicted by colored boxes. Boxes above or below the main line indicate adjacent border genes.
Fig 5. Phylogenetic tree inferred by the maximum likelihood method based on the complete cp genomes of 56 species.
Bootstrap support values are shown at the nodes.