Characterization of the complete chloroplast genome of Brassica oleracea var. italica and phylogenetic relationships in Brassicaceae.

Zhenchao Zhang1, Meiqi Tao1, Xi Shan1, Yongfei Pan1, Chunqing Sun1, Lixiao Song2, Xuli Pei3, Zange Jing3, Zhongliang Dai1.   

Abstract

Broccoli (Brassica oleracea var. italica) is an important B. oleracea cultivar with high economic and agronomic value. However, comparative genome analyses are still needed to clarify variation among cultivars and phylogenetic relationships within the family Brassicaceae. Herein, the complete chloroplast (cp) genome of broccoli was generated on the Illumina sequencing platform to provide basic information for genetic studies and to establish phylogenetic relationships within Brassicaceae. The whole genome was 153,364 bp, including two inverted repeat (IR) regions of 26,197 bp each, separated by a small single copy (SSC) region of 17,834 bp and a large single copy (LSC) region of 83,136 bp. The total GC content of the chloroplast genome was 36%, while the GC contents of the SSC, LSC, and IR regions were 29.1%, 34.15%, and 42.35%, respectively. The genome harbored 133 genes, including 88 protein-coding genes, 37 tRNAs, and 8 rRNAs, with 17 duplicates in IRs. The most abundant amino acid was leucine and the least abundant was cysteine. Codon usage analyses revealed a bias for A/T-ending codons. A total of 35 repeat sequences and 92 simple sequence repeats were detected, and the SC-IR boundary regions were variable among the seven cp genomes. A phylogenetic analysis suggested that broccoli is closely related to Brassica oleracea var. italica MH388764.1, Brassica oleracea var. italica MH388765.1, and Brassica oleracea NC_041167.1. Our results are expected to be useful for further species identification, population genetics analyses, and biological research on broccoli.

Year:  2022        PMID: 35202392      PMCID: PMC8870505          DOI: 10.1371/journal.pone.0263310

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Broccoli is a nutrient-rich vegetable belonging to Brassica oleracea. It possesses a wide range of nutrients, including vitamins A and K, antioxidants, β-carotene, calcium, riboflavin, and iron [1], as well as phytochemicals, such as phenols, flavonoids, glucosinolates, minerals, and selenium. The consumption of broccoli is beneficial to human health [2], exerting anti-inflammatory, anti-obesity, cholesterol-lowering, and anti-carcinogenic effects as well as high antioxidant activity [3, 4]. Broccoli was introduced to China as a special vegetable and was initially cultivated on a small scale. Over the past few decades, broccoli has played an increasing role in the booming vegetable industry and has become an increasingly important source of income for farmers. Chloroplasts (cp) are crucial organelles in plant cells, serving as a metabolic center of cellular reactions [5]. They play critical roles in carbohydrate metabolism, photosynthesis, and various molecular processes as well as in the regulation of physiology, growth, development, and stress responses [6, 7]. The typical cp genome of angiosperms has a quadripartite structure with a small single-copy (SSC) region and a large single-copy (LSC) region separated by two inverted repeat (IR) regions [8]. The gene content and organization of cp genomes are highly conserved; however, IR expansions and contractions, gene loss, inversions, and rearrangements have been reported [9]. Owing to their high conservation and slow rates of evolution, cp genomes are invaluable for phylogenetic classification, DNA barcoding, and genetic engineering [10, 11]. Broccoli crops from the Brassica oleracea group likely originated in the Mediterranean basin and are linked to closely related species, e.g., Brassica cretica and Brassica montana [12]. The selection and domestication processes led to the spread and exchange of genetic materials with other Brassica oleracea cultivars. 
Intense trade and human migration among several continents promoted the spread of the crop worldwide since the 15th century, resulting in the development of new cultivars and hybrids, mainly in European and Asian countries. Adaptation to different soil and climatic conditions resulted from the cultivation and selection of genotypes with beneficial agronomical and qualitative traits [13]. Advances in high-throughput Illumina genome sequencing technologies have provided an opportunity to obtain and analyze the complete chloroplast genome of broccoli for analyses of its molecular and genomic characteristics. A sufficient knowledge of its genetic diversity is essential for the development of efficient strategies for its exploitation. Several complete cp genomes of Brassica oleracea are available in GenBank (Accession numbers KX681657.1, MH388765.1, MH388764.1, KX681654.1, KR233156.1, MG717288.1, MG717287.1, KX681655.1, and KX681656.1). In this study, we sequenced and assembled the complete cp genome of broccoli cultivar 2001B (B. oleracea var. italica) for analyses of phylogenetic relationships with B. oleracea cultivars and other members of Brassicaceae. In particular, we de novo sequenced and assembled the complete cp genome of broccoli using the Illumina HiSeq2500 platform, followed by gene annotation and structural analyses, the identification of simple sequence repeat (SSR) markers, and reconstruction of evolutionary relationships among species in Brassicaceae. These results will hopefully improve our understanding of the cp genome and provide a theoretical basis for future scientific research on broccoli.

Materials and methods

DNA extraction and sequencing

Broccoli was planted in the experimental field of Zhenjiang Institute of Agricultural Sciences in Jurong, Jiangsu Province, China (N31°58’, E119°9’). Fresh leaves were collected and wrapped with tin foil, frozen with liquid nitrogen, and immediately stored at -80°C. Total genomic DNA was extracted from approximately 5 g of leaves with Plant DNA Isolation Reagent (Takara, USA) following the manufacturer’s protocol. An Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and NanoDrop 2000 Microspectrophotometer were used to evaluate the quality and integrity of the extracted DNA. After purification, the DNA was employed to build a sequencing library according to the manufacturer’s instructions. The Illumina HiSeq2500 platform (San Diego, CA, USA) was utilized to construct paired-end (PE) libraries with insert sizes of 150 bp and sequenced according to standard protocols, including sample quality testing, library construction and quality testing, and library sequencing.

Cp genome assembly and annotation

High-quality clean reads were generated by trimming and filtering low-quality reads and sequencing adapters using Trimmomatic v. 0.3649. The clean reads were mapped onto the available cp genome reference of B. oleracea var. capitata (NCBI accession: KX681654.1) using Bowtie2 (version 2.2.5) [14] with default parameters and preset options. All cp-like reads were assembled into contigs using SPAdes 3.10.1 [15]. The contigs were then aligned against the Brassica oleracea var. capitata reference using the BLAST algorithm. The generated contigs and mate-pair reads were used for scaffolding with SSPACE (version 3.0) [16] to form a circular genome. The tRNAs, rRNAs, and protein-coding genes of the plastome were annotated using the CpGAVAS online tool (http://www.herbalgenomics.org/0506/cpgavas) [17] and then manually corrected. BLAST (v2.2.31) and DOGMA (http://dogma.ccbb.utexas.edu/) were used to check the annotation results [18]. The online tool tRNAscan-SE with default settings (http://lowelab.ucsc.edu/tRNAscan-SE/) was applied to analyze the tRNAs [19]. The physical circular cp genome map was generated using OrganellarGenomeDRAW (http://ogdraw.mpimp-golm.mpg.de/index.shtml) [20] with default settings and checked manually. Relative synonymous codon usage (RSCU) was evaluated using CodonW v1.4.4 [21]. Long repetitive sequences and SSRs were analyzed using REPuter (http://bibiserv.techfak.uni-bielefeld.de/reputer/) [22] and MISA v1.0 (http://pgrc.ipk-gatersleben.de/misa/misa.html) [23], respectively. The MISA parameters were 1–8 (mononucleotide motifs repeated 8 or more times), 2–5 (dinucleotide motifs repeated 5 or more times), 3–3 (trinucleotide motifs repeated 3 or more times), 4–3 (tetranucleotide motifs repeated 3 or more times), and 5–3 (pentanucleotide motifs repeated 3 or more times). Sequencing data and gene annotations of B. oleracea var. italica were submitted to the NCBI GenBank database (Accession Number: MN649876.1).
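
The unit-length/minimum-repeat thresholds listed above can be illustrated with a short sketch. This is not MISA itself, only a regular-expression approximation written for illustration; the function name `find_ssrs` and the dictionary `MIN_REPEATS` are hypothetical, and compound SSRs are not handled.

```python
import re

# Minimum repeat counts per motif length, as listed in the text
# (1-8, 2-5, 3-3, 4-3, 5-3). The names below are illustrative only.
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples, 1-based inclusive."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which are reported under the shorter unit.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            count = (m.end() - m.start()) // unit_len
            hits.append((m.start() + 1, m.end(), motif, count))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 10 + "GCGC" + "AT" * 6 + "CCG"
    for hit in find_ssrs(demo):
        print(hit)
```

Because the regular expression reports the leftmost match, a run such as TATATA... may be labelled with motif TA or AT depending on the flanking base; a dedicated tool should be preferred for real annotation work.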

Cp genome comparison

The newly generated genome (MN649876.1) was compared with the genomes of Brassica oleracea var. italica (Accession Number: MH388765.1), Brassica oleracea var. capitata (Accession Number: MG717287.1), Brassica oleracea var. botrytis (Accession Number: KX681655.1), and Brassica oleracea var. gongylodes (Accession Number: KX681656.1). In addition, the boundaries between the LSC, IR, and SSC regions of the chloroplast genome were compared with those of six other genomes (Arabidopsis thaliana, Capsella bursa-pastoris, Brassica napus, Brassica juncea, Brassica nigra, and Bunias orientalis) using mVISTA (http://genome.lbl.gov/vista/index.shtml) in the shuffle-LAGAN mode [24], with the annotation of B. oleracea var. capitata as a reference. The IRB-LSC, IRB-SSC, IRA-SSC, and IRA-LSC boundaries were compared among the seven species with the annotations of cp genomes available in GenBank.

Phylogenetic analysis

The phylogenetic trees were constructed by aligning total chloroplast protein-coding sequences from 31 species in Brassicaceae obtained from the GenBank database, using Carica papaya (NC_010323.1) as an outgroup. MAFFT version 7.017 [25] was used to generate sequence alignments. FastTree v. 2.1.10 [26] was employed to construct a phylogenetic tree by the maximum likelihood (ML) method with the GTRGAMMA model and 1000 bootstrap replicates to evaluate node support.

Results

Characteristics of the broccoli cp genome

The newly generated genome (MN649876.1) was a typical quadripartite circular molecule 153,364 bp in length, containing a pair of IR regions (IRA and IRB) of 26,197 bp each, separated by an SSC region of 17,834 bp and an LSC region of 83,136 bp (Fig 1 and Table 1). The cp genome had a biased base composition (31.36% A, 32.28% T, 17.86% G, and 18.50% C), with overall AT and GC contents of 63.64% and 36.36%, respectively. The GC contents of the IR, LSC, and SSC regions were 42.35%, 34.15%, and 29.1%, respectively.
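
As a quick consistency check on the figures above, the region lengths and base percentages can be added up directly. The snippet below is only an illustrative sketch; the variable names and the helper `gc_percent` are not part of the study's pipeline.

```python
# Region lengths (bp) and base composition (%) as reported in the text.
LSC, SSC, IR = 83_136, 17_834, 26_197
BASES = {"A": 31.36, "T": 32.28, "G": 17.86, "C": 18.50}

genome_size = LSC + SSC + 2 * IR  # the IR occurs twice (IRA and IRB)
at_content = round(BASES["A"] + BASES["T"], 2)
gc_content = round(BASES["G"] + BASES["C"], 2)


def gc_percent(seq):
    """GC content (%) of a nucleotide string, ignoring case."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(genome_size, at_content, gc_content)  # 153364 63.64 36.36
```

The same `gc_percent` helper could be applied to the LSC, SSC, and IR subsequences separately to reproduce the per-region values, assuming the region coordinates are known.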
Fig 1

Physical map of the B. oleracea var. italica cp genome.

Table 1

Summary of cp genome of B. oleracea var. italica.

Features | Numerical value | Features | Numerical value
Genome size (bp) | 153,364 | GC content in SSC region (%) | 29.1
LSC length (bp) | 83,136 | Gene number | 133
SSC length (bp) | 17,834 | Protein-coding gene number | 88
IR length (bp) | 26,197 | tRNA gene number | 37
AT content (%) | 63.64 | rRNA gene number | 8
GC content (%) | 36.36 | Gene number in LSC regions | 85
GC content in IR region (%) | 42.35 | Gene number in SSC regions | 14
GC content in LSC region (%) | 34.15 | Gene number in IR regions | 34

The genome harbored 133 genes, including 88 protein-coding genes (PCGs) (79 PCG species), 37 tRNA genes (30 tRNA species), and 8 rRNA genes (4 rRNA species) (Fig 1, Tables 1 and 2). Among these, 15 genes encoded small ribosomal subunit (SSU) proteins, 11 encoded large ribosomal subunit (LSU) proteins, and 4 encoded the DNA-directed RNA polymerase. Forty-five genes were associated with photosynthesis, including 5 encoding photosystem I, 15 encoding the photosystem II complex, 12 for subunits of NADH dehydrogenase, 6 for the cytochrome b/f complex, 6 for different subunits of ATP synthase, and one for the large subunit of rubisco. Five genes were associated with functions other than self-replication and photosynthesis, and eight genes had unknown functions. Thirty-four genes, including 14 tRNA genes, 2 rps7, 2 ndhB, 2 rpl2, 2 rpl23, 2 rrn5, 2 rrn4.5, 2 rrn23, 2 rrn16, 2 ycf2, and 2 ycf15, were duplicated in the IR regions. Most genes occurred as a single copy, whereas 18 gene species occurred in two copies, including 4 rRNA species (rrn4.5, rrn5, rrn16, and rrn23), 7 tRNA species (trnA-UGC, trnI-GAU, trnN-GUU, trnV-GAC, trnL-CAA, trnI-CAU, and trnR-ACG), and 7 PCG species (rps7, rpl2, rpl23, ndhB, ycf1, ycf2, and ycf15). In addition, one PCG species (rps12) occurred in three copies. Except for ycf1 and rps12, which resided within the LSC region, all other 17 duplicated gene species were completely located within the IR regions. Nine PCG species (rps16, rpl2, rpl16, rpoC1, ndhA, ndhB, petB, petD, and atpF) and five tRNA species (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contained a single intron, while three other PCG species (rps12, ycf3, and clpP) harbored two introns (Tables 2 and 3). The trnK-UUU gene had the largest intron (2557 bp), which contained the matK gene, followed by the ndhA gene (1098 bp), whereas trnL-UAA had the smallest intron (311 bp).
Table 2

Gene contents in the cp genome of B. oleracea var. italica.

Category | Group of genes | Gene names | Amount
Self-replication | Ribosomal RNA genes | rrn4.5^a (×2), rrn5^a (×2), rrn16^a (×2), rrn23^a (×2) | 8
Self-replication | Transfer RNA genes | trnA-UGC^a,* (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC*, trnH-GUG, trnI-CAU^a (×2), trnI-GAU^a,* (×2), trnK-UUU*, trnL-CAA^a (×2), trnL-UAA*, trnL-UAG, trnM-CAU, trnN-GUU^a (×2), trnP-UGG, trnQ-UUG, trnR-ACG^a (×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC^a (×2), trnV-UAC*, trnW-CCA, trnY-GUA, trnfM-CAU | 37
Self-replication | Small subunit ribosomal proteins (SSU) | rps2, rps3, rps4, rps7^a (×2), rps8, rps11, rps12^b,** (×3), rps14, rps15, rps16*, rps18, rps19 | 15
Self-replication | Large subunit ribosomal proteins (LSU) | rpl2^a,* (×2), rpl14, rpl16*, rpl20, rpl22, rpl23^a (×2), rpl32, rpl33, rpl36 | 11
Self-replication | RNA polymerase | rpoA, rpoB, rpoC1*, rpoC2 | 4
Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ | 5
Photosynthesis | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ | 15
Photosynthesis | NADH dehydrogenase | ndhA*, ndhB^a,* (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK | 12
Photosynthesis | Cytochrome b/f complex | petA, petB*, petD*, petG, petL, petN | 6
Photosynthesis | ATP synthase | atpA, atpB, atpE, atpF*, atpH, atpI | 6
Photosynthesis | Large subunit of rubisco | rbcL | 1
Other genes | Subunit of acetyl-CoA | accD | 1
Other genes | Envelope membrane protein | cemA | 1
Other genes | Maturase | matK | 1
Other genes | Protease | clpP** | 1
Other genes | C-type cytochrome synthesis gene | ccsA | 1
Genes of unknown function | Conserved hypothetical chloroplast ORF | ycf1^a (×2), ycf2^a (×2), ycf15^a (×2), ycf3**, ycf4 | 8

Note:

a, b: The letters indicate genes with two copies and three copies, respectively.

*, **: The symbols indicate genes with one intron and two introns, respectively.

Table 3

Lengths of introns and exons in genes in the B. oleracea var. italica cp genome.

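Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp)
trnK-UUU | LSC | 37 | 2557 | 35 | |
rps16 | LSC | 46 | 859 | 221 | |
trnG-UCC | LSC | 18 | 716 | 54 | |
atpF | LSC | 145 | 721 | 410 | |
rpoC1 | LSC | 432 | 778 | 1611 | |
ycf3 | LSC | 126 | 782 | 228 | 732 | 153
trnL-UAA | LSC | 37 | 311 | 50 | |
trnV-UAC | LSC | 39 | 601 | 35 | |
clpP | LSC | 68 | 570 | 295 | 938 | 228
petB | LSC | 6 | 784 | 639 | |
petD | LSC | 8 | 733 | 475 | |
rpl16 | LSC | 9 | 1058 | 405 | |
rpl2 | IRB | 393 | 684 | 435 | |
ndhB | IRB | 777 | 679 | 762 | |
rps12 | IRB | 231 | - | 27 | 539 | 102
trnI-GAU | IRB | 37 | 809 | 35 | |
trnA-UGC | IRB | 38 | 800 | 35 | |
ndhA | SSC | 552 | 1098 | 531 | |
trnA-UGC | IRA | 38 | 800 | 35 | |
trnI-GAU | IRA | 37 | 809 | 35 | |
ndhB | IRA | 777 | 679 | 762 | |
rpl2 | IRA | 393 | 684 | 435 | |
rps12 | IRA | 102 | - | 231 | 539 | 27
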
For comparative analyses, the new genome was compared with other genomes from GenBank (Tables 4 and 5). The genome of B. oleracea var. italica (MH388765.1) was 153,365 bp, whereas all other genomes were 153,364 bp (Table 5). The tRNA genes detected in MN649876.1 and MH388765.1 were identical. In addition, a pseudogene, rps19, in the MH388765.1 cp genome was not detected in the other genomes. KX681655.1 and KX681656.1 lacked 11 genes detected in MN649876.1 (Table 4). Compared with the reference genome, the B. oleracea var. gongylodes (KX681656.1) genome contained two indels, one of which involved the rpoC2 gene, and nine SNPs; the B. oleracea var. italica (MH388765.1) genome contained one indel, which involved the ycf1 gene, and five SNPs; the B. oleracea var. capitata (MG717287.1) genome included 12 SNPs, 10 of which involved the ycf1 gene; and the B. oleracea var. botrytis (KX681655.1) genome included 2 SNPs (Table 5).
Table 4

Differences in annotated genes between the newly generated genome (MN649876.1) and other Brassica oleracea genomes.

Position | B. oleracea var. italica MN649876.1 | B. oleracea var. italica MH388765.1 | B. oleracea var. capitata MG717287.1 | B. oleracea var. botrytis KX681655.1 | B. oleracea var. gongylodes KX681656.1
1656..1692,4248..4285 | trnK-UUU | trnK-UUU | trnK-UUU | - | -
8335..8368,9081..9123 | - | - | trnG | - | -
35478..35549 | trnG-UCC | trnG-UCC | trnG | - | -
46097..46131,46445..46494 | trnL-UAA | trnL-UAA | trnL | - | -
49885..49920,50520..50559 | trnV-UAC | trnV-UAC | trnV | - | -
85288..85362 | trnI-CAU | trnI-CAU | trnI | - | -
93224..93306 | trnL-CAA | trnL-CAA | trnL | - | -
113194..113275 | trnL-UAG | trnL-UAG | trnL | - | -
133052..133088,133887..133924 | trnA-UGC | trnA-UGC | trnA | - | -
133989..134029,134837..134869 | trnI-GAU | trnI-GAU | trnI-GAU | - | -
136816..136887 | trnV-GAC | trnV-GAC | trnV-GAC | - | -
143195..143277 | trnL-CAA | trnL-CAA | trnL-CAA | - | -
153253..153363 | - | rps19 (pseudo) | - | - | -

Table 5

Differences in genome size and genome divergence (SNPs and Indels) between the newly generated genome (MN649876.1) and other Brassica oleracea genomes.

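Sort | No. | Start | End | B. oleracea var. italica MN649876.1 (Ref) | B. oleracea var. italica MH388765.1 (Alt) | B. oleracea var. capitata MG717287.1 (Alt) | B. oleracea var. botrytis KX681655.1 (Alt) | B. oleracea var. gongylodes KX681656.1 (Alt) | Gene
Genome size (bp) | | - | - | 153364 | 153365 | 153364 | 153364 | 153364 | -
Indel | 1 | 15623 | 15624 | GT | - | - | - | - | rpoC2
Indel | | 15623 | 15623 | - | - | - | - | G |
Indel | 2 | 26564 | 26564 | A | - | - | - | - | -
Indel | | 26563 | 26564 | - | - | - | - | AT |
Indel | 3 | 124803 | 124803 | T | - | - | - | - | ycf1
Indel | | 124803 | 124804 | - | TT | - | - | - |
SNP | 1 | 70595 | 70595 | A | - | - | T | - | -
SNP | 2 | 79477 | 79477 | T | - | - | A | - | -
SNP | 3 | 264 | 264 | A | - | - | - | T | -
SNP | 4 | 265 | 265 | A | - | - | - | G | -
SNP | 5 | 266 | 266 | C | - | - | - | T | -
SNP | 6 | 267 | 267 | A | - | - | - | T | -
SNP | 7 | 3778 | 3778 | T | - | - | - | A | -
SNP | 8 | 7351 | 7351 | T | - | - | - | A | -
SNP | 9 | 70595 | 70595 | A | - | - | - | T | -
SNP | 10 | 75696 | 75696 | C | - | - | - | G | -
SNP | 11 | 79477 | 79477 | T | - | - | - | A | -
SNP | 12 | 70595 | 70595 | A | - | T | - | - | -
SNP | 13 | 79477 | 79477 | T | - | A | - | - | -
SNP | 14 | 124794 | 124794 | T | - | A | - | - | ycf1
SNP | 15 | 124802 | 124802 | A | - | T | - | - | ycf1
SNP | 16 | 124970 | 124970 | A | - | G | - | - | ycf1
SNP | 17 | 124971 | 124971 | G | - | A | - | - | ycf1
SNP | 18 | 124972 | 124972 | A | - | T | - | - | ycf1
SNP | 19 | 124977 | 124977 | C | - | T | - | - | ycf1
SNP | 20 | 124979 | 124979 | A | - | G | - | - | ycf1
SNP | 21 | 124985 | 124985 | T | - | A | - | - | ycf1
SNP | 22 | 124986 | 124986 | C | - | T | - | - | ycf1
SNP | 23 | 124987 | 124987 | G | - | T | - | - | ycf1
SNP | 24 | 36 | 36 | A | G | - | - | - | -
SNP | 25 | 56 | 56 | A | G | - | - | - | -
SNP | 26 | 7351 | 7351 | T | A | - | - | - | -
SNP | 27 | 70595 | 70595 | A | T | - | - | - | -
SNP | 28 | 79477 | 79477 | T | A | - | - | - | -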

Note: Ref and Alt denote the reference and alternative alleles, respectively.
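
The text does not state which tool was used to call the SNPs and indels in Table 5. As a hedged illustration of the underlying idea only, the sketch below counts substitutions and indel events between two sequences that have already been aligned (for example, a pairwise alignment with gaps written as `-`). The function name `compare_aligned` is hypothetical, and the reported positions are alignment columns, not reference coordinates.

```python
def compare_aligned(ref, alt):
    """Count SNPs and indel events between two pre-aligned sequences.

    Both strings must have equal length and use '-' for alignment gaps.
    Consecutive gap columns on the same side count as one indel event.
    """
    if len(ref) != len(alt):
        raise ValueError("aligned sequences must have equal length")
    snps, indels = [], 0
    prev_gap = None  # side that was gapped in the previous column, if any
    for col, (r, a) in enumerate(zip(ref.upper(), alt.upper()), start=1):
        if r == "-" and a == "-":
            continue  # shared gap column carries no information
        if r == "-" or a == "-":
            gap = "ref" if r == "-" else "alt"
            if gap != prev_gap:
                indels += 1  # a new gap run starts here
        else:
            gap = None
            if r != a:
                snps.append((col, r, a))
        prev_gap = gap
    return snps, indels


if __name__ == "__main__":
    print(compare_aligned("ACGTTGA-CA", "ACCT-GATCA"))
```

Counting a run of adjacent gap columns as a single event mirrors how Table 5 lists each indel once regardless of its length.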

Examination of codon usage frequency

According to the coding sequence (CDS), the relative synonymous codon usage frequency (RSCU) and codon usage frequency were estimated (Table 6, Fig 2). All protein-coding genes in the cp genome were composed of 26,681 codons. Among these codons, the termination codons were UAA, UAG, and UGA. AUG encoding methionine had the highest RSCU value (2.9901). The most abundant amino acid in the protein-coding genes was leucine (2829 codons, approximately 10.6% of the total), compared with only 325 codons (1.22%) for cysteine.
Table 6

Codon usage in the B. oleracea var. italica cp genome.

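Amino Acid | Codon | Number | RSCU | tRNA | Amino Acid | Codon | Number | RSCU | tRNA
Ter | UAA | 52 | 1.7931 | | Met | GUG | 1 | 0.0051 | trnM-CAU
Ter | UAG | 22 | 0.7587 | | Met | UUG | 1 | 0.0051 |
Ter | UGA | 13 | 0.4482 | | Asn | AAC | 304 | 0.4628 | trnN-GUU
Ala | GCA | 382 | 1.116 | trnA-UGC | Asn | AAU | 1010 | 1.5372 |
Ala | GCC | 206 | 0.602 | | Pro | CCA | 310 | 1.1632 | trnP-UGG
Ala | GCG | 146 | 0.4264 | | Pro | CCC | 195 | 0.7316 |
Ala | GCU | 635 | 1.8552 | | Pro | CCG | 141 | 0.5292 |
Cys | UGC | 80 | 0.48 | trnC-GCA | Pro | CCU | 420 | 1.576 |
Cys | UGU | 247 | 1.52 | | Gln | CAA | 740 | 1.5514 | trnQ-UUG
Asp | GAC | 202 | 0.3854 | trnD-GUC | Gln | CAG | 214 | 0.4486 |
Asp | GAU | 846 | 1.6146 | | Arg | AGA | 472 | 1.7982 | trnR-UCU
Glu | GAA | 1061 | 1.5212 | trnE-UUC | Arg | AGG | 161 | 0.6132 |
Glu | GAG | 334 | 0.4788 | | Arg | CGA | 357 | 1.3602 |
Phe | UUC | 518 | 0.64 | trnF-GAA | Arg | CGC | 109 | 0.4152 |
Phe | UUU | 1101 | 1.36 | | Arg | CGG | 125 | 0.4764 |
Gly | GGA | 733 | 1.6592 | trnG-UCC | Arg | CGU | 351 | 1.3374 | trnR-ACG
Gly | GGC | 168 | 0.3804 | trnG-GCC | Ser | AGC | 125 | 0.3654 | trnS-GCU
Gly | GGG | 291 | 0.6588 | | Ser | AGU | 413 | 1.2066 |
Gly | GGU | 575 | 1.3016 | | Ser | UCA | 420 | 1.227 | trnS-UGA
His | CAC | 149 | 0.491 | trnH-GUG | Ser | UCC | 293 | 0.8556 | trnS-GGA
His | CAU | 458 | 1.509 | | Ser | UCG | 199 | 0.5814 |
Ile | AUA | 726 | 0.9471 | trnI-CAU | Ser | UCU | 604 | 1.7646 |
Ile | AUC | 432 | 0.5634 | trnI-GAU | Thr | ACA | 429 | 1.2424 | trnT-UGU
Ile | AUU | 1142 | 1.4895 | | Thr | ACC | 247 | 0.7156 | trnT-GGU
Lys | AAA | 1167 | 1.5356 | trnK-UUU | Thr | ACG | 147 | 0.4256 |
Lys | AAG | 353 | 0.4644 | | Thr | ACU | 558 | 1.6164 |
Leu | CUA | 395 | 0.8376 | trnL-UAG | Val | GUA | 512 | 1.434 | trnV-UAC
Leu | CUC | 189 | 0.4008 | | Val | GUC | 182 | 0.51 | trnV-GAC
Leu | CUG | 173 | 0.3672 | | Val | GUG | 201 | 0.5632 |
Leu | CUU | 587 | 1.245 | | Val | GUU | 533 | 1.4928 |
Leu | UUA | 955 | 2.0256 | trnL-UAA | Trp | UGG | 452 | 1 | trnW-CCA
Leu | UUG | 530 | 1.1238 | trnL-CAA | Tyr | UAC | 188 | 0.381 | trnY-GUA
Met | AUG | 602 | 2.9901 | trnfM-CAU | Tyr | UAU | 799 | 1.619 |
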
Fig 2

Codon contents of the 20 amino acids and stop codons in all protein-coding genes of the broccoli cp genome.

The codon-anticodon recognition patterns of the cp genome showed that the 30 tRNA species recognized codons for all 20 amino acids required for protein biosynthesis. The AT contents at the first, second, and third codon positions were 55.3%, 62.53%, and 71.21%, respectively. Moreover, of all 66 codons, the RSCU values for 31 codons were >1, and most of these (29/31, 93.5%) ended with base A or U, whereas 34 codons had RSCU values of <1, and most of these (31/34, 91.2%) ended with base C or G. Trp was encoded only by the UGG codon, indicating no biased usage (RSCU = 1).
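
RSCU is conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The study used CodonW for this step; the sketch below only illustrates the standard formula, with a hypothetical helper `rscu` applied to the leucine counts from Table 6.

```python
def rscu(codon_counts):
    """Relative synonymous codon usage for one amino acid.

    codon_counts maps each synonymous codon to its observed count.
    RSCU = observed count / mean count across the synonymous codons.
    """
    mean = sum(codon_counts.values()) / len(codon_counts)
    return {codon: n / mean for codon, n in codon_counts.items()}


# Leucine codon counts taken from Table 6.
leu = {"CUA": 395, "CUC": 189, "CUG": 173, "CUU": 587, "UUA": 955, "UUG": 530}

if __name__ == "__main__":
    for codon, value in rscu(leu).items():
        print(codon, round(value, 4))
```

The values computed this way agree with the RSCU column of Table 6 to roughly three decimal places, and by construction they sum to the number of synonymous codons (six for leucine).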

Analyses of repeat sequences and SSRs

A total of 35 repeat sequences, including 12 forward (F), 20 palindromic (P), and 3 reverse (R) repeats, were detected in the broccoli cp genome using REPuter (Table 7). Repeat lengths ranged from 30 to 47 bp. The LSC, SSC, and IR regions harbored 22, 7, and 12 repeats, respectively. Most repeats were located in intergenic spacers (IGS), ycf genes, and intron sequences, whereas 13 repeats were located in psaA, psaB, trnS-GCU, trnS-GGA, and trnS-UGA.
Table 7

Repeat sequences in the broccoli chloroplast genome.

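ID | Repeat Start | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region
1 | 61539 | F | 47 | 61583 | 0 | 3.34E-19 | IGS | LSC;LSC
2 | 37725 | F | 46 | 39949 | -3 | 5.48E-13 | psaB;psaA | LSC;LSC
3 | 75674 | P | 45 | 75674 | -1 | 7.21E-16 | petD;petD | LSC;LSC
4 | 37704 | F | 43 | 39928 | -3 | 2.85E-11 | psaB;psaA | LSC;LSC
5 | 28145 | P | 40 | 28145 | 0 | 5.47E-15 | IGS | LSC;LSC
6 | 73171 | P | 40 | 73175 | -3 | 1.46E-09 | IGS | LSC;LSC
7 | 97778 | F | 37 | 119318 | -3 | 7.35E-08 | IGS;ndhA | IRb;SSC
8 | 119318 | P | 37 | 138687 | -3 | 7.35E-08 | ndhA;IGS | SSC;IRa
9 | 9182 | P | 36 | 9182 | 0 | 1.40E-12 | IGS | LSC;LSC
10 | 172 | P | 36 | 172 | -2 | 7.94E-09 | IGS | LSC;LSC
11 | 106664 | F | 34 | 106696 | -2 | 1.13E-07 | IGS | IRb;IRb
12 | 106664 | P | 34 | 129772 | -2 | 1.13E-07 | IGS | IRb;IRa
13 | 106696 | P | 34 | 129804 | -2 | 1.13E-07 | IGS | IRb;IRa
14 | 129772 | F | 34 | 129804 | -2 | 1.13E-07 | IGS | IRa;IRa
15 | 6223 | P | 32 | 6223 | 0 | 3.59E-10 | IGS | LSC;LSC
16 | 88070 | F | 32 | 88091 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRb
17 | 88070 | P | 32 | 148379 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRa
18 | 88091 | P | 32 | 148400 | -3 | 4.80E-05 | ycf2;ycf2 | IRb;IRa
19 | 148379 | F | 32 | 148400 | -3 | 4.80E-05 | ycf2;ycf2 | IRa;IRa
20 | 7603 | F | 31 | 34397 | -3 | 1.74E-04 | trnS-GCU;trnS-UGA | LSC;LSC
21 | 61473 | P | 30 | 61473 | 0 | 5.74E-09 | IGS | LSC;LSC
22 | 7604 | P | 30 | 43913 | -1 | 5.16E-07 | trnS-GCU;trnS-GGA | LSC;LSC
23 | 42810 | F | 30 | 97787 | -2 | 2.25E-05 | ycf3;IGS | LSC;IRb
24 | 42810 | P | 30 | 138685 | -2 | 2.25E-05 | ycf3;IGS | LSC;IRa
25 | 64605 | P | 30 | 64605 | -2 | 2.25E-05 | IGS | LSC;LSC
26 | 122596 | P | 30 | 123176 | -2 | 2.25E-05 | IGS;ycf1 | SSC;SSC
27 | 124278 | P | 30 | 124278 | -2 | 2.25E-05 | ycf1;ycf1 | SSC;SSC
28 | 3753 | F | 30 | 120284 | -3 | 6.29E-04 | trnK-UUU;ndhA | LSC;SSC
29 | 34398 | P | 30 | 43913 | -3 | 6.29E-04 | trnS-UGA;trnS-GGA | LSC;LSC
30 | 34466 | P | 30 | 43851 | -3 | 6.29E-04 | trnS-UGA;trnS-GGA | LSC;LSC
31 | 65897 | P | 30 | 65948 | -3 | 6.29E-04 | IGS | LSC;LSC
32 | 124225 | F | 30 | 124252 | -3 | 6.29E-04 | ycf1;ycf1 | SSC;SSC
33 | 173 | R | 30 | 34492 | -3 | 6.29E-04 | IGS | LSC;LSC
34 | 185 | R | 30 | 112594 | -3 | 6.29E-04 | IGS | LSC;SSC
35 | 34491 | R | 30 | 174 | -3 | 6.29E-04 | trnS-UGA;IGS | LSC;LSC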

Note: IRa and IRb represent a pair of inverted repeats. SSC and LSC represent the small single copy region and the large single copy region, respectively.

A total of 92 SSRs, including 66 mononucleotides (P1), 18 dinucleotides (P2), 3 trinucleotides (P3), and 5 tetranucleotides (P4), were identified. Most were distributed in the LSC (58, 63.00%) and SSC regions (22, 23.9%), with some in the IR regions (12, 13.00%) (Tables 8 and 9, Fig 3). Among the mononucleotide SSRs, one consisted of C repeat units and the others consisted of A or T repeat units (98.49%), while the dinucleotides comprised TA and AT repeats. Trinucleotides were the least prevalent, with the lowest number of SSRs (3). Moreover, 37 SSRs were found in different genes, and the remainder were found in intergenic regions.
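
Assigning a repeat or SSR coordinate to a region only requires the cumulative region boundaries. The sketch below assumes that coordinate 1 is the start of the LSC and that the regions follow the order LSC, IRb, SSC, IRa; this ordering is an assumption, although it is consistent with the region labels in Table 7. The names `REGIONS`, `region_of`, and `ssr_proportions` are illustrative.

```python
# Region lengths (bp) from Table 1. The order LSC -> IRb -> SSC -> IRa is an
# assumption that matches the region labels given in Table 7.
REGIONS = [("LSC", 83_136), ("IRb", 26_197), ("SSC", 17_834), ("IRa", 26_197)]


def region_of(position):
    """Return the region containing a 1-based genome coordinate."""
    start = 1
    for name, length in REGIONS:
        end = start + length - 1
        if start <= position <= end:
            return name
        start = end + 1
    raise ValueError(f"position {position} is outside the genome")


def ssr_proportions(counts):
    """Percentage of SSRs per region, rounded to two decimals."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 2) for region, n in counts.items()}


if __name__ == "__main__":
    print(region_of(97778), region_of(119318), region_of(138687))
    print(ssr_proportions({"LSC": 58, "SSC": 22, "IR": 12}))
```

Under these assumptions the boundaries fall at 83,136 (LSC/IRb), 109,333 (IRb/SSC), and 127,167 (SSC/IRa).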
Table 8

Number of SSRs distributed in the SSC, LSC, and IR regions.

Region | Exon | Intron | Intergenic | Number | Proportion
SSC | 13 | 4 | 5 | 22 | 23.90%
LSC | 6 | 12 | 40 | 58 | 63.00%
IR | 2 | 0 | 10 | 12 | 13.00%

Table 9

Distribution of SSRs in the broccoli cp genome.

SSR type | Unit | Length | Number | Genomic position (gene)
P1 | A | 10 | 8 | 12446–12455, 12867–12876, 41568–41577, 50050–50059 (trnV-UAC), 66012–66021, 109314–109323 (ndhF), 122605–122614, 138192–138201
P1 | A | 11 | 7 | 26937–26947, 60220–60230, 64103–64113, 82925–82935, 112599–112609, 119762–119772 (ndhA), 137413–137423
P1 | A | 13 | 3 | 67123–67135, 124958–124970 (ycf1), 126037–126049 (ycf1)
P1 | A | 14 | 2 | 113669–113682 (ccsA), 140343–140356
P1 | A | 16 | 1 | 30260–30275
P1 | T | 10 | 18 | 15624–15633 (rpoC2), 16763–16772 (rpoC2), 25247–25256 (rpoB), 28408–28417, 29383–29392, 48803–48812, 53171–53180, 55633–55642, 64126–64135, 70288–70297 (clpP), 80677–80686 (rpl16), 81154–81163 (rpl16), 81550–81559 (rpl16), 81661–81670, 98300–98309, 123187–123196 (ycf1), 124444–124453 (ycf1), 127178–127187 (ycf1)
P1 | T | 11 | 11 | 3978–3988 (trnK-UUU), 6815–6825, 7777–7787, 12467–12477, 17512–17522 (rpoC2), 74110–74120 (petB), 99078–99088, 120304–120314 (ndhA), 120317–120327 (ndhA), 123208–123218 (ycf1), 126007–126017 (ycf1)
P1 | T | 12 | 4 | 4061–4072 (trnK-UUU), 63338–63349, 70265–70276 (clpP), 123096–123107 (ycf1)
P1 | T | 13 | 3 | 47324–47336, 77038–77050 (rpoA), 125310–125322 (ycf1)
P1 | T | 14 | 4 | 3769–3782 (trnK-UUU), 50869–50882, 96145–96158, 124990–125003 (ycf1)
P1 | T | 15 | 1 | 12160–12174 (atpF)
P1 | T | 16 | 2 | 7336–7351, 111781–111796
P1 | T | 19 | 1 | 124803–124821 (ycf1)
P1 | C | 11 | 1 | 62109–62119
P2 | AT | 10 | 4 | 7917–7926, 107448–107457, 129044–129053, 143015–143024
P2 | AT | 12 | 2 | 13319–13330, 112614–112625
P2 | AT | 14 | 3 | 3756–3769 (trnK-UUU), 30560–30573, 120287–120300 (ndhA)
P2 | TA | 10 | 8 | 4557–4566, 6234–6243, 7841–7850, 18884–18893 (rpoC2), 26480–26489, 61869–61878, 93476–93485, 122815–122824 (ycf1)
P2 | TA | 18 | 1 | 111597–111614
P3 | AAT | 12 | 1 | 12612–12623
P3 | TTA | 12 | 1 | 26447–26458
P3 | ATT | 12 | 1 | 45612–45623
P4 | CAAA | 12 | 1 | 27991–28002
P4 | TTCT | 12 | 1 | 34268–34279
P4 | TAAA | 12 | 1 | 45436–45447
P4 | TATC | 12 | 1 | 47120–47131
P4 | ATAG | 12 | 1 | 111359–111370 (ndhF)

Fig 3

Statistical summary of repeat sequences in the cp genome of broccoli.

IR junction characteristics

The expansion and contraction of the IR-SSC and IR-LSC boundaries of seven species, including B. oleracea var. italica, A. thaliana, C. bursa-pastoris, B. napus, B. juncea, B. nigra, and Bunias orientalis, were compared (Fig 4). In the figure, JLB, JLA, JSB, and JSA represent the IRb/LSC, IRa/LSC, IRb/SSC, and IRa/SSC junctions, respectively. The sizes of the LSC, IR, and SSC regions were similar in the cp genomes of the seven species, and the IR length varied from 26,035 bp in B. napus to 26,459 bp in C. bursa-pastoris (accession number: AP009371). The JLB border was within the coding region of rps19 in all seven species, with only a 1-bp difference in location among the cp genomes. The two genes ycf1 and ndhF crossed the JSB junction. Most of the ycf1 gene in the seven species was located in the IRB region, and 1–4 bp was located in the SSC region. Overlap between ycf1 and ndhF was detected at the JSB boundary in all studied cp genomes, with lengths of 35 bp to 38 bp. The ycf1 gene crossed the JSA junction in all cp genomes, and its length reflected changes in the JSA region. The trnH-GUG gene was located within the LSC region in all seven species, 2–30 bp from the JLA boundary. These results suggested that the IR border shifts were relatively minor, involving only a small number of genes, with differences in gene overlap lengths and in the distance of trnH-GUG from the JLA boundary.
Fig 4

Comparison of boundaries between the LSC, IR, and SSC regions in chloroplast genomes of seven species.

Genes are depicted by colored boxes. Boxes above or below the main line indicate adjacent border genes.

cpDNA-based phylogenetic analyses have provided insight into evolutionary relationships, population genetics, and classification in different plant taxa [27]. To investigate the taxonomic status and evolutionary relationships of Brassica oleracea var. italica within Brassicaceae, ML phylogenetic analyses were performed based on 56 complete cp genome sequences (Fig 5). The phylogenetic analysis revealed that all B. oleracea cultivars were closely related, forming a well-supported clade. The newly generated genome (Accession Number: MN649876.1) was classified as B. oleracea and formed a clade with Brassica oleracea (NC_041167.1). The two Brassica oleracea var. italica cultivars MH388765.1 and MH388764.1 formed a clade. B. oleracea var. gongylodes and B. oleracea MG717288.1 formed a clade. The phylogenetic results clearly elucidate the position of B. oleracea var. italica within Brassicaceae and provide a basis for future evolutionary studies.
Fig 5

Phylogenetic tree inferred by the maximum likelihood method based on the complete cp genomes from 56 species.

Bootstrap support values are shown at the nodes.

Discussion

B. oleracea var. italica is an important vegetable among B. oleracea cultivars. In general, the gene content and genome organization of land plant chloroplast genomes are more highly conserved than those of mitochondrial and nuclear genomes. However, gene losses and inversions have been reported in Asteraceae, Leguminosae, and Gentianaceae [28-30]. In the present study, we compared the complete cp genomes and gene annotations of various B. oleracea cultivars with data available in the GenBank database. The size of the cp genome obtained in this study was similar to those of other B. oleracea varieties. However, the number of annotated genes differed among genomes; this may be explained by incomplete data, gene losses, or interspecific differences. Our results indicated that the GC content was not evenly distributed among genomic regions. The GC content in the IR region was higher than those in other regions, possibly because the GC content (an indicator of species relationships) of the four rRNAs in this region was high [11, 31]. The newly sequenced broccoli genome contained 133 genes, with high conservation in composition and arrangement, including self-replication genes, photosynthetic genes, other functional genes, and genes with unknown functions, consistent with previous research [32]. Furthermore, 23 genes contained one or two introns, and trnK-UUU had the largest intron. Introns play crucial roles in the regulation of gene expression, depending on conditions and on their location [33]. Codon usage is a key factor in cp genome evolution. In the broccoli cp genome, the most and least frequent amino acids were leucine and cysteine, respectively, as observed in other angiosperm genomes, such as Ananas comosus, Decaisnea insignis, Nasturtium officinale, and Magnolia zenii [34-36]. 
In the broccoli cp genome, AT was preferred over GC, especially at the second and third codon positions (62.53% and 71.21%, respectively), consistent with results obtained for many terrestrial species [37]. A repeat analysis revealed 12 forward, 20 palindromic, and 3 reverse repeats in the broccoli cp genome. Most of these repeats were located in intron sequences, intergenic spacers, and the ycf genes, but several occurred in CDS regions and tRNAs. Repeat sequences are involved in sequence variation, genome rearrangements, and many rearrangement endpoints in algal and angiosperm genomes [38, 39]. The organization of cp genome sequences is highly conserved, and SSR primers for cp genomes are transferable across genera and species. Accordingly, SSRs are widely used as molecular markers for genetic linkage map construction, population genetic analyses, polymorphism identification, plant breeding, and taxonomic analyses [40]. A total of 92 SSRs were obtained in this study, and 66 (71.7%) SSRs belonged to the P1 type, among which 65 (70.7%) consisted of A or T repeat units, while TA and AT repeats belonged to the P2 type. These findings agree with previous results [41, 42]. The phylogenetic analysis yielded 53 nodes with bootstrap values, among which 21 and 36 nodes had bootstrap values of 100% and greater than 90%, respectively. In the present study, the phylogenetic trees demonstrated that Brassica nigra and S. arvensis clustered into one subgroup, which was consistent with other research [43]. The newly generated genome (Accession Number: MN649876.1) was closely related to NC_041167.1. Plant cp genomes are considered highly conserved; however, the sizes and LSC/IRb/SSC/IRa boundaries change due to contraction or expansion at the borders of the IR region [44]. 
Our results indicated that divergence in the IR borders among the seven species was related to the different positions of four genes, rps19, ycf1, ndhF, and trnH-GUG, in agreement with previous results [45, 46]. Notably, the ycf1 gene spanned the JSA boundary in all cp genomes analyzed, extending 1022 to 1034 bp into the IRa region. In addition, the trnH gene was located in the LSC region in all tested cp genomes, but its distance from the JLA boundary varied from 2 to 30 bp. Taken together, these results suggest that the seven cp genomes were relatively conserved, and that boundary divergence at JSA and JLA was the main reason for the expansion and contraction of the IR region in these species.
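The junction names used above can be related to genome coordinates with a short calculation. The sketch below assumes the region order LSC, IRb, SSC, IRa with coordinates starting at the first base of the LSC, uses the region lengths from the abstract, and takes a purely hypothetical gene interval to show how the portion of a boundary-spanning gene on each side of a junction is obtained. Function names are illustrative.

```python
from itertools import accumulate

# Region order assumed LSC -> IRb -> SSC -> IRa; lengths (bp) from the abstract.
REGIONS = [("LSC", 83_136), ("IRb", 26_197), ("SSC", 17_834), ("IRa", 26_197)]
# Junction at the end of each region: LSC/IRb, IRb/SSC, SSC/IRa, IRa/LSC.
JUNCTIONS = ["JLB", "JSB", "JSA", "JLA"]


def junction_positions(regions=REGIONS):
    """Map each junction to the 1-based coordinate of the last base before it."""
    ends = accumulate(length for _, length in regions)
    return dict(zip(JUNCTIONS, ends))


def bases_each_side(gene_start, gene_end, junction):
    """Bases of a gene (1-based, inclusive) left and right of a junction.

    The junction lies between base `junction` and base `junction + 1`.
    """
    left = max(0, min(gene_end, junction) - gene_start + 1)
    right = max(0, gene_end - max(gene_start, junction + 1) + 1)
    return left, right


positions = junction_positions()
print(positions)

# Hypothetical gene interval spanning JSA (not taken from the annotation).
print(bases_each_side(122_800, 128_197, positions["JSA"]))
```

A gene such as ycf1 that crosses JSA would yield a non-zero count on both sides, and comparing the right-hand count across genomes corresponds to the 1022 to 1034 bp range described above. Genes next to JLA sit across the origin of the circular genome, so a full implementation would need wrap-around handling that this sketch omits.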
  39 in total

1.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

2.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

3.  Complete chloroplast genome sequence of Gycine max and comparative analyses with other legume genomes.

Authors:  Christopher Saski; Seung-Bum Lee; Henry Daniell; Todd C Wood; Jeffrey Tomkins; Hyi-Gyung Kim; Robert K Jansen
Journal:  Plant Mol Biol       Date:  2005-09       Impact factor: 4.076

4.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

5.  Complete Chloroplast Genome Sequence and Phylogenetic Analysis of Quercus acutissima.

Authors:  Xuan Li; Yongfu Li; Mingyue Zang; Mingzhi Li; Yanming Fang
Journal:  Int J Mol Sci       Date:  2018-08-18       Impact factor: 5.923

6.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

Authors:  Chang Liu; Linchun Shi; Yingjie Zhu; Haimei Chen; Jianhui Zhang; Xiaohan Lin; Xiaojun Guan
Journal:  BMC Genomics       Date:  2012-12-20       Impact factor: 3.969

7.  Complete chloroplast genome sequence of MD-2 pineapple and its comparative analysis among nine other plants from the subclass Commelinidae.

Authors:  R M Redwan; A Saidin; S V Kumar
Journal:  BMC Plant Biol       Date:  2015-08-12       Impact factor: 4.215

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

9.  Complete Chloroplast Genome Sequence of Decaisnea insignis: Genome Organization, Genomic Resources and Comparative Analysis.

Authors:  Bin Li; Furong Lin; Ping Huang; Wenying Guo; Yongqi Zheng
Journal:  Sci Rep       Date:  2017-08-30       Impact factor: 4.379

10.  The Complete Plastid Genome of Magnolia zenii and Genetic Comparison to Magnoliaceae species.

Authors:  Yongfu Li; Steven Paul Sylvester; Meng Li; Cheng Zhang; Xuan Li; Yifan Duan; Xianrong Wang
Journal:  Molecules       Date:  2019-01-11       Impact factor: 4.411
