
The complete chloroplast genome of Capsicum frutescens (Solanaceae).

Donghwan Shim, Sebastin Raveendar, Jung-Ro Lee, Gi-An Lee, Na-Young Ro, Young-Ah Jeon, Gyu-Taek Cho, Ho-Sun Lee, Kyung-Ho Ma, Jong-Wook Chung.

Abstract

PREMISE OF THE STUDY: We report the complete sequence of the chloroplast genome of Capsicum frutescens (Solanaceae), a species of chili pepper.
METHODS AND RESULTS: Using an Illumina platform, we sequenced the chloroplast genome of C. frutescens. The total length of the genome is 156,817 bp, and the overall GC content is 37.7%. A pair of 25,792-bp inverted repeats is separated by small (17,853 bp) and large (87,380 bp) single-copy regions. The C. frutescens chloroplast genome encodes 132 unique genes, including 87 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. Of these, seven genes are duplicated in the inverted repeats, and 12 genes contain one or two introns. We identified 125 simple sequence repeat motifs and, by comparison with the C. annuum reference chloroplast genome, 34 variants, mostly located in noncoding regions.
CONCLUSIONS: The complete chloroplast genome sequence of C. frutescens reported here is a valuable genetic resource for Capsicum species.


Keywords:  Capsicum frutescens; Solanaceae; chili pepper; chloroplast genome; next-generation sequencing

Year: 2016; PMID: 27213127; PMCID: PMC4873274; DOI: 10.3732/apps.1600002

Source DB: PubMed; Journal: Appl Plant Sci; ISSN: 2168-0450; Impact factor: 1.936


A chloroplast is an organelle with its own genome, which encodes a number of chloroplast-specific components (Sugiura et al., 1998). The chloroplast genome has a tractable size and a high level of conservation, so it can be used to characterize genetic relationships among species. Plant taxonomists have also widely adopted the sequence variability of portions of two chloroplast genes, rbcL and matK, as an effective DNA barcode for land plants (Vijayan and Tsou, 2010). Chloroplast DNA contains many of the genes necessary for the proper functioning of the organelle. The analysis of chloroplast DNA sequences has proven useful in studying plant evolution (Shaw et al., 2007), and the field of chloroplast genome characterization is growing rapidly (Timmis et al., 2004). Chloroplast genome size has been determined for a number of plants and algae and ranges from 85 to 292 kbp. Complete chloroplast genome sequences have been reported for several plants and algae. Many chloroplast DNAs contain two inverted repeats (IRs), which separate a large single-copy region (LSC) from a small single-copy region (SSC) (Palmer and Thompson, 1982). The IRs vary in length from 4 to 25 kbp (Robinson et al., 2009). Capsicum frutescens L. (Solanaceae) is also known as C. annuum L. (Smith and Heiser, 1951), and the name has generally been applied to all cultivated peppers in the United States. Cultivars of C. frutescens can be annual or short-lived perennial plants. The flowers have a greenish white or greenish yellow corolla and are either insect- or self-pollinated. The fruit is usually very pungent. It grows to 1.0–8.0 cm long and 0.6–3.0 cm in diameter (Smith and Heiser, 1951). The fruit is typically pale yellow when immature and ripens to bright red, but it can also take other colors (Heiser and Smith, 1953; Stummel and Bosland, 2006). More recently, C. frutescens has been bred to produce ornamental strains that bear many erect fruits ripening in colorful patterns (Stummel and Bosland, 2006). Capsicum frutescens likely originated in South or Central America (Heiser, 1979; Clement et al., 2010). It spread quickly throughout the tropical and subtropical regions of this area, where it still grows wild today (Purseglove, 1976). Capsicum frutescens is also believed to be the ancestor of C. chinense Jacq. (Bosland, 1996; Basu et al., 2003). In this study, we used Illumina technology to sequence, assemble, and annotate the complete chloroplast genome of C. frutescens. We then mined it for simple sequence repeat (SSR) markers and for single-nucleotide polymorphism (SNP) and insertion/deletion (indel) variants. The resulting data have been made publicly available as a genetic resource for Capsicum L. species. They should facilitate investigations into genetic variation and phylogenetic relationships among closely related Capsicum species.

METHODS AND RESULTS

For this study, C. frutescens seeds (accession no. IT158639) were obtained from the National Agrobiodiversity Center, Rural Development Administration, Republic of Korea. Seeds were germinated and grown in a greenhouse, and fresh leaves were collected from 40-d-old seedlings. DNA was extracted with a DNeasy Plant Mini Kit (QIAGEN, Valencia, California, USA) according to the manufacturer's instructions and used to construct chloroplast DNA libraries. An Illumina paired-end library (average insert size of 500 bp) was constructed with the Illumina TruSeq library preparation kit following the manufacturer's instructions (Illumina, San Diego, California, USA). The library was sequenced as 2 × 300-bp reads on a MiSeq instrument at LabGenomics (http://www.labgenomics.co.kr/). Prior to de novo assembly of the chloroplast genome, low-quality sequences (quality score < 20; Q20) were filtered out. The remaining high-quality reads were assembled with the CLC Genome Assembler (version beta 4.6; CLC bio, Aarhus, Denmark), using a minimum overlap size of 200 bp and a maximum bubble size of 50 bp for the de Bruijn graph. Chloroplast contigs were selected from the initial assembly by a BLAST (version 2.2.31) search in the CLC software against the reference chloroplast genome of C. annuum (GenBank accession NC_018552). The search used parameters of 0.5 for fraction, 0.8 for similarity, and an overlap size of 200–600 bp (Jo et al., 2011). The selected chloroplast contigs were merged into four contigs. Iterative contig extension, by mapping raw reads to the contigs, was then used to construct the complete C. frutescens chloroplast genome. The chloroplast genome was annotated with the Dual Organellar GenoMe Annotator (DOGMA; Wyman et al., 2004) and CpGAVAS (Liu et al., 2012). All transfer RNA (tRNA) genes were checked and corrected with tRNAscan-SE (Lowe and Eddy, 1997). A map of the genome was drawn with OGDRAW (Lohse et al., 2007).
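The Methods state only that reads with quality scores below 20 (Q20) were removed before assembly. They do not say whether the cutoff was applied per base, per read, or as trimming. The sketch below assumes a per-read mean Phred score with Phred+33 (Illumina 1.8+/Sanger) quality encoding. The function names and example reads are hypothetical.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 (Illumina 1.8+ / Sanger) encoding


def phred_scores(quality: str) -> list[int]:
    """Decode an ASCII FASTQ quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in quality]


def passes_q20(quality: str, threshold: int = 20) -> bool:
    """Keep a read only if its mean Phred score is at least `threshold`.

    Hypothetical criterion: the source does not specify mean vs. per-base filtering.
    Empty quality strings are rejected.
    """
    scores = phred_scores(quality)
    return bool(scores) and mean(scores) >= threshold


def filter_reads(records):
    """Yield (read_id, sequence, quality) tuples that pass the Q20 filter."""
    for read_id, seq, qual in records:
        if passes_q20(qual):
            yield read_id, seq, qual


if __name__ == "__main__":
    reads = [
        ("read1", "ACGTACGT", "IIIIIIII"),  # 'I' encodes Phred 40
        ("read2", "ACGTACGT", "########"),  # '#' encodes Phred 2
    ]
    print([r[0] for r in filter_reads(reads)])  # ['read1']
```

A per-base trimming strategy (cutting low-quality read ends rather than discarding whole reads) would be an equally plausible reading of "filtered out."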
Sputnik (Cardle et al., 2000) was used to find SSR markers in the chloroplast genome of C. frutescens. Sputnik uses a recursive algorithm to search for repeats with unit lengths of two to five nucleotides and detects perfect, compound, and imperfect repeats. It has been applied to SSR identification in many species, including Arabidopsis and barley (Cardle et al., 2000). To identify SNP and indel variants in the C. frutescens chloroplast genome, we used BWA (Li and Durbin, 2009) ‘mem’ with the options ‘-k19 -w100 -d100 -r1.5 -y20 -c500 -D0.5 -W0 -m50’. We then used SAMtools (Li et al., 2009) ‘mpileup’ with the options ‘-uf -d250 -q0 -e20 -h100 -L250 -m1 -o40’. The mpileup method is described in more detail at http://samtools.sourceforge.net/mpileup.shtml. Illumina paired-end (2 × 300 bp) sequencing produced a total of 8,272,114 paired-end reads with an average fragment length of 256 bp, totaling 1,796,432,923 bp of sequence. Of these, 31,772,592 nucleotides mapped to the chloroplast genome, giving an average coverage of 202×. Contig alignment and scaffolding based on the paired-end data yielded a complete circular C. frutescens chloroplast genome sequence (Fig. 1). The sequence has been deposited in GenBank (accession no. KR078312; National Center for Biotechnology Information [NCBI]). The genome has a total length of 156,817 bp and consists of an LSC of 87,380 bp, two IRs of 25,792 bp each, and an SSC of 17,853 bp. Its overall GC content is 37.7%. The IRs have a higher GC content (43.1%) than the LSC (35.7%) and SSC (32.0%), which is attributed to the GC-rich ribosomal RNA (rRNA) genes located in the IRs. The C. frutescens chloroplast genome encodes 132 unique genes (Appendix 1), including 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes.
Seven of these genes are duplicated in the IR regions. Nine protein-coding genes (rps16, atpF, rpoC1, petB, petD, rpl16, rpl2 (IR), ndhB (IR), and ndhA) and six tRNA genes contain one intron. Three genes (clpP, rps12, and ycf3) contain two introns (Fig. 1).
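The region lengths and read statistics reported above can be cross-checked with a little arithmetic. The Python sketch below assumes the standard quadripartite layout, in which both IR copies count toward the total length. The `gc_content` helper is a hypothetical illustration of how a GC percentage is computed. Ignoring ambiguous bases such as N is our assumption, not a documented detail of the authors' pipeline.

```python
# Region lengths (bp) reported for the C. frutescens plastome (GenBank KR078312).
LSC = 87_380
IR = 25_792  # length of each inverted repeat (IRa and IRb)
SSC = 17_853

# A quadripartite plastome is LSC + IRb + SSC + IRa, so the IR is counted twice.
genome_length = LSC + 2 * IR + SSC

# Mean depth = total mapped nucleotides / genome length.
MAPPED_NT = 31_772_592
mean_depth = MAPPED_NT / genome_length

# Unique-gene composition (Appendix 1).
PROTEIN_CODING, TRNA, RRNA = 87, 37, 8
total_genes = PROTEIN_CODING + TRNA + RRNA


def gc_content(seq: str) -> float:
    """Return the GC percentage over unambiguous bases (A, C, G, T); other symbols are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


print(genome_length)         # 156817
print(f"{mean_depth:.1f}x")  # 202.6x
print(total_genes)           # 132
```

The computed mean depth of about 202.6× is consistent with the reported 202×.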
Fig. 1.

Gene map of the Capsicum frutescens chloroplast genome. Genes drawn inside the circle are transcribed clockwise, while those drawn outside are transcribed counterclockwise (marked with two arrows). Different functional gene groups are color-coded. Variation in the GC content of the genome is shown in the middle circle. The map was drawn using OGDRAW version 1.2 (Lohse et al., 2007).

The C. frutescens chloroplast genome (156,817 bp) was larger than those reported for other Capsicum taxa. These include C. annuum var. glabriusculum (Dunal) Heiser & Pickersgill (GenBank accession no. KJ619462) and C. annuum (GenBank accession no. NC_018552). The lengths of the LSC and IRs in C. frutescens differed from those in the other two taxa and contributed to the variation in chloroplast genome size. The C. frutescens chloroplast genome was 36 bp longer than the reported C. annuum chloroplast genome and 205 bp longer than the C. annuum var. glabriusculum chloroplast genome. Its SSC and IR regions were 3 and 9 bp longer, respectively, than those of the previously reported chloroplast genomes. Its LSC region was 14 bp shorter than that of C. annuum and 167 bp longer than that of C. annuum var. glabriusculum. The average GC content of the C. frutescens chloroplast genome (37.7%) is similar to that of other Capsicum species. The organization and gene order of the Capsicum chloroplast genome follow the general chloroplast genome structure of angiosperms (Sugiura, 1992). The Capsicum chloroplast genome contains 132 genes (Appendix 2). These include eight rRNA genes, 37 tRNA genes, 21 ribosomal protein genes (12 small subunit and nine large subunit), and four DNA-directed RNA polymerase genes.
Forty-six genes were involved in photosynthesis:
- 11 encoded subunits of NADH oxidoreductase;
- seven encoded photosystem I components and 15 encoded photosystem II components;
- six encoded components of the cytochrome b6/f complex and six encoded subunits of ATP synthase;
- one encoded the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO).

Five genes had other functions, and three genes were of unknown function. As shown in Fig. 1 and Appendix 2, genome organization appears to be conserved, as reported previously for Capsicum species (Jo et al., 2011; Zeng et al., 2014; Raveendar et al., 2015a). In the newly determined genome, we predicted 132 genes and observed size variation in the IR and LSC regions. A total of 125 potential SSR motifs were identified, located mostly in noncoding regions (Table 1). Most were tetranucleotide (50%) or trinucleotide (26%) repeats; di- and pentanucleotide repeats together accounted for 25%. The most frequent tetranucleotide SSRs had the AAAT/AATA/ATAA motif, followed by the ATAA/TAAA/AAAT motif. The TTTG/TTGT/TGTT, TCTT/CTTT/TTTC, and AATT/ATTA/TTAA motifs each accounted for 7.2% of all SSRs. Two pentanucleotide motifs, TTTTA/TTTAT/TTATT and TTATT/TATTT/ATTTT, were identified. The TTC/TCT/CTT and TTA/TAT/ATT motifs were found among the trinucleotide SSRs, and only the TA/AT motif among the dinucleotide SSRs (Table 1). With 125 SSRs in the 156.8-kb chloroplast genome, the observed SSR frequency is approximately one per 1,250 bp.
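Sputnik's scoring-based recursive search, which also reports imperfect and compound repeats, is not reproduced here. As a simplified stand-in, the sketch below finds only perfect tandem repeats with unit lengths of 2–5 bp, matching the unit range stated in the Methods. The minimum of three copies and the function names are hypothetical choices, so counts from this scanner would not necessarily match Table 1.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter unit ('AA' and 'ATAT' are not primitive)."""
    k = len(unit)
    return all(unit != unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str, min_unit: int = 2, max_unit: int = 5, min_copies: int = 3):
    """Scan `seq` left to right and return (start, unit, copies) for perfect tandem repeats.

    At each position the qualifying repeat with the longest total span is kept, and the
    scan resumes after it. Imperfect and compound repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k or "N" in unit or not is_primitive(unit):
                continue
            copies = 1
            # Extend while the next k bases repeat the unit exactly.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[2] * len(best[1])):
                best = (i, unit, copies)
        if best:
            hits.append(best)
            i += best[2] * len(best[1])
        else:
            i += 1
    return hits


if __name__ == "__main__":
    toy = "GG" + "AT" * 4 + "C" + "AAAT" * 3 + "G"
    print(find_perfect_ssrs(toy))  # [(2, 'AT', 4), (11, 'AAAT', 3)]
    # Reported density: 125 SSRs in 156,817 bp.
    print(f"one SSR per {156_817 / 125:.0f} bp")  # one SSR per 1255 bp
```

The exact density, roughly 1,255 bp per SSR, is what the text rounds to about one per 1,250 bp.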
Table 1.

SSR candidates of the Capsicum frutescens chloroplast genome.

SSR type / motif          SSR abundance    Percentage abundance (%)
Dinucleotide
  TA/AT                         9                 7.2
Trinucleotide
  TTC/TCT/CTT                   9                 7.2
  TTA/TAT/ATT                  23                18.4
Tetranucleotide
  TTTG/TTGT/TGTT                9                 7.2
  TCTT/CTTT/TTTC                9                 7.2
  ATAA/TAAA/AAAT               16                12.8
  AATT/ATTA/TTAA                9                 7.2
  AAAT/AATA/ATAA               19                15.2
Pentanucleotide
  TTTTA/TTTAT/TTATT            11                 8.8
  TTATT/TATTT/ATTTT            11                 8.8
Total                         125               100
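The class-level percentages quoted in the Results can be recomputed from the per-motif counts in Table 1. Below is a minimal Python sketch; the variable names are illustrative.

```python
from collections import defaultdict

# (repeat class, motif group, count) transcribed from Table 1.
TABLE1 = [
    ("Dinucleotide", "TA/AT", 9),
    ("Trinucleotide", "TTC/TCT/CTT", 9),
    ("Trinucleotide", "TTA/TAT/ATT", 23),
    ("Tetranucleotide", "TTTG/TTGT/TGTT", 9),
    ("Tetranucleotide", "TCTT/CTTT/TTTC", 9),
    ("Tetranucleotide", "ATAA/TAAA/AAAT", 16),
    ("Tetranucleotide", "AATT/ATTA/TTAA", 9),
    ("Tetranucleotide", "AAAT/AATA/ATAA", 19),
    ("Pentanucleotide", "TTTTA/TTTAT/TTATT", 11),
    ("Pentanucleotide", "TTATT/TATTT/ATTTT", 11),
]

total = sum(n for _, _, n in TABLE1)

# Sum motif counts within each repeat class (insertion order is preserved).
by_class = defaultdict(int)
for cls, _, n in TABLE1:
    by_class[cls] += n

for cls, n in by_class.items():
    print(f"{cls}: {n} ({100 * n / total:.1f}%)")
# Dinucleotide: 9 (7.2%)
# Trinucleotide: 32 (25.6%)
# Tetranucleotide: 62 (49.6%)
# Pentanucleotide: 22 (17.6%)
```

The tetranucleotide (62/125 = 49.6%) and trinucleotide (32/125 = 25.6%) totals round to the 50% and 26% given in the text. Di- plus pentanucleotide SSRs (31/125 = 24.8%) round to 25%.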
The C. frutescens chloroplast genome sequence was compared with the C. annuum reference chloroplast sequence. This comparison revealed a total of 34 variants (18 SNPs and 16 indels), 15 of which involve more than one nucleotide (Tables 2 and 3). Six SNPs and two indels were located in coding regions. Of the 34 variants, 29 were located in the LSC and five in the SSC. These molecular markers should facilitate studies of genetic diversity, population genetic structure, and the sustainable conservation of C. frutescens.
Table 2.

SNP markers of the Capsicum frutescens chloroplast genome.

No.  REF (C. annuum)  ALT (C. frutescens)  Coding region       QUAL   Region
1    T                C                    noncoding region    222    LSC
2    A                T                    noncoding region    222    LSC
3    T                G                    noncoding region    4.77   LSC
4    T                C                    noncoding region    19.1   LSC
5    G                T                    noncoding region    222    LSC
6    G                A                    noncoding region    222    LSC
7    C                A                    noncoding region    222    LSC
8    T                A                    noncoding region    222    LSC
9    A                G                    noncoding region    222    LSC
10   G                C                    gene (petA)         222    LSC
11   A                G                    gene (petA)         222    LSC
12   C                A                    gene (petA)         222    LSC
13   C                A                    gene (petA)         222    LSC
14   A                T                    noncoding region    222    LSC
15   G                A                    gene (rpl32)        164    SSC
16   G                T                    gene (rpl32)        124    SSC
17   A                T                    noncoding region    222    SSC
18   T                G                    noncoding region    222    SSC

Note: ALT = alteration; LSC = large single-copy; QUAL = Phred-scaled quality score; REF = reference; SSC = small single-copy.

Table 3.

Indel markers of the Capsicum frutescens chloroplast genome.

No.REF (C. annuum)ALT (C. frutescens)Coding regionQUALRegion
1ATTTTTTTTTATTTTTTTTTT, ATTTTTTTTTTTnoncoding region48.5LSC
2TAAAAAATAAAAAAAnoncoding region178LSC
3GAAAAAAAAAAAAAAAGAAAAAAAAAAAAAAAA, GAAAAAAAAAAA, GAAAAAAAAAAAAAAAAAnoncoding region18.5LSC
4CTTTTTTCTTTTTTTnoncoding region152LSC
5TCAACTCATTTTATnoncoding region214LSC
6TATTTTTAATTTTAATTTTTATnoncoding region217LSC
AATATATTTTAATTTTAAT
ATAAATAAATAATTTTAAT
ATATTAATATAAATAAATA
AATAAT
7CAAAAAAAAAACAAAAAAAAAAA, CAAAAAAAAAAAA, CAAAAAAAAnoncoding region65.5LSC
8ATTTTTTTTTATTTTTTTTTT, ATTTTTTTTTTTnoncoding region68.5LSC
9TAAAAAAAAAATAAAAAAAAAAA, TAAAAAAAAAAAAnoncoding region48.5LSC
10TCCGGTAAAGACTCCGGTCCGGTAAAGACGCCGGTAAAGAgene (rpl20)218LSC
TAAAGACTCCGGTAAAGACCTCCGGTAAAGACTCCGGTAAAGAC
11GTTTTTTTTGTTTTTTTTT, GTTTTTTTTTTnoncoding region94.5LSC
12GAAAAAAAGAAAAAAnoncoding region66.5LSC
13GAAAAAAAAGAAAAAAAAA, GAAAAAAAAAAnoncoding region90.5LSC
14CTTTTCTTTTTnoncoding region214LSC
15ATTCTTATTTTTTATTATTTTTTgene (rps19)203LSC
16TCCCCCTCCCCCCnoncoding region185SSC

Note: ALT = alteration; LSC = large single-copy; QUAL = Phred-scaled quality score; REF = reference; SSC = small single-copy.
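The SNP/indel split in Tables 2 and 3 follows from comparing allele lengths in the REF and ALT columns. The sketch below shows one way to classify VCF-style records and tally them by region. The "MNP" label for equal-length multi-base changes, the function names, and the toy records are illustrative assumptions, not values taken from the tables.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Classify a VCF-style REF/ALT pair; ALT may hold several comma-separated alleles."""
    alleles = [a.strip() for a in alt.split(",")]
    if all(len(a) == len(ref) for a in alleles):
        # Same length: a single-base substitution, or a multi-base substitution (MNP).
        return "SNP" if len(ref) == 1 else "MNP"
    return "indel"  # at least one ALT allele changes the length


def tally(records):
    """Count variants by (region, type) from (ref, alt, region) tuples."""
    return Counter((region, classify(ref, alt)) for ref, alt, region in records)


if __name__ == "__main__":
    # Hypothetical toy records shaped like rows of Tables 2 and 3.
    records = [
        ("T", "C", "LSC"),
        ("G", "A", "SSC"),
        ("ATTTT", "ATTTTT, ATTTTTT", "LSC"),  # homopolymer length change
        ("TCCC", "TCCCC", "SSC"),
    ]
    for key, n in sorted(tally(records).items()):
        print(key, n)
```

Applied to Tables 2 and 3, this tally gives 14 + 15 = 29 variants in the LSC and 4 + 1 = 5 in the SSC, matching the counts in the text.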

The size of the C. frutescens chloroplast genome identified here is more similar to that of C. annuum var. glabriusculum reported previously (Raveendar et al., 2015b). Moreover, the C. frutescens chloroplast genome has similar genome organization, gene order, gene sizes, and GC content, differing only by SNPs and indels. Capsicum annuum var. glabriusculum is considered the wild progenitor of cultivated C. annuum (Votava et al., 2002; Aguilar-Meléndez et al., 2009; González-Jara et al., 2011).

CONCLUSIONS

We provide here the complete chloroplast genome sequence of C. frutescens, a pepper cultivated in the United States. This sequence and the recently determined C. annuum chloroplast genome sequence (GenBank accession no. NC_018552) together enable assessment of genome-wide mutational dynamics within the genus Capsicum. The two chloroplast genomes share similar genome organization, gene order, gene sizes, and GC content, differing only by SNPs and indels. Accurate phylogenies and effective species discrimination are difficult to obtain from a small number of plastid genes in evolutionarily young lineages (Ruhsam et al., 2015). Complete plastid genome sequencing offers one way to address this problem. This sequence can also help researchers design conserved primers to sequence new genomic regions that may provide useful phylogenetic information for closely related species. Moreover, the structural details of this C. frutescens chloroplast genome add to the growing body of Capsicum plastome data. These data can facilitate investigations into gene expression and genetic variation in these crop species.
Appendix 1.

General features of the Capsicum frutescens chloroplast genome.

Chloroplast genome feature          Quantity
Genome size (bp)                    156,817
GC content (%)                      37.7
Total no. of genes                  132
Protein-coding genes                87
rRNA genes                          8
tRNA genes                          37
Genes duplicated in IR regions      7
Total introns                       12
Single intron (gene)                9
Double introns (gene)               3
Single intron (tRNA)                6
Appendix 2.

Genes present in the Capsicum frutescens chloroplast genome.

Chloroplast genome feature: gene products
Photosystem I: psaA, psaB, psaC, psaI, psaJ, ycf3², ycf4
Photosystem II: psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Cytochrome b6/f: petA, petB¹, petD¹, petG, petL, petN
ATP synthase: atpA, atpB, atpE, atpF¹, atpH, atpI
RuBisCO: rbcL
NADH oxidoreductase: ndhA¹, ndhB¹,³, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Large subunit ribosomal proteins: rpl2¹,³, rpl14, rpl16¹, rpl20, rpl22, rpl23³, rpl32, rpl33, rpl36
Small subunit ribosomal proteins: rps2, rps3, rps4, rps7³, rps8, rps11, rps12²,³,⁴, rps14, rps15, rps16¹, rps18, rps19
RNA polymerase: rpoA, rpoB, rpoC1¹, rpoC2
Unknown function protein-coding genes: ycf1³, ycf2³, ycf15³
Other genes: accD, ccsA, cemA, clpP², matK
Ribosomal RNAs: rrn16³, rrn23³, rrn4.5³, rrn5³
Transfer RNAs: trnA-UGC¹,³, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC¹, trnG-GCC, trnH-GUG, trnI-CAU³, trnI-GAU¹,³, trnK-UUU¹, trnL-UAA¹, trnL-UAG, trnL-CAA³, trnfM-CAU, trnM-CAU, trnN-GUU³, trnP-UGG, trnQ-UUG, trnR-ACG³, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-UAC¹, trnV-GAC³, trnW-CCA, trnY-GUA

¹ Gene containing a single intron.
² Gene containing two introns.
³ Two gene copies in IRs.
⁴ Trans-splicing gene.
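The Results quote functional-category totals of 46 photosynthesis genes and 21 ribosomal protein genes. These can be checked by counting the protein-coding entries in Appendix 2. Below is a minimal Python sketch; grouping the six photosynthesis-related rows follows our reading of the Results text.

```python
# Gene lists transcribed from Appendix 2 (footnote markers removed).
APPENDIX2 = {
    "Photosystem I": ["psaA", "psaB", "psaC", "psaI", "psaJ", "ycf3", "ycf4"],
    "Photosystem II": ["psbA", "psbB", "psbC", "psbD", "psbE", "psbF", "psbH", "psbI",
                       "psbJ", "psbK", "psbL", "psbM", "psbN", "psbT", "psbZ"],
    "Cytochrome b6/f": ["petA", "petB", "petD", "petG", "petL", "petN"],
    "ATP synthase": ["atpA", "atpB", "atpE", "atpF", "atpH", "atpI"],
    "RuBisCO": ["rbcL"],
    "NADH oxidoreductase": ["ndhA", "ndhB", "ndhC", "ndhD", "ndhE", "ndhF", "ndhG",
                            "ndhH", "ndhI", "ndhJ", "ndhK"],
    "Large subunit ribosomal proteins": ["rpl2", "rpl14", "rpl16", "rpl20", "rpl22",
                                         "rpl23", "rpl32", "rpl33", "rpl36"],
    "Small subunit ribosomal proteins": ["rps2", "rps3", "rps4", "rps7", "rps8", "rps11",
                                         "rps12", "rps14", "rps15", "rps16", "rps18", "rps19"],
    "RNA polymerase": ["rpoA", "rpoB", "rpoC1", "rpoC2"],
    "Unknown function": ["ycf1", "ycf2", "ycf15"],
    "Other genes": ["accD", "ccsA", "cemA", "clpP", "matK"],
}

# Categories grouped as "involved in photosynthesis" in the Results.
PHOTOSYNTHESIS = ["Photosystem I", "Photosystem II", "Cytochrome b6/f",
                  "ATP synthase", "RuBisCO", "NADH oxidoreductase"]
RIBOSOMAL = ["Large subunit ribosomal proteins", "Small subunit ribosomal proteins"]

counts = {category: len(genes) for category, genes in APPENDIX2.items()}
photosynthesis_total = sum(counts[c] for c in PHOTOSYNTHESIS)
ribosomal_total = sum(counts[c] for c in RIBOSOMAL)

print(photosynthesis_total)  # 46
print(ribosomal_total)       # 21
```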

REFERENCES

1. Stacia K. Wyman, Robert K. Jansen, Jeffrey L. Boore. 2004. Automatic annotation of organellar genomes with DOGMA. Bioinformatics.

2. M. Sugiura. 1992. The chloroplast genome. Plant Mol Biol.

3. M. Sugiura, T. Hirose, M. Sugita. 1998. Evolution and mechanism of translation in chloroplasts. Annu Rev Genet.

4. T. M. Lowe, S. R. Eddy. 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res.

5. Fan-chun Zeng, Cheng-wen Gao, Li-zhi Gao. 2014. The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum). Mitochondrial DNA A DNA Mapp Seq Anal.

6. Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, Nils Homer, Gabor Marth, Goncalo Abecasis, Richard Durbin. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics.

7. Markus Ruhsam, Hardeep S. Rai, Sarah Mathews, T. Gregory Ross, Sean W. Graham, Linda A. Raubeson, Wenbin Mei, Philip I. Thomas, Martin F. Gardner, Richard A. Ennos, Peter M. Hollingsworth. 2015. Does complete plastid genome sequencing improve species discrimination and phylogenetic resolution in Araucaria? Mol Ecol Resour.

8. Pablo González-Jara, Alejandra Moreno-Letelier, Aurora Fraile, Daniel Piñero, Fernando García-Arenal. 2011. Impact of human management on the genetic variation of wild pepper, Capsicum annuum var. glabriusculum. PLoS One.

9. Sebastin Raveendar, Young-Wang Na, Jung-Ro Lee, Donghwan Shim, Kyung-Ho Ma, Sok-Young Lee, Jong-Wook Chung. 2015. The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing. Molecules.

10. Chang Liu, Linchun Shi, Yingjie Zhu, Haimei Chen, Jianhui Zhang, Xiaohan Lin, Xiaojun Guan. 2012. CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genomics.
