Literature DB >> 32614355

Complete chloroplast genome sequence of Caryocar brasiliense Camb. (Caryocaraceae) and comparative analysis brings new insights into the plastome evolution of Malpighiales.

Rhewter Nunes1, Ueric José Borges de Souza1, Cintia Pelegrineti Targueta1, Rafael Barbosa Pinto1, Thannya Nascimento Soares1, José Alexandre Felizola Diniz-Filho2, Mariana Pires de Campos Telles1,3.   

Abstract

Caryocar brasiliense (Caryocaraceae) is a Neotropical tree species widely distributed in Brazilian Savannas. This species is very popular in central Brazil mainly by the use of its fruits in the local cuisine, and indeed it is one of the candidates, among Brazilian native plants, for fast track incorporation into cropping systems. Here we sequenced the complete chloroplast genome of C. brasiliense and used the data to access its genomic resources using high-throughput sequencing. The chloroplast exhibits a genome length of 165,793 bp and the typical angiosperm quadripartite structure with two copies of an inverted repeat sequence (IRa and IRb) of 34,902 bp each, separating a small single copy (SSC) region of 11,852 bp and a large single copy (LSC) region of 84,137 bp. The annotation analysis identified 136 genes being 87 protein-coding, eight rRNA and 37 tRNA genes. We identified 49 repetitive DNA elements and 85 microsatellites. A bayesian phylogenetic analysis helped to understand previously unresolved relationships in Malpighiales, placing Caryocaraceae as a separated group in the order, with high supported nodes. This study synthetizes valuable information for further studies allowing a better understanding of evolutionary patterns in the group and providing resources for future breeding programs.

Entities:  

Year:  2020        PMID: 32614355      PMCID: PMC7263422          DOI: 10.1590/1678-4685-GMB-2019-0161

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


In the order, Malpighiales, the family Caryocaraceae has not yet been explored using genomic approaches. Caryocaraceae represents a poorly resolved group within Malpighiales, forming a polytomy with other families, such as Malpighiaceae and Chrysobalanaceae, for which we have fully sequenced chloroplast genomes at the species level (Xi ; Angiosperm Phylogeny Group ). The main representatives of the Caryocaraceae family are the species of the genus Caryocar L., in particular Caryocar brasiliense Camb. This species is a Neotropical tree, the fruit of which is highly valued in Brazilian cuisine, a nutritious source for bats (Gribel and Hay, 1993), and is widely known in folk culture as a symbol of Brazilian savannas (or Cerrado) (Gribel and Hay, 1993; Araujo, 1995). The fruit pulp of C. brasiliense is rich in unsaturated fatty acids, vitamins, and phenolic acids, as well in carotenoids, such as violaxanthin, lutein, and zeaxanthin (Castro ; Mariano ). Because of all these characteristics, C. brasiliense is one of the main native Cerrado species that is a candidate for incorporation into cropping systems (Leite ; Tunholi ). Despite the importance of Caryocar brasiliense, until now no genomic resources have been developed for this species. A few studies have used microsatellite markers or short sequences to evaluate the genetic diversity patterns of this species, showing that the natural populations of C. brasiliense have a relatively low genetic structure and a high genetic and phylogeographic diversity (e.g. Diniz-Filho ; Collevatti ). Thus, in this study, we sequenced the complete chloroplast genome of C. brasiliense and used the data to access its genomic resources using high-throughput sequencing. As a result, we generated information regarding the chloroplast genome sequence, gene composition and organization, and repeat sequences using part of our data to reconstruct a phylogenetic tree for the Malpighiales order to analyze the relationships of C. brasiliense within this group. The total DNA of the fresh leaves of Caryocar brasiliense was extracted using the CTAB protocol (Doyle and Doyle, 1987). The sample was sequenced using an Illumina HiSeq2000 platform in paired-end 2100 bp mode. Raw reads were evaluated for base quality sequencing and the presence of sequencing adapters using FastQC software (Andrews, 2010). Quality control was performed using Trimmomatic (Bolger ) software. The resulting high-quality reads were selected for de novo chloroplast genome assembly using NOVOPlasty v.2.7.1 software (Dierckxsens ). We performed gene annotation of the Caryocar brasiliense chloroplast genome using CHLOROBOX GeSeq (Tillich ) and Dual Organellar GenoMe Annotator (DOGMA) (Wyman ) software. Simple sequence repeats (SSR) or microsatellite regions were predicted in the C. brasiliense chloroplast genome using Imperfect Microsatellite Extractor (IMEx) (Mudunuri and Nagarajaram, 2007). Repeat sequence elements were predicted using REPuter software (Kurtz, 2002). In addition, we performed a Bayesian phylogenetic analysis for several Malpighiales species using 76 protein-coding gene sequences. For a detailed description of the methods, please refer to the Supplementary Material. The complete sequence of the Caryocar brasiliense chloroplast genome was deposited in the GenBank database (accession number: MK726375) with a high mean genome coverage (715X). The plastome of C. brasiliense exhibited a total length of 165,793 bp and typical quadripartite division, which has previously been observed in other flowering plants (Figure 1; Figure S1). The genome was comprised of a large single copy (LSC), a small single copy (SSC), and a pair of inverted repeats (IRa and IRb). These regions were comprised of 84,137 bp, 11,852 bp, and 34,902 bp, respectively, with a GC content of 36.7%. The IR regions exhibited the greatest GC content (39.6%), followed by LSC (35.0%) and SSC (31.5%). Within the inverted repeat regions, the GC content was higher where the rRNAs were predicted.
Figure 1

Chloroplast genome map of Caryocar brasiliense. The genes drawn outside and inside of the circle are transcribed in clockwise and counterclockwise directions, respectively. Genes were colored based on their functional groups. The inner circle shows the quadripartite structure of the chloroplast: small single copy (SSC), large single copy (LSC) and a pair of inverted repeats (IRa and IRb). The gray ring marks the GC content with the inner circle marking a 50% threshold. Genes that have introns were marked with “*” and pseudogenes were marked with “#”.

We compared the structural features of the C. brasiliense chloroplast genome with another nine chloroplast genomes from nine other families in the Malpighiales order. In comparison, C. brasiliense had the largest genome size, with large IR regions, but one of the smallest small single copies regions, along with Linum usitatissimum L. (Souza et al., 2017) (Table 1). However, C. brasiliense showed a similar size with respect to the LSC region of other species. This may be explained by the fact that many of the photosynthetic genes are present in this region, and are therefore important for the persistence of these species, resulting in fewer contraction/expansion events in the region of the chloroplast genome.
Table 1

Comparative chloroplast genome structural features in 10 species from Malpighiales order. LSC: Large Single Copy region; SSC: Small Single Copy region; IR: Inverted Repeats regions and bp: base pair.

SpeciesFamilyGenome size (bp)LSC (bp)SSC (bp)IR (bp)GC(%)
Caryocar brasiliense Caryocaraceae165,79384,13711,85234,90236.7
Garcinia mangostana Clusiaceae158,17986,45817,70327,00936.1
Chrysobalanus icaco Chrysobalanaceae163,93789,18819,81726,88536.2
Erythroxylum novogranatense Erythroxylaceae163,93791,38418,13727,20835.9
Manihot esculenta Euphorbiaceae161,45389,29518,25026,95435.9
Linum usitatissimum Linaceae156,72181,76710,97431,99037.5
Byrsonima coccolobifolia Malpighiaceae160,32988,52417,83326,98636.8
Passiflora edulis Passifloraceae151,40685,72013,37826,15437.0
Populus tremula Salicaceae156,06784,36716,67027,50936.8
Viola seoulensis Violaceae156,50785,69118,00826,40436.3
We found 115 different genes in the genome, of which 77 were protein-coding genes, four ribosomal RNAs, 30 transfer RNAs, and four pseudogenes (Table S1). In addition, 10 protein-coding genes, four rRNAs, and seven tRNAs were found to be duplicated. Three of these tRNA genes, namely trnA-UGC, trnM-CAU, and trnR-ACG, were found to have more than one copy in the genome (each gene appeared four times). As such, taking into consideration these duplicated genes, the C. brasiliense chloroplast genome had a total of 136 genes (87 protein-coding genes, four pseudogenes, eight rRNAs, and 37 tRNAs). We also observed 10 genes that contained introns. Over all, C. brasiliense was found to have a relatively conserved number of genes compared to the other Malpighiales species, especially in terms of the genes related to rRNA, tRNA, and photosynthesis. The gene features of C. brasiliense were very similar to those of Byrsonima coccolobifolia Kunth (Menezes ), another Malpighiales species (Table S2). The pseudogenes predicted in the C. brasiliense chloroplast genome were ndhH, psaA, psaB, and psbA. These pseudogenes had a copy of the gene in the complete form and are supposedly involved in the structural changes of the chloroplast genome in C. brasiliense. For example, the ndhH pseudogene was related to the process of duplication of the Inverted Repeat region, while the others pseudogenes were related to the inversion events in the Large Single Copy region, generating variation in the order of the plastid genes in comparison to the chloroplast genomes of other Malpighiales species. Such inversion events were also observed in the chloroplast genome of Passiflora edulis Sims, another Malpighiales species (Cauz-Santos ), which is indicative of a non-conservative gene collinearity found in all members of this order. A comparative analysis with 10 Malpighiales species revealed a highly conserved pattern of variation within the genomes (highly variable regions between C. brasiliense and Jatropha curcas were also very variable in other species compared to J. curcas), with highly conserved protein-coding genes, as well as intergenic regions with more variation (Figure S2). Passiflora edulis had the highest divergent regions in comparison to Jatropha curcas L. and other species, with a similarity between regions of less than 50%. Compared to other species under analysis, Caryocar brasiliense displayed one of the smallest Small Single Copy (SSC) regions and the largest Inverted Repeat (IR) regions. This indicates that the size of these regions evolves differently between different species of Malpighiales (Mower ). We performed an IR boundaries comparison analysis to determine the genes present at the sites that separate the chloroplast regions (Figure 2). The most common gene to flank the region between the IR and SSC was ycf1 in seven of the ten species analyzed. Commonly, ycf1 and ycf1 pseudogene were present in the IR/SSC boundaries in Malpighiales species, such as Byrsonima coccolobifolia (Menezes ), Chrysobalanus icaco L. (Bardon ), Erythroxylum novogranatense (D. Morris) Hieron. (unpublished), Garcinia mangostana L. (Jo ), Manihot esculenta Crantz (Daniell ), Populus tremula L. (Kersten ), and Viola seoulensis Nakai (Cheon ).
Figure 2

Comparison of the junctions involving Inverted Repeat (IRa and IRb) regions with Large Single Copy (LSC) and Small Single Copy (SSC) regions among 10 chloroplast genomes of Malpighiales. The IR regions are represented in yellow whereas LSC and SSC in blue and orange, respectively. The white boxes represent the genes present in each region. The arrows represent the distance (in base pairs) of genes from the junction site between regions.

Three of the species which were subjected to analysis did not have ycf1 as a flanking gene of the IR/SSC region, Caryocar brasiliense, Passiflora edulis (Cauz-Santos ), and Linum usitatissimum L. (Souza et al., 2017). Caryocar brasiliense had a boundary between the IR/SSC region, flanked by ndhH and ndhH pseudogene, while in P. edulis, rps15 and intergenic region ycf1-ndhF were found. For L. usitatissimum, ndhA and the intergenic region ndhA-ndhF were observed. The IR/SSC boundary region was found to have a high collinearity between different genomes. The order of the conserved genes was as follows: ndhA, ndhH, rps15, and ycf1 for all species under analysis. Meanwhile, the bounder of IR/SSC for the major species included ycf1, however, in some Malpighiales species, this site was displaced with the donation of genomic segments from the SSC region to the IR regions. In addition to colinear genetic evidence, this pattern of expansion of the IR regions with the contraction of the SSC region was also supported by the region lengths, which indicated that species with no ycf1 flanking the IR/SSC boundery had a smaller SSC region (Table S2). A total of 85 perfect microsatellites (SSRs) were identified in the Caryocar brasiliense chloroplast genome sequence (Figure S3). With respect to the repeat motif, we found 52 mononucleotides, 11 dinucleotides, five trinucleotides, 12 tetranucleotides, three pentanucleotides, and two hexanucleotides. The number of continuous repeats (motif iterations) ranged from 3 to 16 (Table S3). The chloroplast region with the highest number of SSRs were LSC (56.47%), followed by IR (29.41%) and SSC (14.12%). SSRs are important regions because they serve as molecular markers in studies on genetic diversity, phylogeny, and phylogeography for Brazilian Savanna species (Rabelo ; Soares ; Telles ). The SSR regions identified can be used in molecular marker development testing for genetic diversity studies of C. brasiliense and related species (Table S4). We also identified repeat sequences in the C. brasiliense chloroplast genome and another 10 species from the Malpighiales order (chloroplast genome subset described in Material and Methods) (Table S5). We observed a total of 49 repeats in C. brasiliense. With regards to the type of repeat, we found 18 forward, 30 palindromic, and one reverse repeat. No repeats of the complement type were observed in C. brasiliense. Complement repeats only occurred in three of the 10 species in analysis: one in Chrysobalanus icaco, two in Erythroxylum novogranatense, and one in Viola seoulensis. In another three species, Linum usitatissimum, Manihot esculenta, and Passiflora edulis, we only found forward and palindromic repeats. Repeat analysis revealed a highly conserved total number of repeats within the Malpighiales order, although the type of repeats varied across species. We performed a phylogenetic analysis by sampling 52 representatives from all the families in the Malpighiales order with sequenced chloroplast genomes using protein-coding gene sequences (Table S6). Additionally, we also retrieved chloroplast gene sequences from Anthodiscus peruanus Baill. (Caryocaraceae) and Putranjiva roxburghii Wall. (Putranjivaceae). This analysis resulted in a phylogenetic tree with high supported values for the nodes, with a Bayesian posterior probability given to each one (Figure 3). As expected, all the species analyzed (including C. brasiliense) fell within the clades that represent their respective botanical families, validating the chloroplast sequences obtained in this study.
Figure 3

Phylogenetic tree reconstruction based on 52 taxa using Bayesian inference based on 76 protein-coding chloroplast genes. Numbers represent the Bayesian posterior probability given to each node. The bars on the right represents the botanic families of species.

The currently accepted phylogeny for the order Malpighiales displays a polytomy involving Caryocaraceae, Putranjivoid, Malpighioid, and Chrysobalanoid species (Angiosperm Phylogeny Group ; Xi ). Here, for the first time, a highly supported phylogenetic tree identified Caryocaraceae as a sister clade in relation to the other families within Malpighiales (Figure 3). Moreover, this result provides evidence that reinforces the non-clustering among Caryocaraceae, Linaceae, and Erythroxylaceae within the same clade, as discussed in a previous study (Soltis ). The current phylogeny of this group was investigated using several genes. The novel genomic resources produced in this study will help to improve our understanding of the phylogenetic relationships among species in the order Malpighiales (Angiosperm Phylogeny Group ; Angiosperm Phylogeny Group ; Xi ; Angiosperm Phylogeny Group ). These resources will help to resolve the problem of uncertain clades. This study demonstrates how the use of high-throughput sequencing technologies can increase the accuracy of phylogenetic analysis. Moreover, the data provided here serve as a novel genetic and genomic resource for Malpighiales and offer the first complete genome sequence and chloroplast content in the Caryocaraceae family.
  18 in total

1.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

2.  Evolutionary dynamics of the plastid inverted repeat: the effects of expansion, contraction, and loss on substitution rates.

Authors:  Andan Zhu; Wenhu Guo; Sakshi Gupta; Weishu Fan; Jeffrey P Mower
Journal:  New Phytol       Date:  2015-11-17       Impact factor: 10.151

3.  The first complete chloroplast genome sequence from Violaceae (Viola seoulensis).

Authors:  Kyeong-Sik Cheon; Jong-Cheol Yang; Kyung-Ah Kim; Su-Kil Jang; Ki-Oug Yoo
Journal:  Mitochondrial DNA A DNA Mapp Seq Anal       Date:  2015-12-28       Impact factor: 1.514

4.  IMEx: Imperfect Microsatellite Extractor.

Authors:  Suresh B Mudunuri; Hampapathalu A Nagarajaram
Journal:  Bioinformatics       Date:  2007-03-22       Impact factor: 6.937

5.  Phylogenomics and a posteriori data partitioning resolve the Cretaceous angiosperm radiation Malpighiales.

Authors:  Zhenxiang Xi; Brad R Ruhfel; Hanno Schaefer; André M Amorim; M Sugumaran; Kenneth J Wurdack; Peter K Endress; Merran L Matthews; Peter F Stevens; Sarah Mathews; Charles C Davis
Journal:  Proc Natl Acad Sci U S A       Date:  2012-10-08       Impact factor: 11.205

6.  Unraveling the biogeographical history of Chrysobalanaceae from plastid genomes.

Authors:  Léa Bardon; Cynthia Sothers; Ghillean T Prance; Pierre-Jean G Malé; Zhenxiang Xi; Charles C Davis; Jerome Murienne; Roosevelt García-Villacorta; Eric Coissac; Sébastien Lavergne; Jérôme Chave
Journal:  Am J Bot       Date:  2016-06-21       Impact factor: 3.844

7.  The complete nucleotide sequence of the cassava (Manihot esculenta) chloroplast genome and the evolution of atpF in Malpighiales: RNA editing and multiple losses of a group II intron.

Authors:  Henry Daniell; Kenneth J Wurdack; Anderson Kanagaraj; Seung-Bum Lee; Christopher Saski; Robert K Jansen
Journal:  Theor Appl Genet       Date:  2008-01-24       Impact factor: 5.699

8.  GeSeq - versatile and accurate annotation of organelle genomes.

Authors:  Michael Tillich; Pascal Lehwark; Tommaso Pellizzer; Elena S Ulbricht-Jones; Axel Fischer; Ralph Bock; Stephan Greiner
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

9.  Genome Sequences of Populus tremula Chloroplast and Mitochondrion: Implications for Holistic Poplar Breeding.

Authors:  Birgit Kersten; Patricia Faivre Rampant; Malte Mader; Marie-Christine Le Paslier; Rémi Bounon; Aurélie Berard; Cristina Vettori; Hilke Schroeder; Jean-Charles Leplé; Matthias Fladung
Journal:  PLoS One       Date:  2016-01-22       Impact factor: 3.240

10.  Chloroplast genomes of Byrsonima species (Malpighiaceae): comparative analysis and screening of high divergence sequences.

Authors:  Alison P A Menezes; Luciana C Resende-Moreira; Renata S O Buzatti; Alison G Nazareno; Monica Carlsen; Francisco P Lobo; Evanguedes Kalapothakis; Maria Bernadete Lovato
Journal:  Sci Rep       Date:  2018-02-02       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.