Literature DB >> 28085925

The First Chloroplast Genome Sequence of Boswellia sacra, a Resin-Producing Plant in Oman.

Abdul Latif Khan1, Ahmed Al-Harrasi1, Sajjad Asaf2, Chang Eon Park2, Gun-Seok Park2, Abdur Rahim Khan2, In-Jung Lee2, Ahmed Al-Rawahi1, Jae-Ho Shin2.   

Abstract

Boswellia sacra (Burseraceae), a keystone endemic species, is famous for the production of fragrant oleo-gum resin. However, the genetic make-up especially the genomic information about chloroplast is still unknown. Here, we described for the first time the chloroplast (cp) genome of B. sacra. The complete cp sequence revealed a circular genome of 160,543 bp size with 37.61% GC content. The cp genome is a typical quadripartite chloroplast structure with inverted repeats (IRs 26,763 bp) separated by small single copy (SSC; 18,962 bp) and large single copy (LSC; 88,055 bp) regions. De novo assembly and annotation showed the presence of 114 unique genes with 83 protein-coding regions. The phylogenetic analysis revealed that the B. sacra cp genome is closely related to the cp genome of Azadirachta indica and Citrus sinensis, while most of the syntenic differences were found in the non-coding regions. The pairwise distance among 76 shared genes of B. sacra and A. indica was highest for atpA, rpl2, rps12 and ycf1. The cp genome of B. sacra reveals a novel genome, which could be used for further studied to understand its diversity, taxonomy and phylogeny.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28085925      PMCID: PMC5235384          DOI: 10.1371/journal.pone.0169794

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

A major distinguishing organelle of plant cells is the chloroplast, which was suggested to have originated from cyanobacteria through endosymbiosis [1-3]. The main role and function of the chloroplast is to perform photosynthesis and generate the building blocks required for plant growth and development [4,5]. Chloroplasts also participate in the biosynthesis of starch, fatty acids, pigments and amino acids [6]. Until recently, approximately 490 complete chloroplast (cp) genomes have been sequenced, and this information is publicly available (http://www.ncbi.nlm.nih.gov/ genome). Most of these assembled genomes are associated with economically important crop plants [7]. Chloroplast genome analysis and engineering, either alone or in combination with traditional breeding techniques, might provide information for the future development of novel plant sources to counteract various environmental stress tolerance issues and improve the level of human-derived benefits from the target plant [8]. The techniques might also provide information to improve the current understanding of major biosynthesis pathways and functions. The chloroplast (cp) genome contains important information in plant systematics and is maternally inherited in most angiosperms [9]. Thus, understanding these highly conserved structures could also reveal vital features for designing genetic markers. Chloroplasts contain circular DNA with nearly 130 genes ranging in size from 72–217 kb [10-11]. Chloroplast DNA (cpDNA) is remarkably conserved in gene content and structure, envisaging important information for genome-wide studies of plant evolution [12]. Recent studies, using advanced genome sequencing techniques, have improved the prospects for resolving phylogenetic homogeneity at taxonomic levels and have enhanced the current understanding of the structural and functional evolution of economically important plants and their traits [12-15]. In the last decade, numerous studies have reported the results of chloroplast genome sequencing for economically and ecologically important tree species, such as Eucommia ulmoides [16], Poplus cathayana [17], Quercus spinosa [18], Acacia ligulata [19], Pinus armandii [20], Cocos nucifera [21], Citrus aurantiifolia [22], Musa acuminata [23], Norway spruce [24], etc. The elucidation of the chloroplast genomes of important tree species has facilitated to understand the evaluation of gene structure and has targeted conservation and propagation strategies [12-13]. To understand the genome structure of chloroplasts, we have investigated the economically and culturally important frankincense-producing tree (Boswellia sacra; Burseraceae) [25]. There are approximately twenty Boswellia species, and B. sacra is an endemic species growing only in Dhofar region of Oman [25-26]. The trees grow in desert-woodlands (Fig 1) with meager amounts of water and nutrient availability [27], whereas the domesticated trees are supplied with water [28]. In response to the incisions by the local people, the tree activates its defense mechanism by producing resin [28]. The crystalline resin is used to create a fragrant smoke in homes and also sold in the market for income [26-27]. In addition to this, the tree has received much attention because of its medicinal uses in the Arabian region [29]. The essential oil and the boswellic acid or its derivatives possesses potent anticancer activities [30-32]. There is very scarce scientific knowledge exists regarding the genetic make-up and physiological functions of this endemic tree. Previously, Coppi et al. [31] has worked on the genetic diversity of B. sacra using ITS and ISSR analysis, where detailed genomic and conservation studies were suggested. Therefore, it is high time to further understand this keystone species. Hence, in the current study it was aimed to elucidate the complete chloroplast genome of B. sacra and determine the phylogenetic relationship of this tree with related species.
Fig 1

Boswellia sacra–habitat and leaf morphology.

This tree grows wildly in the Dhofar region of Oman.

Boswellia sacra–habitat and leaf morphology.

This tree grows wildly in the Dhofar region of Oman.

Materials and Methods

Ethics statement

The study site was managed by the Museum of the Frankincense Land (Salalah). The samples were collected from the Natural Park of Frankincense Tree, which is part of the Land of Frankincense sites, inscribed in the UNESCO’s World Cultural and Natural Heritage List. A permission was obtained from the Museum of the Frankincense Land to use the leaf part of the tree resources for research purposes. The trees used for sampling were treated ethically, and our study did not harm the local environment.

Site description and plant growth conditions

Wadi Dawkah, Dhofar-Oman (17°25ʹ21ʹʹN; 54°00ʹ32ʹʹE) is a completely arid desert region with small sandstone hills. The annual mean temperature of the sampling area is ~35°C, and the annual rainfall is approximately 40 mm [22]. The temperature in summer reaches to ~48°C. Leaf samples were collected from healthy trees with the least number of incisions/wounds, immediately frozen in liquid nitrogen during the field trip to Wadi Dawkah and subsequently stored at -80°C until further analysis. The plant samples were collected by the first author in July 2015, which were identified by Plant Taxonomist Mohammed Al-Broumi, University of Nizwa. The specimens were deposited at University of Nizwa Herbarium Center with a voucher number (UC29).

Chloroplast DNA extraction, sequencing and assembly

The fresh leaves from Boswellia sacra were washed with sterile distilled water to remove sand particles and soil debris. The chloroplast DNA was extracted by using a modified protocol of Shi et al. [33]. To prepare the sequencing library, 300 ng of extracted DNA was sheared to 400-bp by using the BioRuptor UCD-200 TS Sonication System (Diagenode Co., Belgium and Denville, NJ, USA). The fragmented DNA was used for subsequent library preparation using the Ion Xpress Fragment Library Kit (Life Technologies, Carlsbad, CA, USA). The DNA library was diluted to 4 pM, and emulsion PCR was performed using the Ion OneTouch System 2 (Life Technologies, Gaithersburg, MD, USA). The final sequencing library was loaded onto an Ion 316 v2 chip and sequenced on the Ion Torrent PGM for 850 flows using the Ion PGM Sequencing 400 Kit (Life Technologies, Gaithersburg, MD). The resultant sequence was de novo assembled and mapped with the reference genome [34]. The gaps identified were filled through another round of cp DNA extraction and sequencing. Following aforementioned procedures of cp DNA extraction, a second library was also prepared and sequenced separately to increase the raw data. The filtered sequence reads were assembled and aligned to the known cp genome sequences of Sapindales. At first, the filtered reads were de novo assembled into contigs using Mimicking Intelligent Read Assembly (MIRA 4.0) [35]. Second, the assembled contigs were mapped against the reference Azadirachta indica (accession number: NC 023792) through Map Reads to Reference Tool in CLC Genomics Workbench (version 8.5). The order of contigs was resolute according to the reference genomes. The gaps (7 to 150 bp) between the de novo contigs were replaced with consensus sequences of raw reads, which were later, re-mapped to the reference genomes. Any additional gaps were either filled manually by mapping tools in CLC Genomics Workbench (version 8.5) or using GapCloser (http://soap.genomics.org.cn/index.html).

Chloroplast gene annotation and sequence analyses

Genome annotation was performed by using Dual Organellar GenoMe Annotator (DOGMA) [36], tRNAscan-SE v.1.21 [37] and Basic Local Alignment Search Tool (BLAST) [38] from the National Center for Biotechnology Information (NCBI). The using MAUVE v2.4.0 software [39] was used to compare the genes and sequences of the de novo assembled B. sacra cp genome sequence and Azadirachta indica cp genome sequence, as a reference. In addition, full alignments with annotations were visualized using the VISTA viewer [40].

Repeat sequence analysis

MISA (http://pgrc.ipk-gatersleben.de/misa/) was used to analyze the types and positions of various simple sequence repeats (SSRs) in B. sacra. The results of this analysis were also correlated with the reference cp genome of A. indica according to the methods of Su et al. [22], where a minimum number of repeats were set at 10, 5, 4, 3, 3, and 3 for di-, tri-, tetra-, penta-, and hexanucleotides. For long repeats, the REPuter program [41] was used to identify the number and location of direct and inverted repeats. Minimums repeat size of 30 bp and sequence identity greater than 90% were used. Tandem repeats in the B. sacra cp genome were identified using Tandem Repeats Finder version 4.07 b [42] with the default settings. Complete cp genomes as well as a separate partition using only 76 shared genes were employed to analyze the average pairwise sequence divergence for B. sacra and Azadirachta indica. Missing and ambiguous gene annotations were confirmed by comparative sequence analysis after a multiple sequence alignment and gene order comparison [43]. These regions were aligned using MAFFT (version 7.222) [44] with the default parameters. Kimura’s two-parameter (K2P) model was selected to calculate pairwise sequence divergences [45]. The indel (insertion and deletion) polymorphism of shared genes were calculated using DnaSP 5.10.01 [46]. A custom Python script (https://www.biostars.org/p/119214/) based on single-nucleotide polymorphism (SNP) definition (a variation in a single nucleotide that occurs at a specific position in the genome) was employed to call SNPs. In addition, the SNPs in coding regions were classified in two ways: synonymous and nonsynonymous as shown by Guo et al. [47]. The number of synonymous (S) and nonsynonymous (N) substitutions for the B. sacra and Azadirachta indica was performed on MEGA 7.0 [48].

Phylogenetic analysis

Phylogenetic analysis of the Boswellia sacra cp genome was performed using Create Alignment and Create Tree tools in CLC genomics workbench version 8.5 (Gap open cost = 10.0, Gap extension cost = 1.0, Tree construction method = Neighbor Joining, Nucleotide distance measure = Jukes-Cantor) [49]. A total of 32 Malvidae species were selected as in-groups, whereas Vitis venifera was used as an outgroup. The accession numbers are provided in S1 Table. Complete cp genomes were used, and the bootstrap supports were estimated from 100 re-sampled alignments. Four methods were employed to construct phylogenetic trees from 32 cp genomes, including Bayesian inference (BI) implemented with MrBayes 3.12 [50], maximum parsimony (MP) with PAUP 4.0 [51], and maximum likelihood (ML), maximum parsimony (MP), posterior probabilities (PP) and neighbor-joining (NJ) with MEGA 7 [48].

Results and Discussion

General features of the Boswellia sacra chloroplast genome

Genome sequencing and assembly of Ion Torrent PGM sequencing produced 286.8 Mb of data for chloroplast genome of Boswellia sacra. A total of 373,248 reads (or 112,936,195 bp) were de novo assembled showing at total of 703x cp genome coverage. The chloroplast genome sequence quality was re-confirmed through repeating library preparation and sequencing of the extracted cpDNA from B. sacra tree. This result was sufficient to produce a single consensus sequence representing the complete chloroplast genome of Boswellia sacra. The de novo assembled genome was referenced with already reported cp genomes of Sapindales such as Citrus sinensis and Azadirachta indica. The gaps between the de novo contigs were replaced with consensus sequences of the raw reads and re-mapped to the reference genome. The sequences of the chloroplast genomes were deposited in GenBank (accession numbers: KT934315). The complete chloroplast genome was 160,543 bp in size, comprising a large single copy (LSC) region of 88,055 bp and a small single copy (SSC) region of 18,962 bp. In addition, a pair of inverted repeats (IRa and IRb) of 26,763 bp each (Fig 2) was also revealed. The annotation analysis revealed 114 different genes, with 83 protein-coding genes. The number of tRNA and rRNA genes was 27 and 4, respectively (Table 1). Among these, 24 genes were duplicated in the IR regions. The cp genome size of B. sacra was almost similar to Azadirachta indica (160,737 bp) and Citrus sinensis (160,129 bp). However, the size was almost 380 bp higher than closer relative in Order Sapindales i.e. Citrus aurantifolia (159,893 bp). In case of other arid-land trees, B. sacra size was higher than Acacia ligulata (158,724 bp) as shown in the previous studies of cp genomes of these plants [19, 22, 52]. This structure is consistent with the results of previous restriction mapping study [53], although the total lengths slightly differed.
Fig 2

Gene map of the Boswellia sacra chloroplast genome.

A pair of thick lines in the inside circle represents the inverted repeats (IRa and IRb; 26,763 bp each), separating the large single copy region (LSC; 88,055 bp) from the small single copy region (SSC; 18962 bp). The genes drawn inside the circle are transcribed clockwise, while those drawn outside the circle are transcribed counterclockwise.

Table 1

Summary of the chloroplast genome characteristics in Boswellia sacra and Azadirachta indica.

AttributeBoswellia sacra (KT934315)Azadirachta indica (NC023792)
Size (bp)160,543160,737
Overall GC content (%)37.637.5
LSC size in bp (% total)88,055 (54.8%)88,136 (54.8%)
SSC size in bp (% total)18,962 (11.8%)18,635 (11.6%)
IR size in bp (% total)a26,763 (16.7%)26,983 (16.8%)
Protein-coding regions size in bp (% total)80,289 (50.0%)79,685 (49.6%)
rRNA and tRNA size in bp (% total)11,978 (7.5%)11,863 (7.38%)
Introns size in bp (% total)17,621 (11.0%)18,993 (11.8%)
Intergenic spacer size in bp (% total)57,025 (35.5%)52,699 (32.8%)
Number of different genes114112b
Number of different protein-coding genes8378
Number of different rRNA genes44
Number of different tRNA genes2730
Number of different genes duplicated by IR2419
Number of different genes with introns1617

aEach cp genome contains two copies of inverted repeats (IRs).

bAccording to the original annotation, infA, ycf15, ycf68, orf42, orf56 not including.

Gene map of the Boswellia sacra chloroplast genome.

A pair of thick lines in the inside circle represents the inverted repeats (IRa and IRb; 26,763 bp each), separating the large single copy region (LSC; 88,055 bp) from the small single copy region (SSC; 18962 bp). The genes drawn inside the circle are transcribed clockwise, while those drawn outside the circle are transcribed counterclockwise. aEach cp genome contains two copies of inverted repeats (IRs). bAccording to the original annotation, infA, ycf15, ycf68, orf42, orf56 not including. Most of the essential photosynthesis-related genes were identified in the cp genome (Table 2; S2 Table). Twenty genes related to photosystem I and II, seventeen genes related to ATP synthase NADH dehydrogenase, six genes associated with the cytochrome b/f complex, a large subunit rbcL, two ORF, and other important genes related to ATP-dependent protease/ATP binding subunit (clpP), translational initiation factor (infA), Maturase K (matK), envelope membrane protein (cemA), subunit of acetyl-Co-Acarboxylase (accD), and c-type cytochrome synthesis gene (ccsA) were identified during annotation. The existence of chloroplast-encoded clpP in B. sacra suggested that this gene is indispensable for cell survival. Though this needs further verifications, but the presence of clpP indicates a better system to combat severe heat and drought stress in B. sacra, whereas Cicer arietinum, Lathyrus sativus, and Pisum sativum have lost an intron in both clpP1 and rps12 [54]. The clpP gene has also been identified in Acacia ligulata [19], Trifolium subterraneum [55], Camellia sp. [56], Actinidiaceae [57] and Wollemia nobilis [58].
Table 2

List of Genes found in cp genome of Boswellia sacra.

Group of geneName of gene
Photosystem IpsaA,psaB,psaC,psaI,psaJ
Photosystem IIpsbA,psbB,psbC,psbD,psbE,psbF,psbH,psbI,psbJ,psbK,psbL,psbM,psbN,psbT, psbZ
Cytochrome b/f complexpetA,petB,petD,petG,petL,petN
ATP synthaseatpA,atpB,atpE,atpF,atpH,atpI
NADH dehydrogenasendhA,ndhB,ndhC,ndhD,ndhE,ndhF,ndhG,ndhH,ndhI,ndhJ,ndhK
RubisCOlarge subunit rbcL
RNA polymeraserpoA,rpoB,rpoC1,rpoC2
Ribosomal proteins (SSU)rps2,rps3,rps4,rps7,rps8,rps11,rps12,rps12_3end,rps14,rps15,rps16,rps18,rps19
Ribosomal proteins (LSU)rpl2,rpl14,rpl16,rpl20,rpl22,rpl23,rpl32,rpl33,rpl36
Other genesclpP,matK,accD,ccsA,infA,cemA
hypothetical chloroplast reading framesycf1,ycf2,ycf3,ycf4,ycf15,ycf68
ORFsorf42,orf56
Transfer RNAstrnA-UGC,trnC-GCA,trnD-GUC,trnE-UUC,trnF-GAA,trnG-UCC,trnH-GUG,trnI-CAU,trnI-GAU, trnK-UUU,trnL-CAA,trnL-UAG,trnM-CAU,trnN-GUU, trnP-GGG,trnP-UGG,trnQ-UUG, trnR-ACG,trnR-UCU,trnS-GCU,trnS-GGA,trnT-GGU,trnV-GAC,trnV-UAC,trnW-CCA,trnY-GUA
Ribosomal RNAsrrn4.5,rrn5,rrn16,rrn23

General comparisons of Boswellia sacra and Azadirachta indica

The general characteristics of the B. sacra and A. indica cp genomes are presented in Table 1. The results showed a high level of similarity in the overall composition of the two cp genomes. The B. sacra cp genome had approximately 37.61% GC content, which is higher than that of A. indica (37.5%). In the B. sacra cp genome, the intergenic regions, introns, and genic regions account for ca. 33.5%, 9.6%, and 57.5%, respectively, as shown in Tables 1 & 2. The pairwise alignment of these two cp genomes revealed approximately 9.52% sequence divergence (Table 3; S3 Table; S4 Table; S5 Table) comprising 8,929 substitutions (5.56%) and 3,971 Indels (3.96%). The sizes of the LSC and SSC regions were somehow different between these two cp genomes, primarily reflecting the large index in each region. The 10 most divergent intergenic regions included the spacer between trnH—GUG-psbA (457 bp, 40.54% divergence), accD—psaI (746 bp, 37.27% divergence), atpHatpI (1,167 bp, 33.94% divergence), psbZ—trnG-UCC (546 bp, 28.14% divergence), trnK—UUU-rps16 (1,034 bp, 26.40% divergence), petN—psbM (1,139 bp, 36.21% divergence), ycf4—cemA (942 bp, 26.69% divergence), ccsA—ndhD (307 bp, 25.24% divergence), trnR-UCU—atpA (198 bp, 28.97% divergence) and trnL-UAG—ccsA (111 bp, 33.05% divergence). These findings suggest that a deletion might have occurred in this region in A. indica after the divergence of A. indica and B. sacra.
Table 3

Differences between the B. sacra and A. indica cp genomes.

Indel
Length (bp)Count
12,743
2–101,216
11–2112
Sum6,3533,971Percentagea: 3.96%
Substitution
TypeCount
A<->T1,571
C<->G837
A<->C1,416
T<->C1,828
A<->G1,777
T<->G1,500
Sum8,929Percentagea: 5.56%
10 most divergent intergenic regions
Regionlengthb (bp)Pairwise distance
trnH—GUG-psbA4570.47
accD—psaI7460.32
atpH—atpI1,1670.31
psbZ—trnG-UCC5460.30
trnK—UUU-rps161,0340.25
petN—psbM1,1390.25
ycf4—cemA9420.25
ccsA—ndhD3070.24
trnR-UCU—atpA1980.23
trnL-UAG—ccsA1110.21

aRelative to the length of B. sacra.

bLength in B. sacra.

aRelative to the length of B. sacra. bLength in B. sacra. In case of indels, the cp genome of B. sacra was highly conserved, however, in total 25 loci were identified with most variable regions. ycf1 was highly variable region showing a 4354 indel sites, followed by atpA with 124 indel sites. The average indel length for ycf1 was 174 bp, followed by rpl14 with deletion of 25.5 bp. On the other hand, the indel diversity per site was highest for rpl2 (S6 Table). We also compared the cp genomes of B. sacra and A. indica and calculated the average pairwise sequence divergence among 76 shared genes (S7 Table). Of these, the B. sacra genome has 0.066 average sequence divergence. Furthermore, among the genes identified to show highest level of divergence were atpA, rpl2, rps12, ycf1, infA, rpl22, rpl32, rpl20, ndhF and rps8 (S3 Fig). The highest average sequence distance was found for atpA (0.852), followed by rps11 (0.301). The trnH/psbA intergenic region has been studied as a barcode of differentiation at the species level in the plastids of inland plants, such as Araucaria [57-59]. This region, highly variable in length and sequence, is an on-coding region flanked by two conserved coding regions, i.e., psbA, which encodes photosystem II protein D1, and trnH-GUG, which functions in histidine translation [12-13]. The results revealed a 457 bp sequence trnH/psbA intergenic region in B. sacra. Yap et al. [57] suggested that this sequence was not present in Agathis dammara compared with Wollemia nobili. Previous detailed BLAST analyses have shown that this Indel is present in nineteen Araucaria species, including W. nobili [57]. In addition, psbA protects photosystem II from oxidative stress developed through drought. A previous report of Huo et al. [60] demonstrated that the overexpression of the psbA gene improves drought tolerance in tobacco plant. The occurrence of psbA in B. sacra suggests a strong ability to counteract oxidative stress, as this tree frequently encounters low-water conditions, however, this still needs further in-depth investigations.

Synteny between B. sacra and A. indica

A syntenic structure was also visualized in the MAUVE alignment for both cp genomes as shown in S1 Fig. The MAUVE alignment showed that there are two locally collinear blocks (LCBs) in the chloroplast genomes of B. sacra and A. indica. These regions showed homology in genome organization between the two species, although the sequences are inverted relative to each other. The LCBs of protein coding, tRNA, and rRNA genes showed a similar synteny for both cp genomes. Similarly, the two cp genomes were run global alignment through VISTA to understand the comparative genomic organization [40]. The results of the alignment showed that both are having high sequence similarity. However, the marked differences were observed between the non-coding regions (petN and psbM; accD and psal; ycf4 and cemA; rpl32 and trnL; atpH and atpl). The most divergent in coding regions were trnl, matK, clpP, rpl33, accD, rps3, ndh (A, H, D and G), ycf3 and ycf1 (Fig 3). Leliaert and Lopez-Bautista [61] revealed a similar homogenous matrix in cp genome sequencing between Bryopsis plumosa and Tydemania expeditions. However, dissimilarity in the synteny of some fragments of both cp genomes was observed, and this high dissimilarity is not unexpected, as high erraticism in the architecture of cpDNA has been observed among various species of congeneric taxa, such as picoplanktonic species, and Trebouxiophyceae and Chlorophyceae [62-63].
Fig 3

Visualization alignment of chloroplast genome sequences of B. sacra and A. indica.

VISTA based similarity graphical information portraying sequence identity of B. sacra with reference A. indica cp genomes. Thick black lines show the inverted repeats (IRs) in the chloroplast genomes. Genome regions are color-coded as protein coding, rRNA coding, tRNA coding or conserved noncoding sequences (CNS) whereas arrows show the gene presence.

Visualization alignment of chloroplast genome sequences of B. sacra and A. indica.

VISTA based similarity graphical information portraying sequence identity of B. sacra with reference A. indica cp genomes. Thick black lines show the inverted repeats (IRs) in the chloroplast genomes. Genome regions are color-coded as protein coding, rRNA coding, tRNA coding or conserved noncoding sequences (CNS) whereas arrows show the gene presence.

Analysis of repetitive sequences

Repetitive sequencing in cp genomes plays an important function and enhances the current understanding of the evolutionary aspects of plant species [62-63]. Mononucleotide microsatellite length polymorphisms have been used as markers to understand the evolutionary history of cp genomes, reflecting the high variability rates of these anomalies [56]. In the present study, a total of 109 simple sequence repeat (SSR) loci were observed in the B. sacra chloroplast genome, sharing 1,256 bp (ca. 0.78%) of the total sequence (Table 4). Most of the SSRs were identified in intergenic regions, and some SSRs were identified in coding areas, such as rpoC2 and ycf1, which could also be used as molecular markers for population diversity studies [63]. A total 39 large repeats of more than 30 bp in length were identified in the B. sacra cp genome (Table 5).
Table 4

List of simple sequence repeats.

Repeat unitLength (bp)Number of SSRsStart positiona
AT10521733 (rpoC2); 50519; 63260; 120682; 122312 (ndhD)
12272021; 118991
TA10150086
12149380
TC10165118 (cemA)
AAG12198114
ATA12352097; 57925 (atpB); 70908
CTT121150474
TAT12150027
TTA12254843; 131240 (ycf1)
AAAT1216383
AATT121117168
ATAG12134089
CATT121131003 (ycf1)
CTTT12151864
TATC12138353
TTTC121125090
ATATGA181118904

aThe SSR containing coding regions are indicated in parentheses. SSRs that are identical in the A. indica chloroplast genome are highlighted in bold.

Table 5

List of long repeat sequence.

Repeat SizeTypeaStart position of 1st repeatStart position the repeat found in other regionLocationbRegion
30P957848299trnS-GCU, trnS-GGALSC
30D3096531304IGS (petNpsbM)LSC
30D, P9575595773, 152797, 152815ycf2IR
31D5004250183IGS (trnT-UGUtrnL-UAA)LSC
32D1547415696, 16029IGS (atpH—atpI)LSC
34P58635863intron (rps16)LSC
36D, P103143125725, 145421IGS (rps12—trnV-GAC), intron (ndhA), IGS (trnV-GAC—rps12)IR, SSC
40D6442864797IGS (ycf4—cemA)LSC
42D4999150135IGS (trnT-UGU—trnL-UAA)LSC
42D6438664759IGS (ycf4—cemA)LSC
52P444444IGS (trnH-GUG—psbA)LSC
56P3181531815IGS (petN—psbM)LSC
65D1550716061*IGS (atpH—atpI)LSC
69D3110731522IGS (petN—psbM)LSC
76D4157543799psaB, psaALSC
109D6427464648IGS (ycf4—cemA)LSC
155D1541715639IGS (atpH—atpI)LSC
204D1572916061*IGS (atpH—atpI)LSC
251D6244262769IGS (accD—psaI), psaILSC

aD: direct repeat; P: palindrome inverted repeat.

bIGS: intergenic spacer region. Sequences conserved in the A. indica chloroplast genome are highlighted in bold.

*The 65 bp sequence is truncated copy of 204 bp sequence.

aThe SSR containing coding regions are indicated in parentheses. SSRs that are identical in the A. indica chloroplast genome are highlighted in bold. aD: direct repeat; P: palindrome inverted repeat. bIGS: intergenic spacer region. Sequences conserved in the A. indica chloroplast genome are highlighted in bold. *The 65 bp sequence is truncated copy of 204 bp sequence. In case of tandem repeat, some of the conifers have been noted for the presence of large number of such repeats [64]. These repeats have been associated with gene duplication and chloroplast rearrangement [65]. The results of tandem repeat analysis showed a total 30 repeats (Table 6). The highest size of repeats was 76 bp in length accD in the coding region whereas the lowest was 12 bp trnK/rps16 in the intergenic spacer region, with average copy number 2.4. The percent of indels were ranged from 0 to 18 (Table 6). A majority of these repeats were located in intergenic spacers, while five repeat sequences were identified in the coding regions of trnS-GCU, trnS-GGA, ycf2, psaB and psaA. Eight long repeats were identified in A. indica, suggesting that these repeats might be well distributed among Sapindales. In a recent study, Addisalem et al. [65] developed the first set of 46 SSR markers and estimated the genome size for B. papyrifera as 705 Mb, representing the first genome sequence analysis for any species of this genus. However, the current knowledge of the genomics and transcriptomics of this keystone species remains limited.
Table 6

Distribution of tandem repeats in the B. sacra chloroplast genome.

IndicesRepeat LengthCopy NumberPercent IndelsPercent MatchesLocation
347—387162.61188trnH/psbA (IGS)
4566—4594122.40100trnK/rps16 (IGS)
4819—4874192.9578trnK/rps16 (IGS)
7784—7814161.90100rps16/trnQ(IGS)
8690—8726182.1094trnQ/psbK(IGS)
9763—9828272.4584trnS/trnR (IGS)
9783—9831143.6583trnS/trnR (IGS)
9779—9867273.51385trnS/trnR (IGS)
9863—9892142.10100trnS/trnR (IGS)
11501—11572312.51881trnR/atpA(IGS)
11504—11573252.71776trnR/atpA(IGS)
11535—11583222.41280trnR/atpA(IGS)
16355—16385142.2094atpH/atpI (IGS)
29582—29606122.10100rpoB/trnC(IGS)
31751—317761320100petN/psbM(IGS)
49997—50046182.91182trnT/trnL(IGS)
60254—60278122.10100rbcL/accD(IGS)
62281—62429762588accD(CDS)
67916—67954192.1490psbJ/psbL(IGS)
69384—69454126.11872psbE/petL(IGS)
72176—72214211.9594rps18 (CDS)
73287—733121320100rpl20/rps12(IGS)
95746—95802183.2097ycf2(CDS)
112196—112261322.1097rrn 4.5 /rrn5(IGS)
113767—113809202.20100trnN/ycf1(IGS)
119576—119605152093ccsA(CDS)
129059—129085132.10100rps15/ycf1(IGS)
134790—134832202.20100ycf1(CDS)
136338—136403322.1097rrn5/rrn4.5(IGS)
152797—152853183.2097ycf2(CDS)
In case of single-nucleotide polymorphism (SNP) (a variation in a single nucleotide that occurs at a specific position in the genome), maximum likelihood analysis of natural selection codon-by-codon, was performed to estimate the number of inferred synonymous (s) and nonsynonymous (n) substitutions which are presented along with the numbers of sites that are estimated to be synonymous (S) and non-synonymous (N) [66]. These estimates are produced using the joint Maximum Likelihood reconstructions of ancestral states under a Muse-Gaut Model [67] of nucleotide substitution. A total to 8,301 positions were noted in the chloroplast genomes of B. sacra, A. indica and Citrus sinensis. In case of codons, TCT, GTT, GGA and CGT showed high n, S, N and dN-dS (S8 Table). However, the P value was less than 0.05 showing a significant substitution among cp genomes. The normalized dN/dS ratio was calculated for the three cp genomes which showed a positive selection ranging from 0~10.17. The number of segregating sites were 14,249 in three cp genomes with a nucleotide diversity was 0.063. The nucleotide frequencies are 30.59% (A), 31.21% (T/U), 19.49% (C), and 18.71% (G) in the cp genomes. The transition/transversion rate ratios are k1 = 2.149 (purines) and k2 = 2.072 (pyrimidines). The overall transition/transversion bias is R = 0.996. These shows that B. sacra, A. indica and Citrus sinensis species with long generation times, and both exhibit little changes in the structural organization of the chloroplast genomes. This could further reveal how these species evolved at low rates. Similar perspectives were shown for the chloroplast genomes of five Quercus species [66]. The estimation of synonymous and non-synonymous substitution rates may play an important role in understanding the dynamics of molecular evolution, and non-synonymous substitutions could be subject to natural selection during the evolutionary process [66-68].

Evolution of orf56 and ycf68 within sapindales

We compared the orf56 and ycf68 genes of three tree species viz. B. sacra, A. indica and C. sinensis (Fig 4), revealing the presence of these genes in the three species. The orf56 gene is present in an intron in the trnA-UGC gene, which contains one homologous sequence, known as mitochondrial ACR-toxin sensitivity gene in Citrus trees [22, 69]. Interestingly, the full-length orf56 gene sequence was different but was intact in all three species. Goremykin et al. [70] previously reported that the ACR-toxin sensitivity gene was transferred between plastid and mitochondrial genomes [22].
Fig 4

Comparison of the border positions of the LSC, SSC, and IR regions in three Sapindales species.

The ycf68 gene, in the trnI-GAU intron of the IR region, showed high similarity between C. sinensis and B. sacra. However, an additional T-insertion near the C-terminus was abolished at the stop codon [22]. A. indica has a T-deletion near the C-terminus abolished at the stop codon in the same position. Comparatively, an intact ycf68 gene was detected in B. sacra (Fig 4). Raubeson et al. [71] suggested that ycf68 is not a protein-coding gene based on the lack of intron folding patterns. The high levels of sequence conservation among the ORF of identified homologs indicates true functionality [72]. However, the function of this putative gene still remains unclear and needs further investigation.

Phylogenetic analysis of B. sacra

To obtain insight into the position of B. sacra within the Malvidae, we generated datasets of 32 completely sequenced chloroplast genomes, using Vitis venefera as an outgroup (S1 Table). We conducted a phylogenetic analysis of the B. sacra cp genome with the cp genomes from 8 species of Myrtaceae (Corymbia eximia, Angophora costata, Eucalyptus umbra, E. deglupta, E. globulus, Stockwellia quadrifida, Syzygium cumini and Allosyncarpia ternata), 2 species from Melianthaceae (Francoa sonchifolia and Theorbroma cacao), 3 species from Malvaceae (Gossypium incanum, G. tomentosum, and G. hirsutum), 15 species from Brassicaceae (Barbarea verna, Nasturtium officinale, Lepidium virginicum, Pachycladon enysii, P. cheesemanii, Crucihimalaya wallichii, Olimarabidopsis pumila, Capsella bursa-pastoris, Arabidopsis thaliana, Lobularia maritima, Brassica napus, Arabis hirsuta, Draba nemorosa, Aethionema cordifolium and Aethionema grandiflorum), one species from Caricaceae (Carica papaya) and one species from Rutaceae (Citrus sinensis). The phylogenetic analysis included some commercially important Sapindales plants and the more relevant A. indica species from Maliaceae. The neighbor-joining phylogenetic tree resolved 33 nodes, with strong bootstrap support of 55–100 (Fig 5). These results strongly support the position of the B. sacra within Sapindales. The phylogenetic trees obtained in the present study also indicate a close relationship between B. sacra and A. indica with high bootstrap support (100%). Krishnan et al. [73] suggested that both A. indica and Citrus sinensis form a homogenous group (100%), validating the conventional taxonomic classification of A. indica. Thus, the results of the phylogenetic analysis of the cp genome of both species showed the formation of a distinct cladogram with B. sacra, revealing new insights to understand the gene structure of these three plants.
Fig 5

Neighbor-Joining phylogeny of the representative Malvidae lineages.

The common grape vine (Vitis vinifera) is included as the outgroup to root the tree. A total of 33 complete chloroplast genomes were aligned. In total, 33 nodes were resolved. The position of B. sacra is shown in bold.

Neighbor-Joining phylogeny of the representative Malvidae lineages.

The common grape vine (Vitis vinifera) is included as the outgroup to root the tree. A total of 33 complete chloroplast genomes were aligned. In total, 33 nodes were resolved. The position of B. sacra is shown in bold. The habitats of B. sacra and A. indica are different, however, the fact that both plants produce oleugum and belong to the same order suggests interesting prospects for broader evolutionary studies of Sapindales [74]. This phylogenetic relationship was further confirmed by constructing the maximum parsimony tree [48], suggesting a close homology with C. sinensis and A. indica (S2 Fig). The Bayesian Inference (BI) also highly supported as monophyletic (100/1.00) behavior C. sinensis and A. indica. Whereas a sister relationship of B. sacra was well supported (100/1.00). The posterior probabilities (PP) were also 1.0 for B. sacra, C. sinensis and A. indica (S4 Fig), suggesting a branching order as ((C. sinensis, A. indica), B. sacra). A similar phylogenomic reconstruction was also noticed by Wang et al. [16,75], where Actinidiaceae formed a monophyly relationship in Ericales.

Conclusions

Using a combination of de novo assembly and reference genome analysis, we provided the first completely sequenced chloroplast genome of endemic Boswellia sacra. The genome organization and gene structure is almost typical to that of angiosperms plants. The main difference in the gene composition between the B. sacra and A. indica cp genomes was the protein-coding gene in the trnH-GUG-psbA region. Notably, eight shared SSR regions were identified in the B. sacra and A. indica cp genome comparison. Elucidating these regions could provide additional phylogenetic insights at the taxonomic level, and these results could be used to develop molecular markers [72]. Previously, Coppi et al. [31] suggested that different stands of tree populations growing in different areas showed significant variations in genetic distances. The findings of the present study are essential to envisage fresh in-depth intuitions into various aspects of Boswellia chloroplast genome evolution. The current new genomic datasets will enable further exploration of the genetic diversity of Boswellia sacra and the other 18 species of this genus.

List of the complete chloroplast genome sequences included in the phylogenetic analysis.

(XLSX) Click here for additional data file.

Functions and products of the chloroplast genes identified in B. sacra.

(DOCX) Click here for additional data file.

Sequences of important gene fragments in B. sacra and comparison with A. indica.

(XLSX) Click here for additional data file.

Simple Sequence Repeat analysis of B. sacra.

(XLSX) Click here for additional data file.

Pairwise distances of the important regions identified in B. sacra and A. indica.

(XLSX) Click here for additional data file.

Comparative assessment of Indels B. sacra and A. indica.

(XLSX) Click here for additional data file.

Pairwise distance among the shared genes of B. sacra and A. indica.

(XLSX) Click here for additional data file.

Comparisons of synonymous (S) and nonsynonymous (N), transitional substitutions are shown in bold and those of transversionsal substitutions in chloroplast genome of B. sacra, A. indica and Citrus sinensis.

(XLS) Click here for additional data file.

MAUVE alignment of the Boswellia sacra chloroplast genome.

The B. sacra genome is shown on top as the reference genome. Within each of the alignments, local collinear blocks are represented as blocks of the same color connected with lines. Annotations are shown above and below the LCBs, protein-coding genes are indicated as white boxes, tRNA genes are shown in green, and rRNA genes are shown in red. The lowered position of a box indicates an inverted orientation. (TIFF) Click here for additional data file.

Maximum parsimony phylogeny of the representative cp genomes.

The Gossypium incanum was included as an outgroup to root the tree. A total of 21 complete chloroplast genomes were aligned. In total, 21 nodes were resolved. The position of B. sacra is shown in green circular symbol. (TIF) Click here for additional data file.

Pairwise distances of shared 76 genes B. sacra and A. indica.

(TIF) Click here for additional data file.

Posterior probabilities out put of the selected cp genomes in relation with B. sacra and A. indica.

(PDF) Click here for additional data file.
  57 in total

1.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

Review 2.  The chloroplast genome.

Authors:  M Sugiura
Journal:  Plant Mol Biol       Date:  1992-05       Impact factor: 4.076

3.  Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms.

Authors:  Michael J Moore; Charles D Bell; Pamela S Soltis; Douglas E Soltis
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

4.  A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome.

Authors:  S V Muse; B S Gaut
Journal:  Mol Biol Evol       Date:  1994-09       Impact factor: 16.240

5.  Ion Torrent next-generation sequencing reveals the complete mitochondrial genome of black and reddish morphs of the Coral Trout Plectropomus leopardus.

Authors:  Zhenzhen Xie; Cuiping Yu; Liang Guo; Mingming Li; Zhang Yong; Xiaochun Liu; Zining Meng; Haoran Lin
Journal:  Mitochondrial DNA A DNA Mapp Seq Anal       Date:  2014-04-16       Impact factor: 1.514

6.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

7.  The Complete Sequence of the Acacia ligulata Chloroplast Genome Reveals a Highly Divergent clpP1 Gene.

Authors:  Anna V Williams; Laura M Boykin; Katharine A Howell; Paul G Nevill; Ian Small
Journal:  PLoS One       Date:  2015-05-08       Impact factor: 3.240

8.  Phytochemical analysis of the essential oil from botanically certified oleogum resin of Boswellia sacra (Omani Luban).

Authors:  Ahmed Al-Harrasi; Salim Al-Saidi
Journal:  Molecules       Date:  2008-09-16       Impact factor: 4.411

9.  Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

Authors:  Linda A Raubeson; Rhiannon Peery; Timothy W Chumley; Chris Dziubek; H Matthew Fourcade; Jeffrey L Boore; Robert K Jansen
Journal:  BMC Genomics       Date:  2007-06-15       Impact factor: 3.969

10.  Six newly sequenced chloroplast genomes from prasinophyte green algae provide insights into the relationships among prasinophyte lineages and the diversity of streamlined genome architecture in picoplanktonic species.

Authors:  Claude Lemieux; Christian Otis; Monique Turmel
Journal:  BMC Genomics       Date:  2014-10-04       Impact factor: 3.969

View more
  10 in total

1.  First reported chloroplast genome sequence of Punica granatum (cultivar Helow) from Jabal Al-Akhdar, Oman: phylogenetic comparative assortment with Lagerstroemia.

Authors:  Abdul Latif Khan; Sajjad Asaf; In-Jung Lee; Ahmed Al-Harrasi; Ahmed Al-Rawahi
Journal:  Genetica       Date:  2018-08-29       Impact factor: 1.082

2.  Genome structure and evolutionary history of frankincense producing Boswellia sacra.

Authors:  Abdul Latif Khan; Ahmed Al-Harrasi; Jin-Peng Wang; Sajjad Asaf; Jean-Jack M Riethoven; Tariq Shehzad; Chia-Sin Liew; Xiao-Ming Song; Daniel P Schachtman; Chao Liu; Ji-Gao Yu; Zhi-Kang Zhang; Fan-Bo Meng; Jia-Qing Yuan; Chen-Dan Wei; He Guo; Xuewen Wang; Ahmed Al-Rawahi; In-Jung Lee; Jeffrey L Bennetzen; Xi-Yin Wang
Journal:  iScience       Date:  2022-06-10

3.  The Complete Chloroplast Genome Sequence of Tree of Heaven (Ailanthus altissima (Mill.) (Sapindales: Simaroubaceae), an Important Pantropical Tree.

Authors:  Josphat K Saina; Zhi-Zhong Li; Andrew W Gichira; Yi-Ying Liao
Journal:  Int J Mol Sci       Date:  2018-03-21       Impact factor: 5.923

4.  Chemical, molecular and structural studies of Boswellia species: β-Boswellic Aldehyde and 3-epi-11β-Dihydroxy BA as precursors in biosynthesis of boswellic acids.

Authors:  Ahmed Al-Harrasi; Najeeb Ur Rehman; Abdul Latif Khan; Muhammed Al-Broumi; Issa Al-Amri; Javid Hussain; Hidayat Hussain; René Csuk
Journal:  PLoS One       Date:  2018-06-18       Impact factor: 3.240

5.  First complete chloroplast genomics and comparative phylogenetic analysis of Commiphora gileadensis and C. foliacea: Myrrh producing trees.

Authors:  Arif Khan; Sajjad Asaf; Abdul Latif Khan; Ahmed Al-Harrasi; Omar Al-Sudairy; Noor Mazin AbdulKareem; Adil Khan; Tariq Shehzad; Nadiya Alsaady; Ali Al-Lawati; Ahmed Al-Rawahi; Zabta Khan Shinwari
Journal:  PLoS One       Date:  2019-01-10       Impact factor: 3.240

6.  Complete Chloroplast Genomes of Vachellia nilotica and Senegalia senegal: Comparative Genomics and Phylogenomic Placement in a New Generic System.

Authors:  Sajjad Asaf; Arif Khan; Abdul Latif Khan; Ahmed Al-Harrasi; Ahmed Al-Rawahi
Journal:  PLoS One       Date:  2019-11-25       Impact factor: 3.240

7.  Comparative Chloroplast Genomics of Endangered Euphorbia Species: Insights into Hotspot Divergence, Repetitive Sequence Variation, and Phylogeny.

Authors:  Arif Khan; Sajjad Asaf; Abdul Latif Khan; Tariq Shehzad; Ahmed Al-Rawahi; Ahmed Al-Harrasi
Journal:  Plants (Basel)       Date:  2020-02-05

8.  Mangrove tree (Avicennia marina): insight into chloroplast genome evolutionary divergence and its comparison with related species from family Acanthaceae.

Authors:  Sajjad Asaf; Abdul Latif Khan; Muhammad Numan; Ahmed Al-Harrasi
Journal:  Sci Rep       Date:  2021-02-11       Impact factor: 4.379

9.  Characterization of the complete plastome sequence of Torreya grandis var. jiulongshanensis (Taxaceae), a rare and endangered plant species endemic to Zhejiang province, China.

Authors:  Ming Jiang; Junfeng Wang; Huijuan Zhang
Journal:  Mitochondrial DNA B Resour       Date:  2020-01-24       Impact factor: 0.658

10.  The Plastome Sequences of Triticum sphaerococcum (ABD) and Triticum turgidum subsp. durum (AB) Exhibit Evolutionary Changes, Structural Characterization, Comparative Analysis, Phylogenomics and Time Divergence.

Authors:  Sajjad Asaf; Rahmatullah Jan; Abdul Latif Khan; Waqar Ahmad; Saleem Asif; Ahmed Al-Harrasi; Kyung-Min Kim; In-Jung Lee
Journal:  Int J Mol Sci       Date:  2022-03-03       Impact factor: 5.923

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.