Insights into plastome of Fagonia indica Burm.f. (Zygophyllaceae): organization, annotation and phylogeny.

Mohamed S Elshikh, Mohammad Ajmal Ali, Fahad Al-Hemaid, Soo Yong Kim, Meena Elangbam, Arun Bahadur Gurung, Prasanjit Mukherjee, Mohamed El-Zaidy, Joongku Lee.

Abstract

An enhanced understanding of chloroplast genomics would facilitate various biotechnology applications; however, the chloroplast (cp) genome (plastome) characteristics of plants such as Fagonia indica Burm.f. (family Zygophyllaceae), which can grow in extremely hot sand desert, remain poorly understood. De novo sequencing of F. indica using Illumina high-throughput sequencing technology yielded a 128,379 bp cp genome encoding 115 unique genes. The present study adds evidence for the loss of one copy of the inverted repeat (IR) in the cp genome of a taxon capable of growing in hot sand desert. The maximum likelihood analysis revealed two distinct sub-clades, Krameriaceae and Zygophyllaceae, of the order Zygophyllales, nested within the fabids.
© 2021 The Author(s).

Keywords:  Fabids; Fagonia indica; NGS; Plastome; Zygophyllaceae

Year:  2021        PMID: 35280582      PMCID: PMC8913386          DOI: 10.1016/j.sjbs.2021.11.011

Source DB:  PubMed          Journal:  Saudi J Biol Sci        ISSN: 2213-7106            Impact factor:   4.219


Introduction

The chloroplast (cp) genome, or plastome, encodes several key proteins involved in photosynthesis and in other metabolic processes important for the interactions of plants with their environment, as well as for defense against invading pathogens (Bobik and Burch-Smith, 2015). The availability of organelle and even whole-genome sequence data in database repositories is gradually increasing because of advances in massively parallel next-generation DNA sequencing platforms and the development of bioinformatics resources during the last two decades. As a result, the more than 5000 cp genome sequences available in GenBank as of September 2021 have revolutionized the application of plastome genomics (Ali et al., 2020) in genetic engineering to enhance plant agronomic traits (Cosa et al., 2001, Ruf et al., 2001, Dufourmantel et al., 2004, Dufourmantel et al., 2005, Liu et al., 2007, Zhou et al., 2008, Singh et al., 2010, Lee et al., 2011, Jin et al., 2011, Jin et al., 2012, Jin et al., 2015, Zhang et al., 2015), the synthesis of enzymes and biomaterials (Jin et al., 2011, Viitanen et al., 2004, Verma et al., 2010), enhancing nutrition (Shintani and DellaPenna, 1998, Schneider, 2005, Apel and Bock, 2009, Jin and Daniell, 2014), biopharmaceuticals (Grabowski et al., 2006, El Kaoutari et al., 2013, Kwon et al., 2013, Shenoy et al., 2014, Kohli et al., 2014, Holtz et al., 2015, Kwon and Daniell, 2015), biomedical products (Daniell et al., 2016), and the understanding of genetic diversity and phylogeny (Daniell et al., 2016, Brozynska et al., 2016, Jansen et al., 2007, Moore et al., 2010, Elshikh et al., 2020).
Fagonia indica Burm.f. (Family: Zygophyllaceae, Order: Zygophyllales, Clade: Fabids) (APG IV, 2016) is a densely to sparsely branched thorny herb approximately 60 × 100 cm in height and width, respectively (Fig. 1). It possesses anticancer activity (Lam et al., 2014) and is widely distributed in Asian and African deserts (El Hadidi, 1985, Basto, 2002, Beier et al., 2004, Beier, 2005), extending to the inner zone of the Empty Quarter (hottest sand desert) (Mandavil, 1986). A thorough survey of published reports revealed that the cp genomes of plants such as F. indica, which can grow in extremely hot sand desert, have rarely been characterized. The present report describes the complete cp genome sequence of F. indica, discusses its genome organization (including gene content and repeat features) and phylogeny, and compares it with representative plants of major habitats to detect similarities and variations.
Fig. 1

A twig of Fagonia indica Burm.f. (family Zygophyllaceae) in the flowering stage.


Materials and methods

Plant material, DNA extraction and de novo genome sequencing

Fresh leaves of F. indica were collected from the desert of the Riyadh region, Saudi Arabia. Genomic DNA was extracted using the Qiagen DNeasy Kit (Qiagen, Hilden, Germany) and subsequently used to construct short-insert libraries according to the manufacturer’s manual (Illumina, Inc., San Diego, USA); the libraries were sequenced as a single-end run of 51 bp on the Illumina sequencing platform (Quail et al., 2012).

The cp genome assembly and annotation

The raw sequence reads were filtered using FastQC to obtain high-quality clean sequence data by removing adaptor sequences. The high-quality filtered reads were then assembled using SPAdes (Bankevich et al., 2012). The assembled cp genome was annotated using the default parameters (Tillich, 2007, Stothard, 2000) of GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html). The NCBI GenBank sequence of Larrea tridentata (DC.) Coville (family Zygophyllaceae) (GenBank accession number NC_028023.1) was used as the reference for annotation, and the F. indica genome was further compared with that of the closely related L. tridentata using the mVISTA program (http://genome.lbl.gov/vista/mvista/submit.shtml) in Shuffle-LAGAN mode (Brudno et al., 2003).

Repeat structure and small inversion

The tandem repeats were analyzed using Tandem Repeats Finder (Benson, 1999, Timme et al., 2007), and REPuter (Castro et al., 2013) was used to identify and locate dispersed repeats, including direct (forward) and inverted (palindromic) repeats. Tandem repeats less than 15 bp in length and redundant REPuter results were removed manually. Candidate small inversions (SIs) were then identified where the distance between repeats was less than 50 bp (Yang et al., 2010), and the likely secondary structures of the SIs were evaluated using MFOLD (version 3.2). Potential microsatellite regions were detected by searching for five or more repeats of the nucleotides A and T using MISA (Beier et al., 2017).
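As a rough illustration of the microsatellite screen described above, the following Python sketch scans a sequence for runs of A or T mononucleotides and AT or TA dinucleotides. It is not MISA and does not reproduce its algorithm; the function name `find_at_ssrs`, the default thresholds (10 units for mononucleotides and 6 for dinucleotides, chosen here only because they match the shortest entries listed in Table 2), and the toy sequence are all assumptions made for illustration.

```python
import re

def find_at_ssrs(seq, min_mono=10, min_di=6):
    """Return (1-based start, motif, repeat count) for A/T-type SSRs.

    Thresholds are illustrative assumptions, not MISA defaults.
    """
    seq = seq.upper()
    hits = []
    # Mononucleotide runs of A or T with at least `min_mono` units.
    for m in re.finditer(r"A{%d,}|T{%d,}" % (min_mono, min_mono), seq):
        hits.append((m.start() + 1, m.group()[0], len(m.group())))
    # Dinucleotide runs of AT or TA with at least `min_di` units.
    for m in re.finditer(r"(?:AT){%d,}|(?:TA){%d,}" % (min_di, min_di), seq):
        hits.append((m.start() + 1, m.group()[:2], len(m.group()) // 2))
    return sorted(hits)

# Toy sequence (hypothetical): an A11 run, an (AT)6 run, and a T9 run
# that falls below the mononucleotide threshold.
toy = "GGC" + "A" * 11 + "CG" + "AT" * 6 + "GC" + "T" * 9
print(find_at_ssrs(toy))
```

Each hit could then be assigned to a locus and region (intergenic, intron, or CDS) by intersecting its position with the genome annotation.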

Phylogenetic analysis

A total of 48 chloroplast genes (Supplementary Table S1) present in the cp genomes of 49 taxa (Supplementary Table S2) from 49 different orders, together with three outgroup sequences belonging to the gymnosperm clade, were retrieved from GenBank and aligned using ClustalX (Thompson et al., 1997). The maximum likelihood (ML) analysis was performed using the MEGA X software (Kumar et al., 2018).
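Multi-gene analyses of this kind are commonly run on a concatenated alignment (supermatrix). The text does not state how the 48 gene alignments were combined, so the Python sketch below is only one plausible preparation step and is an assumption, not the authors' procedure. The function name `concatenate_alignments`, the dictionary-based input format, and the toy data are hypothetical; taxa missing a gene are padded with gaps.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments into one supermatrix.

    gene_alignments: list of dicts {taxon: aligned sequence}, one per gene.
    Taxa absent from a gene are filled with gap characters ('-').
    """
    taxa = sorted(set().union(*gene_alignments))
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        lengths = {len(s) for s in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one gene alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}

# Toy example with two hypothetical genes and three hypothetical taxa.
genes = [
    {"taxonA": "ATG-", "taxonB": "ATGC"},
    {"taxonA": "GG", "taxonC": "GA"},
]
supermatrix = concatenate_alignments(genes)
for taxon, sequence in supermatrix.items():
    print(taxon, sequence)
```

The resulting sequences all have the same length and could be exported to a format accepted by the tree-building software.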

Results

Content and organization of plastome

The mapping of the assembled cp genome resulted in a circular molecule (Fig. 2) of 128,379 base pairs (bp) with a total of 115 unique genes, comprising 80 CDS (representing 80,200 bp coding for 42,793 codons), 31 tRNA genes, and four rRNA genes. The assembled cp genome sequence was submitted to the NCBI (GenBank accession number MN521457). The cp genome of F. indica was approximately 128 kb in size, smaller than that of L. tridentata (Fig. 3, Table 1). The coding regions were less divergent than the non-coding regions (Fig. 4).
Fig. 2

Gene map of the chloroplast (cp) genome of Fagonia indica.

Fig. 3

Comparison of the border positions of the SSC, LSC, and IR regions between the chloroplast (cp) genomes of Larrea tridentata and Fagonia indica. The selected genes or portions of genes are indicated by the boxes above the genome. SSC: small single copy; LSC: large single copy; IR: inverted repeat (reverse complementary repeat region); bp: base pairs.

Table 1

Features (length, %GC, gene, CDS, rRNA, tRNA) of the chloroplast (cp) genomes of Fagonia indica and Larrea tridentata.

Features       F. indica      L. tridentata
Length (bp)    128,379        136,194
%GC            34.027372      35.092589
Gene           115            126
CDS            80             75
rRNA           4              4
tRNA           31             28
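The %GC and length figures in Table 1 are simple sequence statistics. The Python sketch below shows one way such values could be computed; the helper name `gc_percent` and the toy sequence are assumptions for illustration, and only the two genome lengths are taken from Table 1.

```python
def gc_percent(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0

# Genome lengths (bp) as reported in Table 1.
len_fagonia = 128_379
len_larrea = 136_194
size_difference = len_larrea - len_fagonia

print(gc_percent("ATGCATAT"))  # toy sequence: 2 of 8 bases are G or C
print(size_difference)         # difference in bp between the two genomes
```

Applied to the full GenBank records, the same function would be expected to give values close to those in the %GC row, subject to how ambiguous bases are handled.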
Fig. 4

Percent identity plot of the comparison of the chloroplast (cp) genome of the Fagonia indica with Larrea tridentata.

A total of 13 genes, including seven protein-coding genes and six tRNA genes, contained one or two introns (Supplementary Table S3). Among the intron-containing genes, trnK-UUU had the largest intron (2511 bp), which includes the matK gene, and trnL-UAA had the smallest intron (551 bp). The ycf3 gene had two introns of 722 and 758 bp. The sequence analysis indicates that 58.49%, 6.87%, and 3.53% of the genome sequence encode proteins, tRNAs, and rRNAs, respectively, whereas 41.50% of the genome sequence is non-coding sequence comprising introns, intergenic spacers, and pseudogenes. Based on the sequences of protein-coding and tRNA genes within the cp genome, Phe (0.05%) and Arg (0.0038%) were the most and least used amino acids, respectively (Supplementary Table S4).
The tandem and dispersed repeats were analyzed in the cp genome of F. indica. Forty-one tandem repeats were identified, of which 23 were 15–20 bp, 14 were 21–30 bp, two were 31–40 bp, one was 41–50 bp, and another one was 81–90 bp in size. Similarly, 43 dispersed repeats were identified, of which one was 21–30 bp, 22 were 31–40 bp, 11 were 41–50 bp, four were 51–60 bp, one was 61–70 bp, another one was 81–90 bp, and three were more than 91 bp in size. In total, 84 repeats were identified, of which 87% were in the intergenic spacer regions, 6% in introns, and 7% in the CDS regions (Fig. 5). The repeat structures of another member of Zygophyllaceae, L. tridentata, were also analyzed using REPuter (Fig. 6). Forward and inverted repeats were common in both L. tridentata and F. indica; however, the repeat structures differed between the two species even though they belong to the same family. Of the two Zygophyllaceae cp genomes studied, F. indica contained the higher total number of repeats that were 75 bp or greater in length, and SIs ranging from 11 to 24 bp in size. The folded stem-loop structures of the three SIs of F. indica are shown in Fig. 7.
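The repeat counts reported above can be cross-checked with a short tally. The Python sketch below uses only the per-class counts given in the text; the variable names are arbitrary, and the percentage split between tandem and dispersed repeats is derived here for illustration rather than quoted from the study.

```python
# Counts per length class (bp) as reported in the text above.
tandem = {"15-20": 23, "21-30": 14, "31-40": 2, "41-50": 1, "81-90": 1}
dispersed = {"21-30": 1, "31-40": 22, "41-50": 11, "51-60": 4,
             "61-70": 1, "81-90": 1, ">91": 3}

n_tandem = sum(tandem.values())
n_dispersed = sum(dispersed.values())
n_total = n_tandem + n_dispersed

# Share of each repeat type among all repeats, in percent.
share_tandem = 100 * n_tandem / n_total
share_dispersed = 100 * n_dispersed / n_total

print(n_tandem, n_dispersed, n_total)
print(round(share_tandem, 1), round(share_dispersed, 1))
```

The class counts sum to the stated totals of 41 tandem repeats, 43 dispersed repeats, and 84 repeats overall.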
Fig. 5

Repeat structure analysis in the chloroplast (cp) genome. The cutoff value was 15 bp for a tandem repeat and 30 bp for a dispersed repeat. (A) frequency of repeats by length, (B) repeat type, (C) location distribution of repeats. (CDS: coding sequence).

Fig. 6

Repeat structures in Larrea tridentata and Fagonia indica (Zygophyllaceae).

Fig. 7

Folded stem-loop structures in the three small inversions (SIs) of Fagonia indica.

Within the cp genome of F. indica, 37 different SSR loci were repeated more than five times (Table 2). Of these, 31 loci were homopolymers and six were dinucleotide repeats. All homopolymeric loci contained multiple A or T nucleotides, whereas all dinucleotide loci contained multiple AT or TA units. These SSR loci contribute to the A-T richness of the cp genome of F. indica.
Table 2

Simple sequence repeat (SSR) loci in the chloroplast (cp) genome of Fagonia indica.

Position   Repeat   Repeat length of consensus   Locus                 Region
285        A        10                           trnH-GUG-psbA         intergenic
2380       T        11                           matK                  CDS
17,585     A        14                           trnS-GGA-ycf3         intergenic
17,960     A        10                           ycf3                  intron
18,892     A        12                           ycf3                  intron
26,911     T        12                           psbZ-trnS-UGA         intergenic
31,068     AT       6                            psbD-trnT-GGU         intergenic
31,686     T        12                           trnT-GGU-trnE-UUC     intergenic
32,350     A        11                           trnY-GUA-trnD-GUC     intergenic
36,011     A        10                           trnC-GCA-rpoB         intergenic
36,458     A        10                           trnC-GCA-rpoB         intergenic
40,452     A        10                           rpoC1                 intron
44,927     A        13                           rpoC2                 CDS
46,844     A        11                           rpoC2                 CDS
49,092     A        11                           atpI-atpH             intergenic
52,145     T        10                           atpF-atpA             intergenic
58,939     T        11                           accD-psaI             intergenic
59,528     T        10                           accD-psaI             intergenic
60,008     A        10                           psaI-ycf4             intergenic
64,326     AT       6                            psbJ-psbL             intergenic
66,729     A        11                           trnP-UGG-psaJ         intergenic
68,792     T        11                           rps18-rpl20           intergenic
69,919     TA       6                            rpl20-clpP            intergenic
70,918     T        10                           clpP                  intron
76,067     T        11                           petB-petD             intergenic
81,388     A        13                           rpl14-rpl16           intergenic
110,755    T        10                           ycf1                  CDS
110,875    A        10                           ycf1                  CDS
111,474    A        10                           ycf1                  CDS
113,104    A        11                           ycf1                  CDS
114,482    TA       10                           ycf1-rps15            intergenic
114,945    A        11                           rps15                 CDS
119,431    AT       6                            ndhI-ndhG             intergenic
123,288    TA       7                            ndhD-ccsA             intergenic
125,155    T        11                           rpl32                 CDS
125,883    T        10                           rpl32-ndhF            intergenic
127,343    A        10                           ndhF                  CDS
The present maximum likelihood (ML) bootstrap analysis revealed two major clades: monocots and eudicots. In the eudicot clade, F. indica clustered with L. tridentata and T. mongolica (family Zygophyllaceae, order Zygophyllales), nested within the fabids clade. The maximum likelihood tree (MLT) also revealed two distinct clades of Krameriaceae and Zygophyllaceae (Fig. 8).
Fig. 8

Maximum likelihood (ML) phylogenetic tree inferred from 48 protein-coding chloroplast (cp) genes from 53 plant taxa.


Discussion

In the present study, the organization of the assembled cp genome was found to be similar to that of other angiosperms (Raubeson et al., 2007), except for the loss of one copy of the IR, as in the majority of papilionoids (Doyle et al., 1996, Kato et al., 2000, Saski et al., 2005, Guo et al., 2007). The rps16 gene is found in the cp genomes of most angiosperms, including representatives of the early-branching lineages (Goremykin et al., 2003, Raubeson et al., 2007, Hansen et al., 2007); however, it was not found in F. indica. F. indica had a single copy of the inverted repeat, which resulted in an inverted gene order compared with its taxonomically close relative L. tridentata. The lengths of angiosperm cp genomes vary primarily because of nucleotide substitutions, gene/intron losses, and expansion and contraction of the IR region (Jansen et al., 2007). The coding regions were less divergent than the non-coding regions (Fig. 4); however, further analysis showed that clpP and accD were the most divergent coding regions (Supplementary Table S5). Photosynthesis is the ultimate source of biomass production (Beadle and Long, 1985). The intensity of photosynthetically active radiation (PAR) is an important factor that determines the rate of photosynthesis (Wimalasekera, 2019), and the intensity of light varies among the major habitats (Warrant and Johnsen, 2013). The comparative cp genome analysis of F. indica, as a representative of hot sand desert, with representatives of flowering plants occurring in different major habitats further supports the conserved pattern of the cp genome and suggests that the genes contained in the cp genome might not have roles solely in organism yield, rarity or abundance, and biomass, and in encountering stress (Elshikh et al., 2020).
Knowledge of phylogeny is used in almost every branch of biology (Yang and Rannala, 2012), including taxonomy (Philippe et al., 2005, APG, 2016), evolution (Edwards, 2009, Soltis et al., 2019), comparative biology (Eisen, 1998, Mäser et al., 2001, Kellis et al., 2003, Pedersen et al., 2006, Lindblad-Toh et al., 2011), medicine (Marra et al., 2003, Grenfell et al., 2004, Salipante and Horwitz, 2006), and genomics (Paten et al., 2008, Green et al., 2010, Gronau et al., 2011, Li and Durbin, 2011, Ma, 2011). The family Zygophyllaceae has previously been treated as being related either to Geraniaceae (Geraniales) or to Sapindales/Rutales or Linales/Malpighiales (Sheahan and Chase, 1996). In addition, the phylogenetic relationships of the two sister families, Zygophyllaceae and Krameriaceae (Soltis et al., 1998, Savolainen et al., 2000, Wang et al., 2009, Tao et al., 2018), under the order Zygophyllales have often been controversial (APG IV, 2016). The wood anatomy supports the separation of Krameriaceae from Zygophyllaceae (Carlquist, 2005). Granot and Grafi (2014) argued, based on epigenetic studies, that the placement of the families Krameriaceae and Zygophyllaceae under the order Zygophyllales should be re-examined. The present maximum likelihood (ML) bootstrap analysis revealed two major clades: monocots and eudicots. In the eudicot clade, F. indica clustered with L. tridentata and T. mongolica (family Zygophyllaceae, order Zygophyllales), nested within the fabids clade. The maximum likelihood tree (MLT) also revealed two distinct clades of Krameriaceae and Zygophyllaceae.

Conclusions

The analysis of the de novo genome sequence of F. indica (family Zygophyllaceae) adds evidence for the loss of one copy of the IR in the cp genome of a taxon capable of growing in hot sand desert. The maximum likelihood analysis revealed two distinct sub-clades, Krameriaceae and Zygophyllaceae, of the order Zygophyllales.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (10 of 70 shown)

1.  A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: identification of divergent regions and categorization of shared repeats.

Authors:  Ruth E Timme; Jennifer V Kuehl; Jeffrey L Boore; Robert K Jansen
Journal:  Am J Bot       Date:  2007-03       Impact factor: 3.844

2.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

Review 3.  Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis.

Authors:  J A Eisen
Journal:  Genome Res       Date:  1998-03       Impact factor: 9.043

4.  Expression and characterization of antimicrobial peptides Retrocyclin-101 and Protegrin-1 in chloroplasts to control viral and bacterial infections.

Authors:  Seung-Bum Lee; Baichuan Li; Shuangxia Jin; Henry Daniell
Journal:  Plant Biotechnol J       Date:  2011-01       Impact factor: 9.803

5.  The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.).

Authors:  Meng Yang; Xiaowei Zhang; Guiming Liu; Yuxin Yin; Kaifu Chen; Quanzheng Yun; Duojun Zhao; Ibrahim S Al-Mssallem; Jun Yu
Journal:  PLoS One       Date:  2010-09-15       Impact factor: 3.240

6.  Oral delivery of bioencapsulated exendin-4 expressed in chloroplasts lowers blood glucose level in mice and stimulates insulin secretion in beta-TC6 cells.

Authors:  Kwang-Chul Kwon; Ramya Nityanandam; James S New; Henry Daniell
Journal:  Plant Biotechnol J       Date:  2012-10-18       Impact factor: 9.803

7.  Low-cost oral delivery of protein drugs bioencapsulated in plant cells.

Authors:  Kwang-Chul Kwon; Henry Daniell
Journal:  Plant Biotechnol J       Date:  2015-09-03       Impact factor: 9.803

8.  Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences.

Authors:  V Savolainen; M W Chase; S B Hoot; C M Morton; D E Soltis; C Bayer; M F Fay; A Y de Bruijn; S Sullivan; Y L Qiu
Journal:  Syst Biol       Date:  2000-06       Impact factor: 15.683

9.  Enhancement of carotenoid biosynthesis in transplastomic tomatoes by induced lycopene-to-provitamin A conversion.

Authors:  Wiebke Apel; Ralph Bock
Journal:  Plant Physiol       Date:  2009-07-08       Impact factor: 8.340

10.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205
