
The Transcript Profile of a Traditional Chinese Medicine, Atractylodes lancea, Revealing Its Sesquiterpenoid Biosynthesis of the Major Active Components.

Shakeel Ahmed1,2,3,4, Chuansong Zhan1,2,3,4, Yanyan Yang1,3,4, Xuekui Wang1,3,4, Tewu Yang1,3,4, Zeying Zhao1,3,4, Qiyun Zhang1,2,3,4, Xiaohua Li1,2,3,4, Xuebo Hu1,2,3,4.   

Abstract

Atractylodes lancea (Thunb.) DC., named "Cangzhu" in China, belongs to the Asteraceae family. In several East and Southeast Asian countries (China, Thailand, Korea, Japan, etc.), its rhizome, commonly called rhizoma atractylodis, is used to treat many diseases because it contains a variety of sesquiterpenoids and other components of medicinal importance. Despite its medicinal value, sesquiterpenoid biosynthesis in this species is largely unknown. In this study, we analyzed the transcriptome of different tissues of the non-model plant A. lancea using short-read sequencing technology (Illumina). We found 62,352 high-quality unigenes with an average sequence length of 913 bp in the transcripts of A. lancea. Among these, 43,049 (69.04%), 30,264 (48.53%), 26,233 (42.07%), 17,881 (28.67%) and 29,057 (46.60%) unigenes showed significant similarity (E-value < 1e-5) to known proteins in the Nr, SWISS-PROT, KEGG, COG and GO databases, respectively. Of the total 62,352 unigenes, 43,049 (Nr database) open reading frames were predicted. Using different bioinformatics tools, we identified all the enzymes that take part in terpenoid biosynthesis, as well as in the biosynthesis of five different known sesquiterpenoids, via the cytosolic mevalonic acid (MVA) pathway and the plastidial methylerythritol phosphate (MEP) pathway. In our study, 6,864 simple sequence repeats (SSRs) were also found as potential markers in A. lancea. This transcriptomic resource of A. lancea contributes to the advancement of research on this medicinal plant, and more specifically to the mining of genes for different classes of terpenoids and other chemical compounds that have medicinal as well as economic importance.


Year:  2016        PMID: 26990438      PMCID: PMC4798728          DOI: 10.1371/journal.pone.0151975

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The plant Atractylodes lancea (Thunb.) DC. is known as “Cangzhu” in China, “Khod-Kha-Mao” in Thailand [1] and “So-ju-tsu” in Japan [1, 2]. A. lancea belongs to the Asteraceae family. The rhizome of A. lancea, generally called rhizoma atractylodis, is used for the treatment of influenza, rheumatic diseases, night blindness and a few digestive problems [3-5]. The history of using the rhizome of A. lancea as a drug can be traced back to the Han dynasty (206 BC-220 AD), when it was described in Shen-nong-ben-cao-jing, the first Chinese pharmacopoeia. Later it was found that this herb includes two species, A. lancea and A. chinensis (DC.) Koidz., known in China as “Mao CangZhu” and “Bei CangZhu”, respectively, and both have been used together as rhizoma atractylodis [6]. Previous reports indicate that terpenoids and their glycosidic derivatives are the major active components [7, 8]. Terpenoids are natural products, mostly present in plants, with specific structures [9]. Plant-derived terpenoids show diverse medicinal effects and have multiple industrial and pharmaceutical applications, including antiparasitic, anticancer, antifungal, antiviral and antibacterial activities. Terpenoids are grouped as monoterpenoids, sesquiterpenoids, diterpenoids, triterpenoids, and others [10]. In general, terpenoids are synthesized in plants by way of the MVA pathway and the MEP pathway. In the MVA pathway, terpenoids are synthesized starting from the primary metabolic product acetyl-CoA, which is converted to crucial precursors such as isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). The reactions are catalyzed by a large variety of enzymes with particular product specificities (Fig 1). In the MEP pathway, the glycolysis products glyceraldehyde-3-phosphate and pyruvate are converted into 1-deoxy-D-xylulose-5-phosphate. After a few enzymatic steps the pathway arrives at the same intermediates as the MVA pathway (Fig 1).
The 1-deoxy-D-xylulose-5-phosphate synthase (DXS) and hydroxymethylglutaryl-CoA reductase (HMGR), the rate-limiting enzymes in the MEP and MVA pathways, respectively, are usually encoded by small multigene families [11]. Sesquiterpene synthases are a widely expressed family of proteins that are able to convert the universal precursor farnesyl diphosphate (FPP) into more than three hundred different sesquiterpene skeletons [12]. The current study focuses mainly on the biosynthetic pathway of sesquiterpenoids in A. lancea.
Fig 1

Putative sesquiterpenoid biosynthetic pathway in Atractylodes lancea.

A flow diagram of the terpenoid backbone and sesquiterpenoid biosynthetic pathways in Atractylodes lancea. The structures of chemicals in the pathway are shown in boxes. The green boxes represent the plastidial pathway while the black boxes show the pathway in cytoplasm & mitochondria. The words on the boxes are the enzymes for each reaction while the numbers in red represent the number of transcripts for that specific gene. Reactions in cytoplasm, mitochondria and plastids are shown in green. The boxes with red borders show the structures of various sesquiterpenoids of A. lancea.

Sesquiterpenoids have a wide variety of uses, such as pharmaceuticals, flavors, fragrances, industrial chemicals and nutraceuticals [13-15]. β-caryophyllene is an important sesquiterpene that is present in the essential oils of many plants such as cinnamon (Cinnamomum cassia), thyme (Thymus mongolicus), clove (Syringa spp.) and black pepper (Piper nigrum), most of which have been used to treat different health problems as well as for fragrances [16, 17]. It is also prominently used as an anti-carcinogenic and anti-microbial antioxidant, as well as a skin penetration enhancer [18]. Germacrene D is another kind of sesquiterpene. It is a chiral compound, which is produced from FPP by enantiomer-specific synthases [19]. Germacrene D has a strong effect on insect activity [20]. FPP can also be converted into the acyclic sesquiterpene (E)-β-farnesene, a reaction catalyzed by (E)-β-farnesene synthase (β-FS) [21]. (E)-β-farnesene occurs in a variety of plants and animals and is widely used as a semiochemical in insects and plants [22]. Sesquiterpenes are the major components of the volatile essential oil from A. lancea. In an effort to identify the chemical profile of the essential oil from A. lancea, wild-grown plants were found to produce mostly sesquiterpenes, with the top three being hinesol (68.5%), β-eudesmol (13.1%) and elemol (6.2%) [23].
However, the content of these chemicals is greatly influenced by the geographic location where the sample was taken [23, 24], as A. lancea is widely distributed in the vast area between the Yellow River and the Yangtze River of China. Among these diverse sesquiterpenes, atractylenolides (I, II & III), atractylon and biatractylolide isolated from A. lancea were demonstrated to provide good protection against ethanol-induced gastric ulcer [25]. Atractylenolides were also shown to be insect repellents [26]. Recently it was found that a new sesquiterpenoid, hinesol, was responsible for apoptosis in human cancer cells. In contrast, β-eudesmol, a sesquiterpene more commonly found in other medicinal plants and also available from A. lancea, was less effective than hinesol [27]. There are other kinds of sesquiterpenes, such as the guaiane, eudesmane and tricyclic carbon skeleton types, but their physiological activity remains to be elucidated [28]. Furthermore, some other chemicals from A. lancea, such as atractylochromene, methylphenol derivatives, cyclohexadiene derivatives, polyacetylenes, atractylodin and acidic polysaccharides, showed diverse activities against inflammation, bacteria, fungi or obesity, but their structures are different from those of sesquiterpenes [29-31]. The chemical and structural diversity in A. lancea correlates with its multiple medicinal functions. Owing to recent advances in molecular biology and the decreasing cost of next-generation sequencing technology, RNA sequencing (RNA-seq) has become a popular choice for transcriptome studies, especially in non-model species [32]. Consequently, RNA-seq has been extensively deployed in various traditional Chinese medicine (TCM) species, for example Chinese sage (Salvia miltiorrhiza) [33], Chinese ginseng (Panax ginseng) [34] and Sanchi (Panax notoginseng) [35].
Deep transcriptome analysis also helps to discover various genetic features, including alternative splicing isoforms [36], strand-specific expression [37] and microRNAs [38]. With the help of transcriptome sequencing, comprehensive information can be obtained on gene expression, molecular mechanisms and biological pathways, even in the absence of a reference genome [39-43]. However, to date no study of the A. lancea transcriptome has been reported. Here we report Illumina transcriptome sequencing, functional annotation and differential expression profiles in different tissues (stem, root and leaf) of A. lancea, which will be an important resource for gene mining, genetic improvement and the development of molecular markers. Additionally, to further explore the differences among these tissues in candidate unigenes for terpenoid biosynthesis, the transcriptional levels of all the related unigenes are discussed in detail. The results of our work could contribute to the discovery of genes dedicated to the terpenoid pathway and to understanding the regulation of the accumulation of volatile constituents in specific tissues of A. lancea. To our knowledge, the current work is the first report of secondary metabolic analysis in A. lancea based on de novo transcriptome analysis.

Materials and Methods

Collection of the A. lancea tissues

Prior to the experiment, the Institute of Science & Technology Development of Huazhong Agricultural University (HZAU) assured us that no specific permission is needed for field experiments with A. lancea in Hubei Province, as it is commonly planted in China as a medicinal source. With permission from Hubei Jintuyuan Forest Medicine & Seed Co. Ltd. (52 Jinyuanbao Avenue, Yuanbao, Lichuan City, Enshi autonomous district, Hubei province of China), experimental materials of A. lancea were taken from a herbal medicine planting field (E108°56′, N30°18′) belonging to the company. The roots, stems and leaves of A. lancea were frozen in liquid nitrogen immediately after collection and stored until use. The A. lancea was authenticated by Prof. Xuebo Hu, Assoc. Prof. Tewu Yang and Xuekui Wang.

cDNA library preparation and sequence data analysis and assembly

Total RNA was extracted from equal weights of the three tissue samples using RNeasy Plant Mini Kits (Qiagen, Inc., Valencia, CA, USA) according to the manufacturer's protocol. All extracted RNA samples were qualified and quantified using a Nanodrop ND-1000 Spectrophotometer (Nanodrop Technologies, Wilmington, DE, USA); they showed 260/280 nm ratios from 1.9 to 2.1. No sign of degradation was found when the RNA samples were analyzed by electrophoresis. Transcriptome analysis was done with equal amounts of all three samples using Illumina's kit, following the manufacturer's protocol. Briefly, poly-(A) mRNA was purified from the total RNA with the Oligotex mRNA Mini Kit (Qiagen, Inc., Valencia, CA, USA) following the manufacturer's protocol. The cDNA library construction and normalization were performed using protocols described previously [44].

Transcriptome de novo assembly

Trinity, a short-read assembly package consisting of three modules applied sequentially to large volumes of RNA-seq reads (Inchworm, Chrysalis and Butterfly), was used for de novo transcriptome assembly [45]. Clean reads, obtained by filtering the raw reads, were used for all subsequent analysis. Inchworm was used first to assemble short reads with overlapping sequences into the longest possible contigs without gaps. Chrysalis then grouped the contigs into clusters and constructed a complete de Bruijn graph for each cluster. Finally, Butterfly compared reads and read pairs against these graphs to trace the paths they had in common, producing full-length transcripts for alternatively spliced isoforms and teasing apart transcripts that correspond to paralogous genes. All such sequences from Trinity were defined as unigenes. In this study three samples of A. lancea were sequenced, and the unigenes from each sample were further spliced together. Redundant unigenes were then removed using sequence clustering software. After clustering into gene families, the unigenes fell into two classes: clusters (prefixed by CL) and singletons (prefixed by Unigene). Finally, we aligned the unigenes by BLASTx (E-value < 0.00001) against the protein databases NR, Swiss-Prot, KEGG and COG, and the sequence direction of each unigene was determined from the best alignment. If the databases disagreed, the priority order NR, Swiss-Prot, KEGG, COG was used to decide the direction of the sequence. Unigenes whose direction could not be determined from the above databases were analyzed with ESTScan to determine their sequence direction [46].
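The database priority rule for sequence direction can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the '+'/'-' strand encoding are assumptions made for illustration only; they are not taken from the pipeline used in the study.

```python
from typing import Dict, Optional

# Priority order stated in the text: NR first, then Swiss-Prot, KEGG, COG.
DB_PRIORITY = ("NR", "Swiss-Prot", "KEGG", "COG")


def resolve_direction(best_hit_strand: Dict[str, str]) -> Optional[str]:
    """Pick the strand reported by the highest-priority database with a hit.

    best_hit_strand maps a database name to '+' or '-' (hypothetical encoding
    of the direction implied by the best BLASTx alignment in that database).
    Returns None when no database gives a direction, i.e. the unigene would be
    handed over to an ORF predictor such as ESTScan.
    """
    for db in DB_PRIORITY:
        strand = best_hit_strand.get(db)
        if strand in ("+", "-"):
            return strand
    return None
```

For example, a unigene with conflicting hits in KEGG ('-') and NR ('+') would be oriented by the NR hit, because NR comes first in the priority order.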

Unigene differential expression analysis

Functional analysis of the differentially expressed genes was performed using gene ontology (GO). The differentially expressed genes were mapped to each term in the GO database (http://www.geneontology.org) and the number of genes associated with each GO term was determined. Once this gene list had been created, a hypergeometric test was applied to find GO terms significantly enriched among the differentially expressed genes in comparison with the genomic background.
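As a rough illustration of the hypergeometric test mentioned above, the following Python sketch computes the upper-tail probability for one GO term. The symbols N, M, n and m are the conventional ones for this test and are assumptions here, since the text does not define them; any multiple-testing correction is left out.

```python
from math import comb


def go_enrichment_p(N: int, M: int, n: int, m: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= m).

    N: background genes with GO annotation
    M: background genes annotated to the GO term of interest
    n: differentially expressed genes (DEGs) with GO annotation
    m: DEGs annotated to the GO term of interest
    """
    upper = min(M, n)
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return tail / comb(N, n)
```

A small p-value suggests that the term is over-represented among the differentially expressed genes relative to the background.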

SSRs mining and primer design

SSRs consist of motifs of one to six nucleotides with a minimum of five tandem repeats. We used the MIcroSAtellite (MISA) detection tool for SSR mining [47] and designed primer pairs for each SSR using the software Primer3 (v2.3.6) under default settings, with PCR product sizes ranging from 100–250 bp [48, 49].
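The SSR definition above (motifs of one to six nucleotides, at least five tandem repeats) can be sketched with a regular expression. This is a simplified stand-in for illustration, not the MISA tool itself: the function name and parameters are hypothetical, a single repeat threshold is applied to all motif lengths, and compound SSRs are not handled.

```python
import re


def find_ssrs(seq, min_repeats=5, max_motif_len=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    The lazy quantifier makes the regex try the shortest motif first, so
    'AGAGAGAGAG' is reported as (AG)5 rather than as a longer compound motif.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

Per-motif thresholds (for example, a different minimum for di-nucleotides than for tri-nucleotides) would need one pattern per motif length instead of the single pattern used here.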

Results and Discussion

A. lancea transcriptome sequencing and unigene assembly

To obtain a comprehensive overview of gene expression profiles in A. lancea tissues, cDNA libraries were constructed separately from leaf, root and stem samples of A. lancea and sequenced on the Illumina platform. After removal of adaptor sequences and low-quality reads, a total of 43,921,277, 37,866,604 and 40,135,278 clean reads were acquired from leaf, root and stem tissues, respectively (Table 1). These data sets are larger than those from peanut (Arachis hypogaea) [44], yellow horn (Xanthoceras sorbifolium) [50], Siberian apricot (Prunus sibirica) [51] and Centella (Centella asiatica) [52], suggesting that relatively complete read databases were successfully constructed from the different tissues of A. lancea by Illumina sequencing. Subsequently, Trinity software (Trinityrnaseq_r2013_08_14) was used to assemble these clean reads, and low-density and low-quality reads were filtered out, resulting in 64,106, 55,409 and 56,565 unigenes in the leaf, root and stem, respectively. After de novo assembly of the three A. lancea tissues, 62,352 unigenes were finally obtained with an average length of 913 bp (Table 1). Among these, 42,127 unigenes were between 300 nt and 1000 nt in length and 15,263 unigenes were longer than 1 kb (>1000 nt), as shown in Fig 2. Furthermore, the total number of unigenes (62,352) in A. lancea is larger than the 59,236 unigenes identified in peanut (A. hypogaea) [44], the 51,867 unigenes in yellow horn (X. sorbifolium) [50] and the 46,940 unigenes in Siberian apricot (P. sibirica) [51].
Table 1

Statistics of sequencing and de novo assembly of the transcriptome in Atractylodes lancea.

Category  Sample  Total number  Total length (nt)  Mean length (nt)  N50   Total consensus sequences  Distinct clusters  Distinct singletons
Contigs   Leaf    112883        42287508           375               806   0                          0                  0
Contigs   Root    94663         37505566           396               837   0                          0                  0
Contigs   Stem    101679        39492194           388               359   0                          0                  0
Unigenes  Leaf    64106         43921277           685               1258  64106                      19718              44388
Unigenes  Root    55409         37866604           684               1221  55409                      16802              38607
Unigenes  Stem    56565         40135278           710               1328  56565                      16947              39618
Unigenes  Total   62352         56923290           913               1494  62352                      23974              38378
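Table 1 reports N50 alongside the mean length. As a reminder of how this statistic is usually defined, here is a minimal Python sketch; it assumes the common convention (the length L such that sequences of length at least L contain at least half of the total assembled bases), which may differ in detail from the implementation used to build the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and their lengths are
    accumulated until at least half of the total assembled bases is covered;
    the length at which this happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

Because long sequences dominate the cumulative sum, N50 is typically larger than the mean length, as with the 1494 nt N50 and 913 nt mean reported for the final unigene set.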
Fig 2

Length distribution of unigenes in Atractylodes lancea.

The x-axis represents the size of the assembled sequences and the y-axis indicates the corresponding number of unigenes.

Functional annotation of A. lancea unigenes

The species distribution of the non-redundant (Nr) annotation is shown in Fig 3. In total, 23.79% of the unigenes showed the highest homology to genes from grape (Vitis vinifera), 9.5% matched potato (Solanum tuberosum), 8.1% matched cacao (Theobroma cacao), 6.6% matched tomato (Solanum lycopersicum), and 5.8% and 5.1% matched poplar (Populus trichocarpa) and peach (Prunus persica), respectively. All the A. lancea unigenes from the different tissues were searched via BLAST (basic local alignment search tool), with a cut-off E-value of 1e-5, against public databases such as non-redundant (NR), SWISS-PROT, Kyoto Encyclopedia of Genes and Genomes (KEGG), Clusters of Orthologous Groups (COG) and Gene Ontology (GO), which retrieved the proteins with the highest sequence similarity to each unigene along with their functional annotations. From the BLAST results, a total of 43,049 (69.04%), 30,264 (48.53%), 26,233 (42.07%), 17,881 (28.67%) and 29,057 (46.60%) unigenes showed similarity to known proteins in the above-mentioned databases, respectively (Table 2). However, 44,482 unigene (71.34%) sequence orientations are still unknown, which is higher than in peanut (A. hypogaea) (27.8%) [44] but lower than in Chinese tulip tree (Liriodendron chinense) (73.60%) [53]. This is because of the lack of A. lancea genomic information and because shorter sequences contain few or no well-characterized protein domains with which to obtain BLAST hits. It is also possible that some unmatched unigenes are novel genes specific to A. lancea.
Fig 3

The species distribution of the non-redundant unigene annotation.

The column shows the homology of Atractylodes lancea unigene number with that from other species. The numbers inside parentheses indicate the percentage of the homology to different species.

Table 2

Statistics of annotations for assembled unigenes of Atractylodes lancea in different public databases.

Database    Unigenes  Percentage (%)
NR          43049     69.04
SWISS-PROT  30264     48.53
KEGG        26233     42.07
COG         17881     28.67
GO          29057     46.60
ALL         44482     71.34

Functional classification of A. lancea unigenes by GO, COG and KEGG

Gene Ontology (GO) annotation was used to categorize the functions of the predicted A. lancea unigenes [54]. In total, 29,057 unigenes were assigned to three main GO categories and 56 subcategories (Fig 4). “Metabolic process”, “cellular process”, “binding” and “catalytic activity” are the most dominant categories, involving more than 180,000 unigenes, while a small portion of genes were linked with terms such as “pigmentation”, “receptor regulator activity” and “protein tag”. It is interesting to observe that 20,169 unigenes from the GO analysis had not been annotated in the Swiss-Prot database, which could be explained by the fact that GO annotation can significantly improve prediction quality, as the clustering of proteins better reflects their subcellular locations [55].
Fig 4

Distributions of GO annotation of all unigenes.

The results were classified into three main categories: biological process, cellular component, and molecular function. The left y-axis indicates the percentage of a specific category of genes in that category. The right y-axis indicates the number of genes in a category.

To further assess the annotation and predict possible functions of the unigenes, we searched the annotated sequences against the Clusters of Orthologous Groups (COG) database to classify the orthologous gene products [56]. The COG database was used for alignment and for prediction and classification of the possible functions of all A. lancea unigenes. The results revealed that 17,881 unigenes were assigned to 25 COG classifications (Fig 5). Among the 25 COG categories, the largest group was “general function prediction” (5837 unigenes), followed by “transcription” (3105 unigenes) and “replication, recombination & repair” (2732 unigenes). Only a few genes were related to the terms “extracellular structures” and “nuclear structures”.
Fig 5

COG function classification of all unigenes.

The annotated unigenes are divided into a variety of functional orthologous groups, which are indicated by letters A-Z and annotated beside the figure.

To further identify the interactions and biological functions of genes in A. lancea, KEGG was used to map all annotated sequences to canonical reference pathways [57]. KEGG was employed as a reference database of pathway networks for the integration and interpretation of large-scale datasets generated by high-throughput sequencing technology [58, 59]. Given that some unigenes were assigned to several KEGG pathways during the analysis, 26,233 unigenes were assigned to 128 KEGG pathways (S1 Table). The most represented was “Metabolic pathways” (5971 unigenes, 22.76% of those annotated in the KEGG database), followed by “Biosynthesis of secondary metabolites” (2957 unigenes, 11.27%), “Plant-pathogen interaction” (1608 unigenes, 6.13%), “Plant hormone signal transduction” (1396 unigenes, 5.32%) and “Ribosome” (1174 unigenes, 4.48%).

Differentially expressed genes (DEGs) in the leaf vs. stem, leaf vs. root and root vs. stem in A. lancea

A major use of transcriptome sequencing is to compare levels of gene expression among different samples. To check the differences in gene expression among the three libraries from leaf, stem and root, the tag frequencies of leaf vs. stem, leaf vs. root and root vs. stem were compared. The expression of all unigenes was calculated by the FPKM method (fragments per kb per million reads). The fragment density measures were first normalized, and a false discovery rate (FDR) < 0.001 together with an absolute value of |log2Ratio| ≥ 1 was used as the threshold to judge the significance of gene expression differences. Fig 6 shows the genes with at least a two-fold transcript difference among the three libraries. We identified 22543, 18263 and 16370 unigenes that were differentially expressed in the leaf vs. stem, leaf vs. root and root vs. stem comparisons, respectively (S2 Table). Of these, 11642, 8668 and 9038 unigenes were up-regulated and 10901, 9605 and 7232 unigenes were down-regulated, that is, with an expression ratio greater than 2 or less than 0.5, in leaf vs. stem, leaf vs. root and root vs. stem, respectively. Among these differentially expressed genes, most were found expressed in the root, followed by the stem and leaf. One possible explanation is that the diverse chemical synthesis of the plant largely takes place in the root.
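The FPKM normalization and the thresholds quoted above (FDR < 0.001, |log2Ratio| ≥ 1) can be sketched in Python as follows. The function names, the direction convention (sample B relative to sample A) and the assumption that both FPKM values are positive are illustrative choices, not details taken from the study.

```python
import math


def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def classify_deg(fpkm_a, fpkm_b, fdr, fdr_cutoff=0.001, min_abs_log2=1.0):
    """Label a unigene as 'up', 'down' or 'not significant' in B relative to A.

    Assumes fpkm_a and fpkm_b are both > 0; zero expression would need a
    pseudocount or separate handling, which is left out of this sketch.
    """
    log2_ratio = math.log2(fpkm_b / fpkm_a)
    if fdr >= fdr_cutoff or abs(log2_ratio) < min_abs_log2:
        return "not significant"
    return "up" if log2_ratio > 0 else "down"
```

With these thresholds, |log2Ratio| ≥ 1 corresponds to an expression ratio of at least 2 or at most 0.5, which is the two-fold difference described in the text.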
Fig 6

Differentially expressed genes profiling of three libraries of leaf, root and stem of Atractylodes lancea.

The red and green columns indicate up- and down-regulated genes in comparisons of leaves, stem and root libraries in A. lancea. FDR≤0.05 and the absolute value of Log2FC Ratio ≥1 were used as the threshold to judge the significance of gene expression difference from transcriptome data.

Analysis of A. lancea unigenes related to terpenoid backbone biosynthesis

Based on the Nr annotation, a total of 77 contigs/unigenes were identified as genes of the MVA pathway, which include acetyl CoA C-acetyltransferase (AACT), 3-hydroxy-3-methylglutaryl CoA synthase (HMGS), 3-hydroxy-3-methylglutaryl CoA reductase (HMGR), mevalonate kinase (MK), phosphomevalonate kinase (PMK), mevalonate-5-pyrophosphate decarboxylase (MDC), isopentenyl diphosphate isomerase (IPPI), geranyl diphosphate synthase (GPPS), farnesyl diphosphate synthase (FPPS), beta-caryophyllene synthase (QHS1), germacrene D synthase (GDS), germacrene A synthase (GAS) and E-β-farnesene synthase (β-FS). These genes produce four different types of sesquiterpenoids: β-caryophyllene, germacrene D, germacrene A and E-β-farnesene [21, 60–62]. It also has to be pointed out that, owing to the limitation of short reads in RNA-seq, some unigenes assembled by the software are too short to represent real transcripts. Other unigenes are long enough to cover one or two domains of a protein of usual size, but they are almost identical to a longer transcript except for a small part of the fragment. These unigenes are likely from one gene, possibly generated by alternative transcripts or assembly error. In the end, we predicted 33 unigenes that are responsible for the enzymatic steps of the MVA pathway (Fig 1). Nevertheless, these unigenes need to be confirmed by future cloning. Based on our analysis, up to 10 non-redundant unigenes were present in the plastidial MEP pathway, which is responsible for the synthesis of isopentenyl diphosphate, the building block of terpenoids.
These included 2 unigenes for 1-deoxy-D-xylulose-5-phosphate synthase (DXPS), 2 unigenes for 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), 1 unigene for 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase (MCT), 2 unigenes for 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase (CMK), 1 unigene for 2-C-methyl-D-erythritol-2,4-cyclodiphosphate synthase (MECPS), 1 unigene for 4-hydroxy-3-methylbut-2-(E)-enyl diphosphate synthase (HDS), and 1 unigene for 4-hydroxy-3-methylbut-2-(E)-enyl diphosphate reductase (HDR). Most of the candidate genes from the MEP pathway were up-regulated in leaves, except DXPS and DXR. One DXPS (CL8765.Contig3_All) is up-regulated in the root and one (Unigene15742_All) is down-regulated in the stem. In the case of HDR, one of two unigenes (CL5530.Contig1_All) is up-regulated in the root (S1 Table). We also found that the genes of the MEP pathway showed higher expression in leaves than in root and stem at the transcriptional level. Only one unigene was found encoding isopentenyl diphosphate delta-isomerase, which catalyzes the interconversion of isopentenyl diphosphate and dimethylallyl diphosphate. Moreover, we found the prenyltransferases that generate the higher-order building blocks from which the different categories of terpenoids originate: farnesyl diphosphate synthase (2 contigs/unigenes) and geranyl diphosphate synthase (8 unigenes). The protein sequences of all the transcripts are provided in S1 Table.

Analysis of A. lancea unigenes related to sesquiterpenoids biosynthesis

Sesquiterpenoids are derived from FPP, which can be cyclized by different types of enzymes to produce various structures [63]. In this study, 19 contigs/unigenes (≥200 bp) were annotated as involved in the biosynthesis of four different types of sesquiterpenoid, through β-caryophyllene synthase, germacrene D synthase, E-β-farnesene synthase and germacrene A synthase (Fig 1). These contigs/unigenes are CL4471.contig1_ALL, CL8689.contig1_ALL, Unigene14966_All, Unigene20711_All, Unigene20756_All, Unigene20757_All, Unigene21327_All, Unigene21328_All, Unigene21329_All, Unigene21417_All, Unigene23621_All, Unigene25222_All, Unigene32673_All, Unigene33174_All, Unigene33794_All, Unigene34375_All, CL1332.Contig1_All, CL1748.Contig6_All and CL5528.Contig1_All. We noticed that the above four sesquiterpenoid synthases share high homology, and it is difficult to separate them from each other without experimental confirmation. Furthermore, these enzymes are also homologous to sesquiterpene cyclase, β-pinene synthase, α-isocomene synthase, etc. All these enzymes are commonly grouped as sesquiterpene synthases. However, only one unigene (CL5528.Contig1_All) among the 19 sesquiterpene synthases is predicted with certainty to be germacrene A synthase. In another case, β-eudesmol synthase was reported to be responsible for the biosynthesis of the specific sesquiterpene β-eudesmol [64], but we were unable to find any candidate matching this enzyme with the current search criteria. Previous studies showed that β-eudesmol was not always present in A. lancea samples [23, 24]. In our study, β-eudesmol synthase was not detected, either because its expression was too low to be captured or because of poor fragmentation or enrichment during the RNA-seq process. Pinpointing all the enzymes specific to A. lancea sesquiterpenoid biosynthesis remains an intriguing goal.
A previous study indicates that cytochrome P450 oxidases (CYPs) play an important role in the generation of all kinds of terpenoid derivatives [65]. In a search for all possible CYPs encoded by the A. lancea transcriptome, a total of 3,241 CYP contigs were found in the Nr annotation. After filtering out redundant contigs with possibly the same predicted functions, the number of CYPs was narrowed down to 369 (S1 Table); however, this is still 1.5 times more than that of Arabidopsis thaliana [65], indicating more sophisticated chemical processing and diversity. Besides the sesquiterpenes, there are other kinds of terpenes present at lower levels in A. lancea. This correlates with the discovery of a large number of CYP genes, some of which are predicted to be terpene modifiers. All these components lay the foundation of the chemical diversity that underlies its use in treating various diseases. The study of the biochemical properties of the enzymes of sesquiterpene biosynthesis has made substantial progress in past years, such as the discovery of committed enzymatic steps in the biosynthesis of sesquiterpenes [66, 67]. However, the identification and cloning of these enzymes is more challenging. Other than β-caryophyllene synthase, germacrene D synthase, E-β-farnesene synthase and germacrene A synthase, a few similar genes have been elucidated, such as tomato sesquiterpene synthases Sst1 and Sst2 [68], sorghum terpene synthases (SbTPS1-SbTPS7) [69], and Cstps1, a sesquiterpene synthase-encoding gene involved in citrus aroma formation [70]. In our annotation database, we could sort out a number of sesquiterpene synthases. But owing to the structural similarity between the sesquiterpenes, the sesquiterpene synthases also share close homology. Future studies may establish whether these sesquiterpene synthase candidates can be divided into subgroups, each specific to one kind of sesquiterpene.

SSR marker development in Atractylodes lancea

SSRs are among the principal molecular markers. These repetitive DNA sequences make up a substantial part of higher eukaryotic genomes. Being typically co-dominant and highly polymorphic, they are widely used as marker systems for genetic mapping and molecular breeding in a wide variety of species [71-76]. To develop SSR markers in A. lancea and find potential microsatellites, all 62,352 unigenes produced in the current study were searched. SSRs were defined as di- to hexa-nucleotide motifs with at least four repeat units, except for di-nucleotide motifs, which required a minimum of six repeats, and tri-nucleotide motifs, which required a minimum of five repeats. Using different primers (S1 Table), a total of 6,864 microsatellites were identified in 5,970 unigenes, of which 757 contained more than one SSR. Di-nucleotide motifs were the most abundant type (3,122, 45.48%), followed by tri-nucleotide (2,307, 33.61%), hexa-nucleotide (432, 6.29%), penta-nucleotide (303, 4.41%) and tetra-nucleotide (130, 1.89%) motifs (Table 3). AG/CT was the most abundant motif among all the SSRs found (2,252, 32.80%), followed by AC/GT (523, 7.61%), ACC/GGT (505, 7.35%) and AAG/CTT (484, 7.05%) (Fig 7). Conventional methods for SSR marker development are expensive, arduous and time-consuming. High-throughput sequencing is a powerful and inexpensive tool for transcriptome sequencing [77], and SSR markers mined from transcriptome data have been developed and used in many species [55, 78, 79].
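The repeat thresholds described above can be illustrated with a short Python sketch. This is not the pipeline used in the study (the text does not name the search tool); it is a minimal regular-expression scan for perfect repeats, and the names `MIN_REPEATS`, `find_ssrs` and `motif_class` are hypothetical. The second function shows one plausible way to arrive at labels such as AG/CT, by grouping a motif with its cyclic shifts and its reverse complement.

```python
import re

# Minimum repeat counts per motif length, following the thresholds in the text:
# di-nucleotide >= 6, tri-nucleotide >= 5, tetra- to hexa-nucleotide >= 4.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AGAG)."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(sequence):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = sequence.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            match = pattern.search(seq, pos)
            if match is None:
                break
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
                pos = match.end()
            else:
                # Non-primitive unit (e.g. AA or AGAG): step forward and retry.
                pos = match.start() + 1
    return sorted(hits)


def motif_class(motif):
    """Label a motif together with its reverse complement, e.g. GA -> AG/CT."""
    def smallest_rotation(m):
        return min(m[i:] + m[:i] for i in range(len(m)))

    motif = motif.upper()
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    pair = sorted([smallest_rotation(motif), smallest_rotation(reverse_complement)])
    return "/".join(pair)
```

For example, a sequence carrying seven AG repeats and five ACC repeats yields two hits, while a run of only three AT repeats falls below the di-nucleotide threshold and is ignored. Mono-nucleotide repeats are left out of the sketch because the text does not state a threshold for them.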
Table 3

A summary of SSRs identified in Atractylodes lancea.

Searching item                                          Number
Total number of sequences examined                      62,352
Total size of examined sequences (bp)                   56,923,290
Total number of identified cSSRs                        6,864
Number of cSSR-containing sequences                     5,970
Number of sequences containing more than one cSSR       757
Number of cSSRs present in compound formation           303
Mono-nucleotides                                        570
Di-nucleotides                                          3,122
Tri-nucleotides                                         2,307
Tetra-nucleotides                                       130
Penta-nucleotides                                       303
Hexa-nucleotides                                        432
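As a quick consistency check on Table 3, the class shares quoted in the text can be recomputed from the counts. The sketch below is illustrative only; `SSR_COUNTS` and `class_shares` are hypothetical names, and the counts are copied from the table. It suggests that the quoted percentages are taken relative to all 6,864 SSRs, including the 570 mono-nucleotide repeats, which is why the di- to hexa-nucleotide shares do not add up to 100%.

```python
# Counts per repeat class, copied from Table 3.
SSR_COUNTS = {
    "mono": 570,
    "di": 3122,
    "tri": 2307,
    "tetra": 130,
    "penta": 303,
    "hexa": 432,
}


def class_shares(counts):
    """Return each class's share of all SSRs in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


shares = class_shares(SSR_COUNTS)
for name, pct in shares.items():
    print(f"{name:>5}: {SSR_COUNTS[name]:>5}  {pct:.2f}%")
```

The class counts sum to the reported total of 6,864, and the recomputed di-, tri-, tetra-, penta- and hexa-nucleotide shares agree with the values given in the text.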
Fig 7

Quantity statistics of SSR classification: the X-axis shows the number of repeats of each repeat unit; the Y-axis shows the number of SSRs from Atractylodes lancea.

Di-nucleotide repeats were the most numerous category, and among them AG/CT was the most abundant motif.

Discovery of single nucleotide polymorphisms (SNPs)

Mining SNPs by mapping the cDNA libraries revealed 91,540, 81,678 and 87,329 SNPs across 112,883, 94,663 and 101,679 contigs in the leaf, root and stem of A. lancea, respectively (Table 4). A total of 260,547 heterozygous SNPs were detected across the three samples, of which 165,120 were transitions and 95,427 were transversions (Fig 8). We also found several prospective SNP markers that can be useful for phylogenetic and population genetic studies of A. lancea. The identified SNP markers can assist marker selection for genetic association analysis in further research and the identification of functional variations [80, 81]. The large number of SNPs identified provides a wealth of potential markers for applications such as linkage mapping, population genetics, gene-based association studies and comparative genomics.
Table 4

A summary of SNP results in Atractylodes lancea.

SNP type         Leaf      Root      Stem      Total
Transition       57,839    51,702    55,579    165,120
AG               29,135    26,042    28,015    83,192
CT               28,704    25,660    27,564    81,928
Transversion     33,701    29,976    31,750    95,427
AC               8,428     7,402     7,902     23,732
AT               9,601     8,564     8,989     27,154
GC               7,281     6,629     6,952     20,862
GT               8,391     7,381     7,907     23,679
Total            91,540    81,678    87,329    260,547
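The transition/transversion split in Table 4 follows from base chemistry: a substitution between two purines (A, G) or two pyrimidines (C, T) is a transition, and any purine-pyrimidine exchange is a transversion. The Python sketch below illustrates that rule and applies it to the Total column of Table 4. The function names are hypothetical, and the resulting ratio is plain arithmetic on the table values rather than a figure reported in the text.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Total column of Table 4, keyed by the two bases involved in each SNP type.
SNP_TOTALS = {
    "AG": 83192,
    "CT": 81928,
    "AC": 23732,
    "AT": 27154,
    "GC": 20862,
    "GT": 23679,
}


def snp_class(base1, base2):
    """Classify a substitution between two different bases."""
    pair = {base1.upper(), base2.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a valid SNP: {base1}/{base2}")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def ts_tv_counts(totals):
    """Sum the counts of transitions and transversions over all SNP types."""
    ts = sum(n for bases, n in totals.items() if snp_class(*bases) == "transition")
    tv = sum(n for bases, n in totals.items() if snp_class(*bases) == "transversion")
    return ts, tv


ts, tv = ts_tv_counts(SNP_TOTALS)
print(f"transitions={ts}, transversions={tv}, Ts/Tv={ts / tv:.2f}")
```

Applied to Table 4, the rule reproduces the 165,120 transitions and 95,427 transversions stated in the text, which corresponds to a transition-to-transversion ratio of roughly 1.73.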
Fig 8

Statistics of SNP number.

The X-axis shows the SNP types; the Y-axis shows the number of SNPs.

Conclusion

People in East and Southeast Asian countries such as China, Japan and Thailand have long used A. lancea as a traditional medicine for various diseases. Here we report Illumina transcriptome sequencing, functional annotation and differential expression profiles of different tissues (stem, root and leaf) of A. lancea, which will be an important resource for gene mining, genetic improvement and the development of molecular markers. In the current study, 62,352 high-quality unigenes were obtained from these tissues. Additionally, we found the unigenes encoding the enzymes involved in the terpenoid backbone biosynthesis pathway as well as in sesquiterpenoid biosynthesis, which will help future functional and comparative genomic research on this important plant.

Unigenes from the transcriptome of Atractylodes lancea annotated by KEGG.

26,233 unigenes were assigned to 128 KEGG pathways. (XLSX)

Differentially expressed unigenes from three libraries of Atractylodes lancea, showing up- and down-regulated unigenes (leaf vs stem, leaf vs root, root vs stem).

(XLSX)

Putative unigenes and their enzymes of terpenoid backbone biosynthesis and of four different types of sesquiterpenoids, with FPKM values showing their expression in three different tissues of Atractylodes lancea.

(XLSX)

Protein sequences of all the transcripts involved in sesquiterpenoid biosynthesis.

(TXT)

All contigs and unigenes for CYP450 in Atractylodes lancea.

(XLSX)

Primers used for SSR analysis in Atractylodes lancea.

(XLS)