Avneesh Kumar, Sunil Kumar, Savita Bains, Vanya Vaidya, Baljinder Singh, Ravneet Kaur, Jagdeep Kaur, Kashmir Singh.
Abstract
Phyllanthus emblica is a rich source of various therapeutic components. A few of them, such as vitamin C and flavonoids, are the predominant bioactive compounds and are used in a wide range of pharmacological applications. Despite these numerous applications, the genomic information available for this plant was limited to a few expressed sequence tags (ESTs) in DNA databases. Herein, we generated and characterized in-depth transcriptome information for P. emblica using the Illumina HiSeq 2000 platform. A total of 31,285,965 high-quality reads were assembled into 91,288 contigs with an N50 value of 358. Of these, 47,267 contigs were functionally annotated using a BLASTX search against the NCBI non-redundant (NR) protein database. Further, 31,366 contigs showed similarity with various gene ontology (GO) terms, and 1299 were related to different enzymes and biosynthetic pathways. We identified the transcripts related to each gene involved in flavonoid and vitamin C biosynthesis. Several cytochrome P450 (CYP) and glucosyltransferase (GT) genes involved in flavonoid biosynthesis and various other metabolic pathways were also documented. Further, 6510 transcription factors and 4420 EST-derived simple sequence repeat (SSR) markers were predicted. The present study elucidates various characteristic features of the P. emblica genome and provides an important resource for future molecular and functional genomics studies.
Keywords: Phyllanthus emblica; flavonoids; gene ontology; simple sequence repeats; transcription factors; transcriptome; vitamin C
Year: 2016 PMID: 27833630 PMCID: PMC5081490 DOI: 10.3389/fpls.2016.01610
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
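The abstract reports an N50 of 358 for the assembly. As a reminder of what that statistic means, here is a minimal sketch of the usual N50 definition: the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths, invented for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

Because N50 is weighted by sequence length rather than contig count, it can sit well above the mean contig length when many short contigs are present.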
Summary of transcriptome data generated on Illumina HiSeq 2000 for P. emblica.
| Total number of single-end reads | 32,382,864 |
| Number of reads obtained after quality filtering | 31,285,965 |
| Number of assembled transcripts | 91,288 |
| Singletons | 89,242 |
| Longest contig length (bp) | 5418 |
| Average length of transcripts (bp) | 278 |
| Total base covered (bp) | 25,400,212 |
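The average transcript length in the table above can be cross-checked from two other rows: total bases covered divided by the number of assembled transcripts. The short sketch below does only that arithmetic; the variable names are illustrative, and the two input values are copied from the table.

```python
# Values copied from the summary table.
total_bases_bp = 25_400_212   # "Total base covered (bp)"
n_transcripts = 91_288        # "Number of assembled transcripts"

# Mean length is total bases divided by the number of transcripts.
mean_length_bp = total_bases_bp / n_transcripts
# Same total expressed in megabases.
total_mb = total_bases_bp / 1_000_000

print(f"mean transcript length: {mean_length_bp:.1f} bp")
print(f"total assembled sequence: {total_mb:.2f} Mb")
```

The mean rounds to the 278 bp reported in the table.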
Figure 1. Similarity distribution with different plant species using the NR protein database (with an .
Figure 2. Overview of similarities of contigs found against GO, domain, and EC databases.
Figure 3. Gene Ontology (GO) classification of P. emblica. GO terms are summarized into three main categories (A, biological process; B, cellular component; C, molecular function) based on significant hits of unigenes against the NR database.
Figure 4. Abundance of enzyme classes (top 20) in P. emblica. The area of each pie slice represents the percentage of the actual number of transcripts.
Figure 5. Number of unigenes of . The top 22 families are shown here and the rest are listed in Supplementary Table S3.
Figure 6. Number in brackets following EC number indicates the number of unigenes identified for the corresponding gene.
Figure 7. Number in brackets indicates the number of unigenes identified for the corresponding gene.
Simple sequence repeats (SSRs) identified in transcripts of P. emblica.
| Total number of sequences examined: | 91,288 |
| Total size of examined sequences (Mb): | 25.42 |
| Total number of identified SSRs: | 4420 |
| Number of SSR containing sequences: | 4079 |
| Number of sequences containing more than one SSR: | 314 |
| Number of SSRs present in compound formation: | 248 |
| Mononucleotide | 986 |
| Dinucleotide | 1925 |
| Trinucleotide | 1479 |
| Tetranucleotide | 20 |
| Pentanucleotide | 6 |
| Hexanucleotide | 14 |
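The table above classifies SSRs by motif length, from mononucleotide to hexanucleotide. As a rough illustration of how perfect repeats of this kind can be located in a transcript, the following sketch scans a sequence with regular expressions. It is not the tool used in the study; the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen for the example, since this record does not state the criteria that were applied.

```python
import re

# Minimum repeat counts per motif length (1 = mono ... 6 = hexa).
# ASSUMPTION: illustrative thresholds, not the ones used in the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_count - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); the shorter motif class reports them.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Synthetic sequence, invented for illustration only.
example = "CC" + "GA" * 8 + "TTC" + "A" * 12 + "G"
print(find_ssrs(example))
```

Start positions are 0-based. Compound SSRs, which the table counts separately, would need an extra pass that merges hits lying within some maximum distance of each other.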
Simple sequence repeats (SSRs) identified in the genes involved in flavonoid and vitamin C biosynthesis.
| NODE_135356 | (A)11 | |
| NODE_45303 | (GA)20 | |
| NODE_6237 | (CT)22 | |
| NODE_119675 | (GA)5 | |
| NODE_25565 | (GCAT)3 | |
| NODE_15063 | (TG)5 | |
| NODE_14146 | (GGC)4 | |
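The motif column above uses the compact notation (motif)n, where n is the number of tandem copies. A small sketch of how that notation can be expanded into a sequence and a repeat length in bp follows; the helper name `expand_ssr` is hypothetical, and the motif strings are copied from the table.

```python
import re


def expand_ssr(notation):
    """Turn a string such as '(GA)20' into (motif, repeats, expanded sequence)."""
    match = re.fullmatch(r"\(([ACGT]+)\)(\d+)", notation.strip())
    if match is None:
        raise ValueError(f"unrecognised SSR notation: {notation!r}")
    motif, repeats = match.group(1), int(match.group(2))
    return motif, repeats, motif * repeats


# Motifs copied from the table above.
table_motifs = ["(A)11", "(GA)20", "(CT)22", "(GA)5", "(GCAT)3", "(TG)5", "(GGC)4"]

# Repeat-tract length in bp for each entry.
ssr_lengths = {m: len(expand_ssr(m)[2]) for m in table_motifs}
for notation, length in ssr_lengths.items():
    print(notation, length)
```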