Shikha Kalra, Bhanwar Lal Puniya, Deepika Kulshreshtha, Sunil Kumar, Jagdeep Kaur, Srinivasan Ramachandran, Kashmir Singh.
Abstract
Chlorophytum borivilianum, an endangered medicinal plant species, is highly valued for its aphrodisiac properties, which are attributed to saponins present in the plant. Transcriptome information for this species is limited, and only a few hundred expressed sequence tags (ESTs) are available in public databases. To gain molecular insight into this plant, high-throughput transcriptome sequencing of leaf RNA was carried out on Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single-end reads were retained after quality filtering. Available (e.g., De Bruijn/Eulerian graph-based) and in-house bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with a coverage size of 22.42 Mb and an average length of 221 bp. Guanine-cytosine (GC) content was 44%. Bioinformatics analysis using the non-redundant protein (NR), Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were found. We identified simple sequence repeat (SSR) motifs in the transcripts, with an abundance of di-nucleotide SSR markers (43.1%). Large-scale expression profiling using Reads Per Kilobase per Million mapped reads (RPKM) showed the major genes involved in different metabolic pathways of the plant. The genes, ESTs and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum.
Year: 2013 PMID: 24376689 PMCID: PMC3871651 DOI: 10.1371/journal.pone.0083336
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Effect of k-mer size on assembly of transcriptome data.
| k-mer | Total contigs | Contigs >100 bp | Longest contig length (bp) | Average length (bp) | Total bases in contigs >100 bp | N50 (bp) |
| 17 | 1079131 | 99031 | 1862 | 178 | 17930159 | 180 |
| 21 | 516724 | 101445 | 3216 | 220 | 22401122 | 243 |
| 23 | 408567 | 101589 | 3168 | 221 | 2247281 | 245 |
| 25 | 337186 | 101765 | 3179 | 216 | 2207354 | 240 |
| 27 | 276611 | 101491 | 5256 | 211 | 2240112 | 232 |
| 31 | 183200 | 95717 | 5256 | 199 | 1911429 | 211 |
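The N50 column above is a standard contiguity statistic: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of all assembled bases. A minimal Python sketch follows. The function name `n50` and the optional `min_length` filter are illustrative assumptions; the record does not say which tool computed the values or whether contigs of 100 bp or shorter were excluded first.

```python
def n50(lengths, min_length=0):
    """Return the N50 of an iterable of contig lengths (in bp).

    min_length is a hypothetical filter: only contigs strictly longer
    than this value are counted (the source table reports contigs >100 bp).
    """
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs left after filtering")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        # The first contig that pushes the running total to half of the
        # assembled bases defines the N50.
        if running >= half:
            return length


if __name__ == "__main__":
    # Toy lengths, not data from the study.
    print(n50([10, 20, 30, 40]))
```

Comparing this value across k-mer sizes, as the table does, is one common way to choose an assembly setting, alongside the number and average length of contigs.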
Summary of filtered and assembled transcriptome data generated on the Illumina HiSeq 2000 platform using RNA isolated from leaf tissue of C. borivilianum.
| Total number of single-end reads | 22,595,634 |
| Number of reads obtained after quality filtering | 22,161,444 |
| Number of assembled transcripts | 101,589 |
| Average length of transcripts (bp) | 221 |
| Average coverage (bp) | 22,472,81 |
Figure 1. Overview of the C. borivilianum transcriptome assembly.
(A) Size distribution of the contigs obtained from de novo assembly of high quality clean reads. (B) Size distribution of the unigenes produced from further assembly of contigs after clustering.
Figure 2. Similarity search analysis of unigenes against the NR database.
(A) E-value distribution of the top BLAST hits for each unigene (E-value cutoff of 1.0e−5). (B) Similarity distribution of the best BLAST hits for each unigene. (C) Species distribution, shown as the percentage of the total homologous sequences (with an E-value ≤1.0e−5). We used the NCBI NR protein database for the similarity search and extracted the best hit of each sequence for analysis.
Figure 3. Gene Ontology (GO) classification of the C. borivilianum transcriptome.
GO term assignments to C. borivilianum unigenes based on significant plant species hits against the NR database. Results are summarized into three main GO categories (biological process, cellular component, molecular function) and 27 sub-categories.
Figure 4. Functional characterization and abundance of C. borivilianum transcriptome for enzyme classes.
C. borivilianum transcripts were classified into the top 20 most abundant enzyme classes; the area of each pie slice represents the actual number of transcripts.
Figure 5. Overrepresented pathways in the KEGG database with respect to (A) O. sativa and (B) A. thaliana.
Figure 6. Distribution of C. borivilianum transcripts in different transcription factor families.
Simple sequence repeats (SSRs) identified in transcripts of C. borivilianum.
| SSR mining | Number |
| Total number of sequences examined: | 101,141 |
| Total size of examined sequences (Mb): | 22.42 |
| Total number of identified SSRs: | 3,487 |
| Number of SSR containing sequences: | 3,321 (3.28%) |
| Number of sequences containing more than one SSR: | 160 |
| Number of SSRs present in compound formation: | 109 |
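SSR mining of the kind summarized above amounts to scanning each transcript for short motifs repeated in tandem. The sketch below shows one way to do this with a regular expression and a back-reference. It is not the authors' pipeline: the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are hypothetical choices made for illustration, since the record does not state the tool or thresholds used.

```python
import re

# Hypothetical thresholds: motif length (bp) -> minimum number of tandem repeats.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeats, start, end) tuples with 1-based inclusive coordinates."""
    seq = seq.lower()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies of itself.
        pattern = re.compile(r"([acgt]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("aa", "atat"),
            # so a dinucleotide run is not also reported as a tetranucleotide.
            if (motif + motif).find(motif, 1) != len(motif):
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((motif, repeats, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Toy sequence, not a contig from the study.
    print(find_ssrs("ccgtc" + "ag" * 7 + "ctgac"))
```

The fraction of SSR-containing sequences reported in the table follows directly from its counts: 3,321 of 101,141 sequences is about 3.28%.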
The discovery of SSR motifs in putative genes involved in saponin biosynthesis.
| Gene name | Unique sequence | SSR motif | Number of repeats | SSR start (bp) | SSR end (bp) | Sequence length (bp) |
| | contig 780100 | ag | 7 | 6 | 19 | 1326 |
| 1- | contig 801760 | ac | 20 | 1 | 20 | 787 |
| | contig 809736 | ta | 16 | 64 | 79 | 1647 |
| | contig 676689 | cat | 16 | 1 | 16 | 355 |
Figure 7. Chlorophytum borivilianum unigenes involved in two secondary metabolic pathways.
C. borivilianum unigenes involved in (A) saponin biosynthesis, (B) flavonoid biosynthesis and (C) alkaloid biosynthesis. The red number in brackets following each gene name indicates the number of corresponding unigenes.
Identification of genes involved in saponin biosynthesis along with their RPKM values.
| Gene Name | EC number | Transcript ID | Total ESTs | RPKM value | Number of reads |
| | 2.3.1.9 | contig 662605, 789282, 807132, 807650, 810900 | 5 | 420.22 | 2162 |
| | 2.3.3.10 | contig 617027, 642120, 642122, 642652, 643560, 648291, 662091, 685089, 714060, 719432, 730642, 750334, 752180, 787594, 791328 | 15 | 588.8 | 399 |
| | 1.1.1.34 | contig 616693, 642352, 728682, 731828, 739034, 750024, 751864, 757982, 761916, 765000, 777172, 789074, 805890 | 13 | 684 | 1144 |
| | 2.7.1.36 | contig 780100, 790236, 808328 | 3 | 116.9 | 337 |
| | 2.7.4.2 | contig 796674, 802836 | 2 | 83.9 | 351 |
| | 4.1.1.33 | contig 731872, 807394 | 2 | 183.9 | 932 |
| | 2.2.1.7 | contig 630630, 639174, 659061, 672215, 692780, 701402, 716282, 758550, 770832, 776522, 785368, 788176, 796714, 799960, 801760, 807536, 809736, 812228 | 18 | 33.8 | 31 |
| | 1.1.1.267 | contig 767146, 807128, 811584 | 3 | 49.4 | 325 |
| | 2.7.7.60 | contig 697584, 812958 | 2 | 35.05 | 262 |
| | 2.7.1.148 | contig 815196 | 1 | 30.19 | 302 |
| | 4.6.1.12 | contig 809428 | 1 | 35.16 | 200 |
| | 1.17.7.1 | contig 768924, 813016, 815362 | 3 | 91.9 | 179 |
| | 1.17.1.2 | contig 663803, 674257, 674259, 697268, 713070, 757110, 770076, 775264, 776012, 776014, 780586, 780710, 813560 | 13 | 91.6 | 217 |
| | 5.3.3.2 | contig 627116, 640576, 677655, 684005, 771148, 793800 | 6 | 239.49 | 128 |
| | 2.5.1.29 | contig 669873, 709788, 771708, 791512, 804002 | 5 | 37.97 | 166 |
| | 2.5.1.10 | contig 622530, 663119, 669873, 694410, 709520, 709788, 759242, 764802, 771708, 774914, 782044, 785872, 791512, 804002 | 14 | 337.61 | 174 |
| | 2.5.1.21 | contig 660543, 669241, 676689, 684225, 720378, 762052, 771110, 771878, 776032 | 9 | 178.12 | 119 |
| | 1.14.13.132 | contig 625956, 650031, 684205, 699388, 718888, 724460, 735738, 749834, 752160, 753330, 759096, 772472, 773000, 788034 | 14 | 789.79 | 897 |
| | 5.4.99.8 | contig 634716, 641806, 644535, 660567, 679937, 737170, 761218, 762950, 777784, 785022, 796606, 802302, 805366, 814320, 815892 | 16 | 177.58 | 100 |
| | 5.4.99.39 | contig 664097 | 1 | 30.56 | 21 |
| | 4.2.1.125 | contig 663453 | 1 | 125.43 | 85 |
Genes involved in flavonoid biosynthesis along with their RPKM values.
| Gene Name | EC number | Transcript ID | Total ESTs | RPKM value | Number of reads |
| | 4.3.1.24 | contig 634698, 642066, 657575, 668909, 672281, 691416, 702626, 703366, 703960, 716568, 719604, 735030, 743228, 748710, 753296, 754648, 755468, 760124, 760936, 769576, 777484, 783272, 788948, 790736, 799910 | 25 | 241.09 | 474 |
| | 1.14.13.11 | contig 626042, 673051, 688226, 704750, 755308, 758396, 791376, 794622 | 8 | 207.68 | 111 |
| | 6.2.1.12 | contig 658459, 673103, 685642, 712208, 722868, 734074, 750294, 752608, 757946, 758914, 759276, 760860, 761416, 775342, 776318, 790346, 790364, 792818, 793014, 793670, 803146, 811102, 815080 | 23 | 1206.79 | 979 |
| | 2.3.1.74 | contig 619401, 623782, 642216, 664433, 694872, 703744, 711022, 729452, 742538, 750790, 752302, 775504, 776662, 790158, 803096, 804748 | 16 | 158.11 | 83 |
| | 5.5.1.6 | contig 618735, 654743, 666277, 716164, 716334, 761068, 765552, 772422, 776068, 783048, 785354, 790738 | 12 | 343.98 | 174 |
| | 1.14.13.21 | contig 645173, 688012, 697330, 716994, 717416, 719790, 732470, 733298, 746820, 756294, 765426, 780410, 781518, 796884, 799424, 805552, 813496 | 18 | 421.18 | 1013 |
| | 1.14.13.88 | contig 638078, 679673, 705666 | 3 | 31.43 | 18 |
| | 1.14.11.22 | contig 682129 | 1 | 30.66 | 24 |
| | 1.14.11.9 | contig 617797, 622328, 628840, 631844, 649435, 649459, 656551, 665101, 712318, 715714, 721514, 724718, 728206, 741180, 760132, 763024, 767854, 789424, 790626, 792572, 800076, 801784, 803294, 812892 | 24 | 475.11 | 263 |
| | 1.14.11.23 | contig 631844, 656551, 721514, 741180, 763024, 767854, 800076, 801784, 812892 | 9 | 475.11 | 263 |
| | 1.1.1.219 | contig 701984, 702106, 781542, 789140, 789820, 796570 | 6 | 729.12 | 679 |
| | 1.14.11.19 | contig 698156, 706394, 733490, 752474, 778182, 794658 | 6 | 18.9 | 61 |
| | 1.3.1.77 | contig 660955, 667861, 675719, 718414, 719766, 730968, 756916, 794760 | 8 | 22.83 | 17 |
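The RPKM values in the two expression tables normalize a read count by transcript length and by sequencing depth, following the general definition RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the transcript, N the total number of mapped reads and L the transcript length in bp. A minimal Python sketch of that definition follows. The function name `rpkm` and its parameter names are illustrative assumptions, and the record does not say how values were combined across the several contigs listed for each gene, so the sketch is not expected to reproduce the tabulated numbers.

```python
def rpkm(mapped_reads, transcript_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    All three arguments are hypothetical inputs for a single transcript;
    the per-gene aggregation used in the source tables is not specified.
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million).
    return mapped_reads * 1e9 / (transcript_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Toy numbers, not data from the study.
    print(rpkm(100, 1000, 1_000_000))
```

Because the length and depth terms both sit in the denominator, a short transcript with few reads can score higher than a long transcript with many reads, which is why read counts and RPKM values in the tables do not rank genes identically.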