Hukam C Rawal, Shrawan Kumar, Amitha Mithra S V, Amolkumar U Solanke, Deepti Nigam, Swati Saxena, Anshika Tyagi, Sureshkumar V, Neelam R Yadav, Pritam Kalia, Narendra Pratap Singh, Nagendra Kumar Singh, Tilak Raj Sharma, Kishor Gaikwad.
Abstract
Clusterbean (Cyamopsis tetragonoloba L. Taub) is an important industrial, vegetable and forage crop. It owes its commercial importance to the guar gum (galactomannans) in its endosperm, which is used as a lubricant in a range of industries. Despite the crop's relevance to agriculture and industry, few genomic resources are available for it. The present study was therefore undertaken to generate an RNA-Seq-based transcriptome from leaf, shoot and flower tissues. A total of 145 million high-quality Illumina reads were assembled with Trinity into 127,706 transcripts and 48,007 non-redundant high-quality (HQ) unigenes. We annotated 79% of the unigenes against the Plant Genes database of the National Center for Biotechnology Information (NCBI), Swiss-Prot, Pfam, Gene Ontology (GO) and KEGG databases. Among the annotated unigenes, 30,020 were assigned 116,964 GO terms, 9984 were assigned Enzyme Commission (EC) numbers and 6111 were mapped to 137 KEGG pathways. At different fragments per kilobase of transcript per million fragments sequenced (FPKM) levels, gene expression was highest in flower tissue, followed by shoot and leaf. Additionally, we identified 8687 potential simple sequence repeats (SSRs), with an average frequency of one SSR per 8.75 kb. Of 28 SSR markers amplified in 21 clusterbean genotypes, 13 were polymorphic, with an average polymorphic information content (PIC) of 0.21. We also constructed a database named 'ClustergeneDB' for easy retrieval of the unigenes and the microsatellite markers. The tissue-specific genes identified and the molecular marker resources developed in this study are expected to aid the genetic improvement of clusterbean for its end use.
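FPKM normalizes a transcript's fragment count for both transcript length and library size: FPKM = 10⁹ × C / (L × N), where C is the number of fragments assigned to the transcript, L its length in bp and N the total number of fragments in the library. The sketch below assumes this standard definition; the record does not say which tool computed FPKM, and the numbers in the example are hypothetical.

```python
def fpkm(fragment_count: int, transcript_length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million fragments (standard definition)."""
    if transcript_length_bp <= 0 or total_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000       # length in kb
    millions = total_fragments / 1_000_000         # library size in millions of fragments
    return fragment_count / kilobases / millions

# Hypothetical: 500 fragments on a 2,000 bp transcript in a 10-million-fragment library
print(fpkm(500, 2_000, 10_000_000))  # → 25.0
```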
Keywords: Cyamopsis tetragonoloba; RNA-Seq; clusterbean; database; microsatellite markers; polymorphism; tissue-specific; transcriptome
Year: 2017 PMID: 29120386 PMCID: PMC5704226 DOI: 10.3390/genes8110313
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Summary of the trimming results with Trimmomatic for each cDNA library sequenced.
| Library/Sample | Number of Raw Reads (Paired) | Number of HQ Reads (Paired) | Number of HQ Reads (Unpaired) | HQ Reads (Bases) | Average Length of HQ Paired Reads (bp) |
|---|---|---|---|---|---|
|  | 16,325,263 | 11,896,062 | 3,967,411 | 2,408,942,984 | 88.54 |
|  | 41,575,280 | 30,676,700 | 9,701,552 | 6,192,249,152 | 88.91 |
|  | 92,030,452 | 66,439,261 | 22,849,685 | 13,560,175,894 | 88.90 |
| Total | 149,930,995 | 109,012,023 | 36,518,648 | 22,161,368,030 |  |
HQ: high quality.
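As a quick consistency check on the trimming table, the last row equals the column-wise sum of the three libraries. The paired and unpaired HQ read counts also add up to about 145.5 million. This is presumably the source of the "145 million high quality reads" quoted in the abstract, although the record does not spell this out. A minimal sketch:

```python
# Per-library values copied from the trimming table (library names are not given in the record).
libraries = [
    # (raw paired, HQ paired, HQ unpaired, HQ bases)
    (16_325_263, 11_896_062, 3_967_411, 2_408_942_984),
    (41_575_280, 30_676_700, 9_701_552, 6_192_249_152),
    (92_030_452, 66_439_261, 22_849_685, 13_560_175_894),
]
reported_total = (149_930_995, 109_012_023, 36_518_648, 22_161_368_030)

# Sum each column across the three libraries and compare with the reported totals.
column_sums = tuple(sum(column) for column in zip(*libraries))
print(column_sums == reported_total)  # → True

# HQ paired + HQ unpaired counts from the total row.
hq_reads = reported_total[1] + reported_total[2]
print(hq_reads)  # → 145530671
```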
Transcriptome assembly and functional annotation of Cyamopsis tetragonoloba.
| Assembly Statistics | Data |
|---|---|
| Total Assembled | 127,706 (179.50 Mb) |
| Average Length | 1405.63 bp |
| GC% | 39.22 |
| ≥1000 bp | 64,606 (150.06 Mb) |
| ≥5000 bp | 2218 (138.75 Mb) |
| ≥10,000 bp | 53 (628.55 kb) |
| Largest Transcript | 16,940 bp |
| N50 Length | 2263 bp |
| N75 Length | 2931 bp |
| L50 | 26,460 |
| L75 | 14,819 |
| Total Number | 110,485 (152.13 Mb) |
| Average Length | 1376.95 bp |
| Number of HQ Unigenes | 48,007 (76.01 Mb) |
| Average Length (HQ Unigenes) | 1583.43 bp |
| N50 Length (HQ Unigenes) | 2179 bp |
| GC% (HQ Unigenes) | 39.87 |
| **Database Searched** | **Unigenes with significant hits** |
| Against NCBI-Plant-Genes | 37,382 |
| Against SwissProt DB | 28,905 |
| Against Pfam | 34,752 |
| With Gene Ontology (GO) terms | 30,020 |
| With Enzyme Commission (EC) numbers | 9984 |
| All annotated HQ unigenes | 37,442 |
| With no significant hit | 10,565 |
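N50 and L50 summarize assembly contiguity. Under the usual definition, sequences are sorted longest first and accumulated until they cover a given fraction of the total assembled length. Nx is the length of the sequence that crosses that point, and Lx is the number of sequences needed to reach it. The following is a minimal sketch under that assumption, using hypothetical lengths:

```python
def nx_lx(lengths: list[int], fraction: float = 0.5) -> tuple[int, int]:
    """Return (Nx, Lx): sort lengths longest first and accumulate until the running
    total reaches `fraction` of the assembly size. Nx is the length of the sequence
    that crosses the threshold; Lx is how many sequences were needed."""
    if not lengths:
        raise ValueError("no sequences")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: the running total ends at the full assembly size")

# Hypothetical contig lengths in bp (total 15,500 bp).
toy = [5000, 4000, 3000, 2000, 1000, 500]
print(nx_lx(toy))        # N50/L50 → (4000, 2)
print(nx_lx(toy, 0.75))  # N75/L75 → (3000, 3)
```

Under this convention N75 can never exceed N50. The larger N75 reported in the table therefore suggests the original pipeline used a different convention, which the record does not specify.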
Figure 1. Statistics of BLAST search results of HQ unigenes against the Plant Genes database. (a) Length-wise distribution of HQ unigene (query) sequences with significant matches (E-value ≤ 1 × 10⁻¹⁰); a very high proportion (>98%) of long unigenes (>3 kb) show significant matches; (b) similarity distribution of the best BLAST hit for each unigene with significant matches, showing that 70.33% of these have 80 to 100% sequence similarity; (c) percent distribution of HQ unigenes by E-value; (d) species distribution of HQ unigene (query) sequences with significant matches, with the largest share (31.70%) having their top BLAST hits against Glycine max.
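Panels (a), (b) and (d) depend on keeping, for each unigene, its best hit that passes the E-value cutoff. The minimal sketch below makes two assumptions. First, the input is BLAST tabular output (`-outfmt 6`, default 12 columns). Second, "best" means the highest bit score, since the record does not describe the selection rule. The report lines and subject IDs are hypothetical.

```python
import csv
import io

# Cut-off stated in the Figure 1 caption.
EVALUE_CUTOFF = 1e-10

def best_hits(blast_tsv: str) -> dict[str, tuple[str, float]]:
    """Map each query to (subject, percent identity) of its highest-bit-score hit
    among hits with E-value <= EVALUE_CUTOFF.

    Assumes the 12 default -outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: dict[str, tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        query, subject, pident = row[0], row[1], float(row[2])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > EVALUE_CUTOFF:
            continue  # not a significant match
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return {query: (subject, pident) for query, (subject, pident, _) in best.items()}

def share_in_identity_range(hits: dict[str, tuple[str, float]],
                            low: float = 80.0, high: float = 100.0) -> float:
    """Fraction of best hits whose percent identity lies in [low, high]."""
    if not hits:
        return 0.0
    inside = sum(1 for _, pident in hits.values() if low <= pident <= high)
    return inside / len(hits)

# Hypothetical report: two hits for u1, one weak hit for u2.
report = (
    "u1\thitA\t92.5\t300\t20\t1\t1\t300\t5\t304\t1e-150\t520\n"
    "u1\thitB\t85.0\t290\t40\t2\t1\t290\t3\t292\t1e-120\t410\n"
    "u2\thitC\t65.0\t120\t40\t3\t1\t120\t10\t129\t1e-5\t60\n"
)
hits = best_hits(report)
print(hits)                           # → {'u1': ('hitA', 92.5)}
print(share_in_identity_range(hits))  # → 1.0
```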
Figure 2. Functional classification of unigenes based on GO terms, showing the distribution of unigenes at GO level 2 across three categories: Biological Process, Molecular Function and Cellular Component.
Figure 3. Top 5 hits against the InterProScan repeat, family and domain entries and the Pfam database.
Figure 4. Tissue-specific expression in C. tetragonoloba. (a) Expressed genes (FPKM ≥ 1) in the three sampled tissues; (b) gene expression at different fold levels: at each level of 5 FPKM or higher, genes were expressed more highly in the flower tissue (reproductive stage) than in the vegetative-stage tissues (shoot and leaf); (c) heatmap of differentially expressed genes (DEGs) versus samples, showing cluster analysis of 38,423 DEGs for tissue-specific expression across the three tissues. The DEGs were partitioned into 10 clusters with similar expression patterns, each containing 3755 to 3940 genes. The color scale (top left) represents normalized expression values.
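Panel (a) counts the genes that reach FPKM ≥ 1 in each tissue. Raising the cut-off gives per-tissue counts of the kind compared in panel (b). This is a minimal sketch over a hypothetical gene-by-tissue FPKM matrix; the study's actual values are not in this record.

```python
# Hypothetical FPKM values (gene -> tissue -> FPKM).
fpkm_table = {
    "g1": {"leaf": 0.5, "shoot": 2.0, "flower": 12.0},
    "g2": {"leaf": 6.0, "shoot": 7.5, "flower": 30.0},
    "g3": {"leaf": 0.0, "shoot": 0.8, "flower": 1.5},
    "g4": {"leaf": 9.0, "shoot": 4.0, "flower": 5.0},
}
TISSUES = ("leaf", "shoot", "flower")

def expressed_counts(table: dict[str, dict[str, float]], threshold: float) -> dict[str, int]:
    """Count genes whose FPKM in each tissue is at least `threshold`."""
    return {
        tissue: sum(1 for values in table.values() if values[tissue] >= threshold)
        for tissue in TISSUES
    }

# FPKM >= 1 corresponds to panel (a); the higher cut-offs loosely mirror panel (b).
for threshold in (1, 5, 10):
    print(threshold, expressed_counts(fpkm_table, threshold))
```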
Statistics of simple sequence repeats (SSRs) identified by MISA.
| Features | Transcripts | HQ Unigenes |
|---|---|---|
| Total number of sequences examined | 127,706 | 48,007 |
| Total size of examined sequences (bp) | 179,507,503 | 76,015,970 |
| Total number of identified SSRs | 17,593 | 8687 |
| Number of SSR containing sequences | 14,566 | 7047 |
| Number of sequences containing more than 1 SSR | 2430 | 1297 |
| Number of SSRs present in compound formation | 1137 | 590 |
| Frequency of SSRs | 1 SSR/10.20 kb | 1 SSR/8.75 kb |
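MISA scans sequences for perfect mono- to hexanucleotide repeats that meet a minimum repeat number. The frequency row is the examined sequence length divided by the number of SSRs: 76,015,970 bp / 8687 ≈ 8.75 kb for HQ unigenes. The sketch below is a simplified, MISA-like finder, not MISA itself. Its thresholds are the commonly used MISA defaults, and whether the study used them is an assumption. Compound SSRs are not merged. The test sequence is hypothetical.

```python
import re

# Minimum repeat units per motif length (unit size -> minimum repeats), following the
# commonly used MISA default definition; the study's actual settings are an assumption.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_reducible(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified: unlike MISA, nearby SSRs are not merged into compound SSRs, and a
    run is reported in the motif phase at which the leftmost match starts.
    """
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_reducible(motif):
                continue  # such runs are already reported under the shorter motif
            hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)

def ssr_frequency_kb(total_bp: int, n_ssrs: int) -> float:
    """Average spacing expressed as 'one SSR per X kb'."""
    return total_bp / 1_000 / n_ssrs

# Hypothetical sequence containing (AG)8, (T)12 and (GCA)5.
demo = "GGC" + "AG" * 8 + "TTC" + "T" * 12 + "GCA" * 5 + "T"
print(find_perfect_ssrs(demo))  # → [(3, 'AG', 8), (22, 'T', 12), (34, 'GCA', 5)]

# Values from the MISA table (HQ unigenes column).
print(round(ssr_frequency_kb(76_015_970, 8687), 2))  # → 8.75
```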
Figure 5. Validation of randomly selected simple sequence repeats from HQ unigenes. (a) Banding pattern of SSR primer amplification on genomic DNA of 21 varieties of C. tetragonoloba; (b) genetic relationships among the 21 clusterbean accessions, as revealed by the UPGMA method in the Numerical Taxonomy System (NTSYS-pc) ver. 2.1.
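The abstract summarizes marker informativeness as polymorphic information content (PIC). A common simplified form for a marker with allele (band-size) frequencies p_i is PIC = 1 − Σ p_i². The original Botstein formula subtracts a further term, and the record does not say which variant was used, so this form is an assumption. The marker names and allele calls below are hypothetical.

```python
from collections import Counter

def pic(allele_calls: list[str]) -> float:
    """Simplified polymorphic information content, PIC = 1 - sum(p_i ** 2),
    where p_i is the frequency of allele (band size) i among scored genotypes."""
    if not allele_calls:
        raise ValueError("no genotypes scored")
    total = len(allele_calls)
    return 1.0 - sum((count / total) ** 2 for count in Counter(allele_calls).values())

def mean_pic(markers: dict[str, list[str]]) -> float:
    """Average PIC over a set of markers."""
    return sum(pic(calls) for calls in markers.values()) / len(markers)

# Hypothetical calls for two SSR markers scored on 21 accessions.
markers = {
    "marker_1": ["A"] * 18 + ["B"] * 3,  # polymorphic
    "marker_2": ["A"] * 21,              # monomorphic
}
print(round(pic(markers["marker_1"]), 3))  # → 0.245
print(pic(markers["marker_2"]))            # → 0.0
print(round(mean_pic(markers), 3))         # → 0.122
```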