Yuko Makita, Setsuko Shimada, Mika Kawashima, Tomoko Kondou-Kuriyama, Tetsuro Toyoda, Minami Matsui.
Abstract
In transcriptome analysis, accurate annotation of each transcriptional unit and its expression profile is essential. A full-length cDNA (FL-cDNA) collection facilitates the refinement of transcriptional annotation, and accurate transcription start sites help to unravel transcriptional regulation. We constructed a normalized FL-cDNA library from eight growth stages of aerial tissues in Sorghum bicolor and isolated 37,607 clones. These clones were Sanger sequenced from the 5' and/or 3' ends, and in total 38,981 high-quality expressed sequence tags (ESTs) were obtained. About one-third of the transcripts of known genes were captured as FL-cDNA clone resources. In addition, we annotated 272 novel genes, 323 antisense transcripts and 1,672 candidate isoforms. These clones are available from the RIKEN Bioresource Center. After obtaining accurate annotation of transcriptional units, we performed expression profile analysis. We carried out spikelet-, seed- and stem-specific RNA sequencing (RNA-Seq) analysis and confirmed the expression of 70.6% of the newly identified genes. We also downloaded 23 publicly available sorghum RNA-Seq samples, which are shown on a genome browser together with our original FL-cDNA and RNA-Seq data. Using our original and publicly available data, we made an expression profile of each gene and identified the top 20 genes with the most similar expression. In addition, we visualized their relationships in gene co-expression networks. Users can access and compare various transcriptome data from S. bicolor at http://sorghum.riken.jp.
Keywords: Database; FL-cDNA; NGS; New transcript; Plant; Sorghum
Year: 2014 PMID: 25505007 PMCID: PMC4301747 DOI: 10.1093/pcp/pcu187
Source DB: PubMed Journal: Plant Cell Physiol ISSN: 0032-0781 Impact factor: 4.927
Sampling tissue and stage details for FL-cDNA and RNA-Seq data
| Category | Sample name | Stage |
|---|---|---|
| FL-cDNA | Aerial tissues 1 | 7 d after sowing |
| FL-cDNA | Aerial tissues 2 | 14 d after sowing |
| FL-cDNA | Aerial tissues 3 | 30 d after sowing |
| FL-cDNA | Aerial tissues 4 | 60 d after sowing |
| FL-cDNA | Aerial tissues 5 | 90 d after sowing |
| FL-cDNA | Aerial tissues 6 | 150 d after sowing (at the time of anthesis) |
| FL-cDNA | Aerial tissues 7 | 165 d after sowing |
| FL-cDNA | Aerial tissues 8 | 180 d after sowing |
| RNA-Seq | Spikelet | 150 d after sowing (at the time of anthesis) |
| RNA-Seq | Seed | 165 d after sowing |
| RNA-Seq | Stem | 150 d after sowing |
Aerial tissues contain leaves, stems and panicles.
FL-cDNA sequence resources in S. bicolor
| Category | No. |
|---|---|
| Partial full-length cDNA sequences | 38,981 |
| Sanger 5′ ESTs | 37,607 |
| Sanger 3′ ESTs | 1,374 |
| Total sequences mapped onto the genome | 36,700 |
| No. of genes (loci) annotated by our data | 10,811 |
| Overlapped known genes | 9,566 |
| Partially overlapped known genes | 650 |
| Unknown (newly identified) | 272 |
| Antisense transcripts | 323 |
| Full-length cDNA reached from both ends (contigs) | 814 |
| Full-length cDNA reached from both ends (genes) | 255 |
Fig. 1 (A) Distance from our identified transcription start site (TSS) to the nearest TSS in the Sbicolor_255 annotation. (B) Distance from our identified TSS to the translation start site (ATG).
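The distance plotted in Fig. 1(A) can be illustrated with a minimal sketch. The source does not describe how the distances were computed, so everything here is an assumption: the function name `nearest_tss_distance`, the coordinates, and the simplification that all positions lie on the forward strand of one chromosome.

```python
from bisect import bisect_left


def nearest_tss_distance(observed_tss, annotated_tss):
    """Signed distance from an observed TSS to the closest annotated TSS.

    `annotated_tss` must be sorted in ascending order. A negative value
    means the observed TSS lies upstream of the annotated one (forward
    strand assumed; strand handling is omitted in this sketch).
    """
    if not annotated_tss:
        raise ValueError("no annotated TSS positions given")
    i = bisect_left(annotated_tss, observed_tss)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = annotated_tss[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda pos: abs(pos - observed_tss))
    return observed_tss - nearest


# Hypothetical coordinates, for illustration only.
annotated = sorted([1_000, 5_200, 9_750])
distances = [nearest_tss_distance(t, annotated) for t in (980, 5_260, 9_750)]
```

A histogram of such distances over all FL-cDNA 5′ ends would give a plot of the kind described in the caption.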
Number of gene model updates by the PASA pipeline using 242,797 ESTs
| Category | No. |
|---|---|
| UTR extension | 18,137 |
| Altered protein sequences | 309 |
| Stitched into gene structure | 274 |
| Merging multiple genes | 29 |
| Total | 18,601 |
Some models are in multiple classes.
Summary of overlapped genes between FL-cDNA and RNA-Seq data
| | No. of detected genes with FL-cDNA | No. of expressed genes in RNA-Seq | No. of genes shared between RNA-Seq and FL-cDNA |
|---|---|---|---|
| Known genes | 9,837 | 22,824 | 9,326 (94.8%) |
| Newly detected genes | 272 | 2,592 | 192 (70.6%) |
| Antisense | 323 | 223 | 53 (16.4%) |
The values in parentheses are the percentage of genes detected with FL-cDNA that are also expressed in RNA-Seq.
Known genes include partially overlapped transcripts.
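The percentages in the table follow from dividing the shared count by the FL-cDNA count in each row. The short check below uses only the numbers from the table; the variable and function names are illustrative.

```python
# (genes detected with FL-cDNA, genes shared with RNA-Seq), taken from the table.
overlap = {
    "Known genes": (9837, 9326),
    "Newly detected genes": (272, 192),
    "Antisense": (323, 53),
}


def shared_percentage(detected, shared):
    """Percentage of FL-cDNA-detected genes also expressed in RNA-Seq."""
    return round(100 * shared / detected, 1)


percentages = {name: shared_percentage(*counts) for name, counts in overlap.items()}
```

The result for newly detected genes matches the 70.6% confirmation rate quoted in the abstract.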
Fig. 2 Venn diagram showing tissue-specific gene expression profiling in spikelets, seeds and stems. In this figure, genes with FPKM values above five are regarded as expressed, and genes with FPKM values below one are regarded as non-expressed.
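The two FPKM thresholds in the Fig. 2 caption can be sketched as a simple classifier. The thresholds come from the caption; the rest is an assumption made for illustration, namely the "ambiguous" label for values between one and five, the rule that a tissue-specific gene is expressed in exactly one tissue and non-expressed in all others, and the example FPKM values.

```python
EXPRESSED_MIN = 5.0      # FPKM above this is regarded as expressed (Fig. 2)
NOT_EXPRESSED_MAX = 1.0  # FPKM below this is regarded as non-expressed (Fig. 2)


def expression_call(fpkm):
    """Classify one FPKM value using the Fig. 2 thresholds."""
    if fpkm > EXPRESSED_MIN:
        return "expressed"
    if fpkm < NOT_EXPRESSED_MAX:
        return "non-expressed"
    return "ambiguous"  # assumed label for the 1-5 FPKM range


def specific_tissue(profile):
    """Return the single tissue a gene is specific to, or None.

    Assumed rule: expressed in exactly one tissue, non-expressed in the rest.
    """
    calls = {tissue: expression_call(v) for tissue, v in profile.items()}
    on = [t for t, c in calls.items() if c == "expressed"]
    off = [t for t, c in calls.items() if c == "non-expressed"]
    if len(on) == 1 and len(off) == len(profile) - 1:
        return on[0]
    return None


# Hypothetical FPKM values, for illustration only.
seed_only = specific_tissue({"spikelet": 0.2, "seed": 31.0, "stem": 0.0})
unclear = specific_tissue({"spikelet": 3.0, "seed": 31.0, "stem": 0.0})
```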
Fig. 3 The web interface of the MOROKOSHI database. (A) Search function: retrieval with the keyword ‘starch’ and the resulting page. (B) Gene annotation for the Sobic.004G163700 gene from a variety of public databases. (C) Orthologous genes in Arabidopsis, rice, corn, Brachypodium, barley, wheat and Populus. (D) Mapping result of FL-cDNA clones and their raw sequence data. (E) Expression profile of Sobic.004G163700 using 26 samples of RNA-Seq data and their mapping results on GBrowse. (F) Up to 20 genes with expression most similar to Sobic.004G163700. (G) Gene co-expression network of Sobic.004G163700 and similarly expressed genes.
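The list of up to 20 similarly expressed genes in panel (F) can be illustrated with a small ranking sketch. The record does not state which similarity measure the database uses, so Pearson correlation is an assumption here, as are the function names, the gene identifiers and the expression values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / sqrt(var_x * var_y)


def most_similar(query, profiles, top_n=20):
    """Rank all other genes by correlation with `query`, highest first."""
    scores = [
        (pearson(profiles[query], profile), gene)
        for gene, profile in profiles.items()
        if gene != query
    ]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [(gene, r) for r, gene in scores[:top_n]]


# Hypothetical genes with expression values across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
top_two = most_similar("geneA", profiles, top_n=2)
```

In a real setting each profile would hold one value per RNA-Seq sample (26 in panel E), and pairs above a chosen correlation cutoff could serve as edges of a co-expression network like the one in panel (G).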