Alessandro Sardu, Laura Treu, Stefano Campanaro.
Abstract
BACKGROUND: RNA-seq studies play an important role in both large-scale analysis of gene expression and transcriptome reconstruction. However, the lack of software specifically developed for analyzing transcriptome structure in lower eukaryotes has so far limited comparative studies among different species and strains.
Year: 2014 PMID: 25441755 PMCID: PMC4302112 DOI: 10.1186/1471-2164-15-1045
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schematic representation of the transcriptome assembly process performed by ORA. The circle indicates the gaps located between reference-based blocks.
Comparison among different methods used for transcriptome reconstruction
| | ORA (SOLiD) | Cufflinks (SOLiD) | Tiling arrays | 5’-RACE | Illumina |
|---|---|---|---|---|---|
| 5’-UTR | | | | | |
| ORA (SOLiD) | | nd * | (1753/2473) 71% * | (446/615) 72% * | (1712/2298) 75% * |
| Cufflinks (SOLiD) | nd ^ | | (1836/2903) 63% * | (586/885) 66% * | (1750/2656) 66% * |
| Tiling arrays | (860/1336) 64% ^ | (1882/3149) 60% ^ | | (1039/1281) 81% * | (3092/4180) 74% * |
| 5’-RACE | (488/721) 68% ^ | (572/974) 59% ^ | (1039/1281) 81% ^ | | (786/1009) 78% * |
| Illumina | (1784/2739) 65% ^ | (1599/2906) 55% ^ | (3092/4180) 74% ^ | (786/1009) 78% ^ | |
| 3’-UTR | | | | | |
| ORA (SOLiD) | | nd * | (1125/2473) 45% * | nd * | (1348/2527) 53% * |
| Cufflinks (SOLiD) | nd ^ | | (1293/2903) 45% * | nd * | (1356/2923) 46% * |
| Tiling arrays | (643/1336) 48% ^ | (1491/3149) 47% ^ | | nd * | (2774/4551) 61% * |
| 5’-RACE | nd ^ | nd ^ | nd ^ | | nd * |
| Illumina | (1609/3040) 53% ^ | (1621/3214) 50% ^ | (2774/4551) 61% ^ | nd ^ | |
Percentages of 5’-UTR and 3’-UTR regions determined using different methods and software and having length differences of ≤ 50 bases. For each comparison, the number of UTR regions with a length difference of ≤ 50 bases and the total number of UTRs identified with both methods are shown in parentheses. “ORA (SOLiD)” refers to the transcripts determined from our experiment using ORA, “Tiling arrays” refers to the data reported by Xu and colleagues [10], “5’-RACE” and “Illumina” refer to data reported by Nagalakshmi and colleagues [9], and “Cufflinks (SOLiD)” refers to the transcripts from our experiments analyzed using Cufflinks [20]. The top-right half of both matrices reports the 6 g/l results (marked with *), and the bottom-left half reports the 45 g/l results (marked with ^).
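The agreement measure described in this caption can be illustrated with a minimal sketch. It assumes that each method's UTR lengths are available as a mapping from gene identifier to length in bases; the function name `utr_agreement`, the gene identifiers, and the lengths are hypothetical and are not taken from the study's data or software.

```python
def utr_agreement(lengths_a, lengths_b, max_diff=50):
    """Compare UTR lengths from two methods.

    Returns (n_within, n_shared, percent), where n_shared is the number of
    genes with a UTR identified by both methods and n_within is the number
    of those whose lengths differ by at most max_diff bases.
    """
    shared = lengths_a.keys() & lengths_b.keys()
    n_within = sum(
        1 for gene in shared
        if abs(lengths_a[gene] - lengths_b[gene]) <= max_diff
    )
    n_shared = len(shared)
    percent = 100.0 * n_within / n_shared if n_shared else float("nan")
    return n_within, n_shared, percent


# Hypothetical toy data: gene identifier -> UTR length in bases.
method_a = {"geneA": 120, "geneB": 45, "geneC": 300, "geneD": 80}
method_b = {"geneA": 100, "geneB": 140, "geneC": 310, "geneE": 60}

n_within, n_shared, percent = utr_agreement(method_a, method_b)
# Same "(within/shared) percent" layout as the table cells.
print(f"({n_within}/{n_shared}) {percent:.0f}%")
```

As a consistency check on the layout, the cell reported as (1753/2473) corresponds to 1753 / 2473 ≈ 0.709, which rounds to the 71% shown in the table.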
Figure 2. Comparison between UTR sizes predicted using different methods. (a) Comparison between the 5’-UTR size predicted by ORA and 5’-RACE in the S288c strain. Positive values indicate transcripts with larger 5’-UTR size in the prediction obtained with ORA. (b) Comparison between the 3’-UTR size obtained with ORA and the tiling arrays (S288c strain). Positive values indicate transcripts with larger 3’-UTRs in the prediction obtained using ORA. Note the slight underestimation of the 3’-UTR size obtained using ORA. (c) Histogram reporting the difference between the length of the 5’-UTR in S288c predicted by Cufflinks and by 5’-RACE. Positive values indicate a larger 5’-UTR determined by Cufflinks. Note the slight overestimation of the 5’-UTR size obtained using Cufflinks.
Figure 3. Transcripts predicted in a region of chr IV (strain S288c) between ~270,600 bp and ~319,000 bp. From top to bottom: coverage on the forward strand, coverage on the reverse strand, genes (protein-encoding regions) (Genes), reconstruction of the transcripts obtained with ORA (ORA) and with Cufflinks (Cufflinks). In the row reporting the predictions of ORA, the introns are colored in red. Red numbers indicate key differences in transcript reconstruction between the two programs: (1) transcripts split into multiple “blocks” in the Cufflinks reconstruction because of gaps with no coverage in the coding region, (2) adjacent genes joined into polycistronic transcripts by Cufflinks despite large coverage differences.
Figure 4. Two examples of the transcript structure obtained in the reference strain S288c and vineyard strains EC1118 and P283. (a) Transcript reconstruction of the gene YBR249C (ARO4, 3-deoxy-D-arabino-heptulosonate-7-phosphate (DAHP) synthase) at 6 g/l. (b) The transcript of the gene YLR304C (ACO1; aconitase, required for the TCA cycle) at 45 g/l. The red and blue rods indicate the end of the UTR region and of the transcript, respectively. The y axis reports the coverage, while the x axis shows the relative position. In both examples, the genes are encoded on the reverse strand, and consequently the 5’-UTR is on the right side of the graph.
Selected results obtained from GO analysis
| GO category | Characteristics of the category | Genes | p-value |
|---|---|---|---|
| GO:0006790 - Sulfur compound metabolic process | Variable 5’-UTR in | | 1.5 × 10⁻⁵ (6 g/l) |
| GO:0000947 - Amino acid catabolic process to alcohol via Ehrlich pathway | Variable 5’-UTR in | | 6.4 × 10⁻⁴ (6 g/l) |
| GO:0016125 - Sterol metabolic process | Highly conserved 5’-UTR in | | 6.1 × 10⁻⁶ (6 g/l) |
| GO:0016125 - Sterol metabolic process | Variable 5’-UTR in | | 1.1 × 10⁻³ (6 g/l) |
| GO:0055085 - Transmembrane transport | Conserved SAUT among strains at 6 g/l | | 0.0052 (6 g/l) |
| GO:0031505 - Fungal-type cell wall organization | Conserved SAUT among | | 0.0059 (45 g/l) |
| GO:0006820 - Anion transport | Conserved SAUT among | | 0.019 (45 g/l) |
| GO:0022413 - Reproductive process in single cell organisms | Conserved SAUTs among | | 1.66 × 10⁻³ |
Relevant results obtained from the enrichment analysis of genes involved in selected GO processes. Enrichment was calculated with respect to the entire set of S. cerevisiae genes using YeastMine (http://yeastmine.yeastgenome.org/yeastmine/begin.do), and the p-value is reported in the rightmost column.
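For orientation, a GO enrichment p-value of this kind is often obtained from a one-sided hypergeometric test. The sketch below is only an illustration under that assumption: the caption does not state which test or multiple-testing correction was applied, the function name `hypergeom_enrichment_p` is hypothetical, and the numbers in the example are toy values rather than data from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_genome, n_category, n_selected, n_hits):
    """One-sided hypergeometric p-value, P(X >= n_hits).

    n_genome   : genes in the background set
    n_category : background genes annotated to the GO category
    n_selected : genes in the list being tested
    n_hits     : genes in the list annotated to the GO category
    No multiple-testing correction is applied (assumption: handled elsewhere).
    """
    upper = min(n_category, n_selected)
    total = comb(n_genome, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_genome - n_category, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 3 in the category, 4 selected, 2 hits.
p_value = hypergeom_enrichment_p(10, 3, 4, 2)
print(f"p = {p_value:.4f}")
```

Exact integer binomial coefficients keep the sketch free of external dependencies; for genome-scale lists a statistics library would be the more usual choice.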
Figure 5. Coverage (a) and length (b) of six classes of transcripts identified by ORA in the S288c strain at 6 g/l. From left to right: transcripts encoding proteins (prot. encod.), non-coding transcripts located antisense to other genes (mainly protein-encoding) (SAUT), non-coding transcripts located in intergenic regions (SUT), tRNAs, other non-coding RNAs (mainly small nuclear RNAs) and ncRNAs located in intronic regions. The number of transcripts identified for each class in the S288c strain at 6 g/l is shown in (a).
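The distinction between SAUTs (antisense to another gene) and SUTs (intergenic) can be sketched as a simple interval check. This is an illustrative assumption, not the rule implemented in ORA: the `Feature` type, the function names, the "any overlap" criterion, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def classify_noncoding(transcript, genes):
    """Label a non-coding transcript by its position relative to annotated genes."""
    hits = [g for g in genes if overlaps(transcript, g)]
    if any(g.strand != transcript.strand for g in hits):
        return "SAUT"           # overlaps a gene on the opposite strand
    if not hits:
        return "SUT"            # no overlap with any gene: intergenic
    return "sense-overlap"      # overlaps only same-strand genes


# Hypothetical annotation with a single forward-strand gene.
genes = [Feature("chrIV", 1000, 2000, "+")]
print(classify_noncoding(Feature("chrIV", 1500, 2500, "-"), genes))
print(classify_noncoding(Feature("chrIV", 3000, 3500, "+"), genes))
```

A real pipeline would likely add a minimum overlap length and handle intronic and structural RNAs as separate classes, as the figure does.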
Protein-encoding genes with a SAUT in the antisense strand
| Gene systematic name | Gene standard name | Gene name |
|---|---|---|
| YPR194C | | Oligopeptide Transporter |
| YDL129W | | - |
| YOR042W | | Coupling of Ubiquitin conjugation to ER degradation |
| YOR040W | | Glyoxalase |
| YNR002C | | Ammonia (Ammonium) Transport Outward |
| YNL279W | | Pheromone-Regulated Membrane protein |
| YKL151C | | - |
| YJR129C | | - |
| YDR242W | | Amidase |
| YDR124W | | - |
| YBR033W | | Expression Dependent on Slt2 |
| YML066C | | Spore Membrane Assembly |
| YKL187C | | Fatty acid transporter 3 |
| YHR177W | | Regulator Of Fluffy |
| YGL224C | | Suppressor of Disruption of TFIIS |
| YDR222W | | - |
| YCR045C | | Regulator of rDNA Transcription |
| YPL021W | | ExtraCellular Mutant |
| YMR182C | | - |
| YML118W | | - |
| YLR341W | | Sporulation |
| YKR102W | | Flocculation |
| YGL251C | | Helicase Family Member |
| YGL059W | | Protein Kinase of PDH |
| YER176W | | ExtraCellular Mutant |
Protein-encoding genes with a SAUT in the antisense strand in all the species of the Saccharomyces genus analyzed (S. cerevisiae, S. bayanus, S. paradoxus, S. mikatae).
Figure 6. Coverage profiles on forward and reverse strands for six selected genes. The genes shown belong to the GO categories “reproductive process in single-celled organism” (RRT12, SMA2, SPO77), “sporulation resulting in formation of a cellular spore” (IME4) and “fungal-type cell wall” (DSE2). SAUTs conserved in all the Saccharomyces species analyzed (indicated by red boxes) were found in all the genes except IME4 and DSE2. An inverse correlation in gene expression between the protein-encoding transcript and the SAUT is highlighted by red/green arrows and was previously demonstrated for IME4 [15].