Claire Toffano-Nioche, Alban Ott, Estelle Crozat, An N Nguyen, Matthias Zytnicki, Fabrice Leclerc, Patrick Forterre, Philippe Bouloc, Daniel Gautheret.
Abstract
The non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi is investigated using the RNA-seq technology. A dedicated computational pipeline analyzes RNA-seq reads and prior genome annotation to identify small RNAs, untranslated regions of mRNAs, and cis-encoded antisense transcripts. Unlike other archaea, such as Sulfolobus and Halobacteriales, P. abyssi produces few leaderless mRNA transcripts. Antisense transcription is widespread (215 transcripts) and targets protein-coding genes that are less conserved than average genes. We identify at least three novel H/ACA-like guide RNAs among the newly characterized non-coding RNAs. Long 5' UTRs in mRNAs of ribosomal proteins and amino-acid biosynthesis genes strongly suggest the presence of cis-regulatory leaders in these mRNAs. We selected a high-interest subset of non-coding RNAs based on their strong promoters, high GC-content, phylogenetic conservation, or abundance. Some of the novel small RNAs and long 5' UTRs display high GC contents, suggesting unknown structural RNA functions. However, we were surprised to observe that most of the high-interest RNAs are AU-rich, which suggests an absence of stable secondary structure in the high-temperature environment of P. abyssi. Yet, these transcripts display other hallmarks of functionality, such as high expression or high conservation, which leads us to consider possible RNA functions that do not require extensive secondary structure.
Keywords: archaea; hyperthermophile; non-coding RNA; transcriptome
Year: 2013 PMID: 23884177 PMCID: PMC3849170 DOI: 10.4161/rna.25567
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. Consensus promoter motifs. Each frame shows the best ranking sequence motif identified by a MEME search performed on the 50 nt upstream region of: (A) CDS annotations; (B) RNA-seq-based transcription units; (C) long 5′UTRs; (D) sRNAs; (E) asRNAs. Number of occurrences, MEME P value, and number of sites are given for each motif. Motif coordinates are numbered from the first base of the RNA-seq transcription unit (B‒D) or from the ATG start codon (A) and correspond to the dominant motif location. No dominant location was found for the asRNA motif (E).
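The 50 nt upstream windows mentioned in the Figure 1 caption could be extracted along the following lines. This is only a sketch: the function names, the 1-based inclusive coordinate convention, and the toy sequence are assumptions for illustration, not the authors' pipeline, and the MEME search itself is not reproduced.

```python
# Hypothetical helper names; coordinates are assumed to be 1-based and
# inclusive on the forward strand, as in common annotation formats.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str,
                    length: int = 50) -> str:
    """Return up to `length` nt located 5' of a feature.

    The result is always given in the orientation of the feature's own
    strand. The window is truncated at the ends of the sequence
    (circular genomes are not handled in this sketch).
    """
    if strand == "+":
        # Upstream lies to the left of `start` on the forward strand.
        lo = max(0, start - 1 - length)
        return genome[lo:start - 1]
    if strand == "-":
        # The feature begins at `end`; upstream lies to the right,
        # and must be reverse-complemented.
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_genome = "ACGTACGTAAGGCCTT"  # made-up sequence
    print(upstream_region(toy_genome, 5, 8, "+", length=4))
    print(upstream_region(toy_genome, 5, 8, "-", length=4))
```

The extracted windows would then be written to a FASTA file and passed to a motif finder.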
Table 1. RNA abundance by class
| RNA class | Number of elements | Total number of reads | Median RPKM |
|---|---|---|---|
| rRNA | 5 | 32,727,463 | 14,818 |
| tRNA | 46 | 33,414 | 264 |
| Long 5′ UTR | 98 | 172,014 | 185 |
| sRNA | 107 | 84,797 | 63 |
| Antisense RNA | 215 | 13,253 | 43 |
| CDS | 1,893 | 5,261,317 | 164 |
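
Table 1 reports median RPKM values without restating how RPKM is computed. A minimal sketch follows, assuming the conventional definition (reads per kilobase of transcript per million mapped reads); the function name and the example numbers are illustrative and are not taken from the paper.

```python
from statistics import median


def rpkm(read_count: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = 1e9 * reads / (length_nt * total_mapped_reads)
    """
    if length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    return read_count * 1e9 / (length_nt * total_mapped_reads)


if __name__ == "__main__":
    # Made-up elements as (reads, length in nt); not data from Table 1.
    elements = [(100, 1000), (50, 250), (400, 2000)]
    total = 1_000_000  # hypothetical number of mapped reads
    values = [rpkm(reads, length, total) for reads, length in elements]
    print(values, median(values))
```

A per-class median, as in the last column of the table, would be the median of these per-element values within each RNA class.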

Figure 2. Characteristics of ncRNAs identified by RNA-seq. (A) Numbers of known RNAs vs. novel RNAs. Sources for known ncRNAs: NCBI features, RFAM, and studies from Klein et al. and Phok et al. Classes are ranked in order of decreasing confidence, starting from experimentally validated RNAs (from left to right on the bar plot). When the known ncRNA is from P. abyssi, a minimum overlap of 25 nt between the known RNA and the new candidate is required to assign the candidate to a class. When the known ncRNA is from another archaeal species, a BLASTN hit with a minimum bit score of 42 is required to assign the candidate to a class. When a candidate appears in several classes, it is counted only in the class with the highest confidence. (B) Conservation of ncRNA classes. At each taxonomic level, the histogram shows the fraction of elements conserved up to, and not deeper than, this taxonomic level (see Materials and Methods). Elements shown include the three ncRNA classes (107 sRNAs, 98 long 5′UTRs, 215 asRNAs), CDS, antisense-associated CDS (CDS-asRNA), and intergenic region. To avoid conservation bias due to size differences, conservation for CDS, CDS-asRNA, and intergenic region was obtained on randomly sampled fragments from CDS, CDS-asRNA, and intergenic regions with the same size distribution as ncRNA, asRNA, and ncRNA, respectively.
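
The class-assignment rule in panel (A) can be sketched as follows. The two thresholds (25 nt overlap, bit score 42) come from the caption; the data structure, the field names, and the numeric confidence ranks are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MIN_OVERLAP_NT = 25   # threshold stated in the caption (same-species RNAs)
MIN_BIT_SCORE = 42    # threshold stated in the caption (other archaea)


@dataclass
class Hit:
    """A match between a candidate and a known ncRNA (hypothetical type)."""
    source_class: str      # e.g. "RFAM"
    confidence_rank: int   # 0 = highest confidence (assumed encoding)
    same_species: bool     # True if the known RNA is from P. abyssi
    overlap_nt: int = 0    # used when same_species is True
    bit_score: float = 0.0 # used when same_species is False


def overlap_length(start_a: int, end_a: int, start_b: int, end_b: int) -> int:
    """Overlap in nt between two intervals with inclusive coordinates."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def passes(hit: Hit) -> bool:
    """Apply the overlap or bit-score threshold, depending on species."""
    if hit.same_species:
        return hit.overlap_nt >= MIN_OVERLAP_NT
    return hit.bit_score >= MIN_BIT_SCORE


def assign_class(hits: Iterable[Hit]) -> Optional[str]:
    """Return the highest-confidence qualifying class, or None if novel."""
    qualifying = [h for h in hits if passes(h)]
    if not qualifying:
        return None
    return min(qualifying, key=lambda h: h.confidence_rank).source_class
```

A candidate for which `assign_class` returns `None` would be counted as novel in this sketch.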

Figure 3. (A) Distribution of 107 sRNAs as a function of GC% and RPKM. Unknown RNAs are dark blue, known RNAs are colored as in legend. (B) GC-contents of transcript classes. Sequences were extracted following annotations (CDS, tRNA, rRNA, snoRNA) or RNA-seq analysis (sRNA and long 5′UTR).
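
The GC% axis in Figure 3 can be reproduced for any sequence with a short function such as the one below. This is an illustrative sketch: the function name and the choice to ignore ambiguous bases such as N are assumptions, and the paper does not state how such bases were handled.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T, U).

    Returns 0.0 for a sequence with no unambiguous bases.
    """
    s = seq.upper()
    gc = sum(s.count(base) for base in "GC")
    total = gc + sum(s.count(base) for base in "ATU")
    return 100.0 * gc / total if total else 0.0


if __name__ == "__main__":
    # Made-up sequences, not transcripts from the study.
    for name, seq in [("toy_gc_rich", "GGCGCCGA"), ("toy_au_rich", "AUUAUAGA")]:
        print(name, gc_percent(seq))
```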

Figure 4. New H/ACA gene candidates (PabO1, PabO48, PabO78) corresponding to H/ACA or H/ACA-like motifs identified in RNA-seq ncRNAs. The 2D structure representations were generated using VARNA version 3.9 (Darty et al., 2009). The guide RNA candidates include the annotation of the K-turn or K-loop motif (green). The RNA targets (red) are represented as paired to the guide RNA in the internal loop of the H/ACA motif or in the free ends of the H/ACA-like motif. The coding strand is indicated for all targets by a (+) or (-) symbol. Targets located on the non-coding strand with respect to the CDS are noted “strand (-).” The CRISPR 2 (+) target follows the nomenclature adopted by Phok et al. All targets shown are expressed to some degree in the genome.