Preeti Madhav Kute, Omar Soukarieh, Håkon Tjeldnes, David-Alexandre Trégouët, Eivind Valen.
Abstract
Advances in genomics and molecular biology have revealed an abundance of small open reading frames (sORFs) across all types of transcripts. While these sORFs are often assumed to be non-functional, many have been implicated in physiological functions, and a significant number have been described in human diseases. Thus, sORFs may represent a hidden repository of functional elements that could serve as therapeutic targets. Unlike in protein-coding genes, it is not necessarily the encoded peptide of an sORF that enacts its function; sometimes the act of translating an sORF alone has a regulatory role. Indeed, the most studied sORFs are located in the 5′UTRs of coding transcripts, where they can regulate translation of the downstream protein-coding sequence. However, sORFs have also been identified in abundance in non-coding RNAs, including lncRNAs, circular RNAs and ribosomal RNAs, suggesting that sORFs may be diverse in function. Of the many experimental methods used to discover sORFs, the most common are ribosome profiling and mass spectrometry, which can confirm, respectively, interactions between transcripts and ribosomes and the production of a peptide. Extensions to ribosome profiling that also capture scanning ribosomes have further made it possible to see how sORFs affect the translation initiation of mRNAs. While high-throughput techniques have made sORFs easier to identify, defining their function, if any, is typically more challenging. Together, the abundance and potential function of many of these sORFs argue for including sORFs in gene annotations and systematically characterizing them to understand their potential functional roles. In this review, we focus on the high-throughput methods used to detect and characterize sORFs and discuss techniques for their validation and functional characterization.
Keywords: SEPs; computational tools; mass spectrometry; ribosome profiling; sORFs
Year: 2022 PMID: 35154250 PMCID: PMC8831751 DOI: 10.3389/fgene.2021.796060
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1 Examples of small ORFs in coding (A) and non-coding (B) transcripts. Start and Stop indicate the initiation and termination sites of the coding sequence (CDS). uORF, upstream open reading frame fully located in the 5′UTR; uStart, upstream start site; uStop, upstream stop site; uoORF, upstream overlapping open reading frame; intStart, internal start site; intORF, internal open reading frame; intStop, internal stop site; dStart, downstream start site; dStop, downstream stop site; sORF, small open reading frame; lncRNA, long non-coding RNA; circRNA, circular RNA.
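To make the Figure 1 categories concrete, the following minimal Python sketch scans a transcript for AUG-initiated ORFs and classifies each relative to an annotated CDS. It assumes 0-based coordinates, AUG starts only (near-cognate start codons are ignored), and a CDS given as `cds_start` (the A of the main AUG) and `cds_end` (one past the main stop codon). The function names are hypothetical and not taken from any published tool.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class ORF:
    start: int  # 0-based position of the A in the AUG start codon
    end: int    # 0-based, exclusive: one past the last base of the stop codon
    kind: str   # "uORF", "uoORF", "intORF" or "dORF"


def find_atg_orfs(seq: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) for each AUG-initiated ORF in all three frames.

    Only the most 5' AUG is kept for each stop codon, so nested in-frame
    AUGs do not produce duplicate, shorter ORFs. ORFs that run off the end
    of the sequence without a stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    seen_ends = set()
    for i in range(len(seq) - 2):
        if seq[i:i + 3] != "ATG":
            continue
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if j + 3 not in seen_ends:
                    seen_ends.add(j + 3)
                    yield i, j + 3
                break


def classify_orfs(seq: str, cds_start: int, cds_end: int) -> List[ORF]:
    """Classify ORFs relative to the annotated CDS [cds_start, cds_end)."""
    orfs = []
    for start, end in find_atg_orfs(seq):
        if end == cds_end:
            # Shares the main stop codon, i.e. in frame with the CDS: the CDS
            # itself or an N-terminal extension/truncation of it; skipped here.
            continue
        if end <= cds_start:
            kind = "uORF"    # start and stop both in the 5'UTR
        elif start < cds_start:
            kind = "uoORF"   # starts in the 5'UTR, stops downstream of the main AUG
        elif start < cds_end:
            kind = "intORF"  # starts inside the CDS in a different frame
        else:
            kind = "dORF"    # starts in the 3'UTR
        orfs.append(ORF(start, end, kind))
    return orfs


if __name__ == "__main__":
    # Toy transcript: a 5'UTR with one uORF, then a short CDS at positions 13-24.
    transcript = "GGATGAAATAACCATGCCCGGGTAATT"
    print(classify_orfs(transcript, cds_start=13, cds_end=25))
```

Keeping only the most 5′ AUG per stop codon is a simplification; tools differ in whether they report nested in-frame starts as separate ORFs.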
Transcript features that determine the regulatory role of upORFs. upORF, upstream open reading frame; uORF, fully upstream ORF; uoORF, overlapping upORF.
| Feature | Comment(s) | Reference(s) |
|---|---|---|
| Secondary structures | Hairpin structures can act as inhibitors of translation initiation | |
| The Kozak consensus sequence | Initially, the optimal Kozak sequence for translation initiation was defined by a purine (R) at position −3 and a G at position +4 relative to the A of the AUG start codon (GCCRCCAUGG). However, recent Ribo-seq studies have indicated that the optimal Kozak sequence can differ from the originally defined one, as shown in zebrafish by | ( |
| Positioning of upORFs within the 5′UTR | Overlapping upORFs are more often associated with repression of main protein levels than non-overlapping upORFs | |
| Number of upORFs | More upORFs generally lead to stronger translational repression | |
| Length of upORF | Longer upORFs are correlated with greater translational repression | |
| Termination context of the upORF | The nucleotide context surrounding the uORF stop codon can affect translation reinitiation | ( |
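The classical Kozak rule in the table can be expressed as a simple check on the bases at −3 and +4. This sketch labels an AUG context as "strong" (purine at −3 and G at +4), "adequate" (one of the two) or "weak" (neither). These three labels follow a common convention rather than anything defined in the table, and, as the table notes, the optimal context may differ between organisms; the function name is hypothetical.

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg_pos: int) -> str:
    """Classify the context of the AUG whose A (position +1) is at atg_pos.

    Uses the classical rule: purine at -3 and G at +4. The labels "strong",
    "adequate" and "weak" are a common convention, used here as an assumption.
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError(f"no AUG at position {atg_pos}")
    # Position -3 is three bases 5' of the A; +4 is the base right after AUG.
    if atg_pos < 3 or atg_pos + 3 >= len(seq):
        return "undetermined"  # context not fully contained in seq
    purine_at_minus3 = seq[atg_pos - 3] in PURINES
    g_at_plus4 = seq[atg_pos + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    for context in ["GCCACCAUGG", "GCCGCCAUGA", "GCCUCCAUGC"]:
        print(context, kozak_context(context, 6))
```

Applied to each upstream AUG in a 5′UTR, a check like this could be combined with upORF number and length to summarize the features listed in the table.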
FIGURE 2 Overview of the commonly used techniques to identify and characterize sORFs and their encoded peptides (SEPs). Novel sORFs and their products can be detected by bioinformatic prediction algorithms, by using improved mass spectrometry-based assays to generate peptide databases, and by ribosome profiling and related sequencing techniques that capture translationally active transcripts. Predicted SEPs can be validated by various assays, such as reporter-based overexpression and epitope tagging. Loss-of-function assays can then be used to assess the cellular function of these SEPs.
FIGURE 3 Profiling and sequencing of translating transcripts. A254 profiles shown before (A) and after (B,C) digestion with ribonucleases. The fractions used for further processing are highlighted: polysomes in purple, 80S in orange and 40S in green. (D) Library preparation for next-generation sequencing. For ribosome profiling and ribosome complex profiling sequencing, fragments of ∼30 nt are size-selected and libraries are prepared from these small RNAs, whereas for polysome profiling, libraries are prepared from total RNA. Meta-coverage of reads obtained from polysome profiling sequencing (E), ribosome profiling (F) and ribosome complex profiling [(G) top: 40S, bottom: 80S].
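The 3-nt periodicity visible in ribosome profiling meta-coverage (Figure 3F) arises because translating ribosomes advance one codon at a time, so footprint ends accumulate in a fixed frame relative to the start codon. A minimal sketch of how such a frame distribution might be computed follows. It keeps reads in an assumed 26 to 34 nt window around the ∼30 nt size selection and shifts each read's 5′ end by an assumed fixed P-site offset of 12 nt. Real pipelines usually estimate offsets per read length from the data, so these values and the function names are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed values for illustration only; real pipelines estimate offsets per
# read length from reads around annotated start codons.
MIN_LEN, MAX_LEN = 26, 34
PSITE_OFFSET = 12


def psite_coverage(reads: Iterable[Tuple[int, int]],
                   min_len: int = MIN_LEN, max_len: int = MAX_LEN,
                   offset: int = PSITE_OFFSET) -> Counter:
    """Count P-sites from (five_prime_pos, read_length) pairs.

    five_prime_pos is the read's 5' end relative to the A of the start codon
    (0 = A of AUG); negative values lie upstream.
    """
    coverage = Counter()
    for five_prime, length in reads:
        if min_len <= length <= max_len:  # keep footprint-sized reads only
            coverage[five_prime + offset] += 1
    return coverage


def frame_fractions(coverage: Counter) -> List[float]:
    """Fraction of P-sites at or after the start codon in frames 0, 1 and 2."""
    frames = [0, 0, 0]
    for pos, n in coverage.items():
        if pos >= 0:
            frames[pos % 3] += n
    total = sum(frames)
    return [f / total for f in frames] if total else [0.0, 0.0, 0.0]


if __name__ == "__main__":
    # Hypothetical reads: (5' end relative to AUG, read length).
    toy_reads = [(-12, 30), (-9, 29), (-6, 28), (-11, 30), (-12, 40)]
    print(frame_fractions(psite_coverage(toy_reads)))
```

A strong excess in frame 0 is the signature that periodicity-based tools in the computational tools table exploit; polysome profiling reads (Figure 3E) would not show it, since they cover whole transcripts.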
Studies detecting SEPs through transcriptomic and/or mass spectrometry techniques.
| Species | Technique | Number of SEPs discovered | Reference |
|---|---|---|---|
| Human | MS of HLA-I complexes | 240 | |
| Human | MS of HLA-I complexes and Ribo-seq | >500 | |
| Human | MS of HLA-I complexes and Ribo-seq | 320 | |
| Human | MS | 1 | |
| Human | MS and RNA-seq | >100 | |
| Human | MS and RNA-seq | 311 | |
| Human | MS and RNA-seq | 90 | |
| Human | MS | 197 | |
| Mouse | Ribo-seq and MS | 1 | |
| Mouse | MS | 4 | |
| Zebrafish | Ribo-seq and MS | 1 | |
| Zebrafish | Ribo-seq and MS | 1 | |
Overview of the computational tools aiding in the prediction of sORFs.
| Method | Features utilized | Input requirement | Output dataset | Reference and links |
|---|---|---|---|---|
| Sequence-based prediction tools | | | | |
| CPC2 | Nucleotide composition, sequence similarity | RNA-seq | Coding potential, especially of lncRNAs | ( |
| micPDP | Codon conservation | RNA-seq | sORF detection from non-coding RNA | |
| PhyloCSF | Codon substitution | RNA-seq | Coding potential | ( |
| PhastCons | Evolutionary conservation (phylogenetic hidden Markov model) | Whole genome | Conserved elements, especially signatures outside protein-coding regions | ( |
| sORF finder | Nucleotide composition similarity | Any nucleotide sequence | sORFs | ( |
| Ribosome profiling-based tools | | | | |
| FLOSS | Ribosome fragment length | Ribo-seq | True ribosome footprints | |
| ORFscore | 3-nt periodicity | Ribo-seq | Ribo-seq ORFs | |
| ORFquant | 3-nt periodicity, transcript features such as exonic bins and splice junctions | Ribo-seq | Ribo-seq ORFs on multiple transcript isoforms | ( |
| ORF-RATER | Read density over start and stop codons | Ribo-seq | Ribo-seq ORFs | ( |
| RiboTaper | 3-nt periodicity | Ribo-seq, RNA-seq | Ribo-seq ORFs | ( |
| RiboNT | 3-nt periodicity (noise tolerant), codon usage | Ribo-seq | Ribo-seq ORFs | ( |
| Ribotricer | 3-nt periodicity | Ribo-seq | Ribo-seq ORFs, especially sORFs | ( |
| RRS | Read density drop after stop codon | Ribo-seq, RNA-seq | Ribo-seq ORFs | |
| SPECtre | 3-nt periodicity | Ribo-seq | Ribo-seq ORFs | ( |
| TOC | Ribosome footprint patterns | Ribo-seq | Ribo-seq ORFs | |
| PROTEOFORMER | 3-nt periodicity, MS hits | Ribo-seq, MS | Ribo-seq ORFs, MS ORFs | ( |
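Two of the Ribo-seq features in the table can be illustrated in a few lines. The sketch below computes (i) a periodicity score in the spirit of ORFscore: a chi-squared-like deviation of reads across the three frames from a uniform distribution, log2-transformed and made negative when the ORF's own frame is not the most covered; and (ii) a release score in the spirit of RRS: the ratio of Ribo-seq reads in the ORF to those in the downstream 3′UTR, normalized by the same ratio in RNA-seq. Both are simplified approximations written for illustration, not the published implementations, which add further filtering and statistics. The function names and the pseudocount are assumptions.

```python
import math
from typing import Sequence


def periodicity_score(frame_counts: Sequence[int]) -> float:
    """ORFscore-like statistic from reads in frames 0 (the ORF's own frame), 1 and 2."""
    if len(frame_counts) != 3:
        raise ValueError("expected counts for exactly three frames")
    mean = sum(frame_counts) / 3
    if mean == 0:
        return 0.0
    # Chi-squared-like deviation from a uniform distribution over frames.
    chi = sum((f - mean) ** 2 / mean for f in frame_counts)
    score = math.log2(chi + 1)
    # Negative when the ORF's own frame is not the most covered one.
    if frame_counts[0] < max(frame_counts[1], frame_counts[2]):
        score = -score
    return score


def release_score(ribo_orf: int, ribo_utr3: int,
                  rna_orf: int, rna_utr3: int, pseudo: float = 1.0) -> float:
    """RRS-like ratio: (Ribo ORF / Ribo 3'UTR) / (RNA ORF / RNA 3'UTR).

    Region lengths cancel out in this ratio of ratios, so raw counts suffice;
    the pseudocount (an assumption) avoids division by zero.
    """
    ribo_ratio = (ribo_orf + pseudo) / (ribo_utr3 + pseudo)
    rna_ratio = (rna_orf + pseudo) / (rna_utr3 + pseudo)
    return ribo_ratio / rna_ratio


if __name__ == "__main__":
    # Hypothetical counts for a translated ORF with few reads past its stop codon.
    print(periodicity_score([90, 5, 5]))
    print(release_score(ribo_orf=100, ribo_utr3=0, rna_orf=50, rna_utr3=50))
```

A high release score reflects the sharp drop in ribosome density after a genuine stop codon, whereas RNA-seq coverage, which reflects transcript abundance rather than translation, should not drop in the same way.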