Sjors van der Horst, Berend Snel, Johannes Hanson, Sjef Smeekens.
Abstract
Eukaryotic mRNAs contain a 5' leader sequence preceding the main open reading frame (mORF) and, depending on the species, 20%-50% of eukaryotic mRNAs harbor an upstream ORF (uORF) in the 5' leader. An unknown fraction of these uORFs encode sequence-conserved peptides (conserved peptide uORFs, CPuORFs). Experimentally validated CPuORFs that regulate the translation of downstream mORFs often do so in a metabolite concentration-dependent manner. Previous research has shown that most CPuORFs possess a start codon context suboptimal for translation initiation, which turns out to be favorable for translational regulation. The suboptimal initiation context may even include non-AUG start codons, which makes CPuORFs hard to predict. For this reason, we developed a novel pipeline to identify CPuORFs independently of their start codon, using well-annotated sequence data from 31 eudicot plant species and rice. Our new pipeline identified 29 novel Arabidopsis thaliana (Arabidopsis) CPuORFs, conserved across a wide variety of eudicot species, of which 15 do not initiate with an AUG start codon. In addition to CPuORFs, the pipeline found 14 conserved coding regions directly upstream of and in frame with the mORF, which likely initiate translation on a non-AUG start codon. Altogether, our pipeline identified highly conserved coding regions in the 5' leaders of Arabidopsis transcripts, including in genes with proven functional importance such as LHY, a key regulator of the circadian clock, and the RAPTOR1 subunit of the target of rapamycin (TOR) kinase.
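To illustrate what searching "independently of the start codon" can look like, the sketch below scans a 5' leader in all three forward frames and reports stop-free stretches, without requiring an AUG. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `stop_free_windows`, the `min_codons` threshold, and the returned fields are hypothetical, and the cross-species conservation analysis that the abstract describes is not shown.

```python
from typing import List, NamedTuple

# The three standard stop codons (DNA alphabet).
STOPS = {"TAA", "TAG", "TGA"}


class Window(NamedTuple):
    frame: int                # reading frame within the leader (0, 1, or 2)
    start: int                # 0-based start of the stop-free stretch
    end: int                  # end-exclusive; the stop codon itself is not included
    ends_with_stop: bool      # False if the stretch runs into the end of the leader
    in_frame_with_morf: bool  # True if this frame matches the mORF start


def stop_free_windows(leader: str, min_codons: int = 10) -> List[Window]:
    """Return stop-free stretches of at least `min_codons` codons.

    `leader` is the 5' leader only; the mORF is assumed (hypothetically)
    to start at index len(leader). No start codon is required.
    """
    seq = leader.upper().replace("U", "T")
    n = len(seq)
    windows = []
    for frame in range(3):
        in_frame = (n - frame) % 3 == 0
        run_start = frame
        pos = frame
        while pos + 3 <= n:
            if seq[pos:pos + 3] in STOPS:
                if (pos - run_start) // 3 >= min_codons:
                    windows.append(Window(frame, run_start, pos, True, in_frame))
                run_start = pos + 3
            pos += 3
        # Trailing stretch with no stop before the end of the leader.
        if (pos - run_start) // 3 >= min_codons:
            windows.append(Window(frame, run_start, pos, False, in_frame))
    return windows


if __name__ == "__main__":
    for w in stop_free_windows("CTGGCTGCTGCTGCTTAACC", min_codons=5):
        print(w)
```

In this sketch, a stretch that ends with a stop codon is a uORF-like candidate, while a stretch that reaches the end of the leader in frame with the mORF is a candidate 5' extension of the mORF. Either kind would still need to be tested for peptide conservation across species before being called a conserved coding region.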
Keywords: 5′-UTR; translation; translational initiation; translational stalling; uORF
Year: 2018 PMID: 30567971 PMCID: PMC6380273 DOI: 10.1261/rna.067983.118
Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942
FIGURE 1. (A) Schematic overview of the pipeline used to identify upstream conserved coding regions. (B) Species tree indicating species whose genomic data were used in this study. Branch lengths were retrieved from TimeTree.org (Hedges et al. 2015). (*) No data were present for Capsella grandiflora (Hedges et al. 2015).
Summary of discovered uCCRs and their conservation
Novel uCCRs discovered with high confidence
FIGURE 2. Start codon context of all highly conserved CPuORFs with an AUG start codon (left) and their downstream mORFs discovered in this study (right). Logos were created using weblogo.berkeley.edu.
FIGURE 3. Ribosome footprint data confirm translation of uCCRs. (A,C,E) RNA-sequencing results from Merchante et al. (2015), from total RNA (top) or ribosome footprints (middle), and the mRNA architecture (bottom), where the thicker bars represent the ORFs with the discovered uCCR in red, the thinner bars indicate other regions on the transcript, and the lines represent introns from three different genes: AT5G36250 (A,B), AT4G03260 (C,D), and AT1G01060 (LHY) (E,F). The red bar in panel E indicates a uCCR that is out of frame with the mORF. (B,D,F) Sequence alignments of the discovered uCCRs, aligned using MAFFT v. 7.307 (FFT-NS-2) and displayed using Jalview v. 2.10. See Supplemental Table S2 for species abbreviations.
FIGURE 4. Overview of the types of upstream conserved coding regions discovered by our pipeline compared to previous studies. (*) Vaughn et al. (2012) searched for nucleotide conservation and indirectly discovered some uCCRs. (**) Simpson et al. (2010) only searched for 5′ extended mORFs initiating on a CUG start codon with a guanine at the +4 position.