
small ORFs: A new class of essential genes for development.

João Paulo Albuquerque, Vitória Tobias-Santos, Aline Cáceres Rodrigues, Flávia Borges Mury, Rodrigo Nunes da Fonseca.

Abstract

Genes that contain small open reading frames (smORFs) constitute a new group of eukaryotic genes and are expected to represent 5% of the Drosophila melanogaster transcribed genes. In this review we provide a historical perspective of their recent discovery, describe their general mechanism and discuss the importance of smORFs for future genomic and transcriptomic studies. Finally, we discuss the biological role of the most studied smORF so far, the Mlpt/Pri/Tal gene in arthropods. The pleiotropic action of Mlpt/Pri/Tal in D. melanogaster suggests a complex evolutionary scenario that can be used to understand the origins, evolution and integration of smORFs into complex gene regulatory networks.

Keywords:  Drosophila; Tribolium; mlpt; pri; tarsal-less

Year:  2015        PMID: 26500431      PMCID: PMC4612599          DOI: 10.1590/S1415-475738320150009

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Historical Perspective on the Discovery of small Open Reading Frames (smORFs)

Our knowledge of genome sequence, size and gene content has increased with the availability of new DNA sequencing technologies. This huge amount of data has opened new avenues for the development of bioinformatics. Bioinformatic prediction methods have been used to estimate the gene numbers of several eukaryotes, which vary considerably across groups. Estimations of gene contents do not support the theory that an increase in complexity can be associated with an increase in gene number. For example, a sponge genome contains more putative genes than a human genome does (Srivastava ). However, transcriptome data have shown that half of the transcripts in mammalian genes are classified as non-coding RNAs (ncRNAs) because they do not contain large Open Reading Frames (ORFs) (e.g., Ota ). In general, gene prediction methods attempt to identify intrinsic features in DNA sequences that are characteristics of exons, such as ORFs and Kozak consensus sequences (Figure 1). Traditionally, in silico approaches have considered a lower limit of 300 nucleotides or 100 amino acids for an ORF to be annotated as a putative exon. This in silico approach excludes small ORFs (smORFs) with fewer than 100 amino acids that might be biologically active. One of the first hints that smORFs might display a biological function was obtained by Kessler . These authors identified new genes in S. cerevisiae by simply BLAST-searching potential budding yeast ORF products against sequences from other fungal and non-fungal species. Based on the hypothesis that conserved genes are functional, they found strong evidence for close to 100 new smORF genes in the S. cerevisiae genome (Kessler ). Later, using functional genomics techniques, such as EST sequencing and mutant analysis, Kastenmayer provided evidence for the existence of 299 smORFs; this figure represents approximately 5% of the annotated ORFs in S. cerevisiae. 
Furthermore, Kastenmayer showed by specific gene deletion that 21 smORFs (∼8%) are essential for S. cerevisiae viability. These smORFs are implicated in key cellular processes such as transport, intermediate metabolism and genome stability. Most importantly, at least some of the smORFs can be expressed and translated as peptides (Kastenmayer ). Because several smORFs were shown to be conserved among fungi and higher eukaryotes (Kastenmayer ), it remains to be investigated whether smORFs play a role during metazoan development.
Figure 1

Scheme of the general method for the identification of smORFs in different related species. Based on primary data and schemes from Kessler and Ladoukakis . smORF prediction is based on detection and filtering. The filtering process is important to reduce the false positive rate and increase the efficacy of functional smORFs estimation.
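
The detection and filtering steps described above can be illustrated with a minimal sketch. It is not the pipeline used by Kessler or Ladoukakis: the function names `find_orfs` and `classify` are hypothetical, only the forward strand is scanned, Kozak context and cross-species conservation are ignored, and the only filter applied is the 100-amino-acid cutoff mentioned in the text.

```python
# Minimal, illustrative ORF scan. All names here are hypothetical and the
# logic is a simplification (forward strand only, ATG start, no Kozak check).
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=10):
    """Return sorted (start, end, n_amino_acids) tuples for ATG..stop ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3  # coding codons, stop codon excluded
                if n_aa >= min_codons:
                    orfs.append((start, i + 3, n_aa))
                start = None
    return sorted(orfs)


def classify(orfs, cutoff_aa=100):
    """Split ORFs into small (< cutoff_aa) and large (>= cutoff_aa) ones."""
    small = [orf for orf in orfs if orf[2] < cutoff_aa]
    large = [orf for orf in orfs if orf[2] >= cutoff_aa]
    return small, large


if __name__ == "__main__":
    # Toy transcript with one short and one long ORF (made-up sequence).
    toy = "ATG" + "GCT" * 10 + "TAA" + "CC" + "ATG" + "GCT" * 120 + "TGA"
    small, large = classify(find_orfs(toy))
    print("smORF candidates:", small)
    print("conventional ORFs:", large)
```

A conventional annotation pipeline would keep only the second list; a smORF search keeps the first and then needs further filters, such as conservation or expression, to reduce false positives.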

smORFs During Metazoan Development

Hormones and neuropeptides are considered the best examples of bioactive molecules of low molecular weight. They are transcribed as large mRNAs that are then processed into small peptides via post-translational mechanisms and proteolysis because they contain signal sequences at their N-termini. After processing in the ER and Golgi apparatus, hormones or neuropeptides can signal far from their production site (Figure 2A). Recent functional genomic studies have shown a new way of generating such small bioactive peptides, including the direct translation of smORFs. More than one smORF may be present in a single transcript. Hence, eukaryotic mRNA can be polycistronic, with multiple exons containing an initiation codon within a single mRNA (Savard , Kondo , Figure 2B). After secretion, smORFs act like hormones and neuropeptides and can be defined as a novel class of small peptide genes expressed during plant and animal morphogenesis (reviewed by Hashimoto ).
Figure 2

Schematic drawings of the generation of biologically active short peptides. A similar scheme was published by Hashimoto . (A) Hormones and neuropeptides are generated via a large mRNA precursor (blue) in the nucleus, then translated by ribosomes (green) from a single initiation codon and finally processed in the ER and Golgi into small peptides, which are subsequently secreted by vesicles to act far from the production site. (B) Polycistronic smORFs (red) can be translated by several ribosomes (green) along a single mRNA, followed by cell secretion. Peptides from smORFs can also act far from the releasing cell.

As previously mentioned, several smORFs have been identified based on their conserved structure and gene expression in fungi (Kessler ; Kastenmayer ). In plants, several smORFs, such as POLARIS, ROTUNDIFOLIA4 and Enod40, were identified by genetic screening. These plant smORFs encode peptides that are involved in morphogenetic processes, including root formation, leaf shape control, and cortical cell division during nodule formation (reviewed by Hashimoto ). Because these smORFs have been found by unbiased genetic screenings and occupy small regions of plant genomes, it is likely that smORFs play a role in other biological processes and systems.

mille-pattes, tarsal-less and polished rice: The Same Gene Can Act in Different Developmental Contexts

smORFs may also play a major role in animal development. The first hint that a smORF was required for animal development was provided by Savard , who investigated the embryogenesis of the red flour beetle, Tribolium castaneum. In an EST screening, Savard identified mille-pattes (mlpt), a polycistronic gene encoding four smORFs, three of which contain an LDPTGLY domain. In Tribolium, mlpt acts like a bona fide gap gene because it regulates Hox genes (Savard ). The downregulation of mlpt results in beetle larvae that have up to ten pairs of legs instead of the three pairs of legs observed in wild-type beetles. Moreover, mlpt is expressed in the thorax and at the posterior growth zone, which is the region responsible for posterior segmentation in short-germ insects, including the beetle T. castaneum (Figure 3).
Figure 3

Evolution and functional role of Mlpt/Tal/Pri in arthropods. Several arthropods display an ortholog of Mlpt/Tal/Pri (original alignments and phylogenetic trees from Galindo and Savard ). In the short-germ embryo of the beetle Tribolium castaneum, mlpt was shown to be expressed in the legs and trachea, where it acts as a gap gene during embryogenesis (Savard ). In the long-germ embryo of the fly Drosophila melanogaster, Mlpt/Tal/Pri was shown to be involved in several processes, which are displayed in red (Chanut-Delalande ; Galindo ; Kondo , 2010; Pueyo and Couso, 2008, 2011). Notch, Svb and EcR are the known regulators of Mlpt/Tal/Pri (Chanut-Delalande ). Three unknown aspects of the evolution of Mlpt/Tal/Pri are highlighted in blue. These include the origin of the gene in arthropods, its ancestral function, and the loss of gap gene function after the split between the common ancestor of Coleoptera and Diptera. It is also possible that the gap gene function of Mlpt/Tal/Pri was independently acquired in Coleoptera.

Galindo and Kondo investigated the role of the mlpt ortholog in the fruit fly Drosophila melanogaster and provided interesting results regarding the function of these smORFs. Previously, this smORF ortholog was classified as a non-coding RNA in Drosophila because the small size of its ORFs suggested they were not translated (Tupy ). The embryonic gap gene role of mlpt turned out not to be conserved in fruit flies: synonyms of the same gene, such as tarsal-less (tal) (Galindo ) or polished rice (pri) (Kondo ), were found to be expressed in a pair-rule fashion during embryogenesis, but do not regulate Hox genes in flies. Kondo also showed that pri is required non-cell autonomously and is essential for the formation of the specific F-actin bundles that will form the denticles, which are the typical epidermal structures of Drosophila larvae. In addition, pri is reported to function during tracheal morphogenesis (Kondo ). Interestingly, the first four small peptides, which are similar to LDPTGLY, could be translated in an in vitro assay using S2 cells, but the last and largest peptide was not translated in this system (Kondo ). In vivo rescue experiments have shown that these LDPTGLY peptides are functionally redundant and that the overexpression of one of these smORFs is able to rescue the denticle and the tracheal loss-of-function phenotype (Kondo ). In addition to the roles reported during embryogenesis by Kondo , Galindo provided evidence that tal is involved in leg patterning by demonstrating that tal is expressed in the leg imaginal discs and that tal hypomorphic mutants lack the whole tarsal region (Galindo ). These three initial studies opened new avenues into smORF research because they showed that a single smORF can be involved in several developmental contexts with apparently different biological roles during morphogenesis.

tal/pri/mlpt As a Case Study for the Biological Mechanism of a smORF

Though different biological roles for mlpt/tal/pri have been described in several developmental contexts, the mechanism by which these smORFs act during development remained unknown until quite recently. One of the first hints of the mechanism of action for mlpt/tal/pri was obtained by Kondo , who investigated the specification and differentiation of larval epidermal denticle structures in Drosophila. It was previously shown that denticle differentiation in fruit flies is controlled by the activity of the transcription factor Shavenbaby (Svb) and its downstream target genes (Mevel-Ninio ). Kondo had shown that the loss of mlpt/tal/pri does not affect the expression of Svb, suggesting that these genes belong to different pathways that activate denticle formation. Subsequently, Kondo showed that the short 11 amino acid peptide found in mlpt/tal/pri is able to trigger the terminal truncation of Svb. This truncation converts Svb from a repressor that has accumulated in nuclear foci into a nucleoplasmic activator, both in vivo during denticle formation and in vitro in Drosophila S2 cells. This result was groundbreaking because it established that smORFs may cross the cell membrane and, upon reaching the nucleus, alter the function of essential transcription factors, such as Svb. This finding showed an important and new role for mlpt/tal/pri and, by association, for smORFs. The biological functions of the mlpt/tal/pri genes have also been investigated in fruit fly leg specification, where they display two independent functions. The first function is in the determination of the presumptive tarsal region in early third instar larvae (Galindo ; Pueyo and Couso, 2008). For tarsal determination, mlpt/tal/pri non-autonomously generates a new territory of presumptive tarsal cells by defining the presence of the transcription factors Rotund (Rn) and Spineless (Ss) and the absence of Dachshund (Dac) and Bar (B) (Pueyo and Couso, 2008). 
Importantly, this role of tal-related peptides is independent of Svb, suggesting that mlpt/tal/pri peptides interact with partners other than Svb. The second biological function of mlpt/tal/pri genes in the leg occurs later in development. During early pupal Drosophila development, Notch (N) signaling activates tal mRNA expression in stripes of cells in the distal part of each tarsal segment. Interestingly, the Tal peptides feed back on N signaling by repressing the transcription of Delta (Dl) in the tarsal joints. This feedback acts through the post-transcriptional activation of Svb in a similar manner to that described for trichomes during late embryogenesis (Mevel-Ninio ; Delon ; Sucena ). Thus, a common biological mechanism involving Notch signaling and Svb may control mlpt/pri/tal expression in several developmental contexts. Finally, recent pioneering work has implicated the Mlpt/Pri/Tal peptides as mediators of ecdysone control of development (Chanut-Delalande ). A previously uncharacterized enzyme of ecdysone biosynthesis in D. melanogaster, Glutathione S transferase E14 (GstE14), was shown to be required for mlpt/pri/tal expression. Moreover, the nuclear ecdysone receptor (EcR) was found to directly bind to the mlpt/pri/tal cis-regulatory region, which suggests a direct link between ecdysone action and mlpt/pri/tal activation (Chanut-Delalande ). Therefore, Mlpt/Pri/Tal peptides provide a molecular framework to explain how systemic hormonal signaling is able to execute different genetic programs both throughout embryonic development and post-embryonically (Chanut-Delalande ). Based on these essential roles of mlpt/pri/tal in several contexts, it is important to estimate how many other smORFs may play a role in other developmental processes. This has only recently been addressed using the new bioinformatic and molecular biology techniques described below (Ladoukakis ; Aspden ).

How Many smORFs Exist in Animal Genomes? Lessons From Fruit Flies

Quantifying how many smORFs exist within animal genomes is not trivial because the prediction methods used to identify coding sequence are biased against detecting very short open reading frames (< 100 bp) (e.g., Saeys ). In general, gene prediction methods use either a de novo approach with mathematical models that determine the probabilities for all possible intron-exon annotations in a given sequence, or a comparison to a known genome or cDNA sequences from related organisms (Ladoukakis ). smORFs that contain fewer than 100 amino acids and correspond to functional genes may not be predicted and can thus be grouped with non-functional smORFs that can occur by chance (Windsor and Mitchell-Olds, 2006). Ladoukakis used a comparative approach to investigate the smORFs of the fruit fly species D. melanogaster and D. pseudoobscura, two related species that are separated from their common ancestor by 25 to 55 million years. This investigation led to a range of between 401 and 4,561 functional smORFs in Drosophila. In fact, 401 smORFs would represent 3% of the 13,907 protein-coding genes that have been annotated as of 2011 (FlyBase release 5; as accessed in October 2011). Thus, a substantial number of biologically relevant smORFs await characterization. A detailed functional analysis of one of these candidates, the transcript encoded by the gene putative non-coding RNA 003 in 2L (pncr003:2L), indicated that this gene contains two potentially functional smORFs of 28 and 29 amino acids in a single sequence, which led to exciting results (Magny ). pncr003:2L regulates calcium transport and thus influences regular muscle contraction in the Drosophila heart (Magny ). In contrast to the mlpt/pri/tal peptides, which are small and do not display a clear secondary structure, such as an alpha-helix or beta-sheet, pncr003:2L peptides have a predicted helical structure. 
Searches for a structural homolog have identified two paralogs in the human genome, sarcolamban (Scl) and phospholamban (Pcl), which both contain two smORFs of 30 amino acids that are similar to pncr003:2L. Functional analysis of these human homologs indicates that they play a conserved role in calcium trafficking, particularly in regulating the activity of the sarco-endoplasmic reticulum Ca2+ adenosine triphosphatase (SERCA) enzyme. Thus, these smORF peptides are required for regular muscle contraction in humans and Drosophila (Magny ).
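
To put the Ladoukakis estimates quoted in this section in proportion, the short calculation below expresses the lower and upper bounds (401 and 4,561 smORFs, the latter read as four thousand five hundred sixty-one) as a share of the 13,907 annotated protein-coding genes. The numbers come from the text; the function name `percent_of_annotated` is a hypothetical helper introduced only for this illustration.

```python
# Figures quoted in the text; the helper name below is hypothetical.
ANNOTATED_PROTEIN_CODING_GENES = 13_907  # annotated as of 2011, per the text
SMORF_ESTIMATES = {"lower bound": 401, "upper bound": 4_561}


def percent_of_annotated(n_smorfs, n_genes=ANNOTATED_PROTEIN_CODING_GENES):
    """Express a smORF count as a percentage of annotated protein-coding genes."""
    return 100.0 * n_smorfs / n_genes


for label, count in SMORF_ESTIMATES.items():
    share = percent_of_annotated(count)
    print(f"{label}: {count} smORFs = {share:.1f}% of annotated genes")
```

Under these figures the lower bound comes to about 2.9%, consistent with the "3%" stated above, while the upper bound would correspond to roughly a third of the annotated gene count.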

Future Directions and Open Questions in smORF Research

Although considerable progress has been made in smORF research over the past few years, as highlighted by this review, several questions remain open. First, it is not known how many smORFs are important for developmental processes. Sequence analysis has shown that hundreds of smORFs are conserved among Drosophila species, suggesting that a large number of smORFs are functional (Ladoukakis ). As 401 of these conserved smORFs are also expressed during Drosophila embryogenesis, it is likely that these smORFs are functional, because high expression levels are suggestive of a functional role. It is expected that several other smORFs will have their function analyzed, in addition to mlpt/pri/tal (Galindo ; Kondo ; Savard ) and pncr003:2L (Magny ). A promising new technique, Poly-Ribo-Seq, was recently applied in the experimental validation and discovery of new smORFs (Aspden ). Briefly, Poly-Ribo-Seq requires polysome isolation for the determination of the sequence bound by each of the ribosomes. Polysomes are clusters of multiple ribosomes that are bound to mRNA during translation. The Poly-Ribo-Seq approach thus reduced the number of false positives and roughly doubled the number of annotated smORFs in Drosophila S2 cells (Aspden ), increasing the number of smORFs with evidence of translation from 107 to 228. By using this approach, 700 functional smORFs were estimated within the Drosophila genome by Aspden . Recently, Lu synthesized ten bioactive peptides from the smORFs found in the genome of the ascidian Ciona intestinalis and tested them as potential antimicrobial peptides (AMPs). Five of these peptides were active against bacterial strains, suggesting that they may act as AMPs in ascidians. Thus, it is possible that clusters of smORFs are activated upon infection in a fast response and release. What other open questions exist in smORF research? 
One of the most important regards the evolution of the most described smORF, mlpt/pri/tal in arthropods (Savard ; Galindo ; Kondo ). Because mlpt/pri/tal is involved in several biological processes, such as early patterning, trichome formation, tracheal patterning and leg patterning, and was recently shown to be involved in metamorphosis (Chanut-Delalande ), it will be important to investigate the evolutionary origin and ancestral role of this smORF (Figure 3). Is mlpt/pri/tal also involved in all of these biological processes in hemimetabolous insects and other arthropods? Evolutionary studies on mlpt/pri/tal have the potential to contribute to the discussion about the interaction between genetic developmental control and the environment, the so-called Eco-Evo-Devo field of knowledge (Abouheif ). If 5% of the genes in a given genome are smORFs, as recently suggested for Drosophila melanogaster (Aspden ), it will be interesting to investigate whether at least some other developmental pathways such as Hh, Wnt, FGFs and BMPs are also regulated by and regulate other smORFs. Importantly, it will be interesting to know whether smORFs are found in basal metazoans such as sponges, cnidarians and ctenophores, as the examples described so far are primarily restricted to yeast, plants, arthropods and chordates. Additionally, we expect that, as experimental and bioinformatic methods become more powerful, smORFs will be essential components of genome annotations and studies of gene regulatory networks. Finally, examples of horizontal smORF transfer between eubacteria and eukaryotes and parasites might be discovered.

References

1.  Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA.

Authors:  Takefumi Kondo; Yoshiko Hashimoto; Kagayaki Kato; Sachi Inagaki; Shigeo Hayashi; Yuji Kageyama
Journal:  Nat Cell Biol       Date:  2007-05-07       Impact factor: 28.824

2.  Pri peptides are mediators of ecdysone for the temporal control of development.

Authors:  Hélène Chanut-Delalande; Yoshiko Hashimoto; Anne Pelissier-Monier; Rebecca Spokony; Azza Dib; Takefumi Kondo; Jérôme Bohère; Kaori Niimi; Yvan Latapie; Sachi Inagaki; Laurence Dubois; Philippe Valenti; Cédric Polesello; Satoru Kobayashi; Bernard Moussian; Kevin P White; Serge Plaza; Yuji Kageyama; François Payre
Journal:  Nat Cell Biol       Date:  2014-10-26       Impact factor: 28.824

3.  The Amphimedon queenslandica genome and the evolution of animal complexity.

Authors:  Mansi Srivastava; Oleg Simakov; Jarrod Chapman; Bryony Fahey; Marie E A Gauthier; Therese Mitros; Gemma S Richards; Cecilia Conaco; Michael Dacre; Uffe Hellsten; Claire Larroux; Nicholas H Putnam; Mario Stanke; Maja Adamska; Aaron Darling; Sandie M Degnan; Todd H Oakley; David C Plachetzki; Yufeng Zhai; Marcin Adamski; Andrew Calcino; Scott F Cummins; David M Goodstein; Christina Harris; Daniel J Jackson; Sally P Leys; Shengqiang Shu; Ben J Woodcroft; Michel Vervoort; Kenneth S Kosik; Gerard Manning; Bernard M Degnan; Daniel S Rokhsar
Journal:  Nature       Date:  2010-08-05       Impact factor: 49.962

4.  Hundreds of putatively functional small open reading frames in Drosophila.

Authors:  Emmanuel Ladoukakis; Vini Pereira; Emile G Magny; Adam Eyre-Walker; Juan Pablo Couso
Journal:  Genome Biol       Date:  2011-11-25       Impact factor: 13.583

5.  Identification of putative noncoding polyadenylated transcripts in Drosophila melanogaster.

Authors:  Jonathan L Tupy; Adina M Bailey; Gina Dailey; Martha Evans-Holm; Christian W Siebel; Sima Misra; Susan E Celniker; Gerald M Rubin
Journal:  Proc Natl Acad Sci U S A       Date:  2005-04-04       Impact factor: 11.205

6.  Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae.

Authors:  James P Kastenmayer; Li Ni; Angela Chu; Lauren E Kitchen; Wei-Chun Au; Hui Yang; Carole D Carter; David Wheeler; Ronald W Davis; Jef D Boeke; Michael A Snyder; Munira A Basrai
Journal:  Genome Res       Date:  2006-03       Impact factor: 9.043

7.  Systematic discovery of new genes in the Saccharomyces cerevisiae genome.

Authors:  Marco M Kessler; Qiandong Zeng; Sarah Hogan; Robin Cook; Arturo J Morales; Guillaume Cottarel
Journal:  Genome Res       Date:  2003-02       Impact factor: 9.043

8.  Eco-evo-devo: the time has come.

Authors:  Ehab Abouheif; Marie-Julie Favé; Ana Sofia Ibarrarán-Viniegra; Maryna P Lesoway; Ab Matteen Rafiqi; Rajendhran Rajakumar
Journal:  Adv Exp Med Biol       Date:  2014       Impact factor: 2.622

9.  In search of the small ones: improved prediction of short exons in vertebrates, plants, fungi and protists.

Authors:  Yvan Saeys; Pierre Rouzé; Yves Van de Peer
Journal:  Bioinformatics       Date:  2007-01-04       Impact factor: 6.937

10.  Peptides encoded by short ORFs control development and define a new eukaryotic gene family.

Authors:  Máximo Ibo Galindo; José Ignacio Pueyo; Sylvaine Fouix; Sarah Anne Bishop; Juan Pablo Couso
Journal:  PLoS Biol       Date:  2007-05       Impact factor: 8.029
