
Discovery and annotation of small proteins using genomics, proteomics, and computational approaches.

Xiaohan Yang, Timothy J Tschaplinski, Gregory B Hurst, Sara Jawdy, Paul E Abraham, Patricia K Lankford, Rachel M Adams, Manesh B Shah, Robert L Hettich, Erika Lindquist, Udaya C Kalluri, Lee E Gunter, Christa Pennacchio, Gerald A Tuskan.

Abstract

Small proteins (10-200 amino acids [aa] in length) encoded by short open reading frames (sORFs) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained ~2.6 million expressed sequence tag (EST) reads from the Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10-200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) coding-potential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1,469 genes was obtained. Analysis of the protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. Within the high-confidence sORF candidate set, known protein domains were identified in 1,282 genes (the higher-confidence sORF candidate set), of which 611 genes, designated the highest-confidence sORF candidate set, were supported by proteomics data. Of these 611 highest-confidence sORF candidate genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that there are potential sORF candidates still to be annotated in sequenced genomes, but also presents an efficient strategy for discovering sORFs in species with no genome annotation yet available.
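
The initial sORF scan described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not name the tool used, so the function name `find_sorfs`, the restriction to the forward strand, the requirement of an ATG start codon, and the use of the standard stop codons are all assumptions made for illustration. Only the 10-200 aa length window comes from the text.

```python
# Hypothetical helper, not taken from the paper: scans a reconstructed
# transcript for ATG-to-stop ORFs whose protein length is 10-200 aa.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def find_sorfs(seq, min_aa=10, max_aa=200):
    """Return (start, end, frame) tuples for forward-strand sORFs.

    start/end are 0-based, end-exclusive nucleotide coordinates that
    include the stop codon; protein length excludes the stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_aa = (i - start) // 3  # residues from ATG up to the stop
                if min_aa <= n_aa <= max_aa:
                    hits.append((start, i + 3, frame))
                start = None  # resume searching after this stop codon
    return hits


if __name__ == "__main__":
    # Toy transcript: a 12-aa ORF (ATG + 11 x GCT) followed by a stop codon.
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_sorfs(toy))
```

A real analysis would also scan the reverse complement and then apply the three enrichment filters listed above (coding potential, cross-species conservation, and gene family clustering), none of which this sketch attempts.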

Year: 2011; PMID: 21367939; PMCID: PMC3065711; DOI: 10.1101/gr.109280.110

Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043


