
PsORF: a database of small ORFs in plants.

Yanjun Chen1, Danyang Li1, Weiliang Fan1,2, Xiaoming Zheng1, Yifan Zhou1, Hanzhe Ye1, Xiaodong Liang1, Wei Du1, Yu Zhou1,2, Kun Wang1.   


Keywords:  Ribo-seq; database; mass spectrum; plant; small ORF

Year:  2020        PMID: 32333496      PMCID: PMC7589237          DOI: 10.1111/pbi.13389

Source DB:  PubMed          Journal:  Plant Biotechnol J        ISSN: 1467-7644            Impact factor:   9.803


Dear Editor, Small open reading frames (sORFs), which are translated into small peptides (100 amino acids or fewer in length), have long been excluded from genome annotations. In recent years, a growing number of biologically significant sORFs have been found to encode functional peptides or to play regulatory roles in mRNA translation. In plants, an evolutionarily ancient micro‐peptide, AtLURE1, promotes and maintains reproductive isolation by accelerating conspecific pollen tube penetration (Zhong et al., 2019). The sORFs in the 5’ UTR of mRNAs, usually termed upstream ORFs (uORFs), have been reported to mediate translational regulation of their downstream main ORFs (mORFs) (Xu et al., 2017). Recent advances in translatomics (especially ribosome profiling, Ribo‐seq) and MS‐based proteomics have indicated that sORFs are pervasive in non‐coding RNAs, the UTRs of mRNAs, circular RNAs (circRNAs), etc. (Wang et al., 2019). In animals, two public databases collect sORFs: sORFs.org (Olexiouk et al., 2016) and SmProt (http://bioinfo.ibp.ac.cn/SmProt/) (Hao et al., 2017). Both databases integrate Ribo‐seq and MS‐based proteomics data from animals to annotate sORFs. In plants, a database, ARA‐PEPs (http://www.biw.kuleuven.be/CSB/ARA‐PEPs), has been constructed (Hazarika et al., 2017). ARA‐PEPs identifies sORFs using the criterion that a peptide sequence is at least 10 amino acids long, begins with a canonical start codon and is not truncated by a stop codon. It is a repository only for sORFs in Arabidopsis thaliana, and its 13 748 candidate sORFs are supported only by RNA expression evidence (microarray and RNA‐seq), not by translational evidence. Therefore, a database of systematic sORF annotations in plants is still missing, which not only hinders cross‐species studies in plants but also restricts cross‐kingdom comparative analysis between animals and plants.
In this study, we collected multi‐omic data, including genome, transcriptome, Ribo‐seq and mass spectrometry (MS) data, from public databases, and built a pipeline to identify sORFs in 35 different plant species. Based on the results, we designed a web‐accessible database, PsORF (http://psorf.whu.edu.cn/). PsORF integrates released data from multiple databases to acquire a set of sORFs derived from the non‐coding regions annotated in reference genomes. We collected 35 reference genomes with well‐annotated UTRs and lncRNAs from the PLAZA database (https://bioinformatics.psb.ugent.be/plaza/). Five plant species with Ribo‐seq and MS data available in public databases, namely two eudicots (Arabidopsis thaliana and Gossypium arboreum), two monocots (Oryza sativa and Zea mays) and an alga (Chlamydomonas reinhardtii), were selected for analysis to obtain translational evidence for sORFs. In total, we collected 103 Ribo‐seq data sets for these five major species from NCBI (https://www.ncbi.nlm.nih.gov/) and EBI (https://www.ebi.ac.uk/), together with 93 MS projects generated on high‐sensitivity mass spectrometers (Q Exactive or LTQ Orbitrap Elite) from the PRIDE database (https://www.ebi.ac.uk/pride/archive/). To integrate these data, we built the pipeline shown in Figure 1a. When defining candidate sORFs, all three possible reading frames of each RNA transcript were examined; ATG and near‐cognate codons (TTG, GTG, CTG, AAG, AGG, ACG, ATA, ATT, ATC) were considered as start codons, and TAG, TAA and TGA as stop codons. To determine whether a candidate sORF is translated, the Ribo‐seq and MS data were analysed separately using different software tools. PRICE (v 1.0.2) (Erhard et al., 2018) was used to analyse the 3‐nt periodicity of ribosome footprints in the Ribo‐seq data. SearchGUI (v 3.3.13) (Barsnes and Vaudel, 2018) was used to find peptides in the MS data that match the translated reading frames.
Then, the two sets of sORFs from Ribo‐seq and MS were filtered to retain sORFs 18‐300 nt in length and combined by taking their union to obtain the core sORF registry for the five plant species.
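The candidate definition, length filter and union step described above can be sketched in Python as follows. This is only an illustration and not the authors' pipeline: the function and class names are hypothetical, every in‐frame start codon upstream of a stop is reported as a separate candidate, and the 18‐300 nt length is assumed to be measured from the first base of the start codon through the last base of the stop codon. In the actual workflow the translational evidence comes from PRICE and SearchGUI, which are not reproduced here.

```python
from typing import Iterable, List, NamedTuple, Set

# Start and stop codons as listed in the text (DNA alphabet).
START_CODONS = {"ATG", "TTG", "GTG", "CTG", "AAG", "AGG", "ACG", "ATA", "ATT", "ATC"}
STOP_CODONS = {"TAG", "TAA", "TGA"}
MIN_NT, MAX_NT = 18, 300  # length window from the text


class Orf(NamedTuple):
    start: int  # 0-based index of the first base of the start codon
    end: int    # index one past the last base of the stop codon (assumption)
    frame: int  # reading frame 0, 1 or 2 of the transcript


def find_candidate_sorfs(transcript: str) -> List[Orf]:
    """Scan the three forward reading frames for start...stop pairs."""
    seq = transcript.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        open_starts: List[int] = []  # starts seen since the last in-frame stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                # Close every open start with this stop codon.
                orfs.extend(Orf(s, i + 3, frame) for s in open_starts)
                open_starts = []
            elif codon in START_CODONS:
                open_starts.append(i)
    return orfs


def length_filter(orfs: Iterable[Orf]) -> List[Orf]:
    """Keep ORFs whose length falls within the 18-300 nt window."""
    return [o for o in orfs if MIN_NT <= o.end - o.start <= MAX_NT]


def core_registry(ribo_supported: Iterable[Orf], ms_supported: Iterable[Orf]) -> Set[Orf]:
    """Union of the length-filtered Ribo-seq and MS supported sets."""
    return set(length_filter(ribo_supported)) | set(length_filter(ms_supported))
```

For example, `find_candidate_sorfs("CCATGAAACCCGGGTTTTAGCC")` yields a single 18‐nt candidate in frame 2, which passes the length filter under the assumptions stated above.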
Figure 1

Schematic of the PsORF database. (a) Data sources and data processing pipeline of PsORF. (b) The five kinds of sORFs classified by genome location. uORF, small ORF upstream of the mORF; uoORF, small ORF spanning the 5’ UTR and the mORF; dORF, small ORF downstream of the mORF; doORF, small ORF spanning the mORF and the 3’ UTR; sORF, other small ORF in the genome. (c) JBrowse view of a uORF with its associated Ribo‐seq and RNA‐seq tracks. (d) The MS spectra of a dORF. The b and y ions are shown in blue and red, respectively. (e) The phylogenetic tree for a conserved sORF and its homologs across five plant species.

For the other 30 plant species, we used BLAST to find sORFs homologous to those in the core sORF registry. Finally, the sORFs from these 30 species and the core sORF registry were combined to obtain the comprehensive sORF registry of 35 plant species, which consists of 112 350 sORFs from 51 341 transcripts. Based on their genome location, the sORFs can be divided into five categories: uORF (44 467), uoORF (4788), dORF (53 229), doORF (4403) and sORF (5463) (Figure 1b). Based on sequence conservation, the current version of PsORF contains 11 665 homologous sORF families. In addition, to link the identified sORFs with existing knowledge, we collected sORFs from the published literature using a Python web crawler that searched abstracts and main texts for key words such as small (coding) ORF/sORF, small protein/peptide, micro‐protein/peptide, unannotated translation events, downstream ORF/dORF and upstream ORF/uORF. These known sORFs were compiled into a database, which was searched against the sORFs in the comprehensive sORF registry using BLASTp (v 2.6.0+) with the following cut‐offs: e‐value ≤ 0.01, coverage ≥ 30% and identity = 100%. The BLAST hits are shown on the gene wiki page of each sORF. PsORF is deployed on a Linux operating system with the nginx web server, and all data are stored in a MySQL database for querying.
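The five location‐based categories can be illustrated with a minimal Python sketch. It assumes transcript coordinates in the 5’ to 3’ direction with half‐open intervals, and the names `Interval` and `classify_sorf` are hypothetical. How PsORF handles a small ORF nested entirely inside the mORF is not stated in the text; here such cases, like ORFs on transcripts without an annotated mORF, fall into the generic sORF category as an assumption.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: int  # 0-based, inclusive, in transcript coordinates (5' to 3')
    end: int    # exclusive


def classify_sorf(sorf: Interval, morf: Optional[Interval] = None) -> str:
    """Assign one of the five categories from the position relative to the mORF."""
    if morf is None:
        return "sORF"    # no annotated main ORF, e.g. a lncRNA
    if sorf.end <= morf.start:
        return "uORF"    # entirely within the 5' UTR
    if sorf.start >= morf.end:
        return "dORF"    # entirely within the 3' UTR
    if sorf.start < morf.start:
        return "uoORF"   # starts in the 5' UTR and overlaps the mORF
    if sorf.end > morf.end:
        return "doORF"   # starts in the mORF and extends into the 3' UTR
    return "sORF"        # nested inside the mORF (assumption, see lead-in)


if __name__ == "__main__":
    main_orf = Interval(100, 400)
    for candidate in [Interval(10, 40), Interval(90, 120),
                      Interval(450, 480), Interval(380, 420)]:
        print(candidate, classify_sorf(candidate, main_orf))
```

The order of the checks matters: the two non‐overlapping cases are ruled out first, so the remaining comparisons only need to decide on which side the small ORF extends beyond the mORF.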
PsORF offers convenient browse and query services (Figure 1a) for users to obtain basic sORF information. In PsORF, users can: (i) browse or search sORFs by ID and sequence; (ii) use BLAST to assess the sequence similarity of sORFs across plant species; (iii) browse the Ribo‐seq and RNA‐seq data and the genome location of sORFs in the genome browser JBrowse (Figure 1c) (Buels et al., 2016); (iv) view the MS/MS fragmentation spectra of sORF‐encoded small peptides in a visualization platform (Figure 1d); (v) find the phylogenetic tree of conserved sORFs across different plant species (Figure 1e); and (vi) check whether the sORFs or their homologs have associated research in the published literature. To the best of our knowledge, PsORF (http://psorf.whu.edu.cn/) is the only comprehensive database for plant sORFs. With the accumulation of translatomic data from Ribo‐seq and proteomic data from MS, more and more important sORFs and their regulatory roles will be identified. Thus, we will keep updating PsORF as new data become available. We believe that the database will help plant scientists quickly obtain sORF information for further biological discovery.

Conflict of interests

The authors declare no competing interests.

Author contributions

K.W. and Y.J. designed the project and wrote the manuscript. Y.J., D.Y., W.F., H.Y., Y.F.Z., Y.Z. and W.D. contributed to data analysis and web design. X.Z. and X.L. contributed to Ribo‐seq and MS assays.
  10 in total

1.  SmProt: a database of small proteins encoded by annotated coding and non-coding RNA loci.

Authors:  Yajing Hao; Lili Zhang; Yiwei Niu; Tanxi Cai; Jianjun Luo; Shunmin He; Bao Zhang; Dejiu Zhang; Yan Qin; Fuquan Yang; Runsheng Chen
Journal:  Brief Bioinform       Date:  2018-07-20       Impact factor: 11.622

2.  SearchGUI: A Highly Adaptable Common Interface for Proteomics Search and de Novo Engines.

Authors:  Harald Barsnes; Marc Vaudel
Journal:  J Proteome Res       Date:  2018-05-25       Impact factor: 4.466

3.  Cysteine-rich peptides promote interspecific genetic isolation in Arabidopsis.

Authors:  Sheng Zhong; Meiling Liu; Zhijuan Wang; Qingpei Huang; Saiying Hou; Yong-Chao Xu; Zengxiang Ge; Zihan Song; Jiaying Huang; Xinyu Qiu; Yihao Shi; Junyu Xiao; Pei Liu; Ya-Long Guo; Juan Dong; Thomas Dresselhaus; Hongya Gu; Li-Jia Qu
Journal:  Science       Date:  2019-05-31       Impact factor: 47.728

4.  ARA-PEPs: a repository of putative sORF-encoded peptides in Arabidopsis thaliana.

Authors:  Rashmi R Hazarika; Barbara De Coninck; Lidia R Yamamoto; Laura R Martin; Bruno P A Cammue; Vera van Noort
Journal:  BMC Bioinformatics       Date:  2017-01-17       Impact factor: 3.169

5.  Improved Ribo-seq enables identification of cryptic translation events.

Authors:  Florian Erhard; Anne Halenius; Cosima Zimmermann; Anne L'Hernault; Daniel J Kowalewski; Michael P Weekes; Stefan Stevanovic; Ralf Zimmer; Lars Dölken
Journal:  Nat Methods       Date:  2018-03-12       Impact factor: 28.547

6.  JBrowse: a dynamic web platform for genome visualization and analysis.

Authors:  Robert Buels; Eric Yao; Colin M Diesh; Richard D Hayes; Monica Munoz-Torres; Gregg Helt; David M Goodstein; Christine G Elsik; Suzanna E Lewis; Lincoln Stein; Ian H Holmes
Journal:  Genome Biol       Date:  2016-04-12       Impact factor: 13.583

7.  sORFs.org: a repository of small ORFs identified by ribosome profiling.

Authors:  Volodimir Olexiouk; Jeroen Crappé; Steven Verbruggen; Kenneth Verhegen; Lennart Martens; Gerben Menschaert
Journal:  Nucleic Acids Res       Date:  2015-11-02       Impact factor: 16.971

8.  Multi-strategic RNA-seq analysis reveals a high-resolution transcriptional landscape in cotton.

Authors:  Kun Wang; Dehe Wang; Xiaomin Zheng; Ai Qin; Jie Zhou; Boyu Guo; Yanjun Chen; Xingpeng Wen; Wen Ye; Yu Zhou; Yuxian Zhu
Journal:  Nat Commun       Date:  2019-10-17       Impact factor: 14.919

9.  PsORF: a database of small ORFs in plants.

Authors:  Yanjun Chen; Danyang Li; Weiliang Fan; Xiaoming Zheng; Yifan Zhou; Hanzhe Ye; Xiaodong Liang; Wei Du; Yu Zhou; Kun Wang
Journal:  Plant Biotechnol J       Date:  2020-05-22       Impact factor: 9.803

10.  uORF-mediated translation allows engineered plant disease resistance without fitness costs.

Authors:  Guoyong Xu; Meng Yuan; Chaoren Ai; Lijing Liu; Edward Zhuang; Sargis Karapetyan; Shiping Wang; Xinnian Dong
Journal:  Nature       Date:  2017-05-17       Impact factor: 49.962

