Pan Wu1,2,3, Yongzhen Mo2, Miao Peng2, Ting Tang2, Yu Zhong2, Xiangying Deng2, Fang Xiong2, Can Guo2, Xu Wu1,2, Yong Li4, Xiaoling Li2, Guiyuan Li1,2,3, Zhaoyang Zeng1,2,3, Wei Xiong5,6,7.
Abstract
Non-coding RNAs (ncRNAs) do not encode proteins and regulate various oncological processes. They are also important potential cancer diagnostic and prognostic biomarkers. Bioinformatics and translation omics have begun to elucidate the roles and modes of action of the functional peptides encoded by ncRNAs. Here, recent advances in long non-coding RNA (lncRNA)- and circular RNA (circRNA)-encoded small peptides are compiled and synthesized. We introduce both the computational and analytical methods used to predict ncRNAs that may encode oncologically functional oligopeptides. We also present numerous specific lncRNA- and circRNA-encoded proteins and their cancer-promoting or cancer-inhibiting molecular mechanisms. This information may expedite the discovery, development, and optimization of novel and efficacious cancer diagnostic, therapeutic, and prognostic protein-based tools derived from non-coding RNAs. Functional peptides encoded by ncRNAs have promising applications in cancer research but also pose challenges. The aim of this review is to provide a theoretical basis and relevant references, which may promote the discovery of more functional peptides encoded by ncRNAs and support the development of novel anticancer therapeutic targets, as well as diagnostic and prognostic cancer markers.
Keywords: cancer; circRNA; lncRNA; peptide
Year: 2020 PMID: 32019587 PMCID: PMC6998289 DOI: 10.1186/s12943-020-1147-3
Source DB: PubMed Journal: Mol Cancer ISSN: 1476-4598 Impact factor: 27.401
Classification of prediction methods for ORFs
| Name | Characteristics | Website |
|---|---|---|
| CircRNADb | It includes 32,914 human exonic circRNA records. Each record lists IRES sequence elements, predicted ORFs, related references, and so on. The predicted ORF usually spans the cross-splicing site. | http://202.195.183.4:8000/circrnadb/circRNADb.php |
| | It comprises >4,374,422 sORFs from six different species, derived from multiple ribosome profiling and sequencing datasets. | |
| ORF Finder | It performs six-frame translations and returns the range of each ORF and its protein translation. These may be submitted directly for BLAST similarity or COGs database searches. | https://www.ncbi.nlm.nih.gov/orffinder/ |
| ORF Predictor | It was designed to predict ORFs in expressed sequence tag (EST) or cDNA sequences. Its output file consists of predicted coding DNA sequences, the start and end of the coding region, and the protein peptide sequence. | |
| SMS:ORF Finder | It searches newly sequenced DNA for ORFs and returns the range of each ORF and its protein translation. SMS:ORF Finder supports the entire IUPAC alphabet and several genetic codes. | http://www.bioinformatics.org/sms2/orf_find.html |
| CSCD | It contains cancer-associated circRNA alternative splicing, expression, and translation (ORF) data. It is linked to UCSC, predicts potential ORFs, and highlights translatable circRNAs. | |
| PhyloCSF | It is based on the phylogenetic analysis of multi-species genomic sequence alignments and identifies conserved protein-coding regions. It requires complete ORF sequences from different species to be able to evaluate their coding probabilities. | http://compbio.mit.edu/PhyloCSF/ |
| CircPro | Its data are derived from high-throughput sequencing (RNA-Seq and Ribo-Seq). It generates a list of circRNAs and reports genomic locations, ORF lengths, junction reads from Ribo-Seq, and so on. | |
| cORF_pipeline | Its output reports predicted sORF sequences, start and stop positions, annotation data, and so on. The longest ORF spanning the circRNA splicing site is considered the one most likely to encode a peptide. | https://github.com/kadenerlab/cORF_pipeline |
| CircBank | Its output contains the ORF size, the coding potential, and circRNA conservation. | http://www.circbank.cn/index.html |
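
The six-frame scan and the junction-spanning ORF criterion described in the table can be illustrated with a minimal Python sketch. It is not the implementation of ORF Finder, cORF_pipeline, or any other listed tool. It assumes the standard genetic code, ATG start codons only, and unambiguous A/C/G/T input. The function names `find_orfs` and `junction_spanning_orfs` are hypothetical, and the circular case is approximated by scanning the sequence concatenated with itself.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_aa=3):
    """Scan all six frames for ATG...stop ORFs.

    Returns (strand, frame, start, end, peptide) tuples. Coordinates are
    0-based, end-exclusive, and refer to the strand that was scanned.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    peptide = "".join(
                        CODON_TABLE.get(s[j:j + 3], "X") for j in range(start, i, 3)
                    )
                    if len(peptide) >= min_aa:
                        hits.append((strand, frame, start, i + 3, peptide))
                    start = None
    return hits


def junction_spanning_orfs(circ_seq, min_aa=3):
    """ORFs on the sense strand that cross the back-splice junction.

    Assumption: the circle is modelled by doubling the linear sequence, so
    ORFs that need more than one full extra pass around the circle are missed.
    """
    n = len(circ_seq)
    doubled = circ_seq + circ_seq
    return [
        (start, end, pep)
        for strand, _frame, start, end, pep in find_orfs(doubled, min_aa)
        if strand == "+" and start < n < end
    ]


if __name__ == "__main__":
    print(find_orfs("CCATGGCTAAATGACC"))
    print(junction_spanning_orfs("GCTTAACCATGAAA"))
```

In the second call, the start codon lies near the 3' end of the linear representation and the stop codon is reached only after crossing the junction, which is the situation the circRNA-specific tools above are designed to detect.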
IRES prediction methods
| Name | Characteristics | Website |
|---|---|---|
| IRESite | It is based on experimental data derived from 68 viruses and 115 eukaryotic cells. It furnishes information on experimental IRES fragments including their nature, function, origin, size, sequence, structure, relative position to the surrounding protein coding region, and so on. | |
| IRESfinder | It is a logit model-based prediction tool that uses 19 k-mer parameters. Its accuracy is ~80%. IRESfinder is a standalone Python script that is applicable to high-throughput screening. | |
| IRESPred | It predicts viral and cellular IRES with a support vector machine (SVM). This predictive model integrates 35 features based on the sequence and structural properties of UTRs and their probabilities of interacting with small subunit ribosomal proteins (SSRPs). Its accuracy was ~75.75% and its Matthews correlation coefficient (MCC) was 0.51 in blind testing. | http://bioinfo.net.in/IRESPred/ |
| VIPS | It consists of the RNALfold, RNA Align, and pknotsRG programs. Evaluations on the UTR, IRES, and virus databases indicated that it has superior accuracy and flexibility and can predict four different groups of IRES. | http://140.135.61.250/vips/ |
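
To make the terms in this table concrete, the following Python sketch shows how k-mer frequencies can feed a logit (logistic) score and how the Matthews correlation coefficient is computed from a confusion matrix. It is an illustration only: the function names are hypothetical, the weights in the example are invented placeholders, and nothing here reproduces the trained parameters of IRESfinder or IRESPred.

```python
import math
from itertools import product


def kmer_frequencies(seq, k=2):
    """Relative frequency of every A/C/G/T k-mer in seq (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing ambiguous bases are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def logit_score(features, weights, bias=0.0):
    """Logistic model: sigmoid(bias + sum_i w_i * x_i)."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    feats = kmer_frequencies("AUGGCUAAAUGA", k=2)
    # Placeholder weights chosen for illustration, not fitted to any data.
    toy_weights = {"AA": 1.5, "GC": -0.8}
    print(logit_score(feats, toy_weights, bias=-0.2))
    print(mcc(tp=40, tn=35, fp=15, fn=10))
```

An MCC of 1 indicates perfect agreement between predictions and labels, 0 indicates performance no better than chance, and -1 indicates total disagreement, which is why it is often reported alongside accuracy for imbalanced test sets.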
m6A prediction methods
| Name | Characteristics | Website |
|---|---|---|
| DeepM6ASeq | It is based on miCLIP-Seq data at single-base resolution and detects m6A sites. It can recognize the new m6A reader FMR1. | |
| M6APred-EL | It uses position-specific k-mer nucleotide propensity, physicochemical properties, and ring-function-hydrogen-chemical properties to optimize m6A position recognition accuracy. | |
| M6AMRFS | It uses dinucleotide binary encoding and local position-specific dinucleotide frequencies to encode RNA sequences. It can identify m6A sites in multiple species. | http://server.malab.cn/M6AMRFS/ |
| SRAMP | SRAMP stands for sequence-based RNA adenosine methylation site predictor. It identifies mammalian m6A sites at single-nucleotide resolution and builds m6A site predictors. | |
| iRNA-Methyl | It identifies m6A sites by incorporating the global and long-range sequence pattern information of RNA via the pseudo k-tuple nucleotide composition (PseKNC) approach. | http://lin.uestc.edu.cn/server/iRNA-Methyl |
| iRNA (m6A)-PseDNC | It uses the Euclidean distance-based method and pseudodinucleotide composition to identify m6A sites in the | |
| m6Acomet | It is based on the RNA co-methylation network comprising 339,158 putative gene ontology functions associated with 1,446 identified human m6A sites. | http://www.xjtlu.edu.cn/biologicalsciences/m6acomet |
| WHISTLE | It integrates 35 genome-derived and conventional sequence-derived features. It enables direct queries of predicted RNA-methylation sites, their putative functions, and their associations with other methylation sites or genes. | |
| pRNAm-PC | It predicts m6A sites in RNA sequences based on physicochemical properties. RNA sequence samples are expressed by pseudodinucleotide composition (PseDNC). | http://www.jci-bioinfo.cn/pRNAm-PC |
| TargetM6A | It identifies m6A sites from RNA sequences via position-specific nucleotide propensities (PSNP) and a support vector machine (SVM). | http://csbio.njust.edu.cn/bioinf/TargetM6A |
| AthMethPre | It trains the SVM classifier using the positional flanking nucleotide sequence and the position-independent k-mer nucleotide spectrum to predict m6A sites in | |
| RNAMethPre | It predicts m6A sites in mammalian mRNA sequences by integrating multiple mRNA features and training an SVM classifier. | http://bioinfo.tsinghua.edu.cn/RNAMethPre/index.html |
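
Several of the predictors above start by turning an RNA window into a numeric feature vector. The sketch below gives one plausible reading of the two encodings named for M6AMRFS, dinucleotide binary encoding and local position-specific dinucleotide frequency. The 2-bit mapping per base, the function names, and the exact frequency definition (occurrences of the current dinucleotide within the prefix, divided by the prefix length) are assumptions made for illustration and may differ from the published tool.

```python
# Assumed 2-bit code per base, so each dinucleotide maps to a 4-bit vector.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "U": (1, 1)}


def normalize(seq):
    """Upper-case the sequence and read DNA-style T as U."""
    return seq.upper().replace("T", "U")


def dinucleotide_binary(seq):
    """Concatenate a 4-bit code for every overlapping dinucleotide.

    Only A/C/G/U are supported; other symbols raise KeyError.
    """
    seq = normalize(seq)
    vec = []
    for a, b in zip(seq, seq[1:]):
        vec.extend(BITS[a] + BITS[b])
    return vec


def local_dinucleotide_freq(seq):
    """For each position i >= 1, count how often the dinucleotide ending at i
    occurs in the prefix seq[0..i], divided by the prefix length."""
    seq = normalize(seq)
    feats = []
    for i in range(1, len(seq)):
        prefix = seq[: i + 1]
        pair = seq[i - 1 : i + 1]
        count = sum(1 for j in range(len(prefix) - 1) if prefix[j:j + 2] == pair)
        feats.append(count / len(prefix))
    return feats


if __name__ == "__main__":
    window = "GGACU"  # toy window, not taken from any dataset
    print(dinucleotide_binary(window))
    print(local_dinucleotide_freq(window))
```

Vectors of this kind would then be passed to a classifier such as an SVM, as several rows of the table describe.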
Fig. 1 Small peptides encoded by circRNAs and lncRNAs regulate tumor proliferation. a CircSHPRH encodes SHPRH-146aa, which protects full-length SHPRH from degradation by the ubiquitin-proteasome system. SHPRH ubiquitinates PCNA as an E3 ligase. b Circ-AKT3 encodes AKT3-174aa, which competitively interacts with PDK1 to negatively regulate the PI3K/Akt signaling pathway. c CircPINT encodes PINT87aa, which interacts with PAF1 and inhibits transcriptional elongation of oncogenes. d Circ-FBXW7 encodes Fbxw7-185aa, which prevents interaction between USP28 and FBXW7a by competitively binding USP28, thereby destabilizing c-Myc. e CircE7 encodes the E7 oncoprotein, which promotes tumor proliferation. f The lncRNA UBAP1-AST6 encodes UBAP1-AST6, which is a cancer-promoting factor.
Fig. 2 Small peptides encoded by circRNAs and lncRNAs regulate tumor invasion, metastasis, and proliferation. a CircPPP1R12A encodes circPPP1R12A-73aa, which activates the Hippo-YAP signaling pathway. b The lncRNA HOXB-AS3 encodes the HOXB-AS3 peptide, which competitively binds hnRNP A1 and antagonizes hnRNP A1-mediated PKM splicing regulation. c CircLgr4 encodes circLgr4-peptide, which interacts with LGR4 and activates the LGR4-Wnt signaling pathway. d Circβ-catenin encodes β-catenin-370aa, which antagonizes GSK3β-induced β-catenin phosphorylation and ubiquitination/degradation, stabilizes full-length β-catenin, and activates the Wnt pathway. e LINC01420 encodes NoBody, which binds EDC4 to regulate mRNA degradation. LINC01420 may promote nasopharyngeal carcinoma invasion and metastasis via this pathway.