Farshad Niazi, Saba Valadkhan.
Abstract
Recent transcriptome analyses have indicated that a large part of mammalian genomes are transcribed into long non-protein-coding RNAs (lncRNAs). However, only a very small fraction of them have been individually studied, and whether the majority of lncRNAs found in large-scale studies have a cellular role is debated. To gain insight into the sequence features and genomic architecture of the subset of lncRNAs that have been proven to be functional, we created a database containing studied lncRNAs manually culled from the literature along with a parallel database containing all annotated protein-coding human RNAs. The Functional lncRNA Database, which contains 204 lncRNAs and their splicing variants, is available at valadkhanlab.org/database. Analysis of the lncRNAs and their comparison to protein-coding transcripts revealed sequence features including paucity of introns and low GC content in lncRNAs, which could explain several biological characteristics of these transcripts, such as their nuclear localization and low expression level. The predicted ORFs in lncRNAs have poor start codon and ORF contexts, which would lead to activation of the nonsense-mediated decay pathways and thus make it unlikely for most lncRNAs to code for even short peptides. Interestingly, our analyses revealed significant similarities between the lncRNAs and the 3' untranslated regions (3' UTRs) in protein-coding RNAs in structural features and sequence composition. The presence of these intriguing parallels between the lncRNAs and 3' UTRs, which constitute the two main components of the RNA-mediated cellular regulatory system, indicates that highly similar evolutionary constraints govern the function of regulatory RNA sequences in the cell.
Year: 2012 PMID: 22361292 PMCID: PMC3312569 DOI: 10.1261/rna.029520.111
Source DB: PubMed Journal: RNA ISSN: 1355-8382 Impact factor: 4.942