Computational discovery of human coding and non-coding transcripts with conserved splice sites.

Dominic Rose, Michael Hiller, Katharina Schutt, Jörg Hackermüller, Rolf Backofen, Peter F Stadler.

Abstract

MOTIVATION: Long non-coding RNAs (lncRNAs) resemble protein-coding mRNAs but do not encode proteins. Most lncRNAs are under lower sequence constraints than protein-coding genes and lack conserved secondary structures, making it hard to predict them computationally.
RESULTS: We introduce an approach to predict spliced lncRNAs in vertebrate genomes that combines comparative genomics and machine learning. It is based on detecting signatures of characteristic splice site evolution in vertebrate whole genome alignments. First, we predict individual splice sites, then assemble compatible sites into exon candidates, and finally predict multi-exon transcripts. Using a novel method to evaluate typical splice site substitution patterns that explicitly takes the species phylogeny into account, we show that individual splice sites can be accurately predicted. Since our approach relies only on predicted splice sites, it can uncover both coding and non-coding exons. We show that our predicted exons and partial transcripts are mostly non-coding and lack conserved secondary structures. These exons are of particular interest, since existing computational approaches cannot detect them. Transcriptome sequencing data indicate tissue-specific expression patterns of predicted exons, and there is evidence that increasing sequencing depth and breadth will validate additional predictions. We also found a significant enrichment of predicted exons that form multi-exon transcript parts, and we experimentally validate one such novel multi-exon gene. Overall, we obtain 336 novel multi-exon transcript predictions from human intergenic regions. Our results indicate the existence of novel human transcripts that are conserved in evolution, and our approach contributes to the completion of the human transcript catalog.
AVAILABILITY AND IMPLEMENTATION: Predicted human splice sites, exons and gene structures together with a Perl implementation of the tree-based log-odds scoring and a supplementary PDF file containing additional figures and tables are available at: http://www.bioinf.uni-leipzig.de/publications/supplements/10-010. The five experimentally confirmed partial transcript isoforms have been deposited in GenBank under accession numbers HM587422-HM587426.
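
The three-stage assembly described in RESULTS (splice sites, then exon candidates, then multi-exon transcripts) can be illustrated with a minimal Python sketch. This is not the authors' Perl implementation: the `SpliceSite` and `Exon` types, the function names, the summed score, the greedy chaining rule and all length thresholds are assumptions chosen for illustration only, and the splice-site scores are taken as given rather than computed from alignments.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpliceSite:
    chrom: str
    pos: int      # genomic coordinate of the exon boundary
    strand: str   # "+" or "-"
    kind: str     # "acceptor" (exon 5' end) or "donor" (exon 3' end)
    score: float  # hypothetical classifier score, assumed precomputed


@dataclass(frozen=True)
class Exon:
    chrom: str
    strand: str
    start: int    # genomic start (start < end on both strands)
    end: int
    score: float


def exon_candidates(sites: List[SpliceSite],
                    min_len: int = 30, max_len: int = 500) -> List[Exon]:
    """Pair acceptors with compatible donors (thresholds are assumptions)."""
    acceptors = [s for s in sites if s.kind == "acceptor"]
    donors = [s for s in sites if s.kind == "donor"]
    exons = []
    for a in acceptors:
        for d in donors:
            if (a.chrom, a.strand) != (d.chrom, d.strand):
                continue
            # On "+" the acceptor lies upstream of the donor in genomic
            # coordinates; on "-" the order is reversed.
            start, end = (a.pos, d.pos) if a.strand == "+" else (d.pos, a.pos)
            if min_len <= end - start <= max_len:
                exons.append(Exon(a.chrom, a.strand, start, end,
                                  a.score + d.score))
    return sorted(exons, key=lambda e: (e.chrom, e.strand, e.start, e.end))


def chain_exons(exons: List[Exon],
                min_intron: int = 70,
                max_intron: int = 10000) -> List[List[Exon]]:
    """Greedily chain exons separated by plausible introns."""
    chains, current = [], []
    ordered = sorted(exons, key=lambda e: (e.chrom, e.strand, e.start, e.end))
    for e in ordered:
        if current and (e.chrom, e.strand) == (current[-1].chrom,
                                               current[-1].strand):
            gap = e.start - current[-1].end
            if gap < min_intron:
                continue  # overlapping or too close: skip this candidate
            if gap <= max_intron:
                current.append(e)
                continue
        # Different locus or intron too long: close the current chain.
        if len(current) >= 2:
            chains.append(current)
        current = [e]
    if len(current) >= 2:
        chains.append(current)
    return chains


sites = [
    SpliceSite("chr1", 100, "+", "acceptor", 1.2),
    SpliceSite("chr1", 200, "+", "donor", 0.9),
    SpliceSite("chr1", 1000, "+", "acceptor", 1.5),
    SpliceSite("chr1", 1100, "+", "donor", 1.1),
]
exons = exon_candidates(sites)
transcripts = chain_exons(exons)
```

In this toy input the two acceptor/donor pairs yield two exon candidates, which are joined into a single two-exon transcript part. A realistic assembler would need to resolve competing overlapping candidates by score rather than skipping them greedily.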

Year:  2011        PMID: 21622663     DOI: 10.1093/bioinformatics/btr314

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  12 in total

1.  "Hypothesis for the modern RNA world": a pervasive non-coding RNA-based genetic regulation is a prerequisite for the emergence of multicellular complexity.

Authors:  Irma Lozada-Chávez; Peter F Stadler; Sonja J Prohaska
Journal:  Orig Life Evol Biosph       Date:  2012-02-10       Impact factor: 1.950

2.  Molecular evolution of the non-coding eosinophil granule ontogeny transcript.

Authors:  Dominic Rose; Peter F Stadler
Journal:  Front Genet       Date:  2011-10-05       Impact factor: 4.599

3.  GermlncRNA: a unique catalogue of long non-coding RNAs and associated regulations in male germ cell development.

Authors:  Alfred Chun-Shui Luk; Huayan Gao; Sizhe Xiao; Jinyue Liao; Daxi Wang; Jiajie Tu; Owen M Rennert; Wai-Yee Chan; Tin-Lap Lee
Journal:  Database (Oxford)       Date:  2015-05-17       Impact factor: 3.451

4.  Genome-wide characterization of intergenic polyadenylation sites redefines gene spaces in Arabidopsis thaliana.

Authors:  Xiaohui Wu; Yong Zeng; Jinting Guan; Guoli Ji; Rongting Huang; Qingshun Q Li
Journal:  BMC Genomics       Date:  2015-07-09       Impact factor: 3.969

5.  Comparison of splice sites reveals that long noncoding RNAs are evolutionarily well conserved.

Authors:  Anne Nitsche; Dominic Rose; Mario Fasold; Kristin Reiche; Peter F Stadler
Journal:  RNA       Date:  2015-03-23       Impact factor: 4.942

6.  Purifying selection on splice-related motifs, not expression level nor RNA folding, explains nearly all constraint on human lincRNAs.

Authors:  Andreas Schüler; Avazeh T Ghanbarian; Laurence D Hurst
Journal:  Mol Biol Evol       Date:  2014-08-25       Impact factor: 16.240

Review 7.  A Review on Recent Computational Methods for Predicting Noncoding RNAs.

Authors:  Yi Zhang; Haiyun Huang; Dahan Zhang; Jing Qiu; Jiasheng Yang; Kejing Wang; Lijuan Zhu; Jingjing Fan; Jialiang Yang
Journal:  Biomed Res Int       Date:  2017-05-03       Impact factor: 3.411

8.  Identification of novel transcripts and noncoding RNAs in bovine skin by deep next generation sequencing.

Authors:  Rosemarie Weikard; Frieder Hadlich; Christa Kuehn
Journal:  BMC Genomics       Date:  2013-11-14       Impact factor: 3.969

Review 9.  Machine learning and genome annotation: a match meant to be?

Authors:  Kevin Y Yip; Chao Cheng; Mark Gerstein
Journal:  Genome Biol       Date:  2013-05-29       Impact factor: 13.583

10.  Role of Pnn in alternative splicing of a specific subset of lncRNAs of the corneal epithelium.

Authors:  Jeong Hoon Joo; Danny Ryu; Qian Peng; Stephen P Sugrue
Journal:  Mol Vis       Date:  2014-11-16       Impact factor: 2.367
