Efficient Approach to Correct Read Alignment for Pseudogene Abundance Estimates.

Chelsea J-T Ju, Zhuangtian Zhao, Wei Wang.   

Abstract

RNA-Sequencing has been the leading technology for quantifying the expression of thousands of genes simultaneously. The data analysis of an RNA-Seq experiment starts with aligning short reads to the reference genome/transcriptome or to a reconstructed transcriptome. However, current aligners lack the sensitivity to distinguish reads that come from homologous regions of a genome. One group of these homologies is the paralogous pseudogenes. Pseudogenes arise from duplication of a set of protein-coding genes, and have been considered degraded paralogs in the genome because of their loss of functionality. Recent studies have provided evidence to support their novel regulatory roles in biological processes. With the growing interest in quantifying the expression level of pseudogenes in different tissues or cell lines, it is critical to have a sensitive method that can correctly align ambiguous reads and accurately estimate the expression level among homologous genes. Previously, in PseudoLasso, we proposed a linear regression approach to learn read alignment behaviors and to leverage this knowledge for abundance estimation and alignment correction. In this paper, we extend PseudoLasso by grouping the homologous genomic regions into communities using a community detection algorithm, and then building a separate linear regression model for each community. The results show that this approach retains the same accuracy as PseudoLasso. By breaking the genome into smaller homologous communities, the growth in running time with respect to the number of genes is reduced from quadratic to linear.
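
The per-community idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the matrix `A` (entry `A[i, j]` is the assumed fraction of reads from gene `j` that align to region `i`), the vector `y` of observed read counts, and the function names `find_communities` and `estimate_abundance` are all hypothetical. Connected components stand in for the community detection algorithm, and ordinary least squares stands in for the Lasso-style regression.

```python
import numpy as np


def find_communities(A, tol=0.0):
    """Group genes that share reads, using connected components.

    Stand-in for a real community detection algorithm: two genes are
    linked when either one contributes reads to the other's region.
    """
    n = A.shape[0]
    linked = (np.abs(A) > tol) | (np.abs(A.T) > tol)
    seen = [False] * n
    communities = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, members = [start], []
        seen[start] = True
        while stack:
            node = stack.pop()
            members.append(node)
            for nb in np.flatnonzero(linked[node]):
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(int(nb))
        communities.append(sorted(members))
    return communities


def estimate_abundance(A, y):
    """Solve y = A x one community at a time (plain least squares)."""
    x = np.zeros(A.shape[1])
    for members in find_communities(A):
        # Only the community-sized block of A enters each regression.
        sub_A = A[np.ix_(members, members)]
        sub_x, *_ = np.linalg.lstsq(sub_A, y[members], rcond=None)
        x[members] = sub_x
    return x


# Hypothetical toy data: genes 0/1 are homologous, as are genes 2/3.
A = np.array([
    [0.9, 0.2, 0.0, 0.0],
    [0.1, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.3, 0.7],
])
true_x = np.array([100.0, 50.0, 20.0, 80.0])
y = A @ true_x

communities = find_communities(A)
estimate = estimate_abundance(A, y)
```

In this toy setting each regression only involves a block the size of its community, so the total work is driven by the community sizes rather than by one model spanning every gene at once.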

Year:  2016        PMID: 27429446      PMCID: PMC5514313          DOI: 10.1109/TCBB.2016.2591533

Source DB:  PubMed          Journal:  IEEE/ACM Trans Comput Biol Bioinform        ISSN: 1545-5963            Impact factor:   3.710


