Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data.

Wenjiang Deng, Tian Mou, Krishna R Kalari, Nifang Niu, Liewei Wang, Yudi Pawitan, Trung Nghia Vu.

Abstract

MOTIVATION: Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as a uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods include one or more bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.
RESULTS: We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed from the simplifying assumptions. In contrast, XAEM treats Xβ as a bilinear model in which both X and β are unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Unlike existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, which alternates between estimation of X and estimation of β. To keep the algorithm fast, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
AVAILABILITY AND IMPLEMENTATION: The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.
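
ILLUSTRATION: The alternating scheme described under RESULTS can be sketched on a toy problem. The code below is not the XAEM implementation; it is a minimal sketch that assumes a Poisson model Y ≈ Xβ for a single gene, where Y holds read counts (rows are read classes, columns are samples), each column of X sums to 1, and both X and β start from generic values. Under these assumptions the EM updates take the well-known multiplicative form used for Poisson (KL) matrix factorization. The function name `alternating_em`, the random initialization, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def alternating_em(Y, k, n_iter=200, seed=0, eps=1e-12):
    """Alternate EM-style updates of X (m x k) and B (k x n) so that Y ~ X @ B.

    Y : (m, n) array of non-negative counts (m read classes, n samples).
    k : assumed number of isoforms.
    Columns of X are constrained to sum to 1; B plays the role of beta.
    """
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    # Hypothetical initialization: random positive X, equal split of reads in B.
    X = rng.random((m, k)) + 0.1
    X /= X.sum(axis=0, keepdims=True)
    B = np.ones((k, n)) * Y.sum(axis=0, keepdims=True) / k

    for _ in range(n_iter):
        # Step 1: update B with X held fixed.
        R = Y / (X @ B + eps)            # observed / expected counts
        B *= (X.T @ R) / (X.sum(axis=0)[:, None] + eps)

        # Step 2: update X with B held fixed.
        R = Y / (X @ B + eps)
        X *= (R @ B.T) / (B.sum(axis=1)[None, :] + eps)

        # Remove the scale ambiguity of the bilinear model:
        # rescale so each column of X sums to 1, leaving X @ B unchanged.
        scale = X.sum(axis=0, keepdims=True)
        X /= scale
        B *= scale.T
    return X, B


# Toy data: 12 read classes, 3 isoforms, 20 samples (all simulated).
rng = np.random.default_rng(1)
X_true = rng.random((12, 3))
X_true /= X_true.sum(axis=0, keepdims=True)
B_true = rng.uniform(50, 500, size=(3, 20))
Y = rng.poisson(X_true @ B_true).astype(float)

X_hat, B_hat = alternating_em(Y, k=3)
```

Because X and β are only identified up to scaling and relabeling of the columns, the sketch fixes the scale by normalizing the columns of X. Sharing one X across many samples is what makes the joint estimation feasible, which mirrors the role of multi-sample data described above.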

Year:  2020        PMID: 31400221     DOI: 10.1093/bioinformatics/btz640

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  6 in total

1.  Quantification of mutant-allele expression at isoform level in cancer from RNA-seq data.

Authors:  Wenjiang Deng; Tian Mou; Yudi Pawitan; Trung Nghia Vu
Journal:  NAR Genom Bioinform       Date:  2022-07-13

2.  Discovery of druggable cancer-specific pathways with application in acute myeloid leukemia.

Authors:  Quang Thinh Trac; Tingyou Zhou; Yudi Pawitan; Trung Nghia Vu
Journal:  Gigascience       Date:  2022-09-29       Impact factor: 7.658

3.  Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. (Review)

Authors:  Rob Patro; Leena Salmela
Journal:  iScience       Date:  2020-12-17

4.  Isoform-level Quantification for Single-Cell RNA Sequencing.

Authors:  Lu Pan; Huy Q Dinh; Yudi Pawitan; Trung Nghia Vu
Journal:  Bioinformatics       Date:  2021-12-02       Impact factor: 6.937

5.  Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients.

Authors:  Wenjiang Deng; Sarath Murugan; Johan Lindberg; Venkatesh Chellappa; Xia Shen; Yudi Pawitan; Trung Nghia Vu
Journal:  Front Genet       Date:  2022-02-16       Impact factor: 4.599

6.  Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision.

Authors:  Philip Davies; Matt Jones; Juntai Liu; Daniel Hebenstreit
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622
