
Identification and visualization of differential isoform expression in RNA-seq time series.

María José Nueda, Jordi Martorell-Marugan, Cristina Martí, Sonia Tarazona, Ana Conesa

Abstract

Motivation: As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data.
Results: Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as of genes with a shift in the major expressed isoform.
Availability and implementation: The package is freely available under the LGPL license from the Bioconductor web site.
Contact: mj.nueda@ua.es or aconesa@ufl.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.


Year: 2018; PMID: 28968682; PMCID: PMC5860359; DOI: 10.1093/bioinformatics/btx578

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 Introduction

Alternative splicing (AS) is a common mechanism by which higher eukaryotes expand transcriptome complexity and functional diversity. The expression of alternative isoforms of many genes responds to developmental regulation (Vuong et al., 2016) and to environmental cues (AlShareef et al., 2017); hence, there is an interest in studying the dynamics of AS by RNA-seq. While many algorithms have been developed for differential AS analysis, most of these approaches target pair-wise comparisons. Dedicated methods for time series AS analysis either are restricted to the estimation of isoform levels (Huang and Sanguinetti, 2016) or require large datasets to model time profiles (Topa and Honkela, 2016). The analysis of differential isoform expression in time course experiments poses a number of specific challenges. First, different transcripts of the same gene may vary in their time trajectories, and the analysis algorithm should be able to identify those genes whose isoform profiles differ significantly. Second, clustering is complicated by the fact that genes have different numbers of isoforms, so the data do not fit the structure of traditional clustering, where the same number of data points is required for each feature. Novel clustering strategies are therefore needed. Finally, transcripts of the same gene frequently have very different expression levels, with one 'major' isoform being most expressed and alternative isoforms having lower expression. Ideally, analysis approaches should be able to account for this.

maSigPro is an R package designed for the analysis of multiple time course transcriptomics data (Nueda et al., 2014). We present here Iso-maSigPro, a further adaptation of this method to study differential isoform usage in time course RNA-seq experiments. A more detailed motivation and details on the algorithm can be found in the Supplementary Materials.

2 Methods

Following the generalized linear model (GLM) described in Nueda et al. (2014), two GLMs are fitted for each multi-isoform gene, identifying its J isoforms with J − 1 binary variables (I1, …, IJ−1). The reference model, M0, assumes that only constant differences exist between isoforms, whereas the global gene model, M1, allows for a time × condition × isoform interaction. For instance, for a gene with two isoforms, two experimental conditions (or series) and linear time effects, both models are specified through the 'link function' g that characterizes the GLM, the expected value of isoform expression y for observation i and isoform j, the time t, and a binary variable that identifies the experimental condition. The significance of the interaction is estimated with the log-likelihood ratio statistic of the two models (Supplementary Materials).

Iso-maSigPro takes as input a transcript-level expression data frame that includes a column with gene assignments. Seven new functions enable the analysis of differentially expressed isoforms (Fig. 1 and Supplementary Materials).
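
For the two-isoform, two-condition, linear-time example mentioned in the Methods, one way to write the two models that is consistent with their verbal description is sketched below. The symbols \mu_{ij} (expected expression), D_i (condition indicator), I_{1j} (isoform indicator) and the coefficient names \beta, \lambda, \gamma are assumptions introduced here for illustration; the authors' exact parameterization is given in their Supplementary Materials.

```latex
\begin{align}
M_0:\quad g(\mu_{ij}) &= \beta_0 + \beta_1 D_i + \beta_2 t_i + \beta_3 t_i D_i + \lambda_1 I_{1j} \\
M_1:\quad g(\mu_{ij}) &= \beta_0 + \beta_1 D_i + \beta_2 t_i + \beta_3 t_i D_i + \lambda_1 I_{1j}
      + \gamma_1 I_{1j} D_i + \gamma_2 I_{1j} t_i + \gamma_3 I_{1j} t_i D_i \\
\Lambda &= -2\,\bigl(\ell_{M_0} - \ell_{M_1}\bigr)
\end{align}
```

In this sketch M0 lets the two isoforms differ only by the constant \lambda_1, while M1 adds the three isoform-interaction coefficients \gamma_1, \gamma_2, \gamma_3. Here \ell denotes the maximized log-likelihood of each model. Under the usual asymptotic assumption, \Lambda would be compared with a chi-square distribution whose degrees of freedom equal the number of extra coefficients in M1 (three in this example).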
Fig. 1

Workflow for Iso-maSigPro analysis

IsoModel() implements the differential splicing (DS) models M0 and M1 for each multi-isoform gene, using the polynomial model, obtained with the generic maSigPro function make.design.matrix(), that best describes the experimental design. Comparing both models yields an FDR-corrected P-value of differential splicing. Transcripts from significant differentially spliced genes (DSGs) are then subjected to regular Next-maSigPro analysis to detect differentially expressed transcripts (DETs). IsoModel() returns a list of DSGs together with the estimated models of their associated isoforms, which serves as input to the getDS() function to obtain a selection of DSGs at a pre-established level of goodness of fit. seeDS() clusters all differential transcripts (regardless of their genes) and tableDS() identifies the cluster assignment of major and secondary isoforms for each gene. Genes with specific profiles in their isoforms can be selected with the function getDSPattern() and visualized with IsoPlot(). PodiumChange() identifies DSGs with a switch of major isoform at the specified time points.
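
The idea of a major isoform switch can be illustrated with a small, language-agnostic sketch in Python. This is not the package's R implementation of PodiumChange(): the function names, the toy expression values, the time points and the rule used here (the overall major isoform is the one with the highest mean expression, and a switch is any time point where another isoform is the most expressed) are all assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def overall_major(profiles: Dict[str, List[float]]) -> str:
    """Return the isoform with the highest mean expression over all time points."""
    return max(profiles, key=lambda iso: mean(profiles[iso]))


def podium_change_times(profiles: Dict[str, List[float]],
                        time_points: List[float]) -> List[float]:
    """Return the time points at which the most expressed isoform
    differs from the overall major isoform (a simplified switch rule)."""
    major = overall_major(profiles)
    switches = []
    for idx, t in enumerate(time_points):
        # Most expressed isoform at this particular time point.
        top = max(profiles, key=lambda iso: profiles[iso][idx])
        if top != major:
            switches.append(t)
    return switches


# Hypothetical gene with two isoforms measured at six hypothetical time points.
time_points = [0, 2, 6, 12, 18, 24]
profiles = {
    "iso_A": [10, 9, 8, 4, 3, 2],   # major isoform early, decreasing
    "iso_B": [2, 3, 4, 6, 8, 9],    # minor isoform early, increasing
}

print(overall_major(profiles))                     # iso_A
print(podium_change_times(profiles, time_points))  # [12, 18, 24]
```

In this toy gene, iso_A is the overall major isoform, but iso_B overtakes it at the three latest time points, which is the kind of pattern a podium-change query is meant to surface.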

3 Results

Iso-maSigPro was applied to a public RNA-seq dataset (GEO accession GSE75417) describing a six-time-point course of mouse B-cell differentiation triggered by expression of the transcription factor Ikaros. Transcripts were quantified with eXpress (Roberts and Pachter, 2013), yielding a total of 34 156 transcripts belonging to 12 572 genes, of which 6882 are multi-isoform. The IsoModel() function selected 347 DSGs containing a total of 1239 transcripts. Of these, 665 also had significant time course changes (DETs) (Supplementary Table S1). seeDS() grouped these 665 DETs into six clusters (Supplementary Fig. S1 and Table S2), and tableDS() identified the cluster assignment of major and minor isoforms, revealing that for most DSGs the differential isoforms followed similar trajectories (Supplementary Table S3). Nevertheless, Iso-maSigPro functions facilitated the identification and visualization of genes with biologically interesting isoform expression changes. Figure 2A shows the expression of Nfkb2, identified with getDSPattern() as a DSG with significant transcripts in two different seeDS() clusters (major isoform in cluster 4 and minor isoform in cluster 1, corresponding to down- and up-regulation patterns after Ikaros induction, respectively). PodiumChange() helped to locate 37 genes with major isoform switches at the latest time points (Supplementary Table S4 and Fig. S2). Figure 2B shows one such gene, Mxi1, a transcriptional repressor involved in B-cell differentiation (see more in Supplementary Fig. S3).
Fig. 2

IsoPlot() examples of the two major Iso-maSigPro DSG functionalities. (A) Nfkb2 has isoforms in clusters 1 and 4. (B) Mxi1 is a podium change gene. Ctr, control; Ik, Ikaros.


4 Discussion

The Iso-maSigPro set of functions updates the maSigPro framework to analyze isoform changes in time course transcriptomics data. We model differential isoform utilization as the interaction between isoform, experimental condition and time, and evaluate significance with the log-likelihood ratio statistic of the models with and without this interaction. To extract biologically meaningful changes in relative isoform abundances, we introduced new clustering and querying functions. seeDS() and tableDS() help to find genes with substantial isoform profile differences over time, while PodiumChange() identifies those cases with a switch in the most expressed transcript. We showed examples where these functions helped to select genes with functionally relevant isoform changes. maSigPro is the first Bioconductor package with specific functions for the analysis of time course alternative isoform expression.
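
The significance step summarized above (a log-likelihood ratio statistic per gene, followed by FDR correction across genes) can be sketched in a few lines of standard-library Python. The sketch rests on stated assumptions rather than on the package code: it assumes the model with the interaction has exactly three more coefficients than the model without it (as in a two-isoform, two-condition, linear-time setting), it uses the asymptotic chi-square approximation, and it uses the Benjamini-Hochberg procedure as one common choice of FDR correction. The function names and log-likelihood values are hypothetical.

```python
import math
from typing import List


def lrt_pvalue_df3(loglik_m0: float, loglik_m1: float) -> float:
    """P-value of the likelihood ratio test for nested models that differ
    by 3 coefficients, using the closed-form chi-square(3) survival function."""
    stat = max(2.0 * (loglik_m1 - loglik_m0), 0.0)  # guard against tiny negatives
    return (math.erfc(math.sqrt(stat / 2.0))
            + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical maximized log-likelihoods (M0, M1) for four genes.
logliks = [(-120.0, -110.0), (-95.0, -94.5), (-200.0, -192.0), (-80.0, -79.0)]
pvals = [lrt_pvalue_df3(l0, l1) for l0, l1 in logliks]
qvals = benjamini_hochberg(pvals)
for p, q in zip(pvals, qvals):
    print(f"p = {p:.4g}, BH-adjusted = {q:.4g}")
```

Genes whose adjusted value falls below the chosen threshold would be the candidates for differential isoform usage. For models that differ by a number of coefficients other than three, a general chi-square survival function would be needed in place of the closed form used here.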

Funding

This work was supported by EU FP7 STATegra project agreement [306000]; and the Spanish Ministry of Economy and Competitiveness [BIO2012-40244 and BIO2015-71658-R]. Conflict of Interest: none declared.
References

1. Yuanhua Huang; Guido Sanguinetti. Statistical modeling of isoform splicing dynamics from RNA-seq time series data. Bioinformatics. 2016-06-17.

2. Celine K Vuong; Douglas L Black; Sika Zheng. The neurogenetics of alternative splicing. Nat Rev Neurosci. 2016-05. Review.

3. Adam Roberts; Lior Pachter. Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods. 2012-11-18.

4. Hande Topa; Antti Honkela. Analysis of differential splicing suggests different modes of short-term splicing regulation. Bioinformatics. 2016-06-15.

5. Sahar AlShareef; Yu Ling; Haroon Butt; Kiruthiga G Mariappan; Moussa Benhamed; Magdy M Mahfouz. Herboxidiene triggers splicing repression and abiotic stress responses in plants. BMC Genomics. 2017-03-27.

6. María José Nueda; Sonia Tarazona; Ana Conesa. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics. 2014-06-03.