RiboDiff: detecting changes of mRNA translation efficiency from ribosome footprints.

Yi Zhong, Theofanis Karaletsos, Philipp Drewe, Vipin T Sreedharan, David Kuo, Kamini Singh, Hans-Guido Wendel, Gunnar Rätsch.

Abstract

MOTIVATION: Deep sequencing based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. In order to decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed.
RESULTS: We present a statistical framework and an analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy.
AVAILABILITY AND IMPLEMENTATION: RiboDiff webpage: http://bioweb.me/ribodiff. Source code, including scripts for preprocessing the FASTQ data, is available at http://github.com/ratschlab/ribodiff.
CONTACTS: zhongy@cbio.mskcc.org or raetsch@inf.ethz.ch
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27634950      PMCID: PMC5198522          DOI: 10.1093/bioinformatics/btw585

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

The recently described ribosome footprinting technology (Ingolia ) allows the identification of mRNA fragments that were protected by the ribosome. It provides valuable information on ribosome occupancy and, thereby indirectly, on protein synthesis activity. Combining ribosome footprint (RF) measurements with RNA-Seq estimates makes it possible to determine a gene’s translation efficiency (TE), which is the ratio of the abundance of translated mRNA to that of available mRNA (Ingolia ). The normalization by mRNA abundance is designed to remove transcriptional activity as a confounder of RF abundance. The TEs in treatment/control experiments can then be compared to identify the genes whose translation efficiency is most affected. For instance, Thoreen considered a ratio (fold-change) of the TEs of treatment and control. However, these initial approaches only partially account for the fact that the estimates of mRNA and ribosome abundance are themselves uncertain. In particular for lowly expressed genes, the error bars for the ratio of two TE values can be large. As in proper RNA-Seq analyses, one should consider the uncertainty in these abundance measurements when testing for differential abundance. For RNA-Seq, this has been described in various ways, often based on generalized linear models that take advantage of dispersion information from biological replicates (Anders ; Drewe ; Robinson ). In Wolfe and Zhong , an RNA-Seq analysis approach was adapted to this problem, but the adaptation had several conceptual and practical limitations. Here, we describe a novel statistical framework that also uses a generalized linear model to detect effects of a particular treatment on mRNA translation. Additionally, our approach accounts for the fact that two different sequencing protocols with distinct statistical characteristics are used.
We compare it to the Z-score based approach (Thoreen ), DESeq2 (Love ) and a recently published tool Babel (Olshen ) that is based on errors-in-variables regression. Shell and Python scripts for trimming RF adaptor, aligning reads, removing rRNA contamination and counting reads are also included in the RiboDiff package.
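The TE ratio described above, and its sensitivity to noise at low counts, can be illustrated with a short sketch. The function name and all read counts below are hypothetical, and library-size normalization is ignored for simplicity; this is not code from the RiboDiff package.

```python
import math

def log2_te_ratio(rf_trt, rna_trt, rf_ctl, rna_ctl):
    """log2 fold-change of translation efficiency (TE = RF / mRNA)
    between treatment and control. Counts are assumed to be
    already normalized for library size."""
    te_trt = rf_trt / rna_trt
    te_ctl = rf_ctl / rna_ctl
    return math.log2(te_trt / te_ctl)

# Hypothetical highly expressed gene: TE doubles under treatment.
high = log2_te_ratio(rf_trt=2000, rna_trt=1000, rf_ctl=1000, rna_ctl=1000)
# The same gene with three extra RF reads in the treatment sample.
high_perturbed = log2_te_ratio(rf_trt=2003, rna_trt=1000, rf_ctl=1000, rna_ctl=1000)

# Hypothetical lowly expressed gene with the same TE doubling.
low = log2_te_ratio(rf_trt=4, rna_trt=2, rf_ctl=2, rna_ctl=2)
# Three extra RF reads now change the estimate substantially.
low_perturbed = log2_te_ratio(rf_trt=7, rna_trt=2, rf_ctl=2, rna_ctl=2)

shift_high = abs(high_perturbed - high)
shift_low = abs(low_perturbed - low)
```

A difference of three reads barely moves the estimate for the highly expressed gene but shifts it by most of a log2 unit for the lowly expressed one, which is why a plain ratio of TEs is unreliable without a model of count uncertainty.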

2 Methods

In sequencing-based ribosome footprinting, the RF read count is naturally confounded by mRNA abundance (Fig. 1A). We seek a strategy to compare RF measurements taking mRNA abundance into account in order to accurately discern the translation effect in case–control experiments. We model the vectors of RNA-Seq and RF read counts, $y_i^{\mathrm{RNA}}$ and $y_i^{\mathrm{RF}}$, respectively, for gene i with Negative Binomial (NB) distributions, as described before (for instance, Love ; Drewe ; Robinson ): $y_i^{\mathrm{RNA}} \sim \mathrm{NB}(\mu_i^{\mathrm{RNA}}, \kappa_i^{\mathrm{RNA}})$ and $y_i^{\mathrm{RF}} \sim \mathrm{NB}(\mu_i^{\mathrm{RF}}, \kappa_i^{\mathrm{RF}})$, where μ is the expected count and κ is the estimated dispersion across biological replicates. Here y denotes the observed counts normalized by the library size factor (Supplementary Section A). Formulating the problem as a generalized linear model (GLM) with the logarithm as link function, we can express expectations on read counts as a function of latent quantities related to mRNA abundance, $\beta_i^{C}$, in the two conditions ($C \in \{0, 1\}$), a quantity $\beta_i^{\mathrm{RNA}}$ that relates mRNA abundance to RNA-Seq read counts, a quantity $\beta_i^{\mathrm{RF}}$ that relates mRNA abundance to RF read counts, and a quantity $\beta_i^{\Delta,C}$ that captures the effect of the treatment on translation. In particular, the expected RNA-Seq read count $\mu_i^{\mathrm{RNA},C}$ is given by the equation $\log \mu_i^{\mathrm{RNA},C} = \beta_i^{C} + \beta_i^{\mathrm{RNA}}$.
Fig. 1

(A) Graphical model representing RiboDiff (Gray circle: observable variables; empty circle: unobservable variables; black square: functions; r denotes biological replicates; i denotes a gene and G is the number of genes). The dashed line denotes the relationship that we aim to test (see Methods for details). (B) Receiver operating characteristic (ROC) curve of RiboDiff (with separate dispersions), edgeR and DESeq2 (with interaction model), Z-score method and Babel on simulated data with large difference between dispersions of RF and RNA-Seq counts (see also Supplementary Fig. S-4). (C) Comparison of the distribution of TE ratios of genes that were detected to have a significant change in translation efficiency by RiboDiff (w/joint dispersion), Z-score based analysis and Babel. DESeq2 was very similar to RiboDiff (w/joint dispersion) and was omitted. Data were taken from GEO accession GSE56887. (Color version of this figure is available at Bioinformatics online.)

We assume that transcription and translation are successive cellular processing steps and that abundances are linearly related. The expected RF read count, $\mu_i^{\mathrm{RF},C}$, is given by $\log \mu_i^{\mathrm{RF},C} = \beta_i^{C} + \beta_i^{\mathrm{RF}} + \beta_i^{\Delta,C}$. A key point to note is that $\beta_i^{C}$ is a parameter shared between the expressions governing the expected RNA-Seq and RF counts. It can be considered a proxy for shared transcriptional/translational activity under condition C. Then, $\beta_i^{\Delta,C}$ indicates the deviation from that activity under condition C, with $\beta_i^{\Delta,C} = 0$ for C = 0 and free otherwise (see Supplementary Section B for more details). Fitting the GLM consists of learning the parameters β and dispersions κ given mRNA and RF counts for the two conditions $C \in \{0, 1\}$.
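As a rough illustration of the log-link model described above, the following sketch computes expected RNA-Seq and RF counts for one gene from the latent quantities and checks that the log ratio of TEs between conditions equals the treatment effect term. The function name and every parameter value are hypothetical; the sketch only assumes the additive log-scale structure stated in the text.

```python
import math

def expected_counts(beta_c, beta_rna, beta_rf, beta_delta):
    """Expected (RNA-Seq, RF) counts for one gene in one condition.

    beta_c     : shared abundance term for the condition
    beta_rna   : offset relating abundance to RNA-Seq counts
    beta_rf    : offset relating abundance to RF counts
    beta_delta : treatment effect on translation (0 in the control)
    """
    mu_rna = math.exp(beta_c + beta_rna)
    mu_rf = math.exp(beta_c + beta_rf + beta_delta)
    return mu_rna, mu_rf

# Hypothetical values: the treatment doubles mRNA abundance and
# additionally triples translation efficiency.
beta_cond = {0: math.log(100.0), 1: math.log(200.0)}
beta_rna, beta_rf = 0.0, math.log(0.5)
beta_delta = {0: 0.0, 1: math.log(3.0)}

te = {}
for c in (0, 1):
    mu_rna, mu_rf = expected_counts(beta_cond[c], beta_rna, beta_rf, beta_delta[c])
    te[c] = mu_rf / mu_rna  # translation efficiency under condition c

log_te_ratio = math.log(te[1] / te[0])
```

Because the shared abundance term cancels in the RF/RNA ratio, the change in mRNA abundance drops out and only the treatment effect term remains in the log TE ratio.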
We perform alternating optimization of the parameters β given dispersions κ and of the dispersion parameters κ given β, similar to the EM algorithm (Supplementary Sections B and C). As experimental procedures for measuring mRNA counts and RF counts differ, we allow separate dispersion parameters to be estimated for the RNA-Seq and RF profiling data sources to account for their different characteristics (Supplementary Section E). As in Anders , with raw dispersions estimated from previous steps, we regress all κ on the mean counts to obtain a mean-dispersion relationship $f(\mu)$. We perform empirical Bayes shrinkage (Love ) to shrink κ towards $f(\mu)$ to stabilize estimates (see Supplementary Section D). The proposed model in RiboDiff with a joint dispersion estimate is conceptually identical to using a GLM design matrix with condition, protocol and condition-by-protocol interaction terms (for instance, in conjunction with edgeR or DESeq1/2). In a treatment/control setting, we can then evaluate whether a treatment (C = 1) has a significant differential effect on translation efficiency compared to the control (C = 0). This is equivalent to determining whether the parameter $\beta_i^{\Delta,1}$ differs significantly from 0 and whether the relationship denoted by the dashed arrow in Figure 1A is needed or not. We compute significance levels based on the $\chi^2$ distribution by analyzing log-likelihood ratios of the null model ($\beta_i^{\Delta,1} = 0$) and the alternative model ($\beta_i^{\Delta,1} \neq 0$).
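The likelihood ratio test described above can be sketched for a single gene as follows. This is a minimal illustration, not the RiboDiff implementation: the dispersions are treated as known (the values 0.01 and 0.02 are assumptions), library sizes are assumed equal, the GLM is fitted with a plain iteratively reweighted least squares loop, and all function names and counts are hypothetical.

```python
import math

def nb_loglik(y, mu, kappa):
    """NB log-probability of count y with mean mu and dispersion kappa
    (variance = mu + kappa * mu**2)."""
    r = 1.0 / kappa
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1.0)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def max_loglik(X, y, kappa, n_iter=50):
    """Fit a log-link NB GLM with fixed dispersions by IRLS and return the
    maximized log-likelihood. Column 0 of X must be the intercept."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        w = [m / (1.0 + k * m) for m, k in zip(mu, kappa)]       # IRLS weights
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        xtwx = [[sum(wi * row[j] * row[l] for wi, row in zip(w, X))
                 for l in range(p)] for j in range(p)]
        xtwz = [sum(wi * row[j] * zi for wi, row, zi in zip(w, X, z))
                for j in range(p)]
        beta = solve(xtwx, xtwz)
    mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    return sum(nb_loglik(yi, m, k) for yi, m, k in zip(y, mu, kappa))

# Columns: intercept, condition, protocol (0 = RNA-Seq, 1 = RF), interaction.
# Sample order: RNA ctrl x2, RNA trt x2, RF ctrl x2, RF trt x2.
X_ALT = [[1, c, p, c * p] for p in (0, 1) for c in (0, 1) for _ in range(2)]
X_NULL = [row[:3] for row in X_ALT]       # null model: no interaction term
KAPPA = [0.01] * 4 + [0.02] * 4           # assumed protocol-specific dispersions

def lrt_pvalue(y):
    """Chi-square (1 d.f.) p-value for the interaction (TE change) term."""
    stat = 2.0 * (max_loglik(X_ALT, y, KAPPA) - max_loglik(X_NULL, y, KAPPA))
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Hypothetical gene whose RF counts triple while mRNA stays flat.
p_changed = lrt_pvalue([100, 110, 105, 95, 50, 55, 150, 160])
# Hypothetical gene whose RF counts simply follow a doubling of mRNA.
p_unchanged = lrt_pvalue([100, 110, 200, 220, 50, 55, 100, 110])
```

The first gene yields a small p-value because its RF counts change without a matching change in mRNA; the second does not, since the doubling of RF counts is fully explained by mRNA abundance. The interaction column plays the role of the treatment effect on translation, and the separate dispersion values per protocol mirror the idea of estimating RNA-Seq and RF dispersions separately.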

3 Results and discussion

We simulated data with different dispersions applied to mRNA and RF counts (see Supplementary Section F). We illustrate the performance of our method RiboDiff (with separate dispersion estimates) as well as Babel and the Z-score method. Although they are conceptually closely related to RiboDiff with joint dispersion estimates, we also include DESeq2 and edgeR with a GLM that includes an interaction term to model RNA-Seq and RF counts. Figure 1B shows the receiver operating characteristic (ROC) curve for a case with large dispersion differences between RF and RNA-Seq counts. RiboDiff exhibits superior detection accuracy compared to edgeR, DESeq2, Babel and the Z-score method; this advantage is less pronounced when RF and RNA-Seq dispersions are more similar (see Supplementary Fig. S-4). We obtained nearly identical results for RiboDiff with joint dispersion and DESeq2 with interaction term, although edgeR with the same setting is slightly better than RiboDiff with joint dispersion (data not shown). Our experiments illustrate that it can be beneficial to use the RiboDiff model with separate dispersions, in particular when the dispersions of RF and RNA-Seq data differ considerably. We also re-analyzed previously released ribosome footprint data (GEO accession GSE56887). After multiple testing correction, RiboDiff detected 601 TE down-regulated genes and 541 up-regulated ones at an FDR threshold of 0.05, which is about twice as many as reported previously. The new set of significant TE changes includes more than 90% of the genes identified in the previous study. We also compared RiboDiff to the Z-score method and found major differences (see Fig. 1C). Supplementary Section G provides evidence that the Z-score based method is biased towards genes with low read counts, whereas RiboDiff identifies more plausible differences. Babel identifies only very few genes with differential TE.
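To give a feel for what "different dispersions applied to mRNA and RF counts" means in practice, here is a minimal sketch that draws NB counts through a gamma-Poisson mixture. The actual simulation procedure is the one in Supplementary Section F; the sampler, the mean of 50 and the dispersions 0.01 and 0.5 below are assumptions chosen only for illustration.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method
    (adequate for the moderate rates used here)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def sample_nb(mu, kappa, rng):
    """Draw an NB count with mean mu and dispersion kappa
    (variance = mu + kappa * mu**2) as a gamma-Poisson mixture."""
    lam = rng.gammavariate(1.0 / kappa, mu * kappa)  # shape, scale
    return sample_poisson(lam, rng)

rng = random.Random(0)
# Same expected count, but a much larger dispersion for the RF protocol.
rna_counts = [sample_nb(50.0, 0.01, rng) for _ in range(2000)]
rf_counts = [sample_nb(50.0, 0.5, rng) for _ in range(2000)]

var_rna = statistics.pvariance(rna_counts)  # theoretical value: 75
var_rf = statistics.pvariance(rf_counts)    # theoretical value: 1300
```

With identical means, the protocol with the larger dispersion shows far greater replicate-to-replicate variance, which is the situation in which separate dispersion estimates matter most.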
We ran the differential test of RiboDiff on a machine with a 1.7 GHz CPU and 4 GB RAM; it took 23 min of computing time to finish (10 474 genes having both mRNA and RF counts). In summary, we propose a novel statistical model to analyze the effect of a treatment on mRNA translation and to identify genes with differential translation efficiency. A major advantage of this method is that it facilitates comparisons of RF abundance by treating mRNA abundance variability as a confounding factor. Moreover, RiboDiff is specifically tailored to produce robust dispersion estimates for different sequencing protocols measuring gene expression and ribosome occupancy that have different statistical properties. The described approach is statistically sound and identifies a set of genes similar to that found by a less developed method used in recent work (Wolfe ). The release of this tool is expected to enable proper analyses of data from many future RF profiling experiments (e.g. Su ). The described model assumes that RNA-Seq and RF samples are unpaired; extending the tool to a broader range of experimental settings is future work.

References

1.  Assessing gene-level translational control from ribosome profiling.

Authors:  Adam B Olshen; Andrew C Hsieh; Craig R Stumpf; Richard A Olshen; Davide Ruggero; Barry S Taylor
Journal:  Bioinformatics       Date:  2013-09-18       Impact factor: 6.937

2.  Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes.

Authors:  Nicholas T Ingolia; Liana F Lareau; Jonathan S Weissman
Journal:  Cell       Date:  2011-11-03       Impact factor: 41.582

3.  Detecting differential usage of exons from RNA-seq data.

Authors:  Simon Anders; Alejandro Reyes; Wolfgang Huber
Journal:  Genome Res       Date:  2012-06-21       Impact factor: 9.043

4.  A unifying model for mTORC1-mediated regulation of mRNA translation.

Authors:  Carson C Thoreen; Lynne Chantranupong; Heather R Keys; Tim Wang; Nathanael S Gray; David M Sabatini
Journal:  Nature       Date:  2012-05-02       Impact factor: 49.962

5.  Interferon-γ regulates cellular metabolism and mRNA translation to potentiate macrophage activation.

Authors:  Xiaodi Su; Yingpu Yu; Yi Zhong; Eugenia G Giannopoulou; Xiaoyu Hu; Hui Liu; Justin R Cross; Gunnar Rätsch; Charles M Rice; Lionel B Ivashkiv
Journal:  Nat Immunol       Date:  2015-06-29       Impact factor: 25.606

6.  Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.

Authors:  Michael I Love; Wolfgang Huber; Simon Anders
Journal:  Genome Biol       Date:  2014       Impact factor: 13.583

7.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

8.  The ribosome profiling strategy for monitoring translation in vivo by deep sequencing of ribosome-protected mRNA fragments.

Authors:  Nicholas T Ingolia; Gloria A Brar; Silvia Rouskin; Anna M McGeachy; Jonathan S Weissman
Journal:  Nat Protoc       Date:  2012-07-26       Impact factor: 13.491

9.  Accurate detection of differential RNA processing.

Authors:  Philipp Drewe; Oliver Stegle; Lisa Hartmann; André Kahles; Regina Bohnert; Andreas Wachter; Karsten Borgwardt; Gunnar Rätsch
Journal:  Nucleic Acids Res       Date:  2013-04-12       Impact factor: 16.971

10.  RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer.

Authors:  Andrew L Wolfe; Kamini Singh; Yi Zhong; Philipp Drewe; Vinagolu K Rajasekhar; Viraj R Sanghvi; Konstantinos J Mavrakis; Man Jiang; Justine E Roderick; Joni Van der Meulen; Jonathan H Schatz; Christina M Rodrigo; Chunying Zhao; Pieter Rondou; Elisa de Stanchina; Julie Teruya-Feldstein; Michelle A Kelliher; Frank Speleman; John A Porco; Jerry Pelletier; Gunnar Rätsch; Hans-Guido Wendel
Journal:  Nature       Date:  2014-07-27       Impact factor: 49.962
