Keegan Flanagan, Wanxin Li, Ethan J Greenblatt, Khanh Dao Duc.
Abstract
Ribosome profiling is a powerful technique that maps the distribution of ribosomes along mRNAs to analyze translation genome-wide. Ribosome density can be affected by multiple factors, such as changes in translation initiation or elongation rates. We describe the application of a metric that identifies genes whose translation is limited by initiation or by elongation, based on the relative distribution of ribosome footprints along transcripts. This protocol also details two sample analyses, run on downloadable datasets, that compare gene translation efficiencies and the distribution of ribosome densities. For complete details on the use and execution of this protocol, please refer to Flanagan et al. (2022).
Keywords: Bioinformatics; Genomics; Sequence analysis
Year: 2022 PMID: 36035799 PMCID: PMC9405084 DOI: 10.1016/j.xpro.2022.101605
Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667
Figure 1. Illustration of translation dynamics and resulting read distributions for initiation rate-limited translation
Read densities from initiation-limited transcripts are similar across the entire transcript, whereas under elongation limitation reads preferentially come from the 5′ end, upstream of a stalling site (Erdmann-Pham et al., 2020; Woolstenhulme et al., 2015). This gives the cumulative distribution of reads a characteristic kinked shape. The K-S statistic is calculated as the maximum distance between the two cumulative distributions. As such, the K-S statistic will be relatively high when comparing the read distributions of a transcript that is initiation-limited under one condition and elongation-limited under the other. Read density data were created using simulations based on the inhomogeneous l-TASEP model (Erdmann-Pham et al., 2021).
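The calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the protocol's own code: the function names and the toy read counts are hypothetical, and it assumes both conditions are given as per-position read counts over the same transcript coordinates.

```python
from itertools import accumulate

def cumulative_fraction(read_counts):
    """Turn per-position read counts into a cumulative distribution ending at 1."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("transcript has no reads")
    return [c / total for c in accumulate(read_counts)]

def ks_statistic(counts_a, counts_b):
    """Maximum absolute distance between two cumulative read distributions."""
    if len(counts_a) != len(counts_b):
        raise ValueError("both conditions must cover the same positions")
    cdf_a = cumulative_fraction(counts_a)
    cdf_b = cumulative_fraction(counts_b)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

# Toy data (hypothetical): even coverage vs. reads piled up before a stall site.
uniform = [10] * 10
stalled = [18, 18, 18, 18, 18, 2, 2, 2, 2, 2]
print(ks_statistic(uniform, stalled))
```

In this toy case the largest gap between the two cumulative curves falls at the stall site (position 5), where the stalled profile has already accumulated 90% of its reads and the uniform profile only 50%, giving a K-S statistic of about 0.4.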
Figure 2. RiboDiff outputs
(A) Scatterplot comparing the log-transformed translation efficiency ratios with the mean read counts from the ribosome profiling data. Genes with significant changes in translation efficiency are colored yellow, and non-significant genes are colored gray. Genes exhibiting high K-S statistics are named and circled in red.
(B) Histogram showing the distribution of log-transformed translation efficiency ratios for all genes. The distributions for genes with significantly increased or decreased translation efficiency are shown in red and blue, respectively.
(C) Comparison of dispersion measurements from sequencing data. Scatterplot showing similar amounts of dispersion in RNA-seq and ribosome profiling data, indicating that the use of a single dispersion measurement is sufficient.
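For orientation, the quantity plotted in panels (A) and (B) can be approximated by hand as follows. This is a naive sketch under stated assumptions, not RiboDiff's method: RiboDiff estimates translation efficiency changes with a negative binomial generalized linear model, whereas the hypothetical helper below simply divides counts, assumes they are already normalized for library size, takes the ratio as condition 2 over condition 1, and adds a pseudocount to avoid division by zero.

```python
from math import log2

def log2_te_ratio(ribo_1, rna_1, ribo_2, rna_2, pseudocount=1.0):
    """Naive log2 ratio of translation efficiency (condition 2 over condition 1).

    Translation efficiency is taken as ribosome footprint counts divided by
    RNA-seq counts for the same gene. The pseudocount is an assumption of
    this sketch, not a RiboDiff parameter.
    """
    te_1 = (ribo_1 + pseudocount) / (rna_1 + pseudocount)
    te_2 = (ribo_2 + pseudocount) / (rna_2 + pseudocount)
    return log2(te_2 / te_1)

# Toy gene (hypothetical counts): footprints double while mRNA level is unchanged.
print(log2_te_ratio(ribo_1=100, rna_1=200, ribo_2=200, rna_2=200, pseudocount=0.0))
```

A positive value indicates more footprints per transcript in condition 2. Significance calls such as those colored in panel (A) require the dispersion-aware model and cannot be read off this ratio alone.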
Figure 3. Differential analysis of ribosome profiling data from samples treated to induce elongation limitation
Treatments included depletion of the selenocysteine-carrying tRNA, which arrested elongation on selenoprotein transcripts (Fradejas-Villar et al., 2017), and the introduction of Torin 1, which inhibited translation initiation for transcripts that contain 5′TOP regions (Philippe et al., 2020). Dot plots with overlaid violin plots show the distribution of K-S statistics for genes that are targets and non-targets of each treatment. Bar plots show the enrichment of treatment targets in the low, medium, and high K-S fractions. P-values were calculated using Fisher's exact test and adjusted using the Benjamini-Hochberg method. Enrichment in the high K-S fraction and a shift in the K-S statistic distribution are present only when comparing selenoproteins with non-selenoproteins (A), and not when comparing TOP mRNAs with non-TOP mRNAs (B). This suggests that only limiting elongation rates affects the K-S statistic.
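The enrichment test named in this caption can be illustrated with a short, dependency-free Python sketch. Several things here are assumptions rather than details from the protocol: the function names, the toy counts, and the choice of a one-sided (enrichment) alternative, since the caption does not say whether the test was one- or two-sided.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the table [[a, b], [c, d]].

    a: targets inside the fraction      b: targets outside the fraction
    c: non-targets inside the fraction  d: non-targets outside the fraction
    Returns the hypergeometric probability of seeing at least `a` targets inside.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy counts (hypothetical): 20 target genes and 980 non-targets split by K-S fraction.
fractions = {
    "low": (2, 18, 300, 680),
    "medium": (3, 17, 350, 630),
    "high": (15, 5, 350, 630),
}
raw = [fisher_exact_greater(*counts) for counts in fractions.values()]
adj = dict(zip(fractions, benjamini_hochberg(raw)))
print(adj)
```

With these toy counts only the high K-S fraction shows a small adjusted p-value, which mirrors the pattern reported for selenoproteins in panel (A).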
Figure 4. Determination of the K-S statistic for elongation-limited selenoprotein Selenop
(A and B) Barplots depicting the normalized read density distribution for a gene from the condition 1 dataset (A) and the condition 2 dataset (B) (see also Figure 3A).
(C and D) Lineplots comparing the smoothed, normalized read densities (C) and the cumulative read densities (D) between condition 1 and 2. The K-S statistic is calculated as the maximum distance between the two cumulative distributions.
Figure 5. Determination of the K-S statistic for initiation-limited non-selenoprotein Aox3 (see also Figure 3A)
(A and B) Barplots depicting the normalized read density distribution for a gene from the condition 1 dataset (A) and the condition 2 dataset (B).
(C and D) Lineplots comparing the smoothed, normalized read densities (C) and the cumulative read densities (D) between condition 1 and 2. The K-S statistic is calculated as the maximum distance between the two cumulative distributions.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Mouse reference genome assembly | ( | UCSC Genome Browser ftp: |
| Mouse reference genome annotation file | ( | UCSC Genome Browser ftp: |
| Mouse non-coding RNA assembly, release 105 | ( | Ensembl ftp: |
| Mouse mRNA sequencing data | ( | SRA Run Selector: |
| Mouse ribosome profiling data | ( | SRA Run Selector: |
| List of selenocysteine containing mouse genes | ( | NA |
| STAR (2.7.9a) | ( | |
| RSEM (1.3.3) | ( | |
| Subread/featureCounts (2.0.1) | ( | |
| FASTX-Toolkit (0.0.13) | ( | |
| ncbi/sra-tools (2.8.0) | NA | |
| bowtie2 (2.4.4) | ( | |
| plastid (0.4.8) | ( | |
| R (>=3.5.0) | NA | |
| JupyterLab (3.2.4) | ( | |
| Jupyter | ( | |
| riboWaltz (1.2.0) | ( | |
| NumPy (1.21.4) | ( | |
| Pandas (1.3.4) | ( | |
| Matplotlib (3.5.0) | ( | |
| Devtools (2.4.3) | ( | |
| RiboDiff (0.2.1) | ( | |
| Python (>=3.8.1) | ( | |
| Hardware: AMD Ryzen Threadripper 2950X 16-Core Processor, 128 GB RAM, and Ubuntu version 20.04.1 | NA | NA |