Aikaterini Alexaki1, Jacob Kames1, Gaya K Hettiarachchi1, John C Athey1, Upendra K Katneni1, Ryan C Hunt1, Nobuko Hamasaki-Katagiri1, David D Holcomb1, Michael DiCuccio2, Haim Bar3, Anton A Komar4, Chava Kimchi-Sarfaty1.
Abstract
Ribosome profiling provides the opportunity to evaluate translation kinetics at codon-level resolution. Here, we describe ribosome profiling data generated from two HEK293T cell lines. The data comprise Ribo-seq (mRNA sequencing data from ribosome-protected fragments) and RNA-seq (total RNA sequencing) data. Each of the two HEK293T cell lines expresses a version of the F9 gene, and both versions are translated into proteins with identical amino acid sequences. However, the two F9 genes differ drastically in their codon usage and predicted mRNA structure. We also provide the pipeline that we used to analyze the data. Further analysis of this dataset holds great potential, as it can be used i) to unveil insights into the composition and regulation of the transcriptome, ii) for comparison with other ribosome profiling datasets, iii) to measure the rate of protein synthesis across the proteome and identify differences in elongation rates, iv) to discover previously unidentified translation of peptides, v) to explore the effects of codon usage or codon context on translational kinetics and vi) to investigate cotranslational folding. Importantly, a unique feature of this dataset, compared to other available ribosome profiling data, is the presence of the F9 gene in two very distinct coding sequences.
Keywords: RNA-seq; Ribo-seq; Ribosome profiling; codon optimization; codon pair usage; codon usage; protein therapeutics; translation kinetics
Year: 2020 PMID: 33014344 PMCID: PMC7509596 DOI: 10.12688/f1000research.22400.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Flowchart of ribosome profiling data analysis pipeline.
Colored arrows indicate steps that first require execution of a utility script (blue and yellow) or manual input by the user (red). Pipeline steps are represented as ovals (main step) or pentagons (validation / analysis step). Rectangles represent input / output data. UTR: untranslated region, CDS: coding sequence, RPF: ribosome protected fragments, RPKM: reads per kilobase of transcript per million mapped reads.
Figure 2. Ribo-Seq and RNA-Seq data distribution.
(a) Fragment size distribution of Ribo-seq and RNA-seq reads. The average of 6 experiments (3 WT and 3 CO F9) was plotted; s.e.m. are shown. (b) Distribution of Ribo-seq (left) and RNA-seq (right) reads in mRNA coding regions (CDSs) and untranslated regions (5′UTR and 3′UTR). The average of 6 experiments (3 WT and 3 CO F9) was plotted; s.e.m. are shown.
Figure 3. Triplet periodicity of Ribo-Seq data.
(a) Profiles of the 5′ end positions of all 20–22 nt (top) and 27–29 nt (bottom) fragments relative to the start codon of their genes. The average of 6 experiments (3 WT and 3 CO F9) was plotted. (b) Positions of 20–22 nt and 27–29 nt fragments relative to the reading frame of the Ribo-seq (left) and RNA-seq (right) reads. The average of 6 experiments (3 WT and 3 CO F9) was plotted; s.e.m. are shown.
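Triplet periodicity of the kind summarized in this figure can be illustrated by assigning each read to a reading frame from the position of its 5′ end. The following is a minimal sketch, not the authors' pipeline: the function name `frame_fractions` and the example offsets are hypothetical, and it assumes positions are counted from the first nucleotide of the start codon (position 0).

```python
from collections import Counter

def frame_fractions(five_prime_offsets):
    """Return the fraction of reads whose 5' end falls in frame 0, 1 and 2.

    Offsets are positions of read 5' ends relative to the first nucleotide
    of the start codon (assumed to be position 0); they may be negative.
    """
    if not five_prime_offsets:
        raise ValueError("no reads supplied")
    # Python's % always returns a non-negative result for a positive modulus,
    # so negative offsets (reads starting in the 5'UTR) are framed correctly.
    counts = Counter(offset % 3 for offset in five_prime_offsets)
    total = sum(counts.values())
    return [counts[frame] / total for frame in range(3)]

# Hypothetical offsets, for illustration only (not taken from the dataset).
offsets = [-12, -12, -9, -12, 0, 3, 6, 4, -12, 9]
print(frame_fractions(offsets))
```

A strong enrichment of one frame is what would be expected for ribosome protected fragments, whereas randomly fragmented RNA-seq reads should be spread roughly evenly across the three frames.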
Pearson and Spearman correlations between pairs of experiments.
RPKM values for each gene in the Ribo-seq and RNA-seq datasets were calculated, considering only reads with the ribosome A site annotated at least 20 nt downstream of the start codon. Each pair of experiments within the 3 replicates was then compared.
| Experiment | Pearson, Ribo-Seq | Pearson, RNA-Seq | Spearman, Ribo-Seq | Spearman, RNA-Seq |
|---|---|---|---|---|
| WT1-WT2 | 0.9973 | 0.9958 | 0.9426 | 0.9775 |
| WT2-WT3 | 0.9972 | 0.9976 | 0.9513 | 0.9785 |
| WT1-WT3 | 0.9962 | 0.9917 | 0.9384 | 0.9774 |
| CO1-CO2 | 0.9908 | 0.9979 | 0.9314 | 0.9755 |
| CO2-CO3 | 0.9927 | 0.998 | 0.9282 | 0.9771 |
| CO1-CO3 | 0.994 | 0.9979 | 0.9428 | 0.9771 |
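The replicate comparison in this table can be sketched as follows. This is an illustrative stdlib-only example, not the authors' code: the function names and the toy numbers are hypothetical, and the A-site filter assumes offsets are counted from the first nucleotide of the start codon, which the text does not specify.

```python
import math

def count_reads(a_site_offsets, min_offset=20):
    """Count reads whose A site lies at least min_offset nt past the start codon."""
    return sum(1 for offset in a_site_offsets if offset >= min_offset)

def rpkm(read_count, cds_length_nt, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (cds_length_nt * total_mapped_reads)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy per-gene RPKM values for two replicates (hypothetical numbers).
replicate_1 = [50.0, 12.5, 3.0, 220.0]
replicate_2 = [48.0, 14.0, 2.5, 250.0]
print(pearson(replicate_1, replicate_2), spearman(replicate_1, replicate_2))
```

Pearson is sensitive to the few highly expressed genes, while Spearman depends only on rank order. This may help explain why the two measures differ in the table.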
Quality data of sequencing files.
Sample ID, index, yield, number of clusters, percent Q30 and above and mean Q score for all sequencing experiments.
| Sample | Index | Yield | # Clusters | % ≥ Q30 | Mean Q score |
|---|---|---|---|---|---|
| 1R | CAGATC | 1,669 | 13,355,848 | 66.82 | 25.8 |
| 1T | ATCACG | 1,821 | 14,566,314 | 79.59 | 30.33 |
| 2R | ACTTGA | 1,681 | 13,451,867 | 71.56 | 27.47 |
| 2T | CGATGT | 1,867 | 14,932,652 | 78.55 | 29.96 |
| 3R | GATCAG | 1,512 | 12,092,292 | 71.18 | 27.36 |
| 3T | TTAGGC | 1,653 | 13,227,113 | 79.74 | 30.37 |
| 4R | TAGCTT | 1,825 | 14,600,572 | 68.43 | 26.38 |
| 4T | TGACCA | 1,731 | 13,848,340 | 79.75 | 30.38 |
| 5R | GGCTAC | 1,537 | 12,292,279 | 67.63 | 26.08 |
| 5T | ACAGTG | 1,754 | 14,033,677 | 80.22 | 30.55 |
| 6R | CTTGTA | 1,818 | 14,541,142 | 68.64 | 26.44 |
| 6T | GCCAAT | 1,662 | 13,296,276 | 78.6 | 29.97 |
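To help read the quality columns, a Phred quality score Q corresponds to a base-call error probability of 10^(−Q/10), so Q30 means an expected error rate of 1 in 1,000. The short sketch below shows the conversion; the function name is hypothetical. Note that applying it to a mean Q score gives only a rough indication, because the mean of the per-base error probabilities is not the error probability of the mean score.

```python
def phred_to_error_probability(q_score):
    """Convert a Phred quality score to the base-call error probability."""
    return 10 ** (-q_score / 10)

# Q30 is the threshold used in the "% >= Q30" column.
print(phred_to_error_probability(30))
# Mean Q scores reported for samples 1R and 1T, respectively.
print(phred_to_error_probability(25.8), phred_to_error_probability(30.33))
```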
Description of data deposited in SRA.
Filenames, SRA experiment accession, SRA sample accession and brief description of the 12 Ribo-seq and RNA-seq FASTQ files. All data files are accessible from SRA BioProject accession PRJNA591214. Data files represent three replicates of each condition (WT F9 Ribo-seq, WT F9 RNA-seq, CO F9 Ribo-seq and CO F9 RNA-seq).
| Filename | SRA Experiment | SRA Sample | Description |
|---|---|---|---|
| 1R_CAGATC_L002_R1_001.fastq.gz | SRX7201733 | SAMN13354200 | WT F9 RIBO-SEQ 1 |
| 1T_ATCACG_L002_R1_001.fastq.gz | SRX7201734 | SAMN13354201 | WT F9 mRNA-SEQ 1 |
| 2R_ACTTGA_L002_R1_001.fastq.gz | SRX7201737 | SAMN13354202 | WT F9 RIBO-SEQ 2 |
| 2T_CGATGT_L002_R1_001.fastq.gz | SRX7201738 | SAMN13354203 | WT F9 mRNA-SEQ 2 |
| 3R_GATCAG_L002_R1_001.fastq.gz | SRX7201739 | SAMN13354204 | WT F9 RIBO-SEQ 3 |
| 3T_TTAGGC_L002_R1_001.fastq.gz | SRX7201740 | SAMN13354205 | WT F9 mRNA-SEQ 3 |
| 4R_TAGCTT_L002_R1_001.fastq.gz | SRX7201741 | SAMN13354206 | CO F9 RIBO-SEQ 1 |
| 4T_TGACCA_L002_R1_001.fastq.gz | SRX7201742 | SAMN13354207 | CO F9 mRNA-SEQ 1 |
| 5R_GGCTAC_L002_R1_001.fastq.gz | SRX7201743 | SAMN13354208 | CO F9 RIBO-SEQ 2 |
| 5T_ACAGTG_L002_R1_001.fastq.gz | SRX7201744 | SAMN13354209 | CO F9 mRNA-SEQ 2 |
| 6R_CTTGTA_L002_R1_001.fastq.gz | SRX7201735 | SAMN13354210 | CO F9 RIBO-SEQ 3 |
| 6T_GCCAAT_L002_R1_001.fastq.gz | SRX7201736 | SAMN13354211 | CO F9 mRNA-SEQ 3 |
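The filenames in this table appear to encode the sample number, library type and index sequence. The sketch below parses them under assumptions inferred from the table rather than stated by the authors: `R` marks a Ribo-seq library and `T` a total RNA-seq library, and samples 1–3 are WT F9 while samples 4–6 are CO F9. The `Library` type and `parse_filename` function are hypothetical names.

```python
import re
from typing import NamedTuple

class Library(NamedTuple):
    sample: str      # e.g. "4T"
    index: str       # 6-nt index sequence
    assay: str       # "Ribo-seq" or "RNA-seq"
    construct: str   # "WT F9" or "CO F9"
    replicate: int   # 1, 2 or 3

# Layout assumed from the table: <n><R|T>_<index>_L<lane>_R<read>_<chunk>.fastq.gz
_PATTERN = re.compile(r"^(\d)([RT])_([ACGT]{6})_L\d{3}_R\d_\d{3}\.fastq\.gz$")

def parse_filename(name):
    """Split a FASTQ filename from the table into its annotated parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename layout: {name}")
    number, kind, index = int(match.group(1)), match.group(2), match.group(3)
    assay = "Ribo-seq" if kind == "R" else "RNA-seq"
    # Assumption from the table: samples 1-3 are WT F9, samples 4-6 are CO F9.
    construct = "WT F9" if number <= 3 else "CO F9"
    replicate = (number - 1) % 3 + 1
    return Library(f"{number}{kind}", index, assay, construct, replicate)

print(parse_filename("4T_TGACCA_L002_R1_001.fastq.gz"))
```

Grouping the parsed records by `construct` and `assay` gives the four conditions, each with three replicates, described in the table caption.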