Stian Ellefsen, Rafi Ahmad, Yusuf Khan, Daniel Hammarström.
Abstract
BACKGROUND: The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is due both to its role in resolving and accounting for technical variation and errors, and to its defining role in shaping the viewpoint of biological interpretations. Still, the choice of normalization method is often not explicitly motivated, although this choice may be particularly decisive for conclusions in studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants conducted 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before training, after two weeks of training (week 2), and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as (1) per-library-size, (2) per-total-RNA, and (3) per-sample-size (per-mg-tissue).

RESULTS: Initially, the three modes of transcript modeling led to the identification of three unique sets of stable genes, which displayed differential expression profiles. Specifically, genes showing stable expression across samples in the per-library-size dataset displayed training-associated increases in the per-total-RNA and per-sample-size datasets. These gene sets were then used for normalization of the entire dataset, providing transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The different normalization modes led to different conclusions, measured as training-associated changes in transcript expression. Briefly, for 27% and 20% of the transcripts, training was associated with changes in expression in the per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 4% of the transcripts between the per-library-size and per-sample-size datasets (↑ vs. ↓, respectively).
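The three viewpoints named in the abstract can be illustrated by expressing the same raw counts per unit of three different denominators. The following is a minimal sketch only: the helper `per_unit`, the toy counts, and the total RNA and tissue mass values are hypothetical, and the simple division shown here is an assumption, since the abstract does not specify the exact transformations used in the study.

```python
def per_unit(counts, denominator):
    """Express each transcript count per unit of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return {tx: n / denominator for tx, n in counts.items()}


# Hypothetical toy sample (not study data): raw counts for three transcripts.
counts = {"tx_a": 200, "tx_b": 300, "tx_c": 500}

library_size = sum(counts.values())  # total counts in this library
total_rna_ug = 2.0                   # assumed total RNA amount (micrograms)
tissue_mg = 4.0                      # assumed biopsy mass (milligrams)

per_library = per_unit(counts, library_size)   # per-library-size view
per_total_rna = per_unit(counts, total_rna_ug)  # per-total-RNA view
per_tissue = per_unit(counts, tissue_mg)        # per-sample-size view
```

Because the denominators can change independently with training, a transcript that looks stable in one view need not look stable in another, which is the point the abstract makes about the three sets of stable genes.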
Keywords: Normalization; RNA-seq; Resistance training; Skeletal muscle
Year: 2022 PMID: 35717158 PMCID: PMC9206305 DOI: 10.1186/s12859-022-04791-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 A Overview of the data transformations and analyses used. Raw counts were transformed to represent normalized data per-library-size, per-total-RNA, and per-sample-size (tissue mass). Transformed counts were used to identify stable reference genes free from systematic effects, which were subsequently ranked by intraclass correlation. Normalization factors comprising 10 transcripts from each normalization approach were used in the differential expression analysis. B Fold-changes of sample reference ratios (each reference being the average of the top ten stable transcripts per normalization mode), with numerators plotted over columns and denominators over rows. Error bars represent 95% CI. C Transcripts identified as differentially up- and down-regulated over time (differences from Week 0 to Week 2 and Week 12, respectively) from generalized linear models with each normalization factor used as a model offset. Percentages represent proportions of all transcripts identified as differentially expressed regardless of normalization approach. Up- and down-regulation was determined from false discovery rate-adjusted p values (p < 0.05). Black points represent intersections, e.g., where the same transcript has been identified in one or more normalization perspectives.
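The caption describes a per-sample normalization factor built from reference transcripts and entered as a model offset. In a count model with a log link, an offset of log(factor) means that log(E[count]) = log(factor) + Xβ, so the coefficients describe expression relative to the reference. The sketch below illustrates that idea with toy numbers. The function names and data are hypothetical, and the use of an arithmetic mean is an assumption, since the caption only says "average".

```python
import math


def normalization_factor(sample_counts, reference_ids):
    """Average count of the reference transcripts in one sample."""
    values = [sample_counts[tx] for tx in reference_ids]
    return sum(values) / len(values)


def log2_fold_change(target, before, after, reference_ids):
    """log2 change of a target transcript relative to the sample reference."""
    rel_before = before[target] / normalization_factor(before, reference_ids)
    rel_after = after[target] / normalization_factor(after, reference_ids)
    return math.log2(rel_after / rel_before)


# Hypothetical toy data (not study data): two reference transcripts, one target.
refs = ["ref_1", "ref_2"]
week0 = {"ref_1": 100, "ref_2": 300, "target": 50}
week2 = {"ref_1": 200, "ref_2": 600, "target": 200}

# The value that would enter a log-link count model as the offset for week 0.
offset_week0 = math.log(normalization_factor(week0, refs))

# Raw counts rise 4-fold, but only 2-fold relative to the reference.
lfc = log2_fold_change("target", week0, week2, refs)
```

Swapping in a different reference set changes the factor and therefore the estimated fold change, which is how the choice of normalization mode can alter or even reverse a conclusion.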
Genes selected as stable reference genes from each normalization scenario
| Normalization strategy | Transcript ID | Gene symbol | Gene biotype | Intraclass correlation |
|---|---|---|---|---|
| Per-library-size | ENST00000643905 | | | 0.915 |
| | ENST00000439211 | DHFR | Protein coding | 0.877 |
| | ENST00000582787 | SP2-DT | lncRNA | 0.873 |
| | ENST00000342992 | TTN | Protein coding | 0.866 |
| | ENST00000361681 | MT-ND6 | Protein coding | 0.864 |
| | ENST00000371470 | MAGOH | Protein coding | 0.846 |
| | ENST00000234256 | SLC1A4 | Protein coding | 0.842 |
| | ENST00000341162 | FCF1 | Protein coding | 0.841 |
| | ENST00000480046 | METTL2B | Protein coding | 0.839 |
| | ENST00000295955 | RPL9 | Protein coding | 0.828 |
| Per-total-RNA | ENST00000445125 | | Processed pseudogene | 0.715 |
| | ENST00000312184 | TMEM70 | Protein coding | 0.579 |
| | ENST00000552002 | CHURC1 | Protein coding | 0.559 |
| | ENST00000357033 | DMD | Protein coding | 0.559 |
| | ENST00000275300 | SLC22A3 | Protein coding | 0.555 |
| | ENST00000496823 | BCL6 | Protein coding | 0.548 |
| | ENST00000546248 | TRDN | Protein coding | 0.522 |
| | ENST00000309881 | CD36 | Protein coding | 0.505 |
| | ENST00000005178 | PDK4 | Protein coding | 0.496 |
| | ENST00000522603 | ASPH | Protein coding | 0.492 |
| Per-sample-size | ENST00000496823 | BCL6 | Protein coding | 0.536 |
| | ENST00000546248 | TRDN | Protein coding | 0.497 |
| | ENST00000216019 | DDX17 | Protein coding | 0.461 |
| | ENST00000005178 | PDK4 | Protein coding | 0.458 |
| | ENST00000361915 | AGL | Protein coding | 0.421 |
| | ENST00000418381 | | | 0.416 |
| | ENST00000294724 | AGL | Protein coding | 0.405 |
| | ENST00000366645 | EXOC8 | Protein coding | 0.391 |
| | ENST00000261349 | LRP6 | Protein coding | 0.384 |
| | ENST00000306270 | IBTK | Protein coding | 0.328 |
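The intraclass correlation used for ranking in the table can be illustrated with a one-way ANOVA estimator. This is a swapped-in textbook formula for illustration, not necessarily the estimator used in the study, which may have derived the ICC from a mixed model. The sketch assumes that participants are the grouping factor and that each participant has the same number of repeated measurements. The function name `icc_oneway` and the toy values are hypothetical.

```python
def icc_oneway(groups):
    """One-way ANOVA intraclass correlation, ICC(1).

    `groups` holds one list of repeated measurements per participant;
    all lists must have the same length.
    """
    n = len(groups)
    if n < 2:
        raise ValueError("need at least two participants")
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need equal-length groups with at least two values")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]

    # Between-participant and within-participant mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (n * (k - 1))

    denominator = msb + (k - 1) * msw
    if denominator == 0:
        raise ValueError("no variance in the data")
    return (msb - msw) / denominator


# Toy values (not study data): each inner list is one participant over time.
stable_within = icc_oneway([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
variable_within = icc_oneway([[1, 2, 3], [1, 2, 3]])
```

Under this reading, a value near 1 means that almost all variation lies between participants rather than across time points within a participant, which is the property one would want in a reference transcript.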