Literature DB >> 35538164

Identification of functional features underlying heat stress response in Sprague-Dawley rats using mixed linear models.

Krzysztof Kotlarz1, Magda Mielczarek1,2, Yachun Wang3, Jinhuan Dou4, Tomasz Suchocki1,2, Joanna Szyda5,6.   

Abstract

Since global temperature is expected to rise by 2 °C in 2050 heat stress may become the most severe environmental factor. In the study, we illustrate the application of mixed linear models for the analysis of whole transcriptome expression in livers and adrenal tissues of Sprague-Dawley rats obtained by a heat stress experiment. By applying those models, we considered four sources of variation in transcript expression, comprising transcripts (1), genes (2), Gene Ontology terms (3), and Reactome pathways (4) and focussed on accounting for the similarity within each source, which was expressed as a covariance matrix. Models based on transcripts or genes levels explained a larger proportion of log2 fold change than models fitting the functional components of Gene Ontology terms or Reactome pathways. In the liver, among the most significant genes were PNKD and TRIP12. In the adrenal tissue, one transcript of the SUCO gene was expressed more strongly in the control group than in the heat-stress group. PLEC had two transcripts, which were significantly overexpressed in the heat-stress group. PER3 was significant only on gene level. Moving to the functional scale, five Gene Ontologies and one Reactome pathway were significant in the liver. They can be grouped into ontologies related to DNA repair, histone ubiquitination, the regulation of embryonic development and cytoplasmic translation. Linear mixed models are valuable tools for the analysis of high-throughput biological data. Their main advantages are the possibility to incorporate information on covariance between observations and circumventing the problem of multiple testing.
© 2022. The Author(s).

Entities:  

Mesh:

Year:  2022        PMID: 35538164      PMCID: PMC9090733          DOI: 10.1038/s41598-022-11701-y

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


Introduction

The most general formulation of heat stress is defined as an increase in ambient temperature above the threshold beyond which body temperature cannot be maintained on the physiologically optimal level[1,2]. Heat stress is among the best characterized environmental stressors and has the most severe detrimental effects. Individuals cope with heat stress by increasing body temperature, reducing feed intake as well as changing of physiological state, which is referred to as physiological adaptive responses[3]. Therefore, heat stress can lead to economic losses due to lower productivity and reproductivity, as well as increased health burden[4]. Unfortunately, global temperature is expected to rise by 2 °C in 2050[5] and furthermore, the average temperature in 2100 may rise by up to 4.8 °C[6]. As a consequence, the maximum daily temperature will be higher, so the intensity and duration of heat stress will be more severe and prolonged, compared to 2021. Therefore, heat stress may become the most severe environmental factor. Studies have reported that heat stress significantly altered physiological, biochemical, metabolic, and cellular responses in mice and livestock models[3,7,8]. However, animals have developed protective measures against the challenges of heat stress manifested by altered transcript and gene expression levels[9-11]. With the development of next generation sequencing technologies, including RNA-sequencing (RNA-seq), hundreds or thousands of genes involved in heat stress response have been identified[12,13]. Consequently, the main goal of the research was to illustrate how linear mixed models can be used to perform a biologically driven analysis of changes in transcript and gene expression levels between the control and the heat-stress groups. In contrast to the vast majority of approaches, when particular transcripts are considered as independent, we incorporated biological information on the correlation between them directly into the statistical model. Furthermore, we also considered higher-order units, expressed by Gene Ontology terms and Reactome pathways, since these are the actual functional components of the physiological response towards heat stress. The incorporation of functional units into the analysis of RNA-seq data is a known idea. It has been applied either through enrichment analysis as e.g. by Dou et al.[14] to the data set analysed in our study or through the Ingenuity Pathway Analysis as e.g. by Lan et al.[15] to gene expression under heat stress in poultry. Nevertheless, the major difference between our approach and the aforementioned studies is that the former conducts a two-step analysis, by first identifying significantly differentially expressed transcripts and genes by considering them independent (step 1) and then further analysing the selected, significant transcripts and genes in terms of the correlation between them or the enrichment of ontologies or pathways in this significant gene set (step 2). The approach proposed in our study attempted to combine the process within one-step by including the correlation of the functional information directly into the model fitted to the whole (i.e. not only to significant) expression data and is statistically similar to the linear mixed model fitted to gene expression data by Wang et al.[16], albeit with a different approach to the modelling of the covariance of random effects. In the current study, we illustrate the application of the mixed models using the whole genome transcript expression data of Sprague–Dawley rats obtained from a heat stress experiment[14].

Methods

Experimental animals

The data set underlying the analysis is a subset of the material used by Dou et al.[14]. In brief, the analysed individuals comprised eight weeks old female, specific-pathogen-free, Sprague–Dawley rats. Prior to the experiment, all rats were housed in a laboratory at 22 ± 1 °C, 50% relative humidity with 12 h reverse light/dark cycle with feed and water provided ad libitum. After one week, five rats from the heat-stressed group were exposed to 42 °C for 120 min, while five rats from the control group were housed at an initial temperature of 22 °C. After the completion of the experiment, the animals were euthanised and samples of liver and adrenal gland tissues were used as a source of total RNA.

Bioinformatic analysis

Illumina HiSeq2000 was used to sequence 150 long reads in the paired-end (PE) data mode. The total number of read pairs per sample was from 51,706,978 to 97,059,004. The detailed description of the RNA isolation and sequencing was also provided by Dou et al.[14]. The bioinformatic pipeline consisted of the following steps: quality control of raw reads, editing of raw reads based on their quality, and quantification of transcripts’ expression. In particular, the quality of raw reads was assessed by applying the FastQC software[17]. Then, reads were processed by the Trimmomatic software[18], which removed adapter sequences, trimmed reads with an average sequencing quality of 4 consecutive reads below 20 (SLIDINGWINDOW:4:20), and removed reads shorter than 60 bp (MINLEN:60). Next, the Salmon software[19] was used to quantify the abundances of transcripts. This software implements the pseudoalignment process—an approach allowing for rapid identification of the compatibility of reads with transcripts, without the need of a computationally intensive whole genome alignment. In the last step, log2 fold changes in transcript expression levels between the control and heat stressed groups were calculated using DESeq2[20].

Statistical modelling of expression data

The log2 fold changes (log2FC) calculated based on the transcript expression levels pooled over the control and heat-stressed animals respectively, were analysed in four mixed linear models. The transcript-based model (M1) is given by:where is the vector of log2FC of transcript expression, represents the general mean, is the random transcript effect with a predisposed normal distribution defined by , is a vector of residuals distributed as , is an incidence matrix for . In this model, the similarity between transcripts i and j, was introduced into the model by incorporating a nondiagonal transcript covariance matrix . The covariance between transcripts was expressed by the Jaccard similarity coefficient: which in the case of transcripts was calculated based on the similarity in their exon composition:where a represents the number of exons common between transcripts i and j, while N represents the total number of exons of a given gene. Independence was assumed between genes, so the resulting matrix had a block diagonal structure. Transcript information was obtained from the Ensembl database, release100[21] and Jaccard coefficients were calculated using the ADE-4 package[22]. The gene-based model (M2) was applied to the same dependent variable () as M1 and is given by:where is the random gene effect with a preimposed normal distribution defined by , is a vector of random residuals distributed as , is an incidence matrix for . No covariance between genes was assumed so was diagonal. Furthermore, the Gene Ontology-based model (M3):where is the random effect of Gene Ontology (GO; geneontology.org) terms assigned to transcripts whose log2FC of transcript expressions are contained in . It was assumed that follows the normal distribution , is a vector of residuals distributed as , and is an incidence matrix for the terms. describes the covariance between GO terms expressed, as above, by the Jaccard coefficients quantifying genes overlapping between two given GO terms. Each transcript was assigned GO term(s) from the biological process ontology, considering the ontologies from the 2nd hierarchy level. Finally, a model incorporating the effects of Reactome pathways (M4) was fitted to transcript log2FC:in this model, represents the random effect of Reactome pathways (reactome.org) corresponding to transcripts. The normal distribution pre-imposed on is given by , is a vector of residuals distributed as , and is an incidence matrix assigning Reactome pathways to transcripts. The covariance matrix between Reactome pathways () was also expressed by the Jaccard coefficients quantifying genes overlapping between the two given pathways.

Estimation of variance components and significance testing

The expectation–maximization algorithm[23] was applied for the estimation of variance components underlying the four above models (i.e. and ) and the mixed model equations[24] were used to obtain solutions of and :with representing or depending on the model, and being a model indicator. Each element of the solution vectors and was transformed to the standard normal distribution and tested for significance by assessing the probability of obtaining a more extreme value based on the standard normal density function.

Enhancement of the computational efficiency of the estimation

In order to maximise the computational performance of the estimation of model parameters and its variance components, a custom-written Python program implementing the Numba library[25] was used. Numba compiles a subset of native Python and NumPy code into the machine code. Since all calculations were carried out on a multicore server, the Numba library was also used to parallelize the code, what further improved the computing time compared to a native Python application.

Ethics approval

Not applicable, only in silico data was processed in this study.

Results

Obviously models M1 and M2, fitting thousands of components, i.e. transcripts (M1) or genes (M2) explain a much larger proportion of the observed variability of log2FC than the M3 and M4 fitting the functional components, i.e. GO terms (M3) or Reactome pathways (M4). Still, by taking into account the correlation between transcripts expressed by their exon composition, we explain 12.43% and 18.32% of the total variability of log2FC observed in liver and adrenal tissues, respectively. A similar picture arises when the assumed source of variability of log2FC is considered on a gene level, resulting in 10.12% and 16.45% variance components in liver and adrenal tissues respectively—somewhat lower than for correlated transcripts in M1. By further shrinking the functional units to GO terms and Reactome pathway units we explain “only” 1.48% (GO) and 1.45% (Reactome) of the total variance for liver, as well as 1.44% (GO) and 1.93% (Reactome) for the adrenal tissue (Table 1).
Table 1

Variance components estimated by the mixed models (M1–M4) expressed as the percentage of the total variance of .

Tissue\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{{\sigma }_{t}^{2}}{{\sigma }_{y}^{2}}$$\end{document}σt2σy2 (M1) (%)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{{\sigma }_{g}^{2}}{{\sigma }_{y}^{2}}$$\end{document}σg2σy2 (M2) (%)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{{\sigma }_{go}^{2}}{{\sigma }_{y}^{2}}$$\end{document}σgo2σy2 (M3) (%)\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\frac{{\sigma }_{r}^{2}}{{\sigma }_{y}^{2}}$$\end{document}σr2σy2 (M4) (%)
Liver12.4310.121.481.45
Adrenal18.3216.451.441.93

the transcript variance, the gene variance, the GO term variance, the Reactome pathway variance, the variance of log2FC.

Variance components estimated by the mixed models (M1–M4) expressed as the percentage of the total variance of . the transcript variance, the gene variance, the GO term variance, the Reactome pathway variance, the variance of log2FC. Considering the transcript and gene levels, in the liver, M1 and M2 point at the significant effects of the PNKD and TRIP12 genes. PNKD exhibits a significant effect of a transcript ENSRNOT00000046229, which is approximately 2[23] times higher expressed in the control than in the heat-stress group. However, three of the five transcripts of the gene show higher expression in the heat-stress group, reaching 2[5] higher expression of ENSRNOT00000089580. TRIP12 synthesises one particular transcript (ENSRNOT00000022822) which is 2[22] times higher expressed in the control than in the heat-stress group. Different genes were significantly differentially expressed in the adrenal tissue. One transcript of the SUCO gene (ENSRNOG00000026542) is 2[23] times higher expressed in the control than in the heat-stress group, while the three other transcripts synthesised from this gene have an opposite effect, i.e. show higher expression in heat-stressed individuals. PLEC has two transcripts which are significantly overexpressed in the heat-stress group. PER3 was significant only on the gene level (M2) with all of its transcripts being under expressed during heat stress (Table 2).
Table 2

Top 5 significant transcripts from M1, top 2 significant genes from M2, significant Gene Ontology terms from M3, and significant Reactome pathways from M4.

TissueModelID nameEffectP
LiverTranscript (M1)

ENSRNOT00000074131

PNKD

2.764.7 × 10–10
Transcript (M1)

ENSRNOT00000093245

TRIP12

2.151.2 × 10–6
Transcript (M1)

ENSRNOT00000093735

TRIP12

1.670.00016
Transcript (M1)

ENSRNOT00000022822

TRIP12

1.650.00021
Transcript (M1)

ENSRNOT00000079452

TRIP12

1.630.00023
Gene (M2)

ENSRNOG00000016963

TRIP12

1.663.5 × 10–5
Gene (M2)

ENSRNOG00000014806

PNKD

1.120.00520
Gene ontology (M3)

GO:1901315

Negative regulation of histone H2A K63-linked ubiquitination

0.520.00055
Gene ontology (M3)

GO:2000780

Negative regulation of double-strand break repair

0.480.00130
Gene ontology (M3)

GO:2000779

Regulation of double-strand break repair

0.370.01300
Gene ontology (M3)

GO:0045995

Regulation of embryonic development

0.300.04200
Gene ontology (M3)

GO:0002181

Cytoplasmic translation

0.290.05200
Reactome pathway (M4)

R-RNO-6791226

Major pathway of rRNA processing in the nucleolus and cytosol

0.290.05500
AdrenalTranscript (M1)

ENSRNOT00000075998

SUCO

3.489.2 × 10–11
Transcript (M1)

ENSRNOT00000084058

SUCO

2.184.9 × 10–5
Transcript (M1)

ENSRNOT00000082271

PLEC

2.108.9 × 10–10
Transcript (M1)

ENSRNOT00000075936

SUCO

1.950.00029
Transcript (M1)

ENSRNOT00000088945

PLEC

1.700.00160
Gene (M2)

ENSRNOG00000026542

SUCO

2.268.4 × 10–6
Gene (M2)

ENSRNOG00000018413

PER3

1.630.00130
Reactome pathway (M4)

R-RNO-212436

Generic Transcription Pathway

0.490.00500

ID the Ensembl transcript ID, Ensembl gene ID Gene Ontology database ID, or Reactome ID, depending on the effect considered, Effect represents the estimate from the corresponding mixed model M1–M4, P P value corresponding to the normal probability density function with mean zero and the variance estimated by the corresponding model.

Top 5 significant transcripts from M1, top 2 significant genes from M2, significant Gene Ontology terms from M3, and significant Reactome pathways from M4. ENSRNOT00000074131 PNKD ENSRNOT00000093245 TRIP12 ENSRNOT00000093735 TRIP12 ENSRNOT00000022822 TRIP12 ENSRNOT00000079452 TRIP12 ENSRNOG00000016963 TRIP12 ENSRNOG00000014806 PNKD GO:1901315 Negative regulation of histone H2A K63-linked ubiquitination GO:2000780 Negative regulation of double-strand break repair GO:2000779 Regulation of double-strand break repair GO:0045995 Regulation of embryonic development GO:0002181 Cytoplasmic translation R-RNO-6791226 Major pathway of rRNA processing in the nucleolus and cytosol ENSRNOT00000075998 SUCO ENSRNOT00000084058 SUCO ENSRNOT00000082271 PLEC ENSRNOT00000075936 SUCO ENSRNOT00000088945 PLEC ENSRNOG00000026542 SUCO ENSRNOG00000018413 PER3 R-RNO-212436 Generic Transcription Pathway ID the Ensembl transcript ID, Ensembl gene ID Gene Ontology database ID, or Reactome ID, depending on the effect considered, Effect represents the estimate from the corresponding mixed model M1–M4, P P value corresponding to the normal probability density function with mean zero and the variance estimated by the corresponding model. While moving to a functional level, we did not identify any significant GO terms differentially expressed in the adrenal tissue and a single Generic Transcription Reactome Pathway (R-RNO-212436). However, in the liver, five ontologies and one Reactome pathway were significant. Functionally, they can be grouped into ontologies related to DNA repair through the regulation of double-strand break repair (GO:2000780 and GO:2000779) and histone ubiquitination (GO:1901315), to the regulation of embryonic development (GO:0045995) and to cytoplasmic translation (GO:0002181, R-RNO-6791226).

Discussion

Various scopes of information considered in our study explain different amounts of the observed variation of transcript differential expressions across the genome (Table 1). The difference in the estimated variance components related to transcripts, genes, GO terms and Reactome pathways associates with the number of levels of those independent variables. Since we are not aware of the corresponding estimates reported by other studies, a discussion with the literature is not possible. Therefore, at this stage, we can only hypothesise that on the one hand, the differences in variance components can be due to the technical nature of the model, since many more transcript effects were estimated as Reactome pathway effects. On the other hand, those effects were modelled as random and thus their variation was constrained by the predefined co-variance structure so that the differences in variance components express the biological phenomenon—Reactome pathways aggregate effects of many transcript/genes. By considering the significant transcripts, genes and pathways identified in our study, a consistent picture emerges that DNA stability and its repair mechanisms are affected by heat stress. This phenomenon has already been reported by other authors, e.g.[26] and was reviewed by Kantidze et al.[27]. On a gene level, TRIP12 was reported as encoding a protein which is associated with chromatin and plays a role in the maintenance of genome integrity[28] and PER3 influences DNA damage repair by correlating with checkpoint kinase 2 gene[29]. Mutations in the SUCO gene induce hypoglycaemia [www.ensembl.org] in humans, which is the cause of hypothermia in diabetic patients[30]. PER3 was differentially expressed in broiler chicken maintained under control and heat stress conditions[31]. In general, it has been proven that PER3 is associated with behavioural differences towards stress and was influenced by stress and ethanol treatment in BXD strain mice[32]. On the functional level, we observed a significantly under expression of genes with ontologies related to double-strand break repair (GO:2000780 and GO:2000779) and histone ubiquitination. Furthermore, outside of the nucleus, a significant effect of the cytoplasmic translation ontology (GO:0002181) and rRNA processing in the nucleolus and cytosol pathway (R-RNO-6791226) can be linked to the phenomenon of the aggregation of proteins in the cytoplasm of yeast cells subjected to heat stress, which consequence was impaired cytoplasmic translation[33]. Noteworthy, our list of significant transcripts, genes, GO terms, and Reactome pathways did not overlap with the results reported for the same material by Dou et al.[12]. Only the PER3 gene was significantly associated with differential expression in adrenal tissue in both studies. The most emerging difference concerned the number of significant effects reported. While Dou et al.[12] estimated 3909 and 4953 significantly differentially expressed genes for liver and adrenal tissues respectively, our study pointed at only two significant genes for each tissue. Similarly, the number of significant GO terms was 193 for liver and 79 for adrenal tissue in Dou et al.[12], while only five and zero in our results. The observed differences are caused by the following factors: different raw reads editing criteria, different approaches to estimate transcript expression, and different statistical modelling of expression data. Dou et al.[12] used the Cufflinks software[34] for bioinformatic processing of the expression data, which implements Tophat2[35] for the alignment of reads to the reference genome and a single gene hypothesis test with multiple testing correction of P values via the FDR for the assessment of differential expression. In our analysis, an alignment-free approach and a random effect model incorporating all transcripts/genes simultaneously were applied. Already [36] pointed at differences in the statistical inference based on fixed and random effect models, indicating that the former tend to underestimate, and the latter—overestimate a residual variance. As a consequence, a type I-error is often elevated in fixed-effect models, which was analytically demonstrated by [37] in the context of meta-analysis. Moreover, significant differences in the quantification of expression between alignment-based and alignment-free approaches were recently demonstrated in our unpublished analysis of Sus scrofa RNA-seq data by Hoffman et al.

Conclusions

Mixed models, i.e. statistical models fitting random effects, are a valuable tool for the analysis of high-throughput biological data. Their major advantages comprise: (1) the possibility to incorporate information on covariance between observations, which is often neglected while applying simple, fixed effect models, and (2) circumventing the problem of multiple testing, by simultaneous fitting all effects. We see the major limitation of the proposed approach in the varying quality of functional annotation of genomes available for different species. While genomes of humans and experimental species, such as rats, are very well functionally annotated with the most of transcripts/genes assigned to GO terms and metabolic pathways, less well-studied species have less complete annotation, which would enforce the incorporation of phantom GO and phantom pathway effects (similarly to phantom parent groups in livestock genetic evaluation). From the biological perspective, PER3 and SUCO genes as well as DNA repair and translation were indicated as factors playing a significant role in heat stress response.
  26 in total

Review 1.  Invited review: Effects of heat and cold stress on mammalian gene expression.

Authors:  Larry A Sonna; Jun Fujita; Stephen L Gaffin; Craig M Lilly
Journal:  J Appl Physiol (1985)       Date:  2002-04

2.  Per3, a circadian gene, is required for Chk2 activation in human cells.

Authors:  Jun-Sub Im; Bo-Hyeon Jung; Sang-Eun Kim; Kun-Ho Lee; Joon-Kyu Lee
Journal:  FEBS Lett       Date:  2010-11-09       Impact factor: 4.124

3.  Hypothermia is a frequent sign of severe hypoglycaemia in patients with diabetes.

Authors:  C Tran; K Gariani; F R Herrmann; L Juan; J Philippe; O T Rutschmann; U M Vischer
Journal:  Diabetes Metab       Date:  2012-05-03       Impact factor: 6.041

4.  Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Authors:  Cole Trapnell; Brian A Williams; Geo Pertea; Ali Mortazavi; Gordon Kwan; Marijke J van Baren; Steven L Salzberg; Barbara J Wold; Lior Pachter
Journal:  Nat Biotechnol       Date:  2010-05-02       Impact factor: 54.908

5.  Heat Stress-Induced DNA Damage.

Authors:  O L Kantidze; A K Velichko; A V Luzhin; S V Razin
Journal:  Acta Naturae       Date:  2016 Apr-Jun       Impact factor: 1.845

6.  Liver transcriptome response to hyperthermic stress in three distinct chicken lines.

Authors:  Xi Lan; John C F Hsieh; Carl J Schmidt; Qing Zhu; Susan J Lamont
Journal:  BMC Genomics       Date:  2016-11-22       Impact factor: 3.969

7.  Salmon provides fast and bias-aware quantification of transcript expression.

Authors:  Rob Patro; Geet Duggal; Michael I Love; Rafael A Irizarry; Carl Kingsford
Journal:  Nat Methods       Date:  2017-03-06       Impact factor: 28.547

8.  Transcriptome analysis identifies genes and co-expression networks underlying heat tolerance in pigs.

Authors:  Yuqing He; Christian Maltecca; Francesco Tiezzi; Emmanuel Lozada Soto; William L Flowers
Journal:  BMC Genet       Date:  2020-04-21       Impact factor: 2.797

9.  Comprehensive RNA-Seq Profiling Reveals Temporal and Tissue-Specific Changes in Gene Expression in Sprague-Dawley Rats as Response to Heat Stress Challenges.

Authors:  Jinhuan Dou; Angela Cánovas; Luiz F Brito; Ying Yu; Flavio S Schenkel; Yachun Wang
Journal:  Front Genet       Date:  2021-04-09       Impact factor: 4.599

10.  Heat Stress Impairs the Physiological Responses and Regulates Genes Coding for Extracellular Exosomal Proteins in Rat.

Authors:  Jinhuan Dou; Adnan Khan; Muhammad Zahoor Khan; Siyuan Mi; Yajing Wang; Ying Yu; Yachun Wang
Journal:  Genes (Basel)       Date:  2020-03-13       Impact factor: 4.096

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.