Rasmus Magnusson, Olof Rundquist, Min Jung Kim, Sandra Hellberg, Chan Hyun Na, Mikael Benson, David Gomez-Cabrero, Ingrid Kockum, Jesper N Tegnér, Fredrik Piehl, Maja Jagodic, Johan Mellergård, Claudio Altafini, Jan Ernerudh, Maria C Jenmalm, Colm E Nestor, Min-Sik Kim, Mika Gustafsson.
Abstract
Profiling of mRNA expression is an important method to identify biomarkers but is complicated by limited correlations between mRNA expression and protein abundance. We hypothesised that these correlations could be improved by mathematical models based on measuring splice variants and time delay in protein translation. We characterised time-series of primary human naïve CD4+ T cells during early T helper type 1 differentiation with RNA-sequencing and mass-spectrometry proteomics. We performed computational time-series analysis in this system and in two other key human and murine immune cell types. Linear mathematical mixed time-delayed splice variant models were used to predict protein abundances, and the models were validated using out-of-sample predictions. Lastly, we re-analysed RNA-seq datasets to evaluate biomarker discovery in five T-cell associated diseases, further validating the findings for multiple sclerosis (MS) and asthma. The new models significantly outperformed models that did not include multiple splice variants and time delays, as shown in cross-validation tests. Our mathematical models provided more differentially expressed proteins between patients and controls in all five diseases. Moreover, analysis of these proteins in asthma and MS supported their relevance. One marker, sCD27, was validated in MS using two independent cohorts for evaluating response to treatment and disease prognosis. In summary, our splice variant and time delay models substantially improved the prediction of protein abundance from mRNA expression in three different immune cell types. The models provided valuable biomarker candidates, which were further validated in MS and asthma.
Keywords: RNA-seq; T-cell differentiation; biomarkers; multiple sclerosis; proteomics
Year: 2022 PMID: 36106020 PMCID: PMC9465313 DOI: 10.3389/fmolb.2022.916128
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. RNA-seq and mass-spectrometry analysis of TH1 differentiation revealed highly variable correlations. (A) Experimental design. (B) Heat map of transcript and protein abundance dynamics in genes that show significant negative (left) and positive (right) correlations. Genes of particular relevance for T cells and T cell differentiation are highlighted in the figure. (C) Examples of transcript splice variants showing that both STX12 (left) and IL7R (right) were significantly negatively and positively correlated with protein levels. (D) Illustration of the modelling procedure for resolving the poor correlation, using STX12 as an example.
FIGURE 2. Multiple transcripts and time delays increased mRNA and protein correlations significantly in multiple cell types. (A) Gene/protein Pearson correlations in TH1 (left), Treg (middle), and murine B-cell (right) differentiation. In the histogram, the grey curve shows the correlation distribution when the sum of all splice variant expressions of a transcript (Fortelny et al., 2017) is used to quantify mRNA abundance (median: dashed line), while the blue histogram shows the distribution when our time-delayed, multiple-splice-variant models are used (medians: solid lines at 0.86, 0.79, and 0.94 for TH1, Treg and murine B-cells, respectively). Only cross-validated protein predictions are shown, and only for the proteins for which the null model could be rejected. (B) Out-of-sample cross-validation predictions of the three models. Aiming to quantify the predictive power of each added input to the model, we observed that a linear model with gene-specific time delays generated the predictions with the smallest sum of squared residuals. (C) Median correlation coefficients (rho) for different mathematical protein prediction models derived from mRNA with increasing protein abundance correlations. P-values were derived from predictions using leave-one-out cross-validation.
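The idea of a linear model over several splice variants with a gene-specific time delay, scored by leave-one-out cross-validation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of linear interpolation to evaluate delayed mRNA levels, the clipping of early time points, the single shared delay per gene, and the synthetic data are all assumptions made for illustration only.

```python
import numpy as np

def delayed_design(mrna, times, delay):
    """Build a design matrix from splice-variant time courses shifted by `delay`.

    mrna: array of shape (n_variants, n_timepoints).
    Assumption: delayed values are linearly interpolated, and times before
    the first measurement are clipped to the first measurement.
    """
    shifted = np.clip(times - delay, times[0], times[-1])
    cols = [np.interp(shifted, times, variant) for variant in mrna]
    return np.column_stack(cols + [np.ones_like(times)])  # last column: intercept

def loo_sse(X, y):
    """Leave-one-out sum of squared residuals for ordinary least squares."""
    n = len(y)
    sse = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # hold out time point i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sse += float((y[i] - X[i] @ coef) ** 2)       # out-of-sample error
    return sse

def fit_delay_model(mrna, protein, times, candidate_delays):
    """Return the candidate delay with the smallest leave-one-out error."""
    scored = [(loo_sse(delayed_design(mrna, times, d), protein), d)
              for d in candidate_delays]
    best_sse, best_delay = min(scored)
    return best_delay, best_sse

# Synthetic example (made-up numbers, arbitrary time units): two splice
# variants and a protein that follows them with a delay of 2 time units.
times = np.arange(0.0, 10.0, 1.0)
mrna = np.array([[1, 3, 2, 5, 4, 6, 3, 2, 4, 1],
                 [2, 2, 3, 1, 4, 2, 5, 3, 1, 2]], dtype=float)
true_weights = np.array([2.0, -0.5, 1.0])             # two variants + intercept
protein = delayed_design(mrna, times, 2.0) @ true_weights

best_delay, best_sse = fit_delay_model(mrna, protein, times, [0.0, 1.0, 2.0, 3.0])
print(best_delay, best_sse)
```

In this noise-free toy case the search recovers the delay used to simulate the protein. With real data, the held-out error would also be compared against simpler models, such as one using only the summed variant expression without a delay.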
FIGURE 3. Protein models led to the discovery of new potential biomarkers of complex diseases that were validated in multiple sclerosis (MS). (A) Differential predicted protein (PP) analysis of five diseases using the TH1 (light blue) and Treg (dark blue) models showed a higher fraction of nominally significant genes than normal differential gene expression tests. (B) Measurement of actual protein levels of the predicted proteins in a cohort of patients with early MS [clinically isolated syndrome (CIS)] vs. healthy controls (HC) (left side of the figure) and in a cohort of MS patients before vs. after 1 year of treatment with Natalizumab (right side of the figure). sCD27 was measured in cerebrospinal fluid (CSF) using ELISA. (C) Receiver operating characteristic curve using sCD27 concentration as a single prognostic marker of NEDA at 4 years (solid line) and 2 years (dashed line) after CIS.
FIGURE 4. Overview of detected potential biomarkers in asthma and MS. The model identified several proteins that have previously been identified in MS and asthma. The upper panel shows the potential biomarkers identified in MS and the lower panel shows the same in asthma. *mRNA expression, ¤ identified in mice. PBMCs, peripheral blood mononuclear cells. References stated in the figure: (a) Colamatteo A et al., J Immunol, 2019; (b) Achiron A et al., Ann N Y Acad Sci, 2007; (c) van der Vuurst de Vries RM et al., JAMA Neurol, 2017; (d) Wong YYM et al., Mult Scler, 2018; (e) Masuda H et al., J Neuroimmunol, 2017; (f) de JG-GJ et al., Immunobiology, 2018; (g) Bomprezzi R et al., Hum Mol Genet, 2003; (h) Wanke F et al., Cell Rep, 2017; (i) Aquino DA et al., J Neuropathol Exp Neurol, 1997; (j) Bonetti B et al., Am J Pathol, 1999; (k) Enomoto Y et al., J Allergy Clin Immunol, 2009; (l) Ferreira MA et al., Nat Genet, 2017; (m) Persson H et al., J Allergy Clin Immunol, 2015; (n) Murray JT et al., Biochem J, 2004; (o) Nestor CE et al., PLoS Genet, 2014; (p) Purwar R et al., PLoS One, 2011.