Literature DB >> 34755136

Platelet transcriptome identifies progressive markers and potential therapeutic targets in chronic myeloproliferative neoplasms.

Zhu Shen1, Wenfei Du1, Cecelia Perkins2, Lenn Fechter2, Vanita Natu3, Holden Maecker4, Jesse Rowley5, Jason Gotlib2,6, James Zehnder2,6,7, Anandi Krishnan2,7.   

Abstract

Predicting disease progression remains a particularly challenging endeavor in chronic degenerative disorders and cancer, thus limiting early detection, risk stratification, and preventive interventions. Here, profiling the three chronic subtypes of myeloproliferative neoplasms (MPNs), we identify the blood platelet transcriptome as a proxy strategy for highly sensitive progression biomarkers that also enables prediction of advanced disease via machine-learning algorithms. The MPN platelet transcriptome reveals an incremental molecular reprogramming that is independent of patient driver mutation status or therapy. Subtype-specific markers offer mechanistic and therapeutic insights, and highlight impaired proteostasis and a persistent integrated stress response. Using a LASSO model with validation in two independent cohorts, we identify the advanced subtype MF at high accuracy and offer a robust progression signature toward clinical translation. Our platelet transcriptome snapshot of chronic MPNs demonstrates a proof-of-principle for disease risk stratification and progression beyond genetic data alone, with potential utility in other progressive disorders.
© 2021 The Author(s).

Entities:  

Keywords:  MPN; biomarker; blood platelets; myeloproliferative neoplasms; platelet RNA-seq; platelet transcriptome; prediction algorithms; progression signatures; proteostasis; ruxolitinib

Mesh:

Substances:

Year:  2021        PMID: 34755136      PMCID: PMC8561315          DOI: 10.1016/j.xcrm.2021.100425

Source DB:  PubMed          Journal:  Cell Rep Med        ISSN: 2666-3791


Introduction

The classic Philadelphia chromosome-negative (Ph-) MPNs,5, 6, 7, 8 are clonal disorders of the bone marrow that comprise three clinical phenotypes– essential thrombocythemia (ET), polycythemia vera (PV), and myelofibrosis (MF). These myeloid neoplasms are defined by a combination of morphologic, clinical, laboratory, and cytogenetic/molecular genetic features. The existing genetic landscape9, 10, 11 of MPNs primarily involves mutations in three driver genes that lead to constitutive JAK-STAT signaling (JAK2, CALR, MPL). Several additional non-driver mutations (see references9, 10, 11, 12, 13, 14, 15 for details) as well as cytogenetic and epigenetic, abnormalities also contribute to disease initiation and progression, and impact both overall survival and potential for progression to acute myeloid leukemia (AML). Depending on the MPN and stage of disease, these patients may exhibit debilitating constitutional symptoms such as fatigue, pruritus, night sweats, and weight loss; thrombohemorrhagic diathesis and extramedullary hematopoiesis; and an increased propensity for transformation to AML. Although an increase in one or more blood cell lineages contributes to these morbid sequelae, the qualitative abnormalities of myeloid cells that increase vascular risk or disease progression are not well understood. Taken together, a limited understanding exists regarding how genotypic variability contributes to diverse phenotypic presentations and disease natural histories. We were motivated by the current clinical need,, and the potential for deeper integration of clinical features and genetics with gene expression profiling to improve stratification and management of chronic blood disorders, such as MPNs. Blood platelets play critical roles in multiple processes and diseases19, 20, 21, from their traditional function in hemostasis and wound healing to inflammation, immunity, cancer metastasis, and angiogenesis. Platelets originate from bone marrow precursor megakaryocytes as anucleate fragments with a distinctive discoid shape. Platelets contain a complex transcriptional landscape of messenger RNAs (mRNAs), unspliced pre-mRNAs, rRNAs, tRNAs and microRNAs21, 22, 23, 24. Most platelet RNA expression results from the transcription of nuclear DNA in the megakaryocyte,, and thus reflects the status of the megakaryocyte at the time of platelet release into the circulation, as well as subsequent splicing events, selective packaging, and intercellular RNA transfer. There is emerging evidence26, 27, 28, 29, 30 that the molecular signature of platelets may be changed in disease conditions where these processes are altered, including via inter-cellular transfer, of cytosolic RNA. In the context of MPNs, the platelet transcriptome therefore not only represents a critical biomarker of megakaryocytic activity31, 32, 33, 34, 35, but also provides a snapshot of the underlying hemostatic, thrombotic, and inflammatory derangements associated with these hematologic neoplasms and the potential impact of treatment. To date, no one composite study,,,4, 37, 38, 39, 40 has evaluated the disease-relevant platelet transcriptome in a sizeable clinical cohort of all three subtypes of Ph- MPNs. Here, we extend our prior preliminary work toward a comprehensive analysis of disease-relevant platelet RNA-sequencing (RNA-seq) in two temporally independent and mutually validating cohorts of all three MPN subtypes, ET, PV, and MF (primary or post ET/PV secondary). We demonstrate marked differences in platelet gene expression across the MPN spectrum, which also permits robust validated (temporal and geographical) prediction of MF. In addition to identifying novel gene expression signatures impacted by the JAK1/JAK2 inhibitor ruxolitinib (RUX), platelet profiling reveals MPN-altered pathways that may be targets for future drug development.

Results

Two independent MPN clinical cohorts and closely replicated platelet transcriptome

We prepared highly purified leukocyte-depleted, platelets from peripheral blood samples of two cohorts (approximately 2 years apart; Stanford single-center) of patients with a diagnosis of MPN (including provisional) at the time of sample acquisition, and included healthy controls in each cohort as reference (cohort 1, n = 71, and cohort 2, n = 49; Figure 1, Tables S1A and S1B). Only 2 patients (2%) received a change in diagnosis from MPN (MF) to a non-MPN phenotype; and were therefore excluded from all downstream analyses (Figure 2A principal component analysis plot panel-3 identifies these 2 outliers). Our two-cohort study was specifically designed (before knowledge of other subsequent studies) for the explicit purpose of validation, not only of intercohort RNA-seq results but also to evaluate temporal validation of our prediction model (see STAR Methods, Figures 5C, 5E, 5F). Figure S1 demonstrates our established, high-quality and highly efficient experimental framework toward a rigorous platelet RNA-seq approach. Clinical features of the MPN patients are shown in Figure 1 and listed in Tables S1A and S1B. The distribution of key variables was closely matched between the two cohorts by MPN subtype (Figure 1A), age (Figure 1B), gender (Figure 1C), JAK2/CALR mutation status (Figure 1D), and treatment (Figure 1E). Any interpatient variability in patient age, gender, and treatment were adjusted as confounding factors in all downstream gene expression analyses (see STAR Methods). Clinical laboratory measures (Figure 1F) at the time of sampling reflected the phenotype of the MPN subtypes (please see figure legend for detailed statistical comparisons). The two cohorts of platelet transcriptome data (Figure 1G, normalized transcript counts) adjusted for patient age, gender, and all treatment as confounding factors were also highly correlated (R2 = 0.89), thus demonstrating high intercohort validation of gene expression that then enabled us to combine our two RNA-seq datasets into a final integrated dataset of enhanced statistical power for downstream analyses, especially prediction modeling (Figure 5). Together, this platelet transcriptome compendium comprises 118 human peripheral blood samples from healthy controls (n = 21) and World Health Organization–defined MPN patients (24 ET, 33 PV and 40 MF) that include 7 untreated, and 92 either on cytoreductives/biologics (e.g., ruxolitinib, hydroxyurea, interferon-alpha), anti-thrombotic agents (e.g., aspirin, warfarin), or a combination of the two; and captures the real-life diversity among MPN patients. Our cross-sectional design here capturing patients from all three MPN subtypes also serves as a practical alternative to the longitudinal approach; though ideal, it is likely difficult to implement in these chronic disorders which often occur over decades.
Figure 1

Two independent MPN clinical cohorts and closely replicated patient variables

(A) Similarity in distribution of MPN subtypes between two cohorts of MPN patients (Stanford single-center; approximately 2 years apart: cohort 1: 2016-17, n = 71; and cohort 2: 2019, n = 49); the majority subtype is MF in both cohorts (34% of n = 71 and 37% of n = 49).

(B) Comparable distribution of age across MPN subtypes in the two cohorts. Violin plots of patient age from each MPN subtype reflect clinical expectation, with an almost identical match between the two cohorts. Slightly higher median age noted in the second cohort for ET and PV patients alone.

(C) Comparable and balanced distribution of gender across MPN subtypes in the two cohorts. Larger percentage of male healthy controls in both cohorts and smaller percentage of males in ET cohort 1 noted.

(D) Matched distribution of primarily JAK2 and CALR mutational status across MPN subtypes in the two cohorts. JAK2 is the most common mutation across all three subtypes; 100% of PV and over 50% of ET and MF patients have JAK2 mutation in both cohorts. Mismatch between cohorts on the MPL/triple negative patients is noted as a natural consequence of the rarer prevalence of these mutations; and therefore, not the primary focus of this work.

(E) Diversity of MPN patient therapies across the two cohorts reflecting current clinical practice. The majority being aspirin (ASA) in ET/PV patients and the JAK-inhibitor, ruxolitinib, in MF. Note that a given patient may be on more than one treatment and therefore, the total treatment percentage in this graphic may not equal 100. To control for any inter-patient variability, all treatment, in addition to patient age and gender are adjusted as confounding factors in downstream gene expression analyses.

(F) Representative clinical laboratory variables, as boxplots, measured at the same date and time as platelet sampling. Compared to controls, MPN patients show larger variance (inter-quartile range [IQR]), and reflect current clinical knowledge. Groups differ primarily only with respect to blood cell counts (platelet/RBC/WBC); and show the greatest differential in MF. Note higher platelet count in ET with a concomitant lower mean platelet volume, higher red blood cell count in PV and lower red blood cell count in MF with concomitantly lower hemoglobin, hematocrit, and higher red cell width. Wilcoxson signed rank tests marked by asterisks indicate a statistically significant difference between any two groups (∗p ≤ 0.05; ∗∗p ≤ 0.01; ∗∗∗p ≤ 0.001; ∗∗∗∗p ≤ 0.0001).

(G) High correlation (R2 = 0.89) of platelet gene expression as assessed by normalized counts of matched transcripts in each cohort between each of controls, essential thrombocythemia (ET), polycythemia vera (PV), and myelofibrosis (MF). The two-cohort collective sample size by subtype constitutes each of ET (n = 24), PV (n = 33), primary or post ET/PV secondary MF (n = 42), healthy donors (n = 21), and totals (n = 120), affording increased statistical power for subsequent analyses.

Figure 2

MPN platelet transcriptome distinguishes disease phenotype and reveals phenotype- and JAK-inhibitor specific signatures

(A) Unsupervised principal component analysis (PCA) of normalized platelet gene expression counts adjusted for age, gender, treatment, and experimental batch. Three panels of PC1 and PC2 colored by MPN subtype; and each contrasted with controls (n = 21; yellow): ET (n = 24; top, light green), PV (n = 33; middle, dark green), and MF (n = 42; bottom, dark blue). Circles filled or open mark presence or absence of ruxolitinib treatment and size of circles, smaller or larger, indicate presence or absence of JAK2 mutation. The first two principal components account for 48% of total variance in the data.

(B) Volcano plots (three panels as A above of ET, PV, and MF) of differential gene expression showing statistical significance (negative log10 of p values) versus log2 fold change of each gene. Significant upregulated and downregulated genes are those with p values (FDR) smaller or equal to 0.05 and absolute value of fold changes larger or equal to 1.5.

(C and D) Venn Diagram comparisons of MPN differential gene expression lists. In (C), each of ET, PV and MF is contrasted with controls; identifying gene sets that are shared (n = 1504, FDR < 0.05) as well as unique to each subtype. In (D), differential in the RUX-treated cohort is contrasted with MF versus controls. Differential in gene expression in RUX-treated cohort is a fraction of the total differential noted in the MF transcriptome.

(E) Volcano plot of differential gene expression between MPN patients treated with ruxolitinib and not. A small subset of overlapping differential genes that are upregulated in MF (B) (bottom panel) and suppressed in the RUX-treated cohort (E) are colored in green.

Figure 5

Prediction of MF based on unique and progressive MPN platelet transcriptome

(A and B) Top few genes (out of 3000+) demonstrating monotonic progressive gene expression (log 2 fold change in expression y axis, FDR < 0.01, Mann-Kendall test with Bonferroni correction) across x-axes (A), CTRL-to-ET-to-MF and (B, CTRL-to-PV-to-MF.

(C) Lasso-penalized multinomial logistic regression model under temporal validation i.e., trained on Stanford cohort 1 (n = 71, 2016–2017, Figure 1A) and validated on Stanford cohort 2 (n = 49, 2019, Figure 1A) as test set.

(D) Lasso-penalized multinomial logistic regression model under geographical validation using two independently published datasets for training (cohort 3, n = 31 healthy controls in addition to Stanford cohorts 1 and 2) and validation (cohort 4, n = 25 MF and n = 15 healthy controls).

(E) Receiver operating curves (ROC) toward MF prediction under conditions of temporal (C) and geographical (D) validation. Temporal validation uses three models: (i) baseline, with no gene expression data but patient age, gender, and mutation status alone; (ii) entire MPN platelet transcriptome; and (iii) MPN progressive genes alone. Outperformance of the progressive transcriptome model (red curve, AUROC = 0.96) vis-a-vis the entire transcriptome dataset (blue curve, AUROC = 0.95) and lastly, the baseline model without gene expression (black curve, AUROC = 0.68). Geographical validation using the progressive transcriptome model also demonstrates independent, high MF predictive accuracy (green curve, AUROC = 0.97).

(F) Heatmap of top recurring Lasso-selected progressive genes for each of controls (left column, CTRL, yellow bar), ET (light green), PV (dark green), and MF (dark blue). Rows indicate gradation in gene expression on a blue (low) to red (high) scale. Columns indicate sample type (CTRL, ET, PV, and MF).

Two independent MPN clinical cohorts and closely replicated patient variables (A) Similarity in distribution of MPN subtypes between two cohorts of MPN patients (Stanford single-center; approximately 2 years apart: cohort 1: 2016-17, n = 71; and cohort 2: 2019, n = 49); the majority subtype is MF in both cohorts (34% of n = 71 and 37% of n = 49). (B) Comparable distribution of age across MPN subtypes in the two cohorts. Violin plots of patient age from each MPN subtype reflect clinical expectation, with an almost identical match between the two cohorts. Slightly higher median age noted in the second cohort for ET and PV patients alone. (C) Comparable and balanced distribution of gender across MPN subtypes in the two cohorts. Larger percentage of male healthy controls in both cohorts and smaller percentage of males in ET cohort 1 noted. (D) Matched distribution of primarily JAK2 and CALR mutational status across MPN subtypes in the two cohorts. JAK2 is the most common mutation across all three subtypes; 100% of PV and over 50% of ET and MF patients have JAK2 mutation in both cohorts. Mismatch between cohorts on the MPL/triple negative patients is noted as a natural consequence of the rarer prevalence of these mutations; and therefore, not the primary focus of this work. (E) Diversity of MPN patient therapies across the two cohorts reflecting current clinical practice. The majority being aspirin (ASA) in ET/PV patients and the JAK-inhibitor, ruxolitinib, in MF. Note that a given patient may be on more than one treatment and therefore, the total treatment percentage in this graphic may not equal 100. To control for any inter-patient variability, all treatment, in addition to patient age and gender are adjusted as confounding factors in downstream gene expression analyses. (F) Representative clinical laboratory variables, as boxplots, measured at the same date and time as platelet sampling. Compared to controls, MPN patients show larger variance (inter-quartile range [IQR]), and reflect current clinical knowledge. Groups differ primarily only with respect to blood cell counts (platelet/RBC/WBC); and show the greatest differential in MF. Note higher platelet count in ET with a concomitant lower mean platelet volume, higher red blood cell count in PV and lower red blood cell count in MF with concomitantly lower hemoglobin, hematocrit, and higher red cell width. Wilcoxson signed rank tests marked by asterisks indicate a statistically significant difference between any two groups (∗p ≤ 0.05; ∗∗p ≤ 0.01; ∗∗∗p ≤ 0.001; ∗∗∗∗p ≤ 0.0001). (G) High correlation (R2 = 0.89) of platelet gene expression as assessed by normalized counts of matched transcripts in each cohort between each of controls, essential thrombocythemia (ET), polycythemia vera (PV), and myelofibrosis (MF). The two-cohort collective sample size by subtype constitutes each of ET (n = 24), PV (n = 33), primary or post ET/PV secondary MF (n = 42), healthy donors (n = 21), and totals (n = 120), affording increased statistical power for subsequent analyses. MPN platelet transcriptome distinguishes disease phenotype and reveals phenotype- and JAK-inhibitor specific signatures (A) Unsupervised principal component analysis (PCA) of normalized platelet gene expression counts adjusted for age, gender, treatment, and experimental batch. Three panels of PC1 and PC2 colored by MPN subtype; and each contrasted with controls (n = 21; yellow): ET (n = 24; top, light green), PV (n = 33; middle, dark green), and MF (n = 42; bottom, dark blue). Circles filled or open mark presence or absence of ruxolitinib treatment and size of circles, smaller or larger, indicate presence or absence of JAK2 mutation. The first two principal components account for 48% of total variance in the data. (B) Volcano plots (three panels as A above of ET, PV, and MF) of differential gene expression showing statistical significance (negative log10 of p values) versus log2 fold change of each gene. Significant upregulated and downregulated genes are those with p values (FDR) smaller or equal to 0.05 and absolute value of fold changes larger or equal to 1.5. (C and D) Venn Diagram comparisons of MPN differential gene expression lists. In (C), each of ET, PV and MF is contrasted with controls; identifying gene sets that are shared (n = 1504, FDR < 0.05) as well as unique to each subtype. In (D), differential in the RUX-treated cohort is contrasted with MF versus controls. Differential in gene expression in RUX-treated cohort is a fraction of the total differential noted in the MF transcriptome. (E) Volcano plot of differential gene expression between MPN patients treated with ruxolitinib and not. A small subset of overlapping differential genes that are upregulated in MF (B) (bottom panel) and suppressed in the RUX-treated cohort (E) are colored in green.

MPN platelet transcriptome distinguishes disease phenotype and reveals phenotype- and JAK-inhibitor specific signatures

Given the phenotypic overlap, yet also differences in disease behavior and prognosis between ET, PV, and MF, we hypothesized that there may be MPN subtype-specific differences in gene expression that are independent of JAK2/CALR/MPL mutation status. In addition, we hypothesized that MPN platelets are likely to be enriched for subtype-specific biomarkers that may otherwise be missed in blood/plasma/serum sources37, 38, 39. Therefore, we compared platelet transcriptomic expression in each of the three MPN subtypes with that of controls and discovered a shared gene set that is also progressively differentiated across the MPN spectrum (ET/PV to MF as shown in Figures 2A–2C). First, unsupervised principal component analysis (PCA) of MPN patients and controls data (Figure 2A) confirmed that the collective variability from the first two principal components (accounting for 48% of total variance), after adjusting for age, gender, treatment, and experimental batch, was MPN disease status, with increasing differentiation by subtype. Next, differential gene expression analysis (DGEA), as shown in a volcano plot (Figures 2B and 2C), efficiently distinguished each of the MPN subtypes and resulted in highly significant expression signatures (adjusted p value/FDR < 0.05) with 2634 genes differentially regulated in ET (1364 up and 1269 down), 4398 in PV (2098 up and 2300 down), and 6648 in MF (3965 up and 2683 down). A subset of 100+ long non-coding RNAs and pseudogenes also constituted the significant (FDR < 0.05) differential expression across MPNs (Table S2A-C). Specifically, DGEA also uncovered shared and unique genes between all three MPN phenotypes, thus offering a potential core set of genes involved in MPN pathogenesis (Figure 2C, with associated heatmaps in Figure 3 and pathways in Figure 4). The shared gene set at FDR < 0.01 constituted 654 upregulated genes, with a predominance of molecular pathways involving myeloid cell activation in immune response and membrane protein proteolysis, and 360 downregulated genes, reflecting negative regulation of hematopoiesis and negative regulation of transmembrane receptor protein serine/threonine kinase signaling as a consistent pathogenetic mechanism. The upregulated genes belonged to the endoplasmic reticulum/ER-Golgi intermediate compartment; they also included a particularly high expression of the cAMP-response element binding transcription factor, CREB3L1 implicated in cell differentiation and inhibition of cell proliferation, with concomitant high expression of ER chaperones calreticulin (CALR), calnexin (CANX); transport factors: golgin (GOLGB1) and folate receptor FOLR1. Platelet alpha granule proteins (F5, VWF, MMP14), several collagens (COL10A1, COL18A1, COL6A3), immune/inflammatory (IFITM2/3/10, FCGR2A, TMEM179B), Cathepsins (C/Z/D, MIF, PTGES2) and proliferation mediator genes (CDK1, CCNG1, BMP9/GDF2, LAPTM4B, PSENEN) also constituted the MPN shared set. Downregulated genes, on the other hand, were predominantly within transcription factor complexes, and included opposing expression of CREB1 (vis-à-vis CREB3L1 above), calcium-calmodulin protein kinases, CAMK4, SMAD1 and β-catenin CTNNB1 together pointing to dysregulated calcium (Ca2+) homeostasis in the ER lumen.
Figure 3

Graded differential expression by MPN phenotype and driver mutation status

(A–D) Hierarchically clustered heatmaps of the top 10 differentially expressed genes (DEGs) from controls (FDR < 0.01) of all MPN (A) and MF, ET, and PV each separately (B–D). Colored annotation is provided to indicate MPN subtype, age, gender, mutation, and ruxolitinib treatment. Rows indicate gradation in gene expression on a blue (low) to red (high) scale. Columns indicate sample type from controls (CTRL) to ET, PV, and MF.

Figure 4

Altered immune, metabolic, and proteostatic pathways underlie each MPN phenotype

Pathway-enrichment analysis of genes with MPN subtype-specific expression (color indicated; light green ET, dark green PV, and dark blue MF) overlayed with ruxolitinib-specific expression (light blue). Each point represents a pathway; the x axis gives the normalized enrichment score, which reflects the degree to which each pathway is over- or under-represented at the top or bottom of the ranked list of differentially expressed genes, normalized to account for differences in gene set size and in correlations between gene sets and the expression dataset. The y axis lists the detail-level node of the most enriched pathways; solid lines mark GSEA-recommended Bonferroni-corrected statistical significance criterion of FDR < 0.25 for exploratory analyses. Dotted lines mark FDR > 0.25 and therefore, not sufficiently significant, yet visualized alongside solid lines to retain overall context (upper-level parent nodes of the detail-level pathways are provided in Table S3A–S3C). Multiple immune and inflammatory pathways are consistently significantly enriched across ET, PV, and MF (and suppressed in the ruxolitinib-treated cohort). MF is differentiated from ET and PV through dysregulation of additional molecular processes for cellular development, proliferation, metabolism, and DNA damage.

Graded differential expression by MPN phenotype and driver mutation status (A–D) Hierarchically clustered heatmaps of the top 10 differentially expressed genes (DEGs) from controls (FDR < 0.01) of all MPN (A) and MF, ET, and PV each separately (B–D). Colored annotation is provided to indicate MPN subtype, age, gender, mutation, and ruxolitinib treatment. Rows indicate gradation in gene expression on a blue (low) to red (high) scale. Columns indicate sample type from controls (CTRL) to ET, PV, and MF. Altered immune, metabolic, and proteostatic pathways underlie each MPN phenotype Pathway-enrichment analysis of genes with MPN subtype-specific expression (color indicated; light green ET, dark green PV, and dark blue MF) overlayed with ruxolitinib-specific expression (light blue). Each point represents a pathway; the x axis gives the normalized enrichment score, which reflects the degree to which each pathway is over- or under-represented at the top or bottom of the ranked list of differentially expressed genes, normalized to account for differences in gene set size and in correlations between gene sets and the expression dataset. The y axis lists the detail-level node of the most enriched pathways; solid lines mark GSEA-recommended Bonferroni-corrected statistical significance criterion of FDR < 0.25 for exploratory analyses. Dotted lines mark FDR > 0.25 and therefore, not sufficiently significant, yet visualized alongside solid lines to retain overall context (upper-level parent nodes of the detail-level pathways are provided in Table S3A–S3C). Multiple immune and inflammatory pathways are consistently significantly enriched across ET, PV, and MF (and suppressed in the ruxolitinib-treated cohort). MF is differentiated from ET and PV through dysregulation of additional molecular processes for cellular development, proliferation, metabolism, and DNA damage. Differential markers in each of ET, PV and MF also highlight candidate genes as potential mediators of the pro-thrombotic and pro-fibrotic phenotypes in MPNs. In ET and PV, a strong thromboinflammatory profile, is revealed by the upregulation of several interferon inducible transmembrane genes (IFITM2, IFITM3, IFITM10, IFIT3, IFI6, IFI27L1, IFI27L2); interleukin receptor accessory kinases/proteins (IRAK1, IL15, IL1RAP, IL17RC); and several solute carrier family genes (SLC16A1, SLC25A1, SLC26A8, SLC2A9) as glucose and other metabolic transport proteins; and coagulation factor V (F5). In MF, fibrosis-specific markers were identified by an additional focused comparison of MF patients versus ET and PV (Figure S2), showing increased expression of several pro-fibrotic growth factors (FGFR1, FGFR3, FGFRL1); matrix metalloproteinases (MMP8, MMP14); vascular endothelial growth factor A (VEGFA); insulin growth factor binding protein (IGFBP7); and cell cycle regulators (CCND1, CCNA2, CCNB2, CCNF). GSEA of this MF-focused comparison again highlighted potential underlying molecular dysregulation in MPNs that likely contribute to the fibrotic phenotype (e.g., unfolded protein response, mTORc1 signaling, MYC/E2F targets, oxidative phosphorylation; Figure 4). Having defined differential gene expression signatures by MPN subtype, we then explored how platelet gene expression profiles differed in patients by treatment, focusing for this work on the JAK1/JAK2 inhibitor, ruxolitinib (RUX, Figure 2E). DGEA on the platelet transcriptome in RUX-treated patients identified over 400 core significant genes changed in response to treatment (Figure 2E; Tables S4 and S5). At-least two-fold reduction in expression was noted in genes associated with interferon-stimulation (IFI6, TRIM69, LY6G5C), platelet-mediated apoptosis, FASLG, G-protein coupled receptor, GPR88, calcium-calmodulin protein kinase, CAMK2A and fibroblast growth factor binding protein, FGFBP2, followed by over 150 genes with at least one-fold reduction in expression in the RUX-treated cohort (Table S5). These included genes downregulated within classical pathways of adaptive immune response (TNFSF13B, B2M, HMGB1); response to oxidative stress (COX1, COX2, COX15, TP53); and myeloid activation (TNIP2, FTH1, TOLLIP, RAB6A). In addition to confirming previous observations,50, 51, 52 on the anti-inflammatory and immunosuppressive effects of JAK inhibition by RUX (e.g., downregulation in our RUX-treated cohort of IL1RAP, CXCR5, CPNE3, ILF3), we identified new gene clusters responsive to RUX in the inhibition of type I interferon (e.g., IFIT1, IFIT2, IFI6); chromatin regulation (HIST2H3A/C, HIST1H2BK, H2AFY, SMARCA4, SMARCC2); and epigenetic methylation in mitochondrial genes (ATP6, ATP8, ND1-6 and NDUFA5). Recent literature probing the mechanisms of action of ruxolitinib in other disease settings, including SARS-CoV-2,, confirm our observations in MPNs. A direct comparison restricted to only differentially expressed genes (FDR < 0.05) in RUX-treated versus not with MF versus healthy controls revealed less than 5% overlap (Figure 2D), reflecting potentially the extent of the impact of treatment by ruxolitinib relative to the substantive disease burden in MF. Focusing further on the directionality of the changes observed, we found just 18 genes that were increased in MF and suppressed in the RUX-treated cohort (Figure 2E, colored green), and 9 vice versa (Tables S3A S3B). Despite the small overlap, we capitalized on the converging genes to better define molecular and physiological pathways underlying the effect of RUX in MPNs. The 18 RUX-downregulated genes followed expected mapping to immunosuppression through interferon- and cytokine-mediated signaling pathways. The nine genes that were upregulated with RUX in MF mapped to previously undescribed effect of the drug in select G-protein-coupled receptor and chemokine activity (e.g. CXCR5, GPR128/ADGRG7), semaphorin signaling (SEMA3C) and circadian regulation (PER3). A subcohort analysis of RUX-treated and RUX-naive MF patients alone also identified downregulation of interferon-stimulated genes (e.g., SLFN12L, G-protein-coupled receptors (S1PR5), fibroblast growth factor binding protein (FGFBP2), and tyrosine kinase (LCK).

Graded differential expression by MPN phenotype and driver mutation status

Unsupervised hierarchical clustering was used to define the nature of MPN platelet transcriptome variability more precisely from controls, as well as between and within MPN subtypes. Figure 3 reveals a spectrum of expression in the MPN platelet transcriptomic profile using just the top 10 highly significant differentially expressed genes by disease status: (Figure 3A) all MPN versus controls, (Figure 3B) MF versus controls, and (Figures 3C and 3D) ET and PV versus controls. As shown in Figure 3A, all MPN patients clustered into two distinct groups: a larger group of 87 ET, PV, and MF patients clustered independently from the 21 controls; whereas a smaller group of 10 ET, PV, and MF patients formed a homogeneous cluster of their own closer to controls reflecting a varying gradation with respect to the top-10 gene expression by MPN subtype (patient variables annotated above the heatmaps offer additional context, particularly on mutation status and RUX therapy). In the larger cluster, while we observed a graded overlap in platelet RNA signatures between ET and PV, a more distinct expression pattern characterized the more advanced population of MF patients (PC1 correlates with MF disease risk by the Dynamic International Prognostic Scoring System [DIPSS]; Figure S3). These data collectively highlight the importance of phenotype-modifying genes that are independent of JAK2/CALR/MPL mutation status. Untangling other mechanisms beyond the few genes that are recurrently mutated is critical for defining subtype-specific risk and for identifying molecular pathways for targeted therapy. Accordingly, we sought to refine the molecular classification of MPN by associating platelet gene expression profiles with the corresponding subtype, and yielded a core set of 10 highly significant preferentially expressed genes for each: (1) MF (Figure 3B), defined by high mRNA expression of proteostasis-associated CREB3L1 and CALR, and megakaryocyte-erythroid differentiation stage-associated RHAG;, (2) ET (Figure 3C) marked by comparatively high expression of interferon-related genes IFITM2/3, immune response FCGR2A, and proliferation-associated STAT5 target OSBP2; and (3) PV (Figure 3D) marked by overlapping signatures with ET in inflammation-associated IFITM3 and TBL1X, and the B-myb promoter MYBL2, with MF in the maturation-associated RHAG at variable expression, and with both ET and MF in the high expression of CREB3L1 and cell survival-associated MINDY4 and STAC.

Altered immune, metabolic, and proteostatic pathways underlie each MPN phenotype

Our analysis of MPN platelet RNA-seq enabled identification of altered MPN pathways that might be amenable to drug therapy. To understand the biological significance of transcriptional changes, we performed pathway-enrichment analysis and identified signaling pathways that are differentially activated between MPN subtypes (Figure 4). Gene set enrichment analysis (GSEA; see STAR Methods) of Hallmark gene sets found that MPN (stratified by subtypes, ET, PV, and MF) induces genes related to pathways with known immune modulatory functions (Figure 4, notably interferon alpha response in ET, PV, and MF, and IL2 STAT5 signaling, and interferon gamma response specifically enriched in PV). Moreover, among the most enriched gene sets, MPN pathology induces robust activation of oxidative phosphorylation (OXPHOS) and mTORC1 signaling pathways, with increasing enrichment and significance by MPN subtype (FDR < 0.0001 in MF). Pathways of reactive oxygen species (ROS) production paralleled activation of mTORC1 in MF. Other complementary metabolic pathways paralleled OXPHOS activation, with significant enrichment of bile and fatty acid metabolism, cholesterol homeostasis, and adipogenesis, most pronounced in MF and variably expressed in ET and PV. Coagulation- and complement-associated gene sets were expectedly enriched across ET, PV, and MF. What is particularly noteworthy is that in MF, cell cycle progression and proliferation pathways reveal significant enrichment (FDR < 0.001) around c-MYC and E2F targets, and G2M checkpoint pathways; and unfolded protein response emerges as a key factor, likely attributable to ER stress (see CREB3L1 and CALR overexpression in Figures 2 and 3). Representative GSEA profiles are shown in Figure 4 and the full list of enriched pathways and gene sets detailed in Tables S4A–S4C. The MPN pathways exhibiting significant transcriptional regulation by GSEA are consistent with our observations at the individual gene level for upregulated and downregulated transcripts, specifically those upregulated in MF. Taken together, these data demonstrate that in addition to immune factors such as type I/II interferons and dysregulation of interleukin-dependent inflammatory responses, which have been linked to MPNs, platelet transcriptional signatures of proliferation, metabolic and proteostasis signaling are a feature of MPN pathogenesis (Figure S4 captures the relative enrichment by subtype of MPN molecular pathway categories as a concept model of MPN progression).

Prediction of MF based on shared, unique, and progressive MPN platelet transcriptome

Current knowledge of MPN genetic, cytogenetic, or epigenetic abnormalities are limited, in their ability to enable prediction of disease progression or evolution of a given patient, from ET/PV phenotype to MF. In order to investigate the potential of platelet transcriptomic parameters to enable MF prediction, we constructed LASSO penalized regression classifiers (machine-learning R package glmnet) to discriminate MPN subtypes from each other, and from healthy controls (Figures 5A–5E). We apply two rigorous external validation conditions (Figures 5C and 5D): (1) training and independent temporal validation (Figure 5C) leveraging the Stanford two-cohort design, and (2) geographical validation (Figure 5D). We used two independently published platelet transcriptome datasets: first, from Rondina et al. on an additional n = 31 healthy donors integrated with the Stanford datasets as training; and second, from Guo et al. n = 25 MF and n = 15 healthy donors as geographical validation of the Lasso algorithm (Figure 5D). Our temporal validation constituted three types of models: (1) baseline (no transcriptome, but age, gender, and driver mutation status as reference variable information available not only for patients but also healthy donors); (2) full platelet transcriptome; and (3) subset platelet transcriptome that exhibits progressive differentiation from controls to ET to MF or controls to PV to MF (> 3000 genes, top few of each comparison visualized to demonstrate progressive gene expression, (Figures 5A and 5B) (progressive subset is selected unbiased as part of the Lasso learning procedure). Comparison of the classification potential among the three models demonstrated that the progressive platelet transcriptome model (Figure 5E red curve) substantially outperformed the baseline model (Figure 5E black curve) and was slightly better than the full transcriptome model (Figure 5E blue versus red curves) in the classification of ET, PV, and MF. Predicted probabilities for all three models are shown in Tables S5A–S5C. Lasso logistic regression classifiers to predict MF with each of the models under the first temporal two-cohort training-validation split of baseline, full, and progressive transcriptome each achieved area under the receiver-operating characteristic curve (AUROC) of 0.68, 0.95, and 0.96, respectively. Outperformance of the progressive transcriptome model was validated in our subsequent independent external geographic validation (Figure 5E green curve) at an AUROC of 0.97. Prediction of MF based on unique and progressive MPN platelet transcriptome (A and B) Top few genes (out of 3000+) demonstrating monotonic progressive gene expression (log 2 fold change in expression y axis, FDR < 0.01, Mann-Kendall test with Bonferroni correction) across x-axes (A), CTRL-to-ET-to-MF and (B, CTRL-to-PV-to-MF. (C) Lasso-penalized multinomial logistic regression model under temporal validation i.e., trained on Stanford cohort 1 (n = 71, 2016–2017, Figure 1A) and validated on Stanford cohort 2 (n = 49, 2019, Figure 1A) as test set. (D) Lasso-penalized multinomial logistic regression model under geographical validation using two independently published datasets for training (cohort 3, n = 31 healthy controls in addition to Stanford cohorts 1 and 2) and validation (cohort 4, n = 25 MF and n = 15 healthy controls). (E) Receiver operating curves (ROC) toward MF prediction under conditions of temporal (C) and geographical (D) validation. Temporal validation uses three models: (i) baseline, with no gene expression data but patient age, gender, and mutation status alone; (ii) entire MPN platelet transcriptome; and (iii) MPN progressive genes alone. Outperformance of the progressive transcriptome model (red curve, AUROC = 0.96) vis-a-vis the entire transcriptome dataset (blue curve, AUROC = 0.95) and lastly, the baseline model without gene expression (black curve, AUROC = 0.68). Geographical validation using the progressive transcriptome model also demonstrates independent, high MF predictive accuracy (green curve, AUROC = 0.97). (F) Heatmap of top recurring Lasso-selected progressive genes for each of controls (left column, CTRL, yellow bar), ET (light green), PV (dark green), and MF (dark blue). Rows indicate gradation in gene expression on a blue (low) to red (high) scale. Columns indicate sample type (CTRL, ET, PV, and MF). Recurrent top 4 genes from our progressive transcriptome Lasso are visualized as a heatmap in Figure 5F, clearly capturing the incremental gradient in gene expression between controls, ET, PV, and MF. These include ADAMTS3 (ADAM metallopeptidase protease with likely roles in VEGF signaling, tissue remodeling, and expression of related collagens through profibrotic PAR1 and TGF-β signaling), PSMB5 (implicated in proteasomal degradation/UPR activation and identified previously in MPNs), NT5C related to PI3K signaling, and SUPT6H/SPT6,, a tumor-initiating histone chaperone associated with chromatin remodeling. Lasso-selected candidate markers reflect the MPN clinical spectrum and capture the underlying pathology. A key aspect of the layered Lasso modeling demonstrated here is our use of an approach that can be developed in future work to incorporate additional predictors to MF (e.g., JAK2 V617F allele burden or other genetic variants beyond the driver mutations). Figure S5 demonstrates this through a second base model for prediction to MF from ET or PV alone (no transcriptome, but platelet count and hemoglobin levels as two additional clinical parameters that were available for all patients in addition to age, gender, and driver mutation status). Essentially, our analyses with both the Lasso regression as well as the progressive transcriptome (Figure 5A) enable a small gene signature that could be implemented in a PCR-based prognostic assay and performed in any standard clinical laboratory. Figure S6 demonstrates a robust model (AUROC of 0.86) with just the top progressive genes (Figure 5A) or the Lasso-selected features (Figure 5G) incorporated with patient age, gender, driver mutation status, and the two clinical laboratory variables, platelet count and hemoglobin levels. Identification of genes with significant prognostic impact in this rich dataset in a therapy-agnostic manner may prove useful for stratifying patients at high risk for progression and may be translated toward a widely applicable clinical laboratory assay.

Discussion

Here, we present a comprehensive catalog of the platelet transcriptome in chronic progressive MPN with immediate relevance to defining subtype-specific molecular differences and predicting the advanced phenotype, MF. Recent data identify the timing of MPN driver mutation acquisition to be very early in life, even before birth, with life-long clonal expansion and evolution. These new findings highlight the importance of early progression biomarkers and the substantial opportunity for early detection and intervention strategies in these disorders. Our analyses confirm and extend many important observations made previously either in vitro,69, 70, 71 in other transcriptome or microarray analyses,,,,, including our own early work, or using animal models.,74, 75, 76 By highlighting intersecting mechanisms in transcription across MPN, and by annotating MPN subtype-specific gene signatures, this dataset facilitates predictive machine-learning algorithms, that aid in MPN classification and potential prognostication. The platelet transcriptome is significantly reprogrammed in the MPN setting, with a wealth of transcript associations that may be missed in using conventional tissue sources such as serum, plasma, whole blood, or bulk bone marrow. While previous bulk RNA-seq studies on MPNs by us and others analyzed far fewer samples,36, 37, 38, select MPN subtypes,, or non-specific source tissue37, 38, 39 that may be underpowered for candidate genes, we here analyzed, by next-generation RNA sequencing, 120 purified platelet samples from healthy controls and all three subtypes; and identified clinically interpretable transcriptomic signatures for each of the three subtypes. Each subtype showed both overlapping and progressively divergent transcriptional pathways, suggesting both a shared signature across all MPN, and unique biological trajectories. Pathway-enrichment analyses confirmed the existence of a shared inflammatory milieu,77, 78, 79, 80, 81, 82 among MPN. We also confirmed that the JAK1/JAK2 inhibitor ruxolitinib was associated with inhibition of inflammatory as well as interferon-mediated signaling pathways. Additional previously undescribed insights into the mechanisms of action of RUX in MF included genes implicated in protein maturation, chaperone-mediated protein complex assembly, and circadian rhythm. These and other gene signatures and pathways identified may help guide candidate drugs to be used alone or in combination with RUX for the treatment of MPNs. Whether MPN oncogenic driver mutations increase inflammation or mutations are acquired in response to inflammatory stimuli is unclear from this work and remains an active area of investigation.,,, The 10 genes most significant (FDR < 0.001) of the commonly expressed genes across MPN indicated a gradation in platelet gene expression, with overlapping signatures in ET and PV (e.g. IFITM2, MYBL2) and a substantial difference with MF (e.g. CREB3L1, CALR) that was independent of driver mutation status or treatment. Hence, while over 1500 genes were commonly differentially expressed across MPN, their abundance and function could differ between subtypes. The nature of the separation of transcriptomic clusters between ET, PV, and MF suggest also that they represent diverse cell states along a continuous spectrum of MPN, in line with the clinical overlap of these neoplasms. Another observation relates to the association of the differential genes with signaling pathways: As indicated above, all three MPN subtypes showed a positive enrichment in immune modulation pathways, independent of mutational status. Whether this response reflects a causal effect of inflammation on bone marrow biology remains to be elucidated. Indeed, the platelet transcriptomic signatures could also reflect intercell interactions of platelets with other immune cells, including as transient aggregates with neutrophils, granulocytes, and dendritic cells. Nevertheless, observations that MPN transcriptomic biomarkers correlated robustly with immune factors such as type I/II interferons and dysregulation of interleukin-dependent inflammatory responses across ET, PV, and MF suggest opportunities for use of these and other subtype-specific genes as biomarkers for prognosis as well as design of therapies and prediction of response. Our data closely overlaps with recent MPN platelet studies: thrombo-inflammatory signatures in PV from Gangaraju and Prchal et al. (BCL2, CXCL1, MMP7, PGLYRP1, CKB, BSG, CFL1, and more) or fibrosis-associated signatures in MF from Guo et al. (CCND1, H2AFX, and CEP55). Most notably in our data on MF, high expression of ER stress and unfolded protein response (UPR) biomarkers (e.g., CREB3L1, CALR) associated with impaired proteostasis signaling and emerged as a key feature of MPN pathobiology. Indeed, recently published works from distinct research groups,, highlight protein quality control in ER-associated degradation and proteostasis, deregulation as a primary effector of myeloid transformation, highlighting the importance of protein homeostasis for normal hematopoiesis. These findings too are in line with reports implicating chronic ER stress, malfunctioning protein quality control, and loss of proteostasis as aggravating factors in age-related disorders. Most importantly, platelet gene expression profiling in MPN offers directions for prediction of MF. Applying machine-learning algorithms of LASSO penalized regression under two conditions of external validation—temporal (using our two-cohort design) and geographical (independently published datasets on healthy donors, and MF)—we uniquely discriminate MPN subtypes from each other and healthy controls, using three model types, and predict MF at high accuracy. The highest performing model used a set of progressively differentiated MPN genes at an area under the (ROC) curve of 0.96 (temporal) and 0.97 (geographical), and rendered a core signature of < 5 candidate markers as top predictors of disease progression. It will be of interest to determine what machine-learning algorithms based on a defined platelet gene expression classifier on potential new MPN datasets (ideally longitudinal) can be used to more precisely predict the probability and/or timing of an individual’s risk of progression from ET/PV to secondary MF. In conclusion, using platelet transcriptome profiling, we observed dynamic shifts in MPN immune inflammatory profile and preferential expression of interferon-, proliferation-, and proteostasis-associated genes as a progressive gradient across the three MPN subtypes. Our findings highlight that MPN progression may be influenced by defects in protein homeostasis (impaired protein folding and an accumulation of misfolded proteins within the endoplasmic reticulum) and an abnormal integrated stress response—consistent with recent studies,,, indicating dysregulated proteostasis as a primary effector of myeloid transformation. While this work has been focused on the overarching progressive platelet transcriptome across MPNs, these data open an important avenue for utilizing platelet RNA signatures to better understand specific MPN complications such as the risk of thrombosis and bleeding, or fibrosis, and transformation to AML. Altogether, this study in chronic MPNs provides a comprehensive framework for exploiting the platelet transcriptome and may inform future studies toward mechanistic understanding and therapeutic development in MPNs and potentially, other age-related disorders.

Limitations of the study

There are several limitations to our study. First, our data are not longitudinal by design but rather the closest practical alternative of cross-sectional snapshots of all three MPN subtypes with the goal of achieving a well-powered dataset in these chronic disorders. In this regard, the progressive or progression terminology used here refer strictly to trends in gene expression and do not imply study of longitudinal clinical progression. Therefore, subsidiary longitudinal evaluation of the disease as well as treatment markers identified here is warranted. Second, our focus for this study has been on the platelet transcriptome alone. Future investigations focused on ascertaining overlap between our platelet-derived molecular alterations with those of other cell types, specifically, parent megakaryocytes, CD34+ cells, granulocytes/immune cells, and even whole blood will be required to identify additional functional aspects of bone marrow pathology. Follow-on studies comparing MPN platelet transcriptomic data with other published data on non-malignant controls would also add significant value to the current study. Such integrative analyses may necessitate advanced systems genomics methods that compare or combine data without biases and batch effects inherent in each cell data type. Third, we recognize that our choice of ribosomal RNA depletion to home in on platelet mRNA signatures leaves out additional diversity in the platelet RNA repertoire (and will be important future work). Lastly, in our Lasso predictive modeling, we demonstrate two rigorous approaches of external validation (temporal and geographical) and identify a core signature toward MPN risk stratification or early detection of progression. Yet, substantive future biological and computational validations are needed to advance our findings toward clinical decision making or personalized medicine.

STAR★Methods

Key resources table

Resource availability

Lead contact

Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Anandi Krishnan (anandi.krishnan@stanford.edu).

Materials availability

This study did not generate new unique reagents.

Experimental model and subject details

Human subjects

All MPN peripheral blood samples were obtained under written informed patient consent and were fully anonymized. Study approval was provided by the Stanford University Institutional Review Board. All relevant ethical regulations were followed. Subject demographic data and other relevant clinical variables are reported in Table S1A and S1B and Figure 1. We collected blood from ninety-five MPN patients enrolled in the Stanford University and Stanford Cancer Institute Hematology Tissue Bank from December 2016- December 2019 after written informed consent from patients or their legally authorized representatives (Stanford IRB approval #18329). Eligibility criteria included age ≥ 18 years and Stanford MPN clinic diagnosis of essential thrombocythemia, polycythemia vera or myelofibrosis (defined using the consensus criteria at the time of this study). We use the term ‘myelofibrosis’ to encompass both primary myelofibrosis and myelofibrosis that evolved from essential thrombocythemia or polycythemia vera. Electronic medical records review of all subjects was performed by the clinical consultants (J.G. and L.F.), study data manager (C.P.), and the study principal investigator (A.K.). For controls, blood was collected from twenty-one asymptomatic adult donors selected at random from the Stanford Blood Center. All donors were asked for consent for genetic research. For both MPN patients and healthy controls, blood was collected into acid citrate-dextrose (ACD, 3.2%) sterile yellow-top tubes (Becton, Dickinson and Co.) and platelets were isolated by established,,, purification protocols. Blood was processed within 4 h of collection for all samples. The time from whole blood collection to platelet isolation was similar between healthy donors and MPN patients. Power calculations based on preliminary data indicated 20 PMF, 25 PV and 25 ET patients to achieve 80% power and identify statistically significant differential expression. This was based on the total number of genes as 12000 and the top 3326 genes as prognostic in PMF, 1190 in PV and 385 in ET with a desired minimum fold change of 2 and a False Discovery Rate FDR < 0.05. Our final sample set exceeded our preliminary calculations.

Method details

Platelet isolation

Human platelets were isolated and leuko-depleted using established methods (,, with excellent reproducibility,,,, resulting in a highly purified population of fewer than 3 leukocytes/107 platelets (> 99.9% purity) as counted by hemocytometer. Briefly, the ACD-tube whole blood was first centrifuged at 200xg for 20min at room temperature (RT). The platelet-rich plasma (PRP) was removed and Prostaglandin E1 was added to the PRP to prevent exogenous platelet activation. The PRP was then centrifuged at 1000xg for 20min at RT. The platelet pellet was re-suspended in warmed (37°C) PIPES saline glucose (PSG). Leukocytes were depleted using CD45+ magnetic beads (Miltenyi Biotec). Isolated platelets were lysed in Trizol for RNA extraction. Post RNA-seq analysis of an index leukocyte transcript (PTPRC; CD45) confirmed that the samples were highly purified platelet preparations (subsequent bioinformatic analyses also adjusted for PTPRC expression for absolute removal of any CD45 expression in our analyses). Two reference markers of platelet activation, P-selectin (SELP) and Glycoprotein IIbIIIA (CD41/ITGA2B) were expectedly higher in all MPN than healthy controls but were not statistically significantly different between MPN subtypes; indicating that any expression difference was not due to experimental artifacts. In addition, we know from prior work (J.R.) that regardless of activation status, RNA-seq reliably estimates mRNA expression patterns in platelets. We also know from rigorous prior work88, 89, 90 that several abundant platelet mRNAs are well-known leukocyte or red cell transcripts; and do not immediately imply contamination by these classes but rather that platelets express gene products that are also present in other cell lineages. Sixty-five of our top 100 abundant platelet transcripts matched exactly with those of the top 100 abundant genes from the three previous studies cited,; and a composite pathway analysis with the top 100 abundant genes from this as well as the previous studies matched identically.

Next-generation RNA sequencing

For next-generation RNA-sequencing (RNA-seq), 1x109 isolated platelets were lysed in Trizol and then DNase treated. Total RNA was isolated, and an Agilent bio-analyzer was used to quantify the amount and quality. The RNA yield was estimated by measuring absorbance at 260 nm on the Nanodrop 2000 (Thermo Fisher), and RNA purity was determined by calculating 260/280 nm and 260/230 nm absorbance ratios. RNA integrity was assessed on the Agilent Bioanalyzer using the RNA 6000 Nano Chip kit (Agilent Technologies). An RNA integrity number (RIN) was assigned to each sample by the accompanying Bioanalyzer Expert 2100 software. To control for variable RNA quality, RNA sequencing was only performed for samples with a RIN score of 7 or higher. RNA-seq libraries were constructed with removal of ribosomal RNA using the KAPA Stranded RNA-Seq kit with RiboErase (Roche). The RNA extraction and library preparation were performed by the same technician to minimize confounding effects. cDNA libraries were constructed following the Illumina TrueSeq Stranded mRNA Sample Prep Kit protocol and dual indexed. The average size and quality of each cDNA library were determined by the Agilent Bioanalyzer 2100, and concentrations were determined by Qubit for proper dilutions and balancing across samples. Twelve pooled samples with individual indices were run on an Illumina HiSeq 4000 (Patterned flow cell with Hiseq4000 SBS v3 chemistry) as 2 X 75bp paired end sequencing with a coverage goal of 40M reads/sample. Output BCL files were FASTQ-converted and demultiplexed.

Platelet transcriptome analysis

Picard, Samtools, and other metrics were used to evaluate data quality. Processed reads were aligned against the reference human transcriptome GRCh37/hg19 using RSEM and bowtie2, and expression at gene level determined by calculating raw gene count. Only genes that passed expression threshold were used; genes were considered expressed if, in all samples, they had at least 10 counts (genes with low counts are automatically filtered by built-in functions in DeSeq2, see below). A total of 12,924 genes were considered expressed. Gene expression data was library-size-corrected, variance-stabilized, and log2-transformed using the R package DESeq2. We refer to this version of the data as “raw data” as it is not corrected for any confounders of gene expression variability. DESeq2 was used to call differential expression while adjusting for patient age, gender and treatment as confounding variables and controlling for multiple comparisons using the Benjamini-Hochberg defined false discovery rate (FDR). Significant variance in expressed transcripts were pre-specified as those transcripts with an FDR < 0.05 and a log2 fold change ≥ 0.5 in MPN, as compared to healthy controls (the entire differential transcriptome was applied in the instances of downstream Gene Set Enrichment Analysis and the Lasso prediction modeling).

Quantification and statistical analysis

Continuous data were summarized as medians and IQRs and categorical data are presented as frequencies and percentages. To compare differences in clinical variables between healthy controls and MPN subtypes (ET, PV, and MF), we use box and whisker plots and conduct pairwise Wilcoxon signed ranked tests. For unsupervised clustering and visualization, we performed principal component analyses (identifying MPN subtypes by color, treatment by filled or open circles, and JAK2 mutation status by shape) using built-in functions of the DeSeq2 R package. We generated a heatmap of all of the top highly significant genes (FDR < 0.01) using the pheatmap R package and its built-in functions for hierarchical cluster analysis on the sample-to-sample Euclidean distance matrix of the expression data. All analyses were performed using the R studio interface.

Pathway/gene set enrichment for differentially expressed (DE) genes

Gene set enrichment analysis (GSEA) was performed on the entire DE gene set for each MPN subtype, using the Cancer Hallmarks gene sets from MSigDB. The ‘GSEA Pre-ranked’ function was used with a metric score that combines fold change and adjusted p value together for improved gene ranking. We used default settings with 10,000 gene set permutations to generate p and q values, and compared MPN subtypes in the overall cohort, and the ruxolitinib-treated subgroup and the ruxolitinib-naive subgroup separately. In these analyses, to allow for a broad comparison, we assessed all transcripts that were differentially expressed according to FDR/adjusted p < 0.25 as recommended by the authors of GSEA.

Predictive model generation and external validation

At the conception of this study late 2016, and our early work, we did not identify any publicly available RNA sequencing data on MPN platelets. This prompted our specific two-cohort design for the express purpose of temporal external validation as an essential step in rigorous prediction modeling. A subsequent independent publication facilitated an additional geographically independent external validation of our model. We used Lasso penalized regression for our model to predict MF from the either healthy controls, ET or PV. Among a variety of statistical machine-learning algorithms that have been used in prediction modeling, Lasso is favored for its flexibility and simplicity; and its ability to identify the least set of significant factors from high dimensional data. We evaluated platelet transcriptomic features with clinical features (age, gender and mutation status for the entire dataset including healthy donors, and in MPN patients alone, platelet and hemoglobin values). Normalized gene counts data were split into training (used for constructing multinomial logistic models) and validation (used for model evaluation and generalization) cohorts. Separately, we assessed the progressive and monotonic upward or downward trend in gene expression using Mann-Kendall trend test (multiple comparison adjusted with the Benjamini-Hochberg method) to normalized gene counts and identified statistically significant progressive genes across all three MPN subtypes. Three multinomial logistic models were constructed: first, with Lasso selected predictors from all genes, second, with Lasso selected predictors from progressive genes and third, a baseline model using age, gender, and mutation status (JAK2 and CALR) as predictors. Model outputs correspond to probabilities of having a CTRL, ET, PV, or MF phenotype (sum of these four probability values totaling 1). Potential interpretation of these probabilities includes MPN risk assessment, e.g., a patient with higher probabilities of PV and MF would indicate higher risk than one with higher probabilities of CTRL or ET. The dataset on MF platelet RNA seq served as an independent test set (Figure 5D schematic) while data from our cohorts at Stanford and additional external data on healthy donors from constituted an integrated training cohort (R package Limma was applied for bioinformatic correction of any batch effects). ROC curves were used to evaluate the different prediction models and discriminate outcomes. ROC curves demonstrate the trade-off between true positive and false positive rates, ideal being high true positive rate (sensitivity) and low false positive rate (specificity) the area under the curve (AUROC) as close to 1 as possible. True positive rate (TPR) is defined as correctly predicting an MF patient as MF; and false positive rate (FPR) as falsely predicting a non-MF patient as MF.
REAGENT or RESOURCESOURCEIDENTIFIER
Biological samples

Peripheral blood samples from healthy adults and patients with one of three subtypes of chronic myeloproliferative neoplasmsThis studyDeidentified
Platelets from healthy adults and patients with one of three subtypes of chronic myeloproliferative neoplasmsThis studyN/A

Deposited data

Deidentified human/patient platelet RNA-sequencing dataThis studydbGaP accession # Database: PHS-0021-21. v1.P1
Deidentified human/patient platelet RNA-sequencing dataGuo et al4Data secured via corr author, W. Erber
Deidentified human platelet RNA-sequencing dataRondina et al3NCBI Bioproject ID Database: 531691

Software and algorithms

DeSeq2Bioconductor Open SourceN/A
LASSO/glmnetCRAN R Package Open SourceN/A

  86 in total

1.  Targeting a regulator of protein homeostasis in myeloproliferative neoplasms.

Authors:  Lindsay M LaFave; Ross L Levine
Journal:  Nat Med       Date:  2016-01       Impact factor: 53.440

2.  RNA Sequencing and Analysis.

Authors:  Kimberly R Kukurba; Stephen B Montgomery
Journal:  Cold Spring Harb Protoc       Date:  2015-04-13

Review 3.  ADAMTS proteins in human disorders.

Authors:  Timothy J Mead; Suneel S Apte
Journal:  Matrix Biol       Date:  2018-06-06       Impact factor: 11.583

4.  Platelets in myeloproliferative neoplasms have a distinct transcript signature in the presence of marrow fibrosis.

Authors:  Belinda B Guo; Matthew D Linden; Kathryn A Fuller; Michael Phillips; Bob Mirzai; Lynne Wilson; Hun Chuah; James Liang; Rebecca Howman; Carolyn S Grove; Jacques A Malherbe; Michael F Leahy; Richard J Allcock; Wendy N Erber
Journal:  Br J Haematol       Date:  2019-08-19       Impact factor: 6.998

Review 5.  Myeloproliferative Neoplasms.

Authors:  Jerry L Spivak
Journal:  N Engl J Med       Date:  2017-06-01       Impact factor: 91.245

6.  Transcript profiling of human platelets using microarray and serial analysis of gene expression.

Authors:  Dmitri V Gnatenko; John J Dunn; Sean R McCorkle; David Weissmann; Peter L Perrotta; Wadie F Bahou
Journal:  Blood       Date:  2002-11-14       Impact factor: 22.113

7.  Gene expression profile correlates with molecular and clinical features in patients with myelofibrosis.

Authors:  Sebastiano Rontauroli; Sara Castellano; Paola Guglielmelli; Roberta Zini; Elisa Bianchi; Elena Genovese; Chiara Carretta; Sandra Parenti; Sebastian Fantini; Selene Mallia; Lara Tavernari; Stefano Sartini; Margherita Mirabile; Carmela Mannarelli; Francesca Gesullo; Annalisa Pacilli; Daniela Pietra; Elisa Rumi; Silvia Salmoiraghi; Barbara Mora; Laura Villani; Andrea Grilli; Vittorio Rosti; Giovanni Barosi; Francesco Passamonti; Alessandro Rambaldi; Luca Malcovati; Mario Cazzola; Silvio Bicciato; Enrico Tagliafico; Alessandro M Vannucchi; Rossella Manfredini
Journal:  Blood Adv       Date:  2021-03-09

8.  Depletion of Jak2V617F myeloproliferative neoplasm-propagating stem cells by interferon-α in a murine model of polycythemia vera.

Authors:  Ann Mullally; Claudia Bruedigam; Luke Poveromo; Florian H Heidel; Amy Purdon; Therese Vu; Rebecca Austin; Dirk Heckl; Lawrence J Breyfogle; Catherine Paine Kuhn; Demetrios Kalaitzidis; Scott A Armstrong; David A Williams; Geoff R Hill; Benjamin L Ebert; Steven W Lane
Journal:  Blood       Date:  2013-03-13       Impact factor: 22.113

9.  Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms.

Authors:  David A Hinds; Kimberly E Barnholt; Ruben A Mesa; Amy K Kiefer; Chuong B Do; Nicholas Eriksson; Joanna L Mountain; Uta Francke; Joyce Y Tung; Huong Marie Nguyen; Haiyu Zhang; Linda Gojenola; James L Zehnder; Jason Gotlib
Journal:  Blood       Date:  2016-06-30       Impact factor: 22.113

10.  Gene expression profiling distinguishes prefibrotic from overtly fibrotic myeloproliferative neoplasms and identifies disease subsets with distinct inflammatory signatures.

Authors:  Waihay J Wong; Michele Baltay; Annaliese Getz; Kit Fuhrman; Jon C Aster; Robert P Hasserjian; Olga Pozdnyakova
Journal:  PLoS One       Date:  2019-05-09       Impact factor: 3.240

View more
  1 in total

Review 1.  Toward platelet transcriptomics in cancer diagnosis, prognosis and therapy.

Authors:  Anandi Krishnan; Sally Thomas
Journal:  Br J Cancer       Date:  2021-11-22       Impact factor: 9.075

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.