Literature DB >> 30840886

A Systematic Investigation of the Malignant Functions and Diagnostic Potential of the Cancer Secretome.

Jonathan L Robinson¹, Amir Feizi², Mathias Uhlén³, Jens Nielsen⁴.

Abstract

The collection of proteins secreted from a cell-the secretome-is of particular interest in cancer pathophysiology due to its diagnostic potential and role in tumorigenesis. However, cancer secretome studies are often limited to one tissue or cancer type or focus on biomarker prediction without exploring the associated functions. We therefore conducted a pan-cancer analysis of secretome gene expression changes to identify candidate diagnostic biomarkers and to investigate the underlying biological function of these changes. Using transcriptomic data spanning 32 cancer types and 30 healthy tissues, we quantified the relative diagnostic potential of secretome proteins for each cancer. Furthermore, we offer a potential mechanism by which cancer cells relieve secretory pathway stress by decreasing the expression of tissue-specific genes, thereby facilitating the secretion of proteins promoting invasion and proliferation. These results provide a more systematic understanding of the cancer secretome, facilitating its use in diagnostics and its targeting for therapeutic development.

Entities: CellLine Chemical Disease Gene Species

Keywords: cancer biomarkers; pan-cancer; protein secretion; secretome; systems biology; unfolded protein response

Mesh：

Substances：
Biomarkers

Year: 2019 PMID： 30840886 PMCID： PMC6441842 DOI： 10.1016/j.celrep.2019.02.025

Source DB: PubMed Journal: Cell Rep Impact factor: 9.423

INTRODUCTION

Early diagnosis is a major factor contributing to cancer treatment success (Etzioni et al., 2003; World Health Organization, 2017). As such, there have been extensive efforts to identify with improved accuracy and sensitivity biomarkers that indicate the presence of cancerous cells in a subject (Belczacka et al., 2019; Sawyers, 2008). Recent work has focused on the analysis of markers in biofluids, such as urine, plasma, or cerebrospinal fluid, as they are non-invasive and can be tested with greater frequency than tissue biopsies (Crowley et al., 2013; Diaz and Bardelli, 2014; Webb, 2016). A class of proteins that are of particular interest in this context is the secretome, which is the set of proteins secreted to the extracellular space, as they are generally more abundant in biological fluids than intracellular proteins (Kulasingam and Diamandis, 2008; Stastna and Van Eyk, 2012). The secretome is considered a valuable reservoir of potential biomarkers for cancern> and other diseases (Makridakis and Vlahou, 2010; Xue et al., 2008), and a number of studies have aimed to explore this class of proteins in search of tumor biomarker candidates. For example, Welsh et al., 2003 used Gene Ontology (GO) terms associated with an extracellular location and protein sequence patterns to define the secretome to compare the microarray gene expression profiles of 150 carcinomas spanning 10 tissues of origin to those of 46 healthy tissue samples. Biomarker candidates were validated via comparison with previous studies that had measured increased expression of the gene or protein in cancer tissue or in the serum of cancer patients. Other bioinformatics-based approaches to predict secreted cancer biomarkers include those of Prassas et al. (2012) for colon, lung, pancreatic, and prostate cancers, and Vathipadiekal et al. (2015) for ovarian cancer. These and other, similar investigations demonstrate the validity of using a bioinformatics-based approach to predict proteomic biofluid markers and to identify many new, promising biomarker candidates. However, these studies were generally restricted to a limited number of samples, tissue types, and/or cancer types; were often based on microarray data rather than RNA sequencing (RNA-seq) data; provided only a single set of candidates rather than a complete ranked list; and conducted little or no exploration of the biological functions associated with the proposed biomarkers. Proteomic approaches have often been used to profile the cancern> secretome (Brandi et al., 2018; Geyer et al., 2017; Hanash et al., 2008; Makridakis and Vlahou, 2010; Papaleo et al., 2017; Schaaij-Visser et al., 2013; Xue et al., 2008). These studies generally involve in vitro analyses of cell-line conditioned media or analysis of tumor interstitial fluid (or a more distant fluid such as blood, plasma, urine, or saliva) (Papaleo et al., 2017). For example, Wu et al. (2010) used SDS-PAGE followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) to analyze the secretome of conditioned media for 23 human cancer cell lines spanning 11 cancer types, which enabled the identification of both cancer-specific and pan-cancer serological biomarker candidates. Four of the candidates were validated experimentally, showing significantly elevated levels in the serum or plasma of liver, lung, or nasopharyngeal carcinoma patients relative to healthy controls. Despite the extensive information gained from these experimental investigations, there still exist a number of challenges that result in high variability and conflicting results among studies. For example, the use of cell lines is not an ideal representation of the in vivo system, culturing conditions can affect cell physiology and protein detection, there is a bias toward high-abundance proteins, protein concentrations span a large dynamic range in plasma, studies differ in sample collection and storage methods, and artifactual proteins are often identified, despite little or no relation to the disease in question (Geyer et al., 2017; Hanash et al., 2008; Kulasingam and Diamandis, 2008; Papaleo et al., 2017). In the present study, we conducted a systematic analysis of cancer-associated changes in secretome expression to predict candidate biomarkers that could be significantly elevated in the biofluids of individuals with cancer and are therefore more likely to be detectable. We then investigated the patterns and biological functions associated with shifts in secretome expression among different cancer types, focusing on shared “core” secretome behaviors, as well as cancer-specific features. The cancer secretome was explored in the context of tissue-specific genes, revealing a general pattern whereby tumor cells reduce their secretory pathway burden in an effort to relieve endoplasmic reticulum (ER) stress and the associated unfolded protein response (UPR). We expect the resulting ranked lists of biomarkers for each of the 32 different cancer types, in addition to the insight gained from the functional analysis of the cancer secretome and associated modulation of the secretory pathway in cancer cells, to expedite the development of effective diagnostic biomarkers and illuminate potential strategies for improved anti-cancer therapies.

RESULTS

Evaluation of Secretome Biomarker Candidates

To focus on proteins that are intentionally and actively secreted from the cell, we defined the secretome as all proteins possessing an N-terminal signal peptide and annotated as having a subcellular location of “secreted” (UniProt; Bateman et al., 2017). This yielded a set of 1,816 secretome genes for evaluation. In our investigation of cancer-specific secretome changes, we first sought to identify secretome genes whose encoded proteins were most likely to exhibit detectable changes in a biofluid as a result of their altered expression in a tumor. Our analysis pipeline therefore involved the comparison of primary tumor transcriptomes with those of (1) paired-normal tissue, (2) healthy tissue corresponding to the cancer tissue of origin, and (3) all healthy tissues in the human body (Figure 1A). Primary tumor and paired-normal RNA-seq profiles were retrieved for 32 cancer types from The Cancer Genome Atlas (TCGA), whereas healthy tissue profiles were obtained from the Genotype-Tissue Expression (GTEx) database (STAR Methods; Table S1).

Figure 1.

Overview of Cancer Secretome Biomarker Consensus Scoring Approach and Results

(A) Schematic overview of the scoring method. Primary tumor transcriptomic (RNA-seq) profiles were compared to those of (1) paired-normal tissue of the same patients, (2) healthy tissue matching the tumor tissue of origin, and (3) all healthy tissues in the body for which data were available. Genes were ranked according to their relative expression in tumor versus other samples; those with significantly elevated expression in the tumor were ranked highly, and these were combined into a single consensus rank score.

(B) For three representative cancer types, a t-SNE projection illustrates the separation of primary tumor (red), paired-normal tissue (yellow), and healthy tissue of origin (green) samples based on the abundance (log10TPM) of the top 10 consensus-ranked genes for that cancer type. t-SNE plots for the remaining 19 cancer types with at least one of each sample type are presented in Figure S1.

(C) Box and whisker plots showing the expression of CST1 and ANGPTL4 in all healthy tissue samples, as well as in tumor samples of the five cancer types with the highest median expression of the gene. The CST1 and ANGPTL4 genes are representative of multi-cancer and cancer-specific candidate markers, which ranked highly in the third comparison (tumor versus all healthy tissues) in multiple or only one cancer type, respectively.

(D) A heat-scatterplot presenting the results of the three comparisons for the top 10 consensus-ranked secretome biomarker candidates from each cancer type. Only cancer types with sufficient tumor, paired-normal tissue, and healthy tissue of origin samples to conduct all three comparison types are shown. The color of the circles corresponds to the mean log2FC from the first two comparison methods (FC1 and FC2), while circle size is based on the extent to which a gene is expressed at higher levels in the tumor compared to all healthy tissues (quantified as p3). Stars next to gene names indicate those whose encoded proteins are present in the human plasma proteome (Schwenk et al., 2017), in which a filled star represents a canonical (confirmed) protein, an empty star indicates some evidence but the status is uncertain, and proteins with no star are either undetected or not included in the database. Rows and columns were clustered based on Euclidean distance between mean log2FC values.

Generation of a Consensus Score

To integrate information from the three comparisons performed, the results were combined to generate a consensus score for each gene in each cancer type. Top-ranked (high-scoring) genes for each cancer type were those with elevated expression in tumor samples compared to paired-normal tissue, healthy tissue of origin, and all healthy tissues. The complete set of consensus scores for all cancer types, as well as the fold changes (log2FC) and significance values (p values) used to determine the scores, are presented in Table S2. Transcriptomic data of top-ranked genes were examined to confirm their distinct and elevated expression in tumor versus non-tumor samples. T-distributed stochastic neighbor embedding (t-SNE) was performed on tumor, paired-normal tissue, and healthy tissue transcript per million (TPM) values of the top 10 consensus-ranked genes for each cancer type (Figures 1B and S1). The majority of tumor samples exhibited clear clustering and separation from non-tumor samples, confirming distinct expression profiles between these groups among the highly ranked genes. The t-SNE plots also demonstrate differences between paired-normal tissue and healthy tissue samples, highlighting the importance of including both tumor versus paired-normal tissue and tumor versus healthy tissue comparisons in the consensus rank. Although a difference in data sources (TCGA versus GTEx) could contribute to the observed paired-normal tissue versus healthy tissue separation, a previous analysis of the same two datasets found robust differences even after normalizing for potential batch effects (Aran et al., 2017), thus supporting a biological component. The elevated expression of top-scoring candidates in tumorsn> compared to all normal tissues is illustrated in Figure 1C for two example genes, cystatin SN (CST1) and angiopoietin-like 4 (ANGPTL4). These genes are representative of two types of biomarker candidates: those with elevated expression in one cancer type (ANGPTL4) and markers with elevated expression in multiple cancer types (CST1). Previous studies have experimentally confirmed significantly elevated protein levels of ANGPTL4 in the serum of patients with renal cell carcinoma (Dong et al., 2017) and of CST1 in the serum and urine of colorectal cancer subjects (Yoneda et al., 2009) relative to non-cancer controls.

Top-Scoring Biomarker Candidates

The results for the top-ranked genes across the 20 cancern> types included in all three comparisons are illustrated in Figure 1D. There was a marked clustering of high ranks among many cancers for the collagen (COL) and matrix metalloproteinase (MMP) genes. Many members of the MMP family have been detected at significantly elevated levels in the plasma, serum, and/or urine of patients with cancers such as bladder (Eissa et al., 2007), esophageal (Mroczko et al., 2008), colorectal (Dragutinović et al., 2011), prostate (Roy et al., 2008), lung (Izbicka et al., 2012), breast (Patel et al., 2011), and renal (Sarkissian et al., 2008), compared to non-cancer controls. Collagens have similarly been validated as tumor biomarkers. Previous studies have, for example, measured a significant increased abundance of type IV collagens in the plasma of pancreatic cancer subjects (Ohlund et al., 2009), COL10A1 in the serum of colorectal cancer subjects (Solé et al., 2014), COL6A3 in the urine of bladder cancer subjects (Lindén et al., 2012), or degradation products of types I, III, and IV collagens in the serum of ovarian and breast cancer patients (Bager et al., 2015) relative to controls. Many of the top-scoring, cancer-specific markers have also been experimentally validated as significantly elevated in a biofluid of subjects harboring that particular type of cancer, some of which are currently used in the clinic for diagnosis. For example, four of the top five scoring candidates for liver hepatocellular carcinoma (LIHC) have been experimentally validated as biofluid LIHC biomarkers (ESM1, AFP, GPC3, and MDK); AFP is the most commonly used serological LIHC marker in the clinic (Capurro et al., 2003; Lou et al., 2017; Spangenberg et al., 2006; Yang et al., 2017; Zhu et al., 2013). Likewise, two (ANGPT2 [Gayed et al., 2015] and ANGPTL4 [Dong et al., 2017]) of the top five candidates for kidney renal clear cell carcinoma (KIRC) and the top candidate (LAMC2 [Kosanam et al., 2013]) for pancreatic ductal adenocarcinoma (PAAD) have been measured at significantly higher concentrations in the plasma, serum, or urine of subjects harboring the respective cancer types compared to non-cancer controls.

Extension to an Experimentally Defined Secretome

The biomarker analysis described here included only classically secreted proteins that contain a signal peptide, but many proteins are secreted through unconventional routes and possess similar diagnostic potential (Rabouille, 2017). However, defining a list of unconventionally secreted proteins is non-trivial due to the many secretion routes available (e.g., exosomes, pore-mediated translocation, ATP-driven transport [Rabouille, 2017]), as well as the variation in their protein cargo across different cell types or conditions (Vlassov et al., 2012). We therefore used the Human Cancer Secretome Database (HCSD) (Feizi et al., 2015) to generate a list of all of the proteins (regardless of signal peptide) that had been experimentally detected in the secretome among any of the 35 studies encompassed by the database. This yielded an “experimental secretome” consisting of ~6,500 proteins, ~800 of which were present in our signal peptide-derived secretome. The results and associated consensus ranks for the experimental secretome are presented in Table S3.

Exploration of the “Core” Cancer Secretome Definition of the Core Secretome

Shifts in secretome expression associated with malignant transformation can be used to identify candidate cancer biomarkers; however, based on our global analysis across different cancer types, it is also possible to address the more fundamental question of why cancer cells restructure their secretome profile throughout tumorigenesis. We therefore sought to investigate the biological features underlying the altered secretome expression. Motivated by the large number of multi-cancer candidates in our biomarker analysis, we first explored the core cancer secretome—the subset of the secretome exhibiting strong differential expression across most or all of the cancer types studied. Secretome genes were ranked based on the magnitude and significance of their expression fold changes (tumor versus paired normal) across all cancer types, referred to here as the PF rank (STAR Methods).

Members of the Core Secretome

Upon inspection of the genes populating the top 1% (16 of 1,563 genes) of the pan-cancer PF ranks, two key features were immediately apparent (Figure 2A). First, each gene exhibited an expression change in the same direction across all (or nearly all) of the cancer types, despite ignoring the fold change direction in the rank calculation. This is supportive of an important and defined tumorigenic role for each of the associated encoded proteins, independent of the tissue or cell type from which it originates. Second, 15 of the 16 top-ranked genes exhibited an expression decrease across all or nearly all of the cancer types, suggesting that cancer type-independent shifts in secretome expression tend to be decreases.

Figure 2.

Constituents and Functions of the Core Cancer Secretome

(A) A heat-scatterplot presenting the log2FCs and corresponding significance (false discovery rate [FDR]-adjusted p values) for the 16 genes making up the top 1% of the non-directional core secretome. The color and size of the points correspond to the log2FC and log-transformed p values, respectively, from the DE analysis between tumor and paired-normal samples.

(B) The top 1% of the increased core secretome, obtained in the same manner as the non-directional set in (A), except the fold change direction was incorporated to identify secretome genes exhibiting increased expression across many cancer types.

(C) Gene sets found to be significantly enriched in the decreased (left column), non-directional (center column), or increased core secretome (right column), in which the top 20 most significant sets from each directional class are shown. The intensity of the color in the heatmap indicates the enrichment significance of the gene set. Gene set names are colored according to the Molecular Signatures Database (MSigDB) collection from which they originate: Hallmark, Kyoto Encyclopedia of Genesand Genomes (KEGG), Reactome, GO biological process, and GO molecular function. A non-stacked bar plot to the left of the heatmap shows the sizes (number of genes) of the original gene sets (gray bars) and of the filtered gene sets containing only secretome genes (black bars).

Enrichment of Functions in the Core Cancer Secretome

Although analysis of the top-ranked core secretome genes offered insight into common functions that were downregulated (or upregulated) across the different cancer types, it excludes information about the remaining 99% of secretome members. We therefore conducted a gene set analysis (GSA) to account for the PF ranks of all of the secretome genes in determining coordinated shifts in secretome function. The GSA was performed using both non-directional and directional PF ranks. The most significant gene sets associated with the core secretome were related to ECM turnover, cell-matrix adhesion, and signaling processes involving the ECM or immunity and inflammation (Figure 2C). Furthermore, the secretome expression increase associated with the epithelial-mesenchymal transition (EMT) underscores the importance of the cancer secretome in metastatic and invasive processes, regardless of cancer type. Gene sets related to glycosaminoglycan (GAG) binding, specifically heparin, were among the most significant coordinated decreases in secretome expression. As the genes within these sets encode for proteins associated with cell-matrix and basement membrane adhesion, their decreased expression further supports a contribution of the secretome to a more migratory and invasive phenotype.

The Effects of Tumor Purity on Core Secretome Expression Profiles

Tumorsn> are infiltrated to varying degrees by non-cancerous cells, such as stromal or immune cells (Hanahan and Weinberg, 2011). Molecular profiles of bulk tumor samples will therefore contain signatures from these infiltrating cells, which can obscure or be misinterpreted as those originating from tumor cells. To assess whether infiltrating cells were responsible for any of the identified features of the core secretome, we repeated the analyses using only tumor samples with a consensus purity estimate (CPE) (Aran et al., 2015) of at least 80% (Figure S2). The major features remained largely unchanged, supporting their association with the cancerous cells themselves. For example, all 16 genes in the top 1% of the core secretome exhibited a significant expression decrease in most or all of the included cancer types, and 11 of those genes were also present in the top 1% for the original analysis. Functions related to ECM turnover were again enriched among core secretome expression increases, although to a lesser extent when considering only high-purity tumor samples.

Cancer Type-Specific Secretome Expression Profiles

Following the investigation of coordinated pan-cancer secretome shifts, we were interested in evaluating the cancer types individually and determining which processes and functions exhibited strong changes within each type. We therefore conducted a directional and non-directional GSA of the differential expression (DE) analysis results, in which the direction of expression fold changes were included or excluded, respectively (STAR Methods; Varemo et al., 2013). In the dirn>ectional GSA (Figure 3A), cancer types generally exhibited expression increases associated with ECM components and metalloprotease activity; however, cholangiocarcinoma (CHOL) and head and neck squamous cell carcinoma (HNSC) accounted for the most significant increases, whereas prostate adenocarcinoma (PRAD), bladder urothelial carcinoma (BLCA), uterine corpus endometrial carcinoma (UCEC), and the kidney cancers displayed no coordinated change or even a modest decrease in expression. For these latter cancers, the non-directional GSA results (Figure 3B) revealed significant expression changes associated with these processes, but it was a mix of increases and decreases rather than a coordinated shift in one direction. Conversely, expression decreases related to adhesion and GAG binding were observed across many cancer types, with the most significant decreases occurring in kidney chromophobe (KICH), BLCA, and UCEC. Again, when ignoring the direction of expression change, virtually all of the cancers exhibited significant shifts in the secretome related to these functions. These results suggest that different cancer types are shifting their secretome expression in accordance with a common set of molecular functions, but the extent and direction of these changes are often tuned specifically to the tissue of origin.

Figure 3.

Gene Set Analysis of the Cancer Secretome

Heatmaps illustrate the (A) directional and (B) non-directional GSA results for secretome genes based on the tumor versus paired-normal fold changes and significance in 17 different cancer types. Only the GO molecular function gene set collection (MSigDB) was evaluated, and sets with <10 genes were excluded. In (A), the distinct directional gene set p values are calculated for coordinated increases (padj,dist-dir-up) and decreases (padj,dist-dir-down) in expression. The more significant (lower value) of the two directional p values for each gene set is shown in the heatmap as a log10-transformed value. The value is also “signed,” meaning that gene sets with a more significant decrease than increase (padj,dist-dir-down < padj,dist-dir-up) are made negative; otherwise, they are positive. Only gene sets with a padj,dist-dir ≤ 0.01 (in either direction) in at least one cancer type are shown. A non-stacked bar plot to the left of the heatmap shows the sizes of the original gene sets (gray bars) and of the filtered gene sets containing only secretome genes (black bars). *The ammonium ion binding gene set was identical to the quaternary ammonium group binding set after removing non-secretome genes; thus, the latter set is not shown. **The chemokine activity gene set was identical to the chemokine receptor binding gene set after removing non-secretome genes; thus, the latter set is not shown. See also Figure S2.

When repeating the analysis with high-purity tumorn> samples (Figure S2), much of the enrichment of secretome expression increases in ECM-related functions were reduced or absent, indicating a potential contribution of non-tumor cells to this behavior. However, significant coordinated expression decreases among genes associated with adhesion and GAG binding were observed to an even greater extent when using high-purity tumor samples, suggesting a more tumor-specific behavior. Another feature of interest was the significant decrease in expression associated with several gene sets that was unique to CHOL and LIHC. Even when ignoring the directionality of expression change, only CHOL and LIHC exhibited significant changes in these sets (Figure 3B). These sets included genes associated with normal liver function, including binding or activity related to lipids, alcohols, sterols, and lipoproteins. Thus, it appeared that CHOL and LIHC, which both originate from the liver, were decreasing the expression of their healthy, tissue-specific secretome components in favor of those related to malignant and invasive processes.

Decreased Expression of Genes Specific to Tumor Tissue of Origin

Given that liver-derived cancers CHOLn> and LIHC exhibited significant and coordinated decreases in the expression of the secretome components specific to liver function, we investigated expression changes in the context of tissue specificity across all cancer types. In addition, to obtain a more comprehensive picture of the secretory pathway clientele, we expanded the analysis to include any protein possessing a signal peptide, not only those that are destined for secretion (e.g., membrane proteins). This corresponded to a set of 3,491 signal-peptide genes, referred to hereafter as SP genes. Tissue-specificity data from the <n class="Chemical">span class="Species">Human Protein Atlas (HPA) (Uhlen et al., 2015) was used to define the set of <spn>an class="Chemical">SP genes associated with each tissue (STAR Methods; Table S4). The DE analysis (tumor versus paired normal) results for each cancer type were then evaluated in the context of the tissue-specific gene sets to determine whether any of the cancer types exhibited significant expression changes in the subset of SP genes that are typically associated with a particular healthy tissue. As in the previous analyses, directionality of fold change was also taken into account to determine whether there were significantly coordinated expression increases or decreases. In addition to the liver-derived cancersn>, the trend of a decrease in tissue-specific SP gene expression generally held true among the other cancers (Figures 4 and S3), all of which exhibited either a significant coordinated decrease or no significant change in the genes specific to their respective tissue of origin. Furthermore, the same behavior was observed even when including only high-purity tumor samples (Figure S4A).

Figure 4.

Tissue-Specific Expression Changes in SP Genes

The heatmap shows the significance and direction of coordinated expression changes in SP genes classified as specific to various tissue types. Cancer and tissue types are organized such that entries along the diagonal represent cancer types paired with their tissue of origin and are outlined in a solid box if there is a significant (padj < 0.05) coordinated expression decrease among the tissue-specific SP genes for that cancer type or in a dotted box otherwise. The log-transformed p values of cancer types sharing the same tissue of origin were averaged to facilitate this organization. The complete results for each individual tissue and cancer type are presented in Figure S3. The number of tissue-specific SP genes for each tissue type are indicated in the bar plot to the left of the heatmap. The distribution of tissue-specific SP gene expression changes across different cancer types is presented for two representative tissue types: prostate and liver. The log2FC values for each set of genes are represented by boxplots, with the individual gene values shown as gray points whose sizes indicate the significance (p value) of their FC. See also Figure S4.

Consistent with the GSA results, LIHC and CHOL exhibited a significant coordinated decreased expression of liver-specific genes. None of the 176 liver-specific SP genes were significantly (padj < 0.05) increased in either LIHC or CHOL relative to paired-normal tissue, whereas 156 (89%) and 174 (99%) of these genes exhibited a significant decrease in expression for LIHC and CHOL, respectively. These genes encoded functions such as lipid and cholesterol transport and metabolism (apolipoproteins), the complement system, coagulation, and protease inhibition (serpins). Similar strong, coordinated decreases in the expression of tissue-specific SP genes were observed in breast, colorectal, and lung cancers, in which only three or fewer genes in each set (<6%) were significantly increased in expression, while the majority were significantly decreased. The four cancer types that did not show a significant coordinated decreased expression in SP genes specific to their corresponding tissue of origin were BLCA, esophageal carcinoma (ESCA), PRAD, and UCEC. However, ESCA, PRAD, and UCEC did exhibit a significant decrease in the expression of genes specific to a tissue near their tissues of origin (stomach, seminal vesicle, and ovary, respectively) (Figure S3), suggesting a similar phenomenon. The data cannot distinguish between tumor cells that have actively decreased their tissue-specific gene expression and those that originated from more stem-like cells from the start; however, the end state is the same in that (most) cancer types exhibit a lower expression of tissue-specific SP genes in tumor cells than in the corresponding normal tissue.

Evaluation of Secretory Pathway Stress Signatures

The common decrease in the expression of tissue-specific <n class="Chemical">span class="Chemical">SP genes across many different <spn>an class="Disease">cancer types suggests a general pattern in which tumor cells are relieving the burden on an already strained (Ma and Hendershot, 2004) secretory system. By limiting the production and secretion of tissue-specific components, tumor cells may be able to dedicate more resources to processing proteins that contribute to cell proliferation and other malignant processes. To investigate further, we evaluated the tumor versus paired normal DE data for signs of increased stress or burden on the secretory pathway.

Activation of the UPR

Disruption of the secretory pathway results in the accumulation of misfolded proteins, which in turn activates a series of adaptive processes collectively known as the UPR to restore ER homeostasis (Ron and Walter, 2007). Coordinated expression increases in UPR-associated genes would therefore be indicative of cells undergoing secretory pathway stress and UPR activation. For each cancer type, we evaluated the enrichment of expression changes in genes affiliated with the UPR (all affiliated genes, not only secreted or SP genes). The results revealed a significant coordinated increase in UPR-related gene expression in nearly all cancer types (Figures 5 and S5A), consistent with previous reports regarding the prevalence of UPR activation among many cancers (Dejeans et al., 2014; Ma and Hendershot, 2004). CHOL and papillary thyroid carcinoma (THCA), however, exhibited a negligible coordinated expression increase in UPR-associated genes. The same results were observed when considering only high-purity tumor samples (Figure S4B), although CHOL was excluded due to the absence of purity scores for this cancer type.

Figure 5.

Coordinated Expression Increases Associated with the UPR

Shown are the log-transformed directional p values representing the significance of coordinated expression changes in genes associated with the UPR, defined as those included in the unfolded protein response gene set in the Hallmark gene set collection from MSigDB. Bars are colored blue if there is a significant (padj < 0.05) expression increase among the genes for that cancer type; if not, they are colored yellow. See also Figures S5 and S6.

Given that CHOL and THCA were among the cancer types exhibiting a strong coordinated expression decrease in tissue-specific SP genes (Figure 4), the data are supportive of the observed pattern whereby tumor cells alleviate secretory pathway stress by reducing the expression of SP genes specific to sustaining the function of their tissue of origin. Likewise, cancer types with an insignificant decrease in the expression of their respective tissue-specific SP genes (BLCA, ESCA, PRAD, and UCEC) exhibited coordinated expression increases associated with ER stress and the UPR (Figures 5 and S5A).

Estimation of Secretory Burden

Proteins traversing the secretory pathway undergo a number of maturation processes such as folding and post-translational modifications (PTMs). Larger proteins with a greater number of PTMs will require more cellular resources than shorter, less-modified proteins, and thus may impart a greater burden on the secretory pathway (Feizi et al., 2017; Gutierrez et al., 2018). We reasoned that a shift in expression toward lower-cost proteins may constitute another potential strategy to alleviate secretory pathway stress in tumor cells. To quantify this cost, we formulated a secretory burden (SB) score for each SP gene i as a function of its encoded protein length L (i.e., number of amino acids) and number of disulfide (NDS) and glycosylation (Ngly) sites: where each property is normalized by the median (med) value among all of the SP genes. For each cancern> type, the Spearman correlation between gene SB scores and expression fold changes was calculated (Figures 6 and S5B). Although the correlation coefficients were low, the trend was consistent with our observations regarding UPR activation and decreased expression of tissue-specific SP genes, which is best illustrated by the two extremes, BLCA and CHOL. BLCA yielded the strongest negative correlation between SB score and log2FC, suggesting that expression increases tend to be associated with low-burden SP genes, whereas the opposite was true for CHOL. Given that BLCA showed evidence of UPR activation and exhibited the least significant expression decrease in tissue-specific SP genes, it suggests that the inability of BLCA cells to relieve secretory pathway stress via reduction in tissue-specific SP gene expression may constrain their ability to process proteins with a high secretory burden. Conversely, CHOL exhibited the strongest expression decrease in tissue-specific SP genes and showed little evidence of UPR activation, which is indicative of lower secretory pathway stress, thus relaxing the constraint on which proteins the secretory pathway can accommodate. Cancer types mirroring the trend of BLCA included ESCA, PRAD, and UCEC, whereas THCA followed that of CHOL.

Figure 6.

Correlation between Protein Secretory Burden and Gene Expression Fold Change

Cancer types with a significantly (p < 0.05) negative correlation are colored blue, significantly positive cancers are colored red, and those with an insignificant correlation are colored gray. See also Figures S5 and S6.

To further explore the PTM burden, we investigated the expression changes in genes associated with different PTMs: N- and O-linked glycosylation, and protein disulfide bond oxidation and reduction (Figure S5C). Nearly half of the studied cancer types exhibited a significant coordinated expression increase in genes associated with glycosylation and/or disulfide bond formation, suggesting an additional effort to reduce secretory stress. The opposite behavior was observed for CHOL, which exhibited significant expression decreases associated with disulfide redox and N-linked glycosylation. All of the cancer types that did not show a coordinated expression increase associated with these PTMs were those exhibiting a significant decrease in their tissue-specific secretome, providing additional support for this relief strategy.

Additional Contributors to the UPR

Although HNSC and rectum adenocarcinoma (READ) exhibited a coordinated expression decrease in tissue-specific SP genes, as well as a positive correlation between SB score and gene expression fold change, these cancer types still show evidence of an activated UPR, unlike CHOL and THCA. Because the UPR can be triggered by sources of stress other than an overburdened secretory pathway (e.g., genome instability, hypoxia, nutrient deprivation) (Corazzari et al., 2017), it is possible that one or more of these alternative sources are contributing to UPR activation in HNSC and READ cells, despite their modified secretory profile. We therefore compared genome instability among the different cancer types using mutation profiles from TCGA whole-exome sequencing datasets. HNSC and READ samples exhibited similar mutation burdens (median of 134 and 127 somatic mutations per sample, respectively), which were >2-fold greater than CHOL (63 median mutations per sample) and >10-fold greater than THCA (12 mutations per sample) (all p < 10−6, one-sided Wilcoxon rank-sum test) (Figure S6). These results support the possibility that other sources of stress beyond those directly involving the ER and secretory pathway could be responsible for elevated UPR activation in HNSC and READ.

DISCUSSION

The secretome is regarded as an attractive reservoir of disease biomarkers, as its extracellular nature offers the potential to evaluate physiological status through easily accessible biofluids (Kulasingam and Diamandis, 2008; Schaaij-Visser et al., 2013; Stastna and Van Eyk, 2012). Furthermore, there are many protein biomarkers in use for the diagnosis or monitoring of different cancer types based on their abundance in serum, plasma, or urine, such as PSA, CA-125, CA19–9, and NuMA for prostate, ovarian, pancreatic, and bladder cancer, respectively (Füzéry et al., 2013). Beyond its potential as a reservoir of biomarker candidates, the cancern> secretome is known to play a crucial role in tumor development and invasion. We sought to evaluate cancer-associated shifts in secretome expression with regard to the function of the encoded proteins. The majority of shared pan-cancer changes in secretome expression were decreases and included proteins associated with functions such as cell-cell and cell-matrix adhesion, tumor suppressors with anti-proliferative or anti-migratory activities, and immune response. These proteins harbor potential therapeutic opportunities, either by targeting the factors driving their expression decrease or through direct use of the tumor suppressor as a therapeutic peptide (Bonin-Debs et al., 2004; Guo et al., 2014; Oricchio et al., 2011). For example, ANGPTL1, which was among the top 1% core decreased secretome proteins, has been demonstrated to suppress cell migration, invasion, angiogenesis, metastasis, and/or therapy resistance in hepatocellular carcinoma (Chen et al., 2016; Yan et al., 2017), colorectal cancer (Chen et al., 2017), and lung and breast cancers (Kuo et al., 2013). The trend of expression decreases among the secretome was also observed in the cancer-specific analyses, in which liver-related cancers (LIHC and CHOL) exhibited a particularly strong decrease in the expression of liver-specific SP genes. This reduced expression of tissue-specific genes in hepatocellular carcinoma has been explored previously; the extent of expression decrease was shown to negatively correlate with tumor grade or degree of dedifferentiation (Ge et al., 2005; Uhlen et al., 2017). We investigated this further, focusing on the subset of proteins targeted to the secretory pathway and spanning many different cancer types. Using tissue-specific gene classification from the HPA, this phenomenon of a significant decrease in expression of SP genes specific to the tissue of origin of the cancer was found to hold across the majority of examined cancer types. Since UPR activation (Urra et al., 2016) and increased expression of secretory pathway machinery (Dejeans et al., 2014) are common in many cancers, our results suggest a common pattern by which tumor cells modify their secretory profile to alleviate ER stress by reducing the production of tissue-specific components in favor of tumorigenic factors. Consistent with this hypothesis, CHOL and THCA, which exhibited among the strongest decreases in their tissue-specific SP genes, were associated with the weakest UPR activation and displayed no bias toward the increased expression of low-burden (shorter and with fewer PTMs) SP genes. Conversely, the few cancer types with an insignificant decrease in their tissue-specific SP gene expression (BLCA, ESCA, PRAD, and UCEC) exhibited increased expression associated with the UPR and displayed an apparent bias in expression toward lower-burden SP genes. Given that different tissues exhibit fine-tuned expression of their secretory machinery to accommodate their unique secretome profile (Feizi et al., 2017), it is reasonable to expect that a malignant cell could quickly overload this system and induce ER stress upon increasing the production of tumorigenic components without an accompanying decrease in other SP genes. A number of anti-cancer therapies that activate the UPR are under development or approved for clinical use, demonstrating the importance of this system in cancer treatment (Hetz et al., 2013). Although many cancers are known to leverage an activated UPR for its cytoprotective and restorative effects, UPR-targeted therapies function by driving the response further to a pro-apoptotic regime. We reasoned that the strong decrease in tissue-specific SP gene expression observed in CHOL or THCA cells, coupled with the insignificant coordinated expression increase in UPR-associated genes, could indicate a heightened sensitivity of these cancers toward this form of stress. In support of this hypothesis, treatment of CHOL cells in vitro and in a subcutaneous transplantation mouse model with bortezomib, which activates the UPR via proteasome inhibition, was shown to inhibit proliferation and induce apoptosis (Vaeteewoottacharn et al., 2013). Furthermore, bortezomib has been found to induce apoptosis in THCA cell lines with half-maximal inhibitory concentration (IC50) values lower than those of other cancer types (e.g., glioma, colon, renal, ovarian, prostate) (Mitsiades et al., 2006), whereas bortezomib treatment of BLCA cell line 253JB-V did not result in significant apoptosis and could not inhibit 253JB-V tumor growth in mice unless combined with another therapy (gemcitabine) (Kamat et al., 2004). Overall, the functional diversity and close involvemenn>t of the secretome in a number of critical tumorigenic and metastatic processes highlights the importance of this group of proteins in cancer pathophysiology and presents a strong case for its targeting in anti-cancer therapeutic development. In addition, the ranked list of secretome biomarker candidates for each of the 32 different cancer types is expected to help facilitate the development of more accurate, less invasive diagnostic methods.

STAR★METHODS

CONTACT FOR REAGENT AND RESOURCE SHARING

Further information and requests for resources should be dirn>ected to and will be fulfilled by the Lead Contact, Jens Nielsen (nielsenj@chalmers.se).

METHOD DETAILS

Definition of the secretome and SP genes

The list of proteins comprising the classically secreted secretome was obtained via UniProt (uniprot.org) (Bateman et al., 2017). Beginning with the entire human proteome (UP000005640), proteins were filtered to include those labeled as “UniProtKB/Swiss-Prot (reviewed),” with a subcellular location of “Secreted,” and PTM/Processing of “Signal peptide,” yielding 1,838 unique UniProt entries. The associated Entrez gene IDs and gene names were mapped to Ensembl IDs (GRCh38.p12), where those that did not map were excluded, and duplicated entries were removed, resulting in a secretome of 1,816 unique genes when analyzing TCGA data. For analyses also involving GTEx samples, genes absent from the that dataset were excluded, yielding a secretome comprised of 1,810 genes. SPn> (signal peptide) genes were defined and generated in the same way as the secretome, except without the requirement for a subcellular location of “Secreted.” This resulted in a set of 3,491 SP genes, of which 3,111 had associated differential expression data (TCGA primary tumor versus paired normal).

The experimentally-derived secretome

Many proteins are secreted despite not having a signal peptide. To account for these unconventionally secreted proteins, we defined an “experimentally-derived” secretome consisting of proteins that had been detected within the extracellular environment in any one of the 35 secretome studies included in the Human Cancer Secretome Database (HCSD). We first retrieved the label-free proteomic data from HCSD, and extracted a list of all proteins that had been detected in at least one of the studies. For the label-based studies, proteins were retrieved if they had been measured to decrease or increase in concentration among any of the studies, as both cases imply detection. These lists were combined and mapped to the set of genes present in TCGA RNA-Seq data, resulting in a secretome consisting of 6,543 genes.

Retrieval of human plasma proteome data

Given the RNA-based nature of the analysis, we sought to enrich the results through the integration of protein-level data. We therefore retrieved a list of proteins that have been experimentally detected in plasma, which is a result of the Human Plasma Proteome Project (HPPP) (Schwenk et al., 2017). This protein evidence information was integrated with the consensus score results summarized in Figure 1D and Table S2. The humann> plasma proteome was retrieved from PeptideAtlas (Farrah et al., 2013) (htpp://www.peptideatlas.org/hupo/c-hppp/). Only entries with a neXtProt protein evidence (PE) level of 1 (evidence at the protein level) were considered. This yielded four sets of proteins with categories of “canonical,” “uncertain,” “redundant,” or “not observed” (see Tables S2 or S3 for category definitions). Non-unique protein entries were combined, where the category of greater evidence was used if multiple categories were assigned to the same entry. Genes in the present study that did not have a corresponding entry in the plasma proteome dataset were categorized as “NA.”

Transcriptomic data retrieval

RNA-Seq data (FPKM and raw gene counts) were retrieved from TCGA on May 4, 2017 using the TCGAbiolinks (Colaprico et al., 2016) package in R (Gentleman et al., 2004; R Development Core Team, 2018), for all 33 cancer types available at that time. One cancer type, acute myeloid leukemia (LAML), did not have any associated primary tumor RNA-Seq data, and was thus excluded from all analyses, resulting in a total of 32 cancer types. GTEx RNA-Seq data (V7, TPM and raw gene counts) were retrieved directly from the site (http://www.gtexportal.org/home/datasets) on October 18, 2017. Primary tumorn> and paired-normal transcriptomic (RNA-Seq) data were retrieved for 32 cancer types from TCGA, for a total of 9,760 primary tumor and 730 paired-normal samples, where both sample types were available for 697 patients. Healthy tissue RNA-Seq data was retrieved from the GTEx database, for a total of 11,688 samples spanning 714 donors and 30 tissue/organ types (or 53 subtissue types).

Mutation burden quantification

Mutation annotation files (MAFs) derived from whole-exome sequencing data were retrieved for all available cancern> types from TCGA using the TCGAbiolinks R package. The total number of somatic mutation events (insertion, deletion, or single nucleotide polymorphism) for each primary tumor sample were summed to yield a total mutation burden for each sample.

Analysis of high-purity tumor samples

Consensus purity estimate (CPE) scores for TCGA primary solid tumorn> samples were obtained from a previous study (Aran et al., 2015), which calculated and combined purity scores using four different methods (ESTIMATE (Yoshihara et al., 2013), ABSOLUTE (Carter et al., 2012), LUMP and IHC (Aran et al., 2015)). All tumor samples with a CPE of less than 80% (0.80), or those that did not have a score available, were removed from the high-purity analysis. Cancer types that did not have any scores available (CHOL, ESCA, PAAD, PCPG, STAD), or had 3 or fewer tumor-normal sample pairs after removing low-purity tumor samples (BLCA, HNSC) were also excluded.

Consensus biomarker score

The consensus biomarker score was generated by combining the results from three types of sample comparison: (1) tumorn> versus paired normal, (2) tumor versus healthy tissue of origin, and (3) tumor versus all healthy tissues.

Comparison 1: primary tumor versus paired-normal tissue

The first comparison leveraged the paired nature of TCGA samples, meaning the tumor and normal tissue sample originated from the same patient. This enabled an estimation of gene expression changes that were specific to malignant transformation, rather than those arising from variation among patients or tissues of origin. TCGA data were filtered to only keep patients with paired samples; i.e., those with both a primary tumor and normal tissue sample. Furthermore, only cancer types with at least three patients after filtering were included, resulting in a final count of 693 patients spanning 20 cancer types. For each cancer type, a differential expression analysis was performed, comparing primary tumor with paired normal tissue, using the patient ID as a blocking factor.

Comparison 2: primary tumor versus healthy matched tissue

The second comparison was conducted in recognition of the fact that paired-normal samples are not always representative of normal healthy tissue, as nearby tumor cells are known to perturb cellular function (Aran et al., 2017; Huang et al., 2016). Therefore, primary tumor TCGA samples were compared to GTEx healthy tissue samples (of the same tissue-of-origin) from non-cancer patients. For this analysis, all 9,760 primary tumor samples were used, not just those with a corresponding paired-normal tissue sample. A differential expression analysis was performed for each cancer type, comparing primary tumor samples with those of the corresponding healthy tissue from GTEx.

Comparison 3: primary tumor versus all healthy tissues

The final comparison sought to identify genes with relatively low expression throughout all tissues in the body compared to their expression in a tumorn>. We hypothesized that tumor-derived expression changes in such genes would be more detectable in a biofluid than genes expressed at similar or higher levels in many healthy tissues, as the latter could impart a “dilution” effect on the tumor-associated signal of interest. For this analysis, we were more interested in transcript abundance rather than fold-changes between two conditions. Therefore, normalized gene counts (FPKM) were retrieved from TCGA for all tumor and paired normal tissue samples and converted to transcripts per million (TPM). TPM gene counts were also retrieved from the GTEx database for all measured tissues. The complete set of healthy tissues was obtained by combining healthy tissue samples from GTEx with paired normal samples from TCGA (Table S1). For each gene in a given cancern> type, the TPM values among all TCGA primary tumor samples for that cancer type were compared to the TPM values for that gene across all normal samples for a particular tissue type, using a right-tailed Wilcoxon rank-sum test (i.e., the null-hypothesis being that the tumor counts are not sampled from a distribution with a higher median than that of the normal tissue counts). This yielded a significance (p value) for each gene for a tissue type, where a low p value corresponded to genes with higher TPM values in primary tumor tissues than in the normal tissue. The comparison was repeated for all of the healthy tissue types, to obtain a p value for each tissue. The test was performed with each healthy tissue individually rather than pooling all of the normal samples together, as the pooled test would be biased by variations in the number of samples for different tissues. Each of the p values obtained from the different tissues types were then combined (geometric mean) into a single p-like score (ranging from 0–1). The entire process was repeated for each of the different cancer types, yielding a single score for each gene and each cancer type.

Consensus score formulation

For the first two comparisons (DE analyses), genes were ranked by their combined fold-change and significance (FDR-adjusted p value). Fold-changes were ranked directly, with higher ranks assigned to genes with greater positive log2FC (tumor/normal), and vice versa. Prior to ranking p values, the associated FC direction was incorporated to generate directional p values (pdir) for each gene i (analogous to the approach described in (Väremo et al., 2013)): where sign(FC) is the sign of the corresponding log2(fold-change). In this manner, genes with low p values and a positive FC receive a pdir near zero, whereas genes with low p values but a negative FC have a pdir close to one. Genes associated with a high p value will therefore have a pdir near 0.5, regardless of FC direction. These pdir values were then ranked such that higher ranks were assigned to genes with lower pdir values. Finally, the p-like scores generated from the third comparison (tumor versus all tissues) were ranked directly, where low p-scores (high significance) were ranked highly, and vice versa. The consensus rank score was calculated by combining the gene ranks from each of the three comparative analyses, as illustrated in Figure 1A. Specifically, the FC and pdir ranks from the first comparison were averaged, and this mean rank was averaged with the mean of the FC and pdir ranks from the second comparison. The resulting combined rank was averaged with the rank of p-like scores from the third comparison to yield the overall consensus rank score, enabling the prediction of candidate biomarkers for each cancer type. The effective weight ratios from the three comparisons (tumor versus paired normal, tumor versus healthy tissue-of-origin, and tumor versus all healthy tissues) in the consensus score were therefore 1:1:2, respectively. The ratios were assigned as such because the score was designed to place equal weight on expression differences of tumor versus tissue-of-origin, and of tumor versus all tissues. Since comparisons 1 and 2 both quantify tumor versus tissue-of-origin differences, they were each assigned half the weight of comparison 3, which quantified tumor versus all tissue differences. Moreover, since the information from the first two comparisons is likely to exhibit more redundancy (paired normal tissue and healthy tissue-of-origin are relatively similar in their expression profiles compared to other tissue types), they were weighted less than comparison 3.

Cancer types lacking paired-normal or healthy tissue data

Among the 32 TCGA cancern> types with available primary tumor samples, 12 lacked sufficient paired-normal tissue data from TCGA to be included in the first comparison, and 6 types could not be appropriately matched to one of the tissue types defined in GTEx (e.g., SARC, “sarcoma”), and thus could not be included in the second comparison. However, the genes were still scored based on the results from the remaining comparisons that could be performed. Although there is less confidence associated with the scores for these particular cancer types, potential biomarkers could still be identified. For example, the top-scoring candidate for ovarian cancer (OV) was WFDC2 (also known as HE4), which is an established OV protein biomarker in both urine and serum (Hellström et al., 2003, 2010), and the next top 6 candidates included FOLR1, KLK6, KLK7, and MSLN, all of which have been experimentally confirmed as biofluid diagnostic markers of OV (Badgwell et al., 2007; Diamandis et al., 2003; Leung et al., 2013; Tamir et al., 2014).

Core secretome definition and analysis

To focus on changes in secretome expression associated specifically with malignant progression rather than inter-individual and inter-tissue variation, the analysis was conducted using paired tumor-normal samples from TCGA. Furthermore, cancer types with only a few sample pairs (CESC, PAAD, and PCPG; each had only 2 or 3 pairs) were excluded, yielding a final dataset spanning 17 cancer types, 683 patients, and 1,563 secretome genes (very low-count or non-detected genes were excluded). To identify the subset of secretome genes with substantial paired normal versus primary tumor expression changes across many cancer types, a rank-based metric was used. The rationale of implementing a relative metric rather than directly using the fold-change and significance values from the DE analyses was that their ranges, especially those of the p values, vary widely across cancer types due to differences in the number of samples for each. We therefore ranked the genes within each cancer type by p value, and by absolute log2(fold-change) value, then averaged the two ranks to yield a combined “PF-rank.” To identify the genes that exhibited the greatest and most significant changes across all included cancer types (regardless of fold-change direction), the PF-ranks for each cancer were averaged to yield a pan-cancer PF-rank. Dirn>ectional PF-ranks were also generated, where the direction of expression fold-change was incorporated instead of using the absolute log2FC values. In addition, the associated p values were converted to directional p values (pdir, Equation 2, Method Details), such that the lowest ranks were assigned to genes exhibiting a significant decrease in expression across many cancers, and the highest to those with a significant increase in expression.

Definition of tissue-specific genes

Gene tissue specificity data was retrieved from the HPA, which has compiled a list of genes for each tissue that are classified as tissue enriched, group enriched, or tissue enhanced, based on their expression in that tissue compared to others (Uhlén et al., 2015). Given the relatively small number of tissue-enriched genes for many tissues, especially when removing all non-SP genes, we defined tissue-specific gene sets for each individual tissue as the combination of all its tissue-enriched, group-enriched, and tissue-enhanced genes (Table S4).

Estimation of UPR activation

Activation of the UPR was estimated using a GSAn>. In this analysis, the full gene sets were used; i.e., they were not filtered to remove non-secretome genes. The “Hallmark,” “Canonical pathways,” and “GO gene sets” libraries from MSigDB were queried for any set containing the phrase “endoplasmic reticulum stress” or “unfolded protein response,” and sets with the term “negative regulation” were excluded. This yielded 11 gene sets related to UPR and/or ER stress, which are shown in Figures S4 and S5.

Glycosylation and disulfide bond redox

Expression changes related to glycosylation and disulfiden> bond oxidation/reduction processes were evaluated by conducting a GSA, using the “GO bioprocess: glycosylation” and “GO molecular function: protein disulfide oxidoreductase activity” gene sets from MSigDB, respectively. To add resolution to the analysis of glycosylation activity, two subsets of the glycosylation gene set, “protein N-linked glycosylation” and “protein O-linked glycosylation,” were also evaluated for coordinated changes in gene expression. These gene sets were used in their complete form, and were not filtered (e.g., by removing non-secretome genes).

Secretory burden (SB) score

The SB score was calculated for each gene based on its associated protein length (number of amino acids), number of disulfide sites, and number of glycosylation sites, as described in Equation 1. Data for each of these terms was retrieved from UniProt, where the number of glycosylation sites was the sum of all N-, C-, O-, and S-linked glycosylation sites.

QUANTIFICATION AND STATISTICAL ANALYSIS

Differential expression (DE) analysis

All differential expression analyses reported in the study were conducted using the edgeR package in R (Robinson et al., 2010), with the raw gene count (integer) values as input. For the DE analysis comparing primary tumor expression to that of paired normal tissues, the patient ID number was included as an additional field in the design matrix. When comparing primary tumor gene counts from TCGA to those of healthy tissues from GTEx, only the sample type was considered (tumor or normal). Counts were normalized using the EdgeR calcNormFactors function, which scales library sizes using the trimmed mean of M-values (TMM) between each pair of samples (Robinson and Oshlack, 2010). For each DE analysis, low-count genes were removed beforehand; i.e., only genes with at least 10 counts in at least half of the samples were retained. Furthermore, DE analyses were only performed if there were at least 3 samples in each of the 2 conditions to be compared.

Gene set analysis

To quantify the extent to which different groups of genes were enriched in a given metric (e.g., p values from a DE analysis), a gene set analysis (GSAn>) was performed. This type of analysis was applied in a number of situations throughout the study, and followed the same procedure (described below), unless stated otherwise. The following gene set collections were retrieved from the Molecular Signatures Database (MSigBD (Subramanian et al., 2005)): hallmark (H) (Liberzon et al., 2015), KEGG (C2 CP:KEGG), Reactome (C2 CP:REACTOME), GO biological process (C5 BP), and GO molecular function (C5 MF). Gene set collections were filtered to remove all non-secretome genes from each set prior to analysis, unless otherwise stated. We note that this filtration can cause the name of a gene set to become less representative if a substantial portion of genes in the set are removed. In this way, the significance of a gene set does not necessarily represent an enrichment in its named function/pathway, but instead represents an enrichment in the set of secretome genes that are associated with that function/pathway. In addition, gene sets containing more than 400 genes (before filtering) were also removed, as these sets tended to have a very low fraction of secretome genes, and were generally uninformative. Finally, to avoid statistical problems with very small gene sets, those with less than 20 genes after filtration were excluded from the analysis, unless otherwise noted. A Wilcoxon rank-sum test statistic was calculated from the DE analysis p values of genes in a given set, and compared to those of 100,000 randomly shuffled gene sets of the same size. The significance (p value) of a gene set was calculated as: where N is the number of randomly shuffled gene sets with a test statistic greater than or equal to that of the original gene set, and N is the number of random permutations (100,000 in this study). Gene set p values calculated in this manner correspond to “non-directional” p values (pnon-dir), as they do not take into account the direction (increase or decrease) of the fold-change from the DE analysis, only the significance. “Distinct dirn>ectional” gene set p values (pdist-dir-up and pdist-dir-down) Väremo et al., 2013) were obtained in the same manner, except the p values from the DE analysis were first converted to directional p values (Equation 2) before calculating the associated Wilcoxon test statistic. The resulting gene set pdist-dir-up values quantify coordinated expression increases in a gene set, where a low pdist-dir-up indicates a get set that is enriched in genes with significant expression increases. Coordinated expression decreases are quantified simply as pdist-dir-down = 1 – pdist-dir-up, where low pdist-dir-down values indicate an enrichment of genes with expression decreases.

Adjustment of p values

All adjusted p values (padj) reported in the study were adjusted to control for the false discovery rate (FDR) using the Benjamini-Hochberg procedure. Statistical significance in this study was defined as padj < 0.05.

KEY RESOURCES TABLE

REAGENT or RESOURCE Software and Algorithms	SOURCE	IDENTIFIER
The R Project for Statistical Computing	R Development Core Team, 2018	https://www.R-project.org/
Bioconductor	Gentleman et al., 2004	https://www.bioconductor.org/
EdgeR	Robinson et al., 2010	R Bioconductor
TCGAbiolinks	Colaprico et al., 2016	R Bioconductor
MATLAB R2017b	The MathWorks, Inc.	https://ch.mathworks.com/products/matlab.html
MSigDB database	Subramanian et al., 2005	http://software.broadinstitute.org/gsea/msigdb

109 in total

1. Proteomic analysis of urinary biomarker candidates for nonmuscle invasive bladder cancer.

Authors: Mårten Lindén; Sara Bergström Lind; Corina Mayrhofer; Ulrika Segersten; Kenneth Wester; Yaroslav Lyutvinskiy; Roman Zubarev; Per-Uno Malmström; Ulf Pettersson
Journal: Proteomics Date: 2011-12-02 Impact factor: 3.984

2. Proteomics. Tissue-based map of the human proteome.

Authors: Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal: Science Date: 2015-01-23 Impact factor: 47.728

Review 3. Pathways of Unconventional Protein Secretion.

Authors: Catherine Rabouille
Journal: Trends Cell Biol Date: 2016-12-16 Impact factor: 20.808

4. Interpreting expression profiles of cancers by genome-wide survey of breadth of expression in normal tissues.

Authors: Xijin Ge; Shogo Yamamoto; Shuichi Tsutsumi; Yutaka Midorikawa; Sigeo Ihara; San Ming Wang; Hiroyuki Aburatani
Journal: Genomics Date: 2005-08 Impact factor: 5.736

Review 5. The extracellular matrix modulates the hallmarks of cancer.

Authors: Michael W Pickup; Janna K Mouw; Valerie M Weaver
Journal: EMBO Rep Date: 2014-11-08 Impact factor: 8.807

6. Collagen triple helix repeat containing-1 promotes pancreatic cancer progression by regulating migration and adhesion of tumor cells.

Authors: Eun Hye Park; Seokho Kim; Ji Yoon Jo; Su Jin Kim; Yeonsil Hwang; Jin-Man Kim; Si Young Song; Dong-Ki Lee; Sang Seok Koh
Journal: Carcinogenesis Date: 2012-12-07 Impact factor: 4.944

7. Folate-receptor 1 (FOLR1) protein is elevated in the serum of ovarian cancer patients.

Authors: F Leung; A Dimitromanolakis; H Kobayashi; E P Diamandis; V Kulasingam
Journal: Clin Biochem Date: 2013-03-23 Impact factor: 3.281

8. Evaluation of midkine as a diagnostic serum biomarker in hepatocellular carcinoma.

Authors: Wen-Wei Zhu; Jia-Jian Guo; Lei Guo; Hu-Liang Jia; Ming Zhu; Ju-Bo Zhang; Christopher A Loffredo; Marshonna Forgues; Hua Huang; Xu-Jian Xing; Ning Ren; Qiong-Zhu Dong; Hai-Jun Zhou; Zheng-Gang Ren; Nai-Qing Zhao; Xin Wei Wang; Zhao-You Tang; Lun-Xiu Qin; Qing-Hai Ye
Journal: Clin Cancer Res Date: 2013-05-29 Impact factor: 12.531

9. Human protein secretory pathway genes are expressed in a tissue-specific pattern to match processing demands of the secretome.

Authors: Amir Feizi; Francesco Gatto; Mathias Uhlen; Jens Nielsen
Journal: NPJ Syst Biol Appl Date: 2017-08-18

10. Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods.

Authors: Leif Väremo; Jens Nielsen; Intawat Nookaew
Journal: Nucleic Acids Res Date: 2013-02-26 Impact factor: 16.971

18 in total

1. Using All Our Genomes: Blood-based Liquid Biopsies for the Early Detection of Cancer.

Authors: Eddie Adams; Gregory D Sepich-Poore; Sandrine Miller-Montgomery; Rob Knight
Journal: View (Beijing) Date: 2022-01-31

2. Loss of enteric neuronal Ndrg4 promotes colorectal cancer via increased release of Nid1 and Fbln2.

Authors: Nathalie Vaes; Simone L Schonkeren; Glenn Rademakers; Amy M Holland; Alexander Koch; Marion J Gijbels; Tom G Keulers; Meike de Wit; Laura Moonen; Jaleesa R M Van der Meer; Edith van den Boezem; Tim G A M Wolfs; David W Threadgill; Jeroen Demmers; Remond J A Fijneman; Connie R Jimenez; Pieter Vanden Berghe; Kim M Smits; Kasper M A Rouschop; Werend Boesmans; Robert M W Hofstra; Veerle Melotte
Journal: EMBO Rep Date: 2021-04-23 Impact factor: 8.807

Review 3. Endogenous and Exogenous Regulatory Signaling in the Secretory Pathway: Role of Golgi Signaling Molecules in Cancer.

Authors: Simona Del Giudice; Valentina De Luca; Seyedehnegar Parizadeh; Domenico Russo; Alberto Luini; Rosaria Di Martino
Journal: Front Cell Dev Biol Date: 2022-03-23

Review 4. MMPs, tyrosine kinase signaling and extracellular matrix proteolysis in kidney cancer.

Authors: Fiza Hashmi; Mehdi Mollapour; Gennady Bratslavsky; Dimitra Bourboulia
Journal: Urol Oncol Date: 2020-05-31 Impact factor: 2.954

5. Determinants of plasma calretinin in patients with malignant pleural mesothelioma.

Authors: Martin Lehnert; Daniel G Weber; Dirk Taeger; Irina Raiko; Jens Kollmeier; Susann Stephan-Falkenau; Thomas Brüning; Georg Johnen
Journal: BMC Res Notes Date: 2020-07-29

6. The expression landscape of cachexia-inducing factors in human cancers.

Authors: Paula Paccielli Freire; Geysson Javier Fernandez; Diogo de Moraes; Sarah Santiloni Cury; Maeli Dal Pai-Silva; Patrícia Pintor Dos Reis; Silvia Regina Rogatto; Robson Francisco Carvalho
Journal: J Cachexia Sarcopenia Muscle Date: 2020-03-03 Impact factor: 12.910

Review 7. Mutant p53 in cell-cell interactions.

Authors: Steven Pilley; Tristan A Rodriguez; Karen H Vousden
Journal: Genes Dev Date: 2021-04-01 Impact factor: 11.361

Review 8. The Crosstalk Between Adipose-Derived Stem or Stromal Cells (ASC) and Cancer Cells and ASC-Mediated Effects on Cancer Formation and Progression-ASCs: Safety Hazard or Harmless Source of Tropism?

Authors: Vincent G J Guillaume; Tim Ruhl; Anja M Boos; Justus P Beier
Journal: Stem Cells Transl Med Date: 2022-04-29 Impact factor: 7.655

9. Gliome database: a comprehensive web-based tool to access and analyze glia secretome data.

Authors: Jong-Heon Kim; Su-Hyeong Park; Jin Han; Pan-Woo Ko; Dongseop Kwon; Kyoungho Suk
Journal: Database (Oxford) Date: 2020-01-01 Impact factor: 3.451

10. Dysregulation of the secretory pathway connects Alzheimer's disease genetics to aggregate formation.

Authors: Chih-Chung Kuo; Austin W T Chiang; Hratch M Baghdassarian; Nathan E Lewis
Journal: Cell Syst Date: 2021-06-24 Impact factor: 11.091