Literature DB >> 25965298

Revisiting the transcriptional analysis of primary tumours and associated nodal metastases with enhanced biological and statistical controls: application to thyroid cancer.

M Tarabichi1, M Saiselet1, C Trésallet2, C Hoang2, D Larsimont3, G Andry4, C Maenhaut5, V Detours1.   

Abstract

BACKGROUND: Transcriptome profiling has helped characterise nodal spread. The interpretation of these data, however, is not without ambiguities.
METHODS: We profiled the transcriptomes of papillary thyroid cancer nodal metastases, associated primary tumours and primary tumours from N0 patients. We also included patient-matched non-cancerous thyroid and lymph node samples as controls to address some limits of previous studies.
RESULTS: The transcriptomes of patient-matched primary tumours and metastases were more similar than those of unrelated metastases/primary pairs, as previously reported in other organ systems. This similarity partly reflected patient background. Lymphoid tissues in the metastases confounded the comparison of patient-matched primary tumours and metastases. We circumvented this with an original data adjustment, revealing a differential expression of stroma-related gene signatures also regulated in other organs. The comparison of N0 vs N+ primary tumours uncovered a signal irreproducible across independent data sets. This signal was also detectable when comparing the non-cancerous thyroid tissues adjacent to N0 and N+ tumours, suggesting a cohort-specific bias also likely present in previous similarly sized studies. Classification of N0 vs N+ yielded an accuracy of 63%, but additional statistical controls absent in previous studies revealed that this is explainable by chance alone. We used large data sets from The Cancer Genome Atlas: N0 vs N+ classification was not better than random for most cancers. Yet, it was significant, but of limited accuracy (<70%) for thyroid, breast and head and neck cancers.
CONCLUSIONS: The clinical potential of gene expression to predict nodal metastases seems limited for most cancers.

Entities:  

Mesh:

Year:  2015        PMID: 25965298      PMCID: PMC4430711          DOI: 10.1038/bjc.2014.665

Source DB:  PubMed          Journal:  Br J Cancer        ISSN: 0007-0920            Impact factor:   7.640


Thyroid cancers are the most frequent endocrine malignancies. Eighty per cent of them are of the papillary subtype (PTC). In these cancers, regional lymph node metastasis is correlated with poorer survival (Lundgren ). Therapeutic dissection of lymph nodes within the central neck area improves disease outcome (Nixon and Shaha, 2013; Venkat and Guerrero, 2013) and is recommended by current international guidelines (Pacini ; American Thyroid Association (ATA) Guidelines Taskforce on Thyroid Nodules and Differentiated Thyroid Cancer ), whereas the benefit of prophylactic dissection remains a matter of heated debate (Nixon and Shaha, 2013; Venkat and Guerrero, 2013). The biology underlying the nodal dissemination of PTC is poorly understood. A number of cancer studies in other organ systems have explored with transcriptome profiling the relationship between primary tumours and their associated nodal metastases and the characteristics of those primary tumours producing nodal metastases (Huang ; Kikuchi ; Nagata ; Hao ; Takada ; Tamoto ; Croner ; Hoang ; O'Donnell ; Weigelt ; Xi ; Bidus ; Roepman ; Wang ; Feng ; Inamura ; Méndez , 2011; Nguyen ; Pei ; Suzuki and Tarin, 2007; Kashiwazaki ; Kroon ; Vecchi ; Ellsworth ; Wong ; Xie ; Edfeldt ; Smeets ). A recurrent finding was that molecular phenotypes of nodal metastases are similar to those of the primary tumours they originate from (Perou ; Hao ; Hoang ; O'Donnell ; Weigelt ; Roepman ; Feng ; Inamura ; Vecchi ; Ellsworth ). This has been interpreted as evidence that most cells in the primary tumour have the potential to seed metastases (Hao ; Hoang ; Inamura ; Ellsworth ), in contrast with the classical view that a few cells acquire metastatic capacity within the primary tumour (Fidler and Kripke, 1977). Furthermore, several studies defined the biological context of nodal spread by pointing out transcription of genes related to the immune response (Weigelt ; Wang ; Vecchi ; Ellsworth ; Xie ) and the stroma (Hao ; O'Donnell ; Weigelt ; Roepman ; Feng ; Suzuki and Tarin, 2007; Vecchi ; Ellsworth ) differing between primary and metastatic samples. Finally, the prediction of nodal spread from primary tumour yielded conflicting results ranging from no classification (Xi ; Kroon ) to perfect classification of N0 (without apparent nodal spread) and N+ (with nodal spread) diseases (Kikuchi ; Tamoto ; O'Donnell ; Roepman ; Inamura ; Pei ; Kashiwazaki ; Méndez ). Whether these discrepancies reflect technical issues or a cancer type-specific classification potential remains unclear. Data presented in these studies are not without ambiguities. Patient's genetic and physiological background could both contribute to the transcriptional similarity between primary tumours and patient-matched nodal metastases. The immune-related signals reported in nodal metastasis could, as noted by several authors, reflect the presence of lymphoid tissues alongside the metastasis. Laser capture microdissection has been proposed to address this ambiguity in some studies (Hoang ; Wang ; Inamura ; Nguyen ; Ellsworth ), whereas others have advocated the analysis of bulk tissues because they better reflect the wider context of metastasis (Hoang ; O'Donnell ; Méndez ). The N0 vs N+ classification problem has been approached with a very wide range of statistical methods and validation procedures in addition to differences in lymph node screening protocols. This study applies the transcriptomic approach to the analysis of nodal metastatic spread in PTCs. Numerous biological and computational controls have been included to address the ambiguities of previous studies.

Materials and methods

Samples and patients

Samples derived from 31 patients diagnosed for PTC were obtained from the Jules Bordet Institute (Brussels, Belgium) and La Pitié-Salpêtrière Hospital (Paris, France). They were selected on the basis of RNA quality rather than clinicopathologic parameters. Protocols were approved by the ethics committees of the institutions.

Extraction and quality assessment

RNA was extracted with RNeasy Mini Kit columns (Qiagen, Hilden, Germany). The histology of each sample was verified using haematoxylin- and eosin-stained sections. Percentages of tumour cells were estimated and, when present, the percentages of adjacent non-tumour cells or lymphocyte infiltration were quantified. Tumours with <70% cancer cells were discarded. RNA concentrations were measured using NanoDrop ND-1000 spectrophotometer (Life Technologies, Grand Island, NY, USA). RNA integrity was assessed using an automated gel electrophoresis system (Experion; Bio-Rad, Hercules, CA, USA), yielding a score from 1 (fully degraded) to 10 (intact RNA): the RNA Quality Index. Only samples with a score of 7 or above were considered.

Microarrays hybridisation

Genome-wide mRNA profiles were obtained through the hybridisation of samples onto Affymetrix GeneChip Human Genome U133 Plus 2.0 Arrays (Affymetrix, Santa Clara, CA, USA). RNA amplification, hybridisation and image scanning were performed according to standard Affymetrix protocols.

Mutational status

We assessed the mutational status of BRAF, RET/PTC1, RET/PTC3, NRAS, HRAS and KRAS in the PTC samples. After a DNase treatment with DNaseI Amplification Grade Kit (Invitrogen, Carlsbad, CA, USA), 1 μg of total RNA was used for reverse transcription using hexamers (3.6 μg μl−1; Roche, Basel, Switzerland) and reverse transcriptase (SuperscriptII RNase H ReverseTranscriptase Kit; Invitrogen). Polymerase chain reactions were performed on 2 μl of cDNA using the recombinant Taq DNA Polymerase Kit (Invitrogen) and appropriate primer pairs (primer sequences and PCR conditions provided in a previous study; Saiselet ). Polymerase chain reaction products were purified with the QIAquick PCR Purification Kit (Qiagen) according to the manufacturer's instructions. Sequencing was performed with the BigDye Terminator V3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) with the sequencer ABI PRISM 3130 (Applied Biosystems) and the genetic analysis program 3130-XI.

Microarray analysis

All analyses and images were performed in R v.3.1.0 (R Core Team, 2013). CEL files were normalized with fRMA (McCall ) v.1.6.0 with default parameters and hgu133plus2frmavecs v.1.1.8 annotation. The arrays used contain 54 613 probe sets (probes) representing 20 027 genes, annotated with R packages annotate v.1.32.3 (Gentleman, 2014) and hgu133plus2.db v2.6.3 (Carlson ). Hierarchical clustering was performed using the function hclust with ‘method=ward'. The distance between samples was defined as 1−|ρS|, where ρS is Spearman's correlation coefficient. Principal component analysis was computed with prcomp. Differential expression was assessed with Rank Product (Breitling ), a nonparametric method related to the fold-change method (Hong and Breitling, 2008). The four normal lymph node (NLN) profiles were used to assess the contaminant fraction in nodal metastasis profiles (LNM) as follows: where ge(tissue) is the expression profile of a tissue; x is the fraction of NLN contaminant and δ is an unknown signal associated with metastasis and measurement errors. Taking the ratios of expression: Thus, the fraction of contaminant x can be estimated from the slope between LNM/PTC and NLN/PTC ratios. This method was assessed with a data set of tissue mixtures of known contaminant fraction (Supplementary Figure S3). Estimated and real contaminant fractions were similar and highly correlated (Spearman's ρ=0.94; P=1.16 × 10−78). To deconvolute the contaminant signal, for each log2-transformed LNM-PTC ratio, a loess normalization was performed with regard to their NLN-PTC ratio. Gene set enrichment analysis was run with 1000 label permutations to derive a P-value and its associated false discovery rate for each gene set (Subramanian ). After balancing the number of N0 and N+ samples, nested cross-validation was performed using support vector machine as the classification method (Chang and Lin, 2011) and top scoring pairs (Leek, 2009) for the feature selection. The data sets were divided randomly into five parts of equal size and equal number of samples from each class. One part was set aside as the test set, whereas the training set consisted of the four remaining parts. The training set was used in the inner fivefold cross-validation to select the best number of probes N ∈ {8, 16, 32, 64}, that is, the number of probes maximising the average accuracy of the predictors. False discovery rates, a confidence measure that accounts for multiple testing, <0.05 were considered significant for differential expression and <0.25 for gene set analysis, as recommended by Subramanian ). This microarray data set is available from the Gene Expression Omnibus (Edgar ) under accession number GSE60542.

Results

No strong global transcriptional difference between N0 primary tumours, N+ primary tumours and nodal metastases

We profiled the transcriptomes of 11 primary PTCs with no detectable nodal invasion, 17 primary PTCs with nodal invasion and 17 patient-matched nodal metastases. We also profiled a number of control samples. These included 24 patient-matched non-cancerous thyroid tissues (11 from N0, 13 from N+ patients), and 4 normal lymph nodes, and technical and biological replicates including additional nodal metastasis for 3 patients, adjacent blocks for 5 primary tumours, 4 non-cancerous thyroid tissues and 1 nodal metastasis. The non-cancerous thyroid tissues were taken from the thyroid tissues adjacent to the tumor, and in the contralateral thyroid lobe whenever possible. To compare the global molecular phenotypes of these samples, we projected them on the two first principal components, representing together 46% of the total variance of the data set (Figure 1). Samples were grouped into three clusters: tumour tissues, non-cancerous thyroid tissues and normal lymph nodes. We eliminated from further analysis five presumably mislabelled ‘tumour' samples with global phenotypes akin to non-cancerous tissues (Figure 1, red labels). The tumour cluster included N0, N+ primary tumours and nodal metastases demonstrating the absence of a strong global transcriptional difference between these tissues.
Figure 1

Principal component analysis of the data set. Samples are projected on the two first components, which explain 28.9% and 16.7% of the total variance of the data set, respectively. Red denotes mislabelled samples. Orange denotes samples with high lymphoid tissue contamination (see Figure 4A).

The transcriptomes of primary tumours and patient-matched lymph node metastases are highly correlated owing to both patient and tumour common backgrounds

Several publications reported that the global transcriptomes of primary tumours are more related to those of their associated metastasis than to the metastasis of other primary tumours from other patients (Perou ; Hao ; O'Donnell ; Weigelt ; Roepman ; Feng ; Inamura ; Vecchi ; Ellsworth ). To verify if this applied to our data, we ran a hierarchical clustering on all pairs of nodal metastases and matched primary tumours (Figure 2A). Samples clustered by patient for 11 in 13 pairs, confirming earlier studies.
Figure 2

Correlation between expression profiles of nodal metastases and patient-matched primary tumours. (A) Hierarchical clustering of primary tumours (PT, in grey) and patient-matched lymph node metastases (LNM, in black). Numbers denote patient identity. Patient-matched samples cluster together, except for patients 7 and 8 (orange labels). (B) Panel ‘PT replicates': correlation between blocks from the same primary tumour (brown) and between primary tumours from different patients (grey). Panel ‘PTC-LNM': correlations between primary tumours and patient-matched nodal metastasis (brown) and between primary tumours and unrelated nodal metastasis (grey). Panel ‘N replicates': same as ‘PT replicates, using non-cancerous tissues instead of primary tumours. Panel ‘PTC-N': correlation between primary tumours and patient-matched non-cancerous thyroid tissues (brown) and between primary tumours and unrelated non-cancerous thyroid tissues (grey). Panel ‘LNM-N': same as ‘PTC-N' using nodal metastases instead of primary tumours.

The present and previous analyses provided little quantitative insight and did not investigate the contribution of patient and tumour backgrounds in determining sample similarities. Nodal samples were significantly more correlated with primary tumours from the same patients than with unrelated primary tumours (Figure 2B). The correlations between matched primary/nodal samples and between primary tumour replicates were not significantly different. Patient-matched non-cancerous and primary tumour samples were significantly more correlated than unmatched primary and non-cancerous samples. Thus, part of the primary tumour phenotype is related to patient background. The former correlation, however, was significantly lower than the correlations between tumour replicates and between non-cancerous tissue replicates. Patient-matched nodal metastasis and non-cancerous thyroid tissues were not more correlated than equivalent unmatched samples. We concluded that the observed pairing of the patient-matched primary and nodal transcriptomes was owing to both tumour- and patient-specific backgrounds.

Normal lymphoid and non-cancerous thyroid tissues likely confounded the expression differences between primary tumours and nodal metastases

The transcriptomes of primary tumours and nodal metastases were undistinguishable on a global scale, yet defined gene expression signature may set them apart and provide insights into the metastatic process. We performed a supervised search for genes differentially expressed between N+ primary tumours and their matched metastases (details in Material and Methods) – an approach widely used in previous studies (Huang ; Kikuchi ; Nagata ; Tamoto ; Croner ; O'Donnell ; Roepman ; Xi ; Bidus ; Inamura ; Méndez , 2011; Nguyen ; Pei ; Kashiwazaki ; Kroon ; Wong ; Smeets ). We detected 1074 probes more expressed in metastases than in primary tumours (Supplementary Table S3). To gain functional insight, we ran a gene set enrichment analysis (GSEA) (Subramanian ) using a gene set database composed of the Gene Ontology biological processes based on categories, signatures of primary tumour vs nodal metastasis from previous studies (Nagata ; Hao ; Hoang ; Roepman ; Xi ; Bidus ; Finak ; Wang ; Feng ; Inamura ; Nguyen ; Suzuki and Tarin, 2007; Casey ; Ellsworth ; Edfeldt ) and a signature of normal thyroid differentiation (Tomás ). Gene sets from previous studies and gene sets associated with the immune system were upregulated in nodal metastasis (Figure 3A). This signal could have resulted from the trivial fact that we compared tissues with two different backgrounds, that is, thyroid and lymphoid. To address this confounding variable problem, we compared the fold changes of probes obtained when comparing nodal metastases vs primary tumours with the fold changes obtained when comparing normal lymph nodes vs non-cancerous thyroid (Figure 3B). The two variables were highly correlated, ρS=0.39 (P<2 × 10−16), demonstrating that the 1074 upregulated probes and the gene sets were not informative about the metastatic process, but were due to the confounding effect of normal lymphoid cells.
Figure 3

Gene expression differences between nodal metastases and patient-matched primary tumours. (A) A gene set enrichment analysis (Subramanian ) for the nodal metastases vs patient-matched primary tumours comparison with gene sets from the Gene Ontology Biological Processes database (black) and curated gene sets from previous studies investigating nodal metastasis or stromal gene expression (Hoang ; Wang ; Feng ; Suzuki and Tarin, 2007; Casey ; Ellsworth ; blue, only significant gene sets are shown (see Supplementary Table S2 for the full list of investigated gene sets) and a signature of thyroid differentiation (Tomás ; green). Red bars denote increased expression in metastases compared with primary tumours and green bars denote decreased expression. (B) Each point denotes a gene with expression average fold change between normal lymph node (NLN) and non-cancerous thyroid (Thyroid) on the x axis and average fold change between nodal metastases (LNM) and primary tumours (PTC) on the y axis. Spearman's correlation coefficient between these two variables is 0.39 (P<2 × 10−16).

A lower number of probes, 27, were downregulated in metastases compared with primary tumours (Supplementary Table S3). Gene set enrichment analysis detected the thyroid differentiation signature as the unique downregulated gene set (Figure 3A). This raised the possibility that expression negatively associated with nodal metastases was confounded by the absence of normal, differentiated, thyroid cells in lymph nodes.

Correcting for contaminant reveals an increased stromal signal in nodal metastases compared with matched primary

The normal lymph node transcriptomes enabled estimating lymphoid cell fraction in the nodal metastasis (Material and Methods and Supplementary Figure S1). It was below the 30% limit as determined by the pathologic examination in all but two samples (Figure 4A). These two samples were precisely those not clustering with their matched primary tumour in Figure 2A, strongly suggesting that these two exceptions to the general clustering pattern were caused by low tumour content.
Figure 4

Correcting for lymphoid content reveals an upregulation of stroma-related genes in nodal metastasis compared with primary tumours. (A) Estimate of the fraction of normal lymphoid tissues in nodal metastasis samples (calculation explained in Material and Methods and illustrated in Supplementary Figure S1). (B) Same as in Figure 3A, except that expression profiles of nodal metastases were adjusted for normal lymphoid tissue content.

Expression data were adjusted for lymphoid cell fraction (Materials and Methods and Supplementary Figure S1) and gene set enrichment was investigated as described in the previous section (Figure 4B; see Supplementary Table S3 for the list of differentially expressed genes). Two signatures of nodal metastasis from breast cancer studies (Suzuki and Tarin, 2007; Ellsworth ), and one from gastric cancer using laser capture microdissection (Wang ), were upregulated in PTC nodal metastases. Two signatures obtained by comparing stroma with other cell types (Finak ; Casey ) were also upregulated in the PTC metastases. Surprisingly, the three cancer-derived signatures were downregulated in nodal metastases in the original publications (Wang ; Suzuki and Tarin, 2007; Ellsworth ). To further investigate why the same signatures had an opposite regulation in PTC metastases, we estimated their expression in primary tumour RNA-seq data from The Cancer Genome Atlas (TCGA). All three signatures had a significantly higher median expression in breast than thyroid tumours (Supplementary Figure S2). Thus, the referential defining of ‘up-' and ‘nodal up-' and ‘downregulation' is different in these two types of cancer. An equal expression of stroma-related genes in nodal metastases could lead to opposite senses of differential expression when comparing with related primary tumours. We noted, however, that this tentative explanation could not apply to gastric cancer because it had the lowest stroma-related expression among all TCGA cancers.

Differential expression between N0 and N+ tumours is reproducible in patient-matched non-cancerous tissues, but not in independent PTC data sets

The unsupervised analysis (Figure 1) revealed no global difference between the transcriptomes of N0 and N+ primary tumours. Again, we searched for specific signatures of nodal status with a supervised analysis. A preliminary analysis determined that nodal status was not confounded with any other recorded clinical variables in our samples (variables listed in Supplementary Table S1). Mutational status was not associated with nodal status: 16 out of 26, that is ∼60% of PTC were BRAF mutated (8 N0 and 8 N+). We detected 129 probes more expressed in N+ primary tumours compared with N0 and 11 less expressed (Supplementary Table S3). Gene set enrichment analysis uncovered a single significant gene set, a signature upregulated in nodal metastases in a previous breast cancer study (Suzuki and Tarin, 2007), but downregulated in N+ samples in our data set. As we will see below, such discrepancy is typical. The same analysis was performed with the non-cancerous tissues adjacent to the N+ and N0 tumours. Eight hundred and forty-seven probes were detected as upregulated and 8 as downregulated in non-cancerous tissues adjacent to N+ tumours (Supplementary Table S3). Three gene sets reached significance with GSEA: calcium-independent cell–cell adhesion was upregulated; two signatures of upregulated genes in nodal metastases elsewhere (Feng ; Suzuki and Tarin, 2007) were downregulated. Hence, the categories associated with nodal infiltration seemed similar in the tumour and matched non-cancerous tissues. To confirm this impression, we estimated with GSEA the expression of the tumour-derived N+ signature in the non-cancerous tissues (Figure 5A and B), and conversely, of the non-cancerous tissue-derived N+ signature in the tumours (Figure 5C and D). Both had a consistent expression in the alternate patient-matched tissue. In other words, there was high similarity in transcriptional signals associated with nodal status in the non-cancerous and tumour tissues. This is a straightforward consequence of the established fact (Figure 2) that transcriptomes of a tumour and its matched non-cancerous tissues were correlated on a global scale.
Figure 5

Coherence of N0 (A–D) The same N0 vs N+ differential expression is present in tumour and patient-matched non-cancerous tissues. We use the procedure and standard graphical representation of GSEA (Subramanian ). Genes are ranked from the most upregulated in N+ to the most downregulated. Genes belonging to the investigated signature are denoted with black bars at the bottom of each panel. The green curve denotes the enrichment score. The more biased on the left, resp. right, the more the signature is upregulated in N+, resp. N0. (A) A signature comprising the 200 most upregulated genes in N+ primary tumours compared with N0 tumours, which was evaluated in non-cancerous thyroid tissues: it is upregulated as well. (B) Same as in (A) with the 200 most downregulated genes. (C and D) Same as in (A and B), except that the signatures were extracted from non-cancerous tissue data and then evaluated in primary tumours. (E) The N0 vs N+ differential expression is irreproducible across independent data sets (Delys ; Detours ; Dom ). Rows depict data set and column N0 vs N+ signatures extracted from specific data sets. Signatures up- and downregulated in N+ are shown in different panels. Colours represent statistical significance, with black denoting P>0.05, red denoting significant downregulation and green significant upregulation. For example, the third column, Delys ), Down corresponds to the genes downregulated in N+ in the data set of Delys . Hence, the colour of diagonal elements was set to grey. Delys ) signature, however, is not associated with nodal status in the present study nor in the study of Dom ), yet is significantly upregulated in N+ tumours in the TCGA thyroid data set. Thus, this signature is not corroborated in two data sets and is expressed in the opposite direction in a third data set. Similarly, discordant results were obtained in a negative control obtained by running the exact same calculations on the same data, but in which N0 and N+ class labels were permuted randomly across the samples of each study.

To validate these results, we considered the 129 and 11 genes sets as signatures and evaluated their expression in three published PTC data sets with nodal status information. These signatures were not consistently regulated. To assess comprehensively the discrepancies between data sets, we selected the 200 top genes ordered by their fold change of expression and associated with nodal status in each one of the three data sets and in our data set. We then tested all these signatures in all data sets (Figure 5C). No signature showed consistent expression across data sets. Taken together, these results suggest that the signals associated with nodal status are cohort-specific – all the PTC cohorts are relatively small, except TCGA – and are associated with patient rather than tumour background.

Nodal invasion can be predicted from primary tumour transcriptomes for some, but not all cancer types

A number of studies present predictors of nodal status from primary tumour gene expression (Kikuchi ; Tamoto ; O'Donnell ; Roepman ; Inamura ; Méndez ; Pei ; Kashiwazaki ). Although the previous section ruins any hopes to find a relevant signal in our data, we conducted a comparable analysis to illustrate a widely overlooked statistical point. Using state-of-the-art machine learning methods and a nested cross-validation protocol immune to feature and model selection biases (Material and Methods), nodal status could be predicted with accuracy of 63%. However, error estimation is variable depending on the specific split of the data into training and validation samples (Ambroise and McLachlan, 2002) – an issue rarely controlled for. We reran the above analysis 100 times on our data, but each time the N0/N+ labels were shuffled randomly among the samples. Twelve per cent of the runs produced accuracy estimates >63% (Figure 6A) in our data set, that is, a 63% accuracy has a high probability to occur by chance. We extended this analysis to other published PTC data sets and obtained similar results (Figure 6A), except for the TCGA thyroid data set.
Figure 6

Classification accuracy of N0 (A) Colour bars denote the fraction of correctly classified samples in each data sets. The black density lines denote the null distribution of this fraction in data sets in which the N+ and N0 class labels have been permuted randomly across samples. Accuracy reached significance only for the TCGA thyroid data set. (B) The same information as in (A) is represented in a compact format for 14 TCGA data sets. Each row stands for a specific type of cancer. Black shapes represent the null distribution of the N+ vs N0 classification accuracy, red bar the 95% confidence limits and blue squares the accuracy obtained on the original, non-permuted, data. Significant classification accuracies were obtained for breast, thyroid and head and neck cancers. PAAD, pancreatic cancer; STAD, stomach cancer; PRAD, prostate cancer; KIRC, kidney clear cell cancer; READ, rectal cancer; BLCA, bladder cancer; COAD, colon cancer; LUSC, lung small cell cancers; HNSC, head and neck cancer; LUAD, lung adenocarcinomas; THCA, thyroid cancer (same as in A); BRCA, breast cancer. Each data set was balanced as to have an equal number of N+ and N0 samples.

This problem is more likely with small data sets. With 340 PTC samples, the TCGA made it possible to overcome the small size of our data set. We ran the classification analysis, including the random shuffling control, on the TCGA PTC data set and all other TCGA data sets with nodal status information (Figure 6B). The N0 vs N+ classification accuracy was significant in thyroid, breast and head and neck, but not high (<70%). No evidence for N0 vs N+ classification accuracy better than random was found in 11 other cancers. Note that significant classification accuracies were obtained for large data sets. Thus, data set size and cancer type influenced the classification potential of expression data.

Discussion

Nodal metastases tended to be more similar to their patient-matched primary tumours than to unrelated primary tumours (Figure 2). This trend, also present in previous studies (Perou ; Hao ; O'Donnell ; Weigelt ; Roepman ; Feng ; Inamura ; Vecchi ; Ellsworth ), has been interpreted in the context of the stepwise and the ab initio models of metastatic progression. In contrast to the stepwise model, the ab initio model posits that metastatic potential is acquired early during tumour expansion and is detectable in bulk samples. It is also compatible with a small or no difference between primary and nodal tumours (Inamura ; Ellsworth ). However, the evidence provided by the expression profile comparison is limited, as it may depend on the cell types present in the samples (Roepman ). Here, we found that all the mismatched nodal samples were highly contaminated by lymphoid tissues. Another potential confounder is the presence of multiple primary foci (Weigelt ). The interpretation of similarity results is further complicated by our findings that primary tumours are more similar to their contralateral non-cancerous thyroid tissues than to unrelated non-cancerous thyroid tissues. Thus, the stronger similarity between primary and nodal tumours is owing to both acquired tumour-specific features and patient-specific genetics and biology. The global transcriptional profiles of primary PTCs and nodal metastases were very similar (Figure 1). Yet, a supervised analysis revealed hundreds of differentially expressed genes, most of them suggesting an important immune component in the metastases (Figure 3). Again, this result echoes studies conducted in other cancer types (Weigelt ; Wang ; Vecchi ; Ellsworth ; Xie ). A number of these studies have addressed or discussed the contamination of nodal samples by lymphoid tissues (Hoang ; O'Donnell ; Weigelt ; Suzuki and Tarin, 2007; Vecchi ; Ellsworth ; Méndez ). We proposed an original approach to the problem. We demonstrated that in our data fold changes were correlated to those obtained from the comparison of normal lymph nodes and non-cancerous thyroids (Figure 3), suggesting an overwhelming confounding effect of lymphoid tissues. Adjusting for lymphoid tissue content revealed the upregulation of five published stroma-related signatures in other tissues (Finak ; Wang ; Suzuki and Tarin, 2007; Casey ; Ellsworth ). Four of five of these signatures were derived from the transcriptional profiles of laser capture microdissected primary and nodal tumours (Wang ; Ellsworth ), or epithelium and stroma (Finak ; Casey ), suggesting that our statistical adjustment removed lymphoid contamination, but not the biologically relevant signal. Three of 9 and 8 of 13 genes were shared between two studies of microdissected lung and breast primary, respectively, with associated nodal tumours (Hoang ; Ellsworth ) and our unadjusted gene list, which mostly reflects immune infiltration. These overlaps could reflect imperfect microdissection or a small-amplitude immune signal unrelated to contamination. Finally, non-cancerous thyroid cells present in the primary tissue blocks could have produced the spurious downregulation of thyroid differentiation genes (Figure 3). Because the adjustment for the lymphoid signal also removed the thyroid signal (Figure 4B), the latter was probably an indirect consequence of lymphoid contamination: the transcriptomes of bulk tissues contaminated by lymphoid tissues are obviously less thyroid-like. We compared the expression profiles of primary PTCs, which are associated or not with nodal metastasis. We found a signal associated with nodal invasion, in agreement with studies in other cancer types (Kikuchi ; Tamoto ; O'Donnell ; Roepman ; Inamura ; Méndez ; Pei ; Kashiwazaki ). To further gauge the significance of these results, we compared the signatures associated with N0/N+ obtained from four thyroid cancer data sets. The signature obtained from one data set was typically not associated with N0/N+ in the three others (Figure 5). Thus, our N0/N+ signature and, presumably, those published in other studies of comparable size, are not general markers of nodal invasion. Anecdotal comparisons of signatures already hinted at this conclusion (Méndez ). Intriguingly, the same N0/N+ transcriptional signal was also present in the patient-matched non-cancerous tissues in our data (Figure 5). To the best of our knowledge, no other study presented this control. It suggests that the signature captures patient, rather than tumour-specific characteristics. This could arise from a predisposition to develop nodal metastases. However, the inconsistency of the signals suggests a cohort-specific bias. Only 2 (Takada ; Kroon ) among more than 20 studies we surveyed tested whether the N0/N+ status was associated with other available clinical variables. We searched for such association in our data, but did not find any. Yet, association with unobserved patient characteristics cannot be ruled out and is not unlikely given the limited sample size that characterizes the studies at hand. Accuracy of N0 apart from N+ primary tumours classification was 63% in our thyroid cancer data set. However, a random permutation control demonstrated that an equal or higher accuracy might result from chance alone with a probability of 0.12. None of the studies we surveyed presented this control. It is particularly needed for small studies because accuracy estimates have a high variance (Figure 6). Published N0/N+ classification accuracies range from near perfect to near random. Several factors may account for this variation, including the type of cancer investigated, the comprehensiveness of lymph node investigation and technical details of the statistical procedure. Feature selection bias is an endemic flaw in the ‘omic' literature that results in near-perfect – but grossly inflated – classification accuracies (Ambroise and McLachlan, 2002). It occurs when the marker genes are determined using the entire data set and then when the model is adjusted and ‘validated' with classical cross-validation. Such approach breaks the statistical principle of separation of training and validation data. Five studies reporting accuracies of 89–100% presented this flaw (Takada ; Tamoto ; Inamura ; Méndez , 2011). Studies without feature selection bias reported less successful classification. Focusing on the classification of N0 vs N+ with more than 10 nodes invaded in breast primary tumours, Huang obtained an accuracy of 90%. However, a less contrasted class definition yielded a modest 62% accuracy (Smeets ), in line with our analysis of the TCGA data (57%, P=0/100, N=606; Figure 6) and in line with the fact that nodal status is a strong survival predictor that is independent of gene expression-based prognostic signatures in this disease (Van de Vijver ; Van't Veer ; Ramaswamy ; Wang ; Ivshina ). In penile cancer accuracy was 54%, that is, near random (Kroon ). It was 62–67% in colon cancer (Croner ), but significance could not be reached in TCGA (56%, P=6/100, N=136). Excluding studies with feature selection bias (Takada ; Inamura ) and a study with only four validation samples (Kikuchi ), N0/N+ lung cancers could not be classified (Xi ). We did not reach significance with TCGA lung adenocarcinomas (54%, P=6/100, N=230) and small-cell carcinoma (56%, P=11/100, N=184). In oral cancer, Roepman reported a 77% correct classification and Nguyen reported a 92% correct classification using microdissected samples. The Cancer Genome Atlas head and neck cancer samples could be classified but with lower accuracy (62%, P=0/100, N=198). Overall, our TCGA analyses show that N0 and N+ tumours cannot be classified for most cancers: accuracies are typically in the range 50–60% and not significantly better than permutation controls. Papillary thyroid carcinomas are among the most amenable to N0/N+ classification, but accuracy is still limited (67%, P=0/100, N=340). We have presented new biological and statistical controls that expose some of the limits of published studies of the relationship between primary tumours and nodal metastases. The potential clinical relevance of gene expression to predict nodal metastases seems limited for most cancers.
  57 in total

1.  A molecular signature of metastasis in primary solid tumors.

Authors:  Sridhar Ramaswamy; Ken N Ross; Eric S Lander; Todd R Golub
Journal:  Nat Genet       Date:  2002-12-09       Impact factor: 38.330

2.  Gene-expression profile changes correlated with tumor progression and lymph node metastasis in esophageal cancer.

Authors:  Eiji Tamoto; Mitsuhiro Tada; Katsuhiko Murakawa; Minoru Takada; Gaku Shindo; Ken-ichi Teramoto; Akihiro Matsunaga; Kazuteru Komuro; Motoshi Kanai; Akiko Kawakami; Yoshie Fujiwara; Nozomi Kobayashi; Katsutoshi Shirata; Norihiro Nishimura; Shun-ichi Okushiba; Satoshi Kondo; Jun-ichi Hamada; Takashi Yoshiki; Tetsuya Moriuchi; Hiroyuki Katoh
Journal:  Clin Cancer Res       Date:  2004-06-01       Impact factor: 12.531

3.  Combination of microarray profiling and protein-protein interaction databases delineates the minimal discriminators as a metastasis network for esophageal squamous cell carcinoma.

Authors:  Fen-Hwa Wong; Chi-Ying F Huang; Li-Jen Su; Yu-Chung Wu; Yong-Shiang Lin; Jiun-Yi Hsia; Hsin-Ting Tsai; Sheng-An Lee; Chi-Hung Lin; Cheng-Hwai Tzeng; Po-Min Chen; Yann-Jan Chen; Shu-Ching Liang; Jin-Mei Lai; Chueh-Chuan Yen
Journal:  Int J Oncol       Date:  2009-01       Impact factor: 5.650

4.  Clinically significant prognostic factors for differentiated thyroid carcinoma: a population-based, nested case-control study.

Authors:  Catharina Ihre Lundgren; Per Hall; Paul W Dickman; Jan Zedenius
Journal:  Cancer       Date:  2006-02-01       Impact factor: 6.860

5.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

6.  Gene expression signature predicts lymphatic metastasis in squamous cell carcinoma of the oral cavity.

Authors:  Rebekah K O'Donnell; Michael Kupferman; S Jack Wei; Sunil Singhal; Randal Weber; Bert O'Malley; Yi Cheng; Mary Putt; Michael Feldman; Barry Ziober; Ruth J Muschel
Journal:  Oncogene       Date:  2005-02-10       Impact factor: 9.867

7.  Tumor-specific genetic expression profile of metastatic oral squamous cell carcinoma.

Authors:  Eduardo Méndez; Wenhong Fan; Peter Choi; S Nicholas Agoff; Mark Whipple; D Gregory Farwell; Neal D Futran; Ernest A Weymuller; Lue-Ping Zhao; Chu Chen
Journal:  Head Neck       Date:  2007-09       Impact factor: 3.147

8.  Molecular signatures suggest a major role for stromal cells in development of invasive breast cancer.

Authors:  Theresa Casey; Jeffrey Bond; Scott Tighe; Timothy Hunter; Laura Lintault; Osman Patel; Jonathan Eneman; Abigail Crocker; Jeffrey White; Joseph Tessitore; Mary Stanley; Seth Harlow; Donald Weaver; Hyman Muss; Karen Plaut
Journal:  Breast Cancer Res Treat       Date:  2008-03-29       Impact factor: 4.872

9.  Proteome analysis and tissue microarray for profiling protein markers associated with lymph node metastasis in colorectal cancer.

Authors:  Haiping Pei; Hong Zhu; Shan Zeng; Yixiong Li; Huixiang Yang; Liangfang Shen; Jia Chen; Liang Zeng; Jianghong Fan; Xiaogang Li; Yuewen Gong; Hong Shen
Journal:  J Proteome Res       Date:  2007-06-02       Impact factor: 4.466

10.  Genome-wide gene expression profiling suggests distinct radiation susceptibilities in sporadic and post-Chernobyl papillary thyroid cancers.

Authors:  V Detours; L Delys; F Libert; D Weiss Solís; T Bogdanova; J E Dumont; B Franc; G Thomas; C Maenhaut
Journal:  Br J Cancer       Date:  2007-08-21       Impact factor: 7.640

View more
  21 in total

1.  Molecular Correlates of Metastasis by Systematic Pan-Cancer Analysis Across The Cancer Genome Atlas.

Authors:  Fengju Chen; Yiqun Zhang; Sooryanarayana Varambally; Chad J Creighton
Journal:  Mol Cancer Res       Date:  2018-11-06       Impact factor: 5.852

2.  Identification and validation of potential novel biomarkers to predict distant metastasis in differentiated thyroid cancer.

Authors:  Wenlong Wang; Cong Shen; Yunzhe Zhao; Botao Sun; Ning Bai; Xinying Li
Journal:  Ann Transl Med       Date:  2021-07

3.  Identification and Validation of a Prognostic Signature for Thyroid Cancer Based on Ferroptosis-Related Genes.

Authors:  Yue Wang; Jing Yang; Shitu Chen; Weibin Wang; Lisong Teng
Journal:  Genes (Basel)       Date:  2022-06-01       Impact factor: 4.141

Review 4.  Single-cell approaches for molecular classification of endocrine tumors.

Authors:  James Koh; Nancy L Allbritton; Julie A Sosa
Journal:  Curr Opin Oncol       Date:  2016-01       Impact factor: 3.645

5.  Identification of key biomarkers for thyroid cancer by integrative gene expression profiles.

Authors:  Jinyi Tian; Yizhou Bai; Anyang Liu; Bin Luo
Journal:  Exp Biol Med (Maywood)       Date:  2021-04-25

6.  Cancer Associated Fibroblasts and Senescent Thyroid Cells in the Invasive Front of Thyroid Carcinoma.

Authors:  Emanuela Minna; Silvia Brich; Katia Todoerti; Silvana Pilotti; Paola Collini; Elisa Bonaldi; Paola Romeo; Loris De Cecco; Matteo Dugo; Federica Perrone; Adele Busico; Andrea Vingiani; Ilaria Bersani; Andrea Anichini; Roberta Mortarini; Antonino Neri; Giancarlo Pruneri; Angela Greco; Maria Grazia Borrello
Journal:  Cancers (Basel)       Date:  2020-01-01       Impact factor: 6.639

7.  Immune gene signature delineates a subclass of thyroid cancer with unfavorable clinical outcomes.

Authors:  Jingtai Zhi; Jiaoyu Yi; Mengran Tian; Huijuan Wang; Ning Kang; Xiangqian Zheng; Ming Gao
Journal:  Aging (Albany NY)       Date:  2020-04-02       Impact factor: 5.682

8.  Identification of hub genes in papillary thyroid carcinoma: robust rank aggregation and weighted gene co-expression network analysis.

Authors:  Yang Liu; Ting-Yu Chen; Zhi-Yan Yang; Wei Fang; Qian Wu; Chao Zhang
Journal:  J Transl Med       Date:  2020-04-16       Impact factor: 5.531

9.  Integrated bioinformatics analysis of the association between apolipoprotein E expression and patient prognosis in papillary thyroid carcinoma.

Authors:  Qunguang Jiang; Wenqian Feng; Chengfeng Xiong; Yunxia Lv
Journal:  Oncol Lett       Date:  2020-01-17       Impact factor: 2.967

10.  Tumor Immune Microenvironment Characterization of Primary Lung Adenocarcinoma and Lymph Node Metastases.

Authors:  Yuan Zhou; Xinying Shi; Huan Chen; Beibei Mao; Xue Song; Lingling Gao; Jiao Zhang; Ying Yang; Henghui Zhang; Guo Wang; Wei Zhuang
Journal:  Biomed Res Int       Date:  2021-07-10       Impact factor: 3.411

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.