Intratumor mutational heterogeneity has been documented in primary non-small-cell lung cancer. Here, we elucidate mechanisms of tumor evolution and heterogeneity in metastatic thoracic tumors (lung adenocarcinoma and thymic carcinoma) using whole-exome and transcriptome sequencing, SNP array for copy-number alterations (CNAs), and mass-spectrometry-based quantitative proteomics of metastases obtained by rapid autopsy. APOBEC mutagenesis, promoted by increased expression of APOBEC3 region transcripts and associated with a high-risk APOBEC3 germline variant, correlated with mutational tumor heterogeneity. TP53 mutation status was associated with APOBEC hypermutator status. Interferon pathways were enriched in tumors with high APOBEC mutagenesis and IFN-γ-induced expression of APOBEC3B in lung adenocarcinoma cells, suggesting that the immune microenvironment may promote mutational heterogeneity. CNAs occurring late in tumor evolution correlated with downstream transcriptomic and proteomic heterogeneity, although global proteomic heterogeneity was significantly greater than transcriptomic and CNA heterogeneity. These results illustrate key mechanisms underlying multi-dimensional heterogeneity in metastatic thoracic tumors. Published by Elsevier Inc.
Intratumor mutational heterogeneity has been documented in primary non-small-cell lung cancer. Here, we elucidate mechanisms of tumor evolution and heterogeneity in metastatic thoracic tumors (lung adenocarcinoma and thymic carcinoma) using whole-exome and transcriptome sequencing, SNP array for copy-number alterations (CNAs), and mass-spectrometry-based quantitative proteomics of metastases obtained by rapid autopsy. APOBEC mutagenesis, promoted by increased expression of APOBEC3 region transcripts and associated with a high-risk APOBEC3 germline variant, correlated with mutational tumor heterogeneity. TP53 mutation status was associated with APOBEC hypermutator status. Interferon pathways were enriched in tumors with high APOBEC mutagenesis and IFN-γ-induced expression of APOBEC3B in lung adenocarcinoma cells, suggesting that the immune microenvironment may promote mutational heterogeneity. CNAs occurring late in tumor evolution correlated with downstream transcriptomic and proteomic heterogeneity, although global proteomic heterogeneity was significantly greater than transcriptomic and CNA heterogeneity. These results illustrate key mechanisms underlying multi-dimensional heterogeneity in metastatic thoracic tumors. Published by Elsevier Inc.
Understanding the mechanisms by which metastatic lung adenocarcinoma (LUAD)
and thymic carcinoma (TC) evolve may provide greater insight into tumor progression
and may guide novel therapeutic avenues. Autopsy programs established to harvest
tumor tissue from metastatic sites at the end of life have demonstrated significant
heterogeneity depending on tumor type (Campbell et
al., 2010; Gundem et al., 2015;
Kumar et al., 2016; Liu et al., 2009; Patch
et al., 2015; Shah et al., 2009).
For example, metastatic pancreatic cancer has high inter-metastatic heterogeneity of
genomic rearrangements (Campbell et al.,
2010), but not of driver mutations (Makohon-Moore et al., 2017), whereas metastatic clear-cell renal
carcinoma has recurrent driver mutations that occur late within individual
metastases (Gerlinger et al., 2014),
resulting in high inter-metastatic heterogeneity. Within non-small-cell lung cancer
(NSCLC), previous work has characterized the evolution of primary tumors and has
demonstrated significant intra-tumor heterogeneity in copy-number alterations and
driver mutations (de Bruin et al., 2014;
Jamal-Hanjani et al., 2017; Zhang et al., 2014b). In these studies,
mutations generated by the cytosine deaminase activity of APOBEC (apolipoprotein B
mRNA editing enzyme, catalytic polypeptide-like) family of enzymes (substitutions of
C with T or G in TCA or TCT motifs (Roberts et al.,
2013) were found to occur late in the evolution of primary tumors (de Bruin et al., 2014) and were associated with
subclonal mutations (Jamal-Hanjani et al.,
2017). However, it is also critically important to understand the
evolution of metastatic NSCLC, given metastatic lineages develop early in primary
tumor development (Zhao et al., 2016). Thus,
the evolution of metastatic disease may be substantially different from primary
tumors. Moreover, previous NSCLCtumor heterogeneity studies have largely focused on
whole-exome sequencing approaches. The evolution of tumors at the level of
transcriptome or proteome and the underlying mechanisms that may generate
multidimensional heterogeneity remain largely unknown.Here, we sought to address the following questions:What is the degree of genomic (mutational and copy number),
transcriptomic, and proteomic heterogeneity within and between
metastases of a given patient?What is the relationship between these three levels of
heterogeneity?What are the potential drivers of such heterogeneity?To address these questions, we performed rapid (“warm”)
autopsies on four patients with LUAD and one patient with TC. The autopsies were
initiated within 3 h of death to allow for procurement of sufficient quantity and
quality of DNA, RNA, and protein from metastatic tumor tissue for whole-exome and
transcriptome sequencing, DNA copy-number analysis, and mass-spectrometry-based
proteomics. Our integrated analysis of the genome, transcriptome, and proteome
uncovered key mechanisms likely driving proteogenomic heterogeneity within and
across metastatic sites.
RESULTS
Sampling of Metastatic Tumors through Rapid Autopsy Protocol
We have established a rapid (“warm”) autopsy protocol for
thoracic malignancies (including lung cancers and thymic epithelial tumors,
among others) at the NIH Clinical Center. Under this protocol, patients with
metastatic disease who are near the end of life receive inpatient hospice care.
Upon death, an autopsy is performed within 3 h to procure sufficient quantity
and high-quality of DNA, RNA, and protein from all possible sites of metastatic
disease. For this study, we enrolled four patients with LUAD: patients RA000 and
RA004 with oncogenic KRAS mutations who were previous smokers
and patients RA003 and RA005 with EGFR mutations who were both
non-smokers. We additionally enrolled patient RA006 with squamous cell TC, a
non-smoker, who died within 1.5 years of diagnosis after no response to
treatment (Table S1).
All our study patients, except for patient RA000, were diagnosed with stage IV
disease, had received chemotherapy and/or targeted therapy (range 2–8
lines of therapy), and were previously enrolled in a clinical trial at the NIH
Clinical Center (Table
S1). For each patient, we harvested 44–183 metastatic tumor
lesions from multiple organs, including lung, liver, and kidney (Table S1). We selected a total of
40 tumor samples for further analyses based on tumor content by histology (Figure 1; Data S1). Before analysis, three
samples with sequencing data were removed from the study because of low tumor
content (Table S1).
Additional tumor tissue collected from the likely primary site at the time of
diagnosis was available for whole-exome sequencing in three patients (RA000,
RA003, and RA006).
Figure 1.
Flowchart of Study
Five patients underwent rapid autopsy (defined here as within 3 h of
patient death). Thirty-three metastatic lung adenocarcinoma (LUAD) and 7
metastatic thymic carcinoma tumors from lung, liver, and kidney were subjected
to whole-exome sequencing (WES), RNA sequencing, copy-number analysis, and
mass-spectrometry-based proteomics, followed by assessment of intra- and
inter-metastatic tumor heterogeneity. Three samples were removed from the study
after sequencing because of low tumor content.
Intra- and Inter-Metastatic Mutational Tumor Heterogeneity Is Highly Variable
within Patients and Can Be Extreme
For each patient, we performed whole-exome sequencing (WES) of the
metastatic tumors, primary tumors, where available, and matched germline DNA. A
range of 182 (RA005) to 1,058 (RA003) non-silent mutations were identified per
patient (Table S2). RNA
sequencing (RNA-seq) demonstrated a high, independent validation rate of WES
(Data S2), similar
to previous studies (Network, 2014). We found significant mutational
heterogeneity within each patient with non-truncal variants ranging from 67% in
RA000 to 99% in RA003 (Figure 2A).
Figure 2.
Intra- and Inter-metastatic Heterogeneity of Somatic Mutations of Tumors from
All Autopsy Patients
(A) Heatmaps depict the distribution of non-silent somatic mutations
among metastatic and, where available, primary tumors for each patient. Driver
mutations are listed to the left of each heatmap. The total number of non-silent
mutations and the percentage of non-truncal mutations are shown below each
heatmap. The bars to the right of each heatmap summarizes intra- and
inter-metastatic heterogeneity; mutations present in all regions (purple), in
more than one, but not all (green), or only in one region (brown).
(B and C) Jaccard similarity coefficients of metastases within each
patient based on mutations identified by exome sequencing (B) and expressed
variants by RNA-seq (C). Each circle represents the Jaccard similarity
coefficient between two metastases. Coefficients range from zero to one
representing highest and lowest heterogeneity, respectively. The p value for the
difference in mean Jaccard similarity coefficients between two groups of
patients is shown. P, tumor sample obtained at diagnosis; L, lung tumor at
autopsy; Li, liver tumor at autopsy; K, kidney tumor at autopsy.
Activating mutations in EGFR (RA003) and
KRAS (RA000, RA004) were present in primary and all
metastatic tumors of the respective patients. However, an activating
HRAS mutation was present in all metastatic sites of
patient RA006 at autopsy, but not in the primary tumor at the time of diagnosis.
Patients RA003 and RA006 had an average of 12 and 14 non-truncal driver
mutations, respectively. In contrast, there was an average of only 3.7
non-truncal driver mutations in the other three patients (RA005, RA004, and
RA000), demonstrating variability in metastatic site-specific driver mutation
acquisition among patients (Figure 2A).
Intra-tumor mutational heterogeneity in driver mutations was found when tested
in RA003 tumors L2d/L2e (Figure 2A).Next, we calculated Jaccard similarity coefficients (defined as the
ratio of shared to all mutations between two metastatic tumors) for each patient
to quantitatively assess intra- and inter-metastatic tumor genomic heterogeneity
(values 0–1 correspond to the range from minimal to maximal heterogeneity
(Blokzijl et al., 2016; Makohon-Moore et al., 2017) (Table S3). The means of
the Jaccard similarity coefficients for each patient ranged from 0.25 (RA003) to
0.73 (RA000) and were significantly different between patients (p = 2.2 ×
10−16, Kruskal-Wallis ranksum test; Figure 2B). Two patients, RA003 and RA006, were clear
outliers and exhibited what we termed “extreme” mutational
heterogeneity. These two patients had significantly lower combined mean Jaccard
similarity coefficients compared to the other patients (mean 0.28 vs. 0.57, p =
2 × 10−16, chi-square test) (Figure 2B). Jaccard similarity coefficients exhibited
a similar trend based on expressed mutations identified by RNA-seq analysis
(Figure 2C). Additional sequencing
resulting in a median exome coverage of 487× (range 435 to 528) for the
tumors from patients RA003 and RA006 did not substantially alter the observed
extreme level of heterogeneity (Figure S1; Table S3).To further evaluate mutational heterogeneity within each patient, we
applied PyClone to estimate the proportion of cancer cells identified with a
given mutation and the clonal structure of each tumor. As expected, truncal, key
driver mutations, such as EGFR and KRAS, were
clonal with high cellular prevalence whereas many shared or private mutations
had medium to lower cellular prevalence (subclonal) (Table S2). Patient RA003 had the
highest number of subconal mutations (range 243–506), followed by
patients RA006 (range 23–103) and RA005 (40–85) and then patients
RA004 (range 4–50) and RA000 (2–53) (Table S2). Similarly, patient RA003
had the largest and highest number of subclonal clusters, followed by patients
RA006 and RA005 (Figure
S2A). In contrast, patients RA004 and RA000 had very few subclonal
clusters (Figure S2A),
demonstrating differences in clonal structure between patients. However,
mutations that may be lost because of copy-number loss were not accounted for in
these analyses (McPherson et al., 2016).
Collectively, these results indicate that intra- and inter-metastatic mutational
heterogeneity can vary considerably among patients, with extreme heterogeneity
evident in a subset of patients.
APOBEC Mutagenesis Strongly Correlates with Mutational Tumor
Heterogeneity
Next, we generated phylogenetic trees by two independent methods, and
both showed a similar evolutionary pattern (Figures 3 and S2B–S2E). We then analyzed
mutational signature profiles for each tumor by layering phylogenetic trees with
mutational signatures to elucidate whether specific mutational processes could
explain the observed variability in mutational tumor heterogeneity. Smoking
signature mutations (C → A) were highly prevalent in patients RA000 and
RA004, who were smokers (Figure 3).
Mutations generated by the cytosine deaminase activity of APOBEC were most
prevalent in patients RA003, RA005, and RA006, all of whom were non-smokers
(Figure 3; Data S3). Smoking signature
mutations were generally truncal, whereas APOBEC signature mutations were
largely in the shared and private branches of the phylogenetic trees (Figures 3A–3E; Data S3).
Figure 3.
Inferred Phylogeny, Mutational Signatures, and APOBEC-Associated
Heterogeneity
(A–E) Phylogenetic trees were generated by the Phangorn method
from all validated mutations identified by whole-exome sequencing from tumors
within patient RA003 (A), RA006 (B), RA005 (C), RA000 (D), and RA004 (E) using
the maximum parsimony method. Trees are rooted in mutations common to all tumors
within each patient. Trunk and branch lengths are proportional to the numbers of
mutations acquired on the corresponding trunk or branch. Each private branch
represents mutations unique to each individual tumor. Colors represent COSMIC
mutational signatures. Select driver mutations and focal copy-number
amplifications/deletions are mapped to the trunks and branches as indicated.
Asterisks denote nonsense mutations.
(F–H) APOBEC fold enrichment correlates with Jaccard similarity
coefficients across NCI autopsy patient tumors based on exome variants (F),
expressed variants (G), and TRACERx lung adenocarcinoma patient tumors (patients
with five or more tumors shown) (H). Each circle represents mean APOBEC fold
enrichment and the Jaccard similarity coefficient between two tumors from a
given patient. P, tumor sample obtained at diagnosis; L, lung tumor at autopsy;
Li, liver tumor at autopsy; K, kidney tumor at autopsy.
P-values for correlation coefficients are shown.
To further assess the timing of APOBEC-induced mutagenesis, we evaluated
mutational signatures in available tumors collected at the time of diagnosis. We
found no evidence of APOBEC mutagenesis in these samples, including those from
patients RA003 and RA006 (Figures 3A and
3B), indicating that APOBEC signature mutations were acquired later
during further progression of metastatic disease. Next, we evaluated the
relationship between APOBEC mutagenesis and mutational heterogeneity. APOBEC
mutation fold enrichment, a measure of APOBEC mutagenesis (Roberts et al., 2013) (Table S4), strongly correlated with
Jaccard similarity coefficients based on WES variants (Pearson rho =
−0.66, p = 2.2 × 10−16) and expressed variants
by RNAseq (Pearson rho = −0.55, p = 7.704 ×
10−8) (Figures 3F and
3G). To validate this finding, we examined the relationship between
APOBEC mutagenesis and mutational heterogeneity in the TRACERx cohort of 17 LUAD
patients with five or more sites of multi-region exome sequencing (133 total
tumors)(Jamal-Hanjani et al., 2017).
In line with findings in our cohort, we found APOBEC mutagenesis strongly
correlated with Jaccard similarity coefficients (Pearson rho = −0.4689, p
= 1.255 × 10−8) in the TRACERx cohort (Figure 3H). Thus, these results suggest that APOBEC
mutagenesis is a significant contributor to the generation of mutational tumor
heterogeneity.
Expression of APOBEC3 Region Transcripts Correlates with
APOBEC Mutagenesis
Previous analyses of early-stage TCGA LUAD patients have shown a weak
but significant association between expression of APOBEC3
transcripts and APOBEC mutagenesis (Chan et al.,
2015). We examined RNA-seq data from our metastatic cohort to
determine whether APOBEC3 region transcript expression
contributed to the observed variability in APOBEC mutagenesis. Among the LUAD
patients, we found APOBEC3B to be expressed at higher levels
than APOBEC3A; this was particularly evident in the tumors of
patient RA003 (Figure 4A). APOBEC
mutagenesis (measured here by the counts of tCw to
tTw and tGw mutations) (Roberts et al., 2013) was highly correlated
with APOBEC3B expression (Pearson rho = 0.68, p = 9.4 ×
10−5), but not with APOBEC3A expression
(Pearson rho = 0.19, p = 0.34) (Figures S3A and S3b). Analysis of
isoform-specific expression of APOBEC3A and
APOBEC3B with custom TaqMan assays confirmed these results
(Figures S3C and
S3D). Moreover, in multiple tumors of patient RA003 with significant
APOBEC mutagenesis, APOBEC3B was expressed 20- to 50-fold
higher than APOBEC3A (Figures 4A
and 4B).
Figure 4.
Relationship between APOBEC Fold Enrichment, Mutational Signatures,
APOBEC3 Region Transcript Expression,
APOBEC3AB Germline Variant, and Mutant
TP53
(A) APOBEC fold enrichment of each tumor. Asterisks denote significant
enrichment. The dashed line denotes zero enrichment of APOBEC mutations. The
proportion of COSMIC signatures for each tumor appears below APOBEC fold
enrichment. The number of risk alleles for APOBEC3 germline
variant rs12628403 is shown below the proportion of COSMIC signatures. The
expression of APOBEC3B, APOBEC3A, and
APOBEC3AB is shown below the proportion of COSMIC
signatures.
(B and C) The relative expression of isoforms
APOBEC3B1, APOBEC3A1, and
APOBEC3AB in tumor relative to normal lung or liver by
custom TaqMan assays in patient RA003 (B) and patient RA006 (C). Six of the 37
tumors are not included here because of insufficient RNA for sequencing.
(D–F) APOBEC signature mutations per megabase in the TCGA (D) and in the
Broad dataset (E) and the total APOBEC mutations in TRACERx dataset (F) in
TP53 mutant compared with TP53 wild-type
tumors.
(G and H) APOBEC3B expression (G) and
APOBEC3A expression (H) in TP53 wild-type
and mutant tumors in the TCGA dataset. p values shown are adjusted for patient
age and the number of pack years smoked.
‡visual inspection of RNA-seq data shows expression of
APOBEC3AB.
See Figure S4E
for details.
To elucidate factors affecting APOBEC mutagenesis in our patients, we
genotyped an APOBEC3 germline variant, rs12628403, associated
with increased APOBEC mutagenesis (Nik-Zainal et
al., 2014). This germline variant is a proxy for a 30-kb deletion
that fuses the coding region of APOBEC3A with the 3′ UTR
of APOBEC3B to generate a chimeric
APOBEC3A-APOBEC3B
(APOBEC3AB) transcript. This chimeric transcript is more
stable than the APOBEC3A transcript and leads to higher
APOBEC3A protein levels in vitro (Caval et al., 2014). Only patient RA006, the TC
patient, was a carrier of the rs12628403 allele and was predicted to generate
the APOBEC3AB transcript (Figure
4A). Indeed, APOBEC3AB expression in tumors of this
patient was identified by RNA-seq (Figure
4A) and validated using a TaqMan assay (Figure 4C). However, the APOBEC3B transcript was
also expressed in tumors from this patient, suggesting that both may have
contributed to APOBEC mutagenesis. Expression of APOBEC3B
(Pearson rho = 0.77, p = 0.08) and APOBEC3AB (Pearson rho =
0.62, p = 0.18), but not APOBEC3A (Pearson rho = 0.23, p =
0.67), significantly correlated with APOBEC mutagenesis in all tumors from
patient RA006 (Figures
S4A–S4C).Although expression of APOBEC3AB was lower than
APOBEC3B (Figure 4C),
APOBEC3A encoded by APOBEC3AB is considered a more potent
inducer of mutagenesis than APOBEC3B (Caval et
al., 2014; Landry et al.,
2011). Therefore, we quantified the contribution of APOBEC3A and APOBEC3B
to APOBEC mutagenesis by calculating YTCA and
RTCA enrichment (where Y is a purine and R is a
pyrimidine), attributed to differential activity of these enzymes (Chan et al., 2015). In all tumors with high
APOBEC mutagenesis from patient RA006, there was significant enrichment of
YTCA compared to RTCA (Figure S4D), thereby
suggesting APOBEC3A-like mutagenesis as a likely driver of extreme heterogeneity
in this metastatic TC patient with a germline APOBEC3AB
deletion. Together, our results implicate expression of APOBEC3
region transcripts as a mediator of APOBEC mutagenesis in metastatic LUAD and
TC.
TP53 Mutations Are Associated with APOBEC Hypermutator
Status in LUAD
Given the role of TP53 in repressing APOBEC3B
expression (Periyasamy et al., 2017), we
hypothesized that mutant TP53, detected only in the LUAD
patient RA003, may be contributing to high APOBEC3B expression
and increased APOBEC mutagenesis. We found mutant TP53 was
associated with APOBEC hypermutator status (Nik-Zainal et al., 2014) in our cohort and in three independent LUAD
datasets (Imielinski et al., 2012; Jamal-Hanjani et al., 2014; Network, 2014) (Table S5). Moreover, mutant
TP53 was also associated with significantly higher counts
of APOBEC signature mutations, higher expression of APOBEC3B,
and an increase in APOBEC3A, compared with wild-type
TP53tumors in the TCGA dataset (Figures 4D–4H). Thus, our results suggest
mutant TP53 contributes to increased APOBEC3B
expression, APOBEC mutagenesis, and is associated with APOBEC hypermutator
status in LUAD.
Integration of Copy Number, Transcript, and Protein Abundance Highlights
Mechanisms of Proteomic Heterogeneity
To evaluate multi-dimensional heterogeneity, we plotted Pearson
correlation coefficients (PCCs) for each data type between pairs of tumors for
each patient across all genes for which copy number, transcript expression, and
protein abundance data were available (Figures
5A–5E). Each patient displayed variable patterns of
heterogeneity across each data type. Patient RA003 exhibited the least (Figure 5A), whereas patient RA004 the most
(Figure 5D) heterogeneity. Patient
RA005 showed the most heterogeneity in gene expression and protein abundance
only between tumor L5d and tumors L2a/L2b/L5c (Figure 5C). Patient RA006 showed lower heterogeneity in copy number,
gene expression, and protein abundance within three pairs of tumors (L20b/L5a,
Li3c/Li1a, L12a/L3a) compared to other pairs (Figure 5B).
Figure 5.
Intra-Patient Multi-Omic Tumor Relationships
(A–E) Pearson correlation coefficients (PCCs) are shown on each
axis across all common genes identified in copy number, gene expression, and
protein abundance datasets between all tumors within patient RA003 (A), RA006
(B), RA005 (C), RA004 (D), and RA000 (E). Each circle represents the
relationship between two tumors. Previously designated tumor clusters are
assigned the same color. Colors are independently assigned for each patient.
(F) PCCs between tumors from each patient grouped by data type: copy
number, gene expression, and protein abundance. p values for the difference in
mean PCCs between data types are shown for each patient.
Next, we plotted PCCs within each patient for each data type. Protein
heterogeneity was significantly greater than gene expression and copy-number
alteration (CNA) heterogeneity within all patients (Figure 5F). We then performed pairwise comparison of
the PCCs of each data type, including a separate analysis using exome
copy-number data that was corrected for tumor purity (Figure S5; Table S6). We found a strong,
positive linear correlation between CNA heterogeneity and protein heterogeneity
for patients RA004 and RA006, but not for the other patients (Figures S5A, S5D, S5G, S5J, and
S5M), providing evidence that CNAs can lead to protein heterogeneity in
these patients. Gene expression heterogeneity was also associated with protein
heterogeneity but only in patients RA000 and RA006 (Figures 5B and 5N). Together, these results
demonstrate high heterogeneity in protein abundance between metastases of these
patients that partly stems from heterogeneity in CNAs and gene expression.
Late-Event CNAs Contribute to Heterogeneity in Gene Expression and Protein
Abundance
Next, we performed hierarchical clustering by chromosomal cytoband, gene
expression, and protein abundance to further evaluate heterogeneity within each
data type. Metastases from each patient clustered together for CNAs, gene
expression, and protein abundance (Figures 6A and S6A–S6C). Metastases from patients RA004 and RA006 showed the
lowest correlation in protein abundance (Figures S6D and S6E) and clear
differences in CNAs (Figure 6A; Table S7). To explore the
downstream effects of CNAs, we plotted gene expression and protein abundance
ratios of genes within each chromosomal arm between metastatic lineages of each
patient (Figures 6B and 6C; Table S8; Data S4). Arm-level CNAs
within tumors of patients RA006 and RA004 corresponded with changes in gene
expression and protein abundance of genes at the corresponding arm level. For
example, copy-number differences in arm 4p between RA006 tumors corresponded
with changes in expression and protein abundance (Figure 6B). Where there were no copy-number differences, such as in
arm 7q in patient RA006 and arms 4p/7q of patient RA003, there were no
corresponding changes in gene expression and protein abundance (Figure 6C).
Figure 6.
Copy-Number Heterogeneity Corresponds with Transcriptomic and Proteomic
Heterogeneity
(A) Hierarchical clustering by copy number across the genome across all
tumors from all patients (at cytoband resolution). Losses (purple) and gains
(red) in log2 scale are depicted relative to mean ploidy. Mean ploidy is shown
in the top row (rounded to nearest integer)..
(B and C) Copy-number differences in chromosomal arms 4p and 7q between
tumors of patient RA006 (B) and patient RA003 (C) and corresponding changes in
FPKM and protein abundance. Probability density plots show the log ratios of
mean FPKM by chromosomal arm for sets of tumors as displayed. The dashed line on
the x axis represents the ratio of 1 (or log ratio 0). For visualization
purposes, the x axis was cut at log −1 and 1 for RNA and log −0.1
and 0.1 for protein. The y axis was cut at a probability density of 2..
(D) The phylogenetic tree depicts tumors of patient RA004 with
corresponding copy number and RNA-seq FPKM of CCND1 for each tumor..
(E) Protein expression of CCND1 for tumors of patient RA004 as assessed
by immunohistochemistry from tissue microarrays.
Heterogeneity in focal-level CNAs also corresponded with changes in RNA
and protein. For example, CCND1 was highly amplified in patient
RA004 tumors L2a, L3a, and L8b (Table S9), which corresponded with
high gene and protein expression. On the other hand, tumors L1b and L1c in which
there was minimal increase in CCND1 copy number and gene and
protein expression were low (Figures 6D and
6E). Interestingly, among patient RA004 liver and kidney tumors, Li1b
and K1, CCND1 was highly amplified with correspondingly high
protein but moderate gene expression, suggesting tissue-specific discordance of
gene and protein expression of select genes (Figures 6D and 6E).Next, we constructed circos plots (Data S5) and phylogenetic trees
based on CNAs for each patient. Both arm- and focal-level CNAs largely occurred
early in tumor development (i.e., truncal) in patients RA000, RA003, and RA005
(Figures 3 and S7A, S7B, and S7D) but occurred
later in tumor development (i.e., shared and private) in patients RA004 and
RA006 (Figures 3, S7C, and S7E). These results
suggest that late, not early, CNAs likely contributed to the observed changes in
gene expression and protein abundance between metastases of these patients.
Differential focal and arm-level CNAs may reflect ongoing chromosomal
instability as well as selective pressure during evolution of metastatic
lineages.
Enrichment of Interferon Signaling Pathways in Tumors with High
APOBEC3 Expression and Immune Heterogeneity
Next, we sought to decipher common gene sets or pathways within the
RNA-seq and mass-spectrometry-based quantitative proteomics data that were
heterogeneously enriched within each patient. Using an unbiased approach with
single-sample gene set enrichment analysis (ssGSEA) (Barbie et al., 2009; Subramanian et al., 2005), we found interferon (IFN)-signaling
pathways (related to possible activity of the IFNs: type I [IFN-α,
IFN-β], type II [IFN-γ], and type III [IFN-λ1–4]),
were the most significantly and differentially enriched pathways within patient
RA003 (tumors L1 and L4a) and RA006 (tumors L20b, L5a, Li1a, and Li3c) (Figures 7A–7D) at both gene expression
and protein abundance levels. No common outlier gene sets were identified for
the negatively enriched pathways. Within the tumors from patients RA000, RA004,
and RA005, no recurrent, common pathways were identified by GSEA of both the
transcriptome and proteome (Table S9). The six tumors from patients RA003 and RA006 that were
enriched in IFN-signaling pathways also had the highest expression of
APOBEC3 region transcripts (Figure 4A). To validate these findings in a larger cohort, we
interrogated the TCGA LUAD dataset. We found the expression of
STAT1, a downstream effector of IFN-γ signaling, was
higher in LUAD tumors with increased expression of both
APOBEC3A and APOBEC3B (Figures S8A and S8B). Multiple
other genes within IFN-α and IFN-γ pathways were also
significantly associated with expression of APOBEC3A and
APOBEC3B in the TCGA LUAD dataset (Table S10). To test whether
APOBEC3A and APOBEC3B can be induced by
IFNs, we treated three LUAD (A549, HCC4006, and H1975) and two immortalized
normal lung epithelial cell lines (HBEC and HPL1D) with IFN-γ. These cell
lines were chosen to represent major driver subtypes of LUAD (KRAS and EGFR) and
normal lung epithelium. At baseline, expression of APOBEC3B was
generally higher than expression of APOBEC3A. Treatment with
IFN-γ led to a significant increase in expression of
APOBEC3B, whereas expression of APOBEC3A
increased significantly, but marginally, in only A549 and HCC4006 cells (Figure 7E). Thus, our results suggest
IFN-signaling contributed, in part, by the tumor microenvironment, is a
potential mechanism of heterogeneity of LUAD tumors with increased
APOBEC3B transcript expression.
Figure 7.
Site-Specific Enrichment of Interferon Signaling Pathways, Immune Signature
Heterogeneity, and the Effect of Interferon on Expression of
APOBEC3 Genes among Lung Cancer and Epithelial Cell
Lines
(A–D) Single-sample gene set enrichment (ssGSEA) analysis of
transcriptome and proteome using the REACTOME databases are shown for tumors
from patients RA003 (A and B) and RA006 (C and D). Significantly enriched
interferon pathways (q < 0.05 for transcriptome, p < 0.10 for
proteome) are colored red..
(E) APOBEC3A and APOBEC3B expression
among lung cancer cell lines A549, HCC4006, and H1975 and immortalized normal
lung epithelial cell lines HBEC and HPL1D treated with interferon-gamma. p
values were calculated between time points as indicated for the significant
changes using two-sided t tests. Data are shown as values for individual
biological replicates (n = 3) and mean (red bars), normalized to endogenous
controls and presented on the log2 scale.
(F and G) Immune signature scores within the transcriptome. (F) and
proteome (G) are shown between all tumors of patients RA000, RA003, RA004,
RA005, and RA006. Scores were normalized across all tumors separately for
transcriptome and proteome. ES, enrichment score; IFN, interferon; A3A,
APOBEC3A; A3B, APOBEC3B. Symbols denote comparisons between groups of tumors as
indicated.
To further interrogate heterogeneity in the immune microenvironment
between tumors of each patient in an unbiased manner, we analyzed the gene
expression and protein abundance data using ssGSEA based on CIBERSORT immune
genes (Newman et al., 2015) and an
alternative immune scoring method (Aran et al.,
2017) (Table
S10). The overall immune signature score for tumors within each
patient considerably varied between transcriptome and proteome in patients RA003
and RA005 (Figures 7F and 7G). Patients
RA000 and RA004 showed low and high overall immune signature scores,
respectively. In contrast, tumors from patient RA006 showed large and consistent
differences in immune signature scores across both the transcriptome and
proteome (high in L20b and L5a vs. low in L12a, L3a, Li1a, Li3c), demonstrating
heterogeneous immune cell infiltration in the tumor microenvironment of this
patient (Figures 7F and 7G).
DISCUSSION
Genomic, transcriptomic and proteomic analyses of tumors from multiple
anatomic sites sampled through rapid autopsy offer a unique opportunity to
comprehensively explore the biological processes that shape the evolution of
metastatic tumors. Here, we have characterized the proteogenomic evolution of
metastatic lung and thymic carcinoma through exome and transcriptome sequencing, CNA
analysis, and unbiased quantitative mass spectrometry-based proteomics of 37
metastatic tumors acquired through rapid autopsy. Through this integrative analysis,
we have uncovered insights into the mechanisms likely driving the mutational,
transcriptomic, and proteomic landscape of these metastatic tumors.At the genomic level, we provide evidence that APOBEC mutagenesis is a
driver of mutational heterogeneity in metastatic lung and thymic carcinoma tumors.
APOBEC mutagenesis has been described as one of the most common mutational processes
second only to “ageing” (Alexandrov et
al., 2013). To date, however, within thoracic tumors, this process has
been described mostly in primary tumors (Burns et
al., 2013a, 2013b; de Bruin et al., 2014). These studies have shown APOBEC
mutagenesis to be associated with subclonal mutations that occur late in the
evolution of primary tumors and within spatially distinct regions (de Bruin et al., 2014; Jamal-Hanjani et al., 2017). In our study, we identified a subset of
patients with high APOBEC mutagenesis in metastases, but not in the primary tumors,
suggesting APOBEC mutagnesis can generate mutations late in the evolution of
metastatic disease.Given that all patients in our cohort received prior treatment, we cannot
exclude the possibility that therapy contributed to the observed findings. However,
recent studies assessing tumors pre- and post-chemotherapy in multiple tumor types
did not find an increase in overall mutational load or a significant increase in
APOBEC signature mutations (Liu et al., 2017;
Noorani et al., 2017). Moreover, the
patients in our cohort with the highest level of APOBEC mutagenesis, RA003 and
RA006, received the least therapy. Additionally, we demonstrated increased
mutational heterogeneity with increasing APOBEC mutagenesis among the TRACERx
cohort, which explored intra-tumor heterogeneity in primary lung cancer among
patients without previous chemotherapy, further validating the results in our cohort
of metastatic tumors (Figure 3H).While the existence of mutational heterogeneity in metastases has been
previously described (Campbell et al., 2010;
Yachida et al., 2010), the mechanisms
remained unclear (Vogelstein et al., 2013).
Our data suggest APOBEC mutagenesis can generate both putative driver and passenger
mutations late in metastases, thereby generating inter-metastatic mutational
heterogeneity that in some cases can be extreme. These results stand in contrast to
recent genomic studies of the metastases of patients with pancreatic (Makohon-Moore et al., 2017) and prostate (Kumar et al., 2016) cancer, which have shown
limited mutational heterogeneity and no significant APOBEC mutagenesis (Roberts et al., 2013). Both of these tumor
types have shown no evidence of APOBEC mutagenesis within primary tumors
highlighting the likely histologic specificity of this process. Ultimately, the
clinical importance of APOBEC mutagenesis will be determined by the response of
heterogeneous metastatic tumors—with and without APOBEC signature
mutations—to chemotherapy, targeted agents and/or immunotherapy.Both APOBEC3A and APOBEC3B have been shown to localize to the nucleus (Lackey et al., 2013) leading to potent DNA
damage (Taylor et al., 2013), deaminase
activity and base substitutions in the genome (Burns
et al., 2013a; Shinohara et al.,
2012). Upregulation of APOBEC3B causes APOBEC signature
mutations in vitro (Akre et al.,
2016). Expression of APOBEC3A and
APOBEC3B also has been associated with APOBEC signature
mutations in primary tumors from multiple cancer types including LUAD (Burns et al., 2013a, 2013b; Chan et al.,
2015; Leonard et al., 2013).
However, it is unclear whether the same transcripts promote APOBEC mutagenesis in
metastatic lung and thymic carcinoma tumors. In our set of metastatic tumors, we
observed a strong correlation between expression of APOBEC3B and
APOBEC3AB transcripts and APOBEC mutagenesis, suggesting
expression of such transcripts may be a more dominant mechanism of APOBEC
mutagenesis in metastatic thoracic tumors as opposed to earlier stage disease. These
results are in line with recent work in breast cancer that has shown higher
APOBEC3B expression in metastatic disease compared to
early-stage primary tumors (Sieuwerts et al.,
2017).Mutant TP53 has been previously associated with higher
APOBEC signature mutations in breast cancer (Burns et
al., 2013a) and there is recent evidence that TP53 can repress
APOBEC3B expression through direct transcriptional regulation
of its promoter (Burns et al., 2013a; Menendez et al., 2017; Periyasamy et al., 2017). Our results suggest mutant
TP53 may be an important contributor of increased
APOBEC3B and subsequent generation of APOBEC signature
mutations in LUAD. In particular, we show mutant TP53 is associated
with APOBEC hypermutator status in LUAD using three large-scale independent
datasets. It is important to analyze APOBEC hypermutator and TP53
mutation status and the resultant mutational heterogeneity in future clinical
trials.Expression of the APOBEC3AB transcript was captured in our
study by the presence of the APOBEC3 germline variant, rs1262840,
within thymic carcinomapatient RA006. This germline variant has been previously
associated with increased APOBEC signature mutations in primary breast cancer tumors
(Nik-Zainal et al., 2014). Whether this
variant is also associated with high APOBEC mutagenesis and mutational heterogeneity
within thymic carcinoma is unknown, as no previous multi-region sequencing study has
been performed on this rare tumor type. Additionally, to our knowledge, apart from
the current study, this variant has not been examined in relation to APOBEC
mutagenesis in metastatic disease. Given that this germline variant can be easily
tested utilizing blood DNA, our results warrant further testing of the association
between this APOBEC3 germline variant with APOBEC mutagenesis and
mutational tumor heterogeneity in thymic carcinoma as well as other metastatic tumor
types.Our integrated CNA, RNA-seq, and quantitative mass spectrometry study
demonstrates that late-event CNAs can be important drivers in the evolution of
metastatic cancer through downstream changes in transcript and protein abundance.
Early studies in yeast showed CNAs for a given gene lead to proportional increases
in protein abundance (Hughes et al., 2000;
Pavelka et al., 2010; Rancati et al., 2008; Torres et al., 2007). More recently, studies in primary tumors
demonstrated variability in CNA to protein cis-effects (Mertins et al., 2016; Zhang et al., 2014a, 2016). In
metastatic disease, multiple studies have reported late-event CNAs (Campbell et al., 2010; Ding et al., 2010; Gerlinger et al.,
2014; Hieronymus et al., 2014;
Robinson et al., 2015; Yates et al., 2015), but the effect of CNAs on transcript
and protein abundance has not been examined.In one recent metastatic pancreatic cancer study, CNA differences among
tumor suppressor genes were not evident at the protein level by IHC suggesting
late-event CNAs can be stochastic changes rather than evolutionary selected events
(Makohon-Moore et al., 2017). In the
current study, all patients had some evidence of late-event CNAs. However, only
patients with significant differences in late-event CNAs between tumors exhibited
corresponding differences in transcript and protein abundance. In light of the
recent association between copy-number heterogeneity and increased recurrence and
death in early-stage NSCLC (Jamal-Hanjani et al.,
2017), our results raise the question of whether late-event CNAs, through
downstream effects on gene expression and protein abundance, can result in worse
outcomes for a subset of patients with metastatic cancer. Proteomic heterogeneity
induced, in part, by CNAs may also explain why chromosomal instability (CIN) has
been associated with poor outcomes in cancer (Carter
et al., 2006; Choi et al., 2009;
Jamal-Hanjani et al., 2017; Walther et al., 2008). Importantly, proteomic
heterogeneity in our set of metastatic tumors, was much higher than CNA and
transcriptomic heterogeneity, suggesting other mechanisms, such as epigenetic and
post-translational modifications, also may be important drivers of proteomic
heterogeneity.Through our unbiased analysis of transcriptomic and proteomic data, we found
enrichment of IFN signaling within the microenvironment of tumors of patients with
the highest APOBEC3 region transcript expression. Moreover, we show
that in LUAD and normal lung epithelial cell lines, APOBEC3B
expression, and in some LUAD cell lines, APOBEC3A expression can be
induced by IFNγ treatment. These data suggest IFN signaling within the tumor
microenvironment may, in part, influence APOBEC3 region transcript
expression and thereby contribute to heterogeneity in APOBEC signature mutations
within the tumors of a given patient.We also found transcriptomic and proteomic heterogeneity in immune
signatures within and between patients. Major advances have been made in the
treatment of metastatic tumors, including LUAD and TC, through immunotherapies such
as immune checkpoint blockade (Borghaei et al.,
2015; Giaccone et al., 2018).
Nonetheless, only a subset of patients responds and multiple metastases within a
given patient may respond differently because of immune heterogeneity (Reuben et al., 2017). Even without
immunotherapy, metastases within a patient may also have differing tumor immune
microenvironments, as we demonstrate within TC patient RA006 and as has been
recently shown within an ovarian cancerpatient (Jiménez-Sánchez et al., 2017). We further demonstrate
that, within a given patient, the tumor immune microenvironment may exhibit
substantial differences between the transcript and protein expression, adding to the
complexity of assessing the immune microenvironment.One of the strengths of our study is the comprehensive examination of tumor
heterogeneity by integrating genomic (exome, CNA), transcriptomic (RNA-seq) and
proteomic (global mass spectrometry analysis for protein abundance) data from
multiple metastatic tumors procured through rapid autopsy. One of the limitations of
our study is that all autopsy patients in this study were diagnosed at the late
stage of metastatic disease. Hence, we were unable to conduct a complete temporal
analysis of tumor evolution from early- to late-stage disease. The ongoing
TRACERx study (Jamal-Hanjani et al.,
2014, 2017) and corecruitment of
those patients to the PEACE (Posthumous Evaluation of Advanced Cancer Environment)
post-mortem study (Abbosh et al., 2017) will
allow for better elucidation of the evolution primary tumor to metastatic advanced
disease, including at the end of life.In conclusion, in this report, we present the heterogeneous genomic,
transcriptomic, and proteomic landscape of metastatic lung and thymic carcinoma as
well as identify possible mechanisms underlying such multi-level heterogeneity. High
activity of the APOBEC3 enzymes, represented by transcript expression, and modulated
by germline variation, mutant TP53, and the immune
microenvironment, can greatly alter the genomic landscape between metastatic tumors
of a given patient. Arm-level and focal CNAs occurring later in tumor evolution can
generate significant downstream heterogeneity through effects on gene expression and
protein abundance. Further studies by comprehensive analyses of multiple metastatic
sites from larger patient populations, including different tumor types, are
warranted to validate these mechanisms. Such an endeavor requires development of
rapid autopsy programs, meticulous collection and processing of tumors from all
possible sites of disease and integrated “omics” analyses. These tumor
heterogeneity studies will be integral for evaluating the outcomes of ongoing
clinical trials, developing new paradigms in clinical trial design, and, ultimately,
to improve survival for patients with metastatic cancer.
STAR★METHODS
CONTACT FOR REAGENT AND RESOURCES SHARING
Further information and requests for reagents may be directed to and
will be fulfilled by the Lead Contact, Udayan Guha
(udayan.guha@nih.gov).
EXPERIMENTAL MODEL AND SUBJECT DETAILS
Biospecimen Acquisition
Samples were obtained from five patients diagnosed with thoracic
malignancies who underwent rapid autopsy. Informed consent for rapid autopsy
was obtained under an IRB approved protocol 13-C-0131 (NCT01851395) entitled
“A Pilot Study of Inpatient Hospice with Procurement of Tissue on
Expiration in Thoracic Malignancies.” Patients previously treated at
the NCI and with life expectancy less than 3 months were offered inpatient
hospice treatment at the Clinical Center of the National Institutes of
Health and upon death autopsies were initiated within 3 hours. Clinical
information, including the sex, gender of each patient is available in Table S1. One
patient, RA003, elected to receive end of life care at home and was
subsequently transported to the NIH Clinical Center post-mortem.
Prioritization of lesions removed at autopsy was based on CT scan performed
within one month before death. All tumors within each patient were removed
by an experienced pathologist and macro dissected to remove surrounding
non-neoplastic tissue. Punch biopsy needles were used to obtain spatially
distinct cores from each tumor. One-third of each tissue core sample was
fixed in 10% buffered formalin, one-third in optimal cutting temperature
compound (OCT) and the remaining tissue was immediately flash frozen in
liquid nitrogen and stored at −80°C. For each tissue sample, a
5-μm section was taken to create a hematoxylin and eosin slide to
visualize neoplastic cellularity using a microscope.For each patient, normal tissue, if available, and/or a blood sample
was used as a normal control. DNA and RNA was isolated from approximately 30
mg of snap-frozen tumor tissue using the All Prep DNA/RNA Mini Kit (QIAGEN).
RNA was partially degraded with an average RNA integrity number 5.17 but was
comparable between organs of different patients and similar in quality to
previous postmortem studies. To ensure adequate quality, samples RA003_L2f,
RA006_LN2a, RA005_L4a were removed post-sequencing but before analyses due
to low tumor content (less than 20 percent) based on Sequenza purity
estimates.
METHOD DETAILS
Whole-exome sequencing data processing, variants calling, filtering and
annotation
Whole-exome sequencing of tumor and normal samples was performed at
a sequencing core at the NCI Frederick National Laboratory at the National
Cancer Institute (NCI). Libraries were constructed and then sequenced as 2
× 126 nt paired-end reads with Illumina HiSeq2500 sequencers. Mean
coverage depth was 161x (range 114x to 231x). Raw sequencing data in FASTQ
format were aligned against the reference human genome (hg19) with BWA
(Li and Durbin, 2009). The
alignment BAM files were further processed following GATK’s best
practices (McKenna et al., 2010) with
Picard tools, namely MarkDuplicates, IndelRealigner, and BaseRecalibrator.
Somatic variants were then called from the processed BAM files using Strelka
(v1.0.10) (Saunders et al., 2012)
with the default version of BWA configuration file. The identified somatic
variants reported in the “passed” vcf files
by Strelka were used for further analysis. Variants were functionally
annotated using snpEff/snpSift version 3.4 (see URLs) with databases of
GRCh37.70 and dbNSFP version 3.4, and the types of variants were filtered
using snpSift (Cingolani et al.,
2012). If a variant was also reported in one of the three public
databases: 1000 Genomes Project, ExAC, and ESP-NHLBI with a MAF greater than
5%, the variant was removed. For each patient, variants identified by
Strelka (Saunders et al., 2012) from
all tumor regions were combined to get a unique variant list. Using this
patient-specific list, if a variant in a particular tumor site was not
called by Strelka, the Samtools mpileup was used to retrieve the reference
and alternative reads coverage for each SNV site. If the site had > =
2 alternative reads and VAF > = 1%, the SNV was considered present in
the tumor site. For short indels, if the variant site was not in the
“passed” vcf file, the Strelka called
“all” vcf file is used to retrieve the
reference and alternative reads coverage; if missing in the
“all” vcf file, the indel site was considered
absent. Patient RA000 was previously known to have a KRASG12C mutation based on molecular profiling. Although this mutation did not
pass variant filtering, it was noted on manual review. Identified missense
mutations were manually reviewed using the Integrative Genomics Viewer
version 2.4 (Robinson et al., 2011;
Thorvaldsdóttir et al.,
2013).
Phylogenetic analysis
Phylogenetic analysis was conducted using Phangorn (Schliep, 2011) and phytools R packages with all
identified variants (silent and non-silent) from all tumor sites in each
patient after converting the mutation profile into binary format. The
initial phylogenetic relationships between tumor regions for an individual
patient was inferred using both the Maximum Parsimony and the Unweighted
Pair Group Methods (UPGMA). Phylogenetic trees were then redrawn by hand in
Adobe Illustrator with branch length proportional to the number of mutations
specific to one tumor (private), two or more tumors (shared) or all tumors
(trunk). Driver mutations and focal CNAs were added to the branches. All
non-synonymous and synonymous mutations were used for tree construction.
Phylogenetic trees were also constructed for each patient using the
Treeomics computational tool (Reiter et al.,
2017) using bootstrapping values from 1,000 samples except for
patient RA004, which failed to run likely due to the large number of tumors.
Signature analysis was performed for each individual tumor as well as trunks
and each subsequent branch point for each tumor using deconstructSigs (Rosenthal et al., 2016). Mutations that
were not signature 1, 2, 4, 5, or 13-type were labeled as
“unclassified.” Mutational signature analysis was restricted
to branches with at least 10 mutations. COSMIC mutational signatures were
calculated for each branch using the R “deconstrucSigs”
package (Rosenthal et al., 2016).
Arm-level amplification and deletions were also added to previously
generated phylogenetic trees for each patient (Figure S7). For these trees,
branch lengths were redrawn for visualization purposes only.
Identification and classification of driver mutations
All identified nonsynonymous mutations were filtered to include only
driver genes based on large-scale non-small cell lung cancer sequencing
studies (Campbell et al., 2016; Ding et al., 2008; Govindan et al., 2012; Imielinski et al., 2012; Lawrence et al., 2014; Network, 2014; Weir et al., 2007) and in the COSMIC cancer gene census
(downloaded June 2016). We classified all nonsynonymous mutations into the
three categories. Category 1 ‘high-confidence driver
mutations’ contained all disrupting mutations (nonsense, frameshift,
splicing or ‘deleterious’ missense) in tumor suppressor genes
or activating amino acid substitutions in non-small cell lung cancer
oncogenes as described in lung cancer sequencing studies. Category 2
‘putative driver mutations’ contained amino acid substitutions
located at the same position or up to 5 amino acids away from a substitution
present in COSMIC. Category 3 ‘low confidence driver
mutations’ contained all other nonsilent mutations in genes that were
present in the lists of cancer-related genes described above. Mutations were
then analyzed using COSMIC to determine whether the amino acid substitution
has been previously identified. Category 2 were further scored as
‘deleterious’ when at least two out of the three predictors
classified the mutation as deleterious Functional prediction scores (SIFT,
PolyPhen2, and Provean). All category 1 mutations were considered
deleterious and category 3 mutations were not included as driver mutations
in any analyses.
APOBEC germline deletion genotyping
Germline APOBEC3AB deletion was genotyped by a
proxy SNP rs12628403 using a custom-designed TaqMan genotyping assay, as
described previously (Middlebrooks et al.,
2016). For patient RA006, the deletion status was also confirmed
in all six tumors by Sanger sequencing, and by expression analysis.
qRT–PCR analysis
Lung cell lines A549, HCC4006, and H1975 and lung epithelial cell
lines HBEC-3KT and HPL1D (Masuda et al.,
1997) were grown on 6-well plates in triplicate until confluent,
followed by treatment with IFNγ (1 ng/ml, R&D Systems) for 8 and
24 hours. Cells were lysed with RLT buffer supplemented with
B-mercaptoethanol. Total RNA for cell lines and tumors was isolated with the
QIAGEN All Prep DNA/RNA Mini Kit with on-column DNase I treatment. cDNA was
prepared from equal amounts of total RNA for each sample with the RT2
first-strand cDNA kit and random hexamers with an additional DNA removal
step (QIAGEN). Expression of APOBEC3A,
APOBEC3B, and APOBEC3AB deletion and
endogenous controls GAPDH and PPIA was
measured in each cDNA with TaqMan expression assays from Thermo Fisher.
Custom assays were used for APOBEC3B: (F:
TGCTGGGAAAACTTTGTGTACAAT; R: ATG TGTCTGGATCCATCAGGTATCT; Probe:
FAM-ATTCATGCCTTGGTACAAA), and APOBEC3AB (F:
ATCATGACCTACGATGAATTT AAGCA; R: AGCACATTGCTTTGCTGGTG; Probe:
FAM-CATTCTCCAGAATCAGGG), and commercial assay Hs00377444_m1 was used for
APOBEC3A, 4326317E for GAPDH and
4326316E for PPIA (Thermo Fisher Scientific). Reactions
were performed in four technical replicates on QuantStudio 7 (Life
Technologies) using TaqMan Gene Expression buffer (Life Technologies); water
and genomic DNA were used as negative controls for all assays. Expression
was measured by Ct values (PCR cycle at detection threshold). Expression of
APOBEC3A, APOBEC3B and
APOBEC3AB was individually normalized by the geometric
mean of endogenous controls (GAPDH and
PPIA) based on relative quantification method as
ΔCt = Ct (control) – Ct (target).
Analysis of APOBEC mutagenesis
APOBEC-signature mutation analysis for all autopsy tumor samples was
determined using an R software package kindly provided by Dr. Dmitry A.
Gordenin. We used two variables in the file
*_sorted_sum_all_fisher_Pcorr.txt: the ‘tCw_to_G+tCw_to_T’
variable, which represents total counts of APOBEC-signature mutations, and
the ‘APOBEC_Enrich’ variable, which accounts for statistical
significance of enrichment and represents the level APOBEC mutagenesis
pattern per sample. This second variable is more stringent, as many samples
were not enriched at a statistically significant level and were classified
as negative for APOBEC-signature mutations. We identified APOBEC
‘hypermutators’ as those with signature 2 + 13 mutations (TCGA
and Broad datasets) or total APOBEC mutations (TRACERx and our dataset)
exceeding 1.5 times the length of the interquartile range from the 75th
percentile (Nik-Zainal et al., 2014).
We also used the same R software package to determine
RTCA and YTCA enrichment
for patient RA006. A less stringent filtering of whole-exome variants was
used to provide sufficient sample size for this analysis. A
Benjamin-Hochberg P value of 0.05 was used as a threshold for significance,
unless specified otherwise, and all tests were two-sided.
TCGA analyses
TCGA data for lung adenocarcinoma were downloaded directly from the
Firehose pipeline of the Broad Institute. For gene expression, we used both
RNA-SeqV1 (Reads Per Kilobase per Million, RPKM) or RNA-SeqV2 (RNA-Seq by
Expectation Maximization, RSEM).
Copy Number Alteration Analysis
Copy number alteration (CNA) analysis was performed using MIP array
technology (Affymetrix OncoScan FFPE Express 2.0) with 334,183 sequence tag
site probes which were used to measure DNA copy number at different loci
across the human genome. Copy number data were processed and using the
Affymetrix OSCHP-SNP-FASST2 algorithm within the Nexus Copy Number Software,
which corrects for tumor ploidy using median centering. We used a log2 ratio
cut-off of ± 0.5 to define focal copy number amplifications and
deletions and ± 0.25 to define arm-level copy number amplifications
and deletions. To minimize overcalling heterogeneity of copy number
alterations, we employed the following methods: 1) tumors without ±
0.5 focal amplification/deletion were included if they had at log2 ratio
± 0.2 and tumors without ± 0.25 arm-level
amplification/deletion were included if they had a log2 ratio ± 0.10;
2) at least two tumors within a patient were required to have an
amplification or deletion above the threshold of ± 0.5 for focal and
± 0.25 for arm; 3) an amplification/deletion was considered truncal
if present in > 80% of the tumors within a given patient.
Allele-specific focal copy number profiles were determined for primary
tumors (available for three patients) using the Sequenza package. Circos
plots were generated using segmented GISTIC-output file for all tumors using
circos v0.69–4, for every track the min and max are set to −1
and 1 respectively, values between −0.2 and 0.2 are not shown in the
figure (Mermel et al., 2011).
Arm-level changes as depicted in the copy number phylogenetic trees were
determined using GISTIC. We also determined allele-specific copy number,
corrected for tumor ploidy and purity, from whole exome sequencing data
using the FACETS package (Shen and Seshan,
2016).
RNA-sequencing and data processing
RNA-seq sequencing was performed on 31 out of 37 tumor sites.
RNA-seq was done on Illumina HiSeq2500 platform to yield at least 100
million reads/sample using Illumina TruSeq V4 chemistry at 2 × 125 nt
paired-end. Sequencing reads were aligned with TopHat version 2.0.13 (Trapnell et al., 2009) against the
reference human genome hg19, with UCSC known gene transcripts as the gene
model annotation. Expression on gene and isoform level was quantified with
Cufflinks version 2.2.1 (Trapnell et al.,
2012).
RNA-seq variant calling and mutation validation
For RNASeq variants calling, sequencing reads were first aligned to
hg19 with STAR version 2.4.2a (Dobin et al.,
2013) and then with a second pass alignment to the transcriptome
generated by STAR for each patient. For each identified SNV in WES, its
expression was confirmed by the presence of sequencing reads of the
alternative allele assessed by Samtools mpileup on TopHat generated BAM
files from RNASeq data, whereas alternative reads coverage for indels were
extracted from vcf files generated by the GATK best
practices variant calling on RNASeq (see URLs). 69% of whole exome variants
had a minimum 1X RNA depth and of these expressed variants 59% were
confirmed by RNA-seq (data not shown). 55% of whole exome variants had
minimum of 5X RNA depth and of these expressed variants 69% were confirmed
by RNA-seq (data not shown). Validation rates for different variant types
across all tumor samples were similar (range 42% silent to 56% nonsense)
(see Data S1).
RNA-seq data analysis
Cufflinks outputted FPKM values for each gene were normalized for
all samples within each patient using limma package voom quantile method
(Law et al., 2014). This
expression data was used to predict enrichment scores among immune genes
obtained from CIBERSORT for each sample (Newman et al., 2015) and then using single-sample GSEA (ssGSEA)
from GenePattern. Using the R package “fgsea,” GSEA preranked
we performed to determine enrichment scores for REACTOME pathways. Pathways
with q-value less than 0.05 were considered significantly enriched.
Principal component analysis (PCA) was used to combine clustered samples
prior to conducting this analysis.
Protein Extraction
All but one tumor (RA004 – Li1a) had sufficient tissue for
mass-spectrometry (MS)-based proteomic characterization. About 10–15
mg of tumor tissue fresh-frozen in liquid nitrogen was lysed in 400μl
of urea lysis buffer (20 mM HEPES pH 8.0, 8 M urea, 1 mM sodium
orthovanadate, 2.5 mM sodium pyrophosphate and 1 mM
β-glycerophosphate) using a tissue lyser (QIAGEN). Lysates were
centrifuged at 14,000 rpm at 4°C for 10 mins and the clear
supernatants were transferred to new tubes. Protein concentrations were
determined by the Modified Lowry method (BioRad).
Enzymatic Digestion
The protein lysate was reduced with 45 mM dithriothreitol (Sigma
Aldrich, MO), alkylated with 100 mM iodoacetamide (Sigma Aldrich, MO), and
subsequently digested with modified sequencing grade Trypsin (Promega,
Madison, WI) at 30°C overnight. The digest was then acidified using
0.1% TFA and the peptides were desalted using solid phase extraction C18
column (Supelco, Bellefonte, PA), and vacuum dried in a centrifugal
evaporator.
TMT-Labeling
TMT10plex amine reactive reagents (0.8 mg per vial) (Thermo Fisher
Scientific) were resuspended in 41 μL of anhydrous acetonitrile (ACN)
and all 41 μL of each reagent was added to each sample and mixed
briefly on a vortexer. Reactions were incubated at room temperature for 1 h,
and then quenched by the addition of 8 μL of 5% hydroxylamine for 15
min and then combined at equal amount. All tumor tissues of LUAD patients
RA000, RA003, RA005 and RA006 were pooled together to make a reference
channel and labeled with TMT10-126. In a separate TMT labeling
experiment, tumor tissues from patient RA006 were pooled together to make a
reference channel.
Basic RPLC separation was performed with a XBridge C18, 100 ×
2.1 mm analytical column containing 5μm particles and equipped with a
10 × 2.1 mm guard column (Waters, Milford, MA) with a flow rate of
0.25 mL/min. The solvent consisted of 10 mM triethylammonium bicarbonate
(TEABC) as mobile phase A, and 10 mM TEABC in ACN as mobile phase B. Sample
separation was accomplished using the following linear gradient: from 0 to
1% B in 5min, from 1 to 10% B in 5min, from 10 to 35% B in 30min, and from
35 to 100% B in 5min, and held at 100% B for an additional 3min. A total of
96 fractions were collected during the LC separation in a 96-well plate in
the presence of 12.5 μL of 1% formic acid. The collected fractions
were concatenated into 12 fractions and dried in a vacuum centrifuge. One
tenth of the peptides were injected directly for LC-MS/MS analysis.
LC-MS/MS analyses
Peptides separated/fractionated by basic reversed-phase
chromatography were analyzed on an LTQ-Orbitrap Elite interfaced with an
Ultimate™ 3000 RSLCnano System (Thermo Scientific, San Jose, CA). The
dried peptides were loaded onto a nano-trap column (Acclaim PepMap100 Nano
Trap Column, C18, 5 μm, 100 Å, 100 μm i.d. × 2
cm) and separated on an Easy-spray™ C18 LC column (Acclaim PepMap100,
C18, 2 μm, 100 Å, 75 μm i.d. × 25 cm). Mobile
phases A and B consisted of 0.1% formic acid in water and 0.1% formic acid
in 90% ACN, respectively. Peptides were eluted from the column at 300 nL/min
using the following linear gradient: from 4 to 35% B in 60min, from 35 to
45% B in 5min, from 45 to 90% B in 5min, and held at 90% B for an additional
5min. The heated capillary temperature and spray voltage were 275°C
and 2kV, respectively. Full spectra were collected from m/z 350 to 1800 in
the Orbitrap analyzer at a resolution of 120,000, followed by data-dependent
HCD MS/MS scans of the fifteen most abundant ions at a resolution of 30,000,
using 40% collision energy and dynamic exclusion time of 30 s.
Proteomic Data Analysis
Peptides and proteins were identified and quantified using the
Maxquant software package (version 1.5.3.30) (Tyanova et al., 2016) with the Andromeda search
engine (Cox and Mann, 2008). MS/MS
spectra were searched against the Uniprot human protein database (May 2013,
38523 entries) and quantification was performed using default parameters for
TMT10plex in MaxQuant. Corrected intensities of the reporter ions from TMT
labels were obtained from the MaxQuant search. The relative ratios were
calculated for each channel to the reference channel. These ratios were then
used to predict enrichment scores of overall immune signatures obtained from
CIBERSORT (Newman et al., 2015) and
Xcell (Aran et al., 2017) using
single-sample GSEA (ssGSEA) from GenePattern and for REACTOME pathways by
GSEA preranked through the R package “fgsea.” Pathways with p
value less than 0.10 were considered significantly enriched. Samples were
similarly combined as described for the RNA-seq data analysis. Cufflinks
outputted FPKM values for each gene were normalized for all samples within
each patient using limma package voom quantile method.
Construction and Immunohistochemistry of Tissue Microarray
The physical construction of the TMA followed the guidelines
previously used by the NCI Tissue Array Project. Each tumor from each
autopsy patient was represented by 1 tumor core of 1mm that was taken from
the original paraffin block. Serial 5μm sections were cut from the
TMA block and used for immunohistochemical analysis. We used previously
reported methods for immunohistochemical staining of TMAs (Hewitt, 2004).
Integrating copy number, gene expression and protein abundance
Pearson correlation coefficients (PCCs) were calculated across all
common genes in copy number, gene expression and protein abundance data for
each patient. Prior to calculating PCCs, gene expression data (RNA-seq FPKM)
and protein abundance (protein ratios) were further normalized within each
patient using the limma package with its voom quantile method. 3DPlots were
created using the R package “scatterplot3d.” For arm-level
analyses, normalized gene expression and protein abundance data was
categorized by chromosomal arm. The mean of clusters of tumors, as
determined previously by PCA, were calculated for both sets of data. Ratios
were calculated between clusters and then log2 transformed. Probability
density plots were generated with 1% outliers removed and x axis of plots
were restricted to −1 to 1 (log2 scale) for gene expression and
−0.1 and +0.1 (log2 scale) for protein abundance for visualization
purposes.
QUANTIFICATION AND STATISTICAL ANALYSIS
All figures and graphs were generated using the “ggplot2”
package available through the R statistical program. Linear regression,
correlations and t tests were conducted though the R base packages. All tests
were two-tailed and p values less than 0.05 were considered significant.
DATA AND SOFTWARE AVAILABILITY
The sequencing and genotype data have been deposited at the database of
Genotypes and Phenotypes (dbGaP), which is hosted by the National Center for
Biotechnology Information (NCBI). The MS proteomics data in this study have been
deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomeexchange.org)
via the PRIDE partner repository. The accession numbers for the data reported in
this paper are dbGaP: phs001432.v1.p1 and PRIDE: PXD012845. Additional analyses,
data and code are available at: https://github.com/nitinroper/Rapid-Autopsy-NCI.
KEY RESOURCES TABLE
REAGENT or RESOURCE
SOURCE
IDENTIFIER
Antibodies
CCND1
Agilent
Cat#GA083
Biological Samples
Metastases collected by rapid autopsy
National Cancer Institute, Bethesda,
Maryland
https://clinicaltrials.gov/ct2/show/NCT01851395
Chemicals, Peptides, and Recombinant
Proteins
IFNγ
R&D Systems
Cat#285-IF-100
Critical Commercial Assays
HiSeq 2500 system
Illumina
N/A
Agilent SureSelect XT (All Exon V5 + UTR)
Illumina
N/A
TruSeq Stranded Total RNA Library Prep
Illumina
Cat#20020596
TMT10plex
Thermo Scientific
Cat#90110
XBridge C18
Waters
N/A
LTQ-Orbitrap Elite interfaced with
anUltimateTM 3000 RSLCnano System
Authors: T R Hughes; C J Roberts; H Dai; A R Jones; M R Meyer; D Slade; J Burchard; S Dow; T R Ward; M J Kidd; S H Friend; M J Marton Journal: Nat Genet Date: 2000-07 Impact factor: 38.330
Authors: Steven A Roberts; Michael S Lawrence; Leszek J Klimczak; Sara A Grimm; David Fargo; Petar Stojanov; Adam Kiezun; Gregory V Kryukov; Scott L Carter; Gordon Saksena; Shawn Harris; Ruchir R Shah; Michael A Resnick; Gad Getz; Dmitry A Gordenin Journal: Nat Genet Date: 2013-07-14 Impact factor: 38.330
Authors: Peter J Campbell; Shinichi Yachida; Laura J Mudie; Philip J Stephens; Erin D Pleasance; Lucy A Stebbings; Laura A Morsberger; Calli Latimer; Stuart McLaren; Meng-Lay Lin; David J McBride; Ignacio Varela; Serena A Nik-Zainal; Catherine Leroy; Mingming Jia; Andrew Menzies; Adam P Butler; Jon W Teague; Constance A Griffin; John Burton; Harold Swerdlow; Michael A Quail; Michael R Stratton; Christine Iacobuzio-Donahue; P Andrew Futreal Journal: Nature Date: 2010-10-28 Impact factor: 49.962
Authors: Ramaswamy Govindan; Li Ding; Malachi Griffith; Janakiraman Subramanian; Nathan D Dees; Krishna L Kanchi; Christopher A Maher; Robert Fulton; Lucinda Fulton; John Wallis; Ken Chen; Jason Walker; Sandra McDonald; Ron Bose; David Ornitz; Donghai Xiong; Ming You; David J Dooling; Mark Watson; Elaine R Mardis; Richard K Wilson Journal: Cell Date: 2012-09-14 Impact factor: 41.582
Authors: Dan Robinson; Eliezer M Van Allen; Yi-Mi Wu; Nikolaus Schultz; Robert J Lonigro; Juan-Miguel Mosquera; Bruce Montgomery; Mary-Ellen Taplin; Colin C Pritchard; Gerhardt Attard; Himisha Beltran; Wassim Abida; Robert K Bradley; Jake Vinson; Xuhong Cao; Pankaj Vats; Lakshmi P Kunju; Maha Hussain; Felix Y Feng; Scott A Tomlins; Kathleen A Cooney; David C Smith; Christine Brennan; Javed Siddiqui; Rohit Mehra; Yu Chen; Dana E Rathkopf; Michael J Morris; Stephen B Solomon; Jeremy C Durack; Victor E Reuter; Anuradha Gopalan; Jianjiong Gao; Massimo Loda; Rosina T Lis; Michaela Bowden; Stephen P Balk; Glenn Gaviola; Carrie Sougnez; Manaswi Gupta; Evan Y Yu; Elahe A Mostaghel; Heather H Cheng; Hyojeong Mulcahy; Lawrence D True; Stephen R Plymate; Heidi Dvinge; Roberta Ferraldeschi; Penny Flohr; Susana Miranda; Zafeiris Zafeiriou; Nina Tunariu; Joaquin Mateo; Raquel Perez-Lopez; Francesca Demichelis; Brian D Robinson; Marc Schiffman; David M Nanus; Scott T Tagawa; Alexandros Sigaras; Kenneth W Eng; Olivier Elemento; Andrea Sboner; Elisabeth I Heath; Howard I Scher; Kenneth J Pienta; Philip Kantoff; Johann S de Bono; Mark A Rubin; Peter S Nelson; Levi A Garraway; Charles L Sawyers; Arul M Chinnaiyan Journal: Cell Date: 2015-05-21 Impact factor: 41.582
Authors: Marco Gerlinger; Stuart Horswell; James Larkin; Andrew J Rowan; Max P Salm; Ignacio Varela; Rosalie Fisher; Nicholas McGranahan; Nicholas Matthews; Claudio R Santos; Pierre Martinez; Benjamin Phillimore; Sharmin Begum; Adam Rabinowitz; Bradley Spencer-Dene; Sakshi Gulati; Paul A Bates; Gordon Stamp; Lisa Pickering; Martin Gore; David L Nicol; Steven Hazell; P Andrew Futreal; Aengus Stewart; Charles Swanton Journal: Nat Genet Date: 2014-02-02 Impact factor: 38.330
Authors: Pablo Cingolani; Viral M Patel; Melissa Coon; Tung Nguyen; Susan J Land; Douglas M Ruden; Xiangyi Lu Journal: Front Genet Date: 2012-03-15 Impact factor: 4.599
Authors: Serena Nik-Zainal; David C Wedge; Ludmil B Alexandrov; Mia Petljak; Adam P Butler; Niccolo Bolli; Helen R Davies; Stian Knappskog; Sancha Martin; Elli Papaemmanuil; Manasa Ramakrishna; Adam Shlien; Ingrid Simonic; Yali Xue; Chris Tyler-Smith; Peter J Campbell; Michael R Stratton Journal: Nat Genet Date: 2014-04-13 Impact factor: 38.330
Authors: D R Mani; Karsten Krug; Bing Zhang; Shankha Satpathy; Karl R Clauser; Li Ding; Matthew Ellis; Michael A Gillette; Steven A Carr Journal: Nat Rev Cancer Date: 2022-03-02 Impact factor: 60.716
Authors: Richard G Vile; Alan Melcher; Hardev Pandha; Kevin J Harrington; Jose S Pulido Journal: Clin Cancer Res Date: 2021-02-08 Impact factor: 12.531
Authors: Arthur Krause; Luca Roma; Thomas Lorber; Tanja Dietsche; Valeria Perrina; David C Müller; Didier Lardinois; Christian Ruiz; Spasenija Savic Prince; Salvatore Piscuoglio; Charlotte K Y Ng; Lukas Bubendorf Journal: Transl Lung Cancer Res Date: 2021-04
Authors: Subramanian Venkatesan; Mihaela Angelova; Clare Puttick; Haoran Zhai; Deborah R Caswell; Wei-Ting Lu; Michelle Dietzen; Panagiotis Galanos; Konstantinos Evangelou; Roberto Bellelli; Emilia L Lim; Thomas B K Watkins; Andrew Rowan; Vitor H Teixeira; Yue Zhao; Haiquan Chen; Bryan Ngo; Lykourgos-Panagiotis Zalmas; Maise Al Bakir; Sebastijan Hobor; Eva Grönroos; Adam Pennycuick; Ersilia Nigro; Brittany B Campbell; William L Brown; Ayse U Akarca; Teresa Marafioti; Mary Y Wu; Michael Howell; Simon J Boulton; Cosetta Bertoli; Tim R Fenton; Robertus A M de Bruin; Apolinar Maya-Mendoza; Eric Santoni-Rugiu; Robert E Hynds; Vassilis G Gorgoulis; Mariam Jamal-Hanjani; Nicholas McGranahan; Reuben S Harris; Sam M Janes; Jirina Bartkova; Samuel F Bakhoum; Jiri Bartek; Nnennaya Kanu; Charles Swanton Journal: Cancer Discov Date: 2021-05-04 Impact factor: 38.272
Authors: Raul Caso; James G Connolly; Jian Zhou; Kay See Tan; James J Choi; Gregory D Jones; Brooke Mastrogiacomo; Francisco Sanchez-Vega; Bastien Nguyen; Gaetano Rocco; Daniela Molena; Smita Sihag; Prasad S Adusumilli; Matthew J Bott; David R Jones Journal: NPJ Precis Oncol Date: 2021-07-21
Authors: Abby M Green; Rachel A DeWeerd; David R O'Leary; Ava R Hansen; Katharina E Hayer; Katarzyna Kulej; Ariel S Dineen; Julia H Szeto; Benjamin A Garcia; Matthew D Weitzman Journal: EMBO Rep Date: 2021-08-04 Impact factor: 9.071