Hui Cheng1, Xinping Yang1, Han Si2, Anthony D Saleh1, Wenming Xiao3, Jamie Coupar1, Susanne M Gollin4, Robert L Ferris5, Natalia Issaeva6, Wendell G Yarbrough6, Mark E Prince7, Thomas E Carey7, Carter Van Waes8, Zhong Chen9. 1. Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20892, USA. 2. Translational Bioinformatics, MedImmune, Gaithersburg, MD 20878, USA. 3. Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA. 4. Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15260, USA. 5. Division of Head and Neck Surgery, Departments of Otolaryngology, Radiation Oncology, and Immunology, University of Pittsburgh Cancer Institute, Pittsburgh, PA 15232, USA. 6. Department of Surgery, Division of Otolaryngology, Molecular Virology Research Program, Smilow Cancer Hospital, Yale Cancer Center, Yale University Medical School, New Haven, CT 06520, USA. 7. Cancer Biology Program, Program in the Biomedical Sciences, Rackham Graduate School, and the Department of Otolaryngology-Head and Neck Surgery, University of Michigan, Ann Arbor, MI 48109, USA. 8. Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20892, USA. Electronic address: vanwaesc@nidcd.nih.gov. 9. Tumor Biology Section, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20892, USA. Electronic address: chenz@nidcd.nih.gov.
Abstract
Cell lines are important tools for biological and preclinical investigation, and establishing their relationship to genomic alterations in tumors could accelerate functional and therapeutic discoveries. We conducted integrated analyses of genomic and transcriptomic profiles of 15 human papillomavirus (HPV)-negative and 11 HPV-positive head and neck squamous cell carcinoma (HNSCC) lines to compare with 279 tumors from The Cancer Genome Atlas (TCGA). We identified recurrent amplifications on chromosomes 3q22-29, 5p15, 11q13/22, and 8p11 that drive increased expression of more than 100 genes in cell lines and tumors. These alterations, together with loss or mutations of tumor suppressor genes, converge on important signaling pathways, recapitulating the genomic landscape of aggressive HNSCCs. Among these, concurrent 3q26.3 amplification and TP53 mutation in most HPV(-) cell lines reflect tumors with worse survival. Our findings elucidate and validate genomic alterations underpinning numerous discoveries made with HNSCC lines and provide valuable models for future studies. Published by Elsevier Inc.
Cell lines are important tools for biological and preclinical investigation, and establishing their relationship to genomic alterations in tumors could accelerate functional and therapeutic discoveries. We conducted integrated analyses of genomic and transcriptomic profiles of 15 human papillomavirus (HPV)-negative and 11 HPV-positive head and neck squamous cell carcinoma (HNSCC) lines to compare with 279 tumors from The Cancer Genome Atlas (TCGA). We identified recurrent amplifications on chromosomes 3q22-29, 5p15, 11q13/22, and 8p11 that drive increased expression of more than 100 genes in cell lines and tumors. These alterations, together with loss or mutations of tumor suppressor genes, converge on important signaling pathways, recapitulating the genomic landscape of aggressive HNSCCs. Among these, concurrent 3q26.3 amplification and TP53 mutation in most HPV(-) cell lines reflect tumors with worse survival. Our findings elucidate and validate genomic alterations underpinning numerous discoveries made with HNSCC lines and provide valuable models for future studies. Published by Elsevier Inc.
Entities:
Keywords:
BIRC2; FADD; NFE2L2; PIK3CA; SOX2; TP63; The Cancer Genome Atlas; cell lines; copy number; head and neck squamous cell carcinoma; human papilloma virus
Head and neck squamous cell carcinoma (HNSCC) is the sixth leading cancer by
incidence worldwide (Torre et al., 2015).
Various chemical carcinogens (tobacco, alcohol, and betel nut), human papillomavirus
(HPV) infection, and genetic predisposition contribute to the etiology of HNSCC and
to the complex genetic alterations in tumor subsets that differ in prognosis and
response to therapies (Van Waes and Musbahi,
2017). Establishing the functional role of genomic alterations in
pathogenesis and as potential targets for therapy could be accelerated by genomic
and transcriptomic characterization of a large panel of established cell line models
differing in etiologic status. However, the extent to which HNSCC cell lines reflect
the genomic and transcriptional alterations found in tumors remains unclear.Humancancer cell lines have long fostered mechanistic, genetic, molecular,
cellular, and pharmacologic studies that were not possible using tumor specimens
alone. Several large cell line panels, such as the University of Michigan Squamous
Cell Carcinoma (UM-SCC), University of Pittsburgh Cancer Institute (UPCI:SCC), and
other institutional series, have been authenticated genotypically, have been studied
extensively worldwide (Brenner et al., 2010),
and have led to important cancer discoveries and investigational drug development.
However, until the recent advent of massively parallel sequencing, it has not been
possible to comprehensively determine if and how well historic and newer cell line
panels with germline controls model genomic and transcriptomic alterations in tumor
subtypes that differ in etiology, origin, and prognosis.Recently, a comprehensive landscape of genomic and transcriptomic alterations
in HNSCC and other squamous tumors has emerged from The Cancer Genome Atlas (TCGA)
Network (Campbell et al., 2018; Cancer Genome Atlas Network, 2015). TCGA revealed novel
and previously recognized gene and chromosomal region copy number alterations
(CNAs), mutations, and expression clusters and defined their frequency,
co-occurrence, and relationship to common and rare subtypes of HPV(−) and
HPV(+) tumors that vary in prognosis. Intriguingly, although a small subset of
HNSCCs are mutation driven, both cytogenetic and TCGA studies indicate that most
HNSCCs display multiple recurrent CNAs affecting broad and focal chromosome regions
that harbor multiple candidate driver genes (Gollin,
2014). The frequent co-occurring events affecting multiple loci or genes
suggest that alterations in associated pathways may elicit important complementary
or synergistic effects important in tumorigenesis (Thomas et al., 2007). However, it remains to be fully determined whether
these recurrent chromosomal alterations drive corresponding differential gene
expression of one or multiple genes in both tumors and cell lines, where their
potential functional interactions and roles in pathways of biologic and therapeutic
interest may be investigated.To this end, human cell line models in which the complex inter-section of
CNAs, mutations, and gene expression has been defined are needed to investigate
their functional role and therapeutic potential. In particular, the development of
therapies for HNSCC patients with worse clinical outcomes and limited treatment
options would be facilitated by identification of HNSCC lines modeling the genomic
and expression alterations underlying these tumor subsets. Although the mutational
landscape of HNSCC cell line panels has been characterized by targeted or exome
sequencing (Kalu et al., 2017; Li et al., 2014; Pickering et al., 2013), these studies have not included
a comprehensive and integrated characterization of CNAs, mutations, expression, and
pathway analyses of a large panel of HPV(−) and HPV(+) cell lines for
comparison with TCGA tumors.Here, we used whole DNA exome and RNA transcriptome sequencing to
characterize somatic CNAs, mutations, and expression profiles of 15 HPV(−)
and 11 HPV(+) HNSCC cell lines. We found that the recurrent genomic alterations in
cell lines are remarkably consistent with those found in more aggressive tumors,
from which cell lines have traditionally been most readily adapted to culture (White et al., 2007). Genome-wide correlation of
CN (copy number) with expression identified a suite of potential drivers or modifier
genes that differ by HPV status and are of potential biologic and therapeutic
relevance. Furthermore, our findings elucidate and validate genomic alterations
underpinning numerous discoveries made with these widely used and recently derived
HNSCC lines and provide a roadmap for their potential use as models for future
studies of tumor subtypes with worse prognosis.
RESULTS
DNA CNAs Identified in Cell Lines Recapitulate Those Found in Clinically
Aggressive HNSCC Subsets in TCGA
The panel of 15 HPV(−) and 11 HPV(+) HNSCC cell lines included 18
widely studied and genotype-verified lines (Brenner et al., 2010; White et al.,
2007), and 8 recently derived lines, which are those with matched
blood controls (Table
S1A, indicated in the table as blood: Y). CNA was determined from
whole-exome sequence using CONTRA and significant focal CNAs were identified
using GISTIC (Figure
S1A; STAR Methods). HPV(+) cell lines were confirmed to have
detectable HPV reads exceeding the threshold (of 100 reads for HPV(–)
lines), and their HPV integration or non-integration status was determined
(Figure S1B; Table S1B). We defined a
recurrent CNA region as a chromosome region that shows a high probability of
being altered in multiple samples (Figure S2A; GISTIC q value <
0.25). We identified recurrent chromosome gains in 3q, 5p, 7p, 8q, and 11q and
losses in 3p, 5q, 8p, 9p, 10p, 18q, and 21q (Figures 1A and S2A), which are consistent with those observed in tumors in TCGA
(Cancer Genome Atlas Network, 2015),
previous cytogenetic analyses of tumors (Gollin,
2014), and published karyotypes of several of the UM-SCC lines (Van Dyke et al., 1994).
Figure 1.
DNA CN Alterations of HNSCC HPV(−) and HPV(+) Cell Lines Highlighting
Several Alterations Observed in TCGA Datasets
(A) micCircos plot showing landscape of CNAs analyzed using GISTIC for
HPV(−) (n = 15) and HPV(+) (n = 11) HNSCC cell lines. Chromosomes
displaying cytobands are plotted on the outer ring, with the inner rings showing
segmented CNAs of 15 HPV(−) lines (outer band) and 11 HPV(+) lines (inner
band). Regions of recurrent CN gains (red) or losses (blue) and selected genes
observed in TCGA are indicated. See Figure S2A for detailed GISTIC
analysis results and Table
S2 for common CNAs in cell lines and TCGA. (B and C) Modified
Integrative Genomics Viewer (IGV) plots illustrating ordered CNAs for chr 11q13
region (B) for CCND1 and FADD or 11q22 region
(C) for YAP1 and BIRC2/3. Red, amplification;
blue, deletion; white, neutral.
When comparing overall CNAs between recently derived lines and
traditional cell lines, there was no significant difference be-tween recently
derived and traditional HPV(−) lines (t = 1.654, p = 0.198, Welch
two-sample t test), but there was a significant difference between recently
derived and traditional HPV(+) lines (t = 2.769, p = 0.023, Welch two sample t
test) (Figures S2B and
S2C). Notably, there were fewer regions with significant CNAs in the
subset of recently derived HPV(+) VUMC lines (VUMC-SCC 4149, 4755, and 4975), as
summarized in Figures
S2B and S2C.
Interestingly, these VUMC lines were derived under different culture conditions
using keratinocyte growth media and were noted to harbor lower HPV reads at
baseline (Figure S1B)
and lack of integrated HPV (Table S1B), as well as reduction in HPV and proliferative potential
with further passages, consistent with episomal HPV. Exome sequencing
(exome-seq) and CONTRA also displayed several broader regions with apparent
increase in CN on chromosome (chr) 2, 4, 6, and 10 relative to control DNA that
have not been broadly reported. Multiple amplification peaks on chr 2q were
detected by GISTIC in TCGA HNSCCs (Cancer Genome
Atlas Network, 2015), including 2q31, which harbors
NFE2L2. NFE2L2 is a master transcription factor that
regulates the genes involved in oxidative stress. Studies have found that gene
signatures regulated by the NFE2L2 are associated with tumorigenesis and drug
resistance in HNSCC (Namani et al.,
2018). Broad CN gains on chr 4, 6, and 10 were not detected using GISTIC
in TCGA tumors. Lower level or more focal gains in chr 2, 4, 6, and 10 can be
sorted and visualized in subsets of TCGA tumor SNP array data using Integrated
Genome Viewer (IGV) (Figure
S3). Furthermore, although the apparent gains in chr 4 and 10 could
be independently replicated by exome-seq using an Illumina platform with CONTRA
analysis, those for chr 2 and chr 6 displayed a lack or narrower regions of
gains, respectively (Figure
S4). Thus, it appears CN estimation based on exome-seq may
overestimate CN for certain chromosomal regions (Fromer et al., 2012).Several of the regions of broad or focal amplification and deletion
harbor genes implicated in development and pathogenicity of HNSCC tumors (Figure 1A; Table S2). Notably, gain of
3q26–q28 is a dominant genomic alteration found in all of the
HPV(−) and HPV(+) HNSCC cell lines included in this study (Figure 1A). This amplicon harbors more than 70 genes
(Tables S3A and
S3B), including
several already experimentally implicated in HNSCC growth signaling
(PIK3CA), stemness (SOX2), squamous cell
differentiation, survival, migration, inflammation and angiogenesis
(TP63), and cell immortalization (TERC)
(Cao et al., 2008), as well as
numerous less well characterized genes. Chr 11q harbored two recurrent focal
amplifications found in TCGA (Figures 1B
and 1C). Locus 11q13 (Huang et al., 2006) contains FADD,
CCND1, CTTN, FGF3/4/19,
PPFIA1, MYEOV, and ANO1,
and 11q22 harbors BIRC2/3, YAP1, and
PDGFD. Several of these genes have been reported to have
individual or overlapping roles in deregulation of cancer cell cycle
(CCND1, FADD), cell death and nuclear
factor kB (NF-kB) signaling (FADD, BIRC2/3),
growth factor (FGFs, PDGFD), and Hippo
growth-regulatory (YAP1) pathways (Table S3A). Consistent with data
from TCGA, 11q13/22 were predominately amplified in HPV(−) lines. Other
common CNAs (Gollin, 2014) and genes
potentially enhancing cell growth and survival include amplifications of
oncogenes on 7p(EGFR) and 8q(MYC) (Table S2).Focal CN losses found in HPV(−) lines and tumors include deletion
of 9p21 (CDKN2A/B/p16) (Figure
1A), which has been implicated as an early event deregulating cell
cycle control, and losses of other putative tumor suppressor genes (TSGs) on 8p
(TNFRSF10B), distal 11q (ATM), and 18q
(SMAD4, GALR1; Table S3). Notably, in the HPV(+)
panel, we found only one line with low-level copy loss of 14q32.32
(TRAF3), in contrast to deletion and/or mutation observed
in 20% of HPV(+) HNSCC tumors in TCGA (Cancer
Genome Atlas Network, 2015), which are associated with better
prognosis (Hajek et al., 2017). Several
understudied focal amplifications (5p15, 8p11), deletions (5q, 10p, 21q), and
associated genes were identified in HNSCC cell lines, which have also been
associated with aggressive clinical-pathologic features of HNSCC (Tables S2 and S3A). Therefore, the CNAs
identified in many of these HNSCC lines that were derived from patients with
poor clinical outcomes similarly reflect those found among tumors with greater
CNA and worse clinical features in TCGA.
Correlation between CNAs and mRNA Expression with Concordance in HNSCC Cell
Lines and Tumors
To explore and compare how these CNAs affect the transcriptome in HNSCC
tumors and lines, the associations between CNV and mRNA abundance for 17,040
genes were determined, and the overall concordance between CNA and gene
expression for HNSCC tumors and cell lines was illustrated (Figure S5; p < 2.2e-16,
rs = 0.56, Spearman coefficient test). Among these, 4,500 genes
displayed significant CNV-expression correlation in both tumors and cell lines
(Figure S6; false
discovery rate [FDR] < 0.05, Spearman co-efficient test). Figure 2 highlights examples of significant
correlations between CN gains and expression for several oncogenes in HNSCC
lines and tumors from TCGA. These include 3q oncogenes PIK3CA
and TP63 in both HPV(−/+) subtypes; 11q13/22 genes
CCND1, FADD/BIRC2, and
YAP1, as well as 7p21 gene EGFR, mainly in
HPV(−) lines. Conversely, CN loss of TSGs predominantly correlated with
decreased expression, as illustrated by 8p gene tumornecrosis factor receptor
superfamily member 10B/death receptor 5 (TNFRSF10B/DR5), also
known as TRAIL receptor 2 (TRAILR2). The concordance between TCGA tumors and
cell lines in terms of CNA and gene expression was also examined for individual
chr 1, 2, 3, 4, 5, 6, 10, and 11, which were found to have broad or focal CNAs
using exome-seq data and CONTRA analysis (Figure S5). Analysis of cell lines
and TCGA tumors revealed higher concordance (r > 0.6) and statistical
significance for chromosomes 3 and 11 with high CN gains, consistent with prior
studies. Significant concordance but with lower r values (<0.6) were
observed between cell lines and tumors for chr 1, 2, 4, 5, 6, and 10, which
displayed CNAs in cell lines by exome-seq and CONTRA analysis. Overall, our
analysis provides significant validation for concordance between CNA and mRNA
expression and indicates that gene CN and expression of the broader
transcriptome are highly consistent between the cell line panel and tumors from
TCGA.
Figure 2.
Correlation between CNA and mRNA Expression Observed in TCGA Datasets and
HNSCC Cell Lines
Scatterplots for selected genes showing significant CNA and gene
expression correlation in both HNSCC TCGA datasets (left, n = 279) and cell
lines (right, n = 26). X axis, log2 CN ratio; 0 is diploid, 1 is
one-copy loss, 0.585 is one-copy gain, and values larger than 1 are
amplifications. Y axis, log2 gene expression from normalized RNA-seq
read count. Spearman coefficient test, with p values presented. HPV(+), red
triangles; HPV(−), gray circles. See Figure S5 for concordance between
TCGA and cell lines in the correlation of CNV and gene expression.
Gene Expression Driven by CN Gains Is Clustered on Chromosomes 3q, 5p, 8p,
and 11q
We explored further how extensively gene expression in cell lines is
driven by chromosomal amplifications. We conducted a supervised hierarchical
clustering analysis of fold changes in gene expression between HNSCC cell lines
and primary human oral keratinocytes (Figure
3). Remarkably, increased expression of the top 103 genes correlated
with CN gain were found to co-cluster with the recurrent amplicons on
chromosomes 3q22-qter (90 genes), 5p15 (7 genes), 11q13 (5 genes), and 8p11 (1
gene). Most lines clustered according to HPV status or patient origin. Three
paired lines derived from primary and recurrent tumors from the same patient
clustered together, while primary and metastatic lines UM-SCC22A/B clustered
separately. Three recently derived HPV(+) VUMC lines clustered together and
showed more limited expression of genes on chromosomes 3q, 5p, and 11q,
consistent with overall lower number of significant CNAs detected in these lines
(Figure S2B).
Figure 3.
Hierarchically Clustered Heatmap of Significantly Amplified and Expressed
Genes in HNSCC Cell Lines and TCGA Datasets
Heatmap of supervised hierarchical clustering from RNA-seq data shows
expression significantly correlated with amplification for top 103 concordant
genes in 26 HNSCC cell lines and 279 tumors from TCGA dataset. Columns, HNSCC
cell lines; rows, genes; top, HPV status; key and colored bar on the right, chr
regions. Fold expression compared with normal: red, increased; blue, decreased);
white, no change. Cell lines with colored font represent paired cell lines
derived from the same patient, with different color for each pair. See Figure S6 for workflow of
the integrated analysis and Table S3A for annotation of genes identified in this analysis.
We performed a literature-based review, which revealed that many of
these co-clustered genes represent understudied candidate oncogenes, which
warrant further investigation (Table S3A). For example, expression of several genes at 5p appeared
to be distributed among both HPV(−) and HPV(+) lines. The 5p15 amplicon
cleft-lippalate gene CLPTM1L may promote resistance to
mitochondrial cell death and cisplatin (Ni et
al., 2012). TRIP13 was recently reported to foster
error prone DNA non-homologous end-joining and chemoresistance in HNSCC (Banerjee et al., 2014).
BRD9 is a bromo-domain family chromatin modifier reported
to enhance MYC oncogene expression in AML. The 11q13 genes
CCND1, CTTN, and FADD
were more prominently expressed in a subset of HPV(−) than in HPV(+)
lines, except for 93VU147T derived from a smoker, concurring with TCGA and
another study (Ragin et al., 2006). Most
HPV(+) lines exhibited higher expression of 3q27 gene DVL3
implicated in Wnt-b-catenin pathway signaling. Other potentially interesting
3q26-qter candidates expressed across the lines include calmodulin kinase II
CAMK2N2, implicated in nicotine-induced activation of
transcription factor NF-kB and inflammatory and angiogenesis factor IL-8 (Tsunoda et al., 2016), and ephrin
EPHB3, reported to promote migration in lung cancer (Efazat et al., 2016). In contrast,
HPV(−) lines and HPV(+) pair UPCI:SCC090/152 derived from a patient with
primary and recurrent tumors who died within 22 months more often showed broader
3q arm gains with increased expression of MAP3K13 (LZK), a
kinase recently reported to stabilize mutant TP53 in HNSCC (Edwards et al., 2017); ACTL6A, which
cooperates with 3q gene TP63 and 11q22 Hippo pathway gene
YAP1, linked to growth and poor patient survival in HNSCC
(Saladi et al., 2017); PI3K-mTOR-eIF
growth pathway (Iglesias-Bartolome et al.,
2013) genes PIK3CA, EIF4A2,
EIF2A, EIF2B5, and
EIF5A2; and ZNF639 (ZASC), a zinc finger
transcription factor associated with recurrent oral squamous cell carcinoma
(SCC) (Chiang et al., 2011). The
FGFR1 gene at 8p11 is a potential target for FGFR
inhibitors (Dutt et al., 2011). Together,
these findings suggest many of these CN-driven alterations in expression
converge on common cancer-related pathways or functions. We also performed
supervised clustering of mRNA expression for 832 classifier genes previously
used to define HNSCC subtypes (Cancer Genome
Atlas Network, 2015). Compared with the subtyping observed by mRNA
expression in TCGA tumors (Cancer Genome Atlas
Network, 2015), some HNSCC cell lines distinctly mapped to certain
molecular subtypes (Figures
S7A and S7B): UM-SCC11A, B and 74 A,B co-clustered with a mesenchymal-type
expression signature; UM-SCC46 and UPCI:SCC154 were closest to the atypical
sub-type; and UM-SCC104 was closest to the basal subtype. Alternatively, heatmap
display of the correlation-based distances between centroids for HNSCC cell
lines and expression sub-types in tumors from TCGA shows how cell lines overlap
the profiles of one or more TCGA tumor subtypes (Figure S7B). Among these, the cell
lines UPCI:SCC90 and 152 from the same patient showed the shortest distance from
the classical subtype but also displayed expression pattern of the atypical
subtype. Most of the remainder either were close to basal or did not have strong
expression patterns close to any subtypes.
Integrated Analysis Reveals Differential Gene Expression in Distinct
Signaling Pathways by HPV Status
To further characterize the potential relationship of concordant genomic
and expression alterations to signaling path-ways, we selected 1,544 genes that
showed significant and consistent amplification or deletion in both tumor
specimens and HNSCC cell lines (Figure S6; GISTIC q values <
0.25). Empirically selecting as a cutoff an absolute expression fold change (FC)
> 1.5, filtering yielded 492 and 440 genes from HPV(−) and HPV(+)
lines, respectively. Ingenuity Pathway Analysis (IPA) identified distinct
canonical pathways and networks enriched for genes displaying CNA-related
differences in expression for HPV(−) and HPV(+) cell lines (Table S4). These
overlapping IPA networks include growth factor receptor/PI3K-mTOR-eIF-protein
synthesis and NF-kB pro-survival signaling; p53/death receptor/mitotic DNA
damage checkpoint arrest, inflammation, angiogenesis, and viral entry; and
stemness and differentiation (Figure 4;
Table S4).
HPV(−) and HPV(+) HNSCC cell lines overexpressed pathway components in
PI3K signaling (FGFR1, PIK3CA) and WNT
differentiation signaling pathways (DVL3). Increased
CCND1, FADD, and BIRC2/3
and decreased CDKN2A and TNFRSF10B, involved
in cell cycle and death pathways, were predominant in HPV(−) cell lines
(Figure 4A). In-contrast, greater
increased CDKN2A (p16) and decreased death pathway-related
genes ATM (11q22-qter), ATR,
RFC3, MRE11A, and PPP2R1B
occurred in HPV(+) cell lines (Figure 4B).
Together, these findings underscore potentially important differences in genomic
and transcriptomic alterations in signaling pathways in HPV(−) and HPV(+)
HNSCCs.
Figure 4.
Integrated Analysis Reveals Genomic and Expression Alterations in Signaling
Pathways in HNSCC Differing in HPV Status
See Figure S6
for workflow of analysis and Table S4 for detailed pathway analysis results.
(A and B) The key affected pathways, components, and cellular functions
in (A) HPV(−) and (B) HPV(+) cell lines are presented.
The altered genes were analyzed for corresponding pathways by Ingenuity
Pathway Analysis (IPA), and the workflow is summarized in Figure S6. Additional information
is summarized in Table
S4. Color key, fold expression compared with diploid: red, increased;
green, decreased. Function, activation (arrows), inhibition (block). Large
circles indicate biological complex, and ellipses indicate transmembrane
receptors.
Mutations and Variants in HNSCC Lines Compared with Tumors from TCGA
To identify candidate somatic mutations in lines without or with matched
DNA controls, we developed a custom pipeline that prioritized infrequent
variants with minor allele frequency (MAF) < 1% and somatic and
functionally deleterious mutations identified by COSMIC and Mutation Assessor
(Figure S8). By
filtering against the 396 top mutated genes discovered by MutSig in TCGA tumors,
we identified 155 mutations occurring in 104 genes in the cell lines.
Conservation of a footprint-like pattern of recurring mutations was observed in
four cell line pairs derived from the same patient (Figure 5A). Significantly mutated genes in TCGA were
also frequently mutated in cell lines, including established and putative TSGs
involved in genomic stability, cell cycle, and survival (TP53, RB1,
CASP8), squamous differentiation (NOTCH1), Hippo
signaling (FAT1), growth factor receptor-PI3K-AKT/PKB signaling
pathway (IGF1R, MET, ERBB2, EPHA2, PTEN), histone lysine
methylation (KMT2D), and immunosurveillance (HLA-B and
HLA-A) (Figures 5A, 5B, and S9; Tables S5 and S6A). Mutations in genes of those
lines that overlap a recent whole-exome sequencing study showed high concordance
(Kalu et al., 2017). Remarkably,
whereas PIK3CA was amplified across both HPV(_) and HPV(+) cell
lines in this panel, we did not find any of the hotspot activating mutations of
PIK3CA that occur in _21% of the TCGA dataset. Lack of
PIK3CA mutation in cell lines was validated by manual
review of ExomeSeq and RNA sequencing (RNA-seq) data of these cell lines and
independent targeted sequencing (Kalu et al.,
2017). As PIK3CA hotspot mutations were also
observed relatively infrequently by targeted sequencing in other HNSCC lines
(Mazumdar et al., 2014), these
observations suggest that HPV(_) and HPV(+) tumors harboring 3q amplifications
associated with more aggressive tumors may be more readily selected and adapted
to cell culture. Similarly, we also did not observe mutations in
NFE2L2. As NFE2L2 gains were observed in
TCGA, we analyzed CN variation and gene expression of NFE2L2
locus in the 26 cell lines. Interestingly, the IGV CN heatmap and scatterplot
show that all the cell lines except UPCI:SCC90, 152 have at least one copy gain,
and many of the lines show increased NFE2L2 expression. The
expression and CN of NFE2L2 are highly correlated (r = 0.63)
and significant (p = 0.00058) (Pearson correlation).
Figure 5.
Mutated Genes in 26 HNSCC Cell Lines Consistent with TCGA Datasets
(A) Frequently mutated genes (columns) are presented by mutation
frequency, as well as additional genes of biological interest in HNSCC. Cell
lines (rows, n = 26) are arranged by HPV status. Cell lines with colored font
represent paired cell lines derived from the same patient, with different color
for each pair. Cell lines annotated with asterisks had matched blood DNA for
comparison. Color shading indicates mutation types: green, missense; red,
nonsense; purple, splice site; yellow, frameshift deletion; blue, other
mutations. See Figure
S8 for mutation detection process and Tables S5 and S6A for detailed mutation results.
(B) Lollipop plots show the distribution and classes of mutations in TSGs:
TP53, NOTCH1, and MYH9
across HNSCC cell lines. The predicted impact of mutations detected is shown by
transcript base position and functional domain, where color of the lollipop
represents mutation types corresponding to the mutation classes in (A). Colored
bars represent protein domains or motifs. The y axis indicates the observed
number of mutations in 26 HNSCC cell lines. See Figure S9 for mutation diagrams of
other genes implicated in HNSCC.
Our HNSCC panel includes cell lines with rare gene mutations observed in
HNSCC tumors in TCGA or other studies (Tables S5 and S6A). A homozygous Q1186* nonsense
mutation in MYH9 was discovered in UM-SCC 110, derived from an
unusually young 39-year-old male smoker with an HPV(−) anterior tongue
tumor who died of early recurrence (Figures
5B and 6; Tables S1A, S5, and S6A). Heterozygous mutation was
observed in matched blood, and a loss of heterozygosity (LOH) was observed in
the cancer cell line, consistent with a second hit loss of the wild-type allele.
This MYH9 associated cancer also had TP53,
NOTCH1, and EP300 mutations associated
with smoking-related HPV(−) HNSCC. TGFb pathway SMAD4
nonsense mutations were seen in the primary and metastatic pair UM-SCC-22A, B.
Other less frequent mutations (Figure 5A;
Table S5), and
seven genetic polymorphisms implicated as prognostic factors were also
identified (Table S6B).
For more abundantly expressed genes and common alterations, we found high
concordance between mutations detected by exome-seq and RNA-seq (Tables S5 and S6A).
Figure 6.
HNSCC Cell Lines Recapitulate the Spectrum of CNAs and Mutations Found in
HNSCC Tumor Subtypes by TCGA
Integrated summary of chr and oncogene amplification and TSG mutation
and deletion for 26 HNSCC cell lines are illustrated by an oncoprint. Cell lines
are displayed in columns and grouped by HPV status. Cell lines with colored font
represent paired cell lines derived from the same patient, with different color
for each pair. Genes with chr location are displayed in rows and grouped by
putative function (oncogenes and tumor suppressors) and similarity of alteration
patterns. Copy gains: red, high-level; pink, low-level; blue, homozygous
deletions; aqua, heterozygous deletions; green bar, mutations, missense; black
bar, truncating mutations.
Integration of Genetic Alterations Highlights Potential Therapeutic Targets
for Preclinical Studies
We summarized the major CNAs and mutations for key oncogenes and TSGs in
HNSCC (Figure 6). Significantly, these
lines included genomic alterations in 3q,11q, and TP53 observed
frequently in TCGA dataset. Our previous publications demonstrated the effects
of genetic and/or pharmacologic modulation of 3q PIK3CA, the
DNp63 isoform of TP63, and
SOX2, as well as 11q FADD and
BIRC2 in distinct subsets of lines from this panel (Mohan et al., 2015; Yang et al., 2011; Eytan et al., 2016). We also analyzed the genomic and
transcriptionally altered genes from Figures
3 and 5 against the drug
databases in MetaCore and IPA. Altered genes with potential as therapeutic
targets are represented among the cell lines as potential models for drug
screens (Table S7).
Potentially interesting examples of drug targets include recently characterized
oncogenes, such as FGFR1 at 8p (AZD4547 and BGJ398) (Chae et al., 2017) and LZK
at 3q26 (lestaurtinib) (Davis et al.,
2011).
3q26.3 CN Gain and TP53 Mutation Co-occurrence Is Associated with Worse
Clinical Outcome in HPV ( ) Tumors
It is evident that many HNSCC cell lines are preferentially adapted from
tumors of patients with a poorer prognosis and who died within 2 years (Table S1A).
Interestingly, all cell lines displayed 3q26–q28 CN gain, whereas 3q CN
gain is observed in 35% of tumors in TCGA (low-level CN gain, 28%; high level CN
gain, 6.5%), and overall, 3q26–q28 CN gain is observed in 69% tumors,
whereas others appear to be driven by 3q gene PIK3CA mutations
and/or other alterations. To confer a selective advantage in tumors and cell
culture, genetic alterations often cluster in a set of essential pathways and
act in concert (Ciriello et al., 2012;
Gross et al., 2014). In this study,
we observed that most of the HPV(−) cell lines display 3q26.3 CN gain and
TP53 mutation, which also significantly co-occurred in 159
of 243 HPV(−) TCGA tumors (p = 3.3e-06, two-sided Fisher’s exact
test; Figure 7A). We explored how 3q
amplification that is linked with expression of multiple putative oncogenes and
mutations of 3q gene PIK3CA, and mtTP53,
affect prognosis. Among 243 TCGA HPV(−) HNSCCs, those containing
TP53Mut-3q26.3Amp had substantially
worse prognosis than those with TP53Mut or
3q26.3Amp alone, with a hazard ratio (HR) of 2.2 (p = 0.04, log
rank test) (Figure 7B). The 3q26.3 segment
encompasses 30 protein-coding genes, including PIK3CA,
ACTL6A, SOX2, TP63, and
others implicated in cancer stemness, growth, proliferation, and differentiation
(Table S3B). Among
these genes, three have been reported to be p53 targets (Fischer, 2017): ECT2 (epithelial
cell transforming 2) (Scoumanne and Chen,
2006), ZMAT3/WIG1 (zinc finger, matrin-type 3)
(Bersani et al., 2014), and
PIK3CA (Astanehe et al.,
2008). When we compared the overall survival for patients with
different combinations of 3q26.3Amp and PIK3CA
mutation (Figure 7C), 3q26.3Amp
with wild-type PIK3CA was associated with a worse prognosis
than PIK3CA mutation alone or
PIK3CAMut-3q26.3Amp (HR = 1.4, p =
0.3, log rank test; Figure 7D). We further
investigated how known activating PIK3CA hotspot mutations and
3q26.3Amp comparatively affected survival. We identified patients
with PIK3CA hotspot mutations (c.1624G > A[E542K],
c.1633G > A[E545K] and c.3140A > G [H1047R]) in TCGA data and
compared survival between patients with 3q26.3 amplification versus those with
PIK3CA hotspot mutations. We found that patients with
3q26.3 amplification also had worse overall survival than patients with
PIK3CA hotspot mutations (HR = 2.1, p = 0.29, log rank
test). Interestingly, HPV(–) HNSCC patients with the
PIK3CA hotspot mutations trended toward a better prognosis
compared with patients without either alteration events, but the significance of
the difference could not be established with the limited sample size in our
analysis (HR = 0.7, p = 0.64, n = 7, survival > 60 months). These results
support a role for other co-occurring genes on 3q or other chromosomes that are
altered in higher CNA sub-sets of HNSCC that display worse prognosis.
Figure 7.
Co-occurrence of TP53 Mutation and 3q26.3 Amplification Is Associated with
Poor Survival in HPV(−) HNSCC Patients from TCGA Datasets
See Table S3B
for annotation of amplified gene on chr 3q26.3.
(A) The pie chart represents the percentage of HNSCC patients with
co-occurrence of TP53 mutation
(TP53Mut) and 3q26.3 amplification
(3q26.3Amp), either alteration alone or none of the event. A
significant co-occurrence was observed between TP53 mutation
and 3q26.3 amplification (p = 3.3e-06, Fisher’s exact test).
(B) Overall survival outcome for four combinations of 3q26.3
amplification and TP53 mutation events in patients by
Kaplan-Meier analysis (colors correspond to the case subsets in A). P values
were determined using a log rank test.
(C) Distribution of patients with different combinations of
3q26.3Amp and PIK3CA mutation.
(D) Assessment of the overall survival outcome for four combinations of
3q26.3 amplification and PIK3CA mutation events in HNSCC
patients.
DISCUSSION
We report a comprehensive genomic and transcriptomic characterization of 26
HNSCC cell lines, including 15 with HPV(−) and 11 with HPV(+) status, and
performed a systematic comparison with genomic alterations in tumors from TCGA.
These cell lines reflect common and rare genomic and transcriptomic alterations
uncovered in clinically advanced and aggressive tumors by TCGA and other genomic
studies. Through integrated genomic, expression, and pathway analysis, we elucidate
deregulated gene expression involved in critical oncogenic pathways, under-pinned by
3q amplification and other shared or subset-specific alterations characteristic of
HNSCC with different HPV status. The biological, clinical, and therapeutic relevance
of some of these alterations, pathways, and functions has been anticipated and
validated by hypothesis-driven experimental studies reported by us and others using
long-established cell lines included in this panel (Conti et al., 2015; Eytan et al.,
2016; Herzog et al., 2013; Mohan et al., 2015). Recent reports also
support the utility of several HNSCC lines in this panel with mtTP53 in functional
genetic or pharmacologic screens demonstrating pre-clinical activity of WEE1 kinase
inhibitors (Moser et al., 2014). Together,
these studies have highlighted the importance of a network of pathways in promotion
of the malignant phenotype and therapeutic resistance, and of investigating the
contributions of genomic and transcriptomic alterations, which was here-tofore
limited to selected gene candidates.TCGA HNSCC study showed that patients from a subset with higher CNAs,
including 3q, 5p, and 11q amplification, have a worse prognosis (Cancer Genome Atlas Network, 2015). Here, we find that
these alterations are highly represented and associated with broad alterations in
gene expression across biologically and therapeutically relevant pathways in both
tumors and the long-and recently established HNSCC lines included in this study. It
has been evident that cell lines from HNSCC and other cancers are often more readily
adaptable from tumors associated with worse clinical outcomes, suggesting they
harbor alterations conducive to growth both in vivo and in
vitro (White et al., 2007).
Strikingly among these, we observed 3q amplification with higher frequency in both
HPV(−) and HPV(+) lines in this panel, than was observed in tumors by TCGA.
Furthermore, none of our 26 HNSCC cell lines harbored hotspot activating mutations
of the 3q gene PIK3CA, consistent with our finding that TCGA HNSCC
patients with 3q26.3 amplification had worse overall survival than those with
mutations of PIK3CA.These observations suggest that 3q26 and other CNAs may harbor and drive
altered expression of multiple genes that contribute to worse prognosis as well as
cell culture adaptability. Consistent with this hypothesis, integration of mRNA
expression and CNA data from the cell lines and TCGA tumors uncovered significant
overlap, supporting the contribution of 3q, 5p, 8p, and 11q amplifications to
increased expression of diverse genes that potentially contribute to HNSCC
pathogenesis. Remarkably, among the 103 most significantly amplified genes with
elevated expression in HNSCC cell lines and tumors, 90 of these genes were from
3q22-qter, with the remainder from 5p15, 11q13, and 8p11. We annotated multiple
interesting growth and signaling pathway-related gene alterations, which merit
further biological and preclinical investigation. These cell lines may therefore
represent useful models for investigation of the role of CN-driven expression of
clustered genes that contribute to biologic features and therapeutic resistance of
aggressive HNSCC subtypes.There are multiple candidate gene drivers on 3q, which are implicated in
growth, metabolism, and stemness in cell line models. Early studies indicated
TP53 family (Yamaguchi et al.,
2000) oncogene TP63 (Yang and McKeon, 2000), PI3-kinase subunit PI3KCA,
protein kinase C PRKCI (Justilien
et al., 2014), stemness gene SOX2 (Justilien et al., 2014) as well as ECT2
(Justilien and Fields, 2009), and
FXR1 (Majumder et al.,
2016) genes as drivers in lung, esophagus, or HNSCC. It is increasingly
appreciated that these and recently characterized 3q genes could collaborate with
one another or with genes encoded by other genomic alterations in SCCs (Justilien and Fields, 2009; Majumder et al., 2016). We have shown that both the DNp63
iso-form of TP63 and PIK3CA overexpressed by
UM-SCC lines in the current panel promote activation of NF-kB transcription factors
and gene programs that promote proliferation, migration, and inflammation (Yang et al., 2011). Furthermore, the
PRKCI and SOX2 oncogenes are co-amplified and
have been implicated in cooperative activation of Hedgehog signaling in lung SCC
(Justilien et al., 2014). Recently, the
ACTL6A gene has been reported to be co-amplified with
TP53 family oncogene TP63 on 3q, and together
with Hippo pathway gene YAP1 amplified on 11q13, cooperate to drive
proliferation and poor prognosis of HNSCC (Saladi et
al., 2017). Several other components of the PI3K-mTOR
(EIF4A2, EIF2B5, EIF2A,
EIF5A2) and WNT (DVL3) pathways that are
co-amplified on 3q and overexpressed in HNSCC cell lines and tumors may also merit
further investigation.Our analysis of TCGA data reveals that patients with aggregate events of
3q26.3 gain and TP53 mutation are associated with a significantly
worse prognosis. Intriguingly, supplemental data in a study of 250 TCGA
HPV(−) HNSCCs (Gross et al., 2014)
also support a stronger association between mutation of TP53 with
3q26.3 gain than with gains in other chromosomal regions. The aforementioned study
focused upon TP53 mutation and 3p loss, although the question
remained as to what genes encoded on chr 3p or other co-occurring chromosomal
alterations are responsible for the interaction with TP53 (Gross et al., 2014). Numerous cytogenetic
studies have found that the coordinate loss of 3p, and gain of 3q26.3, often
co-occur as a consequence of an isochromosome formation, resulting in duplication of
distal 3q and loss of 3p (Gollin, 2014).
Strikingly, we found that deletion of 3p significantly co-occurred with CN gain of
3q26.3 in 135 of 243 HPV(−) TCGA tumors (p = 1.07e-06, two-sided
Fisher’s exact test) but not with overall amplification of 3q (p = 0.59,
two-sided Fisher’s exact test). Our findings of significant co-occurrence of
3q26.3 gain and TP53 mutation, and reports identifying p53 target
genes on 3q26.3, provide a possible explanation for the highly significant
relationship between 3q26.3 amplification, TP53 mutation, and worse
prognosis. Gain of 3q26.3, loss of 3p, and mutation of TP53 appear
to be acquired early in the pathogenesis of HNSCC, suggesting an important role in
early oncogenesis (Leemans et al., 2011), and
the potential for broader gene interactions among these loci merit further
investigation. TP53 mutation has long been associated with poorer
prognosis for patients but has been insufficient alone in predicting outcome (Cancer Genome Atlas Network, 2015). The role of
inactivating mutations of TP53 in enhancing the overexpression of
the PIK3CA gene has been reported (Astanehe et al., 2008). In addition, we uncovered a mitochondrial
biosynthesis gene signature among 3q genes. Patients with Li-Fraumeni syndrome (LFS)
carrying germline TP53 mutations have increased abnormal
mitochondria function compared with healthy volunteers (Wang et al., 2013). Mean-while, genetic or pharmacologic
disruption of mitochondrial respiration improved cancer-free survival in a mouse
model of LFS (Wang et al., 2017). These
studies, together with our observation that amplification and overexpression of
multiple 3q26.3 genes are associated with TP53 mutation or
inactivation by HPV, suggest TP53 may modulate broader 3q26.3 gene
programs and signal networks that regulate bioenergetic homeostasis, growth,
stemness, and differentiation.HNSCC cell lines recapitulated other frequent CNAs and mutations observed in
tumors from TCGA that affect death pathways and that were recently implicated as
therapeutic targets using these cell lines. Intriguingly, HPV(−) tumors and
lines exhibit alterations of the TRAILR2-FADD-IAP-CASP8 axis, while HPV(+) tumors
and lines displayed copy loss and decrease in expression of ATM, ATR, and other
death pathway components. Using several HPV(−) lines from this panel, we
recently showed that FADD amplification and overexpression can
serve as an Achilles’ heel that sensitizes HNSCC to IAP antagonists in
combination with exogenous death factor TNF or radiation (Eytan et al., 2016). We observed NFE2L2
gains and expression more frequently than mutations compared with TCGA. We did not
detect mutations such as HRAS or co-occurrence with the
heterozygous CASP8 mutation found in UM-SCC-1, which are seen in
small subset of TCGA HNSCC tumors lacking 3q and 11q gains.These cell lines displayed rare alterations as well as recurrent mutations
and CN losses of putative TSGs consistent with TCGA and prior literature reports. A
unique line derived from a tongue cancer displayed a rare homozygous
MYH9 gene mutation compared with heterozygous germline
mutation, consistent with loss of the wild-type (WT) allele. As we showed that
MYH9 knockout mice develop aggressive oral HNSCC (Conti et al., 2015), this line may facilitate
studies of how MYH9 functionally contributes as a TSG. HPV(+) HNSCC
cell lines, pre-dominantly from oropharyngeal tumors, exhibited infrequent mutation
or genetic loss in CDKN2A/RB1 or TP53 components,
which are targeted by HPV oncoproteins. However, these cell lines displayed copy
loss and decreased expression of DNA damage response genes ATM and ATR, which
function as up-stream activators of WT TP53 and could augment the death defects
attributed to TP53 inactivation by HPV. Interestingly, somatic
aberrations in DNA-repair genes, including ATM, were also reported
in HPV(+) tumors (Seiwert et al., 2015). In
addition, one HPV(+) line harbored a RB1 frameshift deletion, which
could also magnify the effects of RB inactivation by HPV E7 protein.The apparent gains on broad regions of chromosomes 2, 4, 6, and 10 were
investigated using several approaches. Broad low-level or focal gains were also
observed in subsets of TCGA tumors (chr 2, 4, 6, and 10); these apparent gains
observed in cell lines and subsets of TCGA tumors also significantly affect
expression of genes in these regions in HNSCC lines and TCGA tumors, as demonstrated
by concordance between cell lines and TCGA tumors. Some of these chr alterations
were observed in previous studies. Amplifications of 2q, 4q, 6p, and 6q were
described in the invasive front area of laryngeal carcinoma and associated with
aggressive tumor progression (Ambrosio et al.,
2013). The 6p21 region harbors oncogenes CCND3,
HMGA1, cell cycle regulators CDKN1A,
CDC5L, human leukocyte antigen (HLA), and
cytidine deaminase gene APOBEC2 genes. Among these, the enrichment
of APOBEC mutations were also observed in HNSCC and other SCCs (Roberts et al., 2013). Although some of the gains were
replicated with Illumina ExomeSeq (chr 4 and chr 10), those for chr 2 and chr 6 only
partially overlapped. This apparent overestimation of CNA from whole-exome
sequencing (WES) could stem from differences in read fragment length or depth
resulting from DNA isolation, exome capture, and/or PCR amplification, and therefore
WES-based CN methods for some regions may not always reflect true CN variation.Interestingly, the recently derived HPV(+) VUMC lines showed significantly
fewer CNAs and HPV reads compared with traditional HPV(+) lines. CNA and
corroborating RNA expression data indicate that these lines with lower HPV reads,
integration, and proliferative potential also harbor fewer CNAs and corresponding
expressed genes. Of potential interest, the VUMC lines were cultured in keratinocyte
growth medium, which differs in the lower concentration of calcium, serum, and other
factors, which are used to favor and select for growth rather than differentiation
of normal keratinocytes. In comparison, high calcium serum-supplemented medium has
been used for culture of the most SCCcancer lines, including the others in this
study. Although our sample is limited, these unexpected observations suggest that
culture conditions could be another important factor determining the success rate of
culture and selection of tumors differing in HPV status, genomic CNAs and
prognosis.In summary, these HNSCC cell lines exhibit common and rare genetic
alterations observed in TCGA tumors, providing a valuable resource as head and neck
tumor models. HNSCC cell lines display concurrent genetic alterations in multiple
signaling pathways, consistent with those derived from clinically advanced and
aggressive tumors, supporting the concerted involvement of these pathways in
tumorigenesis, progression, and metastasis. The characterization of CNAs and
mutations has already identified important genetic alterations underpinning
sensitivity or resistance found in prior and recent preclinical studies. Identifying
new targets and potential mechanisms of resistance may facilitate mechanistic and
preclinical studies to advance personalized cancer medicine for HNSCC.
STAR*METHODS
CONTACT FOR REAGENT AND RESOURCE SHARING
Further information and requests for resources and reagents should be
directed to and will be facilitated by the Lead Contact, Carter VanWaes
(vanwaesc@nidcd.nih.gov).
EXPERIMENTAL MODELS AND SUBJECT DETAILS
Human Subjects
Tumor tissue, adjacent normal tissue, and normal whole blood samples
were obtained from patients at contributing centers with informed consent
according to their local Institutional Review Boards (IRBs, see below).
Biospecimens were centrally processed and DNA, RNA, and protein were
distributed to TCGA analysis centers.TCGA Project Management has collected necessary human subjects
documentation to ensure the project complies with 45-CFR-46 (the
“Common Rule”). The program has obtained documentation from
every contributing clinical site to verify that IRB approval has been
obtained to participate in TCGA. Such documented approval may include one or
more of the following: 1) An IRB-approved protocol with Informed Consent
specific to TCGA or a substantially similar program. In the latter case, if
the protocol was not TCGA-specific, the clinical site PI provided a further
finding from the IRB that the already-approved protocol is sufficient to
participate in TCGA; 2) A TCGA-specific IRB waiver has been granted; 3) A
TCGA-specific letter that the IRB considers one of the exemptions in
45-CFR-46 applicable. The two most common exemptions cited were that the
research falls under 46.102(f)(2) or 46.101(b)(4). Both exempt requirements
for informed consent, because the received data and material do not contain
directly identifiable private information; 4) A TCGA-specific letter that
the IRB does not consider the use of these data and materials to be human
subjects research. This was most common for collections in which the donors
were deceased.A total of 279 patients were analyzed in TCGA HNSCC analysis with
clinical data, whole exome sequencing (WES), RNA sequencing, miRNA
sequencing, methylation arrays, and DNA SNP chips. Most patients were male
(203 males, 73% and 76 females, 27%), heavy smokers (mean pack years = 51)
and used alcohol (188 out of 279, 67%). The median age of diagnosis was 61
years of age.Cell lines were established in accordance with IRB, exemption, or
consent requirements of 45-CFR-46 (the “Common Rule”) or
Declaration of Helsinki in place at the time of acquisition. The genomic and
transcriptomic analysis of the cell lines from de-identified individuals was
exempted from IRB review by The NIH Office of Human Subjects Research
Protection.
Clinical Samples, Data Types, and Genomic Platforms
Details about sample collection, tissue-specific sample selection
criteria, clinical annotations, and the genomic data pipelines can be found
in the TCGA HNSCC publication (Cancer Genome
Atlas Network, 2015). Human papilloma virus (HPV) status was
assessed by clinical assay (p16 immunohistochemsitry staining or in
situ hybridization) and multiple nucleic acids-based sequencing
platforms including RNA-Seq (n = 279 samples), WES (n = 279 samples),
“high coverage” WGS (n = 29) and “low pass” WGS
(n = 98). Samples were classified as HPV positive using an empiric
definition of detection of > 1000 mapped RNA-Seq reads, primarily
aligning to viral genes E6 and E7. In general there was good agreement
between all measures of HPV assessment, indicating that 36 tumors were
HPV(+) and 243 were HPV(−). Although 64% of the 33 oropharyngeal
tumors were HPV(+), 14 of the 36 cases in which the molecular data strongly
support an HPV(+) diagnosis were from non-oropharyngeal tumors. TCGA
clinical and platform data are available on the GDC website (https://gdc.cancer.gov/).
Cell lines
We selected a panel of 15 HPV(−) and 11 HPV(+) HNSCC cell
lines, including 18 historically widely studied and genotypically validated
lines (Brenner et al., 2010; White et al., 2007), and 8 recently
derived lines with matched blood DNA controls (Table S1A). Cell line stocks
from earliest passages available were genotyped and preserved in liquid
nitrogen and cultured for less than 1 month before sequencing. UM-SCC lines
(University of Michigan) originated from Drs. Thomas E. Carey, Mark E.
Prince, and Carol R. Brad-ford; VUMC-SCC lines (Vanderbilt University) from
Dr. Wendell Yarbrough; UPCI:SCC lines (University of Pittsburgh) from Dr.
Susanne M. Gollin; UD-SCC line (University of Dusseldorf,€ Germany)
from Drs. Thomas K. Hoffman and Henning Bier; and VU line (Vreije
University, Netherlands) from Johan de Winter. Authentication of UM-SCC and
other lines was done at the University of Michigan by DNA genotyping of
alleles for 9 loci, D3S1358, D5S818, D7S820, D8S1179, D13S317, D18S51,
D21S11, FGA, vWA, and the amelogenin locus in 2015, as described (Brenner et al., 2010). The HPV
expression and integration status of the lines was verified by RNA and DNA
sequencing (Figure S1A
& B, Table
S1B).
METHODS DETAILS
Sequencing
Whole Exome Sequencing (WES):
Genomic DNA from HNSCC lines and matched blood samples was
isolated from frozen cell pellets using the DNeasy Blood & Tissue
Kit (QIAGEN). Multiplexed libraries for exome capture sequencing were
constructed utilizing the 5500 SOLiD Fragment Library Core Kit,
TargetSeq Exome Enrichment Kits and 5500 SOLiD Fragment Library Barcode
Adaptors 1–16 (Life Technologies). The exome enriched genomic DNA
libraries were clonally amplified by emulsion PCR. Whole-exome
sequencing achieved an average of 87x coverage across targeted bases,
with 85% of target bases above 20x coverage.
RNA sequencing
The total RNAs of HNSCC lines and 3 keratinocyte lines were
isolated by combination of Trizol method (Life Technology) and QIAGEN
RNeasy kit (QIAGEN). Ribosomal RNA was depleted using Ribo-Zero kit
(Epicenter). Multiplexed whole transcriptome libraries were generated by
SOLiD Total RNA-Seq Kit and SOLiD RNA Barcoding Kit (Life
TechnologiesTM), and fragmented cDNA libraries were clonally amplified
by emulsion PCR.The multiplex DNA and cDNA libraries were sequenced utilizing 75
bp forward and 35 bp reverse paired-end sequencing chemistry on the ABI
SOLiD system. Color space reads were mapped into human NCBI Build 37
reference genome (Hg19) using LifeScope v.2.5 Genomic Analysis Software
(https://www.appliedbiosystems.com/lifescope) that was
developed for the 5500 Series Genetic Analysis Systems.
Copy number (CN) analysis
The exome capture area comprised 45.1 Mb in the coding regions of
over 19,000 genes (https://tools.thermofisher.com/content/sfs/manuals/TargetSeq_Exome_System_man.pdf)
identified in VEGA (http://vega.sanger.ac.uk), CCDS (https://www.ncbi.nlm.nih.gov/CCDS) and
RefSeq (http://www.ncbi.nlm.nih.gov/refseq) databases. The bam files
were analyzed with CONTRA (COpy Number Targeted Resequencing Analysis)
v2.0.4 with reference to a pooled baseline control set comprising 8 matched
blood samples. A bed file showing the primary target regions defined by
hg19-based RefSeq database was used as reference. Copy number variation in
terms of gene-level log-ratios were computed by taking the mean of the
adjusted region-level log-ratios (RLRs) of all the targeted regions within a
gene. Significant focal copy number alterations were identified from
segmented data using GISTIC (Genomic Identification of Significant Targets
in Cancer) 2.0. The confidence level was set as 90%.The copy-number profiles of TCGA samples and HNSCC cell lines were
visualized using the Integrative Genomics Viewer (IGV, http://www.broadinstitute.org/igv)
(version 2.3) and R package OmicCircos (http://bioconductor.org/packages/OmicCircos/).
Whole Exome Sequencing with Illumina HiSeq
Whole Exome DNA sequencing of independently prepared libraries from
the same samples of DNA from UM-SCC 1, UM-SCC 6, UM-SCC 11A and UM-SCC 74A
were performed using the Illumina HiSeq platform by Dr. Robert J. Morell
from the Genomics and Computational Biology Core of NIDCD. The multiplex DNA
of four cell line samples were sequenced in one lane utilizing 126 3 126
paired-end sequencing chemistry on the Illumina HiSeq system, and mapped to
human NCBI Build 37 reference genome (Hg19) with BWA-mem (http://bio-bwa.sourceforge.net/). The bam
files were analyzed with CONTRA v2.0.4 with reference to a pooled baseline
control set comprising 13 unrelated human blood samples.
Mutation analysis
Single nucleotide variants (SNVs) were called from exome sequence
data by the LifeScope platform(Tang et al.,
2008) algorithm using medium call stringency. Small insertions
and deletions were detected using the LifeScope Small InDel Tool. SNV and
indel data were annotated with gene-base annotation, population frequency,
database annotation and predicted effects at the protein level using ANNOVAR
(Wang et al., 2010). For
prioritization we excluded intronic and synonymous exonic SNVs. We then
filtered SNVs based on known single nucleotide polymorphisms (SNPs) from
dbSNP (http://www.ncbi.nlm.nih.gov/SNP) v132 and population
frequency data from the 1000 Genomes project (http://www.1000genomes.org/), excluding variants with a
population minor allele frequency (MAF) > 1%. SNVs were further
filtered, retaining variants present in the Catalogue of Somatic Mutations
in Cancer (COSMIC) v67 (http://cancer.sanger.ac.uk/cosmic), and/or predicted to have
high or medium (H, M) functional impact by MutationAssessor(Reva et al., 2011). The mutations were merged
from all cell lines and filtered against the 395 top TCGA mutated genes
identified with Mut-Sig (http://www.broadinstitute.org/cancer/cga/mutsig)(p
< 0.05).
RNA-seq expression analysis
Reads were filtered using mapping quality (threshold 10). On
average, we obtained about 168 million mapped reads per sample. Read counts
of the genes were normalized using DESeq (estimateSizeFactors) R package
(version 3.2.0) and the upper quartile normalization method. The expression
heatmap was generated using fold changes (FCs) in expression between HNSCC
cell line and the mean of three oral keratinocyte lines as normal control.
To select highly amplified and expressed genes, we combined significantly
amplified genes in both HPV(+) and ( ) HNSCCs from cell lines and TCGA,
identified by GISTIC with residual q-value of 0.05 as
cutoff in cell lines, and confidence level of 99% in TCGA. We then
identified the genes with expression significantly correlated with CN by
intersecting with CNV-gene-expression significantly correlated genes, as
described for Integrative and pathway analysis below. The genes were further
filtered by removing those with maximum FC % 2 and less than 3 samples with
FC R 1.5. For the heatmap, columns (samples) and rows (gene FCs compared to
normal) were hierarchically clustered using Canberra distance and
Ward’s minimum variance method of agglomeration and supervised to
enhance the expression comparison between HPV(−) and HPV(+)
lines.
RNA-seq variant detection
To validate SNVs identified in Exome seq, we detected variants with
RNA-seq. Raw RNA-seq data in fastq format were aligned to the NCBI human
genome build 37 (hg19) and splice junctions database using multi-threading
NovoalignCSMPI V1.04. An alignment threshold for polyclonal filter was used
(-p 7,15 0.3,15) and quality calibration enabled (-k). Output was in SAM
format (-o SAM) and 10 repeat matches for each read (-r A 10) were allowed.
Header lines (@SQ lines) and non-aligning reads were removed. The
splice-junction coordinates were converted to genomic coordinates by running
Useq software SamTranscriptomeParser (http://useq.sourceforge.net/). The resulting alignments were
sorted with Picard (http://broadinstitute.github.io/picard/) tool SortSam and
the duplicated reads were removed by SAMtools (http://samtools.sourceforge.net/) rmdup. Variants were
called with the SAMtools bcftools “view” with settings -vcg.
Variants were filtered with vcfutils “varFilter” using default
settings except retaining all variants with less than 300 reads. Where
sufficient reads enabled detection of variants by RNA-seq and Exome seq,
cross-validated SNVs are reported.
HPV detection
To identify HPV reads, color space RNA-seq data in fastq format were
aligned to an HPV reference genome with NovoalignCSMPI V1.04 using
parameters described in previous section of “RNA-seq variant
detection.” The HPV reference genome includes genome sequences of 37
HPV subtypes downloaded from NCBI. The HPV virus reads were extracted,
filtered and counted using custom Python scripts. Chimeric human and HPV
read pairs were identified by aligning the paired end reads in bam format to
a combined human and HPV16/18 reference genome, using STAR (http://code.google.com/p/rna-star/).
Comparison of CNV-gene expression association in TCGA and HNSCC cell
lines
TCGA mRNA abundance and CNV for 279 tumor specimens (243
HPV(−) and 36 HPV(+) tissue samples) were extracted from Level 3 data
associated with the data freeze (Cancer
Genome Atlas Network, 2015). For Cell lines, CN alterations were
analyzed by GISTIC, and gene expression was quantified using RNASeq read
count normalized with DESeq (estimateSizeFactors) R package and
within-sample to a fixed upper quartile of total reads. For gene level
analyses, expression values of zero were set to the overall minimum value,
and all data were log2 transformed. To compare CNV-gene
expression association in HNSCC tumor specimen and cell lines, first, for
each gene, a Spearman coefficient test (one-tailed, greater) was conducted
to check for positive correlation between its CNV and expression. The
strength of association between CNV and expression for each gene measured as
Spearman correlation coefficients
(r) were recorded. The
same tests were carried out in both TCGA and cell lines. Second, to compare
the CNV-expression association for all the genes in TCGA and cell lines,
another Spearman coefficient test was conducted between Spearman’s
rho for the common 17,040 genes in TCGA and cell lines. The concordance of
the two datasets was analyzed with correlation analysis and visualized with
a scatterplot. The concordance between TCGA tumors and cell lines in terms
of correlation between CNA and gene expression was also examined and
visualized for individual chromosome 1, 2, 3, 4, 5, 6, 10 and 11 (Figure S5).
Expression subtype analysis
Cell lines expression data was log2-transformed, genes with missing
values were removed and resulting expression were median centered by gene.
Using subtype predictor centroids, each cell line was assigned an expression
subtype using a nearest centroid predictor. Not all genes from the prior
predictor were available thus the predictor was adjusted from 838 genes to
the overlap set of 832. The 832 predictor genes were used to define
centroids for each of the four classes of HNSCC. Distance between the
centroids was empirically evaluated and visualized using 1 minus the Pearson
correlation coefficient for the centroids. Cell lines that were mapped to
both Classical and Atypical subtypes (r > 0.125)
were labeled as Classical/Atypical. Cell lines with low Pearson correlation
with all subtypes (r < 0.125) were labeled as
“No type.”
Integrative and pathway analysis
TCGA amplified and deleted genes with 99% confidence intervals in
HPV(−) and HPV(+) samples for 279 tumor specimens were extracted from
Level 3 data associated with the data freeze (Cancer Genome Atlas Network, 2015). Cell line significantly
amplified or deleted genes were identified by GISTIC with residual
q-value of 0.05 as cutoff. Merge significantly
amplified genes in HPV(−) and (+) and intersect the resulted genes in
TCGA and cell lines. The same procedure was done for significantly deleted
genes. Merge significantly amplified genes and deleted genes.CNV-gene-expression significantly correlated genes were selected
using these steps: For each gene, a Spearman coefficient test (one-tailed,
greater) was used to check for positive correlation between its CNV and log2
transformed expression. P values were adjusted for false discovery rate
(FDR) using Benjamini-Hochberg correction. An FDR value of < 0.05 was
considered statistically significant. This procedure was carried out in both
TCGA and cell lines and the intersection of significantly correlated genes
were determined.CN significantly altered genes intersected with CNV-gene-expression
significantly correlated genes in both TCGA and HNSCC cell lines. The genes
in the intersection were filtered with their expression in cell lines by
comparison to diploid. Genes with 1.5 fold difference in HPV(−) or
HPV(+) respectively were selected and analyzed with Ingenuity Pathway
Analysis (IPA) (QIAGEN Inc., https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis)
to determine affected molecular and cellular functions and the enrichment of
canonical pathways.
Survival analysis
To compare overall survival time by TP53 mutation
or 3q26.3 amplification events, subjects were categorized as four groups:
cases with wild-type (WT) TP53 and 3q26.3 CN neutral, cases
with WT TP53 and amplified 3q26.3, cases with
TP53 mutation and 3q26.3 CN neutral, cases with
TP53 mutation and 3q26.3 amplification. Five types of
non-silent mutations were included in TP53 mutation:
missense, nonsense, frameshift insertion, frameshift deletion and
splice-site mutations. The comparison of survival between PIK3CA mutation
and 3q26.3 amplification events was carried out in a similar fashion.The survival analyses were performed with R package. The overall
survival curves were obtained using Kaplan-Meier method and were compared
using the log-rank test. The Cox proportional hazards model was used to
estimate Hazard Ratios (HRs) with 95% Confidence Intervals (CIs).
Comparison of CNAs in recently derived and traditional HNSCC cell
lines
To compare CNAs in recently derived and traditional cell lines, we
analyzed GISTIC regions with significant alterations using
all_lesions.conf_90.csv file. The regions with
significant alterations in each cell line were compared and visualized with
bar chart. Cell lines were grouped by recently-derived or traditional lines
and by HPV status, and regions with significant alterations were compared by
groups and HPV status using Welch two sample t test and visualized with box
and whisker plots.
DATA AND SOFTWARE AVAILABILITY
The accession numbers for the phenotype, exome DNA-seq, and RNA-seq data
reported in this paper are SRA: PRJNA453457 and dbGaP: phs001581.
Software tools
All statistical analyses were performed in R software version 3.2.0,
and Bioconductor. The OncoPrints and lollipop mutation plots were generated
using cBioPortal Tools, OncoPrinter and MutationMapper (Gao et al., 2013). The Bioconductor package
ComplexHeatmap (Gu, 2015) was used to
draw the mutation heatmap.
Authors: Mindy I Davis; Jeremy P Hunt; Sanna Herrgard; Pietro Ciceri; Lisa M Wodicka; Gabriel Pallares; Michael Hocker; Daniel K Treiber; Patrick P Zarrinkar Journal: Nat Biotechnol Date: 2011-10-30 Impact factor: 54.908
Authors: Russell Moser; Chang Xu; Michael Kao; James Annis; Luisa Angelica Lerma; Christopher M Schaupp; Kay E Gurley; In Sock Jang; Asel Biktasova; Wendell G Yarbrough; Adam A Margolin; Carla Grandori; Christopher J Kemp; Eduardo Méndez Journal: Clin Cancer Res Date: 2014-08-15 Impact factor: 12.531
Authors: Eliane Papa Ambrosio; Cássia Gisele Terrassani Silveira; Sandra Aparecida Drigo; Vivian de Souza Sacomano; Miriam Coelho Molck; Rafael Malagoli Rocha; Maria Aparecida Custódio Domingues; Fernando Augusto Soares; Luiz Paulo Kowalski; Silvia Regina Rogatto Journal: Tumour Biol Date: 2013-06-08
Authors: Xin Huang; Tony E Godfrey; William E Gooding; Kenneth S McCarty; Susanne M Gollin Journal: Genes Chromosomes Cancer Date: 2006-11 Impact factor: 5.006
Authors: Arezoo Astanehe; David Arenillas; Wyeth W Wasserman; Peter C K Leung; Sandra E Dunn; Barry R Davies; Gordon B Mills; Nelly Auersperg Journal: J Cell Sci Date: 2008-02-12 Impact factor: 5.285
Authors: D L Van Dyke; M J Worsham; M S Benninger; C J Krause; S R Baker; G T Wolf; T Drumheller; B C Tilley; T E Carey Journal: Genes Chromosomes Cancer Date: 1994-03 Impact factor: 5.006
Authors: Roy Xiao; Yi An; Wenda Ye; Adeeb Derakhshan; Hui Cheng; Xinping Yang; Clint Allen; Zhong Chen; Nicole C Schmitt; Carter Van Waes Journal: Clin Cancer Res Date: 2019-07-02 Impact factor: 12.531
Authors: Shuxian Peng; Xun Li; Qin Liu; Yingheng Zhang; Liming Zou; Xiaoli Gong; Miaomiao Wang; Xiaodong Ma Journal: Nan Fang Yi Ke Da Xue Xue Bao Date: 2019-06-30
Authors: Ankit Srivastava; Cristina Tommasi; Dane Sessions; Angela Mah; Tomas Bencomo; Jasmine M Garcia; Tiffany Jiang; Michael Lee; Joseph Y Shen; Lek Wei Seow; Audrey Nguyen; Kimal Rajapakshe; Cristian Coarfa; Kenneth Y Tsai; Vanessa Lopez-Pajares; Carolyn S Lee Journal: Cancer Res Date: 2022-09-02 Impact factor: 13.312
Authors: Giacomo Miserocchi; Chiara Spadazzi; Sebastiano Calpona; Francesco De Rosa; Alice Usai; Alessandro De Vita; Chiara Liverani; Claudia Cocchi; Silvia Vanni; Chiara Calabrese; Massimo Bassi; Giovanni De Luca; Giuseppe Meccariello; Toni Ibrahim; Marco Schiavone; Laura Mercatali Journal: J Pers Med Date: 2022-05-24
Authors: Carter Van Waes; Ethan L Morgan; Zhengbo Hu; Ramya Viswanathan; Hui Cheng; Jianghong Chen; Xinping Yang; Angel Huynh; Paul Clavijo; Yi An; Yvette Robbins; Christopher Silvin; Clint Allen; Pinar Ormanoglu; Scott Martin; Shaleeka Cornelius; Anthony Saleh; Zhong Chen Journal: Mol Cancer Res Date: 2022-06-03 Impact factor: 6.333
Authors: Xinping Yang; Hui Cheng; Jianhong Chen; Ru Wang; Anthony Saleh; Han Si; Steven Lee; Emine Guven-Maiorov; Ozlem Keskin; Attila Gursoy; Ruth Nussinov; Jugao Fang; Carter Van Waes; Zhong Chen Journal: Cancer Immunol Res Date: 2019-10-17 Impact factor: 11.151
Authors: Nicole D Facompre; Pavithra Rajagopalan; Varun Sahu; Alexander T Pearson; Kathleen T Montone; Claire D James; Frederico O Gleber-Netto; Gregory S Weinstein; Jalal Jalaly; Alexander Lin; Anil K Rustgi; Hiroshi Nakagawa; Joseph A Califano; Curtis R Pickering; Elizabeth A White; Bradford E Windle; Iain M Morgan; Roger B Cohen; Phyllis A Gimotty; Devraj Basu Journal: Int J Cancer Date: 2020-07-06 Impact factor: 7.396
Authors: Jacqueline E Mann; Aditi Kulkarni; Andrew C Birkeland; Judy Kafelghazal; Julia Eisenberg; Brittany M Jewell; Megan L Ludwig; Matthew E Spector; Hui Jiang; Thomas E Carey; J Chad Brenner Journal: Head Neck Date: 2019-05-15 Impact factor: 3.821
Authors: Brenen W Papenberg; James Ingles; Si Gao; Jun Feng; Jessica L Allen; Steven M Markwell; Erik T Interval; Phillip A Montague; Sijin Wen; Scott A Weed Journal: Cancer Genet Date: 2021-05-28