B P D Purkayastha1, E R Chan2, D Ravillah1, L Ravi1, R Gupta3, M I Canto4, J S Wang5, N J Shaheen6, J E Willis7, A Chak3, V Varadan1, K Guda8. 1. Division of General Medical Sciences-Oncology, Case Western Reserve University School of Medicine, Cleveland, Ohio. 2. Institute for Computational Biology, Case Western Reserve University School of Medicine, Cleveland, Ohio. 3. Division of Gastroenterology, Case Western Reserve University School of Medicine, Cleveland, Ohio. 4. Division of Gastroenterology and Hepatology, Department of Medicine, The Johns Hopkins Medical Institutions, Baltimore, Maryland. 5. Division of Gastroenterology, Department of Medicine, Washington University School of Medicine, St Louis, Missouri. 6. Center for Esophageal Diseases and Swallowing, Division of Gastroenterology and Hepatology, University of North Carolina, Chapel Hill, North Carolina. 7. Department of Pathology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio. 8. Division of General Medical Sciences-Oncology, Case Western Reserve University School of Medicine, Cleveland, Ohio; Department of Pathology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, Ohio. Electronic address: gkishore@yahoo.com.
Cancer-associated gene isoforms, arising from aberrant RNA splicing and/or processing, can play a functional role in tumor pathogenesis and are attractive as biomarkers and targets for cancer therapy. To date, the prevalence and significance of such alternative transcript isoforms in esophageal adenocarcinoma (EAC), an increasingly prevalent and lethal malignancy, remain unknown. Here, using an agnostic genome-scale approach, we sought to identify and characterize aberrant cancer-associated transcript-variants in EAC.
Whole transcriptome sequencing (RNAseq) was performed on a discovery sample set of 49 treatment-naive EAC and 40 normal/premalignant fresh-frozen biopsy tissues (Supplementary Table 1 and Supplementary Methods), followed by de novo transcriptome analysis to specifically identify novel/unannotated gene transcript-variants primarily induced in EACs but not in normal/premalignant tissues. Following stringent and orthogonal evaluation using transcript-variant specific polymerase chain reaction (PCR) in respective primary EAC tumors, we identified 7 novel candidate EAC-associated transcript-variants (Supplementary Figure 1, Supplementary Table 2). Together, the 7 candidate transcript-variants accounted for 71% of EACs tested, with each of the transcript-variants being induced in 10%–30% of EACs in the RNAseq discovery cohort.
Supplementary Table 1
Discovery and Validation Sample Cohorts

| Discovery RNAseq samples | Number of samples | Median age at diagnosis, y (range) | Gender distribution | Cancer stage distribution |
|---|---|---|---|---|
| EAC (a) | 49 | 65 (36-88) | 89% male, 11% female | Stage I (17.9%), Stage II (19.6%), Stage III (46.4%), Stage IV (16.1%) |
| Nondysplastic stable Barrett's esophagus (b) | 18 | 56 (18-84) | 94% male, 6% female | NA |
| Normal esophageal squamous (SQ) (c) | 11 | 64 (45-83) | 90% male, 10% female | NA |
| Normal gastric (GAST) | 11 | 63 (36-82) | 82% male, 18% female | NA |
| Total | 89 | | | |

(a) 11% of EACs were gastroesophageal junctional adenocarcinomas.
(b) Median surveillance of 9 years, ranging from 6 to 22 years.
(c) Each of the 11 normal SQ samples was obtained from respective EAC patients included in the RNA sequencing.
13% of EACs were gastroesophageal junctional adenocarcinomas.
Clinical follow-up information unavailable (progression status unknown) for these patients.
Supplementary Figure 1
Full-length structure of novel transcript-variants identified in EACs. Shown are the complete mRNA sequences (5′ to 3′) of the respective candidate transcript-variants discovered in EACs. For each of the 7 candidates, variant-specific sequences are highlighted in blue font. Shown below each of the sequences are positions of individual exons and coding sequence. For each of the variants and their corresponding canonical genes, exon-intron structures along with their relative sizes-distances are illustrated on the right.
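Supplementary Table 2
Candidate Novel Transcript-Variants

Transcript-variant columns:

| Transcript_variant | CHR | STRAND | LENGTH (bp) | PREDICTED CDS START-STOP (bp) (a) | PREDICTED PROTEIN LENGTH (AA) (a) | Transcript_variant-specific Forward_primer (5' to 3') | Transcript_variant-specific Reverse_primer (5' to 3') | PCR_product_size (bp) |
|---|---|---|---|---|---|---|---|---|
| COL10A1Var1 | chr6 | - | 3442 | 252..2294 | 680 | AGCAGCCAACAACAAGCATA | GTGGACCAGGAGTACCTTGC | 252 |
| SAYSD1Var1 | chr6 | - | 14523 | 4788..5138 | 116 | CATCCTCCTTCCCACTACCA | TGCCATCATTACATGCACCT | 2241 |
| CCNB1Var1 | chr5 | + | 1643 | 403..1512 | 369 | AGAGGCAGACCACGTGAGAG | GCTTAGGAGTTCTGTGGGACA | 1431 |
| RAB11FIP5Var1 | chr2 | - | 5700 | 144..3566 | 1140 | TTGGCTCTTCAGAGTCAGCA | GGTAGTACTTGGCCGACTGG | 778 |
| ADNP2Var1 | chr18 | + | 5369 | 765..3785 | 1006 | CCATCAAAACTTGCTGAGAGC | GGCCACAACAGTATGGCTTT | 288 |
| DNAJA2Var1 | chr16 | - | 1857 | 407..1645 | 412 | GGATGCCGCAGTATCGTAAT | TTGTGGGGAAGTAACCTTGG | 503 |
| GMFBVar1 | chr14 | - | 3996 | 270..395 | 41 | TCCCCAGGTGTTGGTAAAT | GGTCTTCGGTATTTCTTATTTCAA | 250 |

Canonical gene and transcript columns:

| Transcript_variant | Canonical_Gene symbol | Canonical_Gene ID | Canonical_Transcript NUCLEOTIDE ID | Canonical_Transcript LENGTH (bp) | Canonical_Transcript CDS_START-STOP (bp) | Canonical_Transcript PROTEIN ID | Canonical_Transcript PROTEIN LENGTH (AA) |
|---|---|---|---|---|---|---|---|
| COL10A1Var1 | COL10A1 | 1300 | NM_000493 | 3302 | 96..2138 | NP_000484 | 680 |
| SAYSD1Var1 | SAYSD1 | 55776 | NM_001304793 | 6425 | 4706..5056 | NP_001291722 | 116 |
| CCNB1Var1 | CCNB1 | 891 | NM_031966 | 2029 | 114..1415 | NP_114172 | 433 |
| RAB11FIP5Var1 | RAB11FIP5 | 26056 | NM_015470 | 4272 | 172..2133 | NP_056285 | 653 |
| ADNP2Var1 | ADNP2 | 22850 | NM_014913 | 5157 | 225..3620 | NP_055728 | 1131 |
| DNAJA2Var1 | DNAJA2 | 10294 | NM_005880 | 3008 | 103..1341 | NP_005871 | 412 |
| GMFBVar1 | GMFB | 2764 | NM_004124 | 4085 | 54..482 | NP_004115 | 142 |

Exon columns (one row per exon):

| Transcript_variant | CHR | Genomic START (hg19) | Genomic END (hg19) | STRAND | EXON NUMBER | EXON SIZE (bp) |
|---|---|---|---|---|---|---|
| COL10A1Var1 | chr6 | 116440086 | 116443124 | - | 3 | 3038 |
| COL10A1Var1 | chr6 | 116446502 | 116446670 | - | 2 | 168 |
| COL10A1Var1 | chr6 | 116479777 | 116480013 | - | 1 | 236 |
| SAYSD1Var1 | chr6 | 39063820 | 39073552 | - | 3 | 9732 |
| SAYSD1Var1 | chr6 | 39077090 | 39081496 | - | 2 | 4406 |
| SAYSD1Var1 | chr6 | 39082659 | 39083044 | - | 1 | 385 |
| CCNB1Var1 | chr5 | 68462688 | 68463110 | + | 1 | 422 |
| CCNB1Var1 | chr5 | 68463735 | 68463905 | + | 2 | 170 |
| CCNB1Var1 | chr5 | 68464000 | 68464170 | + | 3 | 170 |
| CCNB1Var1 | chr5 | 68467097 | 68467279 | + | 4 | 182 |
| CCNB1Var1 | chr5 | 68470078 | 68470236 | + | 5 | 158 |
| CCNB1Var1 | chr5 | 68470704 | 68470940 | + | 6 | 236 |
| CCNB1Var1 | chr5 | 68471224 | 68471529 | + | 7 | 305 |
| RAB11FIP5Var1 | chr2 | 73300510 | 73302852 | - | 5 | 2342 |
| RAB11FIP5Var1 | chr2 | 73303121 | 73303310 | - | 4 | 189 |
| RAB11FIP5Var1 | chr2 | 73306779 | 73308473 | - | 3 | 1694 |
| RAB11FIP5Var1 | chr2 | 73315178 | 73315877 | - | 2 | 699 |
| RAB11FIP5Var1 | chr2 | 73316007 | 73316783 | - | 1 | 776 |
| ADNP2Var1 | chr18 | 77889764 | 77890260 | + | 1 | 496 |
| ADNP2Var1 | chr18 | 77890986 | 77891075 | + | 2 | 89 |
| ADNP2Var1 | chr18 | 77893495 | 77898279 | + | 3 | 4784 |
| DNAJA2Var1 | chr16 | 46989335 | 46989534 | - | 10 | 199 |
| DNAJA2Var1 | chr16 | 46990919 | 46991132 | - | 9 | 213 |
| DNAJA2Var1 | chr16 | 46992915 | 46993042 | - | 8 | 127 |
| DNAJA2Var1 | chr16 | 46993187 | 46993331 | - | 7 | 144 |
| DNAJA2Var1 | chr16 | 46998523 | 46998719 | - | 6 | 196 |
| DNAJA2Var1 | chr16 | 47001425 | 47001558 | - | 5 | 133 |
| DNAJA2Var1 | chr16 | 47001996 | 47002076 | - | 4 | 80 |
| DNAJA2Var1 | chr16 | 47005261 | 47005484 | - | 3 | 223 |
| DNAJA2Var1 | chr16 | 47005808 | 47005867 | - | 2 | 59 |
| DNAJA2Var1 | chr16 | 47007406 | 47007889 | - | 1 | 483 |
| GMFBVar1 | chr14 | 54941202 | 54944877 | - | 3 | 3675 |
| GMFBVar1 | chr14 | 54946504 | 54946577 | - | 2 | 73 |
| GMFBVar1 | chr14 | 54947592 | 54947840 | - | 1 | 248 |

(a) Putative candidate transcript-variant coding regions were predicted using NCBI ORFfinder. Listed are only those predicted ORFs for transcript-variants that are in the same reading frame as respective canonical transcripts.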
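The entries of Supplementary Table 2 can be cross-checked against one another with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it assumes that the listed CDS coordinates are 1-based, inclusive, and include the stop codon (an assumption that happens to be consistent with every row of the table), and the names `VARIANTS`, `exon_sum`, and `protein_length_from_cds` are hypothetical helpers introduced only for illustration.

```python
# Values transcribed from Supplementary Table 2.
# exons: exon sizes in bp; length: transcript length in bp;
# cds: predicted CDS start..stop in bp; aa: predicted protein length.
VARIANTS = {
    "COL10A1Var1": {"exons": [236, 168, 3038], "length": 3442,
                    "cds": (252, 2294), "aa": 680},
    "SAYSD1Var1": {"exons": [385, 4406, 9732], "length": 14523,
                   "cds": (4788, 5138), "aa": 116},
    "CCNB1Var1": {"exons": [422, 170, 170, 182, 158, 236, 305], "length": 1643,
                  "cds": (403, 1512), "aa": 369},
    "RAB11FIP5Var1": {"exons": [776, 699, 1694, 189, 2342], "length": 5700,
                      "cds": (144, 3566), "aa": 1140},
    "ADNP2Var1": {"exons": [496, 89, 4784], "length": 5369,
                  "cds": (765, 3785), "aa": 1006},
    "DNAJA2Var1": {"exons": [483, 59, 223, 80, 133, 196, 144, 127, 213, 199],
                   "length": 1857, "cds": (407, 1645), "aa": 412},
    "GMFBVar1": {"exons": [248, 73, 3675], "length": 3996,
                 "cds": (270, 395), "aa": 41},
}


def exon_sum(record):
    """Total transcript length implied by the listed exon sizes."""
    return sum(record["exons"])


def protein_length_from_cds(cds):
    """Amino acids encoded by a CDS given as (start, stop).

    Assumption: coordinates are 1-based, inclusive, and the stop codon
    is part of the interval, so one codon is subtracted.
    """
    start, stop = cds
    n_nt = stop - start + 1
    if n_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return n_nt // 3 - 1


if __name__ == "__main__":
    for name, rec in VARIANTS.items():
        ok_len = exon_sum(rec) == rec["length"]
        ok_aa = protein_length_from_cds(rec["cds"]) == rec["aa"]
        print(f"{name}: exon sum matches length={ok_len}, CDS matches protein={ok_aa}")
```

Under these assumptions, a mismatch in either check would point to a transcription error in the table rather than to a biological finding.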
We subsequently prioritized a novel transcript-variant of the collagen X alpha 1 chain precursor (COL10A1) gene for further studies, on the basis of the recognized pro-tumorigenic role of the COL10A1 pathway network in other tumor contexts.3, 4, 5, 6, 7, 8 Using bidirectional rapid amplification of cDNA ends (RACE) analysis, we first characterized the full-length transcript structure of this novel COL10A1 variant, hereafter referred to as COL10A1Var1 (deposited in GenBank: MN308081). COL10A1Var1 is a 3-exon transcript (3444 base pairs [bp]) containing a longer and distinct 5′ exon compared with the canonical (NM_000493.4) transcript (Figure 1A, Supplementary Figure 1). In silico analyses (NCBI ORFfinder) predicted COL10A1Var1 to encode a ∼66 kDa (680 aa) protein, identical in size to the secreted canonical COL10A1 protein, which we confirmed by using orthogonal immunoprecipitation and Western blot analyses upon transfecting HEK293T cells with the full-length COL10A1Var1 transcript (3444 bp) or the coding sequence of the canonical COL10A1 transcript (Figure 1B).
Figure 1
Characterization of COL10A1Var1. (A) Shown are the 5′ to 3′ exon (Ex)-intron (thin line) structures of COL10A1Var1 and canonical COL10A1. UTR, untranslated region. (B) Western blot analyses depicting COL10A1Var1 and COL10A1 proteins. IB, immunoblotting; IP, immunoprecipitation. CEMIP1 was used as positive control for secreted protein and Empty vector as a negative control. (C) Pie charts demonstrating the proportion (%) of samples positive for COL10A1Var1 transcript (top, red color) or canonical COL10A1 (bottom, blue color) in respective SQ, GAST, BM, HGD, and malignant (EAC) tissue biopsies. ∗∗∗P < .0001 indicates significant difference in the proportion of COL10A1Var1 positivity between malignant (EAC) vs any of the respective non-EAC tissue groups, estimated by using a one-tailed Fisher exact test.
Using a robust quantitative real-time PCR (qPCR) assay that specifically detects COL10A1Var1 but not the canonical transcript, we next evaluated the generality and frequency of COL10A1Var1 expression in a validation cohort (N = 832) consisting of treatment-naive EAC (N = 170), Barrett’s metaplasia (BM) (N = 123), Barrett’s with high grade dysplasia (HGD) (N = 60), normal esophageal squamous (SQ) (N = 465), and normal gastric (GAST) (N = 14) biopsy tissues (Supplementary Table 1). Our orthogonal analysis demonstrated COL10A1Var1 to be robustly induced in the majority (∼60%) of EACs (Figure 1C, Supplementary Table 3). In striking contrast to EAC, only a minority of BM, HGD, SQ, and GAST samples tested positive for COL10A1Var1 (Fisher exact test, P < .0001; Figure 1C, Supplementary Table 3). We also note that COL10A1Var1 is a more frequently detected isoform in EACs than the canonical COL10A1 transcript, which was detected in approximately one-fourth of EAC samples with no marked differences between EAC and normal/premalignant tissues (Figure 1C, Supplementary Table 3). Taken together, these findings strongly point to COL10A1Var1 as a recurrently induced transcript-variant in advanced stages of EAC development.
Supplementary Table 3
Expression Status of COL10A1Var1 and Canonical COL10A1 Across Lesions

| EAC (N = 219) (a) | Canonical COL10A1-positive | Canonical COL10A1-negative |
|---|---|---|
| COL10A1Var1-positive | 53 (24.2%) | 79 (36.07%) |
| COL10A1Var1-negative | 1 (0.46%) | 86 (39.27%) |

NDBE, nondysplastic Barrett’s esophagus.
(a) Number of samples combined from both Discovery and Validation cohorts.
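The "∼60%" and "approximately one-fourth" figures quoted in the text follow from the marginal totals of Supplementary Table 3. The sketch below recomputes them and, purely for illustration, includes a standard-library implementation of a one-tailed Fisher exact test of the kind named in the Figure 1C legend. The function name `fisher_one_tailed` and the variable names are hypothetical; the per-group counts for the non-EAC tissues are not listed in this section, so no P value from the study is reproduced here.

```python
from math import comb

# Cell counts from Supplementary Table 3 (EAC, N = 219).
BOTH_POSITIVE = 53      # COL10A1Var1-positive, canonical-positive
VAR1_ONLY = 79          # COL10A1Var1-positive, canonical-negative
CANONICAL_ONLY = 1      # COL10A1Var1-negative, canonical-positive
BOTH_NEGATIVE = 86      # COL10A1Var1-negative, canonical-negative

TOTAL = BOTH_POSITIVE + VAR1_ONLY + CANONICAL_ONLY + BOTH_NEGATIVE
VAR1_FRACTION = (BOTH_POSITIVE + VAR1_ONLY) / TOTAL            # about 0.60
CANONICAL_FRACTION = (BOTH_POSITIVE + CANONICAL_ONLY) / TOTAL  # about 0.25


def fisher_one_tailed(a, b, c, d):
    """One-tailed (greater) Fisher exact test for the 2x2 table
    [[a, b], [c, d]]: probability of observing a or more positives in
    row 1 under the hypergeometric null with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, upper + 1)) / denom


if __name__ == "__main__":
    print(f"N = {TOTAL}")
    print(f"COL10A1Var1-positive: {VAR1_FRACTION:.1%}")
    print(f"Canonical COL10A1-positive: {CANONICAL_FRACTION:.1%}")
```

Given positive and negative counts for EAC (row 1) and any one comparison group (row 2), `fisher_one_tailed` would return the probability of seeing at least as many EAC positives by chance.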
Because fibrillar protein networks (collagen, elastin) and glycoproteins (fibronectin) play a vital role in facilitating migration and invasion of cancer cells, we next evaluated the impact of COL10A1Var1 knockdown on the migratory potential of EAC cells in a durotaxis assay. We note that the EAC cell lines positive for COL10A1Var1 also expressed the canonical COL10A1 transcript (Figure 2A), and repeated attempts to specifically knock down COL10A1Var1 with custom short hairpin RNAs (shRNAs) proved technically unsuccessful. Nonetheless, because both COL10A1Var1 and canonical COL10A1 transcripts code for an identical protein (Figure 1B) and consequently may exhibit similar function, as an alternative approach we used well-characterized COL10A1 shRNAs that also target COL10A1Var1 for subsequent studies. OE19 EAC cells (Figure 2A), stably expressing control or COL10A1 shRNAs under the control of doxycycline (Figure 2B), were seeded onto one-half of a glass coverslip coated with fibronectin alone (representing soft surface). Migration (durotaxis) of cells from the soft surface to an adjacent fibronectin-coated hydrogel (stiffer, 12 kPa) surface was monitored over time in the presence of doxycycline. Loss of COL10A1/COL10A1Var1 indeed significantly impeded the durotactic ability of EAC cells (P < .004) (Figure 2C), suggesting COL10A1 isoforms as potential regulators of the mechanosensing ability of EAC cells.
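The durotaxis comparison is described in the Figure 2 legend as a Student t test assuming unequal variances (commonly called Welch's t test). As a hedged illustration of how that statistic is formed, the sketch below computes the Welch t value and the Welch-Satterthwaite degrees of freedom using only the standard library. The fluorescence values are hypothetical placeholders, not data from the study, and `welch_t` is a name introduced here for illustration.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for a two-sample t test with unequal variances.

    t  = (mean_x - mean_y) / sqrt(s_x^2/n_x + s_y^2/n_y)
    df = Welch-Satterthwaite approximation.
    """
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of group x
    vy = variance(y) / ny  # squared standard error of group y
    t = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


if __name__ == "__main__":
    # HYPOTHETICAL total fluorescence units from 3 replicate experiments
    # per group; placeholders only, not values reported in the paper.
    control_tfu = [100.0, 110.0, 120.0]
    knockdown_tfu = [40.0, 50.0, 60.0]
    t, df = welch_t(control_tfu, knockdown_tfu)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Converting t and df to a P value requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step.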
Figure 2
Impact of COL10A1Var1 on durotaxis of EAC cells. (A) PCR-based analysis showing COL10A1Var1 and canonical COL10A1 expression in normal esophageal squamous (Epc2), non-dysplastic BE (CP-A), dysplastic BE (CP-B, CP-C, CP-D), and EAC (OE19, OE33, FLO-1, EsoAd1, SKGT4) cell lines. B2M was used as the internal RNA control. BE, Barrett’s esophagus. (B) Representative images (left) demonstrating shRNA induction on doxycycline (Dox) treatment in stable OE19 cells, carrying either non-targeting control shRNA or shRNAs targeting both COL10A1Var1 and canonical COL10A1 transcripts (depicted as COL10A1/COL10A1Var1). Note the specific induction of TurboRFP, a red fluorescent reporter of shRNA induction, on doxycycline treatment in these cells. PCR analysis (right) demonstrating knockdown of COL10A1/COL10A1Var1 RNA on doxycycline treatment of the stable OE19 cells. B2M was used as an internal RNA control. (C) Representative images of durotaxis assay in stable OE19 cells. Quantitative analysis of cell migration (bar graph), measured as total fluorescence units (TFU, Y-axis) of TurboRFP-positive cells in the stiffer surface. All data are plotted as mean ± standard error of the mean, obtained from 3 replicate experiments. ∗∗P < .004 indicates significant differences in COL10A1/COL10A1Var1 knockdown vs control shRNA cells, estimated by using a Student t test assuming unequal variances.
Taken in toto, we identify COL10A1Var1 as a novel and recurrent EAC-associated transcript-variant with a potential pro-tumorigenic function. On a broader scale, our study represents the first genome-wide analysis identifying novel transcript-variants induced in EAC. Further comprehensive studies are warranted to decipher the biologic role of the identified candidates and to evaluate their utility as biomarkers and therapeutic targets in this increasingly prevalent and lethal malignancy.
Authors: Andrew E Blum; Srividya Venkitachalam; Durgadevi Ravillah; Aruna K Chelluboyina; Ann Marie Kieber-Emmons; Lakshmeswari Ravi; Adam Kresak; Apoorva K Chandar; Sanford D Markowitz; Marcia I Canto; Jean S Wang; Nicholas J Shaheen; Yan Guo; Yu Shyr; Joseph E Willis; Amitabh Chak; Vinay Varadan; Kishore Guda Journal: Gastroenterology Date: 2019-02-12 Impact factor: 22.682
Authors: Christopher D Hartman; Brett C Isenberg; Samantha G Chua; Joyce Y Wong Journal: Proc Natl Acad Sci U S A Date: 2016-09-19 Impact factor: 11.205
Authors: Andrew E Blum; Srividya Venkitachalam; Yan Guo; Ann Marie Kieber-Emmons; Lakshmeswari Ravi; Apoorva K Chandar; Prasad G Iyer; Marcia I Canto; Jean S Wang; Nicholas J Shaheen; Jill S Barnholtz-Sloan; Sanford D Markowitz; Joseph E Willis; Yu Shyr; Amitabh Chak; Vinay Varadan; Kishore Guda Journal: Cancer Res Date: 2016-08-08 Impact factor: 12.701
Authors: Stephen P Fink; Lois L Myeroff; Revital Kariv; Petra Platzer; Baozhong Xin; Debra Mikkola; Earl Lawrence; Nathan Morris; Arman Nosrati; James K V Willson; Joseph Willis; Martina Veigl; Jill S Barnholtz-Sloan; Zhenghe Wang; Sanford D Markowitz Journal: Oncotarget Date: 2015-10-13
Authors: Arseniy E Yuzhalin; Tomas Urbonas; Michael A Silva; Ruth J Muschel; Alex N Gordon-Weeks Journal: Br J Cancer Date: 2018-01-23 Impact factor: 7.640
Authors: Jessica H Wen; Ludovic G Vincent; Alexander Fuhrmann; Yu Suk Choi; Kolin C Hribar; Hermes Taylor-Weiner; Shaochen Chen; Adam J Engler Journal: Nat Mater Date: 2014-08-10 Impact factor: 43.841
Authors: Alexandra Naba; Karl R Clauser; Charles A Whittaker; Steven A Carr; Kenneth K Tanabe; Richard O Hynes Journal: BMC Cancer Date: 2014-07-18 Impact factor: 4.430
Authors: Alexander S Brodsky; Jinjun Xiong; Dongfang Yang; Christoph Schorl; Mary Anne Fenton; Theresa A Graves; William M Sikov; Murray B Resnick; Yihong Wang Journal: BMC Cancer Date: 2016-04-18 Impact factor: 4.430