John Lai, Leire Moya, Jiyuan An, Andrea Hoffman, Srilakshmi Srinivasan, Janaththani Panchadsaram, Carina Walpole, Joanna L Perry-Keene, Suzanne Chambers, Melanie L Lehman, Colleen C Nelson, Judith A Clements, Jyotsna Batra.
Abstract
Short tandem repeats (STRs) are repetitive sequences of a polymorphic stretch of two to six nucleotides. We hypothesized that STRs are associated with prostate cancer development and/or progression. We undertook RNA sequencing analysis of prostate tumors and adjacent non-malignant cells to identify polymorphic STRs that are readily expressed in these cells. Most of the expressed STRs in the clinical samples mapped to intronic and intergenic DNA. Our analysis indicated that three of these STRs (TAAA-ACTG2, TTTTG-TRIB1, and TG-PCA3) are polymorphic and differentially expressed in prostate tumors compared to adjacent non-malignant cells. TG-PCA3 STR expression was repressed by the anti-androgen drug enzalutamide in prostate cancer cells. Genetic analysis of prostate cancer patients and healthy controls (N > 2,000) showed a significant association of the most common 11-repeat allele of TG-PCA3 STR with prostate cancer risk (OR = 1.49; 95% CI 1.11-1.99; P = 0.008). A significant association was also observed with aggressive disease (OR = 2.00; 95% CI 1.06-3.76; P = 0.031) and high mortality rates (HR = 3.0; 95% CI 1.03-8.77; P = 0.045). We propose that TG-PCA3 STR has both diagnostic and prognostic potential for prostate cancer. We provided a proof of concept to be applied to other RNA sequencing datasets to identify disease-associated STRs for future clinical exploratory studies.
Year: 2017 PMID: 29203868 PMCID: PMC5715103 DOI: 10.1038/s41598-017-16700-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Characterisation of STRs in the human genome. (a) Histogram of the total number of repetitive units in the genome, which include 413,414 STRs (Simple_repeats, black bar) from the Repeat Masker library. (b) Histogram indicating that the genome mostly comprises di-nucleotide repeats, and that hexa-nucleotide repeats are the least frequent. (c) Scatterplot indicating that the genome comprises mostly STRs with low numbers of G and C nucleotides (% GC in repeat). (d) Pie charts indicating that, of the STRs with repeat units of 2–6 nt, 223,742 STRs (58%) have less than 5% mutations, insertions or deletions. (e) Pie charts indicating that 121,835 of the 223,742 STRs (75%) from the Repeat Masker library were detected in the Willems et al. Phase 1, 1000 genome dataset[3]. 120,806 of these STRs (99%) were predicted by the Willems et al. study to be polymorphic.
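To make the quantities in this caption concrete (repeat unit length, number of repeat units, % GC in the repeat), the following is a minimal sketch of how perfect STRs could be located in a sequence. It is not the method used in the study, which relied on the Repeat Masker library; the function names, the toy sequence, and the minimum of three repeat units are assumptions made for illustration only.

```python
import re

def find_strs(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for perfect tandem repeats in seq.

    min_repeats is an illustrative threshold, not a value from the study.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of `unit` bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "TGTG" is "TG" twice).
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return hits

def gc_percent(motif):
    """Percentage of G and C bases in a repeat motif."""
    return 100.0 * sum(base in "GC" for base in motif) / len(motif)

# Toy sequence (hypothetical): an 11-unit TG repeat and a 5-unit TAAA repeat.
toy = "AACC" + "TG" * 11 + "AACC" + "TAAA" * 5 + "GG"
hits = find_strs(toy)
for start, motif, n in hits:
    print(start, motif, n, gc_percent(motif))
```

A regular expression with a backreference only finds perfect repeats; tolerating the "less than 5% mutations, insertions or deletions" mentioned in panel (d) would need an approximate matcher, which this sketch does not attempt.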
Figure 2. STR expression in RNAseq datasets. (a) Bubble plot of STR expression for di- (2 nt), tri- (3 nt), tetra- (4 nt), penta- (5 nt) and hexa- (6 nt) nucleotide repeats. Larger bubbles indicate higher expression of the respective STR. Darker bubbles indicate that multiple STRs of that particular length and number of repeat units are expressed. (b) Pie chart detailing the percentage of expressed STRs that are located within intergenic, promoter, 5′UTR, coding (CDS), intronic, or 3′UTR DNA in LNCaP cells, the Ren et al. clinical prostate cancer RNAseq dataset[40], and our eight clinical prostate samples.
Figure 3. Scatterplot of differential STR expression between tumors and adjacent non-cancer prostate cells. (a) Black dots highlight 8 candidate STRs that are consistently differentially expressed in RNAseq datasets, and/or are expressed in a large number of RNAseq datasets from our (n = 8) and the Ren et al. (n = 14) clinical prostate samples[40]. (b) RT-qPCR analysis of the 8 candidate STRs in another cohort (n = 7) of clinical prostate samples. (c) Analysis of microarray expression data from the Taylor et al. study in non-cancer cells (N), and prostate cancers of Gleason score 6–9 (G6, G7, G8, and G9). The horizontal line represents the mean expression for each group.
Summary of eight candidate STRs.
| STR | Lociᵃ | Tumor expressionᵇ | (Anti)-androgen regulationᶜ in LNCaP cells | Alleles |
|---|---|---|---|---|
| TAAA- | chr2:74144316–74144336 | Down | Not expressed | 5, 6 |
| GAAA- | chr8:102563848–102563874 | No change | Not regulated | 4, 5 |
| TTTTG- | chr8:126450287–126450311 | Up | DHT (↓) | 3, 4, 5 |
| TTTTTG- | chr9:79395653–79395679 | Up | Not assessed | Not polymorphic |
| TG- | chr9:79400650–79400676 | Up | Enzalutamide (↓) | 9, 10, 11, 12, 13 |
| TG- | chr9:124094978–124094997 | No change | Not assessed | Not genotyped |
| CAAAA- | chr10:112044843–112044867 | No change | Enzalutamide (↓), DHT (↓) | 4, 5 |
| GAAA- | chr20:48121708–48121728 | Down | Not assessed | Not polymorphic |
ᵃ Repeat Masker coordinate (hg19). ᵇ RT-qPCR validated expression in at least four of seven tumors with over 2-fold change in expression. ᶜ ↓ indicates down-regulation by the respective (anti)-androgen. The table lists STR loci locations, their respective expression in tumors and in (anti)-androgen-treated LNCaP cells, and the predicted number of repeats.
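The "Up", "Down" and "No change" calls in the table rest on a rule of over 2-fold change in at least four of seven tumors. The sketch below shows one way such a rule could be applied to RT-qPCR data, assuming relative expression is computed as 2^(−ΔCt) against a reference gene (the Figure 5 caption mentions a 2^(−ΔCt) analysis). The function names, the choice of reference gene, and all Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target, ct_reference):
    """2^-dCt: expression of the target relative to a reference gene."""
    return 2.0 ** -(ct_target - ct_reference)

def call_tumor_expression(pairs, fold=2.0, min_samples=4):
    """Classify a locus from (tumor, non_tumor) relative-expression pairs.

    Returns "Up" or "Down" when more than `fold` change is seen in at
    least `min_samples` pairs, otherwise "No change".
    """
    up = sum(1 for tumor, normal in pairs if tumor / normal > fold)
    down = sum(1 for tumor, normal in pairs if normal / tumor > fold)
    if up >= min_samples:
        return "Up"
    if down >= min_samples:
        return "Down"
    return "No change"

# Hypothetical Ct values for seven matched samples:
# (tumor target, tumor reference, non-tumor target, non-tumor reference)
samples = [
    (24.0, 18.0, 26.0, 18.0),
    (23.5, 18.0, 26.0, 18.5),
    (25.0, 18.0, 26.5, 18.0),
    (24.0, 18.0, 25.5, 18.0),
    (25.0, 18.0, 25.5, 18.0),
    (25.0, 18.0, 25.0, 18.0),
    (25.5, 18.0, 25.0, 18.0),
]
pairs = [
    (relative_expression(tt, tr), relative_expression(nt, nr))
    for tt, tr, nt, nr in samples
]
call = call_tumor_expression(pairs)
print(call)
```

With seven samples and a threshold of four, a locus cannot satisfy both the "Up" and "Down" conditions at once, so the order of the two checks does not matter here.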
Figure 4. (Anti)-androgen regulation of STRs in LNCaP prostate cancer cells. LNCaP cells were treated with either ethanol (Mock), 10 μM anti-androgens (bicalutamide (BIC), enzalutamide (ENZ)), or 10 nM androgen (DHT) for 24 h. Data are represented with the SEM from 6 independent RNA samples. The * denotes a significant (P < 0.05) difference in expression relative to Mock-treated cells.
Socio-demographic and clinical characteristics of the QLD study populations.
| Characteristics | Men with prostate cancer | Healthy controls |
|---|---|---|
| Age in years (median, range) | 63.1 (42.6–87.1) | 61.8 (18–90) |
| BMI (Mean, SD) | 28.4 (4.7) | 27.9 (4.5) |
| | | |
| Never married | 46 (4) | 88 (8) |
| Married/de facto | 931 (85) | 952 (81) |
| Divorced/separated/widowed | 117 (11) | 133 (11) |
| Unknown | 59 (5)* | 37 (3)* |
| | | |
| No | 499 (66) | 807 (90) |
| Yes | 262 (34) | 94 (10) |
| Unknown | 392 (34)* | 309 (25)* |
| | | |
| No | 283 (66) | 709 (61) |
| Yes | 146 (34) | 447 (39) |
| Unknown | 724 (63)* | 54 (4)* |
| | | |
| Never smoked | 418 (38) | 500 (43) |
| Former smoker | 589 (54) | 591 (50) |
| Current smoker | 80 (7) | 81 (7) |
| Unknown | 66 (6)* | 38 (3)* |
| | | |
| Non-drinker | 61 (14) | 151 (13) |
| Drinker | 367 (86) | 1021 (87) |
| Unknown | 725 (63)* | 38 (3)* |
| | | |
| No formal education | 10 (1) | 16 (1) |
| Primary/Secondary school | 513 (47) | 471 (40) |
| Professional qualification | 355 (33) | 374 (32) |
| University degree | 212 (19) | 311 (27) |
| Unknown | 63 (6)* | 38 (3)* |
| | | |
| <8 | 916 (79) | Not applicable |
| ≥8 | 145 (13) | Not applicable |
| Unknown | 92 (8) | Not applicable |
ᵃ Positive family history is defined as at least one first-degree relative with prostate cancer. ᵇ Data were not collected for the retrospective study. * (%) with respect to the whole cohort. Individuals with “unknown” characteristics were not included in the analysis. ᶜ P values are from non-parametric t-tests. ᵈ Two-way ANOVA tests.
Genotype and allele associations of TG-PCA3 STR with prostate cancer risk.
| Genotype | Cases (%) | Controls (%) | OR (95% CI)ᵃ | p-valueᵃ | OR (95% CI)ᵇ | p-valueᵇ | p-valueᶜ | p-valueᵈ | p-valueᵉ | p-valueᶠ |
|---|---|---|---|---|---|---|---|---|---|---|
| 10/10 | 2 (0.2) | 3 (0.2) | — | — | — | — | — | — | — | — |
| 10/11 | 2 (0.2) | 1 (0.1) | — | — | — | — | — | — | — | — |
| 10/12 | 3 (0.2) | 0 | — | — | — | — | — | — | — | — |
| 11/9 | 1 (0.1) | 0 | — | — | — | — | — | — | — | — |
| 11/11 | 680 (59) | 634 (52) | Reference | — | — | — | — | — | — | — |
| 11/12 | 392 (34) | 461 (38) | 0.80 (0.67–0.95) | 0.01 | 0.77 (0.64–0.92) | 0.005 | 0.008 | 0.001 | <0.0001 | 0.001 |
| 11/13 | 2 (0.2) | 0 | — | — | — | — | — | — | — | — |
| 12/12 | 73 (6.3) | 113 (9) | 0.61 (0.44–0.83) | 0.002 | 0.58 (0.42–0.81) | 0.001 | 0.002 | 0.003 | 0.001 | 0.002 |
| 12/13 | 1 (0.1) | 1 (0.1) | — | — | — | — | — | — | — | — |
| Allele | | | | | | | | | | |
| 9 | 1 (0.04) | 0 | — | — | — | — | — | — | — | — |
| 10 | 9 (0.4) | 7 (0.3) | — | — | — | — | — | — | — | — |
| 11 | 1757 (76) | 1730 (71) | 1.49 (1.11–1.99) | 0.008 | 1.55 (1.14–2.1) | 0.006 | 0.015 | 0.015 | 0.012 | 0.017 |
| 12 | 542 (23) | 688 (28) | 0.74 (0.63–0.86) | <0.0001 | 0.71 (0.61–0.84) | <0.0001 | 0.002 | 0.002 | <0.0001 | 0.001 |
| 13 | 3 (0.1) | 1 (0.04) | — | — | — | — | — | — | — | — |
Calculated using ᵃ binary logistic regression, ᵇ age-corrected binary logistic regression, ᶜ bootstrap (two-tailed), ᵈ age-corrected bootstrap (two-tailed), ᵉ family history-corrected binary logistic regression, ᶠ family history-corrected bootstrap (two-tailed). The 11/11 genotype was used as the reference for genotype analysis (IBM SPSS Statistic Processor; 23). GS: Gleason score; ns: not significant; CI: confidence interval.
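As a rough check on how the genotype odds ratios relate to the raw counts, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2×2 table, using the 11/11 genotype as the reference. This is a textbook calculation, not the study's procedure: the reported values come from binary logistic regression in SPSS, so small differences are expected. The function name is an assumption.

```python
import math

def crude_odds_ratio(case_test, control_test, case_ref, control_ref, z=1.96):
    """Crude odds ratio and Wald confidence interval from a 2x2 table.

    case_test / control_test: counts carrying the genotype of interest.
    case_ref / control_ref:   counts carrying the reference genotype.
    """
    odds_ratio = (case_test * control_ref) / (case_ref * control_test)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(1 / case_test + 1 / control_test + 1 / case_ref + 1 / control_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Counts from the table: 11/11 is the reference (680 cases, 634 controls).
or_11_12 = crude_odds_ratio(392, 461, 680, 634)  # 11/12 vs 11/11
or_12_12 = crude_odds_ratio(73, 113, 680, 634)   # 12/12 vs 11/11

for label, (or_, lo, hi) in (("11/12", or_11_12), ("12/12", or_12_12)):
    print(f"{label}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude estimates come out at roughly 0.79 (0.67–0.94) for 11/12 and 0.60 (0.44–0.82) for 12/12, close to the regression-based 0.80 (0.67–0.95) and 0.61 (0.44–0.83) in the table.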
Genotype and allele associations of TG-PCA3 STR with Gleason scores.
| Genotype | GS <8 | GS ≥8 | OR (95% CI)ᵃ | p-valueᵃ | OR (95% CI)ᵇ | p-valueᵇ | p-valueᶜ | p-valueᵈ | p-valueᵉ | p-valueᶠ |
|---|---|---|---|---|---|---|---|---|---|---|
| 10/10 | 2 (0.2) | 0 | — | — | — | — | — | — | — | — |
| 10/11 | 2 (0.2) | 0 | — | — | — | — | — | — | — | — |
| 10/12 | 2 (0.2) | 1 (0.7) | — | — | — | — | — | — | — | — |
| 11/9 | 0 | 1 (0.7) | — | — | — | — | — | — | — | — |
| 11/11 | 534 (58) | 86 (59) | Reference | — | — | — | — | — | — | — |
| 11/12 | 309 (34) | 52 (36) | — | ns | — | — | — | — | — | — |
| 11/13 | 2 (0.2) | 0 | — | — | — | — | — | — | — | — |
| 12/12 | 64 (7) | 5 (3) | — | ns | — | — | — | — | — | — |
| 12/13 | 1 (0.1) | 0 | — | — | — | — | — | — | — | — |
| Allele | | | | | | | | | | |
| 9 | 0 | 1 (0.3) | — | — | — | — | — | — | — | — |
| 10 | 8 (0.4) | 1 (0.3) | — | — | — | — | — | — | — | — |
| 11 | 1381 (75) | 225 (78) | 2.00 (1.06–3.76) | 0.031 | 2.33 (1.16–4.67) | 0.01 | 0.017 | 0.01 | 0.02 | 0.007 |
| 12 | 440 (24) | 63 (22) | — | ns | — | — | — | — | — | — |
| 13 | 3 (0.2) | 0 | — | — | — | — | — | — | — | — |
Calculated using ᵃ binary logistic regression, ᵇ age-corrected binary logistic regression, ᶜ bootstrap (two-tailed), ᵈ age-corrected bootstrap (two-tailed), ᵉ family history-corrected binary logistic regression, ᶠ family history-corrected bootstrap (two-tailed). The 11/11 genotype was used as the reference for genotype analysis (IBM SPSS Statistic Processor; 23). GS: Gleason score; ns: not significant; CI: confidence interval.
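The allele rows in this table follow from the genotype rows by counting each allele once per copy, so a homozygous genotype contributes two copies per individual. The short sketch below illustrates that tally with the Gleason score counts; the function and variable names are assumptions for illustration.

```python
from collections import Counter

def allele_counts(genotype_counts):
    """Tally alleles from a mapping of 'a/b' genotype strings to counts."""
    counts = Counter()
    for genotype, n in genotype_counts.items():
        # Each individual carries both alleles; "11/11" adds 2 * n to allele 11.
        for allele in genotype.split("/"):
            counts[int(allele)] += n
    return counts

# Genotype counts copied from the table above.
gs_below_8 = {
    "10/10": 2, "10/11": 2, "10/12": 2, "11/9": 0, "11/11": 534,
    "11/12": 309, "11/13": 2, "12/12": 64, "12/13": 1,
}
gs_8_or_more = {
    "10/10": 0, "10/11": 0, "10/12": 1, "11/9": 1, "11/11": 86,
    "11/12": 52, "11/13": 0, "12/12": 5, "12/13": 0,
}

low = allele_counts(gs_below_8)
high = allele_counts(gs_8_or_more)
for allele in (9, 10, 11, 12, 13):
    print(allele, low[allele], high[allele])
```

The tallies reproduce the allele rows of the table, for example 1381 and 225 copies of the 11-repeat allele in the GS <8 and GS ≥8 groups.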
Figure 5. Patients’ mortality data for the 11 and 12 repeats TG-PCA3 genotypes. (a) Overall mortality (n = 845; *p = 0.045; *p = 0.032). (b) Prostate cancer-specific mortality (n = 802). 2^(−ΔCt) analysis from tumor (T) and adjacent non-tumor (NT) tissue. (c) Genotype expression (λ P = 0.0031; ɣ P = 0.0013) and (d) allele expression analysis (# P = 0.0496). P values calculated with: Kaplan-Meier (Log-rank (Mantel-Cox)) (a,b); and Kolmogorov-Smirnov (c,d) tests.