Literature DB >> 31562322

12 new susceptibility loci for prostate cancer identified by genome-wide association study in Japanese population.

Ryo Takata^1,2, Atsushi Takahashi^3,4, Masashi Fujita⁵, Yukihide Momozawa⁶, Edward J Saunders⁷, Hiroki Yamada⁸, Kazuhiro Maejima⁵, Kaoru Nakano⁵, Yuichiro Nishida⁹, Asahi Hishida¹⁰, Keitaro Matsuo^11,12, Kenji Wakai¹⁰, Taiki Yamaji¹³, Norie Sawada¹³, Motoki Iwasaki¹³, Shoichiro Tsugane¹⁴, Makoto Sasaki¹⁵, Atsushi Shimizu¹⁵, Kozo Tanno¹⁵, Naoko Minegishi¹⁶, Kichiya Suzuki¹⁶, Koichi Matsuda¹⁷, Michiaki Kubo¹⁸, Johji Inazawa¹⁹, Shin Egawa⁸, Christopher A Haiman²⁰, Osamu Ogawa²¹, Wataru Obara²², Yoichiro Kamatani³, Shusuke Akamatsu^23,24, Hidewaki Nakagawa²⁵.

Abstract

Genome-wide association studies (GWAS) have identified ~170 genetic loci associated with prostate cancer (PCa) risk, but most of them were identified in European populations. We here performed a GWAS and replication study using a large Japanese cohort (9,906 cases and 83,943 male controls) to identify novel susceptibility loci associated with PCa risk. We found 12 novel loci for PCa including rs1125927 (TMEM17, P = 3.95 × 10-16), rs73862213 (GATA2, P = 5.87 × 10-23), rs77911174 (ZMIZ1, P = 5.28 × 10-20), and rs138708 (SUN2, P = 1.13 × 10-15), seven of which had crucially low minor allele frequency in European population. Furthermore, we stratified the polygenic risk for Japanese PCa patients by using 82 SNPs, which were significantly associated with Japanese PCa risk in our study, and found that early onset cases and cases with family history of PCa were enriched in the genetically high-risk population. Our study provides important insight into genetic mechanisms of PCa and facilitates PCa risk stratification in Japanese population.

Entities: Chemical Disease Gene Mutation Species

Mesh：

Substances：

Year: 2019 PMID： 31562322 PMCID： PMC6764957 DOI： 10.1038/s41467-019-12267-6

Source DB: PubMed Journal: Nat Commun ISSN： 2041-1723 Impact factor: 14.919

Introduction

The incidence of prostate cancer (PCa) among Asian males has been dramatically rising even though it is still lower compared to that in western countries[1]. In 2015, PCa became the most common cancer type among Japanese males[2]. The best-known risk factor for PCa is family history[3]. A patient with a family history of PCa in first-degree relative has twice the risk of developing PCa during his lifetime compared to a patient without a family history, indicating a strong influence of inherited genetics on PCa susceptibility[3]. Early onset PCa is also recognized as a marker of genetic susceptibility for hereditary PCa, and PCa cases with rare variants of BRCA2 and HOXB13 constitute 2.0 and 3.1% of early onset cases[4,5]. To identify genetic polymorphisms associated with PCa, a number of genome-wide association studies (GWASs) have been conducted, which identified ~170 risk loci associated with PCa[6-10]. However, most of these studies were carried out for population of European ancestry or cohorts of mixed ethnicity. The considerable diversity of genetic background between different populations as shown by 1000 Genomes Project, together with the large difference in the incidence of PCa among ethnic groups, suggests a role of genetic risk factors in PCa disparities[11,12]. Even though a recent study suggests that the number of genetic loci associated with PCa has almost saturated after studying more than a hundred thousand patients[10], it is possible that even more PCa associated risk loci will be identified by studying populations of non-European ancestry. We have initially identified five PCa associated risk loci by GWAS in the Japanese population[13], with a further meta-analysis in the Japanese population revealing three additional loci for PCa susceptibility[14]. We have also developed a highly reproducible polygenic risk estimation model for PCa detection, confirming polygenic risk of PCa in the Japanese population[15]. However, it is possible that increasing sample size will further improve power and may lead to the identification of yet-identified risk loci in this population. Moreover, it remains to be known which of the ~140 risk loci associated with PCa in the European ancestry population are also reproducible in the Asian population. In this study, we have performed an even larger GWAS using an independent Japanese cohort to identify novel susceptibility loci associated with PCa and also validate the association of the previously reported risk loci with PCa risk in the Japanese population. Furthermore, we stratified the polygenic risk for Japanese PCa patients by using 82 significant single nucleotide polymorphisms (SNPs) and examined clinical feature of those at genetically high risk.

Results

GWAS in the Japanese population

The overall design of the GWAS is depicted in Fig. 1. 5088 cases of PCa from Biobank Japan (BBJ) were included[16,17]. The controls consisted of 10,682 Japanese male subjects from four large cohort studies (the Japan Multi-Institutional Collaborative Cohort (J-MICC) study, the Japan Public Health Center-based Prospective (JPHC) study, Iwate Medical Megabank Organization (IMM), and Tohoku Medical Megabank Organization (ToMMo)) who had never been diagnosed with PCa[18-20]. Detailed clinical characteristics of the subjects are shown in Supplementary Table 1. After quality control, the association study was performed for 523,051 SNPs. All case and control samples belonged to the same cluster in the principal component analysis (Supplementary Fig. 1). A quantile–quantile (Q-Q) plot revealed modest inflation of the test statistics (λGC = 1.189); however, the value adjusted by the sample size, λGC　1000, was 1.027 (Supplementary Fig. 2). We further conducted imputation analysis using the data of 275 Asians in the 1000 Genomes Project Phase 1 as a reference (JPT: 89, CHB: 97, CHS: 89). As a result, 2997 SNPs represented P < 1 × 10−5 in GWAS (Fig. 1). Many of these SNPs existed in independent genetic regions including multiple loci at 8q24 (Fig. 2).

Fig. 1

Scheme for study design and screening results. First, we conducted discovery GWAS followed by imputation using 1000 Genomes Project Phase1 as a reference. Of the 2997 SNPs that were associated with PCa (P < 1 × 10−5), top 101 SNPs excluding those residing in previously reported loci were evaluated in the replication study using an independent cohort. A total of 12 SNPs were significantly associated with PCa (P < 5 × 10−8) after meta-analysis of discovery and replication cohorts

Fig. 2

Manhattan plot of GWAS. −log10 P value is plotted on the Y-axis. Each P value is calculated by a 1-degree-freedom Cochran-Armitage test

Replication study and novel PCa-susceptibility loci

Next, we conducted an independent replication study using 4818 cases from BBJ and JIKEI cohorts, and 73,261 male controls from the BBJ cohort. Of the 2997 SNPs, we removed SNPs that showed low imputation quality (R square < 0.3) and SNPs that have previously been reported to be associated with PCa. Then we selected representative SNPs for each genetic region that showed the strongest association with PCa in the same linkage disequilibrium (LD) block. We also selected 20 SNPs which showed exceptionally strong association with PCa compared to the other SNPs in the same LD block. In total, we selected 101 SNPs for the replication study (Supplementary Table 2). In the replication study, genotyping was conducted using multi-index sequencing and the multiplex invader method (see Methods). Among the candidate SNPs, five SNPs could not be genotyped by either method and were excluded. When we combined both stages using the inverse method, 12 SNPs which have not been reported previously were identified to be significantly associated with PCa at P < 5 × 10−8 (Table 1).

Table 1

12 novel PCa-susceptibility loci identified by GWAS and replication study in Japanese population

	SNP ID	Chr	Position	EffectAllele	NonEffectAllele		CaseFrequency	Control Frequency	RSQR^a	OR^b	95% CI	P-value	Nearby genes
1	rs7542260	1	5743196	C	T	GWAS	0.892	0.910	0.980	0.832	(0.750–0.914)	6.64 × 10⁻⁰⁶	–
						Replication	0.887	0.902	–	0.850	(0.784–0.916)	1.14 × 10⁻⁰⁶
						Meta	–	–	–	0.843	–	3.99 × 10⁻¹¹
2	rs11125927	2	62752975	A	G	GWAS	0.738	0.774	0.980	0.837	(0.779–0.895)	4.74 × 10⁻¹⁰	TMEM17
						Replication	0.748	0.772	–	0.876	(0.828–0.924)	4.22 × 10⁻⁰⁸
						Meta	–	–	–	0.860	–	3.95 × 10⁻¹⁶
3	rs73862213	3	128217499	A	G	GWAS	0.908	0.929	0.996	0.754	(0.666–0.842)	1.71 × 10⁻¹⁰	GATA2
						Replication	0.902	0.923	–	0.766	(0.694–0.838)	6.49 × 10⁻¹⁴
						Meta	–	–	–	0.761	–	5.87 × 10⁻²³
4	rs75777376	8	127796183	T	C	GWAS	0.896	0.908	0.528	0.779	(0.669-0.889)	6.08 × 10⁻⁰⁶	–
						Replication	0.900	0.923	–	0.744	(0.672–0.816)	3.20 × 10⁻¹⁶
						Meta	–	–	–	0.754	–	1.20 × 10⁻²⁰
5	rs16901814	8	127881952	G	A	GWAS	0.854	0.833	0.993	1.176	(1.108–1.244)	2.00 × 10⁻⁰⁶	–
						Replication	0.857	0.838	–	1.161	(1.101–1.221)	6.35 × 10⁻⁰⁷
						Meta	–	–	–	1.167	-	5.64 × 10⁻¹²
6	rs4554825	10	80244623	C	T	GWAS	0.790	0.764	1.000	1.170	(1.110-1.230)	1.06 × 10⁻⁰⁷	ZMIZ1-AS
						Replication	0.786	0.766	–	1.122	(1.070–1.174)	6.29 × 10⁻⁰⁶
						Meta	–	–	–	1.142	–	8.43 × 10⁻¹²
7	rs77911174	10	80826833	A	G	GWAS	0.604	0.643	0.957	0.853	(0.801–0.905)	6.14 × 10⁻¹⁰	ZMIZ1
						Replication	0.599	0.634		0.864	(0.820–0.908)	8.68 × 10⁻¹²
						Meta	–	–	–	0.859	–	5.28 × 10⁻²⁰
8	rs11055034	12	12890626	C	A	GWAS	0.601	0.577	0.999	1.123	(1.073-1.173)	3.11 × 10⁻⁰⁶	CDKN1B, APOLD1
						Replication	0.606	0.580	–	1.115	(1.073–1.157)	3.74 × 10⁻⁰⁷
						Meta	–	–	–	1.119	–	6.05 × 10⁻¹²
9	rs8023793	15	66942093	A	C	GWAS	0.595	0.559	0.997	1.137	(1.087-1.187)	2.56 × 10⁻⁰⁷	–
						Replication	0.592	0.569	–	1.098	(1.056–1.140)	1.27 × 10⁻⁰⁵
						Meta	–	–	–	1.114	–	2.81 × 10⁻¹¹
10	rs6117562	20	753310	G	A	GWAS	0.485	0.516	0.989	0.874	(0.824–0.924)	3.64 × 10⁻⁰⁸	SLC52A3, FAM110A
						Replication	0.498	0.512	–	0.945	(0.903-0.987)	7.12 × 10⁻⁰³
						Meta	–	–	–	0.915	–	3.34 × 10⁻⁰⁸
11	rs138708	22	39138332	G	A	GWAS	0.855	0.831	0.983	1.172	(1.104-1.240)	3.74 × 10⁻⁰⁶	SUN2
						Replication	0.859	0.834	–	1.218	(1.158-1.278)	5.12 × 10⁻¹¹
						Meta	–	–	–	1.198	–	1.13 × 10⁻¹⁵
12	rs4826594	X	54454406	G	A	GWAS	0.482	0.525	0.967	0.924	(0.888-0.960)	8.21 × 10⁻⁰⁶	FGD1, GNL3L, TSR2
						Replication	0.479	0.508	–	0.943	(0.913-0.973)	1.08 × 10⁻⁰⁴
						Meta	–	–	–	0.935	–	7.13 × 10⁻⁰⁹

aRSQR, imputation accuracy. SNPs were imputed in the GWAS

bNon-effect alleles were considered as reference

12 novel PCa-susceptibility loci identified by GWAS and replication study in Japanese population aRSQR, imputation accuracy. SNPs were imputed in the GWAS bNon-effect alleles were considered as reference Locus explore plot of each locus is shown in Fig. 3 and Supplementary Fig. 3[21]. All novel SNPs identified in this study except rs138708 existed in non-exonic regions. Of the 12 new loci, five loci (rs7542260 at chr.1, rs75777376 at chr.8, rs16901814 at chr.8, rs4554825 at chr.10, and rs8023793 at chr.15) contained no protein-coding genes in their LD blocks or in their vicinities (within 100 kb). On the other hand, rs11125927 at chr.2 was located at the intron of the TMEM17 gene. rs73862213 at chr.3 was located near GATA2. rs77911174 at chr.10 region included ZMIZ1 gene. rs11055034 at chr.12 was in the intron of APOLD1 and the region also contained the 3’ end of CDKN1B. rs6117562 at chr.20 was in the region containing SLC52A3. rs138708 at chr.22, formed a very large LD block with many genes, however, rs138708 is a non-synonymous SNP in the exon of SUN2 gene. The region spanning rs4826594 at chr.X included three genes, FGD1, GNL3L, and TSR2.

Fig. 3

Locus Explorer plots of novel five GWAS loci. a rs11125927 at chr.2. b rs73862213 at chr.3. c rs4554825 & rs77911174 at chr.10. d rs138708 at chr.22. The regional association plot (−log10(P) panel) depicts variant P-values relative to chromosomal position. Variants in linkage disequilibrium with the novel lead SNP(s) at r2 ≥ 0.1 according to the 1000 Genomes JPT population are shaded in the Manhattan plot and linkage disequilibrium track (LD panel), with darker color denoting stronger correlation with the lead variant. Lower sections of the plot indicate the relative positions of genes and selected biological annotations. Annotations displayed are: histone modifications in ENCODE tier 1 cell lines (Histone track), the positions of variants that are eQTLs with prostate tumor expression in TCGA prostate adenocarcinoma samples (eQTL track), chromatin state categorizations in the PrEC cell-line by ChromHMM (ChromHMM track), the position of conserved element peaks (Conserved track) and the position of DNaseI hypersensitivity site peaks in ENCODE prostate cell lines (DNaseI track). Genes on the positive and negative strand are denoted by brown and turquoise color respectively (Gene track). The horizontal axis represents genomic coordinates in the hg19 reference genome

Expression analysis of the new loci

It is possible that these SNPs reside in enhancer or suppressor regions of nearby genes and influence prostate carcinogenesis by altering the expression of these genes. In order to check the association between the genotype of newly identified SNPs and expression of the genes in their 1 Mb proximity, we conducted eQTL analysis using the GTEX database[22]. As a result, four SNPs showed weak association with expression of nearby genes (P < 0.05, Supplementary Table 3). Among them, rs16901814 showed association with expression of the FAM84B gene which is 311 kb away from the index SNP (P = 0.0204) (Supplementary Fig. 3b). Even though the function of FAM84B is not well known, its expression is elevated in various cancer, and its overexpression and copy number gain in PCa is reported to be associated with poor prognosis[23]. rs4554825 was associated with expression of ZMIZ1-AS (P = 0.0361). As previously mentioned, ZMIZ1 is a gene that was present in the LD block containing rs77911174 (Fig. 3c). It is possible that both rs4554825 and rs77911174 are independently associated with PCa by altering expression or function of ZMIZ1. rs6117562 was associated with expression of FAM110A (P = 0.0104). FAM110A is a cell-cycle regulated gene whose function is not well known. Families of FAM110 proteins localize to centrosomes and are associated with microtubule aberrations[24], and FAM110A may affect prostate carcinogenesis through dysregulation of cell-cycle progression. rs6117562 was also associated with CSNK2A1 expression (P = 0.0276), which is known as responsible gene for Okur-Chung neurodevelopmental syndrome[25].

Association with the reported loci

Recent PCa GWAS have analyzed more than one-hundred thousand patients and identified many genetic regions associated with PCa[6-10]. However, these previous studies as well as meta-analysis have failed to identify the 12 susceptibility loci identified in this study (Supplementary Fig. 4). Among the novel loci identified, minor allele frequency (MAF) in Caucasians were significantly lower compared to that in the Japanese in seven loci, rs7542260, rs73862213, rs75777376, rs4554825, rs77911174, rs138708, and rs4826594 (Supplementary Table 4). The difference in MAF likely accounts for the lack of identification of these loci in previous studies. On the other hand, the MAF in the European population did not significantly differ from that in the Japanese for the other SNPs. In these cases, LD structure pattern spanning the marker SNPs might be different by population. Since several of the SNPs newly identified in this study existed in regions close to previously reported PCa susceptibility loci, we checked the independency of association by conditional analysis with GWAS data (Supplementary Table 5). rs721048, a known PCa susceptibility loci, is located near a newly identified SNP, rs11125927. However, rs11125927 was independently associated with PCa (P = 1.02 × 10−9) in the conditional analysis. We also checked for the independency between rs73862213 and EEFSEC region (rs10934853), and between rs4826594 and NUT10/11 region (rs5945619); however, both rs73862213 and rs4826594 were statistically independent of the previously reported loci. Both rs75777376 and rs16901814 in 8q24 region, which exist near rs12543663, were also independently associated with PCa by conditional analysis. In addition, the two SNPs newly identified on chromosome 10 were independent of each other in the conditional analysis. In the study, rs114780236 at chr.2 and rs4842687 at chr.12 showed significant association with Japanese PCa (Supplementary Table 6). But these SNPs were located in regions close to reported loci, rs58235267 and rs5799921[26]. Since our GWAS data did not contain the genotypes of rs58235267 and rs5799921, we could not conduct conditional analysis. Although these loci were not in complete LD with the reported loci in Japanese (The R square for rs114780236 and rs58235267 was 0.4701 and the R square for rs4842687 and rs579921 was 0.6962, respectively), the relatively high correlation suggests that rs114780236 and rs4842687 may be in the same susceptibility region with these reported loci in Japanese. We examined whether the 167 previously reported PCa related loci are associated with PCa susceptibility in Japanese using our GWAS data. We excluded 13 SNPs for which data were not available and 18 SNPs which showed MAF < 0.01 or were mono-allelic, leaving 136 SNPs for analysis. We found 68 SNPs showed weak association (P < 0.05), and 28 SNPs to be strongly associated with PCa in Japanese after Bonferroni correction (P < 0.00037) (Supplementary Data 1). We then checked whether the rate of validation differs by the ethnic group the original report studied with 85 SNPs. The validation rate was the highest for the SNPs discovered using Asian cohort (Supplementary Table 7). Of the 75 SNPs found in whites, 34 (46%) were nominally significant at P < 0.05 in Japanese, with 18 at P < 0.0005. Of the 10 SNPs found previously in Asian men, 8 (80%) were nominally significant (P < 0.05) with all at P < 0.0005. The result highlights the large heterogeneity of PCa associated genetics factors between different ethnic groups.

Polygenic risk score in Japanese population

Finally, we selected 12 SNPs which were newly discovered in the study and 68 SNPs which showed nominal association with the Japanese PCa risk from previously reported SNPs (Supplementary Table 8). Since our GWAS data did not contain two reported loci, rs58235267 and rs5799921, we also included rs114780236 and rs4842687 to represent the two regions. We calculated a polygenic risk score (PRS) by counting the number of risk alleles and their effects in each individual of the GWAS samples. The distribution of the PRS in the PCa cases (n = 4893) and the male controls (n = 10,682) are shown in Fig. 4a. We defined the upper 5% of the cases (n = 245) as the genetic high-risk group and the lower 5% as the low-risk group (n = 245), and examined clinical features of these genetically risk–stratified groups. Notably, we found that the mean diagnosis age of the high-risk group was 2.7 years younger than the non-high risk group (mean age 68.7-year old vs 71.4 year old, P = 6.54 × 10−8, by t-test), while we observed little difference of mean diagnosis age between the low-risk and non-low risk groups (72.3 vs 71.1, P = 0.020). We observed the enrichment of high PRS in early onset PCa cases (P = 0.00221 for cases with age < 60-year old, by t-test, and P = 4.30 × 10−9 for cases with age < 65-year old, Fig. 4b). The high-risk group was also enriched with patients who have a positive family history of PCa (P = 0.00339, by Fisher test, Fig. 4c). On the other hand, when we recalculated the PRS using 150 SNPs after adding 68 reported SNPs which indicated no association with Japanese PCa, statistical association between PRS and early onset PCa in the high-risk group became weaker (P = 0.02395 for cases with age < 60-year old, by t-test, and P = 3.24 × 10−7 for cases with age < 65-year old,　Supplementary Fig. 5a). Association between PRS and positive family history of PCa also declined (P = 0.02395, by Fisher test, Supplementary Fig. 5b).

Fig. 4

The distribution of the polygenic risk score (PRS) for PCa in Japanese population. a The PRS distribution of the PCa cases (n = 4893) and the male control (n = 10,682) of GWAS. Density was estimated using the Gaussian kernel. The 5% higher and lower percentiles are shown as dotted lines. b The PRS distribution by the age at diagnosis of PCa in the GWAS cases (n = 4762). Density was estimated using the Gaussian kernel. Green, younger than 60 years (n = 129); blue, younger than 65 years (n = 781); gray, 65 years or older (n = 3852). c The PRS distribution by the presence of PCa family history in the GWAS cases (n = 4893). Density was estimated using the Gaussian kernel. Red, positive PCa family history (n = 272); blue, negative PCa family history (n = 4621). d The PRS distribution by the age at diagnosis of PCa in the JIKEI validation cohort (n = 2218). Density was estimated using the Gaussian kernel. Red, younger than 55 years (n = 94); green, younger than 60 years (n = 310); blue, younger than 65 years (n = 802); gray, 65 years or older (n = 802) Furthermore, we calculated PRS in the PCa cases of the two validation cohorts (BBJ n = 2386, and JIKEI n = 2218) by counting the risk allele of 63 SNPs for which genotypes were available in the replication study (Supplementary Table 8). The distribution of PRS in both cohorts is shown in Supplementary Fig. 5c. We confirmed that in both cohorts, the age at diagnosis for the 5%-high-risk group is approximately 2 years younger than the non-high risk group (P = 0.023 in BBJ cohort and P = 0.0061 in JIKEI cohort). We observed that early onset PCa was enriched in the high PRS group of the BBJ cohort (P = 0.0111 for cases with age < 60-year old, by t-test, and P = 0.0451 for cases with age < 65-year old, Supplementary Fig. 5d) and in the JIKEI cohort (P = 0.00378, for cases with age < 55-year old, P = 0.00119 for cases with age < 60-year old, and P = 8.90 × 10−5 for cases with age < 65-year old, by t-test, Fig. 4d).　We also confirmed that the high-risk group was enriched with patients who have positive family history of PCa in BBJ replication cohort (P = 0.0281, by Fisher test, Supplementary Fig. 5e).

Discussions

In this study, we identified 12 new PCa-susceptibility loci in the Japanese population, but their functional or biological significances related to PCa development are still unclear, which is also the case with many other previously reported loci. Because most of them are located in non-coding regions and our expression analysis found only four loci to be weakly related to expression of nearby genes, these loci are likely associated with gene regulatory functions. Among the 12 loci, rs11125927 at chr.2 contained TMEM17 gene in the LD block, which is a cilium associated gene reported to suppress invasion and migration of non-small cell lung cancer by restoring Occuludin and Zo-1 expression through inactivation of ERK-P90RSK-Snail pathway[27]. rs73862213 at chr.3 contained GATA2 in the same LD block, which plays an important role in promoting high grade PCa[28]. In addition to its role as a transcription factor that promotes androgen receptor (AR) binding and activation, it regulates a subset of clinically relevant PCa associated genes in an AR independent manner. Functionally, GATA2 overexpression promotes cell motility, migration, growth, tumorigenesis, and therapy resistance in PCa. rs77911174 at chr.10 region included the ZMIZ1 gene, which binds to AR and enhances its transcriptional activity in PCa cells[29]. In addition, it co-localizes with AR and SUMO1 and promotes sumoylation of AR in vivo. The same genetic region has previously been reported to be associated with susceptibility to colon cancer and breast cancer[30,31]. Our eQTL analysis suggested the association of ZMIZ1-AS (antisense) expression with another independent loci rs4554825, indicating that regulation of ZMIZ1 expression should be important in prostate carcinogenesis. The LD block spanning rs11055034 at chr.12 contained the 3’ end of CDKN1B, which is strongly expressed in non-proliferating cells and plays important roles in the regulation of both quiescence and G1 progression[32]. It is known to act as a tumor-suppressor in PCa and suppression of CDKN1B leads to growth promotion of PCa cells[32]. rs6117562 at chr.20 was in the region containing SLC52A3, which is known as a transporter for riboflavin, however, association with PCa has not been reported[33]. rs138708 at chr.22 is a non-synonymous SNP of SUN2, which is a member of LINK complex and plays an important role in nuclear-cytoplasmic connection and suppress Warburg effect in cancer[34]. Overexpression of SUN2 inhibits PCa cell growth and SUN2 knockdown promotes PCa growth[35]. The LD block spanning rs4826594 at chr.X included three genes, FGD1, GNL3L, and TSR2. Fgd1 is transiently associated with invadopodia and required for their formation and function in extracellular matrix degradation and acts by direct modulation of Cdc42 activation[36]. FGD1 is overexpressed in human prostate and breast cancer and is associated with tumor aggressiveness[36]. GNL3L has been reported to directly bind to and stabilize MDM2 protein, however its role in PCa has not been studied[37]. TSR2 is known to enhance apoptosis by suppressing NF-kB signaling in laryngeal cancer[38]. The genes listed above potentially influence PCa development and further functional analysis is warranted. Polygenic risk estimation by using common variants for PCa has been attempted. For breast cancer, PRS consisting of 77 SNPs tested in 33,000 cases demonstrated significant interaction between PRS and age and family history[39]. Genetic risk prediction of PCa was first reported using five common susceptibility variants[40], which was established by simply counting the number of risk alleles. Subsequently, models incorporated increasing number of the significant variants[15], and some models uses much more variants including the variants that are not significant at the genome-wide level[41]. It is still controversial which and how many variants should be used for PRS[42]. Ethnic-specific or genomic structure-specific information is also an important issue for PRS. In this study for PRS we used 82 SNPs which were statistically associated with PCa in Japanese population and found that early onset and familial PCa were enriched in the high-risk group in Japanese population. Early onset and familial PCa, with which genetic factors should be more involved, was reported to be enriched in cases with rare variants of BRCA2 and HOXB13 in Caucasian population[4,5]. However, this is the first report to show that risk assessment using 50~ common variants may explain hereditary phenotype of PCa.　In addition, PRS using 82 SNPs which are associated with Japanese PCa showed stronger association with early onset PCa and positive family history of PCa than PRS with 150 SNPs which also included the SNPs that failed to show association with Japanese PCa. The result suggests that more precise selection of patients with high risk of developing PCa may be possible with ethnic population-specific PRS. Early onset PCa is often more aggressive and may have a different etiology than later-onset PCa[43]. Among men diagnosed with high grade and advanced stage PCa, men with early onset PCa are more likely to die of their cancer, with higher cause-specific mortality than later-onset disease[43]. In addition, familial PCa accounts for a greater proportion of PCa in early onset cases than it does in men diagnosed at older ages. These PCas have been shown to have a more significant genetic component indicating that this group may benefit the most from evaluation of genetic risk[44]. Since the PRS model is useful for the early detection of early onset PCa and familial PCa, PRS might have a greater impact on the clinical examination and treatment of PCa. However, further replication of this risk-stratification by other larger cohort is required before the model is applied to clinical use. As with other GWAS for PCa, contamination of control with undiagnosed PCa is a limitation of our study. In Japan, as with western countries, increasing number of PCa are detected by PSA screening. However, it is estimated that still less than 50% of males over 50 are actually exposed to PSA screening in Japan, and 10% of newly diagnosed PCa patients present with metastatic disease. Therefore, it is likely that the control in this study includes undiagnosed PCa cases to certain extent. This certainly implies that the SNPs identified in this study is potentially associated with factors other than PCa carcinogenesis, such as health consciousness to receive screening. Continuing efforts should be made to reveal the biological significance of each SNPs reported in GWAS studies including this study, which hopefully delineate the complex interaction between genetic susceptibility and environmental exposure. In summary, we have conducted a large-scale GWAS for Japanese and identified 12 PCa susceptibility loci that have not been reported previously. We stratified the polygenic risk for Japanese PCa patients by using 82 associated-SNPs and indicated that early onset and familial PCa cases were enriched in the genetically high-risk population.

Methods

Study population

GWAS included 5088 cases from BioBank Japan, which was established in the Institute of Medical Science at the University of Tokyo[16,17]. Among the 5088 cases for the GWAS, 272 (5.3%) subjects had a family history of PCa, 1219 (23.9%) subjects revealed PSA ≥ 10 and 1989 (39.1%) subjects were diagnosed to have cancer with Gleason score of 7 or higher (Supplementary Table 1). From the BBJ, pathologically proven PCa cases were selected. Non-cancer controls were from three population-based cohorts, including the J-MICC study[18], the JPHCStudy[19], IMM, and ToMMo[20]. Genomic DNA samples were extracted from peripheral blood leukocytes and normal tissues using a standard method. All participating studies obtained informed consent from all participants by following the protocols approved by their institutional ethical committees before enrollment, and the ethical committees at each institute approved the project (BBJ: https://biobankjp.org/english/index.html, J-MICC: http://www.jmicc.com/en/, JPHC: https://epi.ncc.go.jp/en/jphc/index.html, IMM: http://iwate-megabank.org/en/, ToMMo: https://www.megabank.tohoku.ac.jp/english/).

Genotyping and quality control

GWAS was conducted using Illumina OmniExpress Exome or the OmniExpress + HumanExome BeadChip (Illumina Inc., San Diego, California, U.S.). Of the 947,830 SNPs genotyped, 195,588 were mono-allelic and were excluded from further analysis. Cluster plots of the top 100 SNPs showing the smallest P-values were checked by visual observation, and 604,992 SNPs met the criteria of call rate ≧ 0.99 both in case and control samples. Finally, SNPs P ≧ 1.00 × 10−6 in a Hardy-Weinberg Equilibrium test were selected. Association study was performed for the total of 523,051 SNPs. Imputation of the un-genotyped SNPs was conducted by MaCH[45] and minimac[46] using the data from the JPT/CHS/CHD subjects and using the 1000 Genome Project Phase 1 (release 16 March 2012) as a reference. We exclude SNP with a large allele frequency difference between the reference panel and the GWAS (>0.16)[47]. We also excluded SNPs with low imputation quality score (R square < 0.3) and insertion/deletion polymorphisms.

Samples and genotyping for the replication studies

We conducted a replication study using independent 4818 PCa cases and 73,261 controls. Case samples were obtained from the BioBank Japan (2236 cases) and JIKEI samples (2582 samples) from The Jikei University School of Medicine. The JIKEI sample has been described previously with new samples being added for this study[48]. All cases were histologically diagnosed by local pathologists, and clinical data were collected by local urologists. Controls in the replication study were the 73,261 male samples from BBJ that were subjected to GWAS for diseases other than PCa. A multi-index PCR-based target sequencing method was used to sequence the target region of case samples[49]. We used a two-step PCR method to construct DNA libraries. The 1st PCR (25 cycle) was performed with 202 primer pairs and 2X Platinum Multiplex PCR Master Mix (Thermo Fisher Scientific) to amplify the target region, followed by the 2nd PCR (4 cycle) with 8-bp barcode and adapter sequences added using primers targeting shared 5’ overhangs introduced during the 1st PCR and KAPA HiFi HotStart DNA Polymerase (KAPA). After purification and quantification of pooled libraries, we sequenced them by 2 150-bp paired-end reads on a HiSeq 2500 (Illumina) instrument. Sequence reads allocated to each individual were aligned to the human reference sequence (hg19) using Burrows-Wheeler Aligner (ver. 0.7.12) and processed using Genome Analysis Toolkit (GATK, ver. 3.4–46)[50,51]. For quality control, we selected individuals in which more than 98% of the target region was covered with 20 or more sequencing reads. We called variants of each individual separately using UnifiedGenotyper and HaplotypeCaller of GATK, and VCMM (ver. 1.0.2)[52]. Genotypes for all individuals were jointly determined for each variant based on the sequencing read ratio of reference and alternative alleles. When the alternative allele frequency was between 0 and 0.15, between 0.25 and 0.75, and between 0.85 and 1, we assigned homozygote of the reference allele, heterozygote, and homozygote of the alternative allele, respectively. The SNPs that could not be analyzed by multi-index sequencing were genotyped by multiplex PCR-based invader assay[53]. The five SNPs that could not be genotyped by both assays were excluded in the replication study.

Statistical analysis

In all stages, association of each SNP was assessed under an additive model.　In the GWAS, the genetic inflation factor λGC was derived from P-values obtained by the Cochran-Armitage trend test for all the tested SNPs. The quantile–quantile plot was drawn using the R program. λGC 1000 was calculated using the following formula:[54]Odds ratios were calculated using the non-effect alleles as references, unless stated otherwise. The results of the combined analyses of the GWAS and the replication study were verified by the Mantel–Haenszel method. Heterogeneity across the two stages was examined using P-link[55]. We considered P = 5 × 10−8 (GWAS and meta-analysis) as the significance threshold after Bonferroni correction for multiple testing.

Polygenic risk score

Polygenic score was computed as the weighted sum of the number of risk alleles. Log odds ratios computed in the GWAS part of this study were used as the weights. The number of incorporated SNPs was 82 and 63 in the GWAS and the validation cohorts, respectively (Supplementary Table 8). For imputed alleles in GWAS, dosage values were used as the number of risk alleles. Statistical association between polygenic score and clinical information was analyzed using the statistical software R[56].

54 in total

1. A high-throughput SNP typing system for genome-wide association studies.

Authors: Y Ohnishi; T Tanaka; K Ozaki; R Yamada; H Suzuki; Y Nakamura
Journal: J Hum Genet Date: 2001 Impact factor: 3.172

2. Assessing the impact of population stratification on genetic association studies.

Authors: Matthew L Freedman; David Reich; Kathryn L Penney; Gavin J McDonald; Andre A Mignault; Nick Patterson; Stacey B Gabriel; Eric J Topol; Jordan W Smoller; Carlos N Pato; Michele T Pato; Tracey L Petryshen; Laurence N Kolonel; Eric S Lander; Pamela Sklar; Brian Henderson; Joel N Hirschhorn; David Altshuler
Journal: Nat Genet Date: 2004-03-28 Impact factor: 38.330

3. Low-frequency coding variants in CETP and CFB are associated with susceptibility of exudative age-related macular degeneration in the Japanese population.

Authors: Yukihide Momozawa; Masato Akiyama; Yoichiro Kamatani; Satoshi Arakawa; Miho Yasuda; Shigeo Yoshida; Yuji Oshima; Ryusaburo Mori; Koji Tanaka; Keisuke Mori; Satoshi Inoue; Hiroko Terasaki; Tetsuhiro Yasuma; Shigeru Honda; Akiko Miki; Maiko Inoue; Kimihiko Fujisawa; Kanji Takahashi; Tsutomu Yasukawa; Yasuo Yanagi; Kazuaki Kadonosono; Koh-Hei Sonoda; Tatsuro Ishibashi; Atsushi Takahashi; Michiaki Kubo
Journal: Hum Mol Genet Date: 2016-11-15 Impact factor: 6.150

4. Genome-wide association study identifies five new susceptibility loci for prostate cancer in the Japanese population.

Authors: Ryo Takata; Shusuke Akamatsu; Michiaki Kubo; Atsushi Takahashi; Naoya Hosono; Takahisa Kawaguchi; Tatsuhiko Tsunoda; Johji Inazawa; Naoyuki Kamatani; Osamu Ogawa; Tomoaki Fujioka; Yusuke Nakamura; Hidewaki Nakagawa
Journal: Nat Genet Date: 2010-08-01 Impact factor: 38.330

5. The JPHC study: design and some findings on the typical Japanese diet.

Authors: Shoichiro Tsugane; Norie Sawada
Journal: Jpn J Clin Oncol Date: 2014-08-07 Impact factor: 3.019

Review 6. Statistical methods in cancer research. Volume II--The design and analysis of cohort studies.

Authors: N E Breslow; N E Day
Journal: IARC Sci Publ Date: 1987

7. Identification and comparative functional characterization of a new human riboflavin transporter hRFT3 expressed in the brain.

Authors: Yoshiaki Yao; Atsushi Yonezawa; Hiroki Yoshimatsu; Satohiro Masuda; Toshiya Katsura; Ken-Ichi Inui
Journal: J Nutr Date: 2010-05-12 Impact factor: 4.798

8. Multiple loci identified in a genome-wide association study of prostate cancer.

Authors: Gilles Thomas; Kevin B Jacobs; Meredith Yeager; Peter Kraft; Sholom Wacholder; Nick Orr; Kai Yu; Nilanjan Chatterjee; Robert Welch; Amy Hutchinson; Andrew Crenshaw; Geraldine Cancel-Tassin; Brian J Staats; Zhaoming Wang; Jesus Gonzalez-Bosquet; Jun Fang; Xiang Deng; Sonja I Berndt; Eugenia E Calle; Heather Spencer Feigelson; Michael J Thun; Carmen Rodriguez; Demetrius Albanes; Jarmo Virtamo; Stephanie Weinstein; Fredrick R Schumacher; Edward Giovannucci; Walter C Willett; Olivier Cussenot; Antoine Valeri; Gerald L Andriole; E David Crawford; Margaret Tucker; Daniela S Gerhard; Joseph F Fraumeni; Robert Hoover; Richard B Hayes; David J Hunter; Stephen J Chanock
Journal: Nat Genet Date: 2008-02-10 Impact factor: 38.330

9. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

10. A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for prostate cancer.

Authors: Ali Amin Al Olama; Zsofia Kote-Jarai; Sonja I Berndt; David V Conti; Fredrick Schumacher; Ying Han; Sara Benlloch; Dennis J Hazelett; Zhaoming Wang; Ed Saunders; Daniel Leongamornlert; Sara Lindstrom; Sara Jugurnauth-Little; Tokhir Dadaev; Malgorzata Tymrakiewicz; Daniel O Stram; Kristin Rand; Peggy Wan; Alex Stram; Xin Sheng; Loreall C Pooler; Karen Park; Lucy Xia; Jonathan Tyrer; Laurence N Kolonel; Loic Le Marchand; Robert N Hoover; Mitchell J Machiela; Merideth Yeager; Laurie Burdette; Charles C Chung; Amy Hutchinson; Kai Yu; Chee Goh; Mahbubl Ahmed; Koveela Govindasami; Michelle Guy; Teuvo L J Tammela; Anssi Auvinen; Tiina Wahlfors; Johanna Schleutker; Tapio Visakorpi; Katri A Leinonen; Jianfeng Xu; Markus Aly; Jenny Donovan; Ruth C Travis; Tim J Key; Afshan Siddiq; Federico Canzian; Kay-Tee Khaw; Atsushi Takahashi; Michiaki Kubo; Paul Pharoah; Nora Pashayan; Maren Weischer; Borge G Nordestgaard; Sune F Nielsen; Peter Klarskov; Martin Andreas Røder; Peter Iversen; Stephen N Thibodeau; Shannon K McDonnell; Daniel J Schaid; Janet L Stanford; Suzanne Kolb; Sarah Holt; Beatrice Knudsen; Antonio Hurtado Coll; Susan M Gapstur; W Ryan Diver; Victoria L Stevens; Christiane Maier; Manuel Luedeke; Kathleen Herkommer; Antje E Rinckleb; Sara S Strom; Curtis Pettaway; Edward D Yeboah; Yao Tettey; Richard B Biritwum; Andrew A Adjei; Evelyn Tay; Ann Truelove; Shelley Niwa; Anand P Chokkalingam; Lisa Cannon-Albright; Cezary Cybulski; Dominika Wokołorczyk; Wojciech Kluźniak; Jong Park; Thomas Sellers; Hui-Yi Lin; William B Isaacs; Alan W Partin; Hermann Brenner; Aida Karina Dieffenbach; Christa Stegmaier; Constance Chen; Edward L Giovannucci; Jing Ma; Meir Stampfer; Kathryn L Penney; Lorelei Mucci; Esther M John; Sue A Ingles; Rick A Kittles; Adam B Murphy; Hardev Pandha; Agnieszka Michael; Andrzej M Kierzek; William Blot; Lisa B Signorello; Wei Zheng; Demetrius Albanes; Jarmo Virtamo; Stephanie Weinstein; Barbara Nemesure; John Carpten; Cristina Leske; Suh-Yuh Wu; Anselm Hennis; Adam S Kibel; Benjamin A Rybicki; Christine Neslund-Dudas; Ann W Hsing; Lisa Chu; Phyllis J Goodman; Eric A Klein; S Lilly Zheng; Jyotsna Batra; Judith Clements; Amanda Spurdle; Manuel R Teixeira; Paula Paulo; Sofia Maia; Chavdar Slavov; Radka Kaneva; Vanio Mitev; John S Witte; Graham Casey; Elizabeth M Gillanders; Daniella Seminara; Elio Riboli; Freddie C Hamdy; Gerhard A Coetzee; Qiyuan Li; Matthew L Freedman; David J Hunter; Kenneth Muir; Henrik Gronberg; David E Neal; Melissa Southey; Graham G Giles; Gianluca Severi; Michael B Cook; Hidewaki Nakagawa; Fredrik Wiklund; Peter Kraft; Stephen J Chanock; Brian E Henderson; Douglas F Easton; Rosalind A Eeles; Christopher A Haiman
Journal: Nat Genet Date: 2014-09-14 Impact factor: 38.330

15 in total

Review 1. ZMIZ proteins: partners in transcriptional regulation and risk factors for human disease.

Authors: Hilda Lomelí
Journal: J Mol Med (Berl) Date: 2022-06-07 Impact factor: 5.606

2. Variants in HOXB13, G132E and F127C, Are Associated With Prostate Cancer Risk in Japanese Men.

Authors: Sota Kurihara; Hiroshi Matsui; Nobuaki Ohtake; Masanori Aoki; Yoshitaka Sekine; Seiji Arai; Hidekazu Koike; Kazuhiro Suzuki; Yoshiyuki Miyazawa
Journal: Cancer Diagn Progn Date: 2022-09-03

3. Combined analyses of RNA-sequence and Hi-C along with GWAS loci-A novel approach to dissect keloid disorder genetic mechanism.

Authors: Jia Huang; Xiaobo Zhou; Wenbo Wang; Guangdong Zhou; WenJie Zhang; Zhen Gao; Xiaoli Wu; Wei Liu
Journal: PLoS Genet Date: 2022-06-16 Impact factor: 6.020

Review 4. Polygenic risk score for genetic evaluation of prostate cancer risk in Asian populations: A narrative review.

Authors: Sang Hun Song; Seok Soo Byun
Journal: Investig Clin Urol Date: 2021-05

5. A transcriptome-wide association study identifies novel candidate susceptibility genes for prostate cancer risk.

Authors: Duo Liu; Jingjing Zhu; Dan Zhou; Emily G Nikas; Nikos T Mitanis; Yanfa Sun; Chong Wu; Nicholas Mancuso; Nancy J Cox; Liang Wang; Stephen J Freedland; Christopher A Haiman; Eric R Gamazon; Jason B Nikas; Lang Wu
Journal: Int J Cancer Date: 2021-09-25 Impact factor: 7.396