Phenotype restricted genome-wide association study using a gene-centric approach identifies three low-risk neuroblastoma susceptibility Loci.

Le B Nguyen, Sharon J Diskin, Mario Capasso, Kai Wang, Maura A Diamond, Joseph Glessner, Cecilia Kim, Edward F Attiyeh, Yael P Mosse, Kristina Cole, Achille Iolascon, Marcella Devoto, Hakon Hakonarson, Hongzhe K Li, John M Maris.

Abstract

Neuroblastoma is a malignant neoplasm of the developing sympathetic nervous system that is notable for its phenotypic diversity. High-risk patients typically have widely disseminated disease at diagnosis and a poor survival probability, but low-risk patients frequently have localized tumors that are almost always cured with little or no chemotherapy. Our genome-wide association study (GWAS) has identified common variants within FLJ22536, BARD1, and LMO1 as significantly associated with neuroblastoma and more robustly associated with high-risk disease. Here we show that a GWAS focused on low-risk cases identified SNPs within DUSP12 at 1q23.3 (P = 2.07 × 10⁻⁶), DDX4 and IL31RA both at 5q11.2 (P = 2.94 × 10⁻⁶ and 6.54 × 10⁻⁷ respectively), and HSD17B12 at 11p11.2 (P = 4.20 × 10⁻⁷) as being associated with the less aggressive form of the disease. These data demonstrate the importance of robust phenotypic data in GWAS analyses and identify additional susceptibility variants for neuroblastoma.

Year: 2011 PMID: 21436895 PMCID: PMC3060064 DOI: 10.1371/journal.pgen.1002026

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917

Introduction

Neuroblastoma is a pediatric cancer of the developing sympathetic nervous system and is the most common childhood solid tumor outside the central nervous system [1], [2]. Its broad spectrum of clinical behavior is the basis for categorizing neuroblastoma into three risk groups: high-risk, intermediate-risk and low-risk. The approximately 50% of cases classified as high-risk show an aggressive clinical course with widespread metastases to bone and bone marrow present at diagnosis [3]. Despite intensive multimodal therapy, the long-term survival rate is less than 50% for children with high-risk neuroblastoma [1]. On the other hand, a substantial portion of neuroblastoma patients show favorable clinical features, including spontaneous regression of disease, and are classified as low-risk. Low-risk neuroblastoma patients have a greater than 95% survival probability with minimal, if any, chemotherapy [1]. Intermediate-risk cases are the most heterogeneous, and also the smallest subset using current definitions, comprising about 15% of all neuroblastoma patients. We recently performed a neuroblastoma GWAS using single marker analyses and identified three distinct loci significantly associated with neuroblastoma. Each of these SNP associations was within a gene and particularly enriched in the high-risk group of patients: FLJ22536 at chromosome 6p22 [4], BRCA1 associated RING domain 1 (BARD1) at 2q35 [5], and LIM domain only 1 (LMO1) at 11p15 [6]. A similar approach was used to identify a common copy number variation (CNV) at chromosome 1q21 within the NBPF23 gene that is also robustly associated with neuroblastoma [7]. In this study, we report that by adapting statistical methods to analyze genotype data at the gene level, we discovered, and successfully replicated, three distinct loci associated with the low-risk group of neuroblastoma. Furthermore, we report several gene sets enriched in the overall, high-risk, or low-risk neuroblastoma data sets.

Results

Gene-centric method identifies three low-risk neuroblastoma susceptibility loci

As we are interested in studying disease causal variants that have a high likelihood of impacting protein-encoding genes, we developed a gene-centric computational method to test for association signals at the gene level. This method adapts the global test [8], which was developed to test association of groups of genes using microarray expression data, to the analysis of our genotype data. Our method computes an aggregated test score based on genotype data of all SNPs in a region extending 10 kilobases upstream and downstream of a gene. We applied this method to a discovery set containing 1627 cases and 2575 control subjects, testing association for 15,885 genes annotated in the UCSC Genome Browser [9] (Materials and Methods). The replication dataset contained 398 cases and 1507 control subjects. Our methodology correctly identified the three significant genes already reported (FLJ22536, BARD1 and LMO1). In addition, our method identified the dual-specificity phosphatase 12 gene (DUSP12) at chromosome band 1q23.3 (Table 1) as significantly associated with neuroblastoma.
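The windowing step described above can be sketched as follows. This is a minimal illustration, assuming SNP positions and gene coordinates are on the same genome build; the function name `snps_in_gene_window` and the example SNP identifiers and positions are hypothetical, and only the DUSP12 coordinates are taken from Table 1.

```python
from typing import Dict, List, Tuple

FLANK_BP = 10_000  # 10 kb on each side of the gene, as described in the text

Gene = Tuple[str, int, int]  # (chromosome, start, stop)
Snp = Tuple[str, int]        # (chromosome, position)


def snps_in_gene_window(gene: Gene, snps: Dict[str, Snp],
                        flank: int = FLANK_BP) -> List[str]:
    """Return IDs of SNPs within `flank` bp of the gene, sorted by position."""
    chrom, start, stop = gene
    lo, hi = max(0, start - flank), stop + flank
    hits = [(pos, sid) for sid, (c, pos) in snps.items()
            if c == chrom and lo <= pos <= hi]
    return [sid for _, sid in sorted(hits)]


# DUSP12 coordinates from Table 1; the SNPs below are made up for illustration.
dusp12: Gene = ("1", 159_986_204, 159_993_576)
example_snps: Dict[str, Snp] = {
    "snp_upstream_5kb": ("1", 159_981_204),
    "snp_inside": ("1", 159_990_000),
    "snp_downstream_20kb": ("1", 160_013_576),
    "snp_other_chrom": ("5", 159_990_000),
}
window_snps = snps_in_gene_window(dusp12, example_snps)
```

Only the SNP 5 kb upstream and the SNP inside the gene fall in the window here; the SNP 20 kb downstream and the SNP on another chromosome are excluded.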

Table 1

Summary of gene-centric analysis results for different phenotypic neuroblastomas.

Gene Symbols	Chromosome	Start-Stop	No. of SNPs	Overall Discovery P-values	Overall Replication P-values	High-risk Discovery P-values	High-risk Replication P-values	Low-risk Discovery P-values	Low-risk Replication P-values
BARD1	2q35	215301519-215382673	28	9.92×10⁻¹¹	2.19×10⁻⁰³	<1.00×10⁻³⁰	3.00×10⁻⁰³	1.62×10⁻⁰¹	6.49×10⁻⁰¹
FLJ44180	6p22.3	22243164-22255401	8	<1.00×10⁻³⁰	1.94×10⁻⁰⁴	<1.00×10⁻³⁰	5.45×10⁻⁰³	1.40×10⁻⁰³	3.66×10⁻⁰²
LMO1	11p15.4	8202432-8246758	29	1.80×10⁻⁰⁷	1.59×10⁻⁰³	2.51×10⁻⁰⁸	2.82×10⁻⁰²	1.40×10⁻⁰²	5.46×10⁻⁰²
DUSP12	1q23.3	159986204-159993576	4	1.16×10⁻⁰⁷	3.30×10⁻⁰²	4.56×10⁻⁰⁴	1.97×10⁻⁰¹	2.07×10⁻⁰⁶	2.92×10⁻⁰²
DDX4	5q11.2	55070534-55148362	11	2.81×10⁻⁰⁵	3.11×10⁻⁰³	2.95×10⁻⁰²	2.67×10⁻⁰¹	2.94×10⁻⁰⁶	7.20×10⁻⁰³
IL31RA	5q11.2	55183090-55254434	18	2.75×10⁻⁰⁴	5.74×10⁻⁰²	2.88×10⁻⁰¹	7.28×10⁻⁰¹	6.54×10⁻⁰⁷	1.48×10⁻⁰²
HSD17B12	11p11.2	43658718-43834745	22	1.29×10⁻⁰⁴	3.05×10⁻⁰²	6.82×10⁻⁰²	3.82×10⁻⁰¹	4.20×10⁻⁰⁷	5.37×10⁻⁰²

Bold-faced p-values indicate significant association signals with Bonferroni correction over 15,885 genes.

We next sought to determine if association signals discovered in our unbiased scan would be further enriched, or diminished, when we restricted our analyses to the divergent phenotypes of low-risk or high-risk neuroblastoma. We first analyzed a subset of 678 high-risk neuroblastoma cases from the original discovery case series, again matched to 2575 control subjects. This analysis reconfirmed that all three previously reported signals were truly associated with high-risk neuroblastoma (Table 1), but DUSP12 did not show a strong association signal in the high-risk disease case series (P = 4.56×10⁻⁴). In parallel, we analyzed a subset of 574 low-risk cases and 1722 matched control subjects and a replication set of 124 cases and 496 matched control subjects (Materials and Methods). This analysis confirmed DUSP12 and three novel genes as associated with low-risk neuroblastoma: DEAD (Asp-Glu-Ala-Asp) box polypeptide 4 isoform (DDX4) and interleukin-31 receptor A precursor (IL31RA), both at the same locus within chromosome band 5q11.2, and hydroxysteroid (17-beta) dehydrogenase 12 (HSD17B12) at chromosome band 11p11.2 (Table 1). All signals had significant discovery p-values using Bonferroni correction over 15,885 genes (P<3.15×10⁻⁶), and replication p-values less than 0.05. Our gene-centric method was able to detect DUSP12 and HSD17B12, the only two genes containing at least one SNP that passed the Bonferroni correction in the single marker (SNP) analysis of low-risk neuroblastoma (Figure 1 and Figure 2), which used association testing as implemented in PLINK [10]. The fact that our gene-centric results were compatible with the single marker results supported the effectiveness of our method.

In addition, we were able to detect two gene-level association signals located at a single locus for DDX4 and IL31RA even though these genes did not contain any significant SNPs in the single marker analysis (Figure 1 and Table 2). These genes, however, contained several SNPs with moderate signals (Figure 2), and our gene-centric method was able to combine these effects and detect the overall significance of these two genes' signals. Being independently replicated in our study (P = 7.20×10⁻³ and 1.48×10⁻² respectively), these two signals indicated that our gene-centric method was more effective than single marker analysis in detecting gene-level association signals. Indeed, our power computation, adjusting for 15,885 tests, indicated that our method performed far better than the single SNP method in both our discovery and replication case series (Figures S1, S2, S3, S4).
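The two Bonferroni thresholds quoted in this section follow from dividing a family-wise error rate by the number of tests. The short sketch below assumes a family-wise rate of 0.05 (not stated explicitly in the text, but consistent with the reported thresholds) and checks the low-risk discovery p-values from Table 1 and the most significant SNP p-values from Table 2 against them; the variable and function names are illustrative.

```python
ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(n_tests: int, alpha: float = ALPHA) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


gene_threshold = bonferroni_threshold(15_885)   # gene-centric tests
snp_threshold = bonferroni_threshold(479_811)   # single marker tests

# Low-risk discovery p-values (Table 1) and top single-SNP p-values (Table 2).
low_risk_gene_p = {"DUSP12": 2.07e-6, "DDX4": 2.94e-6,
                   "IL31RA": 6.54e-7, "HSD17B12": 4.20e-7}
top_snp_p = {"DUSP12": 5.74e-8, "DDX4": 6.41e-7,
             "IL31RA": 4.80e-6, "HSD17B12": 2.77e-9}

genes_passing = sorted(g for g, p in low_risk_gene_p.items() if p < gene_threshold)
genes_with_significant_snp = sorted(g for g, p in top_snp_p.items() if p < snp_threshold)
```

All four genes pass the gene-level threshold, while only DUSP12 and HSD17B12 contain a SNP below the single marker threshold, which matches the description above.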

Figure 1

Manhattan plot of single marker analysis of the low-risk neuroblastoma data set.

Even though the genes DDX4 and IL31RA do not contain significantly associated SNPs (single-marker threshold P = 1.0×10⁻⁷), the combined effect of moderately associated SNPs drives these two genes to be significant in our gene-centric analysis (the genome-wide gene-centric significance threshold is P<3.15×10⁻⁶).

Figure 2

Haplotype view of the 4 genes significantly associated with low-risk neuroblastoma.

Table 2

Additional summaries of gene-centric analysis results for low-risk neuroblastoma.

Gene Symbols	Chromosome	Start-Stop	No. of SNPs	Gene Randomization P-values	Most significant SNP	Most significant SNP P-values	Single SNP Replication P range
DUSP12	1q23.3	159986204-159993576	4	2.00×10⁻⁰⁵	rs1027702	5.74×10⁻⁸	2.32×10⁻⁴–8.32×10⁻²
DDX4	5q11.2	55070534-55148362	11	1.00×10⁻⁰⁶	rs2619046	6.41×10⁻⁷	1.15×10⁻³–9.50×10⁻¹
IL31RA	5q11.2	55183090-55254434	18	1.00×10⁻⁰⁶	rs10055201	4.80×10⁻⁶	7.09×10⁻⁴–9.28×10⁻¹
HSD17B12	11p11.2	43658718-43834745	22	1.00×10⁻⁰⁶	rs11037575	2.77×10⁻⁹	1.49×10⁻⁶–8.28×10⁻¹

Bold-faced p-values indicate significant signal in single marker analysis using Bonferroni correction over 479,811 SNPs.

Red line indicates P<1.0×10⁻⁷. Only DUSP12 and HSD17B12 contain SNPs with significant single-marker p-values in the low-risk neuroblastoma subset. While DDX4 and IL31RA do not contain significant SNPs, our gene-centric method was able to detect these genes as associated with low-risk neuroblastoma.

To further confirm the validity of our discovery, we computed a randomization p-value for each of the four newly discovered genes. For each of these genes, we computed a separate null test statistic distribution by calculating the gene-centric test statistics of one million randomly selected pseudo-genes. These pseudo-genes were selected to contain the same number of SNPs as the reference gene. This method of selecting pseudo-genes was based on our observation that the average gene-based test statistic was strongly correlated with the number of SNPs in these genes (Figure S5). Comparing the observed test statistics of the four newly discovered genes against these null distributions, we arrived at the randomization p-values (Table 2). These p-values (range 2.0×10⁻⁵ to 1.0×10⁻⁶) were compatible with the p-values asymptotically computed by our gene-centric method, and notably strengthened the credibility of the discovery of four novel disease causal genes associated with low-risk neuroblastoma. To assess the joint impact of these genes on disease risk, we estimated the two-locus genotype odds ratios for all pairs amongst the four most significant SNPs within these four genes (Table 3). For each SNP pair tested, the independently contributed disease risks for carriers of risk alleles at only one locus were overall slightly stronger than the disease risks of each SNP when analyzed separately. In all but one case, the odds ratios for carriers of both risk alleles increased markedly (odds ratios range from 2.505 to 3.435).

However, no significant interaction between these SNP pairs was detected (P ranges from 0.459 to 0.909). Further, we computed all SNP pair interactions amongst all four genes and again found no significant SNP-SNP interaction signals (the best SNP pair signals' P ranges from 0.108 to 0.523, Table S1). In the special case of DDX4 and IL31RA, the modest disease risk for carriers of both risk alleles suggested that the true association signal encompasses both genes, even though they are 38 kilobases apart (Figure S6).

Table 3

Estimates of low-risk neuroblastoma odds ratios by genotype between the most significant SNPs.

Gene 1	Most significant SNP 1	Single marker SNP1 OR (95% CI) P	SNP1 carrier & SNP2 non-carrier OR (95% CI) P	Gene 2	Most significant SNP 2	Single marker SNP2 OR (95% CI) P	SNP2 carrier & SNP1 non-carrier OR (95% CI) P	SNP1 & SNP2 carrier OR (95% CI) P	Interaction P
DUSP12	rs1027702	2.012 (1.47–2.79) 3.381×10⁻⁰⁶	2.373 (1.48–3.98) 1.217×10⁻⁰⁴	DDX4	rs2619046	1.477 (1.21–1.79) 5.702×10⁻⁰⁵	1.826 (0.97–3.49) 5.018×10⁻⁰²	3.435 (2.13–5.76) 1.123×10⁻⁰⁸	0.904
DUSP12	rs1027702	2.012 (1.47–2.79) 3.381×10⁻⁰⁶	2.108 (1.35–1.39) 4.308×10⁻⁰⁴	IL31RA	rs10055201	1.494 (1.23–1.81) 3.848×10⁻⁰⁵	1.622 (0.87–3.04) 0.132	3.140 (2.00–5.07) 2.276×10⁻⁰⁸	0.627
DUSP12	rs1027702	2.012 (1.47–2.79) 3.381×10⁻⁰⁶	2.018 (1.11–3.93) 1.753×10⁻⁰²	HSD17B12	rs11037575	1.674 (1.35–2.08) 1.075×10⁻⁰⁶	1.715 (0.87–3.57) 0.122	3.379 (1.90–6.47) 3.148×10⁻⁰⁶	0.778
DDX4	rs2619046	1.477 (1.21–1.79) 5.702×10⁻⁰⁵	1.346 (0.85–2.08) 0.170	IL31RA	rs10055201	1.494 (1.23–1.81) 3.848×10⁻⁰⁵	1.288 (0.58–2.68) 0.451	1.561 (1.27–1.91) 1.193×10⁻⁰⁵	0.459
DDX4	rs2619046	1.477 (1.21–1.79) 5.702×10⁻⁰⁵	1.546 (1.07–2.24) 1.828×10⁻⁰²	HSD17B12	rs11037575	1.674 (1.35–2.08) 1.075×10⁻⁰⁶	1.732 (1.27–2.39) 3.645×10⁻⁰⁴	2.534 (1.85–3.49) 6.632×10⁻¹⁰	0.728
IL31RA	rs10055201	1.494 (1.23–1.81) 3.848×10⁻⁰⁵	1.485 (1.02–2.15) 3.453×10⁻⁰²	HSD17B12	rs11037575	1.674 (1.35–2.08) 1.075×10⁻⁰⁶	1.665 (1.24–2.26) 4.805×10⁻⁰⁴	2.505 (1.84–3.42) 5.091×10⁻¹⁰	0.909

Odds ratios (OR), confidence intervals (CI) and P-values (P) were computed from Fisher's exact test. No significant interaction was detected between any pairs of most significant SNPs.

Lastly, we sought to further replicate our results in an independent Italian cohort of 115 low-risk cases and 680 controls. We chose to genotype the three most significant SNPs (rs1027702, rs2619046, rs11037575) in the three loci that contain DUSP12, DDX4/IL31RA, and HSD17B12 respectively. We analyzed these SNP data using the statistical tests listed in Table 4. Interestingly, rs1027702 showed significant replication signals in the allele frequency association test as well as the dominant model association test (P = 0.031 and 0.008 respectively). On the other hand, both rs2619046 and rs11037575 showed significant signals in the homozygous association test (P = 0.042 and 0.028 respectively) as well as the recessive model association test (P = 0.047 and 0.037 respectively). Overall, these replication results provide further evidence confirming these three loci as significantly associated with low-risk neuroblastoma.
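The allele frequency, homozygous, dominant and recessive tests named above each reduce a genotype table to a 2×2 table in a different way. The sketch below illustrates those reductions under stated assumptions: genotype counts are given as (AA, Aa, aa) with `a` the tested allele, the test is an uncorrected Pearson chi-square with one degree of freedom (the text does not specify the exact test used), and all counts are hypothetical.

```python
import math
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (n_AA, n_Aa, n_aa), where 'a' is the tested allele


def chi2_p_2x2(a: int, b: int, c: int, d: int) -> float:
    """P-value of the Pearson chi-square test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def model_tables(case: Counts, control: Counts) -> Dict[str, Tuple[int, int, int, int]]:
    """2x2 tables (case exposed, case unexposed, control exposed, control unexposed)."""
    c_AA, c_Aa, c_aa = case
    k_AA, k_Aa, k_aa = control
    return {
        "allelic": (2 * c_aa + c_Aa, 2 * c_AA + c_Aa, 2 * k_aa + k_Aa, 2 * k_AA + k_Aa),
        "homozygous": (c_aa, c_AA, k_aa, k_AA),               # aa versus AA
        "dominant": (c_Aa + c_aa, c_AA, k_Aa + k_aa, k_AA),    # carriers versus AA
        "recessive": (c_aa, c_AA + c_Aa, k_aa, k_AA + k_Aa),   # aa versus the rest
    }


def model_p_values(case: Counts, control: Counts) -> Dict[str, float]:
    return {name: chi2_p_2x2(*t) for name, t in model_tables(case, control).items()}


# Hypothetical genotype counts, for illustration only.
p_values = model_p_values(case=(40, 50, 25), control=(300, 300, 80))
```

A SNP whose effect is concentrated in homozygous carriers tends to show up in the homozygous and recessive rows rather than the dominant row, which is one way to read the different patterns across SNPs in Table 4.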

Table 4

Single SNP replication results in Italian cohort (n = 115 low-risk neuroblastoma and 680 controls).

Genes	SNP	Discovery Single Marker TREND Test	Replication Allele Frequency Test	Replication Homozygous Model Test	Replication Dominant Model Test	Replication Recessive Model Test
DUSP12	rs1027702	5.74×10⁻⁰⁸	0.031	0.102	0.008	0.490
DDX4/IL31RA	rs2619046	6.41×10⁻⁰⁷	0.129	0.042	0.343	0.047
HSD17B12	rs11037575	2.77×10⁻⁰⁹	0.053	0.028	0.194	0.037

Bold-faced p-values indicate significant replication P-values<0.05.

Gene-set analyses identify enriched gene sets in all phenotypes

We are also interested in gene set analyses to identify specific pathways and gene sets that are enriched in neuroblastoma. To perform this analysis, we adapted the random-set approach [11], which was developed for gene set analysis of gene expression data. This method was suitable for our purpose since it requires gene-level scores, which were conveniently obtained by taking the logarithm transformation of our gene-centric p-values. We applied this random-set procedure to the overall, high-risk and low-risk data sets described earlier over 4734 gene sets obtained from the Gene Set Enrichment Analysis site [12], and selected enriched gene sets based on the Bonferroni correction criterion (P<1.05×10⁻⁵). After additionally requiring a replication p-value below 0.05, three Gene Ontology [13] sets were associated with all cases of neuroblastoma: Nuclear Ubiquitin Ligase Complex, Negative Regulation of Intracellular Transport, and Regulation of Phosphorylation (Table 5). The first two gene sets were also significantly enriched in high-risk neuroblastoma (P = 1.275×10⁻⁹ and 6.332×10⁻⁷ respectively) with significant replication p-values (0.030 and 0.036 respectively). The third gene set appeared to be enriched in low-risk neuroblastoma (P = 1.678×10⁻⁶); however, we were unable to replicate this result (P = 0.96). Furthermore, we identified and successfully replicated an additional gene set that was exclusively enriched in low-risk neuroblastoma: Cytokine and Chemokine Mediated Signaling Pathway (discovery P = 8.175×10⁻⁶ and replication P = 0.040). The identification of these gene sets may elucidate biological pathways that are important in the biology of neuroblastoma.

Table 5

Summary of gene set analysis results for all, high-risk, and low-risk neuroblastoma.

Gene Set Names	N° of genes	Overall Discovery p-values	Overall Replication p-values	High-risk Discovery p-values	High-risk Replication p-values	Low-risk Discovery p-values	Low-risk Replication p-values
Nuclear Ubiquitin Ligase Complex	10	1.084×10⁻⁰⁹	6.620×10⁻⁰³	1.275×10⁻⁰⁹	3.024×10⁻⁰²	0.469	0.753
Negative Regulation of Intracellular Transport	10	5.692×10⁻⁰⁷	1.160×10⁻⁰²	6.332×10⁻⁰⁷	3.610×10⁻⁰²	0.361	0.184
Regulation of Phosphorylation	42	6.020×10⁻⁰⁷	4.142×10⁻⁰²	2.940×10⁻⁰²	0.925	1.678×10⁻⁰⁶	0.960
Cytokine and Chemokine Mediated Signaling Pathway	18	0.109	0.756	0.813	0.928	8.175×10⁻⁰⁶	4.027×10⁻⁰²

Bold face p-values of different gene sets at different risk groups (overall, high-risk, low-risk) indicate significant enrichment of that gene set in that risk group.

Discussion

Taken together, this study implicates DUSP12, DDX4, IL31RA, and HSD17B12 as neuroblastoma susceptibility genes, with particular relevance for those at low risk for malignant progression and death from disease. Methodologically, we suggest that the gene-centric method has greater power to detect association signals than the single marker method (Figures S1, S2, S3, S4). Not only was the gene-centric method able to detect the two genes harboring genome-wide significant SNPs (DUSP12 and HSD17B12), but it was also able to detect two genes that would have been missed by the single marker analysis (DDX4 and IL31RA). Since this method was originally developed to analyze gene expression data, its limitation is that it cannot take haplotype effects into account when computing gene-level test statistics. However, our efforts to replicate the discovery in two independent cohorts unequivocally verify association signals at these loci. Further studies will be required to determine if these common variations tag cis- or trans-acting disease causal variations. Interestingly, the segregation of gene-level association signals and gene set enrichment scores between high-risk and low-risk neuroblastoma (Table 1 and Table 5) supports the view that common variation in the human genome can predispose not only to a particular disease, but also to clinically relevant disease subsets, thus demonstrating the power of robust phenotypic data in GWAS efforts.

Materials and Methods

Subjects and quality control

Study subjects

The neuroblastoma patients in this study were children registered through the North American-based Children's Oncology Group (COG) and were diagnosed with neuroblastoma or ganglioneuroblastoma. Blood samples from the neuroblastoma cases were identified through the COG neuroblastoma repository for specimen collection at time of diagnosis. All specimens were annotated with clinical and genomic information (Table S2). Samples were assigned to three risk groups (low-risk, intermediate-risk and high-risk) based on the COG risk assignment algorithm [1], which includes patient age at diagnosis [14], International Neuroblastoma Staging System (INSS) stage [2], tumor histopathology [15], DNA index [16], and MYCN amplification status [17]. The only eligibility criterion for genotyping was availability of 1.5 µg of high quality DNA from a tumor-free source such as peripheral blood or bone marrow cells uninvolved with tumor. Since neuroblastoma in the United States is demographically a disease of Caucasians of European descent, we limited our analyses to this ethnic group to minimize genetic heterogeneity. Summaries of clinical and genomic information of our discovery and replication cohorts are provided in Table S2. The control group in this study included 2575 children of self-reported Caucasian ancestry who were recruited and genotyped by the Center for Applied Genomics at the Children's Hospital of Philadelphia (CHOP). Eligibility criteria for control subjects were: 1) self-reported Caucasian ancestry; 2) availability of 1.5 µg of high quality DNA from peripheral blood or mononuclear bone marrow cells; and 3) no known medical disorder, including cancer, based on self-reported intake questionnaire and/or clinician-based assessments. The CHOP Institutional Review Board approved this study.

Genotyping and quality control for discovery cohort

SNP genotyping was performed using the Illumina Infinium II BeadChip (Illumina, San Diego, CA, USA) according to methods detailed elsewhere [4], [5]. Since a portion of the individuals in the discovery cohort was genotyped on the HumanHap550 v1 array (n = 859) while the others were genotyped on the v3 array (n = 768), our analysis only considered the markers shared by the v1 and v3 arrays. The HumanHap550 v1 array contains 555,175 markers and the v3 array contains 561,288 markers, of which 544,902 are shared by the two arrays. We filtered out 8,749 SNP markers with a call rate less than 95%. We also excluded 5,415 SNP markers whose Hardy-Weinberg equilibrium p-values were less than 0.001. Finally, we excluded an additional 50,869 SNP markers whose minor allele frequency was less than 5%. A total of 96 cases were removed from our data set due to their low genotype call rate (<95%). Furthermore, we used multi-dimensional scaling (MDS), as implemented in PLINK [10], to infer population structure (Figure S7). Comparing self-identified ancestry with MDS-inferred ancestry confirmed 1642 neuroblastoma patients of European ancestry. Finally, we calculated genome-wide identity-by-state (IBS) estimates for all pair-wise comparisons among all case subjects and control subjects to detect cryptic relatedness and potentially duplicated genotypes within our data set. This step further excluded 15 neuroblastoma patients from our analyses. After all quality control steps, our discovery data set contained 1627 neuroblastoma case subjects of European ancestry, each genotyped at 479,811 SNP markers. To correct for the potential effects of population structure, 2575 matching control subjects of European ancestry were selected based on their low IBS estimates with case subjects. The genomic control inflation factor for this data set was 1.08.

Five hundred and seventy four (574) low-risk cases, selected from the above 1627 cases, were included in all low-risk neuroblastoma analyses. To keep the genomic inflation factor low, the three best matching control subjects were selected for each case, based on IBS estimates, making a total of 1722 control subjects included in the analyses. The genomic control inflation factor for this data set was 1.07.
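The three marker-level filters described above (call rate, Hardy-Weinberg equilibrium, minor allele frequency) can be sketched per SNP from its genotype counts. This is an illustration only: the thresholds are those quoted in the text, but the Hardy-Weinberg p-value here is a one-degree-of-freedom chi-square approximation chosen for brevity (the text does not say which test was used), and the function names and example counts are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the rarer allele among called genotypes."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(n_aa: int, n_ab: int, n_bb: int, n_missing: int,
                  min_call_rate: float = 0.95,
                  min_hwe_p: float = 0.001,
                  min_maf: float = 0.05) -> bool:
    """Apply the call-rate, HWE and MAF thresholds quoted in the text to one SNP."""
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    call_rate = n_called / (n_called + n_missing)
    return (call_rate >= min_call_rate
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p
            and minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf)
```

For example, `snp_passes_qc(25, 50, 25, 0)` keeps a common marker in exact Hardy-Weinberg proportions, while `snp_passes_qc(98, 2, 0, 0)` drops a marker with a minor allele frequency of 1%.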

Genotyping and quality control for initial replication cohort

SNP genotyping was performed using the Illumina Human610-Quad array, which includes both SNP and CNV markers. The Human610-Quad array contains 620,901 SNPs. We filtered out 48,831 SNP markers with a call rate less than 95%. We also excluded 13,305 SNP markers whose Hardy-Weinberg equilibrium p-values were less than 0.001. Finally, we excluded an additional 49,057 SNP markers whose minor allele frequency was less than 5%. A total of 15 cases were removed from our data set due to their low genotype call rate (<95%). After all quality control steps, our replication data set contained 398 neuroblastoma case subjects of European ancestry, each genotyped at 509,708 SNP markers. To correct for the potential effects of population structure, 1507 matching control subjects of European ancestry were selected based on their low IBS estimates with case subjects. One hundred and twenty four (124) low-risk cases, selected from the above 398 cases, were included in all low-risk neuroblastoma replication analyses. For each case, the four best matching control subjects were selected based on IBS estimates, making a total of 496 control subjects included in the analyses.

Genotyping of second replication cohort

One hundred and fifteen (115) low-risk neuroblastoma subjects from Italy and six hundred and eighty (680) Italian control subjects were selected to be genotyped at three SNPs: rs1027702, rs2619046, and rs11037575. All samples were genotyped using the TaqMan SNP Genotyping Assay (Applied Biosystems).

Statistical analyses

Gene-centric analysis

Our gene-centric analysis adopted the global test method [8], developed to test association of a group of genes using microarray data. First, to mirror gene expression data, we quantified our SNP genotype data by counting the number of minor alleles for each sample at each SNP. Second, because the two pairs of concepts stand in an analogous relationship in the global test and in our study, we substituted the concepts of “genes” and “group of genes” from the global test with “SNPs” and “genes” respectively. This method adopts the generalized linear model framework to model the relationship between $Y$, a vector of clinical outcomes, and $X$, the $n \times m$ matrix of genotypic data of $n$ subjects and $m$ SNPs:

$$E(Y_i \mid \beta) = h^{-1}\Big(\alpha + \sum_{j=1}^{m} x_{ij}\beta_j\Big)$$

In this model, $\alpha$ is the intercept, $\beta$ is a length $m$ vector of regression coefficients, and $h$ is a general link function such as the logit function. Testing association between genotypic data and clinical outcomes is equivalent to testing the null hypothesis $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$. When the number of SNPs is large relative to the number of subjects, it is not possible to test this hypothesis in a classical way. Instead, we can test $H_0$ if we assume $\beta_1, \dots, \beta_m$ to be samples from a common distribution with expectation zero and variance $\tau^2$. The null hypothesis then becomes simply $H_0: \tau^2 = 0$. If we rewrite the model in terms of $r_i = \sum_j x_{ij}\beta_j$, with $i = 1, \dots, n$, then $r_i$ is the linear predictor, the total effect of all covariates for subject $i$. Let $r = (r_1, \dots, r_n)$; then $r$ is a random vector with expectation zero and $\mathrm{Cov}(r) = \tau^2 XX'$. The original model simplifies to

$$E(Y_i \mid r_i) = h^{-1}(\alpha + r_i)$$

This is a simple random effect model. Under the null hypothesis, the test statistic of the global test [8],

$$Q = \frac{(Y - \mu)'\, R\, (Y - \mu)}{\mu_2},$$

has expectation $E(Q) = \mathrm{trace}(R)$ and variance

$$\mathrm{Var}(Q) = 2\,\mathrm{trace}(R^2) + \Big(\frac{\mu_4}{\mu_2^2} - 3\Big)\sum_{i=1}^{n} R_{ii}^2,$$

where $R = (1/m)XX'$ is an $n \times n$ matrix proportional to the covariance matrix of the random effects $r$, $\mu = h^{-1}(\alpha)$ is the expectation of $Y$ under $H_0$, and $\mu_2$ and $\mu_4$ are the second and fourth central moments of $Y$ under $H_0$. The test statistic $Q$ can be rewritten as

$$Q = \frac{1}{m}\sum_{j=1}^{m} Q_j, \qquad Q_j = \frac{1}{\mu_2}\big[X_j'(Y - \mu)\big]^2,$$

where $X_j$ is the length $n$ vector of genotypes of SNP $j$.

The expression $Q_j = (1/\mu_2)[X_j'(Y - \mu)]^2$ would be exactly the test statistic of SNP $j$ if it were the only SNP in the gene of interest; equivalently, $Q_j$ can be interpreted as the “contribution” of SNP $j$ to the overall test statistic. This means that the overall test statistic is simply the average of the statistics $Q_j$ of the $m$ individual SNPs. Notably, because the averaging is over squared covariances between genotype and clinical outcome, SNPs with large variance (i.e. strong association signals) have stronger influence on the test statistic $Q$ than those with weaker association signals. Using this method, we computed an aggregated effect of all SNPs located from 10 kilobases upstream to 10 kilobases downstream of the gene being tested, and computed an asymptotic p-value for each gene. We performed the global test on 15,885 annotated, unannotated and predicted genes downloaded from the UCSC Genome Browser [9], and used a strict Bonferroni correction criterion (P<3.15×10⁻⁶) to determine whether a gene was associated with neuroblastoma. True association signals were further selected based on a replication p-value less than 0.05.
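To make the averaging interpretation concrete, the sketch below computes the statistic in two ways for a binary outcome: as the mean of per-SNP squared covariances, and as the quadratic form in R = (1/m)XX'. It assumes a logit-type null model in which the null mean of Y is estimated by the case fraction and its second central moment by μ(1 − μ); the function names and the toy data are illustrative, and the asymptotic p-value step is not reproduced here.

```python
from typing import List, Tuple


def _null_moments(y: List[int]) -> Tuple[float, float]:
    """Null mean and second central moment of a binary outcome."""
    mu = sum(y) / len(y)
    mu2 = mu * (1.0 - mu)
    if mu2 == 0.0:
        raise ValueError("outcome must contain both cases and controls")
    return mu, mu2


def global_test_q(X: List[List[float]], y: List[int]) -> Tuple[float, List[float]]:
    """Return Q and the per-SNP contributions, with Q equal to their average.

    X is an n x m matrix of minor-allele counts (rows are subjects).
    """
    n, m = len(y), len(X[0])
    mu, mu2 = _null_moments(y)
    resid = [yi - mu for yi in y]
    contributions = []
    for j in range(m):
        cov_j = sum(X[i][j] * resid[i] for i in range(n))  # X_j'(Y - mu)
        contributions.append(cov_j ** 2 / mu2)
    return sum(contributions) / m, contributions


def global_test_q_quadratic(X: List[List[float]], y: List[int]) -> float:
    """Same statistic written as (Y - mu)' R (Y - mu) / mu2 with R = XX'/m."""
    n, m = len(y), len(X[0])
    mu, mu2 = _null_moments(y)
    resid = [yi - mu for yi in y]
    total = 0.0
    for a in range(n):
        for b in range(n):
            r_ab = sum(X[a][j] * X[b][j] for j in range(m)) / m
            total += resid[a] * r_ab * resid[b]
    return total / mu2


# Toy data: 4 subjects, 2 SNPs (hypothetical).
X_toy = [[0, 1], [1, 0], [2, 1], [0, 0]]
y_toy = [0, 0, 1, 1]
q_toy, q_contrib = global_test_q(X_toy, y_toy)
```

In the toy data the first SNP carries all of the signal and the second contributes nothing, so the gene-level statistic is half of the first SNP's contribution.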

Randomization p-value computation

For each significant gene, we computed a randomization p-value by comparing its test statistic with its null distribution. The null distribution for a gene was composed of the test statistics of one million pseudo-genes having the same number of SNPs as the reference gene. The SNPs of these pseudo-genes were randomly selected across the genome.
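The randomization procedure can be sketched as below. This is a simplified illustration under explicit assumptions: each pseudo-gene statistic is taken to be the average of per-SNP contributions drawn without replacement from a genome-wide pool, and the p-value uses an add-one correction so that it is never exactly zero. Neither detail is specified in the text, and the function name and inputs are hypothetical.

```python
import random
from typing import Sequence


def randomization_p_value(observed_q: float,
                          snp_contributions: Sequence[float],
                          n_snps: int,
                          n_pseudo_genes: int = 10_000,
                          seed: int = 0) -> float:
    """Estimate how often a random pseudo-gene of `n_snps` SNPs reaches `observed_q`.

    `snp_contributions` is a genome-wide pool of per-SNP statistics; each
    pseudo-gene averages `n_snps` of them drawn without replacement.
    """
    rng = random.Random(seed)
    n_as_extreme = 0
    for _ in range(n_pseudo_genes):
        pseudo_gene = rng.sample(snp_contributions, n_snps)
        if sum(pseudo_gene) / n_snps >= observed_q:
            n_as_extreme += 1
    # Add-one correction (an assumption): the smallest attainable value is
    # 1 / (n_pseudo_genes + 1).
    return (n_as_extreme + 1) / (n_pseudo_genes + 1)
```

With one million pseudo-genes, the smallest p-value this estimator can return is about 1.0×10⁻⁶, which is the same order as the smallest randomization p-values reported in Table 2.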

Odds ratio estimation

We used Fisher's exact test to estimate the odds ratios from genotype data, together with their 95% confidence intervals and p-values.
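For a 2×2 table of carrier status by case status, the exact test and a simple odds ratio can be sketched as follows. Two assumptions are worth flagging: the odds ratio here is the plain cross-product ratio rather than the conditional maximum likelihood estimate that some statistical packages report alongside Fisher's exact test, and the confidence interval is omitted. The function names are illustrative.

```python
from math import comb


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    if b == 0 or c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows could be carriers / non-carriers and columns cases / controls.
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(prob(x) for x in range(lo, hi + 1)
                  if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```

As a small check, the table [[3, 1], [1, 3]] has a cross-product odds ratio of 9 and a two-sided exact p-value of 34/70, about 0.486.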

SNP-SNP interaction estimation

Interaction scores between pairs of SNPs were computed using the general linear model to estimate the interaction effect between the two SNPs.
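One way to see what an interaction term measures is to compare the odds ratio of double carriers with the product of the two single-carrier odds ratios; on the odds scale, a ratio near 1 means the joint effect is what two independent multiplicative effects would give. The sketch below computes this ratio from carrier-status counts. It is a simplification assumed for illustration (a saturated model on carrier status rather than the model fitted in the study), and the function name, the dictionary keys and the counts in the usage note are hypothetical.

```python
from typing import Dict

STRATA = ("none", "snp1_only", "snp2_only", "both")


def two_locus_odds_ratios(cases: Dict[str, int],
                          controls: Dict[str, int]) -> Dict[str, float]:
    """Odds ratios versus non-carriers, plus the multiplicative interaction ratio.

    `cases` and `controls` map each carrier stratum in STRATA to a count.
    """
    def odds_ratio(stratum: str) -> float:
        return (cases[stratum] * controls["none"]) / (controls[stratum] * cases["none"])

    or_1 = odds_ratio("snp1_only")
    or_2 = odds_ratio("snp2_only")
    or_both = odds_ratio("both")
    return {
        "snp1_only": or_1,
        "snp2_only": or_2,
        "both": or_both,
        # Equals 1 when the joint effect is exactly multiplicative.
        "interaction_ratio": or_both / (or_1 * or_2),
    }
```

For instance, with 100 controls in every stratum and 10, 20, 30 and 60 cases in the four strata, the single-carrier odds ratios are 2 and 3, the double-carrier odds ratio is 6, and the interaction ratio is 1. Applying the same ratio to the first row of Table 3 gives 3.435 / (2.373 × 1.826), about 0.79, in line with the non-significant interaction reported for that pair.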

Gene set analysis

To analyze the significance of gene sets, we adopted the random-set method [11] since it allows us to use the gene-centric results to compute an enrichment score for each gene set. In this analysis, we used the logarithm transformation of our gene-centric p-values as gene-level scores to detect gene sets enriched in neuroblastoma. We performed three separate gene set analyses for the overall, high-risk and low-risk data sets over 4734 gene sets downloaded from the Broad Institute MSigDB [12]. These gene sets include five categories: positional gene sets, curated gene sets (chemical and genetic perturbations, and canonical pathways), motif gene sets (microRNA targets, and transcription factor targets), computational gene sets (cancer modules, and cancer gene neighborhoods), and GO gene sets (GO cellular components, GO biological process, and GO molecular function). A strict Bonferroni correction criterion was used to select gene sets that are enriched in neuroblastoma.
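The idea of a random-set score can be sketched as follows: the mean gene-level score within a set is compared with the mean and variance that a randomly drawn set of the same size would have. In this sketch the gene-level score is taken to be −log10 of the gene-centric p-value (the sign and base of the logarithm are assumptions), the p-value uses a normal approximation, and the function names and toy scores are hypothetical.

```python
import math
from typing import Dict, Iterable


def gene_score(p_value: float) -> float:
    """Gene-level score; -log10(p) is an assumed choice of transformation."""
    return -math.log10(p_value)


def random_set_z(gene_scores: Dict[str, float], gene_set: Iterable[str]) -> float:
    """Z-score of the set's mean score against random sets of the same size.

    Uses the mean and variance of a sample mean drawn without replacement
    from the finite population of all gene scores.
    """
    members = [g for g in gene_set if g in gene_scores]
    n_genes, set_size = len(gene_scores), len(members)
    if set_size == 0 or set_size >= n_genes:
        raise ValueError("gene set must be a non-empty proper subset of scored genes")
    values = list(gene_scores.values())
    pop_mean = sum(values) / n_genes
    pop_var = sum((v - pop_mean) ** 2 for v in values) / n_genes
    set_mean = sum(gene_scores[g] for g in members) / set_size
    var_of_mean = (pop_var / set_size) * (n_genes - set_size) / (n_genes - 1)
    return (set_mean - pop_mean) / math.sqrt(var_of_mean)


def upper_tail_p(z: float) -> float:
    """One-sided p-value under a standard normal approximation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Toy example with four scored genes (hypothetical scores).
toy_scores = {"gene_a": 1.0, "gene_b": 2.0, "gene_c": 3.0, "gene_d": 4.0}
z_high = random_set_z(toy_scores, ["gene_c", "gene_d"])
```

Because the variance term includes the finite-population factor (G − m)/(G − 1), large sets drawn from a fixed pool of genes are not treated as if their members were sampled independently.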

Data deposition

The genotypic and phenotypic information from this study is deposited in dbGaP (www.ncbi.nlm.nih.gov/gap) under accession number phs000124.v2.p1.

Ethics statement

The Children's Hospital of Philadelphia Institutional Review Board approved this study.

Supporting Information

Figure S1. Power calculation of single SNP analysis of the low-risk neuroblastoma discovery set, adjusting for 500,000 tests. (TIF)

Figure S2. Power calculation of single SNP analysis of the low-risk neuroblastoma discovery set, adjusting for 15,885 tests. (TIF)

Figure S3. Power calculation of single SNP analysis of the low-risk neuroblastoma replication set, adjusting for 10 SNPs (average number of SNPs in a gene). (TIF)

Figure S4. Power calculation of single SNP analysis of the low-risk neuroblastoma replication set with no multiple testing adjustments. (TIF)

Figure S5. The correlation between average test statistic of genes and the number of SNPs on those genes. (TIF)

Figure S6. Single SNP association signals of 610 kilo-base region encompassing DDX4 and IL31RA: blue color indicates the SNPs mapped to these two genes respectively. (TIF)

Figure S7. Multi-dimensional scaling plot. Circled area denotes Caucasian cluster which includes HapMap CEU subjects as well as the cases and controls used in this study. (TIF)

Table S1. Summary of the most significant SNP pair interaction signals amongst the four genes associated with low-risk neuroblastoma. (DOC)

Table S2. Summary of clinical and genomic information of neuroblastoma cases. (DOC)
