Zihuai He, Bin Xu, Joseph Buxbaum, Iuliana Ionita-Laza.
Abstract
The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole-genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism-associated genes.
Year: 2019 PMID: 31289270 PMCID: PMC6616627 DOI: 10.1038/s41467-019-11023-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Overview of WGScan
Fig. 2 Family-wise error rate and power-simulation studies. The trait is dichotomous, and region size is 200 kb. Several candidate window sizes were considered for WGScan, namely 5 kb, 10 kb, 15 kb, 20 kb, 25 kb and 50 kb. The left panel presents the family-wise error rate comparison based on 10^5 replicates. M-Beta: method based on the Beta distribution. M-Spectral-90%/95%/99%: method based on spectral decomposition, where the leading eigenvalues account for 90%/95%/99% of the total variation (i.e., the sum of all eigenvalues). The right panel presents the power comparison based on 1000 replicates, with causal proportion 0.5%. WGScan: proposed test; sliding window: the SKAT, Burden or SKAT-O test is applied to scan the region continuously using a sliding window of 10 kb, 20 kb or both, adjusted by Bonferroni correction for the total number of windows tested. Source data are provided as a Source Data file
Fig. 3 Application to the Metabochip data. The top panel shows the results of WGScan (sliding-window analysis) with candidate window sizes 5 kb, 10 kb, 15 kb, 20 kb, 25 kb and 50 kb. Each dot corresponds to a window. The highlighted dots correspond to significant windows and the overlapping genes. The bottom panel shows the results from the gene-based analyses. Each dot corresponds to a gene. The highlighted dots represent significant genes. For each lipid trait, a significance threshold (3.75e−06; black dashed line) is estimated for the minimum p-value of original dispersion and burden tests, and all tests are weighted by GenoNet scores across 127 tissues (in total 256 tests per window). The red dashed line presents the Bonferroni threshold (1.66e−08). The gene-based significance threshold (1.88e−04) is calculated by Bonferroni adjustment based on 266 genes located in the 99 fine-mapping regions. The right panel presents the significant windows (and the overlapping genes) identified by WGScan. Source data are provided as a Source Data file
Fig. 4 Estimated significance thresholds for whole-genome sequencing studies (based on the Simons Simplex Collection data). Bars present estimated genome-wide significance thresholds (−log10 scale) for different tests. WGScan: WGScan with dispersion and burden tests (two tests per window). WGScan-I: 127 tissue-specific GenoNet scores are integrated, in addition to the original burden and dispersion tests (for a total of 256 tests per window). The significance threshold is estimated for the minimum p-value of all tests. The Bonferroni threshold is defined as 0.05 divided by the total number of tests. Source data are provided as a Source Data file
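The Bonferroni thresholds quoted in the captions of Figs. 3 and 4 follow from dividing 0.05 by the number of tests. The short Python sketch below reproduces the gene-based threshold from the 266 genes mentioned in the Fig. 3 caption. The function name `bonferroni_threshold` is illustrative and is not part of WGScan. The final calculation inverts the formula to see roughly how many tests the 1.66e−08 threshold would correspond to; that count is a back-of-envelope inference, not a figure reported in the captions.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Gene-based analysis (Fig. 3 caption): 266 genes in the 99 fine-mapping regions.
gene_threshold = bonferroni_threshold(266)
print(f"{gene_threshold:.2e}")  # 1.88e-04, matching the caption

# Inverting the formula for the window-based Bonferroni threshold (1.66e-08).
# This is an inferred quantity (roughly 3.0 million tests), not a reported one.
implied_tests = 0.05 / 1.66e-08
print(round(implied_tests))
```

The analytically estimated WGScan threshold in Fig. 3 (3.75e−06) is far less stringent than the Bonferroni value because overlapping windows and annotation-weighted tests are strongly correlated, which is the motivation the captions give for estimating the threshold from the minimum p-value instead.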
Enrichment of association signals in promoter regions
| Test | Pilot + Phase 1 | Phase 2 | Phase 3–1 | Phase 3–2 | Meta |
|---|---|---|---|---|---|
| Dispersion | 2.0E−05 | 1 | 4.0E−05 | 0.0260 | 5.9E−08 |
| Burden | 3.2E−05 | 0.9999 | 0.9866 | 0.5377 | 0.0050 |
The observed proportion of significant windows that overlap promoter regions is compared with the expected proportion, determined empirically from control windows. The analysis is based on 100,000 replicates
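The comparison described in this table note can be sketched as a resampling test. The Python example below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that each replicate draws as many control windows as there are significant windows (without replacement), and that the p-value uses a "+1" correction so it is never exactly zero. The function name and the boolean-list inputs are hypothetical.

```python
import random


def empirical_enrichment_pvalue(sig_overlap, control_overlap,
                                n_replicates=100_000, seed=0):
    """One-sided empirical p-value for enrichment of promoter overlap.

    sig_overlap:     list of bools, one per significant window
                     (True if the window overlaps a promoter).
    control_overlap: list of bools, one per control window.
    """
    rng = random.Random(seed)
    n_sig = len(sig_overlap)
    observed = sum(sig_overlap)  # observed count of overlapping windows

    exceed = 0
    for _ in range(n_replicates):
        # Assumption: match the number of significant windows in each replicate.
        draw = rng.sample(control_overlap, n_sig)
        if sum(draw) >= observed:
            exceed += 1

    # "+1" correction (an assumption) keeps the estimate strictly positive.
    return (1 + exceed) / (1 + n_replicates)


# Toy data: 10% of control windows overlap promoters.
control = [True] * 100 + [False] * 900
enriched = [True] * 20   # every significant window overlaps a promoter
depleted = [False] * 20  # no significant window overlaps a promoter

print(empirical_enrichment_pvalue(enriched, control, n_replicates=2000))
print(empirical_enrichment_pvalue(depleted, control, n_replicates=2000))
```

With a fixed number of replicates, the smallest attainable p-value is about one over the replicate count, which is consistent with the per-phase values in the table bottoming out near 1E−05. The much smaller Meta value presumably comes from combining phases rather than from a single resampling run.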
ToppFun enrichment results
| Category | Dispersion: Name | Dispersion: p-value | Dispersion: Bonferroni | Burden: Name | Burden: p-value | Burden: Bonferroni |
|---|---|---|---|---|---|---|
| Cellular component (GO) | Synapse | 6.36E−11 | 6.00E−08 | Synapse | 6.52E−23 | 9.13E−20 |
| | Neuron part | 3.12E−09 | 2.95E−06 | Neuron part | 4.81E−20 | 6.73E−17 |
| | Neuron projection | 1.27E−08 | 1.20E−05 | Neuron projection | 1.75E−19 | 2.45E−16 |
| | Synapse part | 3.19E−08 | 3.01E−05 | Synapse part | 1.68E−18 | 2.36E−15 |
| | Cell projection part | 5.47E−08 | 5.16E−05 | Cell projection part | 2.21E−17 | 3.09E−14 |
| Human disease (DisGeNETᵃ) | Substance-related disorders | 4.50E−13 | 4.05E−09 | Substance-related disorders | 4.48E−13 | 7.18E−09 |
| | Autism spectrum disorders | 2.37E−09 | 2.13E−05 | Autistic disorder | 2.85E−12 | 4.56E−08 |
| | Neuroblastoma | 4.92E−08 | 4.42E−04 | Schizophrenia | 3.95E−12 | 6.33E−08 |
| | Central neuroblastoma | 7.81E−08 | 7.02E−04 | Autism spectrum disorders | 1.61E−11 | 2.59E−07 |
| | Autistic disorder | 1.86E−07 | 1.67E−03 | Autosomal recessive predisposition | 4.11E−10 | 6.58E−06 |
ᵃ DisGeNET: The DisGeNET database integrates human gene-disease associations from various databases for a large number of Mendelian and complex diseases
Enrichment of association signals in 20 target gene sets
| Name | Dispersion: GeneSet | Dispersion: EnhancerSet | Dispersion: Enhanced GeneSet | Burden: GeneSet | Burden: EnhancerSet | Burden: Enhanced GeneSet |
|---|---|---|---|---|---|---|
| FMRP_targets_Darnell2011 | **5.0E−05** | **0.0016** | **2.2E−06** | 0.0057 | **2.6E−04** | **6.2E−04** |
| ASD_coexpression_networks_Willsey2013 | 0.4368 | 0.8772 | 0.3939 | **7.6E−05** | **0.0013** | **2.5E−05** |
| Constrained_PLIScoreOver0.9 | **2.2E−04** | 0.1319 | **1.3E−04** | **1.9E−04** | **7.0E−04** | **4.7E−05** |
| FXR2_wt_binding_sites | 0.0083 | 0.0315 | **0.0012** | **0.0023** | **9.9E−05** | **2.6E−04** |
| FMR1_iso1_wt_binding_sites | 0.0254 | 0.0225 | 0.0100 | 0.0133 | **0.0014** | **0.0021** |
| FMR1_iso7_I304N_binding_sites | 0.0485 | 0.0088 | 0.0162 | 0.0230 | **4.6E−04** | 0.0055 |
| FMR1_iso7_wt_binding_sites | 0.2504 | 0.1216 | 0.1942 | 0.0226 | 0.0082 | 0.0055 |
| Processed_Transcript_GencodeV19 | 0.0257 | — | 0.0257 | 0.0114 | — | 0.0114 |
| FMR1_iso1_I304N_binding_sites | 0.2328 | 0.0640 | 0.1521 | 0.1010 | **3.3E−04** | 0.0237 |
| regulatory_elements_neuro | 0.0269 | — | 0.0269 | 0.3980 | — | 0.3980 |
| FXR1_wt_binding_sites | 0.0611 | 0.1849 | 0.0841 | 0.1087 | 0.0381 | 0.1104 |
| BrainExpressed_Kang2011 | 0.6152 | 0.0030 | 0.2729 | 0.5306 | 0.0169 | 0.3241 |
| Protein_Coding_GencodeV19 | 0.7413 | **4.4E−04** | 0.3192 | 0.4560 | 0.0363 | 0.3496 |
| CHD8_targets_Cotney2015_Sugathan2014 | 0.3743 | 0.0080 | 0.3512 | 0.9345 | 0.2153 | 0.9074 |
| Antisense_GencodeV19 | 0.4770 | — | 0.4770 | 0.4183 | — | 0.4183 |
| lincRNA_GencodeV19 | 0.9507 | — | 0.9507 | 0.4613 | — | 0.4613 |
| ASD_risk_genes_TADA_FDR0.3 | 0.7783 | 0.1752 | 0.6973 | 0.4770 | 0.8620 | 0.4671 |
| PSD_Genes2Cognition | 0.9944 | 0.0165 | 0.8704 | 0.6989 | **4.6E−04** | 0.4879 |
| PseudoGencodeV19 | 1.0000 | — | 1.0000 | 0.5864 | — | 0.5864 |
| Developmental_delay_DDD | 0.7380 | 0.4580 | 0.6310 | 0.9987 | 0.4166 | 0.9908 |
For each gene set, we consider three related sets: the original gene set, enhancers predicted to interact with genes in the set, and an enhanced gene set containing both the genes and the predicted enhancers. The p-values from the enrichment analyses are reported; those smaller than the Bonferroni threshold (0.05/20 = 0.0025) are bolded
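The bolding rule in this note can be checked mechanically. The following Python sketch applies the 0.05/20 threshold to two rows of dispersion p-values copied from the table. The helper name `flag_significant` and the dictionary layout are illustrative choices, not part of the original analysis.

```python
ALPHA = 0.05
N_GENE_SETS = 20
THRESHOLD = ALPHA / N_GENE_SETS  # Bonferroni threshold stated in the table note


def flag_significant(pvalues, threshold=THRESHOLD):
    """Map each column label to True if its p-value is below the threshold."""
    return {label: p < threshold for label, p in pvalues.items()}


# Dispersion p-values for two gene sets, taken from the table.
fmrp_targets = {"GeneSet": 5.0e-05, "EnhancerSet": 0.0016, "Enhanced GeneSet": 2.2e-06}
brain_expressed = {"GeneSet": 0.6152, "EnhancerSet": 0.0030, "Enhanced GeneSet": 0.2729}

print(flag_significant(fmrp_targets))     # all three columns pass
print(flag_significant(brain_expressed))  # none pass; 0.0030 is just above 0.0025
```

The BrainExpressed_Kang2011 enhancer set shows why the strict comparison matters: its p-value of 0.0030 would pass a nominal 0.05 cutoff but not the threshold corrected for 20 gene sets.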