Maria V Fernández, John Budde, Jorge L Del-Aguila, Laura Ibañez, Yuetiva Deming, Oscar Harari, Joanne Norton, John C Morris, Alison M Goate, Carlos Cruchaga.
Abstract
Gene-based tests of the combined effect of rare variants on a phenotype have been widely developed for case-control studies, but their adaptation to family-based studies, especially studies of complex, incomplete families, has been slower. In this study, we performed a practical examination of the latest gene-based methods available for family-based study designs, using both simulated and real datasets. We examined the performance of several collapsing, variance-component, and transmission disequilibrium tests across eight software packages and 22 models in a cohort of 285 families (N = 1,235) with late-onset Alzheimer disease (LOAD). Based on this examination, we propose a methodological approach to identify, with high confidence, genes associated with the tested phenotype, and we provide recommendations for selecting the best software and model for family-based gene-based analyses. Additionally, in our dataset, we identified PTK2B, a GWAS candidate gene for sporadic AD, along with six novel genes (CHRD, CLCN2, HDLBP, CPAMD8, NLRP9, and MAS1L) as candidate genes for familial LOAD.
Keywords: Alzheimer's disease; clustering; family-based; gene-based; rare variants; transmission disequilibrium; variance-component; whole exome sequencing
Year: 2018 PMID: 29670507 PMCID: PMC5893779 DOI: 10.3389/fnins.2018.00209
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. Structure of families used in this study. Black diamonds represent cases and white diamonds represent controls. Y: genetic data available. N: no genetic data available.
Demographic data for the familial dataset employed in this study.
| Cases | 824 | 73 ± 7 | 48–99 | 63 | 73 |
| Controls | 411 | 83 ± 9 | 39–104 | 59 | 51 |
| Total | 1,235 | 77 ± 10 | 39–104 | 61 | 65 |
Age At Onset (AAO) for cases and Age at Last Assessment (ALA) for controls.
Number of samples for which whole genome sequencing (WGS) or whole exome sequencing (WES) was performed, with detail of the exon library kits employed in this study.
| WGS | 153 | |
| Agilent's SureSelect Human All Exon kits V3 | 0 | 28 |
| Agilent's SureSelect Human All Exon kits V5 | 0 | 665 |
| Roche VCRome | 0 | 389 |
| Total | 153 | 1,082 |
Figure 2. Schematic design of the analysis performed in this study.
Programs and models tested, classified by their main features and by the kinship matrix they use.
| EPACTS | X | X | X | X | ||||||
| RVGDT | X | |||||||||
| SKAT-v2 | X | X | X | X | ||||||
| GSKAT | X | X | X | |||||||
| FSKAT | X | X | ||||||||
| FarVAT-Adj | X | X | X | X | ||||||
| FarVAT-BLUP | X | X | X | X | ||||||
| PedGene | X | X | ||||||||
| RareIBD | X | |||||||||
Representation of the segregation pattern of the simulated gene.
| Fam1 | 0 | 0 | 0 | 0 | |
| Fam2 | 0 | 0 | 0 | 0 | |
| Fam3 | 0 | 0 | 0 | 0 | |
| Fam4 | 0 | 0 | 0 | 0 | |
| Fam5 | 0 | 0 | 0 | 0 | |
| Fam6 | 0 | 0 | 0 | 0 | |
| Fam7 | 0 | 0 | 0 | 0 | |
| Fam8 | 0 | 0 | 0 | 0 | |
| Fam9 | 0 | 0 | 0 | 0 | |
| Fam10 | 0 | 0 | 0 | 0 | |
| Fam11 | 0 | 0 | 0 | 0 | |
| Fam12 | 0 | 0 | 0 | 0 | |
| Fam13 | 0 | 0 | 0 | 0 | |
| Fam14 | 0 | 0 | 0 | 0 | |
| Fam15 | 0 | 0 | 0 | 0 | |
| Fam16 | 0 | 0 | 0 | 0 | |
| Fam17 | 0 | 0 | 0 | 0 | |
| Fam18 | 0 | 0 | 0 | 0 | |
| Fam19 | 0 | 0 | 0 | 0 | |
| Fam20 | 0 | 0 | 0 | 0 | |
| Fam21 | 0 | 0 | 0 | 0 | |
| Fam22 | 0 | 0 | 0 | 0 | |
| Fam23 | 0 | 0 | 0 | 0 | |
| Fam24 | 0 | 0 | 0 | 0 | |
| Fam25 | 0 | 0 | 0 | 0 | |
One (1) means that all cases within the family are carriers of the variant. Zero (0) means that the variant is not present in that family.
Gene-based p-values for the simulated dataset under different scenarios for the gene-based methods tested in the subset of 25 families.
| 5FC × 0FNC | 0.236 | NA | 0.141 | 0.004 | 0.301 | 0.003 | <1 × 10−5 | NA | 5.42 × 10−6 | 4.66 × 10−6 | NA | NA | NA | 3.93 × 10−9 | 3.06 × 10−9 | NA | NA | NA |
| 5FC × 5FNC | 0.235 | 0.124 | 0.023 | 0.002 | 0.123 | 7.99 × 10−4 | <1 × 10−5 | NA | 0.004 | 0.005 | NA | NA | NA | 2.10 × 10−5 | 4.00 × 10−5 | NA | NA | NA |
| 5FC × 10FNC | 0.354 | 0.338 | 0.112 | 0.005 | 0.079 | 7.99 × 10−4 | <1 × 10−5 | NA | 0.032 | 0.036 | NA | NA | NA | 7.71 × 10−4 | 1.01 × 10−3 | NA | NA | NA |
| 5FC × 15FNC | 0.377 | 0.359 | 0.202 | 0.005 | 0.095 | 0.002 | <1 × 10−5 | NA | 0.062 | 0.061 | NA | NA | NA | 0.002 | 2.84 × 10−3 | NA | NA | NA |
| 5FC × 20FNC | 0.377 | 0 | 0.201 | 0.006 | 0.114 | 0.003 | <1 × 10−5 | 0.321 | 0.073 | 0.075 | 0.670 | 0.075 | 0.134 | 0.002 | 2.40 × 10−3 | 0.132 | 0.002 | 0.005 |
| 10FC × 15FNC | 0.083 | 0 | 0.028 | 9 × 10−4 | 0.004 | 2.65 × 10−6 | <1 × 10−5 | 0.047 | 0.005 | 0.008 | 0.272 | 0.008 | 0.017 | 6.81 × 10−6 | 1.33 × 10−5 | 0.013 | 1.33 × 10−5 | 3.62 × 10−5 |
| 15FC × 10FNC | 0.014 | 0 | 0.005 | 9 × 10−4 | 0.001 | 1.77 × 10−9 | <1 × 10−5 | 0.051 | 1.72 × 10−6 | 6.31 × 10−5 | 0.024 | 6.31 × 10−5 | 1.30 × 10−4 | 4.26 × 10−11 | 3.27 × 10−9 | 0.001 | 3.27 × 10−9 | 8.93 × 10−9 |
| 20FC × 5FNC | 0.002 | 0 | 0.002 | 9 × 10−4 | 0.002 | 1.30 × 10−9 | <1 × 10−5 | 0.039 | 1.48 × 10−11 | 7.85 × 10−7 | 0.024 | 7.85 × 10−7 | 1.14 × 10−6 | 6.12 × 10−18 | 2.12 × 10−12 | 6.32 × 10−4 | 2.12 × 10−12 | 2.54 × 10−10 |
| 25FC × 0FNC | 3 × 10−4 | 0 | 0.001 | 9 × 10−4 | 0.001 | 1.42 × 10−10 | <1 × 10−5 | 0.033 | 1.55 × 10−19 | 4.44 × 10−8 | 0.025 | 4.44 × 10−8 | 7.06 × 10−8 | 4.59 × 10−29 | 4.58 × 10−15 | 5.10 × 10−4 | 4.58 × 10−15 | 2.54 × 10−10 |
Simulated scenarios are labeled nFC × mFNC, where n is the number of families carrying variants within the hypothetical gene (FC) and m is the number of families not carrying variants within the hypothetical gene (FNC). For example, 5FC × 10FNC denotes five carrier families and 10 non-carrier families. The scenarios tested were 5FC × 0FNC, 5FC × 5FNC, 5FC × 10FNC, 5FC × 15FNC, 5FC × 20FNC, 10FC × 15FNC, 15FC × 10FNC, 20FC × 5FNC, and 25FC × 0FNC; the first and last contain carrier families only.
We tested SKAT, CMC, and VT in EPACTS, but CMC and VT returned only NA values, so those results are not shown.
Gene-based p-values for the APOE gene under different gene-set scenarios for the gene-based methods tested in the entire dataset (N = 1235, 285 families).
| gene | 19 | 0.035 | 0.037 | 0.061 | 0.164 | 0.515 | 0.712 | 0.205 | 0.053 | 0.379 | 0.379 | 0.036 | 0.311 | 0.017 | 0.311 | 0.034 | |||
| HM-ε2ε4 | 4 | 0.412 | 0.414 | 0.359 | 0.020 | 0.420 | 0.420 | 0.275 | 0.275 | ||||||||||
| HM | 2 | 0.067 | 0.089 | 0.048 | 0.237 | 0.177 | 0.177 | 0.741 | 0.022 | 0.028 | 0.052 | 0.014 | 0.052 | 0.018 | 0.053 | 0.090 | 0.024 | 0.090 | 0.031 |
| ε2ε4 | 2 | 0.849 | 0.855 | 0.024 | |||||||||||||||
In the analysis, only nonsynonymous single-nucleotide variants (SNVs) with a MAF < 0.01, together with the APOE ε2 and ε4 variants, were considered, and we adjusted for sex and principal components. Significant p-values after multiple-test correction are highlighted in bold.
gene, set of 19 polymorphic variants within APOE gene, including APOE ε2 and ε4 variants; HM-ε2ε4, set of variants considered HIGH or MODERATE including APOE ε2 and ε4 variants; HM, set of variants considered HIGH or MODERATE without APOE ε2 and ε4 variants; ε2ε4, APOE ε2 and ε4 variants alone. N, number of variants that went into analysis.
We tested SKAT, CMC, and VT in EPACTS, but CMC and VT returned only NA values, so those results are not shown.
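The variant selection described in the table notes (rare nonsynonymous SNVs, with the APOE ε2 and ε4 variants always retained) can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Variant` record, its field names, the example variant IDs `var_a` to `var_d`, and all MAF values are hypothetical. The only outside fact used is that rs429358 and rs7412 are the two SNPs that define the APOE ε4 and ε2 alleles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotation record; field names are illustrative only."""
    vid: str
    maf: float
    is_snv: bool
    nonsynonymous: bool


# rs429358 and rs7412 are the SNPs that define the APOE e4 and e2 alleles.
APOE_ALLELE_SNPS = frozenset({"rs429358", "rs7412"})


def select_variants(variants, maf_cutoff=0.01, forced=APOE_ALLELE_SNPS):
    """Keep rare nonsynonymous SNVs, plus any variant whose ID is in `forced`."""
    kept = []
    for v in variants:
        rare_coding = v.is_snv and v.nonsynonymous and v.maf < maf_cutoff
        if rare_coding or v.vid in forced:
            kept.append(v)
    return kept


# Made-up example records (MAF values are not taken from the study).
example = [
    Variant("rs429358", 0.15, True, True),   # common, kept because it is forced
    Variant("var_a", 0.004, True, True),     # rare nonsynonymous SNV, kept
    Variant("var_b", 0.004, False, True),    # not an SNV, dropped
    Variant("var_c", 0.020, True, True),     # MAF above the cutoff, dropped
    Variant("var_d", 0.004, True, False),    # synonymous, dropped
]
selected_ids = [v.vid for v in select_variants(example)]
```

Keeping the forced set as a parameter makes it easy to reproduce the four gene-set scenarios in the table (with or without the ε2 and ε4 variants) by passing an empty set.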
Figure 3. Quantile-quantile (QQ) plots from different family-based gene-based methods for all nonsynonymous variants with a MAF < 1% in our family-based dataset. (A) Comparison of the SKAT test using different kinship matrices: pedigree calculation (PED), identity-by-state (IBS) estimation, Balding-Nichols (BN) estimation, and the kinship generated by EPACTS (HR). (B) Comparison of different variance-component gene-based methods: GSKAT, FSKAT, SKAT, EPACTS, FarVAT, and PedGene. (C) Comparison of different collapsing tests: GSKAT, EPACTS, FarVAT, and PedGene. (D) Comparison of transmission disequilibrium tests: RVGDT and RareIBD.
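For readers who want to reproduce a QQ plot of this kind from a list of gene-level p-values, the plotted points can be derived as follows. This is a generic sketch, not the authors' plotting code: the function name `qq_points` and the midpoint plotting position (rank − 0.5)/n are assumptions, and other plotting positions are also in common use.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs, least significant first.

    Expected values come from the uniform null distribution, using the
    midpoint plotting position (rank - 0.5) / n (an assumed convention).
    """
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(pvalues)  # smallest p-value first
    n = len(observed)
    points = []
    for rank, p in enumerate(observed, start=1):
        expected = (rank - 0.5) / n
        points.append((-math.log10(expected), -math.log10(p)))
    return points[::-1]  # reverse so the least significant point comes first


# Illustrative input only; these are not values from the study.
pts = qq_points([0.5, 0.05, 0.005, 0.9])
```

Points that rise above the diagonal (observed greater than expected) indicate inflation, and points below it indicate deflation.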
Top results for all gene-based methods tested.
| PedGene | SKAT | 2.42 × 10−12 | 3.533 | |
| PedGene | Burden | 1.04 × 10−8 | 2.997 | |
| GSKAT | Burden | 3.04 × 10−3 | 1.704 | |
| GSKAT | SKAT | 1.90 × 10−3 | 1.681 | |
| RareIBD | TDT | 1.00 × 10−4 | 1.450 | |
| FarVAT-BLUP | CALPHA | 4.60 × 10−7 | 1.259 | |
| FarVAT | CALPHA | 2.09 × 10−7 | 1.152 | |
| FarVAT-BLUP | CLP | 1.14 × 10−4 | 1.112 | |
| FarVAT-BLUP | SKATO | 7.37 × 10−7 | 1.101 | |
| FarVAT-BLUP | CMC | 1.28 × 10−4 | 1.066 | |
| FarVAT-BLUP | Burden | 1.14 × 10−4 | 1.031 | |
| FarVAT | SKATO | 3.54 × 10−7 | 1.016 | |
| FarVAT | CLP | 1.25 × 10−5 | 1.000 | |
| RVGDT | TDT | 9.99 × 10−4 | 0.995 | |
| FarVAT | CMC | 4.40 × 10−5 | 0.993 | |
| FarVAT | Burden | 1.25 × 10−5 | 0.985 | |
| EPACTS | VT | 1.20 × 10−4 | 0.954 | |
| FSKAT | SKAT | 2.00 × 10−5 | 0.938 | |
| EPACTS | CMC | 1.05 × 10−3 | 0.849 | |
| SKAT | IBS | 7.94 × 10−5 | 0.668 | |
| EPACTS | SKAT | 2.42 × 10−5 | 0.635 | |
| SKAT | PED | 2.47 × 10−4 | 0.360 | |
| SKAT | HR | 2.06 × 10−2 | 0.039 | |
| SKAT | BN | 2.21 × 10−2 | 0.038 |
The top gene, its p-value, and the lambda value are given for each test, ordered by decreasing lambda.
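The text here does not state how lambda was computed. Assuming it is the conventional genomic inflation factor (the median of the 1-degree-of-freedom chi-square statistics implied by the p-values, divided by the median of that distribution under the null), it could be calculated as in the sketch below. The function name `genomic_inflation` and the example inputs are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Genomic inflation factor under the conventional definition (assumed).

    A two-sided p-value p corresponds to a 1-df chi-square statistic
    z**2 with z = inv_cdf(p / 2); using the lower tail keeps very small
    p-values numerically stable.
    """
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in pvalues]
    return median(chi2) / CHI2_1DF_MEDIAN


# Illustrative inputs: an evenly spaced (null-like) grid and an inflated copy.
uniform_grid = [(i - 0.5) / 1001 for i in range(1, 1002)]
lambda_null = genomic_inflation(uniform_grid)
lambda_inflated = genomic_inflation([p ** 2 for p in uniform_grid])
```

Under this definition, values near 1 indicate well-calibrated p-values, values well above 1 indicate inflation, and values well below 1 indicate a conservative test.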
Figure 4. Correlation plots from different family-based gene-based methods for genes with a p ≤ 0.005. (A) Pearson correlation, computed on the genes' p-values. (B) Spearman correlation, computed on the genes' rankings.
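The distinction between the two panels is that Pearson correlation works on the p-values themselves, whereas Spearman correlation is the Pearson correlation of their ranks. A minimal standard-library sketch follows; the two p-value vectors are made up for illustration and are not results from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical p-values for the same four genes under two methods.
method_a = [1e-6, 1e-4, 0.003, 0.005]
method_b = [0.001, 0.002, 0.003, 0.004]
r_pearson = pearson(method_a, method_b)
r_spearman = spearman(method_a, method_b)
```

In this example the two methods rank the genes identically, so the Spearman coefficient is 1 even though the Pearson coefficient is lower, because the p-values are not linearly related.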
Most frequent genes within each p-value threshold category across the gene-based family-based methods tested.
| ≤5 × 10−7 | 3 | 0.007 | 0.031 | 2.42 × 10−5 | 1.50 × 10−5 | 0.013 | 0.013 | 0.990 | 7.94 × 10−5 | 0.007 | 0.007 | 0.007 | 0.004 | 0.004 | 0.004 | 7.37 × 10−7 | 0.071 | ||||
| ≤5 × 10−6 | 4 | 0.007 | 0.031 | 0.000 | 0.000 | 0.013 | 0.013 | 0.990 | 0.000 | 0.007 | 0.007 | 0.007 | 0.004 | 0.004 | 0.004 | 0.071 | |||||
| ≤5 × 10−5 | 5 | 0.007 | 0.031 | 0.013 | 0.013 | 0.990 | 0.000 | 0.007 | 0.007 | 0.007 | 0.004 | 0.004 | 0.004 | 0.071 | |||||||
| 4 | 0.018 | 0.043 | 2.33 × 10−4 | 2.07 × 10−4 | 0.002 | 0.020 | 1.000 | 7.30 × 10−4 | 0.006 | 0.005 | 0.005 | 0.011 | 0.009 | 0.009 | 0.299 | ||||||
| 3 | 0.002 | 0.003 | 0.057 | 0.019 | 0.187 | 0.187 | 0.998 | 0.042 | 4.65 × 10−4 | 4.27 × 10−4 | 0.001 | 1.32 × 10−4 | 1.32 × 10−4 | 0.015 | 2.73 × 10−4 | 0.685 | |||||
| 3 | 0.001 | 0.009 | 0.331 | 0.205 | 0.090 | 0.090 | 1.000 | 0.193 | 1.23 × 10−4 | 0.060 | 0.001 | 2.39 × 10−4 | 2.39 × 10−4 | 0.113 | 4.93 × 10−4 | 0.443 | |||||
| ≤5 × 10−4 | 8 | 0.002 | 0.003 | 0.652 | 0.178 | 0.155 | 0.191 | 9.99 × 10−4 | 0.572 | 0.309 | 0.268 | 6.00 × 10−4 | |||||||||
| 8 | 0.001 | 0.013 | 0.020 | 0.013 | 0.029 | 0.029 | 0.998 | 0.019 | 0.002 | 0.003 | 0.157 | ||||||||||
| 8 | 0.002 | 0.003 | 0.057 | 0.019 | 0.187 | 0.187 | 0.998 | 0.042 | 0.001 | 0.015 | 0.685 | ||||||||||
| 7 | 0.007 | 0.031 | 0.013 | 0.013 | 0.990 | 0.007 | 0.007 | 0.007 | 0.004 | 0.004 | 0.004 | 0.071 | |||||||||
| 7 | 0.001 | 0.009 | 0.331 | 0.205 | 0.090 | 0.090 | 1.000 | 0.193 | 0.060 | 0.001 | 0.113 | 0.443 | |||||||||
| 6 | 0.018 | 0.043 | 0.020 | 0.020 | 1.000 | 7.30 × 10−4 | 0.006 | 0.005 | 0.005 | 0.011 | 0.009 | 0.009 | 0.299 | ||||||||
| 5 | 0.002 | 0.024 | 0.009 | 0.001 | 0.031 | 0.032 | 0.996 | 0.002 | 0.021 | 0.028 | 0.028 | 0.068 | 0.046 | 0.428 | |||||||
Tests with a significant p-value for the given threshold category are highlighted in bold.
PedGene results are not included because this test produced inflated results and correlated poorly with the other gene-based methods.
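The per-threshold counts in a table of this kind can be obtained by counting, for each gene, how many tests report a p-value at or below each cutoff. The sketch below assumes that reading of the table; the function name, the test labels, and the p-values are hypothetical, and `None` stands for a test that returned no result.

```python
THRESHOLDS = (5e-7, 5e-6, 5e-5, 5e-4)


def count_supporting_tests(pvalues_by_test, thresholds=THRESHOLDS):
    """Map each threshold to the number of tests with p <= threshold.

    Tests with no result (None) are ignored. Counts are cumulative, so a
    test that passes a strict threshold also counts for every looser one.
    """
    return {
        t: sum(1 for p in pvalues_by_test.values() if p is not None and p <= t)
        for t in thresholds
    }


# Hypothetical p-values for one gene across five tests.
gene_pvalues = {
    "test_a": 3e-7,
    "test_b": 2e-5,
    "test_c": 4e-4,
    "test_d": 0.02,
    "test_e": None,  # test returned NA
}
support = count_supporting_tests(gene_pvalues)
```

Ranking genes by these counts within each threshold gives the "most frequent genes" view: a gene supported by many tests at a modest threshold can be prioritized even if no single test reaches the strictest cutoff.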
Figure 5. Gene network for the seven candidate genes (CHRD, CLCN2, CPAMD8, HDLBP, MAS1L, NLRP9, and PTK2B) with multiple lines of evidence at p ≤ 5 × 10−4, anchored with known AD genes (APP, PSEN1, PSEN2, APOE, TREM2, ADAM10, and PLD3), as described by GeneMANIA.