
Haplotype architecture of the Alzheimer's risk in the APOE region via co-skewness.

Alexander M Kulminski¹, Ian Philipp¹, Yury Loika¹, Liang He¹, Irina Culminskaya¹.

Abstract

INTRODUCTION: As a multifactorial polygenic disorder, Alzheimer's disease (AD) can be associated with complex haplotypes or compound genotypes.
METHODS: We examined associations of 4960 single nucleotide polymorphism (SNP) triples, comprising 32 SNPs from five genes in the apolipoprotein E gene (APOE) region, with AD in a sample of 2789 AD-affected and 16,334 unaffected subjects.
RESULTS: We identified 1127 AD-associated triples, comprising SNPs from all five genes, supporting a definitive role of complex haplotypes in predisposition to AD. These haplotypes may not include the APOE ε4 and ε2 alleles. For triples with rs429358 or rs7412, which encode these alleles, AD is characterized mainly by strengthening connections of the ε4 allele and weakening connections of the ε2 allele with the other alleles in this region.
DISCUSSION: Dissecting the heterogeneity attributed to AD-associated complex haplotypes in the APOE region will help target more homogeneous polygenic profiles of people at high risk of AD.


Keywords: APOE polymorphism; Alzheimer's disease; age‐related phenotypes; linkage disequilibrium

Year: 2020; PMID: 33204816; PMCID: PMC7656174; DOI: 10.1002/dad2.12129

Source DB: PubMed; Journal: Alzheimers Dement (Amst); ISSN: 2352-8729

BACKGROUND

Late‐onset Alzheimer's disease (AD), hereafter referred to as AD, is commonly considered a multifactorial polygenic disorder. Despite the relatively high heritability of AD (58% to 79%), estimated in the most extensive study to date using all twins aged 65 years and older in the Swedish Twin Registry, no single gene appears to be causative of AD. In contrast, early onset AD can be caused by highly penetrant mutations in the APP gene (chromosome 21) and two homologous genes, PSEN1 (chromosome 14) and PSEN2 (chromosome 1). The ε4 allele of the apolipoprotein E (APOE) gene ε2/ε3/ε4 polymorphism has been known for decades as the strongest single genetic risk factor for AD in various populations, whereas the ε2 allele can be protective. Studies also advocate shifting the “category of the APOE gene from ‘risk factor’ to ‘major gene.’” Nevertheless, even the role of the strongest genetic risk factor for AD remains controversial. Although most researchers believe that the APOE ε4 allele itself is a risk factor for AD, others argue that the association between this allele and AD can be modulated by variants from nearby genes in this region. For example, Roses et al. propose that variants of the nearby TOMM40 gene, such as the long poly‐thymine repeat polymorphism tagged by rs10524523, which is tightly correlated with the ε4 allele, can increase susceptibility to AD either independently or in cis combination with the ε4 allele. Franceschi et al. showed that a haplotype including the rs405509_T and ε4 alleles increases the risk of AD when both alleles are in cis position. Furthermore, a more complex haplotype in the APOE region may increase susceptibility to AD independently of the ε4 allele.

To better understand the complex role of genetic variants in AD pathogenesis, more comprehensive methods can be used. For example, Nielsen et al. reviewed and extended methods based on testing deviations from Hardy‐Weinberg equilibrium (HWE) in affected subjects to localize disease‐susceptibility loci. Zaykin et al. developed a method based on contrasting linkage‐disequilibrium (LD) patterns between affected and unaffected subjects to map patterns of single nucleotide polymorphisms (SNPs) to disease. Some studies attempted to assess differences in LD structures in the APOE region between AD‐affected and unaffected subjects qualitatively. We recently reported significant associations of both entire SNP patterns and specific SNP pairs in the APOE region with AD in populations of different ancestries. In this study, we use the standardized third mixed moment, or co‐skewness, to generalize LD between pairs of SNPs to triples of SNPs, and then use this metric to study the relationship of SNP triples in the APOE region (19q13.3) to AD. This region is represented by 32 SNPs from the BCAM, NECTIN2, TOMM40, APOE, and APOC1 genes, including rs429358 and rs7412, which code the ε2/ε3/ε4 polymorphism. We performed the analysis using a mega sample of 2789 AD‐affected and 16,334 unaffected subjects from four studies. Our findings include heterogeneous patterns of SNP triples associated with AD, led by the triple comprising rs2075650 (TOMM40), rs12721046 (APOC1), and the APOE ε4‐coding rs429358 SNP. Our results support definitive roles of complex haplotypes in predisposition to AD in the APOE region.

RESEARCH IN CONTEXT

Systematic review: A literature review (PubMed and Google Scholar) identified interest in complex haplotypes/genotypes predisposing to Alzheimer's disease (AD), particularly in the apolipoprotein E gene (APOE) region. To better understand the role of such variants in AD pathogenesis, more comprehensive methods can be used. The relevant publications are appropriately cited.

Interpretation: We leverage a new method to map triples of single nucleotide polymorphisms (SNPs) in the APOE region to AD. The analysis supports a definitive role in AD of heterogeneous AD‐related haplotypes, which include SNPs from five genes in the APOE region. AD is characterized mainly by strengthening connections of the ε4 allele and weakening connections of the ε2 allele with other SNPs in this region.

Future directions: This work presents an approach to examine associations of haplotypes/genotypes comprising triples of SNPs with diseases such as AD. Extension to a larger number of SNPs could reveal more homogeneous genetic profiles of AD risk and protection.

METHODS

Study cohorts and phenotypes

We used data from the Framingham Heart Study (FHS) original and offspring cohorts, the Cardiovascular Health Study (CHS), the Health and Retirement Study (HRS), and the National Institute on Aging Late‐Onset Alzheimer's Disease Family Study (LOADFS) for individuals of European ancestry. LOADFS and FHS released information on AD diagnosed according to the criteria of the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association. A diagnosis of AD in HRS and CHS was defined based on ICD‐9 331.0x codes in Medicare service use files. There were N = 2789 AD‐affected subjects (cases) and N = 16,334 AD‐unaffected subjects (non‐cases) (Table 1).

TABLE 1

Basic characteristics of the genotyped participants in the selected studies

Sample	N_total	AD cases (%)	Men (%)	Birth year mean (SD)	Age at the end of follow‐up mean (SD), years	Follow‐up through
LOADFS	3999	1973 (49.3)	1491 (37.3)	1928.3 (12.5)	76.7 (12.5)	2015
HRS	7226	263 (3.6)	3129 (43.3)	1934.2 (8.4)	79.1 (8.1)	2012
CHS	4273	247 (5.8)	1864 (43.6)	1914.1 (5.7)	83.5 (5.4)	2002
FHS	3625	306 (8.4)	1862 (51.4)	1931.6 (12.7)	76.2 (11.1)	2012

N_total is the total number of subjects in the analysis; AD cases, the number of Alzheimer's disease cases.

Abbreviations: SD, standard deviation; LOADFS, the National Institute on Aging Late‐Onset Alzheimer's Disease Family Study; HRS, Health and Retirement Study; CHS, Cardiovascular Health Study; FHS, Framingham Heart Study parental and offspring cohorts.

LOADFS has a case‐control design and accounts for a large proportion of the AD cases. The other studies have a longitudinal design, following participants for long periods of time. HRS is a population‐based study, whereas FHS and CHS are community‐based studies. All studies included both AD‐affected and unaffected subjects and could therefore be split by affection status for the comparative analyses of co‐skewness.


Genotypes

We used genotypes from genome‐wide and custom SNP arrays available for the selected studies, including the same customized Illumina iSelect array (the IBC‐chip, ≈50K SNPs) in the FHS and CHS cohorts, the Affymetrix 500K array in the FHS, the Illumina HumanCNV370v1 chip (370K SNPs) in the CHS, the Illumina HumanOmni 2.5 Quad chip (≈2.5M SNPs) in the HRS, and the Illumina Human 610Quadv1_B Beadchip (≈610K SNPs) in the LOADFS. As in previous studies, the analyses focused on the same 32 SNPs representing the BCAM‐NECTIN2‐TOMM40‐APOE‐APOC1 (19q13.3) region (Table S1). We selected SNPs that were available from common GWAS arrays, were genotyped directly in at least two cohorts, and were not in strong LD in the mega sample of all studies (r < 0.8). We excluded subjects with missingness > 5%. Genotypes in these cohorts were phased and imputed using the Michigan Imputation Server with the Haplotype Reference Consortium (HRC) reference panel (version r1.1 2016). Only SNPs with high imputation quality were selected for the analysis (Table S1).
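The LD criterion above (r < 0.8) can be illustrated with a minimal Python sketch. It assumes genotype dosages coded 0/1/2 per subject and uses the Pearson correlation of dosages as a stand‐in for the LD coefficient r; the study may have computed LD from phased haplotypes instead, and the function names and toy SNP IDs are hypothetical.

```python
import math
from itertools import combinations


def dosage_r(x, y):
    """Pearson correlation between two 0/1/2 genotype dosage vectors.

    Used here as a proxy for the LD coefficient r (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def strong_ld_pairs(dosages, threshold=0.8):
    """List SNP pairs whose |r| reaches the threshold (candidates to drop)."""
    flagged = []
    for a, b in combinations(sorted(dosages), 2):
        r = dosage_r(dosages[a], dosages[b])
        if abs(r) >= threshold:  # NaN compares False, so it is never flagged
            flagged.append((a, b, round(r, 3)))
    return flagged


# Toy dosages for three hypothetical SNPs in six subjects.
toy = {
    "snpA": [0, 1, 2, 0, 1, 2],
    "snpB": [0, 1, 2, 0, 1, 2],  # identical to snpA, so r = 1
    "snpC": [0, 0, 1, 1, 2, 2],  # r = 0.5 with snpA and snpB
}
print(strong_ld_pairs(toy))  # [('snpA', 'snpB', 1.0)]
```

Using |r| rather than r treats strong negative correlation as strong LD as well, since the sign depends only on which allele is counted.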

Co‐skewness metrics

We used the standardized third mixed moment, or co‐skewness, to generalize LD between pairs of SNPs to triples of SNPs (see Note S1). We then used this metric to study the associations of SNP triples in the APOE region with AD by computing the difference in co‐skewness between AD‐affected and unaffected subjects. We derived two metrics, based on genotype ($G_1^{g}$) and haplotype ($G_1^{h}$) counts. Haplotypes were inferred under the assumption of HWE using the EM algorithm (R package haplo.stats). HWE was interpreted in the broad sense that the gametes are probabilistically independent. Therefore, as in the case of pairwise LD, the difference between $G_1^{h}$ and $G_1^{g}$, that is, $G_1^{h} - G_1^{g}$, characterizes deviation from HWE at the haplotype level, which may otherwise be difficult to detect. Because all SNPs were selected to be in HWE in the combined sample of cases and non‐cases, this deviation is unlikely to be an artifact and warrants biological interpretation. The haplotype‐based co‐skewness also admits a convenient decomposition into two fractions, $G_1^{h} = G_1^{h,m} + G_1^{h,pw}$, measuring (1) how far the triple is from mutual independence ($G_1^{h,m}$) and (2) a weighted sum of the pairwise LD values ($G_1^{h,pw}$). Co‐skewness is interpreted as the joint deviation of the distributions of random variables from the normal distribution; that is, larger values of $G_1^{g}$ or $G_1^{h}$ imply stronger connections between the SNPs in a triple (Figure 1). This is consistent with the interpretation of the pairwise LD coefficient. As with pairwise LD, the $G_1^{g}$ and $G_1^{h}$ coefficients are individually invariant under a change of sign, whereas their signs must be used consistently in comparative analyses, for example, when evaluating the difference $\Delta G_1 = G_{1,\mathrm{case}} - G_{1,\mathrm{nc}}$.
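To make the genotype‐based metric concrete, the following Python sketch computes the standardized third mixed moment of three genotype dosage vectors and the case/non‐case difference. It is an illustration under stated assumptions, not the study's estimator: the exact definitions (including normalization) are given in Note S1, the haplotype‐based metric, which requires EM‐inferred haplotypes (eg, via haplo.stats), is not reproduced, and the function names are hypothetical.

```python
import math


def coskewness(x, y, z):
    """Standardized third mixed moment E[(X-mx)(Y-my)(Z-mz)] / (sx*sy*sz).

    Inputs are per-subject genotype dosages (0/1/2). Population (1/n)
    moments are used; returns None if any SNP is monomorphic, because the
    standardization is then undefined."""
    n = len(x)
    if len(y) != n or len(z) != n:
        raise ValueError("dosage vectors must have equal length")
    cols = (x, y, z)
    means = [sum(v) / n for v in cols]
    sds = [math.sqrt(sum((a - m) ** 2 for a in v) / n) for v, m in zip(cols, means)]
    if min(sds) == 0:
        return None
    m3 = sum((a - means[0]) * (b - means[1]) * (c - means[2])
             for a, b, c in zip(x, y, z)) / n
    return m3 / (sds[0] * sds[1] * sds[2])


def delta_coskewness(case_cols, noncase_cols):
    """Effect for one triple: co-skewness in cases minus in non-cases."""
    return coskewness(*case_cols) - coskewness(*noncase_cols)


# With the same SNP used three times, co-skewness reduces to ordinary skewness.
x = [0, 0, 0, 1]
print(round(coskewness(x, x, x), 4))  # 1.1547
```

Recoding one SNP by counting the other allele (dosage 2 − d) negates its centered values and therefore flips the sign of the co‐skewness, which is why allele coding must be kept the same in cases and non‐cases before taking the difference.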

FIGURE 1

Illustration of co‐skewness in the AD‐affected and unaffected subjects. The diagram shows the joint distributions of the compound genotypes for the SNPs rs2075650, rs12721046, and rs429358 as proportions. The left (blue) column shows the proportions of compound genotypes among Alzheimer's disease (AD) non‐cases, and the right (red) column shows the proportions among AD cases. Larger dots indicate larger proportions. Rows track the number of rs429358 minor alleles coding the ε4 allele. The distribution for cases is considerably more clustered along the diagonal (0,0,0), (1,1,1), (2,2,2) than that for non‐cases. This visually confirms our result that the mutual term increases substantially from non‐cases to cases for this triple. Thus, the mutual term can be thought of as a direct generalization of linkage disequilibrium, and co‐skewness as a more subtle metric because it contains additional pairwise information.

A permutation test was used to compute the significance of the effects defined by the differences in co‐skewness between the AD‐affected and unaffected subjects. For a fixed triple of SNPs, we resampled the entire set of subjects into two groups with the same sizes as the affected and unaffected groups to estimate the permutation distribution of χ² = (g₁ − g₀)², where g₁ and g₀ are the co‐skewness estimates for the resampled groups. Using quantile‐quantile plots and Shapiro‐Wilk tests, we checked that the resulting permutation distribution of g₁ − g₀ is approximately normal and, therefore, that the permutation distribution of χ² indeed follows a (scaled) chi‐squared distribution. We then calculated z = (ΔG₁ − m)/s, where m and s are the sample mean and standard deviation of the permutation distribution of g₁ − g₀, respectively, and ΔG₁ is the difference in co‐skewness between the original (non‐permuted) affected and unaffected groups. Finally, we compared z² with a chi‐squared distribution with one degree of freedom to obtain P‐values.
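The permutation procedure described above can be sketched in Python as follows. Assumptions: a compact genotype‐based co‐skewness serves as the statistic (the study also used a haplotype‐based one), the number of permutations and the seed are arbitrary, and the function names are hypothetical. The P‐value uses the identity P(χ²₁ > z²) = erfc(|z|/√2).

```python
import math
import random
import statistics


def coskewness(x, y, z):
    """Standardized third mixed moment of three dosage vectors (1/n moments)."""
    n = len(x)
    cols = (x, y, z)
    means = [sum(v) / n for v in cols]
    sds = [math.sqrt(sum((a - m) ** 2 for a in v) / n) for v, m in zip(cols, means)]
    if min(sds) == 0:
        return float("nan")  # undefined for a monomorphic SNP
    m3 = sum((a - means[0]) * (b - means[1]) * (c - means[2])
             for a, b, c in zip(x, y, z)) / n
    return m3 / (sds[0] * sds[1] * sds[2])


def group_difference(cols, labels):
    """Co-skewness in the labeled-True group minus the labeled-False group."""
    g1 = [[v for v, lab in zip(col, labels) if lab] for col in cols]
    g0 = [[v for v, lab in zip(col, labels) if not lab] for col in cols]
    return coskewness(*g1) - coskewness(*g0)


def permutation_test(cols, is_case, n_perm=1000, seed=1):
    """Return (z, P) for the observed case/non-case co-skewness difference."""
    observed = group_difference(cols, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    diffs = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # keeps the case and non-case group sizes fixed
        d = group_difference(cols, labels)
        if not math.isnan(d):  # skip a rare permutation with a monomorphic group
            diffs.append(d)
    m, s = statistics.mean(diffs), statistics.stdev(diffs)
    z = (observed - m) / s
    return z, math.erfc(abs(z) / math.sqrt(2))  # P(chi2 with 1 df > z**2)


# Toy example: one 0/1 column reused for all three SNPs; 10 of 40 cases
# carry the allele versus 30 of 40 non-cases.
snp = [1] * 10 + [0] * 30 + [1] * 30 + [0] * 10
cases = [True] * 40 + [False] * 40
z, p = permutation_test((snp, snp, snp), cases)
print(round(z, 2), p)
```

Standardizing by the permutation mean and standard deviation before squaring is what makes the comparison with a one‐degree‐of‐freedom chi‐squared distribution appropriate.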

Analysis

Co‐skewness was first evaluated in LOADFS and in the mega sample of non‐LOADFS studies (created by pooling the FHS, CHS, and HRS data sets) to examine the consistency of effect directions in independent samples, which is widely regarded as replication. We then used the mega sample of all studies combined to increase statistical power. This study did not examine the roles of sex and age because we previously showed that their effects were negligible for the same SNPs.
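The replication criterion, agreement in effect direction between LOADFS and the non‐LOADFS mega sample, amounts to a sign comparison. A minimal Python sketch, assuming effects are stored in dictionaries keyed by SNP triple (a hypothetical data layout; the triple IDs and values below are toy numbers, not study results):

```python
def direction_concordance(effects_a, effects_b):
    """Count triples whose effects share the same nonzero sign in both samples.

    Returns (number concordant, number of triples present in both samples)."""
    shared = effects_a.keys() & effects_b.keys()
    concordant = sum(1 for t in shared if effects_a[t] * effects_b[t] > 0)
    return concordant, len(shared)


# Toy effects (case minus non-case co-skewness) for hypothetical triples.
loadfs = {("snp1", "snp2", "snp3"): -0.20, ("snp1", "snp2", "snp4"): 0.10}
non_loadfs = {("snp1", "snp2", "snp3"): -0.05, ("snp1", "snp2", "snp4"): -0.30}
print(direction_concordance(loadfs, non_loadfs))  # (1, 2)
```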

RESULTS

Co‐skewness for triples of SNPs

We evaluated LD between triples of SNPs using the standardized third mixed moment, or co‐skewness (see Methods). For the 32 SNPs representing the BCAM‐NECTIN2‐TOMM40‐APOE‐APOC1 (19q13.3) region (see Methods and Table S1), there were 4960 triples. We examined associations of these triples with AD by evaluating differences in co‐skewness between the AD‐affected (cases, N = 2789) and unaffected (non‐cases, N = 16,334) subjects using data from four studies (Table 1). To perform the analysis, we used the genotype‐based ($G_1^{g}$) and haplotype‐based ($G_1^{h}$) co‐skewness metrics. The genotype‐based method identified 51 SNP triples associated with AD at the Bonferroni‐adjusted locus‐wide significance level P < 10⁻⁵ ≈ 0.05/4960 in LOADFS (Table S2). For most of these triples, 47 of 51 (92.2%), the effect direction, that is, the sign of the difference $\Delta G_1^{g} = G_{1,\mathrm{case}}^{g} - G_{1,\mathrm{nc}}^{g}$ in co‐skewness between the AD‐affected ($G_{1,\mathrm{case}}^{g}$) and unaffected ($G_{1,\mathrm{nc}}^{g}$) subjects, was replicated in the non‐LOADFS mega sample. The analysis of the mega sample of all studies revealed 302 significant associations at P < 10⁻⁵; for 269 of them (89.1%), the effect directions were consistent in LOADFS and non‐LOADFS. The haplotype‐based method identified significant effects at P < 10⁻⁵ for 313 triples in LOADFS, and the effect directions were replicated for 277 (88.5%) of them in the non‐LOADFS studies (Table S3). The analysis of the mega sample of all studies combined identified 1127 significant differences, of which 999 had consistent effect directions in LOADFS and non‐LOADFS. The haplotype‐based estimates of these associations with AD were consistent with those from the genotype‐based method for 989 of the 999 triples. The largest difference in the magnitude of co‐skewness was observed for the triple of rs2075650 (TOMM40), rs12721046 (APOC1), and rs429358 (the minor allele of which encodes the APOE ε4 allele), $\Delta G_1^{h} = -0.698$, P = 7.38×10⁻¹⁷⁸.
The effect for this triple was much more extreme than those for the other triples; the next largest effect, for the rs17561351 (NECTIN2), rs4081918 (NECTIN2), and rs405509 (APOE) triple, was 2.5‐fold smaller ($\Delta G_1^{h} = -0.278$, P = 1.59×10⁻¹⁵; Table 2, R_ΔG). The effect for the top triple was also the most significant, followed by that for the rs157580 (TOMM40), rs440446 (APOE), and rs429358 (APOE) triple ($\Delta G_1^{h} = 0.199$, P = 1.24×10⁻¹⁶⁷; Table 2, R_p‐val). Notably, the 20 most significant triples included only SNPs from the TOMM40‐APOE‐APOC1 locus.
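The counts and thresholds quoted above follow from simple arithmetic, reproduced here in Python as a check: the number of unordered triples of 32 SNPs, the Bonferroni‐adjusted threshold, the replication percentages, and the ratio of the two largest effects.

```python
import math

n_triples = math.comb(32, 3)   # unordered triples of 32 SNPs
alpha = 0.05 / n_triples       # Bonferroni-adjusted locus-wide threshold

# (replicated, tested) counts reported in the text
replication = {
    "genotype-based, LOADFS": (47, 51),
    "genotype-based, all studies": (269, 302),
    "haplotype-based, LOADFS": (277, 313),
}

print(n_triples, f"{alpha:.3g}")  # 4960 1.01e-05
for label, (k, n) in replication.items():
    print(f"{label}: {k}/{n} = {100 * k / n:.1f}%")

# Ratio of the two largest |effects| in Table 2 (0.698 vs 0.278).
print(round(0.698 / 0.278, 1))  # 2.5
```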

TABLE 2

Top differences in co‐skewness between Alzheimer's disease (AD) affected and unaffected subjects ranked using two metrics

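ID	R_p‐val	R_ΔG	SNP1	SNP2	SNP3	E4	E2	$G_{1,\mathrm{case}}^{h}$	$G_{1,\mathrm{nc}}^{h}$	$\Delta G_1^{h}$	P‐value	$G_{1,\mathrm{case}}^{h,m}$	$G_{1,\mathrm{nc}}^{h,m}$	$G_{1,\mathrm{case}}^{h,pw}$	$G_{1,\mathrm{nc}}^{h,pw}$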
4922	1	1	rs2075650	rs429358	rs12721046	e4	no	0.414	1.112	–0.698	7.38E‐178	0.439	0.276	–0.025	0.836
4895	2	12	rs157580	rs440446	rs429358	e4	no	0.258	0.058	0.199	1.24E‐167	–0.551	–0.267	0.809	0.325
4952	3		rs440446	rs429358	rs439401	e4	no	0.222	0.044	0.178	3.70E‐148	0.210	–0.285	0.012	0.329
4956	4		rs440446	rs439401	rs12721046	no	no	0.195	0.043	0.151	2.00E‐127	–0.511	–0.290	0.706	0.333
4921	5	18	rs2075650	rs429358	rs439401	e4	no	0.155	0.347	–0.192	5.05E‐124	–0.501	–0.552	0.656	0.899
4898	6		rs157580	rs440446	rs12721046	no	no	0.229	0.058	0.171	2.32E‐123	–0.494	–0.270	0.724	0.328
4900	7		rs157580	rs429358	rs439401	e4	no	–0.155	0.006	–0.161	4.35E‐119	–0.212	–0.174	0.057	0.179
4937	8	7	rs8106922	rs429358	rs12721046	e4	no	0.174	0.383	–0.209	8.96E‐115	0.251	0.086	–0.078	0.297
4907	9	5	rs2075650	rs8106922	rs429358	e4	no	0.118	0.336	–0.218	3.88E‐105	0.249	0.085	–0.132	0.251
4932	10		rs8106922	rs440446	rs7412	no	e2	0.106	0.249	–0.143	1.01E‐104	0.130	0.193	–0.024	0.057
4880	11	6	rs157580	rs2075650	rs429358	e4	no	0.108	0.325	–0.217	2.87E‐102	–0.382	–0.557	0.490	0.882
4904	12		rs157580	rs439401	rs12721046	no	no	–0.139	0.008	–0.146	2.87E‐98	0.509	0.275	–0.648	–0.267
4947	13	19	rs405509	rs429358	rs12721046	e4	no	0.334	0.525	–0.191	3.08E‐93	0.326	0.119	0.008	0.406
4920	14		rs2075650	rs429358	rs7412	e4	e2	0.031	0.120	–0.088	1.42E‐89	–0.159	–0.213	0.190	0.332
4959	15	11	rs429358	rs439401	rs12721046	e4	no	0.116	0.317	–0.201	2.85E‐89	0.243	0.088	–0.127	0.229
4916	16		rs2075650	rs440446	rs429358	e4	no	0.161	0.347	–0.187	2.99E‐88	0.233	0.084	–0.072	0.264
4953	17	13	rs440446	rs429358	rs12721046	e4	no	0.105	0.303	–0.198	4.15E‐76	–0.479	–0.523	0.585	0.826
4901	18	8	rs157580	rs429358	rs12721046	e4	no	0.108	0.315	–0.207	4.61E‐75	–0.476	–0.547	0.583	0.862
4958	19		rs429358	rs7412	rs12721046	e4	e2	0.036	0.121	–0.085	1.31E‐69	–0.126	–0.205	0.162	0.326
4938	20		rs8106922	rs7412	rs439401	no	e2	0.084	0.199	–0.115	6.53E‐69	0.135	–0.132	–0.051	0.331
4019		2	rs17561351	rs4081918	rs405509	no	no	0.102	0.380	–0.278	1.59E‐15	0.680	0.542	–0.579	–0.162
4015		3	rs17561351	rs4081918	rs283813	no	no	0.270	0.014	0.255	1.27E‐09	0.182	0.162	0.087	–0.147
4897		4	rs157580	rs440446	rs439401	no	no	0.457	0.222	0.235	6.72E‐53	–1.114	–0.865	1.571	1.087
1361		9	rs10402271	rs4803763	rs3852856	no	no	0.151	0.354	–0.203	1.82E‐30	–1.231	–1.152	1.382	1.506
4422		10	rs519113	rs387976	rs405509	no	no	0.036	0.239	–0.203	1.74E‐26	0.494	0.413	–0.458	–0.174
4012		14	rs17561351	rs4081918	rs11667640	no	no	1.397	1.595	–0.198	9.75E‐04	0.145	0.155	1.252	1.440
4912		15	rs2075650	rs405509	rs429358	e4	no	0.271	0.467	–0.196	2.06E‐61	0.309	0.107	–0.038	0.360
4771		16	rs6859	rs2075650	rs429358	e4	no	0.172	0.367	–0.195	1.18E‐23	–0.600	–0.805	0.772	1.172
4915		17	rs2075650	rs405509	rs12721046	no	no	0.268	0.461	–0.193	3.76E‐62	0.288	0.114	–0.020	0.347
2657		20	rs4803763	rs429358	rs12721046	e4	no	0.110	0.301	–0.191	4.71E‐16	0.406	0.292	–0.295	0.009

The data are from the pooled sample of the National Institute on Aging Late‐Onset Alzheimer's Disease Family Study, the Health and Retirement Study, the Cardiovascular Health Study, and the Framingham Heart Study parental and offspring cohorts.

ID in this column corresponds to IDs in Table S3.

R_p‐val and R_ΔG show rankings based on P‐value and effect size, respectively. Blank cells indicate ranks outside the top 20 triples for each metric.

E4 or E2 indicates presence of the APOE ε4‐coding rs429358 SNP or ε2‐coding rs7412 SNP in the triple.

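$\Delta G_1^{h} = G_{1,\mathrm{case}}^{h} - G_{1,\mathrm{nc}}^{h}$ is the effect, defined as the difference in co‐skewness between the AD‐affected ($G_{1,\mathrm{case}}^{h}$) and unaffected ($G_{1,\mathrm{nc}}^{h}$) subjects using the haplotype‐based method.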

$G_1^{h,m}$ and $G_1^{h,pw}$ denote the partitions of the $G_{1,\mathrm{case}}^{h}$ and $G_{1,\mathrm{nc}}^{h}$ metrics into two fractions: mutual independence ($m$) and a weighted sum of the pairwise linkage disequilibrium values ($pw$) (see Methods).

Italic shows SNPs from the NECTIN2 gene. Underlining denotes BCAM SNP. The other SNPs are in the TOMM40‐APOE‐APOC1 locus (see more details in Table S3).

The top 30 differences in co‐skewness between AD‐affected and unaffected subjects, ranked using the $\Delta G_1^{h}$‐ and P‐value‐based metrics, included 18 triples with the rs429358 SNP and four triples with rs7412 (the minor allele of which encodes the APOE ε2 allele) (Table 2). Two of these four triples included both rs429358 and rs7412, together with either rs2075650 (TOMM40) or rs12721046 (APOC1). LD between rs2075650 and rs12721046 was moderate (r² = 0.48). Co‐skewness for the top triple of rs2075650, rs12721046, and rs429358 appears to be smaller in cases ($G_{1,\mathrm{case}}^{h} = 0.414$) than in non‐cases ($G_{1,\mathrm{nc}}^{h} = 1.112$), which would suggest weaker connections between the SNPs in this triple in AD‐affected than in AD‐unaffected subjects.
However, partitioning these haplotype‐based metrics into the two fractions of mutual independence ($G_1^{h,m}$) and a weighted sum of the pairwise LD values ($G_1^{h,pw}$) (see Methods) shows that the $G_1^{h,m}$ component increases (from 0.276 in non‐cases to 0.439 in cases), whereas the $G_1^{h,pw}$ component decreases (from 0.836 to −0.025) (Table 2). This change in co‐skewness between AD‐affected and unaffected subjects is consistent with stronger connections among all three SNPs in cases (Figure 1).
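As a worked check of this decomposition, the Table 2 values for the top triple (rs2075650, rs429358, rs12721046) add up as follows, using $G_1^{h} = G_1^{h,m} + G_1^{h,pw}$ in each group:

$$
\begin{aligned}
G_{1,\mathrm{case}}^{h} &= 0.439 + (-0.025) = 0.414, \qquad
G_{1,\mathrm{nc}}^{h} = 0.276 + 0.836 = 1.112,\\
\Delta G_1^{h} &= \underbrace{(0.439 - 0.276)}_{\Delta G_1^{h,m} = +0.163}
 + \underbrace{(-0.025 - 0.836)}_{\Delta G_1^{h,pw} = -0.861} = -0.698 .
\end{aligned}
$$

The negative total effect is thus the net of a gain in the mutual term and a larger loss in the pairwise term.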

Co‐skewness for triples harboring the ε2‐ or ε4‐coding SNP

We further characterized differences in co‐skewness for the 465 triples that included rs429358 and the 465 triples that included rs7412. The difference in co‐skewness attained locus‐wide significance (P < 10⁻⁵) for 116 triples in the rs429358‐tailored set and for 105 triples in the rs7412‐tailored set (Figure 2A and Table S3). These triples included SNPs from all selected genes in this region. We found seven such triples that included both rs429358 and rs7412 together with one of the following SNPs: rs4803763, rs440277, rs6859, rs283813 (all four from NECTIN2), rs2075650 (TOMM40), rs405509 (APOE), and rs12721046 (APOC1).

FIGURE 2

Heat maps for triples harboring the ε4‐coding rs429358 (upper‐left triangle) or the ε2‐coding rs7412 (lower‐right triangle) in the samples with (A) no exclusions and (B) exclusion of carriers of either the ε2 (upper‐left) or ε4 (lower‐right) allele. Numbers show the effect, defined as the difference in co‐skewness between the AD‐affected and unaffected subjects, multiplied by 10³. Red (blue) shows a positive (negative) difference, indicating strengthening (weakening) connections between the SNPs in the triple. Purple shows triples for which the effect directions are of opposite signs in AD cases and non‐cases. Color shades denote significance levels, as shown in the upper inset. No color indicates P > 0.05. Orange indicates undefined values, either because rs429358 or rs7412 would occur twice in the same triple (A and B) or because rs429358 or rs7412 is constant in the stratum (B).

Co‐skewness for triples in samples excluding carriers of either the ε4 or the ε2 allele

Next, we examined the roles of the ε4 and ε2 alleles in the associations of co‐skewness differences with AD identified in the previous subsection. In the sample with no carriers of the ε4 allele (ie, excluding subjects carrying a minor allele of rs429358), there were five significant associations at P < 10⁻⁵ and 529 at 10⁻⁵ ≤ P < .05 (Table S4). In contrast, in the sample with no carriers of the ε2 allele (ie, excluding subjects carrying a minor allele of rs7412), there were 899 significant associations at P < 10⁻⁵ and 1474 at 10⁻⁵ ≤ P < .05 (Table S4). This difference is partly due to the 2.5‐fold smaller number of AD cases in the sample without ε4 carriers (up to N_case = 1035) than in the sample without ε2 carriers (up to N_case = 2578). To examine potential connections of the ε2 allele with alleles of other SNPs in the triples, we focused on all 435 triples with rs7412 in the sample with no carriers of the ε4 allele. The heat maps in Figure 2 (lower‐right triangle) show that the significance of the associations decreased substantially for most triples in this sample, owing to reduced sample size and/or effect size (Table S4). For two triples, the co‐skewness difference was associated with AD at P < 10⁻⁵; both included rs7412 and rs8106922 (TOMM40), together with either rs440446 (APOE) or rs405509 (APOE). LD between rs440446 and rs405509 was modest (r² = 0.62). For 85 triples, the association with AD attained significance at 10⁻⁵ < P < 0.05. The ε4 allele was more extensively involved in these associations with AD than the ε2 allele: we observed 95 of 435 associations at P < 10⁻⁵ in the sample excluding ε2 carriers (Figure 2, upper‐left triangles, and Table S4). Figure 2 shows prevailing patterns of decreased co‐skewness in ε2‐bearing triples and increased co‐skewness in ε4‐bearing triples in AD‐affected subjects.
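The stratified analysis, and the undefined (orange) cells in Figure 2B, can be illustrated with a short Python sketch. Assumptions: subjects are dictionaries of per‐SNP minor‐allele dosages, a carrier is anyone with dosage > 0, and the 30 non‐APOE SNP names are placeholders; none of these names come from the study's code. Excluding ε4 carriers leaves rs429358 constant at 0, so any triple containing it has no defined co‐skewness, which leaves C(30, 2) = 435 of the C(31, 2) = 465 rs7412 triples.

```python
import math
from itertools import combinations


def coskewness(x, y, z):
    """Standardized third mixed moment; None if any SNP is monomorphic."""
    n = len(x)
    cols = (x, y, z)
    means = [sum(v) / n for v in cols]
    sds = [math.sqrt(sum((a - m) ** 2 for a in v) / n) for v, m in zip(cols, means)]
    if min(sds) == 0:
        return None
    m3 = sum((a - means[0]) * (b - means[1]) * (c - means[2])
             for a, b, c in zip(x, y, z)) / n
    return m3 / (sds[0] * sds[1] * sds[2])


def exclude_carriers(subjects, snp):
    """Keep only subjects with zero copies of the minor allele at `snp`."""
    return [s for s in subjects if s[snp] == 0]


def triple_coskewness(subjects, triple):
    """Co-skewness of one SNP triple within a (possibly filtered) sample."""
    return coskewness(*([s[snp] for s in subjects] for snp in triple))


# Hypothetical 32-SNP panel: the two APOE-coding SNPs plus 30 placeholders.
panel = ["rs429358", "rs7412"] + [f"snp{i:02d}" for i in range(30)]
e2_triples = [t for t in combinations(panel, 3) if "rs7412" in t]
defined_without_e4 = [t for t in e2_triples if "rs429358" not in t]
print(len(e2_triples), len(defined_without_e4))  # 465 435

# Toy subjects: after dropping e4 carriers, rs429358 is constant (all 0).
subjects = [
    {"rs429358": 1, "rs7412": 0, "snp00": 1},
    {"rs429358": 0, "rs7412": 1, "snp00": 0},
    {"rs429358": 0, "rs7412": 0, "snp00": 1},
    {"rs429358": 2, "rs7412": 0, "snp00": 2},
    {"rs429358": 0, "rs7412": 1, "snp00": 1},
]
no_e4 = exclude_carriers(subjects, "rs429358")
print(triple_coskewness(no_e4, ("rs429358", "rs7412", "snp00")))  # None
```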

DISCUSSION

Here we leverage a new method to map triples of SNPs in the APOE region to AD. This method uses haplotype‐ and genotype‐based metrics of co‐skewness and generalizes LD between pairs of SNPs to SNP triples (see Methods and Note S1). By analogy with pairwise LD, the associations were assessed by evaluating the difference in co‐skewness between AD‐affected and unaffected subjects of European ancestry. The haplotype‐based method identified 1127 triples associated with AD at the locus‐wide significance level P < 10⁻⁵, whereas the genotype‐based method identified 302 triples. This 3.7‐fold difference is driven mainly by the sensitivity of the haplotype‐based method to deviation from HWE, because haplotypes are inferred under HWE, whereas the genotype‐based metric does not require HWE. HWE for individual SNPs in the full LOADFS and non‐LOADFS samples (Table S1) does not guarantee HWE in subsamples (eg, those stratified by AD status) or at the haplotype level (see Methods); therefore, as in the case of LD between SNP pairs, the deviation from HWE is likely meaningful. That is, such a departure is unlikely to be an artifact and more likely reflects unobserved biological processes in the organism. Here, this sensitivity is driven primarily by deviation from HWE at the haplotype level in the AD‐affected subjects, as indicated by consistently larger differences in cases than in non‐cases (Figure 3, Tables S2 and S3). Accordingly, the observed sensitivity points to SNPs that may be involved in regulating such unobserved biological processes specific to AD‐affected subjects. This information may be missed by the genotype‐based method alone, because it is extracted from the difference between the results of the genotype‐ and haplotype‐based methods.
These findings support a role in AD pathogenesis of haplotypes comprising at least three alleles of different SNPs from the same or different genes in the APOE region, rather than of independent alleles (see Background).
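The single‐SNP HWE screening mentioned in this argument is commonly done with a one‐degree‐of‐freedom chi‐squared test on genotype counts; the paper does not state which HWE test was used, so the Python sketch below is an assumption for illustration only (the function name is hypothetical). As the text notes, passing such a test for each SNP in the full sample says nothing about HWE at the haplotype level or within case and non‐case strata.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-squared test (1 df) for Hardy-Weinberg equilibrium at one SNP.

    Takes counts of the three genotypes and returns (statistic, P-value)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic SNP: HWE test undefined")
    stat = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    # Upper tail of chi-squared with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


print(hwe_chisq(25, 50, 25))  # (0.0, 1.0): counts exactly at HWE
print(hwe_chisq(50, 0, 50))   # statistic 100.0: no heterozygotes
```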

FIGURE 3

Differences in the estimates of co‐skewness using the haplotype‐ and genotype‐based methods. The histogram shows the differences between the haplotype‐ and genotype‐based estimates of the case versus non‐case difference in co‐skewness, multiplied by 10³. Red (blue) denotes negative (positive) differences. Red indicates dominant skewness driven by deviation from Hardy‐Weinberg equilibrium at the haplotype level in the AD‐affected subjects. Differences whose magnitude exceeds the x‐axis limits are included in the flanking bins.

Although AD is understood to be a highly heterogeneous, genetically complex disorder, current research often follows the logic of medical genetics, assuming the existence of variants that cause a complex trait. For example, a fundamental concept of genome‐wide association studies (GWAS) is that SNPs discovered by GWAS are merely proxies for the actual causal variants for a given trait. This logic is inherited from studies of monogenic diseases, in which a single highly penetrant mutation can cause a specific disease, as appears to be the case for the APP, PSEN1, and PSEN2 genes and the autosomal dominant form of early onset AD. For complex (ie, non‐Mendelian) diseases, such as late‐onset AD, GWAS report only small or moderate statistical effects. These findings have motivated studies of the combined action of small‐effect variants. One common strategy is to aggregate the effects of many small‐effect variants across the entire genome into a polygenic risk score. Another approach is to identify AD risks attributed to haplotypes comprising specific alleles. An advantage of co‐skewness is that it highlights SNPs whose alleles can naturally define extended haplotypes. This is important in the framework of polygenic predisposition to complex diseases, because such haplotypes represent more accurate polygenic disease profiles.
The complexity of the co‐skewness pattern in the APOE region is unlikely to be caused by pairwise LD between SNPs comprising different triples, as these SNPs were selected not to be in strong LD. This complexity supports genetic heterogeneity in susceptibility to AD beyond that related to racial/ethnic differences. Accordingly, the co‐skewness approach is also useful for highlighting such genetic heterogeneity. Our results show that AD is characterized mainly by a decrease in co‐skewness in ε2‐bearing triples and an increase in ε4‐bearing triples in AD‐affected subjects. These changes indicate strengthening connections (ie, a higher rate of co‐occurrence) of the ε4 allele and weakening connections (ie, a lower rate of co‐occurrence) of the ε2 allele with alleles of the other SNPs in this region (Figure 2). For example, for the top triple, which includes the ε4‐coding rs429358 SNP and SNPs from the TOMM40 (rs2075650) and APOC1 (rs12721046) genes, we observe strengthening connections among all SNPs in the triple in the AD‐affected subjects (Table 2, Figure 1). Notably, three triples in Table 2 include rs405509 (APOE), which is linked to all SNPs of the top triple, and all three show the same type of association with AD as the top triple, that is, stronger connections among all SNPs in the triple in the AD‐affected subjects. This result provides firm support for previous reports of an association of a haplotype comprising the rs405509_T and ε4 alleles with AD risk, and offers an explanation for the dose‐dependent association of rs405509_T with AD in ε4/ε4 homozygotes. This result, however, suggests that the rs405509_T and ε4 haplotype should include a higher number of risk alleles, which may be from the same or different genes, in line with prior reports. The functional roles of NECTIN2, TOMM40, and APOC1 also suggest that these genes could contribute to the pathogenesis of AD in addition to APOE.
In contrast, the four top rs7412‐bearing triples (Table 2) show that AD is associated with weakening connections of the ε2 allele with alleles from TOMM40 (rs2075650, rs8106922), APOE (rs429358, rs440446), and APOC1 (rs439401, rs12721046). The same prevailing pattern of weakening connections of rs7412 with other SNPs in the APOE region is observed in AD‐affected subjects who do not carry the ε4 allele (Figure 2B). Two triples, which include rs7412, rs8106922 (TOMM40), and either rs440446 (APOE) or rs405509 (APOE), were associated with AD at P < 10⁻⁵. Thus, stronger protection against AD by the ε2 allele may require haplotypes that include the ε2 allele together with alleles from neighboring genes. Figure 2 shows that the ε4 allele is associated with a more complex AD‐related pattern of co‐skewness than the ε2 allele. Therefore, although both alleles can be involved in haplotypes affecting AD risk, AD‐associated haplotypes with ε4 are expected to be more numerous in this region than those with ε2. The ε4‐bearing haplotypes are also expected to be more complex, including a larger number of alleles from other SNPs in the same or different genes. Associations of triples without rs429358 and rs7412 with AD support the existence of AD‐predisposing haplotypes in this region that may not include the ε4 or ε2 alleles. However, the absence of rs429358 and rs7412 from the top triples (Table 2) should be interpreted with caution, given that these haplotypes may include more genetic variants. Our results are supported by prior findings of a complex transcriptional regulatory structure in the APOE region, which includes multiple enhancers modulating gene expression. Bekris et al. also showed that regional enhancers in cis could functionally influence the cell‐specific expression of TOMM40 and APOE, and suggested a biological role of promoter‐enhancer haplotypes in AD pathogenesis.
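The abstract's count of 4960 SNP triples is exactly the number of ways to choose 3 of the 32 SNPs, C(32, 3). The short enumeration below shows how those triples split by whether they contain the ε4‐coding rs429358, the ε2‐coding rs7412, both, or neither. It also computes a Bonferroni threshold for 4960 tests. The other 30 SNP names are hypothetical placeholders. This section does not state how the P < 10⁻⁵ cutoff was chosen, so the Bonferroni comparison is arithmetic only.

```python
from itertools import combinations
from math import comb

APOE_CODING = {"rs429358", "rs7412"}  # SNPs defining the ε4 and ε2 alleles


def partition_triples(snps):
    """Count 3-SNP combinations by which APOE-coding SNPs they contain."""
    groups = {"rs429358 only": 0, "rs7412 only": 0, "both": 0, "neither": 0}
    for triple in combinations(snps, 3):
        present = APOE_CODING.intersection(triple)
        if len(present) == 2:
            groups["both"] += 1
        elif "rs429358" in present:
            groups["rs429358 only"] += 1
        elif "rs7412" in present:
            groups["rs7412 only"] += 1
        else:
            groups["neither"] += 1
    return groups


# Hypothetical placeholder names for the other 30 SNPs in the region.
snps = [f"snp{i:02d}" for i in range(30)] + sorted(APOE_CODING)
groups = partition_triples(snps)
n_triples = sum(groups.values())
assert n_triples == comb(32, 3) == 4960
print(groups)  # {'rs429358 only': 435, 'rs7412 only': 435, 'both': 30, 'neither': 4060}
print(f"Bonferroni threshold for {n_triples} tests: {0.05 / n_triples:.2e}")  # 1.01e-05
```

Most triples (4060 of 4960) contain neither APOE‐coding SNP. This is the pool from which associations "without rs429358 and rs7412" are drawn.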
Given that the function of enhancers can be modulated by nearby and distant non‐coding variants, a role for extended haplotypes harboring local and distant alleles in AD pathogenesis is plausible. Our results raise a fundamental question about the driving force behind AD‐related haplotypes. By contrasting pairwise LD in younger and older subjects, we recently showed that the LD structures observed in older AD‐free subjects and in younger subjects who were not at noticeable mortality risk were the same. This finding favors a role for recent and specific selection (eg, within families or communities, or among diverging ancestral groups), which can be indirectly relevant to AD. Such selection could be driven by exogenous factors, thus supporting the concept of the AD exposome. Trumble and Finch provided solid arguments for the role of exposure to environmental toxins, such as air pollution and tobacco smoking, in recent human evolution. The ε4 allele has been shown to increase the risk of dementia from air pollution. Thus, environmental toxins, whose prevalence increases over time, could drive adaptive haplotypes in the APOE region that may become deleterious for cognition in late life. Another factor driving the adaptation of haplotypes in the APOE region could be health protection in highly infectious environments in early life. Of interest, such protection may even extend into adult life, as shown in the Tsimané, an indigenous population living in a highly infectious environment. Despite the rigor of this study, we acknowledge the limitation that AD diagnoses based on ICD‐9 codes in HRS and CHS can be less accurate than those in LOADFS and FHS. Although this inaccuracy may affect the precision of the co‐skewness estimates, the consistent direction of the differences between AD‐affected and unaffected subjects across independent samples partly offsets this problem.
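The comparison of LD structures in younger and older subjects mentioned above rests on pairwise LD. A minimal sketch computes the standard coefficients D = p_AB − p_A·p_B and r² = D² / (p_A(1 − p_A)p_B(1 − p_B)) from phased 0/1 haplotypes, then tabulates r² per group side by side. The group labels, SNP names, and data are hypothetical. This section does not describe how the earlier study compared the structures.

```python
def pairwise_ld(a, b):
    """Return (D, r^2) for two SNPs given 0/1 allele indicators on phased haplotypes."""
    n = len(a)
    pa, pb = sum(a) / n, sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    d = pab - pa * pb
    denom = pa * (1 - pa) * pb * (1 - pb)
    r2 = d * d / denom if denom > 0 else float("nan")  # undefined if a SNP is monomorphic
    return d, r2


def compare_ld(groups, pairs):
    """Tabulate r^2 for each SNP pair in each group (e.g., younger vs older subjects).

    groups: {group_name: {snp_name: [0/1 per haplotype]}}
    pairs:  list of (snp1, snp2) tuples
    """
    table = {}
    for s1, s2 in pairs:
        table[(s1, s2)] = {g: pairwise_ld(h[s1], h[s2])[1] for g, h in groups.items()}
    return table


# Hypothetical toy data in which both groups show complete LD between the two SNPs.
groups = {
    "younger": {"snpA": [1, 1, 0, 0], "snpB": [1, 1, 0, 0]},
    "older_ad_free": {"snpA": [1, 0, 1, 0, 1, 0], "snpB": [1, 0, 1, 0, 1, 0]},
}
print(compare_ld(groups, [("snpA", "snpB")]))
# {('snpA', 'snpB'): {'younger': 1.0, 'older_ad_free': 1.0}}
```

The same `pairwise_ld` function could also serve as a screen for selecting SNPs that are not in strong LD, with the r² threshold left to the analyst.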
In conclusion, our finding of a 3.7‐fold excess in the estimated associations of SNP triples with AD using the haplotype‐based method, compared with the genotype‐based method, supports definitive roles of complex haplotypes in predisposition to AD in the APOE region. The complex structure of such haplotypes is supported by the large number of AD‐associated triples, which include SNPs from all five genes. AD is characterized mainly by strengthening connections of the ε4 allele and weakening connections of the ε2 allele with the other alleles in this region. Finally, these results support a more extensive role of the ε4 allele than the ε2 allele in complex AD‐related haplotypes. However, the latter result should be interpreted with caution, because the number of ε2 carriers in this sample is substantially smaller than that of ε4 carriers.
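Haplotype‐based and genotype‐based estimates can diverge, and the figure caption links that divergence to deviation from Hardy‐Weinberg equilibrium at the haplotype level. The sketch below illustrates why under one assumption: the genotype‐based quantity is the same third central cross‐moment computed on 0/1/2 allele dosages. This section does not define the authors' estimators. When each person's two haplotypes pair independently, third cumulants add, so the genotype‐level moment is exactly twice the haplotype‐level one. Non‐random pairing breaks that 2:1 ratio. Function names and toy data are hypothetical.

```python
from itertools import product


def central_m3(rows):
    """Third central cross-moment of the three columns of `rows`."""
    n = len(rows)
    m = [sum(r[k] for r in rows) / n for k in range(3)]
    return sum((r[0] - m[0]) * (r[1] - m[1]) * (r[2] - m[2]) for r in rows) / n


def haplotype_m3(people):
    """Moment over all phased haplotypes (two per subject)."""
    return central_m3([h for pair in people for h in pair])


def genotype_m3(people):
    """Moment over 0/1/2 allele dosages (unphased genotypes, one row per subject)."""
    return central_m3([tuple(x + y for x, y in zip(h1, h2)) for h1, h2 in people])


pool = [(1, 1, 1), (0, 0, 0), (1, 0, 0), (0, 0, 0)]
# Random union: every ordered pair of pool haplotypes, so hap1 and hap2 are independent.
random_union = list(product(pool, repeat=2))
# Non-random union: identical haplotypes pair with each other.
assortative = [((1, 1, 1), (1, 1, 1)), ((0, 0, 0), (0, 0, 0)), ((0, 0, 0), (0, 0, 0))]

for name, people in [("random union", random_union), ("assortative", assortative)]:
    ratio = genotype_m3(people) / haplotype_m3(people)
    print(name, round(ratio, 6))
# random union 2.0
# assortative 8.0
```

Under this reading, any departure from the 2:1 ratio signals non‐random pairing of haplotypes within individuals. That kind of signal would be lost by working with genotypes alone.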

CONFLICTS OF INTEREST

Nothing to report.

FUNDING INFORMATION

This research was supported by Grants No R01 AG047310, R01 AG061853, R01 AG065477, and R01 AG070488 from the National Institute on Aging. The funders had no role in study design, data collection, and analysis, decision to publish, or preparation of the manuscript. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

AUTHOR CONTRIBUTIONS

Alexander M Kulminski conceived and designed the experiment and wrote the paper. Ian Philipp developed co‐skewness metrics, wrote the paper, and performed statistical analyses. Irina Culminskaya wrote the paper. Yury Loika and Liang He prepared data.

REFERENCES (45 in total)

1. Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus.

Authors: D M Nielsen; M G Ehm; B S Weir
Journal: Am J Hum Genet Date: 1998-11

2. Segregation of a missense mutation in the amyloid precursor protein gene with familial Alzheimer's disease.

Authors: A Goate; M C Chartier-Harlin; M Mullan; J Brown; F Crawford; L Fidani; L Giuffra; A Haynes; N Irving; L James
Journal: Nature Date: 1991-02-21

3. Inflammatory gene variants in the Tsimane, an indigenous Bolivian population with a high infectious load.

Authors: Sarinnapha Vasunilashorn; Caleb E Finch; Eileen M Crimmins; Suvi A Vikman; Jonathan Stieglitz; Michael Gurven; Hillard Kaplan; Hooman Allayee
Journal: Biodemography Soc Biol Date: 2011

4. Role of apolipoprotein E4 in protecting children against early childhood diarrhea outcomes and implications for later development. (Review)

Authors: Reinaldo B Oriá; Peter D Patrick; James A Blackman; Aldo A M Lima; Richard L Guerrant
Journal: Med Hypotheses Date: 2006-11-13

5. Next-generation genotype imputation service and methods.

Authors: Sayantan Das; Lukas Forer; Sebastian Schönherr; Carlo Sidore; Adam E Locke; Alan Kwong; Scott I Vrieze; Emily Y Chew; Shawn Levy; Matt McGue; David Schlessinger; Dwight Stambolian; Po-Ru Loh; William G Iacono; Anand Swaroop; Laura J Scott; Francesco Cucca; Florian Kronenberg; Michael Boehnke; Gonçalo R Abecasis; Christian Fuchsberger
Journal: Nat Genet Date: 2016-08-29

6. Role of genes and environments for explaining Alzheimer disease.

Authors: Margaret Gatz; Chandra A Reynolds; Laura Fratiglioni; Boo Johansson; James A Mortimer; Stig Berg; Amy Fiske; Nancy L Pedersen
Journal: Arch Gen Psychiatry Date: 2006-02

7. Replicability and Prediction: Lessons and Challenges from GWAS. (Review)

Authors: Urko M Marigorta; Juan Antonio Rodríguez; Greg Gibson; Arcadi Navarro
Journal: Trends Genet Date: 2018-04-30

8. DNA methylation of TOMM40-APOE-APOC2 in Alzheimer's disease.

Authors: Yvonne Shao; McKenzie Shaw; Kaitlin Todd; Maria Khrestian; Giana D'Aleo; P John Barnard; Jeff Zahratka; Jagan Pillai; Chang-En Yu; C Dirk Keene; James B Leverenz; Lynn M Bekris
Journal: J Hum Genet Date: 2018-01-25

9. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease.

Authors: J C Lambert; C A Ibrahim-Verbaas; D Harold; A C Naj; R Sims; C Bellenguez; A L DeStafano; J C Bis; G W Beecham; B Grenier-Boley; G Russo; T A Thorton-Wells; N Jones; A V Smith; V Chouraki; C Thomas; M A Ikram; D Zelenika; B N Vardarajan; Y Kamatani; C F Lin; A Gerrish; H Schmidt; B Kunkle; M L Dunstan; A Ruiz; M T Bihoreau; S H Choi; C Reitz; F Pasquier; C Cruchaga; D Craig; N Amin; C Berr; O L Lopez; P L De Jager; V Deramecourt; J A Johnston; D Evans; S Lovestone; L Letenneur; F J Morón; D C Rubinsztein; G Eiriksdottir; K Sleegers; A M Goate; N Fiévet; M W Huentelman; M Gill; K Brown; M I Kamboh; L Keller; P Barberger-Gateau; B McGuiness; E B Larson; R Green; A J Myers; C Dufouil; S Todd; D Wallon; S Love; E Rogaeva; J Gallacher; P St George-Hyslop; J Clarimon; A Lleo; A Bayer; D W Tsuang; L Yu; M Tsolaki; P Bossù; G Spalletta; P Proitsi; J Collinge; S Sorbi; F Sanchez-Garcia; N C Fox; J Hardy; M C Deniz Naranjo; P Bosco; R Clarke; C Brayne; D Galimberti; M Mancuso; F Matthews; S Moebus; P Mecocci; M Del Zompo; W Maier; H Hampel; A Pilotto; M Bullido; F Panza; P Caffarra; B Nacmias; J R Gilbert; M Mayhaus; L Lannefelt; H Hakonarson; S Pichler; M M Carrasquillo; M Ingelsson; D Beekly; V Alvarez; F Zou; O Valladares; S G Younkin; E Coto; K L Hamilton-Nelson; W Gu; C Razquin; P Pastor; I Mateo; M J Owen; K M Faber; P V Jonsson; O Combarros; M C O'Donovan; L B Cantwell; H Soininen; D Blacker; S Mead; T H Mosley; D A Bennett; T B Harris; L Fratiglioni; C Holmes; R F de Bruijn; P Passmore; T J Montine; K Bettens; J I Rotter; A Brice; K Morgan; T M Foroud; W A Kukull; D Hannequin; J F Powell; M A Nalls; K Ritchie; K L Lunetta; J S Kauwe; E Boerwinkle; M Riemenschneider; M Boada; M Hiltuenen; E R Martin; R Schmidt; D Rujescu; L S Wang; J F Dartigues; R Mayeux; C Tzourio; A Hofman; M M Nöthen; C Graff; B M Psaty; L Jones; J L Haines; P A Holmans; M Lathrop; M A Pericak-Vance; L J Launer; L A Farrer; C M van Duijn; C Van Broeckhoven; V Moskvina; S Seshadri; J Williams; G D Schellenberg; P Amouyel
Journal: Nat Genet Date: 2013-10-27

10. Enhancer variants associated with Alzheimer's disease affect gene expression via chromatin looping.

Authors: Masataka Kikuchi; Norikazu Hara; Mai Hasegawa; Akinori Miyashita; Ryozo Kuwano; Takeshi Ikeuchi; Akihiro Nakaya
Journal: BMC Med Genomics Date: 2019-09-09
