Hong Liang1, Luolong Cao1, Yue Gao1, Haoran Luo1, Xianglian Meng2, Ying Wang3, Jin Li1, Wenjie Liu1,2.
Abstract
As an efficient method, genome-wide association study (GWAS) is used to identify associations between genetic variation and pathological phenotypes, and many significant genetic variants found by GWAS are closely associated with human diseases. However, mining only single-marker effects is not enough to explain complex biological phenotypes. Mining highly correlated single nucleotide polymorphisms (SNPs) is more meaningful for the study of Alzheimer's disease (AD). In this paper, we used two frequent pattern mining (FPM) algorithms, FP-Growth and Eclat, to analyze the GWAS results of functional magnetic resonance imaging (fMRI) phenotypes. Moreover, we applied the definition of confidence to FP-Growth and Eclat to enhance the FPM framework. By calculating the conditional probability of identified SNPs, we obtained the corresponding association rules, which provide confidence values between these important SNPs. The resulting SNPs showed close correlation with the hippocampus, memory, and AD. The experimental results also demonstrate that our framework is effective in identifying SNPs and provides candidate SNPs for further research.
Keywords: Alzheimer’s disease; Eclat; FI; FPM; association rules; vGWAS
Year: 2022 PMID: 35205221 PMCID: PMC8871801 DOI: 10.3390/genes13020176
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. The workflow of this research. ADNI: Alzheimer’s Disease Neuroimaging Initiative; MNI: Montreal Neurological Institute; SNP: single nucleotide polymorphism; GWAS: genome-wide association study; FIs: frequent itemsets.
Demographic statistics of the participants in the ADNI database.
| Characteristics | CN | SMC | EMCI | LMCI | AD |
|---|---|---|---|---|---|
| Number of samples | 353 | 89 | 273 | 504 | 296 |
| Gender (M/F) | 187/166 | 36/53 | 153/120 | 309/195 | 166/130 |
| Age (year, Mean ± SD) | 74.9 ± 5.7 | 72.2 ± 5.7 | 71.3 ± 7.1 | 74.0 ± 7.6 | 74.7 ± 7.6 |
| Education (year, Mean ± SD) | 16.1 ± 2.7 | 16.8 ± 2.6 | 16.1 ± 2.6 | 16.0 ± 2.9 | 15.5 ± 2.9 |
CN: clinically normal; SMC: subjective memory concerns; EMCI: early mild cognitive impairment; LMCI: late mild cognitive impairment; AD: mild Alzheimer’s disease dementia.
Figure 2. The workflow of imaging data pre-processing.
Top 10 frequent SNPs sorted by coverage rate on 49,900 brain voxels.
| NO. | SNP | Support Rate |
|---|---|---|
| 1 | rs6014724 | 0.46 |
| 2 | rs11731587 | 0.39 |
| 3 | rs7806 | 0.36 |
| 4 | rs6024860 | 0.35 |
| 5 | rs1060743 | 0.34 |
| 6 | rs4243693 | 0.34 |
| 7 | rs7790238 | 0.33 |
| 8 | rs7219391 | 0.31 |
| 9 | rs386274 | 0.30 |
| 10 | rs6092321 | 0.30 |
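A support rate of this kind can be sketched as follows, assuming (this is an assumption, not stated in the table) that each brain voxel is treated as one transaction holding the SNPs that GWAS associates with that voxel, so that a SNP's support rate is the fraction of voxels containing it. The function name, the placeholder SNP IDs, and the toy data are hypothetical.

```python
from collections import Counter


def support_rates(transactions):
    """Return, for each item (SNP), the fraction of transactions (voxels) containing it."""
    n = len(transactions)
    counts = Counter(snp for t in transactions for snp in set(t))
    return {snp: c / n for snp, c in counts.items()}


# Toy data: four hypothetical voxels with placeholder SNP IDs.
voxels = [
    {"rsA", "rsB"},
    {"rsA"},
    {"rsB", "rsC"},
    {"rsA", "rsC"},
]

rates = support_rates(voxels)
# Rank by support rate, highest first; ties are broken by SNP name.
ranking = sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))
```

With the toy data, `rsA` appears in three of four voxels and ranks first with a support rate of 0.75.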
The FP-Growth algorithm process. (FP-Tree: FPT, current itemset suffix: P = ϕ, Support rate threshold: s).
| Begin: |
| If (FPT is a single path or empty): |
| For each subset of items in the path: return the FI and its support if the support passes s |
| Else: |
| ( |
| For each item i in chain of pointers |
| ( |
| Generate conditional pattern base Pi = (i ∪ P) |
| Extract conditional FP-tree FPTi from chain of pointers in Pi |
| If (FPTi ≠ ϕ): Recurse FP-Growth(FPTi, Pi, s) |
| ) |
| ) |
| end |
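The recursion in this table can be illustrated with a minimal Python sketch. It is not the authors' implementation: it omits the single-path shortcut and simply recurses on every conditional FP-tree, and all names (`Node`, `build_tree`, `mine`, `fp_growth`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs.

    Returns (header, freq): header maps each frequent item to its tree nodes
    (the "chain of pointers"); freq maps it to its total count.
    """
    freq = defaultdict(int)
    for items, w in weighted:
        for it in set(items):
            freq[it] += w
    freq = {it: c for it, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((it for it in set(items) if it in freq),
                         key=lambda it: (-freq[it], it))
        node = root
        for it in ordered:
            if it not in node.children:
                node.children[it] = Node(it, node)
                header[it].append(node.children[it])
            node = node.children[it]
            node.count += w
    return header, freq


def mine(weighted, min_count, suffix, out):
    """Record every frequent itemset that extends `suffix`, with its count."""
    header, freq = build_tree(weighted, min_count)
    for item, nodes in header.items():
        pattern = suffix | {item}
        out[pattern] = freq[item]
        # Conditional pattern base: prefix path of each node, weighted by its count.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        if base:
            mine(base, min_count, pattern, out)


def fp_growth(transactions, s):
    """Return {frozenset(itemset): support rate} for itemsets with support rate >= s."""
    n = len(transactions)
    min_count = math.ceil(s * n - 1e-9)  # small tolerance against float error
    counts = {}
    mine([(t, 1) for t in transactions], min_count, frozenset(), counts)
    return {fi: c / n for fi, c in counts.items()}


# Toy data: four hypothetical voxels with placeholder SNP IDs.
voxels = [{"rsA", "rsB"}, {"rsA", "rsB", "rsC"}, {"rsA", "rsC"}, {"rsB"}]
frequent = fp_growth(voxels, 0.5)
```

On the toy data with s = 0.5, the frequent itemsets are the three single SNPs plus the pairs (rsA, rsB) and (rsA, rsC); the pair (rsB, rsC) occurs in only one of four voxels and is pruned.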
The Eclat algorithm process. (frequent pattern itemset: FP, Support rate threshold: s).
| Begin: |
| FP
| Recurse Eclat(FP
| ) |
| ) |
| end |
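Because most of this listing did not survive, the following is only a generic sketch of the textbook Eclat scheme (vertical transaction-ID sets intersected recursively), not a reconstruction of the authors' table. The function name and toy data are hypothetical.

```python
import math


def eclat(transactions, s):
    """Return {frozenset(itemset): support rate} for itemsets with support rate >= s."""
    n = len(transactions)
    min_count = math.ceil(s * n - 1e-9)  # small tolerance against float error

    # Vertical layout: item -> set of IDs of the transactions that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    out = {}

    def recurse(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            pattern = prefix | {item}
            out[pattern] = len(tids) / n
            extensions = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids  # transactions containing both
                if len(common) >= min_count:
                    extensions.append((other, common))
            if extensions:
                recurse(pattern, extensions)

    start = sorted(((it, t) for it, t in tidsets.items() if len(t) >= min_count),
                   key=lambda kv: kv[0])
    recurse(frozenset(), start)
    return out


# Toy data: four hypothetical voxels with placeholder SNP IDs.
voxels = [{"rsA", "rsB"}, {"rsA", "rsB", "rsC"}, {"rsA", "rsC"}, {"rsB"}]
frequent = eclat(voxels, 0.5)
```

The support of an itemset is the size of the intersection of its members' ID sets, so no tree is needed; the cost is holding those ID sets in memory.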
Figure 3. Number of frequent itemsets (FIs) for 49,900 brain voxels at different support rate thresholds, using the two algorithms.
Figure 4. Number of 1-item FIs at different support rate thresholds for the right hippocampus (A) and left hippocampus (B).
k-item FIs ordered by support rate using Eclat algorithm. FI: frequent itemset.
| Right hippocampus: 3-item, 4-item and 5-item FIs (top 5) | Support rate | Left hippocampus: 3-item, 4-item and 5-item FIs (top 5) | Support rate |
|---|---|---|---|
| rs1047389, rs11731587, rs10277969 | 0.65 | rs10277969, rs2242065, rs10498633 | 0.72 |
| rs1047389, rs10498633, rs10277969 | 0.63 | rs10277969, rs2242065, rs1047389 | 0.71 |
| rs1047389, rs11731587, rs16881446 | 0.60 | rs2242065, rs10498633, rs1047389 | 0.71 |
| rs11731587, rs10498633, rs10277969 | 0.58 | rs7563345, rs10498633, rs1047389 | 0.70 |
| rs1047389, rs11731587, rs10498633 | 0.58 | rs2242065, rs10498633, rs6082 | 0.70 |
| rs1047389, rs11731587, rs10498633, rs10277969 | 0.56 | rs10277969, rs2242065, rs10498633, rs6082 | 0.67 |
| rs1047389, rs11731587, rs16881446, rs10277969 | 0.54 | rs10277969, rs2242065, rs10498633, rs1047389 | 0.67 |
| rs1047389, rs10277969, rs1918296, rs886969 | 0.53 | rs7563345, rs2242065, rs10498633, rs6082 | 0.65 |
| rs1047389, rs11731587, rs10277969, rs1918296 | 0.52 | rs10277969, rs2242065, rs10498633, rs7563345 | 0.65 |
| rs1047389, rs10498633, rs10277969, rs1918296 | 0.51 | rs10277969, rs2242065, rs7000615, rs1047389 | 0.65 |
| NULL | NULL | rs10277969, rs7563345, rs2242065, rs10498633, rs6082 | 0.63 |
| | | rs10277969, rs1047389, rs2242065, rs10498633, rs6082 | 0.62 |
| | | rs10277969, rs1047389, rs7563345, rs2242065, rs10498633 | 0.62 |
| | | rs10277969, rs1047389, rs2242065, rs10498633, rs7000615 | 0.61 |
| | | rs1047389, rs7563345, rs2242065, rs10498633, rs6082 | 0.61 |
Figure 5. Number of k-item FIs for s = 0.5 in the right hippocampus and s = 0.6 in the left hippocampus. FI: frequent itemset.
Association rules of 2-item FIs (top 5) and corresponding confidence.
| 2-Item FIs (Top 5) | Confidence | 2-Item FIs (Top 5) | Confidence |
|---|---|---|---|
| rs10498633 to rs10277969 | 0.90 (0.74/0.82) | rs10277969 to rs10498633 | 0.84 (0.74/0.88) |
| rs1047389 to rs10277969 | 0.94 (0.74/0.79) | rs10277969 to rs1047389 | 0.84 (0.74/0.88) |
| rs11731587 to rs1047389 | 0.97 (0.71/0.73) | rs1047389 to rs11731587 | 0.90 (0.71/0.79) |
| rs11731587 to rs10277969 | 0.92 (0.67/0.73) | rs10277969 to rs11731587 | 0.76 (0.67/0.88) |
| rs10277969 to rs1918296 | 0.76 (0.67/0.88) | rs1918296 to rs10277969 | 0.99 (0.67/0.68) |
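The bracketed fractions are consistent with the usual definition of confidence as a conditional probability: for a rule "A to B", confidence = support(A, B) / support(A). The short check below recomputes the table's confidence values from its own support figures; the function name and the reading of the two numbers in each bracket as joint support over antecedent support are assumptions inferred from the values.

```python
def confidence(support_joint, support_antecedent):
    """Confidence of rule A -> B: support(A and B) divided by support(A)."""
    return support_joint / support_antecedent


# (antecedent, consequent, support of the pair, support of the antecedent),
# copied from the table above.
rules = [
    ("rs10498633", "rs10277969", 0.74, 0.82),
    ("rs1047389", "rs10277969", 0.74, 0.79),
    ("rs11731587", "rs1047389", 0.71, 0.73),
    ("rs11731587", "rs10277969", 0.67, 0.73),
    ("rs10277969", "rs1918296", 0.67, 0.88),
    ("rs1918296", "rs10277969", 0.67, 0.68),
]

rounded = [round(confidence(joint, ante), 2) for _, _, joint, ante in rules]
for (a, b, _, _), c in zip(rules, rounded):
    print(f"{a} to {b}: {c:.2f}")
```

Confidence is not symmetric: the pair (rs10277969, rs1918296) has one joint support of 0.67, but the rule scores 0.76 in one direction and 0.99 in the other because the two antecedents have different supports (0.88 and 0.68).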
Figure 6. Predicted function of 21 frequent SNPs on the right hippocampus (A) and 20 frequent SNPs on the left hippocampus (B).
Figure 7. The growth rate of correlations between 4 features and identified FIs in the right hippocampus (A) and left hippocampus (B). The baseline 1-item FI is (rs10498633), the 2-item FI is (rs10498633, rs10277969), the 3-item FI is (rs10498633, rs10277969, rs1047389), the 4-item FI is (rs10498633, rs10277969, rs1047389, rs11731587), and the 5-item FI is (rs10498633, rs10277969, rs1047389, rs11731587, rs2242065). Using the 1-item FI as the benchmark, the growth rate of each k-item FI (k = 2, 3, 4, 5) relative to the 1-item FI was calculated.
Brain ROIs and Hippocampus subregions activated by rs10498633.
| NO. | Activated brain ROI | NO. | Activated hippocampus subregion |
|---|---|---|---|
| 1 | Frontal_Inf_Orb | 1 | Hippocampal-amygdaloid Transition area |
| 2 | Olfactory | 2 | Cornu ammonis 1 |
| 3 | Insula | 3 | Pre subiculum |
| 4 | Hippocampus | 4 | Cornu ammonis 4 |
| 5 | Para Hippocampal | 5 | Para subiculum |
| 6 | Amygdala | 6 | Hippocampal fissure |
| 7 | Fusiform | | |
| 8 | Temporal_Pole_Sup | | |
| 9 | Temporal_Pole_Mid | | |
| 10 | Temporal_Inf | | |
Association rules of some significant FIs.
| Association rules | Confidence (right hippocampus) | Confidence (left hippocampus) |
|---|---|---|
| rs10498633 to rs10277969 | 0.90 | 0.86 |
| rs10498633, rs10277969 to rs1047389 | 0.85 | 0.91 |
| rs10498633, rs10277969, rs1047389 to rs11731587 | 0.88 | 0.87 |
| rs10498633, rs10277969, rs1047389, rs11731587 to rs2242065 | -- * | -- |
* The corresponding confidence is under 0.7.