Nibal Arzouni, Will Matloff, Lu Zhao, Kaida Ning, Arthur W Toga.
Abstract
BACKGROUND: Alzheimer's disease (AD) is a complex neurodegenerative brain disease and a public health concern. AD is considered the fifth leading cause of death among Americans older than 65 years, which makes it important to understand its etiology in the early stages, before the onset of symptoms. This study sought to further understand AD etiology by investigating dysregulated genes using gene expression data from multiple brain regions.
Keywords: AD biomarkers; Biological pathways; Brain; Classification; Gene expression; Late-onset Alzheimer’s disease; Linear mixed models; Machine learning
Year: 2020 PMID: 33282526 PMCID: PMC7717689
Source DB: PubMed Journal: J Alzheimers Dis Parkinsonism
Summary of demographics and characteristics for subjects with and without AD.
| Category | Control Subjects | Alzheimer’s Subjects |
|---|---|---|
| | Mean (SD) | Mean (SD) |
| Age at death (yrs) | 87.13 (5.62) | 89.60 (6.21) |
| Education (yrs) | 14.70 (3.13) | 13.67 (2.76) |
| Braak stage | 2.67 (1.24) | 4.13 (1.84) |
| CERAD score | 1.16 (0.94) | 1.80 (1.32) |
| Sex | 14 F/16 M | 6 F/9 M |
| | 3 Yes/27 No (10%) | 5 Yes/10 No (33.33%) |
Top 10 differentially expressed genes between disease and control samples.
| Entrez ID | Symbol | LogFC | AvExp | t | P-value | B |
|---|---|---|---|---|---|---|
| 4744 | NEFH | 0.653 | 4.387 | 14.34 | 5.12E-42 | 94.387 |
| 5816 | PVALB | 0.631 | 2.839 | 9.60 | 7.80E-39 | 86.539 |
| 9840 | TESPA1 | 0.587 | 3.437 | 12.91 | 6.00E-34 | 75.185 |
| 1.01E+08 | RNU6-33P | 0.574 | 1.014 | 12.62 | 1.86E-32 | 71.555 |
| 6616 | SNAP25 | 0.560 | 8.954 | 12.3 | 8.17E-31 | 67.650 |
| 222008 | VSTM2A | 0.551 | 4.364 | 12.11 | 6.87E-30 | 65.397 |
| 9118 | INA | 0.544 | 4.665 | 11.95 | 3.89E-29 | 63.556 |
| 1123 | CHN1 | 0.529 | 7.890 | 8.43 | 7.51E-27 | 58.283 |
| 4747 | NEFL | 0.538 | 6.488 | 8.42 | 5.15E-26 | 56.246 |
| 5999 | RGS4 | 0.512 | 5.509 | 11.26 | 9.18E-25 | 55.646 |

| Symbol | Gene name | | |
|---|---|---|---|
| NEFH | Neurofilament, heavy polypeptide | 11160 | 22 |
| PVALB | Parvalbumin | 18795 | 22 |
| TESPA1 | Thymocyte expressed, positive selection associated | 136747 | 12 |
| RNU6-33P | RNA, U6 small nuclear 33, pseudogene | 106 | 4 |
| SNAP25 | Synaptosomal-associated protein, 25kDa | 88588 | 20 |
| VSTM2A | V-set and transmembrane domain containing 2A | 28755 | 7 |
| INA | Internexin neuronal intermediate filament protein, alpha | 13208 | 10 |
| CHN1 | Chimerin | 1206068 | 2 |
| NEFL | Neurofilament, light polypeptide | 6155 | 8 |
| RGS4 | Regulator of G-protein Signaling | 48027 | 1 |
Figure 1: Volcano plot of log2 fold change versus B-statistic for all genes. Genes with P-value < 0.05 are shown as red circles; all others are shown as blue crosses. NEFH, the most statistically significant differentially expressed gene, is marked.
Figure 2: Log2 fold change comparison for the top ten genes in all four brain regions. All ten genes were up regulated in AD in the four-brain-region LMM analysis. Within individual regions, the genes were up regulated in all four regions except for SNAP25, NEFL, and RGS4, which were down regulated in the hippocampus; RGS4 was also down regulated in the temporal cortex.
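To make the scale of these values concrete, a log2 fold change can be converted back to a linear expression ratio. The sketch below assumes the LogFC column in the table above is base 2, as the Figure 2 caption suggests; the function name `fold_change` and the dictionary layout are illustrative, and only three of the ten rows are shown, keyed by Entrez ID.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to a linear fold change (ratio)."""
    return 2.0 ** log2_fc


# LogFC values copied from the top-10 table, keyed by Entrez ID.
# Assumption: these are log2 ratios of AD versus control expression.
log_fc_by_entrez = {4744: 0.653, 5816: 0.631, 5999: 0.512}

linear = {gene: fold_change(v) for gene, v in log_fc_by_entrez.items()}

for gene, ratio in linear.items():
    print(f"Entrez {gene}: log2FC={log_fc_by_entrez[gene]:.3f} -> ratio={ratio:.2f}")
```

Under that assumption, a LogFC near 0.5 to 0.65 corresponds to an expression ratio of roughly 1.4 to 1.6 rather than a doubling, which would require a LogFC of 1.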
GO annotations using the top genes and the enriched pathways.
| Biological Process GO Terms | P-value |
|---|---|
| Postsynaptic intermediate filament cytoskeleton organization | 9.56E-06 |
| Neurofilament cytoskeleton organization | 5.73E-05 |
| Postsynaptic cytoskeleton organization | 2.17E-04 |
| Intermediate filament cytoskeleton organization | 8.73E-03 |
| Intermediate filament-based process | 9.28E-03 |
| Neurofilament bundle assembly | 1.43E-02 |

| Molecular Function GO Terms | P-value |
|---|---|
| Structural constituent of postsynaptic intermediate filament cytoskeleton | 3.00E-06 |
| Structural constituent of synapse | 1.99E-04 |
| Structural constituent of cytoskeleton | 2.52E-02 |

| Cellular Component GO Terms | P-value |
|---|---|
| Neurofilament | 2.82E-05 |
| Postsynaptic intermediate filament cytoskeleton | 1.40E-03 |
| Schaffer collateral – CA1 synapse | 8.07E-03 |
| Postsynaptic cytoskeleton | 1.05E-02 |

| Pathway | | |
|---|---|---|
| Neuronal system | 276 | 2.66 |
| Transmission across chemical synapses | 184 | 2.5 |
| Voltage gated potassium channels | 43 | 2.44 |
| Potassium channels | 98 | 2.43 |
| Neurotransmitter receptor binding and transmission in postsynaptic cell | 135 | 2.34 |
| Neurotransmitter release cycle | 34 | 2.11 |
| Axon guidance | 243 | 1.7 |

| Pathway | | |
|---|---|---|
| Translation | 209 | −2.6 |
| SRP dependent cotranslational protein targeting to membrane | 169 | −2.58 |
| Peptide chain elongation | 145 | −2.5 |
| Metabolism of RNA | 316 | −2.26 |
| DNA strand elongation | 30 | −1.93 |
| Synthesis of DNA | 91 | −1.91 |
| Metabolism of proteins | 484 | −1.9 |
| Lipoprotein metabolism | 28 | −1.88 |
Figure 3: Network graph of the AD enriched pathways. The graph is zoomed in on the AD nodes representing the pathways up regulated in AD with P-value < 0.01 and FDR < 0.02. Yellow highlighted nodes are the most significantly enriched up regulated pathways, and red edges show the connections between those pathways. This view is a zoomed-in portion of supplemental figure 1.
Figure 4: Supervised ML classification using the top ten genes. (A) Classification results on the training data using four algorithms. Each plot shows the maximum training accuracy achieved over combinations of N genes, where 2 ≤ N ≤ 10. (B) Classification results on the testing data. Each plot shows the testing accuracy of the N-gene combination that achieved the highest accuracy on the training data.
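The selection procedure described in the Figure 4 caption can be sketched as follows: for each subset size N, try every combination of N genes, keep the one with the highest training accuracy, then report its accuracy on held-out data. This is only an illustration under stated assumptions. The record does not name the four algorithms, so a simple nearest-centroid classifier stands in for them; the data are synthetic, with four toy "genes" instead of ten; and all function names are hypothetical.

```python
import random
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """Return {label: centroid}. A stand-in classifier, not the paper's."""
    return {label: centroid([x for x, t in zip(X, y) if t == label])
            for label in set(y)}


def predict(model, x):
    """Assign x to the label with the closest centroid (squared Euclidean)."""
    return min(model, key=lambda lab: sum((a - b) ** 2
                                          for a, b in zip(x, model[lab])))


def accuracy(model, X, y):
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)


def select(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[row[c] for c in cols] for row in X]


def best_subset_per_size(X_train, y_train, X_test, y_test, sizes):
    """For each size N, pick the gene subset with the highest training
    accuracy and record (subset, train accuracy, test accuracy)."""
    results = {}
    n_genes = len(X_train[0])
    for n in sizes:
        best = None
        for cols in combinations(range(n_genes), n):
            model = fit_nearest_centroid(select(X_train, cols), y_train)
            train_acc = accuracy(model, select(X_train, cols), y_train)
            if best is None or train_acc > best[0]:
                best = (train_acc, cols, model)
        train_acc, cols, model = best
        test_acc = accuracy(model, select(X_test, cols), y_test)
        results[n] = (cols, train_acc, test_acc)
    return results


# Synthetic toy data (assumption): features 0 and 1 differ between the two
# classes, features 2 and 3 are pure noise.
rng = random.Random(0)


def make_samples(n_per_class):
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0),
                      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y


X_train, y_train = make_samples(20)
X_test, y_test = make_samples(10)
results = best_subset_per_size(X_train, y_train, X_test, y_test, range(2, 5))

for n, (cols, train_acc, test_acc) in results.items():
    print(f"N={n}: genes={cols} train={train_acc:.2f} test={test_acc:.2f}")
```

Because the subset is chosen by training accuracy alone, the training curve is optimistic by construction, which is why the test accuracy in panel (B) is the more informative number. With ten genes the search covers 1,013 subsets for 2 ≤ N ≤ 10, so exhaustive enumeration stays cheap.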