Longxiu Yang, Yuan Qin, Chongdong Jian.
Abstract
Alzheimer's disease (AD), a disease of the nervous system, currently lacks effective therapies. RNA expression is a fundamental means of regulating life activities, and identifying expression characteristics specific to AD patients may aid the exploration of AD pathogenesis and treatment. This study developed a classifier that accurately distinguishes AD patients from healthy people and then identified 3 core genes that may be related to the pathogenesis of AD. To this end, RNA expression data from the middle temporal gyrus of AD patients were first downloaded from the GEO database; missing values were imputed with the k-Nearest Neighbor (KNN) algorithm, and the data were then normalized using the limma package. Next, the 500 genes with the highest feature importance were obtained through Max-Relevance and Min-Redundancy (mRMR) analysis, and, based on these genes, a series of AD classifiers was constructed using the Support Vector Machine (SVM), Random Forest (RF), and KNN algorithms. In incremental feature selection (IFS) analysis, the 14-gene KNN classifier with the highest Matthews correlation coefficient (MCC) was identified as the best AD classifier. These 14 genes played a pivotal role in discriminating AD and may be core genes associated with its pathogenesis. Finally, protein-protein interaction (PPI) network and Random Walk with Restart (RWR) analyses were applied to obtain genes associated with the core genes, and key pathways related to AD were further analyzed. Overall, this study contributes to a deeper understanding of AD pathogenesis and provides theoretical guidance for related research and experiments.
Keywords: Alzheimer’s disease; IFS; KNN model; feature selection; protein-protein interaction network
Year: 2021 PMID: 33968940 PMCID: PMC8101499 DOI: 10.3389/fcell.2021.668738
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
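The incremental feature selection (IFS) step described in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking is already available (the study obtained its ranking from mRMR, which is not reimplemented here), uses a plain-Python KNN with leave-one-out evaluation, and scores each top-n subset by MCC. The synthetic data, the value k = 3, the leave-one-out scheme, and all function names are assumptions made for illustration; none of them are taken from the study.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = sum(label for _, label in nearest[:k])
    return 1 if votes * 2 > k else 0


def loo_mcc(X, y, features, k=3):
    """Leave-one-out MCC of a KNN classifier restricted to the given feature columns."""
    sub = [[row[j] for j in features] for row in X]
    preds = []
    for i in range(len(sub)):
        train_X = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, sub[i], k))
    return mcc(y, preds)


def incremental_feature_selection(X, y, ranked_features, k=3):
    """Score the top-1, top-2, ... subsets; return (best subset, best MCC, MCC curve).

    Ties are resolved in favour of the smallest subset.
    """
    curve = [loo_mcc(X, y, ranked_features[:n], k)
             for n in range(1, len(ranked_features) + 1)]
    best_n = max(range(len(curve)), key=curve.__getitem__) + 1
    return ranked_features[:best_n], curve[best_n - 1], curve


# Synthetic demo data (an assumption): feature 0 separates the classes, the rest is noise.
random.seed(0)
X = [[random.gauss(3.0 * label, 1.0)] + [random.gauss(0.0, 1.0) for _ in range(3)]
     for label in (0, 1) for _ in range(20)]
y = [label for label in (0, 1) for _ in range(20)]

ranked = [0, 1, 2, 3]  # hypothetical ranking standing in for the mRMR output
selected, best, curve = incremental_feature_selection(X, y, ranked)
print("selected features:", selected, "MCC:", round(best, 3))
```

The list `curve` plays the role of one IFS curve: repeating the loop with other classifiers and comparing the peaks corresponds to choosing the best classifier by MCC.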
FIGURE 1. Overall workflow of this study.
FIGURE 2. IFS and ROC analyses. (A) IFS curves of the SVM, RF, and KNN classifiers. Black curves indicate SVM classifiers, blue curves indicate RF classifiers, and red curves indicate KNN classifiers. (B) ROC curve of the KNN classifier.
FIGURE 3. PCA and heatmap analysis based on the feature genes in the KNN classifier. (A) PCA showing the diagnostic efficiency of the KNN classifier in ND and AD populations. (B) Heatmap showing expression of the feature genes in the KNN classifier in ND and AD populations. Red indicates high expression and green indicates low expression.
FIGURE 4. Core gene selection and functional enrichment analysis. (A) Venn diagram used to select core genes from the overlap between DEGs and the feature genes in the KNN classifier. (B,C) Results of GO and KEGG enrichment analyses. Dot size indicates the number of genes enriched in the corresponding term; dot color represents the significance of the corresponding term. (D) Expression of the core genes (HSPB3, AEBP1, RNU1G2) in ND (green) and AD (red) populations.
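The Random Walk with Restart (RWR) analysis mentioned in the abstract, which ranks network nodes by their proximity to a set of seed genes, can be sketched as follows. This is a generic plain-Python version of the iteration p ← r·p0 + (1 − r)·W·p on an undirected, unweighted graph. The restart probability of 0.7, the toy path graph, and the node labels are assumptions for illustration and do not reflect the study's PPI network or parameters.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-12, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj   : dict mapping each node to the set of its neighbours (undirected graph)
    seeds : nodes the walker restarts from, with equal probability
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction `restart` of the mass jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            move = (1.0 - restart) * p[n]
            if adj[n]:
                # Spread the remaining mass evenly over the neighbours.
                share = move / len(adj[n])
                for m in adj[n]:
                    nxt[m] += share
            else:
                # Isolated node: keep the mass in place so probabilities sum to 1.
                nxt[n] += move
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path graph S - A - B - C with a single seed S (hypothetical labels).
graph = {"S": {"A"}, "A": {"S", "B"}, "B": {"A", "C"}, "C": {"B"}}
scores = random_walk_with_restart(graph, seeds={"S"})
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Nodes with higher scores lie closer to the seeds in the network, so a score threshold can be used to pick seed-associated nodes; the choice of threshold is left open here.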