Wen Ma, Yu-Lung Lau, Wanling Yang, Yong-Fei Wang.
Abstract
Patients with systemic lupus erythematosus (SLE) present varied clinical manifestations, posing a diagnostic challenge for physicians. Genetic factors substantially contribute to SLE development. A polygenic risk scoring (PRS) model has been used to estimate the genetic risk of SLE in individuals. However, this approach assumes independent and additive contributions of genetic variants to disease development. We aimed to improve the accuracy of SLE prediction using machine-learning algorithms. We applied random forest (RF), support vector machine (SVM), and artificial neural network (ANN) to classify SLE cases and controls using the data from our previous genome-wide association studies (GWAS) conducted in either Chinese or European populations, including a total of 19,208 participants. The overall performance of these predictors was assessed by the area under the receiver operating characteristic curve (AUC). The analyses in the Chinese GWAS showed that the RF model significantly outperformed other predictors, achieving a mean AUC value of 0.84, a 13% improvement upon the PRS model (AUC = 0.74). At the optimal cut-off, the RF predictor reached a sensitivity of 84% with a specificity of 68% in SLE classification. To validate these results, similar analyses were repeated in the European GWAS, and the RF model consistently outperformed other algorithms. Our study suggests that the RF model could be an additional and powerful predictor for early diagnosis of SLE.
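The evaluation quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the variant weights, genotype dosages, and function names are hypothetical toy values, and the "optimal cut-off" is assumed here to be the threshold maximizing Youden's J (sensitivity + specificity - 1), a criterion the abstract does not specify. The additive score shows the assumption the abstract attributes to PRS: each variant contributes independently, as weight times risk-allele dosage.

```python
from typing import Sequence, Tuple


def additive_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of weight * risk-allele dosage (0, 1, or 2)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * d for w, d in zip(weights, dosages))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of case/control pairs where the case scores higher (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J (an assumed criterion).

    Samples with score >= cutoff are classified as cases; the lowest such cutoff wins ties.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Hypothetical toy data: three variants, three cases (label 1), three controls (label 0).
toy_weights = [0.5, 0.25, 1.0]
toy_genotypes = [[2, 1, 1], [1, 2, 0], [0, 0, 2], [0, 1, 0], [1, 0, 0], [0, 2, 1]]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [additive_prs(g, toy_weights) for g in toy_genotypes]

if __name__ == "__main__":
    print("AUC:", auc(toy_scores, toy_labels))
    print("cutoff, sensitivity, specificity:", best_cutoff(toy_scores, toy_labels))
```

The same `auc` and `best_cutoff` helpers apply unchanged to the predicted probabilities of any classifier, which is how an RF, SVM, or ANN predictor could be compared with a PRS on equal terms.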
Keywords: SLE early detection; machine learning; polygenic risk score; random forests; systemic lupus erythematosus (SLE)
Year: 2022 PMID: 36046232 PMCID: PMC9421562 DOI: 10.3389/fgene.2022.902793
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Performances of the random forests (RF; (A)), support vector machine (SVM; (B)), artificial neural network (ANN; (C)), and the lassosum-based polygenic risk scoring (PRS) model (D) in predicting the development of SLE in the Chinese population. The dashed lines indicate the performance of each repeat. The solid lines indicate the performance averaged over the four repeats.
FIGURE 2. Performances of the random forests (RF; (A)), support vector machine (SVM; (B)), artificial neural network (ANN; (C)), and the lassosum-based polygenic risk scoring (PRS) model (D) in predicting the development of SLE in the European population. The dashed lines indicate the performance of each repeat. The solid lines indicate the performance averaged over the four repeats.
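The captions describe per-repeat curves and an averaged curve but do not say how the average was formed. One common option, assumed here purely for illustration, is vertical averaging: each repeat's ROC curve is interpolated onto a shared grid of false-positive rates, and the true-positive rates are averaged at each grid point. The function names and toy scores below are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false-positive rate, true-positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """ROC points obtained by lowering the threshold through the scores in descending order."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Tied scores are consumed together so they produce a single ROC point.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def tpr_at(points: Sequence[Point], fpr: float) -> float:
    """Linearly interpolated TPR at a given FPR; at a vertical step the highest TPR is used."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                y = max(y0, y1)
            else:
                y = y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def mean_roc(repeats: Sequence[Tuple[Sequence[float], Sequence[int]]],
             grid_size: int = 101) -> List[Point]:
    """Average the TPR of every repeat at each FPR on a shared, evenly spaced grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    curves = [roc_points(s, y) for s, y in repeats]
    return [(x, sum(tpr_at(c, x) for c in curves) / len(curves)) for x in grid]


def trapezoid_auc(points: Sequence[Point]) -> float:
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy repeats (scores, labels); the study itself reports four repeats.
toy_repeats = [
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),
    ([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]),
]

if __name__ == "__main__":
    for scores, labels in toy_repeats:
        print("repeat AUC:", trapezoid_auc(roc_points(scores, labels)))
    print("averaged curve:", mean_roc(toy_repeats, grid_size=3))
```

Because `tpr_at` takes the upper end of each vertical step, the area under the averaged curve can slightly exceed the mean of the per-repeat AUCs on coarse grids; reporting the mean of per-repeat AUC values alongside the averaged curve avoids that ambiguity.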