Yazheng Di, Jingying Wang, Xiaoqian Liu, Tingshao Zhu.
Abstract
Background: The application of polygenic risk scores (PRSs) in major depressive disorder (MDD) detection is constrained by its simplicity and uncertainty. One promising way to further extend its usability is fusion with other biomarkers. This study constructed an MDD biomarker by combining the PRS and voice features and evaluated their ability based on large clinical samples.
Keywords: biomarkers; computer technology; depression; major depressive disorder (MDD); polygenic risk score (PRS); voice biomarkers
Year: 2021 PMID: 34987547 PMCID: PMC8721147 DOI: 10.3389/fgene.2021.761141
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Fivefold cross-validation of voice–gene data. In each fold, the samples were split into a training group and a test group. Voice and genetic sequence data of the training group were used to train the universal background model (UBM) and linear mixed model (LMM) separately. Then, i-vectors for the training and test groups were extracted through the UBM, and the polygenic risk score (PRS) was calculated through the LMM. The i-vectors and PRS were concatenated as input features for a machine learning (ML) model.
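The fold construction and feature concatenation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names `kfold_indices` and `concat_features` are hypothetical, the shuffling seed is an assumption, and the UBM and LMM training steps are omitted.

```python
import random

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that partition range(n_samples) into k folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # assumption: samples are shuffled once before splitting
    folds = [order[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

def concat_features(ivector, prs):
    """Append the scalar PRS to the voice i-vector to form one feature row."""
    return list(ivector) + [prs]

# Toy usage: 10 samples and a 3-dimensional stand-in i-vector (the caption for Figure 2 gives d = 400).
splits = list(kfold_indices(10, k=5))
row = concat_features([0.1, -0.2, 0.3], prs=1.5)
```

In a full pipeline, any model that learns from data (here the UBM and LMM) would be fitted on `train_idx` only, so that the test group never influences feature extraction.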
FIGURE 2. Process of i-vector extraction. UBM-GMM is a universal background model adapted by a Gaussian mixture model. n = 256 means there were 256 Gaussian mixture clusters. d = 400 means the dimension of i-vectors is 400.
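For orientation, i-vector extraction is commonly written as a total variability model. The formulation below is the standard one from the speaker-recognition literature and is assumed here, since the source does not state it; the symbols M, m, T, w and the per-frame acoustic feature dimension F are not from the source.

```latex
\[
  M = m + T\,w, \qquad w \sim \mathcal{N}(0, I_d), \qquad d = 400
\]
\[
  m \in \mathbb{R}^{nF}, \qquad T \in \mathbb{R}^{nF \times d}, \qquad n = 256
\]
```

Here M is the utterance-dependent GMM mean supervector, m is the UBM mean supervector (the n component means stacked), T is the low-rank total variability matrix, and the i-vector is the posterior mean estimate of w for a given utterance.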
Number of SNPs selected on different p-value thresholds (PTs)
| PT | 5E−08 | 1E−06 | 1E−5 | 5E−5 | 1E−4 | 5E−4 | 1E−3 | 5E−3 |
|---|---|---|---|---|---|---|---|---|
| Number of SNPs | 3 | 5 | 11 | 44 | 79 | 321 | 580 | 2,350 |
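As a rough illustration of how a p-value threshold controls the number of SNPs entering a score, consider the sketch below. It uses a plain thresholded weighted sum of allele dosages, which is a common simplified stand-in and not the LMM-based PRS described in the source. The function names, the strict `<` comparison, and all toy values are assumptions.

```python
def count_selected(pvalues, thresholds):
    """Count how many SNPs have an association p-value below each threshold."""
    return {t: sum(p < t for p in pvalues) for t in thresholds}

def weighted_score(dosages, betas, pvalues, threshold):
    """Sum effect size times allele dosage over SNPs passing the threshold."""
    return sum(b * d for d, b, p in zip(dosages, betas, pvalues) if p < threshold)

# Toy data (hypothetical): five SNPs for one individual.
pvalues = [1e-9, 3e-7, 2e-4, 8e-4, 0.02]
betas = [0.5, -0.2, 0.1, 0.3, 0.4]
dosages = [2, 1, 0, 1, 2]

counts = count_selected(pvalues, [5e-8, 1e-3])
score = weighted_score(dosages, betas, pvalues, 1e-3)
```

Because each looser threshold admits every SNP that passed a stricter one, the counts can only grow from left to right, which matches the pattern in the table.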
FIGURE 3. Polygenic risk score (PRS) model prediction results with different p-value thresholds (PTs) under different covariate use strategies. no-cov, no covariates were considered during the training and prediction processes; all-cov, all covariates were considered during the training and prediction processes; random-cov, the PRS model was trained with a sample genetic matrix along with covariates, but made predictions on samples whose covariates were replaced with random numbers. AUC, Area under the receiver operating characteristic curve.
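The three covariate strategies differ only in what the model sees at prediction time, which can be sketched as below. The function `prediction_inputs` is hypothetical, the toy covariates are invented for illustration, and drawing the replacement values from a standard normal distribution is an assumption: the source says only "random numbers".

```python
import random

def prediction_inputs(genotypes, covariates, strategy, seed=0):
    """Build per-sample prediction inputs under one covariate strategy."""
    if strategy == "no-cov":
        return [list(g) for g in genotypes]
    if strategy == "all-cov":
        return [list(g) + list(c) for g, c in zip(genotypes, covariates)]
    if strategy == "random-cov":
        rng = random.Random(seed)
        # Assumption: standard-normal noise replaces each true covariate value.
        return [list(g) + [rng.gauss(0.0, 1.0) for _ in c]
                for g, c in zip(genotypes, covariates)]
    raise ValueError(f"unknown strategy: {strategy}")

genotypes = [[0, 1, 2], [2, 2, 0]]       # toy allele dosages
covariates = [[34.0, 1.0], [51.0, 0.0]]  # toy covariates (hypothetical)

no_cov = prediction_inputs(genotypes, covariates, "no-cov")
all_cov = prediction_inputs(genotypes, covariates, "all-cov")
random_cov = prediction_inputs(genotypes, covariates, "random-cov")
```

Under random-cov the genetic columns are untouched, so any remaining predictive power must come from the genetic signal and not from the covariates.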
FIGURE 4. Prediction results with different p-value thresholds (PTs) using different biomarkers. The x-axis is the p-value threshold (PT) used in the gene model and the combined biomarkers. Voice biomarkers are not related to PT and are indicated by a dashed horizontal line. AUC, Area under the receiver operating characteristic curve.
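The AUC reported in these figures can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. A minimal pairwise sketch of that definition follows; the function name `auc` and the label coding (1 = case, 0 = control) are assumptions, and the quadratic loop is chosen for clarity over speed.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of case and control scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    # A case scoring above a control counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 case-control pairs ordered correctly
```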
FIGURE 5. Stratified population accuracy using different biomarkers. The test samples were divided into three groups according to their predicted polygenic risk scores (PRSs). Accuracies were calculated for the three groups separately.
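One way to compute stratified accuracy of this kind is sketched below. The source does not say how the three groups were bounded, so splitting the PRS-ranked samples into near-equal thirds is an assumption, as are the function name and the toy data.

```python
def stratified_accuracy(prs, y_true, y_pred, n_groups=3):
    """Accuracy within each PRS-ranked group, from lowest to highest PRS."""
    if len(prs) < n_groups:
        raise ValueError("need at least one sample per group")
    order = sorted(range(len(prs)), key=lambda i: prs[i])
    size, extra = divmod(len(order), n_groups)
    accuracies, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)  # spread any remainder over the first groups
        group = order[start:end]
        correct = sum(y_true[i] == y_pred[i] for i in group)
        accuracies.append(correct / len(group))
        start = end
    return accuracies

# Toy data (hypothetical): six test samples.
prs = [0.1, 0.9, 0.5, 0.2, 0.8, 0.6]
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0]
group_acc = stratified_accuracy(prs, y_true, y_pred)
```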
Classification results using different machine learning (ML) models
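| Model | Gene (PT = 0.001) + voice: AUC | Gene (PT = 0.001) + voice: Sensitivity | Gene (PT = 0.001) + voice: Specificity | Gene (PT = 0.005) + voice: AUC | Gene (PT = 0.005) + voice: Sensitivity | Gene (PT = 0.005) + voice: Specificity |
|---|---|---|---|---|---|---|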
| LR | 0.79 | 0.79 | 0.78 | 0.83 | 0.83 | 0.83 |
| SVM | 0.79 | 0.80 | 0.78 | 0.83 | 0.83 | 0.83 |
| RF | 0.74 | 0.70 | 0.77 | 0.80 | 0.78 | 0.81 |
| MLP | 0.81 | 0.83 | 0.79 | 0.86 | 0.87 | 0.85 |
PT, p-value threshold; AUC, area under the receiver operating characteristic curve; LR, logistic regression; SVM, support vector machine; RF, random forest; MLP, multilayer perceptron
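The sensitivity and specificity columns follow from the confusion counts of the binary predictions. A minimal sketch, assuming labels are coded 1 for MDD cases and 0 for controls (the coding and the function name are not from the source):

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = case, 0 = control."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)

# Toy data (hypothetical): three cases and two controls.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Sensitivity is the fraction of cases correctly flagged and specificity is the fraction of controls correctly cleared, so both depend on the decision threshold applied to the model output, while AUC does not.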