Xia Ma, Yanping Wu, Ling Zhang, Weilan Yuan, Li Yan, Sha Fan, Yunzhi Lian, Xia Zhu, Junhui Gao, Jiangman Zhao, Ping Zhang, Hui Tang, Weihua Jia.
Abstract
BACKGROUND: Chronic obstructive pulmonary disease (COPD) is a major public health problem and a leading cause of mortality worldwide. However, early-stage COPD often goes unrecognized and undiagnosed. A risk model is therefore needed to predict COPD development.
Keywords: AQCI; Allele frequencies; COPD; Machine learning tools; SNP
Year: 2020 PMID: 32234053 PMCID: PMC7110698 DOI: 10.1186/s12967-020-02312-0
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Flow chart display. Flow chart showing the SNP selection, model training, and performance evaluation processes. A total of 633 subjects were recruited for the current study. The data were preprocessed and randomly divided into a training set (393 participants) and a test set (240 participants). k-fold cross-validation was applied within the training set, and metrics such as AU-ROC and AU-PRC were used to assess the average predictive performance of each model.
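The split and cross-validation scheme in Fig. 1 can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the value `k=5`, the seed, and the helper names `train_test_split_indices` and `k_fold_indices` are assumptions, since the source states only the set sizes (393 and 240) and that k-fold cross-validation was used.

```python
import random


def train_test_split_indices(n_subjects, n_train, seed=0):
    """Shuffle subject indices and split them into training and test lists."""
    indices = list(range(n_subjects))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k):
    """Yield (fit, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# 633 subjects, 393 for training and 240 for testing, as in Fig. 1.
train_idx, test_idx = train_test_split_indices(633, 393, seed=42)

# k = 5 is an assumed value; the source does not report k.
cv_folds = list(k_fold_indices(train_idx, k=5))

print(len(train_idx), len(test_idx))
print([len(validation) for _, validation in cv_folds])
```

The test indices are never touched during cross-validation, which is what allows the test-set table to serve as an independent check on the models selected in the training set.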
Fig. 2 Nine SNPs associated with COPD. Forest plots show the 9 SNPs associated with COPD, with odds ratios (ORs) and 95% confidence intervals (CIs). ORs are denoted by black boxes, and 95% CIs by the corresponding black lines.
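For readers unfamiliar with forest plots, the sketch below shows one common way to obtain an OR and its 95% CI from a 2×2 allele-count table, using the Wald interval on the log odds ratio. The source does not state how its intervals were computed, so this method is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=60, b=40, c=30, d=70)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1 corresponds to a line in the forest plot that does not cross the vertical no-effect line.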
Fig. 3 Evaluation of the predictive models. a, b AU-ROC and AU-PRC curves of the 6 models in the training set. Mean AUC values and 95% CIs of the prediction models are shown in the box.
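As a reminder of what AU-ROC measures, it equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name `au_roc`, the labels, and the scores are illustrative assumptions, not data or code from the study.

```python
def au_roc(labels, scores):
    """AU-ROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = case, 0 = control)
    scores: iterable of predicted scores, higher meaning more likely a case
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
example_auc = au_roc(example_labels, example_scores)
print(round(example_auc, 3))
```

The pairwise form is quadratic in the number of subjects, which is acceptable for a cohort of a few hundred; rank-based implementations are preferable for larger data.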
The efficacy of KNN, LR, SVM, DT, MLP and XGboost in the training set
| Metrics | KNN (95% CI) | LR (95% CI) | SVM (95% CI) | DT (95% CI) | MLP (95% CI) | XGboost (95% CI) |
|---|---|---|---|---|---|---|
| AU-ROC | 0.83 (0.76–0.86) | 0.89 (0.83–0.93) | 0.88 (0.84–0.91) | 0.85 (0.77–0.88) | 0.80 (0.74–0.84) | 0.94 (0.89–0.98) |
| AU-PRC | 0.93 (0.90–0.94) | 0.96 (0.93–0.98) | 0.95 (0.89–0.96) | 0.95 (0.94–0.97) | 0.92 (0.89–0.94) | 0.97 (0.93–0.99) |
| Accuracy | 0.82 (0.77–0.86) | 0.83 (0.77–0.86) | 0.84 (0.82–0.88) | 0.89 (0.84–0.92) | 0.76 (0.74–0.79) | 0.91 (0.88–0.95) |
| Precision | 0.88 (0.83–0.92) | 0.86 (0.83–0.89) | 0.88 (0.84–0.91) | 0.92 (0.89–0.95) | 0.76 (0.74–0.79) | 0.95 (0.93–0.96) |
| Recall | 0.88 (0.85–0.90) | 0.91 (0.85–0.96) | 0.92 (0.90–0.95) | 0.94 (0.92–0.98) | 0.99 (0.98–1.00) | 0.93 (0.88–0.97) |
| F1 score | 0.88 (0.84–0.90) | 0.89 (0.84–0.91) | 0.90 (0.88–0.92) | 0.93 (0.91–0.96) | 0.86 (0.85–0.88) | 0.94 (0.91–0.96) |
| MCC | 0.54 (0.39–0.64) | 0.54 (0.39–0.61) | 0.58 (0.51–0.69) | 0.70 (0.57–0.79) | 0.22 (0.10–0.40) | 0.77 (0.70–0.86) |
| SPC | 0.67 (0.53–0.80) | 0.59 (0.48–0.71) | 0.63 (0.50–0.76) | 0.79 (0.72–0.85) | 0.10 (0.05–0.26) | 0.85 (0.81–0.90) |
| NPV | 0.66 (0.56–0.71) | 0.72 (0.56–0.83) | 0.74 (0.69–0.79) | 0.81 (0.70–0.98) | 0.87 (0.54–1.00) | 0.81 (0.73–0.89) |
AU-ROC area under the receiver operating characteristic curve, AU-PRC area under the precision-recall curve, MCC Matthews correlation coefficient, SPC specificity, NPV negative predictive value, KNN k-nearest neighbors classifier, LR logistic regression, SVM support vector machine, DT decision tree, MLP multilayer perceptron, 95% CI 95% confidence interval
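The threshold-based metrics in the table above all derive from the four cells of a confusion matrix. The sketch below uses the standard definitions; the function name `confusion_metrics` and the counts are hypothetical and are not taken from the study.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute the table's threshold-based metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, except MCC, which falls back to 0.0.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)       # SPC
    npv = tn / (tn + fn)               # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "specificity": specificity,
        "npv": npv,
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=80, fp=20, tn=30, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Because MCC uses all four cells, it penalizes a model that reaches high recall by labeling nearly everyone positive. That is consistent with the MLP column, where recall is 0.99 but specificity is 0.10 and MCC is 0.22.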
Fig. 4 Analysis of the importance of each feature. The histogram describes the relative importance of 9 SNPs and 5 clinical features in the XGboost model. The relative importance is quantified by assigning a weight between 0 and 1000 to each variable.
Fig. 5 Validation of the training set. a, b AU-ROC and AU-PRC curves of all models in the test set.
The efficacy of KNN, LR, SVM, DT, MLP and XGboost in the test set
| Metrics | KNN | LR | SVM | DT | MLP | XGboost |
|---|---|---|---|---|---|---|
| AU-ROC | 0.81 | 0.82 | 0.78 | 0.73 | 0.79 | 0.83 |
| AU-PRC | 0.88 | 0.86 | 0.81 | 0.87 | 0.81 | 0.88 |
| Accuracy | 0.81 | 0.81 | 0.78 | 0.78 | 0.77 | 0.80 |
| Precision | 0.82 | 0.80 | 0.77 | 0.76 | 0.73 | 0.79 |
| Recall | 0.91 | 0.93 | 0.93 | 0.95 | 1.00 | 0.93 |
| F1 score | 0.86 | 0.86 | 0.84 | 0.85 | 0.85 | 0.85 |
| MCC | 0.59 | 0.58 | 0.52 | 0.53 | 0.53 | 0.56 |
| SPC | 0.65 | 0.60 | 0.53 | 0.51 | 0.38 | 0.57 |
| NPV | 0.81 | 0.84 | 0.82 | 0.85 | 1.00 | 0.84 |
AU-ROC area under the receiver operating characteristic curve, AU-PRC area under the precision-recall curve, MCC Matthews correlation coefficient, SPC specificity, NPV negative predictive value, KNN k-nearest neighbors classifier, LR logistic regression, SVM support vector machine, DT decision tree, MLP multilayer perceptron
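One way to read the two tables together is to subtract each model's test-set AU-ROC from its training-set AU-ROC. The sketch below does this with the values reported above; the helper name `generalization_gap` is an assumption. The training figures are cross-validation means, so the differences are only a rough indication of how much performance dropped on unseen data.

```python
# AU-ROC values copied from the training-set and test-set tables.
train_auroc = {"KNN": 0.83, "LR": 0.89, "SVM": 0.88,
               "DT": 0.85, "MLP": 0.80, "XGboost": 0.94}
test_auroc = {"KNN": 0.81, "LR": 0.82, "SVM": 0.78,
              "DT": 0.73, "MLP": 0.79, "XGboost": 0.83}


def generalization_gap(train, test):
    """Return training minus test score for every model present in `train`."""
    return {model: train[model] - test[model] for model in train}


gaps = generalization_gap(train_auroc, test_auroc)

# List models from the largest drop to the smallest.
for model, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {gap:+.2f}")
```

By this measure the decision tree shows the largest drop, and XGboost, while still the top-ranked model by test-set AU-ROC, also loses about 0.11.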