Svetlana Cherlin, Darren Plant, John C Taylor, Marco Colombo, Athina Spiliopoulou, Evan Tzanis, Ann W Morgan, Michael R Barnes, Paul McKeigue, Jennifer H Barrett, Costantino Pitzalis, Anne Barton, MATURA Consortium, Heather J Cordell.
Abstract
Although a number of treatments are available for rheumatoid arthritis (RA), each of them shows a significant nonresponse rate in patients. Therefore, predicting a priori the likelihood of treatment response would be of great patient benefit. Here, we conducted a comparison of a variety of statistical methods for predicting three measures of treatment response, between baseline and 3 or 6 months, using genome-wide SNP data from RA patients available from the MAximising Therapeutic Utility in Rheumatoid Arthritis (MATURA) consortium. Two different treatments and 11 different statistical methods were evaluated. We used 10-fold cross validation to assess predictive performance, with nested 10-fold cross validation used to tune the model hyperparameters when required. Overall, we found that SNPs added very little prediction information to that obtained using clinical characteristics only, such as baseline trait value. This observation can be explained by the lack of strong genetic effects and the relatively small sample sizes available; in analysis of simulated and real data, with larger effects and/or larger sample sizes, prediction performance was much improved. Overall, methods that were consistent with the genetic architecture of the trait were able to achieve better predictive ability than methods that were not. For treatment response in RA, methods that assumed a complex underlying genetic architecture achieved slightly better prediction performance than methods that assumed a simplified genetic architecture.
Keywords: cross validation; prediction; SNP data; treatment response
Year: 2018 PMID: 30311271 PMCID: PMC6334178 DOI: 10.1002/gepi.22159
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
The number of SNPs included in the analysis and the tuning parameters that require cross validation, for the MATURA data sets, for the 11 methods.
| Method | Number of SNPs for anti‐TNF cohort | Number of SNPs for MTX cohort | Tuning parameters that require cross validation |
|---|---|---|---|
| Lasso | 40,000 | 42,000 | Penalty parameter |
| Elastic Net | 40,000 | 42,000 | Penalty parameter |
| Ridge | 40,000 | 42,000 | Penalty parameter |
| RF | 9,000 | 9,000 | Number of variables to split |
| SVR | 9,000 | 9,000 | Standard deviation for Gaussian RBF kernel |
| SPLS | 40,000 | 42,000 | Number of components, sparsity parameter |
| GCTA‐GREML | 4,542,024 | 6,291,430 | NA |
| PRSice | | | |
| BSLMM | 40,000 | 42,000 | NA |
| SkyNet | 9,000 | 9,000 | NA |
| LDpred | 340,000 | 370,000 | NA |
Note. The prediction is based on the SNP effects only. For the PBC data set, we use a support vector machine (SVM) instead of SVR because the outcome is binary.
For PRSice, LD clumping is performed within the software, resulting in ≈ 140,000 SNPs for the anti‐TNF cohort and ≈ 170, SNPs for the MTX cohort.
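The abstract describes 10-fold cross validation for assessing predictive performance, with a nested 10-fold loop for tuning parameters such as the penalty parameter listed in the table. The sketch below illustrates that structure only. It is not the authors' pipeline: it assumes NumPy, uses a closed-form ridge regression as a stand-in model, and every name in it (`ridge_fit`, `make_folds`, `nested_cv_predict`, the penalty grid, the toy data) is hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ beta, beta

def ridge_predict(model, X):
    intercept, beta = model
    return intercept + X @ beta

def make_folds(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)

def cv_mse(X, y, lam, folds):
    """Cross-validated mean squared error of ridge with penalty lam."""
    sq_err = 0.0
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        model = ridge_fit(X[train], y[train], lam)
        sq_err += np.sum((y[test_idx] - ridge_predict(model, X[test_idx])) ** 2)
    return sq_err / len(y)

def nested_cv_predict(X, y, lambdas, k_outer=10, k_inner=10, seed=0):
    """Out-of-fold predictions; the penalty is tuned inside each outer training set."""
    rng = np.random.default_rng(seed)
    preds = np.empty(len(y))
    for test_idx in make_folds(len(y), k_outer, rng):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        X_tr, y_tr = X[train], y[train]
        # Inner loop: pick the penalty using the outer training data only,
        # so the held-out outer fold never influences the tuning.
        inner_folds = make_folds(len(y_tr), k_inner, rng)
        scores = [cv_mse(X_tr, y_tr, lam, inner_folds) for lam in lambdas]
        best_lam = lambdas[int(np.argmin(scores))]
        preds[test_idx] = ridge_predict(ridge_fit(X_tr, y_tr, best_lam), X[test_idx])
    return preds

# Toy data standing in for genotype dosages (0/1/2) and a continuous response.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X[:, :5] @ np.array([0.5, -0.4, 0.3, 0.3, -0.2]) + rng.normal(size=200)
preds = nested_cv_predict(X, y, lambdas=[0.1, 1.0, 10.0, 100.0])
```

Because each observation is predicted by a model whose penalty was chosen without seeing it, `preds` can be compared with `y` to estimate out-of-sample performance without optimistic bias from the tuning step.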
Figure 1. Pearson correlation coefficient from the prediction analyses for the 11 methods for all the data sets [Color figure can be viewed at wileyonlinelibrary.com]
Figure 2. Calibration slope (a slope of 1 suggests perfect calibration) from the prediction analyses for the 11 methods for all the data sets
Figure 3. Prediction mean squared error (PMSE; lower values indicate better fit) from the prediction analyses for the 11 methods for all the data sets
Figure 4. Prediction with lasso for the SimSparse data set. The black dashed line is the equality line; the red dashed line is the best fit line
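The three continuous-outcome measures named in the figure captions above can be computed from paired observed and predicted values roughly as follows. This is a minimal sketch assuming NumPy. It also assumes that the calibration slope is the slope of a least-squares regression of observed on predicted values, which is a common convention but is not stated in this record. The function name `prediction_metrics` is hypothetical.

```python
import numpy as np

def prediction_metrics(observed, predicted):
    """Return Pearson correlation, calibration slope, and PMSE for paired values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    cor = np.corrcoef(observed, predicted)[0, 1]
    # Assumed convention: regress observed on predicted; a slope of 1 means
    # the predictions are neither over- nor under-dispersed.
    slope, _intercept = np.polyfit(predicted, observed, 1)
    pmse = np.mean((observed - predicted) ** 2)
    return {"cor": cor, "slope": slope, "pmse": pmse}

# Predictions that are perfectly correlated with the outcome but twice as spread out.
metrics = prediction_metrics([1, 2, 3, 4], [2, 4, 6, 8])
```

The toy call shows why the measures are reported together: predictions can track the outcome perfectly in rank and still be poorly calibrated, with a large squared error.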
The number of responders and nonresponders after the transformation of the phenotype to the binary format for the MATURA data sets
| Treatment | Phenotype | Responders | Nonresponders | Total |
|---|---|---|---|---|
| Anti‐TNF | CRP | 192 | 896 | 1,088 |
| Anti‐TNF | SJC28 | 660 | 1,122 | 1,782 |
| Anti‐TNF | ESR | 513 | 1,062 | 1,575 |
| MTX | CRP | 144 | 474 | 618 |
| MTX | SJC28 | 161 | 468 | 629 |
Figure 5. Area under the curve (AUC) from the prediction analyses for the 11 methods for all the data sets, after transforming the phenotype to a binary format
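For a binary responder/nonresponder phenotype, the AUC equals the probability that a randomly chosen responder receives a higher prediction score than a randomly chosen nonresponder. The sketch below computes it directly from that definition. It assumes that responders are coded 1, nonresponders 0, and that higher scores indicate response; none of these conventions is stated in this record, and the function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """AUC as the fraction of (responder, nonresponder) pairs ranked correctly.

    Ties between a responder and a nonresponder count as half a correct pair.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one responder and one nonresponder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two responders and two nonresponders; one responder is ranked below a nonresponder.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the sample size, which is fine for cohorts of the size tabulated above; a rank-based formula gives the same value more efficiently for larger data.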
Pearson correlation coefficient (Cor.), the calibration slope (a slope of 1 suggests perfect calibration), the prediction mean squared error (PMSE; lower values indicate better fit), and area under the curve (AUC) for the anti‐TNF data sets, for the three methods
| Data Set | Method | Cor. | Slope | PMSE | AUC |
|---|---|---|---|---|---|
| Anti‐TNF (CRP) | Lasso | 0.44 | 0.98 | 1.27 | 0.81 |
| Anti‐TNF (CRP) | BSLMM | 0.43 | 0.91 | 1.29 | 0.8 |
| Anti‐TNF (CRP) | SkyNet | 0.34 | 0.5 | 1.57 | 0.74 |
| Anti‐TNF (SJC28) | Lasso | 0.77 | 0.99 | 17.63 | 0.73 |
| Anti‐TNF (SJC28) | BSLMM | 0.76 | 0.97 | 18.7 | 0.73 |
| Anti‐TNF (SJC28) | SkyNet | 0.7 | 0.81 | 23.45 | 0.71 |
Note. The prediction is based on the SNP effects and the covariates.