Abstract
Risk prediction models have been developed in many contexts to classify individuals according to a single outcome, such as risk of a disease. Emerging "-omic" biomarkers provide panels of features that can simultaneously predict multiple outcomes from a single biological sample, creating issues of multiplicity reminiscent of exploratory hypothesis testing. Here I propose definitions of some basic criteria for evaluating prediction models of multiple outcomes. I define calibration in the multivariate setting and then distinguish between outcome-wise and individual-wise prediction, and within the latter between joint and panel-wise prediction. I give examples such as screening and early detection in which different senses of prediction may be more appropriate. In each case I propose definitions of sensitivity, specificity, concordance, positive and negative predictive value and relative utility. I link the definitions through a multivariate probit model, showing that the accuracy of a multivariate prediction model can be summarised by its covariance with a liability vector. I illustrate the concepts on a biomarker panel for early detection of eight cancers, and on polygenic risk scores for six common diseases.
Keywords: Risk prediction; biomarkers; multiplicity; multivariate analysis; polygenic risk score; screening
Year: 2020 PMID: 32594841 PMCID: PMC7682512 DOI: 10.1177/0962280220929039
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
Figure 1. Example outcomes in eight individuals. Outcomes predicted to occur are shown in black on the left panel. Outcomes that did occur are shown in black on the right panel. Ticks show individuals counting in the numerator for each sense of sensitivity. Here the sample joint sensitivity is 1/2, the screening sensitivity is 4/7, and the panel-wise sensitivity is 3/7. The outcome-wise sensitivity is 7/16.
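The counting behind sample sensitivities of this kind can be sketched in a few lines of Python. The definitions in the code are assumptions inferred from the denominators in the caption, not the article's formal definitions: outcome-wise sensitivity pools all outcomes that occurred, while the screening and panel-wise senses are taken over individuals with at least one outcome, counting those with at least one predicted outcome and those with at least one correctly predicted outcome respectively. Joint sensitivity is omitted because its definition cannot be recovered from the caption. The function name and the toy data are hypothetical and are not the data of Figure 1.

```python
from fractions import Fraction


def sample_sensitivities(predicted, occurred):
    """Sample sensitivities from 0/1 matrices (rows: individuals, columns: outcomes).

    ASSUMED definitions, inferred from the figure caption only:
      outcome_wise: predicted-and-occurred outcomes / all occurred outcomes
      screening:    individuals with >= 1 predicted outcome, among those with >= 1 outcome
      panel_wise:   individuals with >= 1 correctly predicted outcome, among the same group
    Raises ZeroDivisionError if no outcome occurred at all.
    """
    pairs = list(zip(predicted, occurred))
    n_occurred = sum(sum(o) for _, o in pairs)
    n_true_pos = sum(p_j * o_j for p, o in pairs for p_j, o_j in zip(p, o))

    # Individuals in whom at least one outcome occurred.
    affected = [(p, o) for p, o in pairs if any(o)]
    n_screen = sum(1 for p, _ in affected if any(p))
    n_panel = sum(1 for p, o in affected if any(a and b for a, b in zip(p, o)))

    return {
        "outcome_wise": Fraction(n_true_pos, n_occurred),
        "screening": Fraction(n_screen, len(affected)),
        "panel_wise": Fraction(n_panel, len(affected)),
    }


# Hypothetical toy data: 4 individuals, 3 outcomes.
toy_predicted = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
toy_occurred = [[1, 1, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]
toy_result = sample_sensitivities(toy_predicted, toy_occurred)
print(toy_result)
```

On the toy data the outcome-wise sensitivity is 1/5, the screening sensitivity is 2/3 and the panel-wise sensitivity is 1/3, which shows how the three senses can diverge on the same predictions.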
Table 1. Properties of fitted PRS for six common diseases.
| Disease | AUC | Prevalence | Liability | SNP | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Type-2 Diabetes | 0.66 | 0.102 | 0.0856 | 0.196 | 0.630 | 0.599 |
| Coronary Artery Disease | 0.623 | 0.0461 | 0.0398 | 0.22 | 0.600 | 0.575 |
| Crohn’s Disease | 0.75 | 0.005 | 0.103 | 0.26 | 0.701 | 0.666 |
| Ulcerative Colitis | 0.7 | 0.0025 | 0.0553 | 0.19 | 0.657 | 0.632 |
| Schizophrenia | 0.62 | 0.01 | 0.0254 | 0.235 | 0.595 | 0.576 |
| Rheumatoid Arthritis | 0.7 | 0.01 | 0.0732 | 0.18 | 0.661 | 0.629 |
Note: AUC and Prevalence are the values reported in the literature.[54–60] Liability is the liability variance explained by the PRS, given by the diagonal elements of the PRS variance–covariance matrix and derived from AUC and Prevalence.[49] SNP is the liability variance explained by all genotyped SNPs, which is the maximum possible value of Liability.[24,60–63] Sensitivity and Specificity are the values obtained when the risk threshold equals the prevalence.
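The relationship between the AUC, Prevalence, Liability, Sensitivity and Specificity columns can be illustrated with a short Python sketch under a liability threshold (probit) model. This is an assumption about the kind of calculation involved, not necessarily the derivation in [49]: the PRS is treated as calibrated, so its covariance with a standard normal liability equals its own variance, and its distribution within cases and within controls is approximated as normal. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_control_moments(liability_var, prevalence):
    """Threshold plus approximate PRS mean and variance in cases and controls."""
    t = STD_NORMAL.inv_cdf(1.0 - prevalence)  # liability threshold
    z = STD_NORMAL.pdf(t)
    i_case = z / prevalence                   # mean liability in cases
    i_ctrl = -z / (1.0 - prevalence)          # mean liability in controls
    mean_case = liability_var * i_case
    mean_ctrl = liability_var * i_ctrl
    var_case = liability_var * (1.0 - liability_var * i_case * (i_case - t))
    var_ctrl = liability_var * (1.0 - liability_var * i_ctrl * (i_ctrl - t))
    return t, (mean_case, var_case), (mean_ctrl, var_ctrl)


def auc(liability_var, prevalence):
    """AUC under the normal approximation to the case and control PRS distributions."""
    _, (m1, v1), (m0, v0) = case_control_moments(liability_var, prevalence)
    return STD_NORMAL.cdf((m1 - m0) / (v1 + v0) ** 0.5)


def liability_from_auc(target_auc, prevalence, tol=1e-10):
    """Invert auc() by bisection; auc() increases with liability_var."""
    lo, hi = 1e-9, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if auc(mid, prevalence) < target_auc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sens_spec_at_prevalence(liability_var, prevalence):
    """Sensitivity and specificity when the risk threshold equals the prevalence."""
    t, (m1, v1), (m0, v0) = case_control_moments(liability_var, prevalence)
    # Risk given PRS x is Phi((x - t) / sqrt(1 - liability_var)); solve risk = prevalence.
    cutoff = t + (1.0 - liability_var) ** 0.5 * STD_NORMAL.inv_cdf(prevalence)
    sens = 1.0 - STD_NORMAL.cdf((cutoff - m1) / v1 ** 0.5)
    spec = STD_NORMAL.cdf((cutoff - m0) / v0 ** 0.5)
    return sens, spec


# Type-2 Diabetes row: AUC 0.66, Prevalence 0.102, Liability 0.0856.
t2d_liability = liability_from_auc(0.66, 0.102)
t2d_sens, t2d_spec = sens_spec_at_prevalence(0.0856, 0.102)
print(t2d_liability, t2d_sens, t2d_spec)
```

Under these assumptions the Type-2 Diabetes inputs give values close to the tabulated Liability, Sensitivity and Specificity. Small differences are to be expected, because the reported AUC is rounded and the within-group distributions are only approximately normal.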
Variance–covariance matrix between PRS for the six diseases of Table 1.
| | T2D | CAD | CD | UC | SCZ | RA |
|---|---|---|---|---|---|---|
| T2D | 0.0856 | | | | | |
| CAD | 0.0225 | 0.0398 | | | | |
| CD | −0.0111 | 0.0347 | 0.102 | | | |
| UC | −0.0086 | 0.0191 | 0.0409 | 0.0553 | | |
| SCZ | −0.00131 | 0 | 0.00679 | 0.00480 | 0.0254 | |
| RA | −0.038 | −0.034 | −0.00251 | 0.00566 | −0.00185 | 0.0732 |
Note: Assumed to equal the liability-PRS covariance matrix.
Genetic correlations between the six diseases of Table 1.
| | T2D | CAD | CD | UC | SCZ | RA |
|---|---|---|---|---|---|---|
| T2D | 1 | | | | | |
| CAD | 0.384 | 1 | | | | |
| CD | −0.119 | 0.057 | 1 | | | |
| UC | −0.125 | 0.038 | 0.543 | 1 | | |
| SCZ | −0.028 | 0 | 0.113 | 0.128 | 1 | |
| RA | −0.048 | −0.063 | −0.029 | 0.089 | −0.043 | 1 |
Note: Assumed to equal the correlations between their overall liabilities.
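As a rough consistency check between the PRS variance–covariance matrix and the genetic correlations above, a covariance entry can be rescaled by the two corresponding variances to give the correlation it implies between two PRS. The sketch below does this for two pairs. The function name is hypothetical, and the calculation is only the standard covariance-to-correlation conversion; it is not claimed to be how either table was derived.

```python
from math import sqrt


def implied_correlation(cov_ij, var_i, var_j):
    """Correlation implied by a covariance and the two corresponding variances."""
    return cov_ij / sqrt(var_i * var_j)


# Entries taken from the PRS variance-covariance matrix above.
r_t2d_cad = implied_correlation(0.0225, 0.0856, 0.0398)  # T2D vs CAD
r_cd_uc = implied_correlation(0.0409, 0.102, 0.0553)     # CD vs UC
print(round(r_t2d_cad, 3), round(r_cd_uc, 3))
```

For these two pairs the implied PRS correlations are about 0.385 and 0.545, close to the tabulated genetic correlations of 0.384 and 0.543.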