Haifang Ni, Rolf H H Groenwold, Mirjam Nielen, Irene Klugkist.
Abstract
BACKGROUND: Random effects modelling is routinely used for clustered data, but in prediction models the random effects are commonly replaced by their mean of zero after model development. In this study, we proposed a novel approach that includes prior knowledge through the random effects distribution, and we investigated to what extent this could improve predictive performance.
Keywords: Clustered data; Expert knowledge; Informative priors for the random effects; Random effects prediction model; Truncated distribution
Year: 2018 PMID: 30081875 PMCID: PMC6080562 DOI: 10.1186/s12874-018-0543-5
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
An example of using posterior samples from model development data analysis for prediction in a new cluster
| Posterior from model development data | | | | Prediction for new cluster | | | | |
|---|---|---|---|---|---|---|---|---|
| Iteration | Intercept | Slope | Random effects scale parameter | Random effectᵃ | Subject 1: covariate value | Subject 1: predicted riskᵇ | Subject 2: covariate value | Subject 2: predicted riskᵇ |
| 5001 | −1.35 | 1.07 | 1.17 | .50 | 1.11 | .58 | −.46 | .21 |
| 5011 | −1.24 | 1.08 | .88 | −1.89 | 1.11 | .13 | −.46 | .03 |
| 5021 | −1.36 | 1.18 | 1.28 | −.06 | 1.11 | .47 | −.46 | .12 |
| 5031 | −1.31 | 1.05 | .98 | −.64 | 1.11 | .31 | −.46 | .08 |
| 5041 | −.94 | .98 | 1.37 | .26 | 1.11 | .60 | −.46 | .24 |
| … | … | … | … | … | … | … | … | … |
| Median | | | | | | .52 | | .15 |
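ᵃ Random effect sampled from the normal distribution
ᵇ Predicted risk calculated by applying the inverse logit to the sum of the intercept, the slope times the covariate value, and the random effect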
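The per-iteration prediction in the table above can be retraced with a short Python sketch. It assumes a logistic random-intercept model, so that the predicted risk is the inverse logit of intercept + slope × covariate value + random effect; this reading of the columns is an assumption, although it reproduces the tabulated risks to two decimals. The names `inv_logit`, `draws`, `covariates`, and `predicted_risks` are illustrative and do not come from the paper.

```python
import math
from statistics import median

def inv_logit(z):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# (intercept, slope, sampled random effect) for the five iterations shown in the table.
# Interpreting the columns this way is an assumption.
draws = [
    (-1.35, 1.07, 0.50),
    (-1.24, 1.08, -1.89),
    (-1.36, 1.18, -0.06),
    (-1.31, 1.05, -0.64),
    (-0.94, 0.98, 0.26),
]

# Covariate values of the two subjects in the new cluster, as tabulated.
covariates = {"subject 1": 1.11, "subject 2": -0.46}

def predicted_risks(x):
    """One predicted risk per posterior draw for a subject with covariate value x."""
    return [inv_logit(b0 + b1 * x + u) for b0, b1, u in draws]

for name, x in covariates.items():
    risks = predicted_risks(x)
    print(name, [round(r, 2) for r in risks],
          "median of these five draws:", round(median(risks), 2))
```

The medians reported in the table (.52 and .15) are taken over all retained iterations, so the median of the five draws shown here is not expected to match them.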
Fig. 1 The random effects distribution divided into multiple truncated areas of equal proportions on 3 different scales. A truncated area contains either one half, one third, or one fifth of the distribution. Based on elicited expert knowledge for each cluster, a particular truncated area from each scale is chosen and used as the prior distribution for the random effect of that cluster. We considered a prior distribution that contains 1/2 of the distribution as low informative, 1/3 as medium informative, and 1/5 as highly informative
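As a rough illustration of the truncation scheme in Fig. 1, the Python sketch below computes the cut points that split a mean-zero normal distribution into 2, 3, or 5 areas of equal probability, and draws a random effect from one chosen area by inverse-CDF sampling. How the authors implemented the truncated priors in their Bayesian software is not shown in this record, so the functions `truncation_bounds` and `sample_truncated`, the 0-based area index, and the value `sigma = 1.0` are all assumptions made for the example.

```python
import random
from statistics import NormalDist

def truncation_bounds(k):
    """Cut points (in SD units) that split a standard normal into k equal-probability areas."""
    std = NormalDist()
    inner = [std.inv_cdf(i / k) for i in range(1, k)]
    return [float("-inf")] + inner + [float("inf")]

def sample_truncated(area, k, sigma, rng):
    """Draw one value from area `area` (0-based, lowest area first) of N(0, sigma^2)."""
    # A uniform draw on the probability range covered by the chosen area...
    p = (area + rng.random()) / k
    # ...kept strictly inside (0, 1), because inv_cdf rejects the endpoints.
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return NormalDist(0.0, sigma).inv_cdf(p)

if __name__ == "__main__":
    for k in (2, 3, 5):  # low, medium, and highly informative scales
        print(k, [round(b, 3) for b in truncation_bounds(k)])
    rng = random.Random(0)
    sigma = 1.0  # hypothetical random effects SD
    # Example: an expert places a cluster in the highest third of the distribution.
    print([round(sample_truncated(2, 3, sigma, rng), 2) for _ in range(5)])
```

For a standard normal, the inner cut points are 0 for halves, about ±0.43 for thirds, and about ±0.84 and ±0.25 for fifths, which shows how much narrower the highly informative prior is than the low informative one.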
Results from the prediction models for data simulated with prevalence = 50%, ICC = .20, n = 5000 (J = S = 50, 100 subjects per cluster), β1 = 1.5 (the default setting)
| | Optimal score | FREQ | BAYES.WI | BAYES.LI | BAYES.MI | BAYES.HI | FREQ.2 | FREQ.3 | FREQ.5 |
|---|---|---|---|---|---|---|---|---|---|
| Overall Brier score | 0 | .191 | .192 | .179 | .174 | .170 | .173 | .170 | .167 |
| Overall C-index/AUC | 1 | .782 | .781 | .808 | .818 | .826 | .822 | .827 | .833 |
| Overall calibration slope | 1 | .911 | .907 | .957 | .982 | .989 | .965 | .972 | .994 |
| Within cluster C-index/AUCᵃ | 1 | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] |
| Within cluster calibration slopeᵃ | 1 | .914 [.102] | .914 [.102] | .947 [.091] | .956 [.078] | .963 [.058] | .954 [.092] | .973 [.080] | .977 [.062] |
ᵃ Mean [SD]
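The three overall measures in the table can be computed along the following lines. This is a minimal Python sketch using common textbook definitions, which are assumed rather than taken from the paper: the Brier score as the mean squared difference between predicted risk and outcome, the C-index as the proportion of event/non-event pairs ranked correctly (ties counted as one half), and the calibration slope as the coefficient from a logistic regression of the outcome on the logit of the predicted risk. The simulated data at the end are hypothetical and serve only as a sanity check.

```python
import math
import random

def brier_score(y, pred):
    """Mean squared difference between predicted risk and binary outcome (0 is optimal)."""
    return sum((p - yi) ** 2 for yi, p in zip(y, pred)) / len(y)

def c_index(y, pred):
    """Share of event/non-event pairs in which the event has the higher predicted risk."""
    events = [p for yi, p in zip(y, pred) if yi == 1]
    nonevents = [p for yi, p in zip(y, pred) if yi == 0]
    score = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return score / (len(events) * len(nonevents))

def calibration_slope(y, pred, iterations=25):
    """Slope of a logistic regression of y on logit(pred), fitted by Newton-Raphson."""
    lp = [math.log(p / (1.0 - p)) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iterations):
        fitted = [1.0 / (1.0 + math.exp(-(a + b * z))) for z in lp]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - f for yi, f in zip(y, fitted))
        g1 = sum((yi - f) * z for yi, f, z in zip(y, fitted, lp))
        # Information matrix entries.
        w = [f * (1.0 - f) for f in fitted]
        h00 = sum(w)
        h01 = sum(wi * z for wi, z in zip(w, lp))
        h11 = sum(wi * z * z for wi, z in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

# Hypothetical check: outcomes generated from the predicted risks themselves,
# so the calibration slope should come out close to 1.
rng = random.Random(2018)
true_risk = [1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.5))) for _ in range(2000)]
outcome = [1 if rng.random() < r else 0 for r in true_risk]
print(round(brier_score(outcome, true_risk), 3),
      round(c_index(outcome, true_risk), 3),
      round(calibration_slope(outcome, true_risk), 3))
```

The within-cluster versions reported in the table would apply the same functions to each cluster separately and then summarise the results as mean [SD] across clusters.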
Fig. 2 Calibration plots for the five prediction models. Predicted risks are plotted against the true latent underlying risks for 5000 subjects from 50 equal-sized clusters. The diagonal indicates the line of identity (i.e., predicted risks are equal to the true risks). Each dot represents a subject, and each line formed by the dots represents a cluster
Results from the Bayesian models with informative priors including different percentages of discrepant expert opinion
| | FREQ | BAYES.WI | BAYES.LI | BAYES.LI | BAYES.LI | BAYES.MI | BAYES.MI | BAYES.MI | BAYES.HI | BAYES.HI | BAYES.HI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Percentage wrong expert opinion | – | – | 10% | 30% | 50% | 10% | 30% | 50% | 10% | 30% | 50% |
| Overall Brier score | .191 | .192 | .180 | .192 | .201 | .174 | .179 | .182 | .170 | .173 | .174 |
| Overall C-index/AUC | .782 | .781 | .806 | .781 | .764 | .818 | .808 | .801 | .826 | .821 | .818 |
| Overall calibration slope | .911 | .907 | .946 | .874 | .824 | .982 | .964 | .950 | .989 | .988 | .987 |
| Within cluster C-index/AUCᵃ | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] | .805 [.037] |
| Within cluster calibration slopeᵃ | .914 [.102] | .914 [.102] | .946 [.091] | .939 [.100] | .935 [.100] | .953 [.077] | .939 [.084] | .935 [.085] | .962 [.059] | .953 [.068] | .951 [.070] |
ᵃ Mean [SD]
Fig. 3 Calibration plots for Bayesian models using discrepant expert opinion as prior information for the random effects. Predicted risks are plotted against the true latent underlying risks for 5000 subjects from 50 equal-sized clusters. Clusters using optimal expert opinion are displayed in grey, whereas clusters using discrepant expert opinion are displayed in black. The diagonal line is the line of identity (i.e., predicted risks are equal to the true risks). Each dot represents a subject, and each line formed by the dots represents a cluster