
Probabilistic Quantification of Bias to Combine the Strengths of Population-Based Register Data and Clinical Cohorts: Studying Mortality in Osteoarthritis.

Aleksandra Turkiewicz, Peter M Nilsson, Ali Kiadaliri.

Abstract

We propose combining population-based register data with a nested clinical cohort to correct misclassification and unmeasured confounding through probabilistic quantification of bias. We have illustrated this approach by estimating the association between knee osteoarthritis and mortality. We used the Swedish Population Register to include all persons resident in the Skåne region in 2008 and assessed whether they had osteoarthritis using data from the Skåne Healthcare Register. We studied mortality through year 2017 by estimating hazard ratios. We used data from the Malmö Osteoarthritis Study (MOA), a small cohort study from Skåne, to derive bias parameters for probabilistic quantification of bias, to correct the hazard ratio estimate for differential misclassification of the knee osteoarthritis diagnosis and confounding from unmeasured obesity. We included 292,000 persons in the Skåne population and 1,491 from the MOA study. The adjusted association of knee osteoarthritis with all-cause mortality in the MOA sample had a hazard ratio of 1.10 (95% confidence interval (CI): 0.80, 1.52) and was thus inconclusive. The naive association in the Skåne population had a hazard ratio of 0.95 (95% CI: 0.93, 0.98), while the bias-corrected estimate was 1.02 (95% CI: 0.59, 1.52), suggesting high uncertainty in bias correction. Combining population-based register data with clinical cohorts provides more information than using either data source separately.


Keywords: mortality; osteoarthritis; probabilistic quantification of bias; register data


Year: 2020 PMID: 32639513 PMCID: PMC7705601 DOI: 10.1093/aje/kwaa134

Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897

Abbreviations

CI, confidence interval; MOA, Malmö Osteoarthritis Study; MDCS, Malmö Diet and Cancer Study.

Historically, many epidemiologic studies were based on data collected for specific study purposes and thus including a well-defined study sample (typically from a cohort or a case-control design) (1). Such studies include careful assessments of the exposures and outcomes of interest, while collecting data on relevant confounders using validated standardized instruments or disease classification criteria. However, a common problem encountered in such epidemiologic studies is nonresponse and loss to follow-up, leading to high risk of selection bias (2, 3). In the computerized era, epidemiologists have started to use population-based register data (4). Utilization of such existing data is cost-effective, maximizes the value of already collected data, and enables inclusion of the whole population of a particular region (5). Such population-based register data can minimize selection bias, not only due to inclusion of the whole population of a particular region but also due to the lack of relationship between the data collection process and the study question (6). Additionally, register data usually also provide high statistical precision due to the inclusion of a large number of participants. However, register data might suffer from misclassification of both exposures and outcomes, given that data collection might not be based on established criteria or instruments. Further, they often lack information on important confounding variables, such as measurements of body weight or smoking (7). Quantification of bias in epidemiologic studies, while strongly advocated, is still quite rare in practice (8). Methods for probabilistic quantification of bias can be used on population-based data sets that are free of selection bias to correct for misclassification and confounding using bias parameters derived from a smaller cohort study with gold-standard measures of exposures, outcomes and confounders (9). 
Here, we illustrate such an approach in the study of mortality associated with osteoarthritis, by combining population-based register data with a smaller cohort study nested within the same population, to utilize the strengths of both data sources. Osteoarthritis is a widespread chronic joint disease that, as of today, has no disease-modifying treatments available (10). In the literature, estimates of the association between osteoarthritis and all-cause mortality vary depending on the type of study. Population-based register studies suggest no association or a slight decrease in risk, while studies based on smaller cohorts suggest increased risk of mortality (11–18). An explanation for this discrepancy could be the different biases present in the different data sources. Thus, in this observational study we use probabilistic quantification of bias using an entire regional population as the main study sample, and a smaller cohort nested within this population as a validation sample, to estimate the effect of osteoarthritis on all-cause mortality, considering 3 potential sources of bias: selection, misclassification, and confounding.

METHODS

Aims of the study

This study had 2 related aims. First, we aimed to illustrate how methods for probabilistic bias corrections can be used on population-based register data that suffer from both misclassification of exposure and unmeasured confounding, by using bias parameters derived from a smaller cohort study. Second, we aimed to use this approach to estimate the effect of knee osteoarthritis on all-cause mortality, to clarify previously conflicting results in the literature on this important epidemiologic question. For this, the main estimand of interest is the total causal effect of knee osteoarthritis on all-cause mortality in the population of the Skåne region in southern Sweden.

Data sources

Population-based cohort—Skåne region.

The population of interest includes persons aged 56–84 years, resident in the Skåne region on December 31, 2006, who were also residents of the region in 1997 (to ensure coverage in the register data). Using data from the Swedish Population Register, we identified all eligible persons, including age, sex, and vital status. For all persons in the Skåne region, the Skåne Healthcare Register (SHR) contains information about all health-care visits, including diagnostic codes (International Classification of Diseases, Tenth Revision) assigned by the treating physician within the public health-care system (19). From this register we retrieved diagnoses recorded between January 1, 1998, and June 30, 2008, for all persons included in the study. We considered a person to have diagnosed knee osteoarthritis if they had at least 1 visit with a code M17 registered. However, we believe that knee osteoarthritis diagnoses in the register data are prone to misclassification with respect to the gold-standard clinical knee osteoarthritis measure. This is because only approximately 50% of persons with clinical knee osteoarthritis in Sweden consult health providers and get their knee osteoarthritis diagnosed (20). For assessment of confounders, we used data from the LISA database (in Swedish, “Longitudinell integrationsdatabas för sjukförsäkrings- och arbetsmarknadsstudier”) maintained by Statistics Sweden. We extracted information for all included persons on highest level of education attained, whether they were married (or had a registered partner), income, and whether they were born in Sweden.

Validation cohort—Malmö Osteoarthritis Study.

The Malmö Osteoarthritis Study (MOA) was carried out between 2007 and 2008 with the aim of estimating the prevalence of knee osteoarthritis in the city of Malmö, Sweden. The details of the study can be found in Turkiewicz et al. (20). In brief, the MOA study originated from another large cohort study within the region, the Malmö Diet and Cancer Study (MDCS), established between 1991 and 1996 (21). The first part of the MOA study consisted of a knee pain questionnaire sent to a random sample of 10,000 subjects from the MDCS cohort who were still alive and resident in the Skåne region at the end of the year 2006. Further, the persons were required to be aged 56–84 years at this time. Respondents were classified into 2 groups: having frequent knee pain or not. Additionally, a question about willingness to participate in the second part of the study was included. In the second part of the study, from the responders that indicated willingness to participate, a random sample of 1,300 subjects with frequent knee pain and 650 subjects without were invited to a clinical visit and radiographic examination (this stage had different sampling weights for those with and without frequent knee pain). The clinical examinations took place between May 2007 and June 2008. Thus, within the MOA study there are 2 samples—first, the total MOA sample of 10,000 persons, and second, the MOA subsample that attended the clinical examination. From both of these samples, in the present study, we included persons still living in the region and alive on July 1, 2008 (i.e., when all the clinical examinations within the MOA study were completed). We retrieved the following variables as measured between years 1991 and 1996 as part of the MDCS for persons in the total MOA sample: weight and height, waist circumference, systolic blood pressure, and smoking status. 
From the MOA clinical examination in the years 2007–2008 we included the assessment of clinical knee osteoarthritis (i.e., in accordance with the American College of Rheumatology diagnostic criteria (22), as defined above). Further, as the participants of the MOA study are also part of the population of Skåne, we could retrieve information on diagnosed knee osteoarthritis and date of death for all MOA participants, in the same way as in the population-based cohort (Figure 1).

Figure 1

Overview of the samples for a study of all-cause mortality and osteoarthritis, Sweden, 2007–2017. MOA, Malmö Osteoarthritis Study.

The ethical approval for this study was obtained from the ethical review board in Lund (Dnr. 2006–552, 2011–277 and 2019–03213).

Statistical methods

Analytical model.

Our primary analysis concerned the association between clinical knee osteoarthritis and all-cause mortality in the population-based cohort. For this analysis, we used the Cox proportional hazards regression model. The exposure was diagnosed knee osteoarthritis. We adjusted for the following potential confounders: age, sex, education (categorical: ≤9 years, 10–12 years, 13–14 years, ≥15 years), income, whether married/legally partnered, and whether born outside Sweden, all measured before the start of follow-up. The start of follow-up for all persons was July 1, 2008 (for consistency with the validation sample), and the follow-up ended on the date of death or December 31, 2017 (the end of study), whichever came first. We evaluated the assumption of proportional hazards using plots of Schoenfeld residuals, and while there was a slight indication of nonproportional hazards, it was small enough not to have relevant impact for the analyses in this study. This primary analysis suffers from 2 important biases: 1) potential misclassification of knee osteoarthritis, because diagnosed knee osteoarthritis is expected to be misclassified with respect to the gold-standard measure of clinical knee osteoarthritis, and 2) unadjusted confounding from obesity, because information about weight is not available in our register data. We aimed to correct these biases using bias parameters derived from the validation sample—the MOA study. The overview of these corrections is given below. However, we suspected that the MOA sample might suffer from selection bias (i.e., be nonrepresentative of the underlying Skåne population) due to nonresponse at several stages of the study.

Addressing selection bias in the MOA study.

The MOA sample might suffer from selection bias due to nonresponse at several stages of the study. We used weighting to correct for this nonresponse (23, 24). A logistic regression model with age, sex, income, education, marital/partner status, whether born abroad, and diagnosed knee osteoarthritis was used to estimate the probability of being included in the total MOA study sample, and the reciprocal was used as a weight. A similar procedure was used to derive weights to address nonresponse in the MOA clinical examination sample. There were 3 stages where the nonresponse could arise: 1) not answering the postal questionnaire, 2) not willing to participate in the second part of the MOA study, and 3) not coming to the clinical examination. Thus, we used 3 logistic regression models to estimate the probability of response at each of these 3 stages. Each model was fitted using the nonresponse weights derived in the previous steps. We estimated the probability of responding to the MOA postal questionnaire, in a logistic model with age, sex, income, education, marital status, whether born abroad, diagnosis of knee osteoarthritis, continuous weight and binary obesity status (defined as a body mass index, calculated as weight (kg)/height (m)^2, of >30), waist circumference, systolic blood pressure, and smoking status (categorized into current smoker, former smoker, or never smoker). Further, we estimated the probability of willingness to participate in the second part of the MOA study, using a model as above, and additionally adjusted for frequent knee pain status. In the last model, we estimated the probability of attending the clinical examination using a model similar to the model for willingness to participate. 
The design weights (the reciprocal of the probability of being invited to the second stage of the MOA study) were multiplied by the weights estimated from the 4 models above to form the final weights for analyses of the MOA clinical examination sample (24, 25). To assess whether the 2 MOA samples were representative of the underlying Skåne population with respect to the prevalence of diagnosed knee osteoarthritis and its association with mortality, we estimated both of these quantities in both MOA samples and also in the underlying Skåne population.
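The way the final analysis weights are assembled can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `final_weight` and all numeric probabilities are hypothetical, and the stage probabilities stand in for predictions from the fitted logistic models described above.

```python
from math import prod


def final_weight(design_prob, stage_probs):
    """Combine a design weight with model-based nonresponse weights.

    design_prob: probability of being invited to the second stage (known by design).
    stage_probs: estimated probabilities of inclusion or response at each modelled
                 stage (hypothetical values here; in the study they come from
                 logistic regression models).
    """
    if not 0 < design_prob <= 1 or any(not 0 < p <= 1 for p in stage_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Each weight is the reciprocal of a probability; the final weight is their product.
    return (1.0 / design_prob) * prod(1.0 / p for p in stage_probs)


# Hypothetical person: invited with probability 0.5, then four modelled stages.
w = final_weight(0.5, [0.9, 0.8, 0.75, 0.6])
```

A person with a low estimated probability of responding at any stage receives a larger weight, so that respondents who resemble nonrespondents count for more in the weighted analysis.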

Probabilistic quantification of bias in the population-based cohort.

In estimating the association between knee osteoarthritis and all-cause mortality, we considered 2 sources of bias. First, diagnosed knee osteoarthritis might suffer from differential misclassification if persons with other serious comorbidities and thus higher propensity of death are less likely to have their knee osteoarthritis diagnosed or, on the contrary, more likely to have their osteoarthritis diagnosed. Second, our analysis model did not adjust for obesity—one of the major risk factors for both knee osteoarthritis and mortality, which could result in confounding bias (26). We considered clinical knee osteoarthritis as assessed at the MOA clinical examination as the gold standard (i.e., true exposure). Further, we considered obesity defined as BMI of >30 from the MOA study as a true measure of obesity. The obesity definition was based on BMI measurements within MDCS in the years 1991–1996 (likely preceding the incidence of knee osteoarthritis and thus potentially being a true confounder).

Parameters for correcting the differential misclassification.

All bias parameters were estimated separately for those who died and for those who were alive at the end of follow-up to allow for differential misclassification. From the MOA clinical examination sample, taking into account the weights used to minimize selection bias, we estimated the sensitivity and specificity of the diagnosed knee osteoarthritis. For both sensitivity and specificity we assumed a beta distribution parameterized with modes. For example, for sensitivity we set the α parameter as the number of test positives plus 1, and the β as the number of test negatives plus 1 (9). All estimates and bias parameters are given in Table 1.
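As a rough illustration of this parameterization, the sketch below builds beta parameters from counts and draws sensitivity values using the Python standard library. The helper names are hypothetical; the α and β values in the example are those listed in Table 1 for sensitivity among persons alive at the end of follow-up.

```python
import random


def beta_params(n_positive, n_negative):
    """Beta parameters as described in the text: each count plus 1."""
    return n_positive + 1, n_negative + 1


def beta_mode(alpha, beta):
    """Mode of a beta distribution (valid for alpha, beta > 1)."""
    return (alpha - 1) / (alpha + beta - 2)


def beta_mean(alpha, beta):
    """Mean of a beta distribution."""
    return alpha / (alpha + beta)


# Table 1, sensitivity among those alive: alpha = 127, beta = 221.
alpha, beta = 127, 221
rng = random.Random(2008)  # fixed seed so the sketch is reproducible
draws = [rng.betavariate(alpha, beta) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
```

With these parameters the analytic mean is 127/348, about 0.36, which agrees with the mean of sampled sensitivities reported in Table 1.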

Table 1

Bias Parameters Used for Probabilistic Quantification of Bias, Derived From the Malmö Osteoarthritis Study (Sweden, 2008) and Obtained in Simulations

Parameter	MOA Sample ^a: Mean (Linearized SE of the Mean)	MOA Sample ^a: 95% CI of the Mean	Beta Distribution α	Beta Distribution β	Mean (SD) of Sampled Parameters From Simulations ^b
Sensitivity
Died	0.50 (0.08)	0.35, 0.64	58	58	0.50 (0.05)
Alive	0.36 (0.04)	0.28, 0.45	127	221	0.36 (0.03)
Specificity
Died	0.97 (0.01)	0.95, 0.99	188	6	0.97 (0.01)
Alive	0.96 (0.01)	0.95, 0.98	805	32	0.96 (0.01)
Prevalence of clinical knee osteoarthritis^c
Died	—^c	—^c	—^c	—^c	0.13 (0.03)
Alive	—^c	—^c	—^c	—^c	0.12 (0.02)
Positive predictive values^c
Died	—^c	—^c	—^c	—^c	0.70 (0.13)
Alive	—^c	—^c	—^c	—^c	0.55 (0.09)
Negative predictive values^c
Died	—^c	—^c	—^c	—^c	0.93 (0.02)
Alive	—^c	—^c	—^c	—^c	0.92 (0.02)
Log(HR_conf)^d	0.09 (0.24)	−0.39, 0.57	N/A	N/A	0.09 (0.25)
Prevalence of obesity^e
Diagnosed knee osteoarthritis and died	0.36 (0.03)	0.30, 0.42	97	174	0.36 (0.03)
Diagnosed knee osteoarthritis and alive	0.28 (0.02)	0.24, 0.32	143	369	0.28 (0.02)
No diagnosed knee osteoarthritis and died	0.15 (0.01)	0.14, 0.17	367	2,011	0.15 (0.01)
No diagnosed knee osteoarthritis and alive	0.113 (0.004)	0.105, 0.121	731	5,733	0.11 (0.004)
Prevalence of obesity^e
Clinical knee osteoarthritis and died	0.37 (0.07)	0.24, 0.51	43	73	0.37 (0.04)
Clinical knee osteoarthritis and alive	0.29 (0.04)	0.21, 0.38	100	248	0.29 (0.02)
No clinical knee osteoarthritis and died	0.13 (0.04)	0.07, 0.21	25	169	0.13 (0.02)
No clinical knee osteoarthritis and alive	0.09 (0.01)	0.06, 0.12	73	764	0.09 (0.01)

Abbreviations: CI, confidence interval; HRconf, confounding hazard ratio; MOA, Malmö Osteoarthritis Study; N/A, not applicable; SD, standard deviation; SE, standard error.

a Summary statistics for parameters estimated in MOA sample.

b Summary statistics for parameters actually sampled from the respective beta distribution in quantification of bias analysis, from 10,000 repeats.

c Derived from sensitivity and specificity.

d Log(HRconf) was assumed to follow normal distribution with the estimated mean and standard deviation.

e Diagnosed knee osteoarthritis and prevalence of obesity were available for the total MOA sample, n = 9,628; clinical knee osteoarthritis was available from the MOA clinical examination sample, n = 1,491.


Parameters for correcting confounding by obesity.

We used 3 approaches to correct for confounding from obesity. First, we estimated the confounding hazard ratio (HRconf) in the MOA clinical examination sample. We did this by fitting 2 Cox regression models for all-cause mortality, one adjusted for obesity and one unadjusted, and we calculated the ratio of the hazard ratios for the association between clinical knee osteoarthritis and mortality. To estimate the 95% confidence intervals for the confounding hazard ratio, we used a formula for the 95% confidence interval for a ratio of ratios. Second, we estimated the proportion of obesity among 4 strata created by tabulating the exposure and outcome, with 95% confidence intervals. We did this twice, once using diagnosed knee osteoarthritis as the exposure (available in the total MOA sample) and a second time using clinical knee osteoarthritis as the exposure (available only within the MOA clinical examination sample). Using the estimated prevalence of obesity in the 4 strata, we sampled the obesity status (obese or not) for each person in the population-based sample. We used a normal distribution to parameterize the logarithm of the confounding hazard ratio, and beta distributions to parameterize the prevalence of obesity. All estimates and parameters for bias correction are given in Table 1.
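The first approach can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the confounding hazard ratio is defined as the unadjusted hazard ratio divided by the obesity-adjusted one, so that a corrected estimate is obtained by dividing an observed hazard ratio by a sampled HRconf. The mean and standard deviation of log(HRconf) are taken from Table 1; the observed hazard ratio of 1.20 and the function name are hypothetical.

```python
import math
import random

LOG_HR_CONF_MEAN = 0.09  # Table 1, Log(HRconf)
LOG_HR_CONF_SD = 0.24    # Table 1, linearized SE, used here as the SD of the normal


def correct_for_confounding(hr_observed, log_hr_conf):
    """Divide an observed hazard ratio by a confounding hazard ratio.

    Assumes HRconf = HR(unadjusted) / HR(adjusted); this direction is an
    assumption of the sketch, not stated explicitly in the text.
    """
    return hr_observed / math.exp(log_hr_conf)


rng = random.Random(1)  # fixed seed for reproducibility
sampled_log_hr_conf = [rng.gauss(LOG_HR_CONF_MEAN, LOG_HR_CONF_SD) for _ in range(10_000)]

# Hypothetical observed hazard ratio, corrected once per sampled bias parameter.
corrected = sorted(correct_for_confounding(1.20, s) for s in sampled_log_hr_conf)
median_corrected = corrected[len(corrected) // 2]
interval = (corrected[int(0.025 * len(corrected))], corrected[int(0.975 * len(corrected))])
```

Because the standard deviation of log(HRconf) is large relative to its mean, the spread of corrected values is wide, which is one source of the uncertainty in the bias-corrected estimates.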

Probabilistic bias analysis.

We corrected bias both from misclassification of exposure and from unmeasured confounding by sampling bias parameters from their respective distributions, correcting the data using these parameters, refitting the Cox model using the corrected data (including adjustment for obesity), and repeating this procedure 10,000 times (27). In each iteration we sampled the bias parameters for misclassification (i.e., sensitivity, specificity) from their respective distributions. We then calculated prevalence of clinical knee osteoarthritis in the population-based sample and corresponding positive predictive values and negative predictive values using standard formulas (Web Table 1, available at https://academic.oup.com/aje). We made a Bernoulli draw for both exposed (from positive predictive values) and unexposed (from negative predictive values) persons and then corrected the exposure status according to the results of this Bernoulli draw. Then we used 3 approaches to additionally correct for confounding by obesity. In the first approach, we fitted our primary Cox model with the corrected exposure status and then sampled the value of the confounding hazard ratio from its distribution and used the sampled value to correct the estimated hazard ratio according to the analytical approach as described by Lash et al. (28). In the second and third approaches, we sampled the probability of being obese for the 4 strata created by tabulating exposure and outcome statuses. Then, for each person, we sampled the confounder value (obese or not) from Bernoulli draws. In the second approach, we sampled from the obesity distribution based on diagnosed knee osteoarthritis as exposure. In the third approach, we sampled from the obesity distribution based on clinical knee osteoarthritis. 
In all analyses, we used the formula estimate – N(0,1) × SE, where estimate is the estimated regression coefficient, N(0,1) is a draw from a normal distribution with mean 0 and standard deviation 1, and SE is the standard error of estimate, to derive corrected estimates and their 95% confidence intervals including both systematic and random error, as suggested previously (9). The final 95% confidence intervals were derived as 2.5th and 97.5th percentiles of the distribution of the estimates corrected for both systematic and random error, from 10,000 repeats of the procedure. In a minority of repeats (0.5%) the sampled values were implausible given the data and were thus excluded.
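The misclassification step of a single iteration can be sketched as follows, for one outcome stratum. This is a simplified illustration, not the authors' code: the formulas are the usual textbook relations between sensitivity, specificity, observed prevalence, and predictive values, assumed here to correspond to those in Web Table 1, and the function names and the simulated data are hypothetical. The Cox refitting step is omitted.

```python
import random


def predictive_values(sens, spec, observed_prev):
    """Return (true prevalence, PPV, NPV), or None if the inputs are implausible."""
    denom = sens + spec - 1.0
    if denom <= 0.0 or not 0.0 < observed_prev < 1.0:
        return None
    # Invert observed_prev = sens * true_prev + (1 - spec) * (1 - true_prev).
    true_prev = (observed_prev + spec - 1.0) / denom
    if not 0.0 <= true_prev <= 1.0:
        return None  # sampled parameters impossible given the data; skip this repeat
    ppv = sens * true_prev / observed_prev
    npv = spec * (1.0 - true_prev) / (1.0 - observed_prev)
    return true_prev, ppv, npv


def reclassify(diagnosed, ppv, npv, rng):
    """Bernoulli draw per person: exposed stay exposed with probability PPV,
    unexposed become exposed with probability 1 - NPV."""
    return [(rng.random() < ppv) if d else (rng.random() >= npv) for d in diagnosed]


def add_random_error(estimate, se, rng):
    """The formula from the text: estimate - N(0,1) * SE."""
    return estimate - rng.gauss(0.0, 1.0) * se


rng = random.Random(2017)  # fixed seed for reproducibility
# Sensitivity and specificity near the Table 1 means for persons alive at end of follow-up.
true_prev, ppv, npv = predictive_values(0.36, 0.96, 0.0784)
# Hypothetical register data: diagnosed status for 50,000 simulated persons.
diagnosed = [rng.random() < 0.0784 for _ in range(50_000)]
corrected = reclassify(diagnosed, ppv, npv, rng)
corrected_prev = sum(corrected) / len(corrected)
```

With these inputs the derived true prevalence is 0.12, the positive predictive value about 0.55, and the negative predictive value about 0.92, in line with the simulated means in Table 1. In the full procedure this step would be run separately for those who died and those alive, with sensitivity and specificity redrawn from their beta distributions in every repeat.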

RESULTS

Characteristics of the study samples

The population-based cohort consisted of 292,000 persons with mean age of 69.3 (standard deviation, 8.0) years, and with 47% men. The MOA study sample consisted of 9,628 persons, with mean age of 71.6 (standard deviation, 7.6) years (Table 2, Web Table 2).

Table 2

Descriptive Data According to Diagnosed Knee Osteoarthritis at Baseline in the Skåne Population and Malmö Osteoarthritis Study (Without Reweighting), Sweden, 2007–2017

	Skåne Population (n = 292,000)				MOA, All (n = 9,628)				MOA Clinical Examination (n = 1,491)
	Diagnosed Knee OA		No Diagnosed Knee OA		Diagnosed Knee OA		No Diagnosed Knee OA		Diagnosed Knee OA		No Diagnosed Knee OA
Variable	No.	%	No.	%	No.	%	No.	%	No.	%	No.	%
Age, years^a	71.7 (8.1)		69.1 (8)		73.6 (7.4)		71.4 (7.5)		72.7 (7.3)		70.4 (7.1)
Male sex	9,665	41	127,527	48	262	34	3,298	37	89	37	448	36
Annual income, per 100,000 SEK^a	1.8 (10.4)		1.9 (4)		1.7 (1.7)		1.9 (6.4)		2 (2.3)		1.9 (1.4)
Education
≤9 years	10,588	45	105,116	39	292	37	3,057	35	81	33	380	30
10–12 years	8,988	38	103,885	39	347	45	3,819	43	118	49	551	44
13–14 years	1,784	8	23,837	9	69	9	887	10	22	9	151	12
≥15 years	2,236	9	35,579	13	71	9	1,086	12	21	9	167	13
Married	17,790	75	208,611	78	562	72	6,687	76	182	75	982	79
Born outside Sweden	2,954	13	34,082	13	117	15	1,192	13	36	15	133	11
Obese^b	N/A		N/A		233	30	1,090	12	63	26	166	13

Abbreviations: N/A, not applicable; OA, osteoarthritis; SEK, Swedish krona.

a Numbers are expressed as mean (standard deviation).

b Obesity defined as a body mass index (weight (kg)/height (m)^2) of >30.

After reweighting to correct for potential selection bias in the 2 MOA samples, the prevalence of doctor-diagnosed OA was similar in the whole MOA sample and the underlying population but higher in the MOA clinical examination sample (Table 3).

Table 3

Estimates of the Prevalence of Diagnosed Knee Osteoarthritis in the Underlying Population (Skåne Region, 2008) and the Malmö Osteoarthritis Study (Clinical Study Cohort, 2007–2008) and Estimates of the Association Between Diagnosed Osteoarthritis and All-Cause Mortality, With Follow-up to 2017, Sweden

Cohort	No.	Correction of Selection Bias	Prevalence of Diagnosed Knee OA	95% CI	HR for All-Cause Mortality, Persons With OA Compared With Those Without OA	95% CI
Skåne population	292,000	None	0.081	0.080,0.082	0.95	0.93,0.98
MOA all	9,628	None	0.081	0.075,0.086	1.07	0.95,1.22
MOA clinical examination	1,491	None, design weights only	0.108	0.092,0.126	1.04	0.72,1.51
MOA all	9,628	Reweighted	0.083	0.077,0.088	1.08	0.95,1.24
MOA clinical examination	1,491	Reweighted	0.101	0.082,0.122	1.12	0.74,1.68

Abbreviations: CI, confidence interval; HR, hazard ratio; MOA, Malmö Osteoarthritis Study; OA, osteoarthritis.

Bias parameters

The sensitivity of diagnosed knee osteoarthritis with respect to clinical knee osteoarthritis (the gold standard) was 0.36 (95% confidence interval (CI): 0.28, 0.45) in those alive at the end of follow-up and 0.50 (95% CI: 0.35, 0.64) in those who died during follow-up. Specificity was high, 0.96 (95% CI: 0.95, 0.98) and 0.97 (95% CI: 0.95, 0.99), respectively. The prevalence of obesity was higher in persons with osteoarthritis than those without (31% vs 10%) and also higher in those who died during the follow-up than in those who were alive (Table 1).

The association between knee osteoarthritis and mortality

The association between clinical knee osteoarthritis (the gold-standard exposure) and all-cause mortality in the MOA clinical examination sample, reweighted to minimize selection bias and adjusted for obesity, was 1.10 (95% CI: 0.80, 1.52) (Table 4). The confidence intervals are wide and imprecise, but the included values are in line with other clinical cohorts, suggesting potentially relevant excess mortality in persons with knee osteoarthritis. The association between diagnosed knee osteoarthritis and all-cause mortality in the population-based Skåne cohort was 0.95 (95% CI: 0.93, 0.98), a very precise confidence interval reflecting the large sample size and in line with other register-based studies, suggesting a very slightly lower mortality in persons with osteoarthritis. After correcting for both misclassification of diagnosed knee osteoarthritis and confounding from obesity, the estimate in the Skåne population was 1.02 (95% CI: 0.59, 1.52), reflecting high uncertainty in the bias correction, mainly due to low sensitivities. The point estimate—reflecting average values of bias parameters—is very close to 1, suggesting no association (Figure 2).

Table 4

Association Between Clinical Knee Osteoarthritis and All-Cause Mortality Estimated Within the Malmö Osteoarthritis Study, Sweden, 2007–2017

Bias Correction	HR	95% CI
Design weights	1.14	0.86, 1.52
Design weights and adjusted for obesity	1.08	0.81, 1.45
Reweighted to correct potential selection bias	1.19	0.87, 1.62
Reweighted to correct potential selection bias and adjusted for obesity	1.10	0.80, 1.52

Abbreviations: CI, confidence interval; HR, hazard ratio.

a Clinical examination sample, n = 1,491. The Malmö Osteoarthritis Study was conducted 2007–2008, with follow-up to 2017. The Cox regression model adjusted for age, sex, income, education, whether married, and whether born outside Sweden.

Figure 2

Comparison of bias-corrected association between diagnosed knee osteoarthritis and all-cause mortality, estimated in the Skåne population (n = 292,000) and in the Malmö Osteoarthritis Study (MOA, n = 9,628), Sweden, 2007–2017. Diamond denotes the estimate based on the reweighted MOA data. Points denote estimates based on population-based register data. Misclassification and confounding A: corrected for misclassification of knee osteoarthritis and confounding using probabilities of being obese based on diagnosed knee osteoarthritis. Misclassification and confounding B: corrected for misclassification of knee osteoarthritis and confounding using probabilities of being obese based on clinical knee osteoarthritis. Misclassification and confounding C: corrected for misclassification of knee osteoarthritis and confounding using confounding hazard ratio. Misclassification: corrected for misclassification of knee osteoarthritis. No correction: no bias correction.

DISCUSSION

Our results suggest that the apparent discordance in the estimates of the association between knee osteoarthritis and mortality risk from cohort studies and population-based register data might be due to different biases present in these data sources. Cohort studies are likely to suffer from selection bias from selective participation and loss to follow-up, and they often provide estimates with wide confidence intervals reflecting random error only (15, 17). On the other hand, population-based register data might suffer from misclassification and unmeasured confounding while reporting overly precise estimates reflecting random error only (11, 13). In this study, we corrected the estimates based on population-based register data for both differential misclassification and confounding to provide estimates with 95% confidence intervals that encompass both systematic and random error. Our results suggest that knee osteoarthritis might not be associated with an increased hazard of death during 10 years of follow-up, but results are imprecise due to uncertainty in the bias parameters. Considering that these results are derived from the Skåne population in the south of Sweden, with easy access to public health care, these results might be generalizable to other populations with similar health-care systems. Our bias-corrected estimates reflect much higher uncertainty around the estimated hazard ratio than what is reflected by random error only (11, 13). On the other hand, the shift in the estimate, as compared with the MOA study, suggests potential selection bias affecting estimates from MOA study. Probabilistic quantification of bias is often not straightforward and is based on strong assumptions. In this study, we used the beta distribution to define the distribution of the bias parameters, given that this distribution is well suited for proportions (9) and the MOA was nested within the Skåne population. 
This might not be a suitable approach when using external validation samples. Using the beta distribution most often led to plausible values; only 0.5% of combinations of sampled parameters were impossible given the data.

Our bias analysis is also based on a set of untestable assumptions. First, we assumed that the bias parameters derived from the reweighted MOA sample were accurate with respect to the underlying Skåne population and not affected by any selection bias remaining after reweighting. Second, we assumed a simple differential misclassification, without taking into account other possible relations between exposure, outcome, and confounders. Also, we assumed that our gold-standard exposure was free of measurement error. Due to the relatively small size of the MOA sample, we were not able to derive bias parameters in relevant subgroups (stratified, for example, by sex or age); their confidence intervals would have been too wide and would have led to inconclusive estimates. Further, there might be other unmeasured confounders that we did not adjust for, such as knee injury preceding knee osteoarthritis. Another important confounder that we could not adjust for is physical activity, which affects both risk of knee injury and body weight and thus risk for knee osteoarthritis. However, the potential existence of unmeasured confounders, inevitable in observational studies, should not be considered a reason not to correct the biases that are known.

Methods for probabilistic bias analysis have existed for some time, but a problem might be the lack of adequate bias parameters. It is rare to perform a smaller substudy within a larger study with the aim of addressing potential misclassification or confounding not addressed in the main study due to feasibility or costs. We argue that using data from existing cohort studies is a cost-effective solution.
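
To make the role of beta-distributed bias parameters concrete, the following is a minimal sketch of a probabilistic correction for differential exposure misclassification. It is not the authors' code. It is simplified to a risk ratio from a 2×2 table rather than the hazard ratios estimated in the study, and every count, along with the names `corrected_exposed` and `simulate`, is hypothetical. It also shows how some sampled combinations of sensitivity and specificity can be impossible given the observed data and must be discarded.

```python
import numpy as np


def corrected_exposed(observed_exposed, total, se, sp):
    """Back-calculate the true number exposed from the observed count.

    observed = true * se + (total - true) * (1 - sp), solved for `true`.
    """
    return (observed_exposed - total * (1.0 - sp)) / (se + sp - 1.0)


def simulate(n_draws=10_000, seed=0):
    rng = np.random.default_rng(seed)

    # HYPOTHETICAL validation counts (not from the study), as Beta(hits + 1, misses + 1).
    # Differential misclassification: separate parameters by outcome status.
    se_dead = rng.beta(20 + 1, 30 + 1, n_draws)
    sp_dead = rng.beta(240 + 1, 10 + 1, n_draws)
    se_alive = rng.beta(60 + 1, 90 + 1, n_draws)
    sp_alive = rng.beta(950 + 1, 50 + 1, n_draws)

    # HYPOTHETICAL register counts: diagnosed exposure among deaths and survivors.
    a_obs, n_dead = 3_000, 40_000
    b_obs, n_alive = 20_000, 250_000

    a = corrected_exposed(a_obs, n_dead, se_dead, sp_dead)
    b = corrected_exposed(b_obs, n_alive, se_alive, sp_alive)

    # A draw is impossible if Se + Sp <= 1 or a corrected cell falls outside its range.
    valid = (
        (se_dead + sp_dead > 1) & (se_alive + sp_alive > 1)
        & (a > 0) & (a < n_dead) & (b > 0) & (b < n_alive)
    )
    a, b = a[valid], b[valid]

    risk_exposed = a / (a + b)
    risk_unexposed = (n_dead - a) / ((n_dead - a) + (n_alive - b))
    rr = risk_exposed / risk_unexposed

    low, median, high = np.percentile(rr, [2.5, 50, 97.5])
    return {
        "share_impossible": 1.0 - valid.mean(),
        "median": median,
        "interval": (low, high),
    }


if __name__ == "__main__":
    print(simulate())
```

The interval returned here captures only systematic error from the sampled bias parameters. A full analysis would also incorporate random error, for example by resampling the observed estimate in each draw.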
Population-based register data, such as those in the Nordic countries, can be assumed to have negligible selection bias, which is often a very difficult bias to correct (29), and they have adequate sample sizes. When corrected for misclassification and confounding, they could provide valid and precise estimates that are not available otherwise. To achieve this, the question of generalizability and transportability of the bias parameters is crucial, given that the bias parameters must be adequate for application in the population-based cohort (30, 31). Thus, utilizing data from existing cohorts sampled from the population of interest could be a valid solution. If the bias parameters are estimated with care to ensure internal validity, they almost automatically gain external validity with respect to the underlying population of interest. Considering the vast number of existing cohort studies, they could become an invaluable source of useful bias parameters for the populations they were sampled from.

New examples in the literature suggest that the opposite procedure might also be useful: correcting biases in case-control or cohort studies using population-based register data (32, 33). Further, even more general approaches to combining data from several data sources with different biases have been recently formalized (34).

In conclusion, our results suggest that cohort studies that require active participation might suffer from selection bias, while population-based register data might underestimate the association between knee osteoarthritis and all-cause mortality due to misclassification of osteoarthritis. When minimizing these biases, our results suggest no increase in all-cause mortality in persons with knee osteoarthritis as compared with persons without, but the uncertainty is substantial.
In this era of increasing use of population-based electronic health-care databases for epidemiologic studies, probabilistic quantification of bias using bias parameters derived from smaller cohorts within the same population might increase the validity of reported estimates.

29 in total

1. Causal inference and the data-fusion problem.

Authors: Elias Bareinboim; Judea Pearl
Journal: Proc Natl Acad Sci U S A Date: 2016-07-05 Impact factor: 11.205

2. Target Validity and the Hierarchy of Study Designs.

Authors: Daniel Westreich; Jessie K Edwards; Catherine R Lesko; Stephen R Cole; Elizabeth A Stuart
Journal: Am J Epidemiol Date: 2019-02-01 Impact factor: 4.897

3. Good practices for quantitative bias analysis.

Authors: Timothy L Lash; Matthew P Fox; Richard F MacLehose; George Maldonado; Lawrence C McCandless; Sander Greenland
Journal: Int J Epidemiol Date: 2014-07-30 Impact factor: 7.196

4. When the entire population is the sample: strengths and limitations in register-based epidemiology.

Authors: Lau Caspar Thygesen; Annette Kjær Ersbøll
Journal: Eur J Epidemiol Date: 2014-01-10 Impact factor: 8.082

5. The Malmö Diet and Cancer Study: representativity, cancer incidence and mortality in participants and non-participants.

Authors: J Manjer; S Carlsson; S Elmståhl; B Gullberg; L Janzon; M Lindström; I Mattisson; G Berglund
Journal: Eur J Cancer Prev Date: 2001-12 Impact factor: 2.497

6. All-cause Mortality in Knee and Hip Osteoarthritis and Rheumatoid Arthritis.

Authors: Aleksandra Turkiewicz; Tuhina Neogi; Jonas Björk; George Peat; Martin Englund
Journal: Epidemiology Date: 2016-07 Impact factor: 4.822

7. Current and future impact of osteoarthritis on health care: a population-based study with projections to year 2032.

Authors: A Turkiewicz; I F Petersson; J Björk; G Hawker; L E Dahlberg; L S Lohmander; M Englund
Journal: Osteoarthritis Cartilage Date: 2014-07-30 Impact factor: 6.576

8. Mortality in osteoarthritis (review).

Authors: M C Hochberg
Journal: Clin Exp Rheumatol Date: 2008 Sep-Oct Impact factor: 4.473

9. All cause and disease specific mortality in patients with knee or hip osteoarthritis: population based cohort study.

Authors: Eveline Nüesch; Paul Dieppe; Stephan Reichenbach; Susan Williams; Samuel Iff; Peter Jüni
Journal: BMJ Date: 2011-03-08

10. Osteoarthritis and all-cause mortality in worldwide populations: grading the evidence from a meta-analysis.

Authors: Dan Xing; Yuankun Xu; Qiang Liu; Yan Ke; Bin Wang; Zhichang Li; Jianhao Lin
Journal: Sci Rep Date: 2016-04-18 Impact factor: 4.379