
The internal validation of weight and weight change coding using weight measurement data within the UK primary care Electronic Health Record.

Brian D Nicholson¹, Paul Aveyard¹, Willie Hamilton², Clare R Bankhead¹, Constantinos Koshiaris¹, Sarah Stevens¹, Frederick Dr Hobbs¹, Rafael Perera¹.

Abstract

PURPOSE: To use recorded weight values to internally validate weight status and weight change coding in the primary care Electronic Health Record (EHR). PATIENTS AND METHODS: We included adult patients with weight-related Read codes recorded in the UK's Clinical Practice Research Datalink EHR between 2000 and 2017. Weight status codes were compared to weight values recorded on the same day and positive predictive values (PPVs) were calculated for commonly used codes. Weight change codes were validated using three methods: the percentage (%) difference in kilograms at the time of the code and 1) the previous weight measurement, 2) the weight predicted using linear regression, and 3) the historic mean weight. Weight change codes were validated if estimates were consistent across two out of three methods.
RESULTS: A total of 8,108,481 weight codes were recorded in 1,000,002 patients' EHR. Nearly twice as many were recorded in females (n=5,208,593, 64%) as in males. The mean body mass index for "overweight" codes ranged from 31.9 kg/m² to 46.9 kg/m² and from 17.4 kg/m² to 19.2 kg/m² for "underweight" codes. PPVs for the most commonly used weight status codes ranged from 81.3% (80%–82.5%) to 99.3% (99.2%–99.4%). Across the estimation methods, and using only validated weight change codes, mean weight loss ranged from −5.2% (SD 5.8%) to −7.9% (SD 7.3%) and mean weight gain from 4.2% (SD 5.5%) to 7.9% (SD 8.2%). The previous and predicted weight methods were most consistent.
CONCLUSION: We have developed an internationally applicable methodology to internally validate weight-related EHR coding by using available weight measurement data. We demonstrate the UK Read codes that can be confidently used to classify weight status and weight change in the absence of weight values. We provide the first evidence from primary care that a Read code for unexpected weight loss represents a mean loss of ≥5% in a 6-month period, which was broadly consistent across age groups and gender.


Keywords: body weight; data quality; electronic health records; primary health care; validation studies; weight gain; weight loss

Year: 2019 PMID: 30774449 PMCID: PMC6354686 DOI: 10.2147/CLEP.S189989

Source DB: PubMed Journal: Clin Epidemiol ISSN: 1179-1349 Impact factor: 4.790

Introduction

Extremes of weight and unexpected weight change are associated with multiple disease states such as cancer and cardiovascular disease, with increased morbidity and mortality.1–3 Although weight is a simple low-cost biometric retained in multiple clinical prediction models,4–7 weight measurements are commonly missing (not at random) from Electronic Health Records (EHRs) in primary care.8,9 In clinical settings where weight is not measured routinely, it is most commonly measured in relation to the clinical problem, chronic disease, or in patients who appear overweight or underweight.10,11 When measured, kilograms (kg), pounds, and body mass index (BMI) may be recorded inconsistently in structured or free text.12,13

Weight-related codes may be recorded in the presence or absence of weight measurements. The relationship between the two has not been investigated in any setting. There is no international standardization of how and when clinical codes are chosen and entered into an EHR,14 leading to inconsistencies in coding practice and discordant coding hierarchies, such as Read and ICD coding.15 Murtaugh et al identified bodyweight measures in the free text of 8% of the Veterans Administration EHR when no coded weight information was present.13 Price et al showed an increased prevalence of jaundice and visible hematuria when free text evidence of these clinical features was added to coded entries.16 At present, the major UK EHR platforms provide only coded EHR data for research.17

Read codes are used in the English National Health Service to document the clinical history and care process of patients attending primary care.18 The Read code hierarchy is complex, with multiple overlapping terms including for symptoms and signs, investigations, diagnoses, and medications.19 Code lists are used to define variables of interest in epidemiological studies but there is no accepted method for code list generation, so researchers develop their own strategies and collate existing lists.20–22 Guidelines recommend that code lists are published to promote transparency and to allow replication and validation studies in other datasets.23 This is not yet common practice: out of 25 studies included in a recent systematic review, none of the 19 studies reported the code lists used to define weight loss.10

Validation studies of coding aim to ensure the accuracy and credibility of epidemiological studies24 and often utilize external questionnaire or linked secondary care data (ie, external validation).25–27 In comparison, internal validation uses corroborative data from the source dataset, such as blood pressure measurement data to verify hypertension coding.28 Validation studies of symptom codes are relatively uncommon, probably due to the scarcity of external datasets holding detailed symptom information.28,29

The aim of this study was to develop an internationally applicable methodology to assess the internal validity of weight-related coding by utilizing the objective weight values recorded in the EHR. Once developed, this methodology could be applied to any EHR dataset containing weight-related codes and weight measurements. To achieve this, we investigated the average weight value at each weight-related code, the positive predictive value (PPV) of commonly used weight status codes, and the degree of weight change prior to weight change codes. For the last of these, we developed and compared several estimation methods.

Patients and methods

Study population

We accessed the Clinical Practice Research Datalink (CPRD) GOLD database, an ongoing primary care database of anonymized EHR data that covers ~6.9% of the UK population, a sample that is representative in terms of age, sex, and ethnicity, from ~674 participating practices using the Vision EHR at the time of this study.8 Patients aged >18 years during the study period of January 1, 2000 to December 31, 2017 were included in this analysis. All included patients were eligible for linkage to the National Cancer Registration and Analysis Service cancer registry, practice and patient level Index of Multiple Deprivation data, and Office for National Statistics mortality data, as independent markers of data quality rather than disease status.

Weight-related codes

A long list of candidate weight-related codes was generated by searches of the Read terms in the medical dictionary of the CPRD code browser using: *weight*, *body mass index*, *BMI*, *fat*, *thin*, *cachexi*. The Read code hierarchies were then explored around the candidate codes to identify potential related terms not picked up by the initial searching.
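As an illustration only, the wildcard search described above could be sketched as follows. The search terms are those quoted in the text; the function name, the toy dictionary, and its code identifiers are hypothetical placeholders and are not taken from the CPRD code browser.

```python
from fnmatch import fnmatchcase

# Wildcard search terms quoted in the text.
SEARCH_TERMS = ["*weight*", "*body mass index*", "*BMI*", "*fat*", "*thin*", "*cachexi*"]


def find_candidate_codes(dictionary):
    """Return (code, term) pairs whose term matches any search pattern.

    Matching is case-insensitive because both sides are lower-cased first.
    """
    patterns = [p.lower() for p in SEARCH_TERMS]
    return [
        (code, term)
        for code, term in dictionary
        if any(fnmatchcase(term.lower(), p) for p in patterns)
    ]


# Hypothetical dictionary rows; the identifiers are placeholders, not real Read codes.
toy_dictionary = [
    ("X001", "O/E - weight"),
    ("X002", "Cachexia"),
    ("X003", "Father smokes"),  # matched by *fat*, so it needs manual review
    ("X004", "Asthma"),
]

candidates = find_candidate_codes(toy_dictionary)
```

Broad patterns such as *fat* and *thin* will also match unrelated terms, which is one reason a manual categorization step is needed after the search.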

Code categorization

All candidate codes were categorized into four groups by the authors (PA, BDN, WH, RP, SS) using Microsoft Excel to reach consensus (Figure 1). These were:

Abnormal weight – codes that reported a weight outside of normality
Weight change – codes that suggested weight change
Weight “symptom” – codes that reported a weight “symptom” without clarifying which
Weight other – codes that reported weight measurement, weight related advice, or a normal weight

All codes were then extracted from CPRD together with the associated date, the patient’s age at the time of the weight code, their gender, and all available height values.

Figure 1

Flowchart of candidate code categorization.

Weight measurements

Weight measurement values were extracted from CPRD for every patient with a “weight-related” code, and a bespoke method was developed to convert implausible kg values. Weight values were initially assumed to be measured in kg and the median weight (in kg) for each patient was calculated. Weight values were then assumed to be recorded in stones and pounds, or pounds alone, and converted into kg. Original kg values that fell outside the range of 0.5 to 1.5 times the individual’s median kg value were replaced by the converted value if the converted value fell within this range. Finally, any remaining measurements under 20 kg or over 200 kg were dropped. As expected, and as previously described in large population-based cohorts,30 the distribution of the remaining weight measurements was slightly positively skewed.
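The cleaning rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' Stata code: the function name is hypothetical, pounds are tried before stones, and a value in "stones and pounds" is assumed to be stored as decimal stones, which the text does not specify.

```python
from statistics import median

KG_PER_LB = 0.45359237
KG_PER_STONE = 14 * KG_PER_LB  # assumption: stones stored as a decimal number


def clean_weights(values):
    """Clean one patient's recorded weights, all initially assumed to be in kg."""
    med = median(values)
    low, high = 0.5 * med, 1.5 * med
    cleaned = []
    for v in values:
        if not (low <= v <= high):
            # Try re-reading the value as pounds, then as stones.
            for factor in (KG_PER_LB, KG_PER_STONE):
                converted = v * factor
                if low <= converted <= high:
                    v = converted
                    break
        # Drop anything still under 20 kg or over 200 kg.
        if 20 <= v <= 200:
            cleaned.append(v)
    return cleaned


# Hypothetical patient: 176 looks like pounds, 12.9 like stones, 500 is unrecoverable.
cleaned = clean_weights([80.0, 82.0, 176.0, 81.0, 12.9, 500.0])
```

In this toy example the median of the raw values is 81.5, so the accepted range is 40.75 to 122.25 kg; 176 is replaced by about 79.8 kg, 12.9 by about 81.9 kg, and 500 is dropped.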

Body mass index

BMI values were generated using the available height and weight data. An algorithm was developed to identify and convert heights recorded as centimeters into meters and remaining heights less than 1.3 m or greater than 2.1 m were dropped. The closest previous height was carried forward if there was no height measurement on the day of the weight measurement and the closest later height carried backwards if there was no height prior to or on the day of weight measurement.
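A rough sketch of the height handling and BMI calculation follows. The function names are hypothetical, and the rule used to recognise centimeter values (anything above 3 is treated as centimeters) is an assumption; the text does not describe how the authors' algorithm identified them.

```python
from datetime import date


def clean_height_m(value):
    """Return the height in meters, or None if it is outside 1.3 to 2.1 m."""
    metres = value / 100 if value > 3 else value  # assumed centimeter detection rule
    return metres if 1.3 <= metres <= 2.1 else None


def height_for(weight_date, heights):
    """Pick a height for a weight measurement.

    `heights` is a list of (date, metres) sorted by date. The closest height on or
    before the weight date is carried forward; if none exists, the closest later
    height is carried backwards.
    """
    earlier = [h for d, h in heights if d <= weight_date]
    if earlier:
        return earlier[-1]
    later = [h for d, h in heights if d > weight_date]
    return later[0] if later else None


def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2


# Hypothetical height records: one in centimeters, one in meters, one implausible.
raw_heights = [(date(2010, 1, 1), 172), (date(2015, 1, 1), 1.70), (date(2016, 1, 1), 250)]
heights = []
for d, raw in sorted(raw_heights):
    m = clean_height_m(raw)
    if m is not None:
        heights.append((d, m))

bmi_2012 = bmi(80.0, height_for(date(2012, 6, 1), heights))
```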

Statistical analysis

Validity of weight-related coding

To investigate whether weight values could be used to validate weight status codes, weight values recorded on the same day as the abnormal weight codes were retained and the mean weight calculated (with corresponding 95% CIs). Weight symptom codes were also included in this analysis to understand their use, and two normal BMI codes as an additional sense check. PPVs (with corresponding 95% CIs) were also calculated for weight codes that specified a BMI range, using the BMI values as the reference standard.
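The PPV calculation can be illustrated as below. The text does not state which CI method was used, so the Wilson score interval here is an assumption, as are the function names and the example BMI values.

```python
from math import sqrt


def ppv_with_ci(true_positives, total, z=1.96):
    """Return (PPV, lower, upper) using a Wilson score interval (assumed method)."""
    p = true_positives / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return p, centre - half, centre + half


def ppv_from_bmis(bmis, is_true_positive):
    """Count same-day BMI values that satisfy the code's criterion, then compute the PPV."""
    tp = sum(1 for b in bmis if is_true_positive(b))
    return ppv_with_ci(tp, len(bmis))


# Hypothetical same-day BMI values for an "underweight" code, criterion BMI < 20.
toy_ppv, toy_low, toy_high = ppv_from_bmis([17.2, 18.9, 19.5, 21.0], lambda b: b < 20)

# Hypothetical counts: 950 of 1,000 coded records meet the criterion.
ppv, low, high = ppv_with_ci(950, 1000)
```

Here the BMI value recorded on the same day as the code acts as the reference standard, so a "true positive" is simply a coded record whose BMI falls in the range the code implies.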

Validity of weight-change codes

Weight measurement values were ordered by date leading up to the first weight change code in each patient. Again, weight symptom codes were also included in this analysis to understand their use. Weight change was estimated using three methods. These were the percentage (%) difference between the kg value at the time of the code and the:

1) Previous weight value within a 2-year period
2) Predicted weight value for the time of the weight code, estimated by fitting a linear regression model for each patient through 3-monthly means of all weight values recorded in the preceding 5 years31
3) Historic mean weight of each patient using values recorded in the preceding 5 years

We also calculated estimates for the absolute difference using these three methods. For the second and third methods, if there was no weight measurement on the day of the weight code, the measurement closest to the weight code and within the preceding month was classified as the weight value at the time of the weight code. For all three methods, implausible changes in weight were defined as those below the first percentile or above the ninety-ninth percentile, and were excluded. This approach ensured that we minimized data loss and increased the likelihood that we would capture weight change where present, whilst excluding implausible extreme changes.

The mean weight change (95% CI) and median weight change (IQR) were calculated for each weight code for each method. Codes for which the IQR remained below 0 for at least two of the three methods were considered validated as weight loss codes. Codes for which the IQR remained above 0 for at least two of the three methods were considered validated as weight gain codes. By ensuring that the IQR remained above or below 0, we reduced the influence of outliers and ensured that at least 75% of values accurately represented the weight change described by the code.

This study was approved as a component of the Independent Scientific Advisory Committee Protocol 16_164A2.32 All analyses were performed in Stata version 15.
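The three estimation methods described in this section can be sketched for a single patient as follows. This is an illustration under stated assumptions, not the authors' Stata implementation: all function names are hypothetical, "3-monthly" is approximated by 91-day bins counted back from the code date, a year is taken as 365 days, and the weight at the time of the code is passed in separately from the earlier history.

```python
from datetime import date, timedelta
from statistics import mean


def pct_diff(current, reference):
    """Percentage difference of the current weight relative to a reference weight."""
    return 100 * (current - reference) / reference


def previous_weight(history, code_date, max_days=730):
    """Method 1: most recent weight in the 2 years before the code date."""
    prior = [(d, w) for d, w in history if 0 < (code_date - d).days <= max_days]
    return max(prior)[1] if prior else None


def predicted_weight(history, code_date, years=5):
    """Method 2: least-squares line through ~3-monthly means, evaluated at the code date."""
    bins = {}
    for d, w in history:
        days_before = (code_date - d).days
        if 0 < days_before <= years * 365:
            bins.setdefault(days_before // 91, []).append((-days_before, w))
    if len(bins) < 2:
        return None  # a line needs at least two 3-monthly means
    xs = [mean(t for t, _ in pts) for pts in bins.values()]
    ys = [mean(w for _, w in pts) for pts in bins.values()]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar  # intercept, ie, the prediction at day 0


def historic_mean(history, code_date, years=5):
    """Method 3: mean of all weights in the 5 years before the code date."""
    ws = [w for d, w in history if 0 < (code_date - d).days <= years * 365]
    return mean(ws) if ws else None


# Hypothetical patient losing 1 kg every 90 days, then recorded at 72 kg with the code.
code_date = date(2015, 1, 1)
history = [
    (code_date - timedelta(days=n), w)
    for n, w in [(360, 80.0), (270, 79.0), (180, 78.0), (90, 77.0)]
]
current = 72.0
changes = {
    "previous": pct_diff(current, previous_weight(history, code_date)),
    "predicted": pct_diff(current, predicted_weight(history, code_date)),
    "mean": pct_diff(current, historic_mean(history, code_date)),
}
```

For this toy patient the reference weights are 77 kg (previous), 76 kg (predicted), and 78.5 kg (historic mean). The historic mean gives the largest estimated loss because it ignores the downward trend that was already under way.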

Results

A total of 8,108,481 weight-related codes were recorded in CPRD between January 1, 2000 and December 31, 2017 in 1,000,002 individuals. Nearly twice as many weight-related codes were recorded in females (n=5,208,593, 64%) as in males. They were most commonly recorded in individuals aged 50–60 years (1,533,516, 19%); 40–50 years (1,470,111, 18%); and 60–70 years (1,425,102, 18%). The three most commonly used codes were “O/E – weight” (5,378,411, 66%), which usually accompanies a weight measurement value, “body mass index” (1,023,792, 13%), and “ideal weight” (314,499, 4%).

Internal validity of abnormal weight coding

Table 1 presents the most frequently used codes by increasing mean weight. In total, 396,864 (60%) abnormal weight codes were accompanied by a weight measurement value recorded on the same day, 108,647 (16%) had a prior weight measurement value recorded, and 154,939 (23%) codes had no prior weight values.

Table 1

The face validity of the 30 most commonly used abnormal weight codes

Abnormal weight code	Instances of code, N	Male, n (%)	Same day weight measurements, n (%)	Mean kg (95% CI)	Mean BMI (95% CI)
O/E – underweight	2,601	755 (29)	1,518 (58)	48.2 (47.8–48.6)	17.4 (17.3–17.4)
BMI <20	2,294	614 (27)	1,180 (51)	50.7 (50.3–51.2)	18.3 (18.2–18.4)
BMI low kg/m²	4,751	1,267 (27)	3,799 (80)	53 (52.6–53.3)	19.2 (19.1–19.3)
BMI normal kg/m²	62,233	22,035 (35)	47,745 (77)	63.7 (63.6–63.8)	22.8 (22.8–22.8)
BMI 20–24 – normal	6,579	2,315 (35)	5,565 (85)	64.1 (63.9–64.3)	22.9 (22.8–22.9)
BMI 25–29 – overweight	24,778	12,002 (48)	15,931 (64)	78.1 (78–78.3)	27.5 (27.5–27.6)
Has seen dietician – obesity	3,655	1,151 (31)	1,103 (30)	83.8 (82.6–85.1)	31.1 (30.6–31.5)
BMI high kg/m²	81,453	37,107 (46)	64,334 (79)	85.9 (85.8–86)	30.8 (30.8–30.8)
Referral to weight management service declined	9,390	4,399 (47)	5,600 (60)	87.9 (87.3–88.4)	30.9 (30.8–31.1)
Weight symptom	96,442	20,969 (22)	56,428 (59)	89.1 (88.9–89.3)	32.6 (32.5–32.7)
O/E – overweight	7,971	2,353 (30)	4,370 (55)	92.2 (91.6–92.7)	33.1 (32.9–33.2)
Dietary advice for weight reduction	9,528	3,985 (42)	6,225 (65)	95.2 (94.7–95.7)	33.7 (33.6–33.9)
Obesity monitoring	69,543	15,231 (22)	44,339 (64)	96 (95.8–96.2)	35 (35–35.1)
BMI 30+ – obesity	100,687	42,312 (42)	61,734 (61)	96.8 (96.7–96.9)	34.5 (34.5–34.6)
Weight reducing diet	7,879	2,109 (27)	3,883 (49)	97.3 (96.6–97.9)	35.1 (34.9–35.4)
Patient advised to lose weight	6,388	2,743 (43)	4,418 (69)	97.3 (96.7–97.9)	34.5 (34.3–34.6)
Weight loss advised	85,248	32,717 (38)	56,094 (66)	97.5 (97.4–97.7)	34.7 (34.6–34.7)
Refer to weight management program	10,725	2,704 (25)	5,509 (51)	97.7 (97.2–98.3)	35.7 (35.5–35.8)
Follow-up obesity assessment	4,056	894 (22)	2,702 (67)	98 (97.1–98.8)	35.9 (35.6–36.1)
Obesity monitoring check done	4,603	1,421 (31)	1,545 (34)	98.1 (97.2–99)	35.5 (35.2–35.8)
Wants to lose weight	13,602	2,678 (20)	8,046 (59)	98.9 (98.5–99.4)	35.9 (35.8–36.1)
Weight management program offered	26,211	6,940 (26)	16,709 (64)	99.5 (99.1–99.8)	36.3 (36.2–36.4)
Patient advised to lose weight	24,865	10,871 (44)	14,806 (60)	99.8 (99.4–100.1)	35.2 (35.1–35.3)
Obesity monitoring NOS	1,624	393 (24)	741 (46)	101.3 (99.8–102.8)	36.5 (36–36.9)
Weight management plan started	7,953	2,116 (27)	5,350 (67)	102.1 (101.5–102.7)	37 (36.8–37.2)
Initial obesity assessment	2,584	764 (30)	945 (37)	103.8 (102.5–105.1)	37.2 (36.8–37.6)
Obesity	193,221	59,626 (31)	112,516 (58)	104.8 (104.7–104.9)	37.7 (37.7–37.7)
O/E – obese	4,172	1,361 (33)	2,414 (58)	106.1 (105.3–106.9)	38.1 (37.8–38.3)
BMI 40+ – severely obese	14,632	4,379 (30)	9,068 (62)	123.3 (122.9–123.7)	44.7 (44.6–44.8)
Morbid obesity	6,138	2,157 (35)	2,505 (41)	131.6 (130.7–132.5)	46.9 (46.6–47.2)

Abbreviations: kg, kilogram; BMI, body mass index; n, numerator; N, denominator; NOS, not otherwise specified; O/E, observation/examination.

The most commonly entered abnormal weight codes were “obesity” (193,221, 7%), “body mass index 30+ – obesity” (100,687, 4%), and “weight symptom” (96,442, 4%). The mean BMI for weight codes classified as “overweight” ranged from 31.9 to 46.9. The mean BMI of weight codes classified as “underweight” ranged from 17.4 to 19.2. The mean BMI for the remaining “weight symptom” code was 32.6. Due to the large number of codes included in the analysis, CIs were narrow for all codes (Table 1).

Specific codes such as “body mass index 25–29 – overweight” were associated with an appropriate mean BMI value and a relatively narrow SD (mean=27.5, SD=2). Less specific codes, such as “obesity”, were associated with an appropriate mean BMI value but with less precision, as demonstrated by a larger SD (mean=37.7, SD=6). Non-specific codes, such as “weight symptom”, were associated with a relatively wide variation (mean=32.6, SD=9). The PPVs were high for the more commonly used and specific weight codes, ranging from 81.3% (80%–82.5%) for “Body Mass Index low K/M2” to 99.3% (99.2%–99.4%) for “Body Mass Index high K/M2” (Table 2).

Table 2

PPV of the most commonly used and specific abnormal weight codes

Abnormal weight code	Criteria for true positive	PPV (95% CI)
O/E – underweight	BMI <20	94.9 (93.7–95.9)
BMI <20	BMI <20	95.4 (94.1–96.5)
BMI low kg/m²	BMI <20	81.3 (80–82.5)
BMI normal kg/m²	BMI 20–24	91 (90.7–91.2)
BMI 20–24 – normal	BMI 20–24	96.9 (96.4–97.3)
BMI 25–29 – overweight	BMI 25–29	96.4 (96.1–96.7)
BMI high kg/m²	BMI >25	99.3 (99.2–99.4)
O/E – overweight	BMI >25	99.6 (99.4–99.8)
Obesity monitoring	BMI >29	80.9 (80.5–81.2)
BMI 30+ – obesity	BMI >29	99 (98.9–99.1)
Follow-up obesity assessment	BMI >29	87.9 (86.6–89.1)
Obesity monitoring check done	BMI >29	94.4 (93.1–95.5)
Obesity monitoring NOS	BMI >29	91.3 (89.1–93.1)
Weight management plan started	BMI >29	92.7 (91.9–93.4)
Initial obesity assessment	BMI >29	95.5 (94–96.7)
Obesity	BMI >29	96.8 (96.7–96.9)
O/E – obese	BMI >29	98.8 (98.2–99.1)
BMI 40+ – severely obese	BMI >39	97.2 (96.8–97.5)
Morbid obesity	BMI >39	92.6 (91.5–93.5)

Abbreviations: BMI, body mass index; NOS, not otherwise specified; PPV, positive predictive value.

Internal validity of weight change coding

Table 3 shows the percent weight change expressed as the median (IQR) weight change for each estimation method and each weight change or weight symptom code. Ten codes met the criterion for validation as indicators of weight loss and four codes met the criterion for validation as indicators of weight gain: at least two of the three IQRs obtained for each code did not include 0. The two weight symptom codes could not be confidently reclassified as weight loss or weight gain codes and were retained as “weight symptom” codes (Table 3, Figure 2).

Table 3

Internal validation of weight change Read codes using three methods of percent weight change estimation

Weight change code	Current vs last: N	Current vs last: median % change (IQR)	Current vs predicted: N	Current vs predicted: median % change (IQR)	Current vs mean: N	Current vs mean: median % change (IQR)
[D]Abnormal loss of weight	13,126	−5.3 (−9.1 to −2)	8,641	−6 (−10.7 to −1.9)	13,189	−8.5 (−12.8 to −4.8)
Abnormal weight loss – symptom	21,603	−5.1 (−8.9 to −1.8)	14,007	−5.9 (−10.4 to −1.8)	22,448	−8.1 (−12.4 to −4.5)
Unintentional weight loss	168	−5.2 (−9.6 to −1.7)	109	−7 (−11.4 to −2.5)	152	−8.6 (−12.2 to −5.4)
Abnormal weight loss	1,962	−4.7 (−9 to −1.1)	1,170	−5.8 (−10.8 to −1.6)	1,910	−7.9 (−12.6 to −3.7)
Complaining of weight loss	5,397	−4.3 (−7.9 to −1.3)	3,773	−5 (−9.4 to −1)	5,702	−7 (−11.3 to −3.4)
Weight decreasing	7,089	−3.9 (−7.4 to −1.3)	4,419	−4.8 (−9.2 to −1.5)	6,881	−6 (−10.2 to −2.3)
O/E – cachexic	31	−3.6 (−10 to 0.2)*	13	−7.6 (−8.4 to −4.9)	15	−13.7 (−22.2 to −3.5)
Intentional weight loss	238	−3.9 (−7.6 to −1.5)	168	−5.4 (−9.7 to −2.8)	213	−5.1 (−8.8 to −1.5)
Cachexia	175	−2.9 (−7.2 to 0)*	48	−5.5 (−10.6 to −0.9)	73	−10.7 (−15.2 to −7.1)
Weight loss from baseline weight	3,013	−2.7 (−5.1 to −1.1)	2,465	−3.3 (−6.5 to −0.7)	3,058	−2.6 (−6.2 to 0.6)*
Pattern of weight gain	3	−0.4 (−7.1 to 0)*	2	2 (−6.4 to 10.4)*	2	11.8 (9.7 to 14)
H/O: attempted weight loss	8	−1.7 (−4 to 0.2)*	8	−0.7 (−4.3 to 6)*	9	−0.2 (−7 to 0.7)*
History of attempted weight loss	1,349	−0.5 (−3.9 to 2.5)*	1,031	−2.1 (−6.4 to 1.8)*	1,303	0.7 (−5.6 to 5.2)*
Weight symptom NOS	163	0 (−2.8 to 3.3)*	80	−1.4 (−6.1 to 2.4)*	137	0.3 (−4.8 to 4.3)*
Weight symptom	27,925	0.6 (−2 to 4)*	18,135	−0.1 (−4 to 4.2)*	27,385	2.4 (−2.2 to 7.3)*
Excessive weight gain in pregnancy	22	0.2 (−1.7 to 6)*	22	0.8 (−5.6 to 20.7)*	28	9.6 (−2.3 to 18)*
Weight increasing	9,401	2.9 (0.5–6.3)	5,831	3 (−0.4 to 7.7)*	8,601	5.9 (2.2–10.3)
Abnormal weight gain	8,210	4.2 (1.1–7.9)	4,782	3.9 (0–9.2)	7,876	7.8 (4–12.7)
Abnormal weight gain	346	4.4 (0.8–8.1)	215	4.6 (0.3–9)	354	7.9 (4.2–13.2)
Unintentional weight gain	1	7.1 (7.1–7.1)	1	12 (12–12)	1	2.6 (2.6–2.6)

Notes: *Denotes that the IQR incorporates 0. Orange shading denotes validated weight loss codes. Blue shading denotes weight symptom codes, which can be classified as neither weight gain nor weight loss codes. Green shading denotes validated weight gain codes.

Abbreviations: [D], diagnosis; H/O, history of; N, denominator; NOS, not otherwise specified; O/E, observation/examination.
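The two-out-of-three validation rule can be checked directly against rows of Table 3. The sketch below is illustrative: the function name and the "not validated" label are hypothetical, and the use of strict inequalities (an IQR bound of exactly 0 does not count as above or below 0) is an assumption.

```python
def classify(iqrs):
    """Classify a code from its three (lower, upper) IQR bounds, one per method."""
    below = sum(1 for lo, hi in iqrs if hi < 0)  # whole IQR below 0
    above = sum(1 for lo, hi in iqrs if lo > 0)  # whole IQR above 0
    if below >= 2:
        return "weight loss"
    if above >= 2:
        return "weight gain"
    return "not validated"


# IQRs copied from Table 3: current vs last, current vs predicted, current vs mean.
table3_rows = {
    "Weight loss from baseline weight": [(-5.1, -1.1), (-6.5, -0.7), (-6.2, 0.6)],
    "Weight symptom": [(-2.0, 4.0), (-4.0, 4.2), (-2.2, 7.3)],
    "Weight increasing": [(0.5, 6.3), (-0.4, 7.7), (2.2, 10.3)],
}

results = {code: classify(iqrs) for code, iqrs in table3_rows.items()}
```

“Weight loss from baseline weight” and “Weight increasing” each have one IQR that includes 0, yet both pass because the other two methods agree; “Weight symptom” fails on all three.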

Figure 2

Comparison of the mean percent weight change estimated using three methods for each weight change code (hollow circles) and combined by weight change group (full circles).

Abbreviation: kg, kilogram.

Amount of weight change prompting a weight change code

The mean weight loss prior to a validated weight loss code ranged from −5.2% (95% CI, −5.2% to −5.2%) to −7.9% (−8% to −7.8%), depending on the method (Table 4). For females, the amount of weight loss ranged from −5.4% (−5.5% to −5.3%) to −8.3% (−8.4% to −8.2%), and in males from −4.8% (−4.9% to −4.7%) to −7.4% (−7.5% to −7.3%). Within each method, there was little variation in the mean weight loss across age groups, although there was a slight trend toward greater weight loss being observed in the oldest age groups: mean weight loss for the trend (predicted weight) method ranged from −5.2% (−5.7% to −4.7%) to −5.8% (−6% to −5.6%) across age groups and from −6.3% (−6.5% to −6.1%) to −10.2% (−10.3% to −10.1%) using the historic mean.

Table 4

Weight change by weight change group, gender, and age group for each of the three methods of percent weight change estimation

	Current vs last: N	Current vs last: mean % change (95% CI)	Current vs predicted: N	Current vs predicted: mean % change (95% CI)	Current vs mean: N	Current vs mean: mean % change (95% CI)
Weight loss	52,802	−5.2 (−5.2 to −5.2)	34,813	−5.4 (−5.5 to −5.3)	53,641	−7.9 (−8 to −7.8)
Male	21,785	−4.8 (−4.9 to −4.7)	14,048	−5.2 (−5.3 to −5.1)	22,292	−7.4 (−7.5 to −7.3)
Female	31,017	−5.4 (−5.5 to −5.3)	20,765	−5.6 (−5.8 to −5.4)	31,349	−8.3 (−8.4 to −8.2)
18–29 y	5,903	−5 (−5.2 to −4.8)	3,097	−5.2 (−5.7 to −4.7)	6,313	−6.3 (−6.5 to −6.1)
30–39 y	4,379	−4.9 (−5.1 to −4.7)	2,678	−5.4 (−5.9 to −4.9)	5,089	−6.5 (−6.7 to −6.3)
40–49 y	5,150	−5 (−5.2 to −4.8)	3,354	−5.2 (−5.5 to −4.9)	5,977	−6.8 (−7 to −6.6)
50–59 y	6,478	−4.9 (−5 to −4.8)	4,367	−5 (−5.3 to −4.7)	7,108	−7 (−7.2 to −6.8)
60–69 y	8,267	−5 (−5.1 to −4.9)	5,663	−5.2 (−5.4 to −5)	8,327	−7.7 (−7.8 to −7.6)
70–79 y	11,618	−5.3 (−5.4 to −5.2)	8,130	−5.7 (−5.9 to −5.5)	10,859	−8.9 (−9 to −8.8)
80+ y	11,007	−5.7 (−5.8 to −5.6)	7,524	−5.8 (−6 to −5.6)	9,968	−10.2 (−10.3 to −10.1)
Weight symptom	28,088	1 (0.9–1.1)	18,215	0.7 (0.6–0.8)	27,522	2.7 (2.6–2.8)
Male	6,628	0.1 (0–0.2)	4,120	−0.3 (−0.5 to −0.1)	6,537	0.9 (0.7–1.1)
Female	21,460	1.3 (1.2–1.4)	14,095	1 (0.8–1.2)	20,985	3.2 (3.1–3.3)
18–29 y	5,931	1.7 (1.5–1.9)	3,427	1 (0.6–1.4)	5,875	4.5 (4.2–4.8)
30–39 y	5,600	1.7 (1.6–1.8)	3,659	1.4 (1.1–1.7)	5,761	4.1 (3.9–4.3)
40–49 y	5,226	1.4 (1.3–1.5)	3,413	1.2 (0.9–1.5)	5,340	3.5 (3.3–3.7)
50–59 y	4,489	1 (0.9–1.1)	3,106	1.1 (0.8–1.4)	4,440	2.6 (2.4–2.8)
60–69 y	3,400	0.5 (0.3–0.7)	2,338	0.2 (−0.1 to 0.5)	3,126	1.4 (1.1–1.7)
70–79 y	2,162	−0.9 (−1.1 to −0.7)	1,448	−1 (−1.4 to −0.6)	1,896	−2.1 (−2.5 to −1.7)
80+ y	1,280	−3 (−3.4 to −2.6)	824	−2.3 (−3 to −1.6)	1,084	−6.7 (−7.2 to −6.2)
Weight gain	17,957	4.2 (4.1–4.3)	10,828	4.9 (4.7–5.1)	16,831	7.9 (7.8–8)
Male	4,006	3.3 (3.2–3.4)	2,473	4 (3.7–4.3)	3,707	6.4 (6.2–6.6)
Female	13,951	4.5 (4.4–4.6)	8,355	5.2 (5–5.4)	13,124	8.4 (8.3–8.5)
18–29 y	3,320	5.4 (5.2–5.6)	1,806	4.9 (4.4–5.4)	3,155	10.3 (10–10.6)
30–39 y	3,284	4.7 (4.5–4.9)	1,999	5.2 (4.7–5.7)	3,236	9.1 (8.8–9.4)
40–49 y	3,409	4.3 (4.1–4.5)	2,093	5 (4.6–5.4)	3,488	7.9 (7.6–8.2)
50–59 y	3,294	3.7 (3.5–3.9)	2,078	5 (4.6–5.4)	3,115	7 (6.8–7.2)
60–69 y	2,592	3.4 (3.2–3.6)	1,644	4.5 (4.1–4.9)	2,243	6 (5.7–6.3)
70–79 y	1,544	3.3 (3.1–3.5)	935	4.5 (4–5)	1,240	5.6 (5.2–6)
80+ y	514	3.3 (2.8–3.8)	273	5.4 (4.3–6.5)	354	4.9 (4–5.8)

Abbreviations: N, denominator; y, years.

The mean weight gain prior to a validated weight gain code ranged from 4.2% (95% CI, 4.1%–4.3%) to 7.9% (7.8%–8%), depending on the method (Table 4). Weight gain codes were 3.5 times more likely to be recorded in females. For females, the amount of weight gain ranged from 4.5% (4.4%–4.6%) to 8.4% (8.3%–8.5%), and in males from 3.3% (3.2%–3.4%) to 6.4% (6.2%–6.6%). Each method demonstrated a different pattern in weight gain across age groups: the trend (predicted weight) method suggested a similar amount of weight gain across age groups, whilst the previous and historic mean methods suggested that a decreasing amount of weight gain triggered a code with increasing age.

The mean weight change associated with a weight symptom code ranged from 0.7% (95% CI, 0.6%–0.8%) to 2.7% (2.6%–2.8%), depending on the method (Table 4). For females with a weight symptom code, weight change ranged from 1% (0.8%–1.2%) to 3.2% (3.1%–3.3%), and for males from −0.3% (−0.5% to −0.1%) to 0.9% (0.7%–1.1%). Whichever method was used to assess change in weight, weight symptom codes were more commonly used after weight loss in older people, with weight change ranging from −2.3% (−3% to −1.6%) to −6.7% (−7.2% to −6.2%) in the 80+ years age group, while the same weight symptom code ranged from 1% (0.6%–1.4%) to 4.5% (4.2%–4.8%) in the 18–29 years age group.

Method consistency

For the validated weight change codes, there was greater consistency between the method using the previous weight measurement value and the predicted measurement method compared with the historic mean method (Figure 2). In addition, the greater the time between the weight code and the last weight measurement, the greater the weight change (Table 5). To evaluate the potential bias of modeling only individuals with valid BMI measures, we compared information on age and gender for those with and without a valid measure. The proportion of individuals with weight measured was consistent across age group and gender.

Table 5

Association between time to previous weight measurement and estimated weight change

Time since previous weight measurement	N	Current vs last: mean % change (95% CI)	Current vs last: median % change (IQR)
Weight loss
0–2 w	3,278	−1.1 (−1.2 to −1)	−1.1 (−2.9 to 0.3)
2–4 w	3,853	−2.3 (−2.4 to −2.2)	−2.1 (−4 to −0.3)
1–2 m	5,401	−3.4 (−3.5 to −3.3)	−3.1 (−5.5 to −1.1)
2–6 m	15,272	−4.9 (−5 to −4.8)	−4.7 (−7.8 to −1.9)
6–12 m	13,572	−6.3 (−6.4 to −6.2)	−6.1 (−9.9 to −2.7)
12–24 m	11,426	−7.2 (−7.3 to −7.1)	−7.1 (−11.3 to −3.3)
Weight symptom
0–2 w	1,704	−0.3 (−0.4 to −0.2)	−0.1 (−1.7 to 1)
2–4 w	2,209	−0.5 (−0.6 to −0.4)	−0.5 (−2.2 to 1)
1–2 m	3,217	−0.3 (−0.4 to −0.2)	0 (−2.3 to 1.6)
2–6 m	8,331	0.5 (0.4–0.6)	0.6 (−2.2 to 3.4)
6–12 m	6,755	1.7 (1.5–1.9)	1.9 (−1.9 to 5.6)
12–24 m	5,872	2.4 (2.2–2.6)	2.7 (−1.8 to 7.1)
Weight gain
0–2 w	738	0.6 (0.4–0.8)	0 (−0.9 to 1.8)
2–4 w	923	1 (0.8–1.2)	0.8 (−0.9 to 2.5)
1–2 m	1,499	1.7 (1.5–1.9)	1.4 (−0.2 to 3.3)
2–6 m	5,552	3.4 (3.3–3.5)	2.9 (0.6 to 5.7)
6–12 m	4,983	5 (4.8–5.2)	4.6 (1.5 to 8.2)
12–24 m	4,262	6.6 (6.4–6.8)	6.2 (2.7 to 10.3)

Abbreviations: N, denominator; m, months; w, weeks.

Discussion

We report the methodology and findings of the first internal EHR validation study using the available weight measurement values and weight-related coding. We demonstrate which weight-related Read codes can be used to quantify weight with the greatest precision and have provided estimates of weight and PPV for the most commonly used codes. Out of the three methods developed to assess the extent of weight change prior to a weight change code, the two that performed similarly were the difference from the preceding weight measurement and the predicted weight based on linear trend. Weight loss codes were typically employed when weight loss was ≥5%, especially when the previous weight measurement was over 6 months ago. Weight symptom codes were not used by general practitioners to record weight change and were used variably depending on patient age.

Comparison with existing literature

We have found no directly comparable studies. We are aware of only one English study using weight measurement data to define weight loss: a case-control study investigating the risk of symptoms for colorectal cancer in different age groups.33 Weight loss was defined by using the highest recorded weight in the preceding 2 years, leading to the possibility of under- or over-estimation of weight change. We generated BMI for each weight measurement from the available height information in CPRD, using an algorithm similar to that of Bhaskaran et al, who compared BMI recorded in CPRD to that from a representative sample of the population of England (the Health Survey for England [HSE]).34 They found that mean BMI of those with data in CPRD more closely matched the HSE when the CPRD data on BMI were limited to those recorded in the last 3 years. Based on this, we confined our validations of weight status to individuals with a weight status code and a weight measurement occurring on the same day. Using this method, the mean BMI was 29.5, higher than the mean BMI of 27.0 recorded by the HSE for the same period.35 This difference most probably occurred because there are a large number of codes to describe being overweight and only a few to describe being underweight. When restricted to the “body mass index” code denoting that a BMI measurement had taken place, the mean BMI was 27.2, demonstrating external validity with the HSE.

Strengths and limitations

As weight is not measured routinely in English primary care, weight measurement is an example of informative observation: sicker patients (potentially with a weight-related problem) are more likely to attend and to have their weight measured36 or a weight-related code recorded (some weight values may be entered under free text).37 Sperrin et al modeled the time to next BMI measurement as a recurrent event using anonymized UK primary care data in patients with type 2 diabetes in Salford, UK.38 They showed that the higher the previous BMI measurement the higher the likelihood of repeat BMI measurement, that an increasing trend in BMI lowered the likelihood of repeat BMI measurement, and that the presence of comorbidity increased the likelihood of BMI measurement. To minimize the impact of weight coding on subsequent weight measurement patterns, we restricted our analysis of weight change to measurements prior to the first weight change code.

Weight typically increases slowly throughout adulthood with gain in adiposity and then slowly decreases from the seventh decade of life as muscle mass is slowly lost.39 Deviations from this underlying trend are unusual without an identifiable pathological or behavioral explanation.3 The slight trend observed toward greater weight loss in the oldest age groups may represent expected muscle loss or alternatively underlying serious disease in the frail elderly, but as we do not have disease status we cannot investigate these possibilities in this dataset. To account for expected weight loss, we chose a simple linear regression modeling approach in preference to more sophisticated modeling techniques such as regression with restricted cubic splines. Linear regression was feasible as we estimated weight over short periods of up to 5 years and generated 3-monthly means to reduce the effect of measurement error after cleaning the data for outliers.

Implications

Although the findings of this study have specific relevance to epidemiologists utilizing Read coded data derived from the UK’s EHR, the methods developed are transferable to any EHR that includes weight measurement data and weight-related coding used in any health care economy. The REporting of studies Conducted using Observational Routinely-collected Data (RECORD) statement is a reporting guideline for observational studies using health data collected for non-research purposes.23 It recommends that the validation steps used when choosing codes or algorithms to select the study population should be provided or referenced, together with a complete list of codes and algorithms used to classify exposures, outcomes, confounders, and effect modifiers. For example, a cross-sectional analysis recently examined diabetes coding between 1995 and 2014 and showed that code selection made a significant difference to the incidence of diabetes.40 To this end, we present validated weight-related and weight change Read codes for use in primary care EHR epidemiology. Such codes will be invaluable for any epidemiological study interested in weight as a covariate and in particular in disease areas with clear links between weight change and clinical outcomes. For example, extant studies assessing whether weight loss predicts the presence of cancer have mostly defined weight loss using weight loss Read codes without publishing the code lists.10 Without understanding which codes have been used to define weight loss and the internal validity of these codes, it has been impossible to ascertain the accuracy of the prevalence and the predictive value of weight loss. Further research should investigate whether estimates of the predictive value of weight loss are modified when using the codes found here to classify weight loss with greater confidence.
This improved transparency could further inform clinical practice, for example by showing clinicians how much weight change is predictive of cancer or other serious disease in primary care. In primary care populations, the optimal percentage weight loss to maximize its predictive value as a sign of underlying serious illness has remained elusive.3 In UK primary care, the NICE guidelines for suspected cancer recommend that unexpected weight loss in combination with other clinical features should prompt further investigation for cancer.41 A subsequent review suggested that unexpected weight loss alone should prompt investigation for cancer.32 Neither of these recommendations defined the degree of weight loss, or the time period of loss, that should prompt action, but both included studies that defined unexpected weight loss using Read coding. Previous reviews recommended that ≥5% involuntary weight loss over 6–12 months should be investigated.3,42,43 However, these data mainly come from populations recruited from hospital outpatients or inpatients, most of whom were elderly, where the prevalence of cancer and other serious disease is much higher than in primary care. This study provides the first evidence from primary care that a Read code for unexpected weight loss represents a mean loss of ≥5% in a 6-month period, which was broadly consistent across age groups and gender.

Conclusion

Our study presents an internationally applicable methodology for internally validating weight-related coding using available weight measurement values. We demonstrate the UK Read codes that can be confidently used to classify weight status and weight change in the absence of weight values. We also provide the first evidence from primary care that a Read code for unexpected weight loss represents a mean loss of ≥5% in a 6-month period, a finding that is broadly consistent across age groups and gender.

Data sharing statement

The categorization is available from the corresponding author.

35 in total

Review 1. Administrative database research has unique characteristics that can risk biased results.

Authors: Carl van Walraven; Peter Austin
Journal: J Clin Epidemiol Date: 2011-11-09 Impact factor: 6.437

2. The history of the Read Codes: the inaugural James Read Memorial Lecture 2011.

Authors: Tim Benson
Journal: Inform Prim Care Date: 2011

Review 3. Validity of diagnostic coding within the General Practice Research Database: a systematic review.

Authors: Nada F Khan; Sian E Harrison; Peter W Rose
Journal: Br J Gen Pract Date: 2010-03 Impact factor: 5.386

Review 4. Investigation and management of unintentional weight loss in older adults.

Authors: Jenna McMinn; Claire Steel; Adam Bowman
Journal: BMJ Date: 2011-03-29

5. Body weight and the shape of the natural distribution of weight, in very large samples of German, Austrian and Norwegian conscripts.

Authors: M Hermanussen; H Danker-Hopfe; G W Weber
Journal: Int J Obes Relat Metab Disord Date: 2001-10

Review 6. Epidemiology of weight loss in humans with special reference to wasting in the elderly.

Authors: Jeffrey I Wallace; Robert S Schwartz
Journal: Int J Cardiol Date: 2002-09 Impact factor: 4.164

7. How accurate are diagnoses for rheumatoid arthritis and juvenile idiopathic arthritis in the general practice research database?

Authors: S L Thomas; C J Edwards; L Smeeth; C Cooper; A J Hall
Journal: Arthritis Rheum Date: 2008-09-15

Review 8. Validation and validity of diagnoses in the General Practice Research Database: a systematic review.

Authors: Emily Herrett; Sara L Thomas; W Marieke Schoonen; Liam Smeeth; Andrew J Hall
Journal: Br J Clin Pharmacol Date: 2010-01 Impact factor: 4.335

9. The risk of colorectal cancer with symptoms at different ages and between the sexes: a case-control study.

Authors: William Hamilton; Robert Lancashire; Debbie Sharp; Tim J Peters; Kk Cheng; Tom Marshall
Journal: BMC Med Date: 2009-04-17 Impact factor: 8.775

10. Validation of suicide and self-harm records in the Clinical Practice Research Datalink.

Authors: Kyla H Thomas; Neil Davies; Chris Metcalfe; Frank Windmeijer; Richard M Martin; David Gunnell
Journal: Br J Clin Pharmacol Date: 2013-07 Impact factor: 4.335
