Patrick Rockenschaub, Vincent Nguyen, Robert W Aldridge, Dionisio Acosta, Juan Miguel García-Gómez, Carlos Sáez
Abstract
OBJECTIVES: To demonstrate how data-driven variability methods can be used to identify changes in disease recording in two English electronic health records databases between 2001 and 2015.
Keywords: cardiovascular disease; clinical coding; data quality; electronic health records; statistics & research methods
Year: 2020 PMID: 32060159 PMCID: PMC7045100 DOI: 10.1136/bmjopen-2019-034396
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Step-by-step explanation of how to estimate and visualise the temporal variability of a dataset. The methods in the R package EHRtemporalVariability were created to support researchers through every step of this process.
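The core of the estimation step is a distance between the data distributions of different months. The sketch below assumes each month is summarised as a probability distribution over the categories of a variable (for example, 3-character ICD-10 codes) and that months are compared with the Jensen-Shannon distance. The metric choice and the helper names `js_distance` and `distance_matrix` are assumptions for illustration; this is not the EHRtemporalVariability API.

```python
import numpy as np

# Assumption: months are compared with the Jensen-Shannon distance (base-2 logs),
# which is bounded in [0, 1]; the helpers below are hypothetical, not a package API.

def js_distance(p, q):
    """Jensen-Shannon distance between two (unnormalised) frequency vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # zero-probability terms contribute nothing; b > 0 wherever a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0)))

def distance_matrix(monthly):
    """Symmetric matrix of pairwise distances between the rows (months) of a months x categories array."""
    n = len(monthly)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_distance(monthly[i], monthly[j])
    return d

# Three hypothetical months over three categories; the third month differs most.
monthly = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
D = distance_matrix(monthly)
```

Identical months have distance 0 and months with no categories in common have distance 1, which makes the values comparable across variables with different numbers of categories.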
Figure 2. IGT plot of demography (without age*) and cardiovascular disease prevalence in CPRD GOLD between 2001 and 2015. Each point represents the joint prevalence in a single month (labelled with the last 2 digits of the year and the month), and distances represent the relative difference between months. The dimensions have no inherent meaning; they are the three dimensions of highest variability, in decreasing order, as determined by multidimensional scaling. (a) Between 2001 and 2008, disease prevalence increased gradually, with two indentations corresponding to 2003 and 2005. (b) In 2008, the general trend reversed and prevalences decreased again, as shown by the change in the direction of the graph. (c) The magnitude of variability increased after 2011, predominantly owing to changes in the distribution of socioeconomic status that resulted from a reduction in the number of practices contributing to the dataset. Detailed subplots of a, b and c can be found in the supplementary material (online supplementary figure 1). CPRD GOLD, Clinical Practice Research Datalink; IGT, information-geometric temporal; J, January; F, February; M, March; A, April; m, May; j, June; x, July; a, August; S, September; O, October; N, November; D, December. *Age was excluded from this graph for clarity. Because CPRD GOLD records only the year of birth, including age introduces artificial yearly jumps in July, when every patient is considered 1 year older. The overall conclusion remains unaltered. A full graph including age can be found in the supplementary material (online supplementary figure 2).
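The IGT plot places each month as a point so that distances between points approximate the distances between the monthly distributions. A minimal sketch of classical (Torgerson) multidimensional scaling follows; it assumes a symmetric matrix of pairwise between-month distances as input and is not necessarily the exact MDS variant used to produce the figures. The function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(d, k=3):
    """Embed an n x n distance matrix into k dimensions with classical (Torgerson) MDS.

    Assumption: d holds pairwise distances between months (e.g. Jensen-Shannon distances).
    """
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    b = -0.5 * j @ (d ** 2) @ j              # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]       # keep the k largest
    vals = np.clip(vals[order], 0.0, None)   # negative eigenvalues (non-Euclidean input) -> 0
    return vecs[:, order] * np.sqrt(vals)    # n x k coordinates; column 0 = most variability

# Hypothetical example: three months whose distances lie on a line.
x = np.array([0.0, 1.0, 3.0])
d = np.abs(x[:, None] - x[None, :])
coords = classical_mds(d, k=2)
```

As in the figure, the resulting axes have no inherent meaning: the sign and rotation of each axis are arbitrary, and only the relative positions of the months carry information.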
Figure 3. IGT plot of demography and cardiovascular disease coding in HES between 2001 and 2015. Each point represents the joint prevalence in a single month (labelled with the last 2 digits of the year and the month), and distances represent the relative difference between months. The dimensions have no inherent meaning; they are the three dimensions of highest variability, in decreasing order, as determined by multidimensional scaling. (a) From 2001 to 2009, the cardiovascular codes associated with hospital admissions changed gradually. In March 2009, the data distributions started to diverge from the previous trend. (b) In March 2010, the distribution of cardiovascular codes changed abruptly. (c and d) Similar, and even stronger, changes in cardiovascular disease coding occurred in April 2012 and April 2014. The distributions within these 2-year batches remained stable. Detailed subplots of a, b, c and d can be found in the supplementary material (online supplementary figure 4). HES, Hospital Episode Statistics; IGT, information-geometric temporal; J, January; F, February; M, March; A, April; m, May; j, June; x, July; a, August; S, September; O, October; N, November; D, December.
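Abrupt shifts such as those in March 2010, April 2012 and April 2014 appear as large jumps between consecutive months. As a rough illustration only, the sketch below flags months whose distribution lies far from the preceding month's, using the Jensen-Shannon distance and a user-chosen threshold. Both the heuristic and the `flag_abrupt_changes` helper are assumptions; this is not a method described in the source.

```python
import numpy as np

def flag_abrupt_changes(monthly, labels, threshold):
    """Return labels of months whose distribution differs sharply from the previous month.

    Assumption: 'sharply' means a Jensen-Shannon distance (base-2 logs) above `threshold`.
    """
    monthly = np.asarray(monthly, dtype=float)
    monthly = monthly / monthly.sum(axis=1, keepdims=True)  # rows -> probability distributions
    flagged = []
    for t in range(1, len(monthly)):
        p, q = monthly[t - 1], monthly[t]
        m = 0.5 * (p + q)
        # Jensen-Shannon divergence; zero-probability terms are skipped
        jsd = 0.5 * np.sum(p[p > 0] * np.log2(p[p > 0] / m[p > 0])) \
            + 0.5 * np.sum(q[q > 0] * np.log2(q[q > 0] / m[q > 0]))
        if np.sqrt(max(jsd, 0.0)) > threshold:
            flagged.append(labels[t])
    return flagged

# Hypothetical two-code distributions, labelled in the figure's year+month style.
monthly = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
labels = ["10J", "10F", "10M", "10A"]
changes = flag_abrupt_changes(monthly, labels, threshold=0.2)
```

A fixed threshold is a crude choice: a gradual drift of many small steps, like the 2001 to 2009 trend, would not be flagged, which is one reason a visual overview such as the IGT plot is useful alongside any automatic rule.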
Figure 4. DTM of ICD-10 coding linked to hospital admissions in HES between 2001 and 2015. Each row represents a single ICD-10 code (3 characters), and the colour shows the proportion of admissions with that code in each month. Gradual changes in code frequency are notable for I20 (angina pectoris), I21 (acute myocardial infarction), I63 (cerebral infarction) and I64 (stroke, not specified). Abrupt changes appear in the coding of G45 (transient cerebral ischaemic attack; 2009), I21 (acute myocardial infarction; 2010 and 2012) and I20 (angina pectoris; 2014). DTM, data temporal map; HES, Hospital Episode Statistics; ICD-10, International Classification of Diseases 10th revision.
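A data temporal map like the one in Figure 4 can be assembled by counting, for each month, the share of admissions that carry each 3-character code. The sketch below assumes admissions arrive as `(month, codes)` pairs, that one admission may list several codes, and that each 3-character code is counted at most once per admission; the input format and the `data_temporal_map` name are hypothetical.

```python
from collections import Counter, defaultdict

def data_temporal_map(admissions):
    """Build a DTM: for each 3-character ICD-10 code, the proportion of admissions per month.

    Assumption: `admissions` is an iterable of (month, codes) pairs, where `month` is a
    sortable label such as "2010-03" and `codes` lists the ICD-10 codes on that admission.
    """
    totals = Counter()                 # number of admissions per month
    counts = defaultdict(Counter)      # counts[code3][month] = admissions with that code
    for month, codes in admissions:
        totals[month] += 1
        for code3 in {c[:3] for c in codes}:  # truncate to 3 characters, count once per admission
            counts[code3][month] += 1
    months = sorted(totals)
    dtm = {code3: [counts[code3][m] / totals[m] for m in months] for code3 in sorted(counts)}
    return months, dtm

# Hypothetical admissions over two months.
admissions = [
    ("2010-02", ["I21.9", "I10"]),
    ("2010-02", ["I20.0"]),
    ("2010-03", ["I21.0", "I21.4"]),
    ("2010-03", ["G45.9"]),
]
months, dtm = data_temporal_map(admissions)
```

Because an admission can carry several codes, the proportions in one month need not sum to 1 across codes; each row is read on its own, as in the figure.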
Variability in CPRD GOLD and HES and their potential causes and solutions

| Finding | Observable cause | Possible original cause | Possible solutions |
|---|---|---|---|
| CPRD GOLD | | | |
| Gradual change in the population distribution between 2001 and 2007 | Increases in the prevalence of recorded cardiovascular disease | Demographic changes (eg, ageing); incremental improvements in diagnostic coding or in clinical procedures | Incremental learning of models; inclusion of a time interaction effect |
| Shift in the direction of change in 2008 | After the previous years' increase, the prevalence of CHD, heart failure and PAD started decreasing again around the same time | No immediate reason identified | Separate analyses of prechange and postchange data |
| Oscillations in the data distributions after 2010 | Changes in the distribution of socioeconomic status in the target distribution | Selective dropout of practices, possibly related to a switch in the practice management software | Mixed models with practice effects |
| HES | | | |
| Gradual change in the population distribution between 2001 and 2008 | Increase in reported chronic CHD and atrial fibrillation; decreases in reported angina pectoris, acute myocardial infarction, heart failure and stroke | Demographic changes (eg, ageing); incremental improvements in diagnostic coding or in clinical procedures; selective increase of disease incidence | Incremental learning of models; inclusion of a continuous time interaction effect |
| Shift in the direction of change in 2009 | Increased coding of transient cerebral ischaemic attacks between 2009 and 2010 | No immediate reason identified | Separate analyses of prechange and postchange data |
| Abrupt change in March 2010 | Drop in acute myocardial infarction coding | Update to the HSCIC Coding Clinic Guidance in February 2010 | Separate analyses; incremental learning of models |
| Further abrupt changes in April 2012 and April 2014 | Sudden increase in acute myocardial infarction coding in 2012, with a concomitant drop in subsequent myocardial infarction records; sudden further decrease in angina pectoris codes in 2014 | Update to the national clinical coding guidance (National Clinical Coding Standards ICD-10 fourth edition) | Separate analyses; incremental learning of models |

CHD, coronary heart disease; CPRD GOLD, Clinical Practice Research Datalink; HES, Hospital Episode Statistics; HSCIC, Health and Social Care Information Centre; ICD-10, International Classification of Diseases 10th revision; PAD, peripheral arterial disease.
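The remedy "separate analyses of prechange and postchange data" amounts to partitioning the records at each identified change point and analysing each segment on its own. A minimal sketch, assuming monthly records keyed by `"YYYY-MM"` strings (which sort chronologically) and using the HES change points from the table (March 2010, April 2012, April 2014) purely as an example; `split_at_changes` is a hypothetical helper.

```python
from bisect import bisect_right

def split_at_changes(records, change_points):
    """Group (month, value) records into segments separated by change-point months.

    A record goes to the segment that starts at the latest change point on or before
    its month; records before the first change point form segment 0.
    Assumption: months are "YYYY-MM" strings, so string order equals time order.
    """
    cuts = sorted(change_points)
    segments = [[] for _ in range(len(cuts) + 1)]
    for month, value in records:
        segments[bisect_right(cuts, month)].append((month, value))
    return segments

# Hypothetical monthly values split at the HES change points.
records = [("2009-12", 1), ("2010-02", 2), ("2010-03", 3), ("2012-04", 4), ("2014-05", 5)]
segments = split_at_changes(records, ["2010-03", "2012-04", "2014-04"])
```

Splitting is the simplest option for abrupt, well-dated changes; for gradual drift, the table instead points to incremental learning or time interaction effects, which keep all data in one model.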