Daniel Thayer, Arfon Rees, Jon Kennedy, Huw Collins, Dan Harris, Julian Halcox, Luca Ruschetti, Richard Noyce, Caroline Brooks.
Abstract
A key requirement for longitudinal studies using routinely collected health data is to be able to determine which individuals are present in the datasets used, and over what time period. Individuals can enter and leave the covered population of administrative datasets for a variety of reasons, including both life events and characteristics of the datasets themselves. An automated, customizable method of determining individuals' presence was developed for the primary care dataset in Swansea University's SAIL Databank. The primary care dataset covers only a portion of Wales, with 76% of practices participating. The start and end dates of the data vary by practice. Additionally, individuals can change practices or leave Wales. To address these issues, a two-step process was developed. First, the period for which each practice had data available was calculated by measuring changes in the rate of events recorded over time. Second, the registration records for each individual were simplified, and anomalies such as short gaps and overlaps were resolved by applying a set of rules. The result of these two analyses was a cleaned set of records indicating start and end dates of available primary care data for each individual. Analysis of GP records showed that 91.0% of events occurred within periods that the algorithm calculated as having available data, and 98.4% of those events were observed at the same practice of registration as that computed by the algorithm. A standardized method for solving this common problem has enabled faster development of studies using this dataset. Using a rigorous, tested, standardized method of verifying presence in the study population will also positively influence the quality of research.
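The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the median-based cutoff (`fraction`), and the gap tolerance (`max_gap_days`) are assumptions made for illustration, since the abstract does not state the actual rules or thresholds.

```python
from datetime import date, timedelta
from statistics import median


def practice_data_window(monthly_counts, fraction=0.5):
    """Step 1 (sketch): estimate when a practice has usable data.

    monthly_counts: list of (month_label, event_count) in time order.
    fraction: hypothetical cutoff relative to the median monthly count.
    Returns the first and last month at or above the cutoff; it does not
    check for dips in between.
    """
    cutoff = fraction * median(count for _, count in monthly_counts)
    good = [month for month, count in monthly_counts if count >= cutoff]
    return (good[0], good[-1]) if good else None


def merge_registrations(records, max_gap_days=30):
    """Step 2 (sketch): collapse overlapping or nearly adjacent periods.

    records: list of (start, end) dates for one individual.
    max_gap_days: hypothetical tolerance for "short" gaps.
    """
    merged = []
    for start, end in sorted(records):
        if merged and start <= merged[-1][1] + timedelta(days=max_gap_days + 1):
            # Overlap or short gap: extend the previous period.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Illustrative (made-up) inputs.
counts = [("2000-01", 5), ("2000-02", 8), ("2000-03", 100),
          ("2000-04", 110), ("2000-05", 105), ("2000-06", 3)]
window = practice_data_window(counts)

registrations = [
    (date(2000, 1, 1), date(2000, 6, 30)),
    (date(2000, 6, 1), date(2000, 12, 31)),   # overlaps the first record
    (date(2001, 1, 15), date(2001, 12, 31)),  # 14-day gap, merged
    (date(2003, 1, 1), date(2003, 12, 31)),   # long gap, kept separate
]
cleaned = merge_registrations(registrations)
```

Intersecting each individual's cleaned registration periods with the data window of the practice concerned would then give the periods for which primary care data are available for that person.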
Year: 2020 PMID: 32045428 PMCID: PMC7012444 DOI: 10.1371/journal.pone.0228545
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Event volume over time by GP practice.
Effects of combinations of two grouping parameters.
| group_on_sail_data | group_on_practice | Results Produced |
|---|---|---|
| 0 | 0 | Periods in which people are continuously registered with any Welsh GP. |
| 1 | 0 | Periods in which people have GP data available on SAIL. |
| 0 | 1 | Cleaned version of GP registration history. |
| 1 | 1 | Cleaned GP registration history, with flag indicating whether each record has SAIL data. |
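One way to read the table above is that each flag adds a key on which contiguous records must agree before they are merged. The Python sketch below illustrates that reading only. The record layout `(start, end, practice_id, has_sail_data)` and the function `group_periods` are hypothetical, and the actual SAIL implementation may behave differently, for example by dropping periods without data when only `group_on_sail_data` is set.

```python
from datetime import date, timedelta


def group_periods(records, group_on_sail_data, group_on_practice):
    """Merge contiguous records that agree on the selected grouping keys.

    records: time-ordered (start, end, practice_id, has_sail_data) tuples
    for one person; this layout is assumed for illustration.
    Returns a list of (start, end, key) tuples, where key is
    (practice_id or None, has_sail_data or None).
    """
    def key(rec):
        _, _, practice, has_data = rec
        return (practice if group_on_practice else None,
                has_data if group_on_sail_data else None)

    out = []
    for rec in records:
        start, end = rec[0], rec[1]
        if out and key(rec) == out[-1][2] and start <= out[-1][1] + timedelta(days=1):
            # Same grouping key and no gap: extend the previous period.
            out[-1] = (out[-1][0], max(out[-1][1], end), out[-1][2])
        else:
            out.append((start, end, key(rec)))
    return out


# Illustrative (made-up) history: practice A starts supplying data in 2001,
# then the person moves to practice B.
history = [
    (date(2000, 1, 1), date(2000, 12, 31), "A", False),
    (date(2001, 1, 1), date(2001, 12, 31), "A", True),
    (date(2002, 1, 1), date(2002, 12, 31), "B", True),
]
any_gp = group_periods(history, group_on_sail_data=0, group_on_practice=0)
with_flag = group_periods(history, group_on_sail_data=1, group_on_practice=1)
```

With both flags at 0 the three records collapse into one period of continuous registration. With both at 1 all three are kept, each carrying its practice and data flag.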
Overview of different follow-up requirements in example analysis.
| Follow-up Requirement | Description |
|---|---|
| None | Any individual with a diagnosis of AF was included in the study; no further restrictions. |
| Present at diagnosis | An individual was required to be registered with a practice submitting data to SAIL at diagnosis. |
| Present at start of year | An individual was required to be registered with a practice submitting data to SAIL at the start of the year being measured. |
| Present entire year | An individual was required to have data continuously available in SAIL for the entire year. |
| Present entire 10 years | An individual was required to have data continuously available in SAIL for the entire ten years. |
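The follow-up requirements in the table above reduce to two kinds of check against an individual's cleaned availability periods: presence on a single date, and continuous presence across a window. The following Python sketch shows both. The helper names and the example dates are hypothetical, and the periods are assumed to be already merged so that contiguous coverage appears as a single `(start, end)` pair.

```python
from datetime import date


def present_on(periods, day):
    """True if `day` falls inside any availability period."""
    return any(start <= day <= end for start, end in periods)


def present_throughout(periods, window_start, window_end):
    """True if a single period covers the whole window.

    Assumes contiguous periods have already been merged.
    """
    return any(start <= window_start and window_end <= end
               for start, end in periods)


# Illustrative (made-up) individual with data from 2005 to mid-2010.
periods = [(date(2005, 1, 1), date(2010, 6, 30))]
diagnosis = date(2006, 3, 1)

at_diagnosis = present_on(periods, diagnosis)
first_year = present_throughout(periods, diagnosis, date(2007, 2, 28))
fifth_year_start = present_on(periods, date(2010, 3, 1))
fifth_year_full = present_throughout(periods, date(2010, 3, 1), date(2011, 2, 28))
```

In this example the individual satisfies "present at start of year" for the year beginning 2010-03-01 but not "present entire year", because their data end partway through it.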
Fig 2. Monthly event recording rates: Mean vs. median.
Fig 3. Measured rate of warfarin prescription by year from AF diagnosis.
Fig 4. Measured incidence of stroke by year from AF diagnosis.