Stuart G Nicholls, Pauline Quach, Erik von Elm, Astrid Guttmann, David Moher, Irene Petersen, Henrik T Sørensen, Liam Smeeth, Sinéad M Langan, Eric I Benchimol.
Abstract
OBJECTIVE: Routinely collected health data, collected for administrative and clinical purposes, without specific a priori research questions, are increasingly used for observational, comparative effectiveness, health services research, and clinical trials. The rapid evolution and availability of routinely collected data for research has brought to light specific issues not addressed by existing reporting guidelines. The aim of the present project was to determine the priorities of stakeholders in order to guide the development of the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement.
Year: 2015 PMID: 25965407 PMCID: PMC4428635 DOI: 10.1371/journal.pone.0125620
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flow diagram of steps used to elicit stakeholder priorities for RECORD.
Fig 2. Examples of layout from (A) first survey (free-text responses) and (B) second survey (Likert-scale quantitative ranking).
Characteristics of respondents.
| Characteristic | First Survey (n = 68)ᵃ, n (%) | Second Survey (n = 56)ᵇ, n (%) |
|---|---|---|
| **Sex** | | |
| Male | 37 (54.4%) | 28 (50.0%) |
| Female | 31 (45.6%) | 28 (50.0%) |
| **Age (years)** | | |
| 18–34 | 8 (11.8%) | 2 (3.6%) |
| 35–49 | 39 (57.4%) | 36 (64.3%) |
| 50–64 | 20 (29.4%) | 16 (28.6%) |
| 65+ | 1 (1.5%) | 2 (3.6%) |
| **Country** | | |
| United Kingdom | 34 (50.0%) | 24 (42.9%) |
| United States of America | 7 (10.3%) | 5 (8.9%) |
| Canada | 7 (10.3%) | 7 (12.5%) |
| France | 4 (5.9%) | 5 (8.9%) |
| Germany | 3 (4.4%) | 3 (5.4%) |
| Italy | 3 (4.4%) | 2 (3.6%) |
| Australia | 2 (2.9%) | 3 (5.4%) |
| Switzerland | 2 (2.9%) | 2 (3.6%) |
| Sweden | 1 (1.5%) | 0 |
| Denmark | 1 (1.5%) | 0 |
| The Netherlands | 1 (1.5%) | 2 (3.6%) |
| New Zealand | 1 (1.5%) | 0 |
| Brazil | 1 (1.5%) | 1 (1.8%) |
| Finland | 1 (1.5%) | 0 |
| Uganda | 0 | 1 (1.8%) |
| Romania | 0 | 1 (1.8%) |
| **Role** | | |
| Researcher (MD) | 26 (38.2%) | 22 (39.3%) |
| Researcher (PhD, MSc., BSc., etc.) | 37 (54.4%) | 31 (55.4%) |
| Journal editor, journal administrative staff | 2 (2.9%) | 3 (5.4%) |
| Public (non-medical, non-researcher) stakeholder | 1 (1.5%) | 0 |
| Policymaker | 1 (1.5%) | 0 |
| Pharmaceutical industry representative | 1 (1.5%) | 0 |
| **Type of data used**ᶜ | | |
| N/A (not a researcher) | 3 (4.4%) | 3 (5.4%) |
| Health administrative data (government) | 35 (51.5%) | 32 (57.1%) |
| Health administrative data (insurance provider) | 14 (20.6%) | 13 (23.2%) |
| Health administrative data (other) | 17 (25.0%) | 9 (16.1%) |
| Cancer registry | 18 (26.5%) | 12 (21.4%) |
| Disease registry | 27 (39.7%) | 26 (46.4%) |
| Electronic health/medical records | 44 (64.7%) | 30 (53.6%) |
| Government statistical databases (excluding health) | 17 (25.0%) | 16 (28.6%) |
| Primary care databases ( | 33 (48.5%) | 27 (48.2%) |
a Sixty-eight out of 76 respondents provided information on demographics (an optional question).
b Fifty-six out of 71 respondents provided information on demographics (an optional question).
c Respondents were able to select more than one option.
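Each percentage in the table is a count divided by the number of respondents who answered the demographic questions (68 in the first survey, 56 in the second). As a rough sanity check, the sketch below recomputes the age-band percentages from those counts. The helper name `percent` and the dictionary layout are illustrative choices, not part of the study's methods.

```python
def percent(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Age-band counts copied from the table above: (first survey, second survey).
AGE_COUNTS = {
    "18-34": (8, 2),
    "35-49": (39, 36),
    "50-64": (20, 16),
    "65+": (1, 2),
}
N_FIRST, N_SECOND = 68, 56  # respondents who answered the demographic questions

# The counts in each survey should add up to that survey's denominator.
assert sum(first for first, _ in AGE_COUNTS.values()) == N_FIRST
assert sum(second for _, second in AGE_COUNTS.values()) == N_SECOND

for band, (first, second) in AGE_COUNTS.items():
    print(f"{band}: {percent(first, N_FIRST)}% | {percent(second, N_SECOND)}%")
```

The same check can be applied to any single-choice block of the table. It does not apply to the data-type rows, where respondents could select more than one option and the counts therefore exceed the denominator.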
Overall themes suggested by respondents in the first survey and mean ratings from the second survey.
| Overall Theme | Frequencyᵃ | Example Phrasingᵇ | Mean Ratingᶜ,ᵈ |
|---|---|---|---|
| Characteristics of the data itself- quality, data source/setting, type of database, data collection process | 66 | Data completeness; Purpose of the recorded data; Data collection process; Description of datasets | |
| Validity of diagnostic codes for outcomes of exposures | 59 | Validity; Validity (codes); Validity (procedures); Validity of outcomes; Data quality (data checks) | |
| General methods (methods, analysis, confounding) | 35 | Confounding; Rationale for research; Bias; Bias (time); Confounders (measurement of) | 4.36 |
| Disease/exposure identification algorithms (excluding validation) | 32 | Disease identification algorithm; Exposure (definition of); Algorithm specification; Case definition; Code (selection) | |
| Linkage | 31 | Data linkage; Data linkage (procedures); Data linkage (success rates); Data linkage (quality of); Matching algorithms | |
| Characteristics of the population included in the data- including geographic region, population included, sampling | 30 | Coverage of dataset; Population (definition of); Population covered; Dataset characteristics described (or reference provided); Sampling | |
| List of Codes | 23 | Code list; Codes (type of); Setting and sampling; Diagnosis codes; Codes (definition of) | 4.18 |
| Missing Data- How was missing data handled? Why? Proportion? | 15 | Missing data; Missing data (approaches to deal with); Administrative data methods used (imputation, etc.); Bias (missing data); Missing data (proportion of) | 4.43 |
| Ethical/legal/access/availability | 11 | Availability of databases being used; Access to data codebooks; Data security issues; Governance issues; Legal access to the database | 3.76 |
| Uncategorizable | 8 | Temporal relationships; Variable generation; Abstract; Code (impact of) | 3.50 |
| Identify as routine data study | 1 | Description of study as routine data study | 4.21 |
a Number of respondents providing a phrase in this category in the first-stage survey.
b List of 'example phrases' from the first stage survey with the highest frequencies.
c Mean score from the second stage survey is provided. 5-Strongly agree for importance for inclusion; 1- Strongly disagree for importance for inclusion.
d Top five themes from the second-stage survey with the highest means are bolded.
Mean rating of themes by manuscript section, as defined by the STROBE reporting checklist.
| Theme | Title | Abstract | Introduction | Methods (Study design) | Methods (Setting) | Methods (Participants) | Methods (Variables) | Methods (Data sources) | Methods (Bias) | Methods (Study Size) | Methods (Quantitative Variables) | Methods (Statistical Methods) | Results (Participants) | Results (Descriptive) | Results (Outcome Data) | Results (Main Results) | Results (Other Analyses) | Discussion (Key Results) | Discussion (Limitations) | Discussion (Interpretation) | Discussion (Generalizability) | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Specify database (name) | | - | 3.33 | | - | - | - | | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Specify that routine data were used | | | - | | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Type of routine data used | | | | | | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Geographic area, setting, population | | | | 3.89 | | | 2.93 | 3.76 | 3.53 | | - | - | | | | | - | - | 4.28 | | | - |
| Linkage | 2.95 | | | 4.13 | | 3.48 | 3.42 | | | - | - | | | | | - | | - | 4.26 | | - | - |
| Validation | 3.00 | 3.45 | 3.09 | 4.05 | 3.18 | | | | | - | - | | | | | - | | | | | | 2.84 |
| Coding/classification used | 2.48 | 3.32 | | 3.33 | | | | | - | | - | 3.41 | - | | - | - | | 4.28 | 3.42 | - | | |
| Rationale for registry/routine data collection approach | - | - | | | | | - | - | - | - | - | - | - | - | - | - | - | | | | | |
| General methods (methods, analysis, confounding) | | | | | | | | 3.56 | | | | | | | | | | | - | | | - |
| Specify objectives | - | - | | 3.71 | | - | - | - | - | - | | | - | - | - | | | | - | - | - | - |
| Methods used to get data into analyzable format ( | - | - | | 3.62 | 2.93 | 3.33 | | 3.59 | 3.48 | - | - | 3.37 | - | - | 3.37 | - | - | - | | - | - | |
| Ethics, legality, access, availability | - | - | - | 3.71 | 2.95 | 2.67 | - | 3.23 | - | - | - | - | - | - | - | - | - | - | 3.07 | - | - | |
| Characteristics of the data-quality, data source/setting, data collection | - | 3.42 | - | 3.92 | | | | | | | | | | | 3.28 | | - | | | 3.65 | | |
STROBE sections with "-" for a theme indicate that no "example phrases" related to that theme were provided under that STROBE category in the first survey. Top five themes from the second survey with the highest means are bolded.
Respondents ranked themes on a 5-point Likert scale to determine whether they agreed with inclusion in the final checklist (5-Strongly agree for importance for inclusion; 1- Strongly disagree for importance for inclusion.)
Most highly rated themes by manuscript section.
| Manuscript Section | Highest Ranking Items | Sample of ‘Example Phrases’ under Theme | Mean | Median | IQR |
|---|---|---|---|---|---|
| Title | General Methods | Indicate key findings | 3.82 | 4 | 2 |
| | Specify that routine data were used | Specify administrative/routinely collected data was used | 3.66 | 4 | 2 |
| Abstract | Type of routine data used | Specify/describe data source(s) used | 4.52 | 5 | 1 |
| | Geographic area, population setting | Describe study setting; Describe time period covered | 4.43 | 5 | 1 |
| Introduction | Specify objectives | Specify objectives | 4.50 | 5 | 1 |
| | Rationale for registry/routine data collection approach | Rationale for routine data collection approach; Justify selection of specific database(s) | 4.09 | 4 | 1.25 |
| Methods (Study design) | General methods | Indicate whether retrospective or prospective; Describe how missing data was dealt with; Specify if self-controlled case series (SCCS) or other methods were used | 4.60 | 5 | 0 |
| | Specify that routine data were used | Make reference to routinely collected data | 4.57 | 5 | 0 |
| Methods (Setting) | Geographic area, population setting | Time period for which data was retrieved and time period that database covers; Describe population covered by data source | 4.77 | 5 | 0 |
| | Type of routine data used | Purpose of the database ( | 4.16 | 5 | 1 |
| Methods (Participants) | Geographic area, population setting | Describe how the cohort was constructed; Flow diagram of how population was narrowed down | 4.57 | 5 | 1 |
| | Coding used | Algorithm to identify individuals; Code list for diagnoses | 3.98 | 4 | 2 |
| Methods (Variables) | Coding used | Share code list in supplement (when feasible); Describe diagnostic algorithms | 4.58 | 5 | 1 |
| | Validation | Validation of code list; Describe validity of the variables | 4.30 | 5 | 1 |
| Methods (Data Sources/Measurement) | Specify database | Define data source | 4.60 | 5 | 0 |
| | Linkage | Describe linkage methods and their accuracy and quality | 4.25 | 5 | 1 |
| | Characteristics of the data | Data collection methods; Database creation and maintenance | | | |
| Methods (Bias) | General methods | Describe methods to address confounding due to unmeasured variables; Discuss immortal person time bias; Discuss bias due to missing data | 4.27 | 5 | 1 |
| | Characteristics of the data | Bias due to the purpose of original data collection; Bias in source data; Changes in codes over time | 4.17 | 4.5 | 1 |
| Methods (Study Size) | General Methods | Specifying the power based on size of the database; Being aware that clinical relevance is more important than sample size and to interpret P values with caution | 4.05 | 4 | 1.25 |
| | Geographic area, population setting | Providing details on whether the complete population was used or just sample (convenience sample); Describing any reduction in sample size due to matching | 4.02 | 4 | 1 |
| Methods (Quantitative Variables) | General methods | Identifying whether case-based or person-based analyses were done; Explaining how time-varying variables were handled; Describing management of missing data | 4.24 | 5 | 1 |
| | Characteristics of the data | Completeness of the data; Were cut-offs pre-specified in the database or later by the researchers? | 4.10 | 4 | 1 |
| Methods (Statistical Methods) | General methods | Describe how clustering was dealt with; Assessment of multi-collinearity | 4.44 | 5 | 1 |
| | Specify objectives | Indicate that there was a pre-analysis statistical plan | 3.98 | 4 | 1.5 |
| Results (Participants) | Geographic area, population setting | Flow diagram to illustrate study numbers of each database used at each stage; Indicate number of missing due to missing code | 4.40 | 5 | 1 |
| | Linkage | Number of cases linked, and number of cases excluded due to non-linkage; Comment on differences between linked and non-linked individuals; Comment on linkage quality | 4.25 | 5 | 1 |
| Results (Descriptive) | Geographic area, population setting | Describe censoring (as a result of leaving the healthcare system) | 4.53 | 5 | 1 |
| | Linkage | Similar ‘example phrases’ from results (Participants) | 4.05 | 4 | 0 |
| Results (Outcome Data) | General Methods | SCCS, report number of outcome events in exposed patients | 4.26 | 5 | 1 |
| | Geographic area, population setting | Number in each outcome group | 3.93 | 4 | 2 |
| Results (Main Results) | General methods | Provide appropriate P values for sample sizes (e.g. P < 0.01 may be more appropriate); Translate effect size back into units which readers will understand | 4.44 | 5 | 1 |
| Results (Other Analyses) | General methods | Report assessment of collinearity for categorical variables; Report assessment of correlation for categorical co-variables; Report sensitivity analyses (if there is reason to suspect differential misclassification) | 4.21 | 4 | 1 |
| | Linkage | Provide any estimates of bias from linkage | 3.88 | 4 | 2 |
| Discussion (Key Results) | General Methods | Comparisons with other possible sources of data | 4.32 | 5 | 1 |
| | Rationale for Routine Data Collection Approach | Specifying importance of kind of database used | 4.25 | 4 | 1 |
| Discussion (Limitations) | Characteristics of the Data | Limitations of the data source (quality of the data source, bias in original data source); Database coverage of the population; Completeness of the database; Impact of changing practice in data collection or coding on results | 4.47 | 5 | 1 |
| | Validation | Degree of validity of the data; Accuracy of the coding used | 4.46 | 5 | 1 |
| Discussion (Interpretation) | General Methods | Coherence, comparability, and consistency to other studies; Validity of conclusions drawn (from using poor quality data) | 4.14 | 5 | 1 |
| | Linkage | Potential bias as a result of differential linkage in subgroups | 3.91 | 4 | 2 |
| Discussion (Generalizability) | Geographic area, population setting | Describe how participating practices/health care providers are representative of those in that country; Completeness of coverage (which patients/providers are missed) | 4.49 | 5 | 3 |
| | Characteristics of the Data | Generalizability with reference to other databases | 4.12 | 4 | 1 |
| Other | Ethics/Legality/Access/Availability | Address funders of the dataset; Address any data sharing issues (i.e. their availability); Address roles of the data custodians; Justify lack of open availability (if not open to other researchers); Discuss privacy protection | 3.88 | 4 | 2 |
| | General Methods | Include technical appendix | 3.60 | 4 | 1 |
Respondents ranked themes on a 5-point Likert scale to determine whether they agreed with inclusion in the final checklist (5-Strongly agree for importance for inclusion; 1- Strongly disagree for importance for inclusion.)
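The mean, median, and IQR columns in the table above summarize ratings given on the 5-point scale. The sketch below shows one way such a summary could be computed in Python. The ratings are hypothetical, not study data, and the quantile convention (`method="inclusive"`) is an assumption, since the source does not say how the IQR was calculated.

```python
from statistics import mean, median, quantiles


def summarize(ratings):
    """Summarize 5-point Likert ratings as mean, median, and interquartile range."""
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    # "inclusive" treats the data as the whole population; this is an assumed
    # convention, and other conventions can give a slightly different IQR.
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    return {
        "mean": round(mean(ratings), 2),
        "median": median(ratings),
        "iqr": q3 - q1,
    }


# Hypothetical ratings for a single theme (1 = strongly disagree, 5 = strongly agree).
example_ratings = [5, 5, 5, 4, 4, 5, 3, 5]
summary = summarize(example_ratings)
print(summary)
```

Because Likert ratings are ordinal and often skewed toward the top of the scale, the median and IQR are worth reading alongside the mean. A theme with a median of 5 and an IQR of 0 reflects near-unanimous agreement, whereas the same median with a wide IQR reflects divided opinion.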
Summary of post-meeting message board activity for draft statements.
| Statement Category | Mean Ranking | Number of Comments | Number of Views |
|---|---|---|---|
| | 7.5 | 10 | 77 |
| | 5.8 | 13 | 103 |
| | 7.5 | 14 | 85 |
| | 8.3 | 5 | 84 |
| | 9.2 | 8 | 131 |
| | 7.5 | 18 | 125 |
| | 10.0 | 7 | 85 |
| | 6.7 | 12 | 87 |
| | 8.3 | 6 | 90 |
Rankings were assigned (out of 10) by participants who did not respond with a written comment.