Elizabeth L Turner, Joanna E Dobson, Stuart J Pocock.
Abstract
BACKGROUND: Reports of observational epidemiological studies often categorise (group) continuous risk factor (exposure) variables. However, there has been little systematic assessment of how categorisation is practised or reported in the literature, and no extended guidelines for the practice have been identified. Thus, we assessed the nature of such practice in the epidemiological literature. Two months of issues (December 2007 and January 2008) of five epidemiological and five general medical journals were reviewed. All articles that examined the relationship between continuous risk factors and health outcomes were surveyed using a standard proforma, with the focus on the primary risk factor. Using the survey results, we provide illustrative examples and, combined with ideas from the broader literature and from experience, we offer guidelines for good practice.
Year: 2010 PMID: 20950423 PMCID: PMC2972292 DOI: 10.1186/1742-5573-7-9
Source DB: PubMed Journal: Epidemiol Perspect Innov ISSN: 1742-5573
Main features of the 58 eligible articles with a continuous risk factor.
| Characteristic | No. of articles (%) |
|---|---|
| Journal (total no. of articlesᵃ) | |
| American Journal of Epidemiology (48) | 31 (53%) |
| Annals of Epidemiology (23) | 9 (16%) |
| Epidemiology (16) | 6 (10%) |
| Journal of the American Medical Association (26) | 4 (7%) |
| New England Journal of Medicine (32) | 3 (5%) |
| Annals of Internal Medicine (11) | 2 (3%) |
| British Medical Journal (24) | 1 (2%) |
| International Journal of Epidemiology (16) | 1 (2%) |
| Lancet (49) | 1 (2%) |
| Journal of Clinical Epidemiology (9) | 0 |
| Sample size | |
| < 1,000 | 10 (17%) |
| 1,000-5,000 | 23 (40%) |
| 5,000-20,000 | 8 (14%) |
| 20,000-100,000 | 11 (19%) |
| > 100,000 | 6 (10%) |
| Study design | |
| Cohort | 31 (53%) |
| Cross-sectional | 17 (29%) |
| Case-control | 10 (17%) |
| Outcome type | |
| Binary | 27 (47%) |
| Time to event | 18 (31%) |
| Continuous | 9 (16%) |
| Ordered categorical | 2 (3%) |
| Unordered categorical | 2 (3%) |
| Analysis of the continuous risk factor | |
| Categorically only | 29 (50%) |
| Both continuously and categorically | 21 (36%) |
| Continuously only | 8 (14%) |
Journal issues from December 2007 to January 2008 were reviewed.
a All original research articles in each journal issue
Figure 1. Example from the survey: categorical results displayed as a figure. Relative risk (and 95% CI) for coronary events by quintiles of NO₂ (μg/m³) exposure [17]. Reference category is bottom fifth, trend line is fitted. More emphasis given to quantitative analysis. This image is reproduced with permission from Epidemiology.
Categorisation characteristics of the main continuous risk factor.
| Characteristic | No. of articles (%) |
|---|---|
| Number of forms of categorisation used | |
| One | 38 (76%) |
| Two | 12 (24%) |
| Three or more | 0 |
| Number of groupsᵃ | |
| 2 | 3 (6%) |
| 3 | 9 (18%) |
| 4 | 17 (34%) |
| 5 | 13 (26%) |
| 6 | 4 (8%) |
| 7 to 10 | 3 (6%) |
| Unknownᵇ | 1 (2%) |
| Basis for choosing category boundariesᵃ | |
| Quantiles | 17 (34%) |
| Equally spaced intervals | 9 (18%) |
| External criteria | 6 (12%) |
| Other | 17 (34%) |
| Unknownᵇ | 1 (2%) |
| 10 (20%) | |
| Presentation of categorical results | |
| Tables only | 37 (74%) |
| Figures only | 5 (10%) |
| Both Tables and Figures | 5 (10%) |
| Neither | 3 (6%) |
For the sub-set of articles where categorisation was performed (n = 50 articles).
a Of primary form of categorisation when more than one form was used
b One article stated that categorisation was used but no results were presented
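The table above lists quantiles and equally spaced intervals as two bases for choosing category boundaries. The following is a minimal sketch of how the two approaches differ on skewed data. The exposure values and the helper names (`quantile_cutpoints`, `equal_width_cutpoints`, `assign_group`) are hypothetical and are not taken from any surveyed article.

```python
import bisect
import statistics


def quantile_cutpoints(values, n_groups):
    """Cut-points that split `values` into n_groups of roughly equal size."""
    return statistics.quantiles(values, n=n_groups, method="inclusive")


def equal_width_cutpoints(values, n_groups):
    """Cut-points that split the observed range into n_groups equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_groups
    return [lo + width * k for k in range(1, n_groups)]


def assign_group(value, cutpoints):
    """0-based group index; a value equal to a cut-point goes to the upper group."""
    return bisect.bisect_right(cutpoints, value)


# Hypothetical right-skewed exposure measurements (illustration only).
exposure = [3, 4, 5, 5, 6, 7, 8, 9, 11, 14, 18, 25, 40, 75, 120, 200]

q_cuts = quantile_cutpoints(exposure, 4)
w_cuts = equal_width_cutpoints(exposure, 4)

q_counts = [0] * 4
w_counts = [0] * 4
for x in exposure:
    q_counts[assign_group(x, q_cuts)] += 1
    w_counts[assign_group(x, w_cuts)] += 1

print("quartile cut-points:", q_cuts, "group sizes:", q_counts)
print("equal-width cut-points:", w_cuts, "group sizes:", w_counts)
```

With skewed data such as these, quartiles give four equally sized groups, whereas equally spaced intervals place most observations in the lowest group and leave the upper groups sparse.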
Example of categorisation from the survey (1).
| Formaldehyde levels (ppb) | Prevalence (%) | Adjusted Odds Ratio (95% CI) | Adjusted Odds Ratio (95% CI) |
|---|---|---|---|
| < 18 | 15/298 (5.0) | 1.00 | |
| 18-27 | 15/299 (5.0) | 1.03 (0.47-2.29) | 1.00 |
| 28-46 | 17/301 (5.7) | 1.11 (0.50-2.42) | |
| ≥47 | 10/100 (10.0) | 2.36 (0.92-6.09) | 2.25 (1.01-5.01) |
| P-value for trend = 0.08 | |||
Prevalence of atopic eczema by formaldehyde levels [19].
Reason for inclusion as an example. Two alternative groupings: 4 groups (split at 30th, 60th and 90th percentile) and top 10% versus the rest. Quantitative analysis also presented but not reported in abstract. P-value for trend calculated using category medians.
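As a rough check on the "top 10% versus the rest" grouping, a crude (unadjusted) odds ratio can be computed from the prevalence counts in the table. The sketch below uses the standard Woolf (log-based) 95% confidence interval. It is an illustration only: the published estimates are adjusted, so they are not expected to match, and the function name `odds_ratio_woolf` is hypothetical.

```python
import math


def odds_ratio_woolf(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    a = cases_exposed              # cases in the exposed group
    b = n_exposed - cases_exposed  # non-cases in the exposed group
    c = cases_ref                  # cases in the reference group
    d = n_ref - cases_ref          # non-cases in the reference group
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts read from the table: >= 47 ppb versus the three lower groups pooled.
cases_rest = 15 + 15 + 17
n_rest = 298 + 299 + 301
crude_or, ci_lower, ci_upper = odds_ratio_woolf(10, 100, cases_rest, n_rest)

print(f"crude OR = {crude_or:.2f} (95% CI {ci_lower:.2f}-{ci_upper:.2f})")
```

The crude odds ratio is about 2.0 with an interval of roughly 1.0 to 4.1, in the same direction as the adjusted estimate reported in the table.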
Example of categorisation from the survey (2).
| Inflammation | No. | 34-<37 weeks: Prevalence (%) | 34-<37 weeks: Adjusted Odds Ratio | 34-<37 weeks: 95% CI | <34 weeks: Prevalence (%) | <34 weeks: Adjusted Odds Ratio | <34 weeks: 95% CI |
|---|---|---|---|---|---|---|---|
| No | 279 | 20.8 | 1.0 | | 8.6 | 1.0 | |
| Yes | 58 | 31.0 | 1.9 | 1.0, 3.7 | 15.5 | 2.0 | 0.8, 4.9 |
Inflammation (C-reactive protein≥8 μg/ml) before 21 weeks' gestation and risk of spontaneous pre-term birth by preterm birth status [20].
Reason for inclusion as an example. Dichotomisation of inflammation with no alternative grouping presented. Quantitative analysis also presented but not reported in abstract.
Example of categorisation from the survey (3).
| % optimal birth weight | Adjusted Odds Ratio | 95% CI |
|---|---|---|
| < 75 | 2.42 | 1.93,3.05 |
| 75-84 | 1.73 | 1.47,2.02 |
| 85-94 | 1.09 | 0.95,1.26 |
| 95-104 | 1 (referent) | |
| 105-114 | 0.98 | 0.83,1.15 |
| 115-124 | 0.97 | 0.76,1.24 |
| > 124 | 1.15 | 0.78,1.69 |
Risk of mild-moderate intellectual disability, by % optimal birth weight [21].
Reason for inclusion as an example. Seven groups in a large cohort, reference is middle group. Numbers in each group were not provided in the table.
Example of categorisation from the survey (4).
| | No. of cases | No. of controls | Odds Ratio | 95% CI |
|---|---|---|---|---|
| No bereavement | 2589 | 12722 | Referent | |
| Time since bereavement (yrs) | ||||
| ≤ 5 | 24 | 180 | 0.6 | 0.4, 1.0 |
| 6-10 | 18 | 116 | 0.8 | 0.5, 1.2 |
| 11-15 | 8 | 107 | 0.4 | 0.2, 0.8 |
| 16-20 | 11 | 80 | 0.7 | 0.4, 1.3 |
| ≥ 21 | 44 | 265 | 0.8 | 0.6, 1.1 |
Risk of amyotrophic lateral sclerosis (ALS) for bereaved parents by years since bereavement [23].
Reason for inclusion as an example. Example of a 'never' category, reference is 'never' category.
Estimation and statistical testing by analysis typeᵃ.
| | Analysis type: Continuous | Analysis type: Categorical | Analysis type: Both | Total (%) |
|---|---|---|---|---|
| Estimation | | | | |
| Continuous | 7 | 0 | 16 | 23 (40%) |
| By group for all groups | 0 | 4 | 6 | 10 (17%) |
| By group relative to ref group | 0 | 26 | 12 | 38 (66%) |
| Other | 1ᵇ | 1ᶜ | 3ᵈ | 5 (9%) |
| Statistical testing | | | | |
| Continuous | 8 | 0 | 19 | 27 (47%) |
| Score trend test | 0 | 11 | 1 | 12 (21%) |
| Median/mean trend | 0 | 7 | 1 | 8 (14%) |
| Pairwise | 0 | 17 | 9 | 26 (45%) |
| Global | 0 | 3 | 6 | 9 (16%) |
| Other | 0 | 0 | 1ᵉ | 1 (2%) |
For the 58 articles with a continuous risk factor.
a More than one estimate type and more than one statistical test is possible: 40 (69%) articles had one type of estimate (8 from 'continuous', 27 from 'categorical' and 5 from 'both') whilst 18 (31%) articles had two types of estimate (2 from 'categorical' and 16 from 'both'); 35 (60%) articles had one type of statistical test (8 from 'continuous', 20 from 'categorical' and 7 from 'both'); 21 (36%) articles had two types of statistical test (9 from 'categorical' and 12 from 'both') and 2 (3%) articles (from 'both') had three types of statistical test.
b A continuous analysis estimate given as difference between 90th and 10th percentile.
c Reference group is the background population overall i.e. standardised incidence.
d One article gave hazard ratios per one category increase, another article compared 1st and 4th quartiles only, the final article reported the mean by categories.
e A t-test comparing means in two outcome groups.
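The table lists a "score trend test" among the statistical tests used. One common form of such a test for a binary outcome is the Cochran-Armitage test for trend, sketched below. Whether the surveyed articles used this exact test is an assumption. The counts are the prevalence counts from the formaldehyde example, and the integer scores 1 to 4 are chosen for illustration (that article used category medians in an adjusted model, so its reported p-value is not expected to be reproduced).

```python
import math


def cochran_armitage(cases, totals, scores):
    """Cochran-Armitage trend test; returns (z, two-sided p-value)."""
    n_all = sum(totals)
    p_bar = sum(cases) / n_all  # overall proportion of cases
    # Score-weighted sum of observed minus expected cases per group.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    sum_ns = sum(n * s for s, n in zip(scores, totals))
    sum_ns2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (sum_ns2 - sum_ns ** 2 / n_all)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value


# Cases and group sizes from the formaldehyde example; scores are illustrative.
cases = [15, 15, 17, 10]
totals = [298, 299, 301, 100]
scores = [1, 2, 3, 4]

z, p_value = cochran_armitage(cases, totals, scores)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these unadjusted counts and integer scores the p-value is around 0.15. Replacing `scores` with category medians would give the "median/mean trend" variant listed in the table.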