Mitchell H Gail, Douglas G Altman, Suzanne M Cadarette, Gary Collins, Stephen Jw Evans, Peggy Sekula, Elizabeth Williamson, Mark Woodward.
Abstract
The purpose of this paper is to help readers choose an appropriate observational study design for measuring an association between an exposure and disease incidence. We discuss cohort studies, sub-samples from cohorts (case-cohort and nested case-control designs), and population-based or hospital-based case-control studies. Appropriate study design is the foundation of a scientifically valid observational study. Mistakes in design are often irremediable. Key steps are understanding the scientific aims of the study and what is required to achieve them. Some designs will not yield the information required to realise the aims. The choice of design also depends on the availability of source populations and resources. Choosing an appropriate design requires balancing the pros and cons of various designs in view of study aims and practical constraints. We compare various cohort and case-control designs to estimate the effect of an exposure on disease incidence and mention how certain design features can reduce threats to study validity. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Keywords: cardiac epidemiology; epidemiology; health informatics; public health; statistics & research methods
Year: 2019 PMID: 31822541 PMCID: PMC6924819 DOI: 10.1136/bmjopen-2019-031031
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Key points.
Figure 2. Designs for estimating an association between an exposure and disease incidence.
Numbers of incident disease cases in a cohort study of 10 000 exposed and 20 000 unexposed individuals followed for 10 years
| | Exposed | Not exposed | Total population |
| Developed disease | 100 | 50 | 150 |
| Did not develop disease | 9900 | 19 950 | 29 850 |
| Total | 10 000 | 20 000 | 30 000 |
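As a worked illustration of the quantities a full cohort can yield, the following minimal sketch computes exposure-specific absolute risks, the relative risk, the absolute risk difference and the odds ratio from the counts in the table above. The variable names are illustrative only, and the calculation ignores censoring and confounding, which a real analysis would need to address.

```python
from fractions import Fraction

# Counts from the cohort table above (10 years of follow-up).
cases_exposed, cases_unexposed = 100, 50
n_exposed, n_unexposed = 10_000, 20_000

# Exposure-specific absolute (10-year) risks.
risk_exposed = Fraction(cases_exposed, n_exposed)
risk_unexposed = Fraction(cases_unexposed, n_unexposed)

# Measures of association.
relative_risk = risk_exposed / risk_unexposed
risk_difference = risk_exposed - risk_unexposed
odds_ratio = Fraction(
    cases_exposed * (n_unexposed - cases_unexposed),
    cases_unexposed * (n_exposed - cases_exposed),
)

print(f"Risk, exposed:   {float(risk_exposed):.4f}")
print(f"Risk, unexposed: {float(risk_unexposed):.4f}")
print(f"Relative risk:   {float(relative_risk):.2f}")
print(f"Risk difference: {float(risk_difference):.4f}")
print(f"Odds ratio:      {float(odds_ratio):.2f}")
```

Because the disease is rare in both exposure groups, the odds ratio (about 4.03) is close to the relative risk (4.0).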
Cohort study designs, including subsampling from the cohort
| Design | Data needed | Quantities that can be estimated | Strengths | Weaknesses |
| Prospective cohort study | Eligibility information; baseline exposure and other covariate information; dates of follow-up and diagnosis of disease(s) | Exposure-specific absolute risks; relative risks; absolute risk differences; other | Baseline exposure and other covariate data are less subject to ‘reverse causation’ or to recall bias. Ability to obtain updated exposure values; ability to estimate absolute risks of several health outcomes | Very large samples and long-term follow-up may be needed for rare outcomes. Not feasible to obtain extensive covariate information for all members of a large cohort. |
| Case-cohort study; subcohort is a subsample of the prospective cohort | As for cohort except exposure and other covariate information only needed for cases and for the subsample | As for prospective cohort | As for cohort. Expensive laboratory tests and questionnaire processing only needed for cases and members of subcohort. Easy to estimate absolute risks of several health outcomes | Because one does not know at the outset who will develop disease, blood samples and unprocessed questionnaire data need to be collected (but not analysed) for all members of the cohort. Mild loss of precision for estimating certain parameters, compared with full cohort. |
| Nested case-control study within a cohort; controls matched to cases on time (ie, age or time since recruitment) from those at risk at that time | As for cohort except exposure and other covariate information only needed for cases and for the matched controls | As for prospective cohort | As for cohort. Expensive laboratory tests and questionnaire processing only needed for cases and matched controls | As for case-cohort. Additionally, the controls are tailored to one disease. |
| Historical cohort study | Eligibility information; baseline exposure and other covariate information; dates of follow-up and diagnosis of disease(s). This is obtained from historical records | As for prospective cohort | Baseline exposure and other covariate information typically not subject to ‘reverse causation’. Because historical data are used, one does not need to wait for disease to develop. | Records (eg, industrial administrative files) may be incomplete, making it difficult to reconstruct who was in the cohort, to obtain accurate and complete follow-up information and to obtain accurate baseline exposure and other covariate information. |
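To make the resource argument for subsampling concrete, here is a rough sketch of how many participants would need expensive laboratory tests or questionnaire processing under each design, applied to a cohort of 30 000 with 150 cases as in the earlier example. The 10% subcohort sampling fraction, the two controls per case and the function names are hypothetical choices for illustration, not values from the paper.

```python
from fractions import Fraction

def expected_processed_case_cohort(cohort_size, n_cases, sampling_fraction):
    """Expected number of people needing covariate processing in a
    case-cohort design: the random subcohort plus the cases outside it."""
    subcohort = cohort_size * sampling_fraction
    cases_in_subcohort = n_cases * sampling_fraction  # expected overlap
    return subcohort + n_cases - cases_in_subcohort

def max_processed_nested_case_control(n_cases, controls_per_case):
    """Upper bound for a nested case-control design. It is an upper bound
    because one person may be sampled as a control for several cases,
    or may be a control and later become a case."""
    return n_cases * (1 + controls_per_case)

cohort_size, n_cases = 30_000, 150   # sizes from the earlier cohort example
fraction = Fraction(1, 10)           # hypothetical 10% subcohort
controls_per_case = 2                # hypothetical matching ratio

print("Full cohort:        ", cohort_size)
print("Case-cohort:        ",
      expected_processed_case_cohort(cohort_size, n_cases, fraction))
print("Nested case-control:",
      max_processed_nested_case_control(n_cases, controls_per_case))
```

Under these assumptions, both subsampling designs process a small fraction of the cohort, although stored samples and unprocessed questionnaires are still required for everyone.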
Case-control designs that are not nested within an explicit cohort
| Design | Data needed | Quantities that can be estimated | Strengths | Weaknesses |
| Population-based incident case-control study | Eligibility information; representative samples of incident cases and controls from the source population. Retrospective information on exposure and other covariates, including possible laboratory measurements. | Relative odds of disease and relative risks of disease if controls are age-matched to cases. Only if external data on disease rates in the population are available can exposure-specific absolute risk be estimated. | Few controls needed, compared with cohort study. Time to accrue cases is short, compared with cohort study. Possible to obtain extensive information on exposure and other covariates. | Exposure and other covariates subject to recall bias and reverse causation. Low participation rates may lead to biased samples of cases or controls. Usually not possible to obtain serial exposure and other covariate measurements. Usually limited to a single health outcome. However, a single large control group may serve for several diseases in a study population. |
| Hospital-based incident case-control study | Eligibility information; data from hospital cases and hospital controls with some other disease. Retrospective information on exposure and other covariates, including possible laboratory measurements. | Relative odds or relative risks with respect to the control disease(s), not necessarily with respect to the source population. | As for population-based incident case-control study. Higher participation rates than in general population and more willingness to provide biological samples. | As for population-based incident case-control study. Also, the cases and controls may not be representative of the general population due to selection bias for a particular hospital. If the exposure is associated with the control disease, the exposure OR will be biased. |
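The following sketch illustrates why a case-control study yields relative odds but not absolute risks without external data. It treats the 10 000 exposed and 20 000 unexposed individuals from the cohort table earlier in this document as the source population, takes all 150 cases, and computes the expected exposure distribution among controls drawn at random from those who did not develop disease. Sampling controls from non-cases at the end of follow-up, the control-group sizes and the function names are all assumptions made for illustration.

```python
from fractions import Fraction

# Source population counts from the cohort table earlier in this document.
cases = {"exposed": 100, "unexposed": 50}
non_cases = {"exposed": 9_900, "unexposed": 19_950}

def expected_controls(n_controls):
    """Expected exposed/unexposed counts among randomly sampled non-cases."""
    total = sum(non_cases.values())
    return {k: Fraction(n_controls * v, total) for k, v in non_cases.items()}

def expected_case_control_or(n_controls):
    """Exposure odds in cases divided by expected exposure odds in controls."""
    controls = expected_controls(n_controls)
    case_odds = Fraction(cases["exposed"], cases["unexposed"])
    control_odds = controls["exposed"] / controls["unexposed"]
    return case_odds / control_odds

def naive_risk_exposed(n_controls):
    """Proportion of cases among exposed study participants. This is NOT an
    absolute risk: it depends on how many controls were sampled."""
    controls = expected_controls(n_controls)
    return cases["exposed"] / (cases["exposed"] + controls["exposed"])

for m in (150, 300, 1500):
    print(m, float(expected_case_control_or(m)), float(naive_risk_exposed(m)))
```

The expected odds ratio is the same (133/33, about 4.03) for every control-group size, whereas the naive "risk" among the exposed changes with the number of controls and is far from the true 10-year risk of 0.01.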