Guidelines for reporting methodological challenges and evaluating potential bias in dementia research.

Jennifer Weuve¹, Cécile Proust-Lima², Melinda C Power³, Alden L Gross⁴, Scott M Hofer⁵, Rodolphe Thiébaut⁶, Geneviève Chêne⁶, M Maria Glymour⁷, Carole Dufouil⁸.

Abstract

Clinical and population research on dementia and related neurologic conditions, including Alzheimer's disease, faces several unique methodological challenges. Progress to identify preventive and therapeutic strategies rests on valid and rigorous analytic approaches, but the research literature reflects little consensus on "best practices." We present findings from a large scientific working group on research methods for clinical and population studies of dementia, which identified five categories of methodological challenges as follows: (1) attrition/sample selection, including selective survival; (2) measurement, including uncertainty in diagnostic criteria, measurement error in neuropsychological assessments, and practice or retest effects; (3) specification of longitudinal models when participants are followed for months, years, or even decades; (4) time-varying measurements; and (5) high-dimensional data. We explain why each challenge is important in dementia research and how it could compromise the translation of research findings into effective prevention or care strategies. We advance a checklist of potential sources of bias that should be routinely addressed when reporting dementia research.

Keywords: Alzheimer disease; Big data; Brain imaging; Dementia; Epidemiologic factors; Genomics; Longitudinal studies; Neuropsychological tests; Selection bias; Statistical models; Survival bias

Year: 2015 PMID: 26397878 PMCID: PMC4655106 DOI: 10.1016/j.jalz.2015.06.1885

Source DB: PubMed Journal: Alzheimers Dement ISSN: 1552-5260 Impact factor: 21.566

1. Introduction

Despite more than two decades of research on prevention and treatment of dementia and aging-related cognitive decline, highly effective preventive and therapeutic strategies remain elusive. Many features of dementia render it especially challenging: supposedly distinct underlying pathologies lead to similar clinical manifestations, development of disease occurs insidiously over the course of years or decades, and the causes of disease and determinants of its severity are likely multifactorial. However, progress in preventing and treating dementia also rests on how dementia research is conducted: informative research requires valid and rigorous analytic approaches, and yet the research literature reflects little consensus on “best practices.” Several methodological challenges arise in studies of the determinants of dementia risk and cognitive decline. Some challenges, such as unmeasured confounding or missing data, are common in many research areas; others, such as outcome measurement error and lack of a “gold standard” outcome assessment, are more pervasive or more severe in dementia research [1-3]. Currently, researchers handle these challenges differently, making it difficult to directly compare studies and combine evidence. Although some methodological differences across studies arise because analytic methods are explicitly tailored to the study design and realities of the data at hand, other differences arise for less substantive reasons. Modifiable sources of inconsistency include the absence of consensus and definitive standards for best analytic approaches; different disciplinary traditions in epidemiology, clinical research, biostatistics, neuropsychology, psychiatry, geriatrics, and neurology; and software and technical barriers. The various analytic methods used in dementia research often address subtly distinct scientific questions, depend on different assumptions, and provide differing levels of statistical precision. 
Unfortunately, there is often insufficient attention to whether a chosen method addresses the most relevant scientific question and relies on plausible assumptions. Some common methods likely provide biased answers—i.e., answers that diverge systematically from the truth—to the most relevant scientific questions. Even if several alternative approaches might be appropriate and innovative or novel analyses used in individual studies may be valuable, it can be advantageous to report results using a shared approach [4,5]. The “inconsistent application of optimal methods” within and across studies makes it difficult to qualitatively or quantitatively summarize results across studies (meta-analyses). By contrast, a core set of shared analytic approaches would enhance opportunities to synthesize results and more conclusively address our research questions. Applying a set of standardized sensitivity analyses would help evaluate the plausible magnitude of various sources of bias or violations of assumptions. In randomized clinical trials (RCTs), for example, there are strict rules regarding intention-to-treat analyses, which are often complemented with additional approaches, such as per protocol analysis or modeling the complier average causal effect to account for noncompliance. The CONsolidated Standards of Reporting Trials (CONSORT) [6] and Strengthening the Reporting of Observational studies in Epidemiology (STROBE) guidelines [7] provide helpful indications of broad relevance in human subjects research but are too broad to address several specific methodological difficulties in dementia research. Topic-specific guidelines building on STROBE have proven useful in several domains, such as genetic association studies [8]. 
The MEthods in LOngitudinal research on DEMentia (MELODEM) initiative was formed in 2012 to address these difficulties and achieve greater consistency in the process of selecting and applying preferred analytic methods across research on dementia risk and cognitive aging. The initial MELODEM findings outline a set of methodological problems that should routinely be addressed in dementia research, summarized in the guidelines in Fig. 1. We advance this list as a working set of guidelines for transparent reporting of methods and results, and therefore as the best chance of accelerating scientific progress in identifying determinants of Alzheimer’s disease (AD) and validating biomarkers for its earlier diagnosis. The goals of MELODEM include fostering methodological innovation to address these challenges and improving understanding of tools to address each challenge. In this initial report from MELODEM, we focus on outlining major categories of bias and why they are especially relevant in dementia research. We briefly discuss in the following, with more details in the online supplement, five major challenges: (1) selection, i.e., handling selection stemming from study participation, attrition, and mortality; (2) measurement, i.e., dealing with the quality of measurements of exposure and outcomes and how imperfect measurement quality affects analysis and interpretation of results; (3) alternative timescale, i.e., specifying the timescale and the shape of trajectories in longitudinal models; (4) time-varying exposures and confounding, i.e., accounting for changes in explanatory variables; and (5) high-dimensional data, i.e., analyzing complex and multidimensional data such as neuroimaging, genomic information, or database linkages.

Fig. 1

Guidelines for reporting methodological challenges and evaluating bias in cognitive decline and dementia research.

For some topics in the checklist (Fig. 1), substantial controversy remains regarding optimal analytic approaches, especially when considering both bias and variance of the methods. In many cases, although the potential for bias is clear, it has not been established that this bias is substantial in real data. The guidelines in Fig. 1 are intended as a first step toward improved evaluation and reporting of methodological challenges in dementia research, to support a move toward field-wide consensus on best practices, and to identify the highest priority areas for methodological innovations.

2. Major methodological challenges in longitudinal research on dementia

2.1. Defining the scientific question of interest

Epidemiologic research in cognitive aging identifies correlates of cognitive performance, cognitive decline, and dementia. A critical first step in conducting effective research is often taken for granted: clearly defining the scientific question and distinguishing whether this question is causal or predictive [9]. Research on the etiology of dementia or evaluating potential preventive or therapeutic interventions seeks to address causal questions. Predictive studies, designed for estimating prevalence or identifying high-risk individuals, are important for anticipating future trends in public health burden of dementia and individual patient outcomes. The analytic concerns of predictive studies differ from those of studies intended to support causal inferences. Observational research is often critiqued for drawing unsubstantiated causal inferences, but providing evidence to support causal inference is commonly the primary goal of statistical analysis. Although conclusive inferences for many treatments may ultimately depend on evidence from RCTs, intervention trials are typically fielded only after extensive observational research. This observational evidence should be accumulated with the goal of illuminating causal structures and guiding design of (eventual) trials.

2.2. Selection into the analysis sample

Selection issues in dementia research arise from differential attrition of enrolled participants, differential survival of enrolled participants, and differential enrollment, either due to refusal to participate or differential survival up to the moment of study initiation (Table 1 and Supplementary Appendix 1). Each of these processes can bias effect estimates. Spurious associations between the putative risk factor and cognitive decline or dementia can occur when selection processes are related to cognitive status and the exposure of interest (or their determinants). The bias is not necessarily toward the null (i.e., which would tend to mask an association) and can sometimes reverse the direction of association (i.e., making harmful exposures appear protective or protective exposures appear harmful) [10]. Impaired cognition and dementia have broadly debilitating consequences, setting the stage for selection bias in longitudinal studies of dementia; risk of illness [11-15], death [16,17], and study attrition [18-20] are all heightened among persons with impaired cognition. Selection bias is likely to result if the risk factor under study is associated with attrition as well. For example, chronic disease, adverse health behaviors, socioeconomic status, and race are all associated with substantial morbidity and mortality risks [3,21-25]. Various approaches are adopted to address potential selection bias (Table 1), but the performance of these approaches has rarely been evaluated. Even the likelihood of substantial bias from “ignoring the selection process,” which is arguably the most common strategy in the applied literature, is rarely formally quantified, although specific examples suggest it may be reasonably large. In the Chicago Health and Aging Project, accounting for selective attrition increased estimated associations between smoking and cognitive decline by 56%–86% [26]. 
MELODEM guidelines are intended to provide evidence to evaluate whether selection is likely to introduce a substantial bias.

Table 1

Selection processes: problems and commonly adopted analytic approaches*

Differential attrition of enrolled participants	Differential survival of enrolled participants	Differential study enrollment or “muting”
Approaches commonly applied to multiple selection problems
Ignore	Ignore	Ignore
Sensitivity analyses for magnitude of bias under plausible set of selection processes	Sensitivity analyses for magnitude of bias under plausible set of selection processes	Sensitivity analyses for magnitude of bias under plausible set of selection processes
Assess bounds based on best or worst case assumptions	Assess bounds based on best or worst case assumptions	Assess bounds based on best or worst case assumptions
Model determinants of selection to evaluate whether ignoring selection is appropriate	Model determinants of selection to evaluate whether ignoring selection is appropriate	Model determinants of selection to evaluate whether ignoring selection is appropriate
Adjust for determinants of selection	Adjust for determinants of selection	Adjust for determinants of selection
Weight on the inverse of the probability of selection	Weight on the inverse of the probability of selection	Weight on the inverse of the probability of selection
Instrumental variable methods, if an instrument for selection is available	Instrumental variable methods, if an instrument for selection is available	Instrumental variable methods, if an instrument for selection is available
Joint modeling of selection process (dropout, death, enrollment) and outcome	Joint modeling of selection process (dropout, death, enrollment) and outcome	Joint modeling of selection process (dropout, death, enrollment) and outcome
Approaches commonly applied only to specific selection problems
Multiple imputation and likelihood-based estimation including covariates related to missingness mechanism	Competing risks analysis (only when dementia is the outcome)	Principal stratification

*For many of these approaches, there is currently limited empirical or theoretical evidence comparing their performance (i.e., their ability to provide a precise estimate of the effect of interest) in dementia research.
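One approach listed in Table 1, weighting on the inverse of the probability of selection, can be illustrated with a small sketch. The cohort, scores, and retention probabilities below are hypothetical and chosen only to make the arithmetic transparent. In practice the retention probabilities would have to be estimated from a model, and the weighting removes bias only if dropout depends on measured variables.

```python
def naive_mean(observed):
    """Mean outcome among participants who remained in the study."""
    return sum(score for score, _ in observed) / len(observed)


def ipw_mean(observed):
    """Inverse-probability-weighted mean outcome.

    Each retained participant is weighted by 1 / P(retained), so that
    participants who resemble those lost to follow-up count for more.
    """
    weighted_sum = sum(score / p_retained for score, p_retained in observed)
    total_weight = sum(1.0 / p_retained for _, p_retained in observed)
    return weighted_sum / total_weight


# Hypothetical cohort (an assumption for illustration only):
# 100 participants with score 28 and retention probability 0.9 (90 observed),
# 100 participants with score 20 and retention probability 0.5 (50 observed).
# The mean in the full cohort of 200 is 24.
observed = [(28.0, 0.9)] * 90 + [(20.0, 0.5)] * 50

print(naive_mean(observed))  # above 24, because low scorers dropped out more often
print(ipw_mean(observed))    # close to 24 under the stated assumptions
```

The sketch treats the retention probabilities as known. With estimated probabilities, the weighted estimate inherits any misspecification of the selection model.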

Over the course of longitudinal follow-up, many older participants may “drop out” or refuse continued study participation. If dropout is dependent on measured parameters (a “missing at random” mechanism), several analytical approaches can provide unbiased effect estimates; however, if dropout is dependent on unknown or unmeasured parameters, there is no easy solution for bias correction (Table 1). In this situation, sensitivity analyses can illuminate the robustness of the findings [27]. Mortality is also a significant source of censoring in longitudinal studies. Many approaches considered appropriate for handling dropout-related attrition are more controversial in the context of survival, and there is no current consensus on preferred approaches [28]. Although cognitive function is “missing” for individuals who drop out of a study, it is more appropriately described as “undefined” for people who die [29]. This conceptualization suggests that people whose survival is determined by exposure should be excluded from the population for whom we try to estimate effects. For example, smoking may predict lower dementia diagnosis rates by causing earlier mortality from other causes [2]. Thus, the central challenge when addressing selective survival is to clearly define the question of scientific interest [28,29]: what parameter are we trying to estimate, for whom are we trying to estimate it, and which analysis methods correspond with this estimand (Table 1)? In dementia cohort studies, truncation of follow-up by death also introduces interval censoring, which occurs because diagnosis of dementia can only be made at periodic follow-up visits. Therefore, dementia status at death is unknown for participants who were free of dementia at their last visit before death. Interval censoring in the presence of competing risk of death can induce an underestimation of dementia incidence and alter estimated effects of exposures [23,25].
For example, the protective effect of high education on the risk of dementia was overestimated by 36% in men when not accounting for interval censoring in the French PAQUID cohort. This was likely because, while higher educational attainment predicts lower risk of dementia diagnosis, it also predicts faster death after the diagnosis [25]. Bias may also arise from how participants are selected into a study if enrollment is influenced by the exposure of interest and the outcome (or their determinants). Similar biases may occur whether differential enrollment occurs because people with particular dementia risk factors systematically refuse study participation or because these people are unlikely to survive to the age of enrollment. Consider a hypothetical study of the effect of smoking on AD, enrolling participants at age 70 (Fig. 2). If the effects of smoking and APOE-ε4 status on mortality are synergistic (i.e., more than multiplicative), then the surviving 70-year-old smokers will have a lower APOE-ε4 prevalence than 70-year-old nonsmokers. The study may conflate effects of APOE-ε4 with the effects of smoking. Pre-enrollment selective mortality is particularly likely in population-based studies of older adults: persons exposed to detrimental risk factors may have survived to the age of enrollment only by virtue of their unusually effective detoxification genotype or cognitive acumen. Challenges arising from selective enrollment are broadly recognized, although the potential for this selection to compromise both generalizability and internal validity is sometimes disregarded.

Fig. 2

Hypothetical illustration of selection processes before and after study enrollment. At age 20, smoking and APOE status are unrelated, but these risk factors synergistically affect mortality, with more than multiplicative effects on survival up to age 70 [72]. By the time of study initiation at age 70, smokers are very unlikely to be APOE ε4 carriers. Analyses that did not control for APOE ε4 would conflate APOE status and smoking and spuriously underestimate effects of smoking.
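The selection process in Fig. 2 can be worked through numerically. The sketch below uses invented survival probabilities (they are assumptions, not values from the figure or from any cohort) to show that smoking and APOE ε4 status remain unrelated among survivors when their effects on survival are exactly multiplicative, and become inversely related when the joint effect is worse than multiplicative.

```python
def carrier_prevalence_among_survivors(p_carrier, surv_carrier, surv_noncarrier):
    """APOE e4 prevalence at enrollment within one smoking group.

    p_carrier is the prevalence at age 20; the survival arguments are the
    probabilities of surviving to enrollment for carriers and noncarriers.
    """
    carriers = p_carrier * surv_carrier
    noncarriers = (1.0 - p_carrier) * surv_noncarrier
    return carriers / (carriers + noncarriers)


# All values below are hypothetical.
P_CARRIER = 0.25            # prevalence at age 20, identical in smokers and nonsmokers
SURV_NEITHER = 0.80         # nonsmoking noncarriers surviving to age 70
SURV_SMOKER_ONLY = 0.60     # smoking multiplies survival by 0.75
SURV_CARRIER_ONLY = 0.70    # APOE e4 multiplies survival by 0.875
SURV_BOTH_MULTIPLICATIVE = SURV_NEITHER * 0.75 * 0.875
SURV_BOTH_SYNERGISTIC = 0.30  # joint effect worse than multiplicative

nonsmokers = carrier_prevalence_among_survivors(
    P_CARRIER, SURV_CARRIER_ONLY, SURV_NEITHER)
smokers_multiplicative = carrier_prevalence_among_survivors(
    P_CARRIER, SURV_BOTH_MULTIPLICATIVE, SURV_SMOKER_ONLY)
smokers_synergistic = carrier_prevalence_among_survivors(
    P_CARRIER, SURV_BOTH_SYNERGISTIC, SURV_SMOKER_ONLY)

print(nonsmokers, smokers_multiplicative, smokers_synergistic)
```

Under the synergistic scenario, surviving smokers carry APOE ε4 less often than surviving nonsmokers, so an analysis that ignores APOE status would credit smokers with part of the noncarriers' lower AD risk.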

Each of these selection processes may partially “mute” effect estimates. For example, the association between smoking and AD progressively attenuates, or even becomes protective, in older samples [2]. Similar muting effects with increased participant age are apparent for multiple other risk factors [2,22,30-33]. Selection may also create spurious differences in effect estimates between subgroups. Table 1 summarizes these problems and commonly adopted analytic approaches to addressing issues of selection. Evaluating the usefulness of each approach and how each performs in specific situations in dementia research is an important methodological question. Although issues of selective attrition and selective survival affect both observational research and RCTs, selective enrollment does not compromise the internal validity that randomization confers on RCTs, although it may reduce generalizability.

2.3. Measurement validity and reliability

Measurement challenges in dementia research result from the disjuncture between disease pathophysiology and clinical and research measures, due to imperfect validity and reliability (detailed in Supplementary Appendix 2). For example, performance on neuropsychological tests does not necessarily precisely reflect biological functioning and capacity of the brain. Similar measurement challenges pertain to measures of the consequences, severity, and progression of dementia, including functional dependency, neuropsychiatric symptoms, and behavioral patterns [34]. Distinguishing between neuropathologic processes in the brain and cognitive symptom trajectories is important to elucidate specific causal pathways to disease and possible interactive effects on closely related outcomes. Validity refers to whether a measurement instrument assesses the phenomenon of interest. Gold standard measures can be used to assess the validity of alternative measurements, but there is often no clear gold standard in dementia research. Measures valid for one group of people may not be valid for another, leading to biased estimates of disparities and risk factor effects [35,36]. For example, historical inequalities in educational access for African-Americans compared with white Americans have led to systematic differences in literacy levels in older adults. These literacy differences appear to contribute to racial disparities in dementia risk [35,36]. Identifying valid dementia biomarkers is critical but many efforts risk circular reasoning, in which we use clinical diagnoses to validate biomarkers and those same biomarkers to validate the clinical diagnoses [37]. Indeed, even the phenotype of interest is often controversial, and it is likely that many common disease definitions include diverse underlying pathologies. 
Reliability is the proportion of variability in a measure explained by the construct of interest, as opposed to the proportion attributable to measurement error (random fluctuations in the measurement not reflecting changes in the underlying construct) [38,39]. Nearly all neuropsychological assessments have substantial unreliability, which reduces statistical power and introduces the potential for regression to the mean. Practice or retest effects arising from changes in familiarity with the testing process, use of strategies, or recall of test-specific content can also hamper detection of cognitive decline [40] because practice effects can be large enough to offset several years of cognitive decline in elderly adults [41]. Cognitive declines stemming from incipient dementia may thus be impossible to detect because of practice-related improvements on test performance [41]. Practice effects could lead to underestimation of the rate of cognitive decline, incidence of dementia, and, if the magnitude of practice effects differs by background characteristics, to incorrect inferences about determinants of cognitive decline [41,42]. Statistical approaches for handling practice effects remain controversial (Table 2) [43,44].
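How a practice effect can hide decline is easy to see in a toy calculation. The sketch below assumes, purely for illustration, a true decline of 0.5 points per year and a one-time practice gain of 1.5 points that applies to every assessment after the first. Real practice effects need not take this step form, which is one reason the choice of correction remains contested.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical values, not taken from any study.
TRUE_SLOPE = -0.5      # points per year
PRACTICE_GAIN = 1.5    # one-time gain on every assessment after the first
years = [0, 1, 2, 3]
observed = [25.0 + TRUE_SLOPE * t + (PRACTICE_GAIN if t > 0 else 0.0)
            for t in years]

naive_slope = ols_slope(years, observed)                  # all assessments
slope_without_first = ols_slope(years[1:], observed[1:])  # first assessment dropped

print(naive_slope, slope_without_first)
```

Here dropping the first assessment recovers the assumed slope only because the practice gain was specified as a single step. If gains continued to accrue across later visits, this correction would still underestimate decline.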

Table 2

Measurement challenges: problems and commonly adopted analytic approaches*

Validity of measurement	Reliability/random measurement error	Practice or retest effects	Unequal-interval scaling (including ceilings/floors on measures)
Approaches commonly applied to multiple measurement problems
Ignore	Ignore	Ignore	Ignore
Multivariate latent variable methods or measurement error models	Multivariate latent variable methods or measurement error models
Approaches applied only to specific measurement problems
Compare to a gold standard/criterion validity	Instrumental variable analyses	Drop the first assessment or average first two assessments	Drop observations at the ceiling/ floor or otherwise condition on the baseline score
Compare to measures of theoretically correlated variables	Use composite scores from multiple neuropsychological assessments (e.g., summed Z-scores†)	Choose tests with limited retest effects	Item response theory or factor analysis based models. Factor analyses, imposing distributional assumptions
Evaluate Differential Item Functioning (DIF) and implement statistical corrections or adjustment for source of DIF		Randomize time of first assessment	Rescale by Z-scoring†
		Indicator for first assessment	Transform the measure with a monotonic transformation intended to reduce non-interval scales (e.g. logarithm, box-cox, specifically designed normalizing transformation)
		Other models of practice (linear or non-linear increases in practice effects)	Categorize the outcome (impaired vs. not impaired)
		Mixed models identifying practice effects based on time-varying interview delays	Tobit regression models (for ceilings/floors) or quantile regressions
			Joint estimation of a normalizing transformation of the outcome and the coefficients

†Z-scoring rescales each individual’s raw score with respect to the distribution of scores for other individuals in the sample. From each individual’s raw score, the Z-score is calculated by subtracting the sample mean (usually at baseline) and dividing by the sample standard deviation (also at baseline).
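The Z-scoring described in this footnote can be written in a few lines. The sketch below standardizes follow-up scores against the baseline distribution. The scores are made up, and the use of the sample standard deviation with an n - 1 denominator is an assumption, since the footnote does not say which denominator is intended.

```python
from statistics import mean, stdev


def z_scores(raw_scores, reference_scores):
    """Standardize raw scores against a reference (e.g., baseline) sample."""
    reference_mean = mean(reference_scores)
    reference_sd = stdev(reference_scores)  # sample SD, n - 1 denominator (assumed)
    return [(score - reference_mean) / reference_sd for score in raw_scores]


# Hypothetical test scores for three participants.
baseline = [24.0, 26.0, 28.0]
followup = [23.0, 26.0, 27.0]

baseline_z = z_scores(baseline, baseline)
followup_z = z_scores(followup, baseline)  # follow-up scaled by baseline mean and SD

print(baseline_z, followup_z)
```

Scaling every wave by the baseline mean and standard deviation keeps change over time interpretable in baseline standard deviation units.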

Another measurement challenge comes from the assumption made by most common analytic methods that a 1-point difference in a test score has the same substantive meaning at high and low ends of the scale. A decline in Mini-Mental State Examination (MMSE) score from 25 to 24, however, may not be equivalent to a decline from 20 to 19. Ceilings and floors constitute extreme examples of such unequal interval scaling. Ceilings and floors attenuate effect estimates in cross-sectional analyses but may either attenuate or inflate effect estimates in longitudinal analyses [1]. An extensive simulation study showed that failing to account for unequal interval scaling of psychometric tests when studying effects of a risk factor on cognitive slope can substantially inflate type 1 errors (i.e., spurious associations) if the risk factor also predicts baseline cognitive level [45].
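The cross-sectional attenuation produced by a ceiling can be shown with a toy example. The latent scores below are invented, and the ceiling of 30 is borrowed from the MMSE maximum only for concreteness.

```python
from statistics import mean

CEILING = 30  # maximum attainable score (the MMSE maximum, used for concreteness)


def observe(latent_scores, ceiling=CEILING):
    """Observed scores: latent ability truncated at the test ceiling."""
    return [min(score, ceiling) for score in latent_scores]


# Hypothetical latent abilities in two groups, differing by 3 points on average.
latent_group_a = [29, 31, 33]
latent_group_b = [26, 28, 30]

latent_difference = mean(latent_group_a) - mean(latent_group_b)
observed_difference = mean(observe(latent_group_a)) - mean(observe(latent_group_b))

print(latent_difference, observed_difference)
```

Because more members of the higher-scoring group sit at the ceiling, the observed group difference is smaller than the latent one. In longitudinal data the direction of the distortion is less predictable, as noted above.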

2.4. Alternative timescales and specification of longitudinal models

The specification of the timescale(s) and functions of within-person change in longitudinal studies can dramatically influence results and replicability. Because of the close link between age and dementia, age constitutes a natural and appropriate timescale for studying dementia risk or related binary outcomes [46-48]. In studies of cognitive aging, this also applies when studying time-invariant exposures (such as gender or genes). However, research often addresses time-varying exposures that are measured only once during the study (e.g., nutrition, diabetes, treatment) and thus at different ages. In this situation, using the time since exposure measurement (usually enrollment) as the timescale may be more appropriate. When focusing on specific phases of cognitive aging such as the prodromal phase of dementia or terminal decline, reverse time (e.g., years before diagnosis or death) may also be informative [44,49]. However, using reverse time inherently selects participants who developed the outcome, which might cause biases in estimated longitudinal changes. This is an active area of methods development, with several approaches used in the current literature (see section 2.2). Generally, the fundamental underlying causal process presumed to be relevant should guide the choice of metrics [50] (Table 3).

Table 3

Defining the time scale for longitudinal analyses: problems and commonly adopted analytic approaches*

Divergence of within-person change and between-person age differences	Analysis of terminal decline preceding death, dementia, or other “milestone” events	Nonlinear cognitive trajectories
Approaches commonly applied to multiple time scale problems
Ignore	Ignore	Ignore
Approaches applied only to specific time scale problems
Age as the time-scale with adjustment for age at entry or time-from-entry as the time-scale adjusting for age at entry	Analysis among the participants who had the event	Polynomial trajectory (quadratic, cubic)
Use of age at assessment as the time scale, without adjustment for age at entry.	Time to event as time scale in the group with event versus time to last measure for the healthy participants matched by or adjusted for the age at the last measure among others	Trajectories with random, pre-specified, or empirically selected change-points
Other time scale of interest adjusting for a cross-sectional age (possibly other than age at entry)	Joint model of the longitudinal outcome and the time to the event of interest (death, dementia or others)	Flexible parametric (splines, fractional polynomials) or non-parametric trajectories

Most observational studies on cognition recruit participants over a wide age range so that studying cognitive change with age mixes two processes: within-person change with age (usually of main interest) and between-person age differences that are also influenced by birth cohort. Ignoring age differences at baseline when studying cognitive decline with age is appropriate only if individuals “converge” onto the same age trajectory whatever their birth cohort: i.e., if a person entering at 85 years is expected to have the same cognitive level as a person entering at 65 years and followed for 20 years. This may be unrealistic [50,51]. For example, for women in the Whitehall cohorts, between-person age effects overestimated rate of cognitive decline compared with within-person effects because of large cohort differences in educational levels [52]. The “convergence” issue can be easily disentangled by distinguishing two timescales: a longitudinal timescale (e.g., current age or time since enrollment) for within-person change and a cross-sectional timescale (e.g., age at enrollment) for between-person age differences (Table 3). Average year-to-year cognitive changes are not expected to be the same at all ages; cognitive decline may accelerate at older ages. In studies with short follow-ups, linear approximations may be adequate [53], but with longer follow-up, age-heterogeneous samples, or pathologic events, linearity rarely holds. Approaches to account for this heterogeneity include polynomial cognitive trajectories [54], biphasic trajectories with change points [55,56], or nonparametric estimation of cognitive trajectories [49]. Supplementary Appendix 3 provides further detail on the problems of timescales.
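The distinction between the two timescales can be made concrete with a small deterministic sketch. It assumes an invented cohort in which scores differ by 0.5 points per year of age at entry (mixing age and birth-cohort differences) while each person declines by only 0.2 points per year of follow-up. None of the numbers come from a real study, and actual analyses would typically use mixed models rather than the simple slopes shown here.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical design and effects.
ENTRY_AGES = [65, 70, 75, 80, 85]   # one participant per entry age
VISIT_YEARS = [0, 2, 4]             # years since enrollment
BETWEEN_PERSON_EFFECT = -0.5        # points per year of age at entry
WITHIN_PERSON_EFFECT = -0.2         # points per year of follow-up


def score(entry_age, years_since_entry):
    return (30.0 + BETWEEN_PERSON_EFFECT * (entry_age - 65)
            + WITHIN_PERSON_EFFECT * years_since_entry)


records = [(a, t, score(a, t)) for a in ENTRY_AGES for t in VISIT_YEARS]

# Current age as the only timescale, ignoring age at entry.
naive_slope = ols_slope([a + t for a, t, _ in records],
                        [y for _, _, y in records])

# Cross-sectional timescale: baseline score against age at entry.
between_slope = ols_slope(ENTRY_AGES, [score(a, 0) for a in ENTRY_AGES])

# Longitudinal timescale: average of the per-person slopes over follow-up.
within_slope = sum(
    ols_slope(VISIT_YEARS, [score(a, t) for t in VISIT_YEARS])
    for a in ENTRY_AGES
) / len(ENTRY_AGES)

print(naive_slope, between_slope, within_slope)
```

With these assumptions the single-timescale slope lands close to the between-person value and well away from the within-person decline, which is the pattern the convergence assumption would hide.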

2.5. Time-varying exposure/time-varying confounding

Pathologic brain changes are evident at least two decades before clinical dementia diagnosis. Effects of exposure on cognitive outcomes may depend on when exposure occurs, and the relevant timing likely differs for exposures influencing pathogenesis, disease progression, and/or maintenance of function. Research identifying relevant etiologic periods is essential for guiding clinical decisions and preventive interventions targeting known risk factors. For example, elevated blood pressure in midlife predicts higher dementia risk, whereas elevated blood pressure late in life does not [33,57-59]. Although the explanation for this difference is unclear, recommendations on hypertension treatment for dementia prevention must be tailored to a person’s age and existing morbidities. A “critical window” hypothesis has been suggested for hormone therapy effects on dementia, with benefits from initiation in the perimenopausal period but harms from later initiation [60]. A recent systematic review identified only a handful of studies directly addressing this question [61]. For evaluating etiologic periods, cohorts with very long follow-up (e.g., PREVENT [62] or the Framingham Heart Study [63]) are informative because they provide measures of exposure at multiple ages. Quantifying how an exposure’s effects on dementia evolve with age of exposure requires care because different methods of analyzing time-varying exposure data can yield results that vary substantially in magnitude and, more crucially, in interpretation (Table 4) [64]. Although risk factor-dementia associations that differ by age at exposure may reflect relevant etiologic windows, they could also reflect reverse causation or measurement error that differs across age-specific exposures. 
When cumulative effects of long-term exposure, rather than variation in exposure or point in time exposure effects, are hypothesized, repeated measures of exposure can be combined (e.g., averaged) to achieve more precise exposure estimates.

Table 4

Handling time-varying exposures and time-varying confounding: problems and commonly adopted analytic approaches*

Time-varying exposures	Time-varying confounding
Approaches commonly applied to both time-varying exposures and time-varying confounding
Ignore (e.g., consider variable at a single point in time)	Ignore (e.g., consider variable at a single point in time)
Marginal structural models and inverse probability weighting	Marginal structural models and inverse probability weighting
Structural nested models	Structural nested models
Approaches commonly applied to either time-varying exposures or time-varying confounding
Time-to-event models, allowing exposure to update or lag	Compare effect estimates with or without adjustment for time-varying confounders
Summaries of time-varying exposure (e.g. average, duration, age at initiation)	Longitudinal propensity score models
Compare estimates from several models using exposure status at a single point in time or moving time windows; formally test alternative lifecourse models.	Instrumental variables models

Just as exposures may vary over time, so too may confounders. Adding to the challenge are differences across studies in how and when exposures and potential confounders are measured. As with the analysis of time-varying exposure data, adequately adjusting for confounding when exposures and confounders change over time requires special care, especially when confounders also act as potential mediators [65]. Supplementary Appendix 4 provides further detail on, and examples of, the problems of time-varying exposures and confounders.
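To make the inverse probability weighting entry in Table 4 concrete, the sketch below computes stabilized weights for a binary time-varying exposure. It is a simplified illustration under stated assumptions: the function name and data layout are hypothetical, probabilities are estimated by simple stratified frequencies rather than fitted regression models, and each visit conditions only on the previous exposure and the current confounder instead of the full history.

```python
from collections import Counter


def stabilized_weights(records):
    """Stabilized inverse probability of exposure weights (illustrative sketch).

    records: one list per person of (exposure, confounder) tuples, one tuple
    per visit, in visit order. For each visit t the weight factor is
        P(A_t | A_{t-1}) / P(A_t | A_{t-1}, L_t),
    and a person's weight is the product of the factors over visits.
    """
    num_counts, num_totals = Counter(), Counter()
    den_counts, den_totals = Counter(), Counter()

    # First pass: tabulate exposure frequencies within each stratum.
    for visits in records:
        prev = None  # no prior exposure before the first visit
        for t, (a, l) in enumerate(visits):
            num_totals[(t, prev)] += 1
            num_counts[(t, prev, a)] += 1
            den_totals[(t, prev, l)] += 1
            den_counts[(t, prev, l, a)] += 1
            prev = a

    # Second pass: multiply the per-visit ratios for each person.
    weights = []
    for visits in records:
        w, prev = 1.0, None
        for t, (a, l) in enumerate(visits):
            p_num = num_counts[(t, prev, a)] / num_totals[(t, prev)]
            p_den = den_counts[(t, prev, l, a)] / den_totals[(t, prev, l)]
            w *= p_num / p_den
            prev = a
        weights.append(w)
    return weights


if __name__ == "__main__":
    # Made-up single-visit data in which the confounder (second element)
    # makes exposure (first element) more likely.
    data = [[(1, 1)]] * 3 + [[(0, 1)], [(1, 0)]] + [[(0, 0)]] * 3
    print(stabilized_weights(data))
```

People whose exposure was unlikely given their confounder value receive weights above 1, and those whose exposure was expected receive weights below 1. In practice the weights would then be used in a weighted outcome model, with attention to extreme weights and positivity violations.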

2.6. High-dimensional data

The proliferation of data sources and emergence of high-dimensional data could powerfully accelerate dementia research but only if harnessed effectively [66]. Administrative databases, omics data, brain imaging data (magnetic resonance imaging and positron emission tomography), and biomarker panels assessing gene expression and metabolic pathways, among many others, present both new challenges and opportunities (Supplementary Appendix 5). New technologies will provide information on numerous biomarkers, which may help us distinguish more specific dementia phenotypes, but we need strong measurement tools to take better advantage of these data. Challenges in the statistical analyses of high-dimensional data are numerous (Table 5), even when the number of observations is much larger than typically available in research cohorts. Big data do not necessarily resolve the familiar internal and external validity challenges in epidemiologic studies and may in some situations exacerbate challenges with measurement validity and selection bias. For example, dementia is known to be substantially underrepresented in many US administrative databases because of underdiagnosis, and underdiagnosis rates may differ by demographic or other background variables [67]. A provocative and unexpected finding was recently reported from a UK administrative database (UK Clinical Practice Research Datalink) with almost 2 million individuals aged ≥40 years, accruing over 45,000 incident dementia cases [68]: obese people carried a third lower risk of developing dementia than their normal-weight peers. The huge sample provided very precise effect estimates but also entailed numerous tradeoffs that may have introduced biases, such as confounder control limited to the variables recorded in the database and potential for misclassification of the outcome [69].

Table 5

Handling high-dimensional data: challenges and commonly adopted analytic approaches

Multiple comparisons/false discovery	Summarizing multiple highly correlated variables	Regression with high-dimensional data
Family-wise error correction (e.g., Bonferroni)	Theoretically motivated summaries or selected indicators based on prior knowledge (e.g., candidate gene approaches)	Preselection of the variables of interest for adjustment
False discovery rate (e.g., Benjamini-Hochberg correction)	Combination of variables (e.g., principal components analysis, partial least squares)	Regularization methods (e.g., Lasso, ridge regression, elastic net)

Recent worldwide efforts to merge data from multiple dementia cohorts are generating unique and promising databases. For example, the European Medical Information Framework-AD, a partnership of academics, pharmaceutical companies, and medical informatics specialists, focuses on the identification of preclinical biomarkers of AD. Despite huge sample sizes, results will be useful only if it is possible to derive harmonized measures of exposure, biomarkers, and outcomes. Coordinating such efforts is a major challenge [66]. The Integrated Analysis of Longitudinal Studies of Aging collaboration focuses on data harmonization and reproducibility of research from international longitudinal studies of aging and health-related change in cognition, health, and well-being [70]. High-dimensional data analysis generates multiple statistical comparisons/tests, which can be addressed by various statistical corrections (family-wise error and false discovery rate). A related issue is possible “overfitting” (i.e., when a statistical model describes random error or noise instead of the underlying relationship). In this case, dimension reduction techniques can be used either during or before supervised analyses (e.g., Lasso or partial least-squares methods). The challenges here are both to account for the high dimensionality in models (multiple testing and overfitting) and to extract meaningful information from these data to better understand dementia etiology. New data sources may allow more powerful hypothesis-driven research, i.e., evaluating prespecified social, behavioral, or clinical determinants of dementia, and also agnostic searches conducted in the absence of clearly specified hypotheses. Agnostic search approaches are challenging because of type 1 error problems, but they are an important frontier for the field. With the growing availability of biomarkers in large data sets, there may be new opportunities to evaluate interactive effects of risk factors and pathologic processes. 
For example, researchers may be able to model whether genetic or behavioral risk factors influence cognitive symptoms differently depending on the level of underlying neuropathology. Importantly, because many but not all risk factors will directly influence neuropathology, this will entail careful mediation models to decompose direct, indirect, and interactive pathways. Supplementary Appendix 5 provides further detail on the problems of high-dimensional data.
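The two multiple-comparison strategies named in Table 5 can be sketched as follows. This is an illustrative implementation with hypothetical function names and made-up p-values; it assumes the standard Bonferroni rule (reject when p ≤ α/m) and the standard Benjamini-Hochberg step-up rule (reject the k smallest p-values, where k is the largest rank with p ≤ kα/m).

```python
def bonferroni(pvalues, alpha=0.05):
    """Family-wise error control: reject when p <= alpha / number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """False discovery rate control using the step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank  # largest rank meeting its threshold so far
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values from five hypothetical biomarker tests.
    pvals = [0.001, 0.008, 0.012, 0.03, 0.5]
    print(bonferroni(pvals))
    print(benjamini_hochberg(pvals))
```

With these made-up values, the Bonferroni rule rejects fewer hypotheses than the Benjamini-Hochberg rule, reflecting the stricter error criterion it controls. Which criterion is appropriate depends on the cost of false positives in the study at hand.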

3. Conclusion

Clinical and epidemiologic research over the past two decades has witnessed a remarkable movement toward improved transparency, reproducibility, and methodological rigor, as reflected in the CONSORT [6,71], STROBE [7], and recently STARDdem [37] guidelines. Although the STROBE recommendations apply to studies of dementia, they provide quite general suggestions, with little guidance on many important issues of special salience in dementia and cognitive aging research. The MELODEM guidelines fill this gap. Embracing and expanding on the foundation of STROBE and related efforts could strengthen and accelerate dementia research. The MELODEM guidelines, which highlight a set of common methodological challenges in longitudinal research on dementia, complement the STROBE and CONSORT guidelines but focus on technical challenges specific to dementia-related research. We hope researchers adopting the MELODEM guidelines will routinely acknowledge these challenges and justify analytic decisions. We anticipate that MELODEM will provide a platform for continued discussion and innovation of methodological tools to strengthen dementia research. The guidelines indicate standards for reporting but do not make specific recommendations for how best to address analytic challenges. Ideally, consensus recommendations for “common denominator” analyses will emerge in coming years to facilitate meta-analyses and integration of evidence. Findings from common denominator analyses could routinely be included in research articles, alongside additional methodological approaches based on research team innovations or particular strengths of the data set. Observational studies suggest major differences in prevalence and incidence of dementia-related phenotypes across population groups; these differences represent opportunities to prevent dementia, if we can identify the reasons for epidemiologic patterns. 
Such research will only be effective if we can overcome the methodological challenges discussed here. The challenges are not trivial, and these difficulties have presented important barriers to progress in research on the determinants of dementia incidence and progression. The most powerful statistical solutions may require capacity building, including new software and skills development, before they will be broadly adopted. Alongside major investments in strengthening measurements via genotyping, neuroimaging, and biomarker assessments, methodological advances hold promise to accelerate progress toward successful prevention and treatment of dementia. Indeed, without strong research methods, the investments in high-quality measures will be of little use.

