Toni Saari, Anne Koivisto, Taina Hintsa, Tuomo Hänninen, Ilona Hallikainen.
Abstract
Neuropsychiatric symptoms cause a significant burden to individuals with neurocognitive disorders and their families. Insights into the clinical associations, neurobiology, and treatment of these symptoms depend on informant questionnaires, such as the commonly used Neuropsychiatric Inventory (NPI). As with any scale, the utility of the NPI relies on its psychometric properties, but the NPI faces unique challenges related to its skip-question and scoring formats. In this narrative review, we examined the psychometric properties of the NPI in a framework including properties pertinent to construct validation, and health-related outcome measurement in general. We found that aspects such as test-retest and inter-rater reliability are major strengths of the NPI in addition to its flexible and relatively quick administration. These properties are desired in clinical trials. However, the reported properties appear to cover only some of the generally examined psychometric properties, representing perhaps necessary but insufficient reliability and validity evidence for the NPI. The psychometric data seem to have significant gaps, in part because small sample sizes in the relevant studies have precluded more comprehensive analyses. Regarding construct validity, only one study has examined structural validity with the NPI subquestions. Measurement error was not assessed in the reviewed studies. For future validation, we recommend using data from all subquestions, collecting larger samples, paying specific attention to construct validity and formulating hypotheses a priori. Because the NPI is an outcome measure of interest in clinical trials, examining measurement error could be of practical importance.
Keywords: Alzheimer’s disease; Neuropsychiatric Inventory; behavioral and psychological symptoms of dementia; dementia; measurement; neuropsychiatric symptoms; reliability; validity
Year: 2022 PMID: 32925068 PMCID: PMC9108559 DOI: 10.3233/JAD-200739
Source DB: PubMed Journal: J Alzheimers Dis ISSN: 1387-2877 Impact factor: 4.160
Fig. 1. The skip-question and scoring procedure of the Neuropsychiatric Inventory. This procedure is repeated until all domains are covered.
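
The skip-question procedure in Fig. 1 can be sketched in code. This is a minimal illustration rather than the official scoring manual: it assumes the conventional NPI rating ranges (frequency 1 to 4, severity 1 to 3), a domain score equal to frequency × severity, and a score of 0 for any domain whose screening question is answered "no". The names `DomainResponse`, `domain_score` and `total_score` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# The 12 domains, using the abbreviations from this review's table footnote.
DOMAINS = ["De", "Ha", "Ag", "Dep", "An", "Eu", "Ap", "Di", "Ir", "AMB", "SD", "AED"]


@dataclass
class DomainResponse:
    screen_positive: bool            # informant's answer to the screening question
    frequency: Optional[int] = None  # assumed range 1-4, rated only after a positive screen
    severity: Optional[int] = None   # assumed range 1-3, rated only after a positive screen


def domain_score(r: DomainResponse) -> int:
    """Return frequency x severity for a screened-in domain, otherwise 0."""
    if not r.screen_positive:
        # Skip: subquestions and ratings are never asked for this domain.
        return 0
    if r.frequency not in range(1, 5) or r.severity not in range(1, 4):
        raise ValueError("frequency must be 1-4 and severity must be 1-3")
    return r.frequency * r.severity


def total_score(responses: Mapping[str, DomainResponse]) -> int:
    """Sum the domain scores; every domain must have been covered."""
    missing = [d for d in DOMAINS if d not in responses]
    if missing:
        raise ValueError(f"domains not covered: {missing}")
    return sum(domain_score(responses[d]) for d in DOMAINS)


# Hypothetical informant interview: only apathy and delusions screen positive.
responses = {d: DomainResponse(screen_positive=False) for d in DOMAINS}
responses["Ap"] = DomainResponse(True, frequency=4, severity=2)
responses["De"] = DomainResponse(True, frequency=2, severity=1)
print(total_score(responses))
```

Under these assumptions a screened-out domain contributes no subquestion data at all, which is the feature of the format that limits item-level analyses.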
COSMIN guidelines for items to be assessed in systematic reviews of PROMs [28]
| Psychometric property |
| Content validity |
| Structural validity |
| Internal consistency |
| Reliability |
| Measurement error |
| Hypotheses testing for construct validity |
| Cross-cultural validity/measurement invariance |
| Criterion validity |
| Responsiveness |
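
Of the properties listed above, measurement error is the one the abstract flags as unassessed. One common way to express it, given here as a sketch and not as a formula taken from the reviewed studies, derives the standard error of measurement (SEM) from the score standard deviation and a reliability coefficient, and the smallest detectable change (SDC) from the SEM. The symbols $SD$ and $r$ (for example, a test-retest intraclass correlation) and the numbers in the worked line are illustrative assumptions only.

```latex
\begin{align}
  \mathrm{SEM} &= SD \,\sqrt{1 - r} \\
  \mathrm{SDC}_{95} &= 1.96 \,\sqrt{2}\;\mathrm{SEM} \\[4pt]
  % Hypothetical values: SD = 10 points, r = 0.84
  \mathrm{SEM} &= 10\sqrt{1 - 0.84} = 4, \qquad
  \mathrm{SDC}_{95} = 1.96 \times \sqrt{2} \times 4 \approx 11.1
\end{align}
```

With these made-up values, a change in an individual's total score smaller than about 11 points could not be distinguished from measurement noise.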
Flake et al. [27] phases for construct validation
| Phase | Validity evidence |
| Substantive | Literature review and construct conceptualization |
| | Item development and scaling selection |
| | Content relevance and representativeness |
| Structural | Item analysis |
| | Factor analysis |
| | Reliability |
| | Measurement invariance |
| External | Convergent and discriminant |
| | Predictive or criterion |
| | Known groups differences |
Psychometric properties reported in the development, validation and translation studies of the Neuropsychiatric Inventory
| Authors, year and version | Sample | Setting | Content validity | Structural validity: item analysis | Structural validity: factor analysis or IRT | Internal consistency (Cronbach’s alpha) | Test-retest reliability | Interrater reliability | Measurement error | Convergent validity | Discriminant validity | Criterion validity | Cross-cultural validity and measurement invariance | Known groups differences |
| Cummings et al., 1994; 10-domain [ | 40 (20 AD, 9 VaD, 11 other dementia) for convergent validity; 45 for interrater (42 AD, 1 VaD, 2 other dementia), of which 20 participated in test-retest; 20 caregivers for responsiveness | University or Veterans Affairs dementia clinic or clinical trial participants | Established via Delphi panel | N/R | N/R | Whole scale 0.88, severity 0.88, frequency 0.87 | 0.51–1 | 0.89–1 | N/R | 0.33–0.76 (BEHAVE-AD, HAM-D) | 22% of domains correlated; MMSE correlated –0.31 to –0.39 with De, Di, An and AMB; age correlated 0.38 with Ap | N/R | N/R | 4.5% FP rate; differences across MMSE strata |
| Binetti et al., 1998; 10-domain [ | 50 Italian (AD); 50 American (AD) for cross-cultural validity analyses | N/R | Back-translation | N/R | N/R | Whole scale 0.76, individual domains 0.68–0.74 | Total score 0.78 | 0.84–1 | N/R | N/R | N/R | N/R | Stratified by MMSE, Italian patients had higher total and Ap and AMB scores than American patients | Differences across MMSE strata |
| Choi et al., 2000; 12-domain [ | 92 Korean (43 AD, 32 VaD, 11 FTLD, 3 PDD, 1 PSP, 1 NPH, 1 TBI); 49 controls; 29 for test-retest reliability analysis, 7 of which were in the control group | Tertiary care | Back-translation, reviewed by an expert group, piloted in 10 patients and further modified. | N/R | N/R | Whole scale 0.85, severity 0.82, frequency 0.81 | 0.43–0.78 | N/R | N/R | N/R | N/R | N/R | N/R | 0–22.4% FP rate; differences across MMSE strata |
| Leung et al., 2001; 10-domain [ | 62 Chinese (41 AD, 16 VaD, 5 other dementia), 29 of which were in the inter-rater reliability analysis | Tertiary care | Back-translation, appraisal by psychiatric experts | N/R | N/R | Whole scale 0.84, severity 0.86, frequency 0.79 | N/R | 0.92–1 | N/R | 0.48–0.77 (BEHAVE-AD, HAM-D) | N/R | N/R | N/R | 0–11.7% FN and 2.1% FP rate; differences across MMSE strata |
| Fuh et al., 2001; 12-domain [ | 95 Taiwanese (AD); 86 of which were in the test-retest analysis | Tertiary care | Back-translation, reviewed by expert panel | N/R | PCA using domain frequency scores* | Whole scale 0.78, severity 0.78, frequency 0.74 | 10-domain scale frequency 0.88, severity 0.84, 12-domain scale frequency 0.85, severity 0.82, domains 0.37–0.76 for frequency and 0.34–0.79 for severity | N/R | N/R | N/R | Ha, Di and SD correlated significantly with CDR; MMSE correlated –0.25 with AMB | N/R | N/R | N/R |
| Baiyewu et al., 2003; 12-domain [ | 40 Nigerian (39 AD, 1 nonspecific dementia), 10 of which were in test-retest and 15 in inter-rater reliability studies | Community | Back-translation, harmonization using established procedures | Distribution for frequency scores | N/R | Whole scale 0.90, severity 0.73, frequency 0.73 | Total score 0.81 | Whole scale 0.99 | N/R | N/R | MMSE correlated –0.32 to –0.47 with De, Ha, and Ag, and –0.33 with total score; ADL 0.33 to 0.5 with Dep, An, SD, and 0.32 with total score | N/R | N/R | No differences in NPI total scores stratified by CDR |
| Politis et al., 2004; 12-domain [ | 29 Greek (AD) | Tertiary care | Back-translation | N/R | N/R | Whole scale 0.76, domains 0.69–0.76 | N/R | N/R | N/R | 0.48–0.68 (BPRS) | N/R | De, Ag, Ap, AMB, Ir, SD and total score distinguished patients referred for ‘behaviors causing fear’ vs ‘behaviors causing embarrassment’ | N/R | N/R |
| Kørner et al., 2008; 12-domain [ | 72 Danish (59 AD, 8 VaD, 5 other dementia); 29 controls; 84 of the combined sample participated in the test-retest analysis and 17 in the inter-rater reliability analysis | Tertiary care | Back-translation | N/R | Loevinger coefficients 0.25 for whole scale severity, frequency and total scores* | N/R | Total score 0.88, no statistically significant differences between domains’ change scores | Whole scale 0.94 | N/R | N/R | N/R | N/R | Rasch analyses indicate measurement invariance for sex | Differences in total score across dementia severity |
| Camozzato et al., 2008; 12-domain [ | 36 Brazilian (AD), all of which participated in test-retest and inter-rater reliability studies | Tertiary care | Back-translation, adaptation to ensure cultural and educational comprehension | Distribution for frequency scores | N/R | Severity 0.7 | Total score 0.82, 0.86 severity, 0.82 frequency; domains 0.4–0.97 | Total score 0.98, severity 0.96; domains 0.12–0.91 | N/R | N/R | N/R | N/R | N/R | N/R |
| Gallo et al., 2009; 12-domain NPI-A [ | 124 (62 AD, 43 VaD, 19 mixed dementia) | Outpatient memory assessment program | N/R | N/R | PCA for all items under the 12 domains, resulting in 3-component structure | All items 0.96, domains 0.57–0.91 | N/R | N/R | N/R | N/R | N/R | N/R | N/R | N/R |
| Wang et al., 2012; 12-domain [ | 219 Mainland Chinese (AD) | Tertiary care | Back-translation, appraisal by psychiatric experts | N/R | PCA using domain scores* | Whole scale 0.69 | Total score 0.96, domains 0.66–0.95 | N/R | N/R | N/R | N/R | N/R | N/R | N/R |
| Malakouti et al., 2012; 12-domain [ | 100 Iranian (diagnosis of dementia), 50 of which participated in inter-rater reliability analyses, of these, 30 were randomly selected for test-retest; the other 50 participated in convergent validity analyses; 49 controls | Convenience sample | Back-translation, appraisal by researchers, pilot study in four caregivers | N/R | N/R | Whole scale 0.8, domains 0.73–0.82 | 0.51–0.95 | 0.59–0.98 | N/R | 0.3–0.9 (PANSS, GDS-15) | MMSE correlated –0.34 to –0.56 with Ap, SD, Ag, AMB, and –0.49 with NPI total score | N/R | N/R | Ag, An, Ir and Eu elevated in controls; differences across MMSE strata |
| Davidsdottir et al., 2012; 12-domain [ | 38 Icelandic (19 AD, 19 VaD) | Tertiary care | Back-translation by a translator blinded to the original NPI, pilot study | Item-total correlations 0.25–0.69 | N/R | Whole scale 0.81, frequency 0.76, severity 0.78 | Total score 0.86, domains 0.38–0.96 | N/R | N/R | 0.18–0.9 (BEHAVE-AD, GDS-30) | N/R | N/R | N/R | Ap scores associated with disease severity |
| Ferreira et al., 2015; 12-domain [ | 166 European Portuguese (“cognitive deficits” 60%) | Nursing home | Translated, details unavailable in English | N/R | N/R | Whole scale 0.76, domains 0.71–0.77 | Total score 0.91, domains 0.3–0.98 | N/R | N/R | 0.17 Depression domain with GDS | MMSE correlated –0.17 to –0.18 with De, Di and AMB | N/R | N/R | N/R |
Combined items from Prinsen et al. [28] as well as Flake, Pek, and Hehman [27]. Values are correlations, percentages or coefficient alpha for frequency×severity scores, unless otherwise indicated. *This is not a method of structural validation in the traditional sense, as it aims to find higher-order structures rather than to test whether the existing structures perform as intended. Abbreviations for NPI domains: De, delusions; Ha, hallucinations; Ag, agitation; Dep, depression; Eu, euphoria; An, anxiety; Ap, apathy; Di, disinhibition; Ir, irritability; AMB, aberrant motor behavior; SD, sleep disturbances; AED, appetite and eating disturbances. Other abbreviations: AD, Alzheimer’s disease; ADL, activities of daily living; BEHAVE-AD, Behavioral Pathology in Alzheimer’s Disease Rating Scale; BPRS, Brief Psychiatric Rating Scale; CDR, Clinical Dementia Rating scale; FN, false negative; FTLD, frontotemporal lobar degeneration; FP, false positive; GDS, Geriatric Depression Scale; HAM-D, Hamilton Depression Rating Scale; IRT, item response theory; MMSE, Mini-Mental State Examination; NPI-A, Neuropsychiatric Inventory-Alternative; NPH, normal pressure hydrocephalus; N/R, not reported; PANSS, Positive and Negative Syndrome Scale; PCA, principal component analysis; PDD, Parkinson’s disease dementia; PSP, progressive supranuclear palsy; TBI, traumatic brain injury; VaD, vascular dementia
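
Most rows of the table report coefficient alpha computed over domain scores. The following is a minimal sketch of that computation using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name `cronbach_alpha` and the toy data are made up for illustration and do not come from any of the reviewed studies.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Coefficient alpha for a respondents x items matrix.

    Each row holds one respondent's k item scores (here, for example,
    frequency x severity domain scores). Sample variances (n - 1) are used.
    """
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items and equal-length rows")

    # Variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed total score across respondents.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total score has zero variance")

    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores for five informants on three domains.
toy = [
    [0, 0, 2],
    [4, 3, 6],
    [8, 6, 8],
    [0, 1, 0],
    [12, 8, 9],
]
print(round(cronbach_alpha(toy), 2))
```

Because screened-out domains enter as zeros, domain-level alpha computed this way says little about whether the subquestions within a domain cohere; that requires subquestion-level data.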
Fig. 2. Two levels of factor analyses of the Neuropsychiatric Inventory. 1) The syndrome level is where the majority of the NPI factor analytic studies have taken place. The aim of this research is to explore whether a latent variable (syndrome ABC), often called a syndrome, underlies the correlations between different domain scores (A, B, C) of the NPI. These studies implicitly assume that the domain scores are useful indicators of a latent variable. This is the level of factor analytic research reviewed by Canevelli et al. [48]. 2) The subquestion level is critical for establishing structural validity, but it has not been extensively studied, as indicated by the dashed lines. The aim of this research is to show that the subquestions (e.g., A1, A2, ... An) address a unidimensional construct (the one suggested by the screening question), justifying their use in scoring the domain. Structural validity studies can reveal, for example, that the relationships between subquestions and the latent construct are not strong enough, that the subquestions under a single domain address more than one construct, as in the study by Gallo et al., or that subquestions from a domain reflect some other construct instead of, or in addition to, the one they are intended to reflect. Exploring potential cross-loadings (e.g., C1 to domains C and A) requires asking the informant all subquestions without using the screening questions to skip domains.