Hannah McKenna1, Charlene Treanor2, Dermot O'Reilly2,3,4, Michael Donnelly2,3,4.
Abstract
PURPOSE: To review studies about the reliability and validity of self-reported alcohol consumption measures among adults, an area which needs updating to reflect current research.
Keywords: COSMIN systematic review; Psychometric properties; Self-reporting alcohol intake
Year: 2018 PMID: 29394950 PMCID: PMC5797334 DOI: 10.1186/s13011-018-0143-8
Source DB: PubMed Journal: Subst Abuse Treat Prev Policy ISSN: 1747-597X
Fig. 1 Search strategy; List of free text terms and medical subject headings searched for using the conjunctions ‘AND’ or ‘OR’ to find articles which met the inclusion criteria using the online bibliographic databases
Summary of characteristics and psychometric properties for included studies
| Author (country) | Study Population | Methods used | Studies and measures | Psychometric properties reported by studies | COSMIN quality ratings |
|---|---|---|---|---|---|
| Bonevski et al. (2010) | Group 1 was 30% male and 70% female, Group 2 37% male and 63% female, Group 3 44% male and 56% female and Group 4 41% male and 59% female. Group 1 mean age 25 years. Group 2 mean age 27 years. Group 3 mean age 25 years. Group 4 mean age 25 years. | Participants were asked to recall alcohol intake using either a computer or paper administered measure. 4–7 days later both modes of measures were administered again. | Weekly quantity-frequency measure. | Test-retest reliability-kappa coefficient range (0.90–0.96). Test-retest reliability was good. | Test-retest reliability |
| Chaikelson et al. (1994) | Random sampling was used. The sample was 100% male with a mean age of 69 years. Wives were also asked the same questions via written questionnaire to assess concordance. | Results compared to alcohol test the MAST (Michigan Alcoholism Screening Test [ | Short-term recall measure (drinking occasions in the previous month recall). | Test-retest reliability-kappa coefficients of 0.76 for total lifetime drinking, 0.84 for last reported month and 0.77 for monthly alcohol consumption, indicating good test-retest reliability. | Test-retest reliability (fair) |
| Crum et al. (2002) | Random sampling was used. The sample was 58% female and 42% male with mean age 76.2 years. Data was obtained from the 1993–1994 follow-up of the Washington County cohort of men and women 65 years and older. | Participants completed a measure of their usual alcohol consumption in two ways: (1) a quantity-frequency measure; (2) same questions asked in an interview about drinking habits. | Weekly quantity-frequency measure. | Hypothesis validity-past week recall of alcohol intake 15–20% lower than the quantity-frequency measure. Hypothesis validity was good. | Hypothesis validity (good) |
| Cutler et al. (1988) | Random sampling was used. 63.4% of the sample were male and 36.6% female. No median or mean age was reported but participants were aged 18 and older. | CAGE responses and the quantity-frequency questions taken from Health Survey Questionnaire were compared. | Weekly quantity-frequency measure. | Criterion validity-sensitivity (42.9) specificity (97.1) positive predictive value (65.8) negative predictive value (92.8) for males and sensitivity (46.6) specificity (98.6) positive predictive value (50.3) negative predictive value (98.4) for females indicating good criterion validity. | Criterion validity (excellent) |
| Dollinger et al. (2009) | The sample was composed of volunteers and was 61% female and 39% male with a mean age 22 years. | Responses to quantity-frequency measures at both time points compared. Nightly log of alcohol consumption compared to hours spent studying, socialising and religious behaviours. | Daily graduated-frequency measure. | Test-retest reliability-alcohol quantity coefficient of 0.85 and an alcohol frequency coefficient of 0.84 indicating good test-retest reliability. | Test-retest reliability (fair) |
| Greenfield et al. (2014) | Random sampling was used. Respondents were 48.1% male and 53.2% female and aged over 18 years. | Participants completed questionnaires and a follow-up survey by phone or mail. | Short-term recall measure (occasions of ≥5 drinks during specific life decades). | Test-retest reliability-kappa values for gender (0.64–0.80), age groups (0.59–0.83), ethnicity (0.70–0.73), interview mode (0.72–0.73) and childhood victimisation (0.75) (0.73) indicating moderate to good test-retest reliability. | Test-retest reliability (fair) |
| Gruenewald et al. (1995) | Random sampling was used. Respondents were 43.5% male and 56.5% female and aged 18 years or older. | Responses to graduated-frequency measures at two time points compared. | Gruenewald et al. (1995) | Test-retest reliability-coefficients for average drinking quantity | Test-retest reliability (fair) |
| Hansell et al. (2008) | Random sampling was used. Respondents were 40% male and 60% female and aged between 19 and 90 years old. | The measures examined were a dependence score, based on DSM-IIIR (Diagnostic and Statistical Manual of Mental Disorders [ | Annual quantity-frequency measure | Test-retest reliability-continuous data quantity x frequency of alcohol (0.61) between phase 1 and phase 3, and (0.55) between phase 2 and phase 3. Categorical data quantity x frequency of alcohol (0.64) between phase 1 and phase 3, and (0.59) between phase 2 and phase 3, indicating moderate test-retest reliability. | Test-retest reliability (poor) |
| Hilton (1989) | Volunteer sample. Respondents were 50% male and 50% female and had a mean age of 30 years. The volunteer participants were recruited from the San Francisco Bay Area newspaper. | Participants completed 2 retrospective recall measures-graduated-frequency and beverage-specific quantity-frequency measures post diary completion. Responses compared. | Short-term recall measure (10 week recall). | Convergent validity-correlations 0.88 for volume of drinks consumed, 0.85 for days of beer consumed, 0.89 for days of beer usually consumed, 0.80 for days of wine consumed, 0.66 for days of wine usually consumed, 0.81 for days of liquor consumed and 0.65 for days of liquor usually consumed, indicating moderate to good convergent validity. | Convergent validity (fair) |
| Koppes et al. (2002) | Random sampling was used. Respondents were 46% male and 54% female with mean age 36 years. Data was collected from 1 time point, the 2000 follow-up measurement of 171 male and 197 female participants from the Amsterdam Growth and Health Longitudinal Study. | Subjects visited study premises for 1 day. The quantity-frequency measure and dietary history interview were based on alcohol consumption over the previous month and were completed in no particular order. | Quantity-frequency measure (ranging from never drinking to daily alcohol intake). | Concurrent validity-correlation between (0.77) for men and (0.87) for women, which indicates good concurrent validity. | Criterion validity (poor) |
| LaBrie et al. (2004) | The sample was composed of volunteers and was 100% male with a mean age of 20.6 years. 211 male college students participated. | Drinking variables assessed were drinking days, average drinks, and total drinks during a 30-day period. | Short-term recall measure (monthly TimeLine follow back method). | Convergent validity-correlation coefficients between 0.52 and 0.69, showing moderate convergent validity. | Convergent validity |
| Lennox et al. (1996) | Analysis was conducted of a sample of a household survey aged 18–64 years. Gender proportions were not reported. Responses were analysed from 1 time point (the 1991 follow-up) from 8755 participants in the 1988 National Household Survey of Drug Abuse. | Used a latent variable approach. In this model covariation among multiple indicators was used as an estimate of the latent construct. | Quantity-frequency measure of alcohol consumption over past 30 days. | Structural validity-correlations at 0.36, alcohol abuse and consequences between constructs correlates at 0.28 showing poor structural validity. | Structural validity (fair) |
| McGinley et al. (2014) | A sample of 18–20 year olds were selected from respondents to the National Survey on Drug Use and Health. Gender proportions were not reported. | Quantity and frequency of alcohol consumption estimates derived from graduated-frequency measure. Estimates compared to the quantity-frequency measure. | Graduated-frequency measure of alcohol consumption over past 30 days. | Construct validity-mid values for quantity of alcohol consumed were (3.5) and (14.5) for frequency indicating poor construct validity. | Construct validity |
| Northcote and Livingston (2011) | Respondents were 47.3% male and 53.3% female and aged 18–25 years. | Participants reported number of alcoholic drinks consumed 1–2 days after drinking occasion which was compared to reported alcohol intake observed by peer-based researchers on the occasion. | Short-term recall measure (last occasion self-report of drinks consumed). | Criterion validity-significant associations with | Criterion validity (poor) |
| O’Hare et al. (1991) | Respondents were 41.6% female and 58.4% male, with a mean age of 20.6 years. | Participants were asked to complete a mailed questionnaire with both measures of alcohol consumption included. | Weekly graduated-frequency measure. | Convergent validity-correlations were significant at 0.74, with gender-specific correlations of 0.79 for men and 0.60 for women, indicating moderate to good convergent validity. | Convergent validity (good) |
| O’Hare et al. (1997) | Random sample of an undergraduate university population. Gender proportions were reported as ‘representative of sex’. Respondents had a mean age of 18.7 years. | All students completed quantity-frequency questions, MmMAST and 7 day recall. The MmMAST was used as a criterion variable. | Weekly graduated-frequency measure. | Criterion validity-association was significant at | Criterion validity (fair) |
| Parker et al. (1996) | Random sampling was used. Respondents were 39% male and 61% female and aged 18–64. Data was taken from surveys 1987–1989, 1989–1990 and 1992–1993 of the Pawtucket Health Program conducted among home dwelling adults. | Alcohol intake assessed with food frequency question as a component of the general health survey was compared against alcohol intake assessed with a graduated-frequency measure as part of a survey. | Short-term recall measure (beverage specific past 24 h recall). | Concurrent validity-kappa statistics reported between measures ranged from 0.08 ( | Criterion validity (poor) |
| Poikolainen et al. (2002) | Volunteer sample recruited from their workplace. Respondents were 83% female and 17% male with a mean age of 42 years. | Quantity-frequency and graduated-frequency obtained before and after 1-month daily recall on alcohol intake. Blood sample obtained at outset. | Annual quantity-frequency questionnaire. | Convergent validity-coefficients were 0.95 between the short-term recall measure and quantity-frequency 1, 0.95 between the short-term recall measure and quantity-frequency 2, 0.90 between the short-term recall measure and graduated-frequency 1 and 0.93 between the short-term recall measure and graduated-frequency 2. Convergent validity was reported as good. | Convergent validity (good) |
| Read et al. (2006) | College students who reported drinking different amounts of alcohol were selected for the sample to be representative of variation in drinking levels. Respondents were 52% female and 48% male with a mean age 19 years. | College students completed self-report questionnaire on demographic characteristics, drinking behaviours and drinking consequences. Drinking consequences assessed with composite measure based on Drinker Inventory of Consequences and Young Adult Alcohol Problem Screening Test developed by researchers. | Short-term recall measure (past 90 day intake). | Concurrent validity-correlation values of 0.36, | Criterion validity (excellent) |
| Rehm et al. (1999) | The sample was chosen to be representative of the wider drinking population. Respondents were 48% male and 52% female, and chosen to be representative of age ≥ 18 years. | Population samples from 4 surveys conducted for Alcohol Research Group. Surveys used computer-assisted telephone interviews with random digit dialling sampling techniques. | Quantity-frequency measure for drinking occasion. | Convergent validity-correlations moderate at both approximately 0.40. | Convergent validity (fair) |
| Reid et al. (2003) | Random sampling was used. The veteran primary care sample was 3% female 97% male and the community dwelling sample was 60% female 40% male. Mean ages were 73.1 for the veteran primary care sample and 75.9 for the community dwelling sample. | Telephone call allowed self-report of quantity-frequency measure, binge and heavy drinking questions, and the AUDIT (Alcohol Use Disorders Identification Test [ | Weekly quantity-frequency measure. | Inter-rater reliability-kappa values were 0.44 and 0.33. For population sample 2 kappa values were 0.21 and 0.46 indicating moderate to poor inter-rater reliability. | Inter-rater Reliability (fair) |
| Russell et al. (1991) | Random sampling was used. Respondents were 50.5% male and 49.5% female and aged over 18 years. Data was taken from 1 time point of the survey. | Quantity-frequency questions were asked about the amount and frequency of particular alcoholic beverages consumed via telephone interview using a random-digit-dial technique and supplemented by samples of homeless people, college students and those without telephones. | Typical annual beverage-specific Quantity-frequency measure | Criterion validity-correlations between 0.73 and 0.77 for subtypes of alcohol reported showing good criterion validity. | Criterion validity (poor) |
| Sander et al. (1997) | 175 patients with traumatic brain injury were recruited from a medical rehabilitation centre along with their relatives. Respondents were 65% male and 35% female. Mean age 39.2 years for patients and 45.9 years for relatives. | Alcohol use examined 1 year after injury through quantity-frequency measure and brief MAST test. Patients and their relatives both completed measures and concordance between reports were examined. | Annual quantity-frequency measure | Concurrent validity-concordance showed 95.4% agreement indicating good criterion validity. | Criterion validity (fair) |
| Searles et al. (1995) | The sample was chosen to be representative of male drinking population in Vermont enrolled in the Alcohol Research Centre. Respondents had a median age of 28 years (ranging from 21 to 56 years) and were 100% male. | Subjects self-reported daily alcohol intake via telephone. At 90 days subjects completed an interview using DSM criteria to assess alcohol abuse or dependence. | Short-term recall measure (Daily self-report of alcohol intake). | Predictive validity-correlations 0.86 and with alcohol related problems level as 0.69. Predictive validity is moderate between daily self-report and retrospective recall and alcohol related problems, and good between daily self-report and retrospective recall and alcohol intoxication level. | Predictive validity (poor) |
| Searles et al. (2000) | Volunteer sample of those enrolled in the Vermont Alcohol Research Centre. Respondents were 100% male and had a mean age of 36.2 years for those without alcohol problems tested at outset and 30.4 years for those with alcohol problems. | Participants recorded alcohol intake on interactive voice response system using telephones. In person interviews were conducted every 13 weeks during which they completed timeline follow back. Results were compared. | Short-term recall measure (Timeline Follow back over 366 days). | Convergent validity-correlations 0.60 at 180 days of administration, 0.57 at 270 days of administration and 0.57 at 366 days of administration, indicating moderate convergent validity. | Convergent validity (fair) |
| Tuunanen et al. (2013) | The sample included 45 year olds resident in the Finnish city of Tampere. The sample was 100% male. | Participants completed a mailed health questionnaire which invited previous week recall of alcohol intake, a quantity-frequency measure and structured quantity-frequency questions based on the AUDIT. | Quantity-frequency measure (typical drinks consumed per occasion). | Hypothesis validity-the past week recall measure reported mean alcohol consumption lower than the quantity-frequency measure, indicating good hypothesis validity. | Hypothesis validity (fair) |
| Weingardt et al. (1998) | Random sampling was used. Respondents were 58% female and 42% male and aged 18–20 years. Data was taken from 1990 and 1994 cohorts of college undergraduate students. | Peak consumption, typical weekend quantity and typical daily quantity measures used to derive binge drinking data to analyse validity. Binge drinking defined as 5–6 drinks per occasion for men and 3–4 drinks per occasion for women. | Graduated-frequency measure (peak monthly alcohol consumption). | Concurrent validity-r value 0.57 and Alcohol Dependence Scale with r value 0.54. | Criterion validity (good) |
| Whitfield et al. (2004) | Voluntary sample. Respondents were 36% male and 64% female with a mean age of 33.7 years. Data was taken from 3 waves (1980, 1989 and 1993) using adult male and female participants of the Australian Twin Registry. | Test-retest reliability was calculated as correlations between occasions and between measures. Relationships between alcohol use and lifetime DSM-IIIR alcohol dependence examined. | Annual quantity-frequency measure. | Test-retest reliability-correlations between 0.54 and 0.70, indicating moderate to good test-retest reliability. | Test-retest reliability (fair) |
Table Legend: Table summarising the characteristics, findings and COSMIN quality ratings of included studies grouped by study author, study population, methods used, studies and measures, psychometric properties reported by study authors and COSMIN quality ratings
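Several of the properties in the table above are reported as kappa coefficients (agreement between two administrations or two raters) or as sensitivity, specificity and predictive values against a criterion measure. The following is a minimal sketch of how those statistics are conventionally computed, included only to make the reported figures easier to read. The function names `cohen_kappa` and `screening_metrics` and all example values are hypothetical and are not taken from any included study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two paired sequences of categorical ratings.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of pairs on which the two administrations agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each administration's marginal counts.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def screening_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives and true negatives against the criterion measure.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical drinking categories reported at two time points.
    time_1 = ["low", "low", "high", "high", "low", "high", "low", "low"]
    time_2 = ["low", "low", "high", "low", "low", "high", "low", "high"]
    print("kappa:", round(cohen_kappa(time_1, time_2), 3))
    # Hypothetical counts for a self-report measure against a criterion.
    print(screening_metrics(tp=30, fp=10, fn=20, tn=140))
```

Kappa corrects raw agreement for agreement expected by chance, which is why it can be much lower than the percentage agreement reported by some studies.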
Fig. 2 PRISMA flow diagram [16]; Flowchart depicting the process of searching, selecting and sifting studies according to eligibility criteria. The search stages were identification, screening, eligibility and inclusion
COSMIN definitions of domains, measurement properties, and aspects of measurement properties [18]
| Domain | Measurement property | Aspect of a measurement property | Definition |
|---|---|---|---|
| Reliability | | | The degree to which the measurement is free from measurement error |
| Reliability (extended definition) | | | The extent to which scores for patients who have not changed are the same for repeated measurement under several conditions: e.g. using different sets of items from the same health-related patient-reported outcomes (HR-PRO) instrument (internal consistency); over time (test-retest); by different persons on the same occasion (inter-rater); or by the same persons (i.e. raters or responders) on different occasions (intra-rater) |
| | Internal consistency | | The degree of the interrelatedness among the items |
| | Reliability | | The proportion of the total variance in the measurements which is due to ‘true’ᵃ differences between patients |
| | Measurement error | | The systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured |
| Validity | | | The degree to which an HR-PRO instrument measures the construct(s) it purports to measure |
| | Content validity | | The degree to which the content of an HR-PRO instrument is an adequate reflection of the construct to be measured |
| | | Face validity | The degree to which (the items of) an HR-PRO instrument indeed looks as though they are an adequate reflection of the construct to be measured |
| | Construct validity | | The degree to which the scores of an HR-PRO instrument are consistent with hypotheses (for instance with regard to internal relationships, relationships to scores of other instruments, or differences between relevant groups) based on the assumption that the HR-PRO instrument validly measures the construct to be measured |
| | | Structural validity | The degree to which the scores of an HR-PRO instrument are an adequate reflection of the dimensionality of the construct to be measured |
| | | Hypotheses testing | Idem construct validity |
| | | Cross-cultural validity | The degree to which the performance of the items on a translated or culturally adapted HR-PRO instrument are an adequate reflection of the performance of the items of the original version of the HR-PRO instrument |
| | Criterion validity | | The degree to which the scores of an HR-PRO instrument are an adequate reflection of a ‘gold standard’ |
| Responsiveness | | | The ability of an HR-PRO instrument to detect change over time in the construct to be measured |
| | Responsiveness | | Idem responsiveness |
| Interpretabilityᵇ | | | The degree to which one can assign qualitative meaning (that is, clinical or commonly understood connotations) to an instrument’s quantitative scores or change in scores |
Table Legend: Table of definitions of psychometric properties measured by the COSMIN checklist, grouped by property (e.g. reliability, validity, responsiveness and interpretability)
ᵃ The word ‘true’ must be seen in the context of classical test theory (CTT), which states that any observation is composed of two components: a true score and the error associated with the observation. ‘True’ is the average score that would be obtained if the scale were given an infinite number of times. It refers only to the consistency of the score, and not to its accuracy [54]
ᵇ Interpretability is not considered a measurement property, but an important characteristic of a measurement instrument
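The classical test theory reading of ‘true’ in the first footnote, together with the definition of reliability as a proportion of total variance, can be written compactly as follows. The symbols (X for an observed score, T for the true score, E for error, and σ² for the corresponding variances) are notation assumed here for illustration and do not appear in the source.

```latex
\begin{align}
  X &= T + E, \qquad \operatorname{Cov}(T, E) = 0 \\
  \sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
  \text{reliability} &= \frac{\sigma_T^2}{\sigma_X^2}
                      = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\end{align}
```

Under these assumptions reliability approaches 1 as the error variance shrinks relative to the true-score variance, and a perfectly consistent measure can still be inaccurate, as the footnote points out.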
Summary table of the advantages and disadvantages of the quantity-frequency, graduated-frequency and short-term recall measures
| Measure type | Advantages | Disadvantages |
|---|---|---|
| Quantity-frequency measures | • Easily administered. | • May not record heavy episodic drinking occasions. |
| Graduated-frequency measures | • Categories act as prompts for respondents. | • May not record heavy episodic drinking occasions. |
| Short-term recall measures | • Can focus questions on specific drinking events. | • Hard to standardise answers to the same measure recorded in different formats. |
Table Legend: Summary of the advantages and disadvantages of the three self-reported alcohol consumption measure types; the quantity-frequency, graduated-frequency and short-term recall measures
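To make the difference between the first two measure types concrete, the sketch below shows one common way responses are converted into a weekly volume estimate. It is an illustration under stated assumptions, not the scoring rule of any instrument in the review: the function names, the band midpoints and the example responses are all hypothetical.

```python
def quantity_frequency_volume(drinks_per_occasion, occasions_per_week):
    """Weekly volume implied by a quantity-frequency response.

    Assumes the respondent gives one usual quantity and one usual frequency.
    """
    if drinks_per_occasion < 0 or occasions_per_week < 0:
        raise ValueError("quantity and frequency must be non-negative")
    return drinks_per_occasion * occasions_per_week


def graduated_frequency_volume(band_responses):
    """Weekly volume implied by a graduated-frequency response.

    band_responses is an iterable of (band_midpoint_drinks, occasions_per_week)
    pairs, one pair per quantity band the respondent reports drinking in.
    The midpoints are an assumed scoring convention.
    """
    total = 0.0
    for midpoint, occasions in band_responses:
        if midpoint < 0 or occasions < 0:
            raise ValueError("midpoints and frequencies must be non-negative")
        total += midpoint * occasions
    return total


if __name__ == "__main__":
    # Hypothetical respondent: usually 2 drinks on 3 occasions per week.
    print(quantity_frequency_volume(2, 3))
    # Same respondent answering by band: two occasions in a 1-2 drink band
    # (midpoint 1.5) and one occasion in a 7-9 drink band (midpoint 8).
    print(graduated_frequency_volume([(1.5, 2), (8.0, 1)]))
```

A single usual quantity cannot represent an occasional much heavier session, which is one way the disadvantage listed for quantity-frequency measures can arise.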