
Assessing Planning Ability Across the Adult Life Span in a Large Population-Representative Sample: Reliability Estimates and Normative Data for the Tower of London (TOL-F) Task.

Josef M Unterrainer1, Benjamin Rahm1, Christoph P Kaller2, Philipp S Wild3, Thomas Münzel4, Maria Blettner5, Karl Lackner6, Norbert Pfeiffer7, Manfred E Beutel8.   

Abstract

OBJECTIVES: The Tower of London (TOL) test has probably become the most often used task to assess planning ability in clinical and experimental settings. Since its implementation, efforts were made to provide a task version with adequate psychometric properties, but extensive normative data are not publicly available until now. The computerized TOL-Freiburg Version (TOL-F) was developed based on theory-grounded task analyses, and its psychometric adequacy has been repeatedly demonstrated in several studies but often with small and selective samples.
METHOD: In the present study, we now report reliability estimates and normative data for the TOL-F stratified for age, sex, and education from a large population-representative sample collected in the Gutenberg Health Study in Mainz, Germany (n=7703; 40-80 years).
RESULTS: The present data confirm previously reported adequate indices of reliability (>.70) of the TOL-F. We also provide normative data for the TOL-F stratified for age (5-year intervals), sex, and education (low vs. high education).
CONCLUSIONS: Together, its adequate reliability and the representative age-, sex-, and education-fair normative data render the computerized TOL-F a suitable diagnostic instrument to assess planning ability. (JINS, 2019, 25, 520-529).


Keywords:  Gutenberg Health Study (GHS); Normative data; Planning; Reliability; TOL-F; Tower of London


Year:  2019        PMID: 30696511      PMCID: PMC6669988          DOI: 10.1017/S1355617718001248

Source DB:  PubMed          Journal:  J Int Neuropsychol Soc        ISSN: 1355-6177            Impact factor:   2.892


INTRODUCTION

Since the introduction of the Tower of London (TOL) planning paradigm by Tim Shallice (1982), several studies have denoted insufficient psychometric properties of the task, especially an insufficient reliability of the original 12-item problem set (α=.25; Humes, Welsh, Retzlaff, & Cookson, 1997; see also Berg & Byrd, 2002; Kafer & Hunter, 1997; Lowe & Rabbitt, 1998). One approach to increase its reliability was to select items from a larger pool of items based on the item-total correlations and to re-evaluate the internal consistency of the resulting set of items (cf. Schnirman, Welsh, & Retzlaff, 1998; 30-item TOL with α=.79). As an alternative approach, based on comprehensive problem space analyses and empirical data, Kaller, Unterrainer, and Stahl (2012) introduced a TOL problem set of 32 items consisting of four-, five-, six-, and seven-move TOL problems. This version revealed acceptable split-half reliability (r=.72) and internal consistency (α=.69) values for TOL performance in terms of the total number of correctly solved problems. Further improvement through item selection resulted in the development of the computerized TOL-Freiburg Version (TOL-F; Kaller, Unterrainer, Kaiser, Weisbrod, & Aschenbrenner, 2012), with a substantial reduction from 32 to 24 items using four- to six-move TOL problems only. Subsequently, Kaller et al. (2016) presented psychometric data on the TOL-F from two large-scale samples revealing adequate internal consistency and split-half reliability (α=.73; ωtot=.73; glb=.76) both of which were stable across the adult life span. In summary, TOL versions are now available that provide satisfactory reliability, a sufficiently broad range of item difficulties and an adequate test economy in terms of a relatively short and clinically practicable test duration. From this overview, it seems as if test versions that comprised a larger number of problems yielded higher reliability. 
One reason for this may be that the abovementioned studies by Schnirman et al. (1998) and our own group, which yielded adequate test criteria, used an optimized selection of problems drawn from a larger item pool. Moreover, a larger number of problems may be advantageous in that it reduces the impact of basic strategy learning during early parts of testing on overall performance. For example, results by Shallice (1982) and especially Morris, Miotto, Feigenbaum, Bullock, and Polkey (1997), the latter using the Tower of Hanoi, suggested that early items, solved while the participant is still developing a strategy, may stress different processes than later problems, where strategy may be relatively stable. A stable strategy should in turn result in more stable estimates of planning performance. While it has become common to publish a detailed description of the single items used in a study (e.g., Culbertson & Zillmer, 1998; Krikorian, Bartok, & Gay, 1994), supporting tests of reproducibility and the comparability of different versions, there is a clear shortage of publicly available normative data. As a notable exception, Rognoni et al. (2013) presented normative data of Spanish young adults (age 18 to 49 years; n=179) for the 10-item Tower of London-Drexel University test (Culbertson & Zillmer, 2001). Michalec et al. (2017) provided normative standards from 298 healthy adults (age 19 to 84 years) using the original 12-item TOL. Only recently, Boccia et al. (2017) reported the standardization of a 16-item TOL (containing the original 12 items by Shallice plus 4 newly added problems; n=896 individuals, aged 15–86 years), taking into account gender, age, and years of education. This was well justified by previous findings showing that planning ability clearly depends on age, education level, and sex (D’Antuono et al., 2017). Kaller et al. (2016) further reported that the effects of sex and age increased linearly with problem difficulty.
To be specific, performance differences between the sexes and the age groups gradually increased from four-, over five-, to six-move problems. This is in line with larger performance variability in more difficult problems, whereas easier four-move problems are usually almost perfectly solved by most participants. Yet, according to the Board of Assessments of the European Federation of Psychologists’ Associations (EFPA, 2013), good to excellent sample sizes in subgroups should contain 100 to 150 respondents each. Although the overall number of approximately 900 individuals in the study of Boccia et al. is quite respectable, it is clearly insufficient for fine-grained analyses given this recommendation. In some subgroups, percentiles were not applicable due to the limited number of participants (Boccia et al., 2017). As outlined above, a psychometrically well-validated and reliable TOL version providing a fine-grained standardization with a large sample size has not been available until now, but would be highly desirable for use of the TOL by neuropsychologists in both research and clinical practice. Thus, the aim of the present study was twofold: to re-evaluate the reliability of the previously reported TOL version across the adult life span in a larger sample, and to use a sufficiently large number of participants to provide normative data that account for individual age, education level, and sex. We believe that there is common agreement in test development to adjust for sociodemographic data such as age or educational attainment (see, e.g., Wechsler Adult Intelligence Scale-Revised, or Raven tests). This renders comparisons within groups more meaningful, which may be especially important for clinical assessments, and is often regarded as increasing test fairness for comparisons across groups. As the only publicly available standardization of the TOL by Boccia et al. (2017) also adjusted for age, education level, and sex, we followed their approach while providing the recommended number of cases and a psychometrically improved TOL version. To these aims, we present psychometric and normative data on the computerized TOL-F (24 items) from a large sample (n=7703) collected in the Gutenberg Health Study (GHS) in Germany.

METHODS

Sample

The GHS was designed as a population-based, prospective, observational, single-center cohort study in the Rhine-Main region in western mid-Germany. The primary aim was to evaluate and improve cardiovascular risk stratification. The still-ongoing project examines cardiovascular diseases, cancer, eye diseases, metabolic diseases, diseases of the immune system, and mental diseases. The study aims at improving the individual risk prediction for diseases. Therefore, lifestyle, psychosocial and environmental factors, laboratory parameters, as well as the extent of the subclinical disease are investigated. In the baseline examination between April 2007 and March 2012, the GHS assessed a representative population sample of approximately 15,000 individuals from the city of Mainz and the district of Mainz-Bingen (overall population approximately 400,000 residents). The sample was drawn randomly from the governmental local registry offices in the city of Mainz and the district of Mainz-Bingen, where every inhabitant of the area is obliged to register. The sample was stratified 1:1 for sex and residence (urban and rural) and in equal strata for decades of age. Individuals between 35 and 74 years of age were enrolled, and written, informed consent was obtained from all participants. No seeding of persons with very low ability or health status was performed. The only exclusion criteria concerned insufficient knowledge of the German language to understand instructions and to give informed consent and physical or psychological inability to participate in the examinations at the study center. The norms thus are based on data of German speakers of different backgrounds. Demographic characteristics of the sample are presented in Table 1.
Table 1.

Demographic characteristics of the Gutenberg Health Study sample

| Characteristic | All (7703) | Men (51.4%) | Women (48.6%) |
| --- | --- | --- | --- |
| Age [years] | 59.5 (10.6) | 59.8 (10.6) | 59.2 (10.5) |
| Body mass index [kg/m²] | 26.9 (24.1/30.2) | 27.4 (25.1/30.3) | 26.0 (23.1/30.1) |
| SES* | | | |
| Not retired | 14.06 (4.20) | 14.59 (4.22) | 13.42 (4.09) |
| Retired | 11.41 (4.35) | 12.16 (4.44) | 10.70 (4.14) |
| Education | | | |
| Years of education | 12.90 (2.01) | 13.16 (2.07) | 12.63 (1.91) |
| Secondary schools | 35.9% | 35.9% | 36.0% |
| No vocational training | 5.4% | 2.6% | 8.3% |
| Apprenticeship | 46.9% | 38.2% | 56.0% |
| Technician/master | 15.6% | 19.0% | 12.0% |
| Intermediate secondary schools | 24.9% | 18.8% | 31.3% |
| High school | 10.2% | 13.6% | 6.6% |
| University, university of applied science | 11.3% | 15.8% | 6.5% |
| Marital status | | | |
| Living in a partnership | 86.1% | 89.0% | 82.9% |
| Married | 74.5% | 78.3% | 70.5% |
| Married, living separated | 1.8% | 1.8% | 1.9% |
| Registered partnership | 0.1% | 0.1% | 0.1% |
| Divorced | 9.0% | 7.8% | 10.2% |
| Widowed | 6.1% | 2.9% | 9.6% |
| Unmarried | 8.4% | 9.1% | 7.8% |
| Status of employment | | | |
| Unemployed | 2.2% | 2.6% | 1.8% |
| Full-time | 40.5% | 53.3% | 27.0% |
| Part-time | 12.2% | 3.4% | 21.6% |
| Small-scale employment | 3.5% | 2.2% | 4.9% |
| Retired | 42.4% | 39.5% | 45.5% |
| Income [€, after tax] | | | |
| Not retired | 2125 (1375/3375) | 2875 (1979/3875) | 1625 (875/2125) |
| Retired | 1375 (875/2125) | 1875 (1375/2875) | 875 (450/1625) |
| Household income [€, after tax], not retired | | | |
| <1250 | 3.7% | 2.7% | 5.0% |
| 1250–2500 | 19.9% | 16.7% | 23.7% |
| >2500 | 76.4% | 80.6% | 71.3% |
| Household income [€, after tax], retired | | | |
| <1250 | 8.7% | 6.2% | 11.2% |
| 1250–2500 | 40.1% | 37.8% | 42.4% |
| >2500 | 51.2% | 56.0% | 46.4% |

Socioeconomic status (SES) was defined according to Lampert and Kroll’s scores of SES (Lampert & Kroll, 2006) ranging from 3 to 21 with 3 indicating the lowest and 21 the highest SES. This scoring combines three different dimensions that represent school education level and professional training, income, and professional status. Please note that all variables that comprise the participants’ income are indicated separately for not retired and retired subjects. Normally distributed variables are presented by their mean and their standard deviation (one number in brackets). Variables not following a normal distribution are shown using their median and their interquartile range (two numbers in brackets). Relative frequencies are shown in percent.

The present analyses comprise 7870 subjects who participated in the second run of the GHS and were tested between June 2012 and December 2015. Subjects’ age ranged between 40 and 80 years. The GHS was approved by local ethics authorities. Data acquisition complied with local institutional research standards for human research and was completed in accordance with the Helsinki Declaration. To assess effects of age on planning ability, the sample was divided into eight 5-year groups between 40 and 80 years of age, covering an age range from mid- to late adulthood (Table 2). Besides individual age and sex, subjects were also characterized by their highest achieved education level assessed on a 5-point scale with the following levels (Kaller et al., 2016): An educational level of 1 corresponded to 8 or less years of schooling and was typically applied to participants who completed elementary school, but did not obtain higher education (n=73). An educational level of 2 was used to classify participants who completed 9 years of schooling, but without vocational training (n=798). 
An educational level of 3 corresponded to 10 to 12 years of education and the completion of vocational training (n=3956). An educational level of 4 was used to denote the completion of high school and the qualification for university entrance (n=809). An educational level of 5 was assigned if a participant had obtained an academic degree (n=2241). Information on education level was not available for two subjects who were consequently excluded.
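Table 2.

Descriptives of the Gutenberg Health Study sample

| Sex | ED.L. | Age group (years) | N | Accuracy (mean) | Accuracy (SD) | Cancel 3 False (%) | Time out 20 min (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Male | Low | 40.00–44.99 | 121 | 15.62 | 3.5 | 4.1 | 1.7 |
| Male | Low | 45.00–49.99 | 205 | 14.77 | 3.5 | 3.9 | 1.5 |
| Male | Low | 50.00–54.99 | 274 | 14.80 | 3.2 | 5.1 | 1.1 |
| Male | Low | 55.00–59.99 | 273 | 13.95 | 3.2 | 7.7 | 1.5 |
| Male | Low | 60.00–64.99 | 313 | 13.74 | 3.4 | 13.1 | 2.9 |
| Male | Low | 65.00–69.99 | 312 | 12.56 | 3.4 | 16.7 | 3.8 |
| Male | Low | 70.00–74.99 | 357 | 11.61 | 3.4 | 24.9 | 7.3 |
| Male | Low | 75.00–79.99 | 247 | 11.04 | 3.8 | 24.3 | 11.3 |
| Male | High | 40.00–44.99 | 226 | 16.33 | 3.2 | 4.9 | 0.4 |
| Male | High | 45.00–49.99 | 249 | 16.36 | 3.3 | 4.0 | 0.4 |
| Male | High | 50.00–54.99 | 316 | 15.56 | 3.3 | 5.4 | 0.3 |
| Male | High | 55.00–59.99 | 287 | 15.27 | 3.3 | 9.1 | 0.7 |
| Male | High | 60.00–64.99 | 251 | 14.62 | 3.2 | 8.0 | 0.8 |
| Male | High | 65.00–69.99 | 216 | 13.67 | 3.6 | 19.4 | 2.3 |
| Male | High | 70.00–74.99 | 175 | 12.30 | 3.1 | 20.0 | 7.4 |
| Male | High | 75.00–79.99 | 140 | 11.35 | 3.7 | 28.6 | 14.3 |
| Female | Low | 40.00–44.99 | 171 | 14.40 | 3.2 | 4.7 | 0.0 |
| Female | Low | 45.00–49.99 | 284 | 14.13 | 3.1 | 6.7 | 0.7 |
| Female | Low | 50.00–54.99 | 291 | 13.57 | 3.4 | 9.6 | 1.7 |
| Female | Low | 55.00–59.99 | 361 | 13.48 | 3.3 | 16.1 | 1.4 |
| Female | Low | 60.00–64.99 | 398 | 12.23 | 3.3 | 16.8 | 2.8 |
| Female | Low | 65.00–69.99 | 395 | 11.77 | 3.3 | 23.3 | 4.8 |
| Female | Low | 70.00–74.99 | 413 | 10.54 | 3.4 | 36.3 | 6.5 |
| Female | Low | 75.00–79.99 | 254 | 9.69 | 3.2 | 39.0 | 14.6 |
| Female | High | 40.00–44.99 | 176 | 15.56 | 3.0 | 5.1 | 0.0 |
| Female | High | 45.00–49.99 | 235 | 15.23 | 3.0 | 6.0 | 0.9 |
| Female | High | 50.00–54.99 | 208 | 14.51 | 3.7 | 11.1 | 1.4 |
| Female | High | 55.00–59.99 | 195 | 13.62 | 3.1 | 9.7 | 1.0 |
| Female | High | 60.00–64.99 | 123 | 13.02 | 3.4 | 18.7 | 1.6 |
| Female | High | 65.00–69.99 | 104 | 12.51 | 3.3 | 26.9 | 6.7 |
| Female | High | 70.00–74.99 | 83 | 12.14 | 3.4 | 27.7 | 6.0 |
| Female | High | 75.00–79.99 | 50 | 11.00 | 3.6 | 32.0 | 12.0 |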

Note. Sample descriptives in dependence of sex, education level (ED.L.), and age. N denotes the respective subsample size. Accuracy represents the total number of correctly solved TOL-problems (at maximum 24). Test cancellations due to exceeding the item-wise time limit for solution three times in a row (Cancel 3 False) are given as percentage of the subsample. Likewise, test cancellations because of reaching the overall 20-min limit for the duration of the test session (Time out 20 min) are presented as percentage of the subsample.

To preserve sufficient numbers of subjects within cells considering the factors sex and age groups, we excluded participants with educational level 1 (0.93% of the overall sample) from further analysis and merged participants with educational level 2 and 3 to a factor labeled “low education” and participants with educational level 4 and 5 to one factor called “high education”. Data inspection revealed 73 cases (0.99%) of the sample with no usable data that presumably showed a lack of motivation or task compliance and that were hence also excluded before the analyses. At the beginning of the study, participants used a computer mouse to solve the tasks. Due to handling-problems in older subjects, the study continued with a touchscreen as response device. Thus, the first 19 cases who had used the computer mouse were excluded and the final sample consisted of n=7703 participants. An overview on the descriptive information for age, sex, and education level of the two overall samples as well as of the resulting subgroups is provided in Table 2.
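The exclusion steps described above can be checked with a short calculation. The following is a minimal sketch in Python: all counts are taken from the text, while the variable names and the order in which the exclusions are subtracted are assumptions made for illustration.

```python
# Participant flow as described in the Sample section (counts from the text;
# variable names and the order of subtraction are illustrative assumptions).
initial = 7870            # participants in the second GHS run
missing_education = 2     # no information on education level
education_level_1 = 73    # excluded to preserve sufficient cell sizes
no_usable_data = 73       # presumed lack of motivation or task compliance
mouse_users = 19          # tested with a mouse before the touchscreen switch

final_n = (initial - missing_education - education_level_1
           - no_usable_data - mouse_users)
share_level_1 = 100 * education_level_1 / initial

print(final_n)                  # 7703
print(round(share_level_1, 2))  # 0.93
```

The result matches the reported final sample of n=7703 and the reported 0.93% share of participants with educational level 1.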

Tower of London – Freiburg Version (TOL-F)

Task description

The TOL-F (Kaller, Unterrainer, Kaiser, et al., 2012) is a computerized pseudo-realistic representation of the originally wooden configuration of the Tower of London and is implemented in the Vienna Test System (VTS; https://www.schuhfried.com/test/TOL-F, last accessed 2018-04-18). In the TOL-F, individual problem items consist of a start and a goal state that are presented in the lower and upper halves of the computer screen, respectively. Subjects are instructed to transform the start state into the goal state in the minimum number of moves, which is shown to the left of the start state. Written instructions inform that only one ball may be moved at a time, that balls cannot be placed beside the rods, that only the top-most ball can be moved in case several balls are stacked on a rod, and that the rods differ in their capacities, accommodating one, two, or three balls at maximum. The computer program does not allow breaking these rules, but records any attempts to do so. Instructions further emphasize that problems have to be solved in the minimum number of moves and that participants should always plan the problem solution ahead before starting with movement execution. To transform the start state into the goal state, the TOL-F is operated by touchscreen. A ball is picked up simply by touching it with a finger. The selected ball is then encircled by a transparent whitish corona and can be moved to another rod. The respective rod is likewise selected by finger touch. Participants were not allowed to retract moves after they were made. During the instruction phase, participants’ task comprehension was checked with two two-move problems. To get used to the task and to handling the touchscreen, participants practiced with an additional set of four three-move problems. Only thereafter did the proper testing start, comprising eight four-, eight five-, and eight six-move problems presented in order of increasing minimum number of moves.
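The move rules above define a small state space in which the minimum number of moves between two configurations can be found by breadth-first search. The following Python sketch illustrates this under stated assumptions: the left-to-right order of rod capacities, the ball labels "R", "G", and "B", and the example states are hypothetical and are not TOL-F items or part of the TOL-F software.

```python
from collections import deque

# Rod capacities from the task description; their left-to-right order is assumed.
CAPACITIES = (3, 2, 1)

def legal_moves(state):
    """Yield every state reachable by moving one top-most ball to another rod."""
    for src, rod in enumerate(state):
        if not rod:
            continue  # nothing to pick up from an empty rod
        for dst in range(len(state)):
            if dst != src and len(state[dst]) < CAPACITIES[dst]:
                rods = [list(r) for r in state]
                rods[dst].append(rods[src].pop())
                yield tuple(tuple(r) for r in rods)

def min_moves(start, goal):
    """Breadth-first search for the minimum number of moves from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth
        for nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # goal not reachable from start

# Hypothetical states, each rod listed from bottom to top.
start = (("R", "G"), ("B",), ())
goal = ((), ("B", "R"), ("G",))
print(min_moves(start, goal))  # 2
```

In this example, two balls change position, so at least two moves are needed, and moving "G" to the third rod followed by "R" to the second rod reaches the goal.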
The instruction and practice phase was scheduled to take 5 min, whereas a time limit of 20 min was applied for the testing of the 24 problems. Initial pilot testing in 2012 showed that this time limit was sufficient for most participants. In a previously published report on a subsample of the present one (n=3770; Kaller et al., 2016), 95% of the participants had completed the overall task (including instructions) after 22 min. Thus, in most cases the pre-specified time was sufficient. In addition, a 1-min time limit per trial was implemented, as in the original study of Shallice (1982). To avoid unnecessary frustration (and a reduced compliance and/or motivation in subsequent tests, for instance, in a clinical setting), the TOL-F allows for an automatic cancellation of the test if the time limit of a single trial is exceeded three times in a row. In Table 2, the percentage of test cancellations due to exceeding the trial time limit three times in a row (column Cancel 3 False) is presented as a function of age, education level, and sex. The cancellation rate increased considerably from 40 to 80 years. Likewise, the percentage of participants who failed to complete the test session within 20 min clearly increased with age (Table 2, rightmost column). Statistical analyses did not reveal any biases depending on educational level, that is, the time-out rate was not increased in older participants with low compared to high education. Further details of the experimental procedure and the problem set used are described in Kaller et al. (2016). The TOL-F was the only cognitive test, and thus the only digitally provided test, during the GHS procedure. It was embedded in a series of non-cognitive medical examinations comprised in the GHS.
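The two stopping rules described above (three consecutive trial time-outs and the overall 20-min limit) can be illustrated with a small Python sketch. This is not the Vienna Test System implementation: the function name, the status labels (borrowed from the column headers of Table 2), and the assumption that a timed-out trial contributes exactly 60 s to the session clock are all hypothetical.

```python
def run_session(trial_times_s, trial_limit_s=60, session_limit_s=20 * 60,
                max_consecutive_timeouts=3):
    """Return (status, trials_administered) for a list of trial durations in seconds.

    Assumption: a trial that reaches the limit is stopped at the limit and
    counts as a time-out; how the real software measures time is not specified.
    """
    elapsed = 0.0
    consecutive = 0
    for index, duration in enumerate(trial_times_s, start=1):
        timed_out = duration >= trial_limit_s
        elapsed += min(duration, trial_limit_s)
        consecutive = consecutive + 1 if timed_out else 0
        if consecutive >= max_consecutive_timeouts:
            return "cancel_3_false", index
        if elapsed >= session_limit_s:
            return "time_out_20_min", index
    return "completed", len(trial_times_s)

print(run_session([30] * 24))               # ('completed', 24)
print(run_session([30, 60, 60, 60, 10]))    # ('cancel_3_false', 4)
print(run_session([59] * 24))               # ('time_out_20_min', 21)
```

The third call shows that a participant who stays just under the per-trial limit on every item can still run into the session limit before all 24 problems are presented.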

Dependent measures

For assessment of individual planning ability with the TOL-F, overall planning accuracy, defined as the number of problems that were correctly solved in the minimum number of moves, is regarded as the primary outcome variable of interest. The TOL-F provides three different levels of minimum moves (four-, five-, and six-move problems, eight of each), resulting in a maximum overall planning accuracy of 24 correctly solved problems.

Data Analyses

Analyses of variance

Analyses of variance (ANOVAs) on planning accuracy as dependent variable were conducted using IBM SPSS Statistics for Windows (Version 23.0.0.2) to test for main effects and interactions of the between-subjects factors Age Group, Education Level, and Sex.

Reliability estimates

In accordance with the study of Kaller et al. (2016) and based on the revised review model for the description and evaluation of psychological and educational tests (Version 4.2.6; http://www.efpa.eu/professional-development/assessment) recently suggested by the Board of Assessments of the European Federation of Psychologists’ Associations (EFPA, 2013), the following estimates of reliability are reported: lambda 2 (λ2), lambda 3 (λ3), which is equivalent to Cronbach’s alpha (α), lambda 4 (λ4), omega total (ωtot), and the greatest lower bound (glb). While all these indices seek to provide estimates of the lower bound of true test reliability, they differ with respect to their exact assumptions and their computation. Guttman’s lambda 3 reflects the mean of all split-half reliabilities, but is said to often underestimate true reliability (Revelle & Zinbarg, 2009; Sijtsma, 2009). Compared to lambda 3, lambda 2 additionally takes into account inter-item covariance. As the sum of squares of covariances is used, lambda 2 will in the vast majority of cases be higher than lambda 3 but never lower (Guttman, 1945). Lambda 4 is calculated by dividing the total pool of items into two halves in such a way that the covariance between scores on the two halves is as high as possible; it should thus represent the greatest split-half reliability that can be attained. Sijtsma (2009) recommended the glb as the best estimate of the lower limit of true reliability. Based on classical test theory, the observed inter-item covariance matrix is considered as the sum of the true covariance matrix between items and the diagonal matrix of item error variances. Estimating the glb is then pursued by finding the error matrix whose sum of diagonal elements is maximum, while both the resulting true item covariance matrix and the error covariance matrix are still valid (that is, non-negative definite) covariance matrices (Bendermacher, 2017). 
Revelle and Zinbarg (2009) favored the alternative estimate omega that represents the total reliable variance estimated by a factor model as it may often be closer to the true value than glb, and often reaches higher values. In their study, glb actually never provided the highest estimate. Only recently, Tunstall, O’Gorman, and Shum (2016) published reliability estimates on a Tower of London version. In addition to Cronbach’s alpha, they also provided lambda 4 (λ4), omega total (ωtot), and the glb. Reporting these indices here thus additionally facilitates comparisons to the present findings. All indices were computed for the overall sample as well as for the respective age subgroups using the psych package (Version 1.3.2; Revelle, 2013) for the R open-source statistical software (Version 3.4.3; R Core Team, 2013).
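To make the relation between two of these indices concrete, the sketch below computes Guttman's λ1, λ2, and λ3 (Cronbach's α) from an item score matrix using the standard formulas from Guttman (1945). It is written in plain Python for illustration only and is not the psych package code used in the study. The simulated item responses, the function names, and the simulation parameters are hypothetical. λ4, ωtot, and the glb require optimization or factor models and are omitted here.

```python
import random
from math import sqrt

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) of an n x k list of item scores."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]

def guttman_lambdas(rows):
    """Return (lambda1, lambda2, lambda3); lambda3 equals Cronbach's alpha."""
    cov = covariance_matrix(rows)
    k = len(cov)
    total_var = sum(sum(row) for row in cov)      # variance of the sum score
    item_var = sum(cov[j][j] for j in range(k))   # sum of item variances
    off_sq = sum(cov[a][b] ** 2
                 for a in range(k) for b in range(k) if a != b)
    lam1 = 1 - item_var / total_var
    lam2 = lam1 + sqrt(k / (k - 1) * off_sq) / total_var
    lam3 = k / (k - 1) * lam1
    return lam1, lam2, lam3

# Hypothetical simulated data: 300 persons, 8 dichotomous items (not study data).
random.seed(1)
rows = []
for _ in range(300):
    ability = random.gauss(0, 1)
    rows.append([1 if ability + random.gauss(0, 1.5) > 0 else 0 for _ in range(8)])

lam1, lam2, lam3 = guttman_lambdas(rows)
print(round(lam2, 3), round(lam3, 3))  # lambda 2 is never lower than lambda 3
```

Because λ2 uses the squared off-diagonal covariances, it coincides with λ3 only when all inter-item covariances are equal, which is why λ3 is the lowest estimate in the tables reported here.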

Normative data

Normative data in the tables contain rounded raw cumulative percentages sorted by the total number of correctly solved problems. No z-transformation or smoothing was applied.
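As an illustration of how such raw cumulative percentages can be derived, the following Python sketch tabulates, for each possible score, the rounded percentage of participants who solved that many problems or fewer. The "at or below" definition, the round-half-up rule, the function name, and the example scores are assumptions made for this sketch; the text only states that rounded raw cumulative percentages without smoothing were used.

```python
from collections import Counter

def cumulative_percentages(scores, max_score=24):
    """Map each raw score to the rounded percentage of participants at or below it."""
    counts = Counter(scores)
    n = len(scores)
    table = {}
    running = 0
    for score in range(0, max_score + 1):
        running += counts[score]
        # Round half up; no z-transformation or smoothing is applied.
        table[score] = int(100 * running / n + 0.5)
    return table

# Hypothetical raw scores for one normative cell (not study data).
cell_scores = [10, 12, 12, 15]
norms = cumulative_percentages(cell_scores)
print(norms[9], norms[10], norms[12], norms[15], norms[24])  # 0 25 75 100 100
```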

RESULTS

Effects of Age, Education Level, and Sex on Planning Accuracy

An ANOVA with the between-subjects factors Age Group (eight 5-year intervals), Education Level (low vs. high), and Sex (male vs. female), and planning accuracy as dependent variable revealed significant main effects of Age Group (F(7,7671)=166.51; p<.001; partial η²=.132), Education Level (F(1,7671)=124.43; p<.001; partial η²=.016), and Sex (F(1,7671)=141.11; p<.001; partial η²=.018). As evident from Table 2, planning accuracy decreased with age and was reduced in less educated as well as in female participants. Besides these main effects, the present data also reveal a significant three-way interaction of Age Group by Education Level by Sex (F(7,7671)=2.43; p=.018; partial η²=.002). Graphical analyses suggest that the mean difference of approximately one more solved problem in higher compared to lower educated participants is rather stable for men and women across the life span, with one exception: Women in the age group of 55 to 59.99 years showed nearly equal planning performance for both education levels, which should explain the significant three-way interaction. No other interactions reached significance (highest F=1.34; lowest p=.225).
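The reported effect sizes can be recovered from the F-values and degrees of freedom, since partial η² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies this identity to the reported statistics; the function name is an illustrative choice. Note that Education Level and Sex each have two levels and therefore one numerator degree of freedom, and that the error degrees of freedom equal the sample size minus the number of cells (7703 − 8 × 2 × 2 = 7671).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)

DF_ERROR = 7703 - 8 * 2 * 2  # N minus number of Age x Education x Sex cells

effects = {
    "Age Group": (166.51, 7),
    "Education Level": (124.43, 1),
    "Sex": (141.11, 1),
    "Age x Education x Sex": (2.43, 7),
}
for name, (f_value, df_effect) in effects.items():
    print(name, round(partial_eta_squared(f_value, df_effect, DF_ERROR), 3))
# Age Group 0.132
# Education Level 0.016
# Sex 0.018
# Age x Education x Sex 0.002
```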

Reliability Estimates for Overall Planning Accuracy

Reliability estimates are provided in Table 3. The five different reliability estimates for the overall sample ranged between .715 and .757. As in the preceding analyses of Kaller et al. (2016), in both the overall sample and the respective age groups, estimates were highest for glb and λ4, whereas λ3 (Cronbach’s α) yielded the lowest estimate in all cases.
Table 3.

Reliability estimates of the Tower of London (TOL-F) task

| (Sub)Sample | N | Age M±SD (yr) | Sex m, f (N) | ED. L. low, high (N) | λ2 | λ3 (α) | λ4 | ωtot | glb |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall sample | 7703 | 60.00±10.59 | 3962, 3741 | 4669, 3034 | 0.719 | 0.715 | 0.755 | 0.732 | 0.757 |
| 40.00–44.99 years | 694 | 43.05±1.25 | 347, 347 | 292, 402 | 0.639 | 0.631 | 0.708 | 0.651 | 0.722 |
| 45.00–49.99 years | 973 | 47.49±1.48 | 454, 519 | 489, 484 | 0.633 | 0.625 | 0.697 | 0.645 | 0.690 |
| 50.00–54.99 years | 1089 | 52.43±1.46 | 590, 499 | 565, 524 | 0.668 | 0.656 | 0.717 | 0.674 | 0.726 |
| 55.00–59.99 years | 1116 | 57.47±1.45 | 560, 556 | 634, 482 | 0.633 | 0.625 | 0.697 | 0.648 | 0.728 |
| 60.00–64.99 years | 1085 | 62.50±1.42 | 564, 521 | 711, 374 | 0.669 | 0.661 | 0.735 | 0.681 | 0.730 |
| 65.00–69.99 years | 1027 | 67.33±1.42 | 528, 499 | 707, 320 | 0.681 | 0.674 | 0.738 | 0.696 | 0.757 |
| 70.00–74.99 years | 1028 | 72.52±1.4 | 532, 496 | 770, 258 | 0.687 | 0.679 | 0.744 | 0.704 | 0.752 |
| 75.00–79.99 years | 691 | 77.18±1.42 | 387, 304 | 501, 190 | 0.737 | 0.729 | 0.797 | 0.754 | 0.802 |

Note. Reliability estimates for the overall sample and age-related subgroups on the TOL-F.

M=mean; SD=standard deviation; yr=years; m=male; f=female; N=number of participants in the (sub-)sample.


Standardization

Normative data for age groups and education level are provided in Table 4. Sex-adjusted versions of this table are presented in Tables 5 and 6 for women and men, respectively.
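Table 4.

Normative data of the Tower of London (TOL-F) task adjusted for age and education

Column headers give education level (Low, High) and age group in years.

| TOL | Low 40–44 | Low 45–49 | Low 50–54 | Low 55–59 | Low 60–64 | Low 65–69 | Low 70–74 | Low 75–79 | High 40–44 | High 45–49 | High 50–54 | High 55–59 | High 60–64 | High 65–69 | High 70–74 | High 75–79 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 | 0 | 1 | 1 | 3 | 6 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 4 |
| 5 | 0 | 0 | 1 | 0 | 1 | 3 | 6 | 8 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 5 |
| 6 | 1 | 1 | 2 | 1 | 4 | 6 | 10 | 15 | 0 | 0 | 1 | 0 | 2 | 3 | 2 | 8 |
| 7 | 1 | 3 | 3 | 3 | 6 | 9 | 16 | 23 | 0 | 0 | 2 | 1 | 4 | 6 | 8 | 16 |
| 8 | 3 | 4 | 5 | 6 | 11 | 14 | 22 | 31 | 1 | 2 | 4 | 4 | 6 | 10 | 13 | 24 |
| 9 | 7 | 8 | 8 | 12 | 15 | 21 | 30 | 40 | 3 | 3 | 7 | 8 | 9 | 16 | 21 | 32 |
| 10 | 11 | 11 | 13 | 17 | 24 | 30 | 42 | 49 | 4 | 6 | 10 | 13 | 13 | 21 | 29 | 41 |
| 11 | 17 | 19 | 21 | 26 | 33 | 42 | 55 | 61 | 9 | 10 | 15 | 18 | 20 | 28 | 38 | 53 |
| 12 | 24 | 29 | 30 | 36 | 46 | 52 | 67 | 72 | 13 | 17 | 22 | 26 | 30 | 38 | 50 | 65 |
| 13 | 33 | 37 | 41 | 45 | 56 | 65 | 76 | 80 | 22 | 23 | 29 | 38 | 40 | 53 | 66 | 73 |
| 14 | 45 | 49 | 53 | 57 | 68 | 75 | 85 | 88 | 31 | 31 | 41 | 47 | 54 | 60 | 76 | 78 |
| 15 | 55 | 61 | 64 | 70 | 77 | 86 | 90 | 92 | 42 | 42 | 53 | 57 | 65 | 71 | 84 | 86 |
| 16 | 66 | 72 | 73 | 80 | 85 | 91 | 94 | 96 | 54 | 56 | 64 | 70 | 75 | 82 | 92 | 91 |
| 17 | 76 | 84 | 84 | 88 | 90 | 96 | 98 | 98 | 66 | 69 | 74 | 79 | 85 | 89 | 95 | 96 |
| 18 | 85 | 90 | 92 | 94 | 95 | 97 | 99 | 100 | 76 | 80 | 82 | 87 | 91 | 95 | 98 | 99 |
| 19 | 92 | 95 | 96 | 97 | 98 | 99 | 99 | 100 | 87 | 88 | 89 | 95 | 95 | 97 | 99 | 100 |
| 20 | 96 | 98 | 98 | 99 | 99 | 100 | 100 | 100 | 94 | 93 | 96 | 97 | 98 | 99 | 100 | 100 |
| 21 | 98 | 99 | 99 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 98 | 99 | 100 | 99 | 100 | 100 |
| 22 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99 | 100 | 100 | 100 | 100 | 100 |
| 23 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 24 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| N | 292 | 489 | 565 | 634 | 711 | 707 | 770 | 501 | 402 | 484 | 524 | 482 | 374 | 320 | 258 | 190 |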

Percentile ranks of TOL performance, separately for low and high education and age groups. In the left-most column, the number of correctly solved TOL-problems is listed.

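Table 5.

Age and education adjusted normative data of the Tower of London (TOL-F) task for women

Column headers give education level (Low, High) and age group in years.

| TOL | Low 40–44 | Low 45–49 | Low 50–54 | Low 55–59 | Low 60–64 | Low 65–69 | Low 70–74 | Low 75–79 | High 40–44 | High 45–49 | High 50–54 | High 55–59 | High 60–64 | High 65–69 | High 70–74 | High 75–79 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 3 | 0 | 0 | 1 | 0 | 0 | 1 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 4 | 0 | 0 | 1 | 0 | 1 | 2 | 4 | 6 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 6 |
| 5 | 0 | 0 | 1 | 1 | 2 | 4 | 6 | 10 | 0 | 0 | 1 | 0 | 1 | 3 | 4 | 6 |
| 6 | 1 | 1 | 2 | 1 | 6 | 7 | 12 | 18 | 0 | 0 | 2 | 0 | 2 | 3 | 4 | 10 |
| 7 | 1 | 2 | 4 | 4 | 9 | 10 | 19 | 27 | 1 | 0 | 5 | 3 | 7 | 7 | 11 | 12 |
| 8 | 2 | 4 | 6 | 7 | 13 | 16 | 26 | 36 | 1 | 2 | 7 | 6 | 11 | 11 | 13 | 26 |
| 9 | 6 | 8 | 11 | 13 | 19 | 24 | 36 | 47 | 3 | 4 | 11 | 10 | 15 | 16 | 18 | 36 |
| 10 | 12 | 11 | 18 | 19 | 29 | 33 | 48 | 55 | 5 | 7 | 14 | 17 | 21 | 25 | 29 | 42 |
| 11 | 19 | 20 | 26 | 29 | 39 | 47 | 62 | 70 | 10 | 13 | 18 | 24 | 31 | 38 | 39 | 56 |
| 12 | 29 | 31 | 34 | 39 | 55 | 58 | 74 | 81 | 15 | 21 | 29 | 34 | 41 | 49 | 52 | 70 |
| 13 | 41 | 40 | 48 | 48 | 65 | 71 | 81 | 88 | 26 | 28 | 36 | 51 | 55 | 63 | 64 | 74 |
| 14 | 53 | 52 | 62 | 59 | 78 | 78 | 90 | 95 | 38 | 37 | 46 | 62 | 66 | 71 | 76 | 80 |
| 15 | 63 | 64 | 71 | 72 | 83 | 88 | 93 | 97 | 47 | 49 | 60 | 71 | 76 | 84 | 87 | 88 |
| 16 | 71 | 76 | 79 | 82 | 89 | 93 | 96 | 99 | 59 | 63 | 70 | 79 | 86 | 90 | 93 | 94 |
| 17 | 81 | 88 | 89 | 89 | 94 | 96 | 98 | 100 | 72 | 77 | 77 | 90 | 92 | 94 | 95 | 96 |
| 18 | 89 | 94 | 95 | 94 | 97 | 98 | 99 | 100 | 83 | 88 | 84 | 95 | 94 | 98 | 98 | 100 |
| 19 | 95 | 98 | 98 | 97 | 99 | 99 | 100 | 100 | 92 | 94 | 93 | 97 | 98 | 98 | 99 | 100 |
| 20 | 98 | 99 | 98 | 99 | 99 | 100 | 100 | 100 | 95 | 97 | 98 | 98 | 99 | 98 | 100 | 100 |
| 21 | 99 | 99 | 99 | 100 | 100 | 100 | 100 | 100 | 98 | 99 | 99 | 100 | 100 | 99 | 100 | 100 |
| 22 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 100 | 100 | 100 | 100 | 100 | 100 |
| 23 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 24 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| n | 171 | 284 | 291 | 361 | 398 | 395 | 413 | 254 | 176 | 235 | 208 | 195 | 123 | 104 | 83 | 50 |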

Percentile ranks for females, separately for low and high education and age groups. In the left-most column, the number of correctly solved TOL-problems is listed.

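Table 6.

Age and education adjusted normative data of the Tower of London (TOL-F) task for men

Column headers give education level (Low, High) and age group in years.

| TOL | Low 40–44 | Low 45–49 | Low 50–54 | Low 55–59 | Low 60–64 | Low 65–69 | Low 70–74 | Low 75–79 | High 40–44 | High 45–49 | High 50–54 | High 55–59 | High 60–64 | High 65–69 | High 70–74 | High 75–79 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 0 | 0 | 1 | 1 | 3 | 5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 4 |
| 5 | 1 | 0 | 0 | 0 | 1 | 2 | 5 | 7 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 4 |
| 6 | 1 | 1 | 1 | 1 | 2 | 4 | 8 | 12 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 8 |
| 7 | 2 | 3 | 1 | 1 | 4 | 7 | 13 | 18 | 0 | 1 | 1 | 0 | 3 | 6 | 7 | 17 |
| 8 | 4 | 4 | 3 | 5 | 8 | 12 | 18 | 26 | 1 | 1 | 2 | 3 | 3 | 10 | 13 | 24 |
| 9 | 7 | 7 | 5 | 10 | 11 | 18 | 24 | 33 | 3 | 3 | 4 | 7 | 6 | 15 | 22 | 31 |
| 10 | 9 | 12 | 8 | 16 | 19 | 27 | 34 | 43 | 4 | 6 | 6 | 10 | 10 | 19 | 30 | 41 |
| 11 | 14 | 18 | 16 | 22 | 26 | 37 | 46 | 53 | 8 | 7 | 12 | 14 | 15 | 24 | 38 | 52 |
| 12 | 17 | 27 | 25 | 32 | 35 | 46 | 59 | 63 | 12 | 12 | 18 | 20 | 24 | 32 | 49 | 63 |
| 13 | 21 | 33 | 35 | 40 | 44 | 58 | 71 | 72 | 19 | 18 | 25 | 29 | 33 | 47 | 66 | 73 |
| 14 | 33 | 44 | 43 | 55 | 56 | 71 | 80 | 81 | 26 | 26 | 37 | 36 | 48 | 55 | 76 | 78 |
| 15 | 43 | 57 | 57 | 67 | 68 | 82 | 87 | 88 | 38 | 36 | 49 | 48 | 60 | 65 | 83 | 86 |
| 16 | 59 | 67 | 67 | 78 | 79 | 88 | 93 | 94 | 50 | 50 | 61 | 63 | 70 | 77 | 92 | 89 |
| 17 | 69 | 79 | 79 | 88 | 86 | 95 | 98 | 97 | 61 | 62 | 72 | 72 | 82 | 87 | 95 | 96 |
| 18 | 80 | 84 | 90 | 93 | 93 | 97 | 99 | 99 | 71 | 72 | 81 | 82 | 90 | 93 | 98 | 99 |
| 19 | 87 | 92 | 93 | 97 | 96 | 98 | 99 | 100 | 83 | 81 | 86 | 93 | 94 | 97 | 99 | 100 |
| 20 | 94 | 97 | 97 | 99 | 99 | 100 | 100 | 100 | 92 | 90 | 95 | 96 | 97 | 99 | 100 | 100 |
| 21 | 97 | 99 | 99 | 100 | 100 | 100 | 100 | 100 | 98 | 98 | 97 | 99 | 100 | 100 | 100 | 100 |
| 22 | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99 | 98 | 100 | 100 | 100 | 100 | 100 |
| 23 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 24 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| n | 121 | 205 | 274 | 273 | 313 | 312 | 357 | 247 | 226 | 249 | 316 | 287 | 251 | 216 | 175 | 140 |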

Percentile ranks for males, separately for low and high education and age groups. In the left-most column, the number of correctly solved TOL-problems is listed.
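
Reading a percentile rank from the table above amounts to a lookup by sex, education level, age group, and number of correctly solved problems. The following is a minimal sketch in Python, assuming only three rows of the 45–49 years column for men are copied over. The names `MEN_45_49`, `percentile_rank`, and `rating` are hypothetical and not part of the TOL-F software, and the 16th-percentile cutoff is the one used in the worked example in the Discussion.

```python
# Illustrative sketch only: names and structure are assumptions, not TOL-F code.
# Excerpt of Table 6 (men, 45-49 years): number of correctly solved
# problems -> percentile rank, separately for low and high education.
MEN_45_49 = {
    "low": {11: 18, 12: 27, 13: 33},
    "high": {11: 7, 12: 12, 13: 18},
}

# Lower end of the normal performance range (16th percentile),
# as assumed in the Discussion example.
NORMAL_RANGE_CUTOFF = 16


def percentile_rank(education, n_correct, table=MEN_45_49):
    """Return the percentile rank for an education level and a raw score."""
    try:
        return table[education][n_correct]
    except KeyError:
        raise KeyError(
            f"no entry for education={education!r}, n_correct={n_correct!r}"
        ) from None


def rating(percentile, cutoff=NORMAL_RANGE_CUTOFF):
    """Classify a percentile rank relative to the cutoff."""
    return "normal range" if percentile >= cutoff else "below average"


for education in ("high", "low"):
    pr = percentile_rank(education, 12)
    print(f"{education} education, 12 correct: percentile {pr} -> {rating(pr)}")
# prints:
# high education, 12 correct: percentile 12 -> below average
# low education, 12 correct: percentile 27 -> normal range
```

A full implementation would hold all 24 rows for every age group and both sexes; the excerpt keeps the sketch short.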


DISCUSSION

As expected, the effects of age, education level, and sex on Planning Accuracy replicate the results of Kaller et al. (2016), who also explicitly discussed these effects. These results also concur well with the findings of D’Antuono (2017) and Boccia et al. (2017), who reported the same effects with similar effect sizes in large Italian samples. The effects of these demographic variables thus appear quite comparable across studies and may be generalizable at least to Western Europe. As performance in the TOL is associated with sociodemographic and economic factors such as education, whether our normative data can be generalized to samples from other societies will likely depend on their comparability with respect to such factors. Moreover, especially regarding the age-related performance trajectory, the public health system will most likely play an additional role, given the impact of cardiovascular factors such as elevated blood pressure and cardiac disease on TOL performance and their increasing prevalence in older age (Gold et al., 2005; Jefferson, Poppas, Paul, & Cohen, 2007).

The three-way interaction of Age Group by Education Level by Sex is not easily accounted for, and any attempt to explain it is highly speculative. Considering the small F-value and low effect size in such a large sample, one can question the meaningfulness of this effect. More importantly, findings like this clearly demonstrate the necessity of a fine-grained standardization of the test.

The reliability estimates were adequate, and we could repeatedly show that estimates were highest for glb and λ4, whereas λ3 or Cronbach’s α yielded the lowest value in all cases. This ordering of reliability estimates for the TOL-F is in line with that observed for a non-computerized four-disc TOL variant (TOL-4D) recently put forward by Tunstall et al. (2016). However, reliability estimates of the TOL-4D in adults were substantially lower, reaching only a glb of .65, a λ4 of .56, an ωtot (omega) of .35, and a Cronbach’s α of .27. Thus, although the current reliability estimates clearly exceed those of the TOL-4D, both findings strongly conform with the argument of Sijtsma (2009) that the λ3 measure (or Cronbach’s α) often constitutes a gross underestimate. Sijtsma (2009) therefore recommended glb as a better alternative, as discussed extensively in the earlier reports (Kaller, Unterrainer, Kaiser, et al., 2012; Kaller et al., 2016). Reliability estimates of the TOL-F remained stable at this level: they changed only minimally compared to the data of Kaller et al. (2016), even though the estimates in that study were based on roughly half of the current sample size (Kaller et al., 2016: n=3770; current study: n=7703). Taken together, these results suggest that the TOL-F features adequate and satisfactory reliability, with estimates based on glb and λ4 attaining values close to or exceeding .7 for the overall sample as well as for all age groups (Table 3). Thus, the TOL-F succeeded in reconciling two seemingly contradictory demands: providing items that rely on novel situations to overcome routine behavior, as defined for executive functions, on the one hand, and offering a homogeneous, limited set of problems exhibiting sufficient reliability on the other hand.

As one could expect from the strong main effects of the ANOVA reported above, there are notable differences in the distribution of the percentiles in the normative data depending on age, education, and sex. To give an example: If a highly educated man aged 45 solved 50%, that is, 12 of the 24 TOL problems correctly, he scores only at the 12th percentile. Taking the 16th percentile as the lower end of the normal performance range, his planning ability can be rated as below average. In contrast, a 45-year-old man with low education and 12 correctly solved problems reaches the 27th percentile and is thus well within the normal range.

Please note that even in most subgroups of the sex-separated normative data, the number of subjects reaches a minimum of 100 to 150 and thus meets the demands for an excellent sample size as suggested by the EFPA (2013). Only for the highly educated female participants are cell sizes for the two oldest subgroups (ages 70 to 80 years) considerably lower (83 and 50, respectively), although the recruitment for the GHS set the highest standards for obtaining a representative population-based sample. This presumably reflects the more limited access of females to higher education compared to males 70 or 80 years ago.
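
The reliability comparison in this section treats Cronbach’s α (Guttman’s λ3) as a lower bound. As a reminder of what that coefficient computes, here is a minimal sketch in Python, assuming a participants-by-items matrix of 0/1 accuracy scores. The function name `cronbach_alpha` and the toy data are illustrative assumptions, not the authors' analysis code or data; glb and λ4 need dedicated psychometric software and are not shown.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per participant).

    Each row holds one score per item, e.g. 1 = solved, 0 = not solved.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(scores) < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the sum score across participants.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (hypothetical): 4 participants x 2 items.
consistent = [[1, 1], [0, 0], [1, 1], [0, 0]]  # items always agree
unrelated = [[1, 1], [1, 0], [0, 1], [0, 0]]   # items are uncorrelated
print(round(cronbach_alpha(consistent), 3))
print(round(cronbach_alpha(unrelated), 3))
```

With perfectly agreeing items the coefficient reaches its maximum of 1, and with uncorrelated items it drops to 0, which illustrates why a heterogeneous problem set can yield a low α even when other estimates are higher.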

Limitations

When translating these results to other TOL studies, one has to consider some special characteristics of the TOL-F version used in the present study. First, single items were time-limited to 1 min. Although Tim Shallice used the same time restriction in his original version, other versions have longer time limits (e.g., 2 min for each trial; Culbertson & Zillmer, 1998; Schnirman et al., 1998) or report no time limits (e.g., Krikorian et al., 1994). Second, participants had to solve the problems using a touchscreen, not a computer mouse. Especially among the older participants, many were inexperienced in computer use and handled a computer mouse for the first time at the beginning of the GHS. Thus, we switched to a touchscreen version, which has proven feasible and advantageous for elderly participants. Third, an overall time limit of 20 min was introduced, mainly to avoid delays in the subjects’ schedules at the GHS. As available time is a critical issue in clinical assessment, this overall time limit should guarantee the task’s suitability for both research and clinical application. Cancellation rates due to the 20-min limit remained rather low even in the oldest subgroups (max. 14%; Table 2), justifying this consideration. Moreover, there was no bias with respect to education level that would have unduly constrained participants with lower education and might have led to an underestimation of their performance. As we cannot prevent age-related slowing from affecting performance, we provide age group-wise normative data, thereby ensuring a presumably less biased basis for comparisons across ages.

CONCLUSION

The TOL-F was shown to possess adequate psychometric properties that are stable across the adult life span. The 24-item version covers a broad range of graded difficulty even in healthy adults, which makes this task suitable for both research and clinical application. The reported normative data enable the assessment of individual planning performance against a comprehensive, representative, age-, sex-, and education-fair sample. In combination with the use of a computerized task version, this should ease and standardize the use of the Tower of London task.

We thank an anonymous reviewer who commented that “This is an important time in the field of neuropsychology as the need to use technology to improve our assessments is vital to the sustainability of the field. You need to show that technology is an inclusive model for assessing all individuals and provide sufficient information to pass the high level of scrutiny that computerized tests will endure from clinicians.” We hope that our study helps to support this development in neuropsychological assessment.
REFERENCES (20 in total; the first 10 are listed)

1.  The Tower of London spatial problem-solving task: enhancing clinical and research implementation.

Authors:  W Keith Berg; Dana Byrd
Journal:  J Clin Exp Neuropsychol       Date:  2002-08       Impact factor: 2.475

2.  Spanish normative studies in young adults (NEURONORMA young adults project): norms for Stroop Color-Word Interference and Tower of London-Drexel University tests.

Authors:  T Rognoni; M Casals-Coll; G Sánchez-Benavides; M Quintana; R M Manero; L Calvo; R Palomo; F Aranciva; F Tamayo; J Peña-Casanova
Journal:  Neurologia       Date:  2012-05-30       Impact factor: 3.109

3.  A basis for analyzing test-retest reliability.

Authors:  L Guttman
Journal:  Psychometrika       Date:  1945       Impact factor: 2.500

4.  Towers of Hanoi and London: Reliability and Validity of Two Executive Function Tasks.

Authors:  G E Humes; M C Welsh; P Retzlaff; N Cookson
Journal:  Assessment       Date:  1997-09

5.  Assessing planning ability with the Tower of London task: psychometric properties of a structurally balanced problem set.

Authors:  Christoph P Kaller; Josef M Unterrainer; Christoph Stahl
Journal:  Psychol Assess       Date:  2011-08-22

6.  A four-disc version of the Tower of London for clinical use.

Authors:  Jenny R Tunstall; John G O'Gorman; David H K Shum
Journal:  J Neuropsychol       Date:  2014-11-25       Impact factor: 2.864

7.  Hypertension and hypothalamo-pituitary-adrenal axis hyperactivity affect frontal lobe integrity.

Authors:  Stefan M Gold; Isabel Dziobek; Kimberley Rogers; Abdul Bayoumy; Pauline F McHugh; Antonio Convit
Journal:  J Clin Endocrinol Metab       Date:  2005-03-22       Impact factor: 5.958

8.  The Tower of London(DX): a standardized approach to assessing executive functioning in children.

Authors:  W C Culbertson; E A Zillmer
Journal:  Arch Clin Neuropsychol       Date:  1998-04       Impact factor: 2.813

9.  Systemic hypoperfusion is associated with executive dysfunction in geriatric cardiac patients.

Authors:  Angela L Jefferson; Athena Poppas; Robert H Paul; Ronald A Cohen
Journal:  Neurobiol Aging       Date:  2006-02-15       Impact factor: 4.673

10.  On the Use, the Misuse, and the Very Limited Usefulness of Cronbach's Alpha.

Authors:  Klaas Sijtsma
Journal:  Psychometrika       Date:  2008-12-11       Impact factor: 2.500
