The identification of gifted underachievement: Validity evidence for the commonly used methods.

Abstract

BACKGROUND: Much confusion exists about the underachievement of gifted students due to significant variations in how the phenomenon has been identified. From a review of the literature, five methods were found to be commonly used to identify gifted underachievement.
AIMS: The purpose of the study was to assess the equivalence of the commonly used methods to identify gifted underachievement, and to determine which of these methods may be optimal.
SAMPLE: Data were collected from a school in Sydney, Australia.
METHOD: Three measures of convergence (i.e., difference in proportions, phi association, and kappa agreement) were used to assess the equivalence of the identification methods, while latent class analysis was used to determine the optimal identification method.
RESULTS: The convergence evidence suggested that the commonly used identification methods may not be considered convergent, while the criterion evidence indicated that one of the five identification methods may have strong levels of criterion validity.
CONCLUSIONS: A conclusion was reached that the simple difference method may be the most valid method to identify gifted underachievement.

Keywords: gifted education; identification; measurement; simple difference method; underachievement

Year: 2022 PMID: 35199852 PMCID: PMC9543815 DOI: 10.1111/bjep.12492

Source DB: PubMed Journal: Br J Educ Psychol ISSN: 0007-0998

Background

Although giftedness may be perceived to provide advantage in the modern world (Jung, 2014), gifted students have in fact been reported to be suffering from an ‘epidemic’ of underachievement (Rimm, 2003). This disturbing claim is supported by evidence which indicates that up to, or even greater than, half the population of gifted students exhibit significant academic underachievement (although the most commonly reported prevalence rates are between 10% and 20%; Hsieh, Sullivan, & Guerra, 2007; Rimm, 2003; Steenbergen‐Hu, Olszewski‐Kubilius, & Calvert, 2020; White, Graham, & Blaas, 2018). Of note, the disappointing educational outcomes for many gifted students do not appear to be fully explainable by the commonly recognized sources of disadvantage, including socio‐economic status and ethnic background. The various socio‐emotional factors associated with the learning needs of these students (i.e., self‐perceptions, self‐concept, self‐efficacy, self‐regulation, attitudes towards teachers/school, anxiety, emotional engagement, and goal orientations) also appear to give rise to underachievement (Coleman & Cross, 2014; Desmet, Pereira, & Peterson, 2020; Gilar‐Corbi, Veas, Miñano, & Castejón, 2019; Masden, Leung, Shore, Schneider, & Udvari, 2015; Rimm, 2003; Steenbergen‐Hu et al., 2020; White et al., 2018).
When educators attempt to respond to the problem of underachievement in gifted students, they may not always find clear guidance from the literature due to the lack of consensus among scholars (Gorard & Smith, 2004; Steenbergen‐Hu et al., 2020). One of the major reasons for such confusion may lie in the significant variations in how gifted underachievement has been identified (Mofield & Parker Peters, 2019; White et al., 2018). Of note, Dowdall and Colangelo (1982) indicated that the use of different identification methods by researchers may lead to the study of populations that are, effectively, different, while White et al. (2018) proposed that ‘(m)easurement inconsistencies … (may) lead to variation in the number and type of students participating in gifted education research, which affects the comparability, validity and applicability of research findings’ (p. 56).
To address the problem, Reis and McCoach (2000) proposed ‘an imperfect, yet workable operational definition’ of gifted underachievement to be ‘a severe discrepancy between expected achievement (as measured by cognitive/intellectual ability assessments or standardized achievement test scores) and actual achievement (as measured by class grades and teacher evaluations)’ (p. 157). It is noted that this discrepancy must not be due to a disability (e.g., specific learning disability) and must last for an extended period of time (Reis & McCoach, 2000; Ridgley, Rubenstein, & Callan, 2020). The definition appears to have achieved wide acceptance in the identification of gifted underachievement today (Gilar‐Corbi et al., 2019; Mofield & Parker Peters, 2019; Steenbergen‐Hu et al., 2020). Nevertheless, substantial inconsistencies continue to exist in the manner in which expected and actual achievement are compared to establish gifted underachievement.
At present, four types of methods appear to be commonly used to identify underachievement in gifted students (McCoach & Siegle, 2014; Steenbergen‐Hu et al., 2020).
In order of popularity, they are as follows:
1. The absolute split (‘ABS’) method (e.g., Vlahovic‐Stetic, Vidovic, & Arambasic, 1999): Gifted underachievement is deemed when the level of expected achievement is above the threshold for giftedness (e.g., top 10%), and the level of actual achievement is below the threshold for poor achievement (e.g., below average);
2. The nomination (‘NOM’) method (e.g., Cavilla, 2015): Gifted underachievement is deemed when a student is nominated as exhibiting gifted underachievement on the basis of the observations of nominators (e.g., teachers);
3. The simple difference (‘SDF’) method (e.g., Lau & Chan, 2001): Gifted underachievement is deemed when the standardized level of actual achievement is substantially below the standardized level of expected achievement (i.e., commonly a difference of at least one standard deviation), for a student identified as being gifted (e.g., the level of expected achievement is above the threshold for giftedness); and
4. The regression (‘REG’) method (e.g., Redding, 1990): Gifted underachievement is deemed when the level of actual achievement is substantially below the level predicted from expected achievement using the simple linear relationship between the expected and actual achievement scores for the cohort of interest (i.e., identified through regression analysis; commonly a difference of at least one standard error), for a student identified as being gifted (e.g., the level of expected achievement is above the threshold for giftedness).
These methods of identification of gifted underachievement may be classified according to whether they rely on statistical techniques that compare quantitative assessments of expected and actual achievement (i.e., the absolute split method, the simple difference method, and the regression method), or qualitative judgements of individuals (i.e., the nomination method).
Furthermore, as two clear variants of the type of thresholds used with the absolute split method were identified in the literature (i.e., a threshold based on student rank and a threshold based on student raw achievement scores), the absolute split method may be further classified into two separate identification methods (i.e., the absolute split I method [a threshold based on achievement rank] and the absolute split II method [a threshold based on raw achievement scores]). To date, three empirical attempts have been made to compare the various methods used to identify gifted underachievement. First, Annesley, Odhner, Madoff, and Chansky (1970) investigated a number of these methods (i.e., the simple difference method, the regression method, and the nomination method) with a group of 157 first grade students, and concluded that the different methods do not identify the same group of students. Thereafter, Lau and Chan (2001) investigated these methods (i.e., the simple difference method, the regression method, the absolute split I method, and the nomination method) with a group of 126 Grade 7 students. In contrast to Annesley et al. (1970), Lau and Chan (2001) reached the conclusion that the three statistical methods were in agreement. Most recently, Gilar‐Corbi et al. (2019) compared the rates of identification of gifted underachievement among 164 gifted seventh and eighth grade students using two of these methods (i.e., the simple difference method and the regression method), along with the less common Rasch method, and found significant differences in the rates of identification according to the method that was used. A further study by Cheung and Rudowicz (2003), that included both gifted and non‐gifted students, investigated underachievement among 2,720 students in Grades 8 and 9 by comparing six variations of the commonly used identification methods to reach the conclusion that these identification methods ‘were far from identical’ (p. 310). 
Some possible reasons for the inconsistent findings across these studies may lie in the relatively small sample sizes that were generally utilized, the recruitment of participants from different grade levels, the recruitment of participants from only one or two grade levels, the employment of a restricted combination of expected and actual achievement measurements, and the differing use of statistical techniques to assess convergence among the different identification methods. It is noteworthy that the problems associated with the identification of underachievement extend beyond the field of gifted education. For example, Smith (2010), in reference to the broad field of education, suggested that ‘underachievement is a term over which there is little consensus’ (p. 41). Multiple fields outside of education may also be affected, as some methods that have been used to identify underachievement have also been used in the fields of sociology (Smith, 2010), psychology (Preckel, Holling, & Vock, 2006; Tuss, Zimmer, & Ho, 1995), law (Maki, Floyd, & Roberson, 2015; Zirkel & Thomas, 2010), and policy (Sikora & Saha, 2011) on topics including disability (Mayes, Waschbusch, Calhoun, & Mattison, 2019; Van den Broeck, 2002), delinquency (Hoffmann, 2020; Timmermans, vanLier, & Koot, 2009), motivation, child development (Maynard, Waters, & Clement, 2013; McCall, Beach, & Lau, 2000), resilience, ethnicity, social class, and gender gaps (Strand, 2014; Wood, 2003).

Research questions

The main purpose of this study was to produce guidance for researchers and practitioners in the selection of the best method(s) to identify gifted underachievement, to resolve a long‐standing concern first expressed over five decades ago (Farquhar & Payne, 1964), and which continues to this day (Desmet et al., 2020; Ridgley et al., 2020). To this end, a rigorous review and assessment of the various methods that are commonly used to identify gifted underachievement was undertaken, with a focus on the validity of the use of these methods (Kane, 2013). The following research questions directed the investigation: Are the different methods commonly used to identify gifted underachievement equivalent? What is the optimal method to identify gifted underachievement?

Method

Selection of identification methods for assessment

Each of the commonly used methods to identify gifted underachievement, with the two common variations of the absolute split method treated as separate methods (i.e., the absolute split I, absolute split II, nomination, simple difference, and regression methods), was assessed in this study. For the statistical identification methods, the thresholds that were adopted to discriminate between achievement and underachievement were those that were commonly adopted in the literature (i.e., a difference of one standard deviation between expected and actual achievement for the simple difference method [Lau & Chan, 2001; White et al., 2018], a difference of one standard error between expected achievement and actual achievement for the regression method [Redding, 1990], a 50th percentile rank as the threshold for ‘poor’ achievement in the absolute split I method [Staudt & Neubauer, 2006], and a raw score of 80% as the threshold for ‘poor’ achievement in the absolute split II method). Table 1 provides details on the five identification methods that were assessed.

Table 1

Methods used to identify gifted underachievement

Method of identification	Threshold	Type of method
Absolute split I	Actual achievement < 50th percentile	Statistical
Absolute split II	Actual achievement < 80% raw score	Statistical
Simple difference	Actual achievement < expected achievement by 1 standard deviation	Statistical
Regression	Actual achievement < expected achievement by 1 standard error of estimate	Statistical
Nomination	Personal judgement of nominator	Nomination
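
The statistical thresholds in Table 1 can be read as simple classification rules. The following is a minimal Python sketch and not the authors' code: the function names, the toy scores, the use of the cohort median as the 50th percentile, and the way the gifted flag is applied afterwards are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median, pstdev

def zscores(xs):
    """Standardize scores to a mean of zero and a standard deviation of one."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def absolute_split_rank(actual):
    """ABS I: actual achievement below the cohort's 50th percentile (median)."""
    cut = median(actual)
    return [a < cut for a in actual]

def absolute_split_raw(actual_pct, cut=80.0):
    """ABS II: actual achievement below a raw score of 80%."""
    return [a < cut for a in actual_pct]

def simple_difference(exp_z, act_z, gap=1.0):
    """SDF: actual z-score at least `gap` SDs below the expected z-score."""
    return [e - a >= gap for e, a in zip(exp_z, act_z)]

def regression(exp_z, act_z, gap=1.0):
    """REG: actual score at least `gap` standard errors of estimate below
    the value predicted from expected achievement by simple linear regression."""
    n = len(exp_z)
    mx, my = mean(exp_z), mean(act_z)
    sxx = sum((x - mx) ** 2 for x in exp_z)
    sxy = sum((x - mx) * (y - my) for x, y in zip(exp_z, act_z))
    slope = sxy / sxx
    resid = [(y - my) - slope * (x - mx) for x, y in zip(exp_z, act_z)]
    see = sqrt(sum(r * r for r in resid) / (n - 2))  # standard error of estimate
    return [r <= -gap * see for r in resid]

# Hypothetical cohort: ability scores, raw achievement percentages, gifted flag.
expected = [90, 100, 110, 120, 130, 140]
actual = [50, 60, 70, 80, 90, 40]
gifted = [False, False, False, False, False, True]

exp_z, act_z = zscores(expected), zscores(actual)
flags = {
    "ABSI": absolute_split_rank(actual),
    "ABSII": absolute_split_raw(actual),
    "SDF": simple_difference(exp_z, act_z),
    "REG": regression(exp_z, act_z),
}
# Only students already identified as gifted can be gifted underachievers.
flags = {k: [f and g for f, g in zip(v, gifted)] for k, v in flags.items()}
```

In this toy cohort all four rules flag the same single gifted student; with real data the rules can diverge, which is the question the study examines.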

Sample

In recognition of the relatively small sample sizes that were utilized in the three previous studies on the topic with gifted students (Annesley et al., 1970; Gilar‐Corbi et al., 2019; Lau & Chan, 2001), a large sample size was targeted for this study. Specifically, archive data were obtained from a co‐educational K‐12 Independent school located in the south‐western suburbs of Sydney, Australia. At the time of data collection, the school, which had been in operation for over three decades, had a total enrolment of over 1,300 students, of which, 53% were male students and 41% had a language background other than English. The school’s Index of Community Socio‐Educational Advantage, which is a measure of the socio‐economic status of the students who attend the school, placed the school on the 70th percentile of Australian schools (ACARA, 2013). In accordance with Gagné (2004, 2009, 2013), whose model of giftedness acknowledges underachievement (Jung, 2022), only data relating to current or former students at the school whose expected achievement level placed them within the top 10% of age peers (and were therefore classifiable as gifted students) were selected for inclusion.

Instruments

Statistical methods

Over the history of the school from which the data were obtained, many different instruments were used, in different periods, and for different grade groups, to assess expected and actual achievement of students. These instruments include the:
Otis–Lennon School Ability Test (OLSAT): A group test of cognitive abilities that produces scores for verbal ability (i.e., Verbal Score [VS]), non‐verbal ability (i.e., Non‐verbal Score [NV]), and composite ability (i.e., School Ability Index [SAI]).
Higher School Certificate (HSC): The highest educational credential that a student can receive in the state of New South Wales after 13 years of education (i.e., K‐12). Students who are awarded the HSC are assessed in multiple self‐selected subject areas. Nevertheless, for the purposes of this study, only the results for English and mathematics were utilized, as almost all students complete these subjects.
School Certificate (SC): A recently retired educational credential awarded to students in the state of New South Wales after the completion of Grade 10, and prior to the commencement of the HSC. Students who are awarded the SC complete external examinations in English and mathematics.
National Assessment Program ‐ Literacy and Numeracy (NAPLAN): A series of tests that students sit in Grades 3, 5, 7, and 9, which monitor skills, and provide measurements, in literacy and numeracy.
School assessments (SA): The internal assessments used by the school to determine student progress in mastering the content of courses. For comparability to the other available data, a focus was given to school assessment results in English and mathematics.
The archive data obtained comprised all data for Grade 7 to 12 students at the school for more than ten years. The school has systematically tested all students and has maintained consistent records of multiple measurements of achievement for every student.
Specifically, the archives contained OLSAT results for 2,501 students, HSC results for 1,267 students, SC results for 1,207 students, NAPLAN results for 2,599 students, and 367,567 individual SA results. For the purposes of this study, the individual SA results were combined using school‐specified weights to arrive at an overall score for each student in English and mathematics across the junior high school years (Grades 7 to 10; i.e., ‘Junior SA’ for English and mathematics) and senior high school years (Grades 11 to 12; i.e., ‘Senior SA’ for English and mathematics; a total of 6,511 records). Nevertheless, there were some limitations to the data, as each instrument was not administered to all students, some instruments were only administered over a limited number of years (e.g., 2006–2014 for the OLSAT), while other instruments were only administered to students at certain grade levels (e.g., Grades 3, 5, 7, and 9 for the NAPLAN). Consequently, there were differing sample sizes for each combination of instruments used to identify gifted underachievement. Table 2 provides details on the calendar years and grade levels for which the archives contained data for each instrument (e.g., the OLSAT data relates to the performance of Grade 6 students at the school who were administered the instrument from 2006 to 2014).

Table 2

Outline of the administration of instruments used

Instrument	Calendar years of administration (inclusive)	Grades administered
OLSAT	2006–2014	Grade 6
HSC	2002–2014	Grade 12
SC	2001–2011	Grade 10
NAPLAN	2008–2013	Grades 3, 5, 7, 9
SA	2002–2014	Grades 7 to 12

Nomination method

As the archives did not include any nomination data, such data were newly collected by requesting current teachers at the school from which archive data were accessed to complete a survey. Although nominations may be made by a number of parties (e.g., teachers, parents, peers, and students), data from teachers were collected in this study, as teacher nominations appear to be the most commonly utilized by researchers to identify gifted underachievement (for some exceptions, see Flint, 2002, 2002; Lau & Chan, 2001). No published teacher nomination instruments for the identification of gifted underachievement could be located in the literature. Nevertheless, most researchers appear to require teachers to classify each student in their classes as exhibiting achievement or underachievement on the basis of their observations (Annesley et al., 1970; Cavilla, 2015; Jones & Myhill, 2004; White et al., 2018), with some also requiring reasons for the nominations (Dunne & Gazeley, 2008; Lau & Chan, 2001). In addition, some researchers appear to provide guidance to teachers on gifted underachievement before they are asked to nominate students (Kanevsky & Keighley, 2003). All these elements of teacher nominations were incorporated into the teacher nomination survey that was developed for this study. Specifically, a presentation was made to Grade 7 to 12 teachers at the school on gifted underachievement, following which, the teachers were asked to reflect on their current students (for whom they had just prepared academic reports) and: (a) nominate students in their classes who exhibit gifted underachievement (i.e., defined as ‘achievement significantly below their potential in the past semester’), (b) indicate the classes the students were enrolled in, and (c) provide reasons for their nominations. The nomination instrument was presented as a self‐administered online form, which was emailed to all Grade 7 to 12 teachers at the school at the conclusion of the presentation. 
The teachers nominated students in each grade across 79 Grade 7 to 12 classes in all subject areas, for an average of 1.4 nominations in each class. None of the 122 teacher nominations that were received needed to be removed from consideration due to the non‐provision of reasons for nomination or the provision of inappropriate reasons (Dunne & Gazeley, 2008; Lau & Chan, 2001). Generally, students were nominated due to the observation of some common characteristics of underachieving gifted students (79% of responses) and/or comparisons of the expected and actual achievement of the students (29% of responses). Some examples of student characteristics cited by the participating teachers as indicative of underachievement included a substantial difference in the quality of work produced and verbal contributions in class, procrastination, rushed work, and argumentative task avoidance (among students placed in gifted classes). If the assumption is made that the students who were not nominated are classifiable as exhibiting achievement, the nomination process resulted in a total of 1,505 teacher classifications of gifted students as exhibiting achievement or underachievement.

Data preparation

One hundred and ten possible combinations of expected and actual achievement data existed in the archive data. Nevertheless, to ensure that only meaningful comparisons of expected and actual achievement data were made, and to reduce the total number of data combinations studied to a more manageable size, only those data combinations relating to the same type of ‘content’ were investigated in the study. For example, combinations of expected achievement in numeracy and actual achievement in mathematics were included, while combinations of expected achievement in numeracy with actual achievement in English were excluded. Moreover, any archive data were only considered to assess expected achievement if the assessment took place prior to the assessment of actual achievement. For example, data from the Higher School Certificate (HSC) could not be used to assess expected achievement as it is the final assessment of a student’s achievement before they leave high school. Collectively, this process reduced the number of expected/actual data combinations to 41 (nevertheless, it is noted that NAPLAN data, which could not be converted into raw percentage scores due to the non‐publication of maximum possible scores by the administering organization, could not be used to assess the absolute split II method). All such data were converted into standardized units, with a mean of zero and a standard deviation of one. 
It is noted that for analysis relating to the nomination method, as the collected data related to multiple subject areas (e.g., a nomination could be made in science rather than English/literacy or mathematics/numeracy), meaningful comparisons to the other identification methods were only possible when the general School Ability Index of the OLSAT was used to assess expected achievement, and a weighted school assessment score in the relevant subject area for the immediately preceding semester (i.e., ‘SAR’) was used to assess actual achievement, for the other identification methods.

Analytic strategy

Different types of analyses were undertaken to address the two research questions that guided the study.

Equivalence of common methods

To address the first research question on the equivalence of the common identification methods, an assessment was made of the convergence of the results obtained from these methods. Following Zaki, Bulgiba, Ismail, and Ismail (2012), who proposed that comparisons of proportions, association, and agreement should all be used as convergence evidence, all three of these approaches were used in this study:
Proportions: An examination of whether similar percentages of classifications are made (Ho et al., 2014). If the difference in percentage classifications is large and statistically significant, convergence is not supported.
Association: A measurement of the degree to which two variables are related (Lau & Chan, 2001). If the association between variables is small or non‐significant, convergence is not supported.
Agreement: A measurement of the degree to which two variables are equal (Agresti, 2013; Bland & Altman, 1999; Hanneman, 2008). If the level of agreement is weak, convergence is not supported.
The collective consideration of the three approaches not only allowed for different perspectives to be gained on the convergence of the various identification methods, but also accommodated the possible limitations associated with making an assessment using only one of the approaches (Bland & Altman, 1999; Zaki et al., 2012). Specifically, comparisons of only the proportions of gifted underachievement may be limited, because even if similar proportions of gifted underachievement are obtained for different identification methods, these methods may in fact identify different students as exhibiting gifted underachievement.
Similarly, while assessments of association may suggest that the classifications obtained from two different identification methods may have a perfect relationship, this only indicates that the classification of gifted underachievement obtained using one identification method may be used to predict the classification that may be obtained using another identification method (i.e., a high association may not be used to infer that the classification using two different identification methods are the same, or even similar).
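
The three convergence measures named in the abstract (difference in proportions, phi association, and kappa agreement) can all be derived from the 2 × 2 table that cross-classifies two methods. The sketch below is illustrative only, with hypothetical function names and toy data. It shows the limitation noted above: two methods can flag identical proportions of students while disagreeing about who those students are.

```python
from math import sqrt

def two_by_two(a, b):
    """Cell counts for two binary classifications of the same students."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)          # flagged by both
    n10 = sum(1 for x, y in zip(a, b) if x and not y)      # flagged by a only
    n01 = sum(1 for x, y in zip(a, b) if not x and y)      # flagged by b only
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)  # flagged by neither
    return n11, n10, n01, n00

def convergence(a, b):
    """Return (difference in proportions, phi coefficient, Cohen's kappa)."""
    n11, n10, n01, n00 = two_by_two(a, b)
    n = n11 + n10 + n01 + n00
    pa, pb = (n11 + n10) / n, (n11 + n01) / n
    diff = pa - pb
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    phi = (n11 * n00 - n10 * n01) / denom if denom else float("nan")
    p_obs = (n11 + n00) / n                   # observed agreement
    p_exp = pa * pb + (1 - pa) * (1 - pb)     # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else float("nan")
    return diff, phi, kappa

# Each hypothetical method flags 2 of 8 students, but never the same student.
method_a = [1, 1, 0, 0, 0, 0, 0, 0]
method_b = [0, 0, 1, 1, 0, 0, 0, 0]
diff, phi, kappa = convergence(method_a, method_b)
```

Here the difference in proportions is zero, yet both phi and kappa are negative, so the proportions alone would overstate convergence.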

An optimal method

To address the second research question on the optimal identification method, the results obtained from the commonly used identification methods were compared to criterion values (i.e., values obtained from another source with more established validity, or a method that more thoroughly assesses the relevant variables to determine the value of interest; Kane, 2006, 2013), using the three above‐mentioned approaches. Alonzo and Pepe (1999) referred to the process of obtaining such evidence (i.e., criterion evidence) by making comparisons to criterion values, as the process of making an assessment of accuracy. The methods that are traditionally used to assess the relative merits of identification methods, such as receiver operating characteristic (ROC) curves, could not be used in this study, as not all of the common identification methods under investigation rely on a single continuous variable, and due to the difficulty in arriving at a suitable criterion from which sensitivity and specificity may be directly determined. As a result, latent class analysis was utilized (Collins & Huynh, 2014). Latent class analysis is undertaken in situations where multiple classification methods are used to classify the same variable, the true classification of which is not directly observable (Collins & Huynh, 2014). The procedure assumes that a true underlying condition (i.e., gifted underachievement) exists, which influences each of the different classification methods, and involves the calculation of an estimate of the most likely ‘true’ classification (i.e. the criterion estimate) by combining the classifications from each of the different methods. Scholars including Mammadov, Ward, Cross, and Cross (2016) have noted the possible benefits of this methodological tool with high ability populations.
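
To make the logic of latent class analysis concrete, the following is a minimal sketch of a two-class model fitted by expectation-maximization. It assumes binary indicators that are conditionally independent given the latent class. The article does not state the software or model specification that was used, so the function names, starting values, and simulated data here are all assumptions.

```python
import random
from math import prod

def _clip(p, eps=1e-6):
    """Keep probabilities away from exactly 0 or 1."""
    return min(max(p, eps), 1 - eps)

def _posterior(X, pi, theta0, theta1):
    """P(latent class 1 | row) for every row of 0/1 classifications."""
    post = []
    for row in X:
        l1 = pi * prod(t if x else 1 - t for x, t in zip(row, theta1))
        l0 = (1 - pi) * prod(t if x else 1 - t for x, t in zip(row, theta0))
        post.append(l1 / (l1 + l0))
    return post

def fit_two_class_lca(X, n_iter=200):
    """EM for a two-class latent class model with binary indicators.
    X holds one row per student and one 0/1 column per identification method."""
    n, J = len(X), len(X[0])
    # Asymmetric starting values make class 1 the 'frequently flagged' class.
    pi, theta0, theta1 = 0.5, [0.2] * J, [0.8] * J
    for _ in range(n_iter):
        post = _posterior(X, pi, theta0, theta1)  # E-step
        s1 = sum(post)
        s0 = n - s1
        pi = _clip(s1 / n)                        # M-step
        theta1 = [_clip(sum(p * r[j] for p, r in zip(post, X)) / s1)
                  for j in range(J)]
        theta0 = [_clip(sum((1 - p) * r[j] for p, r in zip(post, X)) / s0)
                  for j in range(J)]
    return pi, theta0, theta1, _posterior(X, pi, theta0, theta1)

# Simulated data: four hypothetical methods with made-up error rates.
rng = random.Random(0)
sens = [0.90, 0.85, 0.90, 0.80]   # P(flagged | truly underachieving)
fpr = [0.05, 0.10, 0.05, 0.10]    # P(flagged | not underachieving)
truth = [rng.random() < 0.2 for _ in range(500)]
X = [[int(rng.random() < (s if t else f)) for s, f in zip(sens, fpr)]
     for t in truth]

pi, theta0, theta1, post = fit_two_class_lca(X)
criterion = [p >= 0.5 for p in post]  # most likely 'true' classification
accuracy = sum(c == t for c, t in zip(criterion, truth)) / len(truth)
```

The `criterion` list plays the role of the criterion estimate described above: each method's classifications can then be compared with it using the same proportion, association, and agreement measures.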

Meta‐analyses

The statistical techniques used in this investigation produced results for each of the 41 combinations of expected achievement and actual achievement data, and for each possible pair of methods used to identify gifted underachievement. As such a large volume of results may lead to difficulty in the extraction of a single meaningful conclusion about each identification method, meta‐analyses were conducted to produce an overall weighted average estimate of an effect size for each identification method. Specifically, the meta‐analysis procedures outlined by Borenstein, Hedges, Higgins, and Rothstein (2009) were followed, whereby a weighted average of 41 raw effect sizes (from the 41 data combinations) was calculated, using the inverse of the variances of each effect size as weights. Lin and Sullivan (2009, p. 870) have recognized that such use of ‘the usual weights based on inverse variances’ is appropriate in meta‐analyses of summary results of this nature involving overlapping subjects. The use of these procedures is consistent with the recommendation by Steenbergen‐Hu and Olszewski‐Kubilius (2016) to use meta‐analytical techniques to increase the precision and reliability of research findings.
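
The inverse-variance weighting described above can be sketched in a few lines. This example assumes a fixed-effect weighted average with a normal-approximation significance test; the article does not say whether a fixed-effect or random-effects model was applied, and the function name and input values are hypothetical.

```python
from math import erfc, sqrt

def inverse_variance_mean(effects, variances):
    """Weighted average of effect sizes using weights w_i = 1 / v_i.
    Returns (estimate, standard error, two-sided p-value)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    se = sqrt(1.0 / total)            # variance of the weighted mean is 1 / sum(w)
    z = estimate / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under a normal approximation
    return estimate, se, p_value

# Two hypothetical effect sizes; the more precise one receives four times the weight.
estimate, se, p_value = inverse_variance_mean([0.10, 0.30], [0.01, 0.04])
```

With these inputs the weights are 100 and 25, so the pooled estimate sits much closer to 0.10 than to 0.30.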

Results

Equivalence of identification methods

Convergence evidence was collected, using three approaches (i.e., comparisons of the proportions, association, and agreement of the commonly used identification methods) to ascertain the equivalence of the commonly used methods to identify gifted underachievement.

Proportions

As a first step, the proportion of students exhibiting gifted underachievement, as assessed using the 41 different combinations of expected/actual achievement data for each identification method, was determined. Thereafter, the differences in each of these proportions of gifted underachievement, for all possible pairs of the five identification methods, were calculated. A test that is commonly used to assess whether any difference between two sets of dichotomous classifications of the same group of individuals is statistically significant (i.e., McNemar’s test; Agresti, 2013; Roberts, Sheffield, McIntire, & Alexander, 2011; Tang, He, & Tu, 2012) was used to establish whether any of these differences in proportions were statistically significant. The results of the analyses are outlined in Table 3 (e.g., in the first row, the differences in proportions between each pair of identification methods were calculated using OLSAT School Ability Index [SAI] data as the measure of expected achievement, and NAPLAN Literacy results data as the measure of actual achievement). Positive values in the table indicate that the first specified method identified a higher proportion of gifted underachievement cases than the second specified method.
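
McNemar's test uses only the discordant pairs, that is, the students flagged by one method but not the other. The sketch below uses the uncorrected chi-square form of the statistic with one degree of freedom. Whether the study applied a continuity correction or an exact version of the test is not stated, so this choice, the function name, and the toy data are assumptions.

```python
from math import erfc, sqrt

def mcnemar(a, b):
    """McNemar chi-square test for two binary classifications of the same students.
    Returns (statistic, p-value); no continuity correction is applied."""
    n10 = sum(1 for x, y in zip(a, b) if x and not y)  # flagged by a only
    n01 = sum(1 for x, y in zip(a, b) if not x and y)  # flagged by b only
    if n10 + n01 == 0:
        return 0.0, 1.0
    stat = (n10 - n01) ** 2 / (n10 + n01)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical classifications of 50 students by two methods.
method_a = [1] * 12 + [0] * 3 + [1] * 5 + [0] * 30
method_b = [0] * 12 + [1] * 3 + [1] * 5 + [0] * 30

diff = sum(method_a) / 50 - sum(method_b) / 50  # difference in proportions
stat, p_value = mcnemar(method_a, method_b)
```

In this toy example the first method flags 34% of students and the second 16%, and the 12 versus 3 discordant pairs yield a statistically significant difference at the 0.05 level.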

Table 3

Difference in proportions of gifted underachievement and McNemar test results

Data combination (expected–actual achievement)	ABSI/ABSII	ABSI/SDF	ABSI/REG	ABSI/NOM	ABSII/SDF	ABSII/REG	ABSII/NOM	SDF/REG	SDF/NOM	REG/NOM
OLSAT SAI–NAPLAN Lit	‐	‐0.24*	‐0.10*	‐	‐	‐	‐	0.12*	‐	‐
OLSAT SAI–NAPLAN Num	‐	‐0.09*	‐0.11*	‐	‐	‐	‐	‐0.02*	‐	‐
OLSAT VS–NAPLAN Lit	‐	‐0.20*	‐0.10*	‐	‐	‐	‐	0.06*	‐	‐
OLSAT NV–NAPLAN Num	‐	‐0.16*	‐0.10*	‐	‐	‐	‐	0.06*	‐	‐
OLSAT SAI–SC E	‐0.10	‐0.15*	‐0.13*	‐	0.05	0.03	‐	‐0.03	‐	‐
OLSAT SAI–SC M	‐0.13*	‐0.05	‐0.08	‐	‐0.08	‐0.05	‐	0.03	‐	‐
OLSAT VS–SC E	‐0.04	‐0.13	‐0.09	‐	0.09	0.04	‐	0.04	‐	‐
OLSAT NV–SC M	‐0.13*	‐0.13*	‐0.07	‐	0.00	‐0.07	‐	‐0.07	‐	‐
OLSAT SAI–HSC E	0.02	‐0.20*	0.45*	‐	‐0.22*	0.44*	‐	‐0.65*	‐	‐
OLSAT SAI–HSC M	0.04	‐0.25*	0.50*	‐	‐0.29*	0.46*	‐	0.75*	‐	‐
OLSAT VS–HSC E	0.03	‐0.22*	0.39*	‐	‐0.25*	0.36*	‐	0.61*	‐	‐
OLSAT NV–HSC M	0.10	‐0.31*	0.59*	‐	‐0.41*	0.48*	‐	0.90*	‐	‐
NAPLAN Lit–SC E	‐0.05	‐0.13*	‐0.18*	‐	‐0.08	‐0.13*	‐	‐0.05	‐	‐
NAPLAN Num–SC M	‐0.03	‐0.08*	‐0.07*	‐	‐0.05	‐0.04	‐	0.01	‐	‐
NAPLAN Lit–HSC E	0.00	‐0.41*	0.34*	‐	‐0.41*	0.34*	‐	0.76*	‐	‐
NAPLAN Num–HSC M	0.07	‐0.24*	0.56*	‐	‐0.31*	0.49	‐	0.80*	‐	‐
SC E–HSC E	‐0.01	‐0.26*	0.25*	‐	‐0.25*	0.26*	‐	0.51*	‐	‐
SC M–HSC M	‐0.01	‐0.30*	0.47*	‐	‐0.29*	0.48*	‐	0.77*	‐	‐
OLSAT SAI–Junior SA E	‐0.64*	‐0.25*	‐0.01	‐	0.39*	0.63*	‐	0.24*	‐	‐
OLSAT SAI–Junior SA M	‐0.28*	‐0.30*	‐0.08*	‐	‐0.01	0.20*	‐	0.22*	‐	‐
OLSAT VS–Junior SA E	‐0.61*	‐0.20*	‐0.02	‐	0.41*	0.59*	‐	0.17*	‐	‐
OLSAT NV–Junior SA M	‐0.31*	‐0.35*	‐0.04*	‐	‐0.03	0.27*	‐	0.30*	‐	‐
OLSAT SAI–Senior SA E	‐0.49*	‐0.28*	0.17*	‐	0.21*	0.67*	‐	0.45*	‐	‐
OLSAT SAI–Senior SA M	‐0.40*	‐0.35*	0.07*	‐	0.05	0.47*	‐	0.41*	‐	‐
OLSAT VS–Senior SA E	‐0.51*	‐0.24*	0.16*	‐	0.27*	0.67*	‐	0.40*	‐	‐
OLSAT NV–Senior SA M	‐0.43*	‐0.35*	0.08*	‐	0.09	0.51*	‐	0.42*	‐	‐
NAPLAN Lit–Junior SA E	‐0.52*	‐0.09*	‐0.09*	‐	0.43*	0.44*	‐	0.01	‐	‐
NAPLAN Num–Junior SA M	‐0.31*	‐0.44*	‐0.14*	‐	‐0.13*	0.18*	‐	0.30*	‐	‐
NAPLAN Lit–Senior SA E	‐0.52*	‐0.24*	0.08*	‐	0.28*	0.60*	‐	0.32*	‐	‐
NAPLAN Num–Senior SA M	‐0.40*	‐0.41*	0.05*	‐	‐0.02	0.45*	‐	0.47*	‐	‐
Junior SA E–SC E	‐0.01	0.00	‐0.11*	‐	0.01	‐0.09*	‐	‐0.11*	‐	‐
Junior SA M–SC M	‐0.06	‐0.01	‐0.06	‐	0.04	0.00	‐	‐0.04	‐	‐
SC E–Senior SA E	‐0.32*	‐0.20*	0.03	‐	0.12*	0.34*	‐	0.22*	‐	‐
SC M–Senior SA M	‐0.47*	‐0.28*	‐0.02	‐	0.19*	0.46*	‐	0.27*	‐	‐
Junior SA E–HSC E	0.00	‐0.33*	0.22*	‐	‐0.33*	0.22*	‐	0.54*	‐	‐
Junior SA M–HSC M	0.04	‐0.24*	0.47*	‐	‐0.29*	0.43*	‐	0.71*	‐	‐
Senior SA E–HSC E	‐0.02	‐0.28*	‐0.09	‐	‐0.26*	‐0.07	‐	0.20*	‐	‐
Senior SA M–HSC M	‐0.05	‐0.28*	0.15*	‐	‐0.23*	0.21*	‐	0.44*	‐	‐
Junior SA E–Senior SA E	‐0.46*	‐0.10*	0.00	‐	0.36*	0.45*	‐	0.09*	‐	‐
Junior SA M–Senior SA M	‐0.40*	‐0.10*	0.01	‐	0.32*	0.42*	‐	0.09*	‐	‐
OLSAT SAI–SAR	‐0.27*	‐0.26*	0.06*	‐0.21*	‐0.01	0.34*	‐0.02	0.32*	0.03	0.26*
Weighted average	‐0.29*	‐0.23*	0.00	‐0.21*	0.04*	0.34*	‐0.02	0.22*	0.03	0.26*
Weighted average of absolute values	0.29*	0.23*	0.12*	0.21*	0.18*	0.35*	0.02	0.23*	0.03	0.26*

ABSI = Absolute split I; ABSII = Absolute split II; E = English; Junior SA = School Assessment in Grades 7 to 10; Lit = Literacy; M = Mathematics; NOM = Nomination; Num = Numeracy; NV = Non‐Verbal Score; REG = Regression; SAI = School Ability Index; SAR = School Assessment for Relevant Subject; SDF = Simple difference; Senior SA = School Assessment in Grades 11 and 12; VS = Verbal Score.

* p < 0.05.

The observed differences in proportions of gifted underachievement classifications, reflecting the 41 combinations of expected and actual achievement data, varied from −0.65 to 0.90. The McNemar test results indicated that most (73%) of these differences in proportions for each data combination were statistically significant (p < 0.05), and therefore non‐supportive of convergence between the identification methods. The weighted averages of the differences in the proportions of gifted underachievement (range from −0.29 to 0.34), and the weighted averages of the absolute values of these differences in proportions of gifted underachievement (range from 0.02 to 0.35), as calculated using the approach of Borenstein et al. (2009), were mostly statistically significant, and therefore similarly non‐supportive of convergence between the identification methods. According to the weighted average calculations, the exceptions were certain pairs of identification methods for which the difference in the proportions of gifted underachievement cases was not statistically significant (i.e., absolute split II/nomination, simple difference/nomination, and absolute split I/regression). Therefore, some support was provided for the possibility of convergence between these identification methods. Nevertheless, for the two pairs of methods involving the nomination method, the support only appears to be tentative, as it was not derived from analyses relating to multiple expected achievement/actual achievement data combinations.
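As an illustration of the two quantities discussed above, the following minimal Python sketch computes a difference in proportions and an exact two‐sided McNemar p‐value for two identification methods applied to the same students. The function names, the sign convention (first method minus second method), the choice of the exact binomial form of the test, and the toy data are all assumptions made for this example; they are not taken from the study.

```python
from math import comb


def difference_in_proportions(a, b):
    """Proportion flagged by method a minus proportion flagged by method b.

    a and b are equal-length lists of 0/1 flags for the same students.
    The sign convention is an assumption made for this sketch.
    """
    return (sum(a) - sum(b)) / len(a)


def mcnemar_exact_p(a, b):
    """Exact two-sided McNemar p-value, based only on discordant pairs."""
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n = n10 + n01
    if n == 0:
        return 1.0  # the two methods never disagree
    k = min(n10, n01)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical data: 40 students, method_a flags 12, method_b flags 4
method_a = [1] * 12 + [0] * 28
method_b = [1] * 4 + [0] * 36

print(difference_in_proportions(method_a, method_b))
print(mcnemar_exact_p(method_a, method_b))
```

In this toy case every disagreement runs in the same direction (eight students flagged only by the first method), so the difference of 0.20 is statistically significant at the 0.05 level.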

Association

An assessment was made of the level of association between the results obtained from the various methods that identify gifted underachievement, by the calculation of phi coefficients (ϕ) which may be considered to be measures of the strength of the relationship (i.e., association) between two sets of categorical variables (Agresti, 2013; Tang et al., 2012). Phi coefficient values may be interpreted by making a comparison to threshold values. For example, Pett (1997) provides guidelines for weak (0.00–0.29), low (0.30–0.49), moderate (0.50–0.69), strong (0.70–0.89), and very strong (0.90–1.00) levels of association. In comparison, Park, Riddle, and Tekian (2014) suggest that a phi coefficient value of greater than 0.70 may be required to establish convergence. The phi coefficient calculations for the results obtained from all the possible pairings of the identification methods are reported in Table 4. It is noted that only 38 of the 238 measurements of association (i.e., 16%) could be classified as strong or very strong according to Pett’s (1997) criteria (i.e., ϕ > 0.70). A further 59 (25%) were of moderate strength (i.e., 0.50 < ϕ < 0.70), 90 (38%) were of low strength (i.e., 0.30 < ϕ < 0.50), and 51 (21%) were of weak strength (i.e., ϕ < 0.30). If Park et al.’s (2014) criteria are used (ϕ > 0.70), only 16% of the associations appeared to be strong enough to support convergence. The findings were corroborated with the weighted average associations calculated using the approach of Borenstein et al. (2009), all of which were classifiable as being of weak, low, or moderate strength according to Pett’s (1997) guidelines, and none of which met Park et al.’s (2014) criteria for convergence.
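The calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the toy data are assumptions, the labels follow the Pett (1997) bands quoted in the text, and returning None when a method flags nobody (or everybody) is a choice made for this sketch rather than the study's convention.

```python
from math import sqrt


def phi_coefficient(a, b):
    """Phi coefficient for two equal-length lists of 0/1 classifications."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None  # undefined: one method gave every student the same label
    return (n11 * n00 - n10 * n01) / denom


def pett_label(phi):
    """Map a phi value to the strength bands attributed to Pett (1997)."""
    size = abs(phi)
    if size >= 0.90:
        return "very strong"
    if size >= 0.70:
        return "strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "weak"


# Hypothetical data: 40 students, each method flags 10, 8 of them in common
method_a = [1] * 10 + [0] * 30
method_b = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 28

phi = phi_coefficient(method_a, method_b)
print(round(phi, 2), pett_label(phi))
```

With these hypothetical counts the coefficient is about 0.73, which falls in the "strong" band and just clears the 0.70 value attributed to Park et al. (2014).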

Table 4

Phi coefficients of association between identification methods

Data Combination (Expected‐Actual Achievement)	ABSI/ABSII	ABSI/SDF	ABSI/REG	ABSI/NOM	ABSII/SDF	ABSII/REG	ABSII/NOM	SDF/REG	SDF/NOM	REG/NOM
OLSAT SAI–NAPLAN Lit	‐	0.34	0.49	‐	‐	‐	‐	0.70	‐	‐
OLSAT SAI–NAPLAN Num	‐	0.41	0.38	‐	‐	‐	‐	0.92	‐	‐
OLSAT VS–NAPLAN Lit	‐	0.34	0.43	‐	‐	‐	‐	0.79	‐	‐
OLSAT NV–NAPLAN Num	‐	0.36	0.45	‐	‐	‐	‐	0.78	‐	‐
OLSAT SAI–SC E	0.00	0.00	0.00	‐	0.79	0.88	‐	0.90	‐	‐
OLSAT SAI–SC M	0.50	0.69	0.61	‐	0.72	0.82	‐	0.88	‐	‐
OLSAT VS–SC E	0.00	0.00	0.00	‐	0.55	0.69	‐	0.34	‐	‐
OLSAT NV–SC M	0.46	0.46	0.61	‐	0.39	0.76	‐	0.58	‐	‐
OLSAT SAI–HSC E	0.96	0.60	0.35	‐	0.58	0.36	‐	0.21	‐	‐
OLSAT SAI–HSC M	0.92	0.49	0.29	‐	0.45	0.32	‐	0.14	‐	‐
OLSAT VS–HSC E	0.94	0.58	0.42	‐	0.55	0.44	‐	0.24	‐	‐
OLSAT NV–HSC M	0.80	0.26	0.20	‐	0.21	0.25	‐	0.05	‐	‐
NAPLAN Lit–SC E	0.00	0.00	0.00	‐	0.61	0.50	‐	0.82	‐	‐
NAPLAN Num–SC M	0.70	0.48	0.52	‐	0.50	0.54	‐	0.93	‐	‐
NAPLAN Lit–HSC E	1.00	0.37	0.37	‐	0.37	0.37	‐	0.14	‐	‐
NAPLAN Num–HSC M	0.86	0.40	0.26	‐	0.34	0.31	‐	0.10	‐	‐
SC E–HSC E	0.91	0.58	0.54	‐	0.60	0.52	‐	0.31	‐	‐
SC M–HSC M	0.86	0.40	0.32	‐	0.41	0.31	‐	0.13	‐	‐
OLSAT SAI–Junior SA E	0.21	0.48	0.94	‐	0.43	0.22	‐	0.51	‐	‐
OLSAT SAI–Junior SA M	0.37	0.36	0.65	‐	0.64	0.53	‐	0.56	‐	‐
OLSAT VS–Junior SA E	0.21	0.50	0.78	‐	0.37	0.24	‐	0.57	‐	‐
OLSAT NV–Junior SA M	0.39	0.36	0.67	‐	0.51	0.48	‐	0.45	‐	‐
OLSAT SAI–Senior SA E	0.32	0.56	0.62	‐	0.43	0.20	‐	0.35	‐	‐
OLSAT SAI–Senior SA M	0.42	0.47	0.81	‐	0.56	0.34	‐	0.38	‐	‐
OLSAT VS–Senior SA E	0.32	0.61	0.60	‐	0.44	0.19	‐	0.36	‐	‐
OLSAT NV–Senior SA M	0.39	0.47	0.79	‐	0.40	0.31	‐	0.38	‐	‐
NAPLAN Lit–Junior SA E	0.18	0.54	0.55	‐	0.34	0.33	‐	0.92	‐	‐
NAPLAN Num–Junior SA M	0.23	0.17	0.37	‐	0.49	0.49	‐	0.47	‐	‐
NAPLAN Lit–Senior SA E	0.30	0.53	0.68	‐	0.33	0.20	‐	0.36	‐	‐
NAPLAN Num–Senior SA M	0.42	0.40	0.79	‐	0.45	0.35	‐	0.34	‐	‐
Junior SA E–SC E	0.00	0.00	0.00	‐	0.00	0.34	‐	0.00	‐	‐
Junior SA M–SC M	0.00	0.00	0.00	‐	0.49	0.47	‐	0.49	‐	‐
SC E–Senior SA E	0.48	0.61	0.91	‐	0.68	0.43	‐	0.55	‐	‐
SC M–Senior SA M	0.34	0.50	0.77	‐	0.35	0.27	‐	0.53	‐	‐
Junior SA E–HSC E	0.94	0.50	0.45	‐	0.50	0.45	‐	0.22	‐	‐
Junior SA M–HSC M	0.84	0.56	0.27	‐	0.52	0.29	‐	0.15	‐	‐
Senior SA E–HSC E	0.88	0.40	0.67	‐	0.46	0.58	‐	0.60	‐	‐
Senior SA M–HSC M	0.75	0.54	0.38	‐	0.62	0.51	‐	0.31	‐	‐
Junior SA E–Senior SA E	0.19	0.81	0.86	‐	0.40	0.22	‐	0.55	‐	‐
Junior SA M–Senior SA M	0.28	0.65	0.89	‐	0.44	0.25	‐	0.57	‐	‐
OLSAT SAI–SAR	0.56	0.58	0.82	0.18	0.67	0.48	0.33	0.49	0.34	0.13
Weighted average	0.50	0.42	0.59	0.18	0.48	0.39	0.33	0.58	0.34	0.13

ABSI = Absolute split I; ABSII = Absolute split II; E = English; Junior SA = School Assessment in Grades 7 to 10; Lit = Literacy; M = Mathematics; NOM = Nomination; Num = Numeracy; NV = Non‐Verbal Score; REG = Regression; SAI = School Ability Index; SAR = School Assessment for Relevant Subject; SDF = Simple difference; Senior SA = School Assessment in Grades 11 and 12; VS = Verbal Score.

Generally, the association results did not provide support for the convergence of the results of the five methods that identify gifted underachievement. Nevertheless, two pairs of identification methods (i.e., absolute split I/regression and simple difference/regression) which had the highest weighted average phi coefficient values (i.e., 0.59 and 0.58, respectively) may be considered to be approaching convergence. The weakest levels of convergence appeared to exist between the results obtained from the nomination method and the statistical identification methods (i.e., phi coefficient values ranged from 0.13 to 0.34).

Agreement

An assessment was made of the level of agreement between the results obtained from the methods that identify gifted underachievement (i.e., the identification of the same students as exhibiting gifted underachievement) by making use of a statistical tool that is commonly adopted in the health fields to assess agreement between multiple methods that diagnose a health condition (Ewe et al., 2013). The Cohen’s kappa (κ) statistic is the most popular tool used to measure the degree of agreement between classification tasks and has an advantage over other tools in that it accounts for agreement by chance factors (Agresti, 2013; Sim & Wright, 2005). It is noted that multiple scholars have proposed varying guidelines for the interpretation of Cohen’s kappa values (Kundel & Polansky, 2003; Rettew, Lynch, Achenbach, Dumenci, & Ivanova, 2009; Walts et al., 2011). Of these, the guidelines proposed by Walts et al. (2011) were selected, as they appeared to represent a useful, ‘intermediate’ position among the range of existing guidelines. Consequently, a Cohen’s kappa statistic of 0.75 was deemed to be necessary for convergence to be supported between any pair of identification methods. The Cohen’s kappa values for the results obtained from all possible pairings of the identification methods are reported in Table 5. Only 14% of the Cohen’s kappa values were found to be greater than the Walts et al. (2011) threshold for excellent agreement (κ = 0.75), while none of the weighted average Cohen’s kappa values calculated using the approach of Borenstein et al. (2009) were above this threshold. Therefore, strong support could not be found for convergence between the various methods that identify gifted underachievement from an evaluation of the agreement of the results obtained from the various identification methods. 
In particular, the level of agreement between the nomination method and the statistical methods was consistently less than may be expected by chance alone (κ < 0.00). This suggested that the nomination method, consistent with the findings of Lau and Chan (2001), may have the least agreement with the other methods used to identify gifted underachievement.
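To make the chance correction concrete, the following minimal Python sketch computes Cohen's kappa alongside the raw proportion of matching classifications for two methods. The function names and the toy data are assumptions for illustration. Returning None when chance agreement equals 1 is a choice made for this sketch; it covers the situation in which neither method identifies any case, where kappa cannot be calculated.

```python
def percent_agreement(a, b):
    """Raw proportion of students given the same label by both methods."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of 0/1 classifications."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    # Agreement expected by chance if the two methods were independent
    p_chance = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if abs(1 - p_chance) < 1e-12:
        return None  # e.g. neither method flags anyone: kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical data: 100 students, each method flags 5, only 1 in common
method_a = [1] * 5 + [0] * 95
method_b = [1] + [0] * 4 + [1] * 4 + [0] * 91

print(percent_agreement(method_a, method_b))
print(round(cohens_kappa(method_a, method_b), 2))
```

In this toy case the two methods agree on 92% of students, yet kappa is only about 0.16, because almost all of the agreement comes from students whom neither method flags. This is the reason a chance‐corrected statistic is preferred when the flagged group is small.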

Table 5

Cohen’s kappa statistics between identification methods

Data Combination (Expected–Actual Achievement)	ABSI/ABSII	ABSI/SDF	ABSI/REG	ABSI/NOM	ABSII/SDF	ABSII/REG	ABSII/NOM	SDF/REG	SDF/NOM	REG/NOM
OLSAT SAI–NAPLAN Lit	‐	0.21	0.38	‐	‐	‐	‐	0.66	‐	‐
OLSAT SAI–NAPLAN Num	‐	0.29	0.25	‐	‐	‐	‐	0.91	‐	‐
OLSAT VS–NAPLAN Lit	‐	0.20	0.31	‐	‐	‐	‐	0.77	‐	‐
OLSAT NV–NAPLAN Num	‐	0.23	0.33	‐	‐	‐	‐	0.76	‐	‐
OLSAT SAI–SC E	0.00	0.00	0.00	‐	0.77	0.87	‐	0.89	‐	‐
OLSAT SAI–SC M	0.40	0.64	0.54	‐	0.69	0.80	‐	0.87	‐	‐
OLSAT VS–SC E	0.00	0.00	0.00	‐	0.47	0.65	‐	0.33	‐	‐
OLSAT NV–SC M	0.36	0.36	0.54	‐	0.39	0.73	‐	0.56	‐	‐
OLSAT SAI–HSC E	0.96	0.53	0.22	‐	0.50	0.23	‐	0.08	‐	‐
OLSAT SAI–HSC M	0.91	0.38	0.16	‐	0.33	0.19	‐	0.04	‐	‐
OLSAT VS–HSC E	0.94	0.51	0.29	‐	0.46	0.32	‐	0.11	‐	‐
OLSAT NV–HSC M	0.79	0.13	0.08	‐	0.08	0.11	‐	0.01	‐	‐
NAPLAN Lit–SC E	0.00	0.00	0.00	‐	0.54	0.40	‐	0.80	‐	‐
NAPLAN Num–SC M	0.65	0.37	0.42	‐	0.46	0.51	‐	0.93	‐	‐
NAPLAN Lit–HSC E	1.00	0.24	0.24	‐	0.24	0.24	‐	0.04	‐	‐
NAPLAN Num–HSC M	0.85	0.27	0.13	‐	0.21	0.17	‐	0.02	‐	‐
SC E–HSC E	0.91	0.51	0.45	‐	0.53	0.43	‐	0.18	‐	‐
SC M–HSC M	0.86	0.27	0.18	‐	0.28	0.18	‐	0.03	‐	‐
OLSAT SAI–Junior SA E	0.08	0.37	0.94	‐	0.32	0.09	‐	0.41	‐	‐
OLSAT SAI–Junior SA M	0.24	0.23	0.59	‐	0.64	0.46	‐	0.47	‐	‐
OLSAT VS–Junior SA E	0.08	0.40	0.78	‐	0.26	0.11	‐	0.49	‐	‐
OLSAT NV–Junior SA M	0.26	0.23	0.66	‐	0.51	0.37	‐	0.33	‐	‐
OLSAT SAI–Senior SA E	0.19	0.48	0.55	‐	0.37	0.08	‐	0.21	‐	‐
OLSAT SAI–Senior SA M	0.30	0.36	0.79	‐	0.56	0.20	‐	0.25	‐	‐
OLSAT VS–Senior SA E	0.19	0.54	0.52	‐	0.36	0.07	‐	0.23	‐	‐
OLSAT NV–Senior SA M	0.27	0.37	0.77	‐	0.39	0.18	‐	0.25	‐	‐
NAPLAN Lit–Junior SA E	0.07	0.45	0.47	‐	0.21	0.20	‐	0.92	‐	‐
NAPLAN Num–Junior SA M	0.10	0.06	0.24	‐	0.47	0.43	‐	0.36	‐	‐
NAPLAN Lit–Senior SA E	0.16	0.44	0.63	‐	0.28	0.08	‐	0.23	‐	‐
NAPLAN Num–Senior SA M	0.30	0.28	0.78	‐	0.45	0.22	‐	0.20	‐	‐
Junior SA E–SC E	0.00	^N	0.00	‐	0.00	0.20	‐	0.00	‐	‐
Junior SA M–SC M	0.00	0.00	0.00	‐	0.39	0.47	‐	0.39	‐	‐
SC E–Senior SA E	0.37	0.54	0.91	‐	0.66	0.32	‐	0.47	‐	‐
SC M–Senior SA M	0.21	0.40	0.77	‐	0.33	0.17	‐	0.43	‐	‐
Junior SA E–HSC E	0.94	0.40	0.33	‐	0.40	0.33	‐	0.10	‐	‐
Junior SA M–HSC M	0.84	0.48	0.14	‐	0.42	0.16	‐	0.04	‐	‐
Senior SA E–HSC E	0.88	0.28	0.62	‐	0.34	0.56	‐	0.53	‐	‐
Senior SA M–HSC M	0.75	0.46	0.33	‐	0.55	0.41	‐	0.18	‐	‐
Junior SA E–Senior SA E	0.07	0.37	0.85	‐	0.27	0.09	‐	0.46	‐	‐
Junior SA M–Senior SA M	0.15	0.59	0.88	‐	0.32	0.12	‐	0.49	‐	‐
OLSAT SAI–SAR	0.48	0.50	0.81	‐0.16	0.67	0.37	‐0.32	0.36	‐0.10	‐0.34
Weighted average	0.33	0.30	0.46	‐0.16	0.40	0.17	‐0.32	0.11	‐0.10	‐0.34

^N Kappa could not be calculated as no cases of gifted underachievement were identified by either method.

ABSI = Absolute split I; ABSII = Absolute split II; E = English; Junior SA = School Assessment in Grades 7 to 10; Lit = Literacy; M = Mathematics; NOM = Nomination; Num = Numeracy; NV = Non‐Verbal Score; REG = Regression; SAI = School Ability Index; SAR = School Assessment for Relevant Subject; SDF = Simple difference; Senior SA = School Assessment in Grades 11 and 12; VS = Verbal Score.

For comparison with the Cohen’s kappa values, Table 6 outlines the raw percentage agreement values for each possible pairing of the five investigated identification methods. Weighted average calculations indicated that raw percentage agreement ranged from 0.36 to 0.91 for different pairs of identification methods. Nevertheless, as raw percentage agreement values do not account for agreement by chance factors, these findings should not be given substantial weight in assessments of the agreement between identification methods.

Table 6

Percentage Agreement between Identification Methods

Data Combination (Expected–Actual Achievement)	ABSI/ABSII	ABSI/SDF	ABSI/REG	ABSI/NOM	ABSII/SDF	ABSII/REG	ABSII/NOM	SDF/REG	SDF/NOM	REG/NOM
OLSAT SAI–NAPLAN Lit	‐	0.76	0.88	‐	‐	‐	‐	0.88	‐	‐
OLSAT SAI–NAPLAN Num	‐	0.91	0.89	‐	‐	‐	‐	0.98	‐	‐
OLSAT VS–NAPLAN Lit	‐	0.83	0.89	‐	‐	‐	‐	0.94	‐	‐
OLSAT NV–NAPLAN Num	‐	0.84	0.90	‐	‐	‐	‐	0.94	‐	‐
OLSAT SAI–SC E	0.90	0.85	0.87	‐	0.95	0.97	‐	0.97	‐	‐
OLSAT SAI–SC M	0.87	0.95	0.92	‐	0.92	0.95	‐	0.97	‐	‐
OLSAT VS–SC E	0.96	0.87	0.91	‐	0.91	0.96	‐	0.87	‐	‐
OLSAT NV–SC M	0.87	0.87	0.93	‐	0.83	0.93	‐	0.89	‐	‐
OLSAT SAI–HSC E	0.98	0.80	0.55	‐	0.78	0.56	‐	0.35	‐	‐
OLSAT SAI–HSC M	0.96	0.75	0.50	‐	0.71	0.54	‐	0.25	‐	‐
OLSAT VS–HSC E	0.97	0.78	0.61	‐	0.75	0.64	‐	0.39	‐	‐
OLSAT NV–HSC M	0.90	0.69	0.41	‐	0.59	0.52	‐	0.10	‐	‐
NAPLAN Lit–SC E	0.95	0.87	0.82	‐	0.92	0.87	‐	0.95	‐	‐
NAPLAN Num–SC M	0.97	0.92	0.93	‐	0.92	0.93	‐	0.99	‐	‐
NAPLAN Lit–HSC E	1.00	0.59	0.66	‐	0.59	0.66	‐	0.24	‐	‐
NAPLAN Num–HSC M	0.93	0.76	0.44	‐	0.69	0.51	‐	0.20	‐	‐
SC E–HSC E	0.96	0.74	0.75	‐	0.75	0.74	‐	0.49	‐	‐
SC M–HSC M	0.93	0.70	0.53	‐	0.71	0.52	‐	0.23	‐	‐
OLSAT SAI–Junior SA E	0.36	0.75	0.99	‐	0.61	0.37	‐	0.76	‐	‐
OLSAT SAI–Junior SA M	0.72	0.70	0.92	‐	0.84	0.78	‐	0.78	‐	‐
OLSAT VS–Junior SA E	0.39	0.80	0.96	‐	0.57	0.41	‐	0.83	‐	‐
OLSAT NV–Junior SA M	0.69	0.65	0.93	‐	0.76	0.73	‐	0.70	‐	‐
OLSAT SAI–Senior SA E	0.51	0.72	0.83	‐	0.73	0.33	‐	0.55	‐	‐
OLSAT SAI–Senior SA M	0.60	0.65	0.93	‐	0.79	0.53	‐	0.59	‐	‐
OLSAT VS–Senior SA E	0.49	0.76	0.84	‐	0.69	0.33	‐	0.60	‐	‐
OLSAT NV–Senior SA M	0.57	0.65	0.92	‐	0.72	0.49	‐	0.58	‐	‐
NAPLAN Lit–Junior SA E	0.48	0.91	0.91	‐	0.57	0.56	‐	0.98	‐	‐
NAPLAN Num–Junior SA M	0.69	0.56	0.86	‐	0.74	0.78	‐	0.70	‐	‐
NAPLAN Lit–Senior SA E	0.48	0.76	0.92	‐	0.61	0.40	‐	0.68	‐	‐
NAPLAN Num–Senior SA M	0.60	0.59	0.93	‐	0.74	0.55	‐	0.53	‐	‐
Junior SA E–SC E	0.99	1.00	0.89	‐	0.99	0.91	‐	0.89	‐	‐
Junior SA M–SC M	0.94	0.99	0.94	‐	0.96	0.94	‐	0.96	‐	‐
SC E–Senior SA E	0.68	0.80	0.97	‐	0.83	0.66	‐	0.78	‐	‐
SC M–Senior SA M	0.53	0.72	0.93	‐	0.65	0.51	‐	0.73	‐	‐
Junior SA E–HSC E	0.97	0.67	0.78	‐	0.67	0.78	‐	0.46	‐	‐
Junior SA M–HSC M	0.92	0.76	0.53	‐	0.71	0.57	‐	0.29	‐	‐
Senior SA E–HSC E	0.98	0.72	0.91	‐	0.74	0.89	‐	0.80	‐	‐
Senior SA M–HSC M	0.90	0.72	0.79	‐	0.77	0.79	‐	0.56	‐	‐
Junior SA E–Senior SA E	0.54	0.90	0.99	‐	0.64	0.55	‐	0.91	‐	‐
Junior SA M–Senior SA M	0.60	0.92	0.99	‐	0.68	0.58	‐	0.91	‐	‐
OLSAT SAI–SAR	0.73	0.74	0.93	0.48	0.84	0.66	0.36	0.68	0.36	0.51
Weighted average	0.79	0.81	0.91	0.48	0.77	0.69	0.36	0.85	0.36	0.51

ABSI = Absolute split I; ABSII = Absolute split II; E = English; Junior SA = School Assessment in Grades 7 to 10; Lit = Literacy; M = Mathematics; NOM = Nomination; Num = Numeracy; NV = Non‐Verbal Score; REG = Regression; SAI = School Ability Index; SAR = School Assessment for Relevant Subject; SDF = Simple difference; Senior SA = School Assessment in Grades 11 and 12; VS = Verbal Score.

Overall assessment of the convergence evidence

While convergence did appear to exist for some of the identification methods with some specific data combinations, the collected evidence indicated that convergence was generally not supported. Indeed, consistent evidence across the multiple approaches used to assess convergence could not be found for any pair of identification methods.

An optimal method

After the collection of convergence evidence to assess whether the commonly used methods to identify gifted underachievement may be considered equivalent, criterion evidence was collected to make a determination on an optimal method for the identification of gifted underachievement. As a first step, latent class analysis was undertaken with the entire data set using the poLCA statistical package within the R software environment (Linzer & Lewis, 2011), to create a single latent class model that provided criterion value estimates (i.e., estimates of the most likely ‘true’ classification of each student as exhibiting gifted achievement or gifted underachievement; Collins & Huynh, 2014). Thereafter, the convergence of the identification results obtained from the various identification methods with the latent class model (the criterion value estimates) was assessed to determine the accuracy of the various identification methods. The three approaches used previously to assess convergence were again used. Table 7 outlines the results, including the difference in the proportions of gifted underachievement (δ), phi coefficient values (ϕ) that indicate the level of association, and Cohen’s kappa values (κ) that indicate the level of agreement.
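The idea of deriving criterion value estimates from a latent class model can be illustrated with a small, self‐contained Python sketch. It is not the poLCA routine used in the study: it is a plain expectation‐maximisation fit of a two‐class model with binary indicators, written for illustration only. The function name, the starting values, the simulated detection rates, and the 0.5 cut‐off on the posterior probability are all assumptions.

```python
import math
import random


def fit_two_class_lca(rows, n_iter=100):
    """EM fit of a 2-class latent class model with binary indicators.

    rows: one list of 0/1 flags per student, one column per method.
    Returns (class_prior, item_prob, posterior), where item_prob[c][j]
    is P(method j flags the student | class c).
    """
    m = len(rows[0])
    prior = [0.5, 0.5]
    # Asymmetric start so that class 1 becomes the frequently flagged class
    item_prob = [[0.3] * m, [0.7] * m]
    post = []
    for _ in range(n_iter):
        # E-step: posterior class membership for every student
        post = []
        for row in rows:
            joint = []
            for c in (0, 1):
                log_p = math.log(prior[c])
                for j, x in enumerate(row):
                    p = item_prob[c][j]
                    log_p += math.log(p if x else 1 - p)
                joint.append(math.exp(log_p))
            total = joint[0] + joint[1]
            post.append([joint[0] / total, joint[1] / total])
        # M-step: update class sizes and per-method flag rates
        for c in (0, 1):
            weight = sum(p[c] for p in post)
            prior[c] = weight / len(rows)
            for j in range(m):
                est = sum(p[c] * row[j] for p, row in zip(post, rows)) / weight
                item_prob[c][j] = min(max(est, 1e-6), 1 - 1e-6)
    return prior, item_prob, post


# Simulated data (hypothetical rates): 600 students, five methods
random.seed(1)
hit_rate = [0.90, 0.85, 0.80, 0.75, 0.60]    # P(flag | underachiever)
false_rate = [0.05, 0.05, 0.10, 0.10, 0.15]  # P(flag | not underachiever)
truth = [random.random() < 0.2 for _ in range(600)]
rows = [[int(random.random() < (h if t else f))
         for h, f in zip(hit_rate, false_rate)] for t in truth]

prior, item_prob, post = fit_two_class_lca(rows)
criterion = [p[1] > 0.5 for p in post]  # most likely class per student
accuracy = sum(c == t for c, t in zip(criterion, truth)) / len(truth)
print(round(accuracy, 2))
```

The list named criterion plays the role of the criterion value estimates: each method's classifications can then be compared with it using the same difference in proportions, phi, and kappa calculations used for the convergence evidence.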

Table 7

Accuracy results using a latent class model

Latent class model	Method	Accuracy
Latent class model	Method	δ	ϕ	κ
All data	SDF	0.08*	0.86	0.85
	REG	‐0.23*	0.59	0.52
	ABSI	‐0.23*	0.61	0.54
	ABSII	0.12*	0.70	0.68
	NOM	‐0.31*	0.13	0.10
Nomination removed	SDF	0.08*	0.86	0.85
	REG	‐0.23*	0.59	0.52
	ABSI	‐0.23*	0.61	0.54
	ABSII	0.12*	0.70	0.68

ABSI = Absolute split I; ABSII = Absolute split II; NOM = Nomination; REG = Regression; SDF = Simple difference; δ = difference in proportions; κ = kappa agreement coefficient; ϕ = phi association coefficient.

* p < 0.05 from McNemar’s test.

The initial results demonstrated that the classifications of the simple difference method had the highest degree of convergence with the criterion value estimates. Specifically, the simple difference method produced: (a) the smallest difference in identified proportions of gifted underachievement with the criterion values, (b) association levels with the criterion value estimates that were well above the required threshold for convergence (ϕ > 0.70; Park et al., 2014; Pett, 1997), and (c) a Cohen’s kappa value that may be classifiable as being in almost perfect agreement with the criterion value estimates (Kundel & Polansky, 2003). While the McNemar test did indicate that the difference in the proportions of gifted underachievement identified between the simple difference method and the criterion value estimates was statistically significant, this may not be of practical significance (Wasserstein & Lazar, 2016). Among the other identification methods, the classifications of the absolute split II method showed the greatest convergence with the criterion value estimates, with a strong association equal to Park et al.’s (2014) threshold, and an agreement measure that was only 0.07 below the Walts et al. (2011) threshold. The classifications of the absolute split I and regression methods showed more moderate levels of convergence. The least convergent with the criterion value estimates were the classifications of the nomination method. Not only did the nomination method produce the lowest Cohen’s kappa and phi coefficient values, it also identified 31% fewer students as exhibiting gifted underachievement than the latent class model (i.e., the largest difference among all identification methods). Therefore, the initial latent class model provided the least support for the validity of the use of the nomination method.

In recognition of the possibility that the inclusion of any identification method in the latent class analysis may influence the latent class model, the analyses were repeated without the nomination method. Table 7 outlines the results of the refined analyses, which were identical to the initial latent class model, and verified that the classifications of the simple difference method had the highest level of convergence with the criterion value estimates.

Overall assessment of criterion evidence

The criterion results supported the validity of the use of only one of the investigated identification methods to identify gifted underachievement (i.e., the simple difference method). Differing levels of support were found for the validity of the use of the other methods. As such, the findings allowed for a ranking of the confidence in the validity of the use of the commonly used methods to identify gifted underachievement (i.e., from the most valid to the least valid: the simple difference, the absolute split II, the absolute split I, the regression, and the nomination methods).

Discussion

The empirical evidence gathered in this investigation suggested that the commonly used methods to identify gifted underachievement (i.e., the absolute split I, absolute split II, nomination, regression, and simple difference methods) may not be considered convergent, and therefore should not be used interchangeably. Indeed, the various identification methods were demonstrated in this study to have differing levels of validity. The strongest evidence for validity was found for the simple difference method, suggesting that this method may be the optimal method to identify gifted underachievement. The level of empirical support for the simple difference method and the other identification methods should be considered alongside the possible theoretical issues associated with each method. First of all, it is noted that both absolute split methods rely on a fixed expected achievement threshold (i.e., rank or score), which may mean that all gifted students will be treated under these methods as having identical capabilities, possibly leading to the systematic non‐identification of underachievement among the most highly gifted students (Reis & McCoach, 2000; Tze, Daniels, & Klassen, 2016). In comparison, concerns have been raised about the regression method as it may be ‘logically inconsistent with the concept of underachievement’ (Van den Broeck, 2002, p. 197). Unlike the other identification methods which compare a student’s expected achievement and actual achievement, the regression method compares a student’s actual achievement to patterns of achievement of other students in a cohort of interest. Finally, scholars have suggested that despite the knowledge of teachers, they appear largely unable to correctly identify cases of gifted underachievement (Dunne & Gazeley, 2008; Jones & Myhill, 2004; Lau & Chan, 2001), raising questions about the validity of the use of the nomination method. The simple difference method is free of all of these concerns. Furthermore, it has the advantage of providing a measurement of the degree of underachievement of gifted students, which is not possible with either of the absolute split methods or the nomination method, and which may represent valuable additional information to inform decision‐making on the most appropriate educational and related provisions for the identified students.

Implications for research and practice

This study may have a number of implications for consideration by researchers. First, as the commonly used methods that identify gifted underachievement have not been established to be equivalent, researchers will need to be consistent in their selection of identification methods, recognize the differences between these methods (and the meaninglessness of making simple comparisons between these methods), and refrain from aggregating the results obtained from these methods. Of the available methods, the simple difference method has been identified to be optimal. The wide adoption of this method may help to resolve one of the longest standing issues in the field of gifted education (Desmet et al., 2020; Dowdall & Colangelo, 1982; Farquhar & Payne, 1964; Reis & McCoach, 2000; Ridgley et al., 2020), and therefore to improve the comparability and validity of studies on gifted underachievement. With respect to practice, the findings of the study provide clear guidelines on how specifically underachievement should be identified in gifted students. Of note, the study suggested that a reliance on teacher nominations may not be particularly valid or tenable. Instead, a systematic approach that utilizes the simple difference method, and involves the assessment of all gifted students with the increasingly sophisticated software packages that are now available in schools to manage student data, may be useful. The resulting information may serve as useful triggers for further investigation, and the provision of appropriate educational and related interventions for the affected students. It is possible that, while this study was designed to investigate the identification of gifted underachievement, its findings may also have application to the general student population. Indeed, a significant discrepancy between expected and actual achievement, or underachievement, is likely to be a major problem that could affect any student (Smith, 2010; Veas, Gilar, Castejon, & Minano, 2016). 
Nevertheless, further research may be needed to determine whether any modifications to the identification process recommended in this study may be necessary for students who are not identified as gifted. The findings of the study may also have application to the proposal by some scholars for a paradigm shift in the field of gifted education, whereby giftedness should not be considered to be a construct of an individual student, but rather a construct of the environment that students are in (Dai & Renzulli, 2008; Harder, Vialle, & Ziegler, 2014; Ziegler, 2005; Ziegler & Phillipson, 2012). Of note, Funk‐Werblo (2003) has proposed that underachievement could be assessed at the level of the environment (e.g., schools). Such an approach to the evaluation of the educational ‘health’ of schools, or the effectiveness of teachers, may have significant advantages over some of the contemporary approaches that are used (e.g., standardized achievement test results), which may be subject to manipulation (e.g., the exclusive enrolment of high achievers), are likely impacted by socio‐economic factors (Smith, 2010), and may systematically ignore the underachievement of gifted students (who may nevertheless achieve highly; Kanevsky & Keighley, 2003).

Areas for further investigation

A number of areas exist for further investigation. First, of the various types of nominations, only teacher nominations were investigated in this study. Therefore, a more thorough examination of the nomination method, which includes data from different nominating parties (e.g., parents, peers, and students), and a greater total number of nominations, may allow for a more complete understanding of the validity of the use of this method. Second, one area that was not investigated in this study was the discrimination of underachievement from normal fluctuations in achievement. While the commonly adopted threshold of one standard deviation between expected and actual achievement levels to deem underachievement under the simple difference method does appear to be widely accepted, there is no strong argument for the optimality of this somewhat arbitrary value. Finally, the empirical evidence in this study was drawn from gifted students attending a single school. Consequently, replication studies using samples from other schools, school sectors, geographical locations, and countries may be desirable.
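The one standard deviation threshold mentioned above can be made concrete with a minimal Python sketch of a simple difference calculation. The details are assumptions for illustration: both measures are converted to z‐scores within the same hypothetical cohort using the population standard deviation, the gap is taken as expected minus actual, and the function names and scores are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise scores within the cohort (assumed approach for this sketch)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def simple_difference(expected, actual, threshold=1.0):
    """Return (gaps, flags): standardised expected-minus-actual gap per student,
    and whether each gap reaches the threshold (one standard deviation by default)."""
    gaps = [e - a for e, a in zip(z_scores(expected), z_scores(actual))]
    flags = [g >= threshold for g in gaps]
    return gaps, flags


# Hypothetical cohort of five students
expected_scores = [130, 125, 135, 128, 132]  # e.g. an ability measure
actual_scores = [85, 88, 70, 86, 90]         # e.g. an achievement measure

gaps, flags = simple_difference(expected_scores, actual_scores)
for gap, flag in zip(gaps, flags):
    print(round(gap, 2), flag)
```

Only the third student is flagged in this toy cohort. Because the gap is a continuous value, the same calculation also gives the degree of underachievement, and the threshold argument can be varied to explore how sensitive the classifications are to the choice of cut‐off.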

Conflicts of interest

All authors declare no conflict of interest.

Author contribution

Rahmi Luke Jackson: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Writing – original draft; Writing – review & editing. Jae Yup Jung: Supervision; Writing – review & editing.

19 in total

1. The nature and correlates of underachievement among elementary schoolchildren in Hong Kong.

Authors: R B McCall; S R Beach; S Lau
Journal: Child Dev Date: 2000 May-Jun

Review 2. The kappa statistic in reliability studies: use, interpretation, and sample size requirements.

Authors: Julius Sim; Chris C Wright
Journal: Phys Ther Date: 2005-03

3. Meta-analysis of genome-wide association studies with overlapping subjects.

Authors: Dan-Yu Lin; Patrick F Sullivan
Journal: Am J Hum Genet Date: 2009-12 Impact factor: 11.025

4. Urine screening for Chlamydia trachomatis during pregnancy.

Authors: Scott W Roberts; Jeanne S Sheffield; Don D McIntire; James M Alexander
Journal: Obstet Gynecol Date: 2011-04 Impact factor: 7.661

5. Novel selective medium for isolation of Staphylococcus lugdunensis from wound specimens.

Authors: Pak-Leung Ho; Sammy Man-Him Leung; Herman Tse; Kin-Hung Chow; Vincent Chi-Chung Cheng; Tak-Lun Que
Journal: J Clin Microbiol Date: 2014-04-23 Impact factor: 5.948

6. Accuracy of three-dimensional versus two-dimensional echocardiography for quantification of aortic regurgitation and validation by three-dimensional three-directional velocity-encoded magnetic resonance imaging.

Authors: See Hooi Ewe; Victoria Delgado; Rob van der Geest; Jos J M Westenberg; Marlieke L A Haeck; Tomasz G Witkowski; Dominique Auger; Nina Ajmone Marsan; Eduard R Holman; Albert de Roos; Martin J Schalij; Jeroen J Bax; Allard Sieders; Hans-Marc J Siebelink
Journal: Am J Cardiol Date: 2013-05-15 Impact factor: 2.778

Review 7. State learning disability eligibility criteria: A comprehensive review.

Authors: Kathrin E Maki; Randy G Floyd; Triche Roberson
Journal: Sch Psychol Q Date: 2015-01-12

8. A simplified Bethesda System for reporting thyroid cytopathology using only four categories improves intra- and inter-observer diagnostic agreement and provides non-overlapping estimates of malignancy risks.

Authors: Ann E Walts; Shikha Bose; Xuemo Fan; David Frishberg; Karen Scharre; Mariza de Peralta-Venturina; Jing Zhai; Alberto M Marchevsky
Journal: Diagn Cytopathol Date: 2011-05-06 Impact factor: 1.582

9. Meta-analyses of agreement between diagnoses made from clinical evaluations and standardized diagnostic interviews.

Authors: David C Rettew; Alicia Doyle Lynch; Thomas M Achenbach; Levent Dumenci; Masha Y Ivanova
Journal: Int J Methods Psychiatr Res Date: 2009-09 Impact factor: 4.035

Review 10. Statistical methods used to test for agreement of medical instruments measuring continuous variables in method comparison studies: a systematic review.

Authors: Rafdzah Zaki; Awang Bulgiba; Roshidi Ismail; Noor Azina Ismail
Journal: PLoS One Date: 2012-05-25 Impact factor: 3.240