Heather Wild1, Aki-Juhani Kyröläinen2, Victor Kuperman1. 1. Department of Linguistics and Languages, McMaster University, Hamilton, Ontario, Canada. 2. School of Languages and Translation, University of Turku, Turku, Varsinais-Suomi, Finland.
Abstract
Psychological research, including research into adult reading, is frequently based on convenience samples of undergraduate students. This practice raises concerns about the external validity of many accepted findings. The present study seeks to determine how strong this student sampling bias is in literacy and numeracy research. We use nationally representative cross-national data from the Programme for the International Assessment of Adult Competencies to quantify skill differences between (i) students and the general population aged 16–65, and (ii) students and age-matched non-students aged 16–25. The median effect size for comparison (i) of literacy scores across 32 countries was d = .56, and for comparison (ii) d = .55, both of which exceed the average effect size in psychological experiments (d = .40). Numeracy comparisons (i) and (ii) showed similarly strong differences. The observed differences indicate that undergraduate students are representative neither of the general population nor of age-matched non-students.
Over the past two decades, growing concerns have been raised about psychological research's overreliance on convenience samples of undergraduate students. Arnett (2008) [1] found that up to 80% of samples in APA-published studies consisted of undergraduate psychology students. A decade later, Rad et al. (2018) [2] reported that although the trend was decreasing, many studies continued to rely on students. Relying heavily on student samples is an extension of the well-known bias of drawing samples from Western, Educated, Industrialized, Rich and Democratic (WEIRD) societies [3]. Not only are student samples frequently drawn from WEIRD countries [1, 2], but they are even WEIRDer within their countries, given that students tend to come from higher socio-economic backgrounds, tend to be between 18 and 24 years of age, and are by definition highly educated. As such, the undergraduate sampling bias compromises one of the core goals of psychological research: external validity.

We are not the first to raise concerns about external validity and the undergraduate sampling bias (see above as well as [4]). Rather, we seek to strengthen the literature by quantifying just how well students represent the general population of their countries. Several previous studies have found students to be unrepresentative in the field of cognitive psychology. Snowberg and Yariv (2021) [5] found that American undergraduates exhibited greater cognitive skill and strategic sophistication than a representative sample of the United States. Similarly, Brañas-Garza et al.'s (2019) [6] meta-study found that students score significantly higher than non-students on the Cognitive Reflection Test (CRT), a measure used to assess decision-making processes.
Performance on the CRT is also highly correlated with other cognitive measures such as the Wonderlic Personnel Test (WPT), which measures general cognitive ability, and standardized college admissions tests such as the American College Testing (ACT) and Scholastic Aptitude Test (SAT), which measure academic achievement [7]. These findings suggest that relying on undergraduate samples is equally challenging to the generalizability of educational outcomes such as literacy and numeracy, the focus of this study.

As mentioned above, undergraduate samples challenge the generalizability of literacy and numeracy research not simply because students are highly educated, but also because they represent a narrow age range. A number of studies indicate that age is a significant predictor of cognitive skills, including literacy and numeracy. For instance, Kirasic et al. (1996) [8] showed that middle-aged and older adults performed worse than young adults on information processing, working memory, and declarative learning tasks, many of which tap into the component skills of literacy and numeracy. Older adults likewise perform worse on direct measures of numeracy skills than younger adults [9–11]. Similarly, Green and Riddell (1998) [12] and Kyröläinen and Kuperman (2021) [13] report a negative correlation between age and performance on literacy assessments in adults aged 26–65. Therefore, samples of undergraduate students, who tend to be young adults in peak cognitive condition, are unlikely to be representative of the cognitive behaviours of the general population.

The current study seeks to quantify just how accurately undergraduate students represent the general population in terms of two complex cognitive skills, namely literacy and numeracy (defined below). There are at least three reasons to single out literacy and numeracy from other cognitive and social phenomena. First, research on these topics is biased towards studying student populations.
University students are overrepresented as a source of empirical data in reading research, particularly when it comes to lexical mega-studies and eye-movement corpora. The English Lexicon Project, British Lexicon Project, and Dutch Lexicon Project are large-scale collections of lexical decision and naming times for thousands of words in their respective languages and have been used to develop several theories of word processing [14–16]. Similarly, the Ghent Eye-tracking Corpus (GECO) and Multilingual Eye-tracking Corpus (MECO), which recorded eye-movement data while participants read longer texts, have been used to inform theories of reading behaviour and eye-movement control [17, 18]. Each of these valuable and well-cited datasets collected its data primarily or exclusively from university students. How well students represent the general population in terms of complex skills such as literacy and numeracy may be an indicator of how representative these samples are in terms of component skills such as reading behaviour, numeric reasoning, working memory, and cognitive control.

The second reason for singling out literacy and numeracy is their societal importance. In this technological era, these advanced cognitive skills are critical for individual employability, life satisfaction, and health, and for the societal and economic prosperity of nations [19, 20]. Finally, literacy and numeracy are skills that students are actively trained on and selected for (e.g. [21]), whereas, on a daily basis, non-students employ these skills to more varied degrees. Over the course of their secondary schooling, individuals typically need to succeed in a series of examinations that precisely target literacy and numeracy in order to be admitted to post-secondary education. Simultaneously, an individual's perception of their literacy and numeracy levels informs their decision on whether to pursue post-secondary education [21].
This selectivity favors more literate and numerate individuals to become undergraduate students in the first place. In addition, post-secondary education further boosts students' literacy and numeracy by providing intense practice and high stakes for meeting institutional demands on these skills [22, 23]. Against this background, the question is hardly whether university students differ from the broader population of language speakers. Instead, we ask just how different they are.

The present study answers this question by reporting an analysis of literacy and numeracy skills based on comparative data from 24 languages and 32 countries across 5 continents. To our knowledge, this is the first large-scale analysis that quantifies the degree to which undergraduate students represent the general population regarding literacy and numeracy. Given that undergraduate students are the population most frequently sampled in psycholinguistics, we seek to determine how different students are from (i) the general population of adults and (ii) the age-matched non-student population, in terms of literacy and numeracy skills within and across countries.

We use the Programme for the International Assessment of Adult Competencies (PIAAC) [24], an international survey assessing literacy, numeracy, and problem-solving skills in the adult population. PIAAC defines literacy as "understanding, evaluating, using and engaging with written texts to participate in society, to achieve one's goals, and to develop one's knowledge and potential" [25]. The PIAAC definition of numeracy is "the ability to access, use, interpret and communicate mathematical information and ideas, in order to engage in and manage the mathematical demands of a range of situations in adult life". The assessment measures reading for a purpose (i.e. to gather knowledge, evaluate the text, form an opinion, etc.) [26]; see Methods for an example.
This draws on information processing and working memory skills in addition to basic reading skills such as phonological decoding and vocabulary knowledge. Literacy and numeracy tasks in PIAAC clearly require combining and coordinating multiple cognitive processes and component skills. PIAAC only provides scores for the most inclusive and complex literacy and numeracy tasks rather than for the individual component skills (except for a small subset of mainly low-literacy participants [27]). Yet group differences in the participants' performance on these complex tasks enable speculation and hypothesis-building with respect to the expected differences in at least some of the required component skills.

One beneficial feature of the PIAAC data is that each participating country was required to produce a probability-based sample (with a minimum size of N = 5000) representative of the country's population of adults aged 16 to 65. Another advantage of the PIAAC data is that the literacy and numeracy scores are psychometrically validated and directly comparable across countries and languages of administration. The result is rich data from 24 languages (including Arabic, Hebrew, Japanese, Kazakh, and Korean), adding valuable insights beyond the over-researched realm of alphabetic Indo-European languages.
Methods
Programme for the International Assessment of Adult Competencies
We use the publicly available PIAAC data to estimate effect sizes for comparisons of literacy and numeracy skills between (i) university students and their respective country's adult population (16–65 years old), and (ii) students and non-students in the same age cohort. The more specific comparison (ii) pits undergraduate students against their own age group (16–25 years old) and thus estimates the critical difference while largely subtracting the effect of aging and the cohort effect, which are known to be pivotal in the distribution of cognitive skills in society [13, 28, 29].

We focus on two cognitive skills: literacy and numeracy. Both skills are assessed in PIAAC through tests that simulate the demands of work, social and everyday life on multiple skill facets [30, 31]. For instance, participants may read a list of preschool rules and be asked for the latest time at which children should arrive. In the case of literacy, the test items engage all levels of reading comprehension (including decoding, knowledge of vocabulary, the ability to process information at the word, sentence and discourse level, reading fluency, and inferential skills) as well as the ability to read digital texts (using hyperlinks and navigation). For sample items see http://www.oecd.org/skills/piaac/Literacy%20Sample%20Items.pdf.

The publicly available files with PIAAC data from 35 countries were retrieved from https://www.oecd.org/skills/piaac/data/. We used the files from the first cycle of data collection, which took place in 2011–2012 (round 1), 2014–2015 (round 2), and 2017 (round 3). The [redacted] Research Ethics Board deems this use of secondary data exempt from ethics clearance requirements. Three national samples out of the total set of 35 participating countries were removed from the analysis either because they did not contain variables critical for our analyses (Denmark, Russian Federation) or because they had a sample of fewer than 1000 participants after the trimming described below (Singapore).
The following data-processing and trimming steps were applied to the remaining 32 datasets. First, we only considered individuals who were born in the country of test administration and were native speakers of the language in which they took the test. This restriction enabled us to filter out the effects of immigration and second language acquisition on the distribution of cognitive skills in a national sample (see [32]). Individuals with missing values for education and occupational status were removed as well.

The resulting national samples and respective weights (see below) were used to estimate literacy and numeracy skills in different population segments of the respective countries. One such segment, labeled Student, included individuals between 16 and 25 years of age who (a) were studying in a formal education setting or working and studying simultaneously, and (b) had completed upper secondary education, a bachelor's degree, or a master's degree at the time of data collection. Another sample, labeled Young, incorporated all individuals in the 16–25 age range who were not part of the Student sample. The final and most inclusive sample, labeled General, consisted of all participants from the trimmed sample of a given country except those in the Student sample. That is, the General sample included the Young sample, but not the Student sample. Naturally, many of the participants in the General sample are former students, which may attenuate the differences between the General and Student samples. Since neither the Young nor the General sample overlapped with the Student sample, we conducted pairwise comparisons between independent samples. Sizes of all samples are reported in Table 1 for each country.
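The three-way partition described above can be sketched as follows. This is an illustrative sketch only: the field names (`age`, `in_education`, `education`) and the example records are hypothetical stand-ins for the actual PIAAC variables documented in the PIAAC codebooks.

```python
# Illustrative sketch of the Student / Young / General partition described above.
# Field names and records are hypothetical stand-ins for the PIAAC variables.

COMPLETED = {"upper_secondary", "bachelor", "master"}

def partition(respondents):
    groups = {"Student": [], "Young": [], "General": []}
    for r in respondents:
        is_student = (
            16 <= r["age"] <= 25
            and r["in_education"]            # studying, or working and studying
            and r["education"] in COMPLETED  # highest completed qualification
        )
        if is_student:
            groups["Student"].append(r)
        else:
            groups["General"].append(r)      # General = everyone except students
            if 16 <= r["age"] <= 25:
                groups["Young"].append(r)    # Young is a subset of General
    return groups
```

Note that, as in the sample definitions above, the Young sample is contained in the General sample, while neither overlaps with the Student sample.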
Table 1
Descriptive statistics of literacy for the General, Young, and Student national samples, and Cohen’s d estimates of differences between Student and General samples, and Student and Young samples, presented by country.
| Ctry | N Total | General N | General M (SD) | Student N | Student M (SD) | Young N | Young M (SD) | d_S_G | d_S_Y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUT | 5130 | 4136 | 273.47 (39.97) | 111 | 301.90 (33.25) | 650 | 279.53 (38.93) | 0.71 | 0.59 |
| BEL | 5463 | 4140 | 277.29 (44.66) | 272 | 301.37 (32.34) | 551 | 281.04 (40.71) | 0.55 | 0.53 |
| CAN | 26683 | 18479 | 278.61 (47.12) | 870 | 295.71 (38.44) | 2653 | 270.88 (45.00) | 0.37 | 0.57 |
| CHL | 5212 | 4680 | 217.33 (52.25) | 316 | 259.64 (41.72) | 696 | 227.43 (47.05) | 0.82 | 0.71 |
| CYP | 5053 | 3687 | 269.95 (39.85) | 207 | 272.76 (33.72) | 395 | 266.72 (37.57) | 0.07 | 0.17 |
| CZE | 6102 | 5398 | 272.96 (40.36) | 453 | 297.03 (35.54) | 986 | 273.01 (38.74) | 0.60 | 0.64 |
| DEU | 5465 | 4395 | 273.77 (45.22) | 228 | 308.00 (33.16) | 696 | 276.87 (42.42) | 0.77 | 0.77 |
| ECU | 5702 | 5166 | 194.96 (50.06) | 301 | 222.69 (42.21) | 1064 | 201.07 (48.00) | 0.56 | 0.46 |
| ESP | 6055 | 4770 | 253.61 (47.26) | 245 | 288.37 (32.06) | 609 | 256.62 (39.39) | 0.75 | 0.85 |
| EST | 7632 | 6124 | 277.36 (43.49) | 379 | 306.77 (34.18) | 919 | 279.61 (40.06) | 0.68 | 0.71 |
| FIN | 5464 | 4899 | 289.59 (46.84) | 234 | 319.76 (34.92) | 611 | 291.90 (38.88) | 0.65 | 0.74 |
| FRA | 6993 | 5620 | 265.61 (45.81) | 357 | 291.21 (38.28) | 672 | 269.10 (42.11) | 0.56 | 0.54 |
| GBR | 8892 | 7361 | 275.39 (47.00) | 379 | 281.94 (39.12) | 708 | 260.60 (44.52) | 0.14 | 0.50 |
| GRC | 4925 | 4198 | 253.57 (46.28) | 273 | 273.74 (41.40) | 323 | 252.68 (45.69) | 0.44 | 0.48 |
| HUN | 6149 | 5665 | 262.73 (45.31) | 207 | 295.28 (31.97) | 648 | 260.86 (41.36) | 0.72 | 0.88 |
| IRL | 5983 | 4543 | 266.58 (46.41) | 181 | 285.49 (35.45) | 418 | 266.31 (40.12) | 0.41 | 0.49 |
| ISR | 5538 | 3706 | 259.23 (54.73) | 217 | 278.85 (40.45) | 897 | 258.92 (48.34) | 0.36 | 0.42 |
| ITA | 4621 | 3909 | 252.19 (43.34) | 153 | 277.16 (37.01) | 322 | 261.65 (39.83) | 0.58 | 0.40 |
| JPN | 5278 | 4906 | 295.83 (39.83) | 231 | 309.44 (31.17) | 528 | 295.87 (35.82) | 0.34 | 0.39 |
| KAZ | 6050 | 3010 | 250.03 (39.04) | 130 | 259.31 (39.57) | 397 | 246.31 (39.91) | 0.24 | 0.33 |
| KOR | 6667 | 6126 | 271.29 (41.00) | 386 | 301.51 (29.80) | 656 | 289.24 (31.41) | 0.75 | 0.40 |
| LTU | 5093 | 4383 | 266.00 (40.74) | 195 | 287.90 (34.20) | 455 | 274.26 (38.34) | 0.54 | 0.37 |
| MEX | 6306 | 5847 | 221.70 (46.40) | 248 | 257.65 (36.65) | 1137 | 228.82 (40.72) | 0.78 | 0.72 |
| NLD | 5170 | 4277 | 288.10 (44.06) | 300 | 315.22 (34.61) | 520 | 287.83 (36.62) | 0.62 | 0.76 |
| NOR | 5128 | 4052 | 283.16 (41.80) | 199 | 300.60 (36.69) | 661 | 273.53 (38.13) | 0.42 | 0.72 |
| NZL | 6177 | 4081 | 283.32 (45.40) | 258 | 294.19 (38.45) | 658 | 274.16 (42.66) | 0.24 | 0.48 |
| PER | 7289 | 6173 | 194.65 (50.68) | 535 | 234.54 (43.63) | 865 | 201.72 (47.09) | 0.80 | 0.72 |
| POL | 9366 | 7263 | 265.17 (47.92) | 1962 | 294.65 (37.71) | 2446 | 274.50 (41.78) | 0.64 | 0.50 |
| SVK | 5723 | 4909 | 274.13 (39.39) | 333 | 291.56 (34.55) | 746 | 270.08 (39.45) | 0.45 | 0.57 |
| SVN | 5331 | 4316 | 258.04 (46.94) | 365 | 287.74 (32.43) | 427 | 264.89 (41.23) | 0.65 | 0.61 |
| SWE | 4469 | 3496 | 288.24 (41.52) | 130 | 307.60 (35.20) | 581 | 287.28 (36.51) | 0.47 | 0.56 |
| USA | 7921 | 5957 | 277.07 (46.36) | 468 | 292.12 (39.56) | 1229 | 270.82 (42.34) | 0.33 | 0.51 |
M stands for mean, SD for standard deviation, N for sample size, and d for Cohen's d (d_S_G compares the Student and General samples; d_S_Y the Student and Young samples). Ctry stands for country.
Statistical considerations
Large-scale international assessments such as PIAAC aim to test a broad range of test constructs while minimizing the response burden on the individual. As such, each participant in PIAAC only responded to a subset of test items, and a set of plausible values was derived to estimate the individual's overall proficiency, including on the items they did not respond to [33]. The matrix sampling method of PIAAC means that the sets of items that each participant encounters and responds to are not identical. To enable an accurate estimation of the measurement error, an individual score in each cognitive skill test is represented as 10 plausible estimates of what that person's performance would be. Each plausible value is defined on the test scale from 0 to 500 points. When estimating a participant's performance in, say, a literacy or numeracy task, plausible values are sampled through a bootstrapping procedure to produce both a point-wise estimate and an estimate of the variability incurred by the non-identical test items that each participant encounters.

Moreover, each participant in the PIAAC survey is associated with a weight, allowing the tested person to stand for a larger segment of the population. The weights are based on census data and determined by the combination of the participant's age, gender, education, place of residence, and additional factors (for details see [34]). Specifically, the PIAAC data use Jackknife Repeated Replication weights that correct for the complex designs of the samples, which vary from country to country [34]. Computational procedures have been developed which process the individual plausible values and apply the appropriate weighting to derive estimates of means and variances that are representative of a given participant sample in the given country (for more detail see [33]).
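As a rough illustration of how plausible values and replicate weights combine, the sketch below follows Rubin's rules for plausible values with a JK2-style jackknife for the sampling variance. The exact replication scheme varies by country, and the toy data and simplified variance formula are our assumptions for illustration; the dedicated software implements the full procedures.

```python
# Minimal sketch: combine plausible values (via Rubin's rules) with
# jackknife replicate weights (JK2-style sum of squared deviations).
# Toy data and the simplified variance formula are illustrative assumptions.

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def pv_mean_and_se(pv_rows, final_w, replicate_ws):
    """pv_rows: one row per respondent, one column per plausible value."""
    n_pv = len(pv_rows[0])
    estimates, sampling_vars = [], []
    for p in range(n_pv):
        pv = [row[p] for row in pv_rows]
        theta = weighted_mean(pv, final_w)
        # sampling variance from the replicate estimates (JK2-style)
        var = sum((weighted_mean(pv, rw) - theta) ** 2 for rw in replicate_ws)
        estimates.append(theta)
        sampling_vars.append(var)
    # Rubin's rules: average the estimates; combine within- and
    # between-plausible-value variance into a total variance
    point = sum(estimates) / n_pv
    between = sum((e - point) ** 2 for e in estimates) / (n_pv - 1)
    total_var = sum(sampling_vars) / n_pv + (1 + 1 / n_pv) * between
    return point, total_var ** 0.5
```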
In this analysis, nationally representative estimates of literacy and numeracy were obtained for the General, Young and Student samples using the package intsvy, which is available for the statistical platform R 3.6.1 [35] and is specifically designed for the PIAAC data [36]. To quantify differences between literacy and numeracy scores between samples, we used the classic Cohen's d metric for independent samples, in which the difference of means between the samples is divided by the pooled standard deviation, accounting for unequal sample sizes [37, 38]. Estimates of Cohen's d as an effect size metric are based on estimates of means and standard deviations corrected through weighting to be nationally representative.
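The pooled-SD formula for Cohen's d can be written out as below. As a sanity check, plugging in the unweighted Austrian literacy figures from Table 1 reproduces the tabled estimate; the published values additionally apply the PIAAC weights, so exact agreement is not guaranteed in general.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Cohen's d for independent samples with a pooled SD (unequal n)."""
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Student vs General literacy in Austria (Table 1)
d_aut = cohens_d(301.90, 33.25, 111, 273.47, 39.97, 4136)  # ≈ 0.71
```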
Results
Figs 1 and 2 plot the distribution of literacy and numeracy skills respectively among the Student, Young and General samples of all countries combined. Breakdowns of skill distributions by country can be found in the supplementary materials: S1–S32 Figs for the distributions of literacy skills, and S33–S64 Figs for the distributions of numeracy skills. Notably, both in the aggregated data and in specific countries, the distribution of skills in each sample (General, Student, Young) is symmetrical and the Student sample is shifted to the right relative to the Young and General samples.
Fig 1
Distribution of literacy skills among the Student, Young and General samples of all countries.
The red curve represents the Student sample, green represents the Young Sample, and blue represents the General sample.
Fig 2
Distribution of numeracy skills among the Student, Young and General samples of all countries.
The red curve represents the Student sample, green represents the Young Sample, and blue represents the General sample.
Tables 1 and 2 report descriptive statistics and sample sizes for the General, Young, and Student samples in each country, for literacy and numeracy respectively. Additionally, the tables report effect sizes (Cohen's d) for comparisons between the Student and Young populations, as well as between the Student and General populations (see below).
Table 2
Descriptive statistics of numeracy for the General, Young, and Student national samples, and Cohen’s d estimates of differences between Student and General samples, and Student and Young samples, presented by country.
| Ctry | N Total | General N | General M (SD) | Student N | Student M (SD) | Young N | Young M (SD) | d_S_G | d_S_Y |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AUT | 5130 | 4136 | 280.33 (44.70) | 111 | 304.77 (37.68) | 650 | 281.75 (41.56) | 0.55 | 0.56 |
| BEL | 5463 | 4140 | 282.28 (48.64) | 272 | 300.38 (36.55) | 551 | 278.74 (45.18) | 0.38 | 0.51 |
| CAN | 26683 | 18479 | 269.63 (52.05) | 870 | 288.92 (46.73) | 2653 | 262.42 (51.26) | 0.37 | 0.53 |
| CHL | 5212 | 4680 | 203.24 (58.94) | 316 | 248.59 (43.32) | 696 | 208.92 (49.14) | 0.78 | 0.84 |
| CYP | 5053 | 3687 | 264.56 (46.20) | 207 | 272.14 (39.51) | 395 | 262.06 (43.34) | 0.17 | 0.24 |
| CZE | 6102 | 5398 | 274.97 (43.13) | 453 | 298.47 (36.51) | 986 | 268.82 (41.02) | 0.55 | 0.75 |
| DEU | 5465 | 4395 | 276.41 (50.08) | 228 | 305.91 (38.92) | 696 | 273.23 (45.47) | 0.59 | 0.74 |
| ECU | 5702 | 5166 | 183.50 (53.90) | 301 | 206.19 (45.20) | 1064 | 182.54 (49.10) | 0.42 | 0.49 |
| ESP | 6055 | 4770 | 247.57 (49.80) | 245 | 278.30 (34.24) | 609 | 248.05 (41.72) | 0.63 | 0.76 |
| EST | 7632 | 6124 | 273.79 (44.94) | 379 | 300.48 (36.10) | 919 | 269.90 (41.64) | 0.60 | 0.76 |
| FIN | 5464 | 4899 | 284.63 (48.49) | 234 | 309.67 (39.38) | 611 | 279.25 (42.85) | 0.52 | 0.73 |
| FRA | 6993 | 5620 | 258.89 (52.70) | 357 | 280.89 (42.32) | 672 | 257.15 (48.27) | 0.42 | 0.51 |
| GBR | 8892 | 7361 | 265.80 (52.43) | 379 | 273.34 (42.85) | 708 | 252.28 (47.68) | 0.14 | 0.46 |
| GRC | 4925 | 4198 | 251.18 (48.75) | 273 | 271.04 (41.46) | 323 | 243.90 (46.19) | 0.41 | 0.62 |
| HUN | 6149 | 5665 | 271.25 (52.25) | 207 | 296.61 (40.37) | 648 | 260.70 (47.90) | 0.49 | 0.78 |
| IRL | 5983 | 4543 | 254.61 (52.56) | 181 | 272.99 (44.36) | 418 | 251.94 (46.23) | 0.35 | 0.46 |
| ISR | 5538 | 3706 | 254.68 (63.03) | 217 | 269.98 (49.48) | 897 | 245.94 (54.98) | 0.25 | 0.45 |
| ITA | 4621 | 3909 | 248.49 (49.25) | 153 | 264.10 (42.17) | 322 | 252.58 (45.30) | 0.32 | 0.26 |
| JPN | 5278 | 4906 | 287.77 (44.01) | 231 | 301.36 (38.46) | 528 | 277.10 (41.32) | 0.31 | 0.60 |
| KAZ | 6050 | 3010 | 247.28 (37.24) | 130 | 253.83 (39.23) | 397 | 241.63 (39.14) | 0.18 | 0.31 |
| KOR | 6667 | 6126 | 262.13 (45.27) | 386 | 290.57 (34.40) | 656 | 276.09 (35.58) | 0.64 | 0.41 |
| LTU | 5093 | 4383 | 266.35 (47.24) | 195 | 293.38 (36.73) | 455 | 277.27 (41.54) | 0.58 | 0.40 |
| MEX | 6306 | 5847 | 210.26 (49.69) | 248 | 244.08 (41.11) | 1137 | 214.09 (45.46) | 0.69 | 0.67 |
| NLD | 5170 | 4277 | 285.37 (45.98) | 300 | 307.07 (36.31) | 520 | 278.29 (40.02) | 0.48 | 0.74 |
| NOR | 5128 | 4052 | 284.42 (47.41) | 199 | 300.66 (40.97) | 661 | 269.12 (43.37) | 0.34 | 0.74 |
| NZL | 6177 | 4081 | 272.70 (52.27) | 258 | 284.38 (46.28) | 658 | 261.44 (49.14) | 0.22 | 0.47 |
| PER | 7289 | 6173 | 179.46 (62.74) | 535 | 216.14 (51.00) | 865 | 180.62 (57.77) | 0.59 | 0.64 |
| POL | 9366 | 7263 | 258.22 (50.66) | 1962 | 286.39 (42.36) | 2446 | 259.47 (44.48) | 0.57 | 0.62 |
| SVK | 5723 | 4909 | 276.72 (46.78) | 333 | 293.78 (40.31) | 746 | 272.50 (46.46) | 0.37 | 0.48 |
| SVN | 5331 | 4316 | 260.41 (52.91) | 365 | 289.70 (36.66) | 427 | 263.64 (44.13) | 0.57 | 0.64 |
| SWE | 4469 | 3496 | 288.87 (45.67) | 130 | 302.20 (40.06) | 581 | 283.13 (40.87) | 0.29 | 0.47 |
| USA | 7921 | 5957 | 261.56 (52.77) | 468 | 275.31 (46.70) | 1229 | 251.04 (49.45) | 0.26 | 0.50 |
M stands for mean, SD for standard deviation, N for sample size, and d for Cohen's d (d_S_G compares the Student and General samples; d_S_Y the Student and Young samples). Ctry stands for country.
In all countries, the mean literacy and numeracy scores of the Student samples were superior to those found among both young adults and the general population. On the PIAAC test scale, the mean difference between the Student and General samples was 24 points for literacy and 22 points for numeracy. A comparable advantage of the Student sample over the Young sample was observed: 22 points for literacy and 25 points for numeracy. These differences are massive: they are as large as or larger than the difference between the 25th and 75th percentiles of literacy (20 points) and numeracy (23 points) for the General samples of all countries. The variance of scores in the Student sample was not statistically different from the variances in either the Young or the General sample, for either literacy or numeracy (all F < 1.3, all p > 0.5 in F tests). This finding runs counter to the intuition that student samples are more homogeneous (due to the selectivity of educational institutions and self-selection) than the population at large. However, it converges with Hanel and Vione's (2016) [39] report of similar variability in personality traits and attitudes among students and the general populations of 59 countries.

To quantify the differences in a way that is comparable to the relevant psychological literature, we calculated the Cohen's d metric for independent samples. Cohen's d for the comparison of literacy scores between students and the general population ranged from negligible (d = 0.07, Cyprus) to strong (d = 0.82, Chile), with a median of d = 0.56 and first and third quartiles of d = 0.40 and d = 0.69. Effect size estimates for differences in literacy between students and young adults were somewhat stronger and less dispersed.
They ranged from d = 0.17 (Cyprus) to d = 0.88 (Hungary), with a median of d = 0.55; the first and third quartiles of this distribution were d = 0.48 and d = 0.71. Fig 3 plots Cohen's d estimates by country in increasing order of the effect size for the Student vs General group comparison (red dots). Values of Cohen's d for the Student vs Young group comparison are reported as blue triangles.
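The quoted medians can be reproduced directly from the Cohen's d columns of Table 1 (values copied from the table, in country order):

```python
import statistics

# Cohen's d per country, copied from Table 1 (literacy), in table order
d_student_general = [0.71, 0.55, 0.37, 0.82, 0.07, 0.60, 0.77, 0.56, 0.75, 0.68,
                     0.65, 0.56, 0.14, 0.44, 0.72, 0.41, 0.36, 0.58, 0.34, 0.24,
                     0.75, 0.54, 0.78, 0.62, 0.42, 0.24, 0.80, 0.64, 0.45, 0.65,
                     0.47, 0.33]
d_student_young = [0.59, 0.53, 0.57, 0.71, 0.17, 0.64, 0.77, 0.46, 0.85, 0.71,
                   0.74, 0.54, 0.50, 0.48, 0.88, 0.49, 0.42, 0.40, 0.39, 0.33,
                   0.40, 0.37, 0.72, 0.76, 0.72, 0.48, 0.72, 0.50, 0.57, 0.61,
                   0.56, 0.51]
median_sg = statistics.median(d_student_general)  # 0.56
median_sy = statistics.median(d_student_young)    # 0.55
```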
Fig 3
Effect size comparisons of literacy scores between the Student and General samples and between the Student and Young samples, by country.
Red dots represent effect size comparison of Student vs General samples and blue triangles represent effect size comparison for Student vs Young samples.
Comparisons of numeracy scores between students and the general population across countries showed Cohen's d values ranging from d = 0.14 (United Kingdom) to d = 0.78 (Chile), with a median of d = 0.42; the first and third quartiles of this distribution were d = 0.32 and d = 0.57. A comparison of numeracy performance between students and young adults revealed even greater d values, varying between d = 0.24 (Cyprus) and d = 0.84 (Chile), with a median of d = 0.55 and first and third quartiles of d = 0.47 and d = 0.73. Fig 4 reports Cohen's d estimates for the Student vs General group comparison (red dots) and the Student vs Young group comparison (blue triangles).
Fig 4
Effect size comparisons of numeracy scores between the Student and General samples and between the Student and Young samples, by country.
Red dots represent effect size comparison of Student vs General samples and blue triangles represent effect size comparison for Student vs Young samples.
The importance of the present findings comes to light when they are compared against meta-analytical estimates of effect sizes in studies published in the field of psychology. An influential meta-analysis and replication of 100 experimental and correlational papers in psychology [40] places the estimated average effect size of the original studies at d = 0.403 (SD = 0.188) and that of the replications at d = 0.197 (SD = 0.257). Another meta-analysis of 447 psychological papers [41] reports a negative correlation between sample size and effect size. While their estimate of the mean effect size d across all sample sizes is close to 0.4, the largest samples in their data (n = 500–1000) only yielded a mean effect size d close to 0.25. Since the sample sizes in our data exceed the maximum considered in Kühberger et al. (2014) [41], a predicted effect size for such samples would hover around d = 0.2. Thus, the effect sizes that we observe in our data exceed the expected values by a factor of 2 to 2.5 for both literacy and numeracy when comparing students to both the general and the age-matched non-student populations. Moreover, virtually all individual countries in our analyses showed effects stronger than those expected in the published literature in the field of psychology (for the variability of effect sizes across types of studies and subdisciplines of psychology see, e.g., [42]). In summary, the results show that drawing conclusions about language and math functioning among groups of adult speakers based on evidence from undergraduate students comes with a strong systematic bias in many countries of the world.
Discussion
The present paper advances the research agenda that examines sampling biases in psychological research, and more specifically in literacy and numeracy studies. Convenience samples of undergraduate students are over-represented in the empirical evidence base and play a disproportionately large role in scientific theory-making [1, 2]. Given the common practice of using data from university students to inform theories of linguistic and cognitive processing (as reviewed in the Introduction), reading studies are similarly likely to suffer from a student sampling bias. We quantified just how well undergraduate students represent (i) the general population (age 16–65) of their country and (ii) age-matched non-students (age 16–25) in terms of literacy and numeracy skills across 32 countries and numerous languages and cultures. Most importantly for the current study, the PIAAC data avoid bias within each selected country. That is, students and all other population segments are represented with the same probability with which they naturally occur in that country [43].

In all countries in the dataset, students' mean literacy and numeracy scores were far superior to those of both the non-student young adults and the general population. While the latter fact may not seem surprising, we find it noteworthy given that many participants in the General population were former post-secondary students who furthered their cognitive skills through additional years of practice. While effect sizes varied across countries, median effect sizes in all comparisons either met or exceeded those typically found in the psychological literature (d > 0.4) [40–42, 44].

These observations lead to several striking conclusions about the practice of studying language behavior and numeracy using convenience pools of university students. First, it is inaccurate to consider students a group representative of the population at large.
They are as different from the General population (excluding students) as the 25th percentile of that population is from its 75th percentile. Second, it is even less accurate to treat students as representative of non-students of the same age. To put this inaccuracy into perspective, imagine a high-powered, pre-registered psychological experiment with a treatment and a control group. Imagine further that the group difference shows an effect stronger than those typically observed in experimental psychology (d > 0.4). Imagine, finally, that the experimenter interprets the behavior of the treatment group as a valid approximation of the behavior of the control group, views the results as support for the null hypothesis, and claims that the treatment group is representative of the entire population. This scenario is the statistical equivalent of assuming that the literacy and numeracy behavior of students represents that of the general or age-matched population of speakers of the same language.

One clear theoretical implication of this mismatch between students’ reading skills and those of other populations is that it raises questions about the generalizability and external validity of empirical research based on findings from literacy- and numeracy-related behaviors of undergraduate students. To be clear, sampling from student populations is not in itself a problem, so long as the findings are interpreted within the student population. Yet such disclaimers are rarely found in the psychological literature (including our own work). Consequently, readers can make the logical assumption that findings based on undergraduate student groups generalize over the entire adult population. However, as the findings above indicate, students are rarely representative of the adult population when it comes to literacy and numeracy.
Therefore, we hope to have demonstrated that caution is needed when studying phenomena that rely on highly trained cognitive skills such as literacy and numeracy.
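One way to make effect sizes of this magnitude tangible, assuming roughly normal score distributions with similar variances, is through the standard normal CDF: Φ(d) gives the percentile of the comparison group's distribution at which the average member of the higher-scoring group falls. A minimal sketch (the `normal_cdf` helper and the interpretation are ours, not part of the PIAAC methodology):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Under normality with equal variances, a group whose mean lies d pooled
# SDs above a comparison group's mean is centered at the Phi(d) percentile
# of that comparison group's distribution.
for d in (0.40, 0.55, 0.56):
    print(f"d = {d:.2f} -> percentile {100 * normal_cdf(d):.1f}")
```

For the median literacy effect reported above (d = .56), this places the average student at roughly the 71st percentile of the comparison distribution; the reading holds only under the stated normality and equal-variance assumptions.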
Limitations and future directions
The estimates in this study are calculated on the basis of a single, though complex and comprehensive, task. As such, we can only say with certainty that students do not represent other populations in terms of the PIAAC measures of literacy and numeracy. It is up to future research to quantify how representative undergraduate students are on other measures of literacy and numeracy. For instance, we speculate that students will not be representative of other population groups in terms of their fluency in literacy- and numeracy-related tasks. Specifically, we predict students to be faster than other populations, both because they showed higher accuracy in the tasks reported here, and because of multiple reports of higher accuracy being associated with higher speed of task completion (for early reports and recent reviews in reading, see [45, 46]).

Literacy and numeracy, particularly as assessed in PIAAC, require the coordination of multiple cognitive processes and the mastery of multiple component skills. We predict that students will also be unrepresentative when it comes to the component skills of reading and numeracy, such as working memory, numeric reasoning, and word processing. Since PIAAC does not test these component skills directly, the current study cannot indicate whether these group differences indeed exist or whether the effect sizes will be reduced or amplified on other tasks. Future investigations should continue to quantify differences between student and other populations, both on comprehensive literacy and numeracy assessments and on tasks targeting their component skills.
This paper provides a qualitative indication that such differences are likely to be found.

The main question explored in this study (how different are students’ cognitive skills from those of other population groups?) is coupled with at least two other questions that are out of the scope of the present paper: (a) what contributes to these differences, and (b) how do these differences influence the inquiry into psychological traits and processes in other domains. Question (a) has been extensively covered in studies of literacy and numeracy development, as well as in research on post-secondary education (for select reviews, see [21, 47–49]). We note, however, that the by-country breakdown of the differences between samples (reported in Tables 1 and 2) can further boost this research, as these differences are likely to be co-determined by the demographic and socio-economic characteristics of those countries and their investment in both the spread and the quality of (post-secondary) education. The present data do not shed light on question (b); we therefore relegate further exploration of (a) and (b) to future research. We also note that the present study highlighted group differences in the advanced behaviors tested in the PIAAC data. These behaviors demand a proficient and coordinated use of multiple component skills, including word recognition, reading fluency, and reading comprehension. How the over-reliance on sampling university students affects the accuracy of verbal and computational models of such component skills (partly discussed in the Introduction) is an important question for further examination.
Conclusion
To be sure, few researchers of literacy or numeracy are likely to endorse the premise that students accurately represent the literacy or numeracy skills of the general population. Yet it is important to realize that this premise is implicit in the common practice of reporting experimental findings or computational models based on university students without a disclaimer about their limited generalizability. We do not wish to imply that the field of language or numeracy research is ignorant of the problem. To give only a few examples to the contrary, there are ongoing efforts to study literacy in older adults [8, 12, 50], in communities of low socio-economic status [51], and in readers with lower literacy or lower academic attainment [20, 52–58]. Additionally, an increasing number of comparative literacy and numeracy studies draw community or representative samples for their hypothesis testing (see, among many others, [13, 29, 59–62]). Finally, as undergraduate sampling relates to the WEIRD bias, we also highlight the growing body of cross-linguistically comparable samples in literacy research (among others, [18, 63–66]).

Still, collecting normative population-wide data is an expensive, time-consuming process, and funding agencies can be more reluctant to support such endeavors than research on groups defined by their clinical, demographic, or social status. Change in the culture of research must be complemented by change in scientific policy-making. We echo the recommendations of Henrich et al. (2010) [3] and Rad et al. (2018) [2] that researchers explicitly address questions of generalizability in their samples, make data freely available to aid comparative research efforts, collect data broadly within their countries, and build partnerships with community members and researchers, particularly in non-WEIRD countries.
Moreover, we urge funding agencies and policy-makers to recognize the importance of minimizing the student sampling bias in language research and to value projects with representative and non-WEIRD samples accordingly. The movement towards more inclusive data coverage, and external support for such coverage, is necessary to maintain high standards of psychological research.
Supporting information

Distribution of literacy skills among the Student, Young and General samples, one figure (TIF) per country: Austria, Belgium, Canada, Chile, Cyprus, Czech Republic, Germany, Ecuador, Spain, Estonia, Finland, France, Great Britain, Greece, Hungary, Ireland, Israel, Italy, Japan, Kazakhstan, Korea, Lithuania, Mexico, Netherlands, New Zealand, Norway, Peru, Poland, Slovakia, Slovenia, Sweden, and the United States of America. In each figure, the red curve represents the Student sample, the green curve the Young sample, and the blue curve the General sample.

Distribution of numeracy skills among the Student, Young and General samples, one figure (TIF) per country, for the same 32 countries and with the same color coding.

21 Apr 2022
PONE-D-22-04101
Just how misleading are student convenience samples? A comparative study of literacy and numeracy in 32 countries
PLOS ONE
Dear Dr. Wild,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Two expert Reviewers have now carefully scrutinised your submission (Reviewer 1 chose to remain anonymous; Reviewer 2 is Dr Julia Carroll). In addition, I have read the manuscript with great interest, as it deals with a topic close to my heart, and I have voiced similar concerns for some time. Both Reviewers and I believe that your manuscript deals with an important and timely topic, and I am of the opinion that a suitably revised version of it would certainly be of interest to the scientific community. However, both Reviewers have highlighted a number of points which need to be taken into account in your revision, and I would like you to address each one of them, as I believe they are all (very) valid concerns/suggestions.

In line with Reviewer 1's comment on simulations, I too found the diversion into the modelling part somewhat puzzling, as E-Z Reader deals with reading time data, which is not tested in the PIAAC. Moreover, while the original parameter settings might have been based on a restricted sample, there is a large literature showing that the model does remarkably well in simulating actual (undergraduate) reading data. Personally, I would suggest de-emphasising the somewhat hazy link with the modelling literature and providing more detail on the important differences that have been found when more diverse populations have been included (see references on lines 298-303).

Thank you for submitting this interesting study; I am looking forward to reading a revised version of the manuscript.

Steven Frisson

Please submit your revised manuscript by Jun 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org.
When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
* A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions, see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Steven Frisson
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions
Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Using data from PIAAC, the authors tested whether undergraduate students are representative of the general population and/or of age-matched controls in terms of literacy and numeracy. The authors report large differences between undergraduate students and both the general population and age-matched controls, indicating that findings from studies that use convenience samples of undergraduate students cannot necessarily be expected to generalize.

I like this paper for its briefness and clarity, as well as for its potentially very important conclusion. Nevertheless, I believe that some points need to be addressed by the authors before publication:

* The title is too strong given the conclusions. The title presupposes that undergraduate samples *are* misleading - something that doesn't necessarily follow from the results, given that, for instance, it is never demonstrated that numeracy actually influences reading behavior (see below). It depends entirely on what is being studied.

* There is something of a disconnect between the literature review on page 2 and the discussion of psycholinguistic computational modeling on page 3. It is left to the reader to think about how differences in personality measures or analytic thinking would cast doubt on the generalizability of parameter estimates in the E-Z Reader model, for instance. It may help to briefly introduce some of the relevant parameters and speculate about connections here.

* More generally, I wasn't quite sure where the strong initial focus on computational modeling was coming from. The main point - that conclusions drawn from student data may not generalize to the overall population - applies to any study that uses undergraduate students as "controls", even to studies where, say, monolingual undergraduate students are compared to bilingual undergraduate students.
It is, for instance, conceivable that the student status of the sampled individuals would nullify any advantages or disadvantages that bilinguals may have compared to monolinguals.

* This ties into my next point of criticism: The authors largely avoid speculating about the possible effects that differences in numeracy and literacy as measured by PIAAC may have on, say, online reading behavior. I was not able to find out if PIAAC measured reading times to assess reading fluency, for instance. I understand that it's not the authors' main interest to develop a theory of what connections may exist, but I think readers would welcome some speculation in this regard. This is perhaps even more important for numeracy, where I can't think of any obvious connections off the top of my head.

* page 2: CRT, ACT, SAT and WPT should be briefly explained.

* page 2, line 59, "The second reason for singling out ..." - The scientific relevance of this point is not clear to me.

* page 4, line 67: "Individuals typically need to succeed ..." - This sounds like it's referring to university admission tests, which are not used everywhere in the world. I assume that the authors are also referring more generally to any tests taken during secondary education that influence people's chances of getting admitted to university, but this should be clarified.

* page 5, line 115: I looked at the PIAAC sample items, which I had never seen before. What struck me is that the sample items are very specific in that a clear question with a relatively simple, unambiguous answer is asked before participants even start reading. The second item is also not a text but a table. In both cases, 95% of the contents can be ignored by the reader if they only want to answer the question. What I'm asking myself is how representative these sample items are of PIAAC as a whole, and how similar the PIAAC items are to items used in, say, a typical psycholinguistic experiment.
This is relevant because the authors' contention is that people who perform differently from the general population in PIAAC can't be reasonably expected to perform similarly to the general population in a psycholinguistic experiment - but if the two tasks mostly tap into unrelated skills, this conclusion would be less justified.

* page 6: The "Statistical considerations" section needs some clarification. The section starts with the introduction of "10 plausible values", which the reader doesn't know what to do with at this point. There is some explanation later on, but it doesn't quite become clear to the uninitiated reader what is happening and why the raw data can't be used.

* page 8: Instead of Tables 1 and 2, I would have liked to see a histogram of the distribution of literacy and numeracy scores in each sample. For instance, are the scores normally distributed? This is crucial for interpreting mean and SD values. Furthermore, I don't quite see what showing the scores for each country contributes to the main point.

* page 10: I'm not sure that the comparison of effect sizes between PIAAC score differences and psychology studies is warranted or useful. Even though Cohen's d is supposed to abstract away from differences wrt *what* was measured on which scale, I think it's still apples and oranges to compare differences in traits between groups of people with differences in dependent variables caused by some experimental manipulation. I think this part could be dropped without loss of clarity or relevance. By contrast, what I would like to see are the estimates of the score differences between groups and their associated confidence intervals. In order to interpret these, it would be helpful to know what the distribution of scores is (see previous point).

* page 11: "We identified the magnitude of this bias ... " - This needs clarification.
The authors did not identify the magnitude of the undergraduate sampling bias, which to my mind would indicate how much more likely undergraduates are to be sampled compared to other people. I think what the authors mean is that they quantified the sampling bias regarding literacy and numeracy skills that *results* from the undergraduate sampling bias.

* References: Many references have incorrect capitalization, incomplete page numbers, and other formatting issues.

Reviewer #2: This paper is clear and straightforward, raises an important point, and uses good quality data and analyses. I have only minor queries. First, the description of the three different groups on page 6 (line 141) makes it sound like participants could not be part of the 'young' group and the 'general' group, but in actual fact I think from the description that these are overlapping samples. This should be clarified. Second, I felt it is worth emphasising that these estimates have been calculated on the basis of single, relatively complex tasks. It is therefore not possible to know whether similar effect sizes exist for different types of literacy measure or numeracy measure. For example, it may be that group differences are reduced for simple tasks such as word reading, but the authors are not able to confirm this.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Dr Julia Carroll

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site.
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
30 May 2022

We thank the editor and reviewers for their insightful comments and appreciate their feedback. We feel this revised manuscript is a clearer and more streamlined discussion of how well undergraduate students represent the general population in terms of literacy and numeracy. Below we have addressed each comment individually.

Editor: In line with Reviewer 1's comment on simulations, I too found the diversion into the modelling part somewhat puzzling as E-Z Reader deals with reading time data, which is not tested in the PIAAC. Moreover, while the original parameter settings might have been based on a restricted sample, there is a large literature that shows that the model does remarkably well in simulating actual (undergraduate) reading data. Personally, I would suggest de-emphasising the somewhat hazy link with the modelling literature and providing more detail on important differences that have been found when more diverse populations have been included (see references on lines 298-303).

We thank the editor for this point. We have removed the modelling literature from the introduction and de-emphasised it in the discussion. We agree that this deletion makes for a more streamlined argument. This section of the introduction (lines 83 – 99) now focuses on how several large datasets such as the British Lexicon Project and Ghent Eye-tracking Corpus, which inform theories of word processing and reading behaviour, collect data primarily or exclusively from undergraduate student samples. We make the link that if students are unrepresentative when it comes to complex skills like numeracy and literacy, they are also likely to be unrepresentative in terms of component skills such as reading behaviour, numeric reasoning, working memory, and cognitive control. In addition, more examples of differences between students and the general population have been included on lines 49 – 82.

Reviewer 1:

* The title is too strong given the conclusions.
The title presupposes that undergraduate samples *are* misleading - something that doesn't necessarily follow from the results, given that, for instance, it is never demonstrated that numeracy actually influences reading behavior (see below). It depends entirely on what is being studied.

We revised the title to “How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries”. The title no longer presupposes that student samples are misleading. To clarify, the study is not correlating literacy and numeracy, nor stating that numeracy influences reading behaviour. Rather, we ask how well students’ literacy represents other populations’ literacy levels and, separately, how well students’ numeracy represents other populations’ numeracy levels.

* There is something of a disconnect between the literature review on page 2 and the discussion of psycholinguistic computational modeling on page 3. It is left to the reader to think about how differences in personality measures or analytic thinking would cast doubt on the generalizability of parameter estimates in the E-Z Reader model, for instance. It may help to briefly introduce some of the relevant parameters and speculate about connections here.

We thank the reviewer for this point. As noted above in response to the Editor's comments, we have largely re-worked these sections of the introduction (lines 37 – 99). Discussion of personality measures has been removed and the literature on page 2 now focuses mainly on cognitive differences between undergraduate students and the general population. Lines 37 – 82 make a more explicit link between general cognitive skills and literacy/numeracy in that the tasks used to measure many cognitive skills target the component skills of literacy/numeracy. This is echoed again on lines 95 – 99.

* More generally, I wasn't quite sure where the strong initial focus on computational modeling was coming from.
The main point - that conclusions drawn from student data may not generalize to the overall population - applies to any study that uses undergraduate students as "controls", even to studies where, say, monolingual undergraduate students are compared to bilingual undergraduate students. It is, for instance, conceivable that the student status of the sampled individuals would nullify any advantages or disadvantages that bilinguals may have compared to monolinguals.

The discussion of computational models has been removed from the present manuscript. Indeed, we agree that this removal makes for a more streamlined argument. Lines 87 – 99 now discuss how several large datasets such as the British Lexicon Project and Ghent Eye-tracking Corpus, which inform theories of word processing and reading behaviour, collect data primarily or exclusively from undergraduate student samples. We make the link that if students are unrepresentative when it comes to complex skills like numeracy and literacy, they are also likely to be unrepresentative in terms of component skills such as reading behaviour, numeric reasoning, working memory, and cognitive control.

We also agree with the reviewer's analogy with monolingual vs bilingual students and would like to develop it further. If researchers were to use only convenience samples of monolingual students and claim their behavior to be representative of all students, they would introduce a systematic bias in their estimates. This is analogous to using all students (monolingual or bilingual) and presenting their performance as representative of that of the general population (monolingual or bilingual).
It is also worth a reminder that in the present study we only considered native speakers of the respective languages of testing.

* This ties into my next point of criticism: The authors largely avoid speculating about the possible effects that differences in numeracy and literacy as measured by PIAAC may have on, say, online reading behavior. I was not able to find out if PIAAC measured reading times to assess reading fluency, for instance. I understand that it's not the authors' main interest to develop a theory of what connections may exist, but I think readers would welcome some speculation in this regard. This is perhaps even more important for numeracy, where I can't think of any obvious connections off the top of my head.

While PIAAC does collect reading time data, this information is not available in the public data which we used in this study. We added a speculation in the discussion (lines 460-472), based on existing evidence from the reading literature, that highly proficient readers (like students) are faster than less proficient ones in reading for comprehension. This contrast is likely to propagate to the PIAAC tests of literacy, such that reading time estimates based on students will be unrepresentative of (faster than) those of the general population or non-students of the matched age.

* page 2: CRT, ACT, SAT and WPT should be briefly explained.

Explanations added on lines 42 – 46.

* page 2, line 59, "The second reason for singling out ..." - The scientific relevance of this point is not clear to me.

Points 1 and 3 make a scientific argument for our interest in literacy and numeracy skills. This point is intended to highlight the applied societal importance of studying these skills. We present it as such on line 100.

* page 4, line 67: "Individuals typically need to succeed ..." - This sounds like it's referring to university admission tests, which are not used everywhere in the world.
I assume that the authors are also referring more generally to any tests taken during secondary education that influence people's chances of getting admitted to university, but this should be clarified.

Phrasing was adjusted to clarify that we are referring to general testing during secondary education and not American college entrance exams specifically (see lines 104 – 107).

* page 5, line 115: I looked at the PIAAC sample items, which I had never seen before. What struck me is that the sample items are very specific in that a clear question with a relatively simple, unambiguous answer is asked before participants even start reading. The second item is also not a text but a table. In both cases, 95% of the contents can be ignored by the reader if they only want to answer the question. What I'm asking myself is how representative these sample items are of PIAAC as a whole, and how similar the PIAAC items are to items used in, say, a typical psycholinguistic experiment. This is relevant because the authors' contention is that people who perform differently from the general population in PIAAC can't be reasonably expected to perform similarly to the general population in a psycholinguistic experiment - but if the two tasks mostly tap into unrelated skills, this conclusion would be less justified.

For more transparency, we have described a sample item on lines 230-231. In terms of how the test items relate to skills tested in psycholinguistic experiments, PIAAC tasks draw on multiple psycholinguistic processes at any given time. The assessment measures reading for a purpose (i.e., to achieve a goal such as gathering knowledge, forming an opinion, evaluating a text, etc.) while simulating the reading demands of everyday work and life. Reading for a purpose requires participants to draw on information processing skills and working memory capacity in addition to basic reading (decoding, vocabulary knowledge, word and sentence processing, inferential reasoning, etc.)
and numeric (addition, subtraction, quantitative reasoning, etc.) skills (PIAAC Literacy Expert Group, 2009). PIAAC data do not provide direct access to participant performance in these specific component skills (with the exception of a small subset of mostly low-literacy participants (OECD, 2021)). Since literacy and numeracy require combining and coordinating these subskills, performance on the PIAAC assessment can reasonably provide qualitative (if not quantitative) insight into how participants might perform on these component skills. We describe this reasoning in the introduction on lines 187 – 200. Additionally, in the discussion we clarify that this link is made cautiously and can only be used as an indicator that future research is warranted to determine the degree to which the undergraduate student bias affects these component skills directly (see lines 460 – 482).

We also construe psycholinguistic experiments as a means to achieve understanding of cognitive processes as they occur in real-world tasks and environments, including reading for the purposes described above. One facet of such real-world tasks is navigating a wealth of information of which only a small fraction is relevant for the specific information-processing purpose. Of course, any single experiment, including the PIAAC tasks, only gives us a partial and limited glimpse of those processes, typically in a highly constrained environment. If it is the case, as the reviewer suggests, that strong populational differences observed in the more complex literacy and numeracy tasks may not appear in more constrained psycholinguistic experiments, this may signal a limitation in the generalizability of those experiments to real-world cognitive demands rather than a problem with the validity of using the more complex/inclusive tasks.

PIAAC Literacy Expert Group (2009). PIAAC Literacy: A Conceptual Framework. OECD Education Working Papers, No. 34, OECD Publishing. http://dx.doi.org/10.1787/220348414075

OECD (2021).
The Assessment Frameworks for Cycle 2 of the Programme for the International Assessment of Adult Competencies. OECD Skills Studies. OECD Publishing: Paris. https://doi.org/10.1787/4bc2342d-en

* page 6: The "Statistical considerations" section needs some clarification. The section starts with the introduction of "10 plausible values", which the reader doesn't know what to do with at this point. There is some explanation later on, but it doesn't quite become clear to the uninitiated reader what is happening and why the raw data can't be used.

Additional explanation has been added on lines 283 – 294. Raw data cannot be used because participants only respond to a subset of PIAAC test items. Plausible values were derived to estimate overall proficiency, including on items that the participant did not respond to.

* page 8: Instead of Tables 1 and 2, I would have liked to see a histogram of the distribution of literacy and numeracy scores in each sample. For instance, are the scores normally distributed? This is crucial for interpreting mean and SD values. Furthermore, I don't quite see what showing the scores for each country contributes to the main point.

We thank the reviewer for the suggestion. We have added Figs 1 and 2, which plot the distribution of literacy and numeracy skills, respectively, among the Student, Young and General samples of all countries combined, to the main document (see lines 315-327). In the supporting information section we also include plots showing the distribution of skills for each country individually. Currently, we have opted to leave Tables 1 and 2 in the main paper, but we are prepared to move them to the supplementary materials if this is the Editor's preference.

We feel it is important to include the breakdown of skills by country to allow other researchers to zoom in on the characteristics of these countries and how these characteristics may predict the observed differences.
While out of scope for this paper, it is likely that the socio-economic status of countries and their investment into post-secondary education co-determine the effect sizes of the inter-sample differences under consideration. Reporting the by-country breakdown makes it possible for social scientists to pursue this question directly, without returning to the raw PIAAC data, as we did. We make this future direction explicit on lines 488-492.

* page 10: I'm not sure that the comparison of effect sizes between PIAAC score differences and psychology studies is warranted or useful. Even though Cohen's d is supposed to abstract away from differences wrt *what* was measured on which scale, I think it's still apples and oranges to compare differences in traits between groups of people with differences in dependent variables caused by some experimental manipulation. I think this part could be dropped without loss of clarity or relevance. By contrast, what I would like to see are the estimates of the score differences between groups and their associated confidence intervals. In order to interpret these, it would be helpful to know what the distribution of scores is (see previous point).

Score differences themselves are not highly interpretable on the PIAAC assessment. For example, literacy and numeracy scores range from 0-500 points, which are translated into 6 levels: below level 1 (0-175 points), level 1 (176-225 points), level 2 (226-275 points), level 3 (276-325 points), level 4 (326-375 points), and level 5 (376-500 points). Individuals scoring at level 2 are also expected to be proficient at level 1. It is not self-evident that reporting a difference of, say, 25 points is more informative than reporting a difference in (effectively) units of pooled standard deviation. Similarly, moving from standardized effect size estimates to score differences would lose the ability to compare the effect sizes observed in our data to those in other literature.
Given the emphasis of the current statistical literature on reporting effect sizes (including the statistical guidelines of this journal: https://journals.plos.org/plosone/s/submission-guidelines#loc-statistical-reporting), we opted for retaining Cohen's ds. Mathematically, Cohen's d and confidence intervals of score differences are highly related in that they both account for the standard errors of the distributions under comparison; thus we do not feel that this decision masks any critical information from the readers.

We also respectfully disagree with the “apples and oranges” assessment of the application of Cohen's d. Just like in the reviewer's example, we report differences in dependent variables (literacy and numeracy scores), not in group or individual traits. Since Cohen's seminal power primer (Cohen, 1992), the effect size has been proposed as a way to quantify differences between populations. This approach has also been used in numerous psycholinguistic experiments and meta-studies that compare different cohorts of participants, especially in bilingualism research (see a small selection of references below). Thus, we do not see an immediate reason for shying away from using an estimate of the effect size or from comparing it with the existing psycholinguistic literature.

Schroeder, S. R. (2018). Do bilinguals have an advantage in theory of mind? A meta-analysis. Frontiers in Communication, 3, 36.

Malgady, R. G., & Costantino, G. (1998). Symptom severity in bilingual Hispanics as a function of clinician ethnicity and language of interview. Psychological Assessment, 10(2), 120.

Bird, E. K. R., Cleave, P., Trudeau, N., Thordardottir, E., Sutton, A., & Thorpe, A. (2005). The language abilities of bilingual children with Down syndrome.

Souza, A. L., Byers-Heinlein, K., & Poulin-Dubois, D. (2013). Bilingual and monolingual children prefer native-accented speakers. Frontiers in Psychology, 4, 953.

* page 11: "We identified the magnitude of this bias ...
" - This needs clarification. The authors did not identify the magnitude of the undergraduate sampling bias, which to my mind would indicate how much more likely undergraduates are to be sampled compared to other people. I think what the authors mean is that they quantified the sampling bias regarding literacy and numeracy skills that *results* from the undergraduate sampling bias.

We thank the reviewer for this point. We did not measure the magnitude of the bias; rather, we quantified the degree to which undergraduate students represent other populations. Revisions made on line 416.

* References: Many references have incorrect capitalization, incomplete page numbers, and other formatting issues.

Revisions have been made to address formatting issues and missing information. This section was also updated to reflect the revisions addressed in this letter.

Reviewer 2: This paper is clear and straightforward, raises an important point, and uses good quality data and analyses. I have only minor queries. First, the description of the three different groups on page 6 (line 141) makes it sound like participants could not be part of the 'young' group and the 'general' group, but in actual fact I think from the description that these are overlapping samples. This should be clarified.

We thank the reviewer for the high evaluation of our paper. Participants could indeed be part of both the Young sample and the General sample; however, both the Young and General samples are independent of the Student sample. Revisions made on lines 268 – 272 to clarify.

Second, I felt it is worth emphasising that these estimates have been calculated on the basis of single, relatively complex tasks. It is therefore not possible to know whether similar effect sizes exist for different types of literacy measure or numeracy measure.
For example, it may be that group differences are reduced for simple tasks such as word reading, but the authors are not able to confirm this.

We have added a paragraph (lines 460 – 482) emphasising this limitation; while we predict that students will be unrepresentative in other measures, this question must ultimately be answered by future research.

Submitted filename: Response to reviewers.docx

27 Jun 2022

How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries

PONE-D-22-04101R1

Dear Dr. Wild,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Steven Frisson
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Thank you for addressing all the comments.
I asked one of the original Reviewers to have another look at the manuscript and they (as well as I) are happy with the revision. I believe it's an important finding and I personally will not hesitate to refer to it once it has been published!

Steven

Reviewers' comments:

Reviewer's Responses to Questions
Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The revision has addressed all my comments on the previous version. I can therefore recommend publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

29 Jun 2022

PONE-D-22-04101R1

How representative are student convenience samples? A study of literacy and numeracy skills in 32 countries

Dear Dr. Wild:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Steven Frisson
Academic Editor
PLOS ONE