A. J. Alvero, Sonia Giebel, Ben Gebre-Medhin, Anthony Lising Antonio, Mitchell L. Stevens, Benjamin W. Domingue.
Abstract
There is substantial evidence of the relationship between household income and achievement on the standardized tests often required for college admissions, yet little comparable inquiry considers the essays typically required of applicants to selective U.S. colleges and universities. We used a corpus of 240,000 admission essays submitted by 60,000 applicants to the University of California in November 2016 to measure relationships between the content of admission essays, self-reported household income, and SAT scores. We quantified essay content using correlated topic modeling and essay style using Linguistic Inquiry and Word Count. We found that essay content and style had stronger correlations to self-reported household income than did SAT scores and that essays explained much of the variance in SAT scores. This analysis shows that essays encode similar information as the SAT and suggests that college admission protocols should attend to how social class is encoded in non-numerical components of applications.
Year: 2021 PMID: 34644119 PMCID: PMC8514086 DOI: 10.1126/sciadv.abi9031
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
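The abstract describes quantifying essay content with a topic model and correlating the resulting features with applicant covariates. A minimal sketch of that idea follows, with assumptions to note: the paper fits a correlated topic model, while scikit-learn ships only plain LDA, so LDA stands in here; the essays and income values are toy placeholders, not the study's data.

```python
# Hedged sketch (not the authors' code): quantify essay content with a
# topic model, then correlate per-essay topic weights with a covariate.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

essays = [
    "I tutor students after school and help them with homework",
    "My research on telescopes and astronomy answered many questions",
    "Helping people in my community makes a real difference",
    "I read books and probe questions about the big bang",
]
income = np.array([40_000, 120_000, 35_000, 110_000])  # toy self-reports

counts = CountVectorizer(stop_words="english").fit_transform(essays)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # per-essay topic proportions (rows sum to 1)

# Pearson correlation of each topic's weight with household income
for k in range(theta.shape[1]):
    r = np.corrcoef(theta[:, k], income)[0, 1]
    print(f"topic {k}: r = {r:+.2f}")
```

With real data, each of the 70 fitted topics would yield one such correlation, which is the quantity summarized in the figures below.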
Fig. 1. Conceptual model.
Visualization of previous work, represented by a blue line, and our study, represented by red lines, on the relationship between application materials and household income.
Topics most positively (blue) and negatively (red) correlated with household income and SAT score, along with excerpts from the essays with the highest topic score.

| Topic | Frequent stems | Distinctive stems | Excerpt |
|---|---|---|---|
| Seeking answers (income …) | question, book, like, research, read, … | telescop, astronom, map, probe, … | "Ever since the big bang took place, …" |
| Human nature (income …) | world, human, natur, passion, … | inher, manifest, notion, philosophi, … | "From a young age, I have found a …" |
| China (income …) | chines, studi, student, also, time, … | china, provinc, hong, kong, chines, … | "I served as the Chunhui emissary and …" |
| Achievement words (income …) | result, provid, initi, began, becam, … | dilig, remain, util, attain, endeavor, … | "Rather than taking the fundamental …" |
| Despite words (income …) | howev, one, may, rather, even, … | simpli, rather, may, fact, truli, … | "To this day, I cannot begin such an …" |
| Time management (income …) | time, work, help, get, school, abl, go, … | homework, manag, get, stress, … | "I do try hard to make sure that I …" |
| Helping others (income …) | peopl, help, can, make, way, differ, … | peopl, can, other, someon, everyon, … | "When I am helping the students, I …" |
| Tutoring groups (income …) | help, tutor, colleg, avid, also, go, … | avid, tutor, ffa, et, ag, via, tutori, … | "It has also taught me to seek help …" |
| Preference words (income …) | also, like, thing, realli, subject, lot, … | realli, lot, thing, good, favorit, … | "My greatest talent or skill is acting. I …" |
| Education opportunity (income …) | colleg, educ, opportun, take, … | advantag, educ, colleg, opportun, … | "I also took real college classes with …" |
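The table above distinguishes a topic's frequent stems from its distinctive ones. A small sketch of how both lists can be read off a fitted topic-word matrix: frequent stems are the highest-probability words within a topic, while distinctive stems rank words by lift, i.e. within-topic probability relative to the corpus-wide average. The matrix and vocabulary here are toy stand-ins, not the paper's fitted model.

```python
# Hedged sketch: frequent vs. distinctive stems from a topic-word matrix.
import numpy as np

vocab = np.array(["help", "tutor", "telescop", "astronom", "student", "read"])
# rows: topics; columns: P(word | topic); toy values that sum to 1 per row
phi = np.array([
    [0.40, 0.30, 0.02, 0.03, 0.20, 0.05],  # a "tutoring"-like topic
    [0.05, 0.02, 0.35, 0.33, 0.05, 0.20],  # a "seeking answers"-like topic
])

corpus_p = phi.mean(axis=0)  # average word probability across topics
for k, row in enumerate(phi):
    frequent = vocab[np.argsort(row)[::-1][:3]]      # top P(word | topic)
    lift = row / corpus_p
    distinctive = vocab[np.argsort(lift)[::-1][:3]]  # top lift
    print(k, list(frequent), list(distinctive))
```

Lift-style rankings explain why a stem like "help" can be frequent in a tutoring topic without being its most distinctive marker.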
Fig. 2. Densities of correlations of essay content and style with SAT scores and household income. (A) By topics and (B) by dictionary features.
Out-of-sample prediction error for prediction of household income by topics, dictionary features, and SAT scores using 10-fold cross-validation (CV).

| Predictor | Estimate | Interval |
|---|---|---|
| SAT Composite | 0.119 | [0.115, 0.124] |
| SAT EBRW | 0.083 | [0.079, 0.087] |
| SAT Math | 0.120 | [0.115, 0.124] |
| Topics | 0.161 | [0.157, 0.167] |
| LIWC | 0.129 | [0.127, 0.136] |
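The table above rests on out-of-sample prediction under 10-fold CV. A minimal sketch of that evaluation loop, assuming a ridge regression on topic-proportion features; the features and outcome below are synthetic, and the paper's actual model specification may differ.

```python
# Hedged sketch: out-of-sample R^2 under 10-fold cross-validation.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 70))       # e.g. 70 topic proportions per applicant
beta = rng.normal(size=70)
y = X @ beta + rng.normal(scale=5.0, size=500)  # synthetic outcome

# Each applicant's prediction comes from a model fit on the other 9 folds
y_hat = cross_val_predict(Ridge(alpha=1.0), X, y, cv=10)
print(f"out-of-sample R^2 = {r2_score(y, y_hat):.3f}")
```

Because every prediction is made by a model that never saw that observation, the resulting statistic estimates generalization rather than in-sample fit.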
Out-of-sample prediction error for prediction of SAT scores by topics and by dictionary features using 10-fold CV.

| Predictor | Outcome | Estimate | Interval | Error |
|---|---|---|---|---|
| Topics | SAT Composite | 0.486 | [0.478, 0.489] | 124.87 |
| Topics | SAT EBRW | 0.428 | [0.419, 0.431] | 64.83 |
| Topics | SAT Math | 0.473 | [0.466, 0.477] | 74.34 |
| LIWC | SAT Composite | 0.436 | [0.428, 0.440] | 130.85 |
| LIWC | SAT EBRW | 0.369 | [0.362, 0.374] | 68.05 |
| LIWC | SAT Math | 0.405 | [0.399, 0.410] | 78.96 |
Fig. 3. R² of total SAT when stratified by household income decile, explained by topics (left) and dictionary features (right).
Fig. 4. Results from ldatuning suggesting 70 topics.
(Top) Algorithms that suggest a number of topics by minimizing certain statistical properties of the data and model; (bottom) algorithms that suggest a number of topics by maximizing such properties. The number of topics at which the four algorithms are closest (70) was chosen for our model.
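The topic-count search in Fig. 4 uses the R package ldatuning, which compares several fit metrics across candidate numbers of topics. A rough Python analogue of the same scan-and-compare idea is sketched below, using scikit-learn's LDA and its perplexity as the single criterion; the corpus is a toy stand-in, and a real scan would evaluate perplexity on held-out documents over a much wider range of candidates.

```python
# Hedged sketch: scan candidate topic counts and compare a fit criterion.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "tutor help student homework school",
    "telescope astronomy probe question research",
    "help people community difference volunteer",
    "read book question research answer",
] * 10
X = CountVectorizer().fit_transform(docs)

scores = {}
for k in (2, 3, 4):  # the paper's scan covers far larger candidate counts
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X)
    scores[k] = lda.perplexity(X)  # lower is better for this criterion
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

ldatuning aggregates four such metrics rather than one, which is why the paper selects the count where the algorithms agree most closely rather than a single minimum.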