Caitlin Tenison, Guangming Ling, Laura McCulla.
Abstract
In this paper we use historic score-reporting records and test-taker metadata to inform data-driven recommendations that support international students in their choice of undergraduate institutions for study in the United States. We investigate the use of Structural Topic Modeling (STM) as a context-aware, probabilistic recommendation method that uses test-takers' selections and metadata to model the latent space of college preferences. We present the model results from two perspectives: 1) to understand the impact of TOEFL score and test year on test-takers' preferences and choices and 2) to recommend to the test-taker additional undergraduate institutions for application consideration. We find that TOEFL scores can explain variance in the probability that test-takers belong to certain preference-groups and, by accounting for this, our system adjusts recommendations based on student score. We also find that the inclusion of year, while not significantly altering recommendations, does enable us to capture minor changes in the relative popularity of similar institutions. The performance of this model demonstrates the utility of this approach for providing students with personalized college recommendations and offers a useful baseline approach that can be extended with additional data sources.
© International Artificial Intelligence in Education Society 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Keywords: Collaborative filtering; International education; Recommender systems; Structural topic modeling; Undergraduate education
Year: 2022 PMID: 36035588 PMCID: PMC9390112 DOI: 10.1007/s40593-022-00307-0
Source DB: PubMed Journal: Int J Artif Intell Educ ISSN: 1560-4292
Fig. 1 Graphical illustration of the structural topic model. Grey nodes indicate observable variables. We use the term ‘topic’ to refer to the latent-preference groups estimated by the model
Fig. 2 Mean AUC for our held-out participants for models with 5 to 105 topics. Error bars reflect 1 standard deviation in the means across 5 hold-out validation folds. Colors reflect the 4 different covariate structures we tested
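The evaluation summarized in Fig. 2 can be illustrated with a minimal sketch. It assumes AUC is the probability that a held-out selection is scored above a non-selected institution, and that error bars use the sample standard deviation of the fold means; the record does not state either detail. The function names, scores, and fold values are hypothetical.

```python
from statistics import mean, stdev


def auc(scores, positives):
    """Probability that a held-out selection outranks a non-selected item.

    scores: dict mapping item -> model score (hypothetical values).
    positives: set of items the test-taker actually selected.
    Ties count as half a win.
    """
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_means):
    """Return (mean, sample standard deviation) of per-fold mean AUCs."""
    return mean(fold_means), stdev(fold_means)


# Illustrative numbers only; these are not results from the paper.
example_scores = {"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.1}
example_auc = auc(example_scores, {"A", "C"})
fold_mean, fold_sd = summarize_folds([0.80, 0.82, 0.78, 0.81, 0.79])
```

Each point in a plot like Fig. 2 would then be one `fold_mean` with an error bar of `fold_sd`, computed once per topic count and covariate structure.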
Table 1 Characterization of 10 select topics using descriptions from the Carnegie Classification and IPEDS (Indiana University Center for Postsecondary Research, n.d.; National Center for Education Statistics, 2021)
| Label | Characteristics | Associated Institutions (ranked by FREX score) |
|---|---|---|
| Land-Grant | Large public R1 institutions where bachelor’s degree majors are equally balanced between arts and sciences, and professional fields. | Univ. Connecticut, Ohio State, Indiana Univ. (Bloomington), Pennsylvania State, Univ. of Pittsburgh |
| Private R1 | Private, highly selective R1 institutions. The majority of students enrolled at these institutions are graduate students. | Stanford Univ., Massachusetts Inst. of Tech., Univ. of Pennsylvania, Harvard, Columbia |
| Tech. Focus | Primarily large public R1 institutions that are well known for strong science and engineering programs. | Univ. of Illinois (Urbana-Champaign), Georgia Tech., Univ. of Wisconsin (Madison), Univ. of Michigan (Ann Arbor), Rose-Hulman Inst. of Tech. |
| Public R1 | Large public R1 institutions where 60-79% of bachelor’s degree majors were in arts and sciences. | Univ. of Virginia, Univ. of North Carolina (Chapel Hill), Emory Univ., Washington Univ. (St. Louis), Univ. of Michigan (Ann Arbor) |
| U Cal. | Institutions within the University of California (UC) system. | UC San Diego, UC Santa Barbara, UC Davis, UC Irvine, UC Santa Cruz |
| Cal. State | Institutions within the California State (CS) University system. | CS Long Beach, CS Los Angeles, San Francisco State, San Jose State, CS Northridge |
| CA Comm. | Community Colleges located in Northern California. | De Anza Col., Foothill Col., Diablo Valley Col., Ohlone Col., Col. of San Mateo |
| Arts | Institutions focused on Art and Design. | Art Inst. Chicago, Maryland Inst. Col. of Art, Sch. of Visual Arts, Rhode Island Sch. of Design, Ringling Col. of Art and Design |
| Music | Music Conservatories (Cons.) and institutions with well known music programs. | Juilliard, New England Cons. of Music, Manhattan Sch. of Music, San Francisco Cons. of Music, Cleveland Inst. of Music |
| SLACs | Highly selective small Liberal Arts colleges (SLACs) and universities. | Gettysburg College, Connecticut Col., DePauw Univ., Trinity Col., Skidmore Col. |
Fig. 3 Network plot of the latent preference-groups. Text size indicates the proportion of the topic within the dataset; edge size indicates the correlation of topic-institution distributions between topics. Nodes characterized in Table 1 are labeled. We only show edges with correlations greater than 0.2
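The edge rule in the Fig. 3 caption can be sketched as follows. This assumes the correlation is a Pearson correlation between rows of a topic-by-institution probability matrix; the record does not say which correlation the authors used, and the matrix `beta` below is made up for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def topic_edges(beta, threshold=0.2):
    """Return {(i, j): r} for topic pairs whose correlation exceeds threshold.

    beta: list of rows, one per topic, each a distribution over institutions.
    """
    edges = {}
    for i, j in combinations(range(len(beta)), 2):
        r = pearson(beta[i], beta[j])
        if r > threshold:
            edges[(i, j)] = r
    return edges


# Hypothetical topic-institution distributions (3 topics, 4 institutions).
beta = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
edges = topic_edges(beta)
```

Only the first two topics share an edge here, because they concentrate probability on the same institutions; the third is negatively correlated with both and is left unconnected.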
Fig. 4 Test-takers’ TOEFL scores affect the expected proportion with which a given topic is represented in a test-taker’s selections. We show the expected topic proportions for the topics characterized in Table 1. Each topic is plotted as a smooth function of TOEFL score, with the shading around each line representing the 95% confidence interval
Fig. 5 A) Institution inclusion across years within the Land-Grant Topic and B) institutions within the Private R1 University Topic. Distance from the central dotted line indicates the scaled change in topic-institution distribution between 2015 and 2017. Font size is proportional to the institution’s occurrence within the data set, with institutions such as Ohio State occurring more frequently than Syracuse
Fig. 6 Item-based beyond-accuracy measures. A) Coverage: percent of institutions in the entire dataset recommended across the top 1-25 recommendations. B) Spread: entropy of recommendations across the top 1-25 recommendations. C) Novelty: the average normalized novelty of institutions within the first 1-25 recommended institutions
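The three measures named in Fig. 6 can be sketched under common textbook definitions. The exact formulas used in the paper are not given in this record, so the choices here are assumptions: coverage is a fraction of the catalog, spread is Shannon entropy in bits, and novelty is self-information normalized by the largest value in the catalog. All names and inputs are hypothetical.

```python
import math
from collections import Counter


def coverage(rec_lists, catalog_size):
    """Fraction of the catalog that appears in at least one recommendation list."""
    recommended = {item for recs in rec_lists for item in recs}
    return len(recommended) / catalog_size


def spread(rec_lists):
    """Shannon entropy (bits) of how often each item is recommended."""
    counts = Counter(item for recs in rec_lists for item in recs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def novelty(rec_lists, popularity):
    """Mean self-information of recommended items, scaled to [0, 1].

    popularity: dict mapping item -> share of users who selected it (0 < p <= 1).
    Assumes at least one item has popularity below 1.
    """
    max_info = max(-math.log2(p) for p in popularity.values())
    scores = [
        -math.log2(popularity[item]) / max_info
        for recs in rec_lists
        for item in recs
    ]
    return sum(scores) / len(scores)


# Illustrative inputs only: two users, top-2 lists, a catalog of four items.
rec_lists = [["A", "B"], ["A", "C"]]
popularity = {"A": 0.5, "B": 0.25, "C": 0.25, "D": 0.125}
cov = coverage(rec_lists, catalog_size=4)
ent = spread(rec_lists)
nov = novelty(rec_lists, popularity)
```

Truncating each list to its first k entries and recomputing for k from 1 to 25 would trace curves of the kind the caption describes.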
Top 5 recommendations for simulated students with varying TOEFL scores
| Student Selection | Rank | Recommended Institutions (TOEFL: 70) | Recommended Institutions (TOEFL: 100) |
|---|---|---|---|
| Cal. State Fresno; UC San Diego; De Anza Col. | 1 | San Jose State | U. Washington (Seattle) |
| | 2 | De Anza Col. | UC San Diego |
| | 3 | Santa Monica Col. | UC Irvine |
| | 4 | Cal. State (Long Beach) | UC Davis |
| | 5 | San Diego State | Boston Univ. |
| Swarthmore Col.; Williams Col.; Wellesley Col. | 1 | Tufts Univ. | Tufts Univ. |
| | 2 | Dartmouth Col. | Dartmouth Col. |
| | 3 | Wesleyan Univ. | Wesleyan Univ. |
| | 4 | Brown Univ. | Brown Univ. |
| | 5 | Colby Col. | Colby Col. |
| Johns Hopkins Univ. | 1 | Univ. of Cincinnati | Boston Univ. |
| | 2 | Oberlin Col. | Cornell Univ. |
| | 3 | Michigan State | UC Berkeley |
| | 4 | Arizona State | Univ. Pennsylvania |
| | 5 | Johns Hopkins Univ. | Univ. Illinois (Urbana-Champaign) |
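How a topic model might turn a test-taker's topic proportions into a ranked list like those above can be sketched briefly. This is an assumption-laden illustration, not the authors' implementation: it takes the topic proportions `theta` as given (in the paper they depend on selections, TOEFL score, and year), scores each institution by its expected probability under those proportions, and optionally skips institutions already selected. All names and numbers are hypothetical.

```python
def recommend(theta, beta, institutions, selected=(), top_n=5):
    """Rank institutions by expected probability under a topic mixture.

    theta: topic proportions for one test-taker (sums to 1).
    beta: one row per topic, each a distribution over institutions.
    institutions: names aligned with the columns of beta.
    selected: institutions to exclude from the ranking.
    """
    scores = {
        name: sum(t * topic[j] for t, topic in zip(theta, beta))
        for j, name in enumerate(institutions)
        if name not in selected
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical two-topic model over four placeholder institutions.
institutions = ["Inst A", "Inst B", "Inst C", "Inst D"]
beta = [
    [0.5, 0.3, 0.1, 0.1],  # topic 0
    [0.1, 0.1, 0.5, 0.3],  # topic 1
]
# Two simulated test-takers whose mixtures lean toward different topics.
low_score_recs = recommend([0.8, 0.2], beta, institutions, {"Inst A"}, top_n=2)
high_score_recs = recommend([0.2, 0.8], beta, institutions, {"Inst A"}, top_n=2)
```

With the same selection, shifting weight between topics changes the ranking, which mirrors how the TOEFL: 70 and TOEFL: 100 columns can differ for one set of selections.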