Abstract
With the high cost of the research assessment exercises in the UK, many have called for simpler and less time-consuming alternatives. In this work, we gathered publicly available REF data, combined them with library-subscribed data, and used machine learning to examine whether the overall result of the Research Excellence Framework 2014 could be replicated. A Bayesian additive regression tree model predicting university grade point average (GPA) from an initial set of 18 candidate explanatory variables was developed. One hundred and nine universities were randomly divided into a training set (n = 79) and test set (n = 30). The model "learned" associations between GPA and the other variables in the training set and was made to predict the GPA of universities in the test set. GPA could be predicted from just three variables: the number of Web of Science documents, entry tariff, and percentage of students coming from state schools (r-squared = .88). Implications of this finding are discussed and proposals are given.
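The split-and-evaluate procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the seed, and the placeholder university labels are assumptions, the Bayesian additive regression tree model itself is not shown, and the abstract does not say which definition of r-squared was used (the squared Pearson correlation is assumed here).

```python
import random


def split_train_test(items, n_test, seed=0):
    """Randomly partition items into (train, test), holding out n_test items.

    The seed is an assumption, added only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values.

    Assumed definition; the source does not state how r-squared was computed.
    """
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_a * var_p)


# Placeholder labels standing in for the 109 universities.
universities = [f"university_{i}" for i in range(109)]
train, test = split_train_test(universities, n_test=30)
print(len(train), len(test))
```

A model would be fitted on `train` only, and `r_squared` would then be applied to the actual and predicted GPAs of the universities in `test`.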
Year: 2018 PMID: 30475868 PMCID: PMC6258235 DOI: 10.1371/journal.pone.0207919
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Schema of the analysis procedure.
Table 1. Matrix of correlations in the training set (n = 79).
| | Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPA | 1.00 | ||||||||||||||||||
| 2 | Percentage of overseas (non-EU) postgraduates | 0.59 | 1.00 | |||||||||||||||||
| 3 | Citation impact | 0.50 | 0.52 | 1.00 | ||||||||||||||||
| 4 | h-index | 0.46 | 0.65 | 1.00 | ||||||||||||||||
| 5 | Mean graduate salary | 0.51 | 0.54 | 0.36 | 0.63 | 1.00 | ||||||||||||||
| 6 | Average entry tariff | 0.56 | 0.62 | 0.85 | 0.70 | 1.00 | ||||||||||||||
| 7 | Percentage of students with a disability | -0.32 | -0.42 | -0.22 | -0.32 | -0.54 | -0.39 | 1.00 | ||||||||||||
| 8 | Percentage of students with ADHD | -0.42 | -0.45 | -0.26 | -0.44 | -0.53 | -0.43 | 0.84 | 1.00 | |||||||||||
| 9 | Percentage of graduates employed | 0.27 | 0.08 | 0.07 | 0.29 | 0.12 | 0.38 | -0.08 | -0.06 | 1.00 | ||||||||||
| 10 | Percentage of students from classes 4 to 7* | -0.67 | -0.48 | -0.61 | -0.75 | -0.50 | -0.83 | 0.14 | 0.18 | -0.51 | 1.00 | |||||||||
| 11 | Number of FTE staff assessed | 0.66 | 0.44 | 0.57 | 0.59 | 0.81 | -0.30 | -0.38 | 0.26 | -0.69 | 1.00 | |||||||||
| 12 | Percentage faculty with PhDs | 0.49 | 0.55 | 0.75 | 0.48 | 0.75 | -0.23 | -0.37 | 0.20 | -0.68 | 0.66 | 1.00 | ||||||||
| 13 | Student satisfaction | 0.58 | 0.32 | 0.25 | 0.51 | 0.24 | 0.59 | -0.15 | -0.24 | 0.39 | -0.57 | 0.44 | 0.58 | 1.00 | ||||||
| 14 | Expenditure per student | 0.60 | 0.66 | 0.47 | 0.66 | 0.67 | 0.69 | -0.45 | -0.45 | 0.18 | -0.54 | 0.66 | 0.50 | 0.32 | 1.00 | |||||
| 15 | Student-to-staff ratio | -0.66 | -0.40 | -0.47 | -0.75 | -0.54 | -0.75 | 0.30 | 0.38 | -0.37 | 0.65 | -0.74 | -0.62 | -0.48 | -0.64 | 1.00 | ||||
| 16 | Career prospects | 0.69 | 0.48 | 0.38 | 0.69 | 0.74 | 0.81 | -0.40 | -0.42 | 0.48 | -0.68 | 0.63 | 0.67 | 0.51 | 0.65 | -0.70 | 1.00 | |||
| 17 | University income | 0.63 | 0.42 | 0.57 | 0.62 | 0.78 | -0.33 | -0.43 | 0.21 | -0.63 | 0.96 | 0.62 | 0.35 | 0.67 | -0.70 | 0.62 | 1.00 | |||
| 18 | Students from State schools | -0.67 | -0.52 | -0.59 | -0.84 | -0.71 | - | 0.29 | 0.29 | -0.30 | 0.82 | -0.83 | -0.67 | -0.49 | -0.64 | 0.68 | -0.73 | -0.81 | 1.00 | |
| 19 | Web of Science documents | 0.60 | 0.39 | 0.55 | 0.61 | 0.78 | -0.29 | -0.38 | 0.23 | -0.65 | 0.96 | 0.62 | 0.39 | 0.63 | -0.72 | 0.62 | -0.82 | 1.00 |
Notes: All p's < .01. Figures appearing in bold font are discussed in the text. The asterisk (*) marks social classes 4 to 7, which include small employers and own-account workers, lower supervisory and technical occupations, and semi-routine and routine occupations.
Fig 2. Variables ranked by order of predictive importance for GPA in the REF.
Green lines indicate thresholds for inclusion in the model. A solid dot above a green line indicates that the variable was selected as a significant predictor of GPA.
Table 2. Actual and predicted GPAs and ranks in the testing subset (n = 30).
| Institution | Actual GPA | Predicted GPA | Actual rank | Predicted rank | Rank difference (actual minus predicted) |
|---|---|---|---|---|---|
| Cambridge* | 3.33 | 3.15 | 1 | 2 | -1 |
| York* | 3.17 | 3.12 | 2 | 8 | -6 |
| Durham* | 3.14 | 3.14 | 3 | 5 | -2 |
| St Andrews | 3.13 | 3.15 | 4 | 2 | 2 |
| Leeds* | 3.13 | 3.13 | 4 | 7 | -3 |
| Newcastle* | 3.09 | 3.21 | 6 | 1 | 5 |
| Nottingham* | 3.09 | 3.14 | 6 | 5 | 1 |
| Birmingham* | 3.07 | 3.15 | 8 | 2 | 6 |
| Strathclyde | 3.04 | 3.01 | 9 | 11 | -2 |
| Aberdeen | 2.97 | 3.05 | 10 | 10 | 0 |
| Leicester | 2.93 | 3.08 | 11 | 9 | 2 |
| Brighton | 2.84 | 2.85 | 12 | 13 | -1 |
| Roehampton | 2.83 | 2.59 | 13 | 23 | -10 |
| SOAS | 2.82 | 3.00 | 14 | 12 | 2 |
| Westminster | 2.72 | 2.71 | 15 | 16 | -1 |
| West of England | 2.70 | 2.78 | 16 | 15 | 1 |
| Kingston | 2.70 | 2.65 | 16 | 19 | -3 |
| Coventry | 2.67 | 2.71 | 18 | 16 | 2 |
| Glasgow Caledonian | 2.67 | 2.62 | 18 | 20 | -2 |
| Oxford Brookes | 2.66 | 2.82 | 20 | 14 | 6 |
| Queen Margaret | 2.65 | 2.38 | 21 | 27 | -6 |
| Middlesex | 2.58 | 2.60 | 22 | 21 | 1 |
| Lincoln | 2.54 | 2.53 | 23 | 24 | -1 |
| Edinburgh Napier | 2.52 | 2.60 | 24 | 21 | 3 |
| Central Lancashire | 2.51 | 2.69 | 25 | 18 | 7 |
| London Met | 2.44 | 2.45 | 26 | 26 | 0 |
| Wales Trinity | 2.39 | 2.37 | 27 | 28 | -1 |
| Anglia Ruskin | 2.37 | 2.46 | 28 | 25 | 3 |
| Bucks New | 2.19 | 2.06 | 29 | 30 | -1 |
| Glyndwr | 2.15 | 2.22 | 30 | 29 | 1 |
An asterisk (*) indicates a Russell Group university.
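The rank columns in the table above can be reproduced from the GPA columns. The sketch below assumes standard competition ranking (tied values share a rank and the next rank is skipped), which is inferred from the ties in the table rather than stated in the source, and takes the rank difference as actual rank minus predicted rank. The function name is hypothetical, and only the eight highest-GPA rows are used; these eight also have the eight highest predicted GPAs, so their ranks are unaffected by the other rows.

```python
def competition_rank(values):
    """Rank values in descending order; ties share a rank (1, 2, 2, 4, ...)."""
    return [1 + sum(other > v for other in values) for v in values]


# (institution, actual GPA, predicted GPA), first eight rows of the table.
rows = [
    ("Cambridge", 3.33, 3.15),
    ("York", 3.17, 3.12),
    ("Durham", 3.14, 3.14),
    ("St Andrews", 3.13, 3.15),
    ("Leeds", 3.13, 3.13),
    ("Newcastle", 3.09, 3.21),
    ("Nottingham", 3.09, 3.14),
    ("Birmingham", 3.07, 3.15),
]

actual_rank = competition_rank([actual for _, actual, _ in rows])
predicted_rank = competition_rank([predicted for _, _, predicted in rows])
# Negative values mean the model ranked the university lower than the REF did.
rank_diff = [a - p for a, p in zip(actual_rank, predicted_rank)]

for (name, _, _), a, p, d in zip(rows, actual_rank, predicted_rank, rank_diff):
    print(f"{name}: actual rank {a}, predicted rank {p}, difference {d}")
```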
Fig 3. Predicted vs actual GPAs in the testing subset.
Some universities are not displayed for clarity. See Table 2 for the complete list of universities in the test set with their actual and predicted GPAs.