Yeow Chong Goh, Xin Qing Cai, Walter Theseira, Giovanni Ko, Khiam Aik Khor.
Abstract
We study whether humans or machine learning (ML) classification models are better at classifying scientific research abstracts according to a fixed set of discipline groups. We recruit both undergraduate and postgraduate assistants for this task in separate stages, and compare their performance against a support vector machine (SVM) ML algorithm at classifying European Research Council Starting Grant project abstracts to their actual evaluation panels, which are organised by discipline groups. On average, ML is more accurate than human classifiers, across a variety of training and test datasets, and across evaluation panels. ML classifiers trained on different training sets are also more reliable than human classifiers, meaning that different ML classifiers are more consistent in assigning the same classifications to any given abstract than different human classifiers are. While human classifiers in the top five percentiles can outperform ML in limited cases, selecting and training such classifiers is likely costly and difficult compared to training ML models. Our results suggest that ML models are a cost-effective and highly accurate method for addressing problems in comparative bibliometric analysis, such as harmonising the discipline classifications of research from different funding agencies or countries.
Keywords: Discipline classification; Supervised classification; Text classification
Year: 2020 PMID: 32836529 PMCID: PMC7367789 DOI: 10.1007/s11192-020-03614-2
Source DB: PubMed Journal: Scientometrics ISSN: 0138-9130 Impact factor: 3.238
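The paper's core comparison pits human raters against an SVM trained on abstract text. A minimal sketch of such a setup, using TF-IDF features and a linear SVM via scikit-learn; the example abstracts and panel labels below are illustrative stand-ins, not data from the study:

```python
# Sketch of SVM-based abstract classification: TF-IDF features + linear SVM.
# Panel codes (PE1, LS6) follow the ERC scheme; the abstracts are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

abstracts = [
    "We prove new bounds on prime gaps using sieve methods.",                 # PE1
    "Spectral theory of random matrices with number-theoretic applications.", # PE1
    "CRISPR screening reveals regulators of T-cell immunity.",                # LS6
    "Host-pathogen interactions during chronic viral infection.",             # LS6
]
panels = ["PE1", "PE1", "LS6", "LS6"]

# Fit the pipeline: vectorise abstracts, then train the linear SVM.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(abstracts, panels)

# Classify a new, unseen abstract into one of the trained panels.
print(clf.predict(["Analytic methods for Diophantine equations."])[0])
```

In the actual study the model would be trained on hundreds of labelled ERC abstracts per training set (see the table below) and evaluated on held-out test abstracts.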
Codes and titles of ERC evaluation panels
| Code | Title | Code | Title |
|---|---|---|---|
| PE1 | Mathematics | LS1 | Molecular Biology, Biochemistry, Structural Biology and Molecular Biophysics |
| PE2 | Fundamental Constituents of Matter | LS2 | Genetics, ‘Omics’, Bioinformatics and Systems Biology |
| PE3 | Condensed Matter Physics | LS3 | Cellular and Developmental Biology |
| PE4 | Physical and Analytical Chemical Sciences | LS4 | Physiology, Pathophysiology and Endocrinology |
| PE5 | Synthetic Chemistry and Materials | LS5 | Neuroscience and Neural Disorders |
| PE6 | Computer Science and Informatics | LS6 | Immunity and Infection |
| PE7 | Systems and Communication Engineering | LS7 | Applied Medical Technologies, Diagnostics, Therapies and Public Health |
| PE8 | Products and Processes Engineering | LS8 | Ecology, Evolution and Environmental Biology |
| PE9 | Universe Sciences | LS9 | Applied Life Sciences, Biotechnology, and Molecular and Biosystems Engineering |
| PE10 | Earth System Science | | |
Summary of numbers of classifiers and sizes of training and test sets
| Stage | Human classifiers^a | Training set code | Training set abstracts | Common test set | Individual test set |
|---|---|---|---|---|---|
| Undergraduates | 16 | A | 380 | 247 | 0 |
| | 16 | B | 380 | | |
| | 15 | C | 190 | | |
| | 15 | D | 190 | | |
| High-performance undergraduates | 7 | A | 380 | 95 | 152 |
| | 8 | B | 380 | | |
| | 7 | C | 190 | | |
| | 8 | D | 190 | | |
| High-performance undergraduates after feedback | 7 | A | 380 | 95 | 152 |
| | 8 | B | 380 | | |
| | 7 | C | 190 | | |
| | 8 | D | 190 | | |
| Postgraduates | 26 | – | – | 95 | 152 |
^a Classifiers that are excluded during analysis are not counted (see “Data Exclusions”)
Fig. 1: F1 scores for human and ML classifiers across each stage and training set
Fig. 2: F1 scores of high-performance undergraduate classifiers after feedback and the corresponding ML classifiers
Fig. 3: F1 scores of postgraduate classifiers and the corresponding ML classifiers
Fig. 4: Comparison of undergraduate versus ML classification performance by panel
Fig. 5: Comparison of postgraduate versus ML classification performance by panel
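The figures above report F1 scores per classifier. As a refresher, a pure-Python sketch of one common multi-class variant, macro-averaged F1 (equal weight per class, here per evaluation panel); whether the authors used macro- or micro-averaging is not stated in this record:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class harmonic mean of precision and recall,
    averaged with equal weight per class (here, per evaluation panel)."""
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

# Toy example: two panels, four abstracts; one PE1 abstract misclassified.
print(macro_f1(["PE1", "PE1", "LS6", "LS6"],
               ["PE1", "LS6", "LS6", "LS6"]))  # ≈ 0.733
```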
Fleiss’ κ of human versus ML classifiers
| Stage | Training set | Human κ | SE | 95% CI | ML κ | SE | 95% CI |
|---|---|---|---|---|---|---|---|
| Undergraduates | All | 0.363 | 0.000 | [0.362, 0.364] | 0.515 | 0.007 | [0.501, 0.530] |
| | A | 0.375 | 0.001 | [0.372, 0.378] | | | |
| | B | 0.373 | 0.002 | [0.370, 0.376] | | | |
| | C | 0.371 | 0.002 | [0.368, 0.374] | | | |
| | D | 0.376 | 0.002 | [0.373, 0.379] | | | |
| High-performance undergraduates | All | 0.395 | 0.001 | [0.393, 0.398] | 0.540 | 0.010 | [0.521, 0.560] |
| | A | 0.416 | 0.006 | [0.405, 0.427] | | | |
| | B | 0.398 | 0.005 | [0.389, 0.407] | | | |
| | C | 0.443 | 0.005 | [0.432, 0.453] | | | |
| | D | 0.361 | 0.005 | [0.352, 0.370] | | | |
| High-performance undergraduates after feedback | All | 0.391 | 0.001 | [0.388, 0.393] | 0.513 | 0.010 | [0.493, 0.533] |
| | A | 0.381 | 0.006 | [0.370, 0.392] | | | |
| | B | 0.377 | 0.005 | [0.368, 0.387] | | | |
| | C | 0.396 | 0.005 | [0.385, 0.406] | | | |
| | D | 0.407 | 0.005 | [0.397, 0.416] | | | |
| Postgraduates | – | 0.405 | 0.001 | [0.402, 0.407] | 0.913 | 0.001 | [0.910, 0.915] |
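The reliability comparison above uses Fleiss' κ, which measures agreement among a fixed number of raters assigning subjects to categories, corrected for chance. A minimal pure-Python sketch of the standard formula (how the authors tabulated rater responses into counts is an assumption here):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    assigning subject i (an abstract) to category j (a panel). Every subject
    must be rated by the same number of raters."""
    N = len(counts)        # number of subjects
    n = sum(counts[0])     # ratings per subject
    k = len(counts[0])     # number of categories
    # Mean observed agreement: average of per-subject agreement P_i.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts
    ) / N
    # Chance agreement from the marginal category proportions.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement among 3 raters on 2 abstracts and 2 panels:
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```

The κ values in the table were presumably computed over each group of classifiers' panel assignments, with standard errors and confidence intervals from a resampling or analytic variance estimate.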
Distribution of gender, subject area, and level of academic experience of participants
| | All undergraduates | All postgraduates |
|---|---|---|
| *Gender* | | |
| Male | 34 | 18 |
| Female | 29 | 8 |
| *Subject area* | | |
| Health sciences | – | 1 |
| Life sciences | 13 | 3 |
| Physical sciences | 46 | 22 |
| Social sciences | 4 | – |
| *Level of academic experience* | | |
| 1st | 17 | 4 |
| 2nd | 27 | 3 |
| 3rd | 19 | 3 |
| 4th | – | 5 |
| 5th | – | 7 |
| Postdoc | – | 4 |
| Total | 63 | 26 |
Distribution of detailed subject areas of participants
| Subject area | All undergraduates | All postgraduates |
|---|---|---|
| Biological sciences | 4 | – |
| Biochemistry | 9 | 1 |
| Business and management | 3 | – |
| Chemistry | 1 | – |
| Computer science | 4 | 3 |
| Earth sciences | – | 1 |
| Engineering | 25 | 11 |
| Environmental science | 2 | – |
| Materials science | 3 | 5 |
| Mathematics | 8 | 1 |
| Medicine | – | 1 |
| Neuroscience | – | 2 |
| Physics | 3 | 1 |
| Social sciences | 1 | – |
| Total | 63 | 26 |