Paul J Albert, Sarbajit Dutta, Jie Lin, Zimeng Zhu, Michael Bales, Stephen B Johnson, Mohammad Mansour, Drew Wright, Terrie R Wheeler, Curtis L Cole.
Abstract
Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves, who are practically unable to commit the effort necessary to keep the data accurate. In relying exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but, for them to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian's approach for generating a scholar's list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally maintained identity data such as name of department and year of terminal degree to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter uses: up to 12 types of commonly available identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to more accurately provide feedback on behalf of scholars. To help users more efficiently curate publication lists, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at an academic private medical center, ReCiter correctly predicted 98% of their publications in PubMed.
Year: 2021 PMID: 33793563 PMCID: PMC8016248 DOI: 10.1371/journal.pone.0244641
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
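The abstract describes combining identity-evidence scores into a total that is compared against a user-supplied threshold. The sketch below illustrates that idea only; the feature names, weights, evidence values, and linear-combination rule are all assumptions for illustration, not ReCiter's actual feature set or SVM-optimized weights.

```python
from typing import Dict

# Hypothetical identity-evidence features and weights; ReCiter's real feature
# set (up to 12 identity-data types, cluster feedback, cluster average score)
# and its SVM-optimized weights are not reproduced here.
WEIGHTS: Dict[str, float] = {
    "name_match": 2.0,
    "department": 1.5,
    "degree_year": 1.0,
    "cluster_feedback": 2.5,
    "cluster_avg_score": 1.0,
}

def total_score(evidence: Dict[str, float]) -> float:
    """Weighted sum of per-feature evidence scores for one candidate article."""
    return sum(WEIGHTS[name] * value for name, value in evidence.items())

# One candidate article's (made-up) evidence scores, each in [0, 1].
article = {
    "name_match": 1.0,
    "department": 0.8,
    "degree_year": 0.5,
    "cluster_feedback": 0.9,
    "cluster_avg_score": 0.6,
}

THRESHOLD = 5.0
suggested = total_score(article) >= THRESHOLD  # suggest to a curator when True
```

Here `total_score(article)` is 6.55, which clears the illustrative threshold of 5.0, so the article would be suggested for curation.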
Significance of accuracy designations in Feature Generator API output.
| | Computed total standardized article score (1–10) is greater than or equal to the user-supplied threshold (1–10) | Computed total standardized article score (1–10) is less than the user-supplied threshold (1–10) |
|---|---|---|
| User feedback recorded in the GoldStandard table is ACCEPTED | True positive (TP) | False negative (FN) |
| User feedback recorded in the GoldStandard table is REJECTED or NULL | False positive (FP) | True negative (TN) |
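The designations in the table above amount to a simple classification rule. A minimal sketch in Python, following the table's logic (the function name and the string labels for feedback are assumptions; only the score/threshold comparison and the ACCEPTED vs. REJECTED/NULL split come from the table):

```python
from typing import Optional

def designation(score: float, threshold: float, feedback: Optional[str]) -> str:
    """Map a standardized article score (1-10) and GoldStandard feedback to a
    confusion-matrix designation, following the table above."""
    predicted = score >= threshold        # score meets the user-supplied threshold
    accepted = feedback == "ACCEPTED"     # REJECTED or NULL count as not accepted
    if predicted:
        return "TP" if accepted else "FP"
    return "FN" if accepted else "TN"
```

For example, a score of 7.2 against a threshold of 5.0 yields "TP" for an ACCEPTED article and "FP" when feedback is NULL.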
Weights were optimized to maximize accuracy as computed after first pooling all articles.
Accuracy of ReCiter.
| Metric | Pool all articles, then compute accuracy (PubMed + Scopus) | Pool all articles, then compute accuracy (PubMed only) | Compute accuracy for individuals, then average (PubMed + Scopus) | Compute accuracy for individuals, then average (PubMed only) |
|---|---|---|---|---|
| Accuracy | 0.9826 | 0.9795 | 0.9363 | 0.9231 |
| Balanced accuracy | 0.9487 | 0.9658 | 0.8505 | 0.8424 |
| Precision | 0.8706 | 0.8539 | 0.8299 | 0.7913 |
| Recall | 0.9040 | 0.9489 | 0.8824 | 0.8786 |
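The table distinguishes two aggregation strategies: pooling every article's confusion counts before computing metrics, versus computing metrics per scholar and then averaging. The sketch below shows how the two strategies differ, using standard metric definitions; the per-scholar count tuples are a made-up illustration, not the study's data.

```python
from typing import Dict, List, Tuple

def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard metrics from one confusion matrix."""
    recall = tp / (tp + fn) if tp + fn else 0.0       # true-positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced accuracy": (recall + specificity) / 2,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": recall,
    }

Counts = List[Tuple[int, int, int, int]]  # (tp, fp, fn, tn) per scholar

def pooled(counts: Counts) -> Dict[str, float]:
    """Pool all articles across scholars, then compute metrics once."""
    tp, fp, fn, tn = (sum(col) for col in zip(*counts))
    return metrics(tp, fp, fn, tn)

def per_individual_mean(counts: Counts) -> Dict[str, float]:
    """Compute metrics for each scholar, then average across scholars."""
    per_scholar = [metrics(*c) for c in counts]
    return {key: sum(m[key] for m in per_scholar) / len(per_scholar)
            for key in per_scholar[0]}
```

Pooling weights prolific authors more heavily, while the per-individual average gives each scholar equal weight, which is why the two columns in the table can diverge.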