David Moher, Florian Naudet, Ioana A Cristea, Frank Miedema, John P A Ioannidis, Steven N Goodman.
Abstract
Assessment of researchers is necessary for decisions of hiring, promotion, and tenure. A burgeoning number of scientific leaders believe the current system of faculty incentives and rewards is misaligned with the needs of society and disconnected from the evidence about the causes of the reproducibility crisis and suboptimal quality of the scientific publication record. To address this issue, particularly for the clinical and life sciences, we convened a 22-member expert panel workshop in Washington, DC, in January 2017. Twenty-two academic leaders, funders, and scientists participated in the meeting. As background for the meeting, we completed a selective literature review of 22 key documents critiquing the current incentive system. From each document, we extracted how the authors perceived the problems of assessing science and scientists, the unintended consequences of maintaining the status quo for assessing scientists, and details of their proposed solutions. The resulting table was used as a seed for participant discussion. This resulted in six principles for assessing scientists and associated research and policy implications. We hope the content of this paper will serve as a basis for establishing best practices and redesigning the current approaches to assessing scientists by the many players involved in that process.
Year: 2018 PMID: 29596415 PMCID: PMC5892914 DOI: 10.1371/journal.pbio.2004089
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
A list of sources examining the problems, potential unanticipated consequences, proposed solutions, and potential limitations when assessing science and scientists.
| Source | Geographic region | The stated perspective of the problems assessing science and scientists | Unintended consequences of maintaining the current assessment scheme | Proposed solutions for assessing scientists | Potential limitations of proposal |
|---|---|---|---|---|---|
| ACUMEN (2014) [ | Europe | The ACUMEN consortium was focused on ‘understanding the ways in which researchers are evaluated by their peers and by institutions, and at assessing how the science system can be improved and enhanced’. They noted that, ‘Currently, there is a discrepancy between the criteria used in performance assessment and the broader social and economic function of scientific and scholarly research’. | The report points to five problems: 1. ‘Evaluation criteria are still dominated by mono-disciplinary measures, which reflect an important but limited number of dimensions of the quality and relevance of scientific and scholarly work’; | The consortium has developed criteria and guidelines for GEP and has designed a prototype for a web-based ACUMEN performance portfolio. The GEP focuses on three indicators for academic assessment: (1) expertise, (2) outputs, and (3) impacts. The performance portfolio is divided into four parts: (1) narrative and academic age calculation, (2) expertise, (3) output, and (4) influence. | The ACUMEN portfolio stresses the inclusion of an evidence-based narrative, in which researchers tell ‘their story’. The risk is that researchers might be nudged to mold or distort their work and achievements to make them fit in a compelling narrative. Moreover, the already prevalent self-marketing techniques (i.e., how to best ‘sell’ yourself) risk becoming even more pervasive and interfering with the quality of research. Some valuable researchers might not have a coherent story to tell nor be adept narrators. |
| Amsterdam Call for Action on Open Science (2016) [ | Europe | Participants at the Open Science – From Vision to Action conference noted that ‘Open science presents the opportunity to radically change the way we evaluate, reward, and incentivise science. Its goal is to accelerate scientific progress and enhance the impact of science for the benefit of society’. | Conference participants argue that maintaining the current scheme will continue to create a climate of disconnect between old-world bibliometrics and newer approaches, such as a commitment to open science: ‘This emphasis does not correspond with our goals to achieve societal impact alongside scientific impact’. | The conference participants’ vision is for ‘New assessment, reward and evaluation systems. New systems that really deal with the core of knowledge creation and account for the impact of scientific research on science and society at large, including the economy, and incentivise citizen science’. To reach this goal, the Call for Action recommends action in four areas: (1) complete OA, (2) data sharing, (3) new ways of assessing scientists, and (4) introducing evidence to inform best practices for the first three themes. Twelve action items are proposed: to (1) change assessment, evaluation, and reward systems in science; (2) facilitate text and data mining of content; (3) improve insight into IPR and issues such as privacy; (4) create transparency on the costs and conditions of academic communication; (5) introduce FAIR and secure data principles; (6) set up common e-infrastructures; (7) adopt OA principles; (8) stimulate new publishing models for knowledge transfer; (9) stimulate evidence-based research on innovations in open science; (10) develop, implement, monitor, and refine OA plans; (11) involve researchers and new users in open science; and (12) encourage stakeholders to share expertise and information on open science. | Although the Amsterdam Call provides several actionable steps stakeholders can take, some of the actions face barriers to implementation. For example, to change assessment, evaluation, and reward schemes, one action is to sign DORA, although few organizations have done so. Similarly, even if hiring, promotion, and tenure committees wanted to move away from focusing solely on JIFs, it is unclear whether they could do so without the broader research institution agreeing to such modifications in assessing scientists. |
| DORA (2012) [ | International | At the 2012 annual meeting of the American Society for Cell Biology, a group of publishers and editors noted ‘a pressing need to improve the ways in which the output of scientific research is evaluated by funding agencies, academic institutions, and other parties’. In response, the San Francisco DORA was produced. | DORA points to the critical problems with using JIF as a measure of a scientist’s worth: ‘the Journal Impact Factor has a number of well-documented deficiencies as a tool for research assessment’. | DORA has one general recommendation (do not use journal-based metrics, such as JIFs, as surrogate measures of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions) and 17 specific recommendations. For researchers: (1) focus on content; (2) cite primary literature; (3) use a range of metrics to show the impact of your work; (4) change the culture. For funders: (5) state that scientific content of a paper, not the JIF of the journal in which it was published, is what matters; (6) consider value from all outputs and outcomes generated by research. For research institutions: (7) when hiring and promoting, state that scientific content of a paper, not the JIF of the journal in which it was published, is what matters; (8) consider value from all outputs and outcomes generated by research. For publishers: (9) cease to promote journals by impact factor; (10) provide an array of metrics; (11) focus on article-level metrics; (12) identify different author contributions, open the bibliometric citation data; (13) encourage primary literature citations. For organizations that supply metrics: (14) be transparent; (15) provide access to data; (16) discourage data manipulation; and (17) provide different metrics for primary literature and reviews. | There is a focus on citing primary research. Within medicine, citing systematic reviews is often preferred. A clarification on this point would be useful. DORA is silent on how stakeholders should optimally implement their recommendations. Similarly, DORA does not provide guidance on whether (and how) hiring, promotion, and tenure committees should monitor adherence to and implementation of their recommendations. |
| The Leiden Manifesto (2015) [ | International | At the 2014 International Conference on Science and Technology Indicators in Leiden, a group of scientometricians met to discuss how data are being used to govern science, including evaluating scientists. The manifesto authors observed that ‘research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics’. | ‘The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied’. | The Leiden Manifesto proposes 10 best practices: (1) quantitative evaluation should support qualitative, expert assessment; (2) measure performance against the research missions of the institution, group, or researcher; (3) protect excellence in locally relevant research; (4) keep data collection and analytical processes open, transparent, and simple; (5) allow those evaluated to verify data and analysis; (6) account for variation by field in publication and citation practices; (7) base assessment of individual researchers on a qualitative judgment of their portfolio; (8) avoid misplaced concreteness and false precision; (9) recognise the systemic effects of assessment and indicators; and (10) scrutinise indicators regularly and update them. Abiding by these 10 principles, research evaluation can play an important part in the development of science and its interactions with society. | The focus of the Leiden Manifesto is on research metrics. Beyond the focus on metrics, it is unclear what the broader scientific community thinks of these principles. Similarly, it is not clear how promotion and tenure committees might best implement them. For example, although scientists should be allowed to review and verify their evaluation data (principle 5), it is unclear how adherence to this could be easily monitored. Should there, for instance, be audit and feedback about each principle? |
| Wilsdon (The Metric Tide) (2015) [ | United Kingdom | The report was initiated to evaluate the role of metrics in research assessment and management as part of the UK’s REF. The report takes a ‘deeper look at potential uses and limitations of research metrics and indicators. It has explored the use of metrics across different disciplines, and assessed their potential contribution to the development of research excellence and impact’. | The report raises concerns ‘that some quantitative indicators can be gamed, or can lead to unintended consequences; journal impact factors and citation counts are two prominent examples’. | The report proposes five attributes to improve the assessment of researchers: (1) robustness, (2) humility, (3) transparency, (4) diversity, and (5) reflexivity. The report also makes 20 recommendations dealing with a broad spectrum of issues related to research assessment for stakeholders to consider: (1) the research community should develop a more sophisticated and nuanced approach to the contribution and limitations of quantitative indicators; (2) at an institutional level, higher education institution leaders should develop a clear statement of principles on their approach to research management and assessment, including the role of quantitative indicators; (3) research managers and administrators should champion these principles and the use of responsible metrics within their institutions; (4) human resources managers and recruitment or promotion panels in higher education institutions should be explicit about the criteria used for academic appointment and promotion decisions; (5) individual researchers should be mindful of the limitations of particular indicators; (6) research funders should develop their own context-specific principles for the use of quantitative indicators in research assessment and management; (7) data providers, analysts, and producers of university rankings and league tables should strive for greater transparency and interoperability between different measurement systems; (8) publishers should reduce emphasis on JIFs as a promotional tool, and only use them in the context of a variety of journal-based metrics that provide a richer view of performance; (9) there is a need for greater transparency and openness in research data infrastructure; (10) a set of principles should be developed for technologies, practices, and cultures that can support open, trustworthy research information management; (11) the UK research system should take full advantage of ORCID as its preferred system of unique identifiers. ORCID IDs should be mandatory for all researchers in the next REF; (12) identifiers are also needed for institutions, and the most likely candidate for a global solution is the ISNI, which already has good coverage of publishers, funders, and research organizations; (13) publishers should mandate ORCID IDs and ISNIs and funder grant references for article submission and retain this metadata throughout the publication life cycle; (14) the use of DOIs should be extended to cover all research outputs; (15) further investment in research information infrastructure is required; (16) HEFCE, funders, HEIs, and Jisc should explore how to leverage data held in existing platforms to support the REF process, and vice versa; (17) BIS should identify ways of linking data gathered from research-related platforms (including Gateway to Research, Researchfish, and the REF) more directly to policy processes in BIS and other departments; in assessing outputs, we recommend that quantitative data, particularly around published outputs, continue to have a place in informing peer-review judgements of research quality; in assessing impact, we recommend that HEFCE and the UK HE funding bodies build on the analysis of the impact case studies from REF2014 to develop clear guidelines for the use of quantitative indicators in future impact case studies; in assessing the research environment, we recommend that there is scope for enhancing the use of quantitative data but that these data need to be provided with sufficient context to enable their interpretation; (18) the UK research community needs a mechanism to carry forward the agenda set out in this report; (19) the establishment of a Forum for Responsible Metrics, which would bring together research funders, HEIs and their representative bodies, publishers, data providers, and others to work on issues of data standards, interoperability, openness, and transparency; research funders need to increase investment in the science of science policy; and (20) one positive aspect of this review has been the debate it has generated. As a legacy initiative, the steering group is setting up a blog ( | The Metric Tide, although independent, was commissioned by the Minister for Universities and Science in 2014, in part to inform the REF used by universities across the country. Although the recommendations are sound, some of them might be perceived as being UK-centric. To what degree the recommendations might apply and be implemented on a global scale is unclear. |
| NAS (2015) [ | United States | The US NAS and the Annenberg Retreat at Sunnylands convened this group of senior scientists ‘to examine ways to remove some of the current disincentives to high standards of integrity in science’. Incentives and rewards in academic promotion were included in this examination. | The authors indicate that if the current system does not evolve, there will be serious threats to the credibility of science. They state, ‘If science is to enhance its capacities to improve our understanding of ourselves and our world, protect the hard-earned trust and esteem in which society holds it, and preserve its role as a driver of our economy, scientists must safeguard its rigor and reliability in the face of challenges posed by a research ecosystem that is evolving in dramatic and sometimes unsettling ways’. | The authors ‘believe that incentives should be changed so that scholars are rewarded for publishing well rather than often. In tenure cases at universities, as in grant submissions, the candidate should be evaluated on the importance of a select set of work, instead of using the number of publications or impact rating of a journal as a surrogate for quality’. | Because new incentives could be potentially damaging, authors ‘urge that each be scrutinized and evaluated before being broadly implemented’. |
| Nuffield Council on Bioethics (2014) [ | UK | The Nuffield’s Culture of Scientific Research in the UK report ‘aimed to inform and advance debate about the ethical consequences of the culture of scientific research in terms of encouraging good research practice and the production of high quality science’. | The feedback received cautioned against maintaining the focus on JIFs: ‘This is believed to be resulting in important research not being published, disincentives for multidisciplinary research, authorship issues, and a lack of recognition for non-article research outputs’. | The report suggested actions for different stakeholders (funders, publishers and editors, research institutions, researchers, and learned society and professional bodies) to consider, principally, (1) improving transparency; (2) improving the peer-review process (e.g., by training); (3) cultivating an environment based on the ethics of research; (4) assessing broadly the track records of researchers and fellow researchers; (5) involving researchers in policy making in a dialogue with other stakeholders; and (6) promoting standards for high-quality science. | While the authors suggested different actions for various stakeholders, they emphasised that ‘a collective and coordinated approach is likely to be the most effective’. Such collaborative actions may be difficult to operationalise and implement. |
| REWARD (2014) [ | Multinational | The Lancet commissioned a series, ‘Increasing value: Reducing Waste’, and follow-up conference to address the credibility of scientific research. The commissioning editors asked whether ‘the fault lie with myopic university administrations led astray by perverse incentives or with journals that put profit and publicity above quality?’ | If the current bibliometric system is maintained, there is a real risk that scientists will be ‘judged on the basis of the impact factors of the journals in which their work is published’. Impact factors are weakly correlated with quality. | The REWARD series makes 17 recommendations covering a broad spectrum of stakeholders: (1) more research on research should be done to identify factors associated with successful replication of basic research and translation to application in healthcare and how to achieve the most productive ratio of basic to applied research; (2) research funders should make information available about how they decide what research to support and fund investigations of the effects of initiatives to engage potential users of research in research prioritisation; (3) research funders and regulators should demand that proposals for additional primary research are justified by systematic reviews showing what is already known, and increase funding for the required syntheses of existing evidence; (4) research funders and research regulators should strengthen and develop sources of information about research that is in progress, ensure that they are used by researchers, insist on publication of protocols at study inception, and encourage collaboration to reduce waste; (5) make publicly available the full protocols, analysis plans or sequence of analytical choices, and raw data for all designed and undertaken biomedical research; (6) maximise the effect-to-bias ratio in research through defensible design and conduct standards, a well-trained methodological research workforce, continuing professional development, and involvement of nonconflicted stakeholders; (7) reward with funding and academic or other recognition reproducibility practices and reproducible research, and enable an efficient culture for replication of research; (8) people regulating research should use their influence to reduce other causes of waste and inefficiency in research; (9) regulators and policy makers should work with researchers, patients, and health professionals to streamline and harmonise the laws, regulations, guidelines, and processes that govern whether and how research can be done, and ensure that they are proportionate to the plausible risks associated with the research; (10) researchers and research managers should increase the efficiency of recruitment, retention, data monitoring, and data sharing in research through the use of research designs known to reduce inefficiencies, and do additional research to learn how efficiency can be increased; (11) everyone, particularly individuals responsible for healthcare systems, can help to improve the efficiency of clinical research by promoting integration of research in everyday clinical practice; (12) institutions and funders should adopt performance metrics that recognise full dissemination of research and reuse of original datasets by external researchers; (13) investigators, funders, sponsors, regulators, research ethics committees, and journals should systematically develop and adopt standards for the content of study protocols and full study reports, and for data-sharing practices; (14) funders, sponsors, regulators, research ethics committees, journals, and legislators should endorse and enforce study registration policies, wide availability of full study information, and sharing of participant-level data for all health research; (15) funders and research institutions must shift research regulations and rewards to align with better and more complete reporting; (16) research funders should take responsibility for reporting infrastructure that supports good reporting and archiving; and (17) funders, institutions, and publishers should improve the capability and capacity of authors and reviewers in high-quality and complete reporting. There is a recognition of problems with academic reward systems that appear to focus on quantity more than quality. Part of the series includes a discussion about evaluating scientists on a set of best practices, including reproducibility of research findings, the quality of the reporting, complete dissemination of the research, and the rigor of the methods used. | There is little in the series about the relationship between the trustworthiness of biomedical research and hiring, promotion, and tenure of scientists. Similarly, the series does not propose an action plan for examining hiring, promotion, and tenure practices. |
| REF [ | UK | There is a need to go beyond traditional quantitative metrics to gain a more in-depth assessment of the value of academic institutions. | The societal value of academic institutions (e.g., public funding of higher education institutions and the impact of the research conducted) cannot be identified. | The REF is a new national initiative to assess the quality of research in higher education institutions. It assesses institutional outputs, impact, and environment across 36 fields of study (e.g., law, economics, and econometrics). Outputs account for 65% of the assessment (outputs ‘are the product of any form of research, published, such as journal articles, monographs and chapters in books, as well as outputs disseminated in other ways such as designs, performances and exhibitions’). Impact accounts for 20% of the assessment (impact ‘is any effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’), and the environment accounts for 15% of the assessment (e.g., ‘the strategy, resources and infrastructure that support research’). | In the absence of a set of quantifiers for assessing the impact of the environment, it remains unclear if, based on the descriptions proposed, different evaluators would reach the same conclusions. This might be the case more for some criteria and less for others. The REF might stifle innovation and decrease collegiality across universities. Other limitations have been noted [ |
| Benedictus (UMC Utrecht, 2016) [ | the Netherlands | The authors’ view, inspired by their related initiative, Science in Transition, is that ‘bibliometrics are warping science—encouraging quantity over quality’. | Focusing on meaningless bibliometrics is keeping scientists ‘from doing what really mattered, such as strengthening contacts with patient organizations or trying to make promising treatments work in the real world’. | To move away from using bibliometrics to evaluate scientists, the authors propose five alternative behaviours to assess: (1) managerial responsibilities and academic obligations; (2) mentoring students, teaching, and additional new responsibilities; (3) if applicable, a description of clinical work; (4) participation in organising clinical trials and research into new treatments and diagnostics; and (5) entrepreneurship and community outreach. | This is an intervention at the institutional level to try to change incentives and rewards. A novel set of criteria used to evaluate research and researchers was designed and has been introduced in a large academic medical centre. |
| Edwards (2017) [ | US | Two engineers are concerned about the use of quantitative metrics to assess the performance of researchers. | The authors argue that continued reliance on quantitative metrics may lead to substantive and systemic threats to scientific integrity. | To deal with incentives and hypercompetition, the authors have proposed (1) that more data are needed to better understand the significance and extent of the problem; (2) that funding should be provided to develop best practices for assessing scientists for promotion, tenure, and hiring; (3) better education about scientific misconduct for students; (4) incorporating qualitative components, such as service to community, into PhD training programs; and (5) the need for academic institutions to reduce their reliance on quantitative metrics to assess scientists. | Some of the proposals, like convening a panel of experts to develop guidelines for the evaluation of candidates or reframing the PhD as an exercise in ‘character building’, might be ineffective, confuse the panorama further, and not have support from a considerable part of the scientific community. Panels of experts are a notoriously unreliable and subjective source of evidence; they are exposed to groupthink and potential conflicts of interest and may reinforce already existing biases. There is no reason to assume expertise is also associated with coming up with good practices. Given the financial hardships, gruesome work, and tough completion, the PhD program already is a Victorian exercise in ‘character building’. |
| Ioannidis (2014) [ | US | This essay focuses on developing ‘effective interventions to improve the credibility and efficiency of scientific investigation’. | Currently, scientific publications are often ‘false or grossly exaggerated, and translation of knowledge into useful applications is often slow and potentially inefficient’. Unless we develop better scientific and publication practices, much of the scientific output will remain grossly wasted. | The author proposes 12 best practices to achieve truth and credibility in science. These include (1) large-scale collaborative research; (2) adoption of a replication culture; (3) registration; (4) sharing; (5) reproducibility practices; (6) better statistical methods; (7) standardisation of definitions and analyses; (8) more appropriate (usually more stringent) statistical thresholds; (9) improvement in study design standards; (10) stronger thresholds for claims of discovery; (11) improvements in peer review, reporting, and dissemination of research; and (12) better training of the scientific workforce. These best practices could be used as research currencies for promotion and tenure. The author provides examples of how these best practices can be used in different ways as part of the reward system for evaluating scientists. | The author acknowledges that ‘interventions to change the current system should not be accepted without proper scrutiny, even when they are reasonable and well intended. Ideally, they should be evaluated experimentally’. Many of the research practices lack sufficient empirical evidence as to their worth. |
| Mazumdar (2015) [ | US | This group was focused on ways to assess team science—often the role biostatisticians find themselves in. Their view is that ‘those responsible for judging academic productivity, including department chairs, institutional promotion committees, provosts, and deans, must learn how to evaluate performance in this increasingly complex framework’. | Concentrating on traditional metrics ‘can substantially devalue the contributions of a team scientist’. | To assess the research component of a biostatistician as part of a team collaboration, the authors propose a flexible quantitative and qualitative framework involving four evaluation themes that can be applied broadly to appointment, promotion, and tenure decisions. These criteria are: design activities, implementation activities, analysis activities, and manuscript reporting activities. | The authors state, ‘The paradigm is generalizable to other team scientists’. However, ‘because team scientists come from many disciplines, including the clinical, basic, and data sciences, the same criteria cannot be applicable to all’. One limitation is the potential gaming of such a flexible scheme. |
| Ioannidis PQRST (2014) [ | US | The authors of this paper state the problem as ‘scientists are typically rewarded for publishing articles, obtaining grants, and claiming novel, significant results’. | The authors note, ‘However, emphasis on publication can lead to least publishable units, authorship inflation, and potentially irreproducible results’. In short, this type of assessment might tarnish science and how scientists are evaluated. | To reduce our reliance on traditional quantitative metrics for assessing and rewarding research, the authors propose a best practice index, PQRST, revolving around productivity, quality, reproducibility, sharing, and translation of research. The authors also propose examples of how each item could be operationalised. For productivity, examples include the number of publications in the top tier percentage of citations for the scientific field and year, the proportion of funded proposals that have resulted in ≥1 published reports of the main results, and the proportion of registered protocols that have been published 2 years after the completion of the studies. Similarly, one could count the proportion of publications that fulfill ≥1 quality standards; proportion of publications that are reproducible; proportion of publications that share their data, materials, and/or protocols (whichever items are relevant); and proportion of publications that have resulted in successful accomplishment of a distal translational milestone, e.g., getting promising results in human trials for interventions tested in animals or cell cultures, or licensing of intervention for clinical trials. | The authors acknowledge that some indicators require building new tools to capture them reliably and systematically. For quality, one needs to select standards that may be different per field/design and this requires some consensus within the field. There is no wide-coverage automated database currently for assessing reproducibility, sharing, and translation, but proposals are made on how this could be done systematically and who might curate such efforts. Focusing on the top, most influential publications may also help streamline the process. |
| Nosek (2015) [ | US | The authors state the problem as truth versus publishability: ‘The real problem is that the incentives for publishable results can be at odds with the incentives for accurate results. This produces a conflict of interest. The conflict may increase the likelihood of design, analysis, and reporting decisions that inflate the proportion of false results in the published literature’. | With the perverse ‘publish or perish’ mantra, the authors argue that researchers may feel compelled to fabricate their results and undermine the integrity of science and scientists. ‘With flexible analysis options, we are more likely to find the one that produces a more publishable pattern of results to be more reasonable and defensible than others’. | The authors propose a series of best practices that might resolve the aforementioned conflicts. These best practices include restructuring the current incentive/reward scheme for academic promotion and tenure, use of reporting guidelines, promoting better peer review, and journals devoted to publishing replications or statistically negative results. | The analyses and proposed actions and interventions are very clear and generate awareness. To have even more effect, these actions should be taken up by leaders in academia and institutions. Actions by funding agencies may also be required to set proper criteria for their reviewers of grant proposals. |
| US | The editors of | The editors state, ‘The focus on publication in a high impact-factor journal as the prize also distracts attention from other important responsibilities of researchers—such as teaching, mentoring and a host of other activities (including the review of manuscripts for journals!). For the sake of science, the emphasis needs to change’. | To help counter these problems, the editors discuss several options, such as repositories for sharing information: Dryad for datasets; Figshare for primary research, figures, and datasets; and Slideshare for presentations. | While exemplary, it is unclear how widespread these initiatives will become and whether there are implementation hurdles. To have broader impact, similar initiatives need to be endorsed and implemented in thousands of journals. | |
| UK | The journal’s perspective is that ‘metrics are intrinsically reductive and, as such, can be dangerous. Relying on them as a yardstick of performance, rather than as a pointer to underlying achievements and challenges, usually leads to pathological behaviour. The journal impact factor is just such a metric’. | Relying on the JIF will maintain the aforementioned problems. | To help combat these problems, the journal has proposed two solutions: first, ‘applicants for any job, promotion or funding should be asked to include a short summary of what they consider their achievements to be, rather than just to list their publications’. Second, ‘journals need to be more diverse in how they display their performance’. | While the use of diverse metrics is a positive step, they are not helpful for researchers across different disciplines. For example, Altmetrics does not have field-specific scores yet. It is difficult to know what these alternative metrics mean and how they should be considered within a researcher’s evaluation portfolio. | |
| RCR (2015) [ | US | These authors were interested in developing a scientifically rigorous alternative to the current perverse prestige of the JIF for assessing the merit of publications and, by association, scientists. | The authors list a number of problems with maintaining current bibliometrics. Many of these are echoed in other reports/papers in this table. | The authors report on the development and validation of the RCR metric. The RCR ‘is based upon the novel idea of using the co-citation network of each article to field- and time-normalize by calculating the expected citation rate from the aggregate citation behavior of a topically linked cohort’. The article citation rate is the numerator and the average citation rate is the denominator. | More independent research is needed to examine the relation between the RCR and other metrics and the predictive validity of the RCR as well as whether it provokes untoward consequences, such as gaming or encouraging questionable research practices. A recent paper disputes the validity of the RCR by raising several concerns regarding the calculation algorithm [
| Journal citation distributions (2016) [ | The JIF is a poor summary of the raw distribution of citation numbers from a given journal, because that distribution is highly skewed to high values. | The JIF says little about the likely citation numbers of any single paper in a journal, let alone other dimensions of quality that are poorly captured by citation counts. | The authors proposed using full journal citation distributions or nonparametric summaries (e.g., IQR) and reading of individual papers to evaluate both the papers and journals. | It is not clear how to use full JCRs to evaluate either a journal or a specific paper or what summaries of the JCR are most informative. Citation counts do not capture the reason for the citation. | |
| R-index (2015) [ | Canada/Denmark | Peer review is under many threats. For one, we are fast approaching a situation in which the number of manuscripts requiring peer review will outstrip the number of available peer reviewers. Peer reviewing, while an essential part of the scientific process, remains largely undervalued when assessing scientists. | The R-index is calculated as ‘Each journal, | The index is exceedingly complex; interpretation is not intuitive and requires standardised inputs that are not currently gathered at journals. It has not yet been applied to multiple journals and relies partly on IF. | |
| S-index (2017) [ | US | The current system, by not according any credit to the production of data that others then use in publications, disincentivises data sharing and the ability to assess research reproducibility or make new scientific contributions from the shared data. | The authors have proposed a metric to offer authors a measurable incentive to share their data with other researchers. | The S-index is in essence an H-index for publications that use scientists’ shared data but do not include those researchers as authors. It counts the number of such publications, | This has not been applied in practice. It is not clear how to assign sharing credit to initiatives with many authors. There is no established method to track use of datasets; it relies on citation counts. |
Abbreviations: ACUMEN, Academic Careers Understood through Measurement and Norms; DOI, digital object identifier; DORA, Declaration on Research Assessment; GEP, Good Evaluation Practices; i, annual list of reviewers; ID, identifier; IF, impact factor; IFj, journal’s impact factor; IQR, interquartile range; ISNI, International Standard Name Identifier; j, journal; JIF, journal impact factor; N, number of publications that use a scientist’s shared data but do not include those researchers as authors; NAS, National Academy of Sciences; nj, number of papers reviewed; PQRST, Productive, high-Quality, Reproducible, Shareable, and Translatable; RCR, relative citation ratio; REF, Research Excellence Framework; REWARD, Reduce research Waste And Reward Diligence; UMC, Utrecht Medical Centre; wkj, total number of words.
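Several rows in the table above describe simple arithmetic on citation counts: a ratio of an article's citation rate to an expected rate (the last step of the RCR), nonparametric summaries such as the median and IQR in place of a mean-based JIF, and an H-index-style count (which the S-index is said to resemble). The sketch below illustrates only that arithmetic. The function names and the toy citation counts are hypothetical and are not taken from the cited papers. It does not reproduce the co-citation network behind the RCR's expected rate, nor the exact S-index definition, which is cut off in the table.

```python
from statistics import mean, median, quantiles


def citation_ratio(article_rate: float, expected_rate: float) -> float:
    """Article citation rate divided by an expected (average) citation rate.

    Assumption: expected_rate is supplied by the caller; how it is derived
    (e.g., from a co-citation network) is outside this sketch.
    """
    if expected_rate <= 0:
        raise ValueError("expected_rate must be positive")
    return article_rate / expected_rate


def distribution_summary(citations: list[int]) -> dict:
    """Mean alongside nonparametric summaries (median, quartiles)."""
    # method="inclusive" treats the list as the whole population of papers.
    q1, _, q3 = quantiles(citations, n=4, method="inclusive")
    return {"mean": mean(citations), "median": median(citations), "iqr": (q1, q3)}


def h_type_index(citations: list[int]) -> int:
    """Largest h such that h items each have at least h citations."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    # Hypothetical, highly skewed journal: one paper dominates the mean.
    journal = [0, 1, 1, 2, 3, 50]
    print(distribution_summary(journal))
    print(citation_ratio(article_rate=6.0, expected_rate=3.0))
    print(h_type_index([10, 8, 5, 4, 3]))
```

In the toy journal a single highly cited paper pulls the mean well above the median, which is the skew the journal citation distributions row warns about.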
Key principles, participant dialogue, and research and policy implications when assessing scientists.
| Number | Key principles | Participant dialogue | Research implications | Policy implications |
|---|---|---|---|---|
| 1 | Addressing societal needs is an important goal of scholarship. | There was a discussion on the need for research and faculty to look outwardly and not solely focus inwardly. The inward obsession, in which researchers have to struggle hard to secure and advance their own career, has resulted in bandwagon behaviour that has not always had positive societal impact. | There is a need to develop mechanisms to evaluate whether research helps society. This is particularly important for scientists working on applied science rather than blue-sky discovery, the immediate and midterm societal impact of which may be more difficult to discern. Innovative assessment practices in applied settings might reward the professional motivations of young researchers when their research efforts more closely align with solving the societal burden of disease(s). | Academic institutions are part of the broader society and their faculty should be evaluated on their contributions to the local community and more broadly. For example, universities could ask their faculty assessment committees to focus their assessments on whether patients were involved in selecting outcomes used in clinical trials or whether faculty shared their data, code, and materials with colleagues. Most bibliometric indicators do not necessarily gauge such contributions, and faculty assessment criteria need to be broadened to reward behaviours that impact society. |
| 2 | Assessing faculty should be based on responsible indicators that reflect fully the contribution to the scientific enterprise. | Besides bibliometric indices (which need to be used in their optimal form, minimising their potential for gaming), several responsible indicators for assessing scientists (RIASs) were discussed (see main text). | Identifying and collating current promotion and tenure criteria across academic institutions is essential to provide baseline knowledge, against which initiatives and/or policy changes can be evaluated. | Leadership is critical to moving this initiative forward. This could start by university leaders asking their respective assessment committees to review and make available current criteria used to assess scientists. The results could be shared with faculty members, as could their opinions about the relevance of the current criteria and suggestions for potential new and evidence-based assessment criteria. Such discussions could also be held with the local community to address whether the assessments reflect their values. |
| 3 | Publishing all research completely and transparently, regardless of the results, should be rewarded. | Participants discussed the need to reduce the problems associated with reporting biases, including publication bias. | Some journals are developing innovative ways, such as registered reports, to help ensure that all results are reported. Audit and feedback might augment the uptake of any new initiative to promote better reporting of research and subsequent publication. | Funders and academic institutions are well positioned to implement policies to promote more complete reporting of all research. Rewarding faculty for registering their planned studies and publishing/depositing their completed studies is essential. Academic institutions should consider linking ethics approval with study registration. Promotion and tenure assessment should reward all efforts to make research available and transparently reported. |
| 4 | The culture of Open Research needs to be rewarded. | Participants agreed on the need to value more open research (i.e., including sharing of data, protocols, software, code, materials, and other research tools). | Open research is becoming a more widely accepted cultural norm. Initiatives such as TOP [ | Funders, journals, and academic institutions should promote open research behaviours and reward them appropriately. This is likely a shared principle across stakeholders. A strong signal concerning its importance could be based on joint implementation across funders, journals, and academic institutions. For example, audit and feedback (e.g., Trialtracker tool [ |
| 5 | It is important to fund research that can provide an evidence base to inform optimal ways to assess science and faculty. | Newer fields of investigation, such as journalology (publication science) and meta-research, help generate data on optimal ways to assess faculty. | There have been several initiatives to reform how faculty is assessed for promotion and tenure. Any new initiative needs to have an evaluation component built into the process—does the new initiative (i.e., intervention) have its intended effect? | Funding agencies should set aside funding to promote evaluations for assessing faculty and research on research (see text for examples). In the same way that funders have priority and/or specific areas of interest (e.g., cancer), they should also be interested in how well they (and their grantees) meet best publication practices for the money they are spending. Specific research calls should be made on an ongoing basis. Such a policy would signal the importance funders place on how scientists are evaluated. |
| 6 | Funding out-of-the-box ideas needs to be valued in promotion and tenure decisions. | Participants held it was important to promote grant applications without specific aims, securing support for pursuing a broader investigative agenda. | Different models on how to support out-of-the-box ideas need to be evaluated comparatively, using appropriate midterm and long-term outcome indicators. | Funders need to promote opportunities for blue-sky thinking without any immediate return on investment and promote success stories. This might be achieved through stable longer-term funding, perhaps in the 4- to 8-year range, of scientists at different career stages. Such funding could be a specific fraction of the funder’s total budget. The Canadian Institutes of Health Research’s Foundation scheme and Australia’s National Health and Medical Research Council Early Career Fellowships are examples of this type of funding. The Human Genome Project might provide a useful model to fund these programs. It invested 1% of its budget in the Ethical, Legal and Social Implications Research Program, with enormous payoff. |
Abbreviations: JIF, journal impact factor; OA, open access; RIAS, responsible indicator for assessing scientists; TOP, transparency and openness promotion.