Avishek Pal1, Tomas James Rees2. 1. Novartis Pharma AG, Basel, Switzerland. 2. Oxford PharmaGenesis, Oxford, United Kingdom.
Abstract
Article-level measures of publication impact (alternative metrics or altmetrics) can help authors and other stakeholders assess engagement with their research and the success of their communication efforts. The wide variety of altmetrics can make interpretation and comparative assessment difficult; available summary tools are either narrowly focused or do not reflect the differing values of metrics from a stakeholder perspective. We created the EMPIRE (EMpirical Publication Impact and Reach Evaluation) Index, a value-based, multi-component metric framework for medical publications. Metric weighting and grouping were informed by a statistical analysis of 2891 Phase III clinical trial publications and by a panel of stakeholders who provided value assessments. The EMPIRE Index comprises three component scores (social, scholarly, and societal impact), each incorporating related altmetrics indicating a different aspect of engagement with the publication. These are averaged to provide a total impact score and benchmarked so that a score of 100 equals the mean scores of Phase III clinical trial publications in the New England Journal of Medicine (NEJM) in 2016. Predictor metrics are defined to estimate likely long-term impact. The social impact component correlated strongly with the Altmetric Attention Score and the scholarly impact component correlated modestly with CiteScore, with the societal impact component providing unique insights. Analysis of fresh metrics collected 1 year after the initial dataset, including an independent sample, showed that scholarly and societal impact scores continued to increase, whereas social impact scores did not. Analysis of NEJM 'notable articles' showed that observational studies had the highest total impact and component scores, except for societal impact, for which surgical studies had the highest score. The EMPIRE Index provides a richer assessment of publication value than standalone traditional and alternative metrics and may enable medical researchers to assess the impact of publications easily and to understand what characterizes impactful research.
Introduction
The publication of clinical trial results and other medical advances is an ethical obligation and benefits a variety of stakeholders. Published information can be used by physicians, other healthcare practitioners, and patients to evaluate and understand potential treatments. Medical researchers and academics can use published results to inform their own research endeavors and to advance medical research. In addition, policymakers use published information to develop guidelines and treatment protocols that help to guide changes to clinical practice.

Publications are therefore vehicles for communicating research insights for peer-to-peer validation and discussion. Article-level metrics provide an indication of the reach, engagement, and impact of publications, but they cannot be assumed to provide a measure of the quality (or even the full impact) of the underlying research. One study found altmetric scores to be highly correlated with expert assessment of research impact, but not with assessment of research quality [1], while another found no correlation between altmetric scores or citations and the impact of research as assessed in the UK's Research Excellence Framework [2]. Publication metrics can contribute to research assessment if used within a comprehensive framework that also assesses non-publication impact, such as the Becker Model [3].

Impact measurements aim to assess the utility of published research for its intended audience as well as the effectiveness of the communication. Objective measures of impact can support these endeavors by enabling comparative assessments. However, making such measurements is challenging owing to the lack of available data and agreed definitions of impact. Historically, a common proxy for the publication impact of an article has been the impact factor of the journal in which it is published. However, although the journal impact factor (JIF) may help to identify journals with a high readership, it is widely recognized to be a poor indicator of the quality or impact of individual research articles [4, 5].

Article-level metrics avoid the category error of using the JIF in this context. The number of citations is the best-known article-level metric, but it reflects only scholarly activity, and citations can take years to accumulate [6]. More recently, the advent of alternative article-level metrics (altmetrics) has provided a new way to evaluate the impact of scientific publications. A wide range of potential altmetrics exists, signifying different interactions with the publication of interest but differing widely in quality and representativeness [7]. The sheer volume of potential metrics is evident in the information gathered by major aggregators such as Altmetric, which collects nearly 20 different altmetrics, and PlumX, which collects over 40 [8, 9].

To make metrics easier to interpret, various approaches have been taken to distill them into simplified scores. The best known of these is the Altmetric Attention Score (AAS), which weights a variety of individual metrics to reflect a subjective assessment of relative reach and aggregates them into a single number. Attempts to reduce any complex set of metrics to one linear scale have been criticized because the result tends to be driven by a single predictor, especially when the variables included are correlated [10].
Indeed, the AAS is dominated by Twitter and, to a lesser extent, news articles [10-12], so it does not reflect the impact of publications among researchers or policy-makers. Furthermore, the AAS has been criticized for arbitrary (and at times opaque) weighting of components [13, 14].

The full range of altmetrics is, however, multifactorial: altmetrics have diverse origins and represent different activities relating to publications [15-19]. The AAS is only weakly correlated with the number of citations [20, 21]. Among the most cited, downloaded, and mentioned articles published in general medical journals, only 2.5% were found in all three lists [22]. This implies that altmetrics cannot effectively be reduced to a single linear representation, and that data reduction can, at best, provide several scores that group together related metrics. As a result, a metric system with summary scores designed on data-reduction principles must, if it includes diverse, weakly correlated metrics, provide for several distinct factors [19, 23].

We sought to develop a value-based, multi-component metric framework for medical publications, the EMPIRE (EMpirical Publication Impact and Reach Evaluation) Index, that would allow authors and other professionals within the medical and pharmaceutical fields to assess the impact of publications in terms meaningful to them. The metric framework is also intended to monitor the long-term impact of publications, predict the likely long-term impact using early indicators, and identify the effectiveness of communication efforts surrounding publications.

Focusing on a single discipline, medicine, has several advantages when developing a metric framework. First, value is inherently subjective and is likely to differ between disciplines. Similarly, the relationship between metrics varies between scientific disciplines [15, 20, 24, 25]. Second, using the number of citations alone is known to severely underestimate the impact of clinical intervention research compared with basic and diagnostic medical research [26], underscoring the need for a multivalent approach to impact assessment. Third, medicine and medical sciences is the scientific discipline richest in metrics [20], providing a large dataset to examine.
Materials and methods
Approach to developing the scoring system
Development of the scoring system for the EMPIRE Index proceeded through a series of stages, outlined in Fig 1 and described in more detail in the sections below.
Fig 1
Process for developing the scoring system.
NEJM, New England Journal of Medicine.
In summary, during framework construction, a large set of publications was generated to gain an in-depth understanding of the statistical characteristics of altmetrics in a relevant sample. Publications of Phase III clinical trials were chosen for analysis because these studies typically require a high investment of resources and personnel and are most likely to have an impact on clinical practice. In addition, they are likely to be rich in metrics: the mean number of metric counts has a substantial effect on the size of the intercorrelation observed in a publication sample [27]. A series of statistical analyses was then conducted to determine which metrics were comprehensive and provided useful information, and how they were related to each other. The grouping and weighting of metrics was informed by these analyses but was ultimately driven by an understanding of the type of interaction each metric represented and by value judgments provided by a panel of stakeholders.

Once the structure and weighting of the metric system had been decided, predictor scores were developed using altmetrics that accumulate rapidly. Scores for all components of the system were then scaled to a benchmark representing a very high level of impact; for this, Phase III articles published in the New England Journal of Medicine (NEJM) were chosen.

The last stage in development was to characterize the performance of the final scoring system. This was carried out in three datasets: the original Phase III dataset, the Phase III dataset with metrics updated after 1 year (and including 1 new year's worth of publications), and a dataset comprising publications selected by NEJM editors as likely to influence clinical practice.
Sample acquisition
Reference Phase III sample
We identified a sample of publications (the reference Phase III sample) that was representative of the primary output of clinical medicine (Phase III clinical trials) as well as being sufficiently large to permit statistical and longitudinal analysis. Data were obtained across 3 years of publications to ensure the sample was large enough for analysis and included publications old enough to have accumulated citations in guidelines and policy documents, while minimizing the impact of confounding factors related to changes in the use of publications over time (in particular, changes in social media mentions). Non-English publications were excluded because the distribution of altmetrics for these was likely to differ substantially from that of publications in English (e.g. news coverage). The search was conducted on May 23, 2019 in PubMed, using the search term: ("clinical trial, phase iii"[Publication Type]) AND (("2016/05/01"[Date - Publication] : "2019/05/01"[Date - Publication]) AND Clinical Trial[ptyp] AND English[lang]).

Altmetrics for this sample were obtained on May 27, 2019. Article publication dates were obtained from Altmetric Explorer and were used to split the sample into two subsamples, the older 50% (1H) and the younger 50% (2H), to assess the effect of temporal change in altmetrics.
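For readers who want to rerun this search programmatically rather than through the PubMed website, a minimal sketch using Biopython's E-utilities wrapper is shown below. The use of Biopython is an assumption (the original search was run directly in PubMed), and result counts will drift as MEDLINE indexing is updated.

```python
# Minimal sketch: reproduce the reference Phase III PubMed search via
# NCBI E-utilities, assuming Biopython is installed (pip install biopython).
from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder; NCBI requires an email

query = (
    '("clinical trial, phase iii"[Publication Type]) AND '
    '(("2016/05/01"[Date - Publication] : "2019/05/01"[Date - Publication]) '
    'AND Clinical Trial[ptyp] AND English[lang])'
)

# esearch returns matching PMIDs; retmax caps how many IDs are returned.
handle = Entrez.esearch(db="pubmed", term=query, retmax=5000)
record = Entrez.read(handle)
handle.close()

print(f"{record['Count']} records; first PMIDs: {record['IdList'][:5]}")
```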
Benchmark NEJM Phase III sample
The benchmark sample provided a ‘target’ against which to calibrate metrics achieved by other publications. For this reason, a sample was chosen from a journal widely considered the ‘gold standard’ for clinical trial publications, the NEJM, which has the highest JIF of all general medical journals and describes itself as “the world’s leading medical journal and website” [28]. The benchmark sample comprised all Phase III clinical trial articles published in the NEJM in 2016 (manually identified from a sample of all clinical trials obtained via a PubMed search). The year was selected to allow the accumulation of metrics such as article or guideline citations, and to match the base year in the reference Phase III sample. Altmetrics for the benchmark sample were obtained on July 31, 2019.
1-year update Phase III sample
An independent sample was obtained to assess the metric framework for consistency. This sample was identified on June 6, 2020 using the same search terms as the reference Phase III sample but for the consecutive 12-month period (i.e. ("clinical trial, phase iii"[Publication Type]) AND (("2019/05/01"[Date - Publication] : "2020/05/01"[Date - Publication]) AND Clinical Trial[ptyp] AND English[lang])). Metrics for this 1-year update Phase III sample as well as for the original reference Phase III sample were acquired on June 7, 2020 (approximately 1 year after the original metrics were acquired).

To enable analysis of temporal changes, both the updated reference sample and the 1-year update Phase III sample were divided into 12-month subsamples (May 1 to April 30) based on publication dates provided by Altmetric Explorer. Publications with a publication date before May 1, 2016 according to Altmetric Explorer were excluded.
NEJM notable articles sample
An additional independent sample was identified with which to assess framework performance in other types of clinical research, especially the utility of the societal impact component. Annually, the editor of the NEJM curates a selection of articles published in the journal that year that they believe have practice-changing potential (‘notable articles’). We identified all of these articles for the years 2016, 2017, 2018, and 2019 [29-32], and obtained altmetrics for them on January 8, 2020. Articles were classified by the authors under a broad typology: interventional (studies describing an intervention with a medical treatment intended for clinical practice), observational (prospective and retrospective non-interventional studies), innovative (publications describing novel techniques or assays), and surgical.
Acquisition of altmetrics and other metrics
Data for all publications were obtained from the five sources listed below.

Altmetric Explorer [9]: This was the primary source for altmetrics data as well as publication dates.

PlumX [8]: In addition to a wide range of metrics similar to those provided by Altmetric Explorer, PlumX provided some unique metrics, such as citations in articles classified by Medline's indexers as 'clinical practice guideline' (PubMed guidelines).

Pubstrat Journal Database [33]: This was scraped to determine JIFs for journals identified by Altmetric Explorer in the acquired datasets.

CiteScore [34]: A journal-level, citation-based metric, similar to the JIF. CiteScore was downloaded for all journals on August 7, 2019, and CiteScore values for 2016 were used.

Scimago Journal Ranking [35]: A journal-level, citation-based metric that uses a PageRank algorithm.

In addition to these standard metrics, original tweets and retweets (provided by Altmetric.com) were obtained for the reference Phase III sample.

In a similar way to the exploratory analysis of Costas et al. (2015), an 'altmetrics-driven' universe of publications was created in which all publications had at least one altmetric or citation (via Altmetric Explorer) [17]. Costas et al. noted that this approach did not meaningfully affect the precision of altmetrics as predictive tools for citations, but did reduce the zero inflation that can confound statistical analysis.
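To illustrate how article-level data of this kind can be retrieved, the sketch below queries Altmetric's free public v1 API for a single DOI. This is not the route used in the study (which relied on Altmetric Explorer and PlumX exports); the endpoint is real, but the response field names shown are assumptions to verify against current Altmetric documentation, and PlumX-only metrics such as PubMed guideline citations are not available this way.

```python
# Minimal sketch: fetch selected altmetrics for one DOI from the public
# Altmetric v1 API (no key needed for basic use; rate limits apply).
import requests

def fetch_altmetrics(doi: str) -> dict:
    resp = requests.get(f"https://api.altmetric.com/v1/doi/{doi}", timeout=30)
    if resp.status_code == 404:   # no Altmetric record exists for this DOI
        return {}
    resp.raise_for_status()
    data = resp.json()
    # Map assumed API field names onto the metric names used in this paper.
    return {
        "news": data.get("cited_by_msm_count", 0),
        "blogs": data.get("cited_by_feeds_count", 0),
        "twitter": data.get("cited_by_tweeters_count", 0),
        "facebook": data.get("cited_by_fbwalls_count", 0),
        "wikipedia": data.get("cited_by_wikipedia_count", 0),
        "mendeley": data.get("readers", {}).get("mendeley", 0),
    }

print(fetch_altmetrics("10.1000/example-doi"))  # substitute a DOI of interest
```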
Statistical analysis
Analyses were conducted in Microsoft Excel using the Analyse-it plugin (Analyse-it Software, Ltd., Leeds, United Kingdom). Descriptive statistics were obtained and Spearman rank correlations between individual altmetrics were calculated. In addition, exploratory factor analysis was used to provide insights into how best to group similar metrics. Factor analysis assumes that latent or underlying factors exist that causally influence the observations. For the purposes of metric development, we wanted to explore the hypothesis that publications have an intrinsic 'social' interest leading to social media mentions that is fundamentally different from an intrinsic 'scholarly' interest leading to citations. An alternative data-reduction technique, principal component analysis, simply creates one or more index variables explaining as much statistical variance as possible without regard to theoretical differences in the metrics. In practice, the two approaches yield similar results.

We used maximum likelihood factor analysis with oblique (oblimin) rotation. Because altmetrics follow a power-law distribution [36], data were log-transformed before factor analysis. All data were increased by 1, which allows the discretized lognormal distribution to be fitted to the full range of data [37]. Adding a positive constant to the dependent variable is a common solution to the problem of log-transformation of datasets containing zeros, although it does introduce a small distortion to the data [38]. Regression analyses were conducted using multiple linear regression on the untransformed data.

EMPIRE Index scores for the NEJM notable articles were averaged over the different years (2016–2019). To control for the impact of time on the accumulation of altmetrics, EMPIRE Index scores for articles in each year were expressed as a percentage of the average score of observational studies (the highest-scoring article type), and the average of these yearly percentages was taken.
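Although the published analyses were run in Excel with the Analyse-it plugin, the same steps can be sketched in Python. The snippet below assumes a hypothetical CSV of per-article metric counts (phase3_altmetrics.csv, one column per altmetric) and uses the third-party factor_analyzer package, which supports maximum likelihood extraction with oblimin rotation.

```python
# Minimal sketch of the analysis pipeline described above, under the
# assumptions stated in the text (hypothetical input file and package).
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

df = pd.read_csv("phase3_altmetrics.csv")

# Spearman rank correlations between individual altmetrics.
rho, _ = spearmanr(df.values)
corr = pd.DataFrame(rho, index=df.columns, columns=df.columns)

# Altmetrics follow a power-law distribution, so add 1 to every count and
# log-transform before factoring (np.log1p computes log(x + 1) directly).
logged = np.log1p(df)

# Maximum likelihood factor analysis with oblique (oblimin) rotation.
fa = FactorAnalyzer(n_factors=3, rotation="oblimin", method="ml")
fa.fit(logged)
loadings = pd.DataFrame(fa.loadings_, index=df.columns)
print(loadings.round(2))
```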
Value assessment
An internal Novartis cross-functional stakeholder panel meeting was convened on July 9, 2019, comprising representatives from scientific communications, medical, commercial, launch strategy, and medical analytics departments. Participants reviewed information on the analyses conducted as well as background information on metrics, and provided qualitative insights into the interpretation and importance of key metrics. Quantitative value assessments were obtained through points allocation: participants were given 7 points to allocate among the 11 metrics according to which they felt best represented social impact, and a further 7 points each to allocate according to which metrics they felt best represented scholarly and societal impact (i.e. 21 points in total). Voting was conducted openly and in a single round. Points were summed for each metric and the proportion of points allocated to each metric was calculated.
Predictor scores
Two predictor scores were developed based on metrics that accumulate rapidly after publication. The early predictor score included the altmetrics that accumulate most rapidly (Twitter, Facebook, and news mentions) [6, 39, 40] and also included CiteScore, used here as a proxy for the readership of and interest in a journal. The intermediate predictor score included blog mentions, F1000Prime mentions, and Mendeley readers: altmetrics that accumulate more slowly, but still faster than metrics with a high lag, such as citations.

The basis of each predictor score was a multiple linear regression of the altmetrics included in the predictor against the total impact score in the reference Phase III sample. Weightings for each metric were calculated as follows:
weighting_metric = β_metric × sum_metric / sum_allmetrics

where β_metric is the β coefficient for that metric from the linear regression, sum_metric is the sum total of the incidence of the target metric in the reference sample, and sum_allmetrics is the sum total of all metrics included in the predictor score.
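The sketch below illustrates this calculation, with the caveat that the equation shown above is reconstructed from the published definitions (the original typeset formula was not recoverable), so the exact normalization is an assumption. `X` and `y` are hypothetical arrays of per-article predictor metrics and total impact scores.

```python
# Minimal sketch: derive predictor-score weightings from a multiple linear
# regression of predictor altmetrics against the total impact score.
import numpy as np

def predictor_weights(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """X: (n_articles, n_metrics) untransformed counts; y: total impact scores."""
    # Multiple linear regression on untransformed data, with an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    betas = coefs[1:]              # beta_metric, one slope per metric
    sum_metric = X.sum(axis=0)     # incidence of each metric in the sample
    sum_all = sum_metric.sum()     # all metrics included in the predictor
    # Reconstructed weighting formula (an assumption; see text above).
    return betas * sum_metric / sum_all
```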
Case examples
A total of 59 publications reporting Novartis-sponsored research were tracked over 9 months in three quarterly reports from May to December 2020. From these assessments, publications with notable metrics were identified for further analysis. Two case examples of interest are reported here.
Results
Framework construction
The initial search found 3498 Phase III clinical publications, of which altmetrics for 3450 were identifiable by PlumX and 2891 by Altmetric Explorer. The analysis set comprised the 2891 articles with at least one metric identified by Altmetric Explorer, of which eight were unavailable in the PlumX dataset. Publication metric characteristics of this sample are shown in S1 Table. Several altmetrics had a very low density and so were discarded from further analysis (e.g. Weibo, LinkedIn, Google+, Pinterest, Q&A, peer review, video, and syllabi mentions). Some altmetrics were retained despite a low density because they were thought to provide unique insights relevant to the objectives (policy, patent, F1000Prime, Wikipedia, and guideline [from PlumX] mentions). Some metrics of high relevance (abstract and publication views and downloads) were discarded because the quality of the data was inconsistent; in particular, many papers had numerous citations and Mendeley readers without recorded views or downloads, suggesting that coverage was incomplete.

Journal-level metrics were not included in the EMPIRE Index total impact score or component scores, but they were considered potential components of predictor scores. Given that the coverage obtained with CiteScore was higher than with the other two journal-level metrics examined (JIF and Scimago Journal Ranking; S1 Table part C), CiteScore was selected for further analyses.

Pairwise Spearman correlations between the altmetrics included are shown in S2 Table. The most common metrics were strongly correlated (news, blog, and Twitter mentions, Mendeley readers, and Dimensions citations). The strongest correlation was seen between Mendeley readers and Dimensions citations, although Facebook mentions and tweets were also strongly correlated. In addition, original tweets and retweets were highly correlated with each other and with total tweets, suggesting that a single measure (total tweets) is sufficient. Other metrics showed only weak correlations with each other.

Dividing the reference Phase III sample into two subsamples according to the publication date provided by Altmetric Explorer revealed important differences (S1 Fig). The more recent half of the publications (2H, after May 21, 2017) had higher mean Twitter, Wikipedia, and other counts, but lower Dimensions citations, than the older half (1H).

Three-factor analysis was conducted on the full range of metrics selected for inclusion (S3 Table). Two-factor analysis was also carried out on a subset of metrics excluding those with low incidence (policy document, PubMed guideline, and patent mentions) (S4 Table). These analyses revealed consistent groupings, such as Mendeley readers with Dimensions citations, and news, blog, and Wikipedia mentions.
Weighting
Based on the results of these analyses and considerations, a framework for grouping metrics was developed comprising three component scores: social impact (news, blog, Twitter, Facebook, and Wikipedia mentions), scholarly impact (Mendeley readers, Dimensions citations, and F1000Prime posts), and societal impact (mentions in policy documents, PubMed guidelines, and patents). An initial statistical estimate of weightings was calculated as the inverse proportion of counts of each altmetric in the reference Phase III sample relative to the total number of all altmetric counts.

The stakeholder panel was conducted as part of a 1-day workshop with 14 stakeholders (9 female, 5 male), all employees of Novartis and representing different functions within the company (4 Medical Affairs, 9 Scientific Communications, and 1 Commercial). Most (n = 12) were European, with 2 US representatives. Stakeholders were asked to summarize in one or two words or phrases what they felt 'impact' meant to them. 'Change' was a key theme, mentioned five times (twice in the context of changing clinical practice, once each for changing mindsets and changing dogma, and once as simply 'change'). 'Patients' were mentioned twice (in the context of improving patient outcomes), while other phrases mentioned were communication, behavior, access, reach, educate, utility, and supporting treatment decisions. Further discussions during the stakeholder panel meeting revealed the central importance given to guideline and policy document citations as a measure of article impact. This was also reflected in the quantitative session, in which guidelines and policy documents were allocated over one-third of the total points (Table 1).
Table 1
Value accorded to metrics by the stakeholder panel (quantitative scoring).
Metric | Social | Scholarly | Societal | Total | Percentage of points
Twitter mentions | 24 | 1 | 0 | 24 | 10
Facebook mentions | 15 | 0 | 0 | 15 | 6
Blog mentions | 16 | 0 | 0 | 16 | 7
News mentions | 20 | 1 | 3 | 24 | 10
Wikipedia mentions | 8 | 0 | 0 | 8 | 3
Dimensions citations | 0 | 40 | 0 | 40 | 16
Mendeley readers | 0 | 16 | 0 | 16 | 7
F1000Prime mentions | 0 | 14 | 0 | 14 | 6
Guideline mentions | 2 | 7 | 45 | 54 | 22
Policy mentions | 0 | 0 | 31 | 31 | 13
Patent mentions | 0 | 0 | 5 | 5 | 2
Weightings derived from the statistical approach were revised to reflect findings from the stakeholder value assessments. The selected weightings and their contribution to the total impact score based on the sample are shown in Table 2. In general, the approach taken was to balance the weighting such that the percentage contribution to scores in publications in the reference Phase III sample resembled the stakeholder value, while acknowledging relative importance (e.g. of news articles vs blogs) and prevalence (e.g. Wikipedia mentions were too infrequent to make a meaningful contribution without a greatly inflated weighting relative to the value accorded by the stakeholder panel). To combine statistical and value-based weighting effectively, some related metrics were considered as combined entities (e.g. Twitter and Facebook mentions were allocated a combined 20% of points by stakeholders, and contributed a combined 17.7% to the total impact score in the reference Phase III sample).
Table 2
Weighting assigned to metrics included in the social, scholarly, and societal impact scores, along with their contribution to total impact scores in the reference sample.
Metric | Total in reference sample | Percentage of all metrics in reference sample | Social weighting | Scholarly weighting | Societal weighting | Percentage contribution to total in reference sample
Twitter mentions | 94,235 | 29.25 | 3 | – | – | 17.0
Facebook mentions | 3,821 | 1.19 | 3 | – | – | 0.7
Blog mentions | 1,086 | 0.34 | 10 | – | – | 0.7
News mentions | 18,539 | 5.75 | 15 | – | – | 16.7
Wikipedia mentions | 70 | 0.02 | 5 | – | – | 0.0
Dimensions citations | 78,785 | 24.45 | – | 4 | – | 18.9
Mendeley readers | 124,866 | 38.75 | – | 1 | – | 7.5
F1000Prime mentions | 252 | 0.08 | – | 15 | – | 0.2
Guideline mentions | 183 | 0.06 | – | – | 1800 | 19.8
Policy mentions | 321 | 0.10 | – | – | 900 | 17.4
Patent mentions | 59 | 0.02 | – | – | 300 | 1.1
Percentage contribution to total | – | – | 35.1 | 26.7 | 38.2 | 100
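As a worked check of the last column of Table 2, each metric's percentage contribution is its weighted count divided by the sum of all weighted counts in the reference sample:

```python
# Reproduce Table 2's "percentage contribution" column from its own
# counts and weightings (values copied from the table above).
counts = {"twitter": 94235, "facebook": 3821, "blog": 1086, "news": 18539,
          "wikipedia": 70, "citations": 78785, "mendeley": 124866,
          "f1000": 252, "guidelines": 183, "policy": 321, "patents": 59}
weights = {"twitter": 3, "facebook": 3, "blog": 10, "news": 15,
           "wikipedia": 5, "citations": 4, "mendeley": 1, "f1000": 15,
           "guidelines": 1800, "policy": 900, "patents": 300}

weighted = {m: counts[m] * weights[m] for m in counts}
total = sum(weighted.values())
for metric, value in weighted.items():
    print(f"{metric:10s} {100 * value / total:5.1f}%")
# twitter 17.0%, news 16.7%, citations 18.9%, guidelines 19.8%, ...
```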
The variance in total impact scores explained by each predictor score was moderately high (early predictor vs total impact score, r² = 0.56; intermediate predictor vs total impact score, r² = 0.65; S2 Fig). An overall predictor score can be calculated as the average of the early and intermediate predictor scores. The variance in total impact scores explained by the overall predictor score was also moderate (overall predictor vs total impact score, r² = 0.69). Weightings calculated for each of the variables in the predictor scores are shown in Table 3.
Table 3
Weightings assigned to metrics included in the early and intermediate predictor scores.
Metric | Total in reference sample | Early predictor score | Intermediate predictor score | Percentage contribution to overall predictor score in reference sample
CiteScore | 15,384 | 57 | – | 26.2
News mentions | 18,539 | 22 | – | 12.2
Twitter mentions | 94,235 | 3 | – | 8.2
Facebook mentions | 3,821 | 30 | – | 3.5
Mendeley readers | 124,866 | – | 12 | 44.8
Blog mentions | 1,086 | – | 125 | 4.1
F1000Prime mentions | 252 | – | 146 | 1.1
Benchmarking
In total, 74 Phase III publications from the NEJM published in 2016 were identified for the benchmark sample. The non-adjusted overall predictor score was selected as the benchmark for predictor scores, and the non-adjusted total impact score was selected for the total, social, scholarly, and societal impact scores (Table 4).
Table 4
Scores in the benchmark sample before and after benchmark adjustment.
Non-adjusted scores chosen as benchmarks are shown in bold.
Score | Early predictor score | Intermediate predictor score | Overall predictor score (a) | Social | Scholarly | Societal | Total (b)
Mean non-adjusted score | 3218 | 4414 | 3816 | 1622 | 1593 | 2854 | 6068
Benchmark value | 3816 | 3816 | 3816 | 2023 | 2023 | 2023 | 6068
Mean benchmarked score | 84 | 116 | 100 | 80 | 79 | 141 | 100
(a) The overall predictor score is the average of the early and intermediate predictor scores.
(b) The non-adjusted total impact score is the sum of the social, scholarly, and societal impact scores. The adjusted total impact score is the average of the adjusted component scores.
Dividing the non-adjusted total benchmark by 3 before applying it to the component scores had the effect of upscaling them so that the adjusted total impact score represents the mean of the components (rather than the sum, as in the non-adjusted total impact score). EMPIRE Index scores are calculated by dividing the unadjusted score of interest by the appropriate benchmark and multiplying by 100.
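Putting the weightings from Table 2 and the benchmarks from Table 4 together, benchmarked EMPIRE Index scores for a single publication can be computed as sketched below; the metric counts in `article` are illustrative only.

```python
# Minimal sketch: compute benchmarked EMPIRE Index component and total
# impact scores for one article, using Table 2 weights and Table 4 benchmarks.
SOCIAL = {"news": 15, "blog": 10, "twitter": 3, "facebook": 3, "wikipedia": 5}
SCHOLARLY = {"citations": 4, "mendeley": 1, "f1000": 15}
SOCIETAL = {"guidelines": 1800, "policy": 900, "patents": 300}

COMPONENT_BENCHMARK = 2023   # non-adjusted total benchmark (6068) / 3

def component_score(metrics: dict, weights: dict) -> float:
    raw = sum(weights[m] * metrics.get(m, 0) for m in weights)
    return 100 * raw / COMPONENT_BENCHMARK   # 100 = NEJM 2016 benchmark

article = {"news": 12, "twitter": 250, "citations": 40, "mendeley": 90,
           "guidelines": 1}                  # illustrative counts only

social = component_score(article, SOCIAL)
scholarly = component_score(article, SCHOLARLY)
societal = component_score(article, SOCIETAL)
# The adjusted total is the average of the adjusted component scores
# (equivalent to the raw sum divided by the total benchmark of 6068).
total = (social + scholarly + societal) / 3
print(f"social {social:.0f}, scholarly {scholarly:.0f}, "
      f"societal {societal:.0f}, total {total:.0f}")
```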
Final scoring framework
An overview of the final EMPIRE Index framework is shown in Fig 2. The framework comprises the three component scores (social, scholarly, and societal impact), which are averaged to provide a total impact score. Each component score incorporates a separate type of altmetric, indicating a different aspect of engagement with the publication. The framework also includes the two predictor scores.
Fig 2
Example of the EMPIRE Index score for a single publication.
HCP, healthcare provider; NEJM, New England Journal of Medicine.
Characterization of the EMPIRE Index
Characterization in samples used in development
The distributions of scores in the reference sample 1H and 2H, and in the benchmark sample, are shown in Fig 3. Of note, social impact scores were lower and societal impact scores were higher in 1H than in 2H. Predictor scores were higher than total impact scores in the reference Phase III sample but not in the benchmark NEJM Phase III sample, and median social impact scores were closer to median total impact scores in the benchmark NEJM Phase III sample than in the reference Phase III sample.
Fig 3
Distribution of scores.
Distribution of scores in (A) the reference Phase III sample and (B) the benchmark NEJM Phase III sample. Box = 1Q–2Q, whiskers = 1.5 × interquartile range, X = mean. 1H, older 50%; 2H, younger 50%. NEJM, New England Journal of Medicine.
The correlations between component scores, the AAS, and CiteScore are shown in Table 5. Correlations between component scores were relatively low, the greatest being between the social and scholarly impact scores. The social impact score correlated strongly with the AAS, and both the social and scholarly impact scores correlated modestly with CiteScore. The societal impact score, however, was quite distinct from the AAS, CiteScore, and the other component scores. Although predictor scores were moderately successful at predicting the total impact score, they were only weakly related to the societal impact score.
Table 5
Correlations (Spearman r) between component scores, AAS, and CiteScore in the reference Phase III sample.
Correlations > 0.6 are shown in bold.
Score | Early predictor | Intermediate predictor | Social | Scholarly | Societal | Total | AAS | CiteScore
Early predictor | – | 0.61 | 0.79 | 0.63 | 0.19 | 0.70 | 0.76 | 0.91
Intermediate predictor | 0.61 | – | 0.59 | 0.87 | 0.26 | 0.76 | 0.59 | 0.55
Social | 0.79 | 0.59 | – | 0.59 | 0.18 | 0.74 | 0.95 | 0.58
Scholarly | 0.63 | 0.87 | 0.59 | – | 0.32 | 0.84 | 0.59 | 0.58
Societal | 0.19 | 0.26 | 0.18 | 0.32 | – | 0.55 | 0.27 | 0.16
Total | 0.70 | 0.76 | 0.74 | 0.84 | 0.55 | – | 0.78 | 0.57
AAS | 0.76 | 0.59 | 0.95 | 0.59 | 0.27 | 0.78 | – | 0.56
CiteScore | 0.91 | 0.55 | 0.58 | 0.58 | 0.16 | 0.57 | 0.56 | –
AAS, Altmetric Attention Score.
Characterization in prospectively collected samples
1-year update Phase III sample characterization
Publication dates obtained from Altmetric Explorer indicated that 194 articles were published prior to May 1, 2016; 1173 from May 1, 2016 to April 30, 2017; 1101 from May 1, 2017 to April 30, 2018; and 423 from May 1, 2018 to April 30, 2019. The drop in publication numbers in the latter period most likely reflects a lag in MEDLINE indexing. The 2019–2020 search identified 503 publications, of which 435 met the date criteria based on publication dates obtained from Altmetric Explorer. Mean EMPIRE Index scores for these year groups in both the original altmetric acquisition and the 1-year update are shown in Fig 4. Little change was found in the social impact component. Scholarly impact and, especially, societal impact continued to accumulate. The greatest increase in scholarly impact was seen in the most recent publications, while societal impact scores increased similarly across all 3 years sampled.
Fig 4
Mean impact scores in the original reference Phase III sample and the 1-year update Phase III sample.
(A) Social, (B) scholarly, (C) societal, and (D) total mean impact scores.
NEJM notable articles characterization
In total, 48 notable articles were identified by NEJM editors from 2016 to 2019. Mean impact scores from the 2016 subset are shown in Fig 5, with mean scores from the 2016 benchmark NEJM Phase III sample for comparison. Notable articles had higher social and societal impact than benchmark articles.
Fig 5
Mean impact scores for NEJM notable articles from 2016 compared with benchmark scores.
Box = 1Q–2Q; whiskers = 1.5 × interquartile range; X = mean. NEJM, New England Journal of Medicine.
Of the 48 articles, the focus was assessed to be interventional in 24 cases, observational in 10 cases, innovative in 6 cases, and surgical in 8 cases. After adjusting for publication year, observational studies were found to have the highest total impact, with other publication types having lower impact scores across all components except for the societal impact of surgical studies. Innovative studies had notably low societal impact, indicating that they were infrequently referenced in guidelines or policy documents (Fig 6).
Fig 6
EMPIRE Index scores for NEJM Notable articles.
EMPIRE Index scores are expressed as percentages of the scores achieved by observational studies.
Case examples
We present here two illustrative case examples. The first is the publication of VERIFY, a Phase III study of vildagliptin in patients with type 2 diabetes, published in The Lancet on September 18, 2019 [41]. The initial analysis, conducted on April 7, 2020 (202 days after publication), showed that the article had gained a notably high early predictor score, so it was selected for further investigation (Table 6). The publication had been timed to coincide with its presentation at the Annual Meeting of the European Association for the Study of Diabetes (EASD) in Barcelona, Spain (September 16–20, 2019), and was accompanied by press releases from the EASD and Novartis. The Lancet also tweeted about the article, but this attracted a limited number of retweets. The article was picked up in guidelines at an early stage, and subsequent tracking identified increases in the societal impact score due to guideline citations and a patent citation. The scholarly impact score was lower than predicted at the time of initial assessment; by the most recent follow-up (September 3, 2021), it had increased to 23.
Table 6
EMPIRE Index scores for the two case example publications and time of initial analysis and subsequent follow-up.
Publication | Publication age at time of analysis (days) | Early predictor score | Intermediate predictor score | Social | Scholarly | Societal | Total
VERIFY | 202 | 37 | 23 | 34 | 5 | 89 | 43
VERIFY (follow-up) | 415 | 39 | 41 | 36 | 12 | 267 | 105
"Two Phase 3 trials" | 20 | 50 | 0 | 36 | 0 | 0 | 12
"Two Phase 3 trials" (follow-up) | 233 | 60 | 58 | 51 | 19 | 0 | 23
The second case example is the publication of 'Two Phase 3 Trials of Inclisiran in Patients with Elevated LDL Cholesterol', published in the NEJM on March 18, 2020 [42]. At the time of analysis, only 20 days after publication, it had achieved a very high early predictor score, owing to a large number of news articles linked to a Novartis press release. Follow-up analyses showed that the social impact score continued to increase as a result of an NEJM infographic that was tweeted by an academic expert with 8,000 followers and by a postdoctoral student using the paper as an example of best practice in data visualization. By the most recent follow-up (September 3, 2021), the scholarly impact score had increased to 50, commensurate with predictions, suggesting that impact among the academic community had been satisfactory. However, the publication had not yet achieved detectable societal impact, suggesting that its impact on clinical practice may be limited.
Discussion
We have developed the EMPIRE Index, a metric framework to assess the multidimensional impact of medical publications, including their potential impact on clinical practice. It avoids the pitfalls of JIF-based research assessment and of unidimensional scoring systems. It also fulfills the Leiden criteria of being open, transparent, and simple [43].

The EMPIRE Index aggregates selected article-level metrics into meaningful component scores and weights them according to the value placed on them by members of a stakeholder panel and a statistical analysis of a representative sample of articles. It differs conceptually from both the AAS and other recently developed scoring systems: the #SoME_Score [44], the Weighted Altmetric Impact, and the Inverse Altmetric Impact [18, 38]. First, the value-based approach to the weighting and grouping of metrics recognizes that simple statistical associations may be sample-dependent and may not relate to underlying conceptual underpinnings. Second, the EMPIRE Index is specifically designed for medical publications. Many studies have documented different scales of, and relationships between, metrics in various disciplines [15, 20, 24, 25] and, given that the value of each metric is inherently subjective, this value is unlikely to be consistent across scholarly disciplines. Third, the EMPIRE Index is scaled against a clearly defined, relevant benchmark, because interpretation of a novel composite metric is difficult without such a reference point.

These are the potential advantages of the EMPIRE Index. Its utility, however, depends on the robustness of the selection, grouping, and weighting of metrics and of the benchmarking, as well as on its performance in the evaluation of suitable publications. In the process of investigating these factors, a series of results of broad interest to the altmetrics community was generated. These are discussed in the sections that follow.
Metric selection
Suitable metrics were identified for inclusion by reviewing the coverage and density of metrics obtained for the reference Phase III sample through the two most established metric providers: Altmetric and PlumX. Previous research has shown significant differences between these providers in terms of Mendeley reader and Twitter coverage, as a result of different approaches to collecting, tracking, and updating metrics [45-47]. Furthermore, the approach to covering news and blog posts differs greatly between PlumX and Altmetric Explorer [48]. We found broadly similar metrics between the two providers except for news articles (Altmetric reported twice as many for our sample as PlumX) and Facebook mentions (Altmetric extracts only mentions on Facebook pages, whereas PlumX also extracts 'likes').

The reference dataset was selected to provide a sample rich in altmetrics. PlumX identified at least one metric for 99% of our sample, while the figure was 83% for Altmetric. This result compares favorably with those of previous work [17, 20, 24, 25, 49, 50], most likely indicating the increasing volume of altmetric activity. One important metric not included was article views and downloads. Although these data were provided by PlumX, we found them to be patchy, with many articles reporting metrics such as tweets or Mendeley readers but no page views or downloads on the EBSCO information service.

Similar to previous investigators, we found that news, blog, Twitter, and Facebook mentions, Mendeley readers, and Dimensions citations were the most common metrics in our sample. These metrics were included in our analysis, as well as additional metrics that, although rare, provided valuable insights into article impact: citations in policy documents, guidelines, patents, Wikipedia, and F1000Prime.
Rationale for the three component scores
The EMPIRE Index comprises three component scores, each representing a different factor underlying the observed patterns of metrics seen in the reference Phase III sample. The social impact component represents actions that involve or are accessible to the general public as well as healthcare professionals and academics. The scholarly impact component represents actions with an academic focus. The societal impact component represents actions in which the publication has been used to inform decisions around optimal care or, in the case of patents, medical advances.

To inform our metric grouping, correlation analyses were performed and exploratory factor analysis was used. These revealed a close connection between Mendeley readers and Dimensions citations, in line with findings from previous research [51-53]. Correlations were also found between Twitter and Facebook mentions, and between news/blog and social media mentions, which again aligns with previous observations [52, 53].

No meaningful correlations were found between mentions in F1000Prime articles, policy documents, guidelines, patents, or Wikipedia articles and other metrics. These metrics have not previously been widely studied, and the low correlations observed may reflect their very small coverage: over 90% of publications score zero on these metrics. However, an analysis of four biomedical journals found that articles recommended in F1000Prime are cited more frequently than non-recommended articles from the same journal [54], while Bornmann and Haunschild (2018) reported that F1000Prime recommendations were more closely correlated with Mendeley readers and Dimensions citations than with Twitter mentions [55].

Pairwise correlations can give useful insights into relationships between different metrics, but for the purposes of reducing data into composite scores it is helpful to understand the shared variance between multiple metrics. The exploratory factor analysis in our study produced findings consistent with those reported in previous literature [15-18]. Separating articles into those that were older (1H) and younger (2H) showed that citations (including policy and guideline mentions) and Mendeley readers consistently grouped into one factor; news, blog, Wikipedia, and F1000Prime mentions grouped into a second factor; and Twitter (and, usually, Facebook) mentions comprised a third factor. A two-factor analysis excluding policy document, guideline, and patent mentions confirmed that Mendeley readers and Dimensions citations formed a separate group from the remaining metrics.

Separating metrics into statistically and conceptually homogeneous groupings meets a key criterion specified by Gingras and Larivière for well-constructed indicators [56]. Another criterion they specify is that an indicator should adequately represent the concept it is intended to represent. Each altmetric represents a different action on the part of an audience; this has implications for how we understand the meaning of individual metrics [7] and whether these statistical associations represent meaningful groupings.

The social impact component comprises tweets, Facebook likes, blog and news article mentions, and Wikipedia citations. Much remains unknown about the motivation for tweeting, given that most tweets are empty of context [57] and content [58]. Often, all that is certain is that the tweeter found the research interesting enough to broadcast.
Social media platforms are known to be used mostly by the general public, so a central motivation for scholars to tweet is likely to be to communicate and explain their work to lay people [59]. This may be particularly true of publications in the biomedical sciences, which attract greater Twitter interest than those in other scholarly disciplines [59]. Twitter communities linked through publication tweets tend to be led by organizational accounts associated with well-known journals or leading scholars [60], although at least half of sharing on social media is likely to be non-academic [61, 62]. An analysis of Facebook users who shared links to medical and health-related research articles found that more than half were not academic, while 16% were healthcare professionals and only 4% were academics [62]. Similarly, blogs and news articles are likely to be read by a mix of audiences, including (for biomedical articles) healthcare practitioners and patients. News article metrics are derived from a curated list of news sources, including general interest, local interest, science/technology, and health sciences outlets [48]. It should be noted, however, that the curated list used for Altmetric.com data is biased towards the English language, and around 50% of its sources are located in the USA [48]. Wikipedia is a widely used source of scientific information, including by scientists [63], and it is one of the most widely accessed sources of medical information for the general public [64]. Thus, social media activity, news articles, and Wikipedia share some commonality in that they play a role in disseminating information across diverse audiences.

The scholarly impact component comprises scholarly citations, reference manager data, and F1000Prime recommendations. Citations indicate that one scholarly work has been acknowledged by another. Conventionally, this is seen as an indication of influence or impact, although the act of citing is complex and can be influenced by a range of factors, such as post-hoc justification for a research project [65]. Nevertheless, an analysis of 640 highly cited medical publications found that only 9% were also found in a list of 652 articles with the highest AAS (i.e. primarily social media and news mentions), suggesting distinct motivations for scholarly citing versus sharing across diverse audiences [22]. Mendeley saves require the reader to have access to the free-to-use Mendeley reference manager platform, and so reflect usage among individuals who consume a lot of scholarly literature. Although Mendeley users often add articles to their library with the intention of citing them, many also add them for professional or teaching purposes, which may explain why some articles have many readers but few citations [55]. Mendeley saves have therefore been suggested as an alternative to download counts as a source of readership evidence [6], although they are limited to readers who have a Mendeley account. F1000Prime recommendations indicate that an article has been recommended by F1000Prime Faculty members, who have been nominated by peers as experts in their fields. Interestingly, articles rated in F1000Prime reviews as a 'technical advance' received higher Mendeley scores, but not higher Twitter scores, than those not rated this way [66].
The reverse was true for articles rated 'good for teaching', further underscoring the difference between Mendeley and Twitter indicators.

The societal impact score comprises citations in medical guidelines published in peer-reviewed journals and indexed on PubMed, in policy documents (i.e. grey literature; in the medical arena, these are typically technical assessments of products produced as part of guideline development), and in patents. These represent a different activity from citations in the scholarly literature, in that guideline, policy document, and patent citations clearly reflect wider societal impact [67, 68]. This is supported by our finding that NEJM notable articles scored higher in societal impact relative to scholarly impact compared with typical Phase III clinical study publications. It should be noted, however, that guidelines do not always contain references (although these may be provided in associated grey literature) and, when present, these references do not explicitly indicate their value to the guideline [69].

The weighting of metrics in the EMPIRE Index was based on three considerations: the prevalence of metrics in the reference sample (highly prevalent metrics were weighted less), the need for each component to make a substantial contribution to the total impact score, and the value given to each metric as an indicator of impact. As a result, the weighting is quite different from that of other approaches based on purely statistical considerations.

Several approaches have determined weighting by regressing altmetrics on citations. These typically result in, for example, higher weighting being given to blog posts and Mendeley readers than to news articles (because blog posts are relatively uncommon) [20, 44, 52, 70]. Because the target variable is journal citations, each Mendeley save or F1000 citation may be weighted similarly to or higher than a policy document citation [44, 71]. Ortega has developed weightings based on principal component analysis and also on inverse prevalence (so that the rarest metrics receive the highest weighting). The two approaches create very different weightings: for example, a news article carries half the weight of a publication citation in the Weighted Altmetric Impact, but eight times the weight of a publication citation in the Inverse Altmetric Impact [23, 72]. These statistical approaches give very different results from the weighting developed for the EMPIRE Index.

Given that some altmetrics accumulate early, there is long-standing interest in using a limited selection of rapidly accumulating altmetrics to identify publications likely to have high long-term impact. Earlier work has employed multivariate regression with citations as the measure of long-term impact [12, 20, 44, 70, 71, 73] but, as noted above, citations are only one of several measures of long-term impact. Among common metrics, tweets and news articles accumulate most rapidly after publication, while Mendeley readers, blogs, and F1000Prime articles increase more gradually [6, 39, 40]. Wikipedia and policy document mentions can, like article citations, take well over a year to accumulate [39, 74]. The EMPIRE Index addresses this by using two predictor scores: early and intermediate.

The early predictor score also uses CiteScore, a journal-level metric similar to the JIF. Because CiteScore is not an article-level metric, it is not suitable for assessing the impact of individual articles.
However, the choice of journal can have a significant effect on the impact of a publication, primarily because of readership (i.e. some journals have significantly higher reach into key audiences). Unfortunately, there is no consistent, publicly available measure of journal reach, because most publishers do not make readership figures available in a comparable format. We therefore included CiteScore not as a measure of impact but as a proxy for journal reach, and hence as a partial predictor of likely future impact. CiteScore, in this context, can be thought of as a proxy for the exposure an article is likely to have; it has previously been shown that combining citations over the first year with JIFs accurately predicts future citations [74, 75].

Predictor scores are a purely statistical construct, so their weighting is quite different from that of the EMPIRE Index itself; it also differs from the weightings derived in previous work using citations as a target. Compared with the studies mentioned earlier, which used statistically based weighting with only citations as a target, Mendeley readers carry less weight relative to news article mentions in the EMPIRE Index predictor scores. This most likely reflects the broader basis of the EMPIRE Index compared with citation-only targets.

The reasonably strong relationship between the predictor scores and the total impact score in the reference Phase III sample is to be expected, given that they share many of the same metrics. However, the weak correlation with the societal impact score indicates that the predictor scores will lack precision in identifying high-impact publications (given the importance of the contribution of societal impact to the total impact). Further work using longitudinal datasets is required to improve these predictor scores.
Responsiveness and characterization
The responsiveness and utility of the EMPIRE Index were evaluated in several ways. Averages and distributions of scores in the reference Phase III sample and the benchmark NEJM sample were explored, showing that the two samples had similar social and scholarly metrics but that the latter had far higher societal metrics. Because the scores were scaled to the benchmark NEJM sample, this resulted in predictor scores lacking sensitivity for lower-impact publications (i.e. although they retained precision for identifying higher-impact articles, they tended to overpredict the impact of lower-impact articles uniformly).

The social impact score was shown to be closely correlated with the AAS. The AAS weights metrics in a way that is not possible for users of the Altmetric Explorer dashboard: news outlets are weighted in a proprietary (and undisclosed) tier system, while retweets are assigned only 75% of the weight of original tweets [9]. The high correlation between the social impact score and the AAS thus reassures users that these nuances make little difference.

Changes over time were evaluated in a 1-year follow-up of the reference Phase III sample. The minimal change in the social impact component further underlines the similarity of this component to the AAS, and supports the notion that news article and tweet metrics accrue soon after publication. Both the scholarly and societal impact scores continued to increase, and further follow-up is needed to identify the point at which these scores plateau.

Finally, an independent dataset was investigated: articles selected by NEJM editors for their practice-changing potential. These papers had substantially higher societal impact than the benchmark set of NEJM Phase III articles, supporting the sensitivity of the societal impact component in identifying practice-changing publications. Furthermore, innovative articles were found to have relatively low societal impact, indicating that although these are of interest to scholars and wider society, they do not directly feed into clinical practice changes. Conversely, articles on surgery had a high impact on practice even though social and academic interest was low.
Using the EMPIRE Index
The EMPIRE Index can be used to monitor large numbers of publications (for example, relating to a research project or clinical trial programme) to assess whether the publications are having the hoped-for impact. It can be used to identify publications that are having higher than expected impact, with implications for best practice in publication dissemination (for example, whether enhanced publication activities such as videos or infographics affect publication metrics). It could also be used to identify publications with lower-than-expected impact, which could signify that additional communication efforts are needed to reach audiences that may be interested in the topic (or that the topic is of low interest to the audiences concerned). The EMPIRE Index can also be used on large datasets, for example to see how different journals are associated with different kinds of impact, which could inform journal choice for submission.
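As an illustration of this monitoring use case, the sketch below flags publications in a tracked set whose total impact score departs markedly from expectations. The DOIs, scores, and thresholds are all hypothetical; in practice, thresholds would be set relative to the benchmark of 100.

```python
# Sketch of a monitoring pass over a tracked publication set: flag articles
# whose total impact score is far above or below expectations.
# DOIs, scores, and thresholds are hypothetical placeholders.
publications = [
    {"doi": "10.1000/example.1", "total_impact": 142.0},
    {"doi": "10.1000/example.2", "total_impact": 18.5},
    {"doi": "10.1000/example.3", "total_impact": 97.3},
]

HIGH, LOW = 130.0, 50.0  # arbitrary illustration thresholds

for pub in publications:
    score = pub["total_impact"]
    if score >= HIGH:
        print(f"{pub['doi']}: high impact ({score}) - review for best practice")
    elif score <= LOW:
        print(f"{pub['doi']}: low impact ({score}) - consider further outreach")
```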
Weaknesses
Although the EMPIRE Index provides advantages over existing metric approaches, it has some potential weaknesses. For example, grouping and value weighting have a large subjective component that may not reflect the value assigned to metrics by others. However, the transparent nature of the approach will hopefully stimulate further debate and discussion around this inherent subjectivity and allow for future refinements.
The analyses conducted were based on a closely defined subset of medical publications, in terms of both content (Phase III trials) and publication date. As metrics evolve over time, owing to changes in the way audiences engage with publications or technical advances in the way metrics are recorded, these original analyses and assumptions may cease to apply. They may also not apply to other publication types or study designs and may vary across disease areas. Predictor scores are based on the results of cross-sectional, rather than longitudinal, analyses; further follow-up will allow these scores to be refined and improved. Furthermore, benchmarking to very high-impact articles results in predictor scores that tend to overestimate the final impact of more typical articles.
Any indicator must represent a relatively homogeneous construct to be considered meaningful [56]. The component scores of the EMPIRE Index (social, scholarly, and societal impact) have been specifically designed to meet this criterion, but the combined total impact score inevitably does not. Interpretation of the total score is difficult if it is quoted in the absence of the supporting component scores.
Lastly, although the scoring system is transparent and reproducible, it depends on metrics aggregated by two different proprietary systems. These metrics may not be available to all intended users of the index.
Conclusions
The EMPIRE Index is a novel metric framework incorporating three component scores that respond to different types of publication impact: social, scholarly, and societal. Whereas the social impact score is similar to the AAS and the scholarly impact score is closely linked to (but broader than) article citations, the societal impact score reflects a key and distinct aspect of publication impact. Like the AAS, the EMPIRE Index weights metrics subjectively to reflect their value from the user's perspective as well as by prevalence. Unlike the AAS, it is designed for a limited subject area (medicine) and weights and benchmarks the metrics accordingly. It also has a clear, transparent explanation of the scoring system and provides predictor scores to give an early estimate of likely future impact.
The development of the EMPIRE Index incorporates objective analysis and subjective values. It is, therefore, only directly relevant to stakeholders who share broadly similar perspectives to our panel. However, the process used for developing the EMPIRE Index is of general utility; any interested party can reweight the EMPIRE Index using subjective values arrived at by their preferred method.
Several potential uses are envisaged for the EMPIRE Index. Because it provides a richer assessment of publication value than standalone traditional and alternative metrics, it will enable individuals involved in medical research to assess the impact of related publications easily and to understand what characterizes impactful research. It can also be used to assess the effectiveness of communications around publications and of publication enhancements such as infographics and explanatory videos. Fuller validation of the EMPIRE Index requires additional prospective and cross-sectional studies, which are ongoing.
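The reweighting suggestion is straightforward in practice: holding the metric counts fixed, a different stakeholder panel supplies its own value weights and recomputes the component scores. A minimal sketch, with placeholder metric names and entirely hypothetical weights:

```python
# Reweighting sketch: recompute a component score under a different set of
# subjective value weights. Metric names and all weights are placeholders,
# not the published EMPIRE Index weights.
def component_score(metrics: dict, weights: dict) -> float:
    return sum(weights[m] * metrics.get(m, 0) for m in weights)

metrics = {"policy_citations": 2, "guideline_citations": 1, "patent_citations": 0}

panel_a_weights = {"policy_citations": 20, "guideline_citations": 50, "patent_citations": 10}
panel_b_weights = {"policy_citations": 10, "guideline_citations": 80, "patent_citations": 5}

print(component_score(metrics, panel_a_weights))  # 90
print(component_score(metrics, panel_b_weights))  # 100
```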
Summary statistics for metrics obtained via (A) Altmetric, (B) PlumX, and (C) journal-level, citation-based indices.
Coverage is the proportion of articles with > 0 on that metric. (DOCX)
Correlations (Spearman’s r) between investigational metrics in the sample of Phase III clinical trial publications.
Correlations > 0.5 are shown in bold. (DOCX)
Three-factor analysis of included metrics in (A) the full sample, (B) older papers (1H), and (C) younger papers (2H).
Highest loadings for each metric are shown in bold. (DOCX)
Two-factor analysis of metrics excluding citations in policy documents, PubMed guidelines, and patents in (A) the full sample, (B) older papers (1H), and (C) younger papers (2H).
Highest loadings for each metric are shown in bold. (DOCX)
Percentage change in mean publication metrics in the more recent half of the publications (2H) versus the older half (1H).
(TIF)
Correlation of total impact scores with (A) early predictor and (B) intermediate predictor scores.
Scores shown are not adjusted to the benchmark. (TIF)
Case study article details:
Jessica G Y Luc; Edward Percy; Sameer Hirji; Dominique Vervoort; Gurkiran K Mann; Kevin Phan; Mahmoud Dibas; Muthiah Vaduganathan; Ourania Preventza; Mara B Antonoff. Ann Thorac Surg, 2020-06-12; impact factor 4.330.
David R Matthews; Päivi M Paldánius; Pieter Proot; YannTong Chiang; Michael Stumvoll; Stefano Del Prato. Lancet, 2019-09-18; impact factor 79.321.
Kausik K Ray; R Scott Wright; David Kallend; Wolfgang Koenig; Lawrence A Leiter; Frederick J Raal; Jenna A Bisch; Tara Richardson; Mark Jaros; Peter L J Wijngaard; John J P Kastelein. N Engl J Med, 2020-03-18; impact factor 91.245.