
A theoretical model of the relationship between the h-index and other simple citation indicators.

Lucio Bertoli-Barsotti, Tommaso Lando.

Abstract

Of the existing theoretical formulas for the h-index, those recently suggested by Burrell (J Informetr 7:774-783, 2013b) and by Bertoli-Barsotti and Lando (J Informetr 9(4):762-776, 2015) have proved very effective in estimating the actual value of the h-index (Hirsch, Proc Natl Acad Sci USA 102:16569-16572, 2005), at least at the level of the individual scientist. These approaches lead (or may lead) to two slightly different formulas, being based, respectively, on a "standard" and a "shifted" version of the geometric distribution. In this paper, we review the genesis of these two formulas, which we shall call the "basic" and "improved" Lambert-W formula for the h-index, and compare their effectiveness with that of a number of instances taken from the well-known Glänzel-Schubert class of models for the h-index (based, instead, on a Paretian model) by means of an empirical study. All the formulas considered in the comparison are "ready-to-use", i.e., functions of simple citation indicators such as: the total number of publications; the total number of citations; the total number of cited papers; the number of citations of the most cited paper. The empirical study is based on citation data obtained from two different sets of journals belonging to two different scientific fields: more specifically, 231 journals from the area of "Statistics and Mathematical Methods" and 100 journals from the area of "Economics, Econometrics and Finance", totaling almost 100,000 and 20,000 publications, respectively. The citation data refer to different publication/citation time windows, different types of "citable" documents, and alternative approaches to the analysis of the citation process ("prospective" and "retrospective").
We conclude that, especially in its improved version, the Lambert-W formula for the h-index provides a quite robust and effective ready-to-use rule that should be preferred to other known formulas if one's goal is (simply) to derive a reliable estimate of the h-index.

Keywords:  Geometric distribution; Glänzel–Schubert formula; Journal impact factor; Journal ranking; Lambert W function; h-index for journals

Year:  2017        PMID: 28596626      PMCID: PMC5438441          DOI: 10.1007/s11192-017-2351-9

Source DB:  PubMed          Journal:  Scientometrics        ISSN: 0138-9130            Impact factor:   3.238


Introduction

Some simple and basic bibliometric indicators, such as the total number of citations $C$, the total number of publications with at least $k$ citations each, $T_k$, the total number of citations for the $t$ most cited papers, $C_t$, the average number of citations per paper (ACPP), $m = C/T$ (where, hereafter, $T$ stands for $T_0$), as well as the h-index (Hirsch 2005; Braun et al. 2006; Schubert and Glänzel 2007; Harzing and van der Wal 2009), are routinely used to measure the relevance and citation impact of journals when computed according to suitable, pre-specified timeframes. In particular, time-limited versions of the ACPP lead to different types of “impact factors”, with possible variants defined according to different pre-specified publication and citation time windows, and also depending on the degree of overlap between these timeframes (synchronous and diachronous impact factors; Ingwersen et al. 2001). Similarly, alternative versions of the h-index have been defined (synchronous and diachronous h-indexes; Bar-Ilan 2010). In general, all these indicators merge information about the number of citations received by a journal within a pre-specified time window (typically a huge amount of data) into a single representative value interpretable as a measure of a journal’s “quality”. Their computation requires knowledge of the entire citation pattern, or at least most of it. In recent years, a certain interest has been shown in developing theoretical models with which to “estimate” one such indicator given the values of certain others. Well-known representative examples are theoretical models with which to obtain the value of the h-index, $h$: as a function of $C$ (Hirsch 2005), as a function of $T$ (Egghe and Rousseau 2006), as a function of $T_1$ (Burrell 2013a), as a function of $C$ and $T$ (Glänzel 2006; Iglesias and Pecharroman 2007; Schubert and Glänzel 2007; Bletsas and Sahalos 2009; Egghe et al. 2009; Egghe and Rousseau 2012), as a function of $C$, $T_1$ and $C_1$ (Bertoli-Barsotti and Lando 2015); but also theoretical models with which to estimate $C$, as a function of $h$ (Petersen et al. 2011), or as a function of $m$ and $h$ (Egghe et al. 2009), or as a function of $T$ and $h$ (Burrell 2013b), and so on.
These models, usually based in their turn on the assumption of a specific probabilistic model for the citation distribution, may be effective, for instance, when the indicator of interest cannot be obtained directly because it is not accessible, or when the availability of citation data is incomplete. For example, there may be the case in which $h$ is not available but we know $C$ and $T$ (Glänzel 2006; Schubert and Glänzel 2007; Bletsas and Sahalos 2009), or the case in which we have to impute missing values of impact factors using the availability of the h-index as a predictor (Bertocchi et al. 2015). In particular, in this paper we focus mainly on the problem of obtaining an explicit “universal” formula for estimating the actual value of the h-index. Recently, Burrell (2013b) and Bertoli-Barsotti and Lando (2015) introduced models that have proved very effective in estimating the actual value of the h-index for individual scientists. More precisely, these approaches lead (or may lead) to two slightly different formulas, being based, respectively, on a “standard” and a “shifted” version of the geometric distribution. In the first part of section ‘Methods’ we present a (functional) equation, based on the geometric distribution, that constitutes a theoretical basis for both these approaches. Indeed, this equation allows us to derive a closed-form estimator of the h-index, expressed as a function of (some of) the above citation metrics. We shall call this estimator, for reasons which will be apparent below, the Lambert-W formula for the h-index.
In the related scientific literature, authors often limit their analysis to the problem of estimating the unknown parameters of a suggested theoretical parametric model for the h-index, under the assumption of knowing the real values of the h-index. Instead, in this paper we consider the more practical (and in a certain sense, opposing) problem of determining the (unknown) h-index on the basis of a ready-to-use formula for it. Then, in our empirical analyses we will use the actual values of the h-index but only to evaluate, a posteriori, the performance of the proposed ready-to-use formulas and not to determine (maybe for interpretative reasons) unknown parameters of a theoretical parametric model. In this paper, we will concentrate on the case of the h-index for journals (Braun et al. 2006). One of the major differences between the cases of an individual scientist and a journal is that, in the latter, the h-index should be computed in a “timed” version, i.e. limited to suitable, usually relatively short, publication and citation time windows. In this regard, it should be noted that a familiar definition such as “a journal has index h if h of its publications each have at least h citations and the other publications each have no more than h citations” is somewhat inaccurate because it does not specify the time windows to be considered for the calculation of h. One of the aims of our study will also be to test the robustness of the formula empirically against different possible choices of (1) length of the time windows and (2) type of approach adopted for analyzing the citation process: “prospective” (diachronous) or “retrospective” (synchronous) (Glänzel 2004). We shall also focus on a comparison of effectiveness between the Lambert-W formula for the h-index and a popular class of alternative models, related to the so-called Glänzel–Schubert formula, that have already been proved to be highly correlated to the h-index. 
In the second part of section ‘Methods’ we review the existing literature on the Glänzel–Schubert family of models (and related models) and discuss some problematic aspects linked to the presence of unknown parameters in their expressions. Then, in section ‘Two empirical studies’, we report the results of an empirical comparison between the Lambert-W formula for the h-index and these alternative models, using two different datasets of journals. For this task, we downloaded citation data from the Scopus database on about 100,000 and 20,000 publications, respectively, for the first and the second dataset. Based on the results of our research study, we conclude that the Lambert-W formula for the h-index provides an effective ready-to-use rule that should be preferred to other known formulas if one’s goal is (simply) to derive a reliable estimate of the h-index.

Methods

Models of the relationship between h and other simple metrics based on citation counts

A basic equation connecting h, T and C

A model of a hypothetical equation of the type
$$f(h, T, C) = 0 \qquad (1)$$
is sought, connecting $h$, $T$ and $C$. Naturally, we do not assume a deterministic relationship among observed values of $h$, $T$ and $C$; rather, we shall determine a “probabilistic” relationship. Indeed, the problem addressed here is that of deriving a formula for predictions. In particular, we try to identify a model that is able to predict one input-term given the other two (e.g. $h$ given $T$ and $C$, or $C$ given $h$ and $T$, or, which is the same, $C/T$ given $h$ and $T$, and so on). A preliminary solution of the functional Eq. (1) can be obtained by “assuming” (which here represents a simple working hypothesis) the geometric distribution (GD) with parameter $P$,
$$p(x) = \frac{1}{1+P}\left(\frac{P}{1+P}\right)^{x}, \quad x = 0, 1, 2, \ldots \qquad (2)$$
where $p(x)$ gives the probability of observing $x$ and $P$, $P > 0$, represents the expectation of the GD (Johnson et al. 2005, p. 210). Then the value $T \cdot p(x)$ expresses the “expected” number of articles with $x$ citations (size-frequency function). Now, since for every $k$, $k = 0, 1, 2, \ldots$, $\sum_{x \ge k} p(x) = \left(\frac{P}{1+P}\right)^{k}$, the predicted number of papers with at least $k$ citations is
$$T_k = T \left(\frac{P}{1+P}\right)^{k}. \qquad (3)$$
By definition of the h-index, $h$, this yields the equation $h = T \left(\frac{P}{1+P}\right)^{h}$. Then, assuming $m = C/T$ as an estimate of the expectation $P$ (see Johnson et al. 2005, Eq. 5.12, p. 211), we derive the following model of functional equation
$$h = T \left(\frac{m}{1+m}\right)^{h}. \qquad (4)$$
We note in passing that this model yields, as a byproduct, the formula $1/(1+m)$ for the “uncitedness factor” (the predicted fraction $p(0)$ of uncited papers), providing proof of the result conjectured by Hsu and Huang (2012) (see also Egghe 2013; Burrell 2013c). Equation (4) represents a theoretical model of the relationship among the h-index, the number of publications $T$ and the ACPP, $m$. Equation (4) can be solved with respect to any of its arguments. In particular:
(a) Given $h$ and $T$, we easily obtain an estimate of the expectation $P$ as follows:
$$\hat{P} = \frac{(h/T)^{1/h}}{1 - (h/T)^{1/h}}.$$
(b) Given $T$ and $C$, we obtain an estimate of $h$ as follows. Equation (4) is equivalent to $h\,a^{h} = T$, where $a = (1+m)/m = 1 + T/C$. Then, multiplying each side of the latter equation by $\log a$, and substituting $z = h \log a$, we obtain $z e^{z} = T \log a$, which leads immediately to the solution
$$z = W(T \log a)$$
where $W$ represents the so-called Lambert-W function (Corless and Jeffrey 2015). Recall that the Lambert-W function is the function $W(y)$ satisfying $W(y)\,e^{W(y)} = y$, and that it can be readily computed using mathematical software, for example the Mathematica® 10.0 software package (Wolfram Research, Inc. 2014; it is implemented in the Wolfram Language as “LambertW”), or also using the R statistical computing environment (R Development Core Team 2012). Hence
$$h = \frac{W(T \log a)}{\log a}$$
that is, equivalently,
$$h_W^{(0)} = \frac{W\left(T \log(1 + T/C)\right)}{\log(1 + T/C)}$$
where we have adopted a new symbol for differentiating the “predicted” h-index, $h_W^{(0)}$, from the actual value $h$ of the h-index. Note that the GD approach has been previously suggested by Burrell (2007, 2013b, 2014) but without giving an explicit formula, in closed form, for the estimation of the h-index.
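As a hedged illustration of point (b), the following Python sketch evaluates the basic Lambert-W estimate from $T$ and $C$, taking the estimator to be $W(T \log a)/\log a$ with $a = 1 + T/C$, i.e. the solution of $h = T\,(m/(1+m))^{h}$. The function names `lambert_w` and `h_basic` are ours, and the Newton iteration is only one possible way of evaluating $W$; it is not the software routine cited above.

```python
import math


def lambert_w(y, tol=1e-12, max_iter=100):
    """Principal branch W(y) for y >= 0, via Newton's method on w*exp(w) = y."""
    if y < 0:
        raise ValueError("this sketch only handles y >= 0")
    w = math.log1p(y)  # starting guess, never below the root for y >= 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - y) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def h_basic(T, C):
    """Basic Lambert-W estimate of h from T publications and C citations."""
    if T <= 0 or C <= 0:
        raise ValueError("T and C must be positive")
    log_a = math.log1p(T / C)  # log(1 + T/C)
    return lambert_w(T * log_a) / log_a


if __name__ == "__main__":
    # Example input: T = 1555 publications, C = 2033 citations.
    print(round(h_basic(1555, 2033)))
```

With $T = 1555$ and $C = 2033$ the unrounded estimate is close to 9.06, so the rounded value is 9.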

An equation connecting h, T1 and C

As a general rule, one should expect that knowledge of other (i.e., other than $m$ and $T$) simple summary statistics of the raw citation data will help increase the precision of the h-index estimate. Indeed, if we also assume that we know $T_1$, a modified version of the above formulas can be easily introduced by taking the shifted-geometric distribution (SGD) with parameter $Q$
$$p(y) = \frac{1}{Q}\left(\frac{Q-1}{Q}\right)^{y-1}, \quad y = 1, 2, \ldots$$
where $p(y)$ represents the probability of observing the number of citations $y$ of a paper cited at least once, and $Q$, $Q > 1$, represents the expectation of the SGD. Since for every $k$, $k = 0, 1, 2, \ldots$, $\sum_{y \ge k+1} p(y) = \left(\frac{Q-1}{Q}\right)^{k}$, then $T_{k+1} = T_1 \left(\frac{Q-1}{Q}\right)^{k}$ represents the number of papers with at least $k + 1$ citations. Then, assuming $m_1 = C/T_1$, the average number of citations of articles that have been cited at least once, as a proxy for the expectation $Q$, we derive the following functional equation
$$h = T_1 \left(\frac{m_1 - 1}{m_1}\right)^{h-1}.$$
This equation can be solved with respect to any of its arguments. In particular:
(a) Given $h$ and $T_1$, we obtain
$$\hat{Q} = \frac{1}{1 - (h/T_1)^{1/(h-1)}}.$$
(b) Given $T_1$ and $C$, and following a completely analogous sequence of steps as in the above point (b), we obtain the estimate of $h$
$$h_W^{(1)} = \frac{W(T_1\, a_1 \log a_1)}{\log a_1}, \qquad a_1 = \frac{m_1}{m_1 - 1} = \frac{C}{C - T_1}.$$
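The two "point (a)" inversions can be sketched in a few lines of Python. This is only an illustration under the assumption that the two functional equations read $h = T\,(P/(1+P))^{h}$ for the GD and $h = T_1\,((Q-1)/Q)^{h-1}$ for the SGD; the function names `estimate_P` and `estimate_Q` are hypothetical.

```python
def estimate_P(h, T):
    """GD expectation P implied by h = T * (P / (1 + P)) ** h."""
    if not 0 < h < T:
        raise ValueError("need 0 < h < T")
    r = (h / T) ** (1.0 / h)  # r = P / (1 + P)
    return r / (1.0 - r)


def estimate_Q(h, T1):
    """SGD expectation Q implied by h = T1 * ((Q - 1) / Q) ** (h - 1)."""
    if not 1 < h < T1:
        raise ValueError("need 1 < h < T1")
    r = (h / T1) ** (1.0 / (h - 1))  # r = (Q - 1) / Q
    return 1.0 / (1.0 - r)


if __name__ == "__main__":
    # Implied mean citation rates for a journal with h = 11.
    print(estimate_P(11, 1555), estimate_Q(11, 754))
```

Multiplying the returned values by $T$ and $T_1$ respectively gives the total number of citations $C$ that each model would predict from the h-index.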

A formula for the h-index, as a function of T1, C and C1

If we also know the total number of citations of the most cited paper, $C_1$, we can hope to improve the accuracy of the above formula further. Indeed, with the use of the trimmed mean (that is, the sample mean obtained omitting the most highly cited paper) $\tilde{m}_1 = (C - C_1)/(T_1 - 1)$ instead of $m_1$, we obtain a modified (improved) version of the above formula, which we shall define $\tilde{h}_W^{(1)}$. As is well known, citation distributions are highly skewed; hence the sample mean is distorted by extreme values. In particular, the presence of individual highly-cited papers tends to inflate $C$, and consequently $h_W^{(1)}$, in comparison to the true h-index, which is clearly insensitive to a single very highly cited paper. In this sense, the use of a trimmed mean is simply a technique for reducing this possible bias. To summarize, we have:
$$h_W^{(0)} = \frac{W\left(T \log(1 + 1/m)\right)}{\log(1 + 1/m)}$$
or also, equivalently, $h_W^{(0)} = W\left(T \log(1 + T/C)\right)/\log(1 + T/C)$, and
$$\tilde{h}_W^{(1)} = \frac{W(T_1\, \tilde{a}_1 \log \tilde{a}_1)}{\log \tilde{a}_1}, \qquad \tilde{a}_1 = \frac{\tilde{m}_1}{\tilde{m}_1 - 1}$$
or also, equivalently, the same expression with $\tilde{a}_1 = (C - C_1)/(C - C_1 - T_1 + 1)$. We shall refer to these formulas as Lambert-W formulas for the h-index, respectively, in a “basic”, $h_W^{(0)}$, and an “improved” version, $\tilde{h}_W^{(1)}$. The formula $\tilde{h}_W^{(1)}$ has been considered elsewhere (Bertoli-Barsotti and Lando 2015) for the estimation of the h-index for individual scientists.
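A minimal Python sketch of the improved formula follows. It assumes that the estimator is $W(T_1\, a \log a)/\log a$ with $a = \tilde{m}_1/(\tilde{m}_1 - 1)$ and $\tilde{m}_1 = (C - C_1)/(T_1 - 1)$, which is what solving $h = T_1((\tilde{m}_1 - 1)/\tilde{m}_1)^{h-1}$ with the Lambert-W function gives. The names `lambert_w`, `h_shifted` and `h_improved` are ours, and $W$ is evaluated by a plain Newton iteration rather than by any particular library.

```python
import math


def lambert_w(y):
    """W(y) for y >= 0, by Newton iterations on w * exp(w) = y."""
    w = math.log1p(y)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - y) / (ew * (w + 1.0))
        w -= step
        if abs(step) < 1e-12:
            break
    return w


def h_shifted(T1, mean_cited):
    """Solve h = T1 * ((Q - 1) / Q) ** (h - 1) for h, with Q = mean_cited."""
    if mean_cited <= 1.0:
        raise ValueError("the mean over cited papers must exceed 1")
    a = mean_cited / (mean_cited - 1.0)
    log_a = math.log(a)
    return lambert_w(T1 * a * log_a) / log_a


def h_improved(T1, C, C1):
    """Improved estimate: the mean over cited papers is trimmed of the top paper."""
    trimmed_mean = (C - C1) / (T1 - 1)
    return h_shifted(T1, trimmed_mean)


if __name__ == "__main__":
    # Example input: T1 = 754 cited papers, C = 2033, most cited paper C1 = 28.
    print(round(h_improved(754, 2033, 28)))
```

For these inputs the unrounded estimate is about 10.1, against about 9.1 for the basic formula computed from $T = 1555$ and $C = 2033$.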

Theoretical parametric models for the h-index related to the Glänzel–Schubert formula

A well-known alternative “theoretical model of the dependence of the citation h-index on the sample size and the sample’s mean citation rate” (Schubert et al. 2009) is the one proposed by Schubert and Glänzel (2007), who noted that the h-index is approximately proportional to “a power function of the sample size and the sample mean”, namely to the function $T^{1-\eta} m^{\eta}$ for a suitable power parameter $\eta$ (Schubert et al. 2009; see also Glänzel 2007, 2008). In applications, this fact has given rise to a plethora of “variants”, as possible parametric models for the h-index. It is useful to distinguish the following nine cases, labelled (a) to (i).
(a) Iglesias and Pecharroman (2007) derived the following one-parameter family of models of the h-index:
$$h = \left(\frac{2\eta - 1}{\eta}\right)^{\eta} T^{1-\eta} m^{\eta}$$
where $1/2 < \eta < 1$ (the formula was reported by Iglesias and Pecharroman with parameter $\alpha = 1/(1-\eta)$). Glänzel (2008) estimated this model in an empirical comparative study of h-index for journals. He found that the estimate of the power parameter depends on the length of the citation window considered. In particular, he found that the power function $T^{1/3} m^{2/3}$ (α = 2 in his notation, which corresponds to η = 2/3 in ours) is appropriate “for small windows comprising an initial period of about 3 years after publication”.
(b) From the above model, Iglesias and Pecharroman (2007) also obtained, for η = 2/3, the ready-to-use formula:
$$h = 4^{-1/3}\, T^{1/3} m^{2/3} \approx 0.63\, T^{1/3} m^{2/3}$$
(see also Panaretos and Malesios 2009; Vinkler 2009, 2013; Ionescu and Chopard 2013).
(c) By starting from a continuous probability distribution, namely a Pareto distribution of the second kind (Johnson et al. 1994, p. 575; Arnold 1983, p. 44), also known as the Lomax distribution (Lomax 1954), where $G(x) = (1 + x/\sigma)^{-\alpha}$, $\sigma > 0$, $\alpha > 0$, represents the probability of observing a number greater than $x$, $x > 0$, and estimating its expectation $\sigma/(\alpha - 1)$ (that exists if $\alpha > 1$) by the sample mean $m$, Schubert and Glänzel (2007) (see also Glänzel 2006) derived a slightly more general two-parameter model:
$$h = \gamma\, T^{1-\eta} m^{\eta} \qquad (16)$$
(also reported by Bletsas and Sahalos (2009); see their Eq. (4)), as an approximate (and generalized) solution of the equation
$$T \left(1 + \frac{h}{(\alpha - 1)\, m}\right)^{-\alpha} = h \qquad (17)$$
where $\eta = \alpha/(\alpha + 1)$.
In words, model (16) states that “the h-index can be approximated by a power function of the sample size and the sample mean” (Schubert et al. 2009). It is important to note that model (16) is similar to but different from the above Iglesias–Pecharroman model (a), because in the latter the proportionality constant is merely a function of the power parameter η, while in the former γ represents a free parameter. This gives rise to a more flexible model. Malesios (2015) estimated the parameters of model (16) in a study on 134 journals in the field of ecology and 54 journals in the field of forestry sciences. He obtained the best fit, respectively, with the estimates (0.64, 0.7) and (0.66, 0.78) for the pair (η, γ) (in our parameterization).
(d) The above Pareto distribution of the second kind has also recently become known as the Tsallis distribution (Tsallis and de Albuquerque 2000). More specifically, with reparameterization $\alpha = 1/(q-1)$ and $\sigma = \lambda/(q-1)$, the probability of observing a number greater than $x$, $x > 0$, becomes equal to $\left[1 + (q-1)\,x/\lambda\right]^{-1/(q-1)}$ (see Bletsas and Sahalos 2009; Shalizi 2007). Bletsas and Sahalos (2009) suggest obtaining an estimate of the h-index as the numerical solution of the Eq. (17), that is
$$T \left[1 + \frac{(q-1)\,h}{(2-q)\,m}\right]^{-1/(q-1)} = h \qquad (18)$$
for a pre-specified fixed value of the unknown parameter $q$. Let us call $h_T(q)$ the (implicit) solution of Eq. (18). It is important to stress that, unlike all the other estimators of h-index considered in the present study, a closed-form expression for $h_T$ does not exist. Nevertheless, in an empirical application to a set of electrical engineering journals, Bletsas and Sahalos (2009) found a very good fit between measured and estimated values of the h-index, assuming Tsallis distribution with parameter q = 1.5 and q = 1.6. It is interesting to note that these values correspond, respectively, to η = 2/3 and η = 0.625, since $\eta = 1/q$.
(e) For a special choice of the power parameter (η = 2/3 in the present parameterization) in model (16), Schubert and Glänzel (2007) derived the celebrated one-parameter model
$$h = \gamma\, T^{1/3} m^{2/3}$$
also known as the Glänzel–Schubert model of the h-index. This model has been widely used (mainly for interpretative purposes, i.e. to provide a better understanding of the “mathematical properties” of the h-index) because several empirical studies suggest the existence of a strong correlation between h-index and $T^{1/3} m^{2/3}$. Its drawback (as with model (16)) is obviously that the value of the proportionality constant γ is unknown. Certainly, this parameter can be determined (ex post) empirically, but it is likely to vary from case to case (Prathap 2010a; Alguliev et al. 2014). Hence, as a ready-to-use formula for estimating the h-index a priori, the Glänzel–Schubert model is in fact unusable. Sometimes researchers find an ex post least square estimate of the parameter γ, starting from known values of the h-index. In different contexts, and for different datasets, the estimate of the γ parameter has been found to vary appreciably, in that it turns out to range approximately from 0.7 to 0.95. Indeed, for example, Schubert and Glänzel (2007) found, for γ, the estimates 0.73 and 0.76, in a study on the h-index for journals, for two different sets of journals, while Csajbók et al. (2007) found an estimate of γ of 0.93 in a macro-level analysis of the h-index for countries. Instead, other authors, among them Annibaldi et al. (2010), Bouabid et al. (2011) and Zhao et al. (2014), have found values of around 0.8. In quite different contexts (partnership ability and h-index for networks) Schubert (2012) and Schubert et al. (2009) have estimated the parameter γ of the model $\gamma\, T^{1/3} m^{2/3}$, obtaining values within the range 0.6–0.96.
(f) In the absence of a specific value of the proportionality constant γ, researchers sometimes decide to set γ equal to a fixed arbitrary value $\gamma_0$, obtaining a ready-to-use formula
$$h_{SG}(\gamma_0) = \gamma_0\, T^{1/3} m^{2/3}.$$
In the framework of the analysis of the h-index for journals, ready-to-use formulas of this type have been adopted, for example, by Bletsas and Sahalos (2009). Instead, for example, Ye (2009, 2010) and Elango et al. (2013) adopted the rule of setting different fixed values of $\gamma_0$ for journals and for other sources. Abbas (2012) and Vinkler (2013) adopted the choice $\gamma_0 = 1$. It is worth noting that the latter value leads to the formula $T^{1/3} m^{2/3} = (C^2/T)^{1/3}$, which coincides with the so-called p-index defined by Prathap (2010b). Finally, note that $h_{SG}(4^{-1/3})$ coincides with the ready-to-use formula of Iglesias and Pecharroman (2007).
(g) As noted above, empirical analyses suggest a “strong linear correlation” between the h-index and the function $T^{1/3} m^{2/3}$ (Schubert and Glänzel 2007; Glänzel 2007; Schreiber et al. 2012; Malesios 2015). Strictly speaking, this only means that when $h$ is plotted against $T^{1/3} m^{2/3}$, the data fall fairly close to a straight line. In other terms, $h$ is approximately equal to $\delta + \gamma\, T^{1/3} m^{2/3}$, for suitable choices of the parameters δ and γ. Indeed, the following three-parameter model has been considered in literature (see Bador and Lafouge 2010)
$$h = \delta + \gamma\, T^{1-\eta} m^{\eta}.$$
In a comparative analysis of two samples of 50 journals (taken from the ‘‘Pharmacology and Pharmacy’’ and ‘‘Psychiatry’’ sections of the Journal Citation Reports 2006), Bador and Lafouge (2010) obtained the LS estimates of the parameters δ and γ for different fixed values of the power parameter η (values of “α close to 2”, in their parameterization, where $\eta = \alpha/(\alpha + 1)$). Their best estimates of the proportionality constant γ ranged from 0.7 to 0.8, with an intercept always very close to 1. Based on these results, the ready-to-use formula $h_{SG}(\gamma_0)$ with $\gamma_0$ between 0.7 and 0.8, and a fortiori with $\gamma_0 = 4^{-1/3} \approx 0.63$, underestimates the h-index.
(h) For the particular choice of the power parameter η = 2/3 in the above three-parameter model of Bador and Lafouge (2010), we obtain the two-parameter model
$$h = \delta + \gamma\, T^{1/3} m^{2/3}.$$
This model directly generalizes the above Glänzel–Schubert model by introducing a free intercept parameter, δ. Tahira et al. (2013) tested this model in a scientometric analysis of engineering in Malaysian universities. They found the estimates δ = −0.28 and γ = 0.97.
(i) Finally, by assuming a linear dependence between the h-index and a power function of $T$ and $m$ in a double logarithmic axis plot (log–log plot), one may define a three-parameter model (see Radicchi and Castellano 2013). Indeed, after taking logs, this corresponds to a linear regression relationship for log h. Needless to say, this model is similar to but essentially different from the above models (a)–(h). Radicchi and Castellano (2013) analyzed the scientific profile of more than 30,000 researchers. They found a good linear correlation, in a log–log plot, between the true h-index and the values given by the model. Using this relationship, they obtained, in particular, a least square estimate of the parameter η. It is quite puzzling to observe that the solution reached by Radicchi and Castellano is out of the parameter space of all the above models (η > 0.5).
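The two ready-to-use rules from this family that involve no fitted parameter can be sketched in Python as follows. The sketch assumes that the Glänzel–Schubert rule is $\gamma_0\, T^{1/3} m^{2/3}$ and that Eq. (18) reads $T\,[1 + (q-1)h/((2-q)m)]^{-1/(q-1)} = h$, the form obtained by matching the mean of the Tsallis (Lomax) survival function to $m$. The function names `h_sg` and `h_bs` and the use of bisection are our own choices, not a method prescribed by the cited authors.

```python
def h_sg(T, C, gamma0):
    """Glänzel-Schubert type rule: gamma0 * T**(1/3) * m**(2/3), with m = C/T."""
    m = C / T
    return gamma0 * T ** (1.0 / 3.0) * m ** (2.0 / 3.0)


def h_bs(T, C, q0, tol=1e-10):
    """Numerical root of T * [1 + (q0-1)h / ((2-q0)m)] ** (-1/(q0-1)) = h."""
    if not 1.0 < q0 < 2.0:
        raise ValueError("this sketch requires 1 < q0 < 2")
    m = C / T

    def excess(h):
        base = 1.0 + (q0 - 1.0) * h / ((2.0 - q0) * m)
        return T * base ** (-1.0 / (q0 - 1.0)) - h

    # excess(0) = T > 0 and excess(T) < 0, and excess is decreasing in h,
    # so bisection on [0, T] brackets the unique root.
    lo, hi = 0.0, float(T)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    T, C = 1555, 2033  # example inputs
    print([round(h_sg(T, C, g)) for g in (0.63, 0.7, 0.8, 0.9, 1.0)])
    print([round(h_bs(T, C, q)) for q in (1.2, 1.4, 1.6)])
```

For $\gamma_0 = 1$ the first function returns $(C^2/T)^{1/3}$, i.e. the p-index mentioned above. Bisection is used for the second because the left-hand side of the equation minus $h$ is strictly decreasing in $h$, which guarantees a single root.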

Two empirical studies

A first dataset of journals

Journal selection

The Research Evaluation Exercise for the period 2011–2014 named “Valutazione della Qualità della Ricerca 2011–2014” (hereinafter VQR) is a national research assessment exercise organized under the aegis of the Italian Ministry of Education, University and Research for evaluating and ranking all Italian scientific institutions (typically, all national universities and research centers), on the basis of the quality of their research outcomes. The results obtained are particularly important because they determine the allocation of government funding to Italian universities. The VQR is carried out under the responsibility of a National Agency for the Evaluation of University and Research, the “Agenzia Nazionale di Valutazione del Sistema Universitario e della Ricerca” (ANVUR), and is organized with reference to 14 different academic fields, or Areas. The research assessment is actually conducted by Groups of Evaluation Experts (GEV, in the Italian acronym), one for each Area. For our first empirical analysis, we consider the so-called Area 13—Scienze economiche e statistiche—Economics and Statistics. The evaluation of each researcher is based on the quality of his/her research outcomes published during the period 2011–2014. As a general rule, the evaluation of a research product for Area 13 is made at journal-level. This means that journal bibliometric indicators are used as surrogate measures to quantify the quality of each individual research product (published in that journal). For this purpose, a list of “relevant” journals for Area 13 has been compiled by the corresponding GEV (the so-called GEV 13) and suitable journal-based metrics are extracted to this end from three sources, that is: Web of Science (WoS), Scopus, and Google Scholar (GS). The full list of the “relevant” journals for Area 13 includes 2717 journals and may be found on the ANVUR website (www.anvur.org). 
Each journal on the Area 13 list was individually assigned to one of five sub-areas, among them “Statistics and Mathematical Methods” (S&MM). For the purpose of our case study, we selected a somewhat homogeneous list of journals using the following steps: we considered all and only the journals (568 journals) belonging to the sub-area S&MM; to facilitate possible comparisons between databases, the journals selected were subsequently restricted to only those (253) journals indexed by all three databases: WoS, Scopus and GS; we excluded 15 journals with incomplete issues within the period under investigation, 2010–2014; finally, in order to preserve the homogeneity of the sample, we excluded 6 journals with a “too large” number of published papers (more than 2000) and 1 journal that publishes only online. Our final sample included 231 journals. According to the Scopus classification, these journals belong to a number of different “Subject Areas”. Table 1 shows the “Subject Areas” in which the 231 journals selected from the S&MM list are placed by Scopus (it should be recalled that Scopus classifies journal titles into 27 major thematic categories and a journal may belong to more than one category).
Table 1

Scopus “Subject Areas” of the 231 journals within the S&MM list

Subject area | Count | %
Mathematics | 239 | 38.3
Decision sciences | 79 | 12.7
Computer science | 63 | 10.1
Social sciences | 51 | 8.2
Engineering | 45 | 7.2
Economics, econometrics and finance | 37 | 5.9
Medicine | 23 | 3.7
Business, management and accounting | 17 | 2.7
Environmental science | 13 | 2.1
Others | 57 | 9.1

Estimating the h-index

After selecting the S&MM list of journals, we retrieved citation data from the Scopus database. According to the VQR time-span, we considered all documents within the publication window of 5 years (2010–2014) (in fact GEV13 considers the 5-year Google Scholar’s h-index, for the period 2010–2014) and the citations that these items received until the time of accessing the database (last week of December 2015). This means a 6-year citation window, 2010–2015, over a 5-year publication window, 2010–2014. Harzing and van der Wal (2009) considered similar timeframes in a study on a set of journals in the area of economics and business. Overall, the dataset obtained included 99,409 publications receiving (until December 2015) a total of 485,628 citations. The complete list of the 231 journals in the S&MM dataset is reported in Table 2, where each journal is identified by its ISSN code. For each journal, we manually computed, on the basis of the citations downloaded, the actual value $h$ of the h-index, as the largest number $h$ of papers published in the journal between 2010 and 2014 that obtained at least $h$ citations each, from the time of publication until December 2015. Table 2 reports, for each journal, the h-index, $h$, and its estimates, obtained (1) with the Lambert-W formulas for the h-index, $h_W^{(0)}$ and $\tilde{h}_W^{(1)}$, and, as a comparison, (2) with the Glänzel–Schubert formula, $h_{SG}(\gamma_0)$, for different values of the proportionality constant $\gamma_0$, namely, 0.63, 0.7, 0.8, 0.9, 1 (note that $h_{SG}(0.63)$ identifies the ready-to-use formula of Iglesias and Pecharroman (2007)), and (3) by means of a numerical solution of Eq. (18), $h_{BS}(q_0)$, for different values of $q_0$, namely, 1.2, 1.4, 1.6. Table 2 also reports: the total number of citations, $C$; the total number of publications, $T$; the total number of publications cited at least once, $T_1$; the total number of citations of the most cited paper, $C_1$. To facilitate comparisons, all the estimates have been rounded to the nearest integer to produce numbers in the same range of values as the h-index.
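The indicators described in this paragraph can be obtained from a plain list of per-paper citation counts. The following Python sketch is only an illustration; the function name `citation_indicators` and the dictionary keys are our own, and the citation counts in the example are made up.

```python
def citation_indicators(citations):
    """Return T, C, T1, C1 and the h-index for a list of per-paper citation counts."""
    counts = sorted(citations, reverse=True)
    T = len(counts)                          # total number of publications
    C = sum(counts)                          # total number of citations
    T1 = sum(1 for c in counts if c >= 1)    # publications cited at least once
    C1 = counts[0] if counts else 0          # citations of the most cited paper
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:                        # the rank-th paper still has >= rank citations
            h = rank
        else:
            break
    return {"T": T, "C": C, "T1": T1, "C1": C1, "h": h}


if __name__ == "__main__":
    # Made-up citation counts for six papers.
    print(citation_indicators([10, 8, 5, 4, 3, 0]))
```

For the six made-up counts above the function returns T = 6, C = 30, T1 = 5, C1 = 10 and h = 4.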
Table 2

Basic statistics for the S&MM list of journals and the approximations of the Hirsch h-index calculated by means of different formulas (rounded values)

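# | ISSN code | $C$ | $T$ | $T_1$ | $C_1$ | $h$ | $h_W^{(0)}$ | $\tilde{h}_W^{(1)}$ | $h_{SG}(.63)$ | $h_{SG}(.7)$ | $h_{SG}(.8)$ | $h_{SG}(.9)$ | $h_{SG}(1)$ | $h_{BS}(1.2)$ | $h_{BS}(1.4)$ | $h_{BS}(1.6)$
1 | 1405-7425 | 42 | 152 | 24 | 6 | 3 | 3 | 3 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2
2 | 1012-9367 | 276 | 360 | 111 | 14 | 6 | 5 | 6 | 4 | 4 | 5 | 5 | 6 | 4 | 5 | 6
3 | 0017-095X | 158 | 166 | 71 | 13 | 5 | 5 | 5 | 3 | 4 | 4 | 5 | 5 | 4 | 5 | 5
4 | 0315-3681 | 557 | 427 | 177 | 44 | 9 | 7 | 8 | 6 | 6 | 7 | 8 | 9 | 7 | 8 | 8
5 | 1081-1826 | 201 | 140 | 77 | 12 | 6 | 6 | 6 | 4 | 5 | 5 | 6 | 7 | 5 | 6 | 6
6 | 0957-3720 | 323 | 228 | 122 | 15 | 7 | 7 | 7 | 5 | 5 | 6 | 7 | 8 | 6 | 7 | 7
7 | 0002-9890 | 589 | 351 | 171 | 87 | 9 | 8 | 8 | 6 | 7 | 8 | 9 | 10 | 8 | 9 | 9
8 | 0361-0926 | 2033 | 1555 | 754 | 28 | 11 | 9 | 10 | 9 | 10 | 11 | 12 | 14 | 9 | 12 | 14
9 | 0117-1968 | 163 | 120 | 61 | 20 | 5 | 6 | 5 | 4 | 4 | 5 | 5 | 6 | 5 | 5 | 5
10 | 1210-0552 | 405 | 205 | 119 | 31 | 9 | 8 | 8 | 6 | 6 | 7 | 8 | 9 | 7 | 8 | 8
11 | 1056-2176 | 290 | 222 | 101 | 22 | 7 | 6 | 7 | 5 | 5 | 6 | 7 | 7 | 6 | 6 | 6
12 | 0165-4896 | 583 | 320 | 198 | 16 | 10 | 8 | 8 | 6 | 7 | 8 | 9 | 10 | 8 | 9 | 9
13 | 0315-5986 | 166 | 83 | 48 | 24 | 6 | 6 | 6 | 4 | 5 | 6 | 6 | 7 | 6 | 6 | 5
14 | 0736-2994 | 577 | 283 | 176 | 19 | 9 | 9 | 9 | 7 | 7 | 8 | 10 | 11 | 8 | 9 | 9
15 | 0399-0559 | 153 | 86 | 47 | 32 | 5 | 6 | 5 | 4 | 5 | 5 | 6 | 6 | 5 | 5 | 5
16 | 1303-5010 | 658 | 334 | 154 | 56 | 11 | 9 | 10 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 10
17 | 0927-7099 | 463 | 296 | 162 | 16 | 8 | 7 | 8 | 6 | 6 | 7 | 8 | 9 | 7 | 8 | 8
18 | 1351-1610 | 313 | 150 | 92 | 23 | 8 | 8 | 8 | 5 | 6 | 7 | 8 | 9 | 7 | 7 | 7
19 | 1292-8100 | 191 | 78 | 52 | 22 | 7 | 7 | 7 | 5 | 5 | 6 | 7 | 8 | 6 | 6 | 6
20 | 0361-0918 | 1036 | 635 | 369 | 45 | 9 | 9 | 9 | 8 | 8 | 10 | 11 | 12 | 9 | 10 | 11
21 | 0269-9648 | 263 | 172 | 84 | 16 | 7 | 7 | 7 | 5 | 5 | 6 | 7 | 7 | 6 | 6 | 6
22 | 1532-6349 | 308 | 141 | 93 | 15 | 7 | 8 | 8 | 6 | 6 | 7 | 8 | 9 | 7 | 7 | 7
23 | 0217-5959 | 522 | 261 | 155 | 33 | 9 | 8 | 9 | 6 | 7 | 8 | 9 | 10 | 8 | 9 | 9
24 | 1018-5895 | 424 | 189 | 115 | 25 | 9 | 8 | 9 | 6 | 7 | 8 | 9 | 10 | 8 | 8 | 8
25 | 0266-4763 | 2164 | 901 | 518 | 323 | 13 | 12 | 12 | 11 | 12 | 14 | 16 | 17 | 13 | 15 | 16
26 | 1471-678X | 336 | 138 | 92 | 23 | 8 | 8 | 8 | 6 | 7 | 7 | 8 | 9 | 8 | 8 | 8
27 | 0304-4068 | 737 | 433 | 265 | 25 | 9 | 9 | 8 | 7 | 8 | 9 | 10 | 11 | 8 | 9 | 10
28 | 0020-7276 | 480 | 265 | 158 | 13 | 8 | 8 | 8 | 6 | 7 | 8 | 9 | 10 | 8 | 8 | 8
29 | 0023-5954 | 813 | 337 | 208 | 36 | 11 | 10 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
30 | 1220-1766 | 526 | 193 | 137 | 31 | 10 | 10 | 9 | 7 | 8 | 9 | 10 | 11 | 9 | 10 | 9
31 | 1226-3192 | 457 | 271 | 137 | 20 | 10 | 8 | 8 | 6 | 6 | 7 | 8 | 9 | 7 | 8 | 8
32 | 1618-2510 | 305 | 172 | 90 | 31 | 8 | 7 | 7 | 5 | 6 | 7 | 7 | 8 | 7 | 7 | 7
33 | 1083-589X | 739 | 353 | 209 | 20 | 10 | 9 | 10 | 7 | 8 | 9 | 10 | 12 | 9 | 10 | 10
34 | 1048-5252 | 643 | 283 | 189 | 17 | 10 | 9 | 9 | 7 | 8 | 9 | 10 | 11 | 9 | 10 | 10
35 | 1004-3756 | 443 | 140 | 96 | 27 | 9 | 10 | 10 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
36 | 1009-6124 | 979 | 466 | 240 | 56 | 12 | 10 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 12
37 | 1120-9763 | 434 | 492 | 165 | 18 | 8 | 6 | 7 | 5 | 5 | 6 | 7 | 7 | 5 | 6 | 7
38 | 1369-1473 | 282 | 140 | 76 | 24 | 8 | 7 | 8 | 5 | 6 | 7 | 7 | 8 | 7 | 7 | 7
39 | 1230-1612 | 346 | 128 | 84 | 32 | 8 | 9 | 8 | 6 | 7 | 8 | 9 | 10 | 8 | 8 | 8
40 | 0026-1335 | 544 | 283 | 171 | 24 | 10 | 8 | 9 | 6 | 7 | 8 | 9 | 10 | 8 | 9 | 9
41 | 0218-348X | 476 | 167 | 129 | 30 | 9 | 10 | 9 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
42 | 0167-7152 | 3169 | 1546 | 945 | 40 | 16 | 12 | 13 | 12 | 13 | 15 | 17 | 19 | 13 | 16 | 18
43 | 0032-4663 | 154 | 103 | 58 | 13 | 6 | 6 | 6 | 4 | 4 | 5 | 6 | 6 | 5 | 5 | 5
44 | 0282-423X | 405 | 196 | 116 | 20 | 9 | 8 | 8 | 6 | 7 | 8 | 8 | 9 | 8 | 8 | 8
45 | 1748-670X | 1933 | 822 | 543 | 36 | 14 | 12 | 12 | 10 | 12 | 13 | 15 | 17 | 12 | 14 | 15
46 | 0094-9655 | 1649 | 695 | 425 | 55 | 14 | 12 | 12 | 10 | 11 | 13 | 14 | 16 | 12 | 14 | 15
47 | 0039-0402 | 365 | 129 | 86 | 34 | 9 | 9 | 9 | 6 | 7 | 8 | 9 | 10 | 8 | 8 | 8
48 | 0894-9840 | 615 | 331 | 184 | 29 | 9 | 9 | 9 | 7 | 7 | 8 | 9 | 10 | 8 | 9 | 9
49 | 0398-7620 | 679 | 303 | 170 | 66 | 10 | 9 | 10 | 7 | 8 | 9 | 10 | 12 | 9 | 10 | 10
50 | 0219-0257 | 336 | 159 | 102 | 31 | 7 | 8 | 7 | 6 | 6 | 7 | 8 | 9 | 7 | 8 | 7
51 | 0319-5724 | 511 | 206 | 129 | 36 | 10 | 9 | 9 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
52 | 0020-3157 | 772 | 285 | 189 | 60 | 11 | 11 | 10 | 8 | 9 | 10 | 12 | 13 | 10 | 11 | 11
53 | 0898-2112 | 597 | 228 | 149 | 26 | 11 | 10 | 10 | 7 | 8 | 9 | 10 | 12 | 9 | 10 | 10
54 | 1524-1904 | 669 | 301 | 155 | 42 | 12 | 9 | 11 | 7 | 8 | 9 | 10 | 11 | 9 | 10 | 10
55 | 0963-5483 | 719 | 272 | 179 | 24 | 11 | 10 | 11 | 8 | 9 | 10 | 11 | 12 | 10 | 11 | 11
56 | 1547-5816 | 770 | 290 | 201 | 37 | 11 | 10 | 10 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
57 | 0001-8678 | 821 | 269 | 201 | 37 | 11 | 11 | 11 | 9 | 10 | 11 | 12 | 14 | 11 | 12 | 11
58 | 0021-9002 | 1168 | 477 | 321 | 35 | 13 | 11 | 11 | 9 | 10 | 11 | 13 | 14 | 11 | 12 | 13
59 | 0257-0130 | 719 | 260 | 179 | 18 | 11 | 10 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
60 | 1026-0226 | 2306 | 1036 | 610 | 34 | 15 | 12 | 13 | 11 | 12 | 14 | 16 | 17 | 13 | 15 | 16
61 | 0378-3758 | 3899 | 1334 | 907 | 71 | 18 | 15 | 16 | 14 | 16 | 18 | 20 | 23 | 16 | 19 | 21
62 | 0377-7332 | 1353 | 597 | 348 | 38 | 15 | 11 | 12 | 9 | 10 | 12 | 13 | 15 | 11 | 13 | 13
63 | 1560-3547 | 735 | 249 | 182 | 25 | 11 | 11 | 11 | 8 | 9 | 10 | 12 | 13 | 10 | 11 | 11
64 | 0893-4983 | 793 | 297 | 200 | 36 | 12 | 11 | 11 | 8 | 9 | 10 | 12 | 13 | 10 | 11 | 11
65 | 1387-5841 | 645 | 305 | 178 | 26 | 10 | 9 | 10 | 7 | 8 | 9 | 10 | 11 | 9 | 10 | 10
66 | 0167-6377 | 1702 | 582 | 399 | 33 | 14 | 13 | 13 | 11 | 12 | 14 | 15 | 17 | 13 | 15 | 15
67 | 1747-7778 | 837 | 135 | 93 | 294 | 10 | 15 | 12 | 11 | 12 | 14 | 16 | 17 | 14 | 14 | 13
68 | 1054-3406 | 1098 | 429 | 277 | 40 | 13 | 11 | 12 | 9 | 10 | 11 | 13 | 14 | 11 | 12 | 13
69 | 1619-4500 | 493 | 125 | 89 | 38 | 12 | 11 | 11 | 8 | 9 | 10 | 11 | 12 | 10 | 10 | 10
70 | 0143-9782 | 761 | 258 | 179 | 31 | 12 | 11 | 11 | 8 | 9 | 10 | 12 | 13 | 11 | 11 | 11
71 | 1432-2994 | 512 | 207 | 146 | 29 | 9 | 9 | 9 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
72 | 0219-4937 | 304 | 178 | 102 | 21 | 7 | 7 | 7 | 5 | 6 | 6 | 7 | 8 | 6 | 7 | 7
73 | 0033-5177 | 1734 | 878 | 522 | 42 | 14 | 11 | 11 | 9 | 11 | 12 | 14 | 15 | 11 | 13 | 14
74 | 1748-006X | 779 | 238 | 184 | 31 | 11 | 11 | 11 | 9 | 10 | 11 | 12 | 14 | 11 | 12 | 11
75 | 1381-298X | 364 | 113 | 82 | 23 | 9 | 9 | 9 | 7 | 7 | 8 | 9 | 11 | 9 | 9 | 8
76 | 0277-6693 | 825 | 217 | 160 | 61 | 14 | 12 | 12 | 9 | 10 | 12 | 13 | 15 | 12 | 12 | 12
77 | 1435-246X | 735 | 263 | 175 | 43 | 11 | 11 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
78 | 1572-5286 | 587 | 158 | 114 | 25 | 12 | 11 | 11 | 8 | 9 | 10 | 12 | 13 | 11 | 11 | 10
79 | 1134-5764 | 458 | 246 | 128 | 59 | 8 | 8 | 8 | 6 | 7 | 8 | 9 | 9 | 8 | 8 | 8
80 | 0932-5026 | 829 | 396 | 210 | 26 | 11 | 10 | 11 | 8 | 8 | 10 | 11 | 12 | 9 | 10 | 11
81 | 0926-2601 | 769 | 286 | 196 | 78 | 10 | 10 | 10 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
82 | 0890-8575 | 333 | 119 | 74 | 47 | 8 | 9 | 8 | 6 | 7 | 8 | 9 | 10 | 8 | 8 | 8
83 | 0219-5259 | 803 | 254 | 179 | 32 | 12 | 11 | 11 | 9 | 10 | 11 | 12 | 14 | 11 | 12 | 11
84 | 0515-0361 | 447 | 150 | 89 | 37 | 11 | 10 | 10 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
85 | 0095-4616 | 626 | 192 | 135 | 46 | 11 | 11 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 10
86 | 0233-1934 | 1191 | 490 | 304 | 24 | 13 | 11 | 12 | 9 | 10 | 11 | 13 | 14 | 11 | 12 | 13
87 | 0167-5923 | 663 | 216 | 152 | 38 | 12 | 11 | 11 | 8 | 9 | 10 | 11 | 13 | 10 | 11 | 11
88 | 1469-7688 | 2100 | 653 | 404 | 77 | 17 | 14 | 16 | 12 | 13 | 15 | 17 | 19 | 15 | 16 | 17
89 | 1083-6489 | 1321 | 488 | 330 | 32 | 13 | 12 | 12 | 10 | 11 | 12 | 14 | 15 | 12 | 13 | 14
90 | 1392-5113 | 747 | 202 | 138 | 52 | 13 | 12 | 12 | 9 | 10 | 11 | 13 | 14 | 11 | 12 | 11
91 | 1863-8171 | 404 | 118 | 77 | 34 | 10 | 10 | 10 | 7 | 8 | 9 | 10 | 11 | 9 | 9 | 9
92 | 1380-7870 | 379 | 170 | 103 | 39 | 9 | 8 | 8 | 6 | 7 | 8 | 9 | 9 | 8 | 8 | 8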
931862-44721866652438321513141112141617131516
940219-876290530018565151112910111314111212
950218-12745537137010131362619201820232528212426
960747-493864914911354121212910111314121211
970020-79851280417268281612131011131416121414
980047-259X3329915650892117171416182123182021
990303-689886825618831121212910111314121212
1001471-082X40513488359997791011999
1010924-67034131177938910107891011999
1020346-12383371287928989678910888
1030748-80172076534380311915161314161820161718
1041389-4420793184124124151312911121415121212
1050146-621673721515530121112910111214111111
1060160-56823870853663902119191618212326202223
1070960-077927125704431182018181516192123192020
1080246-02031019266206331413131011131416131313
1090306-77345631478310112111189101213111110
1101350-72651499375294401515141113151618151515
1110021-932091027420722121212910121314121212
1120218-48851036297202811313131011121415121313
1131945-497X885162130571514141112141517141413
1141352-8505564192130641010107891112101010
1150003-13056702411334313101189101112101010
1161076-2787900224163491413131011121415131312
1171862-5347524125796311111189101213111110
1180022-471553021246966912420201820232528212426
1191133-068661724612754121011789101291010
1201539-160410752861941831313121011131416131313
1211434-6028772218491420722721212022252932232730
1220304-41492652791577441515151315171921161819
1230143-208710892281551521514141112141617141414
1240323-384712213272301291513131012131517131414
1250266-46661295303208331714151112141618141515
1260925-50013452849611612218191517192224192122
1271085-711768218312949131212910111214111111
1280927-53981505358250531815161213151718151616
1290899-82562942696512762017181516192123182021
1300035-92541023212169541414141112141517141414
1310893-9659951916311295953526272427313438293335
1320926-60032408508394782018181416182023181919
1331368-422153311686499121189111213111110
1341386-1999534120833013121289111213111110
1350254-5330450512418241902118191618202325192224
1361180-40091611325236521816171314161820161716
1370167-94737203154112351622622222023262932242830
1380013-16441350262214781616151213151719161615
1391050-51642089373322302018181416182023181919
1401544-61151073260199561514131011131516131413
1411055-678812433142202851214121112141517141414
1421076-998665514811060111212910111314121211
1430025-57183127595488602220201618202325202222
1440036-14103275618514852120201618212326212222
1450740-817X1881382302441817171315171921171818
1460167-66872779572469371918181517192124192021
1470364-765X1237227180611716161213151719151615
1481017-040520484263081901917171415171921171818
1491369-183X2904469398902421201718212426212222
1501545-59633954658524722622231820232629232525
1511064-12461887813504401612131011131516121415
1520025-55642637545434612018181516192123192020
1530036-13992359466390631918181416182123181919
1540022-3239413410056851122418201618212326202223
1550197-918310621951441311515151113141618151514
1560949-2984777146124251414131011131416131312
1570178-80511744408313471716161214161820161717
1581435-98711565347280511516151213151719161616
1590091-17982227408353562018181416182123191919
1600895-5646742123103431314141012131516131312
1610266-89201994281226982220201517192224202019
1620363-012937966615341122521221820222528222424
1630144-686X1902376287501717181315171921171818
1641061-86001661290237731817171315171921171817
1651066-527731654913802732522211719222527222323
1660020-7721558610318151802523232022252831242728
1670303-8300509312608501242519211719222527212425
1680006-341X3854717565752421211719222527222424
1690960-1627854189149361413131011131416131312
1700305-9049886209157561213131011121416131312
1710167-865512,8641417124911294035333134394449384243
1721932-81843207648414742419221618202325202222
1731613-9372832171134361314141011131416131312
1741479-840946111574461111118910111210109
1751874-89611560275206731917181314171921171717
1760960-317418914082841091916171314161921171817
1771742-546835721564950411913141314161820141720
1780885-064X1081185149961416151213151718151514
1790007-11029071491151231415141112141618141413
1800171-64681499215165821718191415172022181817
1811944-0391484201812811911778911999
1821726-21351007115112661617161314171921171614
1831544-84441703242210561719191416182123191918
1840032-47285581018734111312910121315121111
1850022-406575211388341415151112141517141312
1860039-36659131581191761315131112141617141413
1870168-6577536938053121312910121315121110
1880886-938323393652861282220201617202225202120
1890018-95294175469387942927282123273033272827
1901054-15005630936774802724242023262932252829
1910304-407653327236091653026262124273134272929
1920006-34442406392314852220201517202225202120
1930964-19981287234177501716161213151719161615
1941932-615727405243731022219201517192224192121
1951468-121812,517127111392384237363135404550394343
1960025-561039975674421942724241921242730252626
1971436-32403874661562662422211820232528232424
1980167-691172597316173513732322629333742343535
1990305-054813,373126111351564539393337424752424545
2000040-17061141235153791615161112141618141514
2010165-0114796211068181083328312427313539303334
2020883-725220552862341082220201517202225202019
2030272-43326416871687863327292325293336293131
2040277-671510,506178013146233527282528323640303437
2051568-45399761191061091517161314161820161514
2060022-24961417199160821918181415171922181816
2070033-312314312311722881417161314171921171716
2080951-83209529926850953735352932374246373939
2090304-380013,918168915114123634333134394449384244
2101384-581023342381981372424241820232628232321
2110169-743958807266451873028272325293336293131
2121538-634113412641321471716181213151719161615
2130030-364X50985544871203029292325293236293030
2140098-792118551981531432222221618212326212119
2151465-464423473042531422322211718212426222221
2160199-00391110140108951618171314171921171615
2171052-623443214143457652529262225283236292928
2180735-001519322451862582221201617202225202019
2190167-923610,5949237974584238383135404550404242
2200162-145952316635191563127282224283135282929
2210049-1241803115991481415131112141618141413
2220378-873328792312143912228252123263033272624
2231470-160X16,653163615162144440393539445055434849
2240070-33703714420376742626262022262932262726
2250962-280214762111531022118191415172022181817
2260090-536458354864333153133332629333741343433
2270027-317118861961514601822191718212426212119
2280883-423719092371513752121201617202225202019
2291532-443514,00511218419665542453539455056454847
2301369-741231861691494752332302527313539312926
2311070-55111374187152941818181415171922181716

C the total number of citations, T the total number of papers, $T_1$ the total number of papers cited at least once, $C_1$ the number of citations of the most cited paper, h the actual value of the h-index; $h_W^{(0)}$, $\tilde{h}_W^{(1)}$ the Lambert-W formulas for the h-index; $h_{\text{SG}}(\gamma_0)$ the Glänzel–Schubert formula, for different values of $\gamma_0$, $\gamma_0$ = 0.63, 0.7, 0.8, 0.9, 1; $h_{\text{BS}}(q_0)$ the numerical solution of Eq. (18), for different values of $q_0$, $q_0$ = 1.2, 1.4, 1.6


A second dataset of journals

We also analyzed a second dataset, based on the citation data of the top 100 journals within the Scopus subject area of “Economics, Econometrics and Finance”, ranked according to the Scopus journal impact factor, i.e. the Impact per Publication (IPP) 2014. The list (let us call it the EE&F list) may be found at http://www.journalindicators.com and consists of journals with a minimum of 50 publications. We recall that the IPP 2014 of a journal is basically the average number of citations received in 2014 (from documents registered in the Scopus database) by papers published in the same journal from 2011 until 2013. In particular, Scopus takes account of the following types of citable items and citing sources: articles, reviews, and conference papers. All other documents (e.g. notes, letters, articles in press, errata, etc.) are excluded from the computation. We downloaded from Scopus the citation data of all 100 journals on the aforementioned list during the last week of April 2016. The dataset obtained included 19,889 publications receiving a total of 74,096 citations (during 2014). The complete list of these journals is reported in Table 3, where each journal is identified by its ISSN code. Unlike above, we excluded all non-citable items (e.g. notes, etc.) in order to obtain sets of publications as close as possible to those employed for the computation of IPPs by Scopus. Once the set of papers for each journal has been selected, it is possible to request a citation report (“view citation overview”) and download the citations per paper received in the year 2014: that is, all and only the citations needed for the computation of the IPP 2014. In fact, we found some positive differences between the actual values of m, with an average value over all 100 journals of 3.8, and the official IPPs 2014, with an average value of 3.
These differences may be due to: (1) a delayed update of the database (the IPPs were published by Scopus in June 2015), and (2) a larger set of citing sources and documents (with Scopus, it is not possible to limit the citation report to particular citing sources or documents). Similar differences between official and observed values have been found and discussed, for instance, by Leydesdorff and Opthof (2010), Stern (2013) and Seiler and Wohlrabe (2014). Nonetheless, in this case the ACPP should, theoretically, represent a 3-year synchronous impact factor for the year 2014 (Ingwersen et al. 2001; Ingwersen 2012), in that we considered only citations received during 2014 by papers published within the previous 3 years. For each journal, we manually computed the actual value h of the h-index as the largest number h such that h papers published in the journal between 2011 and 2013 each obtained at least h citations in the year 2014. Ultimately, we obtained a synchronous h-index (Bar-Ilan 2010), for a 1-year citation window.
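The manual computation of h described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `h_index` and the citation counts in the example are hypothetical, and the input is assumed to be the list of citations received in 2014 by each paper published in 2011 to 2013.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only decrease from here, so h cannot grow
    return h


# Hypothetical per-paper citation counts for one journal (not taken from Table 3).
example_counts = [10, 8, 5, 4, 3, 0, 0]
print(h_index(example_counts))
```

Because the counts are restricted to a single citation year, the value returned corresponds to the synchronous h-index discussed in the text; feeding it lifetime counts would instead give the diachronous variant.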
Table 3

Basic statistics for the EE&F list of journals and the approximations of the Hirsch h-index calculated by means of different formulas (rounded values)

# ISSN code C T $T_1$ $C_1$ h $h_W^{(0)}$ $\tilde{h}_W^{(1)}$ $h_{\text{SG}}(.63)$ $h_{\text{SG}}(.7)$ $h_{\text{SG}}(.8)$ $h_{\text{SG}}(.9)$ $h_{\text{SG}}(1)$ $h_{\text{BS}}(1.2)$ $h_{\text{BS}}(1.4)$ $h_{\text{BS}}(1.6)$
10022-05156976963611516151213151719151412
21531-46501161127117581819181415182022181715
31557-121117731931731192121201618202325212019
41540-62611529190178541719191516182123191917
50895-3309995133111441517161214161820161514
61547-71851196153143411718171315171921171715
70092-070310151401281111517151214161819161514
80304-405X2413412372482019191517192224202020
91468-02621014187171351415141112141618141414
101523-24094348171261011118911121311109
111537-534X483927956101211910111214111110
121465-73681389288256381616151213151719151615
131540-65201062175147521516151213151719151514
141478-6990795155140381314131011131416131312
151945-77905161131032210121189111213111110
160002-82823303723562482119191617202225192122
171945-7715422917838911108910111310109
181741-62483615552521011108911121310109
191469-57582726546261099778910887
200165-4101517118992211111189111213111110
210925-527346781036888922220191719222528212425
221542-477464114812274101211910111314121211
231537-52771086234213241214131112141517141414
240921-34491723421363331515141213151719151616
251467-937X6881921473211111199111214111111
261945-774X42210993498109789111210109
271873-61812683667565261617161415182022171920
281547-7193948213188561314121011131516131313
291086-4415324574936101010891011121098
301741-2900234544234898678910887
311530-91421065292241271313121011131416131313
321530-929088724220838111211910121315121212
330001-482683721717848121212910121315121212
341090-951663915413423121211910111214111111
351547-7215239605414898678910887
361941-1383246665133898678910887
370921-80092620675567341716161415171922171919
380024-6301248584433998678910887
391468-27105861421223610121189111213111110
401468-029776021017929101211910111314111211
411066-224335585732791097891011998
421475-679X39811186211010107891011999
430308-597X1557475399351213121112141517141515
440022-199679424719122111111910111214111211
451096-04496731831422511121199111214111111
461573-6938340997268798778911998
472041-417X17855352677756778776
480306-919295129122435141212910121315121212
491537-270742213986739997891011999
500013-009517551392687756788776
511052-150X265705717898678910887
521533-446517956282587856778776
531526-548X6341821426111111089101213111111
541873-59911725540426221314131112141618141516
551389-575323164561788867889877
561572-308926886712478867889887
571468-12182068716522351413131113151618141617
580304-387887629522035131111910111214111212
590047-272795933124674111111910111314111212
600969-5931652213172169111089101113101110
611532-8007270102782378766789777
621075-425324580691078766789777
631386-418119268472477756778776
640265-133525282621288866789877
651537-530721479611177756788776
660301-42074901651223091097891011999
671096-122420061572278756789776
681467-64193491219018998678910888
691932-443X16353471167656678666
701756-691643316712519999778910899
710304-393238915410545898678910888
721572-3097265107781478756789777
731464-511435811910619798678910888
741911-38464371561103110997791011999
751096-047322087621777756778776
761095-9068325126991388867889888
771389-93418173252521710101089101113101111
780217-456140214812313898678910898
791548-800423810177877756778777
800304-4076103740430528121110910111214111212
810038-012121874493878756789776
820928-76553401339338888678910888
831747-762X20591603867655678666
841566-0141273110871678766789777
851392-86193681177945999778910998
861573-09137192611981811101089101113101111
871475-146124483642688766789777
881099-12553721631131588867899888
890176-268041617913518798678910888
901096-6099242113782567756678776
911432-11221758964856645667666
920929-1199553244172288997891011999
931573-06972627934717291314131214161819151718
941467-089515957441067755678666
950378-42661993893621361312111012131516121415
961877-858516764501567655678666
971179-189627212788967756788777
980308-514723188601488856788777
991043-951X44919414519898678910899
1000168-703417674411387755677666

C the total number of citations, T the total number of papers, $T_1$ the total number of papers cited at least once, $C_1$ the number of citations of the most cited paper, h the actual value of the h-index; $h_W^{(0)}$, $\tilde{h}_W^{(1)}$ the Lambert-W formulas for the h-index; $h_{\text{SG}}(\gamma_0)$ the Glänzel–Schubert formula, for different values of $\gamma_0$, $\gamma_0$ = 0.63, 0.7, 0.8, 0.9, 1; $h_{\text{BS}}(q_0)$ the numerical solution of Eq. (18), for different values of $q_0$, $q_0$ = 1.2, 1.4, 1.6

In the same way as above, for each journal we manually computed the actual value h of the h-index. Table 3 reports, for each journal, the h-index, h, and the other indicators also considered in Table 2, namely $h_W^{(0)}$, $\tilde{h}_W^{(1)}$, $h_{\text{SG}}(\gamma_0)$ for $\gamma_0$ = 0.63, 0.7, 0.8, 0.9, 1, and $h_{\text{BS}}(q_0)$, the numerical solution of Eq. (18), for different values of $q_0$, namely $q_0$ = 1.2, 1.4, 1.6, as well as the simple basic metrics C, T, $T_1$ and $C_1$.

Discussion and conclusion

The h-index is, today, one of the tools most commonly used to rank journals (Braun et al. 2006; Vanclay 2007, 2008; Schubert and Glänzel 2007; Bornmann et al. 2009; Harzing and van der Wal 2009; Liu et al. 2009; Hodge and Lacasse 2010; Bornmann et al. 2012; Mingers et al. 2012; Xu et al. 2015). Indeed, its value is currently provided by all three major citation databases, WoS, Scopus and GS. In an earlier study (Bertoli-Barsotti and Lando 2015) the Lambert-W formula for the h-index proved to be a good estimator of the h-index for authors. In this paper, we have extended the empirical study to the case of the h-index for journals. One of the major differences between the case of an individual scientist and that of a journal, for the computation of the h-index, is the role played by publication and citation time windows, and the approach adopted for the analysis and interpretation of the citation process (“prospective” vs “retrospective”; Glänzel 2004). As stressed by Braun et al. (2006): “The journal h-index would not be calculated for a “life-time contribution”, as suggested by Hirsch for individual scientists, but for a definite period”. In fact, “Hirsch did not limit the period in which the citations were received” (Bar-Ilan 2010). Unlike the case of individual scientists, and in view of a comparative assessment, calculations of a journal’s h-index must be timed (note that a notion of “timed h-index” has also been recently introduced by Schreiber (2015) for the case of individual scientists), i.e. they must refer to standardized time periods of journal coverage, for example 2, 3 or 5 years, as is usually done for the computation of the impact factor. This limits the typical size-dependency of the h-index, that is, its dependency on the total number of publications (an indicator is said to be size-dependent if it never decreases when new publications are added; Waltman 2016).
A journal’s “impact factor” is essentially a time-limited version of the average number of citations received by papers published in the journal in a given period of time. Several types of “impact factors” may be defined, depending on the time windows considered for publication and citation data and, possibly, on different approaches to the analysis of the citation process, leading to synchronous or diachronous impact factors (Ingwersen et al. 2001; Ingwersen 2012). In its WoS form, the publication window is 2 years (defining the 2-year Impact Factor, IF) or 5 years (defining the 5-year Impact Factor, IF5), while Scopus adopts a 3-year publication window for its IPP. In all these cases, the impact factor is computed in a synchronous mode, i.e. the citations used for the calculation are all received during the same fixed period (1 year, in these cases). In this paper, we first presented the Lambert-W formula for the h-index in two versions (differing in the citation metrics on which they depend), a basic version and an improved version, respectively $h_W^{(0)}$ and $\tilde{h}_W^{(1)}$. Then we tested, by means of an empirical study, their efficiency and effectiveness, as well as, first, that of another popular theoretical model for the h-index that has been successfully applied elsewhere to the same type of application, i.e. the Glänzel–Schubert formula, $h_{\text{SG}}(\gamma_0)$, for different values of the free parameter $\gamma_0$, and secondly, that given by the numerical solution of Eq. (18), $h_{\text{BS}}(q_0)$, for different values of the free parameter $q_0$. We compared the performances of these formulas as estimators of the h-index, in particular in terms of accuracy and robustness, with an empirical study conducted on two different samples of journals. We computed the h-index manually, on the basis of the downloaded citations. In our empirical study, in the first dataset (S&MM), the ACPP can be interpreted as a diachronous impact factor (Ingwersen et al.
2001; Ingwersen 2012), because for each paper the citations are counted from the moment of publication until the time of accessing the database (as in the case of individual scientists). More specifically, we computed an “impact factor” involving a 6-year citation window over a 5-year publication window. As was to be expected, owing to the larger citation window, the average over all 231 journals was larger for m (4.4) than for IF5{2014} (1.5), the traditional 5-year impact factor 2014 as published by WoS in its Journal Citation Reports. Moreover, we also observed a high level of Pearson correlation, ρ, between m and IF5{2014}, that is: (quite similar to that observed between IF5{2014} and IF{2014}, the WoS 5-year and 2-year impact factors 2014, that is: ). Instead, in the second dataset (EE&F), m can be interpreted as a 3-year impact factor in its ordinary synchronous version, as computed by Scopus. Hence, following the terminology of Bar-Ilan (2010, 2012), we obtained a diachronous and a synchronous h-index, respectively, in the first and second empirical study. To evaluate the fit of an estimate of the h-index, say $\hat{h}$ (rounded to the nearest natural number), with respect to the exact value h, we computed the absolute relative error $|\hat{h}_j - h_j|/h_j$ and the squared relative error $((\hat{h}_j - h_j)/h_j)^2$ for each journal j, j = 1,…,J. Then, as a criterion with which to assess the overall quality of the various estimators considered in the paper, we computed the mean absolute relative error (MARE) and the root mean squared relative error (RMSRE), i.e. the mean of the former and the square root of the mean of the latter over the J journals, for each estimator. As expected, the Pearson correlation between the actual value h of the h-index and each of its estimates $h_W^{(0)}$, $\tilde{h}_W^{(1)}$ and $h_{\text{SG}}(\gamma_0)$ was very high, for both the S&MM and EE&F datasets. In particular, this confirms previous empirical results concerning the formula $h_{\text{SG}}(\gamma_0)$ (see Schubert and Glänzel 2007; Glänzel 2007). Indeed, ρ always exceeded 0.97. More specifically, we found the following: for the S&MM dataset, and ; for the EE&F dataset, and .
Nevertheless, as can be seen from Figs. 2 and 4, a high correlation does not by itself identify a “good” estimator for the h-index. The Lambert-W formula yielded similar levels of correlation, but a much lower level of MARE, see Figs. 1 and 3 (be aware that the figures refer to non-rounded values of the estimates). Note that the correlation between the h-index and $h_{\text{SG}}(\gamma_0)$ does not depend on the unknown value of $\gamma_0$, while, at the same time, the MARE of $h_{\text{SG}}(\gamma_0)$ depends heavily on the choice of $\gamma_0$. As can be seen from Table 4, among the values of $\gamma_0$ tested, the error of $h_{\text{SG}}(\gamma_0)$ reached its minimum (in terms of both MARE and RMSRE) for $\gamma_0$ = 0.9 for the S&MM dataset, while for the EE&F dataset the error of $h_{\text{SG}}(\gamma_0)$ is at its minimum for a slightly different value of $\gamma_0$, i.e. $\gamma_0$ = 0.8. This confirms that, for fixed values of $\gamma_0$, the effectiveness of the formula $h_{\text{SG}}(\gamma_0)$ may depend on the length of the citation window considered (Glänzel 2008) and, finally, that there is no “universal” optimal value for the constant $\gamma_0$ in the formula $h_{\text{SG}}(\gamma_0)$. Instead, for both datasets, the formula $\tilde{h}_W^{(1)}$ gives similar, and even smaller, levels of error (in terms of both MARE and RMSRE).
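The dependence on $\gamma_0$ can be illustrated with a short sketch. It assumes the commonly cited form of the Glänzel–Schubert model, $h_{\text{SG}}(\gamma_0) = \gamma_0\, T^{1/3} m^{2/3}$ with m = C/T; this section does not restate the equation, so the form used here is an assumption, and the function name `glanzel_schubert` is hypothetical. The inputs C = 697 and T = 69 are the values of the first row of Table 3 as we read the flattened row.

```python
def glanzel_schubert(C, T, gamma0):
    """Assumed Glänzel–Schubert estimate: gamma0 * T^(1/3) * m^(2/3), with m = C / T."""
    m = C / T  # average number of citations per paper
    return gamma0 * T ** (1.0 / 3.0) * m ** (2.0 / 3.0)


# Inputs as read from the first row of Table 3 (an assumption about the flattened row).
C, T = 697, 69
for gamma0 in (0.63, 0.7, 0.8, 0.9, 1.0):
    estimate = glanzel_schubert(C, T, gamma0)
    print(f"gamma0 = {gamma0:<4}  h_SG = {estimate:.2f}  rounded = {round(estimate)}")
```

Under this assumed form, $\gamma_0$ is a pure multiplicative constant: changing it rescales every estimate by the same factor, which leaves the Pearson correlation with h unchanged but shifts all the relative errors, consistent with the behaviour described above.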
Fig. 2

S&MM dataset: scatterplot of h vs Glänzel–Schubert formula . Pearson correlation , . The dashed line is identity, so ideally all the points should overlie this line

Fig. 4

EE&F dataset: scatterplot of h versus Glänzel–Schubert formula . Pearson correlation , . The dashed line is identity, so ideally all the points should overlie this line

Fig. 1

S&MM dataset: scatterplot of h versus . Pearson correlation , . The dashed line is identity, so ideally all the points should overlie this line

Fig. 3

EE&F dataset. Scatterplot of h versus . Pearson correlation , . The dashed line is identity, so ideally all the points should overlie this line

Table 4

Relative accuracy, computed in terms of MARE and RMSRE, of different estimators of the h-index. For each dataset, the smallest error is marked with an asterisk

Estimator                    S&MM MARE   S&MM RMSRE   EE&F MARE   EE&F RMSRE
$h_W^{(0)}$                  0.104       0.133        0.092       0.120
$\tilde{h}_W^{(1)}$          0.076       0.100        0.050*      0.079*
$h_{\text{SG}}(.63)$         0.272       0.283        0.217       0.229
$h_{\text{SG}}(.7)$          0.193       0.207        0.127       0.149
$h_{\text{SG}}(.8)$          0.099       0.122        0.058       0.088
$h_{\text{SG}}(.9)$          0.076       0.117        0.130       0.158
$h_{\text{SG}}(1)$           0.163       0.198        0.251       0.275
$h_{\text{BS}}(1.2)$         0.103       0.129        0.058       0.093
$h_{\text{BS}}(1.4)$         0.065*      0.094*       0.072       0.108
$h_{\text{BS}}(1.6)$         0.076       0.103        0.092       0.124
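The two error measures reported in Table 4 can be sketched as follows. This is a minimal illustration assuming the usual definitions (relative error $(\hat{h}_j - h_j)/h_j$ per journal, then the mean of its absolute value and the root of the mean of its square); the function name `relative_error_summary` and the example values are hypothetical and are not taken from the tables.

```python
import math


def relative_error_summary(h_true, h_est):
    """Return (MARE, RMSRE) of the estimates h_est with respect to the actual values h_true."""
    if len(h_true) != len(h_est) or not h_true:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed relative error for each journal j.
    rel = [(est - h) / h for h, est in zip(h_true, h_est)]
    mare = sum(abs(r) for r in rel) / len(rel)               # mean absolute relative error
    rmsre = math.sqrt(sum(r * r for r in rel) / len(rel))    # root mean squared relative error
    return mare, rmsre


# Hypothetical actual and estimated h values for three journals.
mare, rmsre = relative_error_summary([10, 20, 5], [11, 18, 5])
print(f"MARE = {mare:.3f}, RMSRE = {rmsre:.3f}")
```

RMSRE weights large individual errors more heavily than MARE, so it can never be smaller than MARE for the same set of estimates; this is why the RMSRE entry exceeds the corresponding MARE entry throughout Table 4.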
The approach that consists of obtaining the numerical solution of Eq. (18) was also considered. We tentatively tested this method for nine different values of the free parameter q between 1 and 2, i.e. $q_0$ = 1.1, 1.2, …, 1.9. As expected, the accuracy of the resulting estimates depended on the value set for $q_0$. Of the nine values of $q_0$ tested, the smallest estimation error was obtained for a $q_0$ value of around 1.4 (MARE = 0.065; RMSRE = 0.094) for the S&MM dataset, and for a $q_0$ value of around 1.2 (MARE = 0.058; RMSRE = 0.093) for the EE&F dataset (see Table 4). Ultimately, $h_{\text{BS}}(q_0)$ was found to be the most accurate estimator (if one takes $q_0$ = 1.4) of those included in Table 4 for the S&MM dataset, and the third best (if one takes $q_0$ = 1.2) for the EE&F dataset. Overall, the errors are not dramatically different in the range of q between 1.2 and 1.6, so a value of $q_0$ = 1.5, also tested by Bletsas and Sahalos (2009), may be a good compromise solution.
The Pearson correlation between the actual value h of the h-index and its estimate h_T varies slightly with the selected value of q_0, but it remains very high: in particular, for q_0 = 1.5 the correlation is very high for both the S&MM and the EE&F datasets. Overall, then, the method may lead to a very good fit, but it has two main drawbacks. First, h_T is not given by any explicit formula. Second, the method still depends on the conventional choice of an unknown parameter, since we do not know a priori which value of q will yield the smallest estimation error.

In conclusion, essentially the same type of equation (see Eqs. 4, 10) describes the relationship between the h-index and other simple citation metrics. The Lambert-W formula for the h-index also works well for estimating the h-index of journals, especially in its improved version (13). As our empirical study indicates, this holds across different scientific areas, different time windows for publication and citation, different types of “citable” documents, and different approaches to the analysis of the citation process (“prospective” vs “retrospective”; Glänzel 2004). By contrast, the Glänzel–Schubert class of models appears much less robust and reliable as an estimator of the h-index, because its accuracy depends closely on a conventional choice of one or more unknown parameters. We may accordingly conclude that the basic and improved Lambert-W formulas are quite effective “universal” (in the sense that they are ready-to-use) informetric functions that work well for estimating the h-index over a sufficiently wide range of values.
Indeed, our empirical analysis, though preliminary, suggests that the fit is very good, at least for the datasets that we studied and for values of its arguments that are not too large, namely h < 40, T < 2000 and m < 20, which may be considered standard values for both scientists and journals within time spans of 2–5 years.