
The h-index as an almost-exact function of some basic statistics.

Lucio Bertoli-Barsotti, Tommaso Lando.

Abstract

As is known, the h-index, h, is an exact function of the citation pattern. At the same time, and more generally, it is recognized that h is "loosely" related to the values of some basic statistics, such as the number of publications and the number of citations. In the present study we introduce a formula that expresses the h-index as an almost-exact function of four basic statistics. On the basis of an empirical study, in which we consider citation data obtained from two different lists of journals from two quite different scientific fields, we provide evidence that our ready-to-use formula is able to predict the h-index very accurately (at least for practical purposes). For comparison, alternative estimators of the h-index have been considered and their performance evaluated on the same dataset. We conclude that, in addition to its own interest as an effective proxy representation of the h-index, the formula introduced may provide new insights into the "factors" determining the value of the h-index, and how they interact with each other.


Keywords:  Journal ranking; Lambert W function; Weibull distribution; h-Index

Year:  2017        PMID: 29081557      PMCID: PMC5640781          DOI: 10.1007/s11192-017-2508-6

Source DB:  PubMed          Journal:  Scientometrics        ISSN: 0138-9130            Impact factor:   3.238


Introduction

The purpose of this paper is to present a formula with which to determine (estimate) the h-index, $h$, under incomplete information conditions (IIC). By IIC we mean the situation in which, for whatever reason, we do not know the whole set of citation data, that is, the entire citation profile that would allow us to obtain the exact value of the h-index. This is the case, for example, when only a few "basic" citation statistics (other than the h-index) are published, or known to us. To be concrete, we will refer to simple citation indicators (to use the words of Hirsch (2005), "single-number criteria commonly used to evaluate scientific output") such as:

total number of citations, $C$;
total number of citations for the $k$ ($k \ge 1$) most-cited publications, $C_k$; thus, $C_k = \sum_{i=1}^{k} c_i$, where $c_i$ represents the number of citations to publication $i$, and where publications are ranked in decreasing order of the number of citations: $c_1 \ge c_2 \ge \dots \ge c_T$;
total number of publications, $T$;
total number of "significant" publications, that is, those with at least a predetermined number $k$ of citations each ($k \ge 1$), $T_k$.

In this paper we focus on these indicators in their simplest versions, that is: $C$, $C_1$, $T$ and $T_1$. The purpose of the analysis is twofold: to estimate the h-index (when it cannot be determined directly from the data) and, at the same time, to identify the main factors which influence the level of the h-index. A crucial question is therefore the extent to which the h-index can be satisfactorily predicted from knowledge of only the above basic statistics, i.e. under IIC. More formally, we are searching for a formula

$$\tilde h = f(x_1, \ldots, x_r), \quad 1 \le r \le 4, \qquad (1)$$

where $x_1, \ldots, x_r \in \{C, C_1, T, T_1\}$. Note that the formula can be interpreted as a genuine estimator of the h-index, $h$, because it does not depend on the values of unknown parameters. Possible estimators of the h-index under IIC can be found in the literature. A very simple proxy for the h-index is given by $h = \sqrt{C/a}$.
This model, which can be traced back to Hirsch (2005), is not a genuine estimator of the h-index because it is still a function of an unknown parameter, $a$, and it is not specified (by the formula itself) how to estimate this parameter in terms of the above basic statistics. Nevertheless, an estimator for the h-index can be obtained by substituting the unknown parameter with a fixed constant (Hirsch found "empirically" that $a$ lay between 3 and 5). Redner (2010) found that "$\sqrt{C}$ is essentially equivalent to the h-index, up to an overall factor that is close to 2" (put otherwise, he found that the ratio $\sqrt{C}/(2h)$ has an empirical distribution "sharply peaked about 1"). This suggests the approximating formula $h_R = \sqrt{C}/2$, with $a = 4$, $r = 1$, which we could then call the Redner formula; it is probably the simplest estimator of the h-index under IIC.

While $h_R$ is a model-free proxy for the h-index, more elaborate solutions have been attempted in the literature by assuming specific probabilistic distributions for the citation rate. For example, a formula that follows model (1), with $r = 4$, has recently been introduced by Bertoli-Barsotti and Lando (2017). That formula, $\tilde h_W^{(1)}$, is built on $\tilde m$, which is nothing but a "trimmed" version of the simple sample mean, and on $W$, the so-called Lambert-W function (Corless and Jeffrey 2015). The Lambert-W function is the function $W(x)$ satisfying $W(x)\,e^{W(x)} = x$, and can readily be computed using mathematical software, for example the Mathematica® software package (Wolfram Research, Inc. 2014), or the R statistical computing environment (R Development Core Team 2012). The use of a "trimmed" version of the sample mean is a simple technique with which to make the sample mean more robust with respect to a single outlier: a single highly-cited paper could, as is well known, substantially inflate the mean.
Formula $\tilde h_W^{(1)}$ is based on the assumption that the citation rate of papers (cited at least once) follows a shifted-geometric distribution (SGD) with parameter $\mu$, with probability function $p_x = \frac{1}{\mu}\left(1 - \frac{1}{\mu}\right)^{x-1}$, $x = 1, 2, \ldots$; $p_x$ represents the probability of observing the number $x$ of citations of a paper (cited at least once), while $\mu$ represents the expectation of the SGD. Then, $T_1 p_x$ expresses the "expected"/estimated number of articles with $x$ citations.

As an alternative approach, an important class of models is the one defined by the formula

$$h = \gamma\, T^{1/3} \left(\frac{C}{T}\right)^{2/3}, \qquad (4)$$

where $\gamma$ is a fixed and known positive constant (Schubert and Glänzel 2007). From model (4), specific ready-to-use formulas are obtained by taking, in particular: (a) $\gamma = 4^{-1/3} \approx 0.63$ (Iglesias and Pecharroman 2007; see also Ionescu and Chopard 2013; Panaretos and Malesios 2009; Vinkler 2009, 2013), (b) $\gamma = 0.75$ (Schubert and Glänzel 2007), (c) $\gamma = 1$ (Prathap 2010a, b). Following the notation of Bertoli-Barsotti and Lando (2017), let $h_{SG}(\gamma) = \gamma\, C^{2/3} T^{-1/3}$. Note that these formulas are functions of the data only through two out of the four basic statistics ($C$, $T$), and they are based on the assumption of a continuous-type distribution. The formula $h_{SG}(1)$ is also known as the "p-index" (Prathap 2010a, b).

Another approach which deserves mention for completeness, even if it does not yield a ready-to-use formula, is that proposed by Iglesias and Pecharroman (2007). Adopting a different perspective, i.e. the rank-size formulation, and starting from the assumption that the number of citations of the paper of a given rank approximately follows a stretched exponential type PDF (not to be confused with a Weibull PDF, see below), Iglesias and Pecharroman suggest deriving a formula for the h-index as the solution of an equation in $h$ implied by that model. Interestingly, the solution may be derived in closed form (even if the authors did not realize this) by means of the Lambert-W function. Unfortunately, this solution still depends on the value of an unknown free parameter [see their Eqs. (16) and (17)]. Hence, their formula could become a genuine estimator of the h-index only by constraining the unknown parameter to assume a fixed (but arbitrary) value.
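The two-statistic and one-statistic proxies reviewed above are simple enough to evaluate by hand, but a short sketch may help to compare them side by side. The Python below is only an illustration: the function names (`h_redner`, `h_sg`, `round_half_up`) are hypothetical, the formulas are taken to be $\sqrt{C}/2$ and $\gamma\,C^{2/3}T^{-1/3}$, and the inputs $C = 5231$, $T = 663$ are the values read from the Table 1 row for ISSN 0162-1459.

```python
import math


def h_redner(C):
    """Redner-type proxy: square root of total citations, halved (a = 4)."""
    return math.sqrt(C) / 2.0


def h_sg(C, T, gamma):
    """Schubert-Glanzel-type proxy: gamma * C^(2/3) * T^(-1/3)."""
    return gamma * C ** (2.0 / 3.0) * T ** (-1.0 / 3.0)


def round_half_up(x):
    """Round a positive real estimate to the nearest integer via floor(x + 0.5)."""
    return math.floor(x + 0.5)


# Inputs assumed from the Table 1 row for ISSN 0162-1459 (actual h = 31).
C, T = 5231, 663
estimates = {
    "h_R": round_half_up(h_redner(C)),
    "h_SG(0.63)": round_half_up(h_sg(C, T, 4 ** (-1.0 / 3.0))),
    "h_SG(0.75)": round_half_up(h_sg(C, T, 0.75)),
    "h_SG(1)": round_half_up(h_sg(C, T, 1.0)),
}
print(estimates)
```

For this single journal the four proxies spread over a wide range around the actual value, which is consistent with the motivation for a formula that uses more than two basic statistics.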

A new formula for the h-index under the Weibull assumption

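Let $N(y)$ be the empirical citation distribution function, i.e. the function giving the number of papers which have been cited $y$ times at most. Then, in particular, $N(y) - N(y-1)$, for $y = 1, 2, \ldots$, is the number of papers that have been cited exactly $y$ times. We assume that the citation rate $X$ of a paper is a random variable that is distributed as a two-parameter Weibull distribution, with CDF

$$F(x) = 1 - \exp\left(-\beta x^{\alpha}\right), \quad x > 0,$$

and 0 otherwise, where $\alpha > 0$ and $\beta > 0$. The probability density function is then

$$f(x) = \alpha \beta\, x^{\alpha - 1} \exp\left(-\beta x^{\alpha}\right)$$

for $x > 0$, and 0 otherwise. The Weibull distribution is a rather flexible model: the PDF is reverse J-shaped for $\alpha \le 1$ and bell-shaped otherwise. Since our assumption involves a continuous distribution, a suitable discretization rule is needed. In particular, for every $y$, $y = 0, 1, 2, \ldots$, let $T\,[1 - F(y)]$ express the "expected" number of articles with at least $y$ citations. Hence, $T\,[F(y+1) - F(y)]$ represents the expected number of articles with exactly $y$ citations, and $T\,F(y+1)$ the expected number of papers which have been cited $y$ times at most. As a special case,

$$F(1) = 1 - e^{-\beta}$$

can be interpreted as a model for the so-called uncitedness factor, $(T - T_1)/T$ (Hsu and Huang 2012; see also Egghe 2013; Burrell 2013).

A Weibull model for the h-index is then yielded by the solution of the equation

$$T \exp\left(-\beta h^{\alpha}\right) = h.$$

Replacing $h$ with $z^{1/\alpha}$ in the equation, we have

$$T \exp(-\beta z) = z^{1/\alpha}.$$

Thus, raising both sides to the power $\alpha$ and replacing $z$ with $u/(\alpha\beta)$, we obtain the equivalent equation

$$u\, e^{u} = \alpha \beta\, T^{\alpha}.$$

Hence, by definition of the above mentioned Lambert-W function, we find the solution $u = W(\alpha\beta T^{\alpha})$ and, since $h = [u/(\alpha\beta)]^{1/\alpha}$, we finally arrive at the formula

$$h = \left[\frac{W\left(\alpha \beta\, T^{\alpha}\right)}{\alpha \beta}\right]^{1/\alpha}. \qquad (12)$$

An empirical counterpart of the above theoretical model for the h-index may now be obtained by substituting the parameters $\alpha$ and $\beta$ with estimates, $\hat\alpha$ and $\hat\beta$, based on suitable functions of the citation data only through the basic statistics $C$, $C_1$, $T$ and $T_1$. This can be done firstly by using the uncitedness factor to derive the equation $1 - e^{-\beta} = (T - T_1)/T$, which can be solved (under the assumption $0 < T_1 < T$) for the variable $\beta$ as

$$\hat\beta = \ln\left(\frac{T}{T_1}\right)$$

as an estimate of parameter $\beta$, and secondly, by using the trimmed sample citation rate,

$$\tilde m = \frac{C - C_1}{T - 1},$$

as an estimate of the expectation of $X$, that is $E(X) = \beta^{-1/\alpha}\,\Gamma(1 + 1/\alpha)$.

Note that, by construction, our approximation slightly overestimates the true average number of citations, so that a correction for continuity by one-half is needed. We then find $\hat\alpha$ as the solution (method of moments) of the equation

$$\hat\beta^{-1/\alpha}\,\Gamma\left(1 + \frac{1}{\alpha}\right) = \tilde m + \frac{1}{2}, \qquad (15)$$

which can be solved numerically. It should be noted that the existence and uniqueness of the solution of Eq. (15) are not always warranted a priori. Indeed, it can be proved that the necessary and sufficient condition for existence and uniqueness of the solution is $\tilde m > 1/2$ (see "Appendix"). We should then consider "out of range" the cases where $\tilde m \le 1/2$, and exclude them from the analysis. With $\alpha$ and $\beta$ replaced by $\hat\alpha$ and $\hat\beta$ in formula (12) one finally obtains ($T_1 < T$, $\tilde m > 1/2$)

$$h_{WW} = \left[\frac{W\left(\hat\alpha \hat\beta\, T^{\hat\alpha}\right)}{\hat\alpha \hat\beta}\right]^{1/\hat\alpha},$$

where the suffix WW is motivated by the fact that the formula is based on a Weibull distribution and on the Lambert-W function.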
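The Lambert-W step of the derivation can be checked numerically. The sketch below assumes the Weibull survival function $\exp(-\beta x^{\alpha})$, so that the h-index model is the root of $T\exp(-\beta h^{\alpha}) = h$, and compares the closed form $[W(\alpha\beta T^{\alpha})/(\alpha\beta)]^{1/\alpha}$ with a plain bisection on the original equation. The function names and the parameter values $\alpha = 0.7$, $\beta = 0.25$ are illustrative assumptions, and the small Newton routine stands in for a library implementation of $W$.

```python
import math


def lambert_w0(x, tol=1e-12):
    """Principal branch of Lambert W for x >= 0, by Newton iteration on w*exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting point at or above the root, so Newton descends to it
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def h_closed_form(T, alpha, beta):
    """Closed-form root of T*exp(-beta*h**alpha) = h via the Lambert W function."""
    ab = alpha * beta
    return (lambert_w0(ab * T ** alpha) / ab) ** (1.0 / alpha)


def h_bisection(T, alpha, beta, iterations=200):
    """Root of T*exp(-beta*h**alpha) - h on (0, T); the left side is decreasing in h."""
    lo, hi = 0.0, float(T)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if T * math.exp(-beta * mid ** alpha) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative parameter values (not estimates from the paper's data).
T, alpha, beta = 663, 0.7, 0.25
h1 = h_closed_form(T, alpha, beta)
h2 = h_bisection(T, alpha, beta)
print(h1, h2)
```

The two values should agree to numerical precision, which confirms that the substitution leading to the Lambert-W form does not lose or add any root on the positive axis.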

Analysis

Two datasets

This section empirically investigates the effectiveness of formula $h_{WW}$ as an estimate of the actual value of the h-index, $h$. We will compare estimates derived from $h_{WW}$ with the real values of the h-index. In order to facilitate possible comparisons with other formulas (see below), we use the same two datasets as in Bertoli-Barsotti and Lando (2017), where the authors present an empirical study based on citation data obtained from two different sets of journals belonging to two different scientific fields: (1) the S&MM list and (2) the EE&F list.

S&MM list

The former dataset includes the 231 journals selected from a former list of 568 journals identified as important (in the opinion of a group of experts) in the area "Statistics and Mathematical Methods" (S&MM). Overall, the S&MM dataset included 485,628 citations of 99,409 publications from these journals (for details see Bertoli-Barsotti and Lando 2017). For each journal, the actual value of the h-index was computed, on the basis of citations retrieved from the Scopus database in the last week of December 2015, as the largest number $h$ of papers published in the journal between 2010 and 2014 which obtained at least $h$ citations each, from the time of publication until December 2015. Thus, citation data referred to a 6-year citation window, 2010–2015, and a 5-year publication window, 2010–2014. The four basic statistics $C$, $C_1$, $T$ and $T_1$ were derived as well. The list of the 231 journals in the S&MM dataset is reported in Table 1.
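The definition of the actual h-index used here, together with the four basic statistics, can be sketched in a few lines. The function names and the toy citation list below are hypothetical and serve only to illustrate the definitions; they are not data from the study.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def basic_statistics(citations):
    """Return C (total citations), C1 (citations of the most-cited paper),
    T (number of papers) and T1 (number of papers cited at least once)."""
    return {
        "C": sum(citations),
        "C1": max(citations, default=0),
        "T": len(citations),
        "T1": sum(1 for c in citations if c >= 1),
    }


# Hypothetical citation counts for the papers of one journal.
toy = [10, 8, 5, 4, 3, 0, 0]
print(h_index(toy), basic_statistics(toy))
```

Computing `h_index` needs the whole list, whereas `basic_statistics` keeps only four numbers; the estimation problem under IIC is to recover the former from the latter.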
Table 1

Basic statistics for the S&MM list of journals and the approximation of the Hirsch h-index calculated by means of the formula $h_{WW}$ (rounded values, $\langle h_{WW} \rangle$). The value is not uniquely defined (N/D) for the first journal on the list (because its average number of citations per paper is too small).

(Data retrieved in December 2015)

# ISSN code $C$ $C_1$ $T$ $T_1$ $h$ $\langle h_{WW} \rangle$
11405-7425426152243N/D
21012-93672761436011168
30017-095X158131667156
40315-368155744427177910
51081-1826201121407766
60957-37203231522812277
70002-98905898735117199
80361-092620332815557541112
90117-1968163201206156
101210-05524053120511999
111056-21762902222210178
120165-489658316320198109
130315-598616624834866
140736-29945771928317699
150399-055915332864756
161303-5010658563341541112
170927-70994631629616288
181351-1610313231509288
191292-810019122785277
200361-0918103645635369910
210269-9648263161728478
221532-6349308151419378
230217-59595223326115599
241018-58954242518911599
250266-476321643239015181314
261471-678X336231389288
270304-40687372543326599
280020-72764801326515889
290023-5954813363372081111
301220-176652631193137109
311226-319245720271137109
321618-2510305311729088
331083-589X739203532091011
341048-5252643172831891010
351004-37564432714096910
361009-6124979564662401213
371120-97634341849216589
381369-1473282241407688
391230-1612346321288489
400026-133554424283171109
410218-348X4763016712999
420167-715231694015469451614
430032-4663154131035866
440282-423X4052019611699
451748-670X1933368225431413
460094-96551649556954251414
470039-0402365341298699
480894-984061529331184910
490398-7620679663031701011
500219-02573363115910278
510319-5724511362061291010
520020-3157772602851891111
530898-2112597262281491110
541524-1904669423011551212
550963-5483719242721791111
561547-5816770372902011111
570001-8678821372692011111
580021-90021168354773211312
590257-0130719182601791111
601026-022623063410366101515
610378-375838997113349071818
620377-73321353385973481513
631560-3547735252491821111
640893-4983793362972001211
651387-5841645263051781010
660167-63771702335823991414
671747-7778837294135931012
681054-34061098404292771312
691619-450049338125891211
700143-9782761312581791211
711432-29945122920714699
720219-49373042117810277
730033-51771734428785221413
741748-006X779312381841111
751381-298X364231138299
760277-6693825612171601412
771435-246X735432631751111
781572-5286587251581141212
791134-57644585924612889
800932-5026829263962101112
810926-2601769782861961010
820890-8575333471197489
830219-5259803322541791212
840515-036144737150891110
850095-4616626461921351111
860233-19341191244903041313
870167-5923663382161521211
881469-76882100776534041718
891083-64891321324883301313
901392-5113747522021381313
911863-817140434118771010
921380-78703793917010398
931862-44721866326524381515
940219-8762905653001851513
950218-12745537136137010132622
960747-4938649541491131212
970020-79851280284172681615
980047-259X3329899156502119
990303-6898868312561881212
1001471-082X4053513488910
1010924-67034133811779910
1020346-1238337281287999
1030748-80172076315343801918
1041389-44207931241841241513
1050146-6216737302151551212
1060160-56823870908536632120
1070960-077927121185704432019
1080246-02031019332662061413
1090306-7734563101147831212
1101350-72651499403752941515
1110021-9320910222742071212
1120218-48851036812972021313
1131945-497X885571621301514
1141352-8505564641921301010
1150003-1305670432411331312
1161076-2787900492241631413
1171862-534752463125791112
1180022-471553029112469662421
1191133-0686617542461271212
1201539-160410751832861941313
1211434-6028772272184914202723
1220304-41492652447915771516
1230143-208710891522281551515
1240323-384712211293272301514
1250266-46661295333032081717
1260925-50013452618496112220
1271085-7117682491831291312
1280927-53981505533582501817
1290899-82562942766965122019
1300035-92541023542121691414
1310893-9659951995163112953530
1320926-60032408785083942019
1331368-42215334911686912
1341386-199953430120831312
1350254-5330450519012418242122
1361180-40091611523252361818
1370167-94737203162154112352623
1380013-16441350782622141616
1391050-51642089303733222018
1401544-61151073562601991514
1411055-678812432853142201213
1421076-9986655601481101112
1430025-57183127605954882220
1440036-14103275856185142120
1450740-817X1881443823021818
1460167-66872779375724691919
1470364-765X1237612271801716
1481017-040520481904263081918
1491369-183X2904904693982421
1501545-59633954726585242624
1511064-12461887408135041614
1520025-55642637615454342019
1530036-13992359634663901918
1540022-3239413411210056852423
1550197-918310621311951441515
1560949-2984777251461241413
1570178-80511744474083131716
1581435-98711565513472801515
1590091-17982227564083532017
1600895-5646742431231031314
1610266-89201994982812262221
1620363-012937961126615342523
1630144-686X1902503762871719
1641061-86001661732902371818
1651066-527731652734913802523
1660020-7721558618010318152525
1670303-8300509312412608502524
1680006-341X3854757175652423
1690960-1627854361891491413
1700305-9049886562091571213
1710167-865512,8641129141712494034
1721932-81843207746484142425
1731613-9372832361711341314
1741479-840946146115741111
1751874-89611560732752061919
1760960-317418911094082841919
1771742-546835724115649501916
1780885-064X1081961851491415
1790007-11029071231491151415
1800171-64681499822151651719
1811944-039148428201811112
1821726-21351007661151121615
1831544-84441703562422101719
1840032-472855834101871112
1850022-406575234113881415
1860039-36659131761581191314
1870168-65775365393801212
1880886-938323391283652862221
1890018-95294175944693872929
1901054-15005630809367742725
1910304-407653321657236093027
1920006-34442406853923142221
1930964-19981287502341771717
1941932-615727401025243732222
1951468-121812,517238127111394237
1960025-561039971945674422727
1971436-32403874666615622422
1980167-691172593517316173735
1990305-054813,373156126111354540
2000040-17061141792351531617
2010165-0114796210811068183336
2020883-725220551082862342221
2030272-43326416868716873332
2040277-671510,506623178013143533
2051568-45399761091191061516
2060022-24961417821991601919
2070033-312314312882311721416
2080951-83209529959268503734
2090304-380013,918412168915113633
2101384-581023341372381982424
2110169-743958801877266453027
2121538-634113411472641321718
2130030-364X50981205544873029
2140098-792118551431981532222
2151465-464423471423042532322
2160199-00391110951401081617
2171052-623443217654143452528
2180735-001519322582451862221
2190167-923610,5944589237974241
2200162-145952311566635193131
2210049-1241803148115991413
2220378-873328793912312142225
2231470-160X16,653214163615164437
2240070-33703714744203762626
2250962-280214761022111532119
2260090-536458353154864333134
2270027-317118864601961511820
2280883-423719093752371512121
2291532-443514,00596611218415552
2301369-741231864751691492329
2311070-55111374941871521818
EE&F list

The second dataset included the 100 journals (with a minimum number of 50 publications) top ranked according to the Scopus Impact per Publication (IPP) in 2014, within the Scopus subject area of "Economics, Econometrics and Finance" (EE&F). The IPP is defined as the ratio of the citations received in a year by papers published in the three previous years to the number of papers published in those same years. The citation data of all 100 journals in the EE&F list were retrieved during the last week of April 2016. The dataset obtained included 19,889 publications receiving a total of 74,096 citations. In this case, unlike the above dataset, in order to obtain citation and publication windows as similar as possible to those employed for the computation of the IPP 2014 by Scopus, the citations used were those received during 2014 by papers published within the previous 3 years, 2011–2013 (for further details see Bertoli-Barsotti and Lando 2017). For each journal the actual value of the h-index was then computed as the largest number $h$ of papers published in the journal between 2011 and 2013 which obtained at least $h$ citations each in the year 2014. The list of the journals in the EE&F dataset is reported in Table 2.
Table 2

Basic statistics for the EE&F list of journals and the approximation of the Hirsch h-index calculated by means of the formula $h_{WW}$ (rounded values, $\langle h_{WW} \rangle$)

(Data retrieved in April 2016)

# ISSN code $C$ $C_1$ $T$ $T_1$ $h$ $\langle h_{WW} \rangle$
10022-05156976169631515
21531-46501161581271171818
31557-121117731191931732120
41540-62611529541901781718
50895-3309995441331111517
61547-71851196411531431717
70092-070310151111401281515
80304-405X2413484123722017
91468-02621014351871711413
101523-24094342681711011
111537-534X4835692791011
121465-73681389382882561614
131540-65201062521751471516
141478-6990795381551401312
151945-7790516221131031010
160002-82823303487235622120
171945-7715422389178910
181741-62483615255521010
191469-5758272266546109
200165-410151722118991111
210925-527346789210368882218
221542-4774641741481221011
231537-52771086242342131212
240921-34491723334213631513
251467-937X688321921471111
261945-774X422491099389
271873-61812683266675651615
281547-7193948562131881312
291086-44153243657491010
301741-290023434544288
311530-91421065272922411312
321530-9290887382422081111
330001-4826837482171781211
341090-9516639231541341211
351547-721523914605488
361941-138324633665188
370921-80092620346755671715
380024-630124833584498
391468-2710586361421221010
401468-0297760292101791010
411066-224335527857399
421475-679X39821111861010
430308-597X1557354753991211
440022-1996794222471911111
451096-0449673251831421111
461573-693834068997278
472041-417X17826553577
480306-9192951352912241412
491537-2707422731398699
500013-009517526513987
511052-150X26517705788
521533-446517925562887
531526-548X634611821421110
541873-59911725225404261313
551389-575323117645687
561572-308926824867177
571468-12182068357165221414
580304-3878876352952201311
590047-2727959743312461111
600969-593165216213172910
611532-8007270231027877
621075-425324510806977
631386-418119224684777
640265-133525212826288
651537-530721411796177
660301-42074903016512299
671096-122420022615776
681467-6419349181219099
691932-443X16311534766
701756-69164331916712598
710304-39323894515410588
721572-3097265141077877
731464-51143581911910677
741911-384643731156110109
751096-047322017876277
761095-9068325131269987
771389-934181717325252109
780217-45614021314812388
791548-800423881017777
800304-40761037284043051210
810038-012121838744977
820928-7655340381339388
831747-762X20538916066
841566-0141273161108777
851392-8619368451177999
861573-0913719182611981110
871475-146124426836487
881099-12553721516311388
890176-26804161817913578
901096-6099242251137867
911432-11221758896456
920929-11995532824417289
931573-06972627299347171313
941467-089515910574467
950378-42661993368936211312
961877-858516715645066
971179-189627291278867
980308-514723114886088
991043-951X4491919414588
1000168-703417613744187

Estimation of the h-index with the formula $h_{WW}$

Table 1 for the S&MM list and Table 2 for the EE&F list report, for each journal, identified by its ISSN code, the four basic statistics, $C$, $C_1$, $T$ and $T_1$, the h-index, $h$, as computed using the above procedure, and the value provided by the formula $h_{WW}$ in its rounded-off version $\langle h_{WW} \rangle$, that is, in symbols,

$$\langle h_{WW} \rangle = \lfloor h_{WW} + 0.5 \rfloor,$$

where $\lfloor \cdot \rfloor$ is the floor function (recall that the floor function of $x$ gives the greatest integer less than or equal to $x$). Note that, from an operational point of view, all estimating formulas (1) generate real numbers. However, for estimation purposes, these numbers should be rounded off to the nearest integer, not only in order to produce numbers in the same range of values as the h-index but also to avoid "false precision" (Hicks et al. 2015).

To give an example illustrating the calculation of this estimate, let us consider the case of the Journal of the American Statistical Association (ISSN 0162-1459, from the S&MM list). We have $C = 5231$, $C_1 = 156$, $T = 663$ and $T_1 = 519$. Hence

$$\hat\beta = \ln\left(\frac{663}{519}\right) \approx 0.245, \qquad \tilde m = \frac{5231 - 156}{663 - 1} \approx 7.67.$$

Then, substituting $\hat\beta$ and $\tilde m$ into Eq. (15) we find

$$\hat\beta^{-1/\alpha}\,\Gamma\left(1 + \frac{1}{\alpha}\right) \approx 8.17,$$

which yields, numerically, the solution $\hat\alpha \approx 0.74$. Thus, since $W(\hat\alpha\hat\beta\, T^{\hat\alpha}) \approx 2.26$, we finally conclude that $h_{WW} \approx 30.9$, so that the rounded-off version of $h_{WW}$ in this case exactly coincides with the actual h-index, $h = 31$.

In Figs. 1 and 2 we plot for each journal, respectively for the S&MM list and the EE&F list, the empirical value of the h-index $h$ versus its value predicted by $\langle h_{WW} \rangle$.
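The worked example can be reproduced end to end with the following minimal sketch. It rests on several stated assumptions rather than on a reference implementation: the Weibull CDF is taken as $1 - \exp(-\beta x^{\alpha})$, the rate parameter is estimated as $\ln(T/T_1)$, the trimmed mean as $(C - C_1)/(T - 1)$ with a one-half continuity correction, and cases with a trimmed mean of at most one-half are treated as out of range. All function names are hypothetical, and the bisection and Newton routines stand in for library root finders.

```python
import math


def lambert_w0(x, tol=1e-12):
    """Principal branch of Lambert W for x >= 0, by Newton iteration."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def solve_alpha(beta_hat, target_mean):
    """Solve beta^(-1/alpha) * Gamma(1 + 1/alpha) = target_mean (> 1) for alpha.

    With t = 1/alpha, the log of the left side minus log(target) is convex in t
    and negative at t = 0, so it crosses zero exactly once for t > 0.
    """
    if target_mean <= 1.0:
        raise ValueError("out of range: corrected mean must exceed 1")
    log_target = math.log(target_mean)
    log_beta = math.log(beta_hat)

    def phi(t):
        return -t * log_beta + math.lgamma(1.0 + t) - log_target

    lo, hi = 0.0, 1.0
    while phi(hi) < 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / (0.5 * (lo + hi))


def h_ww(C, C1, T, T1):
    """Weibull / Lambert-W estimate of the h-index from four basic statistics."""
    if not 0 < T1 < T:
        raise ValueError("requires 0 < T1 < T")
    beta_hat = math.log(T / T1)
    m_tilde = (C - C1) / (T - 1)
    alpha_hat = solve_alpha(beta_hat, m_tilde + 0.5)
    ab = alpha_hat * beta_hat
    return (lambert_w0(ab * T ** alpha_hat) / ab) ** (1.0 / alpha_hat)


# Row for ISSN 0162-1459 in Table 1: C, C1, T, T1 (actual h = 31).
jasa = h_ww(5231, 156, 663, 519)
print(jasa, math.floor(jasa + 0.5))
```

Under these assumptions the estimate for this journal rounds to 31, the value listed in Table 1, and the same function applied to the first row of Table 2 (697, 61, 69, 63) rounds to the listed value of 15.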
Fig. 1

Scatterplot of the empirical value of the h-index $h$ versus its value predicted by $\langle h_{WW} \rangle$, for the S&MM list of journals. The dashed line is the identity, so ideally all the points should lie on this line

Fig. 2

Scatterplot of the empirical value of the h-index $h$ versus its value predicted by $\langle h_{WW} \rangle$, for the EE&F list of journals. The dashed line is the identity, so ideally all the points should lie on this line


A comparative analysis of the accuracy

To assess the accuracy of formula $h_{WW}$ comparatively, we considered, among several possible ready-to-use formulas, the following ones among those defined above: $\tilde h_W^{(1)}$, $h_{SG}(0.63)$, $h_{SG}(0.75)$, $h_{SG}(1)$, $h_R$, which have been viewed as important or promising alternatives to the $h_{WW}$ formula, due to an empirically recognized high correlation with the h-index [see Bertoli-Barsotti and Lando (2017) for formula $\tilde h_W^{(1)}$, Glänzel (2006), Malesios (2015), Schreiber et al. (2012) and Schubert and Glänzel (2007) for formulas $h_{SG}(\gamma)$, and Redner (2010) for formula $h_R$]. To measure the magnitude of the observed accuracy, for each of the six estimation formulas, respectively numbered as (1) $h_{WW}$, (2) $\tilde h_W^{(1)}$, (3) $h_{SG}(0.63)$, (4) $h_{SG}(0.75)$, (5) $h_{SG}(1)$, (6) $h_R$, we calculated the absolute relative error (ARE) of the estimator of the actual h-index, $h_i$, for each journal $i$, $i = 1, \ldots, n$,

$$\mathrm{ARE}_{ij} = \frac{\left|\langle \hat h_{ij} \rangle - h_i\right|}{h_i},$$

where $\langle \hat h_{ij} \rangle$ is the rounded-off version of formula $j$, $j = 1, \ldots, 6$. Then, as a criterion with which to assess the overall quality of the formula, we computed the mean absolute relative error (MARE),

$$\mathrm{MARE}_j = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ARE}_{ij}.$$

The results are summarized in Table 3.
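The accuracy criterion is straightforward to compute once rounded estimates and actual values are paired journal by journal. The sketch below uses hypothetical function names and a small made-up set of (actual, estimated) values purely to illustrate the arithmetic; it does not reproduce the figures of Table 3.

```python
def absolute_relative_error(estimate, actual):
    """ARE of one rounded estimate with respect to the actual h-index."""
    return abs(estimate - actual) / actual


def mean_absolute_relative_error(estimates, actuals):
    """MARE: average of the AREs over all journals."""
    if len(estimates) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [absolute_relative_error(e, a) for e, a in zip(estimates, actuals)]
    return sum(errors) / len(errors)


# Made-up example: four journals, actual h-index and rounded estimates.
actual_h = [31, 15, 20, 10]
estimated_h = [31, 15, 18, 11]
mare = mean_absolute_relative_error(estimated_h, actual_h)
print(round(mare, 3))
```

Because each error is divided by the actual h-index, a miss of one unit weighs more for a journal with a small h than for one with a large h, which is worth keeping in mind when comparing the two datasets.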
Table 3

Relative accuracy, computed in terms of MARE, of different estimators of the h-index; r represents the number of basic metrics on which the estimation formula is based; for each dataset, the smallest error is marked with an asterisk

 | $\langle h_{WW} \rangle$ | $\langle \tilde h_W^{(1)} \rangle$ | $\langle h_{SG}(0.63) \rangle$ | $\langle h_{SG}(0.75) \rangle$ | $\langle h_{SG}(1) \rangle$ | $\langle h_R \rangle$
r | 4 | 4 | 2 | 2 | 2 | 1
S&MM list (230 cases) | 0.060* | 0.076 | 0.271 | 0.141 | 0.162 | 0.224
EE&F list (100 cases) | 0.056 | 0.050* | 0.217 | 0.081 | 0.251 | 0.192

Conclusion

This paper has addressed the need to gain a better understanding of how simple citation metrics are related to the h-index, or rather, to a "good" proxy representation of the h-index. This also responds to the more basic requirement of "building bridges" between different types of known and available measures of impact (impact indicators) under IIC. Unlike other studies (that consider the problem of defining a "model" of the h-index), our concern has not been to estimate the parameters (sometimes even considered at the unit level, i.e. single journal, or single scientist; see e.g. Petersen et al. 2011) of a parametric model for the h-index under the assumption of knowing the entire citation pattern; rather, we addressed the quite different and more practical problem of finding a proxy representation of $h$ through a universal formula that only depends on a few summary statistics of the data. The formula is "universal" in the sense that it gives a proxy representation of $h$ that holds for any given journal and any dataset. The issue of determining an indicator under IIC is closely related to the search for a solution of the problem of recovering and comparing impact indicators from different databases. As a simple but significant example of this issue, we may cite the specific problem of determining/estimating the IF for journals using the Google Scholar-based h-index as a predictor (Bertocchi et al. 2015). As confirmed in our case study analysis, the h-index can be viewed as an almost-exact function of $C$, $C_1$, $T$ and $T_1$, through $h_{WW}$; that is, the basic statistics $C$, $C_1$, $T$ and $T_1$ provide salient information for the evaluation of the h-index with high precision. In practice, while computation of the h-index $h$ requires knowledge of the entire citation profile (or at least a large part of it, e.g. the so-called h-core), formula $h_{WW}$ requires knowledge of only a few elementary summary statistics, but reproduces the actual value of $h$ quite well.

In truth, in our computations we found that the estimates yielded by $h_{WW}$ were slightly biased downwards for quite high values of the h-index but, as can be seen from Table 3, overall the formula yields very accurate approximations to the empirical value of the h-index, with values of the MARE ranging around 5–6%, not too dissimilar from those obtained by formula $\tilde h_W^{(1)}$ (Bertoli-Barsotti and Lando 2017). Both formulas $h_{WW}$ and $\tilde h_W^{(1)}$ exhibit comparable levels of accuracy (the advantages of the formula $\tilde h_W^{(1)}$, as compared to formula $h_{WW}$, may be that: (i) it yields an explicit expression of the basic indicators $C$, $C_1$, $T$ and $T_1$, while the latter does not, and (ii) it is based on a simpler probabilistic model). Even though the Pearson correlation, $r$, is not an adequate measure of the accuracy of the estimation and should not be used to compare the effectiveness of the different estimators considered (and this is the reason why this concept has been banished from this study), for the sake of completeness we point out that: (1) for the S&MM dataset (230 journals), we found , , and ; (2) for the EE&F dataset we found , , and . Ultimately, despite the differences between the datasets considered (in terms of scientific areas, time windows for publication and citation, types of "citable" documents considered, and mean level of the basic indicators $C$, $C_1$, $T$ and $T_1$, with values of respectively 2111, 95, 432 and 312 for the S&MM dataset and 741, 33, 199 and 159 for the EE&F dataset), we may conclude that, on the whole, $h_{WW}$ provides fairly accurate approximations to the real value of the h-index, at least for not too large values of $T$, $m$ and $h$ (e.g. $h < 40$), such as those considered in this study.
References

1. J E Hirsch. An index to quantify an individual's scientific research output. Proc Natl Acad Sci U S A, 2005-11-07.

2. Diana Hicks; Paul Wouters; Ludo Waltman; Sarah de Rijcke; Ismael Rafols. Bibliometrics: The Leiden Manifesto for research metrics. Nature, 2015-04-23.

3. Alexander M Petersen; H Eugene Stanley; Sauro Succi. Statistical regularities in the rank-citation profile of scientists. Sci Rep, 2011-12-05.

4. Lucio Bertoli-Barsotti; Tommaso Lando. A theoretical model of the relationship between the h-index and other simple citation indicators. Scientometrics, 2017-03-20.