Open access

Equalities between h-type Indices and Definitions of Rational h-type Indicators


Introduction

Definition: The classical h-index

Consider a set S of publications, ranked decreasingly according to the number of citations each of these publications has received. Publications with the same number of citations are given different rankings. Then the h-index of set S is h if the first h publications received each at least h citations, while the publication ranked h+1 received strictly less than h+1 citations. Stated otherwise: the h-index of set S is the largest natural number h such that the first h publications received at least h citations (Hirsch, 2005).

When applied to the publication list of a researcher, the previous definition favors more prolific (e.g. older) scientists over those with fewer publications (e.g. younger ones). For this reason one may use a publication window in calculating an h-index. The citation window can also be adapted to distinguish between short-term and long-term influence. As databases differ in content, an h-index may also differ according to the database used. Besides these adaptations of the original definition, it is also possible to calculate h-indices for other types of citations, e.g. of patents, and for fractionally counted items.

Definition: the g-index

Additional citations to publications among the first h play no role at all. For this reason another indicator has been introduced. This is the g-index, proposed by Egghe (2006a). It is defined as follows: articles are ranked in decreasing order of received citations (as for the h-index). Then the g-index of this set of articles is defined as the highest rank g such that these g articles together received at least $g^{2}$ citations. If necessary, fictitious articles with zero citations are added to the publication list.
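The g-index definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `g_index` and the sample list are hypothetical and do not come from the cited papers. Fictitious zero-citation articles are handled implicitly, because the running total simply stops growing once the real list is exhausted.

```python
def g_index(citations):
    """Largest g such that the g most-cited items together have >= g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total = 0  # running sum of the citations of the top-ranked items
    g = 0
    while True:
        rank = g + 1
        if rank <= len(ranked):
            total += ranked[rank - 1]
        # Past the end of the real list, fictitious zero-citation items add nothing.
        if total < rank * rank:
            return g
        g = rank


sample = [10, 7, 7, 2, 0]  # illustrative citation counts
print(g_index(sample))
```

The loop always terminates because the total is bounded while the squared rank keeps growing.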

Definition: Kosmulski’s index and its generalizations

Another variation on the h-index was introduced by Kosmulski (2006). He proposed the h(2)-index as follows. Again one ranks the set of articles for which one wants to determine the h(2)-index in decreasing order of received citations. Now this set (authors, journals, etc.) has an h(2)-index equal to $h_2$ if r = $h_2$ is the highest rank such that the first $h_2$ articles each received at least $(h_2)^2$ citations.

As a next step, colleagues observed that one may define in a similar manner an h(k)-index (k = 1, 2, 3, …). This has been done e.g. by Deineko & Woeginger (2009), who proposed an axiomatic characterization of an even more general family of indices, and by Egghe (2011), who studied this index in a Lotkaian framework.

Concretely, the h(3)-index is defined as follows. Consider a list of articles ranked decreasingly according to the number of citations each of these articles has received. Articles with the same number of citations are given different rankings. Then the h(3)-index of this set S is $h_3$ if the first $h_3$ articles received each at least $(h_3)^3$ citations, while the article ranked $h_3+1$ received strictly less than $(h_3+1)^3$ citations. Stated otherwise: the h(3)-index of a set S is the largest natural number $h_3$ such that the first $h_3$ publications each received at least $(h_3)^3$ citations (Fassin & Rousseau, 2018).

In this contribution we represent the units of attention (authors, journals, research groups, etc.) as a finite array such as A = (10, 7, 7, 2, 0). This symbol shows that author A has five publications with respective (ranked) citations equal to 10, 7, 7, 2 and 0. Clearly author A has an h-index equal to 3 and a g-index equal to 5. The h(2)-index is equal to 2 and the h(3)-index is equal to 1. The number of items with a non-zero number of citations is called the length of the array. This array has length 4. For simplicity we will always assume that values in array A are natural numbers (including the value zero). In this contribution we restrict our attention to the g, h, h(2) and h(3) index, to which we refer as h-type indices.
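The values quoted for A can be checked with a short Python sketch. The helper name `h_k_index` is an assumption made for illustration; k = 1 gives the classical h-index, k = 2 Kosmulski's h(2)-index and k = 3 the h(3)-index.

```python
def h_k_index(citations, k=1):
    """Largest rank r such that the first r items each have >= r**k citations."""
    ranked = sorted(citations, reverse=True)
    # ranked is non-increasing while rank**k increases, so the ranks that
    # meet the threshold form a prefix; counting them gives the index.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank ** k)


A = [10, 7, 7, 2, 0]
indices = {k: h_k_index(A, k) for k in (1, 2, 3)}
length = sum(1 for c in A if c > 0)  # number of items with non-zero citations

print(indices, length)
```

Counting the qualifying ranks is equivalent to the "first h articles each received at least h citations" formulation because the ranked list is non-increasing.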

When do we have equality?

It follows from their definitions that always g ≥ h ≥ h(2) ≥ h(3). In this section we tackle the question: for which arrays are two different h-type indices equal?

A. When is h = h(2)?

We recall the two conditions: a set of articles has h-index h if the first h articles received at least h citations and the article ranked h+1 received strictly less than h+1 citations. Similarly: a set of articles has h(2)-index $h_2$ if the first $h_2$ articles received at least $(h_2)^2$ citations each and the article ranked $h_2+1$ received strictly less than $(h_2+1)^2$ citations. If the two conditions must be satisfied at the same time, then the first h articles must have received at least $h^2$ citations each and the article ranked h+1 must have strictly less than h+1 citations. Obviously, h = h(2) can only occur for an array of length at least equal to h.

The following array A = (100, 30, 9, 3) is an example for which h = h(2) = 3.

The least number of citations for the case h = h(2) = 3 occurs for the array (9, 9, 9, 0). We added a non-essential zero at the end to make it clear that the length of this array is three. Generally, the least number of citations for the case h = h(2) occurs for the array $\left( \underbrace{h^{2}, h^{2}, \ldots, h^{2}}_{h\ \text{times}}, 0 \right).$ Of course, there is no upper limit to the corresponding number of citations.

B. When is h = h(3) or, equivalently, when is h = h(2) = h(3)?

As, by definition, h(2) is always situated between h and h(3), it suffices to solve the problem: when is h = h(3)?

Again we recall the two conditions: a set of articles has h-index h if the first h articles received at least h citations and the article ranked h+1 received strictly less than h+1 citations. Similarly: a set of articles has h(3)-index $h_3$ if the first $h_3$ (here equal to h) articles received at least $(h_3)^3$ citations each and the article ranked $h_3+1$ received strictly less than $(h_3+1)^3$ citations. If the two conditions must be satisfied at the same time, then the first h articles must have received at least $h^3$ citations each and the article ranked h+1 must have strictly less than h+1 citations.

The following array A = (100, 30, 27, 3) is an example for which h = h(3) = 3.

The least number of citations for the case h = h(3) = 3 occurs for the array (27, 27, 27, 0). Generally, the least number of citations for the case h = h(3) occurs for the array $\left( \underbrace{h^{3}, h^{3}, \ldots, h^{3}}_{h\ \text{times}}, 0 \right).$ Of course, even when h = h(2) = h(3) there is no upper limit to the corresponding number of citations.

C. When is h(2) = h(3)?

Recall that a set of articles has h(2)-index $h_2$ if the first $h_2$ articles received at least $(h_2)^2$ citations each and the article ranked $h_2+1$ received strictly less than $(h_2+1)^2$ citations; and a set of articles has h(3)-index $h_3$ if the first $h_3$ articles received at least $(h_3)^3$ citations each and the article ranked $h_3+1$ received strictly less than $(h_3+1)^3$ citations. If the two conditions must be satisfied at the same time, then the first $h_2$ articles must have received at least $(h_2)^3$ citations each and the article ranked $h_2+1$ must have strictly less than $(h_2+1)^2$ citations.

The following array A = (100, 30, 27, 15) is an example for which h(2) = h(3) = 3. Note that this array has an h-index equal to 4.

The least number of citations for the case h(2) = h(3) = 3 occurs for the array (27, 27, 27, 0). Generally, the least number of citations for the case h(2) = h(3) = h occurs for the array $\left( \underbrace{h^{3}, h^{3}, \ldots, h^{3}}_{h\ \text{times}}, 0 \right).$
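The example arrays of sections A to C, and the minimal arrays, can be checked mechanically. The sketch below is illustrative only; `h_k_index` and `minimal_array` are hypothetical helper names, with k = 1, 2, 3 standing for the h, h(2) and h(3) index.

```python
def h_k_index(citations, k=1):
    """Largest rank r such that the first r items each have >= r**k citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank ** k)


def minimal_array(h, k):
    """h items with h**k citations each, followed by one zero-citation item."""
    return [h ** k] * h + [0]


examples = {
    "A": [100, 30, 9, 3],    # section A example
    "B": [100, 30, 27, 3],   # section B example
    "C": [100, 30, 27, 15],  # section C example
}
# For each example, the triple (h, h(2), h(3)).
profiles = {
    name: tuple(h_k_index(arr, k) for k in (1, 2, 3))
    for name, arr in examples.items()
}
print(profiles)
```

Note that the section A array has h = h(2) = 3 but an h(3)-index of only 2, which is why section B needs the larger third entry.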

D. When is h = g?

A set of articles has g-index h if the sum of the citations of the first h articles is at least $h^2$ and the sum of the first h+1 articles is strictly less than $(h+1)^2$. If $X = (x_1, x_2, \ldots, x_j, \ldots)$ then we see that if g(X) = h, then $\sum\limits_{i=1}^{h} x_i \ge h^2.$ This inequality always holds if the h-index of X is equal to h. Now from $\sum\limits_{i=1}^{h+1} x_i < (h+1)^2$ and the fact that $\sum\limits_{i=1}^{h} x_i \ge h^2,$ we see that if $x_{h+1} = h$ then $h^2 \le \sum\limits_{i=1}^{h} x_i < h^2 + h + 1.$

For h = g = 3 and $x_{h+1} = x_4 = 3$, the largest possible number of citations for $x_1$ is obtained in the array (6, 3, 3, 3). If $x_{h+1} = x_4 = 0$ we have (9, 3, 3, 0), again for the largest possible value of $x_1$. An example of an intermediate case, still for g = h = 3, is (5, 4, 3, 2). In general, again trying to give the first item the largest possible value, if item h+1 has the largest possible integer value, namely h, we have an array of the form $\left( 2h, \underbrace{h, h, \ldots, h}_{h-1\ \text{times}}, h \right).$ If item h+1 has value zero, then we have: $\left( 3h, \underbrace{h, h, \ldots, h}_{h-1\ \text{times}}, 0 \right).$
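As a sanity check, the two extreme families can be generated and tested for g = h. The following is a sketch only; all function names are assumptions introduced for illustration.

```python
def h_index(citations):
    """Classical h-index of a citation list."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """g-index, with fictitious zero-citation items added implicitly."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    while True:
        rank = g + 1
        if rank <= len(ranked):
            total += ranked[rank - 1]
        if total < rank * rank:
            return g
        g = rank


def extreme_with_tail_h(h):
    """(2h, h, ..., h): item h+1 has the largest possible value h."""
    return [2 * h] + [h] * h


def extreme_with_tail_zero(h):
    """(3h, h, ..., h, 0): item h+1 has no citations."""
    return [3 * h] + [h] * (h - 1) + [0]


for h in range(1, 8):
    for arr in (extreme_with_tail_h(h), extreme_with_tail_zero(h)):
        print(h, arr, h_index(arr), g_index(arr))
```

Adding a single citation to the first item of either family pushes the sum of the first h+1 items up to $(h+1)^2$, so the g-index becomes h+1 while the h-index stays at h.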

E. When is g = h = h(2) = h(3)?

If h = 3 then the condition h = h(2) = h(3) leads to an array of the form (27, 27, 27, 0), or with higher values. As 27+27+27+0 = 81 = $9^2$ we observe that the g-index is at least 9. Hence the equality g = h = h(2) = h(3) is not possible for h = 3, and certainly not for higher values.

If h = 2, then h = h(2) = h(3) leads to an array of the form (8, 8, 0), or with higher values. As 8+8+0 = 16 = $4^2$ the g-index is at least 4. Hence, also for h = 2 it is impossible to have equality.

Finally, for h = 1 it is easy to find examples for which g = h(3), such as (2, 1). As the sum of the first two citations must be at most equal to 3, this example is an extreme. Similarly (3, 0) is an extreme. Among publication-citation arrays of length one, (3), (2) and (1) are the only three cases; among publication-citation arrays of length two we have (2, 1) and (1, 1). From this we conclude that equality among the four indices can only occur for h = 1 and, even then, occurs in just a few cases.

Note that there are no conditions on the tail so that there is no condition on the total number of citations. The array (2, 1, 1, 1,...) has h-index = g-index = h(3)-index = 1, but there is no upper limit on the total number of received citations. If the number of articles, N, is given, then the upper limit for the number of received citations is N+1; the lower limit is 1.
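The short arrays listed for h = 1 can be recovered by brute force. The sketch below enumerates ranked arrays of at most two items, each with between 1 and an arbitrary cap of 5 citations, and keeps those for which all four indices coincide. The cap, the length limit and all names are assumptions chosen for illustration, not part of the original argument.

```python
from itertools import product


def h_k_index(citations, k=1):
    """Largest rank r such that the first r items each have >= r**k citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank ** k)


def g_index(citations):
    """g-index, with fictitious zero-citation items added implicitly."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    while True:
        rank = g + 1
        if rank <= len(ranked):
            total += ranked[rank - 1]
        if total < rank * rank:
            return g
        g = rank


def all_four_equal(citations):
    values = {g_index(citations)} | {h_k_index(citations, k) for k in (1, 2, 3)}
    return len(values) == 1


def equal_arrays(max_length, max_citations):
    """Ranked arrays of positive citation counts with g = h = h(2) = h(3)."""
    found = []
    for length in range(1, max_length + 1):
        for arr in product(range(1, max_citations + 1), repeat=length):
            is_ranked = all(a >= b for a, b in zip(arr, arr[1:]))
            if is_ranked and all_four_equal(arr):
                found.append(arr)
    return found


result = equal_arrays(max_length=2, max_citations=5)
print(result)
```

Raising `max_length` adds arrays with longer tails of ones, in line with the remark that the tail is unconstrained.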

Rational indices

Rational h-type indices can be used to make a distinction between cases with the same h-type value. The rational variant of the h-index, denoted as $h_{rat}$, was introduced by Ruane and Tol (2008) in the context of publications and citations. It is defined as follows.

Definition: Consider a researcher with h-index h. Let n be the smallest possible number of citations necessary to reach an h-index equal to h+1. Then the rational h-index, denoted $h_{rat}$, is defined as:

$$h_{rat} = h + 1 - \frac{n}{2h+1}$$

We next explain this formula. If a researcher has h-index h, then one may ask about the minimum number of citations necessary to reach an h-index equal to h + 1. This number is denoted here as n. The next question is: if you only know that this scientist’s h-index is h, what is the largest number of citations that this researcher may need to reach an h-index equal to h+1? The answer is 2h+1, corresponding with the “worst case scenario” that there are h publications with h citations each and the publication at rank h + 1 has 0 citations. This explains the occurrence of the factor 2h+1 in the formula for the rational h-index (Rousseau et al., 2018). In a similar way a rational g-index was introduced in (Guns & Rousseau, 2009). Next we define the rational h(2) and h(3) indices.

Similar to the case of the h-index, we note that the worst case for a set of articles with h(2)-index equal to $h_2$ happens when the first $h_2$ articles received $(h_2)^2$ citations each and the article ranked $h_2+1$ has no citations. Such a set needs $h_2$ times $(h_2+1)^2-(h_2)^2$, that is $h_2(2h_2+1)$, extra citations for the first $h_2$ articles, plus $(h_2+1)^2$ new citations for the article ranked $h_2+1$, leading to a total of $3(h_2)^2+3h_2+1$ citations. Consequently:

$$\left( h_2 \right)_{rat} = h_2 + 1 - \frac{n_2}{3h_2^{2} + 3h_2 + 1}$$

where $n_2$ is the minimum number of citations necessary to reach an h(2)-index equal to $h_2 + 1$.

Finally, for the h(3)-index we note that the worst case for a set of articles with h(3)-index equal to $h_3$ happens when the first $h_3$ articles received $(h_3)^3$ citations each and the article ranked $h_3+1$ has no citations. Such a set needs $h_3$ times $(h_3+1)^3-(h_3)^3 = 3(h_3)^2 + 3h_3 + 1$ extra citations for the first $h_3$ articles, plus $(h_3+1)^3$ new citations for the article ranked $h_3+1$, leading to a total of $4(h_3)^3 + 6(h_3)^2 + 4h_3 + 1$ citations. Consequently:

$$\left( h^{\left( 3 \right)} \right)_{rat} = h^{\left( 3 \right)} + 1 - \frac{n_3}{4h_3^{3} + 6h_3^{2} + 4h_3 + 1}$$

where $n_3$ is the minimum number of citations necessary to reach an h(3)-index equal to $h_3 + 1$.

An example. Array A = (100, 30, 27, 3) has a rational h(3)-index of $3+1-\frac{0+34+37+61}{4\cdot 27+6\cdot 9+4\cdot 3+1}=4-\frac{132}{175}\approx 3.246.$
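The rational indices defined above can be computed with exact fractions. The sketch below is an illustration, not the authors' implementation: the function names are hypothetical, and the worst-case denominator is written as $(h+1)^{k+1} - h^{k+1}$, which is an observation added here that reproduces the three denominators 2h+1, $3h^2+3h+1$ and $4h^3+6h^2+4h+1$ for k = 1, 2, 3.

```python
from fractions import Fraction


def h_k_index(citations, k=1):
    """Largest rank r such that the first r items each have >= r**k citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank ** k)


def rational_h_k(citations, k=1):
    """Rational h(k)-index: h + 1 - n / (worst-case number of citations)."""
    ranked = sorted(citations, reverse=True)
    h = h_k_index(ranked, k)
    target = (h + 1) ** k  # citations each of the top h+1 items must reach
    # Pad with zero-citation items if fewer than h+1 items exist.
    top = ranked[: h + 1] + [0] * max(0, h + 1 - len(ranked))
    n = sum(max(0, target - c) for c in top)  # minimum citations still missing
    # Worst case: h items with h**k citations each and one item with none.
    worst = (h + 1) ** (k + 1) - h ** (k + 1)
    return h + 1 - Fraction(n, worst)


value = rational_h_k([100, 30, 27, 3], k=3)
print(value, float(value))
```

Using `Fraction` keeps the result exact, so two arrays with the same integer index can be compared without rounding issues.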

The relative rational h-index

When researchers reach an h-index of h, it will rarely occur that they really need 2h+1 new citations to reach the value h+1. Usually some of these citations may already have been received. In the extreme case they will only need two new citations, namely when their publication-citation array is $\left( \underbrace{h+1, h+1, \ldots, h+1}_{h-1\ \text{times}}, h, h \right)$ or with more citations. If, at the moment a researcher reaches an h-index of h, they need m new citations to reach an h-index of h+1, then at a later moment their relative (or individual) rational h-index, denoted $h_{r,rat}$, is

$$h_{r,rat} = h + 1 - \frac{n}{m}$$

where n has the same meaning as before, namely: the minimum number of citations still necessary to reach an h-index equal to h + 1. As m ≤ 2h+1, $h_{r,rat} \le h_{rat}$. For an individual researcher this relative rational h-index is clearly more meaningful than the absolute one. An example: if $A_0$ = (6, 5, 3, 1) when an h-index of 3 was reached and if this researcher’s publication-citation array is now A = (9, 6, 4, 2), then their relative rational h-index is 4 - 2/4 = 3.5; the absolute one would be 4 - 2/7 ≈ 3.71. Similarly, one may define relative rational g, h(2) and h(3) indices and apply them not only to persons, but also to journals or other units of interest.
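The worked example can be reproduced as follows. This is a sketch under the assumption that the array at the moment the current h-index was reached is known; the function names are hypothetical.

```python
from fractions import Fraction


def h_index(citations):
    """Classical h-index of a citation list."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def citations_needed(citations, h):
    """Minimum number of new citations needed to reach an h-index of h+1."""
    ranked = sorted(citations, reverse=True)
    top = ranked[: h + 1] + [0] * max(0, h + 1 - len(ranked))
    return sum(max(0, h + 1 - c) for c in top)


def relative_rational_h(array_when_reached, array_now):
    """h + 1 - n/m, with m measured when h was reached and n measured now."""
    h = h_index(array_now)
    if h_index(array_when_reached) != h:
        raise ValueError("both arrays must have the same h-index")
    m = citations_needed(array_when_reached, h)
    n = citations_needed(array_now, h)
    return h + 1 - Fraction(n, m)


A0 = [6, 5, 3, 1]  # array when an h-index of 3 was reached
A = [9, 6, 4, 2]   # current array
relative = relative_rational_h(A0, A)
absolute = h_index(A) + 1 - Fraction(citations_needed(A, h_index(A)), 2 * h_index(A) + 1)
print(relative, absolute)
```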

Equality between h and g in a Lotkaian framework

In this section we use a continuous framework. This has no direct application in research evaluation, but it is part of a context in which researchers use a continuous version of h-type indices for modelling purposes (Egghe, 2005). We first recall the definition of the h- and the g-index in this framework. If f(r) is a given rank-frequency function (Zipf-type) then the h-index is the solution of the equality (in r):

$$f\left( r \right) = r$$

while the g-index is the solution, g, of the equality

$$\int_{0}^{g} f\left( r \right)\,dr = g^{2}$$

We recall that always (in a continuous as well as a discrete framework) g ≥ h. It has been shown (Egghe & Rousseau, 2006) that in a Lotkaian framework

$$h = T^{1/\alpha}\quad \text{for}\ \alpha > 1$$

where α is the exponent of the underlying Lotka (power) function and T is the total number of sources. Similarly, it has been shown (Egghe, 2006b) that

$$g = \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} T^{\frac{1}{\alpha}}\quad \text{for}\ \alpha > 2.$$

Now we prove that, for α > 2, and for fixed T, g-h is decreasing in α.

Proof. We consider the derivative of g-h with respect to α and prove that this derivative is always negative.

$$\begin{align}
& \frac{d}{d\alpha}\left( g\left( \alpha \right) - h\left( \alpha \right) \right) = \frac{d}{d\alpha}\left( \left( \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} - 1 \right) T^{\frac{1}{\alpha}} \right) \\
& = \left( \frac{\alpha -1}{\alpha} \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha} - 1} \frac{\left( \alpha -2 \right) - \left( \alpha -1 \right)}{\left( \alpha -2 \right)^{2}} + \ln\left( \frac{\alpha -1}{\alpha -2} \right) \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} \frac{\alpha - \left( \alpha -1 \right)}{\alpha^{2}} \right) T^{\frac{1}{\alpha}} \\
& \quad + \left( \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} - 1 \right) \ln\left( T \right) T^{\frac{1}{\alpha}} \left( -\frac{1}{\alpha^{2}} \right) \\
& = T^{\frac{1}{\alpha}} \left\{ \left( \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} - 1 \right) \left( -\frac{\ln\left( T \right)}{\alpha^{2}} \right) + \left( \frac{\alpha -1}{\alpha -2} \right)^{\frac{\alpha -1}{\alpha}} \left( \frac{-1}{\alpha \left( \alpha -2 \right)} + \ln\left( \frac{\alpha -1}{\alpha -2} \right) \frac{1}{\alpha^{2}} \right) \right\}
\end{align}$$

The first factor, $T^{1/\alpha}$, is positive. The first term inside the braces is clearly negative for T > 1, being a product of a positive and a negative factor. The second term inside the braces is a positive number multiplied by a negative one (as shown below), so that the derivative of g-h with respect to α is negative. This proves that, for fixed T, the difference between g and h decreases with α.

Now we consider the factor $\left( \frac{-1}{\alpha \left( \alpha -2 \right)} \right) + \ln\left( \frac{\alpha -1}{\alpha -2} \right)\frac{1}{\alpha^{2}}$ and show that it is negative. This holds if $\ln\left( \frac{\alpha -1}{\alpha -2} \right) < \frac{\alpha}{\alpha -2}.$ Taking exponentials, we have to show that $\frac{\alpha -1}{\alpha -2} < e^{\alpha / \left( \alpha -2 \right)}.$ Now $e^{\alpha / \left( \alpha -2 \right)} = 1 + \frac{\alpha}{\alpha -2} + \frac{1}{2}\left( \frac{\alpha}{\alpha -2} \right)^{2} + \ldots > 1 + \frac{\alpha}{\alpha -2} = \frac{2\alpha -2}{\alpha -2}.$ As α > 2, this boils down to the inequality 2α-2 > α-1, which is clearly true. This proves the required inequality.

Moreover, $\lim\limits_{\substack{\alpha \to \infty \\ T\ \text{constant}}} \left( g-h \right) = \lim\limits_{\substack{\alpha \to \infty \\ T\ \text{constant}}} \left( \left( \left( \frac{\alpha -1}{\alpha -2} \right)^{\left( \alpha -1 \right)/\alpha} - 1 \right) T^{1/\alpha} \right) = 0.$ This shows that for fixed T, g tends to h. This is easy to understand: indeed, if the Lotka-coefficient α tends to infinity, the Zipf-coefficient β (the coefficient of the Zipf distribution equivalent of the Lotka distribution with coefficient α) tends to zero (recall that $\beta = \frac{1}{\alpha -1}$). Now a Zipf-coefficient equal to zero corresponds to a ranking in which all elements are equal, which means that g = h.
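The monotone decrease and the limit can also be observed numerically. The sketch below evaluates the two closed-form expressions for h and g quoted above at a fixed, arbitrarily chosen T = 1000; the function names, the value of T and the grid of α values are all assumptions made for illustration, and a numerical check of this kind is of course no substitute for the proof.

```python
def lotka_h(T, alpha):
    """Continuous h-index in a Lotkaian framework: T**(1/alpha), alpha > 1."""
    return T ** (1.0 / alpha)


def lotka_g(T, alpha):
    """Continuous g-index in a Lotkaian framework, alpha > 2."""
    ratio = (alpha - 1.0) / (alpha - 2.0)
    return ratio ** ((alpha - 1.0) / alpha) * T ** (1.0 / alpha)


T = 1000.0  # illustrative total number of sources
alphas = [2.5, 3.0, 5.0, 10.0, 100.0, 1000.0]
gaps = [lotka_g(T, a) - lotka_h(T, a) for a in alphas]

for a, gap in zip(alphas, gaps):
    print(f"alpha = {a:7.1f}   g - h = {gap:.6f}")
```

On this grid the gap shrinks at every step and is already below 0.01 at α = 1000.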

Discussion and conclusion

We derived conditions under which h-type indices, and in particular the h and the g-index, are equal. Next we introduced the rational h(2) and h(3)-index. We moreover proposed a relative or individual rational h-index. Finally we studied the limiting behavior of the difference g-h in a continuous Lotkaian framework.

Although this article is explicitly meant to be a contribution in theoretical informetrics, it might have some practical use. This holds, in particular, for the introduction of the relative rational h-index.

We recognize that h-type indicators do not always behave in a logical way (Bouyssou & Marchant, 2011; Waltman & van Eck, 2012). Like many other indicators, the h, h(2), h(3) and g indicators are only PAC (Probably Approximately Correct) (Rousseau, 2016). However, this practical observation has no direct relation with the mathematical properties studied in this contribution. We further note that these h-type indices may play a role in heuristic approaches to support informed peer review (Bornmann et al., 2018).

eISSN:
2543-683X
Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Sciences, Information Technology, Project Management, Databases and Data Mining