The F-measure for Research Priority

In this contribution we continue our investigations (Rousseau & Yang, 2012) related to the activity index (AI) and its formal analogs. The activity index (AI) of country C with respect to a given domain D (and with respect to the world, W) over a given period P is defined as:

$$\begin{array}{l} AI(C,D,W,P)=\text{the country's share in the world's publication output during the }\\ \text{period }P\text{ in the given domain }D\text{ divided by the country's share in the world's}\\ \text{publication output during the same period }P\text{ in all science domains.} \end{array} $$(1)

We note, moreover, that publications are counted as retrieved in a given database. This index was introduced in informetrics by Frame (1977). We refer to this formulation as the basic activity index because, instead of the world, one might, for instance, consider the USA or China, and instead of a country one may consider a state or province. Clearly many other variants are imaginable. The basic activity index is said to characterize the relative research effort a country devotes to a given domain D. Stated otherwise, the AI gauges the share of a country’s or region’s publication activity in a given domain in its total publication output against the corresponding world standard. The lower bound of the AI is zero, while it has no upper bound. It is easy to show, see Equation (3) and, e.g., (Schubert & Braun, 1986), that the activity index can also be expressed as:

$$\begin{array}{l} AI(C,D,W,P)=\text{the given domain's share in the country's publication output }\\ \text{during the period }P \text{ divided by the given domain's share in the world's publication}\\ \text{output during the same period } P. \end{array} $$(2)

When the context is clear or when it does not matter we simply write AI. The mathematical framework of the AI, though with other meanings and sometimes slightly transformed, has been used in many contexts and under other names. In all cases one studies a nominal cross-classification table. Some of these, such as the attractivity index (obtained by replacing the term publication output by received citations in Equation (1)), the relative specialization index, and the (relative) priority index, are discussed further on.

The AI and the attractivity index are classified by Vinkler (2010) among the contribution indicators, used to characterize the contribution or weight of a subsystem, such as a country, to the total system, e.g., the world.

Next we have a look at the constituent parts of the AI and introduce some notation. For simplicity we stay within the context of Equations (1) and (2), but recall that everything we show in the context of the basic activity index can also be said in other contexts. Our criticism concerns the meaning of the mathematical formula, a ratio of ratios, but to make things precise we work mostly in the context of the standard activity index.

We consider the following parameters: O_CD, O_D, O_C and O_W, where, as a memory aid, the symbol O refers to the word output. Further:

O_CD denotes the number of publications by country C in domain D during a given publication window;

O_D denotes the total number of publications in the world in domain D during the same publication window;

O_C denotes the number of publications – in all domains – by country C during the same publication window;

O_W denotes the total number of publications in the world and in all domains during this publication window.

Then clearly, we have the following relations:

0 ≤ O_CD ≤ O_D ≤ O_W; 0 ≤ O_CD ≤ O_C ≤ O_W; and further:

O_CD/O_D is: the country’s share in the world’s publication output in the given domain D

O_CD/O_C is: the given domain’s share in the country’s publication output

O_C/O_W is: the country’s share in the world’s publication output in all science domains

O_D/O_W is: the given domain’s share in the world’s publication output

Finally we note that

$$\displaystyle AI=\frac{\left(\frac{O_{CD}}{O_D}\right)}{\left(\frac{O_C}{O_W}\right)}=\frac{\left(\frac{O_{CD}}{O_C}\right)}{\left(\frac{O_D}{O_W}\right)}. $$(3)

It is well-known that, assuming disjoint domains, a country cannot have an AI(D) value larger than one for all domains D (Rousseau, 2012).
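As a minimal sketch of Equation (3), the two formulations can be computed side by side in Python. The function names and the small three-domain table are illustrative assumptions, not data from the cited work. The last lines check numerically that, for disjoint domains that together cover all output, the AI values of one country weighted by the world shares O_D/O_W average to one, which is why they cannot all exceed one.

```python
def activity_index(o_cd, o_d, o_c, o_w):
    """AI as (country's share in domain D) / (country's share in all domains)."""
    return (o_cd / o_d) / (o_c / o_w)


def activity_index_alt(o_cd, o_d, o_c, o_w):
    """AI as (domain's share in the country) / (domain's share in the world)."""
    return (o_cd / o_c) / (o_d / o_w)


# Hypothetical counts for one country and three disjoint domains
# that together cover all output (illustration only).
country_by_domain = [30, 50, 20]    # O_CD for each domain
world_by_domain = [300, 200, 500]   # O_D for each domain
o_c = sum(country_by_domain)        # O_C
o_w = sum(world_by_domain)          # O_W

ai_values = [
    activity_index(o_cd, o_d, o_c, o_w)
    for o_cd, o_d in zip(country_by_domain, world_by_domain)
]
ai_values_alt = [
    activity_index_alt(o_cd, o_d, o_c, o_w)
    for o_cd, o_d in zip(country_by_domain, world_by_domain)
]

# Weighted by the world shares O_D / O_W, the AI values average to one,
# so at least one of them is at most one.
weighted_mean = sum(
    (o_d / o_w) * ai for o_d, ai in zip(world_by_domain, ai_values)
)
print(ai_values, weighted_mean)
```

Both functions return the same value for the same counts; which form is more convenient depends on which shares are already available.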

A Short Literature Study

In this section we recall some articles that used or studied the activity index, the attractivity index or its variants, without trying to be exhaustive.

Thijs and Glänzel (2008) used the AI to describe the national profile of eight European countries’ research fields. Zhou, Thijs, and Glänzel (2009) studied the regions of China, including in their investigations the scientific production (where the AI plays a role), relative received citations (but they did not include the attractivity index), and regional R&D expenditure. Ramakrishnan and Thavamani (2015) used the basic activity index in a study of the contribution of India to the field of leptospirosis. Further, Sangam et al. (2017) show that the AI (they use the term relative priority index) depends on the used database. Concretely, they study hepatitis research and compare results obtained from data retrieved from PubMed, Web of Science (WoS), and a sub-database of the WoS consisting of fields in the life sciences.

Instead of the term AI, Nagpaul and Sharma (1995) use the term (relative) priority index, but with the same meaning as the AI. This terminology has also been used by Bhattacharya (1997) and in the already mentioned publication by Sangam et al. (2017). The revealed comparative advantage (RCA) or Balassa Index (Balassa, 1965) is an index used in international economics for calculating the relative advantage or disadvantage of a certain country in a certain class of goods or services as evidenced by trade flows. The RCA is defined as the proportion of the country’s exports that are of the class under consideration divided by the proportion of world exports that are of that class. Mathematically this index has the same form as the AI. A comparative advantage is “revealed” if RCA > 1. If RCA is less than unity, the country is said to have a comparative disadvantage in the commodity or industry under consideration.

Next we turn our attention to studies that include some theoretical aspects or variations of the AI. First we mention that some authors prefer the AI multiplied by 100 and refer to this as the modified activity index (MAI); see, e.g., (Guan & Gao, 2008). These authors studied the MAI for bioinformatics over the period 2000–2005 and observed that the MAI value (hence also the AI value) of China in this field doubled over the observed period. Chen and Xiao (2016) proposed the Keyword Activity Index (KAI) of a keyword in a given domain as:

KAI = (the share of the given domain in publications containing the given keyword)/(the share of the given domain in all publications).

Egghe and Rousseau (2002) place the activity and the attractivity index within a larger abstract framework of relative indicators. Hu and Rousseau (2009) compare the research performance in biomedical fields of 10 selected Western and Asian countries. The results confirm that there are many differences in intra- and interdisciplinary scientific activities between the West and the East. In particular they found that in most biomedical fields Asian countries perform below world average. Stimulated by these experimental results they find that the ratio of the attractivity index over the activity index, in a given domain and for a given country, can be expressed in terms of normalized mean citation rates (for the precise results we refer the reader to the original publication).

The relative specialization index (RSI) as used e.g., in Glänzel (2000) and Aksnes, van Leeuwen, and Sivertsen (2014) is defined as:

$$\displaystyle RSI=\frac{AI-1}{AI+1}. $$(4)

The RSI is a strict order preserving normalization of the AI. If AI = 0 then RSI = -1 and if AI increases to infinity, then RSI tends to 1. This transformation makes sure that values stay bounded between -1 and +1. This indicator but with Chinese universities instead of countries was used in Li, Miao, and Ding (2015). Besides comparisons with the world, they also performed comparisons with respect to China and with respect to leading universities in the world as reference group. Aksnes, van Leeuwen, and Sivertsen (2014) studied the impact on the RSI of the increased representation of China in the WoS. They choose the Netherlands as a case study to study this effect. We note that here two dynamic aspects are at play: the huge growth of China in terms of publications (described as “booming”) and the change of the WoS over time (possibly influenced by China). They concluded however that, although the influence of China is visible in the RSI for the Netherlands, and this especially in the last decade and in domains where these countries have opposite specializations, the basic research profile of the Netherlands as measured by the RSI remains the same. We note though that this is not a strictly mathematical result but rather a heuristic impression related to the stability of this index. Zhang, Rousseau, and Glänzel (2011) applied the RSI formula using document types instead of scientific domains. They find that the USA, Canada, and Australia are balanced cases, while the UK has the highest relative contribution in book reviews.

Stare and Kejžar (2014) point out that although +1 is indeed an upper bound for the RSI, this upper bound depends on the domain under study and as such can in practice be much lower than +1 (for a given domain). They show that for the period 2005–2009 and for the Natural Sciences, this upper bound is as low as 0.32. They conclude that the differences in maximum values of AI and RSI between scientific fields are so big that any conclusions based on analyses of these indices seem questionable. For this reason they propose another index which takes the maximum value of the AI for a given domain into account. This indicator, denoted as SAI (standardized AI) is defined as follows:

$$\displaystyle SAI=\begin{cases} \dfrac{AI}{2} & \text{if } AI\leq 1\\ 1-\dfrac{MAX(AI)-AI}{2(MAX(AI)-1)} & \text{otherwise} \end{cases} $$(5)

Here, MAX (AI) is the theoretical maximum value of AI, given the real number of publications in the domain. Clearly, SAI takes values between 0 and 1 and when AI = 1, then SAI = 0.5.
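The two normalizations in Equations (4) and (5) can be sketched as follows. This is only an illustration: the function names are assumptions, and `max_ai` stands for MAX(AI), which is passed in by the caller because its computation is described by Stare and Kejžar (2014) and is not reproduced here.

```python
def rsi(ai):
    """Relative specialization index, Equation (4): maps [0, inf) onto [-1, 1)."""
    return (ai - 1.0) / (ai + 1.0)


def sai(ai, max_ai):
    """Standardized AI, Equation (5).

    max_ai is the theoretical maximum of the AI for the domain
    (assumed to be known by the caller and larger than 1).
    """
    if ai <= 1.0:
        return ai / 2.0
    if max_ai <= 1.0 or ai > max_ai:
        raise ValueError("for AI > 1, expected 1 < AI <= max_ai")
    return 1.0 - (max_ai - ai) / (2.0 * (max_ai - 1.0))


# Hypothetical values, for illustration only.
for ai in (0.0, 0.5, 1.0, 3.0, 5.0):
    print(ai, rsi(ai), sai(ai, max_ai=5.0))
```

With a hypothetical MAX(AI) of 5, an AI of 3 lies halfway between 1 and the maximum, so the SAI is 0.75, whereas the RSI for the same AI is 0.5.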

Reflections on the Meaning of the Activity Index

What is the meaning of the activity index? In (Rousseau, 2012) we stated that if the values of O_CD, O_D, and O_C stay the same, and these are the parameters we are interested in, then AI(Y+1) may differ from AI(Y), the values of the activity index in the years Y+1 and Y, if there is an increase or decrease in O_W unrelated to country C or domain D. Concretely, the activity index of the USA in chemistry may increase just because China, or any other country, has an increase in articles on biology (leading to an increase in O_W). Hence a change (increase or decrease) in the activity index can happen for reasons which have nothing to do with the country or the domain one is interested in. This observation is important for policy reasons, as fluctuations in the value of the AI for a given country and domain are never the result of that country’s policy with respect to the domain D alone. For this reason we consider O_CD, O_C, and O_D as endogenous factors (the factors of interest), while O_W is considered an uncontrollable external, i.e., exogenous factor.
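The dependence on O_W can be made explicit by multiplying out the ratio of ratios in Equation (3). The following rewriting is a short worked step added for illustration:

$$\displaystyle AI=\frac{O_{CD}/O_D}{O_C/O_W}=\frac{O_{CD}\cdot O_W}{O_D\cdot O_C}. $$

With O_CD, O_D, and O_C held fixed, the AI is therefore proportional to O_W: a relative change in the world total translates into the same relative change in the AI.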

Because of these remarks strange, i.e., counterintuitive, results may occur when calculating an AI. We provide two examples.

Example A. Suppose that a country is the leading country in the world, according to the activity index, in a particular domain. Then it is possible that another country becomes the leading one by publishing less in other domains. Consider Table 1. At the start the activity indices for countries 1 and 2 are 5.33 and 4.48, respectively. When country 2 publishes 17,000 fewer articles in other domains, the activity indices become 5.28 and 5.34, respectively. Although this is a fictitious case, it clearly demonstrates that this indicator does not behave as intuitively expected and, worse, does not measure what it (probably) is supposed to measure. The problem lies in the parameter O_W.

Table 1

Calculations related to Example A; the indicator F is introduced further on.

	Original situation		New situation: Country 2 publishes 17,000 fewer articles in other domains

	Country 1	Country 2	Country 1	Country 2
O_CD	200	1,400	200	1,400
O_D	5,000	5,000	5,000	5,000
O_C	12,000	100,000	12,000	83,000
O_W	1,600,000	1,600,000	1,583,000	1,583,000
AI	5.33	4.48	5.28	5.34
F	0.0235	0.0267	0.0235	0.0318
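The figures in Table 1 can be reproduced with a few lines of Python. This is a minimal sketch: the function and variable names are illustrative, and the F-measure used here is the quantity 2·O_CD/(O_D + O_C) that the paper introduces further on.

```python
def activity_index(o_cd, o_d, o_c, o_w):
    """Activity index as a ratio of ratios, Equation (3)."""
    return (o_cd / o_d) / (o_c / o_w)


def f_measure(o_cd, o_d, o_c):
    """F-measure for research priority; note that O_W does not appear."""
    return 2 * o_cd / (o_d + o_c)


# (O_CD, O_D, O_C, O_W) as listed in Table 1.
original = {
    "Country 1": (200, 5_000, 12_000, 1_600_000),
    "Country 2": (1_400, 5_000, 100_000, 1_600_000),
}
# Country 2 publishes 17,000 fewer articles in other domains,
# which lowers both its O_C and the world total O_W.
new = {
    "Country 1": (200, 5_000, 12_000, 1_583_000),
    "Country 2": (1_400, 5_000, 83_000, 1_583_000),
}


def summarize(situation):
    """Return {country: (AI rounded to 2 decimals, F rounded to 4 decimals)}."""
    return {
        name: (round(activity_index(*c), 2), round(f_measure(*c[:3]), 4))
        for name, c in situation.items()
    }


original_summary = summarize(original)
new_summary = summarize(new)
print(original_summary)
print(new_summary)
```

The AI ranking of the two countries flips between the two situations, while Country 1's F-value is unaffected by what Country 2 does outside the domain.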

Example B. Next we provide another counterintuitive example which comes from (Rousseau & Yang, 2012). This example is even more counterintuitive as there are no pure exogenous influences. It shows that if a country’s activity in a domain (parameter O_CD) increases and nothing else changes (the changes in the domain, country and world, are only the result of the change introduced by the country and the domain under study), then it is possible that the AI decreases and similarly if the activity decreases it is possible that the AI increases. Of course this again is a purely theoretic example, but it clearly shows the intrinsic problem with the AI-formula. Data and results are shown in Table 2.

Table 2

Data and calculations related to Example B.

	Basic	Increase in O_CD	Decrease in O_CD
O_CD	190	200	180
O_D	200	210	190
O_C	200	210	190
O_W	400	410	390
AI	1.9	1.859	1.945
F	0.95	0.9524	0.9474
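Example B can be checked in the same way. In this sketch (names are illustrative) a change of ten publications by country C in domain D is propagated to O_D, O_C, and O_W, exactly as in Table 2.

```python
def activity_index(o_cd, o_d, o_c, o_w):
    """Activity index as a ratio of ratios, Equation (3)."""
    return (o_cd / o_d) / (o_c / o_w)


def f_measure(o_cd, o_d, o_c):
    """F-measure for research priority: 2 * O_CD / (O_D + O_C)."""
    return 2 * o_cd / (o_d + o_c)


def shift_all(counts, delta):
    """Change O_CD by delta; O_D, O_C and O_W change by the same amount."""
    return tuple(value + delta for value in counts)


basic = (190, 200, 200, 400)        # (O_CD, O_D, O_C, O_W) from Table 2
increase = shift_all(basic, 10)     # country C publishes 10 more in D
decrease = shift_all(basic, -10)    # country C publishes 10 fewer in D

for label, counts in (("basic", basic), ("increase", increase), ("decrease", decrease)):
    print(label, round(activity_index(*counts), 3), round(f_measure(*counts[:3]), 4))
```

The AI moves against the direction of the change in O_CD, whereas the F-value moves with it.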

These two examples clearly show that there are serious problems in the interpretation of the AI. Finally, we mention the following situation. Consider O_CD, O_C, O_D, and O_W in a particular year. The next year O_CD, O_C, and O_D are exactly the same, but O_W has increased. Comparing the AI for these two years we see that the numerator O_CD/O_D has stayed the same but the denominator O_C/O_W has decreased. Consequently the AI value has increased. Reflecting on this we see that, with respect to the world, the contributions of country C and of the domain D have decreased. Yet, according to the AI, the activity of C in D has increased! This result, too, is difficult to grasp.

A New Proposal: F-measure for Research Priority

Although the AI and its variants do have a meaning as relative (or even double relative) measures (Rousseau, 2012) we think that in many cases researchers are actually interested in another indicator.

The ratios O_CD/O_D, namely the country’s share in the world’s publication output in the given domain D, and O_CD/O_C, namely the given domain’s share in the country’s publication output, are the indicators in which one generally is interested. Working with O_CD/O_D and O_CD/O_C we form their harmonic mean, which conceptually is the same as the F-score with respect to Recall and Precision in information retrieval (Manning, Raghavan, & Schütze, 2008). This leads to the indicator, see (6), that we propose instead of the activity index and its variants.

$$\displaystyle F(C,D,W,P)=\frac{2}{\dfrac{1}{O_{CD}\,/\,O_D}+\dfrac{1}{O_{CD}\,/\,O_C}}=\frac{2}{\dfrac{O_D}{O_{CD}}+\dfrac{O_C}{O_{CD}}}=\dfrac{2O_{CD}}{O_D+O_C}\leq 1. $$(6)

We further write F(C, D, W, P) simply as F when C, D, W, and P are assumed to be known. We already note that

$$0\leq F\leq 1, $$(7)

where the minimum and the maximum value only occur in the uninteresting cases that O_CD = 0, i.e., the country has no contribution in that particular domain, or that the country is the only one active in that particular domain and is, moreover, only active in that domain: O_CD = O_D = O_C. So, from now on we assume the strict inequalities in (7). Because F is a mean, we have for each concrete case that

$$\displaystyle \min\left(\frac{O_{CD}}{O_C},\frac{O_{CD}}{O_D}\right)\leq F\leq \max\left(\frac{O_{CD}}{O_C}, \frac{O_{CD}}{O_D}\right). $$(8)

The value of the F-measure in a domain D for the whole world is $\dfrac{2O_D}{O_D+O_W}$. The larger O_D, the larger this world value. Of course one could divide the value for a country in a domain by the corresponding value for the world, but this would re-introduce the parameter O_W. For this reason we prefer to consider the world value as a separate piece of information about the priority given by the whole world to this particular domain. We note that a special application of the F-score, the so-called feature F-measure, was used by Lamirel (2012) as an element in an unsupervised clustering method.
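A minimal Python sketch of Equations (6) to (8) follows. The function names are illustrative; the counts are those of Country 2 in the original situation of Table 1. The world value is obtained by treating the world as the "country", i.e., O_CD = O_D and O_C = O_W.

```python
def f_measure(o_cd, o_d, o_c):
    """Closed form of Equation (6): 2 * O_CD / (O_D + O_C)."""
    return 2 * o_cd / (o_d + o_c)


def harmonic_mean(x, y):
    """Harmonic mean of two strictly positive numbers."""
    return 2 / (1 / x + 1 / y)


# Country 2 in the original situation of Table 1.
o_cd, o_d, o_c, o_w = 1_400, 5_000, 100_000, 1_600_000

share_in_domain = o_cd / o_d      # country's share in the world's output in D
share_in_country = o_cd / o_c     # domain's share in the country's output

f_closed = f_measure(o_cd, o_d, o_c)
f_harmonic = harmonic_mean(share_in_domain, share_in_country)

# World value for domain D: the world plays the role of the country.
f_world = f_measure(o_d, o_d, o_w)

print(f_closed, f_harmonic, f_world)
```

Because the harmonic mean is dominated by the smaller of its two arguments, F stays close to the smaller share (here the domain's share in the country's output).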

In (Rousseau & Yang, 2012) we investigated under which conditions an increase in O_CD (or a decrease) would lead to an increase (decrease) in AI. Recall that we already know that this expected behavior does not always happen. Yet, we think that such an increase or decrease should not depend on other variables but should always happen. The next result shows that this is the case for the F-measure for research priority. Here and further on we exclude the trivial case that O_CD = O_D = O_C.

Theorem 1

If O_CD increases then the F-measure increases (addition property).

If O_CD decreases then the F-measure decreases (subtraction property).

Proof.

Let λ > 0 then we have to show that

$$\begin{array}{l} \displaystyle \frac{2O_{CD}}{O_D+O_C}\lt \frac{2(O_{CD}+\lambda)}{(O_D+\lambda)+(O_C+\lambda)} \\ \Leftrightarrow 2O_{CD}\cdot O_D+2O_{CD}\cdot O_C+4O_{CD}\cdot \lambda\lt 2O_{CD}\cdot O_D+2O_{CD}\cdot O_C+2\lambda \cdot O_D+2\lambda \cdot O_C\\ \Leftrightarrow 2O_{CD}\lt O_D+O_C \end{array} $$

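This last inequality holds because O_CD ≤ O_D and O_CD ≤ O_C, with at least one of these inequalities strict once the trivial case O_CD = O_D = O_C is excluded.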

Similarly, for O_CD > λ > 0, we show that:

$$\begin{array}{l} \displaystyle \frac{2O_{CD}}{O_D+O_C}\gt \frac{2(O_{CD}-\lambda)}{(O_D-\lambda)+(O_C-\lambda)} \\ \Leftrightarrow 2O_{CD}\cdot O_D+2O_{CD}\cdot O_C-4O_{CD}\cdot \lambda\gt 2O_{CD}\cdot O_D+2O_{CD}\cdot O_C-2\lambda \cdot O_D-2\lambda \cdot O_C\\ \Leftrightarrow -2O_{CD}\cdot \lambda \gt -\lambda(O_D+O_C) \\ \Leftrightarrow 2O_{CD}\lt O_D+O_C \end{array} $$

This proves the case of a decrease in O_CD.

We further note the logical property that if O_D and/or O_C increases and O_CD stays the same then F decreases.
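Theorem 1 can also be checked by brute force on small integer counts. The sketch below is only a sanity check under stated assumptions, not a substitute for the proof: the function names and the search limits are arbitrary choices, and exact rational arithmetic is used to avoid rounding effects.

```python
from fractions import Fraction
from itertools import product


def f_measure(o_cd, o_d, o_c):
    """Exact F-measure 2 * O_CD / (O_D + O_C) as a rational number."""
    return Fraction(2 * o_cd, o_d + o_c)


def theorem_1_holds(limit=10, max_step=3):
    """Check the addition and subtraction properties on a small grid.

    O_D = O_CD + extra_d and O_C = O_CD + extra_c guarantee O_CD <= O_D
    and O_CD <= O_C; the trivial case O_CD = O_D = O_C is skipped.
    """
    for o_cd, extra_d, extra_c, lam in product(
        range(1, limit), range(limit), range(limit), range(1, max_step + 1)
    ):
        if extra_d == 0 and extra_c == 0:
            continue
        o_d, o_c = o_cd + extra_d, o_cd + extra_c
        current = f_measure(o_cd, o_d, o_c)
        # Addition property: lam extra publications by C in D.
        if not current < f_measure(o_cd + lam, o_d + lam, o_c + lam):
            return False
        # Subtraction property, stated for O_CD > lam > 0.
        if o_cd > lam and not current > f_measure(o_cd - lam, o_d - lam, o_c - lam):
            return False
    return True


print(theorem_1_holds())
```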

Reconsidering Examples A and B we calculate the F-measure in these cases and notice that for Example A country C₂ has already a higher F-measure than country C₁; while for Example B, all counterintuitive results disappear (illustrating Theorem 1). Next we briefly discuss the notion of independence (Bouyssou & Marchant, 2011) in relation with the F-measure.

If S₁ and S₂ represent sets of publications then strict independence for an indicator J means that if J(S₁) < J(S₂) and one adds to S₁ and to S₂ the same publications, leading to sets S₁’ and S₂’ then still J(S₁’) < J(S₂’).

The indicator J is said to be relative independent if the independence property holds for sets S₁ and S₂ with the same number of elements. If one wants to stress the difference between independent and relative independent one may use the term absolute independent for the former.

Theorem 2 (Relative independence)

If countries C₁ and C₂ have the same number of publications, i.e., O_{C, 1} = O_{C, 2} = O_C, if the relation F(C₁, D, W, P) < F(C₂, D, W, P) holds and if we add the same number of publications, q > 0, in the domain D, to the output of these two countries then still F(C’₁, D, W, P) < F(C’₂, D, W, P), where the notations C’₁ and C’₂ refer to the same countries but with an increased number of publications in the domain D.

Proof. We know that

$$\displaystyle \frac{2O_{CD,1}}{O_D+O_C}\lt \frac{2O_{CD,2}}{O_D+O_C} $$

Hence O_{CD, 1}< O_{CD, 2}. Now we have to show that:

$$\displaystyle \frac{2(O_{CD,1}+q)}{\big(O_D+2q\big)+\big(O_C+q\big)}\lt \frac{2(O_{CD,2}+q)}{\big(O_D+2q\big)+\big(O_C+q\big)} $$

This is obvious as O_{CD, 1}< O_{CD, 2}.

Note. The F-measure is not an absolute independent indicator. Indeed, consider the following example. Let O_{CD, 1}= 2; O_{CD, 2} = 3; O_D = 88; O_{C, 1} = 49 and O_{C, 2}= 99. Then

$$\displaystyle F_1=\frac{2\cdot 2}{88+49}\approx 0.029\lt F_2=\frac{2\cdot 3}{88+99}\approx 0.032. $$

If we now add one unit to O_{CD, 1} and to O_{CD, 2} then we obtain the following values for the parameters: O_{CD, 1} = 3; O_{CD, 2} = 4; O_D = 90; O_{C, 1} = 50 and O_{C, 2} = 100. The relation between the new F-values, denoted as F₁’ and F₂’, now becomes:

$$\displaystyle F_1'=\frac{2\cdot 3}{90+50}\approx 0.0429\gt F_2'=\frac{2\cdot 4}{90+100}\approx 0.0421. $$

This shows that the F-measure for research priority is not an absolute independent measure.
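The counterexample can be verified exactly with rational arithmetic. In this sketch (names are illustrative) each country adds one publication in domain D, so O_D grows by two and each O_C by one.

```python
from fractions import Fraction


def f_measure(o_cd, o_d, o_c):
    """Exact F-measure 2 * O_CD / (O_D + O_C)."""
    return Fraction(2 * o_cd, o_d + o_c)


# Before: O_CD,1 = 2, O_CD,2 = 3, O_D = 88, O_C,1 = 49, O_C,2 = 99.
o_d = 88
f1 = f_measure(2, o_d, 49)
f2 = f_measure(3, o_d, 99)

# After adding one publication in D to each country.
o_d_new = o_d + 2
f1_new = f_measure(3, o_d_new, 50)
f2_new = f_measure(4, o_d_new, 100)

order_before = f1 < f2           # country 1 ranks below country 2
order_after = f1_new < f2_new    # no longer true after the addition

print(float(f1), float(f2), order_before)
print(float(f1_new), float(f2_new), order_after)
```

The order of the two countries is reversed by adding the same publications to both, which is exactly what absolute independence forbids.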

If the domain stays fixed a ranking of countries (C₁ and C₂) according to AI and to the F-measure may yield opposite results. Consider, indeed, the following example: let O_{CD, 1} = 4; O_D = 20; O_{C, 1}= 14; O_{CD, 2} = 3 and O_{C, 2} = 10, where subscripts refer to the corresponding countries, then AI₁ = 4 O_W/280 and AI₂ = 3O_W/200 and hence AI₁< AI₂. Yet F₁ = 8/34 > F₂ = 6/30.

Similarly, if the country is fixed then a ranking of domains according to AI and to the F-measure may yield opposite results. This remark is nothing but a confirmation that AI and F measure different properties. Only the second one is determined by endogenous factors and hence can be the direct result of an appropriate policy.
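The two-country example with a fixed domain can be checked numerically. Since O_W enters both AI values as a common factor, its value does not affect the comparison; the world totals used below are hypothetical and serve only to make the AI computable.

```python
def activity_index(o_cd, o_d, o_c, o_w):
    """Activity index as a ratio of ratios, Equation (3)."""
    return (o_cd / o_d) / (o_c / o_w)


def f_measure(o_cd, o_d, o_c):
    """F-measure for research priority: 2 * O_CD / (O_D + O_C)."""
    return 2 * o_cd / (o_d + o_c)


# (O_CD, O_D, O_C) for the two countries in the example.
country_1 = (4, 20, 14)
country_2 = (3, 20, 10)


def rankings(o_w):
    """Return (AI_1 < AI_2, F_1 > F_2) for a given world total O_W."""
    ai_1 = activity_index(*country_1, o_w)
    ai_2 = activity_index(*country_2, o_w)
    return ai_1 < ai_2, f_measure(*country_1) > f_measure(*country_2)


# Hypothetical world totals; the outcome is the same for each of them.
results = [rankings(o_w) for o_w in (100, 1_000, 1_000_000)]
print(results)
```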

Further Mathematical Results

Next we answer the question: if O_CD increases with a given percentage p, what is its influence on the other parameters?

We first consider the parameter O_CD/O_D: the country’s share in the world’s publication output in the given domain D.

Proposition 1. Let 0 < p < 1. Then an increase of 100p% in O_CD leads to an increase between 0 and 100p% in O_CD/O_D. In many realistic cases, i.e., O_CD << O_D, this increase is close to 100p%.

Proof. If O_CD becomes O_CD + O_CD·p, then O_CD/O_D becomes (O_CD + O_CD·p)/(O_D + O_CD·p). Then:

$$\begin{array}{l} \displaystyle \frac{\dfrac{O_{CD}+O_{CD}\cdot p}{O_D+O_{CD}\cdot p}}{\dfrac{O_{CD}}{O_D}}=\frac{O_D(1+p)}{O_D+O_{CD}\cdot p}=\frac{O_D+O_{CD}\cdot p}{O_D+O_{CD}\cdot p}+\frac{p(O_D-O_{CD})}{O_D+O_{CD}\cdot p} \\ \displaystyle =1+p\,\dfrac{O_D-O_{CD}}{O_D+O_{CD}\cdot p}=1+p\,\dfrac{1-\dfrac{O_{CD}}{O_D}}{1+\dfrac{O_{CD}\cdot p}{O_D}}=1+p\cdot R \end{array} $$

The factor R is strictly positive and smaller than 1, proving this result. If O_CD/O_D is small then R is close to 1 and the increase in O_CD/O_D is close to p (but always strictly smaller).

This proposition also holds for O_CD/O_C.

As the F-measure is an average, the proposition also holds here. For completeness we calculate the value of the corresponding R parameter:

$$\begin{array}{l} \displaystyle \dfrac{\dfrac{2O_{CD}(1+p)}{(O_D+O_{CD}\cdot p)+(O_C+O_{CD}\cdot p)}}{\dfrac{2O_{CD}}{O_D+O_C}}\\ \displaystyle =\dfrac{(1+p)\big(O_D+O_C\big)}{O_D+O_C+2O_{CD}\cdot p}=\dfrac{O_D+O_C+2O_{CD}\cdot p}{O_D+O_C+2O_{CD}\cdot p}+\dfrac{p\big(O_D+O_C-2O_{CD}\big)}{O_D+O_C+2O_{CD}\cdot p}\\ \displaystyle =1+p\,\dfrac{O_D+O_C-2O_{CD}}{O_D+O_C+2O_{CD}\cdot p}=1+p\,\dfrac{1-\dfrac{2O_{CD}}{O_D+O_C}}{1+\dfrac{2O_{CD}\cdot p}{O_D+O_C}} \end{array} $$

The corresponding factor R is $\dfrac{1-\dfrac{2O_{CD}}{O_D+O_C}}{1+\dfrac{2O_{CD}\cdot p}{O_D+O_C}}$, which is again close to 1 if the F-measure is small and close to zero if F is close to 1. For small values of the F-measure an increase of O_CD by 100p% leads to an increase of the F-measure by almost 100p%.
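The factor R of Proposition 1 and its analog for the F-measure can be evaluated numerically. This is a sketch with illustrative function names; the counts are those of Country 1 in Table 1, and p = 0.1 is an arbitrary choice.

```python
def relative_growth_share(o_cd, o_d, p):
    """New share divided by old share of O_CD/O_D when O_CD grows by 100p%."""
    old = o_cd / o_d
    new = (o_cd + o_cd * p) / (o_d + o_cd * p)
    return new / old


def r_factor_share(o_cd, o_d, p):
    """Factor R in Proposition 1 for the share O_CD/O_D."""
    return (1 - o_cd / o_d) / (1 + o_cd * p / o_d)


def relative_growth_f(o_cd, o_d, o_c, p):
    """New F divided by old F when O_CD grows by 100p%."""
    old = 2 * o_cd / (o_d + o_c)
    new = 2 * o_cd * (1 + p) / ((o_d + o_cd * p) + (o_c + o_cd * p))
    return new / old


def r_factor_f(o_cd, o_d, o_c, p):
    """Corresponding factor R for the F-measure."""
    f = 2 * o_cd / (o_d + o_c)
    return (1 - f) / (1 + f * p)


o_cd, o_d, o_c = 200, 5_000, 12_000   # Country 1 in Table 1
p = 0.1                               # a 10% increase in O_CD (arbitrary)

r_share = r_factor_share(o_cd, o_d, p)
r_f = r_factor_f(o_cd, o_d, o_c, p)
print(relative_growth_share(o_cd, o_d, p), 1 + p * r_share)
print(relative_growth_f(o_cd, o_d, o_c, p), 1 + p * r_f)
```

In both cases the direct ratio and 1 + p·R agree, and R is close to 1 because the shares involved are small.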

The F-measure, considered a mathematical function, depends on two variables x = O_CD/O_D (the country’s share in the world’s publication in domain D) and y = O_CD/O_C (the domain’s share in the country’s publication output). As a function of x and y we have:

$$F(x,y)=\dfrac{2}{\dfrac{1}{x}+\dfrac{1}{y}}=\dfrac{2xy}{x+y} $$(9)

defined for x ≥ 0, y ≥ 0 and (x,y) ≠ (0,0). We already note that F(x,x) = x.

Considering the parallel lines x + y = c, with c a strictly positive constant, we see that for points (x,y) on this line $F(x,y)=\dfrac{2x(c-x)}{c}.$ Hence when x + y = c, F(x,y) has the form of a parabola, taking the value zero for x = 0 and x = c, i.e., y = 0. The top of such a parabola is obtained for x = c/2 = y, and takes the value F(c/2,c/2) = c/2. From this analysis it follows that when either x or y is close to zero also the F-measure for research priority is small. Figure 1 shows the function F(x,y) for x and y between 0 and 1. It also shows the F-values for points on x + y = 0.5 and on x + y = 1.

Figure 1. Graph of the function F(x,y); the origin is nearest to the viewer.
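The behavior of F along the lines x + y = c can be checked on a grid. The sketch below uses exact fractions and an arbitrary grid of 100 steps (both are illustrative choices) for the two lines mentioned in the text, c = 0.5 and c = 1.

```python
from fractions import Fraction


def f_xy(x, y):
    """F as a function of the two shares, Equation (9); requires x + y > 0."""
    return 2 * x * y / (x + y)


def best_on_line(c, steps=100):
    """Return the grid point (x, F(x, c - x)) with the largest F on x + y = c."""
    grid = [c * Fraction(i, steps) for i in range(steps + 1)]
    return max(((x, f_xy(x, c - x)) for x in grid), key=lambda pair: pair[1])


top_half = best_on_line(Fraction(1, 2))   # line x + y = 0.5
top_one = best_on_line(Fraction(1))       # line x + y = 1

print(top_half, top_one)
```

On each line the largest value is found at x = y = c/2 and equals c/2, and F vanishes at both end points of the line.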

A Real-world Example

As a real-world application we consider a table of publications in the Humanities, containing information on publications by Flemish researchers (Engels, Ossenblok, & Spruyt, 2012). These data, published as part of Table 1 in (Engels, Ossenblok, & Spruyt, 2012), came about as follows: In 2008 the Flemish government provided the legal framework for the construction of the Flemish Academic Bibliographic Database for the Social Sciences and Humanities (“Vlaams Academisch Bibliografisch Bestand voor de Sociale en Humane Wetenschappen” or “VABB-SHW” in short). This database provided the Flemish government with a useful tool to fine-tune the distribution of research funding over universities in Flanders. As a consequence it became possible for researchers to analyze changing publication patterns in the larger Flemish peer reviewed literature (not just restricted to the WoS). Five publication types are included in the VABB-SHW:

articles in journals;

books as author;

books as editor;

articles or chapters in books;

proceedings papers that are not part of special issues of journals or edited books.

In Table 3 a distinction is made between articles in journals included in the WoS and other ones, and similarly for proceedings papers, leading to seven types of publications. In the VABB-SHW all records are assigned to disciplines on the basis of the author(s) affiliation(s) with a SSH unit in which the author carries out research. For the Humanities one makes a distinction between the following disciplines: Archaeology; Art History (including Architecture and Arts); Communication Studies; History; Law; Linguistics; Literature; Philosophy (including History of Ideas); Theology (including Religious Studies). Finally, we mention that data in our Table 3 do not include the remainder category “Humanities-general.”

Table 3

Flemish Humanities publications (2000–2009) in the VABB.

Disciplines	Journal articles		Book chapters	Edited books	Monographs	Proceedings papers		Row totals

	VABB-non-WoS	VABB-WoS	VABB	VABB	VABB	VABB-WoS	VABB-Non-WoS
Archaeology	176	133	40	6	11	12	18	396
Art History	295	150	135	38	12	22	28	680
Communication Studies	425	170	94	16	3	19	1	728
History	773	193	233	52	28	0	19	1,298
Law	4,018	144	320	89	55	11	20	4,657
Linguistics	908	457	511	135	59	54	83	2,207
Literature	631	143	376	87	36	0	31	1,304
Philosophy	786	603	279	42	30	36	9	1,785
Theology	610	85	410	85	53	1	4	1,248
Column totals	8,622	2,078	2,398	550	287	155	213	14,303

Next, in Table 4, we show AI-values for the data shown in Table 3. In this case AI-values refer to the relative preference of disciplines for certain publication types. Table 5 shows the corresponding F-values.

Table 4

Values according to the AI-formula for the data shown in Table 3.

Disciplines	Journal articles		Book chapters	Edited books	Monographs	Proceedings papers

	VABB-non-WoS	VABB-WoS	VABB	VABB	VABB	VABB-WoS	VABB-Non-WoS
Archaeology	0.737	2.312	0.602	0.394	1.384	2.796	3.052
Art History	0.720	1.518	1.184	1.453	0.879	2.985	2.765
Communication Studies	0.968	1.607	0.770	0.572	0.205	2.408	0.092
History	0.988	1.023	1.071	1.042	1.075	0.000	0.983
Law	1.431	0.213	0.410	0.497	0.589	0.218	0.288
Linguistics	0.682	1.425	1.381	1.591	1.332	2.258	2.525
Literature	0.803	0.755	1.720	1.735	1.376	0.000	1.596
Philosophy	0.730	2.325	0.932	0.612	0.838	1.861	0.339
Theology	0.811	0.469	1.960	1.771	2.116	0.074	0.215

Table 5

Values according to the F-measure for the data shown in Table 3.

Disciplines	Journal articles (VABB-non-WoS)	Journal articles (VABB-WoS)	Book chapters (VABB)	Edited books (VABB)	Monographs (VABB)	Proceedings papers (VABB-WoS)	Proceedings papers (VABB-non-WoS)
Archaeology	0.039	0.108	0.029	0.013	0.032	0.044	0.059
Art History	0.063	0.109	0.088	0.062	0.025	0.053	0.063
Communication Studies	0.091	0.121	0.060	0.025	0.006	0.043	0.002
History	0.156	0.114	0.126	0.056	0.035	0.000	0.025
Law	0.605	0.043	0.091	0.034	0.022	0.005	0.008
Linguistics	0.168	0.213	0.222	0.098	0.047	0.046	0.069
Literature	0.127	0.085	0.203	0.094	0.045	0.000	0.041
Philosophy	0.151	0.312	0.133	0.036	0.029	0.037	0.009
Theology	0.124	0.051	0.225	0.095	0.069	0.001	0.005
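The values in Tables 4 and 5 can be recomputed from the counts in Table 3. The following is a minimal sketch, assuming that each discipline plays the role of the "country" (row total as O_C), each publication type plays the role of the "domain" (column total as O_D), and that the F-measure is the unweighted harmonic mean of O_CD/O_D and O_CD/O_C. The function and variable names are illustrative and do not come from the original study.

```python
# Publication counts from Table 3 (rows: disciplines, columns: publication types).
DISCIPLINES = [
    "Archaeology", "Art History", "Communication Studies", "History", "Law",
    "Linguistics", "Literature", "Philosophy", "Theology",
]
COUNTS = [
    [176, 133, 40, 6, 11, 12, 18],
    [295, 150, 135, 38, 12, 22, 28],
    [425, 170, 94, 16, 3, 19, 1],
    [773, 193, 233, 52, 28, 0, 19],
    [4018, 144, 320, 89, 55, 11, 20],
    [908, 457, 511, 135, 59, 54, 83],
    [631, 143, 376, 87, 36, 0, 31],
    [786, 603, 279, 42, 30, 36, 9],
    [610, 85, 410, 85, 53, 1, 4],
]


def totals(counts):
    """Return (row totals, column totals, grand total) of a count table."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals, sum(row_totals)


def activity_index(counts):
    """AI per cell: (O_CD / O_C) / (O_D / O_W)."""
    rows, cols, grand = totals(counts)
    return [
        [(o / rows[i]) / (cols[j] / grand) for j, o in enumerate(row)]
        for i, row in enumerate(counts)
    ]


def f_measure(counts):
    """Harmonic mean of O_CD/O_D and O_CD/O_C, i.e. 2*O_CD / (O_C + O_D)."""
    rows, cols, _ = totals(counts)
    return [
        [2 * o / (rows[i] + cols[j]) for j, o in enumerate(row)]
        for i, row in enumerate(counts)
    ]


ai_table = activity_index(COUNTS)
f_table = f_measure(COUNTS)

# Print the first column (journal articles, VABB-non-WoS) for each discipline.
for name, ai_row, f_row in zip(DISCIPLINES, ai_table, f_table):
    print(f"{name:<22} AI={ai_row[0]:.3f}  F={f_row[0]:.3f}")
```

Note that the F-measure uses only the cell count and the row and column totals, whereas the AI also requires the grand total.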

Next, for each publication type, we calculate the Pearson and Spearman correlations between the numbers of publications, their AI-values and their F-values. Ranks for the Spearman correlation run from 1 to 9, as there are nine disciplines. Results are shown in Table 6.

Table 6

Correlation values.

Publication type	Pearson PUB-AI	Pearson PUB-F	Pearson AI-F	Spearman PUB-AI	Spearman PUB-F	Spearman AI-F
Journal articles VABB-non-WoS	0.873	0.998	0.872	0.183	0.983	0.267
Journal articles VABB-WoS	0.521	0.964	0.704	0.431	0.470	0.750
Book chapters VABB	0.599	0.922	0.852	0.633	0.933	0.783
Edited books VABB	0.621	0.789	0.965	0.500	0.683	0.933
Monographs VABB	0.461	0.645	0.961	0.233	0.533	0.867
Proceedings papers VABB-WoS	0.631	0.724	0.990	0.731	0.849	0.950
Proceedings papers VABB-Non-WoS	0.595	0.734	0.979	0.633	0.800	0.933

Note. PUB stands for the number of publications.
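As an illustration of how such correlations can be obtained, the sketch below computes Pearson and Spearman coefficients for one publication type, journal articles in non-WoS journals, using the counts from Table 3 and the rounded AI- and F-values from Tables 4 and 5. It assumes that tied values receive average ranks; the source does not state how ties were handled. The helper names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 (largest) to n; ties share the average rank (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Journal articles, VABB-non-WoS, in the discipline order of Table 3.
pub = [176, 295, 425, 773, 4018, 908, 631, 786, 610]
ai = [0.737, 0.720, 0.968, 0.988, 1.431, 0.682, 0.803, 0.730, 0.811]
f = [0.039, 0.063, 0.091, 0.156, 0.605, 0.168, 0.127, 0.151, 0.124]

for label, a, b in [("PUB-AI", pub, ai), ("PUB-F", pub, f), ("AI-F", ai, f)]:
    print(f"{label}: Pearson={pearson(a, b):.3f}  Spearman={spearman(a, b):.3f}")
```

Because the Spearman coefficient depends only on rank order, using the rounded table values does not affect it as long as rounding creates no new ties; the Pearson coefficients may differ slightly from those obtained with unrounded values.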

Generally, correlations between the numbers of publications and the AI-values are the lowest, while correlations for PUB-F and AI-F are roughly of the same level. The Spearman rank correlations for journal articles in non-WoS journals are an exception: 0.983 for PUB-F against 0.267 for AI-F. The main lesson of this example is that the number of published items per discipline per publication type, the relative preference of disciplines for certain publication types (based on the AI-formula) and the corresponding F-measure are different but, to some extent, correlated indicators.

Discussion and Conclusion

The criticism of the AI-formula is not valid in all cases. If the row or column sums in the original table are fixed, the criticism does not hold. This is clarified in the Appendix.

Any average, including weighted averages, of (O_CD/O_D) and (O_CD/O_C) satisfies the addition property (Theorem 1). We choose the harmonic mean because of the formal analogy with the F-score from information retrieval and because it is generally agreed that a harmonic mean should be used when rates are involved. This choice brings an additional sensitivity benefit: if either O_CD/O_D (the country’s share in the world’s publication output in the given domain D) or O_CD/O_C (the given domain’s share in the country’s publication output) is small, then the F-measure for research priority is also small. This property does not hold for an arithmetic mean. If deemed necessary, one may even consider weighted harmonic means of (O_CD/O_D) and (O_CD/O_C). Another sensitivity aspect relates to the parameters O_D and O_C. If one studies a large domain such as the Natural Sciences or Medicine, then the parameter O_D, which for most countries and certainly for most universities is much larger than the parameter O_C, has the largest influence on the actual value of F. On the other hand, if one studies a small specialty, then the parameter O_C may have the biggest influence. However, we do not think that the actual values of F are important; what matters are changes in value and the resulting changes in rankings between comparable units.
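The sensitivity argument can be made concrete with a short worked form. Assuming the unweighted harmonic mean, and O_CD > 0, the F-measure simplifies to

$$F=\frac{2}{\dfrac{O_{D}}{O_{CD}}+\dfrac{O_{C}}{O_{CD}}}=\frac{2\,O_{CD}}{O_{C}+O_{D}}$$

so that the larger of O_C and O_D dominates the denominator. For Law and non-WoS journal articles in Table 3 this gives 2 × 4,018 / (4,657 + 8,622) ≈ 0.605, in agreement with Table 5. As a purely hypothetical illustration of the difference with an arithmetic mean, take the two shares to be 0.90 and 0.01:

$$\frac{0.90+0.01}{2}=0.455,\qquad \frac{2\times 0.90\times 0.01}{0.90+0.01}\approx 0.020$$

The arithmetic mean stays large although one share is very small, whereas the harmonic mean is pulled toward the smaller share.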

Although the AI and its mathematical equivalents, such as the attractivity index, or their monotone transformations, such as the relative specialization index, can be used to characterize the contribution or weight of a subsystem to the total system, they certainly cannot be used for science policy purposes. The number of publications by country C in domain D during a given publication window (O_CD), the total number of publications in the world in domain D during the same publication window (O_D) and the number of publications in all domains by country C during the same publication window (O_C) can be considered endogenous factors in a science policy model, while the total number of publications in the world and in all domains during this publication window (O_W) is an exogenous factor. Because the F-measure depends only on the three endogenous quantities and not on O_W, we propose it as a better and more sensitive policy indicator.

eISSN: 2543-683X
Language: English

Publication frequency: 4 issues per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining


The F-measure for Research Priority

Article Category: Research Paper

Published online: 13 March 2018

Page range: 1 - 18

Submitted: 01 Nov. 2017

Accepted: 25 Dec. 2017

DOI: https://doi.org/10.2478/jdis-2018-0001

Keywords: Activity index, Harmonic mean, F-measure, Research policy, Endogenous and exogenous factors

© 2018 Walter de Gruyter GmbH, Berlin/Boston

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
